Using a Database

Goals

Understand why a database is more useful than a Python list for storing application data.
Use Python's built-in sqlite3 module to create a database and populate it with data.
Write SQL queries to retrieve all rows from a table and to look up a single row by ID.
Rewrite the server to read sightings from a database file instead of an in-memory list.
Use a temporary database in tests so that tests do not depend on production data.

Why Not a List?

What are the problems with storing the sightings data in a Python list?

Every time the server restarts, the list is re-created from the source code
- Any records added while the server was running are lost
If two people run the server on different machines, they each have their own private copy of the data
- A sighting added on one machine is invisible to everyone else
Searching through a list means reading every item in order
- For twenty sightings this is fast; for two million it is not

What is a database and why is SQLite a good choice for this application?

A database stores data in a way that survives program restarts
A SQL database organizes data into tables, where each table has named columns and zero or more rows
SQLite stores an entire database in a single file on disk, which is easy to back up, share, and version-control
Python includes the sqlite3 module in its standard library, so no extra packages are needed

Creating a Database

Write a Python script called create_db.py that uses sqlite3 to create a file called sightings.db with one table named sightings. The table should be populated with the data from dataset.py.

sqlite3.connect(path) opens the database at path, creating the file if it does not exist
conn.execute(sql) runs a query against the open database
create table if not exists sightings (...) defines the shape of the table:
- integer primary key marks id as the unique identifier for each row
- text not null means the column must contain a value; text alone allows null
- real stores a floating-point number, matching Python's float
conn.commit() saves all the changes to disk; without it the writes are discarded when the connection closes
conn.close() releases the file so other programs can access it

import sqlite3
from pathlib import Path

SIGHTINGS = [
    {
        "id": 1,
        "species": "G. canadensis",
        "sex": "Female",
        "weight": 142,
        "color": "dark brown",
        "datetime": "2024-01-08 07:14",
        "latitude": 49.23,
        "longitude": -121.45,
    },
# ...190 lines not shown...
]

DB_PATH = Path(__file__).parent / "sightings.db"

CREATE_TABLE = """
    create table if not exists sightings (
        id integer primary key,
        species text not null,
        sex text,
        weight real,
        color text not null,
        datetime text not null,
        latitude real not null,
        longitude real not null
    )
"""

INSERT_ROW = "insert into sightings values (?, ?, ?, ?, ?, ?, ?, ?)"


def create_db(path):
    conn = sqlite3.connect(path)
    conn.execute(CREATE_TABLE)
    for s in SIGHTINGS:
        conn.execute(
            INSERT_ROW,
            [
                s["id"],
                s["species"],
                s["sex"],
                s["weight"],
                s["color"],
                s["datetime"],
                s["latitude"],
                s["longitude"],
            ],
        )
    conn.commit()
    conn.close()


if __name__ == "__main__":
    create_db(DB_PATH)
    print(f"Created {DB_PATH}")

Run the script once to create the database:

python create_db.py

A Database-Backed Server

Rewrite server_testable.py as server_db.py so that make_app takes a database path instead of a list of sightings. The index route should read all rows from the database, and the detail route should look up one row by ID.

sqlite3.connect(db_path) opens the database; conn.close() releases it when done
Setting conn.row_factory = sqlite3.Row makes each result row behave like a dictionary, so row["species"] works the same way s["species"] did with the list
conn.execute("select * from sightings").fetchall() retrieves every row in the table
- select * means "return all columns"
- from sightings names the table to read from
- fetchall() collects the results into a Python list
conn.execute("select * from sightings where id = ?", [sighting_id]).fetchone() retrieves one row
- where id = ? filters the results to rows where id matches the given value
- The ? is a placeholder: sqlite3 fills it in safely, preventing a common security mistake called SQL injection
- fetchone() returns the first matching row, or None if no row matched
if row is None: raise NotFoundException(...) works exactly as before, because fetchone() returns None for a missing ID just as the old loop did

SQL Injection

If you built the query with an f-string, like f"select * from sightings where id = {sighting_id}", a user could visit a URL like /sighting/1 or 1=1, which would make the query read select * from sightings where id = 1 or 1=1. Because 1=1 is always true, this returns every row in the table instead of just one. With a destructive statement the damage is worse: 1; drop table sightings would delete all the data. The ? placeholder prevents this by telling sqlite3 to treat the value as data, never as SQL.

def make_app(db_path=DB_PATH):
    @get("/", media_type=MediaType.HTML)
    async def index() -> str:
        conn = sqlite3.connect(db_path)
        conn.row_factory = sqlite3.Row
        rows = conn.execute("select * from sightings").fetchall()
        conn.close()
        return str(
            html(lang="en")[
                head[
                    title["Sasquatch Sightings"],
                    link(rel="stylesheet", href="/style.css"),
                ],
                body[
                    table[
                        tr[[th[col] for col in HEADERS]],
                        [
                            tr[
                                td[a(href=f"/sighting/{row['id']}")[str(row["id"])]],
                                [td[fmt(row[k])] for k in KEYS[1:]],
                            ]
                            for row in rows
                        ],
                    ],
                ],
            ]
        )

    @get("/sighting/{sighting_id:int}", media_type=MediaType.HTML)
    async def detail(sighting_id: int) -> str:
        conn = sqlite3.connect(db_path)
        conn.row_factory = sqlite3.Row
        row = conn.execute(
            "select * from sightings where id = ?", [sighting_id]
        ).fetchone()
        conn.close()
        if row is None:
            raise NotFoundException(f"No sighting with ID {sighting_id}")
        return str(
            html(lang="en")[
                head[
                    title[f"Sighting {sighting_id}"],
                    link(rel="stylesheet", href="/style.css"),
                ],
                body[
                    table[
                        [
                            tr[
                                td(class_="label")[label],
                                td[fmt(row[key])],
                            ]
                            for key, label in LABELS.items()
                        ],
                    ],
                    a(href="/")["Back to all sightings"],
                ],
            ]
        )

    @get("/style.css", media_type="text/css")
    async def styles() -> str:
        return (LESSON_DIR / "style.css").read_text()

    return Litestar([index, detail, styles])


app = make_app()

Start the server the same way as before (after running create_db.py first):

litestar run --app server_db:app

Testing with a Temporary Database

Write a test file called test_server_db.py that uses pytest's tmp_path fixture to create a small temporary database for each test, and uses make_app with that database path to test the index, detail, and 404 routes.

pytest's built-in tmp_path fixture provides a fresh temporary directory for each test
- The directory and everything in it is automatically deleted when the test finishes
A @pytest.fixture named small_db creates a two-row database in tmp_path and returns its path
- Each test that lists small_db as a parameter receives a fresh database
- Tests are completely isolated: one test's writes cannot affect another's
make_app(small_db) creates a server that reads from the temporary database, not from sightings.db
- Tests no longer break if someone adds rows to the production database

import sqlite3

import pytest
from litestar.testing import TestClient

from schema import CREATE_TABLE
from server_db import make_app
from small import SMALL

INSERT_ROW = "insert into sightings values (?, ?, ?, ?, ?, ?, ?, ?)"


@pytest.fixture
def small_db(tmp_path):
    db_path = tmp_path / "test.db"
    conn = sqlite3.connect(db_path)
    conn.execute(CREATE_TABLE)
    for s in SMALL:
        conn.execute(
            INSERT_ROW,
            [
                s["id"],
                s["species"],
                s["sex"],
                s["weight"],
                s["color"],
                s["datetime"],
                s["latitude"],
                s["longitude"],
            ],
        )
    conn.commit()
    conn.close()
    return db_path


def test_index_ok(small_db):
    with TestClient(app=make_app(small_db)) as client:
        response = client.get("/")
    assert response.status_code == 200
    assert "Sasquatch Sightings" in response.text


def test_detail_ok(small_db):
    with TestClient(app=make_app(small_db)) as client:
        response = client.get("/sighting/1")
    assert response.status_code == 200
    assert "G. canadensis" in response.text


def test_missing_sighting(small_db):
    with TestClient(app=make_app(small_db)) as client:
        response = client.get("/sighting/9999")
    assert response.status_code == 404


def test_none_displayed_as_empty(small_db):
    with TestClient(app=make_app(small_db)) as client:
        response = client.get("/sighting/2")
    assert response.status_code == 200
    assert "None" not in response.text

Run the tests from the database/ directory:

pytest test_server_db.py -v

Check Understanding

See [sqlite2025] for the SQLite documentation and [python-sqlite2025] for the Python sqlite3 module reference.

What happens if you call conn.execute(...) several times but never call conn.commit()?

The changes are held in memory but never written to disk. When conn.close() is called, all of them are discarded. The database file on disk remains unchanged, as if the execute calls never happened.

The code below always prints zero rows, even though sightings.db has twenty rows. What is wrong?

conn.commit() is missing after the insert. Without it, the insert is not saved to disk, so the select that follows still sees the original twenty rows---and once conn.close() is called, the insert is discarded entirely. Add conn.commit() between the execute and the select.

conn = sqlite3.connect("sightings.db")
conn.execute("insert into sightings values (?, ?, ?, ?, ?, ?, ?, ?)", [21, "G. canadensis", None, 160, "brown", "2024-08-01 10:00", 50.0, -120.0])
rows = conn.execute("select * from sightings").fetchall()
print(len(rows))
conn.close()

What does fetchone() return when no row matches the where clause?

It returns None. This is why server_db.py checks if row is None before trying to read values from the row: accessing a column on None would raise an AttributeError.

Exercises

Count the Rows

After calling create_db, open sightings.db in a Python script and print the number of rows in the sightings table. Look up the SQL function count(*) to do this in a single query.

Check for Duplicates

Modify create_db.py so that running it twice does not add duplicate rows. The create table if not exists clause already handles the table; think about how to handle the insert statements.

Filter by Species in the Tests

Add a test to test_server_db.py that visits the index page and confirms that both "G. canadensis" and "G. horribilus" appear in the response when SMALL contains one row of each.

Inspect the Database Directly

SQLite databases can be opened and queried from the command line. Run sqlite3 sightings.db and use the .tables command to list tables, then run select count(*) from sightings; to confirm the row count. Exit with .quit.