Published on

SQLite with Python

8 min read
Authors

SQLite is a commonly misunderstood database. I myself first heard of the database when I was experimenting with android app development. My thoughts were that since the database was a single file and its name had lite in it, that it could not be taken seriously. But I could not have been more wrong. Now I love sqlite and use it for much of my scientific research. Sqlite is everywhere. It is on every android phone, airplanes, and the storage for many computer applications. The sqlite website details when using sqlite would be appropriate to use and can be summarized with these three questions.

  1. Is the data separated from the application by a network? (nfs, glusterfs, ceph)
  2. Many concurrent writers? (> 100 writes / second)
  3. Big data? (> 140 Terabytes)

If you answered no to all of these questions then sqlite is a perfect solution for you. Often times (1) is not an issue for a small number of writers and I use it quite successfully with ceph. Now that you have considered SQLite I want to show you how python makes it even more awesome. SQLite is included with every python install so it is as simple as import sqlite3.

import sqlite3

Next we connect to an sqlite database. The file does not have to already exist. SQLite also allows an in memory database with the filename :memory:. The database is not persistent but will read and write significantly faster. My tests show writes to be about 1000x faster taking 10μs10 \mu s. The reason for the fast writes is that the system call fsync does not need to be called. It has an overhead of between 110ms1 - 10 ms.

db_filename = '/tmp/example.db'
connection = sqlite3.connect(db_filename)

Now lets create a simple table for demonstration. IF NOT EXISTS is an sqlite specific feature.

connection.execute("""
CREATE TABLE IF NOT EXISTS person (
    first_name TEXT,
    last_name  TEXT,
    age        FLOAT
)
""")

Insert myself the author into the database.

connection.execute("""
INSERT INTO person (first_name, last_name, age)
VALUES (?, ?, ?)
""", ('chris', 'ostrouchov', 27))
connection.commit()

Next we might want to insert many people at once into the database under one transaction.

connection.executemany("""
INSERT INTO person (first_name, last_name, age)
VALUES (?, ?, ?)
""", (('john', 'smith', 32), ('jane', 'doe', 15)))
connection.commit()

We can query using a simple select statement. But notice that it only returns tuples. This is not very efficient as we would like to get the column names as well.

for row in connection.execute('SELECT * FROM person'):
    print(row)
('chris', 'ostrouchov', 27.0)
('john', 'smith', 32.0)
('jane', 'doe', 15.0)

To add column names to the cursor we can use the included row factory for sqlite. The python documentation states that it should be more performant than a custom implementation.

connection.row_factory = sqlite3.Row

for row in connection.execute('SELECT * FROM person;'):
    print(row.keys())
    print(tuple(row))
['first_name', 'last_name', 'age']
('chris', 'ostrouchov', 27.0)
['first_name', 'last_name', 'age']
('john', 'smith', 32.0)
['first_name', 'last_name', 'age']
('jane', 'doe', 15.0)

Once you are comfortable inserting, deleting, and querying rows of the database the next step is to perform transactions. The python sqlite interface provides a convenient way to do this with a context manager for starting, committing, rolling back a transaction. Notice how executing the failing sql statement this should fail leads to bob smith not being inserted into the database.

try:
    with connection:
        connection.execute("""
        INSERT INTO person (first_name, last_name, age)
        VALUES (?, ?, ?)
        """, ('bob', 'smith', 54))
        connection.execute("this should fail")
except sqlite3.OperationalError:
    print('SQL statment failed with Operational error')

for row in connection.execute('SELECT * FROM person;'):
    print(tuple(row))
SQL statment failed with Operational error
('chris', 'ostrouchov', 27.0)
('john', 'smith', 32.0)
('jane', 'doe', 15.0)

Now finally to advanced features specific to SQLite and the python interface. SQLite allows has five basic types NULL, INTEGER, REAL, TEXT, and BLOB. Suppose we would like to add datetime and json support to the database. There are conversion functions you can declare to and from the database.

For datetimes we declare this interface and add a birthday column to person. Note that it is important to connect to the sqlite database with the PARSE_DECLTYPES option.

import datetime as dt

connection = sqlite3.connect(db_filename, detect_types=sqlite3.PARSE_DECLTYPES)

def adapt_datetime(datetime):
    return (datetime.strftime('%Y-%m-%d %H:%M:%S')).encode()

def convert_datetime(blob):
    return dt.datetime.strptime(blob.decode(), '%Y-%m-%d %H:%M:%S')

sqlite3.register_adapter(dt.datetime, adapt_datetime)
sqlite3.register_adapter(dt.date, adapt_datetime)
sqlite3.register_converter('datetime', convert_datetime)

with connection:
    connection.execute("ALTER TABLE person ADD COLUMN birthday DATETIME")

We can easily check that the columns have be added by using sqlite's special pragma statements. Here we print every column of the sqlite table person.

for row in connection.execute('PRAGMA table_info(person)'):
    print(tuple(row))
(0, 'first_name', 'TEXT', 0, None, 0)
(1, 'last_name', 'TEXT', 0, None, 0)
(2, 'age', 'FLOAT', 0, None, 0)
(3, 'birthday', 'DATETIME', 0, None, 0)

Lets look what happens when we insert a person with a birthday into the table.

with connection:
    connection.execute("""
    UPDATE person
    SET birthday = ?
    WHERE first_name = ? AND last_name = ?
    """, (dt.date(1977, 4, 23), 'john', 'smith'))

for row in connection.execute('SELECT * FROM person;'):
    print(tuple(row))
('chris', 'ostrouchov', 27.0, None)
('john', 'smith', 32.0, datetime.datetime(1977, 4, 23, 0, 0))
('jane', 'doe', 15.0, None)

A commonly asked question is if it is possible to store json in an sql database. While this is a highly debated topic on if it is the right thing to do, it is certainly possible in sqlite. In fact using similar code with the adapters and converters we can create a simple json interface.

import json

def adapt_json(data):
    return (json.dumps(data, sort_keys=True)).encode()

def convert_json(blob):
    return json.loads(blob.decode())

sqlite3.register_adapter(dict, adapt_json)
sqlite3.register_adapter(list, adapt_json)
sqlite3.register_adapter(tuple, adapt_json)
sqlite3.register_converter('JSON', convert_json)

with connection:
    connection.execute("ALTER TABLE person ADD COLUMN interests JSON")

for row in connection.execute('PRAGMA table_info(person)'):
    print(tuple(row))
(0, 'first_name', 'TEXT', 0, None, 0)
(1, 'last_name', 'TEXT', 0, None, 0)
(2, 'age', 'FLOAT', 0, None, 0)
(3, 'birthday', 'DATETIME', 0, None, 0)
(4, 'interests', 'JSON', 0, None, 0)

Let us insert some interests for myself into the database. We now see how adaptable sqlite is! However one thing to note is that sqlite has no easy way of querying the json column because to the library it is just a binary blob. If you would like support directly from SQLite see the json1 extension. Note that the json1 extension will have better performance and allows expressions on paths within the json column. We will show next how we can achieve the same scalabilty with Python and perform complex expressions on paths within the json column.

with connection:
    interests = {
        'books': ['C Programming Language', 'Daniels\' Running Formula'],
        'hobbies': ['running', 'programming']
    }
    connection.execute("""
    UPDATE person
    SET interests = ?
    WHERE first_name = ? AND last_name = ?
    """, (interests, 'chris', 'ostrouchov'))

for row in connection.execute('SELECT * FROM person;'):
    print(tuple(type(_) for _ in row))
    print(tuple(row))
(<class 'str'>, <class 'str'>, <class 'float'>, <class 'NoneType'>, <class 'dict'>)
('chris', 'ostrouchov', 27.0, None, {'books': ['C Programming Language', "Daniels' Running Formula"], 'hobbies': ['running', 'programming']})
(<class 'str'>, <class 'str'>, <class 'float'>, <class 'datetime.datetime'>, <class 'NoneType'>)
('john', 'smith', 32.0, datetime.datetime(1977, 4, 23, 0, 0), None)
(<class 'str'>, <class 'str'>, <class 'float'>, <class 'NoneType'>, <class 'NoneType'>)
('jane', 'doe', 15.0, None, None)

SQLite allow you to create functions and aggregate functions. The python interface to SQLite provides a convenient method of creating functions that will be directly executed within sqlite. These functions will run directly inside of sqlite.

Here we create a simple json path function to return subpath of a json data.

import re

sqlite3.enable_callback_tracebacks(True)

def json_path(json_path, blob):
    if blob is None: return None
    paths = [int(_) if re.match('\d+', _) else _ for _ in  json_path.split('.')]
    path_value = json.loads(blob.decode())
    for path in paths:
        path_value = path_value[path]
    if isinstance(path_value, (int, float, str, bytes)):
        return path_value
    return (json.dumps(path_value)).encode()

with connection:
    connection.create_function("json_path", 2, json_path)

Using sqlite custom functions can really help performance for queries that would normally return a large amount of data. In this example we query what could be large json files. If perform json_path on each value we reduce the amount of data the query returns. Since a json path can return another json object see SQLite CAST expression. This allow you to state the type of the return expression. In this case CAST json_path("books.1", interests) AS JSON.

with connection:
    for row in connection.execute('SELECT json_path(?, interests) FROM person', ("books.1",)):
        print(row)
("Daniels' Running Formula",)
(None,)
(None,)

Overall python exposes a beautiful interface to sqlite. I use it on many of my projects including the advanced features such as the adapter functions and custom python functions. I hope this tutorial has been informative. Do share your views and corrections if I made a mistake.