Relational database introduction with Python and SQLite

This is my OLD blog. I've copied this post over to my NEW blog at:

You should be redirected in 2 seconds.

I mentioned in a previous post that I have an Excel spreadsheet containing a bunch of information about the development C code I'm working on. It is a large table showing which variables are input and output from which functions. The variable names are in the first column and the function names are in the first row. I use "i" or "o" to denote if a variable is an input to or an output from a particular function. A simplified example is shown in the table below.

name module type desc ExtSource DoThis CalcThis CalcThat ModifyStuff ExtSink
foo ModuleExt double Description of foo o i   i    
bar ModuleExt double Description of bar o i i      
knark Module1 int Description of knark   io i      
wert Module1 double Description of wert   o   i i  
jib Module1 double Description of jib     o   i  
laz Module2 double Description of laz     o   o i
kew Module2 double Description of kew       o o i

As shown above, the table also includes the variable descriptions, type, and scope. In my real, unsimplified Excel table, I also include function-related information such as the function description, and the function prototype. This additional information makes the table very messy. Try to imagine where you would put the function description and function prototype information in the table above. I came up with a kludge solution, but it involved repeated information, and rows and columns with inconsistent meaning. I was realizing the limitations of a 2-dimensional table.

I had been interested in learning about relational databases, so this seemed like a good opportunity. Relational databases imply the SQL query language-- relational databases that use SQL are about the only kind around. They include: Oracle, Microsoft SQL Server, PostgreSQL, MySQL, and SQLite among others. I chose SQLite for my task because it is lightweight (no separate server), free, popular, and fast for small databases. (See for other distinctive features.) Also Python 2.5 now includes an interface module, sqlite3, as part of the standard distribution. See for the documentation.

Relational model design

I could create a relational database using my original table in the Excel spreadsheet. However, this organization is not in the spirit of relational database design. The [relational] approach is to create many smaller tables and relationships between those tables. For my example, I created a variable table, a function table, and a variable_function table to indicate the relationship between the variable and function tables. Though it looks more complicated than the original single table form, inputting this into a relational database, such as SQLite, will allow for more advanced manipulation of the data. It is also much more [conducive] to maintenance and expansion. For example, now there is a natural place to put the additional function-related information which was so clumsy in the original single table. To do this, I add desc and prototype columns to the function table. If I needed to add more information to about the different modules, I could create a separate module table and include the module_id as a column in the variable table.

id name module type desc
1 foo ModuleExt double Description of foo
2 bar ModuleExt double Description of bar
3 knark Module1 int Description of knark
4 wert Module1 double Description of wert
5 jib Module1 double Description of jib
6 laz Module2 double Description of laz
7 kew Module2 double Description of kew


variable_id function_id type
1 1 output
1 2 input
1 4 input
2 1 output
2 2 input
2 3 input
3 2 input-output
3 3 input
4 2 output
4 4 input
4 5 input
5 3 output
5 5 input
6 3 output
6 5 output
6 6 input
7 4 output
7 5 output
7 6 input

SQLite Implementation (w/ Python)

To implement this database, all I needed was Python 2.5. If you have an older version of Python, you can install the pysqlite module. Here is the Python code to implement the database above.

import sqlite3

# data
    (1, 'foo', 'ModuleExt', 'double', 'Description of foo'),
    (2, 'bar', 'ModuleExt', 'double', 'Description of bar'),
    (3, 'knark', 'Module1', 'int', 'Description of knark'),
    (4, 'wert', 'Module1', 'double', 'Description of wert'),
    (5, 'jib', 'Module1', 'double', 'Description of jib'),
    (6, 'laz', 'Module2', 'double', 'Description of laz'),
    (7, 'kew', 'Module2', 'double', 'Description of kew'),
    (1, 'ExtSource'),
    (2, 'DoThis'),
    (3, 'CalcThis'),
    (4, 'CalcThat'),
    (5, 'ModifyStuff'),
    (6, 'ExtSink'),
    (1, 1, 'output'),
    (1, 2, 'input'),
    (1, 4, 'input'),
    (2, 1, 'output'),
    (2, 2, 'input'),
    (2, 3, 'input'),
    (3, 2, 'input-output'),
    (3, 3, 'input'),
    (4, 2, 'output'),
    (4, 4, 'input'),
    (4, 5, 'input'),
    (5, 3, 'output'),
    (5, 5, 'input'),
    (6, 3, 'output'),
    (6, 5, 'output'),
    (6, 6, 'input'),
    (7, 4, 'output'),
    (7, 5, 'output'),
    (7, 6, 'input'),

# get connection and cursor objects
conn = sqlite3.connect('iodatabase')
c = conn.cursor()

# create tables
c.execute('''create table variable (
    id integer,
    name text,
    module text,
    type text,
    desc text
c.execute('''create table function (
    id integer,
    name text
c.execute('''create table var_func (
    variable_id integer,
    function_id integer,
    type text

# fill tables with data
for row in VARIABLES:
    c.execute('insert into variable values (?,?,?,?,?)', row)
for row in FUNCTIONS:
    c.execute('insert into function values (?,?)', row)
for row in VAR_FUNC:
    c.execute('insert into var_func values (?,?,?)', row)

Querying the Database

Now that I have created the database, I can get answers to interesting questions, such as What variables are output by CalcThis? Here is the Python/SQLite code to answer this question:

            'SELECT, variable.module, variable.type, variable.desc ',
            'FROM variable, var_func, function ',
            'WHERE ', 
            'AND ',
            'AND"CalcThis" ',
            'AND var_func.type="output" ',
FORMAT = '%-6s%-10s%-8s%-20s'
print FORMAT % ('name', 'module', 'type', 'desc')
print '-' * 44
for row in c:
    print FORMAT % row
Here is the output.
name    module    type    desc
jib     Module1   double  Description of jib
laz     Module2   double  Description of laz
I can see this is consistent with my original table. The query works. For such a small example, the original method may seem easier, but as the number of entries grows, the benefit of the relational database grows as well. Here is another example which asks Which functions use the variable wert as an input?
            'SELECT ',
            'FROM variable, var_func, function ',
            'WHERE ', 
            'AND ',
            'AND"wert" ',
            'AND var_func.type="input" ',
print 'name'
print '------------'
for row in c:
    print '%s' % row
I would like to do even more complicated things like determine the prerequisite input variables across all functions for a given output variable. However, I still need to read more about that, so that will have to wait.


jean said...

Spreadsheet? python?
You should take a look at our product:

jean viry-babel

sofeng said...

jean viry-babel:
looks like a cool product-- wish it was my own.


This is my *OLD* blog. I've copied all of my posts and comments over to my NEW blog at:

Please go there for my updated posts. I will leave this blog up for a short time, but eventually plan to delete it. Thanks for reading.