Managing a simple database with Python, SQLite and wxPython, 3

Phase 2 3 Comments »

In the last post we saw how to connect to a SQLite database file and generate a cursor that would allow us to actually interact with such database. Now we need some functionality that will interact with the data, add, read, delete and search. As was mentioned before the idea is to have a generic database interaction class and have unique instantiated class objects for each database of the project. In the db_obj.py file we have an initial structure set, so let’s check the DB_Generic class.

class DB_Generic():
    '''generic class to add DB functionality'''
    def __init__(self, table_name):
        #par= name of the table to be used
        self.table_name = table_name

    def delete_entry(self):
        pass

    def get_data_generic(self):
        '''gets the data from the database
        generic so far, needs to be updated to include range'''        

        range = 1
        (cursor, database) = link_db()

        if range == 1:
            cursor.execute("""SELECT * from %s""" % self.table_name)

        table_data = cursor.fetchall()
        raw_data = []
        for i in table_data:
            raw_data.append(list(i))

        self.table_data = raw_data

        database.close()

    def insert_data(self, values_list, insert_string):
        '''inserts data in the database'''

        (cursor, database) = link_db()
        cursor.execute(insert_string % self.table_name, values_list)

        database.commit()
        database.close()

There are different functions in this class, we will take a look at each one individually. We can see that the class is far from being complete, something that we’ll do in the next posts. We start with the class initialization:

def __init__(self, table_name):
        #par= name of the table to be used
        self.table_name = table_name

Very simple and direct, it receives the table name that is being used by the interface (in this case). The table name is then stored in a object that can be accessed by other functions in the class. The function to delete entries is not finished as we only have a pass in it. We’ll will do it soon. Next we have a function that gets the data from the table.

    def get_data_generic(self):
        '''gets the data from the database
        generic so far, needs to be updated to include range'''        

        range = 1
        (cursor, database) = link_db()

        if range == 1:
            cursor.execute("""SELECT * from %s""" % self.table_name)

        table_data = cursor.fetchall()
        raw_data = []
        for i in table_data:
            raw_data.append(list(i))

        self.table_data = raw_data

        database.close()

So far it grabs everything, there is no range selection based on any of the table fields, so it’s a generic SQL SELECT. Let’s dissect it. The range object is a dummy variable that at the moment is there only to remind us that we need to include a range select. The next line is the most important in this function: it will call the link_db function and start the connection. Remember that link_db returns a tuple with the cursor and database connection. Basically we will work with cursor methods to get the data, and the first action is to execute a SQL SELECT stetement where we select everything in the table

cursor.execute("""SELECT * from %s""" % self.table_name)

Notice that the statement is a regular string and we use string formating % in ordert o add the table that we are searching, which was defined when we initialized the class object in the first place. Also, notice the triple quotes around the select statement: this will avoid any problems in parsing it when sending to the database, making it a string literal.

So this line executes the statement we pass to the method, but it does not actually get the data per se. We need to use another method and fetch everything returned by the select. This is done by

table_data = cursor.fetchall()

A couple of things here. The data fetched will be always (or in most cases) in unicode, when it’s a string field. And the data returned will be in a list of tuples, with the actual number of fields from the table. We know that it’s easier to work with lists than tuples, so we code something to convert types

table_data = cursor.fetchall()
raw_data = []
for i in table_data:
    raw_data.append(list(i))

self.table_data = raw_data

There are extra lines here that are not needed, and we will get rid of them in a code refactoring soon. This short function is able to connect to database, execute a SQL statement on a specified table and grab the data selected, returning a list of lists with every field and value available. We need to add a better selection statement later, and we will do as soon as we have a good structure set.

The last function in this generic class is the one that inserts data into the table.

def insert_data(self, values_list, insert_string):
    '''inserts data in the database'''

    cursor, database) = link_db()

    cursor.execute(insert_string % self.table_name, values_list)

    database.commit()
    database.close()

Identical procedure: connect, get a cursor, execute a statement. But in this case the extra step is not to get the data, but to commit the data to the table, which is done by the commit method. We will explain later how the execute method works here and what are the insert_string and values_list. Notice at the end that we need to close the connection to the database, so we know that the data has been inserted properly.

Next, we will instantiate a class from this generic one and see how easy is to manipulate the data. Stay tuned.

Previously in the series:
Part 1
Part 2

Reblog this post [with Zemanta]

Managing a simple database with Python, SQLite and wxPython, 2

Phase 2 Comments Off

Let’s continue coding our small Python + SQLite application. The initial idea was to have a file for the interface and another file for the DB access. We will start with the later. If you have access to the repository you will see two Python files, bac_form.py and db_obj.py. At the moment they are not well commented and have some junk lines at the bottom, legacy from older versions. Take a look on db_obj.py.

It has two class declarations, one called DB_Generic and another one called Bac. Remember in the last post where I mentioned that the idea was to have different simple tables in the same SQLite database and each table would have a simple input/output interface (If I didn’t mention that, I just did!). So, we can create a generic DB access class and we can subtype from it for every table that we will be using. In the db_obj.py file we have at the moment the generic database management class, a class derived from the generic to access the Bac database and an initialization function, that opens the access to the SQLite file. Let’s take a look at it:

def link_db():
    '''initializes the database file'''
    try:
        db = sqlite3.connect("samples.db")
    except sqlite3.Error, errmsg:
        print 'DB not available ' + str(errmsg)
        sys.exit()

    db_cursor = db.cursor()
    return db_cursor, db

In order to access a SQLite database file we need initially the name of the file. After importing sqlite3 (we’re using the latest version of SQLite here) Python has everything it needs to access, change and manipulate data in a SQLite database. Just to be sure the database file is there and we don’t get an error, we have the initialization code inside an exception. We have seen exceptions before and in this case we use it to be sure the database file has been accessed with no problems. The exception structure looks like

try:
	#here we try to do something
	#the code placed here would be executed
	#if no error reported it will go until the end and exit
	#if not, some error (exception) raised
except:
	#the code under except will be executed

So, the first step is to connect to the database file

db = sqlite3.connect("samples.db")

In our case it’s a fixed file, but the connect method receives any kind of string. It could have been some parameter obtained from the command line or a string from a configuration file. If the connect is successful, no error will be raised and we are safe to continue. We connected to database, now what? We need a cursor, an object that will actually access the data and allow us to execute SQL commands on it.

db_cursor = db.cursor()

cursor method works on the database connection object that we created previously. We’re set. This function returns the cursor and database connection objects that we created, in a tuple and this function can be called from the classes we are going to work. The classes will then have connection to the database and a cursor that would manage, select, delete and add data to it.

Next time we’ll see how our generic table class works.

Previously in the series:
Part 1

Reblog this post [with Zemanta]

Managing a simple database with Python, SQLite and wxPython, 1

Phase 2 17 Comments »
The official wxPython logo
Image via Wikipedia

A little break from reviewing the book, let’s check some database topics in Python. I was asked to create a simple database to organize wet-lab stuff. No relationships needs, no relational tables required. Just a simple table with determined columns, and a nice GUI to go with it so people can edit, search and use.

My first idea was to use SQLite database, and I stuck with it. After the initial phase of “interviews” to check database requirements, I ended up with a list of tables and decided to start working on the table that organizes the BACs used in the lab. BAC is a DNA vector into which large DNA fragments can be inserted and cloned in a bacterial host, and are used mainly in cytogenetics around here. In the end the table had this structure

CREATE TABLE bac
(idbac INTEGER PRIMARY KEY,
clone Text,
sdate Date,
source Text,
gene TEXT,
chromosome Text,
startpos Integer,
endpos Integer,
antibiotic Text,
location1 Text,
temperature Integer,
tubes Integer,
box Integer,
cell Integer,
dnaex Boolean,
validation Boolean,
pcr Boolean,
projects Text,
comments Text,
genelink Text,
refs Text);

I won’t explain in detail each of the fields, but we can see that there is a mix of different types. SQLite doesn’t allow many different field types, so we stick to the basics.

And why SQLite? The module to access it comes with Python 2.5, the whole database is stored in one file that can be moved around and it allows a full SQL query language, which is perfect for these simple cases. So we will going to use Python, SQLite and wxPython to create a simple application to manage our simple database.

I already created a GitHub repository for the upcoming code.

Reblog this post [with Zemanta]

BPforB is now PEP 8 compliant!

Phase 2 Comments Off

As mentioned in the previous post, Robin Stocker kindly provided a git patch with the required changes to all scripts stored on the repository to be compliant with the PEP 8.

The changes were mainly regarding variable/object names, but they were important as make the code available here more Pythonic following the rules of the Benevolent Dictator for Life.

I would like to thank Robin for spending his time doing this. Much appreciated.

Now, just a quick git tutorial on how to apply patches:

git apply __patch_file__
git commit -a -m “patch applied”
git push

That’s it. Apply, commit, push and you’re done. The repository is already updated.

Finally it’s 2009 …

Phase 2 Comments Off
Python logo, 1990s-2005
Image via Wikipedia

And … we’re back. The long and cold winter is still out there and January 2009 is almost in the books. After a long period without updating I’ll try to “rush” some posts this week, trying to get back on track. So, a little bit of what’s up and coming:

- a patch provided by Robin Stocker to make all scripts published here (at least the ones on GitHub) PEP 8 compliant.

- using SQLite databases in Python

- developing an interface to access the database

- anything that you might suggest, just leave a comment.

Let’s start 2009 then.

Reblog this post [with Zemanta]

Twitter

Phase 2 Comments Off

I’m on Twitter, for quite some time. Some Python stuff, some biology, some bioinformatics, and a little bit of everything else.

nuin.

That’s it for 2008

Phase 2 1 Comment »

The date came and is now gone, and I forgot to “celebrate” two years of Beginning Python for Bioinformatics on December 13th. I would like to thank everyone that commented, helped with posts and suggested anything that would make this website better. Clearly it is far from being what I wanted it to be, but slowly but surely we will get there.

Thanks again and I wish an excellent holiday season and a great 2009 to everyone!

See you in 2009.

Git repository updated

Phase 2 2 Comments »

Commercial Street is an important commercial a...Image via Wikipedia I just updated the git repository of BPB. Click here to access it. Most of the code presented in the blog is there, some with extra comments, some being updated.

This close another phase in the blog and soon we will check some different aspects of Python programming in Bioinformatics.

Reblog this post [with Zemanta]

Python, overepresented motifs, the Grand Finale

Phase 2, motifs Comments Off

In this final part, let’s do some very simple refactoring and modify the output section to make the result a little bit better. There are not many options about the functions to calculate the binomial expansion. But Andrew posted some opinions on how to slight change the quorum function.

def get_quorums(seqs, mlen):
    """
    add seq id_no to a set
    use explicit counter to create seq_no
    """
    quorum = defaultdict(int)
    for seq in seqs:
        for n in range(len(seq) - mlen):
            quorum[seq[n:n + mlen]] += 1
    return quorum

His modifications were small but improved the code a bit, as you remove one variable/object from the function. At the same time there is need to change a bit our output section of the code, as we don’t use a defaultdict initialized with a set, but with an integer.

for i in foreground:
    term1 = choose(background[i], foreground[i])
    term2 = choose((N - background[i]), len(input_seqs)-1)
    term3 = choose(N, len(input_seqs))
    p = (float(term1) * float(term2)) / term3
    if 0 < p <= 0.0001:
        print i, foreground[i], background[i], p

Notice that in the term1 line we don’t check for the set length anymore and just use the integer stored in foreground and background. Again a small change, that can make the code a little bit more clear. But we need to modify this section so the output is a little bit more clear, maybe ordered by motif sequence.

But as we are reading the sequences as they are our results are not ordered. It would be great to have a final list starting with AAAAAAAA and ending with TTTTTTTTT. There is an easy way to do that, and very inexpensive regarding code and final performance. Basically we append each one of the motifs (and their extra information) to a list and use the sort method for lists. So our output section of the code will be

res_motifs = []
for i in foreground:
    term1 = choose(background[i], foreground[i])
    term2 = choose((N - background[i]), len(input_seqs)-1)
    term3 = choose(N, len(input_seqs))
    p = (float(term1) * float(term2)) / term3
    if 0 < p <= 0.0001:
        res_motifs.append(i + '\t' + str(foreground[i]) + '\t' + str(background[i]) + '\t' + str(p))

res_motifs.sort()
for i in res_motifs:
    print i

Putting everything together our final motif determination script is (batteries included):

#!/usr/bin/env python

import fasta
import sys
from collections import defaultdict

def choose(n, k):
    if 0 <= k <= n:
        ntok = 1
        ktok = 1
        for t in xrange(1, min(k, n - k) + 1):
            ntok *= n
            ktok *= t
            n -= 1
        return ntok // ktok
    else:
        return 0

def get_quorums(seqs, mlen):
    """
    add seq id_no to a set
    use explicit counter to create seq_no
    """
    quorum = defaultdict(int)
    for seq in seqs:
        for n in range(len(seq) - mlen):
            quorum[seq[n:n + mlen]] += 1
    return quorum

input_seqs = fasta.read_seqs(open(sys.argv[1]).readlines())
input_seqs2 = fasta.read_seqs(open(sys.argv[2]).readlines())

foreground = get_quorums(input_seqs, 10)
background = get_quorums(input_seqs2, 10)

N = len(input_seqs) + len(input_seqs2)

res_motifs = []
for i in foreground:
    term1 = choose(background[i], len(foreground[i])
    term2 = choose((N - background[i]), len(input_seqs)-1)
    term3 = choose(N, len(input_seqs))
    p = (float(term1) * float(term2)) / term3
    if 0 < p <= 0.0001:
        res_motifs.append(i + '\t' + str(foreground[i]) + '\t' + str(background[i]) + '\t' + str(p))

res_motifs.sort()
for i in res_motifs:
    print i

Next we will see some basic Python methods. And maybe start a new series and phase.

Reblog this post [with Zemanta]

Obtaining overrepresented motifs in DNA sequences, final

Phase 2 1 Comment »

The part 13 of the motifs series is the last one. In a couple of weeks I will post a refactored code, including the suggestions from Andrew in the last post. I will update the blog contents on OWW and commit some of the code to the GitHub repository.

Reblog this post [with Zemanta]
Design by j david macor.com.Original WP Theme & Icons by N.Design Studio
Entries RSS Comments RSS Log in