Managing a simple database with Python, SQLite and wxPython, 5

Phase 2, wxPython 3 Comments »

We have seen how to connect to the database and how to get and insert data (at least theoretically). Now, a little note about the SQL engine of choice here: SQLite. The main characteristic of a SQLite database is that it is a self-contained file. SQLite also requires no installation, works without a server and runs well on most operating systems.

For the type of application we’re developing here, SQLite seems ideal. It eliminates a lot of the infrastructure that would be needed if we were working with MySQL or PostgreSQL. We don’t need a server, and we don’t need to know how to configure users or manage databases and tables. All we need is contained in a single file that can be moved from system to system and accessed from the computers used in the lab, mainly XP and OS X. Some web frameworks (Rails and Django, for instance) can also use SQLite, so in the end we can have a desktop application and a web application accessing the same file without extra configuration.

The database created for this application has 8 tables and almost no relationships among them. SQLite allows relationships between tables, but we needed them in only a couple of cases. The table we are using at the moment (bac) needs none, although some of its fields could benefit from a more relational structure. SQLite also doesn’t have the same data types found in the bigger SQL engines. Every value is stored as text, integer, real (floating point number), null or blob (stored verbatim: what you put in is what you get back). You can still declare columns as Boolean or Date, for instance: SQLite accepts the declaration and maps it to one of its own type affinities. If you have no experience in creating databases, let’s check again the table we are using in this small project. First, I would recommend using a SQLite database editor. You can find pretty good ones for any operating system, and there is even a Firefox extension that allows you to edit SQLite files. Editors make it easier to generate the SQL table creation scripts and to visualize what we are doing.
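A quick way to see how SQLite treats declared types such as Boolean and Date is to ask it which storage class it actually used. The sketch below is only an illustration in Python 3 syntax: the demo table and its values are made up, and it runs against an in-memory database rather than our real file.

```python
import sqlite3

# In-memory database, used only for this illustration.
db = sqlite3.connect(":memory:")
cursor = db.cursor()
cursor.execute("CREATE TABLE demo (dnaex Boolean, sdate Date, tubes Integer)")

# Python's True is sent as the integer 1; the date is passed as plain text.
cursor.execute("INSERT INTO demo VALUES (?, ?, ?)", (True, "2009-01-15", 4))

# typeof() reports the storage class SQLite actually used for each value.
cursor.execute("SELECT typeof(dnaex), typeof(sdate), typeof(tubes) FROM demo")
storage_classes = cursor.fetchone()
print(storage_classes)
db.close()
```

The boolean ends up as an integer and the date as text, which is worth remembering when comparing or sorting dates later on.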

So, the table bac looks like

CREATE TABLE bac
(idbac INTEGER PRIMARY KEY,
clone Text,
sdate Date,
source Text,
gene Text,
chromosome Text,
startpos Integer,
endpos Integer,
antibiotic Text,
location1 Text,
temperature Integer,
tubes Integer,
box Integer,
cell Integer,
dnaex Boolean,
validation Boolean,
pcr Boolean,
projects Text,
comments Text,
genelink Text,
refs Text);

If you go back to our last post, you will see that the insert statement makes no mention of the idbac field. We don’t insert any value there; the values that populate this field are created automatically. idbac is our primary key, meaning it’s the unique identifier of each bac we insert in this table. In SQLite, a column declared INTEGER PRIMARY KEY receives the next integer automatically whenever a row is inserted without a value for it. So our first insertion will create idbac 1, the second will create idbac 2 and so on.
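The automatic numbering is easy to check with a trimmed-down copy of the table. This is a minimal sketch in Python 3 syntax: the two-column table and the clone names are invented for the example, and lastrowid is the cursor attribute that holds the key assigned by the most recent insert.

```python
import sqlite3

db = sqlite3.connect(":memory:")
cursor = db.cursor()
# Trimmed-down version of the bac table, for illustration only.
cursor.execute("CREATE TABLE bac (idbac INTEGER PRIMARY KEY, clone Text)")

assigned_ids = []
for clone in ("clone-a", "clone-b"):  # made-up clone names
    # idbac is not mentioned, so SQLite assigns it.
    cursor.execute("INSERT INTO bac (clone) VALUES (?)", (clone,))
    assigned_ids.append(cursor.lastrowid)

db.commit()
print(assigned_ids)
db.close()
```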

I’m not going into detail about database development and administration, but it’s usual and safe to create tables with an auto-incrementing integer primary key. These fields, apart from making it easier to identify records, make access to such records faster and are great when relationships among tables are set. Let’s say that we had a column user in our bac table, and a user table with two columns: user_id and name, user_id being an auto-incrementing primary key. The user column in bac could be linked to the user_id column in the user table, in what we call a one-to-many relationship (one user can insert as many bacs as they want). One day we want to know who is actually working in the lab, so we check how many bacs were catalogued by each user. We can easily search the user table and extract information from bac at the same time thanks to the relationship between the tables. And the result should come back quickly, as we are only comparing integers.
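The hypothetical user/bac relationship described above could look roughly like this. Everything here is an assumption made for illustration (Python 3 syntax): our real bac table has no user column, and the names and clones are invented.

```python
import sqlite3

db = sqlite3.connect(":memory:")
cursor = db.cursor()

# Hypothetical tables: one user can be linked to many bacs.
cursor.execute("CREATE TABLE user (user_id INTEGER PRIMARY KEY, name Text)")
cursor.execute(
    "CREATE TABLE bac (idbac INTEGER PRIMARY KEY, clone Text, "
    "user Integer REFERENCES user(user_id))"
)

cursor.executemany("INSERT INTO user (name) VALUES (?)", [("Ana",), ("Ben",)])
cursor.executemany(
    "INSERT INTO bac (clone, user) VALUES (?, ?)",
    [("clone-a", 1), ("clone-b", 1), ("clone-c", 2)],
)

# Count the bacs catalogued by each user, joining on the integer key.
cursor.execute(
    """SELECT user.name, COUNT(bac.idbac)
       FROM user LEFT JOIN bac ON bac.user = user.user_id
       GROUP BY user.user_id
       ORDER BY user.name"""
)
bacs_per_user = cursor.fetchall()
print(bacs_per_user)
db.close()
```

The LEFT JOIN keeps users who have not catalogued anything yet, with a count of zero.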

All the other fields/columns in our table are straightforward to understand. Their types follow from the data they need to store. validation is a boolean because the bac might have been validated or not, and the same goes for dnaex (DNA extraction). The number of tubes stored in the freezer will always be an integer. So why is temperature an integer? Because we can only store bacs in two types of freezers: -80 (ultra freezers) or -20 (regular freezers like the ones we have at home), and we don’t need to worry about fractional numbers.

Well, this is a very short and limited explanation of tables and SQLite. The web is full of resources about it, so next time we will get back to Python.

Previously in the series:
Part 1
Part 2
Part 3
Part 4


Managing a simple database with Python, SQLite and wxPython, 4

Phase 2 4 Comments »

Let’s continue building our small db app. As mentioned in the previous post, we now need to derive a specific class from our generic SQLite access class. To do this we just declare a new class with DB_Generic as its parent.

class Bac(DB_Generic):

This new class is called Bac because it’s linked to the bac table in our database file. A side note: bacs are Bacterial Artificial Chromosomes and are used in different molecular biology techniques. In our case, bacs carry human DNA segments and are used as probes in studies of deletions, duplications, etc.

Now, back to our Python code. As soon as we subclass our generic class, the new class has access to all the methods of the parent class (through self), but we still need to add functionality and expose other methods that can be called on objects created from Bac.

Our derived class will be

class Bac(DB_Generic):
    def __init__(self):
        self.bac_data = []
        DB_Generic.__init__(self, 'bac')

    def get_data(self):
        # get_data_generic stores the rows in self.table_data and returns nothing
        self.get_data_generic()
        return self.table_data

    def load_data(self):
        pass

    def add_data(self, values_list):
        insert_string = """INSERT INTO %s (projects, comments, temperature, cell, box, tubes, chromosome, sdate, clone, source,
        location1, startpos, endpos, gene, genelink, dnaex, validation, pcr, refs, antibiotic)
        VALUES (:projects, :comments, :temperature, :cell, :box, :tubes, :chromo, :date, :clone, :source, :location, :start,
        :end, :gene, :genelink, :dna, :validation, :pcr, :refs, :antibiotic)"""
        self.insert_data(values_list, insert_string)

Pretty simple so far, as we don’t have a lot of declared methods. Let’s check one by one

def __init__(self):
    DB_Generic.__init__(self, 'bac')

The only line is the initialization required by the parent class, and we’re passing the value that is the table to be accessed.

def get_data(self):
	self.get_data_generic()
	return self.table_data

The get_data function returns all the rows in our table (so far, we still don’t have an elegant range option) and has one line too many. We will get rid of some useless code here in the future, but it’s OK the way it is. Basically this code calls get_data_generic from the parent class, which stores all the values found in the table in self.table_data, and then returns them.
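To make the parent/child mechanics concrete, here is a stripped-down sketch in Python 3 syntax. It is not the code from db_obj.py: this stand-in DB_Generic receives an already open connection (an assumption made so the example is self-contained) instead of calling link_db, and the table has only two columns.

```python
import sqlite3


class DB_Generic:
    """Simplified stand-in for the generic class; takes an open connection."""

    def __init__(self, connection, table_name):
        self.connection = connection
        self.table_name = table_name

    def get_data_generic(self):
        cursor = self.connection.execute("SELECT * FROM %s" % self.table_name)
        self.table_data = [list(row) for row in cursor.fetchall()]


class Bac(DB_Generic):
    def __init__(self, connection):
        # Hand the table name over to the parent class.
        DB_Generic.__init__(self, connection, "bac")

    def get_data(self):
        # get_data_generic is inherited, so it is reachable through self.
        self.get_data_generic()
        return self.table_data


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE bac (idbac INTEGER PRIMARY KEY, clone Text)")
db.execute("INSERT INTO bac (clone) VALUES (?)", ("clone-a",))
db.commit()

rows = Bac(db).get_data()
print(rows)
db.close()
```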

There is a function, not yet complete, that will load data and will be used in the future. The last one is the function that actually adds the data to the table with a SQL insert statement

def add_data(self, values_list):
	insert_string = """INSERT INTO %s (projects, comments, temperature, cell, box, tubes, chromosome, sdate, clone, source,
	location1, startpos, endpos, gene, genelink, dnaex, validation, pcr, refs, antibiotic)
	VALUES (:projects, :comments, :temperature, :cell, :box, :tubes, :chromo, :date, :clone, :source, :location, :start,
	:end, :gene, :genelink, :dna, :validation, :pcr, :refs, :antibiotic)"""
	self.insert_data(values_list, insert_string)

In this function, we have a large string with all the SQL insert options. A SQL insert statement is divided into two parts: one that says where to insert the values and another that supplies the values. Usually simple insert statements will have this structure

INSERT INTO my_table_name (table_column1, table_column2) VALUES (value1, value2);

So, we have the table we want to insert values into, its columns and the values we set for each column. Once executed, this will put value1 into table_column1 and value2 into table_column2. The actual syntax can vary a bit between SQL engines, but the structure is identical in most cases. Pretty simple.

A few aspects of our insert string deserve attention. Again note the triple quotes around the statement: they let the string span several lines without any escaping. We also have a %s for the table name, which is filled in by the parent class function that inserts the values, then a list of the columns in the table and then a list of values to insert. And why do the values to be inserted have this :value syntax? Because we store the values in a dictionary beforehand, and the “:” tells sqlite3 to look up the dictionary value for the corresponding key.
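A shortened version of the same idea, as a sketch in Python 3 syntax: the table is reduced to three columns and the values are invented, but the mechanism (%s for the table name, :name placeholders filled from a dictionary) is the one used in add_data.

```python
import sqlite3

db = sqlite3.connect(":memory:")
cursor = db.cursor()
cursor.execute(
    "CREATE TABLE bac (idbac INTEGER PRIMARY KEY, clone Text, gene Text, tubes Integer)"
)

insert_string = """INSERT INTO %s (clone, gene, tubes)
VALUES (:clone, :gene, :tubes)"""

# Each :name placeholder is filled with the dictionary value stored under that key.
values = {"clone": "clone-a", "gene": "BRCA1", "tubes": 3}
cursor.execute(insert_string % "bac", values)
db.commit()

cursor.execute("SELECT clone, gene, tubes FROM bac")
stored_row = cursor.fetchone()
print(stored_row)
db.close()
```

The placeholder names only have to match the dictionary keys; they do not need to match the column names, which is why the original statement can use :chromo for the chromosome column.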

The insert string and the list of values (actually a dictionary; values_list is not the best variable name, I must admit) are then sent to the parent class to be inserted. Storing the values in a dictionary is OK for a one-time insert, where the values are obtained from a form. If you are parsing a large CSV or TSV file, it’s better to collect the rows in a list and insert them all at once.
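For the bulk case, one possible approach is the executemany method of the cursor, which runs the same statement once per item of a list. The sketch below (Python 3 syntax) assumes a tiny CSV held in a string and a three-column table; the file contents and column names are made up.

```python
import csv
import io
import sqlite3

# Stand-in for a CSV file on disk; the contents are invented.
csv_text = "clone,gene,tubes\nclone-a,BRCA1,3\nclone-b,TP53,2\n"

db = sqlite3.connect(":memory:")
cursor = db.cursor()
cursor.execute(
    "CREATE TABLE bac (idbac INTEGER PRIMARY KEY, clone Text, gene Text, tubes Integer)"
)

# DictReader turns every CSV line into a dictionary keyed by the header row.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# One call inserts the whole list, followed by a single commit.
cursor.executemany(
    "INSERT INTO bac (clone, gene, tubes) VALUES (:clone, :gene, :tubes)", rows
)
db.commit()

cursor.execute("SELECT COUNT(*) FROM bac")
row_count = cursor.fetchone()[0]
print(row_count)
db.close()
```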

We’re progressing. Next we will take a look at some simple SQL table structure and then move on to creating the form to insert the values and check the table.

Previously in the series:
Part 1
Part 2
Part 3


Why do I blog. Or: Science Blogging, is it worth?

Phase 2 8 Comments »

Mirroring the post that appeared on Blind.Scientist

Some time ago there was a meme about science blogging and one of the questions was “why do you blog”. Well, I do it because of the “Nada Surf effect”. You don’t know the “Nada Surf effect”? Pity you weren’t in Washington, DC in 2001.

In March or April of 2001, Nada Surf played a concert there. It was in a small bar on 14th Street W, close to the more famous Black Cat. It was a spring night, and I was with a couple of Dutch friends who had told me about the concert, if I’m not wrong, a couple of days before. It was also mid-week, so you wouldn’t expect big crowds at most concerts. We left ISH around 7 pm, with time to spare for the 9 pm concert. We didn’t know the venue; we got there and it was empty, just a couple of souls at the bar. We sat, and for about an hour we wondered if we were in the right place, until a guy came and asked if we were staying for the concert. We said yes, paid the US$ 7.50 admission and sipped our beers waiting for the opening act. Soon after we paid, a van parked outside and some guys started bringing music equipment inside. At that time there must have been around 20 people there. The van guys set up the instruments, spent 5 minutes soundchecking, and started. It was Ashtray Babyhead.

They played for 40 minutes and left as fast as they had arrived. Another van parked outside, and this time the Nada Surf members started unloading and setting up the stage. No roadies. OK, maybe one guy helped, but I’m getting old and the memory sometimes falters. At that point, almost 9 pm, the number of brave souls was at 50. They played as if they were playing for 50,000 people at Wembley. Nice set, great songs, unforgettable night. After the show they sold CDs at the usual after-show gathering, and we talked about New York, Brazil and feijoada.

And why do I call it the “Nada Surf effect”? A band that used to play for thousands of people in festivals and stadiums, and that had a number one video on MTV (Popular), played on a midweek night in a small bar in Washington, DC as if it were the band’s farewell. Every fan that night felt that they were the most important one, maybe even the only one.

And this type of example is what drives me to write this, and to write Beginning Python for Bioinformatics. Especially the latter (as I spend too much time here, writing about non-important stuff). If I can make one person have an idea, get one person out there to use Python, or at least help someone learn something extra in their life, I’m happy. I don’t care if I have a huge audience or if I’m famous. I care for the undergrad who is starting today, the high school kid who is hacking at night, or the Java coder who is looking for a better language (OK, not really, but I couldn’t resist). And this is one of the things that I learned in science: always give back and don’t expect anything in return. Just add to the pile of knowledge.

So, my advice for the three people that read this site is: let Nada Surf in!


Managing a simple database with Python, SQLite and wxPython, 3

Phase 2 3 Comments »

In the last post we saw how to connect to a SQLite database file and generate a cursor that allows us to actually interact with the database. Now we need some functionality to work with the data: add, read, delete and search. As mentioned before, the idea is to have a generic database interaction class and a specific derived class for each table of the project. In the db_obj.py file we have an initial structure set, so let’s check the DB_Generic class.

class DB_Generic():
    '''generic class to add DB functionality'''
    def __init__(self, table_name):
        #par= name of the table to be used
        self.table_name = table_name

    def delete_entry(self):
        pass

    def get_data_generic(self):
        '''gets the data from the database
        generic so far, needs to be updated to include range'''        

        range = 1
        (cursor, database) = link_db()

        if range == 1:
            cursor.execute("""SELECT * from %s""" % self.table_name)

        table_data = cursor.fetchall()
        raw_data = []
        for i in table_data:
            raw_data.append(list(i))

        self.table_data = raw_data

        database.close()

    def insert_data(self, values_list, insert_string):
        '''inserts data in the database'''

        (cursor, database) = link_db()
        cursor.execute(insert_string % self.table_name, values_list)

        database.commit()
        database.close()

There are different functions in this class, and we will take a look at each one individually. We can see that the class is far from complete; we’ll finish it in the next posts. We start with the class initialization:

def __init__(self, table_name):
    #par= name of the table to be used
    self.table_name = table_name

Very simple and direct: it receives the name of the table that is being used by the interface (in this case). The table name is then stored in an attribute that can be accessed by the other functions in the class. The function to delete entries is not finished, as it only has a pass in it. We’ll do it soon. Next we have a function that gets the data from the table.

    def get_data_generic(self):
        '''gets the data from the database
        generic so far, needs to be updated to include range'''        

        range = 1
        (cursor, database) = link_db()

        if range == 1:
            cursor.execute("""SELECT * from %s""" % self.table_name)

        table_data = cursor.fetchall()
        raw_data = []
        for i in table_data:
            raw_data.append(list(i))

        self.table_data = raw_data

        database.close()

So far it grabs everything; there is no range selection based on any of the table fields, so it’s a generic SQL SELECT. Let’s dissect it. The range object is a dummy variable that at the moment is there only to remind us that we need to include a range select. The next line is the most important in this function: it calls the link_db function and starts the connection. Remember that link_db returns a tuple with the cursor and the database connection. Basically we will work with cursor methods to get the data, and the first action is to execute a SQL SELECT statement where we select everything in the table

cursor.execute("""SELECT * from %s""" % self.table_name)

Notice that the statement is a regular string and that we use string formatting with % in order to add the table we are searching, which was defined when we initialized the class object in the first place. Also notice the triple quotes around the select statement: they are not required for a one-line string like this one, but they let longer statements span several lines.
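Placeholders such as ? or :name can only stand in for values, not for table names, which is why the table name has to be pasted in with string formatting. Since that bypasses the protection placeholders give, one cautious option is to check the name against a list of known tables first. The sketch below is only a suggestion in Python 3 syntax: KNOWN_TABLES and select_all are hypothetical names, not part of db_obj.py.

```python
import sqlite3

KNOWN_TABLES = {"bac"}  # hypothetical allowlist of the tables in the project


def select_all(cursor, table_name):
    """Run SELECT * on table_name, but only if the name is a known table."""
    if table_name not in KNOWN_TABLES:
        raise ValueError("unknown table: %r" % (table_name,))
    cursor.execute("""SELECT * from %s""" % table_name)
    return cursor.fetchall()


db = sqlite3.connect(":memory:")
cursor = db.cursor()
cursor.execute("CREATE TABLE bac (idbac INTEGER PRIMARY KEY, clone Text)")
cursor.execute("INSERT INTO bac (clone) VALUES (?)", ("clone-a",))

rows = select_all(cursor, "bac")

# A name that is not on the list is refused before any SQL is built.
try:
    select_all(cursor, "bac; DROP TABLE bac")
    rejected = False
except ValueError:
    rejected = True

print(rows, rejected)
db.close()
```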

So this line executes the statement we pass to the method, but it does not actually get the data per se. We need to use another method and fetch everything returned by the select. This is done by

table_data = cursor.fetchall()

A couple of things here. The data fetched from string fields will always (or in most cases) be unicode. And the data is returned as a list of tuples, one tuple per row, each with as many items as the table has fields. It’s easier to work with lists than tuples, so we add some code to convert the types

table_data = cursor.fetchall()
raw_data = []
for i in table_data:
    raw_data.append(list(i))

self.table_data = raw_data

There are extra lines here that are not needed, and we will get rid of them in a code refactoring soon. This short function connects to the database, executes a SQL statement on the specified table and grabs the selected data, storing a list of lists with every row and value available. We need to add a better selection statement later, and we will do so as soon as we have a good structure set.
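One way the refactoring could go is a list comprehension that converts the tuples as they are fetched. This is just a sketch in Python 3 syntax, with a hypothetical helper name (fetch_rows_as_lists) and a two-column table invented for the demonstration.

```python
import sqlite3


def fetch_rows_as_lists(cursor, table_name):
    """Select everything from table_name and return each row as a list."""
    cursor.execute("""SELECT * from %s""" % table_name)
    # fetchall() gives a list of tuples; the comprehension converts each one.
    return [list(row) for row in cursor.fetchall()]


db = sqlite3.connect(":memory:")
cursor = db.cursor()
cursor.execute("CREATE TABLE bac (idbac INTEGER PRIMARY KEY, clone Text)")
cursor.executemany(
    "INSERT INTO bac (clone) VALUES (?)", [("clone-a",), ("clone-b",)]
)

table_data = fetch_rows_as_lists(cursor, "bac")
print(table_data)
db.close()
```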

The last function in this generic class is the one that inserts data into the table.

def insert_data(self, values_list, insert_string):
    '''inserts data in the database'''

    (cursor, database) = link_db()

    cursor.execute(insert_string % self.table_name, values_list)

    database.commit()
    database.close()

Identical procedure: connect, get a cursor, execute a statement. But in this case the extra step is not to fetch data but to commit it to the table, which is done by the commit method. We will explain later how the execute method works here and what insert_string and values_list are. Notice that we commit before closing the connection to the database; the commit is what guarantees that the data has been written.
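The importance of commit can be checked with a throwaway database file. In this sketch (Python 3 syntax, default sqlite3 settings assumed) the file lives in a temporary directory and the table is a reduced, made-up version of bac; the first insert is closed without a commit, the second one is committed.

```python
import os
import shutil
import sqlite3
import tempfile


def count_rows(path):
    """Open the file again and count the rows that were really stored."""
    db = sqlite3.connect(path)
    try:
        return db.execute("SELECT COUNT(*) FROM bac").fetchone()[0]
    finally:
        db.close()


demo_dir = tempfile.mkdtemp()
path = os.path.join(demo_dir, "demo.db")

db = sqlite3.connect(path)
db.execute("CREATE TABLE bac (idbac INTEGER PRIMARY KEY, clone Text)")
db.commit()

# Insert and close WITHOUT commit: the pending change is discarded.
db.execute("INSERT INTO bac (clone) VALUES (?)", ("clone-a",))
db.close()
rows_without_commit = count_rows(path)

# Insert, commit, then close: the row is kept.
db = sqlite3.connect(path)
db.execute("INSERT INTO bac (clone) VALUES (?)", ("clone-b",))
db.commit()
db.close()
rows_with_commit = count_rows(path)

print(rows_without_commit, rows_with_commit)
shutil.rmtree(demo_dir)
```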

Next, we will derive a class from this generic one and see how easy it is to manipulate the data. Stay tuned.

Previously in the series:
Part 1
Part 2


Managing a simple database with Python, SQLite and wxPython, 2

Phase 2 Comments Off

Let’s continue coding our small Python + SQLite application. The initial idea was to have one file for the interface and another for the DB access. We will start with the latter. If you have access to the repository you will see two Python files, bac_form.py and db_obj.py. At the moment they are not well commented and have some junk lines at the bottom, a legacy of older versions. Take a look at db_obj.py.

It has two class declarations, one called DB_Generic and another called Bac. Remember that in the last post I mentioned the idea of having different simple tables in the same SQLite database, each with a simple input/output interface (if I didn’t mention that, I just did!). So we can create a generic DB access class and subclass it for every table we will be using. At the moment the db_obj.py file holds the generic database management class, a class derived from it to access the bac table, and an initialization function that opens the SQLite file. Let’s take a look at that function:

import sqlite3
import sys


def link_db():
    '''initializes the database file'''
    try:
        db = sqlite3.connect("samples.db")
    except sqlite3.Error as errmsg:
        # "except ... as" and print() work on Python 2.6+ and Python 3
        print('DB not available ' + str(errmsg))
        sys.exit()

    db_cursor = db.cursor()
    return db_cursor, db

In order to access a SQLite database file, all we need initially is the name of the file. After importing sqlite3 (the module for SQLite version 3), Python has everything it needs to access, change and manipulate data in a SQLite database. To be sure the database can be opened without errors, we put the initialization code inside a try/except block. We have seen exceptions before, and in this case we use one to be sure the database file has been opened with no problems. The exception structure looks like

try:
    # here we try to do something
    # the code placed here is executed
    # if no error is raised, it runs to the end and the except block is skipped
    pass
except Exception:
    # if an error (exception) is raised, the code under except is executed
    pass

So, the first step is to connect to the database file

db = sqlite3.connect("samples.db")

In our case it’s a fixed file name, but the connect method accepts any string. It could have been a parameter obtained from the command line or a string from a configuration file. If the connection is successful, no error is raised and we are safe to continue. We are connected to the database, now what? We need a cursor, an object that will actually access the data and allow us to execute SQL commands on it.

db_cursor = db.cursor()

The cursor method is called on the database connection object that we created previously. We’re set. This function returns the cursor and the database connection objects in a tuple, and it can be called from the classes we are going to work with. The classes will then have a connection to the database and a cursor to select, delete and add data.

Next time we’ll see how our generic table class works.

Previously in the series:
Part 1


Managing a simple database with Python, SQLite and wxPython, 1

Phase 2 17 Comments »

A little break from reviewing the book: let’s check some database topics in Python. I was asked to create a simple database to organize wet-lab stuff. No relationships needed, no relational tables required. Just a simple table with fixed columns, and a nice GUI to go with it so people can edit, search and use it.

My first idea was to use a SQLite database, and I stuck with it. After the initial phase of “interviews” to check database requirements, I ended up with a list of tables and decided to start working on the table that organizes the BACs used in the lab. A BAC is a DNA vector into which large DNA fragments can be inserted and cloned in a bacterial host; around here BACs are used mainly in cytogenetics. In the end the table had this structure

CREATE TABLE bac
(idbac INTEGER PRIMARY KEY,
clone Text,
sdate Date,
source Text,
gene TEXT,
chromosome Text,
startpos Integer,
endpos Integer,
antibiotic Text,
location1 Text,
temperature Integer,
tubes Integer,
box Integer,
cell Integer,
dnaex Boolean,
validation Boolean,
pcr Boolean,
projects Text,
comments Text,
genelink Text,
refs Text);

I won’t explain in detail each of the fields, but we can see that there is a mix of different types. SQLite doesn’t allow many different field types, so we stick to the basics.

And why SQLite? The module to access it comes with Python 2.5, the whole database is stored in one file that can be moved around, and it supports a full SQL query language, which is perfect for these simple cases. So we are going to use Python, SQLite and wxPython to create a simple application to manage our simple database.

I already created a GitHub repository for the upcoming code.


Expert Python Programming by Tarek Ziadé – a review of Chapter 3

off topic 2 Comments »

The chapter 3 review that I promised for “tomorrow” (last Saturday) was lazily postponed until today. So, let’s get to it. In this chapter Tarek continues with syntax best practices, but this time at the class level. As expected, the chapter requires a minimal knowledge of Python classes, so I can say it’s geared to somewhat experienced programmers and not to newcomers. There is a short explanation of sub-classing that warms things up for the next sections.

Next is the built-in method (type?) super, which was new to me. Basically super gives you access to a method or attribute as defined in a parent class. This is a segue into understanding the Method Resolution Order in Python, that is, which class has precedence over the others. As I hadn’t dealt with such structures before, it was a good and straightforward explanation, especially where he covers the possible pitfalls of using super. A short list of best practices helps:

  • Multiple inheritance should be avoided
  • super usage has to be consistent: Mixing super and classic calls is a confusing practice.
  • Don’t mix old-style and new-style classes
  • Class hierarchy has to be looked over when a parent class is called
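To see what the Method Resolution Order means in practice, here is a small diamond-shaped hierarchy. It is my own illustration in Python 3 syntax, not an example from the book, and the class names are arbitrary.

```python
class Base:
    def describe(self):
        return ["Base"]


class Left(Base):
    def describe(self):
        # super() moves to the next class in the MRO, not simply "the parent".
        return ["Left"] + super().describe()


class Right(Base):
    def describe(self):
        return ["Right"] + super().describe()


class Child(Left, Right):
    def describe(self):
        return ["Child"] + super().describe()


# The order Python follows when it looks up a method on Child.
mro_names = [cls.__name__ for cls in Child.__mro__]

# Each describe() hands over to the next class in that same order.
call_order = Child().describe()

print(mro_names)
print(call_order)
```

Note that Left hands over to Right, even though Right is not its parent. This is one reason for the advice to use super consistently across the whole hierarchy.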

After dealing with MRO comes what I think is one of the best sections of the book so far, where Tarek explains object descriptors and gives a little taste of Python’s approach to introspection. This short section is basically all code, but it’s good to have a best practices reference, including properties and slots.

The last part of the chapter covers metaprogramming, and as Chris pointed out in the comments, that’s a difficult area of Python (maybe for those like me who don’t have a CS background). I would have to try the examples by hand and maybe find areas in my code where I can use it, in order to fully understand it and take full advantage of it.

Overall, the chapter covers a good series of topics about Python classes, and I enjoyed learning a little more about things that I couldn’t understand previously. Next we will see a review of chapter 4, which deals with PEP 8 and naming best practices.

Expert Python Programming by Tarek Ziadé – a review of Chapter 2

off topic 2 Comments »

So we’re up to the second chapter of Tarek’s book. A short disclaimer before diving into it. I started this blog basically one year after I had started programming with Python. The initial idea was to “convert” the Beginning Perl for Bioinformatics book to Python and see what the advantages and disadvantages of both languages were. I was far from being an advanced Python programmer, and the inception of the blog helped me get closer to that, even though I still consider myself far from being an expert programmer in Python. I learned a lot working on converting the Perl code and learned a lot from the comments and interaction with other programmers and visitors of the blog. As with anything in life, one’s path is long and tortuous, and there’s nothing better than daily learning and exercise.

So, as I mentioned in the previous post, this book was tailored for someone like me. I needed a boost on advanced Python techniques and the second chapter gave me just that. In this chapter Tarek writes about good syntax practices below the class level: functions and methods that are common in daily usage. He starts with list comprehensions, which we have seen on this site. It’s a short and concise section and gives you exactly what you need about this functionality.

Next, iterators and generators. I had a little bit of background on iterators, and have used them here and there, but not a lot on generators. I learned a bit from this section, which is what you expect from a book like this: things like the close and throw methods. Although this was a good first step on generators, I wish the section were longer, but that may not be the focus of the book.
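As a reminder of what a generator and its close method do, here is a short sketch of my own in Python 3 syntax (not taken from the book). The codons function and the events list are names invented for the example.

```python
events = []


def codons(sequence):
    """Yield successive three-letter chunks of sequence."""
    try:
        for start in range(0, len(sequence) - 2, 3):
            yield sequence[start:start + 3]
    except GeneratorExit:
        # close() raises GeneratorExit inside the paused generator.
        events.append("closed early")
        raise


gen = codons("ATGGCCTAA")
first = next(gen)   # runs the generator up to its first yield
gen.close()         # stops it before the remaining codons are produced

# A fresh generator that is simply consumed until it is exhausted.
remaining = list(codons("ATGGCCTAA"))

print(first, events, remaining)
```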

Coroutines were a completely new subject for me. Maybe I haven’t been diving into Python as much as I needed to, but time is short these days and programming Python is not the first objective of my work. The example is complete and easy to understand, but again I wish it were a tad longer. Tarek then explains a bit about generator expressions (list comprehensions for generators) and moves on to the itertools module. So far so good; it’s a nice summary (at least for me) of simple techniques that can be incorporated into daily coding. And then … Decorators.

I blame my poor CS skills, or maybe my whole background in programming, but I still cannot get decorators. In my short-sighted view of the programming world I cannot see a place, at least in the things I’m doing, where I can use a decorator. And here comes the first criticism of the book: I still cannot get them after reading the section. One thing that would help a bit would be to have colours in the examples and maybe go over them explaining some code lines. But at the same time, I admit that this might be a personal problem, where the concept of decorators doesn’t fit into my brain, and maybe the focus of the book is to show this advanced technique to someone who has a better grasp of the concept.
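For readers in the same position, this is roughly the smallest useful decorator I can think of: a function that wraps another function and counts how many times it is called. It is a sketch in Python 3 syntax, not an example from the book, and count_calls and gc_content are names made up for it.

```python
import functools


def count_calls(func):
    """Decorator: return a wrapper that counts the calls made to func."""
    @functools.wraps(func)  # keeps the original name and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


@count_calls  # same as writing: gc_content = count_calls(gc_content)
def gc_content(sequence):
    """Fraction of G and C bases in a DNA sequence."""
    sequence = sequence.upper()
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


first_result = gc_content("GGCC")
second_result = gc_content("ATGC")
print(first_result, second_result, gc_content.calls)
```

The decorated function is used exactly as before; the counting happens around it without touching its body.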

Overall, it’s a very good chapter and a good pointer to some expert/advanced techniques in Python. Tomorrow, chapter 3, where we’re going to see classes.


Expert Python Programming by Tarek Ziadé – a review of Chapter 1

off topic Comments Off

I bought Tarek’s book (no, Packt Publishing didn’t send me a copy for review) quite some time ago, but job changes and extra-Python issues kept me from reading it with the attention it fully deserves. When I saw the announcement, I thought that this was the book I wanted on Python. First, a little bit of perspective.

I’m a biologist and a self-taught programmer/coder/you-name-it. I only had a brief course on programming logic with Pascal in 1993 (I think). I first learned Basic on an Apple ][, then on a PC, then moved to Visual Basic, Pascal, C and C++, most of them with the help of books. About three and a half years ago, I got tired of compiling things and decided to learn a different language that would be more agile to code with. Not liking Perl made me check Python. And I got hooked. Of course, as a lay programmer, I won’t discuss in technical terms why it’s better or worse than any other language, but I can say that Python fits my needs for fast and efficient programming and I’m quite happy with the choice I’ve made. So, this review will not be technical, but will try to expose the book’s strengths and weak parts.

Chapter one gives a good introduction on how to install Python and some nice pointers on how to program in Python, such as IDEs and the initial settings you can add to them. There is also a short overview of the modern Python implementations. Is it a necessary chapter? Yes and no. No, because the schooled Python user won’t need it: his or her programming environment will already be installed, configured, set and ready to go. Yes, because this chapter works as a disclaimer for the not-so-experienced Python programmer, and shows everyone what to expect from this book and what standards will be used. In my opinion, it’s a necessary starting point, so the author knows that everyone is at the same level. This chapter is also a good short summary of good practices for installing and setting up Python.

Tomorrow, chapter 2.

BPforB is now PEP 8 compliant!

Phase 2 Comments Off

As mentioned in the previous post, Robin Stocker kindly provided a git patch with the changes required to make all the scripts stored in the repository compliant with PEP 8.

The changes were mainly regarding variable/object names, but they were important as they make the code available here more Pythonic, following the rules of the Benevolent Dictator for Life.

I would like to thank Robin for spending his time doing this. Much appreciated.

Now, just a quick git tutorial on how to apply patches:

git apply __patch_file__
git commit -a -m "patch applied"
git push

That’s it. Apply, commit, push and you’re done. The repository is already updated.
