Creating an interface for the motif finding script, part 3

Section 2, motifs, wxPython 2 Comments »

Today we will add some elements to our interface. Looking at the previous screencap, it is easy to see that our interface needs a lot of work before it is ready. First, it has a dark gray background that does not resemble the usual window background (it looks more like an MDI frame). We need to change that. Also, there are no menu bars, menus or tool bars. It’s pretty bare bones, and not exactly good or useful.

There are many ways of customizing the look of a window/frame in wxPython; two of them are adding a panel to the frame and using the so-called sizers. The latter is a difficult method to master, but powerful and very good for customizing the objects and the look and feel of a window. Adding a panel and subsequently adding objects to it is a more laborious process, but easier to understand. We will start by adding the panel in our __do_layout function (where most of our changes will happen for now).

Basically, only one line is required:

#adding the panel
panel = wx.Panel(self)

That’s it: the wx.Panel constructor only needs one parameter, the parent the panel is being added to. The name panel is the one we will use to access the methods and properties of the wx.Panel instance we just created.

Adding the menu requires a little bit more code. Like wxWidgets, the library it wraps, wxPython divides the menu into parts. The menubar is a wx.MenuBar, and each menu itself (File, Edit, etc.) is a wx.Menu, to which the entries are added. At the end, each wx.Menu is appended to the menubar. In our case we have to initialize a menubar

#defines the menubar
menubar = wx.MenuBar()

and then initialize a menu element, which we will call filemenu and will be labeled File

#file menu
filemenu = wx.Menu()

This will only initialize a menu element with the name filemenu; it won’t add anything anywhere. In our case, as we didn’t do any planning on how our interface would look (no UML, no case studies, nothing!), we need at least three menu items: one to open/set the foreground sequence file, one to open/set the background sequence file and one to quit the application. So what we are going to do is append these items to the filemenu

foreground_menu = filemenu.Append(-1, 'Select foreground file')
background_menu = filemenu.Append(-1, 'Select background file')
sep = filemenu.AppendSeparator()
quit_menu = filemenu.Append(-1, 'Quit')

That simple. The first two lines append the items that open/set files, and the last one appends the quit item. The first parameter, -1, is an ID; as we saw previously, when no specific ID is required for our code we use -1. The second parameter is the label of the menu item. The menu item sep is a separator, keeping the file open/set items apart from the quit item. One final thing is to append the wx.Menu to the menubar and set it. We accomplish that by

#appends the menu to the menubar and creates it
menubar.Append(filemenu, 'File')
self.SetMenuBar(menubar)

The second line sets the menubar on self, which is pymotGUI, our main window. Putting everything together, our code would look like

#!/usr/bin/env python

import wx
import pymot
import fasta

class pymot(wx.App):

    def __init__(self, redirect=False):
        wx.App.__init__(self, redirect)

class pymotGUI(wx.Frame):

    def __init__(self, parent, id):

        wx.Frame.__init__(self, parent, id,  'Python Motif Finder', style=wx.DEFAULT_FRAME_STYLE)
        self.__do_layout()
#        self.__do_binding()

    def __do_layout(self):

        #adding the panel
        panel = wx.Panel(self)

        #defines the menubar
        menubar = wx.MenuBar()

        #file menu
        filemenu = wx.Menu()
        foreground_menu = filemenu.Append(-1, 'Select foreground file')
        background_menu = filemenu.Append(-1, 'Select background file')
        sep = filemenu.AppendSeparator()
        quit_menu = filemenu.Append(-1, 'Quit')

        #appends the menu to the menubar and creates it
        menubar.Append(filemenu, 'File')
        self.SetMenuBar(menubar)

#if __name__ == '__main__':
app = pymot()
frame = pymotGUI(parent=None, id = -1)
#frame.CentreOnScreen()
frame.Show()
app.MainLoop()

and this would look like the screencap below (on Vista).

gui2

Next time we will work on more elements and activate the menu items.

Creating an interface for the motif finding script, part 2

motifs, wxPython Comments Off

Let’s take a deeper look at the code we started yesterday, piece by piece

class pymot(wx.App):
    def __init__(self, redirect=False):
        wx.App.__init__(self, redirect)

This is the class pymot we derived from wx.App, and it will be the main class for our application. Like any other derived class, it needs an OnInit or an __init__ function that takes care of initializing things. As usual, we pass self, plus a redirect parameter that tells the application whether to redirect stdout and stderr away from the command line (to a window or a file). We don’t actually need the redirection, but it can be useful in the future to track errors. It’s set to False as we don’t need it now.

class pymotGUI(wx.Frame):
    def __init__(self, parent, id):
        wx.Frame.__init__(self, parent, id,  'Python Motif Finder', style=wx.DEFAULT_FRAME_STYLE)
        self.__do_layout()

    def __do_layout(self):
        pass

This is the pymotGUI class, derived in this case from wx.Frame. A wx.Frame is the common window you see in most operating systems. As above, it needs an __init__ function, and here it initializes the window (but does not show it). The first line of __init__ calls the wx.Frame constructor to set up the window we want to display. The constructor accepts these parameters to customize the window

__init__(self, parent, id, title, pos, size, style, name)

Both title and style are fixed in our frame definition (not that they cannot be changed), and when the frame is created the other parameters can be passed and/or changed. There is a second function defined in the pymotGUI class, __do_layout. Grouping all the layout code for the window in one function is a personal preference. It helps organize the code a bit and makes it easier to browse and correct if needed.

Most of the main part of the script could be moved to the wx.App class derivation, but for now, we can keep it there.

app = pymot()
frame = pymotGUI(parent=None, id = -1)
frame.Show()
app.MainLoop()

The first line initializes the application; the second creates and initializes the frame. The Show method displays the window. MainLoop we saw last time.

The skeleton of a wxPython script and application is very simple. Now we need to populate our window, create menus, buttons, and especially events. Next time we will include a menu on the form and check how events are linked to elements.

Creating an interface for the motif finding script, part 1

motifs, wxPython Comments Off

And we are back. After much ado about real life, I am able to “restart” this blog, probably with a good frequency of posts. Last time we saw the final product of our motif finding series. We ended up creating a very elegant script in Python that efficiently counts words in FASTA sequences and then, using a basic statistical method, calculates the significance of each word and outputs the overrepresented ones.

Our script used a little bit less than 50 lines, and if you include the imported fasta module, it won’t top 100. But the number of lines is not important. Efficiency, clarity and speed are key here. At the same time, running a script from the command line is not something everyone is used to doing. In order to add more visibility to our simple script, why not include a GUI? With a visual interface, more people can use our script, on different systems. Sounds great.

Python has many options for GUI frameworks, some more cross-platform than others. In the end, finding the right framework is more a matter of taste, or availability. My personal experience with wxWidgets led me to start developing in wxPython, and for me this was a natural choice. But there are many other GUI frameworks for Python, each one providing more or less integration and portability (you can “choose” your own here).

So, let’s create a skeleton for our GUI. First step is to install wxPython. Packages for Windows are available from their website, RPMs for Linux and DMG for Macs (I’m quite sure OS X Leopard comes with wxPython by default, just test importing it). After installing it, start Python and check if everything is in place

import wx
wx.__version__

On my machine, I get no errors and the version is 2.8.9.1 (you don’t need the latest version to create the GUI). Everything seems to be fine. A wxPython script has the same format as any Python script, the only difference is that its output is not directed to the prompt or a file. The script’s product will be the screen, so in most cases the output and program usage will depend on the user’s interaction with objects on the screen. Like any other graphical interface. A very simple script would look like

#!/usr/bin/env python

import wx

class pymot(wx.App):

    def __init__(self, redirect=False):
        wx.App.__init__(self, redirect)

class pymotGUI(wx.Frame):

    def __init__(self, parent, id):
        wx.Frame.__init__(self, parent, id,  'Python Motif Finder', style=wx.DEFAULT_FRAME_STYLE)
        self.__do_layout()

    def __do_layout(self):
        pass

app = pymot()
frame = pymotGUI(parent=None, id = -1)
frame.Show()
app.MainLoop()

Usually a wxPython interface has three parts in its script: a class for the window/frame/dialog, a class for the application and an initialization routine. All wxPython applications, and scripts, need to derive a wx.App class and initialize it (in the OnInit or __init__ functions), i.e. create the window, begin the program, etc. Another class, derived from wx.Frame in this case, builds the window/frame/dialog per se and also contains the initialization for the window, objects, events, etc. The last part is the main script, where the application is started by calling the derived class, and the window is also created and shown. The last line is the MainLoop, present in every wxPython script; it is the main line of the script, the heart of the application. MainLoop processes all the events and manages how the objects interact by receiving and dispatching such events.

The script above could have been created differently: some of its lines could be omitted, and there is no need to derive a specific class for the frame. But this way it is easier to get a grasp of the script, as it will need to grow to accommodate the objects and maybe a couple of extra windows and dialogs. Running the above script will generate the window below

First screencap of our GUI

Very simple and bare bones. Next we will explore the script above, include some extra elements and learn a little bit more about wxPython.

Git repository updated

Phase 2 2 Comments »

I just updated the git repository of BPB. Click here to access it. Most of the code presented in the blog is there, some with extra comments, some being updated.

This closes another phase in the blog, and soon we will check some different aspects of Python programming in Bioinformatics.


Python, overrepresented motifs, the Grand Finale

Phase 2, motifs Comments Off

In this final part, let’s do some very simple refactoring and modify the output section to make the result a little bit better. There are not many options for the functions that calculate the binomial expansion. But Andrew posted some opinions on how to slightly change the quorum function.

def get_quorums(seqs, mlen):
    """
    count the occurrences of each word of length mlen
    over all the sequences
    """
    quorum = defaultdict(int)
    for seq in seqs:
        #+ 1 so the last window of each sequence is counted too
        for n in range(len(seq) - mlen + 1):
            quorum[seq[n:n + mlen]] += 1
    return quorum

His modifications were small but improved the code a bit, as you remove one variable/object from the function. At the same time we need to change the output section of the code a bit, as we no longer use a defaultdict initialized with a set, but with an integer.
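To get a feeling for what the integer version stores, here is a minimal sketch on two made-up sequences. The name count_words and the toy data are only for illustration; the function follows the same idea as get_quorums and counts every window, including the last one in each sequence.

```python
from collections import defaultdict

def count_words(seqs, mlen):
    """Count every window of length mlen across all sequences."""
    counts = defaultdict(int)
    for seq in seqs:
        # + 1 so the window that ends at the last base is included
        for n in range(len(seq) - mlen + 1):
            counts[seq[n:n + mlen]] += 1
    return counts

toy_seqs = ["ACACAC", "ACGT"]  # made-up sequences, not real data
counts = count_words(toy_seqs, 2)
for word in sorted(counts):
    print("%s\t%d" % (word, counts[word]))
```

Notice that AC appears three times in the first sequence and once in the second, so it is stored as 4. With an integer counter a word repeated inside one sequence is counted each time, whereas the set-based version would report 2, the number of sequences containing it.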

for i in foreground:
    term1 = choose(background[i], foreground[i])
    term2 = choose((N - background[i]), len(input_seqs)-1)
    term3 = choose(N, len(input_seqs))
    p = (float(term1) * float(term2)) / term3
    if 0 < p <= 0.0001:
        print i, foreground[i], background[i], p

Notice that in the term1 line we don’t check for the set length anymore and just use the integers stored in foreground and background. Again a small change, that can make the code a little bit clearer. But we need to modify this section so the output is a little bit clearer too, maybe ordered by motif sequence.

But as we are reading the sequences as they come, our results are not ordered. It would be great to have a final list starting with AAAAAAAAAA and ending with TTTTTTTTTT. There is an easy way to do that, and very inexpensive regarding code and final performance. Basically we append each one of the motifs (and their extra information) to a list and use the sort method for lists. So our output section of the code will be

res_motifs = []
for i in foreground:
    term1 = choose(background[i], foreground[i])
    term2 = choose((N - background[i]), len(input_seqs)-1)
    term3 = choose(N, len(input_seqs))
    p = (float(term1) * float(term2)) / term3
    if 0 < p <= 0.0001:
        res_motifs.append(i + '\t' + str(foreground[i]) + '\t' + str(background[i]) + '\t' + str(p))

res_motifs.sort()
for i in res_motifs:
    print i
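Sorting the joined strings works because the motif is the first field of each line. A possible alternative, sketched below with made-up rows, is to keep each result as a tuple and join it only when printing; the numbers then stay numbers and the list can also be ordered by another column. The names rows, by_motif and by_p are just for this example.

```python
# Made-up rows: (motif, foreground count, background count, p value)
rows = [
    ("TTTTTTTTTT", 4, 1, 0.00002),
    ("AAAAAAAAAA", 9, 2, 0.00005),
    ("ACGTACGTAC", 3, 1, 0.00001),
]

# Tuples compare element by element, so this orders by motif first
by_motif = sorted(rows)

# A key function orders by another column, here the p value
by_p = sorted(rows, key=lambda row: row[3])

for motif, fg, bg, p in by_motif:
    print("\t".join([motif, str(fg), str(bg), str(p)]))
```

This is only a design option; for the output we want here, the string version is perfectly fine.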

Putting everything together our final motif determination script is (batteries included):

#!/usr/bin/env python

import fasta
import sys
from collections import defaultdict

def choose(n, k):
    if 0 <= k <= n:
        ntok = 1
        ktok = 1
        for t in xrange(1, min(k, n - k) + 1):
            ntok *= n
            ktok *= t
            n -= 1
        return ntok // ktok
    else:
        return 0

def get_quorums(seqs, mlen):
    """
    count the occurrences of each word of length mlen
    over all the sequences
    """
    quorum = defaultdict(int)
    for seq in seqs:
        #+ 1 so the last window of each sequence is counted too
        for n in range(len(seq) - mlen + 1):
            quorum[seq[n:n + mlen]] += 1
    return quorum

input_seqs = fasta.read_seqs(open(sys.argv[1]).readlines())
input_seqs2 = fasta.read_seqs(open(sys.argv[2]).readlines())

foreground = get_quorums(input_seqs, 10)
background = get_quorums(input_seqs2, 10)

N = len(input_seqs) + len(input_seqs2)

res_motifs = []
for i in foreground:
    term1 = choose(background[i], foreground[i])
    term2 = choose((N - background[i]), len(input_seqs)-1)
    term3 = choose(N, len(input_seqs))
    p = (float(term1) * float(term2)) / term3
    if 0 < p <= 0.0001:
        res_motifs.append(i + '\t' + str(foreground[i]) + '\t' + str(background[i]) + '\t' + str(p))

res_motifs.sort()
for i in res_motifs:
    print i

Next we will see some basic Python methods. And maybe start a new series and phase.


Obtaining overrepresented motifs in DNA sequences, final

Phase 2 1 Comment »

Part 13 of the motifs series is the last one. In a couple of weeks I will post the refactored code, including the suggestions from Andrew in the last post. I will update the blog contents on OWW and commit some of the code to the GitHub repository.


Obtaining overrepresented motifs in DNA sequences, part 13

Phase 2, Section 3, Section 5, motifs 1 Comment »

Now that we have the best quorum determination function and the ideal function to calculate the binomial expansions, it is easy to program a script to calculate the p value of motifs in DNA sequences. On to the script.

Note: in the code below there are a couple of errors that WordPress doesn’t let me fix. The > and < may be replaced by their literal HTML encoding. I am working on it, sorry.

#!/usr/bin/env python

import fasta
import sys
from collections import defaultdict

def choose(n, k):
    if 0 <= k <= n:
        ntok = 1
        ktok = 1
        for t in xrange(1, min(k, n - k) + 1):
            ntok *= n
            ktok *= t
            n -= 1
        #print ntok // ktok
        return ntok // ktok
    else:
        return 0

def get_quorums(seqs, mlen):
    """
    add seq id_no to a set
    use explicit counter to create seq_no
    """
    quorum = defaultdict(set)
    id_no = 0
    for seq in seqs:
        id_no += 1
        for n in range(len(seq) - mlen + 1):
            quorum[seq[n:n + mlen]].add(id_no)
    return quorum

input_seqs = fasta.read_seqs(open(sys.argv[1]).readlines())
input_seqs2 = fasta.read_seqs(open(sys.argv[2]).readlines())

foreground = get_quorums(input_seqs, 10)
background = get_quorums(input_seqs2, 10)

N = len(input_seqs) + len(input_seqs2)

for i in foreground:
    term1 = choose(len(background[i]), len(foreground[i]))
    term2 = choose((N - len(background[i])), len(input_seqs)-1)
    term3 = choose(N, len(input_seqs))
    p = (float(term1) * float(term2)) / term3
    if 0 < p <= 0.0001:
        print i, len(foreground[i]), len(background[i]), p

We already defined choose in the last post (more information in the link to the Python Cookbook), and earlier Mike sent us a series of quorum-determination functions; one of the best was portrayed and explained here. We also need our fasta module to read the sequences (and only the sequences) so we can use them in the quorum function.
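The fasta module itself is not shown in this post. As a rough idea of what such a reader could do, below is a hypothetical stand-in, assuming read_seqs receives a list of lines and returns a list with one string per sequence, headers dropped. The uppercasing is my own assumption, not something the real module is known to do.

```python
def read_seqs(lines):
    """Hypothetical stand-in: return only the sequences from FASTA lines."""
    seqs = []
    current = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header starts a new record; store the previous one, if any
            if current:
                seqs.append("".join(current))
                current = []
        else:
            current.append(line.upper())
    if current:
        seqs.append("".join(current))
    return seqs

example_lines = [">seq1\n", "ACGT\n", "acgt\n", ">seq2\n", "TTGACA\n"]
print(read_seqs(example_lines))
```

With a list like this, len() gives the number of sequences, which is what the script uses to compute N.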

Basically we use the foreground and background files as input, determine the quorum of the different words (width 10) and then iterate over the results, calculating the p value for each motif found in the foreground set. The three terms of the Hypergeometric Distribution are calculated separately, and we test for a p value smaller than 0.0001 (this can be modified) and only print the results that fall in this category.
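For reference, the textbook form of the Hypergeometric Distribution can be sketched with the same choose function. Please read it as an illustration with made-up numbers and not as a drop-in replacement: the script above uses len(input_seqs) - 1 in its second term, while the textbook form uses n - k. The name hypergeom_pmf and its arguments are assumptions made for this example.

```python
def choose(n, k):
    """Binomial coefficient, same multiplicative scheme as the script."""
    if 0 <= k <= n:
        ntok = 1
        ktok = 1
        for t in range(1, min(k, n - k) + 1):
            ntok *= n
            ktok *= t
            n -= 1
        return ntok // ktok
    return 0

def hypergeom_pmf(k, N, K, n):
    """P(X = k): k marked items in n draws from N items, K of them marked."""
    return float(choose(K, k) * choose(N - K, n - k)) / choose(N, n)

# Made-up numbers: 20 sequences, 5 contain the word, 8 are drawn
N, K, n = 20, 5, 8
probs = [hypergeom_pmf(k, N, K, n) for k in range(0, min(K, n) + 1)]
print(sum(probs))
```

A quick sanity check of any implementation like this is that the probabilities over all possible k add up to 1 (within floating point error).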


Obtaining overrepresented motifs in DNA sequences, part 12.5

Phase 2 2 Comments »

So let’s modify the factorial function a little bit and then benchmark both by using timeit. Ideally our factorial function would calculate a value equivalent to the binomial expansion, as we have three factorials to calculate for each binomial in the Hypergeometric Distribution.

So we can add two extra factorial calculations to our function and perform the multiplication and division to return the equivalent to the binomial calculation. So the function would be

def fac(n, m):
    value1 = 1
    for i in xrange(2, n + 1):
        value1 *= i
    value2 = 1
    for i in xrange(2, m + 1):
        value2 *= i
    value3 = 1
    for i in xrange(2, (n - m) + 1):
        value3 *= i 

    return  value1 / (value2 * value3)

m and n are the two values of the binomial, and n - m is their difference, which forms the last factorial to be calculated. This makes it easier to time the performance of both functions. In the end the complete script would look like

#!/usr/bin/env python

import timeit

def fac(n, m):
    result1 = 1
    for i in xrange(2, n + 1):
        result1 *= i
    result2 = 1
    for i in xrange(2, m + 1):
        result2 *= i
    result3 = 1
    for i in xrange(2, (n - m) + 1):
        result3 *= i 

    return  result1 / (result2 * result3) 

def binom(n, m):
    b = [0] * (n + 1)
    b[0] = 1
    for i in xrange(1, n + 1):
        b[i] = 1
        j = i - 1
        while j > 0:
            b[j] += b[j - 1]
            j -= 1
    return b[m] 

def choose(n, k):
    if 0 <= k <= n:
        ntok = 1
        ktok = 1
        for t in xrange(1, min(k, n - k) + 1):
            ntok *= n
            ktok *= t
            n -= 1
        #print ntok // ktok
        return ntok // ktok
    else:
        return 0

if __name__ == "__main__":

    stmt = "fac(3000, 7)"
    t = timeit.Timer(stmt = stmt, setup='from __main__ import fac')
    stmt2 = "binom(3000, 7)"
    t2 = timeit.Timer(stmt = stmt2, setup = 'from __main__ import binom')
    stmt3 = "choose(3000, 7)"
    t3 = timeit.Timer(stmt = stmt3, setup = 'from __main__ import choose')

    print 'fac: %.9f' % (t.timeit(100)/100)
    print 'binom: %.2f' % (t2.timeit(10)/10)
    print 'choose %.9f' % (t3.timeit(100)/100)

The average time per call (100 repetitions for fac and choose, 10 for binom) is as follows

fac = 0.10 s
binom = 43.24 s
choose = 0.000005 s

Clearly, the factorial function has a huge advantage over the binomial one. But the choose function is by far the fastest of the three, so we will incorporate it in our HD script.
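If you want to repeat the comparison on your own machine, a shorter variant is possible because timeit also accepts callables, which avoids the setup strings. The sketch below is an assumption-laden rewrite, not the script used for the numbers above: the helper fac_binom and the repetition count are mine, and your timings will differ.

```python
import timeit

def choose(n, k):
    """Multiplicative binomial coefficient, as in the script above."""
    if 0 <= k <= n:
        ntok = 1
        ktok = 1
        for t in range(1, min(k, n - k) + 1):
            ntok *= n
            ktok *= t
            n -= 1
        return ntok // ktok
    return 0

def fac_binom(n, m):
    """Binomial coefficient from three full factorials."""
    def fac(x):
        result = 1
        for i in range(2, x + 1):
            result *= i
        return result
    return fac(n) // (fac(m) * fac(n - m))

# Both routes must agree before their timings are worth comparing
assert choose(3000, 7) == fac_binom(3000, 7)

reps = 20  # arbitrary small repetition count for a quick check
t_choose = timeit.timeit(lambda: choose(3000, 7), number=reps) / reps
t_fac = timeit.timeit(lambda: fac_binom(3000, 7), number=reps) / reps
print("choose: %.9f s per call" % t_choose)
print("fac:    %.9f s per call" % t_fac)
```

Checking that both functions return the same value first is a cheap way to make sure we are timing equivalent calculations.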


Obtaining overrepresented motifs in DNA sequences, part 11

Phase 2 6 Comments »

After a long hiatus we are (almost) back on track in order to get our scripts to determine overrepresented motifs in DNA sequences. Last time we defined the “best” factorial function in Python

def fac_01(n):
    result = 1
    for i in xrange(2, n+1):
        result *= i
    return result

and Andrew Dalke sent a couple of links pointing to binomial calculation functions; one of them is below

# This file contains the Python code from Program 14.10 of
# "Data Structures and Algorithms
# with Object-Oriented Design Patterns in Python"
# by Bruno R. Preiss.
#
# Copyright (c) 2003 by Bruno R. Preiss, P.Eng. All rights reserved.
#
# http://www.brpreiss.com/books/opus7/programs/pgm14_10.txt
#
def binom(n, m):
    b = [0] * (n + 1)
    b[0] = 1
    for i in xrange(1, n + 1):
        b[i] = 1
        j = i - 1
        while j > 0:
            b[j] += b[j - 1]
            j -= 1
    return b[m]

There is a similar implementation in the Python Cookbook online, and it is clearly more convenient to use this function than to write code that calculates an identical value using factorials. But anyway, let’s see how a function would calculate one binomial coefficient from the Hypergeometric Distribution (HD) by using the factorial function. Later we will benchmark both methods.
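To see what binom is doing, it helps to notice that it builds the rows of Pascal’s triangle in place, one after the other, and returns one entry of the last row. The sketch below uses the same update but returns the whole row; the name pascal_row is mine, just for illustration.

```python
def pascal_row(n):
    """Return row n of Pascal's triangle, built in place like binom does."""
    row = [0] * (n + 1)
    row[0] = 1
    for i in range(1, n + 1):
        row[i] = 1
        # Walk right to left so row[j - 1] still holds the previous row's value
        for j in range(i - 1, 0, -1):
            row[j] += row[j - 1]
    return row

print(pascal_row(5))
```

Because every row up to n is computed, the work grows with the square of n, even when only a single coefficient is needed.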

Each binomial coefficient can be expanded as

$$\binom{n}{m} = \frac{n!}{m!\,(n - m)!}$$

and the HD has three of them. From Wikipedia: “In probability theory and statistics, the hypergeometric distribution is a discrete probability distribution that describes the number of successes in a sequence of n draws from a finite population without replacement.” In the next post we will define each term of the HD for the motifs case.

In each binomial expansion we would have to calculate three factorial values, nine in total. With the binomial function, only three values need to be calculated. So, using the factorial function we would need to code something like this in order to calculate one of the binomials


#let's say the motif quorum in the foreground is 7
#and the total number of sequences is 3000
#we won't touch the other required values
fore = 7
total = 3000
hd = fac_01(total) / (fac_01(fore) * fac_01(total - fore))
print hd

Next, we will benchmark and see if there is an advantage to either method.
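One thing worth noticing in the expansion is that most of the work cancels out: 3000! divided by 2993! leaves only the seven factors from 2994 to 3000. The sketch below checks this with the same numbers used above; the variable names full, numerator and short are only for this example.

```python
def fac_01(n):
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

fore = 7
total = 3000

# Full expansion: three factorials, two of them thousands of digits long
full = fac_01(total) // (fac_01(fore) * fac_01(total - fore))

# After cancelling (total - fore)!, only the top `fore` factors remain
numerator = 1
for value in range(total - fore + 1, total + 1):
    numerator *= value
short = numerator // fac_01(fore)

print(full == short)
```

The second route multiplies seven numbers instead of several thousand, which hints at why a function that skips the full factorials can be much faster.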


Test from Zoundry Raven

off topic Comments Off

I am testing an offline/desktop blogging tool, called Zoundry Raven. New posts are on the way, as promised.
