<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Beginning Python for Bioinformatics &#187; motifs</title>
	<atom:link href="http://python.genedrift.org/category/motifs/feed/" rel="self" type="application/rss+xml" />
	<link>http://python.genedrift.org</link>
	<description>a step-by-step guide to create Python applications in bioinformatics</description>
	<lastBuildDate>Wed, 10 Mar 2010 13:03:32 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=3.0-alpha</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Creating an interface for the motif finding script, final</title>
		<link>http://python.genedrift.org/2008/11/19/creating-an-interface-for-the-motif-finding-script-final/</link>
		<comments>http://python.genedrift.org/2008/11/19/creating-an-interface-for-the-motif-finding-script-final/#comments</comments>
		<pubDate>Wed, 19 Nov 2008 21:57:24 +0000</pubDate>
		<dc:creator>Paulo Nuin</dc:creator>
				<category><![CDATA[motifs]]></category>
		<category><![CDATA[wxPython]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[GUI]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://python.genedrift.org/2008/11/19/creating-an-interface-for-the-motif-finding-script-final/</guid>
		<description><![CDATA[We can say that this would be our final version of the script. There are many nice wxPython programming resources, and one is a very good book called wxPython in Action, which is co-written by Robin Dunn, the wxPython maintainer. Go check it out.
So for the last entry in this series, we just need to [...]]]></description>
			<content:encoded><![CDATA[<p>We can say that this would be our final version of the script. There are many nice wxPython programming resources, and one is a very good book called <a href ="http://manning.com/rappin/">wxPython in Action</a>, which is co-written by Robin Dunn, the wxPython maintainer. Go check it out.</p>
<p>So for the last entry in this series, we just need to make a couple of changes to our interface and motif finding scripts. In the interface script we need to add a line that gets the value entered in the motif width input box (or the default one, if it was not changed). We can do that by including the line below in the <code>run_finder</code> function.</p>
<pre name="code" class="python">
width = int(self.motif_width.GetValue())
</pre>
<p>This line reads whatever is inside the input box and assigns it to the variable <code>width</code>. <code>GetValue()</code> always returns the contents of the box as a string, so we convert it with <code>int()</code> before using it as a number. Now we need to create the structure to actually send this value to the motif finder functions: <code>run_finder</code> has to pass <code>width</code> as a third argument when it calls <code>pymotif.calculate_motifs</code>. The last version of our <code>calculate_motifs</code> function received two parameters, so we need to add an extra one and also change the lines that call the function that gets the quorums. The first lines of the function will be</p>
<pre name="code" class="python">
def calculate_motifs(input_seqs, input_seqs2, width):

    print input_seqs, input_seqs2
    input_seqs = fasta.read_seqs(open(input_seqs).readlines())
    input_seqs2 = fasta.read_seqs(open(input_seqs2).readlines())

    foreground = get_quorums(input_seqs, width)
    background = get_quorums(input_seqs2, width)
</pre>
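<p>To see what <code>width</code> actually controls, here is a small standalone sketch of the counting idea behind <code>get_quorums</code>: every window of <code>width</code> letters in each sequence is counted. The name <code>count_windows</code> and the toy sequence are made up for illustration. A sequence of length L has L - width + 1 windows. Passing the raw string from the text box instead of a number fails with a <code>TypeError</code>, which is why the width has to be converted to an integer first.</p>
<pre name="code" class="python">
from collections import defaultdict


def count_windows(seqs, width):
    """Count every substring of length width in each sequence (the idea behind get_quorums)."""
    counts = defaultdict(int)
    for seq in seqs:
        # A sequence of length L contains L - width + 1 windows of this width.
        for start in range(len(seq) - width + 1):
            counts[seq[start:start + width]] += 1
    return counts


# Hypothetical toy data: one short sequence, two different widths.
print(sorted(count_windows(["ACGTAC"], 2).items()))  # [('AC', 2), ('CG', 1), ('GT', 1), ('TA', 1)]
print(sorted(count_windows(["ACGTAC"], 3).items()))  # [('ACG', 1), ('CGT', 1), ('GTA', 1), ('TAC', 1)]

try:
    count_windows(["ACGTAC"], "2")  # what GetValue() hands us without int()
except TypeError:
    print("width must be a number, not a string")
</pre>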
<p>And that&#8217;s it. Our simple interface is ready for prime time. OK, not really prime time: we didn&#8217;t add a number of features that would make it useful to everyone. For instance, there is no error control, so someone could enter &#8216;ABC&#8217; in the width input box and that value would be sent along, causing an error. You can also click the Run button without selecting any files. We could go on and on, but this is just a primer, and we can build on it.</p>
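<p>As a starting point for that error control, here is a minimal sketch of a checking function you could call at the top of <code>run_finder</code>. The name <code>validate_inputs</code> and its messages are hypothetical and not part of the scripts in the repository. The sketch relies on two things the posts already show: <code>GetValue()</code> returns the text of the box as a string, and the file variables stay empty until the user picks a file.</p>
<pre name="code" class="python">
import os


def validate_inputs(width_text, fore_file, back_file):
    """Check the GUI inputs before running the motif finder.

    Returns a (width, error) pair: width is an int and error is None when
    everything looks usable; otherwise width is None and error explains why.
    """
    try:
        width = int(width_text)
    except ValueError:
        return None, "Motif width must be a whole number, not %r" % width_text
    if not width > 0:
        return None, "Motif width must be greater than zero"
    for label, path in (("foreground", fore_file), ("background", back_file)):
        if not path:
            return None, "Please select a %s file first" % label
        if not os.path.isfile(path):
            return None, "Cannot find the %s file: %s" % (label, path)
    return width, None


print(validate_inputs("ABC", "", ""))  # reports the bad width
print(validate_inputs("10", "", ""))   # reports the missing foreground file
</pre>
<p>Inside <code>run_finder</code> you would pass it <code>self.motif_width.GetValue()</code> and the two file variables. If <code>error</code> is not <code>None</code>, show it with <code>wx.MessageBox(error)</code> and return before calling <code>calculate_motifs</code>.</p>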
<p>The code is on <a href="http://github.com/nuin/beginning-python-for-bioinformatics/tree/master/scripts%2Fmotifs">Github</a>, so get it there and have fun. Next time we will see &#8230; no plans yet. We&#8217;ll see &#8230; </p>
<p>Technorati Tags: <a class="performancingtags" href="http://technorati.com/tag/wxPython" rel="tag">wxPython</a>, <a class="performancingtags" href="http://technorati.com/tag/motifs" rel="tag">motifs</a>, <a class="performancingtags" href="http://technorati.com/tag/Python" rel="tag">Python</a>, <a class="performancingtags" href="http://technorati.com/tag/bioinformatics" rel="tag">bioinformatics</a></p>
]]></content:encoded>
			<wfw:commentRss>http://python.genedrift.org/2008/11/19/creating-an-interface-for-the-motif-finding-script-final/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Creating an interface for the motif finding script, some corrections</title>
		<link>http://python.genedrift.org/2008/11/18/creating-an-interface-for-the-motif-finding-script-some-corrections/</link>
		<comments>http://python.genedrift.org/2008/11/18/creating-an-interface-for-the-motif-finding-script-some-corrections/#comments</comments>
		<pubDate>Tue, 18 Nov 2008 19:45:31 +0000</pubDate>
		<dc:creator>Paulo Nuin</dc:creator>
				<category><![CDATA[motifs]]></category>
		<category><![CDATA[wxPython]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[GUI]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://python.genedrift.org/2008/11/18/creating-an-interface-for-the-motif-finding-script-some-corrections/</guid>
		<description><![CDATA[We need to pause a bit and do some corrections on our code. First the code I posted on the last entry for the pymotif.py module is wrong. OK, not wrong, but some of the code I used for testing ended up on the blog. The first two lines of the calculate_motifs function contained a [...]]]></description>
			<content:encoded><![CDATA[<p>We need to pause for a moment and make some corrections to our code. First, the code I posted in the last entry for the pymotif.py module is wrong. OK, not wrong, but some of the code I used for testing ended up on the blog. The first two lines of the calculate_motifs function opened two hardcoded files that I use for testing, instead of the file names passed as parameters. They should be replaced by</p>
<pre name="code" class="python">
input_seqs = fasta.read_seqs(open(input_seqs).readlines())
input_seqs2 = fasta.read_seqs(open(input_seqs2).readlines())
</pre>
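<p><code>fasta.read_seqs</code> comes from the author&#8217;s own <code>fasta</code> module, which is not listed in these posts. If you want to try <code>calculate_motifs</code> without it, a minimal stand-in could look like the sketch below. From the way it is called here, the sketch assumes that <code>read_seqs</code> receives the list of lines returned by <code>readlines()</code> and returns one sequence string per FASTA record. That is an assumption, so check the real module before relying on it.</p>
<pre name="code" class="python">
def read_seqs(lines):
    """Hypothetical stand-in for fasta.read_seqs.

    Takes the lines of a FASTA file and returns one string per record,
    joining sequences that are split over several lines. Records without
    any sequence lines are skipped.
    """
    seqs = []
    current = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A header line starts a new record: store the previous one first.
            if current:
                seqs.append("".join(current))
            current = []
        else:
            current.append(line)
    if current:
        seqs.append("".join(current))
    return seqs


example = [">seq1\n", "ACGTAC\n", "GTTA\n", ">seq2\n", "TTGACA\n"]
print(read_seqs(example))  # ['ACGTACGTTA', 'TTGACA']
</pre>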
<p>Also, both variables that store the file names and paths in pymotGUI.py are declared in the wrong scope. They should have been declared at the pymotGUI class level, so they are accessible to all the functions in that class. This also means that every time we access these variables they should be preceded by the class name, so the interpreter knows where to get the value from. So both corrected files would be</p>
<pre name="code" class="python">
#!/usr/bin/env python

import wx
import pymot
import pymotif
import fasta
import os

class pymot(wx.App):

    def __init__(self, redirect=False):
        wx.App.__init__(self, redirect)

class pymotGUI(wx.Frame):

    fore_file = &#039;&#039;
    back_file = &#039;&#039;

    def __init__(self, parent, id):
        wx.Frame.__init__(self, parent, id,  &#039;Python Motif Finder&#039;, style=wx.DEFAULT_FRAME_STYLE)
        self.__do_layout()

    def __do_layout(self):

        #adding the panel
        panel = wx.Panel(self)

        #defines the menubar
        menubar = wx.MenuBar()

        #file menu
        filemenu = wx.Menu()
        foreground_menu = filemenu.Append(-1, &#039;Select foreground file&#039;)
        background_menu = filemenu.Append(-1, &#039;Select background file&#039;)
        sep = filemenu.AppendSeparator()
        quitmenu = filemenu.Append(-1, &#039;Quit&#039;)

        #appends the menu to the menubar and creates it
        menubar.Append(filemenu, &#039;File&#039;)
        self.SetMenuBar(menubar)

        #input box for motif width, and label
        self.one_label = wx.StaticText(panel, -1, &#039;Motif width&#039;, (10,50))
        self.motif_width = wx.TextCtrl(panel, -1, &#039;10&#039;, (95, 50), (40,18))
        #result textbox
        self.results = wx.TextCtrl(panel, -1, &#039;&#039;, (150, 50), (200, 100), wx.TE_MULTILINE | wx.TE_AUTO_SCROLL | wx.HSCROLL)

        #run button
        self.run_button = wx.Button(panel, -1, &#039;Run&#039;, (10, 80))

        #labels
        self.fore_label = wx.StaticText(panel, -1, &#039;Select the foreground file&#039;, (10, 10))
        self.back_label = wx.StaticText(panel, -1, &#039;Select the background file&#039;, (10, 30))

        #binding the menus to functions
        self.Bind(wx.EVT_MENU, self.on_foreground, foreground_menu)
        self.Bind(wx.EVT_MENU, self.on_background, background_menu)
        self.Bind(wx.EVT_BUTTON, self.run_finder, self.run_button)

    def on_foreground(self, event):
        dialog = wx.FileDialog(self, style=wx.OPEN)
        if dialog.ShowModal() == wx.ID_OK:
            pymotGUI.fore_file = dialog.GetPath()
            self.fore_label.SetLabel(pymotGUI.fore_file)

    def on_background(self, event):
        dialog = wx.FileDialog(self, style=wx.OPEN)
        if dialog.ShowModal() == wx.ID_OK:
            pymotGUI.back_file = dialog.GetPath()
            self.back_label.SetLabel(pymotGUI.back_file)

    def run_finder(self, event):
        print pymotGUI.fore_file
        result = pymotif.calculate_motifs(pymotGUI.fore_file, pymotGUI.back_file)
        for motif in result:
            self.results.WriteText(motif + &#039;\n&#039;)
        #wx.MessageBox(&#039;It should run, eh?&#039;)

#if __name__ == &#039;__main__&#039;:
app = pymot()
frame = pymotGUI(parent=None, id = -1)
#frame.CentreOnScreen()
frame.Show()
app.MainLoop()
</pre>
<p>and </p>
<pre name="code" class="python">
#!/usr/bin/env python

import fasta
import sys
from collections import defaultdict

def choose(n, k):
    if 0 &lt;= k &lt;= n:
        ntok = 1
        ktok = 1
        for t in xrange(1, min(k, n - k) + 1):
            ntok *= n
            ktok *= t
            n -= 1
        return ntok // ktok
    else:
        return 0

def get_quorums(seqs, mlen):
    &quot;&quot;&quot;
    add seq id_no to a set
    use explicit counter to create seq_no
    &quot;&quot;&quot;
    quorum = defaultdict(int)
    for seq in seqs:
        # a sequence of length L has L - mlen + 1 windows of width mlen
        for n in range(len(seq) - mlen + 1):
            quorum[seq[n:n + mlen]] += 1
    return quorum

def calculate_motifs(input_seqs, input_seqs2):

    print input_seqs, input_seqs2
    input_seqs = fasta.read_seqs(open(input_seqs).readlines())
    input_seqs2 = fasta.read_seqs(open(input_seqs2).readlines())

    foreground = get_quorums(input_seqs, 10)
    background = get_quorums(input_seqs2, 10)

    N = len(input_seqs) + len(input_seqs2)

    res_motifs = []
    for i in foreground:
        term1 = choose(background[i], foreground[i])
        term2 = choose((N - background[i]), len(input_seqs) - 1)
        term3 = choose(N, len(input_seqs))
        p = (float(term1) * float(term2)) / term3
        if 0 &lt; p &lt;= 0.0001:
            res_motifs.append(i + &#039;\t&#039; + str(foreground[i]) + &#039;\t&#039; + str(background[i]) + &#039;\t&#039; + str(p))

    res_motifs.sort()
    return res_motifs
</pre>
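<p>Why did the previous version lose the file names? In <code>on_foreground</code> and <code>on_background</code> the path was assigned to a plain name such as <code>fore_file</code>. Python treats that as a local variable of the method, so it vanished as soon as the method returned, and <code>run_finder</code> still saw the empty string. The plain-Python sketch below shows this, plus the class-level fix used above. The class names and the path are made up for illustration. Assigning through <code>self.fore_file</code> would also have worked. The main practical difference is that a class attribute is shared by every instance of the class, which does not matter here because there is only one window.</p>
<pre name="code" class="python">
class LocalAssignment(object):
    """Mimics the previous version: the handler assigns to a local name."""

    def __init__(self):
        self.fore_file = ""

    def on_foreground(self, path):
        fore_file = path  # local variable: thrown away when the method returns


class InstanceAttribute(object):
    """Stores the path on the instance through self."""

    def __init__(self):
        self.fore_file = ""

    def on_foreground(self, path):
        self.fore_file = path


class ClassAttribute(object):
    """Stores the path on the class, as in the corrected pymotGUI."""

    fore_file = ""

    def on_foreground(self, path):
        ClassAttribute.fore_file = path


for cls in (LocalAssignment, InstanceAttribute, ClassAttribute):
    window = cls()
    window.on_foreground("/data/foreground.fa")
    print("%s: %r" % (cls.__name__, window.fore_file))

# A class attribute is shared: a brand new instance sees the same value.
print(ClassAttribute().fore_file)  # /data/foreground.fa
</pre>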
<p>On the next post, the last in the series, we will just check how to get the value from the width input box and wrap-up everything.</p>
<p>Technorati Tags: <a class="performancingtags" href="http://technorati.com/tag/wxPython" rel="tag">wxPython</a>, <a class="performancingtags" href="http://technorati.com/tag/python" rel="tag">python</a>, <a class="performancingtags" href="http://technorati.com/tag/motifs" rel="tag">motifs</a>, <a class="performancingtags" href="http://technorati.com/tag/GUI" rel="tag">GUI</a></p>
]]></content:encoded>
			<wfw:commentRss>http://python.genedrift.org/2008/11/18/creating-an-interface-for-the-motif-finding-script-some-corrections/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Creating an interface for the motif finding script, part 8</title>
		<link>http://python.genedrift.org/2008/11/13/creating-an-interface-for-the-motif-finding-script-part-8/</link>
		<comments>http://python.genedrift.org/2008/11/13/creating-an-interface-for-the-motif-finding-script-part-8/#comments</comments>
		<pubDate>Thu, 13 Nov 2008 22:28:35 +0000</pubDate>
		<dc:creator>Paulo Nuin</dc:creator>
				<category><![CDATA[motifs]]></category>
		<category><![CDATA[wxPython]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[GUI]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://python.genedrift.org/2008/11/13/creating-an-interface-for-the-motif-finding-script-part-8/</guid>
		<description><![CDATA[Let&#8217;s now see how we connect our GUI to the pymotif file (I changed its name because of some conflicts with the app name [my bad!]; the git repo was updated accordingly), and also how to display the results in a simple manner. 
OK, first, connecting the script to the function file, [...]]]></description>
			<content:encoded><![CDATA[<p>Let&#8217;s now see how we connect our GUI to the pymotif file (I changed its name because of some conflicts with the app name [my bad!]; the git repo was updated accordingly), and also how to display the results in a simple manner.</p>
<p>OK, first, connecting the script to the function file, pymotif.py. The file is already imported in our script and we have used it before. We need to find the exact point where to call it and which parameters to pass. pymotif.py is a slightly modified version of our command line script, and the code is below.</p>
<pre name="code" class="python">
#!/usr/bin/env python

import fasta
import sys
from collections import defaultdict

def choose(n, k):
    if 0 &lt;= k &lt;= n:
        ntok = 1
        ktok = 1
        for t in xrange(1, min(k, n - k) + 1):
            ntok *= n
            ktok *= t
            n -= 1
        return ntok // ktok
    else:
        return 0

def get_quorums(seqs, mlen):
    &quot;&quot;&quot;
    add seq id_no to a set
    use explicit counter to create seq_no
    &quot;&quot;&quot;
    quorum = defaultdict(int)
    for seq in seqs:
        # a sequence of length L has L - mlen + 1 windows of width mlen
        for n in range(len(seq) - mlen + 1):
            quorum[seq[n:n + mlen]] += 1
    return quorum

def calculate_motifs(input_seqs, input_seqs2):

    input_seqs = fasta.read_seqs(open(&#039;celladhesion1000.fa&#039;).readlines())
    input_seqs2 = fasta.read_seqs(open(&#039;celladhesion1000C.fa&#039;).readlines())

    foreground = get_quorums(input_seqs, 10)
    background = get_quorums(input_seqs2, 10)

    N = len(input_seqs) + len(input_seqs2)

    res_motifs = []
    for i in foreground:
        term1 = choose(background[i], foreground[i])
        term2 = choose((N - background[i]), len(input_seqs) - 1)
        term3 = choose(N, len(input_seqs))
        p = (float(term1) * float(term2)) / term3
        if 0 &lt; p &lt;= 0.0001:
            res_motifs.append(i + &#039;\t&#039; + str(foreground[i]) + &#039;\t&#039; + str(background[i]) + &#039;\t&#039; + str(p))

    res_motifs.sort()
    return res_motifs
</pre>
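<p>The <code>choose</code> function computes the binomial coefficient C(n, k), the number of ways of picking k items out of n. It avoids full factorials: it multiplies n(n - 1)... for only min(k, n - k) steps and divides by the matching factorial once at the end. The product of that many consecutive integers is always divisible by that factorial, so the integer division is exact. One way to convince yourself that it is right is to compare it against <code>math.comb</code>, which is only available from Python 3.8 onwards (so not in the Python 2 used in these posts). The helper <code>matches_math_comb</code> is a hypothetical addition. The range test in <code>choose</code> is written with two <code>&gt;=</code> comparisons, which is equivalent to the original.</p>
<pre name="code" class="python">
import math


def choose(n, k):
    """Binomial coefficient C(n, k), computed the same way as in pymotif.py."""
    if k >= 0 and n >= k:
        ntok = 1
        ktok = 1
        # Only min(k, n - k) factors are needed because C(n, k) == C(n, n - k).
        for t in range(1, min(k, n - k) + 1):
            ntok *= n
            ktok *= t
            n -= 1
        return ntok // ktok
    else:
        return 0


def matches_math_comb(max_n):
    """Compare choose() with math.comb (Python 3.8+) for every n up to max_n and every k from 0 to n."""
    for n in range(max_n + 1):
        for k in range(n + 1):
            if choose(n, k) != math.comb(n, k):
                return False
    return True


print(choose(5, 2))           # 10
print(choose(2, 5))           # 0: k is larger than n
print(matches_math_comb(30))  # True
</pre>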
<p>So the line we are interested in is this one</p>
<pre name="code" class="python">
def calculate_motifs(input_seqs, input_seqs2):
</pre>
<p>We replace the wx.MessageBox line in our run_finder function with a call to calculate_motifs, using the input files selected by the user as parameters, and we are done</p>
<pre name="code" class="python">
def run_finder(self, event):
	result = pymotif.calculate_motifs(self.fore_file, self.back_file)
</pre>
<p>Very simple and direct. This should take care of everything except the motif width, which we will see in the next post. We still need a place to write the overrepresented motifs. We can add a text box to the frame by adding an extra declaration in our __do_layout function. This time we need to give the box some extra style flags, so it can show multiple lines and has scroll bars.</p>
<pre name="code" class="python">
self.results = wx.TextCtrl(panel, -1, &#039;&#039;, (150, 50), (200, 100), wx.TE_MULTILINE | wx.TE_AUTO_SCROLL | wx.HSCROLL)
</pre>
<p>Notice the three style flags added at the end. wx.TE_MULTILINE allows the box to have multiple lines, and the other two turn on auto scrolling and horizontal scrolling. Great. And how do we write the results? Notice above that calculate_motifs returns a sorted list where each item is a tab-separated string with the motif sequence, its foreground and background counts, and the p value. So all we need to do is iterate over the list and write each line to the result box. It&#8217;s that simple, and we accomplish it with the WriteText method, which receives a string as a parameter, either a literal or a string object. WriteText does not add a line break by itself, so we append a newline to each motif. Our run_finder function will have a couple of extra lines</p>
<pre name="code" class="python">
def run_finder(self, event):
	result = pymotif.calculate_motifs(self.fore_file, self.back_file)
	for motif in result:
		self.results.WriteText(motif + &#039;\n&#039;)
</pre>
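<p>Each item in <code>result</code> is one tab-separated line built by <code>calculate_motifs</code>, containing the motif, the foreground count, the background count and the p value. You may later want to do more than dump those lines into the box, for example picking the motif with the smallest p value. For that, a small helper can split each line back into typed fields. The two helpers below are hypothetical additions, not part of the scripts, and the sample lines are made-up data.</p>
<pre name="code" class="python">
def parse_result_line(line):
    """Split one line from calculate_motifs into (motif, fore_count, back_count, p)."""
    motif, fore_count, back_count, p_value = line.split("\t")
    return motif, int(fore_count), int(back_count), float(p_value)


def format_results(result):
    """Join the result lines into a single newline-terminated block of text."""
    return "".join(line + "\n" for line in result)


sample = ["ACGTACGTAC\t12\t3\t2.5e-05", "TTGACAGGCA\t9\t1\t8e-05"]  # made-up values
rows = [parse_result_line(line) for line in sample]
best = min(rows, key=lambda row: row[3])  # row with the smallest p value
print(best[0])  # ACGTACGTAC
print(repr(format_results(sample)))
</pre>
<p>With <code>format_results</code>, <code>run_finder</code> could call <code>self.results.WriteText(format_results(result))</code> once instead of looping. Keep in mind that <code>WriteText</code> inserts at the current insertion point, so the results of a second run are added to the old ones. Calling <code>self.results.Clear()</code> first, or using <code>SetValue</code>, starts from an empty box.</p>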
<p>That presents the resulting overrepresented motifs in a very simplistic way, but it&#8217;s enough for now. Our GUI script will be</p>
<pre name="code" class="python">
#!/usr/bin/env python

import wx
import pymot
import pymotif
import fasta
import os

class pymot(wx.App):

    def __init__(self, redirect=False):
        wx.App.__init__(self, redirect)

class pymotGUI(wx.Frame):

    def __init__(self, parent, id):
        wx.Frame.__init__(self, parent, id,  &#039;Python Motif Finder&#039;, style=wx.DEFAULT_FRAME_STYLE)
        self.__do_layout()
        self.fore_file = &#039;&#039;
        self.back_file = &#039;&#039;

    def __do_layout(self):

        #adding the panel
        panel = wx.Panel(self)

        #defines the menubar
        menubar = wx.MenuBar()

        #file menu
        filemenu = wx.Menu()
        foreground_menu = filemenu.Append(-1, &#039;Select foreground file&#039;)
        background_menu = filemenu.Append(-1, &#039;Select background file&#039;)
        sep = filemenu.AppendSeparator()
        quitmenu = filemenu.Append(-1, &#039;Quit&#039;)

        #appends the menu to the menubar and creates it
        menubar.Append(filemenu, &#039;File&#039;)
        self.SetMenuBar(menubar)

        #input box for motif width, and label
        self.one_label = wx.StaticText(panel, -1, &#039;Motif width&#039;, (10,50))
        self.motif_width = wx.TextCtrl(panel, -1, &#039;10&#039;, (95, 50), (40,18))
        #result textbox
        self.results = wx.TextCtrl(panel, -1, &#039;&#039;, (150, 50), (200, 100), wx.TE_MULTILINE | wx.TE_AUTO_SCROLL | wx.HSCROLL)

        #run button
        self.run_button = wx.Button(panel, -1, &#039;Run&#039;, (10, 80))

        #labels
        self.fore_label = wx.StaticText(panel, -1, &#039;Select the foreground file&#039;, (10, 10))
        self.back_label = wx.StaticText(panel, -1, &#039;Select the background file&#039;, (10, 30))

        #binding the menus to functions
        self.Bind(wx.EVT_MENU, self.on_foreground, foreground_menu)
        self.Bind(wx.EVT_MENU, self.on_background, background_menu)
        self.Bind(wx.EVT_BUTTON, self.run_finder, self.run_button)

    def on_foreground(self, event):
        dialog = wx.FileDialog(self, style=wx.OPEN)
        if dialog.ShowModal() == wx.ID_OK:
            fore_file = dialog.GetPath()
            self.fore_label.SetLabel(fore_file)

    def on_background(self, event):
        dialog = wx.FileDialog(self, style=wx.OPEN)
        if dialog.ShowModal() == wx.ID_OK:
            back_file = dialog.GetPath()
            self.back_label.SetLabel(back_file)

    def run_finder(self, event):
        result = pymotif.calculate_motifs(self.fore_file, self.back_file)
        for motif in result:
            self.results.WriteText(motif + &#039;\n&#039;)
        #wx.MessageBox(&#039;It should run, eh?&#039;)

#if __name__ == &#039;__main__&#039;:
app = pymot()
frame = pymotGUI(parent=None, id = -1)
#frame.CentreOnScreen()
frame.Show()
app.MainLoop()
</pre>
]]></content:encoded>
			<wfw:commentRss>http://python.genedrift.org/2008/11/13/creating-an-interface-for-the-motif-finding-script-part-8/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Creating an interface for the motif finding script, part 7</title>
		<link>http://python.genedrift.org/2008/11/11/creating-an-interface-for-the-motif-finding-script-part-7/</link>
		<comments>http://python.genedrift.org/2008/11/11/creating-an-interface-for-the-motif-finding-script-part-7/#comments</comments>
		<pubDate>Tue, 11 Nov 2008 18:53:09 +0000</pubDate>
		<dc:creator>Paulo Nuin</dc:creator>
				<category><![CDATA[motifs]]></category>
		<category><![CDATA[wxPython]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[GUI]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://python.genedrift.org/2008/11/11/creating-an-interface-for-the-motif-finding-script-part-7/</guid>
		<description><![CDATA[Let&#8217;s get back to the last post and check one line we entered

self.motif_width = wx.TextCtrl(panel, -1, &#039;10&#039;, (95, 50), (40,18))

There is something in this line that I did not explain. The third parameter in the text box declaration is '10'. How does this affect our box? That&#8217;s the default text that will be displayed inside [...]]]></description>
			<content:encoded><![CDATA[<p>Let&#8217;s get back to the last post and check one line we entered</p>
<pre name="code" class="python">
self.motif_width = wx.TextCtrl(panel, -1, &#039;10&#039;, (95, 50), (40,18))
</pre>
<p>There is something in this line that I did not explain. The third parameter in the text box declaration is <code>'10'</code>. How does this affect our box? That&#8217;s the default text displayed inside the box as soon as it is created. In our case, 10 is the motif width, and it&#8217;s the value we consider to be the most common search width.</p>
<p>Another thing I did not explain is the <code>run_finder</code> function, where we added the line</p>
<pre name="code" class="python">
wx.MessageBox(&#039;It should run, eh?&#039;)
</pre>
<p>where we declare a wx.MessageBox. What is it? A message box is the usual error/information dialog that you see in most programs. In our case it is very simple, just a warning/reminder that we need to include some code there.</p>
<p>Next time we will connect some Python source files and make our script find some motifs.</p>
]]></content:encoded>
			<wfw:commentRss>http://python.genedrift.org/2008/11/11/creating-an-interface-for-the-motif-finding-script-part-7/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Creating an interface for the motif finding script, part 6</title>
		<link>http://python.genedrift.org/2008/11/04/creating-an-interface-for-the-motif-finding-script-part-6/</link>
		<comments>http://python.genedrift.org/2008/11/04/creating-an-interface-for-the-motif-finding-script-part-6/#comments</comments>
		<pubDate>Tue, 04 Nov 2008 22:39:01 +0000</pubDate>
		<dc:creator>Paulo Nuin</dc:creator>
				<category><![CDATA[motifs]]></category>
		<category><![CDATA[wxPython]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://python.genedrift.org/?p=193</guid>
		<description><![CDATA[Last entry we saw how to allow the user to open a file. Now we need to work on this file and store its path so the script can process it later on. After the file is selected on the file menu, the filename is printed on the label. Let&#8217;s think for a second &#8230; [...]]]></description>
			<content:encoded><![CDATA[<p>In the last entry we saw how to allow the user to open a file. Now we need to work on this file and store its path so the script can process it later on. After the file is selected in the File menu, the filename is printed on the label. Let&#8217;s think for a second &#8230; If we get only the filename from the dialog, the program won&#8217;t work, because the file might be located in another directory, partition, you name it. So we need to get the file&#8217;s full path. We need to replace the lines</p>
<pre name="code" class="python">
back_file = dialog.GetFilename()
self.back_label.SetLabel(dialog.GetFilename())
</pre>
<p>with</p>
<pre name="code" class="python">
back_file = dialog.GetPath()
self.back_label.SetLabel(back_file)
</pre>
<p>in <code>on_background</code> (do not forget to make the same change to <code>fore_file</code> and <code>fore_label</code> in <code>on_foreground</code>!).</p>
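<p>The difference between the two dialog methods is the same as the difference between a bare file name and a full path in plain Python. <code>GetFilename()</code> gives only the name, while <code>GetPath()</code> includes the directory. The sketch below creates a temporary file to stand in for a file the user picked somewhere else on disk, and shows that only the full path reliably finds it. The file name and contents are made up.</p>
<pre name="code" class="python">
import os
import tempfile

# Create a small FASTA file in a fresh temporary directory, standing in for a
# file the user selected in some other folder.
folder = tempfile.mkdtemp()
full_path = os.path.join(folder, "foreground.fa")
with open(full_path, "w") as handle:
    handle.write(">seq1\nACGTAC\n")

# os.path.basename keeps only the last part, like GetFilename() does.
filename_only = os.path.basename(full_path)

print(filename_only)               # foreground.fa
print(os.path.exists(full_path))   # True
# A bare name is looked up relative to the current working directory,
# so this is False unless the script happens to run inside folder.
print(os.path.exists(filename_only))
</pre>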
<p>Let&#8217;s run the script and check what happens. The frame should look like the one below (I stretched mine a little).<br />
<a href="http://python.genedrift.org/wordpress/wp-content/uploads/2008/11/gui4.png"><img src="http://python.genedrift.org/wordpress/wp-content/uploads/2008/11/gui4.png" alt="new gui" title="new gui" width="535" height="264" class="alignnone size-full wp-image-194" /></a></p>
<p>OK, so this part is solved. As we haven&#8217;t planned our application from the start, we will spend some time thinking about the basic functionality that we might need. So far, we need one input box, where the user can enter the motif width to be searched, and a button to start the process. Fine, let&#8217;s add the input box. For this we also need an extra label to tell the user what the box is for. Still working in our __do_layout function, we add two lines</p>
<pre name="code" class="python">
self.one_label = wx.StaticText(panel, -1, &#039;Motif width&#039;, (10,50))
self.motif_width = wx.TextCtrl(panel, -1, &#039;10&#039;, (95, 50), (40,18))
</pre>
<p>Simple as that, we have an input box. For the button, one line will suffice</p>
<pre name="code" class="python">
self.run_button = wx.Button(panel, -1, &#039;Run&#039;, (10, 80))
</pre>
<p>As we can see, these declarations are all quite similar: they follow the same pattern, and some of them take more or less identical parameters. Now we need to bind the button to a function, which we will call <code>run_finder</code>. Remember that binding needs an event type, a target function and the object that generates the event. This time the event is a button event, but the other two parameters work the same way as before.</p>
<pre name="code" class="python">
self.Bind(wx.EVT_BUTTON, self.run_finder, self.run_button)
</pre>
<p>and the function, for now, will look like</p>
<pre name="code" class="python">
def run_finder(self, event):
    wx.MessageBox(&#039;It should run, eh?&#039;)
</pre>
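<p>If the idea of binding still feels abstract, the toy model below may help. It is a plain-Python illustration of the concept only, with made-up names, and not how wxPython is implemented. <code>bind</code> records which function handles a given event type coming from a given object. <code>fire</code> plays the role of the user clicking: it looks up that function and calls it with an event.</p>
<pre name="code" class="python">
class ToyFrame(object):
    """Toy stand-in for a frame that routes events to handler functions."""

    def __init__(self):
        self.handlers = {}

    def bind(self, event_type, handler, source):
        # Same argument order as self.Bind(event_type, handler, source).
        self.handlers[(event_type, source)] = handler

    def fire(self, event_type, source):
        # Simulate the event: find the bound handler and call it.
        handler = self.handlers.get((event_type, source))
        if handler is None:
            return None  # nothing bound for this pair in the toy model
        return handler({"type": event_type, "source": source})


def run_finder(event):
    return "It should run, eh? (%s from %s)" % (event["type"], event["source"])


frame = ToyFrame()
frame.bind("EVT_BUTTON", run_finder, "run_button")
print(frame.fire("EVT_BUTTON", "run_button"))
print(frame.fire("EVT_MENU", "run_button"))  # None: no handler for this pair
</pre>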
<p>That&#8217;s all for today. Our script is growing and the full code is below</p>
<pre name="code" class="python">
#!/usr/bin/env python

import wx
import pymot
import fasta
import os

class pymot(wx.App):

    def __init__(self, redirect=False):
        wx.App.__init__(self, redirect)

class pymotGUI(wx.Frame):

    def __init__(self, parent, id):
        wx.Frame.__init__(self, parent, id,  &#039;Python Motif Finder&#039;, style=wx.DEFAULT_FRAME_STYLE)
        self.__do_layout()
        self.fore_file = &#039;&#039;
        self.back_file = &#039;&#039;

    def __do_layout(self):

        #adding the panel
        panel = wx.Panel(self)

        #defines the menubar
        menubar = wx.MenuBar()

        #file menu
        filemenu = wx.Menu()
        foreground_menu = filemenu.Append(-1, &#039;Select foreground file&#039;)
        background_menu = filemenu.Append(-1, &#039;Select background file&#039;)
        sep = filemenu.AppendSeparator()
        quitmenu = filemenu.Append(-1, &#039;Quit&#039;)

        #appends the menu to the menubar and creates it
        menubar.Append(filemenu, &#039;File&#039;)
        self.SetMenuBar(menubar)

        #input box for motif width, and label
        self.one_label = wx.StaticText(panel, -1, &#039;Motif width&#039;, (10,50))
        self.motif_width = wx.TextCtrl(panel, -1, &#039;10&#039;, (95, 50), (40,18))

        #run button
        self.run_button = wx.Button(panel, -1, &#039;Run&#039;, (10, 80))

        #labels
        self.fore_label = wx.StaticText(panel, -1, &#039;Select the foreground file&#039;, (10, 10))
        self.back_label = wx.StaticText(panel, -1, &#039;Select the background file&#039;, (10, 30))

        #binding the menus to functions
        self.Bind(wx.EVT_MENU, self.on_foreground, foreground_menu)
        self.Bind(wx.EVT_MENU, self.on_background, background_menu)
        self.Bind(wx.EVT_BUTTON, self.run_finder, self.run_button)

    def on_foreground(self, event):
        dialog = wx.FileDialog(self, style=wx.OPEN)
        if dialog.ShowModal() == wx.ID_OK:
            fore_file = dialog.GetPath()
            self.fore_label.SetLabel(fore_file)

    def on_background(self, event):
        dialog = wx.FileDialog(self, style=wx.OPEN)
        if dialog.ShowModal() == wx.ID_OK:
            back_file = dialog.GetPath()
            self.back_label.SetLabel(back_file)

    def run_finder(self, event):
        wx.MessageBox(&#039;It should run, eh?&#039;)

#if __name__ == &#039;__main__&#039;:
app = pymot()
frame = pymotGUI(parent=None, id = -1)
#frame.CentreOnScreen()
frame.Show()
app.MainLoop()
</pre>
]]></content:encoded>
			<wfw:commentRss>http://python.genedrift.org/2008/11/04/creating-an-interface-for-the-motif-finding-script-part-6/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Motif finding interface on github</title>
		<link>http://python.genedrift.org/2008/10/30/motif-interface-on-github/</link>
		<comments>http://python.genedrift.org/2008/10/30/motif-interface-on-github/#comments</comments>
		<pubDate>Thu, 30 Oct 2008 19:58:46 +0000</pubDate>
		<dc:creator>Paulo Nuin</dc:creator>
				<category><![CDATA[motifs]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[Github]]></category>

		<guid isPermaLink="false">http://python.genedrift.org/?p=189</guid>
		<description><![CDATA[The GUI is under active development, and most of the changes are being added to the blog and to the code at the same time. So go get a copy of git, install it and play with the BPB repository there. 
http://github.com/nuin/beginning-python-for-bioinformatics/tree/master
]]></description>
			<content:encoded><![CDATA[<p>The GUI is under active development, and most of the changes are being added to the blog and to the code at the same time. So go get a copy of git, install it and play with the BPB repository there.</p>
<p><a href="http://github.com/nuin/beginning-python-for-bioinformatics/tree/master">http://github.com/nuin/beginning-python-for-bioinformatics/tree/master</a></p>
]]></content:encoded>
			<wfw:commentRss>http://python.genedrift.org/2008/10/30/motif-interface-on-github/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Creating an interface for the motif finding script, part 5</title>
		<link>http://python.genedrift.org/2008/10/30/creating-an-interface-for-the-motif-finding-script-part-5/</link>
		<comments>http://python.genedrift.org/2008/10/30/creating-an-interface-for-the-motif-finding-script-part-5/#comments</comments>
		<pubDate>Thu, 30 Oct 2008 19:45:19 +0000</pubDate>
		<dc:creator>Paulo Nuin</dc:creator>
				<category><![CDATA[motifs]]></category>
		<category><![CDATA[wxPython]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://python.genedrift.org/?p=186</guid>
		<description><![CDATA[Last time we saw how to bind an interface element to a function. Now we need to make good use of it, and make the script have some actual functionality. First thing we are going to do is to include a label (or static text) on the interface. Remember that initially we added a panel [...]]]></description>
			<content:encoded><![CDATA[<p>Last time we saw how to bind an interface element to a function. Now we need to make good use of it and give the script some actual functionality. The first thing we are going to do is add a label (or static text) to the interface. Remember that we initially added a panel to the frame, so the label should go on the panel. For a label we use a <a href="http://wxpython.org/docs/api/wx.StaticText-class.htm">wx.StaticText</a>, whose constructor has these parameters</p>
<pre name="code" class="python">
(self, parent, id=-1, label=EmptyString, pos=DefaultPosition, size=DefaultSize, style=0, name=StaticTextNameStr)
</pre>
<p>We don&#8217;t need all of them: parent, id, label and pos are enough, since the default size is computed from the length of the text we pass. We are going to work on our __do_layout function and add two labels to the panel on the frame, one for each of the foreground and background files</p>
<pre name="code" class="python">
self.fore_label = wx.StaticText(panel, -1, &#039;Select the foreground file&#039;, (10, 10))
self.back_label = wx.StaticText(panel, -1, &#039;Select the background file&#039;, (10, 30))
</pre>
<p>These two lines are very similar; only the variable name, label text and position change. <code>panel</code> is the name of the panel we created previously, -1 is the ID, the string is the text that will appear on the label, and the values between parentheses are the X, Y coordinates at which the label is displayed on the frame. While learning (or whenever a size also needs to be set) we can pass the position as the keyword argument <code>pos=</code>, which makes it clearer what the values are setting</p>
<pre name="code" class="python">
self.fore_label = wx.StaticText(panel, -1, &#039;Select the foreground file&#039;, pos=(10, 10))
</pre>
<p>If we add these two lines and run our script, both labels will be there on the frame, as can be seen in the screencap below.</p>
<p><a href="http://python.genedrift.org/wordpress/wp-content/uploads/2008/10/gui3.png"><img src="http://python.genedrift.org/wordpress/wp-content/uploads/2008/10/gui3-300x187.png" alt="GUI with labels" title="GUI with labels" width="300" height="187" class="aligncenter size-medium wp-image-187" /></a></p>
<p>Now we need to add some functionality to the menus. The menu items we set up previously should present a file open dialog to the user, where he/she can select a file that will be processed later (or immediately). wxPython can create a standard file dialog for us with the <a href="http://wxpython.org/docs/api/wx.FileDialog-class.html">wx.FileDialog class</a>. The only required argument is the parent window; the <code>style</code> argument selects the type of dialog, e.g. for opening a single file, opening multiple files, or saving. The dialog call would look like</p>
<pre name="code" class="python">
dialog = wx.FileDialog(self, style=wx.FD_OPEN)
</pre>
<p>Very simple and direct. But creating the dialog won&#8217;t make it show up on the screen; we need to call its <code>ShowModal</code> method. File dialogs are <a href="http://en.wikipedia.org/wiki/Modal_window">modal</a>, meaning the user has to interact with the dialog (and close it) before returning to the application that opened it. <code>ShowModal</code> returns a code that tells us how the dialog was closed, so we check it with an if clause. </p>
<pre name="code" class="python">
if dialog.ShowModal() == wx.ID_OK:
    # the user clicked OK: handle the selected file here
    pass
dialog.Destroy()
</pre>
<p><code>wx.ID_OK</code> is a wxPython constant: <code>ShowModal</code> returns it when the user presses the OK (or Open) button on the file dialog. If so, the code inside the if block runs; otherwise it is skipped (other outcomes, such as <code>wx.ID_CANCEL</code>, can be handled with an else clause). Either way, once we are done with the dialog we call its <code>Destroy()</code> method to release it. Now we just need to put things together and add some code for when the user selects a file</p>
<pre name="code" class="python">
def on_foreground(self, event):
    dialog = wx.FileDialog(self, style=wx.FD_OPEN)
    if dialog.ShowModal() == wx.ID_OK:
        # store the selected file name and show it on the label
        self.fore_file = dialog.GetFilename()
        self.fore_label.SetLabel(self.fore_file)
    dialog.Destroy()
</pre>
<p>Inside the if block, the script gets the name of the selected file from the dialog and sets it as the text of our StaticText label. Straightforward. We do the same thing for the background file and we have some working code. One last thing: the attributes <code>self.fore_file</code> and <code>self.back_file</code> are initialized in the __init__ method of the frame class, so they are available throughout the frame. The handlers must assign to them through <code>self</code> (e.g. <code>self.fore_file = dialog.GetFilename()</code>); a plain <code>fore_file = ...</code> would only create a local variable. Our script will look like</p>
<pre name="code" class="python">
#!/usr/bin/env python

import wx
import fasta
import os

class pymot(wx.App):

    def __init__(self, redirect=False):
        wx.App.__init__(self, redirect)

class pymotGUI(wx.Frame):

    def __init__(self, parent, id):
        wx.Frame.__init__(self, parent, id,  &#039;Python Motif Finder&#039;, style=wx.DEFAULT_FRAME_STYLE)
        #names of the selected files, available to the whole frame
        self.fore_file = &#039;&#039;
        self.back_file = &#039;&#039;
        self.__do_layout()

    def __do_layout(self):

        #adding the panel
        panel = wx.Panel(self)

        #defines the menubar
        menubar = wx.MenuBar()

        #file menu
        filemenu = wx.Menu()
        foreground_menu = filemenu.Append(-1, &#039;Select foreground file&#039;)
        background_menu = filemenu.Append(-1, &#039;Select background file&#039;)
        sep = filemenu.AppendSeparator()
        quitmenu = filemenu.Append(-1, &#039;Quit&#039;)

        #appends the menu to the menubar and creates it
        menubar.Append(filemenu, &#039;File&#039;)
        self.SetMenuBar(menubar)

        #labels that will show the selected file names
        self.fore_label = wx.StaticText(panel, -1, &#039;Select the foreground file&#039;, (10, 10))
        self.back_label = wx.StaticText(panel, -1, &#039;Select the background file&#039;, (10, 30))

        self.Bind(wx.EVT_MENU, self.on_foreground, foreground_menu)
        self.Bind(wx.EVT_MENU, self.on_background, background_menu)

    def on_foreground(self, event):
        dialog = wx.FileDialog(self, style=wx.FD_OPEN)
        if dialog.ShowModal() == wx.ID_OK:
            #store the file name on the frame and show it on the label
            self.fore_file = dialog.GetFilename()
            self.fore_label.SetLabel(self.fore_file)
        dialog.Destroy()

    def on_background(self, event):
        dialog = wx.FileDialog(self, style=wx.FD_OPEN)
        if dialog.ShowModal() == wx.ID_OK:
            self.back_file = dialog.GetFilename()
            self.back_label.SetLabel(self.back_file)
        dialog.Destroy()

if __name__ == &#039;__main__&#039;:
    app = pymot()
    frame = pymotGUI(parent=None, id = -1)
    #frame.CentreOnScreen()
    frame.Show()
    app.MainLoop()
</pre>
<p>Next time we will keep adding elements and functionality to the interface.</p>
]]></content:encoded>
			<wfw:commentRss>http://python.genedrift.org/2008/10/30/creating-an-interface-for-the-motif-finding-script-part-5/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Creating an interface for the motif finding script, part 4</title>
		<link>http://python.genedrift.org/2008/10/29/creating-an-interface-for-the-motif-finding-script-part-4/</link>
		<comments>http://python.genedrift.org/2008/10/29/creating-an-interface-for-the-motif-finding-script-part-4/#comments</comments>
		<pubDate>Wed, 29 Oct 2008 18:35:27 +0000</pubDate>
		<dc:creator>Paulo Nuin</dc:creator>
				<category><![CDATA[motifs]]></category>
		<category><![CDATA[wxPython]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://python.genedrift.org/?p=183</guid>
		<description><![CDATA[Last time we checked how to add a menu to our simple frame. Unfortunately, just adding it won&#8217;t make the menu useful. In order to do that we need to bind some events to it. As any interface framework, wxPython is governed by events generated by the user, being these events mouse clicks on buttons [...]]]></description>
			<content:encoded><![CDATA[<p>Last time we saw how to add a menu to our simple frame. Unfortunately, just adding it won&#8217;t make the menu useful; for that we need to bind some events to it. Like any GUI framework, wxPython is event-driven: it reacts to events generated by the user, such as mouse clicks on buttons and menus, or objects getting/losing focus. So far we only need one kind, the menu event, which tells the code which path to take when a menu item is clicked. </p>
<p>My personal preference is to put all event bindings in a separate function, <code>__do_binding</code>. But that route would require changing some code in the menu declaration, so to keep things simple we will add the menu bindings at the end of the <code>__do_layout</code> function.</p>
<p>So how do we create a binding? To bind an object/menu item to the function containing the code that runs when the event fires, we need the name of the object/menu item, the target function and the event type. We already know the first and the last, so we just need the function name. Remember that last time we created the menu with the code below (the menu names were changed in the previous entry &#8211; some old code got in the way &#8211; my mistake)</p>
<pre name="code" class="python">
foreground_menu = filemenu.Append(-1, &#039;Select foreground file&#039;)
background_menu = filemenu.Append(-1, &#039;Select background file&#039;)
...
quitmenu = filemenu.Append(-1, &#039;Quit&#039;)
</pre>
<p>hence our menu names are <code>foreground_menu</code>, <code>background_menu</code> and <code>quitmenu</code>. A call to the <code>Bind</code> method has this structure</p>
<pre name="code" class="python">
self.Bind(EVENT_TYPE, handler, source)
</pre>
<p>where the handler is the function and the source is the object that generates the event. Let&#8217;s say we want to call the function <code>on_foreground</code> every time someone clicks on the foreground menu item, and <code>on_background</code> every time someone clicks on the background one. We add a couple of lines to our layout function</p>
<pre name="code" class="python">
self.Bind(wx.EVT_MENU, self.on_foreground, foreground_menu)
self.Bind(wx.EVT_MENU, self.on_background, background_menu)
</pre>
<p>This tells the code where to go when these items are clicked. If you start the interface now, an error (an AttributeError) will be raised, because we still haven&#8217;t created the event handler functions. Let&#8217;s define them</p>
<pre name="code" class="python">
def on_foreground(self, event):
    pass

def on_background(self, event):
    pass
</pre>
<p>Note that these functions receive an <code>event</code> parameter, which is the event object itself. The <code>pass</code> statement means that the function is defined but does nothing yet, so when it is called, execution simply returns. Our complete code would look like </p>
<pre name="code" class="python">
#!/usr/bin/env python

import wx
import fasta

class pymot(wx.App):

    def __init__(self, redirect=False):
        wx.App.__init__(self, redirect)

class pymotGUI(wx.Frame):

    def __init__(self, parent, id):
        wx.Frame.__init__(self, parent, id,  &#039;Python Motif Finder&#039;, style=wx.DEFAULT_FRAME_STYLE)
        self.__do_layout()

    def __do_layout(self):

        #adding the panel
        panel = wx.Panel(self)

        #defines the menubar
        menubar = wx.MenuBar()

        #file menu
        filemenu = wx.Menu()
        foreground_menu = filemenu.Append(-1, &#039;Select foreground file&#039;)
        background_menu = filemenu.Append(-1, &#039;Select background file&#039;)
        sep = filemenu.AppendSeparator()
        quitmenu = filemenu.Append(-1, &#039;Quit&#039;)

        #appends the menu to the menubar and creates it
        menubar.Append(filemenu, &#039;File&#039;)
        self.SetMenuBar(menubar)

        self.Bind(wx.EVT_MENU, self.on_foreground, foreground_menu)
        self.Bind(wx.EVT_MENU, self.on_background, background_menu)

    def on_foreground(self, event):
        pass

    def on_background(self, event):
        pass

app = pymot()
frame = pymotGUI(parent=None, id = -1)
frame.Show()
app.MainLoop()
</pre>
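<p>Conceptually, <code>Bind</code> records which handler to call for a given event type and source, and the main loop later calls that handler with an event object. The sketch below is a toy model of that idea in plain Python 3, so it runs without wxPython; <code>ToyFrame</code>, <code>fire</code> and the string <code>EVT_MENU</code> are hypothetical names made up for illustration, not part of wxPython&#8217;s API or its internal implementation.</p>
<pre name="code" class="python">
# Toy model of event binding (Python 3). It is NOT how wxPython implements
# events; EVT_MENU here is a hypothetical stand-in for wx.EVT_MENU.
EVT_MENU = "menu"


class ToyFrame:
    """Keeps a table of (event type, source) pairs and their handlers."""

    def __init__(self):
        self._handlers = {}

    def Bind(self, event_type, handler, source):
        # Same argument order as self.Bind(EVENT_TYPE, handler, source);
        # in this toy, binding the same pair again replaces the old handler.
        self._handlers[(event_type, source)] = handler

    def fire(self, event_type, source):
        # In a real wxPython program, MainLoop does this dispatching for us.
        handler = self._handlers.get((event_type, source))
        if handler is not None:
            handler({"type": event_type, "source": source})


class MotifFrame(ToyFrame):
    def __init__(self):
        super().__init__()
        self.clicked = []
        # Strings stand in for the menu items returned by filemenu.Append().
        self.Bind(EVT_MENU, self.on_foreground, "foreground_menu")
        self.Bind(EVT_MENU, self.on_background, "background_menu")

    def on_foreground(self, event):
        self.clicked.append("foreground")

    def on_background(self, event):
        self.clicked.append("background")


frame = MotifFrame()
frame.fire(EVT_MENU, "background_menu")
frame.fire(EVT_MENU, "foreground_menu")
frame.fire(EVT_MENU, "quitmenu")  # nothing is bound to it, so nothing happens
print(frame.clicked)  # ['background', 'foreground']
</pre>
<p>Because Python evaluates <code>self.on_foreground</code> at the moment <code>Bind</code> is called, a missing handler method fails right away with an AttributeError, in this toy model just as in the real frame.</p>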
<p>Next time we will make good use of the events.</p>
]]></content:encoded>
			<wfw:commentRss>http://python.genedrift.org/2008/10/29/creating-an-interface-for-the-motif-finding-script-part-4/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Creating an interface for the motif finding script, part 3</title>
		<link>http://python.genedrift.org/2008/10/22/creating-an-interface-for-the-motif-finding-script-part-3/</link>
		<comments>http://python.genedrift.org/2008/10/22/creating-an-interface-for-the-motif-finding-script-part-3/#comments</comments>
		<pubDate>Wed, 22 Oct 2008 20:30:13 +0000</pubDate>
		<dc:creator>Paulo Nuin</dc:creator>
				<category><![CDATA[Section 2]]></category>
		<category><![CDATA[motifs]]></category>
		<category><![CDATA[wxPython]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[interface]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://python.genedrift.org/?p=178</guid>
		<description><![CDATA[Today we will add some elements to our interface. Looking at the previous screencap it is easy to conclude that our interface needs a lot of work to be ready. First, it has a dark gray background that does not resemble the usual window background (it looks more like a MDI frame). We need to [...]]]></description>
			<content:encoded><![CDATA[<p>Today we will add some elements to our interface. Looking at the previous screencap it is easy to conclude that our interface needs a lot of work to be ready. First, it has a dark gray background that does not resemble the usual window background (it looks more like an MDI frame). We need to change that. Also, there are no menu bars, menus or tool bars. It&#8217;s pretty bare bones, and not exactly good or useful.</p>
<p>There are many ways of customizing the look of a window/frame in wxPython, and two of them are adding a panel to the frame or using the so-called sizers. The latter is harder to master, but powerful and very good for arranging objects and customizing the look and feel of a window. Adding a panel and then placing objects on it is a more laborious process, but easier to understand. We will start by adding the <a href="http://www.wxpython.org/docs/api/wx.Panel-class.html">panel</a> to our <code>__do_layout</code> function (where most of our changes will happen for now).</p>
<p>Basically, only one line is required:</p>
<pre name="code" class="python">
#adding the panel
panel = wx.Panel(self)
</pre>
<p>That&#8217;s it: wx.Panel only needs one parameter, its parent (the window the panel is being added to). The name <code>panel</code> is the one we will use to access the methods and properties of the wx.Panel instance we just created.</p>
<p>Adding the menu requires a little more code. As in wxWidgets, which it wraps, wxPython divides the menu into parts: the menubar is a wx.MenuBar, and each menu (File, Edit, etc.) is a wx.Menu to which the entries are added. At the end, each wx.Menu is appended to the menubar. In our case we first have to initialize a menubar</p>
<pre name="code" class="python">
#defines the menubar
menubar = wx.MenuBar()
</pre>
<p>and then initialize a menu element, which we will call filemenu and will be labeled File</p>
<pre name="code" class="python">
#file menu
filemenu = wx.Menu()
</pre>
<p>This only initializes a menu element named <code>filemenu</code>; it won&#8217;t add anything anywhere. Since we didn&#8217;t do any planning on what our interface would look like (no UML, no case studies, nothing!), we need at least three menu items: one to open/set the foreground sequence file, one to open/set the background sequence file and one to quit the application. So we are going to append these items to <code>filemenu</code></p>
<pre name="code" class="python">
foreground_menu = filemenu.Append(-1, &#039;Select foreground file&#039;)
background_menu = filemenu.Append(-1, &#039;Select background file&#039;)
sep = filemenu.AppendSeparator()
quit_menu = filemenu.Append(-1, &#039;Quit&#039;)
</pre>
<p>That simple. The first two lines append the items that open/set files, and the last one appends the Quit item. The first parameter is an ID; as we saw previously, when our code doesn&#8217;t need a specific ID we pass -1 and let wxPython assign one. The second parameter is the label of the menu item. The item <code>sep</code> is a separator, keeping the file open/set items apart from the quit item. The final step is to append our wx.Menu to the menubar and set the menubar on the frame. We accomplish that with </p>
<pre name="code" class="python">
#appends the menu to the menubar and creates it
menubar.Append(filemenu, &#039;File&#039;)
self.SetMenuBar(menubar)
</pre>
<p>The last line, <code>self.SetMenuBar(menubar)</code>, attaches the menubar to <code>self</code>, that is, pymotGUI, our main window. Putting everything together our code would look like</p>
<pre name="code" class="python">
#!/usr/bin/env python

import wx
import fasta

class pymot(wx.App):

    def __init__(self, redirect=False):
        wx.App.__init__(self, redirect)

class pymotGUI(wx.Frame):

    def __init__(self, parent, id):

        wx.Frame.__init__(self, parent, id,  &#039;Python Motif Finder&#039;, style=wx.DEFAULT_FRAME_STYLE)
        self.__do_layout()
#        self.__do_binding()

    def __do_layout(self):

        #adding the panel
        panel = wx.Panel(self)

        #defines the menubar
        menubar = wx.MenuBar()

        #file menu
        filemenu = wx.Menu()
        foreground_menu = filemenu.Append(-1, &#039;Select foreground file&#039;)
        background_menu = filemenu.Append(-1, &#039;Select background file&#039;)
        sep = filemenu.AppendSeparator()
        quit_menu = filemenu.Append(-1, &#039;Quit&#039;)

        #appends the menu to the menubar and creates it
        menubar.Append(filemenu, &#039;File&#039;)
        self.SetMenuBar(menubar)

if __name__ == &#039;__main__&#039;:
    app = pymot()
    frame = pymotGUI(parent=None, id = -1)
    #frame.CentreOnScreen()
    frame.Show()
    app.MainLoop()
</pre>
<p>and this would look like the screencap below (on Vista).</p>
<p><a href="http://python.genedrift.org/wordpress/wp-content/uploads/2008/10/gui2.png"><img src="http://python.genedrift.org/wordpress/wp-content/uploads/2008/10/gui2-150x150.png" alt="gui2" title="gui2" width="150" height="150" class="aligncenter size-thumbnail wp-image-179" /></a></p>
<p>Next time we will work on more elements and activate the menu items.</p>
]]></content:encoded>
			<wfw:commentRss>http://python.genedrift.org/2008/10/22/creating-an-interface-for-the-motif-finding-script-part-3/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Creating an interface for the motif finding script, part 2</title>
		<link>http://python.genedrift.org/2008/10/21/creating-an-interface-for-the-motif-finding-script-part-2/</link>
		<comments>http://python.genedrift.org/2008/10/21/creating-an-interface-for-the-motif-finding-script-part-2/#comments</comments>
		<pubDate>Tue, 21 Oct 2008 16:40:33 +0000</pubDate>
		<dc:creator>Paulo Nuin</dc:creator>
				<category><![CDATA[motifs]]></category>
		<category><![CDATA[wxPython]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[interface]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://python.genedrift.org/?p=176</guid>
		<description><![CDATA[Let&#8217;s take a deeper look on the code we started yesterday, piece by piece

class pymot(wx.App):
    def __init__(self, redirect=False):
        wx.App.__init__(self, redirect)

This is the class pymot we derived from wx.App, and this will be the main class for your application. As any other class derived it [...]]]></description>
			<content:encoded><![CDATA[<p>Let&#8217;s take a deeper look on the code we started yesterday, piece by piece</p>
<pre name="code" class="python">
class pymot(wx.App):
    def __init__(self, redirect=False):
        wx.App.__init__(self, redirect)
</pre>
<p>This is the class <code>pymot</code> we derived from wx.App, and it will be the main class of our application. As a derived class, it can define an OnInit or __init__ method to take care of initializing things. As usual, we pass <code>self</code> and a <code>redirect</code> parameter, which tells wxPython whether to redirect the program&#8217;s standard output and error messages to a separate window instead of the console. We don&#8217;t actually need <code>redirect</code> now, but it can be useful in the future to track errors. It&#8217;s set to False as we don&#8217;t need it yet.</p>
<pre name="code" class="python">
class pymotGUI(wx.Frame):
    def __init__(self, parent, id):
        wx.Frame.__init__(self, parent, id,  &#039;Python Motif Finder&#039;, style=wx.DEFAULT_FRAME_STYLE)
        self.__do_layout()

    def __do_layout(self):
        pass
</pre>
<p>This is the pymotGUI class, derived in this case from wx.Frame. A wx.Frame is the ordinary top-level window you see in most operating systems. Here the __init__ method initializes the window (but does not show it). In the first line of __init__ we call the wx.Frame constructor to set up the window we want to display. The constructor accepts these <a href="http://www.wxpython.org/docs/api/wx.Frame-class.html">parameters</a> to customize the window</p>
<pre name="code" class="python">
__init__(self, parent, id, title, pos, size, style, name)
</pre>
<p>In our frame definition we pass a fixed title and style (they can still be changed), and the remaining parameters keep their default values; once the frame is created and initialized, other properties can be set or changed. There is a second function defined in the <code>pymotGUI</code> class, <code>__do_layout</code>. It reflects my personal preference of grouping all the layout code for the window in one function, which helps organize the code and makes it easier to browse and correct if needed.</p>
<p>Most of the main part of the script could be moved to the wx.App class derivation, but for now, we can keep it there.</p>
<pre name="code" class="python">
app = pymot()
frame = pymotGUI(parent=None, id = -1)
frame.Show()
app.MainLoop()
</pre>
<p>The first line initializes the application and the second creates and initializes the frame. The <code>Show</code> method displays the window, and <code>MainLoop</code>, which we saw last time, starts processing events. </p>
<p>The skeleton of a wxPython script and application is very simple. Now we need to populate our window: create menus and buttons and, especially, handle events. Next time we will add a menu to the frame and see how events are linked to elements.</p>
]]></content:encoded>
			<wfw:commentRss>http://python.genedrift.org/2008/10/21/creating-an-interface-for-the-motif-finding-script-part-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Creating an interface for the motif finding script, part 1</title>
		<link>http://python.genedrift.org/2008/10/20/creating-an-interface-for-the-motif-finding-script/</link>
		<comments>http://python.genedrift.org/2008/10/20/creating-an-interface-for-the-motif-finding-script/#comments</comments>
		<pubDate>Mon, 20 Oct 2008 21:31:50 +0000</pubDate>
		<dc:creator>Paulo Nuin</dc:creator>
				<category><![CDATA[motifs]]></category>
		<category><![CDATA[wxPython]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[interface]]></category>

		<guid isPermaLink="false">http://python.genedrift.org/?p=171</guid>
		<description><![CDATA[And we are back. After much ado about real life, I am able to &#8220;restart&#8221; this blog and probably with a good frequency of posts. Last time we saw the final product of our motif finding series. We ended up creating a very elegant script in Python that efficiently counts words in FASTA sequences and [...]]]></description>
			<content:encoded><![CDATA[<p>And we are back. After much ado about real life, I am able to &#8220;restart&#8221; this blog, probably with a good frequency of posts. Last time we saw the final product of our motif finding series. We ended up creating a very elegant script in Python that efficiently counts words in FASTA sequences and then, using a basic statistical method, calculates the significance of each word and outputs the overrepresented ones.</p>
<p>Our script used a little less than 50 lines, and even including the imported fasta module, it doesn&#8217;t top 100. But the number of lines is not important; efficiency, clarity and speed are key here. At the same time, running a script from the command line is not something everyone is used to doing. To make our simple script more accessible, why not add a GUI? With a visual interface, more people can use our script, on different systems. Sounds great.</p>
<p>Python has many GUI framework options, some more cross-platform than others. In the end, finding the right framework is more a matter of taste, or availability. My personal experience with <a href="http://www.wxwidgets.org">wxWidgets</a> led me to start developing in <a href="http://www.wxpython.org">wxPython</a>, and for me this was a natural choice. But there are many other GUI frameworks for Python, each one providing more or less integration and portability (you can &#8220;choose&#8221; your own <a href="http://www.awaretek.com/toolkits.html">here</a>).</p>
<p>So, let&#8217;s create a skeleton for our GUI. The first step is to install wxPython. Packages are available from its website: installers for Windows, RPMs for Linux and DMGs for Macs (I&#8217;m quite sure OS X Leopard comes with wxPython by default; just test importing it). After installing it, start Python and check if everything is in place</p>
<pre name="code" class="python">
import wx
wx.__version__
</pre>
<p>On my machine, I get no errors and the version is 2.8.9.1 (you don&#8217;t need the latest version to create the GUI). Everything seems to be fine. A wxPython script has the same format as any Python script; the only difference is that its output is not directed to the prompt or a file. The script&#8217;s product will be the screen, so in most cases the output and program usage will depend on the user&#8217;s interaction with objects on the screen, like any other graphical interface. A very simple script would look like</p>
<pre name="code" class="python">
#!/usr/bin/env python

import wx

class pymot(wx.App):

    def __init__(self, redirect=False):
        wx.App.__init__(self, redirect)

class pymotGUI(wx.Frame):

    def __init__(self, parent, id):
        wx.Frame.__init__(self, parent, id,  &#039;Python Motif Finder&#039;, style=wx.DEFAULT_FRAME_STYLE)
        self.__do_layout()

    def __do_layout(self):
        pass

app = pymot()
frame = pymotGUI(parent=None, id = -1)
frame.Show()
app.MainLoop()
</pre>
<p>Usually a wxPython interface script has three parts: a class for the window/frame/dialog, a class for the application and an initialization routine. Every wxPython application needs a wx.App object (here we derive our own class from wx.App) that is initialized (in its OnInit or __init__ method) before the windows are created and the program begins. Another class, derived from wx.Frame in this case, builds the window/frame/dialog <i>per se</i> and also contains the initialization for the window, objects, events, etc. The last part is the main script, where the application is started by instantiating the derived class, and the window is created and shown. The last line is the <code>MainLoop</code> call, present in every wxPython script; it is the heart of the application. MainLoop processes all the events and manages how the objects interact by receiving and dispatching such events. </p>
<p>The script above could have been written differently: some lines could be omitted, and there is no need to derive a specific class for the frame. But this way it is easier to get a grasp of the script, as it will need to grow to accommodate the objects and maybe a couple of extra windows and dialogs. Running the above script will generate the window below</p>
<p><a href="http://python.genedrift.org/wordpress/wp-content/uploads/2008/10/gui1.png"><img src="http://python.genedrift.org/wordpress/wp-content/uploads/2008/10/gui1-150x150.png" alt="First screencap of our GUI" title="First screencap of our GUI" width="150" height="150" class="aligncenter size-thumbnail wp-image-172" /></a></p>
<p>Very simple and bare bones. Next time we will explore the script above, include some extra elements and learn a little more about wxPython.</p>
]]></content:encoded>
			<wfw:commentRss>http://python.genedrift.org/2008/10/20/creating-an-interface-for-the-motif-finding-script/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Python, overepresented motifs, the Grand Finale</title>
		<link>http://python.genedrift.org/2008/09/05/python-overepresented-motifs-the-grand-finale/</link>
		<comments>http://python.genedrift.org/2008/09/05/python-overepresented-motifs-the-grand-finale/#comments</comments>
		<pubDate>Sat, 06 Sep 2008 02:27:56 +0000</pubDate>
		<dc:creator>Paulo Nuin</dc:creator>
				<category><![CDATA[Phase 2]]></category>
		<category><![CDATA[motifs]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[determination]]></category>
		<category><![CDATA[Motif]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://python.genedrift.org/?p=163</guid>
		<description><![CDATA[In this final part, let&#8217;s do some very simple refactoring and modify the output section to make the result a little bit better. There are not many options about the functions to calculate the binomial expansion. But Andrew posted some opinions on how to slight change the quorum function.

def get_quorums(seqs, mlen):
    &#34;&#34;&#34;
 [...]]]></description>
			<content:encoded><![CDATA[<p>In this final part, let&#8217;s do some very simple refactoring and modify the output section to make the result a little better. There are not many alternatives for the functions that calculate the <a href="http://en.wikipedia.org/wiki/Binomial_theorem" title="Binomial theorem" rel="wikipedia" class="zem_slink">binomial coefficients</a>. But Andrew posted some suggestions on how to slightly change the quorum function.</p>
<pre name="code" class="python">
from collections import defaultdict

def get_quorums(seqs, mlen):
    &quot;&quot;&quot;
    count how many times each word of length mlen
    occurs across all sequences
    &quot;&quot;&quot;
    quorum = defaultdict(int)
    for seq in seqs:
        # + 1 so that the last word of each sequence is included
        for n in range(len(seq) - mlen + 1):
            quorum[seq[n:n + mlen]] += 1
    return quorum
</pre>
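<p>Switching from a set of sequence IDs to an integer counter also changes what is being counted: a word that occurs twice in the same sequence now adds 2 instead of 1. The Python 3 sketch below, with made-up sequences, compares the two ways of counting; <code>count_occurrences</code> and <code>count_sequences</code> are hypothetical names used only for this comparison.</p>
<pre name="code" class="python">
from collections import defaultdict


def count_occurrences(seqs, mlen):
    """Total number of times each word of length mlen occurs (integer counter)."""
    counts = defaultdict(int)
    for seq in seqs:
        # range(len(seq) - mlen + 1) visits every word, including the last one
        for n in range(len(seq) - mlen + 1):
            counts[seq[n:n + mlen]] += 1
    return counts


def count_sequences(seqs, mlen):
    """Number of distinct sequences containing each word (set of sequence IDs)."""
    seen = defaultdict(set)
    for seq_no, seq in enumerate(seqs):
        for n in range(len(seq) - mlen + 1):
            seen[seq[n:n + mlen]].add(seq_no)
    return {word: len(ids) for word, ids in seen.items()}


seqs = ["ACGTACGT", "TTACGTT"]
print(count_occurrences(seqs, 4)["ACGT"])  # 3: twice in the first sequence, once in the second
print(count_sequences(seqs, 4)["ACGT"])    # 2: it appears in both sequences
</pre>
<p>Both functions see the same words; they only differ in whether repeats inside a single sequence are counted once or every time.</p>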
<p>His modifications were small but improved the code a bit, since they remove one variable/object from the function. At the same time, we need to change our output section slightly, because <code>foreground</code> and <code>background</code> are no longer <code>defaultdict</code>s of sets but of integers.</p>
<pre name="code" class="python">
for i in foreground:
    term1 = choose(background[i], foreground[i])
    term2 = choose((N - background[i]), len(input_seqs)-1)
    term3 = choose(N, len(input_seqs))
    p = (float(term1) * float(term2)) / term3
    if 0 &lt; p &lt;= 0.0001:
        print i, foreground[i], background[i], p
</pre>
<p>Notice that in the <code>term1</code> line we no longer take the length of a set and just use the integer stored in <code>foreground</code> and <code>background</code>. Again a small change, but it makes the code a little clearer. We also need to modify this section so the output is easier to read, for instance ordered by motif sequence.</p>
<p>Because we loop over a dictionary, our results come out in no particular order. It would be great to have a final list starting with AAAAAAAAAA and ending with TTTTTTTTTT. There is an easy way to do that, and it is very inexpensive in terms of code and performance: we append each motif (and its extra information) to a list and use the list&#8217;s <code>sort</code> method. So our output section of the code will be</p>
<pre name="code" class="python">
res_motifs = []
for i in foreground:
    term1 = choose(background[i], foreground[i])
    term2 = choose((N - background[i]), len(input_seqs) - foreground[i])
    term3 = choose(N, len(input_seqs))
    p = (float(term1) * float(term2)) / term3
    if 0 &lt; p &lt;= 0.0001:
        res_motifs.append(i + &#039;\t&#039; + str(foreground[i]) + &#039;\t&#039; + str(background[i]) + &#039;\t&#039; + str(p))

res_motifs.sort()
for i in res_motifs:
    print i
</pre>
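<p>Joining the fields into strings works, but keeping each result as a tuple and formatting it only at print time makes other orderings easy too, for example listing the most significant motifs first. A sketch with made-up rows (the values are illustrative only, and <code>format_row</code> is a hypothetical helper):</p>
<pre name="code" class="python">
from operator import itemgetter

# Made-up result rows: (motif, foreground quorum, background quorum, p)
rows = [
    ("TTTTTTTTTT", 4, 1, 2.5e-05),
    ("AAAAAAAAAA", 5, 2, 8.0e-05),
    ("CCCCCCCCCC", 6, 1, 1.2e-06),
]

def format_row(row):
    # Tab-separated line, p value shown with two significant digits
    motif, fg, bg, p = row
    return "\t".join([motif, str(fg), str(bg), "%.2g" % p])

# Tuples compare field by field, so plain sorting orders by motif first
by_motif = [format_row(r) for r in sorted(rows)]

# Sorting on the fourth field (the p value) lists the smallest p first
by_p = [format_row(r) for r in sorted(rows, key=itemgetter(3))]

for line in by_p:
    print(line)
</pre>
<p>Sorting tuples also avoids converting numbers to text before ordering, so a numeric field such as the p value sorts numerically rather than as a string.</p>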
<p>Putting everything together our final motif determination script is (batteries included):</p>
<pre name="code" class="python">
#!/usr/bin/env python

import fasta
import sys
from collections import defaultdict

def choose(n, k):
    if 0 &lt;= k &lt;= n:
        ntok = 1
        ktok = 1
        for t in xrange(1, min(k, n - k) + 1):
            ntok *= n
            ktok *= t
            n -= 1
        return ntok // ktok
    else:
        return 0

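def get_quorums(seqs, mlen):
    &quot;&quot;&quot;
    count, for each motif, the number of sequences containing it
    (a set makes each sequence count at most once per motif)
    &quot;&quot;&quot;
    quorum = defaultdict(int)
    for seq in seqs:
        # the range ends at len(seq) - mlen + 1 so the last window is included
        for motif in set(seq[n:n + mlen] for n in range(len(seq) - mlen + 1)):
            quorum[motif] += 1
    return quorum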

input_seqs = fasta.read_seqs(open(sys.argv[1]).readlines())
input_seqs2 = fasta.read_seqs(open(sys.argv[2]).readlines())

foreground = get_quorums(input_seqs, 10)
background = get_quorums(input_seqs2, 10)

N = len(input_seqs) + len(input_seqs2)

res_motifs = []
for i in foreground:
    term1 = choose(background[i], foreground[i])
    term2 = choose((N - background[i]), len(input_seqs) - foreground[i])
    term3 = choose(N, len(input_seqs))
    p = (float(term1) * float(term2)) / term3
    if 0 &lt; p &lt;= 0.0001:
        res_motifs.append(i + &#039;\t&#039; + str(foreground[i]) + &#039;\t&#039; + str(background[i]) + &#039;\t&#039; + str(p))

res_motifs.sort()
for i in res_motifs:
    print i
</pre>
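<p>The script relies on the blog&#8217;s own <code>fasta</code> module, which is not shown in this post. If you want to try it without that module, a minimal reader like the hypothetical <code>read_fasta_seqs</code> below could stand in for <code>fasta.read_seqs</code>, assuming the script only needs a list of plain sequence strings (the upper-casing is an extra assumption of this sketch):</p>
<pre name="code" class="python">
def read_fasta_seqs(lines):
    """Hypothetical stand-in for fasta.read_seqs: return only the
    sequences (headers dropped) from the lines of a FASTA file."""
    seqs = []
    current = []
    for line in lines:
        line = line.strip()
        if not line:
            continue                  # skip blank lines
        if line.startswith(">"):      # a header line starts a new record
            if current:
                seqs.append("".join(current).upper())
            current = []
        else:
            current.append(line)      # a sequence may span several lines
    if current:
        seqs.append("".join(current).upper())
    return seqs

example = [">seq1\n", "ACGT\n", "acgt\n", ">seq2\n", "TTTT\n"]
print(read_fasta_seqs(example))  # ['ACGTACGT', 'TTTT']
</pre>
<p>It would be called the same way, for example <code>input_seqs = read_fasta_seqs(open(sys.argv[1]).readlines())</code>.</p>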
<p>Next we will look at some basic Python methods, and maybe start a new series and phase.</p>
]]></content:encoded>
			<wfw:commentRss>http://python.genedrift.org/2008/09/05/python-overepresented-motifs-the-grand-finale/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Obtaining overrepresented motifs in DNA sequences, part 13</title>
		<link>http://python.genedrift.org/2008/08/20/obtaining-overrepresented-motifs-in-dna-sequences-part-13/</link>
		<comments>http://python.genedrift.org/2008/08/20/obtaining-overrepresented-motifs-in-dna-sequences-part-13/#comments</comments>
		<pubDate>Thu, 21 Aug 2008 02:32:09 +0000</pubDate>
		<dc:creator>Paulo Nuin</dc:creator>
				<category><![CDATA[Phase 2]]></category>
		<category><![CDATA[Section 3]]></category>
		<category><![CDATA[Section 5]]></category>
		<category><![CDATA[motifs]]></category>
		<category><![CDATA[defaultdict]]></category>
		<category><![CDATA[determination]]></category>
		<category><![CDATA[dna]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://python.genedrift.org/?p=149</guid>
		<description><![CDATA[Now that we have the best quorum determination function and the ideal function to calculate the binomial expansions it is easy to program a script to calculate the p value of motifs in DNA sequences. To the script
below in the code there are a couple of errors that wordpress don&#8217;t let me fix. The &#62; [...]]]></description>
			<content:encoded><![CDATA[<p>Now that we have the best quorum determination function and the ideal function to calculate the <a href="http://en.wikipedia.org/wiki/Binomial_theorem" title="Binomial theorem" rel="wikipedia" class="zem_slink">binomial expansions</a>, it is easy to write a script that calculates the <em>p</em> value of motifs in DNA sequences. Here is the script:</p>
<pre name="code" class="python">
#!/usr/bin/env python

import fasta
import sys
from collections import defaultdict

def choose(n, k):
    if 0 &lt;= k &lt;= n:
        ntok = 1
        ktok = 1
        for t in xrange(1, min(k, n - k) + 1):
            ntok *= n
            ktok *= t
            n -= 1
        return ntok // ktok
    else:
        return 0

def get_quorums(seqs, mlen):
    &quot;&quot;&quot;
    add seq id_no to a set
    use explicit counter to create seq_no
    &quot;&quot;&quot;
    quorum = defaultdict(set)
    id_no = 0
    for seq in seqs:
        id_no += 1
        for n in range(len(seq) - mlen + 1):
            quorum[seq[n:n + mlen]].add(id_no)
    return quorum

input_seqs = fasta.read_seqs(open(sys.argv[1]).readlines())
input_seqs2 = fasta.read_seqs(open(sys.argv[2]).readlines())

foreground = get_quorums(input_seqs, 10)
background = get_quorums(input_seqs2, 10)

N = len(input_seqs) + len(input_seqs2)

for i in foreground:
    term1 = choose(len(background[i]), len(foreground[i]))
    term2 = choose((N - len(background[i])), len(input_seqs) - len(foreground[i]))
    term3 = choose(N, len(input_seqs))
    p = (float(term1) * float(term2)) / term3
    if 0 &lt; p &lt;= 0.0001:
        print i, len(foreground[i]), len(background[i]), p
</pre>
<p>We already defined <code>choose</code> in the last post (see the link there to the Python Cookbook for more information). Earlier, Mike sent us a series of quorum-determination functions, and one of the best was presented and explained <a href="http://python.genedrift.org/2008/06/03/obtaining-overrepresented-motifs-in-dna-sequences-part-7/">here</a>. We also need our fasta module to read the sequences (and only the sequences) so that they can be passed to the quorum function.</p>
<p>Basically, we use the foreground and background files as input, determine the quorum of the different words (width 10) and then iterate over the results, calculating the <em>p</em> value for each motif found in the foreground set. The three terms of the Hypergeometric Distribution are calculated separately, and only motifs with a <em>p</em> value greater than 0 and at most 0.0001 (a cut-off that can be modified) are printed.</p>
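<p>As a sanity check on the three terms, the sketch below computes the textbook single-term hypergeometric probability, C(k, x) C(N - k, m - x) / C(N, m), exactly with <code>fractions.Fraction</code> on toy numbers. <code>comb</code> and <code>hypergeom_term</code> are illustrative names, not part of the script; note that the second coefficient uses m - x, as in the standard form. Summed over every possible x, the probabilities should add up to exactly 1.</p>
<pre name="code" class="python">
from fractions import Fraction
from math import factorial

def comb(n, k):
    # Binomial coefficient from factorials; 0 when k is outside 0..n,
    # which is also what choose() returns in that case
    if k not in range(n + 1):
        return 0
    return factorial(n) // (factorial(k) * factorial(n - k))

def hypergeom_term(N, m, k, x):
    # P(X = x) with N population size, m foreground size,
    # k sequences with the motif, x foreground quorum
    return Fraction(comb(k, x) * comb(N - k, m - x), comb(N, m))

N, m, k = 10, 4, 3
terms = [hypergeom_term(N, m, k, x) for x in range(m + 1)]
print(terms[2])    # 3/10
print(sum(terms))  # 1
</pre>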
]]></content:encoded>
			<wfw:commentRss>http://python.genedrift.org/2008/08/20/obtaining-overrepresented-motifs-in-dna-sequences-part-13/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Obtaining overrepresented motifs in DNA sequences, part 10</title>
		<link>http://python.genedrift.org/2008/06/04/obtaining-overrepresented-motifs-in-dna-sequences-part-10/</link>
		<comments>http://python.genedrift.org/2008/06/04/obtaining-overrepresented-motifs-in-dna-sequences-part-10/#comments</comments>
		<pubDate>Wed, 04 Jun 2008 16:33:51 +0000</pubDate>
		<dc:creator>Paulo Nuin</dc:creator>
				<category><![CDATA[Phase 2]]></category>
		<category><![CDATA[motifs]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[overrepresented]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://python.genedrift.org/?p=112</guid>
		<description><![CDATA[Let&#8217;s get back to the statistical module, that will calculate an Hypergeometric Distribution (HD) p value so we can define the overrepresented motifs. Last time we saw it, we just had defined the factorial function, which is immensely helpful in this case due to the number of factorial calculations needed in the HD. The factorial [...]]]></description>
			<content:encoded><![CDATA[<p>Let&#8217;s get back to the statistical module, which will calculate a Hypergeometric Distribution (HD) <i>p</i> value so we can identify the overrepresented motifs. When we last saw it, we had just defined the <a href="http://en.wikipedia.org/wiki/Factorial" title="Factorial" rel="wikipedia" class="zem_slink">factorial function</a>, which is essential here because of the number of factorials needed in the HD. The factorial function was the one below</p>
<pre name="code" class="python">

def fac(n):
    value = reduce(lambda i, j: i * j, range(1, n + 1), 1)
    return value
</pre>
<p>but as mentioned in the comments by <a href="http://python.genedrift.org/2008/05/21/obtaining-overrepresented-motifs-in-dna-sequences-part-iv/#comment-13532">Dave</a> and by Mike via email, this is not the best way to calculate a factorial in <a href="http://python.org/" title="Python (programming language)" rel="homepage" class="zem_slink">Python</a>. The best approach in this case is to use <code>operator.mul</code>. The functions in the <code>operator</code> module are implemented in C and correspond to Python&#8217;s own operators: there we find <code>mul</code> for multiplication, <code>sub</code> for subtraction, <code>add</code> for addition, etc. </p>
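<p><code>operator.mul</code> takes exactly two arguments, so we still need <code>reduce</code> to apply it cumulatively and turn a series of multiplications into a single product. As the sequence we can use an <code>xrange</code> that starts at 2 (multiplying by 1 changes nothing) and stops at the number whose factorial we want plus one, since the upper bound is excluded. Passing 1 as the initial value of <code>reduce</code> also makes <code>fac(0)</code> and <code>fac(1)</code> return 1, instead of raising a <code>TypeError</code> on an empty sequence. Finally our function would be </p>
<pre name="code" class="python">

import operator

def fac(n):
    # 1 is the starting value, so fac(0) and fac(1) both return 1
    value = reduce(operator.mul, xrange(2, n + 1), 1)
    return value
</pre>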
<p>The time gain, quickly measured in a non-scientific fashion on my system, is around 5 to 15%, depending on the factorial being calculated. It may seem a small gain, but when you need to calculate almost a million factorials for all possible motifs the amount of time saved is crucial. Next time we will be back with more statistics, expanding the module.</p>
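<p>To repeat the non-scientific comparison on your own machine, something like the following <code>timeit</code> sketch could be used. Timings depend on the machine and Python version, so no numbers are shown; <code>math.factorial</code> (available since Python 2.6) is included only as an extra reference point, and the <code>functools</code> import lets the same code run on Python 3, where <code>reduce</code> is no longer a built-in.</p>
<pre name="code" class="python">
import operator
import timeit
from functools import reduce  # built in on Python 2, needed on Python 3
from math import factorial

def fac_lambda(n):
    return reduce(lambda i, j: i * j, range(1, n + 1), 1)

def fac_mul(n):
    return reduce(operator.mul, range(2, n + 1), 1)

# The versions must agree before their speed is worth comparing
assert all(fac_lambda(n) == fac_mul(n) == factorial(n) for n in range(50))

for func in (fac_lambda, fac_mul, factorial):
    # timeit accepts a callable; each run computes 500! once
    seconds = timeit.timeit(lambda: func(500), number=2000)
    print("%-10s %.3f s" % (func.__name__, seconds))
</pre>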
]]></content:encoded>
			<wfw:commentRss>http://python.genedrift.org/2008/06/04/obtaining-overrepresented-motifs-in-dna-sequences-part-10/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Obtaining overrepresented motifs in DNA sequences, part 9</title>
		<link>http://python.genedrift.org/2008/06/03/obtaining-overrepresented-motifs-in-dna-sequences-part-8/</link>
		<comments>http://python.genedrift.org/2008/06/03/obtaining-overrepresented-motifs-in-dna-sequences-part-8/#comments</comments>
		<pubDate>Tue, 03 Jun 2008 18:59:28 +0000</pubDate>
		<dc:creator>Paulo Nuin</dc:creator>
				<category><![CDATA[Phase 2]]></category>
		<category><![CDATA[motifs]]></category>
		<category><![CDATA[Generator]]></category>

		<guid isPermaLink="false">http://python.genedrift.org/?p=111</guid>
		<description><![CDATA[Back on new functions for motif quorums. We jump function 7 in order to explain &#8220;simpler&#8221; ones, 8 and 9. Both functions use generators. We&#8217;ve already seen here generators, which are functions that use the yield statement to generate iterators. The generator is very similar to a function but instead of returning a value, it [...]]]></description>
			<content:encoded><![CDATA[<p>Back to the functions for motif quorums. We skip function 7 for now in order to explain the &#8220;simpler&#8221; ones, 8 and 9. Both functions use <code>generators</code>. We&#8217;ve already seen <code>generators</code> here: they are functions that use the <code>yield</code> statement to produce iterators. A generator looks very much like a function, but instead of returning a value it yields one and pauses until the next value is requested. In function 8, a generator produces the motif sequences that are used as keys in the defaultdict. Notice that the generator is defined inside the function, so it can use <code>mlen</code> from the enclosing scope.</p>
<pre name="code" class="python">
def get_quorums_08(seqs, mlen):
    &quot;&quot;&quot;
    add seq id_no to a set
    use enumerate to create seq_no
    use an explicit generator to create the motifs
    &quot;&quot;&quot;
    def motif_gen(seq):
        for n in range(len(seq) - mlen + 1):
            yield seq[n:n+mlen]

    quorum = defaultdict(set)
    for id_no, seq in enumerate(seqs):
        for motif in motif_gen(seq):
            quorum[motif].add(id_no)

    return quorum
</pre>
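<p>To see the &#8220;pause and resume&#8221; behaviour on its own, here is a stand-alone copy of the inner generator. Unlike the version inside <code>get_quorums_08</code>, it takes <code>mlen</code> as a parameter (an illustrative change), and its range ends at <code>len(seq) - mlen + 1</code> so that the last window is included.</p>
<pre name="code" class="python">
def motif_gen(seq, mlen):
    # Yields one window at a time; nothing runs until a value is requested
    for n in range(len(seq) - mlen + 1):
        yield seq[n:n + mlen]

gen = motif_gen("ACGTAC", 3)
print(next(gen))  # ACG (runs up to the first yield, then pauses)
print(next(gen))  # CGT (resumes right after the yield)
print(list(gen))  # ['GTA', 'TAC'] (the remaining windows)
</pre>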
<p>In function 9 a very similar structure is used, but in this case, instead of a &#8220;pure&#8221; <code>generator</code>, it uses a <code>generator expression</code>, which is very similar to a <a href="http://python.genedrift.org/2008/03/11/fasta-module-generating-reverse-complement-of-dna-sequences/">list comprehension</a> but written with parentheses instead of square brackets. <code>Generator expressions</code> are generators that fit in one line and behave just like generators written the &#8220;regular&#8221; way. In the function below the <code>generator expression</code> provides the iterator for the loop, with <code>motif</code> as the loop variable. Very simple and elegant.</p>
<pre name="code" class="python">
def get_quorums_09(seqs, mlen):
    &quot;&quot;&quot;
    add seq id_no to a set
    use enumerate to create seq_no
    use a generator expression to create the motifs
    &quot;&quot;&quot;
    quorum = defaultdict(set)
    for id_no, seq in enumerate(seqs):
        for motif in (seq[n:n+mlen] for n in range(len(seq) - mlen + 1)):
            quorum[motif].add(id_no)

    return quorum
</pre>
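<p>The difference between a list comprehension and a generator expression is easy to check interactively. In this small sketch both build the same windows, but the generator expression produces them only on demand and can be consumed only once:</p>
<pre name="code" class="python">
import types

seq, mlen = "ACGTAC", 3
as_list = [seq[n:n + mlen] for n in range(len(seq) - mlen + 1)]
as_gen = (seq[n:n + mlen] for n in range(len(seq) - mlen + 1))

print(type(as_list).__name__)                   # list
print(isinstance(as_gen, types.GeneratorType))  # True
# Same windows, but computed only when asked for...
print(list(as_gen) == as_list)  # True
# ...and once consumed, the generator is empty
print(list(as_gen))             # []
</pre>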
<p>In the next post we will go back to the statistical module, and soon we will see the remaining 5 functions sent by Mike.</p>
]]></content:encoded>
			<wfw:commentRss>http://python.genedrift.org/2008/06/03/obtaining-overrepresented-motifs-in-dna-sequences-part-8/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Obtaining overrerpresented motifs in DNA sequences, part 8</title>
		<link>http://python.genedrift.org/2008/06/03/obtaining-overrerpresented-motifs-in-dna-sequences-part-8/</link>
		<comments>http://python.genedrift.org/2008/06/03/obtaining-overrerpresented-motifs-in-dna-sequences-part-8/#comments</comments>
		<pubDate>Tue, 03 Jun 2008 17:49:53 +0000</pubDate>
		<dc:creator>Paulo Nuin</dc:creator>
				<category><![CDATA[Phase 2]]></category>
		<category><![CDATA[motifs]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[dna]]></category>
		<category><![CDATA[overrepresented]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://python.genedrift.org/?p=110</guid>
		<description><![CDATA[We keep on introducing Mike&#8217;s functions. This time there are a couple of Python methods that we haven&#8217;t seen here and need some introduction, izip and count. To use these two we also need to import new modules

from itertools import count, izip

count returns consecutive integers starting at a defined point (the method&#8217;s parameter). If empty [...]]]></description>
			<content:encoded><![CDATA[<p>We keep on introducing Mike&#8217;s functions. This time there are two Python functions that we haven&#8217;t seen here yet and that need some introduction, <code>izip</code> and <code>count</code>. Both come from the <code>itertools</code> module, so we need a new import</p>
<pre name="code" class="python">
from itertools import count, izip
</pre>
<p><code>count</code> returns consecutive integers, starting at the value passed as its parameter, or at zero if no parameter is given. It produces an iterator of increasing integers, in a fashion similar to a function with <code>yield</code>: every time our loop asks <code>count</code> for a value, it &#8220;remembers&#8221; the last value returned and increments it by one.</p>
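<p>A quick illustration of <code>count</code>, calling <code>next</code> directly instead of looping:</p>
<pre name="code" class="python">
from itertools import count

counter = count()      # no parameter: starts at 0
print(next(counter))   # 0
print(next(counter))   # 1, the previous value plus one
from_ten = count(10)   # the parameter sets the first value
print(next(from_ten))  # 10
</pre>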
<p><code>izip</code> also returns an iterator, built from several iterables: each item is a tuple holding the next element of every iterable, and it stops as soon as the shortest one is exhausted. It is basically used to iterate through many iterables at the same time. In the function below it is used twice: once (with <code>count</code>) to pair a sequence number with each sequence, and once to pair start indices (from <code>count</code>) with end indices (from <code>range</code>), creating the sliding windows on the sequence that are used to count motifs.</p>
<pre name="code" class="python">
def get_quorums_06(seqs, mlen):
    &quot;&quot;&quot;
    add seq id_no to a set
    use &#039;izip(count(), ...)&#039; to create seq_no
    use &#039;izip(count(), range(...))&#039; to create start/stop indices for motifs
    &quot;&quot;&quot;
    quorum = defaultdict(set)
    for id_no, seq in izip(count(), seqs):
        for s, e in izip(count(), range(mlen, len(seq) + 1)):
            quorum[seq[s:e]].add(id_no)
    return quorum
</pre>
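<p>Here is a small sketch of both uses of <code>izip</code> from <code>get_quorums_06</code>, on toy sequences. The import fallback is an addition for Python 3, where <code>izip</code> no longer exists and the built-in <code>zip</code> behaves the same way; the range ends at <code>len(seq) + 1</code> so that the last window is included.</p>
<pre name="code" class="python">
from itertools import count
try:
    from itertools import izip  # Python 2
except ImportError:
    izip = zip                  # Python 3: the built-in zip is already lazy

seqs = ["ACGTA", "TTGCA"]
print(list(izip(count(), seqs)))  # [(0, 'ACGTA'), (1, 'TTGCA')]

# Start/stop pairs for windows of width 3 on a length-5 sequence.
# izip stops at the shortest input, so the endless count() is safe.
mlen = 3
pairs = list(izip(count(), range(mlen, len(seqs[0]) + 1)))
print(pairs)                              # [(0, 3), (1, 4), (2, 5)]
print([seqs[0][s:e] for s, e in pairs])   # ['ACG', 'CGT', 'GTA']
</pre>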
<p>In the next couple of posts we will still be looking at motif quorum functions. Stay tuned.</p>
]]></content:encoded>
			<wfw:commentRss>http://python.genedrift.org/2008/06/03/obtaining-overrerpresented-motifs-in-dna-sequences-part-8/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Obtaining overrepresented motifs in DNA sequences, part 7</title>
		<link>http://python.genedrift.org/2008/06/03/obtaining-overrepresented-motifs-in-dna-sequences-part-7/</link>
		<comments>http://python.genedrift.org/2008/06/03/obtaining-overrepresented-motifs-in-dna-sequences-part-7/#comments</comments>
		<pubDate>Tue, 03 Jun 2008 15:27:49 +0000</pubDate>
		<dc:creator>Paulo Nuin</dc:creator>
				<category><![CDATA[Phase 2]]></category>
		<category><![CDATA[motifs]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[overrepresented]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://python.genedrift.org/?p=109</guid>
		<description><![CDATA[Continuing on Mike&#8217;s functions to obtain motif quorums. We see function 3, 4 and 5. Function get_quorums_03, uses an old friend of the blog, sets. Recall that sets are very similar to lists, but their are unordered and items are unique.

def get_quorums_03(seqs, mlen):
    &#34;&#34;&#34;
    add seq id_no to a [...]]]></description>
			<content:encoded><![CDATA[<p>Continuing with Mike&#8217;s functions to obtain motif quorums, here we look at functions 3, 4 and 5. Function get_quorums_03 uses an old friend of the blog, <code>sets</code>. Recall that <code>sets</code> are very similar to lists, but they are unordered and their items are unique.</p>
<pre name="code" class="python">
def get_quorums_03(seqs, mlen):
    &quot;&quot;&quot;
    add seq id_no to a set
    use explicit counter to create seq_no
    &quot;&quot;&quot;
    quorum = defaultdict(set)
    id_no = 0
    for seq in seqs:
        id_no += 1
        for n in range(len(seq) - mlen + 1):
            quorum[seq[n:n+mlen]].add(id_no)
    return quorum
</pre>
<p>Basically, the sequence numbers (an incremented counter) are added to a defaultdict whose default value is a set. This way you don&#8217;t need to check whether the sequence number is already stored for that motif: you rely on the fact that a <code>set</code> keeps each item only once. Function 4 is very similar to function 3, with the difference that it uses enumerate (as in function 02) to create the sequence numbers.</p>
<pre name="code" class="python">
def get_quorums_04(seqs, mlen):
    &quot;&quot;&quot;
    add seq id_no to a set
    use &#039;enumerate&#039; to create seq_no
    &quot;&quot;&quot;
    quorum = defaultdict(set)
    for id_no, seq in enumerate(seqs):
        for n in range(len(seq) - mlen + 1):
            quorum[seq[n:n+mlen]].add(id_no)
    return quorum
</pre>
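<p>To see the set doing the de-duplication described above, here is a toy run in the style of function 4. Note that enumerate numbers the sequences from 0, while the explicit counter in function 3 starts at 1. The motif ACG appears twice in the first sequence, but the quorum is still 2:</p>
<pre name="code" class="python">
from collections import defaultdict

seqs = ["ACGACG", "TTACGT"]  # "ACG" appears twice in the first sequence
mlen = 3
quorum = defaultdict(set)
for id_no, seq in enumerate(seqs):
    for n in range(len(seq) - mlen + 1):
        quorum[seq[n:n + mlen]].add(id_no)  # adding the same id twice is harmless

print(sorted(quorum["ACG"]))  # [0, 1]
print(len(quorum["ACG"]))     # 2, the quorum
</pre>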
<p>Function 5 adds a twist: it uses enumerate to produce the start and stop positions of each window (the motif/word width). The window slides based on the (index, value) tuples created by enumerate, instead of computing <code>n + mlen</code> inside the slice as the other functions do. Again, the defaultdict is initialized with set and the sequence numbers are generated by enumerate.</p>
<pre name="code" class="python">
def get_quorums_05(seqs, mlen):
    &quot;&quot;&quot;
    add seq id_no to a set
    use &#039;enumerate&#039; to create seq_no
    use enumerate(range(...)) to create start/stop indices for motif
    &quot;&quot;&quot;
    quorum = defaultdict(set)
    for id_no, seq in enumerate(seqs):
        for s, e in enumerate(range(mlen, len(seq) + 1)):
            quorum[seq[s:e]].add(id_no)
    return quorum
</pre>
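<p>The start/stop pairs of function 5 are easy to inspect. This sketch, using toy values and a range that ends at <code>len(seq) + 1</code> so the last window is kept, checks that they give the same windows as the usual <code>n + mlen</code> slicing:</p>
<pre name="code" class="python">
seq, mlen = "ACGTAC", 3

# Function 5 style: enumerate pairs each stop index with a start index
pairs = list(enumerate(range(mlen, len(seq) + 1)))
print(pairs)  # [(0, 3), (1, 4), (2, 5), (3, 6)]

windows_enum = [seq[s:e] for s, e in pairs]
# The usual style: start at n and stop at n + mlen
windows_slice = [seq[n:n + mlen] for n in range(len(seq) - mlen + 1)]
print(windows_enum == windows_slice)  # True
</pre>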
]]></content:encoded>
			<wfw:commentRss>http://python.genedrift.org/2008/06/03/obtaining-overrepresented-motifs-in-dna-sequences-part-7/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Obtaining overrepresented motifs in DNA sequences, part 6</title>
		<link>http://python.genedrift.org/2008/05/30/obtaining-overrepresented-motifs-part-6/</link>
		<comments>http://python.genedrift.org/2008/05/30/obtaining-overrepresented-motifs-part-6/#comments</comments>
		<pubDate>Sat, 31 May 2008 02:30:44 +0000</pubDate>
		<dc:creator>Paulo Nuin</dc:creator>
				<category><![CDATA[Phase 2]]></category>
		<category><![CDATA[motifs]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[defaultdict]]></category>
		<category><![CDATA[enumerate]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://python.genedrift.org/?p=108</guid>
		<description><![CDATA[We will take a break on developing the statistical module to obtain overrepresented motifs (I will introduce mul in the next stats post), and take a deeper look at the possibilities on obtaining the motif quorums. Mike DeHaemer, a regular commenter and contributor to the blog, sent me a Python script with 8 different ways [...]]]></description>
			<content:encoded><![CDATA[<p>We will take a break from developing the statistical module to obtain overrepresented motifs (I will introduce mul in the next stats post), and take a deeper look at the possible ways of obtaining the motif quorums. Mike DeHaemer, a regular commenter and contributor to the blog, sent me a <a href="http://python.org/" title="Python (programming language)" rel="homepage" target="_blank" class="zem_slink">Python</a> script with 8 different approaches spread over 13 distinct functions for obtaining the motif quorums. I will take advantage of his contribution and post all of them, with some quick comments on each one (his code comments were kept in each function). Afterwards, a small benchmark will be posted.</p>
<p>Most of the functions need a couple of imports</p>
<pre name="code" class="python">
from collections import defaultdict, deque
from itertools import count, izip, tee
</pre>
<p>and they have two parameters, a sequence list and the length of the motifs.</p>
<p>The first function again uses a defaultdict and is very similar to the one used in the final version of the quorum script. Here the defaultdict is initialized with list, the keys are the motifs, and each sequence id is appended to a motif&#8217;s list only if it is not already present. The sequence id comes from a variable that is incremented at each iteration of the loop.</p>
<pre name="code" class="python">
def get_quorums_01(seqs, mlen):
    &quot;&quot;&quot;
    append seq id_no to list after checking to see if already present
    use explicit counter to create seq_no
    &quot;&quot;&quot;
    quorum = defaultdict(list)
    id_no = 0
    for seq in seqs:
        id_no += 1
        for n in range(len(seq) - mlen + 1):
            if id_no not in quorum[seq[n:n+mlen]]:
                quorum[seq[n:n + mlen]].append(id_no)
    return quorum
</pre>
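<p>Two details of <code>defaultdict(list)</code> matter in function 1: merely looking up a missing key inserts it with an empty list, and the <code>not in</code> test scans the list item by item, so it gets slower as more ids are added (a set lookup does not). A quick demonstration:</p>
<pre name="code" class="python">
from collections import defaultdict

quorum = defaultdict(list)
print("AAA" in quorum)         # False: no such key yet
print(1 not in quorum["AAA"])  # True, but this lookup...
print("AAA" in quorum)         # True: ...stored an empty list under "AAA"
quorum["AAA"].append(1)
print(quorum["AAA"])           # [1]
</pre>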
<p>The second function is very similar to the first one, with the difference that sequence id numbers are generated with <code>enumerate</code>.</p>
<pre name="code" class="python">
def get_quorums_02(seqs, mlen):
    &quot;&quot;&quot;
    append seq id_no to list after checking to see if already present
    use &#039;enumerate&#039; to create seq_no
    &quot;&quot;&quot;
    quorum = defaultdict(list)

    for id_no, seq in enumerate(seqs):
        for n in range(len(seq) - mlen + 1):
            if id_no not in quorum[seq[n:n+mlen]]:
                quorum[seq[n:n+mlen]].append(id_no)
    return quorum
</pre>
<p><code>enumerate</code> is an object built on top of another iterable. It returns a series of tuples, each pairing an index (starting at 0) with the next item. For instance, in our case above, enumerate will return the tuples <code>(0, sequence1), (1, sequence2) ... (N-1, sequenceN)</code>. That&#8217;s why the loop unpacks each item into two names</p>
<pre name="code" class="python">
for id_no, seq in enumerate(seqs):
</pre>
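<p>A short demonstration of the tuples that <code>enumerate</code> produces. Its optional second argument (Python 2.6 and later) sets the first number, which would reproduce the 1-based ids of the explicit counter in function 1:</p>
<pre name="code" class="python">
seqs = ["ACGT", "GGCC", "TTAA"]

print(list(enumerate(seqs)))     # [(0, 'ACGT'), (1, 'GGCC'), (2, 'TTAA')]
print(list(enumerate(seqs, 1)))  # [(1, 'ACGT'), (2, 'GGCC'), (3, 'TTAA')]

# Each (index, item) tuple is unpacked into the two loop variables
for id_no, seq in enumerate(seqs):
    print("%d %s" % (id_no, seq))
</pre>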
<p>The next couple of posts will cover the other functions sent by Mike. Then we will go back to the statistical module. </p>
]]></content:encoded>
			<wfw:commentRss>http://python.genedrift.org/2008/05/30/obtaining-overrepresented-motifs-part-6/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Obtaining overrepresented motifs in DNA sequences, part 5</title>
		<link>http://python.genedrift.org/2008/05/21/obtaining-overrepresented-motifs-in-dna-sequences-part-iv/</link>
		<comments>http://python.genedrift.org/2008/05/21/obtaining-overrepresented-motifs-in-dna-sequences-part-iv/#comments</comments>
		<pubDate>Wed, 21 May 2008 18:50:00 +0000</pubDate>
		<dc:creator>Paulo Nuin</dc:creator>
				<category><![CDATA[Phase 2]]></category>
		<category><![CDATA[motifs]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[hypergeometric distribution]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://python.genedrift.org/?p=107</guid>
		<description><![CDATA[Now that we have the script to generate the word quorums working (and working fast!) we need then to calculate the a p value for each motif based on the fore and background quorums. A p value cut-off will determine the statistically significant words, or overrepresented. These overrepresented words then can be analysed in more [...]]]></description>
			<content:encoded><![CDATA[<p>Now that we have the script to generate the word quorums working (and working fast!), we need to calculate a <i>p</i> value for each motif based on the foreground and background quorums. A <i>p</i> value cut-off will determine the statistically significant, or overrepresented, words. These overrepresented words can then be analysed in more detail (which we won&#8217;t do here), for instance to identify new or already known transcription factor binding sites.</p>
<p>A well-established statistical method to determine such overrepresented words is the <a href="http://www.mnlottery.com/hypergeo.html">Hypergeometric Distribution</a> (HD for short). The HD describes the number of &#8220;successes&#8221; in a sample drawn without replacement from a finite population. This is what separates it from the <a href="http://en.wikipedia.org/wiki/Binomial_distribution" title="Binomial distribution" rel="wikipedia" target="_blank" class="zem_slink">binomial distribution</a>, which assumes sampling with replacement, so that every draw has the same probability of success.</p>
<p>Basically, the HD&#8217;s equation is built from a series of binomial coefficients (combinations)</p>
<p><img src="http://www.genedrift.org/hd.gif" alt="HD equation" class="alignnone"></p>
<p>where <i>N</i> is the population size, <i>m</i> is the foreground cluster size, <i>k</i> is the motif quorum in the background gene set and <i>x</i> is the word quorum in the foreground set. Note that the above equation is for the cumulative HD, where a sum of probabilities is calculated.</p>
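<p>Assuming the image shows the usual upper-tail form of the cumulative HD, that is, the sum of C(k, i) C(N - k, m - i) / C(N, m) for i from x up to min(m, k), it can be sketched as below with the symbols defined above. Exact fractions are used so that rounding cannot hide mistakes; the function names are illustrative.</p>
<pre name="code" class="python">
from fractions import Fraction
from math import factorial

def comb(n, k):
    # Binomial coefficient; 0 when k is outside 0..n
    if k not in range(n + 1):
        return 0
    return factorial(n) // (factorial(k) * factorial(n - k))

def hypergeom_upper_tail(N, m, k, x):
    """P(X >= x): sum of hypergeometric terms from x up to min(m, k).
    N population size, m foreground cluster size,
    k sequences containing the word, x foreground quorum."""
    total = comb(N, m)
    return sum(Fraction(comb(k, i) * comb(N - k, m - i), total)
               for i in range(x, min(m, k) + 1))

print(hypergeom_upper_tail(10, 4, 3, 2))  # 1/3
print(hypergeom_upper_tail(10, 4, 3, 0))  # 1
</pre>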
<p>All the combinations in the above equation have to be expanded into factorials, which, depending on the values involved, are very computationally intensive and often too large to fit in a fixed-size float or integer. But <a href="http://python.org/" title="Python (programming language)" rel="homepage" target="_blank" class="zem_slink">Python</a> integers have arbitrary precision, so Python can handle very large numbers and calculates large factorials relatively fast.</p>
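<p>To make the size problem concrete: a Python integer grows as needed, so a factorial stays exact, but converting one to a floating-point number fails once it exceeds the largest double (about 1.8e308). A small demonstration, reusing the one-line factorial idea with a starting value of 1:</p>
<pre name="code" class="python">
from functools import reduce  # built in on Python 2, needed on Python 3

def fac(n):
    # Exact integer factorial; Python integers have no fixed size limit
    return reduce(lambda i, j: i * j, range(1, n + 1), 1)

print(len(str(fac(100))))  # 158 digits, still exact
print(float(fac(170)))     # about 7.3e+306, the largest factorial a double holds
try:
    float(fac(171))
except OverflowError:
    print("171! is too large for a float")
</pre>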
<p>In C++, I had to use a couple of tricks to achieve a good speed in the factorial determination, and especially in the HD calculation, which requires multiplying, dividing and subtracting many large factorials. I didn&#8217;t want to use any mathematical approximation such as <a href="http://en.wikipedia.org/wiki/Stirling%27s_approximation" title="Stirling's approximation" rel="wikipedia" target="_blank" class="zem_slink">Stirling&#8217;s approximation</a>. 13! already overflows a 32-bit <code>long</code> in C++, so I had to use <a href="http://www.tc.umn.edu/%7Eringx004/mapm-main.html">MAPM, A Portable Arbitrary Precision Math Library in C</a>. This library is quite fast at calculating factorials, but when one needs to calculate more than 200,000 of them the speed is unbearable. So I decided to pre-calculate a series of factorial values, keeping 10 decimal places of precision in one column and their exponent in another. Then, using this table as input, I was able to multiply, divide and subtract the factorials and, by employing the laws of exponents, carry out the same operations on their exponents. This speeds up the process tremendously.</p>
<p>In Python, we don&#8217;t need any third-party library or even an extra module; plain Python is enough. A factorial function in Python can be written in one line, but for clarity it is better to define it separately. We can try throwing any number at it and see the result.</p>
<pre name="code" class="python">

def fac(n):
    value = reduce(lambda i, j: i * j, range(1, n + 1), 1)
    return value
</pre>
<p>We already saw <code>reduce</code> and <code>lambda</code>, and together they make the factorial function clear and simple. And why are we not using a recursive function? Because Python limits the recursion depth (1000 by default, see <code>sys.getrecursionlimit()</code>), so a recursive factorial fails for large numbers. Next time we will implement the code that calculates the HD <i>p</i> values.</p>
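<p>For comparison, a recursive factorial works for small numbers but hits the recursion limit long before the factorial sizes needed here. A sketch (the exact limit can differ between Python builds and can be changed with <code>sys.setrecursionlimit</code>):</p>
<pre name="code" class="python">
import sys

def fac_recursive(n):
    # Recursive factorial: one stack frame per call
    if n in (0, 1):
        return 1
    return n * fac_recursive(n - 1)

print(sys.getrecursionlimit())  # 1000 by default in CPython
print(fac_recursive(10))        # 3628800
try:
    fac_recursive(5000)
except RuntimeError:  # Python 3 raises RecursionError, a subclass of RuntimeError
    print("maximum recursion depth exceeded")
</pre>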
]]></content:encoded>
			<wfw:commentRss>http://python.genedrift.org/2008/05/21/obtaining-overrepresented-motifs-in-dna-sequences-part-iv/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Obtaining overrepresented motifs in DNA sequences, part 4</title>
		<link>http://python.genedrift.org/2008/05/12/obtaining-overrepresented-motifs-in-dna-sequences-part-4/</link>
		<comments>http://python.genedrift.org/2008/05/12/obtaining-overrepresented-motifs-in-dna-sequences-part-4/#comments</comments>
		<pubDate>Mon, 12 May 2008 19:35:13 +0000</pubDate>
		<dc:creator>Paulo Nuin</dc:creator>
				<category><![CDATA[Phase 2]]></category>
		<category><![CDATA[motifs]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[dna]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://python.genedrift.org/?p=106</guid>
		<description><![CDATA[We found a way to make the Python script as good as or better than the C++ executable. But for the analysis we need to do, motif counts are not the value we want. We need the quorum: the number of sequences in which the motif is present at least once. For instance, if the desired motif [...]]]></description>
			<content:encoded><![CDATA[<p>We found a way to make the Python script as good as or better than the C++ executable. But for the analysis we need to do, motif counts are not the value we want. We need the quorum: the number of sequences in which the motif is present at least once. For instance, if the desired motif was AAACCCTTTG, we would check in which sequences this word is present. Let&#8217;s say that in a cluster of 10 sequences we find it in sequences 1, 2, 3, 4 and 5, giving us a quorum of 5 out of 10, or 50%. The quorum will be used later in the statistical calculation that determines the overrepresented motifs.</p>
<p>With only a couple of modifications, we can adapt the script used to get the motif counts to get the quorum.</p>
<pre name="code" class="python">
#!/scratch/python/bin/python

from collections import defaultdict
import sys
import fasta

seqs = fasta.get_seqs(open(sys.argv[1]).readlines())
length = int(sys.argv[2])

quorum = defaultdict(list)

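seq_number = 0
for i in seqs:
    seq_number += 1
    # slide a window of `length` bases; the + 1 includes the last word
    for n in range(len(i.sequence) - length + 1):
        word = i.sequence[n : n + length]
        # record each sequence at most once per word
        if seq_number not in quorum[word]:
            quorum[word].append(seq_number)

for word in quorum:
    # the quorum is the number of distinct sequences containing the word
    print word.upper(), len(quorum[word])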
</pre>
<p>Basically, we change the way the <code>defaultdict</code> is initialized, this time with <code>list</code> instead of <code>int</code> as the default factory, and we change the procedure that used to get the counts. The loop does the same work, iterating over the sequences with a window (of the input length) sliding along them and checking each word. This time, instead of incrementing the value in the <code>defaultdict</code>, we append the sequence number to the list, but only if it is not already there. The sequence number comes from an integer index variable incremented in each iteration of the outer loop. In the end each value of <code>quorum</code> is a list of sequence numbers, and printing the list length gives the quorum. In my tests, the script above showed no performance loss compared with the previous counting script.</p>
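<p>Since the list membership test has to scan the list, a <code>defaultdict(set)</code> is one alternative worth considering: adding a number that is already in a set does nothing, so the explicit check disappears. The sketch below is a hypothetical variant (the name <code>motif_quorum</code> is made up) that works on plain strings instead of the <code>fasta</code> module&#8217;s sequence objects, so it can run on its own; it also reports the quorum as a percentage of the sequences, as in the 50% example.</p>
<pre name="code" class="python">
from collections import defaultdict


def motif_quorum(sequences, length):
    """Map every word of `length` bases to the set of sequence numbers containing it."""
    quorum = defaultdict(set)
    # number the sequences from 1, like seq_number in the script above
    for seq_number, sequence in enumerate(sequences, 1):
        sequence = sequence.upper()
        # window starts run from 0 to len(sequence) - length inclusive
        for n in range(len(sequence) - length + 1):
            # re-adding a number already in the set is a no-op
            quorum[sequence[n:n + length]].add(seq_number)
    return quorum


seqs = ["ACGTAC", "ttacgt", "GGGGGG"]
quorum = motif_quorum(seqs, 4)
for word in sorted(quorum):
    count = len(quorum[word])
    # quorum as a count and as a percentage of all sequences
    print("%s %d %.0f%%" % (word, count, 100.0 * count / len(seqs)))
</pre>
<p>Here ACGT is found in sequences 1 and 2, a quorum of 2 out of 3. The trade-off is that sets are unordered, which does not matter when only their size is needed.</p>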
<p>Next we will see which statistical method to use and start devising a script to calculate it.</p>
]]></content:encoded>
			<wfw:commentRss>http://python.genedrift.org/2008/05/12/obtaining-overrepresented-motifs-in-dna-sequences-part-4/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>

