Finally it’s 2009 …

Phase 2 Comments Off

And … we’re back. The long and cold winter is still out there and January 2009 is almost in the books. After a long period without updating I’ll try to “rush” some posts this week, trying to get back on track. So, a little bit of what’s up and coming:

- a patch provided by Robin Stocker to make all scripts published here (at least the ones on GitHub) PEP 8 compliant.

- using SQLite databases in Python

- developing an interface to access the database

- anything that you might suggest, just leave a comment.

Let’s start 2009 then.


Twitter

Phase 2 Comments Off

I’ve been on Twitter for quite some time. Some Python stuff, some biology, some bioinformatics, and a little bit of everything else.

nuin.

Python Magazine?

off topic Comments Off

I have been buying Python Magazine for the last few months and I really like it, especially now that I already miss Linux Magazine and have no nearby source for Linux Journal (I should subscribe, I know). Last week I got an email from Python Magazine saying that I could use a coupon to buy some issues, a coupon that I used right away. I paid with PayPal and I’m still waiting for my issue to show up. I sent a couple of emails using the contact form and, so far, nothing. I’ll wait until next year and see what happens. It’s really sad, because this issue covers cloud computing with Python.

Edit: problem solved. Thanks everyone!

That’s it for 2008

Phase 2 1 Comment »

The date came and is now gone, and I forgot to “celebrate” two years of Beginning Python for Bioinformatics on December 13th. I would like to thank everyone that commented, helped with posts and suggested anything that would make this website better. Clearly it is far from being what I wanted it to be, but slowly but surely we will get there.

Thanks again and I wish an excellent holiday season and a great 2009 to everyone!

See you in 2009.

Scripts and Python 3.0, part 2, using 2to3

Python 3.0 4 Comments »

And we’re back to check whether our initial scripts run on Python 3.0. Along with this latest release, a nice tool to convert your scripts is also installed. It’s called 2to3 and it’s available in the Tools/scripts directory of your Python 3.0 installation. Basic usage is very similar to any Python script:

2to3

and the output is similar to diff output, with the lines that should be changed shown in their original form and in their Python 3.0 form. Codes 01, 02, 03 and 04 were trivial to change; the main "issue" was the print statement. The final codes 03 and 04 are below, and after them we will see code_05. One extra thing: in code_04 we used to print the sequence on separate lines (basically outputting what was read from the file), and if we change the print statement from

print line,

to

print(line,)

it will add an extra line between the sequence lines. The same will happen if we change the line to

print(line)

So what we need to do is add a parameter to the print function, end. This tells Python what we want at the end of the line, in our case an empty string. So the line would be

print(line, end = '')

That's it. So code_03 and code_04 will be

#code_03
import re

#setting the DNA string
myDNA = 'ACGTTGCAACGTTGCAACGTTGCA'

#assigning a new regex and compiling it
#to find all Ts
regexp = re.compile('T')

#create a new string that will receive
#the regex result with Us replacing Ts
myRNA = regexp.sub('U', myDNA)

print(myRNA)
#end 03

#code 04
#assigning a filename to a variable
dnafile = "AY162388.seq"

#opening the file
file = open(dnafile, 'r')

#printing each line of the file
for line in file:
    print(line, end="")
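To see the effect of the end parameter without needing the sequence file, here is a minimal sketch. It assumes a made-up two-line stand-in for AY162388.seq held in an io.StringIO object, and the helper name render is hypothetical; it only collects what print() would write so the two behaviours can be compared.

```python
import io

# Hypothetical two-line stand-in for the contents of AY162388.seq
fake_file = io.StringIO("ACGTACGT\nTTGGCCAA\n")


def render(lines, **print_kwargs):
    """Collect what print() would write for each line into one string."""
    out = io.StringIO()
    for line in lines:
        print(line, file=out, **print_kwargs)
    return out.getvalue()


lines = fake_file.readlines()

# Each line read from the file already ends with "\n", and print() adds
# another one by default, which produces the blank lines in between.
default_output = render(lines)

# With end="" print() adds nothing, so only the file's own "\n" remains.
trimmed_output = render(lines, end="")
```

The same idea applies to any line-by-line echo of a text file: either pass end="" or strip the newline from each line before printing.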

Now, code_05 has 45 lines including comments, so it should be a good idea to test 2to3 on it, especially since a long time has passed since we created the script. There might be some other changes that we would otherwise miss. Let's run 2to3 on our original script and check the output:

RefactoringTool: Skipping implicit fixer: buffer
RefactoringTool: Skipping implicit fixer: idioms
RefactoringTool: Skipping implicit fixer: set_literal
RefactoringTool: Skipping implicit fixer: ws_comma
--- code_05.py (original)
+++ code_05.py (refactored)
@@ -30,16 +30,16 @@
 #the loop continues
 while inputfromuser:
     #raw_input received the user input as string
-    inmotif = raw_input('Enter motif to search: ')
+    inmotif = input('Enter motif to search: ')
     #now we check for the size of the input
     if len(inmotif) >= 1:
         #we compile a regex with the input given
         motif = re.compile('%s' % inmotif)
         #looking to see if the entered motif is in the sequence
         if re.search(motif, sequence):
-            print 'Yep, I found it'
+            print('Yep, I found it')
         else:
-            print 'Sorry, try another one'
+            print('Sorry, try another one')
     else:
-        print 'Done, thanks for using motif_search'
+        print('Done, thanks for using motif_search')
         inputfromuser = False
RefactoringTool: Files that need to be modified:
RefactoringTool: code_05.py

That is a lot of lines, so it seems that we need to make a lot of changes. But in fact there are not many. Every line starting with a - needs to be removed and replaced by the line starting with a +, and the + lines are adjacent to the ones we need to change. Again, the most common changes here are the print statements, but there is also another change to be made

inmotif = raw_input('Enter motif to search: ')
inmotif = input('Enter motif to search: ')

So, there is no raw_input in Python 3.0. It was dropped in favour of a single input function that always returns a string, which can be evaluated later if needed (and desired), just like the old raw_input. Now there is no more confusion about which one to use.
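As a small sketch of "evaluating later", the function below takes text of the kind input() returns and tries to turn it into a Python literal. The name interpret is hypothetical, and using ast.literal_eval is my own choice for the illustration (it only accepts literals such as numbers and lists, unlike the old Python 2 input, which evaluated any expression).

```python
import ast


def interpret(raw_text):
    """Turn text as returned by input() into a Python literal when possible.

    Anything that is not a valid literal (a motif such as 'ACGT', or an
    empty string) is returned unchanged as a string.
    """
    try:
        return ast.literal_eval(raw_text)
    except (ValueError, SyntaxError):
        return raw_text


number = interpret("42")        # the text "42" becomes the integer 42
motif = interpret("ACGT")       # not a literal, stays a string
values = interpret("[1, 2, 3]") # the text becomes a list of integers
```

In an interactive script the argument would simply be the result of input('Enter motif to search: ').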

Digging a little bit into 2to3, we see that it can write the code for us if we use the -w parameter when running the script. Be careful, as it rewrites the same file, although it saves a backup copy. In the end we get this code from 2to3 (code_05.py)

#!/usr/bin/env python

'''
simple script to find motifs on DNA sequences using regex
the script is interactive
'''

# we use the RegEx module
import re
import string

#still keep the file fixed
dnafile = "AY162388.seq"

#opening the file, reading the sequence and storing in a list
seqlist = open(dnafile, 'r').readlines()

#let's join the lines in a temporary string
temp = ''.join(seqlist)

#assigning our sequence, with no carriage returns to our
#final variable/object
sequence = temp.replace('\n', '')

#we start to deal with user input
#first we use a boolean variable to check for valid input
inputfromuser = True

#while loop: while there is a motif larger than 0
#the loop continues
while inputfromuser:
    #raw_input received the user input as string
    inmotif = input('Enter motif to search: ')
    #now we check for the size of the input
    if len(inmotif) >= 1:
        #we compile a regex with the input given
        motif = re.compile('%s' % inmotif)
        #looking to see if the entered motif is in the sequence
        if re.search(motif, sequence):
            print('Yep, I found it')
        else:
            print('Sorry, try another one')
    else:
        print('Done, thanks for using motif_search')
        inputfromuser = False
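The search step inside the loop can also be pulled out into a function, which makes it possible to try it without typing at a prompt. This is only a sketch: the function name has_motif and the short sequence are made up for the illustration and are not part of the original script.

```python
import re


def has_motif(sequence, motif_text):
    """Return True when motif_text, used as a regex, matches in sequence."""
    return re.search(motif_text, sequence) is not None


# Made-up sequence, standing in for the one read from AY162388.seq
sequence = "ACGTTGCAACGTTGCA"

found_plain = has_motif(sequence, "TTGCA")      # a literal motif
found_pattern = has_motif(sequence, "AC[GT]T")  # a motif with a character class
found_missing = has_motif(sequence, "GGGG")     # a motif that is not present
```

Because the input is compiled as a regular expression, users can type patterns as well as plain motifs, exactly as in the interactive loop.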

2to3 seems to be pretty good at detecting changes, pointing them out to you and even writing the newer script for you. So far I haven't tested it on big scripts (more than 100 lines long), but I plan to do it soon. For small scripts we saw that it works quite well. A good test for 2to3 will be when we get to the motifs scripts, which are a little bit more complex even though they are quite short. Stay tuned and check for new code on the repository.

Scripts and Python 3.0, part 1

Python 3.0 8 Comments »

Yes, Python 3.0 was released earlier than Perl … what version was it? 6? 7? Anyway, I decided to go back to most of the scripts that were posted here. In the GitHub repo we have 50 files in the “original scripts” directory. Let’s check how they fare on Python 3.0 and what type of changes we need to make in order for them to work. Starting with code_01.py, which is a couple of lines long

myDNA = "ACGTACGTACGTACGTACGTACGT"
print myDNA

Here we have one of the most evident differences between Python 2.x and 3.0. print is now a function, not a statement, so whatever we want to print should be passed as a function argument. The above code would be changed to

myDNA = "ACGTACGTACGTACGTACGTACGT"
print(myDNA)

That simple in this case. But what are the advantages of print being a function over a statement? More flexibility, as can be seen in the link above. It is now possible to send different parameters to print and make the output richer by customizing separators between items, directing the output, etc.
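A quick sketch of those extra parameters: sep sets the separator between items, end sets what is written after the last item, and file directs the output somewhere other than the screen. Here the output is sent to an io.StringIO buffer, chosen only so the result can be inspected; the sequences are made up.

```python
import io

# An in-memory text buffer standing in for a file or the screen
buffer = io.StringIO()

# sep replaces the default single space between the items
print("ACGT", "TGCA", sep="-", file=buffer)

# end replaces the default newline written after the last item
print("seq", end=";", file=buffer)

text = buffer.getvalue()
```

Passing an open file object as file would write the same text to disk instead.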

A similar change would have to be made in our code_02.py. There are two print statements there that should be translated to the function. Trivial, so far. The original code

myDNA = "ACGTACGTACGTACGTACGTACGT"
myDNA2 = "TCGATCGATCGATCGATCGA"
print "First and Second sequences"
print myDNA, myDNA2
myDNA3 = myDNA + myDNA2
print "Concatenated sequence"
print myDNA3

and to work on Python 3

myDNA = "ACGTACGTACGTACGTACGTACGT"
myDNA2 = "TCGATCGATCGATCGATCGA"
print("First and Second sequences")
print(myDNA, myDNA2)
myDNA3 = myDNA + myDNA2
print("Concatenated sequence")
print(myDNA3)

This would be the biggest (or at least the most common) change that we would need to make in the scripts posted here. Follow the repo to get the newer versions.

Creating an interface for the motif finding script, final

motifs, wxPython Comments Off

We can say that this would be our final version of the script. There are many nice wxPython programming resources, and one is a very good book called wxPython in Action, which is co-written by Robin Dunn, the wxPython maintainer. Go check it out.

So for the last entry in this series, we just need to add a couple of changes to our interface and motif finding scripts. Basically on the interface script we need to add a line that gets the value entered (or the default one, if not changed) in the motif width input box. And we can do that by including the line below in the run_finder function.

width = self.motif_width.GetValue()

This line tells the script to get the value of the box and assign it to the variable width. The GetValue method returns whatever is inside the input box as a string. Now we need to create the structure to actually send this value to the motif finder functions. The last version of our function calculate_motifs received two parameters; we need to add an extra one, and also change the lines that call the function that gets the quorums. Basically, the first lines of the function will be

def calculate_motifs(input_seqs, input_seqs2, width):

    print input_seqs, input_seqs2
    #the GUI sends the width as a string, so convert it to an integer
    width = int(width)
    input_seqs = fasta.read_seqs(open(input_seqs).readlines())
    input_seqs2 = fasta.read_seqs(open(input_seqs2).readlines())

    foreground = get_quorums(input_seqs, width)
    background = get_quorums(input_seqs2, width)

And that’s it. Our simple interface is ready for primetime. OK, not prime primetime: we didn’t add a series of features that would make it useful for everyone. For instance, there is no error control, so someone could enter ‘ABC’ in the width input box, that value would be sent, and an error would occur. Also, you can click the run button without any file selected. And we could go on and on. But this is just a primer, and we can build from it.
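As one possible way to add that missing error control, the sketch below checks the text from the width box before it is used. The function name parse_width and the choice to fall back to the default width of 10 are assumptions made for the illustration; they are not part of the code on GitHub.

```python
def parse_width(raw_text, default=10):
    """Convert the text from the width box into a positive integer.

    Falls back to `default` when the text is not a whole number or is
    not greater than zero (an assumption for this sketch; showing an
    error dialog would be another reasonable choice).
    """
    try:
        width = int(raw_text.strip())
    except ValueError:
        return default
    return width if width > 0 else default


good = parse_width("12")     # a valid width is used as entered
padded = parse_width(" 8 ")  # surrounding spaces are ignored
bad = parse_width("ABC")     # not a number, so the default is used
negative = parse_width("-3") # not positive, so the default is used
```

In run_finder this would wrap the call to self.motif_width.GetValue() before the value is passed on to calculate_motifs.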

The code is on Github, so get it there and have fun. Next time we will see … no plans yet. We’ll see …


Creating an interface for the motif finding script, some corrections

motifs, wxPython Comments Off

We need to pause a bit and make some corrections to our code. First, the code I posted in the last entry for the pymotif.py module is wrong. OK, not wrong, but some of the code I use for testing ended up on the blog. The first two lines of the calculate_motifs function contained the names of the files I use for testing and should be replaced by

input_seqs = fasta.read_seqs(open(input_seqs).readlines())
input_seqs2 = fasta.read_seqs(open(input_seqs2).readlines())

Also, both variables that store the filenames and paths in pymoteGUI.py were declared in the wrong scope. They should have been declared at the pymotGUI class level, so they are accessible to all the functions in that class. This also means that every time we access one of these variables it should be preceded by the class name, so the interpreter knows where to get the value from. So both corrected files would be

#!/usr/bin/env python

import wx
import pymot
import pymotif
import fasta
import os

class pymot(wx.App):

    def __init__(self, redirect=False):
        wx.App.__init__(self, redirect)

class pymotGUI(wx.Frame):

    fore_file = ''
    back_file = ''

    def __init__(self, parent, id):
        wx.Frame.__init__(self, parent, id,  'Python Motif Finder', style=wx.DEFAULT_FRAME_STYLE)
        self.__do_layout()

    def __do_layout(self):

        #adding the panel
        panel = wx.Panel(self)

        #defines the menubar
        menubar = wx.MenuBar()

        #file menu
        filemenu = wx.Menu()
        foreground_menu = filemenu.Append(-1, 'Select foreground file')
        background_menu = filemenu.Append(-1, 'Select background file')
        sep = filemenu.AppendSeparator()
        quitmenu = filemenu.Append(-1, 'Quit')

        #appends the menu to the menubar and creates it
        menubar.Append(filemenu, 'File')
        self.SetMenuBar(menubar)

        #input box for motif width, and label
        self.one_label = wx.StaticText(panel, -1, 'Motif width', (10,50))
        self.motif_width = wx.TextCtrl(panel, -1, '10', (95, 50), (40,18))
        #result textbox
        self.results = wx.TextCtrl(panel, -1, '', (150, 50), (200, 100), wx.TE_MULTILINE | wx.TE_AUTO_SCROLL | wx.HSCROLL)

        #run bbutton
        self.run_button = wx.Button(panel, -1, 'Run', (10, 80))

        #labels
        self.fore_label = wx.StaticText(panel, -1, 'Select the foreground file', (10, 10))
        self.back_label = wx.StaticText(panel, -1, 'Select the background file', (10, 30))

        #binding the menus to functions
        self.Bind(wx.EVT_MENU, self.on_foreground, foreground_menu)
        self.Bind(wx.EVT_MENU, self.on_background, background_menu)
        self.Bind(wx.EVT_BUTTON, self.run_finder, self.run_button)

    def on_foreground(self, event):
        dialog = wx.FileDialog(self, style=wx.OPEN)
        if dialog.ShowModal() == wx.ID_OK:
            pymotGUI.fore_file = dialog.GetPath()
            self.fore_label.SetLabel(pymotGUI.fore_file)

    def on_background(self, event):
        dialog = wx.FileDialog(self, style=wx.OPEN)
        if dialog.ShowModal() == wx.ID_OK:
            pymotGUI.back_file = dialog.GetPath()
            self.back_label.SetLabel(pymotGUI.back_file)

    def run_finder(self, event):
        print pymotGUI.fore_file
        result = pymotif.calculate_motifs(pymotGUI.fore_file, pymotGUI.back_file)
        for motif in result:
            self.results.WriteText(motif + '\n')
        #wx.MessageBox('It should run, eh?')

#if __name__ == '__main__':
app = pymot()
frame = pymotGUI(parent=None, id = -1)
#frame.CentreOnScreen()
frame.Show()
app.MainLoop()

and

#!/usr/bin/env python

import fasta
import sys
from collections import defaultdict

def choose(n, k):
    if 0 <= k <= n:
        ntok = 1
        ktok = 1
        for t in xrange(1, min(k, n - k) + 1):
            ntok *= n
            ktok *= t
            n -= 1
        return ntok // ktok
    else:
        return 0

def get_quorums(seqs, mlen):
    """
    add seq id_no to a set
    use explicit counter to create seq_no
    """
    quorum = defaultdict(int)
    for seq in seqs:
        for n in range(len(seq) - mlen + 1):
            quorum[seq[n:n + mlen]] += 1
    return quorum

def calculate_motifs(input_seqs, input_seqs2):

    print input_seqs, input_seqs2
    input_seqs = fasta.read_seqs(open(input_seqs).readlines())
    input_seqs2 = fasta.read_seqs(open(input_seqs2).readlines())

    foreground = get_quorums(input_seqs, 10)
    background = get_quorums(input_seqs2, 10)

    N = len(input_seqs) + len(input_seqs2)

    res_motifs = []
    for i in foreground:
        term1 = choose(background[i], foreground[i])
        term2 = choose((N - background[i]), len(input_seqs) - 1)
        term3 = choose(N, len(input_seqs))
        p = (float(term1) * float(term2)) / term3
        if 0 < p <= 0.0001:
            res_motifs.append(i + '\t' + str(foreground[i]) + '\t' + str(background[i]) + '\t' + str(p))

    res_motifs.sort()
    return res_motifs
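The scope problem described above can be reproduced without wxPython. In this sketch the three class names and the file name are made up; each class has a choose method that plays the role of on_foreground. The first stores the path in a local name (the original mistake), the second uses a class-level attribute as in the corrected file, and the third uses an instance attribute through self, which is another common way to get the same effect.

```python
class LocalName:
    """Mimics the original mistake: the path goes into a local name."""

    def __init__(self):
        self.path = ''

    def choose(self, new_path):
        path = new_path  # local name, discarded when the method returns


class ClassLevel:
    """Mimics the corrected file: the path lives on the class itself."""

    path = ''

    def choose(self, new_path):
        ClassLevel.path = new_path  # shared by every instance of the class


class InstanceLevel:
    """Alternative: the path lives on the instance, reached through self."""

    def __init__(self):
        self.path = ''

    def choose(self, new_path):
        self.path = new_path


local, klass, instance = LocalName(), ClassLevel(), InstanceLevel()
for obj in (local, klass, instance):
    obj.choose('foreground.fa')  # hypothetical file name
```

After the loop, local.path is still empty, while ClassLevel.path and instance.path both hold the selected file name.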

In the next post, the last in the series, we will just check how to get the value from the width input box and wrap up everything.


Creating an interface for the motif finding script, part 8

motifs, wxPython Comments Off

Let’s see now how we connect our GUI to the pymotif file (I changed the name because of some conflicts with the app name [my bad!], and the git repo was updated accordingly), and also how to display the results in a simple manner.

OK, first to connecting the script to the function file, pymotif.py. The file is already imported in our script and we have used it before. We need to find the exact point to call it and which parameters to pass. pymotif.py is a slightly modified version of our command line script, and the code is below.

#!/usr/bin/env python

import fasta
import sys
from collections import defaultdict

def choose(n, k):
    if 0 <= k <= n:
        ntok = 1
        ktok = 1
        for t in xrange(1, min(k, n - k) + 1):
            ntok *= n
            ktok *= t
            n -= 1
        return ntok // ktok
    else:
        return 0

def get_quorums(seqs, mlen):
    """
    add seq id_no to a set
    use explicit counter to create seq_no
    """
    quorum = defaultdict(int)
    for seq in seqs:
        for n in range(len(seq) - mlen + 1):
            quorum[seq[n:n + mlen]] += 1
    return quorum

def calculate_motifs(input_seqs, input_seqs2):

    input_seqs = fasta.read_seqs(open('celladhesion1000.fa').readlines())
    input_seqs2 = fasta.read_seqs(open('celladhesion1000C.fa').readlines())

    foreground = get_quorums(input_seqs, 10)
    background = get_quorums(input_seqs2, 10)

    N = len(input_seqs) + len(input_seqs2)

    res_motifs = []
    for i in foreground:
        term1 = choose(background[i], foreground[i])
        term2 = choose((N - background[i]), len(input_seqs) - 1)
        term3 = choose(N, len(input_seqs))
        p = (float(term1) * float(term2)) / term3
        if 0 < p <= 0.0001:
            res_motifs.append(i + '\t' + str(foreground[i]) + '\t' + str(background[i]) + '\t' + str(p))

    res_motifs.sort()
    return res_motifs

So, basically the line we are interested in is this one

def calculate_motifs(input_seqs, input_seqs2):

We replace the wx.MessageBox line in our run_finder function and use the input files selected by the user as parameters for calculate_motifs, and we are done

def run_finder(self, event):
	result = pymotif.calculate_motifs(self.fore_file, self.back_file)

Very simple and direct. This should take care of everything except the motif width, which we will see in the next post. We still need a place to write the overrepresented motifs. We can add a text box to the frame, and we do that by adding an extra declaration in our __do_layout function. This time we need to add some extra style to the box, so it can show multiple lines and has a scroll bar.

self.results = wx.TextCtrl(panel, -1, '', (150, 50), (200, 100), wx.TE_MULTILINE | wx.TE_AUTO_SCROLL | wx.HSCROLL)

Notice the wx. flags added. MULTILINE allows the box to have multiple lines and the other two turn on the auto scroll and horizontal scroll. Great. And how do we write the results? Notice above that the function that calculates the motifs returns a sorted list where each item has the motif sequence and the p value. So the only thing we need to do is iterate over the list and write each line to the result box. That simple, and we accomplish it by using the WriteText method, which receives a string as a parameter, either a literal or a string object. Our run_finder function will have a couple of extra lines

def run_finder(self, event):
	result = pymotif.calculate_motifs(self.fore_file, self.back_file)
	for motif in result:
		self.results.WriteText(motif + '\n')

That will present in a very simplistic way the resulting overrepresented motifs, but it’s enough for now. Our GUI script will be

#!/usr/bin/env python

import wx
import pymot
import pymotif
import fasta
import os

class pymot(wx.App):

    def __init__(self, redirect=False):
        wx.App.__init__(self, redirect)

class pymotGUI(wx.Frame):

    def __init__(self, parent, id):
        wx.Frame.__init__(self, parent, id,  'Python Motif Finder', style=wx.DEFAULT_FRAME_STYLE)
        self.__do_layout()
        self.fore_file = ''
        self.back_file = ''

    def __do_layout(self):

        #adding the panel
        panel = wx.Panel(self)

        #defines the menubar
        menubar = wx.MenuBar()

        #file menu
        filemenu = wx.Menu()
        foreground_menu = filemenu.Append(-1, 'Select foreground file')
        background_menu = filemenu.Append(-1, 'Select background file')
        sep = filemenu.AppendSeparator()
        quitmenu = filemenu.Append(-1, 'Quit')

        #appends the menu to the menubar and creates it
        menubar.Append(filemenu, 'File')
        self.SetMenuBar(menubar)

        #input box for motif width, and label
        self.one_label = wx.StaticText(panel, -1, 'Motif width', (10,50))
        self.motif_width = wx.TextCtrl(panel, -1, '10', (95, 50), (40,18))
        #result textbox
        self.results = wx.TextCtrl(panel, -1, '', (150, 50), (200, 100), wx.TE_MULTILINE | wx.TE_AUTO_SCROLL | wx.HSCROLL)

        #run bbutton
        self.run_button = wx.Button(panel, -1, 'Run', (10, 80))

        #labels
        self.fore_label = wx.StaticText(panel, -1, 'Select the foreground file', (10, 10))
        self.back_label = wx.StaticText(panel, -1, 'Select the background file', (10, 30))

        #binding the menus to functions
        self.Bind(wx.EVT_MENU, self.on_foreground, foreground_menu)
        self.Bind(wx.EVT_MENU, self.on_background, background_menu)
        self.Bind(wx.EVT_BUTTON, self.run_finder, self.run_button)

    def on_foreground(self, event):
        dialog = wx.FileDialog(self, style=wx.OPEN)
        if dialog.ShowModal() == wx.ID_OK:
            fore_file = dialog.GetPath()
            self.fore_label.SetLabel(fore_file)

    def on_background(self, event):
        dialog = wx.FileDialog(self, style=wx.OPEN)
        if dialog.ShowModal() == wx.ID_OK:
            back_file = dialog.GetPath()
            self.back_label.SetLabel(back_file)

    def run_finder(self, event):
        result = pymotif.calculate_motifs(self.fore_file, self.back_file)
        for motif in result:
            self.results.WriteText(motif + '\n')
        #wx.MessageBox('It should run, eh?')

#if __name__ == '__main__':
app = pymot()
frame = pymotGUI(parent=None, id = -1)
#frame.CentreOnScreen()
frame.Show()
app.MainLoop()
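The loop that writes the results can be tried without opening a window. In this sketch FakeResultsBox is a hypothetical stand-in for the wx.TextCtrl results box (it only imitates the WriteText method), show_results is a made-up helper name, and the two result lines are invented, tab-separated strings in the same layout that calculate_motifs builds.

```python
class FakeResultsBox:
    """Hypothetical stand-in for the wx.TextCtrl used as the results box."""

    def __init__(self):
        self.text = ''

    def WriteText(self, chunk):
        # Like the real control, append the new text to what is shown
        self.text += chunk


def show_results(box, motifs):
    """Write one motif line per row, as run_finder does."""
    for motif in motifs:
        box.WriteText(motif + '\n')


# Invented results: motif, foreground count, background count, p value
fake_results = ['ACGTACGTAC\t5\t20\t1e-05', 'TTTTGGGGCC\t3\t11\t8e-05']

box = FakeResultsBox()
show_results(box, fake_results)
rows = box.text.splitlines()
```

Each row can then be split on the tab character to recover the individual fields.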

Creating an interface for the motif finding script, part 7

motifs, wxPython Comments Off

Let’s get back to the last post and check one line we entered

self.motif_width = wx.TextCtrl(panel, -1, '10', (95, 50), (40,18))

There is something in this line that I did not explain. The third parameter in the text box declaration is '10'. How does this affect our box? That’s the default text that will be displayed inside the box as soon as it is created. In our case, 10 is the motif width, and it’s the value we consider to be the most common search width.

Another aspect not explained is the run_finder function. We added a line

wx.MessageBox('It should run, eh?')

where we declare a wx.MessageBox. What is it? A message box is the usual error/information dialog that you see in most programs. In our case it is very simple, just a warning/reminder that we need to include some code there.

Next time we will connect some Python source files and make our script find some motifs.
