Finding motifs in DNA sequences: twist
Section 2 February 20th, 2007As promised, let’s change a bit our previous code, and make it more effective. If you are reading this tutorial in one-entry mode, let’s check the code
#!/usr/bin/env python
import re
dnafile = "AY162388.seq"
seqlist = open(dnafile, 'r').readlines()
temp = ''.join(seqlist)
sequence = temp.replace('\n', '')
inputfromuser = True
while inputfromuser:
inmotif = raw_input('Enter motif to search: ')
if len(inmotif) >= 1:
motif = re.compile('%s' % inmotif)
if re.search(motif, sequence):
print 'Yep, I found it'
else:
print 'Sorry, try another one'
else:
print 'Done, thanks for using motif_search'
inputfromuser = False
In order to make it more effective, let’s allow the input of any file, maybe asking for the file as soon as the script is started. To accomplish that we need to “protect” the script and make it error proof (almost at least). Yes, you thought it right: we need to check if the input file exists before opening it.
We can use try ... except statements to do that. Pretty handy. This is called an exception handler, so basically we try the validity of some command/method and depending on the result we continue our program flow or we catch the exception and do something else. Under try we put the expression to evaluate, and under excepts what to do in case of failure. Like this
try:
opening a file
except:
as the file does not exist, print something
So, let’s put in our code above. We remove this
dnafile = "AY162388.seq" seqlist = open(dnafile, 'r').readlines()
and include this lines
fileinput = True
while fileinput == True:
filename = raw_input('Enter file name:')
if len(filename) > 0:
try:
dnafile = open(filename, 'r')
fileinput = False
except:
print 'File does not exist, please try again'
else:
sys.exit()
I know, a lot of new code. But if you take a closer look, there is only three lines we have never seen: try except and the last line with sys.exit(). Here, the script tries to open the file provided as input, if it does find the file normal operation resumes, if does not, the script asks for another input. As in the other while loop, we control it with a boolean variable, and in the case of empty input we end the loop and the script, using a system command exit, in the last line of the new code.
Our script is better now, not bulletproof, but a little bit more efficient. There still a “flaw”, that you can only check one file for each run of the script. At least we not stuck to our usual DNA sequence.
Download the commented code on the repository.
October 13th, 2007 at 12:28 pm
Dear Paolo,
I think I have found a minor error in one of the sample programs in the repository.
In code_06.py, there is the line:
seqlist = open(dnafile, ‘r’).readlines()
This produces the error meassage:
TypeError: coercing Unicode: need string or buffer, file found
I don’t know exactly what that means, but I think the problem is that the variable dnafile contains, not the name of a file but the contents of a file, assigned in a previous line of the program:
dnafile = open(filename, ‘r’)
So, when I replace the error-generating line with:
seqlist = dnafile.readlines()
I don’t get the error message, and the program works just fine.
I don’t think the sample program is referred to in the tutorial. But it slowed me down for a while, so maybe it’s worth correcting. (If indeed I’m right in my analysis).
I’d like to thank you again for the tutorial. It’s very clear, and progresses smoothly from the simple to the complex.
Joe
October 16th, 2007 at 3:55 pm
Hi Joe
Thanks for spotting the error. I already corrected it. Ideally, you should change
seqlist = open(dnafile, ‘r’).readlines()
to
seqlist = open(filename, ‘r’).readlines()
Sorry for the delay in replying. I was away for a couple of days.
Paulo
October 19th, 2007 at 4:06 pm
Dear Paolo,
Probably just a novice question, but in the program code_05.py, in the line:
motif = re.compile(‘%s’, % inmotif)
why is the (‘%s’, %) necessary? Is it to convert inmotif into a string before compilation?.
The program seems to work with just:
motif = re.compile(inmotif)
and I had the impression that raw_input() always returned a string. For example, you have to convert the count in
count = raw_input(“enter an integer”)
count = int(count)
otherwise you get an error message when you then write:
count += 2
The only thing I can think of is that, if you reuse the code and inmotif doesn’t come from raw_input and isn’t a string, the ‘%s’, % would prevent an error.
Sorry, this has turned into a rather wordy question on a rather minor point.
Thanks,
Joe
October 19th, 2007 at 4:28 pm
Hi Joe
You are right, the %s are not necessary as you already have a string. Basically they do not work to convert inmotif to a string but basically put the variable in the %s argument. IT is a not a pythonic way of doing things but it is at first more didactic for those who come from other languages.
I believe in further entries I used the re.compile() without %s.
Cheers
Paulo
October 23rd, 2007 at 12:14 pm
Hi Paulo:
In the above second script, before sys.exit(), there is a new value ‘False’ reassigned to ‘fileinput’.
Is it really necessary for that? I think ’sys.exit()’, by itself, can end ‘while’ loop and script.
Thanks!
Dilmurat
October 23rd, 2007 at 1:19 pm
Hi Dilmurat,
You are right there is no need for that extra assignment. sys.exit() takes care of exiting. I changed the code already.
Thanks a lot.
Paulo