We already have a function that reads the enzymes from a dataset in a flat file (with one change: return)

def read_enzymes(file):
    resenz = {}
    start = False
    for line in file:
        if line.find('Rich Roberts') >= 0:
            start = True
            line = file.next()
        if start == True and len(line) > 10:</pre>
            buffer = line.split()
            resenz[buffer[0]] = buffer[-1].replace('^', '')

    return resenz

We now need a function to write a function that searches for the sites and a main function that accepts the parameters, coordinate the search and return the results. Looks like we are more than halfway there.

Parameters input was shown before, starting on section 3. We import the sys module and use the array inside sys.argv to send the parameters to the script. A basic skeleton of our main function would look like this

import sys
import re
import fasta

#reading the ezyme dataset in one line and storing
#enzyme information in enzymeset
enzymeset = read_enzymes(open('bionet.709', 'r'))

#storing enzyme name on a string
enzyme = sys.argv[1]
#reading a FASTA file and sotring the sequences
sequence = fasta.get_seqs(open(sys.argv[2], 'r').readlines())

That’s a start. Now we have to write a function that will check for the enzyme name entered by the user in order to check for the existence of such enzyme. Something like this would work

def check_enzyme(input, set):
    if set.has_key(input):
        return True
    else:
        return False

This basically tests of the dictionary contains the name entered. If yes then we return True, otherwise False is returned. This changes our main script body

import sys
import re
import fasta

#reading the ezyme dataset in one line and storing
#enzyme information in enzymeset
enzymeset = read_enzymes(open('bionet.709', 'r'))

#storing enzyme name on a string
enzyme = sys.argv[1]
#check if the name entered is valid
isname = check_enzyme(enzyme, enzymeset)

#if it is valid, continue, otherwise abort
if isname:
    #reading a FASTA file and sotring the sequences
    sequence = fasta.get_seqs(open(sys.argv[2], 'r').readlines())</pre>
else:
    print 'Name invalid. Please try again.'

So, we have a good idea on what to do now. We just need a function that creates a regular expression and searches it on the sequence. Next time …