Fasta module: generating reverse complement of DNA sequences
Phase 2 March 11th, 2008As shown in the GenBank DNA parser script, it is really useful to have the ability to get the reverse complement of some DNA sequences. The reverse complement of a 5′-3′ DNA sequence is on its complementary strand. Using our fasta module it is easy to implement a function to generate the antiparallel sequence
Basically we can do this in two step, one - obtaining the complement and two - reversing it.
def complement(seq):
abuffer = []
for i in seq:
if i == 'A':
abuffer.append('T')
elif i == 'C':
abuffer.append('G')
elif i == 'T':
abuffer.append('A')
elif i == 'G':
abuffer.append('C')
return abuffer
That’s not a Pythonic approach, but it works reasonably well for short sequences. What would be a Pythonic approach to it? Using dictionaries and list comprehension. From the Python online documentation:
List comprehensions provide a concise way to create lists without resorting to use of map(), filter() and/or lambda. The resulting list definition tends often to be clearer than lists built using those constructs. Each list comprehension consists of an expression followed by a for clause, then zero or more for or if clauses. The result will be a list resulting from evaluating the expression in the context of the for and if clauses which follow it.
It is basically a specific way to modify lists using a loop and if clauses when needed. In a couple of lines we can do what we accomplished in 10 lines with the above excerpt. First we need a dictionary to set how the nucleotides are related
complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
and then a list comprehension to modify each nucleotide
complseq = [complement[base] for base in seq]
The list comprehension menas: for each base in the sequence get the dictionary value for each key (a nucleotide in the initial sequence). The complete function would be
def complement(seq):
complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
complseq = [complement[base] for base in seq]
return complseq
One step solved. We are already able to get the complement, now we need to reverse it. Simple. A Python list method automatically reverts list elements. Basically what we need then is below
def reverse_complement(seq):
seq = list(seq)
seq.reverse()
return ''.join(complement(seq))
The call from any script to this function would be simply be
fasta.reverse_complement('AACCGGTTTTGGCCAA')
Batteries included, indeed.
more can be found here. that served as an inpiration for the funtion
I am updating the fasta.py file in the repository soon
March 24th, 2008 at 4:50 am
hmm got a 403 error while downloading ur fasta.py in the repos
March 24th, 2008 at 7:08 am
Hi
It should be OK now. Thanks for pointing out.
April 21st, 2008 at 4:01 pm
This is overly complex, a better solution via Andrew Dalke
http://www.dalkescientific.com/writings/NBN/python_intro/functions.html
rcomp=sequence.translate(string.maketrans(”ATCG”, “TAGC”))[::-1]
April 21st, 2008 at 8:01 pm
Hi Bruce
Yes, I was aware of that solution, thanks for pointing out. I haven’t introduced some of the string methods, so I tried to keep it simple for the time being. Less pythonic I admit, but still simple to understand. Similar to what is on
http://www.onlamp.com/pub/a/python/2002/10/17/biopython.html?page=4
Paulo
June 3rd, 2008 at 1:59 pm
[...] instead of a “pure” generator it uses generator expressions which a very similar to list comprehensions but with parentheses instead of square brackets. Generator expressions are generators that can be [...]