Obtaining overrepresented motifs in DNA sequences, part 7
Phase 2, motifs
June 3rd, 2008
Continuing with Mike’s functions to obtain motif quorums, this time we look at functions 3, 4 and 5. Function get_quorums_03 uses an old friend of the blog: sets. Recall that sets are similar to lists, but they are unordered and their items are unique.
from collections import defaultdict

def get_quorums_03(seqs, mlen):
    """
    add seq id_no to a set
    use explicit counter to create seq_no
    """
    quorum = defaultdict(set)
    id_no = 0
    for seq in seqs:
        id_no += 1
        # + 1 so that the last window of length mlen is included
        for n in range(len(seq) - mlen + 1):
            quorum[seq[n:n+mlen]].add(id_no)
    return quorum
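To see why the set saves us a membership check, here is a minimal sketch on toy data (the `hits` list and the variable names are made up for illustration and are not part of Mike’s code). It stores the same hits once in a defaultdict of lists and once in a defaultdict of sets.

```python
from collections import defaultdict

# Toy data: motif "AC" is seen twice in sequence 1 and once in sequence 2.
hits = [("AC", 1), ("AC", 1), ("AC", 2)]

as_list = defaultdict(list)   # needs an explicit membership check
as_set = defaultdict(set)     # duplicates are dropped automatically

for motif, id_no in hits:
    if id_no not in as_list[motif]:
        as_list[motif].append(id_no)
    as_set[motif].add(id_no)

print(as_list["AC"])          # [1, 2]
print(sorted(as_set["AC"]))   # [1, 2]
print(len(as_set["AC"]))      # 2, the number of sequences containing "AC"
```

Both containers end up with the same ids, but the set version does it without the `if`, and the length of each set is directly the number of sequences in which the motif occurs.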
Basically, the sequence numbers (an incremented counter) are added to a defaultdict whose default value is an empty set. This way you don’t need to check whether the sequence number is already stored for a motif; you can count on the set to keep its items unique. Function 4 is very similar to function 3, the difference being that it uses enumerate (as in function 02) to generate the sequence numbers, which therefore start at 0 instead of 1.
def get_quorums_04(seqs, mlen):
    """
    add seq id_no to a set
    use 'enumerate' to create seq_no
    """
    quorum = defaultdict(set)
    for id_no, seq in enumerate(seqs):
        # + 1 so that the last window of length mlen is included
        for n in range(len(seq) - mlen + 1):
            quorum[seq[n:n+mlen]].add(id_no)
    return quorum
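One visible difference between the explicit counter and enumerate is where the numbering starts: the counter gives 1, 2, 3, ... while enumerate gives 0, 1, 2, ... by default. The sketch below uses a hypothetical helper, `quorum_ids`, which is not one of Mike’s functions; it follows the same idea as get_quorums_04 but passes a start value to enumerate. The `+ 1` in the range is our assumption that the final window should be counted.

```python
from collections import defaultdict

def quorum_ids(seqs, mlen, start=0):
    """Hypothetical helper: like get_quorums_04, with a configurable first id."""
    quorum = defaultdict(set)
    for id_no, seq in enumerate(seqs, start):
        # + 1 so the last full-length window is included (assumed intent)
        for n in range(len(seq) - mlen + 1):
            quorum[seq[n:n + mlen]].add(id_no)
    return quorum

seqs = ["ACGT", "CGTA"]
print(sorted(quorum_ids(seqs, 3)["CGT"]))           # [0, 1]
print(sorted(quorum_ids(seqs, 3, start=1)["CGT"]))  # [1, 2]
```

The set sizes, which are what matter for a quorum, are the same either way; only the labels of the sequences change.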
Function 5 adds a twist: an enumerate over a range supplies the start and stop of each window (the motif/word width). The window slides according to the (start, stop) tuples produced by enumerate, instead of computing the stop as n+mlen as in the other functions. Again, the defaultdict is initialized with set and the sequence numbers are generated by an enumerate.
def get_quorums_05(seqs, mlen):
    """
    add seq id_no to a set
    use 'enumerate' to create seq_no
    use enumerate(range(...)) to create start/stop indices for motif
    """
    quorum = defaultdict(set)
    for id_no, seq in enumerate(seqs):
        # stop runs up to len(seq) inclusive so the last window is included
        for s, e in enumerate(range(mlen, len(seq) + 1)):
            quorum[seq[s:e]].add(id_no)
    return quorum
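To make the enumerate(range(...)) trick concrete, here is a small sketch on a made-up sequence. It assumes the stop index should run up to len(seq) inclusive, so that the final window is produced, and checks that the pairs give the same windows as the n:n+mlen slicing.

```python
seq = "ACGTAC"   # toy sequence, for illustration only
mlen = 4

# enumerate counts 0, 1, 2, ... (the start) while range supplies the stop
windows = list(enumerate(range(mlen, len(seq) + 1)))
print(windows)                    # [(0, 4), (1, 5), (2, 6)]

by_pairs = [seq[s:e] for s, e in windows]
by_offset = [seq[n:n + mlen] for n in range(len(seq) - mlen + 1)]
print(by_pairs)                   # ['ACGT', 'CGTA', 'GTAC']
print(by_pairs == by_offset)      # True
```

Since the start always trails the stop by exactly mlen, both approaches visit the same windows; the enumerate version simply avoids the addition inside the slice.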