<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Revisiting Pfam alignments: using defaultdicts, chains &#8230;</title>
	<atom:link href="http://python.genedrift.org/2008/05/02/revisiting-pfam-alignments/feed/" rel="self" type="application/rss+xml" />
	<link>http://python.genedrift.org/2008/05/02/revisiting-pfam-alignments/</link>
	<description>a step-by-step guide to create Python applications in bioinformatics</description>
	<lastBuildDate>Sun, 02 May 2010 04:24:01 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1-alpha</generator>
	<item>
		<title>By: nes</title>
		<link>http://python.genedrift.org/2008/05/02/revisiting-pfam-alignments/comment-page-1/#comment-12451</link>
		<dc:creator>nes</dc:creator>
		<pubDate>Tue, 06 May 2008 15:07:46 +0000</pubDate>
		<guid isPermaLink="false">http://python.genedrift.org/?p=97#comment-12451</guid>
		<description>Just a personal opinion of style (it does not change the algorithms in any way): I prefer defaultdict over setdefault and def over named lambdas; they seem easier to read to me.
I prefer
groups=defaultdict(list)
groups[key(item)].append(item)

over
groups={}
groups.setdefault(key(item), []).append(item)
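
A minimal sketch of the two idioms side by side, assuming a hypothetical list of words and a key function that takes the first letter (neither comes from the original post). Both loops should build the same grouping:

```python
from collections import defaultdict

# Hypothetical sample data and key function, for illustration only.
items = ["apple", "avocado", "banana", "blueberry", "cherry"]

def key(item):
    return item[0]

# Idiom 1: defaultdict creates the empty list on first access to a missing key.
groups_dd = defaultdict(list)
for item in items:
    groups_dd[key(item)].append(item)

# Idiom 2: a plain dict, with setdefault supplying the empty list when needed.
groups_sd = {}
for item in items:
    groups_sd.setdefault(key(item), []).append(item)
```

One practical difference: reading a missing key from the defaultdict inserts an empty list as a side effect, while the plain dict raises KeyError.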


and
def keyfunc(it): return it.name[it.name.find(&#039;&#124;&#039;) + 1 : it.name.find(&#039;/&#039;)]

over
keyfunc = lambda it: it.name[it.name.find(&#039;&#124;&#039;) + 1 : it.name.find(&#039;/&#039;)]</description>
		<content:encoded><![CDATA[<p>Just a personal opinion of style (it does not change the algorithms in any way): I prefer defaultdict over setdefault and def over named lambdas; they seem easier to read to me.<br />
I prefer<br />
groups=defaultdict(list)<br />
groups[key(item)].append(item)</p>
<p>over<br />
groups={}<br />
groups.setdefault(key(item), []).append(item)</p>
<p>and<br />
def keyfunc(it): return it.name[it.name.find('|') + 1 : it.name.find('/')]</p>
<p>over<br />
keyfunc = lambda it: it.name[it.name.find('|') + 1 : it.name.find('/')]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Paulo Nuin</title>
		<link>http://python.genedrift.org/2008/05/02/revisiting-pfam-alignments/comment-page-1/#comment-12263</link>
		<dc:creator>Paulo Nuin</dc:creator>
		<pubDate>Sat, 03 May 2008 17:36:33 +0000</pubDate>
		<guid isPermaLink="false">http://python.genedrift.org/?p=97#comment-12263</guid>
		<description>Hi Tal

I will post your comment in a regular post in order to format it. Thanks a lot for your comment.

Cheers</description>
		<content:encoded><![CDATA[<p>Hi Tal</p>
<p>I will post your comment in a regular post in order to format it. Thanks a lot for your comment.</p>
<p>Cheers</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tal</title>
		<link>http://python.genedrift.org/2008/05/02/revisiting-pfam-alignments/comment-page-1/#comment-12253</link>
		<dc:creator>Tal</dc:creator>
		<pubDate>Sat, 03 May 2008 13:48:37 +0000</pubDate>
		<guid isPermaLink="false">http://python.genedrift.org/?p=97#comment-12253</guid>
		<description>Two suggestions for simplifying the code:

defaultdict is not really required here; the same could be done by initializing data to a normal empty dict (i.e. data={}) and using: data.setdefault(ident, []).append(...)

Generally, while this is nice for practicing your programming skills, itertools.groupby can do most of this work for you. Here&#039;s how (not tested) - notice that this version is shorter and clearer:

def merge_seqs(data1, data2):
    from itertools import chain, groupby

    format = &quot;%s-%s-&gt;%d\n%s%s&quot;
    flist = []
    keyfunc = lambda it: it.name[it.name.find(&#039;&#124;&#039;) + 1 : it.name.find(&#039;/&#039;)]
    for item, g in groupby(sorted(chain(data1, data2), key=keyfunc), keyfunc):
        values = list(g)
        if len(values) == 2:
            jname, jseq = values[0].name, values[0].sequence
            kname, kseq = values[1].name, values[1].sequence
            flist.append(format % (jname, kname, len(jseq), jseq, kseq) )

    return flist


The only &quot;ugliness&quot; here is that groupby requires that the items in the iterator it works on be sorted in advance by the same key function. For this reason I usually just use a dict with setdefault, as I mentioned above, but in this case using groupby tidies up the code considerably.

IMO the best solution here is to use this simple groupby replacement:

def groupby_unsorted(iterator, key=lambda x:x):
    &quot;&quot;&quot;Like itertools.groupby, but doesn&#039;t require that the given iterator be
    sorted in advance, and doesn&#039;t work with infinite iterators.

    &quot;&quot;&quot;
    groups={}
    for item in iterator:
        groups.setdefault(key(item), []).append(item)
    return groups.iteritems()


This way you can use the second, more concise version of the function above without needing to sort the data first, by replacing the ugly:
groupby(sorted(chain(data1, data2), key=keyfunc), keyfunc)

with:
groupby_unsorted(chain(data1, data2), keyfunc)
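
A minimal sketch checking that the two expressions yield the same groups, assuming a hypothetical Seq record type and made-up sequence names (neither comes from the original post). It uses items() so that it runs on Python 3, where iteritems() is not available:

```python
from collections import namedtuple
from itertools import chain, groupby

# Hypothetical record type and sample names, for illustration only.
Seq = namedtuple("Seq", ["name", "sequence"])

def keyfunc(it):
    # Text between the first '|' and the first '/' in the name.
    return it.name[it.name.find("|") + 1 : it.name.find("/")]

def groupby_unsorted(iterable, key=lambda x: x):
    # Collect every item under its key; no prior sorting is needed.
    groups = {}
    for item in iterable:
        groups.setdefault(key(item), []).append(item)
    return groups.items()

data1 = [Seq("A1|SP_ONE/1-4", "ACDE"), Seq("A2|SP_TWO/1-4", "FGHI")]
data2 = [Seq("B1|SP_TWO/5-8", "KLMN"), Seq("B2|SP_ONE/5-8", "PQRS")]

# Sorted variant: groupby only merges adjacent items, so sort by the same key first.
via_sorted = {
    k: list(g)
    for k, g in groupby(sorted(chain(data1, data2), key=keyfunc), keyfunc)
}

# Unsorted variant: the dict does the collecting.
via_unsorted = dict(groupby_unsorted(chain(data1, data2), keyfunc))
```

Because sorted() is stable, items keep their original relative order within each group, so the two results can be compared directly.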


Finally, I just want to say that this was a nice post, and I hope to see more like it :)</description>
		<content:encoded><![CDATA[<p>Two suggestions for simplifying the code:</p>
<p>defaultdict is not really required here; the same could be done by initializing data to a normal empty dict (i.e. data={}) and using: data.setdefault(ident, []).append(&#8230;)</p>
<p>Generally, while this is nice for practicing your programming skills, itertools.groupby can do most of this work for you. Here&#8217;s how (not tested) &#8211; notice that this version is shorter and clearer:</p>
<p>def merge_seqs(data1, data2):<br />
    from itertools import chain, groupby</p>
<p>    format = "%s-%s-&gt;%d\n%s%s"<br />
    flist = []<br />
    keyfunc = lambda it: it.name[it.name.find('|') + 1 : it.name.find('/')]<br />
    for item, g in groupby(sorted(chain(data1, data2), key=keyfunc), keyfunc):<br />
        values = list(g)<br />
        if len(values) == 2:<br />
            jname, jseq = values[0].name, values[0].sequence<br />
            kname, kseq = values[1].name, values[1].sequence<br />
            flist.append(format % (jname, kname, len(jseq), jseq, kseq) )</p>
<p>    return flist</p>
<p>The only &#8220;ugliness&#8221; here is that groupby requires that the items in the iterator it works on be sorted in advance by the same key function. For this reason I usually just use a dict with setdefault, as I mentioned above, but in this case using groupby tidies up the code considerably.</p>
<p>IMO the best solution here is to use this simple groupby replacement:</p>
<p>def groupby_unsorted(iterator, key=lambda x:x):<br />
    """Like itertools.groupby, but doesn't require that the given iterator be<br />
    sorted in advance, and doesn't work with infinite iterators.</p>
<p>    """<br />
    groups={}<br />
    for item in iterator:<br />
        groups.setdefault(key(item), []).append(item)<br />
    return groups.iteritems()</p>
<p>This way you can use the second, more concise version of the function above without needing to sort the data first, by replacing the ugly:<br />
groupby(sorted(chain(data1, data2), key=keyfunc), keyfunc)</p>
<p>with:<br />
groupby_unsorted(chain(data1, data2), keyfunc)</p>
<p>Finally, I just want to say that this was a nice post, and I hope to see more like it <img src='http://python.genedrift.org/wordpress/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
</channel>
</rss>

