<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Obtaining overrepresented motifs in DNA sequences, part 10</title>
	<atom:link href="http://python.genedrift.org/2008/06/04/obtaining-overrepresented-motifs-in-dna-sequences-part-10/feed/" rel="self" type="application/rss+xml" />
	<link>http://python.genedrift.org/2008/06/04/obtaining-overrepresented-motifs-in-dna-sequences-part-10/</link>
	<description>a step-by-step guide to create Python applications in bioinformatics</description>
	<lastBuildDate>Mon, 22 Feb 2010 18:22:18 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=3.0-alpha</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: A quick assessment of factorial functions in Python</title>
		<link>http://python.genedrift.org/2008/06/04/obtaining-overrepresented-motifs-in-dna-sequences-part-10/comment-page-1/#comment-14146</link>
		<dc:creator>A quick assessment of factorial functions in Python</dc:creator>
		<pubDate>Fri, 06 Jun 2008 16:52:41 +0000</pubDate>
		<guid isPermaLink="false">http://python.genedrift.org/?p=112#comment-14146</guid>
		<description>[...] factorials in Python (the same &#8220;problem&#8221; can be found in some other languages too). Cariaso suggested timing the execution of different factorial functions, including the ones found on [...]</description>
		<content:encoded><![CDATA[<p>[...] factorials in Python (the same &#8220;problem&#8221; can be found in some other languages too). Cariaso suggested timing the execution of different factorial functions, including the ones found on [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: bearophile</title>
		<link>http://python.genedrift.org/2008/06/04/obtaining-overrepresented-motifs-in-dna-sequences-part-10/comment-page-1/#comment-14087</link>
		<dc:creator>bearophile</dc:creator>
		<pubDate>Thu, 05 Jun 2008 09:52:57 +0000</pubDate>
		<guid isPermaLink="false">http://python.genedrift.org/?p=112#comment-14087</guid>
		<description>This is probably the fastest Python version (for small factorials):

def small_factorial(n):
    result = 1
    for i in xrange(2, n+1):
        result *= i
    return result

import psyco; psyco.bind(small_factorial)</description>
		<content:encoded><![CDATA[<p>This is probably the fastest Python version (for small factorials):</p>
<pre>def small_factorial(n):
    result = 1
    for i in xrange(2, n+1):
        result *= i
    return result</pre>
<p>import psyco; psyco.bind(small_factorial)</p>
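<p>For readers on Python 3, where xrange no longer exists and Psyco (a Python 2 tool) is not an option, a minimal sketch of the same loop might look like the following. The name small_factorial_py3 and the negative-input check are assumptions added for this example; math.factorial from the standard library is used only to cross-check the result.</p>

```python
import math


def small_factorial_py3(n):
    """Iterative factorial: the same loop as above, using range instead of xrange."""
    if n < 0:
        raise ValueError("factorial is not defined for negative numbers")
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result


# Cross-check against the standard library implementation.
print(small_factorial_py3(10) == math.factorial(10))
```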
]]></content:encoded>
	</item>
	<item>
		<title>By: Paulo Nuin</title>
		<link>http://python.genedrift.org/2008/06/04/obtaining-overrepresented-motifs-in-dna-sequences-part-10/comment-page-1/#comment-14064</link>
		<dc:creator>Paulo Nuin</dc:creator>
		<pubDate>Wed, 04 Jun 2008 23:52:54 +0000</pubDate>
		<guid isPermaLink="false">http://python.genedrift.org/?p=112#comment-14064</guid>
		<description>Thanks Andrew. I saw gmpy while searching for factorial implementations in Python. I will probably give it a try after finishing the module.

I could have used SciPy, but that would add an external module to install.

Cheers</description>
		<content:encoded><![CDATA[<p>Thanks Andrew. I saw gmpy while searching for factorial implementations in Python. I will probably give it a try after finishing the module.</p>
<p>I could have used SciPy, but that would add an external module to install.</p>
<p>Cheers</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Andrew Dalke</title>
		<link>http://python.genedrift.org/2008/06/04/obtaining-overrepresented-motifs-in-dna-sequences-part-10/comment-page-1/#comment-14063</link>
		<dc:creator>Andrew Dalke</dc:creator>
		<pubDate>Wed, 04 Jun 2008 23:30:56 +0000</pubDate>
		<guid isPermaLink="false">http://python.genedrift.org/?p=112#comment-14063</guid>
		<description>To add to Mike&#039;s comments: since you are actually computing a binomial coefficient, you can save yourself a lot of work by not computing the entire factorial.  For example code, see:

http://groups.google.com/group/comp.lang.python/msg/6e7c3358b086ff9c?dmode=source

http://www.brpreiss.com/books/opus7/programs/pgm14_10.txt

I&#039;ll also emphasize something Mike mentioned in passing.  It&#039;s better if you call this function &quot;factorial&quot;.  That makes it easier for anyone to understand what you mean, or to Google it if they don&#039;t.

If you are working with really big numbers then it&#039;s best to work in log space, as in this example: http://aspn.activestate.com/ASPN/Mail/Message/python-list/2954844 .  Note also that gmpy is yet another possible solution for you, assuming this is indeed your bottleneck.</description>
		<content:encoded><![CDATA[<p>To add to Mike&#8217;s comments: since you are actually computing a binomial coefficient, you can save yourself a lot of work by not computing the entire factorial.  For example code, see:</p>
<p><a href="http://groups.google.com/group/comp.lang.python/msg/6e7c3358b086ff9c?dmode=source" rel="nofollow">http://groups.google.com/group/comp.lang.python/msg/6e7c3358b086ff9c?dmode=source</a></p>
<p><a href="http://www.brpreiss.com/books/opus7/programs/pgm14_10.txt" rel="nofollow">http://www.brpreiss.com/books/opus7/programs/pgm14_10.txt</a></p>
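<p>As a rough illustration of the idea (this is not the code from the pages linked above), a binomial coefficient can be built up with one multiplication and one exact integer division per step, so no full factorial is ever formed. The function name binomial is an assumption made for this sketch.</p>

```python
def binomial(n, k):
    """Return C(n, k) without computing any full factorial."""
    if k < 0 or k > n:
        return 0
    k = min(k, n - k)  # C(n, k) == C(n, n - k), so loop over the smaller side
    result = 1
    for i in range(1, k + 1):
        # After this step result == C(n - k + i, i), so the division is exact.
        result = result * (n - k + i) // i
    return result


print(binomial(10, 3))
```

<p>On Python 3.8 and later, math.comb in the standard library computes the same quantity.</p>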
<p>I&#8217;ll also emphasize something Mike mentioned in passing.  It&#8217;s better if you call this function &#8220;factorial&#8221;.  That makes it easier for anyone to understand what you mean, or to Google it if they don&#8217;t.</p>
<p>If you are working with really big numbers then it&#8217;s best to work in log space, as in this example: <a href="http://aspn.activestate.com/ASPN/Mail/Message/python-list/2954844" rel="nofollow">http://aspn.activestate.com/ASPN/Mail/Message/python-list/2954844</a> .  Note also that gmpy is yet another possible solution for you, assuming this is indeed your bottleneck.</p>
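<p>One way to sketch the log-space approach, assuming floating-point precision is acceptable for the statistics involved, is to use the identity ln(n!) = lgamma(n + 1) with math.lgamma from the standard library. The helper names log_factorial and log_binomial are assumptions for this example, not taken from the linked message.</p>

```python
import math


def log_factorial(n):
    """Natural log of n!, using ln(n!) = lgamma(n + 1)."""
    return math.lgamma(n + 1)


def log_binomial(n, k):
    """Natural log of C(n, k), computed without forming any huge integer."""
    return log_factorial(n) - log_factorial(k) - log_factorial(n - k)


# Small values can be converted back with exp() to sanity-check the result.
print(math.exp(log_binomial(10, 3)))

# Large values stay finite in log space even though C(n, k) itself is enormous.
print(log_binomial(100000, 50000))
```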
]]></content:encoded>
	</item>
	<item>
		<title>By: Paulo Nuin</title>
		<link>http://python.genedrift.org/2008/06/04/obtaining-overrepresented-motifs-in-dna-sequences-part-10/comment-page-1/#comment-14061</link>
		<dc:creator>Paulo Nuin</dc:creator>
		<pubDate>Wed, 04 Jun 2008 23:06:36 +0000</pubDate>
		<guid isPermaLink="false">http://python.genedrift.org/?p=112#comment-14061</guid>
		<description>Hi Cariaso

http://blindscientist.genedrift.org/2008/04/21/xkcd-got-it-wrong/

Thanks for the comment. I agree with you that premature optimization is sometimes bad; I face challenges like this almost every day.

The discussion you pointed out should have been linked. I meant to add it (it was there that I got a good grip on the implementations) but forgot at the last minute.

I will add some more rigorous testing; I was already writing something with Python&#039;s time module to check.

Speaking from personal experience, the bottleneck of such applications is usually the factorial calculation. I implemented a similar algorithm in C++ using MAPM (http://www.tc.umn.edu/~ringx004/mapm-main.html) and in the end opted for a &quot;dumb&quot; memoization by pre-calculating or loading all possible factorial values for my sample size.

I will post about the stats module first; then I will have something on the factorial test.

Cheers</description>
		<content:encoded><![CDATA[<p>Hi Cariaso</p>
<p><a href="http://blindscientist.genedrift.org/2008/04/21/xkcd-got-it-wrong/" rel="nofollow">http://blindscientist.genedrift.org/2008/04/21/xkcd-got-it-wrong/</a></p>
<p>Thanks for the comment. I agree with you that premature optimization is sometimes bad; I face challenges like this almost every day.</p>
<p>The discussion you pointed out should have been linked. I meant to add it (it was there that I got a good grip on the implementations) but forgot at the last minute.</p>
<p>I will add some more rigorous testing; I was already writing something with Python&#8217;s time module to check.</p>
<p>Speaking from personal experience, the bottleneck of such applications is usually the factorial calculation. I implemented a similar algorithm in C++ using MAPM (<a href="http://www.tc.umn.edu/~ringx004/mapm-main.html" rel="nofollow">http://www.tc.umn.edu/~ringx004/mapm-main.html</a>) and in the end opted for a &#8220;dumb&#8221; memoization by pre-calculating or loading all possible factorial values for my sample size.</p>
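<p>In Python, the pre-calculation idea could be sketched roughly as below. This is an illustration, not the C++/MAPM code mentioned above; the names build_factorial_table and FACTORIALS, and the sample size of 100, are hypothetical.</p>

```python
def build_factorial_table(max_n):
    """Return a list where table[n] == n! for every n from 0 to max_n."""
    table = [1] * (max_n + 1)
    for n in range(1, max_n + 1):
        table[n] = table[n - 1] * n
    return table


# Hypothetical sample size; in practice this would match the data set.
FACTORIALS = build_factorial_table(100)


def factorial(n):
    """Look up n! in the precomputed table instead of recomputing it."""
    if not 0 <= n < len(FACTORIALS):
        raise ValueError("n is outside the precomputed range")
    return FACTORIALS[n]


print(factorial(10))
```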
<p>I will post about the stats module first; then I will have something on the factorial test.</p>
<p>Cheers</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: cariaso</title>
		<link>http://python.genedrift.org/2008/06/04/obtaining-overrepresented-motifs-in-dna-sequences-part-10/comment-page-1/#comment-14054</link>
		<dc:creator>cariaso</dc:creator>
		<pubDate>Wed, 04 Jun 2008 20:09:50 +0000</pubDate>
		<guid isPermaLink="false">http://python.genedrift.org/?p=112#comment-14054</guid>
		<description>http://xkcd.com/386/
Enjoying the series, but I disagree strongly with the entire last paragraph. 

As long as you can call the faster version with the same interface as the simpler version there is no reason to go for a more complicated version until you&#039;ve done some actual testing. Premature optimization is the root of all evil. 

This is still called &#039;beginning python for bioinformatics&#039;. I think we&#039;re better off with a reminder to Google &#039;python factorial&#039; and to read a full discussion of 13 different implementations.
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/67668

As long as you hide it behind

def factorial(n):

I don&#039;t care which implementation you pick until I see 
1. that the program isn&#039;t fast enough
2. that the factorial function is a bottleneck

If you can do that, I&#039;d really like to see your testing. 
http://blog.doughellmann.com/2007/09/pymotw-timeit.html
since that seems very useful for this audience

at which point I&#039;ll probably suggest that 
http://en.wikipedia.org/wiki/Memoization

is often a better solution to the larger problem.</description>
		<content:encoded><![CDATA[<p><a href="http://xkcd.com/386/" rel="nofollow">http://xkcd.com/386/</a><br />
Enjoying the series, but I disagree strongly with the entire last paragraph. </p>
<p>As long as you can call the faster version with the same interface as the simpler version there is no reason to go for a more complicated version until you&#8217;ve done some actual testing. Premature optimization is the root of all evil. </p>
<p>This is still called &#8216;beginning python for bioinformatics&#8217;. I think we&#8217;re better off with a reminder to Google &#8216;python factorial&#8217; and to read a full discussion of 13 different implementations.<br />
<a href="http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/67668" rel="nofollow">http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/67668</a></p>
<p>As long as you hide it behind </p>
<p>def factorial(n):</p>
<p>I don&#8217;t care which implementation you pick until I see<br />
1. that the program isn&#8217;t fast enough<br />
2. that the factorial function is a bottleneck</p>
<p>If you can do that, I&#8217;d really like to see your testing.<br />
<a href="http://blog.doughellmann.com/2007/09/pymotw-timeit.html" rel="nofollow">http://blog.doughellmann.com/2007/09/pymotw-timeit.html</a><br />
since that seems very useful for this audience</p>
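<p>A test of that kind might be sketched with the standard library's timeit module as follows. The helper time_call, the candidate loop_factorial, and the repeat and number settings are assumptions chosen for this example; absolute timings will differ from machine to machine.</p>

```python
import math
import timeit


def loop_factorial(n):
    """Plain iterative factorial, used here as one candidate implementation."""
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result


def time_call(func, n, repeat=5, number=1000):
    """Best time in seconds, over `repeat` runs, for `number` calls of func(n)."""
    timer = timeit.Timer(lambda: func(n))
    return min(timer.repeat(repeat=repeat, number=number))


if __name__ == "__main__":
    candidates = [("loop", loop_factorial), ("math.factorial", math.factorial)]
    for name, func in candidates:
        print(name, time_call(func, 50))
```

<p>Taking the minimum of several repeats is the usual way to reduce noise from other processes on the machine.</p>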
<p>at which point I&#8217;ll probably suggest that<br />
<a href="http://en.wikipedia.org/wiki/Memoization" rel="nofollow">http://en.wikipedia.org/wiki/Memoization</a></p>
<p>is often a better solution to the larger problem.</p>
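<p>As one possible reading of this suggestion, memoization can sit behind the same factorial(n) interface so that callers never see it. The sketch below uses functools.lru_cache from the standard library; the unbounded cache size is an assumption that only makes sense when the set of distinct n values is small.</p>

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def factorial(n):
    """Factorial whose results are cached, so repeated calls become lookups."""
    if n < 0:
        raise ValueError("factorial is not defined for negative numbers")
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result


factorial(30)                  # computed once
factorial(30)                  # served from the cache
print(factorial.cache_info())  # reports cache hits and misses
```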
]]></content:encoded>
	</item>
</channel>
</rss>

