A quick assessment of factorial functions in Python
Phase 2 June 6th, 2008A short pause on the motifs subject. As mentioned on comments there is a lot of different ways of calculating factorials in Python (the same “problem” can be found in some other languages too). Cariaso suggested to time the execution of different factorial functions, including the ones found on Python’s cookbook (which I should have included in the beginning of last post). Anyway all functions from the webpage were included, as the one mentioned on a comment and both functions seen here. Using timeit (thanks Mike!) the execution time of all of them were measured by calculating the factorial of 800 and 4000. First, the functions:
def fac_01(n):
result = 1
for i in xrange(2, n+1):
result *= i
return result
def fac_02(n):
value = reduce(lambda i, j : i * j, range(1, n + 1))
return value
def fac_03(n):
import operator
value = reduce(operator.mul, xrange(2, n + 1))
return value
def fac_04(n):
fac = lambda n:n-1 + abs(n-1) and fac(n-1)*long(n) or 1
return fac(n)
def fac_05(n):
fac = lambda n:[1,0][n>0] or fac(n-1)*n
return fac(n)
def fac_06(n):
fac = lambda n:reduce(lambda a,b:a*(b+1),range(n),1)
return fac(n)
def fac_07(n):
fac=lambda n: [1, 0][n > 0] or reduce(lambda x, y: x*y, xrange(1,n + 1))
return fac(n)
def fac_08(n):
fac = lambda n: n <= 0 or reduce(lambda a, b: a*b, xrange(1,n + 1))
return fac(n)
def fac_09(n):
fac = lambda n: [[[j for j in (j * i,)][0] for i in range(2, n+1)][-1] for j in (1,)][0]
return fac(n)
def fac_10(n):
fac = lambda n: [j for j in [1] for i in range(2, n+1) for j in [j * i]] [-1]
return fac(n)
fac_01 was suggested by bearophile on a comment (no psyco import here), fac_02 was the first one we have seen in the blog, fac_03 was the one “selected as the fastest” (mentioned on comments too) and functions 4 to 10 were gathered from the cookbook, all of them using lambda. Now, to the results (no fancy graphs, yet), average over 1000 repeats, time in seconds:
800!
fac_01: 0.0010
fac_02: 0.0012
fac_03: 0.0010
fac_04: 0.0022
fac_05: 0.0020
fac_06: 0.0015
fac_07: 0.0013
fac_08: 0.0013
fac_09: 0.0017
fac_10: 0.0015
4000!
fac_01: 0.0226
fac_02: 0.0242
fac_03: 0.0230
fac_04: N/A
fac_05: N/A
fac_06: 0.0244
fac_07: 0.0239
fac_08: 0.0241
fac_09: 0.0380
fac_10: 0.0362
In both cases the “best” functions were 1 and 3. For the smallest factorial (800!) 1 and 3 didn’t have a lot of advantage for 2, 6-10, while 4 and 5 were 2 times slower. For the largest factorial (4000!) 1 was 4% better than 3, and the third best (6) was also 4% slower than 3. Why do 4 and 5 have no computed time? Because they are recursive and it blows the recursion limit in Python.
So I guess it is a safe bet to use either function 1 or 3, with a slightly advantage for 3 in the case of large factorials. But at the same time a believe for our case of overrepresented motifs, it would be better to pre-calculate the factorials, store them and access the value when needed. Andrew’s idea of using binomials is also interesting, and I am planning to test it here. He also gave a couple of other suggestions that would need extra imports, scipy and gmpy.

June 6th, 2008 at 12:47 pm
Extra modules for the combinatorial functions? I gave two links to Python functions for choose(n, k). The one from Jussi Piitulainen was
def choose(n, k):
if 0 <= k <= n:
ntok = 1
ktok = 1
for t in xrange(1, min(k, n – k) + 1):
ntok *= n
ktok *= t
n -= 1
return ntok // ktok
else:
return 0
(I hope that comes out in the blog formatting. See http://groups.google.com/group/comp.lang.python/msg/6e7c3358b086ff9c?dmode=source )
For small numbers, like all choose(n,k) where 0 <= k <= n <= 800 then you might as well precompute everything, or use cariaso’s suggestion of memoizing.
June 6th, 2008 at 1:19 pm
Hi Andrew,
Yep, I forgot about that one. I was going to test that function too but decided not. The extra modules that I was referring to were the gmpy and another that uses the scipy (http://aspn.activestate.com/ASPN/Mail/Message/python-list/2954844).
In my opnion the precomputing everything is the fastest one, as I did with C++.
I will update the post. Sorry about that.
June 6th, 2008 at 1:39 pm
Sage ( a really big Python module with a lot of math functionality http://www.sagemath.org ) also provides its own wrapper over GMP integers:
sage: timeit(“factorial(4000)”)
625 loops, best of 3: 502 µs per loop
sage: timeit(“factorial(40000)”)
25 loops, best of 3: 27.5 ms per loop
–Mike
June 7th, 2008 at 8:42 pm
Here’s a faster implementation for large N. This takes into account that the real cost here is the cost of multiplying two bigints (python long objects). The goal is to keep the terms involved below a limit, until as late in the game as is possible.
def fastfact(n):
limit = 1 << (1 < limit:
terms.append(accum)
accum = 1
if accum > 1:
terms.append(accum)
return pairwise_reduce(operator.mul, terms)
def pairwise_reduce(op, terms):
while len(terms) > 1:
if len(terms) % 2 == 0:
terms = map(op, terms[0::2], terms[1::2])
else:
terms = map(op, terms[0:-1:2], terms[1:-1:2]) + [terms[-1]]
return terms[0]
On my machine, this algorithm takes:
100 loops, best of 3: 7.29 msec per loop
Whereas fact_03 takes:
100 loops, best of 3: 11.8 msec per loop
This differences become more staggering for larger n.
June 7th, 2008 at 8:46 pm
Ah, loss of indentation. And parts of my code have disappeared, as well. Is this indicative of bad character escaping?
Let me know how to fix this, and I’ll re-post the algorithm.
June 7th, 2008 at 9:04 pm
Here’s an experiment. I hope it works.
To get the code, run the following python:
s.replace(“_LT_”, “”, “_GT_”).replace(“…”, ” “)
where s is a string containing everything between BEGIN and END.
BEGIN
import…operator
def…factA(n):
…………m…=…operator.mul
…………return…reduce(operator.mul,…xrange(2,…n+1),…1)
def…factB(n):
…………limit…=…1…_LT__LT_…(1…_LT__LT_…10)……#…limit…found…experimentally…via…timing…tests
…………terms…=…[...]
…………accum…=…1
…………for…i…in…range(2,…n+1):
……………………accum…*=…i
……………………if…accum…_GT_…limit:
………………………………terms.append(accum)
………………………………accum…=…1
…………if…accum…_GT_…1:
……………………terms.append(accum)
…………return…pairwise_reduce(operator.mul,…terms)
def…pairwise_reduce(op,…terms):
…………while…len(terms)…_GT_…1:
……………………if…len(terms)…%…2…==…0:
………………………………terms…=…map(op,…terms[0::2],…terms[1::2])
……………………else:
………………………………terms…=…map(op,…terms[0:-1:2],…terms[1:-1:2])…+…[terms[-1]]
…………return…terms[0]
def…test(n):
…………for…i…in…range(2,…n):
……………………if…i…%…1000…==…0:
………………………………print…”testing…at…%i…”…%…i
……………………x…=…factA(i)
……………………y…=…factB(i)
……………………if…x…!=…y:
………………………………print…”factB…failed…at…%i”…%…i
………………………………break
if…__name__…==…’__main__’:
…………test(5000)
END
June 7th, 2008 at 9:08 pm
Ah beautiful, the instructions still didn’t come through. I should have realized this would happen. Let’s try:
print s.replace(“_LT_”, chr(60)).replace(“_GT_”, chr(62)).replace(“…”, ” “)
June 7th, 2008 at 9:10 pm
Okay, now I’m just testing things. Things I would probably know if I posted comments on blogs very often:
Does this look like formatted code: & < > + – / *
INDENTED LINE
FLUSH LEFT
blah
June 7th, 2008 at 9:13 pm
Ok, I give up. Very sorry about the comment spam. Moderator, if there’s a better way to express what I’m trying to say, please feel free to remove most of these comments. I’d rather post the code in a readable and useful way, but I’m at a total loss of how to do this.
June 7th, 2008 at 9:40 pm
Hi Dave
If you want send me the code in an email (nuin at genedrift dot org) and I can post it. I still didn’t find a way to make the comments “code-friendly”.
Thanks
November 6th, 2008 at 3:31 pm
[...] recorded first by mrklaw on 2008-11-06→ Paulo Nuin: A quick assessment of factorial functions in Python [...]