# An example of fast numerical computation using Haskell

Posted on August 24, 2013

# Problem statement

Consider the sequence of natural numbers obtained from iterating

an+1 = an/2 if an is even, else (3an + 1)/2

starting with a given a0. This is a slightly optimized version of the sequence in the Collatz conjecture. For example, with a0 = 12, the sequence is [12, 6, 3, 5, 8, 4, 2, 1, 2, 1, ...]. The sequence is also called the hailstone sequence and the mathematician Collatz conjectured that it reaches 1 for all values of a0. When it reaches 1, it is stuck in the cycle [1, 2, 1, 2 ...], so we will truncate the sequence at 1.

In the current problem, we would like find that a0 (in some range of numbers) which results in the longest hailstone sequence.

# C implementation

collatz.c

``````#include <stdio.h>

int main(int argc, char **argv) {
int max_a0 = atoi(argv[1]);
int longest = 0, max_len = 0;
int a0, len;
unsigned long a;

for (a0 = 1; a0 <= max_a0; a0++) {
a = a0;
len = 0;

while (a != 1) {
len++;
a = ((a%2==0)? a : 3*a+1)/2;
}

if (len > max_len) {
max_len = len;
longest = a0;
}
}
printf("(%d, %d)\n", max_len, longest);
return 0;
}``````

On my machine (with gcc 4.7.3), it takes 0.4 seconds to find the longest sequence for the first million numbers.

``````\$ gcc -O2 collatz.c
\$ time ./a.out 1000000
(329, 837799)

real	0m0.429s
user	0m0.424s
sys	0m0.004s``````

The Haskell code is quite short and more importantly, it seems to express the idea rather than give a particular implementation.

collatz.hs

``````import Data.Word
import Data.List
import System.Environment

collatzNext :: Word32 -> Word32
collatzNext a = (if even a then a else 3*a+1) `div` 2

collatzLen :: Word32 -> Int
collatzLen a0 = length \$ takeWhile (/= 1) \$ iterate collatzNext a0

main = do
[a0] <- getArgs
print \$ maximum \$ map (\a0 -> (collatzLen a0, a0)) [1..max_a0]``````

But, it takes an atrocious 6 seconds for the same computation (with ghc 7.6.2 and llvm 3.2)

``````\$ ghc -O2 -fllvm collatz.hs
\$ time ./collatz
(329,837799)

real	0m5.994s
user	0m5.944s
sys	0m0.040s``````

At this point, one should ideally run the Haskell program with profiling enabled and see what's slowing it down. But for now, I'll venture a guess that it has something to do with the multiple lists being created by `iterate` and `takeWhile` which are finally reduced by `length`. To test this, let's write a function `lenIterWhile` that combines the three into one without generating any intermediate lists.

collatz1.hs

``````import Data.Word
import Data.List
import System.Environment

collatzNext :: Word32 -> Word32
collatzNext a = (if even a then a else 3*a+1) `div` 2

-- new code
collatzLen :: Word32 -> Int
collatzLen a0 = lenIterWhile collatzNext (/= 1) a0

lenIterWhile :: (a -> a) -> (a -> Bool) -> a -> Int
lenIterWhile next notDone start = len start 0 where
len n m = if notDone n
then len (next n) (m+1)
else m
-- End of new code

main = do
[a0] <- getArgs
print \$ maximum \$ map (\a0 -> (collatzLen a0, a0)) [1..max_a0]``````

This brings the time down to 0.54 seconds! This is quite close to the C speed (0.43 s) but it comes at the cost of some code readability.

# Stream fusion

What we just did with `lenIterWhile` is called stream fusion and has been implemented as a Haskell library that replaces the common list processing functions in Prelude and Data.List. You can install the stream fusion library with `cabal install stream-fusion`. Armed with this library, we can get rid of the ugly `lenIterWhile` and write instead:

collatz2.hs

``````import Data.Word
import qualified Data.List.Stream as S
import System.Environment

collatzNext :: Word32 -> Word32
collatzNext a = (if even a then a else 3*a+1) `div` 2

collatzLen :: Word32 -> Int
collatzLen a0 = S.length \$ S.takeWhile (/= 1) \$ S.iterate collatzNext a0

main = do
[a0] <- getArgs
print \$ S.maximum \$ S.map (\a0 -> (collatzLen a0, a0)) [1..max_a0]``````

Notice that we only had to modify our first Haskell code to use list functions from Data.List.Stream in place of Prelude and Data.List. Also, at 0.51 seconds, this is slightly faster than our previous code, possibly because the `map` and `maximum` instances also got fused.

# Cython implementation

To compare with another high-level language:

cycollatz.pyx

``````import sys

cdef int collatzLen(int a0):
cdef unsigned long a = a0
cdef int length = 0
while a != 1:
a = (a if a%2 == 0 else 3*a+1) / 2
length += 1
return length

def maxLen(max_a0):
cdef int max_length = 0, longest = 0, a0, length
for a0 in xrange(1, max_a0 + 1):
length = collatzLen(a0)
if length > max_length:
max_length = length
longest = a0
return max_length, longest

if __name__ == '__main__':
max_a0 = int(sys.argv[1])
print maxLen(max_a0)``````

The code is more verbose and looks rather similar to the C code. But, at 0.47 seconds (with cython 0.17.4), it is faster than the Haskell code.

``````\$ cython --embed cycollatz.pyx
\$ gcc -O2 -I/usr/include/python2.7 cycollatz.c -lpython2.7 -o cycollatz
\$ time ./cycollatz 1000000
(329, 837799)

real	0m0.470s
user	0m0.460s
sys	0m0.008s``````

# Conclusion

The different implementations are compared in the table below.