Calculation of Fibonacci numbers in logarithmic number of steps

Disclaimer: this post is pretty much a comprehensive solution to exercise 1.19 from Structure and Interpretation of Computer Programs by Harold Abelson and Gerald Jay Sussman (the solution is written in Haskell instead of Lisp though).

Let us quickly recall the definition of Fibonacci numbers as well as a few well-known ways to calculate them.

Fibonacci numbers are defined recursively, with $n$-th number denoted as $
F_n = \left\{
\begin{array}{l l}
0 & \quad \text{if $n = 0$} \\
1 & \quad \text{if $n = 1$} \\
F_{n-2} + F_{n-1} & \quad \text{otherwise.}
\end{array} \right.
$

This definition can be translated to a Haskell program in a straightforward manner:

fib1 :: Int -> Integer
fib1 0 = 0
fib1 1 = 1
fib1 n = fib1 (n-2) + fib1 (n-1)

However, computation of $n$-th Fibonacci number with the above function requires $O(log_2(F_n)n)$ space and $O(log_2(F_n)2^n)$ time: recursive calls create a binary tree of depth $n$ (hence the space requirement as you need to keep track of $n$ intermediate values, each taking up to $O(log_2(F_n))$ bits), which contains at most $2^n-1$ elements, where each node takes $O(log_2(F_n))$ time to compute as we’re using arbitrary precision arithmetic. It is worth noting that $O(log_2(F_n))$ is in fact equal to $O(n)$ as the growth of Fibonacci sequence is exponential, but the former will be used as it’s more clear.

The other, significantly more performant way is to use the iterative approach, i.e. start with $F_0$ and $F_1$ and by appropriately adding them, compute your way up to $F_n$:

fib2 :: Int -> Integer
fib2 n = go n 0 1
  where
    go k a b
      | k == 0    = a
      | otherwise = go (k - 1) b (a + b)

The initial state is $(0, 1) = (F_0, F_1)$ and each iteration we replace a state $(F_{n-k}, F_{n-k+1})$ with $(F_{n-k+1}, F_{n-k+2})$, eventually ending up with $(F_n, F_{n+1})$ and taking the first element of a pair as a result.

The performance of this variation is much better than its predecessor as its space complexity is $O(log_2(F_n))$ and time complexity is $O(log_2(F_n)n)$.

Can we do better than that? As a matter of fact, we can. First of all, observe that in the last solution, each iteration we replaced a pair of two numbers with another pair of two numbers using elementary arithmetic operations, which is basically a transformation in two dimensional vector space (let us assume $\mathbb{R}^2$). In fact, this transformation is linear and its matrix representation is $T = \begin{pmatrix} 0 & 1 \\ 1 & 1\end{pmatrix}$, because $T \cdot \begin{pmatrix} F_n \\ F_{n+1} \end{pmatrix} = \begin{pmatrix} F_{n+1} \\ F_n + F_{n+1} \end{pmatrix} = \begin{pmatrix} F_{n+1} \\ F_{n+2} \end{pmatrix}$.

With this knowledge the formula for Fibonacci numbers can be written in a compact way:

$F_n = \pi_1\left( T^n \cdot \begin{pmatrix} F_0 \\ F_1 \end{pmatrix} \right)$, where $\pi_1 : \mathbb{R}^2 \rightarrow \mathbb{R}$, $\pi_1\left(\begin{pmatrix}v_1 \\ v_2\end{pmatrix}\right) = v_1$.

There are two routes to choose from at this point: we can either use an algorithm for fast exponentiation of matrices (such as square and multiply) to calculate $T^n$ or try to apply more optimizations. We will go with the latter, because matrix multiplication is still pretty heavy operation and we can improve the situation by exploiting the structure of $T$.

First, close observation of the powers of $T$ allows us to notice the following:

Fact. $T^n = \begin{pmatrix} F_{n-1} & F_n \\ F_n & F_{n+1} \end{pmatrix}$ for $n \in \mathbb{N_+}$.

Proof. For $n = 1$ it is easy to see that $T = \begin{pmatrix}F_0 & F_1 \\ F_1 & F_2\end{pmatrix}$. Now assume that the fact is true for some $n \in \mathbb{N_+}$. Then $T^{n+1} = T^n \cdot T = \begin{pmatrix} F_{n-1} & F_n \\ F_n & F_{n+1} \end{pmatrix} \cdot \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} F_n & F_{n-1} + F_n \\ F_{n+1} & F_n + F_{n+1} \end{pmatrix} = \begin{pmatrix} F_n & F_{n+1} \\ F_{n+1} & F_{n+2} \end{pmatrix}$.

Corollary. For $n \in \mathbb{N_+}$ there exists $p$ and $q$ such that $T^n = \begin{pmatrix} p & q \\ q & p+q \end{pmatrix}$.

Equipped with this knowledge we can not only represent $T^k$ for some $k \in \mathbb{N_+}$ using only two numbers instead of four, but also derive a transformation of these numbers that corresponds to the computation of $T^{2k}$.

Fact. Let $n \in \mathbb{N_+}$, $p, q \in \mathbb{N}$. Then $\begin{pmatrix} p & q \\ q & p+q \end{pmatrix}^2 = \begin{pmatrix} p’ & q’ \\ q’ & p’+q’ \end{pmatrix}$, where $p’ = p^2 + q^2$ and $q’ = (2p + q)q$.

Now, we can put all of these together and construct the final solution:

fib3 :: Int -> Integer
fib3 n = go n 0 1 0 1
  where
    go k a b p q
      | k == 0    = a
      | odd k     = go (k - 1) (p*a + q*b) (q*a + (p+q)*b) p q
      | otherwise = go (k `div` 2) a b (p*p + q*q) ((2*p + q)*q)

Let us denote $i$-th bit of $n$ by $n_i \in \{0,1\}$, where $i \in \{0, \dots, \lfloor log_2(n) \rfloor \}$. We start with $\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} F_0 \\ F_1 \end{pmatrix}$ and $\begin{pmatrix} p \\ q \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \cong T$. Then we traverse the bits of $n$ and return $a$. Note that while iterating through $n_i$:

$\begin{pmatrix} p \\ q \end{pmatrix} \cong T^{2^i}$.
$\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} F_m \\ F_{m+1} \end{pmatrix}$ for $m = \displaystyle\sum_{j = 0}^{i-1} n_j \cdot 2^j$.
If $n_i = 1$, $\begin{pmatrix} F_m \\ F_{m+1} \end{pmatrix}$ is replaced with $T^{2^i} \cdot \begin{pmatrix} F_m \\ F_{m+1} \end{pmatrix} = \begin{pmatrix} F_{m+2^i} \\ F_{m+2^i+1} \end{pmatrix}$.

Hence, in the end $\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} F_n \\ F_{n+1} \end{pmatrix}$, so the function correctly computes $n$-th Fibonacci number.

The space complexity of this solution is $O(log_2(F_n))$, whereas its time complexity is $O(log_2(F_n)(log_2(n)+H(n)))$ with $H(n)$ being Hamming weight of $n$.

Now, let us put all of the implementations together and measure their performance using criterion library.

{-# OPTIONS_GHC -Wall #-}
{-# LANGUAGE BangPatterns #-}
module Main where

import Criterion.Main
import Criterion.Types

fib1 :: Int -> Integer
fib1 0 = 0
fib1 1 = 1
fib1 n = fib1 (n - 2) + fib1 (n - 1)

fib2 :: Int -> Integer
fib2 n = go n 0 1
  where
    go !k !a b
      | k == 0    = a
      | otherwise = go (k - 1) b (a + b)

fib3 :: Int -> Integer
fib3 n = go n 0 1 0 1
  where
    go !k !a b !p !q
      | k == 0    = a
      | odd k     = go (k - 1) (p*a + q*b) (q*a + (p+q)*b) p q
      | otherwise = go (k `div` 2) a b (p*p + q*q) ((2*p + q)*q)

main :: IO ()
main = defaultMainWith (defaultConfig { timeLimit = 2 }) [
    bgroup "fib1" $ map (benchmark fib1) $ [10, 20] ++ [30..42]
  , bgroup "fib2" $ map (benchmark fib2) $ 10000 : map (100000*) [1..10]
  , bgroup "fib3" $ map (benchmark fib3) $ 1000000 : map (10000000*) ([1..10] ++ [20])
  ]
  where
    benchmark fib n = bench (show n) $ whnf fib n

The above program was compiled with GHC 7.10.2 and run on Intel Core i7 3770. HTML report generated by it is available here.

In particular, after substituting the main function with:

main :: IO ()
main = defaultMainWith (defaultConfig { timeLimit = 2 }) [
    bgroup "fib3" [benchmark fib3 1000000000]
  ]
  where
    benchmark fib n = bench (show n) $ whnf fib n

we can see that the final implementation is able to calculate billionth Fibonacci number in a very reasonable time:
benchmarking fib3/1000000000 time 30.82 s (29.86 s .. 31.97 s) 1.000 R² (0.999 R² .. 1.000 R²) mean 30.34 s (29.96 s .. 30.56 s) std dev 345.1 ms (0.0 s .. 387.0 ms) variance introduced by outliers: 19% (moderately inflated)