The Nth Fibonacci Number in O(log N)


Reading an article about getting a job at ABBYY, I came across the following task:

Find the Nth Fibonacci number in O(log N) arithmetic operations.

Thinking it over, I realized that the only solutions coming to my mind ran in O(n) time. I found a better solution later.

I am going to use the following notation for sets:

$\mathbb{N}_0$ — non-negative integers,

$\mathbb{N}$ — positive integers.

Depending on the mathematical tradition, the set of natural numbers may or may not contain 0, so it’s preferable to state this explicitly in international mathematical texts.

The Solution

Donald E. Knuth provides a matrix identity of the following form:

$$\begin{pmatrix} F_{n+1} & F_n \\ F_n & F_{n-1} \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}^{n}, \quad n \in \mathbb{N}$$

The identity is provided without any proof, but it’s quite easy to prove it by induction.

The matrix on the right is called the Q-matrix.

Let’s denote:

$$Q = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$$

According to the identity, $Q^{n} = \begin{pmatrix} F_{n+1} & F_n \\ F_n & F_{n-1} \end{pmatrix}$. Thus, to calculate $F_n$, we should compute the matrix $Q^{n-1}$ and take the first element of its first row (the enumeration starts with 1).
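For completeness, here is the induction step written out (my sketch; the base case $n = 1$ is the definition of the Q-matrix itself):

```latex
% Induction step: if Q^n has Fibonacci entries, so does Q^{n+1}.
Q^{n+1} = Q^{n} Q
        = \begin{pmatrix} F_{n+1} & F_n \\ F_n & F_{n-1} \end{pmatrix}
          \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}
        = \begin{pmatrix} F_{n+1} + F_n & F_{n+1} \\ F_n + F_{n-1} & F_n \end{pmatrix}
        = \begin{pmatrix} F_{n+2} & F_{n+1} \\ F_{n+1} & F_n \end{pmatrix}.
```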

Since the calculation comes down to raising a matrix to a power, let’s take a look at this process in detail.

Let’s say there is a matrix $A$ to be raised to the power $n$. Suppose also that $n$ is a power of 2, i.e. $n = 2^{i}$, $i \in \mathbb{N}_0$.

We can represent the calculation of $A^{n}$ in the form of a tree: the root is $A^{n}$, below it is $A^{n/2}$, below that $A^{n/4}$, and so on down to $A$.

Meaning that:

$$A^{n} = A^{n/2} \cdot A^{n/2}$$

So, to calculate the matrix $A^{n}$, we should calculate the matrix $A^{n/2}$ and multiply it by itself. To calculate $A^{n/2}$ we would have to do the same with $A^{n/4}$, and so on.

Obviously, the tree height is $\log_2 n$.

Let’s estimate the calculation time. The matrix $Q$ is of the same size ($2 \times 2$) in any power. Therefore, we can multiply two of its powers in $O(1)$. We should perform $\log_2 n$ such multiplications, so the complexity is $O(\log n)$.
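As a sketch of this process (assuming $n$ is a power of two; the function names here are mine, not part of the final implementation below):

```python
def mat_mul(A, B):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[A[0][0]*B[0][0] + A[0][1]*B[1][0], A[0][0]*B[0][1] + A[0][1]*B[1][1]],
            [A[1][0]*B[0][0] + A[1][1]*B[1][0], A[1][0]*B[0][1] + A[1][1]*B[1][1]]]

def matrix_power_pow2(M, n):
    """Raise M to the power n, where n is assumed to be a power of 2."""
    if n == 1:
        return M
    half = matrix_power_pow2(M, n // 2)  # one tree tier down
    return mat_mul(half, half)           # log2(n) multiplications in total

Q = [[1, 1], [1, 0]]
print(matrix_power_pow2(Q, 4))  # [[5, 3], [3, 2]] -> F_5 = 5 in the top-left
```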

But what if n is not a power of 2?

There is a question now: what should we do if $n$ is not a power of 2? Any natural number can be decomposed into a sum of distinct powers of two (we do it all the time when converting a number from binary to decimal notation), i.e.:

$$n = \sum_{p \in P} 2^{p}$$

where $P \subset \mathbb{N}_0$ is the set of powers with the help of which we express $n$. Remembering that $n$ is the power of a matrix, we get:

$$A^{n} = A^{\sum_{p \in P} 2^{p}} = \prod_{p \in P} A^{2^{p}}$$


Matrix multiplication is not commutative in general, i.e. the order of operands matters. However, commutativity does hold for powers of the same matrix: $A^{x} A^{y} = A^{x+y} = A^{y} A^{x}$ for any $x, y \in \mathbb{N}_0$. Thus, we don’t have to worry about the order of operands in the product above, which simplifies things a bit.

So, we can represent the calculation algorithm in the form of the following three steps:

  1. Decompose $n$ into a sum of powers of two, $n = \sum_{p \in P} 2^{p}$.
  2. Calculate the matrix $A^{2^{p}}$ for every $p \in P$.
  3. Calculate $A^{n} = \prod_{p \in P} A^{2^{p}}$.
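A minimal end-to-end sketch of these three steps, without the memoization discussed later (the function names are mine, for illustration only):

```python
def fib_by_powers(n):
    """F_n via Q^(n-1), decomposing n-1 into distinct powers of two (n >= 1)."""
    def mat_mul(A, B):
        return [[A[0][0]*B[0][0] + A[0][1]*B[1][0], A[0][0]*B[0][1] + A[0][1]*B[1][1]],
                [A[1][0]*B[0][0] + A[1][1]*B[1][0], A[1][0]*B[0][1] + A[1][1]*B[1][1]]]

    def mat_pow2(M, p):
        """Raise M to the power p, where p is a power of 2."""
        if p == 1:
            return M
        half = mat_pow2(M, p // 2)
        return mat_mul(half, half)

    Q = [[1, 1], [1, 0]]
    # Step 1: n-1 = sum of distinct powers of two (its binary digits).
    powers = [1 << b for b, d in enumerate(reversed(bin(n - 1)[2:])) if d == '1']
    # Step 2: compute Q^(2^p) for every power in the sum.
    matrices = [mat_pow2(Q, p) for p in powers]
    # Step 3: multiply them together; the order is irrelevant, powers of Q commute.
    R = [[1, 0], [0, 1]]  # identity matrix = Q^0
    for M in matrices:
        R = mat_mul(R, M)
    return R[0][0]
```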

Let’s estimate the execution time of the given algorithm.

The first step is performed in $\Theta(k)$, where $k$ is the number of binary digits in $n$, i.e. $\Theta(\log n)$.

Since we have to perform no more than $\log_2 n$ matrix exponentiations, and each of them (done naively) takes $O(\log n)$ multiplications, the second step takes $O(\log^2 n)$ time.

The third step is performed in $O(\log n)$, as we should multiply matrices no more than $\log_2 n$ times.


Is it possible to improve the algorithm’s execution time? Yes, it is. Note that the second step does not specify how to calculate the matrices belonging to the set $\{A^{2^{p}} \mid p \in P\}$. We know that each of their powers is a power of 2. Getting back to the picture with the tree, this means that each of them lies on some tier of the tree, and to calculate the higher ones we must calculate the lower ones. If we apply the memoization technique and cache every matrix calculated on each tier, we shorten the execution time of the second step to $O(\log n)$, and with it the total execution time of the whole algorithm. That’s exactly what we need according to the task.

The Code

Let’s get down to coding. I implemented the algorithm in Python.

It’s going to look like this:

class MatrixFibonacci:
    Q = [[1, 1],
         [1, 0]]

    def __init__(self):
        self.__memo = {}

    def __multiply_matrices(self, M1, M2):
        """Matrix multiplication
        (the matrices are expected as 2x2 nested lists)."""

        a11 = M1[0][0]*M2[0][0] + M1[0][1]*M2[1][0]
        a12 = M1[0][0]*M2[0][1] + M1[0][1]*M2[1][1]
        a21 = M1[1][0]*M2[0][0] + M1[1][1]*M2[1][0]
        a22 = M1[1][0]*M2[0][1] + M1[1][1]*M2[1][1]
        r = [[a11, a12], [a21, a22]]
        return r

    def __get_matrix_power(self, M, p):
        """Matrix exponentiation (p is expected to be a power of 2)."""

        if p == 1:
            return M
        if p in self.__memo:
            return self.__memo[p]
        K = self.__get_matrix_power(M, p // 2)
        R = self.__multiply_matrices(K, K)
        self.__memo[p] = R
        return R

    def get_number(self, n):
        """Getting the nth Fibonacci number
        (a non-negative integer is expected as n)."""
        if n == 0:
            return 0
        if n == 1:
            return 1
        # Decompose the power n-1 into a sum of distinct powers of two,
        # e.g. 61 = 2^5 + 2^4 + 2^3 + 2^2 + 2^0 = 32 + 16 + 8 + 4 + 1.
        powers = [int(pow(2, b))
                  for (b, d) in enumerate(reversed(bin(n-1)[2:])) if d == '1']
        matrices = [self.__get_matrix_power(MatrixFibonacci.Q, p)
                    for p in powers]
        while len(matrices) > 1:
            M1 = matrices.pop()
            M2 = matrices.pop()
            R = self.__multiply_matrices(M1, M2)
            matrices.append(R)
        return matrices[0][0][0]

mfib = MatrixFibonacci()
for n in range(0, 128):
    num = mfib.get_number(n)


I wanted to compare the execution time of my algorithm with a classical iterative algorithm that computes the numbers consecutively, keeping the previous two. I implemented it like this:

def get_number(self, n):
    if n == 0:
        return 0
    a = 0
    b = 1
    c = 1
    for i in range(n-1):
        c = a + b
        a = b
        b = c
    return c

I used the following performance testing technique: $n$ takes a sequence of increasing values. For each $n$, objects of the MatrixFibonacci and IterationFibonacci classes are created (the latter is the class that calculates Fibonacci numbers iteratively). Then an arbitrary integer is generated 10,000 times, each object computes the corresponding Fibonacci number, and the total time spent by each is measured.

The test outputs a lot of strings of the form: n T1 T2. This data is used for building a chart.
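The full benchmark isn’t reproduced here; a minimal sketch of the described technique might look like this (the factory argument and the 10-number window near n are my assumptions, not the article’s exact code):

```python
import random
import time

def benchmark(fib_factory, n, repeats=10000):
    """Time `repeats` Fibonacci computations with indices near n,
    using a fresh instance produced by fib_factory."""
    fib = fib_factory()
    start = time.time()
    for _ in range(repeats):
        fib.get_number(random.randint(max(n - 10, 1), n))
    return time.time() - start
```

Calling `benchmark(IterationFibonacci, n)` and `benchmark(MatrixFibonacci, n)` for each n would yield the `n T1 T2` rows mentioned above.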

As you can see from the chart, the matrix algorithm leaves the iterative algorithm behind starting from a certain $n$ (it’s worth noting that $F_n$ for such $n$ will no longer fit into an unsigned 64-bit integer). So it’s hard to call the algorithm impractical: its good theoretical speed shows in practice as well.

You will find the complete source code with the benchmark here.

The code for gnuplot to build charts:

set terminal png
set output 'fibgr.png'
set grid
set style data linespoints
set title "Calculation of 10000 Fibonacci numbers (Fn)"
set xlabel "Approximate number n"
set ylabel "Time, s"
plot "benchmark.txt" using 1:2 with lines title 'iterative', \
     "benchmark.txt" using 1:3 with lines title 'matrix'


  • Donald E. Knuth. The Art of Computer Programming. Volume 1. Fundamental Algorithms.


  1. Computing the n-th power of the diagonalized matrix would involve non-integer arithmetic, thus rounding errors. One could argue that the error should not affect the integer part, but I’d say that depends on the size of n.

    The power function is also not O(1) in n. You would get the same runtime O(log(n)) with a clever implementation.
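To make the rounding concern concrete, here is an editorial sketch of the diagonalized (Binet) closed form; `fib_binet` is my illustrative name, not the commenter’s code:

```python
import math

def fib_binet(n):
    """F_n via the closed form (phi^n - psi^n) / sqrt(5).

    A double carries ~15-16 significant digits, so the rounded result
    stops being reliable once F_n grows past that (around n = 70)."""
    sqrt5 = math.sqrt(5)
    phi = (1 + sqrt5) / 2   # dominant eigenvalue of the Q-matrix
    psi = (1 - sqrt5) / 2   # the other eigenvalue, |psi| < 1
    return round((phi**n - psi**n) / sqrt5)

print(fib_binet(10))  # 55
```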

  2. Bottom of def get_number renders as the Registered Trademark unicode instead of ®.

    You can be a little more concise with the code by modifying how __memo is set up.

    def __init__(self):
        self.__memo = {0: 0, 1: 1}

    Then you can do:

    def __get_matrix_power(self, M, p):
        """Matrix exponentiation (p is expected to be a power of 2)."""
        return self.__memo.get(p, self.non_memoized(M, p))

    def non_memoized(self, M, p):
        K = self.__get_matrix_power(M, int(p/2))
        R = self.__multiply_matrices(K, K)
        self.__memo[p] = R
        return R

    I know it essentially does the same thing, but IMHO it’s a little cleaner code though ;)

  3. Also, you should memoize the matrix multiplication, that’s a heavy step, and one for repeated invocations would save a ton of time.
  4. Why do you memoize the matrix method and not the iterative method? When I memoize the iterative method, I end up with roughly constant time for each of the steps. Around n = 1000 it gives: Iterative: 0.054137468338, Matrix: 1.92105817795.

    This all stays in memory for my machine.

    You can find the improved fork of your code here:

  5. Because the exponentiation method lets you compute an arbitrary element of the Fibonacci sequence without calculating most of the ones before it. Your `benchmark’ calculates them in ascending order, so obviously a straightforward memoizing/iterative implementation does well. Instead, it should just calculate the millionth Fibonacci number (say), ab initio.
  6. Thinking about it, I realized that even the implementation described here isn’t really O(log n). Here’s why: it’s rather easy to prove that the nth term of the Fibonacci sequence has O(n) digits. So, even if you are performing only O(log n) multiplications, each multiplication is actually O(n), so the algorithm is actually O(n log n).

    I made a benchmark that shows that in practice:

  7. You should write an article, «Response to the Nth Fibonacci Number in O(log N)»! It would be interesting to read.
  8. You’re doing too much work in the matrix multiplication: the matrices are always symmetrical, but you’re not taking advantage of that. But there’s a slicker way, which involves some slightly fancier mathematics.

    Suppose that we know that t^2 = t + 1 for some t. (Equivalently, we’ll work in the quotient ring R = Z[t]/(t^2 - t - 1).) Now, let’s look at the powers of t. By using the rule that t^2 = t + 1, we will end up with only units and coefficients of t. (That this is the characteristic polynomial of the matrix discussed in the main article is, of course, not a coincidence.)

    t^0 = 1
    t^1 = t
    t^2 = t + 1
    t^3 = t^2 + t = (t + 1) + t = 2 t + 1
    t^4 = 2 t^2 + t = 2 (t + 1) + t = 3 t + 2
    t^5 = 3 t^2 + 2 t = 3 (t + 1) + 2 t = 5 t + 3

    Hmm. There’s a pattern: t (a t + b) = a t^2 + b t = a (t + 1) + b t = (a + b) t + a. I feel a theorem coming on: t^k = F_k t + F_{k-1} for all integers k. Proof. (a) t^1 = t = F_1 t + F_0. (b) Suppose the claim holds for k; then t^{k+1} = t t^k = (F_k + F_{k-1}) t + F_k = F_{k+1} t + F_k. (c) Notice that t (t - 1) = t^2 - t = (t + 1) - t = 1, so t^{-1} = t - 1; then, supposing again that the claim holds for k, we have t^{k-1} = (t - 1) (F_k t + F_{k-1}) = F_k t^2 + F_{k-1} t - F_k t - F_{k-1} = F_k (t + 1) + F_{k-1} t - F_k t - F_{k-1} = F_{k-1} t + (F_k - F_{k-1}) = F_{k-1} t + F_{k-2}, as required. []

    Now, we can compute powers of t using the ordinary square-and-multiply method (or fancier methods involving windowing, combs, or whatever). The important pieces are:

    (a t + b)^2 = a^2 (t + 1) + 2 a b t + b^2 = (2 a b + a^2) t + (a^2 + b^2)
    (a t + b) (c t + d) = a c (t + 1) + (a d + b c) t + b d = (a d + a c + b c) t + (a c + b d)

    Here’s a simple left-to-right exponentiation algorithm. Suppose we want to compute x^n, and we know that n = 2^k u + v, for 0 <= v < 2^k, and x^u = y. (Initially, we can arrange this with 2^k > n, so u = 0, v = n, and y = 1.) Now, suppose that v = 2^{k-1} b + v’, where b is either zero or one, and v’ < 2^{k-1}. Let u’ = 2 u + b. Then n = 2^{k-1} u’ + v’, and x^{u’} = x^{2u+b} = y^2 x^b. Continue until k = 0.

    Annoyingly, this means we need to count the number of bits in n, i.e., k = L(n), with 2^{i-1} <= n < 2^i. We can start with an upper bound, finding the smallest 2^{2^j} > n. Suppose, inductively, that we know how to determine L(m) for 0 <= m < 2^i, and that n < 2^{2i}. If n < 2^i, then we’re done; otherwise write n = 2^i n’ + r where 0 <= r < 2^i. Since n < 2^{2i}, we know that n’ <= n/2^i < 2^i, so we can determine k’ = L(n’). Let k = k’ + i: I claim that 2^{k-1} <= n = 2^i n’ + r < 2^k. Taking the lower bound first, we’re given that 2^{k’-1} <= n’, so 2^{k-1} = 2^i 2^{k’-1} <= 2^i n’ <= 2^i n’ + r. But also, n’ < 2^{k’}, so n’ + 1 <= 2^{k’} and 2^i n’ + 2^i <= 2^i 2^{k’} = 2^k. But r < 2^i, so n = 2^i n’ + r < 2^k as required. []

    Enough theory! Now let’s turn this into actual code!

    #! /usr/bin/python
    def power(x, n, id, sqr, mul):
      """Compute and return X^N for nonnegative N.

      This is an abstract operation, assuming only that X is an element of a
      monoid: ID is the multiplicative identity, and the necessary operations
      are provided as functions: SQR(Y) returns Y^2, and MUL(Y, Z) returns
      the product Y Z."""

      ## First, we need to find a 2^{2^i} > n.
      i = 1
      while (n >> i) > 0: i = 2*i

      ## Refine our estimate, so that 2^{k-1} <= n < 2^k.
      k, m = 1, n
      while m > 1:
        i = i >> 1
        mm = m >> i
        if mm > 0: m, k = mm, k + i

      ## Now do the square-and-multiply thing.
      y = id
      for i in xrange(k - 1, -1, -1):
        y = sqr(y)
        if (n >> i)%2: y = mul(x, y)

      ## We're done.
      return y

    def fibonacci(n):
      """Compute and return the Nth Fibonacci number.

      Let F_0 = 0, F_1 = 1, F_{k+1} = F_k + F_{k-1}, for all integers k.
      Note that this is well-defined even for negative k.  We return F_N."""

      ## Auxiliary functions for working in our polynomial ring.
      def poly_sqr((a, b)):
        a2 = a*a
        return 2*a*b + a2, a2 + b*b
      def poly_mul((a, b), (c, d)):
        ac = a*c
        return a*d + b*c + ac, ac + b*d

      ## Do the job.  For negative indices, we take powers of t^{-1}.
      if n < 0: return power((1, -1), -n, (0, 1), poly_sqr, poly_mul)
      else: return power((1, 0), n, (0, 1), poly_sqr, poly_mul)

    ## Command-line interface.
    if __name__ == '__main__':
      import sys
      try: n = (lambda _, n: int(n))(*sys.argv)
      except (TypeError, ValueError):
        print >>sys.stderr, 'Usage: fib N'
        sys.exit(1)
      print fibonacci(n)[0]
  9. Just diagonalize the matrix and do it in constant time.
  10. If you do the matrix power more cunningly, you don’t need to memoize anything. Just use this:

    def mat_pow(m, e):
        if e == 1:
            return m
        m2 = mat_pow(m, e // 2)
        if e % 2 == 0:
            return m2 * m2
        return m2 * m2 * m

    def mul(A, B):
        return (A[0]*B[0] + A[1]*B[2], A[0]*B[1] + A[1]*B[3],
                A[2]*B[0] + A[3]*B[2], A[2]*B[1] + A[3]*B[3])

    def fib(n):
        if n == 1: return (1, 1, 1, 0)
        A = fib(n >> 1)
        A = mul(A, A)
        if n & 1: A = mul(A, fib(1))
        return A

    *Captain Obvious flies away*

  12. I really enjoyed this article, thanks!
  13. You can also use the fact that, if f(n) is the nth Fibonacci number (using f(0) = 1), then f(n + k) = f(n - 1)f(k - 1) + f(n)f(k)

    This is not hard to see by looking at the sequence, assuming that a = f(n), b = f(n + 1) for some n: a, b, a + b, a + 2b, 2a + 3b, 3a + 5b, 5a + 8b, 8a + 13b, 13a + 21b,…

    You can use this to do a similar ‘exponentiation by squaring’ thing:

    def add((a, b, c), (x, y, z)):
        v = a * y + b * z
        w = b * y + c * z
        return (w - v, v, w)
    def fib2(n):
        (r, p) = ((1, 1, 2), (0, 1, 1))
        while not(n == 0):
            if n % 2: r = add(r, p)
            p = add(p, p)
            n = n >> 1
        return r[0]

    Which is logarithmic in n if you assume that multiplication, addition, etc. are constant time (which is only true for word-sized integers, so in general this is false).

  14. However, in my opinion the time complexity is O(z) and still linear: since n is given as an integer, its real input size is z = log2(n) bits, so n = 2^z, and O(log(n)) can be written as O(z).
  15. The matrix version is better than the iterative version because the latter is O(n), i.e. O(2^z) in the input size.

Ropes — Fast Strings

Most of us work with strings one way or another. There’s no way to avoid them: when writing code, you’re doomed to concatenate strings every day, split them into parts and access certain characters by index. We are used to the fact that strings are fixed-length arrays of characters, which leads to certain limitations when working with them. For instance, we cannot quickly concatenate two strings. To do this, we first need to allocate the required amount of memory and then copy the data from the concatenated strings into it.