Using the Quick Raise of Matrices to a Power to Write a Very Fast Interpreter of a Simple Programming Language


Hello! My name is Alex Skidanov. Currently I work at MemSQL.

I have recently read an article about calculating the Nth Fibonacci number in O(log N) arithmetic operations. But why would we need it in practice? — you may ask. By itself, the calculation N-th Fibonacci number may not be very interesting, but the approach with the matrices used in the article, in practice, can be applied to a much wider range of tasks.

In the course of this article we will examine how to write an interpreter that can perform simple operations (assignment, addition, subtraction and truncated multiplication) on a limited number of variables with nested loops with an arbitrary number of iterations in a fraction of a second (of course, if the intermediate values in the calculations will remain in reasonable limits). For example, here is a code passed to the input of an interpreter:

loop 1000000000
  loop 1000000000
    loop 1000000000
      a += 1
      b += a

It will immediately print a = 1000000000000000000000000000, b = 500000000000000000000000000500000000000000000000000000, despite the fact that if the program executed naively, the interpreter would have to perform an octillion of operations.

I believe that the reader has an idea of what matrix is and what matrix multiplication means. In this article we will use square matrices only and rely on a very important property of multiplication of square matrices — associativity.

For simplicity, let’s restrict our interpreter to four variables: A, B, C and D. To represent the state of the interpreter at the given moment, we will use a vector of length 5. Its first four elements will contain values of four variables accordingly. The last one will be equal to 1 during the whole interpretation time.

(A, B, C, D, 1) At the beginning of interpretation we will assume that values of all variables are 0.

(0, 0, 0, 0, 1) Suppose the first operation in the program code contains the following line:

A += 5 The effect of this command is that the value of A increases by 5, while values of other variables do not change. We can represent it in the form of the following matrix:

1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
5 0 0 0 1

Looking at it, we can see that it’s almost identical to the identity matrix (which, as we know, does not change the value of any vector that is multiplied by it), except for the last element of the first column, that is equal to 5. If we recall how to multiply a vector by a matrix, we can see that the values of all elements, except the first one, will not change, while the value of the first element will be

v[0] * 1 + v[4] * 5 Since v[0] contains the current value of variable A, and v[4] is always equal to 1:

v[0] * 1 + v[4] * 5 = A + 5 If we multiply the vector of the current state by this matrix, the obtained vector will correspond to the state, in which A is more by 5. That’s exactly what we needed.

If we modify the matrix a bit by removing 1 in the first element of the first row:

0 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
5 0 0 0 1

Just like before, the values of all the elements except the first will not be changed, while the first element becomes equal to v[4] * 5, or just 5. Multiplication of the current state vector by such matrix is equivalent to command

A = 5 Let’s take a look at the following matrix:

1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 1 0 1 0
0 0 0 0 1

Its only difference from the identity matrix is the second element in the fourth row, which is equal to 1.

Obviously, multiplication of the current state vector by this matrix will not change values in the first element and the last three ones. Meanwhile, the second element value will change to

v[1] * 1 + v[3] * 1 Since v[1] contains the current state of variable B, and v[3] contains the current state of variable D, multiplication of the state vector by this matrix is equivalent to B += D command execution.

Thinking the same way, we can understand that multiplication of the state vector by the next matrix is equivalent to C *= 7 command execution.

1 0 0 0 0
0 1 0 0 0
0 0 7 0 0
0 0 0 1 0
0 0 0 0 1

Let’s get down to combining commands. Let v vector define the current state. Ma matrix corresponds to A += 5 command. Mm matrix corresponds to A *= 7 command. To get vector r, we should, at first, multiply v by Ma, and then by Mm:

r = v * Ma * Mm As expected, the multiplication of the initial state vector by these two matrices will lead to the state, in which a=35:

1 0 0 0 0     7 0 0 0 0
              0 1 0 0 0     0 1 0 0 0
              0 0 1 0 0     0 0 1 0 0
              0 0 0 1 0     0 0 0 1 0
              5 0 0 0 1     0 0 0 0 1
0 0 0 0 1     5 0 0 0 1    35 0 0 0 1

Let’s take a look at another example. Instead of multiplying by 7, we just want seven times add 5 to A.

r = v * Ma * Ma * Ma * Ma * Ma * Ma * Ma That’s when the associative property of matrix multiplication comes in handy:

r = v * Ma * Ma * Ma * Ma * Ma * Ma * Ma = 
    v * (Ma * Ma * Ma * Ma * Ma * Ma * Ma) = 
    v * Ma ^ 7

It’s an example of a simple loop. Instead of multiplying v by Ma seven times, it’s enough to raise Ma matrix to the power of 7 and execute just one multiplication after. If we would want to execute the following two operations three times in the loop

A += 5
B -= C

(Let Mbc matrix represent B -= C operation), it would look like this:

r = v * Ma * Mbc * Ma * Mbc * Ma * Mbc = 
    v * ((Ma * Mbc) * (Ma * Mbc) * (Ma * Mbc)) = 
    v * (Ma * Mbc) ^ 3

We compute the matrix that corresponds to the loop body just once and then raise it to a power.

The considered examples are enough to start working on the interpreter for a simple language that supports assignment, addition, subtraction, multiplication (by a constant only) and loops. We will learn to represent any of such programs in the form of a matrix of N+1 by N+1 size, in which N is the number of variables the program operates with. Then we simply multiply the vector with initial state by this matrix.

The rules of representing a program in the form of matrix are really simple:

  1. Each separate command is represented in the form of a matrix that differs from the identity one by one element (or two elements for the assignment operation). We have already reviewed the examples of such matrices.
  2. Several commands in a row are represented in the form of a matrix that is equal to product of the matrix representation of each separate command.
  3. A loop is represented in the form of a matrix representing the loop body and raised to the power of the number of loop iterations.

If we have identity function returning an identity matrix:

def identity():
    return [[1 if i == j else 0 for j in range(REGS + 1)] for i in range(REGS + 1)]

then, the function that builds a matrix for r1 += r2 command (r1 and r2 are variables) can look like this:

def addreg(r1, r2):
    ret = identity()
    ret[r2][r1] = 1
    return ret

As for r += val (r is a variable, val is a constant), it’s going to look like the following:

def addval(r, val):
    ret = identity()
    ret[REGS][r] = val
    return ret

Functions for building matrices of other commands look similarly. It’s an identity matrix, in which one element is changed.

Now, it’s quite easy to write an interpreter without loops. Let mat matrix correspond to the code that has already been read. In the beginning, it is equal to the identity matrix, as an empty program does not change the state. Then, we read commands one by one; divide them into three elements (the left operand, the operator and the right operand). Depending on the operator, multiply the matrix corresponding to the entire program, by the matrix corresponding to the current command:

def doit():
    mat = identity()
    while True:
        line = sys.stdin.readline().lower()
        tokens = line.split()
        if tokens[0] == 'loop':
            # the code will be for loops here
        elif tokens[0] == 'end':
            return mat
            r1 = reg_names.index(tokens[0])
                r2 = reg_names.index(tokens[2])
                r2 = -1
            if tokens[1] == '+=':
                if r2 == -1: cur = addval(r1, long(tokens[2]))
                else: cur = addreg(r1, r2)
            elif tokens[1] == '-=':
        mat = matmul(mat, cur)

We have a little left to do — add support for loops. The loop raises the matrix of the body loop to the power of number of loop iterations. As we know, exponentiation requires O(log N) operations only. N is the power we raise the matrix to. The exponentiation algorithm is quite simple:

  1. If the power is zero, we will return the identity matrix.
  2. If the power is even (2N), we can recursively compute M^N, and then return the square of the obtained matrix.
  3. If the power is odd (2N+1), it’s enough to recursively compute M^2N value, and return the obtained matrix, that has been multiplied by M.

Since the power reduces twice after every two iterations, the complexity of such algorithm is logarithmic.

def matpow(m, p):
    if p == 0: return identity()
    elif p % 2 == 0:
        tmp = matpow(m, p / 2)
        return matmul(tmp, tmp)
    else: return matmul(m, matpow(m, p - 1))

Add the following line to the interpreter:

        if tokens[0] == 'loop':
            cur = matpow(doit(), long(tokens[1]))

The interpreter is ready!

The example is available on GitHub. The code is less than 100 lines long.

To test the speed, we can recall the Fibonacci numbers. For instance, the following code:

A = 1
B = 1
loop 100
  C = A
  C += B
  A = B
  B = C

will calculate the 101st and 102nd Fibonacci numbers:

A = 573147844013817084101, B = 927372692193078999176

100 to 1000000 replacement will calculate the 1 000 001st and 1 000 002nd numbers in 4 seconds. Classic version would take much more time, as the program has to operate with multi-thousand-digit numbers. If you write code that does not have to handle big numbers, for example, the code for calculating the sum of an arithmetic progression given in the beginning of this article, the number of iterations may go beyond reason, but the code will be executed in a fraction of a second.

loop 1000000000000000000000000000000000000000000000
  loop 1000000000000000000000000000000000000000000000
    loop 1000000000000000000000000000000000000000000000
      a += 1
      b += a

In practice, this approach may be used, for example, for optimizing compilers, which can thus fold loops with many iterations that operate on a small number of variables.


  1. How would this handle loops with nonlinear operations? Is there a way to break down code into linear and nonlinear parts which execute sequentially?

Ropes — Fast Strings

Most of us work with strings one way or another. There’s no way to avoid them — when writing code, you’re doomed to concatinate strings every day, split them into parts and access certain characters by index. We are used to the fact that strings are fixed-length arrays of characters, which leads to certain limitations when working with them. For instance, we cannot quickly concatenate two strings. To do this, we will at first need to allocate the required amount of memory, and then copy there the data from the concatenated strings.