AVL Trees

C++

In the previous article we’ve reviewed Randomized Binary Search Trees. Today we are going to review the implementation of AVL Trees. They must be the first type of Balanced Binary Search Trees. They were invented by two Soviet inventors, G. M. Adelson-Velskii and E. M. Landis in 1962. There are plenty of AVL trees implementations, but, to my mind, none of them is good enough when you try to make sense of it all from scratch. They say that AVL trees are simpler than Red-Black trees, but when looking at the code, we can not believe it. Thus, I have decided to explain the structure of AVL trees step by step. I’ll also provide the code in C++.

AVL Tree Notion

AVL Tree

First of all, an AVL Tree is a Binary Search Tree (BST), the keys of which meet standard requirements: a key of any tree node is not less than the key in the left subtree of the given node and not more than any key in the right subtree of this node. This means that in order to find the necessary key in an AVL tree, we can use a standard algorithm. For simplicity, we will consider that all keys in a tree are integers and not repeated.

AVL trees are peculiar as for their balance in a way that for any tree node the height of its right subtree differs from the left subtree height for not more than one. It has been proved that this property is enough for the tree height to depend logarithmically in the number of its nodes. h height of an AVL tree with n keys lies in the range from log2(n + 1) to 1.44 log2(n + 2) − 0.328. Since all major operations on a BST (search, insertion and deletion) depend linearly on its height, we get a guaranteed logarithmic dependence of these algorithms operation time from the number of keys that are stored in a tree. Reminding you, that randomized search trees provide the balance in probability sense only. Though the probability of getting a badly imbalanced tree having high n values, is negligible, it is still not equal to zero.

Node Structure

We will represent AVL tree nodes with the help of the following structure:

struct node // the structure for representing tree nodes 
{
    int key;
    unsigned char height;
    node* left;
    node* right;
    node(int k) { key = k; left = right = 0; height = 1; }
};

key field stores the node key, height field stores the height of the subtree with the root in the given node, left and right fields are pointers to the left and the right subtrees. A simple constructor creates a new node (of 1 height) with the specified k key.

Traditionally, AVL tree nodes do not store the height, but the difference between the height of the left and the right subtrees (the so-called balance factor) that can accept three values only: -1, 0 and 1. But we should note that this difference is still stored in a variable, the size of which is equal to at least 1 byte (if we do not think out any schemes of an “efficient” package of such sizes). Let’s recall that h balance factor. Thus, height storage does not increase the memory amount that is allocated for the tree nodes. On the other hand, it efficiently simplifies implementation of some operations.

Let’s define three helper functions related to height. The first one is a wrapper for height field. It can operate with NULL pointers (empty trees) as well:

unsigned char height(node* p)
{
    return p ? p->height : 0;
}

The second one calculates the balance factor of the given node. It operates with nonzero pointers only:

int bfactor(node* p)
{
    return height(p->right) - height(p->left);
}

The third function retrieves the correct value of height field of the given node (provided that this field value in the left and the right child nodes are correct)

void fixheight(node* p)
{
    unsigned char hl = height(p->left);
    unsigned char hr = height(p->right);
    p->height = (hl>hr ? hl : hr) + 1;
}

We should note that all of the three functions are nonrecursive, i.e. their operation time is О(1).

Balancing Nodes

When adding or deleting nodes in an AVL tree, the balance factor of some nodes can be equal either to 2, or -2. Thus, the subtree is imbalanced. To solve this problem, we should apply the well known rotations round some tree nodes. Reminding you that a simple right (left) rotation causes the following transformations of the tree:

Let’s take a look at the code implementing a right rotation (as usual, each function which modifies a tree, returns a root of a new tree):

node* rotateright(node* p) // the right rotation round p
{
    node* q = p->left;
    p->left = q->right;
    q->right = p;
    fixheight(p);
    fixheight(q);
    return q;
}

The left rotation is the symmetric copy of the right one:

node* rotateleft(node* q) // the left rotation round q
{
    node* p = q->right;
    q->right = p->left;
    p->left = q;
    fixheight(q);
    fixheight(p);
    return p;
}

Let’s review the situation of imbalance, when the height of the right subtree of p node is greater by 2, than of the left subtree (the opposite case is symmetric and is implemented in much the same way). Let’s assume that q is the right child node of a p node, while s is the left child node of a q node.

The analysis of possible cases within the limits of the given situation shows that in order to get rid of imbalance in p node, it’s enough to perform either a simple rotation to the left round p, or the so-called big rotation to the left round the same p. A simple rotation is performed when the height of the left subtree of g node is more than the height of its right subtree: h(s)≤h(D).

We can apply a big rotation when h(s)>h(D) and perform two simple ones – at first the right rotation round q, then the left one round p.

The balance executing code comes to checking conditions and performing rotations:

node* balance(node* p) // p node balance
{
    fixheight(p);
    if( bfactor(p)==2 )
    {
        if( bfactor(p->right) right = rotateright(p->right);
        return rotateleft(p);
    }
    if( bfactor(p)==-2 )
    {
        if( bfactor(p->left) > 0  )
            p->left = rotateleft(p->left);
        return rotateright(p);
    }
    return p; // no balance needed
}

The described rotation and balance functions contain neither loops, nor recursion. Thus, they’re executed in constant-time that does not depend on the AVL tree size.

Key Insertion

A new key insertion in an AVL tree is executed in much the same way as in simple search trees. We go down the tree choosing either the right, or the left move direction depending on the result of comparing the key in the current node with the inserted one.

The only difference is that we balance current node when returning from recursion (i.e. after the key is inserted either in the left or the right subtree and this tree is balanced). It is proven that during insertion, the imbalance occurring in any node along the path is not more than 2. Thus, the use of the mentioned above balance function is correct.

node* insert(node* p, int k) // k key insertion in the tree with p root  
{
    if( !p ) return new node(k);
    if( kkey )
        p->left = insert(p->left,k);
    else
        p->right = insert(p->right,k);
    return balance(p);
}

To check the correspondence of the implemented insertion algorithm to theoretical estimates for AVL trees height, a simple calculating experiment has been carried out. There was an array with randomly generated numbers from 1 to 1000. Then these numbers were subsequently inserted in an initially empty AVL tree and the tree height was measured after each insertion. The obtained results were averaged according to 1000 calculations. The following chart represents the dependency of n on average height (the red line), the minimal height (the green line) and the maximum height (the blue line). It also depicts the upper and the lower theoretical estimates.

We can see that the experimentally found heights for random key sequences are within the theoretical limits even with some spare. We can reach the lower bound (at least in some points) if the initial key sequence is arranged in ascending order.

Key Deletion

As for nodes deletion from an AVL tree, it is not as simple as with Randomized Search Trees. I have not been able to either find, or create a method that would be based on joining two trees together. Therefore, as a basis I used the variant that is described almost everywhere. It is usually applied for deleting nodes from a standard Binary Search Tree as well.

AVL Tree Delete Operation

Its main concept is that we find p node with the specified k key (if it is not there, we do not have to perform any actions). In the right subtree find min node with the least key and replace the deleted p node with the found min.

There are some nuances may occur during the implementation. First of all, if the found p node does not have a right subtree, according to an AVL tree property, this node either should have just one child node (the tree of 1 height) on the left, or it is a list. In both cases we should just delete p node and return a pointer to the left child node of p.

Assume that now p node has the right subtree. Find the minimal key in it. According to Binary Trees, this key is found in the end of the left branch, starting from the tree root. Use the recursive function:

node* findmin(node* p) // searching the node with the minimal key in p tree  
{
    return p->left?findmin(p->left):p;
}

Another function will be in charge of deleting the minimal element from the given tree. And again, according to the AVL tree property, the minimal element either has a single node, or it is empty. In both cases we should simply return the pointer to the right node and on our way back (when returning from recursion) perform the balancing. We will not delete the minimal node as it will come in handy later.

node* removemin(node* p) // deleting the node with the minimal key from p tree 
{
    if( p->left==0 )
        return p->right;
    p->left = removemin(p->left);
    return balance(p);
}

Now all is ready to implement key deletion from the AVL tree. First, find the necessary node by performing the same actions as when inserting a key.

node* remove(node* p, int k) // k key deletion from p tree
{
    if( !p ) return 0;
    if( k key )
        p->left = remove(p->left,k);
    else if( k > p->key )
        p->right = remove(p->right,k);    

As soon as we have found the k key, turn to plan B: Memorize q and r roots of the left and the right subtrees of p node, and then delete p. If the right subtree is empty, return the pointer to the left subtree. If the right subtree is not empty, we should find the minimal min element and extract it from there. Hook the q to min on the left. On the right we will hook what was obtained from r. Return min after its balancing.

else //  k == p->key 
    {
        node* q = p->left;
        node* r = p->right;
        delete p;
        if( !r ) return q;
        node* min = findmin®;
        min->right = removemin®;
        min->left = q;
        return balance(min);
    }

When quitting recursion, do not forget to execute balancing:

return balance(p);
}

That’s pretty much it! The search of the minimal node and its extract can be implemented in one function. At that, we will have to solve a problem (not really difficult one) of returning a pair of pointers from the function. But we can gain some time during one passage along the right subtree.

It is obvious that insertion and deletion functions (and also a simpler one search operation) are performed in time that is proportional to the tree height. During the time of these operations performance we execute the descent from the node to the given node. On each level some fixed number of actions are performed. Since an AVL tree is balanced, its height depends logarithmically on the number of nodes. Thus, the operation time of all three base operations is guaranteed to depend logarithmically on the number of the tree nodes.

Tanks for your time!

Complete code:

struct node
{
    int key;
    unsigned char height;
    node* left;
    node* right;
    node(int k) { key = k; left = right = 0; height = 1; }
};

unsigned char height(node* p)
{
    return p?p->height:0;
}

int bfactor(node* p)
{
    return height(p->right)-height(p->left);
}

void fixheight(node* p)
{
    unsigned char hl = height(p->left);
    unsigned char hr = height(p->right);
    p->height = (hl>hr?hl:hr)+1;
}

node* rotateright(node* p)
{
    node* q = p->left;
    p->left = q->right;
    q->right = p;
    fixheight(p);
    fixheight(q);
    return q;
}

node* rotateleft(node* q)
{
    node* p = q->right;
    q->right = p->left;
    p->left = q;
    fixheight(q);
    fixheight(p);
    return p;
}

node* balance(node* p) // balancing the p node
{
    fixheight(p);
    if( bfactor(p)==2 )
    {
        if( bfactor(p->right) right = rotateright(p->right);
        return rotateleft(p);
    }
    if( bfactor(p)==-2 )
    {
        if( bfactor(p->left) > 0  )
            p->left = rotateleft(p->left);
        return rotateright(p);
    }
    return p; // balancing is not required
}

node* insert(node* p, int k) // insert k key in a tree with p root
{
    if( !p ) return new node(k);
    if( kkey )
        p->left = insert(p->left,k);
    else
        p->right = insert(p->right,k);
    return balance(p);
}

node* findmin(node* p) // find a node with minimal key in a p tree 
{
    return p->left?findmin(p->left):p;
}

node* removemin(node* p) // deleting a node with minimal key from a p tree
{
    if( p->left==0 )
        return p->right;
    p->left = removemin(p->left);
    return balance(p);
}

node* remove(node* p, int k) // deleting k key from p tree
{
    if( !p ) return 0;
    if( k key )
        p->left = remove(p->left,k);
    else if( k > p->key )
        p->right = remove(p->right,k);    
    else //  k == p->key 
    {
        node* q = p->left;
        node* r = p->right;
        delete p;
        if( !r ) return q;
        node* min = findmin®;
        min->right = removemin®;
        min->left = q;
        return balance(min);
    }
    return balance(p);
}

References

  • B. Pfaff. An Introduction to Binary Search Trees and Balanced Trees — libavl library description
  • N. Wirth. Algorithms and Data Structures — according to Wirth, balanced trees are AVL trees
  • Thomas H. Cormen and Others. Introduction to Algorithms — AVL trees are mentioned in exercises to the chapter about Red-black trees
  • D. Knuth. The Art of Computer Programming — 6.2.3 section is dedicated to theoretical analysis of AVL trees

Comments

  1. Thanks. This is well readable code for AVL Tree i could find in entire net/Books. Thank you a lot for posting this.
  2. Thank you so much for a (very very) ^ (googol) good article, sir :D you saved my life, thank you again and again :D
  3. What does this do?

    hl>hr ? hl : hrDoes that mean if hl is greater than hr, then use hl, otherwise use hr?

  4. Yes, this is exactly what it does. It uses ternary operator ?:. You can read about it here.
  5. Dear Hub,Thank you very much for providing so powerful code. I have a question to use your code.After inserted all elements, I need to pop the maximal element (return element data and then delete the node) every time (also insert some new elements). However, my test shows that every time the elements are deleted, the elements are not changed when output the whole tree (by in_order_traversal). Could you please help me to take a look at it?Thanks,Tang LaoyaPS:the code I added:node* findmax(node* p) // find a node with maximal key in a p tree { return p->right?findmax(p->right):p;}void * popmax(node* p){ node* max=findmax(p); void *key=max->key; remove(p,key); return key;}void in_order_traversal(node *p){ if(p) { in_order_traversal(p->left); cout key right); }}
  6. omg, why don’t you use indentation? You should also paste your code by using the «code» tag.

    this is wrapped with code tag.

  7. Sorry I didn’t notice the format. I post again.

    Thank you very much for providing so powerful code. I have a question to use your code.
    After inserted all elements, I need to pop the maximal element (return element data and then delete the node) every time (also insert some new elements). However, my test shows that every time the elements are deleted, the elements are not changed when output the whole tree (by in_order_traversal). Could you please help me to take a look at it?
    
    Thanks,Tang Laoya
    
    node* findmax(node* p) // find a node with maximal key in a p tree 
    {
        return p->right?findmax(p->right):p;
    }
    
    void * popmax(node* p)
    {
            node* max=findmax(p);
            void *key=max->key;
            remove(p,key);
            return key;
    }
    
    void in_order_traversal(node *p)
    {
            if(p)
            {
                    in_order_traversal(p->left);
                    cout key right);
            }
    }
    
    
  8. «code» tag is intended for code only. Anyway, how does your move function look like?
  9. move/remove
  10. Dear Hub, Thank you very much for your kindly reply. I posted the whole code modified based on your original code. It seems that the problem is solved (there are some problems in compare functions before). However, the code is somewhat slow when the number of elements is huge (more than one million). Could you please help me to take a look at it and give me some suggestion to improve the code? Thanks, TangLaoya

    /// reference: https://kukuruku.co/hub/cpp/avl-trees
    extern int avlcompare(void *item1, void *item2);
    extern void externopt(void *t);
    extern void externout(void *t);
    #include "avl.h"
    
    unsigned char height(node* p)
    {
        return p?p->height:0;
    }
    
    int bfactor(node* p)
    {
        return height(p->right)-height(p->left);
    }
    
    void fixheight(node* p)
    {
        unsigned char hl = height(p->left);
        unsigned char hr = height(p->right);
        p->height = (hl>hr?hl:hr)+1;
    }
    
    node* rotateright(node* p)
    {
        node* q = p->left;
        p->left = q->right;
        q->right = p;
        fixheight(p);
        fixheight(q);
        return q;
    }
    
    node* rotateleft(node* q)
    {
        node* p = q->right;
        q->right = p->left;
        p->left = q;
        fixheight(q);
        fixheight(p);
        return p;
    }
    
    node* balance(node* p) // balancing the p node
    {
        fixheight(p);
        if( bfactor(p)==2 )
        {
            if( bfactor(p->right) right = rotateright(p->right);
            return rotateleft(p);
        }
        if( bfactor(p)==-2 )
        {
            if( bfactor(p->left) > 0  )
                p->left = rotateleft(p->left);
            return rotateright(p);
        }
        return p; // balancing is not required
    }
    
    node* insert(node* p, void* k) // insert k key in a tree with p root
    {
        if( !p ) return new node(k);
        //if( kkey )
                    if (avlcompare(k, p->key)left = insert(p->left,k);
        else
            p->right = insert(p->right,k);
        return balance(p);
    }
    
    node* findmin(node* p) // find a node with minimal key in a p tree 
    {
        return p->left?findmin(p->left):p;
    }
    
    node* removemin(node* p) // deleting a node with minimal key from a p tree
    {
        if( p->left==0 )
            return p->right;
        p->left = removemin(p->left);
        return balance(p);
    }
    
    node* findmax(node* p) // find a node with maximal key in a p tree 
    {
        return p->right?findmax(p->right):p;
    }
    
    void * popmax(node* p)
    {
            node* max=findmax(p);
            void *key=max->key;
            remove(p,key);
            return key;
    }
    
    node* remove(node* p, void* k) // deleting k key from p tree
    {
        if( !p ) return 0;
                    int comp=avlcompare(k, p->key);
        //if( k key )
        if( comp left = remove(p->left,k);
        //else if( k > p->key )
        else if( comp > 0 )
            p->right = remove(p->right,k);  
        else //  k == p->key 
        {
            node* q = p->left;
            node* r = p->right;
            delete p;
            if( !r ) return q;
            node* min = findmin®;
            min->right = removemin®;
            min->left = q;
            return balance(min);
        }
        return balance(p);
    }
    
    void in_order_traversal(node *p)
    {
            if(p)
            {
                    in_order_traversal(p->left);
                    externout(p->key);
                    in_order_traversal(p->right);
            }
    }
    
    void pre_order_traversal(node *p)
    {
            if(p)
            {
                    externout(p->key);
                    pre_order_traversal(p->left);
                    pre_order_traversal(p->right);
            }
    }
    
    void deletetree(node *p)
    {
            if(p)
            {
                    deletetree(p->left);
                    deletetree(p->right);
                    externopt(p->key);
                    delete(p);
            }
            p=0;
    }
    
  11. Dear Hub,   
    Sorry to bother you again. There are still some problems. 
    when the code running in the latter (after many times insert/remove/popmax operations, the code will crash. 
    
    The code is used as follows:
    node *avlroot;
    
    avlroot=insert(avlroot,(void*)t);  /// happened at many locations
    ...
    avl=popmax(avlroot);  /// happened at many locations
    ...
    avlroot=remove(avlroot,(void*)t); /// happened at many locations
    ...
    
    /// finally
    deletetree(avlroot);
    
    Is there any problem in using the code?
    
    Thanks,
    Tang Laoya
    
    
    
  12. Dear Hub,
    
    The problem is solved after modified the function popmax as follows:
    void * popmax(avltree* &p)
    {
            avltree* max=findmax(p);
            void *key=max->key;
            p=remove(p,key);
            return key;
    }
    
    Do you have any suggestion to improve the code for faster speed, especially the function popmax?
    
    Thanks,
    Tang Laoya
    
    
    
  13. You are fun. Seriously. Unfortunately, Nikolai Ershov is not around to comment on how to improve the code. But I think your code looks good. You can also check this implementation: AVLTree.h and AVLTree.cpp.
  14. 
    Dear Hub,
    Thanks for your kindly reply. I noticed that the implementation you suggested used push operation instead of recursive function. How do you think the efficiency of these two implementations?
    
    Thanks,
    Tang Laoya 
    
    
  15. What operations do you specifically mean?
  16. Dear Hub,Thank you very much for providing so powerful code. I have a question to use your code.After inserted all elements, I need to pop the maximal element (return element data and then delete the node) every time (also insert some new elements). However, my test shows that every time the elements are deleted, the elements are not changed when output the whole tree (by in_order_traversal). Could you please help me to take a look at it?Thanks,Tang LaoyaPS:the code I added:node* findmax(node* p) // find a node with maximal key in a p tree { return p->right?findmax(p->right):p;}void * popmax(node* p){ node* max=findmax(p); void *key=max->key; remove(p,key); return key;}void in_order_traversal(node *p){ if(p) { in_order_traversal(p->left); cout key right); }}
  17. BYU CS235 Burton Seamons
  18. AVL Lab
  19. Can you please explain, what does this symbol “

    ®«mean?

    node* min = findmin®;
            min->right = removemin®;
    
  20. Sorry, this is a small bug in an editor. It means ( r )
  21. hay… thanl you for yout helpful functions. i write acode and i used this functions. the computer gives me an eroor (c2040). can you help me ???

    this is a link to my c code. drive.google.com/file/d/0B3g5gYC5v2QCZHltUzlfSUlYdWM/view?usp=sharing

    thank you for helpping. rayan

  22. i have a problem making an instance from my struct my instance doesn’t take any values except for NULL,can you help me out with this? here’s my code

    
    #include 
    #include 
    #include 
    using namespace std;
    struct node
    {
        int key;
        unsigned char height;
        node* left;
        node* right;
        node(int k)
        {
            key=k;
            right=NULL;
            left=NULL;
        }
        };
    
    unsigned char height(node* p)
    {
        return p?p->height:0;
    }
    
    int bfactor(node* p)
    {
        return height(p->right)-height(p->left);
    }
    
    void fixheight(node* p)
    {
        unsigned char hl = height(p->left);
        unsigned char hr = height(p->right);
        p->height = (hl>hr?hl:hr)+1;
    }
    
    node* rotateright(node* p)
    {
        node* q = p->left;
        p->left = q->right;
        q->right = p;
        fixheight(p);
        fixheight(q);
        return q;
    }
    
    node* rotateleft(node* q)
    {
        node* p = q->right;
        q->right = p->left;
        p->left = q;
        fixheight(q);
        fixheight(p);
        return p;
    }
    
    node* balance(node* p) // balancing the p node
    {
        fixheight(p);
        if( bfactor(p)==2 )
        {
            if( bfactor(p->right) right = rotateright(p->right);
            return rotateleft(p);
        }
        if( bfactor(p)==-2 )
        {
            if( bfactor(p->left) > 0  )
                p->left = rotateleft(p->left);
            return rotateright(p);
        }
        return p; // balancing is not required
    }
    
    node* insert(node* p, int k) // insert k key in a tree with p root
    {
        //if( !p ) return new node(k);
        if( k key )
            p->left = insert(p->left,k);
        else
            p->right = insert(p->right,k);
        return balance(p);
    }
    
    node* findmin(node* p) // find a node with minimal key in a p tree
    {
        return p->left?findmin(p->left):p;
    }
    
    node* removemin(node* p) // deleting a node with minimal key from a p tree
    {
        if( p->left==0 )
            return p->right;
        p->left = removemin(p->left);
        return balance(p);
    }
    
    node* remove(node* p, int k) // deleting k key from p tree
    {
        if( !p ) return new node(k);
        if( k key )
            p->left = remove(p->left,k);
        else if( k > p->key )
            p->right = remove(p->right,k);
        else //  k == p->key
        {
            node* q = p->left;
            node* r = p->right;
            delete p;
            if( !r ) return q;
            node* min = findmin (r);
            min->right = removemin(r);
            min->left = q;
            return balance(min);
        }
        return balance(p);
    }
        node* Search(int data, node *aNode)
        {
            if (aNode != NULL) {
                if (data == aNode->key) {
                    return aNode;
                }
              else if (data key) {
                  
                    return Search(data, aNode->left);
                }
                else    {
                   
                    return Search(data, aNode->right);
                }
            }
            else {
                coutkeyleft);
                preOrder(aNode->right);
            }
        }
           void inOrder(node *aNode)
        {
            if (aNode != NULL) {
                inOrder(aNode->left);
                coutkeyright);
            }
        }
           void postOrder(node *aNode)
        {
            if (aNode != NULL) {
                postOrder(aNode->left);
                postOrder(aNode->right);
                coutkeykey > k1 && aNode->key left,s);
            find(k1, k2, aNode->right,s);
        }
    }
    
    int find(int k1, int k2,node *aNode)
    {
        int s = 0;
        find(k1,k2,aNode,&s);
        return s;
    }
    
    
    int main(){
        node *o1;
        int choice=0;
        int t=0,a=0,b=0,k1=0,k2=0;
        
    do {
       
        cout>choice;
        switch (choice) {
            case 1:
                cout>t;
                insert(o1,t);
                break;
            case 2:
                cout>a;
                Search(a,o1);
                break;
            case 3:
                cout>b;
                remove(o1,b);
                cout>k1;
                cin>>k2;
                cout
  23. Great article! it was very helpful :). There was one thing I did not quite understand… why do we need to use removemin in the remove function. if I understood correctly removemin job is only to find the right node of the minimum node…

    isn’t it the min->right already?

    Thanks,

3,751

Ropes — Fast Strings

Most of us work with strings one way or another. There’s no way to avoid them — when writing code, you’re doomed to concatinate strings every day, split them into parts and access certain characters by index. We are used to the fact that strings are fixed-length arrays of characters, which leads to certain limitations when working with them. For instance, we cannot quickly concatenate two strings. To do this, we will at first need to allocate the required amount of memory, and then copy there the data from the concatenated strings.