Exact Maximum Clique for Large or Massive Real Graphs

Algorithms

The excellent performance reported by BBMCSP—an exact maximum clique algorithm tailored for massive real networks— in a previous post has raised a number of comments, some even questioning either the report itself or the problem’s complexity. This post gives an insight on how BBMCSP works. In the process, and similar to what happens when magicians explain their tricks, we are aware that some of the magic will be lost.

BBMCSP’s overarching ideas are described in the pseudocode below:

BBMCSP: an exact maximum clique algorithm

Input: A simple graph G(V, E)

Output: A maximum clique in G in Smax

  1. Compute k-core decomposition of G: K(G)
  2. Compute an initial clique greedily: Smax
  3. Remove vertices from G such that K(v)<|Smax|: Gr (Vr, Er)
  4. Sort Vr according to degeneracy
  5. Iteratively branch on vertices in Vr in reverse order: v
  6. | Compute subproblem derived from v: W ⊆ Vr
  7. | Compute a greedy coloring of W: C(W). if |C(W)|< |Smax| return
  8. | Compute k-core decomposition of W: K(W). if |K(W)|< |Smax| return
  9. | Remove vertices w from W such that K(w)<|Smax|: W’
  10. | Sort W’ according to degeneracy
  11. | SEARCH for a maximum clique in W’ with BBMC sparse solver: Smax
  12. |_ Remove v from Vr
  13. return Smax

STEP BY STEP

All steps except 11 constitute preprocessing tailored for massive graphs. Core hierarchical decomposition in step 1 is a must for any decent graph software package. Each k-core determines a maximal subgraph with the property that all its vertices have degree at least k (cf. previous post). Thus, the core number of a vertex is the order the highest core which contains it and the core number of the graph is the order of its highest core. Note that, in the case of real graphs, the core number is usually much lower than the maximum degree of the graph—e.g. in networks with a power law distribution only a few individuals are massively connected—. Step 3 removes vertices that cannot possibly improve the initial greedy clique in Smax because of a low core number. After that vertices are sorted in step 4 according to degeneracy, i.e. non increasing core number.

Step 5 is the main loop which may be regarded as a first level unrolling of the search tree of a typical exact solver. For each vertex v picked in reverse order, the child subproblem—its neighbor set— is also preprocessed trying to produce an early cut. Step 7 is concerned with approximate color cuts, i.e. the current best solution in Smax cannot be improved by any subproblem in which a coloring of size less than | Smax|. Moreover, Step 8 prunes the subproblem if its core number is less than| Smax|. Finally, if the subproblem cannot be pruned directly in the previous two steps, concrete vertices are removed when their core number is less than | Smax| in step 9. Remaining vertices are sorted by degeneracy in step 10.

The actual NP-hard search occurs in step 11 over the filtered subproblem W’. BBMCSP uses the exact bitstring solver BBMC tailored for large or massive graphs by a specifically designed sparse encoding, as mentioned in the alluded previous post.

AN EXAMPLE

Figure 1. A Graph example. Label notation <x:y:z> denotes vertex index, kcore and degree respectively

Figure 1 contains a simple graph G of order 10 –notation for node labels x:y:x is x for index, y for kcore number and z for degree–. The graph core number K(G) is 4 and its maximum degree is 6 (given by vertex 6). Step 2 of the algorithm finds an initial greedy clique | Smax| of 3 so vertices 1 and 5 are explicitly removed from G in step 3. The remaining vertices are ordered according to degeneracy in step 4 (see figure 1, left).

Figure 2 below contains the resulting behaviour of BBMCSP in the main loop (steps 5-12), which includes the first level unrolling and the actual NP-hard search. Vertices are picked in reverse order (header v) and, after each iteration, removed from Gr. The first iteration in the loop calls vertex 8 and analyses subproblem W={2, 3, 5, 7}. The vertex coloring C(W) has color size 3, same as | Smax| so the subproblem cannot be pruned at step 7. The subproblem is pruned, however, in step 8 by core analysis. Subsequently vertex 8 is removed and vertex 7 startsa new iteration (second row).

The critical step occurs when vertex 6 is chosen, which contains a maximum clique {1, 2, 4, 6}. In this case the entire subproblem cannot obviously be pruned so vertex 9 is reached and vertex 5 is removed since K(5) is lower than | Smax|. The search routine is finally called in step 11 and the clique si found. Remaining vertices in Gr are pruned more or less trivially thereafter.

BBMCSP steps 5-12 taken in the exact maximum clique problem solving

Figure 2. BBMCSP steps 5-12

IMPLEMENTATION DETAILS

K-core analysis is critical during preprocessing. If the reader wants to implement it by himself we warn him that the typical solution runs in O(|V|2) which is not adequate for large or massive sparse graphs. We recommend the O(|E|) algorithm of Bagatelij—the implementation used by BBMCSP can be found in the pablodev/graphblock in the Biicode repo—. A side result is that vertices are also sorted according to degeneracy, which is necessary for steps 4 and 10 as explained.

A sequential greedy coloring in step 7 assigns, to each vertex in turn, the lowest possible color label consistent with the current partial coloring. The bitstring implementation employed by BBMCSP may be found in the pablodev/copt block (InitColor class).

Additionally, the sparse encoding used by BBMC is available as part of the BITSCAN library—pablodev/bitscan block— and the GRAPH library—pablodev/graph— block. BBMCSP uses the sparse_graph available in GRAPH and currently most of the source code is also available in the pablodev/copt block.

A NOTE ON COMPLEXITY

In reply to the comments of some of our readers, maximum clique is still NP-hard, yes, and uniform graphs of say a thousand vertices and 0.8 density remain a very difficult challenge for today’s best exact solvers. The reason why BBMCSP is so successful lies mainly on the structure of real networks, which is typically much simplified after the heavy tailored preprocessing described.

That this is so can be seen in the following example: the California road network roadNet-CA with 1.9 million nodes, 2.7 million edges and a clique number of 4 was trivially solved in less than a second of preprocessing by BBMCSP—step 11, the NP-hard search step, is actually never called—. The same graph was proposed as benchmark in a recent Big Data Conference [1]. There it took 153 seconds to solve using 75 processor nodes!

Raw results of BMCSPagainst more than 200 real networks, as well as results of state-of-the-art algorithms are available here. We are currently working on a heavy refactoring and a more sophisticated command line parameter interface both for Windows and Linux binaries. As always, we await comments and suggestions from readers.

[1] Hagan, R. D. et al.; Toward an Efficient, Highly Scalable Maximum Clique Solver for Massive Graphs, IEEE Conf. on Big Data, 2014.

Comments

    3,751

    Ropes — Fast Strings

    Most of us work with strings one way or another. There’s no way to avoid them — when writing code, you’re doomed to concatinate strings every day, split them into parts and access certain characters by index. We are used to the fact that strings are fixed-length arrays of characters, which leads to certain limitations when working with them. For instance, we cannot quickly concatenate two strings. To do this, we will at first need to allocate the required amount of memory, and then copy there the data from the concatenated strings.