Cholesky Factorization and Elimination Tree


In the last chapter, we introduced the concept of dependency. In this chapter, we will take a closer look and see how to put this concept to work in factorization.

Left-looking Cholesky factorization

In particular, we will use the Left-Looking Cholesky Factorization. In the left-looking Cholesky factorization, we compute the factor \bold{L} column-by-column from left to right, using the already computed information (\bold{l}_1, \bold{L}_2, and \bold{A} in the figure below).

Figure 1

In practice, we can fold together these two expressions above into a single matrix-vector multiplication and addition. We call \bold{c} the update matrix (in this case, it’s a vector; but when using supernodes, it will become a matrix).
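
To make the column-by-column computation concrete, below is a minimal dense sketch in Python/NumPy; it illustrates the recurrence only and is not CSparse's actual sparse implementation. The vector c is the update vector described above.

```python
import numpy as np

def left_looking_cholesky(A):
    """Compute the lower-triangular L with A = L @ L.T, column by column."""
    n = A.shape[0]
    L = np.zeros_like(A, dtype=float)
    for k in range(n):
        # Update column k using all previously computed columns (the
        # information to the left): one matrix-vector product and a subtraction.
        c = A[k:, k] - L[k:, :k] @ L[k, :k]
        L[k, k] = np.sqrt(c[0])        # diagonal entry l_kk
        L[k+1:, k] = c[1:] / L[k, k]   # entries below the diagonal
    return L

A = np.array([[4., 2.], [2., 5.]])
L = left_looking_cholesky(A)
assert np.allclose(L @ L.T, A)
```

A sparse implementation follows the same recurrence but only gathers the columns i that have a non-zero in row k; which columns those are is exactly the dependency information discussed below.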

Another Cholesky method is the Up-Looking Cholesky Factorization.

Overview of sparse factorization

For sparse factorization, we usually go through two phases:

  1. Symbolic Analysis: Analyzing the non-zero pattern of matrix \bold{A} and factor \bold{L}, including the elimination tree, column and row counts, fill-reducing permutation, and supernodes.
  2. Numerical Factorization: Computing the numerical values in \bold{L}.

Figure 2, (Davis, 2016)

Elimination tree

The elimination tree (etree) of the matrix is a pruned version of the Dependency Graph G_L. The idea is to eliminate the transitive edges that are implied by a path of other edges in the factor’s dependency graph. The implication is based on the fact that a non-zero entry l_{ij} of the factor (an edge from node j to node i in G_L) indicates a path from node j to i in the tree.

Although a path is longer than a single edge, remember that we are not trying to find the shortest path. We are trying to find a minimal graph that represents the dependencies (the order of computation), and a path can cover more dependencies than a single edge can. Intuitively, you can visualize the algorithm as keeping only each column’s topmost off-diagonal non-zero entry.
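
If the non-zero pattern of \bold{L} were already known, this pruning would reduce to one line per column. A sketch, assuming L is a dense NumPy array holding the factor:

```python
import numpy as np

def etree_from_factor(L):
    """parents[j] = row of the topmost off-diagonal non-zero in column j of L,
    or -1 if column j has none (node j is a root)."""
    n = L.shape[0]
    parents = np.full(n, -1)
    for j in range(n):
        below = np.nonzero(L[j+1:, j])[0]  # non-zeros strictly below the diagonal
        if below.size > 0:
            parents[j] = j + 1 + below[0]  # keep only the topmost one
    return parents
```

Of course, in practice we do not have \bold{L} yet; the section on constructing the elimination tree shows how to obtain the same tree directly from \bold{A}.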

Figure 3, (Davis, 2016)

Thinking question: why does preferring the longer path amount to keeping only the topmost (smallest row index) off-diagonal entry in each column?

Figure 4

With the etree, you can easily find the dependencies of node j (the nodes that must be computed before j) by traversing down through its descendants, and the nodes that will be affected by node j by traversing up through its ancestors. Therefore, the order of computation runs from the leaves of the tree to the root.

The elimination tree is stored as an array of parents, where parents[i] is the parent of node i.
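
For example, a five-node tree in which nodes 0 and 1 hang under node 2, nodes 2 and 3 hang under node 4, and node 4 is the root would be stored as:

```python
#           0  1  2  3   4
parents = [2, 2, 4, 4, -1]   # -1 marks a root, the convention cs_etree also uses
```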

Constructing the elimination tree

The etree describes the dependency pattern of the factor \bold{L}, but we can compute it iteratively from \bold{A} before factorization (see cs_etree in CSparse for details).

We iterate over the columns of the upper triangular part of \bold{A} and then look at the non-zero rows in each column. Note that this is equivalent to iterating over the rows of the lower triangular part and then looking at the non-zero columns in each row, but the upper part is easier to traverse in the CSC format.

The iterative computation of the etree is as follows:

Suppose T_{k-1} (the etree of submatrix \bold{L}_{1...k-1, 1...k-1}) is known. This tree is a subset of T_k. To compute T_k from T_{k-1}, the children of k (which are root nodes in T_{k-1}) must be found. Since a_{ki} \neq 0 implies the path i\longrightarrow k exists in T, this path can be traversed in T_{k-1} until reaching a root node. This node must be a child of k, since the path i\longrightarrow k must exist.

Davis, 2006

Beyond this, to speed up the subtree traversal (all nodes on the path starting from i will end up rooted at k), we keep an array of ancestors and set ancestors[i] = k to put the subtree of T_{k-1} under k. This process is called path compression. Also, notice that the ancestor of a node is the root of the largest-k row subtree T^k that contains the node.
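
A minimal Python sketch of this construction, modeled on CSparse’s cs_etree but reading a dense symmetric NumPy array for simplicity (only the upper triangular part of \bold{A} is touched):

```python
import numpy as np

def etree(A):
    """Elimination tree from the upper triangular pattern of symmetric A."""
    n = A.shape[0]
    parents = np.full(n, -1)
    ancestors = np.full(n, -1)   # construction-time shortcut to each subtree's root
    for k in range(n):
        for r in range(k):
            if A[r, k] == 0:     # only non-zeros a_rk in column k matter
                continue
            i = r
            # Walk from i towards the root of its subtree in T_{k-1},
            # compressing the path so later walks jump straight to k.
            while i != -1 and i < k:
                i_next = ancestors[i]
                ancestors[i] = k            # path compression
                if i_next == -1:            # i was a root of T_{k-1}: k adopts it
                    parents[i] = k
                i = i_next
    return parents
```

The ancestors array is only a construction-time shortcut; the result of interest is the parents array.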

Figure 5, (Davis, 2016): The 9th iteration of cs_etree. Red lines show the parents, yellow lines show the ancestors.

The row subtree of node k, T^k, is the subtree of the etree formed by the paths up to node k from each node j with a_{jk} \neq 0; the leaves of T^k are the nodes whose descendants all correspond to zeros in column k:

Node j is a leaf of T^k if and only if both a_{jk} \neq 0 and a_{ik} = 0 for every descendant i of j in the elimination tree T.

Davis, 2006

Post-ordering

One simple trick to improve the performance of the factorization is post-ordering. After building the elimination tree, we can permute the matrix \bold{A} to exploit memory locality. Since the tree hierarchy determines the order of computation, we can place nodes that have a dependency relationship close together to make memory accesses more sequential.

For example, in the figure above, the etree guides the computation to follow the order 2 \rightarrow 3 \rightarrow 5 \rightarrow 1 \rightarrow 4 \rightarrow 8. The nodes 2, 3, and 5 required by node 8 were computed long before, so the pointer jumps back and forth. A better computation order is 2 \rightarrow 3 \rightarrow 5 \rightarrow 8 \rightarrow 1 \rightarrow ..., where each node is computed right after its dependencies are ready.

The post-ordering of an etree can be found easily by performing a depth-first search. Column j is moved to position i, where i is the order in which node j is visited during the post-order traversal.
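
A minimal sketch of this traversal, assuming the parents representation from above (with -1 marking roots); post[j] is the position column j moves to:

```python
def post_order(parents):
    """Post-order an elimination forest: number each node after its subtree."""
    n = len(parents)
    children = [[] for _ in range(n)]
    for j, p in enumerate(parents):
        if p != -1:
            children[p].append(j)
    post = [0] * n
    order = 0
    for root in (j for j in range(n) if parents[j] == -1):
        stack = [(root, iter(children[root]))]
        while stack:
            node, child_iter = stack[-1]
            child = next(child_iter, None)
            if child is None:            # all children numbered: number this node
                stack.pop()
                post[node] = order
                order += 1
            else:
                stack.append((child, iter(children[child])))
    return post
```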

Permutation

In this context, permutation means switching the order of the nodes; this is feasible because the real problem cares about the relationships (edges) between nodes, not their labels. We can represent a permutation of \bold{A} by a matrix \bold{P} and apply it as the matrix product \bold{PAP^\intercal}.
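
In code, \bold{P} is rarely formed explicitly; indexing with a permutation vector does the same job. A NumPy sketch (conventions differ; here p[new] = old):

```python
import numpy as np

A = np.array([[4., 1., 0.],
              [1., 3., 2.],
              [0., 2., 5.]])
p = np.array([2, 0, 1])      # old node 2 becomes node 0, old 0 becomes 1, ...

P = np.eye(3)[p]             # explicit permutation matrix: row i picks old row p[i]
assert np.allclose(P @ A @ P.T, A[np.ix_(p, p)])   # same result, no P needed
```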

Figure 6, (Davis, 2006): Nodes move, edges persist.

Non-zero patterns

Fill-ins

When factorizing a matrix, the process creates some non-zero entries in the factor that do not exist in the original matrix (the crossed dots in Figure 2). We call these new entries fill-ins. We can walk through a Cholesky factorization to see how they appear.

A key observation of the non-zero pattern in the factor is:

l_{ji} \neq 0 \text{ and } l_{ki} \neq 0 \Rightarrow l_{kj} \neq 0

For a matrix \bold{A}, according to the left-looking Cholesky, column j is computed as

\bold{l}_j = (\bold{a}_j - \bold{L}_2 \bold{l}_1^\intercal) / l_{jj}

In particular, given that l_{ji} \neq 0 (l_{ji} is in \bold{l}_1) and l_{ki} \neq 0 (l_{ki} is in \bold{L}_2), the product \bold{L}_2 \bold{l}_1^\intercal is non-zero at row k, so the theorem asserts that the entry l_{kj} of \bold{l}_j is non-zero in the factor \bold{L}, even if a_{kj} is zero.

In other words, given three nodes i < j < k, if the edges i\longrightarrow j and i\longrightarrow k exist, then there will be an edge j\longrightarrow k in the factor’s graph that is not necessarily in the original matrix’s graph.
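
A minimal numeric example of the theorem with i = 0, j = 1, k = 2: a_{21} = 0 below, but l_{10} \neq 0 and l_{20} \neq 0 force a fill-in at l_{21}.

```python
import numpy as np

A = np.array([[4., 2., 2.],
              [2., 5., 0.],    # a_21 = 0 ...
              [2., 0., 5.]])
L = np.linalg.cholesky(A)
print(L[2, 1])                 # -0.5: a fill-in, since l_10 = l_20 = 1
```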

Fill-reducing ordering

The fill-ins increase the amount of computation; by permuting the nodes prior to computing the elimination tree, we can reduce the number of fill-ins. The fill-in minimization problem is NP-hard, so we usually rely on heuristics.

Figure 7: Two matrices with the same content but different ordering create different fill-ins.

Nested dissection ordering

Nested dissection ordering is a fill-reducing ordering that is useful in the field of computer graphics. We can use an analogy: “Moving your toes does not affect your head”.

Nested dissection ordering finds a set of nodes that acts as a separator (for example, the “waist” from the analogy above), and then places the closely connected nodes within each part cut off by the separator near each other (“leg” with “feet”, “head” with “neck”). Each separated part can then be dissected recursively.

The goal of the ordering is to build a balanced etree: the dissected parts become two disjoint subtrees, which are joined together by the separator. As a result, the permuted matrix takes on a block structure, where there are no edges between the blocks.
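
A sketch of the recursion only (the bisect helper, which splits a node set into two parts with no edges between them plus the separator, is hypothetical; in practice a graph partitioner such as METIS plays this role):

```python
def nested_dissection(nodes, bisect, min_size=4):
    """Order nodes so that each dissected half comes first and the
    separator that joins them comes last (it becomes the subtree root)."""
    if len(nodes) <= min_size:
        return list(nodes)
    left, separator, right = bisect(nodes)
    return (nested_dissection(left, bisect, min_size)
            + nested_dissection(right, bisect, min_size)
            + list(separator))
```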

Figure 8, (Herholz & Alexa, 2018): Red region shows the separator. Red blocks in the bottom-left figure show fill-ins.

References

  Davis, T. A. (2006). Direct methods for sparse linear systems. Society for Industrial and Applied Mathematics.  

  Davis, T. A., Rajamanickam, S., & Sid-Lakhdar, W. M. (2016). A survey of direct methods for sparse linear systems. Acta Numerica, 25, 383–566. https://doi.org/10.1017/S0962492916000076

  Herholz, P., & Alexa, M. (2018). Factor once: Reusing Cholesky factorizations on sub-meshes. ACM Transactions on Graphics, 37(6), 1–9. https://doi.org/10.1145/3272127.3275107