Sparse Linear Systems and Direct Methods


My study notes for my CSC494 course project: Systems for Graphics.
A rough introduction to solving sparse linear systems and related problems.

Background

The project started by investigating the Incremental Potential Contact (IPC) model for physics-based animation, which is built around an augmented Newton-type optimizer. We found that the bottleneck of the computation is the step that solves for the step direction, which is a sparse linear solve with the Hessian matrix:

\bold{H}\bold{p}= -\bold{g}

So my research started with direct methods for sparse linear systems. These study notes roughly cover some of the basic knowledge and concepts involved in the topic, with some illustrative figures.

Great thanks to Behrooz and Prof. Maryam for all their help and support!

Goal: Solving a sparse linear system

The end goal of the problem is to solve for \bold{x} in a sparse linear system:

\bold{Ax}=\bold{b}

where \bold{A} is a known sparse matrix, \bold{b} is a known vector, and \bold{x} is the UNKNOWN vector.

The naive method uses Gaussian elimination to convert \bold{A} into row echelon form (an upper-triangular matrix) and then performs a backward solve.

Another method is to first perform a matrix factorization, decomposing \bold{A} into the product of a lower-triangular matrix and an upper-triangular matrix, and then perform two less costly triangular solves. For our case, \bold{A} is the Hessian matrix in Newton’s Method, which is symmetric and (for this problem) positive-definite (SPD), so we can apply the Cholesky factorization to find a lower-triangular factor \bold{L}.

\bold{A}=\bold{L}\bold{L}^\intercal

We then apply a series of substitutions and two triangular solves:

\text{Substitute } \bold{A}\bold{x} = \bold{L} \bold{L}^\intercal \bold{x} = \bold{b}
\\
\text{Let } \bold{L}^\intercal \bold{x} = \bold{y}
\\
\text{Solve } \bold{L} \bold{y} = \bold{b}
\\
\text{Solve } \bold{L}^\intercal \bold{x} = \bold{y}
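
As a concrete sketch of this pipeline, here is how it might look with Eigen’s SimplicialLLT (the use of Eigen and the function name solveSPD are assumptions for illustration; any sparse Cholesky library exposes the same factorize-then-solve steps):

C++
#include <Eigen/Sparse>

// Minimal sketch: factor the SPD matrix A once, then perform the two
// triangular solves. Assumes the Eigen library is available.
Eigen::VectorXd solveSPD(const Eigen::SparseMatrix<double>& A,
                         const Eigen::VectorXd& b) {
  Eigen::SimplicialLLT<Eigen::SparseMatrix<double>> llt(A);  // A = L L^T
  return llt.solve(b);  // internally solves L y = b, then L^T x = y
}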

Solve: algorithm

Let’s focus on the problem \bold{L} \bold{x} = \bold{b}, whose solution \bold{x} can be computed iteratively by forward substitution.

C++
x = b                                // solve L x = b in place, starting from b
for (int j = 0; j < n; ++j) {        // iterate over entries of x
  if (x[j] != 0) {
    x[j] /= L(j, j);                 // divide by the diagonal L(j,j)
    for (int i = j + 1; i < n; ++i)  // iterate over rows of L at column j
      x[i] -= L(i, j) * x[j];
  }
}

Sparse

For the sparse case, the inner for-loop of the above algorithm can be pruned to iterate only over the non-zero entries in column j of \bold{L}, i.e., the entries of \bold{x} that depend on entry j. See the following figure:

Figure 1

In this example, when we are solving for j = 1 (the first entry), only the non-zero entries in column j of \bold{L} affect the later iterations. This gives us the concept of dependency: the computation of node (column) j affects only a subset of all nodes, and only a subset of all nodes affects the computation of node j.
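
Concretely, here is a sketch of the pruned solve over the CSC arrays of \bold{L} (modeled on the cs_lsolve routine in CSparse (Davis, 2006); the array names Lp, Li, Lx follow the CSC format described in the Data structure section below):

C++
// Sparse forward substitution: solve L x = b in place, where x starts as b.
// L is stored in CSC arrays Lp (column pointers), Li (row indices), and
// Lx (values), with the diagonal entry stored first in each column.
void lsolve(int n, const int* Lp, const int* Li, const double* Lx, double* x) {
  for (int j = 0; j < n; ++j) {
    if (x[j] == 0) continue;           // column j contributes nothing
    x[j] /= Lx[Lp[j]];                 // divide by the diagonal L(j,j)
    for (int k = Lp[j] + 1; k < Lp[j + 1]; ++k)
      x[Li[k]] -= Lx[k] * x[j];        // visit only the non-zeros of column j
  }
}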

The dependency relationships among the nodes can be expressed as a graph G_\bold{L}. We can also read it directly off the factor \bold{L}: each non-zero entry implies an edge from the entry’s column index to its row index.

Figure 2, (Davis et al., 2016)

Reach

Another way to improve the solve step is to look at the non-zero patterns of \bold{b} and \bold{x} and prune the outer loop (over j) to iterate only over those nodes.

When the right-hand side \bold{b} is sparse, the solution \bold{x} is also sparse. A key theorem states that, given the non-zero pattern of the columns of \bold{L}, the non-zero pattern of \bold{x} can be determined without any numerical computation.

Figure 3, (Davis et al., 2016)

Given the non-zero pattern B of \bold{b}, we call the list of relevant nodes, which is also the non-zero pattern of \bold{x}, Reach_{L}(B): the set of nodes reachable from B in G_\bold{L}. The order of this list is a topological ordering of the nodes with respect to the order of computation.

The reach tells us which nodes need to be computed beforehand, and in what order, in order to compute a given list of nodes.
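
As an illustration, here is a sketch of computing Reach_{L}(B) with a depth-first search over G_\bold{L} (a recursive variant of the idea behind CSparse’s cs_reach; the names dfs and reach are mine, and a production version would use an explicit stack to avoid recursion limits):

C++
#include <algorithm>
#include <vector>

// Post-order DFS from node j over G_L: an edge j -> Li[k] exists for every
// below-diagonal non-zero in column j of L (CSC arrays Lp, Li).
void dfs(int j, const std::vector<int>& Lp, const std::vector<int>& Li,
         std::vector<bool>& visited, std::vector<int>& order) {
  visited[j] = true;
  for (int k = Lp[j]; k < Lp[j + 1]; ++k)
    if (Li[k] != j && !visited[Li[k]]) dfs(Li[k], Lp, Li, visited, order);
  order.push_back(j);  // j is appended after everything it affects
}

// Reach_L(B): all nodes reachable in G_L from the non-zero pattern B of b,
// returned in topological order (each node before the nodes it affects).
std::vector<int> reach(const std::vector<int>& B, int n,
                       const std::vector<int>& Lp, const std::vector<int>& Li) {
  std::vector<bool> visited(n, false);
  std::vector<int> order;
  for (int b : B)
    if (!visited[b]) dfs(b, Lp, Li, visited, order);
  std::reverse(order.begin(), order.end());  // reverse post-order = topological
  return order;
}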

Data structure

Let’s also cover an easy concept: the data structure for storing the sparse matrix. SuiteSparse stores sparse matrices in Compressed Sparse Column (CSC) format, which consists of three arrays:

  1. p: index array of size ncols + 1, storing the beginning index of each column within the arrays i and x.
  2. i: index array of size nzmax, storing the row index of each non-zero. The entries of each column are stored consecutively, so the array p is used to find where each column begins.
  3. x: value array of size nzmax, storing the numerical value of each non-zero, in the same positions as the array i.

For the column j of a sparse matrix, the row indices of the non-zeros are stored in i[p[j]] through i[p[j+1]-1], and the same locations in x will store their numerical values.

C++
/*
Matrix:
  |  1                 |
  |  1.1  2            |
  |  1.2       3       |
  |  1.3  2.3  3.3  4  |
*/

int p[]    = {0,                  4,        6,        8,   9};
int i[]    = {0,   1,   2,   3,   1,   3,   2,   3,   3};
double x[] = {1.0, 1.1, 1.2, 1.3, 2.0, 2.3, 3.0, 3.3, 4.0};
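
For instance, to visit the non-zeros of one column, a small loop over the arrays above suffices (the function name printColumn is illustrative):

C++
#include <cstdio>

// Print the non-zeros of column j, given the CSC arrays p, i, x above.
void printColumn(int j, const int* p, const int* i, const double* x) {
  for (int k = p[j]; k < p[j + 1]; ++k)  // column j occupies p[j] .. p[j+1]-1
    std::printf("A(%d, %d) = %g\n", i[k], j, x[k]);
}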

There is also another format called Compressed Sparse Row (CSR), which is the same idea applied to rows. Interestingly, the CSC representation of a matrix is identical to the CSR representation of its transpose.

Anyway, the data structure is not crucial for understanding the concepts in our study.

Next Chapter

We have seen that, to solve a sparse linear system with a Hessian matrix, we need to know the dependency relations among the columns.

References

  Davis, T. A. (2006). Direct methods for sparse linear systems. Society for Industrial and Applied Mathematics.  

  Davis, T. A., Rajamanickam, S., & Sid-Lakhdar, W. M. (2016). A survey of direct methods for sparse linear systems. Acta Numerica, 25, 383–566. https://doi.org/10.1017/S0962492916000076

  Herholz, P., & Alexa, M. (2018). Factor once: Reusing Cholesky factorizations on sub-meshes. ACM Transactions on Graphics, 37(6), 1–9. https://doi.org/10.1145/3272127.3275107