EECS technical reports

Conditions of Use

Archive Home Page

Automatic Sparsity Detection in LAPACK

Carbunescu, Razvan
Technical Report Identifier: EECS-2014-224

Abstract: Many applications call linear algebra libraries as methods of achieving better performance and reliability. LAPACK (Linear Algebra Package) is a standard software library for numerical linear algebra that is widely used in the industrial and scientific community. LAPACK functions require the user to know the sparsity and other mathematical structure of their inputs to be able to take advantage of the fastest codes: General Matrix (GE), General Band (GB), Positive Definite (PO) etc. If a user is unsure of their matrix structure or cannot easily express it in the formats available (profile matrices, arrow matrices etc.) they are forced to use a more general structure, which includes their input, and so run less efficiently than possible. The goal of this thesis is to allow for automatic sparsity detection (ASD) within LAPACK that is completely hidden from the user and provides no slowdown for users running fully dense matrices. This work adds modular support for the detection of blocked sparsity within LAPACK LU and Cholesky functions. It also creates the infrastructure and the algorithms to potentially expand sparsity detection to other factorizations, more input matrix structures, or provide further timing and memory improvements via integration directly in the solver routines. Two general approaches are implemented named 'Profile' (ASD1) and 'Sparse block' (ASD2) with a third more complicated method named 'Full sparsity' (ASD3) being described more abstractly, only at an algorithm level. With these algorithms we obtain benefits of up to an order of magnitude (35.10x faster over the same LAPACK function) for matrices displaying 'blocked sparsity' patterns and large benefits over the best LAPACK algorithms for patterns that don't fit into LAPACK categories (4.85x faster over the best LAPACK function). For matrices exhibiting no sparsity these implementations incur either a negligible penalty (an overhead of 1 percent) or incur a small overhead (10-30 percent) that quickly decreases with the size of matrix n or band b (less than 5 percent for n,b greater than 500).