Scientific discovery increasingly relies on the ability to analyze datasets of unprecedented scale and complexity. In genomics, studies that once examined a few thousand individuals now aim to analyze millions. Computing systems capable of handling this scale exist, but using them effectively requires expertise in parallel programming and modern accelerator architectures, which most scientists lack. As a result, important scientific questions remain unanswered because computations would take weeks or months to complete. This award addresses that gap by establishing sparse linear algebra as a unifying abstraction for scientists to express their computations in a standard form, enabling portable execution on evolving computing hardware. By lowering the barrier to advanced computing systems, the project will accelerate discovery in data-intensive sciences while strengthening the United States workforce in high-performance computing. A central component is an undergraduate training program that recruits students from across the United States to participate in international supercomputing competitions, creating a pipeline of talent prepared to use the computing facilities in which the nation has invested.

This project develops a systematic methodology for mapping irregular scientific computations onto sparse linear algebra primitives, enabling portable execution across heterogeneous systems, including CPUs, GPUs, and emerging architectures. The project advances three research thrusts. The first develops methods to automatically translate domain code into sparse linear algebra, using a semantic-signature layer that abstracts over syntactic variations of common computational motifs and equality saturation to rewrite code into canonical sparse-primitive form. The second identifies sparsity structure and algebraic provenance from code and uses these properties to specialize sparse primitives.
The third extends these application-aware primitives to distributed memory through within-primitive layout derivation and sequence-level composition passes that determine communication-optimal data layout from the recovered structure. The validation includes genome-wide association testing on genotype representation graphs, standard graph algorithm benchmarks, and clustering workloads at scale on national supercomputing resources. The educational activities integrate research outcomes into graduate coursework and create an undergraduate training program to enable students to effectively use supercomputing facilities across the nation.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

NSF Award ID: 2542947
Program: 01002930DB NSF RESEARCH & RELATED ACTIVIT, 01002627DB NSF RESEARCH & RELATED ACTIVIT, 01003031DB NSF RESEARCH & RELATED ACTIVIT
Principal Investigator: Giulia Guidi
Institution: Cornell University, ITHACA, NY
Award Amount: $425,666
View on NSF Award Search: https://www.nsf.gov/awardsearch/show-award/?AWD_ID=2542947
View on Research.gov: https://www.research.gov/awardapi-service/v1/awards/2542947.html

CAREER: Sparse Linear Algebra as a Scalable Computational Paradigm
