Previous Work: Implement and Evaluate Matrix Algorithms in Spark on High Performance Computing Platforms for Science Applications

Principal Investigator(s): 
Fred Roosta

The overall goal of this project is to enable the Berkeley Data Analytics Stack (BDAS) to run efficiently on the Cray XC30 and Cray XC40 supercomputer platforms. BDAS has a rich set of capabilities and is of interest as a computational environment for very large-scale machine learning and data analysis applications. To extend the capabilities of BDAS, ICSI researchers will consider the performance of deterministic and randomized matrix algorithms for problems such as least-squares approximation and low-rank matrix approximation that underlie many common machine-learning algorithms. To demonstrate end-to-end performance improvement, they will focus on a class of scientific data applications drawn from collaborations with colleagues at Lawrence Berkeley National Laboratory (LBNL). The science data applications will include BioImaging, Climate, and Neuroscience. These applications will stress-test the linear algebra features in BDAS in high performance computing ("HPC") environments. The project will continue these collaborations, using these scientific applications to provide downstream validation of the algorithms that are developed, implemented, and evaluated in BDAS.

Funded by Cray Research Inc.