Shark: Fast Data Analysis Using Coarse-grained Distributed Memory
Title | Shark: Fast Data Analysis Using Coarse-grained Distributed Memory |
Publication Type | Conference Paper |
Year of Publication | 2012 |
Authors | Engle, C., Lupher A., Xin R., Zaharia M., Franklin M. J., Shenker S. J., & Stoica I. |
Other Numbers | 3383 |
Abstract | Shark is a research data analysis system built on a novel coarse-grained distributed shared-memory abstraction. Shark marries query processing with deep data analysis, providing a unified system for easy data manipulation using SQL and pushing sophisticated analysis closer to data. It scales to thousands of nodes in a fault-tolerant manner. Shark can answer queries 40X faster than Apache Hive and run machine learning programs 25X faster than MapReduce programs in Apache Hadoop on large datasets. |
Acknowledgment | We would like to thank Peter Alvaro, Eric Yi Liu, Tim Kraska, Gene Pang, and Andrew Wang for feedback. This research is supported in part by gifts from Google, SAP, Amazon Web Services, Blue Goji, Cloudera, Ericsson, General Electric, Hewlett Packard, Huawei, IBM, Intel, MarkLogic, Microsoft, NEC Labs, NetApp, Oracle, Quanta, Splunk, VMware and by DARPA (contract #FA8650-11-C-7136). |
URL | http://www.icsi.berkeley.edu/pubs/networking/ICSI_sharkfastdata12.pdf |
Bibliographic Notes | Demo, ACM SIGMOD/PODS Conference, Scottsdale, Arizona |
Abbreviated Authors | C. Engle, A. Lupher, R. Xin, M. Zaharia, M. Franklin, S. Shenker, and I. Stoica |
ICSI Research Group | Networking and Security |
ICSI Publication Type | Article in conference proceedings |