Reliable, Memory Speed Storage for Cluster Computing Frameworks
Title | Reliable, Memory Speed Storage for Cluster Computing Frameworks |
Publication Type | Technical Report |
Year of Publication | 2014 |
Authors | Li, H., Ghodsi, A., Zaharia, M., Shenker, S. J., & Stoica, I. |
Other Numbers | 3705 |
Abstract | Tachyon is a distributed file system enabling reliable data sharing at memory speed across cluster computing frameworks. While caching today improves read workloads, writes are either network or disk bound, as replication is used for fault-tolerance. Tachyon eliminates this bottleneck by pushing lineage, a well-known technique borrowed from application frameworks, into the storage layer. The key challenge in making a long-lived lineage-based storage system is timely data recovery in case of failures. Tachyon addresses this issue by introducing a checkpointing algorithm that guarantees bounded recovery cost and resource allocation strategies for recomputation under common resource schedulers. Our evaluation shows that Tachyon outperforms in-memory HDFS by 110x for writes. It also improves the end-to-end latency of a realistic workflow by 4x. Tachyon is open source and is deployed at multiple companies. |
Acknowledgment | This research is supported in part by NSF CISE Expeditions Award CCF-1139158, LBNL Award 7076018, and DARPA XData Award FA8750-12-2-0331, and gifts from Amazon Web Services, Google, SAP, The Thomas and Stacey Siebel Foundation, Apple, Inc., Cisco, Cloudera, EMC, Ericsson, Facebook, GameOnTalis, Guavus, HP, Huawei, Intel, Microsoft, NetApp, Pivotal, Splunk, Virdata, VMware, WANdisco and Yahoo! |
Bibliographic Notes | Technical Report UCB/EECS-2014-135, EECS Department, University of California, Berkeley |
Abbreviated Authors | H. Li, A. Ghodsi, M. Zaharia, S. Shenker, and I. Stoica |
ICSI Research Group | Research Initiatives |
ICSI Publication Type | Technical Report |