Why Let Resources Idle? Aggressive Cloning of Jobs with Dolly

Title: Why Let Resources Idle? Aggressive Cloning of Jobs with Dolly
Publication Type: Conference Paper
Year of Publication: 2012
Authors: Ananthanarayanan, G., Ghodsi, A., Shenker, S. J., & Stoica, I.
Page(s): 1-6
Other Numbers: 3380
Abstract

Despite prior research on outlier mitigation, our analysis of jobs from the Facebook cluster shows that outliers still occur, especially in small jobs. Small jobs are particularly sensitive to long-running outlier tasks because of their interactive nature. Outlier mitigation strategies rely on comparing different tasks of the same job and launching speculative copies for the slower tasks. However, small jobs execute all their tasks simultaneously, which leaves too little time to observe and compare tasks. Building on the observation that clusters are underutilized, we take speculation to its logical extreme: run full clones of jobs to mitigate the effect of outliers. The heavy-tailed distribution of job sizes implies that we can impact most jobs without using many resources. Trace-driven simulations show that average …
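The cloning argument in the abstract can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the Dolly implementation or the paper's trace-driven evaluation. It assumes job-level cloning, where each clone is an independent full copy of the job and the job completes when its fastest clone completes; a hypothetical task-duration model in which a task is occasionally an outlier; and hypothetical Pareto-distributed job sizes. All function names and parameters (`task_duration`, `job_completion`, `simulate`, `small_job_max_tasks`, `pareto_alpha`) are illustrative.

```python
import random
from statistics import mean


def task_duration(rng, outlier_prob=0.05, outlier_factor=8.0):
    """Hypothetical duration model: a task normally takes about 1 time unit,
    but with probability outlier_prob it is an outlier running outlier_factor times longer."""
    base = rng.uniform(0.9, 1.1)
    return base * outlier_factor if rng.random() < outlier_prob else base


def job_completion(rng, num_tasks, clones=1):
    """Assumed job-level cloning: each clone runs all of the job's tasks in parallel,
    so a clone finishes with its slowest task, and the job finishes with its fastest clone."""
    clone_times = [
        max(task_duration(rng) for _ in range(num_tasks)) for _ in range(clones)
    ]
    return min(clone_times)


def simulate(num_jobs=2000, clones=3, small_job_max_tasks=10, pareto_alpha=1.1, seed=0):
    rng = random.Random(seed)
    # Hypothetical heavy-tailed job sizes (in tasks); paretovariate returns values >= 1.
    sizes = [int(rng.paretovariate(pareto_alpha)) for _ in range(num_jobs)]
    # Only small jobs are cloned; large jobs run once.
    small = [n for n in sizes if n <= small_job_max_tasks]
    mean_uncloned = mean(job_completion(rng, n) for n in small)
    mean_cloned = mean(job_completion(rng, n, clones) for n in small)
    # Upper bound on extra resources: every cloned task counts as a full extra task,
    # measured relative to the total number of tasks across all jobs.
    overhead = (clones - 1) * sum(small) / sum(sizes)
    return len(small) / num_jobs, mean_uncloned, mean_cloned, overhead


if __name__ == "__main__":
    frac_small, before, after, overhead = simulate()
    print(f"jobs cloned: {frac_small:.0%}")
    print(f"mean small-job completion time: {before:.2f} -> {after:.2f}")
    print(f"extra task-slots (upper bound): {overhead:.0%}")
```

Under this model a cloned job is slowed by an outlier only if every one of its clones hits one, which is why cloning helps small jobs most. The overhead figure depends entirely on the assumed size distribution: the heavier the tail, the larger the share of tasks held by a few big jobs, and the cheaper it becomes to clone the many small ones.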

Acknowledgment

This work was partially supported by funding provided by the sponsors of the AMP Lab at Berkeley: SAP, Amazon Web Services, Cloudera, Huawei, IBM, Intel, Microsoft, NEC, NetApp, and VMware, and the U.S. Defense Advanced Research Projects Agency (DARPA - contract #FA8650-11-C-7136). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors or originators and do not necessarily reflect the views of DARPA or of the U.S. Government.

URL: http://www.icsi.berkeley.edu/pubs/networking/ICSI_whyletresources12.pdf
Bibliographic Notes

Proceedings of the 4th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '12), pp. 1-6, Boston, Massachusetts

Abbreviated Authors

G. Ananthanarayanan, A. Ghodsi, S. Shenker, and I. Stoica

ICSI Research Group

Networking and Security

ICSI Publication Type

Article in conference proceedings