Back to the Future: Malware Detection with Temporally Consistent Labels

TitleBack to the Future: Malware Detection with Temporally Consistent Labels
Publication TypeJournal Article
Year of Publication2015
AuthorsMiller, B., Kantchelian A., Tschantz M. Carl, Afroz S., Bachwani R., Faizullabhoy R., Huang L., Shankar V., Wu T., Wu G., Joseph A. D., & Tygar J.D..
Published inCoRR
Date Published10/2015

The malware detection arms race involves constant change: malware changes to evade detection and labels change as detection mechanisms react. Recognizing that malware changes over time, prior work has enforced temporally consistent samples by requiring that training binaries predate evaluation binaries. We present temporally consistent labels, requiring that training labels also predate evaluation binaries since training labels collected after evaluation binaries constitute label knowledge from the future. Using a dataset containing 1.1 million binaries from over 2.5 years, we show that enforcing temporal label consistency decreases detection from 91% to 72% at a 0.5% false positive rate compared to temporal samples alone.

The impact of temporal labeling demonstrates the potential of improved labels to increase detection results. Hence, we present a detector capable of selecting binaries for submission to an expert labeler for review. At a 0.5% false positive rate, our detector achieves a 72% true positive rate without an expert, which increases to 77% and 89% with 10 and 80 expert queries daily, respectively. Additionally, we detect 42% of malicious binaries initially undetected by all 32 antivirus vendors from VirusTotal used in our evaluation. For evaluation at scale, we simulate the human expert labeler and show that our approach is robust against expert labeling errors. Our novel contributions include a scalable malware detector integrating manual review with machine learning and the examination of temporal label consistency

ICSI Research Group

Networking and Security