A Large-Scale Empirical Analysis of Email Spam Detection Through Transport-Level Characteristics

Title: A Large-Scale Empirical Analysis of Email Spam Detection Through Transport-Level Characteristics
Publication Type: Technical Report
Year of Publication: 2010
Authors: Ouyang, T., Ray, S., Allman, M., & Rabinovich, M.
Other Numbers: 2791
Abstract

Spam is a never-ending issue that constantly consumes resources to no useful end. In this paper we evaluate the efficacy of using a machine learning-based model of the transport-layer characteristics of email traffic to identify spam. The underlying idea is that the manner in which spam is transmitted has an impact that is statistically observable in the traffic (e.g., in the network round-trip time or jitter between packets). Therefore, by identifying a solid set of traffic features we can construct a model that can identify spam without relying on expensive content filtering. We carry out a large-scale empirical analysis of this idea with data collected over the course of one year (roughly 600K messages). With this data, we train classifiers using machine learning methods and test several hypotheses. First, we validate prior results using similar techniques. Second, we determine which transport characteristics contribute most significantly to the detection process. Third, we analyze the behavior of our detectors over weekly and monthly intervals and in the presence of major network events. Finally, we evaluate the behavior of our detectors in a practical setting where they are used in a filtering pipeline along with standard off-the-shelf content filtering methods, and demonstrate that they can lead to computational savings in practice.
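
The filtering pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' model: the two features (`mean_rtt`, `jitter`), the weights, the bias, and the threshold are all hypothetical, and a hand-set logistic score stands in for a trained classifier. It assumes, purely for illustration, that higher round-trip time and jitter push the score toward spam. The point it shows is the control flow: a cheap transport-level check runs first, and the expensive content filter runs only when that check is not confident.

```python
import math
from statistics import mean, pstdev


def transport_features(arrival_times, rtt_samples):
    """Summarize one connection by mean RTT and inter-packet jitter (seconds)."""
    gaps = [b - a for a, b in zip(arrival_times, arrival_times[1:])]
    return {
        "mean_rtt": mean(rtt_samples),
        # Jitter is taken here as the spread of inter-arrival gaps.
        "jitter": pstdev(gaps) if gaps else 0.0,
    }


# Hypothetical weights; a real detector would learn these from labeled traffic.
WEIGHTS = {"mean_rtt": 8.0, "jitter": 20.0}
BIAS = -3.0


def spam_probability(features):
    """Logistic score over the transport features (illustrative stand-in)."""
    z = BIAS + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def classify(features, content_filter, threshold=0.9):
    """Return (label, content_filter_was_run).

    The content filter is a callable so that it is only invoked when the
    transport-level score falls below the confidence threshold.
    """
    if spam_probability(features) >= threshold:
        return "spam", False
    return content_filter(), True
```

For example, `classify(transport_features(times, rtts), lambda: run_content_filter(msg))` would call the hypothetical `run_content_filter` only for messages that the transport-level score does not already flag, which is where any computational savings would come from.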

URL: http://www.icsi.berkeley.edu/pubs/techreports/TR-10-001.pdf
Bibliographic Notes

ICSI Technical Report TR-10-001

Abbreviated Authors

T. Ouyang, S. Ray, M. Allman, and M. Rabinovich

ICSI Research Group

Networking and Security

ICSI Publication Type

Technical Report