A Large-Scale Empirical Analysis of Email Spam Detection Through Transport-Level Characteristics
Title | A Large-Scale Empirical Analysis of Email Spam Detection Through Transport-Level Characteristics |
Publication Type | Technical Report |
Year of Publication | 2010 |
Authors | Ouyang, T., Ray, S., Allman, M., & Rabinovich, M. |
Other Numbers | 2791 |
Abstract | Spam is a never-ending issue that constantly consumes resources to no useful end. In this paper we evaluate the efficacy of using a machine learning-based model of the transport layer characteristics of email traffic to identify spam. The underlying idea is that the manner in which spam is transmitted has an impact that is statistically observable in the traffic (e.g., in the network round-trip time or jitter between packets). Therefore, by identifying a solid set of traffic features we can construct a model that can identify spam without relying on expensive content filtering. We carry out a large-scale empirical analysis of this idea with data collected over the course of one year (roughly 600K messages). With this data, we train classifiers using machine learning methods and test several hypotheses. First, we validate prior results using similar techniques. Second, we determine which transport characteristics contribute most significantly to the detection process. Third, we analyze the behavior of our detectors over weekly and monthly intervals and in the presence of major network events. Finally, we evaluate the behavior of our detectors in a practical setting where they are used in a filtering pipeline along with standard off-the-shelf content filtering methods, and demonstrate that they can lead to computational savings in practice. |
URL | http://www.icsi.berkeley.edu/pubs/techreports/TR-10-001.pdf |
Bibliographic Notes | ICSI Technical Report TR-10-001 |
Abbreviated Authors | T. Ouyang, S. Ray, M. Allman, and M. Rabinovich |
ICSI Research Group | Networking and Security |
ICSI Publication Type | Technical Report |