Detecting Malware Callouts in Realtime Network Traffic

Speaker: Domenic Puzio, Capital One

Date: Wednesday, February 15, 2017

Time: 4:00 PM to 5:00 PM (Eastern Time)

Public: Yes

Location: 32-G882

Host: CSAIL Security Seminar

Contact: Frank Wang

Domain generation algorithm (DGA) malware makes callouts to unique web addresses to avoid detection by static rules engines. To counter this type of malware, we created an ensemble model that analyzes domains and evaluates whether they were generated by a machine and are thus potentially malicious. The model works entirely on the URL being accessed, eliminating the need for DNS data, which can be difficult to access in large organizations. The ensemble consists of a transliteration pipeline to handle non-English-language domains, an NLP-based linguistic entropy algorithm, and a collocation and linear word embeddings algorithm to identify dictionary DGAs. We are also researching sequence-based machine learning analysis to detect dictionary DGAs. Our system analyzes enterprise-scale network traffic in real time, renders predictions, and raises alerts for cybersecurity analysts to evaluate. This talk will discuss the machine learning algorithms used to build the model, the features that we found to be informative, and the tools used in model creation and testing. We will then present the tools leveraged in building our model-as-a-service architecture for low-latency stream processing of high-velocity, high-volume traffic.
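
As a rough illustration of the kind of URL-only signal the abstract mentions, the sketch below computes the character-level Shannon entropy of a hostname taken from a URL. This is not the speaker's linguistic entropy algorithm, which the abstract does not specify. The function names (`hostname_from_url`, `char_entropy`, `score_url`) and the choice to score only the longest hostname label are assumptions made for this example.

```python
import math
from collections import Counter
from urllib.parse import urlsplit


def hostname_from_url(url: str) -> str:
    """Return the lowercase hostname of a URL (empty string if none)."""
    # urlsplit only fills in hostname when a scheme or '//' is present,
    # so prepend '//' for bare inputs such as 'example.com/path'.
    if "//" not in url:
        url = "//" + url
    return (urlsplit(url).hostname or "").lower()


def char_entropy(label: str) -> float:
    """Shannon entropy, in bits per character, of the characters in label."""
    if not label:
        return 0.0
    counts = Counter(label)
    total = len(label)
    # Sum p * log2(1/p) over the observed character frequencies.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def score_url(url: str) -> float:
    """Entropy of the longest dot-separated label of the URL's hostname.

    Scoring only the longest label is a simplification assumed for this
    sketch; it is not taken from the talk.
    """
    host = hostname_from_url(url)
    longest = max(host.split("."), key=len)
    return char_entropy(longest)
```

A label made of many distinct characters scores higher than a repetitive one. On its own this signal is weak: ordinary words can also score fairly high, and dictionary DGAs are built from real words, which is consistent with the abstract's use of an ensemble of several components.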

Domenic Puzio is a Data Engineer with Capital One. He graduated from the University of Virginia with degrees in Mathematics and Computer Science. On his current project, code-named Purple Rain, he is a core developer of a custom platform for ingesting, processing, and analyzing Capital One's cybersecurity data sources. Built entirely from open-source tools (NiFi, Kafka, Storm, Elasticsearch, Kibana), this framework processes hundreds of millions of events per hour. Currently, his focus is on the creation and productionization of machine learning models that provide enrichment to the data being streamed through the system. He is a contributor to two Apache projects, and his research interests include natural language processing and deep learning.

See other events that are part of the CSAIL Security Seminar 2016/2017.

Created by Frank Wang on Tuesday, January 10, 2017 at 8:37 PM.