Scalable Network Forensics
Technical Report Identifier: EECS-2016-55
May 12, 2016
Abstract: Network forensics and incident response play a vital role in site operations, but for large networks can pose daunting difficulties to cope with the ever-growing volume of activity and resulting logs. On the one hand, logging sources can generate tens of thousands of events per second, which a system supporting comprehensive forensics must somehow continually ingest. On the other hand, operators greatly benefit from interactive exploration of disparate types of activity when analyzing an incident, which often leaves network operators scrambling to ferret out answers to key questions: How did the attackers get in? What did they do once inside? Where did they come from? What activity patterns serve as indicators reflecting their presence? How do we prevent this attack in the future? Operators can only answer such questions by drawing upon high-quality descriptions of past activity recorded over extended time. A typical analysis starts with a narrow piece of intelligence, such as a local system exhibiting questionable behavior, or a report from another site describing an attack they detected. The analyst then tries to locate the described behavior by examining past activity, often cross-correlating information of different types to build up additional context. Frequently, this process in turn produces new leads to explore iteratively ("peeling the onion"), continuing and expanding until ultimately the analyst converges on as complete of an understanding of the incident as they can extract from the available information. This process, however, remains manual and time-consuming, as no single storage system efficiently integrates the disparate sources of data that investigations often involve. While standard Security Information and Event Management (SIEM) solutions aggregate logs from different sources into a single database, their data models omit crucial semantics, and they struggle to scale to the data rates that large-scale environments require. In this thesis we present the design, implementation, and evaluation of VAST (Visibility Across Space and Time), a distributed platform for high-performance network forensics and incident response that provides both continuous ingestion of voluminous event streams and interactive query performance. VAST offers a type-rich data model to avoid loss of critical semantics, allowing operators to express activity directly. Similarly, strong typing persists throughout the entire system, enabling type-specific optimization at lower levels while retaining type safety during querying for a less error-prone interaction. A central contribution of this work concerns our novel type-specific indexes that directly support the type's common operations, e.g., top-k prefix search for IP addresses. We show that composition of these indexes allows for a powerful and unified approach to fine-grained data localization, which directly supports the workflows of security investigators. VAST leverages a native implementation of the actor model to scale both intra-machine across available CPU cores, and inter-machine over a cluster of commodity systems. Our evaluation with real-world log and packet data demonstrates the system's potential to support interactive exploration at a level beyond what current systems offer. We release VAST as free open-source software under a permissive license.