Automated Discovery of User Trackers
Technical Report Identifier: EECS-2014-229
Abstract: Web tracking, the practice by which web sites collect information about a user's browsing history across one or more sites, is highly prevalent on the web today. It relies on unique identifiers (trackers) that can be mapped to client machines and user accounts. Although such tracking enables desirable features like personalization and website analytics, it raises serious concerns about online user privacy. Conventional trackers like browser cookies and Flash cookies are widely known to the community; however, there is potentially more tracking information being sent to servers around the world, unbeknownst to users and the security community at large. This work is motivated by the possibility of discovering previously unrecognized forms of trackers, whether potential or actual, in an automated fashion from raw network traffic. We built a tool that processes users' network traces and outputs tracker strings, such as usernames, cookies, IMEI numbers and the like, that uniquely identify a machine, device, or browser. The key challenge in automatically capturing trackers from raw traces is dealing with enterprise-sized data. We tackle this problem by applying data-driven multi-stage filtering, thereby pruning the volume of network traces to be analyzed. Each filtering step involves a trade-off between the false positive rate and the potentially interesting information lost (false negatives). Our tool uses six major filters and outputs a set of potential trackers for each user in the network. In addition to cookies, we found trackers sent as part of URL parameters, in the User-Agent header, and in non-HTTP payloads.
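The multi-stage filtering idea can be illustrated with a minimal Python sketch. It is not the tool described in this report: the six filters are not specified in the abstract, so the three stages below (a minimum token length, uniqueness to a single user, and repetition across requests) are assumptions chosen only for illustration. The record layout (`user`, `url`, `cookie` keys) and the names `candidate_tokens` and `find_trackers` are likewise hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit, parse_qsl


def candidate_tokens(request):
    """Yield candidate strings from one request record (hypothetical layout)."""
    # URL query parameter values
    for _, value in parse_qsl(urlsplit(request["url"]).query):
        yield value
    # Cookie values from a raw "name=value; name=value" header string
    for part in request.get("cookie", "").split(";"):
        if "=" in part:
            yield part.split("=", 1)[1].strip()


def find_trackers(requests, min_len=8, min_repeats=2):
    """Return {user: set of candidate tracker strings}.

    Illustrative stages (assumptions, not the report's filters):
      1. drop tokens shorter than min_len (unlikely to be unique IDs)
      2. drop tokens seen for more than one user (not identifying)
      3. drop tokens seen fewer than min_repeats times (not persistent)
    """
    users_by_token = defaultdict(set)
    count_by_user_token = defaultdict(int)

    for req in requests:
        for token in candidate_tokens(req):
            if len(token) < min_len:          # stage 1
                continue
            users_by_token[token].add(req["user"])
            count_by_user_token[(req["user"], token)] += 1

    trackers = defaultdict(set)
    for token, users in users_by_token.items():
        if len(users) != 1:                   # stage 2
            continue
        (user,) = users
        if count_by_user_token[(user, token)] >= min_repeats:  # stage 3
            trackers[user].add(token)
    return dict(trackers)
```

Each stage discards data cheaply before the next one runs, which is the pruning effect described above. Tightening a threshold such as `min_len` or `min_repeats` lowers false positives at the cost of possibly discarding a genuine tracker.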