A Network Measurement Architecture for Adaptive Applications
Abstract: In today's Internet, the characteristics of the network path between a pair of Internet hosts can span several orders of magnitude. Some hosts may communicate over high bandwidth, low latency, uncongested paths, while others communicate over much lower quality paths. Applications can cope with these differences by adapting to network changes: for example, choosing alternate representations of objects or streams or downloading objects from alternate locations. For applications to adapt most effectively, however, they must discover the condition of the network path before communicating with distant hosts in order to make appropriate adaptation decisions. Unfortunately, the ability to determine the quality of network paths is missing in today's suite of Internet services, and applications have no way to make informed adaptation decisions.
To address this limitation, we have developed a network measurement architecture called SPAND (Shared PAssive Network Performance Discovery) that enables a new class of adaptive networked applications. In SPAND, applications make passive application-specific measurements of the network and store the results of the measurements in a per-domain centralized repository of network performance information. Other applications retrieve this information from the repository -- thereby leveraging the shared experiences of all hosts in a domain -- and use it to predict future performance. Through SPAND, applications make more informed decisions about adaptation choices as they communicate with distant Internet hosts.
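The shared, passive repository described above can be illustrated with a minimal sketch. The class name `PerformanceRepository`, its methods, the fixed-size window, and the use of a median are all assumptions made for illustration; they are not SPAND's actual interface or averaging algorithm.

```python
from collections import defaultdict, deque
from statistics import median
from typing import Optional


class PerformanceRepository:
    """Toy per-domain store of passively observed throughput (hypothetical API)."""

    def __init__(self, window: int = 20) -> None:
        # Keep only the most recent `window` reports per destination host.
        self._reports = defaultdict(lambda: deque(maxlen=window))

    def report(self, dest: str, throughput_kbps: float) -> None:
        """Record a measurement that a local client observed during normal traffic."""
        self._reports[dest].append(throughput_kbps)

    def predict(self, dest: str) -> Optional[float]:
        """Return the median of recent reports, or None if nothing is known."""
        samples = self._reports.get(dest)
        if not samples:
            return None
        return median(samples)


repo = PerformanceRepository()
for observed in (400.0, 600.0, 500.0):  # reports from three different clients
    repo.report("example.org", observed)
prediction = repo.predict("example.org")
unknown = repo.predict("unknown.example")  # no client has talked to this host
```

A client that has never contacted `example.org` still obtains a prediction, because other hosts in the same domain have already reported their experience.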
In this thesis, we describe the design, implementation, and evaluation of the SPAND architecture. We describe and justify the design choices we make in SPAND and show the strengths and limitations of the architecture when compared to alternate design choices. We describe how SPAND is flexible and extensible: SPAND provides an Active Messages-like interface between SPAND clients and SPAND's repository of network performance information, through which applications define their own types of performance measurements and averaging algorithms. We show how measurement noise, the variation associated with the measurement of a particular network performance statistic, affects the granularity of application-level adaptation decisions. We also categorize and quantify measurement noise into three components: network noise (variations in the state of the network over small time scales), sharing noise (variations between the observed performance of nearby clients), and temporal noise (variations in performance over longer time scales).
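To make the extensibility idea concrete, the following sketch lets each application register its own measurement type together with its own averaging function. It models only the notion that a measurement's type name selects an application-supplied handler; the class `ExtensibleRepository`, its method names, and the metric names are hypothetical, and nothing here reflects SPAND's actual message format.

```python
from statistics import mean, median
from typing import Callable, Dict, List, Sequence

Aggregator = Callable[[Sequence[float]], float]


class ExtensibleRepository:
    """Repository whose averaging algorithm is chosen per measurement type."""

    def __init__(self) -> None:
        self._aggregators: Dict[str, Aggregator] = {}
        self._samples: Dict[str, List[float]] = {}

    def register_metric(self, metric: str, aggregator: Aggregator) -> None:
        """Install the application-supplied averaging function for a metric."""
        self._aggregators[metric] = aggregator
        self._samples.setdefault(metric, [])

    def add_sample(self, metric: str, value: float) -> None:
        """Store one measurement; the metric must have been registered first."""
        if metric not in self._aggregators:
            raise KeyError(f"unregistered metric: {metric}")
        self._samples[metric].append(value)

    def query(self, metric: str) -> float:
        """Apply the registered averaging function to the stored samples."""
        samples = self._samples[metric]
        if not samples:
            raise LookupError(f"no samples for metric: {metric}")
        return self._aggregators[metric](samples)


repo = ExtensibleRepository()
repo.register_metric("tcp_throughput_kbps", median)  # robust to outliers
repo.register_metric("rtt_ms", mean)
for value in (100.0, 300.0, 200.0):
    repo.add_sample("tcp_throughput_kbps", value)
for value in (10.0, 20.0):
    repo.add_sample("rtt_ms", value)
typical_throughput = repo.query("tcp_throughput_kbps")
typical_rtt = repo.query("rtt_ms")
```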
To illustrate these concepts, we then present two realizations of the architecture that measure network performance for different types of data transport: a generic bulk transfer data transport that measures TCP-specific performance and an HTTP-specific data transport that more closely measures the way in which web clients use multiple parallel TCP connections to complete web page transfers. Measurements of the bulk-transfer realization of SPAND show that SPAND works well at providing relevant, accurate responses to clients: in the steady state, SPAND can respond to 95% of performance queries with predictions, and 70% of the time, these predictions are within a factor of two of actual performance (a discrepancy equal to the network noise inherent in the state of the network).
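One way to compute an accuracy figure of the "within a factor of two" kind is sketched below. It assumes the criterion is symmetric, that is, a prediction counts as accurate when it is no less than half and no more than twice the actual value; the function names and the sample data are illustrative only and are not taken from the thesis.

```python
from typing import Iterable, Tuple


def within_factor(predicted: float, actual: float, factor: float = 2.0) -> bool:
    """True if predicted lies in [actual / factor, actual * factor]."""
    if predicted <= 0 or actual <= 0:
        raise ValueError("performance values must be positive")
    ratio = predicted / actual
    return 1.0 / factor <= ratio <= factor


def fraction_within_factor(
    pairs: Iterable[Tuple[float, float]], factor: float = 2.0
) -> float:
    """Fraction of (predicted, actual) pairs that agree within `factor`."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one (predicted, actual) pair")
    hits = sum(within_factor(p, a, factor) for p, a in pairs)
    return hits / len(pairs)


# Made-up (predicted, actual) throughput pairs in kbit/s.
sample_pairs = [(100.0, 150.0), (100.0, 250.0), (300.0, 150.0), (50.0, 200.0)]
accuracy = fraction_within_factor(sample_pairs)
```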
To validate the SPAND architecture, we built and evaluated two specific adaptive networked applications: SpandConneg, a suite of applications that use HTTP Content Negotiation to reduce client-side and server-side network bottlenecks, and LookingGlass, a WWW mirror selection tool. By using SpandConneg, web clients can fix download times by matching content fidelity to network conditions, and web servers can handle large numbers of clients by reducing document quality during periods of heavy load. LookingGlass presents a complete solution to the problem of replicating web content at multiple web sites, addressing the problems of transparently notifying web clients of mirrored content, providing a mechanism to disseminate mirror information in a distributed way between mirror locations, and providing algorithms for choosing mirror locations that take actual network performance into account.
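The two adaptation decisions described above, matching content fidelity to a download-time budget and picking a mirror by measured performance, can be sketched as follows. This is a simplified illustration under stated assumptions (sizes in kilobytes, predictions in kbit/s, a greedy choice on predicted throughput); the function names are hypothetical and the logic is not the actual SpandConneg or LookingGlass algorithm.

```python
from typing import Dict, Optional


def choose_variant(
    variant_sizes_kb: Dict[str, float],
    predicted_kbps: float,
    target_seconds: float,
) -> Optional[str]:
    """Pick the largest variant whose estimated download time fits the target.

    Falls back to the smallest variant when none fits, and returns None
    only when there are no variants at all.
    """
    if predicted_kbps <= 0:
        raise ValueError("predicted throughput must be positive")
    if not variant_sizes_kb:
        return None
    fitting = {
        name: size
        for name, size in variant_sizes_kb.items()
        if size * 8.0 / predicted_kbps <= target_seconds  # KB -> kbit
    }
    if not fitting:
        return min(variant_sizes_kb, key=variant_sizes_kb.get)
    return max(fitting, key=fitting.get)


def choose_mirror(predicted_kbps_by_mirror: Dict[str, float]) -> Optional[str]:
    """Pick the mirror with the highest predicted throughput, if any."""
    if not predicted_kbps_by_mirror:
        return None
    return max(predicted_kbps_by_mirror, key=predicted_kbps_by_mirror.get)


variants = {"full": 1000.0, "medium": 300.0, "thumbnail": 50.0}
chosen = choose_variant(variants, predicted_kbps=100.0, target_seconds=30.0)
mirror = choose_mirror({"mirror-a.example": 120.0, "mirror-b.example": 480.0})
```

With a predicted throughput of 100 kbit/s and a 30-second budget, the full variant would take an estimated 80 seconds and the medium variant 24 seconds, so the medium variant is chosen.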
Measurements of these applications show that SPAND dramatically improves the performance of adaptive networked applications. SpandConneg works well at both the client and server side of the network. Web clients that use SPAND to trade off document quality for download time can reduce the frequency of excessive user-visible (i.e., more than 30 seconds) download times from 35% to less than 10%, and reduce the median download time from 16 to 6 seconds. Web servers that use SPAND to handle an unexpected burst of clients can increase their throughput by as much as 450%. LookingGlass performs well despite the challenges in meeting the dual goals of collecting passive network performance measurements while maintaining good client response times. LookingGlass's application-level measurements and server selection algorithms lead to faster (by a factor of 40) web page downloads than alternate techniques such as relying on geographic location or routing metrics to make server selection decisions. More than 90% of the time, our technique allows clients to download mirrored web objects within 40% of the fastest possible download time.