Design and Evaluation of Multi-Protocol Communication on a Cluster of SMP's
Abstract: Modern computers have deep and increasingly complex data hierarchies. The task of managing the motion of data between the levels is of prime importance, particularly with regard to communication. It requires a combination of automatic control and application-specific knowledge. Automatic control handles the bulk of the work, applying heuristics grounded in general principles and system parameters. Application-specific knowledge allows a program to make more effective use of the system, integrating the application's needs with the capabilities of the architecture. Striking a proper balance between these two approaches is not easy, and effective abstractions are a necessary aid in finding appropriate solutions.
This thesis addresses the management of data motion in the context of a cluster of symmetric multiprocessors, or SMP's. The shift from uniprocessors to multiprocessors as the basic unit of cluster computing reflects the trend toward deeper hierarchies of data, extending the hierarchy in a cluster of workstations with an intermediate level -- the memory interconnect -- at which processors communicate and share physical resources. The resulting systems, known as Clumps, offer potentially superior performance to applications capable of exploiting the tight coupling within each SMP. For some applications, however, resource contention and other interactions degrade performance relative to a cluster of uniprocessors.
We address interprocess communication on a Clump through a uniform message-passing interface that exposes the performance of the underlying hardware. The interface transparently routes messages through the appropriate medium, providing the necessary automatic control to allow a programmer to obtain reasonable performance with minimal effort, yet provides the locality information necessary to support the incremental use of application-specific knowledge.
In constructing our uniform interface, we carefully engineer and tune a transport protocol for passing messages across a cache-coherent interconnect, introducing in the process a new concurrent queue algorithm that obtains good performance on both dedicated and multiprogrammed machines. We integrate this protocol with a similarly well-engineered network protocol to present a uniform interface to programmers. We expose the problems involved with coupling protocols of disparate speed and present a solution that dynamically tunes our communication layer to the underlying architecture. Using both applications and benchmarks derived from the message-passing literature, we measure the performance of our system and highlight the phenomena that counteract the advantages of faster communication. Through a model of shared communication resources, we explain these same phenomena analytically.
The communication layer developed in this thesis demonstrates the value of a uniform interface in abstracting a hierarchy below the level addressed by an application programmer. The concurrent queue algorithm illustrates a good approach to the development of concurrent data structures, backed up by a wealth of performance comparisons. The dynamic adaptation solution also proves very effective, enabling applications to address a general Clump architecture, from an SMP to a NOW, with only a single binary. Similar solutions might be used to address a range of other problems. Taken in part, the shared memory protocol is also a powerful building block for higher-level interfaces within an SMP.
Through the work described in this thesis, we develop an understanding of the issues for fast, user-level communication between processes in a Clump and for the broader problem of coupling levels of a hierarchical system within a single abstraction.