Interconnection Network Design Based on Packaging Considerations
Raghunath, Mandayam Thondanur
Technical Report Identifier: CSD-93-782
Abstract: An important problem in building large scale parallel computers is the design of the interconnection network. The network affects the performance, cost, scalability, and availability of the parallel computer. In this dissertation, we examine in detail the problem of interconnection network design for a large scale parallel machine. Our objective is to design networks that achieve a high performance for a low cost.
In general, the cost of a network is a complex function of a wide variety of parameters and is difficult to compute. We first develop a packaging model to characterize network costs. Our model is based on the fact that most large scale machines have to be packaged in a hierarchical fashion. We argue for a hierarchical design strategy, where the design of the network proceeds in levels corresponding to the levels of the packaging hierarchy. We evaluate several network designs using analysis and detailed simulations of random traffic. We identify families of networks (product of complete graphs, high-degree de Bruijn networks) that we believe to be useful for multi-level packaging technologies. Our results also indicate that making the networks denser at the lower levels of the packaging hierarchy has a significant positive impact on the overall performance, even when the higher levels use a sparser interconnect.
We also characterize network designs in terms of scalability, dividing network families into three broad categories of scalability based on the flexibility with which the networks can be scaled. The three categories illustrate the trade-offs between hardware costs and scaling flexibility. We provide quantitative evaluations of these trade-offs.
Many of the interesting networks, namely those that provide high performance for a low cost, also have the disadvantage of being vulnerable to faults because they have a single path between any pair of processors. We devise a scheme to tolerate faults in such networks. The scheme is simple to implement and allows the network to tolerate a large number of faults with a slight degradation in performance.