Query Optimization in Distributed Databases Through Load Balancing
Technical Report Identifier: CSD-86-296
Abstract: As technology advances, computing environments composed of large numbers of workstations and mainframes connected by a high bandwidth local area network become attractive. These systems typically support a large number of application classes, and the heterogeneity of their workload, coupled with the decentralization of the systems, can lead to load imbalances across the network. This work attempts to study the benefits of load balancing strategies in the context of a particular application, distributed database systems. It was felt that, by focusing on a specific area, the problem would become more tractable. The choice of database management systems can be justified not only by their intrinsic importance, but also by the adaptability of load balancing strategies to query optimization algorithms.
In order to determine whether load balancing strategies can indeed be adapted to current optimizers with a moderate amount of effort and to see whether the resulting performance benefits are sizable, both benchmarking and simulation experiments were carried out. The approach taken was first to construct a simple model in order to gain some insight into the problem. This was followed by some benchmarking experiments on a running systems, the R* distributed database at the IBM San Jose Research Laboratory. Finally, a model of distributed INGRES was constructed and validated by measurements of INGRES queries and of U.C. Berkeley's TCP/IP implementation. It was hoped that, by utilizing two different techniques, simulation and measurement, and by examining two very different distributed database systems, R* and distributed INGRES, the results of this thesis would be of both greater reliability and wider applicability.
The conclusions of this study are that query optimizers are relatively easy to modify in order to incorporate load balancing strategies, and that the increase in the running time of the algorithms is negligible. Furthermore, load balancing results in a sizable performance improvement even in environment with a moderate load imbalance. It should be pointed out that the results of this work are important not only from the viewpoint of load balancing studies, but also provide useful insights into the construction of distributed database systems.