Performance Studies of Dynamic Load Balancing in Distributed Systems
Technical Report Identifier: CSD-87-376
Abstract: Distributed systems are often characterized by uneven loads on hosts and other resources. In this thesis, the problems concerning dynamic load balancing in loosely-coupled distributed systems are studied using trace-driven simulation, implementation, and measurement. Information about job CPU and I/O demands is collected from three production systems and used as input to a simulator that includes a representative CPU scheduling policy and considers the message exchange and job transfer costs explicitly. A prototype load balancer is implemented in the Berkeley UNIX and Sun/UNIX environments, and the results of a large number of measurement experiments performed on six workstations are presented.
The quality of two families of load indices, one based on resource queue length, the other on resource utilization, is evaluated in the context of dynamic load balancing. The performances of seven algorithms using different load information exchange and job placement strategies are compared. The factors that affect load balancing performance, and the impacts of load balancing on individual hosts and on each type of job are also quantitatively investigated.
Load balancing is found to reduce significantly the mean and standard deviation of job response times, especially under heavy and/or unbalanced workload. The performance is strongly dependent upon the load index; queue-length-based indices perform better. Algorithms based on periodic or non-periodic load information exchange perform similarly. Among the periodic algorithms, the centralized ones cut down the overhead, hence scale better. The reduction of the mean response time increases with the number of hosts, but levels off beyond a few tens of hosts. Load balancing is still very effective when a large portion of the workload is immobile. All hosts, even those with light loads, benefit from load balancing. Similarly, all types of jobs see improvements in their response times, with larger jobs benefiting more. System instability is possible, but can be easily avoided. Many of the above results are likely to be of general applicability due to the excellent agreement among the simulation and measurement findings. Our implementation work shows that transparent, flexible load balancing can be achieved at low cost, without modifying the system kernel or the application programs.