EECS technical reports

Conditions of Use

Archive Home Page

Parallel Layout Engines: Synthesis and Optimization of Tree Traversals

Meyerovich, Leo
Technical Report Identifier: EECS-2013-242

Abstract: Mobile web browsers and data visualization tools require a performance boost. Parallelization poses an opportunity because commodity computers feature fast and energy-efficient multicore, subword-SIMD, and GPU hardware. However, layout engines in both browsers and visualizations have resisted parallelization thus far. This thesis introduces techniques for parallel computing over trees and applies them to generating layout engines. Our solution is twofold. First, we show how to specify layout languages in the attribute grammar model of constraints over trees. Second, we address outstanding challenges in parallelizing attribute grammars and computations over trees. Our resulting attribute grammar compiler generates layout engines for various layout languages and parallel hardware. We provide several individual contributions: * We specify the functional and parallel behavior of common layout language primitives. * We present a language for scheduling parallel tree traversals and a static verifier for ensuring a schedule will solve the attributes of an arbitrary layout in a safe order. * We introduce a parallel programming model where programmers may partially specify the parallel schedule and call an optimizing synthesizer to complete the schedule. * We design a scheduling algorithm that is fast and modular over scheduling constructs. * We optimize tree traversals for SIMD hardware by automatically staging dynamic memory allocation and clustering diverging tasks. * We optimize parallel tree traversals for MIMD architectures through a load-balancing heuristic that approximates work stealing. To evaluate our approach, we present two case studies. First, we generated the first parallel webpage layout engine that can for the most part render complex sites such as Wikipedia. Second, we generated interactive data visualizations that support up to 1,000,000 data points in real-time. Both case studies have been further validated industrially.