Loading…

Improving data locality with loop transformations

In the past decade, processor speed has become significantly faster than memory speed. Small, fast cache memories are designed to overcome this discrepancy, but they are only effective when programs exhibit data locality. In the this article, we present compiler optimizations to improve data localit...

Full description

Saved in:

Bibliographic Details
Published in:	ACM transactions on programming languages and systems 1996-07, Vol.18 (4), p.424-453
Main Authors:	McKinley, Kathryn S., Carr, Steve, Tseng, Chau-Wen
Format:	Article
Language:	English
Subjects:	Applied sciences Compilers Computer science control theory systems Exact sciences and technology Programming languages Software Software and its engineering Software notations and tools
Citations:	Items that this one cites Items that cite this one
Online Access:	Get full text
Tags:	Add Tag No Tags, Be the first to tag this record!

Description
Summary:	In the past decade, processor speed has become significantly faster than memory speed. Small, fast cache memories are designed to overcome this discrepancy, but they are only effective when programs exhibit data locality. In the this article, we present compiler optimizations to improve data locality based on a simple yet accurate cost model. The model computes both temporal and spatial reuse of cache lines to find desirable loop organizations. The cost model drives the application of compound transformations consisting of loop permutation, loop fusion, loop distribution, and loop reversal. To validate our optimization strategy, we implemented our algorithms and ran experiments on a large collection of scientific programs and kernels. Experiments illustrate that for kernels our model and algorithm can select and achieve the best loop structure for a nest. For over 30 complete applications, we executed the original and transformed versions and simulated cache hit rates. We collected statistics about the inherent characteristics of these programs and our ability to improve their data locality. To our knowledge, these studies are the first of such breadth and depth. We found performance improvements were difficult to achieve bacause benchmark programs typically have high hit rates even for small data caches; however, our optimizations significanty improved several programs.
ISSN:	0164-0925 1558-4593
DOI:	10.1145/233561.233564