Scalable locality
Encyclopedia
Software is said to exhibit scalable locality if it can continue to make use of processor
Central processing unit
The central processing unit is the portion of a computer system that carries out the instructions of a computer program, to perform the basic arithmetical, logical, and input/output operations of the system. The CPU plays a role somewhat analogous to the brain in the computer. The term has been in...

s that out-pace their memory systems, to solve ever larger problems.
This term is a high-performance uniprocessor analog of the use of scalable parallelism
Scalable parallelism
Software is said to exhibit scalable parallelism if it can make use of additional processors to solve larger problems,i.e. this term refers to software for which Gustafson's law holds.Consider a program whose execution time is dominated by one or more loops,...

 to refer to software for which increasing numbers of processors can be employed for larger problems.

Consider the memory usage patterns of the following loop nest (an iterative two-dimensional stencil computation
Stencil (numerical analysis)
In mathematics, especially the areas of numerical analysis concentrating on the numerical solution of partial differential equations, a stencil is a geometric arrangement of a nodal group that relate to the point of interest by using a numerical approximation routine. Stencils are the basis for...

):

for t := 0 to T do
for i := 1 to N-1 do
for j := 1 to N-1 do
new(i,j) := (A(i-1,j) + A(i,j-1) + A(i,j) + A(i,j+1) + A(i+1,j)) * .2
end
end
for i := 1 to N-1 do
for j := 1 to N-1 do
A(i,j) := new(i,j)
end
end
end

The entire loop nest touches about 2*N**2 array elements, and performs about 5*T*N**2 floating-point operations.
Thus, the overall compute balance (ratio of floating-point computations to floating-point memory cells used) of this entire loop nest is about 5T/2.
When the compute balance is a function of problem size, as it is here, the code is said to have scalable compute balance.
Here, we could achieve any compute balance we desire by simply choosing a large enough T.

However, when N is large, this code will still not exhibit good cache reuse, due to poor locality of reference:
by the time new(1,1) is needed in the second assignment, or the second time step's execution of the first assignment,
the cache line holding new(1,1) will have been overwritten with some other part of one of the arrays.

Tiling
Loop optimization
In compiler theory, loop optimization plays an important role in improving cache performance, making effective use of parallel processing capabilities, and reducing overheads associated with executing loops. Most execution time of a scientific program is spent on loops...

 of the first i/j loop nest can improve cache performance,
but only by a limited factor, since that nest has compute balance of about 5/2.
To produce a very high degree of locality, for example 500 (to run this code efficiently with an array that will not fit in RAM and is relegated to virtual memory), we must re-use values across time steps.

Optimization across time steps has been explored in a number of research compilers;
see work by Wonnacott, by Song and Li, or by Sadayappan et al. for details of some approaches to time-tiling.
Wonnacott demonstrated that time tiling could be used to optimize for out-of-core data sets;
in principle, any of these approaches should be able to achieve arbitrarily high memory locality without requiring that the entire array fit in cache (the cache requirement does, however, grow with the required locality).
The multiprocessor techniques cited above should, in principle, simultaneously produce scalable locality and scalable parallelism
Scalable parallelism
Software is said to exhibit scalable parallelism if it can make use of additional processors to solve larger problems,i.e. this term refers to software for which Gustafson's law holds.Consider a program whose execution time is dominated by one or more loops,...

.
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK