Algorithmic adaptations are required if anticipated exascale hardware is to be used near its potential for upstream applications, since the existing code base has been engineered to minimize floating point operations. Programmers must now minimize synchronizations, memory usage, and memory transfers, while extra flops on locally cached data are almost “free”. High concurrency requires greater freedom to redistribute data, while power-efficient design of the individual cores will likely require greater fault tolerance from algorithms to relieve the hardware. Stencil-operation-intensive hyperbolic solvers present different opportunities for improving data locality in different regimes of dimension, number of components, discretization order, stencil structure, coefficient characteristics, and hardware characteristics. Today’s elliptic solvers exploit frequent global synchronizations, ultimately reflecting the global Green’s function for the Laplacian, yet execute few flops to cover these latencies. After decades of algorithm refinement during a period of programming model stability, new programming models and algorithms must be developed simultaneously. In this presentation, we briefly recap the architectural constraints and roadmap, highlight ongoing work at KAUST, and outline future directions.

