Timing Analysis of General-Purpose Graphics Processing Uni ts for Real-Time Systems: Models and Analyses
Ref: CISTER-TR-160611 Publication Date: 20, Apr, 2016
Timing Analysis of General-Purpose Graphics Processing Uni ts for Real-Time Systems: Models and Analyses
Ref: CISTER-TR-160611 Publication Date: 20, Apr, 2016Abstract:
Graphics processors were originally develop ed for rendering graphics but have evolved
towards being an architecture for general-purpose computations. These processors
are well-suited for massively parallel computational problems because of the ability to efficiently manage a great numb er of lightweight threads competing for the
computational resources of the processor. To day, Graphics Processing Units (GPU s)
are widely used to unload Central Processing Units (CPUs), liberate other resources
of a given computer system, and provide an alternative to multiprocessor computers as a means of processing computationally expensive parallel tasks. The recent
trend of utilizing GPUs in embedded systems necessitates developing timing analysis
approaches for finding bounds on the execution time of GPU-threads because the
approaches develop ed for CPU timing analysis are not applicable. The reason is that
we are not interested in how long it takes for any given GPU thread to complete, but
rather how long it takes for all of the GPU threads to complete in the context of their
competition for the functional units.
We develop ed both theoretical and practical approaches for GPU timing analysis that could provide exact values and tight upper bounds, marginally optimistic
lower bounds or probabilistic upper bounds on the worst-case temp oral behavior of
GPU processing. We call these approaches optimization-based, metaheuristic-based
and statistical measurement-based respectively. We formulate them sub ject to the
hardware features, tractability constraints and some simplifying assumptions.
First, we proposed a model of a single streaming multiprocessor – a computationally independent module of a GPU. The optimization-based and metaheuristic-based
approaches are formulated in the context of that theoretical model and related assumptions. The measurement-based approach is targeting the real GPU hardware
and is ready for practical us age.
The optimization-based approach is built up on a simple but very pessimistic technique for finding an upper bound on the worst-case makespan – the longest possible
time interval between the moment when the “earliest” GPU thread starts its execution, and the moment when the “latest” thread finishes. The outcome of this
technique is used for the formulation of a combinatorial optimization problem for
finding an exact value of the worst-case execution requirement. Addressing the issue
of tractability, we also proposed a marginally pessimistic estimation technique for
finding a tight upper-bound on the worst-case makespan. This approach was implemented in a timing analysis software tool applicable to the problem instance under
consideration subject to the configuration of the streaming multiprocessor.
Pursuing an objective of discovering computationally fast approaches we addressed
the problem of finding the worst-case makespan from the metaheuristic view point. We
experimentally demonstrated that the metaheuristic-based approach is able to find a
tight lower bound and in combination with the optimization-based approach proposes
a complete framework for bounding the respective solution from both, the top and the
bottom. This asp ect is of paramount importance for the cases when an exact worst-case makespan of the problem under consideration cannot b e tractably computed.
On the other hand, the simplicity, flexibility and ability for massive parallelization of
the metaheuristic-based approach determine a potential of its us age for soft real-time
systems.
Aiming to bring our research closer to the industry, in order to overcome some
limiting assumptions of memory subsystem, we addressed the problem of GPU timing
analysis from the probabilistic and measurement-based perspectives. Our statistical
measurement-based approach includes a marginally invasive technique for obtaining
the GPU execution time measurements. For analyzing these measurements, the approach introduced a probabilistic characterization of the worst-case temp oral behavior
of GPU applications. We formulated our approach based on a solid statistical background of Extreme Value Theory (EVT) and the “Block Maxima” paradigm. The applicability of EVT was extended to less constraining hypotheses than independence.
We also provided a way for obtaining accurate estimates on the worst-case execution
requirement for the desired confidence level.
Document:
PhD Thesis, Faculdade de Engenharia da Universidade do Porto.
Porto.
Record Date: 28, Jun, 2016