Login

Timing Analysis of General-Purpose Graphics Processing Uni ts for Real-Time Systems: Models and Analyses
Ref: CISTER-TR-160611       Publication Date: 20, Apr, 2016

Timing Analysis of General-Purpose Graphics Processing Uni ts for Real-Time Systems: Models and Analyses

Ref: CISTER-TR-160611       Publication Date: 20, Apr, 2016

Abstract:
Graphics processors were originally develop ed for rendering graphics but have evolved towards being an architecture for general-purpose computations. These processors are well-suited for massively parallel computational problems because of the ability to efficiently manage a great numb er of lightweight threads competing for the computational resources of the processor. To day, Graphics Processing Units (GPU s) are widely used to unload Central Processing Units (CPUs), liberate other resources of a given computer system, and provide an alternative to multiprocessor computers as a means of processing computationally expensive parallel tasks. The recent trend of utilizing GPUs in embedded systems necessitates developing timing analysis approaches for finding bounds on the execution time of GPU-threads because the approaches develop ed for CPU timing analysis are not applicable. The reason is that we are not interested in how long it takes for any given GPU thread to complete, but rather how long it takes for all of the GPU threads to complete in the context of their competition for the functional units. We develop ed both theoretical and practical approaches for GPU timing analysis that could provide exact values and tight upper bounds, marginally optimistic lower bounds or probabilistic upper bounds on the worst-case temp oral behavior of GPU processing. We call these approaches optimization-based, metaheuristic-based and statistical measurement-based respectively. We formulate them sub ject to the hardware features, tractability constraints and some simplifying assumptions. First, we proposed a model of a single streaming multiprocessor – a computationally independent module of a GPU. The optimization-based and metaheuristic-based approaches are formulated in the context of that theoretical model and related assumptions. The measurement-based approach is targeting the real GPU hardware and is ready for practical us age. The optimization-based approach is built up on a simple but very pessimistic technique for finding an upper bound on the worst-case makespan – the longest possible time interval between the moment when the “earliest” GPU thread starts its execution, and the moment when the “latest” thread finishes. The outcome of this technique is used for the formulation of a combinatorial optimization problem for finding an exact value of the worst-case execution requirement. Addressing the issue of tractability, we also proposed a marginally pessimistic estimation technique for finding a tight upper-bound on the worst-case makespan. This approach was implemented in a timing analysis software tool applicable to the problem instance under consideration subject to the configuration of the streaming multiprocessor. Pursuing an objective of discovering computationally fast approaches we addressed the problem of finding the worst-case makespan from the metaheuristic view point. We experimentally demonstrated that the metaheuristic-based approach is able to find a tight lower bound and in combination with the optimization-based approach proposes a complete framework for bounding the respective solution from both, the top and the bottom. This asp ect is of paramount importance for the cases when an exact worst-case makespan of the problem under consideration cannot b e tractably computed. On the other hand, the simplicity, flexibility and ability for massive parallelization of the metaheuristic-based approach determine a potential of its us age for soft real-time systems. Aiming to bring our research closer to the industry, in order to overcome some limiting assumptions of memory subsystem, we addressed the problem of GPU timing analysis from the probabilistic and measurement-based perspectives. Our statistical measurement-based approach includes a marginally invasive technique for obtaining the GPU execution time measurements. For analyzing these measurements, the approach introduced a probabilistic characterization of the worst-case temp oral behavior of GPU applications. We formulated our approach based on a solid statistical background of Extreme Value Theory (EVT) and the “Block Maxima” paradigm. The applicability of EVT was extended to less constraining hypotheses than independence. We also provided a way for obtaining accurate estimates on the worst-case execution requirement for the desired confidence level.

Authors:
Kostiantyn Berezovskyi


PhD Thesis, Faculdade de Engenharia da Universidade do Porto.
Porto.



Record Date: 28, Jun, 2016