Many-Core Platforms in the Real-Time Embedded Computing Domain
Ref: CISTER-TR-150603 Publication Date: 24, Apr, 2015
Abstract:
Over the past few decades, technological advancements have made our lives increasingly permeated by and dependent on embedded systems. At present, these devices account for more than 98% of all produced computing systems, with applications that span a wide range of areas, from medicine to avionics. Some embedded systems interact with the physical environment and have to guarantee not only that a certain action will be performed correctly, but also that the action will complete within a certain time. These devices are called real-time embedded systems, and some notable examples are medical pacemakers, airbags in cars and autopilots in airplanes.
The process of analysing the temporal behaviour of a real-time embedded system is called
real-time analysis. In many cases, the purpose of the analysis is to derive guarantees that a device
will perform its functions correctly, while at the same time meeting all timing requirements.
Real-time analysis is mostly performed at design time, so its efficiency depends heavily on the
predictability of the entire system, and any non-deterministic aspect of the system
behaviour has to be accounted for in the analysis with a certain degree of pessimism. A pessimistic
analysis may cause a significant resource over-provisioning in the design phase, and consequently
lead to a severe underutilisation of available resources at runtime. Therefore, reducing the analysis
pessimism is one of the ever-present objectives in the real-time embedded computing domain.
The first real-time embedded systems were predominantly single-core devices with limited
sets of functionalities. However, constantly increasing demands for more advanced and sophisticated
functionalities required more powerful computational devices. When faced with the same
challenge, other computing areas (e.g. general-purpose or high-performance computing) opted
for platforms consisting of several cores (multi-cores) or more than a dozen cores (many-cores).
It comes as no surprise that the same trends, albeit with a delay, are noticeable in the
evolution of real-time embedded systems, where many-core platforms represent the new frontier
technology.
Besides making it possible to implement more advanced functionalities, many-core platforms
offer other benefits as well. For instance, multiple functionalities that were previously
implemented on a set of single-core devices can be integrated within fewer many-core
platforms with significant design-cost reductions. Moreover, the abundance of available cores allows
efficient thermal and power management strategies to be implemented by deliberately performing
temporary shutdowns of idle cores. At the same time, the existence of idle cores, which can be
used if necessary, makes these devices more resilient to hardware failures.
Yet, despite the aforementioned benefits, the integration of many-cores into the real-time embedded
domain is a major challenge. The most notable reasons are (i) increasingly complex designs
of hardware components, which promote performance, often at the expense of predictability,
and (ii) more significant and hard-to-analyse contention patterns for accesses to shared resources.
Both factors may contribute to non-deterministic system behaviour and, as explained above,
every non-deterministic aspect of the system behaviour has to be accounted for in the real-time
analysis with a certain degree of pessimism.
In this dissertation, the focus is on the analysis of real-time embedded systems deployed on
many-core platforms. Specifically, a comprehensive collection of techniques and design choices
is presented, with the common objective of making many-cores more amenable to real-time
analysis, and consequently more suitable for and applicable to the real-time embedded domain. The
proposed methods achieve this end in several ways: (i) by extending state-of-the-art approaches
in order to reduce the analysis pessimism, (ii) by exploiting novel hardware features, as well as
enforcing constraints that lead to more deterministic and analysable system behaviour, and
(iii) by elaborating on promising OS and workload paradigms, which have not been previously
considered in the real-time embedded computing domain.
The contributions of this dissertation can be classified into two groups. In the first set of
contributions, the focus is on the interconnect medium, which is one of the most complex-to-analyse
resources in many-core platforms. Initially, the target interconnect is a network-on-chip (NoC)
with a 2-D mesh topology, which utilises the wormhole switching mechanism and the XY routing
technique. For such a generic model, which is present in most contemporary many-cores,
a novel worst-case communication delay analysis is proposed, and subsequently compared with
the state-of-the-art method. Then, assuming additional hardware support in the form of virtual
channels, improvements over the state-of-the-art approaches are proposed, which not only reduce
the analysis pessimism, but also significantly reduce the requirements for hardware resources.
Finally, a novel arbitration policy for NoC routers is proposed.
In the second set of contributions, the focus is on a novel paradigm in the real-time embedded
domain, called the Limited Migrative Model. This model is inspired by the latest trends in
high-performance and general-purpose computing. First, the model is introduced and the cost of
maintaining it is analytically estimated, in terms of both computational and interconnect resources,
where, for the latter aspect, the findings from the first set of contributions are used (see the previous
paragraph). Then, three aspects of the application workload are studied, namely: (i) communication
requirements, (ii) memory requirements, and (iii) computation requirements. The
first aspect is addressed by imposing several constraints, which make the communication patterns
more predictable, and subsequently allow a communication delay analysis to be derived. Moreover,
the workload assignment to computational resources is investigated, but only from the communication
perspective, with the objective of spatially distributing the workload in such a way that all
timing constraints posed on communication delays are met. Then, the focus is shifted towards the
memory requirements, and a set of analysis techniques is proposed, which can be used to check
whether the memory traffic requirements are also fulfilled. In the final part, the computation requirements
of the application workload are studied. However, for this aspect only a coarse-grained
analysis with several simplifying assumptions is presented. The proposed method represents an
initial step towards a complete analysis of the computation requirements. Subsequently,
assuming this initial analysis, the problem of workload assignment to computational resources
is revisited, but this time with an orthogonal objective, which is to ensure that the computational
requirements of the workload are fulfilled.
The findings suggest that the first set of contributions significantly improves over the state-of-the-art
methods in the real-time analysis of interconnects. The improvements manifest as
reduced analysis pessimism, as well as reduced hardware requirements. Both of these aspects
are essential for mitigating the resource over-provisioning effects when designing a new system.
Additionally, the findings suggest that the Limited Migrative Model has considerable potential, and
represents a promising step towards the application of many-core platforms in the real-time
embedded computing domain.
Document:
PhD Thesis, Faculdade de Engenharia, Universidade do Porto.
Porto.
Record Date: 12, Jun, 2015