#### A Multicore Processor Platform for Energy and Throughput Aware Applications

<u>Ishfaq Hussain</u>, Yasir Qadri, Ayaz Ahmed, Nadia N. Qadri (CISTER Periodic Seminar Series) 03-May-2018



Research Center in Real-Time & Embedded Computing Systems







#### Sequence Of Presentation

- Introduction
- Need For Energy Optimization
- Problem Statement
- Proposed Solution
- Simulation Setup
- Results
- Conclusion
- Research Contribution
- Question & Answer



#### Introduction

- The project: "A Multicore Reconfigurable Processor Platform for Energy and Throughput Aware Applications"









CISTER Research Centre, Porto, PORTUGAL

03/05/2018

#### **Project Team Overview**

- Members were divided into two groups
  - Hardware Development
  - Algorithms Development
- Each group worked on its own team-specific issues.



Ishfaq Hussain

### Need For Energy Efficiency

Energy-efficient system design matters in the following areas:

- Portable Devices
  - Examples: Mobile Phones, PDAs, Laptops
- High-end Desktop/Server Computing
- Green Computing





### Techniques to Reduce Energy Consumption

- Logic level
  - Clock Gating
  - Power Gating
- Micro-architectural level
  - Cache memories
  - Pipelining
  - Buses
- System level
  - Dynamic Voltage-Frequency Scaling (DVFS)
  - Adding Parallelism
  - Reconfiguration



### Reconfigurable Architecture

Contemporary Processor Architectures

- Designed for overall average performance
- Offer little flexibility for reconfiguration
- Limited application of energy-aware throughput management

Reconfigurable Processors

- A recent contender for energy/performance trade-offs
- Need to go beyond the traditional DFS and core switching approaches



### The System Architecture

- A multicore architecture with an Artificial Intelligence algorithm based reconfiguration engine
- Input Parameters
  - Energy Consumption
  - Throughput
  - Miss Rate
- Reconfigurable Parameters
  - Number of Cores
  - Operating Frequency/Voltage
  - L1/L2 Cache Size
  - L1/L2 Cache Associativity


| Parameter                    | Value                       |
|------------------------------|-----------------------------|
| Processor Type               | Intel x86                   |
| Number of Cores              | 16                          |
| Operating Frequencies        | [16, 20, 25, 33] MHz        |
| Operating Voltages           | [2, 2.2, 2.4, 2.7] V        |
| Energy Consumption per Cycle | [13.1, 15.4, 18.7, 22.9] nJ |
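The operating points in the table can be turned into a rough per-core power figure by multiplying energy per cycle by clock frequency. The sketch below does only that arithmetic. It assumes that the four frequencies, voltages and energy values pair up in the order listed, and that a core is busy on every cycle; neither assumption is stated on the slide, and the function names are illustrative.

```python
# Operating points copied from the platform table; pairing by position is an assumption.
FREQ_MHZ = [16, 20, 25, 33]
VOLTAGE_V = [2.0, 2.2, 2.4, 2.7]
ENERGY_PER_CYCLE_NJ = [13.1, 15.4, 18.7, 22.9]


def core_power_watts(freq_mhz: float, energy_nj: float) -> float:
    """Average power of one fully busy core: energy per cycle times cycles per second."""
    return (energy_nj * 1e-9) * (freq_mhz * 1e6)


def operating_points():
    """Return a list of (frequency in MHz, voltage in V, power in W) tuples."""
    return [
        (f, v, core_power_watts(f, e))
        for f, v, e in zip(FREQ_MHZ, VOLTAGE_V, ENERGY_PER_CYCLE_NJ)
    ]


if __name__ == "__main__":
    for f, v, p in operating_points():
        print(f"{f} MHz @ {v} V: {p:.4f} W per active core")
```

Under these assumptions the lowest operating point draws roughly 0.21 W per active core and the highest roughly 0.76 W, which illustrates why both frequency and core count are worth exposing to the reconfiguration engine.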



### **Optimization Algorithm**

- Fuzzy Logic Type 1
  - Mamdani
  - Sugeno
- Fuzzy Logic Type 2
- Ant Colony Optimization
- Genetic Algorithm
- Estimation of Distribution Algorithm



#### Comparison with state of the art

| Parameter / Architecture | Intel IA-7  | NVIDIA | RENT Cache<sup>1</sup> | RHC<sup>2</sup> | RAMPSoC<sup>3</sup> | ACODSEE |
|--------------------------|-------------|--------|------------------------|-----------------|---------------------|---------|
| DFS                      | Y           | N      | N                      | N               | Y                   | Y       |
| Core Switching           | Y           | N      | N                      | N               | Y                   | Y       |
| Cache Associativity      | N           | N      | Y                      | Y               | N                   | Y       |
| Cache Resizing           | N           | Y      | Y                      | Y               | N                   | Y       |
| Energy Efficiency Scheme | Proprietary | None   | None                   | None            | None                | ACO     |

- RENT: Reconfigurable Energy Efficient Near Threshold Cache Architectures
- RHC: Dynamically Reconfigurable Hybrid Cache
- RAMPSoC: Runtime adaptive multi-processor system-on-chip
- ACODSEE: Ant colony optimization based design space exploration engine






### **Reconfiguration Engine**






#### What is ACO?



#### What is ACO? (continued): Probabilistic Selection

$$p_{c}(u \mid q) = \begin{cases} \dfrac{\left[\tau_{qu}(t)\right]^{\alpha} \left[\eta_{qu}\right]^{\beta}}{\sum_{k \in \text{allowed}} \left[\tau_{qk}(t)\right]^{\alpha} \left[\eta_{qk}\right]^{\beta}} & \text{if } u \in \text{allowed} \\ 0 & \text{otherwise} \end{cases}$$

• Local and Global Pheromone Deposit

Equations: piecewise definitions of the local deposit $\Delta\tau_{qu}^{Local}$ and the global deposit $\Delta\tau_{qu}^{Global}$, each with $Q$ in the numerator.

• Pheromone Update

$$\tau_{qu}(t+1) = (1-\rho)\,\tau_{qu}(t) + \Delta\tau_{qu}^{Local} + \Delta\tau_{qu}^{Global}$$
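The selection rule and the pheromone update can be sketched in a few lines of Python. This is a minimal illustration, not the ACODSEE implementation: pheromone $\tau$ and heuristic $\eta$ are stored as nested dictionaries, the function names and the default values of `alpha` and `beta` are hypothetical, and the local and global deposits are passed in as plain inputs because their exact definitions are not legible on the slide. All weights are assumed to be strictly positive.

```python
import random


def selection_probabilities(tau, eta, q, allowed, alpha=1.0, beta=2.0):
    """p(u|q) is proportional to tau[q][u]**alpha * eta[q][u]**beta over allowed u."""
    weights = {u: (tau[q][u] ** alpha) * (eta[q][u] ** beta) for u in allowed}
    total = sum(weights.values())
    return {u: w / total for u, w in weights.items()}


def choose_next(tau, eta, q, allowed, rng, alpha=1.0, beta=2.0):
    """Roulette-wheel choice of the next node u from the allowed set."""
    probs = selection_probabilities(tau, eta, q, allowed, alpha, beta)
    nodes = list(probs)
    return rng.choices(nodes, weights=[probs[u] for u in nodes], k=1)[0]


def update_pheromone(tau, rho, delta_local, delta_global):
    """Apply tau <- (1 - rho) * tau + local deposit + global deposit, edge by edge."""
    for q in tau:
        for u in tau[q]:
            tau[q][u] = ((1.0 - rho) * tau[q][u]
                         + delta_local.get((q, u), 0.0)
                         + delta_global.get((q, u), 0.0))
    return tau


if __name__ == "__main__":
    # Toy graph: from node 0 an ant may move to node 1 or node 2.
    tau = {0: {1: 1.0, 2: 1.0}}
    eta = {0: {1: 1.0, 2: 2.0}}
    print(selection_probabilities(tau, eta, 0, [1, 2]))
    print(choose_next(tau, eta, 0, [1, 2], random.Random(0)))
    # Hypothetical deposit on the edge (0, 2) only.
    print(update_pheromone(tau, rho=0.1, delta_local={}, delta_global={(0, 2): 0.5}))
```

In a design space exploration setting, a node would correspond to one value of a reconfigurable parameter (for example a core count or a cache size), so a complete ant path encodes one candidate configuration.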

#### **Exploration Engine**

#### Input Space

#### **Solution Space**

E.C $\rightarrow$ Energy Consumption; Th $\rightarrow$ Throughput

### **Experimental Platform**






#### Simulation setup

The basic modules of the simulation scheme are:

- **Exploration Tools**
  - MATLAB
  - M3 Explorer
- **Simulation Setup**
  - MARSSx86
  - SESC Simulator
  - Ubuntu 12.04
  - SPLASH-2
  - CACTI

Mathematical Model [7]



#### **Results and Analysis**






| Iteration | Cores | Operating Frequency (MHz) | L1 Cache Size (KB) | Normalized Energy Consumption | Normalized Throughput |
|-----------|-------|---------------------------|--------------------|-------------------------------|-----------------------|
| Default   | 16    | 33                        | 256                | 1.00                          | 1.00                  |
| 1         | 4     | 25                        | 128                | 0.31                          | 0.7172                |
| 3         | 9     | 25                        | 8                  | 0.51                          | 0.64727               |
| 5         | 11    | 25                        | 16                 | 0.55                          | 0.72618               |
| 7         | 3     | 33                        | 64                 | 0.23                          | 0.93173               |
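The rows of this table can be compared on a single figure of merit with a short calculation. The sketch below derives the energy saving, the throughput loss and a normalized energy-delay product for each row. It assumes that normalized delay is the reciprocal of normalized throughput, so that EDP is energy divided by throughput; the slides do not state how EDP was computed, so this definition and the helper names are illustrative only.

```python
# (iteration, cores, frequency MHz, L1 KB, normalized energy, normalized throughput)
ROWS = [
    ("Default", 16, 33, 256, 1.00, 1.00),
    ("1", 4, 25, 128, 0.31, 0.7172),
    ("3", 9, 25, 8, 0.51, 0.64727),
    ("5", 11, 25, 16, 0.55, 0.72618),
    ("7", 3, 33, 64, 0.23, 0.93173),
]


def summarize(rows):
    """Map each iteration label to its saving, loss and assumed normalized EDP."""
    summary = {}
    for label, *_config, energy, throughput in rows:
        summary[label] = {
            "energy_saving": 1.0 - energy,
            "throughput_loss": 1.0 - throughput,
            # Assumption: normalized delay = 1 / normalized throughput.
            "edp": energy / throughput,
        }
    return summary


def best_by_edp(rows):
    """Return the iteration label with the lowest assumed normalized EDP."""
    summary = summarize(rows)
    return min(summary, key=lambda label: summary[label]["edp"])


if __name__ == "__main__":
    for label, s in summarize(ROWS).items():
        print(f"{label:>7}: saving={s['energy_saving']:.2%} "
              f"loss={s['throughput_loss']:.2%} edp={s['edp']:.3f}")
    print("lowest EDP at iteration", best_by_edp(ROWS))
```

With this definition, iteration 7 (3 cores, 33 MHz, 64 KB L1) has the lowest EDP of the listed rows, combining a 77% energy saving with a throughput loss of about 7%.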


#### Impact of ACO-based DSE engine on normalized energy consumption

Figure: normalized energy consumption versus number of iterations.

#### Impact of ACO-based DSE engine on normalized throughput

Figure: normalized throughput versus number of iterations.





#### Impact of ACO based DSE engine on EDP




#### Average reduction in energy-delay product (EDP) of all benchmarks

Figure: average reduction in EDP across the benchmarks (axis: Reduction in EDP).

### Conclusion

- The design space explored by ACO is validated using various SPLASH-2 benchmarks. Simulation results show that, on average, energy consumption is reduced by 77% at the cost of only a 7% reduction in throughput.
- It can therefore be concluded that the proposed ACODSEE successfully proposes energy- and throughput-efficient solutions for a multicore architecture.






#### **Research Contribution**

- Hussain, Ishfaq, et al. "Ant Colony Optimization for multicore re-configurable architecture." *AI Communications* 29.5 (2016): 595-606.
- Hussain, Ishfaq, et al. "NSGA-II-Based Design Space Exploration for Energy and Throughput Aware Multicore Architectures." *Cybernetics and Systems* 48.6-7 (2017): 536-550.






## Thank you






### **Questions?**




