# The BubbleWrap Many-Core: Popping Cores for Sequential Acceleration

Ulya Karpuzcu, Brian Greskamp, Josep Torrellas, University of Illinois

http://iacoma.cs.uiuc.edu/





# The BubbleWrap Many-Core

 Exploit dormant cores to accelerate sequential sections at the cost of a shorter per-core service life



- Base: A homogeneous many-core
- Throughput Cores
  - Most energy-efficient cores
  - Run parallel sections
  - Operate at nominal V/f
- Expendable Cores
  - Run sequential sections
  - Operate at elevated V/f
  - Discarded early due to shorter service life (Popped like BubbleWrap)



# **Core Aging**

- Manifestation: Progressive slow-down in logic as the core is being used
- Main contributor: Bias Temperature Instability (BTI)

  - Aging rate: Exponential dependence on Vdd and T

# Aging-induced Degradation



- f<sub>NOM</sub> is set by the critical path delay $\tau_D$ reached at the end of the nominal service life (S<sub>NOM</sub>)

$$f_{NOM} = 1/\tau_D$$
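One way to read this definition is written out below as a short sketch. The notation $\tau(t)$, for the critical path delay after a time $t$ of use, is introduced here for illustration and does not appear on the slides.

$$\tau(0) < \tau(t) \le \tau(S_{NOM}) = \tau_D \quad (0 < t \le S_{NOM}), \qquad f_{NOM} = \frac{1}{\tau_D} \le \frac{1}{\tau(t)}$$

Since the delay only grows with use, a core clocked at $f_{NOM}$ has timing slack for most of its service life.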

#### Impact of Operation at Higher V/f on Aging



- Higher Vdd: Vdd<sub>OP</sub> >> Vdd<sub>NOM</sub>
- Result: Lower critical path delay; higher aging rate
- Run at constant  $f_{OP} = 1/\tau_{OP}$  until  $S_{SHORT}$ ; then discard



# How to Manage Aging Optimally?

Contribution: Dynamic Voltage Scaling (DVS) for Aging Management (DVSAM)



- Change Vdd with time to compensate for critical path degradation
- Enforce minimum Vdd needed for any f-target

- DVSAM-Pow: Turn the wasted opportunity into power efficiency
- DVSAM-Perf: Turn the wasted opportunity into higher frequency
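The control idea in the bullets above can be illustrated with a minimal sketch. It is not the authors' model: the alpha-power-law delay formula, the exponential-in-Vdd aging rate, and every constant and name (`VTH0`, `ALPHA`, `rate_a`, `rate_b`, `path_delay`, `min_vdd_for`, `dvsam_schedule`) are assumptions chosen only to show the loop "measure aging, then apply the minimum Vdd that still meets the delay target".

```python
import math

VTH0 = 0.30   # assumed fresh threshold voltage (V), hypothetical value
ALPHA = 1.3   # assumed alpha-power-law exponent, hypothetical value

def path_delay(vdd, dvth):
    """Toy alpha-power-law critical path delay (arbitrary units)."""
    return vdd / (vdd - VTH0 - dvth) ** ALPHA

def min_vdd_for(tau_target, dvth, lo=0.35, hi=1.5, iters=60):
    """Bisect for the smallest Vdd whose delay meets tau_target."""
    lo = max(lo, VTH0 + dvth + 1e-6)  # stay above the aged threshold
    if path_delay(hi, dvth) > tau_target:
        raise ValueError("target delay not reachable within the Vdd range")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if path_delay(mid, dvth) <= tau_target:
            hi = mid   # mid is fast enough: try a lower Vdd
        else:
            lo = mid   # mid is too slow: need a higher Vdd
    return hi

def dvsam_schedule(tau_target, steps=10, dt=1.0, rate_a=1e-5, rate_b=6.0):
    """Per-step Vdd that holds the delay at tau_target as the core ages."""
    dvth, schedule = 0.0, []
    for _ in range(steps):
        vdd = min_vdd_for(tau_target, dvth)
        schedule.append(vdd)
        # Toy aging step: threshold shift grows exponentially with Vdd.
        dvth += rate_a * math.exp(rate_b * vdd) * dt
    return schedule

if __name__ == "__main__":
    for step, vdd in enumerate(dvsam_schedule(tau_target=2.0)):
        print(f"step {step}: Vdd = {vdd:.4f} V")
```

In this sketch the schedule starts at a low Vdd and rises as the threshold shift accumulates. Passing a smaller `tau_target` corresponds to the DVSAM-Perf flavor, where the same loop holds a shorter delay and therefore a higher frequency.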

#### **DVSAM-Pow**

Idea: Minimize power consumption at  $f_{NOM} = 1/\tau_D$ 



- Critical path delays are kept at $\tau_D$ until S<sub>NOM</sub>: Run at f<sub>NOM</sub>
- Start with low Vdd and increase slowly




- Vdd < Vdd<sub>NOM</sub> and f = f<sub>NOM</sub> throughout S<sub>NOM</sub>
- Power savings due to Vdd < Vdd<sub>NOM</sub>
  - → More cores active for the same P-budget
  - → Increased throughput
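The step from lower Vdd to more active cores can be sketched with simple arithmetic. The sketch below assumes that per-core power at a fixed frequency scales with Vdd squared (dynamic power only, leakage ignored). That scaling, the function name `active_cores`, and the example voltages are illustrative assumptions, not figures from the slides.

```python
import math

def active_cores(n_nominal, vdd_nom, vdd):
    """Cores that fit in the power budget of n_nominal cores at vdd_nom.

    Assumes per-core power ~ Vdd^2 at fixed frequency and ignores leakage.
    """
    per_core = (vdd / vdd_nom) ** 2  # power relative to a nominal core
    # The small epsilon guards against floating-point round-off at exact ratios.
    return math.floor(n_nominal / per_core + 1e-9)

if __name__ == "__main__":
    # Hypothetical example: a budget sized for 16 nominal cores at 1.0 V.
    print(active_cores(16, 1.0, 0.9))  # 16 / 0.81 = 19.75, so 19 cores
```

Under these assumptions, running at 0.9 V instead of 1.0 V lets 19 cores run in the budget originally sized for 16.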





## **DVSAM-Perf**

Idea: Maximize frequency for the same service life



- Shorter critical path delay  $\tau_{\text{OP}}$  until  $S_{\text{NOM}}$ : Run at higher f = 1 /  $\tau_{\text{OP}}$
- Start with low Vdd and increase rapidly

# **DVSAM-Perf for S<sub>SHORT</sub>**

Idea: Aggressive DVSAM-Perf for a short service life to get even higher performance



Even higher frequency than DVSAM-Perf for short service life

# **BubbleWrap Environments**

Figure panels: Throughput Cores and Expendable Cores.

- Two choices for Throughput Cores
  - Nominal operation
  - Use DVSAM-Pow and expand the set of Throughput Cores for the same power budget
- Two choices for Expendable Cores
  - Higher, constant Vdd until S<sub>SHORT</sub>; then discard
  - DVSAM-Perf until S<sub>SHORT</sub>; then discard





# **Hardware Support?**

- No change in the core architecture
- Need circuits to measure aging
- Need high-precision DVS
- Clock and power distribution
  - Two separate V/f domains: One for Expendable and one for Throughput Cores

# **BubbleWrap Evaluation**

- 32-core chip:  $N_T = 16$  Throughput and  $N_E = 16$  Expendable cores
- 22nm high-k metal-gate process
- Multiprogrammed workload synthesized from SPEC2000
- SESC enhanced by a power & thermal model

## Frequency Gains of Sequential Section






Figure x-axis: Sequential Fraction (L<sub>SEQ</sub>)

- Large f gains are feasible
- f increases with smaller sequential section
- For DVSAM-Perf, each expendable core runs for (L<sub>SEQ</sub>/N<sub>E</sub>) × S<sub>NOM</sub>
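The per-core run time in the last bullet follows from sharing the sequential work across the Expendable cores: sequential sections occupy L<sub>SEQ</sub> × S<sub>NOM</sub> of the chip's service life, and the N<sub>E</sub> cores are used up one after another. A minimal sketch of that arithmetic follows. The function name and the 7-year S<sub>NOM</sub> in the example are hypothetical.

```python
def expendable_service_life(l_seq, n_e, s_nom):
    """Run time of each Expendable core: (L_SEQ / N_E) * S_NOM.

    l_seq: fraction of execution time that is sequential (0..1)
    n_e:   number of Expendable cores
    s_nom: nominal service life, in any time unit
    """
    return l_seq / n_e * s_nom

if __name__ == "__main__":
    # Hypothetical example: half the time sequential, 16 Expendable cores,
    # and an assumed nominal service life of 7 years.
    print(expendable_service_life(0.5, 16, 7.0))  # 0.21875 years per core
```

A smaller sequential fraction shortens each core's required run time, which is consistent with the bullet above stating that f increases with a smaller sequential section.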



#### Power Consumption of Sequential Section

Each Expendable core has a maximum power budget equal to that of two cores






Tolerable power cost for the frequency gains



#### Conclusion

The BubbleWrap Many-Core: Exploiting dormant cores for sequential acceleration



- Simple homogeneous design
- No architectural or software changes
- Improves sequential and parallel performance
  - Fully sequential applications at 16% higher f
  - Fully parallel applications at 30% higher throughput

# Issues

- Other uses of die area taken by dormant cores?
  - More on-chip storage
- Alternative architectures for energy efficiency?
  - Heterogeneous systems
  - Accelerator architectures
  - Near-Vth operation