### Exploiting Multiple On-Chip Contexts in New Ways

#### Eric Rotenberg

Dept. of Electrical and Computer Engineering North Carolina State University http://www.tinker.ncsu.edu/ericro ericro@ece.ncsu.edu

### High Perf. Microprocessor Trends

- Technology
  - Billions of transistors on a chip
  - Clock rate growth curve may not be dependable
- Implications for microarchitecture
  - Higher performance will rely increasingly on parallelism
    - Dictates multiple sources of parallelism
  - Billions of transistors help, but how to use them?
    - Evolutionary: integrate more independent entities on a single chip
    - Single-chip Multiprocessors (SMP) and Simultaneous Multithreaded Processors (SMT)

Eric Rotenberg NC State University © 2000 by Eric Rotenberg Exploiting Multiple On-Chip Contexts in New Ways January 10, 2000




## Exploiting SMP/SMT in New Ways

- SMP/SMT is happening
  - Compaq announced SMT on an 8-way superscalar
  - IBM announced 2 processor cores on a chip
- Opportunity
  - View SMP/SMT architecture as an enabling technology for new processing models
  - Look beyond improving throughput
    - Value-add other than performance (e.g. fault tolerance)
    - Speed up single programs in interesting ways




## Microarchitecture and Fault Tolerance

- Conventional fault-tolerant techniques
  - Specialized techniques (e.g. ECC for memory, RESO for ALUs) do not cover arbitrary logic faults
  - Pervasive self-checking logic is intrusive to design
  - System-level fault tolerance (e.g. redundant boards/computers) too costly for commodity computers
- A microarchitecture-based fault-tolerant approach
  - Microarchitecture performance trends can be easily leveraged for fault-tolerance goals
  - Broad coverage of transient faults
  - Low overhead: performance, area, and design changes




## Technology and Fault Tolerance

- High clock rate, dense designs (GHz/billion transistors)
  - Low voltages for power management
  - High-performance and "undisciplined" circuit techniques
  - Managing clock skew with GHz clocks
  - Pushing the technology envelope potentially reduces design tolerances in general
- => Entire chip prone to frequent, arbitrary transient faults






### **AR-SMT:** Fault Tolerance

- Delay Buffer
  - Simple, fast, hardware-only state passing for comparing thread state
  - Ensures time redundancy: the A- and R-stream copies of an instruction execute at different times
  - Buffer length adjusted to cover transient fault lifetimes
- Transient fault detection and recovery
  - Fault detected when thread state does not match
  - Error latency related to length of Delay Buffer
  - Committed R-stream state is checkpoint for recovery
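The delay-buffer mechanism above can be sketched as a toy simulation. This is only an illustrative model, not the hardware design: the names `execute`, `run_ar_smt`, and `fault_at` are hypothetical, the "instruction" is a stand-in arithmetic function, and the fault is injected only into the A-stream result for simplicity.

```python
from collections import deque


def execute(x):
    # Hypothetical stand-in for an arbitrary instruction's computation.
    return x * 2 + 1


def run_ar_smt(operands, delay, fault_at=None):
    """Toy model of A-stream/R-stream checking through a delay buffer.

    Returns (index_of_detected_fault_or_None, committed_r_stream_results).
    """
    delay_buffer = deque()
    committed = []  # committed R-stream state: the checkpoint for recovery

    def retire_oldest():
        index, operand, a_result = delay_buffer.popleft()
        r_result = execute(operand)  # redundant execution, later in time
        if r_result != a_result:
            return index  # thread state does not match: fault detected
        committed.append(r_result)
        return None

    for index, operand in enumerate(operands):
        a_result = execute(operand)
        if index == fault_at:
            a_result ^= 1  # assumed fault model: one flipped bit in the A-stream
        delay_buffer.append((index, operand, a_result))
        # The R-stream trails the A-stream by `delay` instructions.
        if len(delay_buffer) > delay:
            bad = retire_oldest()
            if bad is not None:
                return bad, committed

    # Drain the buffer once the A-stream has finished.
    while delay_buffer:
        bad = retire_oldest()
        if bad is not None:
            return bad, committed
    return None, committed


print(run_ar_smt([1, 2, 3, 4, 5], delay=2))
print(run_ar_smt([1, 2, 3, 4, 5], delay=2, fault_at=2))
```

In this sketch the fault at instruction 2 is only noticed after the A-stream has run `delay` instructions further, which mirrors the point that error latency grows with the length of the Delay Buffer, and the `committed` list holds only results that were verified before the mismatch.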



## AR-SMT: Low Overhead

- Low hardware and design overhead
  - Leverages underlying microarchitecture (SMT)
- Low performance overhead
  1. SMT-ness: all the same benefits of general SMT (utilization)
  2. R-stream has perfect control and data "predictions" from the A-stream (via delay buffer)!
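The second point can be illustrated with a small sketch of branch prediction. It assumes, purely for illustration, that the A-stream uses a 1-bit last-outcome predictor and that the branch condition is `x % 3 == 0`; `branch_taken` and `count_mispredictions` are hypothetical names. The R-stream takes the A-stream's resolved outcome (delivered through the delay buffer) as its prediction, so in a fault-free run it never mispredicts.

```python
def branch_taken(x):
    # Hypothetical branch condition, chosen only for illustration.
    return x % 3 == 0


def count_mispredictions(values):
    """Return (A-stream mispredictions, R-stream mispredictions)."""
    a_misses = 0
    r_misses = 0
    last_outcome = False  # state of the assumed 1-bit last-outcome predictor

    for x in values:
        # A-stream: predict with the simple predictor, then resolve the branch.
        a_outcome = branch_taken(x)
        if last_outcome != a_outcome:
            a_misses += 1
        last_outcome = a_outcome

        # R-stream: its "prediction" is the A-stream's resolved outcome.
        r_prediction = a_outcome
        r_outcome = branch_taken(x)  # the R-stream still resolves the branch itself
        if r_prediction != r_outcome:
            r_misses += 1  # would only happen if a fault corrupted one stream

    return a_misses, r_misses


print(count_mispredictions(range(7)))
```

A mismatch between the R-stream's own outcome and the delivered "prediction" is exactly the state mismatch that signals a fault, so the same comparison serves both performance and detection.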






#### **AR-SMT** Summary

- Technology-driven performance improvements
  - New fault environment: frequent, arbitrary transient faults
- Leverage microarchitecture performance trends for broad-coverage, low-overhead fault tolerance
  - SMT-based time redundancy
  - Control and data "prediction"
- Introducing a second, redundant thread increases execution time by only 10% to 30%




#### **Other Interesting Perspectives**

- D. Siewiorek. "Niche Successes to Ubiquitous Invisibility: Fault-Tolerant Computing Past, Present, and Future", FTCS-25
  - (Quote) Fault-tolerant architectures have not kept pace with the rate of change in commercial systems.
  - Fault tolerance must make unconventional in-roads into commodity processors: leverage the commodity microarchitecture.
- P. Rubinfeld. "Managing Problems at High Speeds", Virtual Roundtable on the Challenges and Trends in Processor Design, Computer, Jan. 1998.
  - Implications of very high clock rate, dense designs




#### Equivalent Dynamic Instr. Streams

- Thought experiment
  - Run the full program and lay out the dynamic instruction stream
  - By trial and error, remove dynamic instructions; a removal becomes permanent if the final program output does not change
- Result: as little as 20% of the original dynamic program can produce the same result
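The thought experiment can be sketched on a toy trace. The representation below is an assumption made for illustration (each dynamic instruction is a `(dest, function, sources)` tuple, registers default to 0, and the program output is a single register), and the helper names `run` and `shrink` are hypothetical. The greedy pass considers one run of one input only, and the tiny trace is not meant to reproduce the 20% figure.

```python
def run(trace, output_reg):
    """Execute a dynamic instruction trace and return the output register."""
    regs = {}
    for dest, fn, srcs in trace:
        regs[dest] = fn(*(regs.get(s, 0) for s in srcs))
    return regs.get(output_reg, 0)


def shrink(trace, output_reg):
    """Greedy trial and error: drop an instruction if the output is unchanged."""
    expected = run(trace, output_reg)
    kept = list(trace)
    i = 0
    while i < len(kept):
        candidate = kept[:i] + kept[i + 1:]
        if run(candidate, output_reg) == expected:
            kept = candidate  # removal is harmless, so make it permanent
        else:
            i += 1  # instruction is needed, move on
    return kept


trace = [
    ("a", lambda: 5, ()),
    ("b", lambda: 7, ()),                    # overwritten before it is read
    ("b", lambda: 2, ()),
    ("c", lambda a, b: a + b, ("a", "b")),
    ("d", lambda c: c * 10, ("c",)),         # never reaches the output
    ("c", lambda c: c + 0, ("c",)),          # writes the value already there
]

reduced = shrink(trace, "c")
print(len(trace), "->", len(reduced), "instructions; output =", run(reduced, "c"))
```

Here three of the six dynamic instructions survive: the overwritten write, the write that never reaches the output, and the write that stores an unchanged value can all be removed without changing the result.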


