Architectures for Real-Time

Overview

Computers have always evolved by integrating more complexity into smaller systems. In the 1990s, this led to highly functional personal computers with supercomputer-like performance. Now, users interact with embedded computers continuously and transparently (e.g., cell phones, cameras, cars). High-end embedded processors provide the computational horsepower. Due to higher functional requirements, these processors have begun inheriting high-performance techniques from their desktop counterparts, such as pipelining, caches, dynamic branch prediction, and multithreading. Unfortunately, while these techniques perform well on average, their performance cannot be analytically bounded, and analytical bounds are a key safety requirement for embedded systems with real-time tasks. In this project, we are pioneering new directions for designing higher-performance real-time embedded systems without compromising safety.

Virtual Simple Architecture (VISA) Framework. The VISA framework is a combined static/dynamic approach to worst-case schedulability analysis of real-time task-sets. Statically, tasks’ worst-case execution times (WCETs) are derived assuming a simple, analyzable processor. Dynamically, the tasks are actually run on an arbitrarily complex processor. Normally, this is unsafe since tasks’ WCETs are based on a different processor abstraction. A novel dynamic checking approach and dual-mode pipeline design ensure overall safety.
Safely Exploiting Multithreaded Processors to Tolerate Memory Latency in Real-Time Systems. As frequency increases for high-end embedded processors, memory will become a performance bottleneck, just as it is in desktop systems today. Dynamic switch-on-event multithreading (quickly switching to a different task when the current task misses in the cache) is a promising solution, especially considering that embedded systems are typically rich in available threads, even more so than desktop counterparts. However, currently, there are no known frameworks for analytically bounding the performance of dynamic multithreading. Two related projects are underway in this strategic research area. First, we use deterministic thread switching to nearly fully capitalize on the potential for overlapping computation and memory latency, at the same time yielding a simple closed-form test for determining whether or not a task-set is schedulable. This project, for the first time, extends Liu and Layland’s classic EDF test to handle computation/memory overlap. Second, we are deriving safe yet tight bounds on the performance of dynamic switch-on-event multithreading itself.
Real-Time Processors. At a high level, a single-processor real-time system has three layers: (1) the underlying processor architecture, (2) static/dynamic worst-case timing analysis for deriving tasks’ worst-case execution times (WCETs) on the processor, and (3) scheduling algorithms which depend on WCETs. Researchers in different specialties have evolved these three layers separately, relying on rigid abstractions between one layer and the next. Abstraction compartmentalizes each specialty and thus manages complexity, leading to prolific developments within each specialty. But antiquated abstractions may also cause significant trends in one area to go unnoticed in another area, passing up opportunities for leaps in performance, power, and cost. We have begun a new project that “co-designs” the three layers rather than insulating them from one another.

Virtual Simple Architecture (VISA) Framework

Embedded processors provide the computational horsepower for ubiquitous embedded systems, from cell phones and automobiles to NASA’s Mars rovers (pictured below). Embedded processors are evolving, becoming increasingly complex, even borrowing high-performance microarchitectural techniques from desktop processors such as Intel’s Pentium processors. While these techniques improve average performance, deriving worst-case execution times (WCETs) of software tasks becomes intractable. Yet, having WCETs is essential for safely scheduling software tasks in embedded systems with real-time constraints.

We developed a new framework, called Virtual Simple Architecture (VISA), for building timing-safe systems on top of timing-unsafe hardware components (pictured below). The VISA framework provides a simple processor model to worst-case execution time analysis. Thus, WCETs are derived for tasks assuming a simple processor. However, tasks are actually executed on a complex processor. Strictly speaking, this is unsafe since WCETs were not derived assuming the complex processor. To address this, progress of tasks is continuously gauged to dynamically confirm that the “proxy” WCETs are not exceeded. Typically, there are no problems, i.e., tasks execute much faster than if they were executed on the hypothetical simple processor. Nonetheless, anomalies cannot be ruled out (e.g., pathological dynamic branch prediction scenarios and speculation penalties). The gauging technique can detect dangerously slow progress of a task, in which case the complex processor is dynamically downgraded to a simple mode of operation that mimics the simple processor model, explicitly bounding the execution time of the task by its WCET.

Thus, the microarchitectural support is a complex processor with dual operating modes, a complex mode and a simple mode. The complex mode typically executes tasks much faster than a simpler processor would, freeing the processor for other tasks or enabling frequency/voltage to be drastically reduced for power savings. The gauging technique plus simple mode ensures bounded timing in atypical cases.
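
The gauging-and-downgrade idea can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the published VISA mechanism: it assumes a task is split into sub-tasks with individually analyzed simple-model WCETs, that a watchdog checkpoint guards each sub-task, and that a fixed `headroom` allowance is added to the task's bound. The names `SubTask`, `run_task`, and `headroom`, and all cycle counts, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SubTask:
    wcet_simple: int     # analyzed bound on the simple processor model (cycles)
    complex_cycles: int  # cycles this run would take in complex mode (hypothetical)
    simple_cycles: int   # cycles this run takes in simple mode, at most wcet_simple


def run_task(subtasks: List[SubTask], headroom: int) -> Tuple[int, bool, int]:
    """Simulate one task; return (elapsed cycles, downgraded?, guaranteed bound)."""
    bound = headroom + sum(s.wcet_simple for s in subtasks)
    elapsed = 0
    finished_wcet = 0    # simple-model WCETs of the sub-tasks completed so far
    simple_mode = False
    for s in subtasks:
        if simple_mode:
            # After a downgrade, every remaining sub-task runs in simple mode.
            elapsed += s.simple_cycles
        else:
            # Latest time at which everything still unfinished, including this
            # sub-task, fits within the bound when run entirely in simple mode.
            checkpoint = headroom + finished_wcet
            if elapsed + s.complex_cycles <= checkpoint:
                elapsed += s.complex_cycles
            else:
                # Watchdog fires at the checkpoint. Conservatively give no credit
                # for partial progress: the sub-task is redone in simple mode.
                elapsed = checkpoint + s.simple_cycles
                simple_mode = True
        finished_wcet += s.wcet_simple
    return elapsed, simple_mode, bound


typical = [SubTask(100, 40, 90), SubTask(100, 30, 95), SubTask(100, 50, 80)]
anomalous = [SubTask(100, 40, 90), SubTask(100, 500, 95), SubTask(100, 50, 80)]
print(run_task(typical, headroom=50))    # fast path, no downgrade
print(run_task(anomalous, headroom=50))  # slow sub-task triggers simple mode
```

In this toy model the typical run finishes well under the bound without leaving complex mode, while the anomalous run is downgraded at its second sub-task and still finishes within the bound. The checkpoint is chosen so that a downgrade at that instant always leaves enough time for the remaining sub-tasks at their simple-model WCETs.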

Safely Exploiting Multithreaded Processors to Tolerate Memory Latency in Real-Time Systems

The peak frequency of embedded processors is increasing to meet the demand for more functional embedded systems. As a result, embedded systems are now facing the same “memory wall” that has plagued desktop systems for years. The “memory wall” refers to the widening processor-memory speed gap, which prevents performance from scaling with frequency.

A coarse-grain multithreaded processor can effectively hide long memory latencies by quickly switching to an alternate task when the active task issues a memory request, improving overall throughput. However, dynamic switching cannot be safely exploited to improve throughput in hard-real-time embedded systems. The schedulability of a task-set (guaranteeing all tasks meet deadlines) must be determined a priori using offline schedulability tests. Any computation/memory overlap must be statically accounted for.

We developed a novel analytical framework that bounds the overlap between computation of a pipeline-resident task and ongoing memory transfers of other tasks. A simple closed-form schedulability test is derived that depends only on the aggregate computation (C) and memory (M) components of tasks. Namely, the technique does not require specificity regarding the location of memory transfers within and among tasks and avoids searching all task permutations for a specific feasible schedule. To the best of our knowledge, this is the first work to provide the necessary formalism for safely and tractably exploiting coarse-grain multithreaded processors to tolerate memory latency in hard-real-time systems, exceeding the schedulability limits of classic real-time theory for uniprocessors. Our techniques make it possible to capitalize on higher frequency embedded processors, despite the widening processor-memory speed gap.

The analytical framework is pictured below. The closed-form test extends Liu and Layland’s EDF test, from their classic 1973 paper, to handle coarse-grain multithreading in a uniprocessor. By (1) matching the simplicity of the original EDF test and (2) targeting multithreading for naturally thread-rich embedded systems, our approach holds real promise for influencing real-time scheduling theory.
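
For reference, the classic baseline that the closed-form test extends can be sketched as follows. This is only the original Liu and Layland EDF utilization test (periodic tasks with deadlines equal to periods, schedulable if and only if total utilization is at most 1), applied with no computation/memory overlap, so each job is charged C + M. The extended test itself is not reproduced on this page and is not shown here. The `Task` type, function names, and all numbers are hypothetical.

```python
from fractions import Fraction
from typing import NamedTuple, Sequence


class Task(NamedTuple):
    c: int       # aggregate computation time per job
    m: int       # aggregate memory time per job
    period: int  # period, assumed equal to the relative deadline


def utilization_no_overlap(tasks: Sequence[Task]) -> Fraction:
    """Total utilization when memory time is never overlapped with computation."""
    return sum((Fraction(t.c + t.m, t.period) for t in tasks), Fraction(0))


def edf_schedulable_no_overlap(tasks: Sequence[Task]) -> bool:
    """Classic EDF utilization test: schedulable iff utilization <= 1."""
    return utilization_no_overlap(tasks) <= 1


light = [Task(2, 1, 10), Task(3, 2, 20), Task(4, 4, 40)]
heavy = [Task(4, 4, 10), Task(3, 2, 20)]
print(utilization_no_overlap(light), edf_schedulable_no_overlap(light))
print(utilization_no_overlap(heavy), edf_schedulable_no_overlap(heavy))
```

The second task-set fails this baseline test because memory time is charged in full to the processor. Whether such a set becomes schedulable once overlap is accounted for is the question the extended test addresses; this sketch does not answer it.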

We have plans to deploy our techniques in a real system based on Ubicom’s IP3023 embedded microprocessor (Ubicom designs embedded microprocessors for wireless networking), among the first embedded microprocessors with multithreading capability. The IP3023 is a 10-stage scalar pipeline with 8 hardware threads. Ubicom graciously donated their high-end development board worth around $20K, pictured below.

Real-Time Processors

At a high level, a single-processor real-time system has three layers: (1) the underlying processor architecture, (2) static/dynamic worst-case timing analysis for deriving tasks’ worst-case execution times (WCETs) on the processor, and (3) scheduling algorithms which depend on WCETs. Classically, the three layers have evolved separately. We feel this insulating approach is obsolete. We have begun a new project that “co-designs” the three layers, opening up opportunities for leaps in performance, power, and cost of real-time systems.

Publications

Conference and Journal Papers

A. Anantaraman and E. Rotenberg. Non-Uniform Program Analysis & Repeatable Execution Constraints: Exploiting Out-of-Order Processors in Real-Time Systems. ACM SIGBED Review, Volume 3, Number 1, January 2006. [pdf]

A. Anantaraman and E. Rotenberg. Non-Uniform Program Analysis & Repeatable Execution Constraints: Exploiting Out-of-Order Processors in Real-Time Systems. Work in Progress Session for the 26th IEEE International Real-Time Systems Symposium (RTSS-26), December 2005. [pdf]

A. El-Haj-Mahmoud, A. S. AL-Zawawi, A. Anantaraman, and E. Rotenberg. Virtual Multiprocessor: An Analyzable, High-Performance Microarchitecture for Real-Time Computing. Proceedings of the 2005 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES’05), pp. 213-224, September 2005. [pdf]

K. Seth, A. Anantaraman, F. Mueller, and E. Rotenberg. FAST: Frequency-Aware Static Timing Analysis. ACM Transactions on Embedded Computing Systems (TECS), 5(1):200-224, February 2006.

A. Anantaraman, K. Seth, E. Rotenberg, and F. Mueller. Enforcing Safety of Real-Time Schedules on Contemporary Processors Using a Virtual Simple Architecture (VISA). Proceedings of the 25th IEEE International Real-Time Systems Symposium (RTSS-25), pp. 114-125, December 2004. [pdf]

A. El-Haj-Mahmoud and E. Rotenberg. Safely Exploiting Multithreaded Processors to Tolerate Memory Latency in Real-Time Systems. Proceedings of the 2004 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES’04), pp. 2-13, September 2004. [pdf]

K. Seth, A. Anantaraman, F. Mueller, and E. Rotenberg. FAST: Frequency-Aware Static Timing Analysis. Proceedings of the 24th IEEE International Real-Time Systems Symposium (RTSS-24), pp. 40-51, December 2003.

A. Anantaraman, K. Seth, K. Patil, E. Rotenberg, and F. Mueller. Virtual Simple Architecture (VISA): Exceeding the Complexity Limit in Safe Real-Time Systems. Proceedings of the 30th IEEE/ACM International Symposium on Computer Architecture (ISCA-30), pp. 350-361, June 2003. [pdf]

J. Koppanalil, P. Ramrakhyani, S. Desai, A. Vaidyanathan, and E. Rotenberg. A Case for Dynamic Pipeline Scaling. Proceedings of the 5th International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES’02), pp. 1-8, October 2002. [pdf]

E. Rotenberg. Using Variable-MHz Microprocessors to Efficiently Handle Uncertainty in Real-Time Systems. Proceedings of the 34th IEEE/ACM International Symposium on Microarchitecture (MICRO-34), pp. 28-39, December 2001. [pdf]

Technical Reports

A. Anantaraman, K. Seth, E. Rotenberg, and F. Mueller. Exploiting VISA for Higher Concurrency in Safe Real-Time Systems. Technical Report TR-2004-15, Department of Computer Science, North Carolina State University, May 2004. [pdf]

A. Sidhaye, P. Steinmetz, E. Rotenberg, D. Barrow, and D. Arpaia. Collecting Memory Address Traces from an Ericsson Cell Phone and Estimating Cache Performance. Technical Report CESR-TR-01-1, Center for Embedded Systems Research, Department of Electrical and Computer Engineering, North Carolina State University, August 2001.

Book Chapters

E. Rotenberg and A. Anantaraman. Architecture of Embedded Microprocessors, in Multiprocessor Systems-on-Chips. Ahmed Jerraya and Wayne Wolf, Eds. San Francisco, CA: Morgan Kaufmann Publishers, 2005, pp. 81-112.

Student Theses

A. A. El-Haj-Mahmoud. Hard-Real-Time Multithreading: A Combined Microarchitectural and Scheduling Approach. Ph.D. Thesis, Department of Electrical and Computer Engineering, North Carolina State University, May 2006. [NCSU library: on-line thesis]

A. V. Anantaraman. Analysis-Managed Processor (AMP): Exceeding the Complexity Limit in Safe-Real-Time Systems. Ph.D. Thesis, Department of Electrical and Computer Engineering, North Carolina State University, April 2006. [NCSU library: on-line thesis]

P. S. Ramrakhyani. Dynamic Pipeline Scaling. M.S. Thesis, Department of Electrical and Computer Engineering, North Carolina State University, May 2003. [pdf]

A. V. Anantaraman. Reducing Frequency in Real-Time Systems via Speculation and Fall-Back Recovery. M.S. Thesis, Department of Electrical and Computer Engineering, North Carolina State University, April 2003. [pdf]

Talks

Virtual Multiprocessor: An Analyzable, High-Performance Microarchitecture for Real-Time Computing. Presented at CASES’05 by A. El-Haj-Mahmoud. [ppt]

Enforcing Safety of Real-Time Schedules on Contemporary Processors Using a Virtual Simple Architecture (VISA). Presented at RTSS-25 by A. V. Anantaraman. [ppt] [ppt – no animation] [pdf]

Safely Exploiting Multithreaded Processors to Tolerate Memory Latency in Real-Time Systems. Presented at CASES’04 by A. El-Haj-Mahmoud. [ppt] [ppt – no animation] [pdf]

Virtual Simple Architecture (VISA): Exceeding the Complexity Limit in Safe Real-Time Systems. Presented at ISCA-30 by E. Rotenberg. [pdf]

A Case for Dynamic Pipeline Scaling. Presented at CASES’02 by P. Ramrakhyani. [pdf]

Using Variable-MHz Microprocessors to Efficiently Handle Uncertainty in Real-Time Systems. Presented at MICRO-34 by E. Rotenberg. [pdf]

Funding

This project is supported by NSF grants No. CCR-0207785 (Dynamic Superpipelining: Shaping Microarchitecture for Variable Frequency), No. CCR-0208581 (Reducing Frequency via Speculation and Fall-Back Recovery), and No. CCR-0310860 (Virtual Simple Architecture (VISA): Exceeding the Complexity Limit in Safe Real-Time Systems). Funding and a development board were also provided by Ericsson. Ubicom provided a development board.

Any opinions, findings, and conclusions or recommendations expressed in this website and publications herein are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.