GSRC Resilient Systems Theme

The research in the Resilient Systems theme centers on providing reliable computation on the unreliable silicon platforms of future technology nodes. In the upcoming transition from the late-silicon to the post-silicon era, a host of diverse threats endanger the availability and survivability of silicon-based platforms. The Resilient Systems theme will concentrate on post-deployment, or lifetime, resiliency and will address availability and survivability issues with new approaches informed by its past experience.

The research will follow several guiding principles:

  1. Resiliency to a high number of faults with graceful degradation. For transient failures, this corresponds to a low error rate (<0.01%). In the presence of multiple failures, performance should degrade gracefully as faults accumulate, i.e., solutions should provide a fluid trade-off between a platform's health and its performance.
  2. Near-zero-cost resilience through hardware/software techniques, i.e., solutions that operate across the hardware/software boundary to improve the quality or reduce the cost of resiliency.
  3. Runtime verification solutions for chip multiprocessors. The research team plans to investigate new, practical, and low-cost solutions targeting highly concurrent platforms, where the memory/communication subsystem adds complexity.
  4. Tailored resiliency, that is, achieving improved resiliency at lower cost through tailored solutions that leverage the flexibility of specific architectures and/or applications.
  5. Resiliency for ultra-low power.

GSRC has traditionally been an incubator for low-cost, defect-tolerant solutions that protect systems from permanent and transient failures and extend their expected lifetime. The new generation of the Resilient Systems theme will focus on steadily reducing the cost of reliability through novel cross-layer mechanisms for robustness and adaptivity, and it will be an important source of progress on these long-standing, cross-cutting roadmap challenges. Six tasks provide research coverage across all main platform segments. For infrastructure platforms, where component replacement may not be possible (or immediate), especially in battlefield defense applications, availability is of the utmost importance: these platforms demand graceful degradation under failures and high fault resiliency, and they benefit from runtime validation solutions. Mobile platforms have tighter power and cost constraints, which makes them better suited to integrated hardware/software techniques and tailored resiliency. Finally, sensor nodes are naturally deployed in large numbers, so they can inherently sustain some failures.