Caching has been a consistent tool of designers of high-performance, scalable computing systems,
but it has been deployed in so many ways that it can be difficult to standardize and scale in
cloud systems. This project elevates the use of caching in cloud-scale storage systems to a
“first-class citizen” by designing and implementing generalized Caching-as-a-Service (CaaS).
CaaS defines transformative technology along four complementary dimensions. First, it defines
a new abstraction and architecture for storage caches whereby storage stacks can easily embed
lightweight CaaS clients within a distributed compute infrastructure. Second, CaaS formulates
and theoretically analyzes distributed caching algorithms that operate within the CaaS service
such that individual CaaS server nodes cooperate towards achieving globally optimal caching
decisions. Third, the distributed CaaS clients and servers are co-designed to achieve strict
durability and fault-tolerance in their implementations. And finally, all of the CaaS
advancements are driven by insights generated from a detailed whole-system simulator that
models the diverse cache devices, network configurations, and application demand.
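As a sketch of what embedding a lightweight CaaS client might look like, the code below hashes block identifiers onto a set of cache nodes and exposes a get/put interface. The class names, the hash-ring placement, and the in-process dictionaries standing in for remote servers are all illustrative assumptions, not the project's API.

```python
import hashlib
from bisect import bisect

# Illustrative sketch only: a lightweight CaaS client that a storage stack
# could embed. Placement uses a simple hash ring; the real CaaS protocol,
# consistency guarantees, and server design are questions the project studies.

class CaaSClient:
    def __init__(self, servers):
        # servers: mapping of server name -> in-process dict standing in
        # for a remote CaaS cache node.
        self.servers = servers
        self.ring = sorted((self._hash(name), name) for name in servers)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def _pick(self, key):
        # Consistent-hashing-style placement of a block onto a cache node.
        h = self._hash(key)
        idx = bisect(self.ring, (h, "")) % len(self.ring)
        return self.servers[self.ring[idx][1]]

    def get(self, block_id):
        return self._pick(block_id).get(block_id)

    def put(self, block_id, data):
        self._pick(block_id)[block_id] = data

if __name__ == "__main__":
    client = CaaSClient({"cache-a": {}, "cache-b": {}, "cache-c": {}})
    client.put("vol1/block/42", b"payload")
    assert client.get("vol1/block/42") == b"payload"
```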
Recent advances in machine learning open up new and attractive approaches
for solving classic problems in computing systems. For storage systems,
cache replacement is one such problem because of its enormous impact on
performance. CACHEUS represents a new class of fully adaptive,
machine-learned caching algorithms that utilize a combination of experts
designed to address a variety of workload primitive types. The experts
used by CACHEUS include the state-of-the-art ARC, LIRS and LFU, and two
new ones – SR-LRU, a scan-resistant version of LRU, and CR-LFU, a
churn-resistant version of LFU. CACHEUS using the newly proposed
lightweight experts, SR-LRU and CR-LFU, is the most consistently performing
caching algorithm across a range of workloads and cache sizes. Furthermore,
CACHEUS enables augmenting state-of-the-art algorithms (e.g., LIRS, ARC)
by combining them with a complementary cache replacement algorithm (e.g.,
LFU) to better handle a wider variety of workload primitive types.
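CACHEUS combines its experts with an online, regret-minimizing weight update. The toy sketch below illustrates that style of design with plain LRU and LFU standing in for SR-LRU and CR-LFU and with made-up constants; it is closer in spirit to the earlier LeCaR scheme than to CACHEUS itself.

```python
import random
from collections import OrderedDict, defaultdict

# Toy sketch of expert-based, regret-minimizing cache replacement. Plain LRU
# and LFU stand in for SR-LRU and CR-LFU; the penalty constant and history
# size are illustrative, not the paper's.

class TwoExpertCache:
    def __init__(self, size, penalty=0.45):
        self.size = size
        self.lru = OrderedDict()                    # key -> None, recency order
        self.freq = defaultdict(int)                # key -> access count
        self.hist = {"LRU": OrderedDict(), "LFU": OrderedDict()}
        self.w = {"LRU": 0.5, "LFU": 0.5}           # expert weights
        self.penalty = penalty

    def _evict(self):
        expert = random.choices(["LRU", "LFU"],
                                weights=[self.w["LRU"], self.w["LFU"]])[0]
        if expert == "LRU":
            victim = next(iter(self.lru))           # least recently used
        else:
            victim = min(self.lru, key=lambda k: self.freq[k])  # least frequent
        del self.lru[victim]
        self.hist[expert][victim] = None            # remember who evicted it
        if len(self.hist[expert]) > self.size:
            self.hist[expert].popitem(last=False)

    def access(self, key):
        hit = key in self.lru
        if hit:
            self.lru.move_to_end(key)
        else:
            # A miss on a block an expert recently evicted is that expert's
            # regret: shrink its weight and renormalize.
            for expert in ("LRU", "LFU"):
                if key in self.hist[expert]:
                    self.w[expert] *= self.penalty
                    self.hist[expert].pop(key)
            total = self.w["LRU"] + self.w["LFU"]
            self.w = {e: v / total for e, v in self.w.items()}
            if len(self.lru) >= self.size:
                self._evict()
            self.lru[key] = None
        self.freq[key] += 1
        return hit
```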
Modern operating systems have been designed around the assumptions that
(a) memory is both byte-addressable and volatile and (b) storage is
block-addressable and persistent. The arrival of new Persistent Memory (PM)
technologies has made these assumptions obsolete. Despite much recent
work in this space, the need for consistently sharing PM data across multiple
applications remains an urgent, unsolved problem. The Region System is a
high-performance operating system stack for PM that implements usable consistency
and persistence for application data. The region system provides support for
consistently mapping and sharing data resident in PM across user application
address spaces. Its high-performance design minimizes the expensive PM ordering
and durability operations by embracing a minimalistic approach to metadata
construction and management. Finally, the region system creates a novel IPI based
PMSYNC operation, which ensures atomic persistence of mapped pages across multiple
address spaces.
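The region system's mapping and PMSYNC primitives live in the kernel. Purely as a user-level analogy, the sketch below maps a file-backed region and forces it durable with mmap/flush; the path, size, and the use of an ordinary file in place of a DAX-mapped PM region are assumptions, and PMSYNC's atomic persistence across multiple address spaces is not captured here.

```python
import mmap, os

# User-level approximation of mapping a named persistent region and making
# updates durable. The real region system implements this in the OS with a
# minimal metadata design and an IPI-based PMSYNC that atomically persists
# mapped pages across address spaces; mmap + flush below is only an analogy.

REGION_PATH = "/tmp/region_demo"        # on PM this would be a DAX-mounted file
REGION_SIZE = 4096

fd = os.open(REGION_PATH, os.O_CREAT | os.O_RDWR, 0o600)
os.ftruncate(fd, REGION_SIZE)
buf = mmap.mmap(fd, REGION_SIZE)        # map the region into this address space

buf[0:5] = b"hello"                     # update persistent data in place
buf.flush()                             # stand-in for PMSYNC: force durability

buf.close()
os.close(fd)
```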
The goals of this NVM Caching project are to:
* develop selective caching algorithms that better utilize non-datapath
NVM caches on the host (a minimal sketch of selective admission follows
this list),
* develop a model for component availability and failure domains leading
to a fault-tolerant write caching design for systems,
* develop cooperative cache partitioning techniques for DRAM- and
NVM-enabled caches, and
* develop data reduction techniques for host-side NVM caches.
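Below is a minimal sketch of the selective-admission idea behind the first goal: because a non-datapath cache does not have to be populated on every miss, a block is admitted only after it shows recent reuse. The two-touch policy and ghost-list size are illustrative assumptions, not the project's algorithms.

```python
from collections import OrderedDict

# Minimal sketch of selective admission for a non-datapath (host-side NVM)
# cache: a miss need not insert the block. Here a block is admitted only once
# it shows recent reuse; the two-touch threshold and ghost size are arbitrary.

class SelectiveCache:
    def __init__(self, size, ghost_size=None):
        self.size = size
        self.cache = OrderedDict()                  # admitted blocks (LRU)
        self.ghost = OrderedDict()                  # recently seen, not admitted
        self.ghost_size = ghost_size or 2 * size

    def access(self, block):
        if block in self.cache:                     # hit: refresh recency
            self.cache.move_to_end(block)
            return True
        if block in self.ghost:                     # second touch: admit
            self.ghost.pop(block)
            if len(self.cache) >= self.size:
                self.cache.popitem(last=False)      # evict LRU block
            self.cache[block] = None
        else:                                       # first touch: remember only
            self.ghost[block] = None
            if len(self.ghost) > self.ghost_size:
                self.ghost.popitem(last=False)
        return False
```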
In current operating systems, writes to pages that are not in core
memory require the process to block until the page can be fetched
from the backing store. This project investigates buffering the
write to a temporary page in core memory, so as to unblock the
process to continue computation, and applying the write
asynchronously. Research tasks include study and experimentation
with implementation techniques for deferring out-of-core page writes,
analysis of how scheduling and other aspects of the operating system
may need to be modified in order to realize the full benefits of
write deferral, and empirical studies to assess the performance
impact of write deferral on a variety of applications. By
incorporating non-blocking writes within the operating system,
applications can transparently benefit from a performance
improvement, without any modification to the application. The
potential performance benefits apply to a broad spectrum of computer
systems and applications.
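As a toy simulation of the idea (not the in-kernel mechanism), the sketch below buffers a write to an out-of-core page, returns to the caller immediately, and applies the buffered write once a background fetch completes. The page size, the in-process "backing store", and the artificial fetch delay are all illustrative.

```python
import threading, time

# Toy simulation of non-blocking writes: a write to an out-of-core page is
# buffered as a patch and the caller returns immediately; a background fetch
# later brings the page in and applies the buffered writes.

PAGE_SIZE = 4096
backing_store = {0: bytearray(PAGE_SIZE)}       # page number -> contents
memory = {}                                     # in-core pages
pending = {}                                    # page -> list of (offset, data)
lock = threading.Lock()

def write(page, offset, data):
    with lock:
        if page in memory:                      # in-core: apply directly
            memory[page][offset:offset + len(data)] = data
            return
        if page not in pending:                 # out-of-core: buffer the write
            pending[page] = []
            threading.Thread(target=_fetch, args=(page,)).start()
        pending[page].append((offset, data))    # caller does not block on I/O

def _fetch(page):
    time.sleep(0.01)                            # pretend this is a disk read
    frame = bytearray(backing_store[page])
    with lock:
        for offset, data in pending.pop(page):  # apply deferred writes in order
            frame[offset:offset + len(data)] = data
        memory[page] = frame

write(0, 0, b"non-blocking")
time.sleep(0.1)
print(memory[0][:12])                           # bytearray(b'non-blocking')
```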
Persistent memory systems promise a new era of computing where data
management is simplified significantly. For the first time, applications
will be able to directly access devices that respond with latencies close
to those of DRAM while also storing data persistently, without the
intervention of the operating system
(OS). Fully realizing the potential of this new technology requires
that new applications are able to integrate access to these devices
and exploit their unique capabilities to the fullest with little
additional effort. Moreover, enabling existing applications to easily
transition to using this new technology will be critical to the
success of persistent memory technologies in the marketplace. The
goals of this proposal are directed towards optimal integration of
persistent memory devices within existing system software and ease
of use within existing application programming paradigms.
Current high-end computing (HEC) applications explicitly manage
persistent data, including both application state and application
output. This practice not only increases development time and cost,
but also requires an application developer to be intimately aware
of the underlying platform-dependent storage mechanisms to achieve
good application I/O performance. The Software Persistent Memory
(SoftPM) project builds a lightweight infrastructure for streamlining
data management in next generation HEC applications. SoftPM eliminates
the duality of data management in HEC applications by allowing
applications to allocate persistent memory in much the same way
volatile memory is allocated and easily restore, browse, and interact
with past versions of persistent memory state. This simplifies the
implementation of three broad capabilities required in HEC applications
-- recoverability (e.g., checkpoint-restart), record-replay (e.g., data
visualization), and execution branching (e.g., simulation model-space
exploration).
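The sketch below illustrates the intended usage model only: a container used like ordinary memory, checkpointed, and rolled back to earlier versions. The class name, the file-per-version layout, and pickle serialization are stand-ins; SoftPM persists memory containers directly rather than serializing them.

```python
import os, pickle

# Conceptual sketch of the SoftPM-style usage model: allocate a "persistent
# container" much like an ordinary in-memory object, checkpoint it, and later
# restore or browse past versions. The layout below is illustrative only.

class PersistentContainer:
    def __init__(self, path):
        self.path = path
        self.state = {}                      # used like ordinary volatile memory
        os.makedirs(path, exist_ok=True)

    def checkpoint(self):
        version = len(os.listdir(self.path)) # next version number
        with open(os.path.join(self.path, f"v{version}.pkl"), "wb") as f:
            pickle.dump(self.state, f)
        return version

    def restore(self, version):
        with open(os.path.join(self.path, f"v{version}.pkl"), "rb") as f:
            self.state = pickle.load(f)

sim = PersistentContainer("/tmp/softpm_demo")
sim.state["timestep"] = 1                    # application state, e.g. a checkpoint
v = sim.checkpoint()
sim.state["timestep"] = 2
sim.restore(v)                               # roll back for record-replay/branching
assert sim.state["timestep"] == 1
```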
Accurately characterizing the resource usage of an application at
various levels in the memory hierarchy has been a long-standing
research problem. The studies thus far have also implicitly assumed
that there is no contention for the resource under consideration.
The inevitable future of virtualization-driven consolidation
necessitates the sharing of physical resources at all levels of the
memory hierarchy by multiple virtual machines. We present a unifying
Generalized ERSS Tree Model that characterizes the resource usage at
all levels of the memory hierarchy during the entire lifetime of an
application. Our model characterizes capacity requirements, the rate
of use, and the impact of resource contention, at each level of
memory. We present a methodology to build the model and demonstrate
how it can be used for the accurate provisioning of the memory
hierarchy in a consolidated environment.
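The model's formal definitions are given in the paper; purely as an illustration of the kind of structure involved, the sketch below assumes a tree of nested execution phases, each annotated with a capacity requirement and a rate of use at one level of the memory hierarchy. The node fields and the provisioning query are invented for this example.

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative data structure only: a tree of nested execution phases, each
# annotated with a capacity requirement and a rate of use at one level of the
# memory hierarchy. This is an assumption-laden reading of the model's shape,
# not the paper's formal definition.

@dataclass
class PhaseNode:
    level: str                      # e.g. "L2", "DRAM", "disk"
    capacity_mb: float              # capacity needed during this phase
    rate_per_sec: float             # how intensely that capacity is exercised
    children: List["PhaseNode"] = field(default_factory=list)

    def peak_capacity(self, level):
        """Largest capacity demanded at `level` anywhere in this subtree."""
        own = self.capacity_mb if self.level == level else 0.0
        return max([own] + [c.peak_capacity(level) for c in self.children])

app = PhaseNode("DRAM", 2048, 500, children=[
    PhaseNode("DRAM", 512, 900),            # an inner, hotter phase
    PhaseNode("disk", 10240, 50),
])
print(app.peak_capacity("DRAM"))            # 2048 -> provision at least this much
```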
Duplication of data in storage systems is becoming increasingly
common. We introduce I/O Deduplication, a storage optimization
that utilizes content similarity for improving I/O performance by
eliminating I/O operations and reducing the mechanical delays
during I/O operations. I/O Deduplication consists of three main
techniques: content-based caching, dynamic replica retrieval, and
selective duplication.
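As a minimal sketch of the first technique, content-based caching, the code below indexes a cache by a digest of block contents so that sectors holding identical data share one entry; the digest choice and structure sizes are illustrative.

```python
import hashlib

# Minimal sketch of content-based caching: the cache is indexed by a digest of
# block contents, so sectors holding identical data share a single cache entry
# and a read of any of them is served without disk I/O. Sizes are illustrative.

class ContentCache:
    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.by_digest = {}          # content digest -> block data
        self.sector_map = {}         # sector number -> content digest

    def insert(self, sector, data):
        digest = hashlib.sha1(data).hexdigest()
        self.sector_map[sector] = digest
        if digest not in self.by_digest and len(self.by_digest) < self.max_entries:
            self.by_digest[digest] = data      # duplicates cost no extra space

    def lookup(self, sector):
        digest = self.sector_map.get(sector)
        return self.by_digest.get(digest) if digest else None

cache = ContentCache(max_entries=1024)
cache.insert(10, b"same payload")
cache.insert(99, b"same payload")              # duplicate content, one entry
assert cache.lookup(99) == cache.lookup(10)
```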
The Environmental Protection Agency (EPA) estimates that energy
consumption at data centers could grow to 100 billion kWh,
contributing to 2.9% of the total US electricity needs, by the
year 2011. This project investigates fundamentally new techniques
for building energy proportional storage systems that consume energy
in proportion to the I/O workload intensity. It takes the view that
a carefully constructed data-replication-based approach (instead of
data migration), combined with background data synchronization for
consistency, provides a more effective mechanism for enabling dynamic
storage consolidation.
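As an assumption-laden illustration of the replication-plus-background-synchronization idea (not the project's actual design), the sketch below sizes the active replica set by observed I/O intensity and queues updates for spun-down replicas to be applied by a later synchronization pass.

```python
# Toy sketch of replication-based energy-proportional storage: writes go to the
# active subset of replica disks, whose size tracks workload intensity; updates
# for spun-down replicas are queued and applied by a later background sync.
# Disk counts and thresholds are illustrative.

disks = {name: {} for name in ("disk0", "disk1", "disk2")}    # replica contents
dirty = {name: [] for name in disks}                          # pending updates

def active_set(iops):
    """Scale the number of powered-on replicas with I/O intensity."""
    n = 1 if iops < 100 else 2 if iops < 1000 else len(disks)
    return list(disks)[:n]

def write(block, data, iops):
    active = active_set(iops)
    for name in disks:
        if name in active:
            disks[name][block] = data          # replica is spinning: write now
        else:
            dirty[name].append((block, data))  # replica is idle: sync later

def background_sync(name):
    """Bring a powered-up replica back to consistency."""
    for block, data in dirty[name]:
        disks[name][block] = data
    dirty[name].clear()

write(7, b"data", iops=50)       # low intensity: only disk0 written immediately
background_sync("disk1")
assert disks["disk1"][7] == b"data"
```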
Performance models provide the ability to predict application
performance for a given set of hardware resources and are used for
capacity planning and resource management. Traditional performance
models assume the availability of dedicated hardware for the
application. In this work, we build performance models for
applications in virtualized environments. We identify a key set of
virtualization architecture independent parameters that influence
application performance for a diverse and representative set of
applications. We propose an iterative model training technique based
on artificial neural networks which is found to be accurate across a
range of applications.
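To make the modeling approach concrete, the sketch below trains a small neural-network regressor on synthetic samples and keeps adding training points until held-out error drops below a tolerance. The feature names (CPU share, memory share, I/O intensity), the fake benchmark function, and all thresholds are invented stand-ins for the paper's parameters and applications.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Sketch of iteratively training an ANN performance model: features are
# virtualization-architecture-independent parameters (invented here), the
# target is an application performance metric, and samples are added until
# held-out error is small. All numbers are synthetic.

rng = np.random.default_rng(0)

def benchmark(params):
    """Stand-in for actually running the application with given resources."""
    cpu, mem, io = params.T
    return 100 / (cpu + 0.1) + 50 / (mem + 0.1) + 5 * io   # fake response time

model = MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=5000, random_state=0)
X = rng.uniform(0.1, 1.0, size=(20, 3))
y = benchmark(X)

for _ in range(10):                             # iterative refinement loop
    model.fit(X, y)
    X_test = rng.uniform(0.1, 1.0, size=(50, 3))
    err = np.mean(np.abs(model.predict(X_test) - benchmark(X_test)))
    if err < 5.0:                               # accurate enough: stop sampling
        break
    X_new = rng.uniform(0.1, 1.0, size=(20, 3)) # otherwise collect more samples
    X, y = np.vstack([X, X_new]), np.concatenate([y, benchmark(X_new)])
```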
The Active Block Layer Extensions (ABLE) project presents a new
approach to realizing self-managing storage systems. It makes
two contributions. First, it creates an evolvable block layer
software infrastructure that substantially reduces the complexity
involved in building self-managing storage systems by raising the
level of abstraction for their development. Second, it develops a
theory of storage extensions that provides a logic framework for
analyzing extensions developed at the block layer.
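The sketch below is only an illustration of the stacking idea: block-layer extensions are small components that observe, handle, or forward requests on their way to the device. The Python classes and the two example extensions are assumptions for exposition, not ABLE's actual interfaces.

```python
# Illustrative sketch of the block-layer-extension idea: self-managing
# functions are written as small extensions stacked above the raw device, each
# seeing the request stream and either handling or forwarding it.

class BlockLayer:
    def submit(self, op, block, data=None):
        raise NotImplementedError

class RawDevice(BlockLayer):
    def __init__(self):
        self.blocks = {}
    def submit(self, op, block, data=None):
        if op == "write":
            self.blocks[block] = data
        return self.blocks.get(block)

class CachingExtension(BlockLayer):
    """A self-managing function dropped into the stack: caches reads."""
    def __init__(self, lower):
        self.lower, self.cache = lower, {}
    def submit(self, op, block, data=None):
        if op == "read" and block in self.cache:
            return self.cache[block]
        result = self.lower.submit(op, block, data)
        self.cache[block] = data if op == "write" else result
        return result

class TracingExtension(BlockLayer):
    """Another extension: observes the request stream before forwarding it."""
    def __init__(self, lower):
        self.lower, self.trace = lower, []
    def submit(self, op, block, data=None):
        self.trace.append((op, block))
        return self.lower.submit(op, block, data)

stack = TracingExtension(CachingExtension(RawDevice()))
stack.submit("write", 7, b"x")
assert stack.submit("read", 7) == b"x"
```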
BORG is a self-optimizing storage system that performs automatic
block reorganization based on the observed I/O workload. BORG is
motivated by three characteristics of I/O workloads: non-uniform
access frequency distribution, temporal locality, and partial
determinism in non-sequential accesses. To achieve its objective,
BORG manages a small, dedicated partition on the disk drive, with
the goal of servicing a majority of the I/O requests from within
this partition with significantly reduced seek and rotational delays.
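As a rough illustration of the planning step (not BORG's actual algorithm), the sketch below picks the most frequently accessed blocks from a trace and assigns them contiguous offsets in the dedicated partition, producing a remap table that an indirection layer could consult.

```python
from collections import Counter

# Minimal sketch of planning a BORG-style block reorganization: copy the most
# frequently accessed blocks into a small dedicated partition, laid out in the
# order they first appear in the trace (to preserve some of the partial
# determinism of non-sequential accesses), and keep a remap table so future
# requests can be redirected there. Sizes are illustrative.

def plan_reorg(trace, partition_blocks):
    freq = Counter(trace)
    hot = {b for b, _ in freq.most_common(partition_blocks)}
    remap, next_offset = {}, 0
    for block in trace:                    # first-appearance order ~ access order
        if block in hot and block not in remap:
            remap[block] = next_offset     # offset inside the dedicated partition
            next_offset += 1
    return remap

trace = [5, 9, 5, 7, 5, 9, 120, 9, 5, 7]
remap = plan_reorg(trace, partition_blocks=3)
print(remap)                               # {5: 0, 9: 1, 7: 2}
```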
Power consumption within the disk-based storage subsystem forms a
substantial portion of the overall energy footprint in commodity
systems. We present the design and implementation of EXCES, an
external caching system that employs prefetching, caching, and
buffering of disk data for reducing disk activity. EXCES addresses
important questions related to external caching, including the
estimation of future data popularity, I/O indirection, continuous
reconfiguration of the external caching device (ECD) contents, and data
consistency.
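The sketch below illustrates, with invented policies, the kinds of decisions listed above: a decayed-counter popularity estimate, a periodic reconfiguration that repopulates the ECD with the top-ranked blocks, and an indirection check that decides whether a request must touch the disk.

```python
from collections import defaultdict

# Toy sketch of the decisions an external caching system must make: estimate
# which blocks will stay popular, periodically reconfigure the ECD to hold the
# top-ranked ones, and redirect I/O through an indirection table so the disk
# can stay idle. The estimator and ECD size here are illustrative, not EXCES's.

class ExternalCache:
    def __init__(self, ecd_blocks, decay=0.9):
        self.ecd_blocks = ecd_blocks
        self.decay = decay
        self.rank = defaultdict(float)   # block -> decayed popularity estimate
        self.on_ecd = set()              # indirection: blocks served by the ECD

    def access(self, block):
        self.rank[block] += 1.0
        return "ecd" if block in self.on_ecd else "disk"

    def reconfigure(self):
        """Periodic pass: decay history and repopulate the ECD."""
        for b in self.rank:
            self.rank[b] *= self.decay
        top = sorted(self.rank, key=self.rank.get, reverse=True)
        self.on_ecd = set(top[:self.ecd_blocks])

ec = ExternalCache(ecd_blocks=2)
for b in [1, 1, 2, 3, 1, 2]:
    ec.access(b)
ec.reconfigure()
print(ec.access(1), ec.access(3))        # ecd disk
```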