====== SyLab Projects ======

===== Recent Projects =====

* [[projects:caas|Generalized Caching as a Service]] Caching has been a consistent tool of designers of high-performance, scalable computing systems, but it has been deployed in so many ways that it can be difficult to standardize and scale in cloud systems. This project elevates the use of caching in cloud-scale storage systems to a “first-class citizen” by designing and implementing generalized Caching-as-a-Service (CaaS). CaaS defines transformative technology along four complementary dimensions. First, it defines a new abstraction and architecture for storage caches whereby storage stacks can easily embed lightweight CaaS clients within a distributed compute infrastructure. Second, CaaS formulates and theoretically analyzes distributed caching algorithms that operate within the CaaS service such that individual CaaS server nodes cooperate towards achieving globally optimal caching decisions. Third, the distributed CaaS clients and servers are co-designed to achieve strict durability and fault tolerance in their implementations. Finally, all of the CaaS advancements are driven by insights generated from a detailed whole-system simulator that models diverse cache devices, network configurations, and application demands.
* [[projects:cacheus|Cacheus]] Recent advances in machine learning open up new and attractive approaches for solving classic problems in computing systems. For storage systems, cache replacement is one such problem because of its enormous impact on performance. CACHEUS represents a new class of fully adaptive, machine-learned caching algorithms that utilize a combination of experts designed to address a variety of workload primitive types. The experts used by CACHEUS include the state-of-the-art ARC, LIRS, and LFU, and two new ones: SR-LRU, a scan-resistant version of LRU, and CR-LFU, a churn-resistant version of LFU.
CACHEUS, using the newly proposed lightweight experts SR-LRU and CR-LFU, is the most consistently performing caching algorithm across a range of workloads and cache sizes. Furthermore, CACHEUS enables augmenting state-of-the-art algorithms (e.g., LIRS, ARC) by combining them with a complementary cache replacement algorithm (e.g., LFU) to better handle a wider variety of workload primitive types.
* [[projects:region-system|Region System]] Modern operating systems have been designed around the hypotheses that (a) memory is both byte-addressable and volatile and (b) storage is block-addressable and persistent. The arrival of new Persistent Memory (PM) technologies has made these assumptions obsolete. Despite much recent work in this space, the need for consistently sharing PM data across multiple applications remains an urgent, unsolved problem. The Region System is a high-performance operating system stack for PM that implements usable consistency and persistence for application data. The region system provides support for consistently mapping and sharing data resident in PM across user application address spaces. Its high-performance design minimizes the expensive PM ordering and durability operations by embracing a minimalistic approach to metadata construction and management. Finally, the region system creates a novel IPI-based PMSYNC operation, which ensures atomic persistence of mapped pages across multiple address spaces.
* [[projects:nvm-caches|NVM-enabled Host-side Caches]] The goals of this NVM Caching project are to:
  * develop selective caching algorithms that better utilize non-datapath NVM caches on the host,
  * develop a model for component availability and failure domains leading to a fault-tolerant write caching design for systems,
  * develop cooperative cache partitioning techniques for DRAM- and NVM-enabled caches, and
  * develop data reduction techniques for host-side NVM caches.
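As a rough illustration of the expert-based adaptation idea behind the caching projects above, the following toy Python sketch arbitrates between two classic experts, LRU and LFU, and penalizes whichever expert evicted an item that is later re-requested. This is a hypothetical simplification, not the CACHEUS algorithm itself (which combines SR-LRU and CR-LFU with regret-based learning rates):

```python
from collections import OrderedDict, defaultdict
import random

class AdaptiveCache:
    """Toy two-expert cache: LRU and LFU experts propose eviction
    victims, and expert weights shrink when an expert's victim is
    re-requested (a simplified sketch of regret-based adaptation)."""

    def __init__(self, capacity, lr=0.3):
        self.capacity = capacity
        self.lr = lr                       # learning rate for weight updates
        self.store = OrderedDict()         # key -> value, order = recency
        self.freq = defaultdict(int)       # key -> access count
        self.w = {"lru": 0.5, "lfu": 0.5}  # expert weights
        self.hist = {}                     # evicted key -> expert that chose it

    def _lru_victim(self):
        return next(iter(self.store))      # least recently used key

    def _lfu_victim(self):
        return min(self.store, key=lambda k: self.freq[k])

    def get(self, key):
        if key in self.store:
            self.store.move_to_end(key)    # refresh recency
            self.freq[key] += 1
            return self.store[key]
        return None

    def put(self, key, value):
        if key in self.hist:               # regret: penalize the expert
            bad = self.hist.pop(key)       # that evicted this key
            self.w[bad] *= (1 - self.lr)
            total = sum(self.w.values())
            self.w = {e: v / total for e, v in self.w.items()}
        if key not in self.store and len(self.store) >= self.capacity:
            expert = "lru" if random.random() < self.w["lru"] else "lfu"
            victim = self._lru_victim() if expert == "lru" else self._lfu_victim()
            del self.store[victim]
            self.hist[victim] = expert     # remember who evicted it
        self.store[key] = value
        self.store.move_to_end(key)
        self.freq[key] += 1
```

The key design point mirrored here is that no single replacement policy wins on all workload primitive types, so the cache learns online which expert to trust.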
* [[projects:nbw:|Non-blocking Writes]] In current operating systems, writes to pages that are not in core memory require the process to block until the page can be fetched from the backing store. This project investigates buffering the write to a temporary page in core memory, so as to unblock the process to continue computation, and applying the write asynchronously. Research tasks include study of and experimentation with implementation techniques for deferring out-of-core page writes, analysis of how scheduling and other aspects of the operating system may need to be modified to realize the full benefits of write deferral, and empirical studies to assess the performance impact of write deferral on a variety of applications. By incorporating non-blocking writes within the operating system, applications transparently benefit from a performance improvement, without any modification to the application. The potential performance benefits apply to a broad spectrum of computer systems and applications.
* [[projects:persistent-memory:|Programming Abstractions for Persistent Memory]] Persistent memory systems promise a new era of computing where data management is simplified significantly. For the very first time, applications will be able to directly access devices that can respond with latencies close to DRAM and, at the same time, store data persistently without the intervention of the operating system (OS). Fully realizing the potential of this new technology requires that new applications be able to integrate access to these devices and exploit their unique capabilities to the fullest with little additional effort. Moreover, enabling existing applications to easily transition to this new technology will be critical to the success of persistent memory in the marketplace.
The goals of this project are directed towards optimal integration of persistent memory devices within existing system software and ease of use within existing application programming paradigms.
* [[projects:softpm:|Software Persistent Memory (SoftPM)]] Current high-end computing (HEC) applications explicitly manage persistent data, including both application state and application output. This practice not only increases development time and cost, but also requires an application developer to be intimately aware of the underlying platform-dependent storage mechanisms to achieve good application I/O performance. The Software Persistent Memory (SoftPM) project builds a lightweight infrastructure for streamlining data management in next-generation HEC applications. SoftPM eliminates the duality of data management in HEC applications by allowing applications to allocate persistent memory in much the same way volatile memory is allocated, and to easily restore, browse, and interact with past versions of persistent memory state. This simplifies the implementation of three broad capabilities required in HEC applications: recoverability (e.g., checkpoint-restart), record-replay (e.g., data visualization), and execution branching (e.g., simulation model-space exploration).
* [[projects:gerss-trees:|Generalized ERSS Tree Model]] Accurately characterizing the resource usage of an application at various levels in the memory hierarchy has been a long-standing research problem. The studies thus far have implicitly assumed that there is no contention for the resource under consideration. The inevitable future of virtualization-driven consolidation necessitates the sharing of physical resources at all levels of the memory hierarchy by multiple virtual machines. We present a unifying Generalized ERSS Tree Model that characterizes the resource usage at all levels of the memory hierarchy during the entire lifetime of an application.
Our model characterizes capacity requirements, the rate of use, and the impact of resource contention at each level of the memory hierarchy. We present a methodology to build the model and demonstrate how it can be used for accurate provisioning of the memory hierarchy in a consolidated environment.
* [[projects:iodedup:|I/O Deduplication]] Duplication of data in storage systems is becoming increasingly common. We introduce I/O Deduplication, a storage optimization that utilizes content similarity to improve I/O performance by eliminating I/O operations and reducing the mechanical delays incurred during I/O operations. I/O Deduplication consists of three main techniques: content-based caching, dynamic replica retrieval, and selective duplication.
* [[projects:srcmap:|Energy Proportional Storage Systems]] The Environmental Protection Agency (EPA) estimates that energy consumption at data centers could grow to 100 billion kWh by the year 2011, contributing to 2.9% of the total US electricity needs. This project investigates fundamentally new techniques for building energy proportional storage systems that consume energy in proportion to the I/O workload intensity. It takes the view that a carefully constructed data replication based approach (instead of data migration), combined with background data synchronization for consistency, provides a more effective mechanism for enabling dynamic storage consolidation.
* [[projects:virtualization:|Resource Management in Virtualized Data Center]] Performance models provide the ability to predict application performance for a given set of hardware resources and are used for capacity planning and resource management. Traditional performance models assume the availability of dedicated hardware for the application. This project builds performance models for applications in virtualized environments.
We identify a key set of virtualization-architecture-independent parameters that influence application performance for a diverse and representative set of applications. We propose an iterative model training technique based on artificial neural networks that is found to be accurate across a range of applications.
* [[projects:able:|Active Block-Layer Extensions]] The Active Block Layer Extensions (ABLE) project presents a new approach to realizing self-managing storage systems. It makes two contributions. First, it creates an evolvable block-layer software infrastructure that substantially reduces the complexity involved in building self-managing storage systems by raising the level of abstraction for their development. Second, it develops a theory of storage extensions that provides a logic framework for analyzing extensions developed at the block layer.
* [[projects:borg:|Block reORGanization for Self-optimizing Storage Systems]] BORG is a self-optimizing storage system that performs automatic block reorganization based on the observed I/O workload. BORG is motivated by three characteristics of I/O workloads: non-uniform access frequency distribution, temporal locality, and partial determinism in non-sequential accesses. To achieve its objective, BORG manages a small, dedicated partition on the disk drive, with the goal of servicing a majority of the I/O requests from within this partition with significantly reduced seek and rotational delays.
* [[projects:exces:|EXternal Caching in Energy Saving Storage Systems]] Power consumption within the disk-based storage subsystem forms a substantial portion of the overall energy footprint in commodity systems. We present the design and implementation of EXCES, an external caching system that employs prefetching, caching, and buffering of disk data to reduce disk activity.
EXCES addresses important questions related to external caching, including the estimation of future data popularity, I/O indirection, continuous reconfiguration of the external caching device (ECD) contents, and data consistency.

===== Past Projects =====

* [[projects:xml_storage:|Semistructured Storage Systems]]: Storage, data placement, semistructured data, semisequential placement, XML, DOM, SAX.
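The buffer-and-apply-asynchronously mechanism at the heart of the Non-blocking Writes project above can be mimicked in user space. The following is a hypothetical Python analogue (the class name and structure are illustrative only; the real mechanism operates inside the OS page-fault path on actual memory pages): a write to a non-resident page is recorded as a patch, the caller returns immediately, and a background thread fetches the page and merges the buffered writes.

```python
import threading

class NonBlockingPageWriter:
    """Toy user-space analogue of non-blocking writes: writes to a
    page that is not in memory are buffered, the caller is unblocked
    immediately, and a background thread fetches the page from the
    backing store and applies the buffered writes."""

    def __init__(self, backing_store):
        self.backing = backing_store   # page_no -> bytes (slow store)
        self.memory = {}               # page_no -> bytearray (in-core pages)
        self.patches = {}              # page_no -> list of (offset, data)
        self.lock = threading.Lock()

    def write(self, page_no, offset, data):
        with self.lock:
            if page_no in self.memory:             # fast path: page resident
                self.memory[page_no][offset:offset + len(data)] = data
                return
            # slow path: buffer the write and return without blocking
            first = page_no not in self.patches
            self.patches.setdefault(page_no, []).append((offset, data))
            if first:                              # one fetcher per page
                threading.Thread(target=self._fetch_and_apply,
                                 args=(page_no,), daemon=True).start()

    def _fetch_and_apply(self, page_no):
        page = bytearray(self.backing[page_no])    # simulated slow fetch
        with self.lock:
            for offset, data in self.patches.pop(page_no, []):
                page[offset:offset + len(data)] = data
            self.memory[page_no] = page            # page now resident
```

Once the background fetch completes, subsequent writes to the page take the resident fast path, which is exactly the benefit the project targets: computation proceeds during the fetch instead of blocking on it.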