Ideas for Potential Projects

Visualization and Analysis Support for Distributed Simulations using Software Persistent Memory

The goal of this project is to simplify the addition of visualization and online/post-hoc analysis support for distributed simulation runs. The proposed approach will use versioned SoftPM containers, one per simulation node. Versioned containers support the creation of persistence points (checkpoints) on a timeline, which a visualizer or analyzer can then retrieve and browse, either online or post-hoc. The initial goals of the project would be:

  1. coordinated versioning of simulation state within PRIME distributed instances
  2. implementation of an analyzer/visualizer application that feeds off of this versioned state
  3. integration with a Java-based viz engine that reflects state changes online

[Note: This project assumes that an implementation of versioned containers in SoftPM is available.]
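
As a rough illustration of the idea, the sketch below models one versioned container per simulation node, with persistence points taken at the same simulation time on every node. It is a toy in-memory stand-in and does not use the SoftPM API: the class `VersionedContainer`, its methods, and the `queue_len` state field are all hypothetical names chosen for this example.

```python
import copy


class VersionedContainer:
    """Toy in-memory stand-in for a versioned container (hypothetical, not SoftPM)."""

    def __init__(self):
        self.state = {}       # live simulation state for one node
        self._versions = []   # (sim_time, snapshot) pairs in creation order

    def persistence_point(self, sim_time):
        """Record a snapshot of the current state at simulation time sim_time."""
        self._versions.append((sim_time, copy.deepcopy(self.state)))

    def at(self, sim_time):
        """Return the latest snapshot taken at or before sim_time, or None."""
        result = None
        for t, snapshot in self._versions:
            if t <= sim_time:
                result = snapshot
        return result


def coordinated_point(containers, sim_time):
    """Take a persistence point on every node's container at the same sim time."""
    for container in containers:
        container.persistence_point(sim_time)


nodes = [VersionedContainer() for _ in range(2)]
nodes[0].state["queue_len"] = 3
nodes[1].state["queue_len"] = 5
coordinated_point(nodes, sim_time=10)
nodes[0].state["queue_len"] = 7
coordinated_point(nodes, sim_time=20)

# An analyzer can browse the timeline without disturbing the live state.
print([node.at(15)["queue_len"] for node in nodes])
```

A real implementation would need an actual coordination protocol between PRIME instances; here the "coordination" is just a loop over local objects.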

Making Distributed Simulation Recoverable with Software Persistent Memory

The goal of this project is to make PRIME recoverable after system crashes and power failures, in order to support long-running simulation runs on a large number of nodes. The project will use SoftPM simple containers to store the key data structures that constitute PRIME runtime state, and will design the recovery of simulation state and simulation execution. The project will address:

  1. identification of key simulation state and the appropriate encapsulation of such state within one or more containers
  2. implementation of coordinated checkpointing of distributed simulation state in PRIME
  3. exploration of the trade-off between the rates of incremental and full persistence points
  4. implementation of container-based recovery of simulation execution in PRIME

[Note: This project assumes that an implementation of simple containers in SoftPM is available.]
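
To make the incremental/full trade-off concrete, here is a minimal sketch, assuming state is a flat dictionary and that keys are never deleted. The class `CheckpointLog` and the parameter `full_every` are hypothetical and unrelated to the SoftPM or PRIME APIs. Full points cost more to write but bound how many points recovery has to read.

```python
class CheckpointLog:
    """Toy log of persistence points: a full point every `full_every` points,
    incremental (changed keys only) points in between. Key deletion is not handled."""

    def __init__(self, full_every):
        self.full_every = full_every
        self.points = []   # list of ("full" | "incr", dict)
        self._last = {}    # state as of the previous persistence point

    def persist(self, state):
        if len(self.points) % self.full_every == 0:
            self.points.append(("full", dict(state)))
        else:
            delta = {k: v for k, v in state.items()
                     if k not in self._last or self._last[k] != v}
            self.points.append(("incr", delta))
        self._last = dict(state)

    def recover(self):
        """Rebuild the latest state; also return how many points were read."""
        if not self.points:
            return {}, 0
        start = max(i for i, (kind, _) in enumerate(self.points) if kind == "full")
        state = dict(self.points[start][1])
        for _, delta in self.points[start + 1:]:
            state.update(delta)
        return state, len(self.points) - start


log = CheckpointLog(full_every=3)
for clock, events in [(0, 10), (1, 10), (2, 12), (3, 12), (4, 15)]:
    log.persist({"clock": clock, "events": events})

recovered, points_read = log.recover()
print(recovered, points_read)
```

Raising `full_every` shrinks the average persistence point but lengthens the chain that recovery must replay, which is the rate trade-off listed in item 3.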

Speeding Up HEC I/O with Solid State Drives

FIXME What is HEC?

The goal of this project is to explore and optimize the use of solid-state drives (SSDs) in HEC platforms. SSDs have very different cost/reliability/performance characteristics than hard disk drives (HDDs). Consequently, SSDs can be used in many ways, and this project will conduct a systematic analysis of these possibilities and develop designs to address each of them:

  1. compute node persistent storage (used as a large, persistent cache for PFS data and staging checkpoints)
  2. server-side read cache and write buffer for speeding up I/O performance of PFS servers
  3. optimizing data layout and I/O scheduling on SSDs for parallel I/O workloads

[Note: This project assumes the availability of substantial SSD storage for constructing the testbed.]
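
The read-cache/write-buffer role in item 2 can be sketched as follows. This is only an illustration under simple assumptions: an LRU replacement policy, a write-back buffer, and a Python dictionary standing in for the slower PFS or HDD tier. The class name `SSDCache` and its methods are hypothetical.

```python
from collections import OrderedDict


class SSDCache:
    """Toy LRU read cache plus write-back buffer in front of a slower store."""

    def __init__(self, backing, capacity):
        self.backing = backing        # dict standing in for the PFS/HDD tier
        self.capacity = capacity      # number of blocks the "SSD" can hold
        self.blocks = OrderedDict()   # block -> (data, dirty), LRU order
        self.hits = 0
        self.misses = 0

    def _insert(self, block, data, dirty):
        self.blocks[block] = (data, dirty)
        self.blocks.move_to_end(block)
        while len(self.blocks) > self.capacity:
            old, (old_data, old_dirty) = self.blocks.popitem(last=False)
            if old_dirty:             # write back dirty data on eviction
                self.backing[old] = old_data

    def read(self, block):
        if block in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block)
            return self.blocks[block][0]
        self.misses += 1
        data = self.backing[block]
        self._insert(block, data, dirty=False)
        return data

    def write(self, block, data):
        self._insert(block, data, dirty=True)   # buffered, not yet on backing

    def flush(self):
        for block, (data, dirty) in list(self.blocks.items()):
            if dirty:
                self.backing[block] = data
                self.blocks[block] = (data, False)


backing = {0: "a", 1: "b", 2: "c"}
cache = SSDCache(backing, capacity=2)
cache.read(0)                 # miss, fetched from backing
cache.read(0)                 # hit
cache.write(1, "B")           # absorbed by the write buffer
cache.read(2)                 # miss, evicts clean block 0
buffered = backing[1]         # backing store still holds the old data
cache.flush()
print(cache.hits, cache.misses, buffered, backing[1])
```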

Full System Memory Traces Using QEMU

Contact: Luis or Ricardo

In this project we want to instrument QEMU to trace full-system memory accesses. We already have an implementation that solves many of the challenges we found in this project. The open questions are:

  1. Are we logging all accesses? The answer to this question depends on what the traces are needed for. In our current implementation, we log only those accesses that are more likely to generate a fault.
  2. What information are we logging? The page tables hold plenty of information and we currently log all of it. Two important pieces are still missing, and each requires a considerable amount of work:
    • Time: There is no easy answer here. Using the host time has three big problems: (1) the host CPU scheduler can create large time gaps in the traces, (2) host time can give the impression that the accesses are slow, and (3) QEMU's internal processing can make the gaps between accesses uneven. Another option is to use the QEMU instruction count, but instructions != time. We haven't been able to solve this problem yet.
    • Anonymous or FS: Is the accessed page an anonymous page or a file system page? This information is available only in the guest kernel, so we need some way for the guest OS and QEMU to communicate. One way of doing this is to create a new interrupt in QEMU and implement its handler in the guest OS. This would be general enough to work with any OS. We haven't done this yet. What other scenarios would make this useful?
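
The timing problem can be illustrated with a small sketch. The record layout below is hypothetical (it is not the format of our existing implementation and uses no QEMU API); it simply carries both candidate clocks so the gaps between accesses can be compared, and leaves the page kind empty until the guest can report it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AccessRecord:
    """Hypothetical trace record carrying both candidate clocks."""
    vaddr: int
    is_write: bool
    host_ns: int                      # host wall-clock time in nanoseconds
    icount: int                       # guest instructions executed so far
    page_kind: Optional[str] = None   # "anon" or "file"; None until the guest reports it


def gaps(records, clock):
    """Differences between consecutive accesses under the chosen clock field."""
    values = [getattr(r, clock) for r in records]
    return [b - a for a, b in zip(values, values[1:])]


trace = [
    AccessRecord(vaddr=0x1000, is_write=False, host_ns=0, icount=0),
    AccessRecord(vaddr=0x2000, is_write=True, host_ns=100, icount=40),
    # Made-up example: the host descheduled the emulator before this access.
    AccessRecord(vaddr=0x3000, is_write=False, host_ns=5_000_100, icount=80),
]

print(gaps(trace, "host_ns"))
print(gaps(trace, "icount"))
```

In this made-up trace the host clock shows one very large gap while the instruction count stays even, which is the distortion described in problem (1); neither clock by itself gives guest-visible time.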

Multi-location Blocks I/O Scheduler

Contact: Luis or Ricardo

The idea of this project is to design an I/O scheduler for requests whose data is available in more than one location. The same data can reside in more than one location in storage systems with redundancy, such as RAID 1, or in deduplicated systems.

For this project we were trying to create a scheduler with theoretical support using competitive analysis. This was already done for commodity storage systems in the paper “New Algorithms for Disk Scheduling”. We already have some results that we obtained in the Topics of Algorithms class.
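
To show what "more than one location" means for a scheduler, here is a simple greedy baseline. It is not the algorithm from the cited paper and carries no competitive guarantee; it assumes each request lists (disk, block) replicas and that cost is the seek distance from that disk's current head position. The function name `schedule` and the data layout are hypothetical.

```python
def schedule(requests, heads):
    """Greedy baseline: repeatedly serve the (request, replica) pair with the
    smallest seek distance from the current head position of its disk.

    requests: list of replica lists, each replica a (disk, block) tuple
    heads:    dict mapping disk name -> current head position
    """
    heads = dict(heads)                   # do not mutate the caller's dict
    pending = dict(enumerate(requests))   # request id -> replica list
    order = []
    total_seek = 0
    while pending:
        cost, rid, disk, block = min(
            (abs(heads[d] - b), rid, d, b)
            for rid, replicas in pending.items()
            for d, b in replicas
        )
        heads[disk] = block
        total_seek += cost
        order.append((rid, disk, block))
        del pending[rid]
    return order, total_seek


requests = [
    [("A", 90), ("B", 95)],   # mirrored data, as in RAID 1
    [("A", 10)],              # single copy
    [("A", 50), ("B", 40)],
]
order, total_seek = schedule(requests, heads={"A": 0, "B": 100})
print(order, total_seek)
```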

Hybrid Storage Device

Contact: Luis

The idea of this project is to study how we can combine multiple heterogeneous devices and export a single block array to the system. The goal is for the combination of these devices to improve performance, reliability, power efficiency, etc. This project brings two important challenges:

  1. Optimal data layout in both devices.
  2. How to schedule multiple requests when more than one option is available. Note that all writes have an option of where they can reside. Also note that the previous project is relevant to this point.

Some of these questions were partially answered in previous meetings and in projects we developed before.
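
As a starting point for discussion, the sketch below exports a single logical block array over two devices and decides where each write resides. The placement policy (fill the fast device first, then fall back to the slow one) is an assumption made only for illustration and is not an optimal layout; `HybridDevice` and its fields are hypothetical names.

```python
class HybridDevice:
    """Toy single block array backed by a small fast device and a large slow one."""

    def __init__(self, fast_capacity):
        self.fast_capacity = fast_capacity
        self.fast = {}        # blocks stored on the fast device
        self.slow = {}        # blocks stored on the slow device
        self.location = {}    # logical block -> "fast" or "slow"

    def write(self, lba, data):
        where = self.location.get(lba)
        if where != "fast" and len(self.fast) < self.fast_capacity:
            self.slow.pop(lba, None)   # migrate if the block lived on slow
            where = "fast"
        elif where is None:
            where = "slow"             # fast device is full: new block goes to slow
        getattr(self, where)[lba] = data
        self.location[lba] = where

    def read(self, lba):
        return getattr(self, self.location[lba])[lba]


dev = HybridDevice(fast_capacity=2)
dev.write(0, "x")
dev.write(1, "y")
dev.write(2, "z")      # fast device is full, lands on slow
dev.write(0, "x2")     # overwrite in place on fast
print(dev.location, dev.read(0), dev.read(2))
```

A smarter policy would use access frequency or request size to choose the device, which is where the layout and scheduling challenges above come in.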

Coordinated guest I/O schedulers

Contact: Ricardo

internal/ideas/idea-list.txt · Last modified: 2024/06/28 20:42 by 127.0.0.1