
Trace Analysis and Replay

Participants

  • Daniel Campello
  • Humberto Chacon
  • Christopher Kerutt
  • Hector Lopez
  • Steven Lyons
  • Jason Liu
  • Raju Rangaswami

Project Goals

Analyze all kinds of storage access traces, including block, file, cache, and syscall levels.

Replay syscall traces

The initial phase of the project, through December 2013, involved:

Building a trace utility (based on strace) to record the file-system-oriented system calls (to and from the HDD) issued by a single process. The goal is to produce a trace that can be used to evaluate caching efficiency.
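
Such a tracer might be driven and post-processed along these lines. This is only a sketch: the strace flags shown (`-f`, `-ttt`, `-e trace=...`, `-o`) are standard, but the exact log format varies by strace version, and the parsed field layout here is an assumption.

```python
import re

# Example invocation (shell), assuming common strace options:
#   strace -f -ttt -e trace=open,openat,read,write,lseek,close \
#          -o app.strace ./app
#
# With -ttt each line begins with an epoch timestamp, e.g.:
#   1373907312.345678 read(3, "abcd", 4096) = 4096

# Parse timestamp, syscall name, fd (first argument, when numeric),
# and return value from one strace output line.
LINE_RE = re.compile(
    r'^(?P<ts>\d+\.\d+)\s+(?P<call>\w+)\((?P<fd>\d+)?.*\)\s*=\s*(?P<ret>-?\d+)'
)

def parse_strace_line(line):
    """Return (timestamp, syscall, fd_or_None, return_value), or None."""
    m = LINE_RE.match(line)
    if not m:
        return None
    fd = int(m.group('fd')) if m.group('fd') else None
    return (float(m.group('ts')), m.group('call'), fd, int(m.group('ret')))

print(parse_strace_line('1373907312.345678 read(3, "abcd", 4096) = 4096'))
```

A real post-processor would also track open/close to map fds back to path names before feeding the accesses into a cache simulator.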

Reading List

    • Home use application (monolithically developed, heavy API, interactive): iBench including iWork and iLife
    • Use DTrace (system calls, stack traces, in-kernel functions such as page-in/outs); AppleScript for repeatable and automated runs
    • File is not a file: documents organized into complex directory trees
    • Sequential access is not sequential: pure sequential access is rare (meta-data, headers access more often, out of sequence)
    • Many auxiliary files; writes are forced (fsync ‘misuse’); renaming is popular; heavy use of multithreading to hide I/O latency
    • Measurements at disk level or RAID-controller level, through SCSI/IDE analyzer attached to the IO bus intercepting electrical signals (in 2004)
    • Application-dependent variability: r/w ratio, access pattern, write traffic
    • Environment-dependent variability: request arrival rate, service time, response time, sequentiality, idle time
    • Burstiness is consistent through all workloads
    • Measurement environments (not enough information provided due to NDA) include: enterprise (HPC systems, RAID, for web, database, and email); desktops (PCs, single disk, single-user applications); consumer electronics (personal video recorder, MP3 player, game console, digital camera)
    • Major findings:
      • Disks are idle (high percentage bus idle), although idle interval is environment dependent
      • Average response time is only a few milliseconds
      • Access pattern is more random for enterprise than desktop; CE is highly sequential (video recording); use ‘degree of sequentiality’
      • Request size varies but variability is low
      • Use ‘rewrite distance’ (my term) for write lifetime; it’s application dependent
      • Inter-arrival time varies greatly; it shows long-range dependency (using Hurst parameter)
      • Seek distances exhibit “extreme long-range dependence”; locality is an inherent characteristic of disk-drive workloads (?)
    • Two workloads: EECS and CAMPUS
    • EECS is research workload for home directories; dominated by metadata requests (for cache consistency) and read/write ratio of less than 1.
    • CAMPUS workload is almost entirely email; all files can be categorized according to file names with predictable size, lifespan and access pattern.
    • Issues with NFS tracing: hidden file system operations (never-accessed files, no on-disk layout, file hierarchy (can be learned), no internal state of the server); mismatch with the NFS interface (no open/close), client-side caching, lost NFS communications, network reordering
    • Detecting “runs” (can be defined as contiguous accesses, with blocks rounding up to 8k, to a file with gap no larger than 30 seconds; also need to sort accesses within a reorder window of a few milliseconds)
    • Runs are classified as entire/sequential/random (both traces mostly sequential sub-runs separated by short seeks), read-only/write-only/read-write (big difference between the two traces)
    • Lifespan of blocks: over 1/2 blocks in EECS die in less than 1 second (log or index files), few more than a day; CAMPUS blocks live longer (due to email client operations)
    • EECS problems found: users store temporary files (web page caching, dot files, Applet files) in home directories
    • CAMPUS problems found: email behavior (flat user inbox file, lock files)
    • Large variations of workload over time (diurnal pattern) observed
    • Sequentiality metric (delta-consecutive vs. reorder window)?
    • nfsdump is used to gather traces at the NFS server (like tcpdump, using libpcap) and outputs human-readable text
    • nfsscan is for data processing and outputs one or more tables (containing the total number of NFS operations, the times and latencies of these operations, and information on the files accessed)
    • A set of utility tools helps dissect the data output by nfsscan and prepare it for gnuplot
    • One can obtain the following information: workload intensity over time, read/write, data/meta-data; overall, per client, per user, per directory, per file
    • Collected traces from 4 environments: instruction lab, research lab, web (all HP-UX), and NT cluster.
    • Histogram of file system events: read dominates; web has significantly more reads; high number of file stat calls
    • Block lifetime: some traces show bimodal distribution; most blocks die due to overwrites and there’s a high degree of locality in overwritten files; average block lifetime is longer than estimated (for write delay)
    • Caching effect: small write buffer is sufficient even for write delays up to a day; small cache sufficient to absorb most read traffic; cache effect on reads or writes varies…
    • File size distribution, file access patterns: mostly reads, mostly sequential
    • File I/O traces of VAX/VMS systems collected at 8 sites
    • System characteristics: #files, file size distribution, % active files/data, #IOs, % control vs. access ops, file creation/deletion ratios
    • File access characteristics: #opens/active file (distribution), #reads/active file (distribution), #writes/active file (distribution), open-to-close, close-to-open timing, read/write activities within open to close intervals
    • Process access characteristics: #users, #processes, process lifetime, #open files/process, #file ops/process, inter-open time distribution
    • File sharing: “simultaneous sharing” is low; “sequential sharing” (not necessarily opening files at the same time) is 2-4 times more common (further classified into read-only, write-only, and read-and-write)
    • Workload analysis: relatively stable behavior observed for IO operations (intensity, file control ops, read/write)
    • Sprite distributed file system, 40 diskless workstations, 4 file servers
    • File system activities are collected at server end (some require kernel modifications)
    • Measurements/statistics on caching collected through counters
    • Two main changes from '85 study: larger files, higher intensity (more burstiness)
    • Instrumented BSD Unix (4.2), timeshared VAX-11/780s at UC Berkeley EECS
    • Record only user file system activities (open, close, seek, unlink, truncate, exec)
    • No reads and writes (no location and timing for individual disk IO, no information on disk access due to paging)
    • Main results: low IO activity; most file accesses are sequential; short file open time; short file lifetime; caching can have significant effect
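
The “run” detection described in the NFS-trace notes above (contiguous accesses to a file, sizes rounded up to 8 KB, gaps no larger than 30 seconds) could be sketched as follows. Assumptions: accesses arrive as (time, offset, size) tuples for a single file, the sequential test requires each access to start exactly where the previous one ended, and the few-millisecond reorder window is omitted for brevity.

```python
BLOCK = 8192      # round request sizes up to 8 KB, per the notes
MAX_GAP = 30.0    # a run ends after a gap longer than 30 seconds

def round_up(n, b=BLOCK):
    return ((n + b - 1) // b) * b

def split_runs(accesses):
    """Group time-sorted (time, offset, size) accesses to one file into runs."""
    runs, cur = [], []
    for t, off, size in sorted(accesses):
        if cur and t - cur[-1][0] > MAX_GAP:
            runs.append(cur)
            cur = []
        cur.append((t, off, round_up(size)))
    if cur:
        runs.append(cur)
    return runs

def classify(run):
    """'sequential' if every access starts where the previous one ended."""
    for (_, off0, sz0), (_, off1, _) in zip(run, run[1:]):
        if off1 != off0 + sz0:
            return 'random'
    return 'sequential'

trace = [(0.0, 0, 8192), (1.0, 8192, 8192), (2.0, 16384, 4096),
         (100.0, 0, 8192)]           # the 98 s gap starts a second run
runs = split_runs(trace)
print(len(runs), [classify(r) for r in runs])
```

A fuller version would also tag runs as read-only/write-only/read-write and as entire-file, as in the classification above.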

Meetings

  • 07/13/15: Updates on module; kprobe takes over
  • 07/02/15: (Hector) Updates on module; search for missing records; one bug found
  • 06/30/15: Updates from Humberto on ARTC and Hector on
  • 06/23/15: (Hector) Updates on module; search for missing records continues
  • 06/23/15: (Humberto) Trace discrepancies with ARTc
  • 06/19/15: (Hector) Initial investigation of lost trace records
  • 06/16/15: (Hector) Various updates
  • 06/16/15: (Humberto) ARTc port v1 complete
  • 06/09/15: (Humberto) Updates on ARTc port
  • 05/22/15: (Humberto) Initial port to ARTc replayer
  • 05/22/15: (Hector) Current status of replayer (single-threaded)
  • 12/23/13: Handle multi-page requests; Understanding the cache simulator code
  • 11/25/13: Next step: MRC Construction using LRU cache simulator
  • 10/28/13: Final remaining tasks for user-level page cache simulation
  • 10/14/13: First cut – need to work on bug and sendfile implementation
  • 09/30/13: Problems with script
  • 09/16/13: First MRC, discussion of project directories
  • 08/21/13: Implementation – first version
  • 08/07/13: Next steps – data generation
  • 07/31/13: Updates on systrace (-e) options, recording page IDs, and other status
  • 07/24/13: Review of initial document, scrum tasks, and overview of strace based scripting and expected output
  • 07/17/13: Project introduction
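
MRC construction with an LRU cache simulator (the 11/25/13 item above) is commonly done by histogramming reuse (stack) distances, since an access at distance d hits in any LRU cache of size >= d. A minimal single-pass sketch, assuming page IDs are hashable keys and cache sizes are counted in pages:

```python
from collections import OrderedDict

def miss_ratio_curve(trace, max_size):
    """One LRU-stack pass yields the miss ratio at every cache size."""
    stack = OrderedDict()           # MRU page is last
    hist = [0] * (max_size + 1)     # hist[d] = accesses at reuse distance d
    for page in trace:
        if page in stack:
            # distance = position from the MRU end (1-based); the linear
            # scan is O(n) per access -- fine for a sketch, a tree-based
            # structure would be used for large traces
            d = list(reversed(stack)).index(page) + 1
            if d <= max_size:
                hist[d] += 1
            stack.move_to_end(page)
        else:
            stack[page] = True      # cold miss: no reuse distance
    total = len(trace)
    curve, hits = [], 0
    for size in range(1, max_size + 1):
        hits += hist[size]
        curve.append((size, 1 - hits / total))
    return curve

trace = ['a', 'b', 'a', 'c', 'a', 'b']
print(miss_ratio_curve(trace, 2))
```

Feeding in the page IDs recorded by the tracer produces the MRC directly, without re-simulating the cache once per size.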

Current MRC results
