
Trace Analysis and Replay

Participants

  • Daniel Campello
  • Humberto Chacon
  • Christopher Kerutt
  • Hector Lopez
  • Steven Lyons
  • Jason Liu
  • Raju Rangaswami

Project Goals

Analyze all kinds of storage access traces, including block, file, cache, and syscall levels.

Replay syscall traces

The initial phase of the project, through December 2013, involved:

Building a trace utility (based on strace) to record the file-system-oriented system calls (to and from the HDD) issued by a single process. The goal is to produce a trace that can be used to evaluate caching efficiency.
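
Such a tracer might be driven and post-processed along these lines. This is only a sketch: the strace flags shown (`-f`, `-ttt`, `-e trace=...`, `-o`) are standard, but the exact log format varies by strace version, and the parsed field layout here is an assumption.

```python
import re

# Example invocation (shell), assuming common strace options:
#   strace -f -ttt -e trace=open,openat,read,write,lseek,close \
#          -o app.strace ./app
#
# With -ttt each line begins with an epoch timestamp, e.g.:
#   1373907312.345678 read(3, "abcd", 4096) = 4096

# Parse timestamp, syscall name, fd (first argument, when numeric),
# and return value from one strace output line.
LINE_RE = re.compile(
    r'^(?P<ts>\d+\.\d+)\s+(?P<call>\w+)\((?P<fd>\d+)?.*\)\s*=\s*(?P<ret>-?\d+)'
)

def parse_strace_line(line):
    """Return (timestamp, syscall, fd_or_None, return_value), or None."""
    m = LINE_RE.match(line)
    if not m:
        return None
    fd = int(m.group('fd')) if m.group('fd') else None
    return (float(m.group('ts')), m.group('call'), fd, int(m.group('ret')))

print(parse_strace_line('1373907312.345678 read(3, "abcd", 4096) = 4096'))
```

A real post-processor would also track open/close to map fds back to path names before feeding the accesses into a cache simulator.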

Reading List

    • Home use application (monolithically developed, heavy API, interactive): iBench including iWork and iLife
    • Use DTrace (system calls, stack traces, in-kernel functions such as page-in/outs); AppleScript for repeatable and automated runs
    • File is not a file: documents organized into complex directory trees
    • Sequential access is not sequential: pure sequential access is rare (meta-data, headers access more often, out of sequence)
    • Many auxiliary files; writes are forced (fsync ‘misuse’); renaming is popular; heavy use of multithreading to hide I/O latency
    • Measurements at disk level or RAID-controller level, through SCSI/IDE analyzer attached to the IO bus intercepting electrical signals (in 2004)
    • Application-dependent variability: r/w ratio, access pattern, write traffic
    • Environment-dependent variability: request arrival rate, service time, response time, sequentiality, idle time
    • Burstiness is consistent through all workloads
    • Measurement environments (not enough information provided due to NDA) include: enterprise (HPC systems, RAID, for web, database, and email); desktops (PCs, single disk, single-user applications); consumer electronics (personal video recorder, MP3 player, game console, digital camera)
    • Major findings:
      • Disks are idle (high percentage bus idle), although idle interval is environment dependent
      • Average response time is only a few milliseconds
      • Access pattern is more random for enterprise than desktop; CE is highly sequential (video recording); use ‘degree of sequentiality’
      • Request size varies but variability is low
      • Use ‘rewrite distance’ (my term) for write lifetime; it’s application dependent
      • Inter-arrival time varies greatly; it shows long-range dependency (using Hurst parameter)
      • Seek distances exhibit “extreme long-range dependence”; locality is an inherent characteristic of disk-drive workloads (?)
    • Two workloads: EECS and CAMPUS
    • EECS is research workload for home directories; dominated by metadata requests (for cache consistency) and read/write ratio of less than 1.
    • CAMPUS workload is almost entirely email; all files can be categorized according to file names with predictable size, lifespan and access pattern.
    • Issues with NFS tracing: hidden file system operations (never-accessed files, no on-disk layout, file hierarchy (can be learned), no internal state of the server); mismatch with the NFS interface (no open/close), client-side caching, lost NFS communications, network reordering
    • Detecting “runs” (can be defined as contiguous accesses, with blocks rounding up to 8k, to a file with gap no larger than 30 seconds; also need to sort accesses within a reorder window of a few milliseconds)
    • Runs are classified as entire/sequential/random (both traces mostly sequential sub-runs separated by short seeks), read-only/write-only/read-write (big difference between the two traces)
    • Lifespan of blocks: over 1/2 blocks in EECS die in less than 1 second (log or index files), few more than a day; CAMPUS blocks live longer (due to email client operations)
    • EECS problems found: users store temporary files (web page caching, dot files, Applet files) in home directories
    • CAMPUS problems found: email behavior (flat user inbox file, lock files)
    • Large variations of workload over time (diurnal pattern) observed
    • Sequentiality metric (delta-consecutive vs. reorder window)?
    • nfsdump is used to gather traces at the NFS server (like tcpdump, using libpcap) and outputs human-readable text
    • nfsscan is for data processing and outputs one or more tables (containing the total number of NFS operations, the times and latencies of these operations, and information on the files accessed)
    • A set of utility tools helps dissect the data output by nfsscan and prepare it for gnuplot
    • One can obtain the following information: workload intensity over time, read/write, data/meta-data; overall, per client, per user, per directory, per file
    • Collected traces from 4 environments: instruction lab, research lab, web (all HP-UX), and NT cluster.
    • Histogram of file system events: read dominates; web has significantly more reads; high number of file stat calls
    • Block lifetime: some traces show bimodal distribution; most blocks die due to overwrites and there’s a high degree of locality in overwritten files; average block lifetime is longer than estimated (for write delay)
    • Caching effect: small write buffer is sufficient even for write delays up to a day; small cache sufficient to absorb most read traffic; cache effect on reads or writes varies…
    • File size distribution, file access patterns: mostly reads, mostly sequential
    • File I/O traces of VAX/VMS systems collected at 8 sites
    • System characteristics: #files, file size distribution, % active files/data, #IOs, % control vs. access ops, file creation/deletion ratios
    • File access characteristics: #opens/active file (distribution), #reads/active file (distribution), #writes/active file (distribution), open-to-close, close-to-open timing, read/write activities within open to close intervals
    • Process access characteristics: #users, #processes, process lifetime, #open files/process, #file ops/process, inter-open time distribution
    • File sharing: “simultaneous sharing” is low; “sequential sharing” (not necessarily opening files at the same time) is 2-4 times more common (further classified into read-only, write-only, and read-and-write)
    • Workload analysis: relatively stable behavior observed for IO operations (intensity, file control ops, read/write)
    • Sprite distributed file system, 40 diskless workstations, 4 file servers
    • File system activities are collected at server end (some require kernel modifications)
    • Measurements/statistics on caching collected through counters
    • Two main changes from '85 study: larger files, higher intensity (more burstiness)
    • Instrumented BSD Unix (4.2), timeshared VAX-11/780s at UC Berkeley EECS
    • Record only user file system activities (open, close, seek, unlink, truncate, exec)
    • No reads and writes (no location and timing for individual disk IO, no information on disk access due to paging)
    • Main results: low IO activity; most file accesses are sequential; short file open time; short file lifetime; caching can have significant effect
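
The “run” detection described in the NFS-trace notes above (contiguous accesses to a file, sizes rounded up to 8 KB, gaps no larger than 30 seconds) could be sketched as follows. Assumptions: accesses arrive as (time, offset, size) tuples for a single file, the sequential test requires each access to start exactly where the previous one ended, and the few-millisecond reorder window is omitted for brevity.

```python
BLOCK = 8192      # round request sizes up to 8 KB, per the notes
MAX_GAP = 30.0    # a run ends after a gap longer than 30 seconds

def round_up(n, b=BLOCK):
    return ((n + b - 1) // b) * b

def split_runs(accesses):
    """Group time-sorted (time, offset, size) accesses to one file into runs."""
    runs, cur = [], []
    for t, off, size in sorted(accesses):
        if cur and t - cur[-1][0] > MAX_GAP:
            runs.append(cur)
            cur = []
        cur.append((t, off, round_up(size)))
    if cur:
        runs.append(cur)
    return runs

def classify(run):
    """'sequential' if every access starts where the previous one ended."""
    for (_, off0, sz0), (_, off1, _) in zip(run, run[1:]):
        if off1 != off0 + sz0:
            return 'random'
    return 'sequential'

trace = [(0.0, 0, 8192), (1.0, 8192, 8192), (2.0, 16384, 4096),
         (100.0, 0, 8192)]           # the 98 s gap starts a second run
runs = split_runs(trace)
print(len(runs), [classify(r) for r in runs])
```

A fuller version would also tag runs as read-only/write-only/read-write and as entire-file, as in the classification above.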

Meetings

  • 07/13/15: Updates on module; kprobe takes over
  • 07/02/15: (Hector) Updates on module; search for missing records; one bug found
  • 06/30/15: Updates from Humberto on ARTC and Hector on
  • 06/23/15: (Hector) Updates on module; search for missing records continues
  • 06/23/15: (Humberto) Trace discrepancies with ARTc
  • 06/19/15: (Hector) Initial investigation of lost trace records
  • 06/16/15: (Hector) Various updates
  • 06/16/15: (Humberto) ARTc port v1 complete
  • 06/09/15: (Humberto) Updates on ARTc port
  • 05/22/15: (Humberto) Initial port to ARTc replayer
  • 05/22/15: (Hector) Current status of replayer (single-threaded)
  • 12/23/13: Handle multi-page requests; Understanding the cache simulator code
  • 11/25/13: Next step: MRC Construction using LRU cache simulator
  • 10/28/13: Final remaining tasks for user-level page cache simulation
  • 10/14/13: First cut – need to work on bug and sendfile implementation
  • 09/30/13: Problems with script
  • 09/16/13: First MRC, discussion of project directories
  • 08/21/13: Implementation – first version
  • 08/07/13: Next steps – data generation
  • 07/31/13: Updates on systrace (-e) options, recording page IDs, and other status
  • 07/24/13: Review of initial document, scrum tasks, and overview of strace based scripting and expected output
  • 07/17/13: Project introduction
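
MRC construction with an LRU cache simulator (the 11/25/13 item above) is commonly done by histogramming reuse (stack) distances, since an access at distance d hits in any LRU cache of size >= d. A minimal single-pass sketch, assuming page IDs are hashable keys and cache sizes are counted in pages:

```python
from collections import OrderedDict

def miss_ratio_curve(trace, max_size):
    """One LRU-stack pass yields the miss ratio at every cache size."""
    stack = OrderedDict()           # MRU page is last
    hist = [0] * (max_size + 1)     # hist[d] = accesses at reuse distance d
    for page in trace:
        if page in stack:
            # distance = position from the MRU end (1-based); the linear
            # scan is O(n) per access -- fine for a sketch, a tree-based
            # structure would be used for large traces
            d = list(reversed(stack)).index(page) + 1
            if d <= max_size:
                hist[d] += 1
            stack.move_to_end(page)
        else:
            stack[page] = True      # cold miss: no reuse distance
    total = len(trace)
    curve, hits = [], 0
    for size in range(1, max_size + 1):
        hits += hist[size]
        curve.append((size, 1 - hits / total))
    return curve

trace = ['a', 'b', 'a', 'c', 'a', 'b']
print(miss_ratio_curve(trace, 2))
```

Feeding in the page IDs recorded by the tracer produces the MRC directly, without re-simulating the cache once per size.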

Current MRC results
