====== Trace Analysis and Replay ======

===== Participants =====

  * Daniel Campello
  * Humberto Chacon
  * Christopher Kerutt
  * Hector Lopez
  * Steven Lyons
  * [[http://www.linkedin.com/pub/andy-norcisa/53/299/415/|Andy Norcisa]]
  * Jason Liu
  * Raju Rangaswami
  
===== Project Goals =====

Analyze storage access traces at all levels, including block, file, cache, and syscall.

Replay syscall traces.

The initial phase of the project, through December 2013, involved:

 Building a trace utility that uses strace to record the file-system-oriented system calls (those moving data from and into the HDD) issued by a single process. The goal is to create a trace that can be used to evaluate caching efficiency.
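
As a rough illustration of the strace-based recording described above, the sketch below parses one line of strace output into a record. It assumes the trace was captured with something like ''strace -ttt -T -e trace=open,read,write,close -o out.txt <command>'' (epoch timestamps plus per-call durations). The regular expression, the ''parse_line'' name, and the record fields are hypothetical and are not taken from the project's actual scripts; lines with non-integer return values (for example pointers or ''?'') are simply skipped.

```python
import re

# Assumed line shape from `strace -ttt -T`:
#   <epoch.usec> <name>(<args>) = <ret> [ERRNO (text)] <<seconds>>
LINE_RE = re.compile(
    r"^(?P<ts>\d+\.\d+)\s+"
    r"(?P<name>\w+)\((?P<args>.*)\)\s+=\s+"
    r"(?P<ret>-?\d+)"
    r"(?:\s+[A-Z][^<]*?)?"            # optional errno text, e.g. "ENOENT (No such file ...)"
    r"(?:\s+<(?P<dur>\d+\.\d+)>)?$"   # optional duration added by -T
)

def parse_line(line):
    """Return a dict for one syscall line, or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "ts": float(m.group("ts")),
        "name": m.group("name"),
        "args": m.group("args"),
        "ret": int(m.group("ret")),
        "dur": float(m.group("dur")) if m.group("dur") else None,
    }

if __name__ == "__main__":
    print(parse_line('1700000000.123456 read(3, "abc", 4096) = 3 <0.000020>'))
```

A record stream like this could then be filtered by call name and file descriptor before being fed to a cache simulator.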


===== Reading List =====

  * {{internal:projects:trace:not-a-file.pdf|SOSP'13: A File is not a File}}
    * Home use application (monolithically developed, heavy API, interactive): iBench including iWork and iLife
    * Use DTrace (system calls, stack traces, in-kernel functions such as page-in/outs); AppleScript for repeatable and automated runs
    * File is not a file: documents organized into complex directory trees
    * Sequential access is not sequential: pure sequential access is rare (meta-data, headers access more often, out of sequence)
    * Many auxiliary files; writes are forced (fsync ‘misuse’); renaming is popular; heavy use of multithreading for I/O latency hiding

  * {{http://research.cs.wisc.edu/multifacet/papers/sigmetrics13_caches_reuse.pdf|SIGMETRICS’13: Reuse-based Online Models for Caches}}

  * {{http://www.cs.cornell.edu/projects/spinglass/public_pdfs/File%20System%20Usage.pdf|SOSP’99: File System Usage in Windows NT 4.0}}

  * {{internal:projects:trace:leung-usenix08.pdf|USENIX'08: Measurement and Analysis of Large-Scale Network File System Workloads}}

  * {{https://www.usenix.org/legacy/event/usenix06/tech/full_papers/riska/riska.pdf|USENIX’06: Disk Drive Level Workload Characterization}}
    * Measurements at disk level or RAID-controller level, through SCSI/IDE analyzer attached to the IO bus intercepting electrical signals (in 2004)
    * Application-dependent variability: r/w ratio, access pattern, write traffic
    * Environment-dependent variability: request arrival rate, service time, response time, sequentiality, idle time
    * Burstiness is consistent through all workloads
    * Measurement environments (not enough information provided due to NDA) include: enterprise (HPC systems, RAID, for web, database, and email); desktops (PCs, single disk, single-user applications); consumer electronics (personal video recorder, MP3 player, game console, digital camera)
    * Major findings:
      * Disks are idle (high percentage bus idle), although idle interval is environment dependent
      * Average response time is only a few milliseconds
      * Access pattern is more random for enterprise than desktop; CE is highly sequential (video recording); use ‘degree of sequentiality’
      * Request size varies but variability is low
      * Use ‘rewrite distance’ (my term) for write lifetime; it’s application dependent
      * Inter-arrival time varies greatly; it shows long-range dependency (using Hurst parameter)
      * Seek distances exhibit “extreme long-range dependence”; locality is an inherent characteristic of disk-drive workloads (?)
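
One simple way to put a number on the ‘degree of sequentiality’ mentioned above is the fraction of requests that begin exactly where the previous request ended. The sketch below uses that definition as an assumption; the paper's exact definition may differ, and the function name and the ''(start_block, length)'' input format are hypothetical.

```python
def degree_of_sequentiality(requests):
    """requests: list of (start_block, length_in_blocks) in arrival order.

    Returns the fraction of requests (after the first) that start at the
    block immediately following the end of the previous request.
    """
    if len(requests) < 2:
        return 0.0
    sequential = 0
    for (prev_start, prev_len), (start, _length) in zip(requests, requests[1:]):
        if start == prev_start + prev_len:
            sequential += 1
    return sequential / (len(requests) - 1)

if __name__ == "__main__":
    trace = [(0, 8), (8, 8), (16, 8), (100, 8), (108, 8)]
    print(degree_of_sequentiality(trace))
```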

  * {{internal:projects:trace:ellard-fast03.pdf|FAST'03: Passive NFS Tracing of Email and Research Workloads}}
    * Two workloads: EECS and CAMPUS
    * EECS is research workload for home directories; dominated by metadata requests (for cache consistency) and read/write ratio of less than 1.
    * CAMPUS workload is almost entirely email; all files can be categorized according to file names with predictable size, lifespan and access pattern.
    * Issues with NFS tracing: hidden file system operations (never-accessed files, no on-disk layout, file hierarchy (can be learned), no internal state of server); mismatched NFS interface (no open/close), client-side caching, lost NFS communications, network reordering
    * Detecting “runs” (can be defined as contiguous accesses, with blocks rounding up to 8k, to a file with gap no larger than 30 seconds; also need to sort accesses within a reorder window of a few milliseconds)
    * Runs are classified as entire/sequential/random (both traces mostly sequential sub-runs separated by short seeks), read-only/write-only/read-write (big difference between the two traces)
    * Lifespan of blocks: over half of the blocks in EECS die in less than 1 second (log or index files), few live more than a day; CAMPUS blocks live longer (due to email client operations)
    * EECS problems found: users store temporary files (web page caching, dot files, Applet files) in home directory
    * CAMPUS problems found: email behavior (flat user inbox file, lock files)
    * Large variations of workload over time (diurnal pattern) observed
    * Sequentiality metric (delta-consecutive vs. reorder window)?
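
The run detection described above (accesses to one file, split wherever the gap exceeds 30 seconds, with offsets rounded up to 8k blocks) can be sketched as follows. This is a simplified illustration under stated assumptions: the ''(time, file_id, offset, length)'' tuple format and the function names are hypothetical, the reorder window is not modeled (accesses are only sorted by timestamp), and only a sequential/non-sequential test is shown rather than the paper's full entire/sequential/random classification.

```python
from collections import defaultdict

BLOCK = 8 * 1024   # block size used for rounding, from the notes above
MAX_GAP = 30.0     # seconds of inactivity that ends a run

def split_runs(accesses):
    """accesses: iterable of (time_sec, file_id, offset, length).

    Returns {file_id: [run, ...]} where each run is a list of accesses
    whose consecutive timestamps are at most MAX_GAP apart.
    """
    by_file = defaultdict(list)
    for acc in sorted(accesses, key=lambda a: a[0]):
        by_file[acc[1]].append(acc)
    runs = {}
    for fid, accs in by_file.items():
        file_runs = [[accs[0]]]
        for prev, cur in zip(accs, accs[1:]):
            if cur[0] - prev[0] > MAX_GAP:
                file_runs.append([cur])      # gap too long: start a new run
            else:
                file_runs[-1].append(cur)
        runs[fid] = file_runs
    return runs

def is_sequential(run):
    """True if every access starts between the previous end and that end
    rounded up to the next BLOCK boundary."""
    for prev, cur in zip(run, run[1:]):
        prev_end = prev[2] + prev[3]
        rounded_end = -(-prev_end // BLOCK) * BLOCK   # ceiling to 8k
        if not (prev_end <= cur[2] <= rounded_end):
            return False
    return True
```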

  * {{internal:projects:trace:lisa03-paper.pdf|LISA'03: New NFS Tracing Tools and Techniques for System Analysis}}
    * nfsdump is used to gather traces at NFS server (like tcpdump, using libpcap) and output human-readable text
    * nfsscan is for data processing and outputs one or more tables (containing total number of NFS operations, time and latency of these operations, and information on files accessed)
    * A set of utility tools for helping dissect the data output from nfsscan and for gnuplot
    * One can obtain the following information: workload intensity over time, read/write, data/meta-data, overall, per client, per user, per directory, per file

  * {{internal:projects:trace:roselli.pdf|USENIX'00: A Comparison of File System Workloads}}
    * Collected traces from 4 environments: instruction lab, research lab, web (all HP-UX), and NT cluster. 
    * Histogram of file system events: read dominates; web has significantly more reads; high number of file stat calls
    * Block lifetime: some traces show bimodal distribution; most blocks die due to overwrites and there’s a high degree of locality in overwritten files; average block lifetime is longer than estimated (for write delay)
    * Caching effect: small write buffer is sufficient even for write delays up to a day; small cache sufficient to absorb most read traffic; cache effect on reads or writes varies…
    * File size distribution, file access patterns: mostly reads, mostly sequential
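
The block lifetime notion above (a block is born when written and dies when overwritten or deleted) can be illustrated with a short sketch. The ''(time, op, block_id)'' event format and the function name are assumptions made for illustration; the paper's trace format and its handling of truncation are not reproduced here.

```python
def block_lifetimes(events):
    """events: iterable of (time_sec, op, block_id) in time order,
    where op is 'write' or 'delete'.

    Returns a list of (lifetime_sec, cause) with cause 'overwrite' or 'delete'.
    Blocks still alive at the end of the trace are not reported.
    """
    born = {}       # block_id -> time of the write that created the current contents
    lifetimes = []
    for t, op, blk in events:
        if blk in born:
            cause = "overwrite" if op == "write" else "delete"
            lifetimes.append((t - born.pop(blk), cause))
        if op == "write":
            born[blk] = t
    return lifetimes

if __name__ == "__main__":
    trace = [(0, "write", 1), (5, "write", 1), (7, "delete", 1), (8, "write", 2)]
    print(block_lifetimes(trace))
```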

  * {{http://dl.acm.org/citation.cfm?id=133090|SIGMETRICS'92: Analysis of file I/O traces in commercial computing environments}}
    * File I/O traces of VAX/VMS systems collected at 8 sites
    * System characteristics: #files, file size distribution, % active files/data, #IOs, % control vs. access ops, file creation/deletion ratios
    * File access characteristics: #opens/active file (distribution), #reads/active file (distribution), #writes/active file (distribution), open-to-close, close-to-open timing, read/write activities within open to close intervals
    * Process access characteristics: #users, #processes, process lifetime, #open files/process, #file ops/process, inter-open time distribution
    * File sharing: “simultaneous sharing” is low, “sequential sharing” (not necessarily open at the same time) is 2-4 times more common (further classified into read-only, write-only, read-and-write).
    * Workload analysis: relatively stable behavior observed for IO operations (intensity, file control ops, read/write)

  * {{internal:projects:trace:baker.pdf|SOSP'91: Measurements of a Distributed File System}}
    * Sprite distributed file system, 40 diskless workstations, 4 file servers
    * File system activities are collected at server end (some require kernel modifications) 
    * Measurements/statistics on caching collected through counters
    * Two main changes from '85 study: larger files, higher intensity (more burstiness)

  * {{internal:projects:trace:ousterhout85.pdf|SOSP'85: A Trace-Driven Analysis of the UNIX 4.2 BSD File System}}
    * Instrumented BSD Unix (4.2), timeshared VAX-11/780s at UC Berkeley EECS
    * Record only user file system activities (open, close, seek, unlink, truncate, exec)
    * No reads and writes (no location and timing for individual disk IO, no information on disk access due to paging)
    * Main results: low IO activity; most file accesses are sequential; short file open time; short file lifetime; caching can have significant effect

===== Meetings =====

  * 07/13/15: Updates on module; kprobe takes over {{internal:projects:trace:trace071315.mp3|mp3}}
  * 07/02/15: (Hector) Updates on module; search for missing records; one bug found {{internal:projects:trace:trace070215.mp3|mp3}}
  * 06/30/15: Updates from Humberto on ARTC and Hector on {{internal:projects:trace:trace063015.mp3|mp3}}
  * 06/23/15: (Hector) Updates on module; search for missing records continues {{internal:projects:trace:trace062315b.mp3|mp3}}
  * 06/23/15: (Humberto) Trace discrepancies with ARTc {{internal:projects:trace:trace062315a.mp3|mp3}}
  * 06/19/15: (Hector) Initial investigation of lost trace records {{internal:projects:trace:trace061915.mp3|mp3}}
  * 06/16/15: (Hector) Various updates {{internal:projects:trace:trace061615b.mp3|mp3}}
  * 06/16/15: (Humberto) ARTc port v1 complete {{internal:projects:trace:trace061615.mp3|mp3}}
  * 06/09/15: (Humberto) Updates on ARTc port {{internal:projects:trace:trace060915.mp3|mp3}}
  * 05/22/15: (Humberto) Initial port to ARTc replayer {{internal:projects:trace:trace052215b.mp3|mp3}}
  * 05/22/15: (Hector) Current status of replayer (single-threaded) {{internal:projects:trace:trace052215.mp3|mp3}}
  * 12/23/13: Handle multi-page requests; Understanding the cache simulator code {{internal:projects:trace:sys122313.mp3|mp3}}
  * 11/25/13: Next step: MRC Construction using LRU cache simulator {{internal:projects:trace:sys112513.mp3|mp3}}
  * 10/28/13: Final remaining tasks for user-level page cache simulation {{internal:projects:trace:sys112513.mp3|mp3}}
  * 10/14/13: First cut -- need to work on bug and sendfile implementation {{internal:projects:trace:sys101413.mp3|mp3}}
  * 09/30/13: Problems with script {{internal:projects:trace:sys093013.mp3|mp3}}
  * 09/16/13: First MRC, discussion of project directories {{internal:projects:trace:sys091613.mp3|mp3}}
  * 08/21/13: Implementation -- first version {{internal:projects:trace:sys082113.mp3|mp3}}
  * 08/07/13: Next steps -- data generation {{internal:projects:trace:sys080713.mp3|mp3}}
  * 07/31/13: Updates on systrace (-e) options, recording page IDs, and other status {{internal:projects:trace:sys073113.mp3|mp3}}
  * 07/24/13: Review of initial document, scrum tasks, and overview of strace based scripting and expected output {{internal:projects:trace:sys072413.mp3|mp3}}
  * 07/17/13: Project introduction {{internal:projects:trace:sys071713.mp3|mp3}}
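
For readers unfamiliar with the MRC (miss ratio curve) construction mentioned in the 11/25/13 entry, the sketch below shows one textbook way to derive it from a page-ID trace using LRU stack distances. It is not the project's cache simulator; the function names are hypothetical, and the list-based stack costs time proportional to the number of distinct pages per access, so it is only suitable for small traces.

```python
def lru_stack_distances(trace):
    """Return the LRU stack distance of each access (None for a first access).

    A distance of 1 means the page was the most recently used one.
    """
    stack = []   # least recently used at index 0, most recently used at the end
    dists = []
    for page in trace:
        if page in stack:
            idx = stack.index(page)
            dists.append(len(stack) - idx)
            stack.pop(idx)
        else:
            dists.append(None)   # cold miss: page never seen before
        stack.append(page)
    return dists

def miss_ratio_curve(trace, max_size):
    """Miss ratio of an LRU cache for every size from 1 to max_size pages."""
    if not trace:
        return []
    dists = lru_stack_distances(trace)
    curve = []
    for size in range(1, max_size + 1):
        # An access misses if it is cold or its stack distance exceeds the cache size.
        misses = sum(1 for d in dists if d is None or d > size)
        curve.append(misses / len(trace))
    return curve

if __name__ == "__main__":
    print(miss_ratio_curve(["a", "b", "a", "c", "b", "a"], 3))
```

Because LRU has the inclusion property, a single pass over the trace yields the miss ratio for all cache sizes at once, which is why the stack-distance form is commonly preferred over rerunning a simulator per size.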


[[MRCs|Current MRC results]]