====== Trace Analysis and Replay ======

===== Participants =====

  * Daniel Campello
  * Humberto Chacon
  * Christopher Kerutt
  * Hector Lopez
  * Steven Lyons
  * [[http://www.linkedin.com/pub/andy-norcisa/53/299/415/|Andy Norcisa]]
  * Jason Liu
  * Raju Rangaswami

===== Project Goals =====

Analyze all kinds of storage access traces, including block, file, cache, and syscall levels. Replay syscall traces.

The initial part of the project, until December 2013, involved building a trace utility (using strace) to record the file-system-oriented system calls (from and into the HDD) made by a single process. The goal is to create a trace that can be used to evaluate caching efficiency.

===== Reading List =====

  * {{internal:projects:trace:not-a-file.pdf|SOSP'11: A File is not a File}}
    * Home use applications (monolithically developed, heavy API, interactive): iBench including iWork and iLife
    * Use DTrace (system calls, stack traces, in-kernel functions such as page-in/outs); AppleScript for repeatable and automated runs
    * File is not a file: documents organized into complex directory trees
    * Sequential access is not sequential: pure sequential access is rare (meta-data, headers accessed more often, out of sequence)
    * Many auxiliary files; write is forced (fsync 'misuse'); renaming is popular; heavy use of multithreading to hide I/O latency
  * {{http://research.cs.wisc.edu/multifacet/papers/sigmetrics13_caches_reuse.pdf|SIGMETRICS'13: Reuse-based Online Models for Caches}}
  * {{http://www.cs.cornell.edu/projects/spinglass/public_pdfs/File%20System%20Usage.pdf|SOSP'99: File System Usage in Windows NT 4.0}}
  * {{internal:projects:trace:leung-usenix08.pdf|USENIX'08: Measurement and Analysis of Large-Scale Network File System Workloads}}
  * {{https://www.usenix.org/legacy/event/usenix06/tech/full_papers/riska/riska.pdf|USENIX'06: Disk Drive Level Workload Characterization}}
    * Measurements at disk level or RAID-controller level, through a SCSI/IDE analyzer attached to the IO bus intercepting electrical signals (in 2004)
    * Application-dependent variability: r/w ratio, access pattern, write traffic
    * Environment-dependent variability: request arrival rate, service time, response time, sequentiality, idle time
    * Burstiness is consistent through all workloads
    * Measurement environments (not enough information provided due to NDA) include: enterprise (HPC systems, RAID, for web, database, and emails); desktops (PCs, single disk, single user applications); consumer electronics (personal video recorder, mp3 player, game console, digital camera)
    * Major findings:
      * Disks are idle (high percentage bus idle), although idle interval is environment dependent
      * Average response time is only a few milliseconds
      * Access pattern is more random for enterprise than desktop; CE is highly sequential (video recording); use 'degree of sequentiality'
      * Request size varies but variability is low
      * Use 'rewrite distance' (my term) for write lifetime; it's application dependent
      * Inter-arrival time varies greatly; it shows long-range dependency (using Hurst parameter)
      * Seek distances exhibit "extreme long-range dependence"; locality is an inherent characteristic of disk-drive workloads (?)
  * {{internal:projects:trace:ellard-fast03.pdf|FAST'03: Passive NFS Tracing of Email and Research Workloads}}
    * Two workloads: EECS and CAMPUS
    * EECS is a research workload for home directories; it is dominated by metadata requests (for cache consistency) and has a read/write ratio of less than 1.
    * CAMPUS workload is almost entirely email; all files can be categorized according to file names with predictable size, lifespan and access pattern.
    * Issues in NFS tracing: hidden file system operations (never-accessed files, no on-disk layout, file hierarchy (can be learned), no internal state of server); mismatched NFS interface (no open/close), client-side caching, lost NFS communications, network reordering
    * Detecting "runs" (can be defined as contiguous accesses, with blocks rounding up to 8k, to a file with gap no larger than 30 seconds; also need to sort accesses within a reorder window of a few milliseconds)
    * Runs are classified as entire/sequential/random (both traces are mostly sequential sub-runs separated by short seeks), read-only/write-only/read-write (big difference between the two traces)
    * Lifespan of blocks: over half of the blocks in EECS die in less than 1 second (log or index files), few live more than a day; CAMPUS blocks live longer (due to email client operations)
    * EECS problems found: users store temporary files (web page caching, dot files, Applet files) in home directory
    * CAMPUS problems found: email behavior (flat user inbox file, lock files)
    * Large variations of workload over time (diurnal pattern) observed
    * Sequentiality metric (delta-consecutive vs. reorder window)?
  * {{internal:projects:trace:lisa03-paper.pdf|LISA'03: New NFS Tracing Tools and Techniques for System Analysis}}
    * nfsdump is used to gather traces at the NFS server (like tcpdump, using libpcap) and outputs human-readable text
    * nfsscan is for data processing and outputs one or more tables (containing total number of NFS operations, time and latency of these operations, and information on files accessed)
    * A set of utility tools for helping dissect the data output from nfsscan and for gnuplot
    * One can obtain the following information: workload intensity over time, read/write, data/meta-data, overall, per client, per user, per directory, per file
  * {{internal:projects:trace:roselli.pdf|USENIX'00: A Comparison of File System Workloads}}
    * Collected traces from 4 environments: instructional lab, research lab, web (all HP-UX), and NT cluster.
    * Histogram of file system events: read dominates; web has significantly more reads; high number of file stat calls
    * Block lifetime: some traces show bimodal distribution; most blocks die due to overwrites and there's a high degree of locality in overwritten files; average block lifetime is longer than estimated (for write delay)
    * Caching effect: small write buffer is sufficient even for write delays up to a day; small cache sufficient to absorb most read traffic; cache effect on reads or writes varies…
    * File size distribution, file access patterns: mostly reads, mostly sequential
  * {{http://dl.acm.org/citation.cfm?id=133090|SIGMETRICS'92: Analysis of file I/O traces in commercial computing environments}}
    * File I/O traces of VAX/VMS systems collected at 8 sites
    * System characteristics: #files, file size distribution, % active files/data, #IOs, % control vs. access ops, file creation/deletion ratios
    * File access characteristics: #opens/active file (distribution), #reads/active file (distribution), #writes/active file (distribution), open-to-close, close-to-open timing, read/write activities within open to close intervals
    * Process access characteristics: #users, #processes, process lifetime, #open files/process, #file ops/process, inter-open time distribution
    * File sharing: "simultaneous sharing" is low, "sequential sharing" (files not necessarily open at the same time) is 2-4 times more (further classified into read-only, write-only, read-and-write).
    * Workload analysis: relatively stable behavior observed for IO operations (intensity, file control ops, read/write)
  * {{internal:projects:trace:baker.pdf|SOSP'91: Measurements of a Distributed File System}}
    * Sprite distributed file system, 40 diskless workstations, 4 file servers
    * File system activities are collected at the server end (some require kernel modifications)
    * Measurements/statistics on caching collected through counters
    * Two main changes from '85 study: larger files, higher intensity (more burstiness)
  * {{internal:projects:trace:ousterhout85.pdf|SOSP'85: A Trace-Driven Analysis of the UNIX 4.2 BSD File System}}
    * Instrumented BSD Unix (4.2), timeshared VAX-11/780s at UC Berkeley EECS
    * Record only user file system activities (open, close, seek, unlink, truncate, exec)
    * No reads and writes (no location and timing for individual disk IO, no information on disk access due to paging)
    * Main results: low IO activity; most file accesses are sequential; short file open time; short file lifetime; caching can have significant effect

===== Meetings =====

  * 07/13/15: Updates on module; kprobe takes over {{internal:projects:trace:trace071315.mp3|mp3}}
  * 07/02/15: (Hector) Updates on module; search for missing records; one bug found {{internal:projects:trace:trace070215.mp3|mp3}}
  * 06/30/15: Updates from Humberto on ARTC and Hector on {{internal:projects:trace:trace063015.mp3|mp3}}
  * 06/23/15: (Hector) Updates on module; search for missing records continues {{internal:projects:trace:trace062315b.mp3|mp3}}
  * 06/23/15: (Humberto) Trace discrepancies with ARTc {{internal:projects:trace:trace062315a.mp3|mp3}}
  * 06/19/15: (Hector) Initial investigation of lost trace records {{internal:projects:trace:trace061915.mp3|mp3}}
  * 06/16/15: (Hector) Various updates {{internal:projects:trace:trace061615b.mp3|mp3}}
  * 06/16/15: (Humberto) ARTc port v1 complete {{internal:projects:trace:trace061615.mp3|mp3}}
  * 06/09/15: (Humberto) Updates on ARTc port {{internal:projects:trace:trace060915.mp3|mp3}}
  * 05/22/15: (Humberto) Initial port to ARTc replayer {{internal:projects:trace:trace052215b.mp3|mp3}}
  * 05/22/15: (Hector) Current status of replayer (single-threaded) {{internal:projects:trace:trace052215.mp3|mp3}}
  * 12/23/13: Handle multi-page requests; Understanding the cache simulator code {{internal:projects:trace:sys122313.mp3|mp3}}
  * 11/25/13: Next step: MRC Construction using LRU cache simulator {{internal:projects:trace:sys112513.mp3|mp3}}
  * 10/28/13: Final remaining tasks for user-level page cache simulation {{internal:projects:trace:sys112513.mp3|mp3}}
  * 10/14/13: First cut -- need to work on bug and sendfile implementation {{internal:projects:trace:sys101413.mp3|mp3}}
  * 09/30/13: Problems with script {{internal:projects:trace:sys093013.mp3|mp3}}
  * 09/16/13: First MRC, discussion of project directories {{internal:projects:trace:sys091613.mp3|mp3}}
  * 08/21/13: Implementation -- first version {{internal:projects:trace:sys082113.mp3|mp3}}
  * 08/07/13: Next steps -- data generation {{internal:projects:trace:sys080713.mp3|mp3}}
  * 07/31/13: Updates on systrace (-e) options, recording page IDs, and other status {{internal:projects:trace:sys073113.mp3|mp3}}
  * 07/24/13: Review of initial document, scrum tasks, and overview of strace based scripting and expected output {{internal:projects:trace:sys072413.mp3|mp3}}
  * 07/17/13: Project introduction {{internal:projects:trace:sys071713.mp3|mp3}}

[[MRCs|Current MRC results]]
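
As a rough illustration of the MRC construction and multi-page request handling mentioned in the 2013 meetings, the sketch below replays a page reference stream through a simulated LRU cache at several sizes and reports the miss ratio for each. It is not the project's simulator. The record layout ''(file_id, offset, length)'', the 4 KiB page size, and the function names ''pages_touched'' and ''lru_miss_ratio_curve'' are all assumptions made for this example.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size in bytes; not taken from the project


def pages_touched(file_id, offset, length, page_size=PAGE_SIZE):
    """Expand one read/write request into the (file, page index) pairs it covers."""
    if length <= 0:
        return []
    first = offset // page_size
    last = (offset + length - 1) // page_size
    return [(file_id, page) for page in range(first, last + 1)]


def lru_miss_ratio_curve(page_ids, cache_sizes):
    """Replay the same reference stream through an LRU cache of each size.

    Returns a dict mapping cache size (in pages) to miss ratio.
    """
    page_ids = list(page_ids)
    curve = {}
    for size in cache_sizes:
        cache = OrderedDict()  # keys ordered from least to most recently used
        misses = 0
        for page in page_ids:
            if page in cache:
                cache.move_to_end(page)  # hit: mark as most recently used
            else:
                misses += 1
                cache[page] = True
                if len(cache) > size:
                    cache.popitem(last=False)  # evict least recently used
        curve[size] = misses / len(page_ids) if page_ids else 0.0
    return curve


# Hypothetical trace records: (file_id, offset, length)
trace = [("f", 0, 4096), ("f", 4096, 8192), ("f", 0, 100), ("g", 0, 1)]
refs = [p for rec in trace for p in pages_touched(*rec)]
curve = lru_miss_ratio_curve(refs, cache_sizes=[1, 2, 3, 4])
for size in sorted(curve):
    print(size, curve[size])
```

Simulating each cache size separately keeps the sketch short; a single-pass stack-distance computation would produce the same LRU curve with less work on long traces.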