Two workloads: EECS and CAMPUS
EECS is a research workload serving home directories; it is dominated by metadata requests (for cache consistency) and has a read/write ratio of less than 1.
The CAMPUS workload is almost entirely email; all files can be categorized by file name, and each category has a predictable size, lifespan, and access pattern.
Issues with NFS tracing: hidden file system details (files that are never accessed, no on-disk layout, no file hierarchy (though it can be learned), no internal server state); mismatch with the NFS interface (no open/close); client-side caching; lost NFS communications; network reordering
Detecting “runs”: a run can be defined as a series of contiguous accesses to a file (with blocks rounded up to 8 KB) in which no gap is longer than 30 seconds; accesses also need to be sorted within a reorder window of a few milliseconds
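A minimal Python sketch of run detection as described in the line above. The trace tuple layout, the function names, and the 5 ms reorder window are assumptions made for illustration; the notes only say "a few milliseconds", and sorting each window by block offset is one plausible reading of "sort accesses within a reorder window", not necessarily the paper's exact procedure.

```python
from collections import defaultdict

BLOCK_SIZE = 8 * 1024    # 8 KB blocks, from the notes
RUN_GAP = 30.0           # seconds of idle time that end a run, from the notes
REORDER_WINDOW = 0.005   # ASSUMPTION: 5 ms stands in for "a few milliseconds"


def undo_reordering(accesses, window=REORDER_WINDOW):
    """Sort accesses that arrive within one reorder window by block offset.

    accesses: time-ordered list of (time_s, first_block, n_blocks).
    """
    out, group = [], []
    for acc in accesses:
        # Close the current group once an access falls outside its window.
        if group and acc[0] - group[0][0] > window:
            out.extend(sorted(group, key=lambda a: a[1]))
            group = []
        group.append(acc)
    out.extend(sorted(group, key=lambda a: a[1]))
    return out


def detect_runs(trace):
    """Split a trace into per-file runs.

    trace: iterable of (time_s, file_id, byte_offset, byte_count)
           (hypothetical layout, not a real trace format).
    Returns {file_id: [run, ...]}, each run a list of
    (time_s, first_block, n_blocks).
    """
    per_file = defaultdict(list)
    for t, fid, off, cnt in trace:
        first = off // BLOCK_SIZE
        last = (off + max(cnt, 1) - 1) // BLOCK_SIZE  # round up to whole blocks
        per_file[fid].append((t, first, last - first + 1))

    runs = {}
    for fid, accs in per_file.items():
        accs.sort(key=lambda a: a[0])
        file_runs, current = [], []
        for acc in accs:
            # A gap longer than RUN_GAP starts a new run.
            if current and acc[0] - current[-1][0] > RUN_GAP:
                file_runs.append(undo_reordering(current))
                current = []
            current.append(acc)
        if current:
            file_runs.append(undo_reordering(current))
        runs[fid] = file_runs
    return runs
```

Runs are split on the time-ordered accesses first and reordered afterwards, so the small reorder window cannot affect where the 30-second boundaries fall.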
Runs are classified as entire/sequential/random (in both traces, mostly sequential sub-runs separated by short seeks) and as read-only/write-only/read-write (big difference between the two traces)
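A hedged sketch of the two-way classification. The notes do not spell out the definitions, so this assumes the conventional ones: "entire" means the run covers the file sequentially from the first block to the last, "sequential" means each access starts where the previous one ended, and anything else is "random". The function name and run layout are hypothetical.

```python
def classify_run(run, file_blocks):
    """Classify one run by access pattern and by operation mix.

    run: non-empty list of (op, first_block, n_blocks), op is "read" or "write".
    file_blocks: file size in blocks (ASSUMED to be known from attributes).
    Returns (pattern, mode).
    """
    if not run:
        raise ValueError("run must contain at least one access")

    ops = {op for op, _, _ in run}
    if ops == {"read"}:
        mode = "read-only"
    elif ops == {"write"}:
        mode = "write-only"
    else:
        mode = "read-write"

    # Sequential: every access begins exactly where the previous one ended.
    sequential = all(
        nxt[1] == cur[1] + cur[2] for cur, nxt in zip(run, run[1:])
    )
    if not sequential:
        pattern = "random"
    elif run[0][1] == 0 and run[-1][1] + run[-1][2] >= file_blocks:
        pattern = "entire"
    else:
        pattern = "sequential"
    return pattern, mode
```

This strict test labels a run "random" after a single seek; capturing the "sequential sub-runs separated by short seeks" observation would need a looser criterion.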
Lifespan of blocks: over half of the blocks in EECS die in less than 1 second (log or index files), and few live more than a day; CAMPUS blocks live longer (due to email client operations)
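A rough sketch of how block lifespans could be measured from a trace. It assumes a block is born when written and dies when it is overwritten or its file is removed; the event layout and function name are hypothetical, and blocks still alive at the end of the trace are simply left uncounted.

```python
def block_lifespans(events):
    """Return the lifespans (seconds) of all blocks that die during the trace.

    events: time-ordered iterable of (time_s, kind, file_id, block), where
            kind is "write" or "remove"; block is ignored for "remove".
            (Hypothetical layout, not a real trace format.)
    """
    born = {}        # (file_id, block) -> birth time
    lifespans = []
    for t, kind, fid, blk in events:
        if kind == "write":
            key = (fid, blk)
            if key in born:
                # Overwrite: the old block dies and a new one is born.
                lifespans.append(t - born[key])
            born[key] = t
        elif kind == "remove":
            # Removing the file kills every live block that belongs to it.
            for key in [k for k in born if k[0] == fid]:
                lifespans.append(t - born.pop(key))
    return lifespans


def fraction_short_lived(lifespans, threshold=1.0):
    """Fraction of dead blocks that lived less than `threshold` seconds."""
    if not lifespans:
        return 0.0
    return sum(1 for x in lifespans if x < threshold) / len(lifespans)
```

Applying fraction_short_lived with a 1-second threshold gives the kind of figure quoted for EECS; truncation is not modeled here.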
EECS problems found: users store temporary files (web page caches, dot files, applet files) in their home directories
CAMPUS problems found: email behavior (flat user inbox file, lock files)
Large variations in workload over time (diurnal pattern) observed
Sequentiality metric (delta-consecutive vs. reorder window)?