I/O Deduplication

Use intrinsic data duplication in storage systems to improve I/O performance.

Participants

Project Goals

In this project we explore the premise that intrinsic data duplication in storage systems can be used to improve I/O performance.

Design

I/O deduplication comprises three key techniques: (i) content-based caching, which uses the popularity of "data content" rather than "data location" of I/O accesses when making caching decisions; (ii) dynamic replica retrieval, which upon a cache miss dynamically chooses the replica to retrieve so that disk head movement is minimized; and (iii) popular content duplication, which creates copies of popular files on disk so that optimization (ii) becomes more effective.
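To make technique (i) concrete, here is a minimal sketch of a cache indexed by content hash instead of sector number. It is not the project's implementation: the class name `ContentCache`, the `record` method (standing in for whatever metadata maps a location to its content) and the choice of MD5 and LRU eviction are all assumptions made for illustration.

```python
import hashlib
from collections import OrderedDict


class ContentCache:
    """Illustrative LRU cache keyed by content hash rather than location."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.sector_to_hash = {}     # location -> content hash (assumed metadata)
        self.blocks = OrderedDict()  # content hash -> data, oldest entry first

    def record(self, sector, data):
        """Remember which content lives at `sector` without caching the data."""
        digest = hashlib.md5(data).hexdigest()
        self.sector_to_hash[sector] = digest
        return digest

    def insert(self, sector, data):
        """Cache the data once per distinct content and evict in LRU order."""
        digest = self.record(sector, data)
        self.blocks[digest] = data
        self.blocks.move_to_end(digest)
        while len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # drop least recently used content

    def lookup(self, sector):
        """Return cached data for `sector`, or None on a miss."""
        digest = self.sector_to_hash.get(sector)
        if digest is None or digest not in self.blocks:
            return None
        self.blocks.move_to_end(digest)
        return self.blocks[digest]


cache = ContentCache(capacity=2)
payload = b"x" * 4096
cache.insert(10, payload)   # sector 10 is read and cached
cache.record(99, payload)   # sector 99 is known to hold the same content
hit = cache.lookup(99)      # hits, although sector 99 itself was never cached
```

Because duplicates share one cache entry, a request for sector 99 is served from the copy that was cached when sector 10 was read, and the cache spends no extra space on the duplicate.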

A fourth optimization, still under consideration, is optional replica updates: for duplicate content, dynamically choose between updating the target location (as requested) and registering persistent copy-on-write metadata.

Meetings

Experiments

Effect of the head position optimization:

The solid line shows the execution time for 8192 requests distributed across a 500 GB disk. Each request is 4096 B in size, and its content has between 1 and 1000 copies (x axis).

The dotted line shows the execution time for 8192 requests distributed across a disk of size 500 GB / x, where x is the number of copies.
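The intuition behind the head position optimization can be sketched in a few lines. This is only a toy model, not the code used for the experiment: it assumes that seek cost is proportional to the distance in sectors (ignoring rotational latency and queueing), and the names `pick_replica` and `total_seek_distance` are made up for the example.

```python
def pick_replica(head, replicas):
    """Return the replica location closest to the current head position."""
    return min(replicas, key=lambda sector: abs(sector - head))


def total_seek_distance(requests, head=0):
    """Serve each request from its nearest replica and sum the head movement."""
    total = 0
    for replicas in requests:
        target = pick_replica(head, replicas)
        total += abs(target - head)
        head = target  # the head now rests where the last request was served
    return total


# Each request lists the sectors that hold a copy of the wanted content.
requests = [[100, 900], [850, 120], [500]]
with_replicas = total_seek_distance(requests)                  # 500
first_copy_only = total_seek_distance([[r[0]] for r in requests])  # 1200
```

In this toy run the head travels 500 sectors when it may pick the nearest copy and 1200 sectors when it must always use the first copy, which is the effect the solid line is meant to measure as the number of copies grows.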

Static similarity:

Web traces

Webmail:

  bash-3.00$ wc -l webmail-hda-hashes-blocks
  2621433 webmail-hda-hashes-blocks
  bash-3.00$ wc -l webmail-hda-hashes-blocks-freqs 
  1679913 webmail-hda-hashes-blocks-freqs
  bash-3.00$ tail webmail-hda-hashes-blocks-freqs
      190 1bf5da25d44d5e211da6d05a71aa8bb5
      287 edc44bbcad017816f872db4dcbe0ef2e
      291 34b96409e3df0a7de5e829b1c4bb3000
      401 115d93fc589fab0673448f2fe76e9b09
      408 a12b4a9b75f829fd55107cfbd7981279
      413 01718ffb8ce64b88a42d84fefc703145
      419 ecb4f1d9d1677005bdaaa2e2b5b9464e
      421 941d368d2ca84b91a6a3709d0117db8a
      879 48859c5c4d9612a8fa65b9c239511b61
   612994 3df1244f6143869f52abf2a1d73d0c0f

static_similarity = (2621433 - 612994) / (1679913 - 1) = 1.195562029
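The static_similarity figures in this section all follow the same pattern: total blocks minus zero blocks, divided by the number of distinct hashes minus one (the zero-block hash). Read that way, the metric is the average number of copies per distinct non-zero content. The sketch below reproduces the webmail calculation; the function names are assumptions, and the second function takes the zero-block hash as a parameter rather than hard-coding it.

```python
from collections import Counter


def static_similarity(total_blocks, zero_blocks, unique_hashes):
    """Average copies per distinct content, leaving out the zero block."""
    return (total_blocks - zero_blocks) / (unique_hashes - 1)


def static_similarity_from_hashes(hashes, zero_hash):
    """Same metric, computed directly from a list of per-block hashes."""
    freqs = Counter(hashes)      # plays the role of the *-freqs files
    freqs.pop(zero_hash, None)   # drop the zero block if it is present
    return sum(freqs.values()) / len(freqs)


# Numbers taken from the webmail calculation above.
webmail = static_similarity(2621433, 612994, 1679913)

# Tiny synthetic example: "z" stands for the zero-block hash.
toy = static_similarity_from_hashes(["a", "a", "b", "z", "z", "z"], "z")
```

Here `webmail` evaluates to about 1.1956, matching the figure above, and `toy` is 1.5 (three non-zero blocks spread over two distinct contents).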

Online:

 bash-3.00$ wc -l online-hda-hashes-blocks
 2621433 online-hda-hashes-blocks
 bash-3.00$ wc -l online-hda-hashes-blocks-freqs 
 1672880 online-hda-hashes-blocks-freqs
 bash-3.00$ tail online-hda-hashes-blocks-freqs
   201 5a91379cfb04d88cbc0529b005263b92
   220 1bf5da25d44d5e211da6d05a71aa8bb5
   331 34b96409e3df0a7de5e829b1c4bb3000
   483 115d93fc589fab0673448f2fe76e9b09
   493 a12b4a9b75f829fd55107cfbd7981279
   501 01718ffb8ce64b88a42d84fefc703145
   502 ecb4f1d9d1677005bdaaa2e2b5b9464e
   503 941d368d2ca84b91a6a3709d0117db8a
   998 48859c5c4d9612a8fa65b9c239511b61
580305 3df1244f6143869f52abf2a1d73d0c0f

static_similarity = (2621433 - 580305) / (1672880 - 1) = 1.220128892

webmail+online:

rkoll001@leopard:~ 82% wc -l online-webmail-hda-hashes-blocks
5242866 online-webmail-hda-hashes-blocks
rkoll001@leopard:~ 83% wc -l online-webmail-hda-hashes-blocks-freqs 
1960763 online-webmail-hda-hashes-blocks-freqs
rkoll001@leopard:~ 84% tail online-webmail-hda-hashes-blocks-freqs
    387 5a91379cfb04d88cbc0529b005263b92
    410 1bf5da25d44d5e211da6d05a71aa8bb5
    622 34b96409e3df0a7de5e829b1c4bb3000
    884 115d93fc589fab0673448f2fe76e9b09
    901 a12b4a9b75f829fd55107cfbd7981279
    914 01718ffb8ce64b88a42d84fefc703145
    921 ecb4f1d9d1677005bdaaa2e2b5b9464e
    924 941d368d2ca84b91a6a3709d0117db8a
   1877 48859c5c4d9612a8fa65b9c239511b61
1193299 3df1244f6143869f52abf2a1d73d0c0f

static_similarity = (5242866 - 1193299) / (1960763 - 1) = 2.065302673

Lab traces

mad-max

ricardo@mad-max:~/research/vcache/data/mad-max$ wc -l sda.hash.blocks.sorted
39062042 sda.hash.blocks.sorted
ricardo@mad-max:~/research/vcache/data/mad-max$ wc -l sda.hash.blocks.freqs 
27431635 sda.hash.blocks.freqs
ricardo@mad-max:~/research/vcache/data/mad-max$ tail sda.hash.blocks.freqs
  56906 2f7cf239e69c81d559fd8c7340c0cc5a
  56915 68032d6801c04eee872fa4fe57d8a115
  56927 36967e11b7887d32d8e6a4adc8d71414
  56932 8e7974c25680dec98bfd3cfedc48e918
  56972 e8df727d1312a825f5e02cf8d95a7309
  57060 a43f839498484cf668095c11038733e0
  57281 95c898b9208b7895b1f9b3313802d865
  64334 711a5c482f6d8f2dc849b60145092f92
 486646 d4478f77dc66cde39a432db0299f6d71
1727055 3df1244f6143869f52abf2a1d73d0c0f

static_similarity = (39062042 - 1727055) / (27431635 - 1) = 1.361019435

madmax + ikki + apu + topgun

rkoll001@puppy:~ 262% wc -l homes.sorted.2 
186024571 homes.sorted.2
rkoll001@puppy:~ 261% wc -l homes.freq.2
62273294 homes.freq.2
rkoll001@puppy:~ 260% tail homes.freq.sorted.2
  56972 e8df727d1312a825f5e02cf8d95a7309
  57060 a43f839498484cf668095c11038733e0
  57296 95c898b9208b7895b1f9b3313802d865
  64341 711a5c482f6d8f2dc849b60145092f92
  79652 8e0bebb0539a580d44a9cfd65214ffe8
 427475 675a226b6d15cbadacce60c04e10dca1
 486646 d4478f77dc66cde39a432db0299f6d71
 769700 54f1565f8a686e900ad84ad5ad00aeb3
 779708 bc8b26a149ad79305567f89e9c5353bd
97863369 3df1244f6143869f52abf2a1d73d0c0f

static_similarity = (186024571 - 97863369) / (62273294 - 1) = 1.41571447

Mail traces

cheetah:

 rkoll001@leopard:~ 33% wc -l cheetah-sda-hashes-blocks
 73103754 cheetah-sda-hashes-blocks
 rkoll001@leopard:~ 34% wc -l cheetah-sda-hashes-blocks-freqs 
 27675205 cheetah-sda-hashes-blocks-freqs
 rkoll001@leopard:~ 32% tail cheetah-sda-hashes-blocks-freqs                        
    1617 f991a8fd929ee7b267aba02880e694a1
    2503 85e3fd60029277c87f74d1fa4d6ca22e
    2815 52849183a60a723e83315dda0b23775e
    2901 82b1b5c8ddaa19dd84fb4f572b0ccc4d
    3195 0ac632fa7ec32e2259a49a0a7c708f94
    3489 48859c5c4d9612a8fa65b9c239511b61
    3889 70284059c0b91af0f85761e87941acd0
    3922 d128857459c72b774379585e9eae2590
    6634 173c963d3665445f1c863fa5a30a3918
40358276 3df1244f6143869f52abf2a1d73d0c0f

static_similarity = (73103754 - 40358276) / (27675205 - 1) = 1.183206382

Content analysis

madmax

rkoll001@puppy:~ 360% tail madmax.sda.hash.blocks.freq 
  56906 2f7cf239e69c81d559fd8c7340c0cc5a
  56915 68032d6801c04eee872fa4fe57d8a115
  56927 36967e11b7887d32d8e6a4adc8d71414
  56932 8e7974c25680dec98bfd3cfedc48e918
  56972 e8df727d1312a825f5e02cf8d95a7309
  57060 a43f839498484cf668095c11038733e0
  57281 95c898b9208b7895b1f9b3313802d865
  64334 711a5c482f6d8f2dc849b60145092f92
 486646 d4478f77dc66cde39a432db0299f6d71    free block   hashes
1727055 3df1244f6143869f52abf2a1d73d0c0f    free block   000000000000

d4478f77dc66cde39a432db0299f6d71

rkoll001@puppy:~ 357% grep -n d4478f77dc66cde39a432db0299f6d71 madmax.sda.hash.blocks | head
3960654:d4478f77dc66cde39a432db0299f6d71
3961833:d4478f77dc66cde39a432db0299f6d71
..
rkoll001@puppy:~ 359% ./collection_static/find-block madmax 3960654
0: [63,78140159] 3960654 31685287  
ricardo@mad-max:~/research/vcache/code/collection_static$ sudo ./hashblock /dev/sda 31685287 8
     1 73f3618c9fa1b8a070273df01b8b4153 3030300a202020202020312032393338
     2 8a788c54488001a0006d8558f511fc64 20203120323933383033393036206266
     3 cffa939ad8a6028d12d8441bf4fd11dc 30333931322062663631396561633063
     4 78737061c7e6d64ade158a4143878321 36313965616330636466336636386434
     5 f7a88a7d2791fad80803fc5738074593 64663366363864343936656139333434
     6 00978ee1d510b78df39c45430af1916d 39366561393334343133376538622030
     7 f6fb9be5993f18a21cc4262785f40b9a 31333765386220303030303030303030
     8 a545b72922179e75b7a98dc8b8da0b9e 30303030303030303030303030303030

but printing the content as text (not in hex) shows that the block holds lines of a hash listing:

     1 73f3618c9fa1b8a070273df01b8b4153 000
    1 293803900 bf619eac0cdf3f68d496ea9344137e8b 00000000000000000000000000000000
    1 293803901 bf619eac0cdf3f68d496ea9344137e8b 00000000000000000000000000000000
    1 293803902 bf619eac0cdf3f68d496ea9344137e8b 00000000000000000000000000000000
    1 293803903 bf619eac0cdf3f68d496ea9344137e8b 00000000000000000000000000000000
    1 293803904 bf619eac0cdf3f68d496ea9344137e8b 00000000000000000000000000000000
    1 293803905 bf619eac0cdf3f68d496ea9344137e8b 00000000000000000000000000000000
  /dev/sda
     2 8a788c54488001a0006d8558f511fc64   1 293803906 bf619eac0cdf3f68d496ea9344137e8b 00000000000000000000000000000000
    1 293803907 bf619eac0cdf3f68d496ea9344137e8b 00000000000000000000000000000000
    1 293803908 bf619eac0cdf3f68d496ea9344137e8b 00000000000000000000000000000000

topgun

rkoll001@puppy:~/home-traces 330% tail ../topgun.sda.hash.blocks.freq 
   1472 c2507c3aaf41c99d38c33d60bc90111c
   1503 940d36a6a9373e53bf4483a5076e6dd3
   1782 8a08d49bc4fada337a91d1808de03ba8
   2165 501ff97e8c3ea1371701688f5f0bfda0
   2272 48859c5c4d9612a8fa65b9c239511b61
   7066 2853de6d55536980d9a530ee6c8e9375
 427475 675a226b6d15cbadacce60c04e10dca1     free block postmark readable random chars
 769700 54f1565f8a686e900ad84ad5ad00aeb3     free block postmark readable random chars
 779708 bc8b26a149ad79305567f89e9c5353bd     free block postmark readable random chars
31794051 3df1244f6143869f52abf2a1d73d0c0f    free block 000000000000000000000...

675a226b6d15cbadacce60c04e10dca1

rkoll001@puppy:~/home-traces 346% grep -n 675a226b6d15cbadacce60c04e10dca1 ../topgun.sda.hash.blocks | head
580082:675a226b6d15cbadacce60c04e10dca1
580086:675a226b6d15cbadacce60c04e10dca1
580090:675a226b6d15cbadacce60c04e10dca1
580094:675a226b6d15cbadacce60c04e10dca1
580098:675a226b6d15cbadacce60c04e10dca1
580102:675a226b6d15cbadacce60c04e10dca1
580109:675a226b6d15cbadacce60c04e10dca1
580116:675a226b6d15cbadacce60c04e10dca1
580126:675a226b6d15cbadacce60c04e10dca1
580130:675a226b6d15cbadacce60c04e10dca1
rkoll001@puppy:~/home-traces 347% ../collection_static/find-block topgun 580082
2: [4353615,297314954] 35889 4640719
root@topgun:~/ricardo_stuff/vcache/code/collection_static# ./hashblock /dev/sda 4640719 8
     1 026bbcf1c5427e0cc8e075268170e421 232c406b2c5060294c5f5a4f52513033
     2 2ee2a9d275179b265f43988773fd041f 646257495556427c286a2050545f6935
     3 dbf1c9e824b0470d72a7efca3136f356 525856417a5e5d2439734144787c6f38
     4 0cfaaca8826da327dd2084b1697f5c3b 5c235f2a35413c33742d6e773f395b29
     5 9549e813aa8c48beb743105cd232ae7c 50506647553d50243a4b4c6775224a49
     6 51fe121098dd40e702fd1ce8e0bf3d09 26277a4a3b395478242e5b2e4d313244
     7 2177aa1b9cc6689a15f698d1a4eafe42 30787d33343353243621356d6367242c
     8 831e583c5371980234949344e49bdcce 7b455e7055763722602e41487d584b20

54f1565f8a686e900ad84ad5ad00aeb3

rkoll001@puppy:~/home-traces 334% grep -n 54f1565f8a686e900ad84ad5ad00aeb3 ../topgun.sda.hash.blocks | head
580080:54f1565f8a686e900ad84ad5ad00aeb3
580084:54f1565f8a686e900ad84ad5ad00aeb3
580088:54f1565f8a686e900ad84ad5ad00aeb3
580092:54f1565f8a686e900ad84ad5ad00aeb3
580096:54f1565f8a686e900ad84ad5ad00aeb3
580100:54f1565f8a686e900ad84ad5ad00aeb3
580104:54f1565f8a686e900ad84ad5ad00aeb3
580107:54f1565f8a686e900ad84ad5ad00aeb3
580111:54f1565f8a686e900ad84ad5ad00aeb3
580114:54f1565f8a686e900ad84ad5ad00aeb3
rkoll001@puppy:~/home-traces 335% ../collection_static/find-block topgun 580080
2: [4353615,297314954] 287088 4640703
root@topgun:~/ricardo_stuff/vcache/code/collection_static# ./hashblock /dev/sda 4640703 8
     1 281599a2e1c3e086a00828201cfe82bd 4c655a65624a3653357e715076244324
     2 4fef6272f5e1f4ebfe114bf23371ed6b 245c226e7d54676e34403f265e3d3d25
     3 c69559f6765f83ada5b31ee37dec6cc7 4963446e5822776d63775c367b4f2d24
     4 6a38f6888c631da78ec2c4ae841b2fff 2e40324f30435e3e525743494d384d22
     5 fc96ec96664b296c06b1a9999920bf15 655e77343c2c7b275c5e6b3337792e2f
     6 ba2ffaea52e538a4b6bc7ca1b0025f3c 49777d2b3b5b3e30302f7a2a3e524b7d
     7 71a489f7a9bc010aecdc1b1f4db8a50b 293e7c27696f2424265270537a443559
     8 ebed780c41eafb8c564c29e222f5c76d 5b7c5a5a6c362f2f4470207e6d557526

Printing the content as text (not in hex) shows random but readable character data. This is typical of the temporary files created by postmark; in this case postmark was executed with the same random seed.

root@topgun:~/ricardo_stuff/vcache/code/collection_static# ./hashblock /dev/sda 4640703 8
     1 281599a2e1c3e086a00828201cfe82bd LeZebJ6S5~qPv$C$py}hmKd H58nM4~.}pv}Q7wSt'&R0{oSe8eY_i&z}T'VGt*I)RJ>[Aq!ZAVvdBF"@ l|}B-:7>bL8C0Gw>7r~U'GmbpL}waz00{sz?>X^j?b zo&/ShA+HN3^F.2J<> N7#i(~ek~G<SO6e8Vd~@U:mvb0!_\kj_Pbt_]Azjez5WdWQ \A6j@<}4kf^JcF":i=L3NF9EO4rC0r4ATX&hR?90x]p4a7`yJ8H"tf!mB"=yeqYgHRD{2mz#&NR9n*xo#A>;hX&Ab\d\!8Vt"e.8z2Z2AsB@G2{:D*MrEt%@1+?udy{r>v)(YPq2LgVGG'Z^5H==V@5*{d,OgE<S'=GOZoyu0X/:bMJ#l7":<;EvxuF@_=Z\COCLesV9}dzV~dTh\+Ft>Ac;a?em<9A]dJX9S!'x%J#6BH;^CJm"k+).wIYEC4#]ry!Y~m,7/0)I>I=Fs1r)PD<JKs[Ldzx:-u=pc9v?x^!,_e3JQ3h[U<koS-kMwFgI/dev/sda
     2 4fef6272f5e1f4ebfe114bf23371ed6b $\"n}Tgn4@?&^==%&-:Kx-hVlMr}<mEcR'ZAG#oyBBreDOxf:!P_MLENUV@-m,0=Q`tNTi8*?Hx-vJlT|:TN-7&h$u5y%]TI1:s"k1r+MW`(Hxy'>fSstZ@1b.V^Ky1wT|5DBpeHBeu'mWA@f(uL?qYDXgh>_qe@u \V)ul3"J(euB;8}zT()Jg%<&]nkSe"rFIytn>K1c[u:^P_.zpzu,6LDn/!Mp[D77Q%^n*Q{/7aMFGJXA"JIND$:B'SQIrM_[~~!BRt?LU3un|9hksjqWlAS]@t5(9}A{F8{Mi1YS,_{~WI516phB_kS/p,#wWgbv afC`2@@{? ({z(R%|gbhZX-w.ai2`@Jzu3Z[Z/|@>7NCh;.*qUnpK_Z:-+PQ8U $.d>N,%kks-F<kcAWiX8.,)\/;lPng<Zm<+wwBxU1XQ6sttt~G(.A%o"QH9$HrcNMp-i#o"aYO`SNxiO}GqV82d'EsDs6;l-p \9oR>&ad7)-*+qL<h`:|@KjHeSza/dev/sda

bc8b26a149ad79305567f89e9c5353bd

rkoll001@puppy:~/home-traces 339% grep -n bc8b26a149ad79305567f89e9c5353bd ../topgun.sda.hash.blocks | head
580081:bc8b26a149ad79305567f89e9c5353bd
580085:bc8b26a149ad79305567f89e9c5353bd
580089:bc8b26a149ad79305567f89e9c5353bd
580093:bc8b26a149ad79305567f89e9c5353bd
580097:bc8b26a149ad79305567f89e9c5353bd
580101:bc8b26a149ad79305567f89e9c5353bd
580105:bc8b26a149ad79305567f89e9c5353bd
580108:bc8b26a149ad79305567f89e9c5353bd
580112:bc8b26a149ad79305567f89e9c5353bd
580115:bc8b26a149ad79305567f89e9c5353bd
rkoll001@puppy:~/home-traces 340% ../collection_static/find-block topgun 580081
2: [4353615,297314954] 35888 4640711
root@topgun:~/ricardo_stuff/vcache/code/collection_static# ./hashblock /dev/sda 4640711 8
     1 434acb2d12010f6b33cf28bb721c107c 4b444962693e254c5e75252a2444272a
     2 28d1e037fcf37d68dd14edc830dc16b3 3629673f6b385767516e257636526921
     3 9a2ec9e2ea991177d8c01dc1b8083c8a 3c2861294d55232e454c62526b225029
     4 01ea271001bca0d9d1c55922684373a2 79777c7128493b322821347d5d422133
     5 31c4db8d8d3dddd12d03d447d04dc0e2 423e4f27573c35212f3e733b5b2a2c30
     6 4cceb92ad6b1e850bd26caf5dd115c75 7e282b25213f44737439385a40684c36
     7 79a2e4df733beeafbdc8c80e309a7fcb 3067464f725d79255f68473641622e46
     8 13f345b7f8ecef9e05be5783d2c7c8f3 2a582c68263e5c325b7c2e3f3d5a535d
rkoll001@puppy:~/home-traces 329% grep 434acb2d12010f6b33cf28bb721c107c ../topgun-hashes | head
4640711 434acb2d12010f6b33cf28bb721c107c 4b444962693e254c5e75252a2444272a
4640743 434acb2d12010f6b33cf28bb721c107c 4b444962693e254c5e75252a2444272a

3df1244f6143869f52abf2a1d73d0c0f

rkoll001@puppy:~/home-traces 343% grep -n 3df1244f6143869f52abf2a1d73d0c0f ../topgun.sda.hash.blocks | head
2:3df1244f6143869f52abf2a1d73d0c0f
3:3df1244f6143869f52abf2a1d73d0c0f
rkoll001@puppy:~/home-traces 345% ../collection_static/find-block topgun 2
0: [63,144584] 2 71
root@topgun:~/ricardo_stuff/vcache/code/collection_static# ./hashblock /dev/sda 71 8
       1 bf619eac0cdf3f68d496ea9344137e8b 00000000000000000000000000000000
       2 bf619eac0cdf3f68d496ea9344137e8b 00000000000000000000000000000000
       3 bf619eac0cdf3f68d496ea9344137e8b 00000000000000000000000000000000
       4 bf619eac0cdf3f68d496ea9344137e8b 00000000000000000000000000000000
       5 bf619eac0cdf3f68d496ea9344137e8b 00000000000000000000000000000000
       6 bf619eac0cdf3f68d496ea9344137e8b 00000000000000000000000000000000
       7 bf619eac0cdf3f68d496ea9344137e8b 00000000000000000000000000000000
       8 bf619eac0cdf3f68d496ea9344137e8b 00000000000000000000000000000000
  

Trace Static similarity:

TODO: add a per-process analysis: which processes issue the most requests for blocks that have many copies?

webmail + online

rkoll001@wolf:~ 209% awk 'BEGIN {s = 0; i = 0;} {s += $2; i++;} END { print s/i}' final-traces/2111.web.read.blocks.copies
1.60422

Interestingly, 1.60 is lower than the static similarity of about 2.07 computed above for webmail+online. Another day (12/11) shows these numbers:

rkoll001@puppy:~ 42% awk 'BEGIN {s = 0; i = 0;} {if ($2 < 10) {s += $2; i++;} } END { print s/i}' final-traces/1211.web.read.blocks.copies
1.83932
rkoll001@puppy:~ 43% awk 'BEGIN {s = 0; i = 0;} {if ($2 < 100) {s += $2; i++;} } END { print s/i}' final-traces/1211.web.read.blocks.copies
1.93666
rkoll001@puppy:~ 44% awk 'BEGIN {s = 0; i = 0;} {if ($2 < 1000) {s += $2; i++;} } END { print s/i}' final-traces/1211.web.read.blocks.copies
1.93666

without removing extreme values it would have been:

rkoll001@wolf:~ 230% awk 'BEGIN {s = 0; i = 0;} {s += $2; i++;} END { print s/i}' final-traces/1211.web.read.blocks.copies
721.38

furthermore, the reads are to the zero block:

rkoll001@wolf:~ 231% grep 1193299 final-traces/1211.web.read.blocks.copies
3df1244f6143869f52abf2a1d73d0c0f 1193299 13758159

made first by sshd:

rkoll001@wolf:~ 245% grep 2736191 mail-traces/1211.online.0
3395788967958299 R 31238 sshd 3 0 2736191 8192 268435465 0 0 0 0 bf619eac0cdf3f68d496ea9344137e8bbf619eac0c

and then by swapper:

rkoll001@wolf:~ 252% grep 13758159 mail-traces/1211.webmail.0
3422630657163567 R 0 swapper 3 0 13758159 81920 805306377 0 0 0 0 bf619eac0cdf3f68d496ea9344137e8bbf619eac0c

just to confirm.. yes it is the zero block:

7415 bf619eac0cdf3f68d496ea9344137e8b 00000000000000000000000000000000

now trace static similarity for a week of traces:

rkoll001@wolf:~ 338% awk 'BEGIN {s = 0; i = 0;} {if ($2 < 1000) {s += $2; i++;} } END { print s/i}' final-traces/1111-1711.web.read.blocks.copies
1.99529
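The awk one-liners in this section average the copies column of a `*.copies` file, optionally ignoring rows whose count reaches a cutoff. A rough Python equivalent is sketched below to show why the cutoff matters. The function name `mean_copies` and the sample rows are invented for the example; only the column position (awk's `$2`, index 1 here) is taken from the commands above.

```python
def mean_copies(rows, column=1, cutoff=None):
    """Average the copies column, keeping only values below `cutoff` if given."""
    values = [int(row.split()[column]) for row in rows if row.strip()]
    if cutoff is not None:
        values = [v for v in values if v < cutoff]
    return sum(values) / len(values)


# Synthetic rows in the assumed "hash copies sector" layout.
rows = [
    "hash1 1 100",
    "hash2 2 200",
    "hash3 3 300",
    "hash4 999994 400",   # one heavily duplicated block, like the zero block
]

trimmed = mean_copies(rows, cutoff=1000)   # 2.0
untrimmed = mean_copies(rows)              # 250000.0
```

A single heavily duplicated block dominates the untrimmed mean, which is why the averages reported here are computed with a cutoff.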

mail

rkoll001@puppy:~ 45% awk 'BEGIN {s = 0; i = 0;} {if ($2 < 10) {s += $2; i++;} } END { print s/i}' final-traces/1211.cheetah.read.blocks.copies
1.1387
rkoll001@puppy:~ 46% awk 'BEGIN {s = 0; i = 0;} {if ($2 < 100) {s += $2; i++;} } END { print s/i}' final-traces/1211.cheetah.read.blocks.copies
1.23882
rkoll001@puppy:~ 47% awk 'BEGIN {s = 0; i = 0;} {if ($2 < 1000) {s += $2; i++;} } END { print s/i}' final-traces/1211.cheetah.read.blocks.copies
2.47925

considering the extreme values (i.e. including accesses to the zero page) it would be:

rkoll001@puppy:~ 48% awk 'BEGIN {s = 0; i = 0;} {s += $2; i++;} END { print s/i}' final-traces/1211.cheetah.read.blocks.copies
1.22369e+06

the number of accesses to the zero page is:

rkoll001@puppy:~ 51% grep 3df1244f6143869f52abf2a1d73d0c0f final-traces/1211.cheetah.read.blocks.copies | wc -l
83856

out of a total number of block accesses of:

rkoll001@puppy:~ 52% wc -l final-traces/1211.cheetah.read.blocks.copies
2765642 final-traces/1211.cheetah.read.blocks.copies

now for a week of traces:

bash-3.2$ awk 'BEGIN {s = 0; i = 0;} {if ($3 < 10) {s += $3; i++;} } END { print s/i}' final-traces/1111-1711.mail.read.blocks.copies 
1.16444
bash-3.2$ awk 'BEGIN {s = 0; i = 0;} {if ($3 < 100) {s += $3; i++;} } END { print s/i}' final-traces/1111-1711.mail.read.blocks.copies 
1.27565
bash-3.2$ awk 'BEGIN {s = 0; i = 0;} {if ($3 < 1000) {s += $3; i++;} } END { print s/i}' final-traces/1111-1711.mail.read.blocks.copies 
2.26263
bash-3.2$ awk 'BEGIN {s = 0; i = 0;} {{s += $3; i++;} } END { print s/i}' final-traces/1111-1711.mail.read.blocks.copies 
1.2478e+06

now for a month of traces

rkoll001@puppy:~ 16% awk 'BEGIN {s = 0; i = 0;} {if ($3 < 10) {s += $3; i++;} } END { print s/i}' final-traces/0111-3011.mail.read.blocks.copies 
1.13465
rkoll001@puppy:~ 17% awk 'BEGIN {s = 0; i = 0;} {if ($3 < 100) {s += $3; i++;} } END { print s/i}' final-traces/0111-3011.mail.read.blocks.copies
1.25955
rkoll001@puppy:~ 19% awk 'BEGIN {s = 0; i = 0;} {if ($3 < 1000) {s += $3; i++;} } END { print s/i}' final-traces/0111-3011.mail.read.blocks.copies
2.32995

Dynamic Similarity

% .. | ruby dynamic-similarity-blocks.rb
dynamic similarity
number of hits

Online (alone)

reads

rkoll001@wolf:~ 38% cat final-traces/1211.web.traces | ./collection_static/trace-extract-blocks | ruby dynamic-similarity-blocks.rb
205706
780
rkoll001@wolf:~ 42% cat final-traces/1211.web.traces | ./collection_static/trace-extract-blocks | ruby dynamic-similarity-sectors.rb
216239
742

writes

rkoll001@wolf:~ 35% cat final-traces/1211.web.traces | ./collection_static/trace-extract-blocks | ruby dynamic-similarity-blocks.rb
2765
85517         
rkoll001@wolf:~ 45% cat final-traces/1211.web.traces | ./collection_static/trace-extract-blocks | ruby dynamic-similarity-sectors.rb
35596
931259                                                             

reads + writes

rkoll001@wolf:~ 50% cat final-traces/1211.web.traces | ./collection_static/trace-extract-blocks | ruby dynamic-similarity-blocks.rb 
4599
86297
rkoll001@wolf:~ 48% cat final-traces/1211.web.traces | ./collection_static/trace-extract-blocks | ruby dynamic-similarity-sectors.rb
35740
932001

Online + Webmail

day reads

rkoll001@wolf:~ 13% cat final-traces/1211.web.traces | ./collection_static/trace-extract-blocks| ruby dynamic-similarity-sectors.rb ; 
521599
1655
rkoll001@wolf:~ 14% cat final-traces/1211.web.traces | ./collection_static/trace-extract-blocks| ruby dynamic-similarity-blocks.rb 
72254
56428

day writes

rkoll001@wolf:~ 17% cat final-traces/1211.web.traces | ./collection_static/trace-extract-blocks | ruby /tmp/dynamic-similarity-sectors.rb
71541
1832590
rkoll001@wolf:~ 18% cat final-traces/1211.web.traces | ./collection_static/trace-extract-blocks | ruby /tmp/dynamic-similarity-blocks.rb
7168
320962

day reads + writes

rkoll001@wolf:~ 20% cat final-traces/1211.web.traces | ./collection_static/trace-extract-blocks | ruby /tmp/dynamic-similarity-sectors.rb
71947
1834245
rkoll001@wolf:~ 21% cat final-traces/1211.web.traces | ./collection_static/trace-extract-blocks | ruby /tmp/dynamic-similarity-blocks.rb
16900
377390

week read

rkoll001@wolf:~ 65% cat final-traces/1111-1711.web.traces | ./collection_static/trace-extract-blocks | ruby /tmp/dynamic-similarity-sectors.rb
928238
1062015
rkoll001@wolf:~ 66% cat final-traces/1111-1711.web.traces | ./collection_static/trace-extract-blocks | ruby /tmp/dynamic-similarity-blocks.rb
583077
1144024

week write

rkoll001@wolf:~ 68% cat final-traces/1111-1711.web.traces | ./collection_static/trace-extract-blocks | ruby /tmp/dynamic-similarity-sectors.rb
93025
4764854
rkoll001@wolf:~ 69% cat final-traces/1111-1711.web.traces | ./collection_static/trace-extract-blocks | ruby /tmp/dynamic-similarity-blocks.rb
5552
1910869

week read + write

rkoll001@wolf:~ 61% cat final-traces/1111-1711.web.traces | ./collection_static/trace-extract-blocks | ruby /tmp/dynamic-similarity-sectors.rb
245253
5826869
rkoll001@wolf:~ 62% cat final-traces/1111-1711.web.traces | ./collection_static/trace-extract-blocks | ruby /tmp/dynamic-similarity-blocks.rb
221829
3054893

Cheetah

read

rkoll001@puppy:~ 11% cat final-traces/1211.mail.traces.blocks | ruby dynamic-similarity-sectors.rb
2587904
1162035
rkoll001@puppy:~ 12% cat final-traces/1211.mail.traces.blocks | ruby dynamic-similarity-blocks.rb 
2347697
1260203

writes

rkoll001@puppy:~ 15% cat final-traces/1211.mail.traces.blocks | ruby dynamic-similarity-sectors.rb
537507
18828186
rkoll001@puppy:~ 16% cat final-traces/1211.mail.traces.blocks | ruby dynamic-similarity-blocks.rb
199689
17409314

read + writes

rkoll001@puppy:~ 43% cat final-traces/1211.mail.traces.blocks | ruby dynamic-similarity-sectors.rb
656697
19990221
rkoll001@puppy:~ 42% cat final-traces/1211.mail.traces.blocks | ruby dynamic-similarity-blocks.rb
344681
18669517
blocks requested for read: 3454044 (14 GB)
blocks requested for read and write: 22972351

Cache sizes and LRU vs. ARC

small = 10000 entries
grande (large) = 100000 entries
root@armageddon:/home/ric/module# wc -l /tmp/small-* /tmp/grande-*
10005 /tmp/small-arc
 7516 /tmp/small-lru
74229 /tmp/grande-arc
 7978 /tmp/grande-lru
total number of requests 34707