projects:iodedup:start

projects:iodedup:start [m/d/Y H:i]
ric
projects:iodedup:start [m/d/Y H:i] (current)
raju
Line 9: Line 9:
 ===== Abstract =====
 Duplication of data in storage systems is becoming increasingly common. We introduce I/O Deduplication, a storage optimization that utilizes content similarity for improving I/O performance by eliminating I/O operations and reducing the mechanical delays during I/O operations. I/O Deduplication consists of three main techniques: content-based caching, dynamic replica retrieval, and selective duplication. Each of these techniques is motivated by our observations with I/O workload traces obtained from actively-used production storage systems, all of which revealed surprisingly high levels of content similarity for both stored and accessed data. Evaluation of a prototype implementation using these workloads revealed an overall improvement in disk I/O performance of 28-47% across these workloads. Further breakdown also showed that each of the three techniques contributed significantly to the overall performance improvement.
 +
 +
 +
 +
 +
 +
  
 ===== Publications =====
  
-  * **I/O Deduplication: Utilizing Content Similarity to Improve I/O Performance** {{iodedup-fast10.pdf|pdf}}\\ Ricardo Koller, Raju Rangaswami\\ Proceedings of File and Storage Technologies (FAST), February, 2010.
+  * **I/O Deduplication: Utilizing Content Similarity to Improve I/O Performance** {{iodedup-fast10.pdf|pdf}}\\ Ricardo Koller, Raju Rangaswami\\ Proceedings of USENIX File and Storage Technologies (FAST), February, 2010.
  
 +
 +  * **I/O Deduplication:​ Utilizing Content Similarity to Improve I/O Performance** {{iodedup-acmtos10.pdf|pdf}}\\ Ricardo Koller, Raju Rangaswami\\ ACM Transactions on Storage, 6(3), September 2010.
  
  
 ===== Traces =====
-3 weeks of traces for the following workloads:
+3 weeks of traces collected during the period 11/01/08-11/21/08. All the systems were running Linux using the ext3 file system.
  
 ^ Files ^ Description ^
-| [[http://doomsday.cs.fiu.edu/iodedup/web-vm.tar.gz]] | CS department webmail proxy and online course management. |
-| [[http://doomsday.cs.fiu.edu/iodedup/mail.tar.gz]] | CS department's mail server traces. It includes all the inboxes of mails in the CS department. |
-| [[http://doomsday.cs.fiu.edu/iodedup/homes.tar.gz]] | Research group activities: developing, testing, experiments, technical writing, plotting. |
+| {{:projects:iodedup:mail.tar.xz|}} (LZMA compression) | CS department's mail server traces. It includes all the inboxes of mails in the CS department. |
+| {{:projects:iodedup:homes.tar.gz|}} | Research group activities: developing, testing, experiments, technical writing, plotting. |
+| {{:projects:iodedup:web-vm.tar.gz|}} | CS department webmail proxy and online course management. |
  
-The format is as follows (zero block is `3df1244f6143869f52abf2a1d73d0c0f`):
+The traces files (one per day) are in ASCII and each record is as follows:
  
->> `[ts] [pid] [process] [lba] [size in 512 Bytes blocks] [Write or Read] [major device number] [minor device number] [MD5 per 4096 Bytes]`
+''[ts in ns] [pid] [process] [lba] [size in 512 Bytes blocks] [Write or Read] [major device number] [minor device number] [MD5 per 4096 Bytes]''
  
-In the case of the home traces, the format is different for the digests:
+In the case of the homes traces, the format is different for the digests:
  
->> {{{[ts] [pid] [process] [lba] [size in 512 Bytes blocks] [Write or Read] [major device number] [minor device number] [MD5 per 512 Bytes]}}}
+''[ts in ns] [pid] [process] [lba] [size in 512 Bytes blocks] [Write or Read] [major device number] [minor device number] [MD5 per 512 Bytes]''
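
To illustrate the record layout described above, here is a minimal Python sketch that splits one trace line into its nine fields. It is not part of the trace distribution. The names ''TraceRecord'' and ''parse_record'' are hypothetical, the sample line is made up, and the code assumes that fields are separated by whitespace and that the read/write flag is a single token whose exact spelling is left unchecked.

```python
from typing import NamedTuple


class TraceRecord(NamedTuple):
    """One trace record; field order follows the format line above."""
    ts_ns: int        # timestamp in nanoseconds
    pid: int          # process id
    process: str      # process name
    lba: int          # logical block address
    size_blocks: int  # request size in 512-byte blocks
    op: str           # write/read flag, kept as the raw token
    major: int        # major device number
    minor: int        # minor device number
    md5: str          # content digest (per 4096 bytes, or per 512 bytes for homes)


def parse_record(line: str) -> TraceRecord:
    """Parse one whitespace-separated trace line (assumed layout)."""
    parts = line.split()
    if len(parts) < 9:
        raise ValueError(f"expected at least 9 fields, got {len(parts)}")
    ts, pid = parts[0], parts[1]
    # Take the last six fields from the end so that a process name
    # containing spaces (an assumption, not confirmed) still parses.
    lba, size, op, major, minor, md5 = parts[-6:]
    process = " ".join(parts[2:-6])
    return TraceRecord(int(ts), int(pid), process, int(lba),
                       int(size), op, int(major), int(minor), md5)


# Made-up sample line, for illustration only.
SAMPLE = "89966527834841 4892 syslogd 904265560 8 W 6 0 0123456789abcdef0123456789abcdef"
record = parse_record(SAMPLE)
request_bytes = record.size_blocks * 512  # 8 blocks of 512 bytes
```

Because the digest granularity differs between the homes traces and the other two, any code that compares digests across traces would have to account for that difference separately.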
projects/iodedup/start.1267759885.txt.gz · Last modified: m/d/Y H:i by ric