Reliable Storage
Can disk drives be made reliable enough to identify and recover from errors that cause data corruption?
Disk drives exhibit a variety of storage errors that may occur due to media imperfections or buggy firmware code. Standalone machines do not employ any sophisticated data-protection measures to detect or recover from these errors.
There is a need for a strategy that protects data against partial disk failures.
Participants
Project Goals
Ensure data reliability even in the event of:
Latent sector errors
Torn writes
Lost writes
Misdirected writes
Silent data corruptions
Design
Store checksums with blocks to identify lost/torn/misdirected writes. Checksums for a segment of “n-1” contiguous blocks can be stored as the last block “n” in the segment.
Store parity information for every block to identify latent sector errors and corruptions and recover from any kind of error. Parity is not co-located with the data blocks.
Enable periodic scrubbing of parity and data blocks to ensure their validity.
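The checksum layout above can be sketched as follows. This is a minimal illustration, not the project's implementation: block size, segment width, and the use of CRC32 are all assumptions, and the function names are hypothetical.

```python
import zlib

BLOCK_SIZE = 4096       # assumed block size
SEG_DATA_BLOCKS = 7     # the "n-1" data blocks; block "n" holds their checksums

def build_segment(data_blocks):
    """Pack n-1 data blocks plus a final block holding their CRC32 checksums."""
    assert len(data_blocks) == SEG_DATA_BLOCKS
    sums = b"".join(zlib.crc32(b).to_bytes(4, "little") for b in data_blocks)
    checksum_block = sums.ljust(BLOCK_SIZE, b"\x00")  # pad to one full block
    return list(data_blocks) + [checksum_block]

def verify_block(segment, i):
    """Check data block i against its stored checksum in the last block."""
    stored = int.from_bytes(segment[-1][4 * i : 4 * i + 4], "little")
    return zlib.crc32(segment[i]) == stored
```

A lost, torn, or misdirected write leaves a block whose contents no longer match the stored checksum, so `verify_block` fails and recovery from parity can be triggered.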
read_block(X) {
    read block X and associated checksum;
    verify checksum;
    if (success) done;
    else {
        read stripe;
        rebuild data for X;
        write X;
    }
}
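The read path above can be sketched concretely. This assumes RAID-4-style XOR parity and CRC32 checksums, which are illustrative choices rather than part of the original design:

```python
import zlib
from functools import reduce

def xor_blocks(blocks):
    # XOR all blocks together byte-by-byte (RAID-4-style parity)
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

def read_block(stripe, parity, checksums, i):
    """read_block(X): verify the checksum; on failure, rebuild X from the
    other blocks in the stripe plus parity, then write X back in place."""
    if zlib.crc32(stripe[i]) == checksums[i]:
        return stripe[i]  # fast path: checksum matched
    rebuilt = xor_blocks([b for j, b in enumerate(stripe) if j != i] + [parity])
    stripe[i] = rebuilt   # "write X"
    return rebuilt
```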
write_block(X) {
    write block X and checksum to log; /* (an optimization) */
    during_idle_time {
        /* do the following in batches for optimization */
        for each entry in log {
            read prev version of block from static location;
            read parity;
            compute new parity;
            write in static location;
            write parity;
            delete log entry;
        }
    }
}
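The idle-time log flush can be sketched as below, using the subtractive parity identity new_parity = old_parity XOR old_data XOR new_data. The stripe width and in-memory `disk`/`parity` structures are assumptions made for illustration:

```python
STRIPE_WIDTH = 4  # assumed number of data blocks covered by one parity block

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def flush_log(log, disk, parity):
    """Drain the write log during idle time: for each entry, update the
    stripe's parity subtractively, then write the block to its static
    location and delete the log entry."""
    while log:
        addr, new_data = log.pop(0)        # oldest entry first
        stripe = addr // STRIPE_WIDTH
        old_data = disk[addr]              # read prev version from static location
        # new_parity = old_parity XOR old_data XOR new_data
        parity[stripe] = xor(xor(parity[stripe], old_data), new_data)
        disk[addr] = new_data              # write in static location
```

Because each entry needs only the old block and the old parity, the flush avoids reading the whole stripe, which is what makes batching during idle periods cheap.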
Some optimizations
Writes could be logged on a separate disk partition and flushed to static locations during idle periods.
Writes can be buffered on an ECD and evicted once their parity has been stored.
Cache contents can be used to make wise allocation decisions that reduce the overhead of the additional read-modify-write (RMW) I/Os.
Raju's early notes
File systems are already complex. The EIO work shows that complex file systems neglect error conditions. This is a natural consequence of the complexity, and it also highlights the reliance of file system developers on the underlying block device.
Possible solution: Intra-disk redundancy for single disk reliability (Improving the reliability of commodity disk drives)
Split disk into equi-sized cylinder groups
1 Parity cylinder group (can be an SSD alternatively – cost?)
RAID-4 like parity
Block-level indirection.
Overheads - keeping parities synchronized
Cache popular parity and compute subtractive parity.
Update parity in background and in batches with low overhead (how about nightly updates)
How is consistency addressed after a power failure? Parity can be outdated … can some additional bits be maintained to indicate dirty/out-of-date parity?
sort of like a continuous backup — is performance going to be worse?
Delayed updating of parity will address temporal locality in data corruption (see Lakshmi's fast08 paper)
Key is to create parity across uncorrelated data
Questions: What about alternate solutions, e.g. backup? RAID-1?