==+== FAST '09 Paper Review Form
==-== Set the paper number and fill out lettered sections A through G.
==-== DO NOT CHANGE LINES THAT START WITH "==+=="!
==+== RAJU MUST REPLACE THIS LINE BEFORE UPLOADING
==+== Begin Review
==+== Paper #000000
==-== Replace '000000' with the actual paper number.

==+== Review Readiness
==-== Enter "Ready" here if the review is ready for others to see:

Ready

==+== A. Overall merit
==-== Enter a number from 1 to 5.
==-== Choices: 1. Reject
==-==          2. Weak reject
==-==          3. Weak accept
==-==          4. Accept
==-==          5. Strong accept

1

==+== B. Novelty
==-== Enter a number from 1 to 5.
==-== Choices: 1. Published before
==-==          2. Done before (not necessarily published)
==-==          3. Incremental improvement
==-==          4. New contribution
==-==          5. Surprisingly new contribution

2

==+== C. Longevity
==-== How important will this work be over time?
==-== Enter a number from 1 to 5.
==-== Choices: 1. Not important now or later
==-==          2. Low importance
==-==          3. Average importance
==-==          4. Important
==-==          5. Exciting

1

==+== D. Reviewer expertise
==-== Enter a number from 1 to 4.
==-== Choices: 1. No familiarity
==-==          2. Some familiarity
==-==          3. Knowledgeable
==-==          4. Expert

4

==+== E. Paper summary

The authors propose a simple method for allocating space on solid-state drives from the file system's standpoint, aiming to improve random-write performance on these devices. In developing this method, they propose a write cost model by analogy with HDD costs; the model assumes the file system uses the overwrite approach. The method divides the logical address space into fixed-size units, called "logical blocks," whose size is derived empirically. Experiments on real hardware show that the proposed allocation scheme reduces overall response time compared to ext2 baseline performance.

==+== F. Comments for author

High-level comments:

* The problem area chosen is timely. There is substantial research still needed on operating system support for SSDs.
* The authors propose to examine overwrite-based file systems and propose minor changes to space allocation. However, overwrite-based file systems are already known to perform miserably with Flash technology. Several recent proposals and implementations of logging file systems, including some in the mainline kernel, are known to work much better with SSDs than existing overwrite-based file systems [1,2,3,4,5]. Thus, evaluating a new proposal for file system space management, which, while not explicitly stated by the authors, falls squarely in the family of logging-based file systems (see more on this below), against overwrite-based file systems is incorrect.

* The key technique proposed is a space allocation scheme for file systems that store data on SSDs. This technique partitions the SSD logical address space into empirically sized "logical block units." Newly allocated blocks are always written sequentially, to one or a few logical block units at any given time. There is a fundamental problem with this approach: the authors discuss allocation at length but do not address deletion and space fragmentation. Over time, as files get deleted, most logical blocks will be internally fragmented, and there will no longer exist a convenient set of blocks with free space at their tails. For this allocation to succeed, the file system must periodically defragment (clean?) to create free blocks. This is not addressed. More generally, given that fragmentation must be eliminated with some cleaning mechanism, the proposed approach shares characteristics common to the logging family of file systems, and thus the evaluation must compare against members of this family rather than ext2.

* There is another class of work on optimizing existing file systems for use with SSDs. See the MFT work [6], which builds a logging block-device layer underneath existing overwrite-based file systems.
The paper must address the benefits of the proposed approach at the conceptual/design level in comparison to not just MFT but also logging file systems.

Other low-level comments:

* The Flexible I/O benchmark seems to be oriented toward testing random I/O patterns. However, it is not clear from the paper how these random I/O patterns are executed, or what the tool actually does. The reference site "[5] http://freshmeat.net/projects/fio/" does not provide further explanation in this respect, either. The improvement in random-write patterns thus remains to be shown explicitly, since Figure 8 shows response-time improvements that differ by device; e.g., the Mtron experiment shows a modest 25% reduction in response time. The proposed technique therefore does not uniformly benefit all devices.

* A better explanation of the improvements seen is that by allocating SSD free blocks from different segments, you are using distinct allocation pools in the device, thus avoiding expensive full-merge operations because random writes are distributed over more free log blocks.

* Consider the reliability impact of the proposed allocation scheme. For instance, if a single logical block is overused, it may decrease the SSD's overall lifetime, especially if the device does not internally implement dynamic wear-leveling.

* Increasing the focal factor may not be the right explanation for the performance improvement seen. It could be that merely translating random writes into sequential writes is what helps. You need to provide more detail on the average I/O size achievable given what you term "sync" operations to justify the focal-factor explanation.

* After choosing the logical block size, it remains unanswered what would happen to performance if a different size were selected. There is no solid analysis to back up this decision. It may be the case, for instance, that re-running the experiments with half the selected size leaves the results almost unchanged.
* There is an implicit assumption that the logical block size will remain "optimal" over time. In practice, FTLs are known to allocate SSD blocks dynamically according to their policies, wear-leveling factors among others, which will break the mapping from "logical blocks" to physical SSD space. Thus, you should consider how performance is affected by the proposed technique as SSD blocks wear out. In general, long-running experiments that write new data amounting to more than 3-4x the size of the SSD need to be explored for conclusive findings.

* In the Clock-Space scheme, it would be nice to know the worst-case overhead of checking logical blocks one at a time for a write operation.

Minor comments (typos / missing refs / etc.):

* Page 8, section 5.1: wrong table number, "Table 2" -> "Table 1"
* Page 12, section 7: "There are still ..." -> "There is still ..."
* Abstract: "that they possess" -> "they possess"
* Section 4.1: "evidenced" -> "evident"
* Section 6.3: "read results Figure 8" -> "read results in Figure 8"
* Section 6.3: "From this, we can deduce ... is happening" -> "From this, we can deduce that many merge operations that incur multiple copy operations, and possibly an erase operation, are happening."
* Conclusion: "There are still much" -> "There is still much"

Related work to reference:

[1] David Woodhouse. JFFS: The Journalling Flash File System (also see the newer JFFS2, which is part of the mainline Linux kernel)
[2] YAFFS. http://www.yaffs.net/yaffs-overview
[3] UBIFS. http://www.linux-mtd.infradead.org/doc/ubifs.html (included in the mainline Linux kernel)
[4] NILFS. http://www.nilfs.org/en/
[5] Btrfs. http://btrfs.wiki.kernel.org/
[6] Managed Flash Technology. http://www.easyco.com/

==+== G. Comments for PC (hidden from authors)

==+== End Review