==+== FAST '09 Paper Review Form
==-== Set the paper number and fill out lettered sections A through G.
==-== DO NOT CHANGE LINES THAT START WITH “==+==”!
==+== RAJU MUST REPLACE THIS LINE BEFORE UPLOADING
==+== Begin Review
==+== Paper #000000
==-== Replace '000000' with the actual paper number.
==+== Review Readiness
==-== Enter “Ready” here if the review is ready for others to see:
[Enter your choice here]
==+== A. Overall merit
==-== Enter a number from 1 to 5.
==-== Choices: 1. Reject
==-== 2. Weak reject
==-== 3. Weak accept
==-== 4. Accept
==-== 5. Strong accept
==+== B. Novelty
==-== Enter a number from 1 to 5.
==-== Choices: 1. Published before
==-== 2. Done before (not necessarily published)
==-== 3. Incremental improvement
==-== 4. New contribution
==-== 5. Surprisingly new contribution
==+== C. Longevity
==-== How important will this work be over time?
==-== Enter a number from 1 to 5.
==-== Choices: 1. Not important now or later
==-== 2. Low importance
==-== 3. Average importance
==-== 4. Important
==-== 5. Exciting
==+== D. Reviewer expertise
==-== Enter a number from 1 to 4.
==-== Choices: 1. No familiarity
==-== 2. Some familiarity
==-== 3. Knowledgeable
==-== 4. Expert
==+== E. Paper summary
The paper presents Container I/O, which allows applications to directly access file system metadata in its on-disk format, and to control data caching and transfer; this goes beyond what current stream- and mmap-based kernel FS interfaces support. This is accomplished through the container abstraction, which lets each application maintain its own view of a per-container contiguous virtual block space mapped onto the underlying logical block addresses. Applications can request remapping of virtual block numbers (VBNs) to effect data-layout modifications. Container I/O consults applications during page replacement; applications can thus give the OS caching hints, improving their control over cache space. Data transfers are optimized in a fashion similar to mmap mechanisms.
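For the record, here is roughly how I parsed the proposed interface while reading. None of the identifiers below appear in the paper; this is only a sketch (in C) of the semantics as I understood them, with invented names.

    #include <stdint.h>
    #include <stddef.h>

    typedef uint64_t vbn_t;   /* per-container virtual block number */
    typedef uint64_t lbn_t;   /* logical block number on the device */

    /* Hypothetical container calls -- my reading of the design, not the
     * paper's actual API. */
    int ctr_create(const char *name, size_t nblocks);   /* new container   */
    int ctr_map(int ctr, vbn_t vbn, lbn_t lbn);         /* bind VBN to LBN */
    int ctr_remap(int ctr, vbn_t vbn, lbn_t new_lbn);   /* change layout   */

    /* Page replacement consults the application: the callback returns
     * nonzero to veto eviction of a hot block. */
    typedef int (*ctr_evict_cb)(int ctr, vbn_t victim);
    int ctr_set_evict_cb(int ctr, ctr_evict_cb cb);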
==+== F. Comments for author
This is a different design view, following the Exokernel approach of delivering application-managed disks. While the proposed approach seems novel, I had a hard time following this paper due to its organization and presentation problems. Another problem was that I could not bring myself to get excited about the content: the authors spend too much time on how metadata operations can be sped up, and not enough on the really important issues such as data layout and caching control, which I believe are the more interesting and useful consequences of application-managed disks. The Exokernel evaluation is more along these lines as well.
Container I/O's claims include:
+ safely exposing file system metadata to applications
+ applications can directly control the on-disk layout of their data
+ applications can directly control caching
+ applications can directly control data transfer
Some high-level comments:
- It would have been much easier to relate to the concepts of the paper if the Container I/O design had first been presented at a high level, motivating the various design elements, rather than starting out with several definitions and design decisions right away. I know this is not easy when several concepts need motivating within a high-level framework, but surely the current writeup can be improved.
- The claim that application working sets fit in memory needs much better support. The validity of the statement depends on highly variable factors such as hardware configuration and workload. More importantly, why is this statement even required? The key benefit of application-managed disks, in my opinion, is better I/O performance, which is what I suggest focusing on rather than optimizing in-memory operations.
- The statement “all logical blocks are always owned by a single container” occurs more than once on page 5, but I could not figure out how it could be true while the system supports multiple containers. Could it be that each container owns the entire disk at any instant, but in a time-shared fashion? That would require disk-usage state to be explicitly shared across applications, which did not seem to be happening. The only alternative is that this is a mis-statement that should read “each logical block is owned by a single container”, which is consistent with the rest of the writeup. I eventually went with this reading, but I had to spend a substantial amount of time confused about a very fundamental point.
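To spell out the reading I settled on, here is a minimal sketch (my own notation, not the paper's) of per-block ownership, which is the only interpretation I found consistent with multiple containers:

    #include <stdint.h>

    /* One owner per logical block; 0 means unowned, container ids are >= 1.
     * Purely my mental model of the ownership invariant, not the paper's
     * implementation. */
    #define NBLOCKS (1u << 20)
    static int owner[NBLOCKS];           /* zero-initialized: all unowned */

    int assign_block(uint32_t lbn, int ctr)
    {
        if (lbn >= NBLOCKS || owner[lbn] != 0)
            return -1;                   /* out of range or already owned */
        owner[lbn] = ctr;
        return 0;
    }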
- Data layout optimization matters much more across multiple files, since file systems already optimize for sequential access, which is generally a good layout for a single file. It would therefore be better to motivate Section 2.4 with multi-file layout rather than single-file layout.
- One issue I had trouble with is that directory entries get exposed irrespective of access permissions if they all lie within the same logical block. You state that “Enforcing directory level metadata access control should solve this problem”, but I could not gather from the rest of the paper how this would be implemented. In general, the safety of exposing metadata (even read-only) to processes is not comprehensively addressed.
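To illustrate why I find this worrying: a single logical block typically packs entries for many files with different owners, so mapping the raw block into an application reveals all of them. The layout below is purely illustrative, not the paper's on-disk format:

    #include <stdint.h>

    /* Illustrative (invented) on-disk directory entry; a 4 KB block holds
     * 128 of these. Exposing the raw block read-only reveals every name
     * and inode number it contains, regardless of per-file permissions. */
    struct dirent_on_disk {
        uint32_t inode;
        char     name[28];
    };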
- Would a container *typically* be used to store a file? An entire directory? All FS elements belonging to a single user? Something else? I had a hard time imagining how containers would typically be used by an application. The paper needs to do a much better job of introducing and motivating the concept of containers.
- Most of the evaluation focuses on metadata operation efficiency. I would be much more convinced if this efficiency were put in perspective by demonstrating actual performance improvements for a metadata-intensive benchmark that also does disk I/O, such as PostMark. While the file-split producer/consumer example is good, I am looking for more compelling, layout-sensitive, widely-used applications (e.g., web servers, as used in the Exokernel work).
Other detailed comments:
- I could not understand why “logical blocks assigned to files are non-transferrable in current file systems and moving data requires at least 1 copy”. In Unix file systems, “mv” within a file system does not physically copy blocks; it simply updates directory entries so that they point to the existing file data.
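To make this concrete, a move within one file system is a pure metadata operation, e.g. via rename(2); the paths below are placeholders:

    #include <stdio.h>

    /* rename(2) rewrites directory entries only; the file's data blocks
     * stay where they are, so no data copy takes place. */
    int main(void)
    {
        if (rename("/tmp/old_name", "/tmp/new_name") != 0) {
            perror("rename");
            return 1;
        }
        return 0;
    }

(Cross-file-system moves do copy, but that is not what the quoted claim is about.)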
- A detailed description of the I/O controller and disk drive in the evaluation is followed by a series of memory-only experiments, which seems a little odd.
- Several sentences are missing commas, which makes them ambiguous. Careful proofreading is needed.
- Several typos need fixing; some are listed below:
- for to → to (pg 4)
- clases → classes (pg 4)
- Appendix reference missing (pg 6)
In summary, while the paper proposes some interesting new choices in the storage-management space, I think it is currently not ready because:
- it focuses on the less important aspects of file system performance
- the ideas are written up and presented in a confusing manner
This is unfortunate, because a great deal of work has clearly gone into this project. I would urge the authors to rethink which design and evaluation aspects they focus on in the writeup and how to simplify and improve the presentation of the ideas. I certainly look forward to seeing this work published in the future.
==+== G. Comments for PC (hidden from authors)
I think there is the core of a pretty good idea in this paper, but it focuses on the less important aspects of the approach. If the focus were more on the I/O-related issues of file system design, this work would be much more compelling.
This paper exceeds the page limit by 1.5 pages.
==+== End Review