internal:fast09-reviewing:review37

==+== FAST '09 Paper Review Form

==-== Set the paper number and fill out lettered sections A through G.

==-== DO NOT CHANGE LINES THAT START WITH ”==+==”!

==+== RAJU MUST REPLACE THIS LINE BEFORE UPLOADING

==+== Begin Review

==+== Paper #37

==-== Replace '000000' with the actual paper number.

==+== Review Readiness

==-== Enter “Ready” here if the review is ready for others to see:

Ready

==+== A. Overall merit

==-== Enter a number from 1 to 5.

==-== Choices: 1. Reject

==-== 2. Weak reject

==-== 3. Weak accept

==-== 4. Accept

==-== 5. Strong accept

==+== B. Novelty

==-== Enter a number from 1 to 5.

==-== Choices: 1. Published before

==-== 2. Done before (not necessarily published)

==-== 3. Incremental improvement

==-== 4. New contribution

==-== 5. Surprisingly new contribution

==+== C. Longevity

==-== How important will this work be over time?

==-== Enter a number from 1 to 5.

==-== Choices: 1. Not important now or later

==-== 2. Low importance

==-== 3. Average importance

==-== 4. Important

==-== 5. Exciting

==+== D. Reviewer expertise

==-== Enter a number from 1 to 4.

==-== Choices: 1. No familiarity

==-== 2. Some familiarity

==-== 3. Knowledgeable

==-== 4. Expert

==+== E. Paper summary

The authors propose a system for virtualizing the file system for large-scale clusters. This virtualization is based on a layer (COFS) in which file metadata is analyzed and files are assigned to the underlying FS, seamlessly from a user's point of view. A design and implementation of the COFS system is given, consisting of two Linux FUSE modules (one for data and the other for metadata). When compared to GPFS, COFS significantly improves the performance of metadata operations.

==+== F. Comments for author

The problem addressed in the paper is an important one for institutions that run large-scale distributed file systems. The authors' approach to the problem is reasonable. However, the paper does not provide sufficient details about the proposed approach to allow an evaluation.

Key details of the design of the system are missing, including:

- Section 3.3 provides a list of potential optimizations that “could” be done. Rather than providing this list, the paper should focus on the specific techniques that COFS actually implements and include an in-depth motivating discussion of each.

- What specific facilities are provided by COFS to help the administrator? How do administrators configure COFS?

- What is the algorithm for choosing a specific file system for a given file? Which file parameters and which file system characteristics are important? The description in Section 4.2 needs to be substantially tightened. The second paragraph is totally hand-wavy at this point.

- When choosing the policy for organizing users' files, will the administrator have to implement the replacement policy?

- Migrating/replicating files between file systems. When and how do these operations work exactly?

- Fault tolerance of your system, e.g. what happens if the centralized metadata node fails?

Since this idea is intended for use in large-scale systems, you should take into consideration the underlying hardware configuration of the storage system (RAID, SAN, network delays, etc.) and not only the software part, since these factors typically play an important role in dictating the performance of the file system.

What is really *really* important for this work is to identify the specific workload parameters that clearly define a preference for one file system implementation (say GPFS) over another (say PVFS). Unless an in-depth investigation is provided that unravels these dependencies, any argument for a specific placement decision is unfounded. I would highly recommend that the authors spend time in this direction. This will really help in validating this work.

The evaluation of the system covers only the metadata module and omits the data module. Further, no evidence is given of how file system administration improves with COFS.

Section 5, titled “Case study”, is really written like an extended motivation. In fact, this was the least interesting section of the paper. The discussion of GPFS's sub-optimal performance must be backed up with real experiments or references. Otherwise it is merely the authors stating their belief, which may not be shared by others with equal enthusiasm.

I think the authors have identified an interesting problem, but do not yet have a clear direction on the key issues that a solution would need to address. More in-depth investigation and brainstorming are needed to make this a worthwhile contribution.

* Other minor comments:

- Abstract: Remove “…”

- Pg 1. What would you claim is a “logical/functional/sensible organization of files”?

- There is significant repetition between the contributions paragraph and the one immediately preceding it. One of these should go.

- Pg 2. Please provide reference(s) for your statement “It is widely known that current parallel and distributed file systems constitute a potential bottleneck for the high I/O demands …”

- Pg 9. Fix the sentence: “Being GPFS a black box …”

==+== G. Comments for PC (hidden from authors)

[Enter your comments here]

==+== End Review