Systems Research Laboratory

==+== FAST '09 Paper Review Form

==-== Set the paper number and fill out lettered sections A through G.

==-== DO NOT CHANGE LINES THAT START WITH "==+=="!

==+== RAJU MUST REPLACE THIS LINE BEFORE UPLOADING

==+== Begin Review

==+== Paper #23

==+== Review Readiness

==-== Enter “Ready” here if the review is ready for others to see:

Ready

==+== A. Overall merit

==-== Enter a number from 1 to 5.

==-== Choices: 1. Reject

==-== 2. Weak reject

==-== 3. Weak accept

==-== 4. Accept

==-== 5. Strong accept

1

==+== B. Novelty

==-== Enter a number from 1 to 5.

==-== Choices: 1. Published before

==-== 2. Done before (not necessarily published)

==-== 3. Incremental improvement

==-== 4. New contribution

==-== 5. Surprisingly new contribution

2

==+== C. Longevity

==-== How important will this work be over time?

==-== Enter a number from 1 to 5.

==-== Choices: 1. Not important now or later

==-== 2. Low importance

==-== 3. Average importance

==-== 4. Important

==-== 5. Exciting

2

==+== D. Reviewer expertise

==-== Enter a number from 1 to 4.

==-== Choices: 1. No familiarity

==-== 2. Some familiarity

==-== 3. Knowledgeable

==-== 4. Expert

4

==+== E. Paper summary

Persistent Shared Heap File System (PSHFS) is a file system for shared memory. It exposes malloc-like semantics for creating and modifying tmpfs files by mapping them into memory.

==+== F. Comments for author

The authors address an interesting (although known) problem – persistent sharing of data structures across multiple cooperating applications.

The contribution of this paper is small considering that it merely uses existing primitives (tmpfs and memory-mapped files) to create a new abstraction. There are really no fundamental challenges addressed and the new abstraction is bound by the limitations of the underlying primitives.

Some high-level comments:

The title of the paper, “Persistent Shared Heap File System”, is misleading: the authors do not really create a new file system or file system abstraction. Rather, they use an existing file system (tmpfs) to create a persistent “memory” abstraction, and memory mapping (mmap) to reduce in-memory data copies.
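To make this point concrete, here is a minimal sketch of how far the existing primitives already go. It is my own illustration, not the authors' code: the function names and the path argument are hypothetical, and I assume a POSIX system where the path would normally point into a tmpfs mount such as /dev/shm.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map `size` bytes of the file at `path` (created if absent) as shared,
 * writable memory. Returns NULL on failure. The path is an assumption of
 * this sketch; a tmpfs location gives the memory-backed behaviour. */
void *map_shared_file(const char *path, size_t size)
{
    assert(path != NULL);
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : p;
}

/* Drop the mapping; the file, and therefore the data, outlives the caller. */
int unmap_shared_file(void *p, size_t size)
{
    return munmap(p, size);
}
```

A process that calls map_shared_file() on the same path later sees whatever an earlier process stored there, which is the kind of persistence the paper describes. The authors should state clearly what PSHFS adds beyond this.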

The definition of “persistence” needs to be spelled out clearly. There are bounds on persistence. First, as the authors do point out, persistence only implies existence beyond process lifetime; specifically, data is not persistent forever. Second, there is a bound on how much persistent data can exist at any time: the size limit of the underlying tmpfs, which in turn is bounded by available memory and swap space.

The authors do not conclusively address a key question regarding the usability of the persistent memory. Specifically, with respect to restoring type information: if no type metadata is ever stored, are applications responsible for remembering what they stored in which part of memory? This problem becomes immediately pronounced with variable-sized data structures. If so, isn't this the same as just writing raw bytes to the files?
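As an illustration of the kind of metadata I have in mind, the following sketch prefixes each stored object with a small header carrying a type tag and a length. Everything here (the header layout, the tag values, and the function names) is hypothetical and is not taken from the paper; it only shows that a reader can reject an object of the wrong type and recover the size of a variable-sized one.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-object header: a type tag plus the payload length. */
struct rec_header {
    uint32_t type_tag;
    uint32_t length;
};

enum { TAG_INT_ARRAY = 1, TAG_STRING = 2 }; /* illustrative tags only */

/* Store a tagged record at `offset` inside `region`.
 * Returns the offset just past the record, or 0 if it does not fit. */
size_t put_record(unsigned char *region, size_t region_size, size_t offset,
                  uint32_t tag, const void *payload, uint32_t length)
{
    assert(region != NULL);
    struct rec_header h = { tag, length };
    if (offset > region_size || region_size - offset < sizeof h ||
        region_size - offset - sizeof h < length)
        return 0;
    /* memcpy avoids alignment assumptions about `offset` */
    memcpy(region + offset, &h, sizeof h);
    memcpy(region + offset + sizeof h, payload, length);
    return offset + sizeof h + length;
}

/* Return a pointer to the payload if the record at `offset` carries
 * `expected_tag`, storing its length in *length; otherwise return NULL. */
const void *get_record(const unsigned char *region, size_t region_size,
                       size_t offset, uint32_t expected_tag, uint32_t *length)
{
    struct rec_header h;
    assert(region != NULL);
    if (offset > region_size || region_size - offset < sizeof h)
        return NULL;
    memcpy(&h, region + offset, sizeof h);
    if (h.type_tag != expected_tag ||
        region_size - offset - sizeof h < h.length)
        return NULL;
    *length = h.length;
    return region + offset + sizeof h;
}
```

The region could be any byte range, including a memory-mapped tmpfs file. Without something along these lines, every cooperating application must agree out of band on what lives at each offset.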

Incremental contribution w.r.t. the “SPHDE” work is small – a change of interface.

Low-level comments:

The paper says 'A new definition of the term “file” is provided'. It would be much easier for the reader if you used the term “object” instead, which is less specific/standardized in the storage community.

Define before use: “anonymous mmap”.

Why is a new concept of named-pointer required? I do not see it being used as an r-value in any of your code snippets. It seems that persistent objects can be referred to by their names (simple strings) as specified in 3.3.

A general comment is that too much detail is provided where it is not necessary, and withheld where it is required. For instance, the description of the VFS could be substantially reduced, and a more detailed example of how PSHFS works (probably using a figure with arrows indicating control flow) would be a more appropriate use of space.

The behavior of the FS is not specified in all cases. For instance, the rules for sharing are not clear: Can any process attach any object to its namespace? If so, how does it accomplish this? If not, what restrictions are there? What does it mean to have the “right permission set on the object”?

The section describing the API is very hard to read because the presentation conventions alternate. Giving the full function signature would make it clearer, e.g. change:

mfopen()
mfopen("path")

to this:

int mfopen(char *path)

In the evaluation of the locate utility in Section 4.3, PSHFS should be compared to tmpfs as well, since this is the closest competitor and would clearly identify the benefits of preserving structures “as is” instead of serializing/deserializing them across applications.

In the microbenchmark experiments, the file system you compare your solution against (SHMFS) is never presented, so it is not clear what the comparison is against. (I assumed that SHMFS refers to tmpfs, but I may be wrong.) Also, it is unclear how SHMFS performs with a buffer larger than 1024 bytes; it appears that it might do a better job than PSHFS. If true, then why is this the case? The standard FS has lower performance for small buffers owing to the constant overhead of system calls. What is the performance gain with PSHFS for larger buffers?

Further, this analysis only shows that the buffer size does not affect performance, but what about CPU, memory, or disk usage?

What is the key takeaway point of the microbenchmark experiments?

How is the seek experiment done? Why is the buffer size restricted here?

What is the buffer size for the locate utility experiments?

Section 3.3: is accesses → is accessed

Sometimes the writing is a bit informal, e.g. instead of saying PAGE_SIZE say 4k (for x86).

==+== G. Comments for PC (hidden from authors)

==+== End Review