==+== FAST '09 Paper Review Form

==-== Set the paper number and fill out lettered sections A through G.

==-== DO NOT CHANGE LINES THAT START WITH "==+=="!

==+== RAJU MUST REPLACE THIS LINE BEFORE UPLOADING

==+== Begin Review

==+== Paper #73

==-== Replace '000000' with the actual paper number.

==+== Review Readiness

==-== Enter “Ready” here if the review is ready for others to see:

Ready

==+== A. Overall merit

==-== Enter a number from 1 to 5.

==-== Choices: 1. Reject

==-== 2. Weak reject

==-== 3. Weak accept

==-== 4. Accept

==-== 5. Strong accept

4

==+== B. Novelty

==-== Enter a number from 1 to 5.

==-== Choices: 1. Published before

==-== 2. Done before (not necessarily published)

==-== 3. Incremental improvement

==-== 4. New contribution

==-== 5. Surprisingly new contribution

4

==+== C. Longevity

==-== How important will this work be over time?

==-== Enter a number from 1 to 5.

==-== Choices: 1. Not important now or later

==-== 2. Low importance

==-== 3. Average importance

==-== 4. Important

==-== 5. Exciting

4

==+== D. Reviewer expertise

==-== Enter a number from 1 to 4.

==-== Choices: 1. No familiarity

==-== 2. Some familiarity

==-== 3. Knowledgeable

==-== 4. Expert

3

==+== E. Paper summary

The paper presents an approach for automatically mining the relative importance of pages in a multi-level caching environment, in situations where the lower-level caches must deal with a request stream that has low temporal locality. The idea is that I/O requests are tagged with hints that help the cache decide whether or not to keep a page: pages whose hints are frequently re-referenced have priority over those whose hints are not. The hints proposed by the authors are not fixed and hence do not need to be hard-coded into the cache replacement policy. Moreover, the second-level cache is oblivious to the semantics the hints carry in the upper level; it simply maintains statistics about the hints to automatically determine which ones are important, so that the associated pages can be retained in memory.
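
To check that I have understood the mechanism, here is a minimal sketch of the kind of per-hint bookkeeping described above. This is my own reconstruction, not the authors' code, and the names (HintStats, on_access) are hypothetical:

    # Reviewer's sketch, not the authors' implementation. Each I/O request
    # arrives tagged with an opaque hint; the second-level cache keeps
    # per-hint counters and re-reference-distance estimates, and prefers to
    # retain pages whose hints are re-referenced soon and often.
    from collections import defaultdict

    class HintStats:
        def __init__(self):
            self.clock = 0                    # logical time = requests seen so far
            self.last_seen = {}               # hint -> clock value at last access
            self.count = defaultdict(int)     # hint -> number of accesses
            self.dist_sum = defaultdict(int)  # hint -> sum of re-reference distances

        def on_access(self, hint):
            self.clock += 1
            if hint in self.last_seen:
                self.dist_sum[hint] += self.clock - self.last_seen[hint]
            self.last_seen[hint] = self.clock
            self.count[hint] += 1

        def avg_distance(self, hint):
            # Smaller average distance => pages carrying this hint are
            # re-referenced sooner, so they are better candidates to keep.
            refs = self.count[hint] - 1
            return self.dist_sum[hint] / refs if refs > 0 else float('inf')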

==+== F. Comments for author

High-level comments:

The authors make an important contribution to the class of hint-based approaches. I find the approach intuitive and novel. The authors conduct a reasonable evaluation that highlights the strengths of their proposal.

There are some comments that I would like to see addressed in a subsequent revision.

- I understand that important hints are automatically recognized, but there is an orthogonal question that I believe also deserves discussion in the paper: how can the quality of the hint sets themselves be determined? If the hint sets are ill-configured, even the important ones may not lead to good caching decisions. Consider a hint set that tags both frequently and rarely accessed pages; its average re-reference distance then mischaracterizes both kinds of pages (see the worked example after this list of comments). If this happens with most hint sets, the caching decisions can become arbitrarily bad. Taking the approach one step further by automatically assessing hint-set quality would, I believe, also increase its value. Given poorly chosen hint sets, other techniques (even simple LRU) may outperform this approach, since they maintain more fine-grained, per-page information.

- There are recent contributions in the PROMOTE (FAST 2008) and related "demote" class of algorithms that seem very relevant. While you cite these in the related work section, I would really like the paper to conceptually compare the strengths and weaknesses of that class against the generalized hint-based class. Second, given the promising results reported for PROMOTE, the paper would be much more significant if the evaluation section also included a comparison with these approaches (or at least with PROMOTE).

- This approach seems most relevant to applications that manage their own cache. How does it generalize to situations where the first-level cache is a system cache, VFS being the most prominent example? What sorts of hint sets can be specified there? I think this question, if explored fully, could substantially increase the value of the paper.

- Some discussion of how the use of hints promotes mutually exclusive caching is needed.

- One concern with the evaluation is the use of a single database trace as the sole basis for experimentation. The use of OLTP in particular limits the take-aways, since it exercises only a narrow subset of possible I/O patterns; OLTP I/O requests are known to be short and random. For example, one access pattern that may result in a very low hit rate is a long sequential read: all of these requests would appear to carry the same hint, which would therefore accumulate a high frequency after some initial requests. This would start polluting the cache, evicting pages in favor of ones that may never be used again. Scan resistance of this kind is also addressed by ARC and CAR.
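
To make the hint-set quality concern above concrete, here is a small worked example (the numbers are invented for illustration, not taken from the paper):

    # A single hint set tags both a hot page (re-referenced every 10 requests)
    # and a cold page (re-referenced once, 10,000 requests later).
    hot_refs, hot_dist = 1000, 10     # 1000 re-references at distance 10
    cold_refs, cold_dist = 1, 10000   # 1 re-reference at distance 10,000

    avg = (hot_refs * hot_dist + cold_refs * cold_dist) / (hot_refs + cold_refs)
    print(avg)  # ~20: the cold page inherits a "hot" average distance it does not deserve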

Detailed comments:

* Pg 2, when you say “but what about low-priority recovery write?” what do you mean? Are you saying that this may or may not be a good caching candidate depending on memory pressure? Some elaboration is necessary.

* I think a more precise definition of D(H) would be the average number of unique requests that occur between a request and its re-reference. This is the well-known "reuse distance" definition used in the compilers community (a small sketch follows the detailed comments below).

* The use of LRU as the point of comparison does not seem appropriate. A more recent algorithm such as ARC, CAR, or 2Q should be used instead; LRU is well known to perform poorly under low temporal locality.

* The paper proposes some techniques in order to limit the maximum number of hints being analyzed. There should be a similar mechanism for the minimum as well. Too few hints result in no useful information for cache management.

* I think you should provide the size of the outqueue used in the experiments of Sections 6.3, 6.4, and 6.5.

* The concurrent multiple-client access experiments should be better justified with a more diverse set of benchmarks. Also, I would expect some hints (e.g., buffer-cache priority) to be common across clients; would these need to be normalized?
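
As promised above, a short sketch of the "reuse distance" definition I have in mind (my illustration, not the authors' code): the number of unique addresses referenced between two consecutive accesses to the same address.

    def reuse_distances(trace):
        last_pos = {}   # address -> index of its most recent access
        dists = []
        for i, addr in enumerate(trace):
            if addr in last_pos:
                # count only *unique* addresses seen since the previous access
                dists.append(len(set(trace[last_pos[addr] + 1 : i])))
            last_pos[addr] = i
        return dists

    print(reuse_distances(["a", "b", "c", "b", "a"]))  # [1, 2]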

Typos etc.:

* Figure 4 could include the interaction with the outqueue for improved clarity.

* Instead of the term "re-reference distance", the authors should use the more common "reuse distance".

* On page 6, "change that are being tracked my vary over time" → "... may vary over time".

* The legends are not clear in Fig 9.

==+== G. Comments for PC (hidden from authors)

==+== End Review
