====== Generalized Caching-as-a-Service ======

Project Period: 06/15/2020 - 05/31/2024

__The goals of this project are to:__

  - define a new abstraction and architecture for storage caches whereby storage stacks can easily embed lightweight CaaS clients within a distributed compute infrastructure,
  - formulate and theoretically analyze distributed caching algorithms that operate within the CaaS service such that individual CaaS server nodes cooperate towards achieving globally optimal caching decisions,
  - co-design client and server end-points to achieve strict durability and fault-tolerance in their implementations, and
  - drive all CaaS advancements using insights generated from a detailed whole-system simulator that models the diverse cache devices, network configurations, and application demand.

====== Investigators ======

  * [[https://acadent.github.io/|Raju Rangaswami]], Principal Investigator, FIU
  * [[https://visa.lab.asu.edu/web/people/mingzhao/|Ming Zhao]], Principal Investigator, ASU
  * [[https://people.cis.fiu.edu/liux/|Jason X Liu]], Co-Principal Investigator
  * [[https://users.cs.fiu.edu/~giri/|Giri Narasimhan]], Co-Principal Investigator

====== Personnel ======

  * Dr. Liana Valdes (PhD graduate)
  * Dr. Qirui Yang (PhD graduate)
  * Alexis Gonzales (PhD student)
  * Pratik Poudel (PhD student)
  * Kritshekhar Jha (PhD student)
  * Emam Hossain (PhD student)
  * Rukmangadh Myana (PhD student)
  * Ashikee Ghosh (PhD student)
  * Daniel Nunez Dominguez (Undergraduate student)
  * Fernando Cabanes (Undergraduate student)
  * Lester Fernandez (Undergraduate student)
  * Lillian Seebold (Undergraduate student)

===== Abstract =====

Caching has been a consistent tool of designers of high-performance, scalable computing systems, but it has been deployed in so many ways that it can be difficult to standardize and scale in cloud systems.
This project elevates the use of caching in cloud-scale storage systems to a "first-class citizen" by designing and implementing generalized Caching-as-a-Service (CaaS). CaaS defines transformative technology along four complementary dimensions. First, it defines a new abstraction and architecture for storage caches whereby storage stacks can easily embed lightweight CaaS clients within a distributed compute infrastructure. Second, CaaS formulates and theoretically analyzes distributed caching algorithms that operate within the CaaS service such that individual CaaS server nodes cooperate towards achieving globally optimal caching decisions. Third, the distributed CaaS clients and servers are co-designed to achieve strict durability and fault-tolerance in their implementations. Finally, all of the CaaS advancements are driven by insights generated from a detailed whole-system simulator that models the diverse cache devices, network configurations, and application demand. The CaaS project supports a broad spectrum of applications that run in private and public clouds, and showcases these improvements via use cases in three important computing paradigms: Cloud, Big Data, and Deep Learning. The findings from the CaaS project create new educational content and research opportunities for undergraduate, Master's, and PhD students by involving these student groups in classroom projects and laboratory work. The outreach activities focus on the recruitment of under-represented students from minority groups in Computer Science for participation in the project. The outcomes of the CaaS project include open source software and public dissemination of research findings, which help transition the new technologies to practice.

====== Publications ======

  * Xiaoyang Lu, Hamed Najafi, Jason Liu, and Xian-He Sun, CHROME: Concurrency-Aware Holistic Cache Management Framework with Online Reinforcement Learning, IEEE International Symposium on High-Performance Computer Architecture (HPCA) 2024.
  * Adnan Maruf, Daniel Carlson, Ashikee Ghosh, Manoj Saha, Janki Bhimani, and Raju Rangaswami, Allocation Policies Matter for Hybrid Memory Systems (Poster and Extended Abstract), IEEE International Conference on High-Performance Parallel and Distributed Computing 2023.
  * Q. Yang and R. Jin, AdaCache: A Disaggregated Cache System with Adaptive Block Size for Cloud Block Storage, IEEE International Conference on Cloud Computing 2023.
  * Pratik Poudel, Storage System Trace Characterization, Compression, and Synthesis using Machine Learning – An Extended Abstract, International Conference on Principles of Advanced Discrete Simulation (PADS) 2023.
  * Steven Lyons, Raju Rangaswami, and Ning Xie, Finding optimal non-datapath caching strategies via network flow, Journal of Theoretical Computer Science 2023.
  * Ming Zhao, Kritshekhar Jha, and Sungho Hong, GPU-enabled function-as-a-service for machine learning inference, IEEE International Parallel and Distributed Processing Symposium (IPDPS) 2023.
  * Lixi Zhou, Jiaqing Chen, Amitabh Das, Hong Min, Lei Yu, Ming Zhao, and Jia Zou, Serving deep learning models with deduplication from relational databases, Proc. VLDB Endow. 15, 10 (June 2022).
  * Q. Yang, R. Jin, B. Davis, D. Inupakutika, and M. Zhao, Performance Evaluation on CXL-enabled Hybrid Memory Pool, IEEE International Conference on Networking, Architecture and Storage (NAS) 2022.
  * Jason Liu, Simulus: Easy Breezy Simulation in Python, Winter Simulation Conference (WSC) 2020.
  * Adnan Maruf, Ashikee Ghosh, Janki Bhimani, Daniel Campello, Andy Rudoff, and Raju Rangaswami, MULTI-CLOCK: Dynamic Tiering for Hybrid Memory Systems, IEEE HPCA 2022.
  * Raju Rangaswami, FAB Storage for the Hybrid Cloud, IEEE International Conference on Networking, Architecture and Storage (NAS) 2022.
  * Liana V. Rodriguez, Alexis Gonzalez, Pratik Poudel, Raju Rangaswami, and Jason Liu, Unifying the data center caching layer: feasible? profitable?, ACM/USENIX HotStorage 2021.
  * Liana V. Rodriguez, Farzana Yusuf, Steven Lyons, Eysler Paz, Raju Rangaswami, Jason Liu, Ming Zhao, and Giri Narasimhan, Learning Cache Replacement with CACHEUS, USENIX Conference on File and Storage Technologies (FAST), February 2021.
  * J. Zou, M. Zhao, J. Shi, and C. Wang, Watson: A workflow-based data storage optimizer for analytics, 36th Intl. Conf. on Massive Storage Systems and Technology 2020.

===== Public Software =====

  * [[https://github.com/sylab/cacheus|Sources for the paper titled "Learning Cache Replacement with CACHEUS", Proceedings of USENIX FAST 2021.]]

====== Support ======

This work has been supported by the National Science Foundation awards [[https://www.nsf.gov/awardsearch/showAward?AWD_ID=1956229&HistoricalAwards=false|CNS-1956229]] and [[https://www.nsf.gov/awardsearch/showAward?AWD_ID=1955593&HistoricalAwards=false|CNS-1955593]].