Date and Time
12/5, 1 PM
Location
CSB 453
Title
System software support for CXL memory systems
Abstract
Compute Express Link (CXL) is an interconnect standard that allows CPUs to access memory over a PCIe bus. CXL 3.0 (and 3.1) recently added multi-host hardware cache coherence in the form of back-invalidation, which makes memory sharing feasible: multiple hosts can access a region of CXL memory with hardware-enforced cache coherence. We believe this multi-host, shared CXL memory will be an attractive alternative to small-scale distributed systems, but such hardware requires system software support to be successful.
We discuss an explicit CXL memory allocator for user programs and an in-memory database optimized for CXL memory. One problem for both systems is coping with partial failures: processes accessing shared CXL memory can fail independently, and system software should not block live threads either when a process dies or while it recovers. Lock-free data structures are a first step toward tolerating partial failures, but recoverability (the ability to determine whether an in-progress operation completed) is also needed. A second problem is the limited availability of hardware cache-coherent memory on CXL devices. Hardware cache-coherent memory should be used mostly for metadata that requires atomic operations, and this metadata must be segregated from data that can be managed with software coherence. In the database, we co-design the software coherence mechanism with the database's concurrency control mechanism to enable coarser-grained, more efficient operation.
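To make "recoverability" concrete, here is a minimal sketch of a recoverable compare-and-swap register, loosely in the spirit of recoverable CAS constructions; it is not the speaker's design. Assumptions: the shared word holds a (value, owner pid, sequence number) triple, which real hardware would need to pack into a single CAS-able word; the `announce` and `notified` arrays would live in shared memory that survives a process crash; and the Python lock in `AtomicCell` only emulates a hardware atomic instruction, so this code itself is not lock-free. All names (`RecoverableRegister`, `AtomicCell`, `Crash`, `cas`, `recover`, `crash_at`) are hypothetical.

```python
import threading


class AtomicCell:
    """Emulates one hardware CAS word; the lock stands in for the atomic instruction."""

    def __init__(self, initial):
        self._val = initial
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._val

    def compare_and_swap(self, expected, new):
        with self._lock:
            if self._val == expected:
                self._val = new
                return True
            return False


class Crash(Exception):
    """Raised to simulate a process dying at a chosen point."""


class RecoverableRegister:
    def __init__(self, nprocs, initial_value=0):
        # Shared word: (value, owner pid, owner's op sequence number); pid -1 means "initial".
        self.cell = AtomicCell((initial_value, -1, 0))
        # announce[p]: sequence number of p's most recent (possibly unfinished) operation.
        self.announce = [0] * nprocs
        # notified[p][q]: highest seq of p's ops that q saw take effect.
        # Only q writes notified[p][q], so writers never contend on a slot.
        self.notified = [[0] * nprocs for _ in range(nprocs)]

    def cas(self, pid, expected, new, crash_at=None):
        seq = self.announce[pid] + 1
        self.announce[pid] = seq  # record intent before touching the shared word
        cur = self.cell.load()
        value, prev_pid, prev_seq = cur
        if value != expected:
            return False
        if prev_pid >= 0:
            # Before overwriting, tell the previous owner that its operation took effect.
            self.notified[prev_pid][pid] = prev_seq
        if crash_at == "before_swap":
            raise Crash()
        ok = self.cell.compare_and_swap(cur, (new, pid, seq))
        if crash_at == "after_swap":
            raise Crash()
        return ok

    def recover(self, pid):
        """After pid restarts: did its last announced operation take effect?"""
        seq = self.announce[pid]
        if seq == 0:
            return False  # no operation was ever announced
        _, owner, owner_seq = self.cell.load()
        if owner == pid and owner_seq == seq:
            return True  # our write is still in the cell
        # Otherwise, some later writer must have recorded that it overwrote our value.
        return any(s >= seq for s in self.notified[pid])
```

Including the pid and sequence number in the compared word means that a value written by a crashed process can always be attributed to exactly one operation, and the notification step lets a restarted process learn the outcome even after its value has been overwritten, without waiting on any other process.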
Bio
Emmett Witchel is a professor of computer science at the University of Texas at Austin, where he has been since receiving his doctorate from MIT in 2004. He co-chaired Architectural Support for Programming Languages and Operating Systems (ASPLOS) in 2019 and was general co-chair of the Symposium on Operating Systems Principles (SOSP) in 2024. His publications have been recognized with Research Highlights in Communications of the ACM (CACM), IEEE Micro Top Picks, and best paper awards at systems venues such as the Symposium on Operating Systems Design and Implementation (OSDI) and SOSP. He is interested in operating systems, security, architecture, and concurrency. He is an ACM Fellow.
Hosted by: Joint Columbia Electrical Engineering Distinguished Lecture & “Sense, Collect and Move Data” DSI Center Seminar; for questions, please reach out to Prof. Tanvir Ahmed Khan.