Efficient Memory Management for Large Language Model Serving with PagedAttention
In this blog post I explain a paper that builds a system modeled on OS memory management (virtual memory and paging) and applies it to memory management for serving large language models.
things I find cool