llm (3)

- Dynamic KV Cache Resize in llama.cpp — 8 GB Savings on a 27B Model (2026/03/02)
- Why OS-Level Demand Paging Fails on Apple Silicon GPU (2026/03/02)
- How llama.cpp Manages KV Cache — and How PagedAttention Fits In (2026/03/01)