Bin Zhang Data, AI, and engineering notes

Reproducing vLLM and LMCache KV Cache Reuse on a CPU-Only MacBook

July 3, 2026
AI ENGINEERING, SOFTWARE ENGINEERING
#vllm #lmcache #kv cache #cpu #uv

I became interested in LMCache because it sits in the part of LLM serving that feels both very practical and very under-discussed: KV cache movement.

© 2026 Bin Zhang. All rights reserved.