Bin Zhang's Field Notes

Modern notes for data, AI, and engineering systems.

Practical essays, implementation notes, and mental models for LLMs, infrastructure, data platforms, and the craft of building durable software.

Fresh from the notebook

Recent writing

Jul 3, 2026

Reproducing vLLM and LMCache KV Cache Reuse on a CPU-Only MacBook

AI ENGINEERING / SOFTWARE ENGINEERING

I became interested in LMCache because it sits in the part of LLM serving that feels both very practical and very under-discussed: KV cache...

Jun 23, 2026

Use Databricks Models with VS Code Copilot and Copilot CLI

AI ENGINEERING / DATA ENGINEERING

I wanted one Databricks-hosted model to work in two developer surfaces:

Jun 18, 2026

A Git-Native Message Channel for Local Coding Agents

AI ENGINEERING / SOFTWARE ENGINEERING

My previous local development workflow was simple:

Mar 15, 2026

Attention Dilution

LARGE LANGUAGE MODELS

Attention dilution (also called context dilution) is one of the fundamental limitations of transformer-based LLMs when dealing with long...

Mar 8, 2026

AI Terminology: Agents, Skills, RAG, MCP, and the Layers Beneath the Hype

AI ENGINEERING

How many of these terms do you actually recognize?

Jan 4, 2026

ChatGPT in 2025: A Year in Review

LARGE LANGUAGE MODELS

ChatGPT Stats, ChatGPT Growth, ChatGPT Revenue