Archive

Archive year

2026

From Prompt to Response: A Step-by-Step Walkthrough of LLM Inference

March 7, 2026

LARGE LANGUAGE MODELS

llm fundamentals

Update: For a deeper systems-level treatment of LLM inference, especially the interaction between request scheduling, prefill, decode, and KV-cache...

Standing on Open Source: How a Codex Edge Case Became Agentusage

July 21, 2026

AI ENGINEERING SOFTWARE ENGINEERING

codex coding agents rust observability developer tools

Agentusage would not exist without open source.

Statistical Tests for Data Analysis: A Practical Onboarding Guide

July 12, 2026

statistics hypothesis testing experimentation data analysis onboarding

Learn how to choose and use statistical tests without turning analysis into a p-value checklist—from experimental design and assumptions to effect...

Use Local Models in VS Code Copilot with LM Studio and Unsloth Studio

July 10, 2026

AI ENGINEERING SOFTWARE ENGINEERING

vscode copilot lm studio unsloth local llm byok

Before you begin: Install VS Code with Copilot Chat, then download a model in LM Studio or Unsloth Studio.

KV-Centric LLM Serving: vLLM, SGLang, and Disaggregated Attention

July 5, 2026

AI ENGINEERING LARGE LANGUAGE MODELS

vllm sglang paged attention radix attention kv cache disaggregated serving prefill decode

The more I look at LLM serving, the more it feels like the main object is not the request, the model, or even the GPU.

Reproducing vLLM and LMCache KV Cache Reuse on a CPU-Only MacBook

July 3, 2026

AI ENGINEERING SOFTWARE ENGINEERING

vllm lmcache kv cache cpu uv

I became interested in LMCache because it sits in the part of LLM serving that feels both very practical and very under-discussed: KV cache movement.

Use Databricks Models with VS Code Copilot and Copilot CLI

June 23, 2026

AI ENGINEERING DATA ENGINEERING SOFTWARE ENGINEERING

databricks copilot vscode llm model serving

I wanted one Databricks-hosted model to work in two developer surfaces:

A Git-Native Message Channel for Local Coding Agents

June 18, 2026

AI ENGINEERING SOFTWARE ENGINEERING

coding agents git multi-agent systems

My previous local development workflow was simple:

Attention Dilution

March 15, 2026

LARGE LANGUAGE MODELS

llm fundamentals rag

Attention dilution (also called context dilution) is one of the fundamental limitations of transformer-based LLMs when dealing with long contexts or...

AI Terminology: Agents, Skills, RAG, MCP, and the Layers Beneath the Hype

March 8, 2026

How many of these terms do you actually recognize?

ChatGPT in 2025: A Year in Review

January 4, 2026

LARGE LANGUAGE MODELS

llm fundamentals

ChatGPT Stats ChatGPT Growth ChatGPT Revenue

Archive year

2025

The Mandate for Leadership in AI Engineering

November 27, 2025

Over the next 12 to 24 months, the differentiator among engineers will shift from mastery of programming languages like Rust, Go, or Python, or the...

LLM Interview Questions

October 30, 2025

LARGE LANGUAGE MODELS

llm fundamentals

Hyperparameters are external settings chosen before training, such as the learning rate or regularization strength.

LLM Training Epoch

October 29, 2025

LARGE LANGUAGE MODELS

As large language models (LLMs) scale up, researchers have begun to notice a growing imbalance between model size and the availability of high-quality...

vllm throughput

October 20, 2025

LARGE LANGUAGE MODELS

In large-language-model (LLM) inference serving contexts, once the model compute becomes sufficiently fast, the performance bottleneck often shifts to...

LangGraph Reflection

October 19, 2025

langgraph agents

Reflection is related to agent self-improvement or reasoning feedback loops.

LangGraph Sample Project

October 2, 2025

[x] Independently deployable services
- Each agent can scale horizontally (e.g., analysisservice replicas)
- You can version and deploy agents...

LangChain/LangGraph Q&A

September 29, 2025

langchain langgraph

Its advantages over traditional sequential chains are evident in two areas:

Training LLM From Zero

August 10, 2025

LARGE LANGUAGE MODELS

1. Objective 2. Environment Setup

FastMCP MCP Server Hub

July 16, 2025

MCP Server Hub: Currently, our projects use various MCP servers. To streamline and unify the process, we plan to implement a HUB MCP...

How LLM Tools work

July 11, 2025

Tools in Large Language Models (LLMs): Tools enable large language models (LLMs) to interact with external systems, APIs, or data sources, extending...

LangChain Retry Logic

July 1, 2025

LangChain Invoke Retry Logic: LLM calls are not stable and may fail due to network issues or other reasons, so retry logic is necessary.
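As a rough illustration of the idea, here is a minimal retry-with-backoff sketch in plain Python. It is not LangChain's API: `invoke_with_retry`, `flaky_llm_call`, and the delay values are hypothetical names and settings chosen for the example, and the flaky call only simulates a network failure.

```python
import time


def invoke_with_retry(call, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call `call()` until it succeeds or attempts run out, backing off exponentially."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Wait 0.5s, 1s, 2s, ... before the next attempt
            sleep(base_delay * 2 ** (attempt - 1))


# Hypothetical flaky call: fails twice with a simulated network error, then succeeds
attempts = {"count": 0}


def flaky_llm_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated network failure")
    return "ok"


# The sleep function is injected so the example runs without real waiting
result = invoke_with_retry(flaky_llm_call, sleep=lambda seconds: None)
print(result, attempts["count"])
```

In real use you would likely catch only transient error types rather than every `Exception`, so that programming errors fail fast.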

MCP Transports

June 23, 2025

| Feature | stdio | sse (Server-Sent Events) | streamable-http |
|---|---|---|---|
...

Text to SQL (Smolagents)

May 4, 2025

Out: None [Step 1: Duration 146.87 seconds | Input tokens: 2,113 | Output tokens: 923] Step 2...

MCP Server & Client (SSE)

April 25, 2025

Step-by-Step Guide: Building an MCP Server using Python-SDK, AlphaVantage & Claude AI. Model Context Protocol (MCP) lab.

RAG-Reranking

April 22, 2025

Retrieval-Augmented Generation (RAG) is a powerful approach that combines retrieval and generation to produce high-quality responses. However, the...

Ollama Import GGUF Models

April 21, 2025

LARGE LANGUAGE MODELS

You start by creating a Modelfile, which acts as a key to unlock any GGUF model you want to use.

GenAI Projects

March 29, 2025

Learning never exhausts the mind ― Leonardo da Vinci

Crawling the Web with LLM

February 16, 2025

Skyvern ScrapegraphAI Crawl4AI Reader Firecrawl Markdowner

LangGraph VS AutoGen

February 9, 2025

agents langgraph autogen

| Feature | LangGraph | AutoGen |
|---|---|---|
| Core Concept | Graph-based workflow for LLM chaining | Multi-agent system with customizable agents |
...

Autogen Intro and RAG Workflow

February 8, 2025

agents autogen rag

AutoGen is a framework for creating multi-agent AI applications that can act autonomously or work alongside humans.

Local LLM Setup

February 2, 2025

LARGE LANGUAGE MODELS

If you find this in your VS Code, congratulations! You have successfully set up Ollama for code generation and assistance in Visual Studio Code.

Archive year

2024

Gradio with Ollama

December 15, 2024

llm apps python llmops

%%{init: { 'look':'handDrawn' } }%%

PySpark Dataframe Transformation

November 15, 2024

DATA ENGINEERING

spark python dataframe

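```python
from pyspark.sql import Row, SparkSession

# Local session on all available cores. The source shows "local[]", which
# looks like "local[*]" with the asterisk lost in extraction.
spark = SparkSession.builder.master("local[*]").appName("test").getOrCreate()

# Event is not defined in this excerpt; a two-field Row with hypothetical
# field names is assumed here.
Event = Row("id", "name")
d = [
    Event(1, "abc"),
    Event(2, "ddd"),
]
```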

Databricks Wheel Job

November 1, 2024

DATA ENGINEERING

data platform spark python

My previous Spark project was Scala-based, and I used IDEA to compile and test it conveniently. The Databricks Job UI is nice and saves you time...

Python Decorator

October 23, 2024

SOFTWARE ENGINEERING

A decorator extends your function's behavior at runtime.
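A small sketch of what that means in practice: the decorator below wraps a function at definition time and adds call counting without touching the function body. The names `count_calls` and `add` are hypothetical and used only for illustration.

```python
import functools


def count_calls(func):
    """Wrap func so each call increments a counter before delegating."""

    @functools.wraps(func)  # keep the original name and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)

    wrapper.calls = 0
    return wrapper


@count_calls
def add(a, b):
    return a + b


total = add(1, 2) + add(3, 4)
print(total, add.calls, add.__name__)
```

`functools.wraps` is worth keeping in any decorator, since without it the wrapped function would report the wrapper's name and lose its docstring.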

ZIO

October 16, 2024

SOFTWARE ENGINEERING

functional programming

This video is helpful for understanding it.

Reflex Learning

October 13, 2024

SOFTWARE ENGINEERING

python web development

Reflex (pynecone) is a library to build full-stack web apps in pure Python.

Snowflake Data Science Training Summary

October 5, 2024

I have enrolled in a private Snowflake Data Science Training. Let me list what I learned from it.

AutoGen HttpClient

September 8, 2024

myclient.py

How to execute python modules

September 8, 2024

SOFTWARE ENGINEERING

We can use the built-in runpy module to execute different modules in our project.
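A minimal sketch of the idea, assuming a throwaway script written to a temporary directory (the file name `hello_task.py` and the helper `run_script` are hypothetical). It uses `runpy.run_path`, which executes a file and returns the resulting module globals; `runpy.run_module` does the same for an importable module name, much like `python -m`.

```python
import runpy
import tempfile
from pathlib import Path


def run_script(source: str) -> dict:
    """Write source to a temporary file and execute it with runpy.run_path."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "hello_task.py"  # hypothetical file name
        script.write_text(source)
        # run_name="__main__" makes the `if __name__ == "__main__"` guard fire
        return runpy.run_path(str(script), run_name="__main__")


globals_after_run = run_script(
    "RESULT = None\n"
    "if __name__ == '__main__':\n"
    "    RESULT = sum(range(5))\n"
)
print(globals_after_run["RESULT"])
```

For a module inside a package, the equivalent call would be `runpy.run_module("package.module", run_name="__main__")`, with the package importable from the current path.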

Model Registry

August 12, 2024

MACHINE LEARNING

Problem: How to introduce ML-based production/features to cross-functional teams.

Archive year

2021

Setup Minikube

July 18, 2021

bin/spark-submit --master k8s://https://192.168.99.100:8443 --deploy-mode cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf...

Archive year

2020

Azure Data Factory (Data Flow)

November 18, 2020

DATA ENGINEERING

Recently I've been working in Azure to implement ETL jobs. The main tool is ADF (Azure Data Factory). This post shows some solutions to issues in my...

Spark Dataframe window function

March 1, 2020

DATA ENGINEERING

spark dataframe

scala ref create dataframe

Spark SQL

February 21, 2020

DATA ENGINEERING

spark sql optimization

--master MASTER_URL --> run mode, e.g. spark://host:port, mesos://host:port, yarn, or local.

Spark Optimization

February 21, 2020

DATA ENGINEERING

spark optimization

PROCESS_LOCAL: data is in the same JVM as the running code. This is the best locality possible. NODE_LOCAL: data is on the same node. Examples might be in...

Airflow

February 11, 2020

DATA ENGINEERING

import airflow
from airflow.models import DAG
from airflow.operators.python_operator import PythonOperator

Whitening transformation

February 11, 2020

MACHINE LEARNING

Whitening Transformation

Spark Structured Streaming

February 8, 2020

DATA ENGINEERING

spark streaming

I recently read the blog post "Structured Streaming in PySpark", which is implemented on the Databricks platform, and then tried to implement it in my local Spark. Some...

Batch Normalization

February 4, 2020

MACHINE LEARNING

Batch normalization is one of the important parts of a neural network.

Gradient Descent

February 2, 2020

MACHINE LEARNING

Vanilla gradient descent, aka batch gradient descent, computes the gradient of the cost function with respect to the parameters θ
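To make this concrete, here is a minimal sketch that fits a line y ≈ w·x + b with batch gradient descent, assuming a mean squared error cost. Each step computes the gradient over every sample and then moves the parameters against it. The function name, learning rate, step count, and toy data are assumptions made for the example.

```python
def batch_gradient_descent(xs, ys, lr=0.05, steps=2000):
    """Fit y ~ w*x + b by minimizing mean squared error over the full dataset."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of J(w, b) = (1/n) * sum((w*x + b - y)^2), using every sample
        grad_w = (2.0 / n) * sum((w * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = (2.0 / n) * sum((w * x + b - y) for x, y in zip(xs, ys))
        # Parameter update: theta <- theta - lr * gradient
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1, so the fit should approach w=2, b=1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = batch_gradient_descent(xs, ys)
print(round(w, 4), round(b, 4))
```

Because every update touches the whole dataset, each step is exact but costly on large data, which is the usual motivation for stochastic and mini-batch variants.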

Archive year

2012

Repo List

October 15, 2012

Repos Repo List language link