Blog

All posts

Notes on LLMs, machine learning, data engineering, and systems work.

Oct 20, 2025

vLLM throughput

LARGE LANGUAGE MODELS

In large language model (LLM) inference serving, once model compute becomes sufficiently fast, the performance bottleneck often shifts to the key-value (KV) cache...

Jul 16, 2025

FastMCP MCP Server Hub

AI ENGINEERING

MCP Server Hub: Currently, our projects use a variety of MCP servers. To streamline and unify the process, we plan to implement a hub MCP server that can handle multiple...

Jul 11, 2025

How LLM Tools work

AI ENGINEERING

Tools in Large Language Models (LLMs): Tools enable large language models (LLMs) to interact with external systems, APIs, or data sources, extending their capabilities beyond text...

Jun 23, 2025

MCP Transports

AI ENGINEERING

Feature comparison table of MCP transports: stdio, sse (Server-Sent Events), and streamable-http...

May 4, 2025

Text to SQL (Smolagents)

AI ENGINEERING

Out: None [Step 1: Duration 146.87 seconds | Input tokens: 2,113 | Output tokens: 923] Step 2: Executing...

Apr 22, 2025

RAG-Reranking

AI ENGINEERING

Retrieval-Augmented Generation (RAG) is a powerful approach that combines retrieval and generation to produce high-quality responses. However, the quality of the final response can...

Feb 2, 2025

Local LLM Setup

LARGE LANGUAGE MODELS

If you see this in VS Code, congratulations! You have successfully set up Ollama for code generation and assistance.

Jul 18, 2021

Setup Minikube

DEVOPS

bin/spark-submit \ --master k8s://https://192.168.99.100:8443 \ --deploy-mode cluster \ --name spark-pi \ --class org.apache.spark.examples.SparkPi \ --conf spark.driver.cores=1 \ --conf...

Oct 15, 2012

Repo List

Repos: a repo list with language and link columns.