Archive year

2026

Archive year

2025

vllm throughput

In large-language-model (LLM) inference serving contexts, once the model compute becomes sufficiently fast, the performance bottleneck often shifts to...

FastMCP MCP Server Hub

MCP Server Hub Currently, our different projects are using various MCP servers. To streamline and unify the process, we plan to implement a HUB MCP...

How LLM Tools work

Tools in Large Language Models (LLMs) Tools enable large language models (LLMs) to interact with external systems, APIs, or data sources, extending...

MCP Transports

| Feature | stdio | sse (Server-Sent Events) | streamable-http | |--------------------------|------------------------------------------|--------------...

Text to SQL (Smolagents)

Out: None [Step 1: Duration 146.87 seconds| Input tokens: 2,113 | Output tokens: 923] ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Step 2...

RAG-Reranking

Retrieval-Augmented Generation (RAG) is a powerful approach that combines retrieval and generation to produce high-quality responses. However, the...

Local LLM Setup

If you find this in your VSCode, congratulations! You have successfully set up Ollama for code generation and assistance in Visual Studio Code. alt...

Archive year

2024

Archive year

2021

Setup Minikube

bin/spark-submit \ master k8s://https://192.168.99.100:8443 \ deploy-mode cluster \ name spark-pi \ class org.apache.spark.examples.SparkPi \ conf...

Archive year

2020

Archive year

2012

Repo List

Repos Repo List language link