Objective#
The goal of this project is to design, implement, and train a small-scale Large Language Model (LLM) from scratch, progressing through the full training lifecycle:
- Pre-training on large-scale unlabeled text.
- Supervised Fine-Tuning (SFT) on high-quality instruction-following datasets.
- Parameter-Efficient Fine-Tuning with LoRA (Low-Rank Adaptation) for resource-efficient adaptation.
- Direct Preference Optimization (DPO) for aligning the model with human preferences.
The project aims to serve as a practical, hands-on implementation of LLM training concepts from recent research.
Environment Setup#
- macOS with M Series chip ‼️ MPS is not optimized for training

  Tested on a macOS MPS device (M4, 64GB RAM):

  ```
  PyTorch version: 2.3.0
  MPS available: True
  Matrix 1024x1024: 10.40 TFLOPS | Time: 20.65ms
  Matrix 2048x2048: 13.45 TFLOPS | Time: 127.76ms
  Matrix 4096x4096: 13.49 TFLOPS | Time: 1018.53ms
  Matrix 8192x8192: 12.82 TFLOPS | Time: 8573.45ms
  Matrix 16384x16384: 9.37 TFLOPS | Time: 93871.68ms
  ```
- Windows with CUDA (recommended)

  Tested on a CUDA device (RTX 2080 Ti, 11GB memory):

  ```
  PyTorch version: 2.8.0+cu129
  CUDA available: True
  Matrix 1024x1024: 65.62 TFLOPS | Time: 3.27ms
  Matrix 2048x2048: 634.46 TFLOPS | Time: 2.71ms
  Matrix 4096x4096: 4447.00 TFLOPS | Time: 3.09ms
  Matrix 8192x8192: 34163.30 TFLOPS | Time: 3.22ms
  Matrix 16384x16384: 199933.93 TFLOPS | Time: 4.40ms
  ```
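Below is a hypothetical sketch of the kind of matmul benchmark that could produce output like the above; the script and exact methodology are assumptions, not the project's actual code. One caveat worth knowing: CUDA kernels launch asynchronously, so timing without an explicit synchronize measures only launch overhead, which would explain CUDA TFLOPS figures far above the 2080 Ti's hardware peak. The sketch synchronizes before reading the clock:

```python
# Hypothetical matmul TFLOPS benchmark (not the project's actual script).
import time

import torch

def bench_matmul(n: int, device: str, iters: int = 10) -> None:
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    for _ in range(3):  # warm-up so lazy init/compilation is not timed
        a @ b
    # Pick the right barrier for the device; a no-op on CPU.
    sync = (torch.cuda.synchronize if device == "cuda"
            else torch.mps.synchronize if device == "mps"
            else (lambda: None))
    sync()
    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    sync()  # without this, CUDA times only the async kernel launches
    elapsed = (time.perf_counter() - start) / iters
    tflops = 2 * n**3 / elapsed / 1e12  # ~2*n^3 FLOPs per n x n matmul
    print(f"Matrix {n}x{n}: {tflops:.2f} TFLOPS | Time: {elapsed * 1e3:.2f}ms")

device = ("cuda" if torch.cuda.is_available()
          else "mps" if torch.backends.mps.is_available() else "cpu")
print(f"PyTorch version: {torch.__version__}")
for n in (1024, 2048, 4096, 8192, 16384):
    bench_matmul(n, device)
```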
- Python packages for CUDA support (verification and LoRA sketches follow this list)
  - torch ‼️ Be careful: the PyTorch wheel's CUDA version must match your installed CUDA (12.6, 12.8, 12.9): `pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu129`
  - transformers: `pip install transformers`, or NVIDIA's Transformer Engine for CUDA: `pip3 install --no-build-isolation transformer_engine[pytorch]`
  - peft: `pip install peft`
  - CUDA Toolkit
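After installing, a quick sanity check can confirm the CUDA/PyTorch pairing; these are standard attributes of each library, and the commented values are only examples:

```python
import torch
import transformers
import peft

print(torch.__version__)          # e.g. 2.8.0+cu129
print(torch.version.cuda)         # CUDA version the wheel was built against
print(torch.cuda.is_available())  # False usually means a driver/wheel CUDA mismatch
print(transformers.__version__, peft.__version__)
```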
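Since peft is listed for the LoRA stage, here is a minimal sketch of attaching LoRA adapters to a causal LM. The base model name and hyperparameters are illustrative assumptions, not the project's configuration:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Illustrative base model; the project trains its own small LLM.
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Wrap attention projections with low-rank adapters; only these train.
lora_config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor applied to the update
    target_modules=["c_attn"],  # GPT-2's fused QKV projection
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # trainable vs. total parameter counts
```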
Dataset#
- tokenizer dataset
- pre-training dataset
- SFT (Supervised Fine-Tuning) dataset
- DPO (Direct Preference Optimization) dataset
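The exact file formats depend on the datasets chosen; as a rough orientation, these are the record shapes conventionally used at each stage (all field names are assumptions, not the project's schema):

```python
# Conventional JSONL-style record shapes per training stage (illustrative only).
pretrain_record = {"text": "raw unlabeled document text ..."}

sft_record = {
    "instruction": "Summarize the following paragraph.",
    "input": "optional context ...",
    "output": "target response the model should produce",
}

dpo_record = {
    "prompt": "user prompt ...",
    "chosen": "preferred response",     # preferred by human raters
    "rejected": "dispreferred response",
}
```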