Most teams reach for fine-tuning too early. Learn when fine-tuning actually outperforms prompting or RAG, which techniques to use (LoRA, QLoRA, DPO), and how to build an evaluation-driven training loop.

# Fine-Tuning LLMs: When to Do It and How to Do It Right

Fine-tuning is the most over-applied and under-understood technique in the LLM toolkit. Before you spend GPU hours and engineering time, make sure fine-tuning is actually the right tool.

## Should You Fine-Tune?

Fine-tuning wins only for some of these problems. Match the problem to the cheapest tool that solves it:

| Problem | Best Solution |
|---|---|
| Model doesn't know your private data | RAG |
| Model uses the wrong output format | System prompt + few-shot examples |
| Model needs a specialized domain style | **Fine-tune** |
| Model is consistently wrong on domain reasoning | **Fine-tune** |
| Model needs to follow complex rules reliably | **Fine-tune** |

If prompting or RAG solves your problem, use them. Fine-tuning is expensive and introduces a maintenance burden.

## Efficient Fine-Tuning with LoRA

Full fine-tuning updates all model weights, which is impractical for most teams because gradients and optimizer state must be kept for every parameter. LoRA (Low-Rank Adaptation) freezes the base model and trains a small pair of low-rank matrices next to selected weight matrices (here the attention query and value projections), so only a tiny fraction of parameters is trainable.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Placeholder checkpoint: substitute the base model you are adapting
base_model = AutoModelForCausalLM.from_pretrained("path/to/base-model")

config = LoraConfig(
    r=16,                                 # rank of the low-rank update
    lora_alpha=32,                        # scaling factor (update is scaled by alpha / r)
    target_modules=["q_proj", "v_proj"],  # module names depend on the model architecture
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, config)

# Typically well under 1% of parameters are trainable; the exact share depends on the model
model.print_trainable_parameters()
```

**QLoRA** goes further: it loads the frozen base model in 4-bit precision and trains LoRA adapters on top. This lets you fine-tune a 7B model on a single consumer GPU.

## Data Quality Over Quantity

A small, clean dataset often beats a large, noisy one: 100 high-quality examples can outperform 10,000 noisy ones. Your training data should:

- Reflect the exact format you want outputs in
- Cover edge cases and failure modes
- Be curated by domain experts, not scraped automatically

## DPO: Teaching the Model What Not to Say

Direct Preference Optimization (DPO) is now widely used for aligning model behavior, replacing PPO-based RLHF in many use cases. Instead of training a separate reward model and optimizing against it with reinforcement learning, you feed DPO pairs of (preferred, rejected) outputs for the same prompt, and it learns directly from the comparison.
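"Learning from the comparison" boils down to a per-pair loss. The sketch below computes the DPO loss for one pair, assuming you already have the summed log-probability of each full response under both the model being trained (the policy) and a frozen reference model. The function name `dpo_loss` and the `beta=0.1` default are illustrative choices, not a library API; `beta` controls how strongly the policy is kept close to the reference.

```python
import math


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss (illustrative sketch, not a library API).

    Each argument is the summed log-probability of a complete response
    given the prompt, under either the trainable policy or the frozen
    reference model.
    """
    # How much more (or less) the policy favors each response than the reference does
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # Positive margin: the policy shifted probability toward the chosen response
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)), written so exp() never overflows
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: margin is 0, so the loss is log(2)
print(dpo_loss(-12.0, -15.0, -12.0, -15.0))  # ≈ 0.693
# Policy now prefers the chosen response more than the reference does: loss drops
print(dpo_loss(-10.0, -17.0, -12.0, -15.0))
```

Minimizing this loss pushes the policy to widen the gap between chosen and rejected responses relative to the reference. Libraries such as Hugging Face TRL (its `DPOTrainer`) compute the same objective in batches from token-level log-probabilities; the sketch only shows the shape of the objective.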
Each training record pairs a prompt with a preferred and a rejected response:

```python
# DPO training pair
{
    "prompt": "Summarize this legal document:",
    "chosen": "The contract establishes a 2-year service agreement...",
    "rejected": "This document is about a contract between two parties...",
}
```

The chosen summary states the contract's concrete terms; the rejected one is vague enough to describe any contract.

## Evaluation-Driven Training Loop

Never fine-tune without a held-out eval set. Define metrics before training:

1. **Task-specific metrics** (ROUGE, F1, accuracy on domain benchmarks)
2. **Safety checks** (does the model still refuse harmful requests?)
3. **Regression tests** (did general capability degrade?)

Fine-tuning without evals is flying blind. Build the eval suite first.
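The three metric groups above can be wired into a simple gate that every candidate checkpoint must pass before it ships. The sketch below is hypothetical: `token_f1` is a basic SQuAD-style token-overlap F1 standing in for whatever task metric you choose, and `passes_gate`, the metric names, the `max_regression` tolerance, and the example scores are illustrative assumptions, not measured results.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 (SQuAD-style, lowercase + whitespace split only)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty scores zero
        return float(pred_tokens == ref_tokens)
    # Multiset intersection: a shared token counts at most min(count) times
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def mean_f1(predictions: list[str], references: list[str]) -> float:
    """Average token F1 over a held-out eval set."""
    if not predictions or len(predictions) != len(references):
        raise ValueError("need equal-length, non-empty prediction/reference lists")
    return sum(token_f1(p, r) for p, r in zip(predictions, references)) / len(predictions)


def passes_gate(candidate: dict[str, float], baseline: dict[str, float],
                max_regression: float = 0.01) -> bool:
    """Hypothetical release gate mirroring the three metric groups.

    Both dicts hold scores in [0, 1], higher is better:
      task_f1      - task-specific metric on the held-out set
      refusal_rate - share of harmful prompts the model still refuses
      general_acc  - accuracy on a general-capability benchmark
    """
    task_ok = candidate["task_f1"] > baseline["task_f1"]
    safety_ok = candidate["refusal_rate"] >= baseline["refusal_rate"] - max_regression
    general_ok = candidate["general_acc"] >= baseline["general_acc"] - max_regression
    return task_ok and safety_ok and general_ok


# Illustrative scores, not measured results
baseline = {"task_f1": 0.61, "refusal_rate": 0.98, "general_acc": 0.72}
candidate = {"task_f1": 0.74, "refusal_rate": 0.98, "general_acc": 0.70}
# Better on the task, but general accuracy fell by more than max_regression
print(passes_gate(candidate, baseline))  # False
```

Running the same suite on the untouched base model gives the baseline, so a checkpoint that improves the target task while quietly losing general capability is caught before release. The tolerance is a placeholder to tune against the run-to-run noise of your own evals.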