Table of Contents
Quick Answer
- Zero-shot: ask the model to do a task with no examples
- Few-shot: include 2-10 examples in the prompt
- Fine-tuning: train the model on hundreds-to-thousands of examples
Accuracy and cost rise left to right. So does setup time.
What Do These Terms Mean?
These are three points on the spectrum of how much task-specific information you give the model (Brown et al., "Language Models are Few-Shot Learners," OpenAI, 2020).
Zero-shot relies entirely on pre-training. Fine-tuning permanently adapts weights. Few-shot is the middle ground — shown examples shape behavior for that one request.
How Each Works
Zero-shot
Classify this review as positive or negative: "Loved it!"
The model pattern-matches from pre-training.
Few-shot
Review: "Amazing product" -> positive
Review: "Waste of money" -> negative
Review: "Loved it!" -> ?
Examples anchor the format and edge cases.
Fine-tuning
Upload 1000+ labeled review pairs to OpenAI / Anthropic / open-source training script. Model weights update. You now query without any examples and get the fine-tuned behavior.
Examples
- Zero-shot translation: GPT-4 translates Swahili -> English without prior examples
- Few-shot JSON extraction: 3 examples of parsed resumes before the real one
- Fine-tuned classifier: 10K labeled support tickets -> dedicated model that routes accurately
- Zero-shot code review: "Find bugs in this function"
- Fine-tuned brand voice: 500 brand-approved emails train a model to always sound on-brand
When to Use Each
Need
Approach
Prototype quickly
Zero-shot
Consistent format / edge cases
Few-shot
High volume, latency-sensitive, specific style
Fine-tuning
Fresh data changes often
Zero-shot + RAG
Tiny output space (classify into 10 categories)
Fine-tuning
FAQs
How many examples count as few-shot? Typically 1-10. Beyond that, diminishing returns — fine-tuning becomes viable.
Does few-shot cost more per request? Yes — examples eat tokens. At scale, fine-tuning often wins on cost.
Is fine-tuning worth it? Only if zero-shot + few-shot cannot hit accuracy, OR you have >100K requests/month where per-request savings matter.
Can I combine approaches? Yes — fine-tune for style, then RAG for facts, then few-shot for format.
What is instruction tuning? A specific fine-tuning that teaches models to follow instructions. All modern chatbots are instruction-tuned.
Can open-source models be fine-tuned cheaply? Yes — LoRA / QLoRA fine-tunes 7B models on a single GPU for ~$5-50.
Does fine-tuning cause forgetting? Yes — models can lose general capability. Monitor regressions.
Conclusion
Start zero-shot. Add few-shot when format slips. Fine-tune only when zero-shot + few-shot + RAG hit a wall. Read more patterns on Misar Blog↗.