Quick Answer
- Foundation model: large model pre-trained on broad data, adaptable to many downstream tasks
- LLM (Large Language Model): a text-focused foundation model
All LLMs are foundation models. Not all foundation models are LLMs.
What Do These Terms Mean?
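The term foundation model was coined in 2021 by researchers at Stanford's Center for Research on Foundation Models (CRFM), part of Stanford HAI (Bommasani et al., "On the Opportunities and Risks of Foundation Models," 2021). It describes models such as GPT, Stable Diffusion, CLIP, and AlphaFold: all trained at scale on broad data and adaptable to many downstream tasks.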
An LLM is specifically a language foundation model. "Large" has no formal threshold; in practice it usually means billions of parameters trained on trillions of tokens (Stanford HAI, 2024).
How They Relate
Foundation Models (umbrella)
|
+-- LLMs (GPT, Claude, Llama, Gemini text mode)
+-- Image models (Stable Diffusion, DALL-E)
+-- Multimodal (Gemini, GPT-4o, Claude Opus vision)
+-- Audio (Whisper, Suno)
+-- Scientific (AlphaFold, ESM)
+-- Robotics (RT-2, OpenVLA)
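The tree above can be sketched as a small lookup structure. This is an illustrative sketch only: `TAXONOMY`, `categories_of`, and `is_foundation_model` are hypothetical names, and the entries simply mirror the diagram (with "Gemini text mode" and "Claude Opus vision" shortened to the model names).

```python
# Illustrative only: mirrors the diagram above; all names are hypothetical.
TAXONOMY = {
    "LLM": ["GPT", "Claude", "Llama", "Gemini"],
    "Image": ["Stable Diffusion", "DALL-E"],
    "Multimodal": ["Gemini", "GPT-4o", "Claude Opus"],
    "Audio": ["Whisper", "Suno"],
    "Scientific": ["AlphaFold", "ESM"],
    "Robotics": ["RT-2", "OpenVLA"],
}


def categories_of(model: str) -> list[str]:
    """Return every branch of the tree that lists the given model."""
    return [cat for cat, models in TAXONOMY.items() if model in models]


def is_foundation_model(model: str) -> bool:
    """Everything in the tree is a foundation model, whatever its branch."""
    return bool(categories_of(model))


print(categories_of("Gemini"))           # ['LLM', 'Multimodal']
print(categories_of("Whisper"))          # ['Audio']
print(is_foundation_model("AlphaFold"))  # True
```

A model can sit on more than one branch, as Gemini does here, which is why the branches are better read as labels than as mutually exclusive classes.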
Examples
Foundation models that are LLMs
- GPT-5
- Claude Opus 4.1
- Llama 3.1 405B
- Gemini 2.0 Pro
- Mistral Large
Foundation models that are not LLMs
- Stable Diffusion (image)
- Whisper (audio)
- AlphaFold (protein structure)
- Segment Anything (vision)
- CLIP (vision-language embedding, not strictly generative language)
Foundation Model vs LLM
| Aspect | Foundation Model | LLM |
|---|---|---|
| Scope | Any modality | Text (primarily) |
| Pre-training data | Broad — text, images, audio, scientific | Text corpora |
| Adaptable | Yes — fine-tune, prompt, RAG | Yes |
| Examples | GPT, SAM, AlphaFold | GPT, Claude, Llama |
When the Distinction Matters
- Regulation: the EU AI Act defines "general-purpose AI models," a category that roughly aligns with foundation models
- Research: safety and alignment debates apply to all foundation models, not just LLMs
- Product: marketing teams often conflate the two, confusing buyers
Multimodal Blur
Modern "LLMs" like GPT-4o and Gemini handle images and audio. Are they LLMs or multimodal foundation models? Both: the field's nomenclature is still settling, and "large multimodal model (LMM)" is increasingly used for such systems.
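One way to make the naming question concrete is to label a model by the modalities it accepts. The rule below is a rule of thumb assumed for illustration, not an agreed standard; `label_model` and the modality strings are hypothetical.

```python
def label_model(modalities: set[str]) -> str:
    """Suggest a label from a model's input modalities (illustrative rule of thumb)."""
    if not modalities:
        raise ValueError("at least one modality is required")
    if modalities == {"text"}:
        return "LLM"
    if "text" in modalities:
        # Text plus at least one other modality.
        return "LMM (large multimodal model)"
    return "foundation model (non-language)"


print(label_model({"text"}))                    # LLM
print(label_model({"text", "image", "audio"}))  # LMM (large multimodal model)
print(label_model({"audio"}))                   # foundation model (non-language)
```

All three labels still name foundation models; the function only picks the most specific term.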
Conclusion
Use "foundation model" when you mean the broader category, "LLM" when you specifically mean language. Your architecture diagrams will be cleaner for it.
