Quick Answer
- Foundation model: large model pre-trained on broad data, adaptable to many downstream tasks
- LLM (Large Language Model): a text-focused foundation model
All LLMs are foundation models. Not all foundation models are LLMs.
What Do These Terms Mean?
The term foundation model was coined in 2021 by Stanford's Center for Research on Foundation Models (CRFM), part of Stanford HAI (Bommasani et al., "On the Opportunities and Risks of Foundation Models," 2021). It covers models such as GPT, Stable Diffusion, CLIP, and AlphaFold: all trained at scale on broad data and adaptable to many downstream tasks.
An LLM is specifically a language foundation model. "Large" is informal: it usually means billions of parameters trained on trillions of tokens (Stanford HAI, 2024).
How They Relate
Foundation Models (umbrella)
+-- LLMs (GPT, Claude, Llama, Gemini text mode)
+-- Image models (Stable Diffusion, DALL-E)
+-- Multimodal (Gemini, GPT-4o, Claude Opus vision)
+-- Audio (Whisper, Suno)
+-- Scientific (AlphaFold, ESM)
+-- Robotics (RT-2, OpenVLA)
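One way to make the umbrella relationship concrete is to write the tree as a small lookup table. The sketch below is illustrative only: the dictionary name, the category labels, and the helper functions are hypothetical, and the model names are plain labels taken from the tree, not identifiers from any real API.

```python
# Hypothetical taxonomy mirroring the tree above.
# Keys are branches of the "foundation model" umbrella; values are example models.
FOUNDATION_MODELS = {
    "LLM": {"GPT", "Claude", "Llama", "Gemini"},
    "Image": {"Stable Diffusion", "DALL-E"},
    "Multimodal": {"Gemini", "GPT-4o", "Claude Opus"},
    "Audio": {"Whisper", "Suno"},
    "Scientific": {"AlphaFold", "ESM"},
    "Robotics": {"RT-2", "OpenVLA"},
}


def categories_of(model: str) -> list[str]:
    """Return every branch that lists the model, sorted by branch name."""
    return sorted(cat for cat, names in FOUNDATION_MODELS.items() if model in names)


def is_foundation_model(model: str) -> bool:
    """A model counts as a foundation model if any branch lists it."""
    return bool(categories_of(model))


def is_llm(model: str) -> bool:
    """An LLM is a foundation model that sits in the LLM branch."""
    return model in FOUNDATION_MODELS["LLM"]


if __name__ == "__main__":
    for name in ("Claude", "Whisper", "Gemini"):
        print(name, categories_of(name), "LLM" if is_llm(name) else "not an LLM")
```

Every name that passes `is_llm` also passes `is_foundation_model`, but not the other way around. A model such as Gemini can sit in more than one branch, which is why sets per branch fit better here than a strict one-parent tree.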
Examples
Foundation models that are LLMs
- GPT-5
- Claude Opus 4.1
- Llama 3.1 405B
- Gemini 2.0 Pro
- Mistral Large
Foundation models that are not LLMs
- Stable Diffusion (image)
- Whisper (audio)
- AlphaFold (protein structure)
- Segment Anything (vision)
- CLIP (vision-language embedding, not strictly generative language)
Foundation Model vs LLM
| Aspect | Foundation Model | LLM |
| --- | --- | --- |
| Scope | Any modality | Text (primarily) |
| Pre-training data | Broad: text, images, audio, scientific data | Text corpora |
| Adaptable | Yes: fine-tuning, prompting, RAG | Yes |
| Examples | GPT, SAM, AlphaFold | GPT, Claude, Llama |
When the Distinction Matters
- Regulation: EU AI Act defines "general-purpose AI models" roughly aligning with foundation models
- Research: safety and alignment debates apply to all foundation models, not just LLMs
- Product: marketing teams often conflate the two, confusing buyers
Multimodal Blur
Modern "LLMs" like GPT-4o and Gemini handle images and audio. Are they LLMs or multimodal foundation models? Both: the field's nomenclature is still settling, and "large multimodal model (LMM)" is increasingly used for them.
FAQs
Is every big model a foundation model? Only if broadly capable and adaptable. A specialized medical-imaging model trained only on X-rays is a domain model, not a foundation model.
Is CLIP an LLM? No — it learns joint text-image embeddings but is not generative language.
Are coding models LLMs? Usually yes — they are text models with heavy code data.
What size is "large"? There is no fixed threshold. Circa 2026, "small" LLMs start around 1B parameters; "frontier" models have 100B+ active parameters.
Do foundation models need to be open? No — most frontier ones are closed.
Why the term "foundation"? Because downstream apps are built on top — the model is the foundation.
Is AGI a foundation model? Hypothetically, an AGI system might be built atop one or more foundation models, but AGI has no agreed definition, so the question has no precise answer.
Conclusion
Use "foundation model" when you mean the broader category, "LLM" when you specifically mean language. Your architecture diagrams will be cleaner for it.