AI Model Taxonomy

Right … AI acronyms. I started off looking at LLMs (Large Language Models), but now I also look at SLMs, LAMs, VLAs, and MoEs.

Are these just marketing buzzwords? Sometimes. But usually, they describe a fundamental shift in what the AI is actually doing. We are moving from models that just “chat” to models that can “see,” “hear,” “act,” and—crucially—remember.

Here is the definitive taxonomy—the technical breakdown of the models and systems actually running the streets in 2026.

1. The Core Models (The Brains)

| Acronym | Full Name | Input(s) | Output(s) | The Technical Distinction |
| --- | --- | --- | --- | --- |
| LLM | Large Language Model | Text, Code | Text, Code | The Baseline. Uses the Transformer architecture to predict the next token. Purely text-based; it has no eyes or ears, only text data. |
| SLM | Small Language Model | Text | Text | The Edge Runner. Optimised for phones and laptops (usually <7B parameters). Sacrifices some reasoning depth for privacy, speed, and offline capability. |
| VLM | Vision-Language Model | Images + Text | Text | The Eyeball. An LLM with a vision encoder (like CLIP) bolted on. It “tokenises” images into a form the LLM understands, allowing you to chat about pixels. |
| MLLM | Multimodal Large Language Model | Text, Audio, Video, Images | Text, Audio, Images | The Godzilla. The evolution of the VLM. It doesn’t just “see” static images; it processes time-series data (video/audio) natively, effectively “watching” and “listening” in real time. |
| LAM | Large Action Model | User intent (text/voice) | Actions (JSON, API calls, clicks) | The Agent. Designed for doing. Instead of writing a poem, it outputs executable actions (e.g., navigating a UI, clicking buttons, calling an API) to perform work (see the sketch after this table). |
| VLA | Vision-Language-Action Model | Camera feed + text instruction | Robot actuation | The Robot Brain. It takes a camera feed and a command (“Pick up the apple”) and outputs direct motor controls (joint angles and velocities) rather than text. |
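
To make the LAM idea concrete, here is a minimal sketch of what an action-style output looks like and how a runtime might execute it. The `book_meeting` tool, its arguments, and the `execute_action` dispatcher are hypothetical placeholders, not any vendor’s actual API.

```python
import json

# Hypothetical action flow: the model emits a structured action instead of prose.
# The tool name and arguments below are illustrative, not a real vendor API.

# What a chat-style LLM might return for "Book a meeting with Sam on Friday at 2pm":
llm_output = "Sure! I'd suggest emailing Sam to find a time that works on Friday."

# What a LAM-style model returns: an executable action, not a suggestion.
lam_output = json.dumps({
    "action": "book_meeting",
    "arguments": {"attendee": "sam@example.com", "date": "2026-01-16", "time": "14:00"},
})

def execute_action(raw: str) -> None:
    """Parse the model's action and dispatch it to the matching tool or API."""
    call = json.loads(raw)
    if call["action"] == "book_meeting":
        # A real agent would hit a calendar API here; we just log the call.
        print(f"Booking meeting: {call['arguments']}")

execute_action(lam_output)
```

The difference is the output contract: a chat model hands you prose, an action model hands you something a runtime can execute.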

2. The Architecture & Memory (The Systems)

This is where the magic happens for businesses. These aren’t just “models”; they are strategies for making models smarter and faster.

| Acronym | Full Name | The Technical Distinction |
| --- | --- | --- |
| RAG | Retrieval-Augmented Generation | The Librarian. Not a model weight, but a system. It connects a frozen LLM to your private data (a vector database). Before answering, it “looks up” the relevant facts (see the first sketch below). This is how you curb hallucinations. |
| CAG | Cache-Augmented Generation | The RAM Upgrade. The evolution of RAG. Instead of searching a database, you load the entire dataset (books, codebases, videos) into the model’s massive context window and cache it up front. It eliminates the search step entirely. |
| MoE | Mixture of Experts | The Efficiency Hack. Instead of one giant dense network, it uses multiple smaller “expert” sub-networks. A learned router sends each token only to the most relevant experts (e.g., the coding expert), saving massive compute (see the second sketch below). |
| LCM | Latent Consistency Model | The Speed Demon. A hyper-fast variant of diffusion models. It creates high-quality images in huge leaps (1-4 steps) rather than the slow 50+ steps of traditional Stable Diffusion. |
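
To ground the RAG row, here is a minimal sketch of the retrieve-then-generate loop. The `embed` function is a toy stand-in for a real embedding model, the documents are invented, and the final LLM call is omitted; a real system would use a proper encoder and a vector database.

```python
import numpy as np

# Toy RAG pipeline: embed documents, retrieve the closest ones to the query,
# and stuff them into the prompt before the (frozen) LLM answers.

def embed(text: str) -> np.ndarray:
    """Hash-seeded toy embedding; illustrative only, not semantically meaningful."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(64)
    return v / np.linalg.norm(v)

documents = [
    "Holiday policy: staff receive 25 days of annual leave.",
    "Expenses: meals under 30 GBP do not need a receipt.",
    "Security: laptops must use full-disk encryption.",
]
doc_vectors = np.stack([embed(d) for d in documents])

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents whose embeddings are most similar to the query."""
    scores = doc_vectors @ embed(query)
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

def build_prompt(query: str) -> str:
    """Augment the user's question with retrieved context before calling the LLM."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# The resulting prompt is what you would send to the LLM of your choice.
print(build_prompt("How many days of annual leave do I get?"))
```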
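
And for the MoE row, a toy sketch of top-k routing: the router scores every expert, but only the chosen few actually run. The dimensions and random weights here are placeholders; production MoE layers and their routers are trained end to end inside a Transformer.

```python
import numpy as np

# Toy Mixture-of-Experts layer: a router scores the experts for each token,
# only the top-k experts run, and their outputs are blended by the router weights.

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 8, 4, 2

router_w = rng.standard_normal((d_model, n_experts))           # router projection
expert_w = rng.standard_normal((n_experts, d_model, d_model))  # one weight matrix per expert

def moe_layer(token: np.ndarray) -> np.ndarray:
    logits = token @ router_w                                   # score every expert
    top = np.argsort(logits)[-top_k:]                           # keep only the top-k experts
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()     # softmax over the chosen experts
    # Only the selected experts do any work; the rest are skipped entirely.
    return sum(g * (token @ expert_w[i]) for g, i in zip(gates, top))

out = moe_layer(rng.standard_normal(d_model))
print(out.shape)  # (8,): same output shape, but only 2 of the 4 experts ran
```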

Why This Matters

The most important shift in this table is the move from Language to Action and Memory.

  • LLMs are thinkers. They live in a box and dream up text.
  • RAG/CAG give them memory. They let them read your company handbook.
  • LAMs/VLAs give them hands. They let them actually do the work.

We aren’t just building chatbots anymore; we are building digital employees and physical labourers.

Learn More

For a deeper visual breakdown of how these architectures stack up against each other, check out this excellent explainer:

Watch: Every Type of AI Model Explained Clearly