Many of the "AI agent" products you'll see this year boil down to one of a handful of diagrams on a whiteboard. Learn the common shapes and the hype cycle starts to make sense.
By Ryan · Belief Engines
Everyone is building "agents" right now, and almost nobody agrees on what the word means. The useful move is to stop arguing about the label and start looking at the shape of the data flow. A handful of shapes show up over and over — this post walks through six of the most common ones, and real production systems are almost always a combination of them (and sometimes a shape not in this guide at all). Once you can name the shape, conversations about scope, cost, and risk get a lot less mystical.
Below is a plain-English tour. For each pattern: a diagram, one paragraph on what it actually does, and a line on when to reach for it. The patterns aren't mine — this taxonomy was popularized by Anthropic's "Building effective agents" essay, which is required reading. I'm just translating it into the voice of someone who has shipped a few of these. Treat it as a starting vocabulary, not a closed set.
A workflow is a program that calls an LLM on rails. An agent is an LLM that decides where the rails go.
— The working definition, mostly
A quick vocabulary check
Three words do most of the work in these diagrams. LLM is a call to a language model — Claude, GPT, Gemini, whichever — with a specific prompt. Tool is anything the model can invoke to touch the outside world: a database query, an HTTP request, a shell command, a calendar write. Environment is the state the system lives in and changes over time — files, tickets, a codebase, a Slack channel, a browser. Everything else is plumbing.
The distinction between a workflow and an agent is the important one. In a workflow, the path through the system is decided by code that you wrote. In an agent, the path is decided by the model itself, turn by turn, based on what the environment sends back. Workflows are predictable and cheap. Agents are flexible and expensive. Most useful products are workflows with one small agentic loop somewhere inside.
01 Autonomous agent (Loop · open-ended)
A human hands the model a goal; the model plans, takes an action in the world, reads what happened, and decides what to do next. The loop keeps running until the goal is met or the budget runs out. This is what most people mean when they say "an agent" — Claude Code, Cursor's agent mode, a research assistant that browses the web and compiles a report. Power comes from the feedback edge: the model gets to see the consequence of each action before choosing the next one.
Use when the task is open-ended, the number of steps is unknown in advance, and the environment gives fast, reliable feedback (a code sandbox, a database, a browser). Avoid when a failure is irreversible or expensive.
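To make the feedback edge concrete, here is a minimal sketch of the loop. Everything in it is an assumption for illustration: `scripted_policy` stands in for the model call, `ToyEnvironment` is a fake codebase, and `max_steps` plays the role of the budget. A real agent would send the goal and history to an LLM instead of following a script.

```python
# Hypothetical stand-in for the model call: given the goal and the history of
# (action, observation) pairs so far, return the next action, or "done".
def scripted_policy(goal, history):
    if not history:
        return "run_tests"
    last_action, last_observation = history[-1]
    if "FAIL" in last_observation:
        return "apply_fix"
    if last_action == "apply_fix":
        return "run_tests"
    return "done"


class ToyEnvironment:
    """A fake codebase whose tests fail until one fix has been applied."""

    def __init__(self):
        self.fixed = False

    def step(self, action):
        if action == "apply_fix":
            self.fixed = True
            return "patch applied"
        if action == "run_tests":
            return "PASS" if self.fixed else "FAIL: 1 test failing"
        return f"unknown action: {action}"


def run_agent(goal, policy, env, max_steps=10):
    """Act, observe, decide again, until the policy says done or the budget runs out."""
    history = []
    for _ in range(max_steps):
        action = policy(goal, history)        # the "model" picks the next step
        if action == "done":
            break
        observation = env.step(action)        # the action touches the environment
        history.append((action, observation)) # the feedback edge: the result is fed back in
    return history


if __name__ == "__main__":
    for action, observation in run_agent("make the tests pass", scripted_policy, ToyEnvironment()):
        print(f"{action:>10} -> {observation}")
```

Note that the only thing the calling code controls is the budget. The sequence of steps comes from the policy reading each observation, which is exactly what separates this shape from the workflows that follow.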
02 Prompt chaining (Pipeline · sequential)
Break a task into an ordered pipeline. Each LLM call feeds the next, with an optional gate — a programmatic or LLM-based check — that can short-circuit the run if something looks wrong. This is the simplest workflow that's still worth its complexity: it swaps one smart-but-unreliable call for a handful of focused calls, each with a smaller job and a clearer prompt.
Use when the task decomposes cleanly into stages (outline → draft → edit, or classify → extract → format). The latency cost is real — you're paying for N round-trips instead of one — but accuracy usually jumps.
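A chain with a gate can be sketched in a few lines. This is an illustration under assumptions, not anyone's production code: `fake_llm` is a hypothetical stand-in that returns canned text per stage, and `outline_gate` is a made-up programmatic check that the outline has enough points before the later stages spend any tokens.

```python
def fake_llm(stage, text):
    """Hypothetical stand-in for a model call; one canned behavior per stage."""
    if stage == "outline":
        if not text.strip():
            return ""
        return f"1. What is {text}\n2. Why it matters\n3. How to start"
    if stage == "draft":
        return "Draft covering: " + text.replace("\n", " / ")
    if stage == "edit":
        return text.replace("Draft covering", "Final copy covering")
    raise ValueError(f"unknown stage: {stage}")


def outline_gate(outline, min_points=3):
    """Programmatic gate: pass only if the outline has at least min_points non-empty lines."""
    points = [line for line in outline.splitlines() if line.strip()]
    return len(points) >= min_points


def run_chain(topic):
    """outline -> gate -> draft -> edit. Returns None if the gate stops the run."""
    outline = fake_llm("outline", topic)
    if not outline_gate(outline):
        return None                      # short-circuit before the later calls
    draft = fake_llm("draft", outline)
    return fake_llm("edit", draft)


if __name__ == "__main__":
    print(run_chain("prompt chaining"))
    print(run_chain("   "))              # empty topic: the gate halts the chain
```

The path through the stages is fixed by `run_chain`, not by the model, which is what makes this a workflow rather than an agent.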
03 Routing (Switchboard · specialized)
A first LLM reads the input and picks which specialist to hand it off to. The specialists can be other models, tuned prompts, or entirely different systems — a cheap model for easy questions, a reasoning model for hard ones, a SQL-generator for database requests. Only one branch actually runs; the rest stay dim. This is the pattern every customer-support triage system eventually lands on.
Use when the input space is heterogeneous and the cost profile varies wildly by case. Routing pays for itself the moment your easy cases outnumber your hard ones ten to one.
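Here is a rough sketch of the switchboard. The keyword rules in `classify` are a placeholder for the routing LLM call (a real router would prompt a model to return one of the labels), and `cheap_model`, `reasoning_model`, and `sql_generator` are hypothetical specialists that just tag their input.

```python
def classify(message):
    """Stand-in for the routing call; returns one of 'easy', 'hard', 'sql'."""
    text = message.lower()
    if "select" in text or "table" in text:
        return "sql"
    if "why" in text or len(text.split()) > 12:
        return "hard"
    return "easy"


# Hypothetical specialists. In practice these would be different models,
# tuned prompts, or entirely different systems.
def cheap_model(message):
    return f"[cheap] {message}"


def reasoning_model(message):
    return f"[reasoning] {message}"


def sql_generator(message):
    return f"[sql] {message}"


ROUTES = {"easy": cheap_model, "hard": reasoning_model, "sql": sql_generator}


def route(message):
    """Pick exactly one specialist and run it; the other branches never execute."""
    label = classify(message)
    handler = ROUTES.get(label, cheap_model)  # unknown labels fall back to the cheap path
    return label, handler(message)


if __name__ == "__main__":
    for question in (
        "What are your opening hours?",
        "Why was I charged twice after upgrading?",
        "Which table stores refund totals?",
    ):
        print(route(question))
```

Keeping the routes in a plain dictionary means adding a specialist is a one-line change, and the fallback keeps a surprising label from crashing the request.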
04 Parallelization (Fan-out · concurrent)
Run multiple LLM calls at the same time against the same input, then combine the answers. Two flavors: sectioning, where each call handles a different slice of the job (one summarizes, one extracts entities, one checks for policy violations), and voting, where you run the same prompt several times and pick the best or most common answer. The aggregator — often a small piece of code, sometimes another LLM — makes the final call.
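Use when you need speed (independent calls run concurrently, so wall-clock latency stays close to that of the slowest single call), diverse perspectives on a single input (safety + helpfulness + correctness), or higher accuracy through self-consistency. It costs N× the tokens of one call, but the combined answer is often worth far more than N× what a single call gives you.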
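Both flavors fit in a short sketch using Python's standard `concurrent.futures`. The worker functions (`summarize`, `extract_entities`, `check_policy`) and `noisy_classifier` are hypothetical stand-ins for LLM calls, made deterministic so the example can run offline; the aggregators here are plain code.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


# Sectioning: three different jobs over the same input.
def summarize(text):
    return text.split(".")[0].strip() + "."


def extract_entities(text):
    return sorted({word.strip(".,") for word in text.split() if word[0].isupper()})


def check_policy(text):
    return "password" not in text.lower()


def run_sections(text):
    """Run every section concurrently and collect the results by name."""
    workers = {"summary": summarize, "entities": extract_entities, "policy_ok": check_policy}
    with ThreadPoolExecutor(max_workers=len(workers)) as pool:
        futures = {name: pool.submit(fn, text) for name, fn in workers.items()}
        return {name: future.result() for name, future in futures.items()}


# Voting: the same job several times, then take the most common answer.
def noisy_classifier(text, seed):
    """Stand-in for a sampled model call: gives the minority answer on every third seed."""
    return "negative" if seed % 3 == 0 else "positive"


def vote(classifier, text, samples):
    """Majority vote over `samples` concurrent runs (samples must be at least 1)."""
    with ThreadPoolExecutor(max_workers=samples) as pool:
        answers = list(pool.map(lambda seed: classifier(text, seed), range(samples)))
    return Counter(answers).most_common(1)[0][0]


if __name__ == "__main__":
    ticket = "Acme shipped the Falcon update. Users in Berlin reported issues."
    print(run_sections(ticket))
    print(vote(noisy_classifier, ticket, samples=5))
```

Threads are a reasonable fit here because real model calls are network-bound; the fan-out is fixed in code, which is the difference from the orchestrator shape.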
05 Orchestrator (Conductor · dynamic fan-out)
Looks like parallelization, but the crucial difference is that the orchestrator decides at runtime what subtasks to spawn — they aren't hard-coded. Hand it "build me a landing page," and it figures out it needs a copywriter sub-agent, a layout sub-agent, and a visual-design sub-agent, spawns them with tailored prompts, then hands everything to a synthesizer that stitches the pieces together. This is the shape of Claude Code's sub-agents, of deep-research systems, of anything that dynamically divides labor.
Use when you can't predict the shape of the subtasks in advance. Expensive and harder to debug than parallelization — reach for it only when the flexibility actually earns its keep.
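The runtime decision is the whole point, so here is a small sketch of it. Treat every name as an assumption: `plan` stands in for the orchestrator LLM deciding which subtasks exist, `worker` for a sub-agent call, and `synthesize` for the final stitching step. The keyword rules inside `plan` are only there so the example runs offline.

```python
def plan(request):
    """Stand-in for the orchestrator call: decide at runtime which subtasks to spawn."""
    text = request.lower()
    subtasks = []
    if "landing page" in text:
        subtasks += [
            {"role": "copywriter", "brief": "headline and body copy"},
            {"role": "layout", "brief": "section structure"},
            {"role": "visual-design", "brief": "palette and imagery"},
        ]
    if "pricing" in text:
        subtasks.append({"role": "pricing-analyst", "brief": "tier comparison"})
    if not subtasks:
        subtasks.append({"role": "generalist", "brief": request})
    return subtasks


def worker(subtask):
    """Stand-in for a sub-agent given a tailored prompt."""
    return f"{subtask['role']}: done ({subtask['brief']})"


def synthesize(request, results):
    """Stitch the workers' outputs into one deliverable."""
    lines = [f"Result for '{request}':"] + [f"- {result}" for result in results]
    return "\n".join(lines)


def orchestrate(request):
    subtasks = plan(request)                       # the list is not hard-coded
    results = [worker(subtask) for subtask in subtasks]
    return synthesize(request, results)


if __name__ == "__main__":
    print(orchestrate("Build me a landing page"))
    print(orchestrate("Build me a landing page with pricing"))
```

The two requests produce different numbers of workers from the same code path. That variability is also why this shape is harder to debug: the set of calls to inspect changes from run to run.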
06 Evaluator–optimizer (Critique loop · iterative)
One LLM generates a draft; a second LLM — playing the role of a critic — reads the draft and either accepts it or kicks it back with specific feedback. The generator uses the feedback to produce a better version. Loop until the evaluator is satisfied or you hit a retry cap. Used well, this is how you get a small model to approach large-model quality; used poorly, it's how you burn a month of tokens going in circles.
Use when you have clear criteria for "good enough" that are easier to check than to produce on the first try — code that compiles and passes tests, translations that preserve specific terms, drafts that hit a defined rubric.
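As a sketch of the critique loop with a retry cap: `generate` and `evaluate` below are hypothetical stand-ins for the two model roles, and the acceptance criterion (every required term appears in the draft) is a deliberately simple example of something easier to check than to produce. The generator is scripted to fix one issue per round so the loop has something to do.

```python
def evaluate(draft, required_terms):
    """Critic: accept only if every required term appears; otherwise list what is missing."""
    missing = [term for term in required_terms if term not in draft]
    return (not missing), missing


def generate(brief, previous_draft, missing):
    """Stand-in generator: writes a first draft, then addresses one piece of feedback per round."""
    if previous_draft is None:
        return f"Notes on {brief}."
    return f"{previous_draft} Covers {missing[0]}."


def refine(brief, required_terms, max_rounds=3):
    """Generate, critique, regenerate, until accepted or the retry cap is hit.

    Returns (final_draft, rounds_used, accepted).
    """
    draft, missing = None, []
    for round_number in range(1, max_rounds + 1):
        draft = generate(brief, draft, missing)
        accepted, missing = evaluate(draft, required_terms)
        if accepted:
            return draft, round_number, True
    return draft, max_rounds, False    # cap reached: return the best effort, flagged


if __name__ == "__main__":
    print(refine("rate limits", ("retry", "backoff")))
    print(refine("rate limits", ("retry", "backoff"), max_rounds=2))
```

Returning the `accepted` flag alongside the draft matters: when the cap is hit, the caller can tell a best-effort draft from an approved one instead of silently shipping it.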
How to actually pick one
The dirty secret of this taxonomy is that most people reach for the fanciest shape first, when the right answer is almost always the simplest one that gets the job done. With one exception (the autonomous agent comes first in the list but belongs at the far end of the ladder), the order above is roughly the order of complexity: a single well-prompted call is simpler than a chain, a chain is simpler than a router, a router is simpler than an orchestrator, and a fully autonomous loop is the most complex of all. Every step up the ladder costs you predictability, latency, and debuggability, and gains you flexibility. Only pay that cost if the flexibility is actually buying something the simpler shape can't.
The practical heuristic I use on consulting work is: start with the most basic workflow that could plausibly solve the task, run it on real inputs, and only add complexity where the evaluation shows you need it. A prompt chain with a gate handles a shocking percentage of real jobs. A router on top of that handles most of the rest. By the time you actually need an orchestrator or an autonomous agent, the requirements will have made themselves loud.
And if you read nothing else, read the Anthropic essay this post is in conversation with. It is still the best short introduction to how these systems are actually designed in practice.