Three capabilities distinguish AI agents from simple AI models: the ability to plan multi-step tasks, the ability to remember context across those steps, and the ability to use external tools to take actions. Understanding these building blocks demystifies how agents work.
Planning — breaking goals into steps
When given a complex goal, an agent must figure out what steps are needed and in what order. This planning capability comes from the underlying LLM's reasoning ability — the same model that can explain a concept step by step can also plan a sequence of actions.
Common planning approaches:
- ReAct (Reason + Act): the agent alternates between reasoning about what to do next and taking an action. Each cycle is: reason, act, observe the result, then reason again.
- Chain-of-thought planning: the agent writes out a plan before acting, then follows it and adapts it as results come in.
- Tree of Thoughts: the agent explores multiple possible action paths before committing to one.
Goal: "Find the current CEO of Infosys and their educational background."
Thought: I need to search for the current CEO of Infosys.
Action: web_search("Infosys CEO 2025")
Observation: Results show Salil Parekh is CEO.
Thought: Now I need his educational background.
Action: web_search("Salil Parekh education background")
Observation: IIT Bombay, Cornell University…
Final answer: Salil Parekh, IIT Bombay (B.Tech, engineering) + Cornell (M.Eng)
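The trace above can be reproduced with a very small control loop. The following is a minimal sketch, not a production agent: `scripted_model` is a hypothetical stand-in for a real LLM call that simply replays fixed replies, `web_search` returns canned strings instead of querying the web, and the `Action: name("argument")` text format is an assumption made for this example.

```python
import re

# Hypothetical stand-in for a real search tool: returns canned results.
def web_search(query: str) -> str:
    canned = {
        "Infosys CEO 2025": "Results show Salil Parekh is CEO.",
        "Salil Parekh education background": "IIT Bombay, Cornell University",
    }
    return canned.get(query, "No results.")

TOOLS = {"web_search": web_search}

# Hypothetical stand-in for an LLM: replays one scripted reply per step.
SCRIPT = [
    'Thought: I need to search for the current CEO of Infosys.\n'
    'Action: web_search("Infosys CEO 2025")',
    'Thought: Now I need his educational background.\n'
    'Action: web_search("Salil Parekh education background")',
    'Thought: I have both facts.\n'
    'Final answer: Salil Parekh studied at IIT Bombay and Cornell University.',
]

def scripted_model(transcript: str) -> str:
    # The number of observations so far tells us which step we are on.
    return SCRIPT[transcript.count("Observation:")]

# Assumed action syntax for this sketch: Action: tool_name("argument")
ACTION_RE = re.compile(r'Action: (\w+)\("(.*)"\)')

def react_loop(goal: str, model, max_steps: int = 5) -> str:
    transcript = f"Goal: {goal}\n"
    for _ in range(max_steps):
        reply = model(transcript)            # reason (and choose an action)
        transcript += reply + "\n"
        if "Final answer:" in reply:
            return reply.split("Final answer:", 1)[1].strip()
        match = ACTION_RE.search(reply)
        if match is None:
            return "Stopped: model produced no action."
        name, arg = match.groups()
        tool = TOOLS.get(name)
        observation = tool(arg) if tool else f"Unknown tool: {name}"   # act
        transcript += f"Observation: {observation}\n"                  # observe
    return "Stopped: step limit reached."

answer = react_loop(
    "Find the current CEO of Infosys and their educational background.",
    scripted_model,
)
print(answer)
```

The step limit matters in practice: without it, a model that never emits a final answer would loop indefinitely.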
Memory — what the agent remembers
Agents need different types of memory to function effectively:
- In-context memory: the conversation history and task progress held in the current context window, that is, everything the agent has seen and done in this session.
- External memory: information stored outside the model in databases or files and retrieved on demand through search. This extends memory beyond the context window limit.
- Episodic memory: records of past agent runs ("Last time I processed this customer's request, I…"). These let the agent draw on what happened in previous sessions.
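The three memory types can be pictured as three separate stores with different lifetimes. The sketch below is illustrative only: the `AgentMemory` class and its method names are hypothetical, the context window is simulated by a fixed-size queue of four messages, and retrieval uses naive keyword overlap where a real system would more likely use vector search.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    # In-context: only the most recent messages fit (simulated window of 4).
    context: deque = field(default_factory=lambda: deque(maxlen=4))
    # External: documents kept outside the model and fetched by search.
    documents: list = field(default_factory=list)
    # Episodic: one summary per completed run, kept across sessions.
    episodes: list = field(default_factory=list)

    def observe(self, message: str) -> None:
        # Oldest messages fall out automatically once the window is full.
        self.context.append(message)

    def store(self, text: str) -> None:
        self.documents.append(text)

    def retrieve(self, query: str, k: int = 2) -> list:
        # Rank documents by how many query words they share (toy scoring).
        words = set(query.lower().split())
        scored = [(len(words & set(d.lower().split())), d) for d in self.documents]
        hits = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
        return [doc for _, doc in hits[:k]]

    def end_run(self, summary: str) -> None:
        # Keep a record of the run, then start the next session with empty context.
        self.episodes.append(summary)
        self.context.clear()

memory = AgentMemory()
for i in range(6):
    memory.observe(f"turn {i}")
memory.store("refund requests are accepted within 30 days")
memory.store("shipping takes 5 business days")
print(list(memory.context))
print(memory.retrieve("refund window"))
```

Only `context` is limited in size here; the other two stores grow without bound, which is why retrieval has to select a few relevant items rather than return everything.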
Tools — how agents act on the world
Tools are the capabilities an agent can invoke. The model decides when to use a tool and what inputs to provide. The tool executes and returns a result. Common tools include:
| Tool category | Examples | What it enables |
|---|---|---|
| Web search | Google, Bing, Tavily | Current information beyond training data |
| Code execution | Python sandbox, SQL runner | Computations, data analysis, file manipulation |
| APIs | Calendar, email, CRM, Slack | Integration with external services |
| File system | Read/write files | Processing documents, saving outputs |
| Browser | Web navigation, form filling | Interacting with websites that offer no API |
A model without tools can only generate text. The same model with web search, code execution, and API access can research, calculate, create, and act. Tools transform a generative model into an agent.
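The division of labor described above (the model chooses a tool and its inputs, the surrounding program runs it) can be sketched as a small registry plus a dispatcher. This is a minimal illustration under stated assumptions: the JSON shape `{"tool": ..., "args": {...}}` is invented for this example rather than taken from any particular vendor's API, and `add` and `word_count` are toy tools.

```python
import json

TOOLS = {}

def tool(fn):
    """Register a function under its own name so the model can request it."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def add(a: float, b: float) -> float:
    return a + b

@tool
def word_count(text: str) -> int:
    return len(text.split())

def dispatch(model_output: str) -> str:
    """Run the tool call requested by the model and return the result as text.

    Assumed format for this sketch: {"tool": "<name>", "args": {...}}.
    Errors are returned as text so the model can see them and try again.
    """
    try:
        call = json.loads(model_output)
        fn = TOOLS[call["tool"]]
        result = fn(**call["args"])
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        return f"error: {exc!r}"
    return str(result)

print(dispatch('{"tool": "add", "args": {"a": 2, "b": 3}}'))
print(dispatch('{"tool": "word_count", "args": {"text": "tools make agents act"}}'))
print(dispatch('{"tool": "send_email", "args": {}}'))
```

Returning errors as ordinary results, rather than raising, keeps the agent loop running and gives the model a chance to correct a malformed or unknown call.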
Key takeaways
- Planning enables agents to decompose complex goals into executable steps
- ReAct is a widely used pattern: reason about what to do → act → observe → reason again
- Memory comes in three types: in-context (current session), external (databases or files), episodic (past runs)
- Tools are what make agents act — web search, code execution, APIs, file systems, browsers
- Tools transform a text generator into an agent capable of real-world action