Three capabilities distinguish AI agents from simple AI models: the ability to plan multi-step tasks, the ability to remember context across those steps, and the ability to use external tools to take actions. Understanding these building blocks demystifies how agents work.

Planning — breaking goals into steps

When given a complex goal, an agent must figure out what steps are needed and in what order. This planning capability comes from the underlying LLM's reasoning ability — the same model that can explain a concept step by step can also plan a sequence of actions.
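
One way to picture "what steps are needed and in what order" is to treat a plan as a set of steps with dependencies and derive a valid execution order from it. The sketch below is only an illustration: in a real agent the LLM would produce the plan, whereas here the `plan` dictionary and its step names are hypothetical and hand-written. It uses the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical plan (hand-written here; an agent would ask the LLM for it).
# Each step maps to the set of steps that must finish before it can start.
plan = {
    "find_ceo": set(),
    "find_education": {"find_ceo"},
    "write_answer": {"find_ceo", "find_education"},
}


def execution_order(steps: dict[str, set[str]]) -> list[str]:
    """Return the steps in an order that respects every dependency.

    Raises graphlib.CycleError if the plan contains a circular dependency.
    """
    return list(TopologicalSorter(steps).static_order())


if __name__ == "__main__":
    for number, step in enumerate(execution_order(plan), start=1):
        print(number, step)
```

Representing dependencies explicitly also makes a malformed plan detectable: a circular dependency raises an error instead of looping forever.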

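A common planning approach is ReAct: the model reasons about what to do, takes an action, observes the result, and then reasons again.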

ReAct in action

Goal: "Find the current CEO of Infosys and their educational background."

Thought: I need to search for the current CEO of Infosys.
Action: web_search("Infosys CEO 2025")
Observation: Results show Salil Parekh is CEO.
Thought: Now I need his educational background.
Action: web_search("Salil Parekh education background")
Observation: IIT Bombay, Cornell University…
Final answer: Salil Parekh, IIT Bombay (B.Tech) + Cornell (master's degrees in engineering)
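
The trace above can be reproduced with a small loop. This is a minimal sketch under stated assumptions, not how any particular framework implements ReAct: `scripted_model` is a hypothetical stand-in for the LLM that replays a fixed script, `web_search` is a stub that returns canned text instead of calling a search API, and the `Action: name("argument")` format is simply the one used in the trace.

```python
import re
from typing import Callable, Optional

# Hypothetical stub tool: returns canned text instead of searching the web.
def web_search(query: str) -> str:
    canned = {
        "Infosys CEO 2025": "Results show Salil Parekh is CEO.",
        "Salil Parekh education background": "IIT Bombay, Cornell University",
    }
    return canned.get(query, "No results.")


TOOLS: dict[str, Callable[[str], str]] = {"web_search": web_search}

# Hypothetical stand-in for the LLM: one scripted reply per step.
SCRIPT = [
    "Thought: I need to search for the current CEO of Infosys.\n"
    'Action: web_search("Infosys CEO 2025")',
    "Thought: Now I need his educational background.\n"
    'Action: web_search("Salil Parekh education background")',
    "Final answer: Salil Parekh, IIT Bombay and Cornell University",
]


def scripted_model(transcript: str) -> str:
    # The number of observations so far tells us which step comes next.
    step = transcript.count("Observation:")
    return SCRIPT[min(step, len(SCRIPT) - 1)]


ACTION_RE = re.compile(r'Action: (\w+)\("(.*)"\)')


def react_loop(
    goal: str,
    model: Callable[[str], str],
    tools: dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> tuple[Optional[str], str]:
    """Alternate reasoning, acting, and observing until a final answer appears."""
    transcript = f"Goal: {goal}"
    for _ in range(max_steps):
        reply = model(transcript)          # reason (and choose an action)
        transcript += "\n" + reply
        if reply.startswith("Final answer:"):
            return reply[len("Final answer:"):].strip(), transcript
        match = ACTION_RE.search(reply)
        if match is None:
            observation = "Could not parse an action."
        else:
            name, argument = match.groups()
            tool = tools.get(name)
            observation = tool(argument) if tool else f"Unknown tool: {name}"
        transcript += f"\nObservation: {observation}"  # observe
    return None, transcript                # step budget exhausted


if __name__ == "__main__":
    goal = "Find the current CEO of Infosys and their educational background."
    answer, transcript = react_loop(goal, scripted_model, TOOLS)
    print(transcript)
```

The `max_steps` cap is a design choice worth keeping in any real loop: without it, a model that never produces a final answer would run indefinitely.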

Memory — what the agent remembers

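Agents need different types of memory to function effectively:

  • In-context memory: what the agent holds during the current session
  • External memory: information stored in a database outside the model
  • Episodic memory: records of past runs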
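
The three memory types named in this lesson (in-context, external, episodic) can be sketched as one small class. Everything here is an assumption made for illustration: the `AgentMemory` name, its methods, the plain dictionary standing in for a database, and the naive substring lookup standing in for real retrieval.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    """Hypothetical container for the three memory types."""

    max_turns: int = 4
    # In-context: recent turns of the current session, bounded like a context window.
    in_context: deque = field(init=False)
    # External: a dict standing in for a database that outlives the session.
    external: dict[str, str] = field(default_factory=dict)
    # Episodic: summaries of past runs.
    episodes: list[dict] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.in_context = deque(maxlen=self.max_turns)

    def remember_turn(self, role: str, text: str) -> None:
        # The oldest turn is dropped automatically once max_turns is exceeded.
        self.in_context.append((role, text))

    def store_fact(self, key: str, value: str) -> None:
        self.external[key] = value

    def lookup(self, query: str) -> list[str]:
        # Naive retrieval: return facts whose key appears in the query.
        return [v for k, v in self.external.items() if k.lower() in query.lower()]

    def end_run(self, goal: str, outcome: str) -> None:
        # Record the run, then clear session memory; external facts persist.
        self.episodes.append({"goal": goal, "outcome": outcome})
        self.in_context.clear()


if __name__ == "__main__":
    memory = AgentMemory(max_turns=2)
    memory.remember_turn("user", "Who runs Infosys?")
    memory.store_fact("infosys ceo", "Salil Parekh")
    print(memory.lookup("Who is the Infosys CEO?"))
    memory.end_run("Find the Infosys CEO", "answered")
    print(memory.episodes)
```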

Tools — how agents act on the world

Tools are the capabilities an agent can invoke. The model decides when to use a tool and what inputs to provide. The tool executes and returns a result. Common tools include:

| Tool category  | Examples                     | What it enables                                 |
|----------------|------------------------------|-------------------------------------------------|
| Web search     | Google, Bing, Tavily         | Current information beyond training data        |
| Code execution | Python sandbox, SQL runner   | Computations, data analysis, file manipulation  |
| APIs           | Calendar, email, CRM, Slack  | Integration with external services              |
| File system    | Read/write files             | Processing documents, saving outputs            |
| Browser        | Web navigation, form filling | Any task a human can do in a browser            |

Tool use is the real unlock

A model without tools can only generate text. The same model with web search, code execution, and API access can research, calculate, create, and act. Tools transform a generative model into an agent.
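
The division of labor described in this section (the model decides which tool to call and with what inputs, the tool executes and returns a result) can be sketched as a registry plus a dispatcher. This is an illustrative sketch, not any vendor's tool-calling API: the JSON shape with `name` and `arguments` keys, the `tool` decorator, and the two toy tools are all assumptions.

```python
import json
from typing import Any, Callable

# Registry of callable tools, keyed by name.
TOOLS: dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function so the dispatcher can find it by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def add(a: float, b: float) -> float:
    return a + b


@tool
def word_count(text: str) -> int:
    return len(text.split())


def dispatch(tool_call_json: str) -> dict:
    """Execute a model-issued tool call and return a result the model can read.

    Errors are returned as data rather than raised, so the agent can
    observe the failure and try something else.
    """
    try:
        call = json.loads(tool_call_json)
        name, arguments = call["name"], call["arguments"]
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        return {"ok": False, "error": f"malformed tool call: {exc}"}
    fn = TOOLS.get(name)
    if fn is None:
        return {"ok": False, "error": f"unknown tool: {name}"}
    try:
        return {"ok": True, "result": fn(**arguments)}
    except TypeError as exc:  # missing, extra, or non-mapping arguments
        return {"ok": False, "error": f"bad arguments: {exc}"}


if __name__ == "__main__":
    # Pretend the model emitted this string as its chosen action.
    print(dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}'))
    print(dispatch('{"name": "delete_everything", "arguments": {}}'))
```

Only functions that were explicitly registered can be invoked, which keeps the set of actions available to the model under the developer's control.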

Key takeaways

  • Planning enables agents to decompose complex goals into executable steps
  • ReAct is the dominant pattern: reason about what to do → act → observe → reason again
  • Memory comes in three types: in-context (current session), external (database), episodic (past runs)
  • Tools are what make agents act — web search, code execution, APIs, file systems, browsers
  • Tools transform a text generator into an agent capable of real-world action