Agents 101: part 3 - workflow
This is the third part of a series of posts breaking down AI agents.
Key points
Tools in AI agents are external functions, like web search or APIs, that extend agent capabilities.
The agent's workflow follows a thought-action-observation cycle, iterating until tasks are completed.
Memory, including short-term and long-term, helps agents retain context and learn from past interactions.
Agent frameworks, such as LangChain and Hugging Face, simplify building and deploying agents.
The future of AI agents likely involves increased autonomy, system integration, and ethical challenges.
This blog post is the third in a series on AI agents, following discussions on their foundation and LLMs as their brain. Today, we dive into tools and the agent’s workflow, crucial for understanding how these systems operate in practice. Before diving into the actual workflow, let's first talk about tools.
What Are Tools in AI Agents?
Tools are like extra helpers for AI agents—external functions or services they can call to do things they can’t manage on their own. For example, if an agent needs to find the current weather, it might use a weather API tool instead of guessing. These tools can include web searches, database queries, or even other AI models, expanding what the agent can do.
Think of tools as specific tasks the agent can request, like asking a friend for help. Each tool is a function with defined inputs (what you give it) and outputs (what it returns), exposed through an interface the agent understands.
For instance, an agent has a list of available tools, each defined with a name, a description, and the types of its inputs and outputs. The agent decides which tool to use based on its current state and task, often guided by the LLM’s reasoning. In practice, the agent might call a web search tool with a query like “current population of Paris” and process the returned data, iterating if the results are inconsistent. Frameworks facilitate this interaction by managing tool calling, which reduces complexity for developers.
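To make the idea concrete, here is a framework-agnostic sketch of a tool definition: a plain Python function plus a small schema the agent can read. All names here (get_weather, TOOLS, call_tool) are hypothetical illustrations, not the API of any particular framework, and the weather lookup is stubbed rather than calling a real API.

```python
def get_weather(city: str) -> str:
    """Return a short weather summary for a city (stubbed here)."""
    # A real tool would call a weather API; we return a fixed value.
    return f"Sunny, 22°C in {city}"

# The agent sees each tool as a name, a description, and input/output types.
TOOLS = {
    "get_weather": {
        "description": "Look up the current weather for a city.",
        "inputs": {"city": "str"},
        "output": "str",
        "fn": get_weather,
    }
}

def call_tool(name: str, **kwargs) -> str:
    """Dispatch a tool call chosen by the LLM."""
    return TOOLS[name]["fn"](**kwargs)
```

The description and type information is what the LLM actually reads when deciding which tool to call; the function itself only runs once the agent commits to an action.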
Agent workflow
The agent’s workflow follows a thought-action-observation cycle, a continuous loop that mirrors human decision-making. This pattern, known as ReAct (Reasoning + Acting), interleaves verbal reasoning with actions, feeding each observation back into the context, where it serves as short-term memory.
Thought: The LLM decides the next step, reasoning about the task and available tools.
Action: The agent executes an action, typically calling a tool with specific arguments.
Observation: It reflects on the tool’s output, using this feedback to inform the next thought.
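The three steps above can be sketched as a loop. In this minimal sketch, llm_decide stands in for a real model call; it is a scripted stub (as are web_search and run_agent, which are hypothetical names) so the control flow stays visible.

```python
def llm_decide(task, history):
    """Return either ("final", answer) or ("tool", tool_name, args)."""
    if not history:                        # Thought: no data yet -> search
        return ("tool", "web_search", {"query": task})
    return ("final", history[-1])          # Thought: the observation suffices

def web_search(query):
    return "2,165,423"                     # stubbed search result

def run_agent(task, max_steps=5):
    history = []
    for _ in range(max_steps):
        decision = llm_decide(task, history)                     # Thought
        if decision[0] == "final":
            return decision[1]
        _, tool, args = decision
        observation = {"web_search": web_search}[tool](**args)   # Action
        history.append(observation)                              # Observation
    return "Could not finish within step budget"
```

The max_steps cap matters in practice: without it, an agent that never reaches a "final" decision would loop indefinitely.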
Consider the task, “What is the current population of Paris?” The agent’s workflow unfolds as follows:
Thought: “I need the current population of Paris; I don’t know it, so I’ll search.”
Action: Uses web search tool, querying “current population of Paris.”
Observation: Receives results like 2,165,423 from Wikipedia and 2,000,000 from news, noting inconsistency.
Thought: “Results vary; I need a reliable source, perhaps an official site.”
Action: Uses browse page tool to access the French National Institute of Statistics and Economic Studies (INSEE).
Observation: Finds the 2021 census data, 2,165,423, confirming reliability.
Thought: “This is trustworthy; the answer is 2,165,423.”
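The reliability check in this trace can be expressed as a small routine: query multiple sources, and fall back to an authoritative one when they disagree. Both functions below are hypothetical stubs standing in for real tool calls, with the figures taken from the example above.

```python
def search_results(query):
    # A real web search would return several candidate figures.
    return {"wikipedia": 2_165_423, "news": 2_000_000}

def browse_official(url):
    # Stand-in for browsing an official source such as INSEE.
    return 2_165_423

def resolve_population(query):
    results = search_results(query)
    values = set(results.values())
    if len(values) == 1:          # sources agree -> accept the figure
        return values.pop()
    # Observation: results vary -> consult an authoritative source
    return browse_official("https://www.insee.fr")
```

This is the kind of branch the LLM's reasoning produces implicitly; writing it out shows why the agent needed a second tool call at all.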
Operationalizing AI Agents
Agent Frameworks
Agent frameworks are platforms that streamline agent development, offering pre-built tools for memory, tool integration, and workflow management. Notable frameworks include:
LangChain: Simplifies creating agents with LLMs, supporting tool calling and memory management; its companion library LangGraph adds a multi-agent framework.
Hugging Face: Provides models and agent-building tools (e.g., the smolagents library), enhancing accessibility.
AutoGen, CrewAI: Offer collaborative and role-based architectures for complex tasks, as seen in multi-agent systems.
The field is not limited to these; new frameworks are becoming available as I write this post. My suggestion is to find the framework that offers the tools you need to solve your problem. In many situations they don't exist, and you need to create custom tools.
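Creating a custom tool usually means wrapping an ordinary function with enough metadata for an agent to use it. Here is a sketch of deriving that metadata automatically from a function's signature and docstring; make_tool is a hypothetical helper, not part of any specific framework, though several frameworks offer decorators that work along these lines.

```python
import inspect

def make_tool(fn):
    """Build a tool spec (name, description, input types) from a function."""
    sig = inspect.signature(fn)
    return {
        "name": fn.__name__,
        "description": (fn.__doc__ or "").strip(),
        "inputs": {
            name: p.annotation.__name__
            for name, p in sig.parameters.items()
            if p.annotation is not inspect.Parameter.empty
        },
        "fn": fn,
    }

def convert_currency(amount: float, rate: float) -> float:
    """Convert an amount using a fixed exchange rate."""
    return amount * rate

tool = make_tool(convert_currency)
```

Keeping the docstring accurate is not optional here: it becomes the description the LLM reads when deciding whether to call the tool.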
Memory
Memory is vital for agents to maintain context and learn from interactions. It’s categorized into:
Short-term memory: Holds recent interactions, like conversation history, for immediate context.
Long-term memory: Stores facts for future recall, such as knowing Paris is France’s capital, often using external databases or vector stores for embeddings.
Implementations vary, from including the conversation history directly in the prompt to using graph databases for multi-agent support.
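Short-term memory, in the simplest case, is a bounded conversation buffer: old turns are dropped so the prompt stays within a budget. The class name below (ShortTermMemory) is illustrative, not a library API.

```python
from collections import deque

class ShortTermMemory:
    def __init__(self, max_turns=4):
        self.turns = deque(maxlen=max_turns)  # oldest turns fall off the front

    def add(self, role, text):
        self.turns.append((role, text))

    def as_prompt(self):
        """Render the retained turns as text to prepend to the next prompt."""
        return "\n".join(f"{role}: {text}" for role, text in self.turns)

mem = ShortTermMemory(max_turns=2)
mem.add("user", "What is the capital of France?")
mem.add("agent", "Paris.")
mem.add("user", "And its population?")  # the first question is now evicted
```

Real systems are more careful, counting tokens rather than turns and often summarizing evicted history into long-term memory instead of discarding it, but the core idea is the same.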
Conclusion
This exploration of tools, workflow, and operational aspects is an introduction to how AI agents work in practice, from defining tools as external helpers to detailing the thought-action-observation cycle. Looking ahead, the main trend is increasing autonomy: agents handling complex, multi-step tasks with minimal oversight, driven by advancements in LLMs.