What are AI agents? A comprehensive guide


When you think of AI agents, do you imagine a personal AI assistant like Tony Stark’s Jarvis? Perhaps a calm-under-pressure TARS from Interstellar? Or, more on the scary spectrum, an amoral HAL 9000 straight out of 2001: A Space Odyssey?

Don’t worry: current technology doesn’t come close to that kind of science fiction—not yet. Right now, AI agents leverage large language models like GPT to understand goals, generate tasks, and go about completing them. You can use them to automate work and outsource complex cognitive tasks, creating a team of robotic coworkers to support your human ones—11 a.m. chat by the watercooler optional.

This field is evolving fast, especially on the software side, with new AI models and agent frameworks becoming better and more reliable. Even the no-code platforms to put them together are becoming more powerful, so this is a great time to get your feet wet and run some experiments.

What are AI agents?

An AI agent is an entity that can act autonomously in an environment. It can take information from its surroundings, make decisions based on that data, and act to transform those circumstances—physical, digital, or mixed. More advanced systems can learn and update their behavior over time, constantly trying out new solutions to a problem until they achieve the goal.

Some agents can be seen in the real world as robots, automated drones, or self-driving cars. Others are purely software-based, running inside computers to complete tasks. The actual appearance, components, and interface of each AI agent vary widely depending on the task it’s meant to work on.

And unlike with a chatbot like ChatGPT, you don’t need to constantly send prompts with new instructions. AI agents will run once you give them an objective or a stimulus to trigger their behavior. Depending on the complexity of the agent system, it will use its processors to consider the problem, understand the best way to solve it, and then take action to close the gap to the goal. While you may define rules to have it gather your feedback and additional instructions at certain points, it can work by itself.

More flexible and versatile than traditional computer programs, AI agents can understand and interact with their circumstances: they don’t need to rely on fixed programmed rules to make decisions. This makes them great for complex and unpredictable tasks. And even though they don’t have complete accuracy, they can detect their mistakes and figure out ways to solve them as they move forward.

While building fully-fledged AI agents is out of reach for non-technical folks like me, you can already get one step closer to this technology with AI automation. For example, when you set up a new Zap in Zapier, you can send data to an AI model to summarize an article, stay on top of your meetings, or write a blog post, among many other possibilities. It’s artificially intelligent automation.

One last note: there’s a bit of confusion between AI agents and regular agent software. The latter falls under robotic process automation, or RPA for short. These apps can use a computer like a human user, looking at screens, clicking elements, and automating work. They’re based on pre-determined rules and deal with structured data, so they lack flexibility and adaptability. They don’t use AI at all, but you can definitely integrate AI into them to give them extra powers.

Components of an AI agent system

AI agents have different components that make up their body or software, each with its own capabilities.

  • Sensors let the agent sense its surroundings to gather percepts (inputs from the world: images, sounds, radio frequencies, etc). These sensors can be cameras, microphones, or antennae, among other things. For software agents, it could be a web search function or a tool to read PDF files.

  • Actuators help the agent act in the world. These can be wheels, robotic arms, or a tool to create files in a computer.

  • Processors, control systems, and decision-making mechanisms compose the “brain” of the agent. I’ve bundled these together as they share similar functions, but they may not all be present in an AI agent system. They process information from the sensors, brainstorm the best course of action, and issue commands to the actuators.

  • Learning and knowledge base systems store data that help the AI agent complete tasks; for example, a database of facts or past percepts, difficulties encountered, and solutions found.

Since the form of an AI agent depends so much on the tasks it carries out, you may find that some AI agents have all these components and others don’t. For example, a smart thermostat may lack learning components, only having basic sensors, actuators, and a simple control system. A self-driving car has everything on this list: it needs sensors to see the road, actuators to move around, decision-making to change lanes, and a learning system to remember how to navigate challenging parts of a city.
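To make the thermostat example a bit more concrete, here’s a minimal Python sketch of an agent with only a sensor, a control system, and an actuator. Everything in it (the `Thermostat` class, the method names, the target temperature, and the tolerance) is made up for illustration; a real device would read hardware instead of taking a number as input.

```python
from dataclasses import dataclass


@dataclass
class Thermostat:
    """A toy agent: sensor, control system, and actuator, with no learning."""

    target_c: float = 21.0      # assumed target temperature
    tolerance_c: float = 0.5    # assumed dead band around the target
    heater_on: bool = False

    def sense(self, room_temperature_c: float) -> float:
        # Sensor: a real device would read hardware; here the value is passed in.
        return room_temperature_c

    def decide(self, percept_c: float) -> str:
        # Control system: a fixed rule that maps a percept to a command.
        if percept_c < self.target_c - self.tolerance_c:
            return "heat_on"
        if percept_c > self.target_c + self.tolerance_c:
            return "heat_off"
        return "hold"

    def act(self, command: str) -> None:
        # Actuator: flips the (simulated) heater switch.
        if command == "heat_on":
            self.heater_on = True
        elif command == "heat_off":
            self.heater_on = False

    def step(self, room_temperature_c: float) -> str:
        # One full sense -> decide -> act cycle.
        command = self.decide(self.sense(room_temperature_c))
        self.act(command)
        return command


thermostat = Thermostat()
commands = [thermostat.step(t) for t in (18.0, 20.8, 22.0)]
print(commands)  # ['heat_on', 'hold', 'heat_off']
```

A self-driving car would follow the same sense, decide, act cycle, but with far richer sensors and a learning system feeding the decision step.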

Types of AI agents

Robotic vacuum cleaners can have anything from a simple AI agent system for obstacle detection to more complex systems that can recognize objects.

Based on their components, complexity, and real-world applications, here are the most common types of AI agents.

  • Simple-reflex agents look for a stimulus in one or a small collection of sensors. Once that signal is detected, they interpret it, run a decision, and produce an action or an output. You can find these in simple digital thermostats or the smart vacuum cleaner currently freaking out your dog.

  • Model-based reflex agents keep an active internal state, gathering information about how the world works and how their actions affect it. This helps improve decision-making over time. You’ll find them forecasting inventory needs at a warehouse or in the self-driving car now parking in front of your living room window.

  • Goal-based agents create a strategy to solve a particular problem. They generate a task list, take steps to solve it, and understand whether those actions are moving them closer to the goal. You can find these agents defeating human chess masters and in AI agent apps—I’ll talk more about this later in this article.

  • Utility-based agents brainstorm the outcomes of decisions in circumstances with many viable courses of action. They run each possibility and score it based on their utility function: Is the best option the cheapest? The fastest? The most efficient? This is super useful for identifying the ideal choice, and perhaps for tackling bouts of human analysis paralysis. You can see them optimizing traffic in your city or recommending the best shows you should watch on TV.

  • Learning agents, as the name implies, learn from their surroundings and behavior. They use a problem generator to create tests to explore the world and a performance element to make decisions and take action based on what they learned so far. On top of that, they have an internal critic to compare the actions taken versus the impact seen in the world. These agents are preventing spam from landing in your inbox.
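As a rough illustration of the utility-based idea from the list above, here’s a small Python sketch that scores a few options and picks the best one. The `Route` type, the example routes, and the weights in `utility` are all assumptions I made up for the example; a real agent would have a far more elaborate utility function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    name: str
    minutes: float
    cost: float


def utility(route: Route, time_weight: float = 1.0, cost_weight: float = 2.0) -> float:
    # Higher is better, so travel time and cost both count as penalties.
    return -(time_weight * route.minutes + cost_weight * route.cost)


def choose(routes: list[Route]) -> Route:
    # Score every option and keep the one with the highest utility.
    return max(routes, key=utility)


routes = [
    Route("highway", minutes=20, cost=6.0),       # utility: -(20 + 12) = -32
    Route("back roads", minutes=35, cost=0.0),    # utility: -(35 + 0)  = -35
    Route("toll tunnel", minutes=15, cost=10.0),  # utility: -(15 + 20) = -35
]
best = choose(routes)
print(best.name)  # highway
```

Changing the weights changes the answer: the utility function is where you tell the agent whether "best" means cheapest, fastest, or something in between.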

And if you have a really, really complex task to complete, you can combine these into multi-agent systems. You can have an AI agent as the control system, generating a list of tasks and delegating them to other specialized AI agents. As they complete these tasks, the output is stored and analyzed by an internal critic, and the whole system will keep iterating until it finds a solution.
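Here’s a heavily simplified Python sketch of that multi-agent setup: a control agent that hands tasks to specialists, and a critic that decides whether to keep iterating. The specialists are plain functions standing in for real AI agents, and every name here (`researcher`, `writer`, `controller`, `critic`, `run`) is hypothetical.

```python
from typing import Callable


# Hypothetical specialist agents: each one is just a function from task to output.
def researcher(task: str) -> str:
    return f"notes on {task}"


def writer(task: str) -> str:
    return f"draft about {task}"


SPECIALISTS: dict[str, Callable[[str], str]] = {
    "research": researcher,
    "write": writer,
}


def controller(goal: str) -> list[tuple[str, str]]:
    # Control agent: splits the goal into (specialist, task) pairs.
    return [("research", goal), ("write", goal)]


def critic(outputs: dict[str, str]) -> bool:
    # Internal critic: accept only when every specialist produced something.
    return all(outputs.get(name) for name in SPECIALISTS)


def run(goal: str, max_rounds: int = 3) -> dict[str, str]:
    outputs: dict[str, str] = {}
    for _ in range(max_rounds):
        # Delegate each task and store the output for the critic to review.
        for name, task in controller(goal):
            outputs[name] = SPECIALISTS[name](task)
        if critic(outputs):
            break
    return outputs


result = run("AI agents")
print(result)
```

In a real system, each specialist would call its own model and tools, and the critic would judge quality rather than just checking that something came back.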

How does an AI agent work?

In a nutshell, an AI agent uses its sensors to gather data, control systems to think through hypotheses and solutions, actuators to carry out actions in the real world, and a learning system to keep track of its progress and learn from mistakes. 

But what does this look like step-by-step? Let’s drill down on how a goal-based AI agent works, since it’s likely you’ll build or use one of these in the future.

  1. When you input your objective, the AI agent goes through goal initialization. It passes your prompt to the core LLM (like GPT), and returns the first output of its internal monologue, displaying that it understands what it needs to do.

  2. The next step is creating a task list. Based on the goal, it’ll generate a set of tasks and understand in which order it should complete them. Once it decides it has a viable plan, it’ll start searching for information.

  3. Since the agent can use a computer the same way you do, it can gather information from the internet. I’ve also seen some agents that can connect to other AI models or agents to outsource tasks and decisions, letting them access image generation, geographical data processing, or computer vision features.

  4. All data is stored and managed by the agent in its learning/knowledge base system, so it can relay it back to you and improve its strategy as it moves forward.

  5. As tasks are crossed off the list, the agent assesses how far it still is from the goal by gathering feedback, both from external sources and from its internal monologue.

  6. And until the goal is met, the agent will keep iterating, creating more tasks, gathering more information and feedback, and moving forward without pause.
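The six steps above can be sketched as a short Python loop. This is only a toy under stated assumptions: `fake_llm_plan`, `fake_tool`, and `goal_met` are stand-ins I invented for the LLM call, the web search, and the feedback check, and the "goal" is simply collecting a set number of results.

```python
from collections import deque


def fake_llm_plan(goal: str) -> list[str]:
    # Stand-in for the LLM call that turns a goal into tasks (steps 1 and 2).
    return [f"search for '{goal}'", "summarize findings"]


def fake_tool(task: str) -> str:
    # Stand-in for a web search or another tool (step 3).
    return f"result of: {task}"


def goal_met(memory: list[str], needed: int) -> bool:
    # Stand-in for the feedback check (step 5).
    return len(memory) >= needed


def run_agent(goal: str, needed: int = 3, max_steps: int = 10) -> list[str]:
    tasks = deque(fake_llm_plan(goal))
    memory: list[str] = []  # Knowledge base (step 4).
    steps = 0
    while not goal_met(memory, needed) and steps < max_steps:
        if not tasks:
            # Step 6: the goal is unmet and the list is empty, so add a task.
            tasks.append(f"dig deeper into '{goal}'")
        task = tasks.popleft()
        memory.append(fake_tool(task))
        steps += 1
    return memory


memory = run_agent("AI agents")
for entry in memory:
    print(entry)
```

The `max_steps` cap is a design choice of this sketch rather than part of the description above: without some limit, an agent that never reaches its goal would keep iterating (and spending API credits) forever.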

AI agents examples: What do they look like in action?

Here are three examples of actual AI agents:

1. Devin, the first AI software engineer

Devin comes armed with its own shell for running commands, a code editor, and a browser. You share what you want to create or what you’d like to update in your current project; Devin then drafts an action plan to understand what it needs to do, makes sure it has all the resources to do it, and starts writing the code before your eyes.

Is this the end for software engineers? No: working unassisted, Devin correctly resolved only about 13.86% of the real-world GitHub issues in the SWE-bench benchmark. Still, with an assistant like this, seasoned coders can save hours, and perhaps non-technical people can build something with code from scratch.

2. A virtual town with 25 AI agents

Stanford University and Google used OpenAI’s API to create virtual inhabitants and observe how they’d lead their lives. 

To support this experiment, the team created a platform for storing memories and the base prompt that gives purpose to each agent. From that point on, the AI agents shared information with one another, remembered details about their relationships, and could even plan a Valentine’s Day party.

3. Waymo, self-driving cars 

These cars are driving and learning on the streets of Phoenix, San Francisco, and Los Angeles. They can navigate from point A to B autonomously. With a range of sensors and learning systems, they detect the road, other cars, and people to reach their destination as safely as possible. (The jury is—literally—still out on how safe they actually are.)

The best AI agents you can try right now

To show you this isn’t a dream I had last night, I put together a short list of apps you can try out. These are all at a very early stage of development, so expect bugs and long-ish wait times along the way. Still, I’m sure you’ll be able to feel the potential here.

OpenAI and Google

The big players are fighting for the top spot in the AI market, and both are developing AI agent platforms.

  • The OpenAI Assistants API, built for developers, lets you create AI agents that can run OpenAI models, access multiple tools (such as web search or the code interpreter), access files, and talk to other assistants. Try out a basic lower-code version in the OpenAI Playground, or with no-code on Zapier.

  • The release of GPT-4o—and the showcase of its speech capabilities—suggests that ChatGPT may get more features in the future that make it a full AI agent. At this moment, it’s still hard to tell.

  • On the other side of the field, Project Astra, unveiled at Google’s I/O, shows a lot of promise. Aimed at consumers, it’ll be a real-time AI agent that will help you navigate the world and complete tasks for you. It can detect objects, explain code, tell you which part of town you’re currently in, and even help you find lost items. The video is mindblowing—take a look.

General-purpose AI agent apps

  • AI Agent is a flexible app that lets you create your own agents, by picking a name, an objective, and the AI model it should use (GPT-3.5 Turbo and GPT-4 are available right now). After it initializes the goal and creates the first task list, you can edit and add your own tasks. The visual workflow builder is coming soon. Give it time to complete each step: it can take more than 20 minutes to complete advanced tasks.

  • AgentGPT has the familiar layout of ChatGPT, letting you create and manage multiple AI agents. It’s very intuitive and fast, but the results are inconsistent. There’s also a library for developers, so you can implement your own spin on it.

  • HyperWrite Assistant is an AI agent that lives inside your Chrome browser. It’s still in early development and on waitlist status, so we’ll have to sit tight for the time being.

AI agent apps for online research

  • aomni is an AI agent that crawls the web looking for information on any topic you choose. It takes your goal, creates a task list, completes it one by one, and hands you the result over email. It takes at least 15 minutes to complete, and you can use it two times per day for free.

  • Toliman AI is another viable option for online research, following a similar process. You can select how many references it should find before putting together the end result. When signing up, you get a few credits to test the features. Any credits you buy on top of those will support the development of the app.

AI agent to-do list assistants

These are both in the waitlist stage, so they’re not easy to hop into right now. But if you sign up for the list, you’ll be notified as soon as they’re open.

  • Spellpage. Enter your to-do list in Spellpage, and its AI agent will help you figure out all the tasks you need to do to complete everything. It will also help with research and give you motivation to help you stay on track.

  • Do Anything Machine will help you tackle your tasks and propagate the results to Notion or Google Calendar, among other places.

AI agents for developers

You can do a lot more with AI agents if you know how to code. Here are a few starting points:

Build your own AI agents with Zapier Central

Remember when I said earlier that you may be using AI agents soon? That’s what Zapier Central is all about: a platform where you can build your own bots to automate tasks. It’s as if you took ChatGPT, connected it to over 6,000 apps, and had ways to trigger behaviors automatically. While it’s not a full no-code AI agent tool yet, it’s moving closer to that territory.

When configuring a new bot, there are three ways you can customize it:

  • Behaviors. You can use these to control how the bot talks to you and the actions that it can take. Start by providing instructions in a prompt. Then, connect actions the bot can run: for example, create a new Gmail draft or add a new entry to an Airtable base. When you add a trigger to start this behavior, the bot can run this instruction on its own.

  • Instant actions. These are actions the bot can run as you’re chatting with it. Set up the connected apps and data, and just write in the chat how you want to run them.

  • Data sources. This is all the data connected to the bot, its own kind of knowledge base system. Add as many Google Sheets spreadsheets, Airtable bases, or Notion pages as you want (among thousands of other sources), and the bot will be able to read and work with that data.

No need to hop into the Zapier editor—you can automate actions with your bot using English.

AI agents FAQ

The possibilities here are astonishing, but what happens if and when AI agents start spreading into every part of our lives? Can we trust this technology to assume more critical tasks in the future?

There are no clear answers yet, as all of this is still very new. Here’s a roundup of the core questions floating online on this issue.

Is ChatGPT an AI agent?

No, ChatGPT is not an AI agent—but it will try hard to convince you that it is if you chat with it. It has limited autonomy when generating content or completing tasks. You have to send a prompt to get an answer. It can’t prompt itself or work toward achieving a goal through multiple attempts.

However, it has components you could find in an AI agent:

  • Sensors: human chat input, web search tool, context window (conversational memory)

  • Actuators: multimodal generation tools (text, images, audio, the G in GPT), file creation tool

  • Control system: transformer architecture (the T in GPT)

  • Knowledge base system: pre-training (the P in GPT) and fine-tuning data

Here’s a Reddit discussion that provides more context on this topic.

Are GPTs AI agents?

GPTs are almost AI agents, but aren’t there yet. Why? You can tune them to complete specific tasks with a combination of prompt engineering, connecting tools via API, and providing a knowledge base. This is better when compared with the base ChatGPT. Still, they can’t prompt themselves or work autonomously to reach a goal.

Are AI agents sentient?

While a Google employee believed that one of the company’s large language models was sentient, the current consensus is that no, AI is not sentient.

Will AI agents take our jobs?

This technology will absolutely displace jobs and bring change to the market, although there’s no clear vision of when and how that may happen. Human workers may be replaced by AI agents in multiple industries. At the same time, more positions for AI development and maintenance may be created, along with human-in-the-loop positions, to make sure that human decisions drive AI actions and not the other way around.

Do AI agents perpetuate bias and discrimination?

An AI model is only as impartial as the data it’s been trained on—so yes, they’re biased. Tackling these problems involves making changes to machine learning processes and creating datasets that represent the full spectrum of the human world and experience.

Who’s to blame when an AI agent makes a mistake?

A thorny issue in ethics and law, it’s still unclear who should be blamed for accidents and unintended consequences. The developers? The owners of the hardware/software? The human operator? As new legislation is created and industry guardrails are implemented, we’ll be able to understand what kinds of roles AI agents can—and can’t—play.

AI agents for everyone

It’s hard to imagine all the possibilities and implications of AI agents down the line. Only one thing is certain: life and work will be transformed in the process.

If this field grows as fast as general AI has been growing, viable commercial AI agent platforms may hit the market very soon. They may not have as many features as their fictional counterparts, but if they start cackling maniacally, be sure to hit the off switch.

Moving on, it’s unclear how (and if?) artificial general intelligence (AGI) will come into being. It could be a single powerful AI model or a sprawling network of AI agents working together. One thing is clear, though: this technology is improving very quickly and delivering astonishing results. If a machine is ever capable of meeting or surpassing human intelligence, we may have to find a new definition of what it means to be human. “I think, therefore I am”?

Cue the doomsday letter.
