About: Staff Developer Advocate at Block; previously at GitHub.
I'm interested in Generative AI, AI Dev Tools, Data Privacy, and Responsible AI
Beyond LLMs: My Introductory Experience with AI Agents
Publish Date: Oct 30 '24
If you ask ten developers to define an 'AI agent,' you'll get fifteen different answers. And if you ask a senior developer, they will say, "It depends."
However, the definition problem goes deeper than developer pedantry. Our industry shifted rapidly from rule-based systems to machine learning to large language models before we could agree on standard definitions. For example, in my last blog post, I wrote about our struggle to define 'Open Source AI.' We face a similar challenge with autonomous agents.
Many of our AI definitions came from theory. Since the 1950s, academia has been forward-looking, defining terms for AI systems that people weren't yet using. Today, these systems are real and practical, but reality doesn't always match our theoretical frameworks. I noticed the term "agent" is applied to everything from workflows to Large Language Models (LLMs), so I asked my community for clarity. Kurt Kemple (@theworstdev), a Senior Director of Developer Relations at Slack, shared a perspective that resonates with me:
"This is a pretty big issue honestly! For me, autonomous agents are those that can take actions (with or without human interaction) as part of their response. Like a genAI app can generate text based on a question, an autonomous agent could also run code, kick off workflows, make API calls, etc. as part of the response to an event or interaction."
I also appreciated my coworker Max Novich's definition, which he describes in the video below.
Max says,
"So to me, autonomous, an agent is some form of software, not specifically AI, that can execute actions on your behalf, from a simple request to a more complex action. So, like, you don't have to hold its hand through, like, every step of the way. Like, you know, you don't have to go, go open that, click that, go there. Like, okay, I need to log into GitHub. Just go do it, right? And like, it can extrapolate what actions need to be taken, and it can actually take those actions."
To maintain clarity throughout this post, I'll define an autonomous agent as a tool that can execute operations without human intervention.
Introducing Goose
I'm experimenting with an AI developer agent called Goose. Many AI programming tools increase development speed, but Goose is unique because it's semi-autonomous. This means it independently executes tasks from start to finish but knows when to ask for human assistance.
You can tell Goose once to develop a web application, perform a migration, or create a test suite, and it will handle everything else—from planning to execution—without needing more input from you.
By default, Goose:
Creates a plan
Shows you the plan
Executes the plan
In this context, a plan is your prompt broken down into a series of concrete steps. I think it’s particularly cool that Goose can retry steps or update its plan when a step in the plan fails.
Here's a video example of Goose creating and executing a plan:
In a more advanced example, Max prompts Goose to browse the internet for him and do a little online shopping:
The Logic that Makes Goose Semi-Autonomous
Since Goose is open source, we can examine the repository to better understand its planning and execution capabilities. Please note that what I describe below is subject to change as Goose is still in its early stages.
Goose adopts a “Bring Your Own LLM” approach, meaning you select and connect it to any of the LLM providers below:
Anthropic
Azure
Bedrock
Databricks
Google
Ollama
OpenAI
This flexibility allows developers to experiment with different models. To connect Goose to an LLM, you’ll need your personal API key for that provider, and you’ll need to configure a profile.yaml file.
If you wanted to use OpenAI’s GPT-4o mini, you might configure your profile.yaml to look like this:
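A minimal sketch of such a profile, based on the early goose-ai profile schema (key names like `processor` and `accelerator` come from that schema and may have changed in newer releases):

```yaml
default:
  provider: openai          # which LLM provider to use
  processor: gpt-4o-mini    # model that plans and writes code
  accelerator: gpt-4o-mini  # cheaper model for lightweight tasks
  moderator: passive        # how aggressively to trim conversation context
  toolkits:
    - name: developer       # built-in toolkit for shell and file operations
      requires: {}
```

Your OpenAI API key itself is read from the environment (e.g. `OPENAI_API_KEY`), not stored in this file.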
Goose connects to the LLM through a class called Exchange, which handles communication with the AI model. When you prompt Goose, Goose uses the ask_an_ai method to consult the LLM and create a plan. Goose then follows the plan by executing a set of shell commands.
Here's the flow: User writes prompt → Goose communicates goal with LLM → LLM and Goose work together to create a plan. → Goose executes the plan via shell commands.
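The execution half of that flow can be sketched roughly in Python. This is an illustrative simplification, not Goose's actual implementation; the function name and retry bookkeeping here are my own:

```python
import subprocess

def execute_plan(steps: list[str]) -> list[tuple[str, str]]:
    """Run each shell-command step in order and collect any failures.

    Illustrative sketch only -- not Goose's real code. In Goose, a failed
    step could be reported back to the LLM to request an updated plan.
    """
    failures = []
    for step in steps:
        result = subprocess.run(step, shell=True, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append((step, result.stderr.strip()))
    return failures

# A toy "plan": the second step fails on purpose.
failures = execute_plan(["echo hello", "false"])
```

The key idea is that each step's exit code feeds back into the loop, which is what lets an agent notice a failure and replan instead of blindly continuing.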
When a step fails, Goose may decide to:
Tell the LLM about failures and ask for an updated plan
Let you know if commands run too long
Alert you when it can’t run certain commands because it needs elevated permissions
This process is what makes Goose semi-autonomous: it works independently but knows when to ask for help.
Building Trust and Maintaining Control
As a developer, I had a few trust issues with Goose at first. I’m more accustomed to AI tools that let me check the work at each step of the way, so this was a workflow shift for me. I believe in reviewing an AI's work before committing the code, which I continue to do with Goose. I already double-check my own code and my coworkers' code, so AI is no exception.
Creating User-Generated Plans
One feature that gave me more control is the ability to write a custom plan for Goose to follow instead of relying on Goose to create one. This works great when I know what needs to happen but don't want to do it all manually.
You can create a plan in a Markdown or YAML file. User-generated plans start with a kickoff message that provides Goose with some initial context.
Here’s an example kickoff message to guide a database migration:
```yaml
kickoff_message: We're initiating a database migration to transfer data from the legacy database system to the new architecture. This migration requires us to ensure data integrity, minimize downtime, and verify successful migration at various stages.
```
Then, you can list concrete steps for the plan, delineated by dashes (-):
```yaml
tasks:
  - Backup the current database: Ensure all data is backed up to a secure location before starting the migration process.
  - Set up the new database environment: Deploy the necessary infrastructure and configurations for the new database system.
  - Export data from the legacy database: Use database export tools to create data dump files.
  - Transfer data files to the new system: Securely copy the data dump files to the environment of the new database.
  - Import data into the new database: Utilize database import tools to load the data into the new database structure.
  - Validate data integrity: Run checks to compare and verify that data in the new database matches the legacy database.
  - Update database connections in application: Modify application settings to point to the new database.
  - Monitor performance: Observe the new database's performance and configuration for any anomalies post-migration.
  - Document the migration process: Record detailed steps taken during the migration for future reference.
```
After you save the plan in a file, you can run it using the following command:
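The command itself isn't reproduced above; based on the early Goose CLI, it looked something like this (the exact subcommand and flag names are an assumption and may have changed in newer releases):

```shell
goose session start --plan migration-plan.yaml
```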
LLMs changed the way we work, but they only scratch the surface of what’s possible. Technologists are pushing the boundaries and automating our workflows via agents.
Some people may not like AI, but I appreciate that AI has helped move my programming career forward. As a neurodivergent developer, AI developer tools like Goose help me stay focused and productive despite my ADHD. There's so much more that we can do to use AI to build AI tools that make coding more accessible!
I hope the open source nature of tools like Goose empowers developers to make coding more accessible. For example, my coworker Max Novich developed “Goose Talk to Me,” which lets developers use voice commands to work with code. This lowers the barrier for developers with visual impairments or limited mobility.
As we push the boundaries of what agents can do, we need to prioritize ethics and accessibility. And I invite you to get involved in the movement.
Get involved
Goose is still in its infancy, so there are many opportunities for you to help us improve it! (I also think it's cool to get involved with a project at the ground level.)
A voice interaction plugin for your goose. This project leverages a local copy of Whisper for voice interaction and transcription.
Project Description
Goose-Talk-To-Me is a project dedicated to enabling voice interactions using state-of-the-art AI technologies. It uses tools and libraries like goose-ai, openai-whisper, sounddevice, and others to provide seamless voice processing capabilities.
Features
Voice Interaction using goose-ai
Voice to text transcription
Real-time voice processing
Text to speech using pyttsx4
Requirements
Python >= 3.12
goose-ai
openai-whisper
sounddevice
soundfile
numpy
scipy
torch
numba
more-itertools
ffmpeg
pyttsx4
Installation
Install the dependencies and prepare your environment:
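The README's exact install steps aren't shown here; a typical setup, assuming a standard Python project and the dependencies listed under Requirements (package names are taken from that list), might look like:

```shell
# Requires Python >= 3.12; ffmpeg must also be available on your PATH
python -m venv .venv
source .venv/bin/activate
pip install goose-ai openai-whisper sounddevice soundfile pyttsx4
```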
block-open-source.github.io/goose-plugins is our community content showcase for Goose. We've created this hub of content-creation tasks so the community can share their experiences and help others learn about Goose.
🤝 Pick ONE of the following issues to contribute to this project
❗You must only assign yourself one task at a time to give everyone a chance to participate.❗
You may assign yourself your next task after your current task is reviewed & accepted.
🚫 You must not steal an issue assigned to another person. If you submit a PR for an issue not assigned to you, you will not receive points. 🚫
This is really interesting! I’ve been hearing about autonomous agents but am still unclear—how do you ensure that the agent doesn’t make unintended or harmful decisions when executing commands on its own? Is there a safeguard in place?
Your follow-up article is as revealing as the first and provides so much more clarity from my current POV. Opening the door a bit wider allowing me to learn more about the "Why's", "Where's", "What's", etc. as well as some clearly defined "How's" for developing personalised, "field expert" AI Agents.
Thank you! This article is a goldmine in disguise, IMO.
Combining these building blocks with Graph tech and vector-based DBs makes so much more sense to me now.
Goose offers fascinating capabilities as an AI agent, and I love the emphasis on accessibility and open-source collaboration. It’s exciting to see how Goose’s semi-autonomous model empowers developers with ADHD or other challenges to stay productive and focused.
In exploring AI agents, I’ve tested tools like ChatDev, Devin AI, and SWE-Agent, which take distinct approaches that complement Goose’s capabilities:
ChatDev simulates an entire software company, using multiple agents for roles like CEO and tester. While this multi-agent model excels at project oversight, it offers less direct developer control compared to Goose’s hands-on approach.
Devin AI focuses on real-time collaboration within an integrated development environment. It strikes a balance between automation and developer involvement, letting users actively guide the process as needed.
SWE-Agent specializes in automated bug fixing through GitHub issues, demonstrating the power of AI agents in solving highly specific problems with measurable outcomes.
For a deeper dive into these tools and their unique strengths, I recommend this comparison by Guilherme Assemany: scalablepath.com/machine-learning/....
I’m excited to see how Goose and other AI agents continue evolving to make development more accessible and collaborative. Looking forward to updates on Goose’s journey!
This article is a goldmine in disguise, IMO. I really enjoyed learning about how you’re combining building blocks with Graph tech and vector-based DBs. It’s fascinating to see how these technologies can play such a critical role in shaping the future of AI agent architecture.
While reading, I came across this article: mobisoftinfotech.com/resources/blo...and it provided some great insights into AI agent frameworks and how to streamline the process of building AI agents. I wanted to ask—do you think the ideas presented there align with the approach you've shared in your post, or is there something else I should focus on to stay on the right track in building AI agents?
Thanks again for sharing such an insightful piece! Looking forward to hearing your thoughts and any additional suggestions you may have.