Build Your Own AI Agent that Can Browse the Web and Take Actions 🤖

In this article, we'll build an AI agent that browses the web and performs actions for you, using the Supercog Agentic Framework. The agent combines browser automation with AI vision models, so it can navigate websites, extract information, and complete tasks much as a person would.

Installation 🛠️

Let's install the framework from source to get started:

# Clone the repository
git clone https://github.com/supercog-ai/agentic.git

# Change to the agentic directory
cd agentic

# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install with browser-use dependencies (requires the uv package manager)
uv pip install -e ".[browser-use,dev]"

# Install Playwright browsers
playwright install

Set up Required API Keys 🔑

To use the OSS Operator Agent, you'll need to set up an API key for your chosen model:

Google Gemini API Key (Recommended):
- Sign up at Google AI Studio
- Create an API key
- We recommend the Gemini 2.0 Flash model, which is well suited to browser automation tasks and is less likely to hit context limits

OpenAI API Key (Alternative):
- Sign up at OpenAI
- Create an API key in your account settings
- Works with GPT-4 or other OpenAI models

You can set the key as an environment variable:

# For Gemini (recommended)
export GEMINI_API_KEY=your_gemini_key

# OR for OpenAI
export OPENAI_API_KEY=your_openai_key

Running the OSS Operator Agent 🚀

Navigate to the examples directory:

cd examples

Run the OSS operator agent:

python oss_operator.py

The agent will start and you can interact with it through the command line. Try asking it to:
- Get the first 5 posts from the Hacker News homepage
- Search for specific information on a website
- Fill out forms or perform actions on websites

Example interaction:

I am your open source Operator. What task would you like me to perform?
> get me the first 5 posts on hackernews homepage

The agent will:

1. Open a browser window
2. Navigate to Hacker News
3. Extract the first 5 posts
4. Return the results to you

Customizing the Agent 🎯

You can customize the agent by modifying the oss_operator.py file:

Choose Your Model: The agent supports different AI models:

agent = Agent(
    name="OSS Operator",
    model="gemini/gemini-2.0-flash",  # Recommended for browser automation
    # Other options:
    # model="openai/gpt-4o"
    # model="openai/gpt-4"
    tools=[BrowserUseTool()]
)
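
If you switch between providers, you may prefer not to edit the model string by hand. The following is a minimal sketch, not part of the framework: `pick_model` is a hypothetical helper that returns one of the model strings shown above, depending on which API key is present in the environment. It checks Gemini first, since that is the recommended option.

```python
import os

# Model strings are taken from the example above. The helper itself is
# hypothetical and is not part of the Agentic framework.
MODEL_BY_KEY = [
    ("GEMINI_API_KEY", "gemini/gemini-2.0-flash"),  # recommended, checked first
    ("OPENAI_API_KEY", "openai/gpt-4o"),
]


def pick_model(env=None):
    """Return the model string for the first API key found in `env`."""
    env = os.environ if env is None else env
    for key_name, model in MODEL_BY_KEY:
        if env.get(key_name):  # unset and empty values are treated the same
            return model
    raise RuntimeError(
        "Set GEMINI_API_KEY or OPENAI_API_KEY before starting the agent."
    )
```

With this in place, you could pass `model=pick_model()` when constructing the `Agent` instead of a hard-coded string.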

Use Your Browser: You can configure the agent to use your actual Chrome browser instance (with your cookies and state):

BrowserUseTool(
    chrome_instance_path='/Applications/Google Chrome.app/Contents/MacOS/Google Chrome',  # macOS
    # For Windows: 'C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe'
    # For Linux: '/usr/bin/google-chrome'
)
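
If the same script has to run on more than one operating system, the path can be looked up at runtime rather than edited by hand. This is a sketch under the assumption that Chrome is installed in the default locations listed above; `default_chrome_path` is a hypothetical helper, not something the framework provides.

```python
import platform

# Paths copied from the snippet above; adjust them if Chrome lives elsewhere.
CHROME_PATHS = {
    "Darwin": "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
    "Windows": r"C:\Program Files\Google\Chrome\Application\chrome.exe",
    "Linux": "/usr/bin/google-chrome",
}


def default_chrome_path(system=None):
    """Return the usual Chrome location for `system` (defaults to this machine)."""
    system = system or platform.system()
    try:
        return CHROME_PATHS[system]
    except KeyError:
        raise ValueError(f"No default Chrome path known for {system!r}") from None
```

You could then write `BrowserUseTool(chrome_instance_path=default_chrome_path())` and keep a single version of the script for every machine.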

Conclusion 🎉

The OSS Operator Agent demonstrates the power of combining browser automation with AI vision models. With the Supercog Agentic Framework, you can build many kinds of AI agents, including web automation agents like this one that understand and interact with websites in a human-like way.

Some exciting use cases:

- Automated web research
- Data collection and monitoring
- Form filling and submission
- Website testing
- Content extraction
- Automated workflows

To learn more about Supercog Agentic Framework:

- Visit the GitHub repository: https://github.com/supercog-ai/agentic
- Check out the documentation: https://supercog-ai.github.io/agentic/latest/
- Join the Discord community: https://discord.gg/EmPGShjmGu

Don't forget to star the repository if you find it useful! ⭐

Troubleshooting 🔧

If you encounter any issues:

- Browser Installation: Make sure the Playwright browsers are installed:

playwright install

- API Key Issues: Verify that your API keys are set correctly as environment variables.
- Model Selection: If you run into context limits, try Gemini 2.0 Flash instead of GPT-4.
- Browser Path: If you use a custom Chrome instance, make sure the path is correct for your operating system.
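
Several of these checks can be automated before you start the agent. The sketch below is only an illustration: `preflight` is a hypothetical function, not part of the framework. It covers the Playwright package, the two API key variables used in this article, and an optional Chrome path.

```python
import importlib.util
import os


def preflight(env=None, chrome_path=None):
    """Return a list of problems found; an empty list means every check passed."""
    env = os.environ if env is None else env
    problems = []

    # This only confirms that the Python package is importable. It does not
    # verify that `playwright install` has downloaded the browser binaries.
    if importlib.util.find_spec("playwright") is None:
        problems.append("Playwright package not found; reinstall the dependencies.")

    # Either key is enough, matching the setup section of this article.
    if not (env.get("GEMINI_API_KEY") or env.get("OPENAI_API_KEY")):
        problems.append("No API key set; export GEMINI_API_KEY or OPENAI_API_KEY.")

    # The Chrome check is skipped unless a custom path is supplied.
    if chrome_path is not None and not os.path.isfile(chrome_path):
        problems.append(f"Chrome not found at {chrome_path}")

    return problems
```

Calling `preflight()` with no arguments reads the real environment. Pass `chrome_path` only if you configured a custom Chrome instance.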

Feel free to ask questions in the comments below or open an issue on GitHub if you need help!

Emmanuel Onwuegbusi @emmakodes_