In this article, we'll explore how to build an AI agent that can automatically browse the web and perform actions using the Supercog Agentic Framework. This powerful agent combines browser automation with AI vision models to create an intelligent web assistant that can navigate websites, extract information, and perform tasks just like a human would.
Outline 📋
- Installation
- Set up Required API Keys
- Running the OSS Operator Agent
- Customizing the Agent
- Conclusion
- Troubleshooting
Installation 🛠️
Let's install the framework from source to get started:
# Clone the repository
git clone https://github.com/supercog-ai/agentic.git
# Change to the agentic directory
cd agentic
# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install with browser-use dependencies (this uses uv; with plain pip, run the same command without the "uv" prefix)
uv pip install -e ".[browser-use,dev]"
# Install Playwright browsers
playwright install
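Before moving on, it can help to confirm that the install landed in the active virtual environment. The snippet below is a minimal sketch, not part of the framework: `missing_modules` is a hypothetical helper, and the import names `agentic`, `browser_use`, and `playwright` are assumptions based on the repository and extras names above.

```python
import importlib.util

# Assumed import names; adjust them if your install exposes different modules.
EXPECTED_MODULES = ("agentic", "browser_use", "playwright")


def missing_modules(names=EXPECTED_MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All expected modules are importable.")
```

If anything is reported as not importable, the most likely cause is that the virtual environment was not activated before installing.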
Set up Required API Keys 🔑
To use the OSS Operator Agent, you'll need to set up an API key for your chosen model:
- Google Gemini API Key (Recommended):
  - Sign up at Google AI Studio
  - Create an API key
  - We recommend the Gemini 2.0 Flash model, which works well for browser automation tasks and is less likely to run into context limits
- OpenAI API Key (Alternative):
  - Sign up at OpenAI
  - Create an API key in your account settings
  - Works with GPT-4o, GPT-4, or other OpenAI models
You can set the key as an environment variable:
# For Gemini (recommended)
export GEMINI_API_KEY=your_gemini_key
# OR for OpenAI
export OPENAI_API_KEY=your_openai_key
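If you switch between providers, one option is to choose the model from whichever key is present. This is only a sketch under stated assumptions: `pick_model` is a hypothetical helper, not a framework function, and the model strings are the ones used in the customization section of this article.

```python
import os

# Model identifiers as they appear in this article's customization section.
GEMINI_MODEL = "gemini/gemini-2.0-flash"
OPENAI_MODEL = "openai/gpt-4o"


def pick_model(env=None):
    """Return a model string for whichever API key is set, preferring Gemini."""
    env = os.environ if env is None else env
    if env.get("GEMINI_API_KEY"):
        return GEMINI_MODEL
    if env.get("OPENAI_API_KEY"):
        return OPENAI_MODEL
    raise RuntimeError("Set GEMINI_API_KEY or OPENAI_API_KEY before running the agent.")
```

The returned string could then be passed as the `model` argument when constructing the agent. Failing early with a clear message is usually easier to debug than an authentication error partway through a browsing task.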
Running the OSS Operator Agent 🚀
- Navigate to the examples directory:
cd examples
- Run the OSS operator agent:
python oss_operator.py
- The agent will start and you can interact with it through the command line. Try asking it to:
- Get the first 5 posts from Hacker News homepage
- Search for specific information on a website
- Fill out forms or perform actions on websites
Example interaction:
I am your open source Operator. What task would you like me to perform?
> get me the first 5 posts on hackernews homepage
The agent will:
- Open a browser window
- Navigate to Hacker News
- Extract the first 5 posts
- Return the results to you
Customizing the Agent 🎯
You can customize the agent by modifying the oss_operator.py file:
- Choose Your Model: The agent supports different AI models:
agent = Agent(
    name="OSS Operator",
    model="gemini/gemini-2.0-flash",  # Recommended for browser automation
    # Other options:
    # model="openai/gpt-4o"
    # model="openai/gpt-4"
    tools=[BrowserUseTool()],
)
- Use Your Browser: You can configure the agent to use your actual Chrome browser instance (with your cookies and state):
BrowserUseTool(
    chrome_instance_path='/Applications/Google Chrome.app/Contents/MacOS/Google Chrome',  # macOS
    # For Windows: 'C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe'
    # For Linux: '/usr/bin/google-chrome'
)
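To avoid hardcoding one operating system's path, the lookup can be keyed on the platform. The following is a sketch rather than framework code: `default_chrome_path` is a hypothetical helper, and the paths are simply the ones listed above, so a non-standard Chrome install will need its own entry.

```python
import os
import platform

# Paths copied from the snippet above; your install location may differ.
CHROME_PATHS = {
    "Darwin": "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
    "Windows": "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe",
    "Linux": "/usr/bin/google-chrome",
}


def default_chrome_path(system=None):
    """Return the conventional Chrome path for `system`, or None if it is unknown."""
    system = platform.system() if system is None else system
    return CHROME_PATHS.get(system)


if __name__ == "__main__":
    path = default_chrome_path()
    if path is None or not os.path.exists(path):
        print("Chrome not found at the default location:", path)
    else:
        print("Chrome found at:", path)
```

The value returned by `default_chrome_path()` could be passed as `chrome_instance_path`. Checking that the file exists first turns a confusing browser launch failure into a clear message.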
Conclusion 🎉
The OSS Operator Agent demonstrates the power of combining browser automation with AI vision models. With the Supercog Agentic Framework, you can build many other kinds of sophisticated AI agents in the same way as this web automation agent, which understands and interacts with websites much as a human would.
Some exciting use cases:
- Automated web research
- Data collection and monitoring
- Form filling and submission
- Website testing
- Content extraction
- Automated workflows
To learn more about Supercog Agentic Framework:
- Visit the GitHub repository: https://github.com/supercog-ai/agentic
- Check out the documentation: https://supercog-ai.github.io/agentic/latest/
- Join the Discord community: https://discord.gg/EmPGShjmGu
Don't forget to star the repository if you find it useful! ⭐
Troubleshooting 🔧
If you encounter any issues:
- Browser Installation: Make sure Playwright is properly installed:
playwright install
- API Key Issues: Verify your API keys are correctly set in the environment variables.
- Model Selection: If you run into context limits, try using Gemini 2.0 Flash instead of GPT-4.
- Browser Path: If using a custom Chrome instance, ensure the path is correct for your operating system.
Feel free to ask questions in the comments below or open an issue on GitHub if you need help!