I Tested Gemini CLI and Other Top Coding Agents - Here's What I Found
Emmanuel Mumba

Publish Date: Jun 27

If you’ve been paying attention to the rise of AI tools for developers, you’ve probably seen Gemini CLI show up in conversations across dev Twitter, Hacker News, and GitHub. It’s Google’s latest command-line coding assistant designed to bring smart, context-aware AI directly into your terminal.

But in a space filled with open-source agents and AI dev tools, how does Gemini CLI actually compare?

To find out, I tested Gemini CLI alongside other popular coding agents in 2025, including Claude Code CLI, Cody CLI, GPT Engineer, and several more. I ran each one through a set of practical tasks: debugging broken scripts, generating tests, refactoring messy code, and spinning up simple API scaffolds.

Some tools were shockingly capable. Others didn’t even make it past install.

Why CLI Coding Agents Matter

CLI agents fill a sweet spot between full-blown IDE copilots and browser-based code generators. They’re lighter, faster, and integrate directly into workflows that many developers already use, especially those working with Docker, Git, or server-side stacks.

They let you automate code generation, debugging, scaffolding, and even task planning, all from your terminal. If you love working in tmux or Vim, or just don’t want to leave your terminal every 10 seconds, these tools might be the future.

How I Tested Them

To make things fair, I tested each tool under similar conditions:

  • OS: Windows 11
  • CPU: Intel Core i7 (13th Gen)
  • GPU: NVIDIA RTX 3060 (Laptop)
  • RAM: 32GB DDR5
  • Environment: VS Code + Git + Windows Terminal
  • Extras: WSL2 (Ubuntu), Docker Desktop, Python 3.11, Node.js LTS

Language Focus:

  • Python
  • JavaScript

Use Cases:

  1. Scaffold a simple CRUD API
  2. Generate unit tests for a function
  3. Debug a broken script
  4. Refactor messy code
  5. Ask natural language questions about a local project
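
To make use case 2 concrete, here is a hypothetical example of the kind of input and output involved: a small function and the unit tests one would hope an agent writes for it. The function name slugify and the test cases are illustrative assumptions, not code from the repositories tested.

```python
import re
import unittest


def slugify(title: str) -> str:
    """Turn a title into a lowercase, hyphen-separated URL slug."""
    # Replace every run of characters other than a-z and 0-9 with one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    # Trim hyphens left over at either end.
    return slug.strip("-")


class SlugifyTests(unittest.TestCase):
    """The kind of tests an agent might be asked to generate."""

    def test_basic_title(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_punctuation_collapses(self):
        self.assertEqual(
            slugify("CLI agents: 2025 edition!"), "cli-agents-2025-edition"
        )

    def test_empty_input(self):
        self.assertEqual(slugify(""), "")
```

Saved as a file, this runs with python -m unittest. A useful check when reviewing agent output is whether the generated tests cover edge cases such as empty input, not only the happy path.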

Evaluation Criteria:

  • Setup time and stability
  • Quality of code output
  • Ease of use (commands, prompts, interface)
  • Context awareness (does it understand my project?)
  • Usefulness in actual dev workflows
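
One way to keep a comparison like this consistent is to record a score per criterion for each tool and average them. The sketch below assumes a 1 to 5 scale and equal weights. Neither is stated in this article, and the Scorecard class and criterion keys are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from statistics import mean

# Short keys for the five evaluation criteria listed above.
CRITERIA = (
    "setup",
    "code_quality",
    "ease_of_use",
    "context_awareness",
    "workflow_fit",
)


@dataclass
class Scorecard:
    """Scores for one tool, one integer per criterion (assumed 1-5 scale)."""

    tool: str
    scores: dict[str, int] = field(default_factory=dict)

    def rate(self, criterion: str, score: int) -> None:
        if criterion not in CRITERIA:
            raise ValueError(f"unknown criterion: {criterion}")
        if not 1 <= score <= 5:
            raise ValueError("score must be between 1 and 5")
        self.scores[criterion] = score

    def overall(self) -> float:
        """Unweighted average; refuses to run until every criterion is rated."""
        missing = [c for c in CRITERIA if c not in self.scores]
        if missing:
            raise ValueError(f"missing scores for: {', '.join(missing)}")
        return float(mean(self.scores[c] for c in CRITERIA))
```

Requiring every criterion before computing an average keeps a tool that failed to install from looking better than it is simply because fewer scores were recorded.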

I tested these agents in real-world repositories, not demo projects, and documented how each performed under these constraints.

Quick Overview of the Agents

Gemini CLI

What it does:

An open-source, terminal-based AI assistant powered by Google’s Gemini models. You can use it for code generation, debugging, shell commands, writing documentation, problem-solving, and general AI-assisted workflows, all without leaving the terminal.

Setup:

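  • Install with npm (npm install -g @google/gemini-cli) or run it directly with npx; Node.js is required. It runs on macOS, Linux, and Windows (including WSL).
  • Sign in with a personal Google account, which includes a free Gemini Code Assist license, or supply a Gemini API key.
  • No Docker or heavy dependencies, so setup is quick and clean.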

Performance:

  • Fast, versatile responses to coding prompts and shell questions.
  • Ideal for generating code snippets, writing tests, fixing bugs, or running research commands.
  • Supports local customization and prompt chaining with CLI flags.

Pros:

  • Brings Google-grade LLM directly into your terminal
  • Clean, extensible interface (Apache 2.0 license)
  • Integrates with Gemini Code Assist for shared context and productivity

Cons:

  • Requires online access and a Google account or Gemini API key
  • Some features are still being extended as part of the recent launch.

Verdict:

An excellent modern alternative to AutoCode. Gemini CLI feels polished, powerful, and clearly designed for terminal-loving developers. If you're looking for a cutting-edge, actively maintained CLI agent, this is one of the best picks available today.

Claude Code CLI

What it does:

A terminal-based AI coding assistant powered by Anthropic’s Claude models. Designed for local or project-specific tasks, it can write, explain, debug, and refactor code with an emphasis on context depth and safe output.

Setup:

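Install via npm (npm install -g @anthropic-ai/claude-code); Node.js is required. You authenticate with an Anthropic account, either through API billing in the Anthropic Console or through a Claude Pro or Max subscription. Lightweight, with no Docker or extra dependencies.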

Performance:

Claude Code CLI shines when working with larger code contexts. It can handle full files and understand complex logic chains across multiple files better than most agents. Especially impressive when used in monorepos or messy legacy projects.


Pros:

  • Exceptional at maintaining context across long files
  • Outputs are safe, readable, and explainable
  • Fast responses with little hallucination
  • Great for pair-programming or reviewing legacy code

Cons:

  • Requires paid Anthropic API access or a Claude subscription (not free-tier friendly)
  • No agentic memory or step-by-step task planner
  • Still under rapid development — features vary by version

Verdict:

Claude Code CLI feels like the smartest pair-programmer in your terminal. If you’re dealing with tricky refactoring, legacy code, or need confident multi-file reasoning, this one stands out — especially over models that fail at long-range understanding.

Smol Developer

What it does:

Smol Developer is a minimalist CLI agent built for speed. You give it a prompt, and it replies with code, explanations, or file suggestions — no agents, memory, or complex UI.

Setup:

Ridiculously easy. Clone the repo, install dependencies, and run. No Docker, no API key hell.

Performance:

Handled basic prompts like “build a FastAPI CRUD app” or “add login to this Flask app” without fuss. It doesn’t keep project-wide memory, but it’s fast, useful, and rarely breaks.

Pros:

  • Fast and responsive
  • Easy to install
  • Doesn’t try to do too much

Cons:

  • No persistent memory
  • Can’t reason across files

Verdict:

A great assistant for generating quick snippets or files. Low learning curve, high utility.

OpenHands (formerly OpenDevin)

What it does:

Tries to be your full autonomous developer — planning tasks, executing steps, and writing code while “thinking out loud.”

Setup:

Painful on the first go. It needs Docker, specific Python versions, and plenty of system resources. I often ran into container issues.

Performance:

Impressive ambition. It generated full file trees, discussed its logic, and attempted multi-step workflows. But it failed more than it succeeded.

Pros:

  • Visionary design
  • Multi-step reasoning
  • Terminal GUI is slick

Cons:

  • Setup is fragile
  • Performance is inconsistent
  • Needs lots of system resources

Verdict:

Super promising, but not stable enough for everyday work — yet.

Continue CLI

What it does:

A CLI companion to the Continue IDE plugin. Acts like a GPT-powered REPL for your codebase.

Setup:

Straightforward. Just install and run.

Once installed, you’ll typically use commands like:

Performance:

Helpful for asking questions, writing short functions, and explaining bugs. It lacks deep file context but makes up for it with speed.

Pros:

  • Clean interface
  • Useful for quick Q&A
  • Lightweight

Cons:

  • Limited project awareness
  • Doesn’t edit files automatically

Verdict:

A solid "Copilot for your terminal." Great for small questions and guided coding.

Devika CLI

What it does:

Takes in prompts, plans sub-tasks, and builds projects in stages. Explains what it's doing at every step.

Setup:

Devika is a local AI coding agent that acts like an autonomous developer. Installation involves these steps:

  • Clone the repo
  • Set up a Python virtual environment with uv
  • Install Playwright dependencies (for browsing)
  • Run the backend
  • Run the frontend
  • Access it in the browser at http://127.0.0.1:3001

Takes a bit of time to configure but not too bad.

Performance:

Did really well with simple web apps. For larger or poorly scoped prompts, it hallucinated or got stuck. But watching it “think” was interesting.

Pros:

  • Sub-task planning
  • Clear explanations
  • Can generate full apps

Cons:

  • Prone to errors
  • Doesn’t validate well
  • Output can be noisy

Verdict:

Good for rapid prototyping and experimentation. Not great for polishing or maintaining code.

Cody CLI

What it does:

Sourcegraph’s CLI agent that understands your actual codebase. You can ask it things like "Where is this class used?" or “Refactor this function.”

Setup:

Tied closely to Sourcegraph’s indexing tools. If you already use Sourcegraph, it’s a no-brainer.

Performance:

The most “intelligent” in terms of context. It answers based on actual usage and file relations. But it is limited if you don’t integrate fully with Sourcegraph.
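
To show what a question like "Where is this class used?" involves, here is a single-file sketch using Python's standard ast module. It is not how Cody works, since Sourcegraph indexes whole repositories; the function name find_class_usages is a hypothetical one chosen for illustration.

```python
import ast


def find_class_usages(source: str, class_name: str) -> list[int]:
    """Return the sorted line numbers where class_name is referenced.

    The class definition itself is not counted, only references to it.
    """
    tree = ast.parse(source)
    lines = set()
    for node in ast.walk(tree):
        # A bare reference such as Parser() or isinstance(x, Parser).
        if isinstance(node, ast.Name) and node.id == class_name:
            lines.add(node.lineno)
        # A qualified reference such as module.Parser.
        elif isinstance(node, ast.Attribute) and node.attr == class_name:
            lines.add(node.lineno)
    return sorted(lines)
```

A name-based lookup like this cannot tell two classes with the same name apart or follow imports across files, which is the kind of gap a repository-wide index is meant to close.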

Pros:

  • Deep code awareness
  • Accurate answers
  • Excellent search

Cons:

  • Requires Sourcegraph
  • Less helpful outside its ecosystem

Verdict:

Incredible for teams using Sourcegraph. Niche, but powerful.

GPT Engineer

What it does:

Give it a spec, and it builds a project from scratch. Includes thought-process logs and file-by-file explanations.

Setup:

Needs a proper Python setup and API key.

Performance:

Excellent at structured prompts. You say, “Build a todo app in Flask with login,” and it goes to work, creating files and comments.

Pros:

  • Easy to iterate
  • Explains reasoning
  • Customizable configs

Cons:

  • Slow on big prompts
  • Needs polishing
  • Doesn’t validate or test

Verdict:

Great for MVPs or idea exploration. Review all output manually.

ChatDev

What it does:

An AI “company” where roles like CEO, CTO, and Dev interact to build software. Yes, really.

Setup:

Hefty, but well-documented.

Performance:

More of a toy. Watching AI roles debate over architecture is entertaining, but results are inconsistent and often verbose.

Pros:

  • Unique idea
  • Fun to watch
  • Multi-agent logic

Cons:

  • Slow
  • Prone to weird outputs
  • Not usable for serious projects

Verdict:

Best as a novelty. Not for production work.

Comparison Table

My Top Picks

What Surprised Me

  • Most agents hallucinate less when prompts are very specific.
  • Tools that don’t try to do everything often performed best.
  • The CLI UX matters more than I thought — clean logs and structured steps make a huge difference.

The Current State of CLI AI Agents

Are they ready to replace your full dev environment? No.

But are they useful right now? Absolutely.

They already handle scaffolding, explanation, and small automation tasks well. For large refactors or full-stack builds, they’re getting there but still need supervision.

What I'd Like to See Next

  • Offline/local LLM support
  • Smarter file editing (not just generation)
  • Better handling of multi-file projects
  • Git-aware workflows (e.g., generate commit messages, suggest PRs)
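
As an illustration of what a Git-aware workflow might involve, the sketch below gathers the staged diff with git diff --cached and wraps it in a prompt that could be sent to any model. The function names, the character cap, and the prompt wording are all assumptions, and the model call itself is left out because it depends on the tool.

```python
import subprocess

MAX_DIFF_CHARS = 8000  # assumed cap to keep the prompt small


def staged_diff(repo_dir: str = ".") -> str:
    """Return the staged changes of the Git repository at repo_dir."""
    result = subprocess.run(
        ["git", "diff", "--cached"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if git exits with an error
    )
    return result.stdout


def commit_message_prompt(diff: str, max_chars: int = MAX_DIFF_CHARS) -> str:
    """Build a prompt asking a model to write a commit message for diff."""
    if not diff.strip():
        raise ValueError("nothing is staged")
    truncated = diff[:max_chars]
    # Tell the model when it is only seeing part of the change.
    note = "\n[diff truncated]" if len(diff) > max_chars else ""
    return (
        "Write a concise commit message (subject line under 72 characters, "
        "then a short body) for the following staged changes:\n\n"
        f"{truncated}{note}"
    )
```

Calling commit_message_prompt(staged_diff()) inside a repository yields the text to hand to a model. Keeping prompt construction separate from the subprocess call makes it testable without a Git repository.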

Conclusion

CLI coding agents are no longer just a concept — they’re real, functional, and in some cases, pretty amazing. While most of them aren’t “set and forget” just yet, they can absolutely help reduce your mental load and speed up development.

Give them a try, especially if you spend a lot of time in the terminal. Just keep your expectations grounded — and your git diff clean.

Got a favorite CLI coding agent I missed? Let me know — I’m always down to test another one.

Comments 8 total

  • Lynn Mikami, Jun 27, 2025

    This is cool!

    Gemini CLI, Claude Code... I have been waiting for a review of these CLI coding tools, and, Here, It, Is.

    Good work, Emmanuel!

    • Emmanuel Mumba, Jun 27, 2025

      Thanks a lot! I really appreciate that. These tools are evolving fast, and I figured it was time someone put them head-to-head in real-world tasks. Glad it was helpful

  • Ayama, Jun 27, 2025

    Amazing list!

  • Gary Svenson, Jun 27, 2025

    Very professionally done.

  • Kristen, Jun 27, 2025

    Gemini CLI has a generous free tier, I am gonna use that!

  • Ananya Balehithlu, Jun 27, 2025

    Really nice list! The only thing missing I think is the pricing (Yes I am a student so I am concerned about that)

    Cursor has free student tier for Pro but it is not available in India.

    Claude Code is super expensive.

    Gemini CLI has free tier that is good enough for me to test it out!
