Hackerbot Hackathon
This past weekend, I participated in an AI and robotics hackathon hosted by an awesome company called Hackerbot, where I got to hack on one of their robots.
I had lofty ambitions, but it turns out robotics is hard. Even the task of just pointing the robotic arm at an object (never mind picking it up) came down to the wire and was achieved only an hour before submissions were due.
Here is a demo of the app, which uses Gemini 2.5 Pro to locate an object in an image and then point the robotic arm at it. The corresponding code is here: hackerbot_chainlit.py
Why I now dislike Model Context Protocol
My biggest learning from the weekend is that I don't like Model Context Protocol (MCP) and will probably avoid using it in the future. For those not familiar with MCP, here is how the official MCP website describes it:
MCP is an open protocol that standardizes how applications provide context to LLMs. Think of MCP like a USB-C port for AI applications. Just as USB-C provides a standardized way to connect your devices to various peripherals and accessories, MCP provides a standardized way to connect AI models to different data sources and tools.
I was very excited about the premise and spent many precious hours on Saturday trying to get MCP to work for the project. You can see my MCP server code here: hackerbot_mcp.
MCP Server: How do you know it works?
I was following the MCP quickstart guide. The Python version of the guide doesn't make clear whether uv is mandatory for the server to work or merely optional, so just in case, I set up a complete uv project.
So you follow the quickstart guide and write some code, but how do you test it? The guide says you can run something like uv run weather.py, but that doesn't actually exercise your code's functionality: it only tells you that your MCP server can start at all, not whether it works. To test your logic, you have to use an MCP client application.
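For context, the quickstart builds the server on the MCP Python SDK's FastMCP helper. Below is a minimal sketch of that pattern, with a made-up tool standing in for my actual hackathon code; running it with uv proves only that the process starts, and nothing calls the tool until a client connects.

```python
# Minimal MCP server sketch following the quickstart's FastMCP pattern.
# The tool name and body are illustrative placeholders, not my hackathon code.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("hackerbot")

@mcp.tool()
def locate_object(description: str) -> str:
    """Report where the described object is in the latest camera frame."""
    # Placeholder: a real implementation would call the vision model here.
    return f"Pretending to locate: {description}"

if __name__ == "__main__":
    # stdio transport is how desktop MCP clients typically launch the server.
    mcp.run(transport="stdio")
```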
MCP Client: Write or get off the shelf?
We were hacking in a Linux environment, which immediately ruled out Claude Desktop, the "official" MCP client, since it isn't available on Linux.
The documentation "helpfully" points out that Linux users can build their own client, but the quickstart guide for that is not quick at all: it has nine steps, some with dozens of lines of code. Worse, if you write your own MCP client, you have to implement it separately for every LLM provider you're targeting, which defeats the point of MCP; at that point you might as well implement your core logic for each provider directly. Going down that path would have meant spending the entire hackathon writing an MCP client, and I didn't have time for that.
Instead I spent hours frantically going down this list of MCP clients, trying increasingly sketchy Chinese ones out of desperation to finish the hackathon on time. But even a working, feature-rich MCP client isn't enough (for the record, the best one was AIaW, because it ships ARM binaries that work on a Raspberry Pi); you still have to make your MCP code actually work.
Debugging MCP
Here was my debugging loop while trying to get my MCP code to work:
- Ask the LLM if it has access to the tool I exposed with MCP.
- Ask the LLM to use the tool.
- Figure out what broke.
- Change the Python file.
- Go into the client's MCP settings and turn the MCP server off and then on again.
- Repeat.
Because I was using an off-the-shelf client, I was at its mercy. I didn't see any logs. I didn't know if the server was sending error messages. I didn't know if the client was having issues talking to the server. I didn't know how exactly the client was exposing the server's resources to each LLM. Finally, I didn't have control over how much context was shared with the LLM. The client (reasonably) tries to share the entire chat history with the LLM, but if that history includes multiple images, the LLM starts throwing rate limit errors.
Moving on from MCP
After a few hours of the above, I had to take a walk and rethink my choices. With MCP, I felt like I was hacking with my hands tied behind my back. Even though I love the idea of an open standard for creating tools and resources in an LLM-agnostic manner, the reality was a lot harder and uglier than I expected.
Ultimately I decided to abandon MCP and commit to just the Gemini model family for the rest of the hackathon. I still wanted a chat-like interface, so I settled on an awesome framework called Chainlit. Chainlit gives you complete control over the function callbacks and the LLM API calls, which turned out to be very helpful: I no longer had to send the entire chat history to the LLM, but could still display it to the user. The LLM doesn't need any prior context to locate a rubber ducky in the current image.
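To give a sense of the shape of that setup, here's a minimal sketch assuming Chainlit's on_message hook and the google-genai SDK; the details are simplified stand-ins rather than the real hackerbot_chainlit.py.

```python
# Minimal sketch of the Chainlit + Gemini loop, assuming the google-genai SDK.
# Simplified for illustration; see hackerbot_chainlit.py for the real code.
import chainlit as cl
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment

@cl.on_message
async def on_message(message: cl.Message):
    # Chainlit keeps the whole conversation on screen for the user, but we only
    # pass the latest prompt (plus the current camera frame, in the real app)
    # to Gemini: no accumulated history, so no image-heavy rate limit errors.
    response = client.models.generate_content(
        model="gemini-2.5-pro",
        contents=[message.content],
    )
    await cl.Message(content=response.text).send()
```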
Winning the hackathon
Afterwards I was able to focus on the actual image processing and robotics work. Another important lesson: when you ask Gemini to locate an object within an image, ask it to return normalized (0-1) coordinates rather than pixels. This was another place where I got stuck for a couple of hours, as LLMs kept returning nonsensical pixel coordinates. Asking for normalized 0-1 coordinates worked perfectly, and they're easy to convert back to pixels in code.
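Here's a rough sketch of what that looks like in practice; the prompt wording and the JSON response shape are my assumptions for illustration, not something the Gemini API guarantees.

```python
# Sketch of the normalized-coordinate approach. The prompt wording and the
# JSON response shape are illustrative assumptions.
import json

PROMPT = (
    "Find the rubber ducky in this image. Respond with JSON only, e.g. "
    '{"x": 0.42, "y": 0.87}, where x and y are coordinates normalized to '
    "the 0-1 range, measured from the top-left corner."
)

def to_pixels(normalized: dict, width: int, height: int) -> tuple[int, int]:
    # Convert normalized (0-1) coordinates back into pixel coordinates.
    return round(normalized["x"] * width), round(normalized["y"] * height)

# e.g. the model replies '{"x": 0.42, "y": 0.87}' for a 640x480 camera frame
print(to_pixels(json.loads('{"x": 0.42, "y": 0.87}'), 640, 480))  # (269, 418)
```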
With all of these lessons learned, I was finally able to put together a working application and win the hackathon!
A huge thank you to Ian and Allen at Hackerbot for hosting the hackathon and letting us hack on their amazing robots! I learned a ton and am looking forward to the next one!