This is a submission for the Bright Data AI Web Access Hackathon
🔧 What I Built
BrightCoding AI Assistant is a full-stack autonomous coding partner that keeps your LLM reasoning in sync with the latest framework documentation—no more stale outputs or manual scraping. It features:
- Two-phase LLM pipeline (intent detection + code generation) powered by Perplexity & OpenAI
- Live documentation ingestion via BrightDataRecursiveRequester, embedding every page into pickled vector DBs
- Real-time project scaffolding, code snippet generation, and automatic error diagnosis
- Automatic error rectification for imports, API calls, tests and other framework-specific issues
- React/Vite chat UI with framework selector, “request framework” modal, session history and stoppable generations
- Lightweight MCP server for any editor (Cursor, Windsurf, VS Code, etc.)—on-demand framework loading, .pkl downloads, background tasks
- Born from real frustration with outdated LLM knowledge and clunky scraping, this tool keeps your AI partner reasoning over the newest APIs and best practices.
- I had tried Cursor's docs feature, but it was still inaccurate and produced many errors when generating code. That is what led me to this idea.
- Due to limited API credits, access is temporarily restricted to judges. I have emailed the password to noah@brightdata.com.
- You can request any framework you want to work with (e.g. React, FastAPI, Bright Data).
- You can download the vector-embedded .pkl files for the requested frameworks.
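To make the idea of a pickled vector DB concrete, here is a minimal sketch of saving and loading one. The record layout (`url`, `text`, `vector`) and the function names are assumptions for illustration; this post does not describe the project's actual .pkl schema.

```python
import pickle


def save_store(records, path):
    """Write a list of {'url', 'text', 'vector'} dicts to a pickle file.

    The record shape is hypothetical, chosen only for this sketch.
    """
    with open(path, "wb") as f:
        pickle.dump(records, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_store(path):
    """Read the records back from disk."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

One design note: unpickling can execute arbitrary code, so a downloaded .pkl file should only be loaded when it comes from a source you trust.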
Demo
Live link: Brightcoder
Git repository link: GitHub
YouTube video link: YouTube
How I Leveraged Bright Data’s Tools to Discover, Access, Extract, and Interact with Live Documentation
Here’s how we lean on Bright Data at every step:
🔎 Discover:
We use Bright Data’s recursive crawler to automatically map out every page in a framework’s docs—no manual sitemaps or headless-browser hacks.
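As a rough picture of what recursive discovery involves, the sketch below does a breadth-first crawl that stays under a docs root. It is not BrightDataRecursiveRequester or any Bright Data API. The `fetch` callable, the `crawl` function, and the `max_pages` limit are hypothetical names for this example, and `fetch` stands in for whatever client actually retrieves the page.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl limited to URLs that start with start_url.

    `fetch` is a hypothetical callable (url -> HTML string, or None on
    failure); swap in the real HTTP client.
    """
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments so each page is visited once.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute.startswith(start_url) and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages
```

Passing `fetch` in as a parameter keeps the traversal logic independent of how pages are downloaded, so it can be exercised with an in-memory dictionary.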
🌐 Access:
Every request passes through Bright Data’s proxy API, transparently handling rate limits, geo-blocks and bot defenses so we always receive raw HTML.
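For readers unfamiliar with routing requests through a proxy, here is a generic sketch using only Python's standard library. The host, port, and credentials are placeholders, and `build_proxy_opener` is a hypothetical helper. Bright Data's real endpoint and authentication format should be taken from its own documentation.

```python
import urllib.request
from urllib.parse import quote


def build_proxy_opener(username, password, host, port):
    """Return an opener that sends HTTP and HTTPS traffic via one proxy.

    All four arguments are placeholders for this sketch.
    """
    proxy_url = "http://{}:{}@{}:{}".format(
        quote(username, safe=""), quote(password, safe=""), host, port
    )
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch_html(opener, url, timeout=30):
    """Fetch a page through the opener and decode it as text."""
    with opener.open(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```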
📥 Extract:
With Bright Data reliably fetching full HTML, we strip out navigation, scripts and footers and pull clean titles and body text for embedding.
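The cleaning step can be approximated with the standard-library HTML parser. This is a sketch and not the project's actual extractor: the class name, the `extract` function, and the exact set of skipped tags (I added `style` to the navigation, script, and footer tags mentioned above) are assumptions.

```python
from html.parser import HTMLParser

# Tags whose contents are dropped before embedding (assumed set).
SKIP_TAGS = {"nav", "script", "style", "footer"}


class DocTextExtractor(HTMLParser):
    """Collects the <title> text and visible body text, skipping page chrome."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self._skip_depth = 0
        self._in_title = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_title:
            self.title += text
        elif self._skip_depth == 0:
            self._chunks.append(text)

    @property
    def body(self):
        return " ".join(self._chunks)


def extract(html):
    """Return (title, body_text) for one HTML page."""
    parser = DocTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.title, parser.body
```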
🤖 Interact:
Those Bright Data–sourced contents feed directly into our OpenAI embedding pipeline, powering real-time similarity searches that our LLM uses to generate up-to-date code.
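The similarity search itself can be sketched as a cosine-similarity ranking over stored vectors. In the real pipeline the vectors would come from an embedding model. Here they are small hand-written lists, and the record layout and function names are assumptions for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vector, records, k=3):
    """Return the k (score, record) pairs most similar to the query.

    Each record is a dict with a 'vector' key (hypothetical layout).
    """
    scored = [(cosine(query_vector, r["vector"]), r) for r in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    docs = [
        {"url": "routing", "vector": [1.0, 0.0]},
        {"url": "testing", "vector": [0.0, 1.0]},
        {"url": "middleware", "vector": [0.7, 0.7]},
    ]
    for score, record in top_k([1.0, 0.1], docs, k=2):
        print(round(score, 3), record["url"])
```

The text of the top-ranked records would then be placed in the LLM prompt as context for code generation.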
Performance Improvements
By integrating live documentation into our AI pipeline—rather than relying on a static, pre-training snapshot—we unlock a host of benefits over traditional approaches:
🔍 Far more accurate, up-to-date code:
Avoids deprecated APIs and obsolete patterns, because every similarity search and generation step draws on the most recently crawled docs.
🤖 Dramatically fewer hallucinations:
When you ask “How do I call this new v2 endpoint?”, the agent pulls the real v2 spec instead of guessing from stale examples.
🚀 Instant adaptation to breaking changes:
As soon as a library ships a major update, our scraper-embed cycle ingests those pages; users immediately get code for the new APIs.
⚡ Faster dev feedback loops:
No more context-switching between chat and Google—your next code snippet or bug fix is pre-validated against live docs, slashing edit/test cycles.
🌐 Private & multi-repo support:
The same Bright Data–powered pipeline can ingest internal or partner docs behind VPNs or firewalls, keeping proprietary APIs in-scope.
📂 Millisecond-scale retrieval:
Even with thousands of pages embedded, vector-based similarity search returns the most relevant passages in real time.
🤝 Team collaboration:
Shared .pkl stores ensure everyone on your team is coding against the same, up-to-date knowledge base—no onboarding friction.
Together, these enhancements mean your AI coding partner is not just “smarter” but truly live—always reflecting the newest best practices, edge-case details, and private APIs you depend on.