This is a submission for the Open Source AI Challenge with pgai and Ollama
What I Built
Code Archeologist is an AI-powered application that analyzes Git repository histories to identify patterns in code evolution. It creates a "genetic tree" of your code's ancestry, provides refactoring suggestions, and generates a codebase heatmap, commit activity timeline, contributor statistics, dependency graph, and file change frequency view; it also integrates with issue tracking. Using the PostgreSQL extensions pgvector and pgvectorscale, along with Ollama for text embeddings, the application runs similarity searches and produces AI-driven insights without the need for an external vector database.
Demo
Code Archeologist
Code Archeologist analyzes your Git repository history to identify patterns in code evolution. It creates a "genetic tree" of your code's ancestry, provides refactoring suggestions, generates a codebase heatmap, commit activity timeline, contributor statistics, dependency graph, file change frequency, and integrates with issue tracking.
Features
- Genetic Tree: Visualize the ancestry of your codebase.
- Refactoring Suggestions: Receive actionable recommendations to improve your code.
- Codebase Heatmap: Identify hotspots and areas with high activity.
- Commit Activity Timeline: Track commit patterns over time.
- Contributor Statistics: Analyze contributions from different team members.
- Dependency Graph: Visualize project dependencies.
- File Change Frequency: Monitor how often files are modified.
- Issue Integration: Link code changes with issue tracking systems.
- Semantic Search in Commits: Find similar commits based on semantic meaning using vector embeddings.
- Question Answering: Ask questions about your codebase and receive AI-generated answers.
- Summarization: Get concise summaries of commit messages.
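To make the semantic search feature more concrete, here is a minimal sketch of the ranking idea in plain JavaScript: every commit message has an embedding vector, and a query is matched against commits by cosine similarity. In the application this comparison runs inside PostgreSQL through pgvector, so the function names and the tiny three-dimensional vectors below are illustrative assumptions rather than the project's actual code.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the `limit` commits whose embeddings are closest to the query embedding.
function findSimilarCommits(queryEmbedding, commits, limit = 5) {
  return commits
    .map((commit) => ({
      sha: commit.sha,
      message: commit.message,
      similarity: cosineSimilarity(queryEmbedding, commit.embedding),
    }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, limit);
}

// Toy data (hypothetical): real embeddings have hundreds of dimensions.
const commits = [
  { sha: "a1", message: "Fix login redirect bug", embedding: [0.9, 0.1, 0.0] },
  { sha: "b2", message: "Add dependency graph view", embedding: [0.0, 0.2, 0.9] },
  { sha: "c3", message: "Patch session timeout on login", embedding: [0.8, 0.3, 0.1] },
];

const results = findSimilarCommits([1, 0, 0], commits, 2);
console.log(results.map((r) => r.sha)); // [ 'a1', 'c3' ]
```

Scanning every row like this is fine for a handful of commits; an index in the database is what keeps the same lookup fast as the history grows.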
Features with AI Integration











How the Project Works
Code Archeologist utilizes a combination of PostgreSQL extensions and open-source AI models to deliver a seamless analysis experience:
The frontend built with Vue.js displays various visualizations such as genetic trees, heatmaps, timelines, and dependency graphs, providing users with insightful views of their codebase.
- Performance and Scalability: By using PostgreSQL with pgvector and pgvectorscale, the application stores embeddings and runs similarity queries in the same database as the rest of its data, and can scale to large datasets without relying on an external vector database.
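As a rough sketch of what this setup could look like, the snippet below builds the SQL for an embeddings table, a pgvectorscale index, and a nearest-neighbour query. The column names, the index name, and the default dimension of 768 are assumptions for illustration; the `diskann` index syntax follows the pgvectorscale README and may differ between versions.

```javascript
// Build the SQL statements for a commit-embedding table with a DiskANN index.
// Column names and the default dimension are assumptions for illustration.
function buildSchemaStatements(dimensions = 768) {
  if (!Number.isInteger(dimensions) || dimensions <= 0) {
    throw new Error("dimensions must be a positive integer");
  }
  return [
    // Installing vectorscale with CASCADE also installs pgvector (the "vector" extension).
    "CREATE EXTENSION IF NOT EXISTS vectorscale CASCADE",
    `CREATE TABLE IF NOT EXISTS commit_embeddings (
       id BIGSERIAL PRIMARY KEY,
       commit_sha TEXT NOT NULL,
       message TEXT NOT NULL,
       embedding VECTOR(${dimensions})
     )`,
    // DiskANN-based index from pgvectorscale, using cosine distance.
    `CREATE INDEX IF NOT EXISTS commit_embeddings_embedding_idx
       ON commit_embeddings USING diskann (embedding vector_cosine_ops)`,
  ];
}

// Parameterized nearest-neighbour query: `<=>` is pgvector's cosine distance operator.
const SIMILAR_COMMITS_SQL = `
  SELECT commit_sha, message, embedding <=> $1::vector AS distance
  FROM commit_embeddings
  ORDER BY embedding <=> $1::vector
  LIMIT $2`;

// pgvector accepts vectors as text literals such as "[0.1,0.2,0.3]".
function toVectorLiteral(embedding) {
  return `[${embedding.join(",")}]`;
}
```

Assuming a node-postgres `Pool` named `pool` (the post does not say which driver is used), the statements could be run one by one with `await pool.query(statement)`, and a search would look like `await pool.query(SIMILAR_COMMITS_SQL, [toVectorLiteral(queryEmbedding), 5])`.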
How the Backend Works
The backend of Code Archeologist is built using Node.js and Express.js, interfacing with a PostgreSQL database enhanced with AI-specific extensions. Here’s an overview of its functionality:
- Session Management
  - Uses express-session to handle user sessions, so that interactions persist across requests.
- Database Initialization
  - Connects to the PostgreSQL database.
  - Initializes the database schema, creating tables like code_analysis and commit_embeddings.
  - Ensures the necessary extensions (pgvector, pgvectorscale) are installed for vector operations and AI functionality.
- Data Processing
  - Fetching Data: Retrieves commits, contributors, issues, and dependencies from GitHub repositories.
  - Embedding Generation: Uses Ollama to generate vector embeddings for commit messages, storing them with pgvector for efficient similarity searches.
  - Indexing: Uses pgvectorscale with diskann indexing to optimize search performance.
- AI Integration
  - OpenAI & Ollama: Integrates OpenAI for generating completions and Ollama for creating text embeddings, supporting features like refactoring suggestions and question answering.
- Error Handling
  - Implements error handling across all endpoints, returning meaningful responses and logging errors for debugging.
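The embedding generation step described above can be sketched roughly as follows. This is not the project's code: it assumes a local Ollama server on its default port, calls Ollama's `/api/embeddings` endpoint, and uses `nomic-embed-text` as a placeholder model name because the post does not say which model is used. The columns in the INSERT statement are likewise hypothetical.

```javascript
// Request an embedding for one piece of text from a local Ollama server.
// `fetchImpl` is injectable so the function can be exercised without a running server.
async function embedText(text, options = {}) {
  const {
    baseUrl = "http://localhost:11434", // Ollama's default local address
    model = "nomic-embed-text", // assumed model; the post does not name one
    fetchImpl = globalThis.fetch,
  } = options;

  const response = await fetchImpl(`${baseUrl}/api/embeddings`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, prompt: text }),
  });
  if (!response.ok) {
    throw new Error(`Ollama request failed with status ${response.status}`);
  }
  const payload = await response.json();
  if (!Array.isArray(payload.embedding)) {
    throw new Error("Ollama response did not contain an embedding array");
  }
  return payload.embedding;
}

// Turn a commit into the parameter list for an INSERT into commit_embeddings.
async function buildEmbeddingRow(commit, options) {
  const embedding = await embedText(commit.message, options);
  // pgvector accepts a text literal like "[0.1,0.2]", which is what
  // JSON.stringify produces for an array of numbers.
  return [commit.sha, commit.message, JSON.stringify(embedding)];
}

// Hypothetical column names; adjust to the real commit_embeddings schema.
const INSERT_EMBEDDING_SQL =
  "INSERT INTO commit_embeddings (commit_sha, message, embedding) VALUES ($1, $2, $3::vector)";
```

Keeping the HTTP call behind a small function like `embedText` also makes it easy to swap the embedding model later, as long as the vector column's dimension matches the model's output size.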
Technologies and Tools Used
- Frontend:
  - Vue.js: JavaScript framework for building user interfaces.
  - Axios: Promise-based HTTP client for making API requests.
  - Cytoscape.js: Library for graph (network) data visualization.
  - Chart.js: Simple yet flexible JavaScript charting library.
  - Highlight.js: Syntax highlighting for code snippets.
  - QTip2: Advanced tooltips for enhanced user interactions.
  - D3.js: Library for creating dynamic, data-driven visualizations.
  - DOMPurify: Sanitizes HTML to prevent XSS attacks.
- Backend:
  - Node.js: Server-side JavaScript runtime.
  - Express.js: Web framework for building API endpoints.
  - PostgreSQL: Relational database system.
  - pgvector (extension name: vector): PostgreSQL extension for storing vector embeddings.
  - pgvectorscale (extension name: vectorscale): Extension for optimized vector similarity searches.
  - Octokit: GitHub API client for fetching repository data.
  - dotenv: Loads environment variables from a .env file.
  - express-session: Manages user sessions.
  - Winston: Logging library for capturing application logs.
  - cors: Enables Cross-Origin Resource Sharing.
- AI & Machine Learning:
  - Ollama: Generates text embeddings using open-source models.
  - OpenAI SDK: Provides AI-driven features like completions and question answering.
- APIs:
  - GitHub API: Accesses repository information.
  - OpenAI API: Generates summaries and answers questions.
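As one example of how these pieces might fit together, here is a small sketch that pulls commits through Octokit and tallies them per contributor, which is the kind of data the contributor statistics view needs. The function name is hypothetical, the client is passed in rather than constructed, and only the first page of results is read; the project's real implementation may differ.

```javascript
// Fetch recent commits through an Octokit client and count commits per author.
// `octokit` is expected to be an Octokit instance (for example from "@octokit/rest").
async function fetchContributorStats(octokit, owner, repo, perPage = 100) {
  // Only the first page is requested here; a full history needs pagination.
  const { data } = await octokit.rest.repos.listCommits({
    owner,
    repo,
    per_page: perPage,
  });

  const counts = new Map();
  for (const item of data) {
    // `author` is null when the commit email is not linked to a GitHub account,
    // so fall back to the name recorded in the Git commit itself.
    const name = item.author?.login ?? item.commit.author?.name ?? "unknown";
    counts.set(name, (counts.get(name) ?? 0) + 1);
  }

  // Most active contributors first.
  return [...counts.entries()]
    .map(([contributor, commits]) => ({ contributor, commits }))
    .sort((a, b) => b.commits - a.commits);
}
```

Passing the client in as a parameter keeps authentication (for example a token loaded through dotenv) out of the function and makes it straightforward to test with a stub.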
Final Thoughts
Building Code Archeologist was a great experience that showed me just how powerful combining PostgreSQL extensions with AI tools can be. Using pgvector, pgvectorscale, and Ollama together made it possible to create a strong, scalable app that can handle complex searches and give useful insights. This project really boosted my appreciation for using open-source tools to build smart AI applications.
Thanks for considering my submission!