Building Production-Ready RAG Systems with Gaia and Weaviate
Harish Kotra (he/him) · @harishkotra
Published: Aug 18

TL;DR

This post demonstrates how to build a production-ready Retrieval Augmented Generation (RAG) system using:

  • 🌐 Gaia: Decentralized AI infrastructure with OpenAI-compatible APIs
  • 🗄️ Weaviate: Advanced vector database replacing traditional solutions
  • 📊 Real-World Data: Live integration with Wikipedia, ArXiv, GitHub, and news sources

Key Result: A complete RAG pipeline that processes 50+ documents, performs semantic search, and generates responses using decentralized AI infrastructure.


🎯 Why This Matters

Traditional RAG systems rely on centralized providers like OpenAI, creating single points of failure and vendor lock-in. This architecture demonstrates:

  1. Decentralization: Use public Gaia nodes instead of centralized APIs
  2. Flexibility: Replace built-in vector stores with specialized solutions
  3. Real-World Data: Process live data from multiple internet sources
  4. Production Ready: Environment configuration, health monitoring, error handling

🧠 Understanding the Platforms

Gaia: Decentralized AI Infrastructure

What is Gaia?
Gaia is a decentralized infrastructure for AI agents that provides OpenAI-compatible APIs while running on distributed nodes.

Key Features:

  • OpenAI Compatibility: Drop-in replacement for OpenAI APIs
  • Decentralized: No single point of failure
  • Model Flexibility: Support for Llama, Qwen, Gemma, and other open models

Example Gaia Node:

https://0x299eae67ba6bbae8d61faad2d70115dc5a6855c8.gaia.domains/v1
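
Because Gaia exposes the OpenAI protocol, any OpenAI client can talk to a node simply by overriding the base URL. A minimal sketch, reusing the node URL, the Gemma-3.4B-IT model name, and the test-key placeholder from this post's configuration:

from openai import OpenAI

# Point the standard OpenAI client at the Gaia node instead of api.openai.com
client = OpenAI(
    base_url="https://0x299eae67ba6bbae8d61faad2d70115dc5a6855c8.gaia.domains/v1",
    api_key="test-key",  # placeholder key, as in the .env below
)

reply = client.chat.completions.create(
    model="Gemma-3.4B-IT",
    messages=[{"role": "user", "content": "What is retrieval augmented generation?"}],
)
print(reply.choices[0].message.content)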

Example Gaia Node Config:

{
  "address": "",
  "chat": "https://huggingface.co/gaianet/gemma-3-4b-it-GGUF/resolve/main/gemma-3-4b-it-Q5_K_M.gguf",
  "chat_batch_size": "128",
  "chat_ctx_size": "8192",
  "chat_name": "Gemma-3.4B-IT",
  "chat_ubatch_size": "128",
  "context_window": "1",
  "description": "Gaia node running with Gemma-3.4B-IT model without any knowledgebase.",
  "domain": "gaia.domains",
  "embedding": "https://huggingface.co/gaianet/gte-Qwen2-1.5B-instruct-GGUF/resolve/main/gte-Qwen2-1.5B-instruct-f16.gguf",
  "embedding_batch_size": "8192",
  "embedding_collection_name": "default",
  "embedding_ctx_size": "8192",
  "embedding_name": "gte-Qwen2-1.5B-instruct-f16",
  "embedding_ubatch_size": "8192",
  "llamaedge_chat_port": "9075",
  "llamaedge_embedding_port": "9076",
  "llamaedge_port": "8086",
  "prompt_template": "gemma-3",
  "qdrant_limit": "1",
  "qdrant_score_threshold": "0.5",
  "rag_policy": "system-message",
  "rag_prompt": "Use the following information to answer the question.\n----------------\n",
  "reverse_prompt": "",
  "snapshot": "",
  "system_prompt": "You're a helpful assistant"
}

Weaviate: Advanced Vector Database

What is Weaviate?
Weaviate is an open-source vector database designed for AI applications, offering advanced features beyond simple vector storage.

Why Choose Weaviate Over Qdrant?

Gaia nodes ship with Qdrant built in (note the qdrant_* settings in the node config above), but this architecture swaps it for Weaviate to get the capabilities used later in this post: pluggable vectorizer modules (local transformers, OpenAI, Cohere), nested object properties for rich metadata, built-in hybrid (vector + BM25) search, and QnA/generative modules.

(Image: Gaia's Qdrant vs Weaviate comparison)

Weaviate Vectorizer Options:

# Local embeddings (no API key needed)
VECTORIZER_MODULE=text2vec-transformers

# OpenAI embeddings
VECTORIZER_MODULE=text2vec-openai
OPENAI_API_KEY=your-key

# Cohere embeddings
VECTORIZER_MODULE=text2vec-cohere
COHERE_API_KEY=your-key

🏗️ System Architecture

(System architecture diagram)


🛠️ Implementation Deep Dive

1. Start Weaviate with Docker Compose

Create a docker-compose.yml file with this production-ready configuration:

---
services:
  weaviate:
    command:
    - --host
    - 0.0.0.0
    - --port
    - '8080'
    - --scheme
    - http
    image: cr.weaviate.io/semitechnologies/weaviate:1.30.0
    ports:
    - 8080:8080
    - 50051:50051
    restart: on-failure:0
    environment:
      TRANSFORMERS_INFERENCE_API: 'http://t2v-transformers:8080'
      QNA_INFERENCE_API: 'http://qna-transformers:8080'
      OPENAI_APIKEY: $OPENAI_APIKEY
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'text2vec-transformers'
      ENABLE_MODULES: 'text2vec-transformers,qna-transformers,generative-openai'
      CLUSTER_HOSTNAME: 'node1'
  t2v-transformers:
    image: cr.weaviate.io/semitechnologies/transformers-inference:sentence-transformers-multi-qa-MiniLM-L6-cos-v1
    environment:
      ENABLE_CUDA: '0'
  qna-transformers:
    image: cr.weaviate.io/semitechnologies/qna-transformers:distilbert-base-uncased-distilled-squad
    environment:
      ENABLE_CUDA: '0'

Start Weaviate:

docker compose up -d
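
Before loading any data, it's worth a quick readiness check against the new instance. A small sketch using the v4 Python client (assumes weaviate-client is installed):

import weaviate

# Connect to the local Docker instance and confirm it is accepting requests
client = weaviate.connect_to_local(host="localhost", port=8080)
print("Weaviate ready:", client.is_ready())
client.close()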

This configuration provides:

  • Recent Weaviate release: Version 1.30.0
  • Multiple Vectorizers: text2vec-transformers + QnA transformers
  • Production Ready: Proper restart policies and persistence
  • GPU Support: Set ENABLE_CUDA: '1' if you have an NVIDIA GPU

2. Environment Configuration

The system uses a comprehensive .env configuration for production readiness:

# Gaia Node Configuration
GAIA_BASE_URL=https://0x299eae67ba6bbae8d61faad2d70115dc5a6855c8.gaia.domains/v1
GAIA_API_KEY=test-key
GAIA_MODEL_NAME=Gemma-3.4B-IT

# Weaviate Configuration
WEAVIATE_HOST=localhost
WEAVIATE_PORT=8080
WEAVIATE_USE_AUTH=false

# Vector Configuration
VECTORIZER_MODULE=text2vec-transformers
DEFAULT_COLLECTION_NAME=RealWorldKnowledgeBase

# Generation Parameters
MAX_TOKENS=300
TEMPERATURE=0.7
SEARCH_LIMIT=3

# Performance Tuning
BATCH_SIZE=100
CONNECTION_TIMEOUT=30
DEBUG=true
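
A minimal sketch of how this file might be loaded and turned into the two clients the pipeline needs (variable names are illustrative; assumes python-dotenv, openai, and weaviate-client are installed):

import os
from dotenv import load_dotenv
from openai import OpenAI
import weaviate

load_dotenv()  # read the .env file shown above

# Gaia node behind the OpenAI-compatible client
llm_client = OpenAI(
    base_url=os.getenv("GAIA_BASE_URL"),
    api_key=os.getenv("GAIA_API_KEY", "test-key"),
)

# Local Weaviate started via docker compose
weaviate_client = weaviate.connect_to_local(
    host=os.getenv("WEAVIATE_HOST", "localhost"),
    port=int(os.getenv("WEAVIATE_PORT", "8080")),
)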

3. Data Source Integration

The system fetches real-world data from multiple sources:

Wikipedia Integration

import requests
from typing import Any, Dict, List

class WikipediaSource(DataSource):
    def fetch_data(self, topics: List[str]) -> List[Dict[str, Any]]:
        documents = []
        for topic in topics:
            # Fetch the full plain-text article via the MediaWiki API
            params = {
                'action': 'query',
                'format': 'json',
                'titles': topic,
                'prop': 'extracts',
                'explaintext': True
            }
            response = requests.get('https://en.wikipedia.org/w/api.php', params=params)
            page = next(iter(response.json()['query']['pages'].values()))
            content = page.get('extract', '')

            # Split long articles into ~1500-character chunks for embedding
            chunks = self.chunk_text(content, max_length=1500)
            for chunk in chunks:
                documents.append({'title': topic, 'content': chunk,
                                  'source': 'wikipedia', 'category': 'encyclopedia'})
        return documents

ArXiv Research Papers

class ArXivSource(DataSource):
    def fetch_data(self, search_terms: List[str]) -> List[Dict[str, Any]]:
        for term in search_terms:
            params = {
                'search_query': f'all:{term}',
                'sortBy': 'submittedDate',
                'sortOrder': 'descending'
            }
            # Parse XML response and extract metadata

GitHub Documentation

class GitHubSource(DataSource):
    def fetch_data(self, repos: List[str]) -> List[Dict[str, Any]]:
        for repo in repos:
            # Fetch README via GitHub API
            readme_url = f"https://api.github.com/repos/{repo}/readme"
            # Decode base64 content and process

4. Weaviate Schema Design

Advanced schema with nested properties for rich metadata:

collection = weaviate_client.collections.create(
    name="RealWorldKnowledgeBase",
    vectorizer_config=Configure.Vectorizer.text2vec_transformers(),
    properties=[
        Property(name="title", data_type=DataType.TEXT),
        Property(name="content", data_type=DataType.TEXT),
        Property(name="source", data_type=DataType.TEXT),
        Property(name="category", data_type=DataType.TEXT),
        Property(
            name="metadata", 
            data_type=DataType.OBJECT,
            nested_properties=[
                Property(name="url", data_type=DataType.TEXT),
                Property(name="author", data_type=DataType.TEXT),
                Property(name="published", data_type=DataType.TEXT),
                Property(name="difficulty", data_type=DataType.TEXT),
                Property(name="topic", data_type=DataType.TEXT),
                Property(name="tags", data_type=DataType.TEXT_ARRAY),
                Property(name="fetched_at", data_type=DataType.TEXT),
                Property(name="chunk_index", data_type=DataType.INT),
                Property(name="total_chunks", data_type=DataType.INT),
            ]
        ),
    ]
)
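
With the collection in place, fetched documents can be loaded in batches (BATCH_SIZE=100 in the .env above). A sketch using the v4 batch API, where docs is the list of dictionaries produced by the data sources:

# Weaviate vectorizes each object server-side with text2vec-transformers,
# so no embeddings are computed on the client.
with collection.batch.dynamic() as batch:
    for doc in docs:
        batch.add_object(properties={
            "title": doc["title"],
            "content": doc["content"],
            "source": doc["source"],
            "category": doc["category"],
            "metadata": doc.get("metadata", {}),
        })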

5. RAG Pipeline Implementation

Complete RAG flow with context integration:

def rag_query(self, query: str, collection_name: str = None) -> Dict[str, Any]:
    # Step 1: Vector search in Weaviate
    relevant_docs = self.search_knowledge(query, collection_name)

    # Step 2: Prepare context for LLM
    context_parts = []
    for doc in relevant_docs:
        context_parts.append(f"Title: {doc['title']}\nContent: {doc['content']}")
    context = "\n\n".join(context_parts)

    # Step 3: Generate response with Gaia node
    response = self.llm_client.chat.completions.create(
        model="Gemma-3.4B-IT",
        messages=[
            {"role": "system", "content": f"Use this context: {context}"},
            {"role": "user", "content": query}
        ],
        max_tokens=self.config.MAX_TOKENS,
        temperature=self.config.TEMPERATURE
    )

    return {
        "query": query,
        "response": response.choices[0].message.content,
        "sources": relevant_docs,
        "model_used": "Gemma-3.4B-IT"
    }
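
The search_knowledge call in step 1 is a plain vector search against the collection. A sketch of what it might look like with the v4 client (the repository's exact implementation may differ):

from typing import Any, Dict, List
from weaviate.classes.query import MetadataQuery

def search_knowledge(self, query: str, collection_name: str = None) -> List[Dict[str, Any]]:
    collection = self.weaviate_client.collections.get(
        collection_name or self.config.DEFAULT_COLLECTION_NAME
    )
    # Weaviate embeds the query with the same vectorizer used at ingestion time
    result = collection.query.near_text(
        query=query,
        limit=self.config.SEARCH_LIMIT,
        return_metadata=MetadataQuery(distance=True),
    )
    return [
        {
            "title": obj.properties["title"],
            "content": obj.properties["content"],
            "source": obj.properties["source"],
            "distance": obj.metadata.distance,
        }
        for obj in result.objects
    ]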

📊 Real-World Results

Performance Metrics

Here are the actual results from our demo run:

🎉 Total documents collected: 57
  - 📖 Wikipedia: 7 articles (chunked into multiple docs)
  - 🔬 ArXiv: 6 research papers
  - 📂 GitHub: 37 documentation chunks
  - 📰 News: 6 recent articles

⏱️ Processing Time: 92.0 seconds (quick mode)
🔍 Search Performance: ~50ms per query
🤖 Generation Speed: ~2-5 seconds per response
💾 Storage: 63,479 characters across 57 documents
📊 Average document size: 1,113 characters

Sample Query Results

Here are real responses from our system:

Query: "What are the latest developments in large language models?"

Response: Based on the provided context, here's a summary of recent developments in large language models:

  • GPT Models: The largest and most capable LLMs are generative pretrained transformers (GPTs), like ChatGPT, Gemini, and Claude, which are widely used in generative chatbots.
  • Fine-tuning & Prompt Engineering: LLMs can be adapted for specific tasks through fine-tuning or guided by prompt engineering.
  • Integration with other tools: Semantic Kernel integrates with LLMs from OpenAI, Azure OpenAI, Hugging Face, and Nvidia. Weaviate also supports Cohere Multilingual ML Models.

The documents don't provide a comprehensive list of all the latest developments, but this captures the key trends highlighted in the given information.

Sources Used:

  1. Large language model (Wikipedia) - Category: encyclopedia
  2. weaviate Documentation (Part 6) - Category: documentation
  3. semantic-kernel Documentation (Part 2) - Category: documentation

Query: "How does retrieval augmented generation work?"

Response: Here's how retrieval-augmented generation (RAG) works, based on the provided context:

  1. Collect and Embed Documents: You gather text documents related to a specific topic or domain.
  2. Create Feature Vectors: For each document (or section of a document), you use a deep learning network to create a "feature vector" – often called an "embedding." This vector represents the meaning of the text in a numerical format.
  3. Store in a Vector Database: These feature vectors are stored in a special database designed for quickly searching through large amounts of data based on similarity.
  4. User Prompt: When a user asks a question or provides a prompt, the same process is applied to that prompt – it's converted into a feature vector.
  5. Retrieve Relevant Documents: The vector database searches for documents whose feature vectors are most similar to the prompt's feature vector. This identifies the most relevant pieces of information.
  6. Add to Context: These retrieved documents are then added to what's called the "context window" – essentially, they become part of the input provided to a large language model (LLM).
  7. Generate Response: The LLM uses this expanded context to generate a response to the user's original prompt.

Sources Used:

  1. Vector database (Part 2) - Category: encyclopedia
  2. FADRM: Fast and Accurate Data Residual Matching for Dataset Distillation - Category: research
  3. Controlling the false discovery rate under a non-parametric graphical dependence model - Category: research

Query: "How do I use the OpenAI API?"

Response: Here's how you can use the OpenAI API with the Python library, based on the provided documentation:

1. Installation:

pip install openai

2. Basic Usage (Text Generation):

import openai

client = openai.OpenAI()
response = client.chat.completions.create(model="gpt-3.5-turbo")
print(response.choices[0].message.content)

3. Handling Errors:

import openai

client = openai.OpenAI()
try:
    client.fine_tuning.jobs.create(model="gpt-4o", training_file="file-abc123")
except openai.APIConnectionError as e:
    print("The server could not be reached")
    print(e.__cause__)
except openai.APIStatusError as e:
    print(f"API Error: {e.status_code}")
    print(e.response)

Sources Used:

  1. openai-python Documentation (Part 1) - Category: documentation
  2. openai-python Documentation (Part 11) - Category: documentation
  3. openai-python Documentation (Part 19) - Category: documentation

Data Source Statistics

📋 Categories:
  documentation: 37 documents
  encyclopedia: 7 documents
  metadata: 1 documents
  research: 6 documents
  tech_news: 6 documents

🌐 Sources:
  arxiv: 6 documents
  collection: 1 documents
  github: 37 documents
  rss: 6 documents
  wikipedia: 7 documents

💾 Weaviate Collection:
  Collection name: RealWorldKnowledgeBase
  Documents in collection: 57
  Vectorizer: text2vec-transformers

🎯 Production Use Cases

1. AI Research Assistant

Scenario: Researchers need up-to-date information about AI developments

Data Sources: ArXiv papers, Wikipedia articles, GitHub repositories
Query Examples:

  • "What are the latest developments in retrieval augmented generation?"
  • "How do transformer architectures work?"
  • "What are the current challenges in LLM training?"

2. Technical Documentation Helper

Scenario: Developers need help with API integration and implementation

Data Sources: GitHub READMEs, API documentation, technical guides
Query Examples:

  • "How do I integrate OpenAI API with my application?"
  • "What are the best practices for vector database setup?"
  • "How to implement RAG with Weaviate?"

3. News and Trends Analyzer

Scenario: Businesses need insights into industry developments and market trends

Data Sources: TechCrunch, Hacker News, AI News feeds, industry reports
Query Examples:

  • "What are the recent AI funding rounds and acquisitions?"
  • "What companies are leading in AI innovation?"
  • "What are the current regulatory challenges for AI?"

4. Educational Content Generator

Scenario: Educators and content creators need accurate, well-sourced explanations

Data Sources: Wikipedia, academic papers, documentation, tutorials
Query Examples:

  • "Explain machine learning to beginners"
  • "What is the difference between supervised and unsupervised learning?"
  • "How do neural networks process information?"

🔧 Technical Implementation Details

Available Models on Our Node:

  • Gemma-3.4B-IT: Google's Gemma 3 instruction-tuned chat model (4B parameters)
  • gte-Qwen2-1.5B-instruct-f16: Qwen2-based embedding model (1.5B parameters) that powers the node's vectorization

Actual Configuration from Our Demo:

🔧 Current Configuration:
  Gaia URL: https://0x299eae67ba6bbae8d61faad2d70115dc5a6855c8.gaia.domains/v1
  Weaviate: localhost:8080
  Collection: MyKnowledgeBase
  Vectorizer: text2vec-transformers
  Max Tokens: 300
  Temperature: 0.7
📋 Available models: ['Gemma-3.4B-IT', 'gte-Qwen2-1.5B-instruct-f16']

Model Performance Analysis

Our demo showcases two different models available on the Gaia node:

Gemma-3.4B-IT (Google)

  • Size: 4 billion parameters (Gemma 3, 4B)
  • Type: Instruction-tuned model
  • Strengths: Excellent for conversational AI and instruction following
  • Performance: ~2-5 seconds per response (observed in demo)
  • Use Cases: General Q&A, educational content, technical explanations
  • Quality: Provides detailed, well-structured responses as seen in our examples

gte-Qwen2-1.5B-instruct-f16 (Alibaba)

  • Size: 1.5 billion parameters
  • Type: Instruction-tuned embedding model stored at 16-bit precision
  • Role: Serves the node's embedding endpoint, vectorizing documents and queries
  • Strengths: Fast inference, good multilingual support
  • Performance: ~1-2 seconds per request
  • Use Cases: Semantic search, batch embedding jobs, resource-constrained environments

Data Processing Pipeline

Text Chunking Strategy:

import re
from typing import List

def chunk_text(self, text: str, max_length: int = 1500) -> List[str]:
    # Split by sentences to maintain context
    sentences = re.split(r'(?<=[.!?])\s+', text)
    chunks = []
    current_chunk = ""

    for sentence in sentences:
        if len(current_chunk) + len(sentence) <= max_length:
            current_chunk += " " + sentence if current_chunk else sentence
        else:
            if current_chunk:
                chunks.append(current_chunk.strip())
            current_chunk = sentence

    # Don't drop the trailing chunk
    if current_chunk:
        chunks.append(current_chunk.strip())

    return chunks

Metadata Extraction:

  • Source tracking: Wikipedia, ArXiv, GitHub, RSS
  • Category classification: encyclopedia, research, documentation, news
  • Timestamp tracking: When content was fetched
  • Author information: Where available
  • Difficulty levels: beginner, intermediate, advanced

🚀 Production Deployment Considerations

Scaling the System

Horizontal Scaling Options:

  1. Multiple Gaia Nodes: Load balance across different nodes
gaia_nodes = [
    "https://node1.gaia.domains/v1",
    "https://node2.gaia.domains/v1", 
    "https://node3.gaia.domains/v1"
]
# Implement round-robin or weighted distribution (see the sketch after this list)
  2. Weaviate Clustering: Scale vector operations
# docker-compose.yml for cluster
services:
  weaviate-node-1:
    image: cr.weaviate.io/semitechnologies/weaviate:1.30.0
    environment:
      CLUSTER_HOSTNAME: 'node1'
  weaviate-node-2:
    image: cr.weaviate.io/semitechnologies/weaviate:1.30.0
    environment:
      CLUSTER_HOSTNAME: 'node2'
  3. Data Source Distribution: Parallel fetching
import asyncio

# Async data fetching (the per-source async helpers are assumed to exist)
async def fetch_all_sources():
    tasks = [
        fetch_wikipedia_async(topics),
        fetch_arxiv_async(search_terms),
        fetch_github_async(repos),
        fetch_rss_async(feeds)
    ]
    results = await asyncio.gather(*tasks)
    return flatten(results)
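
As a sketch of the round-robin idea referenced in option 1, the node URLs can simply be cycled so each request gets a client bound to the next node (the URLs above are placeholders):

from itertools import cycle
from openai import OpenAI

# Rotate through the available Gaia nodes on every request
node_pool = cycle(gaia_nodes)

def next_llm_client() -> OpenAI:
    return OpenAI(base_url=next(node_pool), api_key="test-key")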

Security and Authentication

Production Security Checklist:

  • Environment Variables: Never commit API keys
  • Weaviate Authentication: Enable for production
  • Rate Limiting: Implement client-side throttling
  • Input Validation: Sanitize user queries
  • Network Security: Use HTTPS/TLS encryption
  • Access Control: Implement user permissions

# Production Weaviate with auth
WEAVIATE_USE_AUTH=true
WEAVIATE_API_KEY=your-secure-production-key
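
On the client side, the connection then needs to pass that key along. A sketch with the v4 client (assumes the key is supplied via the environment variable above):

import os
import weaviate
from weaviate.classes.init import Auth

# Authenticated connection for production deployments
client = weaviate.connect_to_local(
    host=os.getenv("WEAVIATE_HOST", "localhost"),
    port=int(os.getenv("WEAVIATE_PORT", "8080")),
    auth_credentials=Auth.api_key(os.getenv("WEAVIATE_API_KEY")),
)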

Monitoring and Observability

Health Check Implementation:

def health_check(self) -> Dict[str, Any]:
    health = {
        "timestamp": time.time(),
        "gaia": {"status": "unknown", "models": []},
        "weaviate": {"status": "unknown", "collections": []},
        "overall": "unknown"
    }

    # Test Gaia connection
    try:
        models = self.llm_client.models.list()
        health["gaia"]["status"] = "healthy"
        health["gaia"]["models"] = [m.id for m in models.data]
    except Exception as e:
        health["gaia"]["status"] = f"error: {e}"

    # Test Weaviate connection
    try:
        if self.weaviate_client.is_ready():
            collections = self.weaviate_client.collections.list_all()
            health["weaviate"]["status"] = "healthy"
            health["weaviate"]["collections"] = list(collections.keys())
        else:
            health["weaviate"]["status"] = "not ready"
    except Exception as e:
        health["weaviate"]["status"] = f"error: {e}"

    # Roll up a single overall status
    both_healthy = health["gaia"]["status"] == "healthy" and health["weaviate"]["status"] == "healthy"
    health["overall"] = "healthy" if both_healthy else "degraded"

    return health

📈 Performance Optimization Tips

1. Vector Search Optimization

Batch Processing:

# Run multiple queries in a loop and collect the results (parallelize for true batch processing)
queries = ["query1", "query2", "query3"]
results = []
for query in queries:
    result = collection.query.near_text(query=query, limit=5)
    results.append(result)

Index Tuning:

# HNSW parameters live on the vector index config (not the vectorizer) in the v4 client,
# and must be chosen when the collection is created
from weaviate.classes.config import Configure

collection = weaviate_client.collections.create(
    name="RealWorldKnowledgeBase",
    vectorizer_config=Configure.Vectorizer.text2vec_transformers(),
    vector_index_config=Configure.VectorIndex.hnsw(
        ef_construction=256,  # Higher = better recall, slower build
        max_connections=32,   # Higher = better recall, more memory
    ),
)

2. LLM Response Optimization

Context Window Management:

def optimize_context(self, docs: List[Dict], max_tokens: int = 2000) -> str:
    # Character counts are used below as a rough proxy for tokens
    context_parts = []
    current_length = 0

    for doc in sorted(docs, key=lambda x: x['score'], reverse=True):
        doc_length = len(doc['content'])
        if current_length + doc_length <= max_tokens:
            context_parts.append(f"Title: {doc['title']}\n{doc['content']}")
            current_length += doc_length
        else:
            break

    return "\n\n".join(context_parts)

Prompt Engineering:

system_prompt = """You are an AI assistant specializing in technical documentation and research. 
Use the provided context to answer questions accurately and cite your sources when possible.
If the context doesn't contain relevant information, say so clearly.

Context:
{context}

Guidelines:
- Be concise but comprehensive
- Use bullet points for lists
- Cite sources when referencing specific information
- If uncertain, acknowledge limitations
"""
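
Plugging the template into the chat call is then a matter of formatting it with the retrieved context; a short sketch (context and query as in the RAG pipeline above):

# Fill the template and send it as the system message
messages = [
    {"role": "system", "content": system_prompt.format(context=context)},
    {"role": "user", "content": query},
]
response = llm_client.chat.completions.create(
    model="Gemma-3.4B-IT",
    messages=messages,
    max_tokens=300,
    temperature=0.7,
)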

3. Data Ingestion Optimization

Smart Caching:

from datetime import datetime, timedelta

def should_refresh_source(source_name: str, max_age_hours: int = 24) -> bool:
    cache_file = f"cache/{source_name}_last_update.txt"
    try:
        with open(cache_file, 'r') as f:
            last_update = datetime.fromisoformat(f.read().strip())
            age = datetime.now() - last_update
            return age > timedelta(hours=max_age_hours)
    except FileNotFoundError:
        return True

Incremental Updates:

def get_new_documents_only(self, source: str, since: datetime) -> List[Dict]:
    # Only fetch documents newer than the timestamp
    # Implement based on source API capabilities
    pass

🔮 Future Enhancements

1. Advanced Retrieval Strategies

Hybrid Search Implementation:

# Combine vector search with keyword search
def hybrid_search(self, query: str, alpha: float = 0.7):
    # Vector search (semantic similarity)
    vector_results = collection.query.near_text(query=query, limit=10)

    # BM25 search (keyword matching)  
    bm25_results = collection.query.bm25(query=query, limit=10)

    # Combine results with weighted scoring
    combined_results = self.combine_results(vector_results, bm25_results, alpha)
    return combined_results
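
Weaviate also exposes a built-in hybrid query that fuses vector and BM25 rankings server-side, which can replace the manual combination step entirely; a sketch:

# alpha=1.0 is pure vector search, alpha=0.0 is pure BM25
hybrid_results = collection.query.hybrid(
    query="retrieval augmented generation",
    alpha=0.7,
    limit=10,
)
for obj in hybrid_results.objects:
    print(obj.properties["title"])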

Re-ranking with Cross-Encoders:

from sentence_transformers import CrossEncoder

def rerank_results(self, query: str, documents: List[Dict]) -> List[Dict]:
    reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

    pairs = [(query, doc['content']) for doc in documents]
    scores = reranker.predict(pairs)

    # Re-order documents by cross-encoder scores
    for doc, score in zip(documents, scores):
        doc['rerank_score'] = score

    return sorted(documents, key=lambda x: x['rerank_score'], reverse=True)

2. Multi-Modal Capabilities

Image and Document Processing:

# Future: Add support for PDFs, images, videos
class MultiModalSource(DataSource):
    def process_pdf(self, pdf_path: str) -> List[Dict]:
        # Extract text, images, tables from PDFs
        pass

    def process_image(self, image_path: str) -> Dict:
        # OCR + image description
        pass

3. Advanced Analytics

Query Performance Tracking:

import time
from collections import defaultdict

class AnalyticsTracker:
    def __init__(self):
        self.query_times = defaultdict(list)
        self.popular_queries = defaultdict(int)
        self.source_usage = defaultdict(int)

    def track_query(self, query: str, response_time: float, sources: List[str]):
        self.query_times[query].append(response_time)
        self.popular_queries[query] += 1
        for source in sources:
            self.source_usage[source] += 1
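
Wiring the tracker around the RAG call is then a few lines per query; a sketch (rag_system stands in for the pipeline object built earlier):

tracker = AnalyticsTracker()

start = time.perf_counter()
result = rag_system.rag_query("How does retrieval augmented generation work?")
elapsed = time.perf_counter() - start

# Record latency, query popularity, and which sources were cited
tracker.track_query(result["query"], elapsed, [doc["source"] for doc in result["sources"]])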

🏁 Conclusion

This implementation demonstrates that building production-ready RAG systems with decentralized infrastructure is not only possible but practical. The combination of Gaia and Weaviate provides:

Key Achievements

  • Decentralized AI: Successfully replaced OpenAI with public Gaia nodes
  • Advanced Vector Operations: Weaviate's capabilities exceed basic vector storage
  • Real-World Data: Live integration with multiple internet sources
  • Production Features: Configuration management, health monitoring, error handling
  • Performance: Sub-second search, 2-5 second generation times
  • Scalability: Architecture supports horizontal scaling

Business Impact

  • Cost Reduction: No API fees for LLM inference
  • Vendor Independence: Avoid lock-in with centralized providers
  • Data Privacy: Keep sensitive data within your infrastructure
  • Customization: Full control over models and vectorization
  • Reliability: Distributed infrastructure reduces single points of failure

Technical Benefits

  • Modern Architecture: Microservices-ready with clean separation of concerns
  • Flexibility: Easy to swap models, vectorizers, or data sources
  • Observability: Built-in health checks and performance monitoring
  • Developer Experience: Environment-based configuration, comprehensive logging

Getting Started

Ready to build your own decentralized RAG system? The complete implementation is available on GitHub with:

  • 📋 Step-by-step setup instructions
  • 🧪 Interactive demo with real data
  • 📊 Performance benchmarks and optimization tips
  • 🛠️ Production deployment guidelines
  • 🔧 Troubleshooting and debugging tools

Repository: https://github.com/GaiaNet-AI/gaia-cookbook/tree/main/python/gaia-weaviate
Demo Video: https://youtu.be/zf9_WFhySho
