Enterprise Java developers know the pain: slow startup times, high memory usage, and complex AI integration. If you're coming from Spring Boot, prepare to be pleasantly shocked. In this tutorial, we'll build a blazing-fast document assistant that can ingest text documents, create embeddings, and answer questions using a Retrieval-Augmented Generation (RAG) pipeline. All of it powered by Quarkus and LangChain4j, running locally with Ollama and PGVector.
Prerequisites
Make sure you have:
Java 17+
Maven 3.8.1+
Quarkus CLI (optional)
Podman (for containers)
Ollama (installed natively for local models)
And as usual, if you just wanna peek at the code, don’t hesitate to take a look at the repository. It also has some sample documents for testing the RAG.
Today is gonna be a super quick post with just snippets! Let me know if you like this format better than full-blown classes!
Bootstrap the Project
Run the following to create your Quarkus project:
mvn io.quarkus.platform:quarkus-maven-plugin:create \
    -DprojectGroupId=com.ibm.developer \
    -DprojectArtifactId=ai-document-assistant \
    -Dextensions="rest-jackson,langchain4j-ollama,langchain4j-pgvector,smallrye-health,smallrye-metrics"
These extensions give us REST APIs, local AI model support, vector search via PostgreSQL, health checks, and metrics.
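If you have the Quarkus CLI from the prerequisites installed, the equivalent command should look roughly like this (same coordinates and extensions as the Maven invocation above):
quarkus create app com.ibm.developer:ai-document-assistant \
    --extension=rest-jackson,langchain4j-ollama,langchain4j-pgvector,smallrye-health,smallrye-metrics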
Understand RAG in 30 Seconds
RAG bridges the gap between static LLM knowledge and your private data. Documents are converted into vectors and stored in a vector database. When a question is asked, similar chunks are retrieved and used as context for the LLM. You can learn way more about this in the excellent Quarkus documentation.
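To make that concrete, here is a deliberately naive, hand-rolled version of the loop using plain LangChain4j types. This is a sketch only; the Quarkus pieces in the rest of this post replace all of it with a few annotations:
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingMatch;
import dev.langchain4j.store.embedding.EmbeddingStore;
import java.util.List;
import static java.util.stream.Collectors.joining;

class NaiveRag {
    static String ask(String question, EmbeddingModel embeddingModel,
                      EmbeddingStore<TextSegment> store, ChatLanguageModel chatModel) {
        // 1. Embed the question with the same model used at ingestion time.
        Embedding queryVector = embeddingModel.embed(question).content();
        // 2. Fetch the most similar chunks from the vector store.
        List<EmbeddingMatch<TextSegment>> matches = store.findRelevant(queryVector, 3);
        String context = matches.stream().map(m -> m.embedded().text()).collect(joining("\n"));
        // 3. Let the LLM answer with those chunks prepended as context.
        return chatModel.generate("Context:\n" + context + "\n\nQuestion: " + question);
    }
}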
Configure the Application
Edit src/main/resources/application.properties:
# A chat model
quarkus.langchain4j.ollama.chat-model.model-id=granite3.3
# An embedding model
quarkus.langchain4j.ollama.embedding-model.model-id=granite-embedding:latest
# The vector dimension of the embedding model
quarkus.langchain4j.pgvector.dimension=384
# Config property pointing at the folder with our docs
rag.location=src/main/resources/rag
# A longer timeout. Just in case.
quarkus.langchain4j.ollama.timeout=120s
Quarkus Dev Services will auto-provision both the Ollama model and a PostgreSQL+PGVector instance for local dev.
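Dev Services is a local-development convenience. For environments where it isn't in play (production, for example), you'd point the app at real infrastructure with standard Quarkus properties; the host names and credentials below are placeholders:
quarkus.langchain4j.ollama.base-url=http://ollama.internal:11434
quarkus.datasource.db-kind=postgresql
quarkus.datasource.jdbc.url=jdbc:postgresql://postgres.internal:5432/rag
quarkus.datasource.username=rag
quarkus.datasource.password=changeme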
Document Ingestion Service
Create a service to load documents and ingest them as embeddings:
@ApplicationScoped
public class DocumentIngestionService {

    // Runs at startup: clears the store, loads everything under rag.location and re-ingests it.
    // recursive(...) is a static import of DocumentSplitters.recursive.
    public void ingest(@Observes StartupEvent ev,
                       EmbeddingStore store, EmbeddingModel model,
                       @ConfigProperty(name = "rag.location") Path documents) {
        store.removeAll();
        var list = FileSystemDocumentLoader.loadDocumentsRecursively(documents);
        EmbeddingStoreIngestor.builder()
                .embeddingStore(store)
                .embeddingModel(model)
                // split each document into small overlapping chunks (max 100, overlap 25)
                .documentSplitter(recursive(100, 25))
                .build()
                .ingest(list);
    }
}
Add the Retrieval Augmentor
This component pulls the most relevant chunks out of the vector store and injects them into the prompt before it reaches the model. The producer method can live in any CDI bean:
@Produces
@ApplicationScoped
public RetrievalAugmentor create(EmbeddingStore store, EmbeddingModel model) {
    return DefaultRetrievalAugmentor.builder()
            .contentRetriever(EmbeddingStoreContentRetriever.builder()
                    .embeddingModel(model)
                    .embeddingStore(store)
                    // only the three most similar chunks end up in the prompt
                    .maxResults(3)
                    .build())
            .build();
}
Create the AI Service
Define the interface that talks to the model. Because we expose a RetrievalAugmentor bean, Quarkus LangChain4j applies it to this service automatically:
@RegisterAiService
public interface DocumentAssistant {

    @SystemMessage("""
            You are a helpful document assistant. Answer questions based on the provided context.
            If you can't find the answer, say so politely.
            """)
    String answerQuestion(@UserMessage String question);
}
REST API Layer
Expose the assistant over HTTP:
@Path("/api/assistant")
public class DocumentAssistantResource {
@Inject DocumentAssistant assistant;
@POST @Path("/ask")
public Response ask(Map<String, String> request) {
var question = request.get("question");
if (question == null || question.isBlank()) {
return Response.status(400).entity(Map.of("error", "Missing question")).build();
}
try {
var answer = assistant.answerQuestion(question);
return Response.ok(Map.of("question", question, "answer", answer)).build();
} catch (Exception e) {
return Response.serverError().entity(Map.of("error", e.getMessage())).build();
}
}
}
Health Check
Create a readiness probe:
@Readiness
@ApplicationScoped
public class DocumentAssistantHealthCheck implements HealthCheck {

    @Inject
    DocumentAssistant assistant;

    @Override
    public HealthCheckResponse call() {
        try {
            // A trivial round trip through the model; we only care that it responds.
            assistant.answerQuestion("Are you online?");
            return HealthCheckResponse.named("document-assistant").up().build();
        } catch (Exception e) {
            return HealthCheckResponse.named("document-assistant")
                    .down()
                    .withData("error", e.getMessage())
                    .build();
        }
    }
}
Run the Application
Start Quarkus in dev mode:
mvn quarkus:dev
If it's your first run, it may take time to download models. Once up, Quarkus will ingest your documents and you are ready to test the endpoint:
curl -X POST http://localhost:8080/api/assistant/ask \
  -H "Content-Type: application/json" \
  -d '{"question":"What was Max'\''s badge number?"}'
Observe and Iterate
Visit /q/health-ui to verify health.
Visit /q/metrics for application metrics.
Modify your code and observe hot reload magic.
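If you prefer the terminal, the default Quarkus endpoints answer to plain curl:
curl http://localhost:8080/q/health/ready
curl http://localhost:8080/q/metrics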
Congratulations
You’ve built an enterprise-grade AI document assistant that:
Starts in under 2 seconds
Runs local LLMs via Ollama
Uses PostgreSQL + PGVector
Exposes REST endpoints
Integrates health and metrics
Uses declarative AI services with CDI
This is the future of Java: fast, easy, intelligent.
What’s Next?
Add support for:
PDF parsing via LangChain4j loaders (see the sketch after this list)
Hybrid search (keywords + vectors)
Role-based access and audit logs
Production deployment on Kubernetes or OpenShift
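For the PDF item, here is a minimal sketch of the change to our ingestion service, assuming you add LangChain4j's Apache Tika document parser dependency (langchain4j-document-parser-apache-tika) to the project:
// In DocumentIngestionService.ingest(...), let Apache Tika parse PDFs, DOCX, HTML, ...
// import dev.langchain4j.data.document.parser.apache.tika.ApacheTikaDocumentParser;
var list = FileSystemDocumentLoader.loadDocumentsRecursively(
        documents, new ApacheTikaDocumentParser());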
The combination of Quarkus and LangChain4j is not just a dev-time booster. It’s your entry point into building real AI-powered business applications in Java. Stay fast. Stay native. Stay smart.