Build an AI-Powered Document Assistant with Quarkus and LangChain4j
Markus (@myfear) · Published Jul 11

Enterprise Java developers know the pain: slow startup times, high memory usage, and complex AI integration. If you're coming from Spring Boot, prepare to be pleasantly shocked. In this tutorial, we'll build a blazing-fast document assistant that can ingest text documents, create embeddings, and answer questions using a Retrieval-Augmented Generation (RAG) pipeline. All powered by Quarkus and LangChain4j, running locally with Ollama and PGVector.

Prerequisites

Make sure you have:

  • Java 17+

  • Maven 3.8.1+

  • Quarkus CLI (optional)

  • Podman (for containers)

  • Ollama (installed natively for local models)

And as usual, if you just wanna peek at the code, don't hesitate to take a look at the repository. It also has some sample documents for testing the RAG pipeline.

Today is gonna be a super quick post with just snippets! Let me know if you like this format better instead of full-blown classes!

Bootstrap the Project

Run the following to create your Quarkus project:

mvn io.quarkus.platform:quarkus-maven-plugin:create \
    -DprojectGroupId=com.ibm.developer \
    -DprojectArtifactId=ai-document-assistant \
    -Dextensions="rest-jackson,langchain4j-ollama,langchain4j-pgvector,smallrye-health,smallrye-metrics"

These extensions give us REST APIs, local AI model support, vector search via PostgreSQL, health checks, and metrics.

Understand RAG in 30 Seconds

RAG bridges the gap between static LLM knowledge and your private data. Documents are converted into vectors and stored in a vector database. When a question is asked, similar chunks are retrieved and used as context for the LLM. You can learn way more about this in the excellent Quarkus documentation.
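
To make the retrieval half concrete, here is a toy, self-contained sketch of the idea in plain Java. This is not the LangChain4j API, and the vectors are made up for illustration; in the real pipeline the embedding model produces them:

import java.util.Comparator;
import java.util.Map;

// Toy illustration of RAG retrieval: chunks live in a vector store; at question
// time we fetch the chunk most similar to the question and hand it to the LLM.
public class RagRetrievalSketch {

    // Cosine similarity between two vectors: 1.0 means "same direction".
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        // Pretend an embedding model produced these vectors for two document chunks.
        Map<String, double[]> store = Map.of(
                "Max's badge number is 4711.", new double[] { 0.9, 0.1, 0.0 },
                "The cafeteria opens at 8 am.", new double[] { 0.1, 0.8, 0.3 });
        // ... and this one for the question "What is Max's badge number?".
        double[] question = { 0.85, 0.15, 0.05 };

        // Retrieve the single most similar chunk; it becomes context for the LLM.
        String context = store.entrySet().stream()
                .max(Comparator.comparingDouble(e -> cosine(e.getValue(), question)))
                .map(Map.Entry::getKey)
                .orElseThrow();
        System.out.println("Context passed to the LLM: " + context);
    }
}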

Configure the Application

Edit src/main/resources/application.properties:

# The chat model
quarkus.langchain4j.ollama.chat-model.model-id=granite3.3
# The embedding model
quarkus.langchain4j.ollama.embedding-model.model-id=granite-embedding:latest
# The vector dimension; it must match what the embedding model produces
quarkus.langchain4j.pgvector.dimension=384
# Config property telling us where the documents live
rag.location=src/main/resources/rag
# A longer timeout, just in case local inference is slow
quarkus.langchain4j.ollama.timeout=120s

Quarkus Dev Services will auto-provision both the Ollama model and a PostgreSQL+PGVector instance for local dev.
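
If your Ollama instance isn't picked up automatically (for example, it runs on another host), you can point Quarkus at it explicitly. A minimal sketch; the hostname below is a placeholder, and 11434 is Ollama's default port:

# Optional: use an Ollama instance that is not auto-detected locally
quarkus.langchain4j.ollama.base-url=http://my-ollama-host:11434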

Document Ingestion Service

Create a service to load documents and ingest them as embeddings:

import static dev.langchain4j.data.document.splitter.DocumentSplitters.recursive;

import java.nio.file.Path;

import org.eclipse.microprofile.config.inject.ConfigProperty;

import dev.langchain4j.data.document.loader.FileSystemDocumentLoader;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import io.quarkus.runtime.StartupEvent;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.enterprise.event.Observes;

@ApplicationScoped
public class DocumentIngestionService {
    // Runs once at startup: wipes the store and re-ingests everything under rag.location.
    public void ingest(@Observes StartupEvent ev,
                       EmbeddingStore store, EmbeddingModel model,
                       @ConfigProperty(name = "rag.location") Path documents) {
        store.removeAll();
        var list = FileSystemDocumentLoader.loadDocumentsRecursively(documents);
        EmbeddingStoreIngestor.builder()
            .embeddingStore(store)
            .embeddingModel(model)
            // split into ~100-character chunks with a 25-character overlap
            .documentSplitter(recursive(100, 25))
            .build()
            .ingest(list);
    }
}

Add the Retrieval Augmentor

This component enables retrieval of contextually relevant documents:

import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.rag.DefaultRetrievalAugmentor;
import dev.langchain4j.rag.RetrievalAugmentor;
import dev.langchain4j.rag.content.retriever.EmbeddingStoreContentRetriever;
import dev.langchain4j.store.embedding.EmbeddingStore;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.enterprise.inject.Produces;

public class RetrievalAugmentorProducer { // any CDI bean class can host this producer
    @Produces
    @ApplicationScoped
    public RetrievalAugmentor create(EmbeddingStore store, EmbeddingModel model) {
        return DefaultRetrievalAugmentor.builder()
            .contentRetriever(EmbeddingStoreContentRetriever.builder()
                .embeddingModel(model)
                .embeddingStore(store)
                .maxResults(3) // attach the top-3 most similar chunks to each prompt
                .build())
            .build();
    }
}

Create the AI Service

Define the interface that talks to the model. Since a RetrievalAugmentor bean exists, Quarkus LangChain4j picks it up automatically and augments every request with retrieved context:

import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;

@RegisterAiService
public interface DocumentAssistant {
    @SystemMessage("""
        You are a helpful document assistant. Answer questions based on the provided context.
        If you can't find the answer, say so politely.
        """)
    String answerQuestion(@UserMessage String question);
}

REST API Layer

Expose the assistant over HTTP:

@Path("/api/assistant")
public class DocumentAssistantResource {
    @Inject DocumentAssistant assistant;

    @POST @Path("/ask")
    public Response ask(Map<String, String> request) {
        var question = request.get("question");
        if (question == null || question.isBlank()) {
            return Response.status(400).entity(Map.of("error", "Missing question")).build();
        }
        try {
            var answer = assistant.answerQuestion(question);
            return Response.ok(Map.of("question", question, "answer", answer)).build();
        } catch (Exception e) {
            return Response.serverError().entity(Map.of("error", e.getMessage())).build();
        }
    }
}
Enter fullscreen mode Exit fullscreen mode

Health Check

Create a readiness probe:

import org.eclipse.microprofile.health.HealthCheck;
import org.eclipse.microprofile.health.HealthCheckResponse;
import org.eclipse.microprofile.health.Readiness;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

@Readiness
@ApplicationScoped
public class DocumentAssistantHealthCheck implements HealthCheck {
    @Inject DocumentAssistant assistant;

    @Override
    public HealthCheckResponse call() {
        try {
            // A real round trip to the model: if this returns, the whole chain is up.
            assistant.answerQuestion("Are you online?");
            return HealthCheckResponse.named("document-assistant").up().build();
        } catch (Exception e) {
            return HealthCheckResponse.named("document-assistant").down().withData("error", e.getMessage()).build();
        }
    }
}
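
Once the application is running (next section), you can hit the standard SmallRye Health readiness endpoint directly:

curl http://localhost:8080/q/health/ready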

Run the Application

Start Quarkus in dev mode:

mvn quarkus:dev

If it's your first run, it may take time to download models. Once up, Quarkus will ingest your documents and you are ready to test the endpoint:

curl -X POST http://localhost:8080/api/assistant/ask \
  -H "Content-Type: application/json" \
  -d '{"question":"What was Max'\''s badge number?"}'
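
The resource echoes the question together with the model's answer, so the response is JSON along these lines (the answer text here is illustrative; yours depends on your documents):

{"question":"What was Max's badge number?","answer":"Max's badge number was 4711."}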

Observe and Iterate

  • Visit /q/health-ui to check the health status.

  • Visit /q/metrics for application metrics.

  • Modify your code and watch Quarkus hot-reload it.

Congratulations

You’ve built an enterprise-grade AI document assistant that:

  • Starts in under 2 seconds

  • Runs local LLMs via Ollama

  • Uses PostgreSQL + PGVector

  • Exposes REST endpoints

  • Integrates health and metrics

  • Uses declarative AI services with CDI

This is the future of Java: fast, easy, intelligent.

What’s Next?

Add support for:

  • PDF parsing via LangChain4j loaders

  • Hybrid search (keywords + vectors)

  • Role-based access and audit logs

  • Production deployment on Kubernetes or OpenShift

The combination of Quarkus and LangChain4j is not just a dev-time booster. It’s your entry point into building real AI-powered business applications in Java. Stay fast. Stay native. Stay smart.
