Beyond Borders: Seamless Document Translation with Docling and Granite
Alain Airom



Publish Date: Jun 5

Using Granite alongside Docling (and Ollama) to translate documents.

Introduction

In today’s interconnected world, breaking down language barriers in documents has never been more critical. Imagine effortlessly converting complex reports, legal texts, or technical manuals from one language to another, retaining not just the meaning but also the original formatting and context. This is now seamlessly achievable by leveraging the power of Large Language Models (LLMs). When combined with advanced techniques like Retrieval Augmented Generation (RAG), which grounds LLMs in specific, accurate information, and orchestrated by intelligent AI Agents, these systems become incredibly adept at nuanced translation. This approach offers a powerful and highly versatile solution for bridging linguistic divides, proving to be an indispensable tool in our increasingly globalized interactions.

Tools Used


To bring this seamless translation capability to life, our architecture leverages a robust set of tools, each playing a crucial role. We utilize Ollama as our local LLM host, offering the flexibility and privacy of running powerful language models directly on our machines. For the core translation engine, we’ve selected IBM Granite-dense, a particularly advantageous choice due to its lightweight nature, which ensures efficient local execution without demanding excessive computational resources. This makes it easily accessible for a wide range of users and applications. Finally, Docling serves as the backbone for document parsing and sophisticated translation implementation, meticulously extracting content, handling diverse formats, and preparing the translated output, thus ensuring a comprehensive and high-fidelity translation experience.

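Before diving into the full implementation, the division of labor between these tools can be summarized as a three-stage pipeline: parse the document into text blocks, translate each block, and render the result back out. The sketch below illustrates that shape with stand-in functions (the function names here are illustrative only; they are not part of Docling's or Ollama's API, which the real code further down uses):

```python
# Illustrative sketch of the pipeline: parse -> translate -> render.
# All three functions are stand-ins for the real Docling/Ollama calls.

def parse_document(raw: str) -> list[str]:
    # Stand-in for Docling: split a document into translatable text blocks.
    return [block for block in raw.split("\n\n") if block.strip()]

def translate_block(text: str, src: str = "en", dest: str = "de") -> str:
    # Stand-in for the LLM call; the real version sends a prompt to Ollama.
    return f"[{src}->{dest}] {text}"

def render_document(blocks: list[str]) -> str:
    # Stand-in for Docling's Markdown export.
    return "\n\n".join(blocks)

doc = "Hello world.\n\nThis is a test."
translated = render_document([translate_block(b) for b in parse_document(doc)])
# translated == "[en->de] Hello world.\n\n[en->de] This is a test."
```

The real application below follows exactly this shape, with Docling supplying the parse/render stages and Ollama the translate stage.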

Code Implementation

The high-level application architecture is represented below.

(Diagram: high-level application architecture)

  • First things first, prepare the environment.
pip install --upgrade pip

pip install docling
pip install ollama

# Assuming Ollama is already installed, pull and test the model locally
ollama run granite3-dense:latest

# macOS (python.org installs): install SSL root certificates if model downloads fail
/Applications/Python\ 3.12/Install\ Certificates.command

# Optional workaround for local testing only: disable HTTPS certificate verification
export PYTHONHTTPSVERIFY=0
  • The main sample application ⬇
import logging
from pathlib import Path

# Import the ollama client library
import ollama

from docling_core.types.doc import ImageRefMode, TableItem, TextItem

from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption

_log = logging.getLogger(__name__)

IMAGE_RESOLUTION_SCALE = 2.0

def translate(text: str, src: str = "en", dest: str = "de"):
    """
    Translates text using a local Ollama LLM (granite3-dense:latest).
    """
    _log.info(f"Translating text using Ollama with granite3-dense:latest from {src} to {dest}")
    try:
        # Construct the prompt for the LLM.
        # You can adjust this prompt to better suit the translation task
        # and the capabilities of the granite3-dense model.
        prompt = f"Translate the following {src} text to {dest}: {text}"

        # Call the Ollama model
        response = ollama.chat(model='granite3-dense:latest', messages=[
            {
                'role': 'user',
                'content': prompt,
            },
        ])
        translated_text = response['message']['content']
        return translated_text
    except Exception as e:
        _log.error(f"Error during translation with Ollama: {e}")
        return text # Return original text on error

def main():
    logging.basicConfig(level=logging.INFO)

    input_doc_path = Path("./input/2206.01062v1.pdf")
    output_dir = Path("scratch")

    # Ensure the output directory exists
    output_dir.mkdir(parents=True, exist_ok=True)

    # Important: to work with page images, we must explicitly keep them;
    # otherwise the DocumentConverter discards them to free memory.
    # This is controlled by PdfPipelineOptions.images_scale, which also sets
    # the image scale: scale=1 corresponds to a standard 72 DPI image.
    # The PdfPipelineOptions.generate_* flags select which document elements
    # will be enriched with an image field.
    pipeline_options = PdfPipelineOptions()
    pipeline_options.images_scale = IMAGE_RESOLUTION_SCALE
    pipeline_options.generate_page_images = True
    pipeline_options.generate_picture_images = True

    doc_converter = DocumentConverter(
        format_options={
            InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
        }
    )

    _log.info(f"Converting document: {input_doc_path}")
    conv_res = doc_converter.convert(input_doc_path)
    conv_doc = conv_res.document
    doc_filename = conv_res.input.file.name # Use .name to get just the filename

    # Save markdown with embedded pictures in original text
    md_filename_orig = output_dir / f"{doc_filename}-with-images-orig.md"
    _log.info(f"Saving original markdown to: {md_filename_orig}")
    conv_doc.save_as_markdown(md_filename_orig, image_mode=ImageRefMode.EMBEDDED)

    _log.info("Starting translation of document elements...")
    for element, _level in conv_res.document.iterate_items():
        if isinstance(element, TextItem):
            element.orig = element.text
            element.text = translate(text=element.text)

        elif isinstance(element, TableItem):
            for cell in element.data.table_cells:
                if cell.text: # Ensure there's text to translate in the cell
                    cell.text = translate(text=cell.text)

    # Save markdown with embedded pictures in translated text
    md_filename_translated = output_dir / f"{doc_filename}-with-images-translated.md"
    _log.info(f"Saving translated markdown to: {md_filename_translated}")
    conv_doc.save_as_markdown(md_filename_translated, image_mode=ImageRefMode.EMBEDDED)
    _log.info("Translation and saving complete.")

if __name__ == "__main__":
    main()
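One practical refinement worth considering: documents often repeat short strings (table headers, footers, labels), and the loop above issues a separate LLM call for every element. A small cache avoids re-translating identical text. A minimal sketch, with the LLM call stubbed out for illustration (in the real application the body would call `ollama.chat(...)` exactly as in the `translate` function above):

```python
from functools import lru_cache

# Counter just to demonstrate that cached calls skip the (stubbed) LLM call.
calls = {"n": 0}

@lru_cache(maxsize=4096)
def translate_cached(text: str, src: str = "en", dest: str = "de") -> str:
    calls["n"] += 1
    # In the real application, this is where ollama.chat(...) would be called.
    return f"[{dest}] {text}"

translate_cached("Total")
translate_cached("Total")   # identical input: served from the cache
translate_cached("Page 1")  # new input: triggers a second (stubbed) call
```

Swapping `translate` for a cached wrapper like this is a one-line change in the element loop and can noticeably reduce runtime on documents with repetitive tables.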
  • Now run the application.
python main.py

### long output on console excerpt
...
INFO:httpx:HTTP Request: POST http://127.0.0.1:11434/api/chat "HTTP/1.1 200 OK"
INFO:__main__:Translating text using Ollama with granite3-dense:latest from en to de
INFO:httpx:HTTP Request: POST http://127.0.0.1:11434/api/chat "HTTP/1.1 200 OK"
INFO:__main__:Translating text using Ollama with granite3-dense:latest from en to de
INFO:httpx:HTTP Request: POST http://127.0.0.1:11434/api/chat "HTTP/1.1 200 OK"
INFO:__main__:Translating text using Ollama with granite3-dense:latest from en to de
INFO:httpx:HTTP Request: POST http://127.0.0.1:11434/api/chat "HTTP/1.1 200 OK"
INFO:__main__:Translating text using Ollama with granite3-dense:latest from en to de
INFO:httpx:HTTP Request: POST http://127.0.0.1:11434/api/chat "HTTP/1.1 200 OK"
INFO:__main__:Saving translated markdown to: scratch/2206.01062v1.pdf-with-images-translated.md
INFO:__main__:Translation and saving complete.


  • At the end, the application generates two distinct Markdown files: the original and the translated version.


And there we go ⛳

It’s important to note that while this approach offers a powerful solution for document translation, the output is not always perfect. Translation quality can vary with the complexity of the source material and the specific language pair. This sample uses IBM Granite-dense as the Large Language Model; however, you have the flexibility to integrate other LLMs of your choice, whether run locally via Ollama or accessed through remote services. Experimenting with different models can help you achieve the best translation quality for your specific needs.
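Swapping models is easiest if the model name and prompt template live in one place rather than being hard-coded inside `translate`. A hedged sketch of that refactoring (the `build_request` helper and the template constant are illustrative names introduced here; only the payload shape, matching what `ollama.chat` accepts, comes from the code above):

```python
# Keep the model and prompt template configurable so that swapping LLMs
# only touches this one spot. The template is an assumption carried over
# from the sample code, not a Granite-specific requirement.

DEFAULT_MODEL = "granite3-dense:latest"
PROMPT_TEMPLATE = "Translate the following {src} text to {dest}: {text}"

def build_request(text: str, src: str = "en", dest: str = "de",
                  model: str = DEFAULT_MODEL) -> dict:
    """Build the keyword arguments passed to the chat client (e.g. ollama.chat)."""
    return {
        "model": model,
        "messages": [
            {"role": "user",
             "content": PROMPT_TEMPLATE.format(src=src, dest=dest, text=text)},
        ],
    }

# Trying a different local model is then a single argument change:
req = build_request("Good morning", dest="fr", model="llama3:latest")
# ollama.chat(**req) would dispatch this to the chosen model.
```

The same indirection also makes it straightforward to route requests to a remote service instead of Ollama, since only the function that consumes the request dict needs to change.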

Conclusion

In conclusion, the document translation approach presented offers a compelling solution for overcoming linguistic barriers in business. By intelligently combining Docling’s robust document parsing and structuring capabilities with the power of locally-hosted Large Language Models (LLMs) like IBM Granite-dense via Ollama, we’ve crafted a system that delivers efficient, private, and high-fidelity translations. This method is particularly useful for today’s businesses operating in a globalized landscape, enabling rapid conversion of diverse documentation — from legal contracts and technical manuals to marketing materials — expediting international collaboration, facilitating market entry, and ensuring clear communication across diverse linguistic audiences. The flexibility to run LLMs locally or remotely, coupled with the potential for further refinement using techniques like RAG and AI agents, positions this approach as a highly adaptable and valuable asset for any organization seeking to enhance its global reach and operational efficiency.

