Articles by Tag #vlm

Benchmarking Pixtral 12B: MistralAI's New VLM

In the fast-evolving world of AI, Vision-Language Models (VLMs) are...

Sep 18 '24

Benchmarking Pixtral Large vs Pixtral 12B

Multimodal AI has taken significant leaps in recent years, and Mistral AI's...

Nov 25 '24

Stress Testing VLMs: Multi QnA and Description Tasks

Video Link: https://youtu.be/pwW9zwVQ4L8 Repository Link:...

Oct 14 '24

Unlock the Magic of Images: A Quick and Easy Guide to Using the Cutting-Edge SmolVLM-500M Model

The model SmolVLM-500M-Instruct is a state-of-the-art, compact model with 500 million parameters....

Jan 24

Unlocking Visual Intelligence: Picture Annotation with Remote VLM Power

Implementing Picture Annotation using Remote Visual Language Models and Docling! ...

Jul 8

Journal of our experiments on VLM token pruning

@oldpilluwu and I have been keenly interested in how to make large Vision-Language Models (VLMs) work and...

Aug 2

From OCR to VLMs: How AI Agents Make Financial Docs Understandable

Financial documents are essential for investment decisions, risk assessments, and compliance checks....

Aug 8

OCR - ID Card Scanner (VLM)

In this article, we present a production-grade pipeline for extracting Turkish national...

Jul 10

Small Model from Huggingface with Video understanding

A couple of weeks ago, Hugging Face released SmolVLM-2 with an amazing feature: Video...

Feb 27

📊 Exploring Vision Language Models (VLMs) for Structured Data Extraction

Over the past few weeks, I've been studying the effectiveness of Vision Language Models (VLMs) for...

Sep 27 '24

VLM Pipeline with Docling

Hands-on experience using the VLM Pipeline from Docling. Introduction: Vision-Language...

May 15

NuMarkdown-8B-Thinking: The Open-Source Reasoning OCR that Converts PDFs to Auditable Markdown for Enterprise RAG Pipelines

Reasoning OCR models for automated document-to-markdown workflows

Aug 12