Articles by Tag #vlm


ClipTagger-12B VLM: Frame Captioning Tutorial

TL;DR inference-net/ClipTagger-12b is a Gemma-3-12B-based VLM with an Apache-2.0...

Nov 2 '25

Brand Tagging with VLMs

TL;DR Build a two-stage logo pipeline: Retrieval - generate image embeddings for small...

Nov 15 '25

2025 Complete Guide: How to Build End-to-End OCR with HunyuanOCR

🎯 Key Takeaways (TL;DR) A single 1B multimodal architecture covers detection,...

Nov 25 '25

Testing qwen3-vl… quite impressive!

Rapid test using qwen3 vision language Introduction Vision Language Models —...

Oct 17 '25

Running Live VLM WebUI on Jetson

Live VLM WebUI is a convenient interface for evaluating vision language models in real time: 🎥 Multi-source video input, WebRTC webcam streaming (stable) 🧪...

Dec 30 '25

GLM-4.6V Now on SiliconFlow: Native Multimodal Tool Use Meets SoTA Visual Intelligence

TL;DR: GLM-4.6V, Z.ai's latest multimodal large language model, is now available on SiliconFlow....

Dec 26 '25

VLM Pipeline with Docling

Hands-on experience using VLM Pipeline from Docling. Introduction Vision-Language...

May 15 '25

Journal of our experiments on VLM token pruning

@oldpilluwu and I have been keenly interested in how to make Vision Language Models (VLMs) work and...

Aug 2 '25

OCR - ID Card Scanner (VLM)

In this article, we present a production-grade pipeline for extracting Turkish national...

Jul 10 '25

Small Model from Huggingface with Video understanding

A couple of weeks ago, SmolVLM-2 got released by Huggingface with an amazing feature — Video...

Feb 27 '25

NuMarkdown-8B-Thinking: The Open-Source Reasoning OCR that Converts PDFs to Auditable Markdown for Enterprise RAG Pipelines

Reasoning OCR models for automated document-to-markdown workflows

Aug 12 '25

Unlocking Visual Intelligence: Picture Annotation with Remote VLM Power

Implementing Picture Annotation using Remote Visual Language Models and Docling! ...

Jul 8 '25

From OCR to VLMs: How AI Agents Make Financial Docs Understandable

Financial documents are essential for investment decisions, risk assessments, and compliance checks....

Aug 8 '25

JoyCaption: The Open, Uncensored VLM That Will Supercharge Your Diffusion Models

Quick Summary: 📝 JoyCaption is an open-source Visual Language Model (VLM) designed for...

Oct 27 '25