Articles by Tag #quantization

Browse our collection of articles on a range of IT topics. Dive in and explore something new!

Squeezing AI into Tiny Spaces: The Integer Revolution

Tired of bulky, power-hungry AI...

Learn More · Oct 30 '25

TorchAO vs ONNX Runtime: 8-bit Quantization Benchmark

TorchAO Just Beat ONNX Runtime on My M1 MacBook (And I Didn't Expect It). I ran the same...

Learn More · Feb 22

Bringing 2-Bit Quantization to ONNX Runtime's WebGPU Backend

A story of five bugs, bit-level debugging, and running transformer models at 2-bit precision in the...

Learn More · Feb 11

Fine-Tuning LLMs: LoRA, Quantization, and Distillation Simplified

Large Language Models (LLMs) like LLaMA, Gemma, and Mistral are incredibly capable — but adapting...

Learn More · Nov 15 '25
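For orientation on the LoRA technique this entry names: the core idea is to freeze the pretrained weights and train only a low-rank update, W + (alpha/r)·BA. A minimal sketch of that idea, not this article's code; the class name and hyperparameters below are illustrative:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W + (alpha/r) * B @ A."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at step 0
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
```

Only A and B are trained, so the trainable parameter count drops from in_features × out_features to r × (in_features + out_features).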

The Era of Small Models — SLM, MoE, Distillation, and Quantization Explained

Bigger isn't always better. Four techniques for efficient model deployment.

Learn More · Mar 10

Unleash AI on Tiny Hardware: Quantization for Embedded Reinforcement Learning by Arvind Sundararajan

Tired of...

Learn More · Nov 11 '25

How to Calculate Perplexity (PPL) the Right Way (and Avoid Common Pitfalls)

Perplexity (PPL) is a widely used metric for evaluating language models. It...

Learn More · Aug 2 '25
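For reference alongside this entry: perplexity is the exponential of the average per-token negative log-likelihood, and a common pitfall is averaging per sequence instead of per token. A minimal sketch, with an illustrative function name:

```python
import math

def perplexity(token_nlls):
    """PPL = exp(mean negative log-likelihood per token), natural-log base.

    token_nlls: a flat list of per-token NLL values (-log p) over the whole corpus.
    Averaging over tokens, not over sequences, is the usual pitfall.
    """
    return math.exp(sum(token_nlls) / len(token_nlls))

# A model assigning probability 0.25 to every token has PPL 4:
print(perplexity([math.log(4)] * 10))  # 4.0
```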

QAT vs PTQ: When 3% Accuracy Drop Kills Your Model

Post-training quantization destroyed my ResNet-50 deployment last year — not because INT8 is broken,...

Learn More · Feb 19
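As context for the PTQ-vs-QAT distinction this entry draws: post-training quantization rounds already-trained weights, so a single outlier can inflate the scale and cost accuracy, whereas QAT simulates the same rounding during training so the network can adapt to it. A toy symmetric INT8 round-trip, not the article's benchmark code:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8 post-training quantization (toy sketch)."""
    scale = np.abs(w).max() / 127.0  # the largest magnitude maps to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

w = np.random.randn(1000).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = q.astype(np.float32) * scale   # dequantize
print("max round-trip error:", np.abs(w - w_hat).max())  # bounded by scale / 2
```

Note how the error bound scales with the tensor's maximum magnitude: one outlier weight widens the quantization step for every other weight, which is one way a "3% accuracy drop" creeps in after PTQ.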

Paper Review: QLoRA — 4-Bit Finetuning That Runs 65B Models on One GPU

The Memory Wall Problem: Finetuning a 65-billion-parameter LLM requires roughly 780GB of...

Learn More · Feb 11
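One plausible breakdown behind the ~780GB figure, assuming fp16 weights and gradients plus two fp32 Adam moment tensors, the accounting commonly used in this line of work. The numbers below are back-of-envelope, not taken from the article:

```python
params = 65e9
gb = 1e9  # decimal gigabytes, for round numbers

weights_fp16 = params * 2 / gb       # 130 GB of fp16 weights
grads_fp16   = params * 2 / gb       # 130 GB of fp16 gradients
adam_fp32    = params * 4 * 2 / gb   # 520 GB: two fp32 Adam moment tensors
print(weights_fp16 + grads_fp16 + adam_fp32)  # ~780 GB, matching the teaser

weights_4bit = params * 0.5 / gb     # ~32.5 GB for a 4-bit-quantized base model
print(weights_4bit)
```

Quantizing the frozen base model to 4 bits and training only small adapters is what lets this fit on a single high-memory GPU.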