Articles by Tag #quantization

Browse our collection of articles on a range of IT topics. Dive in and explore something new!

Squeezing AI into Tiny Spaces: The Integer Revolution

Tired of bulky, power-hungry AI...

Learn More · Oct 30 '25

TorchAO vs ONNX Runtime: 8-bit Quantization Benchmark

TorchAO Just Beat ONNX Runtime on My M1 MacBook (And I Didn't Expect It). I ran the same...

Learn More · Feb 22

Bringing 2-Bit Quantization to ONNX Runtime's WebGPU Backend

A story of five bugs, bit-level debugging, and running transformer models at 2-bit precision in the...

Learn More · Feb 11

Fine-Tuning LLMs: LoRA, Quantization, and Distillation Simplified

Large Language Models (LLMs) like LLaMA, Gemma, and Mistral are incredibly capable — but adapting...

Learn More · Nov 15 '25
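For orientation on the LoRA technique this entry names: the core idea is to freeze the pretrained weights and train only a low-rank update, W + (alpha/r)·BA. A minimal sketch of that idea, not this article's code; the class name and hyperparameters below are illustrative:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W + (alpha/r) * B @ A."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at step 0
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
```

Only A and B are trained, so the trainable parameter count drops from in_features × out_features to r × (in_features + out_features).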

The Era of Small Models — SLM, MoE, Distillation, and Quantization Explained

Bigger isn't always better. Four techniques for efficient model deployment.

Learn More · Mar 10

Unleash AI on Tiny Hardware: Quantization for Embedded Reinforcement Learning by Arvind Sundararajan

Tired of...

Learn More · Nov 11 '25

How to Calculate Perplexity (PPL) the Right Way (and Avoid Common Pitfalls)

Perplexity (PPL) is a widely used metric for evaluating language models. It...

Learn More · Aug 2 '25
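For reference alongside this entry: perplexity is the exponential of the average per-token negative log-likelihood, and a common pitfall is averaging per sequence instead of per token. A minimal sketch, with an illustrative function name:

```python
import math

def perplexity(token_nlls):
    """PPL = exp(mean negative log-likelihood per token), natural-log base.

    token_nlls: a flat list of per-token NLL values (-log p) over the whole corpus.
    Averaging over tokens, not over sequences, is the usual pitfall.
    """
    return math.exp(sum(token_nlls) / len(token_nlls))

# A model assigning probability 0.25 to every token has PPL 4:
print(perplexity([math.log(4)] * 10))  # 4.0
```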

QAT vs PTQ: When 3% Accuracy Drop Kills Your Model

Post-training quantization destroyed my ResNet-50 deployment last year — not because INT8 is broken,...

Learn More · Feb 19
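As context for the PTQ-vs-QAT distinction this entry draws: post-training quantization rounds already-trained weights, so a single outlier can inflate the scale and cost accuracy, whereas QAT simulates the same rounding during training so the network can adapt to it. A toy symmetric INT8 round-trip, not the article's benchmark code:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8 post-training quantization (toy sketch)."""
    scale = np.abs(w).max() / 127.0  # the largest magnitude maps to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

w = np.random.randn(1000).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = q.astype(np.float32) * scale   # dequantize
print("max round-trip error:", np.abs(w - w_hat).max())  # bounded by scale / 2
```

Note how the error bound scales with the tensor's maximum magnitude: one outlier weight widens the quantization step for every other weight, which is one way a "3% accuracy drop" creeps in after PTQ.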

Paper Review: QLoRA — 4-Bit Finetuning That Runs 65B Models on One GPU

The Memory Wall Problem: Finetuning a 65-billion-parameter LLM requires roughly 780GB of...

Learn More · Feb 11
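One plausible breakdown behind the ~780GB figure, assuming fp16 weights and gradients plus two fp32 Adam moment tensors, the accounting commonly used in this line of work. The numbers below are back-of-envelope, not taken from the article:

```python
params = 65e9
gb = 1e9  # decimal gigabytes, for round numbers

weights_fp16 = params * 2 / gb       # 130 GB of fp16 weights
grads_fp16   = params * 2 / gb       # 130 GB of fp16 gradients
adam_fp32    = params * 4 * 2 / gb   # 520 GB: two fp32 Adam moment tensors
print(weights_fp16 + grads_fp16 + adam_fp32)  # ~780 GB, matching the teaser

weights_4bit = params * 0.5 / gb     # ~32.5 GB for a 4-bit-quantized base model
print(weights_4bit)
```

Quantizing the frozen base model to 4 bits and training only small adapters is what lets this fit on a single high-memory GPU.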