Articles by Tag #llminference

Browse our collection of articles on various topics related to IT technologies. Dive in and explore something new!

TorchAO vs ONNX Runtime: 8-bit Quantization Benchmark

TorchAO Just Beat ONNX Runtime on My M1 MacBook (And I Didn't Expect It). I ran the same...

Feb 22

HeMA-MISO: Heterogeneous Memory Architecture for LLM Inference with SW Optimization

Note: This research was conducted in the first half of 2025. Some information may be outdated at the...

Sep 27 '25

Fine-Tuning and Inference in LLMs: From Custom Models to Production Deployment

As organizations move beyond using pre-trained models, two critical concepts become essential:...

Jan 13

Paper Review: GQA — Grouped Query Attention for Faster LLM Inference

Why Multi-Head Attention Has a Memory Problem. Here's a number that might surprise you: in...

Feb 9

Speculative Decoding: Why 2x Faster Inference Fails

The Promise That Breaks Under Load. Speculative decoding claims to make LLM inference 2-3x...

Mar 3