Hi, folks! I’d like to share a project idea I'm working on and hear your thoughts: would it be useful to developers, researchers, or anyone dealing with mathematics and AI?
The Core Problem
If you’ve ever tried to prepare a complex math paper or a collection of formulas for an LLM (for example, Claude AI), you’ve probably noticed these pains:
- Massive “fluff” and dense formulas. Simple LaTeX documents balloon into huge blocks of text that the model struggles to “digest” without wasting tokens.
- Poor formula extraction accuracy. Automated converters often miss nuances—fractions, subscripts, nonstandard symbols.
- High computational costs. To get acceptable accuracy (say, above 95%), you need a lot of memory and time, especially if you’re running locally.
- Integration headaches with AI tools. What works well on PDF might not suit a plain-text API, and vice versa.
My project addresses these issues by combining TRIZ methodology with smart templates tailored to Claude.
What Is the “Mathematical Knowledge Base”?
In essence, it’s a system that lets you:
- Upload documents (Markdown, PDF, LaTeX, TXT) and automatically classify formulas as “simple,” “standard,” or “complex.”
- Run formulas through a multilayered pipeline, where:
  - ≈60% of formulas (the simplest ones) go through a lightweight template engine (<100 ms, <1 MB RAM)
  - ≈30% are processed via a hybrid approach (local AI + templates)
  - ≈8% are “heavy” formulas sent to a full-fledged AI module with subsequent verification
  - ≈2% go into a manual review queue for guaranteed 99.9% accuracy
- Export results in a format optimized for Claude AI. Our TRIZ-guided templates achieve 30–50% token savings without losing meaning.
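To make the classification-and-routing step concrete, here is a minimal Python sketch of how formulas might be dispatched to the four tiers. The scoring heuristics, thresholds, and function names are my own illustrative assumptions, not the project's actual classifier:

```python
import re

# Hypothetical complexity scoring for a LaTeX formula: counts of
# fractions, grouped sub/superscripts, heavy operators, and environments.
# All weights below are illustrative assumptions.
COMPLEX_COMMANDS = (r"\int", r"\sum", r"\prod", r"\lim", r"\partial")

def complexity_score(latex: str) -> int:
    score = 0
    score += latex.count("\\frac") * 2          # fractions
    score += len(re.findall(r"[_^]\{", latex))  # grouped sub/superscripts
    score += sum(latex.count(c) for c in COMPLEX_COMMANDS) * 3
    score += latex.count("\\begin") * 5         # environments (matrices, cases)
    return score

def route(latex: str) -> str:
    """Map a formula to a processing tier (thresholds are assumptions)."""
    s = complexity_score(latex)
    if s == 0:
        return "template"       # ~60%: lightweight template engine
    if s <= 4:
        return "hybrid"         # ~30%: local AI + templates
    if s <= 10:
        return "ai_verified"    # ~8%: full AI module + verification
    return "manual_review"      # ~2%: manual review queue
```

In this sketch, a plain expression like `a + b = c` scores zero and stays on the fast template path, while anything with environments or heavy operators escalates toward AI processing or manual review.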
How TRIZ Helps
- Segmentation principle (TRIZ-1). Most formulas are “familiar” and fit pre-hardcoded templates that process instantly.
- Prior action principle (TRIZ-10). We precompute patterns and metadata for frequently encountered constructs.
- “Removal” principle (TRIZ-2). Heavy computations are offloaded from the main pipeline—either to a local vector engine (ChromaDB) or to high-resource nodes in a queue.
The result: 96–98% accuracy, under 12 GB RAM, and under 20 minutes on a ~15,000-word document.
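The “prior action” principle (TRIZ-10) can be pictured as a table of patterns compiled once at startup, so matching frequent constructs at runtime costs almost nothing. A minimal sketch, where the specific patterns and compact rewrites are my own illustrative examples rather than the project's real template set:

```python
import re

# TRIZ-10 ("prior action"): patterns for frequent LaTeX constructs are
# precompiled once, so each formula only pays for cheap substitutions.
# These three rules are illustrative assumptions.
PRECOMPILED = [
    (re.compile(r"\\frac\{([^{}]+)\}\{([^{}]+)\}"), r"(\1)/(\2)"),
    (re.compile(r"\\sqrt\{([^{}]+)\}"), r"sqrt(\1)"),
    (re.compile(r"\\cdot"), "*"),
]

def apply_templates(latex: str) -> str:
    """Rewrite known constructs into a compact plain-text form."""
    out = latex
    for pattern, repl in PRECOMPILED:
        out = pattern.sub(repl, out)
    return out
```

Formulas that match nothing pass through unchanged, which is exactly what lets the template tier stay fast and cheap.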
Key Features
- Multithreaded Resource Orchestrator
  - Monitors CPU/RAM and redistributes tasks based on load.
  - Includes a fairness balancer to prevent heavy jobs from hogging the pipeline.
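A fairness balancer of the kind described could be sketched like this: heavy jobs are capped at a fixed share of the worker pool so light jobs are never starved. The class name, slot counts, and two-category split are assumptions for illustration, not the orchestrator's actual design:

```python
import threading

class FairnessBalancer:
    """Cap heavy jobs at `heavy_slots` of the pool (illustrative sketch)."""

    def __init__(self, total_slots: int = 4, heavy_slots: int = 1):
        self._lock = threading.Lock()
        self.total_slots = total_slots
        self.heavy_slots = heavy_slots
        self.active = {"light": 0, "heavy": 0}

    def try_acquire(self, kind: str) -> bool:
        # Admit a job only if the pool has room and, for heavy jobs,
        # the heavy-job quota is not already exhausted.
        with self._lock:
            used = self.active["light"] + self.active["heavy"]
            if used >= self.total_slots:
                return False
            if kind == "heavy" and self.active["heavy"] >= self.heavy_slots:
                return False
            self.active[kind] += 1
            return True

    def release(self, kind: str) -> None:
        with self._lock:
            self.active[kind] -= 1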
- Adaptive Formula-Processing Pipeline
  - Automatic classification: simple/standard/complex/expert.
  - Hybrid approach: templates + local AI (OpenHermes via Ollama) + manual check.
- Knowledge Catalog with TRIZ Meta-Information
  - Auto-generation of cross-references between formulas and documents.
  - TRIZ-based contradiction hunting in mathematical concepts, highlighting potential research angles.
- Claude-Optimized Export
  - Multiple compression tiers (Ultra-Compact, Compact, Standard, Detailed) with different context-preservation levels:
    - Ultra-Compact (~890 tokens) → quick tasks, ~94% success
    - Compact (~1,680 tokens) → standard tasks, ~97% success
    - Standard (~2,340 tokens) → balanced, ~97% success
    - Detailed (~3,240 tokens) → heavy computations, ~96% success
  - Token-count accuracy control (±5% margin).
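Tier selection under the ±5% token margin might look like the following sketch. The token estimates come from the tiers listed above; the selection rule and function name are my own assumptions:

```python
# Export tiers and their approximate token costs (figures from the
# tier list above; the selection logic itself is an assumption).
TIERS = [
    ("ultra_compact", 890),
    ("compact", 1680),
    ("standard", 2340),
    ("detailed", 3240),
]

def pick_tier(token_budget: int, margin: float = 0.05) -> str:
    """Pick the richest tier whose estimate fits budget * (1 + margin)."""
    best = TIERS[0][0]  # fall back to the smallest tier
    for name, tokens in TIERS:
        if tokens <= token_budget * (1 + margin):
            best = name
    return best
```

With a 2,000-token budget, for example, Compact (~1,680 tokens) fits inside the 5% margin while Standard (~2,340) does not, so the exporter would choose Compact.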
- Streamlit Interface
  - A web dashboard for uploading docs, monitoring resources, viewing the knowledge graph, and previewing exports.
  - Real-time view: see how each formula is processed, how much memory is used, and how much time remains.
Who Might Benefit?
- Researchers and Math Instructors facing the familiar challenge of converting LaTeX sources into formats AI assistants can handle.
- AI Developers working with Claude or other LLMs—save tokens and reduce “noise” in the prompt.
- Research Institutes that maintain dense collections of math papers: automatic generation of a TRIZ-informed knowledge map.
- Students and Grad Students needing fast AI feedback on their conjectures and proofs.
I Want Your Feedback
Sure, partial solutions exist, but my focus is specifically on TRIZ contradictions and local optimization for Claude.
- How often do you hit the “formula overload” problem when working with an LLM?
- How important are token savings to you, so you’re not paying for extraneous “fluff”?
- Would you find a UI that prioritizes formula types and recommends the right level of detail helpful?
Please share in the comments: how useful would such a system be for you? What requirements or ideas do you already have for working with mathematical docs and AI? If you’ve used similar tools before, let me know what you liked or didn’t like!
Thanks for reading—looking forward to your thoughts in the comments!