Cost Amplification Score: Make cost behavior observable

2025-02-25 · Quantlix Team

Retries and fallbacks can silently multiply your inference cost. A request that succeeds on the third try ran three inference jobs — but you might only see one line item. The Cost Amplification Score makes that visible.

What it shows

For each completed job, the Decision Trace includes a Cost Amplification Score when usage data exists:

Base cost: The cost of a single successful inference run (€).

Retry cost: Additional cost from orchestrator retries. Each retry runs inference again; this shows the estimated cost of those extra runs.

Fallback cost: When the primary model fails and a fallback model succeeds, both run. This shows the cost attributed to the fallback run.

Total amplification factor: How many inference runs this request effectively consumed. 1.0x means one run; 3.0x means three runs (e.g. one retry plus a fallback).

Example

A job that succeeded after two retries might show:

Base cost: €0.0002

Retry cost: €0.0004

Fallback cost: €0.0000

Total amplification factor: 3.0x

That tells you: this request ran three times. The base cost is one third of the total; the retry cost is two thirds. If retries are frequent, you can investigate timeouts or model stability.
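The arithmetic behind that reading can be checked with a short Python sketch. The helper name `read_score` is hypothetical and not part of any Quantlix API. Recovering the factor as total cost divided by base cost is an assumption that holds only when every run is costed equally.

```python
from decimal import Decimal


def read_score(base_cost: str, retry_cost: str, fallback_cost: str) -> dict:
    """Hypothetical helper: derive the total and the shares from displayed costs.

    Costs are passed as strings and parsed as Decimal so that small euro
    amounts are not distorted by binary floating-point rounding.
    """
    base = Decimal(base_cost)
    retry = Decimal(retry_cost)
    fallback = Decimal(fallback_cost)
    total = base + retry + fallback
    return {
        "total_cost": total,
        # Assumption: every run costs the same, so total / base = number of runs.
        "amplification_factor": total / base,
        "base_share": base / total,
        "retry_share": retry / total,
        "fallback_share": fallback / total,
    }


# Values copied from the example above.
score = read_score("0.0002", "0.0004", "0.0000")
print(f"{score['amplification_factor']:.1f}x")  # prints 3.0x
```

With the example values, the base share comes out at one third and the retry share at two thirds, matching the reading above.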

How it's calculated

We estimate cost from tokens, CPU compute seconds, and GPU seconds using standard rates (€10 per 1M tokens, €0.36 per CPU hour, €0.50 per GPU hour). Total inference runs = 1 + retry_count + (1 if fallback else 0). Cost per run = total cost / total runs. Base, retry, and fallback costs are derived from that split, so every run is treated as costing the same.
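As a rough illustration, the calculation could be sketched in Python as follows. The names `estimate_total_cost`, `cost_amplification`, and `CostAmplification` are hypothetical and not taken from the Quantlix codebase. The sketch also assumes that the usage figures cover all runs of the request combined, and that the split assigns one equal share to the base run, one to each retry, and one to the fallback.

```python
from dataclasses import dataclass

# Rates quoted in the text, converted to per-unit values.
EUR_PER_TOKEN = 10.0 / 1_000_000   # €10 per 1M tokens
EUR_PER_CPU_SECOND = 0.36 / 3600   # €0.36 per CPU hour
EUR_PER_GPU_SECOND = 0.50 / 3600   # €0.50 per GPU hour


@dataclass
class CostAmplification:
    """Hypothetical container for the four values shown in the score."""
    base_cost: float
    retry_cost: float
    fallback_cost: float
    amplification_factor: float


def estimate_total_cost(tokens: int, cpu_seconds: float, gpu_seconds: float) -> float:
    """Estimate the total cost in euros from usage summed over all runs."""
    return (
        tokens * EUR_PER_TOKEN
        + cpu_seconds * EUR_PER_CPU_SECOND
        + gpu_seconds * EUR_PER_GPU_SECOND
    )


def cost_amplification(total_cost: float, retry_count: int, used_fallback: bool) -> CostAmplification:
    """Split the total cost evenly across runs, as described in the text."""
    total_runs = 1 + retry_count + (1 if used_fallback else 0)
    cost_per_run = total_cost / total_runs
    return CostAmplification(
        base_cost=cost_per_run,
        # Assumption: each retry is attributed one equal share.
        retry_cost=cost_per_run * retry_count,
        # Assumption: the fallback run is attributed one equal share.
        fallback_cost=cost_per_run if used_fallback else 0.0,
        amplification_factor=float(total_runs),
    )


# A job with a total estimated cost of €0.0006 that needed two retries.
result = cost_amplification(total_cost=0.0006, retry_count=2, used_fallback=False)
print(f"{result.amplification_factor:.1f}x")  # prints 3.0x
```

Because the split is even, the amplification factor in this sketch is always a whole number of runs. A cost-weighted variant would need per-run usage data, which the text does not describe.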

Why it matters

Cost behavior should be observable, not abstract. When you see a 2x or 3x amplification factor, you know retries or fallback are driving cost. You can then optimize model reliability, tune timeouts, or adjust fallback config — with data, not guesswork.