Decision Trace View: Per-request observability
When debugging inference runs, you need more than a job ID and a status. You need to see what happened: which model ran, whether fallback kicked in, how long each stage took, and what the guardrails decided. That's what the Decision Trace View gives you.
What it shows
Every completed inference job has a trace. Click a job in your dashboard to open it. The trace includes:
Input summary — Keys and a truncated preview of the prompt or input data.
Model — The deployment's model_id that was used for this run.
Fallback triggered — Whether the primary model failed and a fallback model succeeded. Configure fallback via deployment config.
Retry count — How many times the orchestrator retried inference before succeeding, typically after transient failures such as timeouts.
Latency breakdown — Queue time, inference time, and guardrail time in milliseconds. Helps you see where time is spent.
Token amplification — Input tokens, output tokens, and the output/input ratio. Useful for understanding generation length.
Guardrail outcome — Whether the output was blocked, allowed, or flagged for review. Includes policy action and any flags.
Cost — Tokens used, compute seconds, and GPU seconds from the usage record.
Cost Amplification Score — When retries or fallback add cost, the trace shows the breakdown. See the separate article on Cost Amplification Score.
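To make the fields above concrete, here is a minimal sketch of inspecting a trace payload in Python. The field names and values are illustrative assumptions, not the documented schema; consult the actual trace JSON for your deployment.

```python
# Illustrative trace payload; field names here are assumptions, not a guaranteed schema.
trace = {
    "model_id": "example-model-v1",          # hypothetical model_id
    "fallback_triggered": False,
    "retry_count": 1,
    "latency_ms": {"queue": 120, "inference": 2350, "guardrail": 45},
    "tokens": {"input": 512, "output": 1536},
}

def amplification_ratio(trace: dict) -> float:
    """Output/input token ratio, as surfaced in the Token amplification field."""
    tokens = trace["tokens"]
    return tokens["output"] / tokens["input"]

# Total end-to-end latency is the sum of the breakdown stages.
total_latency_ms = sum(trace["latency_ms"].values())

print(amplification_ratio(trace))  # 3.0 for this example payload
print(total_latency_ms)            # 2515
```

Summing the latency breakdown and computing the token ratio like this is a common first step when deciding whether queue time, generation length, or guardrails dominate a slow or expensive request.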
API
You can fetch the trace programmatically:
GET /jobs/{job_id}/trace

Returns a JSON object with all trace fields. Use it for dashboards, alerting, or post-run analysis.
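A minimal sketch of fetching a trace with Python's standard library, assuming a bearer-token auth header and a placeholder API host (both are assumptions; substitute your deployment's actual host and auth scheme):

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder host; replace with your API endpoint

def trace_url(job_id: str) -> str:
    """Build the trace endpoint URL for a given job ID."""
    return f"{BASE_URL}/jobs/{job_id}/trace"

def fetch_trace(job_id: str, api_key: str) -> dict:
    """GET the job's trace and return the parsed JSON object."""
    req = urllib.request.Request(
        trace_url(job_id),
        headers={"Authorization": f"Bearer {api_key}"},  # auth scheme is an assumption
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

From here you could feed the returned dict into a dashboard, or alert when, for example, fallback_triggered is true or retry_count exceeds a threshold.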
Why it matters
Inference isn't a black box. Retries, fallbacks, and guardrails all affect latency and cost. The Decision Trace View makes that visible so you can tune your deployment, debug failures, and understand cost behavior per request.