Architecture fit
Integrations
Quantlix sits in the request path between your application and model providers. This page lists only integrations that ship in code today — gateway routes, provider adapters, workflow surfaces, and observability export paths you can verify before a pilot.
Drop-in gateway
Use Quantlix as the base URL for OpenAI- or Anthropic-compatible clients. Every route below runs policy enforcement, metering, and trace capture before provider inference.
POST /v1/chat/completions
OpenAI-compatible chat, tools, and streaming. Policies and traces run in the request path.
POST /v1/messages
Anthropic Messages API proxy when the deployment targets an Anthropic provider model.
POST /v1/embeddings
OpenAI-compatible embeddings for knowledge bases and retrieval pipelines.
GET /v1/models
List models available on the bound deployment inference target.
POST /run
Deployment inference for provider-backed or self-hosted workloads, with policy enforcement before execution.
Gateway routes require X-Quantlix-Deployment-Id so the runtime knows which provider model and policies to apply.
Model providers
Provider adapters are registered in api/gateway/providers/registry.py. Connect credentials in Dashboard → Providers, sync models, then bind a deployment inference target.
OpenAI
openai
Chat and general inference via OpenAIAdapter.
Anthropic
anthropic
Claude chat workloads; also served through POST /v1/messages when bound to an Anthropic target.
Azure OpenAI
azure_openai
Same OpenAI-compatible adapter with Azure base URL and API key configuration.
AWS Bedrock
bedrock
Model access through your AWS Bedrock configuration.
Groq
groq
OpenAI-compatible chat provider.
Together AI
together
OpenAI-compatible chat provider.
Voyage AI
voyage_ai
Embeddings for knowledge bases and document pipelines — not a chat provider.
demo_embeddings is a local deterministic embeddings adapter for onboarding demos — not a production vendor integration.
Provider setup guide →Internal and self-hosted models
Not every workload routes through a cloud API. Quantlix also runs container and bundle artifacts you build and host in your own cluster.
- Provider-backed deployments bind directly to a provider model — no container build required.
- qx-example and other demo models run immediately for onboarding and tests.
- GitHub-sourced container_image and model_bundle workloads build through the orchestrator and run via POST /run.
- GPU inference schedules on cluster nodes labeled quantlix.com/gpu=true when deployment config requests GPU.
RAG and agents
Retrieval and agent behavior run inside workflows — each step is traced, policy-checked, and linked to the same run timeline as gateway traffic.
Workflow retrieval
retrieval, rerank, and answer_with_citations nodes query knowledge bases and record citations in the trace.
Vector backends
pgvector (built into Postgres), Pinecone, Weaviate, and Qdrant — configured per vector index.
RAG API
POST /retrieval/query for semantic search; POST /rag/run for retrieve-and-answer with citations.
Agent nodes
Bounded native tool-calling loops against a provider-backed chat deployment. Tools come from function, tool_call, and mcp_input nodes.
Observability export
Quantlix records runs in its own observability store and can link enforcement events into the tracing and alerting tools you already operate.
Trace propagation
Send W3C traceparent, Datadog, or B3 headers on /run and gateway routes. Quantlix stores parent trace context on enforcement events.
Response linking
Responses include X-Quantlix-Request-ID. Configure integrations.trace_url_template to deep-link into Datadog, Honeycomb, or your own trace viewer.
OpenTelemetry export
When OTEL_ENABLED is set, enforcement spans export over OTLP HTTP with parent trace linkage attributes.
Webhooks and audit export
Per-deployment webhook_url for blocked or warned policy events. Audit bundles export signed run evidence for review outside the portal.
Portal observability
Runs, trace spans, node outputs, citations, agent steps, latency, and cost estimates in Dashboard → Observability.
Integration effort
For provider-backed chat, the smallest path is a base-URL swap — your SDK and request shapes stay the same; Quantlix adds the runtime layer in front.
- Point your OpenAI or Anthropic SDK base URL at the Quantlix API host.
- Add your Quantlix API key as the provider API key.
- Set X-Quantlix-Deployment-Id to the deployment that binds your provider model and policies.
- Keep your existing request bodies — policies, redaction, and traces attach in the request path.
For multi-step RAG or agent flows, compose a workflow and execute it via POST /workflows/{workflow_id}/execute instead of calling providers directly.