Architecture fit

Integrations

Quantlix sits in the request path between your application and model providers. This page lists only integrations that ship in code today — gateway routes, provider adapters, workflow surfaces, and observability export paths you can verify before a pilot.

Drop-in gateway

Use Quantlix as the base URL for OpenAI- or Anthropic-compatible clients. Every route below runs policy enforcement, metering, and trace capture before provider inference.

POST /v1/chat/completions

OpenAI-compatible chat, tools, and streaming. Policies and traces run in the request path.

POST /v1/messages

Anthropic Messages API proxy when the deployment targets an Anthropic provider model.

POST /v1/embeddings

OpenAI-compatible embeddings for knowledge bases and retrieval pipelines.

GET /v1/models

List models available on the bound deployment inference target.

POST /run

Deployment inference for provider-backed or self-hosted workloads, with policy enforcement before execution.

Gateway routes require X-Quantlix-Deployment-Id so the runtime knows which provider model and policies to apply.

Model providers

Provider adapters are registered in api/gateway/providers/registry.py. Connect credentials in Dashboard → Providers, sync models, then bind a deployment inference target.

OpenAI

openai

Chat and general inference via OpenAIAdapter.

Anthropic

anthropic

Claude chat workloads; also served through POST /v1/messages when bound to an Anthropic target.

Azure OpenAI

azure_openai

Same OpenAI-compatible adapter with Azure base URL and API key configuration.

AWS Bedrock

bedrock

Model access through your AWS Bedrock configuration.

Groq

groq

OpenAI-compatible chat provider.

Together AI

together

OpenAI-compatible chat provider.

Voyage AI

voyage_ai

Embeddings for knowledge bases and document pipelines — not a chat provider.

demo_embeddings is a local deterministic embeddings adapter for onboarding demos — not a production vendor integration.

Provider setup guide →

Internal and self-hosted models

Not every workload routes through a cloud API. Quantlix also runs container and bundle artifacts you build and host in your own cluster.

  • Provider-backed deployments bind directly to a provider model — no container build required.
  • qx-example and other demo models run immediately for onboarding and tests.
  • GitHub-sourced container_image and model_bundle workloads build through the orchestrator and run via POST /run.
  • GPU inference schedules on cluster nodes labeled quantlix.com/gpu=true when deployment config requests GPU.
Self-hosted setup guide →

RAG and agents

Retrieval and agent behavior run inside workflows — each step is traced, policy-checked, and linked to the same run timeline as gateway traffic.

Workflow retrieval

retrieval, rerank, and answer_with_citations nodes query knowledge bases and record citations in the trace.

Vector backends

pgvector (built into Postgres), Pinecone, Weaviate, and Qdrant — configured per vector index.

RAG API

POST /retrieval/query for semantic search; POST /rag/run for retrieve-and-answer with citations.

Agent nodes

Bounded native tool-calling loops against a provider-backed chat deployment. Tools come from function, tool_call, and mcp_input nodes.

Observability export

Quantlix records runs in its own observability store and can link enforcement events into the tracing and alerting tools you already operate.

Trace propagation

Send W3C traceparent, Datadog, or B3 headers on /run and gateway routes. Quantlix stores parent trace context on enforcement events.

Response linking

Responses include X-Quantlix-Request-ID. Configure integrations.trace_url_template to deep-link into Datadog, Honeycomb, or your own trace viewer.

OpenTelemetry export

When OTEL_ENABLED is set, enforcement spans export over OTLP HTTP with parent trace linkage attributes.

Webhooks and audit export

Per-deployment webhook_url for blocked or warned policy events. Audit bundles export signed run evidence for review outside the portal.

Portal observability

Runs, trace spans, node outputs, citations, agent steps, latency, and cost estimates in Dashboard → Observability.

Observability guide →

Integration effort

For provider-backed chat, the smallest path is a base-URL swap — your SDK and request shapes stay the same; Quantlix adds the runtime layer in front.

  1. Point your OpenAI or Anthropic SDK base URL at the Quantlix API host.
  2. Add your Quantlix API key as the provider API key.
  3. Set X-Quantlix-Deployment-Id to the deployment that binds your provider model and policies.
  4. Keep your existing request bodies — policies, redaction, and traces attach in the request path.

For multi-step RAG or agent flows, compose a workflow and execute it via POST /workflows/{workflow_id}/execute instead of calling providers directly.

Quickstart →
Integrations — Quantlix — Quantlix