
Self-hosted setup

Run the Quantlix control plane and runtime infrastructure in your own environment when data residency and network boundaries require it. This guide covers the platform stack and how workload deployments are built and run inside your cluster.

When to self-host

Choose self-hosted when you need the API, portal, orchestrator, database, and supporting infrastructure to run inside your own environment, for example to meet strict data residency requirements, keep traffic on private networks, or operate the full stack on your own Kubernetes cluster.

Managed

Quantlix hosts the control plane. You configure providers, deployments, workflows, policies, users, and audit exports through the portal and API.

Self-hosted

You run the API, portal, orchestrator, database, and supporting infrastructure in your own environment. Inference for container workloads executes in your cluster; provider-backed routes still call external model APIs from your API layer.

Self-hosting the platform is separate from bring-your-own external models. The external_model workflow node keeps Quantlix policy, tracing, and output validation in the request path while HTTP calls go to a model you host elsewhere — without running the full Quantlix stack yourself. See Runtime Layer Extraction (BYO model).

Provider-backed deployments are another distinct path: bind a deployment to an OpenAI, Anthropic, or other provider model via an inference target. Quantlix routes /run through policy, then to the provider — no container build in your cluster. See Provider integrations.

What gets deployed

Self-hosting has two layers: the platform (Quantlix services in Kubernetes) and workload deployments (models and workflows you register through the API).

Platform stack

API

Control plane and runtime entry point. Handles deploy, run, policies, providers, workflows, and audit exports.

Portal

Dashboard for deployments, policies, providers, observability, and governance. Self-host it with Docker or host it separately (for example, on Vercel with NEXT_PUBLIC_API_URL pointing at your API).

Orchestrator

Consumes build and inference tasks from Redis. Builds GitHub-sourced artifacts, schedules inference jobs, and updates deployment status.

Inference

Runs container_image workloads in the cluster. GPU jobs schedule on nodes labeled quantlix.com/gpu=true when config.gpu is set.

PostgreSQL, Redis, MinIO

Database, task queue, and object storage. Required for API and orchestrator operation.

Production images are built and pushed for api, orchestrator, and inference, then applied via Kustomize overlays (the prod overlay at infra/kubernetes/overlays/prod, or the dev overlay for local k3d). The orchestrator patch in the overlay sets the inference image your cluster uses for scheduled jobs.

Workload build path (GitHub source)

When you deploy from GitHub, the API enqueues a build task. The orchestrator worker detects the build mode from the repository's Dockerfile, then takes one of two paths:

container_image — Kaniko builds and pushes an image; runtime_image is set to that reference.
model_bundle — source archive is stored in MinIO; artifact_uri points at the bundle location.

On success, deployment and revision records are updated and the deployment is marked ready. On failure, tasks retry with backoff; exhausted retries land in the build dead-letter queue.
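
The build flow above can be sketched roughly as follows. This is an illustration only, not the orchestrator's actual code: the rule that a present Dockerfile selects container_image, the retry budget, the exponential delay, and the names detect_build_mode and run_build_task are all assumptions made for the example.

```python
import time
from typing import Callable, List

MAX_ATTEMPTS = 3  # hypothetical retry budget; the real value is not documented here


def detect_build_mode(has_dockerfile: bool) -> str:
    """Assumed rule: a Dockerfile selects container_image, otherwise model_bundle."""
    return "container_image" if has_dockerfile else "model_bundle"


def run_build_task(
    build: Callable[[], str],
    dead_letter: List[dict],
    task: dict,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Run build() with exponential backoff between attempts.

    On success, return the fields a deployment record would receive.
    After the last failed attempt, append the task to dead_letter.
    """
    last_error = ""
    for attempt in range(MAX_ATTEMPTS):
        try:
            artifact_uri = build()
            return {"build_status": "succeeded", "artifact_uri": artifact_uri}
        except Exception as exc:  # a real worker would catch narrower errors
            last_error = str(exc)
            if attempt < MAX_ATTEMPTS - 1:
                sleep(base_delay * 2 ** attempt)  # 1s, 2s, ... between attempts
    dead_letter.append({**task, "build_error": last_error})
    return {"build_status": "failed", "build_error": last_error}
```

Passing sleep as a parameter keeps the sketch easy to exercise without real delays; a production worker would more likely rely on the queue's own retry scheduling.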

Artifact fields on deployment / revision

artifact_type — container_image (Kaniko build) or model_bundle (archive to MinIO)
artifact_uri — registry image reference or s3:// bundle location after a successful build
artifact_digest — git commit digest or sha256 bundle digest
runtime_image — digest-pinned image used for container_image deployments
build_status — queued, running, succeeded, or failed (GitHub-sourced deployments)
build_error and build_logs_ref — failure detail and Kaniko job reference when applicable
source_commit_sha — pinned commit from GitHub source
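
If you consume these fields from a client or script, a small typed record can make the allowed values explicit. The sketch below is a hypothetical client-side model, not a Quantlix SDK type: the class name ArtifactInfo, the decision to treat every field as optional, and the is_runnable check are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

BUILD_STATUSES = {"queued", "running", "succeeded", "failed"}
ARTIFACT_TYPES = {"container_image", "model_bundle"}


@dataclass
class ArtifactInfo:
    """Hypothetical client-side view of the artifact fields listed above."""

    artifact_type: Optional[str] = None
    artifact_uri: Optional[str] = None
    artifact_digest: Optional[str] = None
    runtime_image: Optional[str] = None
    build_status: Optional[str] = None
    build_error: Optional[str] = None
    build_logs_ref: Optional[str] = None
    source_commit_sha: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject values outside the documented enumerations.
        if self.artifact_type is not None and self.artifact_type not in ARTIFACT_TYPES:
            raise ValueError(f"unknown artifact_type: {self.artifact_type!r}")
        if self.build_status is not None and self.build_status not in BUILD_STATUSES:
            raise ValueError(f"unknown build_status: {self.build_status!r}")

    def is_runnable(self) -> bool:
        """Assumed check: the build succeeded and the matching reference is set."""
        if self.build_status != "succeeded":
            return False
        if self.artifact_type == "container_image":
            return bool(self.runtime_image)
        return bool(self.artifact_uri)
```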

GPU inference: set gpu in deployment config. Join dedicated GPU nodes to the cluster (for example Hetzner GEX44) with the quantlix.com/gpu=true label so jobs schedule correctly. The inference image must support CUDA when running GPU workloads.
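
One way to picture the GPU scheduling rule is as a pod-spec fragment derived from the deployment config. The sketch below assumes a plain Kubernetes nodeSelector; whether the orchestrator uses nodeSelector, node affinity, or something else is not stated here, and the function name inference_pod_scheduling is hypothetical.

```python
from typing import Any, Dict

GPU_NODE_LABEL = "quantlix.com/gpu"  # label key from this guide


def inference_pod_scheduling(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a pod-spec fragment that pins GPU jobs to labeled nodes.

    Assumption: a truthy config["gpu"] means the job needs a GPU node.
    """
    if not config.get("gpu"):
        return {}  # CPU jobs carry no scheduling constraint in this sketch
    return {"nodeSelector": {GPU_NODE_LABEL: "true"}}
```

A job created with this fragment stays unschedulable until at least one node carries the quantlix.com/gpu=true label, which is why labeling the GPU nodes is part of joining them to the cluster.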

Setup steps

1. Stand up the platform

Provision Kubernetes (Terraform on Hetzner, an existing cluster, or local k3d per docs/API_BUILD_DEPLOY.md).
Build and push quantlix-api, quantlix-orchestrator, and quantlix-inference images; update the Kustomize overlay for your registry.
Create namespace secrets (postgres, minio, api with jwt-secret and optional billing/email keys) and apply the overlay: kubectl apply -k infra/kubernetes/overlays/prod.
Deploy the portal with NEXT_PUBLIC_API_URL pointing at your API (Docker self-host or external host per docs/GO_LIVE.md).
Verify the install: run curl https://api.yourdomain.com/health, sign up, and confirm that the orchestrator and Redis pods are healthy.
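
The health check in the last step can also be scripted, for example as a post-deploy gate. This is a minimal sketch using only the Python standard library; it assumes that an HTTP 200 from /health means the API is up, which this guide does not spell out, and the function names are illustrative.

```python
import urllib.error
import urllib.request
from typing import Callable


def health_url(api_base: str) -> str:
    """Build the /health URL from the API base URL."""
    return api_base.rstrip("/") + "/health"


def api_is_healthy(
    api_base: str,
    opener: Callable = urllib.request.urlopen,
    timeout: float = 5.0,
) -> bool:
    """Return True when GET /health answers with HTTP 200 (assumed success signal)."""
    try:
        with opener(health_url(api_base), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection errors and non-2xx responses both count as unhealthy.
        return False
```

For example, api_is_healthy("https://api.yourdomain.com") could run in a CI step after kubectl apply. The opener parameter exists only so the function can be exercised without network access.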

2. Register workload deployments

Create or update a deployment via POST /deploy (portal Dashboard → Deploy or quantlix deploy). Pass deployment_id to update an existing deployment and create a new revision.
For GitHub-sourced deployments (source.provider == github): the API queues a build task, sets build_status to queued, and creates a DeploymentRevision. The orchestrator builds container_image (Kaniko) or model_bundle (MinIO), then writes artifact_uri, runtime_image, and related fields.
Wait for build_status to reach succeeded before calling /run on GitHub-sourced deployments; the run route returns 409 while the build is incomplete.
Provider-backed deployments (an InferenceTarget with target_type provider_model) skip container builds and can reach READY without a GitHub artifact.
Demo models (qx-example) and MOCK_K8S environments mark deployments READY immediately; other custom container workloads may stay PENDING until the orchestrator processes the first run.
Revisions snapshot model_id, config, source, and artifact metadata. Rollout strategies (immediate, canary, shadow) control how config updates apply.
Assign deployments to projects and set registry metadata (purpose, data_classification, lifecycle_stage) as part of deploy or deployment settings.
Policy, contracts, redaction, and budget gates still run on every /run request before provider inference or orchestrator-scheduled inference.
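
Because /run returns 409 until a GitHub-sourced build finishes, client code usually waits on build_status first. The sketch below shows one way to poll. How you read build_status (portal, API, or CLI) is left to the fetch_status callable you supply, since this guide does not name a specific status endpoint; the timeout and interval defaults are arbitrary.

```python
import time
from typing import Callable


class BuildFailed(Exception):
    """Raised when the deployment reports build_status == 'failed'."""


def wait_for_build(
    fetch_status: Callable[[], str],
    timeout: float = 600.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Block until fetch_status() returns 'succeeded'.

    fetch_status is caller-supplied and must return the deployment's
    current build_status: queued, running, succeeded, or failed.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status == "succeeded":
            return
        if status == "failed":
            raise BuildFailed("build failed; inspect build_error and build_logs_ref")
        if clock() >= deadline:
            raise TimeoutError(f"build still {status!r} after {timeout} seconds")
        sleep(interval)
```

Call /run only after wait_for_build returns. Treating a 409 from /run as a signal to resume waiting, rather than as a hard error, is a reasonable complement to this loop.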

To enable GitHub auto-redeploy, connect GitHub in the portal and configure source with owner, repo, branch, and the optional auto_redeploy flag. Pushes that match the configured source create a new revision, update source.commit_sha, and re-queue the build task.
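
As a rough illustration, a request body for POST /deploy with a GitHub source might be assembled as shown below. The field names (model_id, deployment_id, source.provider, owner, repo, branch, auto_redeploy, config.gpu) come from this page, but the exact nesting, required fields, and accepted types are assumptions; check the API reference for the real schema. The helper name github_deploy_payload is hypothetical.

```python
from typing import Any, Dict, Optional


def github_deploy_payload(
    model_id: str,
    owner: str,
    repo: str,
    branch: str,
    auto_redeploy: bool = False,
    gpu: bool = False,
    deployment_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble an assumed POST /deploy body for a GitHub-sourced deployment."""
    payload: Dict[str, Any] = {
        "model_id": model_id,
        "source": {
            "provider": "github",
            "owner": owner,
            "repo": repo,
            "branch": branch,
            "auto_redeploy": auto_redeploy,
        },
        # Only request a GPU when asked; an empty config means CPU here.
        "config": {"gpu": True} if gpu else {},
    }
    if deployment_id is not None:
        # Passing deployment_id updates an existing deployment and
        # creates a new revision instead of a new deployment.
        payload["deployment_id"] = deployment_id
    return payload
```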

What Quantlix stores vs what stays in your environment

In self-hosted mode you operate the infrastructure, but Quantlix still persists the same categories of control-plane data in your PostgreSQL (and object storage where applicable):

Organizations, teams, projects, deployments, providers, and workflow definitions.
Execution records, node executions, trace IDs, timing, status, and error payloads.
Policy decisions, enforcement events, budget outcomes, and audit export metadata.
Provider credentials encrypted before storage.

What stays in your environment: the Kubernetes cluster, inference containers built from your GitHub source, model bundles in your MinIO, and network paths between your applications and the API. Provider-backed and external_model calls still leave your boundary to reach third-party or separately hosted model endpoints — policy and traces are captured in Quantlix regardless.

Self-hosting shifts where data is stored and which network paths you control; it does not by itself guarantee a specific regulatory outcome. Map residency and subprocessors to your deployment configuration and provider choices.

Next steps

Architecture — request lifecycle, managed vs self-hosted, and data flow.
Provider integrations — connect OpenAI, Anthropic, Azure, and bind provider-backed inference targets.
Runtime Layer Extraction (BYO model) — run policy and tracing around a model you host elsewhere via the external_model node.