
Deploying models with GPU

Quantlix Team

GPU acceleration can significantly reduce latency for larger models. Here's how to enable it.

Enabling GPU deployments

GPU-backed deployments are available on supported sandboxes and Enterprise engagements. Enable a GPU deployment with:

quantlix deploy qx-example-gpu --gpu --api-key <your_api_key>
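
If you would rather not paste the key into your shell history, one option is to read it from an environment variable and assemble the same command in a script. The sketch below is a minimal Python wrapper. The variable name QUANTLIX_API_KEY and both helper names are assumptions made for illustration and are not part of the Quantlix CLI. It uses only the subcommand and flags shown above.

```python
import os
import subprocess


def build_deploy_command(deployment: str, api_key: str) -> list[str]:
    """Return the argv list for the GPU deploy command shown above."""
    return ["quantlix", "deploy", deployment, "--gpu", "--api-key", api_key]


def deploy_with_gpu(deployment: str) -> None:
    """Run the deploy command, reading the key from the environment.

    QUANTLIX_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    api_key = os.environ.get("QUANTLIX_API_KEY")
    if not api_key:
        raise RuntimeError("Set QUANTLIX_API_KEY before deploying")
    # check=True raises CalledProcessError if the CLI exits with a non-zero status.
    subprocess.run(build_deploy_command(deployment, api_key), check=True)
```

With the variable set, calling deploy_with_gpu("qx-example-gpu") should run the same command as above.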

RTX 4000 Ada (20GB)

We use NVIDIA RTX 4000 Ada GPUs with 20GB VRAM. GPU compute is billed at €0.50/hour.
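
To get a rough feel for what that rate means over time, here is a small cost estimate. It is a sketch under a simple assumption: cost scales linearly with hours at €0.50/hour. Billing granularity, minimum charges, and taxes are not covered here, so treat the result as an approximation. The function name is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hourly rate quoted above, in euros.
GPU_RATE_EUR_PER_HOUR = Decimal("0.50")


def estimate_gpu_cost(hours) -> Decimal:
    """Estimate GPU cost in euros, assuming simple linear hourly billing."""
    cost = Decimal(str(hours)) * GPU_RATE_EUR_PER_HOUR
    # Round to whole cents.
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(estimate_gpu_cost(24))   # one day of continuous use: 12.00
print(estimate_gpu_cost(720))  # thirty days of continuous use: 360.00
```

Under this assumption, a deployment that runs around the clock comes to about €12 per day.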

When to use GPU

  • Larger models that run too slowly or need too much memory on CPU
  • Latency-sensitive applications
  • Batch inference workloads