Deploying models with GPU
Quantlix Team
GPU acceleration can significantly reduce latency for larger models. Here's how to enable it.
Enabling GPU deployments
GPU-backed deployments are available on supported sandboxes and Enterprise engagements. Enable a GPU deployment with:
quantlix deploy qx-example-gpu --gpu --api-key <your_api_key>
RTX 4000 Ada (20GB)
We use NVIDIA RTX 4000 Ada GPUs with 20GB VRAM. GPU compute is billed at €0.50/hour.
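To get a rough sense of what the hourly rate means in practice, here is a minimal sketch of the arithmetic. The helper name `estimate_gpu_cost` is hypothetical and not part of the Quantlix CLI or API. It also assumes billing scales linearly with running time, which may not match how usage is actually metered or rounded on your invoice.

```python
# Hypothetical helper for rough budgeting; not part of the Quantlix CLI or API.
GPU_RATE_EUR_PER_HOUR = 0.50  # hourly rate quoted in this post


def estimate_gpu_cost(hours: float, rate: float = GPU_RATE_EUR_PER_HOUR) -> float:
    """Return the estimated cost in euros, assuming simple linear billing."""
    if hours < 0:
        raise ValueError("hours must be non-negative")
    return round(hours * rate, 2)


if __name__ == "__main__":
    # An 8-hour working day of GPU time.
    print(f"8 hours: €{estimate_gpu_cost(8):.2f}")
    # A deployment left running for 30 days (24 * 30 = 720 hours).
    print(f"30 days, always on: €{estimate_gpu_cost(24 * 30):.2f}")
```

Under these assumptions, an 8-hour day comes to €4.00 and a deployment left running for 30 days comes to €360.00, so it is worth checking whether a workload really needs to stay on a GPU around the clock.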
When to use GPU
- Larger models that don't fit well on CPU
- Latency-sensitive applications
- Batch inference workloads