Deploying models with GPU
Quantlix Team
GPU acceleration can significantly reduce latency for larger models. Here's how to enable it.
Pro plan GPU access
The Pro plan includes 2 hours of GPU compute per month. To deploy a model on GPU, pass the --gpu flag:
quantlix deploy qx-example-gpu --gpu --api-key <your_api_key>
RTX 4000 Ada (20GB)
We use NVIDIA RTX 4000 Ada GPUs with 20GB VRAM. On Pro, GPU hours beyond the included 2 hours cost €0.50 per hour.
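To get a feel for what the overage pricing means in practice, here is a minimal sketch that estimates the monthly cost of extra GPU hours on Pro. The function name and constants are illustrative and not part of any Quantlix tooling, and the sketch assumes fractional hours are billed proportionally, which may not match the actual billing granularity.

```python
# Figures taken from the Pro plan description above.
INCLUDED_GPU_HOURS = 2.0   # GPU hours included per month
OVERAGE_RATE_EUR = 0.50    # price per extra GPU hour


def gpu_overage_cost(hours_used: float) -> float:
    """Estimate the monthly overage cost in EUR for a given GPU usage.

    Assumption: usage beyond the included hours is billed
    proportionally (e.g. 0.5 extra hours costs half the hourly rate).
    """
    if hours_used < 0:
        raise ValueError("hours_used must be non-negative")
    extra_hours = max(0.0, hours_used - INCLUDED_GPU_HOURS)
    return round(extra_hours * OVERAGE_RATE_EUR, 2)


if __name__ == "__main__":
    for hours in (1.5, 2, 5, 10):
        print(f"{hours} GPU hours -> EUR {gpu_overage_cost(hours):.2f} overage")
```

Under these assumptions, staying within the included 2 hours costs nothing extra, and 10 GPU hours in a month would add €4.00.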
When to use GPU
- Larger models that don't fit well on CPU
- Latency-sensitive applications
- Batch inference workloads