
Deploying models with GPU

Quantlix Team

GPU acceleration can significantly reduce latency for larger models. Here's how to enable it.

Pro plan GPU access

The Pro plan includes 2 hours of GPU compute per month. To deploy a model on GPU, add the --gpu flag to the deploy command:

quantlix deploy qx-example-gpu --gpu --api-key <your_api_key>
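If you trigger deployments from a script, one option is to assemble the same command in Python. The sketch below is only an illustration: it uses just the arguments shown in the command above, the helper name build_deploy_command is hypothetical, and the QUANTLIX_API_KEY environment variable is an assumed convention for this example, not something the CLI is known to read.

```python
import os
import shlex
from typing import List


def build_deploy_command(model_id: str, api_key: str, gpu: bool = True) -> List[str]:
    """Build the argument list for `quantlix deploy` (hypothetical helper).

    Only the arguments that appear in the post are used:
    the model ID, --gpu, and --api-key.
    """
    cmd = ["quantlix", "deploy", model_id]
    if gpu:
        cmd.append("--gpu")
    cmd += ["--api-key", api_key]
    return cmd


if __name__ == "__main__":
    # QUANTLIX_API_KEY is an assumed variable name for this sketch.
    key = os.environ.get("QUANTLIX_API_KEY", "<your_api_key>")
    # Print the command rather than running it, so it can be reviewed first.
    print(shlex.join(build_deploy_command("qx-example-gpu", key)))
```

To actually run the command, the returned list could be passed to subprocess.run(cmd, check=True).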

RTX 4000 Ada (20GB)

GPU deployments run on NVIDIA RTX 4000 Ada GPUs with 20 GB of VRAM. On the Pro plan, GPU time beyond the 2 included hours costs €0.50 per hour.
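To get a rough idea of a monthly bill, the pricing above can be worked through in a few lines. This is a minimal sketch that assumes overage is billed linearly on the hours beyond the included 2; how partial hours are rounded is not stated here, so the function does no rounding. The function name is hypothetical.

```python
INCLUDED_HOURS = 2.0   # GPU hours included in the Pro plan per month
OVERAGE_RATE_EUR = 0.50  # price per extra GPU hour on Pro


def monthly_gpu_overage_eur(hours_used: float) -> float:
    """Estimate the extra GPU cost in euros for one month on the Pro plan.

    Assumes linear billing with no rounding of partial hours.
    """
    extra_hours = max(0.0, hours_used - INCLUDED_HOURS)
    return extra_hours * OVERAGE_RATE_EUR


if __name__ == "__main__":
    for hours in (1, 2, 5, 10):
        print(f"{hours} GPU hours -> €{monthly_gpu_overage_eur(hours):.2f} extra")
```

For example, 5 GPU hours in a month means 3 hours beyond the included 2, or €1.50 in extra charges under these assumptions.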

When to use GPU

- Larger models that don't fit well on CPU

- Latency-sensitive applications

- Batch inference workloads