Deploy a RAG model with Quantlix

This guide shows the production path for retrieval-augmented generation (RAG): connect a model provider, ingest documents, test retrieval, create a governed workflow, and inspect traces and audit evidence.

1. Choose the model provider

Connect the provider for the chat model you want to use for answer generation, such as Anthropic, OpenAI, Azure OpenAI, Bedrock, Groq, or Together.
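
If you script provider setup instead of using the dashboard, the call typically carries a provider type and a credential. A minimal sketch, assuming a local instance and a hypothetical /api/providers endpoint; the route, payload fields, and QUANTLIX_API_KEY variable are illustrative, not a documented Quantlix API:

    import os
    import requests

    QUANTLIX_URL = "http://localhost:8080"   # assumed local Quantlix instance
    HEADERS = {"Authorization": f"Bearer {os.environ['QUANTLIX_API_KEY']}"}

    # Register an Anthropic connection; the endpoint path and field names
    # here are assumptions for illustration.
    resp = requests.post(
        f"{QUANTLIX_URL}/api/providers",
        headers=HEADERS,
        json={
            "type": "anthropic",
            "name": "anthropic-prod",
            "api_key": os.environ["ANTHROPIC_API_KEY"],
        },
    )
    resp.raise_for_status()
    print(resp.json())   # expect the new provider connection's id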

2. Add an embedding provider

Add an embedding-capable provider, such as Voyage AI, or reuse another provider with a configured embedding model. Embeddings power semantic search over your documents.
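
Registering an embedding provider follows the same pattern, and a single test embedding confirms the connection works. The sketch continues the session above (QUANTLIX_URL, HEADERS); the /api/embeddings route and its fields are again assumptions:

    # Register Voyage AI for embeddings (hypothetical route and fields).
    requests.post(
        f"{QUANTLIX_URL}/api/providers",
        headers=HEADERS,
        json={
            "type": "voyage",
            "name": "voyage-embeddings",
            "api_key": os.environ["VOYAGE_API_KEY"],
        },
    ).raise_for_status()

    # Sanity check: embed one string and confirm a vector comes back.
    vec = requests.post(
        f"{QUANTLIX_URL}/api/embeddings",
        headers=HEADERS,
        json={"provider": "voyage-embeddings", "input": "hello world"},
    ).json()
    print(len(vec["embedding"]))   # vector dimensionality, e.g. 1024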

3. Create a deployment

Create a Quantlix deployment for the chat model, then bind it to the provider model you want Quantlix to govern.
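
Scripted, deployment creation might look like the sketch below, continuing the session from step 1. The /api/deployments route, field names, and model id are placeholders, not a confirmed schema:

    # Create a governed deployment bound to a synced provider model
    # (route, fields, and the model id are illustrative placeholders).
    resp = requests.post(
        f"{QUANTLIX_URL}/api/deployments",
        headers=HEADERS,
        json={
            "name": "rag-answerer",
            "provider": "anthropic-prod",
            "model": "claude-sonnet",   # any model synced from the provider
        },
    )
    resp.raise_for_status()
    deployment_id = resp.json()["id"]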

4. Create a knowledge base

In Dashboard → Knowledge, create a knowledge base. This is the container for sources, chunks, embeddings, and vector indexes.

5. Configure chunking and embeddings

Pick a chunking profile, embedding profile, and vector index. The default pgvector path is enough for local or early production testing.
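
If you create the knowledge base programmatically, steps 4 and 5 can collapse into one call. The sketch below invents a /api/knowledge-bases route and profile fields to show the shape of the decision, not an actual schema:

    # One call covering knowledge-base creation plus chunking, embedding,
    # and index profiles (all route and field names are assumptions).
    resp = requests.post(
        f"{QUANTLIX_URL}/api/knowledge-bases",
        headers=HEADERS,
        json={
            "name": "product-docs",
            "chunking": {"strategy": "recursive", "size": 512, "overlap": 64},
            "embedding": {"provider": "voyage-embeddings"},
            "index": {"backend": "pgvector"},   # default local/early-prod path
        },
    )
    resp.raise_for_status()
    kb_id = resp.json()["id"]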

6. Add documents or sources

Upload files, connect S3-compatible storage, or add web sources. Trigger ingestion so Quantlix can fetch, chunk, embed, and index the content.
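
As a sketch, assuming a multipart upload route and an /ingest action under the knowledge base created above (both hypothetical):

    # Upload a file into the knowledge base, then trigger ingestion.
    with open("handbook.pdf", "rb") as f:
        requests.post(
            f"{QUANTLIX_URL}/api/knowledge-bases/{kb_id}/documents",
            headers=HEADERS,
            files={"file": f},
        ).raise_for_status()

    # Kick off fetch, chunk, embed, and index for pending sources.
    requests.post(
        f"{QUANTLIX_URL}/api/knowledge-bases/{kb_id}/ingest",
        headers=HEADERS,
    ).raise_for_status()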

7. Test retrieval

Use the Knowledge area or retrieval API to ask a question and confirm relevant chunks are returned before adding model generation.
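
A retrieval-only smoke test might look like this; the /retrieve route, top_k parameter, and response fields are assumptions about the API's shape:

    # Ask a question against the index and inspect the returned chunks.
    hits = requests.post(
        f"{QUANTLIX_URL}/api/knowledge-bases/{kb_id}/retrieve",
        headers=HEADERS,
        json={"query": "What is our refund policy?", "top_k": 5},
    ).json()

    for h in hits["chunks"]:
        print(f"{h['score']:.3f}  {h['document']}  {h['text'][:80]}")

If the top hits are off-topic, revisit the chunking and embedding profiles before adding model generation.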

8. Build the RAG workflow

Use input → retrieval → rerank → answer_with_citations → output for evidence-backed workflow responses (see the recommended shape below). For full generated answers, use the RAG API or add a supported prompt-building step before a model node.

9. Add policies and budgets

Apply schema contracts, PII redaction, budget gates, and audit exports so the RAG model behaves predictably in production.
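
Policies are usually declarative configuration attached to the deployment. The sketch below guesses at a /policies route and schema purely to illustrate the kinds of controls named above:

    # Attach PII redaction, a monthly budget gate, and audit export to the
    # deployment (route and policy schema are assumptions).
    requests.post(
        f"{QUANTLIX_URL}/api/deployments/{deployment_id}/policies",
        headers=HEADERS,
        json={
            "pii_redaction": {"enabled": True, "entities": ["EMAIL", "PHONE"]},
            "budget": {"monthly_usd": 200, "on_exceed": "block"},
            "audit_export": {"enabled": True},
        },
    ).raise_for_status()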

10. Run, trace, and iterate

Run a test question, then inspect the trace: citations, retrieval quality, cost, latency, and model payloads. Add evals once you have a set of golden questions.
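
Traces are also useful programmatically, for example to watch per-step latency and cost in CI. The sketch assumes a run id returned when you execute a workflow and a hypothetical /api/traces route:

    run_id = "run_123"   # placeholder: id returned when you execute a run

    # Fetch the trace and print per-step latency and cost
    # (route and trace fields are assumptions).
    trace = requests.get(
        f"{QUANTLIX_URL}/api/traces/{run_id}",
        headers=HEADERS,
    ).json()
    for step in trace["steps"]:
        print(step["node"], step["latency_ms"], step["cost_usd"])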

Recommended workflow shape

For a simple evidence-backed RAG workflow, start with this graph and add controls as risk increases:

input
  → retrieval
  → rerank (optional)
  → answer_with_citations
  → output

This workflow returns cited evidence from retrieved chunks. If you need provider-generated prose, use the RAG API or build a prompt field that explicitly contains the retrieved context before calling a model node.
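
Declared as data, that graph might look like the sketch below. The node types match the shape above, but the workflow schema itself is a guess, not Quantlix's actual format; kb_id comes from the knowledge-base step earlier:

    # The recommended graph as a hypothetical workflow definition.
    workflow = {
        "name": "docs-qa",
        "nodes": [
            {"id": "input", "type": "input"},
            {"id": "retrieval", "type": "retrieval",
             "knowledge_base": kb_id, "top_k": 8},
            {"id": "rerank", "type": "rerank", "top_k": 4},   # optional
            {"id": "answer", "type": "answer_with_citations"},
            {"id": "output", "type": "output"},
        ],
        "edges": [
            ["input", "retrieval"],
            ["retrieval", "rerank"],
            ["rerank", "answer"],
            ["answer", "output"],
        ],
    }
    requests.post(
        f"{QUANTLIX_URL}/api/workflows", headers=HEADERS, json=workflow
    ).raise_for_status()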

Common questions

Do I need to train a model?

No. RAG usually means you keep the model general-purpose and retrieve your documents at request time. You update the knowledge base, not the model weights.

Can I use Claude, OpenAI, Azure OpenAI, Bedrock, Groq, or Together?

Yes. Connect the provider, sync models, create a provider-backed deployment, then use that deployment in the RAG workflow or RAG API.

Where are documents stored?

Documents and chunks are stored in Quantlix storage for the selected deployment mode. Embeddings are written to the configured vector backend, such as pgvector, Pinecone, Weaviate, or Qdrant.

How do citations work?

Retrieval returns chunks with document and source metadata. RAG answers can include citations when the answer generation path uses retrieved evidence.
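
For illustration only, a cited answer might carry chunk-level references shaped roughly like this (every field name here is an assumption, not Quantlix's response schema):

    # Hypothetical shape of an evidence-backed answer with citations.
    answer = {
        "text": "Refunds are issued within 14 days. [1]",
        "citations": [
            {
                "marker": "[1]",
                "document": "handbook.pdf",
                "chunk_id": "c_0042",
                "source": "s3://docs-bucket/handbook.pdf",
            }
        ],
    }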