RAG deployment
This guide shows the production path for retrieval-augmented generation: connect a model, ingest documents, test retrieval, create a governed workflow, and inspect traces and audit evidence.
1. Choose the model provider
Connect the chat model you want to use for answer generation, such as Anthropic, OpenAI, Azure OpenAI, Bedrock, Groq, or Together.
2. Add an embedding provider
Add an embedding-capable provider, such as Voyage AI or another configured embedding model. Embeddings power semantic search over your documents.
3. Create a deployment
Create a Quantlix deployment for the chat model, then bind it to the provider model you want Quantlix to govern.
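Provider connection and deployment creation are dashboard actions, but they can also be scripted. The sketch below assumes a hypothetical REST control plane: the /api/providers and /api/deployments paths, field names, and environment variables are illustrative, not the documented Quantlix API, so adapt them to the endpoints your installation exposes.

```python
import os

import requests

# Assumptions: base URL and bearer-token auth for a Quantlix control plane.
QUANTLIX_URL = os.environ.get("QUANTLIX_URL", "http://localhost:8080")
HEADERS = {"Authorization": f"Bearer {os.environ.get('QUANTLIX_API_KEY', '<api-key>')}"}


def connect_provider(name: str, kind: str, api_key: str) -> dict:
    """Register a chat or embedding provider (illustrative endpoint and fields)."""
    resp = requests.post(
        f"{QUANTLIX_URL}/api/providers",
        headers=HEADERS,
        json={"name": name, "kind": kind, "api_key": api_key},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()


def create_deployment(name: str, provider_model: str) -> dict:
    """Create a governed deployment bound to one provider model (illustrative endpoint)."""
    resp = requests.post(
        f"{QUANTLIX_URL}/api/deployments",
        headers=HEADERS,
        json={"name": name, "model": provider_model},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()


if __name__ == "__main__":
    connect_provider("anthropic", kind="chat", api_key=os.environ["ANTHROPIC_API_KEY"])
    connect_provider("voyage", kind="embedding", api_key=os.environ["VOYAGE_API_KEY"])
    create_deployment("rag-chat", provider_model="<your-chat-model>")
```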
4. Create a knowledge base
In Dashboard → Knowledge, create a knowledge base. This is the container for sources, chunks, embeddings, and vector indexes.
5. Configure chunking, embeddings, and indexing
Pick a chunking profile, embedding profile, and vector index. The default pgvector path is sufficient for local development and early production testing.
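A chunking profile is mostly a choice of chunk size and overlap. As a rough illustration of what a fixed-size profile does (Quantlix's actual profiles may split on tokens, sentences, or document structure rather than characters), a character-based chunker with overlap looks like this:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks with overlap so ideas are not cut off at chunk edges.

    Illustrative only; real chunking profiles often work on tokens or sentence boundaries.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece.strip():
            chunks.append(piece)
        if start + chunk_size >= len(text):
            break
    return chunks


# A 2,500-character document with 1,000-character chunks and 200 characters of
# overlap yields three chunks.
assert len(chunk_text("x" * 2500)) == 3
```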
6. Add sources and run ingestion
Upload files, connect S3-compatible storage, or add web sources. Trigger ingestion so Quantlix can fetch, chunk, embed, and index the content.
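Ingestion can also be triggered from a script, for example as part of a docs publishing pipeline. The endpoints, IDs, and status values below are assumptions in the same spirit as the earlier sketch, not the documented API:

```python
import time

import requests

QUANTLIX_URL = "http://localhost:8080"            # assumption: local control plane
HEADERS = {"Authorization": "Bearer <api-key>"}   # assumption: bearer-token auth
KB_ID = "kb_123"                                  # the knowledge base created above


def upload_and_ingest(path: str) -> None:
    """Upload one file into the knowledge base and wait for it to be indexed."""
    with open(path, "rb") as fh:
        resp = requests.post(
            f"{QUANTLIX_URL}/api/knowledge/{KB_ID}/sources",
            headers=HEADERS,
            files={"file": fh},
            timeout=60,
        )
    resp.raise_for_status()
    source_id = resp.json()["id"]

    # Kick off fetch -> chunk -> embed -> index for the new source.
    requests.post(
        f"{QUANTLIX_URL}/api/knowledge/{KB_ID}/ingest", headers=HEADERS, timeout=30
    ).raise_for_status()

    # Poll until the source reaches a terminal status.
    while True:
        status = requests.get(
            f"{QUANTLIX_URL}/api/knowledge/{KB_ID}/sources/{source_id}",
            headers=HEADERS,
            timeout=30,
        ).json()["status"]
        if status in ("indexed", "failed"):
            print(f"{path}: ingestion finished with status {status}")
            return
        time.sleep(5)
```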
7. Test retrieval
Use the Knowledge area or retrieval API to ask a question and confirm relevant chunks are returned before adding model generation.
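Before wiring up generation, a quick retrieval smoke test catches empty indexes and embedding misconfiguration. Again, the endpoint path and response fields are assumptions that illustrate the shape of the check:

```python
import requests

QUANTLIX_URL = "http://localhost:8080"            # assumption
HEADERS = {"Authorization": "Bearer <api-key>"}   # assumption
KB_ID = "kb_123"


def check_retrieval(question: str, top_k: int = 5) -> None:
    """Query the retrieval API and print the top chunks with their scores."""
    resp = requests.post(
        f"{QUANTLIX_URL}/api/knowledge/{KB_ID}/retrieve",
        headers=HEADERS,
        json={"query": question, "top_k": top_k},
        timeout=30,
    )
    resp.raise_for_status()
    chunks = resp.json().get("chunks", [])
    assert chunks, "no chunks returned -- check ingestion status and the embedding profile"
    for chunk in chunks:
        print(chunk.get("score"), chunk.get("document"), (chunk.get("text") or "")[:80])


check_retrieval("What is our refund policy?")
```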
8. Build the RAG workflow
Use input → retrieval → rerank → answer_with_citations → output for evidence-backed workflow responses. For full generated answers, use the RAG API or add a supported prompt-building step before a model node.
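Written out as a declarative definition, that graph is just an ordered set of nodes and edges. The structure below sketches the idea only; node types, IDs, and field names are assumptions rather than the actual Quantlix workflow schema:

```python
# Sketch of an evidence-backed RAG workflow graph (illustrative schema, not Quantlix's).
workflow = {
    "name": "docs-qa",
    "nodes": [
        {"id": "input", "type": "input"},
        {"id": "retrieve", "type": "retrieval", "knowledge_base": "kb_123", "top_k": 8},
        {"id": "rerank", "type": "rerank", "top_k": 4},  # optional
        {"id": "answer", "type": "answer_with_citations"},
        {"id": "output", "type": "output"},
    ],
    "edges": [
        ("input", "retrieve"),
        ("retrieve", "rerank"),
        ("rerank", "answer"),
        ("answer", "output"),
    ],
}
```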
9. Apply production controls
Apply schema contracts, PII redaction, budget gates, and audit exports so the RAG workflow behaves predictably in production.
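These controls are configured in Quantlix rather than written by hand, but the checks they apply are easy to reason about. The sketch below shows, in plain Python, the kind of logic behind a PII redaction step and a budget gate; the patterns and thresholds are illustrative only.

```python
import re

# Illustrative patterns; production redaction typically combines patterns with NER.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact_pii(text: str) -> str:
    """Replace detected PII spans with typed placeholders before text reaches the model."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED_{label.upper()}]", text)
    return text


def budget_gate(estimated_cost_usd: float, budget_remaining_usd: float) -> None:
    """Refuse to run a workflow step that would exceed the remaining budget."""
    if estimated_cost_usd > budget_remaining_usd:
        raise RuntimeError(
            f"budget gate: step costs ~${estimated_cost_usd:.4f}, "
            f"only ${budget_remaining_usd:.4f} remaining"
        )


print(redact_pii("Contact alice@example.com or +1 415 555 0100 about the invoice."))
budget_gate(estimated_cost_usd=0.002, budget_remaining_usd=1.50)
```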
10. Inspect traces and add evals
Run a question, then inspect traces, citations, retrieval quality, cost, latency, and model payloads. Add evals once you have a set of golden questions.
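Golden questions make retrieval regressions measurable even before full answer evals exist. A minimal recall@k check looks like the sketch below, where retrieve stands in for whatever retrieval client you use and the document IDs are illustrative:

```python
def recall_at_k(retrieved_ids: list[str], expected_ids: set[str]) -> float:
    """Fraction of expected documents that appear among the retrieved chunks."""
    if not expected_ids:
        return 1.0
    return len(expected_ids & set(retrieved_ids)) / len(expected_ids)


# Golden questions mapped to the documents a correct answer must draw on (illustrative data).
GOLDEN = {
    "What is our refund window?": {"doc_refund_policy"},
    "How do we rotate API keys?": {"doc_security_runbook"},
}


def run_eval(retrieve) -> float:
    """Average recall@k across golden questions; retrieve(question) returns chunk dicts."""
    scores = []
    for question, expected in GOLDEN.items():
        retrieved = [chunk["document_id"] for chunk in retrieve(question)]
        scores.append(recall_at_k(retrieved, expected))
    return sum(scores) / len(scores)
```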
For a simple evidence-backed RAG workflow, start with this graph and add controls as risk increases:
input → retrieval → rerank (optional) → answer_with_citations → output
This workflow returns cited evidence from retrieved chunks. If you need provider-generated prose, use the RAG API or build a prompt field that explicitly contains the retrieved context before calling a model node.
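If you take the prompt-building route, the step before the model node only has to fold the retrieved chunks into the prompt with citation markers. A minimal sketch follows; the text and source keys on each chunk are assumptions, so use whatever fields your retrieval step actually emits:

```python
def build_prompt(question: str, chunks: list[dict]) -> str:
    """Assemble a grounded prompt, numbering chunks so the model can cite [1], [2], ..."""
    context_lines = [
        f"[{i}] ({chunk.get('source', 'unknown')}) {chunk['text']}"
        for i, chunk in enumerate(chunks, start=1)
    ]
    return (
        "Answer the question using only the numbered context below, "
        "and cite the context numbers you rely on.\n\n"
        "Context:\n" + "\n".join(context_lines)
        + f"\n\nQuestion: {question}\nAnswer:"
    )
```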
Do I need to fine-tune the model on my documents?
No. RAG usually means you keep the model general-purpose and retrieve your documents at request time. You update the knowledge base, not the model weights.
Can I use my own model provider?
Yes. Connect the provider, sync models, create a provider-backed deployment, then use that deployment in the RAG workflow or RAG API.
Where are documents and embeddings stored?
Documents and chunks are stored in Quantlix storage for the selected deployment model. Embeddings are written to the configured vector backend, such as pgvector, Pinecone, Weaviate, or Qdrant.
Do answers include citations?
Retrieval returns chunks with document and source metadata. RAG answers can include citations when the answer generation path uses retrieved evidence.
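The exact response schema depends on your installation and API version, but a cited answer generally carries enough chunk metadata to link back to the source. An illustrative shape (field names are assumptions, not the documented schema):

```python
# Illustrative shape of a cited RAG answer; field names are assumptions.
example_response = {
    "answer": "Refunds are accepted within 30 days of purchase [1].",
    "citations": [
        {
            "marker": "[1]",
            "chunk_id": "chunk_42",
            "document": "refund-policy.pdf",
            "source": "s3://docs-bucket/policies/refund-policy.pdf",
            "score": 0.87,
        }
    ],
}

for citation in example_response["citations"]:
    print(citation["marker"], citation["document"], citation["source"])
```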