# Retrieval integrations: Vector backends and RAG
RAG (Retrieval-Augmented Generation) needs a vector store. Quantlix supports several backends: pgvector (built-in), Pinecone, Weaviate, and Qdrant. Choose the one that fits your scale and deployment model.
## Vector backends
| Backend | Type | Use case |
|---------|------|----------|
| **pgvector** | Built-in (Postgres) | Default; no extra setup. |
| **Pinecone** | Cloud | Fully managed; scales without self-hosting. |
| **Weaviate** | Local or cloud | Self-host (e.g. via Docker) or use its managed cloud. |
| **Qdrant** | Local or cloud | Self-host (e.g. via Docker) or use its managed cloud. |
Create vector indexes in **Dashboard → Knowledge → Vector indexes** and assign one to each knowledge base. pgvector works out of the box. For Pinecone, Weaviate, or Qdrant, run or subscribe to the service (e.g. via Docker for a local deployment) and configure its URL and credentials.
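For example, a self-hosted Qdrant instance can be started with Docker (a sketch using the upstream image and its default HTTP port; adjust names and ports to your environment):

```shell
# Start Qdrant locally; its HTTP API listens on port 6333 by default.
docker run -d --name qdrant -p 6333:6333 qdrant/qdrant
```

You would then point the vector index at `http://localhost:6333`, plus whatever credentials you configured for the instance.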
## Two ways to query
**Semantic search only** — `POST /retrieval/query` returns chunks. You get the relevant text chunks for a query; you decide what to do with them (e.g. pass to your own LLM).
```shell
curl -s -X POST "$API_URL/retrieval/query" \
  -H "X-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"knowledge_base_id": "'"$KB_ID"'", "query": "What is RAG?", "top_k": 5}' | python3 -m json.tool
```

**Full RAG** — `POST /rag/run` retrieves chunks and generates an answer with citations. You provide a chat model and a question; Quantlix does the rest.
```shell
curl -s -X POST "$API_URL/rag/run" \
  -H "X-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"knowledge_base_id": "'"$KB_ID"'", "provider_model_id": "'"$CHAT_MODEL_ID"'", "question": "What is RAG?", "top_k": 5}' | python3 -m json.tool
```

The response includes `answer`, `citations`, `model`, and token usage.
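The same two calls can be made from Python instead of curl. A minimal sketch using only the standard library; the request bodies mirror the documented fields, while everything else (variable names, the helper structure) is illustrative:

```python
import json
import urllib.request


def retrieval_payload(kb_id: str, query: str, top_k: int = 5) -> dict:
    """Request body for POST /retrieval/query (semantic search only)."""
    return {"knowledge_base_id": kb_id, "query": query, "top_k": top_k}


def rag_payload(kb_id: str, model_id: str, question: str, top_k: int = 5) -> dict:
    """Request body for POST /rag/run (retrieval plus generation)."""
    return {
        "knowledge_base_id": kb_id,
        "provider_model_id": model_id,
        "question": question,
        "top_k": top_k,
    }


def post(api_url: str, api_key: str, path: str, payload: dict) -> dict:
    """POST a JSON payload to the Quantlix API and return the parsed response."""
    req = urllib.request.Request(
        api_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Example (full RAG), with api_url/api_key/kb_id/chat_model_id set as in the
# shell examples above:
#   result = post(api_url, api_key, "/rag/run",
#                 rag_payload(kb_id, chat_model_id, "What is RAG?"))
#   print(result["answer"], result["citations"])
```

Separating the payload builders from the HTTP call keeps the request shapes easy to unit-test without a live API.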
## Try it in the portal
Add a knowledge base, add sources, upload documents, run ingestion. Then use the **Try RAG** modal on the knowledge base page to ask questions and see answers with citations.