# Retrieval integrations: Vector backends and RAG
RAG (Retrieval-Augmented Generation) needs a vector store. Quantlix supports several backends: pgvector (built-in), Pinecone, Weaviate, and Qdrant. Choose the one that fits your scale and deployment model.
## Vector backends
| Backend | Type | Use case |
|---------|------|----------|
| **pgvector** | Built-in (Postgres) | Default; no extra setup. |
| **Pinecone** | Cloud | Fully managed; scales without self-hosting. |
| **Weaviate** | Local or cloud | Self-host (e.g. via Docker) or use its managed cloud. |
| **Qdrant** | Local or cloud | Self-host (e.g. via Docker) or use its managed cloud. |
Create vector indexes in **Dashboard → Knowledge → Vector indexes** and assign one to each knowledge base. pgvector works out of the box. For Pinecone, Weaviate, or Qdrant, run or subscribe to the service (e.g. via Docker for a local deployment) and configure its URL and credentials.
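For example, a self-hosted Qdrant instance can be started with Docker (a sketch using the upstream image and its default HTTP port; adjust names and ports to your environment):

```shell
# Start Qdrant locally; its HTTP API listens on port 6333 by default.
docker run -d --name qdrant -p 6333:6333 qdrant/qdrant
```

You would then point the vector index at `http://localhost:6333`, plus whatever credentials you configured for the instance.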
## Two ways to query
**Semantic search only** — `POST /retrieval/query` returns chunks. You get the relevant text chunks for a query; you decide what to do with them (e.g. pass to your own LLM).
```shell
curl -s -X POST "$API_URL/retrieval/query" \
  -H "X-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"knowledge_base_id": "'"$KB_ID"'", "query": "What is RAG?", "top_k": 5}' | python3 -m json.tool
```

**Full RAG** — `POST /rag/run` retrieves chunks and generates an answer with citations. You provide a chat model and a question; Quantlix does the rest.
```shell
curl -s -X POST "$API_URL/rag/run" \
  -H "X-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"knowledge_base_id": "'"$KB_ID"'", "provider_model_id": "'"$CHAT_MODEL_ID"'", "question": "What is RAG?", "top_k": 5}' | python3 -m json.tool
```

The response includes `answer`, `citations`, `model`, and token usage.
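The same two calls can be made from Python instead of curl. A minimal sketch using only the standard library; the request bodies mirror the documented fields, while everything else (variable names, the helper structure) is illustrative:

```python
import json
import urllib.request


def retrieval_payload(kb_id: str, query: str, top_k: int = 5) -> dict:
    """Request body for POST /retrieval/query (semantic search only)."""
    return {"knowledge_base_id": kb_id, "query": query, "top_k": top_k}


def rag_payload(kb_id: str, model_id: str, question: str, top_k: int = 5) -> dict:
    """Request body for POST /rag/run (retrieval plus generation)."""
    return {
        "knowledge_base_id": kb_id,
        "provider_model_id": model_id,
        "question": question,
        "top_k": top_k,
    }


def post(api_url: str, api_key: str, path: str, payload: dict) -> dict:
    """POST a JSON payload to the Quantlix API and return the parsed response."""
    req = urllib.request.Request(
        api_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Example (full RAG), with api_url/api_key/kb_id/chat_model_id set as in the
# shell examples above:
#   result = post(api_url, api_key, "/rag/run",
#                 rag_payload(kb_id, chat_model_id, "What is RAG?"))
#   print(result["answer"], result["citations"])
```

Separating the payload builders from the HTTP call keeps the request shapes easy to unit-test without a live API.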
## Try it in the portal
Add a knowledge base, add sources, upload documents, run ingestion. Then use the **Try RAG** modal on the knowledge base page to ask questions and see answers with citations.