# Knowledge source configuration: Build RAG from upload, S3, or web
A knowledge base is a RAG container: it groups sources (where documents come from) and defines how they're chunked, embedded, and indexed. You configure it once, then ingest and query.
## The setup flow
1. **Provider** — Add a provider with embeddings (e.g. Voyage AI)
2. **Chunking profile** — Strategy (fixed, markdown, semantic), chunk size, overlap
3. **Embedding profile** — Which provider model to use for embeddings
4. **Vector index** — Backend (pgvector, Pinecone, Weaviate, Qdrant)
5. **Knowledge base** — Container with default profiles + vector index
6. **Sources** — Add upload, S3, or web source
7. **Ingestion** — Upload docs, trigger ingestion job
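To make the chunking parameters in step 2 concrete, here is a minimal sketch of the fixed strategy: split text into equal-size chunks where each chunk overlaps the start of the next. Sizes are in characters here for simplicity; the real profile may well measure in tokens, so treat the units as an assumption.

```python
def fixed_chunks(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks, with `overlap` characters shared
    between each chunk and the next (a sketch of the 'fixed' strategy)."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

doc = "a" * 1200
chunks = fixed_chunks(doc, size=500, overlap=50)
# Chunks start at 0, 450, 900; each shares its last 50 characters
# with the start of the next chunk.
```

Overlap exists so that a sentence straddling a chunk boundary still appears intact in at least one chunk, which improves retrieval recall at the edges.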
All of this lives in **Dashboard → Knowledge**.
## Source types
| Type | Use case |
|------|----------|
| **upload** | Files uploaded via portal or API. PDFs, markdown, etc. |
| **s3** | S3-compatible storage (MinIO, Scaleway, AWS). Bucket + path. |
| **web** | Fetch from a URL. Good for docs sites. |
Each source has a sync mode: **manual** (you trigger ingestion), **scheduled** (runs on an interval), or **webhook** (external system triggers via URL).
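Putting the three source types and sync modes together, source definitions might look like the payloads below. The field names are illustrative assumptions, not the documented API schema.

```python
# Hypothetical source payloads -- field names are assumptions, not the documented schema.
upload_source = {
    "type": "upload",
    "sync_mode": "manual",       # you trigger ingestion explicitly
}

s3_source = {
    "type": "s3",
    "bucket": "docs-bucket",     # S3-compatible: MinIO, Scaleway, AWS
    "path": "handbook/",
    "sync_mode": "scheduled",
    "interval_minutes": 60,      # runs on an interval
}

web_source = {
    "type": "web",
    "url": "https://docs.example.com",
    "sync_mode": "webhook",      # an external system triggers via URL
}
```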
## The pipeline
Ingestion runs: fetch documents → chunk → embed → index into vector store. Documents and chunks are stored in Postgres; embeddings go to the vector backend. Once ingested, you can query via the retrieval API or the RAG endpoint.
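The fetch → chunk → embed → index flow can be sketched end to end. Everything below is a stand-in: the chunker and embedder are stubs for the configured profiles, and a plain dict stands in for the vector backend.

```python
from typing import Callable

def ingest(documents: list[str],
           chunk: Callable[[str], list[str]],
           embed: Callable[[str], list[float]],
           index: dict[str, list[float]]) -> int:
    """Sketch of the pipeline: chunk each document, embed each chunk,
    write the vector into the index. Returns the number of chunks stored."""
    stored = 0
    for doc_id, doc in enumerate(documents):
        for n, piece in enumerate(chunk(doc)):
            index[f"{doc_id}:{n}"] = embed(piece)  # embeddings go to the vector backend
            stored += 1
    return stored

# Stubs standing in for the chunking profile and embedding model.
chunker = lambda text: [text[i:i + 100] for i in range(0, len(text), 100)]
embedder = lambda piece: [float(len(piece))]  # placeholder one-dimensional "vector"

store: dict[str, list[float]] = {}
count = ingest(["x" * 250, "y" * 90], chunker, embedder, store)
# 250 chars -> 3 chunks, 90 chars -> 1 chunk, so 4 vectors are stored.
```

In the real system the chunk text and metadata land in Postgres while only the vectors go to pgvector, Pinecone, Weaviate, or Qdrant; the dict here conflates the two for brevity.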
## Webhook for CI/CD
For sources that update from external systems (e.g. docs built in CI), use `sync_mode: "webhook"`. When you create the source, the API returns a webhook secret. When your docs change, call the webhook URL with that secret; no API key is needed. Quantlix triggers ingestion automatically.
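A CI job might notify the webhook along these lines. The URL shape and header name are assumptions; the actual values come back from the API when you create the source. The request is built but not sent here, so the sketch runs without a live endpoint.

```python
import urllib.request

# Assumed values -- the real webhook URL and secret are returned
# when you create the source; the header name is also an assumption.
webhook_url = "https://api.example.com/v1/sources/src_123/webhook"
secret = "whsec_example"

req = urllib.request.Request(
    webhook_url,
    method="POST",
    headers={"X-Webhook-Secret": secret},  # the secret authenticates the call; no API key
    data=b"{}",
)
# urllib.request.urlopen(req) would send it; ingestion then runs automatically.
```

Because the secret alone authenticates the call, the CI job never needs a full API key in its environment, which keeps the blast radius small if the pipeline leaks its variables.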