# Knowledge source configuration: Build RAG from upload, S3, or web
A knowledge base is a RAG container: it groups sources (where documents come from) and defines how they're chunked, embedded, and indexed. You configure it once, then ingest and query.
## The setup flow
1. **Provider** — Add a provider with embeddings (e.g. Voyage AI)
2. **Chunking profile** — Strategy (fixed, markdown, semantic), chunk size, overlap
3. **Embedding profile** — Which provider model to use for embeddings
4. **Vector index** — Backend (pgvector, Pinecone, Weaviate, Qdrant)
5. **Knowledge base** — Container with default profiles + vector index
6. **Sources** — Add upload, S3, or web source
7. **Ingestion** — Upload docs, trigger ingestion job
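To make the chunking parameters in step 2 concrete, here is a minimal sketch of the fixed strategy: split text into equal-size chunks where each chunk overlaps the start of the next. Sizes are in characters here for simplicity; the real profile may well measure in tokens, so treat the units as an assumption.

```python
def fixed_chunks(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks, with `overlap` characters shared
    between each chunk and the next (a sketch of the 'fixed' strategy)."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

doc = "a" * 1200
chunks = fixed_chunks(doc, size=500, overlap=50)
# Chunks start at 0, 450, 900; each shares its last 50 characters
# with the start of the next chunk.
```

Overlap exists so that a sentence straddling a chunk boundary still appears intact in at least one chunk, which improves retrieval recall at the edges.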
All of this lives in **Dashboard → Knowledge**.
## Source types
| Type | Use case |
|------|----------|
| **upload** | Files uploaded via portal or API. PDFs, markdown, etc. |
| **s3** | S3-compatible storage (MinIO, Scaleway, AWS). Bucket + path. |
| **web** | Fetch from a URL. Good for docs sites. |
Each source has a sync mode: **manual** (you trigger ingestion), **scheduled** (runs on an interval), or **webhook** (external system triggers via URL).
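Putting the three source types and sync modes together, source definitions might look like the payloads below. The field names are illustrative assumptions, not the documented API schema.

```python
# Hypothetical source payloads -- field names are assumptions, not the documented schema.
upload_source = {
    "type": "upload",
    "sync_mode": "manual",       # you trigger ingestion explicitly
}

s3_source = {
    "type": "s3",
    "bucket": "docs-bucket",     # S3-compatible: MinIO, Scaleway, AWS
    "path": "handbook/",
    "sync_mode": "scheduled",
    "interval_minutes": 60,      # runs on an interval
}

web_source = {
    "type": "web",
    "url": "https://docs.example.com",
    "sync_mode": "webhook",      # an external system triggers via URL
}
```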
## The pipeline
Ingestion runs: fetch documents → chunk → embed → index into vector store. Documents and chunks are stored in Postgres; embeddings go to the vector backend. Once ingested, you can query via the retrieval API or the RAG endpoint.
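The fetch → chunk → embed → index flow can be sketched end to end. Everything below is a stand-in: the chunker and embedder are stubs for the configured profiles, and a plain dict stands in for the vector backend.

```python
from typing import Callable

def ingest(documents: list[str],
           chunk: Callable[[str], list[str]],
           embed: Callable[[str], list[float]],
           index: dict[str, list[float]]) -> int:
    """Sketch of the pipeline: chunk each document, embed each chunk,
    write the vector into the index. Returns the number of chunks stored."""
    stored = 0
    for doc_id, doc in enumerate(documents):
        for n, piece in enumerate(chunk(doc)):
            index[f"{doc_id}:{n}"] = embed(piece)  # embeddings go to the vector backend
            stored += 1
    return stored

# Stubs standing in for the chunking profile and embedding model.
chunker = lambda text: [text[i:i + 100] for i in range(0, len(text), 100)]
embedder = lambda piece: [float(len(piece))]  # placeholder one-dimensional "vector"

store: dict[str, list[float]] = {}
count = ingest(["x" * 250, "y" * 90], chunker, embedder, store)
# 250 chars -> 3 chunks, 90 chars -> 1 chunk, so 4 vectors are stored.
```

In the real system the chunk text and metadata land in Postgres while only the vectors go to pgvector, Pinecone, Weaviate, or Qdrant; the dict here conflates the two for brevity.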
## Webhook for CI/CD
For sources that update from external systems (e.g. docs built in CI), use `sync_mode: "webhook"`. When you create the source, the API returns a webhook secret. When your docs change, call the webhook URL with that secret; no API key is needed. Quantlix triggers ingestion automatically.
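A CI job might notify the webhook along these lines. The URL shape and header name are assumptions; the actual values come back from the API when you create the source. The request is built but not sent here, so the sketch runs without a live endpoint.

```python
import urllib.request

# Assumed values -- the real webhook URL and secret are returned
# when you create the source; the header name is also an assumption.
webhook_url = "https://api.example.com/v1/sources/src_123/webhook"
secret = "whsec_example"

req = urllib.request.Request(
    webhook_url,
    method="POST",
    headers={"X-Webhook-Secret": secret},  # the secret authenticates the call; no API key
    data=b"{}",
)
# urllib.request.urlopen(req) would send it; ingestion then runs automatically.
```

Because the secret alone authenticates the call, the CI job never needs a full API key in its environment, which keeps the blast radius small if the pipeline leaks its variables.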