Knowledge source configuration: Build RAG from upload, S3, or web
A knowledge base is a RAG container: it groups sources (where documents come from) and defines how they're chunked, embedded, and indexed. You configure it once, then ingest and query.
The setup flow
- Provider — Add a provider with embeddings (e.g. Voyage AI)
- Chunking profile — Strategy (fixed, markdown, semantic), chunk size, overlap
- Embedding profile — Which provider model to use for embeddings
- Vector index — Backend (pgvector, Pinecone, Weaviate, Qdrant)
- Knowledge base — Container with default profiles + vector index
- Sources — Add upload, S3, or web source
- Ingestion — Upload docs, trigger ingestion job
All of this lives in Dashboard → Knowledge.
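To make the chunking profile's size and overlap settings concrete, here is a minimal sketch of a fixed strategy as a sliding character window. The function name `chunk_fixed` and the choice to measure size in characters are assumptions for illustration; Quantlix may measure chunks in tokens and handle boundaries differently.

```python
def chunk_fixed(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into windows of chunk_size characters.

    Each window shares `overlap` characters with the previous one.
    Hypothetical helper, not the Quantlix implementation.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

A larger overlap keeps more context shared between neighbouring chunks, at the cost of storing and embedding more chunks.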
Source types
| Type | Use case |
|---|---|
| upload | Files uploaded via portal or API. PDFs, markdown, etc. |
| s3 | S3-compatible storage (MinIO, Scaleway, AWS). Bucket + path. |
| web | Fetch from a URL. Good for docs sites. |
Each source has a sync mode: manual (you trigger ingestion), scheduled (ingestion runs on an interval), or webhook (an external system triggers ingestion via a URL).
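The combination of source type and sync mode can be sketched as a small validated record. Only the type values (`upload`, `s3`, `web`), the sync mode values, and the `sync_mode` field name come from the text above; the class name `SourceConfig` and the fields `name` and `interval_minutes` are hypothetical and may not match the real API schema.

```python
from dataclasses import dataclass
from typing import Optional

SOURCE_TYPES = {"upload", "s3", "web"}
SYNC_MODES = {"manual", "scheduled", "webhook"}


@dataclass(frozen=True)
class SourceConfig:
    """Illustrative model of a knowledge base source (not the real schema)."""
    name: str
    type: str
    sync_mode: str = "manual"
    interval_minutes: Optional[int] = None  # hypothetical field for scheduled sync

    def __post_init__(self) -> None:
        if self.type not in SOURCE_TYPES:
            raise ValueError(f"unknown source type: {self.type!r}")
        if self.sync_mode not in SYNC_MODES:
            raise ValueError(f"unknown sync mode: {self.sync_mode!r}")
        # A scheduled source needs an interval to run on.
        if self.sync_mode == "scheduled" and not self.interval_minutes:
            raise ValueError("scheduled sources need interval_minutes")


# Example: a docs site that is re-fetched every hour.
docs_site = SourceConfig(
    name="docs-site", type="web", sync_mode="scheduled", interval_minutes=60
)
```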
The pipeline
Ingestion runs in four stages: fetch documents → chunk → embed → index into the vector store. Documents and chunks are stored in Postgres; embeddings go to the vector backend. Once ingestion completes, you can query via the retrieval API or the RAG endpoint.
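The stages above can be illustrated with a toy, in-memory version of the pipeline. Everything here is a stand-in: `toy_embed` is a normalised bag-of-words vector rather than a real embedding model, `InMemoryIndex` replaces the vector backend, and chunking is a simple word-count split. None of these names are part of Quantlix.

```python
import math
from collections import Counter


def toy_embed(text: str) -> dict[str, float]:
    """Stand-in embedding: L2-normalised word counts as a sparse vector."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


class InMemoryIndex:
    """Stand-in for a vector backend: stores (chunk, vector) pairs in a list."""

    def __init__(self) -> None:
        self.rows: list[tuple[str, dict[str, float]]] = []

    def add(self, chunk: str, vector: dict[str, float]) -> None:
        self.rows.append((chunk, vector))

    def search(self, query_vector: dict[str, float], top_k: int = 3) -> list[str]:
        # Vectors are normalised, so the dot product is the cosine similarity.
        scored = [
            (sum(w * vec.get(word, 0.0) for word, w in query_vector.items()), chunk)
            for chunk, vec in self.rows
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [chunk for _, chunk in scored[:top_k]]


def ingest(documents: list[str], index: InMemoryIndex, words_per_chunk: int = 8) -> int:
    """Chunk each document, embed each chunk, and add it to the index."""
    stored = 0
    for doc in documents:
        words = doc.split()
        for start in range(0, len(words), words_per_chunk):
            chunk = " ".join(words[start:start + words_per_chunk])
            index.add(chunk, toy_embed(chunk))
            stored += 1
    return stored


index = InMemoryIndex()
ingest(
    ["Rotate API keys every ninety days", "Billing invoices are issued monthly"],
    index,
)
best = index.search(toy_embed("rotate api keys"), top_k=1)
```

Retrieval embeds the query with the same function used at ingestion time, which is why the embedding profile is fixed on the knowledge base rather than chosen per query.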
Webhook for CI/CD
For sources that update from external systems (e.g. docs built in CI), set `sync_mode: "webhook"`. The API returns a webhook secret. When your docs change, call the webhook URL with that secret as the token; no API key is needed. Quantlix then triggers ingestion automatically.
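A CI job could call the webhook with a few lines of standard-library Python, sketched below. The text does not say how the token is transmitted, so the `X-Webhook-Token` header, the use of POST, and the function names are all assumptions; the token may instead belong in the URL path or query string, so check the values the API actually returns.

```python
import urllib.request


def build_webhook_request(webhook_url: str, token: str) -> urllib.request.Request:
    """Build the request a CI job would send after the docs are rebuilt.

    Assumption: the token travels in an X-Webhook-Token header on a POST.
    """
    return urllib.request.Request(
        webhook_url,
        data=b"",  # empty body; the call only signals that content changed
        method="POST",
        headers={"X-Webhook-Token": token},
    )


def trigger_ingestion(webhook_url: str, token: str, timeout: float = 10.0) -> int:
    """Send the webhook call and return the HTTP status code."""
    request = build_webhook_request(webhook_url, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

In a pipeline, `trigger_ingestion` would run as the last step after the docs build, with the webhook URL and token read from CI secrets rather than committed to the repository.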