
Knowledge source configuration: Build RAG from upload, S3, or web

Quantlix Team

A knowledge base is a RAG container: it groups sources (where documents come from) and defines how they're chunked, embedded, and indexed. You configure it once, then ingest and query.

The setup flow

1. **Provider** — Add a provider with embeddings (e.g. Voyage AI)

2. **Chunking profile** — Strategy (fixed, markdown, semantic), chunk size, overlap

3. **Embedding profile** — Which provider model to use for embeddings

4. **Vector index** — Backend (pgvector, Pinecone, Weaviate, Qdrant)

5. **Knowledge base** — Container with default profiles + vector index

6. **Sources** — Add upload, S3, or web source

7. **Ingestion** — Upload docs, trigger ingestion job

All of this lives in **Dashboard → Knowledge**.
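The setup flow above can be sketched as a set of configuration payloads. This is a minimal sketch: every field name, value, and model name here is an illustrative assumption, not the documented Quantlix schema.

```python
# Hypothetical payloads mirroring steps 2-5 of the setup flow.
# All field names are assumptions for illustration.

chunking_profile = {
    "name": "docs-markdown",
    "strategy": "markdown",   # or "fixed", "semantic"
    "chunk_size": 512,        # tokens per chunk
    "overlap": 64,            # tokens shared between adjacent chunks
}

embedding_profile = {
    "name": "voyage-default",
    "provider": "voyage-ai",  # the provider added in step 1
    "model": "voyage-3",      # example model name, not verified
}

vector_index = {
    "name": "docs-index",
    "backend": "pgvector",    # or "pinecone", "weaviate", "qdrant"
}

# The knowledge base ties the default profiles and index together.
knowledge_base = {
    "name": "product-docs",
    "chunking_profile": chunking_profile["name"],
    "embedding_profile": embedding_profile["name"],
    "vector_index": vector_index["name"],
}
```

Because the knowledge base only references profiles by name, you can reuse one chunking or embedding profile across several knowledge bases.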

Source types

| Type | Use case |
|------|----------|
| **upload** | Files uploaded via the portal or API. PDFs, markdown, etc. |
| **s3** | S3-compatible storage (MinIO, Scaleway, AWS). Configured with bucket + path. |
| **web** | Fetched from a URL. Good for docs sites. |

Each source has a sync mode: **manual** (you trigger ingestion), **scheduled** (runs on an interval), or **webhook** (external system triggers via URL).
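One source definition per type might look like the following. Again a sketch: the field names, endpoint, and interval parameter are assumptions, not the documented schema.

```python
# Hypothetical source definitions, one per source type.

upload_source = {
    "type": "upload",
    "sync_mode": "manual",     # you trigger ingestion explicitly
}

s3_source = {
    "type": "s3",
    "bucket": "docs-bucket",
    "path": "handbook/",
    "endpoint": "https://s3.example.com",  # any S3-compatible endpoint
    "sync_mode": "scheduled",
    "interval_minutes": 60,    # assumed interval parameter
}

web_source = {
    "type": "web",
    "url": "https://docs.example.com",
    "sync_mode": "webhook",    # an external system triggers ingestion
}
```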

The pipeline

An ingestion job runs four stages: fetch documents → chunk → embed → index into the vector store. Documents and chunks are stored in Postgres; embeddings go to the vector backend. Once ingested, you can query via the retrieval API or the RAG endpoint.
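A toy version of the pipeline makes the stages concrete. This sketch uses a fixed-size chunker with overlap and a hash-based stand-in embedder; a real pipeline would call the configured embedding provider (e.g. Voyage AI) and write to the configured vector backend.

```python
import hashlib

def chunk(text, size=40, overlap=10):
    """Fixed-size chunking: each chunk shares `overlap` characters
    with the next, so no sentence is cut without context."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(piece):
    """Stand-in embedder: hashes the chunk into a tiny fake vector.
    A real pipeline calls the embedding profile's provider here."""
    digest = hashlib.sha256(piece.encode()).digest()
    return [b / 255 for b in digest[:4]]

def ingest(doc, index):
    """fetch -> chunk -> embed -> index: the four pipeline stages
    (fetching is represented by `doc` already being in hand)."""
    for piece in chunk(doc):
        index.append({"text": piece, "vector": embed(piece)})

index = []
ingest("Quantlix groups sources into a knowledge base and indexes them.", index)
```

At query time, the retrieval side embeds the question with the same model and searches the index for nearest vectors, which is why the embedding profile must not change between ingestion and querying.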

Webhook for CI/CD

For sources that update from external systems (e.g. docs built in CI), use `sync_mode: "webhook"`. The API returns a webhook secret. When your docs change, call the webhook URL with that secret — no API key needed. Quantlix triggers ingestion automatically.
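A CI job's call might be built like this. The URL, path, and header name are hypothetical; the real webhook URL and secret come back from the API when the source is created.

```python
from urllib.request import Request

# Hypothetical values; in practice both are returned by the API
# when you create a source with sync_mode "webhook".
WEBHOOK_URL = "https://api.example.com/v1/sources/src_123/sync"
WEBHOOK_SECRET = "whsec_demo"

def build_sync_request(url, secret):
    """Build the POST a CI job sends after a docs build. The secret
    authenticates the call, so the job needs no API key."""
    return Request(url, method="POST",
                   headers={"X-Webhook-Secret": secret})

req = build_sync_request(WEBHOOK_URL, WEBHOOK_SECRET)
# A CI step would then send it with urllib.request.urlopen(req).
```

Because the secret is scoped to a single source, leaking it from CI only exposes the ability to trigger that one ingestion, not your account's API access.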