---
id: "vault-part-1"
date: "2026-03-20"
title: "The Vault, part 1: your private document knowledge base"
summary: "Import your own documents, embed them locally, and chat with your personal knowledge base. PDF, DOCX, Markdown, HTML, plain text — all searchable, all private."
image: "/medias/vault.png"
header: "Feature"
tags: ["feature", "vault", "rag", "documents", "privacy"]
---

## Your documents, your AI

Daneel started as a way to chat with web pages. The Vault takes that further: import your own files, build a personal knowledge base, and ask questions across your entire document collection. Everything runs locally. Nothing gets uploaded.

Think of it as a private, searchable library where the librarian actually reads every document and can answer questions about any of them.

## Import anything you have

The Vault accepts five document formats: Markdown, plain text, PDF, DOCX, and HTML. Import files through a file picker, or select an entire folder using the File System Access API. Each file goes through a format-specific converter before entering the pipeline:

- **PDF** files are converted to structured Markdown with [EdgeParse](https://github.com/raphaelmansuy/edgeparse), a Rust-based parser compiled to WebAssembly that preserves headings, tables, and reading order. Fully scanned (image-only) PDFs are detected and rejected with a clear error rather than silently producing empty results.
- **DOCX** files are converted to HTML by the mammoth library, then to clean Markdown by the same Turndown pipeline that handles web pages.
- **HTML** files go through that same Turndown pipeline.
- **Markdown and plain text** pass through unchanged, apart from stripping the UTF-8 byte order mark (BOM).
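
As a rough illustration of the routing step, the sketch below maps a file name to one of the five formats and strips a leading byte order mark from decoded text. The names `VaultFormat`, `detectFormat`, and `stripBom` are hypothetical, and picking the converter by file extension is an assumption; Daneel may detect formats differently.

```typescript
// Hypothetical names throughout; this is not Daneel's actual code.
type VaultFormat = "markdown" | "text" | "pdf" | "docx" | "html";

// Assumption: the format is chosen from the file extension.
const EXTENSION_TO_FORMAT = new Map<string, VaultFormat>([
  ["md", "markdown"],
  ["markdown", "markdown"],
  ["txt", "text"],
  ["pdf", "pdf"],
  ["docx", "docx"],
  ["html", "html"],
  ["htm", "html"],
]);

// Returns the format for a file name, or null when the extension is unsupported.
function detectFormat(fileName: string): VaultFormat | null {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0) return null;
  const extension = fileName.slice(dot + 1).toLowerCase();
  return EXTENSION_TO_FORMAT.get(extension) ?? null;
}

// Removes a leading byte order mark (U+FEFF) from decoded text, if present.
function stripBom(text: string): string {
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}
```

A caller would run `detectFormat` first and reject files that return `null` before any conversion work starts.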

Every imported file gets a SHA-256 content hash. If you try to import a file that's already in the vault, Daneel skips it. No duplicates, no wasted embeddings.
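
One way to implement this kind of content-hash deduplication in a browser is the Web Crypto API. The sketch below is an assumption about the approach, not Daneel's code: `sha256Hex` and `ContentIndex` are hypothetical names, and the in-memory `Set` stands in for whatever persistent store the vault actually uses.

```typescript
// Hex-encodes raw bytes so a digest can be used as a lookup key.
function toHex(bytes: Uint8Array): string {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
}

// Hashes file contents with the Web Crypto API (browsers and recent Node.js).
async function sha256Hex(content: ArrayBuffer): Promise<string> {
  const digest = await crypto.subtle.digest("SHA-256", content);
  return toHex(new Uint8Array(digest));
}

// Hypothetical in-memory index of hashes already present in a vault.
class ContentIndex {
  private readonly seen = new Set<string>();

  // Returns true if the hash was new and has been recorded,
  // false if a file with the same content was already imported.
  register(hash: string): boolean {
    if (this.seen.has(hash)) return false;
    this.seen.add(hash);
    return true;
  }
}
```

An importer could call `sha256Hex(await file.arrayBuffer())` and skip conversion and embedding whenever `register` returns `false`.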

## Semantic chunking and embedding

Once imported, documents are split into chunks using a recursive semantic chunker (powered by Chonkie.js). Unlike naive splitting on word count, this chunker respects natural boundaries: it splits on paragraphs first, then sentences, then words. Chunks stay coherent, which directly improves search quality.
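
To make the paragraph, sentence, word fallback concrete, here is a simplified recursive splitter. It is not Chonkie.js and does not use its API: `chunkRecursively` is a hypothetical function that measures size in characters rather than tokens and rejoins pieces with a single space.

```typescript
// Separators tried in order: paragraph breaks, sentence ends, then whitespace.
const SEPARATORS: RegExp[] = [/\n{2,}/, /(?<=[.!?])\s+/, /\s+/];

// Last resort for a single word longer than the limit.
function hardSplit(text: string, maxChars: number): string[] {
  const parts: string[] = [];
  for (let i = 0; i < text.length; i += maxChars) {
    parts.push(text.slice(i, i + maxChars));
  }
  return parts;
}

// Splits text into chunks of at most maxChars characters, moving to a finer
// separator only for pieces that are still too long.
function chunkRecursively(text: string, maxChars: number, level = 0): string[] {
  if (maxChars < 1) throw new RangeError("maxChars must be at least 1");
  const trimmed = text.trim();
  if (trimmed.length === 0) return [];
  if (trimmed.length <= maxChars) return [trimmed];

  const separator = SEPARATORS[level];
  if (separator === undefined) return hardSplit(trimmed, maxChars);

  const chunks: string[] = [];
  let current = "";
  for (const raw of trimmed.split(separator)) {
    const piece = raw.trim();
    if (piece.length === 0) continue;
    if (piece.length > maxChars) {
      // Flush what we have, then split the oversized piece more finely.
      if (current) chunks.push(current);
      current = "";
      chunks.push(...chunkRecursively(piece, maxChars, level + 1));
    } else if (current && current.length + 1 + piece.length > maxChars) {
      chunks.push(current);
      current = piece;
    } else {
      // Greedily merge small neighbours so chunks are not needlessly tiny.
      current = current ? `${current} ${piece}` : piece;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

The greedy merge keeps short neighbouring pieces together, so a paragraph that fits within the limit is never broken apart.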

Each chunk is embedded using the same WebGPU pipeline that powers Site RAG. The default model (bge-small-en-v1.5 at fp16) runs on your GPU with sub-second batch processing. Embeddings are stored in IndexedDB, partitioned by vault ID, so each vault is an isolated search space.

A progress bar tracks every stage: file conversion, chunking, and embedding. Large imports with dozens of documents give you real-time feedback on exactly where things stand.

## Search and chat

With documents embedded, you can search your vault or chat with it.

**Vault-wide search** finds the most relevant passages across all your documents. Type a question and Daneel embeds your query, runs GPU-accelerated cosine similarity against every chunk in the vault (typically under 5 milliseconds even for thousands of chunks), and returns the best matches with similarity scores and source attribution.

**Document-scoped search** narrows to a single file. Select a document in the sidebar and your questions only match against that document's chunks. Useful when you know which file has the answer and want precise results.
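
Both search modes reduce to ranking chunks by cosine similarity, with document-scoped search adding a filter on the source document. The sketch below does this on the CPU for clarity; Daneel runs the comparison on the GPU, and `StoredChunk`, `searchChunks`, and the field names are assumptions made for illustration.

```typescript
// Hypothetical shape of an embedded chunk.
interface StoredChunk {
  docId: string;
  text: string;
  embedding: Float32Array;
}

interface SearchHit {
  chunk: StoredChunk;
  score: number;
}

// cos(a, b) = (a . b) / (|a| * |b|); returns 0 for a zero-length vector.
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) throw new Error("Embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  return denominator === 0 ? 0 : dot / denominator;
}

// Ranks chunks against the query embedding. Passing docId restricts the
// search to one document; omitting it searches the whole vault.
function searchChunks(
  query: Float32Array,
  chunks: StoredChunk[],
  topK: number,
  docId?: string,
): SearchHit[] {
  return chunks
    .filter((chunk) => docId === undefined || chunk.docId === docId)
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```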

**Chat mode** takes search results and feeds them to your active LLM as context, producing a grounded answer with references to specific documents. This works with every provider: ask a question about your contract collection using WebGPU, Ollama, Claude, or Azure, and get an answer backed by actual passages from your files.
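
Feeding search results to the model usually means assembling them into the prompt. The following is a minimal sketch of that step, assuming numbered passages and a plain-text instruction; the wording, the `Passage` type, and `buildGroundedPrompt` are hypothetical and not taken from Daneel.

```typescript
// Hypothetical shape of a retrieved passage.
interface Passage {
  docTitle: string;
  text: string;
}

// Builds a prompt that asks the model to answer only from the given passages
// and to cite them by number, so answers can point back to source documents.
function buildGroundedPrompt(question: string, passages: Passage[]): string {
  const context = passages
    .map((passage, i) => `[${i + 1}] ${passage.docTitle}\n${passage.text}`)
    .join("\n\n");
  return [
    "Answer the question using only the passages below.",
    "Cite passages by their bracketed number.",
    "",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}
```

Because the result is a plain string, the same prompt can be sent to any provider without provider-specific handling.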

## The document viewer

Select any document in your vault and its full content renders in a viewer pane alongside the chat. The Markdown is parsed and displayed with GitHub Flavored Markdown support, including tables, code blocks, and links (which open in new tabs). You can read the source material while chatting about it, without switching contexts.

If the cached content is missing (a rare edge case after a storage cleanup), the viewer offers a re-import button so you can restore it.

## Attach MCP tools

Vaults can have MCP servers attached directly. This means your AI can combine local document search with live external data in the same conversation.

Attach Stripe to your invoices vault and ask "Which client from my Q1 contracts has an overdue payment?" The LLM searches your local documents for client names, then calls Stripe to check payment status. Local knowledge plus live tools, in one question.

The tools section in the vault sidebar shows all registered MCP servers with available tools. Click to attach, click to detach. Simple.

## Bind an agent

For more structured workflows, attach an agent instead of individual MCP servers. An agent brings its own system prompt (persona, task, constraints, style) and its own set of MCP tools, creating a focused AI environment tuned for the vault's content.

A finance analyst agent on your invoices vault. A legal reviewer agent on your contracts vault. A research assistant agent on your papers vault. The agent shapes both the AI's behavior and its available tools.

One rule keeps things clean: a vault uses either standalone MCP servers or an agent, never both. This mutual exclusion prevents conflicting tool configurations and makes it obvious which tools are active.
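
A rule like this can be encoded in the data model so that the invalid combination cannot be represented at all. The sketch below uses a TypeScript discriminated union; `VaultTooling`, `attachServer`, and `bindAgent` are hypothetical, and throwing on a conflict (rather than replacing the existing configuration) is an assumption about the behavior.

```typescript
// A vault has no tools, standalone MCP servers, or one agent. Never a mix.
type VaultTooling =
  | { kind: "none" }
  | { kind: "mcp"; serverIds: string[] }
  | { kind: "agent"; agentId: string };

// Attaches a standalone MCP server; refuses while an agent is bound.
function attachServer(tooling: VaultTooling, serverId: string): VaultTooling {
  if (tooling.kind === "agent") {
    throw new Error("Detach the agent before attaching standalone MCP servers");
  }
  const existing = tooling.kind === "mcp" ? tooling.serverIds : [];
  const serverIds = existing.includes(serverId) ? existing : [...existing, serverId];
  return { kind: "mcp", serverIds };
}

// Binds an agent; refuses while standalone MCP servers are attached.
function bindAgent(tooling: VaultTooling, agentId: string): VaultTooling {
  if (tooling.kind === "mcp" && tooling.serverIds.length > 0) {
    throw new Error("Detach standalone MCP servers before binding an agent");
  }
  return { kind: "agent", agentId };
}
```

With this shape, code that reads the configuration only has to switch on `kind` to know which tools are active.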

## Free and paid tiers

The Vault is available to everyone, with generous free limits:

- **Free**: 1 vault, 5 documents per vault, 1 MB maximum file size, 100 chunks per document
- **Paid**: unlimited vaults, 50 documents per vault, 10 MB maximum file size, 1,000 chunks per document

The free tier is enough to try the feature with a handful of documents. The paid tier opens it up for serious use with large document collections.
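
Expressed as data, the two tiers might look like the sketch below. The numbers come from the list above; the `TierLimits` shape, the `checkImport` helper, and the reading of 1 MB as 1,048,576 bytes are assumptions. What happens when a document exceeds its chunk limit is not covered here, so the sketch only records that number.

```typescript
type Tier = "free" | "paid";

interface TierLimits {
  maxVaults: number; // Infinity means unlimited
  maxDocsPerVault: number;
  maxFileBytes: number;
  maxChunksPerDoc: number;
}

const MB = 1024 * 1024; // Assumption: binary megabytes.

const LIMITS: Record<Tier, TierLimits> = {
  free: { maxVaults: 1, maxDocsPerVault: 5, maxFileBytes: 1 * MB, maxChunksPerDoc: 100 },
  paid: { maxVaults: Infinity, maxDocsPerVault: 50, maxFileBytes: 10 * MB, maxChunksPerDoc: 1000 },
};

// Returns a human-readable reason when an import is not allowed, else null.
function checkImport(tier: Tier, docsInVault: number, fileBytes: number): string | null {
  const limits = LIMITS[tier];
  if (docsInVault >= limits.maxDocsPerVault) {
    return `Vault already holds ${limits.maxDocsPerVault} documents`;
  }
  if (fileBytes > limits.maxFileBytes) {
    return `File exceeds ${limits.maxFileBytes / MB} MB`;
  }
  return null;
}
```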

## What comes next

This is Part 1 of the Vault story. In Part 2, we cover something more ambitious: how Daneel extracts named entities from your documents, builds a knowledge graph of their relationships, and lets you explore it in an interactive 3D visualization. Stay tuned.

---

[Read on site](https://daneel.injen.io/news/vault-part-1.html?utm_source=extension_news_reader&utm_medium=extension_settings&utm_campaign=extension)
