---
id: "vault-part-2"
date: "2026-04-01"
title: "The Vault, part 2: your knowledge graph"
summary: "Daneel extracts entities from your documents, resolves duplicates, builds a relationship graph, and renders it as an interactive 3D visualization you can filter and explore."
image: "/medias/kg.2.png"
header: "Feature"
tags: ["feature", "vault", "knowledge-graph", "ner", "visualization"]
---

## From documents to understanding

In Part 1, we covered how the Vault stores and searches your documents using semantic embeddings. That gives you great retrieval, but it treats documents as bags of text. The knowledge graph goes further: it reads your documents, identifies the people, organizations, places, and concepts they mention, figures out which entities co-occur, and builds a navigable graph of relationships.

The result is a map of what your documents are actually about.

![Knowledge graph from Albert Einstein Wikipedia page](https://daneel.injen.io/medias/articles/kg.graph.albert.large.jpg)

## How entity extraction works

The extraction pipeline uses GLiNER, a span-level transformer model designed for named entity recognition with custom labels. Unlike traditional NER systems locked to a fixed set of types (person, location, organization), GLiNER accepts whatever entity types you define. Tell it to look for "regulation," "court," and "penalty" in your legal documents, and it will.

Four model variants are available, depending on your hardware and language needs:

- **Small v2.1 (fp32)** at 583 MB for best English accuracy on GPU
- **Small v2.1 (int8)** at 183 MB for fast CPU-only extraction
- **Multi v2.1 (int8)** at 349 MB for 100+ languages on CPU
- **Multi v2.1 (fp16)** at 580 MB for multilingual with GPU acceleration

The model runs locally through ONNX Runtime, using WebGPU when available and falling back to WASM. Your documents never leave the browser for entity extraction.

Chunks are processed in batches of eight. For each batch, GLiNER evaluates all candidate spans (up to 12 tokens wide) against your entity types, scores them with a confidence threshold (default 0.3), and resolves overlapping spans with greedy selection. A noise filter strips out pronouns, stop words, URL fragments, and overly short strings before anything reaches the graph.
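
Setting the model internals aside, the span-selection and noise-filtering steps can be sketched roughly like this. Everything here is illustrative, not Daneel's actual internals: the `Span` shape, the function names, and the tiny stop-word list are assumptions; only the 0.3 threshold, the greedy overlap resolution, and the noise categories come from the description above.

```typescript
// Illustrative sketch of post-extraction filtering; not Daneel's real code.
interface Span {
  text: string;
  start: number; // token index
  end: number;   // exclusive token index
  label: string;
  score: number;
}

// Tiny illustrative subset; a real stop-word list would be much larger.
const STOP_WORDS = new Set(["the", "a", "an", "it", "he", "she", "they", "this"]);

// Keep spans above the confidence threshold (default 0.3), then resolve
// overlaps greedily: highest-scoring spans win, overlapping losers drop.
function selectSpans(candidates: Span[], threshold = 0.3): Span[] {
  const kept: Span[] = [];
  const sorted = candidates
    .filter((s) => s.score >= threshold)
    .sort((a, b) => b.score - a.score);
  for (const span of sorted) {
    const overlaps = kept.some((k) => span.start < k.end && k.start < span.end);
    if (!overlaps) kept.push(span);
  }
  return kept;
}

// Noise filter: drop pronouns/stop words, URL fragments, very short strings.
function isNoise(text: string): boolean {
  const t = text.trim().toLowerCase();
  return t.length < 3 || STOP_WORDS.has(t) || /https?:|www\./.test(t);
}
```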

## Ontologies: tell the AI what to look for

Entity types are defined through ontologies. Daneel ships eight domain-specific presets:

- **General** for broad documents: person, organization, location, product, concept
- **Academic & Research** for papers: institution, theory, method, publication, dataset
- **Legal & Regulatory** for compliance: regulation, law, jurisdiction, contract, clause
- **Medical & Healthcare** for clinical docs: disease, symptom, treatment, drug, gene
- **Programming & Software** for code docs: class, function, library, framework, API
- **Business & Finance** for corporate docs: company, market, revenue, acquisition
- **Travel & Tourism** for guides: hotel, airline, landmark, restaurant
- **History & Geography** for reference: person, event, battle, dynasty, artifact

Each preset is a starting point. You can create your own ontologies from scratch, picking from a universal vocabulary of 100 entity types organized across eight pillars: agents, organizations, places, events, knowledge, creations, legal concepts, and technology. Or just type in your own labels. The ontology editor lets you name it, pick an icon, and toggle it on or off per vault.
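
To make the idea concrete, here is a minimal sketch of what an ontology definition might look like. The `Ontology` shape and its field names are hypothetical, not Daneel's actual schema; the label strings come from the Legal & Regulatory preset above.

```typescript
// Hypothetical shape of an ontology; field names are illustrative.
interface Ontology {
  name: string;
  icon: string;
  enabled: boolean;        // toggled on or off per vault
  entityTypes: string[];   // free-form labels passed to GLiNER
}

const legalOntology: Ontology = {
  name: "Legal & Regulatory",
  icon: "scale",
  enabled: true,
  entityTypes: ["regulation", "law", "jurisdiction", "contract", "clause"],
};
```

Because GLiNER takes labels as plain strings, adding a new entity type is just appending to `entityTypes`; no retraining is involved.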

Different vaults can use different ontologies. Your legal vault extracts regulations and clauses. Your research vault extracts theories and datasets. Same pipeline, different lenses.

## Entity resolution: no duplicates

Raw NER output is messy. "Albert Einstein," "Einstein," and "A. Einstein" all refer to the same person. The entity resolver handles this with a two-pass algorithm.

First pass: text matching. After normalizing case and punctuation, the resolver checks for exact matches, substring containment ("Einstein" inside "Albert Einstein"), and reversed name order ("Einstein, Albert" matches "Albert Einstein"). All comparisons are scoped by entity type, so a company named "Einstein" won't merge with the physicist.
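
The first pass amounts to a handful of string checks, sketched below. Function names and the exact normalization rules are illustrative; the three match conditions (exact, containment, reversed order) come from the description above.

```typescript
// Illustrative first-pass matcher; comparisons are already scoped by
// entity type before names reach this function.
function normalize(name: string): string {
  return name.toLowerCase().replace(/[.,'"]/g, "").replace(/\s+/g, " ").trim();
}

function textMatch(a: string, b: string): boolean {
  const na = normalize(a);
  const nb = normalize(b);
  if (na === nb) return true;                          // exact match
  if (na.includes(nb) || nb.includes(na)) return true; // substring containment
  // Reversed name order: "Einstein, Albert" vs "Albert Einstein"
  const reversed = na.split(" ").reverse().join(" ");
  return reversed === nb;
}
```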

Second pass: embedding similarity. Each new entity gets embedded using the same model that handles document chunks. The resolver computes cosine similarity against all existing entities of the same type. If similarity exceeds the threshold (default 0.92), the entities merge. If not, a new canonical entity is created.
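
A rough sketch of the second pass, assuming plain number arrays for embeddings. The shapes and names are illustrative; only the type scoping and the 0.92 default threshold come from the description above.

```typescript
// Illustrative second-pass resolver; not Daneel's actual implementation.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

interface CanonicalEntity {
  name: string;
  type: string;
  embedding: number[];
  mentions: number;
}

// Returns the matched entity (merged) or null (caller creates a new one).
function resolve(
  candidate: { type: string; embedding: number[] },
  existing: CanonicalEntity[],
  threshold = 0.92
): CanonicalEntity | null {
  let best: CanonicalEntity | null = null;
  let bestSim = threshold;
  for (const e of existing) {
    if (e.type !== candidate.type) continue; // type-scoped comparison
    const sim = cosine(candidate.embedding, e.embedding);
    if (sim > bestSim) { best = e; bestSim = sim; }
  }
  if (best) best.mentions += 1;
  return best;
}
```

Note how the type scoping does real work here: an organization whose embedding happens to sit close to a person's will never be considered for the merge.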

The result is a clean set of deduplicated entities with mention counts that reflect how often each one appears across your documents.

## Building the graph

With entities extracted and deduplicated, the graph builder connects them through co-occurrence. When two entities appear in the same chunk, they get an edge. Multiple co-occurrences in different chunks increase the edge weight, reflecting stronger relationships.
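
The co-occurrence logic can be sketched in a few lines. The edge-key encoding and input shape are illustrative; the pairing and weight-increment behavior follow the description above.

```typescript
// Illustrative edge builder: each chunk is a list of canonical entity
// names; every unique pair in a chunk gets an edge, and repeated
// co-occurrence across chunks increments the weight.
function buildEdges(chunks: string[][]): Map<string, number> {
  const edges = new Map<string, number>(); // key "a|b" with a < b
  for (const entities of chunks) {
    const unique = [...new Set(entities)].sort();
    for (let i = 0; i < unique.length; i++) {
      for (let j = i + 1; j < unique.length; j++) {
        const key = `${unique[i]}|${unique[j]}`;
        edges.set(key, (edges.get(key) ?? 0) + 1);
      }
    }
  }
  return edges;
}
```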

The build process is incremental and tracks progress in real time. You see the chunk count advancing, the entity count growing, and a type breakdown (e.g., "person: 42, organization: 38, regulation: 15") updating as extraction proceeds.

Graphs are stored in a dedicated IndexedDB database (`daneel-kg`), separate from the main chunk store. Three object stores hold entities, mentions (chunk-to-entity edges), and serialized graph state. Everything is vault-scoped, so deleting a vault cleanly removes its graph.

## 3D interactive visualization

This is the part you have to see. The knowledge graph renders as a force-directed 3D network using Three.js and WebGL.

Entities become spheres, sized logarithmically by mention count. Colors are assigned by type from a curated palette: blue tones for people and agents, orange for organizations, green for places, red for events, purple for knowledge concepts, teal for creative works. Over 100 entity types have dedicated colors; anything else falls back to neutral gray.
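
Logarithmic sizing and type-based coloring might look like the sketch below. The constants and hex values are illustrative, not the renderer's actual palette; only the log scaling and the gray fallback come from the description above.

```typescript
// Illustrative node sizing: radius grows logarithmically with mentions,
// so a 100-mention hub doesn't dwarf everything else.
function nodeRadius(mentions: number, base = 1, scale = 0.5): number {
  return base + scale * Math.log2(1 + mentions);
}

// Illustrative palette excerpt; the real one covers 100+ types.
const TYPE_COLORS: Record<string, string> = {
  person: "#4a90d9",       // blue tones for people and agents
  organization: "#e8872b", // orange for organizations
  location: "#3ba55d",     // green for places
  event: "#d0453f",        // red for events
};

// Anything without a dedicated color falls back to neutral gray.
function colorFor(type: string): string {
  return TYPE_COLORS[type] ?? "#9e9e9e";
}
```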

Edges connect co-occurring entities. Their width reflects relationship strength. Gold directional particles flow along edges, showing the structure of the network in motion.

The simulation uses D3 force physics: nodes repel each other (configurable charge strength), edges pull connected nodes together (configurable link distance), and the whole system settles into a stable layout over a few seconds. You can orbit, zoom, and drag to explore from any angle.
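
For intuition, here is one drastically simplified 2D force tick in the spirit of d3-force. The real simulation runs d3's own accumulators in 3D; every constant and name here is illustrative.

```typescript
// Toy force tick: negative charge repels all node pairs, links pull
// connected nodes toward a preferred distance, velocity decays each step.
interface SimNode { x: number; y: number; vx: number; vy: number; }

function tick(
  nodes: SimNode[],
  links: [number, number][],
  charge = -30,
  linkDistance = 30
): void {
  // Charge: pairwise repulsion falling off with squared distance.
  for (let i = 0; i < nodes.length; i++) {
    for (let j = i + 1; j < nodes.length; j++) {
      const dx = nodes[j].x - nodes[i].x;
      const dy = nodes[j].y - nodes[i].y;
      const d2 = Math.max(dx * dx + dy * dy, 1);
      const f = charge / d2; // negative => push apart
      nodes[i].vx += f * dx; nodes[i].vy += f * dy;
      nodes[j].vx -= f * dx; nodes[j].vy -= f * dy;
    }
  }
  // Links: spring toward the preferred edge length.
  for (const [a, b] of links) {
    const dx = nodes[b].x - nodes[a].x;
    const dy = nodes[b].y - nodes[a].y;
    const d = Math.max(Math.hypot(dx, dy), 1e-6);
    const k = ((d - linkDistance) / d) * 0.1;
    nodes[a].vx += dx * k; nodes[a].vy += dy * k;
    nodes[b].vx -= dx * k; nodes[b].vy -= dy * k;
  }
  // Integrate and damp, so the layout settles after a few seconds.
  for (const n of nodes) { n.x += n.vx; n.y += n.vy; n.vx *= 0.6; n.vy *= 0.6; }
}
```

Raising the charge magnitude spreads the layout out; raising the link distance loosens clusters, which is exactly what the settings panel below exposes.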

## Filtering and focus

A type legend overlays the visualization, showing every entity type present in the graph with its count. Click a type to toggle it off. This lets you strip away noise and focus on the entity categories that matter for your question.

A document dropdown scopes the graph to a single file. Select a specific document and only entities mentioned in that document remain visible, with their co-occurrence edges intact. Switch back to "all documents" to see the full picture.

Click any node to focus on its neighborhood. The camera zooms to that entity and highlights its direct connections (capped at 25 neighbors to keep things readable). Click again to zoom in further. This makes it easy to explore dense graphs by starting from a known entity and walking outward.
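
Neighborhood selection could be sketched like this. Sorting by edge weight before applying the cap is an assumption on my part; only the cap of 25 comes from the description above.

```typescript
// Illustrative focus helper: collect a node's direct neighbors from the
// edge list, strongest relationships first, capped for readability.
function neighborhood(
  focus: string,
  edges: { a: string; b: string; weight: number }[],
  cap = 25
): string[] {
  return edges
    .filter((e) => e.a === focus || e.b === focus)
    .sort((x, y) => y.weight - x.weight) // assumption: strongest edges win
    .slice(0, cap)
    .map((e) => (e.a === focus ? e.b : e.a));
}
```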

## Tuning the visualization

The settings panel exposes the physics and rendering parameters for users who want control:

- **Charge strength** adjusts how strongly nodes repel each other (denser or more spread out)
- **Link distance** sets the preferred edge length
- **Node scale** controls sphere sizes
- **Link opacity** dims or brightens edges
- **Particle speed** controls the flow rate of directional particles along edges
- **Bloom** enables a GPU-powered glow effect (UnrealBloomPass) for a more dramatic look

These are saved per session and apply immediately, so you can experiment in real time.

## What you can learn

The knowledge graph surfaces relationships that are hard to spot by reading documents sequentially. Which people are mentioned alongside which organizations? Which regulations co-occur with which penalties? Which concepts bridge two otherwise separate document clusters?

Combine this with the Vault's RAG capabilities from Part 1: explore the graph to identify patterns, then switch to chat mode and ask targeted questions grounded in the actual source text. The graph tells you where to look; RAG gives you the answers.

This is still an early feature, and we're actively expanding it. Entity-augmented search (using graph relationships to boost retrieval) and cross-vault graph merging are on the roadmap. For now, build a vault, extract entities, and explore what your documents know.

---

[Read on site](https://daneel.injen.io/news/vault-part-2.html?utm_source=extension_news_reader&utm_medium=extension_settings&utm_campaign=extension)
