100% local AI — zero cloud, full privacy

Daneel AI

R. Daneel Olivaw

Chat with anything,
build knowledge graphs and run agents,
entirely in your browser.



A browser extension that turns any page, your local documents, or an entire site into agentic conversations and actionable knowledge. Connect to MCP servers, build your own agents, and run AI on your GPU. No cloud, no API keys, no accounts, and no data ever leaves your trusted environment.

Private Release Candidate
Invitation-only access while we finalize the public launch.

Request RC access

Can your machine run this?

Benchmark your GPU right here in the browser. We measure WebGPU support, fp16 shader support, compute throughput, and memory bandwidth — then tell you which model tier fits your hardware. A simplified version of that probe is sketched below.

No data leaves your trusted environment
Works offline after setup
No API keys or accounts
Full guaranteed privacy
Free & Premium versions
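For the curious, here is a minimal sketch of the kind of capability probe the benchmark runs. It is illustrative TypeScript, not the extension's actual scoring code; the real benchmark also times compute throughput and memory bandwidth.

```ts
// Probe WebGPU availability, fp16 shader support, and buffer limits.
// The tier heuristic at the end is a placeholder, not Daneel's real logic.
async function probeGpu() {
  if (!("gpu" in navigator)) return { tier: "cpu-fallback" }; // no WebGPU at all

  const adapter = await navigator.gpu.requestAdapter();
  if (!adapter) return { tier: "cpu-fallback" };

  const fp16 = adapter.features.has("shader-f16");              // fp16 compute shaders
  const maxBufferMB = Math.floor(adapter.limits.maxBufferSize / 2 ** 20);

  return {
    fp16,
    maxBufferMB,
    tier: fp16 && maxBufferMB >= 2048 ? "large-models" : "small-models",
  };
}
```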

Up and running in minutes

No configuration headaches, no sign-ups, no API keys. Just install and ask.

1

Request RC access

Open an issue on GitHub. We'll send you a packaged build — drop it into Chrome's extensions page with Developer Mode on, and you're running.

2

Wait for the model

First launch downloads ~600 MB to your local cache. After that, it's instant forever.

3

Ask anything

Click the floating widget on any page. Ask questions, get AI-powered answers with citations.

4

Index entire sites

Switch to Site mode to crawl, index, and search entire websites as a knowledge base.

5

Index local folders

Switch to Vault mode to index and search entire local folders of documents as a knowledge base.

6

Connect MCP servers

Attach tools to LLMs and empower them with live data. A minimal connection sketch follows these steps.

7

Build your own Agents

Combine models, tools, and custom instructions into agents that plan and act for you.
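To give a flavor of the MCP step, here is roughly what connecting to a server and calling a tool looks like with the official TypeScript SDK (@modelcontextprotocol/sdk). The server URL and tool name are placeholders, and Daneel's built-in integration may differ:

```ts
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

// Connect to an MCP server over HTTP (URL is a placeholder).
const client = new Client({ name: "daneel-demo", version: "1.0.0" });
await client.connect(
  new StreamableHTTPClientTransport(new URL("http://localhost:3000/mcp")),
);

// Discover what the server offers, then call one tool with arguments.
const { tools } = await client.listTools();
console.log(tools.map((t) => t.name));

const result = await client.callTool({
  name: "web_search",                       // hypothetical tool name
  arguments: { query: "today's weather" },
});
console.log(result.content);
```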

Semantic Search

Search sites by meaning, not just keywords.

Powerful semantic search across all indexed sites

Everything you need, nothing you don't

Two powerful modes, one seamless experience.

💬

Page Q&A

Ask questions about any webpage. Smart content extraction strips noise (nav, ads, scripts) and feeds clean markdown to the AI. Streaming responses appear in real time. A simplified extraction sketch follows these cards.

🕸

Full Site RAG

Auto-discovers sitemaps, crawls up to 150 pages, chunks and embeds everything on your GPU. Ask questions and get answers with source citations.

🧠

16 Local AI Models

From 280 MB ultra-light to 2.3 GB high-quality. The extension benchmarks your GPU and recommends the best model for your hardware.

🔌

4 LLM Backends

WebGPU (default), Ollama, built-in Gemini Nano, or Claude API if you have no GPU. Switch with one click — the entire experience stays the same.

🔍

Semantic Search

Search across all indexed sites by meaning, not just keywords. Find what you're looking for even if you don't remember the exact words. An embedding-and-ranking sketch follows these cards.

📦

Data Portability

Export your entire knowledge base as a ZIP — embeddings, conversations, settings. Import on another machine to pick up where you left off.
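As a taste of the Page Q&A card's noise-stripping step, here is a simplified TypeScript sketch. The selector list is illustrative, and the real extractor is considerably smarter, emitting markdown rather than plain text:

```ts
// Clone the page, drop obvious noise, and return clean text for the model.
function extractReadableText(doc: Document): string {
  const clone = doc.body.cloneNode(true) as HTMLElement;

  // Remove elements that rarely carry content worth asking about.
  clone
    .querySelectorAll("nav, aside, footer, header, script, style, noscript, iframe")
    .forEach((el) => el.remove());

  // Collapse whitespace so the model's context window isn't wasted.
  return (clone.textContent ?? "").replace(/\s+/g, " ").trim();
}
```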
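The heart of Full Site RAG and Semantic Search is the same trick: embed every chunk once, then rank chunks against the query by cosine similarity. A minimal sketch with @huggingface/transformers follows; the model name and top-k cutoff are illustrative, not Daneel's exact choices:

```ts
import { pipeline } from "@huggingface/transformers";

// One embedding pipeline, reused for both indexing and querying.
const embedder = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2", {
  device: "webgpu",
});

async function embed(text: string): Promise<number[]> {
  const tensor = await embedder(text, { pooling: "mean", normalize: true });
  return Array.from(tensor.data as Float32Array);
}

// Vectors are L2-normalized above, so a dot product is the cosine similarity.
const cosine = (a: number[], b: number[]) =>
  a.reduce((sum, v, i) => sum + v * b[i], 0);

// Rank indexed chunks against a query and keep the best five.
async function search(query: string, index: { text: string; vec: number[] }[]) {
  const q = await embed(query);
  return index
    .map((chunk) => ({ ...chunk, score: cosine(q, chunk.vec) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, 5);
}
```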

The competition requires your data

Every other "chat with a page" tool sends your content to external servers.

| Capability    | This Extension               | Everyone Else                  |
|---------------|------------------------------|--------------------------------|
| AI inference  | Your GPU via WebGPU          | OpenAI / Anthropic servers     |
| Your data     | Never leaves your browser    | Uploaded to third-party clouds |
| After install | Works offline, forever       | Breaks without API key         |
| Cost          | Free core, $9 one-time       | $10–20/month subscriptions     |
| Setup         | Click install. Done.         | API keys, accounts, config     |
| Models        | 16 local + 3 remote backends | Whatever the vendor gives you  |

35 models, ranked for your hardware

The extension profiles your GPU on first run and recommends the best fit. Models that won't fit are automatically flagged.

Granite 4.0 Micro 3B
2.3 GB · Best quality
Phi-3.5 Mini 3.8B
2.0 GB · Reasoning
Granite 4.0 1B
2.15 GB · Balanced
SmolLM2 1.7B
1.65 GB · Fast
Pleias RAG 1B
1.45 GB · RAG specialist
LFM2.5 1.2B Thinking
800 MB · Best quality/size ratio
DeepSeek Coder 1.3B
1 GB · Code-focused
Qwen2.5 1.5B
2 GB · General
Llama 3.2 1B
1.9 GB · General
Gemma 3 1B
1.95 GB · General
Qwen3 0.6B
1.05 GB · Lightweight
Qwen2.5 0.5B
420 MB · Lightweight
Granite 4.0 350M
280 MB · Ultra-light
SmolLM 360M
300 MB · Ultra-light
And more...
From 350M to 120B depending on your hardware

Built for performance and privacy

Clean architecture, modern stack, zero runtime overhead.

WebGPU Inference

ONNX Runtime Web + @huggingface/transformers. Models run as native GPU compute shaders — no WASM overhead for inference.

Svelte 5 UI

Compiles to vanilla JS at build time. Zero runtime, zero virtual DOM. The popup and widget add virtually nothing to page weight.

Web Workers

LLM inference, embedding generation, and site crawling each run in dedicated workers. The UI stays at 60 fps no matter what. The worker-plus-pipeline pattern is sketched below.
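Here is a minimal sketch of the worker side: a @huggingface/transformers text-generation pipeline on WebGPU that streams tokens back to the page. The model id and quantization level are placeholders, not the extension's defaults:

```ts
// llm.worker.ts: all heavy lifting happens off the main thread.
import { pipeline, TextStreamer } from "@huggingface/transformers";

// Load once; the promise caches the pipeline for later messages.
const generatorPromise = pipeline(
  "text-generation",
  "HuggingFaceTB/SmolLM2-1.7B-Instruct", // placeholder model id
  { device: "webgpu", dtype: "q4" },
);

self.onmessage = async (e: MessageEvent<{ prompt: string }>) => {
  const generator = await generatorPromise;

  // Stream each decoded token to the UI as soon as it exists.
  const streamer = new TextStreamer(generator.tokenizer, {
    skip_prompt: true,
    callback_function: (token: string) => self.postMessage({ token }),
  });

  await generator(e.data.prompt, { max_new_tokens: 256, streamer });
  self.postMessage({ done: true });
};
```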
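The page side then stays trivially light: spawn the worker, post a prompt, and append tokens as they arrive. The element id and file path are illustrative:

```ts
// main.ts: the UI thread never touches the model.
const worker = new Worker(new URL("./llm.worker.ts", import.meta.url), {
  type: "module",
});

const output = document.querySelector<HTMLElement>("#answer")!;

worker.onmessage = (e: MessageEvent<{ token?: string; done?: boolean }>) => {
  if (e.data.token) output.textContent += e.data.token; // live streaming
  if (e.data.done) console.log("generation finished");
};

worker.postMessage({ prompt: "Summarize this page in three bullet points." });
```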

Common questions

How can I get Daneel today?

Daneel is currently in private Release Candidate testing — access is by invitation while we finalize the public launch. To request access, open an issue on our public GitHub repository telling us a bit about your setup and use case — we'll reach out with a packaged build.

How much data does the model download?

The default model is ~600 MB. It downloads once from Hugging Face's CDN and caches permanently. Smaller models start at ~280 MB.

Does it work offline?

Yes. After the initial model download, all AI inference runs locally. Querying already-indexed sites works fully offline. Site crawling requires network access.

Is my data sent anywhere?

No. Page content, questions, and AI responses stay in your browser. The only network calls: model download (once), optional license refresh (weekly, paid users), and optional anonymous telemetry (opt-in).

Is the paid license a subscription?

No. One-time payment of $9. No recurring charges, no expiration. The license key works forever.

What browsers are supported?

Chrome 113+ with WebGPU. Edge (Chromium-based) works with sideloading. Without WebGPU, it falls back to CPU inference, or you can connect to Ollama / Claude API.

Why is inference slow on my machine?

Try a smaller model in Settings. The extension benchmarks your GPU and recommends a model, but integrated GPUs may struggle with larger models. You can also connect to a local Ollama server for better performance.

What's new in Daneel

Features, improvements, and announcements.

More news...

Understanding in-browser LLM inference

In-depth research papers from Daneel AI on local inference, model architecture, quantization, and adjacent technologies.

More papers...