100% local AI — zero cloud dependency + full privacy

Daneel AI


Chat with anything,
entirely in your browser.


A browser extension that turns any page, local documents, or an entire site into a conversation. The AI runs on your GPU via WebGPU. No cloud, no API keys, no accounts, and no data leaves your machine.

No data leaves your browser
Works offline after setup
No API keys or accounts
Full guaranteed privacy
Free & Premium versions

Up and running in 60 seconds

No configuration, no sign-ups, no API keys. Just install and ask.

1

Install the extension

Add it from the Chrome Web Store or load from source. One click.

2

Wait for the model

First launch downloads ~600 MB to your local cache. After that, it's instant forever.

3

Ask anything

Click the floating widget on any page. Ask questions, get AI-powered answers with citations.

4

Index entire sites

Switch to Site mode to crawl, index, and search entire websites as a knowledge base.

5

Index local folders

Switch to Vault mode to index and search local document folders as a knowledge base.

Everything you need, nothing you don't

Two powerful modes, one seamless experience.

💬

Page Q&A

Ask questions about any webpage. Smart content extraction strips noise (nav, ads, scripts) and feeds clean markdown to the AI. Streaming responses appear in real time.
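The extraction step can be sketched in a few lines. This is an illustrative simplification, not the extension's actual extractor: the real pipeline produces markdown, while this toy version just drops noisy elements and flattens the rest to text.

```typescript
// Hypothetical sketch: strip noisy elements from raw HTML before
// handing the remainder to the model. The tag list and the regex
// approach are illustrative assumptions, not the real implementation.
const NOISE_TAGS = ["script", "style", "nav", "aside", "footer"];

function stripNoise(html: string): string {
  let clean = html;
  for (const tag of NOISE_TAGS) {
    // Remove each noisy element together with its contents.
    clean = clean.replace(new RegExp(`<${tag}[\\s\\S]*?</${tag}>`, "gi"), "");
  }
  // Drop remaining tags and collapse whitespace into plain text.
  return clean.replace(/<[^>]+>/g, " ").replace(/\s+/g, " ").trim();
}
```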

🕸

Full Site RAG

Auto-discovers sitemaps, crawls up to 150 pages, chunks and embeds everything on your GPU. Ask questions and get answers with source citations.
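The chunking stage of a RAG pipeline like this can be sketched as fixed-size windows with overlap, so context is not lost at chunk boundaries. The sizes below are illustrative assumptions, not the extension's actual parameters:

```typescript
// Sketch of fixed-size chunking with overlap, a common RAG technique.
// Each chunk shares `overlap` characters with its predecessor so that
// sentences spanning a boundary still appear whole in one chunk.
function chunkText(text: string, size = 512, overlap = 64): string[] {
  const chunks: string[] = [];
  const step = size - overlap; // advance less than `size` to overlap
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last window reached the end
  }
  return chunks;
}
```

Each chunk would then be embedded on the GPU and stored in the vector index alongside its source URL for citations.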

🧠

16 Local AI Models

From 280 MB ultra-light to 2.3 GB high-quality. The extension benchmarks your GPU and recommends the best model for your hardware.

🔌

4 LLM Backends

WebGPU (default), Ollama, built-in Gemini Nano, or Claude API if you have no GPU. Switch with one click — the entire experience stays the same.

🔍

Semantic Search

Search across all indexed sites by meaning, not just keywords. Find what you're looking for even if you don't remember the exact words.
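Searching "by meaning" typically means ranking stored embeddings by cosine similarity to the query's embedding. A minimal sketch, using toy 3-dimensional vectors (real embedding models produce hundreds of dimensions):

```typescript
// Cosine similarity: 1.0 for identical directions, 0 for orthogonal.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank stored document vectors against a query vector, best first.
function topK(
  query: number[],
  docs: { id: string; vec: number[] }[],
  k = 3
): string[] {
  return [...docs]
    .sort((x, y) => cosine(query, y.vec) - cosine(query, x.vec))
    .slice(0, k)
    .map((d) => d.id);
}
```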

📦

Data Portability

Export your entire knowledge base as a ZIP — embeddings, conversations, settings. Import on another machine to pick up where you left off.
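Conceptually, the export is a lossless serialization of the knowledge base. A sketch of what such a bundle might look like — field names here are illustrative assumptions, not the extension's actual schema, and the real export is a ZIP rather than a single JSON string:

```typescript
// Hypothetical export bundle shape: everything needed to restore the
// knowledge base on another machine.
interface ExportBundle {
  version: number;
  embeddings: { docId: string; vector: number[] }[];
  conversations: { id: string; messages: string[] }[];
  settings: Record<string, string>;
}

function exportBundle(bundle: ExportBundle): string {
  return JSON.stringify(bundle);
}

function importBundle(json: string): ExportBundle {
  return JSON.parse(json) as ExportBundle;
}
```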

The competition requires your data

Every other "chat with a page" tool sends your content to external servers.

| Capability   | This Extension                | Everyone Else                  |
| ------------ | ----------------------------- | ------------------------------ |
| AI inference | Your GPU via WebGPU           | OpenAI / Anthropic servers     |
| Your data    | Never leaves your browser     | Uploaded to third-party clouds |
| After install| Works offline, forever        | Breaks without API key         |
| Cost         | Free core, $9 one-time        | $10–20/month subscriptions     |
| Setup        | Click install. Done.          | API keys, accounts, config     |
| Models       | 16 local + 3 remote backends  | Whatever the vendor gives you  |

16 models, ranked for your hardware

The extension profiles your GPU on first run and recommends the best fit. Models that won't fit are automatically flagged.

Granite 4.0 Micro 3B
2.3 GB · Best quality
Phi-3.5 Mini 3.8B
2.0 GB · Reasoning
Granite 4.0 1B
2.15 GB · Balanced
SmolLM2 1.7B
1.65 GB · Fast
Pleias RAG 1B
1.45 GB · RAG specialist
LFM2.5 1.2B Thinking
800 MB · Best ratio
DeepSeek Coder 1.3B
1 GB · Code-focused
Qwen2.5 1.5B
2 GB · General
Llama 3.2 1B
1.9 GB · General
Gemma 3 1B
1.95 GB · General
Qwen3 0.6B
1.05 GB · Lightweight
Qwen2.5 0.5B
420 MB · Lightweight
Granite 4.0 350M
280 MB · Ultra-light
SmolLM 360M
300 MB · Ultra-light

Free forever. Premium if you want more.

The core experience is completely free. The sponsor tier is a one-time, pay-what-you-want payment that supports development.

Free

$0
forever
  • Page Q&A with streaming
  • Full site RAG & search
  • All 16 local AI models
  • Ollama / Gemini / Claude
  • Theme & appearance
  • Conversation persistence
  • Data export & import
Install Free

Sponsor

$9+
pay what you want
  • Everything in Free
  • Same features, bigger impact
  • Fund ongoing development
  • Eternal gratitude
Become a Sponsor

Built for performance and privacy

Clean architecture, modern stack, zero runtime overhead.

WebGPU Inference

ONNX Runtime Web + @huggingface/transformers. Models run as native GPU compute shaders — no WASM overhead for inference.

Svelte 5 UI

Compiles to vanilla JS at build time. Zero runtime, zero virtual DOM. The popup and widget add virtually nothing to page weight.

Provider Architecture

Every subsystem (LLM, embeddings, vector store, crawler) is defined by a TypeScript interface. Swap any implementation with one line.
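The provider pattern described above might look like the sketch below. The names (`LlmProvider`, `EchoProvider`) are illustrative, not the extension's actual identifiers; the point is that every backend satisfies one interface, so the call site never changes.

```typescript
// One interface per subsystem; every backend implements it.
interface LlmProvider {
  generate(prompt: string): Promise<string>;
}

// A trivial stand-in backend used here for illustration.
class EchoProvider implements LlmProvider {
  async generate(prompt: string): Promise<string> {
    return `echo: ${prompt}`;
  }
}

// Swapping backends (WebGPU, Ollama, Gemini Nano, Claude) is one line:
const provider: LlmProvider = new EchoProvider();
```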

Web Workers

LLM inference, embedding generation, and site crawling each run in dedicated workers. The UI stays at 60fps no matter what.

Common questions

How much data does the model download?

The default model is ~600 MB. It downloads once from Hugging Face's CDN and caches permanently. Smaller models start at ~280 MB.

Does it work offline?

Yes. After the initial model download, all AI inference runs locally. Querying already-indexed sites works fully offline. Site crawling requires network access.

Is my data sent anywhere?

No. Page content, questions, and AI responses stay in your browser. The only network calls: model download (once), optional license refresh (weekly, paid users), and optional anonymous telemetry (opt-in).

Is the paid license a subscription?

No. One-time payment of $9. No recurring charges, no expiration. The license key works forever.

What browsers are supported?

Chrome 113+ with WebGPU. Edge (Chromium-based) works with sideloading. Without WebGPU, it falls back to CPU inference, or you can connect to Ollama / Claude API.

Why is inference slow on my machine?

Try a smaller model in Settings. The extension benchmarks your GPU and recommends a model, but integrated GPUs may struggle with larger models. You can also connect to a local Ollama server for better performance.