A browser extension that turns any page, local document, or entire site into a conversation. AI runs on your GPU via WebGPU. No cloud, no API keys, no accounts; no data leaves your machine.
No configuration, no sign-ups, no API keys. Just install and ask.
Add it from the Chrome Web Store or load from source. One click.
First launch downloads ~600 MB to your local cache. After that, it's instant forever.
Click the floating widget on any page. Ask questions, get AI-powered answers with citations.
Switch to Site mode to crawl, index, and search entire websites as a knowledge base.
Switch to Vault mode to index and search your local documents as a knowledge base.
Three powerful modes, one seamless experience.
Ask questions about any webpage. Smart content extraction strips noise (nav, ads, scripts) and feeds clean markdown to the AI. Streaming responses appear in real time.
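The noise-stripping step can be sketched roughly as follows. This is a simplified illustration, not the extension's actual extractor: the `NOISE_TAGS` list and the regex-based approach are assumptions (a real implementation would walk the DOM rather than use regexes on HTML).

```typescript
// Strip noisy elements before handing page text to the model.
// NOISE_TAGS is an illustrative list, not the extension's actual set.
const NOISE_TAGS = ["nav", "aside", "script", "style", "footer", "iframe"];

function stripNoise(html: string): string {
  let cleaned = html;
  for (const tag of NOISE_TAGS) {
    // Remove each noisy element and its contents (non-nested, for brevity).
    const re = new RegExp(`<${tag}[^>]*>[\\s\\S]*?</${tag}>`, "gi");
    cleaned = cleaned.replace(re, "");
  }
  // Collapse the remaining markup to plain text.
  return cleaned.replace(/<[^>]+>/g, " ").replace(/\s+/g, " ").trim();
}
```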
Auto-discovers sitemaps, crawls up to 150 pages, chunks and embeds everything on your GPU. Ask questions and get answers with source citations.
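The chunking step before embedding might look roughly like this; the chunk size and overlap values are illustrative assumptions, not the extension's real defaults:

```typescript
// Split extracted page text into overlapping chunks for embedding.
// chunkSize/overlap are illustrative values, not the real defaults.
function chunkText(text: string, chunkSize = 400, overlap = 50): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break;
    // Step forward, keeping some overlap so no sentence is cut off from context.
    start += chunkSize - overlap;
  }
  return chunks;
}
```

Overlap between adjacent chunks keeps context that straddles a boundary retrievable from either side.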
From 280 MB ultra-light to 2.3 GB high-quality. The extension benchmarks your GPU and recommends the best model for your hardware.
WebGPU (default), Ollama, built-in Gemini Nano, or Claude API if you have no GPU. Switch with one click — the entire experience stays the same.
Search across all indexed sites by meaning, not just keywords. Find what you're looking for even if you don't remember the exact words.
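Meaning-based search boils down to comparing embedding vectors. A minimal sketch, assuming cosine similarity over stored chunk embeddings (the `topK` helper and its shape are illustrative, not the extension's API):

```typescript
// Cosine similarity between two embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return the k chunks whose embeddings are closest in meaning to the query.
function topK(query: number[], docs: { id: string; vec: number[] }[], k = 3) {
  return docs
    .map(d => ({ id: d.id, score: cosine(query, d.vec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```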
Export your entire knowledge base as a ZIP — embeddings, conversations, settings. Import on another machine to pick up where you left off.
Every other "chat with a page" tool sends your content to external servers.
| Capability | This Extension | Everyone Else |
|---|---|---|
| AI inference | Your GPU via WebGPU | OpenAI / Anthropic servers |
| Your data | Never leaves your browser | Uploaded to third-party clouds |
| After install | Works offline, forever | Breaks without API key |
| Cost | Free core, $9 one-time | $10–20/month subscriptions |
| Setup | Click install. Done. | API keys, accounts, config |
| Models | 16 local + 3 remote backends | Whatever the vendor gives you |
The extension profiles your GPU on first run and recommends the best fit. Models that won't fit are automatically flagged.
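The fit check can be sketched as a simple memory-budget filter. The headroom factor and model list below are illustrative assumptions, not the extension's actual heuristics:

```typescript
// Flag models whose size exceeds an estimated GPU memory budget.
// The 1.5x headroom factor is an assumed value, not the real heuristic.
interface ModelInfo { name: string; sizeMB: number }

function recommend(models: ModelInfo[], gpuMemMB: number) {
  const budget = gpuMemMB / 1.5; // leave headroom for activations and KV cache
  const fits = models.filter(m => m.sizeMB <= budget);
  const flagged = models.filter(m => m.sizeMB > budget);
  // Recommend the largest model that still fits.
  const best = fits.sort((a, b) => b.sizeMB - a.sizeMB)[0] ?? null;
  return { best, flagged };
}
```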
The core experience is completely free. Premium unlocks data portability with a one-time payment.
Clean architecture, modern stack, zero runtime overhead.
ONNX Runtime Web + @huggingface/transformers. Models run as native GPU compute shaders — no WASM overhead for inference.
Compiles to vanilla JS at build time. Zero runtime, zero virtual DOM. The popup and widget add virtually nothing to page weight.
Every subsystem (LLM, embeddings, vector store, crawler) is defined by a TypeScript interface. Swap any implementation with one line.
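The interface-driven design might look like this sketch; the interface name, method signature, and both backend classes are illustrative stand-ins, not the extension's actual code:

```typescript
// Each subsystem is defined by an interface; implementations are swappable.
interface EmbeddingBackend {
  embed(texts: string[]): Promise<number[][]>;
}

class WebGpuEmbedder implements EmbeddingBackend {
  async embed(texts: string[]): Promise<number[][]> {
    // A real version would run an ONNX model on the GPU; stubbed here.
    return texts.map(t => [t.length]);
  }
}

class OllamaEmbedder implements EmbeddingBackend {
  async embed(texts: string[]): Promise<number[][]> {
    // A real version would call a local Ollama server; stubbed here.
    return texts.map(() => [0]);
  }
}

// Swapping implementations is a one-line change:
const backend: EmbeddingBackend = new WebGpuEmbedder();
```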
LLM inference, embedding generation, and site crawling each run in dedicated workers. The UI stays at 60fps no matter what.
The default model is ~600 MB. It downloads once from Hugging Face's CDN and caches permanently. Smaller models start at ~280 MB.
Yes. After the initial model download, all AI inference runs locally. Querying already-indexed sites works fully offline. Site crawling requires network access.
No. Page content, questions, and AI responses stay in your browser. The only network calls: model download (once), optional license refresh (weekly, paid users), and optional anonymous telemetry (opt-in).
No. One-time payment of $9. No recurring charges, no expiration. The license key works forever.
Chrome 113+ with WebGPU. Edge (Chromium-based) works with sideloading. Without WebGPU, it falls back to CPU inference, or you can connect to Ollama / Claude API.
Try a smaller model in Settings. The extension benchmarks your GPU and recommends a model, but integrated GPUs may struggle with larger models. You can also connect to a local Ollama server for better performance.