Thirty-nine models, five providers, one question: which one is right for you?
Daneel supports AI models that run on your GPU, on your local Ollama server, through Chrome's built-in Gemini Nano, via the Claude API, or on Azure OpenAI. That's a lot of choice — and choice without guidance is just confusion.
The Model Registry changes that. It's a unified catalog of every model Daneel supports, paired with a hardware evaluation engine that tells you exactly which ones will run well on your machine, which ones will struggle, and which ones won't fit at all.
No more guessing. No more GPU out-of-memory errors halfway through a conversation.
One registry, five providers
The registry ships with 39 models across every backend Daneel supports:
- WebGPU: runs entirely on your GPU, from the 350M-parameter Granite 4.0 Micro to the 3.8B Phi-3.5 Mini. No network, no API keys, no cost.
- Ollama: local server inference, including heavyweight options like Llama 3.1 70B for machines that can handle it
- Claude: Anthropic's API, from the fast Haiku 4.5 to the flagship Opus 4.6
- Azure Foundry: your Azure Foundry-deployed models, such as GPT-5.4, GPT-5.4-mini, GPT-4o, GPT-4.1, Phi-4, and more
- Gemini Nano: Chrome's built-in on-device AI, zero setup
Every model carries the same structured metadata: capabilities (tool calling, thinking, streaming, vision), cost, context window, quality tier, license, hardware requirements, and a full privacy profile. One format, every provider.
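To make that concrete, here is a rough sketch of what one registry entry could look like as a TypeScript type. The field names and shapes are illustrative assumptions, not Daneel's actual schema.

```typescript
// Illustrative sketch of a unified registry entry; field names are
// assumptions, not Daneel's actual schema.
type Provider = "webgpu" | "ollama" | "claude" | "azure-foundry" | "gemini-nano";
type Residency = "on-device" | "local-network" | "your-cloud" | "third-party-cloud";

interface ModelEntry {
  id: string;
  provider: Provider;
  name: string;
  description: string;
  capabilities: {
    toolCalling: boolean;
    thinking: boolean;
    streaming: boolean;
    vision: boolean;
  };
  contextWindow: number;                                    // tokens
  costPerMTokens: { input: number; output: number } | null; // null = free / local
  qualityTier: 1 | 2 | 3 | 4 | 5;
  license: string;
  hardware?: {                    // only meaningful for local backends
    minVramGb: number;
    requiresShaderF16: boolean;
  };
  privacy: {
    residency: Residency;
    weightsAuditable: boolean;    // open weights vs. proprietary
    observers: string[];          // who can see your prompts
  };
}
```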
Your hardware, measured
Before recommending anything, Daneel needs to know what you're working with. The hardware probe runs automatically in the background and measures four things:
- GPU identity — vendor, architecture, discrete vs. integrated, shader-f16 support
- VRAM — estimated from WebGPU buffer limits, snapped to known hardware tiers (4 GB, 8 GB, 16 GB, etc.)
- Compute power — a real GPU benchmark: 512×512 matrix multiplication, three timed runs, median result in GFLOPS
- Memory bandwidth — 64 MB buffer copy across ten iterations, measuring actual throughput in GB/s
The results appear in a hardware card at the top of the model browser — your GPU name, VRAM, RAM, compute power, and whether your hardware supports fp16 shaders. You can re-probe anytime with one click.
This isn't a spec sheet lookup. It's a live measurement of what your GPU can actually do right now.
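The arithmetic behind those two headline numbers is simple. Below is a minimal sketch of how raw timings could be turned into GFLOPS and GB/s; the WebGPU dispatch and buffer plumbing is omitted, and the helper names are made up for illustration.

```typescript
// Converts raw benchmark timings into the two headline numbers.
// The WebGPU dispatch itself is omitted; these helpers only do the math.

// A 512x512 by 512x512 matmul performs roughly 2 * N^3 floating-point operations.
function gflopsFromMatmulTimings(timingsMs: number[], n = 512): number {
  const sorted = [...timingsMs].sort((a, b) => a - b);
  const medianMs = sorted[Math.floor(sorted.length / 2)]; // median of the three runs
  const flops = 2 * n ** 3;
  return flops / (medianMs / 1000) / 1e9;
}

// Ten copies of a 64 MB buffer: total bytes moved over total elapsed time.
function bandwidthGBps(totalMs: number, bufferBytes = 64 * 1024 * 1024, iterations = 10): number {
  const bytesMoved = bufferBytes * iterations;
  return bytesMoved / (totalMs / 1000) / 1e9;
}
```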
Models that fit
With your hardware measured, every model in the registry gets an evaluation. The engine checks hard constraints first:
- Does it fit? Peak GPU memory is compared against your VRAM budget: 75% of VRAM on discrete GPUs, 50% on integrated GPUs, with a hard cap at 35% of system RAM for shared-memory GPUs
- Does it need fp16? Models requiring shader-f16 are marked incompatible if your GPU doesn't support it
- Will it be fast enough? Estimated tokens-per-second from your measured memory bandwidth and the model's size
Each model gets a status: compatible (runs well), marginal (fits but tight — expect reduced context or slower inference), or incompatible (won't run on your hardware). Marginal models include specific warnings: "Requires 16 GB VRAM; you have 8 GB" or "Expect slow inference below recommended GFLOPS."
The engine also scales effective context windows. If memory headroom drops below 30%, the available context shrinks proportionally — because the KV cache grows with context length, and an OOM error mid-conversation helps nobody.
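Putting those pieces together, here is a hedged sketch of what the evaluation could look like. The budget percentages and the 30% headroom threshold come from this post; the names, structure, and exact scaling formula are illustrative, not the engine's real implementation.

```typescript
type Status = "compatible" | "marginal" | "incompatible";

interface Hardware {
  vramGb: number;
  systemRamGb: number;
  discreteGpu: boolean;
  sharedMemory: boolean;
  shaderF16: boolean;
}

interface ModelRequirements {
  peakMemoryGb: number;       // estimated peak GPU memory at full context
  requiresShaderF16: boolean;
  maxContextTokens: number;
}

interface Evaluation {
  status: Status;
  effectiveContext: number;
  warnings: string[];
}

function evaluate(hw: Hardware, model: ModelRequirements): Evaluation {
  const warnings: string[] = [];

  // VRAM budget: 75% on discrete GPUs, 50% on integrated,
  // capped at 35% of system RAM when GPU memory is shared.
  let budgetGb = hw.vramGb * (hw.discreteGpu ? 0.75 : 0.5);
  if (hw.sharedMemory) budgetGb = Math.min(budgetGb, hw.systemRamGb * 0.35);

  if (model.requiresShaderF16 && !hw.shaderF16) {
    return { status: "incompatible", effectiveContext: 0,
             warnings: ["Requires shader-f16, which this GPU does not support"] };
  }
  if (model.peakMemoryGb > budgetGb) {
    return { status: "incompatible", effectiveContext: 0,
             warnings: [`Needs ~${model.peakMemoryGb} GB; budget is ${budgetGb.toFixed(1)} GB`] };
  }

  // Scale the context window when headroom drops below 30%: the KV cache grows
  // with context length, so a tight fit gets a proportionally smaller window.
  const headroom = 1 - model.peakMemoryGb / budgetGb;
  let effectiveContext = model.maxContextTokens;
  let status: Status = "compatible";
  if (headroom < 0.3) {
    effectiveContext = Math.floor(model.maxContextTokens * (headroom / 0.3));
    status = "marginal";
    warnings.push("Tight fit: expect a reduced context window or slower inference");
  }

  return { status, effectiveContext, warnings };
}
```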
Find my best model
Don't want to browse? The Model Wizard gets you to the right model in two steps.
Step 1 — What matters most? Pick one of four priorities:
- Balanced — good quality with reasonable privacy and cost
- Privacy first — everything stays on your device, no exceptions
- Best quality — most capable models regardless of where they run
- Cost conscious — free and local models only, no API costs
Each priority maps to a scoring preset that weights quality, capabilities, privacy, and cost differently. Privacy-first puts 50% weight on data residency. Quality-first puts 55% on quality tier and 30% on capabilities.
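As data, those presets might look something like the sketch below. The privacy-first and best-quality figures are the ones quoted above; the remaining numbers are placeholders chosen only so each row sums to 1.

```typescript
// Scoring presets: each weight vector sums to 1 across the four dimensions.
// Privacy-first and best-quality figures come from the post; the rest are placeholders.
interface Weights { quality: number; capabilities: number; privacy: number; cost: number; }

const presets: Record<string, Weights> = {
  "balanced":       { quality: 0.35, capabilities: 0.25, privacy: 0.20, cost: 0.20 },
  "privacy-first":  { quality: 0.25, capabilities: 0.15, privacy: 0.50, cost: 0.10 },
  "best-quality":   { quality: 0.55, capabilities: 0.30, privacy: 0.05, cost: 0.10 },
  "cost-conscious": { quality: 0.25, capabilities: 0.20, privacy: 0.15, cost: 0.40 },
};

// Composite score: each dimension is scored 0..1 per model, then weighted.
function compositeScore(scores: Weights, w: Weights): number {
  return scores.quality * w.quality + scores.capabilities * w.capabilities
       + scores.privacy * w.privacy + scores.cost * w.cost;
}
```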
Step 2 — Your top picks. The wizard runs the recommendation engine against your hardware and selected priority, then presents up to five models ranked by composite score. The top pick gets highlighted styling. Each result shows its match score, compatibility status, privacy level, effective context window, cost, and capability badges. One click takes you to the provider's settings tab to configure it.
Browse and filter
The full model browser gives you everything. A search bar matches across model names, descriptions, IDs, and providers. Three filter dropdowns narrow by provider, privacy floor, and required capability.
Every model appears as an expandable card. Collapsed: provider icon, name, privacy badge, compatibility status, context window, cost, and capability pills. Expanded: description, quality stars, license, detailed hardware evaluation with warnings, estimated tokens-per-second, and a direct link to configure the model in its provider's settings.
Models re-score in real time as you change filters. The summary bar always shows how many models match and how many are compatible with your hardware.
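The search itself is plain substring matching. Here is an illustrative predicate; the field list mirrors what the search bar covers, but the function and type names are made up.

```typescript
// Illustrative search predicate: case-insensitive substring match across
// the fields the model browser searches.
interface Searchable { name: string; description: string; id: string; provider: string; }

function matchesSearch(model: Searchable, query: string): boolean {
  const q = query.trim().toLowerCase();
  if (!q) return true; // an empty query matches everything
  return [model.name, model.description, model.id, model.provider]
    .some(field => field.toLowerCase().includes(q));
}
```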
Privacy as a first-class filter
Every model carries a privacy profile with four data residency levels:
- On-device — never leaves your browser (WebGPU)
- Local network — leaves the browser but stays on your machine (Ollama)
- Your cloud — goes to your organization's cloud tenant (Azure OpenAI)
- Third-party cloud — reaches an external API provider (Claude, OpenAI)
The privacy filter works as a floor: select "Local network" and you'll see on-device and LAN models, but not cloud. Select "On-device only" and only WebGPU and Gemini Nano remain.
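That floor behaviour falls out naturally if the residency levels are kept as an ordered list. A minimal sketch, with illustrative names:

```typescript
// Residency levels ordered from most to least private; the filter acts as a floor.
const residencyOrder = ["on-device", "local-network", "your-cloud", "third-party-cloud"] as const;
type Residency = (typeof residencyOrder)[number];

// A model passes if its residency is at least as private as the selected floor.
function passesPrivacyFloor(model: Residency, floor: Residency): boolean {
  return residencyOrder.indexOf(model) <= residencyOrder.indexOf(floor);
}

// passesPrivacyFloor("on-device", "local-network")         -> true
// passesPrivacyFloor("third-party-cloud", "local-network") -> false
```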
Each model also declares whether the weights are auditable (open-source vs. proprietary) and exactly who can observe your data. No ambiguity.
Ollama: your models, recognized
If you run Ollama, Daneel doesn't just show the registry's suggested models — it recognizes what you already have installed.
The merge engine pulls your installed model list from Ollama's API and matches each one against the registry in three tiers: exact tag match (high confidence), family match (same model family, different size — metadata inherited with warnings), or unknown (no match — capabilities inferred from parameter count using conservative heuristics).
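A sketch of that three-tier matching is shown below. It assumes Ollama-style tags like `llama3.1:70b`, with the family taken as the part before the colon; the real matcher may key models differently, and the names here are illustrative.

```typescript
type MatchTier = "exact" | "family" | "unknown";

interface RegistryModel { tag: string; family: string; /* plus the rest of the metadata */ }

interface MergedModel {
  tag: string;
  tier: MatchTier;
  confidence: "high" | "medium" | "low";
  warnings: string[];
}

function matchInstalledModel(installedTag: string, registry: RegistryModel[]): MergedModel {
  // Tier 1: exact tag match -- inherit the registry entry directly.
  const exact = registry.find(m => m.tag === installedTag);
  if (exact) return { tag: installedTag, tier: "exact", confidence: "high", warnings: [] };

  // Tier 2: same family, different size -- inherit metadata with a warning.
  const family = installedTag.split(":")[0];
  const related = registry.find(m => m.family === family);
  if (related) {
    return { tag: installedTag, tier: "family", confidence: "medium",
             warnings: ["Metadata inherited from a different size of this model family"] };
  }

  // Tier 3: unknown -- capabilities inferred conservatively from parameter count.
  return { tag: installedTag, tier: "unknown", confidence: "low",
           warnings: ["Not in the registry; capabilities estimated from parameter count"] };
}
```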
The result: your installed models appear at the top with full metadata, followed by recommended models you might want to pull. You see what you have, what it can do, and what else is available — all in one view.
Always fresh, always available
The registry loads through a three-tier strategy:
- Chrome storage cache (24-hour TTL) — instant load, works offline
- Backend API (3-second timeout) — fresh data from the server, cached on success
- Bundled fallback — compiled into the extension package, guaranteed available
You never see a loading spinner on repeat visits. The cache serves immediately while a background refresh checks for updates. If the network is down, the bundled fallback ensures the full registry is always available.
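Here is roughly how that strategy can be wired up in a Chrome extension. The URL, cache key, and bundled JSON import are placeholders, not Daneel's real values; only the three-tier order and the timeouts come from this post.

```typescript
import bundledRegistry from "./registry.json"; // bundled fallback (placeholder path)

// Placeholders, not Daneel's real endpoint or cache key.
const REGISTRY_URL = "https://example.com/registry.json";
const CACHE_KEY = "modelRegistry";
const CACHE_TTL_MS = 24 * 60 * 60 * 1000; // 24-hour TTL

async function loadRegistry(): Promise<unknown> {
  // 1. Chrome storage cache: instant load, works offline.
  const stored = await chrome.storage.local.get(CACHE_KEY);
  const entry = stored[CACHE_KEY] as { data: unknown; fetchedAt: number } | undefined;
  if (entry && Date.now() - entry.fetchedAt < CACHE_TTL_MS) {
    void refreshInBackground(); // serve immediately, check for updates behind the scenes
    return entry.data;
  }

  // 2. Backend API with a 3-second timeout; cache on success.
  try {
    const res = await fetch(REGISTRY_URL, { signal: AbortSignal.timeout(3000) });
    if (res.ok) {
      const data = await res.json();
      await chrome.storage.local.set({ [CACHE_KEY]: { data, fetchedAt: Date.now() } });
      return data;
    }
  } catch {
    // network down or timed out; fall through to the bundled copy
  }

  // 3. Bundled fallback, compiled into the extension package.
  return bundledRegistry;
}

async function refreshInBackground(): Promise<void> {
  try {
    const res = await fetch(REGISTRY_URL, { signal: AbortSignal.timeout(3000) });
    if (!res.ok) return;
    const data = await res.json();
    await chrome.storage.local.set({ [CACHE_KEY]: { data, fetchedAt: Date.now() } });
  } catch {
    // ignore; the cached or bundled copy has already been served
  }
}
```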
What's next
We're working on per-model download tracking for WebGPU (so you can see which models are already cached), automatic model suggestions when hardware changes (new GPU? new recommendations), and deeper Ollama integration with pull progress tracking in the browser.
The right model for you depends on your hardware, your priorities, and your privacy requirements. Now Daneel can figure that out for you.