Local models, browser convenience

Running AI on your own hardware used to mean terminal commands, Python scripts, and a lot of configuration. Ollama made local inference approachable. Daneel makes it seamless.

With Ollama support, you can run models like Llama, Mistral, Qwen, Phi, and Granite entirely on your machine — and interact with them through the same browser interface you use for everything else. No API keys, no cloud, no usage fees. Just your GPU doing the work.

Connect in seconds

Open Settings, go to the Ollama tab, and you're looking at a connection panel with one field: the base URL. The default is http://localhost:11434 — if Ollama is running, Daneel finds it automatically.

The auto-probe runs silently when you open settings. If Ollama is reachable, you'll see a green "Connected" status with your installed model count. If it's not running, the panel stays quiet — no error spam on first visit.
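For the curious, the probe itself is nothing exotic. Here is a minimal sketch (not Daneel's actual source) of what an auto-probe like this can look like in extension code: it calls Ollama's /api/tags endpoint with a short timeout and treats any failure as "not connected" rather than an error.

// Illustrative auto-probe, not Daneel's actual source.
// GET /api/tags lists installed models; a short timeout keeps the settings panel snappy.
async function probeOllama(baseUrl = "http://localhost:11434") {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), 2000); // give up quietly after 2s
  try {
    const res = await fetch(`${baseUrl}/api/tags`, { signal: controller.signal });
    if (!res.ok) return { connected: false, modelCount: 0 };
    const data = await res.json();
    return { connected: true, modelCount: data.models?.length ?? 0 };
  } catch {
    // Not running or unreachable: report a quiet "not connected", never an error toast.
    return { connected: false, modelCount: 0 };
  } finally {
    clearTimeout(timer);
  }
}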

If you hit a 403 error, Daneel knows exactly what happened: Chrome extensions send an Origin header that Ollama blocks by default. Instead of a cryptic error, you get step-by-step instructions to restart Ollama with the right environment variable:

OLLAMA_ORIGINS=chrome-extension://* ollama serve

One restart with that variable set and the 403 disappears. To make the fix stick, set OLLAMA_ORIGINS wherever Ollama is launched from (your shell profile, a systemd unit, or the desktop app's environment) so it applies every time the server starts.

Your model library, in the browser

Once connected, the installed models section shows everything Ollama has downloaded. Each model appears with its name, parameter count, quantization level, and capability badges — green for tool calling, purple for thinking, blue when it's the active model.

Select a model and a details card appears: family, parameter size, quantization, context length, and capabilities. This metadata comes from two sources — Ollama's own API and Daneel's model registry, which enriches installed models with quality scores, descriptions, and hardware evaluations.
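Here is a rough sketch of where the Ollama side of that metadata comes from, assuming the documented /api/show endpoint. The function name and return shape are illustrative, and older Ollama versions expect a name field instead of model in the request body.

// Illustrative sketch of fetching per-model metadata from Ollama.
async function getModelDetails(baseUrl: string, model: string) {
  const res = await fetch(`${baseUrl}/api/show`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model }), // older Ollama versions expect { name: model }
  });
  const info = await res.json();
  return {
    family: info.details?.family,                   // e.g. "llama"
    parameterSize: info.details?.parameter_size,    // e.g. "8.0B"
    quantization: info.details?.quantization_level, // e.g. "Q4_K_M"
    capabilities: info.capabilities ?? [],          // recent versions: ["completion", "tools", ...]
  };
}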

Below your installed models, a recommendations section suggests popular models you haven't pulled yet, sorted by quality and filtered to lightweight options that are likely to run well on your hardware.

Pull, delete, manage

You don't need the terminal to manage models. The settings panel gives you full lifecycle control:

Pull — type any model name from the Ollama library (or click a suggested model) and hit Pull. A live progress bar streams download status in real time: bytes downloaded, total size, percentage. Large models download in the background while you keep browsing.

Delete — hover over any installed model and click the trash icon. The model is removed from Ollama's storage immediately.

Running models — a dedicated section shows which models are currently loaded in VRAM, with memory usage, quantization info, and an unload countdown timer. You can see exactly what's consuming your GPU right now.

Test query — before committing to a model, type a quick question in the test box. Tokens stream back in real time with a blinking cursor, confirming the model works and giving you a feel for its speed and quality.
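To make the Pull flow above concrete, here is a hedged sketch of pulling a model with live progress. It assumes Ollama's streaming /api/pull endpoint, which emits one JSON object per line with status, total, and completed fields; the function and callback names are illustrative.

// Illustrative model pull with live progress, not Daneel's actual code.
async function pullModel(baseUrl: string, model: string, onProgress: (pct: number) => void) {
  const res = await fetch(`${baseUrl}/api/pull`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, stream: true }), // older Ollama versions expect { name: model }
  });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep any partial trailing line for the next chunk
    for (const line of lines) {
      if (!line.trim()) continue;
      let chunk: any;
      try { chunk = JSON.parse(line); } catch { continue; } // skip malformed lines
      // Layer downloads report { status, digest, total, completed }.
      if (chunk.total && typeof chunk.completed === "number") {
        onProgress(Math.round((chunk.completed / chunk.total) * 100));
      }
    }
  }
}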

Streaming conversations

Ollama serves responses as newline-delimited JSON: each line is a small JSON object carrying the next fragment of the reply. Daneel reads this stream and renders tokens as they arrive, giving you the same real-time typing experience you'd expect from a cloud API.

The streaming pipeline handles edge cases cleanly: malformed lines are skipped, the final completion chunk is detected and processed, and the UI stays responsive even with fast models generating hundreds of tokens per second.
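Under those assumptions about the stream format, the per-line handling can stay small. This sketch (illustrative, not Daneel's code) shows the edge cases from the paragraph above: blank and malformed lines are skipped, content is appended as it arrives, and the done flag signals the final chunk. The surrounding read loop follows the same line-splitting pattern as the pull sketch earlier.

// Illustrative per-line handler for Ollama's /api/chat stream.
// Each line is one JSON object; the final chunk has done: true and carries stats instead of text.
function handleChatLine(line: string, appendToken: (text: string) => void): boolean {
  if (!line.trim()) return false; // ignore blank lines between chunks
  let chunk: any;
  try {
    chunk = JSON.parse(line);
  } catch {
    return false; // malformed line: skip it and keep streaming
  }
  if (chunk.message?.content) appendToken(chunk.message.content);
  return chunk.done === true; // true means the response is complete
}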

Tool calling — MCP servers work with Ollama

This is where Ollama goes beyond basic chat. Models that support function calling — Qwen 3, Llama 3.1, Mistral, and others — can use MCP tools just like Claude does.

Connect Stripe, Vercel, Supabase, or any MCP server, select a tool-capable Ollama model, and ask a question that needs external data. Daneel formats the tools in OpenAI-compatible function calling format, sends them to Ollama, parses the tool call response, executes the tools against the MCP servers, and feeds results back for the next turn.

The multi-turn tool loop works identically to cloud providers. The only difference is that the LLM reasoning happens on your GPU instead of someone else's server.
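Here is an approximate picture of a single tool-calling turn, using Ollama's OpenAI-style tools parameter on /api/chat. The get_weather schema and executeMcpTool are hypothetical placeholders for whatever tools your MCP servers actually expose; a real loop would repeat this turn until the model stops requesting tools.

// Illustrative single tool-calling turn; get_weather and executeMcpTool are hypothetical.
const tools = [{
  type: "function",
  function: {
    name: "get_weather",
    description: "Get the current weather for a city",
    parameters: {
      type: "object",
      properties: { city: { type: "string" } },
      required: ["city"],
    },
  },
}];

// Hypothetical stand-in for dispatching a call to the matching MCP server.
async function executeMcpTool(name: string, args: unknown): Promise<unknown> {
  return { name, args, result: "..." };
}

async function toolTurn(baseUrl: string, model: string, messages: any[]) {
  const res = await fetch(`${baseUrl}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, messages, tools, stream: false }),
  });
  const { message } = await res.json();
  messages.push(message);
  // Ollama returns OpenAI-style tool calls with already-parsed argument objects.
  for (const call of message.tool_calls ?? []) {
    const result = await executeMcpTool(call.function.name, call.function.arguments);
    // Feed the result back as a "tool" message so the next turn can use it.
    messages.push({ role: "tool", content: JSON.stringify(result) });
  }
  return messages;
}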

Capability badges in the model list make it clear which models support tools, so you know before you start a conversation.

Thinking models, handled

Reasoning models like Qwen 3 wrap their chain-of-thought in <think>...</think> blocks. Useful for the model, noisy for you. Daneel strips these automatically — a stateful parser detects think tags across token boundaries during streaming and only surfaces the final answer.

The stripping works for both explicit think blocks and the implicit reasoning format some models use. You get clean output regardless of model quirks.
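As a sketch of how stripping across token boundaries can work (an illustration, not Daneel's parser), a small state machine is enough: buffer incoming tokens, hold back any text that could be the start of a split tag, and drop everything between <think> and </think>.

// Illustrative stateful stripper that survives tags split across streamed tokens.
class ThinkStripper {
  private buffer = "";
  private inThink = false;

  // Feed one streamed token, get back whatever visible text is safe to show now.
  push(token: string): string {
    this.buffer += token;
    let out = "";
    for (;;) {
      if (this.inThink) {
        const end = this.buffer.indexOf("</think>");
        if (end === -1) {
          // Discard hidden reasoning, but keep a short tail in case "</think>" is split.
          this.buffer = this.buffer.slice(-("</think>".length - 1));
          return out;
        }
        this.buffer = this.buffer.slice(end + "</think>".length);
        this.inThink = false;
      } else {
        const start = this.buffer.indexOf("<think>");
        if (start === -1) {
          // Emit everything except a short tail that could be the start of "<think>".
          const keep = Math.min(this.buffer.length, "<think>".length - 1);
          out += this.buffer.slice(0, this.buffer.length - keep);
          this.buffer = this.buffer.slice(this.buffer.length - keep);
          return out;
        }
        out += this.buffer.slice(0, start);
        this.buffer = this.buffer.slice(start + "<think>".length);
        this.inThink = true;
      }
    }
  }

  // Call once at end of stream to release any held-back text.
  flush(): string {
    const rest = this.inThink ? "" : this.buffer;
    this.buffer = "";
    return rest;
  }
}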

Privacy by architecture

Ollama is the highest-privacy option after WebGPU. Your prompts and responses travel over localhost — they leave the browser process but never leave your machine. No API keys to manage, no usage to track, no data to audit.

In Daneel's privacy model, Ollama gets "Local network" residency: data leaves the browser but stays on your hardware. The only observer is you. Model weights are open-source and auditable.

For users who need inference power beyond what WebGPU can handle in a browser tab — larger models, longer context, faster generation — Ollama is the privacy-preserving step up.

Works with everything else

Ollama isn't a silo. It integrates with every Daneel feature:

  • Model Registry — installed models appear alongside WebGPU, Claude, and Azure models with unified scoring
  • Agents — attach an agent with a custom persona and MCP tools, powered by your local model
  • Vaults — chat with your documents using Ollama for inference
  • Page Q&A and Site RAG — ask questions about any page or search indexed sites

Switch between Ollama and any other provider with one click. Your conversations, agents, and vault configurations don't change.

What's next

We're working on deeper model management — pull progress in notifications, automatic model suggestions based on your hardware, and preset configurations for common use cases. Ollama's ecosystem is growing fast, and Daneel will keep pace.

Your machine is more capable than you think. Connect Ollama and find out.