Chat with PDFs, the same way you chat with any page

PDFs have always been a bit awkward in browsers. You can view them, but interacting with the content usually means downloading, uploading to some tool, and starting from scratch. Not anymore.

Starting today, Daneel detects when Chrome is displaying a PDF and automatically extracts its text. The full widget appears in the corner, just like on any web page. Open the chat, ask a question, get an answer grounded in the document.

No upload step. No copy-paste. Just open the PDF and start talking.

How it works

Chrome's modern PDF viewer (OOPIF architecture, rolling out since Chrome 126) renders PDFs at the original URL instead of redirecting to an internal extension page. Because the page keeps its original URL, Daneel's widget can be injected there just as it is on any web page.

When the widget detects a PDF page, it:

  1. Identifies the PDF through multiple signals, including Chrome's pdfoopifenabled DOM attribute and content-type headers
  2. Fetches the binary through the background service worker proxy
  3. Extracts structured Markdown using EdgeParse, a Rust-based PDF parser compiled to WebAssembly, which preserves headings, tables, and reading order
  4. Caches the result so every subsequent question reuses the same extraction
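
To make steps 1 and 4 concrete, here is a minimal TypeScript sketch. It is not Daneel's actual code: the PdfSignals shape, looksLikePdf, and cachedExtractor are hypothetical names, and the .pdf URL fallback is an assumption about what an extra signal might look like. The extractor itself (EdgeParse in the real pipeline) is passed in as a plain function.

```typescript
// Hypothetical shape for the signals a content script might gather.
interface PdfSignals {
  url: string;
  contentType?: string;        // e.g. "application/pdf; charset=binary"
  hasOopifAttribute: boolean;  // whether the pdfoopifenabled attribute was found in the DOM
}

// Returns true if any one signal indicates a PDF.
function looksLikePdf(signals: PdfSignals): boolean {
  if (signals.hasOopifAttribute) return true;

  if (signals.contentType !== undefined) {
    // Ignore parameters such as "; charset=..." and compare the bare MIME type.
    const mime = (signals.contentType.split(";")[0] ?? "").trim().toLowerCase();
    if (mime === "application/pdf") return true;
  }

  // Assumed fallback: a path that ends in .pdf.
  try {
    return new URL(signals.url).pathname.toLowerCase().endsWith(".pdf");
  } catch {
    return false; // not a parseable URL
  }
}

type Extractor = (url: string) => Promise<string>;

// Wraps an extractor so that each URL is extracted at most once.
// The promise itself is cached, so concurrent callers share one extraction.
function cachedExtractor(extract: Extractor): Extractor {
  const cache = new Map<string, Promise<string>>();
  return (url: string) => {
    let pending = cache.get(url);
    if (pending === undefined) {
      pending = extract(url);
      cache.set(url, pending);
      // Evict failed extractions so a later call can retry.
      pending.catch(() => cache.delete(url));
    }
    return pending;
  };
}
```

Caching the promise rather than the resolved string means a second question asked while the first extraction is still running does not trigger a second parse.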

The extracted text flows into the same prompt pipeline as any other page. Context selection, prompt building, and streaming all work identically whether you're chatting with a blog post, a YouTube video, or a 25-page research paper.

Markdown export works too

The Markdown button on the launcher handles PDFs transparently:

  • Single click copies the extracted text to your clipboard
  • Double click downloads it as a .md file with a descriptive name like daneel.rotating-attractors.2026-04-09T15-21-26.md

This makes it easy to grab clean text from any PDF for use in notes, documents, or other tools.
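
As an illustration of how a name like the one above could be produced, here is a small sketch. The slugify and markdownFilename helpers are hypothetical, and the use of UTC for the timestamp is an assumption; the only thing taken from the example is the overall daneel.<name>.<timestamp>.md shape.

```typescript
// Lowercase the title and collapse anything that is not a-z or 0-9 into single hyphens.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Builds a filename of the form daneel.<slug>.<YYYY-MM-DDTHH-MM-SS>.md.
// Assumption: the timestamp is taken in UTC.
function markdownFilename(title: string, when: Date): string {
  // toISOString() gives e.g. "2026-04-09T15:21:26.000Z"; keep the first 19
  // characters and swap ":" for "-" so the name is safe on every filesystem.
  const stamp = when.toISOString().slice(0, 19).replace(/:/g, "-");
  const slug = slugify(title) || "document";
  return `daneel.${slug}.${stamp}.md`;
}
```

Colons are replaced because they are not allowed in filenames on Windows.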

Save PDFs to your vault

Click + Vault in the chat panel and the PDF content is imported with a structured filename that includes the source hostname, path, and timestamp. Once in a vault, the document is chunked, embedded, and searchable alongside your other files.

The vault import uses the same pipeline as any other document format. The only difference is where the text comes from.

What's different on PDF pages

A few things adapt automatically when Daneel detects a PDF:

  • The mode button shows PDF instead of Page, with a green status bar showing how much text was extracted
  • Site mode is hidden, since a PDF has no sitemap or crawlable structure
  • The page title is derived from the URL path, since Chrome's PDF viewer leaves document.title empty
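
Deriving a title from the URL path might look roughly like the sketch below. The function name titleFromPdfUrl and the "Untitled PDF" fallback are assumptions made for illustration, not Daneel's actual behavior.

```typescript
// Derives a human-readable title from the last path segment of a PDF URL.
function titleFromPdfUrl(rawUrl: string): string {
  const { pathname } = new URL(rawUrl); // query string and hash are excluded
  const segments = pathname.split("/").filter((s) => s.length > 0);
  const last = segments[segments.length - 1] ?? "";

  let name = last;
  try {
    name = decodeURIComponent(last); // turn %20 and similar back into characters
  } catch {
    // Malformed percent-encoding: keep the raw segment.
  }

  name = name.replace(/\.pdf$/i, "");
  return name || "Untitled PDF"; // assumed fallback for URLs with no usable path
}
```

For example, a URL ending in /papers/Rotating%20Attractors.pdf?download=1 yields the title "Rotating Attractors".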

Limitations

Scanned PDFs (image-only, no selectable text) cannot be extracted: if every page contains fewer than 20 characters of text, Daneel shows an error. Very large documents work but may take a few seconds to process. Finally, file:// PDFs require granting file access in Chrome's extension settings.
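
The scanned-PDF rule can be expressed as a one-line check. This is a sketch under assumptions: the function name is hypothetical, per-page text is assumed to be available as an array of strings, and trimming whitespace before counting is a guess at how characters are counted.

```typescript
// Threshold taken from the rule above: fewer than 20 characters on every page.
const MIN_CHARS_PER_PAGE = 20;

// Returns true when no page carries enough text to be worth extracting.
// Note: an empty page list also returns true, since there is no text at all.
function isLikelyScanned(pages: string[]): boolean {
  return pages.every((page) => page.trim().length < MIN_CHARS_PER_PAGE);
}
```

A single page with real text is enough to pass the check, so a mostly scanned document with one typed cover page would still be extracted.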

Everything stays local

Like all of Daneel's content extraction, PDF processing runs entirely in your browser. The document binary is fetched through the extension's service worker, text extraction happens client-side via EdgeParse WASM, and the result never leaves your machine unless you choose to send it to a cloud AI provider.

Open a PDF, ask a question. That's it.


PDF extraction is powered by EdgeParse by Raphaël Mansuy. Apache 2.0 licensed.