Running a language model in your browser is free in dollars but not in bytes. A single WebGPU chat model can sit at a gigabyte or more on your disk, and that adds up quickly once you start collecting embedding models and entity extractors too. Until today, the extension had no honest answer to "how much space are my models using?" Now it does.
A new Settings section
Open Settings → Models Storage and you get a single page that walks through every downloaded model artifact on your disk, grouped by role:
- Language models, the WebGPU LLMs you chat with.
- Embedding models, the small sentence-embedders used for site and vault indexing.
- Knowledge extraction models, the GLiNER and LFM2-Extract variants that power knowledge graphs.
Each section lists its models sorted by size and shows the section's total, the number of files on disk, and a progress bar for its share of the overall footprint. A browser-level "used of available" line at the top gives you the wider picture.
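The per-section math is simple: sum each role's model sizes, then divide by the grand total to get the progress-bar share, while the browser-level line comes from `navigator.storage.estimate()`. A minimal sketch, with illustrative data shapes and function names (not the extension's actual API):

```javascript
// Hypothetical sketch: compute each section's total and its share of
// the overall model footprint. `sections` maps a role name to a list
// of { name, bytes } model records (shape assumed for illustration).
function summarizeSections(sections) {
  const totals = Object.entries(sections).map(([role, models]) => ({
    role,
    bytes: models.reduce((sum, m) => sum + m.bytes, 0),
  }));
  const grandTotal = totals.reduce((sum, t) => sum + t.bytes, 0);
  return totals.map((t) => ({
    ...t,
    // Fraction of the whole, used to size the progress bar.
    share: grandTotal ? t.bytes / grandTotal : 0,
  }));
}

// The "used of available" line maps onto the standard StorageManager
// API, which reports origin-wide usage and quota in bytes.
async function browserUsage() {
  const { usage = 0, quota = 0 } = await navigator.storage.estimate();
  return { usage, quota };
}
```

`navigator.storage.estimate()` is origin-wide, so it counts everything the extension stores, not just model caches, which is why it reads as the "wider picture" rather than a sum of the sections.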
Delete what you do not need
Every row has a trash icon. Clicking it shows an inline confirmation with the exact size that will be freed, and after you confirm, the files are gone. A "Delete all" button at the top wipes the entire model cache in one go.
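Under the hood this kind of deletion is a walk over Cache Storage entries. A sketch of what a per-model delete can look like, assuming model files live in a named cache keyed by URL; the cache name, URL-prefix convention, and function names here are illustrative, not the extension's real internals:

```javascript
// Pure helper: pick the cached URLs that belong to one model.
// Matching by URL substring is an assumption for this sketch.
function entriesForModel(urls, modelPrefix) {
  return urls.filter((url) => url.includes(modelPrefix));
}

// Hypothetical per-model delete against the standard Cache API.
// Returns the number of bytes freed, which is what the inline
// confirmation can display before the user commits.
async function deleteModel(cacheName, modelPrefix) {
  const cache = await caches.open(cacheName);
  const requests = await cache.keys();
  const doomed = entriesForModel(requests.map((r) => r.url), modelPrefix);
  let freed = 0;
  for (const request of requests) {
    if (!doomed.includes(request.url)) continue;
    const response = await cache.match(request);
    if (response) freed += (await response.blob()).size;
    await cache.delete(request);
  }
  return freed;
}

// "Delete all": drop the named model caches wholesale.
async function deleteAll(cacheNames) {
  await Promise.all(cacheNames.map((name) => caches.delete(name)));
}
```

Dropping a whole cache with `caches.delete()` is much cheaper than deleting entries one by one, which is why a "Delete all" button can wipe a multi-gigabyte cache near-instantly.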
Deleting a model you are currently using is safe: the weights stay in GPU memory until your next cold load, and the files re-download the next time you pick that model. No restart, no reinstall.
A small honesty note
The old WebGPU settings panel claimed to let you manage cache files too. It did not actually work: it ran in the wrong security context and saw an empty cache on every page. That stub is now gone, replaced by the new section, which runs in the right context and queries both caches the extension actually uses.
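This is also why the fix was a matter of where the code runs, not what it calls: Cache Storage is partitioned by origin, so the same `caches` API returns different contents depending on the context that invokes it. A sketch of the enumeration side, with assumed cache names (the real names are an implementation detail of the extension):

```javascript
// Assumed cache names for illustration only.
const MODEL_CACHES = ["llm-model-cache", "embedding-model-cache"];

// Pure helper: total size of a list of { bytes } file records.
function totalBytes(files) {
  return files.reduce((sum, f) => sum + f.bytes, 0);
}

// Hypothetical sketch: enumerate every cached model file across the
// caches the extension uses. Must run in the extension's own origin,
// or each cache lookup sees a different (empty) partition.
async function listModelFiles() {
  const files = [];
  for (const name of MODEL_CACHES) {
    if (!(await caches.has(name))) continue; // cache may not exist yet
    const cache = await caches.open(name);
    for (const request of await cache.keys()) {
      const response = await cache.match(request);
      const bytes = response ? (await response.blob()).size : 0;
      files.push({ cache: name, url: request.url, bytes });
    }
  }
  return files;
}
```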
Everything you see and delete here lives on your machine. Nothing about your disk footprint ever leaves the browser.