---
id: "bonsai-1bit-webgpu"
date: "2026-04-16"
title: "Bonsai by PrismML: A 1.7B one bit model in 291 megabytes, running in your browser"
summary: "Bonsai 1.7B brings native 1-bit quantization to Daneel's WebGPU engine, delivering a thinking-capable model at a fraction of the usual download size."
image: "/medias/prism.ml.png"
header: "What's New"
tags: ["webgpu", "models", "performance", "privacy", "1-bit", "quantization", "prism-ml", "small-models", "thinking-capable"]
---

Most language models in the 1-2B parameter range weigh between 800 MB and 1.5 GB after quantization. Bonsai 1.7B, built by [PrismML](https://prismml.com), rewrites that expectation: its native 1-bit variant clocks in at just 291 MB, and it now runs directly in your browser through Daneel's WebGPU engine.
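The numbers check out on a napkin. Here's a rough estimate (our arithmetic, not PrismML's accounting) of what 1.7B parameters weigh at different precisions; the gap between the pure 1-bit figure and the actual file sizes presumably covers higher-precision pieces like norms, scales, and metadata:

```javascript
// Back-of-envelope storage math for a 1.7B-parameter model.
// Real files are larger: some tensors stay in higher precision.
const params = 1.7e9;

const sizeMB = (bitsPerWeight) => Math.round(params * bitsPerWeight / 8 / 1e6);

console.log(sizeMB(16)); // fp16 baseline: 3400 MB
console.log(sizeMB(4));  // 4-bit packing:  850 MB
console.log(sizeMB(1));  // 1-bit packing:  213 MB
```

Even with overhead, that's why a natively 1-bit 1.7B model can land under 300 MB while its fp16 counterpart would top 3 GB.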

## What is Bonsai?

Bonsai is a family of models from PrismML, a research lab founded at Caltech, that takes 1-bit quantization seriously. Where most models treat quantization as a post-training compression step, Bonsai is designed for 1-bit from the ground up: embeddings, attention layers, MLP layers, and the language model head are all natively 1-bit. The result is a model that stays small without the quality cliff you'd normally expect from aggressive quantization.

The 1.7B variant sits on a Qwen3 backbone, supports 32K context, and is thinking-capable, meaning it can reason step-by-step before answering.

For more on the research and the full Bonsai family (1.7B, 4B, 8B), see [PrismML's announcement](https://prismml.com/news/bonsai-8b).

## Two variants, one repo

We ship Bonsai 1.7B in two quantizations:

- **q4 (1.1 GB)** sits alongside our other mid-tier models. Solid quality, reasonable download, no shader-f16 required.
- **q1 (291 MB)** is the headline. A thinking-capable 1.7B model that weighs less than most embedding models. Cold start is fast, memory footprint is minimal, and it runs on hardware that would struggle with larger downloads.

Both variants load from the same HuggingFace repository (`onnx-community/Bonsai-1.7B-ONNX`). Daneel picks the right ONNX file based on your selection.

## Why 1-bit matters for in-browser AI

Running a language model inside a browser tab comes with hard constraints: no swap file, no CUDA, a 2 GB WebGPU buffer ceiling, and whatever GPU memory Chrome decides to share. Every megabyte counts.

At 291 MB, Bonsai q1 changes the calculus. It fits comfortably on integrated GPUs, downloads in seconds on a decent connection, and leaves room for the embedding model to run alongside it without GPU context contention. For users on laptops or Chromebooks, this is the most accessible thinking-capable model we've offered.

**Whitepaper**: [1-bit-bonsai-8b-whitepaper.pdf](https://github.com/PrismML-Eng/Bonsai-demo/blob/main/1-bit-bonsai-8b-whitepaper.pdf)

## About PrismML

[PrismML](https://prismml.com) is a research lab founded by Caltech researchers, backed by Khosla Ventures, Cerberus, and Google. Their work focuses on compressing neural networks without sacrificing reasoning ability, and the Bonsai family is the result: models that replace most floating-point multiplication with simple additions, trading compute complexity for radical efficiency. The full family spans 1.7B, 4B, and 8B parameters, all released under Apache 2.0.
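The "multiplication becomes addition" idea is easy to see in a toy dot product. This is an illustrative sketch, not PrismML's actual kernel: when weights are constrained to {-1, +1}, each weight either adds or subtracts an activation, and no multiplies are needed:

```javascript
// Toy 1-bit dot product: weights are stored as signs, so the inner loop
// does only additions and subtractions -- no floating-point multiplies.
function signDot(signs, activations) {
  let acc = 0;
  for (let i = 0; i < signs.length; i++) {
    acc += signs[i] > 0 ? activations[i] : -activations[i];
  }
  return acc;
}

const w = [1, -1, -1, 1];        // 1-bit weights
const x = [0.5, 2.0, -1.0, 3.0]; // activations
console.log(signDot(w, x));      // 0.5 - 2.0 + 1.0 + 3.0 = 2.5
```

Production kernels pack these signs into bitmasks and add per-channel scales, but the core trade is the same: cheap adds instead of expensive multiplies.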

We're glad to bring their work to the browser. If you're interested in edge AI or efficient inference, PrismML is worth following.

## Try it

Open Settings, go to the WebGPU panel, and look for **Bonsai 1.7B** or **Bonsai 1.7B (1-bit)** in the model list. Select either variant and start chatting. Everything runs locally; nothing leaves your machine.

---

[Read on site](https://daneel.injen.io/news/bonsai-1bit-webgpu.html?utm_source=extension_news_reader&utm_medium=extension_settings&utm_campaign=extension)
