Choosing AI models

Document Blueprint has two independent AI model selections: the autofill/extraction model, picked in Settings → Preferences, and the chat model, picked in the chat drawer itself. Both offer Gemini 2.5 Flash Lite (the default, 0.3x token cost) and Gemini 3.1 Flash Lite (0.5x). The multiplier weights how fast a model consumes your monthly AI token allowance.

Model choice is a cost dial, not a feature switch — every model in the catalog runs every AI feature. What changes is how quickly each call spends your monthly token allowance, and the two selections are deliberately separate so an expensive pick for document extraction never silently makes every chat message pricier.

The two model selections

| Selection | Where you pick it | What it drives | Default |
| --- | --- | --- | --- |
| AI model | Settings → Preferences, the AI model row | Autofill extraction and field placement: document extraction runs, derived AI outputs, document reflows | Gemini 2.5 Flash Lite |
| Chat model | The selector in the chat drawer header, under the Workspace AI heading | Chat conversations only | Gemini 2.5 Flash Lite |

The Preferences row's own caption says it plainly: "Used for autofill extraction and field placement. Defaults to Gemini 2.5 Flash Lite." The chat selector never inherits the global pick — chat is high-volume, so it defaults to the cheapest model and lets you bump a single conversation up when the task warrants it.

Both selections persist automatically — set them once and they survive reloads. There's no save button.

The model catalog

| Model | Multiplier | Notes |
| --- | --- | --- |
| Gemini 2.5 Flash Lite | 0.3x | The default for both selections. The only model available on the Free plan. |
| Gemini 3.1 Flash Lite | 0.5x | The newer-generation pick, with stronger spatial grounding for field placement at a still-low cost. |

The catalog is intentionally small and cost-first. It lists only the models you can pick; internal pinned operations (like the chat assistant's PDF reading) may use other models under the hood.

What the multiplier means

Your plan includes a monthly AI token allowance. Each AI call debits that allowance by its actual tokens multiplied by the model's multiplier, rounded up:

One extraction, two models

Actual tokens used: 10,000
On Gemini 2.5 Flash Lite (0.3x): 3,000 debited
On Gemini 3.1 Flash Lite (0.5x): 5,000 debited
Autofill overhead: +1,000 flat per autofill run

So a lower multiplier stretches the same monthly allowance further. Autofill runs additionally debit a flat 1,000-token overhead per run, covering the non-AI compute around the extraction itself.
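
The arithmetic above can be sketched in a few lines of Python. This is an illustration, not the product's actual billing code: the model IDs and the `debit` function are hypothetical names, and the sketch assumes the flat autofill overhead is added after weighting rather than being multiplied itself.

```python
import math
from fractions import Fraction

# Multipliers from the catalog, stored as exact fractions so that
# rounding up is not affected by floating-point error.
# The dictionary keys are hypothetical IDs, not official model names.
MULTIPLIERS = {
    "gemini-2.5-flash-lite": Fraction(3, 10),  # 0.3x
    "gemini-3.1-flash-lite": Fraction(5, 10),  # 0.5x
}

# Flat per-run overhead for autofill (assumed here to be unweighted).
AUTOFILL_OVERHEAD = 1_000


def debit(actual_tokens: int, model: str, is_autofill: bool = False) -> int:
    """Return the tokens debited from the monthly allowance for one call."""
    weighted = math.ceil(actual_tokens * MULTIPLIERS[model])  # rounded up
    return weighted + (AUTOFILL_OVERHEAD if is_autofill else 0)


print(debit(10_000, "gemini-2.5-flash-lite"))                    # 3000
print(debit(10_000, "gemini-3.1-flash-lite"))                    # 5000
print(debit(10_000, "gemini-2.5-flash-lite", is_autofill=True))  # 4000
```

Under these assumptions, a 10,000-token autofill run costs 4,000 tokens on Gemini 2.5 Flash Lite and 6,000 on Gemini 3.1 Flash Lite.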

Plan gating

The Free plan is restricted to Gemini 2.5 Flash Lite. Picking any other model is rejected with the message "Your plan includes Gemini Flash only. Upgrade to Lite or higher to use other models." All paid plans (Lite and up) can use every model in the catalog.
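
One way to picture the gating rule is as an allowlist check that runs when a model is picked. The sketch below is a guess at the shape of such a check, not the real implementation: the plan names as strings, the model IDs, and `check_model_allowed` are all hypothetical. Only the error message is quoted from the product.

```python
# Hypothetical model IDs standing in for the two catalog entries.
CATALOG = {"gemini-2.5-flash-lite", "gemini-3.1-flash-lite"}

# The Free plan may only use the default model.
FREE_PLAN_MODELS = {"gemini-2.5-flash-lite"}

# Quoted verbatim from the product's rejection message.
PLAN_ERROR = (
    "Your plan includes Gemini Flash only. "
    "Upgrade to Lite or higher to use other models."
)


def check_model_allowed(plan: str, model: str) -> None:
    """Raise if the plan cannot use the model; return None otherwise."""
    if model not in CATALOG:
        raise ValueError(f"Unknown model: {model}")
    if plan == "free" and model not in FREE_PLAN_MODELS:
        raise PermissionError(PLAN_ERROR)


check_model_allowed("free", "gemini-2.5-flash-lite")  # allowed
check_model_allowed("lite", "gemini-3.1-flash-lite")  # allowed on paid plans
```

Calling `check_model_allowed("free", "gemini-3.1-flash-lite")` raises `PermissionError` carrying the message above.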

Where usage is metered

Chat, autofill, and automated file ingestion all meter their real token usage against your monthly allowance, weighted by the multiplier of whichever model actually ran. You can see the running total in Settings → Account → Usage, on the AI tokens (this month) meter. Metering observes rather than interrupts — a chat message or interactive autofill isn't blocked mid-flight by a quota check.
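
"Observes rather than interrupts" can be read as: usage is recorded after a call finishes, and recording never rejects the call. The following is a minimal sketch of that idea under stated assumptions. `UsageMeter`, its method names, the feature labels, and the model IDs are hypothetical, and the sketch does not model what happens once the allowance runs out.

```python
import math
from collections import defaultdict
from fractions import Fraction

# Hypothetical model IDs with the catalog's multipliers as exact fractions.
MULTIPLIERS = {
    "gemini-2.5-flash-lite": Fraction(3, 10),  # 0.3x
    "gemini-3.1-flash-lite": Fraction(5, 10),  # 0.5x
}


class UsageMeter:
    """Hypothetical monthly meter that records usage and never blocks a call."""

    def __init__(self, monthly_allowance: int) -> None:
        self.monthly_allowance = monthly_allowance
        self.by_feature: dict[str, int] = defaultdict(int)

    def record(self, feature: str, model: str, actual_tokens: int) -> None:
        # Weight by the model that actually ran, rounding up.
        # There is deliberately no quota check here.
        self.by_feature[feature] += math.ceil(actual_tokens * MULTIPLIERS[model])

    @property
    def used(self) -> int:
        """Running total, analogous to the AI tokens (this month) meter."""
        return sum(self.by_feature.values())


meter = UsageMeter(monthly_allowance=5_000)
meter.record("chat", "gemini-2.5-flash-lite", 10_000)      # adds 3,000
meter.record("ingestion", "gemini-3.1-flash-lite", 6_000)  # adds 3,000
print(meter.used)  # 6000
```

In this sketch the second call is still recorded in full even though it pushes the total past the allowance.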
