Choosing AI models

Document Blueprint has two independent AI model selections: the autofill/extraction model, picked in Settings → Preferences, and the chat model, picked in the chat drawer itself. Both offer Gemini 2.5 Flash Lite (the default, 0.3x token cost) and Gemini 3.1 Flash Lite (0.5x). The multiplier weights how fast a model consumes your monthly AI token allowance.

Updated 2026-06-11 · 3 min read

Model choice is a cost dial, not a feature switch — every model in the catalog runs every AI feature. What changes is how quickly each call spends your monthly token allowance, and the two selections are deliberately separate so an expensive pick for document extraction never silently makes every chat message pricier.

The two model selections

Selection	Where you pick it	What it drives	Default
AI model	Settings → Preferences, the `AI model` row	Autofill extraction and field placement — document extraction runs, derived AI outputs, document reflows	Gemini 2.5 Flash Lite
Chat model	The selector in the chat drawer header, under the Workspace AI heading	Chat conversations only	Gemini 2.5 Flash Lite

The Preferences row's own caption says it plainly: "Used for autofill extraction and field placement. Defaults to Gemini 2.5 Flash Lite." The chat selector never inherits the global pick — chat is high-volume, so it defaults to the cheapest model and lets you bump a single conversation up when the task warrants it.

Both selections persist automatically — set them once and they survive reloads. There's no save button.
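To illustrate how the two selections stay independent, here is a minimal sketch. The `ModelSelections` class, the `model_for` function, and the feature names `"autofill"` and `"chat"` are hypothetical names made up for this example, not part of Document Blueprint's actual code or API.

```python
from dataclasses import dataclass

DEFAULT_MODEL = "Gemini 2.5 Flash Lite"


@dataclass
class ModelSelections:
    """Hypothetical container mirroring the two selections described above."""
    ai_model: str = DEFAULT_MODEL    # Settings → Preferences, the `AI model` row
    chat_model: str = DEFAULT_MODEL  # selector in the chat drawer header


def model_for(feature: str, selections: ModelSelections) -> str:
    """Return the model that drives a feature; chat never reads ai_model."""
    if feature == "chat":
        return selections.chat_model
    if feature == "autofill":
        return selections.ai_model
    raise ValueError(f"unknown feature: {feature!r}")


# Changing the AI model for extraction leaves the chat model on its default.
selections = ModelSelections(ai_model="Gemini 3.1 Flash Lite")
print(model_for("autofill", selections))  # Gemini 3.1 Flash Lite
print(model_for("chat", selections))      # Gemini 2.5 Flash Lite
```

Keeping the two fields separate, rather than falling back from one to the other, is what keeps a pricier extraction pick from raising the cost of chat.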

The model catalog

Model	Multiplier	Notes
`Gemini 2.5 Flash Lite`	0.3x	The default for both selections. The only model available on the Free plan.
`Gemini 3.1 Flash Lite`	0.5x	The newer-generation pick — stronger spatial grounding for field placement at a still-low cost.

The catalog is intentionally small and cost-first. Internal pinned operations (like the chat assistant's PDF reading) may use other models under the hood; the catalog lists what you can pick.

What the multiplier means

Your plan includes a monthly AI token allowance. Each AI call debits that allowance by its actual tokens multiplied by the model's multiplier, rounded up:

One extraction, two models

Item	Amount
Actual tokens used	10,000
On Gemini 2.5 Flash Lite (0.3x)	3,000 debited
On Gemini 3.1 Flash Lite (0.5x)	5,000 debited
Autofill overhead	+1,000 flat per autofill run

So a lower multiplier stretches the same monthly allowance further. Autofill runs additionally debit a flat 1,000-token overhead per run, covering the non-AI compute around the extraction itself.
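The debit rule described above can be sketched in a few lines. This is an illustration only: the `debit` function and the constant names are hypothetical, and it assumes the flat autofill overhead is added after rounding and is not itself weighted by the multiplier. Multipliers are stored as `Decimal` values so the rounding-up step is not affected by floating-point error.

```python
import math
from decimal import Decimal

# Multipliers from the model catalog.
MULTIPLIERS = {
    "Gemini 2.5 Flash Lite": Decimal("0.3"),
    "Gemini 3.1 Flash Lite": Decimal("0.5"),
}

# Flat per-run overhead for autofill (assumed here to be unweighted).
AUTOFILL_OVERHEAD = 1_000


def debit(actual_tokens: int, model: str, is_autofill: bool = False) -> int:
    """Tokens debited for one call: actual tokens times multiplier, rounded up."""
    weighted = math.ceil(actual_tokens * MULTIPLIERS[model])
    return weighted + (AUTOFILL_OVERHEAD if is_autofill else 0)


print(debit(10_000, "Gemini 2.5 Flash Lite"))                    # 3000
print(debit(10_000, "Gemini 3.1 Flash Lite"))                    # 5000
print(debit(10_000, "Gemini 3.1 Flash Lite", is_autofill=True))  # 6000
```

Rounding up means even a very small call debits at least one token: `debit(1, "Gemini 2.5 Flash Lite")` returns 1.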

Plan gating

The Free plan is whitelisted to Gemini 2.5 Flash Lite only — picking anything else is rejected with "Your plan includes Gemini Flash only. Upgrade to Lite or higher to use other models." All paid plans (Lite and up) can use every model in the catalog.
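As a rough sketch of the gating rule, the check below rejects any Free-plan pick other than Gemini 2.5 Flash Lite and allows everything else in the catalog. The plan identifier `"free"`, the `check_model` function, and the `ModelNotAllowed` exception are hypothetical names used for illustration; only the error message text comes from the product.

```python
CATALOG = {"Gemini 2.5 Flash Lite", "Gemini 3.1 Flash Lite"}
FREE_PLAN_MODELS = {"Gemini 2.5 Flash Lite"}

UPGRADE_MESSAGE = (
    "Your plan includes Gemini Flash only. "
    "Upgrade to Lite or higher to use other models."
)


class ModelNotAllowed(Exception):
    """Hypothetical error raised when a plan cannot use the chosen model."""


def check_model(plan: str, model: str) -> None:
    """Raise if the plan may not use the model; return None otherwise."""
    if model not in CATALOG:
        raise ValueError(f"not in the catalog: {model!r}")
    # Assumption: any plan other than "free" is a paid plan (Lite and up).
    if plan == "free" and model not in FREE_PLAN_MODELS:
        raise ModelNotAllowed(UPGRADE_MESSAGE)


check_model("free", "Gemini 2.5 Flash Lite")  # allowed
check_model("lite", "Gemini 3.1 Flash Lite")  # allowed
try:
    check_model("free", "Gemini 3.1 Flash Lite")
except ModelNotAllowed as err:
    print(err)  # prints the upgrade message
```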

Where usage is metered

Chat, autofill, and automated file ingestion all meter their real token usage against your monthly allowance, weighted by the multiplier of whichever model actually ran. You can see the running total in Settings → Account → Usage, on the AI tokens (this month) meter. Metering observes rather than interrupts — a chat message or interactive autofill isn't blocked mid-flight by a quota check.
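One way to picture "observes rather than interrupts" is a meter that only accumulates weighted usage and never raises. The sketch below is hypothetical: the `UsageMeter` class and the 5,000-token allowance are made up for the example, and what happens once the allowance is exceeded is not described here, so the sketch only exposes a flag.

```python
import math
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class UsageMeter:
    """Hypothetical running total of weighted tokens for the current month."""
    allowance: int
    used: int = 0

    def record(self, actual_tokens: int, multiplier: Decimal) -> int:
        """Add one call's weighted usage; never blocks or raises on overage."""
        self.used += math.ceil(actual_tokens * multiplier)
        return self.used

    @property
    def over_allowance(self) -> bool:
        return self.used > self.allowance


meter = UsageMeter(allowance=5_000)           # illustrative allowance only
meter.record(10_000, Decimal("0.3"))          # a call on Gemini 2.5 Flash Lite
meter.record(10_000, Decimal("0.5"))          # a call on Gemini 3.1 Flash Lite
print(meter.used, meter.over_allowance)       # 8000 True
```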