Choosing AI models
Document Blueprint has two independent AI model selections: the autofill/extraction model, picked in Settings → Preferences, and the chat model, picked in the chat drawer itself. Both offer Gemini 2.5 Flash Lite (the default, 0.3x token cost) and Gemini 3.1 Flash Lite (0.5x). The multiplier weights how fast a model consumes your monthly AI token allowance.
Model choice is a cost dial, not a feature switch — every model in the catalog runs every AI feature. What changes is how quickly each call spends your monthly token allowance, and the two selections are deliberately separate so an expensive pick for document extraction never silently makes every chat message pricier.
The two model selections
| Selection | Where you pick it | What it drives | Default |
|---|---|---|---|
| AI model | Settings → Preferences, the AI model row | Autofill extraction and field placement — document extraction runs, derived AI outputs, document reflows | Gemini 2.5 Flash Lite |
| Chat model | The selector in the chat drawer header, under the Workspace AI heading | Chat conversations only | Gemini 2.5 Flash Lite |
The Preferences row's own caption says it plainly: "Used for autofill extraction and field placement. Defaults to Gemini 2.5 Flash Lite." The chat selector never inherits the global pick — chat is high-volume, so it defaults to the cheapest model and lets you bump a single conversation up when the task warrants it.
Both selections persist automatically — set them once and they survive reloads. There's no save button.
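As a concrete picture of the separation, here is a minimal sketch that models the two selections as independent values. It is an illustration, not Document Blueprint's code: the class, field, and feature names are hypothetical, and persistence is left out.

```python
from dataclasses import dataclass

# Hypothetical sketch: both selections start from the same default but are
# stored separately, so changing one never changes the other.
DEFAULT_MODEL = "Gemini 2.5 Flash Lite"

# Hypothetical labels for the work the AI model row drives.
AI_MODEL_FEATURES = {"extraction", "derived_output", "reflow"}


@dataclass
class ModelSelections:
    ai_model: str = DEFAULT_MODEL    # Settings → Preferences, AI model row
    chat_model: str = DEFAULT_MODEL  # chat drawer header selector

    def model_for(self, feature: str) -> str:
        """Return the model that would run for a given feature."""
        if feature == "chat":
            return self.chat_model  # chat never falls back to ai_model
        if feature in AI_MODEL_FEATURES:
            return self.ai_model
        raise ValueError(f"unknown feature: {feature}")


selections = ModelSelections()
selections.ai_model = "Gemini 3.1 Flash Lite"  # pricier pick for extraction
print(selections.model_for("extraction"))  # Gemini 3.1 Flash Lite
print(selections.model_for("chat"))        # Gemini 2.5 Flash Lite
```

Raising the extraction model leaves chat on the cheaper default, which is the point of keeping the selections apart.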
The model catalog
| Model | Multiplier | Notes |
|---|---|---|
| Gemini 2.5 Flash Lite | 0.3x | The default for both selections. The only model available on the Free plan. |
| Gemini 3.1 Flash Lite | 0.5x | The newer-generation pick, with stronger spatial grounding for field placement at a still-low cost. |
The catalog is intentionally small and cost-first. Some internal operations are pinned to specific models (the chat assistant's PDF reading, for example) and may use models outside the catalog; the catalog lists only the models you can pick.
What the multiplier means
Your plan includes a monthly AI token allowance. Each AI call debits that allowance by its actual tokens multiplied by the model's multiplier, rounded up:

$$\text{tokens debited} = \left\lceil \text{actual tokens} \times \text{multiplier} \right\rceil$$
So a lower multiplier stretches the same monthly allowance further. Autofill runs additionally debit a flat 1,000-token overhead per run, covering the non-AI compute around the extraction itself.
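The debit rule can be sketched in a few lines of Python. This illustrates the arithmetic under stated assumptions; it is not the product's billing code. The function names are hypothetical, rounding is applied to each call separately, and the 1,000-token autofill overhead is assumed to be added once per run without being weighted by the multiplier.

```python
import math
from fractions import Fraction

# Catalog multipliers as exact fractions, so rounding up is never skewed
# by floating-point error (0.3 has no exact binary representation).
MULTIPLIERS = {
    "Gemini 2.5 Flash Lite": Fraction(3, 10),
    "Gemini 3.1 Flash Lite": Fraction(5, 10),
}

# Flat per-run overhead for autofill; assumed here to be unweighted.
AUTOFILL_OVERHEAD = 1_000


def call_debit(tokens: int, model: str) -> int:
    """Allowance tokens debited for one AI call: ceil(tokens * multiplier)."""
    return math.ceil(tokens * MULTIPLIERS[model])


def autofill_run_debit(call_tokens: list[int], model: str) -> int:
    """Round each call up on its own, then add the flat overhead once."""
    return sum(call_debit(t, model) for t in call_tokens) + AUTOFILL_OVERHEAD


print(call_debit(10_001, "Gemini 2.5 Flash Lite"))  # 3001 (3,000.3 rounded up)
print(call_debit(10_001, "Gemini 3.1 Flash Lite"))  # 5001 (5,000.5 rounded up)
print(autofill_run_debit([10_000], "Gemini 2.5 Flash Lite"))  # 4000
```

The same 10,001 real tokens cost 3,001 allowance tokens on the 0.3x model and 5,001 on the 0.5x model, which is how a lower multiplier stretches the allowance further.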
Plan gating
The Free plan is whitelisted to Gemini 2.5 Flash Lite only — picking anything else is rejected with "Your plan includes Gemini Flash only. Upgrade to Lite or higher to use other models." All paid plans (Lite and up) can use every model in the catalog.
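A whitelist check like the one below captures the rule. It is a minimal sketch: `check_model_allowed` and `ModelNotAllowed` are hypothetical names, and treating every plan other than Free as paid is an assumption drawn from "Lite and up".

```python
# Hypothetical sketch of the plan gate; plan names, model names, and the error
# text come from the docs, but the function and exception are illustrative.
CATALOG = frozenset({"Gemini 2.5 Flash Lite", "Gemini 3.1 Flash Lite"})
FREE_PLAN_MODELS = frozenset({"Gemini 2.5 Flash Lite"})
FREE_PLAN_ERROR = (
    "Your plan includes Gemini Flash only. "
    "Upgrade to Lite or higher to use other models."
)


class ModelNotAllowed(Exception):
    """Raised when a plan may not use the requested model."""


def check_model_allowed(plan: str, model: str) -> None:
    if model not in CATALOG:
        raise ValueError(f"{model!r} is not in the model catalog")
    # Assumption: every plan other than "Free" is paid and can use the full catalog.
    if plan == "Free" and model not in FREE_PLAN_MODELS:
        raise ModelNotAllowed(FREE_PLAN_ERROR)


check_model_allowed("Lite", "Gemini 3.1 Flash Lite")  # allowed, returns None
try:
    check_model_allowed("Free", "Gemini 3.1 Flash Lite")
except ModelNotAllowed as err:
    print(err)  # the Free plan rejection message
```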
Where usage is metered
Chat, autofill, and automated file ingestion all meter their real token usage against your monthly allowance, weighted by the multiplier of whichever model actually ran. You can see the running total in Settings → Account → Usage, on the AI tokens (this month) meter. Metering observes rather than interrupts — a chat message or interactive autofill isn't blocked mid-flight by a quota check.
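The observe-rather-than-interrupt behavior amounts to recording usage after each call completes instead of checking quota before it. The sketch below is hypothetical (class, method, and source names are illustrative); it knows only the two catalog multipliers and leaves out the autofill overhead and the monthly reset.

```python
import math
from collections import Counter
from fractions import Fraction

# Catalog multipliers as exact fractions.
MULTIPLIERS = {
    "Gemini 2.5 Flash Lite": Fraction(3, 10),
    "Gemini 3.1 Flash Lite": Fraction(5, 10),
}


class MonthlyUsageMeter:
    """Hypothetical meter: records usage after a call finishes, never blocks it."""

    def __init__(self, allowance: int) -> None:
        self.allowance = allowance
        self.by_source = Counter()  # e.g. "chat", "autofill", "ingestion"

    def record(self, source: str, model_that_ran: str, tokens: int) -> None:
        # Weight by the model that actually ran; no quota check, no exception.
        self.by_source[source] += math.ceil(tokens * MULTIPLIERS[model_that_ran])

    @property
    def used(self) -> int:
        return sum(self.by_source.values())

    @property
    def over_allowance(self) -> bool:
        return self.used > self.allowance


meter = MonthlyUsageMeter(allowance=1_000)
meter.record("chat", "Gemini 2.5 Flash Lite", 4_000)       # debits 1,200
meter.record("ingestion", "Gemini 3.1 Flash Lite", 1_001)  # debits 501
print(meter.used, meter.over_allowance)  # 1701 True
```

Going over the allowance here only flips `over_allowance`; the calls that pushed usage past it were already recorded rather than refused, mirroring how a chat message or interactive autofill is not stopped mid-flight.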