AI extraction & instructions

AI extraction reads each ingested file and pulls structured data into the fields you defined, bounded by your schema; you steer it with instructions layered from the single value up to a one-off run.

Updated 2026-06-29 · 5 min read

The AI is not a freeform assistant. It runs against the schema you define (the fields on a template) and returns one value per declared slot, doing the boring work of reading every page. If you didn't declare a field, it isn't extracted. Anything the AI can't find stays empty, and each value lands in case data under a stable key, so renaming a field's display label later doesn't disturb the data. See core concepts for the schema model.
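The stable-key behavior can be pictured with a minimal sketch. The `Field` class, the `store_value` helper, and the sample key and value are hypothetical names chosen for illustration, not the product's actual data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Field:
    key: str    # stable storage key (hypothetical name)
    label: str  # display label, free to change later


def store_value(case_data: dict, field: Field, value: Optional[str]) -> None:
    """Write an extracted value under the field's stable key."""
    if value is None:
        return  # anything the AI can't find stays empty
    case_data[field.key] = value


permit = Field(key="permit_number", label="Permit Number")
case_data: dict = {}
store_value(case_data, permit, "PMT-1042")

permit.label = "Permit No."  # renaming the label leaves the stored data alone
print(case_data)  # {'permit_number': 'PMT-1042'}
```

Because the value is keyed by `key` rather than `label`, the rename touches nothing in `case_data`.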

Schema-bounded extraction & model choice

On ingestion the AI sees the file's full text and visuals; every extracted input carrying at least one signal (an AI instruction, a read zone, or an AI-prompt transform); and any per-value rules. A bare extracted input with none of those is skipped silently. Field types act as type checks: date forces a parseable date, select forces an allowed option. Type constraints beat text instructions, so don't restate them; reserve the AI instruction box for what the type system can't enforce, like "use the customer's billing name, not their shipping name."
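As a rough illustration of the two gates described above, the sketch below models the signal check and the type check. All names (`ExtractedInput`, `inputs_sent_to_ai`, `check_type`) are hypothetical. The date check accepts only ISO dates for brevity, and returning `None` on a failed check is an assumption about how a rejected value is handled.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence


@dataclass
class ExtractedInput:
    key: str
    ai_instruction: str = ""
    read_zone: Optional[tuple] = None  # hypothetical shape, e.g. (page, x, y, w, h)
    ai_prompt_transform: str = ""

    def has_signal(self) -> bool:
        """True if at least one of the three signals is set."""
        return bool(self.ai_instruction or self.read_zone or self.ai_prompt_transform)


def inputs_sent_to_ai(inputs: Sequence[ExtractedInput]) -> list:
    """Bare inputs with no signal are dropped silently."""
    return [i for i in inputs if i.has_signal()]


def check_type(field_type: str, raw: str, options: Sequence[str] = ()):
    """Return the typed value, or None when the type constraint rejects it."""
    if field_type == "date":
        try:
            return date.fromisoformat(raw)  # assumption: ISO dates only in this sketch
        except ValueError:
            return None
    if field_type == "select":
        return raw if raw in options else None
    return raw


inputs = [
    ExtractedInput("billing_name", ai_instruction="Use the billing name, not shipping."),
    ExtractedInput("notes"),  # no signal, so it is skipped
]
sent_keys = [i.key for i in inputs_sent_to_ai(inputs)]
print(sent_keys)  # ['billing_name']
```

The type check runs regardless of what the instruction text says, which is why restating a type constraint in the instruction box adds nothing.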

Extraction runs on Google Gemini, switched from the AI model row in Settings → Preferences: Gemini 2.5 Flash Lite (default, fastest and lowest cost) or Gemini 3.1 Flash Lite (stronger on dense or messy docs, modestly higher cost). Your pick covers extraction, autofill, and document reflows. Chat has its own selector in the chat drawer.

The five instruction layers

Push every rule to the lowest applicable layer: a per-value rule reaches the AI only when that value is extracted, whereas template rules accompany every value and dilute the signal. Iterate one rule at a time, since changing two layers at once makes it impossible to tell which change made the difference.

#	Layer	Where you edit it	Scope
1	Value	The AI instruction box on an extracted input in the field's Inputs section (or the Prompt on an AI-prompt transform, in Outputs)	One value, every extraction
2	Template	The AI Instructions field in the template's details dropdown (click the template name, with the chevron, in the editor toolbar)	All values; covers the assembled PDF
3	Per-source	The `AI instructions for this row` textarea in a source tab's chevron popover (multi-source) or `⋯` menu (single-source)	One source inside a stitched packet
4	Per-file	The Add AI instructions checkbox + textarea on a ran-template row in the case detail Documents section	One file + one template on one case; persists across re-runs until cleared, never affects other cases
5	Special	The `Special AI Instructions` box in the autofill review modal (also the editor's Test & Preview form, which pre-fills the last run's text)	One run only; never saved

Layer 2 has exactly one editable home: the template's details dropdown. It briefly lived on the editor's Form page; that was changed, and the Form page no longer hosts it.

How they merge: a value's own Rule (1) wins for that value, always. Layers 2 and 3 fold into the "ground rules" above the value list (per-source is not deduplicated against template, since they describe different scopes). Layers 4 and 5 land in the CASE-SPECIFIC INSTRUCTIONS section; Special (5) overrides template/per-source guidance for any value it names, but not a value's own Rule unless you say so. The old combiner-level AI Instructions box was folded into Layer 2 in May 2026.
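One way to picture the merge is a sketch that assembles the layers into sections of a single instruction text. The function `assemble_instructions`, its parameters, and the sample Special text are hypothetical. Placing the CASE-SPECIFIC INSTRUCTIONS section after the value list is also an assumption. Precedence between layers is left to the model reading the text rather than enforced in code.

```python
from typing import Mapping, Sequence


def assemble_instructions(
    value_rules: Mapping[str, str],
    template_rule: str = "",
    source_rules: Sequence[str] = (),
    per_file: str = "",
    special: str = "",
) -> str:
    lines = []
    # Layers 2 and 3 become ground rules; per-source is not deduplicated.
    ground = [r for r in (template_rule, *source_rules) if r]
    if ground:
        lines.append("GROUND RULES")
        lines.extend(f"- {r}" for r in ground)
    # Layer 1: each value carries its own rule next to its name.
    lines.append("VALUES")
    for key, rule in value_rules.items():
        lines.append(f"- {key}: {rule}" if rule else f"- {key}")
    # Layers 4 and 5 share the case-specific section.
    case_specific = [r for r in (per_file, special) if r]
    if case_specific:
        lines.append("CASE-SPECIFIC INSTRUCTIONS")
        lines.extend(f"- {r}" for r in case_specific)
    return "\n".join(lines)


prompt = assemble_instructions(
    value_rules={"permit_number": "PMT- prefix, may be rotated 90°", "address": ""},
    template_rule="Trust the insurance cert for address.",
    source_rules=["W-9: tax ID from box 5 only"],
    special="Skip handwritten notes on this run.",  # illustrative one-off text
)
print(prompt)
```

Keeping each value's rule beside the value itself mirrors the point that a per-value rule applies only to that value, while ground rules apply to all of them.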

1. Open the field's extracted input → 2. Add a specific AI instruction → 3. Re-extract one case → ✓ Iterate one rule at a time

A permit packet using three layers

Value: Permit Number: PMT- prefix, may be rotated 90°
Per-source: W-9: tax ID from box 5 only
Template: trust insurance cert for address

Working well with extraction

Be specific about edge cases: "invoice date in ISO format, top-right of page 1," not "extract the date." For a batch field's columns, name the position too, e.g. "unit price is the per-line cost before tax, usually the second-to-last column."
Test 5–10 files manually before enabling automations that ingest hundreds.
Don't extract everything: if a value is hard, switch the input's source to question so a human fills it once.
Skip attachment categories: in Settings → Workspace → Case List Display → Categories, uncheck Extract on photos/receipts. They stay on the activity timeline and case files list but skip the Run Extraction panel. Uncategorized files behave the same, so tag with an extractable category to get template options.
Re-extract respects edits: a hand-fixed value flips origin from_extraction → modified; re-extract leaves modified fields alone until you clear the override.
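The re-extract rule in the last tip can be sketched as follows. The origin values `from_extraction` and `modified` come from the text above. The helper names, the dictionary layout, and the behavior of `clear_override` are assumptions made for illustration.

```python
def apply_extraction(case_data: dict, fresh: dict) -> None:
    """Write fresh values, leaving hand-modified ones untouched."""
    for key, value in fresh.items():
        current = case_data.get(key)
        if current is not None and current["origin"] == "modified":
            continue  # hand-fixed values are left alone
        case_data[key] = {"value": value, "origin": "from_extraction"}


def hand_edit(case_data: dict, key: str, value: str) -> None:
    """A manual fix flips the origin to 'modified'."""
    case_data[key] = {"value": value, "origin": "modified"}


def clear_override(case_data: dict, key: str) -> None:
    # Assumption: clearing makes the value eligible for re-extraction again.
    if key in case_data:
        case_data[key]["origin"] = "from_extraction"


case: dict = {}
apply_extraction(case, {"total": "1,200.00", "vendor": "Acme"})
hand_edit(case, "total", "1,250.00")
apply_extraction(case, {"total": "1,200.00", "vendor": "Acme Corp"})
print(case["total"]["value"], case["vendor"]["value"])  # 1,250.00 Acme Corp
```

After `clear_override(case, "total")`, the next `apply_extraction` call would overwrite the hand-fixed total again.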

Troubleshooting

AI returns "N/A" for a visible value. The AI is probably looking in the wrong section; add a hint about where the value lives.

A field is never extracted. It carries no signal: add an instruction, read zone, or AI-prompt transform.

Wrong number format. Use a number type with format hints (currency, percent, count). The AI handles format conversion better than free-form text parsing.

Slow on large files. Switch to the faster model (Gemini 2.5 Flash Lite) in Settings → Preferences, or split the template into smaller field groups so each call has less work.

Template rule ignored for one value. That value has its own Rule, which wins (even if the contradiction is subtle).

Special instructions vanished. By design, they're run-only (the label reads "Applied to this run only"). Use Layer 4 to persist on a file, or Layer 2/3 for all cases.

A rule changed nothing. The AI is probabilistic, so a rule shifts the distribution but doesn't guarantee an output; tighten the value type or move the source to question.

Related

Output naming & behavior
Source tabs
Build a template
Process an Invitation to Bid: see it in action