AI field extraction
AI field extraction reads each ingested file and pulls structured data into the outputs you defined on each field of a template. The AI is bounded by your output schema, so extraction stays predictable.
The AI in Document Blueprint is not a freeform assistant. It runs against a schema you define — the fields on a template — and produces values for those fields. If you didn't declare a field, the AI doesn't extract it. You keep control of what data flows through the system, and the AI does the boring work of reading every page.
Core concepts
Schema-bounded extraction
When a file is ingested, the AI sees:
- The full text and visual content of the file
- The list of extractable outputs declared by each field on the matching template
- Any per-output extraction prompts you've authored
It returns one value per declared output. Anything it can't find stays empty. Each value lands in case.data keyed by the output's stable output.key — renaming a field or output's display label later doesn't disturb the data.
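As a minimal sketch of that contract, the Python below stores values under stable keys, drops anything that was never declared, and leaves the data untouched when a label is renamed. The `Output` class, the `store_values` function, and the sample keys are hypothetical illustrations, not Document Blueprint's actual API.

```python
from dataclasses import dataclass


@dataclass
class Output:
    key: str    # stable identifier (assumed never to change)
    label: str  # display label; safe to rename


def store_values(outputs, extracted):
    """Keep one entry per declared output, keyed by output.key.

    Values for undeclared keys are dropped; declared outputs the
    extractor could not fill stay empty (None).
    """
    return {o.key: extracted.get(o.key) for o in outputs}


outputs = [Output("invoice_date", "Invoice Date"), Output("total", "Total")]

# Pretend the model returned these; "vendor_phone" was never declared.
raw = {"invoice_date": "2024-03-01", "vendor_phone": "555-0100"}

case_data = store_values(outputs, raw)

# Renaming the display label does not touch the stored data.
outputs[0].label = "Date of Invoice"
```

Here `case_data` plays the role of case.data: it holds `invoice_date` and an empty `total`, and nothing for the undeclared `vendor_phone`.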
Where extraction instructions live
An output's Prompt box is the most targeted place to guide the AI — it attaches a Rule directly to that one output, and the prompt explicitly tells the model that an output's own Rule wins over any other guidance for that specific output. There are also broader, less-specific layers — template-wide, per-source, and a one-off run-time override. See The four layers of AI instructions for the full picture and when to use each.
Use the most specific layer that gets the result you want. An output rule beats a template rule; a template rule beats having no rule at all.
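The precedence between those two layers can be sketched as a simple fallback. This is an illustration only: `pick_instruction` is a hypothetical name, and the per-source and run-time layers are left out because this page does not state where they fall in the order.

```python
from typing import Optional


def pick_instruction(output_rule: Optional[str],
                     template_rule: Optional[str]) -> Optional[str]:
    """Return the most specific instruction that is actually set."""
    if output_rule:        # the output's own Rule wins for that output
        return output_rule
    if template_rule:      # otherwise fall back to template-wide guidance
        return template_rule
    return None            # no guidance at either layer
```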
Model choice
Document Blueprint runs extraction on Google Gemini (Flash by default, Pro for harder documents). Switch models from the AI dropdown in the nav when you need a different cost/speed/quality tradeoff.
Walkthrough: tuning extraction for invoice line items
Your invoices have a Line Items batch field. The AI gets the line items mostly right but sometimes mislabels the unit price column.
- Open the template and click the Line Items field.
- In the field's Outputs section, open the zone- or prompt-kind output and add an extraction prompt: "Unit price is the per-line cost before tax. It's usually the second-to-last column on each row."
- Re-extract one of the cases that was wrong.
- If still wrong, refine the prompt. Repeat until the AI gets it right consistently.
Common patterns
Be specific about edge cases
Vague instructions like "extract the date" don't help. Specific instructions like "extract the invoice date in ISO format from the top right of page 1" do. The AI follows what you write — write what you actually want.
Use field types as constraints
Field types act as type checks. A date field forces the AI to return a parseable date. A select field forces it to return one of the allowed options. Type constraints are usually more reliable than instructions.
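To illustrate why a type check is stricter than a written instruction, here is a small sketch of the two constraints mentioned above. The function names are hypothetical, and the assumption that dates arrive in ISO format (YYYY-MM-DD) is ours; the product's real validation is not documented on this page.

```python
from datetime import date
from typing import Optional, Sequence


def check_date(value: str) -> Optional[date]:
    """Return a date if the value parses as an ISO date, else None."""
    try:
        return date.fromisoformat(value)
    except ValueError:
        return None


def check_select(value: str, options: Sequence[str]) -> Optional[str]:
    """Return the value only if it is one of the allowed options."""
    return value if value in options else None
```

A value such as "03/01/2024" fails `check_date` outright, whereas a prompt that merely asks for ISO dates can still be ignored on a given run.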
Run extraction on a few cases first
Before turning on automations that ingest hundreds of files, run extraction on 5-10 sample files manually. Check the results. Tune. Then enable the automations.
Don't try to extract everything
If a value is hard for the AI to find, it's often faster to change the field's source to question and have the user fill it in once at autofill time. Not everything needs to be automatic.
Mark attachment-class categories so they skip the extraction picker
Site photos, receipts, and other supporting material aren't meant for template extraction — they're evidence that lives on the case. From Settings → Workspace → Case List Display → Categories, uncheck Extract on those categories. Files tagged with them still appear on the activity timeline and the case files list, but the Run Extraction panel in case details skips them. Uncategorized files behave the same way; tag a file with an extractable category first if you want template options to appear.
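The skip rule can be pictured as a simple filter. In this sketch the `is_extractable` function and the `flags` mapping are hypothetical, and treating an unknown category as non-extractable is an assumption made for the example.

```python
from typing import Dict, Optional


def is_extractable(category: Optional[str],
                   extract_flags: Dict[str, bool]) -> bool:
    """True only when the file's category has Extract checked."""
    if category is None:  # uncategorized files are skipped
        return False
    # Assumption: a category missing from the settings is not extractable.
    return extract_flags.get(category, False)


# Hypothetical workspace settings: Extract is unchecked for evidence files.
flags = {"Invoice": True, "Site Photo": False, "Receipt": False}
```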
Re-extract respects user edits
When you fix an AI-extracted value by hand, the field's origin flips from from_extraction to modified. The next re-extract leaves modified fields alone unless you explicitly clear the override. Your fixes survive every retry.
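A rough model of that behavior, using the two origin values named above. Everything else here (`FieldValue`, `edit_by_hand`, `re_extract`, `clear_override`) is a hypothetical name chosen for the sketch, not the product's internals.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FieldValue:
    value: Optional[str]
    origin: str  # "from_extraction" or "modified"


def edit_by_hand(fields: Dict[str, FieldValue], key: str, new_value: str) -> None:
    """A manual fix flips the origin to "modified"."""
    fields[key] = FieldValue(new_value, "modified")


def re_extract(fields: Dict[str, FieldValue], fresh: Dict[str, str]) -> None:
    """Apply fresh AI values, leaving hand-edited fields alone."""
    for key, value in fresh.items():
        current = fields.get(key)
        if current is not None and current.origin == "modified":
            continue  # user edits survive the retry
        fields[key] = FieldValue(value, "from_extraction")


def clear_override(fields: Dict[str, FieldValue], key: str) -> None:
    """Explicitly release a field so the next re-extract may overwrite it."""
    if key in fields:
        fields[key].origin = "from_extraction"
```

With this model, a corrected total stays corrected across any number of `re_extract` calls until `clear_override` is called for that field.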
Troubleshooting
The AI extracted "N/A" for a field that's clearly visible on the file.
Check the field's instructions. The AI might be looking in the wrong section. Add a hint about where the value lives.
Numbers are extracted but with the wrong format.
Use a number field type with explicit format hints (currency, percent, count). The AI is better at format conversion than free-form text parsing.
Extraction is slow on large files.
Switch to a faster model in the AI dropdown, or split the template into smaller field groups so each extraction call has less work.
Values are inconsistent across runs.
The AI is probabilistic. Add stronger output-level prompts or constrain the output's value type. If the value is critical and must be exact, change the field's source to question instead.