Document Processing
Drop in a PDF, invoice, or scanned form and get structured data back. Three profiles (Read, Layout, and Form Extraction) are billed as separate, non-overlapping cost buckets, so you control both cost and output granularity. Combine them freely.
PDF Read — text and language
The Read profile extracts OCR text with positional layout, semantic paragraphs with roles (title, heading, footer, footnote), font styles (bold, italic, handwritten), and per-span language detection. Built on the $1.50/1K-page tier, it is the most cost-effective way to get text out of documents.
PDF Layout — structure and figures
The Layout profile extracts page images, structured tables as JSON, document sections with hierarchy, detected figures and charts with bounding boxes, and mathematical formulas as LaTeX. Built on the $10/1K-page tier for deep structural understanding.
Form Extraction — fields and barcodes
The Form Extraction profile returns key-value pairs (labels mapped to values), structured tables, selection marks (checkboxes, radio buttons with state), and detected barcodes/QR codes with type and value. Purpose-built for invoices, receipts, and application forms.
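To make the shape of that output concrete, here is a minimal sketch of how a client might model a Form Extraction result. Every class, field, and function name below is hypothetical and chosen for illustration. The actual response schema is not described on this page, and the sample values are made up.

```python
from dataclasses import dataclass, field


@dataclass
class KeyValuePair:
    label: str  # field label as printed on the form, e.g. "Invoice No."
    value: str  # text found for that label


@dataclass
class SelectionMark:
    label: str
    selected: bool  # state of the checkbox or radio button


@dataclass
class Barcode:
    kind: str   # barcode type, e.g. "QRCode" (hypothetical value)
    value: str  # decoded payload


@dataclass
class FormResult:
    """Hypothetical container for one document's Form Extraction output."""
    fields: list[KeyValuePair] = field(default_factory=list)
    selection_marks: list[SelectionMark] = field(default_factory=list)
    barcodes: list[Barcode] = field(default_factory=list)


def fields_as_dict(result: FormResult) -> dict[str, str]:
    """Flatten the key-value pairs into a label -> value lookup."""
    return {kv.label: kv.value for kv in result.fields}


def checked_labels(result: FormResult) -> list[str]:
    """Return the labels of all selection marks that are ticked."""
    return [m.label for m in result.selection_marks if m.selected]


# Made-up sample data, for illustration only.
sample = FormResult(
    fields=[KeyValuePair("Invoice No.", "INV-001"), KeyValuePair("Total", "42.00")],
    selection_marks=[SelectionMark("Paid", True), SelectionMark("Overdue", False)],
    barcodes=[Barcode("QRCode", "INV-001")],
)
```

With a model like this, downstream code can look up a field by its label (`fields_as_dict(sample)["Total"]`) instead of scanning the raw list.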
Composable profiles
Profiles are non-overlapping cost buckets. Combine Read + Layout for full-page understanding without paying prebuilt-tier prices for features the Read tier already covers. Add Form Extraction only when you need structured field data. The platform groups derivations automatically.
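As a rough illustration of how the buckets add up, the sketch below estimates cost from the two per-page prices quoted above. It assumes that combined profiles are billed additively and pro rata per page. That assumption is not confirmed here, and the function and constant names are hypothetical. Form Extraction is left out because no price is stated for it.

```python
# Per-1,000-page prices in USD, taken from the Read and Layout descriptions.
# Form Extraction is omitted: its price is not stated on this page.
PRICE_PER_1K_PAGES = {"read": 1.50, "layout": 10.00}


def estimate_cost(pages: int, profiles: list[str]) -> float:
    """Estimate the USD cost of running the given profiles over `pages` pages.

    Assumes additive, pro-rata billing (an assumption, not documented behavior).
    """
    if pages < 0:
        raise ValueError("pages must be non-negative")
    unknown = set(profiles) - set(PRICE_PER_1K_PAGES)
    if unknown:
        raise ValueError(f"no known price for: {sorted(unknown)}")
    # Deduplicate so that listing a profile twice does not charge it twice.
    rate = sum(PRICE_PER_1K_PAGES[p] for p in sorted(set(profiles)))
    return rate * pages / 1000


read_only = estimate_cost(2000, ["read"])                   # 3.0
read_and_layout = estimate_cost(2000, ["read", "layout"])   # 23.0
```

Under these assumptions, 2,000 pages cost $3.00 with Read alone and $23.00 with Read + Layout, which is why it pays to request Layout only for documents that need tables, sections, or figures.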
Frequently Asked Questions
What document formats are supported?
What are the three document profiles?
Can Interlocute extract tables from PDFs?
Does it detect barcodes and QR codes?
How is document processing billed?
Can I process specific pages of a document?
Related Features
Video Intelligence
Upload a video and get a structured AI index: speech transcripts, visual scene analysis, entity extraction, sentiment, and AI summaries — choose the signals you need.
Image Intelligence
Upload an image and get layered AI analysis: a structural fingerprint with instant local metrics, semantic understanding from a multimodal LLM, and full forensic verification with manipulation detection.
RAG (Knowledge Retrieval)
Give your AI nodes access to your own documents and data. Interlocute handles the vector search, chunking, and context injection automatically.
Ready to build with Document Processing?
Deploy your node in seconds and start using Document Processing today.