API Reference · v1.0
DocReadi Document Intelligence API
REST API for ingesting financial documents (PDFs, images, scans) and extracting structured data — invoices, receipts, delivery notes, statements, custom shapes. Push results to your stack via signed webhooks or pull on a schedule.
Base URL: https://docreadi.com · OpenAPI 3.1 · JSON over HTTPS · UTF-8.
Quickstart
Three commands from a fresh API key to extracted JSON. Get a key at /ui/settings/api-keys (sign up for the free trial first if you don't have a workspace).
1. Submit a document
# Replace dr_abc… with your real key.
curl -X POST https://docreadi.com/ingest/process \
  -H "X-Api-Key: dr_abc123…" \
  -F "file=@invoice.pdf" \
  -F "source=api"
Returns {"document_id": "…", "status": "ingested"}.
2. Poll until extraction completes
# DOC_ID holds the document_id returned by step 1.
curl https://docreadi.com/ingest/document/"$DOC_ID" \
  -H "X-Api-Key: dr_abc123…"
Status walks ingested → classified → parsed → extracted.
Webhooks fire on document.approved if you'd rather not poll.
3. Read the structured fields
# status == "extracted" → response.data carries the typed fields
# for the document_type. AP invoices: invoice_number, vendor_name,
# subtotal, vat_amount, total_amount, line_items, …
Authentication
Every protected endpoint requires an API key. Send it in the X-Api-Key request header (preferred) or as Authorization: Bearer <key>. Each key is scoped to the tenant that created it; queries only return that tenant's data.
Manage keys at /ui/settings/api-keys. Keys are shown to you exactly once at create time — store them in your secrets manager, not in source control.
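A minimal sketch of the two header forms described above. The helper name auth_headers and its bearer flag are illustrative only; they are not part of the API.

```python
def auth_headers(api_key: str, bearer: bool = False) -> dict[str, str]:
    """Build the request headers that carry the API key.

    Defaults to the preferred X-Api-Key form; bearer=True switches to
    the Authorization: Bearer alternative.
    """
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    if bearer:
        return {"Authorization": f"Bearer {api_key}"}
    return {"X-Api-Key": api_key}
```

Load the key from your secrets manager or an environment variable at runtime rather than hard-coding it.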
Endpoints
Public endpoints are marked. Everything else requires an API key. 30 endpoints across 6 groups.
Ingest · 8 endpoints
Submit documents and check their processing status.
/ingest/process · Auth required · Ingest a document and start the processing pipeline
| Param | Type | Required | Description |
|---|---|---|---|
| file | file | required | PDF, JPG, or PNG file |
| source | string | optional | Source identifier — e.g. 'whatsapp', 'email', 'manual' |
| document_type | string | optional | Force document type: ap_invoice / ap_credit_note / sales_invoice / sales_credit_note / delivery_note / receipt / generic_invoice. Empty string = auto-detect. |
| sender_phone | string | optional | Sender phone number (WhatsApp integrations) |
| template_id | string | optional | Custom template UUID (mutually exclusive with document_type) |
| hold_for_type_selection | boolean | optional | Store the file with status 'awaiting_type' and skip processing. Used by the WhatsApp type-menu flow — finalise the choice via POST /ingest/whatsapp/select-type. |
Processing runs asynchronously. Poll GET /ingest/document/{id} to check status.
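The parameter table above can be checked client-side before uploading. This is a sketch only: build_process_fields is a hypothetical helper, and sending the boolean as the string "true" is an assumption about how the multipart form is parsed.

```python
from typing import Optional


def build_process_fields(
    source: Optional[str] = None,
    document_type: Optional[str] = None,
    template_id: Optional[str] = None,
    sender_phone: Optional[str] = None,
    hold_for_type_selection: bool = False,
) -> dict[str, str]:
    """Assemble the non-file form fields for POST /ingest/process."""
    # The table marks these two as mutually exclusive.
    if document_type and template_id:
        raise ValueError("document_type and template_id are mutually exclusive")
    fields: dict[str, str] = {}
    if source:
        fields["source"] = source
    if document_type:
        fields["document_type"] = document_type
    if template_id:
        fields["template_id"] = template_id
    if sender_phone:
        fields["sender_phone"] = sender_phone
    if hold_for_type_selection:
        # Assumption: multipart booleans are sent as the string "true".
        fields["hold_for_type_selection"] = "true"
    return fields
```

Leaving document_type out entirely has the same effect here as the documented empty string: the pipeline auto-detects the type.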
/ingest/document/{document_id} · Auth required · Poll document processing status and get extracted fields
| Param | Type | Required | Description |
|---|---|---|---|
| document_id | path | required | UUID returned by POST /ingest/process |
Poll every 5–10 seconds. Status 'extracted' or 'approved' = complete. Status 'failed' = pipeline error, check validation_errors. The 'display' sub-object holds tenant-formatted strings — prefer those in user-facing replies; raw fields are for programmatic access.
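The polling guidance above can be sketched as a small loop. The HTTP call is left to a caller-supplied fetch function so the sketch stays transport-agnostic; the function name, the 300-second timeout, and the injectable sleep are assumptions, not API behaviour.

```python
import time
from typing import Callable

DONE_STATUSES = {"extracted", "approved"}


def poll_document(
    fetch: Callable[[str], dict],
    document_id: str,
    interval: float = 5.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the document is complete, failed, or the timeout passes.

    fetch(document_id) must perform GET /ingest/document/{document_id}
    and return the decoded JSON body.
    """
    waited = 0.0
    while True:
        doc = fetch(document_id)
        status = doc.get("status")
        if status in DONE_STATUSES:
            return doc
        if status == "failed":
            raise RuntimeError(f"pipeline failed: {doc.get('validation_errors')}")
        if waited >= timeout:
            raise TimeoutError(f"{document_id} still {status!r} after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

Passing sleep as a parameter keeps the loop testable without real delays.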
/ingest · Auth required · Ingest a document without processing (creates record only)
| Param | Type | Required | Description |
|---|---|---|---|
| file | file | required | PDF, JPG, or PNG file |
| source | string | optional | |
/ingest/whatsapp/prefs/{sender_phone} · Auth required · Get per-sender WhatsApp preferences
/ingest/whatsapp/prefs/{sender_phone} · Auth required · Update per-sender WhatsApp preferences
| Param | Type | Required | Description |
|---|---|---|---|
| document_type | string | optional | Force document type for this sender |
| reply_format | string | optional | summary / detail |
/ingest/whatsapp/type-menu/{sender_phone} · Auth required · Build a WhatsApp interactive-list payload for document-type selection
| Param | Type | Required | Description |
|---|---|---|---|
| sender_phone | path | required | |
| document_id | query | required | The document UUID returned by /ingest/process (hold_for_type_selection=true) |
| page | query | optional | 1-based page index. Each page carries at most 10 rows (Auto Detect + built-ins + custom templates); a 'More…' row links to the next page. |
Row IDs encode the selection: type_
/ingest/whatsapp/select-type · Auth required · Apply a user's type selection and kick off processing
| Param | Type | Required | Description |
|---|---|---|---|
| document_id | string | required | |
| document_type | string | optional | Built-in doc type. Leave empty and also leave template_id empty to let the pipeline auto-classify. |
| template_id | string | optional | Custom template UUID (mutually exclusive with document_type) |
/ingest/expenses/summary · Auth required · Monthly expense totals for a WhatsApp sender
| Param | Type | Required | Description |
|---|---|---|---|
| sender_phone | query | optional | Filter by sender phone. Omit for all senders. |
Pipeline · 3 endpoints
Lower-level stages exposed for advanced workflows. Most callers use Ingest.
/classify · Auth required · Classify a document's type (ap_invoice / sales_invoice / delivery_note / receipt / …)
| Param | Type | Required | Description |
|---|---|---|---|
| document_id | string | required | UUID of an already-ingested document |
Called automatically by /ingest/process. Use this endpoint only to re-classify or run standalone.
/parse · Auth required · Parse an ingested document to markdown (PyMuPDF for native PDFs, Mistral OCR for scans/images)
| Param | Type | Required | Description |
|---|---|---|---|
| document_id | string | required | |
| run_docling | boolean | optional | Run Docling layout hint in parallel (best-effort, non-blocking) |
| parser | string | optional | Force a parser: pymupdf / mistral_ocr / mistral_direct_ |
/extract · Auth required · Extract structured fields from parsed markdown into the typed schema
| Param | Type | Required | Description |
|---|---|---|---|
| document_id | string | required | |
| document_type | string | required | ap_invoice / delivery_note / sales_invoice (other built-ins routed via /ingest/process) |
| markdown | string | required | Markdown returned by /parse |
| vendor_hint | string | optional | Docling-derived layout hint, if any |
Counterparties · 7 endpoints
Vendor / customer registry: CRUD via JSON, bulk via CSV.
/api/v1/counterparties · Auth required · List counterparties (paginated). RLS-scoped to the API key's tenant.
| Param | Type | Required | Description |
|---|---|---|---|
| page | query | optional | |
| page_size | query | optional | Max 500 |
| status | query | optional | Filter: confirmed / candidate |
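A sketch of walking a paginated listing with the page and page_size parameters above. Two assumptions are baked in and should be checked against real responses: pages are 1-based, and a page shorter than page_size marks the end. The fetch_page callable is a hypothetical wrapper you write around the HTTP call; it returns the list of items for one page.

```python
from typing import Callable, Iterator

MAX_PAGE_SIZE = 500  # documented ceiling for page_size


def iter_items(
    fetch_page: Callable[[int, int], list],
    page_size: int = 100,
) -> Iterator[dict]:
    """Yield every item across pages until a short page is returned."""
    if not 1 <= page_size <= MAX_PAGE_SIZE:
        raise ValueError(f"page_size must be between 1 and {MAX_PAGE_SIZE}")
    page = 1  # assumption: the first page is page 1
    while True:
        items = fetch_page(page, page_size)
        yield from items
        if len(items) < page_size:
            return
        page += 1
```

When the total is an exact multiple of page_size, this makes one extra request that returns an empty page.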
/api/v1/counterparties · Auth required · Create a counterparty. 409 if canonical_name already exists (case-insensitive).
| Param | Type | Required | Description |
|---|---|---|---|
| canonical_name | string | required | |
| counterparty_type | string | optional | vendor / customer / both (default: vendor) |
| status | string | optional | candidate / confirmed (default: confirmed) |
| active | boolean | optional | |
| vat_number | string | optional | |
| address | string | optional | |
| addresses | array | optional | |
| bank_name | string | optional | |
| bank_account | string | optional | |
| bank_accounts | array | optional | |
| keywords | array | optional | |
| aliases | array | optional | |
| typical_vat_rate | float | optional | Decimal, e.g. 0.15 for 15% |
| typical_payment_terms | int | optional | Days |
| currency | string | optional | ISO 4217 |
/api/v1/counterparties/{counterparty_id} · Auth required · Fetch a single counterparty by UUID
/api/v1/counterparties/{counterparty_id} · Auth required · Partial update: only supplied fields are written
| Param | Type | Required | Description |
|---|---|---|---|
| any creation field | see POST | optional | Pass only the fields you want to change |
/api/v1/counterparties/{counterparty_id} · Auth required · Delete a counterparty
/api/v1/counterparties/export.csv · Auth required · Download all counterparties for the API key's tenant as CSV
List columns (addresses, bank_accounts, keywords, aliases) are JSON-encoded in the CSV so they round-trip cleanly through Excel.
/api/v1/counterparties/import.csv · Auth required · Bulk upsert from CSV (multipart upload)
| Param | Type | Required | Description |
|---|---|---|---|
| file | file | required | CSV with a header row |
Required column: canonical_name. Upsert is case-insensitive on canonical_name. Unknown columns are ignored. List columns accept either JSON arrays or comma-separated strings.
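A sketch of preparing an import file under the rules above, using the JSON-array form for list columns so the file matches what the export produces. The helper name to_import_csv and the sample rows are illustrative.

```python
import csv
import io
import json

LIST_COLUMNS = ("addresses", "bank_accounts", "keywords", "aliases")


def to_import_csv(rows: list[dict]) -> str:
    """Serialise counterparty dicts to CSV text with a header row."""
    columns = ["canonical_name"]  # the one required column goes first
    for row in rows:
        if not row.get("canonical_name"):
            raise ValueError("every row needs canonical_name")
        for key in row:
            if key not in columns:
                columns.append(key)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns)
    writer.writeheader()
    for row in rows:
        # List columns are JSON-encoded; scalar columns are written as-is.
        writer.writerow({
            key: json.dumps(value) if key in LIST_COLUMNS else value
            for key, value in row.items()
        })
    return buf.getvalue()
```

Columns missing from a given row are written as empty cells.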
Entities · 7 endpoints
Your own legal entities used for AP/sales direction validation.
/api/v1/entities · Auth required · List known company entities (your own legal names, used for AP/sales direction validation)
| Param | Type | Required | Description |
|---|---|---|---|
| page | query | optional | |
| page_size | query | optional | Max 500 |
| status | query | optional | Filter: confirmed / candidate |
/api/v1/entities · Auth required · Create an entity. 409 if canonical_name already exists.
| Param | Type | Required | Description |
|---|---|---|---|
| canonical_name | string | required | |
| status | string | optional | candidate / confirmed (default: confirmed) |
| active | boolean | optional | |
| vat_number | string | optional | |
| keywords | array | optional | |
| notes | string | optional | |
/api/v1/entities/{entity_id} · Auth required · Fetch a single entity by UUID
/api/v1/entities/{entity_id} · Auth required · Partial update: only supplied fields are written
/api/v1/entities/{entity_id} · Auth required · Delete an entity
/api/v1/entities/export.csv · Auth required · Download all entities for the API key's tenant as CSV
/api/v1/entities/import.csv · Auth required · Bulk upsert entities from CSV (multipart upload)
| Param | Type | Required | Description |
|---|---|---|---|
| file | file | required | CSV with a header row |
Required column: canonical_name. Same upsert semantics as counterparties.
Exports · 1 endpoint
Pre-shaped CSV downloads for ERP imports (Xero today, Sage / QuickBooks roadmap).
/api/v1/exports/xero/bills.csv · Auth required · Approved AP invoices in Xero's Bills CSV import format
| Param | Type | Required | Description |
|---|---|---|---|
| date_from | string | optional | ISO date (YYYY-MM-DD); default = today − 90 days |
| date_to | string | optional | ISO date; default = today |
| status | string | optional | Lifecycle filter; default = 'approved' |
One row per *line item* — Xero groups lines back into a single bill by InvoiceNumber + ContactName. AccountCode and TaxType are intentionally blank: map them in Xero's import wizard once and Xero remembers the mapping. Credit notes are excluded (Xero imports them via a separate CreditNotes CSV). External-system identifiers stored on the Counterparty record (counterparties.external_ids JSONB) round-trip via the /api/v1/counterparties endpoints.
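To sanity-check a download before importing it, you can regroup the line rows the same way the note above describes, by InvoiceNumber + ContactName. This is a sketch: the Description column in the sample is made up, and stripping a leading asterisk from header names is an assumption for files that follow Xero's template convention of marking required columns.

```python
import csv
import io
from collections import defaultdict


def group_bill_lines(csv_text: str) -> dict:
    """Group CSV line-item rows into bills keyed by (InvoiceNumber, ContactName)."""
    bills = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumption: header names may carry a leading "*"; drop it if present.
        clean = {key.lstrip("*"): value for key, value in row.items() if key}
        bills[(clean["InvoiceNumber"], clean["ContactName"])].append(clean)
    return dict(bills)


SAMPLE = (
    "ContactName,InvoiceNumber,Description\n"
    "Acme Ltd,INV-1,Paper\n"
    "Acme Ltd,INV-1,Toner\n"
    "Beta CC,INV-1,Freight\n"
)
bills = group_bill_lines(SAMPLE)
```

In the sample, the two Acme Ltd rows fold into one bill, while Beta CC stays separate even though it reuses the invoice number.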
Other · 4 endpoints
Misc / cross-cutting endpoints.
/health · Public · Service health check
/templates/common-fields · Public · Get curated + tenant-learned common field definitions for the template builder
/api/v1/reports/{report_id} · Auth required · Paginated JSON export of a saved custom report (API key auth)
| Param | Type | Required | Description |
|---|---|---|---|
| report_id | path | required | Saved report UUID |
| page | query | optional | |
| page_size | query | optional | |
/api/v1/reports/{report_id}/csv · Auth required · CSV export of a saved custom report (API key auth)
Document types
The classifier picks one of these at the classified step; the extractor then runs the matching schema. Custom templates (created at /ui/templates) override the built-in schemas when you want a different field set.
| ID | Label | Description |
|---|---|---|
| ap_invoice | AP Invoice | Inbound invoice from a supplier |
| ap_credit_note | AP Credit Note | Inbound credit note |
| sales_invoice | Sales Invoice | Outbound invoice to a customer |
| sales_credit_note | Sales Credit Note | Outbound credit note |
| delivery_note | Delivery Note | Goods delivery note |
| receipt | Receipt | Payment receipt or till slip |
| generic_invoice | Generic Invoice | Invoice where direction cannot be determined |
Confidence
Every extracted document carries two confidence signals on the response. The headline number (`confidence_score`) is the model's overall self-reported confidence in the whole extraction. The granular signal (`field_confidences`) is a dict of field-name → 0.0-1.0 score covering every field the model populated. Use the granular shape — averaging it out into a single document number washes away exactly the information that drives routing decisions.
Response shape
The example below shows both signals. Field names match the keys in data.
{
  "confidence_score": 0.94,
  "field_confidences": {
    "invoice_number": 0.99,
    "vendor_name": 0.98,
    "subtotal": 0.97,
    "vat_amount": 0.96,
    "total_amount": 0.99,
    "due_date": 0.65,
    "vendor_address": 0.78
  },
  "low_confidence_fields": [
    "due_date",
    "vendor_address"
  ]
}
Interpreting the numbers
| Range | Meaning |
|---|---|
| >= 0.95 | Verbatim-clean text — the model copied the value off the document with no ambiguity. Safe to auto-route in most tenants' workflows. |
| 0.70 - 0.94 | Required interpretation — the value was inferred or parsed (e.g. date-format disambiguation, OCR'd from a scan). Send to review for high-stakes documents; auto-route on volume tiers. |
| <= 0.69 | Smudged / ambiguous / candidate-pick — the model is explicitly uncertain. Always review before approving. |
The numbers come from the model's self-report. They're a useful ordinal signal ('which field should the reviewer look at first') but not a calibrated probability you can divide further. Validators (subtotal+VAT=total, ISO dates, VAT-number shapes) will layer additional signal on top in a follow-up release.
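One way to act on the table above is to treat the scores as a review ordering. The sketch below maps a score to one of the three ranges and sorts fields lowest-first; the tier labels and function names are illustrative, not values the API returns.

```python
def review_tier(score: float) -> str:
    """Map a 0.0-1.0 confidence score to one of the three documented ranges."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    if score >= 0.95:
        return "verbatim"      # copied cleanly off the document
    if score >= 0.70:
        return "interpreted"   # inferred or parsed; review if high-stakes
    return "uncertain"         # always review before approving


def review_order(field_confidences: dict[str, float]) -> list[str]:
    """Field names sorted so the least confident field comes first."""
    return sorted(field_confidences, key=field_confidences.get)
```

Applied to the sample response, due_date (0.65) comes first as "uncertain" and vendor_address (0.78) second as "interpreted".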
Display rules
- We cap the displayed percentage at 99%. A model returning 1.0 still renders as 99% — true 100% is rare in any probabilistic system, and a 100% display trains users to over-trust.
- The Lowest Field column on /ui (the document queue) shows min(field_confidences.values()). Older docs without field_confidences fall back to confidence_score.
- low_confidence_fields is auto-derived from field_confidences (any field < 0.85 lands in the list). Kept for back-compat with the legacy binary signal.
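The three display rules can be reproduced client-side roughly as follows. Function names are illustrative, and rounding to the nearest whole percent is an assumption; the rules above only specify the 99% cap.

```python
LOW_CONFIDENCE_CUTOFF = 0.85  # fields strictly below this are listed


def display_percent(score: float) -> int:
    """Whole-number percentage, capped at 99 (rounding is an assumption)."""
    return min(99, round(score * 100))


def lowest_field(doc: dict) -> float:
    """Value for the Lowest Field column, falling back on older documents."""
    field_confidences = doc.get("field_confidences")
    if field_confidences:
        return min(field_confidences.values())
    return doc["confidence_score"]


def low_confidence_fields(field_confidences: dict[str, float]) -> list[str]:
    """Names of fields scoring below the cutoff, in response order."""
    return [
        name for name, score in field_confidences.items()
        if score < LOW_CONFIDENCE_CUTOFF
    ]
```

For the sample response, lowest_field gives 0.65 and low_confidence_fields gives due_date and vendor_address, matching the list in the response.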
Auto-approve gate. Each tenant sets a confidence threshold per document type at /ui/settings/review. The gate compares against confidence_score today; a future release will optionally gate on min(field_confidences) so that a single bad field on an otherwise-good document still routes to review.
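The difference between the two gating modes is easiest to see on a document with one weak field. The sketch below is illustrative only: the function name, the 0.90 threshold in the example, and the use of >= for the comparison are all assumptions.

```python
def auto_approves(doc: dict, threshold: float, gate_on_min_field: bool = False) -> bool:
    """Return True if the document would clear the auto-approve gate.

    gate_on_min_field=False models today's behaviour (confidence_score);
    True models the planned option (lowest field confidence).
    """
    score = doc["confidence_score"]
    field_confidences = doc.get("field_confidences")
    if gate_on_min_field and field_confidences:
        score = min(field_confidences.values())
    return score >= threshold  # assumption: the comparison is inclusive


sample = {
    "confidence_score": 0.94,
    "field_confidences": {"total_amount": 0.99, "due_date": 0.65},
}
today = auto_approves(sample, threshold=0.90)
planned = auto_approves(sample, threshold=0.90, gate_on_min_field=True)
```

With a 0.90 threshold the sample clears the gate on confidence_score (0.94) but would route to review under the min-field option because of due_date (0.65).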
Webhooks
Outbound HMAC-signed event push. Configure a subscription URL at /ui/settings/webhooks; the secret is shown to you exactly once at create time. Every event is delivered with at-least-once semantics — use the X-DocReadi-Event-Id header to dedupe.
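Because delivery is at-least-once, a receiver needs to be idempotent. A minimal sketch of deduping on the X-DocReadi-Event-Id value follows; the in-memory set and the function name are assumptions for illustration, and a real receiver would use a durable store such as a database table with a unique key.

```python
from typing import Callable


def handle_once(event_id: str, seen: set, process: Callable[[], None]) -> bool:
    """Run process() only the first time event_id is seen.

    Returns True if process() ran, False for a duplicate delivery.
    """
    if event_id in seen:
        return False
    process()
    # Recorded only after success, so a crash mid-processing is retried.
    seen.add(event_id)
    return True
```

Either way, answer the duplicate delivery with a 2xx so it is not retried again.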
Event types
| Event type | Fires when |
|---|---|
| document.approved | Reviewer approves (or auto-approve gate fires). |
| document.rejected | Reviewer rejects with a reason. |
| document.failed | Pipeline failure terminal state (extraction / parse error). |
| counterparty.candidate_created | First sighting of a new vendor / customer. |
| counterparty.confirmed | Counterparty status flips candidate → confirmed. |
| validation_error.raised | Cross-field validator flagged an issue (bank account, address, math). |
Headers on every delivery
| Header | Value |
|---|---|
| Content-Type | application/json |
| User-Agent | DocReadi-Webhook/1.0 |
| X-DocReadi-Signature | t=<unix timestamp>,v1=<hex HMAC-SHA256> |
| X-DocReadi-Event-Id | |
| X-DocReadi-Event-Type | |
| X-DocReadi-Delivery-Id | |
Verify the signature
Stripe-compatible. The signed payload is exactly f"{t}.{raw_request_body}". Recompute HMAC-SHA256 with your webhook secret, hex-encode it, and constant-time compare against the v1 value. Reject signatures whose timestamp is more than 5 minutes from now (replay guard).
import hmac, hashlib, time

def verify(secret: str, body: bytes, header: str) -> bool:
    try:
        # Split "t=...,v1=..." into its parts; a malformed header fails closed.
        parts = dict(p.strip().split('=', 1) for p in header.split(','))
        t, v1 = parts['t'], parts['v1']
        age = abs(int(time.time()) - int(t))
    except (KeyError, ValueError):
        return False
    if age > 300:
        return False  # outside the 5-minute replay window
    expected = hmac.new(secret.encode(), f'{t}.'.encode() + body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, v1)
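For local testing of a receiver, it helps to produce a header the same way the sender would. The sketch below builds one from the signing scheme described above; sign_for_test is a hypothetical helper, and the secret and body in the usage example are made up.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign_for_test(secret: str, body: bytes, timestamp: Optional[int] = None) -> str:
    """Build a signature header value for a test delivery.

    The signed payload is "<t>.<raw body>", hashed with HMAC-SHA256 and
    hex-encoded, as in the verification steps above.
    """
    t = int(time.time()) if timestamp is None else timestamp
    digest = hmac.new(
        secret.encode(), f"{t}.".encode() + body, hashlib.sha256
    ).hexdigest()
    return f"t={t},v1={digest}"


header = sign_for_test("whsec_test", b'{"ok":true}', timestamp=1700000000)
```

Sign the exact bytes you will send. Re-serialising the JSON after signing changes the body and invalidates the signature.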
Retry policy. Exponential backoff: 30s, 2m, 10m, 1h, 6h. After 5 failed attempts the delivery dead-letters and stops retrying. Any 2xx response counts as success; 4xx and 5xx both schedule a retry.
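To size a dedupe window or an alerting threshold, it helps to know how long after the first failure each retry lands. The sketch below sums the documented delays, assuming each delay is measured from the previous failed attempt; that timing detail is an assumption.

```python
from itertools import accumulate

# Documented backoff steps, in seconds: 30s, 2m, 10m, 1h, 6h.
RETRY_DELAYS_S = [30, 120, 600, 3600, 21600]


def retry_offsets() -> list[int]:
    """Seconds after the first failed attempt at which each retry fires."""
    return list(accumulate(RETRY_DELAYS_S))


offsets = retry_offsets()
```

Under that assumption the last retry fires 25,950 seconds (about 7 hours 12 minutes) after the first failure, so event IDs kept for dedupe should outlive that window.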
Example workflow — receipt OCR
Submit a receipt and get the extracted total:
- POST /ingest/process with file + source=whatsapp → get document_id
- GET /ingest/document/<document_id>: poll until status == "extracted"
- Read total_amount from the response
Or skip the polling: register a webhook at /ui/settings/webhooks and we'll POST document.approved to your endpoint as soon as the document is approved.
Machine-readable version: /api/guide