API Reference · v1.0

DocReadi Document Intelligence API

REST API for ingesting financial documents (PDFs, images, scans) and extracting structured data — invoices, receipts, delivery notes, statements, custom shapes. Push results to your stack via signed webhooks or pull on a schedule.

Base URL: https://docreadi.com · OpenAPI 3.1 · JSON over HTTPS · UTF-8.

Quickstart

Three steps take you from a fresh API key to extracted JSON. Get a key at /ui/settings/api-keys (sign up for the free trial first if you don't have a workspace).

1. Submit a document

# Replace dr_abc… with your real key.
curl -X POST https://docreadi.com/ingest/process \
  -H "X-Api-Key: dr_abc123…" \
  -F "file=@invoice.pdf" \
  -F "source=api"

Returns {"document_id": "…", "status": "ingested"}.

2. Poll until extraction completes

# DOC_ID holds the document_id returned by step 1.
curl https://docreadi.com/ingest/document/"$DOC_ID" \
  -H "X-Api-Key: dr_abc123…"

Status walks ingested → classified → parsed → extracted. Webhooks fire on document.approved if you'd rather not poll.

3. Read the structured fields

# status == "extracted" → response.data carries the typed fields
# for the document_type. AP invoices: invoice_number, vendor_name,
# subtotal, vat_amount, total_amount, line_items, …

Authentication

Every protected endpoint requires an API key. Send it in the X-Api-Key request header (preferred) or as Authorization: Bearer <key>. Each key is scoped to the tenant that created it; queries only return that tenant's data.
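As a minimal sketch of the two header forms described above, the helper below builds either one. The function name auth_headers and its bearer flag are hypothetical; only the header names come from this reference.

```python
def auth_headers(api_key: str, bearer: bool = False) -> dict[str, str]:
    """Build request headers carrying the API key.

    Hypothetical helper: X-Api-Key is the preferred form, and
    Authorization: Bearer is the documented alternative.
    """
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    if bearer:
        return {"Authorization": f"Bearer {api_key}"}
    return {"X-Api-Key": api_key}
```

Pass the returned dict as the headers argument of whichever HTTP client you use.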

Manage keys at /ui/settings/api-keys. Keys are shown to you exactly once at create time — store them in your secrets manager, not in source control.

Endpoints

Public endpoints are marked. Everything else requires an API key. 30 endpoints across 6 groups.

Ingest 8 endpoints

Submit documents and check their processing status.

POST /ingest/process · Auth required

Ingest a document and start the processing pipeline

Param	Type	Required	Description
file	file	required	PDF, JPG, or PNG file
source	string	optional	Source identifier — e.g. 'whatsapp', 'email', 'manual'
document_type	string	optional	Force document type: ap_invoice | ap_credit_note | sales_invoice | sales_credit_note | delivery_note | receipt | generic_invoice. Empty string = auto-detect.
sender_phone	string	optional	Sender phone number (WhatsApp integrations)
template_id	string	optional	Custom template UUID (mutually exclusive with document_type)
hold_for_type_selection	boolean	optional	Store the file with status 'awaiting_type' and skip processing. Used by the WhatsApp type-menu flow — finalise the choice via POST /ingest/whatsapp/select-type.

Processing runs asynchronously. Poll GET /ingest/document/{id} to check status.
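The parameter table above implies a couple of client-side checks, chiefly that document_type and template_id must not be sent together. The sketch below assembles the non-file form fields under that rule. The function name process_form_fields is hypothetical, and sending the boolean as the string "true" is an assumption about form encoding, not something this reference states.

```python
BUILT_IN_TYPES = {
    "ap_invoice", "ap_credit_note", "sales_invoice", "sales_credit_note",
    "delivery_note", "receipt", "generic_invoice",
}

def process_form_fields(source="api", document_type="", template_id="",
                        sender_phone="", hold_for_type_selection=False):
    """Return the non-file form fields for POST /ingest/process."""
    if document_type and template_id:
        raise ValueError("document_type and template_id are mutually exclusive")
    if document_type and document_type not in BUILT_IN_TYPES:
        raise ValueError(f"unknown document_type: {document_type!r}")
    fields = {"source": source}
    if document_type:
        fields["document_type"] = document_type
    if template_id:
        fields["template_id"] = template_id
    if sender_phone:
        fields["sender_phone"] = sender_phone
    if hold_for_type_selection:
        # Assumption: the boolean travels as the form string "true".
        fields["hold_for_type_selection"] = "true"
    return fields
```

Omitting document_type and template_id leaves both unset, which matches the auto-detect behaviour described for an empty document_type.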

GET /ingest/document/{document_id} · Auth required

Poll document processing status and get extracted fields

Param	Type	Required	Description
document_id	path	required	UUID returned by POST /ingest/process

Poll every 5–10 seconds. Status 'extracted' or 'approved' = complete. Status 'failed' = pipeline error, check validation_errors. The 'display' sub-object holds tenant-formatted strings — prefer those in user-facing replies; raw fields are for programmatic access.
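The polling guidance above can be sketched as a small loop. This is an illustration, not a client library: fetch is a caller-supplied function (hypothetical) that performs GET /ingest/document/{document_id} and returns the decoded JSON, and the 300-second timeout is an assumed default rather than a documented limit.

```python
import time

DONE = {"extracted", "approved"}

def wait_for_document(fetch, document_id, interval=5.0, timeout=300.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll until the document reaches a terminal status.

    Returns the final document dict, raises RuntimeError on 'failed',
    and raises TimeoutError if the (assumed) timeout elapses first.
    """
    deadline = clock() + timeout
    while True:
        doc = fetch(document_id)
        status = doc.get("status")
        if status in DONE:
            return doc
        if status == "failed":
            raise RuntimeError(f"pipeline failed: {doc.get('validation_errors')}")
        if clock() >= deadline:
            raise TimeoutError(f"document {document_id} still {status!r}")
        sleep(interval)  # documented guidance: 5 to 10 seconds between polls
```

The sleep and clock parameters exist only so the loop can be exercised in tests without real waiting.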

POST /ingest · Auth required

Ingest a document without processing (creates record only)

Param	Type	Required	Description
file	file	required	PDF, JPG, or PNG file
source	string	optional

GET /ingest/whatsapp/prefs/{sender_phone} · Auth required

Get per-sender WhatsApp preferences

POST /ingest/whatsapp/prefs/{sender_phone} · Auth required

Update per-sender WhatsApp preferences

Param	Type	Required	Description
document_type	string	optional	Force document type for this sender
reply_format	string	optional	summary | detail

GET /ingest/whatsapp/type-menu/{sender_phone} · Auth required

Build a WhatsApp interactive-list payload for document-type selection

Param	Type	Required	Description
sender_phone	path	required
document_id	query	required	The document UUID returned by /ingest/process (hold_for_type_selection=true)
page	query	optional	1-based page index. Each page carries at most 10 rows (Auto Detect + built-ins + custom templates); a 'More…' row links to the next page.

Row IDs encode the selection: type__auto, type__builtin_, type__tpl_, or type__more_. Parse the tapped row's id, extract the selection, and call POST /ingest/whatsapp/select-type.
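One possible way to parse a tapped row id is sketched below. It relies on a loud assumption: that the text following each documented prefix carries the selected value (a built-in type id, a template UUID, or a page reference). The exact suffix format is not given in this reference, so treat parse_row_id and its return shape as hypothetical.

```python
def parse_row_id(row_id: str) -> dict:
    """Map a tapped row id to an action.

    Assumption: whatever follows the documented prefix is the value to
    forward. The suffix format itself is not specified in this reference.
    """
    if row_id == "type__auto":
        # Both fields empty lets the pipeline auto-classify.
        return {"action": "select", "document_type": "", "template_id": ""}
    for prefix, field in (("type__builtin_", "document_type"),
                          ("type__tpl_", "template_id")):
        if row_id.startswith(prefix):
            result = {"action": "select", "document_type": "", "template_id": ""}
            result[field] = row_id[len(prefix):]
            return result
    if row_id.startswith("type__more_"):
        return {"action": "more", "page": row_id[len("type__more_"):]}
    raise ValueError(f"unrecognised row id: {row_id!r}")
```

A "select" result supplies the document_type and template_id values for POST /ingest/whatsapp/select-type; a "more" result means the next menu page should be requested instead.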

POST /ingest/whatsapp/select-type · Auth required

Apply a user's type selection and kick off processing

Param	Type	Required	Description
document_id	string	required
document_type	string	optional	Built-in doc type. Leave empty and also leave template_id empty to let the pipeline auto-classify.
template_id	string	optional	Custom template UUID (mutually exclusive with document_type)

GET /ingest/expenses/summary · Auth required

Monthly expense totals for a WhatsApp sender

Param	Type	Required	Description
sender_phone	query	optional	Filter by sender phone. Omit for all senders.

Pipeline 3 endpoints

Lower-level stages exposed for advanced workflows. Most callers use Ingest.

POST /classify · Auth required

Classify a document's type (ap_invoice / sales_invoice / delivery_note / receipt / …)

Param	Type	Required	Description
document_id	string	required	UUID of an already-ingested document

Called automatically by /ingest/process. Use this endpoint only to re-classify or run standalone.

POST /parse · Auth required

Parse an ingested document to markdown (PyMuPDF for native PDFs, Mistral OCR for scans/images)

Param	Type	Required	Description
document_id	string	required
run_docling	boolean	optional	Run Docling layout hint in parallel (best-effort, non-blocking)
parser	string	optional	Force a parser: pymupdf | mistral_ocr | mistral_direct_

POST /extract · Auth required

Extract structured fields from parsed markdown into the typed schema

Param	Type	Required	Description
document_id	string	required
document_type	string	required	ap_invoice | delivery_note | sales_invoice (other built-ins routed via /ingest/process)
markdown	string	required	Markdown returned by /parse
vendor_hint	string	optional	Docling-derived layout hint, if any

Counterparties 7 endpoints

Vendor / customer registry — CRUD via JSON, bulk via CSV.

GET /api/v1/counterparties · Auth required

List counterparties (paginated). RLS-scoped to the API key's tenant.

Param	Type	Required	Description
page	query	optional
page_size	query	optional	Max 500
status	query	optional	Filter: confirmed | candidate
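Walking the paginated list could look like the sketch below. Several details are assumptions: that pages are 1-based, that an empty page marks the end, and that fetch_page (a hypothetical caller-supplied function) performs the GET and returns just the list of records. The response envelope is not described in this reference, so unwrapping it is left to the caller.

```python
def iter_counterparties(fetch_page, page_size=500, status=None):
    """Yield counterparties page by page until an empty page comes back.

    fetch_page(params) is hypothetical: it should call
    GET /api/v1/counterparties with the given query params and return
    the list of records for that page.
    """
    if not 1 <= page_size <= 500:  # documented maximum is 500
        raise ValueError("page_size must be between 1 and 500")
    page = 1  # assumption: 1-based page index
    while True:
        params = {"page": page, "page_size": page_size}
        if status:
            params["status"] = status
        records = fetch_page(params)
        if not records:  # assumption: an empty page means no more data
            return
        yield from records
        page += 1
```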

POST /api/v1/counterparties · Auth required

Create a counterparty. 409 if canonical_name already exists (case-insensitive).

Param	Type	Required	Description
canonical_name	string	required
counterparty_type	string	optional	vendor | customer | both (default: vendor)
status	string	optional	candidate | confirmed (default: confirmed)
active	boolean	optional
vat_number	string	optional
address	string	optional
addresses	array	optional
bank_name	string	optional
bank_account	string	optional
bank_accounts	array	optional
keywords	array	optional
aliases	array	optional
typical_vat_rate	float	optional	Decimal, e.g. 0.15 for 15%
typical_payment_terms	int	optional	Days
currency	string	optional	ISO 4217

GET /api/v1/counterparties/{counterparty_id} · Auth required

Fetch a single counterparty by UUID

PATCH /api/v1/counterparties/{counterparty_id} · Auth required

Partial update — only supplied fields are written

Param	Type	Required	Description
any creation field	see POST	optional	Pass only the fields you want to change

DELETE /api/v1/counterparties/{counterparty_id} · Auth required

Delete a counterparty

GET /api/v1/counterparties/export.csv · Auth required

Download all counterparties for the API key's tenant as CSV

List columns (addresses, bank_accounts, keywords, aliases) are JSON-encoded in the CSV so they round-trip cleanly through Excel.

POST /api/v1/counterparties/import.csv · Auth required

Bulk upsert from CSV (multipart upload)

Param	Type	Required	Description
file	file	required	CSV with a header row

Required column: canonical_name. Upsert is case-insensitive on canonical_name. Unknown columns are ignored. List columns accept either JSON arrays or comma-separated strings.
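To illustrate the import format, the sketch below serialises records to CSV text with the list columns JSON-encoded, which is one of the two accepted encodings. The function name counterparties_to_csv is hypothetical, and the sample values in the usage note are made up.

```python
import csv
import io
import json

LIST_COLUMNS = ("addresses", "bank_accounts", "keywords", "aliases")

def counterparties_to_csv(records):
    """Serialise records to CSV text, JSON-encoding the list columns."""
    columns = ["canonical_name"]  # the only required column
    for rec in records:
        for key in rec:
            if key not in columns:
                columns.append(key)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns)
    writer.writeheader()
    for rec in records:
        if not rec.get("canonical_name"):
            raise ValueError("canonical_name is required on every row")
        writer.writerow({k: json.dumps(v) if k in LIST_COLUMNS else v
                         for k, v in rec.items()})
    return out.getvalue()
```

For example, counterparties_to_csv([{"canonical_name": "Acme Ltd", "aliases": ["ACME"]}]) produces a header row followed by one data row whose aliases cell holds a JSON array.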

Entities 7 endpoints

Your own legal entities used for AP/sales direction validation.

GET /api/v1/entities · Auth required

List known company entities (your own legal names, used for AP/sales direction validation)

Param	Type	Required	Description
page	query	optional
page_size	query	optional	Max 500
status	query	optional	Filter: confirmed | candidate

POST /api/v1/entities · Auth required

Create an entity. 409 if canonical_name already exists.

Param	Type	Required	Description
canonical_name	string	required
status	string	optional	candidate | confirmed (default: confirmed)
active	boolean	optional
vat_number	string	optional
keywords	array	optional
notes	string	optional

GET /api/v1/entities/{entity_id} · Auth required

Fetch a single entity by UUID

PATCH /api/v1/entities/{entity_id} · Auth required

Partial update — only supplied fields are written

DELETE /api/v1/entities/{entity_id} · Auth required

Delete an entity

GET /api/v1/entities/export.csv · Auth required

Download all entities for the API key's tenant as CSV

POST /api/v1/entities/import.csv · Auth required

Bulk upsert entities from CSV (multipart upload)

Param	Type	Required	Description
file	file	required	CSV with a header row

Required column: canonical_name. Same upsert semantics as counterparties.

Exports 1 endpoint

Pre-shaped CSV downloads for ERP imports (Xero today; Sage and QuickBooks are on the roadmap).

GET /api/v1/exports/xero/bills.csv · Auth required

Approved AP invoices in Xero's Bills CSV import format

Param	Type	Required	Description
date_from	string	optional	ISO date (YYYY-MM-DD); default = today − 90 days
date_to	string	optional	ISO date; default = today
status	string	optional	Lifecycle filter; default = 'approved'

One row per line item: Xero groups lines back into a single bill by InvoiceNumber + ContactName. AccountCode and TaxType are intentionally blank: map them in Xero's import wizard once and Xero remembers the mapping. Credit notes are excluded (Xero imports them via a separate CreditNotes CSV). External-system identifiers stored on the Counterparty record (counterparties.external_ids JSONB) round-trip via the /api/v1/counterparties endpoints.
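If you need to reassemble bills on your side before handing the file to Xero, the same grouping rule can be applied locally. The sketch below groups rows by the two columns named above; group_bills is a hypothetical helper, and it makes no assumption about the other columns in the export.

```python
import csv
import io
from collections import defaultdict

def group_bills(csv_text):
    """Group export rows into bills keyed by (InvoiceNumber, ContactName)."""
    bills = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        bills[(row["InvoiceNumber"], row["ContactName"])].append(row)
    return dict(bills)
```

Each key in the result identifies one bill, and the list under it holds that bill's line-item rows in file order.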

Other 4 endpoints

Misc / cross-cutting endpoints.

GET /health · Public

Service health check

GET /templates/common-fields · Public

Get curated + tenant-learned common field definitions for the template builder

GET /api/v1/reports/{report_id} · Auth required

Paginated JSON export of a saved custom report (API key auth)

Param	Type	Required	Description
report_id	path	required	Saved report UUID
page	query	optional
page_size	query	optional

GET /api/v1/reports/{report_id}/csv · Auth required

CSV export of a saved custom report (API key auth)

Document types

The classifier picks one of these at the classified step; the extractor then runs the matching schema. Custom templates (created at /ui/templates) override the built-in schemas when you want a different field set.

ID	Label	Description
ap_invoice	AP Invoice	Inbound invoice from a supplier
ap_credit_note	AP Credit Note	Inbound credit note
sales_invoice	Sales Invoice	Outbound invoice to a customer
sales_credit_note	Sales Credit Note	Outbound credit note
delivery_note	Delivery Note	Goods delivery note
receipt	Receipt	Payment receipt or till slip
generic_invoice	Generic Invoice	Invoice where direction cannot be determined

Confidence

Every extracted document carries two confidence signals on the response. The headline number (`confidence_score`) is the model's overall self-reported confidence in the whole extraction. The granular signal (`field_confidences`) is a dict of field-name → 0.0-1.0 score covering every field the model populated. Use the granular shape — averaging it out into a single document number washes away exactly the information that drives routing decisions.

Response shape

The response carries both signals plus the derived low_confidence_fields list. Field names match the keys in data.

{
  "confidence_score": 0.94,
  "field_confidences": {
    "invoice_number": 0.99,
    "vendor_name": 0.98,
    "subtotal": 0.97,
    "vat_amount": 0.96,
    "total_amount": 0.99,
    "due_date": 0.65,
    "vendor_address": 0.78
  },
  "low_confidence_fields": [
    "due_date",
    "vendor_address"
  ]
}

Interpreting the numbers

Range	Meaning
>= 0.95	Verbatim-clean text — the model copied the value off the document with no ambiguity. Safe to auto-route in most tenants' workflows.
0.70 to < 0.95	Required interpretation: the value was inferred or parsed (e.g. date-format disambiguation, OCR'd from a scan). Send to review for high-stakes documents; auto-route on volume tiers.
< 0.70	Smudged / ambiguous / candidate-pick: the model is explicitly uncertain. Always review before approving.

The numbers come from the model's self-report. They're a useful ordinal signal ('which field should the reviewer look at first') but not a calibrated probability, so avoid reading meaning into small differences between scores. A follow-up release will layer validator signals (subtotal + VAT = total, ISO dates, VAT-number shapes) on top.
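A sketch of how the bands might be applied in client code, using half-open boundaries at 0.95 and 0.70. The band labels ("verbatim", "interpreted", "uncertain") are hypothetical names chosen for this example; the API itself returns only the numeric scores.

```python
def confidence_band(score: float) -> str:
    """Classify a field score into one of the three documented bands."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score >= 0.95:
        return "verbatim"      # safe to auto-route in most workflows
    if score >= 0.70:
        return "interpreted"   # review for high-stakes documents
    return "uncertain"         # always review before approving
```

Because the scores are ordinal rather than calibrated, the bands work best as a routing hint and not as a probability estimate.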

Display rules

We cap the displayed percentage at 99%. A model returning 1.0 still renders as 99% — true 100% is rare in any probabilistic system, and a 100% display trains users to over-trust.
The Lowest Field column on /ui (the document queue) shows min(field_confidences.values()). Older docs without field_confidences fall back to confidence_score.
low_confidence_fields is auto-derived from field_confidences (any field < 0.85 lands in the list). Kept for back-compat with the legacy binary signal.
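The three display rules above can be reproduced client-side roughly as follows. The function names are hypothetical, and rounding to the nearest whole percent is an assumption; only the 99% cap, the min() fallback, and the 0.85 cutoff come from this reference.

```python
def display_percent(score: float) -> str:
    """Render a score as a percentage, capped at 99%."""
    # Assumption: round to the nearest whole percent before capping.
    return f"{min(99, round(score * 100))}%"

def lowest_field(doc: dict) -> float:
    """Lowest field score, falling back to confidence_score for older docs."""
    fields = doc.get("field_confidences") or {}
    return min(fields.values()) if fields else doc["confidence_score"]

def low_confidence_fields(doc: dict, threshold: float = 0.85) -> list:
    """Names of fields scoring below the threshold, in response order."""
    fields = doc.get("field_confidences") or {}
    return [name for name, score in fields.items() if score < threshold]
```

Applied to the sample response shown earlier, lowest_field returns 0.65 and low_confidence_fields returns due_date and vendor_address, matching the list in that sample.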

Auto-approve gate. Each tenant sets a confidence threshold per document type at /ui/settings/review. The gate currently compares against confidence_score; a future release will optionally gate on min(field_confidences) so that a single bad field on an otherwise good document still routes to review.

Webhooks

Outbound HMAC-signed event push. Configure a subscription URL at /ui/settings/webhooks; the secret is shown to you exactly once at create time. Every event is delivered with at-least-once semantics — use the X-DocReadi-Event-Id header to dedupe.
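Because delivery is at-least-once, a receiver should expect the same event more than once. The sketch below shows one way to dedupe on X-DocReadi-Event-Id. It is an in-memory illustration only: EventDeduper and handle_delivery are hypothetical names, and a production receiver would more likely keep seen ids in a shared store so that dedup survives restarts.

```python
from collections import OrderedDict

class EventDeduper:
    """Remember recently processed event ids (bounded, in-memory)."""

    def __init__(self, max_ids: int = 10_000):
        self._seen = OrderedDict()
        self._max = max_ids

    def seen(self, event_id: str) -> bool:
        return event_id in self._seen

    def remember(self, event_id: str) -> None:
        self._seen[event_id] = True
        if len(self._seen) > self._max:
            self._seen.popitem(last=False)  # evict the oldest id

def handle_delivery(headers: dict, deduper: EventDeduper, process) -> str:
    """Run process(event_type) once per event id; skip redeliveries."""
    # Assumes header names arrive with this exact casing.
    event_id = headers["X-DocReadi-Event-Id"]
    if deduper.seen(event_id):
        return "duplicate"
    process(headers.get("X-DocReadi-Event-Type"))
    deduper.remember(event_id)  # only after processing succeeded
    return "processed"
```

Recording the id after processing means a handler that raises will be retried rather than silently skipped on the next delivery.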

Event types

Event type	Fires when
document.approved	Reviewer approves (or auto-approve gate fires).
document.rejected	Reviewer rejects with a reason.
document.failed	Pipeline failure terminal state (extraction / parse error).
counterparty.candidate_created	First sighting of a new vendor / customer.
counterparty.confirmed	Counterparty status flips candidate → confirmed.
validation_error.raised	Cross-field validator flagged an issue (bank account, address, math).

Headers on every delivery

Header	Value
Content-Type	application/json
User-Agent	DocReadi-Webhook/1.0
X-DocReadi-Signature	t=<unix timestamp>,v1=<hex HMAC-SHA256>
X-DocReadi-Event-Id
X-DocReadi-Event-Type
X-DocReadi-Delivery-Id

Verify the signature

Stripe-compatible. The signed payload is exactly f"{t}.{raw_request_body}". Recompute HMAC-SHA256 with your webhook secret, hex-encode it, and constant-time compare against the v1 value. Reject signatures whose timestamp is more than 5 minutes from now (replay guard).

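import hashlib
import hmac
import time

def verify(secret: str, body: bytes, header: str) -> bool:
    """Check an X-DocReadi-Signature header against the raw request body."""
    try:
        parts = dict(p.strip().split('=', 1) for p in header.split(','))
        t, v1 = parts['t'], parts['v1']
        age = abs(int(time.time()) - int(t))
    except (KeyError, ValueError):
        return False  # malformed header or non-numeric timestamp
    if age > 300:  # replay guard: reject timestamps more than 5 minutes off
        return False
    expected = hmac.new(secret.encode(), f'{t}.'.encode() + body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), v1.encode())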
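For local testing of a receiver, it can help to generate a signature header yourself. The sketch below mirrors the signing scheme described above; sign is a hypothetical helper, and whsec_test in the usage note is a made-up secret.

```python
import hashlib
import hmac
import time

def sign(secret: str, body: bytes, timestamp=None) -> str:
    """Build an X-DocReadi-Signature header value for a test delivery."""
    t = int(time.time()) if timestamp is None else int(timestamp)
    # Signed payload is the timestamp, a dot, then the raw body bytes.
    digest = hmac.new(secret.encode(), f"{t}.".encode() + body,
                      hashlib.sha256).hexdigest()
    return f"t={t},v1={digest}"
```

Passing the result to the verify function above, as in verify("whsec_test", body, sign("whsec_test", body)), should succeed, while changing a single byte of body or using a different secret should make it fail.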

Retry policy. Exponential backoff: 30s, 2m, 10m, 1h, 6h. After 5 failed attempts the delivery dead-letters and stops retrying. Any 2xx response counts as success; 4xx and 5xx both schedule a retry.
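To size a dedupe window or an outage budget, it can be useful to see when retries would land. The sketch below assumes each listed delay is measured from the previous failed attempt and that the five delays correspond to five retries after the initial delivery; neither detail is spelled out above, so treat the output as an estimate.

```python
from datetime import datetime, timedelta, timezone

RETRY_DELAYS = [
    timedelta(seconds=30),
    timedelta(minutes=2),
    timedelta(minutes=10),
    timedelta(hours=1),
    timedelta(hours=6),
]

def retry_times(first_failure: datetime) -> list:
    """Return the estimated time of each retry after the first failure.

    Assumption: every delay counts from the previous failed attempt.
    """
    times, current = [], first_failure
    for delay in RETRY_DELAYS:
        current += delay
        times.append(current)
    return times
```

Under these assumptions the final retry falls a little over seven hours after the first failure, so event ids kept for deduplication should be retained at least that long.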


Example workflow — receipt OCR

Submit a receipt and get the extracted total:

POST /ingest/process with file + source=whatsapp → get document_id
GET /ingest/document/<document_id> — poll until status == "extracted"
Read total_amount from the response

Or skip the polling: register a webhook at /ui/settings/webhooks and we'll POST document.approved to your endpoint as soon as the document is approved.

Machine-readable version: /api/guide