DocReadi Document Intelligence

API Reference · v1.0

DocReadi Document Intelligence API

REST API for ingesting financial documents (PDFs, images, scans) and extracting structured data — invoices, receipts, delivery notes, statements, custom shapes. Push results to your stack via signed webhooks or pull on a schedule.

Base URL: https://docreadi.com · OpenAPI 3.1 · JSON over HTTPS · UTF-8.

Quickstart

Three steps from a fresh API key to extracted JSON. Get a key at /ui/settings/api-keys (sign up for the free trial first if you don't have a workspace).

1. Submit a document

# Replace dr_abc… with your real key.
curl -X POST https://docreadi.com/ingest/process \
  -H "X-Api-Key: dr_abc123…" \
  -F "file=@invoice.pdf" \
  -F "source=api"

Returns {"document_id": "…", "status": "ingested"}.
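
The same submission can be sketched in Python with only the standard library. This is an illustrative sketch, not an official client: build_multipart and make_submit_request are hypothetical helper names, and the file's Content-Type is guessed from its extension. Only the URL, the X-Api-Key header, and the file and source form fields come from the curl command above.

```python
import mimetypes
import urllib.request
import uuid

BASE_URL = "https://docreadi.com"  # base URL from this reference


def build_multipart(fields, file_field, file_name, file_bytes):
    """Encode text fields plus one file as multipart/form-data.

    Returns (content_type, body). Hypothetical helper, not part of any SDK.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    # Guess the MIME type from the extension, as curl -F does.
    mime = mimetypes.guess_type(file_name)[0] or "application/octet-stream"
    parts.append(
        (f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
         f'filename="{file_name}"\r\nContent-Type: {mime}\r\n\r\n').encode()
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return f"multipart/form-data; boundary={boundary}", b"".join(parts)


def make_submit_request(api_key, file_name, file_bytes):
    """Build (but do not send) the POST /ingest/process request."""
    content_type, body = build_multipart(
        {"source": "api"}, "file", file_name, file_bytes
    )
    return urllib.request.Request(
        f"{BASE_URL}/ingest/process",
        data=body,
        headers={"X-Api-Key": api_key, "Content-Type": content_type},
        method="POST",
    )
```

To send it, pass the result to urllib.request.urlopen and decode the JSON body to read document_id.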

2. Poll until extraction completes

# DOC_ID holds the document_id returned by step 1.
curl https://docreadi.com/ingest/document/"$DOC_ID" \
  -H "X-Api-Key: dr_abc123…"

Status walks ingested → classified → parsed → extracted. Webhooks fire on document.approved if you'd rather not poll.

3. Read the structured fields

# status == "extracted" → response.data carries the typed fields
# for the document_type. AP invoices: invoice_number, vendor_name,
# subtotal, vat_amount, total_amount, line_items, …

Authentication

Every protected endpoint requires an API key. Send it in the X-Api-Key request header (preferred) or as Authorization: Bearer <key>. Each key is scoped to the tenant that created it; queries only return that tenant's data.

Manage keys at /ui/settings/api-keys. Keys are shown to you exactly once at create time — store them in your secrets manager, not in source control.
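
A minimal sketch of keeping the key out of source code and building either header form. The environment variable name DOCREADI_API_KEY and both helper names are assumptions made for illustration; only the two header formats come from the text above.

```python
import os
from typing import Mapping, Optional


def load_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Read the key from the environment (the variable name is hypothetical)."""
    env = os.environ if env is None else env
    key = env.get("DOCREADI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("DOCREADI_API_KEY is not set")
    return key


def auth_headers(api_key: str, bearer: bool = False) -> dict:
    """Return the preferred X-Api-Key header, or the Bearer alternative."""
    if bearer:
        return {"Authorization": f"Bearer {api_key}"}
    return {"X-Api-Key": api_key}
```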

Endpoints

Public endpoints are marked. Everything else requires an API key. 30 endpoints across 6 groups.

Ingest 8 endpoints

Submit documents and check their processing status.

POST /ingest/process · Auth required

Ingest a document and start the processing pipeline

Param                    Type     Required  Description
file                     file     required  PDF, JPG, or PNG file
source                   string   optional  Source identifier, e.g. 'whatsapp', 'email', 'manual'
document_type            string   optional  Force document type: ap_invoice | ap_credit_note | sales_invoice | sales_credit_note | delivery_note | receipt | generic_invoice. Empty string = auto-detect.
sender_phone             string   optional  Sender phone number (WhatsApp integrations)
template_id              string   optional  Custom template UUID (mutually exclusive with document_type)
hold_for_type_selection  boolean  optional  Store the file with status 'awaiting_type' and skip processing. Used by the WhatsApp type-menu flow; finalise the choice via POST /ingest/whatsapp/select-type.

Processing runs asynchronously. Poll GET /ingest/document/{id} to check status.

GET /ingest/document/{document_id} · Auth required

Poll document processing status and get extracted fields

Param        Type  Required  Description
document_id  path  required  UUID returned by POST /ingest/process

Poll every 5–10 seconds. Status 'extracted' or 'approved' = complete. Status 'failed' = pipeline error, check validation_errors. The 'display' sub-object holds tenant-formatted strings — prefer those in user-facing replies; raw fields are for programmatic access.
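
The polling guidance above can be sketched as a small loop. This is a sketch under stated assumptions: fetch is a caller-supplied function that performs GET /ingest/document/{id} and returns the decoded JSON, and the 300-second timeout is an arbitrary illustrative default. The terminal statuses and the validation_errors field are the ones named above.

```python
import time
from typing import Callable

DONE = {"extracted", "approved"}  # complete, per the note above
FAILED = "failed"                 # pipeline error


def poll_document(
    fetch: Callable[[], dict],
    interval: float = 5.0,    # the reference suggests 5-10 seconds
    timeout: float = 300.0,   # illustrative default, not from the API
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() until the document reaches a terminal status."""
    deadline = time.monotonic() + timeout
    while True:
        doc = fetch()
        status = doc.get("status")
        if status in DONE:
            return doc
        if status == FAILED:
            raise RuntimeError(f"pipeline failed: {doc.get('validation_errors')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout} seconds")
        sleep(interval)
```

Injecting fetch and sleep keeps the loop independent of any HTTP library and makes it easy to test without network access.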

POST /ingest · Auth required

Ingest a document without processing (creates record only)

Param   Type    Required  Description
file    file    required  PDF, JPG, or PNG file
source  string  optional

GET /ingest/whatsapp/prefs/{sender_phone} · Auth required

Get per-sender WhatsApp preferences

POST /ingest/whatsapp/prefs/{sender_phone} · Auth required

Update per-sender WhatsApp preferences

Param          Type    Required  Description
document_type  string  optional  Force document type for this sender
reply_format   string  optional  summary | detail

GET /ingest/whatsapp/type-menu/{sender_phone} · Auth required

Build a WhatsApp interactive-list payload for document-type selection

Param         Type   Required  Description
sender_phone  path   required
document_id   query  required  The document UUID returned by /ingest/process (hold_for_type_selection=true)
page          query  optional  1-based page index. Each page carries at most 10 rows (Auto Detect + built-ins + custom templates); a 'More…' row links to the next page.

Row IDs encode the selection: type__auto, type__builtin_, type__tpl_, or type__more_. Parse the tapped row's id, extract the selection, and call POST /ingest/whatsapp/select-type.
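
A hedged sketch of the parse-then-select step. The four prefixes come from the list above; what follows each prefix is not spelled out in this reference, so the sketch assumes the remainder of the ID is the built-in type, the template UUID, or the next page number respectively. Both function names are hypothetical.

```python
from typing import Optional, Tuple

# Prefixes from the row-ID list above. What follows each prefix is an assumption.
_PREFIXES = (
    ("type__builtin_", "builtin"),  # assumed suffix: a built-in document type ID
    ("type__tpl_", "template"),     # assumed suffix: a custom template UUID
    ("type__more_", "more"),        # assumed suffix: the next page number
)


def parse_row_id(row_id: str) -> Tuple[str, Optional[str]]:
    """Split a tapped row ID into (kind, value)."""
    if row_id == "type__auto":
        return ("auto", None)
    for prefix, kind in _PREFIXES:
        if row_id.startswith(prefix) and len(row_id) > len(prefix):
            return (kind, row_id[len(prefix):])
    raise ValueError(f"unrecognised row id: {row_id!r}")


def select_type_payload(document_id: str, row_id: str) -> Optional[dict]:
    """Build the body for POST /ingest/whatsapp/select-type.

    Returns None for a 'More…' row, where the caller should fetch the
    next menu page instead of selecting a type.
    """
    kind, value = parse_row_id(row_id)
    if kind == "more":
        return None
    payload = {"document_id": document_id}
    if kind == "builtin":
        payload["document_type"] = value
    elif kind == "template":
        payload["template_id"] = value
    # kind == "auto": leave both empty so the pipeline auto-classifies
    return payload
```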

POST /ingest/whatsapp/select-type · Auth required

Apply a user's type selection and kick off processing

Param          Type    Required  Description
document_id    string  required
document_type  string  optional  Built-in doc type. Leave empty and also leave template_id empty to let the pipeline auto-classify.
template_id    string  optional  Custom template UUID (mutually exclusive with document_type)

GET /ingest/expenses/summary · Auth required

Monthly expense totals for a WhatsApp sender

Param         Type   Required  Description
sender_phone  query  optional  Filter by sender phone. Omit for all senders.

Pipeline 3 endpoints

Lower-level stages exposed for advanced workflows. Most callers use Ingest.

POST /classify · Auth required

Classify a document's type (ap_invoice / sales_invoice / delivery_note / receipt / …)

Param        Type    Required  Description
document_id  string  required  UUID of an already-ingested document

Called automatically by /ingest/process. Use this endpoint only to re-classify or run standalone.

POST /parse · Auth required

Parse an ingested document to markdown (PyMuPDF for native PDFs, Mistral OCR for scans/images)

Param        Type     Required  Description
document_id  string   required
run_docling  boolean  optional  Run Docling layout hint in parallel (best-effort, non-blocking)
parser       string   optional  Force a parser: pymupdf | mistral_ocr | mistral_direct_

POST /extract · Auth required

Extract structured fields from parsed markdown into the typed schema

Param          Type    Required  Description
document_id    string  required
document_type  string  required  ap_invoice | delivery_note | sales_invoice (other built-ins routed via /ingest/process)
markdown       string  required  Markdown returned by /parse
vendor_hint    string  optional  Docling-derived layout hint, if any

Counterparties 7 endpoints

Vendor / customer registry — CRUD via JSON, bulk via CSV.

GET /api/v1/counterparties · Auth required

List counterparties (paginated). RLS-scoped to the API key's tenant.

Param      Type   Required  Description
page       query  optional
page_size  query  optional  Max 500
status     query  optional  Filter: confirmed | candidate

POST /api/v1/counterparties · Auth required

Create a counterparty. 409 if canonical_name already exists (case-insensitive).

Param                  Type     Required  Description
canonical_name         string   required
counterparty_type      string   optional  vendor | customer | both (default: vendor)
status                 string   optional  candidate | confirmed (default: confirmed)
active                 boolean  optional
vat_number             string   optional
address                string   optional
addresses              array    optional
bank_name              string   optional
bank_account           string   optional
bank_accounts          array    optional
keywords               array    optional
aliases                array    optional
typical_vat_rate       float    optional  Decimal, e.g. 0.15 for 15%
typical_payment_terms  int      optional  Days
currency               string   optional  ISO 4217

GET /api/v1/counterparties/{counterparty_id} · Auth required

Fetch a single counterparty by UUID

PATCH /api/v1/counterparties/{counterparty_id} · Auth required

Partial update: only supplied fields are written

Param               Type      Required  Description
any creation field  see POST  optional  Pass only the fields you want to change

DELETE /api/v1/counterparties/{counterparty_id} · Auth required

Delete a counterparty

GET /api/v1/counterparties/export.csv · Auth required

Download all counterparties for the API key's tenant as CSV

List columns (addresses, bank_accounts, keywords, aliases) are JSON-encoded in the CSV so they round-trip cleanly through Excel.

POST /api/v1/counterparties/import.csv · Auth required

Bulk upsert from CSV (multipart upload)

Param  Type  Required  Description
file   file  required  CSV with a header row

Required column: canonical_name. Upsert is case-insensitive on canonical_name. Unknown columns are ignored. List columns accept either JSON arrays or comma-separated strings.
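
A minimal sketch of preparing an import file with list columns JSON-encoded, which is one of the two accepted forms. It assumes the CSV column names match the JSON field names of POST /api/v1/counterparties; the helper name and the sample rows are illustrative.

```python
import csv
import io
import json

# List-valued columns named in the export note above.
LIST_COLUMNS = ("addresses", "bank_accounts", "keywords", "aliases")

# Assumed to match the JSON field names; canonical_name is the only required column.
FIELDNAMES = ["canonical_name", "counterparty_type", "vat_number", "keywords", "aliases"]


def build_import_csv(rows):
    """Serialise counterparties to CSV text, JSON-encoding list columns."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        out = dict(row)
        for col in LIST_COLUMNS:
            if col in out:
                # JSON keeps commas inside values unambiguous.
                out[col] = json.dumps(out[col])
        writer.writerow(out)  # missing optional columns are written empty
    return buf.getvalue()
```

JSON encoding is the safer of the two forms when a value itself contains a comma, for example an alias such as "ACME, Inc.".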

Entities 7 endpoints

Your own legal entities used for AP/sales direction validation.

GET /api/v1/entities · Auth required

List known company entities (your own legal names, used for AP/sales direction validation)

Param      Type   Required  Description
page       query  optional
page_size  query  optional  Max 500
status     query  optional  Filter: confirmed | candidate

POST /api/v1/entities · Auth required

Create an entity. 409 if canonical_name already exists.

Param           Type     Required  Description
canonical_name  string   required
status          string   optional  candidate | confirmed (default: confirmed)
active          boolean  optional
vat_number      string   optional
keywords        array    optional
notes           string   optional

GET /api/v1/entities/{entity_id} · Auth required

Fetch a single entity by UUID

PATCH /api/v1/entities/{entity_id} · Auth required

Partial update: only supplied fields are written

DELETE /api/v1/entities/{entity_id} · Auth required

Delete an entity

GET /api/v1/entities/export.csv · Auth required

Download all entities for the API key's tenant as CSV

POST /api/v1/entities/import.csv · Auth required

Bulk upsert entities from CSV (multipart upload)

Param  Type  Required  Description
file   file  required  CSV with a header row

Required column: canonical_name. Same upsert semantics as counterparties.

Exports 1 endpoint

Pre-shaped CSV downloads for ERP imports (Xero today; Sage and QuickBooks are on the roadmap).

GET /api/v1/exports/xero/bills.csv · Auth required

Approved AP invoices in Xero's Bills CSV import format

Param      Type    Required  Description
date_from  string  optional  ISO date (YYYY-MM-DD); default = today − 90 days
date_to    string  optional  ISO date; default = today
status     string  optional  Lifecycle filter; default = 'approved'

One row per *line item* — Xero groups lines back into a single bill by InvoiceNumber + ContactName. AccountCode and TaxType are intentionally blank: map them in Xero's import wizard once and Xero remembers the mapping. Credit notes are excluded (Xero imports them via a separate CreditNotes CSV). External-system identifiers stored on the Counterparty record (counterparties.external_ids JSONB) round-trip via the /api/v1/counterparties endpoints.
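
To illustrate the one-row-per-line-item shape, here is a sketch that regroups rows into bills by the same key Xero is described as using. InvoiceNumber and ContactName come from the text above; the Description column and the sample data are assumptions made for the example, and the real file's header spelling may differ.

```python
import csv
import io
from collections import defaultdict

# Illustrative data only; the real export carries more columns.
SAMPLE = """ContactName,InvoiceNumber,Description
Acme Ltd,INV-001,Widgets
Acme Ltd,INV-001,Delivery
Beta Co,INV-001,Consulting
"""


def group_bills(csv_text):
    """Group line-item rows into bills keyed by (InvoiceNumber, ContactName)."""
    bills = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        bills[(row["InvoiceNumber"], row["ContactName"])].append(row)
    return dict(bills)
```

In the sample, the two Acme rows collapse into one bill, while Beta Co's INV-001 stays separate because the contact differs.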

Other 4 endpoints

Misc / cross-cutting endpoints.

GET /health · Public

Service health check

GET /templates/common-fields · Public

Get curated + tenant-learned common field definitions for the template builder

GET /api/v1/reports/{report_id} · Auth required

Paginated JSON export of a saved custom report (API key auth)

Param      Type   Required  Description
report_id  path   required  Saved report UUID
page       query  optional
page_size  query  optional

GET /api/v1/reports/{report_id}/csv · Auth required

CSV export of a saved custom report (API key auth)

Document types

The classifier picks one of these at the classified step; the extractor then runs the matching schema. Custom templates (created at /ui/templates) override the built-in schemas when you want a different field set.

ID                 Label              Description
ap_invoice         AP Invoice         Inbound invoice from a supplier
ap_credit_note     AP Credit Note     Inbound credit note
sales_invoice      Sales Invoice      Outbound invoice to a customer
sales_credit_note  Sales Credit Note  Outbound credit note
delivery_note      Delivery Note      Goods delivery note
receipt            Receipt            Payment receipt or till slip
generic_invoice    Generic Invoice    Invoice where direction cannot be determined

Confidence

Every extracted document carries two confidence signals on the response. The headline number (`confidence_score`) is the model's overall self-reported confidence in the whole extraction. The granular signal (`field_confidences`) is a dict of field-name → 0.0-1.0 score covering every field the model populated. Use the granular shape — averaging it out into a single document number washes away exactly the information that drives routing decisions.

Response shape

Both signals sit alongside each other on the response. Field names in field_confidences match the keys in data.

{
  "confidence_score": 0.94,
  "field_confidences": {
    "invoice_number": 0.99,
    "vendor_name": 0.98,
    "subtotal": 0.97,
    "vat_amount": 0.96,
    "total_amount": 0.99,
    "due_date": 0.65,
    "vendor_address": 0.78
  },
  "low_confidence_fields": [
    "due_date",
    "vendor_address"
  ]
}

Interpreting the numbers

Range        Meaning
>= 0.95      Verbatim-clean text: the model copied the value off the document with no ambiguity. Safe to auto-route in most tenants' workflows.
0.70 - 0.94  Required interpretation: the value was inferred or parsed (e.g. date-format disambiguation, OCR'd from a scan). Send to review for high-stakes documents; auto-route on volume tiers.
<= 0.69      Smudged / ambiguous / candidate-pick: the model is explicitly uncertain. Always review before approving.

The numbers come from the model's self-report. They are a useful ordinal signal ('which field should the reviewer look at first') but not a calibrated probability, so avoid reading fine-grained differences between scores as meaningful. Validators (subtotal + VAT = total, ISO dates, VAT-number shapes) will layer additional signal on top in a follow-up release.
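
The bands above can be applied per field to decide what a reviewer sees first. This is a sketch of one way to consume the scores: the band labels and function names are made up for the example, and the thresholds are the ones in the table.

```python
def band(score: float) -> str:
    """Map a field confidence to the bands in the table above."""
    if score >= 0.95:
        return "verbatim"
    if score >= 0.70:
        return "interpreted"
    return "uncertain"


def review_order(field_confidences: dict) -> list:
    """Field names sorted least-confident first (ordinal use of the scores)."""
    return sorted(field_confidences, key=field_confidences.get)


# Values from the response-shape example above.
EXAMPLE = {
    "invoice_number": 0.99,
    "vendor_name": 0.98,
    "subtotal": 0.97,
    "vat_amount": 0.96,
    "total_amount": 0.99,
    "due_date": 0.65,
    "vendor_address": 0.78,
}
```

With the example values, due_date falls in the lowest band and vendor_address in the middle one, so those two head the review order.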

Display rules

  • We cap the displayed percentage at 99%. A model returning 1.0 still renders as 99% — true 100% is rare in any probabilistic system, and a 100% display trains users to over-trust.
  • The Lowest Field column on /ui (the document queue) shows min(field_confidences.values()). Older docs without field_confidences fall back to confidence_score.
  • low_confidence_fields is auto-derived from field_confidences (any field < 0.85 lands in the list). Kept for back-compat with the legacy binary signal.
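
The three display rules can be sketched as small functions. Rounding to the nearest whole percent is an assumption (the rules only state the 99% cap), and the function names are illustrative.

```python
def display_percent(score: float) -> int:
    """Percentage shown to users, capped at 99 (rounding is an assumption)."""
    return min(99, round(score * 100))


def lowest_field(doc: dict) -> float:
    """Value behind the Lowest Field column, with the legacy fallback."""
    field_confidences = doc.get("field_confidences")
    if field_confidences:
        return min(field_confidences.values())
    return doc["confidence_score"]  # older docs without field_confidences


def low_confidence_fields(field_confidences: dict, threshold: float = 0.85) -> list:
    """Fields strictly below the threshold, in their original order."""
    return [name for name, score in field_confidences.items() if score < threshold]
```

Applied to the response-shape example, low_confidence_fields yields due_date and vendor_address, matching the list shown there.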

Auto-approve gate. Set a per-tenant, per-doc-type confidence threshold at /ui/settings/review. Today the gate compares the threshold against confidence_score; a future release will optionally gate on min(field_confidences) so that a single bad field on an otherwise-good doc still routes to review.
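
The difference between the two gating modes is easiest to see side by side. This is an illustrative sketch only: the comparison operator (>=) and the gate_on_min_field flag are assumptions, and the min-field mode describes a future option rather than current behaviour.

```python
def auto_approves(doc: dict, threshold: float, gate_on_min_field: bool = False) -> bool:
    """Return True if the document would clear the auto-approve gate.

    gate_on_min_field=False mirrors today's behaviour (confidence_score).
    gate_on_min_field=True sketches the planned min(field_confidences) gate.
    """
    score = doc["confidence_score"]
    field_confidences = doc.get("field_confidences")
    if gate_on_min_field and field_confidences:
        score = min(field_confidences.values())
    return score >= threshold  # >= is an assumption; the reference does not say
```

For a document with confidence_score 0.94 and a due_date field at 0.65, a 0.90 threshold approves it under the current gate but routes it to review under the min-field gate.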

Webhooks

Outbound HMAC-signed event push. Configure a subscription URL at /ui/settings/webhooks; the secret is shown to you exactly once at create time. Every event is delivered with at-least-once semantics — use the X-DocReadi-Event-Id header to dedupe.
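
Because delivery is at-least-once, a receiver should treat a repeated X-DocReadi-Event-Id as already handled. A minimal sketch, assuming an in-memory set (a real deployment would want a durable store) and a caller-supplied process function; the handler name and the returned status codes are illustrative choices.

```python
from typing import Callable, Mapping, Set


def handle_delivery(
    headers: Mapping[str, str],
    body: bytes,
    seen: Set[str],
    process: Callable[[bytes], None],
) -> int:
    """Process an event once per X-DocReadi-Event-Id; return an HTTP status."""
    # Header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    event_id = lowered.get("x-docreadi-event-id")
    if not event_id:
        return 400  # not a well-formed delivery
    if event_id in seen:
        return 200  # redelivery: acknowledge without reprocessing
    process(body)
    seen.add(event_id)  # record only after processing succeeded
    return 200
```

Recording the ID after process returns means a crash mid-processing leaves the event eligible for the next retry.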

Event types

Event type                      Fires when
document.approved               Reviewer approves (or auto-approve gate fires).
document.rejected               Reviewer rejects with a reason.
document.failed                 Pipeline reaches a terminal failure state (extraction / parse error).
counterparty.candidate_created  First sighting of a new vendor / customer.
counterparty.confirmed          Counterparty status flips candidate → confirmed.
validation_error.raised         Cross-field validator flagged an issue (bank account, address, math).

Headers on every delivery

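Header                  Value
Content-Type            application/json
User-Agent              DocReadi-Webhook/1.0
X-DocReadi-Signature    t=<Unix timestamp>,v1=<hex HMAC-SHA256 signature>
X-DocReadi-Event-Id     Unique per event; use it to dedupe redeliveries
X-DocReadi-Event-Type   One of the event types listed above
X-DocReadi-Delivery-Id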

Verify the signature

Stripe-compatible. The signed payload is exactly f"{t}.{raw_request_body}". Recompute HMAC-SHA256 with your webhook secret, hex-encode it, and constant-time compare against the v1 value. Reject signatures whose timestamp is more than 5 minutes from now (replay guard).

import hashlib
import hmac
import time

def verify(secret: str, body: bytes, header: str) -> bool:
    """Return True if the X-DocReadi-Signature header matches the raw body."""
    try:
        parts = dict(p.strip().split('=', 1) for p in header.split(','))
        t, v1 = parts['t'], parts['v1']
        ts = int(t)
    except (KeyError, ValueError):
        return False  # malformed or incomplete header
    if abs(int(time.time()) - ts) > 300:
        return False  # outside the 5-minute replay window
    expected = hmac.new(secret.encode(), f'{t}.'.encode() + body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, v1)
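
To exercise a verifier locally, a test header can be built by following the same signing recipe in reverse. This is a sketch for testing only: sign_for_test is a hypothetical helper, and it assumes the t=…,v1=… header layout described above.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign_for_test(secret: str, body: bytes, timestamp: Optional[int] = None) -> str:
    """Build an X-DocReadi-Signature value for local tests.

    Signs f"{t}.{raw_request_body}" with HMAC-SHA256 and hex-encodes it.
    """
    t = int(time.time()) if timestamp is None else timestamp
    digest = hmac.new(
        secret.encode(), f"{t}.".encode() + body, hashlib.sha256
    ).hexdigest()
    return f"t={t},v1={digest}"
```

Passing the result to verify with the same secret and body should return True; changing a single byte of the body, or using a timestamp more than 5 minutes old, should return False.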

Retry policy. Exponential backoff: 30s, 2m, 10m, 1h, 6h. After 5 failed attempts the delivery dead-letters and stops retrying. Any 2xx response counts as success; 4xx and 5xx both schedule a retry.
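
As a worked example of the schedule, the sketch below lists when each retry would land. It assumes each delay is measured from the previous failed attempt, which the policy above does not state explicitly; the function name is illustrative.

```python
from datetime import datetime, timedelta, timezone

# Backoff steps from the retry policy: 30s, 2m, 10m, 1h, 6h.
BACKOFF = [
    timedelta(seconds=30),
    timedelta(minutes=2),
    timedelta(minutes=10),
    timedelta(hours=1),
    timedelta(hours=6),
]


def retry_schedule(first_failure: datetime) -> list:
    """Times of each retry, assuming delays chain from the previous attempt."""
    times, current = [], first_failure
    for delay in BACKOFF:
        current += delay
        times.append(current)
    return times


# Total window under that assumption: 7 hours, 12 minutes, 30 seconds.
TOTAL_WINDOW = sum(BACKOFF, timedelta())
```

Under that assumption, an endpoint that is down for less than about seven hours should still receive the event before it dead-letters.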

Configure subscriptions at /ui/settings/webhooks.

Example workflow — receipt OCR

Submit a receipt and get the extracted total:

  1. POST /ingest/process with file + source=whatsapp → get document_id
  2. GET /ingest/document/<document_id> — poll until status == "extracted"
  3. Read total_amount from the response

Or skip the polling: register a webhook at /ui/settings/webhooks and we'll POST document.approved to your endpoint as soon as the document is approved.

Machine-readable version: /api/guide