{"title":"DocReadi Document Intelligence API","version":"1.0","base_url":"https://docreadi.com","authentication":{"type":"api_key","header":"X-Api-Key","alt_header":"Authorization: Bearer <key>","note":"All endpoints except /api/guide and /api/docs-guide require an API key. Send it in the X-Api-Key header (preferred) or as 'Authorization: Bearer <key>'. Request a key from the operator via /ui/settings/api-keys. Each key is scoped to the tenant that created it — queries only return that tenant's documents (Postgres row-level security enforces this on every request)."},"endpoints":[{"method":"POST","path":"/ingest/process","summary":"Ingest a document and start the processing pipeline","auth_required":true,"content_type":"multipart/form-data","params":[{"name":"file","type":"file","required":true,"description":"PDF, JPG, or PNG file"},{"name":"source","type":"string","required":false,"default":"whatsapp","description":"Source identifier — e.g. 'whatsapp', 'email', 'manual'"},{"name":"document_type","type":"string","required":false,"default":"","description":"Force document type — ap_invoice | ap_credit_note | sales_invoice | sales_credit_note | delivery_note | receipt | generic_invoice. Empty string = auto-detect."},{"name":"sender_phone","type":"string","required":false,"default":"","description":"Sender phone number (WhatsApp integrations)"},{"name":"template_id","type":"string","required":false,"default":"","description":"Custom template UUID (mutually exclusive with document_type)"},{"name":"hold_for_type_selection","type":"boolean","required":false,"default":false,"description":"Store the file with status 'awaiting_type' and skip processing. Used by the WhatsApp type-menu flow — finalise the choice via POST /ingest/whatsapp/select-type."}],"response":{"document_id":"UUID string","file_type":"pdf | jpg | jpeg | png","status":"processing | awaiting_type"},"notes":"Processing runs asynchronously. 
Poll GET /ingest/document/{id} to check status."},{"method":"GET","path":"/ingest/document/{document_id}","summary":"Poll document processing status and get extracted fields","auth_required":true,"params":[{"name":"document_id","type":"path","required":true,"description":"UUID returned by POST /ingest/process"}],"response":{"document_id":"UUID string","status":"ingested | classified | parsed | extracted | approved | failed | voided","source_file":"display filename","document_type":"ap_invoice | sales_invoice | receipt | ...","confidence_score":"0.0–1.0 float (present when extracted) — overall extraction confidence","field_confidences":"dict[str, float] — per-field 0.0-1.0 confidences the model self-reported (Wave 3.5+)","invoice_number":"string or null","total_amount":"float or null","invoice_date":"YYYY-MM-DD or null","raw_vendor_name":"string or null","validation_errors":["list of error strings"],"display":{"_doc":"Pre-formatted strings using the tenant's date/number/currency preferences. Use these in WhatsApp replies and other tenant-facing outputs so reviewers see the same format everywhere. Raw fields above stay as ISO / numeric for programmatic use.","invoice_date":"tenant-formatted date string","due_date":"tenant-formatted date string","delivery_date":"tenant-formatted date string","receipt_date":"tenant-formatted date string","total_amount":"currency-prefixed amount string (e.g. 'ZAR 1 234,56')","subtotal":"currency-prefixed amount string","vat_amount":"currency-prefixed amount string"}},"notes":"Poll every 5–10 seconds. Status 'extracted' or 'approved' = complete. Status 'failed' = pipeline error, check validation_errors. 
The 'display' sub-object holds tenant-formatted strings — prefer those in user-facing replies; raw fields are for programmatic access."},{"method":"POST","path":"/ingest","summary":"Ingest a document without processing (creates record only)","auth_required":true,"content_type":"multipart/form-data","params":[{"name":"file","type":"file","required":true,"description":"PDF, JPG, or PNG file"},{"name":"source","type":"string","required":false,"default":"manual"}],"response":{"document_id":"UUID","file_path":"internal path","file_type":"pdf | jpg | png"}},{"method":"GET","path":"/ingest/whatsapp/prefs/{sender_phone}","summary":"Get per-sender WhatsApp preferences","auth_required":true,"response":{"sender_phone":"string","document_type":"string (empty = auto)","reply_format":"summary | detail"}},{"method":"POST","path":"/ingest/whatsapp/prefs/{sender_phone}","summary":"Update per-sender WhatsApp preferences","auth_required":true,"content_type":"application/json","params":[{"name":"document_type","type":"string","required":false,"description":"Force document type for this sender"},{"name":"reply_format","type":"string","required":false,"description":"summary | detail"}]},{"method":"GET","path":"/ingest/whatsapp/type-menu/{sender_phone}","summary":"Build a WhatsApp interactive-list payload for document-type selection","auth_required":true,"params":[{"name":"sender_phone","type":"path","required":true},{"name":"document_id","type":"query","required":true,"description":"The document UUID returned by /ingest/process (hold_for_type_selection=true)"},{"name":"page","type":"query","required":false,"default":1,"description":"1-based page index. 
Each page carries at most 10 rows (Auto Detect + built-ins + custom templates); a 'More…' row links to the next page."}],"response":{"document_id":"UUID","sender_phone":"string","page":"int","has_more":"bool","total_entries":"int","sections":"list of {title, rows:[{id, title, description}]}"},"notes":"Row IDs encode the selection: type_<doc>_auto, type_<doc>_builtin_<type>, type_<doc>_tpl_<template_uuid>, or type_<doc>_more_<page>. Parse the tapped row's id, extract the selection, and call POST /ingest/whatsapp/select-type."},{"method":"POST","path":"/ingest/whatsapp/select-type","summary":"Apply a user's type selection and kick off processing","auth_required":true,"content_type":"application/json","params":[{"name":"document_id","type":"string","required":true},{"name":"document_type","type":"string","required":false,"description":"Built-in doc type. Leave empty and also leave template_id empty to let the pipeline auto-classify."},{"name":"template_id","type":"string","required":false,"description":"Custom template UUID (mutually exclusive with document_type)"}],"response":{"document_id":"UUID","status":"processing","document_type":"string or null","template_id":"UUID or null","label":"Human-readable label shown back to the WhatsApp user"}},{"method":"GET","path":"/ingest/expenses/summary","summary":"Monthly expense totals for a WhatsApp sender","auth_required":true,"params":[{"name":"sender_phone","type":"query","required":false,"description":"Filter by sender phone. 
Omit for all senders."}],"response":{"sender_phone":"string","months":[{"month":"YYYY-MM","count":"int","total_amount":"float"}]}},{"method":"POST","path":"/classify","summary":"Classify a document's type (ap_invoice / sales_invoice / delivery_note / receipt / …)","auth_required":true,"content_type":"application/json","params":[{"name":"document_id","type":"string","required":true,"description":"UUID of an already-ingested document"}],"response":{"document_id":"UUID","document_type":"ap_invoice | sales_invoice | delivery_note | receipt | ap_credit_note | sales_credit_note | generic_invoice | unknown","reasoning":"Short model explanation"},"notes":"Called automatically by /ingest/process. Use this endpoint only to re-classify or run standalone."},{"method":"POST","path":"/parse","summary":"Parse an ingested document to markdown (PyMuPDF for native PDFs, Mistral OCR for scans/images)","auth_required":true,"content_type":"application/json","params":[{"name":"document_id","type":"string","required":true},{"name":"run_docling","type":"boolean","required":false,"default":true,"description":"Run Docling layout hint in parallel (best-effort, non-blocking)"},{"name":"parser","type":"string","required":false,"description":"Force a parser: pymupdf | mistral_ocr | mistral_direct_<doc_type>"}],"response":{"document_id":"UUID","markdown":"string","page_count":"int","parse_method":"pymupdf | mistral_ocr | mistral_direct_<doc_type>","vendor_hint":"string or null","structured_data":"Mistral structured payload or null","parse_metadata":{"source_format":"markdown | structured"}}},{"method":"POST","path":"/extract","summary":"Extract structured fields from parsed markdown into the typed schema","auth_required":true,"content_type":"application/json","params":[{"name":"document_id","type":"string","required":true},{"name":"document_type","type":"string","required":true,"description":"ap_invoice | delivery_note | sales_invoice (other built-ins routed via 
/ingest/process)"},{"name":"markdown","type":"string","required":true,"description":"Markdown returned by /parse"},{"name":"vendor_hint","type":"string","required":false,"description":"Docling-derived layout hint, if any"}],"response":{"document_id":"UUID","document_type":"string","data":"Full extracted schema (fields vary by document_type)","confidence_score":"0.0–1.0 float — overall extraction confidence","field_confidences":"dict[str, float] — per-field 0.0-1.0 confidence the model self-reported for every extracted field. Wave 3.5+. Use the lowest value to gate routing decisions; the field name is the same key as in `data`. Older docs may have an empty dict — fall back to confidence_score.","low_confidence_fields":"list[str] — field names where field_confidences[name] < 0.85. Auto-derived from field_confidences and kept for back-compat with the legacy boolean signal.","validation_errors":["list of error strings"]}},{"method":"GET","path":"/health","summary":"Service health check","auth_required":false,"response":{"status":"ok","db_ready":true}},{"method":"GET","path":"/templates/common-fields","summary":"Get curated + tenant-learned common field definitions for the template builder","auth_required":false,"response":{"curated":[{"name":"string","type":"string","description":"string","group":"string"}],"learned":[{"name":"string","count":"int"}]}},{"method":"GET","path":"/api/v1/reports/{report_id}","summary":"Paginated JSON export of a saved custom report (API key auth)","auth_required":true,"params":[{"name":"report_id","type":"path","required":true,"description":"Saved report UUID"},{"name":"page","type":"query","required":false,"default":1},{"name":"page_size","type":"query","required":false,"default":100}],"response":{"report_id":"UUID","report_name":"string","total":"int","page":"int","page_size":"int","data":[{"column_name":"value"}]}},{"method":"GET","path":"/api/v1/reports/{report_id}/csv","summary":"CSV export of a saved custom report (API key 
auth)","auth_required":true,"response":"text/csv stream"},{"method":"GET","path":"/api/v1/counterparties","summary":"List counterparties (paginated). RLS-scoped to the API key's tenant.","auth_required":true,"params":[{"name":"page","type":"query","required":false,"default":1},{"name":"page_size","type":"query","required":false,"default":100,"description":"Max 500"},{"name":"status","type":"query","required":false,"description":"Filter: confirmed | candidate"}],"response":{"total":"int","page":"int","page_size":"int","data":[{"id":"UUID","canonical_name":"string","counterparty_type":"vendor | customer | both","status":"candidate | confirmed","active":"bool","vat_number":"string or null","address":"string or null","addresses":["list of strings"],"bank_name":"string or null","bank_account":"string or null","bank_accounts":["list of strings"],"keywords":["list of strings"],"aliases":["list of strings"],"typical_vat_rate":"float (decimal, e.g. 0.15) or null","typical_payment_terms":"int (days) or null","currency":"ISO 4217 or null","invoice_count":"int","consistent_observations":"int","created_at":"ISO 8601"}]}},{"method":"POST","path":"/api/v1/counterparties","summary":"Create a counterparty. 
409 if canonical_name already exists (case-insensitive).","auth_required":true,"content_type":"application/json","params":[{"name":"canonical_name","type":"string","required":true},{"name":"counterparty_type","type":"string","required":false,"description":"vendor | customer | both (default: vendor)"},{"name":"status","type":"string","required":false,"description":"candidate | confirmed (default: confirmed)"},{"name":"active","type":"boolean","required":false,"default":true},{"name":"vat_number","type":"string","required":false},{"name":"address","type":"string","required":false},{"name":"addresses","type":"array<string>","required":false},{"name":"bank_name","type":"string","required":false},{"name":"bank_account","type":"string","required":false},{"name":"bank_accounts","type":"array<string>","required":false},{"name":"keywords","type":"array<string>","required":false},{"name":"aliases","type":"array<string>","required":false},{"name":"typical_vat_rate","type":"float","required":false,"description":"Decimal, e.g. 
0.15 for 15%"},{"name":"typical_payment_terms","type":"int","required":false,"description":"Days"},{"name":"currency","type":"string","required":false,"description":"ISO 4217"}],"response":"Created counterparty (same shape as GET /api/v1/counterparties data[i])"},{"method":"GET","path":"/api/v1/counterparties/{counterparty_id}","summary":"Fetch a single counterparty by UUID","auth_required":true,"response":"Counterparty object (same shape as list data[i])"},{"method":"PATCH","path":"/api/v1/counterparties/{counterparty_id}","summary":"Partial update — only supplied fields are written","auth_required":true,"content_type":"application/json","params":[{"name":"any creation field","type":"see POST","required":false,"description":"Pass only the fields you want to change"}],"response":"Updated counterparty object"},{"method":"DELETE","path":"/api/v1/counterparties/{counterparty_id}","summary":"Delete a counterparty","auth_required":true,"response":{"deleted":"UUID"}},{"method":"GET","path":"/api/v1/counterparties/export.csv","summary":"Download all counterparties for the API key's tenant as CSV","auth_required":true,"response":"text/csv stream","notes":"List columns (addresses, bank_accounts, keywords, aliases) are JSON-encoded in the CSV so they round-trip cleanly through Excel."},{"method":"POST","path":"/api/v1/counterparties/import.csv","summary":"Bulk upsert from CSV (multipart upload)","auth_required":true,"content_type":"multipart/form-data","params":[{"name":"file","type":"file","required":true,"description":"CSV with a header row"}],"response":{"created":"int","updated":"int","skipped":"int","errors":[{"row":"int","error":"string"}]},"notes":"Required column: canonical_name. Upsert is case-insensitive on canonical_name. Unknown columns are ignored. 
List columns accept either JSON arrays or comma-separated strings."},{"method":"GET","path":"/api/v1/entities","summary":"List known company entities (your own legal names, used for AP/sales direction validation)","auth_required":true,"params":[{"name":"page","type":"query","required":false,"default":1},{"name":"page_size","type":"query","required":false,"default":100,"description":"Max 500"},{"name":"status","type":"query","required":false,"description":"Filter: confirmed | candidate"}],"response":{"total":"int","page":"int","page_size":"int","data":[{"id":"UUID","canonical_name":"string","status":"candidate | confirmed","active":"bool","vat_number":"string or null","keywords":["list of strings"],"notes":"string or null","consistent_observations":"int","created_at":"ISO 8601"}]}},{"method":"POST","path":"/api/v1/entities","summary":"Create an entity. 409 if canonical_name already exists.","auth_required":true,"content_type":"application/json","params":[{"name":"canonical_name","type":"string","required":true},{"name":"status","type":"string","required":false,"description":"candidate | confirmed (default: confirmed)"},{"name":"active","type":"boolean","required":false,"default":true},{"name":"vat_number","type":"string","required":false},{"name":"keywords","type":"array<string>","required":false},{"name":"notes","type":"string","required":false}],"response":"Created entity"},{"method":"GET","path":"/api/v1/entities/{entity_id}","summary":"Fetch a single entity by UUID","auth_required":true,"response":"Entity object"},{"method":"PATCH","path":"/api/v1/entities/{entity_id}","summary":"Partial update — only supplied fields are written","auth_required":true,"content_type":"application/json","response":"Updated entity object"},{"method":"DELETE","path":"/api/v1/entities/{entity_id}","summary":"Delete an entity","auth_required":true,"response":{"deleted":"UUID"}},{"method":"GET","path":"/api/v1/entities/export.csv","summary":"Download all entities for the API key's 
tenant as CSV","auth_required":true,"response":"text/csv stream"},{"method":"POST","path":"/api/v1/entities/import.csv","summary":"Bulk upsert entities from CSV (multipart upload)","auth_required":true,"content_type":"multipart/form-data","params":[{"name":"file","type":"file","required":true,"description":"CSV with a header row"}],"response":{"created":"int","updated":"int","skipped":"int","errors":[{"row":"int","error":"string"}]},"notes":"Required column: canonical_name. Same upsert semantics as counterparties."},{"method":"GET","path":"/api/v1/exports/xero/bills.csv","summary":"Approved AP invoices in Xero's Bills CSV import format","auth_required":true,"params":[{"name":"date_from","type":"string","required":false,"description":"ISO date (YYYY-MM-DD); default = today − 90 days"},{"name":"date_to","type":"string","required":false,"description":"ISO date; default = today"},{"name":"status","type":"string","required":false,"description":"Lifecycle filter; default = 'approved'"}],"response":"text/csv stream","notes":"One row per *line item* — Xero groups lines back into a single bill by InvoiceNumber + ContactName. AccountCode and TaxType are intentionally blank: map them in Xero's import wizard once and Xero remembers the mapping. Credit notes are excluded (Xero imports them via a separate CreditNotes CSV). 
External-system identifiers stored on the Counterparty record (counterparties.external_ids JSONB) round-trip via the /api/v1/counterparties endpoints."}],"document_types":[{"id":"ap_invoice","label":"AP Invoice","description":"Inbound invoice from a supplier"},{"id":"ap_credit_note","label":"AP Credit Note","description":"Inbound credit note"},{"id":"sales_invoice","label":"Sales Invoice","description":"Outbound invoice to a customer"},{"id":"sales_credit_note","label":"Sales Credit Note","description":"Outbound credit note"},{"id":"delivery_note","label":"Delivery Note","description":"Goods delivery note"},{"id":"receipt","label":"Receipt","description":"Payment receipt or till slip"},{"id":"generic_invoice","label":"Generic Invoice","description":"Invoice where direction cannot be determined"}],"status_values":[{"status":"ingested","description":"File received, not yet processed"},{"status":"classified","description":"Document type identified"},{"status":"parsed","description":"Text extracted from document"},{"status":"extracted","description":"Fields extracted, validation complete"},{"status":"approved","description":"Approved by a reviewer or by the auto-approve gate, written to typed tables"},{"status":"failed","description":"Pipeline error"},{"status":"voided","description":"Marked as void (expenses only)"}],"example_workflow":{"description":"Submit a receipt and get the extracted total","steps":["POST /ingest/process with file + source=whatsapp → get document_id","Wait 5s","GET /ingest/document/{document_id} → check status","Repeat until status is 'extracted' or 'approved'; stop if status is 'failed' and check validation_errors","Read total_amount from response"]},"confidence":{"description":"Every extracted document carries two confidence signals on the response. The headline number (`confidence_score`) is the model's overall self-reported confidence in the whole extraction. The granular signal (`field_confidences`) is a dict of field-name → 0.0-1.0 score covering every field the model populated. 
Use the granular shape — averaging it out into a single document number washes away exactly the information that drives routing decisions.","shape_example":{"confidence_score":0.94,"field_confidences":{"invoice_number":0.99,"vendor_name":0.98,"subtotal":0.97,"vat_amount":0.96,"total_amount":0.99,"due_date":0.65,"vendor_address":0.78},"low_confidence_fields":["due_date","vendor_address"]},"interpretation_guide":{">= 0.95":"Verbatim-clean text — the model copied the value off the document with no ambiguity. Safe to auto-route in most tenants' workflows.","0.70 - 0.94":"Required interpretation — the value was inferred or parsed (e.g. date-format disambiguation, OCR'd from a scan). Send to review for high-stakes documents; auto-route on volume tiers.","<= 0.69":"Smudged / ambiguous / candidate-pick — the model is explicitly uncertain. Always review before approving."},"calibration_disclaimer":"The numbers come from the model's self-report. They're a useful ordinal signal — 'which field should the reviewer look at first' — but not a calibrated probability you can divide further. Validators (subtotal+VAT=total, ISO dates, VAT-number shapes) layer additional signal on top in a follow-up release.","display_rules":["We cap the displayed percentage at 99%. A model returning 1.0 still renders as 99% — true 100% is rare in any probabilistic system, and a 100% display trains users to over-trust.","The Lowest Field column on /ui (the document queue) shows min(field_confidences.values()). Older docs without field_confidences fall back to confidence_score.","low_confidence_fields is auto-derived from field_confidences (any field < 0.85 lands in the list). Kept for back-compat with the legacy binary signal."],"auto_approve_gate":"Per-tenant, per-doc-type confidence threshold at /ui/settings/review. 
Threshold gate compares against confidence_score today; future release will optionally gate on min(field_confidences) so a single bad field on an otherwise-good doc still routes to review."},"webhooks":{"description":"Outbound HMAC-signed event push. Configure a subscription URL at /ui/settings/webhooks; the secret is shown to you exactly once at create time. Every event is delivered with at-least-once semantics — use the X-DocReadi-Event-Id header to dedupe.","event_types":[{"type":"document.approved","fires_when":"Reviewer approves (or auto-approve gate fires)."},{"type":"document.rejected","fires_when":"Reviewer rejects with a reason."},{"type":"document.failed","fires_when":"Pipeline failure terminal state (extraction / parse error)."},{"type":"counterparty.candidate_created","fires_when":"First sighting of a new vendor / customer."},{"type":"counterparty.confirmed","fires_when":"Counterparty status flips candidate → confirmed."},{"type":"validation_error.raised","fires_when":"Cross-field validator flagged an issue (bank account, address, math)."}],"headers_sent_on_every_delivery":{"Content-Type":"application/json","User-Agent":"DocReadi-Webhook/1.0","X-DocReadi-Signature":"t=<unix-timestamp>,v1=<hex-hmac-sha256>","X-DocReadi-Event-Id":"<uuid — idempotency key>","X-DocReadi-Event-Type":"<one of the event_types above>","X-DocReadi-Delivery-Id":"<uuid — internal delivery row id>"},"signature_scheme":"Stripe-compatible. The signed payload is exactly f\"{t}.{raw_request_body}\". Recompute HMAC-SHA256 with your webhook secret, hex-encode it, and constant-time compare against the v1 value. 
Reject signatures whose timestamp is more than 5 minutes from now (replay guard).","verify_example_python":"import hmac, hashlib, time\ndef verify(secret: str, body: bytes, header: str) -> bool:\n    # Parse 't=<unix-timestamp>,v1=<hex-hmac-sha256>'; a malformed header fails closed instead of raising.\n    try:\n        parts = dict(p.strip().split('=', 1) for p in header.split(','))\n        t, v1 = parts['t'], parts['v1']\n        age = abs(int(time.time()) - int(t))\n    except (KeyError, ValueError):\n        return False\n    if age > 300: return False  # replay guard: 5-minute tolerance\n    # The signed payload is the timestamp, a dot, then the raw request body.\n    expected = hmac.new(secret.encode(), f'{t}.'.encode() + body, hashlib.sha256).hexdigest()\n    # Compare as bytes so a non-ASCII v1 value returns False rather than raising TypeError.\n    return hmac.compare_digest(expected.encode(), v1.encode())\n","retry_policy":"Exponential backoff: 30s, 2m, 10m, 1h, 6h. After 5 failed attempts the delivery dead-letters and stops retrying. Any 2xx response counts as success; 4xx and 5xx both schedule a retry.","envelope_shape":{"id":"uuid (event id, also in X-DocReadi-Event-Id)","type":"string (one of event_types)","occurred_at":"ISO 8601 UTC timestamp","data":"object — event-specific payload"},"document_event_data_shape":{"document_id":"uuid","document_type":"string (ap_invoice, sales_invoice, …)","status":"string (approved | rejected | failed)","source":"string (api | upload | whatsapp | email)","source_file":"string","company_id":"uuid","created_at":"ISO 8601","updated_at":"ISO 8601","line_items":"array — present on document.approved when typed line items have been written. Each item carries the SKU mapping fields when sku_auto_apply_enabled is true: supplier_code (as extracted), internal_sku (mapped, or null when no alias), units_per_pack (mapped, or null), units_total (= quantity × units_per_pack), plus the standard fields description, quantity, unit_of_measure, unit_price, total, vat_rate, line_order. Use internal_sku to route into your ERP."}}}