How accurate is Reconcily — and what happens when it isn't sure?

Accuracy isn't a single number. Here's how Reconcily grounds every read, cross-checks it, validates the math to the cent, and flags what it can't verify — so wrong answers get caught, not filed.

June 24, 2026 · The Reconcily team

Every bookkeeping tool that touches AI eventually gets the same question: how accurate is it? It’s the right question — but a single percentage is the wrong answer. A number tells you how often the model is right; it tells you nothing about what happens the rest of the time. For books that have to survive an audit, the second part matters more.

So here’s the honest version of how Reconcily thinks about accuracy: we design the system so that wrong answers get caught and flagged, not silently filed.

Two independent reads, then a cross-check

When a receipt, invoice, or statement comes in, Reconcily doesn’t ask one model to “read it” and trust the result. It reads each document twice, two different ways:

A grounded text layer. Enterprise OCR transcribes the document and records where every value sits on the page and how confident it is — down to the individual token.
A structured extraction. A vision-language model pulls out the fields that actually matter: vendor, date, subtotal, tax, tip, total, currency.

Then the two are cross-checked against each other. When the model’s “$312.00 total” lands exactly on the spot where the OCR layer confidently read the same value, that agreement is strong evidence that the number is real. When they disagree, the document is flagged for review.
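To make the cross-check concrete, here is a minimal sketch in Python. It is an illustration of the idea, not Reconcily's actual code: the `OcrToken` shape, the `find_corroborating_token` name, and the 0.95 confidence threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class OcrToken:
    """One OCR token with its confidence and position (hypothetical shape)."""
    text: str
    confidence: float                         # 0.0 to 1.0
    page: int
    bbox: Tuple[float, float, float, float]   # x0, y0, x1, y1 on the page


def parse_amount(text: str) -> Optional[Decimal]:
    """Parse text like "$1,234.50" into a Decimal; None if it is not a finite amount."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    try:
        value = Decimal(cleaned)
    except InvalidOperation:
        return None
    return value if value.is_finite() else None


def find_corroborating_token(
    extracted_total: Decimal,
    tokens: List[OcrToken],
    min_confidence: float = 0.95,  # assumed threshold, for illustration only
) -> Optional[OcrToken]:
    """Return the first high-confidence OCR token whose value equals the
    extracted total, or None when no token corroborates it."""
    for token in tokens:
        if token.confidence < min_confidence:
            continue
        if parse_amount(token.text) == extracted_total:
            return token
    return None


tokens = [
    OcrToken("Total", 0.99, 1, (40.0, 700.0, 90.0, 715.0)),
    OcrToken("$312.00", 0.98, 1, (420.0, 700.0, 480.0, 715.0)),
]
match = find_corroborating_token(Decimal("312.00"), tokens)
print("corroborated at", match.bbox if match else "nowhere: flag for review")
```

Returning the matching token, rather than a bare yes or no, keeps the page position available for later tracing.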

The math has to reconcile — to the cent

Numbers that look right often aren’t. So before any document is accepted, Reconcily checks that it’s internally consistent: subtotal + tax − discount has to reconcile to the total within $0.01. Money is handled as exact decimal values throughout — never floating-point — so “reconciled” means reconciled, not “close enough.” A receipt whose parts don’t add up to its whole doesn’t get a pass; it gets reviewed.
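The reconciliation rule is simple enough to sketch. The example below uses Python's standard `decimal` module and the formula stated above; the `reconciles` function name and the sample amounts are made up for illustration, and how tips enter the sum is not specified here, so the sketch leaves them out.

```python
from decimal import Decimal

TOLERANCE = Decimal("0.01")  # one cent, per the rule described above


def reconciles(
    subtotal: Decimal,
    tax: Decimal,
    discount: Decimal,
    total: Decimal,
    tolerance: Decimal = TOLERANCE,
) -> bool:
    """True when subtotal + tax - discount is within `tolerance` of total."""
    return abs(subtotal + tax - discount - total) <= tolerance


# Why exact decimals matter: binary floats cannot represent most cent values.
float_sum_matches = (0.1 + 0.2 == 0.3)  # False with floats
decimal_sum_matches = (
    Decimal("0.10") + Decimal("0.20") == Decimal("0.30")
)  # True with exact decimals

# Illustrative receipt: 290.00 + 27.00 - 5.00 = 312.00
receipt_ok = reconciles(
    Decimal("290.00"), Decimal("27.00"), Decimal("5.00"), Decimal("312.00")
)
# Same parts, but a total that is off by fifty cents
receipt_bad = reconciles(
    Decimal("290.00"), Decimal("27.00"), Decimal("5.00"), Decimal("312.50")
)
print(float_sum_matches, decimal_sum_matches, receipt_ok, receipt_bad)
```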

Flag, don’t guess

This is the part most tools skip. A document is auto-accepted only when every check lines up:

the text layer is high-confidence (or the PDF carried real embedded text),
the two reads agree on total, tax, and date,
the arithmetic reconciles, and
nothing risky is going on — no foreign currency, no split tip, no two receipts crammed onto one scan.

If any of those fail, the document is routed to a review queue instead of into your books. The agent would rather say “I’m not sure about this one” than quietly guess. That single design choice is why the output is trustworthy: the errors that remain are visible, not buried.
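As a rough sketch, the gate described above can be written as a function that collects every reason to hesitate and auto-accepts only when there are none. The `Checks` and `Decision` types and the reason strings are hypothetical; only the four conditions come from the list above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class Decision(Enum):
    AUTO_ACCEPT = "auto_accept"
    REVIEW = "review"


@dataclass
class Checks:
    """Results of the upstream checks for one document (hypothetical shape)."""
    ocr_high_confidence: bool
    has_embedded_text: bool        # the PDF carried real embedded text
    reads_agree: bool              # total, tax, and date match across both reads
    arithmetic_reconciles: bool
    risk_flags: List[str] = field(default_factory=list)  # e.g. "foreign currency"


def decide(checks: Checks) -> Tuple[Decision, List[str]]:
    """Auto-accept only when every check passes; otherwise route to review
    and report every reason, so the reviewer sees why the document stopped."""
    reasons: List[str] = []
    if not (checks.ocr_high_confidence or checks.has_embedded_text):
        reasons.append("text layer is low-confidence")
    if not checks.reads_agree:
        reasons.append("reads disagree on total, tax, or date")
    if not checks.arithmetic_reconciles:
        reasons.append("arithmetic does not reconcile")
    reasons.extend(f"risk: {flag}" for flag in checks.risk_flags)
    decision = Decision.REVIEW if reasons else Decision.AUTO_ACCEPT
    return decision, reasons


decision, reasons = decide(
    Checks(True, False, True, True, risk_flags=["foreign currency"])
)
print(decision.value, reasons)
```

Collecting all reasons instead of stopping at the first failure is a design choice of this sketch: it costs nothing and gives the review queue more context.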

Careful about duplicates and matches

Accuracy isn’t only about reading one document right — it’s about not double-counting or mis-pairing across thousands of them. Reconcily is deliberately conservative here:

Exact duplicates (the same file twice) are caught by a content fingerprint.
Near-duplicates are only merged when the extracted vendor, total, and date corroborate it — image similarity alone is rejected, because two different utility bills can look almost identical.
Transfers between your own accounts are only auto-paired when the match is unambiguous; a coincidental same-amount purchase is left alone.

And when you confirm or reject a grouping, that decision sticks — re-running the pipeline never undoes your call.
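Here is a small sketch of the two duplicate rules. Everything specific in it is an assumption for illustration: SHA-256 as the content fingerprint, the `Extracted` record, the vendor normalization, and the 0.9 similarity threshold. The point it demonstrates is that visual similarity only nominates a candidate, while the extracted fields decide.

```python
import hashlib
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


def content_fingerprint(data: bytes) -> str:
    """Fingerprint of the raw file bytes (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def is_exact_duplicate(a: bytes, b: bytes) -> bool:
    """The same file uploaded twice has the same fingerprint."""
    return content_fingerprint(a) == content_fingerprint(b)


@dataclass(frozen=True)
class Extracted:
    """Fields used to corroborate a near-duplicate (hypothetical shape)."""
    vendor: str
    total: Decimal
    doc_date: date


def _normalize_vendor(name: str) -> str:
    # Case-insensitive, whitespace-insensitive comparison key
    return " ".join(name.casefold().split())


def should_merge_near_duplicate(
    a: Extracted,
    b: Extracted,
    image_similarity: float,
    similarity_threshold: float = 0.9,  # assumed value, for illustration only
) -> bool:
    """Merge only when the images look alike AND vendor, total, and date
    all match. Image similarity alone is never enough."""
    looks_alike = image_similarity >= similarity_threshold
    fields_match = (
        _normalize_vendor(a.vendor) == _normalize_vendor(b.vendor)
        and a.total == b.total
        and a.doc_date == b.doc_date
    )
    return looks_alike and fields_match


may_bill = Extracted("City Power", Decimal("84.10"), date(2026, 5, 1))
june_bill = Extracted("City Power", Decimal("91.35"), date(2026, 6, 1))
# Two utility bills that look nearly identical but are different documents
print(should_merge_near_duplicate(may_bill, june_bill, image_similarity=0.99))
```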

Every number traces back to a document

Because each extracted field is anchored to the document — and the spot on the page — it came from, every line in your books is traceable to its source. That’s what makes the result audit-ready: not a promise that the AI was right, but the ability to show the receipt behind every figure.
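One way to picture that anchoring is a ledger line that cannot exist without a pointer to its source. The `SourceAnchor` and `LedgerLine` types, the `cite` helper, and the sample identifiers below are hypothetical, sketched only to show the shape of the idea.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Tuple


@dataclass(frozen=True)
class SourceAnchor:
    """Where a value came from (hypothetical shape)."""
    document_id: str
    page: int
    bbox: Tuple[float, float, float, float]  # x0, y0, x1, y1 on the page


@dataclass(frozen=True)
class LedgerLine:
    account: str
    amount: Decimal
    source: SourceAnchor  # required field: no anchor, no ledger line


def cite(line: LedgerLine) -> str:
    """Human-readable pointer from a ledger figure back to its document."""
    s = line.source
    return (
        f"{line.account} {line.amount}: "
        f"document {s.document_id}, page {s.page}, box {s.bbox}"
    )


line = LedgerLine(
    account="Office supplies",
    amount=Decimal("312.00"),
    source=SourceAnchor("rcpt-001", 1, (420.0, 700.0, 480.0, 715.0)),
)
print(cite(line))
```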

We measure ourselves, continuously

We maintain a hand-labeled “golden set” of real audit documents and score the agent against it on every change, so accuracy is something we track and defend rather than assert. On that benchmark the agent extracts the large majority of documents correctly with no human touch — and, just as importantly, it’s reliable about flagging the ones it can’t verify. When a stronger model would do better on a hard document, the pipeline escalates to it automatically.
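A scoring pass over a golden set might look something like the sketch below. The metric names and the `Outcome` record are assumptions, and the four outcomes at the bottom are toy inputs for the example, not benchmark results. The idea is to measure separately how often a document is right without review and how often a wrong one is caught.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Outcome:
    """Result for one golden-set document (hypothetical shape)."""
    correct: bool   # extraction matched the hand label
    flagged: bool   # routed to review rather than auto-accepted


def score(outcomes: List[Outcome]) -> Dict[str, float]:
    """Separate 'right with no review' from 'wrong but caught'."""
    if not outcomes:
        raise ValueError("golden set is empty")
    total = len(outcomes)
    touchless_correct = sum(1 for o in outcomes if o.correct and not o.flagged)
    silent_errors = sum(1 for o in outcomes if not o.correct and not o.flagged)
    caught_errors = sum(1 for o in outcomes if not o.correct and o.flagged)
    wrong = silent_errors + caught_errors
    return {
        # Correct and auto-accepted, as a share of all documents
        "touchless_correct_rate": touchless_correct / total,
        # Wrong and auto-accepted: the failure mode that matters most
        "silent_error_rate": silent_errors / total,
        # Share of wrong extractions that were flagged (1.0 if none were wrong)
        "error_catch_rate": caught_errors / wrong if wrong else 1.0,
    }


# Toy inputs only, to show the arithmetic
toy = [
    Outcome(correct=True, flagged=False),
    Outcome(correct=True, flagged=False),
    Outcome(correct=False, flagged=True),
    Outcome(correct=False, flagged=False),
]
print(score(toy))
```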

The honest bottom line

No document AI is perfect, and anyone who claims theirs is perfect should worry you. Reconcily’s goal isn’t a flawless model. It’s a system where the imperfections are caught, flagged, and traceable, with a human in the loop for anything uncertain. You stay in control, your books reconcile to the cent, and every line points back to the document that proves it.

That’s accuracy you can actually take to an audit.

Want to see it on your own documents? Get early access.