Every week, a prospective client asks me some version of the same question: "Why do we need your OCR software with all its business rules and configurations? Can't we just plug in ChatGPT and be done with it?"
It's a fair question. The demos are breathtaking. You upload a medical claim form, ask an AI to extract the fields, and it does — fluently, confidently, instantly. The same with invoices. The same with explanation of benefits documents. It looks like magic, and honestly, sometimes it is.
But I run a company that processes these documents at scale, in production, for clients where errors have real consequences — claim denials, payment delays, audit failures, regulatory exposure. And what I can tell you, after two years of intensive real-world testing across every major AI platform, is this: the magic is real, and so is the mess it leaves behind.
Here is what happens in a demo: the system works. Someone has selected a clean, representative sample of documents. The AI reads them correctly. The fields populate. Everyone in the room nods.
Here is what happens in production: the system works — most of the time. And "most of the time" sounds fine until you realize what "the rest of the time" looks like in a high-volume document processing environment.
We have tested OpenAI's models, Claude, and Grok, all of them, against real-world invoice and medical claims scenarios. Each one helps significantly. Each one also causes damage. Not the same damage: each AI makes its own particular category of mistakes. One misreads certain date formats under specific conditions. Another confidently extracts a field from the wrong table when a document has an unusual layout. A third hallucinates a value that was never on the page at all, and does so with complete apparent certainty.
When an agentic AI system breaks down in document processing, it rarely announces itself. There's no error message, no red flag in the dashboard. The data flows out — it just flows out wrong. Corrupted fields, transposed values, missing records, fabricated entries.
At some point, someone on the client side notices. Maybe it's a bookkeeper reconciling accounts. Maybe it's a billing manager running a monthly report. Maybe it's an auditor. By then, weeks or months of data may have passed through the pipeline. The question immediately becomes: how far back does the corruption go?
And then: who is responsible?
The answer, legally and practically, is not the AI. It is the processing company. And if you are a CFO or COO who signed off on an AI-first approach, it is ultimately you.
This is the part that gets glossed over in AI keynotes and vendor pitches. AI agents are tools — remarkable tools. But when an AI tool causes a financial or medical records error, the human who deployed it is responsible. Not the model provider. Not the API. The organization that decided to use it.
There is a profound difference between using AI as an assistive layer within a controlled, rule-governed system, and handing autonomous document processing over to an AI agent end to end.
In the first model, when something goes wrong, we catch it. Our business rules validate outputs. Our exception workflows flag anomalies. Human reviewers see borderline cases. The AI makes us faster and more accurate; it doesn't replace the control structures.
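To make the control structure concrete, here is a minimal sketch of how an extraction might be routed once the AI has done its part. It is an illustration under stated assumptions, not a description of our production system: the `Extraction` type, the `confidence` score, the `Route` names, and the 0.90 threshold are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Route(Enum):
    AUTO_ACCEPT = "auto_accept"    # passes rules, high confidence
    HUMAN_REVIEW = "human_review"  # passes rules, but a borderline case
    EXCEPTION = "exception"        # at least one business rule failed


@dataclass
class Extraction:
    # Hypothetical container for one document's extracted fields.
    fields: dict
    # Assumed 0..1 score reported by the extraction layer.
    confidence: float
    # Names of business rules that failed for this document.
    rule_failures: list = field(default_factory=list)


def route(extraction: Extraction, review_threshold: float = 0.90) -> Route:
    """Decide where an extraction goes next.

    Rule failures are checked first, so a confident but wrong
    extraction can never be auto-accepted on confidence alone.
    """
    if extraction.rule_failures:
        return Route.EXCEPTION
    if extraction.confidence < review_threshold:
        return Route.HUMAN_REVIEW
    return Route.AUTO_ACCEPT


if __name__ == "__main__":
    confident_but_wrong = Extraction(
        fields={"total": "162.54"},
        confidence=0.99,
        rule_failures=["invoice_total_mismatch"],
    )
    print(route(confident_but_wrong))  # Route.EXCEPTION
```

The ordering is the point of the sketch: because models can be wrong with complete apparent certainty, the rule check comes before the confidence check rather than after it.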
In the second model, nothing stands between the agent's output and the record, and you, as a C-level executive, are personally accountable for whatever the agent decides to do with a patient's insurance claim.
The stack that reliably works for high-stakes document processing — invoices, medical claims, explanation of benefits, remittance advices — is not pure AI. It is layered: specialized OCR engines trained on document types, combined with advanced business rules that encode domain knowledge (what a valid NPI number looks like, what CPT code ranges are permissible, what invoice totals must reconcile to), augmented by AI for the judgment calls that rules alone can't capture.
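The three rule examples in the parenthesis above can be sketched in a few lines. This is a simplified illustration, not our rule engine: the function names and record layout are hypothetical, and the permitted CPT range is a placeholder for whatever a given payer contract actually allows. The NPI check uses the published check-digit scheme (a Luhn checksum over the ten digits prefixed with 80840).

```python
from decimal import Decimal


def is_valid_npi(npi: str) -> bool:
    """Check the NPI check digit: Luhn over '80840' + the 10-digit NPI."""
    if len(npi) != 10 or not (npi.isascii() and npi.isdigit()):
        return False
    digits = [int(c) for c in "80840" + npi]
    total = 0
    # Walk right to left, doubling every second digit (Luhn algorithm).
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def cpt_in_allowed_ranges(code: str, allowed: list[tuple[int, int]]) -> bool:
    """True if a five-digit numeric CPT code falls in a permitted range.

    Alphanumeric codes (Category II and III) are out of scope here.
    """
    if len(code) != 5 or not (code.isascii() and code.isdigit()):
        return False
    value = int(code)
    return any(low <= value <= high for low, high in allowed)


def invoice_reconciles(line_items: list[Decimal], tax: Decimal,
                       total: Decimal,
                       tolerance: Decimal = Decimal("0.01")) -> bool:
    """True if line items plus tax match the stated total within tolerance."""
    computed = sum(line_items, Decimal("0")) + tax
    return abs(computed - total) <= tolerance


def rule_failures(record: dict, allowed_cpt: list[tuple[int, int]]) -> list[str]:
    """Run every rule and return the names of those that failed."""
    failures = []
    if not is_valid_npi(record["npi"]):
        failures.append("npi_check_digit")
    if not cpt_in_allowed_ranges(record["cpt"], allowed_cpt):
        failures.append("cpt_out_of_range")
    if not invoice_reconciles(record["line_items"], record["tax"],
                              record["total"]):
        failures.append("total_mismatch")
    return failures


if __name__ == "__main__":
    # Placeholder range for illustration only.
    allowed = [(99202, 99499)]
    record = {
        "npi": "1234567890",  # wrong check digit
        "cpt": "99213",
        "line_items": [Decimal("100.00"), Decimal("50.50")],
        "tax": Decimal("12.04"),
        "total": Decimal("162.54"),
    }
    print(rule_failures(record, allowed))
```

None of these checks needs a language model, and none of them can be talked out of its answer. An extracted NPI that fails its check digit is wrong regardless of how fluent the extraction looked.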
This is not a legacy approach dressed up with AI branding. It is the architecture that delivers sub-1% error rates in production, at volume, over time. The rules exist because the domain has rules. Health insurance has rules. Tax compliance has rules. You cannot route around them with a language model, no matter how sophisticated.
I want to be clear: I am not anti-AI. My team monitors and tests every significant agentic technology as it develops. In simpler environments — basic data extraction, document classification, routing — agentic AI is already delivering real operational value today. The trajectory is steep.
It is entirely possible that within the next 12 to 24 months, some agentic solution will mature to the point where it can be reliably deployed in high-stakes document processing without the level of oversight we currently require. My technical team is watching closely. We will be ready to adopt it when it crosses that threshold.
But we are not there yet. And the cost of pretending we are — betting your data integrity, your regulatory compliance, and your professional liability on a technology still in its confident-but-unreliable adolescence — is one I'm not willing to pass on to our clients.
The hype is real. The progress is real. The risk is also real.
Know which one you're signing up for.
Eyal Barsky is the CEO and Technical AI Director of OCR Solutions, which provides document processing software for insurance, healthcare, and financial services organizations, as well as customized AI software development, ID reading solutions, and enterprise infrastructure management.