Can AI Read Medical Claims & Invoices? Not Reliably (Yet)

I get this question almost every week now. A prospect calls, we start talking about their document processing needs, and at some point they say something like: "Look, can't we just throw this into ChatGPT? Why do we need all the rules and configuration your system requires?"

It's not a stupid question. I've seen the demos. The demos are very impressive. You drop in a medical claim form, the AI pulls out the fields, and it's fast, it's clean, it looks right. Same story with invoices. My first reaction a couple years ago was the same as everyone else's — this is remarkable, this changes everything.

But here's what running a document processing company in the real world teaches you pretty quickly: demos and production are two completely different animals.

The Demo Gap

In a demo, someone has cherry-picked the documents. They're clean. They're representative. The AI performs well and everyone leaves impressed.

In production, you get everything — the weird layouts, the faded scans, the forms that don't quite match the template, the edge cases nobody thought to test. And that's where the gap shows up.

We've run serious tests on OpenAI, Claude, Grok — the major players. Not controlled demos. Real invoice batches, real medical claims, the messy stuff we process daily. Every one of them adds genuine value. And every one of them also screws things up in ways that are hard to predict. What's interesting is they don't all fail the same way. One has trouble with certain date formats. Another will confidently pull a dollar figure from the wrong section of a two-column invoice. A third will just invent a value — not flag it as uncertain, not skip it, just make something up and present it like it's right there on the page.

That last one is what keeps me up at night.

The Problem Nobody Notices Until It's Too Late

Here's what an AI failure looks like in a live document processing environment: nothing. No alarm, no error log lighting up red, no popup saying something went wrong. The system just keeps processing. The data keeps flowing out the other end. It's just that some of it is wrong.

Weeks go by. Sometimes months. Then somebody on the client side is reconciling accounts, or pulling a report, or an auditor asks a question — and something doesn't add up. At that point you're not just fixing the error. You're figuring out how far back it goes and what else got corrupted along the way.

That's when the phone rings on our end. And the question is always the same: who is responsible for this?

Not the AI vendor, not the API provider: the processing company. And if you're the CFO who approved an AI-first approach to claims processing, this could be a career-ending move.

The Accountability Gap Is Crucial

When AI comes up inside most companies and in conversations with vendors, the one topic that almost always gets left out is liability.

AI agents are essentially empty databases that learn as we use them. Their capabilities are impressive, but they don't carry legal responsibility. The company that deploys them is fully liable, and so are the key decision makers who signed off.

Our team uses AI responsibly: it's integrated into our own software, it helps handle exceptions faster, and it catches things a rule engine alone would miss. But there is a real difference between AI as one layer in a system with human oversight and validation built in, and AI running the whole show autonomously.

In the first case, when the AI makes a mistake, we catch it. Preprogrammed business rules validate the outputs. A data analyst working at the enhanced verification station flags or corrects anomalies. That analyst is trained to spot the borderline cases and make the right call based on many factors and rules.
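To make the "AI as one layer" idea concrete, here is a minimal sketch of the routing step: rules run over whatever the AI extracted, and anything that trips a rule goes to a human instead of straight into the output. The names (`Extraction`, `require`, `route`, the field names) are hypothetical and chosen for illustration only; this is not our production code.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# A rule inspects the extracted fields and returns a problem description,
# or None when the fields pass. (Hypothetical interface for illustration.)
Rule = Callable[[dict], Optional[str]]


@dataclass
class Extraction:
    fields: dict                                  # field name -> value from the AI layer
    issues: list = field(default_factory=list)    # problems found by the rules


def require(name: str) -> Rule:
    """Build a rule that fails when a field is missing or blank."""
    def check(fields: dict) -> Optional[str]:
        value = fields.get(name, "")
        return None if str(value).strip() else f"missing field: {name}"
    return check


def route(extraction: Extraction, rules: list) -> str:
    """Run every rule; anything with an issue goes to a human analyst."""
    for rule in rules:
        problem = rule(extraction.fields)
        if problem is not None:
            extraction.issues.append(problem)
    return "human_review" if extraction.issues else "auto_accept"


rules = [require("member_id"), require("billed_amount")]
clean = Extraction({"member_id": "A123", "billed_amount": "125.00"})
suspect = Extraction({"member_id": "", "billed_amount": "125.00"})
print(route(clean, rules))
print(route(suspect, rules), suspect.issues)
```

The point of the design is that a failed rule never silently drops or "fixes" a value. It records why the document was held and hands it to a person.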

In a fully automated setup, you're the senior executive personally on the hook for whatever the AI decided to do with a patient's insurance claim.

What Actually Works

The stack that holds up in production (for invoices, medical claims, remittance advice, EOBs) isn't pure AI. It's layered. You need OCR systems designed specifically for the document types you are processing. You need business rules that encode what your domain actually requires: valid NPI formats, permissible CPT code ranges, whether the invoice math reconciles. Then you add AI to handle actions in the process and to make the human-like judgment calls that the rules can't make.
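As a rough illustration of what "business rules" means here, the three checks named above can each be written as a small deterministic function. This is a sketch under stated assumptions, not our rule engine: the function names are made up, the NPI check uses the published Luhn check-digit scheme with the "80840" prefix, and the CPT range shown (five-digit Category I codes from 00100 to 99499) is a placeholder you would replace with the codes your payer contracts actually permit.

```python
from decimal import Decimal


def npi_is_valid(npi: str) -> bool:
    """10 ASCII digits whose check digit passes Luhn over '80840' + NPI."""
    if len(npi) != 10 or not (npi.isascii() and npi.isdigit()):
        return False
    total = 0
    # Walk right to left, doubling every second digit (standard Luhn).
    for position, ch in enumerate(reversed("80840" + npi)):
        digit = int(ch)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def cpt_in_range(code: str, low: int = 100, high: int = 99499) -> bool:
    """Five numeric digits inside an allowed range (range is an assumption)."""
    return (
        len(code) == 5
        and code.isascii()
        and code.isdigit()
        and low <= int(code) <= high
    )


def invoice_reconciles(line_amounts: list, tax: str, total: str) -> bool:
    """Line items plus tax must equal the stated total, to the cent."""
    # Decimal avoids the rounding surprises of binary floats with money.
    subtotal = sum((Decimal(a) for a in line_amounts), Decimal("0"))
    return subtotal + Decimal(tax) == Decimal(total)


print(npi_is_valid("1234567893"))
print(cpt_in_range("99213"))
print(invoice_reconciles(["100.00", "25.50"], "10.04", "135.54"))
```

None of these checks needs a language model, and each one fails loudly. A value the AI invented will usually trip at least one of them before it reaches the client's books.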

This architecture has to hit sub-1% error rates in production, consistently and over time, and even that threshold is generous. The rules in these industries are not simple and they must be followed; compliance drives everything. They are also specific to each domain: healthcare billing and financial compliance each run on their own rules and procedures. You can't expect a language model to pick up the nuances of each vertical's rules and procedures automatically. It takes months, if not years, to train these agents on the different combinations of rules per document.

Where This Is Going

To be clear, I am not writing this to say AI doesn't work or won't work. My team follows every significant agentic technology closely, and we integrate AI into our own products. In many cases (basic classification, document routing, extraction from standardized forms) agentic AI delivers real value. That said, AI is catching up. In several months we will likely be having a different conversation, though exactly when is hard to predict.

In today's reality, errors in high-volume document processing carry legal and financial consequences, and AI on its own isn't there yet. Transparency is key. Hearing no is not what a client wants, but knowing the issues with AI is what makes it work for them in a production system.

The hype around AI is real. So is the progress. So is the risk.

Before you commit to an AI-first approach for your claims or invoices, make sure you know which one you're actually buying into.

For a tool-specific example of this trade-off, see Gemini vs InvoiceMax: can Gemini do invoice OCR?

Eyal Barsky is the CEO and Technical AI Director of OCR Solutions, which provides document processing software for insurance, healthcare, and financial services organizations, as well as customized AI software development, ID reading solutions, and enterprise infrastructure management.

Founder and driving force behind OCR Solutions, Eyal leads the company with a vision for innovation in imaging technology, ID capture, and face recognition, ensuring every solution meets the highest standards of quality and performance.
