OCR Accuracy Level
A question that comes up often in our engagements is: “What is your typical field acceptance rate and OCR accuracy level for marks, characters, and handwritten text (also red dropout vs. non-dropout)? Does your software do boilerplate dropout?”
Accuracy always depends on the quality and type of the original document. My rule of thumb is that if you can't see it clearly with your own eyes, OCR will not do a good job with it either. The better the image quality, the higher our accuracy will be. That said, we employ a voting engine that catches most mistakes.
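To illustrate the idea of voting, here is a minimal sketch in Python. It assumes simple majority voting over the values that several engines return for the same field. The function names, the sample readings, and the 0.6 agreement threshold are hypothetical, and the product's actual voting logic may work differently.

```python
from collections import Counter

def vote(readings):
    """Return (winning_value, agreement_ratio) for one field read by several engines."""
    if not readings:
        raise ValueError("at least one engine reading is required")
    counts = Counter(readings)
    # most_common(1) gives the value returned by the largest number of engines
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(readings)

def needs_review(readings, min_agreement=0.6):
    """Flag a field for manual review when the engines disagree too much.

    min_agreement is an assumed threshold, not a product setting.
    """
    _, agreement = vote(readings)
    return agreement < min_agreement

# Hypothetical readings of the same field from three engines;
# the third engine misread the digit 0 as the letter O.
readings = ["INV-20417", "INV-20417", "INV-2O417"]
value, agreement = vote(readings)
print(value, round(agreement, 2))
print("review needed:", needs_review(readings))
```

With two of three engines agreeing, the majority value wins and the field is accepted. A lower agreement ratio would send the field to an operator instead.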
These figures assume a standard 300 dpi TIFF or an electronically generated PDF, or images to which processing such as red dropout was applied at scan time. On text-based documents, we typically see upwards of 90% character recognition accuracy (that is, 90 out of 100 words and marks related to extracted metadata fields). On “clean” and properly registered documents, this percentage rises to upwards of 95%.
For handwriting recognition (ICR), we typically see upwards of 75% when the writing is in block letters, constrained and structured (comb fields).
Cursive handwriting recognition generally delivers upwards of 50% accuracy, and it improves over time with learning and training.
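If you want to check figures like these against your own documents, one simple approach is to compare extracted values with a hand-verified sample. The sketch below assumes accuracy is counted as correctly extracted words and marks divided by the total expected, compared position by position. The function name and the sample data are hypothetical.

```python
def field_accuracy(expected, extracted):
    """Fraction of expected items (words or marks) that were extracted correctly.

    Items are compared position by position; anything missing from
    `extracted` counts as an error.
    """
    if not expected:
        raise ValueError("expected must not be empty")
    correct = sum(1 for e, x in zip(expected, extracted) if e == x)
    return correct / len(expected)

# Hypothetical hand-verified values and OCR output for five items;
# the last item has the digit 0 misread as the letter O.
expected = ["ACME", "Corp", "2021-03-14", "X", "1250.00"]
extracted = ["ACME", "Corp", "2021-03-14", "X", "1250.O0"]

accuracy = field_accuracy(expected, extracted)
print(f"{accuracy:.0%}")  # 4 of 5 items match: 80%
```

A sample this small is only for illustration. A meaningful measurement needs a representative set of documents of each type (machine print, block letters, cursive).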
Boilerplate dropout: all image processing settings are global. If required, separate workflows with their own settings can be designed for identified document sets from a specific identified destination.
Whose engines do you use? Do you have a proprietary engine?
We employ two main OCR engines: Nuance OmniPage and OpenText RecoStar.
The A2iA recognition engines are optional. We can also license other third-party engines as required.