Optical Character Recognition – Best Practices


“Although the scanning of paper-based invoices isn’t considered e-invoicing, it is the natural first step for many organisations. Using Optical Character Recognition or ‘OCR’ software, the data can be moved from a paper-based format to a digital format that can be entered in the Accounts Payable system. Outside of EDI and invoice portals, OCR has been a predominant tool of choice to enable the digitization of invoices.”

Optical Character Recognition (OCR) is a hardware and software technology that takes a paper document, usually an invoice, scans it, “reads” it, and turns it into structured data that can be used to populate fields in a database.
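To illustrate the last step, turning recognised text into database fields, here is a minimal Python sketch. It assumes the OCR engine has already returned the page as plain text. The field names, label patterns and sample invoice are hypothetical, and commercial products typically rely on templates or trained models rather than a handful of regular expressions.

```python
import re
from typing import Dict, Optional

# Hypothetical label patterns; real invoices vary widely in layout and wording.
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"Invoice\s*(?:No\.?|Number)\s*[:#]?\s*(\S+)", re.IGNORECASE),
    "invoice_date": re.compile(r"Date\s*:?\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE),
    "total": re.compile(r"Total\s*(?:Due)?\s*:?\s*([\d,]+\.\d{2})", re.IGNORECASE),
}


def extract_fields(ocr_text: str) -> Dict[str, Optional[str]]:
    """Map raw OCR output onto named invoice fields (None when a label is missing)."""
    fields: Dict[str, Optional[str]] = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        fields[name] = match.group(1) if match else None
    return fields


sample = """ACME Supplies Ltd
INVOICE
Invoice No: INV-20931
Date: 2024-03-15
Total Due: 1,250.00"""

print(extract_fields(sample))
# {'invoice_number': 'INV-20931', 'invoice_date': '2024-03-15', 'total': '1,250.00'}
```

The resulting dictionary maps directly onto database columns; a label the patterns cannot find simply yields `None`, which is where a human operator would normally step in.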

From there, the invoice can be brought into an electronic workflow for processing. More precisely, OCR is the electronic conversion (through scanning) of invoices that contain no extractable data, whether paper documents or image files, into data that can be integrated directly, as an EDI or XML file, into a buyer's Accounts Payable (AP) finance system for payment.
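As a rough illustration of the hand-off into the AP system, the sketch below serialises extracted fields into a small XML document using Python's standard `xml.etree.ElementTree` module. The `Invoice` root element and the child element names are assumptions for illustration only; a real integration would follow whatever schema the buyer's system expects, such as UBL or an EDI message format.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional


def invoice_to_xml(fields: Dict[str, Optional[str]]) -> str:
    """Serialise extracted invoice fields as a flat XML document."""
    root = ET.Element("Invoice")  # hypothetical root element, not a standard schema
    for name, value in fields.items():
        # Each field becomes a child element; missing values become empty elements.
        ET.SubElement(root, name).text = value if value is not None else ""
    return ET.tostring(root, encoding="unicode")


fields = {"invoice_number": "INV-20931", "invoice_date": "2024-03-15", "total": "1250.00"}
print(invoice_to_xml(fields))
# <Invoice><invoice_number>INV-20931</invoice_number><invoice_date>2024-03-15</invoice_date><total>1250.00</total></Invoice>
```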

Whilst OCR solutions enable organisations to automate their AP processes to a certain extent, OCR technology has inherent restrictions that limit its impact to a semi-automated state, in which human intervention and errors are part and parcel of the process. After all, we speak of “recognition” and not “extraction” when referring to OCR.

Fundamentally, all OCR solutions rely on similar probabilistic technology and methodology: the engine estimates which character a given shape most likely represents, so visually similar characters are easily confused, for instance the number “1” and the lowercase letter “l”, or the number “0” and the uppercase letter “O”. Such confusions are mitigated to some extent by dictionaries (for example, “INVOICE” is more likely than “1NV0lCE”), but invoice data such as the invoice number or the shipping reference is usually not found in an OCR dictionary.
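The dictionary effect can be sketched in a few lines of Python. This is a deliberately naive, hypothetical illustration rather than how any particular OCR engine works: it tries every combination of commonly confused glyphs and accepts the first one that forms a known word. The `CONFUSABLE` table and `DICTIONARY` contents are assumptions. Note that it repairs “1NV0lCE” but leaves an invoice number untouched, because no dictionary can say which variant of an arbitrary reference is correct.

```python
from itertools import product

# Glyphs that are easily confused; each option string starts with the glyph itself,
# so the unchanged token is always the first candidate tried.
CONFUSABLE = {"1": "1lI", "l": "l1I", "I": "I1l", "0": "0O", "O": "O0"}

# Hypothetical vocabulary of words expected on invoices.
DICTIONARY = {"INVOICE", "TOTAL", "DATE", "AMOUNT"}


def dictionary_correct(token: str) -> str:
    """Return the first dictionary word reachable by swapping confusable glyphs."""
    options = [CONFUSABLE.get(ch, ch) for ch in token]
    # Brute force: up to 3 ** k candidates for k ambiguous glyphs, fine for short tokens.
    for candidate in product(*options):
        word = "".join(candidate)
        if word in DICTIONARY:
            return word
    return token  # no dictionary match: keep the raw OCR reading


print(dictionary_correct("1NV0lCE"))    # INVOICE
print(dictionary_correct("INV-2O931"))  # INV-2O931 (the O/0 error survives)
```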

The challenge becomes even harder when OCR is used to extract invoice line items. These inherent limitations result in varying recognition accuracy, which invariably requires human operators to check the results OCR produces. Inaccuracies demand manual intervention, which in turn leads to errors, long invoice processing times, and a low percentage of invoices processed “touchless” or “straight-through”.
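One simple way to decide which invoices need a human check is to compare per-field confidence scores against a threshold; the share of invoices that pass untouched is then the straight-through rate. The minimal sketch below assumes the OCR engine reports a confidence between 0 and 1 for each field; `FieldResult`, the 0.95 threshold and the sample scores are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FieldResult:
    name: str
    value: str
    confidence: float  # 0.0 to 1.0, as reported by a hypothetical OCR engine


REVIEW_THRESHOLD = 0.95  # hypothetical cut-off; would be tuned per deployment


def needs_review(fields: List[FieldResult], threshold: float = REVIEW_THRESHOLD) -> bool:
    """Route the invoice to a human if any field falls below the threshold."""
    return any(f.confidence < threshold for f in fields)


def straight_through_rate(invoices: List[List[FieldResult]]) -> float:
    """Fraction of invoices processed without manual intervention ("touchless")."""
    if not invoices:
        return 0.0
    touchless = sum(1 for fields in invoices if not needs_review(fields))
    return touchless / len(invoices)


batch = [
    [FieldResult("invoice_number", "INV-20931", 0.99), FieldResult("total", "1250.00", 0.98)],
    [FieldResult("invoice_number", "INV-2O931", 0.80), FieldResult("total", "980.00", 0.97)],
    [FieldResult("invoice_number", "INV-20933", 0.97), FieldResult("total", "75.50", 0.96)],
    [FieldResult("invoice_number", "INV-20934", 0.99), FieldResult("total", "1O4.00", 0.90)],
]
print(straight_through_rate(batch))  # 0.5
```

Lowering the threshold raises the straight-through rate, but at the cost of letting more misreadings through unchecked.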
