Text extraction


While reviewing the reported quality feedback, we identified business documents where a value is visible to the human eye but was not captured. As a result, a default value was populated in the business document data file (XML). The default value can be blank or a dummy value, and it is defined in the implementation guide (SVG/DCG).
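
To illustrate how a value that was not captured ends up as a default in the data file, here is a minimal Python sketch. The element names (Invoice, PurchaseOrderNumber, SupplierName) and the default values are assumptions made for illustration only; the actual XML structure and defaults are defined in the implementation guide.

```python
import xml.etree.ElementTree as ET

# Hypothetical defaults: a blank value for the PO number and a dummy
# value for the supplier name. Real defaults come from the implementation guide.
DEFAULTS = {"PurchaseOrderNumber": "", "SupplierName": "UNKNOWN"}


def build_data_file(captured: dict) -> str:
    """Build a simplified XML data file from the captured values.

    Any field that was not captured (missing or empty) receives its default.
    """
    root = ET.Element("Invoice")
    for field, default in DEFAULTS.items():
        value = captured.get(field)
        node = ET.SubElement(root, field)
        node.text = value if value else default
    return ET.tostring(root, encoding="unicode")


# The supplier name was captured, but the PO number was not.
print(build_data_file({"SupplierName": "Example Supplier"}))
```

In this sketch the PO number element is written with the blank default even though a person looking at the PDF might see the PO number in a comment or an image.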

In these cases, the expected value has been added on top of the invoice file (PDF) as an additional layer, for example as a comment. Alternatively, the information can be inside an image while the rest of the PDF file is machine-readable (metadata). In automatic processing, PDF comments and image-based information cannot be extracted from the business document PDF file to the data file (XML).


Example A:

The supplier did not have the purchase order number (PO) available when the invoice was created in the supplier's own invoicing system. The invoice was sent to the buyer, who added the PO as a comment to the PDF file and then emailed the PDF file to the data capturing service. The invoice layout was templated in the data capture's automatic Gateway system, which only reads the technical data file (metadata). The comment layer is not part of the PDF metadata, so the PO was not extracted to the data file (XML). A PDF file cannot be edited without special software, but certain PDF programs allow comments to be placed on it.
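
As a rough way to spot such files before they are sent, the following Python sketch scans the raw bytes of a PDF for comment-style annotations. It is a heuristic under stated assumptions, not how the capture service works: it looks for the PDF annotation subtypes Text (sticky note) and FreeText, and it will miss annotations stored inside compressed object streams. The function name is hypothetical.

```python
import re

# Annotation subtypes used for sticky-note and free-text comments in PDF files.
_COMMENT_ANNOTATION = re.compile(rb"/Subtype\s*/(?:FreeText|Text)\b")


def may_contain_comments(pdf_bytes: bytes) -> bool:
    """Return True if the raw bytes mention a comment-style annotation.

    Heuristic only: a False result does not prove the file has no comments,
    because annotation objects can be stored in compressed streams.
    """
    return bool(_COMMENT_ANNOTATION.search(pdf_bytes))
```

For a file on disk, the check could be called as may_contain_comments(Path("invoice.pdf").read_bytes()), using pathlib.Path and a file name of your own. A positive result suggests that some information may exist only in the comment layer.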

Example B:

The supplier's company details are inside an image on their invoice layout. The image does not carry the information in a machine-readable format in the file's data. If the supplier details cannot be captured, the invoice is assigned a default supplier (unknown supplier) unless, for example, the supplier can be matched against the PO number in the invoice processing system.
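
The fallback described above can be sketched in Python as follows. The default supplier code, the function name, and the PO-to-supplier lookup table are assumptions for illustration; the real matching takes place in your invoice processing system.

```python
from typing import Dict, Optional

UNKNOWN_SUPPLIER = "UNKNOWN"  # hypothetical default supplier code


def resolve_supplier(
    captured_supplier: Optional[str],
    po_number: Optional[str],
    po_to_supplier: Dict[str, str],
) -> str:
    """Pick the supplier for an invoice.

    1. Use the captured supplier when capture succeeded.
    2. Otherwise try to match the supplier through the PO number.
    3. Otherwise fall back to the default (unknown) supplier.
    """
    if captured_supplier:
        return captured_supplier
    if po_number and po_number in po_to_supplier:
        return po_to_supplier[po_number]
    return UNKNOWN_SUPPLIER
```

With this order of checks, an image-only supplier block leads to the unknown supplier only when the PO number is also missing or cannot be matched.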

Customer actions

Example A:

If the supplier does not provide the value on the invoice itself, make the changes in your invoice processing system. Contact the supplier and ask them to avoid placing information as a layer on the PDF file. Supplier-facing instructions are available here.

Example B:

If the details cannot be selected and copied with a mouse from the PDF to the invoice processing system, they are most likely inside an image. It can be difficult for a supplier to change their invoice layout, so in this scenario Basware will ask whether you wish to change the capturing engine.

Basware actions

Example A:

This error does not require any actions from Basware.

Example B:

In the follow-up ticket (incident), Basware will ask how you wish to proceed, provided that the service allows the capturing engine to be changed.

How can I ask questions or raise suggestions?

If you have further questions about the "Text extraction" finding, or if you have suggestions, we welcome your feedback. Please contact Basware Support by filing the following support case.

For a comprehensive summary of the renewed quality feedback process and the selected improvements that have been applied, please review the Data capture feedback analysis process.