Made with ❤️ by Anton Vasetenkov
Document understanding: Modern techniques and real-world applications
Aug 29, 2020

How document understanding helps bring order to unstructured data.

Documents are at the centre of many business processes. Scanned pages and PDFs are ubiquitous and contain large amounts of information represented as forms and tables.

Historically, this information could only be analysed and used after manual data re-entry, a slow and error-prone process, because traditional optical character recognition (OCR) systems could not analyse such data and preserve its inherent structure in their output.

Document understanding extends document intelligence beyond plain text extraction to the retrieval of structured data. Relying heavily on machine learning, it has proven key to automating structured data extraction and making the extracted data readily accessible for subsequent processing and analysis.

How document understanding works

"Understanding" a document first involves detecting its layout and key elements such as figures, tables, and forms. These elements are then processed separately to extract the underlying data relationships.

Any embedded forms are parsed into sets of key-value pairs, each pair corresponding to a single form field. An example of a key-value pair is "First name"–"Alice". The sets of such linked data items can subsequently be inserted into a database, one row or document per form.
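The "one row per form" idea can be sketched in a few lines of Python. This is only an illustration: the `parsed_forms` list stands in for the output of a form parser, and the field names, the `forms` table, and the `store_forms` helper are all made up for the example.

```python
import sqlite3

# Hypothetical parser output: one dict of key-value pairs per form.
parsed_forms = [
    {"First name": "Alice", "Last name": "Smith"},
    {"First name": "Bob", "Last name": "Jones"},
]


def store_forms(conn, forms):
    """Insert each parsed form as one row; missing fields become NULL."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS forms (first_name TEXT, last_name TEXT)"
    )
    rows = [(form.get("First name"), form.get("Last name")) for form in forms]
    conn.executemany("INSERT INTO forms VALUES (?, ?)", rows)
    conn.commit()


# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
store_forms(conn, parsed_forms)
stored = conn.execute(
    "SELECT first_name, last_name FROM forms ORDER BY rowid"
).fetchall()
print(stored)
```

A document store would work equally well here, with each dict saved as one document instead of one row.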

Document understanding products and services

The easiest way to incorporate document understanding into production workflows is to use existing cloud services. The major cloud providers each offer machine learning-based services for text and document intelligence. These offerings are summarised in the following table:

| Service | Provider | Description |
| --- | --- | --- |
| Amazon Textract | Amazon Web Services | Amazon Textract parses form data and tables. The service is integrated with Amazon Augmented AI (Amazon A2I) for implementing human review. |
| Document AI | Google Cloud | Previously known as Document Understanding AI, Document AI is capable of parsing forms, tables, and invoice content (the invoice feature is only available for approved customers). A demo is available online. |
| Form Recognizer | Microsoft Azure | Form Recognizer extracts tables and key-value form pairs from documents and offers prebuilt models for analysing receipts and business cards. |
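To give a feel for what consuming such a service looks like, here is a rough sketch that turns a Textract-style form response into key-value pairs. It assumes the response is a list of blocks in which `KEY_VALUE_SET` blocks link to their values and to their `WORD` children through `Relationships`; that layout should be checked against the Amazon Textract documentation. The `sample_response` below is hand-written for illustration and is not real service output, and the helper names are made up. In practice the response would come from a call such as boto3's `analyze_document` with the `FORMS` feature type.

```python
def get_text(block, blocks_by_id):
    """Join the text of all WORD blocks that are children of `block`."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            for child_id in rel["Ids"]:
                child = blocks_by_id[child_id]
                if child["BlockType"] == "WORD":
                    words.append(child["Text"])
    return " ".join(words)


def extract_key_values(response):
    """Map each KEY block's text to the text of its linked VALUE blocks."""
    blocks_by_id = {block["Id"]: block for block in response["Blocks"]}
    pairs = {}
    for block in response["Blocks"]:
        is_key = (
            block["BlockType"] == "KEY_VALUE_SET"
            and "KEY" in block.get("EntityTypes", [])
        )
        if not is_key:
            continue
        value_parts = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                for value_id in rel["Ids"]:
                    value_parts.append(
                        get_text(blocks_by_id[value_id], blocks_by_id)
                    )
        pairs[get_text(block, blocks_by_id)] = " ".join(value_parts)
    return pairs


# Hand-written sample in the assumed response shape (not real output).
sample_response = {
    "Blocks": [
        {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
         "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                           {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
        {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
         "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
        {"Id": "w1", "BlockType": "WORD", "Text": "First"},
        {"Id": "w2", "BlockType": "WORD", "Text": "name"},
        {"Id": "w3", "BlockType": "WORD", "Text": "Alice"},
    ]
}

form_fields = extract_key_values(sample_response)
print(form_fields)
```

The sketch ignores checkboxes and other selection elements, which a production parser would also need to handle.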

Industry use cases

Document understanding is a key component of various emerging practical workflows and applications.

An example application of document understanding is invoice processing. Invoices are commonly sent as PDFs or paper documents that can be formatted in different ways but generally contain the same types of information, such as invoice date, amount due, and payment terms. By automatically recognising and extracting this information, cognitive invoicing systems speed up invoice processing and reduce the associated costs.
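Once the invoice fields have been extracted as key-value pairs, they usually need to be converted into typed values before a downstream system can use them. The following is a minimal sketch of that step. The field labels, the `Invoice` class, the ISO date format, and the dollar-formatted amount are all assumptions made for the example; real invoices vary widely and would need more robust parsing.

```python
from dataclasses import dataclass
from datetime import date, datetime
from decimal import Decimal


@dataclass
class Invoice:
    invoice_date: date
    amount_due: Decimal
    payment_terms: str


def normalise_invoice(fields):
    """Convert extracted strings into typed invoice fields.

    Assumes ISO dates (YYYY-MM-DD) and amounts such as "$1,250.00".
    """
    raw_amount = fields["Amount due"].replace("$", "").replace(",", "").strip()
    return Invoice(
        invoice_date=datetime.strptime(
            fields["Invoice date"].strip(), "%Y-%m-%d"
        ).date(),
        amount_due=Decimal(raw_amount),
        payment_terms=fields["Payment terms"].strip(),
    )


# Hypothetical key-value pairs extracted from one invoice.
extracted = {
    "Invoice date": "2020-08-29",
    "Amount due": "$1,250.00",
    "Payment terms": "Net 30",
}
invoice = normalise_invoice(extracted)
print(invoice)
```

Using `Decimal` rather than `float` for the amount avoids rounding surprises when the values are later summed or compared.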

The bottom line

By automating manual document activities, document understanding enables organisations to process documents more efficiently, reduce errors, and bring down costs. By helping extract the valuable information stored inside scanned and digital documents, it also supports search, discovery, and compliance control for these documents.

The extracted structured data can be ingested by various downstream business applications, enabling smarter workflows and more advanced processing at scale.
