Navigating unstructured data: The rise of question answering

Organisations deal with vast amounts of text-based data on a daily basis. Text and unstructured data in general are involved in some of the core parts of every business: customer communication, reference documentation, and reporting, to name a few.

Natural language processing (NLP) and other emerging cognitive technologies are capable of analysing textual information in ways never before possible and are becoming widely used to enhance, scale, and automate various business processes. Examples of such processes include customer support and service, internal knowledge base search, content management, and research.

These and many other processes often require reading through large amounts of text and finding exact answers to specific questions. The area of NLP that is concerned with building systems capable of automating these tasks is called machine reading comprehension, or more narrowly, question answering.

What is question answering?

Question answering systems are trained to answer user questions posed in a natural language by returning the relevant segments of the given text. For every question posed, they provide the exact locations of candidate answers.

Suppose that you need to analyse this excerpt from the Wikipedia article about Auckland:

Below are the questions you may ask about this passage, with the answers generated automatically by a question answering system:

Question: Where is Auckland located?
Answer: It is located in the Auckland Region, the area governed by Auckland Council, ...

Question: How many people live in Auckland?
Answer: ... Auckland has an urban population of about 1,467,800 (June 2019).

Question: What does Tāmaki Makaurau mean?
Answer: The Māori-language name for Auckland is Tāmaki Makaurau, meaning "Tāmaki desired by many", ...

In the examples above, the exact answers returned by the question answering implementation are highlighted in bold.
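To make the idea of "exact locations" concrete, a candidate answer can be represented as a pair of character offsets into the passage. The following is only a minimal sketch under stated assumptions: `AnswerSpan` and `highlight` are hypothetical names introduced for illustration, the offsets are looked up with a plain string search rather than predicted by a trained model, and the choice of "1,467,800" as the answer span is assumed for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnswerSpan:
    """A candidate answer expressed as character offsets into a passage (hypothetical type)."""
    start: int  # index of the first character of the answer
    end: int    # index one past the last character of the answer

    def text(self, passage: str) -> str:
        """Return the answer text that the offsets point at."""
        return passage[self.start:self.end]


def highlight(passage: str, span: AnswerSpan) -> str:
    """Return the passage with the answer span wrapped in ** markers."""
    return (
        passage[:span.start]
        + "**" + passage[span.start:span.end] + "**"
        + passage[span.end:]
    )


passage = "Auckland has an urban population of about 1,467,800 (June 2019)."

# Stand-in for a model prediction: a real system would predict these offsets
# from the question and the passage; here they are located by string search.
assumed_answer = "1,467,800"
start = passage.find(assumed_answer)
span = AnswerSpan(start=start, end=start + len(assumed_answer))

print(span.text(passage))
print(highlight(passage, span))
```

Keeping offsets rather than a copied string means the answer can always be shown inside its surrounding context, which is how the examples above present it.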

Question answering vs keyword-based search

While both question answering and keyword matching systems assist in navigating text content, there are some key differences between the two technologies.

First of all, unlike keyword-based search engines, question answering systems take questions posed in a natural language as an input rather than a set of keywords. For example, a question answering system will "understand" and find the answer to "How many people live in Auckland?", whereas a keyword-based search engine expects only a list of keywords.

Secondly and more importantly, rather than simply finding the occurrences of matched keywords in the given documents (as standard search engines do), question answering systems look for segments that answer the input questions. For the "How many people live in Auckland?" question, our question answering system did not highlight the occurrences of "people" or "Auckland" but rather highlighted the actual statistic mentioned in the article.
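For contrast, the keyword-matching behaviour described above can be sketched in a few lines. This is an illustrative baseline only, not how any particular search engine is implemented: `keyword_hits` is a hypothetical helper, and the passage is the single sentence quoted in the earlier example.

```python
import re


def keyword_hits(passage: str, query: str) -> list[tuple[str, int, int]]:
    """Return (keyword, start, end) for every whole-word, case-insensitive
    occurrence of each query keyword in the passage."""
    hits = []
    for keyword in query.split():
        pattern = r"\b" + re.escape(keyword) + r"\b"
        for match in re.finditer(pattern, passage, flags=re.IGNORECASE):
            hits.append((keyword, match.start(), match.end()))
    return hits


passage = "Auckland has an urban population of about 1,467,800 (June 2019)."

# Keyword search marks where the query terms occur, not where the answer is.
hits = keyword_hits(passage, "people Auckland")
print(hits)
```

In this sentence the only hit is the word "Auckland" itself. The word "people" never appears, and nothing points at the population figure, which is exactly the segment a question answering system is expected to return.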

Question answering in practice

Question answering and other NLP techniques are used across a range of industries as diverse as healthcare, law, and retail. Thanks to the recent advances in machine reading comprehension, these technologies now power intelligent applications for a wide variety of business processes within these industries.

For example, in the field of law, practitioners can easily get answers about legal cases by posing questions to intelligent assistants which are capable of analysing hundreds of pages of text and legal documentation in a matter of seconds.

The bottom line

Question answering and natural language processing in general help organisations efficiently navigate their data and allow them to improve operations, risk management, and other business processes. Used across a spectrum of business domains, question answering helps deal with overwhelming amounts of text and powers multiple intelligent applications that modernise the way businesses handle their data.

Made by Anton Vasetenkov.