Navigating unstructured data: The rise of question answering

Question answering technologies are key to efficiently dealing with overwhelming amounts of unstructured data.

Published by Anton Vasetenkov

Organisations deal with vast amounts of text-based data on a daily basis. Text and unstructured data in general are involved in some of the core parts of every business: customer communication, reference documentation, and reporting, to name a few.

Natural language processing (NLP) and other emerging cognitive technologies can analyse textual information in ways that were not possible before, and they are increasingly used to enhance, scale, and automate business processes. Examples of such processes include customer support and service, internal knowledge base search, content management, and research.

These and many other examples often require reading through large amounts of text and finding exact answers to given questions. The area of NLP that is concerned with building systems capable of automating these tasks is called machine reading comprehension, or more narrowly, question answering.

What is question answering?

Question answering systems are trained to answer user questions posed in a natural language by returning the relevant segments of the given text. For every question posed, they provide the exact locations of candidate answers.
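Many extractive question answering models (for example, BERT-style readers fine-tuned on SQuAD-like data) express a candidate answer as a start position and an end position in the text. The sketch below shows a simplified version of that span-selection step only: given a start score and an end score for every token, it returns the highest-scoring spans whose start does not come after their end. The tokens and scores are invented for illustration and are not the output of any real model; `best_spans`, `max_answer_len`, and `top_k` are hypothetical names chosen for this example.

```python
# Hypothetical tokens and scores, invented for illustration (not real model output).
TOKENS = ["Auckland", "has", "an", "urban", "population", "of", "about", "1,467,800"]
START_SCORES = [0.1, -1.0, -1.0, -0.5, 0.2, -1.0, 2.0, 1.5]
END_SCORES = [-1.0, -1.0, -1.0, -0.5, 0.3, -1.0, -0.5, 2.5]


def best_spans(start_scores, end_scores, max_answer_len=30, top_k=3):
    """Return the top_k (score, start, end) candidate spans.

    A span (i, j) is valid when i <= j and it covers at most max_answer_len
    tokens; its score is start_scores[i] + end_scores[j].
    """
    n = len(start_scores)
    candidates = [
        (start_scores[i] + end_scores[j], i, j)
        for i in range(n)
        for j in range(i, min(n, i + max_answer_len))
    ]
    # Highest combined score first.
    candidates.sort(key=lambda c: c[0], reverse=True)
    return candidates[:top_k]


if __name__ == "__main__":
    for score, i, j in best_spans(START_SCORES, END_SCORES):
        print(f"{score:.1f}  tokens {i}..{j}: {' '.join(TOKENS[i:j + 1])}")
```

With these made-up scores the top candidate is tokens 6..7, "about 1,467,800", followed by "1,467,800" alone. In a real reader the scores come from the model, and the chosen token positions are typically mapped back to character offsets in the original text, which is what lets the system point at the exact location of each answer.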

Suppose that you need to analyse this excerpt from the Wikipedia article about Auckland:

Auckland is a metropolitan city in the North Island of New Zealand. The most populous urban area in the country, Auckland has an urban population of about 1,467,800 (June 2019). It is located in the Auckland Region, the area governed by Auckland Council, which includes outlying rural areas and the islands of the Hauraki Gulf, resulting in a total population of 1,642,800. Auckland is a diverse, multicultural and cosmopolitan city, home to the largest Polynesian population in the world. The Māori-language name for Auckland is Tāmaki Makaurau, meaning "Tāmaki desired by many", in reference to the desirability of its natural resources and geography.

Below are the questions you may ask about this passage, with the answers generated automatically by a question answering system:

Q: Where is Auckland located?
A: It is located in the Auckland Region, the area governed by Auckland Council, ...

Q: How many people live in Auckland?
A: ... Auckland has an urban population of about 1,467,800 (June 2019).

Q: What does Tāmaki Makaurau mean?
A: The Māori-language name for Auckland is Tāmaki Makaurau, meaning "Tāmaki desired by many", ...

In the examples above, the exact answers returned by the question answering implementation are highlighted in bold.
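Displaying an answer the way the examples do comes down to taking the answer's character span, finding the sentence that contains it, and marking the span inside that sentence. Below is a minimal sketch under stated assumptions: a naive sentence boundary (a period followed by whitespace or the end of the text), Markdown-style `**` markers standing in for bold, and a hypothetical answer span "about 1,467,800". The function name `highlight_in_context` is also illustrative.

```python
import re

TEXT = (
    "Auckland is a metropolitan city in the North Island of New Zealand. "
    "The most populous urban area in the country, Auckland has an urban "
    "population of about 1,467,800 (June 2019)."
)

# Naive sentence boundary (an assumption): a period followed by whitespace or end of text.
SENTENCE_END = re.compile(r"\.(\s+|$)")


def highlight_in_context(text, start, end, marker="**"):
    """Return the sentence containing text[start:end], with that span wrapped in marker."""
    # The sentence begins right after the last boundary that ends at or before `start`.
    sent_start = 0
    for m in SENTENCE_END.finditer(text):
        if m.end() <= start:
            sent_start = m.end()
        else:
            break
    # The sentence ends at the first boundary period (kept) at or after `end`.
    m = SENTENCE_END.search(text, end)
    sent_end = m.start() + 1 if m else len(text)
    return text[sent_start:start] + marker + text[start:end] + marker + text[end:sent_end]


if __name__ == "__main__":
    answer = "about 1,467,800"  # hypothetical answer span
    start = TEXT.find(answer)
    print(highlight_in_context(TEXT, start, start + len(answer)))
```

Run as a script, this prints the second sentence with "about 1,467,800" wrapped in `**`. A production system would use a proper sentence segmenter, since a period followed by a space also appears in abbreviations.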

Question answering vs keyword-based search

While both question answering and keyword matching systems assist in navigating text content, there are some key differences between the two technologies.

First of all, unlike keyword-based search engines, question answering systems take a question posed in natural language as input rather than a set of keywords. For example, a question answering system will "understand" and find the answer to "How many people live in Auckland?", whereas a keyword-based search engine only looks for the individual words it is given.

Secondly, and more importantly, rather than simply finding occurrences of the matched keywords in the given documents (as standard search engines do), question answering systems look for segments that answer the input question. For the question "How many people live in Auckland?", our question answering system did not highlight occurrences of "people" or "Auckland" but rather highlighted the actual statistic mentioned in the article.
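To make the contrast concrete, here is a naive keyword matcher run on the Auckland excerpt. It is a sketch under simple assumptions (a small hand-picked stopword list and whole-word, case-insensitive matching), not a model of how any particular search engine works. For "How many people live in Auckland?", the passage never uses the words "people" or "live", "many" matches only the unrelated phrase "desired by many", and "Auckland" appears in every sentence, so ranking sentences by keyword hits does not put the population sentence first.

```python
import re

PASSAGE = (
    "Auckland is a metropolitan city in the North Island of New Zealand. "
    "The most populous urban area in the country, Auckland has an urban "
    "population of about 1,467,800 (June 2019). It is located in the Auckland "
    "Region, the area governed by Auckland Council, which includes outlying "
    "rural areas and the islands of the Hauraki Gulf, resulting in a total "
    "population of 1,642,800. Auckland is a diverse, multicultural and "
    "cosmopolitan city, home to the largest Polynesian population in the "
    "world. The Māori-language name for Auckland is Tāmaki Makaurau, meaning "
    "\"Tāmaki desired by many\", in reference to the desirability of its "
    "natural resources and geography."
)

# A tiny, hand-picked stopword list (an assumption for this sketch).
STOPWORDS = {"how", "in", "is", "the", "a", "what", "where", "does"}


def keywords(question: str) -> list[str]:
    """Lowercase the question, split it into words, and drop stopwords."""
    words = re.findall(r"\w+", question.lower())
    return [w for w in words if w not in STOPWORDS]


def keyword_hits(question: str, text: str) -> dict[str, list[tuple[int, int]]]:
    """Return the character offsets of every whole-word keyword occurrence."""
    hits = {}
    for kw in keywords(question):
        pattern = r"\b" + re.escape(kw) + r"\b"
        hits[kw] = [m.span() for m in re.finditer(pattern, text, re.IGNORECASE)]
    return hits


def rank_sentences(question: str, text: str) -> list[tuple[int, str]]:
    """Score each sentence by its total number of keyword hits, highest first."""
    sentences = re.split(r"(?<=\.)\s+", text)
    scored = [
        (sum(len(spans) for spans in keyword_hits(question, s).values()), s)
        for s in sentences
    ]
    # sorted() is stable, so tied sentences keep their order in the passage.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    q = "How many people live in Auckland?"
    for kw, spans in keyword_hits(q, PASSAGE).items():
        print(f"{kw!r}: {len(spans)} match(es) at {spans}")
    for score, sentence in rank_sentences(q, PASSAGE)[:2]:
        print(score, sentence)
```

The matcher can only report where the question's words occur; it has no notion that "population" is what the question is about, which is exactly the gap question answering systems aim to close.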

Question answering in practice

Question answering and other NLP techniques are used in industries as diverse as healthcare, law, and retail. Recent advances in machine reading comprehension have led to these technologies powering a growing number of intelligent applications for a wide variety of business processes within these industries.

For example, in the field of law, practitioners can easily get answers about legal cases by posing questions to intelligent assistants which are capable of analysing hundreds of pages of text and legal documentation in a matter of seconds.

The bottom line

Question answering and natural language processing in general help organisations efficiently navigate their data and allow them to improve operations, risk management, and other business processes. Used across a spectrum of business domains, question answering helps deal with overwhelming amounts of text and powers multiple intelligent applications that modernise the way businesses handle their data.
