One schema, one API: Inside the world of Data Commons

Data Commons brings thousands of public datasets together into one data graph to give data analysts and researchers a jump-start on analysing open data.

Published by Anton Vasetenkov

Data analysts, software developers, and researchers use data from government agencies, intergovernmental organisations, and other authoritative sources every day. Each public dataset is relatively easy to access and use on its own. Most business and research questions, however, can only be answered by combining several data sources, and only after those sources have been carefully and meaningfully joined.
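A careful join starts by reconciling how each source identifies the same entity. The minimal Python sketch below assumes two hypothetical tables that refer to the same U.S. states in different ways: full names in one, postal codes in the other. The hand-written mapping tables and the numeric values are illustrative placeholders, not real statistics. The shared keys are DCID-style identifiers such as geoId/06.

```python
# Hypothetical source 1: keyed by full state name (placeholder values).
population_by_name = {"California": 100, "Kentucky": 20, "Maryland": 30}

# Hypothetical source 2: keyed by postal code (placeholder values).
income_by_code = {"CA": 7, "KY": 5, "MD": 9}

# The "careful" step: map each source's own identifiers to one shared key.
# These mapping tables are hand-written assumptions for the example.
NAME_TO_DCID = {"California": "geoId/06", "Kentucky": "geoId/21", "Maryland": "geoId/24"}
CODE_TO_DCID = {"CA": "geoId/06", "KY": "geoId/21", "MD": "geoId/24"}


def join_on_dcid(left, left_keys, right, right_keys):
    """Inner-join two dicts after re-keying both by a shared DCID."""
    left_by_dcid = {left_keys[k]: v for k, v in left.items()}
    right_by_dcid = {right_keys[k]: v for k, v in right.items()}
    shared = left_by_dcid.keys() & right_by_dcid.keys()
    return {dcid: (left_by_dcid[dcid], right_by_dcid[dcid]) for dcid in sorted(shared)}


joined = join_on_dcid(population_by_name, NAME_TO_DCID, income_by_code, CODE_TO_DCID)
for dcid, (population, income) in joined.items():
    print(dcid, population, income)
```

The inner join is one design choice: it drops entities that appear in only one source. An outer join would keep those entities with missing values instead.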

Data Commons

A unified data model

Although it is an amalgamation of numerous public datasets, Data Commons is itself a single dataset. It is structured around a large collection of entities such as places, people, and organisations. All data flowing into the Data Commons graph is treated as facts about those entities. This makes the result a valuable source of meaningfully combined, entity-oriented data.
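The "facts about entities" view can be sketched as (entity, property, value) triples merged from several sources into one structure keyed by entity. This is a minimal illustration with a few assumptions:

- Each source already uses DCIDs as entity keys.
- The source lists are hypothetical inputs, not an extract of the real graph.
- The property names simply mirror those used in this post.

```python
from collections import defaultdict

# Hypothetical sources, each emitting (entity DCID, property, value) triples.
source_a = [
    ("geoId/06", "name", "California"),
    ("geoId/06", "typeOf", "State"),
]
source_b = [
    ("geoId/06", "name", "California"),  # same fact from a second source
    ("geoId/06", "containedInPlace", "country/USA"),
    ("geoId/21", "name", "Kentucky"),
]


def build_graph(*sources):
    """Merge triples into an entity -> property -> set-of-values mapping."""
    graph = defaultdict(lambda: defaultdict(set))
    for triples in sources:
        for entity, prop, value in triples:
            graph[entity][prop].add(value)
    return graph


graph = build_graph(source_a, source_b)
for prop, values in sorted(graph["geoId/06"].items()):
    print(prop, sorted(values))
```

Storing a set per property means the same fact arriving from two sources collapses into a single value. Facts from different sources about the same entity end up side by side.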

The data vocabulary used to structure the Data Commons graph builds upon Schema.org, the most widely used vocabulary for structured data on the web, and is documented at schema.datacommons.org.

Accessing the Data Commons graph

The Data Commons knowledge graph is a source of demographic, housing, education, and other data that can power a range of tools and visualisations. Notable users of Data Commons data already include Google Search and the Common Knowledge Project.

As an RDF-style knowledge graph, Data Commons can be queried using SPARQL. For example, the following query returns the names of three U.S. states along with their DCIDs (Data Commons identifiers):

SELECT ?name ?dcid
WHERE {
    ?place typeOf Place .
    ?place name ?name .
    ?place dcid ("geoId/06" "geoId/21" "geoId/24") .
    ?place dcid ?dcid
}

Query result:

name          dcid
"Maryland"    "geoId/24"
"Kentucky"    "geoId/21"
"California"  "geoId/06"
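The toy Python evaluator below shows what the triple patterns in the WHERE clause above do. It is a sketch under stated assumptions, not how Data Commons executes SPARQL:

- The in-memory triples are hand-written stand-ins for the graph.
- A fourth place (geoId/36, New York) is added only to show that the dcid list filters it out.
- A Python tuple in the object position stands in for the query's ("geoId/06" "geoId/21" "geoId/24") list.

```python
# Hand-written stand-in for part of the graph; each node is keyed by its DCID.
PLACES = {"geoId/06": "California", "geoId/21": "Kentucky",
          "geoId/24": "Maryland", "geoId/36": "New York"}
TRIPLES = [
    triple
    for dcid, name in PLACES.items()
    for triple in ((dcid, "typeOf", "Place"), (dcid, "name", name), (dcid, "dcid", dcid))
]

# The WHERE clause as (subject, predicate, object) patterns.
PATTERN = [
    ("?place", "typeOf", "Place"),
    ("?place", "name", "?name"),
    ("?place", "dcid", ("geoId/06", "geoId/21", "geoId/24")),
    ("?place", "dcid", "?dcid"),
]


def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def match_term(term, value, binding):
    """Return a (possibly extended) binding if value fits term, else None."""
    if is_var(term):
        if term in binding:
            return binding if binding[term] == value else None
        return {**binding, term: value}
    if isinstance(term, tuple):  # list of allowed constants
        return binding if value in term else None
    return binding if term == value else None


def evaluate(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        b = binding
        for term, value in zip(first, triple):
            b = match_term(term, value, b)
            if b is None:
                break
        if b is not None:
            yield from evaluate(rest, triples, b)


rows = sorted((b["?name"], b["?dcid"]) for b in evaluate(PATTERN, TRIPLES))
print(rows)
```

Each pattern either binds a new variable or checks one bound by an earlier pattern. That is why ?place ties the name and the dcid to the same node. SPARQL does not guarantee row order without ORDER BY, so the sketch sorts the rows before printing.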

The Data Commons graph can also be accessed programmatically through the Python helper libraries, or from Google Sheets via the Data Commons add-on.

The bottom line

Data Commons compiles and joins thousands of open datasets into a single knowledge graph which provides a unified view across all data sources, making the data more accessible and readily available for research and analysis.
