Made with ❤️ by Anton Vasetenkov
Linked data for the enterprise: Focus on Bayer's corporate asset register
Oct 5, 2020

An overview of COLID, the data asset management platform built using semantic technologies.

Organisations that know how to make the best use of all their data position themselves for long-term success. As companies, large and small, continue to collect and generate more and more data assets, it gets harder for them to manage and organise those assets without modern tools and the necessary infrastructure.

Even though a number of enterprise data management products are available, some innovative companies choose to build their own solutions that better suit their needs. The Corporate Linked Data Catalog, or COLID for short, is one such solution: an end-to-end cloud-native platform for centrally storing and managing company assets. Built by Bayer, one of the world's largest pharmaceutical companies, COLID has been used by all of Bayer's divisions to organise drug research data, prescriber and sales information, and other company assets.

The case for linked and FAIR data in the enterprise

What makes COLID stand out is its use of semantic web technologies. Its main datastore is an RDF graph database, and it makes the data accessible using SPARQL, a query language for RDF. Graph structures are a natural way to describe the interdependencies between things such as company assets, and using RDF and SPARQL additionally makes the data interoperable and reusable, whether it is published on the web or only shared and used internally within an organisation. Together with findability and accessibility, interoperability and reusability constitute the four FAIR properties of data that make it more valuable to an organisation. FAIRness of the data has been the key requirement for building COLID, differentiating it from most other data management solutions.
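To make the graph idea concrete, here is a minimal sketch in plain Python of how assets and their interdependencies can be written as subject-predicate-object triples and then queried by pattern. It is a toy stand-in for an RDF store, not COLID's implementation: the `ex:` identifiers, the `ex:derivedFrom` property, and the `match` and `upstream` helpers are all hypothetical and chosen only for illustration. In SPARQL, a traversal like the one in `upstream` would typically be expressed with a property path.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical asset identifiers and properties, for illustration only.
# They are not taken from COLID's actual vocabulary.
TRIPLES: Set[Triple] = {
    ("ex:trial-dataset", "rdf:type", "ex:Dataset"),
    ("ex:dose-model", "rdf:type", "ex:Model"),
    ("ex:dose-model", "ex:derivedFrom", "ex:trial-dataset"),
    ("ex:sales-report", "rdf:type", "ex:Document"),
    ("ex:sales-report", "ex:derivedFrom", "ex:dose-model"),
}


def match(
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield every triple that fits the pattern; None acts as a wildcard."""
    for triple in TRIPLES:
        if (
            (s is None or triple[0] == s)
            and (p is None or triple[1] == p)
            and (o is None or triple[2] == o)
        ):
            yield triple


def upstream(asset: str) -> Set[str]:
    """Follow ex:derivedFrom links transitively to find all source assets."""
    seen: Set[str] = set()
    stack = [asset]
    while stack:
        current = stack.pop()
        for _, _, source in match(s=current, p="ex:derivedFrom"):
            if source not in seen:
                seen.add(source)
                stack.append(source)
    return seen


# Which assets does the sales report ultimately depend on?
dependencies = upstream("ex:sales-report")
```

Because every fact has the same three-part shape, new kinds of assets and relationships can be added without changing the query logic, which is part of what makes graph data easy to reuse.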

The metadata and semantic links stored in COLID's RDF graph also help categorise the assets and catalogue their business, technical, and operational characteristics. This aids in the discovery of data inside COLID's data marketplace and its reuse within the organisation.
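As a rough illustration of how catalogued metadata can support discovery, the sketch below indexes assets by their characteristics and looks them up by a combination of facets. It is a simplified, assumption-laden example rather than a description of COLID's data marketplace: the asset names, the `ex:assetType`, `ex:businessDomain`, and `ex:storageFormat` properties, and both helper functions are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical metadata statements (asset, property, value), for illustration only.
METADATA: List[Tuple[str, str, str]] = [
    ("ex:trial-dataset", "ex:assetType", "Dataset"),
    ("ex:trial-dataset", "ex:businessDomain", "Research"),
    ("ex:trial-dataset", "ex:storageFormat", "Parquet"),
    ("ex:prescriber-list", "ex:assetType", "Dataset"),
    ("ex:prescriber-list", "ex:businessDomain", "Sales"),
    ("ex:drug-ontology", "ex:assetType", "Ontology"),
    ("ex:drug-ontology", "ex:businessDomain", "Research"),
]

Index = Dict[Tuple[str, str], Set[str]]


def build_index(statements: List[Tuple[str, str, str]]) -> Index:
    """Map each (property, value) pair to the set of assets that carry it."""
    index: Index = defaultdict(set)
    for asset, prop, value in statements:
        index[(prop, value)].add(asset)
    return index


def find_assets(index: Index, facets: Dict[str, str]) -> Set[str]:
    """Return the assets that satisfy every requested facet."""
    matches = [index.get((prop, value), set()) for prop, value in facets.items()]
    if not matches:
        return set()
    return set.intersection(*matches)


index = build_index(METADATA)

# Find research datasets by combining a technical and a business facet.
research_datasets = find_assets(
    index, {"ex:assetType": "Dataset", "ex:businessDomain": "Research"}
)
```

Here `research_datasets` contains only `ex:trial-dataset`, since it is the only asset that is both a dataset and tagged with the research domain.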

Bottom line

Bayer's COLID is an end-to-end technological solution for centrally storing and organising company assets such as documents, datasets, ontologies, models, and reference data. An enterprise-grade linked data register, it enables FAIR data within the corporate environment and can support a wide range of complex decisions.

Recognising the benefits of metadata and semantic technologies puts innovative companies like Bayer at a significant advantage. Ontology-based solutions like COLID will likely continue to play a major role in shaping the enterprise data management and governance of tomorrow.

See also
An introduction to Semantic Web Browser
Navigating the Semantic Web and retrieving the structured data about entities made easy with Semantic Web Browser.
AstraZeneca's knowledge graph: Drug discovery is a lot about connections
The biomedical knowledge graph built by AstraZeneca helps the company find new drugs and drug targets.
What does a knowledge engineer do?
An overview of knowledge engineering and the core competencies and responsibilities of a knowledge engineer.
Data discovery at Uber: The continued success of Databook
How Uber's in-house platform powers discovery, exploration, and knowledge at scale.
How a custom solution helps Facebook's engineers discover the data they need
The story of Nemo, Facebook's internal data discovery engine.
One schema, one API: Inside the world of Data Commons
Data Commons brings thousands of public datasets together into one data graph to give data analysts and researchers a jump-start on analysing open data.
A network of drugs: The New Zealand Medicines Terminology
An overview of New Zealand's drug vocabulary.
Data exploration on linked COVID-19 datasets
An overview of the available RDF datasets and discovery tools for COVID-19.
Towards more linked lexicographical data: Lexemes on Wikidata
A glimpse into the meaning and other properties of words described with structured and linked data.
Navigating unstructured data: The rise of question answering
Question answering technologies are key to efficiently dealing with overwhelming amounts of unstructured data.
Let's explore the Nobel Prize dataset
An overview of the official Nobel Prize Linked Data dataset with some example SPARQL queries.