Understanding what graph embeddings are and why they are important for graph analytics.

Published on by Anton Vasetenkov

Topics: Machine learning

Graph data underpins a broad array of applications in industries ranging from transportation and telecom to banking and healthcare. As graphs become more pervasive, many organisations seek to use graph analytics and machine learning to derive insights from their graph data.

Instead of working with the graph data directly, many graph analytics implementations use graph embeddings, which are compressed vector representations of the graphs. Such representations enable a range of graph machine learning applications, including link prediction, similarity search, node classification, clustering, community detection, and anomaly detection.

Embedding is a common technique used in machine learning to represent complex discrete items like English words or nodes of a graph as vectors which encode the information contained in the data while greatly reducing its dimensionality.
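To make the dimensionality reduction concrete, here is a minimal sketch that contrasts a one-hot encoding (one dimension per item) with a dense embedding lookup. The node names, the embedding size of 2, and the random values are all assumptions chosen for illustration; in a real system the table entries would be learned rather than random.

```python
import random

# Hypothetical toy set of graph nodes (names are illustrative only).
nodes = ["alice", "bob", "carol", "dave", "erin", "frank"]
node_index = {name: i for i, name in enumerate(nodes)}


def one_hot(name):
    """Sparse baseline: one dimension per item, so the length equals len(nodes)."""
    vec = [0.0] * len(nodes)
    vec[node_index[name]] = 1.0
    return vec


# Assumed embedding size; real systems typically use tens to hundreds of dimensions.
EMBEDDING_DIM = 2

rng = random.Random(0)
# Embedding table: one dense row per node. The values are random placeholders
# here; in practice they are learned from the data.
embedding_table = [
    [rng.uniform(-1.0, 1.0) for _ in range(EMBEDDING_DIM)] for _ in nodes
]


def embed(name):
    """Look up the dense vector for a node."""
    return embedding_table[node_index[name]]


print(len(one_hot("carol")), "dimensions as one-hot")
print(len(embed("carol")), "dimensions as an embedding")
```

The one-hot vector grows with the number of items, whereas the embedding keeps a fixed, much smaller size regardless of how many nodes the graph contains.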

More specifically, graph embedding is the task of creating a vector representation for each node in a graph so that distances between these vectors predict the occurrence of edges in the graph. Intuitively, the generated embeddings act as "compressed" representations of the nodes that serve as feature vectors for downstream machine learning tasks.
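The idea that distances predict edges can be sketched in a few lines. The embeddings below are hand-picked rather than learned, and the scoring function and the 0.5 threshold are assumptions made for this illustration, not part of any particular library.

```python
import math

# Hypothetical 2-D embeddings, hand-picked so that two groups of nodes
# ("a", "b", "c" and "x", "y") sit close together.
embeddings = {
    "a": (0.0, 0.0),
    "b": (0.1, 0.2),
    "c": (0.2, 0.0),
    "x": (3.0, 3.0),
    "y": (3.1, 2.9),
}


def distance(u, v):
    """Euclidean distance between the vectors of two nodes."""
    return math.dist(embeddings[u], embeddings[v])


def edge_score(u, v):
    """Map a distance to a score in (0, 1]: closer vectors get a higher score."""
    return 1.0 / (1.0 + distance(u, v))


def predict_edge(u, v, threshold=0.5):
    """Predict an edge when the score reaches the (assumed) threshold."""
    return edge_score(u, v) >= threshold


for pair in [("a", "b"), ("x", "y"), ("a", "x")]:
    print(pair, round(edge_score(*pair), 3), predict_edge(*pair))
```

Nodes within the same group score high and are predicted to be linked, while nodes from different groups score low. This is the basic mechanism behind link prediction with embeddings.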

There are multiple graph embedding implementations that rely on different embedding algorithms. Widely used ones include node2vec, GraphSAGE, and PyTorch-BigGraph.

The goal of each of these algorithms is to "learn" a feature representation for each node in a given graph. The choice of algorithm commonly depends on the structure and size of the input graph. PyTorch-BigGraph, for example, can handle multi-entity/multi-relation graphs with billions of nodes and trillions of edges.
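As a rough illustration of how such learning can start, the sketch below generates random walks over a toy graph and turns them into (node, context) training pairs. This is a simplification: node2vec biases its walks with a return parameter and an in-out parameter, whereas the walks here are uniform, and the skip-gram training step that would consume the pairs is omitted. The graph, the walk length, and the window size are assumptions.

```python
import random

# Hypothetical undirected toy graph stored as an adjacency list.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}


def random_walk(graph, start, length, rng):
    """Uniform random walk: each step moves to a randomly chosen neighbour."""
    walk = [start]
    while len(walk) < length:
        neighbours = graph[walk[-1]]
        if not neighbours:  # dead end: stop early
            break
        walk.append(rng.choice(neighbours))
    return walk


def context_pairs(walk, window):
    """Pair each node with the nodes at most `window` steps away in the walk."""
    pairs = []
    for i, node in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((node, walk[j]))
    return pairs


rng = random.Random(42)
# Three walks of six nodes starting from every node in the graph.
walks = [random_walk(graph, node, length=6, rng=rng) for node in graph for _ in range(3)]
training_pairs = [pair for walk in walks for pair in context_pairs(walk, window=2)]

print(len(walks), "walks,", len(training_pairs), "training pairs")
```

Nodes that often appear near each other in walks end up in many shared pairs, which is what pushes their learned vectors close together in a skip-gram style model.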

Graph embeddings are used for building graph machine learning models which power a growing number of graph analytics and intelligence applications. This highlights the importance of graph embeddings and the algorithms used to generate them for graphs of different types and varying complexity.

