A technical introduction to OpenAI's GPT-3 language model

Introduced in May 2020, Generative Pre-trained Transformer 3 (GPT-3) is OpenAI's third-generation predictive language model. Widely regarded at its release as one of the most powerful NLP technologies available, it has sparked many discussions in the tech and business communities about its potential use cases and its impact on existing business processes and applications.

While some aspects of the GPT-3 model described in the paper "Language Models are Few-Shot Learners" may seem too technical for readers who are not AI researchers, it is worth zooming in on a few key features of the model to better understand what it does and how it can be used in practice.

What is GPT-3?

Strictly speaking, GPT-3 is a family of autoregressive language models that includes GPT-3 Small, GPT-3 Medium, GPT-3 Large, GPT-3 XL, GPT-3 2.7B, GPT-3 6.7B, GPT-3 13B, and GPT-3 175B. Introduced in the paper "Language Models are Few-Shot Learners", these models all share the same transformer-based architecture, which is similar to that of their predecessor, GPT-2. All GPT-3 models are trained on a mixture of datasets: Common Crawl, WebText2, Books1, Books2, and English-language Wikipedia.

GPT-3 175B is the largest of the GPT-3 models and is commonly referred to simply as "GPT-3". With 175 billion trainable parameters, it is about two orders of magnitude larger than the 1.5-billion-parameter GPT-2.
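The "two orders of magnitude" figure can be checked with a quick calculation. The sketch below uses only the two parameter counts quoted above; the variable names are illustrative.

```python
import math

# Parameter counts as quoted in the text above.
GPT2_PARAMS = 1.5e9   # GPT-2
GPT3_PARAMS = 175e9   # GPT-3 175B

# How many times larger GPT-3 175B is than GPT-2.
ratio = GPT3_PARAMS / GPT2_PARAMS

# An order of magnitude is a factor of 10, so take the base-10 logarithm.
orders_of_magnitude = math.log10(ratio)

print(f"GPT-3 175B is about {ratio:.0f}x larger than GPT-2")
print(f"That is about {orders_of_magnitude:.2f} orders of magnitude")
```

The ratio comes out at a little over 100, which is where "about two orders of magnitude" comes from.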

According to OpenAI's paper, GPT-3 175B outperforms other large-scale models on a number of NLP tasks. The paper describes it as a meta-learning model: having developed a broad set of skills and pattern recognition abilities during unsupervised pre-training, it can both recognize the desired task and rapidly adapt to it at inference time, using only an instruction and a few examples supplied in the input text.
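To make "adapting to the task at inference time" more concrete, here is a minimal sketch of how a few-shot prompt might be assembled. The helper name build_few_shot_prompt, the "=>" separator, and the translation examples are illustrative assumptions, not a format required by the model. The point is only that the task is specified entirely in the input text, with no change to the model's weights.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    task_description: str,
    examples: List[Tuple[str, str]],
    query: str,
) -> str:
    """Assemble a plain-text prompt: task description, worked examples, open query."""
    lines = [task_description, ""]
    # Each demonstration shows the model an input and its expected output.
    for source, target in examples:
        lines.append(f"{source} => {target}")
    # The final line is left incomplete for the model to continue.
    lines.append(f"{query} =>")
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Translate English to French:",
    [("sea otter", "loutre de mer"), ("cheese", "fromage")],
    "peppermint",
)
print(prompt)
```

An autoregressive model given this text would be expected to continue it with the French translation of the last item. Passing an empty list of examples gives the zero-shot variant of the same prompt.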

OpenAI API, or GPT-3-as-a-Service

In June 2020, OpenAI launched an API product that can be used to access the AI models developed by the company, including those based on GPT-3. Available in a private beta, the OpenAI API offers a general-purpose "text in, text out" interface and enables users to experiment with GPT-3-based models, explore their strengths and weaknesses, and integrate them into their own products.
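One way to picture a "text in, text out" interface from the point of view of an integrating application is sketched below. Everything in it is hypothetical: TextCompleter, CannedCompleter, and summarize are names invented for illustration, and the stand-in backend returns a fixed string instead of calling any real service. It does not reproduce the actual OpenAI API; it only shows the shape of the contract (a string goes in, a string comes out).

```python
from typing import Protocol


class TextCompleter(Protocol):
    """Hypothetical contract: take a text prompt, return a text completion."""

    def complete(self, prompt: str) -> str: ...


class CannedCompleter:
    """Stand-in backend that returns a fixed reply; makes no network calls."""

    def __init__(self, reply: str) -> None:
        self.reply = reply

    def complete(self, prompt: str) -> str:
        return self.reply


def summarize(text: str, model: TextCompleter) -> str:
    """Application code depends only on the text-in, text-out contract."""
    prompt = f"{text}\n\ntl;dr:"
    return model.complete(prompt).strip()


result = summarize(
    "GPT-3 is a family of autoregressive language models.",
    CannedCompleter(" A family of language models. "),
)
print(result)
```

Because the application only depends on the complete method, the canned backend could later be swapped for a client that calls a hosted model without touching summarize.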

Bottom line

The GPT-3 autoregressive language model made its debut in May 2020 and marked an important milestone in NLP research. Trained on a large internet-based text corpus, it has 175 billion parameters and is about two orders of magnitude larger than its predecessor, GPT-2.

A number of models based on GPT-3 are available via the OpenAI API, OpenAI's commercial product released in private beta in June 2020.

Made by Anton Vasetenkov.
