What is a Large Language Model (LLM)?

Large Language Models (LLMs) have become the most talked-about AI algorithms in 2023, but what are they?

Large Language Models (LLMs) have become extremely popular (and well publicised) since the release of ChatGPT, which was built on GPT-3.5, in November 2022. LLMs are a type of AI (what is AI?), specifically a method of Natural Language Processing (what is NLP?).

LLMs use Machine Learning (what is ML?) to learn, in very basic terms, the best way to predict the next word. They do so by learning words, the concepts behind them and the links between words, then using probabilities to determine which word is most likely to come next. Because they take the surrounding words into account, they can tell whether minute (as in time) or minute (as in size) is meant in the sentence: "I will be one minute" (time).
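To make the idea of "using probabilities to pick the next word" concrete, here is a minimal sketch in Python. It is not how a real LLM works internally (those use large neural networks rather than simple word counts); it only counts which word follows which in a tiny made-up corpus and turns the counts into probabilities. The corpus and the function names are assumptions invented for this illustration.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus (an assumption, for illustration only).
CORPUS = (
    "i will be one minute . "
    "i will be one hour . "
    "i will be there soon . "
    "wait one minute please ."
)


def train_bigram_counts(text):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    words = text.split()
    for current_word, next_word in zip(words, words[1:]):
        counts[current_word][next_word] += 1
    return counts


def next_word_probabilities(counts, word):
    """Turn the raw follow-on counts for `word` into probabilities."""
    followers = counts.get(word, Counter())
    total = sum(followers.values())
    if total == 0:
        return {}  # the word never appeared in the corpus
    return {w: c / total for w, c in followers.items()}


def predict_next_word(counts, word):
    """Return the most probable next word, or None for an unseen word."""
    probs = next_word_probabilities(counts, word)
    if not probs:
        return None
    return max(probs, key=probs.get)


counts = train_bigram_counts(CORPUS)
print(next_word_probabilities(counts, "one"))  # "minute" is twice as likely as "hour"
print(predict_next_word(counts, "one"))        # prints: minute
```

This toy version only looks at the single previous word. A real LLM looks at a much longer stretch of the preceding text, which is what lets it work out which meaning of "minute" is intended.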

Why Are They Large?

The "large" in LLM refers to scale: both the number of parameters in the model and the size of the dataset used when creating (often called training) it. Unlike earlier NLP models, which were often trained on small datasets (usually a specific set of data, such as IMDB reviews or customer chat logs), these models are trained on enormous datasets, largely created by scraping text off the internet.

Why Have They Suddenly Become Popular?

Data Size

As computing power has advanced, specifically the hardware needed for AI (sometimes called AI chips or AI accelerators), LLMs can be made larger and trained using more and more data. The evolution of OpenAI's GPT (which stands for Generative Pre-trained Transformer) models shows this well. Parameters are the internal values a model learns during training, so the parameter count is a measure of the model's size:

GPT-1 - 0.12 billion parameters - June 2018
GPT-2 - 1.5 billion parameters - 14 February 2019
GPT-3 - 175 billion parameters - 11 June 2020 (GPT-3.5 on 15 March 2022)
GPT-4 - not disclosed (rumoured to be greater than 1 trillion parameters) - 14 March 2023
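To get a feel for how quickly these numbers grew, the short Python sketch below works out the growth factor between generations, along with a rough estimate of how much storage the parameters alone would need. The figure of 2 bytes per parameter is an assumption (it corresponds to storing each value as a 16-bit number), and GPT-4 is left out because its size has not been published.

```python
# Parameter counts in billions, taken from the list above.
# GPT-4 is omitted because its parameter count is not public.
PARAMETERS_BILLIONS = {"GPT-1": 0.12, "GPT-2": 1.5, "GPT-3": 175.0}

# Assumption: each parameter is stored as a 16-bit (2-byte) number.
BYTES_PER_PARAMETER = 2


def growth_factors(sizes):
    """How many times larger each model is than the one before it."""
    names = list(sizes)
    return {
        f"{previous} -> {current}": sizes[current] / sizes[previous]
        for previous, current in zip(names, names[1:])
    }


def storage_gb(billions_of_parameters, bytes_per_parameter=BYTES_PER_PARAMETER):
    """Rough storage for the parameters alone, in gigabytes (10^9 bytes)."""
    return billions_of_parameters * bytes_per_parameter


for step, factor in growth_factors(PARAMETERS_BILLIONS).items():
    print(f"{step}: about {factor:.1f} times larger")

for name, size in PARAMETERS_BILLIONS.items():
    print(f"{name}: roughly {storage_gb(size):.2f} GB of parameters")
```

Under these assumptions, each generation is one to two orders of magnitude larger than the last, and GPT-3's parameters alone would occupy hundreds of gigabytes, far more than a typical consumer graphics card can hold.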

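GPT-3's training dataset was enormous: roughly 570GB of filtered web text (cut down from around 45TB of raw Common Crawl data), plus books and Wikipedia. This is significantly larger than the datasets used by the models that came before it, and it allowed LLMs to be used in a new way called prompting, which is what we do when interacting with ChatGPT.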

Chatbot Wrappers

Companies have not only focused on building these very impressive and powerful LLMs, they have also developed chatbots that allow the public to interact with the models. This lets many people who previously had no experience of, or access to, AI see how far these models have advanced.

Examples of LLM-based Chatbots

OpenAI's ChatGPT (which uses the GPT-3.5 and GPT-4 models)
Google's Bard (which uses PaLM)
Google also has an earlier language model called BERT (a model rather than a chatbot)
Anthropic's Claude

Note: you may have also heard of Meta's (formerly Facebook) LLaMA and Llama 2. These are the models themselves rather than chatbots. They have been released openly, meaning developers and companies can take a copy and use them to create their own chatbots (or whatever they want), subject to Meta's licence terms.

Further Resources

What is ChatGPT?