ADVANCEMENTS IN LARGE LANGUAGE MODELS

After transformers were introduced in 2017, a whole new realm of possibilities opened up for large language models. It was no longer just a dream to imagine that computers and machines could comprehend language and communicate much like humans do. Thanks to transformers, incredibly powerful AI chatbots and systems were developed that can hold conversations, translate languages, and even write creative content. The first major AI system built on transformers was BERT.

BERT (2018)

BERT (Bidirectional Encoder Representations from Transformers) was a transformer-based model released by Google in 2018, and it had a huge impact on the field of AI following its launch; it also went on to help power Google Search. BERT was one of the first large language models that could be pre-trained on general text and then fine-tuned for specific language tasks. The base version of BERT has 110 million parameters. Parameters act like a memory for a language model: the more data it is trained on, the more information can be stored in them. The original transformer is made up of a stack of six encoders and six decoders, where the encoders build up an understanding of the input's context and pass that representation to the decoders, which generate the output for the task at hand. BERT consists only of stacked encoders, which means it is a model with a strong understanding of language and its context, and its output layer can be customized for whatever specific task we want it to perform.
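To make the encoder-only idea concrete, here is a minimal sketch of pulling contextual representations out of BERT. It assumes the Hugging Face transformers library and the public bert-base-uncased checkpoint (roughly the 110-million-parameter model described above); neither is mentioned in the text, so treat this as an illustration rather than a recipe.

```python
# Minimal sketch: extracting contextual representations from BERT's encoder stack.
# Assumes the Hugging Face `transformers` library and the `bert-base-uncased` checkpoint.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")  # encoder-only model, no decoder

sentence = "Transformers changed natural language processing."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# One contextual vector per token; during fine-tuning, a task-specific "head"
# (a classifier, for example) is attached on top of these representations.
print(outputs.last_hidden_state.shape)  # torch.Size([1, seq_len, 768])
```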

BERT is trained using two different processes: Masked Language Modeling and Next Sentence Prediction.

In masked language modeling, the model is fed text in which certain words are masked out, and it must guess what the hidden words are. In next sentence prediction, the model is given two sentences and has to guess whether the second one actually follows the first, which teaches it how the flow of language works.
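Masked language modeling is easy to see in action. The sketch below, which again assumes the Hugging Face transformers library and the bert-base-uncased checkpoint, asks BERT to fill in a blanked-out word:

```python
# Minimal sketch of masked language modeling, using the Hugging Face
# `transformers` fill-mask pipeline (an assumption; the article names no toolkit).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT guesses the hidden word from both the left and the right context.
for prediction in fill_mask("The doctor wrote a [MASK] for the patient."):
    print(prediction["token_str"], round(prediction["score"], 3))
```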

GPT-2 and T5 (2019)

Like BERT, the Generative Pre-trained Transformer (GPT) is pre-trained on a huge dataset of all kinds of online text, which gives it a good grasp of the context of language, and it can also be fine-tuned after the pre-training phase. Unlike BERT, GPT is trained mainly through next-word prediction: given the text so far, it learns to predict the word that comes next. GPT-2 (the second generation), launched by OpenAI, was trained on a much larger dataset than its predecessor, including 8 million web pages, and has a parameter count of 1.5 billion. Moreover, GPT-2 could also leverage "zero-shot learning": with carefully worded prompts, it could handle tasks it had never been explicitly trained on.
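Here is a rough sketch of both ideas: GPT-2 continuing a prompt one predicted word at a time, and a carefully worded prompt nudging it toward a task it was never fine-tuned for. It assumes the Hugging Face transformers library and the public gpt2 checkpoint, neither of which the article specifies.

```python
# Minimal sketch: GPT-2 generating text via next-word prediction, through the
# Hugging Face `transformers` pipeline and the public `gpt2` checkpoint (assumptions).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# A zero-shot style prompt: no fine-tuning, just a carefully worded input
# that frames the task inside the text itself.
prompt = "Translate English to French:\nsea otter => loutre de mer\ncheese =>"
result = generator(prompt, max_new_tokens=10, do_sample=False)
print(result[0]["generated_text"])
```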

Google also launched a series of language models known as T5 (Text-To-Text Transfer Transformer) in 2019. These models are likewise pre-trained on a massive text dataset and can then be fine-tuned for specific tasks. T5 comes in several sizes, and the largest variant has a parameter count of 11 billion. T5 uses a text-to-text approach: every task, from translation to summarization, is framed as taking text in and producing text out.
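The sketch below illustrates the text-to-text idea: the task is chosen purely by the prefix of the input string. It assumes the Hugging Face transformers library and the small public t5-small checkpoint, since the 11-billion-parameter variant mentioned above is far too large for a quick demo.

```python
# Minimal sketch of T5's text-to-text framing: the task is selected by the text prefix.
# Assumes the Hugging Face `transformers` library and the `t5-small` checkpoint.
from transformers import pipeline

t5 = pipeline("text2text-generation", model="t5-small")

# Same model, two different tasks, chosen entirely by the input prefix.
print(t5("translate English to German: The house is wonderful.")[0]["generated_text"])
print(t5("summarize: Transformers use attention to weigh the relevance of every "
         "word in a sentence against every other word, which lets them model "
         "long-range context far better than earlier recurrent networks.")[0]["generated_text"])
```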

GPT-3 and GPT-4 (2020-present)

Among the most revolutionary language models that have taken the world by storm are GPT-3 and, most recently, GPT-4. We encounter these language models every day for all kinds of tasks. Whether it is writing content, seeking knowledge, trying to understand a concept, getting help with programming, or searching for a recipe, ChatGPT can help us with all sorts of things.

GPT-3 can generate human-like text and, like its ancestors, is pre-trained on a massive amount of online data, roughly 45 terabytes of text, and has 175 billion parameters. About 60% of the data GPT-3 was trained on came from Common Crawl, a non-profit organization that crawls the web, gathers information, and provides it to anyone for free. GPT-3 works quite effectively through "zero-shot learning" and often does not need to be further fine-tuned. ChatGPT, meanwhile, is essentially a version of GPT-3 fine-tuned specifically for conversational tasks.
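As a rough illustration of zero-shot prompting, the sketch below sends a task description straight to an OpenAI-hosted model with no examples and no fine-tuning. The openai Python client, the "gpt-3.5-turbo" model name, and the environment-variable API key are all assumptions, since the article discusses GPT-3 and ChatGPT in general rather than a particular API.

```python
# Minimal sketch of zero-shot prompting against an OpenAI-hosted model.
# Assumes the `openai` Python client (v1 interface), an OPENAI_API_KEY set in the
# environment, and the "gpt-3.5-turbo" model name; all three are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        # No examples, no fine-tuning: the task is stated directly in the prompt.
        {"role": "user", "content": "Classify the sentiment of this review as "
                                    "positive or negative: 'The battery died in a day.'"}
    ],
)
print(response.choices[0].message.content)
```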

GPT-4 is OpenAI's most recent and most advanced model. OpenAI has not disclosed its parameter count, though it is widely believed to be considerably larger than GPT-3's 175 billion. While GPT-3 only responded to text inputs, GPT-4 can also take images as input. It is also more capable and factually accurate than GPT-3. GPT-3 quite often provided incorrect information due to a phenomenon known as "hallucinations". This happens partly because chatbots are trained on datasets taken from online sources, and a lot of information online is wrong; to the model, a research paper and a fictional story carry the same weight, so it can confidently generate false statements. GPT-4 hallucinates less often than GPT-3, which makes it more reliable, and it is better at tasks requiring logical reasoning and critical thinking.
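For a sense of what image input looks like in practice, here is a hedged sketch of sending a picture and a question to a GPT-4-class model through the OpenAI chat API. The "gpt-4o" model name and the example image URL are assumptions made purely for illustration; the article only states that GPT-4 accepts images.

```python
# Minimal sketch of sending an image alongside text to a GPT-4-class model via the
# OpenAI chat API. Model name and image URL are placeholder assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```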

The journey of large language models, from the rule-based systems of the '60s to the sophisticated models of today, is truly remarkable and marked by many great milestones. Large language models have changed how we interact with machines and how we gather information. Who knows what potential the future holds for large language models? As research continues, we can expect even more reliable and sophisticated models to come.

