What is GPT-4? How much power does it have?

OpenAI was born to tackle the challenge of achieving artificial general intelligence (AGI) — an AI capable of doing anything a human can do.

Such a technology would change the world as we know it. It could benefit us all if used properly but could become the most devastating weapon in the wrong hands. That’s why OpenAI took on this quest: to ensure it would benefit everyone evenly. In its own words: “Our goal is to advance digital intelligence in the way that is most likely to benefit humanity as a whole.”

However, the magnitude of this problem makes it arguably the single biggest scientific enterprise humanity has ever undertaken. Despite all the advances in computer science and artificial intelligence, no one knows how to solve it or when it’ll happen.

Some argue deep learning isn’t enough to achieve AGI. Stuart Russell, a computer science professor at Berkeley and AI pioneer, argues that “focusing on raw computing power misses the point entirely […] We don’t know how to make a machine really intelligent — even if it were the size of the universe.”

OpenAI, in contrast, is confident that large neural networks fed on large datasets and trained on huge computers are the best way towards AGI. Greg Brockman, OpenAI’s CTO, said in an interview for the Financial Times: “We think the most benefits will go to whoever has the biggest computer.”

And that’s what they did. They started training larger and larger models to awaken the hidden power within deep learning. The first clear steps in this direction were the releases of GPT and GPT-2. These large language models laid the groundwork for the star of the show: GPT-3, a language model roughly 100 times larger than GPT-2, at 175 billion parameters.

GPT-3 was the largest neural network ever created at the time, and it remains the largest dense neural net. Its language expertise and its innumerable capabilities were a surprise for most. And although some experts remained skeptical, large language models already felt strangely human. It was a huge leap forward that reinforced OpenAI researchers’ beliefs and helped convince the rest of us that AGI is a problem for deep learning.

The holy trinity — Algorithms, data, and computers

OpenAI believes in the scaling hypothesis. Given a scalable algorithm (in this case the transformer, the basic architecture behind the GPT family), there could be a straightforward path to AGI that consists of training increasingly larger models based on this algorithm.

But large models are just one piece of the AGI puzzle. Training them requires large datasets and large amounts of computing power.

Data stopped being a bottleneck when the machine learning community started to unveil the potential of unsupervised learning. That, together with generative language models and few-shot task transfer, solved the “large datasets” problem for OpenAI.

They only needed huge computational resources to train and deploy their models and they’d be good to go. That’s why they partnered with Microsoft in 2019. OpenAI licensed some of its models to the big tech company for commercial use in exchange for access to Microsoft’s cloud computing infrastructure and the powerful GPUs OpenAI needed.

But GPUs aren’t built specifically to train neural nets. The gaming industry developed these chips for graphics processing and the AI industry simply took advantage of their suitability for parallel computation. OpenAI wanted the best models and the best datasets, and they also wanted the best computer chips. GPUs weren’t enough.

Many companies realized this too and started to build in-house specialized chips designed to train neural nets without losing efficiency or capacity. However, a pure software company like OpenAI can hardly integrate hardware design and fabrication. That’s why they took another route: using third-party AI-specific chips.

Here’s where Cerebras Systems enters the scene. In 2019, this chip company built the largest chip ever made, designed to train large neural networks. Now they’ve done it again, and OpenAI will make good use of this amazing piece of engineering.

A chip and a model — WSE-2 & GPT-4

Two weeks ago, Wired published an article revealing two important pieces of news.

First, Cerebras has once again built the largest chip on the market, the Wafer Scale Engine Two (WSE-2). It is ~22 cm on each side and has 2.6 trillion transistors. In comparison, Tesla’s brand new training tiles have 1.25 trillion transistors.

Cerebras found a way to condense computational power efficiently, so WSE-2 has 850,000 cores (the basic computational units), whereas typical GPUs have a few hundred. They also solved the heating problem with a novel cooling system and managed to create an efficient I/O flow of data.

There aren’t many uses for ultra-specialized super-expensive mega-powerful chips like WSE-2. Training large neural networks is one of them. And so Cerebras talked to OpenAI.

Here’s the second piece of news. Andrew Feldman, Cerebras’ CEO, said to Wired: “From talking to OpenAI, GPT-4 will be about 100 trillion parameters. […] That won’t be ready for several years.”

Since GPT-3, there’s been a lot of expectation around OpenAI and its next release. Now we know it’ll come out in a few years and it’ll be extremely big. It’ll be more than 500 times the size of GPT-3. You read that right: 500 times.

GPT-4 will be five hundred times larger than the language model that shocked the world last year.
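As a quick sanity check on these multipliers, here is a minimal Python sketch that recomputes them from the parameter counts quoted above. GPT-2's 1.5 billion parameters is not stated in this article and is added here as an outside figure; the 100 trillion figure for GPT-4 is only Feldman's reported estimate, not a confirmed specification.

```python
# Parameter counts. GPT-3 and the rumored GPT-4 figure come from the article;
# GPT-2's 1.5 billion is an added figure not quoted in the text.
GPT2_PARAMS = 1.5e9
GPT3_PARAMS = 175e9
GPT4_PARAMS_RUMORED = 100e12  # Feldman's estimate, unconfirmed


def size_ratio(larger: float, smaller: float) -> float:
    """Return how many times bigger `larger` is than `smaller`."""
    return larger / smaller


gpt3_vs_gpt2 = size_ratio(GPT3_PARAMS, GPT2_PARAMS)
gpt4_vs_gpt3 = size_ratio(GPT4_PARAMS_RUMORED, GPT3_PARAMS)

print(f"GPT-3 vs GPT-2: about {gpt3_vs_gpt2:.0f}x")
print(f"Rumored GPT-4 vs GPT-3: about {gpt4_vs_gpt3:.0f}x")
```

Under these numbers, both round figures in the text are on the conservative side: GPT-3 comes out a little over 100 times the size of GPT-2, and the rumored GPT-4 a little over 500 times the size of GPT-3.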

What can we expect from GPT-4?

100 trillion parameters is a lot. To understand just how big that number is, let’s compare it with our brain. The brain has around 80–100 billion neurons (GPT-3’s order of magnitude) and around 100 trillion synapses.

GPT-4 will have as many parameters as the brain has synapses.

The sheer size of such a neural network could entail qualitative leaps from GPT-3 we can only imagine. We may not be able to even test the full potential of the system with current prompting methods.

However, comparing an artificial neural network with the brain is tricky business. The comparison seems fair, but only because we assume artificial neurons are at least loosely based on biological neurons. A recent study published in Neuron suggests otherwise. The authors found that a neural network with at least 5 layers is needed to simulate the behavior of a single biological neuron. That’s around 1,000 artificial neurons for each biological neuron.
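To see what that ratio implies, here is a rough back-of-the-envelope sketch in Python. It assumes the article's figures (80 to 100 billion biological neurons, about 1,000 artificial neurons per biological one) and simply multiplies them. It counts artificial neurons (units), not parameters, so the result is not directly comparable to the 100 trillion parameter figure; treat it as an illustration of scale, not a claim about what any model needs.

```python
# Figures taken from the article; the multiplication itself is a simplifying
# assumption (it treats every biological neuron as equally costly to simulate).
BIO_NEURONS_LOW = 80e9
BIO_NEURONS_HIGH = 100e9
ARTIFICIAL_PER_BIO = 1000


def artificial_units_needed(bio_neurons: float,
                            per_neuron: int = ARTIFICIAL_PER_BIO) -> float:
    """Estimate artificial neurons needed to mimic `bio_neurons` biological ones."""
    return bio_neurons * per_neuron


low = artificial_units_needed(BIO_NEURONS_LOW)
high = artificial_units_needed(BIO_NEURONS_HIGH)

print(f"Artificial neurons needed: {low:.1e} to {high:.1e}")
```

In other words, matching the brain neuron for neuron would take on the order of 100 trillion artificial neurons under this estimate, each with its own set of parameters.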

But even if GPT-4 isn’t as powerful as our brain, it sure will leave a few surprises. Unlike GPT-3, it probably won’t be just a language model. Ilya Sutskever, the Chief Scientist at OpenAI, hinted at this when he wrote about multimodality in December 2020:

“In 2021, language models will start to become aware of the visual world. Text alone can express a great deal of information about the world, but it is incomplete, because we live in a visual world as well.”

We already saw some of this with DALL·E, a smaller version of GPT-3 (12 billion parameters), trained specifically on text-image pairs. OpenAI said then that “manipulating visual concepts through language is now within reach.”

OpenAI has been working nonstop on exploiting GPT-3’s hidden abilities. DALL·E was a special case of GPT-3, very much like Codex. But they aren’t absolute improvements; they’re more like particular cases. GPT-4 promises more. It promises the depth of specialist systems like DALL·E (text-to-image) and Codex (coding) combined with the breadth of generalist systems like GPT-3 (general language).

And what about other human-like features, like reasoning or common sense? In that regard, Sam Altman says they’re not sure but he remains “optimistic.”

There are many questions and very few answers. No one knows if AGI is possible. No one knows how to build it. No one knows if larger neural networks will get increasingly closer to it. But something is undeniable: GPT-4 will be something to keep an eye out for.