The (Real) Revolution under the AI Revolution

The AI Hype: Bubble or Revolution?

We've seen a lot of money, time, and effort spent on the development of LLMs, and the consequent development of tools using those LLMs. 
And while there is a lot of potential, and some industries are better equipped to leverage it than others, the business value realized so far has been underwhelming. 

Some folks are calling the AI hype a bubble, while others are saying it's the next industrial revolution. 
Time will tell, but I think the truth is somewhere in the middle. 

LLMs and the various other transformer models are impressive in their capabilities, but underwhelming when you consider the cost (in $, time, and energy) it takes to produce those results. 

A Mix of Hype and Real Innovation

To be clear, some of this really is a hype bubble. As with any trendy technological innovation, there are many middle-tier service providers whose entire value-add is connecting one vendor's product to another on your behalf. And most of them will get gobbled up by the larger market players without realizing their valuations. 

But there is also something very real, and very physical, happening this time. The AI revolution, as it stands today, hasn't produced artificial general intelligence. But it has produced two notable things in my mind: (1) incredibly good next-token prediction (in the form of LLMs), and (2) the tooling required to train these predictor models on arbitrary datasets in both specialized and generalized domains (my favourite example is the Neural Operators used in FourCastNet). 

The Real AI Revolution: Time Compression

The real revolution is the ability to produce these models in such a short amount of time. What the massively parallel, scalable computing technology from NVIDIA and its competitors is powering is a time compression of compute.

What do I mean by time compression? 

It’s instructive to think of what building an LLM is like mechanically.

An LLM is a neural net trained on a very large corpus of data (that's the Large in Large Language Model). 
These neural networks are a bit like a funky Plinko™ board: 

You insert your input (e.g. an English text prompt) at one end; the input is converted into one or more tokens, and those tokens traverse the neural net towards the other end, with each node producing new values for the next set of hops based on its current value.
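To make that traversal concrete, here is a minimal Python/NumPy sketch of a toy two-layer network producing scores for the next token. The sizes, weights, and one-hot "tokenizer" are hypothetical placeholders; a real LLM uses many transformer layers with attention, but the mechanical picture is the same: values flow from node to node through multiplications and nonlinearities.

```python
import numpy as np

# Toy two-layer network: all sizes and weights are hypothetical placeholders.
vocab_size, hidden = 50, 16
rng = np.random.default_rng(0)
W1 = rng.normal(size=(vocab_size, hidden))   # first layer (the "pins" the token falls through)
W2 = rng.normal(size=(hidden, vocab_size))   # output projection to next-token scores

def forward(token_id: int) -> np.ndarray:
    """One drop through the board: a token in, a score for every possible next token out."""
    x = np.zeros(vocab_size)
    x[token_id] = 1.0            # one-hot encoding of the input token
    h = np.tanh(x @ W1)          # each node produces new values for the next set of hops
    return h @ W2                # scores (logits) over the vocabulary

next_token = int(np.argmax(forward(7)))   # pick the highest-scoring next token
```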

The output is also a new token. To construct a full response, that output is fed back into the neural net along with the original context to choose the next token, which is concatenated to the output and fed back into the neural net again… This process repeats until the response meets a termination criterion (for example, a length limit or output stability).
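Continuing the toy `forward` function from the sketch above, the generation loop looks roughly like this. It is a simplification: a real model attends to the entire context rather than just the last token, and also stops on special end-of-sequence tokens, but the feed-the-output-back-in loop has the same shape.

```python
def generate(prompt_tokens: list[int], max_len: int = 20) -> list[int]:
    """Feed each new token back in until a termination criterion (here: length)."""
    context = list(prompt_tokens)
    for _ in range(max_len):
        logits = forward(context[-1])     # a real model would attend to the whole context
        next_token = int(np.argmax(logits))
        context.append(next_token)        # concatenate and feed back in
    return context[len(prompt_tokens):]

print(generate([7, 3]))
```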

The way an LLM is trained (and I am grossly oversimplifying it) is by feeding an input with a known desired output (the target) into the neural net, measuring the difference between the target and the output produced, and then changing the weights inside the neural net's operators (e.g. moving the Plinko™ pins a little up, down, right, or left) until the result matches the desired output, or gets close enough.

(Training a neural network, source: 3Blue1Brown)
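As a rough illustration, reusing the toy network from the earlier sketch (and updating only the output-layer weights, for brevity), a single-pair training step looks something like this: measure how far the output is from the target, then move the "pins" a little in the direction that shrinks that difference.

```python
def loss_and_grad(W1, W2, token_in, token_target):
    """For one pair: how far the output is from the target, and how to nudge W2."""
    x = np.zeros(vocab_size)
    x[token_in] = 1.0
    h = np.tanh(x @ W1)                       # hidden values for this input
    target = np.zeros(vocab_size)
    target[token_target] = 1.0
    error = h @ W2 - target                   # difference between output and target
    return float(np.sum(error ** 2)), np.outer(h, error)

# Keep adjusting until the result matches the desired output, or gets close enough.
for _ in range(500):
    loss, grad_W2 = loss_and_grad(W1, W2, token_in=3, token_target=9)
    if loss < 1e-3:
        break
    W2 -= 0.05 * grad_W2      # move the "pins" a little in the helpful direction
```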

And then we repeat this for another pair and update the weights, except we also have to check that we didn't ruin the previous result too much, so we might update some weights again to arrive at the operator weights that produce the two results closest (in aggregate) to the two desired outputs for the two inputs we've given the neural net.

And then we’ll do it with 3, 4, 5, and so on pairs… up to millions or billions of pairs of inputs and desired outputs.

Every time you change a weight in the neural net, it can affect all of the outputs. So the more input-output pairs we use for training, the larger the number of constraints we have to satisfy. Often, improving the result for one input-output pair degrades the result for a different one. So this process is both tricky and expensive. There are several techniques for tweaking the parameters; when we find a configuration that produces a better result in aggregate, we accept that change and repeat the process until we can no longer find a change that improves, in aggregate, on our current state.
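Here is a deliberately naive sketch of that accept-if-better-in-aggregate loop, reusing `loss_and_grad` and the toy weights from above. The training pairs are hypothetical, and real training uses gradient descent on a batch loss rather than random trial-and-error, but the shape of the process is the same: propose a change, keep it only if the whole set of constraints gets better.

```python
# Hypothetical (input token, desired output token) training pairs.
pairs = [(3, 9), (7, 2), (11, 4)]

def aggregate_loss() -> float:
    """Total error across every pair: the quantity we want to shrink."""
    return sum(loss_and_grad(W1, W2, t_in, t_out)[0] for t_in, t_out in pairs)

rng = np.random.default_rng(2)
current = aggregate_loss()
for _ in range(5000):
    i, j = rng.integers(hidden), rng.integers(vocab_size)
    old = W2[i, j]
    W2[i, j] += rng.normal(scale=0.05)        # tweak one parameter
    proposed = aggregate_loss()
    if proposed < current:
        current = proposed                    # better in aggregate: accept the change
    else:
        W2[i, j] = old                        # it may help one pair but hurt the rest: revert
```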

As you can imagine, this tuning process is expensive, and there's no real way to short-cut the computations required to convert inputs to outputs, update the weights, and run the computations again, and again, and again…

But there is a way to do a lot of it at once: linear algebra, which uses matrix operations to handle whole systems of equations (i.e. systems of constraints) together, lets us tune the neural net for multiple pairs at once.

The Role of Hardware in AI

In the case of the LLM, every pair of input and desired output is a single constraint, and the variables being optimized are the weights and activation thresholds of the neurons in the neural net.

Linear algebra allows us to convert the very slow iterative process described earlier into large matrix computations that tune many pairs at once. And the thing about large matrix computations is that, thanks to NVIDIA's and their competitors' graphics and AI accelerator hardware, we have very, very good tools for doing them at a large scale.
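Sticking with the toy network from the earlier sketches, here is roughly what that batching looks like: the per-pair loop collapses into a handful of dense matrix multiplications, which is exactly the kind of work accelerators parallelize well. The batch contents and targets are hypothetical, and only the output-layer update is shown.

```python
# A hypothetical batch of input tokens and target tokens, processed in one pass.
batch_inputs = np.array([3, 7, 11, 42])
batch_targets = np.array([9, 2, 4, 8])

X = np.zeros((len(batch_inputs), vocab_size))
X[np.arange(len(batch_inputs)), batch_inputs] = 1.0   # one one-hot row per pair

H = np.tanh(X @ W1)            # one matrix multiply covers every pair's hidden values
logits = H @ W2                # one more produces every pair's output scores
errors = logits - np.eye(vocab_size)[batch_targets]   # per-pair differences from the targets

W2 -= 0.01 * (H.T @ errors)    # a single matrix product updates W2 for the whole batch
```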

The Future Beyond LLMs

And this is where the AI hardware comes into play. NVIDIA sells GPUs, GPU server racks, and GPU cloud hardware, all of which are now being used by everyone training their own LLMs. Their competitors are doing the same.

Sure, there's a whole ecosystem of tooling, programming languages, and libraries operating on top of that hardware, but at the very core, the real physical revolution is the ability to perform huge matrix computations at scales beyond the capacity of any single computer, and the mind-boggling rate at which this capacity has been increasing over the last few years.

What NVIDIA is selling is time compression. You can now execute 10,000 linear compute-years of training (if it were run on one single-threaded CPU) in less than a day. 
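A quick back-of-the-envelope calculation shows what that claim implies (the numbers are illustrative, not measurements):

```python
# Back-of-the-envelope arithmetic for "10,000 compute-years in under a day".
compute_years = 10_000
serial_days = compute_years * 365            # ≈ 3.65 million single-CPU days of work
wall_clock_days = 1
speedup = serial_days / wall_clock_days
print(f"Effective parallel speedup needed: ~{speedup:,.0f}x")   # ≈ 3,650,000x
```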

And I think this is the real revolution behind AI. The market is finally tapping into this time compression in an explosive exploration of different ways of using it. 

I don't think LLMs are going away. I think they are the first example of a generalizable application utilizing this technology we now have access to. 

But I also don't think LLMs (and even AI itself) are where the real revolution is happening. 

The revolution is happening under the hood, in the hardware capabilities, and I am excited to see what other use cases outside of LLMs and neural nets people come up with to leverage all of these compute millennia we have available to us. 
