AI as a mirror

Foundation models based on neural networks were inspired by the biological brain. Now, remarkably, their behaviour offers a mirror into the workings of the brain.

Scaling laws

With small training data and model sizes, GPT-style models generate only gibberish. However, as data and model size increase, they get markedly better: first generating random characters that look like words, then proper words but invalid sentences, then sentences that make sense, and finally sentences that demonstrate reasoning.
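To make this concrete, here is a rough sketch of the kind of power-law scaling reported in the literature (Kaplan- and Chinchilla-style fits), where loss falls smoothly as parameters and training tokens grow. The constants below are illustrative only, roughly in the range of published fits rather than exact.

```python
# Rough sketch of a Chinchilla-style scaling law: predicted loss falls as a
# power law in parameters (N) and training tokens (D). Constants are
# illustrative only, roughly in the range of published fits.
def scaling_loss(n_params: float, n_tokens: float,
                 E: float = 1.69, A: float = 406.4, B: float = 410.7,
                 alpha: float = 0.34, beta: float = 0.28) -> float:
    """Predicted training loss for a model with n_params trained on n_tokens."""
    return E + A / n_params ** alpha + B / n_tokens ** beta

# More parameters and more data -> smoothly lower loss, even though the
# model's visible abilities (words, sentences, reasoning) seem to jump.
for n, d in [(1e8, 1e9), (1e9, 1e10), (1e10, 1e11), (1e11, 1e12)]:
    print(f"params={n:.0e} tokens={d:.0e} loss~{scaling_loss(n, d):.2f}")
```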

As anyone who has watched a baby grow from birth can tell, the parallels are striking. Much like with LLMs, better training data (rich experiences and time spent talking and interacting with them) makes for better learning in babies.

Biases

With LLMs, you can see biases based on the data they were trained on. Feed in more right-wing, left-wing, or whichever ideology you choose, and that is what the model mirrors. Similarly, your perspectives, mannerisms and behaviour indisputably etch themselves onto the brains of your children and those around you.

Backpropagation

Backprop is the key to neural networks. It is basically a feedback loop that updates the model’s weights until it predicts the training data correctly. It makes you think about the various feedback loops around you and the feedback loops you create.
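A minimal sketch of that feedback loop, using a single linear “neuron” fitted with hand-computed gradients (a toy stand-in for full backprop through many layers):

```python
# Minimal sketch of the backprop feedback loop for one linear "neuron":
# predict, measure the error, push the error backwards to nudge the weights,
# repeat until predictions on the training data are close enough.
# (Toy example; real networks have many layers, but the loop is the same.)

# Training data: learn y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = 0.0, 0.0   # weights start at zero
lr = 0.05         # how strongly feedback updates the weights

for step in range(2000):
    # forward pass: make predictions
    preds = [w * x + b for x in xs]
    # loss: how wrong were we? (mean squared error)
    loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)
    # backward pass: gradients of the loss w.r.t. each weight
    dw = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
    db = sum(2 * (p - y) for p, y in zip(preds, ys)) / len(xs)
    # feedback: adjust weights a little in the direction that reduces the loss
    w -= lr * dw
    b -= lr * db

print(f"w={w:.2f} b={b:.2f} loss={loss:.5f}")  # w ~ 2, b ~ 1
```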

Learning to think

Ted Chiang, in a New Yorker piece, called ChatGPT a “blurry JPEG of the web” - basically a lossy compression of knowledge. Early models, up to around GPT-3.5, gave the impression that even if these models had knowledge, they didn’t have wisdom. As Siddhartha says to his friend Govinda, “Knowledge can be transferred but not wisdom.” Wisdom is the application of knowledge in the right context. Each of our brains is wired differently, and while anyone can regurgitate knowledge, its interpretation, application and reasoning vary from one individual to another, because we all have different weights, biases and neural connections.

Yet methods like chain-of-thought prompting and RL fine-tuning can teach models to “think”, or at least impart patterns of thinking. Reinforcement learning takes the same approach as the practice problems that follow a few worked examples in a textbook: the model is “rewarded” for getting the steps, and finally the answer, right. Because until you can answer the questions, you haven’t truly grokked the subject.
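As a hedged illustration (not any lab’s actual recipe), a reward for RL fine-tuning on worked problems might give partial credit for correct intermediate steps and a larger reward for the final answer. Real pipelines use learned or programmatic verifiers rather than the exact string matches sketched below.

```python
# Hypothetical, simplified reward for RL fine-tuning on worked solutions:
# partial credit for correct intermediate steps, bigger reward for the final
# answer. Real systems verify steps far more robustly than string matching.

def reward(model_steps: list[str], model_answer: str,
           reference_steps: list[str], reference_answer: str) -> float:
    """Return a scalar reward used to update the model's policy."""
    step_credit = sum(
        1.0 for s, ref in zip(model_steps, reference_steps) if s.strip() == ref.strip()
    ) / max(len(reference_steps), 1)
    answer_credit = 1.0 if model_answer.strip() == reference_answer.strip() else 0.0
    # Weight the final answer more heavily than the intermediate steps.
    return 0.3 * step_credit + 0.7 * answer_credit

# Example: two of three steps match and the final answer is right -> 0.9
print(reward(["x = 4", "x + 3 = 7", "oops"], "7",
             ["x = 4", "x + 3 = 7", "so the result is 7"], "7"))
```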

Prediction machines

LLMs have been trivialised as next-token predictors, and that is, after all, how they work. But the brain is similar in many ways: it is a prediction machine too, constantly trying to predict the state of the world, and anxiety is a state of not being able to predict well.
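Stripped down, next-token prediction is just the loop below, where `model` stands in for any trained LLM (a hypothetical callable used for illustration, not a real API): given the tokens so far, produce a probability for every possible next token, sample one, append it, and repeat.

```python
# Minimal sketch of "next token prediction": the model scores every possible
# next token given the context, one token is sampled, appended, and the loop
# repeats. `model` is a hypothetical stand-in for a trained LLM.
import random

def generate(model, prompt_tokens: list[int], max_new_tokens: int) -> list[int]:
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = model(tokens)                              # P(next token | context)
        next_token = random.choices(range(len(probs)), weights=probs)[0]
        tokens.append(next_token)                          # prediction becomes context
    return tokens
```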

Perception is less about passively receiving the world and more about actively predicting it. The brain’s goal? Minimize surprise. When it predicts well, you feel in control; when it can’t, things get shaky (Being You, Anil Seth).

For example, we enjoy music because we are able to predict the next token in the rhythmic sequence.

Language as the key to Thinking

Computers writing code, poetry or novels like humans was perhaps the last thing people imagined on the AI roadmap. It seemed far more realistic to imagine AI solving manual labor before things that required higher-order thinking.

However, LLMs demonstrated that grokking language by learning from vast amounts of text does unlock higher-order thinking.

Other species demonstrate reasoning abilities when solving problems, but for higher-order thinking, language seems to be the key, and only humans have a true language center in the brain.

There is a reason GRE critical-reasoning questions take the format they do. Perhaps having a rich vocabulary is key to a higher level of perception.

South Asian philosophical traditions suggest that language is fundamental to how we perceive the world, and that learning new words can even shape our experiences (There is a word for that).

Where do AI models go from here?

LLMs are already better than most humans at problem solving, writing, critical thinking and so on.

What are some interesting next steps?

Agency

LLMs, for better or worse, do not have a concept of “self”. Currently, they are little more than stateless token predictors.

But add memory, self-preservation and other goals, and it can get interesting.

What is Agency?

Working in groups

It is fascinating that it was not the Neanderthals, a related human species with bigger brains, who ended up dominating, but Homo sapiens, who apparently sat in the Goldilocks zone of “just right” social dynamics and brain size.

Perhaps LLMs will also hit scaling limits where there are diminishing returns from model size, and it becomes more efficient to be “distributed” and have different models collaborate.

Published Mar 19, 2025