I have many conversations with people about Large Language Models like ChatGPT and Copilot. The idea that “it makes convincing sentences, but it doesn’t know what it’s talking about” is a difficult concept to convey or wrap your head around. Because the sentences are so convincing.
Any good examples on how to explain this in simple terms?
Edit: Some good answers already! I find the emotional barrier especially difficult to break. If an AI says something malicious, our brain immediately jumps to “it has intent”. How can we explain this away?
It’s a really well-trained parrot. It responds to what you say, and then it responds to what it hears itself say.
But despite knowing which sounds go together based on which sounds it heard, it doesn’t actually speak English.
I am an LLM researcher at MIT, and hopefully this will help.
As others have answered, LLMs have only learned the ability to autocomplete given some input, known as the prompt. Functionally, the model is strictly predicting the probability of the next word+ (more precisely, the next token), with some randomness injected so the output isn’t exactly the same every time for a given prompt.
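The idea of sub-word tokens (see the “+” note at the end of this answer) can be illustrated with a toy greedy longest-match tokenizer. This is only a sketch: the vocabulary below is hypothetical and hand-picked, and production tokenizers (for example BPE-based ones) learn their vocabularies from data and may split words differently.

```python
# Hypothetical toy vocabulary; real tokenizers learn tens of thousands of pieces from data.
VOCAB = {"understand", "able", "under", "stand", "un", "token", "s", "ize"}

def tokenize(word: str, vocab: set[str] = VOCAB) -> list[str]:
    """Greedy longest-match: repeatedly take the longest vocab entry that starts the rest."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest possible piece first, shrinking until a match is found.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            # Nothing in the vocabulary matches: fall back to a single-character token.
            tokens.append(word[i])
            i += 1
    return tokens

print(tokenize("understandable"))  # ['understand', 'able']
print(tokenize("tokens"))          # ['token', 's']
```

The model never sees the word “understandable” as a unit of meaning, only the two token IDs it is split into.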
The probability of the next word comes from what was in the model’s training data, combined with a very complex mathematical method, called self-attention, that computes the impact of every previous word on every other previous word and on the new predicted word. You can think of this as a computed relatedness factor.
This relatedness factor is very computationally expensive: every word has to be compared with every other word, so the cost grows quadratically with the number of words. Models are therefore limited in how many previous words can be used to compute relatedness. This limit is called the context window. The recent breakthroughs in LLMs come from the use of very large context windows to learn the relationships between as many words as possible.
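A minimal sketch of the “relatedness” computation, assuming plain Python lists and no learned weights (real models apply learned query/key/value projections, use many heads and layers, and run on GPUs). It uses causal masking, so each token only looks at itself and earlier tokens, and it counts the pairwise scores to show that the number of comparisons grows quadratically with the number of tokens.

```python
import math

def causal_self_attention(vectors: list[list[float]]) -> tuple[list[list[float]], int]:
    """Toy single-head self-attention with no learned weights.

    Each token is compared with itself and every earlier token (causal masking),
    so n tokens need n * (n + 1) / 2 relatedness scores: quadratic growth.
    Returns the mixed output vectors and the number of scores computed.
    """
    d = len(vectors[0])
    outputs = []
    score_count = 0
    for t, query in enumerate(vectors):
        visible = vectors[: t + 1]  # only this token and the ones before it
        # Scaled dot product = "relatedness" between this token and each visible token.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in visible]
        score_count += len(scores)
        # Softmax turns the scores into weights that sum to 1.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # The output is a weighted average of the visible token vectors.
        outputs.append([sum(w * v[i] for w, v in zip(weights, visible)) for i in range(d)])
    return outputs, score_count

for n in (10, 100, 1000):
    _, count = causal_self_attention([[1.0, 0.0]] * n)
    print(f"{n} tokens -> {count} relatedness scores")
```

This prints 55, 5050 and 500500 scores: ten times more tokens means roughly a hundred times more work, which is why the context window cannot simply be made arbitrarily long.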
This process of predicting the next word is repeated until a special stop token is generated, which tells the model to stop generating more words. So, literally, the model builds entire responses one word at a time, from left to right.
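The generation loop described above can be sketched with a hypothetical toy “model” that is just a lookup table of next-token scores. The table, the `<stop>` marker, the temperature and the window size are all illustrative assumptions; a real LLM computes the scores with a neural network over the whole context window, while this toy only looks at the last token.

```python
import math
import random

STOP = "<stop>"

# Hypothetical toy "model": next-token scores (logits) keyed on the last token only.
# A real LLM computes these scores with a neural network over the whole context window.
TOY_LOGITS = {
    "<start>": {"the": 2.0, "a": 1.0},
    "the": {"cat": 1.5, "dog": 1.5},
    "a": {"cat": 1.0, "dog": 1.0},
    "cat": {"sat": 2.0, STOP: 0.5},
    "dog": {"sat": 2.0, STOP: 0.5},
    "sat": {"down": 1.0, STOP: 3.0},
    "down": {STOP: 3.0},
}

def next_token_probs(context: list[str], temperature: float) -> dict[str, float]:
    """Softmax over the toy logits; a lower temperature means less randomness."""
    logits = TOY_LOGITS[context[-1]]
    scaled = {tok: score / temperature for tok, score in logits.items()}
    top = max(scaled.values())
    exps = {tok: math.exp(s - top) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def generate(temperature=1.0, context_window=4, max_tokens=20, seed=None) -> str:
    rng = random.Random(seed)
    tokens = ["<start>"]
    for _ in range(max_tokens):
        context = tokens[-context_window:]  # the model only "sees" the most recent tokens
        probs = next_token_probs(context, temperature)
        # Sample one token in proportion to its probability (the injected randomness).
        choice = rng.choices(list(probs), weights=list(probs.values()))[0]
        if choice == STOP:  # the special stop token ends the response
            break
        tokens.append(choice)  # the new token becomes part of the next step's context
    return " ".join(tokens[1:])

for seed in range(3):
    print(generate(seed=seed))
```

Different seeds can give different sentences from the same starting point, and nothing in the loop checks whether the result is true; it only checks which token is likely next.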
Because all future words are predicated on the previously stated words in either the prompt or subsequent generated words, it becomes impossible to apply even the most basic logical concepts, unless all the components required are present in the prompt or have somehow serendipitously been stated by the model in its generated response.
This is also why LLMs tend to work better when you ask them to work out all the steps of a problem instead of jumping to a conclusion, and why the best models tend to rely on extremely verbose answers to give you the simple piece of information you were looking for.
From this fundamental understanding, hopefully you can now reason about the LLMs’ limitations in factual understanding as well. For instance, if a given fact was never mentioned in the training data, or an answer simply doesn’t exist, the model will make one up, inferring the next most likely word to create a plausible-sounding statement. Essentially, the model has become so good at faking language understanding that even when it has no factual basis for an answer, it can easily trick an unwitting human into believing the answer is correct.
—-
+More specifically, these words are tokens, which usually contain some smaller part of a word. For instance, “understand” and “able” would be represented as two tokens that, when put together, become the word “understandable”.
I think that a good starting place to explain the concept to people would be to describe a Travesty Generator. I remember playing with one of those back in the 1980s. If you fed it a snippet of Shakespeare, what it churned out sounded remarkably like Shakespeare, even if it created brand “new” words.
The results were goofy, but fun because it still almost made sense.
The most disappointing source text I ever put in was TS Eliot. The output was just about as much rubbish as the original text.
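A Travesty Generator is essentially a character-level Markov chain: it remembers which characters followed each short run of characters in the source text and replays those statistics. A minimal sketch of that idea, not the original 1980s program; the `order` parameter (how many previous characters are used as context) and the sample text are illustrative choices.

```python
import random
from collections import defaultdict

def build_table(text: str, order: int) -> dict[str, list[str]]:
    """Map every `order`-character sequence to the characters that follow it in the text.

    Repeated followers are kept, so common continuations are picked more often.
    """
    table = defaultdict(list)
    for i in range(len(text) - order):
        table[text[i:i + order]].append(text[i + order])
    return table

def travesty(text: str, order: int = 4, length: int = 200, seed=None) -> str:
    rng = random.Random(seed)
    table = build_table(text, order)
    out = text[:order]  # start with the first few characters of the source
    while len(out) < length:
        followers = table.get(out[-order:])
        if not followers:  # dead end: this sequence only occurs at the very end of the text
            break
        out += rng.choice(followers)
    return out

sample = ("To be, or not to be, that is the question: whether 'tis nobler in the mind "
          "to suffer the slings and arrows of outrageous fortune, or to take arms "
          "against a sea of troubles, and by opposing end them.")
print(travesty(sample, order=3, seed=1))
```

Low orders produce gibberish with invented “words”; higher orders copy longer stretches of the source verbatim. At no point does the program know what any word means.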
As some nerd playing with various AI models at home with no formal training, any wisdom you think is worth sharing?
The only winning move is not to play.
Harry Frankfurt’s influential 2005 book On Bullshit (based on his 1986 essay of the same name) offered a description of what bullshit is.
When we say a speaker tells the truth, that speaker says something true that they know is true.
When we say a speaker tells a lie, that speaker says something false that they know is false.
But bullshit is when the speaker says something to persuade, not caring whether the underlying statement is true or false. The goal is to persuade the listener of that underlying fact.
The current generation of AI chat bots are basically optimized for bullshit. The underlying algorithms reward the models for sounding convincing, not necessarily for being right.
Not an ELI5, sorry. I’m an AI PhD, and I want to push back against the premises a lil bit.
Why do you assume they don’t know? Like, what do you mean by “know”? Are you talking about conscious subjective experience? Or consistency of output? Or an internal world model?
There’s lots of evidence to indicate they are not conscious, although they can exhibit theory of mind. Eg: https://arxiv.org/pdf/2308.08708.pdf
For consistency of output and internal world models, however, there is mounting evidence to suggest convergence on a shared representation of reality. Eg this paper published 2 days ago: https://arxiv.org/abs/2405.07987
The idea that these models are just stochastic parrots that only probabilistically repeat their training data isn’t correct, although it is often repeated online for some reason.
A little evidence that comes to my mind is this paper showing models can understand rare English grammatical structures even if those structures are deliberately withheld during training: https://arxiv.org/abs/2403.19827
The idea that these models are just stochastic parrots that only probabilistically repeat their training data isn’t correct
I would argue that it is quite obviously correct, but that the interesting question is whether humans are in the same category (I would argue yes).
People sometimes act like the models can only reproduce their training data, which is what I’m saying is wrong. They do generalise.
During training the models are trained to predict the next word, but after training the network is always effectively interpolating between the training examples it has memorised. But this interpolation doesn’t happen in text space but in a very high dimensional abstract semantic representation space, a ‘concept space’.
Now imagine that you have memorised two paragraphs that occupy two points in concept space. And then you interpolate between them. This gives you a new point, potentially unseen during training, a new concept, that is in some ways analogous to the two paragraphs you memorised, but still fundamentally different, and potentially novel.
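A toy illustration of the interpolation idea, assuming a made-up three-dimensional “concept space” with hand-picked coordinates. It only shows the geometry: the midpoint between two stored points is a new point that is similar to both parents but identical to neither. It says nothing about how a real network actually represents meaning.

```python
import math

# Hypothetical 3-D "concept space" with hand-picked coordinates;
# real models learn representations with thousands of dimensions.
CONCEPTS = {
    "sonnet about the sea": [0.9, 0.1, 0.8],
    "ship's logbook entry": [0.1, 0.9, 0.7],
    "tax form instructions": [0.0, 0.8, 0.0],
}

def lerp(a: list[float], b: list[float], t: float) -> list[float]:
    """Linear interpolation: t=0 gives a, t=1 gives b, values in between blend them."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 means same direction, 0.0 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

blend = lerp(CONCEPTS["sonnet about the sea"], CONCEPTS["ship's logbook entry"], 0.5)
for name, vec in CONCEPTS.items():
    print(f"{name}: similarity {cosine(blend, vec):.2f}")
```

The blended point scores high against both “parent” concepts and low against the unrelated one, yet it matches no stored point exactly.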
I really appreciate you linking studies about this topic, as finding this kind of research can be daunting. Those look like really interesting reads.
You sound like a chatbot who’s offended by its intelligence being insulted.
Bro is lost in the sauce
Maybe I misunderstood the OP? Idk
It’s just fancy predictive text, like when texting on your phone. It guesses what the next word should be, just for far more complex topics.
It’s like saying an OS is just a bunch of if-then-else statements. While it is true, in practice it is far, far more complicated.
Imagine making a whole chicken out of chicken-nugget goo.
It will look like a roast chicken. It will taste alarmingly like chicken. It absolutely will not be a roast chicken.
The sad thing is that humans do a hell of a lot of this, a hell of a lot of the time. Look how well a high schooler who hasn’t actually read the book can churn out a book report. Flick through, soak up the flavour and texture of the thing, read the blurb on the back to see what it’s about, keep in mind the bloated, over-flowery language that teachers expect, and you can bullshit your way to an A.
Only problem is, you can’t use the results for anything productive, which is what people try to use GenAI for.
Imagine you were asked to start speaking a new language, eg Chinese. Your brain happens to work quite differently to the rest of us. You have immense capabilities for memorization and computation but not much else. You can’t really learn Chinese with this kind of mind, but you have an idea that plays right into your strengths. You will listen to millions of conversations by real Chinese speakers and mimic their patterns. You make notes like “when one person says A, the most common response by the other person is B”, or “most often after someone says X, they follow it up with Y”. So you go into conversations with Chinese speakers and just perform these patterns. It’s all just sounds to you. You don’t recognize words and you can’t even tell from context what’s happening. If you do that well enough you are technically speaking Chinese but you will never have any intent or understanding behind what you say. That’s basically LLMs.
So there are two different things in what you are asking.
(1) They don’t know what (i.e. semantically) they are talking about.
This is probably not the case, and there’s very good evidence over the past year in research papers and replicated projects that transformer models do pick up world models from the training data such that they are aware and integrating things at a more conceptual level.
For example, a GPT trained only on chess moves builds an internal structure of the whole board and tracks “my pieces” and “opponent pieces.”
(2) Why do they say dumb shit that’s clearly wrong and not know it?
They aren’t knowledge memorizers. They are very advanced pattern extenders.
Where the answer to a question is part of the pattern they can successfully extend, they get the answer correct. But if it isn’t, they confabulate an answer, much like stroke patients who don’t know that they don’t know the answer to something and make it up as they go along. As with stroke patients, you can even detect when this is happening with a similar approach (ask 10 times and see how consistent the answer is, or whether it changes each time).
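The “ask 10 times” check can be sketched as a simple agreement score over repeated answers. This assumes you have already collected the answers by re-asking the same question (the querying step is left out), the example answers are made up, and the 0.6 threshold is an arbitrary, hypothetical cut-off. Exact matching after lowercasing is also a big simplification, since real answers may be worded differently.

```python
from collections import Counter

def consistency(answers: list[str]) -> tuple[str, float]:
    """Return the most common answer and the fraction of samples that agree with it."""
    normalized = [a.strip().lower() for a in answers]
    best, count = Counter(normalized).most_common(1)[0]
    return best, count / len(normalized)

def looks_confabulated(answers: list[str], threshold: float = 0.6) -> bool:
    """Hypothetical rule of thumb: low agreement suggests the model is making it up."""
    return consistency(answers)[1] < threshold

# Ten answers collected by re-asking the same question (made-up example data).
stable = ["Paris"] * 9 + ["Lyon"]
shaky = ["1987", "1991", "1987", "2003", "1979", "1991", "1985", "1987", "1999", "2001"]
print(consistency(stable), looks_confabulated(stable))
print(consistency(shaky), looks_confabulated(shaky))
```

A stable answer is no guarantee of a correct one, but a wildly shifting answer is a strong hint that there was no pattern to extend.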
They aren’t memorizing the information like a database. They are building ways to extend input into output in ways that match as much information as they can be fed.
Thanks for your thorough answer.
I’ll see if I can find that article/paper about the chess moves. That sounds interesting!
Could it be that we ascribe conceptual knowledge to an LLM while in fact it is chance? We as humans are masters at seeing patterns that aren’t there. But then again, like another commenter said, maybe the question is more about consciousness itself, and what that actually means. What it means to “understand” something.
After reading some of the comments and pondering this question myself, I think I may have thought of a good analogy that at least helps me (even though I know fairly well how LLMs work).
An LLM is like a car on the road. It can follow all the rules, like braking at a red light, turning, signaling etc. However, a car has NO understanding of any of the traffic rules it follows.
A car can even break those rules while still behaving exactly as intended (if you push the gas pedal at a red light, the car is not in the wrong, because it doesn’t KNOW the rules; it just acts on your input).
Why this works for me is that when I give examples of human or animal behaviour, I automatically ascribe some sort of consciousness. An LLM has no consciousness (as far as I know for now). This idea is exactly what I want to convey. If I think of a car and rules, it is obvious to me that a car has no concept of rules, but is still somehow part of those rules.
Thing is, consciousness (and any emotions and feelings in general) is just chemicals affecting electrical signals in the brain… If an ML model such as an LLM uses parameters to affect electrical signals through its nodes, then is it up to us to say it can’t have consciousness, or feel happy or sad, or even pain?
Sure the inputs and outputs are different, but when you have “real” inputs it’s possible that the training data for “weather = rain” is more downbeat than “weather = sun” so is it reasonable to say that the model gets depressed when it’s raining?
The weightings will change, leading to a change in the electrical signals, which emulates pretty closely what happens in our heads.
Doesn’t that depend on your view of consciousness and if you hold the view of naturalism?
I thought science was starting to find more and more that a 100% naturalistic worldview is hard to keep up. (Edit: I’m no expert on this topic, and the information and podcasts I listen to are probably very biased towards my own view. The point I’m making is that “we are just neurons” is more a disputed topic for debate than an established fact when you dive a little into neuroscience.)
I guess my initial question is almost more philosophical in nature and less deterministic.
I’m not positive I’m understanding your term naturalistic but no neuroscientist would say “we are just neurons”. Similarly no neuroscientist would deny that neurons are a fundamental part of consciousness and thought.
You have plenty of complex chemical processes interacting with your brain constantly - the neurons there aren’t all of who you are.
But without the neurons there: you aren’t anyone anymore. You cease to live. Destroying some of those neurons will change you fundamentally.
There’s no disputing this.
You could maybe just share a meme like this one.
Some folks in the comments there share actual LLM results, a few of which are sensible but plenty that aren’t far off from the joke.
LMAO! I tried it, and it said:
The man takes the goat across the river first, then he returns alone and takes the boat across again. Finally, he brings the goat’s friend, Mr. Cabbage, across the river.
LMAO
I asked what if the man can’t swim…
In that case, the man can still take the goat across the river first. Then, he ties the goat to the boat and pulls the boat across the river using a rope, ensuring he stays on the side of the river where he can safely manage the boat without swimming.
I asked who Mr. Cabbage is…
My apologies for the confusion! Mr. Cabbage is a playful addition. In the classic riddle, there’s usually no mention of Mr. Cabbage. It’s just the man, the goat, and perhaps another item or animal depending on the version of the riddle. I just added Mr. Cabbage for some extra fun!
Then I asked what some other additions could be…
Certainly! You can add various characters or items to make the riddle more entertaining. Here are a few ideas:
- Mr. Wolf: The man, the goat, and Mr. Wolf need to cross the river. But if the man leaves the wolf alone with the goat, the wolf will eat the goat.
- Ms. Hen: The man, the goat, and Ms. Hen need to cross the river. But if the man leaves the hen alone with the goat, the goat will eat the hen.
- Professor Fox: The man, the goat, and Professor Fox need to cross the river. But if the man leaves the fox alone with the goat, the fox will eat the goat.
- Princess Diamond: The man, the goat, and Princess Diamond’s precious gem need to cross the river. But Princess Diamond has warned that if anyone other than herself touches the gem, they’ll be cursed.
Feel free to mix and match or come up with your own additions!
And the “solution”…
Ah, that’s a fun twist! Let’s see:
- The man takes Mr. Cabbage across first, then returns alone.
- Next, he takes Ms. Hen across but brings Mr. Cabbage back.
- He then takes Princess Diamond’s gem across, leaves it on the other side, and takes Mr. Wolf across.
- The man leaves Mr. Wolf with Ms. Hen on the other side and returns alone.
- He takes Princess Diamond across, but before he leaves, he swaps Mr. Cabbage with Princess Diamond’s gem.
- The man brings Mr. Cabbage across, leaving Princess Diamond’s gem with Mr. Wolf.
- Finally, he takes the goat across.
Everyone safely crosses the river without any mishaps!
I love Mr. Cabbage! Thank you ChatGPT, very cool!
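For contrast, the classic version of the riddle (a farmer, a wolf, a goat and a cabbage, with a boat that carries the farmer plus at most one item, and the wolf eating the goat or the goat eating the cabbage if left unattended) can be solved mechanically by a breadth-first search over the possible states, with no language involved. A sketch, assuming exactly those classic rules:

```python
from collections import deque

ITEMS = frozenset({"wolf", "goat", "cabbage"})
# Pairs that cannot be left together on a bank without the farmer.
UNSAFE = [{"wolf", "goat"}, {"goat", "cabbage"}]

def is_safe(bank: frozenset) -> bool:
    return not any(pair <= bank for pair in UNSAFE)

def solve():
    """Breadth-first search, so the first solution found uses the fewest crossings."""
    # State: (items on the left bank, side the farmer is on). Everyone starts on the left.
    start = (ITEMS, "left")
    goal = (frozenset(), "right")
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (left, side), path = queue.popleft()
        if (left, side) == goal:
            return path
        here = left if side == "left" else ITEMS - left
        other = "right" if side == "left" else "left"
        # The farmer crosses alone or with one item from his current bank.
        for cargo in [None, *sorted(here)]:
            moved = {cargo} if cargo else set()
            new_left = left - moved if side == "left" else left | moved
            # The bank the farmer just left is now unattended and must be safe.
            unattended = new_left if other == "right" else ITEMS - new_left
            if not is_safe(unattended):
                continue
            state = (new_left, other)
            if state not in seen:
                seen.add(state)
                step = f"cross {other} with the {cargo}" if cargo else f"cross {other} alone"
                queue.append((state, path + [step]))
    return None

for step in solve():
    print(step)
```

The search finds a seven-crossing plan that starts and ends with the goat, and it can never invent a Mr. Cabbage, because it only ever manipulates the states the rules allow.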
That analogy is hard to come up with, because the question of whether it even comprehends meaning first requires answering the unanswerable question of what meaning actually is, and whether humans are also just spicy pattern predictors/autocompletes. Predicting patterns is pretty much the whole point of evolving intelligence: being able to connect cause and effect and anticipate the future just helps with not starving. The line is far blurrier than most are willing to admit, and it ultimately hinges on our experience of sapience rather than on being able to strictly define knowledge and meaning.
Instead it’s far better to say that ML models are not sentient, they are like a very big brain that’s switched off, but we can access it by stimulating it with a prompt.
Interesting thoughts! Now that I think about this, we as humans have a huge advantage by having not only language, but also sight, smell, hearing and taste. An LLM basically only has “language.” We might not realize how much meaning we create through those other senses.
idea that “it makes convincing sentences, but it doesn’t know what it’s talking about”
Like a teenager who has come into a new group and is now trying so hard to fit in :-)
I think we forget this, it’s just doing what people do
I commented something similar on another post, but this is exactly why I find this phenomenon so hard to describe.
A teenager in a new group still has some understanding and has a mind. They know the meaning of many of the words that are said. Sure, some catchphrases might be new, but general topics shouldn’t be too hard to follow.
This is nothing like genAI. GenAI doesn’t know anything at all. It has (simplified) a list of words that are somehow connected to each other. But an AI has no concept of a wheel, of what round is, what rolling is, what rubber is, what an axle is. NO understanding. Just words that happen to describe all of it. For us humans it is so difficult to understand that something can use language without knowing ANY of the meaning.
How can we describe this so our brains make sense that you can have language without understanding? The Chinese Room experiment comes close, but is quite complicated to explain as well I think.
How can we describe this so our brains make sense that you can have language without understanding?
I think it is really impossible to describe in simple, short words.
NO understanding. Just words that happened to describe all of it.
If being able to describe it does not mean understanding, then what is understanding?
I think a flaw in this line of reasoning is that it assigns a magical property to the concept of knowing. Do humans know anything? Or do they just infer meaning from identifying patterns in words? Ultimately this question is a spiritual question and does not hold any water in a scientific conversation.
The shorthand answer I’d try to give people is ‘it’s statistics’. Based on training data, there’s a certain chance of certain words appearing near each other. There’s no reasoning behind the placement, other than whatever pattern is discernible from known situations.
It’s your phone’s ‘predictive text’, but if it were trained on the internet.
It can guess what the next word should be a lot of the time, but it’s also easy for it to go off the rails.
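A minimal sketch of phone-style predictive text, assuming the simplest possible model: count which word follows which (a bigram model) and suggest the most frequent followers. Phone keyboards and LLMs are far more sophisticated than this, and the tiny corpus is made up, but the “it’s statistics” framing is the same: there is no reasoning, only counts from past text.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus: str) -> dict[str, Counter]:
    """Count, for every word, which words came directly after it."""
    words = corpus.lower().split()
    model = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        model[prev][nxt] += 1
    return model

def suggest(model: dict[str, Counter], word: str, k: int = 3) -> list[str]:
    """Return up to k of the most frequent followers, like a keyboard's suggestion bar."""
    return [w for w, _ in model.get(word.lower(), Counter()).most_common(k)]

corpus = "i am on my way . i am running late . i am on the bus . see you soon ."
model = train_bigrams(corpus)
print(suggest(model, "am"))  # "on" was seen twice, "running" once
print(suggest(model, "i"))
```

Keep tapping the top suggestion and you get fluent-looking text that can easily wander off the rails, because each step only knows what usually comes next.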
There’s the Chinese Room argument, which is a bit related:
I always thought the Chinese Room argument was kinda silly. It’s predicated on the idea that humans have some unique capacity to understand the world that can’t be replicated by a syntactic system, but there is no attempt made to actually define this capacity.
The whole argument depends on our intuition that we think and know things in a way inanimate objects don’t. In other words, it’s a tautology to draw the conclusion that computers can’t think from the premise that computers can’t think.