Somewhat Contra Marcus On AI Scaling
...
I.
Now it is true that GPT-3 is genuinely better than GPT-2, and maybe (but maybe not, see footnote 1) true that InstructGPT is genuinely better than GPT-3. I do think that for any given example, the probability of a correct answer has gone up. [Scott] is quite right about that, at least for GPT-2 to GPT-3.
But I see no reason whatsoever to think that the underlying problem — a lack of cognitive models of the world — have been remedied. The improvements, such as they are, come, primarily because the newer models have larger and larger sets of data about how human beings use word sequences, and bigger word sequences are certainly helpful for pattern matching machines. But they still don’t convey genuine comprehension, and so they are still very easy for Ernie and me (or anyone else who cares to try) to break.
And including a proposed bet!
I am willing to bet [Scott] now (terms to be negotiated) that if OpenAI gives us unrestricted access to GPT-4, whenever that is released, and assuming that is basically the same architecture but with more data, that within a day of playing around with it, Ernie and I will still be able to find lots of examples of failures in physical reasoning, temporal reasoning, causal reasoning, and so forth.
I of course will not take that bet, since I agree that they will be able to find many problems.
To repeat, the main point I was making in my last post was that we should mostly expect certain particular minor problems with DALL-E to get fixed by the next update. I don’t think Marcus particularly wants to argue against that.
But I do think we have a more substantive disagreement that it’s worth fleshing out.
II.
Let’s start with a softball: I think, regardless of whether or not Marcus is right, he’s failed to provide evidence for his position.
Literally billions of dollars have been invested in building systems like GPT-2, and megawatts of energy (perhaps more) have gone into testing them; few systems if any have ever been trained on bigger data sets. Many of the brightest minds have been working on blank-slate-ish sentence prediction systems for decades.
In essence, GPT-2 has been a monumental experiment in Locke's hypothesis, and so far it has failed. Empiricism has been given every advantage in the world; thus far it hasn't worked. Even with massive data sets and enormous compute, the knowledge that it acquires has been superficial and unreliable.
Rather than supporting the Lockean, blank-slate view, GPT-2 appears to be an accidental counter-evidence to that view […]
GPT-2 is both a triumph for empiricism, and, in light of the massive resources of data and computation that have been poured into them, a clear sign that it is time to consider investing in different approaches.
GPT certainly hasn’t yet proven that statistical AI can do everything the brain does. But it hasn’t proven the opposite, either.
GPT-2 has ~1 billion parameters (a measure of neural network size). It failed on a lot of questions, as Marcus demonstrated.
GPT-3 has ~100 billion parameters. It did significantly better than GPT-2, but still failed on some different questions Marcus was able to find.
So: a thing designed to resemble the brain, but 100,000x smaller, is sort of kind of able to reason, but not very well.
A similar thing designed to resemble the brain, but now only 1,000x smaller, does a noticeably better job at reasoning, although still not brain-level.
I don’t want to assert that a brain-sized GPT will definitely be just as good at reasoning as the brain. But I hardly think GPT’s performance provides strong evidence to the contrary.
Marcus is admitting this: each GPT has been better than the one before. He even seems to predict this will continue a bit into the future - he expects OpenAI to release a GPT-4, and surely they wouldn’t release a new product if it wasn’t an improvement on the old. He just seems convinced that the improvements will stop sometime before human level. Why?
III.
My answer is: I think humans only have world-models the same way we have utility functions. That is, we have complicated messy thought patterns which, when they perform well, approximate the beautiful mathematical formalism.
I did IQ research as a grad student, and it involved a lot of this stuff. Did you know that most people (95% with less than 90 IQ) can't understand conditional hypotheticals? For example, "How would you have felt yesterday evening if you hadn't eaten breakfast or lunch?" "What do you mean? I did eat breakfast and lunch." "Yes, but if you had not, how would you have felt?" "Why are you saying that I didn't eat breakfast? I just told you that I did." "Imagine that you hadn't eaten it, though. How would you have felt?" "I don't understand the question." It's really fascinating [...]
Another interesting phenomenon around IQ involves recursion. For example: "Write a story with two named characters, each of whom have at least one line of dialogue." Most literate people can manage this, especially once you give them an example. "Write a story with two named characters, each of whom have at least one line of dialogue. In this story, one of the characters must be describing a story with at least two named characters, each of whom have at least one line of dialogue." If you have less than 90 IQ, this second exercise is basically completely impossible. Add a third level ('frame') to the story, and even IQ 100's start to get mixed up with the names and who's talking. Turns out Scheherazade was an IQ test!
Time is practically impossible to understand for sub 80s. They exist only in the present, can barely reflect on the past and can't plan for the future at all. Sub 90s struggle with anachronism too. For example, I remember the 80-85s stumbling on logic problems that involved common sense anachronism stuff. For instance: "Why do you think that military strategists in WWII didn't use laptop computers to help develop their strategies?" "I guess they didn't want to get hacked by Nazis". Admittedly you could argue that this is a history knowledge question, not quite a logic sequencing question, but you get the idea. Sequencing is super hard for them to track, but most 100+ have no problem with it, although I imagine that a movie like Memento strains them a little. Recursion was definitely the killer though. Recursive thinking and recursive knowledge seems genuinely hard for people of even average intelligence.
Luria:
All bears are white where there is always snow. In Novaya Zemlya there is always snow. What color are the bears there?
Peasant:
I have seen only black bears and I do not talk of what I have not seen.
Luria:
But what do my words imply?
Peasant:
If a person has not been there he can not say anything on the basis of words. If a man was 60 or 80 and had seen a white bear there and told me about it, he could be believed.
And:
Luria:
There are no camels in Germany; the city of B is in Germany; are there camels there or not?
Peasant:
I don't know, I have never seen German villages. If B is a large city, there should be camels there.
Luria:
But what if there aren't any in all of Germany?
Peasant:
If B is a village, there is probably no room for camels.
And:
Luria:
What do a chicken and a dog have in common?
Peasant:
They are not alike. A chicken has two legs, a dog has four. A chicken has wings but a dog doesn't. A dog has big ears and a chicken's are small.
Luria:
Is there one word you could use for them both?
Peasant:
No, of course not.
Luria:
Would the word "animal" fit?
Peasant:
Yes.
And:
Luria:
What do a fish and a crow have in common?
Peasant:
A fish — it lives in water. A crow flies. If the fish just lies on top of the water, the crow could peck at it. A crow can eat a fish but a fish can't eat a crow.
Luria:
Could you use one word for them both?
Peasant:
If you call them "animals", that wouldn't be right. A fish isn't an animal and a crow isn't either. A crow can eat a fish but a fish can't eat a bird. A person can eat fish but not a crow.
IV.
The human brain is pretty plastic. Usually if one part of it dies, another part can take over. This makes me think that the brain area : function correspondence isn’t entirely a function of different structures in different regions (though some of it might be this), but downstream of an originally poorly-differentiated blob of neurons that get trained by the overall predictive structure based on their proximity to various input ports (eg sensory nerves), output ports (eg motor nerves), and other brain areas.
(this would also explain why the brain has a pretty consistent area dedicated to reading/writing, even though we haven’t been literate long enough to evolve new literacy-related structures)
Deep learning agents are also a poorly-differentiated mass of neurons. As they get inputs and outputs (ie training data) they slowly “evolve”/develop the ability to “recognize” patterns. We don’t know how they do this or what recognition-abilities they’re evolving, except by speculating (the way Marcus and I are doing) based on what kinds of problems they can and can’t solve.
It would make sense to me if poorly-differentiated blobs of neurons, when having lots of problems thrown at them, gradually move from developing simpler pattern-recognition programs (eg edge detectors), to more complicated pattern-recognition programs, all the way up to world-modeling, without any of these being hard-coded into the territory.
(the brain does have a lot of things hard-coded - ie we’re not blank slates - but its plasticity suggests that the forms of hard-coding we’re talking about here are helpful but not completely necessary for cognition)
V.
…is one possible argument.
Suppose that GPT-X took over the world and killed all humans. Millennia later, some alien archaeologists come and investigate. They conclude that since its training data included Alexander the Great and Caesar, it was just pattern-matching to the kind of things they did (multiplied by a vector representing the difference between ancient and modern times), and GPT-X never demonstrated any true intelligence. So . . . what?
The history of the past few decades has been people getting surprised, again and again, at how much AIs can do without being “generally intelligent”. Douglas Hofstadter predicted in 1979 that any AI that could beat a grandmaster at chess would also be able to decide that chess was boring and that it preferred writing poetry. Instead, we got Deep Blue, so domain-specific that it can’t even play checkers.
So even if GPTs aren’t a step on the path towards some sort of human-like AGI thing, I have no idea where they’ll end up. Replacing humans at all jobs? Writing novels? Taking over the world? If this seems crazy to you, “solve protein folding” sounded crazy ten years ago, and they already did that! At this point I will basically believe anything.
VI.
So I’m not going to take Marcus’ bet that GPT-4 will be perfect (as if anything ever is!). But here are some things I do believe, with confidence levels:
1. At some point before 2030, someone will come out with a deep-learning-based language model which is significantly better than the current state of the art, by Gary Marcus’ admission. (97%)
2. At some point before 2030, someone will come out with a DLBLM which makes few or no embarrassing errors on practical reasoning problems - for example, maybe it can beat a 10-year-old child on this genre of question. (66%)
3. When we finally get something that most people agree is AGI, whether by Marcus’ definition here or just by common sense, it will be a descendant in some important way of the kind of deep learning that produced GPT-3. (90%)
4. …as above, and also it won’t incorporate a further paradigm shift centering around deliberate human addition of the kind of neurosymbolic systems Marcus talks about (it can still include other paradigm shifts). (66%)
5. …in fact, it won’t incorporate any further paradigm shifts at all, beyond the bare minimum required to let it do things other than handle short text strings (eg actuators, sensors, larger attention windows, etc) - its brain won’t be that much more different from current AIs than current AIs are from 2015 AIs. (40%)