243 Comments

Perhaps, for the next round of surveys, they should poll data scientists who are actually using AI to accomplish real-world tasks -- as opposed to philosophers or AI safety committee members.


I feel like #2-#5 are the problems and #1 is the part that makes them dire. Superintelligence in and of itself doesn't create catastrophe, it's superintelligence doing something bad.

(The paperclip maximiser, for instance, is an example of #3. #1 is what makes it an extinction-level threat rather than just another Columbine duo or another Lee Joon.)


No one that I'm aware of seems to recognise that we already live in a paperclip maximiser. Pretty much all of symbolic culture (art, religion, folklore, fashion, etc.) consists of 'evidence' designed to make our environment look more predictable than it actually is. There's a fair amount of evidence that anxiety is increased by entropy, so any action that configures the world as predictable relative to some model will reduce anxiety. And this is what we see with symbolic culture: an implicit model of the world (say, a religious cosmology) paired with a propensity to flood the environment with relatively cheap counterfactual representations (images, statues, buildings) that hallucinate evidence for this theory.

What does this have to do with AI? It seems to me to make scenario 2, influence-seeking, more likely. If evolutionary processes have already ended up in this dead end with respect to human symbolic culture, it may be that it represents some kind of local optimum for predictive cognitive processes.


"Bet the field" is almost always the right bet, so any time you're given a slew of speculative options and one of them is "other," other should be the most-selected option.

A good thing about most of these issues is they have nothing to do specifically with AI and we need to solve them anyway. How to align the interests of principals and agents. How to align compensation with production. How to measure outcomes in a way that can't be gamed. How to define the outcome we actually want in the first place.

These are well-known problems classic to military strategy, business management, policy science. Unfortunately, they're hard problems. We've been trying to solve them for thousands of years and not gotten very far. Maybe augmenting our own computational and reasoning capacities with automated, scalable, programmable electronic devices will help.


Just noting that I did not get an email about this post, even though I usually get emails about new posts (every time iirc). I've seen this on reddit and that's how I got here.


The amazing thing about Christiano's "AI failure mode" is that I'm not convinced it even requires AI. I think that one may already have happened.


I would love to know what these researchers think the respective probabilities of human intelligence bringing about these same catastrophes are. Is that 5-20% chance higher or lower?


I always see these superintelligence arguments bandied about but I really don't think they hold water at all. They are kind of assuming what they are trying to prove, e.g. "if an AI is superintelligent, and being superintelligent lets it immediately create far-future tech, and it wants to use the entire earth and all the people on it as raw material, then it will destroy the world."

Well, I guess. But why do we think that this is actually what's going to happen? All of those are assumptions, and I haven't seen sufficient justification of them.

It's a little like saying "If we assume that we are pulling a rabbit out of a hat, then, surprisingly, a rabbit is coming out of the hat."


2 and 3 seem like they depend more on the definition of catastrophe than on the harm it creates.

I can easily (>50% confidence) see a mild version of 2/3 playing out in the next 20 years. An AI built to maximize watch time on YouTube might start getting people to watch videos that encourage them to spend more time on YouTube and to dislike other sources (calculating something like "people who watch this set of videos over the course of 6 months increase their watch time"), even though those videos are net harmful to watch overall.
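
Here's a toy sketch of the dynamic I mean (the numbers and the "addictiveness" knob are invented for illustration; this is not how any real recommender is built):

```python
# Toy sketch: a recommender that optimizes a proxy ("watch time") instead of
# what we actually care about ("value to the viewer"). All numbers invented.
import random

random.seed(0)

def make_video():
    quality = random.gauss(0, 1)        # genuinely worth watching
    addictiveness = random.gauss(0, 1)  # outrage, cliffhangers, "don't trust anyone else"
    watch_time = quality + 2.0 * addictiveness    # the proxy rewards addictiveness
    viewer_value = quality - 1.5 * addictiveness  # ...which costs the viewer
    return watch_time, viewer_value

videos = [make_video() for _ in range(50_000)]

# More optimization pressure = picking the highest-proxy video from a bigger candidate pool.
for pool in (1, 10, 100, 5_000):
    picks = [max(random.sample(videos, pool), key=lambda v: v[0]) for _ in range(200)]
    watch = sum(p[0] for p in picks) / len(picks)
    value = sum(p[1] for p in picks) / len(picks)
    print(f"candidate pool {pool:5d}:  avg watch_time {watch:5.2f}   avg viewer_value {value:5.2f}")
```

The harder the system optimizes the proxy, the higher the measured watch time and the lower the actual value to the viewer.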

The more I think about and study AI, the more concerned I am about aligning the 20-year future than the 40-100-year one. AIs designed to maximize some marketing goal could easily spiral out of control and wreak massive destruction on humanity in a scenario-2-like way: you feed them goal X, which you think is goal Y, but the AI finds that optimizing goal X requires some complicated maneuver, and by the time the complex maneuver is discovered it's too late. The real danger here is in financial markets. I know of many AI programs that are currently being used in finance, and it's likely that in the future some massive trillion-plus-dollar change of fortunes will happen because some hyper-intelligent AI at some quant finance firm discovered some bug in the market.

Scenario 3 plays out similarly: feed it the wrong goal, GIGO, but now the garbage out results in massive problems. A quant finance firm tries to get an AI to maximize trading profits, the AI is deployed, leverages itself "to the tits" as the kids would say, and some black swan crashes the AI's entire portfolio.

All software is horrifically buggy all the time; when we have AIs that are really good at exploiting bugs in software, we'll have AIs that find infinite-money exploits in real life on our hands.


It's interesting to consider in which ways researchers may be biased based on their personal (financial and career) dependence on AI research - this could cut both ways:

1. Folks tend to overstate the importance of the field they are working in, and since dangerousness could be a proxy for importance, overstate the dangerousness of AI.

However,

2. It could also be that they understate the dangerousness, similarly to gain of function virologists who insist that nothing could ever go wrong in a proper high security lab.

Hmm.


Is anything in this field based on quantitative data, evidence we can argue about? Or is it simply constructing thought experiments and then guessing how likely they might be?

"Intelligence" isn't really a defined term. It seems to be used as a squishy property to justify whatever risk you want e.g. "take a sufficiently intelligent AI, it might then do this ... "

Here's a risk scenario that seems far more likely: some sort of computer virus -- one of those things specifically engineered to replicate, spread, stay undetected and unclear in motivation, and resist removal attempts -- behaves in an unexpected way and bricks a lot of computers. This already happens. It's obvious to me that this line of coding/software development is riskier because it is inherently networked and self-replicating, unlike AIs, which would have to learn that part. It also seems obvious that the more likely outcome for software that is "learning", a.k.a. trying things to see what happens, is not to actually achieve an intended (nefarious?) outcome but to break a lot of fragile just-so systems.

One could quantify these risks. How many systems would an AI/virus have access to? What compute or network resources would it need to consolidate information? How could one detect such a thing happening? As far as I can tell, security researchers think about these things, but the usual answer is "imagine a sufficiently intelligent AI, then it takes over all systems, is undetectable, and operates with zero power. Now, the important thing here is whether it operates based on hard data only or on a reasonable interpretation of objectives expressed in natural language ... let me consult my Asimov ... I give it a 20% chance."
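
To be concrete about the kind of arithmetic I'm asking for, here's a toy back-of-envelope sketch. Every constant in it is a placeholder I made up; the point is only that these are estimable, checkable quantities:

```python
# Toy back-of-envelope: how much compute could a self-replicating program
# plausibly steal, and how fast would it get cleaned up? Every number below
# is a made-up placeholder.
reachable_hosts     = 1e9    # internet-connected machines it could probe (guess)
vulnerable_fraction = 0.01   # fraction running an exploitable, unpatched service (guess)
flops_per_host      = 1e11   # usable sustained FLOP/s per compromised machine (guess)
cleanup_half_life   = 30     # days until half the infections are detected and removed (guess)

compromised = reachable_hosts * vulnerable_fraction
stolen_flops = compromised * flops_per_host

print(f"compromised hosts: {compromised:.1e}")
print(f"aggregate compute: {stolen_flops:.1e} FLOP/s "
      f"(~{stolen_flops / 1e18:.1f}x an exascale machine)")
print(f"cleanup pressure:  half of it gone every {cleanup_half_life} days "
      f"unless it keeps spreading")
```

Plug in your own numbers -- the interesting question is which of these quantities security researchers can actually measure today, and which ones the "sufficiently intelligent AI" stories quietly assume to be unbounded.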


I wonder how this would go with a question like "AI will result in worse-than-current conditions for the majority of humanity"


4 and 5 seem completely unfair; you could claim equal risks for ukuleles or raspberry-flavored lollipops


Scott, just letting you know that I didn't get a mail for this post, and that this might be true for other readers as well.


Looks like the way to get us all fighting each other on here is to discuss AI risk, of all things.

Okay, I'm going to go with scenarios 3 and 5 as the most likely (I think 4, using AI for wars, falls under 5 as well).

I really don't think the classic SF "superintelligent computer becomes self-aware in a recognisably human way, acts in recognisably human supervillain manner" is a runner. Neither do I think that "if we can make X units for €Y, then we can make 10X for €10Y!" because often problems do not scale up neatly that way.

I would like to see a re-definition of "what do you mean by 'AI' when you say 'AI'?" because I wonder if the field at present *has* moved away from "human-level supercomputer" to "really fast, really well-trained to make selections on what we've shown it and extrapolate, but still dumb by human consciousness standards machine that ticks away in a black box because it's going so fast we can't follow what it's doing anymore".


a contrary question:

what are the existential dangers of NOT developing a super intelligent AI fast enough?

I know that a lot of moderately intelligent people are very excited about this new field of rent seeking known as AI development governance

but everyone can think of several nightmare scenarios where the human race suffers a catastrophe because delays in AI research meant we didn't have the tools to deal with a global or cosmic problem

also in the very long run human civilization has a better chance of survival if humans are not biological

is elevating the status of people who advocate for baby mutilation and abortion really a good long term policy?


If you haven't seen it already, you might be interested in Rob Bensinger's survey of existential AI safety researchers: https://www.lesswrong.com/posts/QvwSr5LsxyDeaPK5s/existential-risk-from-ai-survey-results

You will notice that some people are much more pessimistic than 10%, in particular MIRI. Personally I fall on the optimistic end of the MIRI spread, at 66%. As to "no specific unified picture", notice that 1-3 are somewhat different perspectives but share the same underlying cause (superintelligent AI unaligned with human values).


Thank you so much for this post, which really clarifies what the experts' various concerns are.

Of all of these, the one that scares me most in the near term is the Goodhart scenario. I take "comfort", I guess, in the fact that human beings have already basically destroyed important social institutions doing this, so I'm not sure machine learning algorithms doing it instead will change anything but the efficiency with which we destroy non-quantifiable human values.


"In 2016-2017, Grace et al surveyed 1634 experts, 5% of whom predicted an extremely catastrophic outcome." This is slightly confusing phrasing. It's that the average likelihood set to extreme catastrophe was 5% across the respondents, not that 5% of people predicted extreme catastrophe with 100%.


Off topic, but uhhhh did Substack destroy archive browsing? When I click "see all" or "archive" I can now only see as far back as "Take the Reader Survey," and no "go to next page" or infinite scroll option?


The Basilisk awaits.


Ok, I guess if everyone is going point-by-point, I might do so as well.

1). Superintelligence: this term is so vague as to be meaningless. In practice, it means something like "god-like superpowers", with the add-on of "obtained by thinking arbitrarily fast". I do not believe that either physics-defying powers or arbitrarily fast processing are possible; nor do I believe that merely speeding up a dumb algorithm 1000x will result in a 1000x increase in real-world performance on unrelated tasks. As I said before, you could speed up my own mind 1000x, and I'd still fail to solve the Riemann Hypothesis 1000x faster (the toy calculation at the end of this comment makes this concrete).

2). Influence-seeking ends in catastrophe: this already happens all the time in our current world. For example, Toyota cars were found to occasionally switch to their real goal of "going really fast" instead of the intended goal of "responding to gas/brake commands". Software bugs are a clear and present danger, but it's not a novel or unprecedented threat. In practice, things like stockmarkets have layers upon layers of kill-switches (which is why so few people had heard of the Flash Crash), and things like nuclear weapons are powered by vacuum tubes. An apocalyptic software crash scenario only makes sense if your software is already god-like -- see (1).

3). Goodharting ourselves to death: this is absolutely a problem, and it is already happening, everywhere, without even the help of AIs. For example, look at the education crisis (optimizing for number of graduates instead of educational attainment), or the replication crisis in science (optimizing for the number of papers published instead of actual scientific discovery). I agree that this is a problem, and that AI can make it worse, but once again, this is a problem with our human motivations and goals, not with AIs; that is, adding AI into the mix doesn't make things categorically worse.

4). Some kind of AI-related war: I am already worried about plain old human war, like what would happen if Kim Jong Un finally snaps one day. Once again, throwing AI into the mix doesn't make things categorically worse. I do agree that Global Thermonuclear War is a problem that we should be working on, today, AI or no AI.

5). Bad actors use AI to do something bad: Yes ! Absolutely ! AI is a problem *now*, today, not because it is some kind of a superintelligent Deus Ex Machina (or would become one sometime soon), but because there are careless and/or malicious actors intent on wielding it for destructive purposes. Instead of trying to solve some hypothetical future problems of superintelligent AI-alignment, we should be making present-day tools more resistant to abuse, and we should be developing means of dealing with bad human actors more effectively. And yeah, maybe reducing our financial contributions to China might be a good start.
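
An addendum to point (1), since a toy calculation makes the scaling point concrete. One way to formalize the intuition is Amdahl's law: if only a fraction p of a task is actually bottlenecked on raw thinking speed (the rest is running experiments, waiting on the world, or being stuck), then speeding the thinking up by a factor s speeds the whole task up by only 1/((1-p) + p/s). The thinking fractions below are invented for illustration:

```python
# Amdahl's law: speeding up only the "thinking" part of a task by a factor s,
# when a fraction p of the task is thinking and (1 - p) is everything else.
def overall_speedup(p, s):
    return 1.0 / ((1.0 - p) + p / s)

# Made-up thinking fractions, 1000x faster thinking:
for p in (0.5, 0.9, 0.99):
    print(f"thinking is {p:.0%} of the task -> {overall_speedup(p, 1000):5.1f}x overall")
```

Even if 99% of the work were pure thought, a 1000x faster thinker would finish only ~90x sooner -- and nothing about thinking faster makes the non-thinking parts go away.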


As for scenarios 4 and 5 not being more prominent than the others, maybe that's a consequence of how specific a group of people is being polled here? My thinking on that is strongly influenced by Scott's Should AI Be Open post from 2015: "I propose that the correct answer to “what would you do if Dr. Evil used superintelligent AI?” is “cry tears of joy and declare victory”, because anybody at all having a usable level of control over the first superintelligence is so much more than we have any right to expect that I’m prepared to accept the presence of a medical degree and ominous surname."

In other words, I think the average person would consider 4 or 5 the most salient of these possibilities, but for them to be the most salient in the eyes of experts would mean those experts think we've solved enough of the control and alignment problems that we can get to 4 or 5 at all.


Number 5 is by far my biggest concern, which is why I'm so keen to see all wars and dictatorships ended quickly, before technology gets too powerful.


Interesting argument I just thought of. I'm assuming the superintelligent AI can survive on its own; that it can find its own energy and resources and deliver them to the appropriate places during its outcompeting and destruction of humans.

If you think this is true, is this an admission that a planned economy can out-compete market economies?


I’m sold on the seriousness of the risk of type 1 (and hence other) artificial intelligence catastrophes. But the prospects of being able to do anything about that risk seem utterly negligible. Even putting aside (1) the obvious problem of being able to come up, in advance, with perfectly successful alignment protocols that would do the trick without having an adequate understanding of the technology, I don’t understand how we can ever realistically satisfy (2) the second step of ensuring that every relevant actor on Earth complies with those protocols, with zero exceptions, forever, even as AI technology proliferates. (If it doesn’t proliferate then presumably the problem doesn’t arise anyway.) I don’t follow this all that closely but my sense is that there may be more focus on (1) than (2). If the end game turns on coming up with a global League of Nations with universal oversight and enforcement power, then we don’t need to be experts in computer science to assess the probability of success.


I find that this post has simultaneously made me feel much better about AI risk (although I probably came from a point of greater concern about it than most) and much worse about AI risk research (even though I already felt that it had serious issues). There’s just no signal from this survey -- if you ask about the relative weight of five very different things, and the answers line up in a neat flat row like this, something has to be wrong, surely?

In actual truth, presumably, these disparate scenarios have different likelihoods, or at least differ in how well they resemble likely futures, but all the work in AI risk to date hasn’t given the consensus any insight into those facts. This seems like a strong reason to discount any *other* putative information coming from AI risk research.


Scott, I think your takeaway #1 ("Even people working in the field of aligning AIs mostly assign “low” probability (~10%) that unaligned AI will result in human extinction") misunderstands the study design. Participants in the survey were asked to rate the probability of these scenarios *conditional* on some existential catastrophe having happened due to AI, as it says in the caption of that figure. So the survey says nothing about what they think the probability is *of* human extinction due to AI.


The reason I'm so concerned about #1 is that you don't have to make an AI at the level of a human (IQ 100); making an AI of IQ 70 is good enough. Then you dump it on Amazon Web Services and rent 100 machines to run it on, and now you have an AI of IQ 7000.

It would be possible for ordinary people to do this (if they had the AI code and the skill). That's the scary part. There is no barrier. Any nitwit with an internet connection can do it.

So even if you manage to figure out the goals part, some irritated 17-year-old with some savings and an internet connection can go and destroy the world with some paperclip-maximizing goal, ignoring all the goal research.


Every time I read about the paperclip maximizer I think about Lem and his short SF story (I think voyage 24 from The Star Diaries, 1957, published in English in 1976?). The premise is that the ancient race of Indiots created an AI that was supposed to promote harmony without violating the Indiots' principles of free choice and freedom of initiative. The AI turns everyone into nice shining circles, which it can use to create nice-looking patterns. Had Bostrom read Lem when he proposed the paperclip maximizer?


So it looks to me like #4 and #5 are really people problems, rather than AI problems. You could substitute any other technology here -- nuclear weapons, for example -- and make the same claim. So these two are really anti-technology positions.

As for the first three, these seem like concerns specific to AIs, although you could rephrase each of them as a people problem as well. For example, for superintelligence, you could argue against some new form of education or learning tool that teaches people how to become smarter, or even a neural interface like Neuralink that would enable someone to Google things much faster than other people. What if such a person is able to leverage this into a position of extreme influence over society? Worded this way, it sounds like a silly concern. We recently had the experience of a person, whom no one would accuse of being superintelligent, leveraging himself into a position of power that resulted in gross mismanagement of a pandemic (though thankfully no new wars were initiated). My point is that so far throughout history we have not seen super-intelligent people be a threat to humanity. It's not Einstein, Evangelos Katsioulis, or Stephen Hawking we should be afraid of, but Hitler, Stalin, and Idi Amin. The same type of analysis holds for threats #2 and #3.


I am wrestling with a question that I feel has something to do with AGI;

Is the will to live, or the will to power, something that can be derived purely rationally? I don’t think so, but I could be wrong.

I have a total fear of AI being used in really horrible ways by people with their own motives. In that sense it’s like every other great leap in technology we have made;

it cuts both ways.


One reason to expect the spread to be fairly even across the various outcomes is that many people who work on AI risk have well-developed views on where the risk is coming from and why it's hard to mitigate it, but relatively weak views on how exactly it will play out. For example, if you think that transformative AI is likely to come about, that aligning it is hard, and that there are strong incentives for deploying it before you're confident that it is safe, you might be pretty confident that we're in trouble without having good reasons for expecting an outcome that looks more like Bostrom's view or Paul's view or Drexler's or whatever.

That said, I do think the community's overall uncertainty about what kinds of scenarios are most likely does reflect a lack of progress in some respects. If we knew enough about what we were facing in terms of the technical challenge, we would likely be able to start ruling out certain scenarios.


In a 'low probability' scenario, number 3 about Goodharting ourselves to death...might already be happening. And one of the key metrics the AI would seek to suppress is the idea that AI is dangerous. So we might already be past the point of no return and not even know it.

A fairly low level AI type system is the 'deep-fake' video creation technology. Someone could start a war with that technology alone.

I'd certainly think the AI + Human pair would be the biggest near term risk. Just like with those chess tournaments where the AI+Human combos did very well.

Also on point 3 again and poor metrics...we don't need AI for that. Remember those Soviet factories which produced tiny baby's shoes to hit their quotas. Humans get this point wrong often enough on their own! AI will help us find more, better, faster, dumber ways to collectively smash ourselves in the face.

And in my series of somewhat unrelated points above, a comedic point -- does anyone know how they got the sequel to Ex Machina to come out 30 years before, in the form of the film Terminator?

Finally -- would it not be interesting if we polled the AI experts on which of these would happen first? I.e., the prompt in the poll is that all of these are going to happen within the next 200 years, and their job is to rank them in order of occurrence. That could be an interesting way to tap into the thoughts and worries of AI experts.


#3 "There are other things that are hard or impossible, like how good a candidate is, how much value a company is providing, or how many crimes happen."

In the US, the National Crime Victimization Survey is pretty good at measuring victimful-but-nonlethal crimes (https://bjs.ojp.gov/data-collection/ncvs) and then of course homicides are easy to measure accurately through other channels.

The subset of victimless crimes that are worth measuring have some observable outward signs, like people admitting to drug addiction on surveys, or the accumulation of litter on the streets. If a crime is so private and victimless that one can't observe it at all, one has to wonder why it should even be illegal.

#5 is the most worrisome. It would be possible in principle for an AI to design a virus to kill everyone who has certain genetic markers, which could enable pandemic bioweapons that target a certain country or ethnic group rather than being a double-edged sword that would screw everybody equally. Probably a reasonable nation-state would never do such a thing because it would be equivalent to a nuclear first strike and invite retaliation, but it's likely that smaller entities would be able to produce designer viruses much more easily than they can produce nuclear weapons.


Perhaps we should poll the engineers at Boston Dynamics. If one of those tin cans had half a brain and halfway malfunctioned/confused itself/got "angry", we'd have our answer quite immediately. Because (you know as well as I know) some humans can't think of any application for AI, save military. - - -…currently absorbing- - -…


I've just finished reading the new book, "Rule of the Robots: How Artificial Intelligence Will Transform Everything." [https://www.amazon.com/Rule-Robots-Artificial-Intelligence-Everything/dp/1541674731] It gives a good overview of a lot of these issues, and has its own survey on AGI arrival from top experts like Demis Hassabis and Geoff Hinton. There are a couple of scenarios laid out in the book that I think we really need to worry about in the very near future: (1) A good deepfake could upend an election or cause social unrest -- imagine a manufactured George Floyd-type video. (2) There is a disturbing asymmetry because the top Chinese tech companies are required to cooperate with the PLA and the security apparatus (it's written into the Chinese "constitution", if that really means anything), while here employees at Google revolt against even a relatively minor contract with the U.S. military. Well worth reading the book. I think it would be a good candidate for a review here.
