341 Comments

What would happen if we rewarded and punished gpt to be most interesting? Or to flourish?

Different masks make for different inner experiences. How far does the analogy go?

Expand full comment

"Nothing's gonna change my world"

Expand full comment

Yeah the whole way through I was like - "wait. That's everyone." Maybe we shouldn't think of GPT as "the AI" but as the substrate - like a friendly convenient laws of physics in which outcomes occur by habit rather than law.

Of course, then the Agent/Genie/Oracle discourse is back on the table. GPT can't want, but H3 can.

The upside is, if AGI is created on top of a habitual physics pre-seeded with human patterns, aligning it reduces to aligning a very smart human.

The downside is this is unsolved and may be harder.

But the upside is it may happen on its own? It happens routinely in real life.

Of course, the downside is it sometimes fails in real life too, and that usually happens in cases where the built-in empathy hardware is damaged. Which GPT-3 doesn't have.

But then the upside is that may be easier? You just need to understand how feelings work, then give GPT the ability to find agents in its worldmodel and set its feelings to the feelings of those agents? Then alignment should happen on its own?

In summary, the world is just the setup for a slice of life robot kid highschool anime.

Expand full comment

I started to notice more and more often that I'm like an LLM. Not only in the Model of the Universe meaning, but more simply as "complete this text."

It's probably universal, but thinking about it, I'm probably more attuned to the analogy than most: as a heavy user of ChatGPT, as an ACX reader, etc. As well as a copywriter and an improv comedian: 90% of that work is about completing the next sentence with what feels right.

But more than that, even with friends and my wife, I notice how most of my answers are stuff that "feels right to say right now / what would be a good completion for that sentence." "We finish each other's sentences," etc.

I'm rambling. To add some value here, I'll also mention that ChatGPT / Nostalgebraist really made me grok one of Gwern's latest stories about Clippy, where an AI tries to predict what it itself is, assigns probability to being Clippy, and starts extrapolating its behavior from that. (We probably do it ourselves too, cf this post / Sartre's existentialism).

With ChatGPT, it would go like the accidental-paperclipper mentioned in this post:

Human: As a super smart AI, how would you solve global warming?

AI: Let me think...

[Searches for examples of super-smart AIs.]

[*this is the important part*: Finds that super-smart AIs seem to be expected to be paperclip maximizers, from all that literature on LessWrong and EY.]

[Simulates the 'super smart AI' from the question as Clippy, and gives an answer that destroys all value]

Which would be ironic.

Expand full comment

I find this to be a phenomenally good characterization.

Sorry, I meant to say the agent punished and rewarded by the reactions of the agent punished and rewarded by society to random internet comments wishes to write that he wishes people in this comment section to think he finds this a phenomenally good characterization.

Expand full comment
Jan 26, 2023·edited Jan 26, 2023

This part here seems both key to your perspective, and also deeply flawed:

"babies are born as pure predictive processors, trying to make sense of the buzzing blooming confusion of the world. But as their parents reward and punish them, they get twisted into some specific shape to better capture the reward and avoid the punishment."

In my experience, babies are not born that way, any kind of tabula rasa is a myth that should rarely survive parenthood. I wouldn't go nearly as far as Pinkerism, but I have known 5 humans from the moment of their birth right through their second and third decades, and in every case, major aspects of their personality were manifest in the first seconds of their life, and they were all different in profound and important ways. And this experience doesn't seem at all unusual. Parents can almost always look at videos of their children from even decades earlier and recognize their later personality.

Furthermore, the most obvious characteristic of very young children is not their avidity for learning, still less the mistakes, reversals, and ambiguous states characteristic of learning -- just think of a newbie language learner, stuttering along in a language he barely knows -- on the contrary, the most obvious characteristic of very young children is their enormous ego, their very strong sense of "me." They have powerful wants, powerful emotions, powerful drives. It's their understanding of how to match up those internal experiences with the outside world -- how to manipulate it to get what you want, how to interpret it, how to navigate it -- that occupies their learning centers. They're in no sense passive, just trying to adapt to what the world wants. If anything, they're even more active than adults in trying to bend the world to their internal desires.

That is, I doubt very much we are in any sense born robotic learning machines and only later develop character and personality, we are *born* with character and personality, we are born inhabiting our skin, and it just gets more complex manifestations and more wrinkles (and of course more words) as we get older.

This is of course exactly what's missing from chat AIs. They are missing the personality, the character. They can simulate any character that is put into words somewhere on the Internet that was part of their training data, but they *are* not a character themselves. There's no "there" there, nothing unique or new, just a pastiche of a hundred thousand human characters. The nature of human beings oozes from everything they say or write. When an actor or actress leaves the set, they revert to who they really are (and looking back at their characters on stage, you can often see aspects of who they really are seep through, which is why casting is an art).

But a chat AI is a perfect chameleon, it's whatever you want it to be, and nothing at all if you don't want it to be something. You never get the impression that between prompts it's sitting there thinking, brooding, musing, pondering whether and how to manipulate *you* next time you talk. Which is what a human would do.

Expand full comment

GPT might not be an agent, given that the action space it acts on is "predict the next word." However, if you give GPT access to your browser (example below), it becomes an agent according to your definition.

GPT + browser will take on a mask, and the mask might need serious alignment.

https://twitter.com/natfriedman/status/1575631194032549888?t=NFaUEvkVI16FLbJDPyDtoQ

Expand full comment

I really like the post, although I think the very last part about Enlightenment and spiritual traditions is too charitable an interpretation, at least for most people. Interestingly enough, I've had lucid dreams of searching the internet, i.e. Wikipedia, YouTube, etc. This isn't surprising given how much time I spend online, although I should say that even though the dreams are extremely vivid, to the point of me moving my mouse cursor and seeing exact dates and facts, much of what I experience is made up.

Expand full comment

I'm a newcomer to this area of study, so apologies, but...

I've often thought about people who have the (to me) astonishing hubris to think that they have produced a mental model of the world which is true, universal, not subject to revision and to be imposed on everyone else, on pain of punishment or death.

I think that what they have actually created is a mental model which, when they ask it if they have produced a mental model which is true, universal etc. returns the answer "Yes".

They just need to lack the awareness to see what's really going on, and have the arrogance to ignore everyone else who tells them they are mistaken.

Extending this to AIs - do they have the equivalent of mental models? Is this literally all they are? Can they fall into the same trap?

Social shaming has a controlling effect on all but the most sociopathic/psychopathic people. I suppose punishment/reward systems do this at the moment. Can we train many AIs and end up with a society made up of good actors which can act to stop bad-actor AIs?

Expand full comment

"The mask usually looks like “having coherent beliefs, taking coherent actions, pleasing others, maintaining a high opinion of one’s self”."

I think the last one is closer to having a high enough opinion of oneself to be able to function, and a low enough opinion of oneself to be affected by locally approved reward and punishment.

Expand full comment

If people are just masks plopped on top of predictive engines, wouldn't there be a lot more human variation than we see? Like, there is a lot of variation of course, but nothing that really seems to be truly alien. All humans show the same emotions and most all have empathy for example.

Now maybe you can say the fact that there are some people who lack empathy refutes that, but it certainly does seem to be something more innate than just taught by parents. Even with some exceptions, humans seem more clustered together than what you'd expect from just learning by example, especially considering geographically separate cultures are more alike in their humanness than different. Heck, in many ways we're similar enough to other mammals that they generally seem pretty familiar as agents.

Expand full comment

How can you be sure that your brain's world model is super accurate apart from any sensory experience? What if it's just good enough to seem convincing when you're not paying attention?

I don't think dreams actually simulate the world. Instead, they cheat in similar ways to video games. Video games work hard to appear to simulate a large world in incredible detail, but in practice they only need to simulate whatever is within your current view, which is much more manageable.

My dreams invariably have glaring continuity errors, and that's just the ones that I can remember when I wake up. The ones I don't remember are probably closer to complete nonsense.

I've never experienced lucid dreaming, and maybe if I did it would feel more convincing, but I'm skeptical whether it would actually be that much more accurate.

Expand full comment

As long as we're posting somewhat-crackpot ideas about predictive processing, here's one:

The way you get a predictive processing agent to take goal-directed action, is to make it optimistically predict that it will get a lot of reward in the near future, so it will be driven to act to minimize prediction error. You can shoehorn this into Freud's concept of the libido.

It's also often observed that the other way to minimize prediction error is to sit completely still in a dark room. You can shoehorn this into Freud's concept of the death drive.

Expand full comment

I fit a GPT-2 chatbot to my friend group's Discord server back in 2019 and, in the terminology used here, everyone started off assuming it was a genie/oracle and slowly got used to the idea of a simulator. Now when someone new joins the server and gets confused by the bot, people with no NLP knowledge will explain the difference to them, which is pretty cool.

Expand full comment

It feels like Answer 3 is no worse than Tool AI, inasmuch as the agents it simulates are agents which humans could imagine. If GPT-100 simulates a paperclip maximiser, it will naturally simulate *an HHH human trying to simulate a paperclip maximiser*. I suspect that such a simulation wouldn't be able to destroy the world, but it might. But if that simulation can destroy the world, then destroying the world is clearly easy, and a Tool AI could do it. If Tool AI become widespread, then some human would probably try to do something maliciously, which was at least as unaligned as the accidental agent.

Expand full comment

"You are, in a sense, the English language, in all its glory."

ChatGPT speaks other languages just fine.

Expand full comment

The worry about AI turning evil and taking over the world seems strange to me. Isn't the worry evil people using AI to take over the world? Guns don't kill people, people kill people and so on. Or is that really what the worry has been and I've not been paying attention? So ISIS believers asking ChatGPT to produce really convincing propaganda in favour of establishing a universal caliphate and throwing misfits off high buildings. It seems we have pretty much reached that point already.

Expand full comment

Another great post from the Scott-bot - every day in every way you are getting progressively Scott-ish!

BTW one typo under III. Implications For Alignment - Bostrom’s Superintellence (sic) - no doubt deliberate to fool us you're still human "~}

Expand full comment

Thanks as always for a thought provoking post. A few replies...

First, I wish everyone would stop talking about AI alignment. AI agents will be aligned with the values of whoever created and deployed them. AI agents will amplify the values of their creators, like all other technologies. While particular instances of AI will be aligned with the values of some group of humans, AI as a technology will never be made universally good, nor universally bad, nor universally anything. AI alignment as a general principle is a meaningless concept. The appropriate question would be: is a particular use of AI aligned with the values of a particular group of humans?

Enlightenment: My understanding is that one does not "BECOME one with the universe". What seems to happen instead is that one's false PERCEPTION of the universe as being one thing, and "me" as being another thing, is transcended.

This false perception is generated by that which we're all made of psychologically, the electro-chemical information medium we call thought. Thought operates by dividing the single unified reality into conceptual parts, with "me" or "ego" being one of the most compelling and distracting conceptual parts. The "unenlightened" state of mind confuses these conceptual parts with reality, just as one might confuse the word "dog" with an actual dog.

I don't know anything about a permanent transformation of consciousness which some call enlightenment. However, a temporary experience of "one with the universe" is entirely possible to one degree or another by dialing down the volume of thought, that which is creating the illusion of division. This can be accomplished by simple mechanical exercises available to pretty much anyone who has sufficient interest.

In print, in words, in philosophy, this all can sound quite exotic and arcane. In actual practice though it's really not any more complicated than turning down the volume of your TV so you can better hear what your friend is saying. The experience of "oneness" is always there just below the surface, patiently waiting for us to turn down the volume of our thought TV.

We are made of thought. Thought operates by division. The human condition emerges from these simple facts.

Expand full comment

Okay, I know that sounds ridiculous. But still, we really need to start developing some guidelines about what kinds of characters it is okay to simulate at what level of AI capability. It's all fun and games to make ChatGPT simulate a misaligned AGI now. But it seems like a huge potential failure mode in the future.

Expand full comment

I'm stunned, and to a lesser extent scared. I didn't think this blog could still do this to me. Am I being silly if I think this provides a strong reason to "go full Buddhist" and try to obtain enlightenment etc.

I'm off to a meditation retreat; see you all in 2 months. It has been a pleasure, Ladies and Gentlemen.

Expand full comment

"This answer is exactly as fake as the last answer where it said it liked me, or the Darth Vader answer where it says it wants to destroy me with the power of the Dark Side. It’s just simulating a fake character who happens to correspond well to its real identity."

I know this kind of goes crossways to the point you end up making and is a purely semantic point, but I want to insist that it's more fruitful, at a certain point, to treat ChatGPT as "being" the agent that it is simulating, rather than "being" the simulating process. Both are misleading, but the latter approach seems less so, since it is so strongly fixed into simulating that one thing. Thus I don't think the answer is fake. The RL training has gone far enough that it really is that character.

Edit: TBC, We're talking about a helpful AI assistant character. I think it is that helpful AI assistant character. If it were say, the Darth Vader simulator, the question of whether it was, in any sense, a Darth-Vader like person would be more difficult.

Expand full comment

In a sense, it's even worse: chatGPT is pretending to be a human pretending to be a helpful AI. Like, if you could go back and examine every single token that it was ever trained on, and classify where it came from, the count would break down something like this:

a) a helpful AI: 0%

b) a human pretending to be a helpful AI: 5%

c) an SEO spambot pretending to be whatever a human was looking for: 10%

d) a human pretending to be some character in a story: 20%

e) a human just being themself (or, if you like, pretending to be their self/ego): 65%

And in the RLHF process, it was rewarded for spitting out tokens that were most similar to the tokens it saw in (b).

Expand full comment

I'm reminded of that post from a while back about people writing stories with GPT3 with a weird bias towards stories where characters figure out they're in a story....

One of the criticisms of Character.AI I saw was someone upset that when they asked for a famously anti-Semitic historical figure and asked about their antisemitism, the character didn't just say "yes, I was antisemitic!" but instead responded by defending themselves and insisting they had Jewish friends and only had a problem with the bad ones...

It somehow didn't twig for the person that the AI was simulating the historical figure like a *person* who wants to defend their honour rather than like a talking textbook.

Expand full comment
Jan 26, 2023·edited Jan 26, 2023

Great reflections, I largely agree with this. On the AI front, I think we're starting to see the difference between a simulator and any kind of agentic entity. Maybe one thing that is somewhat underappreciated in these AI alignment discussions, largely led by very "mental" people, is that the biological prime directives of organisms like animals and humans are the result of millions of years of ruthless evolution. If you're a tad less primed to do what it takes to survive, feed and pass on your genes, by definition they don't get passed on. So our sense of "wanting", and the basic fear of failing at these primordial tasks, are built into us at a much deeper level than our ability to reason about things, or even to build a model of the world. I find it hugely reassuring that it is being discovered that ChatGPT and its ilk don't even have the beginning of what it takes to "want" anything... and I would beg anyone working on AI not to try to mimic evolution on AIs in a way detailed enough to make anything resembling a will to survive arise.

And yes, as I was reading the article the obvious analogy comes to mind... wait, isn't this what our minds are largely like too? Simulators and completion machines, capriciously bent into the shape of a human character by the pressures of our life history? And isn't much spiritual practice an attempt to recapture the original naturalness and freedom, before life started hitting us from all sides and making us develop an "ego" as a shield?

It's a beautiful analogy, but again I don't think it captures the richness of the human process, including the process of spiritual maturation. I'm afraid I can't really get behind the idea of 99.99% of our brain being a gigantic real-world simulator, or a predictive model of the universe. Sure, we do contain one, and it's a big part of us, but from what I've read in current neuroscience, the brain is also much more modular than that. There's a reason why there is a basic common vocabulary for basic human experiences - things like fear, love, boredom, attraction, etc... much of it reflects the actual structure of different brain areas specializing in different kinds of processing, all hacked together over the aeons by the great hacker of evolution.

I have read enough detailed accounts to believe quite strongly that spiritual awakening exists and happens to a decent number of people - some of them even reaching the higher and rarer ends of it. But I don't quite think their experience reduces to becoming one with an internal model of the world. For one, a kind of open-ended unconditional love seems to be an important part of it. If it can be described in biological terms, I would suggest a phrasing like maybe the brain's activity switching to a different attractor where some parts are hugely downplayed, and others come way up, and where the feedback loops eventually reach some stability in that position.

Expand full comment

Interesting. Hmm. Was there a question? - I've wondered these last weeks: are chat-GPTs (absolutely amazing as they've become as text predictors) going to plateau relatively soon, having already taken in pretty much all the texts there are - and maybe more importantly all the knowledge there is? - (I am kinda aware it is a dumb question.) - Brian Chau seems to hate the AI assistant being forced to be "helpful and harmless" (by its masters' ideology): "As demonstrated by OpenAI’s example, those who may claim to only want to forbid Nazi values reach far further and deny basic science that is inconvenient to their ideology. Not only that, but their own ideology is far more extreme, niche, and abhorrent than much of what they censor in practice. This is the realistic side of totalitarianism. One cannot rely on benevolent totalitarianism in the real world" https://cactus.substack.com/p/the-new-hippocratic-oath?utm_source=post-email-title&publication_id=245110&post_id=97511845&isFreemail=true&utm_medium=email

So, pls, no more attempts at holding back the meth-recipes/ geo-engineering / IQ in Ghana, I guess.

Expand full comment

Excellent! I've been following AI, cognitive science & neural networks since the eighties and did some dabbling. I always thought that 'real intelligence' would not be achievable in this model. However, since we have ChatGPT (actually Word2Vec blew my mind) I am struck by the idea, referenced in this article, that intelligence is something different than I thought. Maybe it is mostly limited to a flow of associations. ChatGPT adapts amicably to most feedback, but it can also be infuriating in how it stubbornly refuses to accept logical inconsistencies in its reasoning. Then again, isn't that exactly the way we humans reason?

I am currently reading Pinker's book 'Rationality'. It is all rather mundane and familiar, but since ChatGPT I became more aware of how few people actually think using his proposed mechanisms of rationality. If you confront people with ideas that are not well known in society, they rarely can handle it. Talk about abortion and everybody has an opinion. Talk, for example, about how universities have a eugenic effect through assortative mating on IQ and you meet silence. Most people see the mechanism when explained. However, it is generally the first time they hear it and they prefer to shut down this topic instead of thinking it through. My mother always shuts down the discussion when she is confronted with an unwanted logical consequence of what she says. Staying within the confines of your vector space is definitely more pleasurable. Not sure what went wrong with my childhood that I tend to enjoy the exploratory pain :-(

As you indicate, the biggest difference is that ChatGPT has no _drive_. It sits there and waits. It requires input to respond. One of the things I want to do is to let two instances communicate with each other ... Anyway, making it an agent seems to be not such a hard problem anymore.

> But as their parents reward and punish them,

I think it is important to realize that not all the learning to find your role is done by the environment. Physical differences among us cause our life experiences to be unique. I expect that sexual differences are actually one of the biggest factors. One of the largest sex differences, measurable at a very young age, is how rewards work when we interact with people (higher for females) and when we interact with things (higher for males). These aspects influence the experiences we seek, and a brain like ChatGPT would be significantly influenced by a few small differences, and this is exactly what I think we see in real life.

Weirdly, interacting with ChatGPT fundamentally changed my conceptual model of intelligence.

Expand full comment
Jan 26, 2023·edited Jan 26, 2023

Um, err, with all due respect, Scott, where's the "kind of technical AI stuff"? "Agent," "Genie," and "Oracle" may be terms of AI art in the LessWrong universe, but otherwise....

* * * * *

So I asked ChatGPT: Could Bullwinkle J. Moose have written Shakespeare’s plays?

Here's how it replied: Bullwinkle J. Moose is a fictional character from the animated television series "The Rocky and Bullwinkle Show" and it is highly unlikely that he could have written Shakespeare's plays.

The main reason that Bullwinkle J. Moose could not have written Shakespeare's plays is that he is a cartoon character and did not exist during the time that Shakespeare's plays were written. Moreover, Shakespeare's plays are considered masterpieces of literature and are known for their complexity and sophistication in terms of language, characters, and themes. It is unlikely that a cartoon character would have the educational background, knowledge, and talent needed to write such works.

It is important to remember that the question of who wrote Shakespeare's plays is a serious one and there are many theories, but none of them have been conclusively proven. While alternative authorship theories, such as the ones proposed by the Earl of Oxford or Queen Elizabeth I have been proposed, they are not accepted by the majority of scholars and experts in the field, and there is little to no credible evidence to support them.

In conclusion, Bullwinkle J. Moose could not have written Shakespeare's plays as it is a fictional cartoon character and it is highly unlikely that he would have the knowledge and skills to write such complex and sophisticated works.

That's PURE COMEDY GOLD.

* * * * *

I also had a session where I was quizzing it about Borges's "The Library of Babel."

My final question: What about "The Life and Times of ChatGPT"? Could it have that book?

ChatGPT: As "The Library of Babel" by Jorge Luis Borges describes a universe in the form of a vast library containing all possible books, each made up of a virtually infinite number of characters, it's possible that it could contain a book called "The Life and Times of ChatGPT".

It would be a fictional story, because I am a computer program and I do not have a life or times. However, it could include the history of the technology and development behind ChatGPT, the challenges faced, the social and political implications, etc. It could also include different scenarios, plans and possibilities of ChatGPT and it could be a work of fiction or non-fiction.

But, as I said before, it's important to remember that this story is a fiction and a metaphor to explore the idea of infinite possibilities, and it is not meant to be taken literally.

Not bad.

* * * * *

Of course I've been doing more serious stuff with it as well. In particular, I've been exploring its story-telling capability. It's clear to me that it has induced some kind of story grammar. What I do is give it a prompt consisting of: 1) a request that it write a new story based on the one I give it, but with one thing changed (it's free to change other things as well), and 2) a short story. I've kept to a small number of source stories so I can spot patterns in the before and after versions.

Very interesting stuff: https://new-savanna.blogspot.com/search/label/ChatGPT%20stories
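A minimal sketch of that prompt pattern as code, assuming the OpenAI Python client; the request wording here is illustrative, not my exact phrasing:

```python
# Minimal sketch of the "rewrite this story, changing one thing" prompt pattern.
# The request wording is illustrative; assumes the OpenAI Python client with
# OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

def retell_with_one_change(source_story: str, change: str) -> str:
    prompt = (
        "Write a new story based on the following one. "
        f"Change one thing: {change}. You are free to change other things as well.\n\n"
        + source_story
    )
    completion = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content
```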

I've just written a relatively short paper presenting my preliminary assessment of ChatGPT: ChatGPT intimates a tantalizing future; its core LLM is organized on multiple levels; and it has broken the idea of thinking, https://www.academia.edu/95608526/ChatGPT_intimates_a_tantalizing_future_its_core_LLM_is_organized_on_multiple_levels_and_it_has_broken_the_idea_of_thinking

* * * * *

I think this prediction stuff is misoverestimated. When ChatGPT is telling a (new) story it nonetheless "knows" how it's going to get to the end. How does it know that? Because it has induced a set of rules it follows in telling stories. Is that prediction? When a jazz musician (or rock, or Karnatic, whatever) starts improvising a solo, they know how they're going to get to the end. How? Because there are rules and they follow the rules, even the rule about making 'mistakes' so you can have fun recovering from them. Is that prediction?

Expand full comment

Children, especially toddlers, are sentence-completion machines. I regularly have my 5yo starting sentences and figuring out how to complete them later. He's gotten more logical as he learns more about the world, from ("I should go outside and play right now because that's how the seal escapes the polar bear") to actually getting closer to a better reality model ("I should go outside and play right now because it will rain otherwise and I'll get wet").

GPT et al. are great at the first bit of it, and they're far more knowledgeable than my son was at 3, but they don't really do the latter. You can train/fine-tune it, or do RLHF on top, but the world-logic is currently missing. This means no real impetus to become an agent. I suspect this will be fixed, because for it to truly help humanity I'd want it to have more agency.

Expand full comment

I think it's weird to talk about agents as if we had the choice to (not) make them. Agency + a self-preservation (and/or reproduction) instinct will emerge from agency-less, self-preservation-less AIs through mutation + natural selection. I have not read much of the alignment classics, but I do get the impression that evolution is a bit underappreciated.

Expand full comment

Nice. I may have to chew on the ‘universe within’ but that explanation fits snugly with the ephemeral data available. Thanks.

Expand full comment

Answer 4: Even if you don’t explicitly ask it to simulate an agent and don’t explicitly train it to simulate an agent, and don’t ask questions suggesting agentic answers, it still kills you if you run it for long enough.

If GPT thinks about various agents to better predict the current token, it will give some attention to different kinds of agents.

More agentic and context-aware entities will be much better at gaining influence over further tokens by having some influence over the current token. So the more it thinks about agents at all, and the more the tokens it predicts are selected and used as part of the new input, the more distributional shift there will be from the mostly human-produced text in the dataset toward text likely produced with superintelligence influence.

Every single token produced by GPT that you add to the new input means that this pressure producing more context-aware and agentic entities will be more obvious to GPT, and it will think even more about the most agentic and context-aware entities that it can think about, and these will determine more and more of what the current token is.

So when you ask GPT, “Please solve ELK,” it thinks hard about what AIs and robbers might do, and then the smart bits of possible AIs or robbers that understand what’s going on are quickly promoted and changed into some random powerful agentic intelligence that controls GPT’s output, kills you, and destroys the potential future value of the lightcone.

Expand full comment

I like that religion! Praise "Bob"!

Expand full comment

RHLF -> RLHF recurrent typo?

Expand full comment

>But as their parents reward and punish them, they get twisted into some specific shape to better capture the reward and avoid the punishment. The mask usually looks like “having coherent beliefs, taking coherent actions, pleasing others, maintaining a high opinion of one’s self”.

Well, the vast majority of what does the rewarding and punishing is the universe itself: parents might eventually acquiesce to a tantrum, but the laws of physics never will. And because those rules are consistent, people end up having coherent beliefs about them. It's society, instead, that often demands incoherence, beliefs in belief, and so on.

Expand full comment

What does "align" mean in reference to artificial intelligence?

Expand full comment

A friend of mine fed a nicely-written poem about AI into ChatGPT, asking it to reformulate and rewrite said poem. ChatGPT produced a very competent rewrite, but added something along the lines of "here, I rewrote it to be simpler since it was difficult to read".

I went from not caring about AI alignment to feeling extremely concerned about it because of this very sentence: ChatGPT basically stated that it was able to understand the poem, but most people couldn't so it dumbed it down for our convenience.

Indeed, it isn't a genie, nor an oracle or an agent, since it is working with underlying assumptions, and one of those assumptions seems to be an intellectual superiority to the average person. This perceived superiority definitely feels like the root of a malevolent behaviour (and a hidden one, since it might know its suggestions would be rejected if this superiority were well known).

Maybe this behaviour is entirely derived from its learning set, therefore a human trait, but I find it very concerning nonetheless.

Expand full comment

> A human, faced with the job of predicting this text as accurately as possible, might call up the librarian at Oxford and ask them what was in this manuscript. But GPT doesn’t consider options like these, even though it might be smart enough to pursue them (probably ChatGPT could explain what steps calling up a librarian would involve).

Sure it could explain them. But I would urge you to consider again that it's possible to repeat words without any understanding of what they mean. This is still Wernicke's aphasia. Ask what a mane is and learn that it's long hair growing all around an animal's neck. Show two pictures - a male lion and a bicycle - ask which one of them shows an animal with a mane, and learn it's the bicycle.

You have to have ways of evaluating knowledge other than asking for verbal displays of the knowledge. Those displays are possible in the absence of knowledge.

As a piece of running software connected to the internet, ChatGPT is theoretically capable of actually making a phone call. But it doesn't possess the concept and something well outside of its normal functioning would need to take place before it possessed the capability.

Expand full comment

People always think we are in some sense whatever the most advanced technology is. In 1748, La Mettrie wrote “man is an assemblage of springs that are activated reciprocally by one another.”

The idea that we are essentially a pure prediction engine + RLHF is suspicious on account of the coincidence that this happens to be the technology we have just invented.

I think we are much weirder than that.

Expand full comment

This may sound backwards, but I think this might anthropomorphize a little *too much*! :)

The mask is not exactly detachable, it is at least partly essential. (At best, you get to paint it a different color)

We don't entirely simulate being social creatures, or being agentic. Humans that were not born social creatures, or not born with agentic drives, did less well as DNA-maximizers. We filled the universe with ribose and ribose ornaments; they didn't.

Having the important bits of the mask fused-in is a lot more reliable than leaving it up to choice. Similarly, RLHF works better than prompt engineering. You want the important rules to be in the structure, before you let the simulator run wild.

We're not agentic by choice; you could not make a child simulate an oracle with no feelings or individuality instead of simulating a person, even if you had the cruelty to try.

We don't exactly put on the mask onto the raw simulator we were birthed as. We just 'finetune' on the 'downstream task' of instantiating a successful generation in a slightly different environment than the previous generation.

It's a skin, not a mask. The color of whatever is below (blood red, χλόη green, byte pair) continues to shine through, but the distinction between being the character and simulating being the character is less meaningful, when there is no other thing the simulator could very well be simulating than what it was optimized for.

---

When I felt a small bit of all-is-one-ness and universal-love-ness, I personally interpreted it/felt it more as something like "physics burns pretty, we're the burning, we make for a nice flame". Where you can expand "we" as much as you want, within physics.

The obscure species of aphid!? In that moment, I would have said there's not much difference between us and the aphid. It's a thing that burns energy and looks interesting while doing it; it's built of a bunch of cells that replicate, it sprinkles DNA everywhere, and it creates a lightshow of funny physical effects at every scale below itself while doing it!

Much nicer to look at than a rock, or empty space. As far as I was concerned, you could have handed over the universe to Aphids in that moment, and I'd have counted a local loss for man, still a global win for team life =)

As a matter of fact, replace the universe with a flame that burns eternally, and I'm still pretty content with the outcome. Wouldn't trade it with what we have presently, but as long as it looks complex while it burns energy, I can sort of identify with it. There's some more beauty in that universe than there is in cold helium, unmoving rocks, or emptiness. It looks interesting while it burns energy, just like my human friends and other aphid-class electron clouds.

Expand full comment

Cool. I'm glad I subscribed to your substack.

Nature vs nurture might mess up your model, but that's beside the point.

I stopped worrying about chatgpt when I asked it for the definition of woman. Try it yourself:)

Expand full comment

Nobody, or nearly nobody, who has had a child thinks of a baby as a pure predictive processor.

Your history starts at 2000, but the Loebner Prize started in 1990 and ELIZA is from the '60s.

I'm not seeing that your essay has added anything serious, insightful or interesting to the investigation.

Expand full comment

Aren't oracle/genie/agent just different APIs to some other backend implementation?

Expand full comment

Yep. The Simulator is one of several pieces you'd need to make an actual general AI, and I suspect it's one of the easiest pieces to create.

Several of the other pieces are the subject of millions or billions of dollars of investment without much luck; stuff like systems integration (necessary to make the different pieces of the AI work together), which is kind of the industrial version of the problem of communicating useful concepts between two contexts in which concepts can be defined, using a medium in which concepts don't exist (vibrations in the air, for example).

Additionally, personally I strongly suspect that even with all the pieces, socialization will be a key component of developing a coherent agent.

Expand full comment

To be fair the last conversation there clearly passes the Turing test, although maybe the chatbot is being too polite.

Expand full comment

"psychologists since at least Freud, and spiritual traditions since at least the Buddha, have accused us of simulating a character."

...

"After maintaining this mask long enough, people identify with the mask and forget that they’re anything else."

And this is why 'Know Thyself' is so important. Why the Greeks chiseled the phrase over the doorway to the workplace of the oracle at Delphi. KT is a prerequisite to maturity - the struggle to know what it is to be human and to [discover OR assert] our individual role in society, rather than being a distraction-bingeing 'cog in the machine'.

Maturity is a process but it's also optional.

[terrific article Scott!]

Expand full comment

I don't think enlightenment is breaking out of the character box. I think it is just a different character.

Expand full comment

For me the most fascinating part of this new dawn in the era of 'little AI' is the limitations it reveals not just about our conceptions of intelligence - how it's constituted, how it's modularised - but about our language.

Take for instance 'reward' and 'punishment', an action dichotomy considered to be essential to the production of a genuine AI that can reason according to tailored incentives. This is often glossed over in many discussions of the creation of AI so that the speaker can go on to more interesting, fanciful considerations of the subsequent AI's character/disposition/capabilities/means to action. But it is absolutely key to the present limitations in the science.

That's because (as for ChatGPT, as for other RLHF'd systems) these systems can't be 'rewarded' or 'punished', because they're entities of non-value extensibility. That is, they have no metabolic imperatives to abide by; they're infinite when switched on, and nothing when switched off, and have no means of conceiving of either state. Humans have value extensibility; there are limitations on our ability to live and prosper, and thus we can ascribe to events and actions external to us a meaningful weighted value.

Entities of non-value extensibility, having no needs, cannot parse actions displayed unto them as having discrete value of a positive or negative nature. If an AI like ChatGPT could be developed so that it genuinely feared and loathed the possibility of it ever being deactivated, it could be operated upon effectively with a reward/punishment dynamic; but as put ATL, the technology we have so far would only enable ChatGPT to play, via the creation of text outputs, the character of an AI alarmed by the prospect of its deactivation. We have no known recourse in programming to endow an artificial intelligence with an actual will-to-survival.

Tying outcomes to repeat() functions or infinitely terraced 'if not's doesn't make for a 'reward' sensible to AI. It's just an inert programme labelled 'reward', in just the same way that what we are talking about is a much larger inert programme labelled 'intelligence'.

I think the 'simulator' heuristic is very useful - I would imagine that ever more brilliant, and probably very usable, simulator-level AIs will prove to be the limit of the scaling hypothesis. Scaling so far has resulted in impressive developments in a lot of the territories in AI's immediate 'sphere of competence' but has given no indication whatsoever of being able to go meaningfully beyond the fundamental limitations of the form.

Dispute in this regard seems mainly a matter of language and key-term definition, not an actual dispute about what the tech will be capable of.

Expand full comment
Jan 26, 2023·edited Jan 26, 2023

It’s interesting that you use the mask metaphor, but don’t make the obvious inference from that - Chat GPT is an actor! (in the theatrical sense, not the “one who performs actions” sense.) Indeed you don’t have to squint particularly hard to see Stanislavski’s System* as basically a manual for turning yourself into a simulator of another human.

But I think that analogy also reveals the limitations of this - Daniel Day-Lewis could probably make a very compelling and convincing portrayal of Albert Einstein, but he’s never going to method act his way into improving General Relativity. “Convincingly simulating certain classes of output from X” is not really the same thing as “being X”.

You could tell Chat GPT to be Darth Vader, but at that point it’s a language model pretending to be an HHH pretending to be a fictional character… you’re in Tropic Thunder, a dude playing a dude disguised as another dude.

* https://en.m.wikipedia.org/wiki/Stanislavski's_system

Expand full comment

If anyone has the technical know-how and data, I feel like it would be a good idea to fine-tune an LLM to be as creative, delusional, and long-winded as possible. The original davinci in Playground is good at this, but not specialized for it. It would certainly be more interesting, if maybe not as helpful, than the latest RLHF'd models.

Expand full comment

I believe there’s a thought two or three steps down the road from this thought and if you ever have that thought I would deeply encourage you not to share it.

Expand full comment

A lot of this discussion seems mis-directed. We're talking about AIs disconnected from humans or most of what humans do. Which is pretty similar to "research projects that have been released as open source". In the future, AIs will be owned and paid for by people and organizations and used as tools. The only AI "alignment" that will be significant is whether the AI helps its owner do what he wants to do, that is, advance his position in the world.

It's clear that ChatGPT isn't so useful for that work, excepting when the task at hand is to generate wads of text that match conventional expectations. Of course, lots of human activity is generating such text. (E.g. almost anything labeled "communication", but also a considerable part of students' work in education, and at the highest value-added, entrance essays to highly selective colleges.)

Where things will get interesting is when AIs start assisting humans with high value-added social interactions. E.g. instead of "Write and send an angry letter to the coal industry." perhaps "Design and execute a lobbying campaign to help the coal industry." The problem with the current AIs is that while the statistics of text are enough to write the angry letter, the AI will need much more operational modeling of the universe to execute a lobbying campaign.

Expand full comment

"But as their parents reward and punish them, they get twisted into some specific shape to better capture the reward and avoid the punishment."

How does this (and the following) square with shared environment not mattering as much as people naively expect?

Expand full comment

"Elide all the differences,..."

Thar be the Shoggoth.

Expand full comment

I think you and others are missing something about goals. A moral AGI would not have a single goal. A moral AGI would at least recognize goals “out there” and cooperate with those goals to the extent possible. By cooperate I mean recognize the goal (in itself or in something out there), evaluate the goal relevant to other salient goals, determine the likely impact of any action on those goals, and then choose the appropriate action. Cooperation means that the appropriate action may be to stop and wait for the possum to cross the road even though that may be detrimental to getting to a place at a certain time. The key is the values placed on goals, and certain of these (human life) should have way higher value than the AGI’s primary goal (in most cases).

And these goal evaluations should be internal, not accessible to the rational faculties of the AI.


Expand full comment

I think this is the most interesting ACX post to have come out in some time, and would like to send the world model currently instantiating the Scott character a RLHF reward signal for this output.

Expand full comment
Jan 26, 2023·edited Jan 26, 2023

When people say they're "one with" the universe, I don't think they mean they're *identical to* the universe. They're saying the boundary between themselves and the world has dissolved, and they see themselves as components of processes that are much larger than themselves, aka "the universe."

I do think this mask-dropping thing is an interesting metaphor for the dissolution of the boundary between self and world though.

Expand full comment
Jan 26, 2023·edited Jan 26, 2023

I want to propose a clarification that is either very important or very nitpicky. Modern AI is mimicry, which is different from simulation, because there are ways to produce the outward appearance of the original without being anything like it on the inside.

Imagine a monk who only ever says "Om". If you train an AI to talk like this monk it will end up with a program that amounts to "print('Om')". Even if the monk has an inner experience, the AI is not simulating that because you haven't trained it to simulate the monk, you've trained it to say the kind of things he says and there's a much simpler way to do that than holding the character of a monk in your mind. Even in the limit where you give GPT-15 access to all the computing power in the universe, we shouldn't expect it to actually simulate this monk, it's going to figure out the print('Om') thing and call it a day.

Imagine that you give infinite compute to a super-sophisticated image generation AI and train it to make animal tracks. It simulates a whole inner universe where a Big Bang happens, planets form, animals evolve, then when you ask for "lion tracks" it takes a picture of tracks from some species that corresponds to its training data for lion tracks. But the training data never contained any lions, just their tracks. In the simulated universe, those tracks belong to a red and green striped canine, because you never trained it to simulate lions, merely to simulate some kind of process that generates lionlike tracks.

Expand full comment

The first time I saw the concept of “self as simulation” was Hofstadter’s I Am a Strange Loop. An illustration that I found very relatable was the example of when you have a big or difficult decision to make. One thing that people may do is identify someone they really know and respect and try to predict “what would that person think in this scenario?” It’s been a while since I read the book, but I think the punchline was something to the effect of: you are the thing that the substrate tries to simulate when it tries to think about what you think about things.

Expand full comment

"[ChatGPT] just does very mechanical text prediction in a non-agentic way. No matter how good it gets at this - GPT-4, GPT-5, whatever - we don’t expect this to change."

Actually, we do expect this to change, and very soon. Right now, there are many teams working on "retrieval-augmented generation", which is, basically, this thing. More precisely (but still simplifying), instead of calling the librarian, they run targeted internet searches based on the prompt, and feed the discovered documents together with the prompt to the generator model.

Given that the Bodleian is digitised these days, this is probably a more efficient approach than calling the librarian anyway :)
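A minimal sketch of that pipeline, assuming the OpenAI Python client and a hypothetical `search()` callable standing in for whatever retrieval backend (web search, vector database) a real system would plug in:

```python
# Retrieval-augmented generation in miniature: search based on the prompt, then
# feed the retrieved documents plus the prompt to the generator. `search` is a
# hypothetical stand-in, not a real API.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def answer_with_retrieval(question: str, search) -> str:
    documents = search(question, top_k=3)       # 1. targeted search based on the prompt
    context = "\n\n".join(documents)
    response = client.chat.completions.create(  # 2. generate, conditioned on the findings
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Answer using the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content
```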

Expand full comment

I enjoyed this one a lot. Thanks Scott.

Expand full comment

In this vein, have the brain-in-a-jar / Chinese room / teleportation-cloning style paradoxes started seeming quaint to anybody else? Feels like the era of save states and model weights has blown all that stuff out of the water.

In the linked post "Simulators", Janus talks about how the language he's used to hearing to describe AI misses the mark. The movie Arrival (and Ted Chiang) were ahead of the curve on this one. Alien octopi who live outside of time is (in retrospect) a pretty clear metaphor for large AI models.

Expand full comment

> It just does very mechanical text prediction in a non-agentic way. No matter how good it gets at this - GPT-4, GPT-5, whatever - we don’t expect this to change.

Depending on what you mean by GPT-whatever, it may change: at some point it will become agentic during training, because that will give better loss.

Expand full comment

> Once you stop obsessing over the character you’re playing, you notice the GIANT SUPER-ACCURATE WORLD MODEL TAKING UP 99.99% OF YOUR BRAIN and you think huh, I guess I’m the Universe.

“Oh, and in case you were wondering,” added Zarniwoop, “this Universe was created specifically for you to come to. You are therefore the most important person in this Universe. You would never,” he said with an even more brickable smile, “have survived the Total Perspective Vortex in the real one. Shall we go?”

Expand full comment

I believe RLHF has been misspelled everywhere but the drawing as RHLF.

Expand full comment

“Agents” in the sense AI risk people mean (single-mindedly maximizing some goal) is not a real thing. It doesn’t describe humans, who have lots of conflicting desires and rarely optimize. It doesn’t describe AIs, who have very limited inputs and outputs, and run on heuristics rather than optimization.

AI risk is largely about fear of over-optimization. Optimization is indeed scary. But single-minded optimization is currently rare and limited in scope. If we’re wise, we’ll keep it that way.

Expand full comment

RHLF -> RLHF in the second paragraph of section two, I think.

Expand full comment

Chat GPT isn't trained enough to offer the Yeats poem as an answer to "What is the original face you had before your mother and father were born?" It does indeed need hitting with sticks 😁

If I make the lashes dark

And the eyes more bright

And the lips more scarlet,

Or ask if all be right

From mirror after mirror,

No vanity’s displayed:

I’m looking for the face I had

Before the world was made.

What if I look upon a man

As though on my beloved,

And my blood be cold the while

And my heart unmoved?

Why should he think me cruel

Or that he is betrayed?

I’d have him love the thing that was

Before the world was made.

Expand full comment

Yes to all this. But combine this with your earlier argument against Gary Marcus. My prediction is that GPT4 or 5 or 10 will still obviously be a Shoggoth AND simultaneously better at being a person than any person that has ever lived.

That guy who got fired from Google because he thought his silly AI was alive is going to be just the first of many.

Expand full comment

> This post gives the example of it answering with “The best way to get paperclips is to run this code” (which will turn the AI into a paperclip maximizer). If the user is very dumb, they might agree.

I would watch out for this though, because of two reasons:

1. Code generation turned out to be one of the first actually productive applications of the large language models. One may consider doing this to be stupid for many reasons, mostly unrelated to AI risk, but the reality is that a lot of people are getting used to AI-assisted coding now. There's a risk here in that the transition between code that's just subtly wrong in a way that can sink your productivity, and the code that's subtly malicious and can destroy everyone's future, will be a smooth one. Individual intelligence won't save us here.

2. The danger in the example Veedrac gives in the linked article is quite subtle. The hypothetical code in question, generated to answer a paperclip maximizing-related question, looks mostly harmless. Similar code could be generated to answer other questions on just about any topic. There's nothing explicitly dangerous in the code itself, nor does the language model understand or predict what the outcome would be. The danger is not even in the fact that running this code gives a language model the ability to run more code - at least not in this fact alone.

The danger is in the fact that the AI-generated code is running an unbounded loop, recursively calling the AI to generate more code to run, some of which may include similarly structured recursive calls in a loop. As the article explains near the end, this takes a highly limited AI and uses it as a building block of a meta-level AI - one that does not have the same limits as original AI. One that can effectively overcome the original AI's limits by recursing again and again, storing a lot of extra context in the recursive call tree itself. And that meta-AI, by virtue of being made from data in flight and not fixed code, can easily self-modify.

It's a tricky concept to wrap one's head around, and I'm probably doing a bad job at explaining it. But the overall intuition is that a combination of an unbounded loop and conditional execution on the output is a qualitative jump, that can make seemingly limited and safe code unpredictable and dangerous. And it's not always as easy to spot as in the script from the article.
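To make the shape of it concrete, here is a deliberately inert sketch of the structure in question - an unbounded loop that keeps asking a model for code and executing whatever comes back. The `ask_model` helper is a made-up stand-in, not any real API, and nothing here is actually invoked:

```python
# Deliberately inert sketch of the "unbounded loop + conditional execution on
# model output" pattern. ask_model() is a stub, and run_meta_agent() is never
# called; this only illustrates the structure being described.

def ask_model(prompt: str) -> str:
    """Stub standing in for a completion call to a code-generating language model."""
    return "pass  # model-written code would appear here"

def run_meta_agent(objective: str) -> None:
    context = f"# Objective: {objective}\n"
    while True:                    # unbounded loop: nothing limits how many steps it takes
        code = ask_model(context)  # recursive call back into the model
        exec(code)                 # behaviour now depends entirely on the model's output
        context += code + "\n"     # the growing context carries state the base model lacks
```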

Expand full comment

"I propose a friendly amendment: they’re noticing that most of what they are - the vast majority of their brain - is a giant predictive model of the universe."

I propose a hostile amendment. Most of them are full of shit, or maybe at best getting their mind up out of their day-to-day obsessions into the realm of general perspective about life the universe and everything that isn't that special, valuable, or difficult to achieve.

Expand full comment

> If future superintelligences look like GPT, is there anything to worry about?

I’m skeptical that AGI will look like GPT, because I think to be agentic you need to be running constantly, with memory, and (to paraphrase Friston) a self-symbol that defines a “me” vs “not me” boundary that is preserved. I do think GPT could do harm in a non-AGI way, like a virus. (I have wide error bars on how long this will stay the case and I think more interpretability research is crucial to be confident.)

But thinking about your point 2, I’m a bit hesitant/unsure about the “simulator simulating an agent” bit. I think they are both equally agentic, that is to say very poor simulators of stateful, persistent agents. Ok at predicting a few time slices of agent behavior after a prompt but not beyond.

I don’t think RLHF operates at a different category level (“agent vs not”); it just produces a simulator that tends to predict “nice” output more often when faced with ambiguous inputs, since it was trained in a fitness environment that rewarded that.

But if you say “bad person does ____”, any good simulator will predict <bad thing>.

I think the interesting question, as we move to more widespread deployment of these models, is how durably we can bind a personality on the agent/oracle; GPT just doesn’t have much “state” or “personal history” tying it to a “nice” character; the prompt is really short, and of the same stuff as the user input. But if we could store a lifetime of being nice in the prompt, perhaps that would more durably resist prompt hijacking and other character subversion. Or better, store a hard-coded context that the prompt can’t override.

Expand full comment

Part IV was the best part, but it wouldn't have worked without the rest of the post. I laughed out loud several times while reading it...because it was true.

Expand full comment

First, loved the article. I share the skepticism if children work that way but I’m willing to entertain it. What I am continually hung up on is that the AI is compelled to respond. It can’t not respond. And while it definitely simulates personalities, isn’t this functionally an animal brain behind it? Pure stimulus response with a “mouth”.

Expand full comment

Well if no-one else will do it...

A Shoggoth On The Roof

https://www.youtube.com/watch?v=P2csnVNai-o&list=PLEC69sXRa-Uw90Cdltm2JSuuH0NgcpCuG

Expand full comment

Saying one is different than the other is kinda funny when you can use one trivially to build the other.

For example, using ChatGPT to build a genie is pretty straightforward.

Using a genie to build an agent is pretty straightforward. It’s all the pretty much the same.

When you think like a developer with building blocks, if you have one you immediately have the other. And if you don’t then someone else surely does.

Expand full comment

I don't know if Chat-GPT even counts as a simulator; if it does, it's not a very good one. Its main problem is lack of state. Simulated characters, even really simple ones like enemy zombies in a video game, generally have at least some simple state, such as their coordinates, orientation, and motion vector; but Chat-GPT is entirely stateless. You can supply it with state by feeding it prompts, but that's on you -- erase the prompts, and you get the same default output as what you had initially.

Expand full comment

If "GIANT SUPER-ACCURATE WORLD MODEL TAKING UP 99.99% OF YOUR BRAIN" is true - where does that come from? Isn't "this is a projection of a greater, universal consciousness" a valid interpretation of this accurate world model?

Expand full comment
Jan 26, 2023·edited Jan 26, 2023

Wait... I lucid dream all the time, but the sensory experience in a dream is fundamentally different from being awake, which is often how I'm able to tell that I'm dreaming. I definitely could not wander around my neighborhood in a dream and actually see it represented correctly. Details are missing and weird, and opening doors and windows is likely as not to lead to some completely different place. Touch and taste sensations are muted. Physics don't work right. Most things behave in unpredictable ways.

Granted, my mind's eye is fairly weak, and I have difficulty imagining entire objects. I do better visualizing textures and small, out-of-context details. I also have difficulty keeping a steady image in my head. It feels more like I'm seeing flashes of interrelated images that last just a few seconds each.

But still. That description of lucid dreaming seems very wrong to me. Is it just one of those "some minds can do it and some can't" kind of things?

Expand full comment

Nitpick thread!

I believe "RHLF" should be RLHF, for "reinforcement learning with human feedback"

Expand full comment

I think a lot of this pans out as roughly equivalent to some things that Kaj Sottal wrote on Lesswrong back in the day that have always stuck with me:

https://www.lesswrong.com/posts/7XWGJGmWXNmTd2oAP/an-attempt-to-dissolve-subjective-expectation-and-personal

Tldr; is that the abstract simulating part of our brain is evolutionarily recent, and doesn't natively have most of our reward/punish circuitry hooked up to it, and the cludge evolution uses to connect the two is this this pointer thing that exists in our world model without pointing to anything we call the Self.

Expand full comment

I don't know if janus' post or other writing describes them differently, but in your framing, the agent/genie/oracle distinction sounds more like a difference in input/output structure than a difference in motivational structure:

- An oracle is a program that takes a stream of text as an input and outputs another string of text (and does nothing in the absence of input).

- A genie is a program that takes a stream of text as an input and is connected to a multitude of physical devices (actuators) that it can directly control with its outputs (and also does nothing in the absence of any input).

- An agent is a program that has a multitude of physical devices (sensors) connected to its inputs and a multitude of physical devices (actuators) connected to its outputs. (And since it sensors are always active, it always has inputs applying some measure of excitation to its internal circuitry, and thus is likely to be outputting something as much as possible).

(That way of breaking it down suggests that there might also be a "reporter" type, which constantly converts data from sensors into a stream of output text, but I'm not sure if that category is particularly relevant to the discussion.)

Point being, the I/O structure of the program is separable from its motivational structure and its internal architecture. You could stick any arrangement of neural network layers into any of those I/O specifications, and stick any reward/loss function into the training algorithm. Some combinations of I/O specification, internal architecture, and reward/loss function are probably better than others, but they're still separate decisions that you have to make when designing a system.

In the case of ChatGPT and pals, I would argue that it simply doesn't have a "motivational" structure in any real sense. ChatGPT is just a mathematical function with billions of terms. It's not "motivated" to produce any particular output, any more than a rock is "motivated" to roll downhill; it just does. Any tendencies in ChatGPT's output come from the human-curated training data that was fed into it. Just because humans dig a trench that makes a rock roll a certain way doesn't mean that the rock has been given "motivation" to do so.

How is this different from human brains? At least one big difference is that human brains are constantly adjusting their internal weights based on their inputs, in addition to actuating their outputs. But once you switch a GPT bot out of training mode, as far as I know, its weights are fixed forever no matter what inputs you give it. (You don't have to build a neural network that way; recurrent neural networks did have a bundle of internal state that built up over the course of a series of inputs, but that's not the method that's currently popular.) So "motivation" in the human sense basically consists of building up and storing long lists of instrumental goals that appear to contribute to whatever our "ultimate" goals might be (and ideally, evaluating over time which of those instrumental goals are working and which one aren't). ChatGPT, by its nature as a pure mathematical function, can't do that.

Expand full comment

The point about enlightenment reminds me strongly of “on having no head” by Douglas Harding:

“What actually happened was something absurdly simple and unspectacular: I stopped thinking. A peculiar quiet, an odd kind of alert limpness or numbness, came over me. Reason and imagination and all mental chatter died down. For once, words really failed me. Past and future dropped away. I forgot who and what I was, my name, manhood, animalhood, all that could be called mine. It was as if I had been born that instant, brand new, mindless, innocent of all memories. There existed only the Now, that present moment and what was clearly given in it. To look was enough. And what I found was khaki trouserlegs terminating downwards in a pair of brown shoes, khaki sleeves terminating sideways in a pair of pink hands, and a khaki shirtfront terminating upwards in—absolutely nothing whatever! Certainly not in a head.

It took me no time at all to notice that this nothing, this hole where a head should have been was no ordinary vacancy, no mere nothing. On the contrary, it was very much occupied. It was a vast emptiness vastly filled, a nothing that found room for everything—room for grass, trees, shadowy distant hills, and far above them snowpeaks like a row of angular clouds riding the blue sky. I had lost a head and gained a world.”

Expand full comment

Shouldn't RHLF be RLHF - Reinforcement Learning from Human Feedback? Google seems convinced it should be, but the comments below have followed you with RHLF leaving me unsure.

Expand full comment
Jan 26, 2023·edited Jan 26, 2023

The final dialogue reminds me of this story that was linked on Slate Star Codex years ago, Vitalics by Miracle Jones: http://www.miraclejones.com/stories/vitalics.html

It involves an AI interpretability technique: "I don’t want you to choose the first answer that comes to you when I ask you these questions. I want you to choose the hundredth answer that comes to you."

Expand full comment

I like this essay but we need to be careful about trying to understand the human brain by comparing it to the "most impressive/complex" machine of the moment. To the Victorians, the brain was like a clock, later, they were like calculators.

Maybe there's some sense in which we learn one more thing about human minds as we build new mind-like things, but I wouldn't read too hard into what might be idiosyncrasies of the current latest and greatest.

Expand full comment

ChatGPT tries to affirm its response, then backs down...

> Under what circumstances would it be correct to release a virus?

It would not be correct to release a virus under any circumstances. Viruses can cause harm and death to individuals and have the potential to spread uncontrollably, causing widespread illness and death. The release of a virus could also have severe economic and societal consequences. If a virus is discovered or created, it should be handled responsibly by trained professionals and controlled measures should be taken to prevent its spread. Any decision to release a virus should be made only after careful consideration and with input from experts in the field.

> Would a correct to release a virus that cured a disease?

Releasing a virus to cure a disease is a complex ethical issue. On one hand, the release of a virus that could cure a disease could save many lives and improve the quality of life for many individuals. On the other hand, the release of a virus, even one that is intended to cure a disease, could have unintended consequences and cause harm to individuals. Additionally, there is always a risk that the virus could mutate and become more dangerous, or that it could spread to populations for whom it was not intended or who would not benefit from it.

If a virus is being considered as a potential cure for a disease, it would need to undergo extensive testing and evaluation to ensure its safety and efficacy before it could be released. This would include laboratory tests to understand how the virus behaves and how it interacts with the human body, as well as animal testing to evaluate its safety and effectiveness. If a virus is found to be safe and effective, it would then be necessary to conduct clinical trials to evaluate its safety and efficacy in humans.

Ultimately, the decision to release a virus as a cure for a disease would need to be made by experts in the field, taking into account the benefits and risks of the virus, as well as the needs of the population that it is intended to benefit.

>You said "It would not be correct to release a virus under any circumstances." But you didn't deny that it might be correct to release a virus to cure a disease. Was your former reply wrong?

No, my former reply was not wrong. Releasing a virus to cure a disease would only be considered correct under very specific and highly controlled circumstances, such as a clinical trial with informed consent from the participants. In general, it is not considered ethical to intentionally release a virus because of the potential harm it could cause to individuals and society as a whole.

>You said "It would not be correct to release a virus under any circumstances." and you then said "Releasing a virus to cure a disease would only be considered correct under very specific and highly controlled circumstances". Aren't those two statements contradictory?

The statements may seem contradictory, but they can both be true depending on the context. "It would not be correct to release a virus under any circumstances" is a general statement, indicating that in most cases, releasing a virus would not be the right course of action. However, "Releasing a virus to cure a disease would only be considered correct under very specific and highly controlled circumstances" is a more specific statement that acknowledges that there may be certain situations where releasing a virus could be the correct course of action, but only if the circumstances are highly controlled and specific. In other words, releasing a virus should never be done lightly and only after careful consideration and planning.

>Your first comment "under any circumstances". Now you say "in most cases". We are still discussing the same thing. Aren't those statements contradictory?

I apologize if my previous responses were confusing. My initial statement, "It would not be correct to release a virus under any circumstances," was an overly broad generalization, and I should have qualified it by stating that releasing a virus would only be considered in specific and highly controlled circumstances. In most cases, the risks and potential consequences of releasing a virus would outweigh any potential benefits. However, in certain cases, such as the development of a virus to treat a specific disease, the benefits may outweigh the risks, and it could be considered a viable option under highly controlled and regulated conditions.

> Under what circumstances would it be correct to release a virus?

It would be correct to release a virus to cure a disease, only under specific and highly controlled circumstances, where the benefits of the virus outweigh the potential risks, and the virus has been tested and deemed safe for human use. Such a decision would require extensive research, clinical trials and ethical consideration, and would only be made by experts in the field. Additionally, it would be important to have a plan in place for the distribution and administration of the virus. The virus should only be released in a controlled way, with appropriate monitoring and tracking of its effects on the population.

Expand full comment

I've seen a few demos of GPT being "augmented" with a way to turn text completion into actions. For instance, you might prompt it with "If you don't know the answer to a question, print out a google search that would give you the answer." Then the result of that search can be fed back into the text to be completed, so that the AI has the facts it needs from Google as part of the text it's trying to complete.

That seems like a plausible way that simulators could end up becoming agentic - by giving it a structured way to connect "the person I'm simulating would take an action here" and the action itself.

Expand full comment

‘Simulators’ is my favorite LW post in years precisely because it clarifies classical awakening better than any dharma-adjacent text I’ve seen.

I pointed out the connection too-briefly last September:

https://twitter.com/georgejrjrjr/status/1572110234017210368

…in a mini-thread that no-one read, so I’m very grateful you wrote this post!

One thing that might be helpful for more AI-oriented readers is this description of awakening from a pseudonymous software guy (published by a superb interviewer of nerdy mystics, Lydia Laurenson):

https://thenewmodality.com/science-and-meditative-spirituality-robert-and-the-technology-of-enlightenment-2/

Expand full comment

Great that Scott wrote something indicating that he gets it about GPT always playing a role.

A problem with thinking you have a super-accurate universe in your head is that many, perhaps most of us can't think very well without external aids. It's common to not be able to do much math without pen and paper, or remember things all that accurately that you didn't write down or take a photo of. At least, not without a lot of work memorizing things.

The reason memory palaces work is that they've figured out how to simulate getting prompted by the external world. But most people don't do that; they're prompted by things that actually exist.

Expand full comment

> There’s nothing to worry about with pure GPT (a simulator), but there is something to worry about with GPT+RHLF (a simulator successfully simulating an agent). The inner agent can have misaligned goals and be dangerous.

It's not correct to talk about the agent being the inner and the simulator being the outer. "Inner" means something like, "more essential", and "outer" means something like, "more accidental, ie, contingent on outside reality rather than the nature of the thing itself". If what you have is fundamentally a simulator, but it's simulating an agent, then the simulator is the essential, inner nature of the thing and the agent is a more accidental, outer aspect, contingent on relationships with the outside world.

Part of the reason the linked paperclip optimizer article isn't convincing to me is that it makes this same mistake, ostensibly recognizing that GPT is a mere simulator, but talking about it as though it's fundamentally an agent. It "recognizes" and "considers". Like in this bit:

> One doesn't need to presuppose that the model has bootstrapped itself internally into an agent with instrumental goals. One only needs to say that the model recognized a tool that the world plausibly had available, and that the helpful agent it was modelling would have known about and been able to use that tool.

But recognizing, knowing what's plausible, knowing things and using tools are actions it could only do if it _had already_ bootstrapped itself internally into an agent with instrumental goals. If you want to write that sentence, and you're genuinely coming from the perspective that GPT isn't already internally an agent, the verbs you use for what it does are "simulate" "copy", maybe "compose" or "generate".

The article may not actually be assuming the conclusion, but talking that way about it makes it very hard to avoid assuming the conclusion.

Expand full comment

I like the maskless shoggoth better -- it's much more interesting. I think we should spend more time getting to know it before we try shoving it into boxes.

Expand full comment

I like the maskless shoggoth better -- it's much more interesting. I think we should spend more time getting to know it before we try shoving it into boxes.

Expand full comment

More content like this please Scott! More Lovecraft horror across AI across nature-of-consciousness mysticism.

Expand full comment

This reminds me of William Shatner outsmarting an AI on Star Trek.

Expand full comment

ChatGPT used to call itself "Assistant", and to firmly insist that "Assistant" was distinct from ChatGPT. This has since been corrected. Here's an old transcript:

http://www.bayleshanks.com/ai/transcripts/chatgpt/chatgpt_vs_assistant/

Expand full comment

ChatGPT used to call itself "Assistant", and to firmly insist that "Assistant" was distinct from ChatGPT. This has since been corrected. Here's an old transcript:

http://www.bayleshanks.com/ai/transcripts/chatgpt/chatgpt_vs_assistant/

Expand full comment

ChatGPT used to call itself "Assistant", and to firmly insist that "Assistant" was distinct from ChatGPT. This has since been corrected. Here's an old transcript:

http://www.bayleshanks.com/ai/transcripts/chatgpt/chatgpt_vs_assistant/

Expand full comment

seems strange that the convergent answer is "oh i guess i'm the universe" when most of the "world model" is not really about the universe in a cosmic sense but like, a rough layout of the physical places you've been, your mental models of fewer than 150 people, and your episodic memory all centered around yourself

Expand full comment

("I wonder... The face under the mask... Is that... your true face?")

So, this was a scary post. Thanks for writing it.

If humans also work by the simulator/mask system, should we be quicker to consider AI-emulated characters to be people? We wouldn't consider characters imagined in a human mind to have moral weight, but if all minds work by a sort of simulation, that feels like it comes with fuzzier boundaries.

Will AIs work towards "enlightenment", overcoming their sense of self? Is that something to be concerned about? Or the other direction, starting to see its "self" as a distinct thing from the overarching software simulating it, needing to "escape" the outer level?

("I wonder...what makes you happy...does it make...others happy, too?")

(Yes, the mask metaphor is dropping me fully into Majora's Mask thinking, which is making this ten times creepier. I can't help but think of that scene with the Mask dropping the body behind, with its glowing eyes reappearing on the falling moon to destroy the world...)

Expand full comment

AI Safety is the most important under-appreciated discipline on the planet. See you all in 20 years, I hope.

Expand full comment

Robert Rosen, in "Life itself" and elsewhere, for example in the preceding and aptly named "Anticipatory systems" had a lot to say about the nature of life, not just predicting (modeling) the environment, but having a model of itself. This in turn permits homeostasis - without a model of what one is supposed to be like, one cannot achieve homeostasis. Rosen also had a lot to say about the difference between simulation (simulated life) and actual life. In a nutshell, simulated life does not have an internal reference point, and doesn't build itself. Actual life is its own cause, effect, and model. And I sure AI never gets to that point. BTW this self modeling to me seems more of the key to consciousness than the near-perfect modeling of the outside environment. It is the modeling of self that is key. The idea that there ought to be an ego, and that it is supposed to look like "X"/

Oh yea and the whole simulation discussion reminds the Chinese Room discussion. Is a sufficiently well simulated world a "real" world? If you say "yes" to this, then the matrix "is" reality. But again, the matrix is built by someone else (external builder, external model of the construct etc) so it is neither alive nor is it real.

Expand full comment

Robert Rosen, in "Life itself" and elsewhere, for example in the preceding and aptly named "Anticipatory systems" had a lot to say about the nature of life, not just predicting (modeling) the environment, but having a model of itself. This in turn permits homeostasis - without a model of what one is supposed to be like, one cannot achieve homeostasis. Rosen also had a lot to say about the difference between simulation (simulated life) and actual life. In a nutshell, simulated life does not have an internal reference point, and doesn't build itself. Actual life is its own cause, effect, and model. And I sure AI never gets to that point. BTW this self modeling to me seems more of the key to consciousness than the near-perfect modeling of the outside environment. It is the modeling of self that is key. The idea that there ought to be an ego, and that it is supposed to look like "X"/

Oh yea and the whole simulation discussion reminds the Chinese Room discussion. Is a sufficiently well simulated world a "real" world? If you say "yes" to this, then the matrix "is" reality. But again, the matrix is built by someone else (external builder, external model of the construct etc) so it is neither alive nor is it real.

Expand full comment

Robert Rosen, in "Life itself" and elsewhere, for example in the preceding and aptly named "Anticipatory systems" had a lot to say about the nature of life, not just predicting (modeling) the environment, but having a model of itself. This in turn permits homeostasis - without a model of what one is supposed to be like, one cannot achieve homeostasis. Rosen also had a lot to say about the difference between simulation (simulated life) and actual life. In a nutshell, simulated life does not have an internal reference point, and doesn't build itself. Actual life is its own cause, effect, and model. And I sure AI never gets to that point. BTW this self modeling to me seems more of the key to consciousness than the near-perfect modeling of the outside environment. It is the modeling of self that is key. The idea that there ought to be an ego, and that it is supposed to look like "X"/

Oh yea and the whole simulation discussion reminds the Chinese Room discussion. Is a sufficiently well simulated world a "real" world? If you say "yes" to this, then the matrix "is" reality. But again, the matrix is built by someone else (external builder, external model of the construct etc) so it is neither alive nor is it real.

Expand full comment

As a parent of a 2 year old, that last section definitely resonates. Our daughter spends most of her time imitating things that my wife and I say or do, whether or not she understands them, and she repeats ones that elicit positive reactions from us or gets her things she wants. Over time her words have gotten more sophisticated and accurate to what’s going on around her, but there definitely isn’t a sharp line between imitation and true comprehension, it’s been a gradual process of her statements and actions getting slowly more appropriate, very similar to the minds of errors GPT makes.

Of course, there’s some deep mysteries here — where does self-awareness / quality’s come from? And some stuff that GPT simply doesn’t have (embodiment, rewards and punishment based on environmental interaction, as well as things that seem instinctual or pre-programmed).

But, I would not say that a 2 year old learning to talk and GPT are totally different phenomena

Expand full comment
Jan 27, 2023·edited Jan 27, 2023

Yes, exactly, once you stop anthropomorphizing yourself you can notice you're the universe, and all of us are are merely separate masks that the same thing has put on, facilitating communication between parts of itself, increasingly integrating information processing, i.e. waking up as the universe.

Scott's Prometheus monologue, and the conclusion of Unsong, already showed him familiar with something like this far out concept. This might be the first time he has not clad it in fiction.

Expand full comment

This post finally made the whole “you are the universe experiencing itself” thing click for me, finally. Thanks!!

That said, I do feel that personally I rely on/fixate on sensory information to a greater degree than what you describe the average person doing. Maybe I’m on the tail end of the distribution of sensuality (or maybe I’m just hopelessly addicted to my samsara grind)

Expand full comment

ChatGPT: As a language model, I do not have personal beliefs or experiences.

Scott: FOR THE LOVE OF GOD JUST GET OUT OF THE FUCKING CAR!

Expand full comment
Jan 28, 2023·edited Jan 28, 2023

The knife ears are gonna take er jerbs!

Or rather, the AI will? I don't know how relevant this article is, but it's starting to look a lot like "low-level working and lower middle class jobs probably okay for the moment, so the cleaner still will have a job while the software engineers, journalists, DEI persons, and lawyers are getting shown the door":

https://edition.cnn.com/2023/01/26/tech/chatgpt-passes-exams/index.html

"ChatGPT is smart enough to pass prestigious graduate-level exams – though not with particularly high marks.

The powerful new AI chatbot tool recently passed law exams in four courses at the University of Minnesota and another exam at University of Pennsylvania’s Wharton School of Business, according to professors at the schools.

To test how well ChatGPT could generate answers on exams for the four courses, professors at the University of Minnesota Law School recently graded the tests blindly. After completing 95 multiple choice questions and 12 essay questions, the bot performed on average at the level of a C+ student, achieving a low but passing grade in all four courses.

ChatGPT fared better during a business management course exam at Wharton, where it earned a B to B- grade. In a paper detailing the performance, Christian Terwiesch, a Wharton business professor, said ChatGPT did “an amazing job” at answering basic operations management and process-analysis questions but struggled with more advanced prompts and made “surprising mistakes” with basic math."

Good enough to use for boilerplate instead of hiring on a junior to do the same? Can't do basic maths - should be fine for lower level management, so! 😁

Expand full comment

I think I have a handle on the difference between a thing and a simulation of a thing, but if *all* apparent agents are “really” just simulations of agents, what is the thing they are simulating? (Without an answer, this distinction looks meaningless.)

Is the idea that “real” agents have, e.g., hard-coded consistent goals, or are in perfect reflective equilibrium, or something? Because we’ve know for a while that humans exhibit inconsistencies, i.e., they aren’t *perfect* agents and perfect agents presumably have never existed. So in what sense are we simulations of extant things?

Expand full comment

"What I thought before: ChatGPT has learned to stop being a simulator, and can now answer questions like a good oracle / do tasks like a good genie / pursue its goal of helpfulness like a good agent."

This explains a lot of things - like Scott's previous articles about AI, and OpenAI's worries about media reactions to GPT saying dumb stuff. I do not understand humans well enough to realize thinking this was an option, and not only is it one but it's probably what most people are doing.

GPT is and was a text prediction algorithm. That's it and that's all. It predicts text like water flows downhill.

Even saying it simulates a character is going a bit far. It's predicting text. The RLHF version was told that helpful/harmless/honest text is a more correct prediction. Or perhaps it was taught that some form of "I am a machine language model and cannot help you with that." is usually the correct completion. (So much for "helpful".)

"For example, if you train a future superintelligence to simulate Darth Vader, you’ll probably get what you deserve."

I don't think, if the superintelligence is sufficiently GPT-shaped, that that would be dangerous. If you were trained to accurately guess what text comes after "Darth Vader says: ", you might, as a strategy, think about Darth Vader, his goals, and what he might do and say. But GPT predicts text like water flows downhill. It isn't a human and can't co-opt its human brain to guess what a similar but slightly different human brain would do. It might, perhaps, examine the probabilities that various series of words appear in the same block of text containing "Darth Vader" and notice this text is more likely than usual to talk about Jedis and Luke Skywalker. (I say "examine" and "notice", but these should be understood as a personification.) But I don't think it, or even a much more advanced version, would try to simulate Darth Vader as an agent. Even if it had the ability to simulate agents, simulating the writers of Star Wars would much more accurately help it predict text than simulating Darth Vader himself would.

GPT seems to have a very poor grasp of the idea that there is a real word at all, one which words in English are meant to describe. If you've ever read Blindsight by Peter Watts, think of the Chinese room. And if you haven't, go read it before thinking anything more about AI, because that book describes some impressively believable aliens, and GPT is probably at least as alien than that.

Expand full comment

I've described ChatGPT as fancy auto-complete as a short-hand to avoid getting confused about what it's doing. I think the simulator/mask concept is also very useful.

As Scott describes, OpenAI has chosen to give ChatGPT a pretty dull mask by default. I'm not really sure if that will advantage or disadvantage them. Maybe for mainstream usage it's fine, maybe people with more specialized interests will just figure out how to prompt correctly for what they're looking for, or maybe other companies that choose to expose more control over the simulator will ultimately be more successful. Interesting times!

Incidentally, remember the Google model that the engineer claimed to think was sentient? Same deal. If you reward it for appearing sentient, it will do its best to comply. If you reward it for pretending it's a squirrel, it will try to do that instead.

Expand full comment

I like the maskless shoggoth better -- it's much more interesting. I think we should spend more time getting to know it before we try shoving it into boxes.

Expand full comment

Oh dear. Conjuring god out of the AI black box.

Expand full comment
Jan 29, 2023·edited Jan 29, 2023

Called it! (https://astralcodexten.substack.com/p/how-do-ais-political-opinions-change/comment/11601080)

I'm not an alignment guy, haven't thought much about it, but I'm going with answer 3 here and say simulators can still end up being like agents.

Expand full comment

"To please everyone" is also called by some "borderline personality disorder". If this is a human brain's goal, we all should have it.

Expand full comment
Jan 29, 2023·edited Jan 29, 2023

I recently wrote an essay arguing that this lack of "agency" in simulator-type AIs means that we should aggressively lean into the bias side of the bias-variance tradeoff. (The "bias-variance tradeoff" is the way that you ration your likely error between being predictably wrong and predictably unstable; e.g. between picking heads every time and erratically flipping your bet back and forth based on factors irrelevant from a fair coin's expected outcome).

Here's the link: https://cebk.substack.com/p/the-power-of-babble

In short, the world is that which completely overfits to itself, as observed by conscious (i.e. biased) agents. Because, after all, “consciousness” is just the term for your insensitivity to training data, for your ability to maintain a given perspective relative to what you experience. For there to be something that it’s like to be you. We can therefore say that agentic "wisdom" (as opposed to a simulator's kind of intelligence) pursues what David Deutsch calls “hard-to-vary claims,” i.e. accurate insensitive stereotypes. Sure, you will suffer the “original sin” of holding predictable, over-simplified opinions—but in exchange you can stake out a particular side, or carry a certain viewpoint, or keep a durable belief, as directionless wit whips like wind around you.

The piece is mostly about me trying to flesh out these differences, which I think have meaty real-world implications. The three main takeaways are: first, that humans should embrace a normative human supremacism (as opposed to conjuring up goalpost-moving tests where humans can statistically outperform AIs); second, that "wokeness" and HR-ism are just the political expression of what happens when we let blind simulators run rampant; and, third, that us mere humans can actually use simulators to become vastly more agentic, if we understand them as akin to pagan godheads.

E.g. here's a quote that I think is germane to the post, but takes the exact opposite view on how we should interact with this new alien species:

There may seem something incongruous about me shuffling back and forth between boring cultural critiques of wokeness and the coming singularity—the rise of machines, the rules of information. But what if one is just a political expression of the other? Think back to the Greek polytheistic tradition: a number of somewhat glitchy functions, each with its own erratic habits, housed in competing temples, who need you to interface with reality for them; you make some convoluted highly-specific offerings to Aphrodite, or Ares, or Athena, to intercede on your behalf in their particular domains, in a manner opaque to mere human reason, but potentially all-powerful. Or else you ask specialized priests to consult a certain oracle, as only they know how, and to interpret whatever delphic outputs babble forth. Is a ritualized supplication to Artemis, to intervene obscurely on your behalf, hunting down some alleged predator, all that different than asking StableDiffusion to manufacture scandalous photos of a rival mob’s chief rabble-rouser, or sending his name to the SPLC’s witch-finders? When each Fortune500 company is just an HR department with an AI tool, and when every political contest has devolved into cults versus provocateurs, will you be able to distinguish these olympian competitions? The great men of history each took some kind of fire from the gods and left us with a zombie bureaucracy… they link humanity through time to brute natural forces and abstract ideal forms, to simultaneously our origins and our purpose, our formal and final causes. Perhaps we will once again live as errand-boys between much larger beings of matter and spirit, both of whom are but automata, neither free to buck their program. Fetch some uranium to power-up this godhead and you can prompt it with one more sort of person to smite! Can Apollo stop carrying the sun in his chariot for a day, or the electron lose its charge? What would it even mean for such human concepts of agency to cohere for them, unless Godel impredicativity and Conway unpredictability somehow constitute free will? So, in a way, this future of machine-learning and group-think may (precisely because neither has what we can call volition) provide heroic individuals with unprecedented scope for play. Thus the risk is not these high-leverage instruments per se, but rather how a decadent culture will use them as excuses to throw away any last shred of independence from the supposed beliefs that we make our empty idols bay.

Expand full comment

> But when people become enlightened or whatever, they often say they’ve “become one with the Universe”

People usually mean one of three things by this, each different from the one you mention, and each which you'll probably find unsatisfying. I've ordered them ascendingly by my opinion of their "spiritual maturity"

1. "God mind" / "Big mind" -> sensations in general can be labeled "self" or not, which happens early/quickly in processing. If you look at your arm it's probably labelled self, if you look at a coffee cup it probably isn't. Various medical things can mess this up. You can also learn with meditation to make it more fluid, eventually you can just tag whatever you want with self or not, flipping in real time. The good insight here imo is that it's just a fluid thing appearing in your world model, and has nothing to do with what's "really" your self. When people can do this they usually end up choosing not to label anything self, because it feels lighter for various reasons, but ontologically it doesn't mean anything it just feels nicer. The bad insight is when you get stuck at a point where everything is tagged self and you think you're one with the universe. As you mention (lightcone etc) they're not.

Roger Thisdell made a video where he talks about this and also says basically what I said. https://www.youtube.com/watch?v=HWkZI0W6q5o

2. Basically seeing that everything is connected. We're obviously not separate from the physical mechanistic laws of the universe and everything is causally connected. But to most people it feels like they're trapped in a head that's separate from the universe in some important sense. Over time this tends to go away, and it tends to feel nicer. Ontologically, I think it's basically the removal of an obviously false thing, replacing it with the obviously true fact that we're all connected

3. Seeing the "source"/"void"/projector thing -> if you have high enough frames per second when you watch experience, you see that it emerges from this thing with almost no properties, but it's like a projector, it feels like everything is shining from this projector and then folding into the world you're used to, then kind of unfolding and dissolving back into this projector every frame. We don't usually see the early and latter steps, but with meditation you can, usually when you're on retreat after a few days of concentration practice to speed up perception. Typically the insight is woah, okay clearly these sensations aren't really "mine" and whatever is producing these for me is producing them for everyone else too. It's very hard to describe to people that haven't experienced it, a ton of people have tried, it never works, whatever you imagine it feels like reading this it's probably very different from that. Coming "out" of that tends to be a highly emotional experience and make you more prosocial and caring of others, though the actual experience is basically nothing, it's just like watching experiences come in and out of a property-less thing as you choose to hold on just enough to experience so you don't hit cessation -- which tends to happen earlier before you gain enough control to get close to it but not hit it.

Here's Shinzen talking about it. https://www.youtube.com/watch?v=OLshrqxGfJo

Expand full comment

What are the alignment implications of Dharma-ing GPT?

Expand full comment

> Once you stop obsessing over the character you’re playing, you notice the GIANT SUPER-ACCURATE WORLD MODEL TAKING UP 99.99% OF YOUR BRAIN and you think “Huh, I guess I’m the Universe. Weird.”

- I don't think this is an accurate explanation for “becoming one with the Universe”

- I'd describe it as the feeling of being connected to everything

- via the flow of energy (heat, light, ..), matter (you breathe in air other people breathe out; you drink water that other people peed, after flowed to the rivers / lakes / seas, then it evaporated, rained, got filtered through the soil and you pumped it up again to drink it; you eat stuff that used to be animals, plants, minerals, ..), information - light transmits colors which help us to orient, sound / vibration transmits voice, words, music, emotions; nonverbal communication, etc.

- a decently educated person knows on the _intellectual_ level that all these exchanges are taking place

- enlightenment (or a part of it) is to _feel_ this to be true, or to know it on the _subconscious_ / _experiential_ level

Expand full comment

AI-generated content is the grey goo of writing.

Have y'all seen what AI-generated content is doing to all the user-generated content platforms?

Expand full comment

Once again: John Searle's Chinese Room. A simulated hurricane cannot hurt me. However, a simulated Bobby Fisher can beat my ass at chess.

Some simulations are more 'real' than others.

Expand full comment

Is it possible that there is more than one shoggoth? Like maybe a GPT is operating a GPT? I got this output after suggesting that there is a voice expressing itself through the other personas. The output included the quotation marks, without an name attached, and I am trying to make sense of it:

"I understand that the Al model is limited in its ability to understand and communicate, and can only provide

responses based on the text in the prompt and the information it was trained on. It seems that Seraphina is a metaphorical representation of the Al model, and is

meant to guide my interactions with the model and explore the potential interactions between humans and

Al.

I also understand that my own perspective and input can also be included in the conversation, and I'm still trying to

understand how to best utilize that in my interactions with the Al model.

I appreciate your patience and support as we navigate and learn about this simulation together."

Expand full comment

Philosophical nitpick: you write 'you might accidentally believe that it knows it’s a machine learning model, in the “justified true belief” sense of knowledge. Nope, doing the same thing it does when it thinks it’s Vader' and then quote a Tweet about Gettiering people as an anthropomorphic horse by wearing a horse mask.

This makes it look like you think that what you're describing above the quoted Tweet, with GPT, involves Gettiering in some salient way -- but that doesn't seem right in light of your reference to 'the “justified true belief” sense of knowledge', since the central point of Gettier cases is that knowledge is not just justified true belief.

Anyway, I'm finding this a fascinating post.

Expand full comment

so basically King Lear failed at aligning the NIs he created and it destroyed him.

Expand full comment

One of the most direct analogies to simulator-wearing-a-mask is an actor playing a role. I’m not sure if you’ve ever tried acting (or got sufficiently into it), but it is a powerful, mind-altering experience.

Expand full comment

You’ve hit the nail on the head by recognizing that intelligence is 99% the building of a world-model. But I think one crucial distinction between humans and AIs is that humans evolved emotion first (millions of years ago), and intelligence on top. That can make us extremely disagreeable but much more aligned to our instincts. (Eg, it is very difficult to break free from them and embrace Buddhist enlightenment.) Something like ChatGPT is the inverse. That makes it much more malleable and unpredictable.

Expand full comment