305 Comments

Amazing images! But there's one problem: Tycho Brahe didn't use a telescope.

https://www.pas.rochester.edu/~blackman/ast104/brahe10.html

See also: https://en.wikipedia.org/wiki/Tycho_Brahe#Tycho_Brahe's_Instruments

Maybe DALL-E is more subtle than you think, and is trying to be accurate to period style when it makes the figures in stained glass windows look nothing like the people they're supposed to portray?

You can repair the Reverend's head with 'uncropping', expanding the image upwards. Examples: https://www.reddit.com/r/dalle2/search?q=flair%3AUncrop&restrict_sr=on

Am I the only one for whom 'metal nose', and not 'pet moose', was the defining trait of Tycho Brahe?

Off-topic, yet topical enough I don't want to put this in the off-topic thread: Matt Strassler has been doing a bunch of “how to discover/prove basic astronomy facts for yourself” recently: https://profmattstrassler.com/2022/02/11/why-simple-explanations-of-established-facts-have-value/

Here's my idea: Alexandra Elbakyan standing on the shoulders of the Montgolfier brothers, who are themselves standing on the shoulders of Thomas Bayes, who is standing on the shoulders of Tycho Brahe, who is standing on the shoulders of William of Ockham. Yes, I know DALL-E wouldn't want to stack so high as it was already cutting off heads. So I might as well have Ockham standing on a giant turtle.

The AI seems fuzzy on what, exactly, a telescope is used for. Most of the time Tycho seems to be trying to breathe through it, or lick it, or stick it up his nose; even when he is looking through the eyepiece, as often as not he's just staring at the ground. I dunno, maybe the AI heard that story about the drunken moose and figured that Tycho himself was typically fully in the bag.

Would love to see this with Imagen, Google's even newer image synthesizer. (No public demo though, alas.) In the examples we've seen, it does a much better job of mapping adjectives to their corresponding nouns instead of just trying to apply all the adjectives to all of the nouns, which is the main failure going on here.

Re faces, OpenAI says:

> Preventing Harmful Generations

> We’ve limited the ability for DALL·E 2 to generate violent, hate, or adult images. By removing the most explicit content from the training data, we minimized DALL·E 2’s exposure to these concepts. We also used advanced techniques to prevent photorealistic generations of real individuals’ faces, including those of public figures.

> I’m not going to make the mistake of saying these problems are inherent to AI art. My guess is a slightly better language model would solve most of them ...

What if the problem is more subtle than either of those two alternatives? What if the mapping between language prompts and 'good' pictures is itself quite fuzzy, such that different people will judge pictures rather differently for the same prompt, due to different assumptions and expectations? Don't we encounter such situations all the time, e.g., in a workplace meeting trying to settle on a particular design? Is it not naive to assume that there are objectively 'best' outputs, and we just need a better model to get them? What if I thought a particular picture was excellent, and you said, "No, no, that's not what I meant?"

Curious what you're planning on depicting for the other six virtues.

> These are the sorts of problems I expect to go away with a few months of future research.

Why are you so confident in this? The inability of systems like DALL-E to understand semantics in ways requiring an actual internal world model strikes me as the very heart of the issue. We can also see this exact failure mode in the language models themselves. They only produce good results when the human asks for something vague with lots of room for interpretation, like poetry or fanciful stories without much internal logic or continuity.

Not to toot my own horn, but two years ago you were naively saying we'd have GPT-like models scaled up several orders of magnitude (100T parameters) right about now (https://slatestarcodex.com/2020/06/10/the-obligatory-gpt-3-post/#comment-912798).

I'm registering my prediction that you're being equally naive now. Truly solving this issue seems AI-complete to me. I'm willing to bet on this (ideas on operationalization welcome).

Is DALL-E giving extra weight to its own previous results based on the similar input phrases? How much time elapsed between these queries?

I laughed until I cried. Fighting off an impulse to re-caption some of them.

There is some indication over on the subreddit that adding noise to the prompt can sometimes avoid bad attractors. For instance, a few typos (or even fully garbled text) can improve the output. It seems important to avoid the large basins near existing low quality holiday photos, people learning to paint, and earnest fan illustrations. Maybe Dall-E associates some kinds of diction with regions messed up by OpenAI's content restrictions, or mild dyslexia with wild creativity. In comparison the early images from Imagen seem crisper, more coherent, but generally displaying a lack of that special quality which Dall-E 2 sometimes displays, which seems close to what we call "talent" in human artists. Thanks for the funny and insightful essay.
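
A minimal sketch of that noise trick as one might script it; the function and the perturbation rate are invented for illustration:

```python
import random
import string

def add_prompt_noise(prompt: str, rate: float = 0.05, seed: int = 0) -> str:
    """Randomly swap a few letters to nudge the prompt out of a bad attractor."""
    rng = random.Random(seed)
    chars = list(prompt)
    for i, c in enumerate(chars):
        if c.isalpha() and rng.random() < rate:
            chars[i] = rng.choice(string.ascii_lowercase)  # introduce a "typo"
    return "".join(chars)

print(add_prompt_noise("William of Ockham holding a razor, stained glass", seed=3))
```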

William of Ockham never himself used the image of a razor; that's a modern metaphor and would be inappropriate for depiction in the stained glass image. And few people would know who Brahe is even with the moose, so leave it out.

Great set of experiments and writeup.

What it really looks like is that the author is praying at the altar of a very uncaring god or gods, and getting a bunch of vague prophetic crap.

Other DALL-E prompts use the words “in the style of” as part of the cue instead of just sticking a comma between the content and style parts; does that make a difference?

Previous work in image stylization has used a more explicit separation between content and style, which would help here. I imagine there will be follow-on work with a setup like the following: you plug in your content description which gets churned through the language model to produce “content” latent features, then you provide it with n images that get fed into a network to produce latent “style” features, then it fuses them into the final image. Of course then you potentially would have a more explicit problem with copyright infringement since the source images have no longer been laundered through the training process but maybe that’s fairer to the source artists anyways.
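
A minimal sketch of what such a two-tower setup might look like, with hypothetical encoder dimensions and a toy fusion head rather than any published architecture:

```python
import torch
import torch.nn as nn

class ContentStyleFuser(nn.Module):
    def __init__(self, content_dim=512, style_dim=512, fused_dim=512):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(content_dim + style_dim, fused_dim),
            nn.GELU(),
            nn.Linear(fused_dim, fused_dim),
        )

    def forward(self, content_latent, style_latents):
        # Pool the n style images into one style vector, then fuse it with
        # the language-model content features to condition the image decoder.
        style = style_latents.mean(dim=1)
        return self.fuse(torch.cat([content_latent, style], dim=-1))

fuser = ContentStyleFuser()
content = torch.randn(1, 512)   # "content" latent from the language model
styles = torch.randn(1, 4, 512) # latents from n=4 reference style images
z = fuser(content, styles)      # conditioning vector for the final image
```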

Since it seems to get hung up on the stained glass window style, try getting the image you want without the style, and use neural style transfer to convert it to stained glass.

It appears the goals of 'AI images', 'natural text' (searching/generating from a single string of text), and 'actually useful' user interfaces are in conflict. An art generator where art style, subject, colours, etc. are not discrete fields, and which ignores all the existing rules and databases used to search for art, seems like a bad approach to making a useful AI art generator.

I'd be more interested if they ignored the middle part, the 'single string of text', and focused more on the image AI. They are perhaps trying to solve too many problems at once, with AI text being a very difficult problem on its own. That said, it pulled random images which are probably not well categorised as a data source, so I'm sure they hit various limitations there as well.

I would think using an image-focused AI to generate categories might be an interesting approach, drawing directly from the images rather than whatever text is used to describe them on the internet. Existing art databases could be used to train the AI in art styles.

It would even be interesting to see what sorts of categories the AI comes up with on its own. While we think of things like Art Nouveau, the AI is clearly thinking 'branded shaving commercials' or 'HP fan art' are valid art categories. I don't think the shaving ads will show up in Sotheby's auction catalogue as a category anytime soon, though.

Perhaps we can see 'Mona Lisa, in the style of a shaving advertisement', 'Alexander the Great's conquest battles as HP fan art', or 'Napoleon Bonaparte in the style of steampunk'?

My best guess about William’s red beard and hair: DALL-E may sort of know that “William (of) Ockham” is Medieval, but apparently no more than that since he’s not given a habit or tonsure (he’s merely bald, sometimes). But he has to be given *some* color of hair, so what to choose??

Well, we know that close to Medieval in concept space is Europe. And what else do we know? We have a name like William, which in the vaguely European region of concept space is close to the Dutch/Germanic names Willem and Wilhelm. And what do we know of the Dutch and Germanic peoples? In the North / West of Europe is the highest concentration of strawberry-blonde hair!

If that’s too much of a stretch, then maybe DALL-E knows some depictions of “William of Orange” and transposed the “Orange” part to “William (of) Ockham’s” head?

I am personally addicted to generating "uncanny creep", "eldritch horror", and similar prompts using DALL-E mini.

Literally addicted, it's become an obsession.

https://huggingface.co/spaces/dalle-mini/dalle-mini

I wonder if you could get a key in the raven's beak if you called it a beak.

I do wonder how a human artist who got a similar query from an anonymous source would respond (assuming the artist was willing to go to the trouble, etc.).

This is actually a great example of the challenges with fairness and bias issues in AI/ML. Systems that screen resumes, grant credit (e.g. Apple Card), or even just do marketing have real problems with their training corpus. Even if the standards for past hiring were completely fair, if the system is calibrated on data where kindergarten teachers are 45-year-old women and scientists are 35-year-old men due to environmental factors, it is incredibly difficult to get the system to see the unbiased standards that are desired. This is a great layman's exploration of why that is.
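
A toy demonstration of that mechanism, on entirely synthetic data with a hypothetical "group" proxy feature:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000
skill = rng.normal(size=n)                                  # the legitimate hiring signal
group = (skill + rng.normal(0, 1.5, n) > 0).astype(float)   # demographic proxy, correlated with skill for environmental reasons
credential = skill + rng.normal(0, 1.0, n)                  # what the screener actually observes
hired = (skill > 0).astype(int)                             # past decisions used skill alone

model = LogisticRegression().fit(np.column_stack([credential, group]), hired)
print(model.coef_)  # the proxy feature picks up real weight
```

Even though past decisions depended on skill alone, the screener only sees a noisy credential, so the correlated group feature soaks up genuine predictive weight.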

Wasn't 'Ada Lovelace' not really her name?

This article about Tycho Brahe (borrowed from another comment, https://www.pas.rochester.edu/~blackman/ast104/brahe10.html) says that from his measurements Brahe concluded that either the Earth is the center of the universe, or the stars are too far away for any parallax to be accurately measured. Then it adds:

"Not for the only time in human thought, a great thinker formulated a pivotal question correctly, but then made the wrong choice of possible answers: Brahe did not believe that the stars could possibly be so far away and so concluded that the Earth was the center of the Universe and that Copernicus was wrong."

What are other times that "a great thinker formulated a pivotal question correctly, but then made the wrong choice"?

Off-topic to the AI generation, but "What I’d really like is a giant twelve-part panel depicting the Virtues Of Rationality." - I feel that you're not alone in this.

So, DALL-E can't separate style from content. This is like very young children who can recognize a red car or a red hat, but haven't generalized the idea of red as an abstraction, a descriptor that can be applied to a broad range of objects. I forget the age, maybe around three or four, at which children start to realize that there are nouns AND adjectives, so DALL-E is functioning like a two- or three-year-old. I wonder how well it does with object-permanence games like peek-a-boo.

P.S. Maybe instead of a Turing test, we need a Piaget test for artificial intelligence.

My guess is that you are expecting to produce great art right off the bat, with a new tool and only a few hours' practice with it. Obviously there is a learning curve, as your post demonstrates. Spend a few days with it, and I would assume your results will be spectacularly better.

From what limited exposure to DALL-E 2 I have seen, your assumption about the query "a picture of X in the style of Y" would work to remove the stained glass from the background of the subjects and make the art itself stained glass -- "a picture of Darwin in the style of stained glass."

Perhaps someone will make a new DALL-E interface that includes various sliders that work in real time, like the sliders on my phone's portrait mode, allowing me to bump the cartoon effects and filters up and down. So you could make your output more or less "art nouveau" or "stained glass" or whatever parameters you entered in your query.

Someone wanted to make a music video with DALL-E 2 yesterday, but couldn't quite do it. He still got some pretty results, however.

https://youtu.be/0fDJXmqdN-A

For Brahe, have you considered his metal nose as a signifier rather than the moose?

I think that modern AI has reached a local maximum. Machine learning algorithms, as currently being developed, are not going to learn abstractions like adjectives and prepositions by massaging datasets. They're basically very advanced clustering algorithms that develop Bayesian priors by analyzing large numbers of carefully described images. A lot of the discussion here recognizes this. Some limits, like understanding the style, as opposed to the content, of an image, could be improved with better labeling, but a lot of things will take more.

Before AI turned to what is called case-based reasoning, which trains systems using large datasets and statistical correlation, it took what seemed to be a more rational approach to understanding the real world. One of the big ideas involved "frames", that is, stylized real-world descriptions, ontologies; the idea was that machines would learn to fill in the frames and then reason about them. Each object in a scene, for example, would have a color, a quantity, a geometry, a size, a judgement, an age, a function, component objects and so on, so the descriptors of an object would have specific slots to be filled. A lot of this was inspired by the 19th-century formalization of record keeping, and a lot of it came from linguistics, which recognized that words had roles and weightings. There's a reason "seven green dragons" is quite different from "green seven dragons", even though both just consist of the same two adjectives followed by the same noun.
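
A minimal sketch of a frame as a data structure, with hypothetical slot names rather than any particular historical system:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ObjectFrame:
    noun: str                       # what the object is
    color: Optional[str] = None     # slot for a color descriptor
    quantity: int = 1               # slot for a count
    size: Optional[str] = None      # slot for a size descriptor
    function: Optional[str] = None  # what the object is for
    components: List["ObjectFrame"] = field(default_factory=list)

# "seven green dragons": each descriptor lands in its designated slot,
# so the word roles are explicit rather than inferred from order.
dragons = ObjectFrame(noun="dragon", color="green", quantity=7)
```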

I suspect that we'll be hearing about frames, under a different name, in the next ten years or so, as AI researchers try to get past the current impasse. Frames may be arbitrary, but they would be something chosen by the system designer to solve a problem, whether it is getting customer contact information using voice recognition, commissioning an illustration or recognizing patterns in medical records.

P.S. As for a lot of the predictions for systems like DALL-E, I'm with Rodney Brooks: NIML (not in my lifetime).

Lift your razor high, Occam

Hold it to the sky

Entities without true needs

Shan't multiply

It might have placed the key better if you'd put it in the raven's beak instead of its mouth.

If you want matching styles, maybe use Deep Art to adjust some as a second phase?

> The most interesting thing I learned from this experience is that DALL-E can’t separate styles from subject matters (or birds from humans).

Looks like entanglement is an issue. DALL-E cannot seem to find the right basis, where the basis vectors are styles, subjects, objects, etc. and instead uses artistic license to the max.

I am 95% sure there is actually a moose in one of the stained glass windows at my church. It’s in a more recent window depicting the Creation.

"DALL-E has seen one picture of Thomas Bayes, and many pictures of reverends in stained glass windows, and it has a Platonic ideal of what a reverend in a stained glass window looks like. Sometimes the stained glass reverend looks different from Bayes, and this is able to overpower its un-confident belief in what Bayes looks like."

So the Bayesian update wasn't strong enough to overcome its reverend prior?

Scott,

Are you familiar with the work of the Emil Frei Stained Glass Company, based in St. Louis? If not, here is their official site:

https://www.emilfrei.com/

And this is a layman's tour of their work in St. Louis, a great resource to view the breadth of their work (spanning more than 100 years):

https://www.builtstlouis.net/mod/emil-frei-stained-glass.html

I grew up in St. Louis, but only learned about their work much later. And yet, when I saw it, it seemed hauntingly familiar. Their style is very distinctive, very quintessentially Modern, moving into Mid-Century Modern. But their stylized figures of people and animals also feel very ... Eastern, Early Christian ... Macedonian, actually.

Besides advocating for these great artisans from my hometown, I want to mention a second point about stained glass -- it is Architectural. Real stained glass windows always exist in a building, with its interior/exterior spaces, its particular site and the sunlight, and most importantly, the people who will gather there to worship.

Digital images of stained glass patterns can be spectacular. I am sure that if DALL-E had images of the Emil Frei style in its corpus, it could generate new "works" that would be uncannily like real, original artworks from that studio.

From there, rendering the actual light coming through windows in a sanctuary is really a simple extension of existing CAD capabilities.

But, much like AI chess programs working under the direction of the Kasparovs and Carlsens, designing the placement and subject matter and the general massing and flow of these dramatic featured elements is still the domain of humans.

Oh, btw, if you really wanted to design an epic 12-panel journey depicting the Tenets of Rationality, I think the Emil Frei style would serve you well. (Also, 12-step-or-station journeys are a time-honored way to educate people through stories/allegory/pictures.)

BRetty

Something I've been thinking about in the context of machine translation, but it might apply to stuff like this as well.

All these neural net-y systems use an interaction pattern where you give them a single prompt, they do a bunch of internal churning, and then spit out a single response. A lot of the time, the response is nonsense, but you can sometimes trace the nonsense a bit to see how the internal churning misinterpreted the original prompt, and then continued to compound mistakes in interpretation on top of that.

The problems these systems are solving are usually multi-step problems, so there's some meaningful sense in which they should be able to "show their work" as they solve them. Like, in machine translation, the system should be able to show you how it broke the prompt into words, how it interpreted the grammatical relationships between words, how it mapped the words from the input language to the output language, and what common idioms it recognized and reworded.

So it seems like you should be able to get generally better results with machine translation if the user were able to give feedback on the accuracy of each of those steps. That way, if the initial parse into words is completely wrong, you could recognize that issue and correct it at that point, rather than letting that initial mistake get compounded into incomprehensibility. Basically, designing the interaction with a neural algorithm as a collaborative feedback loop rather than a black-box oracle.

Applied to this case, the system ought to be able to ask you questions during the process of image synthesis, like "It looks like the subject of your image is a woman named 'Alexandra Elbakyan,' who I think looks like this. Is this correct?" "It looks like you want there to be a 'raven,' which is a type of bird which looks like this, in proximity to the main subject." "It looks like you want there to be a 'key,' which is a tool that looks like this, in proximity to the 'raven.'" "Here is a composition containing these three elements. Is this acceptable?" "It looks like you want this to be drawn in the style of a medieval stained-glass window."

It doesn't seem like there should be any technical reason why these systems couldn't be designed to work more like this, so I don't know why they're all so fixated on the black-box design pattern.
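
A minimal sketch of that feedback loop; parse_scene, render, and compose below are invented stubs, not any real DALL-E interface:

```python
def parse_scene(prompt: str):
    # Stub: pretend each comma-separated clause is one scene element.
    return [clause.strip() for clause in prompt.split(",")]

def render(description: str) -> str:
    return f"<image of {description}>"  # stand-in for actual synthesis

def compose(parts):
    return " + ".join(parts)            # stand-in for final composition

def interactive_synthesis(prompt: str) -> str:
    approved = []
    for description in parse_scene(prompt):
        # Surface each intermediate interpretation for correction before
        # the mistake can compound into the final image.
        answer = input(f"I read one element as '{description}'. OK? [y/n] ")
        if not answer.strip().lower().startswith("y"):
            description = input("How should I reinterpret it? ")
        approved.append(render(description))
    return compose(approved)
```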

Weird that you didn't try *less specific* queries. Instead of "Alexandra Elbakyan in library with a raven with a key in its mouth, stained glass", why not just "raven with a key in its mouth, stained glass" or "raven in a library, stained glass"?

Only one known picture of Thomas Bayes, and it doesn't even show his posterior...

Immediate comment: if you've ever commissioned images on Fiverr or similar, you will know that this is a very difficult task. The real problem here is the one-shot communication and the lack of ability to iterate. I'm not sure DALL-E is doing any worse than a human would.

I would like to see a window that depicts Joseph Overton, holding a window.

I suspect the eventual takeaway from DALL-E as a practical tool for art is that short snippets of natural language are a very unwieldy way to control an AI artist. IMHO, DALL-E's capacity to render coherent scenes and styles, and to have visual familiarity with so many things, makes it the most viable "actual economic use of actual AI" case we have to date. Beside that, its NLP aspect is a fun spandrel that (as this post amusingly demonstrates) gets in the way of purposeful use of those greater capabilities. I think and hope that eventually (perhaps after DALL-E is licensed to some team more focused on consumer software) we'll have a version that's much more hands-on, something like an "artificially intelligent Photoshop" that lets you much more directly poke the model into doing what you want.

As an aside, though, I think “Darwin as a finch with a human head” is a fantastic visual metaphor and wholly appropriate to the symbolic stained glass setting.

Welcome to the wonderful world of AI psychology! Also known as prompt engineering. How can I extract the desired knowledge or output from the AI by asking the right questions?

I actually expect that AI psychologist will become a common job in the future. Perhaps under a more boring name, like AI operator.

Why can't the AI ask you questions to clarify your request? I'm sure a stained glass...artist? would ask you a few questions to make sure they knew exactly what you were after before completing the work. Wouldn't that resolve the issues of ambiguity? Seems like we're expecting AI to be more intelligent, or clairvoyant, than humans.

I think you've confused "art" with "propaganda". There's not really any consideration of the concepts of "beauty" and "truth", nor of the idea that art is a process of "virtualization" of aspects of consciousness. See Susanne K. Langer, Feeling and Form.

Perhaps some reflection on your own aesthetic theory would reveal that you are pretty close to embracing a Marxist theory of art, and hence it is no wonder that DALL-E is picking up on this and producing works that could easily pass as part of that tradition.

In the spirit of asking for a deer instead of a moose, in the SciHub one I would have tried asking for a crow instead of a raven.

Where's Chagall when you need him?!

> I’m not going to make the mistake of saying these problems are inherent to AI art. My guess is a slightly better language model would solve most of them

I think your specific problem would be better solved by a model that doesn't know any styles other than stained glass. If everything it generates looks like stained glass, you can ask for anything and it will come out looking like stained glass.

Random comments:

- Darwin #1 appears to be horrifically deformed. He's got weird flaps of skin hanging off his face. That's not a beard.

- I'm surprised you didn't try asking for something like "Alexandra Elbakyan in a blonde ponytail".

- When you ask for a person accompanied by a raven, you appear to be getting ravens that are half the size of a man. Something's very wrong there.

Should've asked for a stained glass window that humans would want if they were smarter, wiser, and had grown up further together.

This was hilarious, and a much-needed riposte to all the "DALL-E will totally replace human artists" posts we had.

Come on, the Darwin-finch is *awesome*. Rationality should definitely keep that one!

I don't know about William of Ockham, but Gila Whamm is definitely going to cut a bitch. William was a Franciscan, so you might do a bit better putting that in, though I imagine DALL-E will then churn out images of Franciscan saints which may not be what Rationalist Virtue stained glass windows want as their imagery:

https://upload.wikimedia.org/wikipedia/commons/7/70/William_of_Ockham.png

As for the Reverend Bayes, he was an 18th century Presbyterian. Anything resembling a cassock or soutane or Anglican/Papist (but those are the same thing) clerical robes will have his ghost arising out of its resting place to haunt you.

As for Tycho Brahe - a moose is the best attribute to identify him? Scott, are you forgetting his METAL NOSE????

"Tycho Brahe lost his nose in 1566 in a duel with Manderup Parsberg, a fellow Danish student at the University of Rostock and his third cousin. Tycho wore a prosthetic nose made of brass, and afterward he and Parsberg became good friends."

Also I do like how the art attempts ended up with Alexandra Elbakyan as a Sirin of Russian folklore:

https://en.wikipedia.org/wiki/Sirin

Russian folklore has not one, but *two* woman-headed birds, the Alkonost is the second:

https://en.wikipedia.org/wiki/Alkonost

See this painting with both:

https://en.wikipedia.org/wiki/Alkonost#/media/File:Vasnetsov_Sirin_Alkonost.jpg

On the final thoughts: I think that DALL-E 2 itself is probably capable of satisfying most of your requests, but what it needs is some amount of fine-tuning on the kind of results that you want, rather than trying to sample from 'the set of images that are likely to have your prompt as a caption', which in any case becomes less well defined as your caption diverges from the training set.

The process for fine-tuning these kinds of neural networks to be much more helpful is now quite well established, basically involving generating a load of pairwise preferences over results and then fine-tuning. For example, DeepMind's GopherCite (https://www.deepmind.com/publications/gophercite-teaching-language-models-to-support-answers-with-verified-quotes) trains a language model in just a few steps to give verbatim quotes supporting an answer to a question, in a pre-specified syntax. On the other hand, if the underlying language model is merely prompted to do so, it only sometimes gets the syntax and rarely gives proper verbatim quotes. (In the language-model case, they also train an RL model to plan ahead, but it's not obvious how this would transfer to images, which I understand are generated all at once.)

Given that the model is huge and not public, it's not going to be possible as an individual but it should be quite trivial to do within OpenAI, and the number of examples required is fairly low, so if it was worth their time and money, you might even be able to give enough examples on your own for the fine tuning.
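
A minimal sketch of the pairwise-preference step, in PyTorch, with a toy reward model standing in for whatever network would actually score the generations:

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_model, preferred, rejected):
    """Bradley-Terry-style loss: push the scalar reward of the
    human-preferred sample above that of the rejected one."""
    return -F.logsigmoid(reward_model(preferred) - reward_model(rejected)).mean()

# Sketch of the loop: score human-labeled pairs, fit the reward model,
# then fine-tune the generator to maximize the learned reward.
reward_model = torch.nn.Sequential(torch.nn.Linear(512, 1))
preferred, rejected = torch.randn(8, 512), torch.randn(8, 512)  # stand-in features
loss = preference_loss(reward_model, preferred, rejected)
loss.backward()
```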

It would be useful if the AI could be set to make drawings with no backgrounds. Then one could ask it to make drawings of ravens with keys in their mouths with no background, then ask it to make library backgrounds separately, then pick the best raven and put it in the best library.

I've been thinking about using AI art for comics. Example: in the first panel a cowboy punches a robot, and in the second panel the robot punches the cowboy. A problem is that the AI should draw the same cowboy and the same robot in each panel, just adjusting their positions. So for an AI to be useful for comics, you should be able to make a specific character and name it, and then whenever you use that name the AI should be able to remember that character.

This seems hard because the character should be able to change to some extent, if he changes his clothes, or grows old, or gets a haircut or whatever.

The AI should also be able to remember specific objects and locations.
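
A minimal sketch of such a registry as pure prompt bookkeeping; the class and the characters are invented for illustration:

```python
class CharacterRegistry:
    def __init__(self):
        self.characters = {}

    def define(self, name: str, description: str):
        self.characters[name] = description

    def expand(self, prompt: str) -> str:
        # Substitute each registered name with its full visual description,
        # so every panel's prompt pins down the same character.
        for name, description in self.characters.items():
            prompt = prompt.replace(name, f"{name} ({description})")
        return prompt

registry = CharacterRegistry()
registry.define("Tex", "a lanky cowboy with a red bandana and grey Stetson")
registry.define("R-7", "a boxy chrome robot with one blue eye")
print(registry.expand("Tex punches R-7"))
print(registry.expand("R-7 punches Tex"))
```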

It's possible to do masking with DALL-E 2. To get a consistent style, you could try supplying the same window frame, including a little background glass in the style you want, and then let DALL-E fill in the middle with the figure.
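
For reference, a sketch of what mask-based editing looks like through the image-edit endpoint that OpenAI later exposed in its Python library; treat the exact parameter names as approximate rather than authoritative:

```python
import openai

openai.api_key = "sk-..."  # your API key

# image: a square PNG of the fixed window frame with sample glass;
# mask: same size, with the middle region transparent so DALL-E may fill it.
with open("window_frame.png", "rb") as image, open("frame_mask.png", "rb") as mask:
    result = openai.Image.create_edit(
        image=image,
        mask=mask,
        prompt="medieval stained glass figure of Charles Darwin holding a finch",
        n=4,
        size="1024x1024",
    )
print(result["data"][0]["url"])
```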

Sometimes it's hard for people to recognize important differences too. The second image of Occam's razor and the picture of the medieval razor are not similar.

In the picture of the medieval razor, look at where the metal attaches to the wood. There's a pivot, so this is a folding knife. Razors need to be extremely sharp, but don't have to cut through anything hard, so a razor has an extremely thin blade. This is a picture of a thin and sharp pocketknife.

Now look at the second picture for Occam's razor. The place where the metal attaches to the wood does not have a pivot and looks more like a sledge hammer. The metal is extremely thick. It may not even have a sharp edge. And it's larger than Occam's head.

The second picture shows Occam's pick. Mining tools rarely make good razors.

https://en.wikipedia.org/wiki/Pickaxe#/media/File:Keilhaue_Bergmann_Hammer_VEB_-_BKW_-_GL%C3%9CCKAUF_-Tr%C3%A4ger_des_Vaterl%C3%A4ndischen_VO_in_Gold_-_Betrieb_im_VE_BKK_Senftenberg_-_Lupus_in_Saxonia_Bild_00017.jpg

How much would it cost to hire a human artist to design the stained glass window for you?

I really appreciate you showing your process here. Most of the DALL-E stuff I see is just a curated collection of the very best, which tends to give the impression that human artists are about to go extinct, whereas the fact of the matter is that DALL-E is less "useful" and more "unintentionally hilarious".

One major problem with AI is that basically the only thing it has to learn from is the internet, and the internet is a very bad place to learn about the real world. Being able to ingest and self-label data from reality seems to be a pretty significant part of what makes humans and other animals intelligent; the neural-network inference algorithms seem to be a pretty small piece of the puzzle in the end.

I think you may get more consistent styles by mentioning a particular artist, for example "William Ockham holding a razor, art nouveau stained glass by Louis Comfort Tiffany".

For William of Ockham, I bet you'd get better results if you swapped out "razor" for "knife". They're practically the same thing in a medieval style, but DALL-E would stop pulling from shaving ads.

Or for a more metaphorical razor, go with "sword" or "longsword". Add some language about monk's robes and a tonsure and you might get William of Ockham halfway to a Jedi. Which would be awesome.

Regarding Alexandra Elbakyan, I recall reading in a different article about DALL-E that it is programmed to deliberately screw up pictures of real people, to avoid deep fakes and revenge porn and such. Could that be the problem there?

Interesting. And it sounds exactly like the annoying aspect of almost all modern AIs, from Google's search algorithm to voice-recognition phone trees: they always funnel you too rapidly into whatever is heavily represented in their training data (the most popular queries) that matches most of your query. It's like they're all gross failures at the Sesame Street game "one of these things is not like the other": they cannot readily pick out the minority components of a query that are unusual and important, in that they shove the query off the well-beaten path of the training set.

It's my experience that modern AIs are heck on wheels if you want a popular result but are a little fuzzy on how to ask for it. I can ask Google for "the redhead in ABBA" or "famous Russian marshal Operation Barbarossa" and get "Anni-Frid Lyngstad" and "Zhukov" in no time flat, which is very impressive. But if I want something *almost* but significantly different from the typical, it's like wrestling with an oiled 600 lb walrus: the AI monomaniacally, well-intentionedly discards certain keywords I add, assuming I must just be mistaken, since surely I want the most common result when (a random) 80% of the words I used are matched...

You wonder what it is that the human brain does to pick this stuff up.

Dumb question, but what is the difference between what we can deduce the AI is doing and the AI truly being a black box? Doesn't this post kind of illustrate that AI isn't as black-box-y as we sometimes think? Also, if anyone has any resources on meta-level (not the company) AI studies, that would be really helpful to me.

Thoughts in no particular order:

Find the style of stained glass you want in books or museums and see how it is captioned there. Maybe use words like 'collection', 'ca. yyyy' or even 'C.317-1927. ©Victoria and Albert Museum'.

Art Nouveau might not be the best style for this. There are a lot of faux-wooden front doors out there with some vaguely Edwardian stained glass design on them, for you to put on the front of your vaguely Edwardian house. Whereas something like Pre-Raphaelite stained glass has several advantages.

It is much more likely to be representational and allegorical. It's a reaction to the industrial world that tries to hide its fundamentally modern nature behind traditionalism, like the work of Chesterton or Lewis. And Art Nouveau women look dreamy, whilst Pre-Raphaelite women look like they are done with your shit; Elbakyan is very done with academic publishing's shit.

Are all of the captions in English (*all* of them)? Maybe calling it Jugendstil or Sezessionstil would work better than Art Nouveau.

Regarding Ada Loves Lace: I had a co-worker named Phil French. Windows NT decided that he must speak French, so it changed his operating system language to French. Sigh.

Apparently DALL-E is not completely bad at generating text; it's just that it has its own language.

You can type the "gibberish" back at it to see what it means: https://giannisdaras.github.io/publications/Discovering_the_Secret_Language_of_Dalle.pdf

I see a general trend here: AI is good at creating things similar to existing ones (i.e., to the training data); same same, but different. However, it is not creative. It cannot create something original.

I am curious. Why didn't you specify that you wanted it in the style of a stained glass window, rather than including a stained glass window as one item in a list? If someone gave that prompt to me, that is how I would interpret it.

It may be the power of suggestion, but it seems like some of those Tycho Brahe images might have been influenced by the fact that Tycho Brahe is also the pseudonym for a webcomic author.

DALL-E is impressive, but why is it still so hidden and locked down, and why hasn't a large company bought it or the technology?

William Herschel just has a German name. He was a German-born British astronomer. He was raised protestant, as was typical in Northern Germany at the time.

Nothing Jewish about Herrn Herschel at all.

(Of course, Dall-e could be very American, and think all German names sound Jewish.)

Not AI but: we were once in a position to commission a piece from Theodore Ellison. Not your typical suburban house lilies. He's here in the Bay Area.

https://theodoreellison.com/collection/bespoke/

We actually have a stained glass window of Darwin with finches at Providence College, along with Newton and Galileo, scroll down here for pictures: https://news.providence.edu/stained-glass-windows-transform-fiondella-great-room

If you want to kill

Like Son of Sam

Try our razor

Gila Whamm

There is a program called Pattern Wizard. It allows you to import patterns and use parts of them to make a new pattern. It also lets you import pictures and draw patterns over the picture. Therefore, you can fairly easily produce a pattern of pretty much whatever you want. The only thing you have to be careful of is making the pieces a size you can actually work with. However, if you simply want to make a faux stained glass, you can forget those constraints and draw pretty much what you want.

It will then let you plug in different colors and patterns of stained glass, which it has in a library. I use this feature to show my customers generally what the stained glass piece will look like.

I also will print some more detailed pieces on sticky-back plastic pages and stick them to the glass. Then I will cut out the area I want to paint a particular color and use a powdered glass mixed with a binder to fill in the area. After it dries I do another color. I continue this process until the painting of the piece is done. My glass powder fires at 1250 degrees F, but there are low-temp ones that fire in a microwave or oven.

I hope this helps some of you out there to create something special for yourselves. I know these processes have certainly made me and my customers happy with the results. Have fun creating. Jim Walter

It's interesting, but the same problem of mixing style and content is also present in humans. We can find a great example of this in early Russian literature. When an author wanted to write something religious, they copied the Bible (which was not even in Russian); when the author wanted to say everyday things, they used their transcription of spoken language. When it was something in between those two themes, like a description of a battle, it was a weird mixing of those two styles page by page, or an attempt to find a stylistic middle ground.

This was laugh out loud funny! I loved it, especially that red-bearded psycho Gila Whamm!

I was surprised you didn't mention the problem that DALL-E showed both Brahe and Herschel putting the eyepiece of the telescope up to their mouths or noses.

The thing is, the more I see of these programs, the more I think they're not actually capable of doing what they pretend to be doing at all. It looks an awful lot to me like what these programs are actually doing is taking bits and pieces of images from the internet and stitching them together in a way that makes it hard for the people trying to figure out what the thing is doing to notice that this is what it's doing. I actually noticed this with the flamingos thing, as one of them had bits I remembered seeing in online art previously.

It also seems to run them through various "filters" which makes this harder to catch. As such, I think these things may ultimately end up being really complicated ways to obfuscate copyright infringement.

The thing is, this makes sense; these "machine learning" programs are basically programming shortcuts.

It's been known for a long time that it's possible to trick image-recognition programs in various subtle ways, by altering a few pixels or making a slight change to the image, and getting them to misidentify the image, with high confidence, as something completely different, even though humans often cannot even detect the alterations made to the original image. This is because these things aren't actually recognizing the image the way humans do, but building an algorithm for "things that get textually described/linked to as describing this". These attributes sometimes have very little to do with the actual desired thing, which is why you can get these weird outputs.

Which is exactly what is going on here as well, and why including "reindeer" makes it more Christmassy: because it doesn't actually understand "reindeer" conceptually.
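
For the curious, the canonical version of that few-pixel trick is the fast gradient sign method (FGSM); a minimal PyTorch sketch against an off-the-shelf classifier, purely illustrative:

```python
import torch
import torchvision.models as models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

def fgsm(image: torch.Tensor, label: torch.Tensor, eps: float = 0.01) -> torch.Tensor:
    """Nudge every pixel by +/-eps in the direction that most increases the
    loss; the change is invisible to humans but can flip the model's answer."""
    image = image.clone().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(image), label)
    loss.backward()
    return (image + eps * image.grad.sign()).detach().clamp(0, 1)

x = torch.rand(1, 3, 224, 224)  # stand-in input image
y = torch.tensor([207])         # some target class index
x_adv = fgsm(x, y)              # near-identical image, often misclassified
```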

I can't wait to try out some stuff with DALL-E. This looks like a lot of fun to mess with. Thanks for sharing.

Is there no way to say "NOT Santa Claus" to remove the Santa elements? Or maybe there's a way to find a term antithetical to Santa Claus and add that, to cancel out the Santa elements.
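
DALL-E exposes no negative prompt, but the "antithetical term" idea can be sketched as embedding arithmetic with the openai/CLIP package; the 0.5 weight is an arbitrary knob:

```python
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git

device = "cpu"
model, _ = clip.load("ViT-B/32", device=device)

tokens = clip.tokenize([
    "Tycho Brahe with a moose, stained glass window",  # what we want
    "Santa Claus, Christmas",                          # what we don't
])
with torch.no_grad():
    wanted, unwanted = model.encode_text(tokens.to(device))

# Steer the prompt embedding away from the unwanted concept.
steered = wanted - 0.5 * unwanted
steered = steered / steered.norm()  # re-normalize for cosine-similarity use
```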

"Looking realistic" is different from being real.

Really excited that the AI was able to tap into Charles Darwin-Nagel's infamous treatise "What is it like to be a finch?" from the alt universe my parents came from.

Is there some experimentation or guidance not depicted here that led to the consistent "[subject], [style]" phrasing? Did you ever try something like "stained glass window depicting [subject]"? This very slightly reminds me of people trying to feed overly structured queries to Jeeves back when it wanted plain English questions.

I think Dall-E confuses William Ockham with Will Oldham aka Bonnie Prince Billy.

Interesting that it isn't just Harry Potter fan art: the pseudo-Elbakyan is obviously Slytherin (unlike Hermione) in the first and third pictures, and maybe Hufflepuff (again, unlike Hermione) in the second. (And if I had to assign the Stalinist hothead herself a Hogwarts house, I would probably put her in Gryffindor too. She's many unpleasant things, but bravery does suit her. And while she might be hard-working, she's not loyal, so no Huff.)
