305 Comments

Amazing images! But there's one problem: Tycho Brahe didn't use a telescope.

https://www.pas.rochester.edu/~blackman/ast104/brahe10.html

See also: https://en.wikipedia.org/wiki/Tycho_Brahe#Tycho_Brahe's_Instruments

Maybe DALL-E is more subtle than you think, and is trying to be accurate to period style when it makes the figures in stained glass windows look nothing like the people they're supposed to portray?

You can repair the Reverend's head with 'uncropping', expanding the image upwards. Examples: https://www.reddit.com/r/dalle2/search?q=flair%3AUncrop&restrict_sr=on

Am I the only one for whom 'metal nose', and not 'pet moose', was the defining trait of Tycho Brahe?

Off-topic, yet topical enough I don't want to put this in the off-topic thread: Matt Strassler has been doing a bunch of “how to discover/prove basic astronomy facts for yourself” recently: https://profmattstrassler.com/2022/02/11/why-simple-explanations-of-established-facts-have-value/

Here's my idea: Alexandra Elbakyan standing on the shoulders of the Montgolfier brothers, who are themselves standing on the shoulders of Thomas Bayes, who is standing on the shoulders of Tycho Brahe, who is standing on the shoulders of William of Ockham. Yes, I know DALL-E wouldn't want to stack so high as it was already cutting off heads. So I might as well have Ockham standing on a giant turtle.

The AI seems fuzzy on what, exactly, a telescope is used for. Most of the time Tycho seems to be trying to breathe through it, or lick it, or stick it up his nose; even when he is looking through the eyepiece, as often as not he's just staring at the ground. I dunno, maybe the AI heard that story about the drunken moose and figured that Tycho himself was typically fully in the bag.

Would love to see this with Imagen, Google's even newer image synthesizer. (No public demo though, alas.) In the examples we've seen, it does a much better job of mapping adjectives to their corresponding nouns instead of just trying to apply all the adjectives to all of the nouns, which is the main failure going on here.

Re faces, OpenAI says:

> Preventing Harmful Generations

> We’ve limited the ability for DALL·E 2 to generate violent, hate, or adult images. By removing the most explicit content from the training data, we minimized DALL·E 2’s exposure to these concepts. We also used advanced techniques to prevent photorealistic generations of real individuals’ faces, including those of public figures.

> I’m not going to make the mistake of saying these problems are inherent to AI art. My guess is a slightly better language model would solve most of them ...

What if the problem is more subtle than either of those two alternatives? What if the mapping between language prompts and 'good' pictures is itself quite fuzzy, such that different people will judge pictures rather differently for the same prompt, due to different assumptions and expectations? Don't we encounter such situations all the time, e.g., in a workplace meeting trying to settle on a particular design? Is it not naive to assume that there are objectively 'best' outputs, and we just need a better model to get them? What if I thought a particular picture was excellent, and you said, "No, no, that's not what I meant?"

Curious what you're planning on depicting for the other six virtues.

> These are the sorts of problems I expect to go away with a few months of future research.

Why are you so confident in this? The inability of systems like DALL-E to understand semantics in ways requiring an actual internal world model strikes me as the very heart of the issue. We can also see this exact failure mode in the language models themselves. They only produce good results when the human asks for something vague with lots of room for interpretation, like poetry or fanciful stories without much internal logic or continuity.

Not to toot my own horn, but two years ago you were naively saying we'd have GPT-like models scaled up several orders of magnitude (100T parameters) right about now (https://slatestarcodex.com/2020/06/10/the-obligatory-gpt-3-post/#comment-912798).

I'm registering my prediction that you're being equally naive now. Truly solving this issue seems AI-complete to me. I'm willing to bet on this (ideas on operationalization welcome).

Is DALL-E giving extra weight to its own previous results based on the similar input phrases? How much time elapsed between these queries?

I laughed until I cried. Fighting off an impulse to re-caption some of them.

There is some indication over on the subreddit that adding noise to the prompt can sometimes avoid bad attractors. For instance, a few typos (or even fully garbled text) can improve the output. It seems important to avoid the large basins near existing low quality holiday photos, people learning to paint, and earnest fan illustrations. Maybe Dall-E associates some kinds of diction with regions messed up by OpenAI's content restrictions, or mild dyslexia with wild creativity. In comparison the early images from Imagen seem crisper, more coherent, but generally displaying a lack of that special quality which Dall-E 2 sometimes displays, which seems close to what we call "talent" in human artists. Thanks for the funny and insightful essay.
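
A minimal sketch of that noise trick as one might script it; the function and the perturbation rate are invented for illustration:

```python
import random
import string

def add_prompt_noise(prompt: str, rate: float = 0.05, seed: int = 0) -> str:
    """Randomly swap a few letters to nudge the prompt out of a bad attractor."""
    rng = random.Random(seed)
    chars = list(prompt)
    for i, c in enumerate(chars):
        if c.isalpha() and rng.random() < rate:
            chars[i] = rng.choice(string.ascii_lowercase)  # introduce a "typo"
    return "".join(chars)

print(add_prompt_noise("William of Ockham holding a razor, stained glass", seed=3))
```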

William of Ockham never himself used the image of a razor; that's a modern metaphor and would be inappropriate for depiction in the stained glass image. And few people would know who Brahe is even with the moose, so leave it out.

Great set of experiments and writeup.

What it really looks like is that the author is praying at the altar of a very uncaring god or gods, and getting a bunch of vague prophetic crap.

Other DALL-E prompts use the words “in the style of” as part of the cue instead of just sticking a comma between the content and style parts; does that make a difference?

Previous work in image stylization has used a more explicit separation between content and style, which would help here. I imagine there will be follow-on work with a setup like the following: you plug in your content description which gets churned through the language model to produce “content” latent features, then you provide it with n images that get fed into a network to produce latent “style” features, then it fuses them into the final image. Of course then you potentially would have a more explicit problem with copyright infringement since the source images have no longer been laundered through the training process but maybe that’s fairer to the source artists anyways.
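
A minimal sketch of what such a two-tower setup might look like, with hypothetical encoder dimensions and a toy fusion head rather than any published architecture:

```python
import torch
import torch.nn as nn

class ContentStyleFuser(nn.Module):
    def __init__(self, content_dim=512, style_dim=512, fused_dim=512):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(content_dim + style_dim, fused_dim),
            nn.GELU(),
            nn.Linear(fused_dim, fused_dim),
        )

    def forward(self, content_latent, style_latents):
        # Pool the n style images into one style vector, then fuse it with
        # the language-model content features to condition the image decoder.
        style = style_latents.mean(dim=1)
        return self.fuse(torch.cat([content_latent, style], dim=-1))

fuser = ContentStyleFuser()
content = torch.randn(1, 512)   # "content" latent from the language model
styles = torch.randn(1, 4, 512) # latents from n=4 reference style images
z = fuser(content, styles)      # conditioning vector for the final image
```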

Since it seems to get hung up on the stained glass window style, try getting the image you want without the style, and use neural style transfer to convert it to stained glass.

It appears the goals of 'AI images', 'natural text' (searching/generating from a single string of text), and 'actually useful' user interfaces are in conflict. An art generator where art style, subject, colours, etc. are not discrete fields, and which ignores all the existing rules and databases used to search for art, seems like a bad approach to making a useful AI art generator.

I'd be more interested if they ignored the middle part, the 'single string of text', and focused more on the image AI. They are perhaps trying to solve too many problems at once, with AI text being a very difficult problem on its own. That said, it pulled random images which are probably not well categorised as a data source, so I'm sure they hit various limitations there as well.

I would think using an image-focused AI to generate categories might be an interesting approach, drawing directly from the images rather than whatever text is used to describe them on the internet. Existing art databases could be used to train the AI in art styles.

It would even be interesting to see what sorts of categories the AI comes up with on its own. While we think of things like Art Nouveau, the AI is clearly thinking 'branded shaving commercials' or 'HP fan art' are valid art categories. I don't think the shaving ads will show up in Sotheby's auction catalogue as a category anytime soon, though.

Perhaps we can see 'Mona Lisa, in the style of a shaving advertisement', 'Alexander the Great's conquest battles as HP fan art', or 'Napoleon Bonaparte in the style of steampunk'?

My best guess about William’s red beard and hair: DALL-E may sort of know that “William (of) Ockham” is Medieval, but apparently no more than that since he’s not given a habit or tonsure (he’s merely bald, sometimes). But he has to be given *some* color of hair, so what to choose??

Well, we know that close to Medieval in concept space is Europe. And what else do we know? We have a name like William, which in the vaguely European region of concept space is close to the Dutch/Germanic names Willem and Wilhelm. And what do we know of the Dutch and Germanic peoples? In the North / West of Europe is the highest concentration of strawberry-blonde hair!

If that’s too much of a stretch, then maybe DALL-E knows some depictions of “William of Orange” and transposed the “Orange” part to “William (of) Ockham’s” head?

I am personally addicted to generating "uncanny creep", "eldritch horror", and similar prompts using DALL-E mini.

Literally addicted, it's become an obsession.

https://huggingface.co/spaces/dalle-mini/dalle-mini

I wonder if you could get a key in the raven's beak if you called it a beak.

I do wonder how a human artist who got a similar query from an anonymous source would respond (assuming the artist was willing to go to the trouble, etc.).

This is actually a great example of the challenges with fairness and bias issues in AI/ML. Systems that screen resumes, grant credit (e.g. Apple Card), or even just do marketing have real problems with their training corpus. Even if the standards for past hiring were completely fair, if the system is calibrated on data where kindergarten teachers are 45-year-old women and scientists are 35-year-old men due to environmental factors, it is incredibly difficult to get the system to see the unbiased standards that are desired. This is a great layman's exploration of why that is.
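
A toy demonstration of that mechanism, on entirely synthetic data with a hypothetical "group" proxy feature:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000
skill = rng.normal(size=n)                                  # the legitimate hiring signal
group = (skill + rng.normal(0, 1.5, n) > 0).astype(float)   # demographic proxy, correlated with skill for environmental reasons
credential = skill + rng.normal(0, 1.0, n)                  # what the screener actually observes
hired = (skill > 0).astype(int)                             # past decisions used skill alone

model = LogisticRegression().fit(np.column_stack([credential, group]), hired)
print(model.coef_)  # the proxy feature picks up real weight
```

Even though past decisions depended on skill alone, the screener only sees a noisy credential, so the correlated group feature soaks up genuine predictive weight.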

Wasn't 'Ada Lovelace' not really her name?

This article about Tycho Brahe (borrowed from another comment, https://www.pas.rochester.edu/~blackman/ast104/brahe10.html) says that from his measurements Brahe concluded that either the Earth is the center of the universe, or the stars are too far away for any parallax to be accurately measured. Then it adds:

"Not for the only time in human thought, a great thinker formulated a pivotal question correctly, but then made the wrong choice of possible answers: Brahe did not believe that the stars could possibly be so far away and so concluded that the Earth was the center of the Universe and that Copernicus was wrong."

What are other times that "a great thinker formulated a pivotal question correctly, but then made the wrong choice"?

Off-topic to the AI generation, but "What I’d really like is a giant twelve-part panel depicting the Virtues Of Rationality." - I feel that you're not alone in this.

So, DALL-E can't separate style from content. This is like very young children who can recognize a red car or a red hat, but haven't generalized the idea of red as an abstraction, a descriptor that can be applied to a broad range of objects. I forget the age, maybe around three or four, at which children start to realize that there are nouns AND adjectives, so DALL-E is functioning like a two- or three-year-old. I wonder how well it does with object-permanence games like peek-a-boo.

P.S. Maybe instead of a Turing test, we need a Piaget test for artificial intelligence.

My guess is that you are expecting to produce great art right off the bat, with a new tool and only a few hours' practice with it. Obviously there is a learning curve, as your post demonstrates. Spend a few days with it, and I would assume your results will be spectacularly better.

From what limited exposure to DALL-E 2 I have seen, your assumption about the query "a picture of X in the style of Y" would work to remove the stained glass from the background of the subjects and make the art itself stained glass -- "a picture of Darwin in the style of stained glass."

Perhaps someone will make a new DALL-E interface that includes various sliders that work in real time, like the sliders on my phone's portrait mode, allowing me to bump the cartoon effects and filters up and down. So you could make your output more or less "art nouveau" or "stained glass" or whatever parameters you entered in your query.

Someone wanted to make a music video with DALL-E 2 yesterday, but couldn't quite do it. He still got some pretty results, however.

https://youtu.be/0fDJXmqdN-A

For Brahe, have you considered his metal nose as a signifier rather than the moose?

I think that modern AI has reached a local maximum. Machine learning algorithms, as currently being developed, are not going to learn abstractions like adjectives and prepositions by massaging datasets. They're basically very advanced clustering algorithms that develop Bayesian priors by analyzing large numbers of carefully described images. A lot of the discussion here recognizes this. Some limits, like understanding the style, as opposed to the content, of an image, could be improved with better labeling, but a lot of things will take more.

Before AI turned to what is called case-based reasoning, which trains systems using large datasets and statistical correlation, it took what seemed to be a more rational approach to understanding the real world. One of the big ideas involved "frames", that is, stylized real-world descriptions, ontologies; the idea was that machines would learn to fill in the frames and then reason about them. Each object in a scene, for example, would have a color, a quantity, a geometry, a size, a judgement, an age, a function, component objects and so on, so the descriptors of an object would have specific slots to be filled. A lot of this was inspired by the 19th-century formalization of record keeping, and a lot of it came from linguistics, which recognized that words had roles and weightings. There's a reason "seven green dragons" is quite different from "green seven dragons", even though both just consist of the same two adjectives followed by the same noun.
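
A minimal sketch of a frame as a data structure, with hypothetical slot names rather than any particular historical system:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ObjectFrame:
    noun: str                       # what the object is
    color: Optional[str] = None     # slot for a color descriptor
    quantity: int = 1               # slot for a count
    size: Optional[str] = None      # slot for a size descriptor
    function: Optional[str] = None  # what the object is for
    components: List["ObjectFrame"] = field(default_factory=list)

# "seven green dragons": each descriptor lands in its designated slot,
# so the word roles are explicit rather than inferred from order.
dragons = ObjectFrame(noun="dragon", color="green", quantity=7)
```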

I suspect that we'll be hearing about frames, under a different name, in the next ten years or so, as AI researchers try to get past the current impasse. Frames may be arbitrary, but they would be something chosen by the system designer to solve a problem, whether it is getting customer contact information using voice recognition, commissioning an illustration or recognizing patterns in medical records.

P.S. As for a lot of the predictions for systems like DALL-E, I'm with Rodney Brooks: NIML (not in my lifetime).

Lift your razor high, Occam

Hold it to the sky

Entities without true needs

Shan't multiply

It might have placed the key better if you'd put it in the raven's beak instead of its mouth.

If you want matching styles, maybe use Deep Art to adjust some as a second phase?

> The most interesting thing I learned from this experience is that DALL-E can’t separate styles from subject matters (or birds from humans).

Looks like entanglement is an issue. DALL-E cannot seem to find the right basis, where the basis vectors are styles, subjects, objects, etc. and instead uses artistic license to the max.

I am 95% sure there is actually a moose in one of the stained glass windows at my church. It’s in a more recent window depicting the Creation.

"DALL-E has seen one picture of Thomas Bayes, and many pictures of reverends in stained glass windows, and it has a Platonic ideal of what a reverend in a stained glass window looks like. Sometimes the stained glass reverend looks different from Bayes, and this is able to overpower its un-confident belief in what Bayes looks like."

So the Bayesian update wasn't strong enough to overcome its reverend prior?

Scott,

Are you familiar with the work of the Emil Frei Stained Glass Company, based in St. Louis? If not, here is their official site:

https://www.emilfrei.com/

And this is a layman's tour of their work in St. Louis, a great resource to view the breadth of their work (spanning more than 100 years):

https://www.builtstlouis.net/mod/emil-frei-stained-glass.html

I grew up in St. Louis, but only learned about their work much later. And yet, when I saw it, it seemed hauntingly familiar. Their style is very distinctive, very quintessentially Modern, moving into Mid-Century Modern. But their stylized figures of people and animals also feel very ... Eastern, Early Christian ... Macedonian, actually.

Besides advocating for these great artisans from my hometown, I want to mention a second point about stained glass -- it is Architectural. Real stained glass windows always exist in a building, with its interior/exterior spaces, its particular site and the sunlight, and most importantly, the people who will gather there to worship.

Digital images of stained glass patterns can be spectacular. I am sure that if DALL-E had images of the Emil Frei style in its corpus, it could generate new "works" that would be uncannily like real, original artworks from that studio.

From there, rendering the actual light coming through windows in a sanctuary is really a simple extension of existing CAD capabilities.

But, much like AI chess programs working under the direction of the Kasparovs and Carlsens, designing the placement and subject matter and the general massing and flow of these dramatic featured elements is still the domain of humans.

Oh, btw, if you really wanted to design an epic 12-panel journey depicting the Tenets of Rationality, I think the Emil Frei style would serve you well. (Also, 12-step-or-station journeys are a time-honored way to educate people through stories/allegory/pictures.)

BRetty

Something I've been thinking about in the context of machine translation, but it might apply to stuff like this as well.

All these neural net-y systems use an interaction pattern where you give them a single prompt, they do a bunch of internal churning, and then spit out a single response. A lot of the time, the response is nonsense, but you can sometimes trace the nonsense a bit to see how the internal churning misinterpreted the original prompt, and then continued to compound mistakes in interpretation on top of that.

The problems these systems are solving are usually multi-step problems, so there's some meaningful sense in which they should be able to "show their work" as they solve them. Like, in machine translation, the system should be able to show you how it broke the prompt into words, how it interpreted the grammatical relationships between words, how it mapped the words from the input language to the output language, and what common idioms it recognized and reworded.

So it seems like you should be able to get generally better results with machine translation if the user were able to give feedback on the accuracy of each of those steps. That way, if the initial parse into words is completely wrong, you could recognize that issue and correct it at that point, rather than letting that initial mistake get compounded into incomprehensibility. Basically, designing the interaction with a neural algorithm as a collaborative feedback loop rather than a black-box oracle.

Applied to this case, the system ought to be able to ask you questions during the process of image synthesis, like "It looks like the subject of your image is a woman named 'Alexandra Elbakyan,' who I think looks like this. Is this correct?" "It looks like you want there to be a 'raven,' which is a type of bird which looks like this, in proximity to the main subject." "It looks like you want there to be a 'key,' which is a tool that looks like this, in proximity to the 'raven.'" "Here is a composition containing these three elements. Is this acceptable?" "It looks like you want this to be drawn in the style of a medieval stained-glass window."

It doesn't seem like there should be any technical reason why these systems couldn't be designed to work more like this, so I don't know why they're all so fixated on the black-box design pattern.
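
A minimal sketch of that feedback loop; parse_scene, render, and compose below are invented stubs, not any real DALL-E interface:

```python
def parse_scene(prompt: str):
    # Stub: pretend each comma-separated clause is one scene element.
    return [clause.strip() for clause in prompt.split(",")]

def render(description: str) -> str:
    return f"<image of {description}>"  # stand-in for actual synthesis

def compose(parts):
    return " + ".join(parts)            # stand-in for final composition

def interactive_synthesis(prompt: str) -> str:
    approved = []
    for description in parse_scene(prompt):
        # Surface each intermediate interpretation for correction before
        # the mistake can compound into the final image.
        answer = input(f"I read one element as '{description}'. OK? [y/n] ")
        if not answer.strip().lower().startswith("y"):
            description = input("How should I reinterpret it? ")
        approved.append(render(description))
    return compose(approved)
```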

Weird that you didn't try *less specific* queries. Instead of "Alexandra Elbakyan in library with a raven with a key in its mouth, stained glass", why not just "raven with a key in its mouth, stained glass" or "raven in a library, stained glass"?

Only one known picture of Thomas Bayes, and it doesn't even show his posterior...

Immediate comment: if you've ever commissioned images on Fiverr or similar, you will know that this is a very difficult task. The real problem here is the one-shot communication and the lack of ability to iterate. I'm not sure DALL-E is doing any worse than a human would.

I would like to see a window that depicts Joseph Overton, holding a window.

I suspect the eventual takeaway from DALL-E as a practical tool for art is that short snippets of natural language are a very unwieldy way to control an AI artist. IMHO, DALL-E's capacity to render coherent scenes and styles, and to have visual familiarity with so many things, makes it the most viable "actual economic use of actual AI" case we have to date. Beside that, its NLP aspect is a fun spandrel that (as this post amusingly demonstrates) gets in the way of purposeful use of those greater capabilities. I think and hope that eventually (perhaps after DALL-E is licensed to some team more focused on consumer software) we'll have a version that's much more hands-on, something like an "artificially intelligent Photoshop" that lets you much more directly poke the model into doing what you want.

As an aside, though, I think “Darwin as a finch with a human head” is a fantastic visual metaphor and wholly appropriate to the symbolic stained glass setting.

Welcome to the wonderful world of AI psychology! Also known as prompt engineering. How can I extract the desired knowledge or output from the AI by asking the right questions?

I actually expect that AI psychologist will become a common job in the future. Perhaps under a more boring name, like AI operator.

Why can't the AI ask you questions to clarify your request? I'm sure a stained glass...artist? would ask you a few questions to make sure they knew exactly what you were after before completing the work. Wouldn't that resolve the issues of ambiguity? Seems like we're expecting AI to be more intelligent, or clairvoyant, than humans.

I think you've confused "art" with "propaganda". There's not really any consideration of the concepts of "beauty" and "truth", nor of the idea that art is a process of "virtualization" of aspects of consciousness. See Susanne K. Langer, Feeling and Form.

Perhaps some reflection on your own aesthetic theory would reveal that you are pretty close to embracing a Marxist theory of art, and hence it is no wonder that DALL-E is picking up on this and producing works that could easily pass as part of that tradition.

In the spirit of asking for a deer instead of a moose, in the SciHub one I would have tried asking for a crow instead of a raven.

Where's Chagall when you need him?!

> I’m not going to make the mistake of saying these problems are inherent to AI art. My guess is a slightly better language model would solve most of them

I think your specific problem would be better solved by a model that doesn't know any styles other than stained glass. If everything it generates looks like stained glass, you can ask for anything and it will come out looking like stained glass.

Random comments:

- Darwin #1 appears to be horrifically deformed. He's got weird flaps of skin hanging off his face. That's not a beard.

- I'm surprised you didn't try asking for something like "Alexandra Elbakyan in a blonde ponytail".

- When you ask for a person accompanied by a raven, you appear to be getting ravens that are half the size of a man. Something's very wrong there.

Should've asked for a stained glass window that humans would want if they were smarter, wiser, and had grown up further together.

This was hilarious, and a much-needed riposte to all the "DALL-E will totally replace human artists" posts we had.

Come on, the Darwin-finch is *awesome*. Rationality should definitely keep that one!

I don't know about William of Ockham, but Gila Whamm is definitely going to cut a bitch. William was a Franciscan, so you might do a bit better putting that in, though I imagine DALL-E will then churn out images of Franciscan saints which may not be what Rationalist Virtue stained glass windows want as their imagery:

https://upload.wikimedia.org/wikipedia/commons/7/70/William_of_Ockham.png

As for the Reverend Bayes, he was an 18th century Presbyterian. Anything resembling a cassock or soutane or Anglican/Papist (but those are the same thing) clerical robes will have his ghost arising out of its resting place to haunt you.

As for Tycho Brahe - a moose is the best attribute to identify him? Scott, are you forgetting his METAL NOSE????

"Tycho Brahe lost his nose in 1566 in a duel with Manderup Parsberg, a fellow Danish student at the University of Rostock and his third cousin. Tycho wore a prosthetic nose made of brass, and afterward he and Parsberg became good friends."

Also I do like how the art attempts ended up with Alexandra Elbakyan as a Sirin of Russian folklore:

https://en.wikipedia.org/wiki/Sirin

Russian folklore has not one, but *two* woman-headed birds, the Alkonost is the second:

https://en.wikipedia.org/wiki/Alkonost

See this painting with both:

https://en.wikipedia.org/wiki/Alkonost#/media/File:Vasnetsov_Sirin_Alkonost.jpg

On the final thoughts: I think that DALL-E 2 itself is probably capable of satisfying most of your requests, but what it needs is some amount of fine-tuning on the kind of results that you want, rather than trying to sample from 'the set of images that are likely to have your prompt as a caption', which in any case becomes less well defined as your caption diverges from the training set.

The process for fine-tuning these kinds of neural networks to be much more helpful is now quite well established, basically involving generating a load of pairwise preferences over results and then fine-tuning. For example, DeepMind's GopherCite (https://www.deepmind.com/publications/gophercite-teaching-language-models-to-support-answers-with-verified-quotes) trains a language model in just a few steps to give verbatim quotes supporting an answer to a question, in a pre-specified syntax. On the other hand, if the underlying language model is merely prompted to do so, it only sometimes gets the syntax and rarely gives proper verbatim quotes. (In the language-model case, they also train an RL model to plan ahead, but it's not obvious how this would transfer to images, which I understand are generated all at once.)

Given that the model is huge and not public, it's not going to be possible as an individual but it should be quite trivial to do within OpenAI, and the number of examples required is fairly low, so if it was worth their time and money, you might even be able to give enough examples on your own for the fine tuning.
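
A minimal sketch of the pairwise-preference step, in PyTorch, with a toy reward model standing in for whatever network would actually score the generations:

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_model, preferred, rejected):
    """Bradley-Terry-style loss: push the scalar reward of the
    human-preferred sample above that of the rejected one."""
    return -F.logsigmoid(reward_model(preferred) - reward_model(rejected)).mean()

# Sketch of the loop: score human-labeled pairs, fit the reward model,
# then fine-tune the generator to maximize the learned reward.
reward_model = torch.nn.Sequential(torch.nn.Linear(512, 1))
preferred, rejected = torch.randn(8, 512), torch.randn(8, 512)  # stand-in features
loss = preference_loss(reward_model, preferred, rejected)
loss.backward()
```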

It would be useful if the AI could be set to make drawings with no backgrounds. Then one could ask it to make drawings of ravens with keys in their mouths with no background, then ask it to make library backgrounds separately, then pick the best raven and put it in the best library.

I've been thinking about using AI art for comics. Example: in the first panel a cowboy punches a robot, and in the second panel the robot punches the cowboy. A problem is that the AI should draw the same cowboy and the same robot in each panel, just adjusting their positions. So for an AI to be useful for comics, you should be able to make a specific character and name it, and then whenever you use that name the AI should be able to remember that character.

This seems hard because the character should be able to change to some extent, if he changes his clothes, or grows old, or gets a haircut or whatever.

The AI should also be able to remember specific objects and locations.
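
A minimal sketch of such a registry as pure prompt bookkeeping; the class and the characters are invented for illustration:

```python
class CharacterRegistry:
    def __init__(self):
        self.characters = {}

    def define(self, name: str, description: str):
        self.characters[name] = description

    def expand(self, prompt: str) -> str:
        # Substitute each registered name with its full visual description,
        # so every panel's prompt pins down the same character.
        for name, description in self.characters.items():
            prompt = prompt.replace(name, f"{name} ({description})")
        return prompt

registry = CharacterRegistry()
registry.define("Tex", "a lanky cowboy with a red bandana and grey Stetson")
registry.define("R-7", "a boxy chrome robot with one blue eye")
print(registry.expand("Tex punches R-7"))
print(registry.expand("R-7 punches Tex"))
```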

It's possible to do masking with DALL-E 2. To get a consistent style, you could try supplying the same window frame, including a little background glass in the style you want, and then let DALL-E fill in the middle with the figure.
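
For reference, a sketch of what mask-based editing looks like through the image-edit endpoint that OpenAI later exposed in its Python library; treat the exact parameter names as approximate rather than authoritative:

```python
import openai

openai.api_key = "sk-..."  # your API key

# image: a square PNG of the fixed window frame with sample glass;
# mask: same size, with the middle region transparent so DALL-E may fill it.
with open("window_frame.png", "rb") as image, open("frame_mask.png", "rb") as mask:
    result = openai.Image.create_edit(
        image=image,
        mask=mask,
        prompt="medieval stained glass figure of Charles Darwin holding a finch",
        n=4,
        size="1024x1024",
    )
print(result["data"][0]["url"])
```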

Sometimes it's hard for people to recognize important differences too. The second image of Occam's razor and the picture of the medieval razor are not similar.

In the picture of the medieval razor, look at where the metal attaches to the wood. There's a pivot, so this is a folding knife. Razors need to be extremely sharp, but don't have to cut through anything hard, so a razor has an extremely thin blade. This is a picture of a thin and sharp pocketknife.

Now look at the second picture for Occam's razor. The place where the metal attaches to the wood does not have a pivot and looks more like a sledge hammer. The metal is extremely thick. It may not even have a sharp edge. And it's larger than Occam's head.

The second picture shows Occam's pick. Mining tools rarely make good razors.

https://en.wikipedia.org/wiki/Pickaxe#/media/File:Keilhaue_Bergmann_Hammer_VEB_-_BKW_-_GL%C3%9CCKAUF_-Tr%C3%A4ger_des_Vaterl%C3%A4ndischen_VO_in_Gold_-_Betrieb_im_VE_BKK_Senftenberg_-_Lupus_in_Saxonia_Bild_00017.jpg

How much would it cost to hire a human artist to design the stained glass window for you?

I really appreciate you showing your process here. Most of the DALL-E stuff I see is just a curated collection of the very best, which tends to give the impression that human artists are about to go extinct, whereas the fact of the matter is that DALL-E is less "useful" and more "unintentionally hilarious".

One major problem with AI is that basically the only thing it has to learn from is the internet, and the internet is a very bad place to learn about the real world. Being able to ingest and self-label data from reality seems to be a pretty significant part of what makes humans and other animals intelligent; the neural-network inference algorithms seem to be a pretty small piece of the puzzle in the end.

I think you may get more consistent styles by mentioning a particular artist, for example "William Ockham holding a razor, art nouveau stained glass by Louis Comfort Tiffany".

For William of Ockham, I bet you'd get better results if you swapped out "razor" for "knife". They're practically the same thing in a medieval style, but DALL-E would stop pulling from shaving ads.

Or for a more metaphorical razor, go with "sword" or "longsword". Add some language about monk's robes and a tonsure and you might get William of Ockham halfway to a Jedi. Which would be awesome.

Regarding Alexandra Elbakyan, I recall reading in a different article about DALL-E that it is programmed to deliberately screw up pictures of real people, to avoid deep fakes and revenge porn and such. Could that be the problem there?

Interesting. And it sounds exactly like the annoying aspect of almost all modern AIs, from Google's search algorithm to voice-recognition phone trees: they always funnel you too rapidly into whatever is heavily represented in their training data (the most popular queries) that matches most of your query. It's like they're all gross failures at the Sesame Street game "one of these things is not like the other": they cannot readily pick out the minority components of a query that are unusual and important, in that they shove the query off the well-beaten path of the training set.

It's my experience that modern AIs are heck on wheels if you want a popular result but are a little fuzzy on how to ask for it. I can ask Google for "the redhead in ABBA" or "famous Russian marshal Operation Barbarossa" and get "Anni-Frid Lyngstad" and "Zhukov" in no time flat, which is very impressive. But if I want something *almost* but significantly different from the typical, it's like wrestling with an oiled 600 lb walrus: the AI monomaniacally, well-intentionedly discards certain keywords I add, assuming I must just be mistaken, since surely I want the most common result when (a random) 80% of the words I used are matched...

You wonder what it is that the human brain does to pick this stuff up.

Dumb question, but what is the difference between what we can deduce the AI is doing and the AI truly being a black box? Doesn't this post kind of illustrate that AI isn't as black-box-y as we sometimes think? Also, if anyone has any resources on meta-level (not the company) AI studies, that would be really helpful to me.

Thoughts in no particular order:

Find the style of stained glass you want in books or museums and see how it is captioned there. Maybe use words like 'collection', 'ca. yyyy' or even 'C.317-1927. ©Victoria and Albert Museum'.

Art Nouveau might not be the best style for this. There are a lot of faux-wooden front doors out there with some vaguely Edwardian stained glass design on them, for you to put on the front of your vaguely Edwardian house. Whereas something like Pre-Raphaelite stained glass has several advantages.

It is much more likely to be representational and allegorical. It's a reaction to the industrial world that tries to hide its fundamentally modern nature behind traditionalism, like the work of Chesterton or Lewis. And Art Nouveau women look dreamy, whilst Pre-Raphaelite women look like they are done with your shit; Elbakyan is very done with academic publishing's shit.

Are all of the captions in English (*all* of them)? Maybe calling it Jugendstil or Sezessionstil would work better than Art Nouveau.

Regarding Ada Loves Lace: I had a co-worker named Phil French. Windows NT decided that he must speak French, so it changed his operating system language to French. Sigh.

Apparently DALL-E is not completely bad at generating text; it's just that it has its own language.

You can type the "gibberish" back at it to see what it means: https://giannisdaras.github.io/publications/Discovering_the_Secret_Language_of_Dalle.pdf

I see a general trend here: AI is good at creating things similar to existing ones (i.e., to the training data); same same, but different. However, it is not creative. It cannot create something original.

I am curious. Why didn't you specify that you wanted it in the style of a stained glass window, rather than including a stained glass window as one item in a list? If someone gave that prompt to me, that is how I would interpret it.

It may be the power of suggestion, but it seems like some of those Tycho Brahe images might have been influenced by the fact that Tycho Brahe is also the pseudonym for a webcomic author.

DALL-E is impressive, but why is it still so hidden and locked down, and why hasn't a large company bought it or the technology?

William Herschel just has a German name. He was a German-born British astronomer. He was raised protestant, as was typical in Northern Germany at the time.

Nothing Jewish about Herrn Herschel at all.

(Of course, Dall-e could be very American, and think all German names sound Jewish.)

Not AI but: we were once in a position to commission a piece from Theodore Ellison. Not your typical suburban house lilies. He's here in the Bay Area.

https://theodoreellison.com/collection/bespoke/

We actually have a stained glass window of Darwin with finches at Providence College, along with Newton and Galileo, scroll down here for pictures: https://news.providence.edu/stained-glass-windows-transform-fiondella-great-room

If you want to kill

Like Son of Sam

Try our razor

Gila Whamm

There is a program called Pattern Wizard. It allows you to import patterns and use parts of them to make a new pattern. It also lets you import pictures and draw patterns over the picture. Therefore, you can fairly easily produce a pattern of pretty much whatever you want. The only thing you have to be careful of is making the pieces a size you can actually work with. However, if you simply want to make a faux stained glass, you can forget those constraints and draw pretty much what you want.

It will then let you plug in different colors and patterns of stained glass, which it has in a library. I use this feature to show my customers generally what the stained glass piece will look like.

I also will print some more detailed pieces on sticky-back plastic pages and stick them to the glass. Then I will cut out the area I want to paint a particular color and use a powdered glass mixed with a binder to fill in the area. After it dries I do another color. I continue this process until the painting of the piece is done. My glass powder fires at 1250 degrees F, but there are low-temp ones that fire in a microwave or oven.

I hope this helps some of you out there to create something special for yourselves. I know these processes have certainly made me and my customers happy with the results. Have fun creating. Jim Walter

It's interesting, but the same problem of mixing style and content is also present in humans. We can find a great example of this in early Russian literature. When an author wanted to write something religious, they copied the Bible (which was not even in Russian); when the author wanted to say everyday things, they used their transcription of spoken language. When it was something in between those two themes, like a description of a battle, it was a weird mixing of those two styles page by page, or an attempt to find a stylistic middle ground.

This was laugh out loud funny! I loved it, especially that red-bearded psycho Gila Whamm!

I was surprised you didn't mention the problem that DALL-E showed both Brahe and Herschel putting the eyepiece of the telescope up to their mouths or noses.

The thing is, the more I see of these programs, the more I think they're not actually capable of doing what they pretend to be doing at all. It looks an awful lot to me like what these programs are actually doing is taking bits and pieces of images from the internet and stitching them together in a way that makes it hard for the people trying to figure out what the thing is doing to notice that this is what it's doing. I actually noticed this with the flamingos thing, as one of them had bits I remembered seeing in online art previously.

It also seems to run them through various "filters" which makes this harder to catch. As such, I think these things may ultimately end up being really complicated ways to obfuscate copyright infringement.

The thing is, this makes sense; these "machine learning" programs are basically programming shortcuts.

It's been known for a long time that it's possible to trick image-recognition programs in various subtle ways, by altering a few pixels or making a slight change to the image, and getting them to misidentify the image, with high confidence, as something completely different, even though humans often cannot even detect the alterations made to the original image. This is because these things aren't actually recognizing the image the way humans do, but building an algorithm for "things that get textually described/linked to as describing this". These attributes sometimes have very little to do with the actual desired thing, which is why you can get these weird outputs.

Which is exactly what is going on here as well, and why including "reindeer" makes it more Christmassy: because it doesn't actually understand "reindeer" conceptually.
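
For the curious, the canonical version of that few-pixel trick is the fast gradient sign method (FGSM); a minimal PyTorch sketch against an off-the-shelf classifier, purely illustrative:

```python
import torch
import torchvision.models as models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

def fgsm(image: torch.Tensor, label: torch.Tensor, eps: float = 0.01) -> torch.Tensor:
    """Nudge every pixel by +/-eps in the direction that most increases the
    loss; the change is invisible to humans but can flip the model's answer."""
    image = image.clone().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(image), label)
    loss.backward()
    return (image + eps * image.grad.sign()).detach().clamp(0, 1)

x = torch.rand(1, 3, 224, 224)  # stand-in input image
y = torch.tensor([207])         # some target class index
x_adv = fgsm(x, y)              # near-identical image, often misclassified
```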

I can't wait to try out some stuff with DALL-E. This looks like a lot of fun to mess with. Thanks for sharing.

Is there no way to say "NOT Santa Claus" to remove the Santa elements? Or maybe there's a way to find a term antithetical to Santa Claus and add that, to cancel out the Santa elements.
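
DALL-E exposes no negative prompt, but the "antithetical term" idea can be sketched as embedding arithmetic with the openai/CLIP package; the 0.5 weight is an arbitrary knob:

```python
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git

device = "cpu"
model, _ = clip.load("ViT-B/32", device=device)

tokens = clip.tokenize([
    "Tycho Brahe with a moose, stained glass window",  # what we want
    "Santa Claus, Christmas",                          # what we don't
])
with torch.no_grad():
    wanted, unwanted = model.encode_text(tokens.to(device))

# Steer the prompt embedding away from the unwanted concept.
steered = wanted - 0.5 * unwanted
steered = steered / steered.norm()  # re-normalize for cosine-similarity use
```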

"Looking realistic" is different from being real.

Really excited that the AI was able to tap into Charles Darwin-Nagel's infamous treatise "What is it like to be a finch?" from the alt universe my parents came from.

Is there some experimentation or guidance not depicted here that led to the consistent "[subject], [style]" phrasing? Did you ever try something like "stained glass window depicting [subject]"? This very slightly reminds me of people trying to feed overly structured queries to Jeeves back when it wanted plain English questions.

I think Dall-E confuses William Ockham with Will Oldham aka Bonnie Prince Billy.

Interesting that it isn't just Harry Potter fan art: the pseudo-Elbakyan is obviously Slytherin (unlike Hermione) in the first and third pictures, and maybe Hufflepuff (again, unlike Hermione) in the second. (And if I had to assign the Stalinist hothead herself a Hogwarts house, I would probably put her in Gryffindor too. She's many unpleasant things, but bravery does suit her. And while she might be hard-working, she's not loyal, so no Huff.)
