In preparation for the release of Flicker, Chapter 10, I asked OpenAI’s DALL-E 2 to help me generate images that reflected the chapter’s narrative, vibes, and characters.
Chapter 10 takes place in Solin Felwing’s past, and includes yet another conflict with his older brother, Varin. A villain in Solin’s early life, Varin made his on-page debut in Chapter 5.
After some failed attempts to re-create the opening scene in Chapter 10, I decided to try a prompt for Varin. In addition to some physical descriptors and asking for a specific age range (the Drakons are teens in this chapter), along with including a sword and leather armor (so as not to get contemporary clothing), I included some character traits, such as “mean” and “hard eyes” and “snarls at you.”
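(For the technically curious, a request like this can be sent through OpenAI’s public image-generation endpoint. The sketch below is a hypothetical reconstruction: the trait list is my paraphrase of the kind of prompt described above, not its exact wording, and the actual API call only fires if an API key happens to be configured.)

```python
import json
import os
import urllib.request

# Hypothetical paraphrase of the Varin prompt; not the exact original wording.
traits = [
    "teenage boy",
    "mean expression, hard eyes",
    "snarls at you",
    "wearing leather armor",
    "holding a sword",
]
prompt = "photo of a " + ", ".join(traits)

# One generation request can return several candidates; DALL-E 2 gave me four.
payload = {"model": "dall-e-2", "prompt": prompt, "n": 4, "size": "1024x1024"}

api_key = os.environ.get("OPENAI_API_KEY")
if api_key:  # only contact the API when a key is actually configured
    req = urllib.request.Request(
        "https://api.openai.com/v1/images/generations",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        # Each entry in "data" holds a URL for one generated image.
        urls = [item["url"] for item in json.load(resp)["data"]]
        print(urls)
```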
Here are the four images that DALL-E 2 generated for me:
Image A

Here, DALL-E attempted to include the detail of the sword but seems to have ignored the armor. It also paid attention to the fact that I wanted Varin to have that “I’m gonna attack you” stance. There’s no trace of a snarl, though, and the figure reads a little older than Varin is in this scene. Alas, I was also disappointed in myself: despite knowing that AI images carry inherent biases (because their training materials carry biases), I’d forgotten to be extremely clear about Varin’s complexion, so this fella isn’t quite fitting the bill.
(If you aren’t familiar with the biases of AI, definitely take a gander on the interwebs, at the more legitimate sites, at how these biases arise. Essentially, because society is laden with -isms both overt and subconscious, the images people put out there tend to be in line with societal values and trends. Some of these values and trends are harmful, and some aren’t harmful but are perhaps annoying because of oversaturation. At some point, I may have time to share an example of the harmless-but-annoying side of this too. For another chapter, I tried to get a simple photo of Solin pouring Scotch whisky into a glass. The number of dudes in red flannels was astounding, but not surprising, given recent marketing trends for craft beers and spirits to feature flannelled dudes.)
Image B

Look- and vibes-wise, this generation gives me more of Varin’s antagonistic character, and we even get more of the body in the frame. Age-wise, it seems to work better for the scene. Unfortunately, there are some general hiccups with the details: the image puts Varin in an outfit that is half modern t-shirt, half tunic, and the blade is not a sword but a long dagger with crystalline fragments that appear to meld into Varin’s chest.
Image C

What even is this? The sword is barely in frame, his look is totally blank, and he looks like a video game character from the early 3D consoles. When I saw this, I realized my prompt had failed to include specifics about the type of art, which is why DALL-E 2 mixed this 3D character into what otherwise looks like a batch of photos. And here, too, the inherent biases are at play.
Image D

With this character, DALL-E 2 didn’t seem to interpret the age or personality traits correctly at all. We have an adult male with a Mona Lisa smile who appears to be having his actor headshot taken. We get the sword, sort of, although it appears to be floating in front of him. And while we don’t see much of the armor, it’s clear there is something more fantastical about his outfit.
Masterpieces or Mishaps?
Unfortunately, I can’t say DALL-E 2 met my expectations of reproducing Varin with any of these images. I believe all three photo-realistic images could have uses elsewhere with a little cropping or a few touch-ups, but they are not Varin Felwing.
These images bring up so many questions about how to train AI not only to generate according to the specifications prompters enter, but to create images free of societal biases. Let’s be direct here: the figure with the darker complexion also happened to be the one generated with the proper personality characteristics and age from the prompt. While this works for Varin Felwing specifically, the AI largely ignored those characteristics for the subjects generated with lighter complexions. That could reflect inherent colorism in the training materials, which are themselves reflections of societal values and beauty trends.
And let’s be direct again: the dude in Image D, by contemporary beauty standards, is photogenic as hell. This is in part due to the composition of both his pose and the image itself. But does this suggest that a hefty majority of the well-posed, well-composed images DALL-E 2 was trained on featured people with paler complexions? Or is my prompt an outlier, a coincidence in which the “bad” personality traits I wanted present on all the alternates just happened to be most pronounced on the subject with the darker complexion, which also happens to align with racist and xenophobic representations?
It is impossible to know for sure, at least on our end with this particular prompt and these generations, but it remains likely given the way this type of AI works. These are questions we must keep asking ourselves as prompt engineers, and keep asking of the AI and the AI developers. That said, I remain hopeful for DALL-E’s continued development, as part of OpenAI’s mission is to enhance the safety of AI and reduce bias in its interactions.
A Resource
It’s clear DALL-E has a ways to go before it can faithfully reproduce the figures it generates, given that it still misses details in the prompts it receives. Let’s say I had decided to go with Image B for Varin; I could not then re-create Varin using DALL-E. The AI is not currently capable of making that same figure reappear. It could not age up the figure for a future chapter or pose him differently for a different scene. I could delete, for instance, the weird dagger and ask DALL-E to fill in the blank canvas, but it would still only be able to work with that small area and my prompt. That version of Varin only exists in that image.
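(That fill-in-the-blank-canvas trick is what OpenAI calls an image edit, or inpainting. As a rough illustration, the request fields below follow the public images-edit API, but the file names are placeholders; the mask would need to be transparent exactly where the dagger was erased.)

```python
# Hypothetical inpainting request for OpenAI's /v1/images/edits endpoint.
# Field names follow the public API; the file names are placeholders.
edit_request = {
    "image": "varin_image_b.png",  # the original generation, as an RGBA PNG
    "mask": "varin_mask.png",      # transparent only where the dagger was erased
    "prompt": "teenage warrior in leather armor, empty hands at his sides",
    "n": 1,
    "size": "1024x1024",
}

# Only the transparent region of the mask gets regenerated; everything else,
# including this particular version of Varin's face, stays untouched.
print(edit_request["prompt"])
```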
I think this creates an opportunity for artists to use images generated by AI like DALL-E as references. Human artists already use references of pre-existing works and models, and artists also learn via direct or stylistic reproductions. This is one way, I think, that human artists can take AI generations to the next level: using the AI as a resource to supplement and support their work, and to help clients better articulate their requests.
Until next time.