Image by Nvidia / Colie Wertz, Concept Artist

This article is related to a current exhibition showcasing creative AI pieces at our RTL offices in Hamburg. The exhibit connected to this article is an image-to-image demonstration of the power that generative models bring to the field of artificial intelligence.

Have you ever had trouble getting what you have in mind onto paper or even a digital canvas? Yes? You're certainly not alone: most adults would say that they still draw like a child. How about an assistant to whom you could describe the picture you envision, or submit a rough sketch of a scene, and who would turn it into an amazing-looking piece of art for you? Sound good? Well, it seems that we have just reached this point in the history of AI technology.

Models like Nvidia GauGAN (and the corresponding app, Canvas, which implements this technology) and diffusion models like DALL-E or, especially, the publicly available Stable Diffusion have recently opened up a whole new world of creative possibilities, even for people without particular artistic talent. What impact such models will have, one can only guess at this time, but deep down we all feel that it might be huge, as the technology feels a bit like magic. The creative explosion happening in online communities supports this story. To emphasize: human creativity is what makes the impact here, supported and augmented by the algorithm’s capabilities. We are indeed living in an age where we might transition from experience to result, from crafting everything in the process with our own hands to steering our silicon assistants towards our creative goals.

So, let’s take a more detailed look at what can be done with image-to-image models like GauGAN and Stable Diffusion. Remember when you were a kid? How would you have drawn a “man with golden armor, and mask, rises from the sands, a shiny golden magical staff in one hand”? Mine would probably look quite like this:

Given this sketch as input (to be honest, combined with more details in textual form, which is part of the “prompt engineering” process), Stable Diffusion produces an image like this:

Pictures published by argaman123 on Reddit

This certainly looks like something you could expect from a 3D-rendered movie or video game, right?

Speaking of video games, this scenery is the work of the outstandingly talented concept artist Donglu Yu for the video game “Assassin's Creed: Valhalla”. One can imagine how much effort and talent it takes to even come up with the initial sketch and then create a very detailed piece of game art from it. But considering what we just learned about image-to-image models, how could the creative process not be affected by these new possibilities?

Artist: Donglu Yu, Concept Art done for Assassin's Creed: Valhalla

And while most of us haven't tried our hand at concept art yet, many of us have built something creative from less complex shapes, like blocks in the game Minecraft. These new models could help you automatically create a photorealistic digital twin of your builds.

Outputs from GANcraft. The input block worlds are shown as insets.

In fact, anything semantically segmented can be used as an input, even your physical Lego models.

Image by Matt Henderson
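
For the technically curious, here is a rough idea of what "semantically segmented" means in practice: every region of the input is tagged with a class such as sky, water, or stone, and the model fills in the photorealistic detail. The following is only a minimal sketch in Python. The class names, ids, and colors are invented for illustration and are not the palette that GauGAN or GANcraft actually use.

```python
# Hypothetical class ids and colors, chosen for illustration only;
# they are NOT the label set used by GauGAN or GANcraft.
LABELS = {"sky": 0, "water": 1, "grass": 2, "stone": 3}
PALETTE = {
    0: (135, 206, 235),  # sky
    1: (0, 105, 148),    # water
    2: (86, 125, 70),    # grass
    3: (128, 128, 128),  # stone
}


def to_label_map(blocks):
    """Turn a 2D grid of block names into a grid of integer class ids."""
    return [[LABELS[name] for name in row] for row in blocks]


def to_color_map(label_map):
    """Turn a grid of class ids into a grid of RGB tuples (a segmentation image)."""
    return [[PALETTE[label] for label in row] for row in label_map]


# A tiny side view of a block build: a stone tower next to a pond.
blocks = [
    ["sky",   "sky",   "sky",   "sky"],
    ["sky",   "stone", "stone", "sky"],
    ["grass", "grass", "water", "water"],
]

label_map = to_label_map(blocks)
color_map = to_color_map(label_map)
```

A color-coded map like `color_map` is the kind of rough, flat input that an image-to-image model would then turn into a detailed picture; whether the blocks come from Minecraft, Lego, or a quick doodle does not matter as long as each region carries a label.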

Want to try out your concept art skills with the help of AI? You can also do this online by following these links:

Nvidia GauGAN 2
Stable Diffusion Image2Image Demo