Most people have probably heard of generative AI by now, often (incorrectly) abbreviated as just AI. There is a plethora of websites and apps people can use to create images with generative AI, Midjourney and DALL-E being two of the most common. In addition there’s Stable Diffusion, a generative AI model that can be installed locally on your computer.
I’m using Stable Diffusion with the Forge web UI myself, so it’s Stable Diffusion I’m referring to in this post.
Stable Diffusion generative AI models
- Stable Diffusion 1.5, released in October 2022
- Stable Diffusion 2.0, released in November 2022
- Stable Diffusion XL, released in July 2023
- Stable Diffusion XL Turbo, released in November 2023
- Stable Cascade, released in February 2024
- Stable Diffusion 3, released as a preview to a select group in February 2024
Some of the limitations of the models
One of the more obvious limitations has been the inability to create human hands, which gave rise to lots of humorous memes such as this one.

This issue was mostly fixed with the release of SDXL, and even if bad hands are still generated, they are much less common now. There are also tools such as embeddings to help combat the faulty anatomy of “AI hands”.
Besides the bad hands issue, there has been a period of Mengele images.




These kinds of images can largely be blamed on the user not understanding how to use prompts and settings correctly.
SDXL made prompting easier since it had a better understanding of the text users were feeding it, along with a new type of prompt structure. SDXL also brought better control of perspective and camera angles: users could change the perspective by adding phrases like “full body view” to the prompt.
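To give a rough idea of what this looks like outside a web UI, here is a minimal sketch using the diffusers Python library (I work in Forge myself, and the prompt and settings below are just placeholders of my own):

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Load the SDXL base checkpoint (downloaded from the Hugging Face Hub on first run)
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

# Perspective phrases like "full body view" steer the framing in SDXL
image = pipe(
    prompt="full body view of a woman walking down a rainy street",
    negative_prompt="blurry, deformed hands",
    num_inference_steps=30,
    guidance_scale=7.0,
).images[0]

image.save("full_body_view.png")
```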
Tools for enhancing AI images
Tools like embeddings and LoRAs have made it easier to generate quality images with Stable Diffusion. Embeddings can help give your images different styles, such as fantasy or realism, or simply act as a short command for a whole string of words to be used in the prompt or negative prompt.
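As a sketch of how an embedding is used in practice, here is roughly what it looks like with the diffusers library; the file name and trigger token are hypothetical placeholders, not a real embedding:

```python
import torch
from diffusers import StableDiffusionPipeline

# A Stable Diffusion 1.5 pipeline, where embeddings are especially common
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Load a textual-inversion embedding and bind it to a trigger token
# (the file name and token are hypothetical placeholders)
pipe.load_textual_inversion("embeddings/bad-hands.pt", token="bad-hands")

# The single token now stands in for the whole learned concept,
# here used in the negative prompt to steer away from faulty anatomy
image = pipe(
    prompt="portrait photo of a woman",
    negative_prompt="bad-hands",
).images[0]
```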
LoRA, or Low-Rank Adaptation, is a form of Extra Network and a newer technology that lets you attach a sort of smaller model on top of any of your full models. LoRAs are similar to embeddings, but they are larger and often more capable. A LoRA can represent a character, an art style, poses, clothes, or even human faces.
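Loading a LoRA works much the same way: it is attached on top of a full model you have already loaded. A minimal sketch with diffusers, where the directory and file name are made-up examples:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Attach a small LoRA on top of the full model
# (directory and file name are made-up examples)
pipe.load_lora_weights("loras", weight_name="fantasy-style.safetensors")

image = pipe(prompt="a castle on a cliff at dusk, fantasy style").images[0]
```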
Rainface LoRA
One of the limitations of Stable Diffusion 1.5 was its inability to understand rain and how it interacts with human skin. Imagine trying to explain rain to someone who has never seen or experienced it, and then explaining what it looks like when raindrops land on someone’s face.
Generating an image of a person standing in the rain with Stable Diffusion 1.5 will most of the time end up looking something like this.
Prompt:
Portrait photo of a beautiful womans face in the rain

You can kind of see in the background that it’s raining, and there are a few raindrops in her hair, but her face is completely dry.
What I did was create a LoRA trained only on faces with rain- or waterdrops on them. The result was better, but still not really convincing.
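For illustration, using such a LoRA at generation time could look roughly like this with the diffusers library (I work in Forge, so the directory, file name, and trigger word below are assumptions made for the sketch, not my actual setup):

```python
import torch
from diffusers import StableDiffusionPipeline

# Stable Diffusion 1.5, the base model the rain LoRA was trained against
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Load the rain-trained LoRA on top of the base model
# (directory, file name, and trigger word are assumptions)
pipe.load_lora_weights("loras", weight_name="rainface.safetensors")

image = pipe(
    prompt="rainface, portrait photo of a beautiful womans face in the rain",
    num_inference_steps=30,
).images[0]
image.save("rainface.png")
```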

I think that looking at how raindrops interact with human skin is a great way of showing the rapid evolution of generative AI.

While Stable Cascade is superior in details and quality, it responds best to short and general prompts in my experience. This is something I demonstrated in an earlier post, and it makes Cascade fun to play around with. I do believe that it responds poorly to camera angles, though, and is generally bad at faces.
This is also something that is acknowledged as a limitation with the current model, and I hope it will be fixed eventually. For now, the difference is notable and I’ll continue using SDXL for creating images with people.

