
Early Review: Stable Diffusion 3

As I have previously mentioned, Stable Diffusion 3 was released for preview in February, and a few days ago some kind person created a working UI for it. Obviously I had to check it out and see how well it's doing compared to SDXL and SD Cascade.

Get started with Stable Diffusion 3

There are two quick and easy ways to get started with Stable Diffusion 3 at this time.

  1. Locally on your own computer
    • Create a folder where you want to install SD 3 on your computer
    • Open your command prompt inside that folder (right click inside the folder and choose Open in Windows Terminal)
    • Copy and paste the following in the terminal window and then press enter
git clone https://github.com/MackinationsAi/SD3.git

This will copy all the necessary files to the specified folder. When that's done, locate and double-click a file named install.bat. Since SD 3 is only available through the API at this time, the installation will be done in a minute or so.

Next, double-click on run_SD3.bat, which will open the UI in your web browser.

Next you will have to get an API key from stability.ai. If you don’t have an account already, just sign up for one (it’s free). Clicking on the Create API Key button will create a personal API key for you (don’t share it with anyone!).

Copy your API key and paste it into the Stable Diffusion 3 window where it says Enter your API key. Now you can create around 3-5 images for free. After that you will have to purchase credits for generating additional images; 1,000 credits cost $10 at this time, which is enough to create ~150 images.
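If you'd rather skip the UI entirely, the same API that the web UI wraps can be called from a short script. Below is a minimal sketch in Python; the v2beta `stable-image/generate/sd3` endpoint path and the `prompt`/`output_format` form-field names reflect my reading of the Stability API and should be verified against the official documentation before relying on them.

```python
import os

# Endpoint path is an assumption -- check stability.ai's API docs.
API_URL = "https://api.stability.ai/v2beta/stable-image/generate/sd3"

def build_request(api_key: str, prompt: str, output_format: str = "png"):
    """Assemble the headers and multipart form fields for one text-to-image call."""
    headers = {
        "authorization": f"Bearer {api_key}",  # your personal key -- never share it
        "accept": "image/*",
    }
    # (None, value) tells `requests` to send a plain form field, not a file upload.
    files = {
        "prompt": (None, prompt),
        "output_format": (None, output_format),
    }
    return headers, files

if __name__ == "__main__" and "STABILITY_API_KEY" in os.environ:
    import requests  # third-party: pip install requests

    headers, files = build_request(
        os.environ["STABILITY_API_KEY"],
        "a dragon soaring over a misty forest",
    )
    resp = requests.post(API_URL, headers=headers, files=files)
    resp.raise_for_status()
    with open("dragon.png", "wb") as f:
        f.write(resp.content)
```

At the pricing quoted above (1,000 credits for $10, ~150 images), each call works out to roughly 6-7 credits, so a script like this spends real money on every run.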

  2. Through Huggingface
    • Visit SD 3 app at Huggingface
    • Paste your API key in the web app
    • Create images

Stable Diffusion 3 Review

Disclaimer: My review is based solely on my experience of SD 3 in its current state. Additional web UIs, settings and features will surely become available once the weights are released and you can run it without the API.

Requirements
Exactly how large the weights will be and how much VRAM Stable Diffusion 3 will require is not currently known. SD 3 has between 800 million and 8 billion parameters and will most likely be released in different sizes. According to the research paper, an unoptimized checkpoint with 8 billion parameters would require at least 24 GB of VRAM.

For comparison, SD Cascade has ~5 billion parameters when using the largest checkpoints, which together are a bit over 10 GB. For this, the recommended amount of VRAM is 20 GB; a bit less will suffice, but performance will suffer greatly.
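Those figures line up with simple back-of-the-envelope arithmetic: at 16-bit precision each weight costs 2 bytes, and running the model adds overhead for activations, text encoders and caches on top of the weights themselves. A rough sketch in Python (the 1.5× overhead factor is my own assumption, not an official number):

```python
def checkpoint_size_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Size of the weights alone, assuming 16-bit (2-byte) precision."""
    return n_params * bytes_per_param / 1e9

def vram_estimate_gb(n_params: float, overhead: float = 1.5) -> float:
    """Weights plus an assumed overhead factor for activations and caches."""
    return checkpoint_size_gb(n_params) * overhead

# SD Cascade's largest checkpoints, ~5 billion parameters:
print(checkpoint_size_gb(5e9))  # 10.0 -> matches "a bit over 10 GB" on disk

# A hypothetical unoptimized 8-billion-parameter SD 3 checkpoint:
print(vram_estimate_gb(8e9))    # 24.0 -> in line with the paper's 24 GB figure
```

The same arithmetic suggests why smaller releases matter: a 2-billion-parameter variant would weigh in around 4 GB on disk, well within reach of mid-range cards.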

I expect that a light version of SD 3 will eventually function somewhat decently on a graphics card with 12 GB of VRAM. But that's just my own speculation for now.

Current performance
Since Stable Diffusion 3 currently only runs through the API, there are no real performance issues. High-quality images at 1024×1024 are generated in 5-10 seconds, something that will definitely change once we start using it locally.

As for the quality of the images that it currently produces, it varies and depends a lot on the subject of the image you are creating.

The images to the left are generated with SDXL, and to the right is SD 3.

I'm not entirely sure that I prefer the SD 3 images at this point. But since the settings and features available for SD 3 right now are extremely limited, we can't know what the images would look like if we could optimize them as we see fit.

As with Stable Cascade, SD 3 is made to only generate SFW (safe for work) images. I can understand why they chose to take that path, especially when it seems that 95% of all SD 1.5 and SDXL checkpoints have been specifically trained for NSFW images.

However, deciding where to draw the line between SFW and NSFW isn't easy. Images that are nowhere near NSFW get censored for reasons only the developers know.

A humanoid robot with blood spraying from the back of its head is safe.

A woman that’s dressed like most women are during a warm summer day is unsafe.

A completely innocent prompt like this is deemed not safe for work.

A bloodied woman that is about to get eaten by a monster is considered to be safe.

A male artist is deemed to be safe, but the exact same image with a female artist is unsafe?

A woman sitting with her back against a tree in a park. Safe or unsafe?

I'm all for making sure that people aren't creating deepfake images, or illegal images depicting children in situations no child should ever have to be in. It's a thin line to walk, for sure. However, it's a bit too late to worry about that now. The technology, software and knowledge are already out in the world, and you can't put back what you've let out of Pandora's box.

The issue is that a heavily censored AI is useless, especially in its current state, since you have no idea beforehand what will trigger the censoring of an image. And if you pay for Stable Diffusion 3, you are allowed to use it commercially. Should you not be allowed to create sexy or suggestive images of adults for commercial purposes?

What if you create an AI version of an existing person with their consent? And they also give you consent to create a portfolio of sexy images with their AI alter ego? That shouldn't be allowed?

At this point I would say that both SD 3 and SD Cascade are great for creating images with no humans in them, such as images of nature, fantasy scenes, abstract art and dragons. But as soon as the image you are creating has anything to do with humans, it's safer to use SDXL.

It is possible that once the weights are released, the AI community will be able to train their own checkpoints based on SD 3, since it will be released as open source. And as soon as the community starts training their own checkpoints, we will be back to the way things are with SD 1.5 and SDXL.

I guess we will just have to wait and see.

Published in AI Images, Tech