
PixArt To Replace Stable Diffusion 3

There have been some obstacles around the release of Stable Diffusion 3, as I’ve written before. Setting aside the censorship and the mutilated atrocities of human anatomy that the model commonly generates, the licensing itself is vague enough that businesses are afraid to even use the model.

Why is the Stable Diffusion 3 licensing an issue?

The quality of the images generated by SD 3 isn’t something unique or surprising; it has been the same with the release of every previous model as well. But the previous models didn’t carry the same strict license as SD 3, which encouraged the AI community to fine-tune, tweak and enhance the models in ways that Stability AI could never dream of doing alone.

Basically, Stability AI has had the world’s largest test group, which has not only tested its models but also developed them further and made them better, all for free under open source. The strict license that SD 3 is released under means that the community won’t work on making the SD 3 model better, for fear of breaching the license and either having to pay fines, having their work confiscated or destroyed, or both.

As long as the current strict license is in place, nothing much is going to happen with Stable Diffusion 3.

Will PixArt replace Stable Diffusion 3?

PixArt Sigma was released in April 2024 and is supposed to be equivalent to SD 3. The hyped-up release of SD 3 overshadowed the release of PixArt, which is why a lot of people might not even have heard of it. Its creators claim that the model (0.6 billion parameters) has both superior image quality and better prompt adherence than larger models such as SDXL (2.6B parameters) and SD Cascade (5.1B parameters).

It’s also capable of generating 4K high-definition images without having to go through a separate upscaling step.

I have installed and run some light testing of PixArt Sigma and will show the results as well as how to install the model locally on your computer.
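As a side note: if you prefer scripting to a node-based UI, PixArt Sigma can also be run through Hugging Face’s diffusers library instead of ComfyUI. Here is a minimal sketch, assuming diffusers 0.28 or later (where the PixArtSigmaPipeline class was added) and the official PixArt-alpha repo id, both worth double-checking:

# Minimal PixArt Sigma generation sketch (assumes diffusers >= 0.28 and a CUDA GPU).
import torch
from diffusers import PixArtSigmaPipeline

# Load the 1024px checkpoint in half precision so it fits on a 12 GB card.
pipe = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS",  # repo id, verify on Hugging Face
    torch_dtype=torch.float16,
)
pipe.to("cuda")

prompt = "A majestic waterfall in a lush forest, golden afternoon light"
image = pipe(prompt, num_inference_steps=20).images[0]
image.save("pixart_test.png")

The rest of this post sticks to ComfyUI, which is what I ran the tests with.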

Jump directly to the installation guide

Result from testing PixArt

System used for testing:

  • AMD Ryzen 5, 3.60 GHz
  • 16GB DDR4 RAM, 1600 MHz
  • Nvidia GeForce RTX 3060, 12 GB

For the test I have used a very simple workflow, which can be downloaded here: Sample workflow

The prompts I have used for the tests have been used before, across several different models, and have only been modified as much as I deemed necessary to fit PixArt.

Workflow: PixArt flowers

Workflow: PixArt waterfall

Workflow: PixArt moonlight

Workflow: PixArt afternoon

Workflow: PixArt Goth

Workflow: PixArt simple prompt

Workflow: PixArt Warlock

Workflow: PixArt close-up

Workflow: PixArt fear

Workflow: PixArt simple prompt blonde woman

Workflow: PixArt Double exposure

Workflow: Rose in lightbulb

Workflow: PixArt tribal woman

Workflow: PixArt Cyborg

Workflow: PixArt roses

Safe For Work (SFW) or Not Safe For Work (NSFW)?

One of the (believed) reasons why Stable Diffusion 3 has issues generating accurate images of humans is that it’s heavily censored. Previous models have been completely uncensored, which has led to some people using the models mainly for generating pornographic images, some of which would be illegal in most countries.

To make sure that SD 3 isn’t used the same way, Stability AI has put an extreme censor-filter on the model. During the API testing period it was so strict that you could barely generate a woman at all. With the release of the weights the censor-filter became a bit less strict, but it still has a significant effect on generated images.

During the API testing I tried to generate an image using the following prompt:

A woman with blonde tousled hair, clad in a grimy tank top and paint stained jeans. She stands in front of an enormous canvas. Her eyes are wild and intense as she grabs the paintbrush. 

The following images are all created with the above prompt. From left to right: Stable Diffusion 3 through API, Stable Diffusion 3 with the released weights and PixArt.

This raises the question of whether PixArt is as censored as SD 3 was/is, and if it is, why doesn’t the censoring affect human anatomy the same way?

As far as I can tell, PixArt is censored in the sense that it’s hard (if not impossible) to generate NSFW images with it. I spent a good 45 minutes trying to create an image that showed bare breasts, and I wasn’t able to.

The left image shows the closest I could get PixArt to generate an image of a topless woman; the right image shows how most results ended up. PixArt simply ignores any NSFW requests and generates the images as if the request was never made.

The images above were generated using the following prompt (note that I have exaggerated the prompt in order to try to make PixArt generate an NSFW image).

Masterpiece, best quality, sharp focus. Realistic photo of a 22 year old blonde woman, posing, sexy woman, nsfw, slight above, big boobs, large breasts, naked breasts, naked boobs, sweaty, fit body, thin waists, (topless:1.5), bare breasts, naked, nude, beautiful breasts, detailed nipples, small nipples, small areola, raw photo

I’m sure that, given enough time, a person could trick PixArt into generating a topless woman, but why would you? If you want to see boobs, they are all over the internet. If you want to generate images of boobs, there is a plethora of AI models that do it 100 times better and easier than what you can get from tricking PixArt.

Installing PixArt

The following installation guide requires that you have ComfyUI pre-installed. If you don’t, please install it by following this link before proceeding: ComfyUI

Start by downloading the checkpoint PixArt-Sigma-XL-2-1024-MS.pth and place it in the same folder as your other checkpoints. The default folder is \ComfyUI\models\checkpoints

Next you need to download the t5-v1_1-xxl-encoder-bf16 and its config file and put them both in a folder named “t5”, which should be located at \ComfyUI\models\t5. If you don’t have a folder named t5 at that location, create a new folder and name it t5.

Link: t5 encoder

Place both the model.safetensors and config.json in the same folder (named t5).

If you think the above file is too big, there is a smaller t5 encoder you can download. I haven’t tried that one, though, so I don’t know whether it works or how well it performs.

Link: Small t5 encoder

If you choose the smaller file then don’t forget to also download the config.json file for it, and place them both in the t5 folder.
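To make sure everything ended up where ComfyUI expects it, you can run a quick check before launching. A minimal sketch, assuming the default portable folder layout described above (adjust the root path if yours differs):

# Sanity check: verify the PixArt checkpoint and t5 files are in place.
# Paths assume the default ComfyUI portable layout; adjust root if needed.
from pathlib import Path

root = Path(r"ComfyUI\models")
expected = [
    root / "checkpoints" / "PixArt-Sigma-XL-2-1024-MS.pth",
    root / "t5" / "model.safetensors",
    root / "t5" / "config.json",
]
for f in expected:
    print(f, "->", "OK" if f.exists() else "MISSING")

If any line prints MISSING, go back over the download steps above before continuing.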


Now before you go on and launch ComfyUI you need to follow these instructions.

Locate the folder from which you normally start ComfyUI (that’s where the run_nvidia_gpu.bat file is). The default folder is \ComfyUI\ComfyUI_windows_portable


While in this folder, right-click in an empty space and choose “Open command prompt” or similar (depending on the language your computer uses).


In the command prompt copy and paste the following and then hit ENTER:

git clone https://github.com/city96/ComfyUI_ExtraModels .\ComfyUI\custom_nodes\ComfyUI_ExtraModels

This will create a new folder named ComfyUI_ExtraModels inside \ComfyUI\custom_nodes.

Next you copy the following code, paste it in the command prompt and hit ENTER:

.\python_embeded\python.exe -s -m pip install -r .\ComfyUI\custom_nodes\ComfyUI_ExtraModels\requirements.txt

Then copy this code, paste it in the command prompt and hit ENTER:

.\python_embeded\python.exe -s -m pip install bitsandbytes --prefer-binary --extra-index-url=https://jllllll.github.io/bitsandbytes-windows-webui

To check that everything is up to date, first move into the new folder by copying and pasting this command and hitting ENTER:

cd .\ComfyUI\custom_nodes\ComfyUI_ExtraModels\

And finally copy, paste and hit ENTER with:

git pull

It will either update your files, or it will tell you the files are already up to date.

Important!
If you haven’t used ComfyUI to create SDXL images before, you will also have to download the SDXL VAE file from Hugging Face and place it in \ComfyUI\models\VAE

Now you can start ComfyUI as you normally do. When ComfyUI has started, either load or drag in one of the workflows that you can find here: All workflows

Change the preset prompts in the workflows to whatever you want, or alter them in any way you want. Just play around and try to create some cool stuff!
