
Chaining The Beast

In my previous post I gave a short overview of Flux.2 Dev, which had been released just a couple of hours earlier. Now that I have had a chance to test it more thoroughly, I can make a more informed assessment of the model, particularly of how to contain its massive VRAM requirements and make it work in complex, chained workflows.

Hardware and models used in this test

Hardware:

GFX Card: Nvidia RTX 5060 Ti 16 GB VRAM
System RAM: 64 GB

Software:

ComfyUI Version: 0.3.75
Torch Version: 2.8.0
CUDA Version: 12.8
Python Version: 3.12.8

Models:

Generative Models: Flux.2 Dev Q5_K_M.GGUF, Z-Image Turbo BF16, WAN 2.2 I2V 14b
Text Encoders: umt5-xxl-enc-fp8, mistral_3_small_flux2_fp8, qwen_3_4b
VAE: flux2-vae, flux.1 VAE, Wan2_1_VAE_fp32
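Before loading a model this large it’s worth confirming that your own setup is in order. A quick sanity check from Python (assuming a working PyTorch install) prints the Torch and CUDA versions and the available VRAM:

import torch

print("Torch:", torch.__version__)        # e.g. 2.8.0
print("CUDA :", torch.version.cuda)       # e.g. 12.8

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print("GPU  :", props.name)
    print("VRAM :", round(props.total_memory / 1024**3, 1), "GB")
else:
    print("No CUDA device found")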

Initial Flux 2 testing

The first thing I wanted to test was how well Flux 2 can make changes to already existing images. To test this I snagged an image of a woman wearing a face mask and wrote “Good morning” in red text on it. Then I had Flux 2 alter the image in two stages: first to remove the red text, and second to remove the face mask from the woman.

The text was removed without any noticeable issues. The face mask was also removed successfully, even though the remaining clean skin looks a bit too smooth to be natural. It still did a great job, and the unnatural skin can easily be fixed with a hi-res fix or similar.
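If you prefer to drive this kind of two-stage edit from a script instead of queueing the graph twice by hand, here is a minimal sketch using ComfyUI’s HTTP API. The workflow file name and the node ids (“6” for the prompt, “10” for the image loader) are placeholders for whatever your own graph uses; export it with “Save (API Format)” and adjust accordingly.

import json
import time
import urllib.request

COMFY = "http://127.0.0.1:8188"

def run(workflow: dict) -> str:
    """Queue a workflow, wait for it to finish, return the first output filename."""
    data = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(f"{COMFY}/prompt", data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        prompt_id = json.loads(resp.read())["prompt_id"]
    while True:                                      # poll history until the job is done
        with urllib.request.urlopen(f"{COMFY}/history/{prompt_id}") as resp:
            entry = json.loads(resp.read()).get(prompt_id, {})
        if "outputs" in entry:
            images = next(iter(entry["outputs"].values()))["images"]
            return images[0]["filename"]
        time.sleep(2)

with open("flux2_edit_api.json") as f:               # hypothetical exported edit workflow
    wf = json.load(f)

# Stage 1: strip the red text.
wf["6"]["inputs"]["text"] = "Remove the red text from the image"
first_pass = run(wf)

# Stage 2: remove the face mask from the stage-1 result. Note that the output
# may first need to be copied into ComfyUI's input folder for LoadImage to see it.
wf["10"]["inputs"]["image"] = first_pass
wf["6"]["inputs"]["text"] = "Remove the face mask from the woman"
print("final image:", run(wf))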

In my next test I wanted to completely remove a person from an image, and I used an image I made some time ago for this purpose.

The original image:

In my first attempt to completely remove the woman from the image I used the following prompt:
Recreate the input image without the woman

Which resulted in the following image being created:

The model seems to associate “woman” with the female face only. So in my next attempt I got a bit more specific and used the following prompt:
Recreate the input image without the person, without skin, without red cloth

Which resulted in the following image:

As long as we’re being clear about what we don’t want in the image, it handles the removal of the woman very well.

For the final test I had Flux 2 recreate the image after having Gemini create a prompt from it. The prompt Gemini created was the following:

Photorealistic portrait of a stunning woman with long, dark, wet, wavy hair, wearing a vibrant red string bikini, reclining provocatively on a wet, sandy tropical beach at the water’s edge under bright midday sun. Her skin is glistening with oil and small water droplets, emphasizing defined musculature and curves. She is wearing dark, rectangular sunglasses, looking off to the side with a sultry expression. The background features turquoise ocean waves gently washing ashore, soft white sand, lush green palm trees, and distant hills under a bright blue sky with soft white clouds. Cinematic lighting, high contrast, extremely detailed skin texture, 8k resolution, professional swimwear photography aesthetic.

Which resulted in this image:

The resulting image is good, although zooming in reveals some flaws such as JPEG artefacts and some blur. But it’s important to keep in mind that I’m not using the full 64 GB model, which might well produce better results.

The Z-Image Model

Using the same prompt as above with the Z-Image model results in a similar quality image to the Flux 2 model. The overall image is good, but zooming in reveals the same artifacts and blur issues found with the compressed Flux 2 model.

Side-by-side comparison of Flux 2 and Z-Image at 280% zoom.

For your day-to-day image generation there is not a lot of difference between the 22 GB quantized Flux 2 model and the 11 GB BF16 Z-Image Turbo model.

The Z-Image model can be used for refinement, though, as shown below; it adds a bit more texture and detail based on the context of the image.
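Under the hood this refinement is just a low-denoise second pass: the Flux 2 output is VAE-encoded again and lightly resampled by Z-Image Turbo, so it can add texture without changing the composition. A rough sketch of setting that up programmatically, assuming you have exported a refinement graph in API format (the file name and node ids are again placeholders):

import json

with open("zimage_refine_api.json") as f:            # hypothetical exported workflow
    wf = json.load(f)

wf["3"]["inputs"]["denoise"] = 0.30                  # low denoise keeps the composition
wf["3"]["inputs"]["steps"] = 8                       # Turbo models only need a few steps
wf["10"]["inputs"]["image"] = "flux2_result.png"     # the image to refine

print("refined image:", run(wf))                     # same run() helper as above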

Considerations

Before completely condemning Flux.2 there are a few things to take into consideration. The first is that most of us can probably still run the original Flux.1 Dev version (~23 GB) and get approximately the same results that we get with Flux.2 Dev.

However, Flux.2 Dev also includes the edit feature, which previously required the separate Flux.1 Kontext model (~24 GB). By simply exchanging Flux.1 Dev and Flux.1 Kontext (47 GB in total) for Flux.2 Dev Q5 GGUF, you save about 25 GB of disk space. Flux.2 is definitely slower than Flux.1, though, and that is a trade-off everyone must weigh: whether the saved disk space outweighs the somewhat slower generation is a matter of personal preference.

Let’s Chain This Beast

It should be noted that my using these specific models for chaining doesn’t necessarily mean this is the optimal combination. It’s more about showing that the size and architecture of the models don’t matter when it comes to chaining workflows. The node that is solely responsible for chaining workflows is this one (available on GitHub: https://github.com/Creepybits/ComfyUI-Creepy_nodes):

chain workflows

You can use this node to chain any workflows you want. Only your imagination sets the limit.
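To make the idea concrete, here is a conceptual sketch of the Flux.2 → Z-Image → WAN 2.2 chain driven from Python, reusing the run() helper from the editing example above. This is not how the Chain Workflows node is implemented; it only illustrates feeding each workflow’s output into the next, and the file names and node ids are placeholders.

import json

STAGES = [
    ("flux2_generate_api.json",  "9"),    # (workflow file, image-loader node id)
    ("zimage_refine_api.json",  "10"),
    ("wan22_i2v_api.json",      "12"),
]

previous_output = None
for path, loader_id in STAGES:
    with open(path) as f:
        wf = json.load(f)
    if previous_output is not None:       # the first stage generates from scratch
        wf[loader_id]["inputs"]["image"] = previous_output
    # run() assumes image outputs; a video save node may report its files differently
    previous_output = run(wf)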

The workflows used for chaining the Flux.2 Dev, Z-Image and WAN 2.2 are available here: 3-chain-workflows

The other workflows that are being used are all available in ComfyUI templates.

If you enjoyed this technical walkthrough and want more detailed guides and useful tips delivered directly to your inbox, please sign up for my newsletter.

If you want early access and exclusive materials and guides, consider joining my Patreon!

Published in AI, AI Images, AI Video, ComfyUI, English, Tech