How can I create a workflow for Comfy? This is a recurring question in various forums and online AI groups.
Okay, not really. The questions are more often phrased like this: “Why isn’t this workflow working, and what do all these error messages mean?”
But it’s really the same question, because if you know how to build a workflow, you will (most of the time) know what’s wrong when it doesn’t work.
Most people have no reason to learn how to create custom workflows, though. Anything you can think of that you would want a workflow for has most likely already been thought of by someone else, and there are several pre-made workflows for it. You can find almost anything you want on CivitAi, OpenArt, Comfy Workflows and probably thousands of other places.
However, if you are a creative person who likes to try new things and find new and improved ways of doing things, you need to learn how to build workflows. The default workflow in ComfyUI right now is an SD1.5/SDXL workflow, and this can always be loaded by clicking on the Load Default button.
Create Comfy Workflows
Checkpoint
The first thing you should start with (in my opinion) is a way to load the checkpoint. There are a lot of loader nodes available, and most of them are pretty much just variants of the same checkpoint loader, which is used to load SD 1.5 and SDXL, for example. Some checkpoints are structured differently though, which is why you have to decide which checkpoint you are using before you pick a loader.

In the image above, the blue checkpoint loaders are the normal ones that you would use for creating images with SD 1.5 and SDXL. The green one is specially made for Stable Cascade, since the Cascade checkpoint is divided into several stages. The light blue/grey loader is custom-made to work with Flux NF4 checkpoints, a quantized, lightweight variant of Flux that doesn’t require as many resources as the full Flux checkpoints do. The top purple loader is custom-made for GGUF models (a file format for quantized models that lets large models run with far less memory). The bottom purple loader is the default one to use with the Flux Dev checkpoint.
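If you prefer to see workflows as data rather than nodes on a canvas: in ComfyUI’s API-format JSON, each node is an entry with a class_type and its inputs. Here is a rough sketch of a standard checkpoint loader next to the loader typically used for Flux Dev. The file names are placeholders, and exact node names can differ slightly between ComfyUI versions.

```python
# Sketch of ComfyUI API-format entries for two kinds of loaders.
# File names are placeholders; use whatever is in your models folders.

# Standard loader for SD 1.5 / SDXL checkpoints (the "blue" loaders above).
# It outputs the model, the CLIP and the VAE in one go.
standard_loader = {
    "1": {
        "class_type": "CheckpointLoaderSimple",
        "inputs": {"ckpt_name": "sd_xl_base_1.0.safetensors"},
    },
}

# Flux Dev is usually loaded as a bare diffusion model instead, with the
# CLIP and VAE loaded by their own nodes (covered in the sections below).
flux_loader = {
    "1": {
        "class_type": "UNETLoader",
        "inputs": {
            "unet_name": "flux1-dev.safetensors",  # placeholder file name
            "weight_dtype": "default",
        },
    },
}
```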
VAE
A variational autoencoder (VAE) is the part of the model that converts images to and from the latent space, and using the right VAE can improve the quality of the images you generate. An autoencoder is a model (or part of a model) that is trained to produce its input as output. By giving the model less information to represent the data than the input contains, it is forced to learn about the input distribution and compress the information. A stereotypical autoencoder has an hourglass shape – let’s say it starts with 100 inputs and reduces them to 50, then 20, then 10 (the encoder), and then goes from 10 to 20 to 50 to 100 (the decoder). The 10 dimensions that the encoder produces and the decoder consumes are called the latent representation.
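To make the hourglass idea concrete, here is a minimal toy autoencoder in PyTorch with exactly those layer sizes. This is not the VAE used by Stable Diffusion (which works on images and adds a probabilistic latent), just an illustration of the shape of the idea.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy "hourglass" autoencoder: 100 -> 50 -> 20 -> 10 -> 20 -> 50 -> 100.
class TinyAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(100, 50), nn.ReLU(),
            nn.Linear(50, 20), nn.ReLU(),
            nn.Linear(20, 10),            # 10-dimensional latent representation
        )
        self.decoder = nn.Sequential(
            nn.Linear(10, 20), nn.ReLU(),
            nn.Linear(20, 50), nn.ReLU(),
            nn.Linear(50, 100),           # reconstruct the original 100 inputs
        )

    def forward(self, x):
        latent = self.encoder(x)
        return self.decoder(latent)

# Training would minimize the difference between input and reconstruction:
model = TinyAutoencoder()
x = torch.randn(8, 100)                   # a batch of 8 made-up inputs
loss = F.mse_loss(model(x), x)
```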
Sometimes the VAE is included (baked) in the checkpoint, but often you use a separate one. Different models use different VAEs, so you can’t use the SDXL VAE for Flux, or the PixArt VAE for Stable Cascade. They all load in the same loader node though, so next we will add a VAE loader to the workflow.
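In API-format terms, the separate VAE is just one more loader node, something like this (the file name is a placeholder):

```python
# A separate VAE loader node; pick the VAE that matches your model.
vae_loader = {
    "2": {
        "class_type": "VAELoader",
        "inputs": {"vae_name": "ae.safetensors"},  # placeholder: a Flux VAE file
    },
}
```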

CLIP
Contrastive Language-Image Pre-Training (CLIP) is a neural network trained on a large variety of (image, text) pairs. It’s needed to encode your text prompt into something the model can turn into an image.
As with the VAE, different models use different types of CLIP. However, some models share one or more CLIPs. An example is Flux, which uses the L CLIP from SDXL, but instead of SDXL’s G CLIP it uses a pruned version of Google’s T5 efficient XXL encoder. The full T5-XXL is about 45 GB, while the T5-XXL encoder used by Flux is about 10 GB at fp16 and 4.5 GB at fp8.
Some people have merged one or both CLIP encoders (and sometimes the VAE) into a Flux model so that you don’t need separate CLIP and VAE files for it. I prefer to keep them separate, because then I only need one set of CLIP and VAE files no matter how many Flux models I have. Models with CLIP and/or VAE baked in are usually bigger than the original model, so if you have more than one model of the same type, you will save space by using separate CLIP and VAE files.

In short, the green CLIP loader is for SD 1.5 and SDXL, the blue one is for Flux (and for SDXL if you are using both the L and G CLIP), and the purple one is for Stable Diffusion 3 (and maybe Cascade, I can’t remember). Since we are building a Flux workflow, we will use the blue one.
The yellow node is where you put your actual prompt. The upper text field says clip_l and the bottom one says t5xxl. That means the text you put in the upper field will be encoded by the SDXL CLIP L, while the text in the bottom field will be encoded by the pruned Google T5-XXL encoder. This is important because the two are prompted differently.
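In node terms, this is a dual CLIP loader feeding a Flux text-encode node that has one text field per encoder. A rough API-format sketch looks like this; file names and prompt text are placeholders, and exact node and field names may differ slightly between ComfyUI versions.

```python
# Dual CLIP loader for Flux: the SDXL CLIP-L encoder plus the pruned T5-XXL encoder.
flux_clip = {
    "3": {
        "class_type": "DualCLIPLoader",
        "inputs": {
            "clip_name1": "clip_l.safetensors",            # placeholder file names
            "clip_name2": "t5xxl_fp8_e4m3fn.safetensors",
            "type": "flux",
        },
    },
    # The Flux text-encode node exposes one field per encoder.
    "4": {
        "class_type": "CLIPTextEncodeFlux",
        "inputs": {
            "clip": ["3", 0],                              # CLIP output of the loader above
            "clip_l": "dark fantasy, gothic, dress made of water, ...",
            "t5xxl": "In a hauntingly enchanting dark fantasy realm, ...",
            "guidance": 3.5,
        },
    },
}
```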
The L CLIP is used for lists of keywords (the way prompts used to be written), for example:
Masterpiece, best quality, 32k HD, sharp focus, hyper realistic, ultra detailed, professional. Female character design, full body focus, person, clothed in heavy mist, soft shading, beautiful highly detailed face, dark fantasy, gothic, dress made of water, dark liquid, feminine facial detail, hyperfine maximalist eye detail, photorealistic eyes, symmetrical eyes, finely detailed, intricately detailed, intricate design, sharp lines.
This (using SDXL) will result in the following image.

The t5xxl encoder, on the other hand, works with natural language, so the same prompt would look more like this:
In a hauntingly enchanting dark fantasy realm, a mesmerizing female character stands at the center, captivating in her ethereal presence. Clothed in a flowing dress that appears to be woven from shimmering strands of a dark liquid, the fabric ripples and glistens like water under the moonlight, casting soft reflections around her. The mist enveloping her forms an intricate blanket of vapor, swathing her in an aura of mystery and allure.
Her stunning facial features are delicately sculpted, with symmetrical eyes that glimmer like polished gems, each eyelash exquisitely defined, creating an arresting gaze that seems to penetrate the soul. The hyper-realistic detailing reveals the subtle nuances of her skin—smooth and almost luminescent against the dark backdrop. Gentle, soft shading enhances her feminine elegance, accentuating the graceful curve of her neck and shoulders.
Intricate, gothic-themed accessories—perhaps a collar of silver vines—adorn her, merging seamlessly with the flowing design of her dress. The scene is imbued with an overall emotional tone that evokes both beauty and a sense of foreboding, as if she is a guardian of secrets lost to time. The atmosphere is charged with an almost electric stillness, punctuated only by the faint sound of dripping water echoing in the background, further enhancing the surreal vibes of this meticulously crafted masterpiece. The sharp lines and hyperfine details contribute to the artwork’s professional quality, promising an image that is both visually astonishing and deeply evocative.
And the result from using Flux Dev will be something like this:

The Basics
These are the absolute most basic nodes, and they are used (in variations) in pretty much every workflow.

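For reference, here is roughly how those basic nodes hang together in API-format JSON for an SD 1.5/SDXL checkpoint. The ["node_id", output_index] pairs are how one node’s output is wired into another node’s input; the file name, prompts and settings are placeholders.

```python
# A minimal SD 1.5 / SDXL graph in ComfyUI API format (values are placeholders).
basic_workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "sd_xl_base_1.0.safetensors"}},
    "2": {"class_type": "CLIPTextEncode",                     # positive prompt
          "inputs": {"clip": ["1", 1], "text": "a castle in heavy mist"}},
    "3": {"class_type": "CLIPTextEncode",                     # negative prompt
          "inputs": {"clip": ["1", 1], "text": "blurry, low quality"}},
    "4": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
    "5": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "positive": ["2", 0], "negative": ["3", 0],
                     "latent_image": ["4", 0], "seed": 0, "steps": 25, "cfg": 7.0,
                     "sampler_name": "euler", "scheduler": "normal", "denoise": 1.0}},
    "6": {"class_type": "VAEDecode",
          "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
    "7": {"class_type": "SaveImage",
          "inputs": {"images": ["6", 0], "filename_prefix": "ComfyUI"}},
}
```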
From here on, Flux checkpoints use slightly different nodes than the ones you might have used for SDXL and similar models.
Samplers And Schedulers
I’m not going to go into detail about what samplers and schedulers are and how they work, but if you are interested in more in-depth knowledge, you can start with this article: Stable Diffusion Samplers: A Comprehensive Guide
Below you see a side-by-side comparison between the Flux sampling nodes (which are split into several separate nodes) and the compact KSampler normally used for SDXL checkpoints.

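If it helps to see that difference in data form, the separated Flux sampling chain looks roughly like this in API format. Node ids "1" to "4" here stand for the loader and text-encode nodes sketched earlier, and exact node names may vary between ComfyUI versions.

```python
# Flux splits the work of the compact KSampler into separate nodes (sketch).
# Ids "1"-"4" refer to the UNETLoader, VAELoader and CLIP/text-encode sketches above.
flux_sampling = {
    "5": {"class_type": "RandomNoise",                        # the starting noise
          "inputs": {"noise_seed": 0}},
    "6": {"class_type": "KSamplerSelect",                     # which sampler to use
          "inputs": {"sampler_name": "euler"}},
    "7": {"class_type": "BasicScheduler",                     # which schedule, how many steps
          "inputs": {"model": ["1", 0], "scheduler": "simple",
                     "steps": 20, "denoise": 1.0}},
    "8": {"class_type": "BasicGuider",                        # takes the place of positive/negative + CFG
          "inputs": {"model": ["1", 0], "conditioning": ["4", 0]}},
    "9": {"class_type": "EmptySD3LatentImage",                # empty latent commonly used for Flux
          "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
    "10": {"class_type": "SamplerCustomAdvanced",             # ties all of the above together
           "inputs": {"noise": ["5", 0], "guider": ["8", 0], "sampler": ["6", 0],
                      "sigmas": ["7", 0], "latent_image": ["9", 0]}},
}
```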
Final Section
The only thing left to do now is pretty much just save your image.

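In the same sketchy API-format terms, the end of the workflow is just a decode-and-save pair, with the latent coming from the sampler and the VAE coming from the separate VAE loader:

```python
# Decode the sampled latent with the separate VAE and save the result.
save_nodes = {
    "11": {"class_type": "VAEDecode",
           "inputs": {"samples": ["10", 0], "vae": ["2", 0]}},   # sampler output + loaded VAE
    "12": {"class_type": "SaveImage",
           "inputs": {"images": ["11", 0], "filename_prefix": "flux_dev"}},
}
```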
And this is what the full workflow looks like:

You can download the workflow here: