
Flash Attention for ComfyUI on Windows: Get Warp Speed!

If you’re anything like me, you’re always looking for ways to push the boundaries of what’s possible with generative AI, especially in ComfyUI. And you know what makes pushing those boundaries easier? Speed!

Today, I’m sharing a guide on how to compile and install Flash Attention v2.7.4.post1 on your Windows machine. This isn’t just a minor tweak; it’s a game-changer that can dramatically reduce your image generation times, especially when paired with optimizations like TeaCache (which I have previously written about here: Guide: Install Triton, Sageattention And TeaCache On Windows).

Heads Up: A Marathon, Not a Sprint!

Let’s be upfront: Compiling Flash Attention on Windows is a complex process. It requires setting up a specific development environment, and the compilation itself is a very resource-intensive task. Based on my own experience and that of others, you can realistically expect this build process to take between 8 and 10 hours on most systems, pushing your CPU to its limits (the 8 cores and 16 threads of my Ryzen 7 5700X were certainly put through their paces!).

Don’t have 8-10 hours to spare, or just want to jump straight into the action?
No problem at all! If you’d rather skip the entire compilation process and get straight to harnessing the power of Flash Attention, you can download my pre-compiled .whl file directly from my Patreon.


To use the pre-compiled .whl file you need:

  • Python 3.12
  • CUDA 12.8 (and its toolkit) added to your PATH
  • PyTorch 2.7.0
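
Before installing the wheel, it’s worth a quick sanity check that your setup actually matches those three requirements. A minimal check from a command prompt (assuming Python and the CUDA Toolkit are already on your PATH) looks like this:

python --version
nvcc --version
python -c "import torch; print(torch.__version__, torch.version.cuda)"

You want to see Python 3.12.x, CUDA 12.8, and a PyTorch 2.7.0 build tagged +cu128 before installing the wheel.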

Why Go Through the Trouble? The Speed You Gain!

You might be wondering if it’s worth the effort (or the small Patreon support!). First, let me tell you about my hardware. People often assume my hardware is far more high-end than it actually is, especially when they see the highly specialized and intricate workflows I create and run.

However, much of my work aims to optimize, automate and randomize, which in most cases means that if I can run something on my computer, you can too!

The relevant parts of my hardware are:

  • NVIDIA RTX 3060 (12GB VRAM)
  • Corsair 32GB DDR4 system RAM
  • CPU: Ryzen 7 5700X

If your hardware is about the same as mine, you can effectively run everything I create and promote!

Based on my hardware, let me show you the speed increase when creating an image using 25 steps with Flux Dev (fp16) together with the T5-XXL CLIP (fp16); those two alone take roughly 31GB of VRAM/RAM.

  • Baseline (No optimizations): ~3-4 minutes per image
  • Sage Attention + TeaCache: ~2 minutes per image (a good improvement!)
  • Flash Attention (alone): ~2 minutes per image (impressive for a single optimization!)
  • Flash Attention + TeaCache: A blazing ~1 minute per image!

That’s right! By combining Flash Attention with TeaCache, you can achieve a 3x to 4x speedup compared to a non-optimized setup. This means faster iterations, more experimentation, and more time for creative exploration.


Building Flash Attention v2.7.4.post1 on Windows (using venv)

If you are still reading, it means that you have made the decision to compile everything from scratch. Kudos to you!

Prerequisites

Please make sure that the following software is correctly installed and configured on your Windows system:

  • Windows OS: Windows 10 or 11 (64-bit).
  • NVIDIA GPU: You need a FlashAttention-2 compatible NVIDIA GPU (Ampere, Ada, or Hopper architecture, e.g., RTX 30xx, RTX 40xx, A100, H100).
  • NVIDIA CUDA Toolkit: Install a version compatible with PyTorch 2.7.0 and your GPU driver. CUDA 12.1 or later is generally recommended for recent PyTorch versions. You can download it from the NVIDIA CUDA Toolkit Archive. Important: During installation, make sure that nvcc (the CUDA compiler) is added to your system’s PATH environment variable. You can check this by opening a command prompt and typing nvcc --version.
  • Microsoft Visual Studio: Visual Studio 2019 or 2022 is recommended. Make sure to install the “Desktop development with C++” workload and include the latest MSVC v143 (or v142) build tools and Windows SDK. You can download the Community edition from the Visual Studio website.
  • Git for Windows: Required for cloning source code repositories. Download it from git-scm.com.
  • Python: Install Python 3.12 (64-bit), which can be downloaded from python.org. Make sure Python and pip are added to your system PATH during installation.
  • Python Packages:  Install ninja and packaging in your global environment (or temporarily for this step):
    • pip install ninja packaging
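
Once everything above is installed, a quick way to confirm the tools are actually reachable from your prompt (version numbers will of course differ per system):

nvcc --version
git --version
python --version
ninja --version

The MSVC compiler (cl.exe) is normally only on the PATH inside the Visual Studio Native Tools Command Prompt, which we will use in the build step below, so don’t worry if it isn’t found from a regular prompt.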

Environment Setup (using venv)

Important: I strongly recommend using a dedicated Python virtual environment throughout the build process to prevent dependency collisions and keep the setup project-specific. This guide uses Python’s standard venv.

  • Create a venv virtual environment: Open a command prompt or PowerShell, navigate to the directory you want to work in (e.g., D:\fa), and create a dedicated virtual environment:
    • python -m venv flash_attn_build_env
  • Activate the virtual environment: In the same command prompt, activate the virtual environment you just created:
    • .\flash_attn_build_env\Scripts\activate
  • Install PyTorch: Install PyTorch 2.7.0 with the CUDA version that matches your installed CUDA Toolkit. Visit the PyTorch Get Started page, select your configuration (Stable, Windows, Pip, Python, CUDA 12.x), copy the generated installation command, and run it. For CUDA 12.8, an example command would be:
    • pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
  • Verify your PyTorch installation:
    • python -c "import torch; print(f'PyTorch version: {torch.__version__}'); print(f'CUDA available: {torch.cuda.is_available()}'); print(f'CUDA version: {torch.version.cuda}')"
      (You should see output confirming PyTorch 2.7.0+cu128, CUDA available: True, and CUDA version: 12.8 (or your corresponding CUDA version).)
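
The install command above pulls whatever the latest stable PyTorch is at the time you run it. If you want to be sure pip installs exactly the 2.7.0 release this guide (and the pre-built wheel) targets, you can pin the version explicitly; this is a hedged example assuming 2.7.0 wheels are still published on the cu128 index:

pip3 install torch==2.7.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128

pip will then resolve torchvision and torchaudio releases that are compatible with torch 2.7.0.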

Download Flash Attention and Build Script

  • Clone the repository: Open a Git Bash or command prompt and run:
    • git clone https://github.com/Dao-AILab/flash-attention.git
    • cd flash-attention
  • Check out the tag: Switch to the desired version. This step is CRUCIAL! If you skip it, your build will likely fail.
    • git checkout v2.7.4.post1
  • Download the WindowsWhlBuilder_cuda.bat script.
    • Download the .bat file here: cuda.bat script
    • Save this .bat file directly in the flash-attention directory you just cloned.
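
Before moving on, you can confirm you are actually on the right tag (and not still on the main branch) with a one-line check:

git describe --tags

It should print v2.7.4.post1. Git will also tell you that you are in a "detached HEAD" state after checking out a tag; that is expected and nothing to worry about.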

Build/Compiling process

  • Open a Native Tools Command Prompt: Search for and open “x64 Native Tools Command Prompt for VS 2022” (or your corresponding Visual Studio version) from the Start menu. Do NOT use a standard Command Prompt or PowerShell window. This special prompt sets up the necessary environment variables for the Visual Studio C++ build tools.
  • Change directory: In the Native Tools Command Prompt, change the directory to the flash-attention directory where you saved the .bat file.
    • cd path\to\your\flash-attention
  • Activate the venv: Activate the virtual environment the same way you did earlier:
    • .\flash_attn_build_env\Scripts\activate
  • Run the build script: Now, finally, run the batch script. This is where the magic (and the long wait!) happens:
    • WindowsWhlBuilder_cuda.bat
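
One optional, hedged tip before you kick off the build: flash-attention’s compilation (via PyTorch’s C++ extension machinery) respects a MAX_JOBS environment variable that caps the number of parallel compiler jobs, and the upstream README recommends lowering it when builds run out of RAM. If the build starts exhausting your system RAM, you can try setting it before launching the script (check first whether WindowsWhlBuilder_cuda.bat already sets its own value; if it does, adjust it there instead):

set MAX_JOBS=4
WindowsWhlBuilder_cuda.bat

Fewer parallel jobs means a longer build, but a much smaller peak memory footprint.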

Installing Flash Attention in ComfyUI

  • Output: If the build is successful, the compiled .whl file will be placed in the newly created subdirectory named dist, inside the flash-attention folder. The exact filename will vary slightly based on your CUDA and Python versions.
  • Installation: Once the wheel has been built (or if you downloaded it from my Patreon!), you can install it using pip (make sure to use your ComfyUI Python). Copy the newly compiled .whl file to your ComfyUI’s root folder, and install it by typing:
python_embeded\python.exe -m pip install flash_attn-2.7.4.post1+cu128torch2.7.0cxx11abiFALSE-cp312-cp312-win_amd64.whl

(Note: the exact name of the .whl file might vary depending on your system; make sure to use the actual filename from your output.)
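
To confirm the installation actually landed in ComfyUI’s embedded Python (paths assume the standard Windows portable build, run from the ComfyUI root folder), a quick check:

python_embeded\python.exe -m pip show flash_attn
python_embeded\python.exe -c "import flash_attn; print(flash_attn.__version__)"

Both should report 2.7.4.post1. If the import fails, double-check that you installed the wheel with the same Python that actually runs ComfyUI.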

Troubleshooting

  • Build errors: Most errors are related to incorrect versions and/or paths to the CUDA Toolkit, Visual Studio C++ tools, or PyTorch. Double-check all prerequisites and your PATH settings.
  • NVCC not found: Make sure the CUDA Toolkit bin directory is properly added to your system PATH.
  • Visual Studio error: Make sure the ‘Desktop development with C++’ workload is completely installed.
  • Check GitHub Issues: Search for similar issues on the official Flash Attention GitHub Issues page.
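
When in doubt, a few generic Windows diagnostics (run from the same prompt you build in) usually reveal what the compiler is actually seeing:

where nvcc
where cl
echo %CUDA_PATH%
python -c "import torch; print(torch.__version__, torch.version.cuda)"

Note that where cl will only succeed inside the Native Tools Command Prompt, and CUDA_PATH should point at the toolkit version you intend to build against.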


Conclusion

Congratulations! If you’ve made it this far, you’ve successfully compiled and installed Flash Attention, unlocking a new level of performance for your ComfyUI workflows. If you chose the shortcut via Patreon, I hope you’re already enjoying the extra time you saved!

Now don’t forget to edit your ComfyUI .bat file to let ComfyUI know that you want to use Flash Attention.

Open your run_nvidia.bat file in a text editor and make sure the --use-flash-attention flag is included in your startup command, for example:

.\python_embeded\python.exe -s ComfyUI\main.py  --windows-standalone-build --use-flash-attention --disable-xformers --output-directory D:\AI_images\Output\
pause


Enjoy the blazing fast generations!

Don’t forget to subscribe to my newsletter for more tips, tricks, and custom ComfyUI nodes!

Support my work and get early access to exclusive tools!
