
Guide: Install Triton, SageAttention and TeaCache on Windows

Installing Triton, SageAttention and TeaCache on your system can substantially speed up the generation of both images and videos, often cutting generation time in half or better. However, the installation process is not always clear, especially on computers running Windows.

This guide will take you through the process step by step.

Important Note: This process can be sensitive to your specific hardware (GPU) and CUDA version. Troubleshooting may be required.

Prerequisites:

A working ComfyUI installation on Windows.

Knowing which Python environment your ComfyUI uses (often within the ComfyUI folder).

You can usually find your python_embeded folder in /ComfyUI/python_embeded/ (the same place you will find the .bat file you use to start ComfyUI).
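To confirm you have found the right Python, run it directly and check that it prints a version number (replace the path below with your own):

C:\path\to\python_embeded\python.exe --version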

Triton

Triton is a language and compiler for parallel programming. It aims to provide a Python-based programming environment for productively writing custom DNN compute kernels capable of running at maximal throughput on modern GPU hardware.

Determine Correct Triton Version (Crucial Step):

Installing the wrong Triton version is a common error. The correct version often depends on your CUDA version.

Check the official Triton installation instructions or GitHub page for the mapping between CUDA versions and recommended Triton wheels (the .whl files).

Look for wheels tagged with your CUDA version (e.g., cu118 for CUDA 11.8, cu121 for CUDA 12.1).
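If you are unsure which CUDA build of PyTorch you are running, the quickest way to check is to ask PyTorch itself. A minimal snippet (it assumes PyTorch is already installed in your embedded environment):

import torch

print(torch.__version__)              # e.g. 2.7.0+cu128 means a CUDA 12.8 build
print(torch.version.cuda)             # CUDA version PyTorch was compiled against
print(torch.cuda.get_device_name(0))  # your GPU model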

PyTorch

Triton 3.3 works with PyTorch >= 2.7

Install using:

C:\path\to\python_embeded\python.exe -m pip install -U "triton-windows<3.4"

Triton 3.2 works with PyTorch >= 2.6

Install using:

C:\path\to\python_embeded\python.exe -m pip install -U "triton-windows<3.3"

To check which Torch version you have, use the command:

C:\path\to\python_embeded\python.exe -m pip list
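If the package list is long, you can filter it down to the relevant entries with Windows' built-in findstr:

C:\path\to\python_embeded\python.exe -m pip list | findstr "torch triton"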

If you have an older version of Triton installed, you need to uninstall that version first.

C:\path\to\python_embeded\python.exe -m pip uninstall triton

You can now re-install Triton.

You will also have to download a zip file containing two folders named include and libs, and place them inside your python_embeded folder. Download the zip file here: include_libs.zip
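After unzipping, your python_embeded folder should contain the two new folders next to python.exe, roughly like this (other files omitted):

python_embeded\
├── include\
├── libs\
└── python.exe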

After you have successfully installed Triton, you can test whether the installation is correct and working.

Copy the code below:

import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, output_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one block of BLOCK_SIZE elements.
    pid = tl.program_id(axis=0)
    block_start = pid * BLOCK_SIZE
    offsets = block_start + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard against reading past the end of the tensors
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    output = x + y
    tl.store(output_ptr + offsets, output, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor):
    output = torch.empty_like(x)
    n_elements = output.numel()
    # Launch enough program instances to cover every element.
    grid = lambda meta: (triton.cdiv(n_elements, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, output, n_elements, BLOCK_SIZE=1024)
    return output

a = torch.rand(3, device="cuda")
b = a + a                # reference result computed by PyTorch
b_compiled = add(a, a)   # same result computed by the Triton kernel
print(b_compiled - b)    # should be all zeros
print("If you see tensor([0., 0., 0.], device='cuda:0'), then it works")

Paste the code into any text editor such as Notepad (I prefer to use Atom) and save the file as, for example, test.py in your ComfyUI folder (the same one where run_nvidia_gpu.bat is located). Open a command prompt (cmd) in that folder (easiest done by right-clicking inside the folder and picking “Open in command prompt”). Type the following in the command prompt:

python_embeded\python.exe test.py

This will run the test script, and if you see the following message, Triton is installed correct and is working:

“If you see tensor([0., 0., 0.], device='cuda:0'), then it works”

SageAttention

SageAttention is a quantization technique for transformer attention that leverages INT8 quantization and matrix smoothing to achieve substantial speedups on GPUs. Extensive evaluations show 2.1x–2.7x acceleration over FlashAttention2 and xformers without sacrificing accuracy across diverse models and tasks, including language, image, and video generation. SageAttention also outperforms FlashAttention3 in accuracy on certain tasks, and it runs on a broader range of hardware.

SageAttention is compatible with Nvidia’s Ampere, Ada and Hopper GPUs.

Requirements:

python>=3.9
torch>=2.3.0
triton>=2.3.0
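You can check all three requirements in one go from the embedded Python (the path is a placeholder; adjust it to your setup):

C:\path\to\python_embeded\python.exe -c "import sys, torch, triton; print(sys.version); print(torch.__version__, triton.__version__)"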

Once Triton is installed, installing SageAttention is pretty straightforward. Open the command prompt in your ComfyUI folder (the same folder your python_embeded folder is in) and type the following command:

python_embeded\python.exe -m pip install sageattention==1.0.6

This should successfully install SageAttention in your Python environment.
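A quick way to verify the install is to import the package from the embedded Python; if the import succeeds without an error, the module is in place:

python_embeded\python.exe -c "import sageattention; print('SageAttention OK')"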

A good node to use in your workflow to make sure SageAttention is activated is the Patch Sage Attention KJ node, which is available here: ComfyUI-KJNodes
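If you want to sanity-check SageAttention outside ComfyUI, the package exposes sageattn as a drop-in replacement for scaled dot-product attention. A minimal sketch, assuming the v1 API where tensors are laid out as (batch, heads, sequence, head_dim) with a head dimension of 64 or 128:

import torch
from sageattention import sageattn

# Random half-precision attention inputs on the GPU.
q = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

out = sageattn(q, k, v, is_causal=False)  # quantized attention forward pass
print(out.shape)  # torch.Size([1, 8, 1024, 64])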

TeaCache

Timestep Embedding Aware Cache (TeaCache) is a training-free caching approach that estimates and leverages the fluctuating differences among model outputs across timesteps. Rather than directly using the time-consuming model outputs, TeaCache focuses on model inputs, which have a strong correlation with the model outputs while incurring negligible computational cost. TeaCache first modulates the noisy inputs using the timestep embeddings so that their differences better approximate those of the model outputs. It then introduces a rescaling strategy to refine the estimated differences and uses them to decide when to serve cached output. Experiments show that TeaCache achieves up to 4.41x acceleration.
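To make the mechanism concrete, here is an illustrative Python sketch of timestep-aware caching. This is not the actual TeaCache code: the modulation and distance functions are simplified stand-ins for the operations described in the paper.

import torch

def modulate(x, t_embed):
    # Hypothetical stand-in for TeaCache's timestep-embedding modulation.
    return x * (1.0 + t_embed)

def rel_l1_change(a, b):
    # Relative L1 distance between consecutive modulated inputs.
    return ((a - b).abs().mean() / b.abs().mean()).item()

class TimestepCache:
    """Reuse the last model output while the accumulated input change stays small."""

    def __init__(self, threshold=0.1):
        self.threshold = threshold
        self.accumulated = 0.0
        self.prev = None
        self.cached = None

    def step(self, model, x, t_embed):
        m = modulate(x, t_embed)
        if self.prev is not None:
            # The real TeaCache also rescales this estimate with a fitted polynomial.
            self.accumulated += rel_l1_change(m, self.prev)
        self.prev = m
        if self.cached is not None and self.accumulated < self.threshold:
            return self.cached           # cache hit: skip the expensive model call
        self.accumulated = 0.0
        self.cached = model(x, t_embed)  # cache miss: run the full model
        return self.cached

# Toy usage: a "model" whose output changes slowly gets served from cache often.
cache = TimestepCache(threshold=0.05)
model = lambda x, t: x * 0.99 + t
x = torch.rand(4)
for t in torch.linspace(1.0, 0.0, steps=10):
    x = cache.step(model, x, t)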

When you are ready to install TeaCache and its nodes, navigate to your custom_nodes folder (usually in /comfyui/custom_nodes/) and open your command prompt there. Type the following:

git clone https://github.com/welltop-cn/ComfyUI-TeaCache.git
cd ComfyUI-TeaCache

The requirements list for TeaCache is quite long, but you most likely already have most of the libraries in it installed alongside other nodes.

C:\path\to\python_embeded\python.exe -m pip install -r requirements.txt

Then simply start/restart your ComfyUI and search for TeaCache.

Read the full research paper here: TeaCache Research Paper

To use the node, just connect it directly after your “Load Diffusion Model”, “Load Checkpoint” or similar.

