
Guide: Run Gemma 4 NVFP4 From ComfyUI

The Architectural Shift (Why ComfyUI?)

A week ago I wrote a guide on how to install and run Gemma 4 inside OpenClaw, using llama.cpp. The setup works, but there were a few things I had issues with.

ComfyUI vs. OpenClaw

Llama.cpp isn’t compatible with NVFP4 at this time, or at least it wasn’t a week ago. For someone running a low-budget system like mine, an RTX 5060 Ti (16GB VRAM), that means I need to run GGUF quantization. In my personal experience GGUF is less accurate and more prone to hallucinate than NVFP4.

I will admit that I don’t have a lot of experience with OpenClaw, and some of the issues I ran into could be entirely my fault. For example, Gemma was unable to edit and save files inside the workspace folder, yet kept insisting that it had completed the tasks I asked for. When I proved that wasn’t true, it pivoted to telling me that those tasks weren’t important and that we should focus on some real work. To me, that is unacceptable – I decide what work is real and important, and I need to be able to trust that it gets done without having to double-check.

The ComfyUI Advantage: In ComfyUI, logic is deterministic and visual. If the LLM generates text, that text is passed to a specific “Save File” node. If the save fails, the node turns red and halts the workflow. The LLM handles the reasoning, and the Python nodes handle the execution. It provides a verifiable audit trail for every action.
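To make that concrete, here is a minimal sketch of what such a “Save File” node could look like, following ComfyUI’s custom-node conventions (INPUT_TYPES, RETURN_TYPES, FUNCTION, CATEGORY). The class and field names are my own, not from an existing node pack:

```python
# Hypothetical minimal "Save File" node in the ComfyUI custom-node style.
# If the write fails, the raised exception halts the workflow and the node
# is marked as errored in the UI -- nothing downstream runs on a failed save.
import os

class SaveTextFile:
    @classmethod
    def INPUT_TYPES(cls):
        return {"required": {
            "text": ("STRING", {"multiline": True}),
            "path": ("STRING", {"default": "output/result.txt"}),
        }}

    RETURN_TYPES = ("STRING",)
    FUNCTION = "save"
    CATEGORY = "utils"

    def save(self, text, path):
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        # Read back what was actually written: the audit trail lives on
        # disk, not in the LLM's claims about what it did.
        with open(path, encoding="utf-8") as f:
            return (f.read(),)

NODE_CLASS_MAPPINGS = {"SaveTextFile": SaveTextFile}
```

The point is that the save either verifiably happened, or the graph stops – the model never gets to talk its way past a failure.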

ComfyUI vs. Pure vLLM

vLLM is an incredibly fast inference engine, but on its own, it is just a server waiting for API calls. You are limited to running a local chat-bot that largely lacks the ability to act.

The ComfyUI Advantage: By wrapping vLLM inside a ComfyUI custom node, you create a unified VRAM ecosystem. Not only does ComfyUI already have a huge library of tools that makes Gemma 4 more of an agent than a chat-bot, but creating custom nodes to fill specific needs is quite easy.

As an example you could utilize the workflow chaining system I wrote about some time ago, and have Gemma output a JSON structure that instantly triggers an image or video generation or a WordPress publishing script, all sharing the same memory pool. It turns an “inference engine” into a “visual programming language.”
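A sketch of what that deterministic glue could look like: the LLM emits a JSON action, and plain Python decides which workflow to trigger. The handler names here are made up, standing in for the real image-generation or WordPress steps:

```python
# Hypothetical dispatcher: the LLM's JSON output selects a workflow,
# deterministic Python code executes it. Handler names are illustrative.
import json

def dispatch(llm_output: str) -> str:
    actions = {
        "generate_image": lambda p: f"queued image workflow: {p['prompt']}",
        "publish_post":   lambda p: f"queued WordPress publish: {p['title']}",
    }
    try:
        msg = json.loads(llm_output)
        return actions[msg["action"]](msg["params"])
    except (json.JSONDecodeError, KeyError) as e:
        # Malformed or unknown output halts here instead of the model
        # silently claiming it "completed" the task.
        return f"rejected: {e}"
```

Because the dispatch table is ordinary code, every action the model can trigger is something you explicitly wired up.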

The Blackwell Preparations

Independent research like this is self-funded. If this guide saved you hours of troubleshooting, consider fueling the lab.

Support the Project

While my idea from the start was to incorporate vLLM into my current ComfyUI (Windows Portable), it didn’t take long before I realized that there were too many dependency and compatibility issues. The only reasonable way forward for someone like me, running Windows 11, seemed to be through Windows Subsystem for Linux (WSL).

A Fresh Start – Ubuntu

Before we do anything in Linux, we need to make sure Windows has the doors open and the bridge is up to date for GPU passthrough. To get started with wsl you first need to open PowerShell as administrator, and run the following command:

wsl --install

If you already have wsl with Ubuntu installed, you should make sure it’s up to date by running this command:

wsl --update

To initiate Ubuntu and set up your username and password, simply type wsl inside PowerShell and press enter. After a short initiation you will be asked to create a user (by default it’s the same as your Windows username). Once you have accepted the default username or created your own, you will be asked to set a password. Make sure you use a password that you will remember.

Note! When you type in your password (and when you confirm it), the actual characters will not show inside your terminal window. This is a security feature, not a bug.

Once this is done, you should update Ubuntu using the following command:

sudo apt update && sudo apt upgrade -y

Additionally, you can run this command to remove packages that are no longer needed:

sudo apt autoremove -y


Finally, by default, WSL2 is polite and only takes a portion of your RAM. Because Gemma 4 is a massive model, we need to stop being polite. For a 64GB system, we allocate 48GB. This ensures the Linux kernel has enough overhead to memory-map the 10GB+ weights during initialization without triggering an Out-of-Memory crash, while leaving plenty of room for vLLM’s CPU swap space and ComfyUI’s image generation models.

To give wsl and ComfyUI enough system RAM, simultaneously press the WIN + R keys on your keyboard. In the text-box that shows up, write %USERPROFILE% and hit enter. This will open C:\Users\Your_Name\.

Again, simultaneously press WIN + R and this time type notepad in the text-box and hit enter. This will open an empty notepad, where you should write the following:

[wsl2]
# Adjust based on your system RAM. Leave at least 16GB for Windows!
memory=48GB
processors=8

[experimental]
sparseVhd=true
autoMemoryReclaim=gradual

Save the file as .wslconfig by picking Save As and making sure “Save as type” is set to “All Files”, otherwise it will be saved as a .txt file and won’t work.

To apply the settings, open a new PowerShell as administrator and type: wsl --shutdown

Once wsl has shut down, start it again by typing wsl in your PowerShell and press enter.

Pre-Installations

I’m sure this can be done in more than one way, but the following is the order I decided to install things.

Install and update Python using this command inside your wsl:

sudo add-apt-repository ppa:deadsnakes/ppa -y
sudo apt update
sudo apt install python3.13 python3.13-venv

Install jq using this command:

sudo apt install jq

CUDA 13.x

Install CUDA 13.x (I decided to install CUDA 13.0.3 because it’s both stable and recently updated). You can pick another version, but 13.0.3 is confirmed to work for this purpose. If you decide to use another version, follow the instructions for CUDA installation at Nvidia.

If you are going with the same version as this installation, follow these commands one by one (hit enter after each command, and wait until it finishes before starting the next one).

wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin

sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600

wget https://developer.download.nvidia.com/compute/cuda/13.0.3/local_installers/cuda-repo-wsl-ubuntu-13-0-local_13.0.3-1_amd64.deb

sudo dpkg -i cuda-repo-wsl-ubuntu-13-0-local_13.0.3-1_amd64.deb

sudo cp /var/cuda-repo-wsl-ubuntu-13-0-local/cuda-*-keyring.gpg /usr/share/keyrings/

sudo apt-get update

sudo apt-get -y install cuda-toolkit-13-0

Once installed, open .bashrc by typing the following command inside your wsl:

nano ~/.bashrc

The command will open the nano text editor inside your terminal window.


Use the Down Arrow ⬇️ on your keyboard until you reach the very bottom of the file. Once you reach the bottom, copy these two lines and paste them there:

export PATH=/usr/local/cuda-13.0/bin${PATH:+:${PATH}}

export LD_LIBRARY_PATH=/usr/local/cuda-13.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

Once you have pasted them both at the bottom, save by pressing Ctrl + O. It will then ask you: “File name to write: .bashrc”.

Just press enter, and then use Ctrl + X to exit the text editor.

In your terminal type source ~/.bashrc and press enter.

Finally confirm the installation by typing nvcc --version and press enter.


Pytorch & vLLM

In your wsl terminal, type cd ~ and hit enter, then create a new folder by typing:

mkdir My-Folder && cd My-Folder

You can change “My-Folder” for whatever name you want to use.

Create a virtual environment using the following command:

python3.13 -m venv venv

Install uv
Before we activate our isolated Python bubble, we will install uv, a lightning-fast package manager written in Rust. Because it is a global tool, we run this in the standard Ubuntu terminal.

curl -LsSf https://astral.sh/uv/install.sh | sh

After the installation, use the following command:

source $HOME/.local/bin/env

Next you need to activate your venv, using this command:

source venv/bin/activate

You will now see (venv) in front of your username inside your terminal window. That’s how you know you are currently working inside your virtual environment.

PyTorch
Next we are installing PyTorch. I decided not to install the latest version, to avoid conflicts, but you are free to test another version if you want.

pip install torch==2.10.0 torchvision==0.25.0 torchaudio==2.10.0 --index-url https://download.pytorch.org/whl/cu130

We’ll also install cuda-tile and cupy-cuda13x

pip install cuda-tile cupy-cuda13x

vLLM
We need to install the correct vLLM for cuda 13.

Enter these commands one at a time, and press enter after each one:

export VLLM_VERSION=$(curl -s https://api.github.com/repos/vllm-project/vllm/releases/latest | jq -r .tag_name | sed 's/^v//')

export CUDA_VERSION=130

export CPU_ARCH=$(uname -m)
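For reference, the curl/jq/sed pipeline above boils down to this small Python helper. This is an offline equivalent for illustration: it takes the release JSON as an already-fetched dict instead of calling the GitHub API, and the function names are my own:

```python
# What `jq -r .tag_name | sed 's/^v//'` computes, given the GitHub
# "latest release" JSON payload as a Python dict.
import platform

def latest_version(release: dict) -> str:
    # Strip the leading "v" from a tag like "v0.19.1".
    return release["tag_name"].removeprefix("v")

def wheel_url(version: str, cuda: str = "130") -> str:
    # Mirrors the wheel filename pattern used in the install command.
    arch = platform.machine()  # same as `uname -m`, e.g. "x86_64"
    return (f"https://github.com/vllm-project/vllm/releases/download/"
            f"v{version}/vllm-{version}+cu{cuda}-cp38-abi3-"
            f"manylinux_2_35_{arch}.whl")
```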

Finalize this install by using this command, which assembles the wheel URL from the variables you just set:

uv pip install "https://github.com/vllm-project/vllm/releases/download/v${VLLM_VERSION}/vllm-${VLLM_VERSION}+cu${CUDA_VERSION}-cp38-abi3-manylinux_2_35_${CPU_ARCH}.whl"

Install ComfyUI
Now it’s finally time to install ComfyUI itself. Make sure you are still in your virtual environment by checking that (venv) is written before your username.


Clone the ComfyUI repo by typing:

git clone https://github.com/Comfy-Org/ComfyUI

Move to the ComfyUI folder using cd ComfyUI in your terminal window. The next part is optional, but I did it to make sure that none of the dependencies and libraries I just installed get overwritten.

In Windows Explorer, navigate to your Ubuntu installation and find your ComfyUI folder.


USER is the username you picked when you installed Ubuntu, and My-Folder is the folder you created right before you created your venv.

\\wsl$\Ubuntu\home\USER\My-Folder\ComfyUI

Open the file named requirements.txt and replace all the text in the file with the following:

# UI & Framework Essentials
comfyui-frontend-package==1.42.11
comfyui-workflow-templates==0.9.57
comfyui-embedded-docs==0.4.3
torchsde
einops
sentencepiece
safetensors>=0.4.2
aiohttp>=3.11.8
yarl>=1.18.0
pyyaml
Pillow
scipy
alembic
SQLAlchemy
filelock
av>=14.2.0

# Specialized For Blackwell 
comfy-kitchen[cublas]
comfy-aimdo>=0.2.12

# Logic & Utilities
requests
simpleeval>=1.0.0
blake3

# Graphics & Processing Helpers
kornia>=0.7.1
spandrel
PyOpenGL
glfw

Save the new requirements.txt file, and go back to your terminal window where your venv is activated and type:

uv pip install -r requirements.txt

If you already have a ComfyUI installation from before, you can edit the extra_model_paths.yaml file to point to where you are keeping your models. Exchange the base_path value for the actual path to your model folder.

comfyui:
    base_path: /mnt/c/AI/Comfy/ComfyUI
    checkpoints: models/checkpoints/
    text_encoders: |
        models/text_encoders/
        models/clip/  # legacy location still supported
    clip_vision: models/clip_vision/
    configs: models/configs/
    controlnet: models/controlnet/
    diffusion_models: |
        models/diffusion_models
        models/unet
    embeddings: models/embeddings/
    loras: models/loras/
    upscale_models: models/upscale_models/
    vae: models/vae/
    audio_encoders: models/audio_encoders/
    model_patches: models/model_patches/

Inside your models folder, create a folder named LLM, and inside that folder, create another folder named cosmicproc. Then use this git command to download the Gemma 4 E4B-it NVFP4 model and its helper files.

git clone https://huggingface.co/cosmicproc/gemma-4-E4B-it-NVFP4

I have written a basic node, just as a proof of concept. At this time it doesn’t have any other features than chatting.


  • Top text field: Path to your gemma-4-E4B-it-NVFP4 folder
  • Large text field: Your input prompt
  • Max tokens: Limit the amount of output tokens
  • Temperature: Control randomness and creativity of the output
    • Balanced: 0.4 – 0.7
    • Creative: 0.7 – 1.0
    • Very creative: 1.0 and above
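For orientation, a chat-only node along these lines might look like the sketch below. To be clear, this is illustrative and not the contents of the zip: the vLLM calls (LLM, SamplingParams, generate) are the engine’s standard Python API, while the class name, field names, and default path are assumptions matching the fields listed above:

```python
# Hypothetical chat-only node wrapping vLLM, in the ComfyUI node style.
class GemmaChat:
    @classmethod
    def INPUT_TYPES(cls):
        return {"required": {
            "model_path": ("STRING", {"default": "models/LLM/cosmicproc/gemma-4-E4B-it-NVFP4"}),
            "prompt": ("STRING", {"multiline": True}),
            "max_tokens": ("INT", {"default": 512, "min": 1}),
            "temperature": ("FLOAT", {"default": 0.7, "min": 0.0}),
        }}

    RETURN_TYPES = ("STRING",)
    FUNCTION = "chat"
    CATEGORY = "LLM"

    _llm = None  # cache the engine so repeated runs reuse the loaded weights

    def chat(self, model_path, prompt, max_tokens, temperature):
        # Lazy import: vLLM (and the model weights) only load when the
        # node actually runs, not when ComfyUI scans custom_nodes.
        from vllm import LLM, SamplingParams
        if GemmaChat._llm is None:
            GemmaChat._llm = LLM(model=model_path)
        params = SamplingParams(max_tokens=max_tokens, temperature=temperature)
        outputs = GemmaChat._llm.generate([prompt], params)
        return (outputs[0].outputs[0].text,)
```

Caching the engine on the class matters: reloading 10GB+ of weights on every queue run would make the node unusable.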

You can download the sample node here: GEMMA-4.zip

Unpack the folder inside your /ComfyUI/custom_nodes/


In your terminal window, still inside your virtual environment, you can now start up ComfyUI. I use a launch command that makes sure ComfyUI isn’t using xFormers, and that disables ComfyUI’s own API nodes, since I see no reason to activate them.

In your virtual environment, navigate to your ComfyUI folder (again, replace My-Folder with whatever you named your folder as earlier):

cd ~/My-Folder/ComfyUI

Start ComfyUI using this command:

python main.py --use-pytorch-cross-attention --disable-api-nodes

Once ComfyUI is up and running, open the UI by entering http://127.0.0.1:8188/ in your web browser. You can now find the node named Autonomous Nova (Gemma 4), enter a prompt and click on Run.

Remember to attach a node that shows you the output text!

How long it takes to get an answer depends on your GPU.


All the AI-related work I do, I do in my spare time, and most of it I share with the world completely free of charge. It takes up a lot of my time, on top of the cost of running this website. I’m grateful for every bit of support I can get from users like you.

If you liked this guide and want to get more of these, as well as other useful tips directly in your inbox, you should sign up for my newsletter.

Published in: AI, ComfyUI, English, OpenClaw, Python, Tech