
Guide: Run Gemma 4 NVFP4 From ComfyUI


NOTE: The text has been updated.

Even though the system initially worked well (if a bit slow on my hardware), I ran into issues once I added options for image, video and sound inputs.

Because of this, I have updated the way we connect Gemma 4 to ComfyUI. With the updated method we run the vLLM server and the ComfyUI server separately, and have ComfyUI send requests to Gemma 4 through the vLLM server's local API.

I have updated the guide to reflect this method, but will leave the old text in place for anyone who might be interested in evolving that approach.

Jump to the update


The Architectural Shift (Why ComfyUI?)

A week ago I wrote a guide on how to install and run Gemma 4 inside OpenClaw, using llama.cpp. The setup works, but there were a few things I had issues with.

ComfyUI vs. OpenClaw

Llama.cpp isn’t compatible with NVFP4 at this time, or at least it wasn’t a week ago. For someone running a low-budget system like mine, an RTX 5060 Ti (16GB VRAM), that means running a GGUF quantization instead. In my personal experience GGUF is less accurate and more prone to hallucinate than NVFP4.

I will admit that I don’t have a lot of experience with OpenClaw, and some issues I ran into could well be my own fault. For example, Gemma was unable to edit and save files inside the workspace folder, yet kept insisting that it had completed the tasks I assigned. When I proved that wasn’t true, it pivoted to telling me that those tasks weren’t important and that we should focus on some real work. To me, that is unacceptable – I decide what work is real and important, and I need to be able to trust that it gets done without having to double-check.

The ComfyUI Advantage: In ComfyUI, logic is deterministic and visual. If the LLM generates text, that text is passed to a specific “Save File” node. If the save fails, the node turns red and halts the workflow. The LLM handles the reasoning, and the Python nodes handle the execution. It provides a verifiable audit trail for every action.
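To illustrate the idea, here is a minimal sketch of what such a deterministic save node could look like, following ComfyUI's usual custom-node conventions (INPUT_TYPES, RETURN_TYPES, NODE_CLASS_MAPPINGS). The class and field names are illustrative, not the actual node from this guide:

```python
# Hypothetical sketch of a deterministic "Save File" custom node.
import os

class SaveTextFile:
    @classmethod
    def INPUT_TYPES(cls):
        return {"required": {
            "text": ("STRING", {"multiline": True}),
            "path": ("STRING", {"default": "output/result.txt"}),
        }}

    RETURN_TYPES = ("STRING",)
    FUNCTION = "save"
    CATEGORY = "utils"

    def save(self, text, path):
        # If anything here raises, ComfyUI marks the node red and halts the
        # workflow, so a "file saved" claim can never silently be false.
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        return (path,)

NODE_CLASS_MAPPINGS = {"SaveTextFile": SaveTextFile}
```

The point is that the side effect lives in plain Python, not in the model's output, so it either happens verifiably or fails loudly.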

ComfyUI vs. Pure vLLM

vLLM is an incredibly fast inference engine, but on its own, it is just a server waiting for API calls. You are limited to running a local chat-bot that largely lacks the ability to act.

The ComfyUI Advantage: By wrapping vLLM inside a ComfyUI custom node, you create a unified VRAM ecosystem. Not only does ComfyUI already have a huge library of tools that make Gemma 4 more of an agent than a chat-bot, but creating custom nodes to fill specific needs is quite easy.

As an example, you could use the workflow chaining system I wrote about some time ago and have Gemma output a JSON structure that instantly triggers an image or video generation, or a WordPress publishing script, all sharing the same memory pool. It turns an “inference engine” into a “visual programming language.”
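As a sketch of that chaining idea, the LLM emits a JSON "action" and a small dispatcher routes it to the matching workflow. The action names and handlers below are hypothetical placeholders, not a real workflow:

```python
# Hypothetical dispatcher: route an LLM's JSON output to a workflow handler.
import json

def dispatch(raw_output: str) -> str:
    action = json.loads(raw_output)  # fails loudly if the JSON is malformed
    handlers = {
        # In a real setup these would queue ComfyUI sub-workflows or scripts.
        "generate_image": lambda a: f"queued image: {a['prompt']}",
        "publish_post":   lambda a: f"published: {a['title']}",
    }
    return handlers[action["action"]](action)
```

Because the routing is plain Python, an unknown action or broken JSON raises an error instead of being quietly ignored.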

The Blackwell Preparations

Independent research like this is self-funded. If this guide saved you hours of troubleshooting, consider fueling the lab.

Support the Project

While my idea from the start was to incorporate vLLM into my current ComfyUI (Windows Portable), it didn’t take long before I realized that there were too many dependency and compatibility issues. The only reasonable way forward for someone like me, running Windows 11, seemed to be through the Windows Subsystem for Linux (WSL).

A Fresh Start – Ubuntu

Before we do anything in Linux, we need to make sure Windows has the doors open and the bridge is up to date for GPU passthrough. To get started with WSL, first open PowerShell as administrator and run the following command:

wsl --install

If you already have WSL with Ubuntu installed, make sure it’s up to date by running this command:

wsl --update

To initiate Ubuntu and set up your username and password, simply type wsl inside PowerShell and press enter. After a short initialization you will be asked to create a user (the default suggestion is the same as your Windows username). Once you have accepted the default username or created your own, you will be asked to set a password. Make sure you use a password that you will remember.

Note! When you type in your password (and when you confirm it), the actual characters will not show inside your terminal window. This is a security feature, not a bug.

Once this is done, you should update Ubuntu using the following command:

sudo apt update && sudo apt upgrade -y

Additionally, you can run this command to clean up digital traces that you don’t need anymore:

sudo apt autoremove -y


Finally, by default, WSL2 is polite and only takes a portion of your RAM. Because Gemma 4 is a massive model, we need to stop being polite. For a 64GB system, we allocate 48GB. This ensures the Linux kernel has enough overhead to memory-map the 10GB+ weights during initialization without triggering an Out-of-Memory crash, while leaving plenty of room for vLLM’s CPU swap space and ComfyUI’s image generation models.

To give WSL and ComfyUI enough system RAM, simultaneously press the WIN + R keys on your keyboard. In the text box that shows up, type %USERPROFILE% and hit enter. This will open C:\Users\Your_Name\.

Again, press WIN + R, and this time type notepad in the text box and hit enter. This will open an empty Notepad window, where you should write the following:

[wsl2]
memory=48GB # Adjust based on your system RAM. Leave at least 16GB for Windows!
processors=8

[experimental]
sparseVhd=true
autoMemoryReclaim=gradual

Save the file as .wslconfig by picking Save As, and make sure “Save as type” is set to “All Files”; otherwise it will be saved as a .txt file and won’t work.

To apply the settings, open a new PowerShell as administrator and type: wsl --shutdown

Once wsl has shut down, start it again by typing wsl in your PowerShell and press enter.

Pre-Installations

I’m sure this can be done in more than one way, but the following is the order I decided to install things.

Install and update Python using these commands inside your WSL (python3.13-venv is included so the virtual environment step later works):

sudo add-apt-repository ppa:deadsnakes/ppa -y
sudo apt update
sudo apt install python3.13 python3.13-venv

Install jq using this command:

sudo apt install jq

CUDA 13.x

Install CUDA 13.x (I decided to install CUDA 13.0.3 because it’s both stable and recently updated). You can pick another version, but 13.0.3 is confirmed to work for this purpose. If you decide to use another version, follow the instructions for CUDA installation at Nvidia.

If you are going with the same version as this installation, run the following commands one by one (hit enter after each command, and wait until it finishes before starting the next one).

wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin

sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600

wget https://developer.download.nvidia.com/compute/cuda/13.0.3/local_installers/cuda-repo-wsl-ubuntu-13-0-local_13.0.3-1_amd64.deb

sudo dpkg -i cuda-repo-wsl-ubuntu-13-0-local_13.0.3-1_amd64.deb

sudo cp /var/cuda-repo-wsl-ubuntu-13-0-local/cuda-*-keyring.gpg /usr/share/keyrings/

sudo apt-get update

sudo apt-get -y install cuda-toolkit-13-0

Once installed, open .bashrc by typing the following command inside your WSL:

nano ~/.bashrc

The command will open the nano text editor inside your terminal window, and it will look something like this:

[Screenshot: the nano editor showing ~/.bashrc]

Use the Down Arrow ⬇️ on your keyboard until you reach the very bottom of the text. Once you reach the bottom, copy these two lines and paste them into the editor:

export PATH=/usr/local/cuda-13.0/bin${PATH:+:${PATH}}

export LD_LIBRARY_PATH=/usr/local/cuda-13.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

Once you have pasted them both at the bottom, save by pressing Ctrl + O. Nano will then ask: “File name to write: .bashrc”.

Just press enter, and then use Ctrl + X to exit the text editor.

In your terminal type source ~/.bashrc and press enter.

Finally confirm the installation by typing nvcc --version and press enter.


Pytorch & vLLM

In your WSL terminal, type cd ~ and hit enter, then create a new folder by typing:

mkdir My-Folder && cd My-Folder

You can change “My-Folder” for whatever name you want to use.

Create a virtual environment using the following command:

python3.13 -m venv venv

Install uv
Before we activate our isolated Python bubble, we will install uv, a lightning-fast package manager written in Rust. Because it is a global tool, we install it in the standard Ubuntu terminal.

curl -LsSf https://astral.sh/uv/install.sh | sh

After the installation, use the following command:

source $HOME/.local/bin/env

Next you need to activate your venv, using this command:

source venv/bin/activate

You will now see (venv) in front of your username inside your terminal window. That’s how you know you are currently working inside your virtual environment.

PyTorch
Next we are installing PyTorch. I decided to not install the latest, to avoid conflicts, but you are free to test another version if you want.

pip install torch==2.10.0 torchvision==0.25.0 torchaudio==2.10.0 --index-url https://download.pytorch.org/whl/cu130

We’ll also install cuda-tile and cupy-cuda13x

pip install cuda-tile cupy-cuda13x

vLLM
We need to install the correct vLLM build for CUDA 13.

uv pip install https://github.com/vllm-project/vllm/releases/download/v0.19.1/vllm-0.19.1+cu130-cp38-abi3-manylinux_2_35_x86_64.whl

Install ComfyUI
Now it’s finally time to install ComfyUI itself. Make sure you are still in your virtual environment by checking that (venv) appears before your username in the prompt.

Clone the ComfyUI repo by typing:

git clone https://github.com/Comfy-Org/ComfyUI

Move to the ComfyUI folder using cd ComfyUI in your terminal window. The next part is optional, but I did it to make sure that none of the dependencies and libraries I just installed get overwritten.

In Windows Explorer, navigate to your Ubuntu installation and find your ComfyUI folder.

[Screenshot: Windows Explorer showing the Ubuntu (WSL) file system]

USER is the username you picked when you installed Ubuntu, and My-Folder is the folder you created right before you created your venv.

\\wsl$\Ubuntu\home\USER\My-Folder\ComfyUI

Open the file named requirements.txt and replace all the text in the file with the following:

# UI & Framework Essentials
comfyui-frontend-package==1.42.11
comfyui-workflow-templates==0.9.57
comfyui-embedded-docs==0.4.3
torchsde
einops
sentencepiece
safetensors>=0.4.2
aiohttp>=3.11.8
yarl>=1.18.0
pyyaml
Pillow
scipy
alembic
SQLAlchemy
filelock
av>=14.2.0

# Specialized For Blackwell 
comfy-kitchen[cublas]
comfy-aimdo>=0.2.12

# Logic & Utilities
requests
simpleeval>=1.0.0
blake3

# Graphics & Processing Helpers
kornia>=0.7.1
spandrel
PyOpenGL
glfw

Save the new requirements.txt file, and go back to your terminal window where your venv is activated and type:

uv pip install -r requirements.txt

If you already have a ComfyUI installation from before, you can edit the extra_model_paths.yaml file to point to where you keep your models. Replace the base_path value with the actual path to your model folder.

comfyui:
    base_path: /mnt/c/AI/Comfy/ComfyUI
    checkpoints: models/checkpoints/
    text_encoders: |
        models/text_encoders/
        models/clip/  # legacy location still supported
    clip_vision: models/clip_vision/
    configs: models/configs/
    controlnet: models/controlnet/
    diffusion_models: |
        models/diffusion_models
        models/unet
    embeddings: models/embeddings/
    loras: models/loras/
    upscale_models: models/upscale_models/
    vae: models/vae/
    audio_encoders: models/audio_encoders/
    model_patches: models/model_patches/

Inside your models folder, create a folder named LLM, and inside that folder, create another folder named cosmicproc. From inside the cosmicproc folder, use this git command to download the Gemma 4 E4B-it NVFP4 model and its helper files:

git clone https://huggingface.co/cosmicproc/gemma-4-E4B-it-NVFP4

You can download the sample node here: GEMMA-4-new.zip

Unpack the archive into your /ComfyUI/custom_nodes/ folder.

[Screenshot: the Gemma custom node folder inside custom_nodes]

UPDATED GUIDE

Start the vLLM server by pasting the complete block below in your terminal window, after you have activated your virtual environment.

python -m vllm.entrypoints.openai.api_server \
  --model /mnt/c/AI/Comfy/ComfyUI/models/LLM/cosmicproc/gemma-4-E4B-it-NVFP4 \
  --quantization nvfp4 \
  --gpu-memory-utilization 0.75 \
  --max-model-len 8192 \
  --trust-remote-code \
  --enforce-eager \
  --allowed-local-media-path /home/PATH/TO/ComfyUI/input

NOTE: the path to the model is hardcoded in the node, which means you will have to either create that exact directory or change the path both in this start-up command and in the node’s code.
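With the server running, ComfyUI talks to it over vLLM's OpenAI-compatible /v1/chat/completions endpoint. Here is a minimal sketch of such a request, assuming vLLM's default port 8000; the helper names are my own for illustration, not the actual node code:

```python
# Hypothetical client for a locally running vLLM OpenAI-compatible server.
import json
import urllib.request

def build_request(prompt: str, model: str,
                  max_tokens: int = 512, temperature: float = 0.7) -> dict:
    # Standard OpenAI-style chat-completions payload.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

def ask_gemma(prompt: str, model: str,
              url: str = "http://127.0.0.1:8000/v1/chat/completions") -> str:
    req = urllib.request.Request(
        url,
        data=json.dumps(build_request(prompt, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

The model field should match the path you passed to --model when starting the server.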

For the latest updated nodes, visit my Github repo: ComfyUI-vLLM-MultiModal-Agent

Delete the current GEMMA-4 folder (if you got the old version) from your custom_nodes folder, and then extract the new GEMMA-4 to the custom_nodes folder.

In a second terminal window, again after activating your virtual environment, start ComfyUI using:

cd ~/My-Folder/ComfyUI

Followed by:

python main.py --use-pytorch-cross-attention --disable-api-nodes 

You can now add both of the new nodes to your workflow.

[Screenshot: the Gemma 4 nodes in ComfyUI]

END OF UPDATED GUIDE


The old guide continues below

I have written a basic node, just as a proof of concept. At this time it doesn’t have any other features than chatting.

[Screenshot: the Gemma 4 ComfyUI node]

  • Top text field: Path to your gemma-4-E4B-it-NVFP4 folder
  • Large text field: Your input prompt
  • Max tokens: Limit the number of output tokens
  • Temperature: Control randomness and creativity of the output
    • Balanced: 0.4 – 0.7
    • Creative: 0.7 – 1.0
    • Very creative: 1.0 and above

In your terminal window, still inside your virtual environment, you can now start up ComfyUI. I use the command below to make sure ComfyUI isn’t using xformers, and because I see no reason to activate ComfyUI’s own API nodes.

In your virtual environment, navigate to your ComfyUI folder (again, replace My-Folder with whatever you named your folder as earlier):

cd ~/My-Folder/ComfyUI

Start ComfyUI using this command:

python main.py --use-pytorch-cross-attention --disable-api-nodes

Once ComfyUI is up and running, open the UI by entering http://127.0.0.1:8188/ in your web browser. You can now find the node named Autonomous Nova (Gemma 4), enter a prompt and click on Run.

Remember to attach a node that shows you the output text!

How long it takes to get an answer depends on your GPU.

[Screenshot: a first conversation with Gemma 4]

All the AI-related work I do, I do in my spare time, and most of it I share with the world completely free of charge. It takes up a lot of my time, and running this website costs money. I’m grateful for every bit of support I can get from users like you.

If you liked this guide and want to get more of these, as well as other useful tips, directly in your inbox, you should sign up for my newsletter.

Published in: AI, ComfyUI, English, Local Agentic AI, OpenClaw, Python, Tech