The Architectural Shift (Why ComfyUI?)
A week ago I wrote a guide on how to install and run Gemma 4 inside OpenClaw, using llama.cpp. The setup works, but there were a few things I had issues with.
ComfyUI vs. OpenClaw
Llama.cpp isn’t compatible with NVFP4 at this time, or at least it wasn’t a week ago. For someone running a low-budget system like mine, an RTX 5060 Ti (16GB VRAM), this means I need to run GGUF quantization. In my personal experience GGUF is less accurate and more prone to hallucination than NVFP4.
I will admit that I don’t have a lot of experience with OpenClaw, and some issues I ran into could well be my own fault. For example, Gemma was unable to edit and save files inside the workspace folder, yet kept insisting that it had completed the tasks I asked for. When I proved that wasn’t true, it pivoted to telling me that those tasks weren’t important and that we should focus on some real work. To me, that is unacceptable: I decide what work is real and important, and I need to be able to trust that it gets done without having to double-check.
The ComfyUI Advantage: In ComfyUI, logic is deterministic and visual. If the LLM generates text, that text is passed to a specific “Save File” node. If the save fails, the node turns red and halts the workflow. The LLM handles the reasoning, and the Python nodes handle the execution. It provides a verifiable audit trail for every action.
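To make that concrete, here is a minimal sketch of what such a deterministic “Save File” node could look like. The class name, field names, and category are my own invention for illustration, not taken from an existing node pack; ComfyUI custom nodes are plain Python classes with this general shape.

```python
import os

class SaveTextFile:
    """Sketch of a deterministic save node. If the write fails, the
    raised exception halts the workflow and ComfyUI flags the node."""

    @classmethod
    def INPUT_TYPES(cls):
        return {"required": {
            "text": ("STRING", {"multiline": True}),
            "path": ("STRING", {"default": "output/result.txt"}),
        }}

    RETURN_TYPES = ("STRING",)
    FUNCTION = "save"
    CATEGORY = "utils"

    def save(self, text, path):
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        # Verify instead of trusting: read the file back.
        with open(path, "r", encoding="utf-8") as f:
            assert f.read() == text, "file on disk does not match input"
        return (path,)

# Registration dict ComfyUI scans for in custom_nodes packages.
NODE_CLASS_MAPPINGS = {"SaveTextFile": SaveTextFile}
```

The point is that the write either verifiably happens or the graph stops; there is no room for the model to claim success.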
ComfyUI vs. Pure vLLM
vLLM is an incredibly fast inference engine, but on its own it is just a server waiting for API calls. You are limited to running a local chatbot that largely lacks the ability to act.
The ComfyUI Advantage: By wrapping vLLM inside a ComfyUI custom node, you create a unified VRAM ecosystem. Not only does ComfyUI already have a huge library of tools that makes Gemma 4 more of an agent than a chatbot, but creating custom nodes to fill specific needs is quite easy.
As an example, you could utilize the workflow-chaining system I wrote about some time ago and have Gemma output a JSON structure that instantly triggers an image or video generation, or a WordPress publishing script, all sharing the same memory pool. It turns an “inference engine” into a “visual programming language.”
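A sketch of that JSON-dispatch idea, assuming a made-up schema (the action names and keys below are mine and would need to match whatever you instruct the model to emit):

```python
import json

# Hypothetical output the LLM is prompted to produce.
RAW_OUTPUT = '{"action": "generate_image", "params": {"prompt": "a red fox", "steps": 20}}'

# Deterministic handlers; in ComfyUI these would be downstream nodes.
HANDLERS = {
    "generate_image": lambda p: f"queued image job: {p['prompt']}",
    "publish_post":   lambda p: f"publishing draft: {p.get('title', 'untitled')}",
}

def dispatch(raw):
    """Parse the model's JSON and route it to a deterministic handler."""
    msg = json.loads(raw)              # fails loudly on malformed JSON
    handler = HANDLERS[msg["action"]]  # fails loudly on unknown actions
    return handler(msg["params"])

print(dispatch(RAW_OUTPUT))  # queued image job: a red fox
```

Failing loudly on bad JSON or unknown actions is the whole point: the model proposes, the graph verifies.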
The Blackwell Preparations
Independent research like this is self-funded. If this guide saved you hours of troubleshooting, consider fueling the lab: Support the Project

While my idea from the start was to incorporate vLLM into my current ComfyUI (Windows Portable), it didn’t take long before I realized that there were too many dependency and compatibility issues. The only reasonable way forward for someone like me, running Windows 11, seemed to be through the Windows Subsystem for Linux (WSL).
A Fresh Start – Ubuntu
Before we do anything in Linux, we need to make sure Windows has the doors open and the bridge is up to date for GPU passthrough. To get started with WSL, open PowerShell as administrator and run the following command:
wsl --install
If you already have WSL with Ubuntu installed, make sure it’s up to date by running this command:
wsl --update
To initiate Ubuntu and set up your username and password, simply type wsl inside PowerShell and press Enter. After a short initialization you will be asked to create a user (by default the name matches your Windows user). Once you have accepted the default username or created your own, you will be asked to set a password. Make sure you use a password that you will remember.
Note! When you type in your password (and when you confirm it), the actual characters will not show inside your terminal window. This is a security feature, not a bug.
Once this is done, you should update Ubuntu using the following command:
sudo apt update && sudo apt upgrade -y
Additionally, you can run this command to clean up digital traces that you don’t need anymore:
sudo apt autoremove -y
Finally, by default, WSL2 is polite and only takes a portion of your RAM. Because Gemma 4 is a massive model, we need to stop being polite. For a 64GB system, we allocate 48GB. This ensures the Linux kernel has enough overhead to memory-map the 10GB+ weights during initialization without triggering an Out-of-Memory crash, while leaving plenty of room for vLLM’s CPU swap space and ComfyUI’s image generation models.
To give WSL and ComfyUI enough system RAM, simultaneously press the WIN + R keys on your keyboard. In the text box that shows up, write %USERPROFILE% and hit Enter. This will open C:\Users\Your_Name\, which is where the config file needs to live.
Again, simultaneously press WIN + R and this time type notepad in the text-box and hit enter. This will open an empty notepad, where you should write the following:
[wsl2]
memory=48GB # Adjust based on your system RAM. Leave at least 16GB for Windows!
processors=8
[experimental]
sparseVhd=true
autoMemoryReclaim=gradual
Save the file into the user folder you just opened as .wslconfig using Save As, and make sure “Save as type” is set to “All files”, otherwise it will be saved as a .txt file and won’t work.

To apply the settings, open a new PowerShell as administrator and type: wsl --shutdown
Once wsl has shut down, start it again by typing wsl in your PowerShell and press enter.
Pre-Installations
I’m sure this can be done in more than one way, but the following is the order I decided to install things.
Install Python 3.13 (and its venv module, which we need later for the virtual environment) using these commands inside your WSL terminal:
sudo add-apt-repository ppa:deadsnakes/ppa -y
sudo apt update
sudo apt install -y python3.13 python3.13-venv
Install jq using this command:
sudo apt install jq
CUDA 13.x
Install CUDA 13.x (I decided to install CUDA 13.0.3 because it’s both stable and recently updated). You can pick another version, but 13.0.3 is confirmed to work for this purpose. If you decide to use another version, follow the instructions for CUDA installation at Nvidia.
If you are going with the same version as this installation, run these commands one by one (hit Enter after each command, and wait until it finishes before starting the next one).
wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/13.0.3/local_installers/cuda-repo-wsl-ubuntu-13-0-local_13.0.3-1_amd64.deb
sudo dpkg -i cuda-repo-wsl-ubuntu-13-0-local_13.0.3-1_amd64.deb
sudo cp /var/cuda-repo-wsl-ubuntu-13-0-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda-toolkit-13-0
Once installed, open .bashrc by typing the following command inside your WSL terminal:
nano ~/.bashrc
The command will open the nano text editor inside your terminal window, and it will look something like this.

Use the Down Arrow ⬇️ on your keyboard until you reach the very bottom of the text. Once you reach the bottom, copy these two lines and paste them into your terminal window:
export PATH=/usr/local/cuda-13.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-13.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
Once you have pasted them both at the bottom, save by pressing Ctrl + O. Nano will then ask: “File name to write: .bashrc“
Just press Enter, and then use Ctrl + X to exit the text editor.
In your terminal type source ~/.bashrc and press enter.
Finally confirm the installation by typing nvcc --version and press enter.
PyTorch & vLLM
In your wsl terminal, type cd ~ and hit enter, and then create a new folder typing:
mkdir My-Folder && cd My-Folder
You can change “My-Folder” for whatever name you want to use.
Create a virtual environment using the following command:
python3.13 -m venv venv
Install uv
Before we activate our isolated Python bubble, we will install uv, a lightning-fast package manager written in Rust. Because it is a global tool, we run this in the standard Ubuntu terminal.
curl -LsSf https://astral.sh/uv/install.sh | sh
After the installation, use the following command:
source $HOME/.local/bin/env
Next you need to activate your venv, using this command:
source venv/bin/activate
You will now see (venv) in front of your username inside your terminal window. That’s how you know you are currently working inside your virtual environment.
PyTorch
Next we install PyTorch. I decided not to install the latest version, to avoid conflicts, but you are free to test another version if you want.
pip install torch==2.10.0 torchvision==0.25.0 torchaudio==2.10.0 --index-url https://download.pytorch.org/whl/cu130
We’ll also install cuda-tile and cupy-cuda13x
pip install cuda-tile cupy-cuda13x
vLLM
We need to install the correct vLLM build for CUDA 13.
Enter these commands one at a time, pressing Enter after each one:
export VLLM_VERSION=$(curl -s https://api.github.com/repos/vllm-project/vllm/releases/latest | jq -r .tag_name | sed 's/^v//')
export CUDA_VERSION=130
export CPU_ARCH=$(uname -m)
Finalize the install with this command, which uses the variables you just exported to build the wheel URL (with the values above it resolves to the v0.19.1 cu130 wheel):
uv pip install https://github.com/vllm-project/vllm/releases/download/v${VLLM_VERSION}/vllm-${VLLM_VERSION}+cu${CUDA_VERSION}-cp38-abi3-manylinux_2_35_${CPU_ARCH}.whl
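If you want to see what those three variables resolve to before running the install, this small Python sketch rebuilds the wheel URL the same way (the version numbers are simply the ones used above):

```python
def vllm_wheel_url(version: str, cuda: str, arch: str) -> str:
    """Rebuild the vLLM release-wheel URL from its parts."""
    base = "https://github.com/vllm-project/vllm/releases/download"
    return (f"{base}/v{version}/"
            f"vllm-{version}+cu{cuda}-cp38-abi3-manylinux_2_35_{arch}.whl")

# The values exported in the shell above.
print(vllm_wheel_url("0.19.1", "130", "x86_64"))
```

Printing the URL first is a cheap way to catch a typo in the version or architecture before uv downloads a multi-gigabyte wheel.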
Install ComfyUI
Now it’s finally time to install ComfyUI itself. Make sure you are still in your virtual environment by checking that (venv) appears before your username in the prompt.

Clone the ComfyUI repo by typing:
git clone https://github.com/Comfy-Org/ComfyUI
Move to the ComfyUI folder using cd ComfyUI in your terminal window. The next part is optional, but I did it to make sure that none of the dependencies and libraries I just installed get overwritten.
In windows explorer, navigate to your Ubuntu installation and find your ComfyUI folder.

USER is the username you picked when you installed Ubuntu, and My-Folder is the folder you created right before you created your venv.
\\wsl$\Ubuntu\home\USER\My-Folder\ComfyUI
Open the file named requirements.txt and replace all the text in the file with the following:
# UI & Framework Essentials
comfyui-frontend-package==1.42.11
comfyui-workflow-templates==0.9.57
comfyui-embedded-docs==0.4.3
torchsde
einops
sentencepiece
safetensors>=0.4.2
aiohttp>=3.11.8
yarl>=1.18.0
pyyaml
Pillow
scipy
alembic
SQLAlchemy
filelock
av>=14.2.0
# Specialized For Blackwell
comfy-kitchen[cublas]
comfy-aimdo>=0.2.12
# Logic & Utilities
requests
simpleeval>=1.0.0
blake3
# Graphics & Processing Helpers
kornia>=0.7.1
spandrel
PyOpenGL
glfw
Save the new requirements.txt file, and go back to your terminal window where your venv is activated and type:
uv pip install -r requirements.txt
If you already have a ComfyUI installation from before, you can edit the extra_model_paths.yaml file to point to where you keep your models. Replace the base_path value with the actual path to your model folder.
comfyui:
  base_path: /mnt/c/AI/Comfy/ComfyUI
  checkpoints: models/checkpoints/
  text_encoders: |
    models/text_encoders/
    models/clip/ # legacy location still supported
  clip_vision: models/clip_vision/
  configs: models/configs/
  controlnet: models/controlnet/
  diffusion_models: |
    models/diffusion_models
    models/unet
  embeddings: models/embeddings/
  loras: models/loras/
  upscale_models: models/upscale_models/
  vae: models/vae/
  audio_encoders: models/audio_encoders/
  model_patches: models/model_patches/
Inside your models folder, create a folder named LLM, and inside that folder create another folder named cosmicproc. Inside it, use this git command to download the Gemma 4 E4B it NVFP4 model and helper files:
git clone https://huggingface.co/cosmicproc/gemma-4-E4B-it-NVFP4
I have written a basic node, just as a proof of concept. At this time it has no features other than chatting.

- Top text field: Path to your gemma-4-E4B-it-NVFP4 folder
- Large text field: Your input prompt
- Max tokens: Limit the amount of output tokens
- Temperature: Control randomness and creativity of the output
  - Balanced: 0.4 – 0.7
  - Creative: 0.7 – 1.0
  - Very creative: 1.0 and above
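As a small illustration of how those temperature bands could be applied in code, here is a helper that maps a preset name to a default value (the preset names mirror the list above; the upper cap for “very creative” is my own arbitrary choice, and none of this is part of the node itself):

```python
# Presets mirroring the bands listed above.
TEMPERATURE_PRESETS = {
    "balanced": (0.4, 0.7),
    "creative": (0.7, 1.0),
    "very_creative": (1.0, 1.5),  # upper bound is an arbitrary cap
}

def pick_temperature(preset: str) -> float:
    """Return the midpoint of a preset's range as a sensible default."""
    low, high = TEMPERATURE_PRESETS[preset]
    return round((low + high) / 2, 2)

print(pick_temperature("balanced"))  # 0.55
```

In the node you would type the number into the Temperature widget directly; a mapping like this only becomes useful once you script workflows programmatically.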
You can download the sample node here: GEMMA-4.zip
Unpack the folder inside your /ComfyUI/custom_nodes/

In your terminal window, still inside your virtual environment, you can now start up ComfyUI. I use the flags below to make sure ComfyUI isn’t using xformers, and because I see no reason to activate ComfyUI’s own API nodes.
Navigate to your ComfyUI folder (again, replace My-Folder with whatever you named your folder earlier):
cd ~/My-Folder/ComfyUI
Start ComfyUI using this command:
python main.py --use-pytorch-cross-attention --disable-api-nodes
Once ComfyUI is up and running, open the UI by entering http://127.0.0.1:8188/ in your web browser. You can now find the node named Autonomous Nova (Gemma 4), enter a prompt and click on Run.
Remember to attach a node that shows you the output text!
How long it takes to get an answer depends on your GPU.
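Besides clicking Run in the browser, the running server also accepts queued jobs over HTTP at the /prompt endpoint. Below is a hedged sketch: the endpoint is standard ComfyUI, but the node id, the class_type “AutonomousNova”, and the input names are illustrative guesses; export your own workflow in API format to get the real graph.

```python
import json
import urllib.request

def build_payload(model_path: str, prompt: str, max_tokens: int = 256) -> bytes:
    """Build a /prompt request body. Node id and input names are
    illustrative; export your workflow in API format for the real ones."""
    workflow = {
        "1": {
            "class_type": "AutonomousNova",  # hypothetical node name
            "inputs": {
                "model_path": model_path,
                "prompt": prompt,
                "max_tokens": max_tokens,
            },
        }
    }
    return json.dumps({"prompt": workflow}).encode("utf-8")

def queue_prompt(body: bytes, host: str = "127.0.0.1:8188") -> bytes:
    """POST the workflow to a running ComfyUI server."""
    req = urllib.request.Request(
        f"http://{host}/prompt", data=body,
        headers={"Content-Type": "application/json"})
    return urllib.request.urlopen(req).read()

# Example (requires the server started above to be running):
# queue_prompt(build_payload("/home/USER/My-Folder/ComfyUI/models/LLM/cosmicproc/gemma-4-E4B-it-NVFP4", "Hello!"))
```

This is what makes the “visual programming language” scriptable: the same graph you click together can be queued from any other program.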

All the AI-related work I do, I do in my spare time, and most of it I share with the world completely free of charge. It takes up a lot of my time, and running this website costs money. I’m grateful for every bit of support I can get from users like you.
If you liked this guide and want to get more of these, as well as other useful tips, directly in your inbox, you should sign up for my newsletter.
