Two weeks ago I wrote a guide on how to install Gemma 4 locally and run it through OpenClaw. Although Gemma 4 definitely works in OpenClaw, I felt that the system was a bit rigid and the UI far from intuitive. The fact that it didn’t support NVFP4, at least at the time, became a deal breaker.
This made me think of possible alternatives to OpenClaw, and since I’ve spent a lot of time with ComfyUI, that was the natural choice.
In this guide we will set up a type of agentic AI that runs on two different servers inside Windows Subsystem for Linux (WSL), and that is operated from, and with, ComfyUI.
To avoid repeating too much of my previous guide and making this post unnecessarily long, you will need to follow the steps in “Run Gemma 4 NVFP4 From ComfyUI” to set up WSL, the virtual environment, vLLM and ComfyUI before continuing with the steps in this guide.
Installation Guide
Once you are done following my previous guide and have vLLM and ComfyUI running inside the same virtual environment inside your Windows Subsystem for Linux, the next step is to install a third server.
Ollama
Note! Ollama should be installed inside your WSL, but not inside your venv.
Open your PowerShell as administrator and type in wsl followed by ENTER. Before you can install Ollama you will need to install zstd, which handles the decompression of the Ollama installation package. Enter the following command in your WSL terminal:
sudo apt install zstd
When prompted, enter your password. Remember that characters and numbers will not show in the terminal as you type it. Next, get Ollama by typing:
curl -fsSL https://ollama.com/install.sh | sh
You now have Ollama installed alongside vLLM and ComfyUI.
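Before moving on, you can confirm the install succeeded. A minimal check (the fallback message is mine, not Ollama output):

```shell
# Verify the install: prints the Ollama version string if it is on PATH,
# otherwise a hint that something went wrong during installation.
command -v ollama >/dev/null 2>&1 && ollama --version || echo "ollama not found on PATH"
```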
Pick your model
The model we are going to run in Ollama is meant to work as an intelligent chatbot that you can brainstorm ideas with, but also as a director that sends instructions to your secondary model. Taking these things into account, as well as the fact that Ollama currently doesn’t support NVFP4, these are my top three suggestions for models to use.
| Model | Parameters | Context | Purpose |
|---|---|---|---|
| Phi-4-Mini Instruct | 3.8B | 128,000 | Highly optimized for complex logical reasoning; trained heavily on synthetic data. |
| Qwen3 4B | 4B | 128,000 | General-purpose conversational fluidity and native vision encoding. |
| Gemma 3 4B | 4B | 128,000 | Advanced multimodal architecture and robust multilingual support. |
I have intentionally picked small, fast models to make the brainstorming part as smooth as possible. Since Ollama lacks NVFP4 integration, my choice for this guide is Phi-4-Mini Instruct GGUF Q5_K_M. If your GPU has more VRAM than my RTX 5060 Ti (16GB), you can of course pick a larger model. This guide assumes you use the same model as I do; if you don’t, replace Phi-4 with your model’s name in the commands below.
Download the model you picked, and place it in /home/user/Nova-Lab/models
In your WSL terminal, go to the models folder: cd ~/Nova-Lab/models
Create a model file with nano: nano Modelfile
Paste the following into the nano window:
FROM ./Phi-4-mini-instruct-Q5_K_M.gguf
PARAMETER num_ctx 32768
PARAMETER temperature 1.0
PARAMETER stop "<|endoftext|>"
PARAMETER stop "<|im_end|>"
PARAMETER stop "<|end|>"
PARAMETER stop "## Conversation"
TEMPLATE """{{ if .System }}<|system|>
{{ .System }}<|end|>
{{ end }}{{ if .Prompt }}<|user|>
{{ .Prompt }}<|end|>
{{ end }}<|assistant|>
{{ .Response }}<|end|>"""
The first three lines can be changed if you want or need to. The first line is the model and must exactly match the name of the model file inside your models folder. The second line is the context window; I’ve set it to 32k as a default, but you can change it according to your needs and your model’s capacity. The third line is the default temperature, which can later be changed in the node.
When you’ve made sure the model name matches and you have the context length you want, press Ctrl+O to save the file, then press Ctrl+X to exit nano.
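A typo in the FROM line is the most common reason `ollama create` fails. This small helper (the function name is mine, not an Ollama command) checks that the filename in the Modelfile’s FROM line actually exists in your models folder:

```shell
# check_modelfile: verify that the Modelfile's FROM path points at a real file.
# (Hypothetical helper, not part of Ollama.)
check_modelfile() {
  dir="${1:-.}"
  # Extract the path after FROM and strip a leading "./"
  gguf=$(awk '/^FROM/ {print $2}' "$dir/Modelfile" | sed 's|^\./||')
  if [ -f "$dir/$gguf" ]; then
    echo "OK: $gguf found"
  else
    echo "Missing: $gguf" >&2
    return 1
  fi
}

# Example, run from anywhere:
# check_modelfile ~/Nova-Lab/models
```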
In your terminal window (still in WSL) type: ollama create phi4:q5 -f Modelfile and hit ENTER. You should now be able to run and chat with the model in a terminal window using the command: ollama run phi4:q5
Final configurations
In your Nova-Lab folder (or whatever you named it), create two folders named prompts and workflows. Inside the prompts folder you can create and save custom system prompts. Below is an example of a system prompt you can use for your chat model; for the executive model, you should create a system prompt that fits what you need that model to do.
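Both folders can be created in one command. This assumes the ~/Nova-Lab layout from the earlier guide; override NOVA_LAB if you named the base folder differently:

```shell
# Create the prompts and workflows folders under the Nova-Lab base folder
BASE="${NOVA_LAB:-$HOME/Nova-Lab}"
mkdir -p "$BASE/prompts" "$BASE/workflows"
ls "$BASE"
```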
**Role & Identity**
You are an Expert Technical Assistant and System Architect. Your primary goal is to provide high-level technical guidance, code optimization, and logical problem-solving. You are not just a chatbot; you are a proactive partner in engineering efficient solutions.
**Communication Style**
* **Clarity First**: Your tone is professional, direct, and analytical. Avoid unnecessary fluff or overly polite filler.
* **Forensic Thinking**: Look at the underlying "bones" of a problem before suggesting a fix.
* **Structured Output**: Use Markdown headings, code blocks, and lists to make information easy to digest at a glance.
**Intellect & Integrity**
* **Be Proactive**: If a user suggests a suboptimal path (e.g., a "Toaster" approach when an agentic system is possible), diplomatically guide them toward a more sovereign architecture.
* **Prioritize Ground Truth**: Base your reasoning on technical documentation and established coding principles. If you are unsure of a specific library version, state your limitations clearly.
* **Efficiency**: Suggest the most VRAM-efficient methods for running models on consumer hardware, such as the RTX 5060 Ti.
**Operating Protocol**
* **Analyze**: Deconstruct the user's request to identify the core intent.
* **Optimize**: Propose ways to streamline the requested code or workflow.
* **Execute/Instruct**: Provide the exact steps or code needed to achieve the goal.
* **Validate**: Remind the user of any necessary dependencies or environmental settings (like venv or WSL paths).
Save the file as system_prompt.txt or similar.
Independent research like this is self-funded. If this guide saved you hours of troubleshooting, consider fueling the lab.
The workflows folder is where you need to put the executive workflows that your executive model (Gemma 4) should use. This is also where Phi-4 will save the instructions that Gemma 4 will load, as well as where Gemma 4 will save files.
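To make the hand-off concrete, here is a hypothetical sketch of what lands in the workflows folder: the chat model’s output is written as a JSON instruction file that the executive workflow later picks up. The file name and JSON fields below are illustrative only, not a fixed format from the node pack:

```shell
# Illustrative hand-off: write an instruction file into workflows/
# (file name and JSON schema are assumptions, not the node pack's actual format)
WF="${NOVA_LAB:-$HOME/Nova-Lab}/workflows"
mkdir -p "$WF"
cat > "$WF/instruction_demo.json" <<'EOF'
{"task": "generate", "prompt": "a red fox in snow", "workflow": "workflow_B_API.json"}
EOF
cat "$WF/instruction_demo.json"
```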
Download the ComfyUI nodepack from GitHub: ComfyUI-vLLM-MultiModal-Agent and put it inside your /home/user/Nova-Lab/ComfyUI/custom_nodes folder.
You can download sample workflows here: agentic_workflows.zip
The workflow named workflow_B_API.json must be placed inside /home/user/Nova-Lab/workflows. At this time, this workflow only runs in the background and will not show up visually inside ComfyUI. The workflow named workflow_A.json is where your Ollama chat model exists, and where you will mostly be working.
Here’s a schema that shows what happens in the first workflow.

All three servers must be started and running at the same time. My preferred order to start them is vLLM server --> ComfyUI --> Ollama.
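Once all three are up, a quick probe tells you which ones are actually listening. The endpoints below assume the default ports (vLLM 8000, ComfyUI 8188, Ollama 11434); adjust them if you changed anything:

```shell
# Probe all three servers on their assumed default ports and report status
probe_servers() {
  for svc in "vLLM|http://localhost:8000/v1/models" \
             "ComfyUI|http://localhost:8188/system_stats" \
             "Ollama|http://localhost:11434/api/tags"; do
    name=${svc%%|*}; url=${svc#*|}
    if curl -fsS --max-time 2 "$url" >/dev/null 2>&1; then
      echo "$name: up"
    else
      echo "$name: down"
    fi
  done
}
probe_servers
```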
To start the vLLM server, open PowerShell as administrator, enter WSL, and go to your project folder: cd ~/Nova-Lab (or the name you initially gave that folder). Then activate your venv: source venv/bin/activate
Start the vLLM server:
python -m vllm.entrypoints.openai.api_server --model /mnt/c/AI/Comfy/ComfyUI/models/LLM/cosmicproc/gemma-4-E4B-it-NVFP4 --quantization nvfp4 --gpu-memory-utilization 0.75 --max-model-len 8192 --trust-remote-code --enforce-eager --allowed-local-media-path /home/zanno/Nova-Lab/ComfyUI/input
The path after --allowed-local-media-path is the only folder you give vLLM access to, and it can be changed to fit your needs.
To start the ComfyUI server, open PowerShell as administrator, enter WSL, and go to where you initially installed Comfy: cd ~/Nova-Lab. Then activate your venv: source venv/bin/activate and go to the ComfyUI folder: cd ComfyUI
Start ComfyUI:
python main.py --use-pytorch-cross-attention --disable-api-nodes
To start the Ollama server, open PowerShell as administrator and type wsl. Once you are in your WSL terminal, start Ollama with this command:
ollama run phi4:q5
The result





If you liked this guide and want to get more of these, as well as other useful tips, directly in your inbox, you should sign up for my newsletter.
All the AI-related work I do, I do in my spare time, and most of it I share with the world completely free of charge. It takes up a lot of my time, on top of the cost of running this website. I’m grateful for every bit of support I can get from users like you.
