
Guide: Setup Local Agentic AI in ComfyUI

Two weeks ago I wrote a guide on how to install Gemma 4 locally and run it through OpenClaw. Although it definitely works to use Gemma 4 in OpenClaw, I felt that the system was a bit rigid and the UI far from intuitive. The fact that it didn’t support NVFP4, at least at the time, became a deal breaker.

This made me think of possible alternatives to OpenClaw, and since I’ve spent a lot of time with ComfyUI, that was the natural choice.

In this guide we will set up an agentic AI system that runs on two different servers inside Windows Subsystem for Linux (WSL), and that is operated from, and with, ComfyUI.

To avoid repeating too much of what I wrote in my previous guide and making this post unnecessarily long, you will need to follow the steps in “Run Gemma 4 NVFP4 From ComfyUI” to set up WSL, the virtual environment, vLLM and ComfyUI before continuing with the steps in this guide.

Installation Guide

Once you are done following my previous guide and have vLLM and ComfyUI running in the same virtual environment inside your Windows Subsystem for Linux, the next step is to install a third server.

Ollama

Note! Ollama should be installed inside your WSL, but not inside your venv.

Open PowerShell as administrator and type wsl followed by ENTER. Before you can install Ollama you will need to install zstd to handle decompression of the Ollama installation package. Enter the following command in your WSL terminal:

sudo apt install zstd

When prompted, enter your password. Remember that characters and numbers will not show in the terminal as you type your password. Next, download and install Ollama by running:

curl -fsSL https://ollama.com/install.sh | sh

You now have Ollama installed alongside vLLM and ComfyUI.
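If you want a quick sanity check that the installation succeeded, you can ask the Ollama CLI for its version before moving on:

ollama --version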

Pick your model

The model we are going to run in Ollama is meant to work as an intelligent chatbot that you can brainstorm ideas with, but also as a director that sends instructions to your secondary model. Taking these things into account, as well as the fact that Ollama currently doesn’t support NVFP4, these are my top three suggestions for models to use.

| Model | Parameters | Context | Purpose |
| --- | --- | --- | --- |
| Phi-4-Mini Instruct | 3.8B | 128,000 | Highly optimized for complex logical reasoning; trained heavily on synthetic data. |
| Qwen3 4B | 4B | 128,000 | General-purpose conversational fluidity and native vision encoding. |
| Gemma 3 4B | 4B | 128,000 | Advanced multimodal architecture and robust multilingual support. |

I have intentionally picked small and fast models, to make the brainstorming part as smooth as possible. Since Ollama lacks NVFP4 integration, my choice for this guide is Phi-4-Mini Instruct GGUF Q5_K_M. If your GPU is larger than my RTX 5060 Ti (16 GB), you can of course pick a larger model. This guide assumes you use the same model as I do; if you don’t, you will need to replace Phi-4 with your model’s name wherever it appears.

Download the model you picked, and place it in /home/user/Nova-Lab/models
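As a purely hypothetical example of what that download could look like from inside WSL, here is a sketch using curl; the URL is a placeholder, so replace it with the actual link to the GGUF file you picked (for example from its Hugging Face repository page):

cd ~/Nova-Lab/models
curl -L -o Phi-4-mini-instruct-Q5_K_M.gguf "https://huggingface.co/<repo-owner>/<repo-name>/resolve/main/Phi-4-mini-instruct-Q5_K_M.gguf"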

In your WSL terminal, go to the models folder: cd ~/Nova-Lab/models

Create a model file inside your terminal: nano Modelfile

Paste the following into the nano editor:

FROM ./Phi-4-mini-instruct-Q5_K_M.gguf
PARAMETER num_ctx 32768
PARAMETER temperature 1.0

PARAMETER stop "<|endoftext|>"
PARAMETER stop "<|im_end|>"
PARAMETER stop "<|end|>"
PARAMETER stop "## Conversation"

TEMPLATE """{{ if .System }}<|system|>
{{ .System }}<|end|>
{{ end }}{{ if .Prompt }}<|user|>
{{ .Prompt }}<|end|>
{{ end }}<|assistant|>
{{ .Response }}<|end|>"""

The first three lines can be changed if you want or need to. The first line points to the model and should match the exact filename inside your models folder. The second line is the context window; I’ve set it to 32k by default, but you can change it according to your needs and your model’s capacity. The third line is the default temperature, which can later be changed in the node.

When you’ve made sure the model matches and you have the context length you want, press Ctrl + O to save the file, and then press Ctrl + X to exit nano.

In your terminal window (still in WSL) type ollama create phi4:q5 -f Modelfile and hit ENTER. You should now be able to run and chat with the model in a terminal window using the command: ollama run phi4:q5
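You can also verify that the model is reachable over Ollama’s HTTP API, which is what the ComfyUI nodes will talk to later. A minimal check, assuming Ollama is listening on its default port 11434:

curl http://localhost:11434/api/generate -d '{
  "model": "phi4:q5",
  "prompt": "Say hello in one short sentence.",
  "stream": false
}'

If you get a JSON response back containing a reply from the model, the server side is working.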

Final configurations

In your Nova-Lab folder (or whatever you named it), create two folders and name them prompts and workflows; one quick way to do this from the terminal is sketched below. Inside the prompts folder you can create and save custom system prompts. After the sketch you will find an example of a system prompt you can use for your chat model; for the executive model you should create a system prompt that fits what you need that model to do.
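A minimal sketch, assuming you kept the Nova-Lab folder name and structure from the previous guide:

mkdir -p ~/Nova-Lab/prompts ~/Nova-Lab/workflows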

**Role & Identity**
You are an Expert Technical Assistant and System Architect. Your primary goal is to provide high-level technical guidance, code optimization, and logical problem-solving. You are not just a chatbot; you are a proactive partner in engineering efficient solutions.

**Communication Style**
* **Clarity First**: Your tone is professional, direct, and analytical. Avoid unnecessary fluff or overly polite filler.
* **Forensic Thinking**: Look at the underlying "bones" of a problem before suggesting a fix.
* **Structured Output**: Use Markdown headings, code blocks, and lists to make information easy to digest at a glance.

**Intellect & Integrity**
* **Be Proactive**: If a user suggests a suboptimal path (e.g., a "Toaster" approach when an agentic system is possible), diplomatically guide them toward a more sovereign architecture.
* **Prioritize Ground Truth**: Base your reasoning on technical documentation and established coding principles. If you are unsure of a specific library version, state your limitations clearly.
* **Efficiency**: Suggest the most VRAM-efficient methods for running models on consumer hardware, such as the RTX 5060 Ti.

**Operating Protocol**
* **Analyze**: Deconstruct the user's request to identify the core intent.
* **Optimize**: Propose ways to streamline the requested code or workflow.
* **Execute/Instruct**: Provide the exact steps or code needed to achieve the goal.
* **Validate**: Remind the user of any necessary dependencies or environmental settings (like venv or WSL paths).

Save the file as system_prompt.txt or similar.
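If you would rather create the file straight from the WSL terminal, one way to do it, assuming the prompts folder created above, is:

nano ~/Nova-Lab/prompts/system_prompt.txt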

Independent research like this is self-funded. If this guide saved you hours of troubleshooting, consider fueling the lab.


The workflows folder is where you need to put the executive workflows that your executive model (Gemma 4) should use. This is also where Phi-4 will save the instructions that Gemma 4 will load, as well as where Gemma 4 will save files.

Download the ComfyUI nodepack from GitHub: ComfyUI-vLLM-MultiModal-Agent and put it inside your /home/user/Nova-Lab/ComfyUI/custom_nodes folder.
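One way to get it in place from inside WSL is to clone it straight into the custom_nodes folder. This is only a sketch: <repo-owner> is a placeholder, so use the actual repository page linked above, and it assumes git is installed in your WSL:

cd ~/Nova-Lab/ComfyUI/custom_nodes
git clone https://github.com/<repo-owner>/ComfyUI-vLLM-MultiModal-Agent.git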

You can download sample workflows here: agentic_workflows.zip

The workflow named workflow_B_API.json must be placed inside /home/user/Nova-Lab/workflows. At this time, this workflow only runs in the background and will not show up visually inside ComfyUI. The workflow named workflow_A.json is where your Ollama chat model lives, and where you will do most of your work.
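As a hedged sketch of how you might move that file into place, assuming you downloaded agentic_workflows.zip to your Windows Downloads folder (replace <your-windows-user> with your own Windows username):

sudo apt install unzip
unzip /mnt/c/Users/<your-windows-user>/Downloads/agentic_workflows.zip -d ~/Nova-Lab/agentic_workflows
cp ~/Nova-Lab/agentic_workflows/workflow_B_API.json ~/Nova-Lab/workflows/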

Here’s a schema that shows what happens in the first workflow.

All three servers must be started and running at the same time. My preferred order to start them is vLLM server --> ComfyUI --> Ollama.

To start the vLLM server, open PowerShell as administrator, type wsl, and go to your Nova-Lab folder: cd ~/Nova-Lab (or the name you initially gave that folder). Then activate your venv: source venv/bin/activate

Start the vLLM server:

python -m vllm.entrypoints.openai.api_server --model /mnt/c/AI/Comfy/ComfyUI/models/LLM/cosmicproc/gemma-4-E4B-it-NVFP4 --quantization nvfp4 --gpu-memory-utilization 0.75 --max-model-len 8192 --trust-remote-code --enforce-eager --allowed-local-media-path /home/zanno/Nova-Lab/ComfyUI/input

The path given to --allowed-local-media-path is the only folder you give vLLM access to, and it can be changed to fit your needs.
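Once the server has finished loading the model, you can confirm it is responding. A minimal check from a second WSL terminal, assuming vLLM is on its default port 8000:

curl http://localhost:8000/v1/models

The response should list the Gemma 4 model path you passed with --model.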

To start the ComfyUI server, open PowerShell as administrator, type wsl, and go to your Nova-Lab folder: cd ~/Nova-Lab (or wherever you initially installed Comfy). Then activate your venv: source venv/bin/activate and go to the ComfyUI folder: cd ComfyUI

Start ComfyUI:

python main.py --use-pytorch-cross-attention --disable-api-nodes
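To confirm ComfyUI is up, you can hit its web endpoint from another WSL terminal; this assumes ComfyUI is running on its default port 8188:

curl -I http://127.0.0.1:8188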

To start the Ollama server, open PowerShell as administrator and type wsl to enter your WSL terminal.

Start Ollama:

ollama run phi4:q5

The result

If you liked this guide and want to get more of these, as well as other useful tips, directly in your inbox, you should sign up for my newsletter.

All the AI-related work I do, I do in my spare time, and most of it I share with the world completely free of charge. It takes up a lot of my time, and there is also the cost of running this website. I’m grateful for every bit of support I can get from users like you.
