Paying $20/month for cloud generations when you’ve got a decent GPU sitting right there is like taking taxis when you own a car.
ComfyUI lets you run everything on your own PC and get the same kind of results — without limits, subscriptions, or “you’ve used up your credits.” Fast and totally free… besides the cost of the computer.
Below is a beginner-friendly ComfyUI setup: how to install it, where to get models, how to build your first workflow, and which settings to tweak first.
Note: Anywhere you see Your_username, replace it with your Windows account name.
It’s worth trying if you have
- A PC with an NVIDIA RTX 20xx (8GB VRAM+)
- An Apple Mac with M3+ and 16GB+ RAM (M1–M2 can work too, just slower)
Keep in mind: On Macs, VRAM and system RAM are basically shared memory. When you generate images, it can eat most of it — multitasking at the same time can get painful.
What you’ll need
ComfyUI for your OS: https://docs.comfy.org/installation/system_requirements
In this guide, we’ll focus on generating with the ComfyUI Desktop app on Windows.
Models to generate with
- Z-Image-Turbo-AIO (https://huggingface.co/SeeSee21/Z-Image-Turbo-AIO) — grab the fp8 version. We’ll use this model throughout the article because it offers a great balance of quality and speed; the tradeoff is higher GPU requirements than SDXL.
- An SDXL model: CyberRealistic Pony. Download here. Keep in mind: for graphics cards below the 2070 and/or with 8GB of VRAM, it’s better to use this or other SDXL models in the 4–7GB range.
- For Mac: Stable Diffusion 1.5. Download here.
How to install models
Download the model .safetensors file into:
- Windows: C:\Users\Your_username\Documents\ComfyUI\models\checkpoints
- Mac: <basePath>/ComfyUI/models/checkpoints
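To sanity-check the install, you can list what ComfyUI will actually see in that folder. A small standard-library sketch (pass in whichever checkpoints path applies to your OS):

```python
from pathlib import Path

def list_checkpoints(checkpoints_dir: str) -> list[str]:
    """Return the .safetensors files ComfyUI will show in the Load Checkpoint node."""
    return sorted(p.name for p in Path(checkpoints_dir).glob("*.safetensors"))

# Example (Windows): list_checkpoints(r"C:\Users\Your_username\Documents\ComfyUI\models\checkpoints")
```

If your freshly downloaded model doesn’t show up here, it’s in the wrong folder or still has a partial-download extension.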
Where to get more models:
- Hugging Face — https://huggingface.co/
- Civitai — https://civitai.com/
- ComfyUI Templates (built-in gallery)
Model differences

Z-Image Turbo
Very low step count, “native” prompt structure. Negative prompt has a weak influence; you can usually skip it.
Prompt example for Z-Image Turbo AIO FP8:
Ultra-photoreal cinematic still of Lara Croft from Tomb Raider inside an ancient jungle temple, moss-covered ruins, shafts of sunlight and cinematic god rays through humid haze, brunette hair in a high ponytail with loose strands, lightly tanned skin with a subtle healthy sheen and light sweat, intense determined eyes, athletic curvy figure, teal tank top and brown shorts, no weapons and no holsters, dynamic action-ready pose with one knee bent and one hand bracing on a stone ledge, 35mm lens look, crisp film still realism, sharp focus, high detail.
SDXL models
Higher step count. Some are tag-based. Negative prompt is usually needed.
Prompt example for CyberRealistic Pony:
score_9, score_8_up, score_7_up, (Lara Croft, Tomb Raider, athletic curvy adventurer woman, brunette high ponytail, teal tank top, brown shorts, subtle sweat sheen, intense determined eyes, dynamic action pose, one knee bent, hand bracing on stone), ancient jungle temple, moss covered ruins, tropical jungle, shafts of sunlight, god rays, humid haze, cinematic still, ultra photorealistic, 35mm lens look, sharp focus, high detail, realistic skin texture
Negative:
(nude:1.4), worst quality, low quality, lowres, blurry, out of focus, jpeg artifacts, oversharpen, overprocessed, plastic skin, waxy skin, bad anatomy, bad hands, extra fingers, missing fingers, deformed, mutated, long neck, bad proportions, crossed eyes, duplicate face, cropped, out of frame, watermark, text, logo, signature, anime, cartoon, cgi, 3d render, illustration
Getting started
First thing: ComfyUI has its own built-in “store” of templates/models with presets.

Click Templates, pick something, hit Run, and ComfyUI will offer to download the required model files.
Once it finishes downloading, type your prompt and hit Run again.

Generated images are saved here:
- Windows: C:\Users\Your_username\Documents\ComfyUI\output
- Mac: <basePath>/ComfyUI/output
Tip: Before generating, close heavy apps (Figma, Adobe stuff, 3D tools).
If a job takes forever and your GPU isn’t really doing much, it probably didn’t fit into VRAM and started spilling into system RAM. Cancel it in Job Queue and try again with a smaller resolution.
Templates are nice because everything is already wired up, but you don’t get much control, and the downloaded models aren’t always the best choices. So let’s do it the “manual” way.
Server settings
Click Settings (bottom left), then Server-Config.
- If you have less than 16GB of VRAM, set UNET precision to fp8_e4m3fn. This is a VRAM-saving mode: a tiny quality/detail hit is possible, but it’s useful for heavy models like Z-Image. On Mac: don’t enable this.
- VRAM management mode: 8GB → lowvram, 12GB → normalvram, 16GB+ → highvram
- Reserved VRAM (GB): how much VRAM you leave to the system. Recommended: 0–1
- Disable smart memory management: turn it on if you have 12GB+ of VRAM (if generation becomes too slow, turn it back off)

Workflow setup
A new file is created automatically. If you already opened something from Templates, click the plus next to the current file tab.

Double-left-click on an empty spot on the canvas to open the node search.

Add these nodes in order
- Load Checkpoint
- CLIP Text Encode (Prompt) ×2
- ConditioningZeroOut (keep in mind: this disables the negative prompt. Add it only for Z-Image Turbo; if you’re using SDXL, don’t add it)
- Empty Latent Image
- KSampler
- VAE Decode (Tiled) (if you don’t have Tiled, use regular VAE Decode)
- Save Image

Connect nodes and set up the flow
Basic rule: outputs on the right connect to inputs on the left by dragging “wires” between the colored dots.
Connect Load Checkpoint to:
- MODEL → model on KSampler
- VAE → vae on VAE Decode (Tiled)
- CLIP → both clip inputs on the two CLIP Text Encode (Prompt) nodes
Then select the model you’ll use (example: z-image-turbo-fp8-aio.safetensors).
Top CLIP Text Encode (Prompt): CONDITIONING → positive on KSampler
Bottom CLIP Text Encode (Prompt):
- For Z-Image: CONDITIONING → ConditioningZeroOut → negative on KSampler
- For SDXL: CONDITIONING → negative on KSampler (skip ConditioningZeroOut)
Tip: Double-click a node title to rename it (for example, rename the two CLIP nodes to Positive and Negative).
Empty Latent Image: LATENT → latent_image on KSampler
Set your resolution. For testing, you can leave 512×512, or go bigger like 768×1024. Batch size is the number of final images; leave it at 1 for now.
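A note on sizes: these models generate in a latent space that is 8× smaller per side than the output image, so custom widths and heights should be multiples of 8. A tiny helper for snapping arbitrary sizes (the multiple-of-8 rule is the only assumption here):

```python
def snap_to_latent(value: int, multiple: int = 8) -> int:
    """Round a pixel dimension to the nearest multiple of 8 so it maps
    cleanly onto the model's 8x-downscaled latent grid."""
    return max(multiple, round(value / multiple) * multiple)

# snap_to_latent(770) -> 768; snap_to_latent(1000) -> 1000
```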
KSampler: LATENT → samples on VAE Decode (Tiled)
KSampler settings
For Z-Image Turbo
- Steps: 9
- CFG: 1.0
- Sampler: res_multistep
- Scheduler: simple
- Denoise: 1

For SDXL models
- Steps: 30+
- CFG: 5
- Sampler: dpmpp_2m_sde
- Scheduler: karras
- Denoise: 1
Notes:
- Seed: basically the ID number of the generation. Any number works. The “crossed arrows” button lets you lock it or auto-change it between runs.

- Steps: Z-Image is designed for up to ~10. SDXL is usually 30+. Too few = blurry. Too many = “overcooked” artifacts (skin gets weird, etc.).
- CFG: how strictly the model follows your prompt (Z-Image 1, SDXL ~5+).
- Sampler: how noise is removed.
- Scheduler: the pace/curve of noise removal.
- Denoise: with a single KSampler it’s typically 1.
Model pages usually list recommended settings, but nothing stops you from experimenting.
VAE Decode (Tiled): IMAGE → images on Save Image
You should end up like this:

Add your prompt to the text field of the top CLIP node.
Put any word (like test) into the bottom CLIP node, otherwise generation won’t start.
If you’re using SDXL, write a proper negative prompt in the bottom node.
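At this point the graph is fully wired. As an aside: if you later want to drive the same workflow without the UI, ComfyUI’s built-in HTTP server accepts the graph as JSON. Below is a rough sketch of how the KSampler node from this workflow looks in that API format; the node IDs and the [node_id, output_index] wiring references are illustrative, not exported from a real graph, so export your own workflow in API format to get the exact values.

```python
import json

# Illustrative node IDs; a real export will have its own numbering.
graph = {
    "3": {
        "class_type": "KSampler",
        "inputs": {
            "seed": 42,
            "steps": 9,
            "cfg": 1.0,
            "sampler_name": "res_multistep",
            "scheduler": "simple",
            "denoise": 1.0,
            "model": ["4", 0],         # MODEL from Load Checkpoint
            "positive": ["6", 0],      # CONDITIONING from the positive CLIP node
            "negative": ["7", 0],      # CONDITIONING from ConditioningZeroOut
            "latent_image": ["5", 0],  # LATENT from Empty Latent Image
        },
    }
}
# POST this payload to the ComfyUI server's /prompt endpoint
# (port 8188 by default for the standalone server).
payload = json.dumps({"prompt": graph})
```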
Test prompts
Z-Image: Ultra photoreal cinematic still of white American woman as Jessica Rabbit in a vintage jazz club with soft warm lights and blurred tables behind. Hourglass curvy body. Long vivid red hair in glamorous waves, porcelain pale skin, heavy glam makeup with red lips and sultry half closed eyes. Wearing a tight sparkling red evening dress with a deep neckline and high slit. She is posing with a deep back arch, hips pushed back and chest forward, one hand on her hip and the other touching her neck, both eyes clearly visible and looking into the camera. Warm club lighting, smoky atmosphere, cinematic depth of field, 85 mm lens look, ultra realistic live action still.
SDXL: score_9, score_8_up, score_7_up, (glamorous vintage pin-up woman, vintage jazz club, warm soft lights, blurred tables background, smoky atmosphere, hourglass slim curvy body, long vivid red hair in glamorous waves, porcelain pale skin, heavy glam makeup, red lips, sultry half-closed eyes, deep back arch pose, hips pushed back, chest forward, one hand on hip, other touching neck, both eyes visible, looking into camera), (sparkling scarlet sequin evening gown:1.25), (true red dress:1.2), (neutral white balance:1.1), cinematic still, ultra photorealistic, live action, cinematic depth of field, 85mm lens look, sharp focus, high detail, realistic skin texture
Negative: (nude:1.4), worst quality, low quality, lowres, blurry, out of focus, jpeg artifacts, oversharpen, overprocessed, plastic skin, waxy skin, bad anatomy, bad hands, extra fingers, missing fingers, deformed, mutated, long neck, bad proportions, crossed eyes, duplicate face, cropped, out of frame, watermark, text, logo, signature, anime, cartoon, cgi, 3d render, illustration
Time to hit Run and see what happens
You should get something roughly like this:

Important: the first run is usually slower.
If the progress sits on the same percentage for a long time, you likely ran out of VRAM and it started offloading to system RAM.
In that case, you can:
- Wait it out (it’s not frozen — just slow), or
- Cancel the job and rerun at a smaller resolution in Empty Latent Image

Our workflow for Z-Image Turbo
The model author recommends using one KSampler, but we found that using two KSamplers can make the image look richer (at the cost of slower generation). It’s not a magic bullet: sometimes it’s great, sometimes it’s not.
For example, here the direct-light effect looks better:

Add these nodes
- Upscale Latent By
- KSampler
You can select them and press Ctrl+G to group them.
Wire it up
Load Checkpoint
- Connect MODEL to model on the second KSampler
- Keep the original connection to the first KSampler, too — you want the model feeding both.
Both CLIP Text Encode (Prompt) nodes
- Connect into positive and negative on the second KSampler (so prompts feed both KSamplers)
First KSampler
- Reconnect LATENT → samples into Upscale Latent By
Upscale Latent By
- Connect LATENT → latent_image on the second KSampler
- Keep upscale_method set to nearest-exact
Second KSampler
- Connect LATENT → samples into VAE Decode (Tiled)
Select ConditioningZeroOut and press Ctrl+B (or right-click → Bypass).
You should end up like this:

Add a prompt:
Neon lit retro diner booth at night with rain on the window and city bokeh outside, 35mm film still photo of a beautiful 23 years old Canadian woman, clean pale skin, glossy blonde bob haircut, glam makeup with glossy lips, wearing a mint long sleeve top with a deep neckline and black tiny leather mini skirt, she sits sideways on the booth with one knee raised, playful smile, direct eye contact, neon night lighting, Leica Q3, 50mm lens look, film grain, crisp details, natural skin texture, headroom
In the bottom CLIP Text Encode (Prompt), add a negative prompt: nude, text, watermarks, low quality, artifacts, bad anatomy
Note: by default, negative prompts barely affect Z-Image Turbo, but with CFG set to 2 they start to have some effect. It’s still worth writing the important basics. Sometimes it works, sometimes it doesn’t.
Set both KSamplers
First KSampler
- Steps: 6
- CFG: 2.0
- Sampler: dpmpp_2m
- Scheduler: sgm_uniform
- Denoise: 1

Second KSampler
- Steps: 4
- CFG: 2.0
- Sampler: dpmpp_2m
- Scheduler: sgm_uniform
- Denoise: 0.5
Then hit Run.
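For reference, the second-pass chain translates into ComfyUI’s API-format JSON roughly as below. “Upscale Latent By” is the LatentUpscaleBy node class; the node IDs, wiring references, and the 1.5× scale factor are illustrative assumptions, so use whatever you set in the UI.

```python
# Sketch of the latent-upscale chain; IDs and scale_by are illustrative.
chain = {
    "10": {
        "class_type": "LatentUpscaleBy",
        "inputs": {
            "upscale_method": "nearest-exact",
            "scale_by": 1.5,          # assumed factor; pick your own
            "samples": ["3", 0],      # LATENT from the first KSampler
        },
    },
    "11": {
        "class_type": "KSampler",
        "inputs": {
            "seed": 42, "steps": 4, "cfg": 2.0,
            "sampler_name": "dpmpp_2m", "scheduler": "sgm_uniform",
            "denoise": 0.5,
            "model": ["4", 0], "positive": ["6", 0], "negative": ["7", 0],
            "latent_image": ["10", 0],  # the upscaled latent feeds the second pass
        },
    },
}
```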
LoRA
A LoRA is a small add-on trained for a specific style.
For example, here’s a bundle of six: https://huggingface.co/DeverStyle/Z-Image-loras
We’ll use z_image_arcane_v1.safetensors: https://huggingface.co/DeverStyle/Z-Image-loras/blob/main/z_image_arcane_v1.safetensors
Download the LoRA and put it here
- Windows: C:\Users\Your_username\Documents\ComfyUI\models\loras
- Mac: <basePath>/ComfyUI/models/loras
Important: With Z-Image, LoRA doesn’t play nicely with the second KSampler. More specifically, it “works,” but after the first Arcane-style pass, the second KSampler runs again and turns it back into something photo-like.
Select Upscale Latent By and the second KSampler and disable them (Bypass via right-click menu, or Ctrl+B).
Select ConditioningZeroOut and enable it again the same way.
In the first KSampler, keep these settings:
- Steps: 9
- CFG: 1.0
- Sampler: res_multistep
- Scheduler: simple
- Denoise: 1
Add a Load LoRA node and place it between Load Checkpoint and the CLIP nodes.
Connect Load Checkpoint to Load LoRA:
- MODEL → model
- CLIP → clip
Connect Load LoRA to:
- MODEL → model on the first KSampler
- CLIP → both CLIP Text Encode (Prompt) nodes
Parameters:
- strength_model: 1
- strength_clip: 1
These control how strong the LoRA effect is. Lower values = weaker effect.
You should end up like this:

Hit Run and check the result:

Our generations
Here’s a taste of what you can create with the setups from this guide.
Photos
These photorealistic portraits were all generated locally using the workflows described above. No cloud services, no subscriptions – just ComfyUI and a decent GPU.





















Graphics
Z-Image Turbo also performs well even without LoRA.












Wrapping up
ComfyUI might look intimidating with all those nodes and wires, but once you build your first workflow, it clicks. The setup takes maybe an hour, but then you’re generating without limits, without subscriptions, and without anyone’s servers going down.
For more advanced techniques and community workflows, check out the official ComfyUI documentation and the growing library of templates. And if you’re looking for other AI tools to complement your workflow, explore our guide to prompting AI image generators, browse our collection of 50+ tested AI image prompts, or discover Icons8’s AI illustration generator for when you need quick vector graphics instead of photorealistic renders.