Paying $20/month for cloud generations when you’ve got a decent GPU sitting right there is like taking taxis when you own a car.

ComfyUI lets you run everything on your own PC and get the same kind of results — without limits, subscriptions, or “you’ve used up your credits.” Fast and totally free… besides the cost of the computer.

Below is a beginner-friendly ComfyUI setup: how to install it, where to get models, how to build your first workflow, and which settings to tweak first.

Note: Anywhere you see Your_username, replace it with your Windows account name.

It’s worth trying if you have:

  • A PC with an NVIDIA RTX 20-series GPU or newer (8GB+ VRAM)
  • An Apple Mac with M3+ and 16GB+ RAM (M1–M2 can work too, just slower)

Keep in mind: On Macs, VRAM and system RAM are basically shared memory. When you generate images, it can eat most of it — multitasking at the same time can get painful.

What you’ll need

ComfyUI for your OS: https://docs.comfy.org/installation/system_requirements

In this guide, we’ll focus on generating with the ComfyUI Desktop app on Windows.

Models to generate with

  • Z-Image-Turbo-AIO (https://huggingface.co/SeeSee21/Z-Image-Turbo-AIO) — grab the fp8 version
    We’ll use this model throughout the article because it offers a great balance of quality and speed. The tradeoff is higher GPU requirements than SDXL.
  • An SDXL model: CyberRealistic Pony
    Download here
    Keep in mind: For graphics cards below the RTX 2070 and/or with only 8GB of VRAM, this or other SDXL models in the 4–7GB range are the better choice.
  • For Mac: Stable Diffusion 1.5
    Download here

How to install models

Download the model .safetensors file into:

  • Windows: C:\Users\Your_username\Documents\ComfyUI\models\checkpoints
  • Mac: <basePath>/ComfyUI/models/checkpoints
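If you want to confirm a model landed where ComfyUI will find it, a quick script can list what the Load Checkpoint node will see. This is a hypothetical helper, not part of ComfyUI; adjust the path to match your install:

```python
from pathlib import Path

# Default Windows desktop-app location; adjust for your machine
# ("Your_username" is resolved automatically by Path.home()).
CHECKPOINTS = Path.home() / "Documents" / "ComfyUI" / "models" / "checkpoints"

def list_checkpoints(folder: Path) -> list[str]:
    """Return the .safetensors files ComfyUI will offer in Load Checkpoint."""
    if not folder.exists():
        return []
    return sorted(p.name for p in folder.glob("*.safetensors"))

print(list_checkpoints(CHECKPOINTS))
```

If your freshly downloaded model doesn’t appear in the node’s dropdown, restart ComfyUI or press R to refresh the node definitions.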

Where to get more models:

Model differences

ComfyUI model comparison showing Z-Image Turbo and SDXL results

Z-Image Turbo

Very low step count, “native” prompt structure. Negative prompt has a weak influence; you can usually skip it.

Prompt example for Z-Image Turbo AIO FP8:
Ultra-photoreal cinematic still of Lara Croft from Tomb Raider inside an ancient jungle temple, moss-covered ruins, shafts of sunlight and cinematic god rays through humid haze, brunette hair in a high ponytail with loose strands, lightly tanned skin with a subtle healthy sheen and light sweat, intense determined eyes, athletic curvy figure, teal tank top and brown shorts, no weapons and no holsters, dynamic action-ready pose with one knee bent and one hand bracing on a stone ledge, 35mm lens look, crisp film still realism, sharp focus, high detail.

SDXL models

Higher step count. Some are tag-based. Negative prompt is usually needed.

Prompt example for CyberRealistic Pony:
score_9, score_8_up, score_7_up, (Lara Croft, Tomb Raider, athletic curvy adventurer woman, brunette high ponytail, teal tank top, brown shorts, subtle sweat sheen, intense determined eyes, dynamic action pose, one knee bent, hand bracing on stone), ancient jungle temple, moss covered ruins, tropical jungle, shafts of sunlight, god rays, humid haze, cinematic still, ultra photorealistic, 35mm lens look, sharp focus, high detail, realistic skin texture

Negative:
(nude:1.4), worst quality, low quality, lowres, blurry, out of focus, jpeg artifacts, oversharpen, overprocessed, plastic skin, waxy skin, bad anatomy, bad hands, extra fingers, missing fingers, deformed, mutated, long neck, bad proportions, crossed eyes, duplicate face, cropped, out of frame, watermark, text, logo, signature, anime, cartoon, cgi, 3d render, illustration
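The `(term:1.4)` syntax in the negative prompt is attention weighting: values above 1 strengthen a term, values below 1 weaken it, and unparenthesized terms default to 1.0. A minimal sketch of how such tokens can be read (illustrative only, not ComfyUI’s actual prompt parser):

```python
import re

# Matches "(text:1.4)"-style weighted terms; anything else implicitly weighs 1.0.
WEIGHTED = re.compile(r"\(([^:()]+):([\d.]+)\)")

def parse_weights(prompt: str) -> dict[str, float]:
    """Map each explicitly weighted term to its weight."""
    return {term.strip(): float(w) for term, w in WEIGHTED.findall(prompt)}

print(parse_weights("(nude:1.4), worst quality, (plastic skin:0.8)"))
# {'nude': 1.4, 'plastic skin': 0.8}
```

In practice you rarely need weights beyond the 0.8–1.5 range; pushing much higher tends to distort the image instead of emphasizing the term.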

Getting started

First thing: ComfyUI has its own built-in “store” of templates/models with presets.

ComfyUI templates gallery showing various presets and models

Click Templates, pick something, hit Run, and ComfyUI will offer to download the required model files.

Once it finishes downloading, type your prompt and hit Run again.

ComfyUI template run result showing generated image

Generated images are saved here:

  • Windows: C:\Users\Your_username\Documents\ComfyUI\output
  • Mac: <basePath>/ComfyUI/output

Tip: Before generating, close heavy apps (Figma, Adobe stuff, 3D tools).

If a job takes forever and your GPU isn’t really doing much, it probably didn’t fit into VRAM and started spilling into system RAM. Cancel it in Job Queue and try again with a smaller resolution.

Templates are nice because everything is already wired up, but you don’t get much control, and the downloaded models aren’t always the best choices. So let’s do it the “manual” way.

Server settings

Click Settings (bottom left), then Server-Config.

  • If you have less than 16GB VRAM, set UNET precision to fp8_e4m3fn.
    VRAM-saving mode: tiny quality/detail hit possible. Useful for heavy models like Z-Image.
    On Mac: don’t enable this.
  • VRAM management mode: 8GB → lowvram, 12GB → normalvram, 16GB+ → highvram
  • Reserved VRAM (GB): how much VRAM you leave to the system. Recommended: 0–1
  • Disable smart memory management: enable this option if you have 12GB+ VRAM (if generation becomes too slow, turn it back off)
ComfyUI server configuration settings panel
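The fp8 switch matters because weight precision directly sets the model’s VRAM footprint: roughly parameter count times bytes per weight (fp16 = 2 bytes, fp8 = 1 byte), plus overhead for activations, the text encoder, and the VAE. A back-of-the-envelope calculation, using an illustrative parameter count rather than any model’s real size:

```python
def model_vram_gb(params_billions: float, bytes_per_weight: int) -> float:
    """Weight-only footprint in GB; activations and the VAE add more on top."""
    return params_billions * 1e9 * bytes_per_weight / 1024**3

# A hypothetical 6B-parameter diffusion model:
print(round(model_vram_gb(6, 2), 1))  # fp16: ~11.2 GB
print(round(model_vram_gb(6, 1), 1))  # fp8:  ~5.6 GB
```

This is why fp8 can be the difference between a model fitting entirely in 8–12GB of VRAM and spilling into system RAM.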

Workflow setup

A new file is created automatically. If you already opened something from Templates, click the plus next to the current file tab.

ComfyUI new workflow tab button

Double-left-click on an empty spot on the canvas to open the node search.

ComfyUI node search panel showing available nodes

Add these nodes in order

  • Load Checkpoint
  • CLIP Text Encode (Prompt) ×2
  • ConditioningZeroOut (keep in mind: this disables the negative prompt and is only for Z-Image Turbo; if you’re using SDXL, don’t add it)
  • Empty Latent Image
  • KSampler
  • VAE Decode (Tiled) (if you don’t have Tiled, use regular VAE Decode)
  • Save Image
ComfyUI basic workflow nodes layout

Connect nodes and set up the flow

Basic rule: outputs on the right connect to inputs on the left by dragging “wires” between the colored dots.

Connect Load Checkpoint to:

  • MODEL → model on KSampler
  • VAE → vae on VAE Decode (Tiled)
  • CLIP → both clip inputs on both CLIP Text Encode (Prompt) nodes

Then select the model you’ll use (example: z-image-turbo-fp8-aio.safetensors).

Top CLIP Text Encode (Prompt): CONDITIONING → positive on KSampler

Bottom CLIP Text Encode (Prompt):

  • For Z-Image: CONDITIONING → ConditioningZeroOut → negative on KSampler
  • For SDXL: CONDITIONING → negative on KSampler (skip ConditioningZeroOut)

Tip: Double-click a node title to rename it (for example, rename the two CLIP nodes to Positive and Negative).

Empty Latent Image: LATENT → latent_image on KSampler

Set your resolution. For testing, you can leave 512×512, or go bigger like 768×1024.
Batch size = number of final images. Leave it at 1 for now.

KSampler: LATENT → samples on VAE Decode (Tiled)
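If you later want to drive this same graph without the UI, ComfyUI exposes an HTTP API that accepts the workflow as JSON (the format you get from Export (API) in the app). Below is a trimmed sketch of the graph built above. The node IDs are arbitrary, the model filename is an example, and plain `VAEDecode` stands in for VAE Decode (Tiled) for brevity:

```python
import json

# ComfyUI API-format graph: each key is a node id;
# link values are [source_node_id, output_index].
workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "z-image-turbo-fp8-aio.safetensors"}},
    "2": {"class_type": "CLIPTextEncode",  # Positive prompt
          "inputs": {"text": "your prompt here", "clip": ["1", 1]}},
    "3": {"class_type": "CLIPTextEncode",  # Negative prompt
          "inputs": {"text": "test", "clip": ["1", 1]}},
    "4": {"class_type": "ConditioningZeroOut",  # Z-Image only; wire "3" directly for SDXL
          "inputs": {"conditioning": ["3", 0]}},
    "5": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 768, "height": 1024, "batch_size": 1}},
    "6": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "positive": ["2", 0], "negative": ["4", 0],
                     "latent_image": ["5", 0], "seed": 42, "steps": 9, "cfg": 1.0,
                     "sampler_name": "res_multistep", "scheduler": "simple",
                     "denoise": 1.0}},
    "7": {"class_type": "VAEDecode",
          "inputs": {"samples": ["6", 0], "vae": ["1", 2]}},
    "8": {"class_type": "SaveImage",
          "inputs": {"images": ["7", 0], "filename_prefix": "manual"}},
}

# POST this to http://127.0.0.1:8188/prompt as {"prompt": workflow} to queue a job.
print(json.dumps({"prompt": workflow})[:60], "...")
```

Notice how the JSON mirrors the wiring rules above: `["1", 1]` on the clip inputs is the CLIP output (index 1) of Load Checkpoint feeding both text encoders, exactly like dragging the wires by hand.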

KSampler settings

For Z-Image Turbo

  • Steps: 9
  • CFG: 1.0
  • Sampler: res_multistep
  • Scheduler: simple
  • Denoise: 1

For SDXL models

  • Steps: 30+
  • CFG: 5
  • Sampler: dpmpp_2m_sde
  • Scheduler: karras
  • Denoise: 1

Notes:

  • Seed: basically the ID number of the generation. Any number works. The “crossed arrows” icon lets you lock it or auto-change it between runs.
ComfyUI seed controls with lock and random options
  • Steps: Z-Image is designed for up to ~10. SDXL is usually 30+. Too few = blurry. Too many = “overcooked” artifacts (skin gets weird, etc.).
  • CFG: how strictly the model follows your prompt (Z-Image 1, SDXL ~5+).
  • Sampler: how noise is removed.
  • Scheduler: the pace/curve of noise removal.
  • Denoise: with a single KSampler it’s typically 1.

Model pages usually list recommended settings, but nothing stops you from experimenting.
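The seed is what makes generations reproducible: the same seed with identical settings produces the same starting noise, and therefore the same image. Here is that principle in miniature, using Python’s stdlib RNG as a stand-in for the actual latent-noise generator:

```python
import random

def starting_noise(seed: int, n: int = 4) -> list[float]:
    """Stand-in for latent noise: a fixed seed always yields the same values."""
    rng = random.Random(seed)
    return [round(rng.gauss(0, 1), 4) for _ in range(n)]

assert starting_noise(42) == starting_noise(42)   # locked seed: identical result
assert starting_noise(42) != starting_noise(43)   # new seed: a different image
```

This is why locking the seed is useful for comparing settings: change only the sampler or CFG between runs and any difference in the output comes from that change, not from fresh noise.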

VAE Decode (Tiled): IMAGE → images on Save Image

You should end up like this:

ComfyUI fully connected workflow ready to run

Add your prompt to the text field of the top CLIP node.

Put any word (like test) into the bottom CLIP node, otherwise generation won’t start.
If you’re using SDXL, write a proper negative prompt in the bottom node.

Test prompts

Z-Image: Ultra photoreal cinematic still of white American woman as Jessica Rabbit in a vintage jazz club with soft warm lights and blurred tables behind. Hourglass curvy body. Long vivid red hair in glamorous waves, porcelain pale skin, heavy glam makeup with red lips and sultry half closed eyes. Wearing a tight sparkling red evening dress with a deep neckline and high slit. She is posing with a deep back arch, hips pushed back and chest forward, one hand on her hip and the other touching her neck, both eyes clearly visible and looking into the camera. Warm club lighting, smoky atmosphere, cinematic depth of field, 85 mm lens look, ultra realistic live action still.

SDXL: score_9, score_8_up, score_7_up, (glamorous vintage pin-up woman, vintage jazz club, warm soft lights, blurred tables background, smoky atmosphere, hourglass slim curvy body, long vivid red hair in glamorous waves, porcelain pale skin, heavy glam makeup, red lips, sultry half-closed eyes, deep back arch pose, hips pushed back, chest forward, one hand on hip, other touching neck, both eyes visible, looking into camera), (sparkling scarlet sequin evening gown:1.25), (true red dress:1.2), (neutral white balance:1.1), cinematic still, ultra photorealistic, live action, cinematic depth of field, 85mm lens look, sharp focus, high detail, realistic skin texture

Negative: (nude:1.4), worst quality, low quality, lowres, blurry, out of focus, jpeg artifacts, oversharpen, overprocessed, plastic skin, waxy skin, bad anatomy, bad hands, extra fingers, missing fingers, deformed, mutated, long neck, bad proportions, crossed eyes, duplicate face, cropped, out of frame, watermark, text, logo, signature, anime, cartoon, cgi, 3d render, illustration

Time to hit Run and see what happens

You should get something roughly like this:

ComfyUI first run result showing generated image

Important: the first run is usually slower.

If the progress sits on the same percentage for a long time, you likely ran out of VRAM and it started offloading to system RAM.

In that case, you can:

  • Wait it out (it’s not frozen — just slow), or
  • Cancel the job and rerun at a smaller resolution in Empty Latent Image
ComfyUI job queue cancel button

Our workflow for Z-Image Turbo

The model author recommends using one KSampler, but we found that using two KSamplers can make the image look richer (at the cost of slower generation). It’s not a magic bullet: sometimes it’s great, sometimes it’s not.

For example, here the direct-light effect looks better:

ComfyUI single vs dual KSampler comparison showing lighting differences

Add these nodes

  • Upscale Latent By
  • KSampler

You can select them and press Ctrl+G to group them.

Wire it up

Load Checkpoint

  • Connect MODEL to the second KSampler (model)
    Keep the original connection to the first KSampler, too — you want the model feeding both.

Both CLIP Text Encode (Prompt) nodes

  • Connect into positive and negative on the second KSampler
    (so prompts feed both KSamplers)

First KSampler

  • Reconnect LATENT → samples into Upscale Latent By

Upscale Latent By

  • Connect LATENT → latent_image into the second KSampler
  • Keep upscale_method set to nearest-exact

Second KSampler

  • Connect LATENT → samples into VAE Decode (Tiled)

Select ConditioningZeroOut and press Ctrl+B (or right-click → Bypass).

You should end up like this:

ComfyUI dual KSampler workflow configuration

Add a prompt:

Neon lit retro diner booth at night with rain on the window and city bokeh outside, 35mm film still photo of a beautiful 23 years old Canadian woman, clean pale skin, glossy blonde bob haircut, glam makeup with glossy lips, wearing a mint long sleeve top with a deep neckline and black tiny leather mini skirt, she sits sideways on the booth with one knee raised, playful smile, direct eye contact, neon night lighting, Leica Q3, 50mm lens look, film grain, crisp details, natural skin texture, headroom

In the bottom CLIP Text Encode (Prompt), add a negative prompt: nude, text, watermarks, low quality, artifacts, bad anatomy

Note: by default, Z-Image Turbo mostly ignores the negative prompt, but with CFG set to 2 it starts to have some effect. It’s still worth writing the important basics. Sometimes it works, sometimes it doesn’t.

Set both KSamplers

First KSampler

  • Steps: 6
  • CFG: 2.0
  • Sampler: dpmpp_2m
  • Scheduler: sgm_uniform
  • Denoise: 1

Second KSampler

  • Steps: 4
  • CFG: 2.0
  • Sampler: dpmpp_2m
  • Scheduler: sgm_uniform
  • Denoise: 0.5
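The denoise of 0.5 on the second pass is what keeps it a refinement rather than a full re-generation: it controls how much of the upscaled latent survives versus how much is re-imagined. Here is a toy illustration of that trade-off (deliberately simplified arithmetic, not ComfyUI’s actual noise-schedule math):

```python
def refine(latent: float, noise: float, denoise: float) -> float:
    """Toy model: denoise=1 replaces the input entirely, denoise=0 keeps it as-is."""
    return (1 - denoise) * latent + denoise * noise

print(refine(latent=10.0, noise=2.0, denoise=0.5))   # halfway blend: 6.0
print(refine(latent=10.0, noise=2.0, denoise=1.0))   # full re-generation: 2.0
```

If the second pass drifts too far from the first image, lower its denoise toward 0.3–0.4; if it barely adds detail, raise it.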

Then hit Run.

LoRA

A LoRA is a small add-on trained for a specific style.

For example, here’s a bundle of six: https://huggingface.co/DeverStyle/Z-Image-loras

We’ll use z_image_arcane_v1.safetensors: https://huggingface.co/DeverStyle/Z-Image-loras/blob/main/z_image_arcane_v1.safetensors

Download the LoRA and put it here

  • Windows: C:\Users\Your_username\Documents\ComfyUI\models\loras
  • Mac: <basePath>/ComfyUI/models/loras

Important: With Z-Image, LoRA doesn’t play nicely with the second KSampler. More specifically, it “works,” but after the first Arcane-style pass, the second KSampler runs again and turns it back into something photo-like.

Select Upscale Latent By and the second KSampler and disable them (Bypass via right-click menu, or Ctrl+B).

Select ConditioningZeroOut and enable it again the same way.

In the first KSampler, keep these settings:

  • Steps: 9
  • CFG: 1.0
  • Sampler: res_multistep
  • Scheduler: simple
  • Denoise: 1

Add a Load LoRA node and place it between Load Checkpoint and the CLIP nodes.

Connect Load Checkpoint to Load LoRA

  • MODEL → model
  • CLIP → clip

Connect Load LoRA to

  • MODEL → model on the first KSampler
  • CLIP → both CLIP Text Encode (Prompt) nodes

Parameters:

  • strength_model: 1
  • strength_clip: 1

These control how strong the LoRA effect is. Lower values = weaker effect.
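Under the hood, a LoRA stores two small low-rank matrices whose product is added to the base model’s weights, scaled by the strength you set: W' = W + strength · (B·A). A pure-Python sketch at toy scale (real LoRAs apply this per layer to tensors with thousands of rows):

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def apply_lora(W, A, B, strength=1.0):
    """W' = W + strength * (B @ A); strength 0 disables the LoRA entirely."""
    delta = matmul(B, A)
    return [[w + strength * d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]   # 2x2 base weight matrix
B = [[1.0], [0.0]]             # 2x1 factor (rank-1 LoRA)
A = [[0.5, 0.5]]               # 1x2 factor

print(apply_lora(W, A, B, strength=1.0))  # [[1.5, 0.5], [0.0, 1.0]]
print(apply_lora(W, A, B, strength=0.0))  # [[1.0, 0.0], [0.0, 1.0]]
```

This is also why the file is so much smaller than a checkpoint: only the two small factors are stored, and why dialing strength between 0 and 1 smoothly fades the style in and out.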

You should end up like this:

ComfyUI LoRA workflow setup with connected nodes

Hit Run and check the result:

ComfyUI Arcane LoRA result showing stylized character

Our generations

Here’s a taste of what you can create with the setups from this guide.

Photos

These photorealistic portraits were all generated locally using the workflows described above. No cloud services, no subscriptions – just ComfyUI and a decent GPU.

Graphics

Z-Image Turbo performs well even without a LoRA.

More generations

Wrapping up

ComfyUI might look intimidating with all those nodes and wires, but once you build your first workflow, it clicks. The setup takes maybe an hour, but then you’re generating without limits, without subscriptions, and without anyone’s servers going down.

For more advanced techniques and community workflows, check out the official ComfyUI documentation and the growing library of templates. And if you’re looking for other AI tools to complement your workflow, explore our guide to prompting AI image generators, browse our collection of 50+ tested AI image prompts, or discover Icons8’s AI illustration generator for when you need quick vector graphics instead of photorealistic renders.