ComfyDeploy: How does ComfyUI-MultiGPU work in ComfyUI?

What is ComfyUI-MultiGPU?

This extension adds CUDA device selection to supported loader nodes in ComfyUI. By monkey-patching ComfyUI’s memory management, each model component (such as the UNet, CLIP, or VAE) can be loaded on a specific GPU. Included are example multi-GPU workflows for SDXL, FLUX, LTXVideo, and Hunyuan Video, covering both standard and GGUF loader nodes.

How to install it in ComfyDeploy?

Head over to the machine page

  1. Click on the "Create a new machine" button
  2. Select the "Edit build steps" option
  3. Add a new step -> Custom Node
  4. Search for ComfyUI-MultiGPU and select it
  5. Close the build step dialog, then click the "Save" button to rebuild the machine

ComfyUI-MultiGPU

Experimental nodes for using multiple GPUs, as well as offloading model components to the CPU, in a single ComfyUI workflow.

This extension adds device selection capabilities to model loading nodes in ComfyUI. It monkey-patches ComfyUI's memory management in a hacky way and is neither a comprehensive solution nor well tested on edge-case CUDA/CPU configurations. Use at your own risk.

Note: This does not add parallelism. The workflow steps are still executed sequentially, just with model components loaded on different GPUs or offloaded to the CPU where allowed. Any potential speedup comes from not having to constantly load and unload models from VRAM.
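
For a sense of how the patch works, below is a minimal sketch of the idea, assuming that ComfyUI's comfy.model_management.get_torch_device() is the single place where device placement is decided for the targeted loaders. The variable name current_device is an assumption for illustration, not the extension's exact code:

    # Minimal illustrative sketch of the monkey-patch; the real extension
    # differs in details. `current_device` is a name assumed for this sketch.
    import torch
    import comfy.model_management

    current_device = "cuda:0"  # set by a loader node before it loads a model

    def get_torch_device_patched():
        # Fall back to the CPU when CUDA is unavailable or explicitly requested.
        if not torch.cuda.is_available() or current_device == "cpu":
            return torch.device("cpu")
        return torch.device(current_device)

    # Every device-placement decision ComfyUI makes now goes through the patch.
    comfy.model_management.get_torch_device = get_torch_device_patched

A loader node that sets current_device before delegating to the original loading code then gets its model placed on the chosen device.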

Installation

Installation via ComfyUI-Manager is preferred. Simply search for ComfyUI-MultiGPU in the list of nodes and follow installation instructions.

Manual Installation

Clone this repository inside ComfyUI/custom_nodes/.

Nodes

The extension automatically creates MultiGPU versions of loader nodes. Each MultiGPU node has the same functionality as its original counterpart but adds a device parameter that lets you specify which GPU (or the CPU, where supported) to use; a sketch of how this works follows the node list below.

Currently supported nodes (automatically detected if available):

  • Standard ComfyUI model loaders:
    • CheckpointLoaderSimpleMultiGPU
    • CLIPLoaderMultiGPU
    • ControlNetLoaderMultiGPU
    • DualCLIPLoaderMultiGPU
    • TripleCLIPLoaderMultiGPU
    • UNETLoaderMultiGPU
    • VAELoaderMultiGPU
  • GGUF loaders (requires ComfyUI-GGUF):
    • UnetLoaderGGUFMultiGPU (supports quantized models like flux1-dev-gguf)
    • UnetLoaderGGUFAdvancedMultiGPU
    • CLIPLoaderGGUFMultiGPU
    • DualCLIPLoaderGGUFMultiGPU
    • TripleCLIPLoaderGGUFMultiGPU
  • XLabAI FLUX ControlNet (requires x-flux-comfy):
    • LoadFluxControlNetMultiGPU
  • Florence2 (requires ComfyUI-Florence2):
    • Florence2ModelLoaderMultiGPU
    • DownloadAndLoadFlorence2ModelMultiGPU
  • LTX Video Custom Checkpoint Loader (requires ComfyUI-LTXVideo):
    • LTXVLoaderMultiGPU
  • NF4 Checkpoint Format Loader (requires ComfyUI_bitsandbytes_NF4):
    • CheckpointLoaderNF4MultiGPU

All MultiGPU nodes available for your install can be found in the "multigpu" category in the node menu.
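
As a rough sketch of how these variants can be generated, the helper below wraps an existing loader class and appends a device dropdown to its INPUT_TYPES. make_multigpu_class and its internals are illustrative assumptions building on the current_device patch sketched earlier, not the extension's actual code:

    # Illustrative sketch only: derive a MultiGPU node class from an existing
    # loader by adding a `device` input. Not the extension's real implementation.
    import torch

    def make_multigpu_class(loader_class):
        devices = ["cpu"] + [f"cuda:{i}" for i in range(torch.cuda.device_count())]

        class NodeMultiGPU(loader_class):
            CATEGORY = "multigpu"        # where the node appears in the menu
            FUNCTION = "load_on_device"  # entry point ComfyUI will call

            @classmethod
            def INPUT_TYPES(cls):
                types = loader_class.INPUT_TYPES()
                types["required"]["device"] = (devices,)  # extra dropdown
                return types

            def load_on_device(self, *args, device="cuda:0", **kwargs):
                global current_device
                current_device = device  # consumed by the patched get_torch_device()
                original = getattr(loader_class, loader_class.FUNCTION)
                return original(self, *args, **kwargs)

        return NodeMultiGPU

Registering the returned class in NODE_CLASS_MAPPINGS under a name like CheckpointLoaderSimpleMultiGPU is what would make it show up alongside the original node.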

Example workflows

All workflows have been tested on a 2x 3090 setup.

Split FLUX.1-dev across two GPUs

  • examples/flux1dev_2gpu.json This workflow loads a FLUX.1-dev model and splits its components across two GPUs. The UNet model is loaded on GPU 1 while the text encoders and VAE are loaded on GPU 0.

Split FLUX.1-dev between the CPU and a single GPU

  • examples/flux1dev_cpu_1gpu_GGUF.json This workflow demonstrates splitting a quantized GGUF FLUX.1-dev model between the CPU and a single GPU. The UNet model is loaded on the GPU, while the VAE and text encoders are handled by the CPU. Requires ComfyUI-GGUF.

Using GGUF quantized models across a CPU and a single GPU for video generation

  • examples/hunyuan_cpu_1gpu_GGUF.json This workflow demonstrates using quantized GGUF models for Hunyuan Video split across the CPU and one GPU. In this instance, the quantized video model's UNet and VAE are on GPU 0, while the text encoders (one standard, one GGUF) are on the CPU. Requires ComfyUI-GGUF.

Using GGUF quantized models across GPUs for video generation

  • examples/hunyuan_2gpu_GGUF.json This workflow demonstrates using quantized GGUF models for Hunyuan Video split across multiple GPUs. In this instance, the quantized video model's UNet is on GPU 0, whereas the VAE and text encoders are on GPU 1. Requires ComfyUI-GGUF.

Loading two SDXL checkpoints on different GPUs

  • examples/sdxl_2gpu.json This workflow loads two SDXL checkpoints on two different GPUs. The first checkpoint is loaded on GPU 0, and the second checkpoint is loaded on GPU 1.

FLUX.1-dev and SDXL in the same workflow

  • examples/flux1dev_sdxl_2gpu.json This workflow loads a FLUX.1-dev model and an SDXL model in the same workflow. The FLUX.1-dev model has its UNet on GPU 1 with VAE and text encoders on GPU 0, while the SDXL model uses separate allocations on GPU 0.

Image to Prompt to Image to Video Generation Pipeline

  1. Loading the Florence2 model on the CPU, providing a starting image for analysis, and generating a text response
  2. Loading the FLUX.1-dev UNet on GPU 1, with CLIP and VAE on the CPU, and generating an image using the Florence2 text as a prompt
  3. Loading the LTX Video UNet and VAE on GPU 2, and LTX-encoded CLIP on the CPU, and providing the resulting FLUX.1 image as the starting image for an LTX Video image-to-video generation
  4. Generating a 5-second video based on the provided image

All models are distributed across the available CPU and GPUs with no model reloading on dual 3090s. Requires ComfyUI-GGUF and ComfyUI-LTXVideo.

LLM-Guided Video Generation

  1. Using a local LLM (loaded on the first GPU via llama.cpp) to take a text suggestion and craft an LTX Video prompt
  2. Feeding the enhanced prompt to LTXVideo (loaded on the second GPU) for video generation

Requires an appropriate LLM. Requires ComfyUI-GGUF.

Support

If you encounter problems, please open an issue. Attach the workflow if possible.

Credits

Originally created by Alexander Dzhoganov. Implementation improved by City96. Currently maintained by pollockjj.