Open Source | Image Generation | by Stanford University (Lvmin Zhang)

ControlNet

ControlNet adds precise spatial control to Stable Diffusion: condition image generation on edges, depth, human pose, segmentation and more, so outputs follow a reference structure. Open-source under Apache 2.0.

Tags: ai-art, controlnet, image-generation, open-source-ai, pose-control, stable-diffusion
Quick facts
License: Apache 2.0 (open source)
Type: Diffusion control (conditioning net)
Base: Stable Diffusion
Conditions: Multiple (edges, depth, pose and more)
Runs on: Consumer GPU, with Stable Diffusion

What is ControlNet?

ControlNet is a neural-network add-on that gives precise spatial control to diffusion image generators such as Stable Diffusion. Introduced by Lvmin Zhang and collaborators (Stanford), it lets you condition generation on a structural reference — Canny edges, a depth map, a human pose skeleton, a segmentation map, scribbles, normals and more — so the output follows that structure while the text prompt controls style and content. It is open source under Apache 2.0.

How it works

ControlNet attaches a trainable copy of the base model's U-Net encoder blocks to the frozen base model, connected through 'zero-convolution' layers: 1×1 convolutions whose weights and biases are initialised to zero. Because these layers output zero at the start of training, the control branch initially contributes nothing and the frozen model's behaviour is unchanged; its influence grows only as training proceeds. Each ControlNet is trained for one condition type (e.g. Canny-to-image), and at generation time it injects the structural guidance into the diffusion process. You supply a control image plus a prompt, and the base model renders a new image that respects the control map's geometry while the prompt still governs style and content. In effect, the prompt says what to draw and the control image says where everything goes.
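
The zero-convolution idea can be illustrated at the level of a single pixel, where a 1×1 convolution reduces to a linear map over channels. The following is a minimal sketch in plain Python, not the actual ControlNet implementation: `frozen_block`, `controlled_block` and the toy `trainable_copy` are hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def zero_conv(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    """A 1x1 convolution at one pixel: a linear map over the channel vector."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def frozen_block(x: Vector) -> Vector:
    """Hypothetical stand-in for a frozen base-model block (never trained)."""
    return [2.0 * xi + 1.0 for xi in x]


def controlled_block(x: Vector, cond: Vector,
                     trainable_copy: Callable[[Vector], Vector],
                     w_in: Matrix, b_in: Vector,
                     w_out: Matrix, b_out: Vector) -> Vector:
    """Frozen output plus the control branch, joined by two zero convolutions."""
    base = frozen_block(x)
    # The condition enters the trainable copy through the first zero convolution.
    injected = [xi + ci for xi, ci in zip(x, zero_conv(w_in, b_in, cond))]
    # The copy's output is added back through the second zero convolution.
    control = zero_conv(w_out, b_out, trainable_copy(injected))
    return [bi + ki for bi, ki in zip(base, control)]
```

With every weight and bias at zero, `controlled_block` returns exactly what `frozen_block` returns, which is the property that lets training start from the unmodified base model. Once the output convolution moves away from zero, the control branch begins to shift the result.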

What it is good at

It addresses one of diffusion's main practical weaknesses: controllability. With ControlNet you can hold a composition, pose or layout fixed while varying style, turn a rough sketch into a finished image, re-pose a character, colourise or restyle while preserving structure, and keep architectural or product lines accurate. Multiple ControlNets can be combined (e.g. pose + depth) for layered control, which is why it became a mainstay of serious Stable Diffusion workflows.
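
To make the idea of combining controls concrete, here is a small sketch of one common approach, assumed here for illustration: each ControlNet produces a residual, each residual is scaled by its own strength, and the results are summed before being added to the base model. The function name `combine_residuals` and the flat-list representation of a residual are hypothetical simplifications.

```python
from typing import List, Sequence


def combine_residuals(residuals: Sequence[Sequence[float]],
                      scales: Sequence[float]) -> List[float]:
    """Sum per-ControlNet residuals, each weighted by its own strength."""
    if not residuals:
        raise ValueError("at least one residual is required")
    if len(residuals) != len(scales):
        raise ValueError("need exactly one scale per ControlNet")
    length = len(residuals[0])
    if any(len(r) != length for r in residuals):
        raise ValueError("all residuals must have the same shape")

    combined = [0.0] * length
    for residual, scale in zip(residuals, scales):
        for i, value in enumerate(residual):
            combined[i] += scale * value
    return combined
```

Setting a scale to 0 switches that control off, while values between 0 and 1 let one condition (say, pose) dominate another (say, depth).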

Licensing & access

ControlNet's code and the original models are open source (Apache 2.0), with weights on Hugging Face and support in the Diffusers library, ComfyUI and Automatic1111 (the latter through an extension). It runs locally on a consumer GPU alongside a Stable Diffusion checkpoint, so there are no per-image fees. A wide ecosystem of community-trained ControlNets exists for additional conditions and for newer base models like SDXL.

Practical considerations

ControlNet runs on top of a base diffusion model, so you need a compatible Stable Diffusion checkpoint and enough VRAM for both. You usually pre-process the control image (run a Canny detector, depth estimator or pose detector) to produce the guidance map, and you match the ControlNet to your base version (SD 1.5 vs SDXL). Results depend on choosing the right condition and tuning its strength.
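
The preprocessing step can be sketched without any dependencies. The example below is a deliberately simplified stand-in for a Canny detector: it thresholds the gradient magnitude only, with no smoothing, non-maximum suppression or hysteresis. The function name `edge_map` and the default threshold are assumptions for illustration; in practice you would normally use an established detector from an image library.

```python
from typing import List

GrayImage = List[List[float]]  # row-major grayscale, values in [0, 1]


def edge_map(gray: GrayImage, threshold: float = 0.25) -> List[List[int]]:
    """Return a binary control map: 255 on edges, 0 elsewhere."""
    height, width = len(gray), len(gray[0])
    edges = [[0] * width for _ in range(height)]
    # Border pixels are left at 0 because they lack a full neighbourhood.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Central differences approximate the horizontal and vertical gradient.
            gx = (gray[y][x + 1] - gray[y][x - 1]) / 2.0
            gy = (gray[y + 1][x] - gray[y - 1][x]) / 2.0
            if (gx * gx + gy * gy) ** 0.5 >= threshold:
                edges[y][x] = 255
    return edges
```

The output follows the usual convention for edge control maps, white lines on a black background, and the threshold plays the same role as a detector's sensitivity setting: lower values keep more detail, higher values keep only strong outlines.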

How it compares

Where plain Stable Diffusion gives you prompt-only control, ControlNet adds structural conditioning that earlier approaches lacked. Pix2Pix learns a fixed image-to-image translation for one paired domain; DreamBooth teaches a model a specific subject. ControlNet is more general and composable — a reusable control layer for many conditions on top of any compatible diffusion model — which is why it is a standard part of controllable-generation pipelines.

Getting started

The easiest route is the Diffusers library: load a Stable Diffusion pipeline with a ControlNet checkpoint, prepare a control image (for example by running Canny edge detection on a reference), and generate with both the prompt and the control map. In a GUI like ComfyUI or Automatic1111 you simply add a ControlNet node or tab, pick the condition and preprocessor, and adjust the control strength.
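
A minimal sketch of the Diffusers route might look like the following. The Hugging Face repo ids are assumptions (swap in whichever compatible base checkpoint and ControlNet you actually use), the helper names `build_call_kwargs` and `generate` are hypothetical, and the code assumes a CUDA GPU with `torch` and `diffusers` installed. The control image is expected to be an already preprocessed map, such as a Canny edge image.

```python
from typing import Any, Dict

# Assumed repo ids; replace with the checkpoints you actually use.
BASE_MODEL = "stable-diffusion-v1-5/stable-diffusion-v1-5"
CONTROLNET_MODEL = "lllyasviel/sd-controlnet-canny"


def build_call_kwargs(prompt: str, control_image: Any,
                      strength: float = 1.0, steps: int = 30) -> Dict[str, Any]:
    """Collect the arguments passed to the pipeline call."""
    if strength < 0:
        raise ValueError("control strength must be non-negative")
    return {
        "prompt": prompt,
        "image": control_image,                     # the preprocessed control map
        "controlnet_conditioning_scale": strength,  # how strongly the map is enforced
        "num_inference_steps": steps,
    }


def generate(prompt: str, control_image: Any, strength: float = 1.0) -> Any:
    """Load the base model plus ControlNet and render one image."""
    # Heavy imports are deferred so this module loads without torch or diffusers.
    import torch
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

    controlnet = ControlNetModel.from_pretrained(
        CONTROLNET_MODEL, torch_dtype=torch.float16
    )
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        BASE_MODEL, controlnet=controlnet, torch_dtype=torch.float16
    )
    pipe.to("cuda")
    return pipe(**build_call_kwargs(prompt, control_image, strength)).images[0]
```

Calling `generate("a watercolour cottage", edge_image, strength=0.8)` would download both checkpoints on first use. Lowering `strength` loosens adherence to the control map, which is the same knob the GUIs expose as control strength or weight.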

Model variants

Canny Edge (most popular): ~1.45B, edges. Edge-guided generation.

Depth (most popular): ~1.45B, depth. Depth-map conditioning.

OpenPose (most popular): ~1.45B, pose. Human-pose control.

Segmentation: ~1.45B, seg. Semantic-map conditioning.

Capabilities

Structural conditioning
Guide generation with edges, depth, pose, segmentation, scribbles, normals and more.

Zero-convolution design
A trainable encoder copy connects to the frozen base without disrupting its weights.

Composable controls
Combine several ControlNets to constrain pose, depth and layout together.

Works with the ecosystem
First-class support in Diffusers, ComfyUI and Automatic1111.

Pros & Cons

Pros
  • Precise structural control over diffusion
  • Many conditions: edges, depth, pose, segmentation
  • Composable — stack multiple controls
  • Open source (Apache 2.0)
  • Runs locally on a consumer GPU
  • Huge community and ecosystem
Cons
  • Requires a compatible base diffusion model
  • Needs extra VRAM for the control branch
  • Control images usually need preprocessing
  • Must match the ControlNet to the base version

Inspiration

ControlNet use cases & project ideas

Sketch to image

Turn line art or scribbles into finished art.

Pose control

Generate characters in an exact pose.

Layout-preserving restyle

Restyle a scene while keeping structure.

Depth-guided render

Render new images from a depth map.

FAQ

Frequently asked questions

ControlNet conditions a diffusion model on a structural reference (edges, depth, pose, segmentation, etc.) so generated images follow that geometry.