

Open Source | Image Generation | by Stanford University (Lvmin Zhang)

ControlNet

ControlNet adds precise spatial control to Stable Diffusion: condition image generation on edges, depth, human pose, segmentation and more, so outputs follow a reference structure. Open-source under Apache 2.0.

Tags: ai-art, controlnet, image-generation, open-source-ai, pose-control, stable-diffusion


Quick facts

License: Apache 2.0 (open source)
Type: Diffusion control (conditioning net)
Base: Stable Diffusion
Conditions: Multiple (edges, depth, pose and more)
Runs on: Consumer GPU, with Stable Diffusion

What is ControlNet?

ControlNet is a neural-network add-on that gives precise spatial control to diffusion image generators such as Stable Diffusion. Introduced by Lvmin Zhang and collaborators (Stanford), it lets you condition generation on a structural reference — Canny edges, a depth map, a human pose skeleton, a segmentation map, scribbles, normals and more — so the output follows that structure while the text prompt controls style and content. It is open source under Apache 2.0.

How it works

ControlNet attaches a trainable copy of the diffusion model's encoder to a frozen base model. The two are connected through 'zero-convolution' layers, whose weights are initialised to zero so that the added branch contributes nothing at the start of training and the base model's behaviour is left intact. Each ControlNet is trained for one condition type (e.g. Canny-to-image), and at generation time it injects the structural guidance into the diffusion process. You supply a control image plus a prompt, and the base model renders a new image that respects the control map's geometry while the prompt still governs style and content. In effect, the prompt says what to draw and the control image says where everything goes.
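The zero-initialisation idea can be illustrated with a toy, pure-Python sketch. This is not ControlNet's actual implementation: `frozen_block`, `trainable_copy` and `ZeroConv` are hypothetical stand-ins, each a simple per-element affine map rather than a real convolution. It only shows that while the zero-convolution weights are still zero the combined output equals the frozen model's output, and that the control signal starts to matter once those weights move away from zero.

```python
from dataclasses import dataclass


@dataclass
class ZeroConv:
    """Toy stand-in for a 1x1 'zero convolution': y = weight * x + bias, both starting at 0."""
    weight: float = 0.0
    bias: float = 0.0

    def __call__(self, xs):
        return [self.weight * x + self.bias for x in xs]


def frozen_block(xs):
    """Hypothetical frozen base-model block (its weights never change)."""
    return [2.0 * x + 1.0 for x in xs]


def trainable_copy(xs):
    """Hypothetical trainable copy, initialised with the same weights as the frozen block."""
    return [2.0 * x + 1.0 for x in xs]


def controlled_block(xs, condition, zero_in, zero_out):
    """Compute F(x) + Z_out(F_copy(x + Z_in(c))): frozen path plus gated control path."""
    injected = [x + c for x, c in zip(xs, zero_in(condition))]
    residual = zero_out(trainable_copy(injected))
    return [base + r for base, r in zip(frozen_block(xs), residual)]


features = [0.5, -1.0, 3.0]
control = [1.0, 0.0, 1.0]  # e.g. one row of an edge map

# At initialisation both zero convolutions output zeros, so nothing changes.
at_init = controlled_block(features, control, ZeroConv(), ZeroConv())
assert at_init == frozen_block(features)

# Once training has moved the weights away from zero, the control signal takes effect.
after_training = controlled_block(
    features, control, ZeroConv(weight=0.5), ZeroConv(weight=0.1)
)
assert after_training != frozen_block(features)
```

The design point is that training starts from exactly the base model's behaviour, so early gradient steps cannot inject random noise into an already working generator.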

What it is good at

It addresses one of diffusion's biggest practical weaknesses: controllability. With ControlNet you can hold a composition, pose or layout fixed while varying style, turn a rough sketch into a finished image, re-pose a character, colourise or restyle while preserving structure, and keep architectural or product lines accurate. Multiple ControlNets can be combined (e.g. pose + depth) for layered control, which is why it became a mainstay of serious Stable Diffusion workflows.
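Combining controls can be pictured as a weighted sum. The sketch below assumes that each ControlNet produces a residual of the same shape and that the residuals are scaled by a per-control strength and added together; to our understanding this matches how common toolkits stack ControlNets, but treat it as an illustration only. The function name and the residual values are hypothetical.

```python
def combine_control_residuals(residuals, strengths):
    """Scale each ControlNet's residual by its strength and sum element-wise."""
    if not residuals:
        raise ValueError("need at least one residual")
    if len(residuals) != len(strengths):
        raise ValueError("need one strength per ControlNet residual")
    length = len(residuals[0])
    if any(len(r) != length for r in residuals):
        raise ValueError("all residuals must have the same shape")

    combined = [0.0] * length
    for residual, strength in zip(residuals, strengths):
        for i, value in enumerate(residual):
            combined[i] += strength * value
    return combined


# Hypothetical residuals from a pose ControlNet and a depth ControlNet.
pose_residual = [1.0, 0.0, -2.0]
depth_residual = [0.0, 4.0, 2.0]

# Pose at full strength, depth at half strength.
combined = combine_control_residuals(
    [pose_residual, depth_residual], strengths=[1.0, 0.5]
)
print(combined)  # [1.0, 2.0, -1.0]
```

Lowering one strength towards zero fades that control out without touching the other, which is the practical reason for tuning strengths separately.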

Licensing & access

ControlNet's code and the original models are open source (Apache 2.0), with weights on Hugging Face and support in the Diffusers library, ComfyUI and Automatic1111 (through the sd-webui-controlnet extension). It runs locally on a consumer GPU alongside a Stable Diffusion checkpoint, so there are no per-image fees. A wide ecosystem of community-trained ControlNets exists for additional conditions and for newer base models like SDXL.

Practical considerations

ControlNet runs on top of a base diffusion model, so you need a compatible Stable Diffusion checkpoint and enough VRAM for both. You usually pre-process the control image (run a Canny detector, depth estimator or pose detector) to produce the guidance map, and you match the ControlNet to your base version (SD 1.5 vs SDXL). Results depend on choosing the right condition and tuning its strength.
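To make the preprocessing step concrete, here is a deliberately simplified edge-map sketch in pure Python. It is not a Canny detector (there is no smoothing, non-maximum suppression or hysteresis) and `simple_edge_map` is a hypothetical helper; in practice you would use a real preprocessor. It only shows the shape of the task: a picture goes in, a black-and-white guidance map comes out.

```python
def simple_edge_map(gray, threshold=50):
    """Return a binary edge map (0 or 255) from a 2D list of grayscale values.

    Uses central differences for the horizontal and vertical gradients and
    thresholds the sum of their absolute values. Border pixels are left at 0.
    """
    height, width = len(gray), len(gray[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gray[y][x + 1] - gray[y][x - 1]
            gy = gray[y + 1][x] - gray[y - 1][x]
            if abs(gx) + abs(gy) >= threshold:
                edges[y][x] = 255
    return edges


# A 5x5 test image: dark on the left, bright on the right.
image = [[0, 0, 0, 200, 200] for _ in range(5)]
control_map = simple_edge_map(image)
for row in control_map:
    print(row)
```

The interior rows mark the two columns that straddle the dark-to-bright boundary, and that vertical line is the structure a Canny-trained ControlNet would be asked to follow.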

How it compares

Where plain Stable Diffusion gives you prompt-only control, ControlNet adds structural conditioning that earlier approaches lacked. Pix2Pix learns a fixed image-to-image translation for one paired domain; DreamBooth teaches a model a specific subject. ControlNet is more general and composable — a reusable control layer for many conditions on top of any compatible diffusion model — which is why it is a standard part of controllable-generation pipelines.

Getting started

The easiest route is the Diffusers library: load a Stable Diffusion pipeline with a ControlNet checkpoint, prepare a control image (for example by running Canny edge detection on a reference), and generate with both the prompt and the control map. In a GUI like ComfyUI or Automatic1111 you simply add a ControlNet node or tab, pick the condition and preprocessor, and adjust the control strength.
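The Diffusers route described above might look roughly like the following sketch. The two model ids are assumptions chosen for illustration (an SD 1.5 checkpoint and a Canny ControlNet trained for it), the Canny thresholds of 100 and 200 are arbitrary starting values, and the code assumes `diffusers`, `torch`, `opencv-python`, `numpy` and `Pillow` are installed and an NVIDIA GPU is available. `generate_from_canny` is a hypothetical helper name.

```python
# Assumed model ids; swap in whichever SD 1.5 checkpoint and matching ControlNet you use.
BASE_MODEL_ID = "stable-diffusion-v1-5/stable-diffusion-v1-5"
CONTROLNET_ID = "lllyasviel/sd-controlnet-canny"


def generate_from_canny(reference_path, prompt, strength=1.0, steps=30):
    """Generate an image that follows the Canny edges of a reference picture."""
    # Heavy dependencies are imported lazily so this file loads without them.
    import cv2
    import numpy as np
    import torch
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
    from PIL import Image

    # 1. Preprocess: turn the reference into a 3-channel Canny edge map.
    reference = cv2.imread(reference_path)
    if reference is None:
        raise FileNotFoundError(reference_path)
    gray = cv2.cvtColor(reference, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)
    control_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

    # 2. Load the ControlNet and attach it to the base pipeline.
    controlnet = ControlNetModel.from_pretrained(
        CONTROLNET_ID, torch_dtype=torch.float16
    )
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        BASE_MODEL_ID, controlnet=controlnet, torch_dtype=torch.float16
    )
    pipe = pipe.to("cuda")  # assumes an NVIDIA GPU is available

    # 3. Generate: the prompt sets content and style, the edge map sets layout.
    result = pipe(
        prompt,
        image=control_image,
        num_inference_steps=steps,
        controlnet_conditioning_scale=strength,
    )
    return result.images[0]
```

A call such as `generate_from_canny("reference.png", "a watercolour cottage").save("out.png")` would then write the result to disk; lowering `strength` below 1.0 loosens how closely the output follows the edges.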

Model variants

Variant | Size | Condition | Use
Canny Edge | ~1.45B | Edges | Edge-guided generation
Depth | ~1.45B | Depth | Depth-map conditioning
OpenPose | ~1.45B | Pose | Human-pose control
Segmentation | ~1.45B | Seg | Semantic-map conditioning

Capabilities

Structural conditioning: Guide generation with edges, depth, pose, segmentation, scribbles, normals and more.

Zero-convolution design: A trainable encoder copy connects to the frozen base without disrupting its weights.

Composable controls: Combine several ControlNets to constrain pose, depth and layout together.

Works with the ecosystem: First-class support in Diffusers, ComfyUI and Automatic1111.

Pros & Cons

Pros

Precise structural control over diffusion
Many conditions: edges, depth, pose, segmentation
Composable — stack multiple controls
Open source (Apache 2.0)
Runs locally on a consumer GPU
Huge community and ecosystem

Cons

Requires a compatible base diffusion model
Needs extra VRAM for the control branch
Control images usually need preprocessing
Must match the ControlNet to the base version


ControlNet use cases & project ideas

Sketch to image

Turn line art or scribbles into finished art.

Pose control

Generate characters in an exact pose.

Layout-preserving restyle

Restyle a scene while keeping structure.

Depth-guided render

Render new images from a depth map.


Frequently asked questions

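What does ControlNet do?

It conditions a diffusion model on a structural reference (edges, depth, pose, segmentation, etc.) so generated images follow that geometry.

Do I need Stable Diffusion to use it?

Yes. ControlNet runs on top of a base diffusion model, so you need a compatible Stable Diffusion checkpoint and a ControlNet that matches its version (SD 1.5 vs SDXL).

Is ControlNet free?

Yes. It is open source under Apache 2.0 with weights on Hugging Face, and runs locally, so there are no per-image fees.

Can I combine multiple ControlNets?

Yes. Several ControlNets can be combined (e.g. pose + depth) for layered control.

Do I have to preprocess the control image?

Usually. You run a Canny detector, depth estimator or pose detector on the reference to produce the guidance map.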
