(Update: We are adding Depth next. It is not ready yet as of 4/20)
ControlNet, as the name implies, is a popular method of controlling the overall pose and composition of Stable Diffusion images. It can be used for any kind of image, like a fish or a skyscraper, but it’s most frequently used to pose characters. The basic idea is that you provide a photo, the shape or skeleton of that photo is stored, and the image itself is discarded. Using that as a preset, you can then prompt into that shape, like coloring in a skeleton.
The commands look like this. To set a preset, upload a photo and then:
/control /new:mypreset (while uploading an image)
And immediately after, it is ready to be filled into:
/render /pose:mypreset a polar bear ballerina
Let’s adapt an iconic Bruce Lee pose and then make a puppet do it.
First, I paste the Bruce Lee photo. The actual image contents, Bruce Lee himself, will be discarded. A ControlNet preset is only going to store an understanding of the pose information, like his bone positions or shape.
So now, the photo of Bruce Lee is gone. But the pose is in.
Here, I have named the pose “bruceleestance” so I can easily remember it later. If you look at the image below, the black area shows what the computer understood. This is called the mask.
To reveal the mask, this command is used when rendering:
/render /masks /edges:bruceleestance a strange muppet
I can now immediately render anything in the world doing this pose. I can change this to a woman, a polar bear, a ninja, an old man, anything. And they will strike that same pose. What kind of poses do you wish your images could do? Find a good clean starting point image, and create a collection.
With your Graydient PRO account, you can create an unlimited amount of poses on our cloud. In groups like PirateDiffusion, you can use existing poses created by the community, or use your own bot in private folders/projects.
It’s literally that easy, and insanely powerful.
If you make a mistake or don’t like the preset, delete it like this:
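A deletion command in the same style as /new would look like this (the exact flag name is an assumption here; check the /control list in your bot for the current syntax):

/control /delete:mypreset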
Before you upload a pose:
Prep your input image at the recommended best Stable Diffusion sizes to get the best results. It's not recommended to upload a 4K image, for reasons explained in that guide. You can always AI upscale to hit that resolution later.
For best results, the input image should be the same size as your target output resolution; otherwise, text and fine details will be lost. If you upload the image at 800×600, you'll want to render at 800×600, or you'll get a squished 512×512 image.
Introduction to modes
ControlNet is popular because of its ease of use: A pose/shape “preset” can be created from any photo on the internet, as shown above. A preset just means the understood shape of the object, whether it's a landscape or a person.
To draw an image with a saved pose (preset), you can pick from kinds of controls, called modes. At the moment, we have three available:
- Pose — best for people whose joints are clearly defined, but you want to completely discard the original photo’s finer details. Just the pose.
- Edges (canny) — best for objects and obscured poses, where it creates a line drawing of the subject, like a coloring book, and fills that in
- Contours (hed) — an alternative, fine-focused version of Edges. This one and Edges retain the most resemblance to the preset image
Like in the example above, there is a /masks command. Each mode outputs a different kind of mask. When rendering with Edges, a mask looks like a pencil outline. Pose masks look like colorful skeleton joints (see below).
Masks tell you how well the AI understood the image. In many cases you’ll find surprise information in there, like this person detected off to the right:
ControlNet is not just for poses. Look at this re-imagining of a house by defining an interesting camera angle instead of a person or object:
When working with objects and landscapes, use the edges mode.
Combine this technique with inpaint to put characters into scenes without having to worry about getting everything right in one long prompt.
Contours looks like this:
/render /masks /contours:handbag /steps:more dramatic product photography, a fashionable leather handbag with gold buckles on an executive desk, sexy woman in the bokeh background, ((8k, intricate details, high fashion, absurdres)) <realvis20>
By calling Contours combined with the realistic model “realvis20”, I’m able to make thousands of realistic design variations of this handbag every day. Seriously powerful.
Now it’s your turn to try it
It’s easier than it sounds. Don’t feel intimidated, you can do this!
In this tutorial, we will:
- Provide you with images to learn how to make presets
- Name a preset
- Create images with Edges Mode
- Create images with Poses Mode
- Look at a Contours use case
- Send a Pose result to Remix to change the style
Make your first preset
Step 1: Open your private bot.
If you don’t have one yet, register first.
Step 2: Type (forward slash) control to see a list of presets
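That is, send this command on its own:

/control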
A message like the one below will appear. Similar to your personal styles, your personal presets will be saved here as a list.
It starts off empty, like this:
You can ignore all of this text for now, we’ll explain it all below.
Step 3: Download this image to your local device
Step 4: Send that image into your private bot chat
Next, copy and paste the image (or) upload it to your private bot. Check the “use compression” box if asked.
Notes: Don’t forward the image or paste the URL, though: literally get that sucker in there as a binary file. The actual photo needs to be added to the room’s media gallery (not linked externally)
For experts: You can combine steps 4-6 into one by sending the command in the text field of your upload. You don’t have to, but it’s faster.
Step 5: Select the image by replying to it, like it’s people
Just like you do for Remix and Facelift, select the image in your chat by tapping on it, and long press or right click to Reply to the picture, as if you were going to talk to it. See the figure below, where it says Reply under the emoji. This will make a text box appear at the bottom with the image selected.
A thumbnail of the image should now appear near the bottom of your device. Move your mouse or tap that chat area to continue.
Step 6: Send a chat command that gives the preset a name
I’m going to call this preset “crossing”, so I will reply with:
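/control /new:crossing

(This is the same /control /new command from earlier, with our new preset name.)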
Important: Notice that I’m not deciding at this very moment if this picture is going to be used in a pose or an edge.
You can use a preset with any mode of ControlNet later. You don’t have to decide at creation, you’re only saving the picture. If you did it correctly, the bot will then reply with “new preset defined!”:
If you didn’t encounter an error, then skip the troubleshooting section below.
If you get an error here like “unknown prompt” it means that Pirate Diffusion is not an administrator in the group, and has no access to photos. This is what a bad group configuration looks like. Your image will be ignored, in this case:
That group’s owner must edit the group, edit administrators, and add PirateDiffusion as an administrator, and then use the /email command to give that group rendering perks again. Also — images previously sent to the room cannot be read, so delete the image and load it again after the bot is an admin. Make sure:
- The image is selected properly before replying
- The image was uploaded with “compression on”
- PirateDiffusion is an administrator of your group, or it cannot read images at all. Ask the group owner to bump it up:
- Try /settings /concept:none. Remember, ControlNet is its own model; targeting concepts won’t work, and forcing a model will just cause ControlNet to fail. Do that as a second step with /remix instead.
Step 7: Let’s try out your preset in a render
I recommend using /render instead of /brew, for this tutorial:
/render /edges:crossing Freddy Krueger
^ Here I’m saying: use render (don’t add extra words), use the “edges” mode of ControlNet (we don’t have to write /control when making images), and call my preset “crossing”, followed by my prompt “Freddy Krueger”
Pretty cute, even for Freddy.
Debugging with /masks
Want to be sure if ControlNet understood your shape correctly? Add the command /masks after render, and before the mode, to see how it “understood” the input. If you’re getting distorted people from a pose, this will help you understand why. Here’s what the edges of my first preset look like:
/render /masks /edges:crossing Freddy Krueger
This image has clear edges, so we know it’s going to work fine. But did ControlNet understand this image to use with pose?
/render /masks /pose:crossing Freddy Krueger
Results: see the black box in image 2? Unsurprisingly, a diamond-shaped sign with a tiny cartoon isn’t a clear and convincing human pose.
Had it worked, a pose skeleton would appear in image 2. Instead, the resulting images are random and not controlled at all. By debugging with /masks, you can pick out problematic poses and bad presets.
CONTOURS, aka HED mode, is similar to Edges. Compare the HED outputs to the EDGE (canny) output below. In some cases, Contours is a better choice.
Next, let’s use a human photo for a preset instead.
You can technically render the above preset as a pose, but that’s going to be a weird looking square person. Poses work best when the limbs of a person are clearly defined and in clear view.
Here I’m uploading a photo I found of a turnaround drawing. Let’s say I want to turn this image into different characters.
First, I send the image. I can reply to it, or I can save time by defining my preset right in the caption of the image, like this:
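/control /new:turnaround

(Sent as the caption of the upload, using the same /control /new command from earlier.)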
Optional – check your masks and see if they are viable.
/render /masks /pose:turnaround
Tada! Look at those colorful skeletons in image #2. It understood it well.
Upon closer inspection, you can see clean bone setups with no confusion.
Let’s get back to our ControlNet project.
Typing /control now shows that I have defined 2 presets. A list is forming, and a new /show option appears. It allows me to inspect poses I’ve uploaded before, right from the chat.
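For example, to preview the pose we just saved (same /control /show pattern covered later in this guide):

/control /show:turnaround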
Now I can render using my new preset, this time as a pose:
/render /pose:turnaround Lara Croft
With the general image now intact, I can take it into another style, upscale it, and do whatever I want with it. I can use remix to change into a different model that isn’t supported by ControlNet yet, like this:
/remix A realistic photo of Lara Croft <realvis14>
Fine tune with controlguidance
You can control how much the effect is applied using a parameter for guidance, specific to the mode. This is different from /guidance, which pertains to the prompt (content information). ControlGuidance is a value from 0.1 (lowest) to 2 (max).
/render Scary people /pose:dancing /controlguidance:1.3
Remember, your presets are personal
For Telegram users, remember that ControlNet presets are ROOM SPECIFIC so check each room for different poses. We do not reserve the names of poses, so keep in mind that other users in group rooms may use similar names. Use the /control /show:name command to see what’s going on.
Look how easy and quick it is to do after you get the hang of it:
What is the difference between Remix and ControlNet?
Remix is unaware of things like joints and anatomy. Remix is a raw style transfer tool, where you can select one of our many AI models to transfer into. Remix requires models to be formatted in a certain way.
ControlNet, on the other hand, excels at posing human limbs. However, it needs specific models to work. By default, it renders in stock Stable Diffusion. Unfortunately, the ControlNet format is different from Remix. We hope that they become compatible in time, but right now, it’s an extra step.
Thus, a popular workflow is:
>> define a pose >> render the pose >> remix into your ideal concept
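Using only commands from this tutorial, that workflow looks like this (preset name, prompt, and model are just examples):

/control /new:mypose (while uploading, or replying to, a photo)
/render /pose:mypose a polar bear ballerina
/remix A realistic photo of a polar bear ballerina <realvis14>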
Later, when entering group rooms, you will find that people have already added poses to the room.
Feel free to join in the fun, and use their poses, as well as add your own.
Additional ControlNet Modes
There are many modes, and for the average person, the results will not be much different, so we didn’t want to overload you with complexity when we first launched this feature. As we hear more from our community and as this technology evolves, we will add more modes by request. Things are evolving so fast now, that we wouldn’t be surprised if a new technique replaces ControlNet tomorrow, and we’ll add that, too.