From 49b959b5408b97274e2ee423059d9239445aea26 Mon Sep 17 00:00:00 2001 From: Steven Liu <59462357+stevhliu@users.noreply.github.com> Date: Fri, 3 May 2024 16:08:27 -0700 Subject: [PATCH] [docs] LCM (#7829) * lcm * lcm lora * fix * fix hfoption * edits --- docs/source/en/_toctree.yml | 6 +- .../en/using-diffusers/inference_with_lcm.md | 465 ++++++++++++++++-- .../inference_with_lcm_lora.md | 422 ---------------- 3 files changed, 413 insertions(+), 480 deletions(-) delete mode 100644 docs/source/en/using-diffusers/inference_with_lcm_lora.md diff --git a/docs/source/en/_toctree.yml b/docs/source/en/_toctree.yml index f2755798b7..89af55ed2a 100644 --- a/docs/source/en/_toctree.yml +++ b/docs/source/en/_toctree.yml @@ -81,16 +81,14 @@ title: ControlNet - local: using-diffusers/t2i_adapter title: T2I-Adapter + - local: using-diffusers/inference_with_lcm + title: Latent Consistency Model - local: using-diffusers/textual_inversion_inference title: Textual inversion - local: using-diffusers/shap-e title: Shap-E - local: using-diffusers/diffedit title: DiffEdit - - local: using-diffusers/inference_with_lcm_lora - title: Latent Consistency Model-LoRA - - local: using-diffusers/inference_with_lcm - title: Latent Consistency Model - local: using-diffusers/inference_with_tcd_lora title: Trajectory Consistency Distillation-LoRA - local: using-diffusers/svd diff --git a/docs/source/en/using-diffusers/inference_with_lcm.md b/docs/source/en/using-diffusers/inference_with_lcm.md index 798de67c65..19fb349c54 100644 --- a/docs/source/en/using-diffusers/inference_with_lcm.md +++ b/docs/source/en/using-diffusers/inference_with_lcm.md @@ -10,29 +10,30 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o specific language governing permissions and limitations under the License. --> -[[open-in-colab]] - # Latent Consistency Model -Latent Consistency Models (LCM) enable quality image generation in typically 2-4 steps making it possible to use diffusion models in almost real-time settings. +[[open-in-colab]] -From the [official website](https://latent-consistency-models.github.io/): +[Latent Consistency Models (LCMs)](https://hf.co/papers/2310.04378) enable fast high-quality image generation by directly predicting the reverse diffusion process in the latent rather than pixel space. In other words, LCMs try to predict the noiseless image from the noisy image in contrast to typical diffusion models that iteratively remove noise from the noisy image. By avoiding the iterative sampling process, LCMs are able to generate high-quality images in 2-4 steps instead of 20-30 steps. -> LCMs can be distilled from any pre-trained Stable Diffusion (SD) in only 4,000 training steps (~32 A100 GPU Hours) for generating high quality 768 x 768 resolution images in 2~4 steps or even one step, significantly accelerating text-to-image generation. We employ LCM to distill the Dreamshaper-V7 version of SD in just 4,000 training iterations. +LCMs are distilled from pretrained models which requires ~32 hours of A100 compute. To speed this up, [LCM-LoRAs](https://hf.co/papers/2311.05556) train a [LoRA adapter](https://huggingface.co/docs/peft/conceptual_guides/adapter#low-rank-adaptation-lora) which have much fewer parameters to train compared to the full model. The LCM-LoRA can be plugged into a diffusion model once it has been trained. -For a more technical overview of LCMs, refer to [the paper](https://huggingface.co/papers/2310.04378). 
+This guide will show you how to use LCMs and LCM-LoRAs for fast inference on tasks and how to use them with other adapters like ControlNet or T2I-Adapter. -LCM distilled models are available for [stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5), [stable-diffusion-xl-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0), and the [SSD-1B](https://huggingface.co/segmind/SSD-1B) model. All the checkpoints can be found in this [collection](https://huggingface.co/collections/latent-consistency/latent-consistency-models-weights-654ce61a95edd6dffccef6a8). - -This guide shows how to perform inference with LCMs for -- text-to-image -- image-to-image -- combined with style LoRAs -- ControlNet/T2I-Adapter +> [!TIP] +> LCMs and LCM-LoRAs are available for Stable Diffusion v1.5, Stable Diffusion XL, and the SSD-1B model. You can find their checkpoints on the [Latent Consistency](https://hf.co/collections/latent-consistency/latent-consistency-models-weights-654ce61a95edd6dffccef6a8) Collections. ## Text-to-image -You'll use the [`StableDiffusionXLPipeline`] pipeline with the [`LCMScheduler`] and then load the LCM-LoRA. Together with the LCM-LoRA and the scheduler, the pipeline enables a fast inference workflow, overcoming the slow iterative nature of diffusion models. + + + +To use LCMs, you need to load the LCM checkpoint for your supported model into [`UNet2DConditionModel`] and replace the scheduler with the [`LCMScheduler`]. Then you can use the pipeline as usual, and pass a text prompt to generate an image in just 4 steps. + +A couple of notes to keep in mind when using LCMs are: + +* Typically, batch size is doubled inside the pipeline for classifier-free guidance. But LCM applies guidance with guidance embeddings and doesn't need to double the batch size, which leads to faster inference. The downside is that negative prompts don't work with LCM because they don't have any effect on the denoising process. +* The ideal range for `guidance_scale` is [3., 13.] because that is what the UNet was trained with. However, disabling `guidance_scale` with a value of 1.0 is also effective in most cases. ```python from diffusers import StableDiffusionXLPipeline, UNet2DConditionModel, LCMScheduler @@ -49,31 +50,69 @@ pipe = StableDiffusionXLPipeline.from_pretrained( pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config) prompt = "Self-portrait oil painting, a beautiful cyborg with golden hair, 8k" - generator = torch.manual_seed(0) image = pipe( prompt=prompt, num_inference_steps=4, generator=generator, guidance_scale=8.0 ).images[0] +image ``` -![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/lcm/lcm_full_sdxl_t2i.png) +
-Notice that we use only 4 steps for generation which is way less than what's typically used for standard SDXL. +
+ -Some details to keep in mind: +To use LCM-LoRAs, you need to replace the scheduler with the [`LCMScheduler`] and load the LCM-LoRA weights with the [`~loaders.LoraLoaderMixin.load_lora_weights`] method. Then you can use the pipeline as usual, and pass a text prompt to generate an image in just 4 steps. -* To perform classifier-free guidance, batch size is usually doubled inside the pipeline. LCM, however, applies guidance using guidance embeddings, so the batch size does not have to be doubled in this case. This leads to a faster inference time, with the drawback that negative prompts don't have any effect on the denoising process. -* The UNet was trained using the [3., 13.] guidance scale range. So, that is the ideal range for `guidance_scale`. However, disabling `guidance_scale` using a value of 1.0 is also effective in most cases. +A couple of notes to keep in mind when using LCM-LoRAs are: +* Typically, batch size is doubled inside the pipeline for classifier-free guidance. But LCM applies guidance with guidance embeddings and doesn't need to double the batch size, which leads to faster inference. The downside is that negative prompts don't work with LCM because they don't have any effect on the denoising process. +* You could use guidance with LCM-LoRAs, but it is very sensitive to high `guidance_scale` values and can lead to artifacts in the generated image. The best values we've found are between [1.0, 2.0]. +* Replace [stabilityai/stable-diffusion-xl-base-1.0](https://hf.co/stabilityai/stable-diffusion-xl-base-1.0) with any finetuned model. For example, try using the [animagine-xl](https://huggingface.co/Linaqruf/animagine-xl) checkpoint to generate anime images with SDXL. + +```py +import torch +from diffusers import DiffusionPipeline, LCMScheduler + +pipe = DiffusionPipeline.from_pretrained( + "stabilityai/stable-diffusion-xl-base-1.0", + variant="fp16", + torch_dtype=torch.float16 +).to("cuda") +pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config) +pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl") + +prompt = "Self-portrait oil painting, a beautiful cyborg with golden hair, 8k" +generator = torch.manual_seed(42) +image = pipe( + prompt=prompt, num_inference_steps=4, generator=generator, guidance_scale=1.0 +).images[0] +image +``` + +
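As noted above, the LCM-LoRA isn't tied to the base checkpoint, so you can swap [stabilityai/stable-diffusion-xl-base-1.0](https://hf.co/stabilityai/stable-diffusion-xl-base-1.0) for a finetuned model without any other changes. For example, here is a minimal sketch with the [animagine-xl](https://huggingface.co/Linaqruf/animagine-xl) SDXL finetune (make sure [peft](https://github.com/huggingface/peft) is installed for LoRA support):

```py
import torch
from diffusers import DiffusionPipeline, LCMScheduler

# load a finetuned SDXL checkpoint instead of the base model
pipe = DiffusionPipeline.from_pretrained(
    "Linaqruf/animagine-xl",
    variant="fp16",
    torch_dtype=torch.float16
).to("cuda")

# same recipe as before: swap in the LCM scheduler and load the LCM-LoRA
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")

prompt = "face focus, cute, masterpiece, best quality, 1girl, green hair, sweater, looking at viewer, upper body, beanie, outdoors, night, turtleneck"
generator = torch.manual_seed(0)
image = pipe(
    prompt=prompt, num_inference_steps=4, generator=generator, guidance_scale=1.0
).images[0]
image
```

The other supported base models, such as [SSD-1B](https://huggingface.co/segmind/SSD-1B), can be swapped in the same way with their corresponding LCM or LCM-LoRA checkpoints from the collection linked above.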
## Image-to-image -LCMs can be applied to image-to-image tasks too. For this example, we'll use the [LCM_Dreamshaper_v7](https://huggingface.co/SimianLuo/LCM_Dreamshaper_v7) model, but the same steps can be applied to other LCM models as well. + + + +To use LCMs for image-to-image, you need to load the LCM checkpoint for your supported model into [`UNet2DConditionModel`] and replace the scheduler with the [`LCMScheduler`]. Then you can use the pipeline as usual, and pass a text prompt and initial image to generate an image in just 4 steps. + +> [!TIP] +> Experiment with different values for `num_inference_steps`, `strength`, and `guidance_scale` to get the best results. ```python import torch from diffusers import AutoPipelineForImage2Image, UNet2DConditionModel, LCMScheduler -from diffusers.utils import make_image_grid, load_image +from diffusers.utils import load_image unet = UNet2DConditionModel.from_pretrained( "SimianLuo/LCM_Dreamshaper_v7", @@ -89,12 +128,8 @@ pipe = AutoPipelineForImage2Image.from_pretrained( ).to("cuda") pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config) -# prepare image -url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-init.png" -init_image = load_image(url) +init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-init.png") prompt = "Astronauts in a jungle, cold color palette, muted colors, detailed, 8k" - -# pass prompt and image to pipeline generator = torch.manual_seed(0) image = pipe( prompt, @@ -104,22 +139,130 @@ image = pipe( strength=0.5, generator=generator ).images[0] -make_image_grid([init_image, image], rows=1, cols=2) +image ``` -![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/lcm/lcm_full_sdv1-5_i2i.png) +
<!-- image comparison: initial image | generated image -->
+ - +To use LCM-LoRAs for image-to-image, you need to replace the scheduler with the [`LCMScheduler`] and load the LCM-LoRA weights with the [`~loaders.LoraLoaderMixin.load_lora_weights`] method. Then you can use the pipeline as usual, and pass a text prompt and initial image to generate an image in just 4 steps. -You can get different results based on your prompt and the image you provide. To get the best results, we recommend trying different values for `num_inference_steps`, `strength`, and `guidance_scale` parameters and choose the best one. +> [!TIP] +> Experiment with different values for `num_inference_steps`, `strength`, and `guidance_scale` to get the best results. - +```py +import torch +from diffusers import AutoPipelineForImage2Image, LCMScheduler +from diffusers.utils import make_image_grid, load_image +pipe = AutoPipelineForImage2Image.from_pretrained( + "Lykon/dreamshaper-7", + torch_dtype=torch.float16, + variant="fp16", +).to("cuda") -## Combine with style LoRAs +pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config) -LCMs can be used with other styled LoRAs to generate styled-images in very few steps (4-8). In the following example, we'll use the [papercut LoRA](TheLastBen/Papercut_SDXL). +pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5") + +init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-init.png") +prompt = "Astronauts in a jungle, cold color palette, muted colors, detailed, 8k" + +generator = torch.manual_seed(0) +image = pipe( + prompt, + image=init_image, + num_inference_steps=4, + guidance_scale=1, + strength=0.6, + generator=generator +).images[0] +image +``` + +
<!-- image comparison: initial image | generated image -->
+ +## Inpainting + +To use LCM-LoRAs for inpainting, you need to replace the scheduler with the [`LCMScheduler`] and load the LCM-LoRA weights with the [`~loaders.LoraLoaderMixin.load_lora_weights`] method. Then you can use the pipeline as usual, and pass a text prompt, initial image, and mask image to generate an image in just 4 steps. + +```py +import torch +from diffusers import AutoPipelineForInpainting, LCMScheduler +from diffusers.utils import load_image, make_image_grid + +pipe = AutoPipelineForInpainting.from_pretrained( + "runwayml/stable-diffusion-inpainting", + torch_dtype=torch.float16, + variant="fp16", +).to("cuda") + +pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config) + +pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5") + +init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png") +mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png") + +prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k" +generator = torch.manual_seed(0) +image = pipe( + prompt=prompt, + image=init_image, + mask_image=mask_image, + generator=generator, + num_inference_steps=4, + guidance_scale=4, +).images[0] +image +``` + +
<!-- image comparison: initial image | generated image -->
+ +## Adapters + +LCMs are compatible with adapters like LoRA, ControlNet, T2I-Adapter, and AnimateDiff. You can bring the speed of LCMs to these adapters to generate images in a certain style or condition the model on another input like a canny image. + +### LoRA + +[LoRA](../using-diffusers/loading_adapters#lora) adapters can be rapidly finetuned to learn a new style from just a few images and plugged into a pretrained model to generate images in that style. + + + + +Load the LCM checkpoint for your supported model into [`UNet2DConditionModel`] and replace the scheduler with the [`LCMScheduler`]. Then you can use the [`~loaders.LoraLoaderMixin.load_lora_weights`] method to load the LoRA weights into the LCM and generate a styled image in a few steps. ```python from diffusers import StableDiffusionXLPipeline, UNet2DConditionModel, LCMScheduler @@ -134,11 +277,9 @@ pipe = StableDiffusionXLPipeline.from_pretrained( "stabilityai/stable-diffusion-xl-base-1.0", unet=unet, torch_dtype=torch.float16, variant="fp16", ).to("cuda") pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config) - pipe.load_lora_weights("TheLastBen/Papercut_SDXL", weight_name="papercut.safetensors", adapter_name="papercut") prompt = "papercut, a cute fox" - generator = torch.manual_seed(0) image = pipe( prompt=prompt, num_inference_steps=4, generator=generator, guidance_scale=8.0 @@ -146,15 +287,58 @@ image = pipe( image ``` -![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/lcm/lcm_full_sdx_lora_mix.png) +
+ -## ControlNet/T2I-Adapter +Replace the scheduler with the [`LCMScheduler`]. Then you can use the [`~loaders.LoraLoaderMixin.load_lora_weights`] method to load the LCM-LoRA weights and the style LoRA you want to use. Combine both LoRA adapters with the [`~loaders.UNet2DConditionLoadersMixin.set_adapters`] method and generate a styled image in a few steps. -Let's look at how we can perform inference with ControlNet/T2I-Adapter and a LCM. +```py +import torch +from diffusers import DiffusionPipeline, LCMScheduler + +pipe = DiffusionPipeline.from_pretrained( + "stabilityai/stable-diffusion-xl-base-1.0", + variant="fp16", + torch_dtype=torch.float16 +).to("cuda") + +pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config) + +pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl", adapter_name="lcm") +pipe.load_lora_weights("TheLastBen/Papercut_SDXL", weight_name="papercut.safetensors", adapter_name="papercut") + +pipe.set_adapters(["lcm", "papercut"], adapter_weights=[1.0, 0.8]) + +prompt = "papercut, a cute fox" +generator = torch.manual_seed(0) +image = pipe(prompt, num_inference_steps=4, guidance_scale=1, generator=generator).images[0] +image +``` + +
### ControlNet -For this example, we'll use the [LCM_Dreamshaper_v7](https://huggingface.co/SimianLuo/LCM_Dreamshaper_v7) model with canny ControlNet, but the same steps can be applied to other LCM models as well. + +[ControlNet](./controlnet) are adapters that can be trained on a variety of inputs like canny edge, pose estimation, or depth. The ControlNet can be inserted into the pipeline to provide additional conditioning and control to the model for more accurate generation. + +You can find additional ControlNet models trained on other inputs in [lllyasviel's](https://hf.co/lllyasviel) repository. + + + + +Load a ControlNet model trained on canny images and pass it to the [`ControlNetModel`]. Then you can load a LCM model into [`StableDiffusionControlNetPipeline`] and replace the scheduler with the [`LCMScheduler`]. Now pass the canny image to the pipeline and generate an image. + +> [!TIP] +> Experiment with different values for `num_inference_steps`, `controlnet_conditioning_scale`, `cross_attention_kwargs`, and `guidance_scale` to get the best results. ```python import torch @@ -186,8 +370,6 @@ pipe = StableDiffusionControlNetPipeline.from_pretrained( torch_dtype=torch.float16, safety_checker=None, ).to("cuda") - -# set scheduler pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config) generator = torch.manual_seed(0) @@ -200,16 +382,84 @@ image = pipe( make_image_grid([canny_image, image], rows=1, cols=2) ``` -![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/lcm/lcm_full_sdv1-5_controlnet.png) +
+ - -The inference parameters in this example might not work for all examples, so we recommend trying different values for the `num_inference_steps`, `guidance_scale`, `controlnet_conditioning_scale`, and `cross_attention_kwargs` parameters and choosing the best one. - +Load a ControlNet model trained on canny images and pass it to the [`ControlNetModel`]. Then you can load a Stable Diffusion v1.5 model into [`StableDiffusionControlNetPipeline`] and replace the scheduler with the [`LCMScheduler`]. Use the [`~loaders.LoraLoaderMixin.load_lora_weights`] method to load the LCM-LoRA weights, and pass the canny image to the pipeline and generate an image. + +> [!TIP] +> Experiment with different values for `num_inference_steps`, `controlnet_conditioning_scale`, `cross_attention_kwargs`, and `guidance_scale` to get the best results. + +```py +import torch +import cv2 +import numpy as np +from PIL import Image + +from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, LCMScheduler +from diffusers.utils import load_image + +image = load_image( + "https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png" +).resize((512, 512)) + +image = np.array(image) + +low_threshold = 100 +high_threshold = 200 + +image = cv2.Canny(image, low_threshold, high_threshold) +image = image[:, :, None] +image = np.concatenate([image, image, image], axis=2) +canny_image = Image.fromarray(image) + +controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16) +pipe = StableDiffusionControlNetPipeline.from_pretrained( + "runwayml/stable-diffusion-v1-5", + controlnet=controlnet, + torch_dtype=torch.float16, + safety_checker=None, + variant="fp16" +).to("cuda") + +pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config) + +pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5") + +generator = torch.manual_seed(0) +image = pipe( + "the mona lisa", + image=canny_image, + num_inference_steps=4, + guidance_scale=1.5, + controlnet_conditioning_scale=0.8, + cross_attention_kwargs={"scale": 1}, + generator=generator, +).images[0] +image +``` + +
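The same few-step recipe extends to ControlNets trained on other inputs. As a sketch, the example below conditions on a depth map instead of a canny image; it assumes the [lllyasviel/sd-controlnet-depth](https://hf.co/lllyasviel/sd-controlnet-depth) checkpoint and uses a Transformers depth-estimation pipeline to prepare the depth map, while the scheduler and LCM-LoRA setup stay the same as above.

```py
import torch
from transformers import pipeline
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, LCMScheduler
from diffusers.utils import load_image

# build a depth map from the input image (any depth estimator works here)
image = load_image(
    "https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png"
).resize((512, 512))
depth_estimator = pipeline("depth-estimation")
depth_image = depth_estimator(image)["depth"]

# assumed depth ControlNet checkpoint; other conditioning models follow the same pattern
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
    safety_checker=None,
    variant="fp16"
).to("cuda")

pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")

generator = torch.manual_seed(0)
image = pipe(
    "the mona lisa",
    image=depth_image,
    num_inference_steps=4,
    guidance_scale=1.5,
    controlnet_conditioning_scale=0.8,
    generator=generator,
).images[0]
image
```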
### T2I-Adapter -This example shows how to use the `lcm-sdxl` with the [Canny T2I-Adapter](TencentARC/t2i-adapter-canny-sdxl-1.0). +[T2I-Adapter](./t2i_adapter) is an even more lightweight adapter than ControlNet, that provides an additional input to condition a pretrained model with. It is faster than ControlNet but the results may be slightly worse. + +You can find additional T2I-Adapter checkpoints trained on other inputs in [TencentArc's](https://hf.co/TencentARC) repository. + + + + +Load a T2IAdapter trained on canny images and pass it to the [`StableDiffusionXLAdapterPipeline`]. Then load a LCM checkpoint into [`UNet2DConditionModel`] and replace the scheduler with the [`LCMScheduler`]. Now pass the canny image to the pipeline and generate an image. ```python import torch @@ -220,10 +470,9 @@ from PIL import Image from diffusers import StableDiffusionXLAdapterPipeline, UNet2DConditionModel, T2IAdapter, LCMScheduler from diffusers.utils import load_image, make_image_grid -# Prepare image -# Detect the canny map in low resolution to avoid high-frequency details +# detect the canny map in low resolution to avoid high-frequency details image = load_image( - "https://huggingface.co/Adapter/t2iadapter/resolve/main/figs_SDXLV1.0/org_canny.jpg" + "https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png" ).resize((384, 384)) image = np.array(image) @@ -236,7 +485,6 @@ image = image[:, :, None] image = np.concatenate([image, image, image], axis=2) canny_image = Image.fromarray(image).resize((1024, 1216)) -# load adapter adapter = T2IAdapter.from_pretrained("TencentARC/t2i-adapter-canny-sdxl-1.0", torch_dtype=torch.float16, varient="fp16").to("cuda") unet = UNet2DConditionModel.from_pretrained( @@ -254,7 +502,7 @@ pipe = StableDiffusionXLAdapterPipeline.from_pretrained( pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config) -prompt = "Mystical fairy in real, magic, 4k picture, high quality" +prompt = "the mona lisa, 4k picture, high quality" negative_prompt = "extra digit, fewer digits, cropped, worst quality, low quality, glitch, deformed, mutated, ugly, disfigured" generator = torch.manual_seed(0) @@ -268,7 +516,116 @@ image = pipe( adapter_conditioning_factor=1, generator=generator, ).images[0] -grid = make_image_grid([canny_image, image], rows=1, cols=2) ``` -![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/lcm/lcm_full_sdxl_t2iadapter.png) +
+ + +Load a T2IAdapter trained on canny images and pass it to the [`StableDiffusionXLAdapterPipeline`]. Replace the scheduler with the [`LCMScheduler`], and use the [`~loaders.LoraLoaderMixin.load_lora_weights`] method to load the LCM-LoRA weights. Pass the canny image to the pipeline and generate an image. + +```py +import torch +import cv2 +import numpy as np +from PIL import Image + +from diffusers import StableDiffusionXLAdapterPipeline, UNet2DConditionModel, T2IAdapter, LCMScheduler +from diffusers.utils import load_image, make_image_grid + +# detect the canny map in low resolution to avoid high-frequency details +image = load_image( + "https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png" +).resize((384, 384)) + +image = np.array(image) + +low_threshold = 100 +high_threshold = 200 + +image = cv2.Canny(image, low_threshold, high_threshold) +image = image[:, :, None] +image = np.concatenate([image, image, image], axis=2) +canny_image = Image.fromarray(image).resize((1024, 1024)) + +adapter = T2IAdapter.from_pretrained("TencentARC/t2i-adapter-canny-sdxl-1.0", torch_dtype=torch.float16, varient="fp16").to("cuda") + +pipe = StableDiffusionXLAdapterPipeline.from_pretrained( + "stabilityai/stable-diffusion-xl-base-1.0", + adapter=adapter, + torch_dtype=torch.float16, + variant="fp16", +).to("cuda") + +pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config) + +pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl") + +prompt = "the mona lisa, 4k picture, high quality" +negative_prompt = "extra digit, fewer digits, cropped, worst quality, low quality, glitch, deformed, mutated, ugly, disfigured" + +generator = torch.manual_seed(0) +image = pipe( + prompt=prompt, + negative_prompt=negative_prompt, + image=canny_image, + num_inference_steps=4, + guidance_scale=1.5, + adapter_conditioning_scale=0.8, + adapter_conditioning_factor=1, + generator=generator, +).images[0] +``` + +
+ +### AnimateDiff + +[AnimateDiff](../api/pipelines/animatediff) is an adapter that adds motion to an image. It can be used with most Stable Diffusion models, effectively turning them into "video generation" models. Generating good results with a video model usually requires generating multiple frames (16-24), which can be very slow with a regular Stable Diffusion model. LCM-LoRA can speed up this process by only taking 4-8 steps for each frame. + +Load a [`AnimateDiffPipeline`] and pass a [`MotionAdapter`] to it. Then replace the scheduler with the [`LCMScheduler`], and combine both LoRA adapters with the [`~loaders.UNet2DConditionLoadersMixin.set_adapters`] method. Now you can pass a prompt to the pipeline and generate an animated image. + +```py +import torch +from diffusers import MotionAdapter, AnimateDiffPipeline, DDIMScheduler, LCMScheduler +from diffusers.utils import export_to_gif + +adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5") +pipe = AnimateDiffPipeline.from_pretrained( + "frankjoshua/toonyou_beta6", + motion_adapter=adapter, +).to("cuda") + +# set scheduler +pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config) + +# load LCM-LoRA +pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5", adapter_name="lcm") +pipe.load_lora_weights("guoyww/animatediff-motion-lora-zoom-in", weight_name="diffusion_pytorch_model.safetensors", adapter_name="motion-lora") + +pipe.set_adapters(["lcm", "motion-lora"], adapter_weights=[0.55, 1.2]) + +prompt = "best quality, masterpiece, 1girl, looking at viewer, blurry background, upper body, contemporary, dress" +generator = torch.manual_seed(0) +frames = pipe( + prompt=prompt, + num_inference_steps=5, + guidance_scale=1.25, + cross_attention_kwargs={"scale": 1}, + num_frames=24, + generator=generator +).frames[0] +export_to_gif(frames, "animation.gif") +``` + +
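If you'd rather have a video file than a GIF, the same frames can be written out with the `export_to_video` utility instead (this assumes `opencv-python` is installed):

```py
from diffusers.utils import export_to_video

# write the frames generated above to an mp4 instead of a gif
export_to_video(frames, "animation.mp4")
```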
diff --git a/docs/source/en/using-diffusers/inference_with_lcm_lora.md b/docs/source/en/using-diffusers/inference_with_lcm_lora.md deleted file mode 100644 index 36120a0482..0000000000 --- a/docs/source/en/using-diffusers/inference_with_lcm_lora.md +++ /dev/null @@ -1,422 +0,0 @@ - - -[[open-in-colab]] - -# Performing inference with LCM-LoRA - -Latent Consistency Models (LCM) enable quality image generation in typically 2-4 steps making it possible to use diffusion models in almost real-time settings. - -From the [official website](https://latent-consistency-models.github.io/): - -> LCMs can be distilled from any pre-trained Stable Diffusion (SD) in only 4,000 training steps (~32 A100 GPU Hours) for generating high quality 768 x 768 resolution images in 2~4 steps or even one step, significantly accelerating text-to-image generation. We employ LCM to distill the Dreamshaper-V7 version of SD in just 4,000 training iterations. - -For a more technical overview of LCMs, refer to [the paper](https://huggingface.co/papers/2310.04378). - -However, each model needs to be distilled separately for latent consistency distillation. The core idea with LCM-LoRA is to train just a few adapter layers, the adapter being LoRA in this case. -This way, we don't have to train the full model and keep the number of trainable parameters manageable. The resulting LoRAs can then be applied to any fine-tuned version of the model without distilling them separately. -Additionally, the LoRAs can be applied to image-to-image, ControlNet/T2I-Adapter, inpainting, AnimateDiff etc. -The LCM-LoRA can also be combined with other LoRAs to generate styled images in very few steps (4-8). - -LCM-LoRAs are available for [stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5), [stable-diffusion-xl-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0), and the [SSD-1B](https://huggingface.co/segmind/SSD-1B) model. All the checkpoints can be found in this [collection](https://huggingface.co/collections/latent-consistency/latent-consistency-models-loras-654cdd24e111e16f0865fba6). - -For more details about LCM-LoRA, refer to [the technical report](https://huggingface.co/papers/2311.05556). - -This guide shows how to perform inference with LCM-LoRAs for -- text-to-image -- image-to-image -- combined with styled LoRAs -- ControlNet/T2I-Adapter -- inpainting -- AnimateDiff - -Before going through this guide, we'll take a look at the general workflow for performing inference with LCM-LoRAs. -LCM-LoRAs are similar to other Stable Diffusion LoRAs so they can be used with any [`DiffusionPipeline`] that supports LoRAs. - -- Load the task specific pipeline and model. -- Set the scheduler to [`LCMScheduler`]. -- Load the LCM-LoRA weights for the model. -- Reduce the `guidance_scale` between `[1.0, 2.0]` and set the `num_inference_steps` between [4, 8]. -- Perform inference with the pipeline with the usual parameters. - -Let's look at how we can perform inference with LCM-LoRAs for different tasks. - -First, make sure you have [peft](https://github.com/huggingface/peft) installed, for better LoRA support. - -```bash -pip install -U peft -``` - -## Text-to-image - -You'll use the [`StableDiffusionXLPipeline`] with the scheduler: [`LCMScheduler`] and then load the LCM-LoRA. Together with the LCM-LoRA and the scheduler, the pipeline enables a fast inference workflow overcoming the slow iterative nature of diffusion models. 
- -```python -import torch -from diffusers import DiffusionPipeline, LCMScheduler - -pipe = DiffusionPipeline.from_pretrained( - "stabilityai/stable-diffusion-xl-base-1.0", - variant="fp16", - torch_dtype=torch.float16 -).to("cuda") - -# set scheduler -pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config) - -# load LCM-LoRA -pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl") - -prompt = "Self-portrait oil painting, a beautiful cyborg with golden hair, 8k" - -generator = torch.manual_seed(42) -image = pipe( - prompt=prompt, num_inference_steps=4, generator=generator, guidance_scale=1.0 -).images[0] -``` - -![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/lcm/lcm_sdxl_t2i.png) - -Notice that we use only 4 steps for generation which is way less than what's typically used for standard SDXL. - - - -You may have noticed that we set `guidance_scale=1.0`, which disables classifer-free-guidance. This is because the LCM-LoRA is trained with guidance, so the batch size does not have to be doubled in this case. This leads to a faster inference time, with the drawback that negative prompts don't have any effect on the denoising process. - -You can also use guidance with LCM-LoRA, but due to the nature of training the model is very sensitve to the `guidance_scale` values, high values can lead to artifacts in the generated images. In our experiments, we found that the best values are in the range of [1.0, 2.0]. - - - -### Inference with a fine-tuned model - -As mentioned above, the LCM-LoRA can be applied to any fine-tuned version of the model without having to distill them separately. Let's look at how we can perform inference with a fine-tuned model. In this example, we'll use the [animagine-xl](https://huggingface.co/Linaqruf/animagine-xl) model, which is a fine-tuned version of the SDXL model for generating anime. - -```python -from diffusers import DiffusionPipeline, LCMScheduler - -pipe = DiffusionPipeline.from_pretrained( - "Linaqruf/animagine-xl", - variant="fp16", - torch_dtype=torch.float16 -).to("cuda") - -# set scheduler -pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config) - -# load LCM-LoRA -pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl") - -prompt = "face focus, cute, masterpiece, best quality, 1girl, green hair, sweater, looking at viewer, upper body, beanie, outdoors, night, turtleneck" - -generator = torch.manual_seed(0) -image = pipe( - prompt=prompt, num_inference_steps=4, generator=generator, guidance_scale=1.0 -).images[0] -``` - -![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/lcm/lcm_sdxl_t2i_finetuned.png) - - -## Image-to-image - -LCM-LoRA can be applied to image-to-image tasks too. Let's look at how we can perform image-to-image generation with LCMs. For this example we'll use the [dreamshaper-7](https://huggingface.co/Lykon/dreamshaper-7) model and the LCM-LoRA for `stable-diffusion-v1-5 `. 
- -```python -import torch -from diffusers import AutoPipelineForImage2Image, LCMScheduler -from diffusers.utils import make_image_grid, load_image - -pipe = AutoPipelineForImage2Image.from_pretrained( - "Lykon/dreamshaper-7", - torch_dtype=torch.float16, - variant="fp16", -).to("cuda") - -# set scheduler -pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config) - -# load LCM-LoRA -pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5") - -# prepare image -url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-init.png" -init_image = load_image(url) -prompt = "Astronauts in a jungle, cold color palette, muted colors, detailed, 8k" - -# pass prompt and image to pipeline -generator = torch.manual_seed(0) -image = pipe( - prompt, - image=init_image, - num_inference_steps=4, - guidance_scale=1, - strength=0.6, - generator=generator -).images[0] -make_image_grid([init_image, image], rows=1, cols=2) -``` - -![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/lcm/lcm_sdv1-5_i2i.png) - - - - -You can get different results based on your prompt and the image you provide. To get the best results, we recommend trying different values for `num_inference_steps`, `strength`, and `guidance_scale` parameters and choose the best one. - - - - -## Combine with styled LoRAs - -LCM-LoRA can be combined with other LoRAs to generate styled-images in very few steps (4-8). In the following example, we'll use the LCM-LoRA with the [papercut LoRA](TheLastBen/Papercut_SDXL). -To learn more about how to combine LoRAs, refer to [this guide](https://huggingface.co/docs/diffusers/tutorials/using_peft_for_inference#combine-multiple-adapters). - -```python -import torch -from diffusers import DiffusionPipeline, LCMScheduler - -pipe = DiffusionPipeline.from_pretrained( - "stabilityai/stable-diffusion-xl-base-1.0", - variant="fp16", - torch_dtype=torch.float16 -).to("cuda") - -# set scheduler -pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config) - -# load LoRAs -pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl", adapter_name="lcm") -pipe.load_lora_weights("TheLastBen/Papercut_SDXL", weight_name="papercut.safetensors", adapter_name="papercut") - -# Combine LoRAs -pipe.set_adapters(["lcm", "papercut"], adapter_weights=[1.0, 0.8]) - -prompt = "papercut, a cute fox" -generator = torch.manual_seed(0) -image = pipe(prompt, num_inference_steps=4, guidance_scale=1, generator=generator).images[0] -image -``` - -![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/lcm/lcm_sdx_lora_mix.png) - - -## ControlNet/T2I-Adapter - -Let's look at how we can perform inference with ControlNet/T2I-Adapter and LCM-LoRA. - -### ControlNet -For this example, we'll use the SD-v1-5 model and the LCM-LoRA for SD-v1-5 with canny ControlNet. 
- -```python -import torch -import cv2 -import numpy as np -from PIL import Image - -from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, LCMScheduler -from diffusers.utils import load_image - -image = load_image( - "https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png" -).resize((512, 512)) - -image = np.array(image) - -low_threshold = 100 -high_threshold = 200 - -image = cv2.Canny(image, low_threshold, high_threshold) -image = image[:, :, None] -image = np.concatenate([image, image, image], axis=2) -canny_image = Image.fromarray(image) - -controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16) -pipe = StableDiffusionControlNetPipeline.from_pretrained( - "runwayml/stable-diffusion-v1-5", - controlnet=controlnet, - torch_dtype=torch.float16, - safety_checker=None, - variant="fp16" -).to("cuda") - -# set scheduler -pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config) - -# load LCM-LoRA -pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5") - -generator = torch.manual_seed(0) -image = pipe( - "the mona lisa", - image=canny_image, - num_inference_steps=4, - guidance_scale=1.5, - controlnet_conditioning_scale=0.8, - cross_attention_kwargs={"scale": 1}, - generator=generator, -).images[0] -make_image_grid([canny_image, image], rows=1, cols=2) -``` - -![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/lcm/lcm_sdv1-5_controlnet.png) - - - -The inference parameters in this example might not work for all examples, so we recommend you to try different values for `num_inference_steps`, `guidance_scale`, `controlnet_conditioning_scale` and `cross_attention_kwargs` parameters and choose the best one. - - -### T2I-Adapter - -This example shows how to use the LCM-LoRA with the [Canny T2I-Adapter](TencentARC/t2i-adapter-canny-sdxl-1.0) and SDXL. 
- -```python -import torch -import cv2 -import numpy as np -from PIL import Image - -from diffusers import StableDiffusionXLAdapterPipeline, T2IAdapter, LCMScheduler -from diffusers.utils import load_image, make_image_grid - -# Prepare image -# Detect the canny map in low resolution to avoid high-frequency details -image = load_image( - "https://huggingface.co/Adapter/t2iadapter/resolve/main/figs_SDXLV1.0/org_canny.jpg" -).resize((384, 384)) - -image = np.array(image) - -low_threshold = 100 -high_threshold = 200 - -image = cv2.Canny(image, low_threshold, high_threshold) -image = image[:, :, None] -image = np.concatenate([image, image, image], axis=2) -canny_image = Image.fromarray(image).resize((1024, 1024)) - -# load adapter -adapter = T2IAdapter.from_pretrained("TencentARC/t2i-adapter-canny-sdxl-1.0", torch_dtype=torch.float16, varient="fp16").to("cuda") - -pipe = StableDiffusionXLAdapterPipeline.from_pretrained( - "stabilityai/stable-diffusion-xl-base-1.0", - adapter=adapter, - torch_dtype=torch.float16, - variant="fp16", -).to("cuda") - -# set scheduler -pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config) - -# load LCM-LoRA -pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl") - -prompt = "Mystical fairy in real, magic, 4k picture, high quality" -negative_prompt = "extra digit, fewer digits, cropped, worst quality, low quality, glitch, deformed, mutated, ugly, disfigured" - -generator = torch.manual_seed(0) -image = pipe( - prompt=prompt, - negative_prompt=negative_prompt, - image=canny_image, - num_inference_steps=4, - guidance_scale=1.5, - adapter_conditioning_scale=0.8, - adapter_conditioning_factor=1, - generator=generator, -).images[0] -make_image_grid([canny_image, image], rows=1, cols=2) -``` - -![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/lcm/lcm_sdxl_t2iadapter.png) - - -## Inpainting - -LCM-LoRA can be used for inpainting as well. - -```python -import torch -from diffusers import AutoPipelineForInpainting, LCMScheduler -from diffusers.utils import load_image, make_image_grid - -pipe = AutoPipelineForInpainting.from_pretrained( - "runwayml/stable-diffusion-inpainting", - torch_dtype=torch.float16, - variant="fp16", -).to("cuda") - -# set scheduler -pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config) - -# load LCM-LoRA -pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5") - -# load base and mask image -init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png") -mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png") - -# generator = torch.Generator("cuda").manual_seed(92) -prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k" -generator = torch.manual_seed(0) -image = pipe( - prompt=prompt, - image=init_image, - mask_image=mask_image, - generator=generator, - num_inference_steps=4, - guidance_scale=4, -).images[0] -make_image_grid([init_image, mask_image, image], rows=1, cols=3) -``` - -![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/lcm/lcm_sdv1-5_inpainting.png) - - -## AnimateDiff - -[`AnimateDiff`] allows you to animate images using Stable Diffusion models. To get good results, we need to generate multiple frames (16-24), and doing this with standard SD models can be very slow. 
-LCM-LoRA can be used to speed up the process significantly, as you just need to do 4-8 steps for each frame. Let's look at how we can perform animation with LCM-LoRA and AnimateDiff. - -```python -import torch -from diffusers import MotionAdapter, AnimateDiffPipeline, DDIMScheduler, LCMScheduler -from diffusers.utils import export_to_gif - -adapter = MotionAdapter.from_pretrained("diffusers/animatediff-motion-adapter-v1-5") -pipe = AnimateDiffPipeline.from_pretrained( - "frankjoshua/toonyou_beta6", - motion_adapter=adapter, -).to("cuda") - -# set scheduler -pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config) - -# load LCM-LoRA -pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5", adapter_name="lcm") -pipe.load_lora_weights("guoyww/animatediff-motion-lora-zoom-in", weight_name="diffusion_pytorch_model.safetensors", adapter_name="motion-lora") - -pipe.set_adapters(["lcm", "motion-lora"], adapter_weights=[0.55, 1.2]) - -prompt = "best quality, masterpiece, 1girl, looking at viewer, blurry background, upper body, contemporary, dress" -generator = torch.manual_seed(0) -frames = pipe( - prompt=prompt, - num_inference_steps=5, - guidance_scale=1.25, - cross_attention_kwargs={"scale": 1}, - num_frames=24, - generator=generator -).frames[0] -export_to_gif(frames, "animation.gif") -``` - -![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/lcm/lcm_sdv1-5_animatediff.gif) \ No newline at end of file