Compare commits

..

12 Commits

Author SHA1 Message Date
Sayak Paul 47ee2a737a Merge branch 'main' into controlnet-test-fixes 2024-03-15 12:30:25 +05:30
Sayak Paul e0e9f81971 add: torch to the pypi step. (#7328) 2024-03-15 12:28:12 +05:30
M. Tolga Cangöz 5d848ec07c [Tests] Update a deprecated parameter in test files and fix several typos (#7277)
* Add properties and `IPAdapterTesterMixin` tests for `StableDiffusionPanoramaPipeline`

* Fix variable name typo and update comments

* Update deprecated `output_type="numpy"` to "np" in test files

* Discard changes to src/diffusers/pipelines/stable_diffusion_panorama/pipeline_stable_diffusion_panorama.py

* Update test_stable_diffusion_panorama.py

* Update numbers in README.md

* Update get_guidance_scale_embedding method to use timesteps instead of w

* Update number of checkpoints in README.md

* Add type hints and fix var name

* Fix PyTorch's convention for inplace functions

* Fix a typo

* Revert "Fix PyTorch's convention for inplace functions"

This reverts commit 74350cf65b.

* Fix typos

* Indent

* Refactor get_guidance_scale_embedding method in LEditsPPPipelineStableDiffusionXL class
2024-03-14 12:17:35 -07:00
Dhruv Nair 4974b84564 Update Cascade Tests (#7324)
* update

* update

* update
2024-03-14 20:51:22 +05:30
Linoy Tsaban 83062fb872 [Advanced DreamBooth LoRA SDXL] Support EDM-style training (follow up of #7126) (#7182)
* add edm style training

* style

* finish adding edm training feature

* import fix

* fix latents mean

* minor adjustments

* add edm to readme

* style

* fix autocast and scheduler config issues when using edm

* style

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-14 18:40:14 +05:30
Suraj Patil b6d7e31d10 add edm schedulers in doc (#7319)
* add edm schedulers in doc

* add in toctree

* address reviewe comments
2024-03-14 11:52:25 +01:00
Dhruv Nair 94fc2d3fe6 update 2024-03-14 06:53:52 +00:00
Dhruv Nair 503e359204 update 2024-03-14 06:47:29 +00:00
Anatoly Belikov 53e9aacc10 log loss per image (#7278)
* log loss per image

* add commandline param for per image loss logging

* style

* debug-loss -> debug_loss

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-14 11:41:43 +05:30
Dhruv Nair 41424466e3 [Tests] Fix incorrect constant in VAE scaling test. (#7301)
update
2024-03-14 10:24:01 +05:30
Sayak Paul 95de1981c9 add: pytest log installation (#7313) 2024-03-14 10:01:16 +05:30
Kenneth Gerald Hamilton 0b45b58867 update get_order_list if statement (#7309)
* update get_order_list if statement

* revery
2024-03-13 18:29:42 -10:00
85 changed files with 603 additions and 322 deletions
+2
View File
@@ -65,6 +65,7 @@ jobs:
python -m uv pip install -e [quality,test]
python -m uv pip install -U transformers@git+https://github.com/huggingface/transformers
python -m uv pip install accelerate@git+https://github.com/huggingface/accelerate
python -m uv pip install pytest-reportlog
- name: Environment
run: |
@@ -150,6 +151,7 @@ jobs:
${CONDA_RUN} python -m uv pip install -e [quality,test]
${CONDA_RUN} python -m uv pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cpu
${CONDA_RUN} python -m uv pip install accelerate@git+https://github.com/huggingface/accelerate
${CONDA_RUN} python -m uv pip install pytest-reportlog
- name: Environment
shell: arch -arch arm64 bash {0}
+1 -1
View File
@@ -52,7 +52,7 @@ jobs:
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -U setuptools wheel twine
pip install -U setuptools wheel twine torch
- name: Build the dist files
run: python setup.py bdist_wheel && python setup.py sdist
+2 -2
View File
@@ -77,7 +77,7 @@ Please refer to the [How to use Stable Diffusion in Apple Silicon](https://huggi
## Quickstart
Generating outputs is super easy with 🤗 Diffusers. To generate an image from text, use the `from_pretrained` method to load any pretrained diffusion model (browse the [Hub](https://huggingface.co/models?library=diffusers&sort=downloads) for 19000+ checkpoints):
Generating outputs is super easy with 🤗 Diffusers. To generate an image from text, use the `from_pretrained` method to load any pretrained diffusion model (browse the [Hub](https://huggingface.co/models?library=diffusers&sort=downloads) for 22000+ checkpoints):
```python
from diffusers import DiffusionPipeline
@@ -219,7 +219,7 @@ Also, say 👋 in our public Discord channel <a href="https://discord.gg/G7tWnz9
- https://github.com/deep-floyd/IF
- https://github.com/bentoml/BentoML
- https://github.com/bmaltais/kohya_ss
- +8000 other amazing GitHub repositories 💪
- +9000 other amazing GitHub repositories 💪
Thank you for using us ❤️.
+4
View File
@@ -404,6 +404,10 @@
title: EulerAncestralDiscreteScheduler
- local: api/schedulers/euler
title: EulerDiscreteScheduler
- local: api/schedulers/edm_euler
title: EDMEulerScheduler
- local: api/schedulers/edm_multistep_dpm_solver
title: EDMDPMSolverMultistepScheduler
- local: api/schedulers/heun
title: HeunDiscreteScheduler
- local: api/schedulers/ipndm
@@ -0,0 +1,22 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# EDMEulerScheduler
The Karras formulation of the Euler scheduler (Algorithm 2) from the [Elucidating the Design Space of Diffusion-Based Generative Models](https://huggingface.co/papers/2206.00364) paper by Karras et al. This is a fast scheduler which can often generate good outputs in 20-30 steps. The scheduler is based on the original [k-diffusion](https://github.com/crowsonkb/k-diffusion/blob/481677d114f6ea445aa009cf5bd7a9cdee909e47/k_diffusion/sampling.py#L51) implementation by [Katherine Crowson](https://github.com/crowsonkb/).
## EDMEulerScheduler
[[autodoc]] EDMEulerScheduler
## EDMEulerSchedulerOutput
[[autodoc]] schedulers.scheduling_edm_euler.EDMEulerSchedulerOutput
@@ -0,0 +1,24 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# EDMDPMSolverMultistepScheduler
`EDMDPMSolverMultistepScheduler` is a [Karras formulation](https://huggingface.co/papers/2206.00364) of `DPMSolverMultistep`, a multistep scheduler from [DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps](https://huggingface.co/papers/2206.00927) and [DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models](https://huggingface.co/papers/2211.01095) by Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu.
DPMSolver (and the improved version DPMSolver++) is a fast dedicated high-order solver for diffusion ODEs with convergence order guarantee. Empirically, DPMSolver sampling with only 20 steps can generate high-quality
samples, and it can generate quite good samples even in 10 steps.
## EDMDPMSolverMultistepScheduler
[[autodoc]] EDMDPMSolverMultistepScheduler
## SchedulerOutput
[[autodoc]] schedulers.scheduling_utils.SchedulerOutput
@@ -259,6 +259,50 @@ pip install git+https://github.com/huggingface/peft.git
**Inference**
The inference is the same as if you train a regular LoRA 🤗
## Conducting EDM-style training
It's now possible to perform EDM-style training as proposed in [Elucidating the Design Space of Diffusion-Based Generative Models](https://arxiv.org/abs/2206.00364).
simply set:
```diff
+ --do_edm_style_training \
```
Other SDXL-like models that use the EDM formulation, such as [playgroundai/playground-v2.5-1024px-aesthetic](https://huggingface.co/playgroundai/playground-v2.5-1024px-aesthetic), can also be DreamBooth'd with the script. Below is an example command:
```bash
accelerate launch train_dreambooth_lora_sdxl_advanced.py \
--pretrained_model_name_or_path="playgroundai/playground-v2.5-1024px-aesthetic" \
--dataset_name="linoyts/3d_icon" \
--instance_prompt="3d icon in the style of TOK" \
--validation_prompt="a TOK icon of an astronaut riding a horse, in the style of TOK" \
--output_dir="3d-icon-SDXL-LoRA" \
--do_edm_style_training \
--caption_column="prompt" \
--mixed_precision="bf16" \
--resolution=1024 \
--train_batch_size=3 \
--repeats=1 \
--report_to="wandb"\
--gradient_accumulation_steps=1 \
--gradient_checkpointing \
--learning_rate=1.0 \
--text_encoder_lr=1.0 \
--optimizer="prodigy"\
--train_text_encoder_ti\
--train_text_encoder_ti_frac=0.5\
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--rank=8 \
--max_train_steps=1000 \
--checkpointing_steps=2000 \
--seed="0" \
--push_to_hub
```
> [!CAUTION]
> Min-SNR gamma is not supported with the EDM-style training yet. When training with the PlaygroundAI model, it's recommended to not pass any "variant".
### Tips and Tricks
Check out [these recommended practices](https://huggingface.co/blog/sdxl_lora_advanced_script#additional-good-practices)
@@ -14,9 +14,11 @@
# See the License for the specific language governing permissions and
import argparse
import contextlib
import gc
import hashlib
import itertools
import json
import logging
import math
import os
@@ -37,7 +39,7 @@ import transformers
from accelerate import Accelerator
from accelerate.logging import get_logger
from accelerate.utils import DistributedDataParallelKwargs, ProjectConfiguration, set_seed
from huggingface_hub import create_repo, upload_folder
from huggingface_hub import create_repo, hf_hub_download, upload_folder
from packaging import version
from peft import LoraConfig, set_peft_model_state_dict
from peft.utils import get_peft_model_state_dict
@@ -55,6 +57,8 @@ from diffusers import (
AutoencoderKL,
DDPMScheduler,
DPMSolverMultistepScheduler,
EDMEulerScheduler,
EulerDiscreteScheduler,
StableDiffusionXLPipeline,
UNet2DConditionModel,
)
@@ -79,6 +83,20 @@ check_min_version("0.27.0.dev0")
logger = get_logger(__name__)
def determine_scheduler_type(pretrained_model_name_or_path, revision):
model_index_filename = "model_index.json"
if os.path.isdir(pretrained_model_name_or_path):
model_index = os.path.join(pretrained_model_name_or_path, model_index_filename)
else:
model_index = hf_hub_download(
repo_id=pretrained_model_name_or_path, filename=model_index_filename, revision=revision
)
with open(model_index, "r") as f:
scheduler_type = json.load(f)["scheduler"][1]
return scheduler_type
def save_model_card(
repo_id: str,
use_dora: bool,
@@ -370,6 +388,11 @@ def parse_args(input_args=None):
" `args.validation_prompt` multiple times: `args.num_validation_images`."
),
)
parser.add_argument(
"--do_edm_style_training",
action="store_true",
help="Flag to conduct training using the EDM formulation as introduced in https://arxiv.org/abs/2206.00364.",
)
parser.add_argument(
"--with_prior_preservation",
default=False,
@@ -1117,6 +1140,8 @@ def main(args):
"You cannot use both --report_to=wandb and --hub_token due to a security risk of exposing your token."
" Please use `huggingface-cli login` to authenticate with the Hub."
)
if args.do_edm_style_training and args.snr_gamma is not None:
raise ValueError("Min-SNR formulation is not supported when conducting EDM-style training.")
logging_dir = Path(args.output_dir, args.logging_dir)
@@ -1234,7 +1259,19 @@ def main(args):
)
# Load scheduler and models
noise_scheduler = DDPMScheduler.from_pretrained(args.pretrained_model_name_or_path, subfolder="scheduler")
scheduler_type = determine_scheduler_type(args.pretrained_model_name_or_path, args.revision)
if "EDM" in scheduler_type:
args.do_edm_style_training = True
noise_scheduler = EDMEulerScheduler.from_pretrained(args.pretrained_model_name_or_path, subfolder="scheduler")
logger.info("Performing EDM-style training!")
elif args.do_edm_style_training:
noise_scheduler = EulerDiscreteScheduler.from_pretrained(
args.pretrained_model_name_or_path, subfolder="scheduler"
)
logger.info("Performing EDM-style training!")
else:
noise_scheduler = DDPMScheduler.from_pretrained(args.pretrained_model_name_or_path, subfolder="scheduler")
text_encoder_one = text_encoder_cls_one.from_pretrained(
args.pretrained_model_name_or_path, subfolder="text_encoder", revision=args.revision, variant=args.variant
)
@@ -1252,7 +1289,12 @@ def main(args):
revision=args.revision,
variant=args.variant,
)
vae_scaling_factor = vae.config.scaling_factor
latents_mean = latents_std = None
if hasattr(vae.config, "latents_mean") and vae.config.latents_mean is not None:
latents_mean = torch.tensor(vae.config.latents_mean).view(1, 4, 1, 1)
if hasattr(vae.config, "latents_std") and vae.config.latents_std is not None:
latents_std = torch.tensor(vae.config.latents_std).view(1, 4, 1, 1)
unet = UNet2DConditionModel.from_pretrained(
args.pretrained_model_name_or_path, subfolder="unet", revision=args.revision, variant=args.variant
)
@@ -1790,6 +1832,19 @@ def main(args):
disable=not accelerator.is_local_main_process,
)
def get_sigmas(timesteps, n_dim=4, dtype=torch.float32):
# TODO: revisit other sampling algorithms
sigmas = noise_scheduler.sigmas.to(device=accelerator.device, dtype=dtype)
schedule_timesteps = noise_scheduler.timesteps.to(accelerator.device)
timesteps = timesteps.to(accelerator.device)
step_indices = [(schedule_timesteps == t).nonzero().item() for t in timesteps]
sigma = sigmas[step_indices].flatten()
while len(sigma.shape) < n_dim:
sigma = sigma.unsqueeze(-1)
return sigma
if args.train_text_encoder:
num_train_epochs_text_encoder = int(args.train_text_encoder_frac * args.num_train_epochs)
elif args.train_text_encoder_ti: # args.train_text_encoder_ti
@@ -1841,9 +1896,15 @@ def main(args):
pixel_values = batch["pixel_values"].to(dtype=vae.dtype)
model_input = vae.encode(pixel_values).latent_dist.sample()
model_input = model_input * vae_scaling_factor
if args.pretrained_vae_model_name_or_path is None:
model_input = model_input.to(weight_dtype)
if latents_mean is None and latents_std is None:
model_input = model_input * vae.config.scaling_factor
if args.pretrained_vae_model_name_or_path is None:
model_input = model_input.to(weight_dtype)
else:
latents_mean = latents_mean.to(device=model_input.device, dtype=model_input.dtype)
latents_std = latents_std.to(device=model_input.device, dtype=model_input.dtype)
model_input = (model_input - latents_mean) * vae.config.scaling_factor / latents_std
model_input = model_input.to(dtype=weight_dtype)
# Sample noise that we'll add to the latents
noise = torch.randn_like(model_input)
@@ -1854,15 +1915,32 @@ def main(args):
)
bsz = model_input.shape[0]
# Sample a random timestep for each image
timesteps = torch.randint(
0, noise_scheduler.config.num_train_timesteps, (bsz,), device=model_input.device
)
timesteps = timesteps.long()
if not args.do_edm_style_training:
timesteps = torch.randint(
0, noise_scheduler.config.num_train_timesteps, (bsz,), device=model_input.device
)
timesteps = timesteps.long()
else:
# in EDM formulation, the model is conditioned on the pre-conditioned noise levels
# instead of discrete timesteps, so here we sample indices to get the noise levels
# from `scheduler.timesteps`
indices = torch.randint(0, noise_scheduler.config.num_train_timesteps, (bsz,))
timesteps = noise_scheduler.timesteps[indices].to(device=model_input.device)
# Add noise to the model input according to the noise magnitude at each timestep
# (this is the forward diffusion process)
noisy_model_input = noise_scheduler.add_noise(model_input, noise, timesteps)
# For EDM-style training, we first obtain the sigmas based on the continuous timesteps.
# We then precondition the final model inputs based on these sigmas instead of the timesteps.
# Follow: Section 5 of https://arxiv.org/abs/2206.00364.
if args.do_edm_style_training:
sigmas = get_sigmas(timesteps, len(noisy_model_input.shape), noisy_model_input.dtype)
if "EDM" in scheduler_type:
inp_noisy_latents = noise_scheduler.precondition_inputs(noisy_model_input, sigmas)
else:
inp_noisy_latents = noisy_model_input / ((sigmas**2 + 1) ** 0.5)
# time ids
add_time_ids = torch.cat(
@@ -1888,7 +1966,7 @@ def main(args):
}
prompt_embeds_input = prompt_embeds.repeat(elems_to_repeat_text_embeds, 1, 1)
model_pred = unet(
noisy_model_input,
inp_noisy_latents if args.do_edm_style_training else noisy_model_input,
timesteps,
prompt_embeds_input,
added_cond_kwargs=unet_added_conditions,
@@ -1906,14 +1984,42 @@ def main(args):
)
prompt_embeds_input = prompt_embeds.repeat(elems_to_repeat_text_embeds, 1, 1)
model_pred = unet(
noisy_model_input, timesteps, prompt_embeds_input, added_cond_kwargs=unet_added_conditions
inp_noisy_latents if args.do_edm_style_training else noisy_model_input,
timesteps,
prompt_embeds_input,
added_cond_kwargs=unet_added_conditions,
).sample
weighting = None
if args.do_edm_style_training:
# Similar to the input preconditioning, the model predictions are also preconditioned
# on noised model inputs (before preconditioning) and the sigmas.
# Follow: Section 5 of https://arxiv.org/abs/2206.00364.
if "EDM" in scheduler_type:
model_pred = noise_scheduler.precondition_outputs(noisy_model_input, model_pred, sigmas)
else:
if noise_scheduler.config.prediction_type == "epsilon":
model_pred = model_pred * (-sigmas) + noisy_model_input
elif noise_scheduler.config.prediction_type == "v_prediction":
model_pred = model_pred * (-sigmas / (sigmas**2 + 1) ** 0.5) + (
noisy_model_input / (sigmas**2 + 1)
)
# We are not doing weighting here because it tends result in numerical problems.
# See: https://github.com/huggingface/diffusers/pull/7126#issuecomment-1968523051
# There might be other alternatives for weighting as well:
# https://github.com/huggingface/diffusers/pull/7126#discussion_r1505404686
if "EDM" not in scheduler_type:
weighting = (sigmas**-2.0).float()
# Get the target for loss depending on the prediction type
if noise_scheduler.config.prediction_type == "epsilon":
target = noise
target = model_input if args.do_edm_style_training else noise
elif noise_scheduler.config.prediction_type == "v_prediction":
target = noise_scheduler.get_velocity(model_input, noise, timesteps)
target = (
model_input
if args.do_edm_style_training
else noise_scheduler.get_velocity(model_input, noise, timesteps)
)
else:
raise ValueError(f"Unknown prediction type {noise_scheduler.config.prediction_type}")
@@ -1923,10 +2029,28 @@ def main(args):
target, target_prior = torch.chunk(target, 2, dim=0)
# Compute prior loss
prior_loss = F.mse_loss(model_pred_prior.float(), target_prior.float(), reduction="mean")
if weighting is not None:
prior_loss = torch.mean(
(weighting.float() * (model_pred_prior.float() - target_prior.float()) ** 2).reshape(
target_prior.shape[0], -1
),
1,
)
prior_loss = prior_loss.mean()
else:
prior_loss = F.mse_loss(model_pred_prior.float(), target_prior.float(), reduction="mean")
if args.snr_gamma is None:
loss = F.mse_loss(model_pred.float(), target.float(), reduction="mean")
if weighting is not None:
loss = torch.mean(
(weighting.float() * (model_pred.float() - target.float()) ** 2).reshape(
target.shape[0], -1
),
1,
)
loss = loss.mean()
else:
loss = F.mse_loss(model_pred.float(), target.float(), reduction="mean")
else:
# Compute loss-weights as per Section 3.4 of https://arxiv.org/abs/2303.09556.
# Since we predict the noise instead of x_0, the original formulation is slightly changed.
@@ -2049,17 +2173,18 @@ def main(args):
# We train on the simplified learning objective. If we were previously predicting a variance, we need the scheduler to ignore it
scheduler_args = {}
if "variance_type" in pipeline.scheduler.config:
variance_type = pipeline.scheduler.config.variance_type
if not args.do_edm_style_training:
if "variance_type" in pipeline.scheduler.config:
variance_type = pipeline.scheduler.config.variance_type
if variance_type in ["learned", "learned_range"]:
variance_type = "fixed_small"
if variance_type in ["learned", "learned_range"]:
variance_type = "fixed_small"
scheduler_args["variance_type"] = variance_type
scheduler_args["variance_type"] = variance_type
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(
pipeline.scheduler.config, **scheduler_args
)
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(
pipeline.scheduler.config, **scheduler_args
)
pipeline = pipeline.to(accelerator.device)
pipeline.set_progress_bar_config(disable=True)
@@ -2067,8 +2192,13 @@ def main(args):
# run inference
generator = torch.Generator(device=accelerator.device).manual_seed(args.seed) if args.seed else None
pipeline_args = {"prompt": args.validation_prompt}
inference_ctx = (
contextlib.nullcontext()
if "playground" in args.pretrained_model_name_or_path
else torch.cuda.amp.autocast()
)
with torch.cuda.amp.autocast():
with inference_ctx:
images = [
pipeline(**pipeline_args, generator=generator).images[0]
for _ in range(args.num_validation_images)
@@ -2144,15 +2274,18 @@ def main(args):
# We train on the simplified learning objective. If we were previously predicting a variance, we need the scheduler to ignore it
scheduler_args = {}
if "variance_type" in pipeline.scheduler.config:
variance_type = pipeline.scheduler.config.variance_type
if not args.do_edm_style_training:
if "variance_type" in pipeline.scheduler.config:
variance_type = pipeline.scheduler.config.variance_type
if variance_type in ["learned", "learned_range"]:
variance_type = "fixed_small"
if variance_type in ["learned", "learned_range"]:
variance_type = "fixed_small"
scheduler_args["variance_type"] = variance_type
scheduler_args["variance_type"] = variance_type
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config, **scheduler_args)
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(
pipeline.scheduler.config, **scheduler_args
)
# load attention processors
pipeline.load_lora_weights(args.output_dir)
@@ -637,7 +637,7 @@ def main(args):
generator=generator,
batch_size=args.eval_batch_size,
num_inference_steps=args.ddpm_num_inference_steps,
output_type="numpy",
output_type="np",
).images
if args.use_ema:
@@ -425,6 +425,11 @@ def parse_args(input_args=None):
default=4,
help=("The dimension of the LoRA update matrices."),
)
parser.add_argument(
"--debug_loss",
action="store_true",
help="debug loss for each image, if filenames are awailable in the dataset",
)
if input_args is not None:
args = parser.parse_args(input_args)
@@ -603,6 +608,7 @@ def main(args):
# Move unet, vae and text_encoder to device and cast to weight_dtype
# The VAE is in float32 to avoid NaN losses.
unet.to(accelerator.device, dtype=weight_dtype)
if args.pretrained_vae_model_name_or_path is None:
vae.to(accelerator.device, dtype=torch.float32)
else:
@@ -890,13 +896,17 @@ def main(args):
tokens_one, tokens_two = tokenize_captions(examples)
examples["input_ids_one"] = tokens_one
examples["input_ids_two"] = tokens_two
if args.debug_loss:
fnames = [os.path.basename(image.filename) for image in examples[image_column] if image.filename]
if fnames:
examples["filenames"] = fnames
return examples
with accelerator.main_process_first():
if args.max_train_samples is not None:
dataset["train"] = dataset["train"].shuffle(seed=args.seed).select(range(args.max_train_samples))
# Set the training transforms
train_dataset = dataset["train"].with_transform(preprocess_train)
train_dataset = dataset["train"].with_transform(preprocess_train, output_all_columns=True)
def collate_fn(examples):
pixel_values = torch.stack([example["pixel_values"] for example in examples])
@@ -905,7 +915,7 @@ def main(args):
crop_top_lefts = [example["crop_top_lefts"] for example in examples]
input_ids_one = torch.stack([example["input_ids_one"] for example in examples])
input_ids_two = torch.stack([example["input_ids_two"] for example in examples])
return {
result = {
"pixel_values": pixel_values,
"input_ids_one": input_ids_one,
"input_ids_two": input_ids_two,
@@ -913,6 +923,11 @@ def main(args):
"crop_top_lefts": crop_top_lefts,
}
filenames = [example["filenames"] for example in examples if "filenames" in example]
if filenames:
result["filenames"] = filenames
return result
# DataLoaders creation:
train_dataloader = torch.utils.data.DataLoader(
train_dataset,
@@ -1105,7 +1120,9 @@ def main(args):
loss = F.mse_loss(model_pred.float(), target.float(), reduction="none")
loss = loss.mean(dim=list(range(1, len(loss.shape)))) * mse_loss_weights
loss = loss.mean()
if args.debug_loss and "filenames" in batch:
for fname in batch["filenames"]:
accelerator.log({"loss_for_" + fname: loss}, step=global_step)
# Gather the losses across all processes for logging (if we use distributed training).
avg_loss = accelerator.gather(loss.repeat(args.train_batch_size)).mean()
train_loss += avg_loss.item() / args.gradient_accumulation_steps
@@ -648,7 +648,7 @@ def main(args):
generator=generator,
batch_size=args.eval_batch_size,
num_inference_steps=args.ddpm_num_inference_steps,
output_type="numpy",
output_type="np",
).images
if args.use_ema:
+2 -1
View File
@@ -454,7 +454,8 @@ def set_image_size(pipeline_class_name, original_config, checkpoint, image_size=
model_type = infer_model_type(original_config, checkpoint, model_type)
if pipeline_class_name == "StableDiffusionUpscalePipeline":
return 512
image_size = original_config["model"]["params"]["unet_config"]["params"]["image_size"]
return image_size
elif model_type in ["SDXL", "SDXL-Refiner", "Playground"]:
image_size = 1024
+1 -1
View File
@@ -293,7 +293,7 @@ class BasicTransformerBlock(nn.Module):
) -> torch.FloatTensor:
if cross_attention_kwargs is not None:
if cross_attention_kwargs.get("scale", None) is not None:
logger.warning("Passing `scale` to `cross_attention_kwargs` is depcrecated. `scale` will be ignored.")
logger.warning("Passing `scale` to `cross_attention_kwargs` is deprecated. `scale` will be ignored.")
# Notice that normalization is always applied before the real computation in the following blocks.
# 0. Self-Attention
@@ -308,7 +308,7 @@ class Transformer2DModel(ModelMixin, ConfigMixin):
"""
if cross_attention_kwargs is not None:
if cross_attention_kwargs.get("scale", None) is not None:
logger.warning("Passing `scale` to `cross_attention_kwargs` is depcrecated. `scale` will be ignored.")
logger.warning("Passing `scale` to `cross_attention_kwargs` is deprecated. `scale` will be ignored.")
# ensure attention_mask is a bias, and give it a singleton query_tokens dimension.
# we may have done this conversion already, e.g. if we came here via UNet2DConditionModel#forward.
# we can tell by counting dims; if ndim == 2: it's a mask rather than a bias.
+9 -9
View File
@@ -846,7 +846,7 @@ class UNetMidBlock2DCrossAttn(nn.Module):
) -> torch.FloatTensor:
if cross_attention_kwargs is not None:
if cross_attention_kwargs.get("scale", None) is not None:
logger.warning("Passing `scale` to `cross_attention_kwargs` is depcrecated. `scale` will be ignored.")
logger.warning("Passing `scale` to `cross_attention_kwargs` is deprecated. `scale` will be ignored.")
hidden_states = self.resnets[0](hidden_states, temb)
for attn, resnet in zip(self.attentions, self.resnets[1:]):
@@ -986,7 +986,7 @@ class UNetMidBlock2DSimpleCrossAttn(nn.Module):
) -> torch.FloatTensor:
cross_attention_kwargs = cross_attention_kwargs if cross_attention_kwargs is not None else {}
if cross_attention_kwargs.get("scale", None) is not None:
logger.warning("Passing `scale` to `cross_attention_kwargs` is depcrecated. `scale` will be ignored.")
logger.warning("Passing `scale` to `cross_attention_kwargs` is deprecated. `scale` will be ignored.")
if attention_mask is None:
# if encoder_hidden_states is defined: we are doing cross-attn, so we should use cross-attn mask.
@@ -1116,7 +1116,7 @@ class AttnDownBlock2D(nn.Module):
) -> Tuple[torch.FloatTensor, Tuple[torch.FloatTensor, ...]]:
cross_attention_kwargs = cross_attention_kwargs if cross_attention_kwargs is not None else {}
if cross_attention_kwargs.get("scale", None) is not None:
logger.warning("Passing `scale` to `cross_attention_kwargs` is depcrecated. `scale` will be ignored.")
logger.warning("Passing `scale` to `cross_attention_kwargs` is deprecated. `scale` will be ignored.")
output_states = ()
@@ -1241,7 +1241,7 @@ class CrossAttnDownBlock2D(nn.Module):
) -> Tuple[torch.FloatTensor, Tuple[torch.FloatTensor, ...]]:
if cross_attention_kwargs is not None:
if cross_attention_kwargs.get("scale", None) is not None:
logger.warning("Passing `scale` to `cross_attention_kwargs` is depcrecated. `scale` will be ignored.")
logger.warning("Passing `scale` to `cross_attention_kwargs` is deprecated. `scale` will be ignored.")
output_states = ()
@@ -1986,7 +1986,7 @@ class SimpleCrossAttnDownBlock2D(nn.Module):
) -> Tuple[torch.FloatTensor, Tuple[torch.FloatTensor, ...]]:
cross_attention_kwargs = cross_attention_kwargs if cross_attention_kwargs is not None else {}
if cross_attention_kwargs.get("scale", None) is not None:
logger.warning("Passing `scale` to `cross_attention_kwargs` is depcrecated. `scale` will be ignored.")
logger.warning("Passing `scale` to `cross_attention_kwargs` is deprecated. `scale` will be ignored.")
output_states = ()
@@ -2201,7 +2201,7 @@ class KCrossAttnDownBlock2D(nn.Module):
) -> Tuple[torch.FloatTensor, Tuple[torch.FloatTensor, ...]]:
cross_attention_kwargs = cross_attention_kwargs if cross_attention_kwargs is not None else {}
if cross_attention_kwargs.get("scale", None) is not None:
logger.warning("Passing `scale` to `cross_attention_kwargs` is depcrecated. `scale` will be ignored.")
logger.warning("Passing `scale` to `cross_attention_kwargs` is deprecated. `scale` will be ignored.")
output_states = ()
@@ -2483,7 +2483,7 @@ class CrossAttnUpBlock2D(nn.Module):
) -> torch.FloatTensor:
if cross_attention_kwargs is not None:
if cross_attention_kwargs.get("scale", None) is not None:
logger.warning("Passing `scale` to `cross_attention_kwargs` is depcrecated. `scale` will be ignored.")
logger.warning("Passing `scale` to `cross_attention_kwargs` is deprecated. `scale` will be ignored.")
is_freeu_enabled = (
getattr(self, "s1", None)
@@ -3312,7 +3312,7 @@ class SimpleCrossAttnUpBlock2D(nn.Module):
) -> torch.FloatTensor:
cross_attention_kwargs = cross_attention_kwargs if cross_attention_kwargs is not None else {}
if cross_attention_kwargs.get("scale", None) is not None:
logger.warning("Passing `scale` to `cross_attention_kwargs` is depcrecated. `scale` will be ignored.")
logger.warning("Passing `scale` to `cross_attention_kwargs` is deprecated. `scale` will be ignored.")
if attention_mask is None:
# if encoder_hidden_states is defined: we are doing cross-attn, so we should use cross-attn mask.
@@ -3694,7 +3694,7 @@ class KAttentionBlock(nn.Module):
) -> torch.FloatTensor:
cross_attention_kwargs = cross_attention_kwargs if cross_attention_kwargs is not None else {}
if cross_attention_kwargs.get("scale", None) is not None:
logger.warning("Passing `scale` to `cross_attention_kwargs` is depcrecated. `scale` will be ignored.")
logger.warning("Passing `scale` to `cross_attention_kwargs` is deprecated. `scale` will be ignored.")
# 1. Self-Attention
if self.add_self_attention:
+3 -3
View File
@@ -1183,7 +1183,7 @@ class CrossAttnDownBlockMotion(nn.Module):
):
if cross_attention_kwargs is not None:
if cross_attention_kwargs.get("scale", None) is not None:
logger.warning("Passing `scale` to `cross_attention_kwargs` is depcrecated. `scale` will be ignored.")
logger.warning("Passing `scale` to `cross_attention_kwargs` is deprecated. `scale` will be ignored.")
output_states = ()
@@ -1367,7 +1367,7 @@ class CrossAttnUpBlockMotion(nn.Module):
) -> torch.FloatTensor:
if cross_attention_kwargs is not None:
if cross_attention_kwargs.get("scale", None) is not None:
logger.warning("Passing `scale` to `cross_attention_kwargs` is depcrecated. `scale` will be ignored.")
logger.warning("Passing `scale` to `cross_attention_kwargs` is deprecated. `scale` will be ignored.")
is_freeu_enabled = (
getattr(self, "s1", None)
@@ -1707,7 +1707,7 @@ class UNetMidBlockCrossAttnMotion(nn.Module):
) -> torch.FloatTensor:
if cross_attention_kwargs is not None:
if cross_attention_kwargs.get("scale", None) is not None:
logger.warning("Passing `scale` to `cross_attention_kwargs` is depcrecated. `scale` will be ignored.")
logger.warning("Passing `scale` to `cross_attention_kwargs` is deprecated. `scale` will be ignored.")
hidden_states = self.resnets[0](hidden_states, temb)
@@ -127,7 +127,7 @@ class AmusedImg2ImgPipeline(DiffusionPipeline):
on the amount of noise initially added. When `strength` is 1, added noise is maximum and the denoising
process runs for the full number of iterations specified in `num_inference_steps`. A value of 1
essentially ignores `image`.
num_inference_steps (`int`, *optional*, defaults to 16):
num_inference_steps (`int`, *optional*, defaults to 12):
The number of denoising steps. More denoising steps usually lead to a higher quality image at the
expense of slower inference.
guidance_scale (`float`, *optional*, defaults to 10.0):
@@ -191,7 +191,7 @@ class AmusedImg2ImgPipeline(DiffusionPipeline):
negative_prompt_embeds is None and negative_encoder_hidden_states is not None
):
raise ValueError(
"pass either both `negatve_prompt_embeds` and `negative_encoder_hidden_states` or neither"
"pass either both `negative_prompt_embeds` and `negative_encoder_hidden_states` or neither"
)
if (prompt is None and prompt_embeds is None) or (prompt is not None and prompt_embeds is not None):
@@ -824,20 +824,22 @@ class StableDiffusionControlNetPipeline(
return latents
# Copied from diffusers.pipelines.latent_consistency_models.pipeline_latent_consistency_text2img.LatentConsistencyModelPipeline.get_guidance_scale_embedding
def get_guidance_scale_embedding(self, w, embedding_dim=512, dtype=torch.float32):
def get_guidance_scale_embedding(
self, w: torch.Tensor, embedding_dim: int = 512, dtype: torch.dtype = torch.float32
) -> torch.FloatTensor:
"""
See https://github.com/google-research/vdm/blob/dc27b98a554f65cdc654b800da5aa1846545d41b/model_vdm.py#L298
Args:
timesteps (`torch.Tensor`):
generate embedding vectors at these timesteps
w (`torch.Tensor`):
Generate embedding vectors with a specified guidance scale to subsequently enrich timestep embeddings.
embedding_dim (`int`, *optional*, defaults to 512):
dimension of the embeddings to generate
dtype:
data type of the generated embeddings
Dimension of the embeddings to generate.
dtype (`torch.dtype`, *optional*, defaults to `torch.float32`):
Data type of the generated embeddings.
Returns:
`torch.FloatTensor`: Embedding vectors with shape `(len(timesteps), embedding_dim)`
`torch.FloatTensor`: Embedding vectors with shape `(len(w), embedding_dim)`.
"""
assert len(w.shape) == 1
w = w * 1000.0
@@ -869,20 +869,22 @@ class StableDiffusionXLControlNetPipeline(
self.vae.decoder.mid_block.to(dtype)
# Copied from diffusers.pipelines.latent_consistency_models.pipeline_latent_consistency_text2img.LatentConsistencyModelPipeline.get_guidance_scale_embedding
def get_guidance_scale_embedding(self, w, embedding_dim=512, dtype=torch.float32):
def get_guidance_scale_embedding(
self, w: torch.Tensor, embedding_dim: int = 512, dtype: torch.dtype = torch.float32
) -> torch.FloatTensor:
"""
See https://github.com/google-research/vdm/blob/dc27b98a554f65cdc654b800da5aa1846545d41b/model_vdm.py#L298
Args:
timesteps (`torch.Tensor`):
generate embedding vectors at these timesteps
w (`torch.Tensor`):
Generate embedding vectors with a specified guidance scale to subsequently enrich timestep embeddings.
embedding_dim (`int`, *optional*, defaults to 512):
dimension of the embeddings to generate
dtype:
data type of the generated embeddings
Dimension of the embeddings to generate.
dtype (`torch.dtype`, *optional*, defaults to `torch.float32`):
Data type of the generated embeddings.
Returns:
`torch.FloatTensor`: Embedding vectors with shape `(len(timesteps), embedding_dim)`
`torch.FloatTensor`: Embedding vectors with shape `(len(w), embedding_dim)`.
"""
assert len(w.shape) == 1
w = w * 1000.0
@@ -133,7 +133,7 @@ class SpectrogramDiffusionPipeline(DiffusionPipeline):
generator: Optional[torch.Generator] = None,
num_inference_steps: int = 100,
return_dict: bool = True,
output_type: str = "numpy",
output_type: str = "np",
callback: Optional[Callable[[int, int, torch.FloatTensor], None]] = None,
callback_steps: int = 1,
) -> Union[AudioPipelineOutput, Tuple]:
@@ -157,7 +157,7 @@ class SpectrogramDiffusionPipeline(DiffusionPipeline):
expense of slower inference.
return_dict (`bool`, *optional*, defaults to `True`):
Whether or not to return a [`~pipelines.AudioPipelineOutput`] instead of a plain tuple.
output_type (`str`, *optional*, defaults to `"numpy"`):
output_type (`str`, *optional*, defaults to `"np"`):
The output format of the generated audio.
callback (`Callable`, *optional*):
A function that calls every `callback_steps` steps during inference. The function is called with the
@@ -249,16 +249,16 @@ class SpectrogramDiffusionPipeline(DiffusionPipeline):
logger.info("Generated segment", i)
if output_type == "numpy" and not is_onnx_available():
if output_type == "np" and not is_onnx_available():
raise ValueError(
"Cannot return output in 'np' format if ONNX is not available. Make sure to have ONNX installed or set 'output_type' to 'mel'."
)
elif output_type == "numpy" and self.melgan is None:
elif output_type == "np" and self.melgan is None:
raise ValueError(
"Cannot return output in 'np' format if melgan component is not defined. Make sure to define `self.melgan` or set 'output_type' to 'mel'."
)
if output_type == "numpy":
if output_type == "np":
output = self.melgan(input_features=full_pred_mel.astype(np.float32))
else:
output = full_pred_mel
@@ -2004,7 +2004,7 @@ class CrossAttnUpBlockFlat(nn.Module):
) -> torch.FloatTensor:
if cross_attention_kwargs is not None:
if cross_attention_kwargs.get("scale", None) is not None:
logger.warning("Passing `scale` to `cross_attention_kwargs` is depcrecated. `scale` will be ignored.")
logger.warning("Passing `scale` to `cross_attention_kwargs` is deprecated. `scale` will be ignored.")
is_freeu_enabled = (
getattr(self, "s1", None)
@@ -2338,7 +2338,7 @@ class UNetMidBlockFlatCrossAttn(nn.Module):
) -> torch.FloatTensor:
if cross_attention_kwargs is not None:
if cross_attention_kwargs.get("scale", None) is not None:
logger.warning("Passing `scale` to `cross_attention_kwargs` is depcrecated. `scale` will be ignored.")
logger.warning("Passing `scale` to `cross_attention_kwargs` is deprecated. `scale` will be ignored.")
hidden_states = self.resnets[0](hidden_states, temb)
for attn, resnet in zip(self.attentions, self.resnets[1:]):
@@ -2479,7 +2479,7 @@ class UNetMidBlockFlatSimpleCrossAttn(nn.Module):
) -> torch.FloatTensor:
cross_attention_kwargs = cross_attention_kwargs if cross_attention_kwargs is not None else {}
if cross_attention_kwargs.get("scale", None) is not None:
logger.warning("Passing `scale` to `cross_attention_kwargs` is depcrecated. `scale` will be ignored.")
logger.warning("Passing `scale` to `cross_attention_kwargs` is deprecated. `scale` will be ignored.")
if attention_mask is None:
# if encoder_hidden_states is defined: we are doing cross-attn, so we should use cross-attn mask.
@@ -548,20 +548,22 @@ class LatentConsistencyModelImg2ImgPipeline(
return latents
# Copied from diffusers.pipelines.latent_consistency_models.pipeline_latent_consistency_text2img.LatentConsistencyModelPipeline.get_guidance_scale_embedding
def get_guidance_scale_embedding(self, w, embedding_dim=512, dtype=torch.float32):
def get_guidance_scale_embedding(
self, w: torch.Tensor, embedding_dim: int = 512, dtype: torch.dtype = torch.float32
) -> torch.FloatTensor:
"""
See https://github.com/google-research/vdm/blob/dc27b98a554f65cdc654b800da5aa1846545d41b/model_vdm.py#L298
Args:
timesteps (`torch.Tensor`):
generate embedding vectors at these timesteps
w (`torch.Tensor`):
Generate embedding vectors with a specified guidance scale to subsequently enrich timestep embeddings.
embedding_dim (`int`, *optional*, defaults to 512):
dimension of the embeddings to generate
dtype:
data type of the generated embeddings
Dimension of the embeddings to generate.
dtype (`torch.dtype`, *optional*, defaults to `torch.float32`):
Data type of the generated embeddings.
Returns:
`torch.FloatTensor`: Embedding vectors with shape `(len(timesteps), embedding_dim)`
`torch.FloatTensor`: Embedding vectors with shape `(len(w), embedding_dim)`.
"""
assert len(w.shape) == 1
w = w * 1000.0
@@ -490,20 +490,22 @@ class LatentConsistencyModelPipeline(
latents = latents * self.scheduler.init_noise_sigma
return latents
def get_guidance_scale_embedding(self, w, embedding_dim=512, dtype=torch.float32):
def get_guidance_scale_embedding(
self, w: torch.Tensor, embedding_dim: int = 512, dtype: torch.dtype = torch.float32
) -> torch.FloatTensor:
"""
See https://github.com/google-research/vdm/blob/dc27b98a554f65cdc654b800da5aa1846545d41b/model_vdm.py#L298
Args:
timesteps (`torch.Tensor`):
generate embedding vectors at these timesteps
w (`torch.Tensor`):
Generate embedding vectors with a specified guidance scale to subsequently enrich timestep embeddings.
embedding_dim (`int`, *optional*, defaults to 512):
dimension of the embeddings to generate
dtype:
data type of the generated embeddings
Dimension of the embeddings to generate.
dtype (`torch.dtype`, *optional*, defaults to `torch.float32`):
Data type of the generated embeddings.
Returns:
`torch.FloatTensor`: Embedding vectors with shape `(len(timesteps), embedding_dim)`
`torch.FloatTensor`: Embedding vectors with shape `(len(w), embedding_dim)`.
"""
assert len(w.shape) == 1
w = w * 1000.0
@@ -713,20 +713,22 @@ class LEditsPPPipelineStableDiffusionXL(
self.vae.decoder.mid_block.to(dtype)
# Copied from diffusers.pipelines.latent_consistency_models.pipeline_latent_consistency_text2img.LatentConsistencyModelPipeline.get_guidance_scale_embedding
def get_guidance_scale_embedding(self, w, embedding_dim=512, dtype=torch.float32):
def get_guidance_scale_embedding(
self, w: torch.Tensor, embedding_dim: int = 512, dtype: torch.dtype = torch.float32
) -> torch.FloatTensor:
"""
See https://github.com/google-research/vdm/blob/dc27b98a554f65cdc654b800da5aa1846545d41b/model_vdm.py#L298
Args:
timesteps (`torch.Tensor`):
generate embedding vectors at these timesteps
w (`torch.Tensor`):
Generate embedding vectors with a specified guidance scale to subsequently enrich timestep embeddings.
embedding_dim (`int`, *optional*, defaults to 512):
dimension of the embeddings to generate
dtype:
data type of the generated embeddings
Dimension of the embeddings to generate.
dtype (`torch.dtype`, *optional*, defaults to `torch.float32`):
Data type of the generated embeddings.
Returns:
`torch.FloatTensor`: Embedding vectors with shape `(len(timesteps), embedding_dim)`
`torch.FloatTensor`: Embedding vectors with shape `(len(w), embedding_dim)`.
"""
assert len(w.shape) == 1
w = w * 1000.0
@@ -669,20 +669,22 @@ class StableDiffusionPipeline(
return latents
# Copied from diffusers.pipelines.latent_consistency_models.pipeline_latent_consistency_text2img.LatentConsistencyModelPipeline.get_guidance_scale_embedding
def get_guidance_scale_embedding(self, w, embedding_dim=512, dtype=torch.float32):
def get_guidance_scale_embedding(
self, w: torch.Tensor, embedding_dim: int = 512, dtype: torch.dtype = torch.float32
) -> torch.FloatTensor:
"""
See https://github.com/google-research/vdm/blob/dc27b98a554f65cdc654b800da5aa1846545d41b/model_vdm.py#L298
Args:
timesteps (`torch.Tensor`):
generate embedding vectors at these timesteps
w (`torch.Tensor`):
Generate embedding vectors with a specified guidance scale to subsequently enrich timestep embeddings.
embedding_dim (`int`, *optional*, defaults to 512):
dimension of the embeddings to generate
dtype:
data type of the generated embeddings
Dimension of the embeddings to generate.
dtype (`torch.dtype`, *optional*, defaults to `torch.float32`):
Data type of the generated embeddings.
Returns:
`torch.FloatTensor`: Embedding vectors with shape `(len(timesteps), embedding_dim)`
`torch.FloatTensor`: Embedding vectors with shape `(len(w), embedding_dim)`.
"""
assert len(w.shape) == 1
w = w * 1000.0
@@ -758,7 +758,6 @@ class StableDiffusionImg2ImgPipeline(
init_latents = torch.cat([init_latents], dim=0)
shape = init_latents.shape
print(shape)
noise = randn_tensor(shape, generator=generator, device=device, dtype=dtype)
# get latents
@@ -768,20 +767,22 @@ class StableDiffusionImg2ImgPipeline(
return latents
# Copied from diffusers.pipelines.latent_consistency_models.pipeline_latent_consistency_text2img.LatentConsistencyModelPipeline.get_guidance_scale_embedding
def get_guidance_scale_embedding(self, w, embedding_dim=512, dtype=torch.float32):
def get_guidance_scale_embedding(
self, w: torch.Tensor, embedding_dim: int = 512, dtype: torch.dtype = torch.float32
) -> torch.FloatTensor:
"""
See https://github.com/google-research/vdm/blob/dc27b98a554f65cdc654b800da5aa1846545d41b/model_vdm.py#L298
Args:
timesteps (`torch.Tensor`):
generate embedding vectors at these timesteps
w (`torch.Tensor`):
Generate embedding vectors with a specified guidance scale to subsequently enrich timestep embeddings.
embedding_dim (`int`, *optional*, defaults to 512):
dimension of the embeddings to generate
dtype:
data type of the generated embeddings
Dimension of the embeddings to generate.
dtype (`torch.dtype`, *optional*, defaults to `torch.float32`):
Data type of the generated embeddings.
Returns:
`torch.FloatTensor`: Embedding vectors with shape `(len(timesteps), embedding_dim)`
`torch.FloatTensor`: Embedding vectors with shape `(len(w), embedding_dim)`.
"""
assert len(w.shape) == 1
w = w * 1000.0
@@ -909,20 +909,22 @@ class StableDiffusionInpaintPipeline(
return timesteps, num_inference_steps - t_start
# Copied from diffusers.pipelines.latent_consistency_models.pipeline_latent_consistency_text2img.LatentConsistencyModelPipeline.get_guidance_scale_embedding
def get_guidance_scale_embedding(self, w, embedding_dim=512, dtype=torch.float32):
def get_guidance_scale_embedding(
self, w: torch.Tensor, embedding_dim: int = 512, dtype: torch.dtype = torch.float32
) -> torch.FloatTensor:
"""
See https://github.com/google-research/vdm/blob/dc27b98a554f65cdc654b800da5aa1846545d41b/model_vdm.py#L298
Args:
timesteps (`torch.Tensor`):
generate embedding vectors at these timesteps
w (`torch.Tensor`):
Generate embedding vectors with a specified guidance scale to subsequently enrich timestep embeddings.
embedding_dim (`int`, *optional*, defaults to 512):
dimension of the embeddings to generate
dtype:
data type of the generated embeddings
Dimension of the embeddings to generate.
dtype (`torch.dtype`, *optional*, defaults to `torch.float32`):
Data type of the generated embeddings.
Returns:
`torch.FloatTensor`: Embedding vectors with shape `(len(timesteps), embedding_dim)`
`torch.FloatTensor`: Embedding vectors with shape `(len(w), embedding_dim)`.
"""
assert len(w.shape) == 1
w = w * 1000.0
@@ -1304,7 +1304,7 @@ class StableDiffusionDiffEditPipeline(
callback: Optional[Callable[[int, int, torch.FloatTensor], None]] = None,
callback_steps: int = 1,
cross_attention_kwargs: Optional[Dict[str, Any]] = None,
clip_ckip: int = None,
clip_skip: int = None,
):
r"""
The call function to the pipeline for generation.
@@ -1426,7 +1426,7 @@ class StableDiffusionDiffEditPipeline(
prompt_embeds=prompt_embeds,
negative_prompt_embeds=negative_prompt_embeds,
lora_scale=text_encoder_lora_scale,
clip_skip=clip_ckip,
clip_skip=clip_skip,
)
# For classifier free guidance, we need to do two forward passes.
# Here we concatenate the unconditional and text embeddings into a single batch
@@ -644,20 +644,22 @@ class StableDiffusionLDM3DPipeline(
return latents
# Copied from diffusers.pipelines.latent_consistency_models.pipeline_latent_consistency_text2img.LatentConsistencyModelPipeline.get_guidance_scale_embedding
def get_guidance_scale_embedding(self, w, embedding_dim=512, dtype=torch.float32):
def get_guidance_scale_embedding(
self, w: torch.Tensor, embedding_dim: int = 512, dtype: torch.dtype = torch.float32
) -> torch.FloatTensor:
"""
See https://github.com/google-research/vdm/blob/dc27b98a554f65cdc654b800da5aa1846545d41b/model_vdm.py#L298
Args:
timesteps (`torch.Tensor`):
generate embedding vectors at these timesteps
w (`torch.Tensor`):
Generate embedding vectors with a specified guidance scale to subsequently enrich timestep embeddings.
embedding_dim (`int`, *optional*, defaults to 512):
dimension of the embeddings to generate
dtype:
data type of the generated embeddings
Dimension of the embeddings to generate.
dtype (`torch.dtype`, *optional*, defaults to `torch.float32`):
Data type of the generated embeddings.
Returns:
`torch.FloatTensor`: Embedding vectors with shape `(len(timesteps), embedding_dim)`
`torch.FloatTensor`: Embedding vectors with shape `(len(w), embedding_dim)`.
"""
assert len(w.shape) == 1
w = w * 1000.0
@@ -632,7 +632,7 @@ class StableDiffusionSAGPipeline(DiffusionPipeline, StableDiffusionMixin, Textua
# corresponds to doing no classifier free guidance.
do_classifier_free_guidance = guidance_scale > 1.0
# and `sag_scale` is` `s` of equation (16)
# of the self-attentnion guidance paper: https://arxiv.org/pdf/2210.00939.pdf
# of the self-attention guidance paper: https://arxiv.org/pdf/2210.00939.pdf
# `sag_scale = 0` means no self-attention guidance
do_self_attention_guidance = sag_scale > 0.0
@@ -667,7 +667,7 @@ class StableDiffusionSAGPipeline(DiffusionPipeline, StableDiffusionMixin, Textua
if timesteps.dtype not in [torch.int16, torch.int32, torch.int64]:
raise ValueError(
f"{self.__class__.__name__} does not support using a scheduler of type {self.scheduler.__class__.__name__}. Please make sure to use one of 'DDIMScheduler, PNDMScheduler, DDPMScheduler, DEISMultistepScheduler, UniPCMultistepScheduler, DPMSolverMultistepScheduler, DPMSolverSinlgestepScheduler'."
f"{self.__class__.__name__} does not support using a scheduler of type {self.scheduler.__class__.__name__}. Please make sure to use one of 'DDIMScheduler, PNDMScheduler, DDPMScheduler, DEISMultistepScheduler, UniPCMultistepScheduler, DPMSolverMultistepScheduler, DPMSolverSinglestepScheduler'."
)
# 5. Prepare latent variables
@@ -723,7 +723,7 @@ class StableDiffusionSAGPipeline(DiffusionPipeline, StableDiffusionMixin, Textua
noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)
# perform self-attention guidance with the stored self-attentnion map
# perform self-attention guidance with the stored self-attention map
if do_self_attention_guidance:
# classifier-free guidance produces two chunks of attention map
# and we only use unconditional one according to equation (25)
@@ -740,20 +740,22 @@ class StableDiffusionXLPipeline(
self.vae.decoder.mid_block.to(dtype)
# Copied from diffusers.pipelines.latent_consistency_models.pipeline_latent_consistency_text2img.LatentConsistencyModelPipeline.get_guidance_scale_embedding
def get_guidance_scale_embedding(self, w, embedding_dim=512, dtype=torch.float32):
def get_guidance_scale_embedding(
self, w: torch.Tensor, embedding_dim: int = 512, dtype: torch.dtype = torch.float32
) -> torch.FloatTensor:
"""
See https://github.com/google-research/vdm/blob/dc27b98a554f65cdc654b800da5aa1846545d41b/model_vdm.py#L298
Args:
timesteps (`torch.Tensor`):
generate embedding vectors at these timesteps
w (`torch.Tensor`):
Generate embedding vectors with a specified guidance scale to subsequently enrich timestep embeddings.
embedding_dim (`int`, *optional*, defaults to 512):
dimension of the embeddings to generate
dtype:
data type of the generated embeddings
Dimension of the embeddings to generate.
dtype (`torch.dtype`, *optional*, defaults to `torch.float32`):
Data type of the generated embeddings.
Returns:
`torch.FloatTensor`: Embedding vectors with shape `(len(timesteps), embedding_dim)`
`torch.FloatTensor`: Embedding vectors with shape `(len(w), embedding_dim)`.
"""
assert len(w.shape) == 1
w = w * 1000.0
@@ -874,20 +874,22 @@ class StableDiffusionXLImg2ImgPipeline(
self.vae.decoder.mid_block.to(dtype)
# Copied from diffusers.pipelines.latent_consistency_models.pipeline_latent_consistency_text2img.LatentConsistencyModelPipeline.get_guidance_scale_embedding
def get_guidance_scale_embedding(self, w, embedding_dim=512, dtype=torch.float32):
def get_guidance_scale_embedding(
self, w: torch.Tensor, embedding_dim: int = 512, dtype: torch.dtype = torch.float32
) -> torch.FloatTensor:
"""
See https://github.com/google-research/vdm/blob/dc27b98a554f65cdc654b800da5aa1846545d41b/model_vdm.py#L298
Args:
timesteps (`torch.Tensor`):
generate embedding vectors at these timesteps
w (`torch.Tensor`):
Generate embedding vectors with a specified guidance scale to subsequently enrich timestep embeddings.
embedding_dim (`int`, *optional*, defaults to 512):
dimension of the embeddings to generate
dtype:
data type of the generated embeddings
Dimension of the embeddings to generate.
dtype (`torch.dtype`, *optional*, defaults to `torch.float32`):
Data type of the generated embeddings.
Returns:
`torch.FloatTensor`: Embedding vectors with shape `(len(timesteps), embedding_dim)`
`torch.FloatTensor`: Embedding vectors with shape `(len(w), embedding_dim)`.
"""
assert len(w.shape) == 1
w = w * 1000.0
@@ -1110,20 +1110,22 @@ class StableDiffusionXLInpaintPipeline(
self.vae.decoder.mid_block.to(dtype)
# Copied from diffusers.pipelines.latent_consistency_models.pipeline_latent_consistency_text2img.LatentConsistencyModelPipeline.get_guidance_scale_embedding
def get_guidance_scale_embedding(self, w, embedding_dim=512, dtype=torch.float32):
def get_guidance_scale_embedding(
self, w: torch.Tensor, embedding_dim: int = 512, dtype: torch.dtype = torch.float32
) -> torch.FloatTensor:
"""
See https://github.com/google-research/vdm/blob/dc27b98a554f65cdc654b800da5aa1846545d41b/model_vdm.py#L298
Args:
timesteps (`torch.Tensor`):
generate embedding vectors at these timesteps
w (`torch.Tensor`):
Generate embedding vectors with a specified guidance scale to subsequently enrich timestep embeddings.
embedding_dim (`int`, *optional*, defaults to 512):
dimension of the embeddings to generate
dtype:
data type of the generated embeddings
Dimension of the embeddings to generate.
dtype (`torch.dtype`, *optional*, defaults to `torch.float32`):
Data type of the generated embeddings.
Returns:
`torch.FloatTensor`: Embedding vectors with shape `(len(timesteps), embedding_dim)`
`torch.FloatTensor`: Embedding vectors with shape `(len(w), embedding_dim)`.
"""
assert len(w.shape) == 1
w = w * 1000.0
@@ -613,20 +613,22 @@ class StableDiffusionAdapterPipeline(DiffusionPipeline, StableDiffusionMixin):
return height, width
# Copied from diffusers.pipelines.latent_consistency_models.pipeline_latent_consistency_text2img.LatentConsistencyModelPipeline.get_guidance_scale_embedding
def get_guidance_scale_embedding(self, w, embedding_dim=512, dtype=torch.float32):
def get_guidance_scale_embedding(
self, w: torch.Tensor, embedding_dim: int = 512, dtype: torch.dtype = torch.float32
) -> torch.FloatTensor:
"""
See https://github.com/google-research/vdm/blob/dc27b98a554f65cdc654b800da5aa1846545d41b/model_vdm.py#L298
Args:
timesteps (`torch.Tensor`):
generate embedding vectors at these timesteps
w (`torch.Tensor`):
Generate embedding vectors with a specified guidance scale to subsequently enrich timestep embeddings.
embedding_dim (`int`, *optional*, defaults to 512):
dimension of the embeddings to generate
dtype:
data type of the generated embeddings
Dimension of the embeddings to generate.
dtype (`torch.dtype`, *optional*, defaults to `torch.float32`):
Data type of the generated embeddings.
Returns:
`torch.FloatTensor`: Embedding vectors with shape `(len(timesteps), embedding_dim)`
`torch.FloatTensor`: Embedding vectors with shape `(len(w), embedding_dim)`.
"""
assert len(w.shape) == 1
w = w * 1000.0
@@ -784,20 +784,22 @@ class StableDiffusionXLAdapterPipeline(
return height, width
# Copied from diffusers.pipelines.latent_consistency_models.pipeline_latent_consistency_text2img.LatentConsistencyModelPipeline.get_guidance_scale_embedding
def get_guidance_scale_embedding(self, w, embedding_dim=512, dtype=torch.float32):
def get_guidance_scale_embedding(
self, w: torch.Tensor, embedding_dim: int = 512, dtype: torch.dtype = torch.float32
) -> torch.FloatTensor:
"""
See https://github.com/google-research/vdm/blob/dc27b98a554f65cdc654b800da5aa1846545d41b/model_vdm.py#L298
Args:
timesteps (`torch.Tensor`):
generate embedding vectors at these timesteps
w (`torch.Tensor`):
Generate embedding vectors with a specified guidance scale to subsequently enrich timestep embeddings.
embedding_dim (`int`, *optional*, defaults to 512):
dimension of the embeddings to generate
dtype:
data type of the generated embeddings
Dimension of the embeddings to generate.
dtype (`torch.dtype`, *optional*, defaults to `torch.float32`):
Data type of the generated embeddings.
Returns:
`torch.FloatTensor`: Embedding vectors with shape `(len(timesteps), embedding_dim)`
`torch.FloatTensor`: Embedding vectors with shape `(len(w), embedding_dim)`.
"""
assert len(w.shape) == 1
w = w * 1000.0
@@ -575,8 +575,8 @@ class TextToVideoZeroPipeline(DiffusionPipeline, StableDiffusionMixin, TextualIn
Pre-generated noisy latents sampled from a Gaussian distribution, to be used as inputs for video
generation. Can be used to tweak the same generation with different prompts. If not provided, a latents
tensor is generated by sampling using the supplied random `generator`.
output_type (`str`, *optional*, defaults to `"numpy"`):
The output format of the generated video. Choose between `"latent"` and `"numpy"`.
output_type (`str`, *optional*, defaults to `"np"`):
The output format of the generated video. Choose between `"latent"` and `"np"`.
return_dict (`bool`, *optional*, defaults to `True`):
Whether or not to return a
[`~pipelines.text_to_video_synthesis.pipeline_text_to_video_zero.TextToVideoPipelineOutput`] instead of
@@ -223,6 +223,8 @@ class DPMSolverSinglestepScheduler(SchedulerMixin, ConfigMixin):
"""
steps = num_inference_steps
order = self.config.solver_order
if order > 3:
raise ValueError("Order > 3 is not supported by this scheduler")
if self.config.lower_order_final:
if order == 3:
if steps % 3 == 0:
+1 -1
View File
@@ -829,7 +829,7 @@ class AutoencoderKLIntegrationTests(unittest.TestCase):
"https://huggingface.co/stabilityai/sd-vae-ft-mse-original/blob/main/vae-ft-mse-840000-ema-pruned.safetensors",
)
assert vae_default.config.scaling_factor == 0.18125
assert vae_default.config.scaling_factor == 0.18215
assert vae_default.config.sample_size == 512
assert vae_default.dtype == torch.float32
@@ -50,9 +50,7 @@ class StableCascadeUNetModelSlowTests(unittest.TestCase):
gc.collect()
torch.cuda.empty_cache()
unet = StableCascadeUNet.from_pretrained(
"stabilityai/stable-cascade-prior", subfolder="prior", revision="refs/pr/2", variant="bf16"
)
unet = StableCascadeUNet.from_pretrained("stabilityai/stable-cascade-prior", subfolder="prior", variant="bf16")
unet_config = unet.config
del unet
gc.collect()
@@ -74,9 +72,7 @@ class StableCascadeUNetModelSlowTests(unittest.TestCase):
gc.collect()
torch.cuda.empty_cache()
unet = StableCascadeUNet.from_pretrained(
"stabilityai/stable-cascade", subfolder="decoder", revision="refs/pr/44", variant="bf16"
)
unet = StableCascadeUNet.from_pretrained("stabilityai/stable-cascade", subfolder="decoder", variant="bf16")
unet_config = unet.config
del unet
gc.collect()
+10 -3
View File
@@ -211,7 +211,7 @@ class ControlNetPipelineFastTests(
"generator": generator,
"num_inference_steps": 2,
"guidance_scale": 6.0,
"output_type": "numpy",
"output_type": "np",
"image": image,
}
@@ -402,7 +402,7 @@ class StableDiffusionMultiControlNetPipelineFastTests(
"generator": generator,
"num_inference_steps": 2,
"guidance_scale": 6.0,
"output_type": "numpy",
"output_type": "np",
"image": images,
}
@@ -602,7 +602,7 @@ class StableDiffusionMultiControlNetOneModelPipelineFastTests(
"generator": generator,
"num_inference_steps": 2,
"guidance_scale": 6.0,
"output_type": "numpy",
"output_type": "np",
"image": images,
}
@@ -1092,6 +1092,13 @@ class ControlNetPipelineSlowTests(unittest.TestCase):
for param_name, param_value in single_file_pipe.controlnet.config.items():
if param_name in PARAMS_TO_IGNORE:
continue
# This parameter doesn't appear to be loaded from the config.
# So when it is registered to config, it remains a tuple as this is the default in the class definition
# from_pretrained, does load from config and converts to a list when registering to config
if param_name == "conditioning_embedding_out_channels" and isinstance(param_value, tuple):
param_value = list(param_value)
assert (
pipe.controlnet.config[param_name] == param_value
), f"{param_name} differs between single file loading and pretrained loading"
@@ -164,7 +164,7 @@ class ControlNetImg2ImgPipelineFastTests(
"generator": generator,
"num_inference_steps": 2,
"guidance_scale": 6.0,
"output_type": "numpy",
"output_type": "np",
"image": image,
"control_image": control_image,
}
@@ -313,7 +313,7 @@ class StableDiffusionMultiControlNetPipelineFastTests(
"generator": generator,
"num_inference_steps": 2,
"guidance_scale": 6.0,
"output_type": "numpy",
"output_type": "np",
"image": image,
"control_image": control_image,
}
@@ -155,7 +155,7 @@ class ControlNetInpaintPipelineFastTests(
"generator": generator,
"num_inference_steps": 2,
"guidance_scale": 6.0,
"output_type": "numpy",
"output_type": "np",
"image": image,
"mask_image": mask_image,
"control_image": control_image,
@@ -375,7 +375,7 @@ class MultiControlNetInpaintPipelineFastTests(
"generator": generator,
"num_inference_steps": 2,
"guidance_scale": 6.0,
"output_type": "numpy",
"output_type": "np",
"image": image,
"mask_image": mask_image,
"control_image": control_image,
@@ -172,7 +172,7 @@ class ControlNetPipelineSDXLFastTests(
"generator": generator,
"num_inference_steps": 2,
"guidance_scale": 6.0,
"output_type": "numpy",
"output_type": "np",
"image": init_image,
"mask_image": mask_image,
"control_image": control_image,
@@ -1002,6 +1002,11 @@ class ControlNetSDXLPipelineSlowTests(unittest.TestCase):
for param_name, param_value in single_file_pipe.unet.config.items():
if param_name in PARAMS_TO_IGNORE:
continue
# Upcast attention might be set to None in a config file, which is incorrect. It should default to False in the model
if param_name == "upcast_attention" and pipe.unet.config[param_name] is None:
pipe.unet.config[param_name] = False
assert (
pipe.unet.config[param_name] == param_value
), f"{param_name} differs between single file loading and pretrained loading"
@@ -163,7 +163,7 @@ class ControlNetPipelineSDXLImg2ImgFastTests(
"generator": generator,
"num_inference_steps": 2,
"guidance_scale": 6.0,
"output_type": "numpy",
"output_type": "np",
"image": image,
"control_image": image,
}
+3 -3
View File
@@ -63,7 +63,7 @@ class DDIMPipelineFastTests(PipelineTesterMixin, unittest.TestCase):
"batch_size": 1,
"generator": generator,
"num_inference_steps": 2,
"output_type": "numpy",
"output_type": "np",
}
return inputs
@@ -113,7 +113,7 @@ class DDIMPipelineIntegrationTests(unittest.TestCase):
ddim.set_progress_bar_config(disable=None)
generator = torch.manual_seed(0)
image = ddim(generator=generator, eta=0.0, output_type="numpy").images
image = ddim(generator=generator, eta=0.0, output_type="np").images
image_slice = image[0, -3:, -3:, -1]
@@ -133,7 +133,7 @@ class DDIMPipelineIntegrationTests(unittest.TestCase):
ddpm.set_progress_bar_config(disable=None)
generator = torch.manual_seed(0)
image = ddpm(generator=generator, output_type="numpy").images
image = ddpm(generator=generator, output_type="np").images
image_slice = image[0, -3:, -3:, -1]
+5 -5
View File
@@ -50,10 +50,10 @@ class DDPMPipelineFastTests(unittest.TestCase):
ddpm.set_progress_bar_config(disable=None)
generator = torch.Generator(device=device).manual_seed(0)
image = ddpm(generator=generator, num_inference_steps=2, output_type="numpy").images
image = ddpm(generator=generator, num_inference_steps=2, output_type="np").images
generator = torch.Generator(device=device).manual_seed(0)
image_from_tuple = ddpm(generator=generator, num_inference_steps=2, output_type="numpy", return_dict=False)[0]
image_from_tuple = ddpm(generator=generator, num_inference_steps=2, output_type="np", return_dict=False)[0]
image_slice = image[0, -3:, -3:, -1]
image_from_tuple_slice = image_from_tuple[0, -3:, -3:, -1]
@@ -75,10 +75,10 @@ class DDPMPipelineFastTests(unittest.TestCase):
ddpm.set_progress_bar_config(disable=None)
generator = torch.manual_seed(0)
image = ddpm(generator=generator, num_inference_steps=2, output_type="numpy").images
image = ddpm(generator=generator, num_inference_steps=2, output_type="np").images
generator = torch.manual_seed(0)
image_eps = ddpm(generator=generator, num_inference_steps=2, output_type="numpy")[0]
image_eps = ddpm(generator=generator, num_inference_steps=2, output_type="np")[0]
image_slice = image[0, -3:, -3:, -1]
image_eps_slice = image_eps[0, -3:, -3:, -1]
@@ -102,7 +102,7 @@ class DDPMPipelineIntegrationTests(unittest.TestCase):
ddpm.set_progress_bar_config(disable=None)
generator = torch.manual_seed(0)
image = ddpm(generator=generator, output_type="numpy").images
image = ddpm(generator=generator, output_type="np").images
image_slice = image[0, -3:, -3:, -1]
+1 -1
View File
@@ -50,7 +50,7 @@ class IFPipelineFastTests(PipelineTesterMixin, IFPipelineTesterMixin, unittest.T
"prompt": "A painting of a squirrel eating a burger",
"generator": generator,
"num_inference_steps": 2,
"output_type": "numpy",
"output_type": "np",
}
return inputs
@@ -55,7 +55,7 @@ class IFImg2ImgPipelineFastTests(PipelineTesterMixin, IFPipelineTesterMixin, uni
"image": image,
"generator": generator,
"num_inference_steps": 2,
"output_type": "numpy",
"output_type": "np",
}
return inputs
@@ -57,7 +57,7 @@ class IFImg2ImgSuperResolutionPipelineFastTests(PipelineTesterMixin, IFPipelineT
"original_image": original_image,
"generator": generator,
"num_inference_steps": 2,
"output_type": "numpy",
"output_type": "np",
}
return inputs
@@ -57,7 +57,7 @@ class IFInpaintingPipelineFastTests(PipelineTesterMixin, IFPipelineTesterMixin,
"mask_image": mask_image,
"generator": generator,
"num_inference_steps": 2,
"output_type": "numpy",
"output_type": "np",
}
return inputs
@@ -59,7 +59,7 @@ class IFInpaintingSuperResolutionPipelineFastTests(PipelineTesterMixin, IFPipeli
"mask_image": mask_image,
"generator": generator,
"num_inference_steps": 2,
"output_type": "numpy",
"output_type": "np",
}
return inputs
@@ -52,7 +52,7 @@ class IFSuperResolutionPipelineFastTests(PipelineTesterMixin, IFPipelineTesterMi
"image": image,
"generator": generator,
"num_inference_steps": 2,
"output_type": "numpy",
"output_type": "np",
}
return inputs
+1 -1
View File
@@ -74,7 +74,7 @@ class DiTPipelineFastTests(PipelineTesterMixin, unittest.TestCase):
"class_labels": [1],
"generator": generator,
"num_inference_steps": 2,
"output_type": "numpy",
"output_type": "np",
}
return inputs
@@ -113,7 +113,7 @@ class LDMTextToImagePipelineFastTests(PipelineTesterMixin, unittest.TestCase):
"generator": generator,
"num_inference_steps": 2,
"guidance_scale": 6.0,
"output_type": "numpy",
"output_type": "np",
}
return inputs
@@ -153,7 +153,7 @@ class LDMTextToImagePipelineSlowTests(unittest.TestCase):
"generator": generator,
"num_inference_steps": 3,
"guidance_scale": 6.0,
"output_type": "numpy",
"output_type": "np",
}
return inputs
@@ -189,7 +189,7 @@ class LDMTextToImagePipelineNightlyTests(unittest.TestCase):
"generator": generator,
"num_inference_steps": 50,
"guidance_scale": 6.0,
"output_type": "numpy",
"output_type": "np",
}
return inputs
@@ -84,7 +84,7 @@ class LDMSuperResolutionPipelineFastTests(unittest.TestCase):
init_image = self.dummy_image.to(device)
generator = torch.Generator(device=device).manual_seed(0)
image = ldm(image=init_image, generator=generator, num_inference_steps=2, output_type="numpy").images
image = ldm(image=init_image, generator=generator, num_inference_steps=2, output_type="np").images
image_slice = image[0, -3:, -3:, -1]
@@ -109,7 +109,7 @@ class LDMSuperResolutionPipelineFastTests(unittest.TestCase):
init_image = self.dummy_image.to(torch_device)
image = ldm(init_image, num_inference_steps=2, output_type="numpy").images
image = ldm(init_image, num_inference_steps=2, output_type="np").images
assert image.shape == (1, 64, 64, 3)
@@ -128,7 +128,7 @@ class LDMSuperResolutionPipelineIntegrationTests(unittest.TestCase):
ldm.set_progress_bar_config(disable=None)
generator = torch.manual_seed(0)
image = ldm(image=init_image, generator=generator, num_inference_steps=20, output_type="numpy").images
image = ldm(image=init_image, generator=generator, num_inference_steps=20, output_type="np").images
image_slice = image[0, -3:, -3:, -1]
@@ -117,7 +117,7 @@ class PaintByExamplePipelineFastTests(PipelineTesterMixin, unittest.TestCase):
"generator": generator,
"num_inference_steps": 2,
"guidance_scale": 6.0,
"output_type": "numpy",
"output_type": "np",
}
return inputs
+3 -3
View File
@@ -49,10 +49,10 @@ class PNDMPipelineFastTests(unittest.TestCase):
pndm.set_progress_bar_config(disable=None)
generator = torch.manual_seed(0)
image = pndm(generator=generator, num_inference_steps=20, output_type="numpy").images
image = pndm(generator=generator, num_inference_steps=20, output_type="np").images
generator = torch.manual_seed(0)
image_from_tuple = pndm(generator=generator, num_inference_steps=20, output_type="numpy", return_dict=False)[0]
image_from_tuple = pndm(generator=generator, num_inference_steps=20, output_type="np", return_dict=False)[0]
image_slice = image[0, -3:, -3:, -1]
image_from_tuple_slice = image_from_tuple[0, -3:, -3:, -1]
@@ -77,7 +77,7 @@ class PNDMPipelineIntegrationTests(unittest.TestCase):
pndm.to(torch_device)
pndm.set_progress_bar_config(disable=None)
generator = torch.manual_seed(0)
image = pndm(generator=generator, output_type="numpy").images
image = pndm(generator=generator, output_type="np").images
image_slice = image[0, -3:, -3:, -1]
@@ -21,13 +21,13 @@ import torch
from transformers import CLIPTextConfig, CLIPTextModelWithProjection, CLIPTokenizer
from diffusers import DDPMWuerstchenScheduler, StableCascadeDecoderPipeline
from diffusers.image_processor import VaeImageProcessor
from diffusers.models import StableCascadeUNet
from diffusers.pipelines.wuerstchen import PaellaVQModel
from diffusers.utils.testing_utils import (
enable_full_determinism,
load_image,
load_numpy,
load_pt,
numpy_cosine_similarity_distance,
require_torch_gpu,
skip_mps,
slow,
@@ -258,7 +258,7 @@ class StableCascadeDecoderPipelineIntegrationTests(unittest.TestCase):
def test_stable_cascade_decoder(self):
pipe = StableCascadeDecoderPipeline.from_pretrained(
"diffusers/StableCascade-decoder", torch_dtype=torch.bfloat16
"stabilityai/stable-cascade", variant="bf16", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()
pipe.set_progress_bar_config(disable=None)
@@ -271,18 +271,16 @@ class StableCascadeDecoderPipelineIntegrationTests(unittest.TestCase):
)
image = pipe(
prompt=prompt, image_embeddings=image_embedding, num_inference_steps=10, generator=generator
prompt=prompt,
image_embeddings=image_embedding,
output_type="np",
num_inference_steps=2,
generator=generator,
).images[0]
assert image.size == (1024, 1024)
expected_image = load_image(
"https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/stable_cascade/t2i.png"
assert image.shape == (1024, 1024, 3)
expected_image = load_numpy(
"https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/stable_cascade/stable_cascade_decoder_image.npy"
)
image_processor = VaeImageProcessor()
image_np = image_processor.pil_to_numpy(image)
expected_image_np = image_processor.pil_to_numpy(expected_image)
self.assertTrue(np.allclose(image_np, expected_image_np, atol=53e-2))
max_diff = numpy_cosine_similarity_distance(image.flatten(), expected_image.flatten())
assert max_diff < 1e-4
@@ -29,7 +29,8 @@ from diffusers.models.attention_processor import LoRAAttnProcessor, LoRAAttnProc
from diffusers.utils.import_utils import is_peft_available
from diffusers.utils.testing_utils import (
enable_full_determinism,
load_pt,
load_numpy,
numpy_cosine_similarity_distance,
require_peft_backend,
require_torch_gpu,
skip_mps,
@@ -319,7 +320,9 @@ class StableCascadePriorPipelineIntegrationTests(unittest.TestCase):
torch.cuda.empty_cache()
def test_stable_cascade_prior(self):
pipe = StableCascadePriorPipeline.from_pretrained("diffusers/StableCascade-prior", torch_dtype=torch.bfloat16)
pipe = StableCascadePriorPipeline.from_pretrained(
"stabilityai/stable-cascade-prior", variant="bf16", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()
pipe.set_progress_bar_config(disable=None)
@@ -327,17 +330,12 @@ class StableCascadePriorPipelineIntegrationTests(unittest.TestCase):
generator = torch.Generator(device="cpu").manual_seed(0)
output = pipe(prompt, num_inference_steps=10, generator=generator)
output = pipe(prompt, num_inference_steps=2, output_type="np", generator=generator)
image_embedding = output.image_embeddings
expected_image_embedding = load_pt(
"https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/stable_cascade/image_embedding.pt"
expected_image_embedding = load_numpy(
"https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/stable_cascade/stable_cascade_prior_image_embeddings.npy"
)
assert image_embedding.shape == (1, 16, 24, 24)
self.assertTrue(
np.allclose(
image_embedding.cpu().float().numpy(), expected_image_embedding.cpu().float().numpy(), atol=5e-2
)
)
max_diff = numpy_cosine_similarity_distance(image_embedding.flatten(), expected_image_embedding.flatten())
assert max_diff < 1e-4
@@ -46,7 +46,7 @@ class OnnxStableDiffusionPipelineFastTests(OnnxPipelineTesterMixin, unittest.Tes
"generator": generator,
"num_inference_steps": 2,
"guidance_scale": 7.5,
"output_type": "numpy",
"output_type": "np",
}
return inputs
@@ -55,7 +55,7 @@ class OnnxStableDiffusionImg2ImgPipelineFastTests(OnnxPipelineTesterMixin, unitt
"num_inference_steps": 3,
"strength": 0.75,
"guidance_scale": 7.5,
"output_type": "numpy",
"output_type": "np",
}
return inputs
@@ -55,7 +55,7 @@ class OnnxStableDiffusionUpscalePipelineFastTests(OnnxPipelineTesterMixin, unitt
"generator": generator,
"num_inference_steps": 3,
"guidance_scale": 7.5,
"output_type": "numpy",
"output_type": "np",
}
return inputs
@@ -775,7 +775,7 @@ class StableDiffusionPipelineSlowTests(unittest.TestCase):
"generator": generator,
"num_inference_steps": 3,
"guidance_scale": 7.5,
"output_type": "numpy",
"output_type": "np",
}
return inputs
@@ -950,7 +950,7 @@ class StableDiffusionPipelineSlowTests(unittest.TestCase):
generator=generator,
guidance_scale=7.5,
num_inference_steps=2,
output_type="numpy",
output_type="np",
)
image_chunked = output_chunked.images
@@ -966,7 +966,7 @@ class StableDiffusionPipelineSlowTests(unittest.TestCase):
generator=generator,
guidance_scale=7.5,
num_inference_steps=2,
output_type="numpy",
output_type="np",
)
image = output.images
@@ -179,7 +179,7 @@ class StableDiffusionImg2ImgPipelineFastTests(
"generator": generator,
"num_inference_steps": 2,
"guidance_scale": 6.0,
"output_type": "numpy",
"output_type": "np",
}
return inputs
@@ -199,7 +199,7 @@ class StableDiffusionInpaintPipelineFastTests(
"generator": generator,
"num_inference_steps": 2,
"guidance_scale": 6.0,
"output_type": "numpy",
"output_type": "np",
}
return inputs
@@ -470,7 +470,7 @@ class StableDiffusionSimpleInpaintPipelineFastTests(StableDiffusionInpaintPipeli
"generator": [generator1, generator2],
"num_inference_steps": 2,
"guidance_scale": 6.0,
"output_type": "numpy",
"output_type": "np",
}
return inputs
@@ -586,7 +586,7 @@ class StableDiffusionInpaintPipelineSlowTests(unittest.TestCase):
"generator": generator,
"num_inference_steps": 3,
"guidance_scale": 7.5,
"output_type": "numpy",
"output_type": "np",
}
return inputs
@@ -847,7 +847,7 @@ class StableDiffusionInpaintPipelineAsymmetricAutoencoderKLSlowTests(unittest.Te
"generator": generator,
"num_inference_steps": 3,
"guidance_scale": 7.5,
"output_type": "numpy",
"output_type": "np",
}
return inputs
@@ -1072,7 +1072,7 @@ class StableDiffusionInpaintPipelineNightlyTests(unittest.TestCase):
"generator": generator,
"num_inference_steps": 50,
"guidance_scale": 7.5,
"output_type": "numpy",
"output_type": "np",
}
return inputs
@@ -131,7 +131,7 @@ class StableDiffusionInstructPix2PixPipelineFastTests(
"num_inference_steps": 2,
"guidance_scale": 6.0,
"image_guidance_scale": 1,
"output_type": "numpy",
"output_type": "np",
}
return inputs
@@ -288,7 +288,7 @@ class StableDiffusionInstructPix2PixPipelineSlowTests(unittest.TestCase):
"num_inference_steps": 3,
"guidance_scale": 7.5,
"image_guidance_scale": 1.0,
"output_type": "numpy",
"output_type": "np",
}
return inputs
@@ -151,7 +151,7 @@ class StableDiffusion2PipelineFastTests(
"generator": generator,
"num_inference_steps": 2,
"guidance_scale": 6.0,
"output_type": "numpy",
"output_type": "np",
}
return inputs
@@ -336,7 +336,7 @@ class StableDiffusion2PipelineSlowTests(unittest.TestCase):
"generator": generator,
"num_inference_steps": 3,
"guidance_scale": 7.5,
"output_type": "numpy",
"output_type": "np",
}
return inputs
@@ -557,7 +557,7 @@ class StableDiffusion2PipelineNightlyTests(unittest.TestCase):
"generator": generator,
"num_inference_steps": 50,
"guidance_scale": 7.5,
"output_type": "numpy",
"output_type": "np",
}
return inputs
@@ -138,7 +138,7 @@ class StableDiffusionAttendAndExcitePipelineFastTests(
"generator": generator,
"num_inference_steps": 1,
"guidance_scale": 6.0,
"output_type": "numpy",
"output_type": "np",
"max_iter_to_alter": 2,
"thresholds": {0: 0.7},
}
@@ -225,7 +225,7 @@ class StableDiffusionAttendAndExcitePipelineIntegrationTests(unittest.TestCase):
generator=generator,
num_inference_steps=5,
max_iter_to_alter=5,
output_type="numpy",
output_type="np",
).images[0]
expected_image = load_numpy(
@@ -174,7 +174,7 @@ class StableDiffusionDepth2ImgPipelineFastTests(
"generator": generator,
"num_inference_steps": 2,
"guidance_scale": 6.0,
"output_type": "numpy",
"output_type": "np",
}
return inputs
@@ -395,7 +395,7 @@ class StableDiffusionDepth2ImgPipelineSlowTests(unittest.TestCase):
"num_inference_steps": 3,
"strength": 0.75,
"guidance_scale": 7.5,
"output_type": "numpy",
"output_type": "np",
}
return inputs
@@ -534,7 +534,7 @@ class StableDiffusionImg2ImgPipelineNightlyTests(unittest.TestCase):
"num_inference_steps": 3,
"strength": 0.75,
"guidance_scale": 7.5,
"output_type": "numpy",
"output_type": "np",
}
return inputs
@@ -143,7 +143,7 @@ class StableDiffusionDiffEditPipelineFastTests(PipelineLatentTesterMixin, Pipeli
"num_inference_steps": 2,
"inpaint_strength": 1.0,
"guidance_scale": 6.0,
"output_type": "numpy",
"output_type": "np",
}
return inputs
@@ -165,7 +165,7 @@ class StableDiffusionDiffEditPipelineFastTests(PipelineLatentTesterMixin, Pipeli
"num_maps_per_mask": 2,
"mask_encode_strength": 1.0,
"guidance_scale": 6.0,
"output_type": "numpy",
"output_type": "np",
}
return inputs
@@ -186,7 +186,7 @@ class StableDiffusionDiffEditPipelineFastTests(PipelineLatentTesterMixin, Pipeli
"inpaint_strength": 1.0,
"guidance_scale": 6.0,
"decode_latents": True,
"output_type": "numpy",
"output_type": "np",
}
return inputs
@@ -417,7 +417,7 @@ class StableDiffusionDiffEditPipelineNightlyTests(unittest.TestCase):
negative_prompt=source_prompt,
inpaint_strength=0.7,
num_inference_steps=25,
output_type="numpy",
output_type="np",
).images[0]
expected_image = (
@@ -129,7 +129,7 @@ class StableDiffusion2InpaintPipelineFastTests(
"generator": generator,
"num_inference_steps": 2,
"guidance_scale": 6.0,
"output_type": "numpy",
"output_type": "np",
}
return inputs
@@ -155,7 +155,7 @@ class StableDiffusionLatentUpscalePipelineFastTests(
"image": self.dummy_image.cpu(),
"generator": generator,
"num_inference_steps": 2,
"output_type": "numpy",
"output_type": "np",
}
return inputs
@@ -522,7 +522,7 @@ class StableDiffusionUpscalePipelineIntegrationTests(unittest.TestCase):
ckpt_path = (
"https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler/blob/main/x4-upscaler-ema.safetensors"
)
single_file_pipe = StableDiffusionUpscalePipeline.from_single_file(ckpt_path)
single_file_pipe = StableDiffusionUpscalePipeline.from_single_file(ckpt_path, load_safety_checker=True)
for param_name, param_value in single_file_pipe.text_encoder.config.to_dict().items():
if param_name in ["torch_dtype", "architectures", "_name_or_path"]:
@@ -540,12 +540,13 @@ class StableDiffusionUpscalePipelineIntegrationTests(unittest.TestCase):
for param_name, param_value in single_file_pipe.vae.config.items():
if param_name in PARAMS_TO_IGNORE:
continue
# The sample_size parameter for the VAE is incorrectly configured on the hub
# It must be 512, but it is 256 on the hub
if param_name == "sample_size":
pipe.vae.config[param_name] = param_value
assert (
pipe.vae.config[param_name] == param_value
), f"{param_name} differs between single file loading and pretrained loading"
for param_name, param_value in single_file_pipe.safety_checker.config.to_dict().items():
if param_name in PARAMS_TO_IGNORE:
continue
assert (
pipe.safety_checker.config.to_dict()[param_name] == param_value
), f"{param_name} differs between single file loading and pretrained loading"
@@ -308,7 +308,7 @@ class StableDiffusion2VPredictionPipelineIntegrationTests(unittest.TestCase):
prompt = "A painting of a squirrel eating a burger"
generator = torch.manual_seed(0)
output = sd_pipe([prompt], generator=generator, num_inference_steps=5, output_type="numpy")
output = sd_pipe([prompt], generator=generator, num_inference_steps=5, output_type="np")
image = output.images
image_slice = image[0, 253:256, 253:256, -1]
@@ -335,7 +335,7 @@ class StableDiffusion2VPredictionPipelineIntegrationTests(unittest.TestCase):
prompt = "a photograph of an astronaut riding a horse"
generator = torch.manual_seed(0)
image = sd_pipe(
[prompt], generator=generator, guidance_scale=7.5, num_inference_steps=5, output_type="numpy"
[prompt], generator=generator, guidance_scale=7.5, num_inference_steps=5, output_type="np"
).images
image_slice = image[0, 253:256, 253:256, -1]
@@ -357,7 +357,7 @@ class StableDiffusion2VPredictionPipelineIntegrationTests(unittest.TestCase):
pipe.enable_attention_slicing()
generator = torch.manual_seed(0)
output_chunked = pipe(
[prompt], generator=generator, guidance_scale=7.5, num_inference_steps=10, output_type="numpy"
[prompt], generator=generator, guidance_scale=7.5, num_inference_steps=10, output_type="np"
)
image_chunked = output_chunked.images
@@ -369,7 +369,7 @@ class StableDiffusion2VPredictionPipelineIntegrationTests(unittest.TestCase):
# disable slicing
pipe.disable_attention_slicing()
generator = torch.manual_seed(0)
output = pipe([prompt], generator=generator, guidance_scale=7.5, num_inference_steps=10, output_type="numpy")
output = pipe([prompt], generator=generator, guidance_scale=7.5, num_inference_steps=10, output_type="np")
image = output.images
# make sure that more than 3.0 GB is allocated
@@ -246,7 +246,7 @@ class AdapterTests:
"generator": generator,
"num_inference_steps": 2,
"guidance_scale": 6.0,
"output_type": "numpy",
"output_type": "np",
}
return inputs
@@ -117,7 +117,7 @@ class StableDiffusionImageVariationPipelineFastTests(
"generator": generator,
"num_inference_steps": 2,
"guidance_scale": 6.0,
"output_type": "numpy",
"output_type": "np",
}
return inputs
@@ -293,7 +293,7 @@ class StableDiffusionImageVariationPipelineNightlyTests(unittest.TestCase):
"generator": generator,
"num_inference_steps": 50,
"guidance_scale": 7.5,
"output_type": "numpy",
"output_type": "np",
}
return inputs
@@ -107,7 +107,7 @@ class StableDiffusionLDM3DPipelineFastTests(unittest.TestCase):
"generator": generator,
"num_inference_steps": 2,
"guidance_scale": 6.0,
"output_type": "numpy",
"output_type": "np",
}
return inputs
@@ -222,7 +222,7 @@ class StableDiffusionLDM3DPipelineSlowTests(unittest.TestCase):
"generator": generator,
"num_inference_steps": 3,
"guidance_scale": 7.5,
"output_type": "numpy",
"output_type": "np",
}
return inputs
@@ -268,7 +268,7 @@ class StableDiffusionPipelineNightlyTests(unittest.TestCase):
"generator": generator,
"num_inference_steps": 50,
"guidance_scale": 7.5,
"output_type": "numpy",
"output_type": "np",
}
return inputs
@@ -105,7 +105,7 @@ class StableDiffusionPanoramaPipelineFastTests(PipelineLatentTesterMixin, Pipeli
"width": None,
"num_inference_steps": 1,
"guidance_scale": 6.0,
"output_type": "numpy",
"output_type": "np",
}
return inputs
@@ -263,7 +263,7 @@ class StableDiffusionPanoramaNightlyTests(unittest.TestCase):
"generator": generator,
"num_inference_steps": 3,
"guidance_scale": 7.5,
"output_type": "numpy",
"output_type": "np",
}
return inputs
@@ -290,7 +290,7 @@ class StableDiffusionXLAdapterPipelineFastTests(
"generator": generator,
"num_inference_steps": 2,
"guidance_scale": 5.0,
"output_type": "numpy",
"output_type": "np",
}
return inputs
@@ -143,7 +143,7 @@ class StableDiffusionXLInstructPix2PixPipelineFastTests(
"num_inference_steps": 2,
"guidance_scale": 6.0,
"image_guidance_scale": 1,
"output_type": "numpy",
"output_type": "np",
}
return inputs
@@ -168,7 +168,7 @@ class StableUnCLIPPipelineFastTests(
"generator": generator,
"num_inference_steps": 2,
"prior_num_inference_steps": 2,
"output_type": "numpy",
"output_type": "np",
}
return inputs
+29 -29
View File
@@ -117,10 +117,10 @@ def _test_from_save_pretrained_dynamo(in_queue, out_queue, timeout):
new_ddpm.to(torch_device)
generator = torch.Generator(device=torch_device).manual_seed(0)
image = ddpm(generator=generator, num_inference_steps=5, output_type="numpy").images
image = ddpm(generator=generator, num_inference_steps=5, output_type="np").images
generator = torch.Generator(device=torch_device).manual_seed(0)
new_image = new_ddpm(generator=generator, num_inference_steps=5, output_type="numpy").images
new_image = new_ddpm(generator=generator, num_inference_steps=5, output_type="np").images
assert np.abs(image - new_image).max() < 1e-5, "Models don't give the same forward pass"
except Exception:
@@ -363,12 +363,12 @@ class DownloadTests(unittest.TestCase):
)
pipe = pipe.to(torch_device)
generator = torch.manual_seed(0)
out = pipe(prompt, num_inference_steps=2, generator=generator, output_type="numpy").images
out = pipe(prompt, num_inference_steps=2, generator=generator, output_type="np").images
pipe_2 = StableDiffusionPipeline.from_pretrained("hf-internal-testing/tiny-stable-diffusion-torch")
pipe_2 = pipe_2.to(torch_device)
generator = torch.manual_seed(0)
out_2 = pipe_2(prompt, num_inference_steps=2, generator=generator, output_type="numpy").images
out_2 = pipe_2(prompt, num_inference_steps=2, generator=generator, output_type="np").images
assert np.max(np.abs(out - out_2)) < 1e-3
@@ -379,7 +379,7 @@ class DownloadTests(unittest.TestCase):
)
pipe = pipe.to(torch_device)
generator = torch.manual_seed(0)
out = pipe(prompt, num_inference_steps=2, generator=generator, output_type="numpy").images
out = pipe(prompt, num_inference_steps=2, generator=generator, output_type="np").images
with tempfile.TemporaryDirectory() as tmpdirname:
pipe.save_pretrained(tmpdirname)
@@ -388,7 +388,7 @@ class DownloadTests(unittest.TestCase):
generator = torch.manual_seed(0)
out_2 = pipe_2(prompt, num_inference_steps=2, generator=generator, output_type="numpy").images
out_2 = pipe_2(prompt, num_inference_steps=2, generator=generator, output_type="np").images
assert np.max(np.abs(out - out_2)) < 1e-3
@@ -398,7 +398,7 @@ class DownloadTests(unittest.TestCase):
pipe = pipe.to(torch_device)
generator = torch.manual_seed(0)
out = pipe(prompt, num_inference_steps=2, generator=generator, output_type="numpy").images
out = pipe(prompt, num_inference_steps=2, generator=generator, output_type="np").images
with tempfile.TemporaryDirectory() as tmpdirname:
pipe.save_pretrained(tmpdirname)
@@ -407,7 +407,7 @@ class DownloadTests(unittest.TestCase):
generator = torch.manual_seed(0)
out_2 = pipe_2(prompt, num_inference_steps=2, generator=generator, output_type="numpy").images
out_2 = pipe_2(prompt, num_inference_steps=2, generator=generator, output_type="np").images
assert np.max(np.abs(out - out_2)) < 1e-3
@@ -590,7 +590,7 @@ class DownloadTests(unittest.TestCase):
)
pipe = pipe.to(torch_device)
generator = torch.manual_seed(0)
out = pipe(prompt, num_inference_steps=2, generator=generator, output_type="numpy").images
out = pipe(prompt, num_inference_steps=2, generator=generator, output_type="np").images
with tempfile.TemporaryDirectory() as tmpdirname:
pipe.save_pretrained(tmpdirname)
@@ -601,7 +601,7 @@ class DownloadTests(unittest.TestCase):
generator = torch.manual_seed(0)
out_2 = pipe_2(prompt, num_inference_steps=2, generator=generator, output_type="numpy").images
out_2 = pipe_2(prompt, num_inference_steps=2, generator=generator, output_type="np").images
assert np.max(np.abs(out - out_2)) < 1e-3
@@ -626,7 +626,7 @@ class DownloadTests(unittest.TestCase):
assert pipe._maybe_convert_prompt("<*>", pipe.tokenizer) == "<*>"
prompt = "hey <*>"
out = pipe(prompt, num_inference_steps=1, output_type="numpy").images
out = pipe(prompt, num_inference_steps=1, output_type="np").images
assert out.shape == (1, 128, 128, 3)
# single token load local with weight name
@@ -642,7 +642,7 @@ class DownloadTests(unittest.TestCase):
assert pipe._maybe_convert_prompt("<**>", pipe.tokenizer) == "<**>"
prompt = "hey <**>"
out = pipe(prompt, num_inference_steps=1, output_type="numpy").images
out = pipe(prompt, num_inference_steps=1, output_type="np").images
assert out.shape == (1, 128, 128, 3)
# multi token load
@@ -665,7 +665,7 @@ class DownloadTests(unittest.TestCase):
assert pipe._maybe_convert_prompt("<***>", pipe.tokenizer) == "<***> <***>_1 <***>_2"
prompt = "hey <***>"
out = pipe(prompt, num_inference_steps=1, output_type="numpy").images
out = pipe(prompt, num_inference_steps=1, output_type="np").images
assert out.shape == (1, 128, 128, 3)
# multi token load a1111
@@ -693,7 +693,7 @@ class DownloadTests(unittest.TestCase):
assert pipe._maybe_convert_prompt("<****>", pipe.tokenizer) == "<****> <****>_1 <****>_2"
prompt = "hey <****>"
out = pipe(prompt, num_inference_steps=1, output_type="numpy").images
out = pipe(prompt, num_inference_steps=1, output_type="np").images
assert out.shape == (1, 128, 128, 3)
# multi embedding load
@@ -718,7 +718,7 @@ class DownloadTests(unittest.TestCase):
assert pipe._maybe_convert_prompt("<******>", pipe.tokenizer) == "<******>"
prompt = "hey <*****> <******>"
out = pipe(prompt, num_inference_steps=1, output_type="numpy").images
out = pipe(prompt, num_inference_steps=1, output_type="np").images
assert out.shape == (1, 128, 128, 3)
# single token state dict load
@@ -731,7 +731,7 @@ class DownloadTests(unittest.TestCase):
assert pipe._maybe_convert_prompt("<x>", pipe.tokenizer) == "<x>"
prompt = "hey <x>"
out = pipe(prompt, num_inference_steps=1, output_type="numpy").images
out = pipe(prompt, num_inference_steps=1, output_type="np").images
assert out.shape == (1, 128, 128, 3)
# multi embedding state dict load
@@ -751,7 +751,7 @@ class DownloadTests(unittest.TestCase):
assert pipe._maybe_convert_prompt("<xxxxxx>", pipe.tokenizer) == "<xxxxxx>"
prompt = "hey <xxxxx> <xxxxxx>"
out = pipe(prompt, num_inference_steps=1, output_type="numpy").images
out = pipe(prompt, num_inference_steps=1, output_type="np").images
assert out.shape == (1, 128, 128, 3)
# auto1111 multi-token state dict load
@@ -777,7 +777,7 @@ class DownloadTests(unittest.TestCase):
assert pipe._maybe_convert_prompt("<xxxx>", pipe.tokenizer) == "<xxxx> <xxxx>_1 <xxxx>_2"
prompt = "hey <xxxx>"
out = pipe(prompt, num_inference_steps=1, output_type="numpy").images
out = pipe(prompt, num_inference_steps=1, output_type="np").images
assert out.shape == (1, 128, 128, 3)
# multiple references to multi embedding
@@ -789,7 +789,7 @@ class DownloadTests(unittest.TestCase):
)
prompt = "hey <cat> <cat>"
out = pipe(prompt, num_inference_steps=1, output_type="numpy").images
out = pipe(prompt, num_inference_steps=1, output_type="np").images
assert out.shape == (1, 128, 128, 3)
def test_text_inversion_multi_tokens(self):
@@ -1739,10 +1739,10 @@ class PipelineSlowTests(unittest.TestCase):
new_ddpm.to(torch_device)
generator = torch.Generator(device=torch_device).manual_seed(0)
image = ddpm(generator=generator, num_inference_steps=5, output_type="numpy").images
image = ddpm(generator=generator, num_inference_steps=5, output_type="np").images
generator = torch.Generator(device=torch_device).manual_seed(0)
new_image = new_ddpm(generator=generator, num_inference_steps=5, output_type="numpy").images
new_image = new_ddpm(generator=generator, num_inference_steps=5, output_type="np").images
assert np.abs(image - new_image).max() < 1e-5, "Models don't give the same forward pass"
@@ -1765,10 +1765,10 @@ class PipelineSlowTests(unittest.TestCase):
ddpm_from_hub.set_progress_bar_config(disable=None)
generator = torch.Generator(device=torch_device).manual_seed(0)
image = ddpm(generator=generator, num_inference_steps=5, output_type="numpy").images
image = ddpm(generator=generator, num_inference_steps=5, output_type="np").images
generator = torch.Generator(device=torch_device).manual_seed(0)
new_image = ddpm_from_hub(generator=generator, num_inference_steps=5, output_type="numpy").images
new_image = ddpm_from_hub(generator=generator, num_inference_steps=5, output_type="np").images
assert np.abs(image - new_image).max() < 1e-5, "Models don't give the same forward pass"
@@ -1788,10 +1788,10 @@ class PipelineSlowTests(unittest.TestCase):
ddpm_from_hub_custom_model.set_progress_bar_config(disable=None)
generator = torch.Generator(device=torch_device).manual_seed(0)
image = ddpm_from_hub_custom_model(generator=generator, num_inference_steps=5, output_type="numpy").images
image = ddpm_from_hub_custom_model(generator=generator, num_inference_steps=5, output_type="np").images
generator = torch.Generator(device=torch_device).manual_seed(0)
new_image = ddpm_from_hub(generator=generator, num_inference_steps=5, output_type="numpy").images
new_image = ddpm_from_hub(generator=generator, num_inference_steps=5, output_type="np").images
assert np.abs(image - new_image).max() < 1e-5, "Models don't give the same forward pass"
@@ -1803,7 +1803,7 @@ class PipelineSlowTests(unittest.TestCase):
pipe.to(torch_device)
pipe.set_progress_bar_config(disable=None)
images = pipe(output_type="numpy").images
images = pipe(output_type="np").images
assert images.shape == (1, 32, 32, 3)
assert isinstance(images, np.ndarray)
@@ -1878,7 +1878,7 @@ class PipelineSlowTests(unittest.TestCase):
generator = [torch.Generator(device="cpu").manual_seed(33) for _ in range(prompt_embeds.shape[0])]
images = pipe(
prompt_embeds=prompt_embeds, generator=generator, num_inference_steps=20, output_type="numpy"
prompt_embeds=prompt_embeds, generator=generator, num_inference_steps=20, output_type="np"
).images
for i, image in enumerate(images):
@@ -1916,7 +1916,7 @@ class PipelineNightlyTests(unittest.TestCase):
ddim.set_progress_bar_config(disable=None)
generator = torch.Generator(device=torch_device).manual_seed(seed)
ddpm_images = ddpm(batch_size=2, generator=generator, output_type="numpy").images
ddpm_images = ddpm(batch_size=2, generator=generator, output_type="np").images
generator = torch.Generator(device=torch_device).manual_seed(seed)
ddim_images = ddim(
@@ -1924,7 +1924,7 @@ class PipelineNightlyTests(unittest.TestCase):
generator=generator,
num_inference_steps=1000,
eta=1.0,
output_type="numpy",
output_type="np",
use_clipped_model_output=True, # Need this to make DDIM match DDPM
).images
+1 -1
View File
@@ -233,7 +233,7 @@ class UnCLIPPipelineFastTests(PipelineTesterMixin, unittest.TestCase):
"prior_num_inference_steps": 2,
"decoder_num_inference_steps": 2,
"super_res_num_inference_steps": 2,
"output_type": "numpy",
"output_type": "np",
}
return inputs
@@ -158,7 +158,7 @@ class UniDiffuserPipelineFastTests(
"generator": generator,
"num_inference_steps": 2,
"guidance_scale": 6.0,
"output_type": "numpy",
"output_type": "np",
}
return inputs
@@ -199,7 +199,7 @@ class UniDiffuserPipelineFastTests(
"generator": generator,
"num_inference_steps": 2,
"guidance_scale": 6.0,
"output_type": "numpy",
"output_type": "np",
"prompt_latents": latents.get("prompt_latents"),
"vae_latents": latents.get("vae_latents"),
"clip_latents": latents.get("clip_latents"),
@@ -590,7 +590,7 @@ class UniDiffuserPipelineSlowTests(unittest.TestCase):
"generator": generator,
"num_inference_steps": 3,
"guidance_scale": 8.0,
"output_type": "numpy",
"output_type": "np",
}
if generate_latents:
latents = self.get_fixed_latents(device, seed=seed)
@@ -706,7 +706,7 @@ class UniDiffuserPipelineNightlyTests(unittest.TestCase):
"generator": generator,
"num_inference_steps": 3,
"guidance_scale": 8.0,
"output_type": "numpy",
"output_type": "np",
}
if generate_latents:
latents = self.get_fixed_latents(device, seed=seed)