Compare commits

..

112 Commits

Author SHA1 Message Date
Aryan a50b0bd66d Merge branch 'main' into hidream-license 2025-04-22 07:02:18 +05:30
Ishan Modi f59df3bb8b [Refactor] Minor Improvement for import utils (#11161)
* update

* update

* addressed PR comments

* update

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2025-04-21 09:56:55 -10:00
josephrocca a00c73a5e1 Support different-length pos/neg prompts for FLUX.1-schnell variants like Chroma (#11120)
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2025-04-21 09:28:19 -10:00
OleehyO 0434db9a99 [cogview4][feat] Support attention mechanism with variable-length support and batch packing (#11349)
* [cogview4] Enhance attention mechanism with variable-length support and batch packing

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-04-21 09:27:55 -10:00
Aamir Nazir aff574fb29 Add Serialized Type Name kwarg in Model Output (#10502)
* Update outputs.py
2025-04-21 08:45:28 -10:00
Ishan Modi 79ea8eb258 [BUG] fixes in kadinsky pipeline (#11080)
* bug fix kadinsky pipeline
2025-04-21 08:41:09 -10:00
Aryan e7f3a73786 Fix Wan I2V prepare_latents dtype (#11371)
update
2025-04-21 08:18:50 -10:00
PromeAI 7a4a126db8 fix issue that training flux controlnet was unstable and validation r… (#11373)
* fix issue that training flux controlnet was unstable and validation results were unstable

* del unused code pieces, fix grammar

---------

Co-authored-by: Your Name <you@example.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-04-21 08:16:05 -10:00
Kenneth Gerald Hamilton 0dec414d5b [train_dreambooth_lora_sdxl.py] Fix the LR Schedulers when num_train_epochs is passed in a distributed training env (#11240)
Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
2025-04-21 12:51:03 +05:30
Linoy Tsaban 44eeba07b2 [Flux LoRAs] fix lr scheduler bug in distributed scenarios (#11242)
* add fix

* add fix

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-04-21 10:08:45 +03:00
YiYi Xu 5873377a66 [Wan2.1-FLF2V] update conversion script (#11365)
update scheuler config in conversion sript
2025-04-18 14:08:44 -10:00
YiYi Xu 5a2e0f715c update output for Hidream transformer (#11366)
up
2025-04-18 14:07:21 -10:00
Kazuki Yoda ef47726e2d Fix: StableDiffusionXLControlNetAdapterInpaintPipeline incorrectly inherited StableDiffusionLoraLoaderMixin (#11357)
Fix: Inherit `StableDiffusionXLLoraLoaderMixin`

`StableDiffusionXLControlNetAdapterInpaintPipeline`
used to incorrectly inherit
`StableDiffusionLoraLoaderMixin`
instead of `StableDiffusionXLLoraLoaderMixin`
2025-04-18 12:46:06 -10:00
YiYi Xu 0021bfa1e1 support Wan-FLF2V (#11353)
* update transformer

---------

Co-authored-by: Aryan <aryan@huggingface.co>
2025-04-18 10:27:50 -10:00
Aryan 530fe0722d update license year 2025-04-18 13:29:31 +02:00
Aryan 8dbc4504f3 add license 2025-04-18 06:55:21 +02:00
Marc Sun bbd0c161b5 [BNB] Fix test_moving_to_cpu_throws_warning (#11356)
fix

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-04-18 09:44:51 +05:30
Yao Matrix eef3d65954 enable 2 test cases on XPU (#11332)
* enable 2 test cases on XPU

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* Apply style fixes

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2025-04-17 13:27:41 -10:00
Frank (Haofan) Wang ee6ad51d96 Update controlnet_flux.py (#11350) 2025-04-17 10:05:01 -10:00
Sayak Paul 4397f59a37 [bitsandbytes] improve dtype mismatch handling for bnb + lora. (#11270)
* improve dtype mismatch handling for bnb + lora.

* add a test

* fix and updates

* update
2025-04-17 19:51:49 +05:30
YiYi Xu 056793295c [Hi Dream] follow-up (#11296)
* add
2025-04-17 01:17:44 -10:00
Sayak Paul 29d2afbfe2 [LoRA] Propagate hotswap better (#11333)
* propagate hotswap to other load_lora_weights() methods.

* simplify documentations.

* updates

* propagate to load_lora_into_text_encoder.

* empty commit
2025-04-17 10:35:38 +05:30
Sayak Paul b00a564dac [docs] add note about use_duck_shape in auraflow docs. (#11348)
add note about use_duck_shape in auraflow docs.
2025-04-17 10:25:39 +05:30
Sayak Paul efc9d68b15 [chore] fix lora docs utils (#11338)
fix lora docs utils
2025-04-17 09:25:53 +05:30
nPeppon 3e59d531d1 Fix wrong dtype argument name as torch_dtype (#11346) 2025-04-16 16:00:25 -04:00
Ishan Modi d63e6fccb1 [BUG] fixed _toctree.yml alphabetical ordering (#11277)
update
2025-04-16 09:04:22 -07:00
Dhruv Nair 59f1b7b1c8 Hunyuan I2V fast tests fix (#11341)
* update

* update
2025-04-16 18:40:33 +05:30
Sayak Paul ce1063acfa [docs] add a snippet for compilation in the auraflow docs. (#11327)
* add a snippet for compilation in the auraflow docs.

* include speedups.
2025-04-16 11:12:09 +05:30
Sayak Paul 7212f35de2 [single file] enable telemetry for single file loading when using GGUF. (#11284)
* enable telemetry for single file loading when using GGUF.

* quality
2025-04-16 08:33:52 +05:30
Sayak Paul 3252d7ad11 unpin torch versions for onnx Dockerfile (#11290)
unpin torch versions for onnx
2025-04-16 08:16:38 +05:30
Dhruv Nair b316104ddd Fix Hunyuan I2V for transformers>4.47.1 (#11293)
* update

* update
2025-04-16 07:53:32 +05:30
Álvaro Somoza d3b2699a7f another fix for FlowMatchLCMScheduler forgotten import (#11330)
fix
2025-04-15 13:53:16 -10:00
Sayak Paul 4b868f14c1 post release 0.33.0 (#11255)
* post release

* update

* fix deprecations

* remaining

* update

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2025-04-15 06:50:08 -10:00
AstraliteHeart b6156aafe9 Rewrite AuraFlowPatchEmbed.pe_selection_index_based_on_dim to be torch.compile compatible (#11297)
* Update pe_selection_index_based_on_dim

* Make pe_selection_index_based_on_dim work with torh.compile

* Fix AuraFlowTransformer2DModel's dpcstring default values

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-04-15 19:30:30 +05:30
Sayak Paul 7ecfe29160 [docs] fix hidream docstrings. (#11325)
* fix hidream docstrings.

* fix

* empty commit
2025-04-15 16:26:21 +05:30
Yao Matrix 7edace9a05 fix CPU offloading related fail cases on XPU (#11288)
* fix CPU offloading related fail cases on XPU

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* Apply style fixes

* trigger tests

* test_pipe_same_device_id_offload

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: hlky <hlky@hlky.ac>
2025-04-15 09:06:56 +01:00
hlky 6e80d240d3 Fix vae.Decoder prev_output_channel (#11280) 2025-04-15 08:03:50 +01:00
Hameer Abbasi 9352a5ca56 [LoRA] Add LoRA support to AuraFlow (#10216)
* Add AuraFlowLoraLoaderMixin

* Add comments, remove qkv fusion

* Add Tests

* Add AuraFlowLoraLoaderMixin to documentation

* Add Suggested changes

* Change attention_kwargs->joint_attention_kwargs

* Rebasing derp.

* fix

* fix

* Quality fixes.

* make style

* `make fix-copies`

* `ruff check --fix`

* Attept 1 to fix tests.

* Attept 2 to fix tests.

* Attept 3 to fix tests.

* Address review comments.

* Rebasing derp.

* Get more tests passing by copying from Flux. Address review comments.

* `joint_attention_kwargs`->`attention_kwargs`

* Add `lora_scale` property for te LoRAs.

* Make test better.

* Remove useless property.

* Skip TE-only tests for AuraFlow.

* Support LoRA for non-CLIP TEs.

* Restore LoRA tests.

* Undo adding LoRA support for non-CLIP TEs.

* Undo support for TE in AuraFlow LoRA.

* `make fix-copies`

* Sync with upstream changes.

* Remove unneeded stuff.

* Mirror `Lumina2`.

* Skip for MPS.

* Address review comments.

* Remove duplicated code.

* Remove unnecessary code.

* Remove repeated docs.

* Propagate attention.

* Fix TE target modules.

* MPS fix for LoRA tests.

* Unrelated TE LoRA tests fix.

* Fix AuraFlow LoRA tests by applying to the right denoiser layers.

Co-authored-by: AstraliteHeart <81396681+AstraliteHeart@users.noreply.github.com>

* Apply style fixes

* empty commit

* Fix the repo consistency issues.

* Remove unrelated changes.

* Style.

* Fix `test_lora_fuse_nan`.

* fix quality issues.

* `pytest.xfail` -> `ValueError`.

* Add back `skip_mps`.

* Apply style fixes

* `make fix-copies`

---------

Co-authored-by: Warlord-K <warlordk28@gmail.com>
Co-authored-by: hlky <hlky@hlky.ac>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: AstraliteHeart <81396681+AstraliteHeart@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-04-15 10:41:28 +05:30
Sayak Paul cefa28f449 [docs] Promote AutoModel usage (#11300)
* docs: promote the usage of automodel.

* bitsandbytes

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-04-15 09:25:40 +05:30
Beinsezii 8819cda6c0 Add skrample section to community_projects.md (#11319)
Update community_projects.md

https://github.com/huggingface/diffusers/discussions/11158#discussioncomment-12681691
2025-04-14 12:12:59 -10:00
hlky dcf836cf47 Use float32 on mps or npu in transformer_hidream_image's rope (#11316) 2025-04-14 20:19:21 +01:00
Álvaro Somoza 1cb73cb19f import for FlowMatchLCMScheduler (#11318)
* add

* fix-copies
2025-04-14 06:28:57 -10:00
Linoy Tsaban ba6008abfe [HiDream] code example (#11317) 2025-04-14 16:19:30 +01:00
Sayak Paul a8f5134c11 [LoRA] support more SDXL loras. (#11292)
* support more SDXL loras.

* update

---------

Co-authored-by: hlky <hlky@hlky.ac>
2025-04-14 17:09:59 +05:30
Fanli Lin c7f2d239fe make KolorsPipelineFastTests::test_inference_batch_single_identical pass on XPU (#11313)
adjust diff
2025-04-14 11:02:02 +01:00
Yao Matrix fa1ac50a66 make test_stable_diffusion_karras_sigmas pass on XPU (#11310)
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
2025-04-14 08:15:38 +01:00
Yao Matrix aa541b9fab make KandinskyV22PipelineInpaintCombinedFastTests::test_float16_inference pass on XPU (#11308)
loose expected_max_diff from 5e-1 to 8e-1 to make
KandinskyV22PipelineInpaintCombinedFastTests::test_float16_inference
pass on XPU

Signed-off-by: Matrix Yao <matrix.yao@intel.com>
2025-04-14 07:49:20 +01:00
Ishan Modi f1f38ffbee [ControlNet] Adds controlnet for SanaTransformer (#11040)
* added controlnet for sana transformer

* improve code quality

* addressed PR comments

* bug fixes

* added test cases

* update

* added dummy objects

* addressed PR comments

* update

* Forcing update

* add to docs

* code quality

* addressed PR comments

* addressed PR comments

* update

* addressed PR comments

* added proper styling

* update

* Revert "added proper styling"

This reverts commit 344ee8a701.

* manually ordered

* Apply suggestions from code review

---------

Co-authored-by: Aryan <contact.aryanvs@gmail.com>
2025-04-13 19:19:39 +05:30
Tuna Tuncer 36538e1135 Fix incorrect tile_latent_min_width calculations (#11305) 2025-04-13 18:21:50 +05:30
Aryan 97e0ef4db4 Hidream refactoring follow ups (#11299)
* HiDream Image

* update

* -einops

* py3.8

* fix -einops

* mixins, offload_seq, option_components

* docs

* Apply style fixes

* trigger tests

* Apply suggestions from code review

Co-authored-by: Aryan <contact.aryanvs@gmail.com>

* joint_attention_kwargs -> attention_kwargs, fixes

* fast tests

* -_init_weights

* style tests

* move reshape logic

* update slice 😴

* supports_dduf

* 🤷🏻‍♂️

* Update src/diffusers/models/transformers/transformer_hidream_image.py

Co-authored-by: Aryan <contact.aryanvs@gmail.com>

* address review comments

* update tests

* doc updates

* update

* Update src/diffusers/models/transformers/transformer_hidream_image.py

* Apply style fixes

---------

Co-authored-by: hlky <hlky@hlky.ac>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-04-13 09:45:19 +05:30
Adrien B ed41db8525 Update autoencoderkl_allegro.md (#11303)
Correction typo
2025-04-13 09:41:30 +05:30
Nikita Starodubcev ec0b2b3947 flow matching lcm scheduler (#11170)
* add flow matching lcm scheduler
* stochastic sampling
* upscaling for scale-wise generation

* Apply style fixes

* Apply suggestions from code review

Co-authored-by: hlky <hlky@hlky.ac>

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: hlky <hlky@hlky.ac>
2025-04-12 11:14:57 -10:00
hlky 0ef29355c9 HiDream Image (#11231)
* HiDream Image


---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Aryan <contact.aryanvs@gmail.com>
Co-authored-by: Aryan <aryan@huggingface.co>
2025-04-11 06:31:34 -10:00
Tuna Tuncer bc261058ee Fix incorrect tile_latent_min_width calculation in AutoencoderKLMochi (#11294) 2025-04-11 19:11:08 +05:30
Sayak Paul 7054a34978 do not use DIFFUSERS_REQUEST_TIMEOUT for notification bot (#11273)
fix to a constant
2025-04-11 14:19:46 +05:30
Sayak Paul 511d738121 [CI] relax tolerance for unclip further (#11268)
relax tolerance for unclip further.
2025-04-11 14:06:52 +05:30
Sayak Paul ea5a6a8b7c [Tests] Cleanup lora tests utils (#11276)
* start cleaning up lora test utils for reusability

* update

* updates

* updates
2025-04-10 15:50:34 +05:30
hlky b8093e6665 Fix LTX 0.9.5 single file (#11271) 2025-04-10 07:06:13 +01:00
Yuqian Hong e121d0ef67 [BUG] Fix convert_vae_pt_to_diffusers bug (#11078)
* fix attention

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-04-10 06:59:45 +01:00
Yao Matrix 31c4f24fc1 make test_instant_style_multiple_masks pass on XPU (#11266)
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
2025-04-10 06:23:00 +01:00
xieofxie 0efdf411fb add onnxruntime-qnn & onnxruntime-cann (#11269)
Co-authored-by: hualxie <hualxie@microsoft.com>
2025-04-10 06:00:23 +01:00
Yao Matrix 450dc48a2c make test_dict_tuple_outputs_equivalent pass on XPU (#11265)
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
2025-04-10 05:56:28 +01:00
Yao Matrix 77b4f66b9e make test_stable_diffusion_inpaint_fp16 pass on XPU (#11264)
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
2025-04-10 05:55:54 +01:00
Yao Matrix 68663f8a17 fix test_vanilla_funetuning failure on XPU and A100 (#11263)
* fix test_vanilla_funetuning failure on XPU and A100

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* change back to 5e-2

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

---------

Signed-off-by: Matrix Yao <matrix.yao@intel.com>
2025-04-10 05:55:07 +01:00
Sayak Paul ffda8735be [LoRA] support musubi wan loras. (#11243)
* support musubi wan loras.

* Update src/diffusers/loaders/lora_conversion_utils.py

Co-authored-by: hlky <hlky@hlky.ac>

* support i2v loras from musubi too.

---------

Co-authored-by: hlky <hlky@hlky.ac>
2025-04-10 09:50:22 +05:30
YiYi Xu 0706786e53 fix wan ftfy import (#11262) 2025-04-09 09:08:34 -10:00
Sayak Paul 5b27f8aba8 fix consisid imports (#11254)
* fix consisid imports

* fix opencv import

* fix
2025-04-09 18:49:32 +05:30
Sayak Paul d1387ecee5 fix timeout constant (#11252)
* fix timeout constant

* style

* fix
2025-04-09 17:48:52 +05:30
Ilya Drobyshevskiy 6a7c2d0afa fix flux controlnet bug (#11152)
Before this if txt_ids was 3d tensor, line with txt_ids[:1] concat txt_ids by batch dim. Now we first check that txt_ids is 2d tensor (or take first batch element) and then concat by token dim
2025-04-09 13:01:07 +01:00
Dhruv Nair edc154da09 Update Ruff to latest Version (#10919)
* update

* update

* update

* update
2025-04-09 16:51:34 +05:30
hlky 552cd32058 [docs] AutoModel (#11250)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-04-09 16:42:23 +05:30
Yao Matrix c36c745ceb fix FluxReduxSlowTests::test_flux_redux_inference case failure on XPU (#11245)
* loose test_float16_inference's tolerance from 5e-2 to 6e-2, so XPU can
pass UT

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* fix test_pipeline_flux_redux fail on XPU

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

---------

Signed-off-by: Matrix Yao <matrix.yao@intel.com>
2025-04-09 11:41:15 +01:00
hlky 437cb36c65 AutoModel (#11115)
* AutoModel

* ...

* lol

* ...

* add test

* update

* make fix-copies

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2025-04-09 15:20:07 +05:30
hlky 9ee3dd3862 AudioLDM2 Fixes (#11244) 2025-04-09 14:12:00 +05:30
Sayak Paul fd02aad402 fix: SD3 ControlNet validation so that it runs on a A100. (#11238)
* fix: SD3 ControlNet validation so that it runs on a A100.

* use backend-agnostic cache and pass devide.
2025-04-09 12:12:53 +05:30
Sayak Paul 6bfacf0418 [LoRA] support more comyui loras for Flux 🚨 (#10985)
* support more comyui loras.

* fix

* fixes

* revert changes in LoRA base.

* no position_embedding

* 🚨 introduce a breaking change to let peft handle module ambiguity

* styling

* remove position embeddings.

* improvements.

* style

* make info instead of NotImplementedError

* Update src/diffusers/loaders/peft.py

Co-authored-by: hlky <hlky@hlky.ac>

* add example.

* robust checks

* updates

---------

Co-authored-by: hlky <hlky@hlky.ac>
2025-04-09 09:17:05 +05:30
Sayak Paul f685981ed0 [docs] minor updates to dtype map docs. (#11237)
minor updates to dtype map docs.
2025-04-09 08:38:17 +05:30
Sayak Paul b924251dd8 minor update to sana sprint docs. (#11236) 2025-04-09 08:17:45 +05:30
Sayak Paul 1a04812439 [bistandbytes] improve replacement warnings for bnb (#11132)
* improve replacement warnings for bnb

* updates to docs.
2025-04-08 21:18:34 +05:30
Sayak Paul 4b27c4a494 [feat] implement record_stream when using CUDA streams during group offloading (#11081)
* implement record_stream for better performance.

* fix

* style.

* merge #11097

* Update src/diffusers/hooks/group_offloading.py

Co-authored-by: Aryan <aryan@huggingface.co>

* fixes

* docstring.

* remaining todos in low_cpu_mem_usage

* tests

* updates to docs.

---------

Co-authored-by: Aryan <aryan@huggingface.co>
2025-04-08 21:17:49 +05:30
hlky 5d49b3e83b Flux quantized with lora (#10990)
* Flux quantized with lora

* fix

* changes

* Apply suggestions from code review

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Apply style fixes

* enable model cpu offload()

* Update src/diffusers/loaders/lora_pipeline.py

Co-authored-by: hlky <hlky@hlky.ac>

* update

* Apply suggestions from code review

* update

* add peft as an additional dependency for gguf

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2025-04-08 21:17:03 +05:30
Linoy Tsaban 71f34fc5a4 [Flux LoRA] fix issues in flux lora scripts (#11111)
* remove custom scheduler

* update requirements.txt

* log_validation with mixed precision

* add intermediate embeddings saving when checkpointing is enabled

* remove comment

* fix validation

* add unwrap_model for accelerator, torch.no_grad context for validation, fix accelerator.accumulate call in advanced script

* revert unwrap_model change temp

* add .module to address distributed training bug + replace accelerator.unwrap_model with unwrap model

* changes to align advanced script with canonical script

* make changes for distributed training + unify unwrap_model calls in advanced script

* add module.dtype fix to dreambooth script

* unify unwrap_model calls in dreambooth script

* fix condition in validation run

* mixed precision

* Update examples/advanced_diffusion_training/train_dreambooth_lora_flux_advanced.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* smol style change

* change autocast

* Apply style fixes

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-04-08 17:40:30 +03:00
Yao Matrix c51b6bd837 introduce compute arch specific expectations and fix test_sd3_img2img_inference failure (#11227)
* add arch specfic expectations support, to support different arch's numerical characteristics

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix typo

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* Apply suggestions from code review

* Apply style fixes

* Update src/diffusers/utils/testing_utils.py

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: hlky <hlky@hlky.ac>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-04-08 14:57:49 +01:00
Benjamin Bossan fb54499614 [LoRA] Implement hot-swapping of LoRA (#9453)
* [WIP][LoRA] Implement hot-swapping of LoRA

This PR adds the possibility to hot-swap LoRA adapters. It is WIP.

Description

As of now, users can already load multiple LoRA adapters. They can
offload existing adapters or they can unload them (i.e. delete them).
However, they cannot "hotswap" adapters yet, i.e. substitute the weights
from one LoRA adapter with the weights of another, without the need to
create a separate LoRA adapter.

Generally, hot-swapping may not appear not super useful but when the
model is compiled, it is necessary to prevent recompilation. See #9279
for more context.

Caveats

To hot-swap a LoRA adapter for another, these two adapters should target
exactly the same layers and the "hyper-parameters" of the two adapters
should be identical. For instance, the LoRA alpha has to be the same:
Given that we keep the alpha from the first adapter, the LoRA scaling
would be incorrect for the second adapter otherwise.

Theoretically, we could override the scaling dict with the alpha values
derived from the second adapter's config, but changing the dict will
trigger a guard for recompilation, defeating the main purpose of the
feature.

I also found that compilation flags can have an impact on whether this
works or not. E.g. when passing "reduce-overhead", there will be errors
of the type:

> input name: arg861_1. data pointer changed from 139647332027392 to
139647331054592

I don't know enough about compilation to determine whether this is
problematic or not.

Current state

This is obviously WIP right now to collect feedback and discuss which
direction to take this. If this PR turns out to be useful, the
hot-swapping functions will be added to PEFT itself and can be imported
here (or there is a separate copy in diffusers to avoid the need for a
min PEFT version to use this feature).

Moreover, more tests need to be added to better cover this feature,
although we don't necessarily need tests for the hot-swapping
functionality itself, since those tests will be added to PEFT.

Furthermore, as of now, this is only implemented for the unet. Other
pipeline components have yet to implement this feature.

Finally, it should be properly documented.

I would like to collect feedback on the current state of the PR before
putting more time into finalizing it.

* Reviewer feedback

* Reviewer feedback, adjust test

* Fix, doc

* Make fix

* Fix for possible g++ error

* Add test for recompilation w/o hotswapping

* Make hotswap work

Requires https://github.com/huggingface/peft/pull/2366

More changes to make hotswapping work. Together with the mentioned PEFT
PR, the tests pass for me locally.

List of changes:

- docstring for hotswap
- remove code copied from PEFT, import from PEFT now
- adjustments to PeftAdapterMixin.load_lora_adapter (unfortunately, some
  state dict renaming was necessary, LMK if there is a better solution)
- adjustments to UNet2DConditionLoadersMixin._process_lora: LMK if this
  is even necessary or not, I'm unsure what the overall relationship is
  between this and PeftAdapterMixin.load_lora_adapter
- also in UNet2DConditionLoadersMixin._process_lora, I saw that there is
  no LoRA unloading when loading the adapter fails, so I added it
  there (in line with what happens in PeftAdapterMixin.load_lora_adapter)
- rewritten tests to avoid shelling out, make the test more precise by
  making sure that the outputs align, parametrize it
- also checked the pipeline code mentioned in this comment:
  https://github.com/huggingface/diffusers/pull/9453#issuecomment-2418508871;
  when running this inside the with
  torch._dynamo.config.patch(error_on_recompile=True) context, there is
  no error, so I think hotswapping is now working with pipelines.

* Address reviewer feedback:

- Revert deprecated method
- Fix PEFT doc link to main
- Don't use private function
- Clarify magic numbers
- Add pipeline test

Moreover:
- Extend docstrings
- Extend existing test for outputs != 0
- Extend existing test for wrong adapter name

* Change order of test decorators

parameterized.expand seems to ignore skip decorators if added in last
place (i.e. innermost decorator).

* Split model and pipeline tests

Also increase test coverage by also targeting conv2d layers (support of
which was added recently on the PEFT PR).

* Reviewer feedback: Move decorator to test classes

... instead of having them on each test method.

* Apply suggestions from code review

Co-authored-by: hlky <hlky@hlky.ac>

* Reviewer feedback: version check, TODO comment

* Add enable_lora_hotswap method

* Reviewer feedback: check _lora_loadable_modules

* Revert changes in unet.py

* Add possibility to ignore enabled at wrong time

* Fix docstrings

* Log possible PEFT error, test

* Raise helpful error if hotswap not supported

I.e. for the text encoder

* Formatting

* More linter

* More ruff

* Doc-builder complaint

* Update docstring:

- mention no text encoder support yet
- make it clear that LoRA is meant
- mention that same adapter name should be passed

* Fix error in docstring

* Update more methods with hotswap argument

- SDXL
- SD3
- Flux

No changes were made to load_lora_into_transformer.

* Add hotswap argument to load_lora_into_transformer

For SD3 and Flux. Use shorter docstring for brevity.

* Extend docstrings

* Add version guards to tests

* Formatting

* Fix LoRA loading call to add prefix=None

See:
https://github.com/huggingface/diffusers/pull/10187#issuecomment-2717571064

* Run make fix-copies

* Add hot swap documentation to the docs

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: hlky <hlky@hlky.ac>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-04-08 17:05:31 +05:30
Álvaro Somoza 723dbdd363 [Training] Better image interpolation in training scripts (#11206)
* initial

* Update examples/dreambooth/train_dreambooth_lora_sdxl.py

Co-authored-by: hlky <hlky@hlky.ac>

* update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: hlky <hlky@hlky.ac>
2025-04-08 12:26:07 +05:30
Bhavay Malhotra fbf61f465b [train_controlnet.py] Fix the LR schedulers when num_train_epochs is passed in a distributed training env (#8461)
* Create diffusers.yml

* fix num_train_epochs

* Delete diffusers.yml

* Fixed Changes

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2025-04-08 12:10:09 +05:30
Inigo Goiri 841504bb1a Add support to pass image embeddings to the WAN I2V pipeline. (#11175)
* Add support to pass image embeddings to the pipeline.



---------

Co-authored-by: hlky <hlky@hlky.ac>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2025-04-07 15:47:06 -10:00
Steven Liu fc7a867ae5 [docs] MPS update (#11212)
mps
2025-04-07 14:32:27 -10:00
alex choi 5ded26cdc7 ensure dtype match between diffused latents and vae weights (#8391) 2025-04-07 12:59:10 -10:00
Yao Matrix 506f39af3a enable 1 case on XPU (#11219)
enable case on XPU: 1. tests/quantization/bnb/test_mixed_int8.py::BnB8bitTrainingTests::test_training

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-04-07 08:24:21 +01:00
Mikko Tukiainen 8ad68c1393 Add missing MochiEncoder3D.gradient_checkpointing attribute (#11146)
* Add missing 'gradient_checkpointing = False' attr

* Add (limited) tests for Mochi autoencoder

* Apply style fixes

* pass 'conv_cache' as arg instead of kwarg

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-04-06 02:46:45 +05:30
Edna 41afb6690c Add Wan with STG as a community pipeline (#11184)
* Add stg wan to community pipelines

* remove debug prints

* remove unused comment

* Update doc

* Add credit + fix typo

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-04-05 04:00:40 +02:00
Tolga Cangöz 13e48492f0 [LTX0.9.5] Refactor LTXConditionPipeline for text-only conditioning (#11174)
* Refactor `LTXConditionPipeline` to add text-only conditioning

* style

* up

* Refactor `LTXConditionPipeline` to streamline condition handling and improve clarity

* Improve condition checks

* Simplify latents handling based on conditioning type

* Refactor rope_interpolation_scale preparation for clarity and efficiency

* Update LTXConditionPipeline docstring to clarify supported input types

* Add LTX Video 0.9.5 model to documentation

* Clarify documentation to indicate support for text-only conditioning without passing `conditions`

* refactor: comment out unused parameters in LTXConditionPipeline

* fix: restore previously commented parameters in LTXConditionPipeline

* fix: remove unused parameters from LTXConditionPipeline

* refactor: remove unnecessary lines in LTXConditionPipeline
2025-04-04 16:43:15 +02:00
Suprhimp 94f2c48d58 [feat]Add strength in flux_fill pipeline (denoising strength for fluxfill) (#10603)
* [feat]add strength in flux_fill pipeline

* Update src/diffusers/pipelines/flux/pipeline_flux_fill.py

* Update src/diffusers/pipelines/flux/pipeline_flux_fill.py

* Update src/diffusers/pipelines/flux/pipeline_flux_fill.py

* [refactor] refactor after review

* [fix] change comment

* Apply style fixes

* empty

* fix

* update prepare_latents from flux.img2img pipeline

* style

* Update src/diffusers/pipelines/flux/pipeline_flux_fill.py

---------
2025-04-04 11:23:30 -03:00
Dhruv Nair aabf8ce20b Fix Single File loading for LTX VAE (#11200)
update
2025-04-04 18:02:39 +05:30
Kenneth Gerald Hamilton f10775b1b5 Fixed requests.get function call by adding timeout parameter. (#11156)
* Fixed requests.get function call by adding timeout parameter.

* declare DIFFUSERS_REQUEST_TIMEOUT in constants and import when needed

* remove unneeded os import

* Apply style fixes

---------

Co-authored-by: Sai-Suraj-27 <sai.suraj.27.729@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-04-04 07:23:14 +01:00
célina 6edb774b5e Update Style Bot workflow (#11202)
update style bot workflow
2025-04-03 19:31:49 +02:00
Basile Lewandowski 480510ada9 Change KolorsPipeline LoRA Loader to StableDiffusion (#11198)
Change LoRA Loader to StableDiffusion

Replace the SDXL LoRA Loader Mixin inheritance with the StableDiffusion one
2025-04-03 11:21:11 -03:00
Abhipsha Das d9023a671a [Model Card] standardize advanced diffusion training sdxl lora (#7615)
* model card gen code

* push modelcard creation

* remove optional from params

* add import

* add use_dora check

* correct lora var use in tags

* make style && make quality

---------

Co-authored-by: Aryan <aryan@huggingface.co>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-04-03 07:43:01 +05:30
Eliseu Silva c4646a3931 feat: [Community Pipeline] - FaithDiff Stable Diffusion XL Pipeline (#11188)
* feat: [Community Pipeline] - FaithDiff Stable Diffusion XL Pipeline for Image SR.

* added pipeline
2025-04-02 11:33:19 -10:00
Dhruv Nair c97b709afa Add CacheMixin to Wan and LTX Transformers (#11187)
* update

* update

* update
2025-04-02 10:16:31 -10:00
lakshay sharma b0ff822ed3 Update import_utils.py (#10329)
added onnxruntime-vitisai for custom build onnxruntime pkg
2025-04-02 20:47:10 +01:00
hlky 78c2fdc52e SchedulerMixin from_pretrained and ConfigMixin Self type annotation (#11192) 2025-04-02 08:24:02 -10:00
hlky 54dac3a87c Fix enable_sequential_cpu_offload in CogView4Pipeline (#11195)
* Fix enable_sequential_cpu_offload in CogView4Pipeline

* make fix-copies
2025-04-02 16:51:23 +01:00
hlky e5c6027ef8 [docs] torch_dtype map (#11194) 2025-04-02 12:46:28 +01:00
hlky da857bebb6 Revert save_model in ModelMixin save_pretrained and use safe_serialization=False in test (#11196) 2025-04-02 12:45:36 +01:00
Fanli Lin 52b460feb9 [tests] HunyuanDiTControlNetPipeline inference precision issue on XPU (#11197)
* add xpu part

* fix more cases

* remove some cases

* no canny

* format fix
2025-04-02 12:45:02 +01:00
hlky d8c617ccb0 allow models to run with a user-provided dtype map instead of a single dtype (#10301)
* allow models to run with a user-provided dtype map instead of a single dtype

* make style

* Add warning, change `_` to `default`

* make style

* add test

* handle shared tensors

* remove warning

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-04-02 09:05:46 +01:00
Bruno Magalhaes fe2b397426 remove unnecessary call to F.pad (#10620)
* rewrite memory count without implicitly using dimensions by @ic-synth

* replace F.pad by built-in padding in Conv3D

* in-place sums to reduce memory allocations

* fixed trailing whitespace

* file reformatted

* in-place sums

* simpler in-place expressions

* removed in-place sum, may affect backward propagation logic

* removed in-place sum, may affect backward propagation logic

* removed in-place sum, may affect backward propagation logic

* reverted change
2025-04-02 08:19:51 +01:00
Eliseu Silva be0b7f55cc fix: for checking mandatory and optional pipeline components (#11189)
fix: optional componentes verification on load
2025-04-02 08:07:24 +01:00
jiqing-feng 4d5a96e40a fix autocast (#11190)
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-04-02 07:26:27 +01:00
Yao Matrix a7f07c1ef5 map BACKEND_RESET_MAX_MEMORY_ALLOCATED to reset_peak_memory_stats on XPU (#11191)
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-04-02 07:25:48 +01:00
1203 changed files with 18831 additions and 7202 deletions
+1 -1
View File
@@ -417,7 +417,7 @@ jobs:
additional_deps: ["peft"]
- backend: "gguf"
test_location: "gguf"
additional_deps: []
additional_deps: ["peft"]
- backend: "torchao"
test_location: "torchao"
additional_deps: []
-34
View File
@@ -13,39 +13,5 @@ jobs:
uses: huggingface/huggingface_hub/.github/workflows/style-bot-action.yml@main
with:
python_quality_dependencies: "[quality]"
pre_commit_script_name: "Download and Compare files from the main branch"
pre_commit_script: |
echo "Downloading the files from the main branch"
curl -o main_Makefile https://raw.githubusercontent.com/huggingface/diffusers/main/Makefile
curl -o main_setup.py https://raw.githubusercontent.com/huggingface/diffusers/refs/heads/main/setup.py
curl -o main_check_doc_toc.py https://raw.githubusercontent.com/huggingface/diffusers/refs/heads/main/utils/check_doc_toc.py
echo "Compare the files and raise error if needed"
diff_failed=0
if ! diff -q main_Makefile Makefile; then
echo "Error: The Makefile has changed. Please ensure it matches the main branch."
diff_failed=1
fi
if ! diff -q main_setup.py setup.py; then
echo "Error: The setup.py has changed. Please ensure it matches the main branch."
diff_failed=1
fi
if ! diff -q main_check_doc_toc.py utils/check_doc_toc.py; then
echo "Error: The utils/check_doc_toc.py has changed. Please ensure it matches the main branch."
diff_failed=1
fi
if [ $diff_failed -eq 1 ]; then
echo "❌ Error happened as we detected changes in the files that should not be changed ❌"
exit 1
fi
echo "No changes in the files. Proceeding..."
rm -rf main_Makefile main_setup.py main_check_doc_toc.py
style_command: "make style && make quality"
secrets:
bot_token: ${{ secrets.GITHUB_TOKEN }}
+1 -1
View File
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
+1 -1
View File
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
+3 -3
View File
@@ -28,9 +28,9 @@ ENV PATH="/opt/venv/bin:$PATH"
# pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
RUN python3 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
python3 -m uv pip install --no-cache-dir \
torch==2.1.2 \
torchvision==0.16.2 \
torchaudio==2.1.2 \
torch \
torchvision \
torchaudio\
onnxruntime \
--extra-index-url https://download.pytorch.org/whl/cpu && \
python3 -m uv pip install --no-cache-dir \
+1 -1
View File
@@ -1,5 +1,5 @@
<!---
Copyright 2024- The HuggingFace Team. All rights reserved.
Copyright 2025- The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
+1 -1
View File
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
+43 -33
View File
@@ -175,7 +175,7 @@
title: gguf
- local: quantization/torchao
title: torchao
- local: quantization/quanto
- local: quantization/quanto
title: quanto
title: Quantization Methods
- sections:
@@ -265,19 +265,23 @@
sections:
- local: api/models/overview
title: Overview
- local: api/models/auto_model
title: AutoModel
- sections:
- local: api/models/controlnet
title: ControlNetModel
- local: api/models/controlnet_union
title: ControlNetUnionModel
- local: api/models/controlnet_flux
title: FluxControlNetModel
- local: api/models/controlnet_hunyuandit
title: HunyuanDiT2DControlNetModel
- local: api/models/controlnet_sana
title: SanaControlNetModel
- local: api/models/controlnet_sd3
title: SD3ControlNetModel
- local: api/models/controlnet_sparsectrl
title: SparseControlNetModel
- local: api/models/controlnet_union
title: ControlNetUnionModel
title: ControlNets
- sections:
- local: api/models/allegro_transformer3d
@@ -286,30 +290,32 @@
title: AuraFlowTransformer2DModel
- local: api/models/cogvideox_transformer3d
title: CogVideoXTransformer3DModel
- local: api/models/consisid_transformer3d
title: ConsisIDTransformer3DModel
- local: api/models/cogview3plus_transformer2d
title: CogView3PlusTransformer2DModel
- local: api/models/cogview4_transformer2d
title: CogView4Transformer2DModel
- local: api/models/consisid_transformer3d
title: ConsisIDTransformer3DModel
- local: api/models/dit_transformer2d
title: DiTTransformer2DModel
- local: api/models/easyanimate_transformer3d
title: EasyAnimateTransformer3DModel
- local: api/models/flux_transformer
title: FluxTransformer2DModel
- local: api/models/hidream_image_transformer
title: HiDreamImageTransformer2DModel
- local: api/models/hunyuan_transformer2d
title: HunyuanDiT2DModel
- local: api/models/hunyuan_video_transformer_3d
title: HunyuanVideoTransformer3DModel
- local: api/models/latte_transformer3d
title: LatteTransformer3DModel
- local: api/models/lumina_nextdit2d
title: LuminaNextDiT2DModel
- local: api/models/lumina2_transformer2d
title: Lumina2Transformer2DModel
- local: api/models/ltx_video_transformer3d
title: LTXVideoTransformer3DModel
- local: api/models/lumina2_transformer2d
title: Lumina2Transformer2DModel
- local: api/models/lumina_nextdit2d
title: LuminaNextDiT2DModel
- local: api/models/mochi_transformer3d
title: MochiTransformer3DModel
- local: api/models/omnigen_transformer
@@ -318,10 +324,10 @@
title: PixArtTransformer2DModel
- local: api/models/prior_transformer
title: PriorTransformer
- local: api/models/sd3_transformer2d
title: SD3Transformer2DModel
- local: api/models/sana_transformer2d
title: SanaTransformer2DModel
- local: api/models/sd3_transformer2d
title: SD3Transformer2DModel
- local: api/models/stable_audio_transformer
title: StableAudioDiTModel
- local: api/models/transformer2d
@@ -336,10 +342,10 @@
title: StableCascadeUNet
- local: api/models/unet
title: UNet1DModel
- local: api/models/unet2d
title: UNet2DModel
- local: api/models/unet2d-cond
title: UNet2DConditionModel
- local: api/models/unet2d
title: UNet2DModel
- local: api/models/unet3d-cond
title: UNet3DConditionModel
- local: api/models/unet-motion
@@ -348,6 +354,10 @@
title: UViT2DModel
title: UNets
- sections:
- local: api/models/asymmetricautoencoderkl
title: AsymmetricAutoencoderKL
- local: api/models/autoencoder_dc
title: AutoencoderDC
- local: api/models/autoencoderkl
title: AutoencoderKL
- local: api/models/autoencoderkl_allegro
@@ -364,10 +374,6 @@
title: AutoencoderKLMochi
- local: api/models/autoencoder_kl_wan
title: AutoencoderKLWan
- local: api/models/asymmetricautoencoderkl
title: AsymmetricAutoencoderKL
- local: api/models/autoencoder_dc
title: AutoencoderDC
- local: api/models/consistency_decoder_vae
title: ConsistencyDecoderVAE
- local: api/models/autoencoder_oobleck
@@ -420,6 +426,8 @@
title: ControlNet with Stable Diffusion 3
- local: api/pipelines/controlnet_sdxl
title: ControlNet with Stable Diffusion XL
- local: api/pipelines/controlnet_sana
title: ControlNet-Sana
- local: api/pipelines/controlnetxs
title: ControlNet-XS
- local: api/pipelines/controlnetxs_sdxl
@@ -444,6 +452,8 @@
title: Flux
- local: api/pipelines/control_flux_inpaint
title: FluxControlInpaint
- local: api/pipelines/hidream
title: HiDream-I1
- local: api/pipelines/hunyuandit
title: Hunyuan-DiT
- local: api/pipelines/hunyuan_video
@@ -511,40 +521,40 @@
- sections:
- local: api/pipelines/stable_diffusion/overview
title: Overview
- local: api/pipelines/stable_diffusion/text2img
title: Text-to-image
- local: api/pipelines/stable_diffusion/depth2img
title: Depth-to-image
- local: api/pipelines/stable_diffusion/gligen
title: GLIGEN (Grounded Language-to-Image Generation)
- local: api/pipelines/stable_diffusion/image_variation
title: Image variation
- local: api/pipelines/stable_diffusion/img2img
title: Image-to-image
- local: api/pipelines/stable_diffusion/svd
title: Image-to-video
- local: api/pipelines/stable_diffusion/inpaint
title: Inpainting
- local: api/pipelines/stable_diffusion/depth2img
title: Depth-to-image
- local: api/pipelines/stable_diffusion/image_variation
title: Image variation
- local: api/pipelines/stable_diffusion/k_diffusion
title: K-Diffusion
- local: api/pipelines/stable_diffusion/latent_upscale
title: Latent upscaler
- local: api/pipelines/stable_diffusion/ldm3d_diffusion
title: LDM3D Text-to-(RGB, Depth), Text-to-(RGB-pano, Depth-pano), LDM3D Upscaler
- local: api/pipelines/stable_diffusion/stable_diffusion_safe
title: Safe Stable Diffusion
- local: api/pipelines/stable_diffusion/sdxl_turbo
title: SDXL Turbo
- local: api/pipelines/stable_diffusion/stable_diffusion_2
title: Stable Diffusion 2
- local: api/pipelines/stable_diffusion/stable_diffusion_3
title: Stable Diffusion 3
- local: api/pipelines/stable_diffusion/stable_diffusion_xl
title: Stable Diffusion XL
- local: api/pipelines/stable_diffusion/sdxl_turbo
title: SDXL Turbo
- local: api/pipelines/stable_diffusion/latent_upscale
title: Latent upscaler
- local: api/pipelines/stable_diffusion/upscale
title: Super-resolution
- local: api/pipelines/stable_diffusion/k_diffusion
title: K-Diffusion
- local: api/pipelines/stable_diffusion/ldm3d_diffusion
title: LDM3D Text-to-(RGB, Depth), Text-to-(RGB-pano, Depth-pano), LDM3D Upscaler
- local: api/pipelines/stable_diffusion/adapter
title: T2I-Adapter
- local: api/pipelines/stable_diffusion/gligen
title: GLIGEN (Grounded Language-to-Image Generation)
- local: api/pipelines/stable_diffusion/text2img
title: Text-to-image
title: Stable Diffusion
- local: api/pipelines/stable_unclip
title: Stable unCLIP
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
+1 -1
View File
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
+1 -1
View File
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
+1 -1
View File
@@ -1,4 +1,4 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
+1 -1
View File
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
+1 -1
View File
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
+1 -1
View File
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
+15 -1
View File
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -20,10 +20,13 @@ LoRA is a fast and lightweight training method that inserts and trains a signifi
- [`FluxLoraLoaderMixin`] provides similar functions for [Flux](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux).
- [`CogVideoXLoraLoaderMixin`] provides similar functions for [CogVideoX](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cogvideox).
- [`Mochi1LoraLoaderMixin`] provides similar functions for [Mochi](https://huggingface.co/docs/diffusers/main/en/api/pipelines/mochi).
- [`AuraFlowLoraLoaderMixin`] provides similar functions for [AuraFlow](https://huggingface.co/fal/AuraFlow).
- [`LTXVideoLoraLoaderMixin`] provides similar functions for [LTX-Video](https://huggingface.co/docs/diffusers/main/en/api/pipelines/ltx_video).
- [`SanaLoraLoaderMixin`] provides similar functions for [Sana](https://huggingface.co/docs/diffusers/main/en/api/pipelines/sana).
- [`HunyuanVideoLoraLoaderMixin`] provides similar functions for [HunyuanVideo](https://huggingface.co/docs/diffusers/main/en/api/pipelines/hunyuan_video).
- [`Lumina2LoraLoaderMixin`] provides similar functions for [Lumina2](https://huggingface.co/docs/diffusers/main/en/api/pipelines/lumina2).
- [`WanLoraLoaderMixin`] provides similar functions for [Wan](https://huggingface.co/docs/diffusers/main/en/api/pipelines/wan).
- [`CogView4LoraLoaderMixin`] provides similar functions for [CogView4](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cogview4).
- [`AmusedLoraLoaderMixin`] is for the [`AmusedPipeline`].
- [`LoraBaseMixin`] provides a base class with several utility methods to fuse, unfuse, unload, LoRAs and more.
@@ -56,6 +59,9 @@ To learn more about how to load LoRA weights, see the [LoRA](../../using-diffuse
## Mochi1LoraLoaderMixin
[[autodoc]] loaders.lora_pipeline.Mochi1LoraLoaderMixin
## AuraFlowLoraLoaderMixin
[[autodoc]] loaders.lora_pipeline.AuraFlowLoraLoaderMixin
## LTXVideoLoraLoaderMixin
@@ -73,6 +79,14 @@ To learn more about how to load LoRA weights, see the [LoRA](../../using-diffuse
[[autodoc]] loaders.lora_pipeline.Lumina2LoraLoaderMixin
## CogView4LoraLoaderMixin
[[autodoc]] loaders.lora_pipeline.CogView4LoraLoaderMixin
## WanLoraLoaderMixin
[[autodoc]] loaders.lora_pipeline.WanLoraLoaderMixin
## AmusedLoraLoaderMixin
[[autodoc]] loaders.lora_pipeline.AmusedLoraLoaderMixin
+1 -1
View File
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
+1 -1
View File
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
+1 -1
View File
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
+1 -1
View File
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -1,4 +1,4 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
+29
View File
@@ -0,0 +1,29 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# AutoModel
The `AutoModel` is designed to make it easy to load a checkpoint without needing to know the specific model class. `AutoModel` automatically retrieves the correct model class from the checkpoint `config.json` file.
```python
from diffusers import AutoModel, AutoPipelineForText2Image
unet = AutoModel.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", subfolder="unet")
pipe = AutoPipelineForText2Image.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", unet=unet)
```
## AutoModel
[[autodoc]] AutoModel
- all
- from_pretrained
+1 -1
View File
@@ -1,4 +1,4 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -1,4 +1,4 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -1,4 +1,4 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
+1 -1
View File
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -1,4 +1,4 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -18,7 +18,7 @@ The model can be loaded with the following code snippet.
```python
from diffusers import AutoencoderKLAllegro
vae = AutoencoderKLCogVideoX.from_pretrained("rhymes-ai/Allegro", subfolder="vae", torch_dtype=torch.float32).to("cuda")
vae = AutoencoderKLAllegro.from_pretrained("rhymes-ai/Allegro", subfolder="vae", torch_dtype=torch.float32).to("cuda")
```
## AutoencoderKLAllegro
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -1,4 +1,4 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -1,4 +1,4 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
+1 -1
View File
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
+1 -1
View File
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team and The InstantX Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team and The InstantX Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team and Tencent Hunyuan Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team and Tencent Hunyuan Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -0,0 +1,29 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# SanaControlNetModel
The ControlNet model was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, Maneesh Agrawala. It provides a greater degree of control over text-to-image generation by conditioning the model on additional inputs such as edge maps, depth maps, segmentation maps, and keypoints for pose detection.
The abstract from the paper is:
*We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls. The neural architecture is connected with "zero convolutions" (zero-initialized convolution layers) that progressively grow the parameters from zero and ensure that no harmful noise could affect the finetuning. We test various conditioning controls, eg, edges, depth, segmentation, human pose, etc, with Stable Diffusion, using single or multiple conditions, with or without prompts. We show that the training of ControlNets is robust with small (<50k) and large (>1m) datasets. Extensive results show that ControlNet may facilitate wider applications to control image diffusion models.*
This model was contributed by [ishan24](https://huggingface.co/ishan24). ❤️
The original codebase can be found at [NVlabs/Sana](https://github.com/NVlabs/Sana), and you can find official ControlNet checkpoints on [Efficient-Large-Model's](https://huggingface.co/Efficient-Large-Model) Hub profile.
## SanaControlNetModel
[[autodoc]] SanaControlNetModel
## SanaControlNetOutput
[[autodoc]] models.controlnets.controlnet_sana.SanaControlNetOutput
+1 -1
View File
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team and The InstantX Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team and The InstantX Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -1,4 +1,4 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team and The InstantX Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team and The InstantX Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -0,0 +1,30 @@
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->
# HiDreamImageTransformer2DModel
A Transformer model for image-like data from [HiDream-I1](https://huggingface.co/HiDream-ai).
The model can be loaded with the following code snippet.
```python
from diffusers import HiDreamImageTransformer2DModel
transformer = HiDreamImageTransformer2DModel.from_pretrained("HiDream-ai/HiDream-I1-Full", subfolder="transformer", torch_dtype=torch.bfloat16)
```
## HiDreamImageTransformer2DModel
[[autodoc]] HiDreamImageTransformer2DModel
## Transformer2DModelOutput
[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -1,4 +1,4 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -1,4 +1,4 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -1,4 +1,4 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -1,4 +1,4 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
+1 -1
View File
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -1,4 +1,4 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
+1 -1
View File
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
+1 -1
View File
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
+1 -1
View File
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
+1 -1
View File
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
+1 -1
View File
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
+1 -1
View File
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
+1 -1
View File
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
+1 -1
View File
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -1,4 +1,4 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
+1 -1
View File
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
+1 -1
View File
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
+1 -1
View File
@@ -1,4 +1,4 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
+1 -1
View File
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
+1 -1
View File
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
+1 -1
View File
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
+1 -1
View File
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
+18 -1
View File
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -89,6 +89,23 @@ image = pipeline(prompt).images[0]
image.save("auraflow.png")
```
## Support for `torch.compile()`
AuraFlow can be compiled with `torch.compile()` to speed up inference latency even for different resolutions. First, install PyTorch nightly following the instructions from [here](https://pytorch.org/). The snippet below shows the changes needed to enable this:
```diff
+ torch.fx.experimental._config.use_duck_shape = False
+ pipeline.transformer = torch.compile(
pipeline.transformer, fullgraph=True, dynamic=True
)
```
Specifying `use_duck_shape` to be `False` instructs the compiler if it should use the same symbolic variable to represent input sizes that are the same. For more details, check out [this comment](https://github.com/huggingface/diffusers/pull/11327#discussion_r2047659790).
This enables from 100% (on low resolutions) to a 30% (on 1536x1536 resolution) speed improvements.
Thanks to [AstraliteHeart](https://github.com/huggingface/diffusers/pull/11297/) who helped us rewrite the [`AuraFlowTransformer2DModel`] class so that the above works for different resolutions ([PR](https://github.com/huggingface/diffusers/pull/11297/)).
## AuraFlowPipeline
[[autodoc]] AuraFlowPipeline
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
+1 -1
View File
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
+1 -1
View File
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
+1 -1
View File
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
+1 -1
View File
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team, The Black Forest Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team, The Black Forest Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
+1 -1
View File
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team, The InstantX Team, and the XLabs Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team, The InstantX Team, and the XLabs Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team and Tencent Hunyuan Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team and Tencent Hunyuan Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -0,0 +1,36 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# ControlNet
<div class="flex flex-wrap space-x-1">
<img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
</div>
ControlNet was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, and Maneesh Agrawala.
With a ControlNet model, you can provide an additional control image to condition and control Stable Diffusion generation. For example, if you provide a depth map, the ControlNet model generates an image that'll preserve the spatial information from the depth map. It is a more flexible and accurate way to control the image generation process.
The abstract from the paper is:
*We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls. The neural architecture is connected with "zero convolutions" (zero-initialized convolution layers) that progressively grow the parameters from zero and ensure that no harmful noise could affect the finetuning. We test various conditioning controls, eg, edges, depth, segmentation, human pose, etc, with Stable Diffusion, using single or multiple conditions, with or without prompts. We show that the training of ControlNets is robust with small (<50k) and large (>1m) datasets. Extensive results show that ControlNet may facilitate wider applications to control image diffusion models.*
This pipeline was contributed by [ishan24](https://huggingface.co/ishan24). ❤️
The original codebase can be found at [NVlabs/Sana](https://github.com/NVlabs/Sana), and you can find official ControlNet checkpoints on [Efficient-Large-Model's](https://huggingface.co/Efficient-Large-Model) Hub profile.
## SanaControlNetPipeline
[[autodoc]] SanaControlNetPipeline
- all
- __call__
## SanaPipelineOutput
[[autodoc]] pipelines.sana.pipeline_output.SanaPipelineOutput
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

Some files were not shown because too many files have changed in this diff Show More