Compare commits
181 Commits
| SHA1 | Author | Date | |
|---|---|---|---|
| de7cdf6287 | |||
| 73c5fe8bb1 | |||
| 595581d6ba | |||
| d27b65411e | |||
| cb9dca5523 | |||
| 79166dcb47 | |||
| f95c320467 | |||
| 59abd9514b | |||
| 5f3ebef0d7 | |||
| e6ffde2936 | |||
| 04171c7345 | |||
| be5e10ae61 | |||
| a2da0004ee | |||
| 863c7df543 | |||
| e0083b29d5 | |||
| 6521f599b2 | |||
| 0fcce2acd8 | |||
| ceeb3c1da3 | |||
| 0fcdd699cf | |||
| 5af003a9e1 | |||
| 179d6d958b | |||
| 229c4b355c | |||
| 0a4819a755 | |||
| 7cea9a3bb0 | |||
| 23de59e21a | |||
| 4f8b6f5a15 | |||
| 63e94cbc61 | |||
| 2c66fb3a85 | |||
| 284f827d6c | |||
| b750c69859 | |||
| 13c51bb038 | |||
| 3e46c86a93 | |||
| 8cb5b084b5 | |||
| 13fe248152 | |||
| 2e2024152c | |||
| 1987c07899 | |||
| 4543d216ec | |||
| b5db8aaa6f | |||
| 98ea5c9e86 | |||
| f27fbceba1 | |||
| 4b12a60c93 | |||
| abf28d55fb | |||
| db4b54cfab | |||
| 0138e176ac | |||
| bbd9340781 | |||
| 363737ec4b | |||
| c5849ba9d5 | |||
| f09b1ccfae | |||
| 285f877620 | |||
| c75b88f86f | |||
| b43e703fae | |||
| 9fae3828a7 | |||
| 3a3441cb45 | |||
| fdd2bedae9 | |||
| fedaa00bd5 | |||
| 8c680bc0b4 | |||
| 92b6b43805 | |||
| 49ea4d1bf5 | |||
| 58dbe0c29e | |||
| 9aaec5b9bc | |||
| 93760b1888 | |||
| 75540f42ee | |||
| b543bcc661 | |||
| 885a596696 | |||
| 655512e2cf | |||
| f63d62e091 | |||
| 7608d2eb9e | |||
| 449f299c63 | |||
| 84f4b27dfa | |||
| 9abac85f77 | |||
| 61772f0994 | |||
| b92cda25e2 | |||
| 7492e331b4 | |||
| ab6d63407a | |||
| da4242d467 | |||
| 129d658da7 | |||
| 75e62385f5 | |||
| a33206d22b | |||
| a82e211f89 | |||
| f3453f05ff | |||
| c437ae72c6 | |||
| 9530245e17 | |||
| 74b908b7e2 | |||
| 7d2a633e02 | |||
| cb328d3ff9 | |||
| 8c038f0e62 | |||
| 5917d7039f | |||
| c0327e493e | |||
| 174628edf4 | |||
| 1c9f0a83c9 | |||
| cdaaa40d31 | |||
| ffbaa890ba | |||
| e49413d87d | |||
| 48e4ff5c05 | |||
| 7c78fb1aad | |||
| bb4044362e | |||
| 1ae591e817 | |||
| 42c06e90f4 | |||
| 085ade03be | |||
| 78d2454c7c | |||
| 19545fd3e1 | |||
| d12531ddf7 | |||
| 4751d456f2 | |||
| 083479c365 | |||
| 04c16d0a56 | |||
| 9e58856b7a | |||
| 45392cce11 | |||
| 8913d59bf3 | |||
| 5a8c1b5f19 | |||
| 7ad01a6350 | |||
| a8e853b791 | |||
| 6a509ba862 | |||
| 96795afc72 | |||
| 12650e1393 | |||
| addaad013c | |||
| 485f8d1758 | |||
| cff0fd6260 | |||
| 8ddb20bfb8 | |||
| e5089d702b | |||
| 2c3e4eafa8 | |||
| c7020df2cf | |||
| 4bed3e306e | |||
| 00a3bc9d6c | |||
| ccb35acd81 | |||
| 00cae4e857 | |||
| b3fb4188f5 | |||
| 71df1581f7 | |||
| d046cf7d35 | |||
| 68a5185c86 | |||
| 6e2fe26bfd | |||
| 77b5fa59c5 | |||
| a226920b52 | |||
| 7007f72409 | |||
| a6804de4a2 | |||
| 7f897a9fc4 | |||
| 0966663d2a | |||
| fb78f4f12d | |||
| 2220af6940 | |||
| 7a34832d52 | |||
| e973de64f9 | |||
| db94ca882d | |||
| 6985906a2e | |||
| 54f410db6c | |||
| c12a05b9c1 | |||
| 2e0f5c86cc | |||
| 1d63306295 | |||
| 6c93626f6f | |||
| 72c5bf07c8 | |||
| ed59f90f15 | |||
| a09ca7f27e | |||
| 8c02572e16 | |||
| 27dde51de8 | |||
| 10d4a775f1 | |||
| 72d9a81d99 | |||
| 4fa85c7963 | |||
| 806e8e66fb | |||
| 0b90051db8 | |||
| b305c779b2 | |||
| 2b3cd2d39c | |||
| bc3d1c9ee6 | |||
| e50d614636 | |||
| a8df0f1ffb | |||
| ace53e2d2f | |||
| ffc2992fc2 | |||
| c70a285c2c | |||
| 8b811feece | |||
| 37e8dc7a59 | |||
| 024a9f5de3 | |||
| 005195c23e | |||
| 6742f160df | |||
| 540d303250 | |||
| f1b3036ca1 | |||
| 46ec1743a2 | |||
| 70272b1108 | |||
| 2b6dcbfa1d | |||
| af9572d759 | |||
| ddea157979 | |||
| ad3f9a26c0 | |||
| e8d0980f9f | |||
| 52a7f1cb97 | |||
| 33f85fadf6 |
@@ -25,7 +25,7 @@ jobs:
group: aws-g6e-4xlarge
container:
image: diffusers/diffusers-pytorch-cuda
options: --shm-size "16gb" --ipc host --gpus all
options: --shm-size "16gb" --ipc host --gpus 0
steps:
- name: Checkout diffusers
uses: actions/checkout@v3

@@ -79,14 +79,14 @@ jobs:

# Check secret is set
- name: whoami
run: hf auth whoami
run: huggingface-cli whoami
env:
HF_TOKEN: ${{ secrets.HF_TOKEN_MIRROR_COMMUNITY_PIPELINES }}

# Push to HF! (under subfolder based on checkout ref)
# https://huggingface.co/datasets/diffusers/community-pipelines-mirror
- name: Mirror community pipeline to HF
run: hf upload diffusers/community-pipelines-mirror ./examples/community ${PATH_IN_REPO} --repo-type dataset
run: huggingface-cli upload diffusers/community-pipelines-mirror ./examples/community ${PATH_IN_REPO} --repo-type dataset
env:
PATH_IN_REPO: ${{ env.PATH_IN_REPO }}
HF_TOKEN: ${{ secrets.HF_TOKEN_MIRROR_COMMUNITY_PIPELINES }}

@@ -61,7 +61,7 @@ jobs:
group: aws-g4dn-2xlarge
container:
image: diffusers/diffusers-pytorch-cuda
options: --shm-size "16gb" --ipc host --gpus all
options: --shm-size "16gb" --ipc host --gpus 0
steps:
- name: Checkout diffusers
uses: actions/checkout@v3
@@ -107,7 +107,7 @@ jobs:
group: aws-g4dn-2xlarge
container:
image: diffusers/diffusers-pytorch-cuda
options: --shm-size "16gb" --ipc host --gpus all
options: --shm-size "16gb" --ipc host --gpus 0
defaults:
run:
shell: bash
@@ -178,7 +178,7 @@ jobs:

container:
image: diffusers/diffusers-pytorch-cuda
options: --gpus all --shm-size "16gb" --ipc host
options: --gpus 0 --shm-size "16gb" --ipc host

steps:
- name: Checkout diffusers
@@ -222,7 +222,7 @@ jobs:
group: aws-g6e-xlarge-plus
container:
image: diffusers/diffusers-pytorch-cuda
options: --shm-size "16gb" --ipc host --gpus all
options: --shm-size "16gb" --ipc host --gpus 0
steps:
- name: Checkout diffusers
uses: actions/checkout@v3
@@ -270,7 +270,7 @@ jobs:
group: aws-g4dn-2xlarge
container:
image: diffusers/diffusers-pytorch-minimum-cuda
options: --shm-size "16gb" --ipc host --gpus all
options: --shm-size "16gb" --ipc host --gpus 0
defaults:
run:
shell: bash
@@ -333,7 +333,7 @@ jobs:
additional_deps: ["peft"]
- backend: "gguf"
test_location: "gguf"
additional_deps: ["peft", "kernels"]
additional_deps: ["peft"]
- backend: "torchao"
test_location: "torchao"
additional_deps: []
@@ -344,7 +344,7 @@ jobs:
group: aws-g6e-xlarge-plus
container:
image: diffusers/diffusers-pytorch-cuda
options: --shm-size "20gb" --ipc host --gpus all
options: --shm-size "20gb" --ipc host --gpus 0
steps:
- name: Checkout diffusers
uses: actions/checkout@v3
@@ -396,7 +396,7 @@ jobs:
group: aws-g6e-xlarge-plus
container:
image: diffusers/diffusers-pytorch-cuda
options: --shm-size "20gb" --ipc host --gpus all
options: --shm-size "20gb" --ipc host --gpus 0
steps:
- name: Checkout diffusers
uses: actions/checkout@v3

@@ -1,141 +0,0 @@
name: Fast PR tests for Modular

on:
pull_request:
branches: [main]
paths:
- "src/diffusers/modular_pipelines/**.py"
- "src/diffusers/models/modeling_utils.py"
- "src/diffusers/models/model_loading_utils.py"
- "src/diffusers/pipelines/pipeline_utils.py"
- "src/diffusers/pipeline_loading_utils.py"
- "src/diffusers/loaders/lora_base.py"
- "src/diffusers/loaders/lora_pipeline.py"
- "src/diffusers/loaders/peft.py"
- "tests/modular_pipelines/**.py"
- ".github/**.yml"
- "utils/**.py"
- "setup.py"
push:
branches:
- ci-*

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true

env:
DIFFUSERS_IS_CI: yes
HF_HUB_ENABLE_HF_TRANSFER: 1
OMP_NUM_THREADS: 4
MKL_NUM_THREADS: 4
PYTEST_TIMEOUT: 60

jobs:
check_code_quality:
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v3
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: "3.10"
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install .[quality]
- name: Check quality
run: make quality
- name: Check if failure
if: ${{ failure() }}
run: |
echo "Quality check failed. Please ensure the right dependency versions are installed with 'pip install -e .[quality]' and run 'make style && make quality'" >> $GITHUB_STEP_SUMMARY

check_repository_consistency:
needs: check_code_quality
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v3
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: "3.10"
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install .[quality]
- name: Check repo consistency
run: |
python utils/check_copies.py
python utils/check_dummies.py
python utils/check_support_list.py
make deps_table_check_updated
- name: Check if failure
if: ${{ failure() }}
run: |
echo "Repo consistency check failed. Please ensure the right dependency versions are installed with 'pip install -e .[quality]' and run 'make fix-copies'" >> $GITHUB_STEP_SUMMARY

run_fast_tests:
needs: [check_code_quality, check_repository_consistency]
strategy:
fail-fast: false
matrix:
config:
- name: Fast PyTorch Modular Pipeline CPU tests
framework: pytorch_pipelines
runner: aws-highmemory-32-plus
image: diffusers/diffusers-pytorch-cpu
report: torch_cpu_modular_pipelines

name: ${{ matrix.config.name }}

runs-on:
group: ${{ matrix.config.runner }}

container:
image: ${{ matrix.config.image }}
options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/

defaults:
run:
shell: bash

steps:
- name: Checkout diffusers
uses: actions/checkout@v3
with:
fetch-depth: 2

- name: Install dependencies
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m uv pip install -e [quality,test]
pip uninstall transformers -y && python -m uv pip install -U transformers@git+https://github.com/huggingface/transformers.git --no-deps
pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git --no-deps

- name: Environment
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python utils/print_env.py

- name: Run fast PyTorch Pipeline CPU tests
if: ${{ matrix.config.framework == 'pytorch_pipelines' }}
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m pytest -n 8 --max-worker-restart=0 --dist=loadfile \
-s -v -k "not Flax and not Onnx" \
--make-reports=tests_${{ matrix.config.report }} \
tests/modular_pipelines

- name: Failure short reports
if: ${{ failure() }}
run: cat reports/tests_${{ matrix.config.report }}_failures_short.txt

- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: pr_${{ matrix.config.framework }}_${{ matrix.config.report }}_test_reports
path: reports

@@ -13,7 +13,6 @@ on:
- "src/diffusers/loaders/peft.py"
- "tests/pipelines/test_pipelines_common.py"
- "tests/models/test_modeling_common.py"
- "examples/**/*.py"
workflow_dispatch:

concurrency:
@@ -118,7 +117,7 @@ jobs:
group: aws-g4dn-2xlarge
container:
image: diffusers/diffusers-pytorch-cuda
options: --shm-size "16gb" --ipc host --gpus all
options: --shm-size "16gb" --ipc host --gpus 0
steps:
- name: Checkout diffusers
uses: actions/checkout@v3
@@ -183,13 +182,13 @@ jobs:
group: aws-g4dn-2xlarge
container:
image: diffusers/diffusers-pytorch-cuda
options: --shm-size "16gb" --ipc host --gpus all
options: --shm-size "16gb" --ipc host --gpus 0
defaults:
run:
shell: bash
strategy:
fail-fast: false
max-parallel: 4
max-parallel: 2
matrix:
module: [models, schedulers, lora, others]
steps:
@@ -253,7 +252,7 @@ jobs:

container:
image: diffusers/diffusers-pytorch-cuda
options: --gpus all --shm-size "16gb" --ipc host
options: --gpus 0 --shm-size "16gb" --ipc host
steps:
- name: Checkout diffusers
uses: actions/checkout@v3

@@ -64,7 +64,7 @@ jobs:
group: aws-g4dn-2xlarge
container:
image: diffusers/diffusers-pytorch-cuda
options: --shm-size "16gb" --ipc host --gpus all
options: --shm-size "16gb" --ipc host --gpus 0
steps:
- name: Checkout diffusers
uses: actions/checkout@v3
@@ -109,7 +109,7 @@ jobs:
group: aws-g4dn-2xlarge
container:
image: diffusers/diffusers-pytorch-cuda
options: --shm-size "16gb" --ipc host --gpus all
options: --shm-size "16gb" --ipc host --gpus 0
defaults:
run:
shell: bash
@@ -167,7 +167,7 @@ jobs:

container:
image: diffusers/diffusers-pytorch-cuda
options: --gpus all --shm-size "16gb" --ipc host
options: --gpus 0 --shm-size "16gb" --ipc host

steps:
- name: Checkout diffusers
@@ -210,7 +210,7 @@ jobs:

container:
image: diffusers/diffusers-pytorch-xformers-cuda
options: --gpus all --shm-size "16gb" --ipc host
options: --gpus 0 --shm-size "16gb" --ipc host

steps:
- name: Checkout diffusers
@@ -252,7 +252,7 @@ jobs:

container:
image: diffusers/diffusers-pytorch-cuda
options: --gpus all --shm-size "16gb" --ipc host
options: --gpus 0 --shm-size "16gb" --ipc host
steps:
- name: Checkout diffusers
uses: actions/checkout@v3

@@ -62,7 +62,7 @@ jobs:
group: aws-g4dn-2xlarge
container:
image: diffusers/diffusers-pytorch-cuda
options: --shm-size "16gb" --ipc host --gpus all
options: --shm-size "16gb" --ipc host --gpus 0
steps:
- name: Checkout diffusers
uses: actions/checkout@v3
@@ -107,7 +107,7 @@ jobs:
group: aws-g4dn-2xlarge
container:
image: diffusers/diffusers-pytorch-cuda
options: --shm-size "16gb" --ipc host --gpus all
options: --shm-size "16gb" --ipc host --gpus 0
defaults:
run:
shell: bash
@@ -163,7 +163,7 @@ jobs:
group: aws-g4dn-2xlarge
container:
image: diffusers/diffusers-pytorch-minimum-cuda
options: --shm-size "16gb" --ipc host --gpus all
options: --shm-size "16gb" --ipc host --gpus 0
defaults:
run:
shell: bash
@@ -222,7 +222,7 @@ jobs:

container:
image: diffusers/diffusers-pytorch-cuda
options: --gpus all --shm-size "16gb" --ipc host
options: --gpus 0 --shm-size "16gb" --ipc host

steps:
- name: Checkout diffusers
@@ -265,7 +265,7 @@ jobs:

container:
image: diffusers/diffusers-pytorch-xformers-cuda
options: --gpus all --shm-size "16gb" --ipc host
options: --gpus 0 --shm-size "16gb" --ipc host

steps:
- name: Checkout diffusers
@@ -307,7 +307,7 @@ jobs:

container:
image: diffusers/diffusers-pytorch-cuda
options: --gpus all --shm-size "16gb" --ipc host
options: --gpus 0 --shm-size "16gb" --ipc host

steps:
- name: Checkout diffusers

@@ -30,7 +30,7 @@ jobs:
group: aws-g4dn-2xlarge
container:
image: ${{ github.event.inputs.docker_image }}
options: --gpus all --privileged --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
options: --gpus 0 --privileged --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/

steps:
- name: Validate test files input

@@ -31,7 +31,7 @@ jobs:
group: "${{ github.event.inputs.runner_type }}"
container:
image: ${{ github.event.inputs.docker_image }}
options: --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface/diffusers:/mnt/cache/ --gpus all --privileged
options: --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface/diffusers:/mnt/cache/ --gpus 0 --privileged

steps:
- name: Checkout diffusers

@@ -31,7 +31,7 @@ pip install -r requirements.txt
We need to be authenticated to access some of the checkpoints used during benchmarking:

```sh
hf auth login
huggingface-cli login
```

We use an L40 GPU with 128GB RAM to run the benchmark CI. As such, the benchmarks are configured to run on NVIDIA GPUs. So, make sure you have access to a similar machine (or modify the benchmarking scripts accordingly).

@@ -47,10 +47,6 @@ RUN python3.10 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
tensorboard \
transformers \
matplotlib \
setuptools==69.5.1 \
bitsandbytes \
torchao \
gguf \
optimum-quanto
setuptools==69.5.1

CMD ["/bin/bash"]

+179
-210
@@ -1,39 +1,36 @@
- title: Get started
sections:
- sections:
- local: index
title: Diffusers
- local: installation
title: Installation
title: 🧨 Diffusers
- local: quicktour
title: Quicktour
- local: stable_diffusion
title: Effective and efficient diffusion

- title: DiffusionPipeline
isExpanded: false
sections:
- local: using-diffusers/loading
title: Load pipelines
- local: installation
title: Installation
title: Get started
- sections:
- local: tutorials/tutorial_overview
title: Overview
- local: using-diffusers/write_own_pipeline
title: Understanding pipelines, models and schedulers
- local: tutorials/autopipeline
title: AutoPipeline
- local: tutorials/basic_training
title: Train a diffusion model
title: Tutorials
- sections:
- local: using-diffusers/loading
title: Load pipelines
- local: using-diffusers/custom_pipeline_overview
title: Load community pipelines and components
- local: using-diffusers/callback
title: Pipeline callbacks
- local: using-diffusers/reusing_seeds
title: Reproducible pipelines
- local: using-diffusers/schedulers
title: Load schedulers and models
- local: using-diffusers/scheduler_features
title: Scheduler features
- local: using-diffusers/other-formats
title: Model files and layouts
- local: using-diffusers/push_to_hub
title: Push files to the Hub

- title: Adapters
isExpanded: false
sections:
title: Load pipelines and adapters
- sections:
- local: tutorials/using_peft_for_inference
title: LoRA
- local: using-diffusers/ip_adapter
@@ -46,12 +43,25 @@
title: DreamBooth
- local: using-diffusers/textual_inversion_inference
title: Textual inversion

- title: Inference
title: Adapters
isExpanded: false
sections:
- local: using-diffusers/weighted_prompts
title: Prompt techniques
- sections:
- local: using-diffusers/unconditional_image_generation
title: Unconditional image generation
- local: using-diffusers/conditional_image_generation
title: Text-to-image
- local: using-diffusers/img2img
title: Image-to-image
- local: using-diffusers/inpaint
title: Inpainting
- local: using-diffusers/text-img2vid
title: Video generation
- local: using-diffusers/depth2img
title: Depth-to-image
title: Generative tasks
- sections:
- local: using-diffusers/overview_techniques
title: Overview
- local: using-diffusers/create_a_server
title: Create a server
- local: using-diffusers/batched_inference
@@ -66,38 +76,14 @@
title: Reproducible pipelines
- local: using-diffusers/image_quality
title: Controlling image quality

- title: Inference optimization
isExpanded: false
sections:
- local: optimization/fp16
title: Accelerate inference
- local: optimization/cache
title: Caching
- local: optimization/memory
title: Reduce memory usage
- local: optimization/speed-memory-optims
title: Compile and offloading quantized models
- title: Community optimizations
sections:
- local: optimization/pruna
title: Pruna
- local: optimization/xformers
title: xFormers
- local: optimization/tome
title: Token merging
- local: optimization/deepcache
title: DeepCache
- local: optimization/tgate
title: TGATE
- local: optimization/xdit
title: xDiT
- local: optimization/para_attn
title: ParaAttention

- title: Hybrid Inference
isExpanded: false
sections:
- local: using-diffusers/weighted_prompts
title: Prompt techniques
title: Inference techniques
- sections:
- local: advanced_inference/outpaint
title: Outpainting
title: Advanced inference
- sections:
- local: hybrid_inference/overview
title: Overview
- local: hybrid_inference/vae_decode
@@ -106,110 +92,18 @@
title: VAE Encode
- local: hybrid_inference/api_reference
title: API Reference

- title: Modular Diffusers
isExpanded: false
sections:
- local: modular_diffusers/overview
title: Overview
- local: modular_diffusers/modular_pipeline
title: Modular Pipeline
title: Hybrid Inference
- sections:
- local: modular_diffusers/getting_started
title: Getting Started
- local: modular_diffusers/components_manager
title: Components Manager
- local: modular_diffusers/modular_diffusers_states
title: Modular Diffusers States
- local: modular_diffusers/pipeline_block
title: Pipeline Block
- local: modular_diffusers/sequential_pipeline_blocks
title: Sequential Pipeline Blocks
- local: modular_diffusers/loop_sequential_pipeline_blocks
title: Loop Sequential Pipeline Blocks
- local: modular_diffusers/auto_pipeline_blocks
title: Auto Pipeline Blocks
- local: modular_diffusers/write_own_pipeline_block
title: Write your own pipeline block
- local: modular_diffusers/end_to_end_guide
title: End-to-End Example

- title: Training
isExpanded: false
sections:
- local: training/overview
title: Overview
- local: training/create_dataset
title: Create a dataset for training
- local: training/adapt_a_model
title: Adapt a model to a new task
- local: tutorials/basic_training
title: Train a diffusion model
- title: Models
sections:
- local: training/unconditional_training
title: Unconditional image generation
- local: training/text2image
title: Text-to-image
- local: training/sdxl
title: Stable Diffusion XL
- local: training/kandinsky
title: Kandinsky 2.2
- local: training/wuerstchen
title: Wuerstchen
- local: training/controlnet
title: ControlNet
- local: training/t2i_adapters
title: T2I-Adapters
- local: training/instructpix2pix
title: InstructPix2Pix
- local: training/cogvideox
title: CogVideoX
- title: Methods
sections:
- local: training/text_inversion
title: Textual Inversion
- local: training/dreambooth
title: DreamBooth
- local: training/lora
title: LoRA
- local: training/custom_diffusion
title: Custom Diffusion
- local: training/lcm_distill
title: Latent Consistency Distillation
- local: training/ddpo
title: Reinforcement learning training with DDPO

- title: Quantization
isExpanded: false
sections:
- local: quantization/overview
title: Getting started
- local: quantization/bitsandbytes
title: bitsandbytes
- local: quantization/gguf
title: gguf
- local: quantization/torchao
title: torchao
- local: quantization/quanto
title: quanto

- title: Model accelerators and hardware
isExpanded: false
sections:
- local: using-diffusers/stable_diffusion_jax_how_to
title: JAX/Flax
- local: optimization/onnx
title: ONNX
- local: optimization/open_vino
title: OpenVINO
- local: optimization/coreml
title: Core ML
- local: optimization/mps
title: Metal Performance Shaders (MPS)
- local: optimization/habana
title: Intel Gaudi
- local: optimization/neuron
title: AWS Neuron

- title: Specific pipeline examples
isExpanded: false
sections:
title: End-to-End Developer Guide
title: Modular Diffusers
- sections:
- local: using-diffusers/consisid
title: ConsisID
- local: using-diffusers/sdxl
@@ -234,30 +128,106 @@
title: Stable Video Diffusion
- local: using-diffusers/marigold_usage
title: Marigold Computer Vision

- title: Resources
isExpanded: false
sections:
- title: Task recipes
title: Specific pipeline examples
- sections:
- local: training/overview
title: Overview
- local: training/create_dataset
title: Create a dataset for training
- local: training/adapt_a_model
title: Adapt a model to a new task
- isExpanded: false
sections:
- local: using-diffusers/unconditional_image_generation
- local: training/unconditional_training
title: Unconditional image generation
- local: using-diffusers/conditional_image_generation
- local: training/text2image
title: Text-to-image
- local: using-diffusers/img2img
title: Image-to-image
- local: using-diffusers/inpaint
title: Inpainting
- local: advanced_inference/outpaint
title: Outpainting
- local: using-diffusers/text-img2vid
title: Video generation
- local: using-diffusers/depth2img
title: Depth-to-image
- local: using-diffusers/write_own_pipeline
title: Understanding pipelines, models and schedulers
- local: community_projects
title: Projects built with Diffusers
- local: training/sdxl
title: Stable Diffusion XL
- local: training/kandinsky
title: Kandinsky 2.2
- local: training/wuerstchen
title: Wuerstchen
- local: training/controlnet
title: ControlNet
- local: training/t2i_adapters
title: T2I-Adapters
- local: training/instructpix2pix
title: InstructPix2Pix
- local: training/cogvideox
title: CogVideoX
title: Models
- isExpanded: false
sections:
- local: training/text_inversion
title: Textual Inversion
- local: training/dreambooth
title: DreamBooth
- local: training/lora
title: LoRA
- local: training/custom_diffusion
title: Custom Diffusion
- local: training/lcm_distill
title: Latent Consistency Distillation
- local: training/ddpo
title: Reinforcement learning training with DDPO
title: Methods
title: Training
- sections:
- local: quantization/overview
title: Getting Started
- local: quantization/bitsandbytes
title: bitsandbytes
- local: quantization/gguf
title: gguf
- local: quantization/torchao
title: torchao
- local: quantization/quanto
title: quanto
title: Quantization Methods
- sections:
- local: optimization/fp16
title: Accelerate inference
- local: optimization/cache
title: Caching
- local: optimization/memory
title: Reduce memory usage
- local: optimization/speed-memory-optims
title: Compile and offloading quantized models
- local: optimization/pruna
title: Pruna
- local: optimization/xformers
title: xFormers
- local: optimization/tome
title: Token merging
- local: optimization/deepcache
title: DeepCache
- local: optimization/tgate
title: TGATE
- local: optimization/xdit
title: xDiT
- local: optimization/para_attn
title: ParaAttention
- sections:
- local: using-diffusers/stable_diffusion_jax_how_to
title: JAX/Flax
- local: optimization/onnx
title: ONNX
- local: optimization/open_vino
title: OpenVINO
- local: optimization/coreml
title: Core ML
title: Optimized model formats
- sections:
- local: optimization/mps
title: Metal Performance Shaders (MPS)
- local: optimization/habana
title: Intel Gaudi
- local: optimization/neuron
title: AWS Neuron
title: Optimized hardware
title: Accelerate inference and reduce memory
- sections:
- local: conceptual/philosophy
title: Philosophy
- local: using-diffusers/controlling_generation
@@ -268,11 +238,13 @@
title: Diffusers' Ethical Guidelines
- local: conceptual/evaluation
title: Evaluating Diffusion Models

- title: API
isExpanded: false
sections:
- title: Main Classes
title: Conceptual Guides
- sections:
- local: community_projects
title: Projects built with Diffusers
title: Community Projects
- sections:
- isExpanded: false
sections:
- local: api/configuration
title: Configuration
@@ -282,7 +254,8 @@
title: Outputs
- local: api/quantization
title: Quantization
- title: Loaders
title: Main Classes
- isExpanded: false
sections:
- local: api/loaders/ip_adapter
title: IP-Adapter
@@ -298,14 +271,14 @@
title: SD3Transformer2D
- local: api/loaders/peft
title: PEFT
- title: Models
title: Loaders
- isExpanded: false
sections:
- local: api/models/overview
title: Overview
- local: api/models/auto_model
title: AutoModel
- title: ControlNets
sections:
- sections:
- local: api/models/controlnet
title: ControlNetModel
- local: api/models/controlnet_union
@@ -320,8 +293,8 @@
title: SD3ControlNetModel
- local: api/models/controlnet_sparsectrl
title: SparseControlNetModel
- title: Transformers
sections:
title: ControlNets
- sections:
- local: api/models/allegro_transformer3d
title: AllegroTransformer3DModel
- local: api/models/aura_flow_transformer2d
@@ -366,14 +339,10 @@
|
||||
title: PixArtTransformer2DModel
|
||||
- local: api/models/prior_transformer
|
||||
title: PriorTransformer
|
||||
- local: api/models/qwenimage_transformer2d
|
||||
title: QwenImageTransformer2DModel
|
||||
- local: api/models/sana_transformer2d
|
||||
title: SanaTransformer2DModel
|
||||
- local: api/models/sd3_transformer2d
|
||||
title: SD3Transformer2DModel
|
||||
- local: api/models/skyreels_v2_transformer_3d
|
||||
title: SkyReelsV2Transformer3DModel
|
||||
- local: api/models/stable_audio_transformer
|
||||
title: StableAudioDiTModel
|
||||
- local: api/models/transformer2d
|
||||
@@ -382,8 +351,8 @@
|
||||
title: TransformerTemporalModel
|
||||
- local: api/models/wan_transformer_3d
|
||||
title: WanTransformer3DModel
|
||||
- title: UNets
|
||||
sections:
|
||||
title: Transformers
|
||||
- sections:
|
||||
- local: api/models/stable_cascade_unet
|
||||
title: StableCascadeUNet
|
||||
- local: api/models/unet
|
||||
@@ -398,8 +367,8 @@
|
||||
title: UNetMotionModel
|
||||
- local: api/models/uvit2d
|
||||
title: UViT2DModel
|
||||
- title: VAEs
|
||||
sections:
|
||||
title: UNets
|
||||
- sections:
|
||||
- local: api/models/asymmetricautoencoderkl
|
||||
title: AsymmetricAutoencoderKL
|
||||
- local: api/models/autoencoder_dc
|
||||
@@ -420,8 +389,6 @@
|
||||
title: AutoencoderKLMagvit
|
||||
- local: api/models/autoencoderkl_mochi
|
||||
title: AutoencoderKLMochi
|
||||
- local: api/models/autoencoderkl_qwenimage
|
||||
title: AutoencoderKLQwenImage
|
||||
- local: api/models/autoencoder_kl_wan
|
||||
title: AutoencoderKLWan
|
||||
- local: api/models/consistency_decoder_vae
|
||||
@@ -432,7 +399,9 @@
|
||||
title: Tiny AutoEncoder
|
||||
- local: api/models/vq
|
||||
title: VQModel
|
||||
- title: Pipelines
|
||||
title: VAEs
|
||||
title: Models
|
||||
- isExpanded: false
|
||||
sections:
|
||||
- local: api/pipelines/overview
|
||||
title: Overview
|
||||
@@ -558,8 +527,6 @@
|
||||
title: PixArt-α
|
||||
- local: api/pipelines/pixart_sigma
|
||||
title: PixArt-Σ
|
||||
- local: api/pipelines/qwenimage
|
||||
title: QwenImage
|
||||
- local: api/pipelines/sana
|
||||
title: Sana
|
||||
- local: api/pipelines/sana_sprint
|
||||
@@ -570,14 +537,11 @@
|
||||
title: Semantic Guidance
|
||||
- local: api/pipelines/shap_e
|
||||
title: Shap-E
|
||||
- local: api/pipelines/skyreels_v2
|
||||
title: SkyReels-V2
|
||||
- local: api/pipelines/stable_audio
|
||||
title: Stable Audio
|
||||
- local: api/pipelines/stable_cascade
|
||||
title: Stable Cascade
|
||||
- title: Stable Diffusion
|
||||
sections:
|
||||
- sections:
|
||||
- local: api/pipelines/stable_diffusion/overview
|
||||
title: Overview
|
||||
- local: api/pipelines/stable_diffusion/depth2img
|
||||
@@ -614,6 +578,7 @@
|
||||
title: T2I-Adapter
|
||||
- local: api/pipelines/stable_diffusion/text2img
|
||||
title: Text-to-image
|
||||
title: Stable Diffusion
|
||||
- local: api/pipelines/stable_unclip
|
||||
title: Stable unCLIP
|
||||
- local: api/pipelines/text_to_video
|
||||
@@ -632,7 +597,8 @@
|
||||
title: Wan
|
||||
- local: api/pipelines/wuerstchen
|
||||
title: Wuerstchen
|
||||
- title: Schedulers
|
||||
title: Pipelines
|
||||
- isExpanded: false
|
||||
sections:
|
||||
- local: api/schedulers/overview
|
||||
title: Overview
|
||||
@@ -702,7 +668,8 @@
|
||||
title: UniPCMultistepScheduler
|
||||
- local: api/schedulers/vq_diffusion
|
||||
title: VQDiffusionScheduler
|
||||
- title: Internal classes
|
||||
title: Schedulers
|
||||
- isExpanded: false
|
||||
sections:
|
||||
- local: api/internal_classes_overview
|
||||
title: Overview
|
||||
@@ -720,3 +687,5 @@
|
||||
title: VAE Image Processor
|
||||
- local: api/video_processor
|
||||
title: Video Processor
|
||||
title: Internal classes
|
||||
title: API
|
||||
|
||||
@@ -16,7 +16,7 @@ Schedulers from [`~schedulers.scheduling_utils.SchedulerMixin`] and models from
|
||||
|
||||
<Tip>
|
||||
|
||||
To use private or [gated](https://huggingface.co/docs/hub/models-gated#gated-models) models, log-in with `hf auth login`.
|
||||
To use private or [gated](https://huggingface.co/docs/hub/models-gated#gated-models) models, log-in with `huggingface-cli login`.
|
||||
|
||||
</Tip>
|
||||
|
||||
|
||||
@@ -26,11 +26,9 @@ LoRA is a fast and lightweight training method that inserts and trains a signifi
|
||||
- [`HunyuanVideoLoraLoaderMixin`] provides similar functions for [HunyuanVideo](https://huggingface.co/docs/diffusers/main/en/api/pipelines/hunyuan_video).
|
||||
- [`Lumina2LoraLoaderMixin`] provides similar functions for [Lumina2](https://huggingface.co/docs/diffusers/main/en/api/pipelines/lumina2).
|
||||
- [`WanLoraLoaderMixin`] provides similar functions for [Wan](https://huggingface.co/docs/diffusers/main/en/api/pipelines/wan).
|
||||
- [`SkyReelsV2LoraLoaderMixin`] provides similar functions for [SkyReels-V2](https://huggingface.co/docs/diffusers/main/en/api/pipelines/skyreels_v2).
|
||||
- [`CogView4LoraLoaderMixin`] provides similar functions for [CogView4](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cogview4).
|
||||
- [`AmusedLoraLoaderMixin`] is for the [`AmusedPipeline`].
|
||||
- [`HiDreamImageLoraLoaderMixin`] provides similar functions for [HiDream Image](https://huggingface.co/docs/diffusers/main/en/api/pipelines/hidream)
|
||||
- [`QwenImageLoraLoaderMixin`] provides similar functions for [Qwen Image](https://huggingface.co/docs/diffusers/main/en/api/pipelines/qwen)
|
||||
- [`LoraBaseMixin`] provides a base class with several utility methods to fuse, unfuse, and unload LoRAs, and more.
|
||||
|
||||
<Tip>
|
||||
@@ -94,10 +92,6 @@ To learn more about how to load LoRA weights, see the [LoRA](../../using-diffuse
|
||||
|
||||
[[autodoc]] loaders.lora_pipeline.WanLoraLoaderMixin
|
||||
|
||||
## SkyReelsV2LoraLoaderMixin
|
||||
|
||||
[[autodoc]] loaders.lora_pipeline.SkyReelsV2LoraLoaderMixin
|
||||
|
||||
## AmusedLoraLoaderMixin
|
||||
|
||||
[[autodoc]] loaders.lora_pipeline.AmusedLoraLoaderMixin
|
||||
@@ -106,10 +100,6 @@ To learn more about how to load LoRA weights, see the [LoRA](../../using-diffuse
|
||||
|
||||
[[autodoc]] loaders.lora_pipeline.HiDreamImageLoraLoaderMixin
|
||||
|
||||
## QwenImageLoraLoaderMixin
|
||||
## WanLoraLoaderMixin
|
||||
|
||||
[[autodoc]] loaders.lora_pipeline.QwenImageLoraLoaderMixin
|
||||
|
||||
## LoraBaseMixin
|
||||
|
||||
[[autodoc]] loaders.lora_base.LoraBaseMixin
|
||||
[[autodoc]] loaders.lora_pipeline.WanLoraLoaderMixin
|
||||
@@ -1,35 +0,0 @@
|
||||
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License. -->
|
||||
|
||||
# AutoencoderKLQwenImage
|
||||
|
||||
The model can be loaded with the following code snippet.
|
||||
|
||||
```python
|
||||
from diffusers import AutoencoderKLQwenImage
|
||||
|
||||
vae = AutoencoderKLQwenImage.from_pretrained("Qwen/QwenImage-20B", subfolder="vae")
|
||||
```
|
||||
|
||||
## AutoencoderKLQwenImage
|
||||
|
||||
[[autodoc]] AutoencoderKLQwenImage
|
||||
- decode
|
||||
- encode
|
||||
- all
|
||||
|
||||
## AutoencoderKLOutput
|
||||
|
||||
[[autodoc]] models.autoencoders.autoencoder_kl.AutoencoderKLOutput
|
||||
|
||||
## DecoderOutput
|
||||
|
||||
[[autodoc]] models.autoencoders.vae.DecoderOutput
|
||||
@@ -1,28 +0,0 @@
|
||||
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License. -->
|
||||
|
||||
# QwenImageTransformer2DModel
|
||||
|
||||
The model can be loaded with the following code snippet.
|
||||
|
||||
```python
|
||||
from diffusers import QwenImageTransformer2DModel
|
||||
|
||||
transformer = QwenImageTransformer2DModel.from_pretrained("Qwen/QwenImage-20B", subfolder="transformer", torch_dtype=torch.bfloat16)
|
||||
```
|
||||
|
||||
## QwenImageTransformer2DModel
|
||||
|
||||
[[autodoc]] QwenImageTransformer2DModel
|
||||
|
||||
## Transformer2DModelOutput
|
||||
|
||||
[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
|
||||
@@ -1,30 +0,0 @@
|
||||
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License. -->
|
||||
|
||||
# SkyReelsV2Transformer3DModel
|
||||
|
||||
A Diffusion Transformer model for 3D video-like data was introduced in [SkyReels-V2](https://github.com/SkyworkAI/SkyReels-V2) by Skywork AI.
|
||||
|
||||
The model can be loaded with the following code snippet.
|
||||
|
||||
```python
|
||||
from diffusers import SkyReelsV2Transformer3DModel
|
||||
|
||||
transformer = SkyReelsV2Transformer3DModel.from_pretrained("Skywork/SkyReels-V2-DF-1.3B-540P-Diffusers", subfolder="transformer", torch_dtype=torch.bfloat16)
|
||||
```
|
||||
|
||||
## SkyReelsV2Transformer3DModel
|
||||
|
||||
[[autodoc]] SkyReelsV2Transformer3DModel
|
||||
|
||||
## Transformer2DModelOutput
|
||||
|
||||
[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
|
||||
@@ -36,7 +36,7 @@ import torch
|
||||
from diffusers import ChromaPipeline
|
||||
|
||||
pipe = ChromaPipeline.from_pretrained("lodestones/Chroma", torch_dtype=torch.bfloat16)
|
||||
pipe.enable_model_cpu_offload()
|
||||
pipe.enabe_model_cpu_offload()
|
||||
|
||||
prompt = [
|
||||
"A high-fashion close-up portrait of a blonde woman in clear sunglasses. The image uses a bold teal and red color split for dramatic lighting. The background is a simple teal-green. The photo is sharp and well-composed, and is designed for viewing with anaglyph 3D glasses for optimal effect. It looks professionally done."
|
||||
|
||||
@@ -1,92 +0,0 @@
|
||||
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License. -->
|
||||
|
||||
# QwenImage
|
||||
|
||||
Qwen-Image from the Qwen team is an image generation foundation model in the Qwen series that achieves significant advances in complex text rendering and precise image editing. Experiments show strong general capabilities in both image generation and editing, with exceptional performance in text rendering, especially for Chinese.
|
||||
|
||||
Check out the model card [here](https://huggingface.co/Qwen/Qwen-Image) to learn more.
|
||||
|
||||
<Tip>
|
||||
|
||||
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
|
||||
|
||||
</Tip>
|
||||
|
||||
## LoRA for faster inference
|
||||
|
||||
Use a LoRA from `lightx2v/Qwen-Image-Lightning` to speed up inference by reducing the
|
||||
number of steps. Refer to the code snippet below:
|
||||
|
||||
<details>
|
||||
<summary>Code</summary>
|
||||
|
||||
```py
|
||||
from diffusers import DiffusionPipeline, FlowMatchEulerDiscreteScheduler
|
||||
import torch
|
||||
import math
|
||||
|
||||
ckpt_id = "Qwen/Qwen-Image"
|
||||
|
||||
# From
|
||||
# https://github.com/ModelTC/Qwen-Image-Lightning/blob/342260e8f5468d2f24d084ce04f55e101007118b/generate_with_diffusers.py#L82C9-L97C10
|
||||
scheduler_config = {
|
||||
"base_image_seq_len": 256,
|
||||
"base_shift": math.log(3), # We use shift=3 in distillation
|
||||
"invert_sigmas": False,
|
||||
"max_image_seq_len": 8192,
|
||||
"max_shift": math.log(3), # We use shift=3 in distillation
|
||||
"num_train_timesteps": 1000,
|
||||
"shift": 1.0,
|
||||
"shift_terminal": None, # set shift_terminal to None
|
||||
"stochastic_sampling": False,
|
||||
"time_shift_type": "exponential",
|
||||
"use_beta_sigmas": False,
|
||||
"use_dynamic_shifting": True,
|
||||
"use_exponential_sigmas": False,
|
||||
"use_karras_sigmas": False,
|
||||
}
|
||||
scheduler = FlowMatchEulerDiscreteScheduler.from_config(scheduler_config)
|
||||
pipe = DiffusionPipeline.from_pretrained(
|
||||
ckpt_id, scheduler=scheduler, torch_dtype=torch.bfloat16
|
||||
).to("cuda")
|
||||
pipe.load_lora_weights(
|
||||
"lightx2v/Qwen-Image-Lightning", weight_name="Qwen-Image-Lightning-8steps-V1.0.safetensors"
|
||||
)
|
||||
|
||||
prompt = "a tiny astronaut hatching from an egg on the moon, Ultra HD, 4K, cinematic composition."
|
||||
negative_prompt = " "
|
||||
image = pipe(
|
||||
prompt=prompt,
|
||||
negative_prompt=negative_prompt,
|
||||
width=1024,
|
||||
height=1024,
|
||||
num_inference_steps=8,
|
||||
true_cfg_scale=1.0,
|
||||
generator=torch.manual_seed(0),
|
||||
).images[0]
|
||||
image.save("qwen_fewsteps.png")
|
||||
```
|
||||
|
||||
</details>
|
||||
|
||||
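The comments in the scheduler config above mention `shift=3`, while the config itself sets `base_shift` and `max_shift` to `math.log(3)`. The sketch below illustrates why the two agree. It assumes the dynamic-shifting path linearly interpolates a value `mu` between `base_shift` and `max_shift` based on the image sequence length, and that the `"exponential"` time shift has the form `exp(mu) / (exp(mu) + (1 / sigma - 1))`. The helper names `interpolate_mu`, `exponential_time_shift`, and `static_shift`, as well as the example sequence length, are hypothetical and not part of the Diffusers API.

```python
import math


def interpolate_mu(image_seq_len, base_seq_len=256, max_seq_len=8192,
                   base_shift=math.log(3), max_shift=math.log(3)):
    # Assumed linear interpolation of mu between base_shift and max_shift.
    slope = (max_shift - base_shift) / (max_seq_len - base_seq_len)
    intercept = base_shift - slope * base_seq_len
    return image_seq_len * slope + intercept


def exponential_time_shift(mu, sigma):
    # Assumed form of the "exponential" time shift applied to a sigma in (0, 1].
    return math.exp(mu) / (math.exp(mu) + (1.0 / sigma - 1.0))


def static_shift(shift, sigma):
    # Plain (non-dynamic) shift formula, used here only for comparison.
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


# base_shift == max_shift, so mu is log(3) for any sequence length.
mu = interpolate_mu(image_seq_len=4096)  # hypothetical sequence length
for sigma in (0.25, 0.5, 0.75, 1.0):
    assert math.isclose(exponential_time_shift(mu, sigma), static_shift(3.0, sigma))
```

Under these assumptions, setting both shifts to `math.log(3)` makes the dynamic schedule independent of resolution and equivalent to a fixed shift of 3.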
## QwenImagePipeline
|
||||
|
||||
[[autodoc]] QwenImagePipeline
|
||||
- all
|
||||
- __call__
|
||||
|
||||
## QwenImagePipelineOutput
|
||||
|
||||
[[autodoc]] pipelines.qwenimage.pipeline_output.QwenImagePipelineOutput
|
||||
@@ -1,367 +0,0 @@
|
||||
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License. -->
|
||||
|
||||
<div style="float: right;">
|
||||
<div class="flex flex-wrap space-x-1">
|
||||
<a href="https://huggingface.co/docs/diffusers/main/en/tutorials/using_peft_for_inference" target="_blank" rel="noopener">
|
||||
<img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
|
||||
</a>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
# SkyReels-V2: Infinite-length Film Generative model
|
||||
|
||||
[SkyReels-V2](https://huggingface.co/papers/2504.13074) by the SkyReels Team.
|
||||
|
||||
*Recent advances in video generation have been driven by diffusion models and autoregressive frameworks, yet critical challenges persist in harmonizing prompt adherence, visual quality, motion dynamics, and duration: compromises in motion dynamics to enhance temporal visual quality, constrained video duration (5-10 seconds) to prioritize resolution, and inadequate shot-aware generation stemming from general-purpose MLLMs' inability to interpret cinematic grammar, such as shot composition, actor expressions, and camera motions. These intertwined limitations hinder realistic long-form synthesis and professional film-style generation. To address these limitations, we propose SkyReels-V2, an Infinite-length Film Generative Model, that synergizes Multi-modal Large Language Model (MLLM), Multi-stage Pretraining, Reinforcement Learning, and Diffusion Forcing Framework. Firstly, we design a comprehensive structural representation of video that combines the general descriptions by the Multi-modal LLM and the detailed shot language by sub-expert models. Aided with human annotation, we then train a unified Video Captioner, named SkyCaptioner-V1, to efficiently label the video data. Secondly, we establish progressive-resolution pretraining for the fundamental video generation, followed by a four-stage post-training enhancement: Initial concept-balanced Supervised Fine-Tuning (SFT) improves baseline quality; Motion-specific Reinforcement Learning (RL) training with human-annotated and synthetic distortion data addresses dynamic artifacts; Our diffusion forcing framework with non-decreasing noise schedules enables long-video synthesis in an efficient search space; Final high-quality SFT refines visual fidelity. All the code and models are available at [this https URL](https://github.com/SkyworkAI/SkyReels-V2).*
|
||||
|
||||
You can find all the original SkyReels-V2 checkpoints under the [Skywork](https://huggingface.co/collections/Skywork/skyreels-v2-6801b1b93df627d441d0d0d9) organization.
|
||||
|
||||
The following SkyReels-V2 models are supported in Diffusers:
|
||||
- [SkyReels-V2 DF 1.3B - 540P](https://huggingface.co/Skywork/SkyReels-V2-DF-1.3B-540P-Diffusers)
|
||||
- [SkyReels-V2 DF 14B - 540P](https://huggingface.co/Skywork/SkyReels-V2-DF-14B-540P-Diffusers)
|
||||
- [SkyReels-V2 DF 14B - 720P](https://huggingface.co/Skywork/SkyReels-V2-DF-14B-720P-Diffusers)
|
||||
- [SkyReels-V2 T2V 14B - 540P](https://huggingface.co/Skywork/SkyReels-V2-T2V-14B-540P-Diffusers)
|
||||
- [SkyReels-V2 T2V 14B - 720P](https://huggingface.co/Skywork/SkyReels-V2-T2V-14B-720P-Diffusers)
|
||||
- [SkyReels-V2 I2V 1.3B - 540P](https://huggingface.co/Skywork/SkyReels-V2-I2V-1.3B-540P-Diffusers)
|
||||
- [SkyReels-V2 I2V 14B - 540P](https://huggingface.co/Skywork/SkyReels-V2-I2V-14B-540P-Diffusers)
|
||||
- [SkyReels-V2 I2V 14B - 720P](https://huggingface.co/Skywork/SkyReels-V2-I2V-14B-720P-Diffusers)
|
||||
- [SkyReels-V2 FLF2V 1.3B - 540P](https://huggingface.co/Skywork/SkyReels-V2-FLF2V-1.3B-540P-Diffusers)
|
||||
|
||||
> [!TIP]
|
||||
> Click on the SkyReels-V2 models in the right sidebar for more examples of video generation.
|
||||
|
||||
### A _Visual_ Demonstration
|
||||
|
||||
An example with these parameters:
|
||||
base_num_frames=97, num_frames=97, num_inference_steps=30, ar_step=5, causal_block_size=5
|
||||
|
||||
vae_scale_factor_temporal -> 4
|
||||
num_latent_frames: (97-1)//vae_scale_factor_temporal+1 = 25 frames -> 5 blocks of 5 frames each
|
||||
|
||||
base_num_latent_frames = (97-1)//vae_scale_factor_temporal+1 = 25 → blocks = 25//5 = 5 blocks
|
||||
These 5 blocks mean the maximum context length of the model is 25 frames in the latent space.
|
||||
|
||||
Asynchronous Processing Timeline:
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ Steps: 1 6 11 16 21 26 31 36 41 46 50 │
|
||||
│ Block 1: [■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■] │
|
||||
│ Block 2: [■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■] │
|
||||
│ Block 3: [■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■] │
|
||||
│ Block 4: [■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■] │
|
||||
│ Block 5: [■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■] │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
|
||||
For Long Videos (num_frames > base_num_frames):
|
||||
base_num_frames acts as the "sliding window size" for processing long videos.
|
||||
|
||||
Example: 257-frame video with base_num_frames=97, overlap_history=17
|
||||
┌──── Iteration 1 (frames 1-97) ────┐
|
||||
│ Processing window: 97 frames │ → 5 blocks, async processing
|
||||
│ Generates: frames 1-97 │
|
||||
└───────────────────────────────────┘
|
||||
┌────── Iteration 2 (frames 81-177) ──────┐
|
||||
│ Processing window: 97 frames │
|
||||
│ Overlap: 17 frames (81-97) from prev │ → 5 blocks, async processing
|
||||
│ Generates: frames 98-177 │
|
||||
└─────────────────────────────────────────┘
|
||||
┌────── Iteration 3 (frames 161-257) ──────┐
|
||||
│ Processing window: 97 frames │
|
||||
│ Overlap: 17 frames (161-177) from prev │ → 5 blocks, async processing
|
||||
│ Generates: frames 178-257 │
|
||||
└──────────────────────────────────────────┘
|
||||
|
||||
Each iteration independently runs the asynchronous processing with its own 5 blocks.
|
||||
base_num_frames controls:
|
||||
1. Memory usage (larger window = more VRAM)
|
||||
2. Model context length (must match training constraints)
|
||||
3. Number of blocks per iteration (base_num_latent_frames // causal_block_size)
|
||||
|
||||
Each block takes 30 steps to complete denoising.
|
||||
Block N starts at step: 1 + (N-1) x ar_step
|
||||
Total steps: 30 + (5-1) x 5 = 50 steps
|
||||
|
||||
|
||||
Synchronous mode (ar_step=0) would process all blocks/frames simultaneously:
|
||||
┌──────────────────────────────────────────────┐
|
||||
│ Steps: 1 ... 30 │
|
||||
│ All blocks: [■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■] │
|
||||
└──────────────────────────────────────────────┘
|
||||
Total steps: 30 steps
|
||||
|
||||
|
||||
An example of how the step matrix is constructed for asynchronous processing:
|
||||
Given the parameters: (num_inference_steps=30, flow_shift=8, num_frames=97, ar_step=5, causal_block_size=5)
|
||||
- num_latent_frames = (97 frames - 1) // (4 temporal downsampling) + 1 = 25
|
||||
- step_template = [999, 995, 991, 986, 980, 975, 969, 963, 956, 948,
|
||||
941, 932, 922, 912, 901, 888, 874, 859, 841, 822,
|
||||
799, 773, 743, 708, 666, 615, 551, 470, 363, 216]
|
||||
|
||||
The algorithm creates a 50x25 step_matrix where:
|
||||
- Row 1: [999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999]
|
||||
- Row 2: [995, 995, 995, 995, 995, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999]
|
||||
- Row 3: [991, 991, 991, 991, 991, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999]
|
||||
- ...
|
||||
- Row 7: [969, 969, 969, 969, 969, 995, 995, 995, 995, 995, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999, 999]
|
||||
- ...
|
||||
- Row 21: [799, 799, 799, 799, 799, 888, 888, 888, 888, 888, 941, 941, 941, 941, 941, 975, 975, 975, 975, 975, 999, 999, 999, 999, 999]
|
||||
- ...
|
||||
- Row 35: [ 0, 0, 0, 0, 0, 216, 216, 216, 216, 216, 666, 666, 666, 666, 666, 822, 822, 822, 822, 822, 901, 901, 901, 901, 901]
|
||||
- ...
|
||||
- Row 42: [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 551, 551, 551, 551, 551, 773, 773, 773, 773, 773]
|
||||
- ...
|
||||
- Row 50: [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 216, 216, 216, 216, 216]
|
||||
|
||||
Detailed Row 6 Analysis:
|
||||
- step_matrix[5]: [ 975, 975, 975, 975, 975, 999, 999, 999, 999, 999, 999, ..., 999]
|
||||
- step_index[5]: [ 6, 6, 6, 6, 6, 1, 1, 1, 1, 1, 0, ..., 0]
|
||||
- step_update_mask[5]: [True,True,True,True,True,True,True,True,True,True,False, ...,False]
|
||||
- valid_interval[5]: (0, 25)
|
||||
|
||||
Key Pattern: Block i lags behind Block i-1 by exactly ar_step=5 timesteps, creating the
|
||||
staggered "diffusion forcing" effect where later blocks condition on cleaner earlier blocks.
|
||||
|
||||
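The arithmetic in the demonstration above can be reproduced with a few lines of Python. This is a minimal sketch that only mirrors the numbers shown in the text (latent frame count, block count, block start steps, total steps, and the sliding windows for long videos). The function names `latent_frames`, `async_schedule`, and `sliding_windows` are hypothetical, and the pipeline's internal bookkeeping may differ, for example by working on latent frames rather than pixel frames.

```python
def latent_frames(num_frames, vae_scale_factor_temporal=4):
    # (97 - 1) // 4 + 1 = 25 latent frames for a 97-frame clip.
    return (num_frames - 1) // vae_scale_factor_temporal + 1


def async_schedule(num_frames, num_inference_steps, ar_step, causal_block_size,
                   vae_scale_factor_temporal=4):
    n_latent = latent_frames(num_frames, vae_scale_factor_temporal)
    num_blocks = n_latent // causal_block_size
    # Block N starts at step 1 + (N - 1) * ar_step (1-indexed steps).
    block_starts = [1 + i * ar_step for i in range(num_blocks)]
    # Every block needs num_inference_steps; later blocks start ar_step later.
    total_steps = num_inference_steps + (num_blocks - 1) * ar_step
    return n_latent, num_blocks, block_starts, total_steps


def sliding_windows(num_frames, base_num_frames, overlap_history):
    # Each window after the first reuses the last overlap_history frames
    # of the previous window as context.
    if overlap_history >= base_num_frames:
        raise ValueError("overlap_history must be smaller than base_num_frames")
    end = min(base_num_frames, num_frames)
    windows = [(1, end)]
    while end < num_frames:
        start = end - overlap_history + 1
        end = min(start + base_num_frames - 1, num_frames)
        windows.append((start, end))
    return windows


print(async_schedule(97, num_inference_steps=30, ar_step=5, causal_block_size=5))
print(sliding_windows(257, base_num_frames=97, overlap_history=17))
```

With `ar_step=0`, every block starts at step 1 and the total is the plain `num_inference_steps`, which matches the synchronous case described above.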
### Text-to-Video Generation
|
||||
|
||||
The example below demonstrates how to generate a video from text.
|
||||
|
||||
<hfoptions id="T2V usage">
|
||||
<hfoption id="T2V memory">
|
||||
|
||||
Refer to the [Reduce memory usage](../../optimization/memory) guide for more details about the various memory saving techniques.
|
||||
|
||||
From the original repo:
|
||||
>You can use --ar_step 5 to enable asynchronous inference. When asynchronous inference, --causal_block_size 5 is recommended while it is not supposed to be set for synchronous generation... Asynchronous inference will take more steps to diffuse the whole sequence which means it will be SLOWER than synchronous mode. In our experiments, asynchronous inference may improve the instruction following and visual consistent performance.
|
||||
|
||||
```py
|
||||
# pip install ftfy
|
||||
import torch
|
||||
from diffusers import AutoModel, SkyReelsV2DiffusionForcingPipeline, UniPCMultistepScheduler
|
||||
from diffusers.utils import export_to_video
|
||||
|
||||
vae = AutoModel.from_pretrained("Skywork/SkyReels-V2-DF-14B-540P-Diffusers", subfolder="vae", torch_dtype=torch.float32)
|
||||
transformer = AutoModel.from_pretrained("Skywork/SkyReels-V2-DF-14B-540P-Diffusers", subfolder="transformer", torch_dtype=torch.bfloat16)
|
||||
|
||||
pipeline = SkyReelsV2DiffusionForcingPipeline.from_pretrained(
|
||||
"Skywork/SkyReels-V2-DF-14B-540P-Diffusers",
|
||||
vae=vae,
|
||||
transformer=transformer,
|
||||
torch_dtype=torch.bfloat16
|
||||
)
|
||||
flow_shift = 8.0 # 8.0 for T2V, 5.0 for I2V
|
||||
pipeline.scheduler = UniPCMultistepScheduler.from_config(pipeline.scheduler.config, flow_shift=flow_shift)
|
||||
pipeline = pipeline.to("cuda")
|
||||
|
||||
prompt = "A cat and a dog baking a cake together in a kitchen. The cat is carefully measuring flour, while the dog is stirring the batter with a wooden spoon. The kitchen is cozy, with sunlight streaming through the window."
|
||||
|
||||
output = pipeline(
|
||||
prompt=prompt,
|
||||
num_inference_steps=30,
|
||||
height=544, # 720 for 720P
|
||||
width=960, # 1280 for 720P
|
||||
num_frames=97,
|
||||
base_num_frames=97, # 121 for 720P
|
||||
ar_step=5, # Controls asynchronous inference (0 for synchronous mode)
|
||||
causal_block_size=5, # Number of frames in each block for asynchronous processing
|
||||
overlap_history=None, # Number of frames to overlap for smooth transitions in long videos; 17 for long video generations
|
||||
addnoise_condition=20, # Improves consistency in long video generation
|
||||
).frames[0]
|
||||
export_to_video(output, "T2V.mp4", fps=24, quality=8)
|
||||
```
|
||||
|
||||
</hfoption>
|
||||
</hfoptions>
|
||||
|
||||
### First-Last-Frame-to-Video Generation
|
||||
|
||||
The example below demonstrates how to use the image-to-video pipeline to generate a video using a text description, a starting frame, and an ending frame.
|
||||
|
||||
<hfoptions id="FLF2V usage">
|
||||
<hfoption id="usage">
|
||||
|
||||
```python
|
||||
import numpy as np
|
||||
import torch
|
||||
import torchvision.transforms.functional as TF
|
||||
from diffusers import AutoencoderKLWan, SkyReelsV2DiffusionForcingImageToVideoPipeline, UniPCMultistepScheduler
|
||||
from diffusers.utils import export_to_video, load_image
|
||||
|
||||
|
||||
model_id = "Skywork/SkyReels-V2-DF-14B-720P-Diffusers"
|
||||
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
|
||||
pipeline = SkyReelsV2DiffusionForcingImageToVideoPipeline.from_pretrained(
|
||||
model_id, vae=vae, torch_dtype=torch.bfloat16
|
||||
)
|
||||
flow_shift = 5.0 # 8.0 for T2V, 5.0 for I2V
|
||||
pipeline.scheduler = UniPCMultistepScheduler.from_config(pipeline.scheduler.config, flow_shift=flow_shift)
|
||||
pipeline.to("cuda")
|
||||
|
||||
first_frame = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/flf2v_input_first_frame.png")
|
||||
last_frame = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/flf2v_input_last_frame.png")
|
||||
|
||||
def aspect_ratio_resize(image, pipeline, max_area=720 * 1280):
|
||||
aspect_ratio = image.height / image.width
|
||||
mod_value = pipeline.vae_scale_factor_spatial * pipeline.transformer.config.patch_size[1]
|
||||
height = round(np.sqrt(max_area * aspect_ratio)) // mod_value * mod_value
|
||||
width = round(np.sqrt(max_area / aspect_ratio)) // mod_value * mod_value
|
||||
image = image.resize((width, height))
|
||||
return image, height, width
|
||||
|
||||
def center_crop_resize(image, height, width):
|
||||
# Calculate resize ratio to match first frame dimensions
|
||||
resize_ratio = max(width / image.width, height / image.height)
|
||||
|
||||
# Resize the image
|
||||
resized_width = round(image.width * resize_ratio)
|
||||
resized_height = round(image.height * resize_ratio)
|
||||
image = image.resize((resized_width, resized_height))
|
||||
image = TF.center_crop(image, [height, width])  # torchvision expects [height, width]
|
||||
|
||||
return image, height, width
|
||||
|
||||
first_frame, height, width = aspect_ratio_resize(first_frame, pipeline)
|
||||
if last_frame.size != first_frame.size:
|
||||
last_frame, _, _ = center_crop_resize(last_frame, height, width)
|
||||
|
||||
prompt = "CG animation style, a small blue bird takes off from the ground, flapping its wings. The bird's feathers are delicate, with a unique pattern on its chest. The background shows a blue sky with white clouds under bright sunshine. The camera follows the bird upward, capturing its flight and the vastness of the sky from a close-up, low-angle perspective."
|
||||
|
||||
output = pipeline(
|
||||
image=first_frame, last_image=last_frame, prompt=prompt, height=height, width=width, guidance_scale=5.0
|
||||
).frames[0]
|
||||
export_to_video(output, "output.mp4", fps=24, quality=8)
|
||||
```
|
||||
|
||||
</hfoption>
|
||||
</hfoptions>
|
||||
|
||||
|
||||
### Video-to-Video Generation
|
||||
|
||||
<hfoptions id="V2V usage">
|
||||
<hfoption id="usage">
|
||||
|
||||
`SkyReelsV2DiffusionForcingVideoToVideoPipeline` extends a given video.
|
||||
|
||||
```python
|
||||
import numpy as np
|
||||
import torch
|
||||
import torchvision.transforms.functional as TF
|
||||
from diffusers import AutoencoderKLWan, SkyReelsV2DiffusionForcingVideoToVideoPipeline, UniPCMultistepScheduler
|
||||
from diffusers.utils import export_to_video, load_video
|
||||
|
||||
|
||||
model_id = "Skywork/SkyReels-V2-DF-14B-540P-Diffusers"
|
||||
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
|
||||
pipeline = SkyReelsV2DiffusionForcingVideoToVideoPipeline.from_pretrained(
|
||||
model_id, vae=vae, torch_dtype=torch.bfloat16
|
||||
)
|
||||
flow_shift = 5.0 # 8.0 for T2V, 5.0 for I2V
|
||||
pipeline.scheduler = UniPCMultistepScheduler.from_config(pipeline.scheduler.config, flow_shift=flow_shift)
|
||||
pipeline.to("cuda")
|
||||
|
||||
video = load_video("input_video.mp4")
|
||||
|
||||
prompt = "CG animation style, a small blue bird takes off from the ground, flapping its wings. The bird's feathers are delicate, with a unique pattern on its chest. The background shows a blue sky with white clouds under bright sunshine. The camera follows the bird upward, capturing its flight and the vastness of the sky from a close-up, low-angle perspective."
|
||||
|
||||
output = pipeline(
    video=video, prompt=prompt, height=544, width=960, guidance_scale=5.0,
    num_inference_steps=30, num_frames=257, base_num_frames=97,  # ar_step=5, causal_block_size=5,
).frames[0]
|
||||
export_to_video(output, "output.mp4", fps=24, quality=8)
|
||||
# Total frames will be the number of frames of given video + 257
|
||||
```
|
||||
|
||||
</hfoption>
|
||||
</hfoptions>
|
||||
|
||||
|
||||
## Notes
|
||||
|
||||
- SkyReels-V2 supports LoRAs with [`~loaders.SkyReelsV2LoraLoaderMixin.load_lora_weights`].
|
||||
|
||||
<details>
|
||||
<summary>Show example code</summary>
|
||||
|
||||
```py
|
||||
# pip install ftfy
|
||||
import torch
|
||||
from diffusers import AutoModel, SkyReelsV2DiffusionForcingPipeline
|
||||
from diffusers.utils import export_to_video
|
||||
|
||||
vae = AutoModel.from_pretrained(
|
||||
"Skywork/SkyReels-V2-DF-1.3B-540P-Diffusers", subfolder="vae", torch_dtype=torch.float32
|
||||
)
|
||||
pipeline = SkyReelsV2DiffusionForcingPipeline.from_pretrained(
|
||||
"Skywork/SkyReels-V2-DF-1.3B-540P-Diffusers", vae=vae, torch_dtype=torch.bfloat16
|
||||
)
|
||||
pipeline.to("cuda")
|
||||
|
||||
pipeline.load_lora_weights("benjamin-paine/steamboat-willie-1.3b", adapter_name="steamboat-willie")
|
||||
pipeline.set_adapters("steamboat-willie")
|
||||
|
||||
pipeline.enable_model_cpu_offload()
|
||||
|
||||
# use "steamboat willie style" to trigger the LoRA
|
||||
prompt = """
|
||||
steamboat willie style, golden era animation, The camera rushes from far to near in a low-angle shot,
|
||||
revealing a white ferret on a log. It plays, leaps into the water, and emerges, as the camera zooms in
|
||||
for a close-up. Water splashes berry bushes nearby, while moss, snow, and leaves blanket the ground.
|
||||
Birch trees and a light blue sky frame the scene, with ferns in the foreground. Side lighting casts dynamic
|
||||
shadows and warm highlights. Medium composition, front view, low angle, with depth of field.
|
||||
"""
|
||||
|
||||
output = pipeline(
|
||||
prompt=prompt,
|
||||
num_frames=97,
|
||||
guidance_scale=6.0,
|
||||
).frames[0]
|
||||
export_to_video(output, "output.mp4", fps=24)
|
||||
```
|
||||
|
||||
</details>
|
||||
|
||||
|
||||
## SkyReelsV2DiffusionForcingPipeline
|
||||
|
||||
[[autodoc]] SkyReelsV2DiffusionForcingPipeline
|
||||
- all
|
||||
- __call__
|
||||
|
||||
## SkyReelsV2DiffusionForcingImageToVideoPipeline
|
||||
|
||||
[[autodoc]] SkyReelsV2DiffusionForcingImageToVideoPipeline
|
||||
- all
|
||||
- __call__
|
||||
|
||||
## SkyReelsV2DiffusionForcingVideoToVideoPipeline
|
||||
|
||||
[[autodoc]] SkyReelsV2DiffusionForcingVideoToVideoPipeline
|
||||
- all
|
||||
- __call__
|
||||
|
||||
## SkyReelsV2Pipeline
|
||||
|
||||
[[autodoc]] SkyReelsV2Pipeline
|
||||
- all
|
||||
- __call__
|
||||
|
||||
## SkyReelsV2ImageToVideoPipeline
|
||||
|
||||
[[autodoc]] SkyReelsV2ImageToVideoPipeline
|
||||
- all
|
||||
- __call__
|
||||
|
||||
## SkyReelsV2PipelineOutput
|
||||
|
||||
[[autodoc]] pipelines.skyreels_v2.pipeline_output.SkyReelsV2PipelineOutput
|
||||
@@ -31,7 +31,7 @@ _As the model is gated, before using it with diffusers you first need to go to t
|
||||
Use the command below to log in:
|
||||
|
||||
```bash
|
||||
hf auth login
|
||||
huggingface-cli login
|
||||
```
|
||||
|
||||
<Tip>
|
||||
|
||||
@@ -29,7 +29,6 @@
|
||||
You can find all the original Wan2.1 checkpoints under the [Wan-AI](https://huggingface.co/Wan-AI) organization.
|
||||
|
||||
The following Wan models are supported in Diffusers:
|
||||
|
||||
- [Wan 2.1 T2V 1.3B](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B-Diffusers)
|
||||
- [Wan 2.1 T2V 14B](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B-Diffusers)
|
||||
- [Wan 2.1 I2V 14B - 480P](https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-480P-Diffusers)
|
||||
@@ -37,9 +36,6 @@ The following Wan models are supported in Diffusers:
|
||||
- [Wan 2.1 FLF2V 14B - 720P](https://huggingface.co/Wan-AI/Wan2.1-FLF2V-14B-720P-diffusers)
|
||||
- [Wan 2.1 VACE 1.3B](https://huggingface.co/Wan-AI/Wan2.1-VACE-1.3B-diffusers)
|
||||
- [Wan 2.1 VACE 14B](https://huggingface.co/Wan-AI/Wan2.1-VACE-14B-diffusers)
|
||||
- [Wan 2.2 T2V 14B](https://huggingface.co/Wan-AI/Wan2.2-T2V-A14B-Diffusers)
|
||||
- [Wan 2.2 I2V 14B](https://huggingface.co/Wan-AI/Wan2.2-I2V-A14B-Diffusers)
|
||||
- [Wan 2.2 TI2V 5B](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B-Diffusers)
|
||||
|
||||
> [!TIP]
|
||||
> Click on the Wan2.1 models in the right sidebar for more examples of video generation.
|
||||
@@ -331,8 +327,6 @@ The general rule of thumb to keep in mind when preparing inputs for the VACE pip
|
||||
|
||||
- Try lower `shift` values (`2.0` to `5.0`) for lower resolution videos and higher `shift` values (`7.0` to `12.0`) for higher resolution images.
|
||||
|
||||
- Wan 2.1 and 2.2 support using [LightX2V LoRAs](https://huggingface.co/Kijai/WanVideo_comfy/tree/main/Lightx2v) to speed up inference. Using them on Wan 2.2 is slightly more involved. Refer to [this code snippet](https://github.com/huggingface/diffusers/pull/12040#issuecomment-3144185272) to learn more.
|
||||
|
||||
## WanPipeline
|
||||
|
||||
[[autodoc]] WanPipeline
|
||||
|
||||
@@ -27,19 +27,19 @@ Learn how to quantize models in the [Quantization](../quantization/overview) gui
|
||||
|
||||
## BitsAndBytesConfig
|
||||
|
||||
[[autodoc]] quantizers.quantization_config.BitsAndBytesConfig
|
||||
[[autodoc]] BitsAndBytesConfig
|
||||
|
||||
## GGUFQuantizationConfig
|
||||
|
||||
[[autodoc]] quantizers.quantization_config.GGUFQuantizationConfig
|
||||
[[autodoc]] GGUFQuantizationConfig
|
||||
|
||||
## QuantoConfig
|
||||
|
||||
[[autodoc]] quantizers.quantization_config.QuantoConfig
|
||||
[[autodoc]] QuantoConfig
|
||||
|
||||
## TorchAoConfig
|
||||
|
||||
[[autodoc]] quantizers.quantization_config.TorchAoConfig
|
||||
[[autodoc]] TorchAoConfig
|
||||
|
||||
## DiffusersQuantizer
|
||||
|
||||
|
||||
+26
-13
@@ -12,24 +12,37 @@ specific language governing permissions and limitations under the License.
|
||||
|
||||
<p align="center">
|
||||
<br>
|
||||
<img src="https://raw.githubusercontent.com/huggingface/diffusers/77aadfee6a891ab9fcfb780f87c693f7a5beeb8e/docs/source/imgs/diffusers_library.jpg" width="400" style="border: none;"/>
|
||||
<img src="https://raw.githubusercontent.com/huggingface/diffusers/77aadfee6a891ab9fcfb780f87c693f7a5beeb8e/docs/source/imgs/diffusers_library.jpg" width="400"/>
|
||||
<br>
|
||||
</p>
|
||||
|
||||
# Diffusers
|
||||
|
||||
Diffusers is a library of state-of-the-art pretrained diffusion models for generating videos, images, and audio.
|
||||
🤗 Diffusers is the go-to library for state-of-the-art pretrained diffusion models for generating images, audio, and even 3D structures of molecules. Whether you're looking for a simple inference solution or want to train your own diffusion model, 🤗 Diffusers is a modular toolbox that supports both. Our library is designed with a focus on [usability over performance](conceptual/philosophy#usability-over-performance), [simple over easy](conceptual/philosophy#simple-over-easy), and [customizability over abstractions](conceptual/philosophy#tweakable-contributorfriendly-over-abstraction).
|
||||
|
||||
The library revolves around the [`DiffusionPipeline`], an API designed for:
|
||||
The library has three main components:
|
||||
|
||||
- easy inference with only a few lines of code
|
||||
- flexibility to mix-and-match pipeline components (models, schedulers)
|
||||
- loading and using adapters like LoRA
|
||||
- State-of-the-art diffusion pipelines for inference with just a few lines of code. There are many pipelines in 🤗 Diffusers, check out the table in the pipeline [overview](api/pipelines/overview) for a complete list of available pipelines and the task they solve.
|
||||
- Interchangeable [noise schedulers](api/schedulers/overview) for balancing trade-offs between generation speed and quality.
|
||||
- Pretrained [models](api/models) that can be used as building blocks, and combined with schedulers, for creating your own end-to-end diffusion systems.
|
||||
|
||||
Diffusers also comes with optimizations - such as offloading and quantization - to ensure even the largest models are accessible on memory-constrained devices. If memory is not an issue, Diffusers supports torch.compile to boost inference speed.
|
||||
|
||||
Get started right away with a Diffusers model on the [Hub](https://huggingface.co/models?library=diffusers&sort=trending) today!
|
||||
|
||||
## Learn
|
||||
|
||||
If you're a beginner, we recommend starting with the [Hugging Face Diffusion Models Course](https://huggingface.co/learn/diffusion-course/unit0/1). You'll learn the theory behind diffusion models, and learn how to use the Diffusers library to generate images, fine-tune your own models, and more.
|
||||
<div class="mt-10">
|
||||
<div class="w-full flex flex-col space-y-4 md:space-y-0 md:grid md:grid-cols-2 md:gap-y-4 md:gap-x-5">
|
||||
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./tutorials/tutorial_overview"
|
||||
><div class="w-full text-center bg-gradient-to-br from-blue-400 to-blue-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">Tutorials</div>
|
||||
<p class="text-gray-700">Learn the fundamental skills you need to start generating outputs, build your own diffusion system, and train a diffusion model. We recommend starting here if you're using 🤗 Diffusers for the first time!</p>
|
||||
</a>
|
||||
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./using-diffusers/loading_overview"
|
||||
><div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">How-to guides</div>
|
||||
<p class="text-gray-700">Practical guides for helping you load pipelines, models, and schedulers. You'll also learn how to use pipelines for specific tasks, control how outputs are generated, optimize for inference speed, and different training techniques.</p>
|
||||
</a>
|
||||
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./conceptual/philosophy"
|
||||
><div class="w-full text-center bg-gradient-to-br from-pink-400 to-pink-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">Conceptual guides</div>
|
||||
<p class="text-gray-700">Understand why the library was designed the way it was, and learn more about the ethical guidelines and safety implementations for using the library.</p>
|
||||
</a>
|
||||
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./api/models/overview"
|
||||
><div class="w-full text-center bg-gradient-to-br from-purple-400 to-purple-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">Reference</div>
|
||||
<p class="text-gray-700">Technical descriptions of how 🤗 Diffusers classes and methods work.</p>
|
||||
</a>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
+102
-75
@@ -12,156 +12,183 @@ specific language governing permissions and limitations under the License.
|
||||
|
||||
# Installation
|
||||
|
||||
Diffusers is tested on Python 3.8+, PyTorch 1.4+, and Flax 0.4.1+. Follow the installation instructions for the deep learning library you're using, [PyTorch](https://pytorch.org/get-started/locally/) or [Flax](https://flax.readthedocs.io/en/latest/).
|
||||
🤗 Diffusers is tested on Python 3.8+, PyTorch 1.7.0+, and Flax. Follow the installation instructions below for the deep learning library you are using:
|
||||
|
||||
Create a [virtual environment](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/) for easier management of separate projects and to avoid compatibility issues between dependencies. Use [uv](https://docs.astral.sh/uv/), a Rust-based Python package and project manager, to create a virtual environment and install Diffusers.
|
||||
- [PyTorch](https://pytorch.org/get-started/locally/) installation instructions
|
||||
- [Flax](https://flax.readthedocs.io/en/latest/) installation instructions
|
||||
|
||||
## Install with pip
|
||||
|
||||
You should install 🤗 Diffusers in a [virtual environment](https://docs.python.org/3/library/venv.html).
|
||||
If you're unfamiliar with Python virtual environments, take a look at this [guide](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/).
|
||||
A virtual environment makes it easier to manage different projects and avoid compatibility issues between dependencies.
|
||||
|
||||
Create a virtual environment with Python or [uv](https://docs.astral.sh/uv/) (refer to [Installation](https://docs.astral.sh/uv/getting-started/installation/) for installation instructions), a fast Rust-based Python package and project manager.
|
||||
|
||||
<hfoptions id="install">
|
||||
<hfoption id="uv">
|
||||
|
||||
```bash
|
||||
uv venv my-env
|
||||
source my-env/bin/activate
|
||||
```
|
||||
|
||||
Install Diffusers with one of the following methods.
|
||||
|
||||
<hfoptions id="install">
|
||||
<hfoption id="pip">
|
||||
|
||||
PyTorch only supports Python 3.8 - 3.11 on Windows.
|
||||
</hfoption>
|
||||
<hfoption id="Python">
|
||||
|
||||
```bash
|
||||
uv pip install diffusers["torch"] transformers
|
||||
python -m venv my-env
|
||||
source my-env/bin/activate
|
||||
```
|
||||
|
||||
Use the command below for Flax.
|
||||
</hfoption>
|
||||
</hfoptions>
|
||||
|
||||
You should also install 🤗 Transformers because 🤗 Diffusers relies on its models.
|
||||
|
||||
|
||||
<frameworkcontent>
|
||||
<pt>
|
||||
|
||||
PyTorch only supports Python 3.8 - 3.11 on Windows. Install Diffusers with uv.
|
||||
|
||||
```bash
|
||||
uv pip install diffusers["torch"] transformers
|
||||
```
|
||||
|
||||
You can also install Diffusers with pip.
|
||||
|
||||
```bash
|
||||
pip install diffusers["torch"] transformers
|
||||
```
|
||||
|
||||
</pt>
|
||||
<jax>
|
||||
|
||||
Install Diffusers with uv.
|
||||
|
||||
```bash
|
||||
uv pip install diffusers["flax"] transformers
|
||||
```
|
||||
|
||||
</hfoption>
|
||||
<hfoption id="conda">
|
||||
You can also install Diffusers with pip.
|
||||
|
||||
```bash
|
||||
pip install diffusers["flax"] transformers
|
||||
```
|
||||
|
||||
</jax>
|
||||
</frameworkcontent>
|
||||
|
||||
## Install with conda
|
||||
|
||||
After activating your virtual environment, with `conda` (maintained by the community):
|
||||
|
||||
```bash
|
||||
conda install -c conda-forge diffusers
|
||||
```
|
||||
|
||||
</hfoption>
|
||||
<hfoption id="source">
|
||||
## Install from source
|
||||
|
||||
A source install installs the `main` version instead of the latest `stable` version. The `main` version is useful for staying updated with the latest changes but it may not always be stable. If you run into a problem, open an [Issue](https://github.com/huggingface/diffusers/issues/new/choose) and we will try to resolve it as soon as possible.
|
||||
Before installing 🤗 Diffusers from source, make sure you have PyTorch and 🤗 Accelerate installed.
|
||||
|
||||
Make sure [Accelerate](https://huggingface.co/docs/accelerate/index) is installed.
|
||||
To install 🤗 Accelerate:
|
||||
|
||||
```bash
|
||||
uv pip install accelerate
|
||||
pip install accelerate
|
||||
```
|
||||
|
||||
Install Diffusers from source with the command below.
|
||||
Then install 🤗 Diffusers from source:
|
||||
|
||||
```bash
|
||||
uv pip install git+https://github.com/huggingface/diffusers
|
||||
pip install git+https://github.com/huggingface/diffusers
|
||||
```
|
||||
|
||||
</hfoption>
|
||||
</hfoptions>
|
||||
This command installs the bleeding edge `main` version rather than the latest `stable` version.
|
||||
The `main` version is useful for staying up-to-date with the latest developments.
|
||||
For instance, if a bug has been fixed since the last official release but a new release hasn't been rolled out yet.
|
||||
However, this means the `main` version may not always be stable.
|
||||
We strive to keep the `main` version operational, and most issues are usually resolved within a few hours or a day.
|
||||
If you run into a problem, please open an [Issue](https://github.com/huggingface/diffusers/issues/new/choose) so we can fix it even sooner!
|
||||
|
||||
## Editable install
|
||||
|
||||
An editable install is recommended for development workflows or if you're using the `main` version of the source code. A special link is created between the cloned repository and the Python library paths. This avoids reinstalling a package after every change.
|
||||
You will need an editable install if you'd like to:
|
||||
|
||||
Clone the repository and install Diffusers with the following commands.
|
||||
* Use the `main` version of the source code.
|
||||
* Contribute to 🤗 Diffusers and need to test changes in the code.
|
||||
|
||||
<hfoptions id="editable">
|
||||
<hfoption id="PyTorch">
|
||||
Clone the repository and install 🤗 Diffusers with the following commands:
|
||||
|
||||
```bash
|
||||
git clone https://github.com/huggingface/diffusers.git
|
||||
cd diffusers
|
||||
uv pip install -e ".[torch]"
|
||||
```
|
||||
|
||||
</hfoption>
|
||||
<hfoption id="Flax">
|
||||
|
||||
<frameworkcontent>
|
||||
<pt>
|
||||
```bash
|
||||
git clone https://github.com/huggingface/diffusers.git
|
||||
cd diffusers
|
||||
uv pip install -e ".[flax]"
|
||||
pip install -e ".[torch]"
|
||||
```
|
||||
</pt>
|
||||
<jax>
|
||||
```bash
|
||||
pip install -e ".[flax]"
|
||||
```
|
||||
</jax>
|
||||
</frameworkcontent>
|
||||
|
||||
</hfoption>
|
||||
</hfoptions>
|
||||
These commands will link the folder you cloned the repository to and your Python library paths.
|
||||
Python will now look inside the folder you cloned to in addition to the normal library paths.
|
||||
For example, if your Python packages are typically installed in `~/anaconda3/envs/main/lib/python3.10/site-packages/`, Python will also search the `~/diffusers/` folder you cloned to.
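One way to confirm which copy of a package Python will import is to ask the import system for the module's origin. The following is a minimal sketch using only the standard library; `module_origin` is a hypothetical helper name, and `json` stands in for `diffusers` so the snippet runs in any environment.

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file Python would load for top-level module `name`, or None if it cannot be found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


# With an editable install, module_origin("diffusers") is expected to point
# inside the cloned folder (for example ~/diffusers/src/diffusers/__init__.py)
# rather than inside site-packages.
print(module_origin("json"))
```

If the printed path for `diffusers` still points to `site-packages`, the editable install is probably not the one being picked up by the active environment.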
|
||||
|
||||
> [!WARNING]
|
||||
> You must keep the `diffusers` folder if you want to keep using the library with the editable install.
|
||||
<Tip warning={true}>
|
||||
|
||||
Update your cloned repository to the latest version of Diffusers with the command below.
|
||||
You must keep the `diffusers` folder if you want to keep using the library.
|
||||
|
||||
</Tip>
|
||||
|
||||
Now you can easily update your clone to the latest version of 🤗 Diffusers with the following command:
|
||||
|
||||
```bash
|
||||
cd ~/diffusers/
|
||||
git pull
|
||||
```
|
||||
|
||||
Your Python environment will find the `main` version of 🤗 Diffusers on the next run.
|
||||
|
||||
## Cache
|
||||
|
||||
Model weights and files are downloaded from the Hub to a cache, which is usually your home directory. Change the cache location with the [HF_HOME](https://huggingface.co/docs/huggingface_hub/package_reference/environment_variables#hfhome) or [HF_HUB_CACHE](https://huggingface.co/docs/huggingface_hub/package_reference/environment_variables#hfhubcache) environment variables or configuring the `cache_dir` parameter in methods like [`~DiffusionPipeline.from_pretrained`].
|
||||
Model weights and files are downloaded from the Hub to a cache, which is usually in your home directory. You can change the cache location by specifying the `HF_HOME` or `HUGGINGFACE_HUB_CACHE` environment variables or configuring the `cache_dir` parameter in methods like [`~DiffusionPipeline.from_pretrained`].
|
||||
|
||||
<hfoptions id="cache">
|
||||
<hfoption id="env variable">
|
||||
|
||||
```bash
|
||||
export HF_HOME="/path/to/your/cache"
|
||||
export HF_HUB_CACHE="/path/to/your/hub/cache"
|
||||
```
|
||||
|
||||
</hfoption>
|
||||
<hfoption id="from_pretrained">
|
||||
|
||||
```py
|
||||
from diffusers import DiffusionPipeline
|
||||
|
||||
pipeline = DiffusionPipeline.from_pretrained(
|
||||
"black-forest-labs/FLUX.1-dev",
|
||||
cache_dir="/path/to/your/cache"
|
||||
)
|
||||
```
|
||||
|
||||
</hfoption>
|
||||
</hfoptions>
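The options above can be read as a precedence order. The sketch below is a simplified approximation of that order, not the actual lookup code in `huggingface_hub` (which handles additional cases such as `XDG_CACHE_HOME` and legacy variables). `resolve_hub_cache` is a hypothetical helper, and the `~/.cache/huggingface` fallback is assumed from the documented default.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_hub_cache(cache_dir: Optional[str] = None, env: Mapping[str, str] = os.environ) -> Path:
    """Approximate where downloaded files end up, in order of precedence."""
    # 1. An explicit cache_dir argument wins.
    if cache_dir is not None:
        return Path(cache_dir)
    # 2. HF_HUB_CACHE points directly at the hub cache.
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"])
    # 3. Otherwise the hub cache lives in the "hub" folder under HF_HOME.
    hf_home = env.get("HF_HOME") or str(Path.home() / ".cache" / "huggingface")
    return Path(hf_home) / "hub"


print(resolve_hub_cache())
```

Passing a plain dictionary as `env` makes it easy to check how a given combination of variables would resolve without touching the real environment.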
|
||||
|
||||
Cached files allow you to use Diffusers offline. Set the [HF_HUB_OFFLINE](https://huggingface.co/docs/huggingface_hub/package_reference/environment_variables#hfhuboffline) environment variable to `1` to prevent Diffusers from connecting to the internet.
|
||||
Cached files allow you to run 🤗 Diffusers offline. To prevent 🤗 Diffusers from connecting to the internet, set the `HF_HUB_OFFLINE` environment variable to `1` and 🤗 Diffusers will only load previously downloaded files in the cache.
|
||||
|
||||
```shell
|
||||
export HF_HUB_OFFLINE=1
|
||||
```
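The same variable can also be set from Python, for example at the top of a script or notebook. This is a minimal sketch; it assumes the variable is read when `huggingface_hub` is first imported, so it is set before any Hugging Face import.

```python
import os

# Set this before importing diffusers or huggingface_hub,
# since the value is assumed to be read at import time.
os.environ["HF_HUB_OFFLINE"] = "1"

# Imports of diffusers would follow here, e.g.:
# from diffusers import DiffusionPipeline
print(os.environ["HF_HUB_OFFLINE"])
```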
|
||||
|
||||
For more details about managing and cleaning the cache, take a look at the [Understand caching](https://huggingface.co/docs/huggingface_hub/guides/manage-cache) guide.
|
||||
For more details about managing and cleaning the cache, take a look at the [caching](https://huggingface.co/docs/huggingface_hub/guides/manage-cache) guide.
|
||||
|
||||
## Telemetry logging
|
||||
|
||||
Diffusers gathers telemetry information during [`~DiffusionPipeline.from_pretrained`] requests.
|
||||
The data gathered includes the Diffusers and PyTorch/Flax version, the requested model or pipeline class,
|
||||
and the path to a pretrained checkpoint if it is hosted on the Hub.
|
||||
|
||||
Our library gathers telemetry information during [`~DiffusionPipeline.from_pretrained`] requests.
|
||||
The data gathered includes the version of 🤗 Diffusers and PyTorch/Flax, the requested model or pipeline class,
|
||||
and the path to a pretrained checkpoint if it is hosted on the Hugging Face Hub.
|
||||
This usage data helps us debug issues and prioritize new features.
|
||||
Telemetry is only sent when loading models and pipelines from the Hub,
|
||||
and it is not collected if you're loading local files.
|
||||
|
||||
Opt-out and disable telemetry collection with the [HF_HUB_DISABLE_TELEMETRY](https://huggingface.co/docs/huggingface_hub/package_reference/environment_variables#hfhubdisabletelemetry) environment variable.
|
||||
We understand that not everyone wants to share additional information, and we respect your privacy.
|
||||
You can disable telemetry collection by setting the `HF_HUB_DISABLE_TELEMETRY` environment variable from your terminal:
|
||||
|
||||
<hfoptions id="telemetry">
|
||||
<hfoption id="Linux/macOS">
|
||||
On Linux/MacOS:
|
||||
|
||||
```bash
|
||||
export HF_HUB_DISABLE_TELEMETRY=1
|
||||
```
|
||||
|
||||
</hfoption>
|
||||
<hfoption id="Windows">
|
||||
On Windows:
|
||||
|
||||
```bash
|
||||
set HF_HUB_DISABLE_TELEMETRY=1
|
||||
```
|
||||
|
||||
</hfoption>
|
||||
</hfoptions>
|
||||
|
||||
@@ -1,316 +0,0 @@
|
||||
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# AutoPipelineBlocks
|
||||
|
||||
<Tip warning={true}>
|
||||
|
||||
🧪 **Experimental Feature**: Modular Diffusers is an experimental feature we are actively developing. The API may be subject to breaking changes.
|
||||
|
||||
</Tip>
|
||||
|
||||
`AutoPipelineBlocks` is a subclass of `ModularPipelineBlocks`. It is a multi-block that automatically selects which sub-blocks to run based on the inputs provided at runtime, creating conditional workflows that adapt to different scenarios. The main benefits are convenience and portability: developers can package everything into one workflow, making it easier to share and use.
|
||||
|
||||
In this tutorial, we will show you how to create an `AutoPipelineBlocks` and learn more about how the conditional selection works.
|
||||
|
||||
<Tip>
|
||||
|
||||
Other types of multi-blocks include [SequentialPipelineBlocks](sequential_pipeline_blocks.md) (for linear workflows) and [LoopSequentialPipelineBlocks](loop_sequential_pipeline_blocks.md) (for iterative workflows). For information on creating individual blocks, see the [PipelineBlock guide](pipeline_block.md).
|
||||
|
||||
Additionally, like all `ModularPipelineBlocks`, `AutoPipelineBlocks` are definitions/specifications, not runnable pipelines. You need to convert them into a `ModularPipeline` to actually execute them. For information on creating and running pipelines, see the [Modular Pipeline guide](modular_pipeline.md).
|
||||
|
||||
</Tip>
|
||||
|
||||
For example, you might want to support text-to-image and image-to-image tasks. Instead of creating two separate pipelines, you can create an `AutoPipelineBlocks` that automatically chooses the workflow based on whether an `image` input is provided.
|
||||
|
||||
Let's see an example. We'll use the helper function from the [PipelineBlock guide](./pipeline_block.md) to create our blocks:
|
||||
|
||||
**Helper Function**
|
||||
|
||||
```py
from diffusers.modular_pipelines import PipelineBlock, InputParam, OutputParam
import torch

def make_block(inputs=[], intermediate_inputs=[], intermediate_outputs=[], block_fn=None, description=None):
    class TestBlock(PipelineBlock):
        model_name = "test"

        @property
        def inputs(self):
            return inputs

        @property
        def intermediate_inputs(self):
            return intermediate_inputs

        @property
        def intermediate_outputs(self):
            return intermediate_outputs

        @property
        def description(self):
            return description if description is not None else ""

        def __call__(self, components, state):
            block_state = self.get_block_state(state)
            if block_fn is not None:
                block_state = block_fn(block_state, state)
            self.set_block_state(state, block_state)
            return components, state

    return TestBlock
```
|
||||
|
||||
Now let's create a dummy `AutoPipelineBlocks` that includes dummy text-to-image, image-to-image, and inpaint pipelines.
|
||||
|
||||
|
||||
```py
from diffusers.modular_pipelines import AutoPipelineBlocks

# These are dummy blocks and we only focus on "inputs" for our purpose
inputs = [InputParam(name="prompt")]
# block_fn prints out which workflow is running so we can see the execution order at runtime
block_fn = lambda x, y: print("running the text-to-image workflow")
block_t2i_cls = make_block(inputs=inputs, block_fn=block_fn, description="I'm a text-to-image workflow!")

inputs = [InputParam(name="prompt"), InputParam(name="image")]
block_fn = lambda x, y: print("running the image-to-image workflow")
block_i2i_cls = make_block(inputs=inputs, block_fn=block_fn, description="I'm a image-to-image workflow!")

inputs = [InputParam(name="prompt"), InputParam(name="image"), InputParam(name="mask")]
block_fn = lambda x, y: print("running the inpaint workflow")
block_inpaint_cls = make_block(inputs=inputs, block_fn=block_fn, description="I'm a inpaint workflow!")

class AutoImageBlocks(AutoPipelineBlocks):
    # List of sub-block classes to choose from
    block_classes = [block_inpaint_cls, block_i2i_cls, block_t2i_cls]
    # Names for each block in the same order
    block_names = ["inpaint", "img2img", "text2img"]
    # Trigger inputs that determine which block to run
    # - "mask" triggers inpaint workflow
    # - "image" triggers img2img workflow (but only if mask is not provided)
    # - if none of above, runs the text2img workflow (default)
    block_trigger_inputs = ["mask", "image", None]
    # Description is extremely important for AutoPipelineBlocks
    @property
    def description(self):
        return (
            "Pipeline generates images given different types of conditions!\n"
            + "This is an auto pipeline block that works for text2img, img2img and inpainting tasks.\n"
            + " - inpaint workflow is run when `mask` is provided.\n"
            + " - img2img workflow is run when `image` is provided (but only when `mask` is not provided).\n"
            + " - text2img workflow is run when neither `image` nor `mask` is provided.\n"
        )

# Create the blocks
auto_blocks = AutoImageBlocks()
# convert to pipeline
auto_pipeline = auto_blocks.init_pipeline()
```
|
||||
|
||||
Now we have created an `AutoPipelineBlocks` that contains 3 sub-blocks. Notice the warning message at the top - this automatically appears in every `ModularPipelineBlocks` that contains `AutoPipelineBlocks` to remind end users that dynamic block selection happens at runtime.
|
||||
|
||||
```py
|
||||
AutoImageBlocks(
|
||||
Class: AutoPipelineBlocks
|
||||
|
||||
====================================================================================================
|
||||
This pipeline contains blocks that are selected at runtime based on inputs.
|
||||
Trigger Inputs: ['mask', 'image']
|
||||
====================================================================================================
|
||||
|
||||
|
||||
Description: Pipeline generates images given different types of conditions!
|
||||
This is an auto pipeline block that works for text2img, img2img and inpainting tasks.
|
||||
- inpaint workflow is run when `mask` is provided.
|
||||
- img2img workflow is run when `image` is provided (but only when `mask` is not provided).
|
||||
- text2img workflow is run when neither `image` nor `mask` is provided.
|
||||
|
||||
|
||||
|
||||
Sub-Blocks:
|
||||
• inpaint [trigger: mask] (TestBlock)
|
||||
Description: I'm a inpaint workflow!
|
||||
|
||||
• img2img [trigger: image] (TestBlock)
|
||||
Description: I'm a image-to-image workflow!
|
||||
|
||||
• text2img [default] (TestBlock)
|
||||
Description: I'm a text-to-image workflow!
|
||||
|
||||
)
|
||||
```
|
||||
|
||||
Check out the documentation with `print(auto_pipeline.doc)`:
|
||||
|
||||
```py
|
||||
>>> print(auto_pipeline.doc)
|
||||
class AutoImageBlocks
|
||||
|
||||
Pipeline generates images given different types of conditions!
|
||||
This is an auto pipeline block that works for text2img, img2img and inpainting tasks.
|
||||
- inpaint workflow is run when `mask` is provided.
|
||||
- img2img workflow is run when `image` is provided (but only when `mask` is not provided).
|
||||
- text2img workflow is run when neither `image` nor `mask` is provided.
|
||||
|
||||
Inputs:
|
||||
|
||||
prompt (`None`, *optional*):
|
||||
|
||||
image (`None`, *optional*):
|
||||
|
||||
mask (`None`, *optional*):
|
||||
```
|
||||
|
||||
`AutoPipelineBlocks` makes a fundamental trade-off: it gives up clarity for convenience. Packaging multiple workflows together is easy, but the result can be confusing without proper documentation. For example, suppose you are handed a pipeline, told only that it contains 3 sub-blocks and takes 3 inputs (`prompt`, `image`, and `mask`), and asked to run an image-to-image workflow. Without prior knowledge of how these pipelines work, you would have no way of knowing which inputs to pass.
|
||||
|
||||
The pipeline we just made, however, has a docstring that lists all available inputs and workflows and explains which inputs trigger each one. For example, it is clear that you need to pass `image` to run img2img. This is why the `description` field is critical for `AutoPipelineBlocks`. We highly recommend explaining the conditional logic clearly for every `AutoPipelineBlocks` you create, and testing the individual pipelines before packaging them into an `AutoPipelineBlocks`.
|
||||
|
||||
Let's run this auto pipeline with different inputs to see if the conditional logic works as described. Remember that we have added `print` in each `PipelineBlock`'s `__call__` method to print out its workflow name, so it should be easy to tell which one is running:
|
||||
|
||||
```py
|
||||
>>> _ = auto_pipeline(image="image", mask="mask")
|
||||
running the inpaint workflow
|
||||
>>> _ = auto_pipeline(image="image")
|
||||
running the image-to-image workflow
|
||||
>>> _ = auto_pipeline(prompt="prompt")
|
||||
running the text-to-image workflow
|
||||
>>> _ = auto_pipeline(prompt="prompt", mask="mask")
|
||||
running the inpaint workflow
|
||||
```
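The selection rule described in the docstring can be modeled in a few lines of plain Python. This is only a toy sketch, not the diffusers implementation: `select_block` is a hypothetical helper, and it assumes that triggers are checked in list order and that a trigger of `None` marks the default block.

```python
from typing import Optional, Sequence


def select_block(
    block_names: Sequence[str],
    block_trigger_inputs: Sequence[Optional[str]],
    provided: dict,
) -> Optional[str]:
    """Return the first block whose trigger input was provided.

    A trigger of None marks the default block, used when no trigger matches.
    Returns None when nothing matches and there is no default (block skipped).
    """
    default = None
    for name, trigger in zip(block_names, block_trigger_inputs):
        if trigger is None:
            default = name
        elif provided.get(trigger) is not None:
            return name
    return default


# Hypothetical ordering that mirrors the documented behavior.
names = ["inpaint", "img2img", "text2img"]
triggers = ["mask", "image", None]

print(select_block(names, triggers, {"image": "image", "mask": "mask"}))  # inpaint
print(select_block(names, triggers, {"image": "image"}))  # img2img
print(select_block(names, triggers, {"prompt": "prompt"}))  # text2img
```

Because `mask` is listed before `image`, passing both still selects the inpaint workflow, which matches the "but only when `mask` is not provided" note in the description.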
|
||||
|
||||
However, even with documentation, it can become very confusing when AutoPipelineBlocks are combined with other blocks. The complexity grows quickly when you have nested AutoPipelineBlocks or use them as sub-blocks in larger pipelines.
|
||||
|
||||
Let's make another `AutoPipelineBlocks` - this one only contains one block, and it does not include `None` in its `block_trigger_inputs` (which corresponds to the default block to run when none of the trigger inputs are provided). This means this block will be skipped if the trigger input (`ip_adapter_image`) is not provided at runtime.
|
||||
|
||||
```py
|
||||
from diffusers.modular_pipelines import SequentialPipelineBlocks, InsertableDict
|
||||
inputs = [InputParam(name="ip_adapter_image")]
|
||||
block_fn = lambda x, y: print("running the ip-adapter workflow")
|
||||
block_ipa_cls = make_block(inputs=inputs, block_fn=block_fn, description="I'm a IP-adapter workflow!")
|
||||
|
||||
class AutoIPAdapter(AutoPipelineBlocks):
|
||||
block_classes = [block_ipa_cls]
|
||||
block_names = ["ip-adapter"]
|
||||
block_trigger_inputs = ["ip_adapter_image"]
|
||||
@property
|
||||
def description(self):
|
||||
return "Run IP Adapter step if `ip_adapter_image` is provided."
|
||||
```
|
||||
|
||||
Now let's combine these 2 auto blocks together into a `SequentialPipelineBlocks`:
|
||||
|
||||
```py
|
||||
auto_ipa_blocks = AutoIPAdapter()
|
||||
blocks_dict = InsertableDict()
|
||||
blocks_dict["ip-adapter"] = auto_ipa_blocks
|
||||
blocks_dict["image-generation"] = auto_blocks
|
||||
all_blocks = SequentialPipelineBlocks.from_blocks_dict(blocks_dict)
|
||||
pipeline = all_blocks.init_pipeline()
|
||||
```
|
||||
|
||||
Let's take a look: now things get more confusing. In this particular example, you could still try to explain the conditional logic in the `description` field here. There are only 6 possible execution paths (the IP-adapter block either runs or is skipped, and the image-generation block picks one of three workflows), so it's doable. However, since this is a `SequentialPipelineBlocks` that could contain many more blocks, the complexity can quickly get out of hand as the number of blocks increases.
|
||||
|
||||
```py
|
||||
>>> all_blocks
|
||||
SequentialPipelineBlocks(
|
||||
Class: ModularPipelineBlocks
|
||||
|
||||
====================================================================================================
|
||||
This pipeline contains blocks that are selected at runtime based on inputs.
|
||||
Trigger Inputs: ['image', 'mask', 'ip_adapter_image']
|
||||
Use `get_execution_blocks()` with input names to see selected blocks (e.g. `get_execution_blocks('image')`).
|
||||
====================================================================================================
|
||||
|
||||
|
||||
Description:
|
||||
|
||||
|
||||
Sub-Blocks:
|
||||
[0] ip-adapter (AutoIPAdapter)
|
||||
Description: Run IP Adapter step if `ip_adapter_image` is provided.
|
||||
|
||||
|
||||
[1] image-generation (AutoImageBlocks)
|
||||
Description: Pipeline generates images given different types of conditions!
|
||||
This is an auto pipeline block that works for text2img, img2img and inpainting tasks.
|
||||
- inpaint workflow is run when `mask` is provided.
|
||||
- img2img workflow is run when `image` is provided (but only when `mask` is not provided).
|
||||
- text2img workflow is run when neither `image` nor `mask` is provided.
|
||||
|
||||
|
||||
)
|
||||
|
||||
```
|
||||
|
||||
This is where the `get_execution_blocks()` method comes in handy: it extracts a `SequentialPipelineBlocks` that contains only the blocks that actually run for a given set of inputs.
|
||||
|
||||
Let's try some examples:
|
||||
|
||||
`mask`: we expect it to skip the first block (ip-adapter), since `ip_adapter_image` is not provided, and then run the inpaint workflow for the second block.
|
||||
|
||||
```py
|
||||
>>> all_blocks.get_execution_blocks('mask')
|
||||
SequentialPipelineBlocks(
|
||||
Class: ModularPipelineBlocks
|
||||
|
||||
Description:
|
||||
|
||||
|
||||
Sub-Blocks:
|
||||
[0] image-generation (TestBlock)
|
||||
Description: I'm a inpaint workflow!
|
||||
|
||||
)
|
||||
```
|
||||
|
||||
Let's also actually run the pipeline to confirm:
|
||||
|
||||
```py
|
||||
>>> _ = pipeline(mask="mask")
|
||||
skipping auto block: AutoIPAdapter
|
||||
running the inpaint workflow
|
||||
```
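To reason about which blocks run for a given set of inputs, it can help to model the composition in plain Python. The sketch below is a toy model under stated assumptions, not the library's `get_execution_blocks()`: `resolve_execution` and the `AutoBlock` layout are hypothetical, each auto block is an ordered list of `(trigger_input, workflow_name)` pairs, and a trigger of `None` marks the default workflow.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model: ordered (trigger_input, workflow_name) pairs per auto block.
AutoBlock = List[Tuple[Optional[str], str]]


def resolve_execution(sequence: Dict[str, AutoBlock], *trigger_inputs: str) -> List[str]:
    """List the workflows that would run, in order, for the given input names."""
    provided = set(trigger_inputs)
    selected = []
    for block_name, options in sequence.items():
        default = None
        choice = None
        for trigger, workflow in options:
            if trigger is None:
                default = workflow
            elif trigger in provided:
                choice = workflow
                break
        choice = choice if choice is not None else default
        if choice is not None:  # an auto block with no match and no default is skipped
            selected.append(f"{block_name}: {choice}")
    return selected


sequence = {
    "ip-adapter": [("ip_adapter_image", "ip-adapter")],
    "image-generation": [("mask", "inpaint"), ("image", "img2img"), (None, "text2img")],
}

print(resolve_execution(sequence, "mask"))
# ['image-generation: inpaint']
print(resolve_execution(sequence, "mask", "ip_adapter_image"))
# ['ip-adapter: ip-adapter', 'image-generation: inpaint']
```

Inputs that are not trigger inputs, such as `prompt`, do not affect the selection, so the image-generation block falls back to its default.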
|
||||
|
||||
Try a few more:
|
||||
|
||||
```py
|
||||
print(f"inputs: ip_adapter_image:")
|
||||
blocks_select = all_blocks.get_execution_blocks('ip_adapter_image')
|
||||
print(f"expected_execution_blocks: {blocks_select}")
|
||||
print(f"actual execution blocks:")
|
||||
_ = pipeline(ip_adapter_image="ip_adapter_image", prompt="prompt")
|
||||
# expect to see ip-adapter + text2img
|
||||
|
||||
print(f"inputs: image:")
|
||||
blocks_select = all_blocks.get_execution_blocks('image')
|
||||
print(f"expected_execution_blocks: {blocks_select}")
|
||||
print(f"actual execution blocks:")
|
||||
_ = pipeline(image="image", prompt="prompt")
|
||||
# expect to see img2img
|
||||
|
||||
print(f"inputs: prompt:")
|
||||
blocks_select = all_blocks.get_execution_blocks('prompt')
|
||||
print(f"expected_execution_blocks: {blocks_select}")
|
||||
print(f"actual execution blocks:")
|
||||
_ = pipeline(prompt="prompt")
|
||||
# expect to see text2img (prompt is not a trigger input so fallback to default)
|
||||
|
||||
print(f"inputs: mask + ip_adapter_image:")
|
||||
blocks_select = all_blocks.get_execution_blocks('mask','ip_adapter_image')
|
||||
print(f"expected_execution_blocks: {blocks_select}")
|
||||
print(f"actual execution blocks:")
|
||||
_ = pipeline(mask="mask", ip_adapter_image="ip_adapter_image")
|
||||
# expect to see ip-adapter + inpaint
|
||||
```
|
||||
|
||||
In summary, `AutoPipelineBlocks` is a good tool for packaging multiple workflows into a single, convenient interface and it can greatly simplify the user experience. However, always provide clear descriptions explaining the conditional logic, test individual pipelines first before combining them, and use `get_execution_blocks()` to understand runtime behavior in complex compositions.
|
||||
@@ -18,12 +18,12 @@ specific language governing permissions and limitations under the License.
|
||||
|
||||
</Tip>
|
||||
|
||||
The Components Manager is a central model registry and management system in diffusers. It lets you add models and then reuse them across multiple pipelines and workflows. It tracks all models in one place with useful metadata such as model size, device placement and loaded adapters (LoRA, IP-Adapter). It has mechanisms in place to prevent duplicate model instances and enable memory-efficient sharing. Most significantly, it offers offloading that works across pipelines: unlike regular DiffusionPipeline offloading (i.e. `enable_model_cpu_offload` and `enable_sequential_cpu_offload`), which is limited to one pipeline with predefined sequences, the Components Manager automatically manages your device memory across all your models and workflows.
|
||||
The Components Manager is a central model registry and management system in diffusers. It lets you add models and then reuse them across multiple pipelines and workflows. It tracks all models in one place with useful metadata such as model size, device placement and loaded adapters (LoRA, IP-Adapter). It has mechanisms in place to prevent duplicate model instances and enable memory-efficient sharing. Most significantly, it offers offloading that works across pipelines: unlike regular DiffusionPipeline offloading, which is limited to one pipeline with predefined sequences, the Components Manager automatically manages your device memory across all your models and workflows.
|
||||
|
||||
|
||||
## Basic Operations
|
||||
|
||||
Let's start with the most basic operations. First, create a Components Manager:
|
||||
Let's start with the fundamental operations. First, create a Components Manager:
|
||||
|
||||
```py
|
||||
from diffusers import ComponentsManager
|
||||
@@ -144,9 +144,9 @@ Components:
|
||||
======================================================================================================================================================================================================
|
||||
Models:
|
||||
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
||||
Name_ID | Class | Device: act(exec) | Dtype | Size (GB) | Load ID | Collection
|
||||
Name_ID | Class | Device: act(exec) | Dtype | Size (GB) | Load ID | Collection
|
||||
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
||||
text_encoder_139918506246832 | CLIPTextModel | cpu | torch.float32 | 0.46 | stabilityai/stable-diffusion-xl-base-1.0|text_encoder|null|null | N/A
|
||||
text_encoder_139918506246832 | CLIPTextModel | cpu | torch.float32 | 0.46 | stabilityai/stable-diffusion-xl-base-1.0|text_encoder|null|null | N/A
|
||||
text_encoder_duplicated_139917580682672 | CLIPTextModel | cpu | torch.float32 | 0.46 | stabilityai/stable-diffusion-xl-base-1.0|text_encoder|null|null | N/A
|
||||
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
@@ -208,7 +208,7 @@ The `get_one()` method returns a single component and supports pattern matching
|
||||
- exclusion patterns like `comp.get_one(name="!unet")` to exclude components named "unet"
|
||||
- OR patterns like `comp.get_one(name="unet|vae")` to match either "unet" OR "vae".
|
||||
|
||||
Optionally, you can add `collection` and `load_id` as filters, e.g. `comp.get_one(name="unet", collection="sdxl")`. If multiple components match, `get_one()` throws an error.
|
||||
You can also filter by collection with `comp.get_one(name="unet", collection="sdxl")` or by load_id. If multiple components match, `get_one()` throws an error.
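The exclusion and OR patterns can be illustrated with a small stand-alone model. This is a sketch only: `matches` and this `get_one` are hypothetical stand-ins that operate on a plain dictionary, they cover just the `!name` and `a|b` forms, and raising when nothing matches is an assumption that the text above does not confirm.

```python
from typing import Any, Dict


def matches(pattern: str, name: str) -> bool:
    """Toy matcher: '!x' excludes x, 'a|b' matches either name, else exact match."""
    if pattern.startswith("!"):
        return name != pattern[1:]
    return name in pattern.split("|")


def get_one(components: Dict[str, Any], name: str) -> Any:
    """Return the single component whose name matches the pattern."""
    hits = [value for key, value in components.items() if matches(name, key)]
    if len(hits) > 1:
        raise ValueError(f"multiple components match {name!r}")
    if not hits:
        # Assumption: the zero-match behavior is not described in the text.
        raise ValueError(f"no component matches {name!r}")
    return hits[0]


components = {"unet": "UNet", "vae": "VAE"}
print(get_one(components, "unet"))   # UNet
print(get_one(components, "!unet"))  # VAE
```

Calling `get_one(components, "unet|vae")` on this registry raises, because both entries match the OR pattern.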
|
||||
|
||||
Another useful method is `get_components_by_names()`, which takes a list of names and returns a dictionary mapping names to components. This is particularly helpful with modular pipelines since they provide lists of required component names, and the returned dictionary can be directly passed to `pipeline.update_components()`.
|
||||
|
||||
@@ -260,7 +260,7 @@ Now let's load all default components and then create a second pipeline that reu
|
||||
|
||||
```py
|
||||
# Load all default components
|
||||
>>> pipe.load_default_components()
|
||||
>>> pipe.load_default_components()`
|
||||
|
||||
# Create a second pipeline using the same Components Manager but with a different collection
|
||||
>>> pipe2 = ModularPipeline.from_pretrained("YiYiXu/modular-demo-auto", components_manager=comp, collection="test2")
|
||||
@@ -282,7 +282,7 @@ As mentioned earlier, `ModularPipeline` has a property `null_component_names` th
|
||||
|
||||
The warnings that follow are expected and indicate that the Components Manager is correctly identifying that these components already exist and will be reused rather than creating duplicates:
|
||||
|
||||
```out
|
||||
```
|
||||
ComponentsManager: component 'text_encoder' already exists as 'text_encoder_139917586016400'
|
||||
ComponentsManager: component 'text_encoder_2' already exists as 'text_encoder_2_139917699973424'
|
||||
ComponentsManager: component 'tokenizer' already exists as 'tokenizer_139917580599504'
|
||||
@@ -293,7 +293,7 @@ ComponentsManager: component 'vae' already exists as 'vae_139917722459040'
|
||||
ComponentsManager: component 'scheduler' already exists as 'scheduler_139916266559408'
|
||||
ComponentsManager: component 'controlnet' already exists as 'controlnet_139917722454432'
|
||||
```
|
||||
|
||||
```
|
||||
|
||||
The pipeline is now fully loaded:
|
||||
|
||||
@@ -359,9 +359,9 @@ When enabled, all models start on CPU. The manager moves models to the device ri
|
||||
|
||||
Now that we've covered the basics of the Components Manager, let's walk through a practical example that shows how to build workflows in a modular setting and use the Components Manager to reuse components across multiple pipelines. This example demonstrates the true power of Modular Diffusers by working with multiple pipelines that can share components.
|
||||
|
||||
In this example, we'll generate latents from a text-to-image pipeline, then refine them with an image-to-image pipeline.
|
||||
In this example, we'll generate latents from a text-to-image pipeline, then refine them with an image-to-image pipeline. We will also use LoRA and IP-Adapter.
|
||||
|
||||
Let's create a modular text-to-image workflow by separating it into three workflows: `text_blocks` for encoding prompts, `t2i_blocks` for generating latents, and `decoder_blocks` for creating final images.
|
||||
Let's create a modular text-to-image workflow by separating it into three components: `text_blocks` for encoding prompts, `t2i_blocks` for generating latents, and `decoder_blocks` for creating final images.
|
||||
|
||||
```py
|
||||
import torch
|
||||
@@ -374,9 +374,7 @@ text_blocks = t2i_blocks.sub_blocks.pop("text_encoder")
|
||||
decoder_blocks = t2i_blocks.sub_blocks.pop("decode")
|
||||
```
|
||||
|
||||
Now we will convert them into runnable pipelines, set up the Components Manager with auto offloading, and organize components under a "t2i" collection.
|
||||
|
||||
Since we now have 3 different workflows that share components, we create a separate pipeline that serves as a dedicated loader: it loads all the components and registers them with the Components Manager so they can be reused across the different workflows.
|
||||
Now we will convert them into runnable pipelines, set up the Components Manager with auto offloading, and organize components under a "t2i" collection:
|
||||
|
||||
```py
|
||||
from diffusers import ComponentsManager, ModularPipeline
|
||||
@@ -385,21 +383,20 @@ from diffusers import ComponentsManager, ModularPipeline
|
||||
components = ComponentsManager()
|
||||
components.enable_auto_cpu_offload(device="cuda")
|
||||
|
||||
# Create a new pipeline to load the components
|
||||
# Create pipelines and load components
|
||||
t2i_repo = "YiYiXu/modular-demo-auto"
|
||||
t2i_loader_pipe = ModularPipeline.from_pretrained(t2i_repo, components_manager=components, collection="t2i")
|
||||
|
||||
# convert the 3 blocks into pipelines and attach the same components manager to all 3
|
||||
text_node = text_blocks.init_pipeline(t2i_repo, components_manager=components)
|
||||
decoder_node = decoder_blocks.init_pipeline(t2i_repo, components_manager=components)
|
||||
t2i_pipe = t2i_blocks.init_pipeline(t2i_repo, components_manager=components)
|
||||
```
|
||||
|
||||
Load all components into the loader pipeline; they should all be automatically registered to the Components Manager under the "t2i" collection:
|
||||
Load all components into the Components Manager under the "t2i" collection:
|
||||
|
||||
```py
|
||||
# Load all components (including IP-Adapter and ControlNet for later use)
|
||||
t2i_loader_pipe.load_default_components(torch_dtype=torch.float16)
|
||||
t2i_loader_pipe.load_components(names=t2i_loader_pipe.pretrained_component_names, torch_dtype=torch.float16)
|
||||
```
|
||||
|
||||
Now distribute the loaded components to each pipeline:
|
||||
@@ -435,7 +432,7 @@ image.save("modular_part2_t2i.png")
|
||||
Let's add a LoRA:
|
||||
|
||||
```py
|
||||
# Load LoRA weights
|
||||
# Load LoRA weights - only the UNet gets the adapter
|
||||
>>> t2i_loader_pipe.load_lora_weights("CiroN2022/toy-face", weight_name="toy_face_sdxl.safetensors", adapter_name="toy_face")
|
||||
>>> components
|
||||
Components:
|
||||
@@ -467,8 +464,7 @@ refiner_blocks = SequentialPipelineBlocks.from_blocks_dict(ALL_BLOCKS["img2img"]
|
||||
refiner_blocks.sub_blocks.pop("image_encoder")
|
||||
refiner_blocks.sub_blocks.pop("decode")
|
||||
|
||||
# Create refiner pipeline with different repo and collection,
|
||||
# Attach the same component manager to it
|
||||
# Create refiner pipeline with different repo and collection
|
||||
refiner_repo = "YiYiXu/modular_refiner"
|
||||
refiner_pipe = refiner_blocks.init_pipeline(refiner_repo, components_manager=components, collection="refiner")
|
||||
```
|
||||
|
||||
@@ -266,27 +266,27 @@ class SDXLDiffDiffLoopBeforeDenoiser(PipelineBlock):
|
||||
"Step within the denoising loop for differential diffusion that prepare the latent input for the denoiser"
|
||||
)
|
||||
|
||||
+ @property
|
||||
+ def inputs(self) -> List[Tuple[str, Any]]:
|
||||
+ return [
|
||||
+ InputParam("denoising_start"),
|
||||
+ ]
|
||||
@property
|
||||
def inputs(self) -> List[Tuple[str, Any]]:
|
||||
return [
|
||||
InputParam("denoising_start"),
|
||||
]
|
||||
|
||||
@property
|
||||
def intermediate_inputs(self) -> List[str]:
|
||||
return [
|
||||
InputParam("latents", required=True, type_hint=torch.Tensor),
|
||||
+ InputParam("original_latents", type_hint=torch.Tensor),
|
||||
+ InputParam("diffdiff_masks", type_hint=torch.Tensor),
|
||||
InputParam("original_latents", type_hint=torch.Tensor),
|
||||
InputParam("diffdiff_masks", type_hint=torch.Tensor),
|
||||
]
|
||||
|
||||
def __call__(self, components, block_state, i, t):
|
||||
+ # Apply differential diffusion logic
|
||||
+ if i == 0 and block_state.denoising_start is None:
|
||||
+ block_state.latents = block_state.original_latents[:1]
|
||||
+ else:
|
||||
+ block_state.mask = block_state.diffdiff_masks[i].unsqueeze(0).unsqueeze(1)
|
||||
+ block_state.latents = block_state.original_latents[i] * block_state.mask + block_state.latents * (1 - block_state.mask)
|
||||
# Apply differential diffusion logic
|
||||
if i == 0 and block_state.denoising_start is None:
|
||||
block_state.latents = block_state.original_latents[:1]
|
||||
else:
|
||||
block_state.mask = block_state.diffdiff_masks[i].unsqueeze(0).unsqueeze(1)
|
||||
block_state.latents = block_state.original_latents[i] * block_state.mask + block_state.latents * (1 - block_state.mask)
|
||||
|
||||
# ... rest of existing logic ...
|
||||
```
|
||||
@@ -361,9 +361,9 @@ Run the example now, you should see an apple with its right half transformed int
|
||||
|
||||
## Adding IP-adapter
|
||||
|
||||
We provide an auto IP-adapter block that you can plug into your modular workflow. It's an `AutoPipelineBlocks`, so it will only run when the user passes an IP adapter image. In this tutorial, we'll focus on how to package it into your differential diffusion workflow. To learn more about `AutoPipelineBlocks`, see [here](./auto_pipeline_blocks.md).
|
||||
We provide an auto IP-adapter block that you can plug into your modular workflow. It's an `AutoPipelineBlocks`, so it will only run when the user passes an IP adapter image. In this tutorial, we'll focus on how to package it into your differential diffusion workflow. To learn more about `AutoPipelineBlocks`, see [here](https://huggingface.co/docs/diffusers/modular_diffusers/write_own_pipeline_block#autopipelineblocks).
|
||||
|
||||
We talked about how to add IP-adapter to your workflow in the [Modular Pipeline Guide](./modular_pipeline.md). Let's go ahead and create the IP-adapter block.
|
||||
We talked about how to add IP-adapter to your workflow in the [getting-started guide](https://huggingface.co/docs/diffusers/modular_diffusers/quicktour#ip-adapter). Let's go ahead and create the IP-adapter block.
|
||||
|
||||
```py
|
||||
>>> from diffusers.modular_pipelines.stable_diffusion_xl.encoders import StableDiffusionXLAutoIPAdapterStep
|
||||
@@ -496,7 +496,7 @@ From looking at the code workflow: differential diffusion only modifies the "bef
|
||||
|
||||
Intuitively, these two techniques are orthogonal and should combine naturally: differential diffusion controls how much the inference process can deviate from the original in each region, while ControlNet controls in what direction that change occurs.
|
||||
|
||||
With this understanding, let's assemble the diffdiff-controlnet loop by combining the diffdiff before-denoiser step and controlnet denoiser step.
|
||||
With this understanding, let's assemble the `SDXLDiffDiffControlNetDenoiseStep`:
|
||||
|
||||
```py
|
||||
>>> class SDXLDiffDiffControlNetDenoiseStep(StableDiffusionXLDenoiseLoopWrapper):
|
||||
@@ -617,7 +617,7 @@ to use
|
||||
```
|
||||
## Creating a Modular Repo
|
||||
|
||||
You can easily share your differential diffusion workflow on the Hub by creating a modular repo. This is one created using the code we just wrote together: https://huggingface.co/YiYiXu/modular-diffdiff
|
||||
You can easily share your differential diffusion workflow on the Hub by creating a modular repo like this one: https://huggingface.co/YiYiXu/modular-diffdiff
|
||||
|
||||
To create a Modular Repo and share it on the Hub, you just need to run `save_pretrained()` with the `push_to_hub=True` flag. Note that if your pipeline contains custom blocks, you need to manually upload the code to the Hub. We are working on a command-line tool to make uploading easier.
|
||||
|
||||
@@ -641,7 +641,7 @@ With a modular repo, it is very easy for the community to use the workflow you j
|
||||
>>> components.enable_auto_cpu_offload()
|
||||
```
|
||||
|
||||
See more usage examples on the model card.
|
||||
See more usage examples on the model card.
|
||||
|
||||
## Deploy a Mellon Node
|
||||
|
||||
|
||||
+127
-154
@@ -10,7 +10,7 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# ModularPipeline
|
||||
# Getting Started with Modular Diffusers: A Comprehensive Overview
|
||||
|
||||
<Tip warning={true}>
|
||||
|
||||
@@ -18,33 +18,32 @@ specific language governing permissions and limitations under the License.
|
||||
|
||||
</Tip>
|
||||
|
||||
`ModularPipeline` is the main interface for end users to run pipelines in Modular Diffusers. It takes pipeline blocks and converts them into a runnable pipeline that can load models and execute the computation steps.
|
||||
With Modular Diffusers, we introduce a unified pipeline system that simplifies how you work with diffusion models. Instead of creating separate pipelines for each task, Modular Diffusers lets you:
|
||||
|
||||
In this guide, we will focus on how to build pipelines using the blocks we officially support at diffusers 🧨. We'll cover how to use predefined blocks and convert them into a `ModularPipeline` for execution.
|
||||
**Write Only What's New**: You won't need to write an entire pipeline from scratch every time you have a new use case. You can create pipeline blocks just for your new workflow's unique aspects and reuse existing blocks for existing functionalities.
|
||||
|
||||
<Tip>
|
||||
**Assemble Like LEGO®**: You can mix and match between blocks in flexible ways. This allows you to write dedicated blocks unique to specific workflows, and then assemble different blocks into a pipeline that can be used more conveniently for multiple workflows.
|
||||
|
||||
This guide shows you how to use predefined blocks. If you want to learn how to create your own pipeline blocks, see the [PipelineBlock guide](pipeline_block.md) for creating individual blocks, and the multi-block guides for connecting them together:
|
||||
- [SequentialPipelineBlocks](sequential_pipeline_blocks.md) (for linear workflows)
|
||||
- [LoopSequentialPipelineBlocks](loop_sequential_pipeline_blocks.md) (for iterative workflows)
|
||||
- [AutoPipelineBlocks](auto_pipeline_blocks.md) (for conditional workflows)
|
||||
In this guide, we will focus on how to build end-to-end pipelines using blocks we officially support at diffusers 🧨! We will show you how to write your own pipeline blocks and go into more details on how they work under the hood in this [guide](./write_own_pipeline_block.md). For advanced users who want to build complete workflows from scratch, we provide an end-to-end example in the [Developer Guide](./end_to_end.md) that covers everything from writing custom pipeline blocks to deploying your workflow as a UI node.
|
||||
|
||||
For information on how data flows through pipelines, see the [PipelineState and BlockState guide](modular_diffusers_states.md).
|
||||
Let's get started! The Modular Diffusers Framework consists of three main components:
|
||||
- ModularPipelineBlocks: Building blocks for your workflow, each block defines inputs/outputs and computation steps. These are just definitions and not runnable.
|
||||
- PipelineState & BlockState: Store and manage data as it flows through the pipeline.
|
||||
- ModularPipeline: Loads models and runs the computation steps. You convert blocks to pipelines to make them executable.
|
||||
|
||||
</Tip>
|
||||
## ModularPipelineBlocks
|
||||
|
||||
Pipeline blocks are the fundamental building blocks of the Modular Diffusers system. All pipeline blocks inherit from the base class `ModularPipelineBlocks`, including:
|
||||
|
||||
## Create ModularPipelineBlocks
|
||||
|
||||
In the Modular Diffusers system, you build pipelines using pipeline blocks. Pipeline blocks are the fundamental building blocks: they define what components, inputs/outputs, and computation logic are needed. They are designed to be assembled into workflows for tasks such as image generation, video creation, and inpainting. But they are just definitions and don't actually run anything. To execute blocks, you need to put them into a `ModularPipeline`. We'll first learn how to create predefined blocks here before talking about how to run them using `ModularPipeline`.
|
||||
|
||||
All pipeline blocks inherit from the base class `ModularPipelineBlocks`, including:
|
||||
|
||||
- [`PipelineBlock`]: The most granular block - you define the input/output/components requirements and computation logic.
|
||||
- [`PipelineBlock`]: The most granular block - you define the computation logic.
|
||||
- [`SequentialPipelineBlocks`]: A multi-block composed of multiple blocks that run sequentially, passing outputs as inputs to the next block.
|
||||
- [`LoopSequentialPipelineBlocks`]: A special type of `SequentialPipelineBlocks` that runs the same sequence of blocks multiple times (loops), typically used for iterative processes like denoising steps in diffusion models.
|
||||
- [`AutoPipelineBlocks`]: A multi-block composed of multiple blocks that are selected at runtime based on the inputs.
|
||||
|
||||
All blocks have a consistent interface defining their requirements (components, configs, inputs, outputs) and computation logic. They can be defined standalone or combined into larger blocks, and they are designed to be assembled into workflows for tasks such as image generation, video creation, and inpainting. However, blocks aren't runnable on their own; they need to be converted into a `ModularPipeline` to actually run.
|
||||
|
||||
**Blocks vs Pipelines**: Blocks are just definitions. They define what components, inputs/outputs, and computation logic are needed, but they don't actually run anything. To execute blocks, you need to put them into a `ModularPipeline`. See the [ModularPipeline from ModularPipelineBlocks](#modularpipeline-from-modularpipelineblocks) section for how to create and run pipelines.
|
||||
|
||||
It is very easy to use a `ModularPipelineBlocks` officially supported in 🧨 Diffusers:
|
||||
|
||||
```py
|
||||
@@ -75,7 +74,9 @@ StableDiffusionXLTextEncoderStep(
|
||||
)
|
||||
```
|
||||
|
||||
More commonly, you need multiple blocks to build your workflow. You can create a `SequentialPipelineBlocks` using block class presets from 🧨 Diffusers. `TEXT2IMAGE_BLOCKS` is a dict containing all the blocks needed for text-to-image generation.
|
||||
More commonly, you need multiple blocks to build your workflow. You can create a `SequentialPipelineBlocks` using block class presets from 🧨 Diffusers.
|
||||
|
||||
`TEXT2IMAGE_BLOCKS` is a predefined dictionary containing all the blocks needed for a complete text-to-image pipeline (text encoding, denoising, decoding, etc.). We will see more details soon.
|
||||
|
||||
```py
|
||||
from diffusers.modular_pipelines import SequentialPipelineBlocks
|
||||
@@ -83,7 +84,7 @@ from diffusers.modular_pipelines.stable_diffusion_xl import TEXT2IMAGE_BLOCKS
|
||||
t2i_blocks = SequentialPipelineBlocks.from_blocks_dict(TEXT2IMAGE_BLOCKS)
|
||||
```
|
||||
|
||||
This creates a `SequentialPipelineBlocks`. Unlike the `text_encoder_block` we saw earlier, this is a multi-block and its `sub_blocks` attribute contains a list of other blocks (text_encoder, input, set_timesteps, prepare_latents, prepare_added_con, denoise, decode). Its requirements for components, inputs, and intermediate inputs are combined from these blocks that compose it. At runtime, it executes its sub-blocks sequentially and passes the pipeline state from one block to another.
|
||||
This creates a `SequentialPipelineBlocks`, which is a multi-block composed of other blocks. Unlike single blocks (like the `text_encoder_block` we saw earlier), this multi-block has a `sub_blocks` attribute that contains the sub-blocks (text_encoder, input, set_timesteps, prepare_latents, prepare_added_con, denoise, decode). Its requirements for components, inputs, and intermediate inputs are combined from these blocks that compose it. At runtime, it executes its sub-blocks sequentially and passes the pipeline state from one block to another.
|
||||
|
||||
```py
|
||||
>>> t2i_blocks
|
||||
@@ -144,7 +145,7 @@ SequentialPipelineBlocks(
|
||||
)
|
||||
```
|
||||
|
||||
This is the block classes preset (`TEXT2IMAGE_BLOCKS`) we used. It is just a dictionary that maps names to `ModularPipelineBlocks` classes:
|
||||
The block classes preset (`TEXT2IMAGE_BLOCKS`) we used is just a dictionary that maps names to `ModularPipelineBlocks` classes:
|
||||
|
||||
```py
|
||||
>>> TEXT2IMAGE_BLOCKS
|
||||
@@ -178,9 +179,9 @@ Note that both the block classes preset and the `sub_blocks` attribute are `Inse
|
||||
|
||||
**Add a block:**
|
||||
```py
|
||||
# BLOCKS is a dict of block classes, so you add a class to it
|
||||
# BLOCKS is a block class preset, so you add a class to it
|
||||
BLOCKS.insert("block_name", BlockClass, index)
|
||||
# sub_blocks attribute contains instance, add a block instance to the attribute
|
||||
# Add a block instance to the `sub_blocks` attribute
|
||||
t2i_blocks.sub_blocks.insert("block_name", block_instance, index)
|
||||
```
|
||||
|
||||
@@ -196,7 +197,7 @@ text_encoder_block = t2i_blocks.sub_blocks.pop("text_encoder")
|
||||
```py
|
||||
# Replace block class in preset
|
||||
BLOCKS["prepare_latents"] = CustomPrepareLatents
|
||||
# Replace in sub_blocks attribute using a block instance
|
||||
# Replace in sub_blocks attribute
|
||||
t2i_blocks.sub_blocks["prepare_latents"] = CustomPrepareLatents()
|
||||
```
|
||||
|
||||
@@ -208,9 +209,7 @@ Let's make a new block classes preset by insert IP-Adapter at index 0 (before th
|
||||
```py
|
||||
from diffusers.modular_pipelines.stable_diffusion_xl import StableDiffusionXLAutoIPAdapterStep
|
||||
CUSTOM_BLOCKS = TEXT2IMAGE_BLOCKS.copy()
|
||||
# CUSTOM_BLOCKS is now a preset including ip_adapter
|
||||
CUSTOM_BLOCKS.insert("ip_adapter", StableDiffusionXLAutoIPAdapterStep, 0)
|
||||
# create a blocks instance from the preset
|
||||
custom_blocks = SequentialPipelineBlocks.from_blocks_dict(CUSTOM_BLOCKS)
|
||||
```
|
||||
|
||||
@@ -300,16 +299,27 @@ ALL_BLOCKS = {
|
||||
|
||||
</Tip>
|
||||
|
||||
This covers the essentials of pipeline blocks! Like we have already mentioned, **pipeline blocks are not runnable by themselves**. They are essentially **"definitions"** - they define the specifications and computational steps for a pipeline, but they do not contain any model states. To actually run them, you need to convert them into a `ModularPipeline` object.
|
||||
We will not go over how to write your own ModularPipelineBlocks but you can learn more about it [here](./write_own_pipeline_block.md).
|
||||
|
||||
This covers the essentials of pipeline blocks! You may have noticed that we haven't discussed how to load or run pipeline blocks - that's because **pipeline blocks are not runnable by themselves**. They are essentially **"definitions"** - they define the specifications and computational steps for a pipeline, but they do not contain any model states. To actually run them, you need to convert them into a `ModularPipeline` object.
|
||||
|
||||
## Modular Repo
|
||||
## PipelineState & BlockState
|
||||
|
||||
To convert blocks into a runnable pipeline, you may need a repository if your blocks contain **pretrained components** (models with checkpoints that need to be loaded from the Hub). Pipeline blocks define what components they need (like a UNet, text encoder, etc.), as well as how to create them: components can be either created using **from_pretrained** method (with checkpoints) or **from_config** (initialized from scratch with default configuration, usually stateless like a guider or scheduler).
|
||||
`PipelineState` and `BlockState` manage dataflow between pipeline blocks. `PipelineState` acts as the global state container that `ModularPipelineBlocks` operate on - each block gets a local view (`BlockState`) of the relevant variables it needs from `PipelineState`, performs its operations, and then updates `PipelineState` as needed.
|
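The relationship between the global state and a block's local view can be sketched in plain Python. Everything here (`ToyPipelineState`, `get_block_state`, `set_block_state`, and the example block) is a hypothetical simplification for illustration; the real objects carry more structure.

```py
from types import SimpleNamespace


class ToyPipelineState:
    """Hypothetical global container shared by all blocks."""

    def __init__(self, **values):
        self.values = dict(values)


def get_block_state(pipeline_state, input_names):
    # Local view: only the variables this block declared as inputs.
    return SimpleNamespace(**{name: pipeline_state.values.get(name) for name in input_names})


def set_block_state(pipeline_state, block_state, output_names):
    # Write the declared outputs back to the global state.
    for name in output_names:
        pipeline_state.values[name] = getattr(block_state, name)


def scale_latents_block(pipeline_state):
    block_state = get_block_state(pipeline_state, ["latents", "scale"])
    block_state.scaled_latents = [x * block_state.scale for x in block_state.latents]
    set_block_state(pipeline_state, block_state, ["scaled_latents"])
    return pipeline_state


state = ToyPipelineState(latents=[1.0, 2.0], scale=0.5, prompt="unused by this block")
state = scale_latents_block(state)
print(state.values["scaled_latents"])
```

In this sketch the block never sees `prompt`, because it did not ask for it; it only reads its declared inputs and publishes its declared outputs.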
||||
|
||||
If your pipeline contains **pretrained components**, you typically need to use a repository to provide the loading specifications and metadata.
|
||||
<Tip>
|
||||
|
||||
`ModularPipeline` works specifically with modular repositories, which offer more flexibility in component loading compared to traditional repositories. You can find an example modular repo [here](https://huggingface.co/YiYiXu/modular-diffdiff).
|
||||
You typically don't need to manually create or manage these state objects. The `ModularPipeline` automatically creates and manages them for you. However, understanding their roles is important for developing custom pipeline blocks.
|
||||
|
||||
</Tip>
|
||||
|
||||
## ModularPipeline
|
||||
|
||||
`ModularPipeline` is the main interface to create and execute pipelines in the Modular Diffusers system.
|
||||
|
||||
### Modular Repo
|
||||
|
||||
`ModularPipeline` only works with modular repositories. You can find an example modular repo [here](https://huggingface.co/YiYiXu/modular-diffdiff).
|
||||
|
||||
A `DiffusionPipeline` defines `model_index.json` to configure its components. However, repositories for Modular Diffusers work with `modular_model_index.json`. Let's walk through the differences here.
|
||||
|
||||
@@ -328,13 +338,13 @@ In `modular_model_index.json`, each component entry contains 3 elements: `(libra
|
||||
|
||||
```py
|
||||
"text_encoder": [
|
||||
null, # library of actual loaded component (same as in model_index.json)
|
||||
    null,  # class of actual loaded component (same as in model_index.json)
|
||||
null, # library (same as model_index.json)
|
||||
null, # class (same as model_index.json)
|
||||
{ # loading specs map (unique to modular_model_index.json)
|
||||
"repo": "stabilityai/stable-diffusion-xl-base-1.0", # can be a different repo
|
||||
"revision": null,
|
||||
"subfolder": "text_encoder",
|
||||
"type_hint": [ # (library, class) for the expected component
|
||||
"type_hint": [ # (library, class) for the expected component class
|
||||
"transformers",
|
||||
"CLIPTextModel"
|
||||
],
|
||||
@@ -346,61 +356,60 @@ In `modular_model_index.json`, each component entry contains 3 elements: `(libra
|
||||
Unlike standard repositories where components must be in subfolders within the same repo, modular repositories can fetch components from different repositories based on the `loading_specs_dict`. e.g. the `text_encoder` component will be fetched from the "text_encoder" folder in `stabilityai/stable-diffusion-xl-base-1.0` while other components come from different repositories.
|
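As a rough illustration of the `(library, class, loading_specs_dict)` layout, the sketch below parses one entry with the standard `json` module and reports where the component would come from. The helper `describe_entry` is hypothetical, and the entry only reuses the fields shown in the example above.

```py
import json

entry_json = """
{
  "text_encoder": [
    null,
    null,
    {
      "repo": "stabilityai/stable-diffusion-xl-base-1.0",
      "revision": null,
      "subfolder": "text_encoder",
      "type_hint": ["transformers", "CLIPTextModel"]
    }
  ]
}
"""


def describe_entry(name, entry):
    """Hypothetical helper: summarize one modular_model_index.json entry."""
    library, class_name, specs = entry
    hint_library, hint_class = specs["type_hint"]
    return {
        "name": name,
        # Both leading elements are null until the component is actually loaded.
        "loaded": library is not None and class_name is not None,
        "source": f"{specs['repo']}/{specs['subfolder']}",
        "expected_class": f"{hint_library}.{hint_class}",
    }


index = json.loads(entry_json)
summary = describe_entry("text_encoder", index["text_encoder"])
print(summary)
```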
||||
|
||||
|
||||
## Creating a `ModularPipeline` from `ModularPipelineBlocks`
|
||||
### Creating a `ModularPipeline` from `ModularPipelineBlocks`
|
||||
|
||||
Each `ModularPipelineBlocks` has an `init_pipeline` method that can initialize a `ModularPipeline` object based on its component and configuration specifications.
|
||||
|
||||
Let's convert our `t2i_blocks` (which we created earlier) into a runnable `ModularPipeline`. We'll use a `ComponentsManager` to handle device placement, memory management, and component reuse automatically:
|
||||
Let's convert our `t2i_blocks` (which we created earlier) into a runnable `ModularPipeline`:
|
||||
|
||||
```py
|
||||
# We already have this from earlier
|
||||
t2i_blocks = SequentialPipelineBlocks.from_blocks_dict(TEXT2IMAGE_BLOCKS)
|
||||
|
||||
# Now convert it to a ModularPipeline
|
||||
from diffusers import ComponentsManager
|
||||
modular_repo_id = "YiYiXu/modular-loader-t2i-0704"
|
||||
components = ComponentsManager()
|
||||
t2i_pipeline = t2i_blocks.init_pipeline(modular_repo_id, components_manager=components)
|
||||
t2i_pipeline = t2i_blocks.init_pipeline(modular_repo_id)
|
||||
```
|
||||
|
||||
<Tip>
|
||||
|
||||
💡 **ComponentsManager** is the model registry and management system in diffusers. It tracks all the models in one place and lets you add, remove, and reuse them efficiently across different workflows. Without it, you'd need to manually manage GPU memory, device placement, and component sharing between workflows. See the [Components Manager guide](components_manager.md) for detailed information.
|
||||
|
||||
</Tip>
|
||||
|
||||
The `init_pipeline()` method creates a ModularPipeline and loads component specifications from the repository's `modular_model_index.json` file, but doesn't load the actual models yet.
|
||||
|
||||
<Tip>
|
||||
|
||||
## Creating a `ModularPipeline` with `from_pretrained`
|
||||
💡 We recommend using `ModularPipeline` with Component Manager by passing a `components_manager`:
|
||||
|
||||
```py
|
||||
>>> components = ComponentsManager()
|
||||
>>> pipeline = blocks.init_pipeline(modular_repo_id, components_manager=components)
|
||||
```
|
||||
|
||||
This helps you to:
|
||||
1. Detect and manage duplicated models (warns when trying to register an existing model)
|
||||
2. Easily reuse components across different pipelines
|
||||
3. Apply offloading strategies across multiple pipelines
|
||||
|
||||
You can read more about [Components Manager](./components_manager.md)
|
||||
|
||||
</Tip>
|
||||
|
||||
|
||||
### Creating a `ModularPipeline` with `from_pretrained`
|
||||
|
||||
You can create a `ModularPipeline` from a HuggingFace Hub repository with `from_pretrained` method, as long as it's a modular repo:
|
||||
|
||||
```py
|
||||
from diffusers import ModularPipeline, ComponentsManager
|
||||
components = ComponentsManager()
|
||||
pipeline = ModularPipeline.from_pretrained("YiYiXu/modular-loader-t2i-0704", components_manager=components)
|
||||
from diffusers import ModularPipeline
|
||||
pipeline = ModularPipeline.from_pretrained("YiYiXu/modular-loader-t2i-0704")
|
||||
```
|
||||
|
||||
Loading custom code is also supported:
|
||||
|
||||
```py
|
||||
from diffusers import ModularPipeline, ComponentsManager
|
||||
components = ComponentsManager()
|
||||
from diffusers import ModularPipeline
|
||||
modular_repo_id = "YiYiXu/modular-diffdiff-0704"
|
||||
diffdiff_pipeline = ModularPipeline.from_pretrained(modular_repo_id, trust_remote_code=True, components_manager=components)
|
||||
diffdiff_pipeline = ModularPipeline.from_pretrained(modular_repo_id, trust_remote_code=True)
|
||||
```
|
||||
|
||||
This modular repository contains custom code. The folder contains these files:
|
||||
|
||||
```
|
||||
modular-diffdiff-0704/
|
||||
├── block.py # Custom pipeline blocks implementation
|
||||
├── config.json # Pipeline configuration and auto_map
|
||||
└── modular_model_index.json # Component loading specifications
|
||||
```
|
||||
|
||||
The [`config.json`](https://huggingface.co/YiYiXu/modular-diffdiff-0704/blob/main/config.json) file defines a custom `DiffDiffBlocks` class and points to its implementation:
|
||||
This modular repository contains custom code. The [`config.json`](https://huggingface.co/YiYiXu/modular-diffdiff-0704/blob/main/config.json) file defines a custom `DiffDiffBlocks` class and points to its implementation:
|
||||
|
||||
```json
|
||||
{
|
||||
@@ -415,7 +424,7 @@ The `auto_map` tells the pipeline where to find the custom blocks definition - i
|
||||
|
||||
When `diffdiff_pipeline.blocks` is created, it's based on the `DiffDiffBlocks` definition from the custom code in the repository, allowing you to use specialized blocks that aren't part of the standard diffusers library.
|
||||
|
||||
## Loading components into a `ModularPipeline`
|
||||
### Loading components into a `ModularPipeline`
|
||||
|
||||
Unlike `DiffusionPipeline`, when you create a `ModularPipeline` instance (whether using `from_pretrained` or converting from pipeline blocks), its components aren't loaded automatically. You need to explicitly load model components using `load_default_components` or `load_components(names=..,)`:
|
||||
|
||||
@@ -542,7 +551,7 @@ StableDiffusionXLModularPipeline {
|
||||
}
|
||||
```
|
||||
|
||||
All the **pretrained components** that will be loaded with the `from_pretrained` method are listed as entries. Each entry contains 3 elements: `(library, class, loading_specs_dict)`:
|
||||
All the components that will be loaded with the `from_pretrained` method are listed as entries. Each entry contains 3 elements: `(library, class, loading_specs_dict)`:
|
||||
|
||||
- **`library` and `class`**: Show the actual loaded component info. If `null`, the component is not loaded yet.
|
||||
- **`loading_specs_dict`**: Contains all the information needed to load the component (repo, subfolder, variant, etc.)
|
||||
@@ -575,11 +584,9 @@ There are also a few properties that can provide a quick summary of component lo
|
||||
['guider', 'image_processor']
|
||||
```
|
||||
|
||||
From config components (like `guider` and `image_processor`) are not included in the pipeline output above because they don't need loading specs - they're already initialized during pipeline creation. You can see this because they're not listed in `null_component_names`.
|
||||
### Modifying Loading Specs
|
||||
|
||||
## Modifying Loading Specs
|
||||
|
||||
When you call `pipeline.load_components(names=)` or `pipeline.load_default_components()`, it uses the loading specs from the modular repository's `modular_model_index.json`. You can change where components are loaded from by modifying the `modular_model_index.json` in the repository. Just find the file on the Hub and click edit - you can change any field in the loading specs: `repo`, `subfolder`, `variant`, `revision`, etc.
|
||||
When you call `pipeline.load_components(names=)` or `pipeline.load_default_components()`, it uses the loading specs from the modular repository's `modular_model_index.json`. You can change where components are loaded from by default by modifying the `modular_model_index.json` in the repository. You can change any field in the loading specs: `repo`, `subfolder`, `variant`, `revision`, etc.
|
||||
|
||||
```py
|
||||
# Original spec in modular_model_index.json
|
||||
@@ -603,31 +610,18 @@ When you call `pipeline.load_components(names=)` or `pipeline.load_default_compo
|
||||
]
|
||||
```
|
||||
|
||||
Now if you create a pipeline using the same blocks and updated repository, it will by default load from the new repository.
|
||||
|
||||
```py
|
||||
pipeline = ModularPipeline.from_pretrained("YiYiXu/modular-loader-t2i-0704", components_manager=components)
|
||||
pipeline.load_components(names="unet")
|
||||
```
|
||||
When you call `pipeline.load_components(...)`/`pipeline.load_default_components()`, it will now load from the new repository by default.
|
||||
|
||||
|
||||
## Updating components in a `ModularPipeline`
|
||||
### Updating components in a `ModularPipeline`
|
||||
|
||||
Similar to `DiffusionPipeline`, you can load components separately to replace the default ones in the pipeline. In Modular Diffusers, the approach depends on the component type:
|
||||
|
||||
- **Pretrained components** (`default_creation_method='from_pretrained'`): Must be loaded with `ComponentSpec` in order to update the existing one.
|
||||
- **Config components** (`default_creation_method='from_config'`): These are components that don't need loading specs - they're created during pipeline initialization with default config. To update them, you can either pass the object directly or pass a ComponentSpec directly.
|
||||
|
||||
<Tip>
|
||||
|
||||
💡 **Component Type Changes**: The component type (pretrained vs config-based) can change when you update components. These types are initially defined in pipeline blocks' `expected_components` field using `ComponentSpec` with `default_creation_method`. See the [Customizing Guidance Techniques](#customizing-guidance-techniques) section for examples of how this works in practice.
|
||||
|
||||
</Tip>
|
||||
- **Pretrained components** (`default_creation_method='from_pretrained'`): Must use `ComponentSpec` to load them, as they get tagged with a unique ID that encodes their loading parameters
|
||||
- **Config components** (`default_creation_method='from_config'`): These are components that don't need loading specs - they're created during pipeline initialization with default config. To update them, you can either pass the object directly or pass a ComponentSpec directly (which will call `create()` under the hood).
|
||||
|
||||
`ComponentSpec` defines how to create or load components and can actually create them using its `create()` method (for ConfigMixin objects) or `load()` method (wrapper around `from_pretrained()`). When a component is loaded with a ComponentSpec, it gets tagged with a unique ID that encodes its creation parameters, allowing you to always extract the original specification using `ComponentSpec.from_component()`.
|
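The idea of tagging a loaded component so its spec can be recovered later can be sketched as follows. This is a toy model only: `ToySpec`, `ToyModel`, the `toy_load_id` attribute, and the JSON encoding of the ID are all assumptions made for illustration and do not reflect how diffusers encodes its IDs.

```py
import json
from dataclasses import asdict, dataclass
from typing import Optional


class ToyModel:
    """Placeholder for a loaded model object."""


@dataclass(frozen=True)
class ToySpec:
    name: str
    repo: str
    subfolder: Optional[str] = None
    variant: Optional[str] = None

    def load(self):
        # Pretend to load, then tag the object with its loading parameters.
        component = ToyModel()
        params = asdict(self)
        params.pop("name")
        component.toy_load_id = json.dumps(params, sort_keys=True)
        return component

    @classmethod
    def from_component(cls, name, component):
        load_id = getattr(component, "toy_load_id", None)
        if load_id is None:
            raise ValueError(f"{name} was not loaded from a spec, so no spec can be recovered")
        return cls(name=name, **json.loads(load_id))


unet_spec = ToySpec(
    name="unet",
    repo="stabilityai/stable-diffusion-xl-base-1.0",
    subfolder="unet",
    variant="fp16",
)
toy_unet = unet_spec.load()                                  # spec -> component
recovered_spec = ToySpec.from_component("unet", toy_unet)    # component -> spec
print(recovered_spec == unet_spec)
```

An object created without the spec (for example `ToyModel()` directly) carries no tag, which is why the sketch raises an error for it; this mirrors why pretrained components need to go through a spec before they can update a pipeline.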
||||
|
||||
Now let's look at how to update pretrained components in practice:
|
||||
|
||||
So instead of
|
||||
|
||||
```py
|
||||
@@ -635,7 +629,7 @@ from diffusers import UNet2DConditionModel
|
||||
import torch
|
||||
unet = UNet2DConditionModel.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", subfolder="unet", variant="fp16", torch_dtype=torch.float16)
|
||||
```
|
||||
You should load your model like this
|
||||
You should do
|
||||
|
||||
```py
|
||||
from diffusers import ComponentSpec, UNet2DConditionModel
|
||||
@@ -643,15 +637,13 @@ unet_spec = ComponentSpec(name="unet",type_hint=UNet2DConditionModel, repo="stab
|
||||
unet2 = unet_spec.load(torch_dtype=torch.float16)
|
||||
```
|
||||
|
||||
The key difference is that the second unet retains its loading specs, so you can extract the spec and recreate the unet:
|
||||
The key difference is that the second unet (the one we load with `ComponentSpec`) retains its loading specs, so you can extract and recreate it:
|
||||
|
||||
```py
|
||||
# component -> spec
|
||||
# to extract spec, you can do spec.load() to recreate it
|
||||
>>> spec = ComponentSpec.from_component("unet", unet2)
|
||||
>>> spec
|
||||
ComponentSpec(name='unet', type_hint=<class 'diffusers.models.unets.unet_2d_condition.UNet2DConditionModel'>, description=None, config=None, repo='stabilityai/stable-diffusion-xl-base-1.0', subfolder='unet', variant='fp16', revision=None, default_creation_method='from_pretrained')
|
||||
# spec -> component
|
||||
>>> unet2_recreated = spec.load(torch_dtype=torch.float16)
|
||||
```
|
||||
|
||||
To replace the unet in the pipeline
|
||||
@@ -660,7 +652,7 @@ To replace the unet in the pipeline
|
||||
t2i_pipeline.update_components(unet=unet2)
|
||||
```
|
||||
|
||||
Not only is the `unet` component swapped, but its loading specs are also updated from "RunDiffusion/Juggernaut-XL-v9" to "stabilityai/stable-diffusion-xl-base-1.0" in pipeline config. This means that if you save the pipeline now and load it back with `from_pretrained`, the new pipeline will by default load the SDXL original unet.
|
||||
Not only is the `unet` component swapped, but its loading specs are also updated from "RunDiffusion/Juggernaut-XL-v9" to "stabilityai/stable-diffusion-xl-base-1.0". This means that if you save the pipeline now and load it back with `from_pretrained`, the new pipeline will by default load the SDXL original unet.
|
||||
|
||||
```
|
||||
>>> t2i_pipeline
|
||||
@@ -708,7 +700,7 @@ ComponentSpec(
|
||||
|
||||
</Tip>
|
||||
|
||||
## Customizing Guidance Techniques
|
||||
### Customizing Guidance Techniques
|
||||
|
||||
Guiders are implementations of different [classifier-free guidance](https://huggingface.co/papers/2207.12598) techniques that can be applied during the denoising process to improve generation quality, control, and adherence to prompts. They work by steering the model predictions towards desired directions and away from undesired directions. In diffusers, guiders are implemented as subclasses of `BaseGuidance`. They can easily be integrated into modular pipelines and provide a flexible way to enhance generation quality without modifying the underlying diffusion models.
|
||||
|
||||
@@ -745,9 +737,6 @@ ClassifierFreeGuidance {
|
||||
To change parameters of the same guider type (e.g., adjusting the `guidance_scale` for CFG), you have two options:
|
||||
|
||||
**Option 1: Use ComponentSpec.create() method**
|
||||
|
||||
You just need to pass the parameter with the new value to override the default one.
|
||||
|
||||
```python
|
||||
>>> guider_spec = t2i_pipeline.get_component_spec("guider")
|
||||
>>> guider = guider_spec.create(guidance_scale=10)
|
||||
@@ -755,9 +744,6 @@ You just need to pass the parameter with the new value to override the default o
|
||||
```
|
||||
|
||||
**Option 2: Pass ComponentSpec directly**
|
||||
|
||||
Update the spec directly and pass it to `update_components()`.
|
||||
|
||||
```python
|
||||
>>> guider_spec = t2i_pipeline.get_component_spec("guider")
|
||||
>>> guider_spec.config["guidance_scale"] = 10
|
||||
@@ -799,6 +785,7 @@ ModularPipeline.update_components: adding guider with new type: PerturbedAttenti
|
||||
|
||||
<Tip>
|
||||
|
||||
💡 **Component Loading Methods**:
|
||||
- For `from_config` components (like guiders, schedulers): You can pass an object of required type OR pass a ComponentSpec directly (which calls `create()` under the hood)
|
||||
- For `from_pretrained` components (like models): You must use ComponentSpec to ensure proper tagging and loading
|
||||
|
||||
@@ -839,68 +826,24 @@ The component spec has also been updated to reflect the new guider type:
|
||||
|
||||
```py
|
||||
>>> t2i_pipeline.get_component_spec("guider")
|
||||
ComponentSpec(name='guider', type_hint=<class 'diffusers.guiders.perturbed_attention_guidance.PerturbedAttentionGuidance'>, description=None, config=FrozenDict([('guidance_scale', 5.0), ('perturbed_guidance_scale', 2.5), ('perturbed_guidance_start', 0.01), ('perturbed_guidance_stop', 0.2), ('perturbed_guidance_layers', None), ('perturbed_guidance_config', LayerSkipConfig(indices=[2, 9], fqn='mid_block.attentions.0.transformer_blocks', skip_attention=False, skip_attention_scores=True, skip_ff=False, dropout=1.0)), ('guidance_rescale', 0.0), ('use_original_formulation', False), ('start', 0.0), ('stop', 1.0), ('_use_default_values', ['perturbed_guidance_start', 'use_original_formulation', 'perturbed_guidance_layers', 'stop', 'start', 'guidance_rescale', 'perturbed_guidance_stop']), ('_class_name', 'PerturbedAttentionGuidance'), ('_diffusers_version', '0.35.0.dev0')]), repo=None, subfolder=None, variant=None, revision=None, default_creation_method='from_config')
|
||||
ComponentSpec(name='guider', type_hint=<class 'diffusers.guiders.perturbed_attention_guidance.PerturbedAttentionGuidance'>, description=None, config=FrozenDict([('guidance_scale', 5.0), ('perturbed_guidance_scale', 2.5), ('perturbed_guidance_start', 0.01), ('perturbed_guidance_stop', 0.2), ('perturbed_guidance_layers', None), ('perturbed_guidance_config', LayerSkipConfig(indices=[2, 9], fqn='mid_block.attentions.0.transformer_blocks', skip_attention=False, skip_attention_scores=True, skip_ff=False, dropout=1.0)), ('guidance_rescale', 0.0), ('use_original_formulation', False), ('start', 0.0), ('stop', 1.0), ('_use_default_values', ['use_original_formulation', 'perturbed_guidance_stop', 'stop', 'guidance_rescale', 'start', 'perturbed_guidance_layers', 'perturbed_guidance_start']), ('_class_name', 'PerturbedAttentionGuidance'), ('_diffusers_version', '0.35.0.dev0')]), repo=None, subfolder=None, variant=None, revision=None, default_creation_method='from_config')
|
||||
```
|
||||
|
||||
The "guider" is still a `from_config` component, so it is still not included in the pipeline config and will not be saved into the `modular_model_index.json`.
|
||||
However, the "guider" is still not included in the pipeline config and will not be saved into the `modular_model_index.json` since it remains a `from_config` component:
|
||||
|
||||
```py
|
||||
>>> assert "guider" not in t2i_pipeline.config
|
||||
```
|
||||
|
||||
However, you can change it to a `from_pretrained` component, which allows you to upload your customized guider to the Hub and load it into your pipeline.
|
||||
|
||||
#### Loading Custom Guiders from Hub
|
||||
|
||||
If you already have a guider saved on the Hub and a `modular_model_index.json` with the loading spec for that guider, it will automatically be changed to a `from_pretrained` component during pipeline initialization.
|
||||
|
||||
For example, this `modular_model_index.json` includes loading specs for the guider:
|
||||
|
||||
```json
|
||||
{
|
||||
"guider": [
|
||||
null,
|
||||
null,
|
||||
{
|
||||
"repo": "YiYiXu/modular-loader-t2i-guider",
|
||||
"revision": null,
|
||||
"subfolder": "pag_guider",
|
||||
"type_hint": [
|
||||
"diffusers",
|
||||
"PerturbedAttentionGuidance"
|
||||
],
|
||||
"variant": null
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
When you use this repository to create a pipeline with the same blocks (that originally configured guider as a `from_config` component), the guider becomes a `from_pretrained` component. This means it doesn't get created during initialization, and after you call `load_default_components()`, it loads based on the spec - resulting in the PAG guider instead of the default CFG.
|
||||
|
||||
```py
|
||||
t2i_pipeline = t2i_blocks.init_pipeline("YiYiXu/modular-doc-guider")
|
||||
assert t2i_pipeline.guider is None # Not created during init
|
||||
t2i_pipeline.load_default_components()
|
||||
t2i_pipeline.guider # Now loaded as PAG guider
|
||||
```
|
||||
|
||||
#### Upload Custom Guider to Hub for Easy Loading & Sharing
|
||||
|
||||
Now let's see how we can share the guider on the Hub and change it to a `from_pretrained` component.
|
||||
You can upload your customized guider to the Hub so that it can be loaded more easily:
|
||||
|
||||
```py
|
||||
guider.push_to_hub("YiYiXu/modular-loader-t2i-guider", subfolder="pag_guider")
|
||||
```
|
||||
|
||||
Voilà! Now you have a subfolder called `pag_guider` on that repository.
|
||||
|
||||
You have a few options to make this guider available in your pipeline:
|
||||
|
||||
1. **Directly modify the `modular_model_index.json`** to add a loading spec for the guider by pointing to a folder containing the desired guider config.
|
||||
|
||||
2. **Use the `update_components` method** to change it to a `from_pretrained` component for your pipeline. This is easier if you just want to try it out with different repositories.
|
||||
|
||||
Let's use the second approach and change our guider_spec to use `from_pretrained` as the default creation method and update the loading spec to use this subfolder we just created:
|
||||
Voilà! Now you have a subfolder called `pag_guider` on that repository. Let's change our guider_spec to use `from_pretrained` as the default creation method and update the loading spec to use this subfolder we just created:
|
||||
|
||||
```python
|
||||
guider_spec = t2i_pipeline.get_component_spec("guider")
|
||||
@@ -917,14 +860,44 @@ You will get a warning about changing the creation method:
|
||||
ModularPipeline.update_components: changing the default_creation_method of guider from from_config to from_pretrained.
|
||||
```
|
||||
|
||||
Now not only the `guider` component and its component_spec are updated, but so is the pipeline config.
|
||||
|
||||
If you want to change the default behavior for future pipelines, you can push the updated pipeline to the Hub. This way, when others use your repository, they'll get the PAG guider by default. However, this is optional - you don't have to do this if you just want to experiment locally.
|
||||
Now not only the `guider` component and its component_spec are updated, but so is the pipeline config. Let's push it to a new repository:
|
||||
|
||||
```py
|
||||
t2i_pipeline.push_to_hub("YiYiXu/modular-doc-guider")
|
||||
```
|
||||
|
||||
If you check the `modular_model_index.json`, you'll see the guider is now included:
|
||||
|
||||
```json
|
||||
{
|
||||
"guider": [
|
||||
"diffusers",
|
||||
"PerturbedAttentionGuidance",
|
||||
{
|
||||
"repo": "YiYiXu/modular-loader-t2i-guider",
|
||||
"revision": null,
|
||||
"subfolder": "pag_guider",
|
||||
"type_hint": [
|
||||
"diffusers",
|
||||
"PerturbedAttentionGuidance"
|
||||
],
|
||||
"variant": null
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
Now when you create the pipeline from that repo directly, the `guider` is not automatically loaded anymore (since it's now a `from_pretrained` component), but when you run `load_default_components()`, the PAG guider will be loaded by default:
|
||||
|
||||
```py
|
||||
t2i_pipeline = t2i_blocks.init_pipeline("YiYiXu/modular-doc-guider")
|
||||
assert t2i_pipeline.guider is None
|
||||
t2i_pipeline.load_default_components()
|
||||
t2i_pipeline.guider
|
||||
```
|
||||
|
||||
Of course, you can also directly modify the `modular_model_index.json` to add a loading spec for the guider by pointing to a folder containing the desired guider config.
|
||||
|
||||
|
||||
<Tip>
|
||||
|
||||
@@ -934,7 +907,7 @@ Additionally, you can write your own guider implementations, for example, CFG Ze
|
||||
|
||||
</Tip>
|
||||
|
||||
## Running a `ModularPipeline`
|
||||
### Running a `ModularPipeline`
|
||||
|
||||
The API to run the `ModularPipeline` is very similar to how you would run a regular `DiffusionPipeline`:
|
||||
|
||||
@@ -953,14 +926,14 @@ Under the hood, `ModularPipeline`'s `__call__` method is a wrapper around the pi
|
||||
|
||||
You can inspect the docstring of a `ModularPipeline` to check what arguments the pipeline accepts and how to specify the `output` you want. It will list all available outputs (basically everything in the intermediate pipeline state) so you can choose from the list.
|
||||
|
||||
```py
|
||||
t2i_pipeline.doc
|
||||
```
|
||||
|
||||
**Important**: Always check the docstring, because arguments can differ from the standard pipelines that you're familiar with. For example, in Modular Diffusers we standardized the controlnet image input as `control_image`, but regular pipelines name it inconsistently, e.g. controlnet text-to-image uses `image` while SDXL controlnet img2img uses `control_image`.
|
||||
|
||||
**Note**: The `output` list might be longer than you expected - it includes everything in the intermediate state that you can choose to return. Most of the time, you'll just want `output="images"` or `output="latents"`.
|
||||
|
||||
```py
|
||||
t2i_pipeline.doc
|
||||
```
|
||||
|
||||
</Tip>
|
||||
|
||||
#### Text-to-Image, Image-to-Image, and Inpainting
|
||||
@@ -1099,7 +1072,7 @@ StableDiffusionXLAutoControlnetStep(
|
||||
|
||||
<Tip>
|
||||
|
||||
💡 **Auto Blocks**: This is the first time we meet an Auto Block! `AutoPipelineBlocks` automatically adapt to your inputs by combining multiple workflows with conditional logic. This is why one convenient block can work for all tasks and controlnet types. See the [Auto Blocks Guide](./auto_pipeline_blocks.md) for more details.
|
||||
💡 **Auto Blocks**: This is the first time we meet an Auto Block! `AutoPipelineBlocks` automatically adapt to your inputs by combining multiple workflows with conditional logic. This is why one convenient block can work for all tasks and controlnet types. See the [Auto Blocks Guide](https://huggingface.co/docs/diffusers/modular_diffusers/write_own_pipeline_block#autopipelineblocks) for more details.
|
||||
|
||||
</Tip>
|
||||
|
||||
@@ -1,194 +0,0 @@
|
||||
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# LoopSequentialPipelineBlocks
|
||||
|
||||
<Tip warning={true}>
|
||||
|
||||
🧪 **Experimental Feature**: Modular Diffusers is an experimental feature we are actively developing. The API may be subject to breaking changes.
|
||||
|
||||
</Tip>
|
||||
|
||||
`LoopSequentialPipelineBlocks` is a subclass of `ModularPipelineBlocks`. It is a multi-block that composes other blocks together in a loop, creating iterative workflows where blocks run multiple times with evolving state. It's particularly useful for denoising loops requiring repeated execution of the same blocks.
|
||||
|
||||
<Tip>
|
||||
|
||||
Other types of multi-blocks include [SequentialPipelineBlocks](./sequential_pipeline_blocks.md) (for linear workflows) and [AutoPipelineBlocks](./auto_pipeline_blocks.md) (for conditional block selection). For information on creating individual blocks, see the [PipelineBlock guide](./pipeline_block.md).
|
||||
|
||||
Additionally, like all `ModularPipelineBlocks`, `LoopSequentialPipelineBlocks` are definitions/specifications, not runnable pipelines. You need to convert them into a `ModularPipeline` to actually execute them. For information on creating and running pipelines, see the [Modular Pipeline guide](modular_pipeline.md).
|
||||
|
||||
</Tip>
|
||||
|
||||
You could create a loop using `PipelineBlock` like this:
|
||||
|
||||
```python
|
||||
class DenoiseLoop(PipelineBlock):
|
||||
def __call__(self, components, state):
|
||||
block_state = self.get_block_state(state)
|
||||
for t in range(block_state.num_inference_steps):
|
||||
# ... loop logic here
|
||||
pass
|
||||
self.set_block_state(state, block_state)
|
||||
return components, state
|
||||
```
|
||||
|
||||
But in this tutorial, we will focus on how to use `LoopSequentialPipelineBlocks` to create a "composable" denoising loop where you can add or remove blocks within the loop or reuse the same loop structure with different block combinations.
|
||||
|
||||
A composable loop has two parts: a **loop wrapper** and **loop blocks**.
|
||||
|
||||
* The **loop wrapper** (`LoopSequentialPipelineBlocks`) defines the loop structure: the iteration variables and loop configuration such as the progress bar.
|
||||
|
||||
* The **loop blocks** are standard pipeline blocks that you add to the loop wrapper.
    - they run sequentially in each iteration of the loop
    - they receive the current iteration index as an additional parameter
    - they share the same `block_state` throughout the entire loop
|
||||
|
||||
Unlike regular `SequentialPipelineBlocks` where each block gets its own state, loop blocks share a single state that persists and evolves across iterations.
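
To make the shared-state idea concrete, here is a minimal sketch in plain Python. It does not use or reproduce the diffusers implementation: `ToyLoopWrapper`, `add_one`, and the `SimpleNamespace` state are hypothetical stand-ins, chosen only to show a single state object being handed to every block on every iteration.

```python
from types import SimpleNamespace


class ToyLoopWrapper:
    """Hypothetical stand-in for a loop wrapper; not the diffusers class."""

    def __init__(self, blocks):
        # blocks: ordered mapping of name -> callable(state, i) -> state
        self.blocks = dict(blocks)

    def loop_step(self, state, i):
        # Run every registered block once, in insertion order.
        for block in self.blocks.values():
            state = block(state, i)
        return state

    def __call__(self, state):
        # The wrapper owns the loop structure; the blocks own the loop body.
        for i in range(state.num_steps):
            state = self.loop_step(state, i)
        return state


def add_one(state, i):
    # Mutates the single shared state object and records the iteration index.
    state.x += 1
    state.seen.append(i)
    return state


state = SimpleNamespace(num_steps=3, x=0, seen=[])
result = ToyLoopWrapper({"block1": add_one, "block2": add_one})(state)
print(result.x)     # 6
print(result.seen)  # [0, 0, 1, 1, 2, 2]
```

Because both blocks receive the same object, the change made by `block1` is visible to `block2` within the same iteration and to both blocks in the next one.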
|
||||
|
||||
We will build a simple loop to demonstrate these concepts. Creating a loop involves three steps:
|
||||
1. defining the loop wrapper class
|
||||
2. creating the loop blocks
|
||||
3. adding the loop blocks to the loop wrapper class to create the loop wrapper instance
|
||||
|
||||
**Step 1: Define the Loop Wrapper**
|
||||
|
||||
To create a `LoopSequentialPipelineBlocks` class, you need to define:
|
||||
|
||||
* `loop_inputs`: User input variables (equivalent to `PipelineBlock.inputs`)
|
||||
* `loop_intermediate_inputs`: Intermediate variables needed from the mutable pipeline state (equivalent to `PipelineBlock.intermediate_inputs`)
* `loop_intermediate_outputs`: New intermediate variables this block will add to the mutable pipeline state (equivalent to `PipelineBlock.intermediate_outputs`)
|
||||
* `__call__` method: Defines the loop structure and iteration logic
|
||||
|
||||
Here is an example of a loop wrapper:
|
||||
|
||||
```py
import torch
from diffusers.modular_pipelines import LoopSequentialPipelineBlocks, PipelineBlock, InputParam, OutputParam

class LoopWrapper(LoopSequentialPipelineBlocks):
    model_name = "test"

    @property
    def description(self):
        return "I'm a loop!!"

    @property
    def loop_inputs(self):
        return [InputParam(name="num_steps")]

    @torch.no_grad()
    def __call__(self, components, state):
        block_state = self.get_block_state(state)
        # Loop structure - can be customized to your needs
        for i in range(block_state.num_steps):
            # loop_step executes all registered blocks in sequence
            components, block_state = self.loop_step(components, block_state, i=i)
        self.set_block_state(state, block_state)
        return components, state
```
|
||||
|
||||
**Step 2: Create Loop Blocks**
|
||||
|
||||
Loop blocks are standard `PipelineBlock`s, but their `__call__` method works differently:
|
||||
* It receives the iteration variable (e.g., `i`) passed by the loop wrapper
|
||||
* It works directly with `block_state` instead of pipeline state
|
||||
* No need to call `self.get_block_state()` or `self.set_block_state()`
|
||||
|
||||
```py
class LoopBlock(PipelineBlock):
    # this is used to identify the model family, we won't worry about it in this example
    model_name = "test"

    @property
    def inputs(self):
        return [InputParam(name="x")]

    @property
    def intermediate_outputs(self):
        # outputs produced by this block
        return [OutputParam(name="x")]

    @property
    def description(self):
        return "I'm a block used inside the `LoopWrapper` class"

    def __call__(self, components, block_state, i: int):
        block_state.x += 1
        return components, block_state
```
|
||||
|
||||
**Step 3: Combine Everything**
|
||||
|
||||
Finally, assemble your loop by adding the block(s) to the wrapper:
|
||||
|
||||
```py
|
||||
loop = LoopWrapper.from_blocks_dict({"block1": LoopBlock})
|
||||
```
|
||||
|
||||
Now you've created a loop with a single block:
|
||||
|
||||
```py
|
||||
>>> loop
|
||||
LoopWrapper(
|
||||
Class: LoopSequentialPipelineBlocks
|
||||
|
||||
Description: I'm a loop!!
|
||||
|
||||
Sub-Blocks:
|
||||
[0] block1 (LoopBlock)
|
||||
Description: I'm a block used inside the `LoopWrapper` class
|
||||
|
||||
)
|
||||
```
|
||||
|
||||
The loop has two inputs: `x`, which the block uses at every step, and `num_steps`, which sets the number of iterations.
|
||||
|
||||
```py
|
||||
>>> print(loop.doc)
|
||||
class LoopWrapper
|
||||
|
||||
I'm a loop!!
|
||||
|
||||
Inputs:
|
||||
|
||||
x (`None`, *optional*):
|
||||
|
||||
num_steps (`None`, *optional*):
|
||||
|
||||
Outputs:
|
||||
|
||||
x (`None`):
|
||||
```
|
||||
|
||||
**Running the Loop:**
|
||||
|
||||
```py
|
||||
# run the loop
|
||||
loop_pipeline = loop.init_pipeline()
|
||||
x = loop_pipeline(num_steps=10, x=0, output="x")
|
||||
assert x == 10
|
||||
```
|
||||
|
||||
**Adding Multiple Blocks:**
|
||||
|
||||
We can add multiple blocks to run within each iteration. Let's run the loop block twice within each iteration:
|
||||
|
||||
```py
|
||||
loop = LoopWrapper.from_blocks_dict({"block1": LoopBlock(), "block2": LoopBlock})
|
||||
loop_pipeline = loop.init_pipeline()
|
||||
x = loop_pipeline(num_steps=10, x=0, output="x")
|
||||
assert x == 20 # Each iteration runs 2 blocks, so 10 iterations * 2 = 20
|
||||
```
|
||||
|
||||
**Key Differences from SequentialPipelineBlocks:**
|
||||
|
||||
The main difference is that loop blocks share the same `block_state` across all iterations, allowing values to accumulate and evolve throughout the loop. Loop blocks could receive additional arguments (like the current iteration index) depending on the loop wrapper's implementation, since the wrapper defines how loop blocks are called. You can easily add, remove, or reorder blocks within the loop without changing the loop logic itself.
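
The claim that blocks can be added, removed, or reordered without touching the loop logic can be illustrated with a small plain-Python sketch. Everything here (`run_loop`, `add_one`, `double`) is a hypothetical simplification rather than the diffusers API: the loop function stays fixed while only the dictionary of blocks changes.

```python
from types import SimpleNamespace


def run_loop(blocks, num_steps, x):
    """Hypothetical loop: the structure is the same whichever blocks are given."""
    state = SimpleNamespace(x=x)
    for i in range(num_steps):
        for block in blocks.values():
            state = block(state, i)
    return state.x


def add_one(state, i):
    state.x += 1
    return state


def double(state, i):
    state.x *= 2
    return state


# Same loop, different block combinations and orderings.
only_add = run_loop({"add": add_one}, num_steps=3, x=0)
add_then_double = run_loop({"add": add_one, "double": double}, num_steps=3, x=0)
double_then_add = run_loop({"double": double, "add": add_one}, num_steps=3, x=0)
print(only_add, add_then_double, double_then_add)  # 3 14 7
```

The results differ only because the block dictionary differs; `run_loop` itself is never edited.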
|
||||
|
||||
The officially supported denoising loops in Modular Diffusers are implemented using `LoopSequentialPipelineBlocks`. You can explore the actual implementation to see how these concepts work in practice:
|
||||
|
||||
```py
|
||||
from diffusers.modular_pipelines.stable_diffusion_xl.denoise import StableDiffusionXLDenoiseStep
|
||||
StableDiffusionXLDenoiseStep()
|
||||
```
|
||||
@@ -1,59 +0,0 @@
|
||||
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# PipelineState and BlockState
|
||||
|
||||
<Tip warning={true}>
|
||||
|
||||
🧪 **Experimental Feature**: Modular Diffusers is an experimental feature we are actively developing. The API may be subject to breaking changes.
|
||||
|
||||
</Tip>
|
||||
|
||||
In Modular Diffusers, `PipelineState` and `BlockState` are the core data structures that enable blocks to communicate and share data. Understanding them is fundamental to understanding how blocks interact with each other and with the pipeline system.
|
||||
|
||||
In the modular diffusers system, `PipelineState` acts as the global state container that all pipeline blocks operate on. It maintains the complete runtime state of the pipeline and provides a structured way for blocks to read from and write to shared data.
|
||||
|
||||
A `PipelineState` consists of two distinct states:
|
||||
|
||||
- **The immutable state** (i.e. the `inputs` dict) contains a copy of values provided by users. Once a value is added to the immutable state, it cannot be changed. Blocks can read from the immutable state but cannot write to it.
|
||||
|
||||
- **The mutable state** (i.e. the `intermediates` dict) contains variables that are passed between blocks and can be modified by them.
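
As a rough mental model of the two-part state, consider the following plain-Python sketch. `ToyPipelineState` is a hypothetical class written for illustration; how the real `PipelineState` enforces immutability is not shown in this guide, so the read-only view used here (`types.MappingProxyType`) is an assumption, not the library's mechanism.

```python
from types import MappingProxyType


class ToyPipelineState:
    """Hypothetical two-part state container; not the diffusers class."""

    def __init__(self, **user_inputs):
        # Copy the user's values and expose them through a read-only view.
        self.inputs = MappingProxyType(dict(user_inputs))
        # Values passed between blocks live in an ordinary, writable dict.
        self.intermediates = {}


state = ToyPipelineState(prompt="a cat", guidance_scale=7.0)
state.intermediates["prompt_embeds"] = [0.1, 0.2]  # blocks may write here

try:
    state.inputs["prompt"] = "a dog"  # writing to the immutable part fails
except TypeError as err:
    print(f"rejected: {err}")

print(state.inputs["prompt"])  # a cat
```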
|
||||
|
||||
Here's an example of what a `PipelineState` looks like:
|
||||
|
||||
```py
PipelineState(
  inputs={
    'prompt': 'a cat'
    'guidance_scale': 7.0
    'num_inference_steps': 25
  },
  intermediates={
    'prompt_embeds': Tensor(dtype=torch.float32, shape=torch.Size([1, 1, 1, 1]))
    'negative_prompt_embeds': None
  },
)
```
|
||||
|
||||
Each pipeline block defines which parts of that state it can read from and write to through its `inputs`, `intermediate_inputs`, and `intermediate_outputs` properties. At run time, the block gets a local view (`BlockState`) of the variables it needs from `PipelineState`, performs its operations, and then updates `PipelineState` with any changes.
|
||||
|
||||
For example, if a block defines an input `image`, inside the block's `__call__` method, the `BlockState` would contain:
|
||||
|
||||
```py
|
||||
BlockState(
|
||||
image: <PIL.Image.Image image mode=RGB size=512x512 at 0x7F3ECC494640>
|
||||
)
|
||||
```
|
||||
|
||||
You can access the variables directly as attributes: `block_state.image`.
|
||||
|
||||
To learn more about how blocks interact with the pipeline state through their `inputs`, `intermediate_inputs`, and `intermediate_outputs` properties, see the [PipelineBlock guide](./pipeline_block.md).
|
||||
@@ -1,42 +0,0 @@
|
||||
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# Getting Started with Modular Diffusers
|
||||
|
||||
<Tip warning={true}>
|
||||
|
||||
🧪 **Experimental Feature**: Modular Diffusers is an experimental feature we are actively developing. The API may be subject to breaking changes.
|
||||
|
||||
</Tip>
|
||||
|
||||
With Modular Diffusers, we introduce a unified pipeline system that simplifies how you work with diffusion models. Instead of creating separate pipelines for each task, Modular Diffusers lets you:
|
||||
|
||||
**Write Only What's New**: You won't need to write an entire pipeline from scratch every time you have a new use case. You can create pipeline blocks just for your new workflow's unique aspects and reuse existing blocks for existing functionalities.
|
||||
|
||||
**Assemble Like LEGO®**: You can mix and match between blocks in flexible ways. This allows you to write dedicated blocks unique to specific workflows, and then assemble different blocks into a pipeline that can be used more conveniently for multiple workflows.
|
||||
|
||||
|
||||
Here's how our guides are organized to help you navigate the Modular Diffusers documentation:
|
||||
|
||||
### 🚀 Running Pipelines
|
||||
- **[Modular Pipeline Guide](./modular_pipeline.md)** - How to use predefined blocks to build a pipeline and run it
|
||||
- **[Components Manager Guide](./components_manager.md)** - How to manage and reuse components across multiple pipelines
|
||||
|
||||
### 📚 Creating PipelineBlocks
|
||||
- **[Pipeline and Block States](./modular_diffusers_states.md)** - Understanding PipelineState and BlockState
|
||||
- **[Pipeline Block](./pipeline_block.md)** - How to write custom PipelineBlocks
|
||||
- **[SequentialPipelineBlocks](sequential_pipeline_blocks.md)** - Connecting blocks in sequence
|
||||
- **[LoopSequentialPipelineBlocks](./loop_sequential_pipeline_blocks.md)** - Creating iterative workflows
|
||||
- **[AutoPipelineBlocks](./auto_pipeline_blocks.md)** - Conditional block selection
|
||||
|
||||
### 🎯 Practical Examples
|
||||
- **[End-to-End Example](./end_to_end_guide.md)** - Complete end-to-end examples, including sharing your workflow on the Hugging Face Hub and deploying UI nodes
|
||||
@@ -1,292 +0,0 @@
|
||||
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# PipelineBlock
|
||||
|
||||
<Tip warning={true}>
|
||||
|
||||
🧪 **Experimental Feature**: Modular Diffusers is an experimental feature we are actively developing. The API may be subject to breaking changes.
|
||||
|
||||
</Tip>
|
||||
|
||||
In Modular Diffusers, you build your workflow using `ModularPipelineBlocks`. We support 4 different types of blocks: `PipelineBlock`, `SequentialPipelineBlocks`, `LoopSequentialPipelineBlocks`, and `AutoPipelineBlocks`. Among them, `PipelineBlock` is the most fundamental building block of the whole system - it's like a brick in a Lego system. These blocks are designed to easily connect with each other, allowing for modular construction of creative and potentially very complex workflows.
|
||||
|
||||
<Tip>
|
||||
|
||||
**Important**: `PipelineBlock`s are definitions/specifications, not runnable pipelines. They define what a block should do and what data it needs, but you need to convert them into a `ModularPipeline` to actually execute them. For information on creating and running pipelines, see the [Modular Pipeline guide](./modular_pipeline.md).
|
||||
|
||||
</Tip>
|
||||
|
||||
In this tutorial, we will focus on how to write a basic `PipelineBlock` and how it interacts with the pipeline state.
|
||||
|
||||
## PipelineState
|
||||
|
||||
Before we dive into creating `PipelineBlock`s, make sure you have a basic understanding of `PipelineState`. It acts as the global state container that all blocks operate on - each block gets a local view (`BlockState`) of the relevant variables it needs from `PipelineState`, performs its operations, and then updates `PipelineState` with any changes. See the [PipelineState and BlockState guide](./modular_diffusers_states.md) for more details.
|
||||
|
||||
## Define a `PipelineBlock`
|
||||
|
||||
To write a `PipelineBlock` class, you need to define a few properties that determine how your block interacts with the pipeline state. Understanding these properties is crucial - they define what data your block can access and what it can produce.
|
||||
|
||||
The three main properties you need to define are:
|
||||
- `inputs`: Immutable values from the user that cannot be modified
|
||||
- `intermediate_inputs`: Mutable values from previous blocks that can be read and modified
|
||||
- `intermediate_outputs`: New values your block creates for subsequent blocks and user access
|
||||
|
||||
Let's explore each one and understand how they work with the pipeline state.
|
||||
|
||||
**Inputs: Immutable User Values**
|
||||
|
||||
Inputs are variables your block needs from the immutable pipeline state - these are user-provided values that cannot be modified by any block. You define them using `InputParam`:
|
||||
|
||||
```py
|
||||
user_inputs = [
|
||||
InputParam(name="image", type_hint="PIL.Image", description="raw input image to process")
|
||||
]
|
||||
```
|
||||
|
||||
When you list something as an input, you're saying "I need this value directly from the end user, and I will talk to them directly, telling them what I need in the 'description' field. They will provide it and it will come to me unchanged."
|
||||
|
||||
This is especially useful for raw values that serve as the "source of truth" in your workflow. For example, with a raw image, many workflows require preprocessing steps like resizing that a previous block might have performed. But in many cases, you also want the raw PIL image. In some inpainting workflows, you need the original image to overlay with the generated result for better control and consistency.
|
||||
|
||||
**Intermediate Inputs: Mutable Values from Previous Blocks, or Users**
|
||||
|
||||
Intermediate inputs are variables your block needs from the mutable pipeline state - these are values that can be read and modified. They're typically created by previous blocks, but the user can also provide them directly when no previous block does:
|
||||
|
||||
```py
|
||||
user_intermediate_inputs = [
|
||||
InputParam(name="processed_image", type_hint="torch.Tensor", description="image that has been preprocessed and normalized"),
|
||||
]
|
||||
```
|
||||
|
||||
When you list something as an intermediate input, you're saying "I need this value, but I want to work with a different block that has already created it. I already know for sure that I can get it from this other block, but it's okay if other developers want to use something different."
|
||||
|
||||
**Intermediate Outputs: New Values for Subsequent Blocks and User Access**
|
||||
|
||||
Intermediate outputs are new variables your block creates and adds to the mutable pipeline state. They serve two purposes:
|
||||
|
||||
1. **For subsequent blocks**: They can be used as intermediate inputs by other blocks in the pipeline
|
||||
2. **For users**: They become available as final outputs that users can access when running the pipeline
|
||||
|
||||
```py
|
||||
user_intermediate_outputs = [
|
||||
OutputParam(name="image_latents", description="latents representing the image")
|
||||
]
|
||||
```
|
||||
|
||||
Intermediate inputs and intermediate outputs work together like Lego studs and anti-studs - they're the connection points that make blocks modular. When one block produces an intermediate output, it becomes available as an intermediate input for subsequent blocks. This is where the "modular" nature of the system really shines - blocks can be connected and reconnected in different ways as long as their inputs and outputs match.
|
||||
|
||||
Additionally, all intermediate outputs are accessible to users when they run the pipeline. Typically you would only need the final images, but users can also access intermediate results such as latents, embeddings, or the outputs of other processing steps.
|
||||
|
||||
**The `__call__` Method Structure**
|
||||
|
||||
Your `PipelineBlock`'s `__call__` method should follow this structure:
|
||||
|
||||
```py
def __call__(self, components, state):
    # Get a local view of the state variables this block needs
    block_state = self.get_block_state(state)

    # Your computation logic here
    # block_state contains all your inputs and intermediate_inputs
    # You can access them like: block_state.image, block_state.processed_image

    # Update the pipeline state with your updated block_state
    self.set_block_state(state, block_state)
    return components, state
```
|
||||
|
||||
The `block_state` object contains all the variables you defined in `inputs` and `intermediate_inputs`, making them easily accessible for your computation.
|
||||
|
||||
**Components and Configs**
|
||||
|
||||
You can define the components and pipeline-level configs your block needs using `ComponentSpec` and `ConfigSpec`:
|
||||
|
||||
```py
|
||||
from diffusers import ComponentSpec, ConfigSpec, EulerDiscreteScheduler, UNet2DConditionModel
|
||||
|
||||
# Define components your block needs
|
||||
expected_components = [
|
||||
ComponentSpec(name="unet", type_hint=UNet2DConditionModel),
|
||||
ComponentSpec(name="scheduler", type_hint=EulerDiscreteScheduler)
|
||||
]
|
||||
|
||||
# Define pipeline-level configs
|
||||
expected_config = [
|
||||
ConfigSpec("force_zeros_for_empty_prompt", True)
|
||||
]
|
||||
```
|
||||
|
||||
**Components**: In the `ComponentSpec`, you must provide a `name` and ideally a `type_hint`. You can also specify a `default_creation_method` to indicate whether the component should be loaded from a pretrained model or created with default configurations. The actual loading details (`repo`, `subfolder`, `variant` and `revision` fields) are typically specified when creating the pipeline, as we covered in the [Modular Pipeline Guide](./modular_pipeline.md).
|
||||
|
||||
**Configs**: Pipeline-level settings that control behavior across all blocks.
|
||||
|
||||
When you convert your blocks into a pipeline using `blocks.init_pipeline()`, the pipeline collects all component requirements from the blocks and fetches the loading specs from the modular repository. The components are then made available to your block as the first argument of the `__call__` method. You can access any component you need using dot notation:
|
||||
|
||||
```py
def __call__(self, components, state):
    # Access components using dot notation
    unet = components.unet
    vae = components.vae
    scheduler = components.scheduler
```
|
||||
|
||||
That's all you need to define in order to create a `PipelineBlock`. There is no hidden complexity. In fact, we are going to create a helper function that takes exactly these variables as input and returns a pipeline block class. We will use this helper function throughout the tutorial to create test blocks.
|
||||
|
||||
Note that for `__call__` method, the only part you should implement differently is the part between `self.get_block_state()` and `self.set_block_state()`, which can be abstracted into a simple function that takes `block_state` and returns the updated state. Our helper function accepts a `block_fn` that does exactly that.
|
||||
|
||||
**Helper Function**
|
||||
|
||||
```py
from diffusers.modular_pipelines import PipelineBlock, InputParam, OutputParam
import torch

def make_block(inputs=[], intermediate_inputs=[], intermediate_outputs=[], block_fn=None, description=None):
    class TestBlock(PipelineBlock):
        model_name = "test"

        @property
        def inputs(self):
            return inputs

        @property
        def intermediate_inputs(self):
            return intermediate_inputs

        @property
        def intermediate_outputs(self):
            return intermediate_outputs

        @property
        def description(self):
            return description if description is not None else ""

        def __call__(self, components, state):
            block_state = self.get_block_state(state)
            if block_fn is not None:
                block_state = block_fn(block_state, state)
            self.set_block_state(state, block_state)
            return components, state

    return TestBlock
```
|
||||
|
||||
## Example: Creating a Simple Pipeline Block
|
||||
|
||||
Let's create a simple block to see how these definitions interact with the pipeline state. To better understand what's happening, we'll print out the states before and after updates to inspect them:
|
||||
|
||||
```py
inputs = [
    InputParam(name="image", type_hint="PIL.Image", description="raw input image to process")
]

intermediate_inputs = [InputParam(name="batch_size", type_hint=int)]

intermediate_outputs = [
    OutputParam(name="image_latents", description="latents representing the image")
]

def image_encoder_block_fn(block_state, pipeline_state):
    print(f"pipeline_state (before update): {pipeline_state}")
    print(f"block_state (before update): {block_state}")

    # Simulate processing the image
    block_state.image = torch.randn(1, 3, 512, 512)
    block_state.batch_size = block_state.batch_size * 2
    block_state.processed_image = [torch.randn(1, 3, 512, 512)] * block_state.batch_size
    block_state.image_latents = torch.randn(1, 4, 64, 64)

    print(f"block_state (after update): {block_state}")
    return block_state

# Create a block with our definitions
image_encoder_block_cls = make_block(
    inputs=inputs,
    intermediate_inputs=intermediate_inputs,
    intermediate_outputs=intermediate_outputs,
    block_fn=image_encoder_block_fn,
    description="Encode raw image into its latent presentation"
)
image_encoder_block = image_encoder_block_cls()
pipe = image_encoder_block.init_pipeline()
```
|
||||
|
||||
Let's check the pipeline's docstring to see what inputs it expects:
|
||||
```py
|
||||
>>> print(pipe.doc)
|
||||
class TestBlock
|
||||
|
||||
Encode raw image into its latent presentation
|
||||
|
||||
Inputs:
|
||||
|
||||
image (`PIL.Image`, *optional*):
|
||||
raw input image to process
|
||||
|
||||
batch_size (`int`, *optional*):
|
||||
|
||||
Outputs:
|
||||
|
||||
image_latents (`None`):
|
||||
latents representing the image
|
||||
```
|
||||
|
||||
Notice that `batch_size` appears as an input even though we defined it as an intermediate input. This happens because no previous block provided it, so the pipeline makes it available as a user input. However, unlike regular inputs, this value goes directly into the mutable intermediate state.
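
The rule described here, that an intermediate input is surfaced to the user only when no earlier block produces it, can be sketched in plain Python. This is a hypothetical simplification: blocks are represented as dicts of names, and `user_facing_inputs` and the `upstream` block are invented for illustration rather than taken from the library.

```python
def user_facing_inputs(blocks):
    """Collect the names a user must be able to pass, given blocks in run order.

    Each block is a dict with 'inputs', 'intermediate_inputs' and
    'intermediate_outputs' lists of names (a hypothetical simplification).
    """
    produced = set()  # intermediate outputs of blocks that already ran
    exposed = []      # names surfaced to the user, in first-seen order
    for block in blocks:
        needed = block["inputs"] + [
            name for name in block["intermediate_inputs"] if name not in produced
        ]
        for name in needed:
            if name not in exposed:
                exposed.append(name)
        produced.update(block["intermediate_outputs"])
    return exposed


encoder = {
    "inputs": ["image"],
    "intermediate_inputs": ["batch_size"],
    "intermediate_outputs": ["image_latents"],
}
# Hypothetical earlier block that computes batch_size from a prompt.
upstream = {
    "inputs": ["prompt"],
    "intermediate_inputs": [],
    "intermediate_outputs": ["batch_size"],
}

print(user_facing_inputs([encoder]))            # ['image', 'batch_size']
print(user_facing_inputs([upstream, encoder]))  # ['prompt', 'image']
```

Run on its own, the encoder exposes `batch_size` to the user; placed after a block that produces `batch_size`, it no longer does.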
|
||||
|
||||
Now let's run the pipeline:
|
||||
|
||||
```py
|
||||
from diffusers.utils import load_image
|
||||
|
||||
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/image_of_squirrel_painting.png")
|
||||
state = pipe(image=image, batch_size=2)
|
||||
print(f"pipeline_state (after update): {state}")
|
||||
```
|
||||
```out
pipeline_state (before update): PipelineState(
  inputs={
    image: <PIL.Image.Image image mode=RGB size=512x512 at 0x7F3ECC494550>
  },
  intermediates={
    batch_size: 2
  },
)
block_state (before update): BlockState(
    image: <PIL.Image.Image image mode=RGB size=512x512 at 0x7F3ECC494640>
    batch_size: 2
)

block_state (after update): BlockState(
    image: Tensor(dtype=torch.float32, shape=torch.Size([1, 3, 512, 512]))
    batch_size: 4
    processed_image: List[4] of Tensors with shapes [torch.Size([1, 3, 512, 512]), torch.Size([1, 3, 512, 512]), torch.Size([1, 3, 512, 512]), torch.Size([1, 3, 512, 512])]
    image_latents: Tensor(dtype=torch.float32, shape=torch.Size([1, 4, 64, 64]))
)
pipeline_state (after update): PipelineState(
  inputs={
    image: <PIL.Image.Image image mode=RGB size=512x512 at 0x7F3ECC494550>
  },
  intermediates={
    batch_size: 4
    image_latents: Tensor(dtype=torch.float32, shape=torch.Size([1, 4, 64, 64]))
  },
)
```
|
||||
|
||||
**Key Observations:**
|
||||
|
||||
1. **Before the update**: `image` (the input) goes to the immutable inputs dict, while `batch_size` (the intermediate_input) goes to the mutable intermediates dict, and both are available in `block_state`.
|
||||
|
||||
2. **After the update**:
    - **`image` (inputs)** changed in `block_state` but not in `pipeline_state` - this change is local to the block only.
    - **`batch_size` (intermediate_inputs)** was updated in both `block_state` and `pipeline_state` - this change affects subsequent blocks (we didn't need to declare it as an intermediate output since it was already in the intermediates dict)
    - **`image_latents` (intermediate_outputs)** was added to `pipeline_state` because it was declared as an intermediate output
    - **`processed_image`** was not added to `pipeline_state` because it wasn't declared as an intermediate output
|
||||
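
The four observations can be reproduced with a small plain-Python sketch of the write-back step. The `write_back` function and the string placeholders are hypothetical; they imitate the behavior seen in the printed output and are not the logic of `set_block_state` itself.

```python
from types import SimpleNamespace


def write_back(intermediates, block_state, intermediate_inputs, intermediate_outputs):
    """Hypothetical write-back rule: only declared intermediate names are copied."""
    updated = dict(intermediates)
    for name in list(intermediate_inputs) + list(intermediate_outputs):
        if hasattr(block_state, name):
            updated[name] = getattr(block_state, name)
    return updated


inputs = {"image": "raw PIL image"}
intermediates = {"batch_size": 2}

# Local view: the declared inputs and intermediate inputs.
block_state = SimpleNamespace(image=inputs["image"], batch_size=intermediates["batch_size"])

# The block's work, mirroring the example with placeholder values.
block_state.image = "tensor"              # local change to an input
block_state.batch_size *= 2               # declared intermediate input
block_state.processed_image = ["tensor"]  # never declared
block_state.image_latents = "latents"     # declared intermediate output

intermediates = write_back(
    intermediates,
    block_state,
    intermediate_inputs=["batch_size"],
    intermediate_outputs=["image_latents"],
)
print(inputs)         # {'image': 'raw PIL image'}
print(intermediates)  # {'batch_size': 4, 'image_latents': 'latents'}
```

The input stays untouched, the declared intermediates are written back, and the undeclared `processed_image` is dropped, matching the observations listed.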
@@ -1,189 +0,0 @@
|
||||
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# SequentialPipelineBlocks
|
||||
|
||||
<Tip warning={true}>
|
||||
|
||||
🧪 **Experimental Feature**: Modular Diffusers is an experimental feature we are actively developing. The API may be subject to breaking changes.
|
||||
|
||||
</Tip>
|
||||
|
||||
`SequentialPipelineBlocks` is a subclass of `ModularPipelineBlocks`. Unlike `PipelineBlock`, it is a multi-block that composes other blocks together in sequence, creating modular workflows where data flows from one block to the next. It's one of the most common ways to build complex pipelines by combining simpler building blocks.
|
||||
|
||||
<Tip>
|
||||
|
||||
Other types of multi-blocks include [AutoPipelineBlocks](auto_pipeline_blocks.md) (for conditional block selection) and [LoopSequentialPipelineBlocks](loop_sequential_pipeline_blocks.md) (for iterative workflows). For information on creating individual blocks, see the [PipelineBlock guide](pipeline_block.md).
|
||||
|
||||
Additionally, like all `ModularPipelineBlocks`, `SequentialPipelineBlocks` are definitions/specifications, not runnable pipelines. You need to convert them into a `ModularPipeline` to actually execute them. For information on creating and running pipelines, see the [Modular Pipeline guide](modular_pipeline.md).
|
||||
|
||||
</Tip>
|
||||
|
||||
In this tutorial, we will focus on how to create `SequentialPipelineBlocks` and how blocks connect and work together.
|
||||
|
||||
The key insight is that blocks connect through their intermediate inputs and outputs - the "studs and anti-studs" we discussed in the [PipelineBlock guide](pipeline_block.md). When one block produces an intermediate output, it becomes available as an intermediate input for subsequent blocks.
|
||||
|
||||
Let's explore this through an example. We will use the same helper function from the PipelineBlock guide to create blocks.
|
||||
|
||||
```py
from diffusers.modular_pipelines import PipelineBlock, InputParam, OutputParam
import torch

def make_block(inputs=[], intermediate_inputs=[], intermediate_outputs=[], block_fn=None, description=None):
    class TestBlock(PipelineBlock):
        model_name = "test"

        @property
        def inputs(self):
            return inputs

        @property
        def intermediate_inputs(self):
            return intermediate_inputs

        @property
        def intermediate_outputs(self):
            return intermediate_outputs

        @property
        def description(self):
            return description if description is not None else ""

        def __call__(self, components, state):
            block_state = self.get_block_state(state)
            if block_fn is not None:
                block_state = block_fn(block_state, state)
            self.set_block_state(state, block_state)
            return components, state

    return TestBlock
```
|
||||
|
||||
Let's create a block that produces `batch_size`, which we'll call "input_block":
|
||||
|
||||
```py
def input_block_fn(block_state, pipeline_state):

    batch_size = len(block_state.prompt)
    block_state.batch_size = batch_size * block_state.num_images_per_prompt

    return block_state

input_block_cls = make_block(
    inputs=[
        InputParam(name="prompt", type_hint=list, description="list of text prompts"),
        InputParam(name="num_images_per_prompt", type_hint=int, description="number of images per prompt")
    ],
    intermediate_outputs=[
        OutputParam(name="batch_size", description="calculated batch size")
    ],
    block_fn=input_block_fn,
    description="A block that determines batch_size based on the number of prompts and num_images_per_prompt argument."
)
input_block = input_block_cls()
```
|
||||
|
||||
Now let's create a second block that uses the `batch_size` from the first block:
|
||||
|
||||
```py
def image_encoder_block_fn(block_state, pipeline_state):
    # Simulate processing the image
    block_state.image = torch.randn(1, 3, 512, 512)
    block_state.batch_size = block_state.batch_size * 2
    block_state.image_latents = torch.randn(1, 4, 64, 64)
    return block_state

image_encoder_block_cls = make_block(
    inputs=[
        InputParam(name="image", type_hint="PIL.Image", description="raw input image to process")
    ],
    intermediate_inputs=[
        InputParam(name="batch_size", type_hint=int)
    ],
    intermediate_outputs=[
        OutputParam(name="image_latents", description="latents representing the image")
    ],
    block_fn=image_encoder_block_fn,
    description="Encode raw image into its latent presentation"
)
image_encoder_block = image_encoder_block_cls()
```
|
||||
|
||||
Now let's connect these blocks to create a `SequentialPipelineBlocks`:
|
||||
|
||||
```py
|
||||
from diffusers.modular_pipelines import SequentialPipelineBlocks, InsertableDict
|
||||
|
||||
# Define a dict mapping block names to block instances
|
||||
blocks_dict = InsertableDict()
|
||||
blocks_dict["input"] = input_block
|
||||
blocks_dict["image_encoder"] = image_encoder_block
|
||||
|
||||
# Create the SequentialPipelineBlocks
|
||||
blocks = SequentialPipelineBlocks.from_blocks_dict(blocks_dict)
|
||||
```
|
||||
|
||||
Now you have a `SequentialPipelineBlocks` with 2 blocks:
|
||||
|
||||
```py
|
||||
>>> blocks
|
||||
SequentialPipelineBlocks(
|
||||
Class: ModularPipelineBlocks
|
||||
|
||||
Description:
|
||||
|
||||
|
||||
Sub-Blocks:
|
||||
[0] input (TestBlock)
|
||||
Description: A block that determines batch_size based on the number of prompts and num_images_per_prompt argument.
|
||||
|
||||
[1] image_encoder (TestBlock)
|
||||
Description: Encode raw image into its latent presentation
|
||||
|
||||
)
|
||||
```
|
||||
|
||||
When you inspect `blocks.doc`, you can see that `batch_size` is not listed as an input. The pipeline automatically detects that the `input_block` can produce `batch_size` for the `image_encoder_block`, so it doesn't ask the user to provide it.
|
||||
|
||||
```py
|
||||
>>> print(blocks.doc)
|
||||
class SequentialPipelineBlocks
|
||||
|
||||
Inputs:
|
||||
|
||||
prompt (`None`, *optional*):
|
||||
|
||||
num_images_per_prompt (`None`, *optional*):
|
||||
|
||||
image (`PIL.Image`, *optional*):
|
||||
raw input image to process
|
||||
|
||||
Outputs:
|
||||
|
||||
batch_size (`None`):
|
||||
|
||||
image_latents (`None`):
|
||||
latents representing the image
|
||||
```
|
||||
|
||||
At runtime, you have data flow like this:
|
||||
|
||||

|
||||
|
||||
**How SequentialPipelineBlocks Works:**
|
||||
|
||||
1. Blocks are executed in the order they're registered in the `blocks_dict`
|
||||
2. Outputs from one block become available as intermediate inputs to all subsequent blocks
|
||||
3. The pipeline automatically figures out which values need to be provided by the user and which will be generated by previous blocks
|
||||
4. Each block maintains its own behavior and operates through its defined interface, while collectively these interfaces determine what the entire pipeline accepts and produces
|
||||
|
||||
What happens within each block follows the same pattern we described earlier: each block gets its own `block_state` with the relevant inputs and intermediate inputs, performs its computation, and updates the pipeline state with its intermediate outputs.
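
To make point 3 above concrete, here is a minimal sketch of how the user-facing inputs could be worked out from the blocks' declarations. It is a toy model, not the library's implementation: the function `required_user_inputs` and the dict-based block schema are hypothetical names invented for illustration, and only the variable names (`prompt`, `batch_size`, ...) come from the example above.

```python
def required_user_inputs(blocks):
    """Return the names a user must supply for a sequence of toy blocks.

    Each block is a dict with "inputs", "intermediate_inputs" and
    "intermediate_outputs" lists of names (a hypothetical, simplified schema).
    """
    produced = set()  # intermediate outputs made available by earlier blocks
    required = []
    for block in blocks:
        # Regular inputs always come from the user.
        for name in block["inputs"]:
            if name not in required:
                required.append(name)
        # Intermediate inputs are requested only when no earlier block produces them.
        for name in block["intermediate_inputs"]:
            if name not in produced and name not in required:
                required.append(name)
        produced.update(block["intermediate_outputs"])
    return required


input_block_spec = {
    "inputs": ["prompt", "num_images_per_prompt"],
    "intermediate_inputs": [],
    "intermediate_outputs": ["batch_size"],
}
image_encoder_block_spec = {
    "inputs": ["image"],
    "intermediate_inputs": ["batch_size"],
    "intermediate_outputs": ["image_latents"],
}

# Combined: batch_size is produced by the first block, so the user is not asked for it.
combined = required_user_inputs([input_block_spec, image_encoder_block_spec])
# Alone: nothing produces batch_size, so it surfaces as a user input.
alone = required_user_inputs([image_encoder_block_spec])
print(combined)
print(alone)
```

In this sketch the order of the blocks matters: a value counts as "produced" only for the blocks that come after its producer.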
|
||||
@@ -0,0 +1,817 @@
|
||||
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# Writing Your Own Pipeline Blocks
|
||||
|
||||
<Tip warning={true}>
|
||||
|
||||
🧪 **Experimental Feature**: Modular Diffusers is an experimental feature we are actively developing. The API may be subject to breaking changes.
|
||||
|
||||
</Tip>
|
||||
|
||||
In Modular Diffusers, you build your workflow using `ModularPipelineBlocks`. We support 4 different types of blocks: `PipelineBlock`, `SequentialPipelineBlocks`, `LoopSequentialPipelineBlocks`, and `AutoPipelineBlocks`. Among them, `PipelineBlock` is the most fundamental building block of the whole system - it's like a brick in a Lego system. These blocks are designed to easily connect with each other, allowing for modular construction of creative and potentially very complex workflows.
|
||||
|
||||
In this tutorial, we will focus on how to write a basic `PipelineBlock` and how it interacts with other components in the system. We will also cover how to connect them together using the multi-blocks: `SequentialPipelineBlocks`, `LoopSequentialPipelineBlocks`, and `AutoPipelineBlocks`.
|
||||
|
||||
|
||||
## Understanding the Foundation: `PipelineState`
|
||||
|
||||
Before we dive into creating `PipelineBlock`s, we need to have a basic understanding of `PipelineState` - the core data structure that all blocks operate on. This concept is fundamental to understanding how blocks interact with each other and the pipeline system.
|
||||
|
||||
In the modular diffusers system, `PipelineState` acts as the global state container that `PipelineBlock`s operate on - each block gets a local view (`BlockState`) of the relevant variables it needs from `PipelineState`, performs its operations, and then updates `PipelineState` with any changes.
|
||||
|
||||
While `PipelineState` maintains the complete runtime state of the pipeline, `PipelineBlock`s define what parts of that state they can read from and write to through their `inputs`, `intermediate_inputs`, and `intermediate_outputs` properties.
|
||||
|
||||
A `PipelineState` consists of two distinct states:
|
||||
- The **immutable state** (i.e. the `inputs` dict) contains a copy of values provided by users. Once a value is added to the immutable state, it cannot be changed. Blocks can read from the immutable state but cannot write to it.
|
||||
- The **mutable state** (i.e. the `intermediates` dict) contains variables that are passed between blocks and can be modified by them.
|
||||
|
||||
Here's an example of what a `PipelineState` looks like:
|
||||
|
||||
```
PipelineState(
  inputs={
    prompt: 'a cat'
    guidance_scale: 7.0
    num_inference_steps: 25
  },
  intermediates={
    prompt_embeds: Tensor(dtype=torch.float32, shape=torch.Size([1, 1, 1, 1]))
    negative_prompt_embeds: None
  },
)
```
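
If it helps to picture the two states, here is a small sketch that mimics the split using only the Python standard library. `ToyPipelineState` is a hypothetical stand-in written for this tutorial, not the real `PipelineState` class, and the way it enforces immutability (a read-only mapping view) is an assumption made for illustration.

```python
from types import MappingProxyType


class ToyPipelineState:
    """Hypothetical stand-in for PipelineState; not the diffusers class."""

    def __init__(self, **user_inputs):
        # Immutable state: a read-only view over a private copy of the user's values.
        self.inputs = MappingProxyType(dict(user_inputs))
        # Mutable state: values that blocks pass to each other and may overwrite.
        self.intermediates = {}


state = ToyPipelineState(prompt="a cat", guidance_scale=7.0)

# Blocks may add or change intermediates freely.
state.intermediates["prompt_embeds"] = [0.1, 0.2]

# Writing to the immutable state is rejected.
try:
    state.inputs["prompt"] = "a dog"
    write_succeeded = True
except TypeError:
    write_succeeded = False

print(write_succeeded, state.inputs["prompt"])
```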
|
||||
|
||||
## Creating a `PipelineBlock`
|
||||
|
||||
To write a `PipelineBlock` class, you need to define a few properties that determine how your block interacts with the pipeline state. Understanding these properties is crucial - they define what data your block can access and what it can produce.
|
||||
|
||||
The three main properties you need to define are:
|
||||
- `inputs`: Immutable values from the user that cannot be modified
|
||||
- `intermediate_inputs`: Mutable values from previous blocks that can be read and modified
|
||||
- `intermediate_outputs`: New values your block creates for subsequent blocks
|
||||
|
||||
Let's explore each one and understand how they work with the pipeline state.
|
||||
|
||||
**Inputs: Immutable User Values**
|
||||
|
||||
Inputs are variables your block needs from the immutable pipeline state - these are user-provided values that cannot be modified by any block. You define them using `InputParam`:
|
||||
|
||||
```py
|
||||
user_inputs = [
|
||||
InputParam(name="image", type_hint="PIL.Image", description="raw input image to process")
|
||||
]
|
||||
```
|
||||
|
||||
When you list something as an input, you're saying "I need this value directly from the end user, and I will talk to them directly, telling them what I need in the 'description' field. They will provide it and it will come to me unchanged."
|
||||
|
||||
This is especially useful for raw values that serve as the "source of truth" in your workflow. For example, with a raw image, many workflows require preprocessing steps like resizing that a previous block might have performed. But in many cases, you also want the raw PIL image. In some inpainting workflows, you need the original image to overlay with the generated result for better control and consistency.
|
||||
|
||||
**Intermediate Inputs: Mutable Values from Previous Blocks**
|
||||
|
||||
Intermediate inputs are variables your block needs from the mutable pipeline state - these are values that can be read and modified. They're typically created by previous blocks, but they can also be provided directly by the user when no previous block produces them:
|
||||
|
||||
```py
|
||||
user_intermediate_inputs = [
|
||||
InputParam(name="processed_image", type_hint="torch.Tensor", description="image that has been preprocessed and normalized"),
|
||||
]
|
||||
```
|
||||
|
||||
When you list something as an intermediate input, you're saying "I need this value, but I want to work with a different block that has already created it. I already know for sure that I can get it from this other block, but it's okay if other developers want to use something different."
|
||||
|
||||
**Intermediate Outputs: New Values for Subsequent Blocks**
|
||||
|
||||
Intermediate outputs are new variables your block creates and adds to the mutable pipeline state so they can be used by subsequent blocks:
|
||||
|
||||
```py
|
||||
user_intermediate_outputs = [
|
||||
OutputParam(name="image_latents", description="latents representing the image")
|
||||
]
|
||||
```
|
||||
|
||||
Intermediate inputs and intermediate outputs work together like Lego studs and anti-studs - they're the connection points that make blocks modular. When one block produces an intermediate output, it becomes available as an intermediate input for subsequent blocks. This is where the "modular" nature of the system really shines - blocks can be connected and reconnected in different ways as long as their inputs and outputs match. We will see more of how they connect when we talk about multi-blocks.
|
||||
|
||||
**The `__call__` Method Structure**
|
||||
|
||||
Your `PipelineBlock`'s `__call__` method should follow this structure:
|
||||
|
||||
```py
def __call__(self, components, state):
    # Get a local view of the state variables this block needs
    block_state = self.get_block_state(state)

    # Your computation logic here
    # block_state contains all your inputs and intermediate_inputs
    # You can access them like: block_state.image, block_state.processed_image

    # Update the pipeline state with your updated block_state
    self.set_block_state(state, block_state)
    return components, state
```
|
||||
|
||||
The `block_state` object contains all the variables you defined in `inputs` and `intermediate_inputs`, making them easily accessible for your computation.
|
||||
|
||||
**Components and Configs**
|
||||
|
||||
You can define the components and pipeline-level configs your block needs using `ComponentSpec` and `ConfigSpec`:
|
||||
|
||||
```py
from diffusers import ComponentSpec, ConfigSpec
# The model and scheduler classes used as type hints below
from diffusers import EulerDiscreteScheduler, UNet2DConditionModel

# Define components your block needs
expected_components = [
    ComponentSpec(name="unet", type_hint=UNet2DConditionModel),
    ComponentSpec(name="scheduler", type_hint=EulerDiscreteScheduler)
]

# Define pipeline-level configs
expected_config = [
    ConfigSpec("force_zeros_for_empty_prompt", True)
]
```
|
||||
|
||||
**Components**: In the `ComponentSpec`, you must provide a `name` and ideally a `type_hint`. The actual loading details (`repo`, `subfolder`, `variant` and `revision` fields) are typically specified when creating the pipeline, as we covered in the [Getting Started Guide](https://huggingface.co/docs/diffusers/en/modular_diffusers/getting_started#loading-components-into-a-modularpipeline).
|
||||
|
||||
**Configs**: Simple pipeline-level settings that control behavior across all blocks.
|
||||
|
||||
When you convert your blocks into a pipeline using `blocks.init_pipeline()`, the pipeline collects all component requirements from the blocks and fetches the loading specs from the modular repository. The components are then made available to your block in the `components` argument of the `__call__` method.
|
||||
|
||||
That's all you need to define in order to create a `PipelineBlock`. There is no hidden complexity. In fact, we are going to create a helper function that takes exactly these variables as input and returns a pipeline block. We will use this helper function throughout the tutorial to create test blocks.
|
||||
|
||||
Note that in the `__call__` method, the only part you should implement differently is the part between `self.get_block_state()` and `self.set_block_state()`, which can be abstracted into a simple function that takes `block_state` and returns the updated state. Our helper function accepts a `block_fn` that does exactly that (it also receives the pipeline state, so we can print it for inspection).
|
||||
|
||||
**Helper Function**
|
||||
|
||||
```py
from diffusers.modular_pipelines import PipelineBlock, InputParam, OutputParam
import torch

def make_block(inputs=[], intermediate_inputs=[], intermediate_outputs=[], block_fn=None, description=None):
    class TestBlock(PipelineBlock):
        model_name = "test"

        @property
        def inputs(self):
            return inputs

        @property
        def intermediate_inputs(self):
            return intermediate_inputs

        @property
        def intermediate_outputs(self):
            return intermediate_outputs

        @property
        def description(self):
            return description if description is not None else ""

        def __call__(self, components, state):
            block_state = self.get_block_state(state)
            if block_fn is not None:
                block_state = block_fn(block_state, state)
            self.set_block_state(state, block_state)
            return components, state

    return TestBlock
```
|
||||
|
||||
|
||||
Let's create a simple block to see how these definitions interact with the pipeline state. To better understand what's happening, we'll print out the states before and after updates to inspect them:
|
||||
|
||||
```py
inputs = [
    InputParam(name="image", type_hint="PIL.Image", description="raw input image to process")
]

intermediate_inputs = [InputParam(name="batch_size", type_hint=int)]

intermediate_outputs = [
    OutputParam(name="image_latents", description="latents representing the image")
]

def image_encoder_block_fn(block_state, pipeline_state):
    print(f"pipeline_state (before update): {pipeline_state}")
    print(f"block_state (before update): {block_state}")

    # Simulate processing the image
    block_state.image = torch.randn(1, 3, 512, 512)
    block_state.batch_size = block_state.batch_size * 2
    block_state.processed_image = [torch.randn(1, 3, 512, 512)] * block_state.batch_size
    block_state.image_latents = torch.randn(1, 4, 64, 64)

    print(f"block_state (after update): {block_state}")
    return block_state

# Create a block with our definitions
image_encoder_block_cls = make_block(
    inputs=inputs,
    intermediate_inputs=intermediate_inputs,
    intermediate_outputs=intermediate_outputs,
    block_fn=image_encoder_block_fn,
    description=" Encode raw image into its latent presentation"
)
image_encoder_block = image_encoder_block_cls()
pipe = image_encoder_block.init_pipeline()
```
|
||||
|
||||
Let's check the pipeline's docstring to see what inputs it expects:
|
||||
```py
|
||||
>>> print(pipe.doc)
|
||||
class TestBlock
|
||||
|
||||
Encode raw image into its latent presentation
|
||||
|
||||
Inputs:
|
||||
|
||||
image (`PIL.Image`, *optional*):
|
||||
raw input image to process
|
||||
|
||||
batch_size (`int`, *optional*):
|
||||
|
||||
Outputs:
|
||||
|
||||
image_latents (`None`):
|
||||
latents representing the image
|
||||
```
|
||||
|
||||
Notice that `batch_size` appears as an input even though we defined it as an intermediate input. This happens because no previous block provided it, so the pipeline makes it available as a user input. However, unlike regular inputs, this value goes directly into the mutable intermediate state.
|
||||
|
||||
Now let's run the pipeline:
|
||||
|
||||
```py
|
||||
from diffusers.utils import load_image
|
||||
|
||||
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/image_of_squirrel_painting.png")
|
||||
state = pipe(image=image, batch_size=2)
|
||||
print(f"pipeline_state (after update): {state}")
|
||||
```
|
||||
```out
|
||||
pipeline_state (before update): PipelineState(
|
||||
inputs={
|
||||
image: <PIL.Image.Image image mode=RGB size=512x512 at 0x7F3ECC494550>
|
||||
},
|
||||
intermediates={
|
||||
batch_size: 2
|
||||
},
|
||||
)
|
||||
block_state (before update): BlockState(
|
||||
image: <PIL.Image.Image image mode=RGB size=512x512 at 0x7F3ECC494640>
|
||||
batch_size: 2
|
||||
)
|
||||
|
||||
block_state (after update): BlockState(
|
||||
image: Tensor(dtype=torch.float32, shape=torch.Size([1, 3, 512, 512]))
|
||||
batch_size: 4
|
||||
processed_image: List[4] of Tensors with shapes [torch.Size([1, 3, 512, 512]), torch.Size([1, 3, 512, 512]), torch.Size([1, 3, 512, 512]), torch.Size([1, 3, 512, 512])]
|
||||
image_latents: Tensor(dtype=torch.float32, shape=torch.Size([1, 4, 64, 64]))
|
||||
)
|
||||
pipeline_state (after update): PipelineState(
|
||||
inputs={
|
||||
image: <PIL.Image.Image image mode=RGB size=512x512 at 0x7F3ECC494550>
|
||||
},
|
||||
intermediates={
|
||||
batch_size: 4
|
||||
image_latents: Tensor(dtype=torch.float32, shape=torch.Size([1, 4, 64, 64]))
|
||||
},
|
||||
)
|
||||
```
|
||||
**Key Observations:**
|
||||
|
||||
1. **Before the update**: `image` (the input) goes to the immutable inputs dict, while `batch_size` (the intermediate_input) goes to the mutable intermediates dict, and both are available in `block_state`.
|
||||
|
||||
2. **After the update**:
|
||||
- **`image` (inputs)** changed in `block_state` but not in `pipeline_state` - this change is local to the block only.
|
||||
- **`batch_size` (intermediate_inputs)** was updated in both `block_state` and `pipeline_state` - this change affects subsequent blocks (we didn't need to declare it as an intermediate output since it was already in the intermediates dict).
|
||||
- **`image_latents` (intermediate_outputs)** was added to `pipeline_state` because it was declared as an intermediate output.
|
||||
- **`processed_image`** was not added to `pipeline_state` because it wasn't declared as an intermediate output
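
The write-back rules in these observations can be reproduced with a few lines of plain Python. The sketch below is a toy model under stated assumptions: the two helper functions only borrow the names `get_block_state` and `set_block_state`, their signatures are made up for illustration, and they are not how the library implements these methods.

```python
from types import SimpleNamespace


def get_block_state(inputs, intermediates, input_names, intermediate_input_names):
    """Build a local view holding only the variables the block declared."""
    view = {name: inputs.get(name) for name in input_names}
    view.update({name: intermediates.get(name) for name in intermediate_input_names})
    return SimpleNamespace(**view)


def set_block_state(intermediates, block_state, intermediate_input_names, intermediate_output_names):
    """Write back declared intermediate inputs and outputs; drop everything else."""
    for name in list(intermediate_input_names) + list(intermediate_output_names):
        if hasattr(block_state, name):
            intermediates[name] = getattr(block_state, name)


inputs = {"image": "raw PIL image"}  # immutable side, never written back
intermediates = {"batch_size": 2}    # mutable side

block_state = get_block_state(inputs, intermediates, ["image"], ["batch_size"])
block_state.image = "tensor"             # input: the change stays local to the block
block_state.batch_size *= 2              # declared intermediate input: written back
block_state.processed_image = "scratch"  # undeclared: discarded
block_state.image_latents = "latents"    # declared intermediate output: written back
set_block_state(intermediates, block_state, ["batch_size"], ["image_latents"])

print(inputs)
print(intermediates)
```

After the call, `inputs` still holds the raw image, while `intermediates` holds the doubled `batch_size` and the new `image_latents`, and `processed_image` is gone, which mirrors the four observations above.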
|
||||
|
||||
I hope by now you have a basic idea about how `PipelineBlock` manages state through inputs, intermediate inputs, and intermediate outputs. The real power comes when we connect multiple blocks together - their intermediate outputs become intermediate inputs for subsequent blocks, creating modular workflows. Let's explore how to build these connections using multi-blocks like `SequentialPipelineBlocks`.
|
||||
|
||||
## Create a `SequentialPipelineBlocks`
|
||||
|
||||
I assume that you're already familiar with `SequentialPipelineBlocks` and how to create them with the `from_blocks_dict` API. It's one of the most common ways to use Modular Diffusers, and we've covered it pretty well in the [Getting Started Guide](https://huggingface.co/docs/diffusers/pr_9672/en/modular_diffusers/getting_started#modularpipelineblocks).
|
||||
|
||||
But how do blocks actually connect and work together? Understanding this is crucial for building effective modular workflows. Let's explore this through an example.
|
||||
|
||||
**How Blocks Connect in SequentialPipelineBlocks:**
|
||||
|
||||
The key insight is that blocks connect through their intermediate inputs and outputs - the "studs and anti-studs" we discussed earlier. Let's expand on our example to create a new block that produces `batch_size`, which we'll call "input_block":
|
||||
|
||||
```py
def input_block_fn(block_state, pipeline_state):

    batch_size = len(block_state.prompt)
    block_state.batch_size = batch_size * block_state.num_images_per_prompt

    return block_state

input_block_cls = make_block(
    inputs=[
        InputParam(name="prompt", type_hint=list, description="list of text prompts"),
        InputParam(name="num_images_per_prompt", type_hint=int, description="number of images per prompt")
    ],
    intermediate_outputs=[
        OutputParam(name="batch_size", description="calculated batch size")
    ],
    block_fn=input_block_fn,
    description="A block that determines batch_size based on the number of prompts and num_images_per_prompt argument."
)
input_block = input_block_cls()
```
|
||||
|
||||
Now let's connect these blocks to create a pipeline:
|
||||
|
||||
```py
|
||||
from diffusers.modular_pipelines import SequentialPipelineBlocks, InsertableDict
|
||||
# define a dict mapping block names to block instances
|
||||
blocks_dict = InsertableDict()
|
||||
blocks_dict["input"] = input_block
|
||||
blocks_dict["image_encoder"] = image_encoder_block
|
||||
# create the multi-block
|
||||
blocks = SequentialPipelineBlocks.from_blocks_dict(blocks_dict)
|
||||
# convert it to a runnable pipeline
|
||||
pipeline = blocks.init_pipeline()
|
||||
```
|
||||
|
||||
Now you have a pipeline with 2 blocks.
|
||||
|
||||
```py
|
||||
>>> pipeline.blocks
|
||||
SequentialPipelineBlocks(
|
||||
Class: ModularPipelineBlocks
|
||||
|
||||
Description:
|
||||
|
||||
|
||||
Sub-Blocks:
|
||||
[0] input (TestBlock)
|
||||
Description: A block that determines batch_size based on the number of prompts and num_images_per_prompt argument.
|
||||
|
||||
[1] image_encoder (TestBlock)
|
||||
Description: Encode raw image into its latent presentation
|
||||
|
||||
)
|
||||
```
|
||||
|
||||
When you inspect `pipeline.doc`, you can see that `batch_size` is not listed as an input. The pipeline automatically detects that the `input_block` can produce `batch_size` for the `image_encoder_block`, so it doesn't ask the user to provide it.
|
||||
|
||||
```py
|
||||
>>> print(pipeline.doc)
|
||||
class SequentialPipelineBlocks
|
||||
|
||||
Inputs:
|
||||
|
||||
prompt (`None`, *optional*):
|
||||
|
||||
num_images_per_prompt (`None`, *optional*):
|
||||
|
||||
image (`PIL.Image`, *optional*):
|
||||
raw input image to process
|
||||
|
||||
Outputs:
|
||||
|
||||
batch_size (`None`):
|
||||
|
||||
image_latents (`None`):
|
||||
latents representing the image
|
||||
```
|
||||
|
||||
At runtime, you have data flow like this:
|
||||
|
||||

|
||||
|
||||
**How SequentialPipelineBlocks Works:**
|
||||
|
||||
1. Blocks are executed in the order they're registered in the `blocks_dict`
|
||||
2. Outputs from one block become available as intermediate inputs to all subsequent blocks
|
||||
3. The pipeline automatically figures out which values need to be provided by the user and which will be generated by previous blocks
|
||||
4. Each block maintains its own behavior and operates through its defined interface, while collectively these interfaces determine what the entire pipeline accepts and produces
|
||||
|
||||
What happens within each block follows the same pattern we described earlier: each block gets its own `block_state` with the relevant inputs and intermediate inputs, performs its computation, and updates the pipeline state with its intermediate outputs.
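
As a rough mental model of points 1 and 2, you can think of the sequential execution as a plain loop over an ordered dict, with every block reading from and writing to one shared set of intermediates. The sketch below is a simplification written for this tutorial: `run_sequential` and the function-style toy blocks are hypothetical, and real blocks go through `block_state` rather than touching the dicts directly.

```python
def toy_input_block(inputs, intermediates):
    # Produces batch_size from the user inputs.
    intermediates["batch_size"] = len(inputs["prompt"]) * inputs["num_images_per_prompt"]


def toy_image_encoder_block(inputs, intermediates):
    # Consumes the batch_size produced earlier and adds image_latents.
    intermediates["batch_size"] = intermediates["batch_size"] * 2
    intermediates["image_latents"] = f"latents for {inputs['image']}"


def run_sequential(blocks, **user_inputs):
    """Run toy blocks in registration order over one shared intermediates dict."""
    intermediates = {}
    for block in blocks.values():  # dicts preserve insertion order
        block(user_inputs, intermediates)
    return intermediates


toy_blocks = {"input": toy_input_block, "image_encoder": toy_image_encoder_block}
result = run_sequential(
    toy_blocks, prompt=["a cat", "a dog"], num_images_per_prompt=2, image="img.png"
)
print(result)
```

Swapping the two entries in `toy_blocks` would make the encoder fail with a `KeyError`, because nothing would have produced `batch_size` yet. In the toy model, that is why registration order matters.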
|
||||
|
||||
## `LoopSequentialPipelineBlocks`
|
||||
|
||||
To create a loop in Modular Diffusers, you could use a single `PipelineBlock` like this:
|
||||
|
||||
```python
class DenoiseLoop(PipelineBlock):
    def __call__(self, components, state):
        block_state = self.get_block_state(state)
        for t in range(block_state.num_inference_steps):
            # ... loop logic here
            pass
        self.set_block_state(state, block_state)
        return components, state
```
|
||||
|
||||
Or you could create a `LoopSequentialPipelineBlocks`. The key difference is that with `LoopSequentialPipelineBlocks`, the loop itself is modular: you can add or remove blocks within the loop or reuse the same loop structure with different block combinations.
|
||||
|
||||
It involves two parts: a **loop wrapper** and **loop blocks**.
|
||||
|
||||
* The **loop wrapper** (`LoopSequentialPipelineBlocks`) defines the loop structure, e.g. the iteration variables and loop configurations such as the progress bar.
|
||||
|
||||
* The **loop blocks** are basically standard pipeline blocks you add to the loop wrapper.
|
||||
- they run sequentially for each iteration of the loop
|
||||
- they receive the current iteration index as an additional parameter
|
||||
- they share the same block_state throughout the entire loop
|
||||
|
||||
Unlike regular `SequentialPipelineBlocks` where each block gets its own state, loop blocks share a single state that persists and evolves across iterations.
|
||||
|
||||
We will build a simple loop block to demonstrate these concepts. Creating a loop block involves three steps:
|
||||
1. defining the loop wrapper class
|
||||
2. creating the loop blocks
|
||||
3. adding the loop blocks to the loop wrapper class to create the loop wrapper instance
|
||||
|
||||
**Step 1: Define the Loop Wrapper**
|
||||
|
||||
To create a `LoopSequentialPipelineBlocks` class, you need to define:
|
||||
|
||||
* `loop_inputs`: User input variables (equivalent to `PipelineBlock.inputs`)
|
||||
* `loop_intermediate_inputs`: Intermediate variables needed from the mutable pipeline state (equivalent to `PipelineBlock.intermediate_inputs`)
|
||||
* `loop_intermediate_outputs`: New intermediate variables this block will add to the mutable pipeline state (equivalent to `PipelineBlock.intermediate_outputs`)
|
||||
* `__call__` method: Defines the loop structure and iteration logic
|
||||
|
||||
Here is an example of a loop wrapper:
|
||||
|
||||
```py
import torch
from diffusers.modular_pipelines import LoopSequentialPipelineBlocks, PipelineBlock, InputParam, OutputParam

class LoopWrapper(LoopSequentialPipelineBlocks):
    model_name = "test"

    @property
    def description(self):
        return "I'm a loop!!"

    @property
    def loop_inputs(self):
        return [InputParam(name="num_steps")]

    @torch.no_grad()
    def __call__(self, components, state):
        block_state = self.get_block_state(state)
        # Loop structure - can be customized to your needs
        for i in range(block_state.num_steps):
            # loop_step executes all registered blocks in sequence
            components, block_state = self.loop_step(components, block_state, i=i)
        self.set_block_state(state, block_state)
        return components, state
```
|
||||
|
||||
**Step 2: Create Loop Blocks**
|
||||
|
||||
Loop blocks are standard `PipelineBlock`s, but their `__call__` method works differently:
|
||||
* It receives the iteration variable (e.g., `i`) passed by the loop wrapper
|
||||
* It works directly with `block_state` instead of pipeline state
|
||||
* No need to call `self.get_block_state()` or `self.set_block_state()`
|
||||
|
||||
```py
class LoopBlock(PipelineBlock):
    # this is used to identify the model family, we won't worry about it in this example
    model_name = "test"

    @property
    def inputs(self):
        return [InputParam(name="x")]

    @property
    def intermediate_outputs(self):
        # outputs produced by this block
        return [OutputParam(name="x")]

    @property
    def description(self):
        return "I'm a block used inside the `LoopWrapper` class"

    def __call__(self, components, block_state, i: int):
        block_state.x += 1
        return components, block_state
```
|
||||
|
||||
**Step 3: Combine Everything**
|
||||
|
||||
Finally, assemble your loop by adding the block(s) to the wrapper:
|
||||
|
||||
```py
|
||||
loop = LoopWrapper.from_blocks_dict({"block1": LoopBlock})
|
||||
```
|
||||
|
||||
Now you've created a loop with one step:
|
||||
|
||||
```py
|
||||
>>> loop
|
||||
LoopWrapper(
|
||||
Class: LoopSequentialPipelineBlocks
|
||||
|
||||
Description: I'm a loop!!
|
||||
|
||||
Sub-Blocks:
|
||||
[0] block1 (LoopBlock)
|
||||
Description: I'm a block used inside the `LoopWrapper` class
|
||||
|
||||
)
|
||||
```
|
||||
|
||||
It has two inputs: `x` (used at each step within the loop) and `num_steps` (used to define the loop).
|
||||
|
||||
```py
|
||||
>>> print(loop.doc)
|
||||
class LoopWrapper
|
||||
|
||||
I'm a loop!!
|
||||
|
||||
Inputs:
|
||||
|
||||
x (`None`, *optional*):
|
||||
|
||||
num_steps (`None`, *optional*):
|
||||
|
||||
Outputs:
|
||||
|
||||
x (`None`):
|
||||
```
|
||||
|
||||
**Running the Loop:**
|
||||
|
||||
```py
|
||||
# run the loop
|
||||
loop_pipeline = loop.init_pipeline()
|
||||
x = loop_pipeline(num_steps=10, x=0, output="x")
|
||||
assert x == 10
|
||||
```
|
||||
|
||||
**Adding Multiple Blocks:**
|
||||
|
||||
We can add multiple blocks to run within each iteration. Let's run the loop block twice within each iteration:
|
||||
|
||||
```py
|
||||
loop = LoopWrapper.from_blocks_dict({"block1": LoopBlock, "block2": LoopBlock})
|
||||
loop_pipeline = loop.init_pipeline()
|
||||
x = loop_pipeline(num_steps=10, x=0, output="x")
|
||||
assert x == 20 # Each iteration runs 2 blocks, so 10 iterations * 2 = 20
|
||||
```
|
||||
|
||||
**Key Differences from SequentialPipelineBlocks:**
|
||||
|
||||
The main difference is that loop blocks share the same `block_state` across all iterations, allowing values to accumulate and evolve throughout the loop. Loop blocks could receive additional arguments (like the current iteration index) depending on the loop wrapper's implementation, since the wrapper defines how loop blocks are called. You can easily add, remove, or reorder blocks within the loop without changing the loop logic itself.
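
To see the shared-state behavior in isolation, here is a library-free sketch of the same idea. `ToyLoopWrapper` and `add_one` are hypothetical names, and the sketch deliberately leaves out `components` and the pipeline state. It only shows that one `block_state` object is created once and then handed to every sub-block on every iteration.

```python
from types import SimpleNamespace


class ToyLoopWrapper:
    """Hypothetical loop wrapper: owns the loop, delegates each step to sub-blocks."""

    def __init__(self, sub_blocks):
        self.sub_blocks = sub_blocks  # ordered mapping of name -> callable

    def loop_step(self, block_state, i):
        # Run every registered sub-block once, all on the same block_state.
        for block in self.sub_blocks.values():
            block_state = block(block_state, i)
        return block_state

    def __call__(self, num_steps, **values):
        # The state is created once and shared by all iterations.
        block_state = SimpleNamespace(**values)
        for i in range(num_steps):
            block_state = self.loop_step(block_state, i)
        return block_state


def add_one(block_state, i):
    # A loop block: receives the iteration index and mutates the shared state.
    block_state.x += 1
    return block_state


one_block = ToyLoopWrapper({"block1": add_one})(num_steps=10, x=0)
two_blocks = ToyLoopWrapper({"block1": add_one, "block2": add_one})(num_steps=10, x=0)
print(one_block.x, two_blocks.x)
```

Because the wrapper only iterates over whatever is registered, adding the second block changes the result from 10 to 20 without touching the loop logic.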
|
||||
|
||||
The officially supported denoising loops in Modular Diffusers are implemented using `LoopSequentialPipelineBlocks`. You can explore the actual implementation to see how these concepts work in practice:
|
||||
|
||||
```py
|
||||
from diffusers.modular_pipelines.stable_diffusion_xl.denoise import StableDiffusionXLDenoiseStep
|
||||
StableDiffusionXLDenoiseStep()
|
||||
```
|
||||
|
||||
## `AutoPipelineBlocks`
|
||||
|
||||
`AutoPipelineBlocks` allows you to pack different pipelines into one and automatically select which one to run at runtime based on the inputs. The main purpose is convenience and portability - as a developer, you can package everything into one workflow, making it easier to share and use.
|
||||
|
||||
For example, you might want to support text-to-image and image-to-image tasks. Instead of creating two separate pipelines, you can create an `AutoPipelineBlocks` that automatically chooses the workflow based on whether an `image` input is provided.
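
Conceptually, this kind of selection can be pictured as a first-match lookup over an ordered list of trigger inputs. The sketch below is only an illustration of that idea: `select_block` is a hypothetical helper, and the assumption that the first provided trigger wins (with `None` marking a default) is ours, not a statement about the library's internals.

```python
def select_block(block_names, block_trigger_inputs, **user_inputs):
    """Pick the first block whose trigger input was provided (toy first-match rule)."""
    for name, trigger in zip(block_names, block_trigger_inputs):
        # A trigger of None marks the default block, which always matches.
        if trigger is None or user_inputs.get(trigger) is not None:
            return name
    return None  # no trigger matched and no default was registered


names = ["inpaint", "img2img", "text2img"]
triggers = ["mask", "image", None]

chosen_t2i = select_block(names, triggers, prompt="a cat")
chosen_i2i = select_block(names, triggers, prompt="a cat", image="img.png")
chosen_inpaint = select_block(names, triggers, prompt="a cat", image="img.png", mask="mask.png")
print(chosen_t2i, chosen_i2i, chosen_inpaint)
```

Under this rule the order of the list encodes priority: a more specific trigger such as `mask` has to come before a more general one such as `image`.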
|
||||
|
||||
Let's see an example. Here we'll create a dummy `AutoPipelineBlocks` that includes dummy text-to-image, image-to-image, and inpaint pipelines.
|
||||
|
||||
|
||||
```py
|
||||
from diffusers.modular_pipelines import AutoPipelineBlocks
|
||||
|
||||
# These are dummy blocks and we only focus on "inputs" for our purpose
|
||||
inputs = [InputParam(name="prompt")]
|
||||
# block_fn prints out which workflow is running so we can see the execution order at runtime
|
||||
block_fn = lambda x, y: print("running the text-to-image workflow")
|
||||
block_t2i_cls = make_block(inputs=inputs, block_fn=block_fn, description="I'm a text-to-image workflow!")
|
||||
|
||||
inputs = [InputParam(name="prompt"), InputParam(name="image")]
|
||||
block_fn = lambda x, y: print("running the image-to-image workflow")
|
||||
block_i2i_cls = make_block(inputs=inputs, block_fn=block_fn, description="I'm a image-to-image workflow!")
|
||||
|
||||
inputs = [InputParam(name="prompt"), InputParam(name="image"), InputParam(name="mask")]
|
||||
block_fn = lambda x, y: print("running the inpaint workflow")
|
||||
block_inpaint_cls = make_block(inputs=inputs, block_fn=block_fn, description="I'm a inpaint workflow!")
|
||||
|
||||
class AutoImageBlocks(AutoPipelineBlocks):
    # List of sub-block classes to choose from
    block_classes = [block_inpaint_cls, block_i2i_cls, block_t2i_cls]
    # Names for each block in the same order
    block_names = ["inpaint", "img2img", "text2img"]
    # Trigger inputs that determine which block to run
    # - "mask" triggers inpaint workflow
    # - "image" triggers img2img workflow (but only if mask is not provided)
    # - if none of above, runs the text2img workflow (default)
    block_trigger_inputs = ["mask", "image", None]
    # Description is extremely important for AutoPipelineBlocks
    @property
    def description(self):
        return (
            "Pipeline generates images given different types of conditions!\n"
            + "This is an auto pipeline block that works for text2img, img2img and inpainting tasks.\n"
            + " - inpaint workflow is run when `mask` is provided.\n"
            + " - img2img workflow is run when `image` is provided (but only when `mask` is not provided).\n"
            + " - text2img workflow is run when neither `image` nor `mask` is provided.\n"
        )
|
||||
|
||||
# Create the blocks
|
||||
auto_blocks = AutoImageBlocks()
|
||||
# convert to pipeline
|
||||
auto_pipeline = auto_blocks.init_pipeline()
|
||||
```
|
||||
|
||||
Now we have created an `AutoPipelineBlocks` that contains 3 sub-blocks. Notice the warning message at the top - this automatically appears in every `ModularPipelineBlocks` that contains `AutoPipelineBlocks` to remind end users that dynamic block selection happens at runtime.
|
||||
|
||||
```py
|
||||
AutoImageBlocks(
|
||||
Class: AutoPipelineBlocks
|
||||
|
||||
====================================================================================================
|
||||
This pipeline contains blocks that are selected at runtime based on inputs.
|
||||
Trigger Inputs: ['mask', 'image']
|
||||
====================================================================================================
|
||||
|
||||
|
||||
Description: Pipeline generates images given different types of conditions!
|
||||
This is an auto pipeline block that works for text2img, img2img and inpainting tasks.
|
||||
- inpaint workflow is run when `mask` is provided.
|
||||
- img2img workflow is run when `image` is provided (but only when `mask` is not provided).
|
||||
- text2img workflow is run when neither `image` nor `mask` is provided.
|
||||
|
||||
|
||||
|
||||
Sub-Blocks:
|
||||
• inpaint [trigger: mask] (TestBlock)
|
||||
Description: I'm a inpaint workflow!
|
||||
|
||||
• img2img [trigger: image] (TestBlock)
|
||||
Description: I'm a image-to-image workflow!
|
||||
|
||||
• text2img [default] (TestBlock)
|
||||
Description: I'm a text-to-image workflow!
|
||||
|
||||
)
|
||||
```
|
||||
|
||||
Check out the documentation with `print(auto_pipeline.doc)`:
|
||||
|
||||
```py
|
||||
>>> print(auto_pipeline.doc)
|
||||
class AutoImageBlocks
|
||||
|
||||
Pipeline generates images given different types of conditions!
|
||||
This is an auto pipeline block that works for text2img, img2img and inpainting tasks.
|
||||
- inpaint workflow is run when `mask` is provided.
|
||||
- img2img workflow is run when `image` is provided (but only when `mask` is not provided).
|
||||
- text2img workflow is run when neither `image` nor `mask` is provided.
|
||||
|
||||
Inputs:
|
||||
|
||||
prompt (`None`, *optional*):
|
||||
|
||||
image (`None`, *optional*):
|
||||
|
||||
mask (`None`, *optional*):
|
||||
```
|
||||
|
||||
`AutoPipelineBlocks` comes with a fundamental trade-off: it trades clarity for convenience. Packaging multiple workflows is really easy, but the result can become confusing without proper documentation. For example, suppose we hand you a pipeline, tell you that it contains 3 sub-blocks and takes 3 inputs (`prompt`, `image`, and `mask`), and ask you to run an image-to-image workflow. Without prior knowledge of how these pipelines work, you would be pretty clueless, right?
|
||||
|
||||
The pipeline we just made, though, has a docstring that shows all available inputs and workflows and explains how to use each one, which is really helpful for users. For example, it's clear that you need to pass `image` to run img2img. This is why the `description` field is absolutely critical for `AutoPipelineBlocks`. We highly recommend explaining the conditional logic clearly for each `AutoPipelineBlocks` you make, and always testing the individual pipelines before packaging them into an `AutoPipelineBlocks`.
|
||||
|
||||
Let's run this auto pipeline with different inputs to see if the conditional logic works as described. Remember that we have added `print` in each `PipelineBlock`'s `__call__` method to print out its workflow name, so it should be easy to tell which one is running:
|
||||
|
||||
```py
|
||||
>>> _ = auto_pipeline(image="image", mask="mask")
|
||||
running the inpaint workflow
|
||||
>>> _ = auto_pipeline(image="image")
|
||||
running the image-to-image workflow
|
||||
>>> _ = auto_pipeline(prompt="prompt")
|
||||
running the text-to-image workflow
|
||||
>>> _ = auto_pipeline(prompt="prompt", mask="mask")
|
||||
running the inpaint workflow
|
||||
```
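The selection rule exercised above can be sketched in plain Python. This is only an illustration of the behavior described in this guide, not the library's implementation; `select_block` is a hypothetical helper.

```python
def select_block(block_names, trigger_inputs, **inputs):
    """Return the name of the block to run, or None if the auto block is skipped."""
    default = None
    for name, trigger in zip(block_names, trigger_inputs):
        if trigger is None:
            # `None` marks the default block, used when no trigger input is provided
            default = name
        elif inputs.get(trigger) is not None:
            # The first trigger input that was provided wins
            return name
    return default


names = ["inpaint", "img2img", "text2img"]
triggers = ["mask", "image", None]

print(select_block(names, triggers, image="image", mask="mask"))
print(select_block(names, triggers, image="image"))
print(select_block(names, triggers, prompt="prompt"))
```

The order of `block_trigger_inputs` matters in this sketch: `mask` is checked before `image`, which is why passing both selects the inpaint block.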
|
||||
|
||||
However, even with documentation, it can become very confusing when AutoPipelineBlocks are combined with other blocks. The complexity grows quickly when you have nested AutoPipelineBlocks or use them as sub-blocks in larger pipelines.
|
||||
|
||||
Let's make another `AutoPipelineBlocks` - this one only contains one block, and it does not include `None` in its `block_trigger_inputs` (which corresponds to the default block to run when none of the trigger inputs are provided). This means this block will be skipped if the trigger input (`ip_adapter_image`) is not provided at runtime.
|
||||
|
||||
```py
|
||||
from diffusers.modular_pipelines import SequentialPipelineBlocks, InsertableDict
|
||||
inputs = [InputParam(name="ip_adapter_image")]
|
||||
block_fn = lambda x, y: print("running the ip-adapter workflow")
|
||||
block_ipa_cls = make_block(inputs=inputs, block_fn=block_fn, description="I'm a IP-adapter workflow!")
|
||||
|
||||
class AutoIPAdapter(AutoPipelineBlocks):
    block_classes = [block_ipa_cls]
    block_names = ["ip-adapter"]
    block_trigger_inputs = ["ip_adapter_image"]

    @property
    def description(self):
        return "Run IP Adapter step if `ip_adapter_image` is provided."
|
||||
```
|
||||
|
||||
Now let's combine these 2 auto blocks together into a `SequentialPipelineBlocks`:
|
||||
|
||||
```py
|
||||
auto_ipa_blocks = AutoIPAdapter()
|
||||
blocks_dict = InsertableDict()
|
||||
blocks_dict["ip-adapter"] = auto_ipa_blocks
|
||||
blocks_dict["image-generation"] = auto_blocks
|
||||
all_blocks = SequentialPipelineBlocks.from_blocks_dict(blocks_dict)
|
||||
pipeline = all_blocks.init_pipeline()
|
||||
```
|
||||
|
||||
Let's take a look: now things get more confusing. In this particular example, you could still try to explain the conditional logic in the `description` field: there are only 6 possible execution paths (the IP-adapter step either runs or is skipped, and one of 3 image-generation workflows follows), so it's doable. However, since this is a `SequentialPipelineBlocks` that could contain many more blocks, the complexity can quickly get out of hand as the number of blocks increases.
|
||||
|
||||
```py
|
||||
>>> all_blocks
|
||||
SequentialPipelineBlocks(
|
||||
Class: ModularPipelineBlocks
|
||||
|
||||
====================================================================================================
|
||||
This pipeline contains blocks that are selected at runtime based on inputs.
|
||||
Trigger Inputs: ['image', 'mask', 'ip_adapter_image']
|
||||
Use `get_execution_blocks()` with input names to see selected blocks (e.g. `get_execution_blocks('image')`).
|
||||
====================================================================================================
|
||||
|
||||
|
||||
Description:
|
||||
|
||||
|
||||
Sub-Blocks:
|
||||
[0] ip-adapter (AutoIPAdapter)
|
||||
Description: Run IP Adapter step if `ip_adapter_image` is provided.
|
||||
|
||||
|
||||
[1] image-generation (AutoImageBlocks)
|
||||
Description: Pipeline generates images given different types of conditions!
|
||||
This is an auto pipeline block that works for text2img, img2img and inpainting tasks.
|
||||
- inpaint workflow is run when `mask` is provided.
|
||||
- img2img workflow is run when `image` is provided (but only when `mask` is not provided).
|
||||
- text2img workflow is run when neither `image` nor `mask` is provided.
|
||||
|
||||
|
||||
)
|
||||
|
||||
```
|
||||
|
||||
This is when the `get_execution_blocks()` method comes in handy - it basically extracts a `SequentialPipelineBlocks` that only contains the blocks that are actually run based on your inputs.
|
||||
|
||||
Let's try some examples:
|
||||
|
||||
`mask`: we expect it to skip the first ip-adapter since `ip_adapter_image` is not provided, and then run the inpaint for the second block.
|
||||
|
||||
```py
|
||||
>>> all_blocks.get_execution_blocks('mask')
|
||||
SequentialPipelineBlocks(
|
||||
Class: ModularPipelineBlocks
|
||||
|
||||
Description:
|
||||
|
||||
|
||||
Sub-Blocks:
|
||||
[0] image-generation (TestBlock)
|
||||
Description: I'm a inpaint workflow!
|
||||
|
||||
)
|
||||
```
|
||||
|
||||
Let's also actually run the pipeline to confirm:
|
||||
|
||||
```py
|
||||
>>> _ = pipeline(mask="mask")
|
||||
skipping auto block: AutoIPAdapter
|
||||
running the inpaint workflow
|
||||
```
|
||||
|
||||
Try a few more:
|
||||
|
||||
```py
|
||||
print(f"inputs: ip_adapter_image:")
|
||||
blocks_select = all_blocks.get_execution_blocks('ip_adapter_image')
|
||||
print(f"expected_execution_blocks: {blocks_select}")
|
||||
print(f"actual execution blocks:")
|
||||
_ = pipeline(ip_adapter_image="ip_adapter_image", prompt="prompt")
|
||||
# expect to see ip-adapter + text2img
|
||||
|
||||
print(f"inputs: image:")
|
||||
blocks_select = all_blocks.get_execution_blocks('image')
|
||||
print(f"expected_execution_blocks: {blocks_select}")
|
||||
print(f"actual execution blocks:")
|
||||
_ = pipeline(image="image", prompt="prompt")
|
||||
# expect to see img2img
|
||||
|
||||
print(f"inputs: prompt:")
|
||||
blocks_select = all_blocks.get_execution_blocks('prompt')
|
||||
print(f"expected_execution_blocks: {blocks_select}")
|
||||
print(f"actual execution blocks:")
|
||||
_ = pipeline(prompt="prompt")
|
||||
# expect to see text2img (prompt is not a trigger input so fallback to default)
|
||||
|
||||
print(f"inputs: mask + ip_adapter_image:")
|
||||
blocks_select = all_blocks.get_execution_blocks('mask','ip_adapter_image')
|
||||
print(f"expected_execution_blocks: {blocks_select}")
|
||||
print(f"actual execution blocks:")
|
||||
_ = pipeline(mask="mask", ip_adapter_image="ip_adapter_image")
|
||||
# expect to see ip-adapter + inpaint
|
||||
```
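To double-check expectations for every combination of trigger inputs, the composition above can be modeled in plain Python. This is a rough sketch under the selection rules described in this guide, not a replacement for `get_execution_blocks()`; `AUTO_BLOCKS` and `resolve` are hypothetical names.

```python
from itertools import product

# (block name, sub-block names, trigger inputs), mirroring the example above
AUTO_BLOCKS = [
    ("ip-adapter", ["ip-adapter"], ["ip_adapter_image"]),
    ("image-generation", ["inpaint", "img2img", "text2img"], ["mask", "image", None]),
]


def resolve(provided):
    """Return the sub-blocks expected to run for a set of provided input names."""
    path = []
    for _, names, triggers in AUTO_BLOCKS:
        chosen = None
        # Assumes the default (`None`) trigger, if any, is listed last
        for name, trigger in zip(names, triggers):
            if trigger is None or trigger in provided:
                chosen = name
                break
        if chosen is not None:  # an auto block without a default is skipped
            path.append(chosen)
    return tuple(path)


trigger_names = ["ip_adapter_image", "image", "mask"]
paths = set()
for flags in product([False, True], repeat=len(trigger_names)):
    provided = {name for name, on in zip(trigger_names, flags) if on}
    paths.add(resolve(provided))

# Print each distinct execution path once
for path in sorted(paths):
    print(" -> ".join(path))
```

Enumerating the paths this way is a cheap sanity check before writing the `description` for a composed pipeline.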
|
||||
|
||||
In summary, `AutoPipelineBlocks` is a good tool for packaging multiple workflows into a single, convenient interface and it can greatly simplify the user experience. However, always provide clear descriptions explaining the conditional logic, test individual pipelines first before combining them, and use `get_execution_blocks()` to understand runtime behavior in complex compositions.
|
||||
@@ -174,36 +174,39 @@ Feel free to open an issue if dynamic compilation doesn't work as expected for a
|
||||
|
||||
### Regional compilation
|
||||
|
||||
[Regional compilation](https://docs.pytorch.org/tutorials/recipes/regional_compilation.html) trims cold-start latency by only compiling the *small and frequently-repeated block(s)* of a model - typically a transformer layer - and enables reusing compiled artifacts for every subsequent occurrence.
|
||||
For many diffusion architectures, this delivers the same runtime speedups as full-graph compilation and reduces compile time by 8–10x.
|
||||
|
||||
Use the [`~ModelMixin.compile_repeated_blocks`] method, a helper that wraps `torch.compile`, on any component such as the transformer model as shown below.
|
||||
[Regional compilation](https://docs.pytorch.org/tutorials/recipes/regional_compilation.html) trims cold-start latency by compiling **only the small, frequently-repeated block(s)** of a model, typically a Transformer layer, enabling reuse of compiled artifacts for every subsequent occurrence.
|
||||
For many diffusion architectures this delivers the *same* runtime speed-ups as full-graph compilation yet cuts compile time by **8–10 ×**.
|
||||
|
||||
To make this effortless, [`ModelMixin`] exposes the [`ModelMixin.compile_repeated_blocks`] API, a helper that wraps `torch.compile` around any sub-modules you designate as repeatable:
|
||||
|
||||
```py
|
||||
# pip install -U diffusers
|
||||
import torch
|
||||
from diffusers import StableDiffusionXLPipeline
|
||||
|
||||
pipeline = StableDiffusionXLPipeline.from_pretrained(
|
||||
pipe = StableDiffusionXLPipeline.from_pretrained(
|
||||
"stabilityai/stable-diffusion-xl-base-1.0",
|
||||
torch_dtype=torch.float16,
|
||||
).to("cuda")
|
||||
|
||||
# compile only the repeated transformer layers inside the UNet
|
||||
pipeline.unet.compile_repeated_blocks(fullgraph=True)
|
||||
# Compile only the repeated Transformer layers inside the UNet
|
||||
pipe.unet.compile_repeated_blocks(fullgraph=True)
|
||||
```
|
||||
|
||||
To enable regional compilation for a new model, add a `_repeated_blocks` attribute to a model class containing the class names (as strings) of the blocks you want to compile.
|
||||
To enable a new model with regional compilation, add a `_repeated_blocks` attribute to your model class containing the class names (as strings) of the blocks you want compiled:
|
||||
|
||||
|
||||
```py
|
||||
class MyUNet(ModelMixin):
    _repeated_blocks = ("Transformer2DModel",)  # ← compiled by default
|
||||
```
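The `_repeated_blocks` convention above boils down to selecting sub-modules by class name. Here is a framework-free sketch of that selection step. It is an illustration only: `Module`, `TinyModel`, and `select_repeated_blocks` are hypothetical stand-ins, and how `compile_repeated_blocks` is actually implemented is not shown in this guide. A real helper would pass each selected block to `torch.compile`.

```python
class Module:
    """Hypothetical minimal module tree (stand-in for `torch.nn.Module`)."""

    def __init__(self, *children):
        self.children = list(children)

    def modules(self):
        # Yield this module and all of its descendants, depth first
        yield self
        for child in self.children:
            yield from child.modules()


class Attention(Module):
    pass


class TransformerBlock(Module):
    pass


class TinyModel(Module):
    # Class names (as strings) of the blocks that should be compiled
    _repeated_blocks = ("TransformerBlock",)


def select_repeated_blocks(model):
    # Keep only the sub-modules whose class name is listed in `_repeated_blocks`
    return [m for m in model.modules() if type(m).__name__ in model._repeated_blocks]


model = TinyModel(*(TransformerBlock(Attention()) for _ in range(4)))
selected = select_repeated_blocks(model)
print(len(selected), "blocks selected for compilation")
```

Since every selected block shares the same class, the compiled artifact for one occurrence can be reused for the others, which is where the compile-time savings come from.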
|
||||
|
||||
> [!TIP]
|
||||
> For more regional compilation examples, see the reference [PR](https://github.com/huggingface/diffusers/pull/11705).
|
||||
For more examples, see the reference [PR](https://github.com/huggingface/diffusers/pull/11705).
|
||||
|
||||
**Relation to Accelerate compile_regions** There is also a separate API in [accelerate](https://huggingface.co/docs/accelerate/index) - [compile_regions](https://github.com/huggingface/accelerate/blob/273799c85d849a1954a4f2e65767216eb37fa089/src/accelerate/utils/other.py#L78). It takes a fully automatic approach: it walks the module, picks candidate blocks, then compiles the remaining graph separately. That hands-off experience is handy for quick experiments, but it also leaves fewer knobs when you want to fine-tune which blocks are compiled or adjust compilation flags.
|
||||
|
||||
|
||||
There is also a [compile_regions](https://github.com/huggingface/accelerate/blob/273799c85d849a1954a4f2e65767216eb37fa089/src/accelerate/utils/other.py#L78) method in [Accelerate](https://huggingface.co/docs/accelerate/index) that automatically selects candidate blocks in a model to compile. The remaining graph is compiled separately. This is useful for quick experiments because there aren't as many options for you to set which blocks to compile or adjust compilation flags.
|
||||
|
||||
```py
|
||||
# pip install -U accelerate
|
||||
@@ -216,8 +219,8 @@ pipeline = StableDiffusionXLPipeline.from_pretrained(
|
||||
).to("cuda")
|
||||
pipeline.unet = compile_regions(pipeline.unet, mode="reduce-overhead", fullgraph=True)
|
||||
```
|
||||
`compile_repeated_blocks`, by contrast, is intentionally explicit. You list the repeated blocks once (via `_repeated_blocks`) and the helper compiles exactly those, nothing more. In practice this small dose of control hits a sweet spot for diffusion models: predictable behavior, easy reasoning about cache reuse, and still a one-liner for users.
|
||||
|
||||
[`~ModelMixin.compile_repeated_blocks`] is intentionally explicit. List the repeated blocks in `_repeated_blocks` and the helper only compiles those blocks. It offers predictable behavior and easy reasoning about cache reuse in one line of code.
|
||||
|
||||
### Graph breaks
|
||||
|
||||
@@ -239,12 +242,6 @@ The `step()` function is [called](https://github.com/huggingface/diffusers/blob/
|
||||
|
||||
In general, the `sigmas` should [stay on the CPU](https://github.com/huggingface/diffusers/blob/35a969d297cba69110d175ee79c59312b9f49e1e/src/diffusers/schedulers/scheduling_euler_discrete.py#L240) to avoid the communication sync and latency.
|
||||
|
||||
<Tip>
|
||||
|
||||
Refer to the [torch.compile and Diffusers: A Hands-On Guide to Peak Performance](https://pytorch.org/blog/torch-compile-and-diffusers-a-hands-on-guide-to-peak-performance/) blog post for maximizing performance with `torch.compile` for diffusion models.
|
||||
|
||||
</Tip>
|
||||
|
||||
### Benchmarks
|
||||
|
||||
Refer to the [diffusers/benchmarks](https://huggingface.co/datasets/diffusers/benchmarks) dataset to see inference latency and memory usage data for compiled pipelines.
|
||||
@@ -299,11 +296,3 @@ An input is projected into three subspaces, represented by the projection matric
|
||||
```py
|
||||
pipeline.fuse_qkv_projections()
|
||||
```
|
||||
|
||||
## Resources
|
||||
|
||||
- Read the [Presenting Flux Fast: Making Flux go brrr on H100s](https://pytorch.org/blog/presenting-flux-fast-making-flux-go-brrr-on-h100s/) blog post to learn more about how you can combine all of these optimizations with [TorchInductor](https://docs.pytorch.org/docs/stable/torch.compiler.html) and [AOTInductor](https://docs.pytorch.org/docs/stable/torch.compiler_aot_inductor.html) for a ~2.5x speedup using recipes from [flux-fast](https://github.com/huggingface/flux-fast).
|
||||
|
||||
These recipes support AMD hardware and [Flux.1 Kontext Dev](https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev).
|
||||
- Read the [torch.compile and Diffusers: A Hands-On Guide to Peak Performance](https://pytorch.org/blog/torch-compile-and-diffusers-a-hands-on-guide-to-peak-performance/) blog post
|
||||
to maximize performance when using `torch.compile`.
|
||||
@@ -14,9 +14,6 @@ specific language governing permissions and limitations under the License.
|
||||
|
||||
Optimizing models often involves trade-offs between [inference speed](./fp16) and [memory-usage](./memory). For instance, while [caching](./cache) can boost inference speed, it also increases memory consumption since it needs to store the outputs of intermediate attention layers. A more balanced optimization strategy combines quantizing a model, [torch.compile](./fp16#torchcompile) and various [offloading methods](./memory#offloading).
|
||||
|
||||
> [!TIP]
|
||||
> Check the [torch.compile](./fp16#torchcompile) guide to learn more about compilation and how it can be applied here. For example, regional compilation can significantly reduce compilation time without giving up any speedups.
|
||||
|
||||
For image generation, combining quantization and [model offloading](./memory#model-offloading) can often give the best trade-off between quality, speed, and memory. Group offloading is not as effective for image generation because it is usually not possible to *fully* overlap data transfer if the compute kernel finishes faster. This results in some communication overhead between the CPU and GPU.
|
||||
|
||||
For video generation, combining quantization and [group-offloading](./memory#group-offloading) tends to be better because video models are more compute-bound.
|
||||
@@ -28,7 +25,7 @@ The table below provides a comparison of optimization strategy combinations and
|
||||
| quantization | 32.602 | 14.9453 |
|
||||
| quantization, torch.compile | 25.847 | 14.9448 |
|
||||
| quantization, torch.compile, model CPU offloading | 32.312 | 12.2369 |
|
||||
<small>These results are benchmarked on Flux with an RTX 4090. The transformer and text_encoder components are quantized. Refer to the [benchmarking script](https://gist.github.com/sayakpaul/0db9d8eeeb3d2a0e5ed7cf0d9ca19b7d) if you're interested in evaluating your own model.</small>
|
||||
<small>These results are benchmarked on Flux with an RTX 4090. The transformer and text_encoder components are quantized. Refer to the <a href="https://gist.github.com/sayakpaul/0db9d8eeeb3d2a0e5ed7cf0d9ca19b7d">benchmarking script</a> if you're interested in evaluating your own model.</small>
|
||||
|
||||
This guide will show you how to compile and offload a quantized model with [bitsandbytes](../quantization/bitsandbytes#torchcompile). Make sure you are using [PyTorch nightly](https://pytorch.org/get-started/locally/) and the latest version of bitsandbytes.
|
||||
|
||||
|
||||
@@ -53,16 +53,6 @@ image = pipe(prompt, generator=torch.manual_seed(0)).images[0]
|
||||
image.save("flux-gguf.png")
|
||||
```
|
||||
|
||||
## Using Optimized CUDA Kernels with GGUF
|
||||
|
||||
Optimized CUDA kernels can accelerate GGUF quantized model inference by approximately 10%. This functionality requires a compatible GPU with `torch.cuda.get_device_capability` greater than 7 and the kernels library:
|
||||
|
||||
```shell
|
||||
pip install -U kernels
|
||||
```
|
||||
|
||||
Once installed, set `DIFFUSERS_GGUF_CUDA_KERNELS=true` to use optimized kernels when available. Note that CUDA kernels may introduce minor numerical differences compared to the original GGUF implementation, potentially causing subtle visual variations in generated images. To disable CUDA kernel usage, set the environment variable `DIFFUSERS_GGUF_CUDA_KERNELS=false`.
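If you prefer to set the flag from Python rather than the shell, a minimal sketch looks like this. It assumes the variable is read when `diffusers` is imported, so it is set before the import; that timing is an assumption, not something stated in this guide.

```python
import os

# Assumption: the flag is read at import time, so set it before importing diffusers
os.environ["DIFFUSERS_GGUF_CUDA_KERNELS"] = "true"

# import diffusers  # import only after the variable is set

print(os.environ["DIFFUSERS_GGUF_CUDA_KERNELS"])
```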
|
||||
|
||||
## Supported Quantization Types
|
||||
|
||||
- BF16
|
||||
@@ -77,44 +67,3 @@ Once installed, set `DIFFUSERS_GGUF_CUDA_KERNELS=true` to use optimized kernels
|
||||
- Q5_K
|
||||
- Q6_K
|
||||
|
||||
## Convert to GGUF
|
||||
|
||||
Use the Space below to convert a Diffusers checkpoint into the GGUF format for inference.
|
||||
run conversion:
|
||||
|
||||
<iframe
|
||||
src="https://diffusers-internal-dev-diffusers-to-gguf.hf.space"
|
||||
frameborder="0"
|
||||
width="850"
|
||||
height="450"
|
||||
></iframe>
|
||||
|
||||
|
||||
```py
|
||||
import torch
|
||||
|
||||
from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig
|
||||
|
||||
ckpt_path = (
|
||||
"https://huggingface.co/sayakpaul/different-lora-from-civitai/blob/main/flux_dev_diffusers-q4_0.gguf"
|
||||
)
|
||||
transformer = FluxTransformer2DModel.from_single_file(
|
||||
ckpt_path,
|
||||
quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
|
||||
config="black-forest-labs/FLUX.1-dev",
|
||||
subfolder="transformer",
|
||||
torch_dtype=torch.bfloat16,
|
||||
)
|
||||
pipe = FluxPipeline.from_pretrained(
|
||||
"black-forest-labs/FLUX.1-dev",
|
||||
transformer=transformer,
|
||||
torch_dtype=torch.bfloat16,
|
||||
)
|
||||
pipe.enable_model_cpu_offload()
|
||||
prompt = "A cat holding a sign that says hello world"
|
||||
image = pipe(prompt, generator=torch.manual_seed(0)).images[0]
|
||||
image.save("flux-gguf.png")
|
||||
```
|
||||
|
||||
When using Diffusers format GGUF checkpoints, you must provide the model `config` path. If the
|
||||
model config resides in a `subfolder`, that needs to be specified, too.
|
||||
@@ -11,7 +11,7 @@ specific language governing permissions and limitations under the License.
|
||||
|
||||
-->
|
||||
|
||||
# Getting started
|
||||
# Quantization
|
||||
|
||||
Quantization focuses on representing data with fewer bits while also trying to preserve the precision of the original data. This often means converting a data type to represent the same information with fewer bits. For example, if your model weights are stored as 32-bit floating points and they're quantized to 16-bit floating points, this halves the model size which makes it easier to store and reduces memory usage. Lower precision can also speed up inference because it takes less time to perform calculations with fewer bits.
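As a back-of-the-envelope illustration of the size reduction, the sketch below estimates weight storage at different bit widths. The 12B parameter count is a hypothetical figure chosen for illustration, and the estimate ignores quantization metadata such as scales and zero points.

```python
def weight_size_gb(num_params: int, bits_per_param: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


num_params = 12_000_000_000  # hypothetical 12B-parameter model

for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit: {weight_size_gb(num_params, bits):.1f} GB")
```

Each halving of the bit width halves the estimated storage, which is the effect described above for going from 32-bit to 16-bit weights.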
|
||||
|
||||
@@ -19,25 +19,19 @@ Diffusers supports multiple quantization backends to make large diffusion models
|
||||
|
||||
## Pipeline-level quantization
|
||||
|
||||
There are two ways to use [`~quantizers.PipelineQuantizationConfig`] depending on how much customization you want to apply to the quantization configuration.
|
||||
There are two ways you can use [`~quantizers.PipelineQuantizationConfig`] depending on the level of control you want over the quantization specifications of each model in the pipeline.
|
||||
|
||||
- for basic use cases, define the `quant_backend`, `quant_kwargs`, and `components_to_quantize` arguments
|
||||
- for granular quantization control, define a `quant_mapping` that provides the quantization configuration for individual model components
|
||||
- for more basic and simple use cases, you only need to define the `quant_backend`, `quant_kwargs`, and `components_to_quantize`
|
||||
- for more granular quantization control, provide a `quant_mapping` that provides the quantization specifications for the individual model components
|
||||
|
||||
### Basic quantization
|
||||
### Simple quantization
|
||||
|
||||
Initialize [`~quantizers.PipelineQuantizationConfig`] with the following parameters.
|
||||
|
||||
- `quant_backend` specifies which quantization backend to use. Currently supported backends include: `bitsandbytes_4bit`, `bitsandbytes_8bit`, `gguf`, `quanto`, and `torchao`.
|
||||
- `quant_kwargs` specifies the quantization arguments to use.
|
||||
|
||||
> [!TIP]
|
||||
> These `quant_kwargs` arguments are different for each backend. Refer to the [Quantization API](../api/quantization) docs to view the arguments for each backend.
|
||||
|
||||
- `quant_kwargs` contains the specific quantization arguments to use.
|
||||
- `components_to_quantize` specifies which components of the pipeline to quantize. Typically, you should quantize the most compute intensive components like the transformer. The text encoder is another component to consider quantizing if a pipeline has more than one such as [`FluxPipeline`]. The example below quantizes the T5 text encoder in [`FluxPipeline`] while keeping the CLIP model intact.
|
||||
|
||||
The example below loads the bitsandbytes backend with the following arguments from [`~quantizers.quantization_config.BitsAndBytesConfig`], `load_in_4bit`, `bnb_4bit_quant_type`, and `bnb_4bit_compute_dtype`.
|
||||
|
||||
```py
|
||||
import torch
|
||||
from diffusers import DiffusionPipeline
|
||||
@@ -62,13 +56,13 @@ pipe = DiffusionPipeline.from_pretrained(
|
||||
image = pipe("photo of a cute dog").images[0]
|
||||
```
|
||||
|
||||
### Advanced quantization
|
||||
### quant_mapping
|
||||
|
||||
The `quant_mapping` argument provides more options for how to quantize each individual component in a pipeline, like combining different quantization backends.
|
||||
The `quant_mapping` argument provides more flexible options for how to quantize each individual component in a pipeline, like combining different quantization backends.
|
||||
|
||||
Initialize [`~quantizers.PipelineQuantizationConfig`] and pass a `quant_mapping` to it. The `quant_mapping` allows you to specify the quantization options for each component in the pipeline such as the transformer and text encoder.
|
||||
|
||||
The example below uses two quantization backends, [`~quantizers.quantization_config.QuantoConfig`] and [`transformers.BitsAndBytesConfig`], for the transformer and text encoder.
|
||||
The example below uses two quantization backends, [`~quantizers.QuantoConfig`] and [`transformers.BitsAndBytesConfig`], for the transformer and text encoder.
|
||||
|
||||
```py
|
||||
import torch
|
||||
@@ -91,7 +85,7 @@ pipeline_quant_config = PipelineQuantizationConfig(
|
||||
There is a separate bitsandbytes backend in [Transformers](https://huggingface.co/docs/transformers/main_classes/quantization#transformers.BitsAndBytesConfig). You need to import and use [`transformers.BitsAndBytesConfig`] for components that come from Transformers. For example, `text_encoder_2` in [`FluxPipeline`] is a [`~transformers.T5EncoderModel`] from Transformers so you need to use [`transformers.BitsAndBytesConfig`] instead of [`diffusers.BitsAndBytesConfig`].
|
||||
|
||||
> [!TIP]
|
||||
> Use the [basic quantization](#basic-quantization) method above if you don't want to manage these distinct imports or aren't sure where each pipeline component comes from.
|
||||
> Use the [simple quantization](#simple-quantization) method above if you don't want to manage these distinct imports or aren't sure where each pipeline component comes from.
|
||||
|
||||
```py
|
||||
import torch
|
||||
@@ -135,4 +129,4 @@ Check out the resources below to learn more about quantization.
|
||||
|
||||
- The Transformers quantization [Overview](https://huggingface.co/docs/transformers/quantization/overview#when-to-use-what) provides an overview of the pros and cons of different quantization backends.
|
||||
|
||||
- Read the [Exploring Quantization Backends in Diffusers](https://huggingface.co/blog/diffusers-quantization) blog post for a brief introduction to each quantization backend, how to choose a backend, and combining quantization with other memory optimizations.
|
||||
- Read the [Exploring Quantization Backends in Diffusers](https://huggingface.co/blog/diffusers-quantization) blog post for a brief introduction to each quantization backend, how to choose a backend, and combining quantization with other memory optimizations.
|
||||
@@ -145,10 +145,10 @@ When running `accelerate config`, if you use torch.compile, there can be dramati
|
||||
If you would like to push your model to the Hub after training is completed with a neat model card, make sure you're logged in:
|
||||
|
||||
```bash
|
||||
hf auth login
|
||||
huggingface-cli login
|
||||
|
||||
# Alternatively, you could upload your model manually using:
|
||||
# hf upload my-cool-account-name/my-cool-lora-name /path/to/awesome/lora
|
||||
# huggingface-cli upload my-cool-account-name/my-cool-lora-name /path/to/awesome/lora
|
||||
```
|
||||
|
||||
Make sure your data is prepared as described in [Data Preparation](#data-preparation). When ready, you can begin training!
|
||||
|
||||
@@ -67,7 +67,7 @@ dataset = load_dataset(
|
||||
Then use the [`~datasets.Dataset.push_to_hub`] method to upload the dataset to the Hub:
|
||||
|
||||
```python
|
||||
# assuming you have run the hf auth login command in a terminal
|
||||
# assuming you have run the huggingface-cli login command in a terminal
|
||||
dataset.push_to_hub("name_of_your_dataset")
|
||||
|
||||
# if you want to push to a private repo, simply pass private=True:
|
||||
|
||||
@@ -42,7 +42,7 @@ We encourage you to share your model with the community, and in order to do that
|
||||
Or log in from the terminal:
|
||||
|
||||
```bash
|
||||
hf auth login
|
||||
huggingface-cli login
|
||||
```
|
||||
|
||||
Since the model checkpoints are quite large, install [Git-LFS](https://git-lfs.com/) to version these large files:
|
||||
|
||||
@@ -0,0 +1,23 @@
|
||||
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# Overview
|
||||
|
||||
Welcome to 🧨 Diffusers! If you're new to diffusion models and generative AI, and want to learn more, then you've come to the right place. These beginner-friendly tutorials are designed to provide a gentle introduction to diffusion models and help you understand the library fundamentals - the core components and how 🧨 Diffusers is meant to be used.
|
||||
|
||||
You'll learn how to use a pipeline for inference to rapidly generate things, and then deconstruct that pipeline to really understand how to use the library as a modular toolbox for building your own diffusion systems. In the next lesson, you'll learn how to train your own diffusion model to generate what you want.
|
||||
|
||||
After completing the tutorials, you'll have gained the necessary skills to start exploring the library on your own and see how to use it for your own projects and applications.
|
||||
|
||||
Feel free to join our community on [Discord](https://discord.com/invite/JfAtkvEtRb) or the [forums](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers/63) to connect and collaborate with other users and developers!
|
||||
|
||||
Let's start diffusing! 🧨
|
||||
@@ -319,19 +319,6 @@ If you expect to varied resolutions during inference with this feature, then mak
|
||||
|
||||
There are still scenarios where recompilation is unavoidable, such as when the hotswapped LoRA targets more layers than the initial adapter. Try to load the LoRA that targets the most layers *first*. For more details about this limitation, refer to the PEFT [hotswapping](https://huggingface.co/docs/peft/main/en/package_reference/hotswap#peft.utils.hotswap.hotswap_adapter) docs.
|
||||
|
||||
<details>
|
||||
<summary>Technical details of hotswapping</summary>
|
||||
|
||||
The [`~loaders.lora_base.LoraBaseMixin.enable_lora_hotswap`] method converts the LoRA scaling factor from floats to torch.tensors and pads the shape of the weights to the largest required shape to avoid reassigning the whole attribute when the data in the weights are replaced.
|
||||
|
||||
This is why the `max_rank` argument is important. The results are unchanged even when the values are padded with zeros. Computation may be slower though depending on the padding size.
|
||||
|
||||
Since no new LoRA attributes are added, each subsequent LoRA may only target the same layers, or a subset of the layers, that the first LoRA targets. The loading order therefore matters: if the LoRAs target disjoint layers, you may end up creating a dummy LoRA that targets the union of all target layers.
|
||||
|
||||
For more implementation details, take a look at the [`hotswap.py`](https://github.com/huggingface/peft/blob/92d65cafa51c829484ad3d95cf71d09de57ff066/src/peft/utils/hotswap.py) file.
|
||||
|
||||
</details>
|
||||
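The padding argument in the hotswapping details above can be checked with a small sketch. It uses plain Python lists in place of tensors, and the helper names `matmul` and `pad_lora` are hypothetical, not part of Diffusers or PEFT. It assumes the usual LoRA layout where the weight update is the product `B @ A`, with `A` of shape (rank, in_features) and `B` of shape (out_features, rank).

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def pad_lora(lora_a, lora_b, max_rank):
    """Zero-pad A with extra rows and B with extra columns up to max_rank."""
    rank = len(lora_a)
    in_features = len(lora_a[0])
    extra = max_rank - rank
    if extra < 0:
        raise ValueError("max_rank must be at least the adapter rank")
    padded_a = [row[:] for row in lora_a] + [[0.0] * in_features for _ in range(extra)]
    padded_b = [row[:] + [0.0] * extra for row in lora_b]
    return padded_a, padded_b


# Illustrative rank-2 adapter: A is (2, 3), B is (2, 2).
lora_a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
lora_b = [[1.0, 0.5], [-1.0, 2.0]]

padded_a, padded_b = pad_lora(lora_a, lora_b, max_rank=4)

delta = matmul(lora_b, lora_a)
padded_delta = matmul(padded_b, padded_a)

# The padded rows and columns only contribute zero terms to each sum.
assert delta == padded_delta
```

The padded matrices are larger, so the product costs more to compute even though the result is identical, which matches the note that computation may be slower depending on the padding size.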
|
||||
## Merge
|
||||
|
||||
The weights from each LoRA can be merged together to produce a blend of multiple existing styles. There are several methods for merging LoRAs, each of which differs in *how* the weights are merged (which may affect generation quality).
|
||||
@@ -686,6 +673,4 @@ Browse the [LoRA Studio](https://lorastudio.co/models) for different LoRAs to us
|
||||
height="450"
|
||||
></iframe>
|
||||
|
||||
You can find additional LoRAs in the [FLUX LoRA the Explorer](https://huggingface.co/spaces/multimodalart/flux-lora-the-explorer) and [LoRA the Explorer](https://huggingface.co/spaces/multimodalart/LoraTheExplorer) Spaces.
|
||||
|
||||
Check out the [Fast LoRA inference for Flux with Diffusers and PEFT](https://huggingface.co/blog/lora-fast) blog post to learn how to optimize LoRA inference with methods like FlashAttention-3 and fp8 quantization.
|
||||
You can find additional LoRAs in the [FLUX LoRA the Explorer](https://huggingface.co/spaces/multimodalart/flux-lora-the-explorer) and [LoRA the Explorer](https://huggingface.co/spaces/multimodalart/LoraTheExplorer) Spaces.
|
||||
@@ -0,0 +1,18 @@
|
||||
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# Overview
|
||||
|
||||
The inference pipeline supports and enables a wide range of techniques that are divided into two categories:
|
||||
|
||||
* Pipeline functionality: these techniques modify the pipeline or extend it for other applications. For example, pipeline callbacks add new features to a pipeline and a pipeline can also be extended for distributed inference.
|
||||
* Improve inference quality: these techniques increase the visual quality of the generated images. For example, you can enhance your prompts with GPT2 to create better images with lower effort.
|
||||
@@ -37,7 +37,7 @@ Diffusers는 Stable Diffusion 추론을 위해 PyTorch `mps`를 사용해 Apple
|
||||
|
||||
|
||||
```python
|
||||
# Make sure you are logged in with `hf auth login`
|
||||
# Make sure you are logged in with `huggingface-cli login`
|
||||
from diffusers import DiffusionPipeline
|
||||
|
||||
pipe = DiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5")
|
||||
|
||||
@@ -75,7 +75,7 @@ dataset = load_dataset(
|
||||
Use [push_to_hub](https://huggingface.co/docs/datasets/v2.13.1/en/package_reference/main_classes#datasets.Dataset.push_to_hub) to upload the dataset to the Hub:
|
||||
|
||||
```python
|
||||
# assuming you have already run the hf auth login command in a terminal
|
||||
# assuming you have already run the huggingface-cli login command in a terminal
|
||||
dataset.push_to_hub("name_of_your_dataset")
|
||||
|
||||
# if you want to push to a private repo, add `private=True`:
|
||||
|
||||
@@ -39,7 +39,7 @@ specific language governing permissions and limitations under the License.
|
||||
Log in to your Hugging Face account to save your model or share it with the community ([create one](https://huggingface.co/join) if you don't have an account yet):
|
||||
|
||||
```bash
|
||||
hf auth login
|
||||
huggingface-cli login
|
||||
```
|
||||
|
||||
## Text-to-image
|
||||
|
||||
@@ -42,7 +42,7 @@ Unconditional 이미지 생성은 학습에 사용된 데이터셋과 유사한
|
||||
Or you can log in from the terminal:
|
||||
|
||||
```bash
|
||||
hf auth login
|
||||
huggingface-cli login
|
||||
```
|
||||
|
||||
Since the model checkpoints are quite large, you can version these large files with [Git-LFS](https://git-lfs.com/).
|
||||
|
||||
@@ -42,7 +42,7 @@ Stable Diffusion 모델들은 학습 및 저장된 프레임워크와 다운로
|
||||
Before you start, make sure you have a local clone of 🤗 Diffusers to run the script from, and log in to your Hugging Face account so you can open a pull request and push the converted model to the Hub.
|
||||
|
||||
```bash
|
||||
hf auth login
|
||||
huggingface-cli login
|
||||
```
|
||||
|
||||
To use the script:
|
||||
|
||||
@@ -69,7 +69,7 @@ Note also that we use PEFT library as backend for LoRA training, make sure to ha
|
||||
|
||||
Lastly, we recommend logging into your HF account so that your trained LoRA is automatically uploaded to the hub:
|
||||
```bash
|
||||
hf auth login
|
||||
huggingface-cli login
|
||||
```
|
||||
This command will prompt you for a token. Copy-paste yours from your [settings/tokens](https://huggingface.co/settings/tokens), and press Enter.
|
||||
|
||||
|
||||
@@ -67,7 +67,7 @@ Note also that we use PEFT library as backend for LoRA training, make sure to ha
|
||||
|
||||
Lastly, we recommend logging into your HF account so that your trained LoRA is automatically uploaded to the hub:
|
||||
```bash
|
||||
hf auth login
|
||||
huggingface-cli login
|
||||
```
|
||||
This command will prompt you for a token. Copy-paste yours from your [settings/tokens](https://huggingface.co/settings/tokens), and press Enter.
|
||||
|
||||
|
||||
@@ -13,20 +13,6 @@
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
|
||||
# /// script
|
||||
# dependencies = [
|
||||
# "diffusers @ git+https://github.com/huggingface/diffusers.git",
|
||||
# "torch>=2.0.0",
|
||||
# "accelerate>=0.31.0",
|
||||
# "transformers>=4.41.2",
|
||||
# "ftfy",
|
||||
# "tensorboard",
|
||||
# "Jinja2",
|
||||
# "peft>=0.11.1",
|
||||
# "sentencepiece",
|
||||
# ]
|
||||
# ///
|
||||
|
||||
import argparse
|
||||
import copy
|
||||
import itertools
|
||||
@@ -985,7 +971,6 @@ class DreamBoothDataset(Dataset):
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
args,
|
||||
instance_data_root,
|
||||
instance_prompt,
|
||||
class_prompt,
|
||||
@@ -995,8 +980,10 @@ class DreamBoothDataset(Dataset):
|
||||
class_num=None,
|
||||
size=1024,
|
||||
repeats=1,
|
||||
center_crop=False,
|
||||
):
|
||||
self.size = size
|
||||
self.center_crop = center_crop
|
||||
|
||||
self.instance_prompt = instance_prompt
|
||||
self.custom_instance_prompts = None
|
||||
@@ -1071,7 +1058,7 @@ class DreamBoothDataset(Dataset):
|
||||
if interpolation is None:
|
||||
raise ValueError(f"Unsupported interpolation mode {interpolation=}.")
|
||||
train_resize = transforms.Resize(size, interpolation=interpolation)
|
||||
train_crop = transforms.CenterCrop(size) if args.center_crop else transforms.RandomCrop(size)
|
||||
train_crop = transforms.CenterCrop(size) if center_crop else transforms.RandomCrop(size)
|
||||
train_flip = transforms.RandomHorizontalFlip(p=1.0)
|
||||
train_transforms = transforms.Compose(
|
||||
[
|
||||
@@ -1088,11 +1075,11 @@ class DreamBoothDataset(Dataset):
|
||||
# flip
|
||||
image = train_flip(image)
|
||||
if args.center_crop:
|
||||
y1 = max(0, int(round((image.height - self.size) / 2.0)))
|
||||
x1 = max(0, int(round((image.width - self.size) / 2.0)))
|
||||
y1 = max(0, int(round((image.height - args.resolution) / 2.0)))
|
||||
x1 = max(0, int(round((image.width - args.resolution) / 2.0)))
|
||||
image = train_crop(image)
|
||||
else:
|
||||
y1, x1, h, w = train_crop.get_params(image, (self.size, self.size))
|
||||
y1, x1, h, w = train_crop.get_params(image, (args.resolution, args.resolution))
|
||||
image = crop(image, y1, x1, h, w)
|
||||
image = train_transforms(image)
|
||||
self.pixel_values.append(image)
|
||||
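The center-crop branch in the hunk above computes the top-left corner of the crop from the image size and the target size (read from `self.size` on one side of the diff and from `args.resolution` on the other). A minimal sketch of that arithmetic, using a hypothetical standalone helper `center_crop_origin` that is not part of the training script:

```python
def center_crop_origin(height, width, size):
    """Return the (y1, x1) top-left corner of a centered size x size crop.

    max(0, ...) clamps the origin to 0 when the image is smaller than
    the requested crop in that dimension.
    """
    y1 = max(0, int(round((height - size) / 2.0)))
    x1 = max(0, int(round((width - size) / 2.0)))
    return y1, x1


# Image larger than the crop: the origin is half of the leftover margin.
large = center_crop_origin(height=1200, width=1600, size=1024)

# Image smaller than the crop: the origin is clamped to (0, 0).
small = center_crop_origin(height=512, width=512, size=1024)

print(large, small)
```

As long as the dataset is constructed with `size=args.resolution`, as in the `train_dataset` call further down, both spellings give the same coordinates.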
@@ -1115,7 +1102,7 @@ class DreamBoothDataset(Dataset):
|
||||
self.image_transforms = transforms.Compose(
|
||||
[
|
||||
transforms.Resize(size, interpolation=interpolation),
|
||||
transforms.CenterCrop(size) if args.center_crop else transforms.RandomCrop(size),
|
||||
transforms.CenterCrop(size) if center_crop else transforms.RandomCrop(size),
|
||||
transforms.ToTensor(),
|
||||
transforms.Normalize([0.5], [0.5]),
|
||||
]
|
||||
@@ -1335,7 +1322,7 @@ def main(args):
|
||||
if args.report_to == "wandb" and args.hub_token is not None:
|
||||
raise ValueError(
|
||||
"You cannot use both --report_to=wandb and --hub_token due to a security risk of exposing your token."
|
||||
" Please use `hf auth login` to authenticate with the Hub."
|
||||
" Please use `huggingface-cli login` to authenticate with the Hub."
|
||||
)
|
||||
|
||||
if torch.backends.mps.is_available() and args.mixed_precision == "bf16":
|
||||
@@ -1840,7 +1827,6 @@ def main(args):
|
||||
|
||||
# Dataset and DataLoaders creation:
|
||||
train_dataset = DreamBoothDataset(
|
||||
args=args,
|
||||
instance_data_root=args.instance_data_dir,
|
||||
instance_prompt=args.instance_prompt,
|
||||
train_text_encoder_ti=args.train_text_encoder_ti,
|
||||
@@ -1850,6 +1836,7 @@ def main(args):
|
||||
class_num=args.num_class_images,
|
||||
size=args.resolution,
|
||||
repeats=args.repeats,
|
||||
center_crop=args.center_crop,
|
||||
)
|
||||
|
||||
train_dataloader = torch.utils.data.DataLoader(
|
||||
|
||||
@@ -13,20 +13,6 @@
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
|
||||
# /// script
|
||||
# dependencies = [
|
||||
# "diffusers @ git+https://github.com/huggingface/diffusers.git",
|
||||
# "torch>=2.0.0",
|
||||
# "accelerate>=0.31.0",
|
||||
# "transformers>=4.41.2",
|
||||
# "ftfy",
|
||||
# "tensorboard",
|
||||
# "Jinja2",
|
||||
# "peft>=0.11.1",
|
||||
# "sentencepiece",
|
||||
# ]
|
||||
# ///
|
||||
|
||||
import argparse
|
||||
import gc
|
||||
import hashlib
|
||||
@@ -1064,7 +1050,7 @@ def main(args):
|
||||
if args.report_to == "wandb" and args.hub_token is not None:
|
||||
raise ValueError(
|
||||
"You cannot use both --report_to=wandb and --hub_token due to a security risk of exposing your token."
|
||||
" Please use `hf auth login` to authenticate with the Hub."
|
||||
" Please use `huggingface-cli login` to authenticate with the Hub."
|
||||
)
|
||||
|
||||
logging_dir = Path(args.output_dir, args.logging_dir)
|
||||
|
||||
@@ -13,20 +13,6 @@
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
|
||||
# /// script
|
||||
# dependencies = [
|
||||
# "diffusers @ git+https://github.com/huggingface/diffusers.git",
|
||||
# "torch>=2.0.0",
|
||||
# "accelerate>=0.31.0",
|
||||
# "transformers>=4.41.2",
|
||||
# "ftfy",
|
||||
# "tensorboard",
|
||||
# "Jinja2",
|
||||
# "peft>=0.11.1",
|
||||
# "sentencepiece",
|
||||
# ]
|
||||
# ///
|
||||
|
||||
import argparse
|
||||
import gc
|
||||
import itertools
|
||||
@@ -1306,7 +1292,7 @@ def main(args):
|
||||
if args.report_to == "wandb" and args.hub_token is not None:
|
||||
raise ValueError(
|
||||
"You cannot use both --report_to=wandb and --hub_token due to a security risk of exposing your token."
|
||||
" Please use `hf auth login` to authenticate with the Hub."
|
||||
" Please use `huggingface-cli login` to authenticate with the Hub."
|
||||
)
|
||||
|
||||
if args.do_edm_style_training and args.snr_gamma is not None:
|
||||
|
||||
@@ -125,10 +125,10 @@ When running `accelerate config`, if we specify torch compile mode to True there
|
||||
If you would like to push your model to the HF Hub after training is completed with a neat model card, make sure you're logged in:
|
||||
|
||||
```
|
||||
hf auth login
|
||||
huggingface-cli login
|
||||
|
||||
# Alternatively, you could upload your model manually using:
|
||||
# hf upload my-cool-account-name/my-cool-lora-name /path/to/awesome/lora
|
||||
# huggingface-cli upload my-cool-account-name/my-cool-lora-name /path/to/awesome/lora
|
||||
```
|
||||
|
||||
Make sure your data is prepared as described in [Data Preparation](#data-preparation). When ready, you can begin training!
|
||||
|
||||
@@ -962,7 +962,7 @@ def main(args):
|
||||
if args.report_to == "wandb" and args.hub_token is not None:
|
||||
raise ValueError(
|
||||
"You cannot use both --report_to=wandb and --hub_token due to a security risk of exposing your token."
|
||||
" Please use `hf auth login` to authenticate with the Hub."
|
||||
" Please use `huggingface-cli login` to authenticate with the Hub."
|
||||
)
|
||||
|
||||
if torch.backends.mps.is_available() and args.mixed_precision == "bf16":
|
||||
|
||||
@@ -984,7 +984,7 @@ def main(args):
|
||||
if args.report_to == "wandb" and args.hub_token is not None:
|
||||
raise ValueError(
|
||||
"You cannot use both --report_to=wandb and --hub_token due to a security risk of exposing your token."
|
||||
" Please use `hf auth login` to authenticate with the Hub."
|
||||
" Please use `huggingface-cli login` to authenticate with the Hub."
|
||||
)
|
||||
|
||||
if torch.backends.mps.is_available() and args.mixed_precision == "bf16":
|
||||
|
||||
@@ -10,7 +10,7 @@ To incorporate additional condition latents, we expand the input features of Cog
|
||||
> As the model is gated, before using it with diffusers you first need to go to the [CogView4 Hugging Face page](https://huggingface.co/THUDM/CogView4-6B), fill in the form and accept the gate. Once you are in, you need to log in so that your system knows you’ve accepted the gate. Use the command below to log in:
|
||||
|
||||
```bash
|
||||
hf auth login
|
||||
huggingface-cli login
|
||||
```
|
||||
|
||||
The example command below shows how to launch fine-tuning for pose conditions. The dataset ([`raulc0399/open_pose_controlnet`](https://huggingface.co/datasets/raulc0399/open_pose_controlnet)) being used here already has the pose conditions of the original images, so we don't have to compute them.
|
||||
|
||||
@@ -705,7 +705,7 @@ def main(args):
|
||||
if args.report_to == "wandb" and args.hub_token is not None:
|
||||
raise ValueError(
|
||||
"You cannot use both --report_to=wandb and --hub_token due to a security risk of exposing your token."
|
||||
" Please use `hf auth login` to authenticate with the Hub."
|
||||
" Please use `huggingface-cli login` to authenticate with the Hub."
|
||||
)
|
||||
|
||||
logging_out_dir = Path(args.output_dir, args.logging_dir)
|
||||
|
||||
@@ -87,7 +87,6 @@ PIXART-α Controlnet pipeline | Implementation of the controlnet model for pixar
|
||||
| CogVideoX DDIM Inversion Pipeline | Implementation of DDIM inversion and guided attention-based editing denoising process on CogVideoX. | [CogVideoX DDIM Inversion Pipeline](#cogvideox-ddim-inversion-pipeline) | - | [LittleNyima](https://github.com/LittleNyima) |
|
||||
| FaithDiff Stable Diffusion XL Pipeline | Implementation of [(CVPR 2025) FaithDiff: Unleashing Diffusion Priors for Faithful Image Super-resolution](https://huggingface.co/papers/2411.18824) - FaithDiff is a faithful image super-resolution method that leverages latent diffusion models by actively adapting the diffusion prior and jointly fine-tuning its components (encoder and diffusion model) with an alignment module to ensure high fidelity and structural consistency. | [FaithDiff Stable Diffusion XL Pipeline](#faithdiff-stable-diffusion-xl-pipeline) | [](https://huggingface.co/jychen9811/FaithDiff) | [Junyang Chen, Jinshan Pan, Jiangxin Dong, IMAG Lab, (Adapted by Eliseu Silva)](https://github.com/JyChen9811/FaithDiff) |
|
||||
| Stable Diffusion 3 InstructPix2Pix Pipeline | Implementation of Stable Diffusion 3 InstructPix2Pix Pipeline | [Stable Diffusion 3 InstructPix2Pix Pipeline](#stable-diffusion-3-instructpix2pix-pipeline) | [](https://huggingface.co/BleachNick/SD3_UltraEdit_freeform) [](https://huggingface.co/CaptainZZZ/sd3-instructpix2pix) | [Jiayu Zhang](https://github.com/xduzhangjiayu) and [Haozhe Zhao](https://github.com/HaozheZhao)|
|
||||
| Flux Kontext multiple images | A modified version of the `FluxKontextPipeline` that supports calling Flux Kontext with multiple reference images.| [Flux Kontext multiple input Pipeline](#flux-kontext-multiple-images) | - | [Net-Mist](https://github.com/Net-Mist) |
|
||||
To load a custom pipeline, pass the `custom_pipeline` argument to `DiffusionPipeline`, set to the name of one of the files in `diffusers/examples/community`. Feel free to send a PR with your own pipelines; we will merge them quickly.
|
||||
|
||||
```py
|
||||
@@ -3129,7 +3128,7 @@ from io import BytesIO
|
||||
from diffusers import DiffusionPipeline
|
||||
|
||||
# load the pipeline
|
||||
# make sure you're logged in with `hf auth login`
|
||||
# make sure you're logged in with `huggingface-cli login`
|
||||
model_id_or_path = "stable-diffusion-v1-5/stable-diffusion-v1-5"
|
||||
# can also be used with dreamlike-art/dreamlike-photoreal-2.0
|
||||
pipe = DiffusionPipeline.from_pretrained(model_id_or_path, torch_dtype=torch.float16, custom_pipeline="pipeline_fabric").to("cuda")
|
||||
@@ -5480,48 +5479,4 @@ edited_image.save("edited_image.png")
|
||||
### Note
|
||||
This model was trained on 512x512 images, so 512x512 inputs work best.
|
||||
For better editing performance, please refer to this powerful model https://huggingface.co/BleachNick/SD3_UltraEdit_freeform and Paper "UltraEdit: Instruction-based Fine-Grained Image
|
||||
Editing at Scale", many thanks to their contribution!
|
||||
|
||||
# Flux Kontext multiple images
|
||||
|
||||
This implementation of Flux Kontext allows users to pass multiple reference images. Each image is encoded separately, and the resulting latent vectors are concatenated.
|
||||
|
||||
As explained in Section 3 of [the paper](https://arxiv.org/pdf/2506.15742), the model's sequence concatenation mechanism can extend its capabilities to handle multiple reference images. However, note that the current version of Flux Kontext was not trained for this use case. In practice, stacking along the first axis does not yield correct results, while stacking along the other two axes appears to work.
|
||||
|
||||
## Example Usage
|
||||
|
||||
This pipeline loads two reference images and generates a new image based on them.
|
||||
|
||||
```python
|
||||
import torch
|
||||
|
||||
from diffusers import FluxKontextPipeline
|
||||
from diffusers.utils import load_image
|
||||
|
||||
|
||||
pipe = FluxKontextPipeline.from_pretrained(
|
||||
"black-forest-labs/FLUX.1-Kontext-dev",
|
||||
torch_dtype=torch.bfloat16,
|
||||
custom_pipeline="pipeline_flux_kontext_multiple_images",
|
||||
)
|
||||
pipe.to("cuda")
|
||||
|
||||
pikachu_image = load_image(
|
||||
"https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/yarn-art-pikachu.png"
|
||||
).convert("RGB")
|
||||
cat_image = load_image(
|
||||
"https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png"
|
||||
).convert("RGB")
|
||||
|
||||
|
||||
prompts = [
|
||||
"Pikachu and the cat are sitting together at a pizzeria table, enjoying a delicious pizza.",
|
||||
]
|
||||
images = pipe(
|
||||
multiple_images=[(pikachu_image, cat_image)],
|
||||
prompt=prompts,
|
||||
guidance_scale=2.5,
|
||||
generator=torch.Generator().manual_seed(42),
|
||||
).images
|
||||
images[0].save("pizzeria.png")
|
||||
```
|
||||
Editing at Scale", many thanks to their contribution!
|
||||
File diff suppressed because it is too large
@@ -877,7 +877,7 @@ def main(args):
|
||||
if args.report_to == "wandb" and args.hub_token is not None:
|
||||
raise ValueError(
|
||||
"You cannot use both --report_to=wandb and --hub_token due to a security risk of exposing your token."
|
||||
" Please use `hf auth login` to authenticate with the Hub."
|
||||
" Please use `huggingface-cli login` to authenticate with the Hub."
|
||||
)
|
||||
|
||||
logging_dir = Path(args.output_dir, args.logging_dir)
|
||||
|
||||
@@ -709,7 +709,7 @@ def main(args):
|
||||
if args.report_to == "wandb" and args.hub_token is not None:
|
||||
raise ValueError(
|
||||
"You cannot use both --report_to=wandb and --hub_token due to a security risk of exposing your token."
|
||||
" Please use `hf auth login` to authenticate with the Hub."
|
||||
" Please use `huggingface-cli login` to authenticate with the Hub."
|
||||
)
|
||||
|
||||
logging_dir = Path(args.output_dir, args.logging_dir)
|
||||
|
||||
@@ -872,7 +872,7 @@ def main(args):
|
||||
if args.report_to == "wandb" and args.hub_token is not None:
|
||||
raise ValueError(
|
||||
"You cannot use both --report_to=wandb and --hub_token due to a security risk of exposing your token."
|
||||
" Please use `hf auth login` to authenticate with the Hub."
|
||||
" Please use `huggingface-cli login` to authenticate with the Hub."
|
||||
)
|
||||
|
||||
logging_dir = Path(args.output_dir, args.logging_dir)
|
||||
|
||||
@@ -842,7 +842,7 @@ def main(args):
|
||||
if args.report_to == "wandb" and args.hub_token is not None:
|
||||
raise ValueError(
|
||||
"You cannot use both --report_to=wandb and --hub_token due to a security risk of exposing your token."
|
||||
" Please use `hf auth login` to authenticate with the Hub."
|
||||
" Please use `huggingface-cli login` to authenticate with the Hub."
|
||||
)
|
||||
|
||||
logging_dir = Path(args.output_dir, args.logging_dir)
|
||||
|
||||
@@ -882,7 +882,7 @@ def main(args):
|
||||
if args.report_to == "wandb" and args.hub_token is not None:
|
||||
raise ValueError(
|
||||
"You cannot use both --report_to=wandb and --hub_token due to a security risk of exposing your token."
|
||||
" Please use `hf auth login` to authenticate with the Hub."
|
||||
" Please use `huggingface-cli login` to authenticate with the Hub."
|
||||
)
|
||||
|
||||
logging_dir = Path(args.output_dir, args.logging_dir)
|
||||
|
||||
@@ -359,7 +359,7 @@ wget https://huggingface.co/datasets/huggingface/documentation-images/resolve/ma
|
||||
We encourage you to store or share your model with the community. To use the Hugging Face Hub, log in to your Hugging Face account (or [create one](https://huggingface.co/docs/diffusers/main/en/training/hf.co/join) if you don’t have one already):
|
||||
|
||||
```sh
|
||||
hf auth login
|
||||
huggingface-cli login
|
||||
```
|
||||
|
||||
Make sure you have the `MODEL_DIR`, `OUTPUT_DIR`, and `HUB_MODEL_ID` environment variables set. The `OUTPUT_DIR` and `HUB_MODEL_ID` variables specify where to save the model on the Hub:
|
||||
|
||||
@@ -22,7 +22,7 @@ Here is a gpu memory consumption for reference, tested on a single A100 with 80G
|
||||
|
||||
> **Gated access**
|
||||
>
|
||||
> As the model is gated, before using it with diffusers you first need to go to the [FLUX.1 [dev] Hugging Face page](https://huggingface.co/black-forest-labs/FLUX.1-dev), fill in the form and accept the gate. Once you are in, you need to log in so that your system knows you’ve accepted the gate. Use the command below to log in: `hf auth login`
|
||||
> As the model is gated, before using it with diffusers you first need to go to the [FLUX.1 [dev] Hugging Face page](https://huggingface.co/black-forest-labs/FLUX.1-dev), fill in the form and accept the gate. Once you are in, you need to log in so that your system knows you’ve accepted the gate. Use the command below to log in: `huggingface-cli login`
|
||||
|
||||
|
||||
## Running locally with PyTorch
|
||||
@@ -88,7 +88,7 @@ wget https://huggingface.co/datasets/huggingface/documentation-images/resolve/ma
|
||||
wget https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/conditioning_image_2.png
|
||||
```
|
||||
|
||||
Then run `hf auth login` to log into your Hugging Face account. This is needed to be able to push the trained ControlNet parameters to Hugging Face Hub.
|
||||
Then run `huggingface-cli login` to log into your Hugging Face account. This is needed to be able to push the trained ControlNet parameters to Hugging Face Hub.
|
||||
|
||||
We can define `num_layers` and `num_single_layers`, which determine the size of the ControlNet (default values are `num_layers=4` and `num_single_layers=10`).
|
||||
|
||||
|
||||
@@ -56,7 +56,7 @@ First download the SD3 model from [Hugging Face Hub](https://huggingface.co/stab
|
||||
> As the model is gated, before using it with diffusers you first need to go to the [Stable Diffusion 3 Medium Hugging Face page](https://huggingface.co/stabilityai/stable-diffusion-3-medium-diffusers) or [Stable Diffusion 3.5 Large Hugging Face page](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium), fill in the form and accept the gate. Once you are in, you need to log in so that your system knows you’ve accepted the gate. Use the command below to log in:
|
||||
|
||||
```bash
|
||||
hf auth login
|
||||
huggingface-cli login
|
||||
```
|
||||
|
||||
This will also allow us to push the trained model parameters to the Hugging Face Hub platform.
|
||||
|
||||
@@ -58,7 +58,7 @@ wget https://huggingface.co/datasets/huggingface/documentation-images/resolve/ma
|
||||
wget https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/conditioning_image_2.png
|
||||
```
|
||||
|
||||
Then run `hf auth login` to log into your Hugging Face account. This is needed to be able to push the trained ControlNet parameters to Hugging Face Hub.
|
||||
Then run `huggingface-cli login` to log into your Hugging Face account. This is needed to be able to push the trained ControlNet parameters to Hugging Face Hub.
|
||||
|
||||
```bash
|
||||
export MODEL_DIR="stabilityai/stable-diffusion-xl-base-1.0"
|
||||
|
||||
@@ -734,7 +734,7 @@ def main(args):
|
||||
if args.report_to == "wandb" and args.hub_token is not None:
|
||||
raise ValueError(
|
||||
"You cannot use both --report_to=wandb and --hub_token due to a security risk of exposing your token."
|
||||
" Please use `hf auth login` to authenticate with the Hub."
|
||||
" Please use `huggingface-cli login` to authenticate with the Hub."
|
||||
)
|
||||
|
||||
logging_dir = Path(args.output_dir, args.logging_dir)
|
||||
|
||||
@@ -665,7 +665,7 @@ def main():
|
||||
if args.report_to == "wandb" and args.hub_token is not None:
|
||||
raise ValueError(
|
||||
"You cannot use both --report_to=wandb and --hub_token due to a security risk of exposing your token."
|
||||
" Please use `hf auth login` to authenticate with the Hub."
|
||||
" Please use `huggingface-cli login` to authenticate with the Hub."
|
||||
)
|
||||
|
||||
logging.basicConfig(
|
||||
|
||||
@@ -814,7 +814,7 @@ def main(args):
|
||||
if args.report_to == "wandb" and args.hub_token is not None:
|
||||
raise ValueError(
|
||||
"You cannot use both --report_to=wandb and --hub_token due to a security risk of exposing your token."
|
||||
" Please use `hf auth login` to authenticate with the Hub."
|
||||
" Please use `huggingface-cli login` to authenticate with the Hub."
|
||||
)
|
||||
|
||||
logging_out_dir = Path(args.output_dir, args.logging_dir)
|
||||
|
||||
@@ -928,7 +928,7 @@ def main(args):
|
||||
if args.report_to == "wandb" and args.hub_token is not None:
|
||||
raise ValueError(
|
||||
"You cannot use both --report_to=wandb and --hub_token due to a security risk of exposing your token."
|
||||
" Please use `hf auth login` to authenticate with the Hub."
|
||||
" Please use `huggingface-cli login` to authenticate with the Hub."
|
||||
)
|
||||
|
||||
if torch.backends.mps.is_available() and args.mixed_precision == "bf16":
|
||||
@@ -1330,7 +1330,7 @@ def main(args):
|
||||
# controlnet(s) inference
|
||||
controlnet_image = batch["conditioning_pixel_values"].to(dtype=weight_dtype)
|
||||
controlnet_image = vae.encode(controlnet_image).latent_dist.sample()
|
||||
controlnet_image = (controlnet_image - vae.config.shift_factor) * vae.config.scaling_factor
|
||||
controlnet_image = controlnet_image * vae.config.scaling_factor
|
||||
|
||||
control_block_res_samples = controlnet(
|
||||
hidden_states=noisy_model_input,
|
||||
|
||||
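The two `controlnet_image` lines in the hunk above differ only in whether the VAE `shift_factor` is subtracted before scaling. A small sketch of how the two variants relate, using plain Python lists and made-up illustrative values rather than any real VAE configuration; the helper names `shift_and_scale` and `scale_only` are hypothetical:

```python
def shift_and_scale(latents, shift_factor, scaling_factor):
    """Apply (x - shift_factor) * scaling_factor to every latent value."""
    return [(value - shift_factor) * scaling_factor for value in latents]


def scale_only(latents, scaling_factor):
    """Apply x * scaling_factor to every latent value."""
    return [value * scaling_factor for value in latents]


# Illustrative numbers only; real values come from vae.config.
latents = [1.0, 2.0, 3.0]
shift_factor = 0.5
scaling_factor = 2.0

shifted = shift_and_scale(latents, shift_factor, scaling_factor)
unshifted = scale_only(latents, scaling_factor)

# The two variants differ by the constant shift_factor * scaling_factor.
offsets = [u - s for s, u in zip(shifted, unshifted)]
print(shifted, unshifted, offsets)
```

Under these assumptions, the conditioning latents from the two variants differ by a constant offset, so they are not interchangeable for a VAE whose configuration defines a non-zero shift.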
@@ -829,7 +829,7 @@ def main(args):
|
||||
if args.report_to == "wandb" and args.hub_token is not None:
|
||||
raise ValueError(
|
||||
"You cannot use both --report_to=wandb and --hub_token due to a security risk of exposing your token."
|
||||
" Please use `hf auth login` to authenticate with the Hub."
|
||||
" Please use `huggingface-cli login` to authenticate with the Hub."
|
||||
)
|
||||
|
||||
logging_dir = Path(args.output_dir, args.logging_dir)
|
||||
|
||||
@@ -663,7 +663,7 @@ def main(args):
|
||||
if args.report_to == "wandb" and args.hub_token is not None:
|
||||
raise ValueError(
|
||||
"You cannot use both --report_to=wandb and --hub_token due to a security risk of exposing your token."
|
||||
" Please use `hf auth login` to authenticate with the Hub."
|
||||
" Please use `huggingface-cli login` to authenticate with the Hub."
|
||||
)
|
||||
|
||||
logging_dir = Path(args.output_dir, args.logging_dir)
|
||||
|
||||
@@ -330,7 +330,7 @@ For this example we want to directly store the trained LoRA embeddings on the Hu
|
||||
we need to be logged in and add the `--push_to_hub` flag.
|
||||
|
||||
```bash
|
||||
hf auth login
|
||||
huggingface-cli login
|
||||
```
|
||||
|
||||
Now we can start training!
|
||||
|
||||
@@ -19,7 +19,7 @@ The `train_dreambooth_flux.py` script shows how to implement the training proced
|
||||
> As the model is gated, before using it with diffusers you first need to go to the [FLUX.1 [dev] Hugging Face page](https://huggingface.co/black-forest-labs/FLUX.1-dev), fill in the form and accept the gate. Once you are in, you need to log in so that your system knows you’ve accepted the gate. Use the command below to log in:
|
||||
|
||||
```bash
|
||||
hf auth login
|
||||
huggingface-cli login
|
||||
```
|
||||
|
||||
This will also allow us to push the trained model parameters to the Hugging Face Hub platform.
|
||||
|
||||
@@ -95,7 +95,7 @@ accelerate launch train_dreambooth_lora_hidream.py \
|
||||
For using `push_to_hub`, make sure you're logged into your Hugging Face account:
|
||||
|
||||
```bash
|
||||
hf auth login
|
||||
huggingface-cli login
|
||||
```
|
||||
|
||||
To better track our training experiments, we're using the following flags in the command above:
|
||||
|
||||
@@ -101,7 +101,7 @@ accelerate launch train_dreambooth_lora_lumina2.py \
|
||||
For using `push_to_hub`, make sure you're logged into your Hugging Face account:
|
||||
|
||||
```bash
|
||||
hf auth login
|
||||
huggingface-cli login
|
||||
```
|
||||
|
||||
To better track our training experiments, we're using the following flags in the command above:
|
||||
|
||||
@@ -1,136 +0,0 @@
|
||||
# DreamBooth training example for Qwen Image
|
||||
|
||||
[DreamBooth](https://huggingface.co/papers/2208.12242) is a method to personalize text2image models like stable diffusion given just a few (3~5) images of a subject.
|
||||
|
||||
The `train_dreambooth_lora_qwen_image.py` script shows how to implement the training procedure with [LoRA](https://huggingface.co/docs/peft/conceptual_guides/adapter#low-rank-adaptation-lora) and adapt it for [Qwen Image](https://huggingface.co/Qwen/Qwen-Image).
|
||||
|
||||
|
||||
This will also allow us to push the trained model parameters to the Hugging Face Hub platform.
|
||||
|
||||
## Running locally with PyTorch
|
||||
|
||||
### Installing the dependencies
|
||||
|
||||
Before running the scripts, make sure to install the library's training dependencies:
|
||||
|
||||
**Important**
|
||||
|
||||
To make sure you can successfully run the latest versions of the example scripts, we highly recommend **installing from source** and keeping the install up to date as we update the example scripts frequently and install some example-specific requirements. To do this, execute the following steps in a new virtual environment:
|
||||
|
||||
```bash
|
||||
git clone https://github.com/huggingface/diffusers
|
||||
cd diffusers
|
||||
pip install -e .
|
||||
```
|
||||
|
||||
Then cd into the `examples/dreambooth` folder and run:
|
||||
```bash
|
||||
pip install -r requirements_sana.txt
|
||||
```
|
||||
|
||||
And initialize an [🤗Accelerate](https://github.com/huggingface/accelerate/) environment with:
|
||||
|
||||
```bash
|
||||
accelerate config
|
||||
```
|
||||
|
||||
Or for a default accelerate configuration without answering questions about your environment
|
||||
|
||||
```bash
|
||||
accelerate config default
|
||||
```
|
||||
|
||||
Or if your environment doesn't support an interactive shell (e.g., a notebook)
|
||||
|
||||
```python
|
||||
from accelerate.utils import write_basic_config
|
||||
write_basic_config()
|
||||
```
|
||||
|
||||
When running `accelerate config`, setting torch compile mode to True can yield dramatic speedups.
|
||||
Note also that we use the PEFT library as the backend for LoRA training, so make sure to have `peft>=0.14.0` installed in your environment.
|
||||
|
||||
|
||||
### Dog toy example
|
||||
|
||||
Now let's get our dataset. For this example we will use some dog images: https://huggingface.co/datasets/diffusers/dog-example.
|
||||
|
||||
Let's first download it locally:
|
||||
|
||||
```python
|
||||
from huggingface_hub import snapshot_download
|
||||
|
||||
local_dir = "./dog"
|
||||
snapshot_download(
|
||||
"diffusers/dog-example",
|
||||
local_dir=local_dir, repo_type="dataset",
|
||||
ignore_patterns=".gitattributes",
|
||||
)
|
||||
```
|
||||
|
||||
This will also allow us to push the trained LoRA parameters to the Hugging Face Hub platform.
|
||||
|
||||
Now, we can launch training using:
|
||||
|
||||
```bash
|
||||
export MODEL_NAME="Qwen/Qwen-Image"
|
||||
export INSTANCE_DIR="dog"
|
||||
export OUTPUT_DIR="trained-sana-lora"
|
||||
|
||||
accelerate launch train_dreambooth_lora_qwen_image.py \
|
||||
--pretrained_model_name_or_path=$MODEL_NAME \
|
||||
--instance_data_dir=$INSTANCE_DIR \
|
||||
--output_dir=$OUTPUT_DIR \
|
||||
--mixed_precision="bf16" \
|
||||
--instance_prompt="a photo of sks dog" \
|
||||
--resolution=1024 \
|
||||
--train_batch_size=1 \
|
||||
--gradient_accumulation_steps=4 \
|
||||
--use_8bit_adam \
|
||||
--learning_rate=2e-4 \
|
||||
--report_to="wandb" \
|
||||
--lr_scheduler="constant" \
|
||||
--lr_warmup_steps=0 \
|
||||
--max_train_steps=500 \
|
||||
--validation_prompt="A photo of sks dog in a bucket" \
|
||||
--validation_epochs=25 \
|
||||
--seed="0" \
|
||||
--push_to_hub
|
||||
```
|
||||
|
||||
For using `push_to_hub`, make sure you're logged into your Hugging Face account:
|
||||
|
||||
```bash
|
||||
hf auth login
|
||||
```
|
||||
|
||||
To better track our training experiments, we're using the following flags in the command above:
|
||||
|
||||
* `report_to="wandb"` ensures the training runs are tracked on [Weights and Biases](https://wandb.ai/site). To use it, be sure to install `wandb` with `pip install wandb`. Don't forget to call `wandb login <your_api_key>` before training if you haven't done it before.
|
||||
* `validation_prompt` and `validation_epochs` allow the script to do a few validation inference runs, so we can qualitatively check whether training is progressing as expected.
|
||||
|
||||
## Notes
|
||||
|
||||
Additionally, we welcome you to explore the following CLI arguments:
|
||||
|
||||
* `--lora_layers`: The transformer modules to apply LoRA training on. Please specify the layers as a comma-separated list. E.g., "to_k,to_q,to_v" will result in LoRA training of attention layers only.
|
||||
* `--max_sequence_length`: Maximum sequence length to use for text embeddings.
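As a minimal sketch of how a comma-separated `--lora_layers` value could be turned into a list of module names: the helper name `parse_lora_layers` and its fallback list are hypothetical and are not taken from the training script.

```python
def parse_lora_layers(value, default=("to_k", "to_q", "to_v")):
    """Split a comma-separated string into a list of module names.

    `default` is an illustrative fallback, not the script's actual default.
    """
    if value is None:
        return list(default)
    # Strip whitespace around each entry and drop empty entries.
    return [layer.strip() for layer in value.split(",") if layer.strip()]


print(parse_lora_layers("to_k, to_q,to_v"))
```

A fully qualified name such as `transformer_blocks.0.attn.to_k` passes through unchanged, since only commas act as separators.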
|
||||
|
||||
We provide several options for reducing memory usage:
|
||||
|
||||
* `--offload`: When enabled, we will offload the text encoder and VAE to CPU when they are not in use.
|
||||
* `--cache_latents`: When enabled, we will pre-compute the latents from the input images with the VAE and remove the VAE from memory once done.
|
||||
* `--use_8bit_adam`: When enabled, we will use the 8bit version of AdamW provided by the `bitsandbytes` library.
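To illustrate the idea behind `--offload`, here is a minimal sketch of a context manager that moves modules to an accelerator only while they are needed and returns them to CPU afterwards. The names `temporarily_on_device` and `DummyModule` are hypothetical. This is not the library's own helper, only an approximation of the pattern that works with any object exposing a `.to(device)` method.

```python
from contextlib import contextmanager


@contextmanager
def temporarily_on_device(*modules, device, offload=True):
    """Move modules to `device` for the duration of the block, then back to CPU.

    Hypothetical helper: when `offload` is False the modules are left untouched.
    """
    if offload:
        for module in modules:
            module.to(device)
    try:
        yield
    finally:
        if offload:
            for module in modules:
                module.to("cpu")


class DummyModule:
    """Stand-in for a real model so the sketch runs without PyTorch."""

    def __init__(self):
        self.device = "cpu"

    def to(self, device):
        self.device = device
        return self


vae = DummyModule()
with temporarily_on_device(vae, device="cuda"):
    device_inside = vae.device  # the module sits on the accelerator here
device_after = vae.device  # and is back on CPU once the block exits
```

Using `try`/`finally` means the module is moved back to CPU even if the encoding step raises.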
|
||||
|
||||
Refer to the [official documentation](https://huggingface.co/docs/diffusers/main/en/api/pipelines/qwenimage) of the `QwenImagePipeline` to learn more about the available models and their preferred dtypes during inference.
|
||||
|
||||
## Using quantization
|
||||
|
||||
You can quantize the base model with [`bitsandbytes`](https://huggingface.co/docs/bitsandbytes/index) to reduce memory usage. To do so, pass a JSON file path to `--bnb_quantization_config_path`. This file should hold the configuration to initialize `BitsAndBytesConfig`. Below is an example JSON file:
|
||||
|
||||
```json
|
||||
{
|
||||
"load_in_4bit": true,
|
||||
"bnb_4bit_quant_type": "nf4"
|
||||
}
|
||||
```
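As a rough sketch of how such a file might be consumed, the snippet below writes the example configuration to disk and reads it back into keyword arguments. The file name `bnb_config.json` is hypothetical, and how the training script actually loads the file is an assumption here.

```python
import json
import os
import tempfile

# Keys taken from the example JSON above.
config = {"load_in_4bit": True, "bnb_4bit_quant_type": "nf4"}

with tempfile.TemporaryDirectory() as tmpdir:
    config_path = os.path.join(tmpdir, "bnb_config.json")  # hypothetical file name
    with open(config_path, "w") as f:
        json.dump(config, f)

    # Read the file back, as a script might do before building the config object.
    with open(config_path) as f:
        bnb_kwargs = json.load(f)

# With `diffusers` and `bitsandbytes` installed, the kwargs could then be passed on:
#   from diffusers import BitsAndBytesConfig
#   quant_config = BitsAndBytesConfig(**bnb_kwargs)
print(bnb_kwargs)
```

The path to the JSON file would then be passed to the script via `--bnb_quantization_config_path`.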
|
||||
@@ -101,7 +101,7 @@ accelerate launch train_dreambooth_lora_sana.py \
|
||||
For using `push_to_hub`, make sure you're logged into your Hugging Face account:
|
||||
|
||||
```bash
|
||||
hf auth login
|
||||
huggingface-cli login
|
||||
```
|
||||
|
||||
To better track our training experiments, we're using the following flags in the command above:
|
||||
|
||||
@@ -8,7 +8,7 @@ The `train_dreambooth_sd3.py` script shows how to implement the training procedu
|
||||
> As the model is gated, before using it with diffusers you first need to go to the [Stable Diffusion 3 Medium Hugging Face page](https://huggingface.co/stabilityai/stable-diffusion-3-medium-diffusers), fill in the form and accept the gate. Once you are in, you need to log in so that your system knows you’ve accepted the gate. Use the command below to log in:
|
||||
|
||||
```bash
|
||||
hf auth login
|
||||
huggingface-cli login
|
||||
```
|
||||
|
||||
This will also allow us to push the trained model parameters to the Hugging Face Hub platform.
|
||||
|
||||
@@ -1,248 +0,0 @@
|
||||
# coding=utf-8
|
||||
# Copyright 2025 HuggingFace Inc.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
import sys
|
||||
import tempfile
|
||||
|
||||
import safetensors
|
||||
|
||||
from diffusers.loaders.lora_base import LORA_ADAPTER_METADATA_KEY
|
||||
|
||||
|
||||
sys.path.append("..")
|
||||
from test_examples_utils import ExamplesTestsAccelerate, run_command # noqa: E402
|
||||
|
||||
|
||||
logging.basicConfig(level=logging.DEBUG)
|
||||
|
||||
logger = logging.getLogger()
|
||||
stream_handler = logging.StreamHandler(sys.stdout)
|
||||
logger.addHandler(stream_handler)
|
||||
|
||||
|
||||
class DreamBoothLoRAQwenImage(ExamplesTestsAccelerate):
|
||||
instance_data_dir = "docs/source/en/imgs"
|
||||
instance_prompt = "photo"
|
||||
pretrained_model_name_or_path = "hf-internal-testing/tiny-qwenimage-pipe"
|
||||
script_path = "examples/dreambooth/train_dreambooth_lora_qwen_image.py"
|
||||
transformer_layer_type = "transformer_blocks.0.attn.to_k"
|
||||
|
||||
def test_dreambooth_lora_qwen(self):
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
test_args = f"""
|
||||
{self.script_path}
|
||||
--pretrained_model_name_or_path {self.pretrained_model_name_or_path}
|
||||
--instance_data_dir {self.instance_data_dir}
|
||||
--instance_prompt {self.instance_prompt}
|
||||
--resolution 64
|
||||
--train_batch_size 1
|
||||
--gradient_accumulation_steps 1
|
||||
--max_train_steps 2
|
||||
--learning_rate 5.0e-04
|
||||
--scale_lr
|
||||
--lr_scheduler constant
|
||||
--lr_warmup_steps 0
|
||||
--output_dir {tmpdir}
|
||||
""".split()
|
||||
|
||||
run_command(self._launch_args + test_args)
|
||||
# save_pretrained smoke test
|
||||
self.assertTrue(os.path.isfile(os.path.join(tmpdir, "pytorch_lora_weights.safetensors")))
|
||||
|
||||
# make sure the state_dict has the correct naming in the parameters.
|
||||
lora_state_dict = safetensors.torch.load_file(os.path.join(tmpdir, "pytorch_lora_weights.safetensors"))
|
||||
is_lora = all("lora" in k for k in lora_state_dict.keys())
|
||||
self.assertTrue(is_lora)
|
||||
|
||||
# when not training the text encoder, all the parameters in the state dict should start
|
||||
# with `"transformer"` in their names.
|
||||
starts_with_transformer = all(key.startswith("transformer") for key in lora_state_dict.keys())
|
||||
self.assertTrue(starts_with_transformer)
|
||||
|
||||
def test_dreambooth_lora_latent_caching(self):
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
test_args = f"""
|
||||
{self.script_path}
|
||||
--pretrained_model_name_or_path {self.pretrained_model_name_or_path}
|
||||
--instance_data_dir {self.instance_data_dir}
|
||||
--instance_prompt {self.instance_prompt}
|
||||
--resolution 64
|
||||
--train_batch_size 1
|
||||
--gradient_accumulation_steps 1
|
||||
--max_train_steps 2
|
||||
--cache_latents
|
||||
--learning_rate 5.0e-04
|
||||
--scale_lr
|
||||
--lr_scheduler constant
|
||||
--lr_warmup_steps 0
|
||||
--output_dir {tmpdir}
|
||||
""".split()
|
||||
|
||||
run_command(self._launch_args + test_args)
|
||||
# save_pretrained smoke test
|
||||
self.assertTrue(os.path.isfile(os.path.join(tmpdir, "pytorch_lora_weights.safetensors")))
|
||||
|
||||
# make sure the state_dict has the correct naming in the parameters.
|
||||
lora_state_dict = safetensors.torch.load_file(os.path.join(tmpdir, "pytorch_lora_weights.safetensors"))
|
||||
is_lora = all("lora" in k for k in lora_state_dict.keys())
|
||||
self.assertTrue(is_lora)
|
||||
|
||||
# when not training the text encoder, all the parameters in the state dict should start
|
||||
# with `"transformer"` in their names.
|
||||
starts_with_transformer = all(key.startswith("transformer") for key in lora_state_dict.keys())
|
||||
self.assertTrue(starts_with_transformer)
|
||||
|
||||
def test_dreambooth_lora_layers(self):
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
test_args = f"""
|
||||
{self.script_path}
|
||||
--pretrained_model_name_or_path {self.pretrained_model_name_or_path}
|
||||
--instance_data_dir {self.instance_data_dir}
|
||||
--instance_prompt {self.instance_prompt}
|
||||
--resolution 64
|
||||
--train_batch_size 1
|
||||
--gradient_accumulation_steps 1
|
||||
--max_train_steps 2
|
||||
--cache_latents
|
||||
--learning_rate 5.0e-04
|
||||
--scale_lr
|
||||
--lora_layers {self.transformer_layer_type}
|
||||
--lr_scheduler constant
|
||||
--lr_warmup_steps 0
|
||||
--output_dir {tmpdir}
|
||||
""".split()
|
||||
|
||||
run_command(self._launch_args + test_args)
|
||||
# save_pretrained smoke test
|
||||
self.assertTrue(os.path.isfile(os.path.join(tmpdir, "pytorch_lora_weights.safetensors")))
|
||||
|
||||
# make sure the state_dict has the correct naming in the parameters.
|
||||
lora_state_dict = safetensors.torch.load_file(os.path.join(tmpdir, "pytorch_lora_weights.safetensors"))
|
||||
is_lora = all("lora" in k for k in lora_state_dict.keys())
|
||||
self.assertTrue(is_lora)
|
||||
|
||||
# when not training the text encoder, all the parameters in the state dict should start
|
||||
# with `"transformer"` in their names. In this test, only params of
|
||||
# transformer.transformer_blocks.0.attn.to_k should be in the state dict
|
||||
starts_with_transformer = all(
|
||||
key.startswith(f"transformer.{self.transformer_layer_type}") for key in lora_state_dict.keys()
|
||||
)
|
||||
self.assertTrue(starts_with_transformer)
|
||||
|
||||
def test_dreambooth_lora_qwen_checkpointing_checkpoints_total_limit(self):
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
test_args = f"""
|
||||
{self.script_path}
|
||||
--pretrained_model_name_or_path={self.pretrained_model_name_or_path}
|
||||
--instance_data_dir={self.instance_data_dir}
|
||||
--output_dir={tmpdir}
|
||||
--instance_prompt={self.instance_prompt}
|
||||
--resolution=64
|
||||
--train_batch_size=1
|
||||
--gradient_accumulation_steps=1
|
||||
--max_train_steps=6
|
||||
--checkpoints_total_limit=2
|
||||
--checkpointing_steps=2
|
||||
""".split()
|
||||
|
||||
run_command(self._launch_args + test_args)
|
||||
|
||||
self.assertEqual(
|
||||
{x for x in os.listdir(tmpdir) if "checkpoint" in x},
|
||||
{"checkpoint-4", "checkpoint-6"},
|
||||
)
|
||||
|
||||
def test_dreambooth_lora_qwen_checkpointing_checkpoints_total_limit_removes_multiple_checkpoints(self):
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
test_args = f"""
|
||||
{self.script_path}
|
||||
--pretrained_model_name_or_path={self.pretrained_model_name_or_path}
|
||||
--instance_data_dir={self.instance_data_dir}
|
||||
--output_dir={tmpdir}
|
||||
--instance_prompt={self.instance_prompt}
|
||||
--resolution=64
|
||||
--train_batch_size=1
|
||||
--gradient_accumulation_steps=1
|
||||
--max_train_steps=4
|
||||
--checkpointing_steps=2
|
||||
""".split()
|
||||
|
||||
run_command(self._launch_args + test_args)
|
||||
|
||||
self.assertEqual({x for x in os.listdir(tmpdir) if "checkpoint" in x}, {"checkpoint-2", "checkpoint-4"})
|
||||
|
||||
resume_run_args = f"""
|
||||
{self.script_path}
|
||||
--pretrained_model_name_or_path={self.pretrained_model_name_or_path}
|
||||
--instance_data_dir={self.instance_data_dir}
|
||||
--output_dir={tmpdir}
|
||||
--instance_prompt={self.instance_prompt}
|
||||
--resolution=64
|
||||
--train_batch_size=1
|
||||
--gradient_accumulation_steps=1
|
||||
--max_train_steps=8
|
||||
--checkpointing_steps=2
|
||||
--resume_from_checkpoint=checkpoint-4
|
||||
--checkpoints_total_limit=2
|
||||
""".split()
|
||||
|
||||
run_command(self._launch_args + resume_run_args)
|
||||
|
||||
self.assertEqual({x for x in os.listdir(tmpdir) if "checkpoint" in x}, {"checkpoint-6", "checkpoint-8"})
|
||||
|
||||
def test_dreambooth_lora_with_metadata(self):
|
||||
# Use a `lora_alpha` that is different from `rank`.
|
||||
lora_alpha = 8
|
||||
rank = 4
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
test_args = f"""
|
||||
{self.script_path}
|
||||
--pretrained_model_name_or_path {self.pretrained_model_name_or_path}
|
||||
--instance_data_dir {self.instance_data_dir}
|
||||
--instance_prompt {self.instance_prompt}
|
||||
--resolution 64
|
||||
--train_batch_size 1
|
||||
--gradient_accumulation_steps 1
|
||||
--max_train_steps 2
|
||||
--lora_alpha={lora_alpha}
|
||||
--rank={rank}
|
||||
--learning_rate 5.0e-04
|
||||
--scale_lr
|
||||
--lr_scheduler constant
|
||||
--lr_warmup_steps 0
|
||||
--output_dir {tmpdir}
|
||||
""".split()
|
||||
|
||||
run_command(self._launch_args + test_args)
|
||||
# save_pretrained smoke test
|
||||
state_dict_file = os.path.join(tmpdir, "pytorch_lora_weights.safetensors")
|
||||
self.assertTrue(os.path.isfile(state_dict_file))
|
||||
|
||||
# Check if the metadata was properly serialized.
|
||||
with safetensors.torch.safe_open(state_dict_file, framework="pt", device="cpu") as f:
|
||||
metadata = f.metadata() or {}
|
||||
|
||||
metadata.pop("format", None)
|
||||
raw = metadata.get(LORA_ADAPTER_METADATA_KEY)
|
||||
if raw:
|
||||
raw = json.loads(raw)
|
||||
|
||||
loaded_lora_alpha = raw["transformer.lora_alpha"]
|
||||
self.assertTrue(loaded_lora_alpha == lora_alpha)
|
||||
loaded_lora_rank = raw["transformer.r"]
|
||||
self.assertTrue(loaded_lora_rank == rank)
|
||||
@@ -807,7 +807,7 @@ def main(args):
|
||||
if args.report_to == "wandb" and args.hub_token is not None:
|
||||
raise ValueError(
|
||||
"You cannot use both --report_to=wandb and --hub_token due to a security risk of exposing your token."
|
||||
" Please use `hf auth login` to authenticate with the Hub."
|
||||
" Please use `huggingface-cli login` to authenticate with the Hub."
|
||||
)
|
||||
|
||||
logging_dir = Path(args.output_dir, args.logging_dir)
|
||||
|
||||
@@ -13,20 +13,6 @@
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
|
||||
# /// script
|
||||
# dependencies = [
|
||||
# "diffusers @ git+https://github.com/huggingface/diffusers.git",
|
||||
# "torch>=2.0.0",
|
||||
# "accelerate>=0.31.0",
|
||||
# "transformers>=4.41.2",
|
||||
# "ftfy",
|
||||
# "tensorboard",
|
||||
# "Jinja2",
|
||||
# "peft>=0.11.1",
|
||||
# "sentencepiece",
|
||||
# ]
|
||||
# ///
|
||||
|
||||
import argparse
|
||||
import copy
|
||||
import gc
|
||||
@@ -1027,7 +1013,7 @@ def main(args):
|
||||
if args.report_to == "wandb" and args.hub_token is not None:
|
||||
raise ValueError(
|
||||
"You cannot use both --report_to=wandb and --hub_token due to a security risk of exposing your token."
|
||||
" Please use `hf auth login` to authenticate with the Hub."
|
||||
" Please use `huggingface-cli login` to authenticate with the Hub."
|
||||
)
|
||||
|
||||
if torch.backends.mps.is_available() and args.mixed_precision == "bf16":
|
||||
|
||||
@@ -756,7 +756,7 @@ def main(args):
|
||||
if args.report_to == "wandb" and args.hub_token is not None:
|
||||
raise ValueError(
|
||||
"You cannot use both --report_to=wandb and --hub_token due to a security risk of exposing your token."
|
||||
" Please use `hf auth login` to authenticate with the Hub."
|
||||
" Please use `huggingface-cli login` to authenticate with the Hub."
|
||||
)
|
||||
|
||||
logging_dir = Path(args.output_dir, args.logging_dir)
|
||||
|
||||
@@ -13,20 +13,6 @@
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
|
||||
# /// script
|
||||
# dependencies = [
|
||||
# "diffusers @ git+https://github.com/huggingface/diffusers.git",
|
||||
# "torch>=2.0.0",
|
||||
# "accelerate>=0.31.0",
|
||||
# "transformers>=4.41.2",
|
||||
# "ftfy",
|
||||
# "tensorboard",
|
||||
# "Jinja2",
|
||||
# "peft>=0.11.1",
|
||||
# "sentencepiece",
|
||||
# ]
|
||||
# ///
|
||||
|
||||
import argparse
|
||||
import copy
|
||||
import itertools
|
||||
@@ -1065,7 +1051,7 @@ def main(args):
|
||||
if args.report_to == "wandb" and args.hub_token is not None:
|
||||
raise ValueError(
|
||||
"You cannot use both --report_to=wandb and --hub_token due to a security risk of exposing your token."
|
||||
" Please use `hf auth login` to authenticate with the Hub."
|
||||
" Please use `huggingface-cli login` to authenticate with the Hub."
|
||||
)
|
||||
|
||||
if torch.backends.mps.is_available() and args.mixed_precision == "bf16":
|
||||
|
||||
@@ -1199,7 +1199,7 @@ def main(args):
|
||||
if args.report_to == "wandb" and args.hub_token is not None:
|
||||
raise ValueError(
|
||||
"You cannot use both --report_to=wandb and --hub_token due to a security risk of exposing your token."
|
||||
" Please use `hf auth login` to authenticate with the Hub."
|
||||
" Please use `huggingface-cli login` to authenticate with the Hub."
|
||||
)
|
||||
|
||||
if torch.backends.mps.is_available() and args.mixed_precision == "bf16":
|
||||
@@ -1614,7 +1614,7 @@ def main(args):
|
||||
)
|
||||
if args.cond_image_column is not None:
|
||||
logger.info("I2I fine-tuning enabled.")
|
||||
batch_sampler = BucketBatchSampler(train_dataset, batch_size=args.train_batch_size, drop_last=True)
|
||||
batch_sampler = BucketBatchSampler(train_dataset, batch_size=args.train_batch_size, drop_last=False)
|
||||
train_dataloader = torch.utils.data.DataLoader(
|
||||
train_dataset,
|
||||
batch_sampler=batch_sampler,
|
||||
|
||||
@@ -58,7 +58,6 @@ from diffusers.training_utils import (
|
||||
compute_density_for_timestep_sampling,
|
||||
compute_loss_weighting_for_sd3,
|
||||
free_memory,
|
||||
offload_models,
|
||||
)
|
||||
from diffusers.utils import (
|
||||
check_min_version,
|
||||
@@ -936,7 +935,7 @@ def main(args):
|
||||
if args.report_to == "wandb" and args.hub_token is not None:
|
||||
raise ValueError(
|
||||
"You cannot use both --report_to=wandb and --hub_token due to a security risk of exposing your token."
|
||||
" Please use `hf auth login` to authenticate with the Hub."
|
||||
" Please use `huggingface-cli login` to authenticate with the Hub."
|
||||
)
|
||||
|
||||
if torch.backends.mps.is_available() and args.mixed_precision == "bf16":
|
||||
@@ -1365,34 +1364,43 @@ def main(args):
|
||||
# provided (i.e. the --instance_prompt is used for all images), we encode the instance prompt once to avoid
|
||||
# the redundant encoding.
|
||||
if not train_dataset.custom_instance_prompts:
|
||||
with offload_models(text_encoding_pipeline, device=accelerator.device, offload=args.offload):
|
||||
(
|
||||
instance_prompt_hidden_states_t5,
|
||||
instance_prompt_hidden_states_llama3,
|
||||
instance_pooled_prompt_embeds,
|
||||
_,
|
||||
_,
|
||||
_,
|
||||
) = compute_text_embeddings(args.instance_prompt, text_encoding_pipeline)
|
||||
if args.offload:
|
||||
text_encoding_pipeline = text_encoding_pipeline.to(accelerator.device)
|
||||
(
|
||||
instance_prompt_hidden_states_t5,
|
||||
instance_prompt_hidden_states_llama3,
|
||||
instance_pooled_prompt_embeds,
|
||||
_,
|
||||
_,
|
||||
_,
|
||||
) = compute_text_embeddings(args.instance_prompt, text_encoding_pipeline)
|
||||
if args.offload:
|
||||
text_encoding_pipeline = text_encoding_pipeline.to("cpu")
|
||||
|
||||
# Handle class prompt for prior-preservation.
|
||||
if args.with_prior_preservation:
|
||||
with offload_models(text_encoding_pipeline, device=accelerator.device, offload=args.offload):
|
||||
(class_prompt_hidden_states_t5, class_prompt_hidden_states_llama3, class_pooled_prompt_embeds, _, _, _) = (
|
||||
compute_text_embeddings(args.class_prompt, text_encoding_pipeline)
|
||||
)
|
||||
if args.offload:
|
||||
text_encoding_pipeline = text_encoding_pipeline.to(accelerator.device)
|
||||
(class_prompt_hidden_states_t5, class_prompt_hidden_states_llama3, class_pooled_prompt_embeds, _, _, _) = (
|
||||
compute_text_embeddings(args.class_prompt, text_encoding_pipeline)
|
||||
)
|
||||
if args.offload:
|
||||
text_encoding_pipeline = text_encoding_pipeline.to("cpu")
|
||||
|
||||
validation_embeddings = {}
|
||||
if args.validation_prompt is not None:
|
||||
with offload_models(text_encoding_pipeline, device=accelerator.device, offload=args.offload):
|
||||
(
|
||||
validation_embeddings["prompt_embeds_t5"],
|
||||
validation_embeddings["prompt_embeds_llama3"],
|
||||
validation_embeddings["pooled_prompt_embeds"],
|
||||
validation_embeddings["negative_prompt_embeds_t5"],
|
||||
validation_embeddings["negative_prompt_embeds_llama3"],
|
||||
validation_embeddings["negative_pooled_prompt_embeds"],
|
||||
) = compute_text_embeddings(args.validation_prompt, text_encoding_pipeline)
|
||||
if args.offload:
|
||||
text_encoding_pipeline = text_encoding_pipeline.to(accelerator.device)
|
||||
(
|
||||
validation_embeddings["prompt_embeds_t5"],
|
||||
validation_embeddings["prompt_embeds_llama3"],
|
||||
validation_embeddings["pooled_prompt_embeds"],
|
||||
validation_embeddings["negative_prompt_embeds_t5"],
|
||||
validation_embeddings["negative_prompt_embeds_llama3"],
|
||||
validation_embeddings["negative_pooled_prompt_embeds"],
|
||||
) = compute_text_embeddings(args.validation_prompt, text_encoding_pipeline)
|
||||
if args.offload:
|
||||
text_encoding_pipeline = text_encoding_pipeline.to("cpu")
|
||||
|
||||
# If custom instance prompts are NOT provided (i.e. the instance prompt is used for all images),
|
||||
# pack the statically computed variables appropriately here. This is so that we don't
|
||||
@@ -1573,10 +1581,12 @@ def main(args):
|
||||
if args.cache_latents:
|
||||
model_input = latents_cache[step].sample()
|
||||
else:
|
||||
with offload_models(vae, device=accelerator.device, offload=args.offload):
|
||||
pixel_values = batch["pixel_values"].to(dtype=vae.dtype)
|
||||
if args.offload:
|
||||
vae = vae.to(accelerator.device)
|
||||
pixel_values = batch["pixel_values"].to(dtype=vae.dtype)
|
||||
model_input = vae.encode(pixel_values).latent_dist.sample()
|
||||
|
||||
if args.offload:
|
||||
vae = vae.to("cpu")
|
||||
model_input = (model_input - vae_config_shift_factor) * vae_config_scaling_factor
|
||||
model_input = model_input.to(dtype=weight_dtype)
|
||||
|
||||
|
||||
@@ -859,7 +859,7 @@ def main(args):
|
||||
if args.report_to == "wandb" and args.hub_token is not None:
|
||||
raise ValueError(
|
||||
"You cannot use both --report_to=wandb and --hub_token due to a security risk of exposing your token."
|
||||
" Please use `hf auth login` to authenticate with the Hub."
|
||||
" Please use `huggingface-cli login` to authenticate with the Hub."
|
||||
)
|
||||
|
||||
if torch.backends.mps.is_available() and args.mixed_precision == "bf16":
|
||||
|
||||
File diff suppressed because it is too large
@@ -13,20 +13,6 @@
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
|
||||
# /// script
|
||||
# dependencies = [
|
||||
# "diffusers @ git+https://github.com/huggingface/diffusers.git",
|
||||
# "torch>=2.0.0",
|
||||
# "accelerate>=1.0.0",
|
||||
# "transformers>=4.47.0",
|
||||
# "ftfy",
|
||||
# "tensorboard",
|
||||
# "Jinja2",
|
||||
# "peft>=0.14.0",
|
||||
# "sentencepiece",
|
||||
# ]
|
||||
# ///
|
||||
|
||||
import argparse
|
||||
import copy
|
||||
import itertools
|
||||
@@ -866,7 +852,7 @@ def main(args):
|
||||
if args.report_to == "wandb" and args.hub_token is not None:
|
||||
raise ValueError(
|
||||
"You cannot use both --report_to=wandb and --hub_token due to a security risk of exposing your token."
|
||||
" Please use `hf auth login` to authenticate with the Hub."
|
||||
" Please use `huggingface-cli login` to authenticate with the Hub."
|
||||
)
|
||||
|
||||
if torch.backends.mps.is_available() and args.mixed_precision == "bf16":
|
||||
|
||||
@@ -1063,7 +1063,7 @@ def main(args):
|
||||
if args.report_to == "wandb" and args.hub_token is not None:
|
||||
raise ValueError(
|
||||
"You cannot use both --report_to=wandb and --hub_token due to a security risk of exposing your token."
|
||||
" Please use `hf auth login` to authenticate with the Hub."
|
||||
" Please use `huggingface-cli login` to authenticate with the Hub."
|
||||
)
|
||||
|
||||
if torch.backends.mps.is_available() and args.mixed_precision == "bf16":
|
||||
|
||||
@@ -983,7 +983,7 @@ def main(args):
|
||||
if args.report_to == "wandb" and args.hub_token is not None:
|
||||
raise ValueError(
|
||||
"You cannot use both --report_to=wandb and --hub_token due to a security risk of exposing your token."
|
||||
" Please use `hf auth login` to authenticate with the Hub."
|
||||
" Please use `huggingface-cli login` to authenticate with the Hub."
|
||||
)
|
||||
|
||||
if args.do_edm_style_training and args.snr_gamma is not None:
|
||||
|
||||
@@ -988,7 +988,7 @@ def main(args):
|
||||
if args.report_to == "wandb" and args.hub_token is not None:
|
||||
raise ValueError(
|
||||
"You cannot use both --report_to=wandb and --hub_token due to a security risk of exposing your token."
|
||||
" Please use `hf auth login` to authenticate with the Hub."
|
||||
" Please use `huggingface-cli login` to authenticate with the Hub."
|
||||
)
|
||||
|
||||
if torch.backends.mps.is_available() and args.mixed_precision == "bf16":
|
||||
|
||||
@@ -13,7 +13,7 @@ To incorporate additional condition latents, we expand the input features of Flu
|
||||
> As the model is gated, before using it with diffusers you first need to go to the [FLUX.1 [dev] Hugging Face page](https://huggingface.co/black-forest-labs/FLUX.1-dev), fill in the form and accept the gate. Once you are in, you need to log in so that your system knows you’ve accepted the gate. Use the command below to log in:
|
||||
|
||||
```bash
|
||||
hf auth login
|
||||
huggingface-cli login
|
||||
```
|
||||
|
||||
The example command below shows how to launch fine-tuning for pose conditions. The dataset ([`raulc0399/open_pose_controlnet`](https://huggingface.co/datasets/raulc0399/open_pose_controlnet)) being used here already has the pose conditions of the original images, so we don't have to compute them.
|
||||
|
||||
@@ -697,7 +697,7 @@ def main(args):
|
||||
if args.report_to == "wandb" and args.hub_token is not None:
|
||||
raise ValueError(
|
||||
"You cannot use both --report_to=wandb and --hub_token due to a security risk of exposing your token."
|
||||
" Please use `hf auth login` to authenticate with the Hub."
|
||||
" Please use `huggingface-cli login` to authenticate with the Hub."
|
||||
)
|
||||
|
||||
logging_out_dir = Path(args.output_dir, args.logging_dir)
|
||||
|
||||
@@ -725,7 +725,7 @@ def main(args):
|
||||
if args.report_to == "wandb" and args.hub_token is not None:
|
||||
raise ValueError(
|
||||
"You cannot use both --report_to=wandb and --hub_token due to a security risk of exposing your token."
|
||||
" Please use `hf auth login` to authenticate with the Hub."
|
||||
" Please use `huggingface-cli login` to authenticate with the Hub."
|
||||
)
|
||||
if args.use_lora_bias and args.gaussian_init_lora:
|
||||
raise ValueError("`gaussian` LoRA init scheme isn't supported when `use_lora_bias` is True.")
|
||||
|
||||
Some files were not shown because too many files have changed in this diff