7879 - adjust documentation to use naruto dataset, since pokemon is now gated (#7880)

* 7879 - adjust documentation to use naruto dataset, since pokemon is now gated

* replace references to pokemon in docs

* more references to pokemon replaced

* Japanese translation update

---------

Co-authored-by: bghira <bghira@users.github.com>
Bagheera
2024-05-07 10:36:39 -06:00
committed by GitHub
parent 23e091564f
commit 8edaf3b79c
31 changed files with 94 additions and 94 deletions
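The bulk replacement described in the commit message could be reproduced with a short script. This is a minimal sketch, not the tooling the author actually used: the helper name `swap_dataset_refs` is hypothetical, and it only rewrites the Hub dataset ID, leaving prose, prompts, and output directory names untouched.

```python
# Hub dataset IDs taken from the diff; everything else here is illustrative.
OLD_ID = "lambdalabs/pokemon-blip-captions"
NEW_ID = "lambdalabs/naruto-blip-captions"


def swap_dataset_refs(text):
    """Return (rewritten_text, replacement_count) for one file's contents."""
    count = text.count(OLD_ID)
    return text.replace(OLD_ID, NEW_ID), count


if __name__ == "__main__":
    sample = 'export DATASET_NAME="lambdalabs/pokemon-blip-captions"'
    updated, replaced = swap_dataset_refs(sample)
    print(updated)
    print(f"{replaced} reference(s) replaced")
```

Returning the count alongside the text makes it easy to report which files changed, which is roughly what the per-file `+N -N` summaries below reflect.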
+10 -10
@@ -205,7 +205,7 @@ model_pred = unet(noisy_latents, timesteps, None, added_cond_kwargs=added_cond_k
Once you've made all your changes or you're okay with the default configuration, you're ready to launch the training script! 🚀 Once you've made all your changes or you're okay with the default configuration, you're ready to launch the training script! 🚀
You'll train on the [Pokémon BLIP captions](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions) dataset to generate your own Pokémon, but you can also create and train on your own dataset by following the [Create a dataset for training](create_dataset) guide. Set the environment variable `DATASET_NAME` to the name of the dataset on the Hub or if you're training on your own files, set the environment variable `TRAIN_DIR` to a path to your dataset. You'll train on the [Naruto BLIP captions](https://huggingface.co/datasets/lambdalabs/naruto-blip-captions) dataset to generate your own Naruto characters, but you can also create and train on your own dataset by following the [Create a dataset for training](create_dataset) guide. Set the environment variable `DATASET_NAME` to the name of the dataset on the Hub or if you're training on your own files, set the environment variable `TRAIN_DIR` to a path to your dataset.
If you're training on more than one GPU, add the `--multi_gpu` parameter to the `accelerate launch` command. If you're training on more than one GPU, add the `--multi_gpu` parameter to the `accelerate launch` command.
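The choice between `DATASET_NAME` and `TRAIN_DIR` described above can be sketched in Python. This is an illustration under stated assumptions, not part of the training scripts: `dataset_args` is a hypothetical helper, and mapping `TRAIN_DIR` to a `--train_data_dir` flag is an assumption, since only `--dataset_name` appears in the commands shown in this diff.

```python
import os


def dataset_args(env=None):
    """Build the dataset-related CLI flags from environment variables.

    Hypothetical helper: prefers a Hub dataset (DATASET_NAME) and falls
    back to a local folder (TRAIN_DIR).
    """
    env = os.environ if env is None else env
    dataset_name = env.get("DATASET_NAME")
    train_dir = env.get("TRAIN_DIR")
    if dataset_name:
        return [f"--dataset_name={dataset_name}"]
    if train_dir:
        # Assumed flag name for local training data.
        return [f"--train_data_dir={train_dir}"]
    raise ValueError("Set DATASET_NAME or TRAIN_DIR before launching training")


if __name__ == "__main__":
    print(dataset_args({"DATASET_NAME": "lambdalabs/naruto-blip-captions"}))
```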
@@ -219,7 +219,7 @@ To monitor training progress with Weights & Biases, add the `--report_to=wandb`
<hfoption id="prior model"> <hfoption id="prior model">
```bash ```bash
export DATASET_NAME="lambdalabs/pokemon-blip-captions" export DATASET_NAME="lambdalabs/naruto-blip-captions"
accelerate launch --mixed_precision="fp16" train_text_to_image_prior.py \ accelerate launch --mixed_precision="fp16" train_text_to_image_prior.py \
--dataset_name=$DATASET_NAME \ --dataset_name=$DATASET_NAME \
@@ -232,17 +232,17 @@ accelerate launch --mixed_precision="fp16" train_text_to_image_prior.py \
--checkpoints_total_limit=3 \ --checkpoints_total_limit=3 \
--lr_scheduler="constant" \ --lr_scheduler="constant" \
--lr_warmup_steps=0 \ --lr_warmup_steps=0 \
--validation_prompts="A robot pokemon, 4k photo" \ --validation_prompts="A robot naruto, 4k photo" \
--report_to="wandb" \ --report_to="wandb" \
--push_to_hub \ --push_to_hub \
--output_dir="kandi2-prior-pokemon-model" --output_dir="kandi2-prior-naruto-model"
``` ```
</hfoption> </hfoption>
<hfoption id="decoder model"> <hfoption id="decoder model">
```bash ```bash
export DATASET_NAME="lambdalabs/pokemon-blip-captions" export DATASET_NAME="lambdalabs/naruto-blip-captions"
accelerate launch --mixed_precision="fp16" train_text_to_image_decoder.py \ accelerate launch --mixed_precision="fp16" train_text_to_image_decoder.py \
--dataset_name=$DATASET_NAME \ --dataset_name=$DATASET_NAME \
@@ -256,10 +256,10 @@ accelerate launch --mixed_precision="fp16" train_text_to_image_decoder.py \
--checkpoints_total_limit=3 \ --checkpoints_total_limit=3 \
--lr_scheduler="constant" \ --lr_scheduler="constant" \
--lr_warmup_steps=0 \ --lr_warmup_steps=0 \
--validation_prompts="A robot pokemon, 4k photo" \ --validation_prompts="A robot naruto, 4k photo" \
--report_to="wandb" \ --report_to="wandb" \
--push_to_hub \ --push_to_hub \
--output_dir="kandi2-decoder-pokemon-model" --output_dir="kandi2-decoder-naruto-model"
``` ```
</hfoption> </hfoption>
@@ -279,7 +279,7 @@ prior_components = {"prior_" + k: v for k,v in prior_pipeline.components.items()
pipeline = AutoPipelineForText2Image.from_pretrained("kandinsky-community/kandinsky-2-2-decoder", **prior_components, torch_dtype=torch.float16) pipeline = AutoPipelineForText2Image.from_pretrained("kandinsky-community/kandinsky-2-2-decoder", **prior_components, torch_dtype=torch.float16)
pipe.enable_model_cpu_offload() pipe.enable_model_cpu_offload()
prompt="A robot pokemon, 4k photo" prompt="A robot naruto, 4k photo"
image = pipeline(prompt=prompt, negative_prompt=negative_prompt).images[0] image = pipeline(prompt=prompt, negative_prompt=negative_prompt).images[0]
``` ```
@@ -299,7 +299,7 @@ import torch
pipeline = AutoPipelineForText2Image.from_pretrained("path/to/saved/model", torch_dtype=torch.float16) pipeline = AutoPipelineForText2Image.from_pretrained("path/to/saved/model", torch_dtype=torch.float16)
pipeline.enable_model_cpu_offload() pipeline.enable_model_cpu_offload()
prompt="A robot pokemon, 4k photo" prompt="A robot naruto, 4k photo"
image = pipeline(prompt=prompt).images[0] image = pipeline(prompt=prompt).images[0]
``` ```
@@ -313,7 +313,7 @@ unet = UNet2DConditionModel.from_pretrained("path/to/saved/model" + "/checkpoint
pipeline = AutoPipelineForText2Image.from_pretrained("kandinsky-community/kandinsky-2-2-decoder", unet=unet, torch_dtype=torch.float16) pipeline = AutoPipelineForText2Image.from_pretrained("kandinsky-community/kandinsky-2-2-decoder", unet=unet, torch_dtype=torch.float16)
pipeline.enable_model_cpu_offload() pipeline.enable_model_cpu_offload()
image = pipeline(prompt="A robot pokemon, 4k photo").images[0] image = pipeline(prompt="A robot naruto, 4k photo").images[0]
``` ```
</hfoption> </hfoption>
+6 -6
@@ -170,7 +170,7 @@ Aside from setting up the LoRA layers, the training script is more or less the s
Once you've made all your changes or you're okay with the default configuration, you're ready to launch the training script! 🚀 Once you've made all your changes or you're okay with the default configuration, you're ready to launch the training script! 🚀
Let's train on the [Pokémon BLIP captions](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions) dataset to generate our own Pokémon. Set the environment variables `MODEL_NAME` and `DATASET_NAME` to the model and dataset respectively. You should also specify where to save the model in `OUTPUT_DIR`, and the name of the model to save to on the Hub with `HUB_MODEL_ID`. The script creates and saves the following files to your repository: Let's train on the [Naruto BLIP captions](https://huggingface.co/datasets/lambdalabs/naruto-blip-captions) dataset to generate your own Naruto characters. Set the environment variables `MODEL_NAME` and `DATASET_NAME` to the model and dataset respectively. You should also specify where to save the model in `OUTPUT_DIR`, and the name of the model to save to on the Hub with `HUB_MODEL_ID`. The script creates and saves the following files to your repository:
- saved model checkpoints - saved model checkpoints
- `pytorch_lora_weights.safetensors` (the trained LoRA weights) - `pytorch_lora_weights.safetensors` (the trained LoRA weights)
@@ -185,9 +185,9 @@ A full training run takes ~5 hours on a 2080 Ti GPU with 11GB of VRAM.
```bash ```bash
export MODEL_NAME="runwayml/stable-diffusion-v1-5" export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export OUTPUT_DIR="/sddata/finetune/lora/pokemon" export OUTPUT_DIR="/sddata/finetune/lora/naruto"
export HUB_MODEL_ID="pokemon-lora" export HUB_MODEL_ID="naruto-lora"
export DATASET_NAME="lambdalabs/pokemon-blip-captions" export DATASET_NAME="lambdalabs/naruto-blip-captions"
accelerate launch --mixed_precision="fp16" train_text_to_image_lora.py \ accelerate launch --mixed_precision="fp16" train_text_to_image_lora.py \
--pretrained_model_name_or_path=$MODEL_NAME \ --pretrained_model_name_or_path=$MODEL_NAME \
@@ -208,7 +208,7 @@ accelerate launch --mixed_precision="fp16" train_text_to_image_lora.py \
--hub_model_id=${HUB_MODEL_ID} \ --hub_model_id=${HUB_MODEL_ID} \
--report_to=wandb \ --report_to=wandb \
--checkpointing_steps=500 \ --checkpointing_steps=500 \
--validation_prompt="A pokemon with blue eyes." \ --validation_prompt="A naruto with blue eyes." \
--seed=1337 --seed=1337
``` ```
@@ -220,7 +220,7 @@ import torch
pipeline = AutoPipelineForText2Image.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda") pipeline = AutoPipelineForText2Image.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
pipeline.load_lora_weights("path/to/lora/model", weight_name="pytorch_lora_weights.safetensors") pipeline.load_lora_weights("path/to/lora/model", weight_name="pytorch_lora_weights.safetensors")
image = pipeline("A pokemon with blue eyes").images[0] image = pipeline("A naruto with blue eyes").images[0]
``` ```
## Next steps ## Next steps
+7 -7
@@ -176,7 +176,7 @@ If you want to learn more about how the training loop works, check out the [Unde
Once you've made all your changes or you're okay with the default configuration, you're ready to launch the training script! 🚀 Once you've made all your changes or you're okay with the default configuration, you're ready to launch the training script! 🚀
Let's train on the [Pokémon BLIP captions](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions) dataset to generate your own Pokémon. Set the environment variables `MODEL_NAME` and `DATASET_NAME` to the model and the dataset (either from the Hub or a local path). You should also specify a VAE other than the SDXL VAE (either from the Hub or a local path) with `VAE_NAME` to avoid numerical instabilities. Let's train on the [Naruto BLIP captions](https://huggingface.co/datasets/lambdalabs/naruto-blip-captions) dataset to generate your own Naruto characters. Set the environment variables `MODEL_NAME` and `DATASET_NAME` to the model and the dataset (either from the Hub or a local path). You should also specify a VAE other than the SDXL VAE (either from the Hub or a local path) with `VAE_NAME` to avoid numerical instabilities.
<Tip> <Tip>
@@ -187,7 +187,7 @@ To monitor training progress with Weights & Biases, add the `--report_to=wandb`
```bash ```bash
export MODEL_NAME="stabilityai/stable-diffusion-xl-base-1.0" export MODEL_NAME="stabilityai/stable-diffusion-xl-base-1.0"
export VAE_NAME="madebyollin/sdxl-vae-fp16-fix" export VAE_NAME="madebyollin/sdxl-vae-fp16-fix"
export DATASET_NAME="lambdalabs/pokemon-blip-captions" export DATASET_NAME="lambdalabs/naruto-blip-captions"
accelerate launch train_text_to_image_sdxl.py \ accelerate launch train_text_to_image_sdxl.py \
--pretrained_model_name_or_path=$MODEL_NAME \ --pretrained_model_name_or_path=$MODEL_NAME \
@@ -211,7 +211,7 @@ accelerate launch train_text_to_image_sdxl.py \
--validation_prompt="a cute Sundar Pichai creature" \ --validation_prompt="a cute Sundar Pichai creature" \
--validation_epochs 5 \ --validation_epochs 5 \
--checkpointing_steps=5000 \ --checkpointing_steps=5000 \
--output_dir="sdxl-pokemon-model" \ --output_dir="sdxl-naruto-model" \
--push_to_hub --push_to_hub
``` ```
@@ -226,9 +226,9 @@ import torch
pipeline = DiffusionPipeline.from_pretrained("path/to/your/model", torch_dtype=torch.float16).to("cuda") pipeline = DiffusionPipeline.from_pretrained("path/to/your/model", torch_dtype=torch.float16).to("cuda")
prompt = "A pokemon with green eyes and red legs." prompt = "A naruto with green eyes and red legs."
image = pipeline(prompt, num_inference_steps=30, guidance_scale=7.5).images[0] image = pipeline(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("pokemon.png") image.save("naruto.png")
``` ```
</hfoption> </hfoption>
@@ -244,11 +244,11 @@ import torch_xla.core.xla_model as xm
device = xm.xla_device() device = xm.xla_device()
pipeline = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0").to(device) pipeline = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0").to(device)
prompt = "A pokemon with green eyes and red legs." prompt = "A naruto with green eyes and red legs."
start = time() start = time()
image = pipeline(prompt, num_inference_steps=inference_steps).images[0] image = pipeline(prompt, num_inference_steps=inference_steps).images[0]
print(f'Compilation time is {time()-start} sec') print(f'Compilation time is {time()-start} sec')
image.save("pokemon.png") image.save("naruto.png")
start = time() start = time()
image = pipeline(prompt, num_inference_steps=inference_steps).images[0] image = pipeline(prompt, num_inference_steps=inference_steps).images[0]
+8 -8
@@ -158,7 +158,7 @@ Once you've made all your changes or you're okay with the default configuration,
<hfoptions id="training-inference"> <hfoptions id="training-inference">
<hfoption id="PyTorch"> <hfoption id="PyTorch">
Let's train on the [Pokémon BLIP captions](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions) dataset to generate your own Pokémon. Set the environment variables `MODEL_NAME` and `dataset_name` to the model and the dataset (either from the Hub or a local path). If you're training on more than one GPU, add the `--multi_gpu` parameter to the `accelerate launch` command. Let's train on the [Naruto BLIP captions](https://huggingface.co/datasets/lambdalabs/naruto-blip-captions) dataset to generate your own Naruto characters. Set the environment variables `MODEL_NAME` and `dataset_name` to the model and the dataset (either from the Hub or a local path). If you're training on more than one GPU, add the `--multi_gpu` parameter to the `accelerate launch` command.
<Tip> <Tip>
@@ -168,7 +168,7 @@ To train on a local dataset, set the `TRAIN_DIR` and `OUTPUT_DIR` environment va
```bash ```bash
export MODEL_NAME="runwayml/stable-diffusion-v1-5" export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export dataset_name="lambdalabs/pokemon-blip-captions" export dataset_name="lambdalabs/naruto-blip-captions"
accelerate launch --mixed_precision="fp16" train_text_to_image.py \ accelerate launch --mixed_precision="fp16" train_text_to_image.py \
--pretrained_model_name_or_path=$MODEL_NAME \ --pretrained_model_name_or_path=$MODEL_NAME \
@@ -183,7 +183,7 @@ accelerate launch --mixed_precision="fp16" train_text_to_image.py \
--max_grad_norm=1 \ --max_grad_norm=1 \
--enable_xformers_memory_efficient_attention --enable_xformers_memory_efficient_attention
--lr_scheduler="constant" --lr_warmup_steps=0 \ --lr_scheduler="constant" --lr_warmup_steps=0 \
--output_dir="sd-pokemon-model" \ --output_dir="sd-naruto-model" \
--push_to_hub --push_to_hub
``` ```
@@ -202,7 +202,7 @@ To train on a local dataset, set the `TRAIN_DIR` and `OUTPUT_DIR` environment va
```bash ```bash
export MODEL_NAME="runwayml/stable-diffusion-v1-5" export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export dataset_name="lambdalabs/pokemon-blip-captions" export dataset_name="lambdalabs/naruto-blip-captions"
python train_text_to_image_flax.py \ python train_text_to_image_flax.py \
--pretrained_model_name_or_path=$MODEL_NAME \ --pretrained_model_name_or_path=$MODEL_NAME \
@@ -212,7 +212,7 @@ python train_text_to_image_flax.py \
--max_train_steps=15000 \ --max_train_steps=15000 \
--learning_rate=1e-05 \ --learning_rate=1e-05 \
--max_grad_norm=1 \ --max_grad_norm=1 \
--output_dir="sd-pokemon-model" \ --output_dir="sd-naruto-model" \
--push_to_hub --push_to_hub
``` ```
@@ -231,7 +231,7 @@ import torch
pipeline = StableDiffusionPipeline.from_pretrained("path/to/saved_model", torch_dtype=torch.float16, use_safetensors=True).to("cuda") pipeline = StableDiffusionPipeline.from_pretrained("path/to/saved_model", torch_dtype=torch.float16, use_safetensors=True).to("cuda")
image = pipeline(prompt="yoda").images[0] image = pipeline(prompt="yoda").images[0]
image.save("yoda-pokemon.png") image.save("yoda-naruto.png")
``` ```
</hfoption> </hfoption>
@@ -246,7 +246,7 @@ from diffusers import FlaxStableDiffusionPipeline
pipeline, params = FlaxStableDiffusionPipeline.from_pretrained("path/to/saved_model", dtype=jax.numpy.bfloat16) pipeline, params = FlaxStableDiffusionPipeline.from_pretrained("path/to/saved_model", dtype=jax.numpy.bfloat16)
prompt = "yoda pokemon" prompt = "yoda naruto"
prng_seed = jax.random.PRNGKey(0) prng_seed = jax.random.PRNGKey(0)
num_inference_steps = 50 num_inference_steps = 50
@@ -261,7 +261,7 @@ prompt_ids = shard(prompt_ids)
images = pipeline(prompt_ids, params, prng_seed, num_inference_steps, jit=True).images images = pipeline(prompt_ids, params, prng_seed, num_inference_steps, jit=True).images
images = pipeline.numpy_to_pil(np.asarray(images.reshape((num_samples,) + images.shape[-3:]))) images = pipeline.numpy_to_pil(np.asarray(images.reshape((num_samples,) + images.shape[-3:])))
image.save("yoda-pokemon.png") image.save("yoda-naruto.png")
``` ```
</hfoption> </hfoption>
+5 -5
@@ -131,7 +131,7 @@ If you want to learn more about how the training loop works, check out the [Unde
Once you've made all your changes or you're okay with the default configuration, you're ready to launch the training script! 🚀 Once you've made all your changes or you're okay with the default configuration, you're ready to launch the training script! 🚀
Set the `DATASET_NAME` environment variable to the dataset name from the Hub. This guide uses the [Pokémon BLIP captions](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions) dataset, but you can create and train on your own datasets as well (see the [Create a dataset for training](create_dataset) guide). Set the `DATASET_NAME` environment variable to the dataset name from the Hub. This guide uses the [Naruto BLIP captions](https://huggingface.co/datasets/lambdalabs/naruto-blip-captions) dataset, but you can create and train on your own datasets as well (see the [Create a dataset for training](create_dataset) guide).
<Tip> <Tip>
@@ -140,7 +140,7 @@ To monitor training progress with Weights & Biases, add the `--report_to=wandb`
</Tip> </Tip>
```bash ```bash
export DATASET_NAME="lambdalabs/pokemon-blip-captions" export DATASET_NAME="lambdalabs/naruto-blip-captions"
accelerate launch train_text_to_image_prior.py \ accelerate launch train_text_to_image_prior.py \
--mixed_precision="fp16" \ --mixed_precision="fp16" \
@@ -156,10 +156,10 @@ accelerate launch train_text_to_image_prior.py \
--checkpoints_total_limit=3 \ --checkpoints_total_limit=3 \
--lr_scheduler="constant" \ --lr_scheduler="constant" \
--lr_warmup_steps=0 \ --lr_warmup_steps=0 \
--validation_prompts="A robot pokemon, 4k photo" \ --validation_prompts="A robot naruto, 4k photo" \
--report_to="wandb" \ --report_to="wandb" \
--push_to_hub \ --push_to_hub \
--output_dir="wuerstchen-prior-pokemon-model" --output_dir="wuerstchen-prior-naruto-model"
``` ```
Once training is complete, you can use your newly trained model for inference! Once training is complete, you can use your newly trained model for inference!
@@ -171,7 +171,7 @@ from diffusers.pipelines.wuerstchen import DEFAULT_STAGE_C_TIMESTEPS
pipeline = AutoPipelineForText2Image.from_pretrained("path/to/saved/model", torch_dtype=torch.float16).to("cuda") pipeline = AutoPipelineForText2Image.from_pretrained("path/to/saved/model", torch_dtype=torch.float16).to("cuda")
caption = "A cute bird pokemon holding a shield" caption = "A cute bird naruto holding a shield"
images = pipeline( images = pipeline(
caption, caption,
width=1024, width=1024,
+4 -4
@@ -49,15 +49,15 @@ huggingface-cli login
### 학습[[dreambooth-training]] ### 학습[[dreambooth-training]]
[Pokémon BLIP 캡션](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions) 데이터셋으로 [`stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5)를 파인튜닝해 나만의 포켓몬을 생성해 보겠습니다. [Naruto BLIP 캡션](https://huggingface.co/datasets/lambdalabs/naruto-blip-captions) 데이터셋으로 [`stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5)를 파인튜닝해 나만의 포켓몬을 생성해 보겠습니다.
시작하려면 `MODEL_NAME` 및 `DATASET_NAME` 환경 변수가 설정되어 있는지 확인하십시오. `OUTPUT_DIR` 및 `HUB_MODEL_ID` 변수는 선택 사항이며 허브에서 모델을 저장할 위치를 지정합니다. 시작하려면 `MODEL_NAME` 및 `DATASET_NAME` 환경 변수가 설정되어 있는지 확인하십시오. `OUTPUT_DIR` 및 `HUB_MODEL_ID` 변수는 선택 사항이며 허브에서 모델을 저장할 위치를 지정합니다.
```bash ```bash
export MODEL_NAME="runwayml/stable-diffusion-v1-5" export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export OUTPUT_DIR="/sddata/finetune/lora/pokemon" export OUTPUT_DIR="/sddata/finetune/lora/naruto"
export HUB_MODEL_ID="pokemon-lora" export HUB_MODEL_ID="naruto-lora"
export DATASET_NAME="lambdalabs/pokemon-blip-captions" export DATASET_NAME="lambdalabs/naruto-blip-captions"
``` ```
학습을 시작하기 전에 알아야 할 몇 가지 플래그가 있습니다. 학습을 시작하기 전에 알아야 할 몇 가지 플래그가 있습니다.
+9 -9
@@ -73,12 +73,12 @@ xFormers는 Flax에 사용할 수 없습니다.
<frameworkcontent> <frameworkcontent>
<pt> <pt>
다음과 같이 [Pokémon BLIP 캡션](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions) 데이터셋에서 파인튜닝 실행을 위해 [PyTorch 학습 스크립트](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image.py)를 실행합니다: 다음과 같이 [Naruto BLIP 캡션](https://huggingface.co/datasets/lambdalabs/naruto-blip-captions) 데이터셋에서 파인튜닝 실행을 위해 [PyTorch 학습 스크립트](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image.py)를 실행합니다:
```bash ```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4" export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export dataset_name="lambdalabs/pokemon-blip-captions" export dataset_name="lambdalabs/naruto-blip-captions"
accelerate launch train_text_to_image.py \ accelerate launch train_text_to_image.py \
--pretrained_model_name_or_path=$MODEL_NAME \ --pretrained_model_name_or_path=$MODEL_NAME \
@@ -93,7 +93,7 @@ accelerate launch train_text_to_image.py \
--learning_rate=1e-05 \ --learning_rate=1e-05 \
--max_grad_norm=1 \ --max_grad_norm=1 \
--lr_scheduler="constant" --lr_warmup_steps=0 \ --lr_scheduler="constant" --lr_warmup_steps=0 \
--output_dir="sd-pokemon-model" --output_dir="sd-naruto-model"
``` ```
자체 데이터셋으로 파인튜닝하려면 🤗 [Datasets](https://huggingface.co/docs/datasets/index)에서 요구하는 형식에 따라 데이터셋을 준비하세요. [데이터셋을 허브에 업로드](https://huggingface.co/docs/datasets/image_dataset#upload-dataset-to-the-hub)하거나 [파일들이 있는 로컬 폴더를 준비](https ://huggingface.co/docs/datasets/image_dataset#imagefolder)할 수 있습니다. 자체 데이터셋으로 파인튜닝하려면 🤗 [Datasets](https://huggingface.co/docs/datasets/index)에서 요구하는 형식에 따라 데이터셋을 준비하세요. [데이터셋을 허브에 업로드](https://huggingface.co/docs/datasets/image_dataset#upload-dataset-to-the-hub)하거나 [파일들이 있는 로컬 폴더를 준비](https ://huggingface.co/docs/datasets/image_dataset#imagefolder)할 수 있습니다.
@@ -136,7 +136,7 @@ pip install -U -r requirements_flax.txt
```bash ```bash
export MODEL_NAME="runwayml/stable-diffusion-v1-5" export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export dataset_name="lambdalabs/pokemon-blip-captions" export dataset_name="lambdalabs/naruto-blip-captions"
python train_text_to_image_flax.py \ python train_text_to_image_flax.py \
--pretrained_model_name_or_path=$MODEL_NAME \ --pretrained_model_name_or_path=$MODEL_NAME \
@@ -146,7 +146,7 @@ python train_text_to_image_flax.py \
--max_train_steps=15000 \ --max_train_steps=15000 \
--learning_rate=1e-05 \ --learning_rate=1e-05 \
--max_grad_norm=1 \ --max_grad_norm=1 \
--output_dir="sd-pokemon-model" --output_dir="sd-naruto-model"
``` ```
자체 데이터셋으로 파인튜닝하려면 🤗 [Datasets](https://huggingface.co/docs/datasets/index)에서 요구하는 형식에 따라 데이터셋을 준비하세요. [데이터셋을 허브에 업로드](https://huggingface.co/docs/datasets/image_dataset#upload-dataset-to-the-hub)하거나 [파일들이 있는 로컬 폴더를 준비](https ://huggingface.co/docs/datasets/image_dataset#imagefolder)할 수 있습니다. 자체 데이터셋으로 파인튜닝하려면 🤗 [Datasets](https://huggingface.co/docs/datasets/index)에서 요구하는 형식에 따라 데이터셋을 준비하세요. [데이터셋을 허브에 업로드](https://huggingface.co/docs/datasets/image_dataset#upload-dataset-to-the-hub)하거나 [파일들이 있는 로컬 폴더를 준비](https ://huggingface.co/docs/datasets/image_dataset#imagefolder)할 수 있습니다.
@@ -166,7 +166,7 @@ python train_text_to_image_flax.py \
--max_train_steps=15000 \ --max_train_steps=15000 \
--learning_rate=1e-05 \ --learning_rate=1e-05 \
--max_grad_norm=1 \ --max_grad_norm=1 \
--output_dir="sd-pokemon-model" --output_dir="sd-naruto-model"
``` ```
</jax> </jax>
</frameworkcontent> </frameworkcontent>
@@ -189,7 +189,7 @@ pipe = StableDiffusionPipeline.from_pretrained(model_path, torch_dtype=torch.flo
pipe.to("cuda") pipe.to("cuda")
image = pipe(prompt="yoda").images[0] image = pipe(prompt="yoda").images[0]
image.save("yoda-pokemon.png") image.save("yoda-naruto.png")
``` ```
</pt> </pt>
<jax> <jax>
@@ -203,7 +203,7 @@ from diffusers import FlaxStableDiffusionPipeline
model_path = "path_to_saved_model" model_path = "path_to_saved_model"
pipe, params = FlaxStableDiffusionPipeline.from_pretrained(model_path, dtype=jax.numpy.bfloat16) pipe, params = FlaxStableDiffusionPipeline.from_pretrained(model_path, dtype=jax.numpy.bfloat16)
prompt = "yoda pokemon" prompt = "yoda naruto"
prng_seed = jax.random.PRNGKey(0) prng_seed = jax.random.PRNGKey(0)
num_inference_steps = 50 num_inference_steps = 50
@@ -218,7 +218,7 @@ prompt_ids = shard(prompt_ids)
images = pipeline(prompt_ids, params, prng_seed, num_inference_steps, jit=True).images images = pipeline(prompt_ids, params, prng_seed, num_inference_steps, jit=True).images
images = pipeline.numpy_to_pil(np.asarray(images.reshape((num_samples,) + images.shape[-3:]))) images = pipeline.numpy_to_pil(np.asarray(images.reshape((num_samples,) + images.shape[-3:])))
image.save("yoda-pokemon.png") image.save("yoda-naruto.png")
``` ```
</jax> </jax>
</frameworkcontent> </frameworkcontent>
@@ -103,13 +103,13 @@ accelerate launch train_unconditional.py \
<div class="flex justify-center"> <div class="flex justify-center">
<img src="https://user-images.githubusercontent.com/26864830/180248660-a0b143d0-b89a-42c5-8656-2ebf6ece7e52.png"/> <img src="https://user-images.githubusercontent.com/26864830/180248660-a0b143d0-b89a-42c5-8656-2ebf6ece7e52.png"/>
</div> </div>
[Pokemon](https://huggingface.co/datasets/huggan/pokemon) 데이터셋을 사용할 경우: [Naruto](https://huggingface.co/datasets/lambdalabs/naruto-blip-captions) 데이터셋을 사용할 경우:
```bash ```bash
accelerate launch train_unconditional.py \ accelerate launch train_unconditional.py \
--dataset_name="huggan/pokemon" \ --dataset_name="lambdalabs/naruto-blip-captions" \
--resolution=64 \ --resolution=64 \
--output_dir="ddpm-ema-pokemon-64" \ --output_dir="ddpm-ema-naruto-64" \
--train_batch_size=16 \ --train_batch_size=16 \
--num_epochs=100 \ --num_epochs=100 \
--gradient_accumulation_steps=1 \ --gradient_accumulation_steps=1 \
@@ -129,9 +129,9 @@ accelerate launch train_unconditional.py \
```bash ```bash
accelerate launch --mixed_precision="fp16" --multi_gpu train_unconditional.py \ accelerate launch --mixed_precision="fp16" --multi_gpu train_unconditional.py \
--dataset_name="huggan/pokemon" \ --dataset_name="lambdalabs/naruto-blip-captions" \
--resolution=64 --center_crop --random_flip \ --resolution=64 --center_crop --random_flip \
--output_dir="ddpm-ema-pokemon-64" \ --output_dir="ddpm-ema-naruto-64" \
--train_batch_size=16 \ --train_batch_size=16 \
--num_epochs=100 \ --num_epochs=100 \
--gradient_accumulation_steps=1 \ --gradient_accumulation_steps=1 \
@@ -115,11 +115,11 @@ accelerate launch train_lcm_distill_lora_sdxl_wds.py \
We provide another version for LCM LoRA SDXL that follows best practices of `peft` and leverages the `datasets` library for quick experimentation. The script doesn't load two UNets unlike `train_lcm_distill_lora_sdxl_wds.py` which reduces the memory requirements quite a bit. We provide another version for LCM LoRA SDXL that follows best practices of `peft` and leverages the `datasets` library for quick experimentation. The script doesn't load two UNets unlike `train_lcm_distill_lora_sdxl_wds.py` which reduces the memory requirements quite a bit.
Below is an example training command that trains an LCM LoRA on the [Pokemons dataset](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions): Below is an example training command that trains an LCM LoRA on the [Pokemons dataset](https://huggingface.co/datasets/lambdalabs/naruto-blip-captions):
```bash ```bash
export MODEL_NAME="stabilityai/stable-diffusion-xl-base-1.0" export MODEL_NAME="stabilityai/stable-diffusion-xl-base-1.0"
export DATASET_NAME="lambdalabs/pokemon-blip-captions" export DATASET_NAME="lambdalabs/naruto-blip-captions"
export VAE_PATH="madebyollin/sdxl-vae-fp16-fix" export VAE_PATH="madebyollin/sdxl-vae-fp16-fix"
accelerate launch train_lcm_distill_lora_sdxl.py \ accelerate launch train_lcm_distill_lora_sdxl.py \
@@ -71,7 +71,7 @@ check_min_version("0.28.0.dev0")
logger = get_logger(__name__) logger = get_logger(__name__)
DATASET_NAME_MAPPING = { DATASET_NAME_MAPPING = {
"lambdalabs/pokemon-blip-captions": ("image", "text"), "lambdalabs/naruto-blip-captions": ("image", "text"),
} }
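The `DATASET_NAME_MAPPING` entry changed above supplies default image and caption column names for a known dataset. The sketch below shows one way such a mapping might be consulted; `resolve_columns` is a hypothetical helper written for illustration and is assumed, not confirmed, to resemble what the training scripts do with explicit column arguments.

```python
DATASET_NAME_MAPPING = {
    "lambdalabs/naruto-blip-captions": ("image", "text"),
}


def resolve_columns(dataset_name, column_names, image_column=None, caption_column=None):
    """Pick the image and caption columns for a dataset.

    Explicit arguments win; otherwise fall back to the mapping, and
    finally to the first two columns of the dataset.
    """
    defaults = DATASET_NAME_MAPPING.get(dataset_name)
    if image_column is None:
        image_column = defaults[0] if defaults is not None else column_names[0]
    if caption_column is None:
        caption_column = defaults[1] if defaults is not None else column_names[1]
    for name in (image_column, caption_column):
        if name not in column_names:
            raise ValueError(
                f"Column '{name}' not found; available columns: {', '.join(column_names)}"
            )
    return image_column, caption_column


if __name__ == "__main__":
    print(resolve_columns("lambdalabs/naruto-blip-captions", ["image", "text"]))
```

Because the mapping is keyed by the Hub ID, leaving the old `pokemon` key in place would silently skip the defaults for the new dataset, which is presumably why the key is updated together with the docs.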
@@ -57,7 +57,7 @@ To disable wandb logging, remove the `--report_to=="wandb"` and `--validation_pr
<!-- accelerate_snippet_start --> <!-- accelerate_snippet_start -->
```bash ```bash
export DATASET_NAME="lambdalabs/pokemon-blip-captions" export DATASET_NAME="lambdalabs/naruto-blip-captions"
accelerate launch --mixed_precision="fp16" train_text_to_image_decoder.py \ accelerate launch --mixed_precision="fp16" train_text_to_image_decoder.py \
--dataset_name=$DATASET_NAME \ --dataset_name=$DATASET_NAME \
@@ -139,7 +139,7 @@ You can fine-tune the Kandinsky prior model with `train_text_to_image_prior.py`
<!-- accelerate_snippet_start --> <!-- accelerate_snippet_start -->
```bash ```bash
export DATASET_NAME="lambdalabs/pokemon-blip-captions" export DATASET_NAME="lambdalabs/naruto-blip-captions"
accelerate launch --mixed_precision="fp16" train_text_to_image_prior.py \ accelerate launch --mixed_precision="fp16" train_text_to_image_prior.py \
--dataset_name=$DATASET_NAME \ --dataset_name=$DATASET_NAME \
@@ -183,7 +183,7 @@ If you want to use a fine-tuned decoder checkpoint along with your fine-tuned pr
for running distributed training with `accelerate`. Here is an example command: for running distributed training with `accelerate`. Here is an example command:
```bash ```bash
export DATASET_NAME="lambdalabs/pokemon-blip-captions" export DATASET_NAME="lambdalabs/naruto-blip-captions"
accelerate launch --mixed_precision="fp16" --multi_gpu train_text_to_image_decoder.py \ accelerate launch --mixed_precision="fp16" --multi_gpu train_text_to_image_decoder.py \
--dataset_name=$DATASET_NAME \ --dataset_name=$DATASET_NAME \
@@ -227,13 +227,13 @@ on consumer GPUs like Tesla T4, Tesla V100.
### Training ### Training
First, you need to set up your development environment as explained in the [installation](#installing-the-dependencies). Make sure to set the `MODEL_NAME` and `DATASET_NAME` environment variables. Here, we will use [Kandinsky 2.2](https://huggingface.co/kandinsky-community/kandinsky-2-2-decoder) and the [Pokemons dataset](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions). First, you need to set up your development environment as explained in the [installation](#installing-the-dependencies). Make sure to set the `MODEL_NAME` and `DATASET_NAME` environment variables. Here, we will use [Kandinsky 2.2](https://huggingface.co/kandinsky-community/kandinsky-2-2-decoder) and the [Pokemons dataset](https://huggingface.co/datasets/lambdalabs/naruto-blip-captions).
#### Train decoder #### Train decoder
```bash ```bash
export DATASET_NAME="lambdalabs/pokemon-blip-captions" export DATASET_NAME="lambdalabs/naruto-blip-captions"
accelerate launch --mixed_precision="fp16" train_text_to_image_decoder_lora.py \ accelerate launch --mixed_precision="fp16" train_text_to_image_decoder_lora.py \
--dataset_name=$DATASET_NAME --caption_column="text" \ --dataset_name=$DATASET_NAME --caption_column="text" \
@@ -252,7 +252,7 @@ accelerate launch --mixed_precision="fp16" train_text_to_image_decoder_lora.py \
#### Train prior #### Train prior
```bash ```bash
export DATASET_NAME="lambdalabs/pokemon-blip-captions" export DATASET_NAME="lambdalabs/naruto-blip-captions"
accelerate launch --mixed_precision="fp16" train_text_to_image_prior_lora.py \ accelerate launch --mixed_precision="fp16" train_text_to_image_prior_lora.py \
--dataset_name=$DATASET_NAME --caption_column="text" \ --dataset_name=$DATASET_NAME --caption_column="text" \
@@ -332,7 +332,7 @@ def parse_args():
DATASET_NAME_MAPPING = { DATASET_NAME_MAPPING = {
"lambdalabs/pokemon-blip-captions": ("image", "text"), "lambdalabs/naruto-blip-captions": ("image", "text"),
} }
@@ -56,7 +56,7 @@ check_min_version("0.28.0.dev0")
logger = get_logger(__name__, log_level="INFO") logger = get_logger(__name__, log_level="INFO")
DATASET_NAME_MAPPING = { DATASET_NAME_MAPPING = {
"lambdalabs/pokemon-blip-captions": ("image", "text"), "lambdalabs/naruto-blip-captions": ("image", "text"),
} }
+2 -2
@@ -19,7 +19,7 @@ on consumer GPUs like Tesla T4, Tesla V100.
### Training ### Training
First, you need to set up your development environment as is explained in the [installation section](#installing-the-dependencies). Make sure to set the `MODEL_NAME` and `DATASET_NAME` environment variables. Here, we will use [Stable Diffusion v1-4](https://hf.co/CompVis/stable-diffusion-v1-4) and the [Pokemons dataset](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions). First, you need to set up your development environment as is explained in the [installation section](#installing-the-dependencies). Make sure to set the `MODEL_NAME` and `DATASET_NAME` environment variables. Here, we will use [Stable Diffusion v1-4](https://hf.co/CompVis/stable-diffusion-v1-4) and the [Pokemons dataset](https://huggingface.co/datasets/lambdalabs/naruto-blip-captions).
**___Note: Change the `resolution` to 768 if you are using the [stable-diffusion-2](https://huggingface.co/stabilityai/stable-diffusion-2) 768x768 model.___** **___Note: Change the `resolution` to 768 if you are using the [stable-diffusion-2](https://huggingface.co/stabilityai/stable-diffusion-2) 768x768 model.___**
@@ -27,7 +27,7 @@ First, you need to set up your development environment as is explained in the [i
```bash ```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4" export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export DATASET_NAME="lambdalabs/pokemon-blip-captions" export DATASET_NAME="lambdalabs/naruto-blip-captions"
``` ```
For this example we want to directly store the trained LoRA embeddings on the Hub, so For this example we want to directly store the trained LoRA embeddings on the Hub, so
@@ -387,7 +387,7 @@ def parse_args():
DATASET_NAME_MAPPING = { DATASET_NAME_MAPPING = {
"lambdalabs/pokemon-blip-captions": ("image", "text"), "lambdalabs/naruto-blip-captions": ("image", "text"),
} }
@@ -55,7 +55,7 @@ The command to train a DDPM UNetCondition model on the Pokemon dataset with onnx
```bash ```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4" export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export dataset_name="lambdalabs/pokemon-blip-captions" export dataset_name="lambdalabs/naruto-blip-captions"
accelerate launch --mixed_precision="fp16" train_text_to_image.py \ accelerate launch --mixed_precision="fp16" train_text_to_image.py \
--pretrained_model_name_or_path=$MODEL_NAME \ --pretrained_model_name_or_path=$MODEL_NAME \
--dataset_name=$dataset_name \ --dataset_name=$dataset_name \
@@ -59,7 +59,7 @@ check_min_version("0.17.0.dev0")
logger = get_logger(__name__, log_level="INFO") logger = get_logger(__name__, log_level="INFO")
DATASET_NAME_MAPPING = { DATASET_NAME_MAPPING = {
"lambdalabs/pokemon-blip-captions": ("image", "text"), "lambdalabs/naruto-blip-captions": ("image", "text"),
} }
@@ -61,7 +61,7 @@ check_min_version("0.28.0.dev0")
logger = get_logger(__name__, log_level="INFO") logger = get_logger(__name__, log_level="INFO")
DATASET_NAME_MAPPING = { DATASET_NAME_MAPPING = {
"lambdalabs/pokemon-blip-captions": ("image", "text"), "lambdalabs/naruto-blip-captions": ("image", "text"),
} }
@@ -406,7 +406,7 @@ def parse_args():
DATASET_NAME_MAPPING = { DATASET_NAME_MAPPING = {
"lambdalabs/pokemon-blip-captions": ("image", "text"), "lambdalabs/naruto-blip-captions": ("image", "text"),
} }
@@ -468,7 +468,7 @@ def parse_args(input_args=None):
DATASET_NAME_MAPPING = { DATASET_NAME_MAPPING = {
"lambdalabs/pokemon-blip-captions": ("image", "text"), "lambdalabs/naruto-blip-captions": ("image", "text"),
} }
@@ -60,7 +60,7 @@ logger = get_logger(__name__)
DATASET_NAME_MAPPING = { DATASET_NAME_MAPPING = {
"lambdalabs/pokemon-blip-captions": ("image", "text"), "lambdalabs/naruto-blip-captions": ("image", "text"),
} }
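The `DATASET_NAME_MAPPING` entries touched in the hunks above pair a Hub dataset name with an (image column, caption column) tuple. As a rough illustration of how such a mapping can supply default column names, here is a minimal sketch. The helper `resolve_columns` and its parameters are hypothetical and are not copied from the training scripts.

```python
from typing import Optional, Sequence, Tuple

# Same shape as the mapping in the diff: dataset name -> (image column, caption column).
DATASET_NAME_MAPPING = {
    "lambdalabs/naruto-blip-captions": ("image", "text"),
}


def resolve_columns(
    dataset_name: Optional[str],
    column_names: Sequence[str],
    image_column: Optional[str] = None,
    caption_column: Optional[str] = None,
) -> Tuple[str, str]:
    """Hypothetical helper: choose the image and caption columns for a dataset.

    Explicit arguments win; otherwise fall back to the mapping entry, and
    finally to the first two columns of the dataset.
    """
    if len(column_names) < 2:
        raise ValueError("Expected at least two columns (image and caption).")

    defaults = DATASET_NAME_MAPPING.get(dataset_name)
    if image_column is None:
        image_column = defaults[0] if defaults else column_names[0]
    if caption_column is None:
        caption_column = defaults[1] if defaults else column_names[1]

    # Fail early if a chosen column does not exist in the dataset.
    for name in (image_column, caption_column):
        if name not in column_names:
            raise ValueError(
                f"Column '{name}' not found; available columns: {', '.join(column_names)}"
            )
    return image_column, caption_column
```

With this sketch, a dataset that is missing from the mapping still works as long as its first two columns are the image and the caption, or the caller names the columns explicitly.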
+5 -5
@@ -57,7 +57,7 @@ With `gradient_checkpointing` and `mixed_precision` it should be possible to fin
<!-- accelerate_snippet_start --> <!-- accelerate_snippet_start -->
```bash ```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4" export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export DATASET_NAME="lambdalabs/pokemon-blip-captions" export DATASET_NAME="lambdalabs/naruto-blip-captions"
accelerate launch --mixed_precision="fp16" train_text_to_image.py \ accelerate launch --mixed_precision="fp16" train_text_to_image.py \
--pretrained_model_name_or_path=$MODEL_NAME \ --pretrained_model_name_or_path=$MODEL_NAME \
@@ -136,7 +136,7 @@ for running distributed training with `accelerate`. Here is an example command:
```bash ```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4" export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export DATASET_NAME="lambdalabs/pokemon-blip-captions" export DATASET_NAME="lambdalabs/naruto-blip-captions"
accelerate launch --mixed_precision="fp16" --multi_gpu train_text_to_image.py \ accelerate launch --mixed_precision="fp16" --multi_gpu train_text_to_image.py \
--pretrained_model_name_or_path=$MODEL_NAME \ --pretrained_model_name_or_path=$MODEL_NAME \
@@ -192,7 +192,7 @@ on consumer GPUs like Tesla T4, Tesla V100.
### Training ### Training
First, you need to set up your development environment as is explained in the [installation section](#installing-the-dependencies). Make sure to set the `MODEL_NAME` and `DATASET_NAME` environment variables. Here, we will use [Stable Diffusion v1-4](https://hf.co/CompVis/stable-diffusion-v1-4) and the [Pokemons dataset](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions). First, you need to set up your development environment as is explained in the [installation section](#installing-the-dependencies). Make sure to set the `MODEL_NAME` and `DATASET_NAME` environment variables. Here, we will use [Stable Diffusion v1-4](https://hf.co/CompVis/stable-diffusion-v1-4) and the [Pokemons dataset](https://huggingface.co/datasets/lambdalabs/naruto-blip-captions).
**___Note: Change the `resolution` to 768 if you are using the [stable-diffusion-2](https://huggingface.co/stabilityai/stable-diffusion-2) 768x768 model.___** **___Note: Change the `resolution` to 768 if you are using the [stable-diffusion-2](https://huggingface.co/stabilityai/stable-diffusion-2) 768x768 model.___**
@@ -200,7 +200,7 @@ First, you need to set up your development environment as is explained in the [i
```bash ```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4" export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export DATASET_NAME="lambdalabs/pokemon-blip-captions" export DATASET_NAME="lambdalabs/naruto-blip-captions"
``` ```
For this example we want to directly store the trained LoRA embeddings on the Hub, so For this example we want to directly store the trained LoRA embeddings on the Hub, so
@@ -282,7 +282,7 @@ pip install -U -r requirements_flax.txt
```bash ```bash
export MODEL_NAME="duongna/stable-diffusion-v1-4-flax" export MODEL_NAME="duongna/stable-diffusion-v1-4-flax"
export DATASET_NAME="lambdalabs/pokemon-blip-captions" export DATASET_NAME="lambdalabs/naruto-blip-captions"
python train_text_to_image_flax.py \ python train_text_to_image_flax.py \
--pretrained_model_name_or_path=$MODEL_NAME \ --pretrained_model_name_or_path=$MODEL_NAME \
+5 -5
@@ -52,7 +52,7 @@ Note also that we use PEFT library as backend for LoRA training, make sure to ha
```bash ```bash
export MODEL_NAME="stabilityai/stable-diffusion-xl-base-1.0" export MODEL_NAME="stabilityai/stable-diffusion-xl-base-1.0"
export VAE_NAME="madebyollin/sdxl-vae-fp16-fix" export VAE_NAME="madebyollin/sdxl-vae-fp16-fix"
export DATASET_NAME="lambdalabs/pokemon-blip-captions" export DATASET_NAME="lambdalabs/naruto-blip-captions"
accelerate launch train_text_to_image_sdxl.py \ accelerate launch train_text_to_image_sdxl.py \
--pretrained_model_name_or_path=$MODEL_NAME \ --pretrained_model_name_or_path=$MODEL_NAME \
@@ -76,7 +76,7 @@ accelerate launch train_text_to_image_sdxl.py \
**Notes**: **Notes**:
* The `train_text_to_image_sdxl.py` script pre-computes text embeddings and the VAE encodings and keeps them in memory. While for smaller datasets like [`lambdalabs/pokemon-blip-captions`](https://hf.co/datasets/lambdalabs/pokemon-blip-captions), it might not be a problem, it can definitely lead to memory problems when the script is used on a larger dataset. For those purposes, you would want to serialize these pre-computed representations to disk separately and load them during the fine-tuning process. Refer to [this PR](https://github.com/huggingface/diffusers/pull/4505) for a more in-depth discussion. * The `train_text_to_image_sdxl.py` script pre-computes text embeddings and the VAE encodings and keeps them in memory. While for smaller datasets like [`lambdalabs/naruto-blip-captions`](https://hf.co/datasets/lambdalabs/naruto-blip-captions), it might not be a problem, it can definitely lead to memory problems when the script is used on a larger dataset. For those purposes, you would want to serialize these pre-computed representations to disk separately and load them during the fine-tuning process. Refer to [this PR](https://github.com/huggingface/diffusers/pull/4505) for a more in-depth discussion.
* The training script is compute-intensive and may not run on a consumer GPU like Tesla T4. * The training script is compute-intensive and may not run on a consumer GPU like Tesla T4.
* The training command shown above performs intermediate quality validation in between the training epochs and logs the results to Weights and Biases. `--report_to`, `--validation_prompt`, and `--validation_epochs` are the relevant CLI arguments here. * The training command shown above performs intermediate quality validation in between the training epochs and logs the results to Weights and Biases. `--report_to`, `--validation_prompt`, and `--validation_epochs` are the relevant CLI arguments here.
* SDXL's VAE is known to suffer from numerical instability issues. This is why we also expose a CLI argument namely `--pretrained_vae_model_name_or_path` that lets you specify the location of a better VAE (such as [this one](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix)). * SDXL's VAE is known to suffer from numerical instability issues. This is why we also expose a CLI argument namely `--pretrained_vae_model_name_or_path` that lets you specify the location of a better VAE (such as [this one](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix)).
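The first note above suggests serializing pre-computed representations to disk and loading them during fine-tuning. The following is only a sketch of that idea using the standard library. `load_or_compute` is a hypothetical helper, plain Python objects stand in for the embeddings, and this is not how `train_text_to_image_sdxl.py` itself behaves (per the note, it keeps these values in memory).

```python
import pickle
from pathlib import Path
from typing import Any, Callable


def load_or_compute(cache_path: Path, compute: Callable[[], Any]) -> Any:
    """Hypothetical helper: return the cached value, computing and saving it on a miss."""
    if cache_path.exists():
        # Cache hit: read the previously serialized value from disk.
        with cache_path.open("rb") as f:
            return pickle.load(f)

    # Cache miss: run the expensive step once, then persist the result.
    value = compute()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("wb") as f:
        pickle.dump(value, f)
    return value
```

In a real pipeline the `compute` callable would wrap the text-encoder or VAE forward pass, and a tensor-aware format would likely replace `pickle`. Both choices are assumptions left open here.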
@@ -142,14 +142,14 @@ on consumer GPUs like Tesla T4, Tesla V100.
### Training ### Training
First, you need to set up your development environment as is explained in the [installation section](#installing-the-dependencies). Make sure to set the `MODEL_NAME` and `DATASET_NAME` environment variables and, optionally, the `VAE_NAME` variable. Here, we will use [Stable Diffusion XL 1.0-base](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) and the [Pokemons dataset](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions). First, you need to set up your development environment as is explained in the [installation section](#installing-the-dependencies). Make sure to set the `MODEL_NAME` and `DATASET_NAME` environment variables and, optionally, the `VAE_NAME` variable. Here, we will use [Stable Diffusion XL 1.0-base](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) and the [Pokemons dataset](https://huggingface.co/datasets/lambdalabs/naruto-blip-captions).
**___Note: It is quite useful to monitor the training progress by regularly generating sample images during training. [Weights and Biases](https://docs.wandb.ai/quickstart) is a nice solution to easily see generating images during training. All you need to do is to run `pip install wandb` before training to automatically log images.___** **___Note: It is quite useful to monitor the training progress by regularly generating sample images during training. [Weights and Biases](https://docs.wandb.ai/quickstart) is a nice solution to easily see generating images during training. All you need to do is to run `pip install wandb` before training to automatically log images.___**
```bash ```bash
export MODEL_NAME="stabilityai/stable-diffusion-xl-base-1.0" export MODEL_NAME="stabilityai/stable-diffusion-xl-base-1.0"
export VAE_NAME="madebyollin/sdxl-vae-fp16-fix" export VAE_NAME="madebyollin/sdxl-vae-fp16-fix"
export DATASET_NAME="lambdalabs/pokemon-blip-captions" export DATASET_NAME="lambdalabs/naruto-blip-captions"
``` ```
For this example we want to directly store the trained LoRA embeddings on the Hub, so For this example we want to directly store the trained LoRA embeddings on the Hub, so
@@ -219,7 +219,7 @@ You need to save the mentioned configuration as an `accelerate_config.yaml` file
```shell ```shell
export MODEL_NAME="stabilityai/stable-diffusion-xl-base-1.0" export MODEL_NAME="stabilityai/stable-diffusion-xl-base-1.0"
export VAE_NAME="madebyollin/sdxl-vae-fp16-fix" export VAE_NAME="madebyollin/sdxl-vae-fp16-fix"
export DATASET_NAME="lambdalabs/pokemon-blip-captions" export DATASET_NAME="lambdalabs/naruto-blip-captions"
export ACCELERATE_CONFIG_FILE="your accelerate_config.yaml" export ACCELERATE_CONFIG_FILE="your accelerate_config.yaml"
accelerate launch --config_file $ACCELERATE_CONFIG_FILE train_text_to_image_lora_sdxl.py \ accelerate launch --config_file $ACCELERATE_CONFIG_FILE train_text_to_image_lora_sdxl.py \
@@ -62,7 +62,7 @@ check_min_version("0.28.0.dev0")
logger = get_logger(__name__, log_level="INFO") logger = get_logger(__name__, log_level="INFO")
DATASET_NAME_MAPPING = { DATASET_NAME_MAPPING = {
"lambdalabs/pokemon-blip-captions": ("image", "text"), "lambdalabs/naruto-blip-captions": ("image", "text"),
} }
@@ -250,7 +250,7 @@ def parse_args():
dataset_name_mapping = { dataset_name_mapping = {
"lambdalabs/pokemon-blip-captions": ("image", "text"), "lambdalabs/naruto-blip-captions": ("image", "text"),
} }
@@ -387,7 +387,7 @@ def parse_args():
DATASET_NAME_MAPPING = { DATASET_NAME_MAPPING = {
"lambdalabs/pokemon-blip-captions": ("image", "text"), "lambdalabs/naruto-blip-captions": ("image", "text"),
} }
@@ -454,7 +454,7 @@ def parse_args(input_args=None):
DATASET_NAME_MAPPING = { DATASET_NAME_MAPPING = {
"lambdalabs/pokemon-blip-captions": ("image", "text"), "lambdalabs/naruto-blip-captions": ("image", "text"),
} }
@@ -61,7 +61,7 @@ logger = get_logger(__name__)
DATASET_NAME_MAPPING = { DATASET_NAME_MAPPING = {
"lambdalabs/pokemon-blip-captions": ("image", "text"), "lambdalabs/naruto-blip-captions": ("image", "text"),
} }
+3 -3
@@ -37,7 +37,7 @@ You can fine-tune the Würstchen prior model with the `train_text_to_image_prior
<!-- accelerate_snippet_start --> <!-- accelerate_snippet_start -->
```bash ```bash
export DATASET_NAME="lambdalabs/pokemon-blip-captions" export DATASET_NAME="lambdalabs/naruto-blip-captions"
accelerate launch train_text_to_image_prior.py \ accelerate launch train_text_to_image_prior.py \
--mixed_precision="fp16" \ --mixed_precision="fp16" \
@@ -72,10 +72,10 @@ In a nutshell, LoRA allows adapting pretrained models by adding pairs of rank-de
### Prior Training ### Prior Training
First, you need to set up your development environment as explained in the [installation](#Running-locally-with-PyTorch) section. Make sure to set the `DATASET_NAME` environment variable. Here, we will use the [Pokemon captions dataset](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions). First, you need to set up your development environment as explained in the [installation](#Running-locally-with-PyTorch) section. Make sure to set the `DATASET_NAME` environment variable. Here, we will use the [Pokemon captions dataset](https://huggingface.co/datasets/lambdalabs/naruto-blip-captions).
```bash ```bash
export DATASET_NAME="lambdalabs/pokemon-blip-captions" export DATASET_NAME="lambdalabs/naruto-blip-captions"
accelerate launch train_text_to_image_lora_prior.py \ accelerate launch train_text_to_image_lora_prior.py \
--mixed_precision="fp16" \ --mixed_precision="fp16" \
@@ -55,7 +55,7 @@ check_min_version("0.28.0.dev0")
logger = get_logger(__name__, log_level="INFO") logger = get_logger(__name__, log_level="INFO")
DATASET_NAME_MAPPING = { DATASET_NAME_MAPPING = {
"lambdalabs/pokemon-blip-captions": ("image", "text"), "lambdalabs/naruto-blip-captions": ("image", "text"),
} }
@@ -56,7 +56,7 @@ check_min_version("0.28.0.dev0")
logger = get_logger(__name__, log_level="INFO") logger = get_logger(__name__, log_level="INFO")
DATASET_NAME_MAPPING = { DATASET_NAME_MAPPING = {
"lambdalabs/pokemon-blip-captions": ("image", "text"), "lambdalabs/naruto-blip-captions": ("image", "text"),
} }