Fine-tuning large models is typically done on CUDA-enabled devices. Whether using consumer-grade GPUs or specialized AI accelerator cards, the cost is often high. These setups also demand substantial power and efficient cooling, which usually requires a large desktop workstation. Alternatively, you can rent cloud computing resources by the hour using platforms like Runpod or Lambda.ai. However, this still incurs significant costs and often requires considerable time to upload data from your local machine to the cloud.
Since Apple introduced its Apple Silicon chips, PyTorch has added support for the MPS (Metal Performance Shaders) backend on M1 and later devices, significantly improving compute performance on macOS. Thanks to Apple Silicon's unified memory architecture, it's possible to load larger models than most consumer GPUs can handle, easing the constraints imposed by limited VRAM. This allows developers to fine-tune models locally while still enjoying the portability of a laptop.
Compared to bulky desktop workstations, this offers a more convenient alternative—although it’s worth noting that in terms of raw training speed, CUDA-based setups still have a clear advantage.
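Before starting, it's worth confirming that your PyTorch build actually exposes the MPS backend. A quick check:

import torch

# Check that this PyTorch build was compiled with MPS support and
# that the Metal GPU is reachable at runtime.
print(torch.backends.mps.is_built())      # compiled with MPS support?
print(torch.backends.mps.is_available())  # Apple Silicon GPU usable now?

# Prefer the GPU when available, otherwise fall back to the CPU.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
print(device)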
In this experiment, I used the Diffusers library from Hugging Face along with its full-parameter fine-tuning example for Stable Diffusion XL (SDXL).
Device Information
Virtual Environment Setup
Python 3.12
Step 1. Download and install the Diffusers library
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install .
Step 2. Navigate to the fine-tuning example code
cd examples/text_to_image
python3.12 -m venv venv
source ./venv/bin/activate
pip install -r requirements_sdxl.txt
Step 3. Set up logging [Optional]
pip install wandb
wandb login # Enter your API key
Step 4. Start training using the provided train_text_to_image_sdxl.py script.
export MODEL_NAME="stabilityai/stable-diffusion-xl-base-1.0"
export VAE_NAME="madebyollin/sdxl-vae-fp16-fix"
export DATASET_NAME="lambdalabs/naruto-blip-captions"
python train_text_to_image_sdxl.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--pretrained_vae_model_name_or_path=$VAE_NAME \
--dataset_name=$DATASET_NAME \
--resolution=512 \
--center_crop \
--random_flip \
--report_to="wandb" \
--proportion_empty_prompts=0.2 \
--train_batch_size=1 \
--gradient_accumulation_steps=4 \
--gradient_checkpointing \
--max_train_steps=10000 \
--learning_rate=1e-06 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--validation_prompt="a cute Sundar Pichai creature" \
--validation_epochs 5 \
--checkpointing_steps=5000 \
--output_dir="sdxl-naruto-model"
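Before kicking off a multi-hour run, it can be worth previewing the dataset to see what the images and captions actually look like. A minimal sketch using the datasets library (installed via requirements_sdxl.txt), assuming the dataset's default image/text columns:

from datasets import load_dataset

# Download (or reuse the cached copy of) the Naruto BLIP captions dataset.
dataset = load_dataset("lambdalabs/naruto-blip-captions", split="train")
print(len(dataset))        # number of samples
print(dataset[0]["text"])  # a BLIP-generated caption
dataset[0]["image"].save("sample.png")  # the paired image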
Training Statistics
- Dataset size: 1,221 samples (~740 MB download)
- Model size: approximately 15 GB download
- Speed: 5–6 seconds per iteration; estimated total time 17–18 hours
- Memory usage: 64–70 GB RAM
- Disk space per checkpoint: around 30 GB
Note
- Validation images are not saved by default
If you don't enable TensorBoard or wandb logging, the validation images are generated but never written to disk. You need to add a small amount of code to save them:
# After the validation images are generated, add the following
# (just before the `for tracker in accelerator.trackers:` loop):
import os

validation_images_dir = os.path.join(args.output_dir, "validation_images")
os.makedirs(validation_images_dir, exist_ok=True)
for i, image in enumerate(images):
    filename = f"{args.validation_prompt.replace(' ', '_')}_step_{global_step}_{i}.png"
    image.save(os.path.join(validation_images_dir, filename))
- "accelerate config" and "accelerate launch" commands do not work
-
"enable_xformers_memory_efficient_attention" does not support xformers
-
"use_8bit_adam" is not supported because bitsandbytes lacks GPU support
-
"mixed_precision="fp16"" is not supported, and bf16 is also unsupported
Step 5. Use the trained model for inference
from diffusers import DiffusionPipeline
import torch

# Path to the fine-tuned model, i.e. the --output_dir used during training.
MODEL_PATH = "path/your/model"

# Load the pipeline in half precision and move it to the Apple GPU.
pipeline = DiffusionPipeline.from_pretrained(MODEL_PATH, torch_dtype=torch.float16).to("mps")

prompt = "A naruto with green eyes and red legs."
image = pipeline(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("naruto.png")
For more details, refer to the official Hugging Face Diffusers SDXL training guide.