Text-to-Image with StableDiffusionPipeline
In this post, I explore the StableDiffusionPipeline and how to use it to generate photorealistic images from text prompts.
Text-to-Image
Continuing from the previous post, I first set up the environment:
cd stable-diffusion
conda activate ldm
Next, I installed the required libraries, diffusers and transformers:
pip install --upgrade diffusers[torch] transformers
To begin, let’s explore the diffusion pipeline:
from diffusers import DiffusionPipeline
import torch

# Load Stable Diffusion v1.5 in half precision and move it to the GPU
pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipeline.to("cuda")

# Generate an image from a text prompt and save it
image = pipeline("An image of a squirrel in Picasso style").images[0]
image.save("squirrel-image.jpg")
Textual Inversion
As described in the Textual Inversion documentation, the StableDiffusionPipeline supports a fascinating technique that lets models like Stable Diffusion learn new concepts from just a few sample images.
To use a Textual Inversion embedding, let’s download the CharTurner v2 embedding:
wget https://huggingface.co/AmornthepKladmee/embeddings/resolve/main/charturnerv2.pt
Now, let’s apply textual inversion:
from diffusers import StableDiffusionPipeline
import torch

model_id = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

# Load the downloaded embedding and bind it to a prompt token
pipe.load_textual_inversion("./charturnerv2.pt", token="charturnerv2")

# Reference the token in the prompt to activate the learned concept
prompt = "charturnerv2, multiple views of the same character in the same outfit, a fit character for an RPG game in best quality, intricate details."
image = pipe(prompt, num_inference_steps=50).images[0]
image.save("character.png")
Image-to-Image
Next, leveraging the previously generated cartoon-insect.png from the earlier post, let’s explore the image-to-image pipeline:
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline
import torch

model_id = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# Use the previously generated image as the starting point
init_image = Image.open("cartoon-insect.png").convert("RGB")

# Reuse the CharTurner v2 embedding from the previous section
pipe.load_textual_inversion("./charturnerv2.pt", token="charturnerv2")

prompt = "charturnerv2, cartoon insect"
# strength controls how strongly the prompt overrides the initial image
images = pipe(prompt, image=init_image, strength=0.75, guidance_scale=7.5, num_inference_steps=50).images
images[0].save("cartoon-insect-1.png")
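The strength argument sets how much noise is added to the initial image before denoising: lower values stay closer to the input, while higher values follow the prompt more freely. A small sketch to compare a few values (the values and filenames are arbitrary):
# Sweep strength to see how far the output drifts from the input image
for strength in (0.3, 0.5, 0.75, 0.9):
    result = pipe(prompt, image=init_image, strength=strength,
                  guidance_scale=7.5, num_inference_steps=50).images[0]
    result.save(f"cartoon-insect-strength-{strength}.png")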
Animagine XL 2.0
Animagine XL 2.0 is an advanced latent text-to-image diffusion model tailored for creating high-resolution anime images. It is fine-tuned from Stable Diffusion XL 1.0 (SDXL) on a premium anime-style image dataset.
Let’s try out the sample code with some tweaks:
import torch
from diffusers import (
    StableDiffusionXLPipeline,
    EulerAncestralDiscreteScheduler,
    AutoencoderKL
)

# Load the fp16-safe VAE component
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix",
    torch_dtype=torch.float16
)

# Configure the pipeline
pipe = StableDiffusionXLPipeline.from_pretrained(
    "Linaqruf/animagine-xl-2.0",
    vae=vae,
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16"
)

# Swap in the Euler Ancestral scheduler
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")

# Define prompts and generate image
prompt = "face focus, cute, masterpiece, best quality, 1girl, red hair, sweater, looking at viewer, upper body, smiley, outdoors, daylight, blouse, earrings"
negative_prompt = "lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, hat"

image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    width=1024,
    height=1024,
    guidance_scale=12,
    num_inference_steps=50
).images[0]
image.save("./animagine.png")
Stable Diffusion XL
Stable Diffusion XL (SDXL) is a powerful text-to-image generation model that iterates on the previous Stable Diffusion models.
Let’s try text-to-image generation by passing a prompt. By default, SDXL produces 1024x1024 images, which yield the best results.
from diffusers import AutoPipelineForText2Image
import torch

# Load the SDXL base model in half precision and move it to the GPU
pipeline_text2image = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
).to("cuda")

prompt = "Majestic dragon flying, huge fireworks in the form of Happy CNY 2024, detailed, 8k"
image = pipeline_text2image(prompt=prompt).images[0]
image.save("majestic-dragon.png")
Wishing everyone a joyous and prosperous Chinese New Year 2024! Huat ah!