Online shopping has become increasingly popular in recent years, but one persistent challenge has been the difficulty of knowing how clothes will look and fit before making a purchase. This often leads to disappointment and a high rate of returns when items don’t match expectations. However, a new AI-powered solution is poised to change the game – virtual try-on for apparel.
How Virtual Try-On Technology Works
Virtual try-on solutions employ two main approaches:
- Real-time solutions use a live video feed from the user's camera and computer vision algorithms to track the user's movements, continuously updating the try-on display as the user moves.
- Non-real-time solutions let users upload a photo or video of themselves and show how the product would look in that static image or video, without requiring a live camera feed (see the sketch after this list for the structural difference).
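To make the distinction concrete, here is a minimal sketch (not from the original article) of how the two approaches differ in structure. The try-on step itself is a placeholder function; in a real system it would be an AI model like the one we build later in this post.
# Minimal sketch contrasting the two approaches (assumes OpenCV is installed).
# apply_virtual_tryon() is a placeholder standing in for the actual try-on model.
import cv2

def apply_virtual_tryon(frame):
    # Placeholder: a real implementation would composite the garment onto the person.
    return frame

# Non-real-time: process a single uploaded photo once.
photo = cv2.imread("customer_photo.jpg")  # example path
cv2.imwrite("preview.jpg", apply_virtual_tryon(photo))

# Real-time: run the same try-on step on every frame of a live camera feed.
cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow("Virtual try-on", apply_virtual_tryon(frame))
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()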
Benefits of Virtual Try-On for Customers and Brands:
For customers, virtual try-on provides a more confident and satisfying shopping experience by reducing the uncertainty of online purchases. It allows them to test products at their own pace until they find the perfect fit, reducing the likelihood of returns.
For fashion brands, virtual try-on can boost conversion rates, reduce costs associated with returns, and provide valuable data on customer preferences and buying behaviour. By enhancing the online shopping experience and reducing friction, virtual try-on can help brands stay competitive in the rapidly evolving e-commerce landscape.
Let’s build our own virtual try-on:
Virtual try-on technology is in active development, with major tech companies and fashion brands investing heavily in its advancement. As AI and computer vision technologies continue to evolve, we can expect virtual try-on solutions to become increasingly sophisticated and accurate. Check out Google's virtual try-on feature, which is currently under development.
While virtual try-on tech is getting a lot of attention from big companies, you don't have to wait for them. With the power of open-source AI and computer vision models, we can create our own virtual try-on experience right now. We'll be building it with a powerful combination of two champions:
1. Stable-diffusion-xl-inpainting
2. IP-Adapter
Stable Diffusion Inpainting: This powerhouse diffusion model takes text descriptions and brings them to life as photorealistic images. The magic doesn't stop there: it's also an expert at inpainting, which means it can seamlessly fill in specific areas of an image based on your instructions. Define a mask over the area where you want new clothes to appear, and Stable Diffusion Inpainting, guided by IP-Adapter's image prompt, will generate a realistic image of you wearing those clothes!
IP-Adapter: This isn't an inpainting model at all. It's an image prompt adapter, specifically designed for diffusion models like Stable Diffusion. Think of it as a translator between images and prompts: IP-Adapter can take an image you provide (like a specific dress) and turn it into a prompt that Stable Diffusion can understand.
Implementation:
Install Libraries
pip install diffusers transformers accelerate
Import Libraries
from diffusers import AutoPipelineForInpainting, AutoencoderKL
from diffusers.utils import load_image
import torch
Load the Pipeline
This VAE is a version of the SDXL VAE patched to work in fp16, which reduces the amount of GPU memory needed to run the model:
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
Let's load the SDXL 1.0 Inpainting model into a pipeline:
pipeline = AutoPipelineForInpainting.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",
    vae=vae,
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
).to("cuda")
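If GPU memory is still tight, diffusers can also offload the pipeline's sub-models to the CPU between uses. As a sketch (an alternative to the .to("cuda") call above, not part of the original setup), you could enable model offloading instead:
# Alternative to .to("cuda"): keep sub-models on the CPU and move each one to the
# GPU only while it is needed (requires accelerate to be installed).
pipeline.enable_model_cpu_offload()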
This is where we load the IP-Adapter. This adapter connects to our inpainting pipeline and allows us to give the pipeline an image prompt (or IP):
pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter_sdxl.bin", low_cpu_mem_usage=True)
Loading the Main Image:
We'll begin by loading the main image containing the person of interest. Ideally the image should have a resolution around 1024×1024 pixels, since that is the resolution SDXL was trained at. The load_image() function lets you load an image from either a local file path or a publicly accessible URL, and it returns a PIL (Python Imaging Library) image, so any format PIL can open (JPG, PNG, etc.) will work.
image = load_image("/content/white_tshirt.jpg").convert("RGB")
image
Next, we'll load our clothing image. I highly suggest using a clothing image that doesn't have a person in it, like the example below.
ip_image = load_image("/content/yellow-tshirt.png").convert("RGB")
ip_image
Finally, we'll need to create a mask for the inpainting pipeline. There are a number of ways to do this: we can draw the mask by hand or (since we're all AI enthusiasts here) use an image segmentation model. The main benefit of using an image segmentation model is that it generates the mask automatically, so all we need to provide is the person image and the clothing image, and the code takes care of the rest.
Another thing we need to consider is which part of the person we are masking. The torso? The arms? The legs? The whole body? The answer depends on what type of clothing is being diffused into the image. For example, a sleeveless mini dress covers only part of the torso and legs, whereas a long-sleeve gown covers almost the entire body. To account for most types of clothing, I propose masking the entire body. It is important not to mask the face, so that the face in the final image looks like the face in the original image.
To do the body segmentation, let's use my SegBody module, which can generate a body mask with one import statement and one line of code:
from SegBody import segment_body
seg_image, mask_image = segment_body(image, face=False)
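If you don't have the SegBody module available, a similar body mask can be produced with an off-the-shelf human-parsing model. The sketch below is an alternative of my own, not part of the original tutorial: it assumes the publicly available mattmdjaga/segformer_b2_clothes checkpoint on Hugging Face and keeps the background, face, and hair out of the mask (the label names are read from that checkpoint's config).
# Hedged alternative to SegBody: build a body mask with a SegFormer human-parsing model.
# Assumes the mattmdjaga/segformer_b2_clothes checkpoint; label names come from its config.
import torch
from PIL import Image
from transformers import SegformerImageProcessor, AutoModelForSemanticSegmentation

processor = SegformerImageProcessor.from_pretrained("mattmdjaga/segformer_b2_clothes")
seg_model = AutoModelForSemanticSegmentation.from_pretrained("mattmdjaga/segformer_b2_clothes")

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = seg_model(**inputs).logits

# Upsample the low-resolution logits back to the original image size (PIL size is (width, height)).
logits = torch.nn.functional.interpolate(
    logits, size=image.size[::-1], mode="bilinear", align_corners=False
)
labels = logits.argmax(dim=1)[0]

# White (255) = regions to inpaint; leave the background, face, and hair untouched.
keep = {"Background", "Face", "Hair", "Hat", "Sunglasses", "Bag"}
mask = torch.zeros_like(labels, dtype=torch.uint8)
for idx, name in seg_model.config.id2label.items():
    if name not in keep:
        mask[labels == idx] = 255

mask_image = Image.fromarray(mask.cpu().numpy(), mode="L")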
Generate Image:
The inpainting model should prioritize the clothing image (IP image) completely, so set_ip_adapter_scale must be set to 1.0. This ensures the generated clothing aligns closely with the provided clothing image.
pipeline.set_ip_adapter_scale(1.0)
With everything prepared – the inpainting pipeline, loaded images, and chosen parameters – we can finally generate the image! I achieved good results with these settings, but feel free to experiment and adjust them to your liking.
final_image = pipeline(
    prompt="photorealistic, perfect body, beautiful skin, realistic skin, natural skin",
    negative_prompt="ugly, bad quality, bad anatomy, deformed body, deformed hands, deformed feet, deformed face, deformed clothing, deformed skin, bad skin, leggings, tights, stockings",
    image=image,
    mask_image=mask_image,
    ip_adapter_image=ip_image,
    strength=0.99,
    guidance_scale=7.5,
    num_inference_steps=100,
).images[0]
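To keep the result, save it to disk (the filename here is just an example) or simply display it in a notebook cell:
final_image.save("virtual_tryon_result.png")  # example filename
final_image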
Result:
And there we have it! We've built a functional virtual try-on system with just a dozen or so lines of Python code. That's impressive, isn't it?
The Future of Virtual Try-On:
As AI and computer vision technologies continue to advance, virtual try-on is poised to become an essential feature for fashion brands and retailers. By providing a more engaging, personalized, and convenient shopping experience, virtual try-on has the potential to revolutionize the way we discover, try on, and purchase clothes in the digital age.
In the near future, we can expect to see virtual try-on solutions become increasingly sophisticated, with features such as real-time fabric simulation, personalized recommendations based on user preferences, and seamless integration with e-commerce platforms. As consumer expectations continue to evolve, brands that embrace virtual try-on technology will be well-positioned to thrive in the competitive online retail landscape.