MIT, Google Using Synthetic Images to Train AI Image Models

Researchers describe a new method for creating highly detailed AI images, using training data made up of AI-generated images

Ben Wodecki, Junior Editor - AI Business

November 28, 2023

MIT and Google researchers developed a new technique that helps image generation models produce highly detailed images. (Image: AI Business via DALL-E 3)

Upon launch, OpenAI’s DALL-E 3 wowed users with its ability to generate highly detailed images compared to prior versions. OpenAI said the model's improved ability came from using synthetic images in training. Now, a team of researchers from MIT and Google is expanding on this concept, applying it to the popular open source text-to-image model Stable Diffusion.

In a newly published paper, the researchers described a new approach, called StableRep, that uses AI-generated images to train image generation models. It draws on millions of labeled synthetic images to produce high-quality results.

The researchers describe StableRep as a “multi-positive contrastive learning method,” in which multiple images generated from the same text prompt are treated as positives for each other, enhancing the learning process. That means a model would view several variations of, for example, a landscape and cross-reference them with all descriptions related to that landscape, learning to recognize nuances across those images. It then applies what it has learned in the final output, which is what produces a highly detailed image.
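To make the idea concrete, here is a minimal NumPy sketch of a multi-positive contrastive loss, where images generated from the same caption are treated as positives for one another. This is an illustration of the general technique, not the authors' implementation; the function name, the temperature value, and the toy data are all assumptions for this example.

```python
import numpy as np

def multi_positive_contrastive_loss(embeddings, caption_ids, temperature=0.1):
    """Illustrative multi-positive contrastive loss (not the StableRep code).

    Images generated from the same text prompt (same caption_id) are
    treated as positives for each other; all other images are negatives.
    """
    # L2-normalize embeddings so similarities are cosine similarities
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = (z @ z.T) / temperature  # pairwise scaled similarities

    n = len(caption_ids)
    total = 0.0
    for i in range(n):
        # positives: other images generated from the same caption
        pos = [j for j in range(n) if j != i and caption_ids[j] == caption_ids[i]]
        others = [j for j in range(n) if j != i]
        # contrastive objective: pull positives together, push others apart
        log_denom = np.log(np.sum(np.exp(sim[i, others])))
        total += -np.mean(sim[i, pos]) + log_denom
    return total / n

# Toy usage: four images, two captions (two images per caption)
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(4, 8))
caption_ids = [0, 0, 1, 1]
loss = multi_positive_contrastive_loss(embeddings, caption_ids)
```

The loss is lowest when embeddings of same-caption images cluster tightly while different captions stay apart, which is the pressure that pushes the model toward the shared concept behind the images rather than their individual pixels.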

Outperforms rivals

The MIT and Google researchers trained StableRep on images generated by Stable Diffusion, and found it outperformed rival methods such as SimCLR and CLIP, which were trained with the same text prompts and corresponding real images.

StableRep achieved 76.7% linear accuracy on ImageNet classification with a Vision Transformer model. With added language supervision, the researchers found that StableRep trained on 20 million synthetic images outperformed CLIP trained on 50 million real images.

Lijie Fan, a doctoral candidate at MIT and the lead researcher, said the technique is superior because it is doing more than “just feeding it data.” “When multiple images, all generated from the same text, are treated as depictions of the same underlying thing, the model dives deeper into the concepts behind the images, say the object, not just their pixels.”

StableRep does have its flaws. Image generation is slow, for example, and the method can be tripped up by semantic mismatches between text prompts and the resulting images.

StableRep’s underlying model, Stable Diffusion, also needed to go through an initial round of training on real data – so using StableRep to create images will take longer and likely be costlier.

Access StableRep

StableRep can be accessed via GitHub.

It is available for commercial use: StableRep is released under the Apache 2.0 license, meaning you can use it and produce derivative works.

However, you would have to provide a copy of the Apache License with any redistributed work or derivative works and include a notice of the changes. The license also includes a limitation of liability, where contributors are not liable for any damages arising from the use of the licensed work.

This article first appeared on IoT World Today's sister site, AI Business.

About the Author(s)

Ben Wodecki

Junior Editor - AI Business

Ben Wodecki is the junior editor of AI Business, covering a wide range of AI content. Ben joined the team in March 2021 as assistant editor and was promoted to junior editor. He has written for The New Statesman, Intellectual Property Magazine, and The Telegraph India, among others.
