
I can just use AI generated images for training. Can't I?

  • chrislongstaff1
  • Sep 3
  • 4 min read

Updated: Sep 16

The pitfalls of Gen AI for training networks


Introduction


The introduction of Generative Adversarial Networks (GANs) by Ian Goodfellow in 2014 earned him the title of "The GANfather" from the respected MIT Technology Review. This introduction of GAN techniques paved the way for countless text-to-image models and, more importantly, spurred research and development in the field, producing competing techniques such as diffusion models.

Today we see both techniques in common use in countless AI-based image and video creation tools, such as those on This X does not exist, OpenAI's Sora, and Midjourney.



Wow, those images look amazing


You're right: we can use many web-based tools, or even download and train our own networks, to create images that at first glance appear amazing (provided you have the vast compute resources and training data such a task demands; online speculation suggests Google could have used over a billion minutes of video to train Veo). Indeed, if I were creating a PowerPoint presentation for an internal meeting, I wouldn't hesitate to use an image like the one below.


"Generate an image of a child crossing the road in a busy city"

I sense a "But" coming here


You are of course correct! There is a but, a big BUT.

For many use cases, especially in ML, those images have serious issues.


  1. Hallucinations

Hallucinations are content the model imagines that isn't really there, or things placed inaccurately. Look at the image above of the child crossing the road. Several issues are immediately apparent: on a busy street it is unlikely that a child would be alone in the middle of the road; the lighting reflected on the damp road is inconsistent; the traffic lights face the wrong way up a one-way street; the depth of field is very limited; and so on.


Some more examples below:


A city at night, where cars appear to be driving the wrong way and, even worse, halfway down the street they switch direction. People seem to be walking up and down the road from the crossing rather than crossing the street, and for some reason are standing in a very straight line! Text on the buildings and buses is mostly illegible. I could go on, but you get the idea: an ML model trained on this type of content is going to learn some very bad habits indeed!




  2. Rights Issues


What do you mean, rights issues? But my provider x, y, z says I can use the images however I like, so long as it is not for illegal purposes. That is true, but are you confident that all the images used to train the model were lawfully obtained? What happens if someone exercises a right to be forgotten? Are you sure the companies have retrained their models without that person's data?


  3. New Data and Model Collapse


Humans and machines are similar in that they thrive on new data. Generative data is "old data": it doesn't come up with new concepts, it simply mixes old concepts together. Think of it like giving someone eggs, sugar, flour and butter. They can bake a cake. They can play with the quantities to make it richer, sweeter, lighter. But ultimately it will be a plain cake. They will also produce a lot of failed, soggy messes and inedible lumps of flour. They can't create a lemon cake, a fruit cake or a chocolate cake. For that they need new ingredients ("data").

In the worst case, the lack of new data can lead to so-called model collapse, as detailed in the paper by Shumailov et al.
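The decay is easy to see in a toy simulation (a minimal sketch, not Shumailov et al.'s actual experiment): take the simplest possible "model", a Gaussian fitted to data, and let each generation train only on samples drawn from the previous generation's fitted model. All names below are illustrative.

```python
import random
import statistics

random.seed(0)

SAMPLES_PER_GEN = 20
GENERATIONS = 1000

# Generation 0 trains on "real" data: a standard normal distribution.
data = [random.gauss(0.0, 1.0) for _ in range(SAMPLES_PER_GEN)]

spread_history = []
for _ in range(GENERATIONS):
    mu = statistics.mean(data)
    sigma = statistics.stdev(data)
    spread_history.append(sigma)
    # The next generation trains only on samples from the fitted model,
    # never seeing the original data again.
    data = [random.gauss(mu, sigma) for _ in range(SAMPLES_PER_GEN)]

# Diversity (the fitted spread) decays across generations: the tails of
# the original distribution are progressively forgotten.
print(f"initial fitted stdev: {spread_history[0]:.3f}")
print(f"final fitted stdev:   {spread_history[-1]:.6f}")
```

Each refit loses a little of the distribution's tails, and with no fresh data there is nothing to restore them, so the synthetic distribution narrows until the "model" produces near-identical outputs.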


  4. Controllability and Repeatability


Training a model is an iterative process: homing in on the correct parameters, but also homing in on the correct data. As you exercise your model, you will come across more and more use cases where it does very well, but also ones where it fails to perform. For those failure cases, you will often realize that the data you trained on is biased and needs to be augmented with new (but similar) content, such as changed lighting or a different population.
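The key word is "precisely": with synthetic data you can change one scene parameter at a time. As a minimal sketch of that idea (assuming an image is just a grid of 0-255 grayscale values; the function name is illustrative, not from any specific library), here is a lighting change applied as a controlled gain and bias:

```python
def adjust_lighting(image, gain=1.2, bias=10):
    """Rescale pixel intensities to simulate a brighter scene,
    clamping results to the valid 0-255 range."""
    return [
        [min(255, max(0, int(px * gain + bias))) for px in row]
        for px_row in [None] or [] for row in []  # placeholder removed below
    ]

# Simpler, correct version used here:
def adjust_lighting(image, gain=1.2, bias=10):
    return [
        [min(255, max(0, int(px * gain + bias))) for px in row]
        for row in image
    ]

# A tiny 2x3 "image" of mid-grey pixels.
image = [[100, 120, 140], [90, 110, 130]]
brighter = adjust_lighting(image, gain=1.3, bias=15)
print(brighter)  # every pixel lifted by the same controlled amount
```

Because the transform is parameterized, the same failure case can be regenerated at any brightness level, which is exactly the repeatability that scraped or generated images cannot offer.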



Synthera training data showing the small, but important changes in data, precisely controllable

  5. Annotations


And I have saved perhaps the biggest issue for last: annotations. Key to any ML training are the annotations provided with an image. Sure, I can do basic hot-dog/not-hot-dog training without them, but for anyone wanting to do more complex training, accurate and advanced annotations are essential. Even simple bounding boxes are a step too far for AI-generated images, let alone skeleton keypoints, instance and semantic segmentation, pose, gaze direction, distance data, or velocities, all of which are easily and automatically generated for Synthera's computer-graphics-based synthetic data.
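To make the contrast concrete, here is a hypothetical per-object annotation record of the kind a graphics-based pipeline can emit automatically alongside each rendered frame. The field names loosely follow COCO conventions and are purely illustrative; they are not Synthera's actual schema.

```python
# One object's ground truth for one frame. Because the renderer knows the
# full 3D scene state, every value is exact: no human labelling, no label
# noise, and no post-hoc guessing from pixels.
annotation = {
    "image_id": 1042,
    "category": "pedestrian",
    "bbox": [412.0, 188.0, 64.0, 172.0],      # x, y, width, height in pixels
    "keypoints": [
        [430.0, 200.0, 2],                    # x, y, visibility flag
        [444.0, 202.0, 2],
    ],
    "distance_m": 14.7,                       # range from the camera
    "velocity_mps": [1.3, 0.0, 0.1],          # world-frame velocity vector
    "gaze_direction": [0.2, -0.1, 0.97],      # unit vector
}

for field, value in annotation.items():
    print(f"{field}: {value}")
```

Producing any one of these fields for an AI-generated image would require a separate (error-prone) labelling pass; here they all fall out of the renderer for free.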



Synthera Chameleon generates pixel accurate and comprehensive annotations automatically


Conclusion


Extreme care needs to be taken if you are considering using AI-generated data to train or test your ML network. A far better approach is to use a combination of real-world annotated data and 3D computer-graphics-based synthetic data.






