Chameleon - Zero to Model in half a day.
- chrislongstaff1
- Oct 15
- 7 min read
Updated: Oct 16

Introduction
Yes, it's possible: from nothing - no data, no model - to a trained vision model in half a day. Let me show you how ...
One of the key benefits of Synthera's Chameleon is the speed with which you can prototype a concept and produce a working model within hours, without the risk of compromising yourself or your company by using data from unknown or, worse, illegal sources.
In this blog, I'm going to run through the steps, using the example of building an object detection model that detects the license plate on a vehicle. It's a simple example, but it covers the key stages:
Define the problem
Gather the Data
Train the model
Test the model
Define the problem
This example problem is to locate the license plate (number plate for fellow Brits like me!) on a vehicle. This is typically the first step for any sort of OCR, which could be used for ALPR/ANPR (perhaps a topic for a later blog!), or, as another example, for identifying areas of an image to redact or blur for privacy reasons. The model should work across multiple countries and vehicle types, so the training data will need a wide variety of vehicle types and colors, each with a license plate. In our use case, we will create vehicles with plates from differing geographies (such as the US, UK, Japan, ...) but in practice the focus would clearly be on the area of deployment.
Gather the data
As this is a proof of concept only, we plan to use a single scene (environment) to set up and gather the data: a typical city block. This obviously limits the robustness of the model, but the objective here is to demonstrate the ease of use of the Chameleon tool and the effectiveness of the output data, not to create a fully robust model.
About Vehicles in Chameleon
To get the data we need, we will use a wide variety of vehicles. In Chameleon we can place vehicles directly into the scene, as shown in the video below. As you can see, it is possible to change the color of the vehicle and to fit license plates on the front and rear that can be randomized for both the number and the state/country.
The numbers for the license plates are stored in simple text files, a separate file for each country/region and vehicle type (e.g. passenger, commercial, military, diplomatic), so they can easily be edited and extended to meet specific requirements. In this case we don't need to worry too much about specifics, as we are only interested in having a variety of letters/numbers in the plates; since we are not using them for OCR, we do not need to be concerned with bias/diversity in this dataset.
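To illustrate how easy per-region plate lists are to work with, here is a minimal Python sketch that reads one plate per line and samples from the result. The file name and the comment convention are my own assumptions, not Chameleon's actual file format:

```python
import random

def load_plates(path):
    """Read one plate number per line, skipping blanks and '#' comments."""
    with open(path) as f:
        return [line.strip() for line in f
                if line.strip() and not line.startswith("#")]

# Build a small UK passenger-car list (contents are illustrative).
with open("uk_passenger.txt", "w") as f:
    f.write("AB12 CDE\nXY65 ZZZ\n# retired series\nGH04 JKL\n")

plates = load_plates("uk_passenger.txt")
sampled = random.choice(plates)  # e.g. "XY65 ZZZ"
```

Adding a new region or vehicle type is then just a matter of dropping in another text file.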

Automation
To create the required large datasets effectively, with minimal user intervention, Chameleon implements several automation features that allow large, relevant datasets to be created in a minimum amount of time. This section talks through some of the features used and helps the reader understand what is needed to obtain a large, diverse, but relevant dataset.
Picklists
Manually placing the vehicles would be tedious and ineffective. We overcome this in Chameleon by using "Picklists": lists of defined assets. They can be used in two ways. The first is as a method to spawn items into the scene (see video below). This allows a large number of random vehicles to be placed, but it still has the disadvantage of requiring manual placement and is therefore not effective for our requirements.
Generators
Instead, we utilize Chameleon's "Generators". Generators take a picklist and automatically populate the generator area at runtime with a pseudo-random element from the picklist. In our case, the generator will also randomize the license plates and the color of the vehicle - see the video below. In our city scene, cars placed on the road will also automatically follow the road route, including stopping at lights as required.
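The idea behind a generator can be sketched in a few lines: sample a pseudo-random asset, color and plate from picklists each time a slot is populated. The lists and field names below are illustrative, not Chameleon's internal schema:

```python
import random

# Hypothetical picklists; real Chameleon picklists reference scene assets.
VEHICLES = ["sedan", "hatchback", "suv", "pickup", "van"]
COLORS = ["red", "blue", "white", "black", "silver"]
PLATES = ["AB12 CDE", "XY65 ZZZ", "GH04 JKL"]

def generate_vehicle(rng):
    """Pick a pseudo-random vehicle, color and plate, as a generator
    would at runtime for each spawn point."""
    return {
        "model": rng.choice(VEHICLES),
        "color": rng.choice(COLORS),
        "plate": rng.choice(PLATES),
    }

rng = random.Random(42)  # seeded so a run is reproducible
fleet = [generate_vehicle(rng) for _ in range(10)]
```

Seeding the random source is worth doing in any synthetic pipeline, so a dataset can be regenerated exactly if needed.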
Cloud Cameras
Chameleon has sophisticated camera handling, including the ability to change camera parameters, use different camera types such as body cameras and drone cameras, and handle multiple cameras per scene, allowing simultaneous multi-view processing to be developed.
In this use case, to enable us to rapidly generate randomized views of license plates, we use the "Cloud Camera". This concept allows you to place a focal point, or focal volume, at which a camera will point. Around that focal area is a defined annulus. At run time, the camera will randomly be assigned a position on (or, if desired, above or below) the annulus, with a defined max/min angle of view to the focal point. The position of the camera can be changed at user-selected intervals, up to a new position for every frame.
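The geometry behind this is straightforward to sketch. The following is my own illustrative reconstruction, not Chameleon code: pick a radius within the annulus, a random azimuth, and an elevation within the allowed viewing-angle range, then convert to a position around the focal point:

```python
import math
import random

def sample_cloud_camera(focus, r_min, r_max, elev_min_deg, elev_max_deg,
                        rng=random):
    """Sample a camera position at distance [r_min, r_max] from the focal
    point, with the elevation angle constrained to the given range."""
    r = rng.uniform(r_min, r_max)                      # within the annulus
    azimuth = rng.uniform(0.0, 2.0 * math.pi)          # any direction around
    elev = math.radians(rng.uniform(elev_min_deg, elev_max_deg))
    x = focus[0] + r * math.cos(elev) * math.cos(azimuth)
    y = focus[1] + r * math.cos(elev) * math.sin(azimuth)
    z = focus[2] + r * math.sin(elev)                  # height above focus
    return (x, y, z)

# A new viewpoint 5-10 m out, 10-45 degrees above the plate, every frame:
pos = sample_cloud_camera((0.0, 0.0, 0.5), 5.0, 10.0, 10.0, 45.0,
                          random.Random(1))
```

Resampling this per frame is what gives the dataset its wide spread of plate scales and perspectives.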
Digital Humans
As a common occlusion for an LPR camera will be humans, it is important to have a wide selection of humans within our images. Chameleon uses unique digital human technology to ensure that a wide diversity of synthetic humans can be used. A specific population diversity can be selected, covering height ranges, skin tone ranges and so on, or a completely random population can be used. The video below shows the Chameleon Digital Human Editor, where a template can be created to set the required parameters for your digital humans.
Passes
One of the key benefits of synthetic data is the ability of the platform to simulate differing lighting, time of day and weather conditions, as well as to vary, for example, pedestrian routes and pedestrian population make-up. Chameleon includes a "Pass System" which allows the user to vary simulation parameters between simulations automatically, so a simulation can be run at, say, 00:00:00, 07:30:00 and 12:00:00 with the other parameters unchanged, just by setting the starting conditions for each pass, as shown in the video below.
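Conceptually, a pass schedule is just a list of parameter overrides applied to a common base configuration. The keys and values below are an illustrative sketch, not Chameleon's real schema:

```python
# Base settings shared by every pass (illustrative keys only).
base = {"scene": "smart_city", "capture_fps": 1}

# Each pass overrides only the starting conditions that change.
passes = [
    {"time_of_day": "00:00:00", "weather": "clear"},
    {"time_of_day": "07:30:00", "weather": "rain"},
    {"time_of_day": "12:00:00", "weather": "overcast"},
]

# Merge base + per-pass overrides into one run config per pass.
runs = [{**base, **p} for p in passes]
```

Each entry in `runs` then drives one full simulation, giving night, rush-hour and midday imagery from identical scenario logic.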
Putting the scenario together
The scenario editor makes it straightforward to create the data that we need. Putting together all the automation elements, we place our generators on the roads, where cars will automatically follow simple traffic rules and drive around our scene. We create routes for pedestrians using circuits, so they can cross the roads and cause the typical occlusions an ALPR camera would experience while looking at license plates. The key steps to the data are:
1) Choose the environment (scene) - Smart City
2) Set up the "actors" - in this instance the vehicles driving on the streets and the digital humans walking them; we use generators to randomize these at run time.
3) Add the cameras - we use cloud cameras and regular cameras to maximize diversity.
4) Set up environmental conditions (time of day, lighting, weather) - we use the pass system to set up differing times of day and weather conditions.
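Pulling those four steps together, the whole scenario can be thought of as one nested configuration. The structure shown is a hypothetical sketch, not Chameleon's actual scenario format:

```python
# Hypothetical scenario assembly; every key here is illustrative.
scenario = {
    "scene": "smart_city",
    "actors": {
        "vehicle_generators": {"picklist": "city_vehicles",
                               "randomize": ["color", "plate"]},
        "pedestrians": {"template": "diverse_population",
                        "routes": "circuits"},
    },
    "cameras": ["cloud_camera", "static_camera"],
    "passes": [
        {"time_of_day": "00:00:00", "weather": "clear"},
        {"time_of_day": "07:30:00", "weather": "rain"},
    ],
}
```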
The Simulation
After the scenario is set up and the passes are defined, it is time, in the words of the film director, for "Action!". We set up the simulation parameters using a script. As we want good variation in the data, we use a relatively long simulation time but a low capture rate: from an ALPR point of view, the entropy between two consecutive frames at 30/60 fps is minimal, so we select 1 fps, which balances entropy against capturing important events. As we are targeting training, we also set the resolution to 640x640; there is little point in using a higher resolution, as inferencing systems will likely scale any camera input down to that size to help with real-time performance on limited hardware. We also record only the required annotations, as writing unneeded annotations to disk would slow the simulation.
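A quick back-of-envelope check shows what those capture settings yield. The pass length below is illustrative; the run times of the blog's actual simulation are not stated:

```python
CAPTURE_FPS = 1          # low rate: consecutive 30/60 fps frames add little entropy
RESOLUTION = (640, 640)  # matches the input size real-time detectors scale to

def frames_captured(sim_seconds, capture_fps=CAPTURE_FPS):
    """How many training images a single simulation pass yields."""
    return int(sim_seconds * capture_fps)

per_pass = frames_captured(30 * 60)   # a 30-minute pass -> 1800 frames
total = per_pass * 3                  # three passes (different times of day)
```

Even at 1 fps, a few passes of modest length produce thousands of images, which is why the low capture rate costs so little.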
When the simulation has run, we can check the output data. The annotations are written in a documented format and are easily converted to YOLO, Pascal VOC, COCO JSON or other required formats. In this case I used some simple Python code to extract the license plate bounding boxes and write the dataset in YOLO format.
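The conversion step is simple enough to show. A YOLO label is one line per object: class id followed by the box centre and size, all normalized to the image dimensions. The pixel coordinates below are made up for illustration:

```python
def to_yolo(xmin, ymin, xmax, ymax, img_w, img_h, class_id=0):
    """Convert a pixel-space bounding box to a YOLO label line:
    'class cx cy w h', with all four values normalized to [0, 1]."""
    cx = (xmin + xmax) / 2.0 / img_w
    cy = (ymin + ymax) / 2.0 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"

# A plate spanning (100, 300)-(260, 340) in a 640x640 frame:
print(to_yolo(100, 300, 260, 340, 640, 640))
# -> 0 0.281250 0.500000 0.250000 0.062500
```

One such `.txt` file per image, alongside the images, is all YOLO-style trainers need.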
Training
Having created our dataset, we need to train a model. For this we will use Roboflow. The video below shows the process, but it is relatively straightforward:
1) Upload the Synthetic data
2) Split the Synthetic data between the train and validation data sets
3) Optionally upload some real data for the test set. This real data can also be used in the validation set.
4) IMPORTANT - check the labels are aligned, and rename the class labels if required.
5) Select your model and decide if you want to do any pre-processing or augmentation.
6) Run the training. Depending on dataset size, this can take anywhere from 30 minutes to several hours.
Once the model is complete - see ours here - it is time to do some testing.
We found some test images on Kaggle and ran them through the model:
You can see the results of the model inferencing on our test data below.
Summary
We set out to show that we could train a model on synthetic data in a few hours, and we have achieved that goal. The model is of course not perfect: it has some lower confidences, some missed detections, and some false positives. But for going from zero at 08:00 to a model performing this well by lunchtime ... you can make your own decision, but I think we can be more than satisfied with the result.
This wouldn't have been possible without the robust automation features built into Chameleon, which let us rapidly produce robust, relevant data:
Generators using picklists for vehicles and digital human templates for pedestrians
Vehicles with randomized colors and license plates, following random paths on city streets
Digital humans used for pedestrians to maximize variation in clothing, size, age and skin tone
Passes used to automatically run at varying times of day and weather conditions
Cloud Cameras used to change camera angle/view point
Auto Traffic system used to drive cars and stop at lights
