Training AI to know what you actually need to clip

Easy, hard, sometimes impossible: what goes into developing Background Remover, and what is the most challenging part? (Spoiler: it’s not about tracing around your hair.)

Background Remover should work just like that: users upload content, click the button, and receive the exact result that they need. Is it only about clipping the background out? Not really.

How it started

This product has come a long way. For the primary dataset, we used photos from our photo library, Moose, and our retouchers manually clipped the objects in them. To train the neural network, we feed it pairs: an object with its background and the same object without it. The AI learns from these examples. In other words, the result of a real person’s work became an example for the AI.
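A minimal sketch of how such a pair could be turned into a training example, assuming the retoucher's cut-out is stored with an alpha channel (transparent where the background was). The function names and the tiny 2x2 image are hypothetical illustrations, not the actual Background Remover pipeline.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]


def mask_from_cutout(cutout: List[List[RGBA]]) -> List[List[float]]:
    """Turn the alpha channel of a manual cut-out into a 0..1 target mask."""
    return [[pixel[3] / 255.0 for pixel in row] for row in cutout]


def make_training_pair(original: List[List[RGB]], cutout: List[List[RGBA]]):
    """Pair the untouched photo (input) with the mask derived from the cut-out (target)."""
    same_height = len(original) == len(cutout)
    same_widths = all(len(a) == len(b) for a, b in zip(original, cutout))
    if not (same_height and same_widths):
        raise ValueError("original and cut-out must have the same size")
    return original, mask_from_cutout(cutout)


# Hypothetical 2x2 photo: a red object in the left column, dark background on the right.
original = [[(200, 30, 30), (10, 10, 10)],
            [(210, 40, 40), (12, 12, 12)]]
# The retoucher's version: background pixels made fully transparent.
cutout = [[(200, 30, 30, 255), (0, 0, 0, 0)],
          [(210, 40, 40, 255), (0, 0, 0, 0)]]

image, mask = make_training_pair(original, cutout)
```

Here the network's input would be `image` and its target would be `mask`: 1.0 where the retoucher kept a pixel, 0.0 where it was removed.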

Just cut it out

In some cases, our tool already meets the user’s expectations. Background Remover does an excellent job with images that focus on one subject. Here it performs as a real one-click tool.

Easy case: Background Remover just cuts out the environment

But if you train the network only to cut out simple backgrounds and collect a dataset of objects on solid-colored backgrounds, the neural network will learn to remove the color, not the background. This will not work on real data: users upload any pictures they want, with all kinds of backgrounds and scenes. Which of these should the dataset include?
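To see why "removing the color" is not the same as removing the background, here is a toy sketch. The hand-written color-key rule below stands in for what a network might learn from solid backgrounds only; it is an illustration under that assumption, not the real model, and all names and pixel values are made up.

```python
def color_key_mask(image, tolerance=10):
    """Mark a pixel as foreground (1) unless it is close to the top-left corner color."""
    key = image[0][0]

    def is_background(pixel):
        return all(abs(c - k) <= tolerance for c, k in zip(pixel, key))

    return [[0 if is_background(p) else 1 for p in row] for row in image]


WHITE, RED = (255, 255, 255), (200, 30, 30)
GREEN, BLUE = (30, 160, 60), (40, 60, 200)

# The same subject (one red pixel in the center) on two different backgrounds.
solid = [[WHITE, WHITE, WHITE],
         [WHITE, RED, WHITE],
         [WHITE, WHITE, WHITE]]
busy = [[GREEN, BLUE, GREEN],
        [BLUE, RED, BLUE],
        [GREEN, BLUE, GREEN]]
truth = [[0, 0, 0],
         [0, 1, 0],
         [0, 0, 0]]

solid_mask = color_key_mask(solid)  # matches truth: the rule looks perfect
busy_mask = color_key_mask(busy)    # keeps the blue pixels too: the rule breaks
```

On the solid background the rule reproduces the correct mask, so a dataset of such images would never reveal the problem. On the busy background it keeps every pixel that is not the corner color.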

Clip it or leave it?

In real life, we often face ambiguous situations where our neural network shows human-like doubts about the user’s purpose.

What exactly do you need it for?

There are several elements here, and the neural network detected them all and “chose” to include them in the result:

Case with objects in the background: Background Remover decided to include them in the result

Yes, Background Remover did a great job with these two men in the background, but they’re out of focus and blurry. That makes them practically unusable in other scenes: it’s hard to come up with a context where they would look appropriate. Do users need these objects in their results?

This is how we came up with the principle of portability: to what extent can we reuse a cut-out object in another scene?

Judged by this principle, our tool did not work quite as it should in the example above. This is one of the directions in which we are now developing our AI.

Who is the main character of the scene?

Sometimes an object is only partially clipped. This means that the neural network itself is not certain whether to clip it or not. Here is an example of the AI’s uncertainty:

Medium level: we have a crowd but with a well-defined focus

A human can easily tell that the two women in the camera’s focus are most likely the ones the user needs. But here, even for us, it is difficult to say what the main subject of the composition is:

Hard level: even the human eye can’t tell what the main object is here

When people interact with objects

Should we include an object when a person interacts with it? How do we tell the necessary objects and their parts from the unnecessary ones? To train our AI, we usually clip people along with the objects they are interacting with in the photo. However, here we see that Background Remover sometimes doesn’t capture the object’s borders completely.

Background Remover identified an object but couldn’t define its whole shape

Salient object detection

As you can see, deciding what matters in a scene often has no single correct answer. The task in which a neural network must determine the significant objects and highlight them is called salient object detection.

In our training process, we divided images into several subclasses: simple images with one object on a plain background, portraits, landscapes, and illustrations. There are two possible approaches: train the network on each class separately, or balance the dataset so that it covers everything. We chose the second.
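One common way to balance a dataset is to sample rarer subclasses more often. The sketch below assumes that technique; the article does not say how the balancing is actually done, and the label counts and function name are hypothetical.

```python
import random
from collections import Counter


def balanced_weights(labels):
    """Give every subclass the same total sampling weight, however many images it has."""
    counts = Counter(labels)
    return [1.0 / (len(counts) * counts[label]) for label in labels]


# Hypothetical, deliberately skewed dataset: mostly simple images.
labels = (["simple"] * 6 + ["portrait"] * 2
          + ["landscape"] * 1 + ["illustration"] * 1)
weights = balanced_weights(labels)

# Draw a batch of image indices; rare subclasses are now as likely as common ones.
rng = random.Random(0)
batch = rng.choices(range(len(labels)), weights=weights, k=8)
```

With these weights, each of the four subclasses has a 25% chance of supplying any given sample, so the single landscape image is drawn far more often than any single simple image.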

The main challenge of this approach is not creating the datasets themselves but understanding whether the results have become worse somewhere after retraining the network. Metrics may show improvement in the reports while the results visually get worse. We cannot rely on stats alone, so we always have to keep a finger on the pulse of this process. Nobody wants to throw the baby out with the bathwater.
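One way to catch part of this problem is to compare scores per subclass rather than as a single average. The sketch below assumes intersection over union (IoU) as the mask metric; the article does not name the metrics actually used, and all scores here are invented for illustration.

```python
def iou(pred, truth):
    """Intersection over union of two binary masks given as nested lists of 0/1."""
    pairs = [(p, t) for pr, tr in zip(pred, truth) for p, t in zip(pr, tr)]
    inter = sum(1 for p, t in pairs if p and t)
    union = sum(1 for p, t in pairs if p or t)
    return inter / union if union else 1.0


def regressions(old_scores, new_scores, tolerance=0.01):
    """Return the subclasses whose score dropped by more than `tolerance`."""
    return {name: (old_scores[name], new_scores[name])
            for name in old_scores
            if new_scores[name] < old_scores[name] - tolerance}


# Invented mean IoU per subclass, before and after retraining.
old = {"simple": 0.97, "portrait": 0.90, "landscape": 0.80, "illustration": 0.85}
new = {"simple": 0.99, "portrait": 0.94, "landscape": 0.74, "illustration": 0.87}

overall_improved = sum(new.values()) / len(new) > sum(old.values()) / len(old)
worse = regressions(old, new)  # only "landscape" is flagged
```

In this made-up report the overall average goes up while landscapes get noticeably worse. A per-subclass check flags that, but it is still only numbers: it does not replace looking at the results.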

So, what’s next?

In the not-too-distant future, we aim to make Background Remover a tool that divides the image into layers that the user can switch on and off. The neural network could even draw new objects and backgrounds by itself to replace the removed ones, but these are longer-term plans.

icons8
