Google+ Photo Search Detects More Than 1,000 Objects

Back in May, Google announced an impressive search feature that lets you find photos even if they don't include any useful metadata. "To make computers do the hard work for you, we've also begun using computer vision and machine learning to help recognize more general concepts in your photos such as sunsets, food and flowers." Here are more details: "This is powered by computer vision and machine learning technology, which uses the visual content of an image to generate searchable tags for photos combined with other sources like text tags and EXIF metadata to enable search across thousands of concepts like a flower, food, car, jet ski, or turtle."

Now Google has announced that the feature detects more than 1,000 objects. That may not seem like a lot, but detecting objects algorithmically with enough precision is extremely difficult, and distinguishing between so many objects makes the task even harder. Google can now detect labradors and snowmen, tulips and umbrellas, laptops and shoes.



Google's announcement is strange because a Google post from June mentioned that the classifier already detected 1,100 classes of objects:

"We came up with a set of about 2000 visual classes based on the most popular labels on Google+ Photos and which also seemed to have a visual component, that a human could recognize visually. In contrast, the ImageNet competition has 1000 classes. As in ImageNet, the classes were not text strings, but are entities, in our case we use Freebase entities which form the basis of the Knowledge Graph used in Google search. An entity is a way to uniquely identify something in a language-independent way. In English when we encounter the word 'jaguar', it is hard to determine if it represents the animal or the car manufacturer. Entities assign a unique ID to each, removing that ambiguity, in this case '/m/0449p' for the former and '/m/012x34' for the latter. In order to train better classifiers we used more training images per class than ImageNet, 5000 versus 1000. Since we wanted to provide only high precision labels, we also refined the classes from our initial set of 2000 to the most precise 1100 classes for our launch."
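
To make the two ideas in that passage more concrete, here is a rough Python sketch of labels keyed by entity ID and of trimming a candidate set down to its most precise classes. It is only an illustration, not Google's code: the `VisualClass` type, the `keep_most_precise` helper, the precision numbers and the two `example-...` IDs are all made up. Only the two 'jaguar' IDs come from the quote.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VisualClass:
    entity_id: str    # language-independent ID, not a text string
    label: str        # human-readable name, for display only
    precision: float  # made-up validation precision for this sketch


def keep_most_precise(classes, limit):
    """Return the `limit` classes with the highest precision."""
    ranked = sorted(classes, key=lambda c: c.precision, reverse=True)
    return ranked[:limit]


# Two classes share the word "jaguar" but have distinct entity IDs.
# The "example-..." IDs and all precision values are hypothetical.
candidates = [
    VisualClass("/m/0449p", "jaguar (animal)", 0.93),
    VisualClass("/m/012x34", "jaguar (car manufacturer)", 0.71),
    VisualClass("example-sunset", "sunset", 0.97),
    VisualClass("example-party", "party", 0.55),
]

# Analogous to refining 2000 candidate classes to the most precise 1100.
launch_set = keep_most_precise(candidates, limit=2)
launch_ids = [c.entity_id for c in launch_set]
```

Because the classes are keyed by ID and not by word, the animal can make the cut while the car manufacturer is dropped, even though both are called 'jaguar' in English.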

I'm not sure if there's some improvement I'm missing. It's likely that the search results are better, but the number of objects has not increased.