Use case

Enriching (millions of) housing images with meta data

What if there was a machine that looks at millions of images? One of our customers has one.

Executive summary

Tracking thousands of user-generated pictures is not an easy task. Yet it is essential to ensure the relevance of the content that visitors see on an online platform.

colabel empowers a client of ours to classify all images posted on its marketplace, thereby providing it with insights into what actually happens on the platform. All this without having to hire additional people or engage an AI consultancy.

Our customer

The client is a niche listing site for vacation rentals that offers direct instant booking and commission-free advertising to its users. Key to the success of listings – i.e. bookings – is ensuring high-quality content, especially the images uploaded by its users.

Challenge

Allowing user-generated content is a must for marketplaces and other online companies in order to keep listings and posts highly relevant. E-commerce sites, social networks, and peer-to-peer platforms have thousands of users, each of whom may upload dozens of pictures in a single post. However, having lots of content on the platform comes with a big burden: as the platform grows, this quickly translates into millions of images to keep track of.

In order to perform this task effectively at scale, start-ups and small businesses need deep technical expertise in the field of Deep Learning. But since these skills lie outside the core competencies of most of them (at least in the beginning), they are forced to hire an army of workers (who go by the fancy term Content Moderators) or ask external consultants/freelancers for support.

In this specific case, our customer had to track around 20 images for each of the 50,000+ apartments listed on its platform, roughly one million images in total. All of them lacked even basic information: inside/outside, room, furniture, image quality, etc. Because our customer was unable to cope with the sheer volume of images, the perceived quality of user-generated content was suffering.

Objective

The goal of our client was clear: Find a fast solution to generate reliable data about their existing images in a scalable, cost-effective, and simple way.

In addition, the desired solution had to be integrated with the client's running processes: meta data should be generated for all pictures posted together with new listings. In short, the goal was to create an ongoing automated process to classify housing images on their website.

The final process should have the following features:

  • Connect every new upload on their platform to a customized image classifier
  • Automatically classify new images through the model, thus enriching them with the right meta information
  • Feed the enriched images back into the platform so that the client's team can gain insights into the content and obtain actionable information for further automation
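
The three features above can be sketched as a small upload handler. This is only an illustration under stated assumptions, not colabel's actual API: the `classify` callable, the `on_image_uploaded` function, the label names, and the 0.5 threshold are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical classifier interface: image bytes in, one score per label out.
Classifier = Callable[[bytes], Dict[str, float]]


@dataclass
class ListingImage:
    image_id: str
    content: bytes
    labels: List[str] = field(default_factory=list)  # meta data, filled in after classification


def on_image_uploaded(
    image: ListingImage, classify: Classifier, threshold: float = 0.5
) -> ListingImage:
    """Run on every new upload: classify the image and attach the labels to it."""
    scores = classify(image.content)
    # Multi-label: keep every label whose score clears the (assumed) threshold.
    image.labels = sorted(label for label, score in scores.items() if score >= threshold)
    return image


def fake_classifier(content: bytes) -> Dict[str, float]:
    """Stand-in for the real model so the sketch runs without a network call."""
    return {"outside": 0.92, "pool": 0.81, "bedroom": 0.03}


enriched = on_image_uploaded(ListingImage("img-1", b"..."), fake_classifier)
print(enriched.labels)  # ['outside', 'pool']
```

Passing the classifier in as a callable keeps the handler independent of any particular model or service, so the stand-in can be swapped for a real API client later.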

What our customer built

The company used colabel to build a customized deep learning model: a multi-label image classifier based on their very own training data:

Building an end-to-end solution with colabel

More concretely, our customer went through the following steps:

  1. Integrate their platform with colabel's API
  2. Define the upload of new images as the trigger event
  3. Define classes (15) and collaboratively label a training dataset with colabel's Slack integration (1,000 images per class), totaling 15,000 labeled examples
  4. Build the model and evaluate results
  5. Integrate the created model with their processes and start using our API on a daily basis
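
To make "multi-label" in step 3 concrete: each image can carry several classes at once, so a common way to represent a labeled example is a multi-hot vector with one slot per class. The sketch below assumes that representation and uses invented class names; the case study states only that there were 15 classes with 1,000 labeled images each.

```python
from typing import Iterable, List

# Hypothetical class names; the real project used 15 classes that are not listed here.
CLASSES = ["inside", "outside", "pool", "furniture", "poor_quality"]


def to_multi_hot(labels: Iterable[str], classes: List[str] = CLASSES) -> List[int]:
    """Encode the set of labels on one image as a 0/1 vector, one slot per class."""
    label_set = set(labels)
    unknown = label_set - set(classes)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if name in label_set else 0 for name in classes]


# Dataset size from step 3: 15 classes with 1,000 labeled images each.
NUM_CLASSES = 15
IMAGES_PER_CLASS = 1_000
total_examples = NUM_CLASSES * IMAGES_PER_CLASS

print(to_multi_hot(["outside", "pool"]))  # [0, 1, 1, 0, 0]
print(total_examples)  # 15000
```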

Result

Within a few hours on our platform, our client was able not only to develop their own image classifier but also to gain insights into fundamental aspects of their business: How many listings actually have a pool? What percentage of images is of poor quality, and what impact does it have on bookings? Are there any correlations between furniture and listing price?
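
Once every image carries labels, questions like these reduce to simple aggregations. The following sketch uses invented sample data and hypothetical label names (`pool`, `poor_quality`); it shows only the shape of the calculation, not the client's actual figures.

```python
from typing import List, Set, Tuple

# (listing_id, labels on one image); sample data invented for illustration.
LabeledImage = Tuple[str, Set[str]]

images: List[LabeledImage] = [
    ("listing-1", {"outside", "pool"}),
    ("listing-1", {"inside", "bedroom"}),
    ("listing-2", {"inside", "poor_quality"}),
    ("listing-3", {"outside"}),
]


def listings_with_label(images: List[LabeledImage], label: str) -> Set[str]:
    """Listings that have at least one image carrying the given label."""
    return {listing_id for listing_id, labels in images if label in labels}


def image_share(images: List[LabeledImage], label: str) -> float:
    """Fraction of all images that carry the given label (0.0 for an empty list)."""
    if not images:
        return 0.0
    return sum(label in labels for _, labels in images) / len(images)


all_listings = {listing_id for listing_id, _ in images}
pool_listings = listings_with_label(images, "pool")
print(f"{len(pool_listings)} of {len(all_listings)} listings show a pool")
print(f"{image_share(images, 'poor_quality'):.0%} of images are of poor quality")
```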

Connecting the API to the client's systems took only a matter of minutes. The rest was done without hiring additional people with the right technical background or going through long (and costly) phases of experimentation. And thanks to colabel's integrated human-in-the-loop process, their model keeps getting better every day.

Ready to build your own?