Deep learning is a fascinating field where superhuman performance can be achieved on several applications. However, not too long ago, working with images at scale was a luxury only large tech companies could afford. Besides the substantial computing power required to build a performant system, the software tools used were exclusive to those who had studied the craft for several years.
Over time, the power of deep learning has arrived in the hands of the masses: High-level libraries like Keras turned mathematically complex operations into straightforward commands. Stiff competition in cloud computing has put abundant and comparatively cheap computing power within reach of anyone with a laptop or phone. Finally, transfer learning has revolutionized the way we think about training neural networks, making high accuracy attainable with far less data.
"Easy" and "simple" can mean very different things to different people: A world-class musician will be able to pick up new instruments with relative ease and quickly advance. The same goes for technology – if the above sounds straightforward to you, chances are that you are a software engineer or data scientist already. However, it is more likely that you are neither of the two.
We are building colabel with people like you in mind: People who understand what an image classifier does and how it can be used in a business context but lack the time and/or ability to build one from scratch. After all, the interesting part is not the mechanical process of building the classifier itself. It is what you do with it.
Define the problem. This is where many people struggle because the task at hand appears too big and complicated for something as simple as image classification to make a difference. Unless you already know how to make use of it, it usually helps to break the problem into smaller pieces: Many processes can be significantly simplified through classification, be it by sorting files into folders, identifying certain characteristics, triggering an alert or maybe just rotating an image. Sounds too simple? If you have ever rotated a few images by hand on a Windows PC, you can imagine how tedious it becomes for a few thousand.
Get and label your data. The next step revolves around getting and labeling your data. A few questions to consider: Do you have immediate access to it? Is there some way to automate the workflow later on, e.g. through tools like Zapier? If so, chances are high that you are really onto something. If not, investing time in this part pays handsome dividends. The best image classifier (or really any machine learning model) is worth little if the data needs to be touched by hand. This is not to say that there will be no value at all – it will just be many times higher in an automated setting.
Labeling the data is a topic of its own. In some cases, data is already labeled in the right way. For instance, files may be neatly sorted into their respective folders or have been renamed in a useful way. Even if data needs to be labeled, fear not: Modern image classifiers require around 100 examples per class. Going back to the rotation example, one might choose "left", "upside down", "right" and "correct", which would require around 400 files to get started with. If all you need is to tell correct images from incorrect ones, two classes are enough and we are down to around 200.
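To make the folder idea concrete, here is a minimal sketch in Python. It assumes that the name of the containing folder is the label and uses the rule of thumb of around 100 examples per class from above. The file paths and the function names label_from_path and missing_examples are hypothetical and only serve as illustration; this is not how colabel works internally.

```python
from collections import Counter
from pathlib import PurePosixPath

# Assumption: around 100 examples per class, the rule of thumb from the text.
MIN_EXAMPLES_PER_CLASS = 100


def label_from_path(path: str) -> str:
    """Use the name of the containing folder as the label."""
    return PurePosixPath(path).parent.name


def missing_examples(paths, classes, minimum=MIN_EXAMPLES_PER_CLASS):
    """Return how many more examples each class needs to reach the minimum."""
    counts = Counter(label_from_path(p) for p in paths)
    return {c: max(0, minimum - counts[c]) for c in classes}


# Hypothetical file layout for the rotation example.
paths = (
    [f"scans/correct/{i}.jpg" for i in range(120)]
    + [f"scans/left/{i}.jpg" for i in range(40)]
    + [f"scans/upside down/{i}.jpg" for i in range(100)]
)
classes = ["left", "upside down", "right", "correct"]

# Shows, per class, how many files still need to be collected and labeled.
print(missing_examples(paths, classes))
```

A quick check like this tells you which classes still need attention before training is worth starting.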
Train the classifier. In the traditional world, this is where it gets technical. The illustration below shows how an engineer would typically get from data to a deployed model. There are alternatives at every stage, but you can think of this chain when you hear an engineer say "I am still working on it". Or think of a nice cup of coffee.
With colabel, this process takes only a fraction of the time. In fact, once you have the data prepared, you can get to a working image classifier in less than one hour. We will save the technical details for a later post, but generally speaking, we chained a series of activities into an automated process which does all the heavy lifting for you. If you need an analogy, think of a phone: It does not require you to understand telecommunication at its core – you just use it.
Evaluate the results. In the phone example, you immediately know if the connection breaks down. Everything is happening in real-time and you are part of the process at all times. An image classifier, however, works best unattended in the background so that you can move on to more important tasks. Because you are not watching every prediction, you want to measure how well your task is handled.
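One simple way to measure this, sketched below under a few assumptions, is to set aside some hand-labeled examples and compare them with what the classifier predicted. The example labels are made up, and accuracy and recall_per_class are illustrative helper names rather than part of any particular product.

```python
from collections import Counter


def accuracy(true_labels, predicted_labels):
    """Share of examples where the prediction matches the true label."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def recall_per_class(true_labels, predicted_labels):
    """For each class, the share of its examples that the model found."""
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical held-out examples, labeled by hand and then predicted.
truth = ["correct", "correct", "left", "left", "right", "upside down"]
preds = ["correct", "left", "left", "left", "right", "correct"]

print(accuracy(truth, preds))
print(recall_per_class(truth, preds))
```

The per-class view matters because a single overall number can hide a class that the classifier gets wrong most of the time.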
Improve over time. Naturally, not all predictions will be made with 100% confidence. When the machine is unsure, you want to see the case before information is transmitted to other systems. We call this process Human in the loop: The software automatically asks you to manually label the example. The quickest way to do this is sending the question to a Slack channel but there are other ways to deal with it.
This manual effort is put to great use: Every time there is a human intervention, your custom model improves. As a result, the model's performance gets better and better.
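The idea can be sketched in a few lines of Python. The threshold of 0.9, the file names and the functions route and apply_review are assumptions made for illustration; the actual cut-off and the way a question reaches you (for example via Slack) will differ from setup to setup.

```python
# Assumption: predictions below this confidence go to a person for review.
CONFIDENCE_THRESHOLD = 0.9


def route(prediction, threshold=CONFIDENCE_THRESHOLD):
    """Decide whether a prediction is passed on or reviewed by a human."""
    label, confidence = prediction
    return "automatic" if confidence >= threshold else "human_review"


def apply_review(training_examples, file_name, human_label):
    """Return the training examples with the manually labeled file added."""
    return training_examples + [(file_name, human_label)]


# Hypothetical predictions as (label, confidence) pairs keyed by file name.
predictions = {
    "scan_001.jpg": ("correct", 0.98),
    "scan_002.jpg": ("left", 0.61),
}
decisions = {name: route(pred) for name, pred in predictions.items()}
print(decisions)

# The person reviewing scan_002.jpg decides it is actually upside down.
training_examples = apply_review([], "scan_002.jpg", "upside down")
print(training_examples)
```

Confident predictions flow straight through, while uncertain ones wait for a person, and each manual answer becomes a new labeled example for the next round of training.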
We take it for granted that software learns our usage patterns, such as Apple's mail app suggesting the correct folder for an email or Facebook knowing whom we spoke to during a friend's house party.
But these developments do not invent themselves. They require someone like you to come up with the idea in the first place, and then a crafty software developer, access to certain infrastructure and sufficient time to make it a reality. Our ambition is to arm people with the power of deep learning and allow them to solve interesting problems – one model at a time.