Working with PDFs can be a chore, especially when you deal with dozens of them every single day: filing, delegating, forwarding, following up and checking. No matter how minor each task may be, it creates work.
A customer of ours has automated the handling of incoming PDF documents by training their own colabel model to identify a document and then trigger the appropriate action. As a result, they were able to save money and keep their employees focused on the relevant tasks.
The company is a digital healthcare provider that has completely digitized a range of initial diagnostics. It connects patients with one of the thousands of local healthcare providers who are active on its platform. This lowers the barrier to consulting the appropriate specialist and allows hundreds of thousands of customers to improve their quality of life.
Handling documents is a common problem for many businesses: documents arrive through a variety of channels, require some form of human intervention and need to be stored or archived in certain places in order to be available later on.
Our customer noticed that there was a certain pattern behind dealing with a variety of files. Most had a set of business rules that could be applied – but only once the type of document was known:
- Invoice: Upload to bookkeeping system
- CV: Import to HR tool
- Contract: Upload to CRM system
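Rules of this kind can be pictured as a lookup table from document type to action. The following is only a minimal sketch in Python: the handler functions and the BUSINESS_RULES name are hypothetical placeholders, not the customer's actual setup, and real handlers would call the bookkeeping system, HR tool or CRM system.

```python
from typing import Callable, Dict


# Hypothetical handlers: each one only returns a description of the action
# it would take, so the sketch runs without any external system.
def upload_to_bookkeeping(path: str) -> str:
    return f"bookkeeping <- {path}"


def import_to_hr_tool(path: str) -> str:
    return f"hr <- {path}"


def upload_to_crm(path: str) -> str:
    return f"crm <- {path}"


# One business rule per document type, mirroring the list above.
BUSINESS_RULES: Dict[str, Callable[[str], str]] = {
    "invoice": upload_to_bookkeeping,
    "cv": import_to_hr_tool,
    "contract": upload_to_crm,
}


def apply_business_rule(document_type: str, path: str) -> str:
    """Look up the rule for a known document type and apply it to the file."""
    handler = BUSINESS_RULES.get(document_type)
    if handler is None:
        # Without a known type there is no rule to apply.
        raise ValueError(f"No business rule for document type: {document_type!r}")
    return handler(path)
```

The table makes the dependency explicit: nothing can happen to a file until its type is known, which is exactly the step that was still manual.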
Working with the actual contents of the documents certainly requires common sense, specific skills and concrete actions, whereas merely receiving, handling and archiving them does not. And even though the thousands of files were spread across many employees, the combined effort resulted in big productivity losses throughout the firm.
There had to be a way to structure the large amount of incoming documents in an effective manner. The question was just how.
The objective regarding the problem of dealing with PDFs was obvious: find a way to avoid spending hours and hours on tedious tasks like forwarding, saving or checking countless PDF documents.
The company wanted a solution that allowed them to import documents automatically from all kinds of sources (e.g. email inboxes or shared drives), categorize each document as one of a range of possible types and then process it further according to pre-defined business rules.
The solution needed to be easily integrated into our customer's existing workflows, while also offering the necessary capabilities to fully automate the previously manual processes. One of the main objectives for our customer was that they wanted neither to hire any extra capacity to solve this problem nor to use any of the existing engineering capacity on this task. What they wanted was a simple, quickly implementable and powerful solution.
In summary, the solution needed the following features to fulfill the customer's requirements:
- Connecting external data sources (e.g., mail account, drive, or database)
- Automatic identification of the document type (e.g., invoice, CV, or contract)
- Further processing the PDF document depending on the prediction of the ML model
What our customer built
Once the objective was clear, our customer developed the to-be process that would cater for their needs. They decided to make incoming emails with an attachment the starting point: This would be the trigger that sets the system in motion.
After connecting their email inboxes using a workflow automation tool, the customer uses newly received email attachments as the trigger to start the process. The colabel model then analyzes the attachment, automatically recognizes the type of document and returns its prediction. Depending on whether the PDF file is an invoice, a CV, or maybe a contract, the document is then processed according to pre-defined workflows using the workflow automation tool.
With the model and workflow up and running, the only human interaction involved occurs when the model is not confident enough about a specific PDF. Verifying the model's prediction is a matter of seconds. Still, this last bit of human interaction serves a double purpose: with every new input the model keeps improving, and the need for manual intervention decreases over time.
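The confidence check described here could look roughly like the sketch below. It is an illustration under stated assumptions, not the colabel API: the Prediction and ReviewQueue classes, the handle function and the 0.9 threshold are all hypothetical, and how confirmed labels are fed back into training is left open.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumed cut-off; the actual threshold used by the customer is not known.
CONFIDENCE_THRESHOLD = 0.9


@dataclass
class Prediction:
    """Hypothetical shape of a model result: a label plus a confidence score."""
    label: str
    confidence: float


@dataclass
class ReviewQueue:
    """Holds low-confidence files until a person confirms the document type."""
    pending: List[Tuple[str, Prediction]] = field(default_factory=list)
    confirmed: List[Tuple[str, str]] = field(default_factory=list)

    def confirm(self, index: int, correct_label: str) -> None:
        # A confirmed label is kept as a new labelled example for later training.
        path, _ = self.pending.pop(index)
        self.confirmed.append((path, correct_label))


def handle(path: str, prediction: Prediction, queue: ReviewQueue,
           threshold: float = CONFIDENCE_THRESHOLD) -> str:
    """Process confident predictions automatically, queue the rest for review."""
    if prediction.confidence >= threshold:
        return f"auto:{prediction.label}"
    queue.pending.append((path, prediction))
    return "review"
```

In this sketch a confident prediction goes straight to its workflow, while an uncertain one waits for a quick human check whose result is stored for retraining.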
Using colabel, our customer was able to eliminate almost all document-processing work, except for the occasional need for human intervention. Apart from the obvious advantage of saving money, this also ensures that employees are not disrupted and can be more effective in their work.