Case Studies | Cleanlab Technology

Google used Cleanlab to find and fix label errors in millions of speech samples across different languages, to quantify annotator accuracy, and to provide clean data for training speech models.

“Cleanlab is well-designed, scalable and theoretically grounded: it accurately finds data errors, even on well-known and established datasets. After using it for a successful pilot project at Google, Cleanlab is now one of my go-to libraries for dataset cleanup.”

Patrick Violette, Senior Software Engineer at Google

Berkeley Research Group increases ML model accuracy by 15% and cuts training iterations by one-third using Cleanlab Studio.

“We've started relying on Cleanlab to improve our ML and AI models at Berkeley Research Group LLC for over a month... I have to say, I'm impressed. Here's what we found:

Increased model accuracy by 15%

Improved explainability & addressed performance impediments

Cut out training iterations by one-third

Overall performance improvement for our Data Science team”

Steven Gawthorpe, Senior Managing Consultant Data Scientist at Berkeley Research Group

The CEO of OpenTeams and Founder of Anaconda shares OpenTeams' success using Cleanlab for data preparation.

“I am excited by the cleanlab 2.0 project. We use this successfully at OpenTeams. I haven’t seen something this interesting in the space of data-preparation and labeling since snorkel.”

Travis Oliphant, Founder of Anaconda, NumPy, SciPy, and the CEO of OpenTeams

The Stakeholder Company reduced time spent by 8x in their ML data workflow by using Cleanlab to order data by label quality.

“We used Cleanlab to quickly validate one of our classifier models’ predictions for a dataset. This is typically a very time-consuming task since we would have to check thousands of examples by hand. However, since Cleanlab helped us identify the data points that were most likely to have label errors, we only had to inspect an eighth of our dataset to see that our model was problematic. We later realized that this was due to a post-processing error in the dataset — something that would otherwise have taken a much longer time to notice.”

Seah Bei Ying, Data Analyst at The Stakeholder Company
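
The workflow described above, reviewing only the examples most likely to be mislabeled, can be illustrated with a minimal sketch. This is not Cleanlab's implementation: it assumes a simple "self-confidence" score (the model's predicted probability for each example's given label), and the function names, toy data, and one-eighth review budget are hypothetical choices made for illustration.

```python
# Simplified stand-in for ordering data by label quality.
# Assumption: a low predicted probability for the given label suggests a possible label error.

def self_confidence_scores(labels, pred_probs):
    """Return, for each example, the predicted probability of its given label."""
    return [probs[label] for label, probs in zip(labels, pred_probs)]


def rank_by_label_quality(labels, pred_probs):
    """Return example indices ordered from most to least suspicious."""
    scores = self_confidence_scores(labels, pred_probs)
    return sorted(range(len(labels)), key=lambda i: scores[i])


# Toy data (hypothetical): given labels and model-predicted class probabilities.
labels = [0, 1, 0, 1, 1, 0, 1, 0]
pred_probs = [
    [0.90, 0.10],
    [0.20, 0.80],
    [0.15, 0.85],  # labeled 0, but the model strongly prefers class 1
    [0.30, 0.70],
    [0.05, 0.95],
    [0.60, 0.40],
    [0.55, 0.45],
    [0.85, 0.15],
]

ranked = rank_by_label_quality(labels, pred_probs)

# Inspect only the lowest-scoring eighth of the dataset first.
review_budget = max(1, len(labels) // 8)
to_review = ranked[:review_budget]
print("Indices to review first:", to_review)
```

Sorting by such a score puts the most doubtful labels at the front of the review queue, so a reviewer can stop early once the inspected examples reveal a systematic problem.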

“Cleanlab Studio is a very effective solution to calm my nerves when it comes to label noise! I hope to be able to incorporate it in our future in-house annotation pipeline for NLP.

Oh, and another thing. I really like that Cleanlab is based on principled, solid research (check this link for papers https://lnkd.in/dntFfGuq) and the way that it's been translated into a product that's very intuitive to use. I'm looking forward to see how this will evolve!”

Fredrik Olsson, Head of Data Science at Gavagai.io

One of the largest financial institutions in the world, Banco Bilbao Vizcaya Argentaria, uses Cleanlab to reduce label costs by over 98% and boost model accuracy by 28%.

“Cleanlab helped us reduce the uncertainty of noise in the tags. This process enabled us to train the model, update the training set, and optimize its performance. The goal was to reduce the number of labeled transactions and make the model more efficient, requiring less time and dedication. With the current model, we were able to improve accuracy by 28%, while reducing the number of labeled transactions required to train the model by more than 98%.”

David Muelas Recuenco, Expert Data Scientist at BBVA