Google used Cleanlab to find and fix label errors in millions of speech samples across different languages, quantify annotator accuracy, and provide clean data for training speech models.

“Cleanlab is well-designed, scalable and theoretically grounded: it accurately finds data errors, even on well-known and established datasets. After using it for a successful pilot project at Google, Cleanlab is now one of my go-to libraries for dataset cleanup.”


Wells Fargo used Cleanlab to train accurate (F1 score = 80) ML classifiers on financial data with 40% label noise.

“We used cleanlab to find label errors in financial text data, helping us catch mistakes in our human annotation process. I like cleanlab more than alternative solutions because it's 'bring your own model' and 'bring your own data': by acting as a wrapper around your model, it's superbly easy to implement, and it works well even when the model itself is not great at classification due to the relatively high noise rate (40% noisy), still achieving a consistent F1 score of around 80.”
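The "bring your own model" idea the quote describes needs nothing from the model except its out-of-sample predicted probabilities. A minimal sketch of how examples can then be flagged, using only NumPy (this illustrates the self-confidence label-quality idea, not Cleanlab's full confident-learning algorithm; all names here are illustrative):

```python
import numpy as np

def self_confidence_scores(labels, pred_probs):
    """Label-quality score per example: the model's predicted probability
    of the example's given label. Low scores suggest possible label errors."""
    return pred_probs[np.arange(len(labels)), labels]

# Toy example: 4 examples, 2 classes; example 2 has a suspicious label.
pred_probs = np.array([
    [0.90, 0.10],
    [0.20, 0.80],
    [0.95, 0.05],  # model is confident this is class 0...
    [0.40, 0.60],
])
labels = np.array([0, 1, 1, 1])  # ...but it was annotated as class 1

scores = self_confidence_scores(labels, pred_probs)
suspect = int(np.argmin(scores))  # index of the most likely label error (2)
```

Because the scoring only consumes `labels` and `pred_probs`, any classifier that emits probabilities can plug in, which is what makes the wrapper approach easy to adopt.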


The CEO of OpenTeams and Founder of Anaconda shares OpenTeams' success using Cleanlab for data preparation.

“I am excited by the cleanlab 2.0 project. We use this successfully at OpenTeams. I haven’t seen something this interesting in the space of data-preparation and labeling since snorkel.”


The Stakeholder Company cut the time spent in their ML data workflow by 8x by using Cleanlab to order data by label quality.

“We used Cleanlab to quickly validate one of our classifier models’ predictions for a dataset. This is typically a very time-consuming task since we would have to check thousands of examples by hand. However, since Cleanlab helped us identify the data points that were most likely to have label errors, we only had to inspect an eighth of our dataset to see that our model was problematic. We later realized that this was due to a post-processing error in the dataset — something that would otherwise have taken a much longer time to notice.”
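The workflow in the quote, ranking the dataset by label quality and hand-checking only the worst slice, can be sketched as follows. This is a simplified NumPy illustration (the function name and the quality score are assumptions for the example, not Cleanlab's exact API):

```python
import numpy as np

def review_order(labels, pred_probs, fraction=0.125):
    """Rank examples from worst to best label quality and return the
    indices of the lowest-quality fraction (here an eighth) for review."""
    quality = pred_probs[np.arange(len(labels)), labels]
    ranked = np.argsort(quality)  # worst-scoring labels first
    n_review = max(1, int(len(labels) * fraction))
    return ranked[:n_review]

# Toy dataset: 80 examples labeled class 1, 5 of them mislabeled as class 0
# while the model strongly predicts class 1 for everything.
rng = np.random.default_rng(0)
n = 80
p_class1 = rng.uniform(0.6, 1.0, size=n)
pred_probs = np.column_stack([1 - p_class1, p_class1])
labels = np.ones(n, dtype=int)
labels[:5] = 0  # the five problematic annotations

to_review = review_order(labels, pred_probs)  # 10 indices to inspect by hand
```

Here the five bad labels all land in the bottom eighth, so inspecting 10 examples instead of 80 is enough to surface every problem, which is the 8x saving the quote describes.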