Cleanlab: Labeled Datasets that Correct Themselves Automatically // Curtis Northcutt // MLOps Coffee Sessions #105

MLOps.community - A podcast by Demetrios Brinkmann

MLOps Coffee Sessions #105 with Curtis Northcutt, CEO & Co-Founder of Cleanlab, "Cleanlab: Labeled Datasets that Correct Themselves Automatically", co-hosted by Vishnu Rachakonda.

// Abstract
Pioneered at MIT by three Ph.D. co-founders, Cleanlab is an open-source/SaaS company building the premier data-centric AI tools and workflows for (1) automatically correcting messy data and labels, (2) automatically tracking dataset quality over time, (3) automatically finding classes to merge and delete, (4) AutoML for data tasks, (5) obtaining and ranking high-quality annotations, and (6) training ML models with messy data. Most of the prescriptive tasks (finding issues) can be done in one line of code with their open-source product: https://github.com/cleanlab/cleanlab.

// Bio
Curtis Northcutt is the CEO and Co-Founder of Cleanlab, focused on making AI work reliably for people and their messy, real-world data by automatically fixing issues in any ML dataset. Curtis completed his Ph.D. in Computer Science at MIT, receiving the MIT Thesis Award, NSF Fellowship, and the Goldwater Scholarship. Prior to Cleanlab, Curtis worked in AI research groups at Google, Oculus, Amazon, Facebook, Microsoft, and NASA.

// MLOps Jobs board
https://mlops.pallet.xyz/jobs

// MLOps Swag/Merch
https://mlops-community.myshopify.com/

// Related Links
https://github.com/cleanlab/cleanlab
https://cleanlab.ai/blog/cleanlab-history/
https://labelerrors.com/
https://l7.curtisnorthcutt.com/
https://nips.cc/Conferences/2021/ScheduleMultitrack?event=47102
https://www.youtube.com/watch?v=ieUOv1sQPlw
https://cleanlab.typeform.com/to/NLnU1XZF
Cameo cheating detection system: https://arxiv.org/ftp/arxiv/papers/1508/1508.05699.pdf
The Cathedral & the Bazaar book: https://www.amazon.com/Cathedral-Bazaar-Musings-Accidental-Revolutionary/dp/0596001088

--------------- ✌️Connect With Us ✌️ -------------
Join our slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, blogs, newsletters, and more: https://mlops.community/

Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Vishnu on LinkedIn: https://www.linkedin.com/in/vrachakonda/
Connect with Curtis on LinkedIn: https://www.linkedin.com/in/cgnorthcutt/

Timestamps:
[00:00] Introduction to Curtis Northcutt
[00:30] Difference between MLOps and Data-Centric AI
[04:04] Realizing the problem of data quality in ML manifesting
[05:11] Computer vision problems
[06:54] War story that got Curtis into Data-Centric AI
[13:50] Overview of Curtis' vision
[14:45] PU Learning
[21:25] Consistency Rate and Flipping Rate
[25:25] One line of code
[29:48] Models make mistakes
[33:09] Cleanlab play with the environment
[36:30] How ML Engineers should approach data quality problem
[42:42] Quantum computing
[46:39] Result of confident learning
[52:31] Utility for small data sets
[53:53] Cleanlab's huge success stories
[56:13] Rapid fire questions
[58:58] Cloudy and mystified space
[1:03:46] Cleanlab is hiring!
[1:05:06] Wrap up
