Fixing Your ML Data Blind Spots // Yash Sheth // MLOps Coffee Sessions #102

MLOps.community - A podcast by Demetrios Brinkmann

MLOps Coffee Sessions #102 with Yash Sheth, Fixing Your ML Data Blindspots, co-hosted by Adam Sroka.

// Abstract
Improving your dataset quality is absolutely critical for effective ML. Finding errors in your datasets is generally a slow, iterative, and painstaking process.

Data scientists should be proactively fixing their model’s blindspots by improving their training data. In this talk, Yash discusses how Galileo helps data scientists identify, fix, and track data across the entire ML workflow.

// Bio
Co-founder and VP of Engineering at Galileo. Prior to starting Galileo, Yash spent the last decade working on Automatic Speech Recognition (ASR) at Google, leading their core speech recognition platform team, which powers speech-to-text across 20+ products at Google in over 80 languages, along with thousands of businesses through their Cloud Speech API.

// MLOps Jobs board
https://mlops.pallet.xyz/jobs

MLOps Swag/Merch
https://mlops-community.myshopify.com/

// Related Links
Website: https://www.rungalileo.io/
Trade-Off: Why Some Things Catch On, and Others Don't, book by Kevin Maney: https://www.amazon.com/Trade-Off-Some-Things-Catch-Others/dp/0385525958

--------------- ✌️Connect With Us ✌️ -------------
Join our slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, blogs, newsletters, and more: https://mlops.community/

Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Adam on LinkedIn: https://www.linkedin.com/in/aesroka/
Connect with Yash on LinkedIn: https://www.linkedin.com/in/yash-sheth-72111216/

Timestamps:
[00:00] Introduction to Yash Sheth
[02:53] Takeaways
[04:35] Why unstructured data?
[06:59] Fitting in the workflow
[10:56] Digging into the different pains
[18:23] Vision around the democratization of machine learning
[24:31] Unstructured data problem
[25:49] Galileo handling unified tools
[27:21] Calculus for ML
[28:45] Gatekeep
[29:49] Synthetic data in the unstructured data world of Galileo
[33:10] Tips for data scientists that have unstructured data but with a small data set
[35:00] Benefits of users from Galileo
[37:15] Business case for dummies
[42:36] War stories
[44:49] Rapid fire questions
[50:55] Wrap up
