Open Standards Make MLOps Easier and Silos Harder // Cody Peterson // #234
MLOps.community - A podcast by Demetrios Brinkmann
Join us at our first in-person conference on June 25 all about AI Quality: https://www.aiqualityconference.com/

Cody Peterson has diverse work experience in product management and engineering. Cody has worked as a Technical Product Manager at Voltron Data since May 2023. Previously, he worked as a Product Manager at dbt Labs from July 2022 to March 2023.

MLOps podcast #234 with Cody Peterson, Senior Technical Product Manager at Voltron Data | Ibis project // Open Standards Make MLOps Easier and Silos Harder.

Huge thank you to Weights & Biases for sponsoring this episode. WandB Free Courses - http://wandb.me/courses_mlops

// Abstract
MLOps is fundamentally a discipline of people working together on a system with data and machine learning models. These systems are already built on open standards we may not notice -- Linux, git, scikit-learn, etc. -- but are increasingly hitting walls with respect to the size and velocity of data. Pandas, for instance, is the tool of choice for many Python data scientists -- but its scalability is a known issue. Many tools assume that data fits in memory, but most organizations have data that will never fit on a laptop. What approaches can we take?

One emerging approach with the Ibis project (created by the creator of pandas, Wes McKinney) is to leverage existing "big" data systems to do the heavy lifting behind a lightweight Python data frame interface. Alongside other open source standards like Apache Arrow, this allows data systems to communicate with each other and lets users of these systems learn a single data frame API that works across any of them.

Open standards like Apache Arrow, Ibis, and more in the MLOps tech stack enable composable data systems, where components can be swapped out so engineers can use the right tool for the job. This also helps avoid vendor lock-in and keep costs low.
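The approach described in the abstract, a lightweight Python interface that hands the heavy lifting to an existing data system, can be sketched roughly as below. This is only an illustration of the idea and not the Ibis API: the `LazyTable` class, its `filter` and `mean_by` methods, the `events` table, and the use of SQLite as the stand-in "big" data system are all assumptions made for the example.

```python
import sqlite3


class LazyTable:
    """Hypothetical deferred table expression (illustrative, not the Ibis API)."""

    def __init__(self, name, where=None):
        self.name = name
        self.where = where

    def filter(self, condition):
        # Only record the predicate; no data is touched in Python.
        return LazyTable(self.name, condition)

    def mean_by(self, key, value):
        # Compile the recorded expression into SQL for the backend to run.
        where = f" WHERE {self.where}" if self.where else ""
        return (
            f"SELECT {key}, AVG({value}) AS mean_{value} "
            f"FROM {self.name}{where} GROUP BY {key} ORDER BY {key}"
        )


def run(conn, sql):
    # The backend does the computation; Python only receives the small result.
    return conn.execute(sql).fetchall()


# SQLite stands in for the existing data system (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (team TEXT, latency_ms REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("a", 10.0), ("a", 30.0), ("b", 5.0), ("b", 500.0)],
)

query = LazyTable("events").filter("latency_ms < 100").mean_by("team", "latency_ms")
result = run(conn, query)
print(query)
print(result)
```

Because the expression is compiled to a query rather than evaluated in Python, the same `LazyTable` code could in principle be pointed at a different backend by swapping the connection, which is the composability the abstract describes.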
// Bio
Cody is a Senior Technical Product Manager at Voltron Data, a next-generation data systems builder that recently launched an accelerator-native GPU query engine for petabyte-scale ETL called Theseus. While Theseus is proprietary, Voltron Data takes an open periphery approach -- it is built on and interfaces through open standards like Apache Arrow, Substrait, and Ibis. Cody focuses on the Ibis project, a portable Python dataframe library that aims to be the standard Python interface for any data system, including Theseus and over 20 other backends.

Prior to Voltron Data, Cody was a product manager at dbt Labs focusing on the open source dbt Core and launching Python models (note: models is a confusing term here). Later, he led the Cloud Runtime team and drastically improved the efficiency of engineering execution and product outcomes.

Cody started his career as a Product Manager at Microsoft working on Azure ML. He spent about 2 years on the dedicated MLOps product team, and 2 more years on various teams across the ML lifecycle including data, training, and inferencing. He is now passionate about using open source standards to break down the silos and challenges facing real-world engineering teams, where engineering increasingly involves data and machine learning.

// MLOps Jobs board
https://mlops.pallet.xyz/jobs

// MLOps Swag/Merch
https://mlops-community.myshopify.com/

// Related Links
Ibis Project: https://ibis-project.org
Apache Arrow and the “10 Things I Hate About pandas”: https://wesmckinney.com/blog/apache-arrow-pandas-internals/

--------------- ✌️Connect With Us ✌️ -------------
Join our slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, blogs, newsletters, and more: https://mlops.community/
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Cody on LinkedIn: https://linkedin.com/in/codydkdc