Feature Stores at Shopify and Skyscanner // Matt Delacour and Mike Moran // Reading Group #4

MLOps.community - A podcast by Demetrios Brinkmann

MLOps Reading Group meeting on February 11, 2022

Reading Group Session about Feature Stores with Matt Delacour and Mike Moran

--------------- ✌️Connect With Us ✌️ -------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Connect with us on LinkedIn: https://www.linkedin.com/company/mlopscommunity/
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, Feature Store, Machine Learning Monitoring and Blogs: https://mlops.community/

Timestamps:
[00:05] Matt's intro
[00:26] Mike's intro
[01:09] Matt's talk: Feature store system at Shopify
[01:45] What is Shopify?
[02:05] Shopify Use Case
[02:38] Choosing a solution
[03:19] Managed service vs In-house vs Open-source (Feast)
[06:01] Why did we choose Feast?
[11:25] Implementation Strategy (multi-repo vs mono-repo approaches)
[13:01] Mono-repo approach breakdown
[14:30] Internal SDK
[17:01] Q&A: Does Feast satisfy the scalability and latency requirements of online inference at Shopify?
[19:05] Q&A: Do you rely on Feast to serialize data to the online store?
[20:13] Q&A: Is your mono-repo library a subset of Feast?
[21:18] Q&A: Did you consider using git submodules for a multi-repo?
[23:02] Q&A: Are you storing embeddings with Feast?
[24:30] Q&A: Regarding the mono-repo, which modules are responsible for feature engineering? How do you guarantee that different feature engineering can be used across many DS?
[27:58] Mike's talk: Feature store at Skyscanner
[28:08] Kaleidoscope System
[28:25] Background and context of the feature store
[29:30] Initial state of the feature store
[30:13] How the marketing team also leverages the feature store
[31:04] Current state of the feature store (marketing & machine learning)
[31:44] SDK approach of creating schemas with dataframes (easy access)
[32:16] Reusability across the marketing and DS teams
[33:06] GDPR constraints
[33:34] Data updates in the feature store
[36:09] Q&A: When a DS updates a feature, how are you communicating that across teams?
[38:25] Q&A: Are you applying different levels of feature engineering to increase the likelihood of a DS going back to a previous checkpoint of processing?
[40:55] Q&A: In what languages are you implementing the feature store?
[44:28] Q&A: Performance-wise, how do you decide what code remains in Apache Spark vs SQL?
[49:00] Wrap-up
