How Pinterest Powers Image Similarity // Shaji Chennan Kunnummel // System Design Reviews #1
MLOps.community - A podcast by Demetrios Brinkmann
In this Machine Learning System Design Review, Shaji Chennan Kunnummel walks us through the system design of Pinterest’s near-real-time architecture for detecting similar images. We discuss their use of Kafka, Flink, RocksDB, and much more. Starting with the high-level requirements for the system, we discuss Pinterest’s focus on debuggability and on an easy transition from their batch processing system to stream processing. We then touch on the different system interfaces and components involved, such as Manas (Pinterest’s custom search engine), and how it all ends up in their custom graph database, in downstream Kafka streams, and in Galaxy, Pinterest’s feature store. With Shaji’s expert knowledge of the system, we were able to do a deep dive into its architecture and some of its components.

// Experiences
15+ years of experience in software product development. Led multiple teams in a highly agile, collaborative, and cross-functional environment. Designed and implemented highly scalable, fault-tolerant, and optimized distributed systems that handle millions of requests per second. In-depth knowledge of object-oriented programming and design patterns in C++, Java, Python, and Golang. Designed and built complex data pipelines and microservices to train and serve machine learning models. Built analytics pipelines for processing and mining high-volume data sets using Hadoop and MapReduce frameworks. In-depth knowledge of distributed storage, consistency models, NoSQL data modeling, and cloud computing environments (AWS and Google Cloud).