Self-Boost via Optimal Retraining: An Analysis via Approximate Message Passing

Best AI papers explained - A podcast by Enoch H. Kang

This research presents a principled framework for **Bayes-optimal retraining** when the training data contains noisy labels. The central contribution is the derivation of the **Bayes-optimal aggregator function**, which specifies the mathematically ideal way to combine a model's current predictions with the initial noisy labels so as to minimize prediction error. Using the **Approximate Message Passing (AMP)** framework, the authors analyze this iterative procedure for two ground-truth settings: the **Gaussian mixture model (GMM)** and the **generalized linear model (GLM)**. The analysis yields a precise state evolution recursion that characterizes the asymptotic behavior of the estimator across multiple retraining rounds. Furthermore, a practical variant of the optimal aggregator is developed for real-world use in linear probing, where it significantly outperforms existing retraining baselines, particularly in **high label-noise regimes**.
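Purely as an illustration of the iterative retraining loop described above, the sketch below fits a linear probe on synthetic Gaussian-mixture data with flipped labels and, at each round, replaces the training targets with an aggregation of the current predictions and the original noisy labels. The convex-combination `aggregate` function, the ridge probe, and all parameter values are illustrative assumptions; the paper's actual Bayes-optimal aggregator and AMP state evolution analysis are not reproduced here.

```python
# Minimal sketch of iterative retraining with a label aggregator, in the
# spirit of the setup discussed above. A simple convex combination of the
# noisy labels and the current predictions stands in for the paper's
# Bayes-optimal aggregator (an illustrative assumption, not the real rule).
import numpy as np

rng = np.random.default_rng(0)

# Synthetic Gaussian-mixture data with binary labels in {-1, +1}, a fraction
# of which are flipped to simulate label noise.
n, d, noise_rate = 2000, 50, 0.3
mu = rng.standard_normal(d) / np.sqrt(d)
y_true = rng.choice([-1.0, 1.0], size=n)
X = y_true[:, None] * mu[None, :] + rng.standard_normal((n, d))
flip = rng.random(n) < noise_rate
y_noisy = np.where(flip, -y_true, y_true)

def linear_probe(X, targets, reg=1e-2):
    """Ridge-regression linear probe fit to (possibly soft) targets."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ targets)

def aggregate(preds, y_noisy, alpha=0.5):
    """Stand-in aggregator: convex combination of current model predictions
    and the original noisy labels (the paper derives the Bayes-optimal form)."""
    return alpha * np.tanh(preds) + (1.0 - alpha) * y_noisy

targets = y_noisy.astype(float)
for t in range(5):                       # retraining rounds
    w = linear_probe(X, targets)         # refit the probe on current targets
    preds = X @ w                        # current model scores
    targets = aggregate(preds, y_noisy)  # combine with the initial noisy labels
    acc = np.mean(np.sign(preds) == y_true)
    print(f"round {t}: accuracy vs. clean labels = {acc:.3f}")
```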
