Best AI papers explained

A podcast by Enoch H. Kang

550 Episodes

The Alternative Annotator Test for LLM-as-a-Judge: How to Statistically Justify Replacing Human Annotators with LLMs
Published: 9/05/2025
Limits to scalable evaluation at the frontier: LLM as Judge won’t beat twice the data
Published: 9/05/2025
Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation
Published: 9/05/2025
Accelerating Unbiased LLM Evaluation via Synthetic Feedback
Published: 9/05/2025
Prediction-Powered Statistical Inference Framework
Published: 9/05/2025
Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL
Published: 9/05/2025
RM-R1: Reward Modeling as Reasoning
Published: 9/05/2025
Reexamining the Aleatoric and Epistemic Uncertainty Dichotomy
Published: 8/05/2025
Decoding Claude Code: Terminal Agent for Developers
Published: 7/05/2025
Emergent Strategic AI Equilibrium from Pre-trained Reasoning
Published: 7/05/2025
Benefiting from Proprietary Data with Siloed Training
Published: 6/05/2025
Advantage Alignment Algorithms
Published: 6/05/2025
Asymptotic Safety Guarantees Based On Scalable Oversight
Published: 6/05/2025
What Makes a Reward Model a Good Teacher? An Optimization Perspective
Published: 6/05/2025
Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems
Published: 6/05/2025
Identifiable Steering via Sparse Autoencoding of Multi-Concept Shifts
Published: 6/05/2025
You Are What You Eat - AI Alignment Requires Understanding How Data Shapes Structure and Generalisation
Published: 6/05/2025
Interplay of LLMs in Information Retrieval Evaluation
Published: 3/05/2025
Trade-Offs Between Tasks Induced by Capacity Constraints Bound the Scope of Intelligence
Published: 3/05/2025
Toward Efficient Exploration by Large Language Model Agents
Published: 3/05/2025

20 / 28

Cut through the noise. We curate and break down the most important AI papers so you don’t have to.

Visit the podcast's native language site

550 Episodes

The Alternative Annotator Test for LLM-as-a-Judge: How to Statistically Justify Replacing Human Annotators with LLMs

Limits to scalable evaluation at the frontier: LLM as Judge won’t beat twice the data

Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation

Accelerating Unbiased LLM Evaluation via Synthetic Feedback

Prediction-Powered Statistical Inference Framework

Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL

RM-R1: Reward Modeling as Reasoning

Reexamining the Aleatoric and Epistemic Uncertainty Dichotomy

Decoding Claude Code: Terminal Agent for Developers

Emergent Strategic AI Equilibrium from Pre-trained Reasoning

Benefiting from Proprietary Data with Siloed Training

Advantage Alignment Algorithms

Asymptotic Safety Guarantees Based On Scalable Oversight

What Makes a Reward Model a Good Teacher? An Optimization Perspective

Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems

Identifiable Steering via Sparse Autoencoding of Multi-Concept Shifts

You Are What You Eat - AI Alignment Requires Understanding How Data Shapes Structure and Generalisation

Interplay of LLMs in Information Retrieval Evaluation

Trade-Offs Between Tasks Induced by Capacity Constraints Bound the Scope of Intelligence

Toward Efficient Exploration by Large Language Model Agents