Chongyu Fan

Ph.D. student @ Michigan State University

Blog

May 15, 2026

Rethinking Muon Beyond Pretraining: Spectral Failures and High Pass Remedies for VLA and RLVR

Muon orthogonalizes the momentum matrix, pushing every singular value to one. This works beautifully for LLM pretraining, which is essentially supervised next-token classification on text. But what happens when we move along three orthogonal axes: a different modality, a different loss,...

#optimizer #vla #rlvr