Blog posts

2026

Rethinking Muon Beyond Pretraining: Spectral Failures and High Pass Remedies for VLA and RLVR

Published:

Muon orthogonalizes the momentum matrix, pushing every singular value toward one. This works beautifully for LLM pretraining, which is essentially next-token classification on text via supervised learning. But what happens when we move along any of three orthogonal axes: a different modality, a different loss, or a different learning paradigm? Pion is a drop-in replacement for Muon’s Newton-Schulz iteration that fixes the spectral mismatch we observe along all three axes.
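
To make the orthogonalization step concrete, here is a minimal, dependency-free sketch of the textbook cubic Newton-Schulz iteration. It is not Muon's tuned polynomial and it is not Pion. The helper names (`matmul`, `transpose`, `newton_schulz_orthogonalize`), the step count, and the example matrix are all illustrative assumptions.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def newton_schulz_orthogonalize(m, steps=30):
    """Approximate the polar factor U V^T of m = U S V^T (assumes m has full rank)."""
    # Scale by the Frobenius norm so every singular value lies in (0, 1],
    # which is inside the convergence region of the cubic iteration.
    fro = math.sqrt(sum(v * v for row in m for v in row))
    x = [[v / fro for v in row] for row in m]
    for _ in range(steps):
        # Cubic update X <- 1.5 X - 0.5 (X X^T) X.
        # Each singular value s is mapped to 1.5 s - 0.5 s^3, whose fixed point is 1.
        xxt_x = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * a - 0.5 * b for a, b in zip(row_x, row_c)]
             for row_x, row_c in zip(x, xxt_x)]
    return x


# Hypothetical 2x3 "momentum" matrix, used only for illustration.
momentum = [[3.0, 1.0, 0.0], [1.0, 2.0, 1.0]]
ortho = newton_schulz_orthogonalize(momentum)

# If all singular values have been pushed to one, ortho @ ortho^T is close to the identity.
gram = matmul(ortho, transpose(ortho))
```

The check on `gram` is the point of the sketch: once the iteration has converged, the rows of the update are orthonormal, so the original spectrum of the momentum matrix no longer influences the step size along any singular direction.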