From Bytes to Ideas: Language Modeling with Autoregressive U-Nets • Paper 2506.14761 • Published Jun 17, 2025
Better & Faster Large Language Models via Multi-token Prediction • Paper 2404.19737 • Published Apr 30, 2024