Gated Associative Memory: A Parallel O(N) Architecture for Efficient Sequence Modeling Paper • 2509.00605 • Published 14 days ago • 41
Beyond Transcription: Mechanistic Interpretability in ASR Paper • 2508.15882 • Published 24 days ago • 85
Article Advanced Flux Dreambooth LoRA Training with 🧨 diffusers By linoyts and 1 other • Oct 21, 2024 • 42
Article SmolLM3: smol, multilingual, long-context reasoner By loubnabnl and 22 others • Jul 8 • 666
Running 3.19k The Ultra-Scale Playbook 🌌 The ultimate guide to training LLM on large GPU Clusters
Running 1.07k FineWeb: decanting the web for the finest text data at scale 🍷 Generate high-quality web text data for LLM training
Article cocogold: training Marigold for text-grounded segmentation By pcuenq • Jul 8 • 31
Article Train 400x faster Static Embedding Models with Sentence Transformers By tomaarsen • Jan 15 • 210