Collections
Discover the best community collections!

Collections including paper arxiv:2502.02737

- FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language
  Paper • 2506.20920 • Published • 70
- SmolVLM: Redefining small and efficient multimodal models
  Paper • 2504.05299 • Published • 199
- The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset
  Paper • 2303.03915 • Published • 7
- SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
  Paper • 2502.02737 • Published • 242
- Reinforcement Pre-Training
  Paper • 2506.08007 • Published • 260
- A Survey on Latent Reasoning
  Paper • 2507.06203 • Published • 90
- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 16
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
  Paper • 1910.10683 • Published • 14
- FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language
  Paper • 2506.20920 • Published • 70
- SmolVLM: Redefining small and efficient multimodal models
  Paper • 2504.05299 • Published • 199
- YourBench: Easy Custom Evaluation Sets for Everyone
  Paper • 2504.01833 • Published • 22
- SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
  Paper • 2502.02737 • Published • 242
- GAIA: a benchmark for General AI Assistants
  Paper • 2311.12983 • Published • 231
- Zephyr: Direct Distillation of LM Alignment
  Paper • 2310.16944 • Published • 122
- SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
  Paper • 2502.02737 • Published • 242
- Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation
  Paper • 2412.03304 • Published • 21
- SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
  Paper • 2502.02737 • Published • 242
- Demystifying Long Chain-of-Thought Reasoning in LLMs
  Paper • 2502.03373 • Published • 59
- Kimi k1.5: Scaling Reinforcement Learning with LLMs
  Paper • 2501.12599 • Published • 123
- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
  Paper • 2501.17161 • Published • 123