common-pile's Collections

Common Pile v0.1

updated Jun 6, 2025

All resources related to Common Pile v0.1, an 8TB dataset of public domain and openly licensed text

  • Common Pile v0.1 Raw Data

    Collection
    8TB of public domain and openly licensed text • 30 items • Updated Aug 14, 2025 • 21

  • Common Pile v0.1 Filtered Data

    Collection
    An LLM pre-training dataset produced by filtering and deduplicating the raw text collected in the Common Pile v0.1 • 31 items • Updated Jun 6, 2025 • 20

  • Comma v0.1 Artifacts

    Collection
    A collection of artifacts related to Comma v0.1—a 7B parameter LLM trained on public domain and openly licensed text • 3 items • Updated Jun 6, 2025 • 4

  • The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text

    Paper • 2506.05209 • Published Jun 5, 2025 • 59