Books from the Survivor Library (mostly ~1920s & earlier) OCR'd with recent VLMs
BEEspoke Data
community
AI & ML interests
'an LLM is only as good as the dataset it was trained on' - Sun Tzu
Organization Card
🐝📊💁
🚧"raw" pretrained smol_llama checkpoints - WIP 🚧
- BEE-spoke-data/smol_llama-101M-GQA • Text Generation • 0.1B • 2.74k • 29
- BEE-spoke-data/smol_llama-81M-tied • Text Generation • 0.1B • 1.83k • 7
- BEE-spoke-data/smol_llama-220M-GQA • Text Generation • 0.2B • 3.28k • 13
- BEE-spoke-data/verysmol_llama-v11-KIx2 • Text Generation • 0.1B • 1.8k • 4
models (56)
- BEE-spoke-data/tiny-random-MPNetForMaskedLM • Fill-Mask • 0.0B • 8
- BEE-spoke-data/wordpiece-tokenizer-32k-en_code-msp
- BEE-spoke-data/wordpiece-tokenizer-32k-en_code-orig
- BEE-spoke-data/bpe-tokenizer-32k-smolNeoX
- BEE-spoke-data/pegasus-x-base-synthsumm_open-16k • Summarization • 0.3B • 20 • 2
- BEE-spoke-data/tFINE-680m-e32-d16-gqa-flan • 0.7B • 3
- BEE-spoke-data/tFINE-680m-e32-d16-infinity_instruct-L2 • Text Generation • 0.7B • 3
- BEE-spoke-data/tFINE-900m-e16-d32-instruct_2e • 0.9B • 61
- BEE-spoke-data/tFINE-900m-instruct-orpo • 0.9B • 61
- BEE-spoke-data/smol_llama-220M-openhermes • Text Generation • 0.2B • 1.8k • 5
datasets (82)
- BEE-spoke-data/govdocs1-pdf-source • 235k • 5.84k • 2
- BEE-spoke-data/govdocs1-by-extension • 733k • 195 • 2
- BEE-spoke-data/SurvivorLib-Nanonets-OCR-s • 11.7k • 138 • 3
- BEE-spoke-data/SurvivorLib-rolmOCR • 13.3k • 42 • 2
- BEE-spoke-data/napierone-pdf-nanonets-s • 9.96k • 81
- BEE-spoke-data/napierone-pdf-olmOCR • 19k • 44
- BEE-spoke-data/LONGCOT-merged-1M • 1.7M • 16 • 1
- BEE-spoke-data/cosmopedia-v2-mincols • 39.1M • 150 • 1
- BEE-spoke-data/reddit-title-body-hf • 251M • 57 • 4
- BEE-spoke-data/bigpatent-all • 2.43M • 348