Document datasets with .pdf files that are usable with pixparse libraries and tools.
AI & ML interests
Document and User Interface Parsing, Understanding, Q&A.
Organization Card
Multi-modal document, image, and text datasets and models for document understanding, OCR, and VQA tasks.
GitHub repos:
- Data Loading: chug - https://github.com/huggingface/chug
- Modelling: pixparse - coming soon
models 0
None public yet
datasets 6
pixparse/pdfa-eng-wds
• 7.1k • 4.75k • 158
pixparse/idl-wds
• 3.41M • 4.61k • 193
pixparse/docvqa-wds
• 329 • 4
pixparse/docvqa-single-page-questions
• 50k • 692 • 10
pixparse/cc12m-wds
• 11M • 10.9k • 36
pixparse/cc3m-wds
• 2.93M • 11.2k • 45