
Hynek Kydlicek

hynky


AI & ML interests

Data-processing

Recent Activity

updated a dataset 14 days ago

hynky/finepdfs_50BT-dclm_30BT-fineweb_edu_20BT

published a dataset 14 days ago

hynky/finepdfs_50BT-dclm_30BT-fineweb_edu_20BT

liked a Space 14 days ago

lm-provers/qed-nano-blogpost



liked a Space 14 days ago

QED-Nano: Teaching a Tiny Model to Prove Hard Theorems

Who needs 1T parameters? Olympiad proofs with a 4B model

liked a dataset about 2 months ago

HuggingFaceFW/finetranslations

Viewer • Updated Jan 9 • 3.33B • 32.3k • 272

liked a Space about 2 months ago

FinePDFs: Liberating 3T of the finest tokens from PDFs

liked a Space 3 months ago

Evaluation Guidebook

Explore LLM benchmark trends over time

liked a dataset 6 months ago

HuggingFaceFW/finepdfs

Viewer • Updated Jan 9 • 476M • 27.8k • 817

liked a Space 6 months ago

Bringing paper to life: A modern template for scientific writing

Download a ready-to-use scientific paper template

liked a Space about 1 year ago

The Ultra-Scale Playbook

The ultimate guide to training LLMs on large GPU clusters

liked 2 datasets about 1 year ago

data-is-better-together/fineweb-c

Viewer • Updated Jul 8, 2025 • 88.7k • 512 • 58

HuggingFaceFW/fineweb-2

Viewer • Updated Oct 27, 2025 • 4.48B • 79k • 760

liked a Space about 1 year ago

Number Tokenization Blog

Explore how tokenization affects arithmetic in LLMs

liked a dataset about 1 year ago

CohereLabs/Global-MMLU

Viewer • Updated Aug 14, 2025 • 602k • 9.28k • 150

liked a dataset over 1 year ago

ClusterlabAi/InstAr-500k

Viewer • Updated Jul 30, 2024 • 481k • 69 • 15

liked a Space over 1 year ago

Scaling FineWeb to 1000+ languages: Step 1: finding signal in 100s of evaluation tasks

Evaluate multilingual models using FineTasks

liked a dataset over 1 year ago

LLM360/TxT360

Updated May 26, 2025 • 15.6k • 248

liked 2 Spaces over 1 year ago

Hub LFS Analysis

An analysis of LFS files on the Hub.

TxT360: Trillion Extracted Text

Explore and download the TxT360 LLM pre‑training dataset

liked a dataset over 1 year ago

Cleanlab/bad_data_gsm8k_svamp.csv

Viewer • Updated Apr 25, 2024 • 34 • 55 • 3

liked a Space over 1 year ago

Datasets Metrics Explorer

Launch an interactive demo interface

liked 2 datasets over 1 year ago

ThaiSyntheticQA/ThaiQA-v1

Viewer • Updated Jul 24, 2024 • 12.7k • 16 • 4

coastalcph/fairlex

Updated Jul 27, 2023 • 180 • 9