license: apache-2.0 | |
datasets: | |
- trl-lib/tldr | |
base_model: | |
- google/gemma-3-270m-it | |
pipeline_tag: summarization | |
This is a fine-tune of gemma, trained using GRPO with a reward function to incentivise ~40 char outputs beginning with "I" such that it outputs TL;DR summaries for reddit comments. |