---
license: apache-2.0
datasets:
- lvwerra/stack-exchange-paired
language:
- en
library_name: adapter-transformers
pipeline_tag: text-generation
tags:
- reward_model
---
## Reward Model GPT2
[GPT2](https://huggingface.co/gpt2) fine-tuned as a reward model.
As a reward model, it is intended to score how human-like and helpful a response to a question is, for questions in [Stack Exchange](https://huggingface.co/datasets/lvwerra/stack-exchange-paired) domains such as programming, mathematics, and physics.
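
Reward models trained on paired data such as this dataset are commonly optimized with a pairwise ranking objective, where the preferred answer should receive a higher scalar score than the rejected one. The sketch below illustrates that objective in plain Python. It assumes this model follows the usual setup; the function names `pairwise_reward_loss` and `prefers_chosen` are hypothetical and are not part of any library or of this repository.

```python
import math


def pairwise_reward_loss(chosen_score: float, rejected_score: float) -> float:
    """Pairwise ranking loss: -log(sigmoid(chosen_score - rejected_score)).

    Assumption: the reward model emits one scalar score per
    (question, answer) pair. The loss is small when the preferred
    answer scores well above the rejected one.
    """
    margin = chosen_score - rejected_score
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch on the sign of
    # the margin so that exp() never receives a large positive argument.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def prefers_chosen(chosen_score: float, rejected_score: float) -> bool:
    """True when the model ranks the preferred answer higher."""
    return chosen_score > rejected_score


if __name__ == "__main__":
    # Illustrative scores only, not outputs of this model.
    print(pairwise_reward_loss(2.0, -1.0))
    print(prefers_chosen(2.0, -1.0))
```

At inference time only the scalar score matters: candidate answers to the same question can be ranked by score, or the score can serve as the reward signal during RL fine-tuning.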
For more information, see the blog post and the GitHub [example](https://github.com/huggingface/trl/tree/main/examples/research_projects/stack_llama_2/scripts).