---
license: apache-2.0
datasets:
- lvwerra/stack-exchange-paired
language:
- en
library_name: adapter-transformers
pipeline_tag: text-generation
tags:
- reward_model
---

## Reward Model GPT2

This model is [GPT2](https://huggingface.co/gpt2) fine-tuned as a reward model.

As a reward model, it is intended to score responses to questions in [Stack Exchange](https://huggingface.co/datasets/lvwerra/stack-exchange-paired) domains such as programming, mathematics, and physics, assigning higher scores to more human-like answers.

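Reward models trained on paired answers, such as the preferred and rejected answers in `lvwerra/stack-exchange-paired`, are commonly optimized with a pairwise ranking loss, `-log(sigmoid(r_chosen - r_rejected))`. This card does not state which objective was used, so the following is only a minimal sketch under that assumption. The function name `pairwise_reward_loss` and the example scores are hypothetical and stand in for the scalar outputs a reward model would produce.

```python
import math


def pairwise_reward_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise ranking loss: -log(sigmoid(reward_chosen - reward_rejected)).

    Assumed objective for illustration; not confirmed by this model card.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch on the sign of the
    # margin so that exp() is never called with a large positive argument.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical scalar scores for a (preferred, rejected) answer pair.
loss_correct = pairwise_reward_loss(2.0, -1.0)  # preferred answer scored higher
loss_tie = pairwise_reward_loss(0.5, 0.5)       # both answers scored equally
loss_wrong = pairwise_reward_loss(-1.0, 2.0)    # rejected answer scored higher

print(loss_correct, loss_tie, loss_wrong)
```

The loss is small when the preferred answer receives the higher score and grows as the ranking is inverted, which is what pushes such a model to separate the two answers during training.
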
For more information, see the blog post and the GitHub [example](https://github.com/huggingface/trl/tree/main/examples/research_projects/stack_llama_2/scripts).