Commit e6780b0 (parent 314ff92): Update README.md
</figure>

### Training procedure

The `gpt2_large_retry_and_continue_12m_reward_model` was trained from a [gpt2-large](https://huggingface.co/gpt2-large) base model with a single-output classification head, using binary cross-entropy loss. Training ran on 4 A40 GPUs with a per-device batch size of 16 and gradient accumulation of 1 (an effective batch size of 64), at a learning rate of 1e-5 for 2 epochs, for a total of 375,000 steps. Tensor parallelism and pipeline parallelism were used to distribute the model across the GPUs.
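The setup described above can be sketched as follows. This is a minimal, hypothetical illustration, not the actual training code: the `RewardHead` module and the tiny hidden size stand in for the gpt2-large backbone (which in practice would be loaded via `transformers`, e.g. as a sequence-classification model with `num_labels=1`), and the random tensors stand in for real tokenized conversations and labels.

```python
import torch
import torch.nn as nn

# Hyperparameters stated in the model card.
NUM_GPUS = 4
PER_DEVICE_BATCH = 16
GRAD_ACCUM = 1
EFFECTIVE_BATCH = NUM_GPUS * PER_DEVICE_BATCH * GRAD_ACCUM  # 4 * 16 * 1 = 64

# gpt2-large's hidden size is 1280; shrunk here so the sketch runs anywhere.
HIDDEN = 32


class RewardHead(nn.Module):
    """Single-output classification head trained with BCE loss."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.score = nn.Linear(hidden_size, 1)

    def forward(self, last_hidden: torch.Tensor) -> torch.Tensor:
        # Score from the final token's hidden state, as is typical for
        # GPT-2-style sequence classification.
        return self.score(last_hidden[:, -1, :]).squeeze(-1)


head = RewardHead(HIDDEN)
loss_fn = nn.BCEWithLogitsLoss()  # binary cross-entropy on raw logits

# Stand-ins for backbone outputs and binary engagement labels.
hidden_states = torch.randn(PER_DEVICE_BATCH, 10, HIDDEN)
labels = torch.randint(0, 2, (PER_DEVICE_BATCH,)).float()

logits = head(hidden_states)          # one score per sequence
loss = loss_fn(logits, labels)
loss.backward()                       # gradients flow into the head
```

In the real run, one such step happens on each of the 4 GPUs per optimizer update, which is where the effective batch size of 64 comes from.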

### BibTeX entry

To cite this model:

```bibtex
@misc{irvine2023rewarding,
  author = {{Chai Research} and Irvine and Boubert and Raina and Liusie and Mudupalli and Korshuk and Liu and Cremer and Assassi and C. Beauchamp and Lu and Rialan and W. Beauchamp},
  title = {{Rewarding chatbots for real-world engagement with millions of users}},
  howpublished = {\url{https://arxiv.org/abs/2303.06135}},
  year = 2023,
  month = mar
}
```

If you use this model, we would love to hear about it! Reach out by [correspondence email](mailto:[email protected]?subject=Chai%20Research%20Paper%20Enquiry) or Discord.

### Acknowledgements

This project would not have been possible without the support of members of [Seamless Capital](https://www.seamless-capital.com/).

We thank the following authors from the [Machine Intelligence Laboratory](https://mi.eng.cam.ac.uk/) for their collaboration:

- [Vyas Raina](https://www.linkedin.com/in/vyas-raina-71b226152/)
- [Adian Liusie](https://www.linkedin.com/in/adian-liusie-00b60511a/)