readme: include number of training epochs
README.md CHANGED
@@ -22,12 +22,13 @@ Preliminary Historic Multilingual and Monolingual ByT5 Models. Following languag
 
 More details can be found in [our GitHub repository](https://github.com/stefan-it/hmByT5).
 
-
 # Pretraining
 
 We use the official JAX/FLAX example in Hugging Face Transformers to pretrain a ByT5 model on a single v3-8 TPU.
 Details about the training can be found [here](https://github.com/stefan-it/hmByT5/tree/main/hmbyt5-flax).
 
+The model was trained for 0.5 epoch.
+
 # Evaluation on Downstream Tasks (NER)
 
 We evaluated the hmByT5 model on downstream tasks:
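
As background for the pretraining note in the diff above (not part of the commit itself), below is a minimal sketch of loading a ByT5-style checkpoint such as hmByT5 with the Transformers Python API. The repository id is a placeholder, not a confirmed model name, and the input sentence is purely illustrative.

```python
# Minimal sketch: loading a ByT5 checkpoint with Hugging Face Transformers.
# Assumption: the pretrained model is published on the Hub; MODEL_ID below is
# a placeholder, not a verified repository id.
from transformers import AutoTokenizer, T5ForConditionalGeneration

MODEL_ID = "hmbyt5/byt5-small-historic-multilingual"  # placeholder id

# ByT5 works directly on UTF-8 bytes, so the tokenizer needs no vocabulary file.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = T5ForConditionalGeneration.from_pretrained(MODEL_ID)

# Encode a short (illustrative) historic-text snippet and run generation.
inputs = tokenizer("Eine Nachricht aus dem Jahre 1850.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```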