Update README.md
README.md CHANGED

@@ -134,7 +134,7 @@ model.to(device)
 
 model(input_ids, num_steps=32)
 ```
-The model has about 1.5B parameters in non-recurrent
+The model has about 1.5B parameters in its non-recurrent layers (prelude+coda), 0.5B parameters in the embedding, and 1.5B recurrent parameters, so, as a guideline,
 the number of materialized parameters is `num_steps * 1.5B + 2B`. Playing with this parameter is what makes this model interesting, and different from fixed-depth transformers!
 The model is trained to accept an arbitrary number of steps. However, using fewer than 4 steps will result in very coarse answers. If given enough context to reason about, benchmarks show the model improving up to around `num_steps=64`. Beyond that, more steps generally do not hurt, but we see no further improvements.
 
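For a quick sanity check of the guideline in the changed paragraph, here is a minimal arithmetic sketch in Python. The 1.5B/0.5B/1.5B split is taken from the new README text; the helper name `materialized_params` is hypothetical and not part of the repository:

```python
# Guideline from the README: materialized parameters = num_steps * 1.5B + 2B.
# The 2B constant is the non-recurrent share: 1.5B (prelude + coda) + 0.5B (embedding).

PRELUDE_CODA = 1.5e9  # non-recurrent transformer layers
EMBEDDING = 0.5e9     # embedding parameters
RECURRENT = 1.5e9     # parameters traversed once per recurrence step

def materialized_params(num_steps: int) -> float:
    """Hypothetical helper: parameters touched in one forward pass with `num_steps` recurrences."""
    return num_steps * RECURRENT + PRELUDE_CODA + EMBEDDING

for steps in (4, 32, 64):
    print(f"num_steps={steps:3d} -> {materialized_params(steps) / 1e9:.1f}B materialized parameters")
```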
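The `num_steps` guidance in the unchanged context lines can likewise be exercised as a sweep. This is a sketch that assumes the `model` and `input_ids` objects from the README snippet above are already constructed (the hunk shows neither the checkpoint nor the tokenizer, so none are assumed here), and that a standard PyTorch inference context applies:

```python
import torch

# Per the README: fewer than 4 steps gives very coarse answers;
# benchmark gains saturate around num_steps=64, and more steps do not hurt.
with torch.no_grad():
    for steps in (1, 4, 16, 32, 64):
        outputs = model(input_ids, num_steps=steps)  # same call as in the snippet above
```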