JonasGeiping committed · Commit 6a53243 (verified) · Parent: 972cea6

Update README.md

Files changed (1): README.md (+1 -1)
README.md CHANGED
@@ -134,7 +134,7 @@ model.to(device)
  
  model(input_ids, num_steps=32)
  ```
- The model has about 1.5B parameters in non-recurrent code, 0.5B parameters in the embedding, and 1.5B recurrent parameters, so, as a guideline,
+ The model has about 1.5B parameters in its non-recurrent layers (prelude+coda), 0.5B parameters in the embedding, and 1.5B recurrent parameters, so, as a guideline,
  the number of materialized parameters is `num_steps * 1.5B + 2B`. Playing with this parameter is what makes this model interesting, and different from fixed-depth transformers!
  The model is trained to accept an arbitrary number of steps. However, using fewer than 4 steps will result in very coarse answers. If given enough context to reason about, benchmarks show the model improving up to around `num_steps=64`. Beyond that, more steps generally do not hurt, but we see no further improvements.
  
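
As a quick illustration of the guideline in the changed paragraph, the sketch below computes the approximate materialized parameter count for a few `num_steps` values. The constant names and the helper function are ours, purely for illustration; the sizes are the rough figures stated in the README (1.5B prelude+coda, 0.5B embedding, 1.5B recurrent core).

```python
# Illustrative sketch of the README's guideline: materialized params = num_steps * 1.5B + 2B.
# Constant names are hypothetical; values are the approximate sizes quoted above.

PRELUDE_CODA_PARAMS = 1.5e9  # non-recurrent layers (prelude + coda)
EMBEDDING_PARAMS = 0.5e9     # embedding
RECURRENT_PARAMS = 1.5e9     # recurrent core, materialized once per step

def materialized_params(num_steps: int) -> float:
    """Approximate parameters materialized in one forward pass."""
    return num_steps * RECURRENT_PARAMS + PRELUDE_CODA_PARAMS + EMBEDDING_PARAMS

for steps in (4, 32, 64):
    print(f"num_steps={steps}: ~{materialized_params(steps) / 1e9:.0f}B materialized parameters")
```

For example, `num_steps=32` (as in the snippet above) materializes roughly 50B parameters, while the `num_steps=64` regime, where benchmark improvements level off, materializes roughly 98B.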