Update README.md
README.md CHANGED

@@ -134,7 +134,7 @@ model.to(device)
 
 model(input_ids, num_steps=32)
 ```
-The model has about 1.5B parameters in non-recurrent
+The model has about 1.5B parameters in its non-recurrent layers (prelude+coda), 0.5B parameters in the embedding, and 1.5B recurrent parameters, so, as a guideline,
 the number of materialized parameters is `num_steps * 1.5B + 2B`. Playing with this parameter is what makes this model interesting, and different from fixed-depth transformers!
 The model is trained to accept an arbitrary number of steps. However, using fewer than 4 steps will result in very coarse answers. If given enough context to reason about, benchmarks show the model improving up to around `num_steps=64`. Beyond that, more steps generally do not hurt, but we see no further improvements.
 
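For a quick sanity check of the guideline in the changed paragraph, here is a minimal arithmetic sketch in Python. The 1.5B/0.5B/1.5B split is taken from the new README text; the helper name `materialized_params` is hypothetical and not part of the repository:

```python
# Guideline from the README: materialized parameters = num_steps * 1.5B + 2B.
# The 2B constant is the non-recurrent share: 1.5B (prelude + coda) + 0.5B (embedding).

PRELUDE_CODA = 1.5e9  # non-recurrent transformer layers
EMBEDDING = 0.5e9     # embedding parameters
RECURRENT = 1.5e9     # parameters traversed once per recurrence step

def materialized_params(num_steps: int) -> float:
    """Hypothetical helper: parameters touched in one forward pass with `num_steps` recurrences."""
    return num_steps * RECURRENT + PRELUDE_CODA + EMBEDDING

for steps in (4, 32, 64):
    print(f"num_steps={steps:3d} -> {materialized_params(steps) / 1e9:.1f}B materialized parameters")
```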
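The `num_steps` guidance in the unchanged context lines can likewise be exercised as a sweep. This is a sketch that assumes the `model` and `input_ids` objects from the README snippet above are already constructed (the hunk shows neither the checkpoint nor the tokenizer, so none are assumed here), and that a standard PyTorch inference context applies:

```python
import torch

# Per the README: fewer than 4 steps gives very coarse answers;
# benchmark gains saturate around num_steps=64, and more steps do not hurt.
with torch.no_grad():
    for steps in (1, 4, 16, 32, 64):
        outputs = model(input_ids, num_steps=steps)  # same call as in the snippet above
```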