The implementation of the paper *Differential Layer Skipping in Large Language Models*.

### Model Description

DiffSkip-Llama-3-8B-Instruct is an enhanced version of the Llama-3-8B-Instruct model, incorporating the Differential Layer Skipping (DiffSkip) method to enable dynamic Feed-Forward Network (FFN) skipping during text generation. This approach leverages the self-attention input-output difference as a routing signal, allowing tokens to bypass FFN blocks based on computational needs.
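To make the routing idea concrete, here is a minimal sketch of one possible reading of the mechanism: the per-token norm of the self-attention update serves as the routing signal, and tokens whose signal falls below a threshold skip the FFN. The class name `DiffSkipBlock`, the `threshold` parameter, and the exact routing rule are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn


class DiffSkipBlock(nn.Module):
    """Illustrative decoder block with FFN skipping (not the official DiffSkip code).

    The routing signal is the norm of the self-attention input-output
    difference; tokens below a (hypothetical) threshold bypass the FFN.
    """

    def __init__(self, d_model: int, d_ff: int, threshold: float = 0.5):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Linear(d_ff, d_model),
        )
        self.threshold = threshold  # assumed routing threshold, not from the paper

    def forward(self, x: torch.Tensor):
        # Self-attention with a residual connection.
        attn_out, _ = self.attn(x, x, x)
        h = x + attn_out

        # Routing signal: how much self-attention changed each token
        # (the norm of the attention input-output difference).
        score = attn_out.norm(dim=-1)        # (batch, seq)
        keep = score > self.threshold        # tokens that still go through the FFN

        out = h.clone()
        if keep.any():
            # Apply the FFN only to the routed tokens; the rest skip it.
            out[keep] = h[keep] + self.ffn(h[keep])
        return out, keep
```

In this sketch, skipped tokens keep their post-attention hidden state unchanged, so compute is saved on exactly the tokens whose representations the attention layer barely moved.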