The implementation of the paper *Differential Layer Skipping in Large Language Models*.

### Model Description

DiffSkip-Llama-3-8B-Instruct is an enhanced version of the Llama-3-8B-Instruct model, incorporating the Differential Layer Skipping (DiffSkip) method to enable dynamic Feed-Forward Network (FFN) skipping during text generation. This approach leverages the self-attention input-output difference as a routing signal, allowing tokens to bypass FFN blocks based on computational needs.
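To make the routing idea concrete, here is a minimal sketch of one possible reading of the mechanism: the per-token norm of the self-attention update serves as the routing signal, and tokens whose signal falls below a threshold skip the FFN. The class name `DiffSkipBlock`, the `threshold` parameter, and the exact routing rule are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn


class DiffSkipBlock(nn.Module):
    """Illustrative decoder block with FFN skipping (not the official DiffSkip code).

    The routing signal is the norm of the self-attention input-output
    difference; tokens below a (hypothetical) threshold bypass the FFN.
    """

    def __init__(self, d_model: int, d_ff: int, threshold: float = 0.5):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Linear(d_ff, d_model),
        )
        self.threshold = threshold  # assumed routing threshold, not from the paper

    def forward(self, x: torch.Tensor):
        # Self-attention with a residual connection.
        attn_out, _ = self.attn(x, x, x)
        h = x + attn_out

        # Routing signal: how much self-attention changed each token
        # (the norm of the attention input-output difference).
        score = attn_out.norm(dim=-1)        # (batch, seq)
        keep = score > self.threshold        # tokens that still go through the FFN

        out = h.clone()
        if keep.any():
            # Apply the FFN only to the routed tokens; the rest skip it.
            out[keep] = h[keep] + self.ffn(h[keep])
        return out, keep
```

In this sketch, skipped tokens keep their post-attention hidden state unchanged, so compute is saved on exactly the tokens whose representations the attention layer barely moved.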