Debugging and removing redundant lines #8
opened by unuu
I think these changes are needed and fix a bug, but I don't know what testing process was done.
Please see this issue, and if the changes are deemed valid, I will move the explanations here.
Thanks for your advice! You can pass 'use_cache=False' or downgrade the transformers library to version 4.53.0 to avoid this bug. In our model, we use our own cache rather than HF's cache.
generated_ids = model.generate(
    **{k: v.to(device) for k, v in inputs.items()},  # pass each tokenizer output once, moved to the device
    max_new_tokens=128,
    use_cache=False,
)
@SFLY5 thank you for your response. However, using 'use_cache=False' makes inference very slow (I also confirmed that my proposed changes don't solve the issue). I just tested downgrading the transformers version, but it does not seem to truly enable the caching logic (please point out if that is unexpected behavior).
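For reference, this is a minimal sketch of how I checked whether the KV cache is actually active after downgrading: generate the same prompt with use_cache=True and use_cache=False and compare wall-clock time. The model id, prompt, and trust_remote_code flag below are placeholders, not the exact setup from this thread.

import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical model id and prompt, used only to illustrate the timing check.
model_id = "path/to/your-model"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True).to(device)

inputs = tokenizer("Hello, world", return_tensors="pt").to(device)

def timed_generate(use_cache: bool) -> float:
    # Synchronize around the call so the measurement reflects the full generation.
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    model.generate(**inputs, max_new_tokens=128, use_cache=use_cache)
    if device == "cuda":
        torch.cuda.synchronize()
    return time.perf_counter() - start

# If the KV cache is really active, the cached run should be noticeably faster.
print("use_cache=True :", timed_generate(True))
print("use_cache=False:", timed_generate(False))

With transformers pinned to 4.53.0 I see roughly the same time for both runs, which is why I suspect the caching logic is not truly enabled.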