Debugging and removing redundant lines

#8
opened by unuu

I think these changes are needed and fix a bug, but I don't know what testing process was used.
Please see this issue; if it is deemed valid, I will move the explanations here.

Thanks for your advice! You can pass 'use_cache=False' or downgrade the transformers library to version 4.53.0 to avoid this bug. In our model, we use our own cache rather than HF's cache.

generated_ids = model.generate(
    **inputs.to(device),   # move all input tensors to the model's device;
                           # unpacking inputs alone avoids passing input_ids twice
    max_new_tokens=128,
    use_cache=False,       # works around the incompatible cache path
    )
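
If it helps, here is a minimal sketch (just an illustration; the 4.53.0 threshold is simply taken from the advice above) for checking which workaround applies in your environment:

import transformers
from packaging import version

# Assumption: the cache incompatibility only appears on versions newer than 4.53.0.
if version.parse(transformers.__version__) > version.parse("4.53.0"):
    print(
        f"transformers {transformers.__version__} detected: pass use_cache=False to generate(), "
        "or pin the library, e.g. pip install transformers==4.53.0"
    )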

@SFLY5 thank you for your response. However, using 'use_cache=False' makes inference very slow (I also confirmed that my proposed changes don't solve the issue). I just tested downgrading the transformers version, but it does not seem to truly enable the caching logic (please point out if that is unexpected behavior).
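
For reference, this is roughly how I compare the two settings; a rough sketch that reuses the model, inputs, and device objects from the snippet above (if the KV cache were really active, the use_cache=True run should be clearly faster for longer generations):

import time

def timed_generate(use_cache):
    # Same call as in the snippet above; only the cache flag changes.
    start = time.perf_counter()
    ids = model.generate(**inputs.to(device), max_new_tokens=128, use_cache=use_cache)
    return time.perf_counter() - start, ids.shape[-1]

for flag in (True, False):
    seconds, total_len = timed_generate(flag)
    print(f"use_cache={flag}: {seconds:.2f}s for {total_len} output tokens")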

Hey @unuu,
we are working on integrating ERNIE VL natively into transformers over here.

For now, as pointed out, a lower transformers version is required (not sure which one exactly), or you can modify your installed files as shown here, disabling the transformers cache creation that is currently incompatible.
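
If you would rather not edit installed files, a smaller sketch of the same idea (assuming the checkpoint goes through the standard GenerationMixin path, and reusing the model, inputs, and device objects from the snippets above) is to switch the cache off once in the model's generation config; it has the same speed cost as passing use_cache=False on every call:

# Disable cache use for every subsequent generate() call on this model instance.
model.generation_config.use_cache = False
generated_ids = model.generate(**inputs.to(device), max_new_tokens=128)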

Ready to merge
This branch is ready to get merged automatically.
