Future multimodal support planned? EmbeddingGemma has vision tokens in tokenizer
#6 by AmanPriyanshu - opened
EmbeddingGemma's tokenizer has <start_of_image>, <end_of_image>, and <image_soft_token> even though it's text-only.
Are these placeholders for future multimodal versions? Would be awesome to know if there'll be follow-ups in the EmbeddingGemma family.
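For anyone who wants to reproduce the observation above, here is a minimal sketch of the check. The helper `find_vision_tokens` and the toy vocabulary are hypothetical and only for illustration; the three token strings are the ones listed in this post.

```python
from typing import Dict, Iterable

# Token strings as listed in the post above.
VISION_TOKENS = ("<start_of_image>", "<end_of_image>", "<image_soft_token>")


def find_vision_tokens(
    vocab: Dict[str, int],
    candidates: Iterable[str] = VISION_TOKENS,
) -> Dict[str, int]:
    """Return the candidate tokens that appear in `vocab`, mapped to their ids."""
    return {tok: vocab[tok] for tok in candidates if tok in vocab}


# Hypothetical toy vocabulary: the ids are made up and are NOT the real
# EmbeddingGemma ids. It stands in for a real tokenizer vocabulary.
toy_vocab = {
    "<pad>": 0,
    "hello": 1,
    "<start_of_image>": 2,
    "<end_of_image>": 3,
    "<image_soft_token>": 4,
}

found = find_vision_tokens(toy_vocab)
print(found)
```

With the `transformers` library installed and access to the model, the same helper should work on the real vocabulary by passing `AutoTokenizer.from_pretrained(repo_id).get_vocab()`, which returns a token-to-id dict. I'm assuming the repo id is `google/embeddinggemma-300m`; please check it against the model page.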
Hi @AmanPriyanshu,
Thanks for reaching out, and welcome to Google's Gemma family of open models. This is an inherent characteristic of the model family: all Gemma 3 models use a unified tokenizer whose vocabulary includes vision tokens, even when a given model, such as the text-only EmbeddingGemma, cannot make use of every token in the vocabulary.
To learn more about EmbeddingGemma, please visit the following page.
Thanks.
Excited for it!
Thank you
AmanPriyanshu changed discussion status to closed