Future multimodal support planned? EmbeddingGemma has vision tokens in tokenizer

by AmanPriyanshu - opened 5 days ago

Discussion

AmanPriyanshu

5 days ago • edited 5 days ago

EmbeddingGemma's tokenizer has <start_of_image>, <end_of_image>, and <image_soft_token> even though it's text-only.

Are these placeholders for future multimodal versions? It would be awesome to know whether there will be follow-ups in the EmbeddingGemma family.

BalakrishnaCh

Google org 2 days ago

Hi @AmanPriyanshu ,

Thanks for reaching out to us, and welcome to Google's Gemma family of open-source models. The vision tokens you found are an inherent characteristic of the model family: all Gemma 3 models use a unified tokenizer that includes vision tokens, even when the model itself cannot make use of every token in the vocabulary.
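For anyone who wants to confirm this on their own copy of the tokenizer, here is a minimal sketch rather than an official recipe. The helper name `find_vision_tokens` is hypothetical, and the toy vocabulary and its ids are made up for illustration. The commented-out lines assume the `transformers` library is installed and that you have access to the model repo; the repo id shown there is an assumption.

```python
# Token strings quoted in the original question.
VISION_TOKENS = ("<start_of_image>", "<end_of_image>", "<image_soft_token>")


def find_vision_tokens(vocab, tokens=VISION_TOKENS):
    """Return {token: id} for each listed token that is present in vocab.

    `vocab` is any mapping from token string to integer id, for example
    the dict returned by a Hugging Face tokenizer's get_vocab().
    """
    return {tok: vocab[tok] for tok in tokens if tok in vocab}


if __name__ == "__main__":
    # Toy vocabulary; these ids are invented and do not match the real model.
    toy_vocab = {
        "<pad>": 0,
        "hello": 1,
        "<start_of_image>": 2,
        "<end_of_image>": 3,
        "<image_soft_token>": 4,
    }
    print(find_vision_tokens(toy_vocab))

    # With the real tokenizer (assumed repo id, requires transformers and
    # access to the model repo):
    # from transformers import AutoTokenizer
    # tok = AutoTokenizer.from_pretrained("google/embeddinggemma-300m")
    # print(find_vision_tokens(tok.get_vocab()))
```

A token showing up in the vocabulary only means the tokenizer knows the string; it does not mean the text-only model can process images.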

To learn more about EmbeddingGemma, please visit the following page.

Thanks.

AmanPriyanshu

2 days ago

Excited for it!
Thank you

AmanPriyanshu changed discussion status to closed 2 days ago
