The MobileCLIP2-S0 checkpoint seems to have issues
I've prepared the following sample code.
It runs the same inference on either MobileCLIP2-S0 or MobileCLIP2-S3; switch between them by commenting and uncommenting MODEL_TO_LOAD and MODEL_CHECKPOINT.
MobileCLIP2-S3 Results (Correct ✅)
Text: a photo of a dog
Most similar images:
dog_01.jpg 89.88%
dog_02.jpg 10.10%
dogs_01.jpg 0.02%
cat_02.jpg 0.00%
cat_01.jpg 0.00%
Text: a dog
Most similar images:
dog_02.jpg 71.79%
dog_01.jpg 28.20%
dogs_01.jpg 0.00%
cat_02.jpg 0.00%
cat_01.jpg 0.00%
Text: dogs
Most similar images:
dogs_01.jpg 99.75%
dog_02.jpg 0.14%
dog_01.jpg 0.12%
cats_02.jpg 0.00%
cats_01.jpg 0.00%
MobileCLIP2-S0 Results (Wrong ❌)
Text: a photo of a dog
Most similar images:
dog_01.jpg 47.39%
cat_01.jpg 23.71%
people_01.jpg 11.60%
cat_02.jpg 10.60%
cats_01.jpg 2.76%
Text: a dog
Most similar images:
dog_01.jpg 62.96%
cat_01.jpg 17.35%
cat_02.jpg 9.95%
people_01.jpg 5.25%
cats_02.jpg 1.53%
Text: dogs
Most similar images:
cat_01.jpg 78.92%
cats_02.jpg 6.12%
dog_01.jpg 5.00%
person_01.jpg 4.43%
people_01.jpg 3.28%
Note that when I run this notebook locally on a Mac, it converts the PyTorch checkpoint into image and text encoder ML Packages, and these show the same issue (works well with S3, fails with S0).
Hi @Norod78,
Thanks for reporting this issue. Our S0/S2/B variants need different preprocessing (normalization) than our S3/S4/L-14 variants. The S0/S2/B variants use the same normalization as our v1 variants, mean=(0,0,0) and std=(1,1,1), while the S3/S4/L-14 variants use the OpenAI mean/std (the default OpenCLIP normalization).
model, _, preprocess = open_clip.create_model_and_transforms('MobileCLIP2-S0', pretrained='/path/to/mobileclip2_s0.pt', image_mean=(0,0,0), image_std=(1,1,1))
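To make the difference concrete, here is a minimal pure-Python sketch of the two normalizations described above. The helper names `normalization_for` and `normalize_pixel` are hypothetical and exist only for illustration; they are not part of OpenCLIP. The sketch assumes model names follow the `MobileCLIP2-<variant>` pattern and that pixel values are already scaled to [0, 1]. The OpenAI constants are the standard CLIP mean/std values.

```python
# OpenAI CLIP normalization constants (the default OpenCLIP image normalization).
OPENAI_MEAN = (0.48145466, 0.4578275, 0.40821073)
OPENAI_STD = (0.26862954, 0.26130258, 0.27577711)

# Variants that, per the explanation above, use mean=(0,0,0) and std=(1,1,1).
IDENTITY_NORM_VARIANTS = {"S0", "S2", "B"}


def normalization_for(model_name):
    """Hypothetical helper: return (mean, std) for a name like 'MobileCLIP2-S0'."""
    # Everything after the first '-' is treated as the variant, e.g. 'S0' or 'L-14'.
    variant = model_name.split("-", 1)[1]
    if variant in IDENTITY_NORM_VARIANTS:
        return (0.0, 0.0, 0.0), (1.0, 1.0, 1.0)
    return OPENAI_MEAN, OPENAI_STD


def normalize_pixel(rgb, mean, std):
    """Apply per-channel (value - mean) / std to one RGB pixel in [0, 1]."""
    return tuple((c - m) / s for c, m, s in zip(rgb, mean, std))


if __name__ == "__main__":
    gray = (0.5, 0.5, 0.5)
    for name in ("MobileCLIP2-S0", "MobileCLIP2-S3"):
        mean, std = normalization_for(name)
        print(name, normalize_pixel(gray, mean, std))
```

With the S0 settings a pixel passes through unchanged, while the OpenAI settings shift and rescale every channel, so an S0 checkpoint fed OpenAI-normalized images sees inputs from a different range than it was trained on.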
@rwightman has now integrated this into OpenCLIP, and the correct preprocessing is loaded when one specifies pretrained="dfndr2b".
https://github.com/mlfoundations/open_clip/blob/13b01ec788c0c706a4d9ba66e301c8793aae0f0f/src/open_clip/pretrained.py#L629-L634