keras
/

siglip_base_patch16_512

KerasHub


Divyasreepat committed on Mar 24

Commit

c1ed33f

verified ·

1 Parent(s): d338bd6

Update README.md with new model card content


Files changed (1)

README.md +90 -8

README.md CHANGED

@@ -1,11 +1,93 @@
 ---
 library_name: keras-hub
 ---
-This is a [`SigLIP` model](https://keras.io/api/keras_hub/models/sig_lip) uploaded using the KerasHub library and can be used with JAX, TensorFlow, and PyTorch backends.
-Model config:
-* **name:** sig_lip_backbone
-* **trainable:** True
-* **vision_encoder:** {'module': 'keras_hub.src.models.siglip.siglip_vision_encoder', 'class_name': 'SigLIPVisionEncoder', 'config': {'name': 'sig_lip_vision_encoder', 'trainable': True, 'patch_size': 16, 'hidden_dim': 768, 'num_layers': 12, 'num_heads': 12, 'intermediate_dim': 3072, 'intermediate_activation': 'gelu_approximate', 'layer_norm_epsilon': 1e-06, 'image_shape': [512, 512, 3]}, 'registered_name': 'keras_hub>SigLIPVisionEncoder'}
-* **text_encoder:** {'module': 'keras_hub.src.models.siglip.siglip_text_encoder', 'class_name': 'SigLIPTextEncoder', 'config': {'name': 'sig_lip_text_encoder', 'trainable': True, 'vocabulary_size': 32000, 'embedding_dim': 768, 'hidden_dim': 768, 'num_layers': 12, 'num_heads': 12, 'intermediate_dim': 3072, 'intermediate_activation': 'gelu_approximate', 'layer_norm_epsilon': 1e-06, 'max_sequence_length': 64}, 'registered_name': 'keras_hub>SigLIPTextEncoder'}
-This model card has been generated automatically and should be completed by the model author. See [Model Cards documentation](https://huggingface.co/docs/hub/model-cards) for more information.

 ---
 library_name: keras-hub
 ---
+## Model Overview
+SigLIP model pre-trained on WebLI at resolution 512x512. It was introduced in the paper [Sigmoid Loss for Language Image Pre-Training](https://arxiv.org/abs/2303.15343) by Zhai et al. and first released in this [repository](https://github.com/google-research/big_vision).
+SigLIP is a [CLIP](https://huggingface.co/docs/transformers/model_doc/clip)-style multimodal model trained with a pairwise sigmoid loss in place of CLIP's softmax-based contrastive loss. The sigmoid loss operates solely on image-text pairs and does not require a global view of the pairwise similarities for normalization. This allows the batch size to be scaled up further, while also performing better at smaller batch sizes.
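The pairwise sigmoid loss described above can be sketched in a few lines of NumPy. This is a minimal illustration written for this card, not the KerasHub or big_vision implementation: the function names, the fixed `temperature` and `bias` values, and the random embeddings are all assumptions made for the example (in the paper both scalars are learned).

```python
import numpy as np


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)) = -log(1 + exp(-x)).
    return -np.logaddexp(0.0, -x)


def sigmoid_pairwise_loss(image_emb, text_emb, temperature=10.0, bias=-10.0):
    """Sketch of a pairwise sigmoid loss over a batch of n image-text pairs.

    `temperature` and `bias` are illustrative constants (assumption).
    """
    # L2-normalize so that dot products are cosine similarities.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=-1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=-1, keepdims=True)

    # One logit per (image, text) pair; no softmax over rows or columns.
    logits = temperature * image_emb @ text_emb.T + bias

    # +1 on the diagonal (matching pairs), -1 everywhere else.
    n = logits.shape[0]
    labels = 2.0 * np.eye(n) - 1.0

    # Each pair is an independent binary classification problem.
    return -np.sum(log_sigmoid(labels * logits)) / n


# Hypothetical embeddings: 4 pairs with 8-dimensional features.
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))
print("aligned pairs: ", sigmoid_pairwise_loss(emb, emb))
print("shuffled pairs:", sigmoid_pairwise_loss(emb, np.roll(emb, 1, axis=0)))
```

Each entry of `logits` contributes its own term to the sum, so no row-wise or column-wise normalization over the batch is needed, which is the property the paragraph above refers to.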
+A TLDR of SigLIP by one of the authors can be found [here](https://twitter.com/giffmana/status/1692641733459267713).
+Weights are released under the [Apache 2 License](https://github.com/keras-team/keras-hub/blob/master/LICENSE). Keras model code is released under the [Apache 2 License](https://github.com/keras-team/keras-hub/blob/master/LICENSE).
+## Links
+* [SigLIP Quickstart Notebook](https://www.kaggle.com/code/laxmareddypatlolla/siglip-quickstart-notebook-with-hub)
+* SigLIP API Documentation (coming soon)
+* [SigLIP Model Card](https://arxiv.org/abs/2303.15343)
+* [KerasHub Beginner Guide](https://keras.io/guides/keras_hub/getting_started/)
+* [KerasHub Model Publishing Guide](https://keras.io/guides/keras_hub/upload/)
+## Installation
+Keras and KerasHub can be installed with:
+```
+pip install -U -q keras-hub
+pip install -U -q keras
+```
+JAX, TensorFlow, and PyTorch come preinstalled in Kaggle Notebooks. For instructions on installing them in another environment, see the [Keras Getting Started](https://keras.io/getting_started/) page.
+## Presets
+The following model checkpoints are provided by the Keras team. Full code examples for each are available below.
+| Preset name                            | Parameters | Description                                                                                                  |
+|---------------------------------------|------------|--------------------------------------------------------------------------------------------------------------|
+| siglip_base_patch16_512 |  | SigLIP base model with 16x16 patches and 512x512 input images. |
+## Example Usage
+```Python
+import keras
+import numpy as np
+from keras_hub.models import SigLIPBackbone, SigLIPTokenizer
+from keras_hub.layers import SigLIPImageConverter
+
+# Instantiate the model and preprocessing tools.
+siglip = SigLIPBackbone.from_preset("siglip_base_patch16_512")
+tokenizer = SigLIPTokenizer.from_preset(
+    "siglip_base_patch16_512", sequence_length=64
+)
+image_converter = SigLIPImageConverter.from_preset("siglip_base_patch16_512")
+
+# Tokenize the candidate captions.
+tokens = tokenizer.tokenize(["mountains", "cat on tortoise", "house"])
+
+# Load the image and preprocess it into a batch of one.
+image = keras.utils.load_img("cat.jpg")
+image = image_converter(np.array([image]).astype(float))
+
+# Query the model for image-text similarities.
+outputs = siglip({
+    "images": image,
+    "token_ids": tokens,
+})
+```
+## Example Usage with Hugging Face URI
+```Python
+import keras
+import numpy as np
+from keras_hub.models import SigLIPBackbone, SigLIPTokenizer
+from keras_hub.layers import SigLIPImageConverter
+
+# Instantiate the model and preprocessing tools from the Hugging Face Hub.
+siglip = SigLIPBackbone.from_preset("hf://keras/siglip_base_patch16_512")
+tokenizer = SigLIPTokenizer.from_preset(
+    "hf://keras/siglip_base_patch16_512", sequence_length=64
+)
+image_converter = SigLIPImageConverter.from_preset("hf://keras/siglip_base_patch16_512")
+
+# Tokenize the candidate captions.
+tokens = tokenizer.tokenize(["mountains", "cat on tortoise", "house"])
+
+# Load the image and preprocess it into a batch of one.
+image = keras.utils.load_img("cat.jpg")
+image = image_converter(np.array([image]).astype(float))
+
+# Query the model for image-text similarities.
+outputs = siglip({
+    "images": image,
+    "token_ids": tokens,
+})
+```
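The usage examples stop at the model call. Because SigLIP is trained with a sigmoid loss, one common way to read its image-text logits is to apply a sigmoid to each logit independently instead of a softmax across captions. The sketch below assumes you have already extracted an array of image-to-text logits of shape `(num_images, num_texts)`; the README does not show the structure of the backbone output, so the helper name `pair_probabilities` and the logit values are hypothetical.

```python
import numpy as np


def pair_probabilities(logits):
    """Map raw image-text logits to independent match probabilities."""
    logits = np.asarray(logits, dtype="float64")
    # Stable sigmoid: exp(log(sigmoid(x))) with log-sigmoid = -logaddexp(0, -x).
    return np.exp(-np.logaddexp(0.0, -logits))


# Hypothetical logits for one image against the three captions used above.
captions = ["mountains", "cat on tortoise", "house"]
logits = np.array([[-8.0, 2.5, -5.0]])

probs = pair_probabilities(logits)
best = int(np.argmax(probs, axis=-1)[0])
for caption, p in zip(captions, probs[0]):
    print(f"{caption!r}: {p:.4f}")
print("best match:", captions[best])
```

Unlike softmax scores, these probabilities are computed per pair and need not sum to 1 across captions, so an image can match several captions or none of them.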