---
license: mit
pipeline_tag: image-feature-extraction
library_name: transformers
---

## RICE-ViT-L Model Card

[[Github]](https://github.com/deepglint/unicom) [[Paper]](https://arxiv.org/abs/2507.20025)

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/6478679d7b370854241b2ad8/Yy09pOusaZ47LofJ27xox.jpeg)

## Installation

```shell
pip install torch transformers
git clone https://github.com/deepglint/unicom
cd unicom/mlcd
```

## Usage

```python
from vit_rope2d_hf import MLCDVisionModel
from transformers import CLIPImageProcessor
from PIL import Image
import requests
import torch

# Load model and processor
model = MLCDVisionModel.from_pretrained("DeepGlint-AI/rice-vit-large-patch14-560")
processor = CLIPImageProcessor.from_pretrained("DeepGlint-AI/rice-vit-large-patch14-560")

# Process single image
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=image, return_tensors="pt")

# Get visual features
with torch.no_grad():
    outputs = model(**inputs)
features = outputs.last_hidden_state

print(f"Extracted features shape: {features.shape}")
```
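For retrieval-style feature extraction (the `image-feature-extraction` pipeline tag above), a single pooled vector per image is often more convenient than the full token grid. The sketch below mean-pools the token features from `last_hidden_state` and L2-normalizes them so that two images can be compared by cosine similarity. Note that mean pooling and the second example URL are illustrative assumptions, not the repository's documented recipe; check the GitHub repo for the recommended aggregation.

```python
from vit_rope2d_hf import MLCDVisionModel
from transformers import CLIPImageProcessor
from PIL import Image
import requests
import torch
import torch.nn.functional as F

model = MLCDVisionModel.from_pretrained("DeepGlint-AI/rice-vit-large-patch14-560")
processor = CLIPImageProcessor.from_pretrained("DeepGlint-AI/rice-vit-large-patch14-560")
model.eval()

def embed(image: Image.Image) -> torch.Tensor:
    # Mean-pool the token features into one vector per image, then
    # L2-normalize so dot products equal cosine similarity.
    # NOTE: mean pooling is an assumption for illustration; the model
    # may provide a preferred pooled representation.
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        tokens = model(**inputs).last_hidden_state  # (1, num_tokens, hidden_dim)
    return F.normalize(tokens.mean(dim=1), dim=-1)  # (1, hidden_dim)

urls = [
    "http://images.cocodataset.org/val2017/000000039769.jpg",
    "http://images.cocodataset.org/val2017/000000037777.jpg",  # second image is a placeholder; substitute your own
]
embeddings = torch.cat(
    [embed(Image.open(requests.get(u, stream=True).raw)) for u in urls]
)  # (2, hidden_dim)

similarity = (embeddings[0] @ embeddings[1]).item()  # cosine similarity in [-1, 1]
print(f"Cosine similarity: {similarity:.4f}")
```

Embedding the same image twice should yield a similarity of 1.0, which is a quick sanity check for the pipeline.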