---
license: mit
pipeline_tag: image-feature-extraction
library_name: transformers
---
## RICE-ViT-L Model Card

[[Github]](https://github.com/deepglint/unicom) [[Paper]](https://arxiv.org/abs/2507.20025)
## Installation

```shell
pip install torch transformers
git clone https://github.com/deepglint/unicom
cd unicom/mlcd
```
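The usage example below imports `MLCDVisionModel` from the local `vit_rope2d_hf.py`, so run your script from inside the `unicom/mlcd` directory. A minimal sanity check, assuming the clone above completed and you are in that directory:

```python
# Minimal environment check: confirms the local vit_rope2d_hf.py module
# and the core dependencies are importable. Run from inside unicom/mlcd.
import torch
import transformers
from vit_rope2d_hf import MLCDVisionModel

print(f"torch {torch.__version__}, transformers {transformers.__version__}")
```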
## Usage

```python
from vit_rope2d_hf import MLCDVisionModel
from transformers import CLIPImageProcessor
from PIL import Image
import requests
import torch

# Load the model and image processor
model = MLCDVisionModel.from_pretrained("DeepGlint-AI/rice-vit-large-patch14-560")
processor = CLIPImageProcessor.from_pretrained("DeepGlint-AI/rice-vit-large-patch14-560")

# Download and preprocess a single image
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=image, return_tensors="pt")

# Extract visual features
with torch.no_grad():
    outputs = model(**inputs)
    features = outputs.last_hidden_state

print(f"Extracted features shape: {features.shape}")
```
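`last_hidden_state` holds one feature vector per token. For retrieval-style feature extraction you typically reduce this sequence to a single image embedding. The sketch below continues from the snippet above and assumes the first token is a class token, as in standard ViT-style encoders; it shows two common pooling choices:

```python
import torch.nn.functional as F

# Two common ways to collapse token features into one image embedding
# (assumption: index 0 is the class token, the rest are patch tokens).
cls_embedding = features[:, 0]                 # class-token embedding
patch_embedding = features[:, 1:].mean(dim=1)  # mean-pooled patch tokens

# L2-normalize so that dot products equal cosine similarity
embedding = F.normalize(cls_embedding, dim=-1)
print(f"Pooled embedding shape: {embedding.shape}")
```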