---
library_name: transformers
license: cc-by-nc-4.0
inference: false
---

# Web-SSL DINO ViT-2B: Light Filtered 2B MetaCLIP data, 224 Resolution

A 2-billion-parameter Vision Transformer (ViT) trained with DINOv2 self-supervised learning on lightly filtered web-scale image data, without language supervision. Introduced in ["Scaling Language-Free Visual Representation Learning"](https://arxiv.org/abs/2504.01017) (Fan et al., 2025).

## Model Details

- **Architecture**: ViT (2688 width, 24 depth, 21 heads)
- **Parameters**: 2B
- **Resolution**: 224×224 pixels
- **Training**: Self-supervised Web-DINO on lightly filtered MetaCLIP data

## Model Description

Web-SSL DINO 2B is a 2-billion-parameter Vision Transformer trained with self-supervised learning on lightly filtered web images, without language supervision. The "light2b" designation indicates training on the subset of images that contain any textual content, retaining approximately 50.3% of the original MetaCLIP dataset. This filtering improves OCR and chart understanding capabilities while maintaining strong performance on other vision tasks.

This model demonstrates that pure visual learning, when scaled appropriately, can match or exceed the performance of language-supervised models such as CLIP across various vision tasks.

*WebSSL Model Overview*

## Usage

```python
from transformers import AutoImageProcessor, Dinov2Model
import torch
from PIL import Image

processor = AutoImageProcessor.from_pretrained('facebook/webssl-dino2b-light2b-224')
model = Dinov2Model.from_pretrained('facebook/webssl-dino2b-light2b-224')

# Process an image
image = Image.open('path/to/image.jpg')
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

cls_features = outputs.last_hidden_state[:, 0]     # CLS token features
patch_features = outputs.last_hidden_state[:, 1:]  # patch-wise token features
```

## Citation

```bibtex
@article{fan2025scaling,
  title={Scaling Language-Free Visual Representation Learning},
  author={David Fan and Shengbang Tong and Jiachen Zhu and Koustuv Sinha and Zhuang Liu and Xinlei Chen and Michael Rabbat and Nicolas Ballas and Yann LeCun and Amir Bar and Saining Xie},
  year={2025},
  eprint={2504.01017},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```
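
If a single vector per image is needed (e.g. for retrieval or linear probing), a common convention is to take the CLS token or to mean-pool the patch tokens. Below is a minimal sketch of the latter, assuming the `outputs` object from the usage example above; the `global_embedding` helper is illustrative and not part of the model's API.

```python
import torch
import torch.nn.functional as F

def global_embedding(outputs):
    # Hypothetical helper: drop the CLS token and average the patch tokens
    # into one vector per image.
    patch_tokens = outputs.last_hidden_state[:, 1:]  # (batch, num_patches, hidden)
    pooled = patch_tokens.mean(dim=1)                # (batch, hidden)
    # L2-normalize so embeddings can be compared with cosine similarity.
    return F.normalize(pooled, dim=-1)

embedding = global_embedding(outputs)  # (1, 2688) for a single 224x224 image
```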