---
license: apache-2.0
datasets:
- jonathan-roberts1/NWPU-RESISC45
language:
- en
base_model:
- google/siglip2-base-patch16-224
pipeline_tag: image-classification
library_name: transformers
tags:
- RESISC45
- SigLIP2
---

# **RESISC45-SigLIP2**

> **RESISC45-SigLIP2** is a vision-language encoder model fine-tuned from **google/siglip2-base-patch16-224** for **multi-label** image classification. It is trained to recognize and tag land use and land cover scene categories from the **RESISC45** dataset using the **SiglipForImageClassification** architecture.

> [!note]
> *SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features* https://arxiv.org/pdf/2502.14786

```text
Classification Report:
                       precision    recall  f1-score   support

             airplane     0.9830    0.9900    0.9865       700
              airport     0.9461    0.9529    0.9495       700
     baseball diamond     0.9802    0.9886    0.9844       700
     basketball court     0.9516    0.9271    0.9392       700
                beach     0.9914    0.9900    0.9907       700
               bridge     0.9730    0.9771    0.9751       700
            chaparral     0.9957    0.9986    0.9971       700
               church     0.7949    0.8971    0.8430       700
    circular farmland     0.9914    0.9914    0.9914       700
                cloud     0.9957    0.9871    0.9914       700
      commercial area     0.9231    0.8229    0.8701       700
    dense residential     0.9355    0.8914    0.9129       700
               desert     0.9821    0.9414    0.9613       700
               forest     0.9652    0.9514    0.9583       700
              freeway     0.9344    0.9571    0.9457       700
          golf course     0.9759    0.9843    0.9801       700
   ground track field     0.9623    0.9857    0.9739       700
               harbor     0.9885    0.9843    0.9864       700
      industrial area     0.9505    0.9043    0.9268       700
         intersection     0.9855    0.9686    0.9769       700
               island     0.9871    0.9829    0.9850       700
                 lake     0.9440    0.9629    0.9533       700
               meadow     0.9564    0.9400    0.9481       700
   medium residential     0.8602    0.9314    0.8944       700
     mobile home park     0.9610    0.9500    0.9555       700
             mountain     0.9388    0.9429    0.9408       700
             overpass     0.9614    0.9614    0.9614       700
               palace     0.8455    0.8286    0.8369       700
          parking lot     0.9899    0.9757    0.9827       700
              railway     0.9407    0.9071    0.9236       700
      railway station     0.9104    0.9143    0.9123       700
 rectangular farmland     0.9572    0.9271    0.9419       700
                river     0.9281    0.9586    0.9431       700
           roundabout     0.9914    0.9871    0.9893       700
               runway     0.9669    0.9586    0.9627       700
              sea ice     0.9957    0.9943    0.9950       700
                 ship     0.9558    0.9886    0.9719       700
             snowberg     0.9886    0.9900    0.9893       700
   sparse residential     0.9238    0.9700    0.9463       700
              stadium     0.9716    0.9757    0.9736       700
         storage tank     0.9787    0.9829    0.9808       700
         tennis court     0.9326    0.9486    0.9405       700
              terrace     0.9372    0.9586    0.9477       700
thermal power station     0.9482    0.9671    0.9576       700
              wetland     0.9444    0.8986    0.9209       700

             accuracy                         0.9532     31500
            macro avg     0.9538    0.9532    0.9532     31500
         weighted avg     0.9538    0.9532    0.9532     31500
```
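
The F1 column in the report is the harmonic mean of the precision and recall columns. As a minimal sketch, the snippet below recomputes it for the three lowest-scoring classes, using values copied from the report; the helper name `f1_from_pr` is illustrative and not part of any tooling shipped with the model. Because the published precision and recall are already rounded to four decimals, the recomputed value can differ from the reported one in the last digit.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# (precision, recall, reported F1) copied from the classification report.
reported = {
    "church": (0.7949, 0.8971, 0.8430),
    "palace": (0.8455, 0.8286, 0.8369),
    "commercial area": (0.9231, 0.8229, 0.8701),
}

for name, (p, r, f1) in reported.items():
    print(f"{name:16s} recomputed={f1_from_pr(p, r):.4f} reported={f1:.4f}")
```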

---

## **Label Space: 45 Scene Categories**

The model predicts the presence of one or more of the following **45 scene categories**:

```
Class 0: "airplane"
Class 1: "airport"
Class 2: "baseball diamond"
Class 3: "basketball court"
Class 4: "beach"
Class 5: "bridge"
Class 6: "chaparral"
Class 7: "church"
Class 8: "circular farmland"
Class 9: "cloud"
Class 10: "commercial area"
Class 11: "dense residential"
Class 12: "desert"
Class 13: "forest"
Class 14: "freeway"
Class 15: "golf course"
Class 16: "ground track field"
Class 17: "harbor"
Class 18: "industrial area"
Class 19: "intersection"
Class 20: "island"
Class 21: "lake"
Class 22: "meadow"
Class 23: "medium residential"
Class 24: "mobile home park"
Class 25: "mountain"
Class 26: "overpass"
Class 27: "palace"
Class 28: "parking lot"
Class 29: "railway"
Class 30: "railway station"
Class 31: "rectangular farmland"
Class 32: "river"
Class 33: "roundabout"
Class 34: "runway"
Class 35: "sea ice"
Class 36: "ship"
Class 37: "snowberg"
Class 38: "sparse residential"
Class 39: "stadium"
Class 40: "storage tank"
Class 41: "tennis court"
Class 42: "terrace"
Class 43: "thermal power station"
Class 44: "wetland"
```
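
When the class names are needed in code, both lookup directions can be derived from one ordered list. This is a minimal sketch that assumes the index order listed above; the names `CLASS_NAMES`, `id2label`, and `label2id` are illustrative. A fine-tuned checkpoint usually carries the same mapping in `model.config.id2label`, but that has not been verified for this model.

```python
# Class names in index order, copied from the label list above.
CLASS_NAMES = [
    "airplane", "airport", "baseball diamond", "basketball court", "beach",
    "bridge", "chaparral", "church", "circular farmland", "cloud",
    "commercial area", "dense residential", "desert", "forest", "freeway",
    "golf course", "ground track field", "harbor", "industrial area",
    "intersection", "island", "lake", "meadow", "medium residential",
    "mobile home park", "mountain", "overpass", "palace", "parking lot",
    "railway", "railway station", "rectangular farmland", "river",
    "roundabout", "runway", "sea ice", "ship", "snowberg",
    "sparse residential", "stadium", "storage tank", "tennis court",
    "terrace", "thermal power station", "wetland",
]

# Index -> name and name -> index lookups built from the same list,
# so the two directions cannot drift apart.
id2label = dict(enumerate(CLASS_NAMES))
label2id = {name: idx for idx, name in id2label.items()}

print(len(CLASS_NAMES), id2label[7], label2id["wetland"])
```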

---

## **Install dependencies**

```bash
pip install -q transformers torch pillow gradio
```

---

## **Inference Code**

```python
import gradio as gr
from transformers import AutoImageProcessor, SiglipForImageClassification
from PIL import Image
import torch

# Load model and processor
model_name = "prithivMLmods/RESISC45-SigLIP2"  # Update to your actual Hugging Face model path
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

# Label map, keyed by class index as a string
id2label = {
    "0": "airplane", "1": "airport", "2": "baseball diamond", "3": "basketball court", "4": "beach",
    "5": "bridge", "6": "chaparral", "7": "church", "8": "circular farmland", "9": "cloud",
    "10": "commercial area", "11": "dense residential", "12": "desert", "13": "forest", "14": "freeway",
    "15": "golf course", "16": "ground track field", "17": "harbor", "18": "industrial area", "19": "intersection",
    "20": "island", "21": "lake", "22": "meadow", "23": "medium residential", "24": "mobile home park",
    "25": "mountain", "26": "overpass", "27": "palace", "28": "parking lot", "29": "railway",
    "30": "railway station", "31": "rectangular farmland", "32": "river", "33": "roundabout", "34": "runway",
    "35": "sea ice", "36": "ship", "37": "snowberg", "38": "sparse residential", "39": "stadium",
    "40": "storage tank", "41": "tennis court", "42": "terrace", "43": "thermal power station", "44": "wetland"
}

def classify_resisc_image(image):
    """Return {label: score} for every class whose sigmoid score meets the threshold."""
    if image is None:  # Gradio passes None when no image has been uploaded
        return {"None Detected": 0.0}

    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        # Independent per-class scores; drop the batch dimension (batch size is 1)
        probs = torch.sigmoid(logits).squeeze(0).tolist()

    threshold = 0.5
    predictions = {
        id2label[str(i)]: round(probs[i], 3)
        for i in range(len(probs)) if probs[i] >= threshold
    }

    return predictions or {"None Detected": 0.0}

# Gradio Interface
iface = gr.Interface(
    fn=classify_resisc_image,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(label="Predicted Scene Categories"),
    title="RESISC45-SigLIP2",
    description="Upload a satellite image to detect multiple land use and land cover categories (e.g., airport, forest, mountain)."
)

if __name__ == "__main__":
    iface.launch()
```
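
The classification report lists one support count per class and an overall accuracy, which suggests the evaluation treated each image as carrying a single label. Under that assumption, which the model card does not confirm, a softmax top-k readout may be a useful alternative to the sigmoid threshold used in the inference code. The sketch below works on a plain list of logits, such as `outputs.logits.squeeze(0).tolist()`; the helper names `softmax` and `top_k_labels` and the three-class toy label map are illustrative only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_labels(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[str(i)], round(probs[i], 3)) for i in ranked[:k]]


# Toy example with three classes instead of 45 (hypothetical logits).
toy_labels = {"0": "airplane", "1": "airport", "2": "runway"}
print(top_k_labels([2.0, 1.0, 0.1], toy_labels, k=2))
```

Since `gr.Label` accepts a dictionary of label to confidence, `dict(top_k_labels(...))` could be returned from the Gradio function in place of the thresholded dictionary.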

---

## **Intended Use**

The **RESISC45-SigLIP2** model is intended for multi-label classification of remote sensing imagery. Use cases include:

- **Remote Sensing Analysis** – Label elements in aerial and satellite images.
- **Urban Planning** – Identify urban structures and landscape features.
- **Geospatial Intelligence** – Aid automated image interpretation pipelines.
- **Environmental Monitoring** – Track natural landforms and how they change.