Upload ViT-B/8 SEM scale classification model

t0m-R committed · Commit a20e54a · 1 Parent(s): ac2dc19

Files changed:
- README.md +81 -0
- config.json +47 -0
- model.safetensors +3 -0
README.md
ADDED
@@ -0,0 +1,81 @@
---
license: apache-2.0
language: en
tags:
- image-classification
- vision-transformer
- pytorch
- sem
- materials-science
- nffa-di
base_model: timm/vit_base_patch8_224.augreg2_in21k_ft_in1k
pipeline_tag: image-classification
---

# Vision Transformer for SEM Image Scale Classification

This is a fine-tuned **Vision Transformer (ViT-B/8)** model for classifying the magnification scale of Scanning Electron Microscopy (SEM) images (**pico, nano, or micro**) directly from pixel data.

The model addresses the problem of unreliable scale metadata in large SEM archives, where extracting the scale is often hindered by proprietary file formats or error-prone Optical Character Recognition (OCR).

This model was developed as part of the **NFFA-DI (Nano Foundries and Fine Analysis Digital Infrastructure)** project, funded by the European Union's NextGenerationEU program.

## Model Description

The model is based on the `timm/vit_base_patch8_224.augreg2_in21k_ft_in1k` checkpoint and has been fine-tuned for a 3-class image classification task on SEM images. The three scale categories, summarized in the sketch after this list, are:

1. **Pico**: images whose pixel size is at the atomic or sub-nanometer scale (less than 1 nm).
2. **Nano**: images whose pixel size is in the nanometer range (1 nm to 1,000 nm, i.e. up to 1 µm).
3. **Micro**: images whose pixel size is at the micrometer scale (greater than 1 µm).
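
As a quick reference, here are the same thresholds expressed as code (a hypothetical helper for dataset curation, not part of the model itself; only the boundaries above are assumed):

```python
def scale_class(pixel_size_nm: float) -> str:
    """Map a pixel size in nanometers to the scale classes above."""
    if pixel_size_nm < 1.0:        # below 1 nm: atomic / sub-nanometer
        return "pico"
    if pixel_size_nm <= 1000.0:    # 1 nm up to and including 1 µm
        return "nano"
    return "micro"                 # above 1 µm
```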

## Model Performance

The model achieves **91.7% accuracy** on a held-out test set. Notably, most misclassifications occur at the transitional nano-micro boundary, which suggests the model is learning physically meaningful feature representations tied to magnification level.

## How to Use

The following Python code loads the model and its image processor from the Hub and uses them to classify a local SEM image.

```python
from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import torch

# Load the model and image processor from the Hub
model_name = "t0m-R/vit-sem-scale-classifier"
image_processor = AutoImageProcessor.from_pretrained(model_name)
model = AutoModelForImageClassification.from_pretrained(model_name)

# Load and preprocess the image
image_path = "path/to/your/sem_image.png"
try:
    image = Image.open(image_path).convert("RGB")

    # Prepare the image for the model
    inputs = image_processor(images=image, return_tensors="pt")

    # Run inference
    with torch.no_grad():
        logits = model(**inputs).logits

    predicted_label_id = logits.argmax(-1).item()
    predicted_label = model.config.id2label[predicted_label_id]

    print(f"Predicted Scale: {predicted_label}")
except FileNotFoundError:
    print(f"Error: The file at {image_path} was not found.")
```
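
If a confidence score is also needed, the logits can be converted to class probabilities with a softmax. A small extension of the snippet above, reusing the same `model` and `logits`:

```python
# Convert logits to per-class probabilities (reuses `logits` from above)
probs = torch.softmax(logits, dim=-1)[0]
for label_id, p in enumerate(probs.tolist()):
    print(f"{model.config.id2label[label_id]}: {p:.3f}")
```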

## Training Data

This model was fine-tuned on a custom dataset of 17,700 Scanning Electron Microscopy (SEM) images, curated specifically for this project.
The images were selected to create a balanced dataset for the scale classification task: an equal one-third split across the pico, nano, and micro scales (5,900 images per class).

The 17,700 images were then divided into the following splits (a reproduction sketch follows the list):

- Training set: 12,000 images
- Validation set: 3,000 images
- Test set: 2,700 images
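
For illustration only, a stratified split with these sizes could be reproduced along the lines below (a hypothetical sketch with dummy IDs; the actual split procedure is not published):

```python
from sklearn.model_selection import train_test_split

# Dummy stand-ins: 17,700 image IDs, 5,900 per class, as described above.
ids = list(range(17_700))
labels = ["pico"] * 5_900 + ["nano"] * 5_900 + ["micro"] * 5_900

# First carve out the 12,000-image training set, then split the
# remaining 5,700 into validation (3,000) and test (2,700).
train_ids, rest_ids, _, rest_y = train_test_split(
    ids, labels, train_size=12_000, stratify=labels, random_state=0)
val_ids, test_ids, _, _ = train_test_split(
    rest_ids, rest_y, train_size=3_000, stratify=rest_y, random_state=0)
print(len(train_ids), len(val_ids), len(test_ids))  # 12000 3000 2700
```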

**Note on Availability**: This dataset is not publicly available at the moment but is planned for publication at a later stage. Please check this model card for future updates on data access.
config.json
ADDED
@@ -0,0 +1,47 @@
{
  "architecture": "vit_base_patch8_224",
  "architectures": [
    "TimmWrapperForImageClassification"
  ],
  "do_pooling": true,
  "dtype": "float32",
  "global_pool": "token",
  "initializer_range": 0.02,
  "label_names": [
    "pico",
    "nano",
    "micro"
  ],
  "model_args": null,
  "model_type": "timm_wrapper",
  "num_classes": 3,
  "num_features": 768,
  "pretrained_cfg": {
    "classifier": "head",
    "crop_mode": "center",
    "crop_pct": 0.9,
    "custom_load": false,
    "first_conv": "patch_embed.proj",
    "fixed_input_size": true,
    "input_size": [3, 224, 224],
    "interpolation": "bicubic",
    "mean": [0.5, 0.5, 0.5],
    "pool_size": null,
    "std": [0.5, 0.5, 0.5],
    "tag": "augreg2_in21k_ft_in1k"
  },
  "problem_type": "single_label_classification",
  "transformers_version": "4.56.0"
}
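
The `pretrained_cfg` block above implies the eval-time preprocessing. A minimal torchvision sketch of an equivalent pipeline (assuming timm's usual crop_pct handling; `AutoImageProcessor` applies this for you):

```python
from torchvision import transforms

# Preprocessing implied by pretrained_cfg: resize so the short side is
# 224 / crop_pct = 248 px (bicubic), center-crop to 224x224, then
# normalize each channel with mean 0.5 / std 0.5.
preprocess = transforms.Compose([
    transforms.Resize(int(224 / 0.9),
                      interpolation=transforms.InterpolationMode.BICUBIC),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])
```
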
model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cae76191450cf0c7b6c4f177e44433046a2dc4a69fd1eecefb28d70a3dd77826
size 343254828
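
Since this is a Git LFS pointer, the actual weights are fetched separately and identified by the SHA-256 above. A standard-library sketch for verifying a downloaded copy against the pointer (a hypothetical integrity check, not part of the repo):

```python
import hashlib

# Hash a downloaded model.safetensors in 1 MiB chunks and compare it
# with the oid recorded in the LFS pointer.
h = hashlib.sha256()
with open("model.safetensors", "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        h.update(chunk)
print(h.hexdigest() == "cae76191450cf0c7b6c4f177e44433046a2dc4a69fd1eecefb28d70a3dd77826")
```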