|
--- |
|
license: apache-2.0 |
|
language: en |
|
tags: |
|
- image-classification |
|
- vision-transformer |
|
- pytorch |
|
- sem |
|
- materials-science |
|
- nffa-di |
|
base_model: timm/vit_base_patch8_224.augreg2_in21k_ft_in1k |
|
pipeline_tag: image-classification |
|
--- |
|
|
|
# Vision Transformer for SEM Image Scale Classification |
|
|
|
This is a fine-tuned **Vision Transformer (ViT-B/8)** model for classifying the magnification scale of Scanning Electron Microscopy (SEM) images—**pico, nano, or micro**—directly from pixel data. |
|
|
|
The model addresses the challenge of unreliable scale information in large SEM archives, where scale metadata is often locked inside proprietary file formats or must be recovered through error-prone Optical Character Recognition (OCR).
|
|
|
This model was developed as part of the **NFFA-DI (Nano Foundries and Fine Analysis Digital Infrastructure)** project, funded by the European Union's NextGenerationEU program. |
|
|
|
## Model Description |
|
|
|
The model is based on the `timm/vit_base_patch8_224.augreg2_in21k_ft_in1k` checkpoint and has been fine-tuned for a 3-class image classification task on SEM images. The three scale categories are: |
|
|
|
1. **Pico**: Images where the pixel size is in the atomic or sub-nanometer scale (less than 1 nm). |
|
2. **Nano**: Images where the pixel size is in the nanometer range (1 nm to 1,000 nm, or 1 µm). |
|
3. **Micro**: Images where the pixel size is in the micrometer scale (greater than 1 µm). |
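
The thresholds above can be expressed as a simple mapping from physical pixel size to class label. A minimal sketch of that decision rule (the function name and the nanometer-based input are illustrative conventions, not part of the model):

```python
def scale_class(pixel_size_nm: float) -> str:
    """Map a physical pixel size in nanometers to its scale class."""
    if pixel_size_nm < 1.0:
        return "pico"   # atomic / sub-nanometer scale
    elif pixel_size_nm <= 1000.0:
        return "nano"   # 1 nm up to 1 µm
    else:
        return "micro"  # above 1 µm

print(scale_class(0.3), scale_class(250.0), scale_class(5000.0))
```

Note that the model itself never sees this metadata: it predicts the class from pixel content alone, which is exactly what makes it useful when the pixel size is unknown.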
|
|
|
## Model Performance |
|
|
|
The model achieves **91.7% accuracy** on a held-out test set. Notably, most misclassifications occur at the transitional nano-micro boundary, which suggests that the model is learning physically meaningful feature representations related to the magnification level.
|
|
|
## How to Use |
|
|
|
The following Python code shows how to load the model and its processor from the Hub and use it to classify a local SEM image. |
|
|
|
```python
from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import torch

# Load the model and image processor from the Hub
model_name = "t0m-R/vit-sem-scale-classifier"
image_processor = AutoImageProcessor.from_pretrained(model_name)
model = AutoModelForImageClassification.from_pretrained(model_name)
model.eval()  # inference mode

# Load and preprocess the image
image_path = "path/to/your/sem_image.png"
try:
    image = Image.open(image_path).convert("RGB")

    # Prepare the image for the model
    inputs = image_processor(images=image, return_tensors="pt")

    # Run inference
    with torch.no_grad():
        logits = model(**inputs).logits

    predicted_label_id = logits.argmax(-1).item()
    predicted_label = model.config.id2label[predicted_label_id]

    print(f"Predicted Scale: {predicted_label}")

except FileNotFoundError:
    print(f"Error: The file at {image_path} was not found.")
```
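
To report a confidence score alongside the predicted label, the raw logits can be converted into class probabilities with a softmax. A minimal pure-Python sketch of that step (the helper is illustrative; on the actual tensor, `torch.nn.functional.softmax(logits, dim=-1)` does the same):

```python
import math

def softmax(logits):
    """Convert a list of raw logits into probabilities that sum to 1."""
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for the three classes, e.g. [pico, nano, micro].
probs = softmax([0.2, 2.1, -1.3])
print([round(p, 3) for p in probs])  # the largest probability marks the predicted class
```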
|
## Training Data |
|
|
|
This model was fine-tuned on a custom dataset of 17,700 Scanning Electron Microscopy (SEM) images, curated specifically for this project. |
|
The dataset is balanced across the three scale classes, with an equal one-third split of 5,900 images each for the pico, nano, and micro categories.
|
|
|
The 17,700 images were then divided into:

- Training set: 12,000 images
- Validation set: 3,000 images
- Test set: 2,700 images
|
|
|
**Note on Availability**: This dataset is not publicly available at the moment but is planned for publication at a later stage. Please check this model card for future updates on data access. |