Vision Transformer for SEM Image Scale Classification

This is a fine-tuned Vision Transformer (ViT-B/8) model for classifying the magnification scale of Scanning Electron Microscopy (SEM) images—pico, nano, or micro—directly from pixel data.

The model addresses the challenge of unreliable scale information in large SEM archives, where scale metadata is often locked in proprietary file formats or must be recovered from scale bars via error-prone Optical Character Recognition (OCR).

This model was developed as part of the NFFA-DI (Nano Foundries and Fine Analysis Digital Infrastructure) project, funded by the European Union's NextGenerationEU program.

Model Description

The model is based on the timm/vit_base_patch8_224.augreg2_in21k_ft_in1k checkpoint (a ViT-B/8 with roughly 85.8M parameters) and has been fine-tuned for a 3-class image classification task on SEM images. The three scale categories, summarized in the sketch after this list, are:

  1. Pico: Images where the pixel size is at the atomic or sub-nanometer scale (less than 1 nm).
  2. Nano: Images where the pixel size is in the nanometer range (from 1 nm up to 1 µm).
  3. Micro: Images where the pixel size is at the micrometer scale (greater than 1 µm).
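
To make the labeling rule concrete, here is a minimal sketch of the threshold logic. The function name scale_class and the handling of exact boundary values are illustrative assumptions, not the project's actual preprocessing code.

def scale_class(pixel_size_nm: float) -> str:
    """Map a pixel size in nanometers to its scale class (illustrative only)."""
    if pixel_size_nm < 1.0:        # atomic / sub-nanometer scale
        return "pico"
    elif pixel_size_nm <= 1000.0:  # 1 nm up to 1 µm
        return "nano"
    else:                          # larger than 1 µm
        return "micro"

print(scale_class(0.5))     # pico
print(scale_class(250.0))   # nano
print(scale_class(4000.0))  # micro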

Model Performance

The model achieves 91.7% accuracy on a held-out test set. Notably, most misclassifications occur at the transitional nano-micro boundary, which indicates that the model is learning physically meaningful feature representations related to the magnification level.
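
If you evaluate the model on your own labeled data, a confusion matrix makes boundary errors like these easy to spot. The snippet below is a generic sketch using scikit-learn with toy placeholder labels, not the project's evaluation code.

from sklearn.metrics import accuracy_score, confusion_matrix

labels = ["pico", "nano", "micro"]
# Toy placeholders; substitute real test-set labels and model predictions.
y_true = ["pico", "nano", "nano", "micro", "micro", "nano"]
y_pred = ["pico", "nano", "micro", "micro", "nano", "nano"]

print(f"Accuracy: {accuracy_score(y_true, y_pred):.3f}")
# Rows are true classes, columns are predicted classes, in `labels` order.
print(confusion_matrix(y_true, y_pred, labels=labels))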

How to Use

The following Python code shows how to load the model and its processor from the Hub and use it to classify a local SEM image.

from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import torch

# Load the model and image processor from the Hub
model_name = "t0m-R/vit-sem-scale-classifier"
image_processor = AutoImageProcessor.from_pretrained(model_name)
model = AutoModelForImageClassification.from_pretrained(model_name)
model.eval()  # put the model in inference mode (disables dropout)

# Load and preprocess the image
image_path = "path/to/your/sem_image.png" 
try:
    image = Image.open(image_path).convert("RGB")

    # Prepare the image for the model
    inputs = image_processor(images=image, return_tensors="pt")

    # Run inference
    with torch.no_grad():
        logits = model(**inputs).logits
        predicted_label_id = logits.argmax(-1).item()
        predicted_label = model.config.id2label[predicted_label_id]

    print(f"Predicted Scale: {predicted_label}")

except FileNotFoundError:
    print(f"Error: The file at {image_path} was not found.")

Training Data

This model was fine-tuned on a custom dataset of 17,700 Scanning Electron Microscopy (SEM) images, curated specifically for this project. The dataset is balanced across the three scale classes, with 5,900 images each for the pico, nano, and micro scales.

The 17,700 images were then divided into the following splits (a sketch of how such a balanced split could be reproduced follows the list):

Training set: 12,000 images
Validation set: 3,000 images
Test set: 2,700 images
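
As a rough illustration, the sketch below shows how such a class-balanced split could be reproduced. The images_by_class mapping and the per-class split counts (4,000 / 1,000 / 900, times three classes) are assumptions inferred from the numbers above, not the project's actual split script.

import random

# Hypothetical mapping from each class to its 5,900 image paths.
images_by_class = {
    cls: [f"{cls}_{i:04d}.png" for i in range(5900)]
    for cls in ("pico", "nano", "micro")
}

random.seed(0)
per_class = {"train": 4000, "val": 1000, "test": 900}  # x3 classes = 12,000 / 3,000 / 2,700
splits = {name: [] for name in per_class}

for cls, paths in images_by_class.items():
    shuffled = paths.copy()
    random.shuffle(shuffled)
    start = 0
    for name, n in per_class.items():
        splits[name].extend((path, cls) for path in shuffled[start:start + n])
        start += n

print({name: len(items) for name, items in splits.items()})
# {'train': 12000, 'val': 3000, 'test': 2700}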

Note on Availability: This dataset is not publicly available at the moment but is planned for publication at a later stage. Please check this model card for future updates on data access.
