Upload ViT-B/8 SEM scale classification model

t0m-R committed · Commit a20e54a · 1 Parent(s): ac2dc19

Files changed:
- README.md +81 -0
- config.json +47 -0
- model.safetensors +3 -0
README.md
ADDED
@@ -0,0 +1,81 @@
---
license: apache-2.0
language: en
tags:
- image-classification
- vision-transformer
- pytorch
- sem
- materials-science
- nffa-di
base_model: timm/vit_base_patch8_224.augreg2_in21k_ft_in1k
pipeline_tag: image-classification
---

# Vision Transformer for SEM Image Scale Classification

This is a fine-tuned **Vision Transformer (ViT-B/8)** model for classifying the magnification scale of Scanning Electron Microscopy (SEM) images (**pico, nano, or micro**) directly from pixel data.

The model addresses the problem of unreliable scale metadata in large SEM archives, where extracting the scale is often hindered by proprietary file formats or error-prone Optical Character Recognition (OCR).

This model was developed as part of the **NFFA-DI (Nano Foundries and Fine Analysis Digital Infrastructure)** project, funded by the European Union's NextGenerationEU program.

## Model Description

The model is based on the `timm/vit_base_patch8_224.augreg2_in21k_ft_in1k` checkpoint and has been fine-tuned for a 3-class image classification task on SEM images. The three scale categories, summarized in the sketch after this list, are:

1. **Pico**: images whose pixel size is at the atomic or sub-nanometer scale (less than 1 nm).
2. **Nano**: images whose pixel size is in the nanometer range (1 nm to 1,000 nm, i.e. up to 1 µm).
3. **Micro**: images whose pixel size is at the micrometer scale (greater than 1 µm).
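
As a quick reference, here are the same thresholds expressed as code (a hypothetical helper for dataset curation, not part of the model itself; only the boundaries above are assumed):

```python
def scale_class(pixel_size_nm: float) -> str:
    """Map a pixel size in nanometers to the scale classes above."""
    if pixel_size_nm < 1.0:        # below 1 nm: atomic / sub-nanometer
        return "pico"
    if pixel_size_nm <= 1000.0:    # 1 nm up to and including 1 µm
        return "nano"
    return "micro"                 # above 1 µm
```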

## Model Performance

The model achieves **91.7% accuracy** on a held-out test set. Notably, most misclassifications occur at the transitional nano-micro boundary, which suggests the model is learning physically meaningful feature representations tied to magnification level.

## How to Use

The following Python code loads the model and its image processor from the Hub and uses them to classify a local SEM image.

```python
from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import torch

# Load the model and image processor from the Hub
model_name = "t0m-R/vit-sem-scale-classifier"
image_processor = AutoImageProcessor.from_pretrained(model_name)
model = AutoModelForImageClassification.from_pretrained(model_name)

# Load and preprocess the image
image_path = "path/to/your/sem_image.png"
try:
    image = Image.open(image_path).convert("RGB")

    # Prepare the image for the model
    inputs = image_processor(images=image, return_tensors="pt")

    # Run inference
    with torch.no_grad():
        logits = model(**inputs).logits

    predicted_label_id = logits.argmax(-1).item()
    predicted_label = model.config.id2label[predicted_label_id]

    print(f"Predicted Scale: {predicted_label}")
except FileNotFoundError:
    print(f"Error: The file at {image_path} was not found.")
```
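
If a confidence score is also needed, the logits can be converted to class probabilities with a softmax. A small extension of the snippet above, reusing the same `model` and `logits`:

```python
# Convert logits to per-class probabilities (reuses `logits` from above)
probs = torch.softmax(logits, dim=-1)[0]
for label_id, p in enumerate(probs.tolist()):
    print(f"{model.config.id2label[label_id]}: {p:.3f}")
```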

## Training Data

This model was fine-tuned on a custom dataset of 17,700 Scanning Electron Microscopy (SEM) images, curated specifically for this project.
The images were selected to create a balanced dataset for the scale classification task: an equal one-third split across the pico, nano, and micro scales (5,900 images per class).

The 17,700 images were then divided into the following splits (a reproduction sketch follows the list):

- Training set: 12,000 images
- Validation set: 3,000 images
- Test set: 2,700 images
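
For illustration only, a stratified split with these sizes could be reproduced along the lines below (a hypothetical sketch with dummy IDs; the actual split procedure is not published):

```python
from sklearn.model_selection import train_test_split

# Dummy stand-ins: 17,700 image IDs, 5,900 per class, as described above.
ids = list(range(17_700))
labels = ["pico"] * 5_900 + ["nano"] * 5_900 + ["micro"] * 5_900

# First carve out the 12,000-image training set, then split the
# remaining 5,700 into validation (3,000) and test (2,700).
train_ids, rest_ids, _, rest_y = train_test_split(
    ids, labels, train_size=12_000, stratify=labels, random_state=0)
val_ids, test_ids, _, _ = train_test_split(
    rest_ids, rest_y, train_size=3_000, stratify=rest_y, random_state=0)
print(len(train_ids), len(val_ids), len(test_ids))  # 12000 3000 2700
```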

**Note on Availability**: This dataset is not publicly available at the moment but is planned for publication at a later stage. Please check this model card for future updates on data access.
config.json
ADDED
@@ -0,0 +1,47 @@
{
  "architecture": "vit_base_patch8_224",
  "architectures": [
    "TimmWrapperForImageClassification"
  ],
  "do_pooling": true,
  "dtype": "float32",
  "global_pool": "token",
  "initializer_range": 0.02,
  "label_names": [
    "pico",
    "nano",
    "micro"
  ],
  "model_args": null,
  "model_type": "timm_wrapper",
  "num_classes": 3,
  "num_features": 768,
  "pretrained_cfg": {
    "classifier": "head",
    "crop_mode": "center",
    "crop_pct": 0.9,
    "custom_load": false,
    "first_conv": "patch_embed.proj",
    "fixed_input_size": true,
    "input_size": [3, 224, 224],
    "interpolation": "bicubic",
    "mean": [0.5, 0.5, 0.5],
    "pool_size": null,
    "std": [0.5, 0.5, 0.5],
    "tag": "augreg2_in21k_ft_in1k"
  },
  "problem_type": "single_label_classification",
  "transformers_version": "4.56.0"
}
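
The `pretrained_cfg` block above implies the eval-time preprocessing. A minimal torchvision sketch of an equivalent pipeline (assuming timm's usual crop_pct handling; `AutoImageProcessor` applies this for you):

```python
from torchvision import transforms

# Preprocessing implied by pretrained_cfg: resize so the short side is
# 224 / crop_pct = 248 px (bicubic), center-crop to 224x224, then
# normalize each channel with mean 0.5 / std 0.5.
preprocess = transforms.Compose([
    transforms.Resize(int(224 / 0.9),
                      interpolation=transforms.InterpolationMode.BICUBIC),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])
```
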
model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cae76191450cf0c7b6c4f177e44433046a2dc4a69fd1eecefb28d70a3dd77826
size 343254828
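
Since this is a Git LFS pointer, the actual weights are fetched separately and identified by the SHA-256 above. A standard-library sketch for verifying a downloaded copy against the pointer (a hypothetical integrity check, not part of the repo):

```python
import hashlib

# Hash a downloaded model.safetensors in 1 MiB chunks and compare it
# with the oid recorded in the LFS pointer.
h = hashlib.sha256()
with open("model.safetensors", "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        h.update(chunk)
print(h.hexdigest() == "cae76191450cf0c7b6c4f177e44433046a2dc4a69fd1eecefb28d70a3dd77826")
```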