Commit e943ee5 by prithivMLmods (verified) · 1 Parent(s): 7c2176e

Update README.md

Files changed (1): README.md (+94 -0)
@@ -2,8 +2,23 @@
  license: apache-2.0
  datasets:
  - AadityaJain/Fromula_text_classification
  ---

  ```py
  Classification Report:
  precision recall f1-score support
@@ -17,3 +32,82 @@ weighted avg 0.9991 0.9991 0.9991 11832
  ```

  ![download.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/OdNUMSb_utc_RBWd3Gjfq.png)
  license: apache-2.0
  datasets:
  - AadityaJain/Fromula_text_classification
+ language:
+ - en
+ base_model:
+ - google/siglip2-base-patch16-224
+ pipeline_tag: image-classification
+ library_name: transformers
+ tags:
+ - Formula-Text-Detection
+ - SigLIP2
  ---

+ ![3.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/lg90wKzVcHjnTXs8_EGCR.png)
+
+ # **Formula-Text-Detection**
+
+ > **Formula-Text-Detection** is a vision-language encoder model fine-tuned from **google/siglip2-base-patch16-224** for **binary image classification**. It uses the **SiglipForImageClassification** architecture to distinguish **mathematical formulas** from **natural text** in document or image regions.
+
  ```py
  Classification Report:
  precision recall f1-score support

  ```

  ![download.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/OdNUMSb_utc_RBWd3Gjfq.png)
+
+ ---
+
+ ## **Label Space: 2 Classes**
+
+ The model classifies each input image into one of the following categories:
+
+ ```
+ Class 0: "formula"
+ Class 1: "text"
+ ```
+
+ ---
+
+ ## **Install Dependencies**
+
+ ```bash
+ pip install -q transformers torch pillow gradio
+ ```
+
+ ---
+
+ ## **Inference Code**
+
+ ```python
+ import gradio as gr
+ from transformers import AutoImageProcessor, SiglipForImageClassification
+ from PIL import Image
+ import torch
+
+ # Load model and processor
+ model_name = "prithivMLmods/Formula-Text-Detection"  # Replace with your model path if different
+ model = SiglipForImageClassification.from_pretrained(model_name)
+ processor = AutoImageProcessor.from_pretrained(model_name)
+
+ # Label mapping (class index -> label name)
+ id2label = {
+     "0": "formula",
+     "1": "text"
+ }
+
+ def classify_formula_or_text(image):
+     # Gradio passes the upload as a NumPy array; convert it to an RGB PIL image
+     image = Image.fromarray(image).convert("RGB")
+     inputs = processor(images=image, return_tensors="pt")
+
+     # Forward pass without gradient tracking, then softmax over the two classes
+     with torch.no_grad():
+         outputs = model(**inputs)
+         logits = outputs.logits
+         probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()
+
+     # Map each class label to its rounded probability
+     prediction = {
+         id2label[str(i)]: round(probs[i], 3) for i in range(len(probs))
+     }
+
+     return prediction
+
+ # Gradio Interface
+ iface = gr.Interface(
+     fn=classify_formula_or_text,
+     inputs=gr.Image(type="numpy"),
+     outputs=gr.Label(num_top_classes=2, label="Formula or Text"),
+     title="Formula-Text-Detection",
+     description="Upload an image region to classify whether it contains a mathematical formula or natural text."
+ )
+
+ if __name__ == "__main__":
+     iface.launch()
+ ```
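
Reviewer note (not part of the commit): the softmax-and-label step inside `classify_formula_or_text` can be isolated for use without the Gradio UI. The following is a minimal sketch in pure Python, assuming the two raw logits are already available as a list of floats (for example from `outputs.logits.squeeze().tolist()`). The names `ID2LABEL`, `scores_from_logits`, and `top_label` are hypothetical helpers for illustration and do not exist in the model repository.

```python
import math

# Label mapping taken from the README: class 0 is "formula", class 1 is "text".
ID2LABEL = {0: "formula", 1: "text"}


def scores_from_logits(logits):
    """Convert raw logits into a {label: probability} dict via softmax."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return {ID2LABEL[i]: e / total for i, e in enumerate(exps)}


def top_label(logits):
    """Return the label with the highest softmax probability."""
    scores = scores_from_logits(logits)
    return max(scores, key=scores.get)
```

Keeping this step separate from model loading makes it possible to unit-test the label mapping without downloading weights.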
+
+ ---
+
+ ## **Intended Use**
+
+ **Formula-Text-Detection** can be used in:
+
+ - **OCR Preprocessing** – Improve document OCR accuracy by separating formulas from text.
+ - **Scientific Document Analysis** – Automatically detect mathematical content.
+ - **Educational Platforms** – Classify and annotate scanned materials.
+ - **Layout Understanding** – Help AI systems interpret mixed-content documents.
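
Reviewer note (not part of the commit): the OCR preprocessing use case listed above amounts to routing each image region to a different downstream recognizer based on the classifier output. A rough sketch under stated assumptions: each region already has a score dict in the shape returned by `classify_formula_or_text`, and the function name `route_regions`, the region IDs, and the `formula_threshold` default of 0.5 are all hypothetical choices for illustration.

```python
def route_regions(region_scores, formula_threshold=0.5):
    """Split regions into formula and text queues by their 'formula' probability.

    region_scores: iterable of (region_id, {"formula": p, "text": 1 - p}) pairs.
    Returns (formula_region_ids, text_region_ids), preserving input order.
    """
    formula_regions, text_regions = [], []
    for region_id, scores in region_scores:
        if scores["formula"] >= formula_threshold:
            formula_regions.append(region_id)
        else:
            text_regions.append(region_id)
    return formula_regions, text_regions


# Example with made-up scores for two regions.
example_scores = [
    ("r1", {"formula": 0.98, "text": 0.02}),
    ("r2", {"formula": 0.10, "text": 0.90}),
]
formula_ids, text_ids = route_regions(example_scores)
```

Regions in the formula queue could then go to a math-aware recognizer while the rest go to a standard OCR engine; raising the threshold trades formula recall for fewer text regions misrouted as formulas.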