Font Classifier DINOv2 (Server-Side Preprocessing)

A fine-tuned DINOv2 model for font classification with built-in preprocessing.

🎯 Key Feature: No client-side preprocessing required!

Performance

  • Accuracy: ~86% on the test set
  • Preprocessing: Automatic server-side pad-to-square + normalization

Usage

Simple API Usage (Recommended)

Clients can send raw images directly to inference endpoints:

import requests
import base64

# Load your image
with open("test_image.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode()

# Send to inference endpoint
response = requests.post(
    "https://your-endpoint.com",
    headers={"Authorization": "Bearer YOUR_TOKEN"},
    json={"inputs": image_data}
)

results = response.json()
print(f"Predicted font: {results[0]['label']} ({results[0]['score']:.2%})")

Standard HuggingFace Usage

from transformers import pipeline

# The model automatically handles preprocessing
classifier = pipeline("image-classification", model="dchen0/font-classifier-v4")
results = classifier("your_image.png")
print(f"Predicted font: {results[0]['label']}")

Direct Model Usage

from PIL import Image
import torch
from transformers import AutoImageProcessor
from font_classifier_with_preprocessing import FontClassifierWithPreprocessing

# Load model and processor
model = FontClassifierWithPreprocessing.from_pretrained("dchen0/font-classifier-v4")
processor = AutoImageProcessor.from_pretrained("dchen0/font-classifier-v4")

# Process image (the model handles pad-to-square and normalization automatically)
image = Image.open("test.png")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Map the top logit back to a font family name
predicted_id = outputs.logits.argmax(-1).item()
print(f"Predicted font: {model.config.id2label[predicted_id]}")

Model Architecture

  • Base Model: facebook/dinov2-base-imagenet1k-1-layer
  • Fine-tuning: LoRA on the Google Fonts dataset (see the configuration sketch after this list)
  • Labels: 394 font families
  • Preprocessing: Built-in pad-to-square + ImageNet normalization
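
The snippet below is a hedged sketch of what such a LoRA setup could look like using the peft library. The rank, alpha, target modules, and head handling are assumptions for illustration, not values taken from the actual training run.

from transformers import AutoModelForImageClassification
from peft import LoraConfig, get_peft_model

# Hypothetical fine-tuning setup; the hyperparameters below are assumptions.
base = AutoModelForImageClassification.from_pretrained(
    "facebook/dinov2-base-imagenet1k-1-layer",
    num_labels=394,                     # 394 font families
    ignore_mismatched_sizes=True,       # swap the ImageNet-1k head for a new classifier
)

lora_config = LoraConfig(
    r=16,                               # assumed LoRA rank
    lora_alpha=32,                      # assumed scaling factor
    target_modules=["query", "value"],  # assumed attention projections to adapt
    modules_to_save=["classifier"],     # train the new classification head in full
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()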

Server-Side Preprocessing

This model automatically applies the following preprocessing in its forward pass:

  1. Pad to square preserving aspect ratio
  2. Resize to 224×224
  3. Normalize with ImageNet statistics

No client-side preprocessing required - just send raw images!
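
For reference, the server-side steps above are roughly equivalent to the following client-side code. This is only an illustrative sketch: the padding color, placement, and interpolation mode used by the model are assumptions.

from PIL import Image
from torchvision import transforms

def pad_to_square(image, fill=(255, 255, 255)):
    # Pad the shorter side so the image becomes square without distorting glyphs.
    w, h = image.size
    side = max(w, h)
    canvas = Image.new("RGB", (side, side), fill)            # assumed white padding
    canvas.paste(image, ((side - w) // 2, (side - h) // 2))  # assumed centered placement
    return canvas

to_model_input = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet mean
                         std=[0.229, 0.224, 0.225]),   # ImageNet std
])

pixel_values = to_model_input(pad_to_square(Image.open("test.png").convert("RGB"))).unsqueeze(0)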

Deployment

HuggingFace Inference Endpoints

  1. Deploy this model to an Inference Endpoint
  2. Send raw images directly - preprocessing happens automatically (see the handler sketch after this list)
  3. Achieve ~86% accuracy out of the box
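
The base64 client example above implies a custom request handler on the endpoint. Below is a minimal sketch of such a handler.py; it assumes payloads of the form {"inputs": "<base64 image>"} and that the custom model class can be loaded with trust_remote_code=True, neither of which is guaranteed by this card.

# handler.py - hypothetical Inference Endpoints handler sketch
import base64
import io

from PIL import Image
from transformers import pipeline


class EndpointHandler:
    def __init__(self, path=""):
        # Load the model shipped with the endpoint; trust_remote_code is an assumption,
        # needed only if the custom preprocessing class is registered in the repo config.
        self.classifier = pipeline("image-classification", model=path, trust_remote_code=True)

    def __call__(self, data):
        # Decode the base64 payload sent by the client example above
        image_bytes = base64.b64decode(data["inputs"])
        image = Image.open(io.BytesIO(image_bytes)).convert("RGB")
        return self.classifier(image)  # [{"label": ..., "score": ...}, ...]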

Custom Deployment

The model includes preprocessing in its forward pass, so any deployment stack that runs the PyTorch model (TorchServe, Triton, custom serving code, etc.) will apply the correct preprocessing automatically.

Files

  • font_classifier_with_preprocessing.py: Custom model class with built-in preprocessing
  • Standard HuggingFace model files

Technical Details

The model inherits from Dinov2ForImageClassification but overrides the forward pass to include:

def forward(self, pixel_values=None, labels=None, **kwargs):
    # Automatic preprocessing happens here
    processed_pixel_values = self.preprocess_images(pixel_values)
    return super().forward(pixel_values=processed_pixel_values, labels=labels, **kwargs)

This ensures that whether clients send raw images or pre-processed tensors, the model receives correctly formatted input.
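
As an illustration, a tensor-level preprocess_images hook could mirror the three steps listed under Server-Side Preprocessing, along the lines of the sketch below. This is not the actual implementation in font_classifier_with_preprocessing.py; the padding value, interpolation mode, and input-range handling are assumptions.

import torch
import torch.nn.functional as F

# ImageNet normalization constants, shaped for broadcasting over (N, C, H, W)
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)

def preprocess_images(pixel_values):
    # Assumes raw pixel values in [0, 1]; the real hook may also detect
    # already-normalized 224x224 input and pass it through unchanged.
    _, _, h, w = pixel_values.shape
    side = max(h, w)
    pad_h, pad_w = side - h, side - w
    # Pad to square on an assumed white background, keeping the image centered
    x = F.pad(pixel_values,
              (pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2),
              value=1.0)
    # Resize to 224x224 and apply ImageNet normalization
    x = F.interpolate(x, size=(224, 224), mode="bilinear", align_corners=False)
    return (x - IMAGENET_MEAN.to(x.device)) / IMAGENET_STD.to(x.device)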
