---
license: apache-2.0
pipeline_tag: image-classification
tags:
- multi-label
- anime
- danbooru
---

Model · Demo · Quickstart · Quick comparisons


# PixAI Tagger v0.9

A practical anime **multi-label tagger**. Not trying to win benchmarks; trying to be useful. **High recall**, updated **character coverage**, trained on a fresh Danbooru snapshot (2025-01). We'll keep shipping: **v1.0** (with updated tags) is next.

> TL;DR
>
> - ~**13.5k** Danbooru-style tags (**general**, **character**, **copyright**)
> - Headline: strong **character** performance; recall-leaning defaults
> - Built for search, dataset curation, caption assistance, and text-to-image conditioning

---

## What it is (in one breath)

`pixai-tagger-v0.9` is a multi-label image classifier for anime images. It predicts Danbooru-style tags and aims to **find more of the right stuff** (recall) so you can filter later. We continued training the **classification head** of EVA02 (from WD v3) on a newer dataset, and used **embedding-space MixUp** to help calibration.

- **Last trained:** 2025-04
- **Data snapshot:** Danbooru IDs 1–8,600,750 (2025-01)
- **Finetuned from:** `SmilingWolf/wd-eva02-large-tagger-v3` (encoder frozen)
- **License (weights):** Apache 2.0 *(Note: Danbooru content has its own licenses.)*

---

## Why you might care

- **Newer data.** Catches more recent IPs/characters.
- **Recall-first defaults.** Good for search and curation; dial thresholds up for precision.
- **Character focus.** We spent time here; it shows up in evals.
- **Simple to run.** Works as an endpoint or locally; small set of knobs.
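The embedding-space MixUp mentioned above can be sketched roughly as follows. This is an illustration, not the training code: instead of mixing pixels, pairs of precomputed encoder embeddings and their multi-hot label vectors are blended before the classification head. The function name and tensor shapes are ours; the Beta parameter α=200 follows the training notes (with α that large, the mixing coefficient concentrates near 0.5, so pairs are mixed almost evenly).

```python
# Illustrative sketch of embedding-space MixUp (not the actual training code).
import numpy as np

def mixup_embeddings(emb, labels, alpha=200.0, rng=None):
    """Mix a batch of embeddings (N, D) and multi-hot labels (N, T)."""
    if rng is None:
        rng = np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)       # mixing coefficient, concentrated near 0.5
    perm = rng.permutation(len(emb))   # random pairing within the batch
    mixed_emb = lam * emb + (1 - lam) * emb[perm]
    mixed_labels = lam * labels + (1 - lam) * labels[perm]
    return mixed_emb, mixed_labels

# Tiny fake batch: 4 one-hot "embeddings" (D=4) and 2-tag multi-hot labels.
emb = np.eye(4, dtype=np.float32)
labels = np.array([[1, 0], [0, 1], [1, 1], [0, 0]], dtype=np.float32)
m_emb, m_lab = mixup_embeddings(emb, labels)
```

The mixed labels become soft targets in [0, 1], which is where the calibration benefit comes from.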
---

## Quickstart

**Recommended defaults (balanced):**

- `top_k = 128`
- `threshold_general = 0.30`
- `threshold_character = 0.75`

**Coverage preset (recall-heavier):** `threshold_general = 0.10` (expect more false positives)

### 1) Inference Endpoint

Deploy as an HF Inference Endpoint and test with the following command:

```bash
# Replace with your own endpoint URL
curl "https://YOUR_ENDPOINT_URL.huggingface.cloud" \
  -X POST \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": {"url": "https://your.cdn/image.jpg"},
    "parameters": {
      "top_k": 128,
      "threshold_general": 0.10,
      "threshold_character": 0.75
    }
  }'
```

### 2) Python (InferenceClient)

```python
from huggingface_hub import InferenceClient

client = InferenceClient("https://YOUR_ENDPOINT_URL.huggingface.cloud")

out = client.post(json={
    "inputs": {"url": "https://your.cdn/image.jpg"},
    "parameters": {"top_k": 128, "threshold_general": 0.10, "threshold_character": 0.75}
})
# out: [{"tag": "1girl", "score": 0.97, "group": "general"},
#       {"tag": "mika_(blue_archive)", "score": 0.92, "group": "character"}, ...]
```

### 3) Local deployment

- **Minimal script:** see [`handler.py`](https://huggingface.co/pixai-labs/pixai-tagger-v0.9/blob/main/handler.py) under **Files**.
- **Demo UI:** our [Hugging Face Space](https://huggingface.co/spaces/pixai-labs/pixai-tagger-demo) above, or this [Hugging Face Space from DeepGHS](https://huggingface.co/spaces/deepghs/pixai-tagger-v0.9-demo).
- **`pip` + direct weights:** TBD (planned for v1.0).

This tagger can also be used via the [imgutils tool](https://dghs-imgutils.deepghs.org/main/api_doc/tagging/pixai.html).
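However you run the model, the same client-side post-filtering applies: keep the top-k predictions, then apply a per-group threshold. A minimal sketch, assuming the `{"tag", "score", "group"}` output format shown above; the helper name is ours, the defaults mirror the balanced preset, and applying the general threshold to `copyright` tags is our assumption:

```python
# Sketch of the recommended post-filtering: top_k cutoff + per-group thresholds.
def filter_tags(preds, top_k=128, threshold_general=0.30, threshold_character=0.75):
    thresholds = {
        "general": threshold_general,
        "character": threshold_character,
        "copyright": threshold_general,  # assumption: copyright reuses the general threshold
    }
    ranked = sorted(preds, key=lambda p: p["score"], reverse=True)[:top_k]
    return [p for p in ranked
            if p["score"] >= thresholds.get(p["group"], threshold_general)]

preds = [
    {"tag": "1girl", "score": 0.97, "group": "general"},
    {"tag": "mika_(blue_archive)", "score": 0.92, "group": "character"},
    {"tag": "looking_at_viewer", "score": 0.25, "group": "general"},
    {"tag": "hina_(blue_archive)", "score": 0.60, "group": "character"},
]
kept = filter_tags(preds)
# keeps "1girl" and "mika_(blue_archive)"; the sub-threshold general and
# character tags are dropped
```

For the coverage preset, pass `threshold_general=0.10` and filter downstream instead.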
------

## Training notes (short version)

- **Source:** Danbooru (IDs 1–8,600,750; snapshot 2025-01)
- **Tag set:** ~**13,461** tags (≥600 occurrences); grouped as general/character/copyright
- **Filtering:** remove images with **<10 general tags** (WD v3 heuristic)
- **Setup:** EVA02 encoder **frozen**; classification head **continued training**
- **Input:** 448×448; standard Danbooru tag normalization
- **Augment:** **MixUp in embedding space** (α=200)
- **Optim:** Adam 1e-5, cycle schedule; batch 2048; full precision
- **Compute:** ~**1 day** on **1× 8×H100** node
- *(Explored full-backbone training; deferred, since head-only was more stable and faster for data iteration.)*

------

## Evaluation (what to expect)

**Metric style:** fixed thresholds (above). Reported as **micro-averaged** unless noted.

- **All-tags (13k) micro-F1:** ~**0.60** (recall-leaning)
- **Character subset (4k) micro-F1:** **0.865** @ `t_char=0.75`
- Reference: **WD v3 SwinV2** character F1 ≈ **0.608** (same protocol)

**Internal "accuracy/coverage" snapshot**

| Model          | Coverage-F1 | Accuracy-F1 | Acc-Recall | Acc-Precision | Cov-Precision | Cov-Recall |
| -------------- | ----------- | ----------- | ---------- | ------------- | ------------- | ---------- |
| **PixAI v0.9** | **0.4910**  | 0.4403      | 0.6654     | 0.3634        | 0.4350        | 0.6547     |
| WD-v3-EVA02    | 0.4155      | 0.4608      | 0.4465     | **0.5248**    | 0.4580        | 0.4083     |
| WD-v3-SwinV2   | 0.3349      | 0.3909      | 0.3603     | 0.4821        | 0.3906        | 0.3171     |
| Camie-70k      | 0.4877      | 0.4800      | 0.5743     | 0.4123        | 0.4288        | 0.5930     |

> Notes
> • Character uses `t≈0.75`; coverage often uses `t≈0.10`.
> • Keep micro vs macro consistent when updating numbers.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/636982a164aad59d4d42714b/6QW7wK_GqKzr6037REnCP.png)

> Note: Plots show internal candidate versions (v2.x). The current release is equivalent to `pixai-tagger-v0.9` (ex-`v2.4.1`). A follow-up version is in progress.

------

## Quick comparisons

A fast feel for where v0.9 sits.
Numbers are from our protocol and may differ from others'.

| Topic                 | PixAI Tagger v0.9                        | WD v3 (EVA02 / SwinV2) | What it means in practice                                    |
| --------------------- | ---------------------------------------- | ---------------------- | ------------------------------------------------------------ |
| **Data snapshot**     | Danbooru to **2025-01**                  | Danbooru to 2024-02    | Better coverage of newer IPs                                 |
| **Tag vocabulary**    | ~**13.5k** tags                          | ~10.8k tags            | More labels to catch the long tail                           |
| **Character F1**      | **≈0.865** (@ 0.75 threshold)            | ~0.61 (SwinV2 ref)     | Stronger character recognition                               |
| **Default posture**   | Recall-leaning (tune down for precision) | Often more balanced    | Good for search/curation; more false positives; set your own thresholds |
| **Model size**        | **~1.27 GB** checkpoint                  | Similar ballpark       | Easy to host; endpoint-friendly                              |
| **Training strategy** | Head-only; encoder frozen (EVA02)        | Depends on release     | Faster iteration on data updates                             |

------

## Intended use

**You can:**

- Auto-tag anime images with Danbooru-style tags
- Build tag-search indices
- Assist caption generation (merge tags with NL captions)
- Feed tags into **text-to-image** pipelines (alone or alongside text)

**Please don't rely on it for:**

- Legal/safety moderation or age verification
- Non-anime imagery (performance will drop)
- Fine-grained counting/attributes without human review

------

## Limitations & risks

- **NSFW & sensitive tags.** The dataset contains them; outputs may too.
- **Recall vs precision.** Low thresholds increase false positives.
- **Hallucinations.** Number-sensitive or visually similar tags can be mispredicted.
- **Representation bias.** Mirrors Danbooru's styles, tropes, and demographics.
- **IP/character names.** Can be wrong or incomplete; use allow/deny lists and co-occurrence rules.

**Tuning tips**

- Set **different thresholds** for general vs character tags.
- Consider **allow/deny lists** for your domain.
- Add simple **co-occurrence rules** to suppress contradictions.

------

## Authors / Contributors

- **[Linso](https://huggingface.co/richard-guyunqi)** — primary contributor (training, data processing)
- **[narugo1992](https://huggingface.co/narugo1992)** — contributions
- **[AngelBottomless](https://huggingface.co/AngelBottomless)** (PixAI) — contributions
- **[trojblue](https://huggingface.co/trojblue)** (PixAI) — contributions
- The rest of the PixAI team — further development support and testing

**We also appreciate the broader anime image generation community.** Several ideas, discussions, and experiments from outside PixAI helped shape this release.

---

## Maintenance

- We plan **future releases** with updated snapshots.
- v1.0 will include updated tags + packaging improvements.
- A changelog will live in the repo.

## Other

- There is an [ONNX version of this tagger provided by DeepGHS](https://huggingface.co/deepghs/pixai-tagger-v0.9-onnx) — thanks!