---
license: apache-2.0
datasets:
- Bingsu/Human_Action_Recognition
library_name: transformers
language:
- en
base_model:
- google/siglip2-base-patch16-224
pipeline_tag: image-classification
tags:
- Human-Action-Recognition
---

![zfdggzdrg.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/DPx-7s4BmG_XocnPQ4TR9.png)

# **Human-Action-Recognition**

> **Human-Action-Recognition** is an image classification vision-language encoder model fine-tuned from **google/siglip2-base-patch16-224** for multi-class human action recognition. It uses the **SiglipForImageClassification** architecture to predict human activities from still images.

```py
Classification Report:
                    precision    recall  f1-score   support

           calling     0.8525    0.7571    0.8020       840
          clapping     0.8679    0.7119    0.7822       840
           cycling     0.9662    0.9857    0.9758       840
           dancing     0.8302    0.8381    0.8341       840
          drinking     0.9093    0.8714    0.8900       840
            eating     0.9377    0.9131    0.9252       840
          fighting     0.9034    0.7905    0.8432       840
           hugging     0.9065    0.9000    0.9032       840
          laughing     0.7854    0.8583    0.8203       840
listening_to_music     0.8494    0.7988    0.8233       840
           running     0.8888    0.9321    0.9099       840
           sitting     0.5945    0.7226    0.6523       840
          sleeping     0.8593    0.8214    0.8399       840
           texting     0.8195    0.6702    0.7374       840
      using_laptop     0.6610    0.9190    0.7689       840

          accuracy                         0.8327     12600
         macro avg     0.8421    0.8327    0.8339     12600
      weighted avg     0.8421    0.8327    0.8339     12600
```

![download.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/O9ir2VwHirB-T75ABCP7m.png)

The model categorizes images into 15 action classes:

- **0:** calling
- **1:** clapping
- **2:** cycling
- **3:** dancing
- **4:** drinking
- **5:** eating
- **6:** fighting
- **7:** hugging
- **8:** laughing
- **9:** listening_to_music
- **10:** running
- **11:** sitting
- **12:** sleeping
- **13:** texting
- **14:** using_laptop

---

# **Run with Transformers 🤗**

```python
!pip install -q transformers torch pillow gradio
```

```python
import gradio as gr
from transformers import AutoImageProcessor, SiglipForImageClassification
from PIL import Image
import torch

# Load model and processor
model_name = "prithivMLmods/Human-Action-Recognition"  # Change to your updated model path
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

# ID to Label mapping
id2label = {
    0: "calling", 1: "clapping", 2: "cycling", 3: "dancing", 4: "drinking",
    5: "eating", 6: "fighting", 7: "hugging", 8: "laughing", 9: "listening_to_music",
    10: "running", 11: "sitting", 12: "sleeping", 13: "texting", 14: "using_laptop"
}

def classify_action(image):
    """Predicts the human action in the image."""
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

    predictions = {id2label[i]: round(probs[i], 3) for i in range(len(probs))}
    return predictions

# Gradio interface
iface = gr.Interface(
    fn=classify_action,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(label="Action Prediction Scores"),
    title="Human Action Recognition",
    description="Upload an image to recognize the human action (e.g., dancing, calling, sitting, etc.)."
)

# Launch the app
if __name__ == "__main__":
    iface.launch()
```
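For a quick check without the Gradio UI, a minimal single-image sketch along the lines below can be used. The image path `test.jpg` is only a placeholder, and the snippet assumes the checkpoint's config exposes the `id2label` mapping shown above; if it does not, reuse the `id2label` dictionary from the previous snippet.

```python
from transformers import AutoImageProcessor, SiglipForImageClassification
from PIL import Image
import torch

# Same model and processor as above
model_name = "prithivMLmods/Human-Action-Recognition"
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

# Placeholder path: point this at any local image of a person performing an action
image = Image.open("test.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Top-1 prediction; assumes id2label is stored in the model config
probs = logits.softmax(dim=-1).squeeze()
pred_id = int(probs.argmax())
print(model.config.id2label[pred_id], round(float(probs[pred_id]), 3))
```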
---

# **Intended Use**

The **Human-Action-Recognition** model is designed to detect and classify human actions from images. Example applications:

- **Surveillance & Monitoring:** Recognizing suspicious or specific activities in public spaces.
- **Sports Analytics:** Identifying player activities or movements.
- **Social Media Insights:** Understanding trends in user-posted visuals.
- **Healthcare:** Monitoring elderly people or patients for activity patterns.
- **Robotics & Automation:** Enabling context-aware AI systems with visual understanding.