Sagar Pallai (sagar007) · PRO
28 followers · 30 following
AI & ML interests
LLMs and Stable Diffusion
Recent Activity
New activity in sagar007/multigemma, 12 days ago: 🚨🚨🚨 License Violation Alert: Illegally Re-Licensing Google's Gemma Model as "Open Source"
Replied to their post, 12 days ago:
🚀 I built a Multimodal Vision-Language Model using Gemma-270M + CLIP! Just finished training my multimodal model on the full LLaVA-Instruct-150K dataset (157K samples) and wanted to share the results!

🔧 What I Built: A vision-language model that can understand images and answer questions about them, combining:
- Google Gemma-3-270M (language)
- OpenAI CLIP ViT-Large/14 (vision)
- LoRA fine-tuning for efficiency

📊 Training Stats:
- 157,712 training samples (full LLaVA dataset)
- 3 epochs on an A100 40GB
- ~9 hours training time
- Final loss: 1.333 training / 1.430 validation
- Only 18.6M trainable params (3.4% of 539M total)

📈 Benchmark Results (https://huggingface.co/sagar007/multigemma):
- VQA Accuracy: 53.8%
- Works great for: animal detection, room identification, scene understanding

🔗 **Try it yourself:**
- 🤗 Model: https://huggingface.co/sagar007/multigemma
- 🎮 Demo: https://huggingface.co/spaces/sagar007/Multimodal-Gemma
- 💻 GitHub: https://github.com/sagar431/multimodal-gemma-270m

Built with PyTorch Lightning + MLflow for experiment tracking. Full MLOps pipeline with CI/CD! Would love to hear your feedback! 🙏

#multimodal #gemma #clip #llava #vision-language #pytorch
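For readers curious how the pieces fit together, below is a minimal sketch of the architecture the post describes: a frozen CLIP vision tower, a linear projector into the language model's embedding space, and LoRA adapters on the attention projections. The model IDs, LoRA settings, and projector layout are assumptions on the editor's part, not the author's exact code; the real implementation lives in the linked GitHub repo.

```python
# Sketch of a Gemma-270M + CLIP vision-language model (assumptions, not the author's code).
# Assumes the stock Hugging Face `transformers` and `peft` APIs.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, CLIPVisionModel
from peft import LoraConfig, get_peft_model


class TinyVLM(nn.Module):
    def __init__(self,
                 lm_id: str = "google/gemma-3-270m",              # assumed model ID
                 vit_id: str = "openai/clip-vit-large-patch14"):  # CLIP ViT-L/14
        super().__init__()
        self.vision = CLIPVisionModel.from_pretrained(vit_id)     # frozen vision tower
        lm = AutoModelForCausalLM.from_pretrained(lm_id)
        # LoRA on the attention projections keeps the trainable parameter count small
        lora_cfg = LoraConfig(r=16, lora_alpha=32,
                              target_modules=["q_proj", "v_proj"],
                              task_type="CAUSAL_LM")
        self.lm = get_peft_model(lm, lora_cfg)
        # Linear projector maps CLIP patch features into the LM embedding space
        self.projector = nn.Linear(self.vision.config.hidden_size,
                                   self.lm.config.hidden_size)
        for p in self.vision.parameters():
            p.requires_grad = False

    def forward(self, pixel_values, input_ids, attention_mask, labels=None):
        # 1) encode image patches with CLIP, 2) project into the LM space,
        # 3) prepend the projected image tokens to the text embeddings
        patches = self.vision(pixel_values=pixel_values).last_hidden_state
        img_embeds = self.projector(patches)                       # (B, num_patches, d_lm)
        txt_embeds = self.lm.get_input_embeddings()(input_ids)     # (B, T, d_lm)
        inputs_embeds = torch.cat([img_embeds, txt_embeds], dim=1)
        img_mask = torch.ones(img_embeds.shape[:2],
                              dtype=attention_mask.dtype,
                              device=attention_mask.device)
        attention_mask = torch.cat([img_mask, attention_mask], dim=1)
        if labels is not None:
            # image positions are masked out of the loss
            ignore = torch.full(img_embeds.shape[:2], -100,
                                dtype=labels.dtype, device=labels.device)
            labels = torch.cat([ignore, labels], dim=1)
        return self.lm(inputs_embeds=inputs_embeds,
                       attention_mask=attention_mask, labels=labels)
```

The projector plus LoRA adapters would account for the small trainable fraction the post reports (roughly 18.6M of 539M parameters); the exact split depends on the LoRA rank and target modules chosen.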
Replied to their post, 13 days ago (same post as above):
sagar007's datasets (4), sorted by recently updated:
- sagar007/sagar_flux • Updated Aug 20, 2024 • 2
- sagar007/mini-odia2 • Viewer • Updated Dec 24, 2023 • 1k • 3
- sagar007/mini-odia • Updated Dec 24, 2023 • 3
- sagar007/mini-platypus • Updated Dec 24, 2023 • 10