Harpreet Sahota PRO

harpreetsahota

AI & ML interests

Deep learning, language models, prompt engineering, agents, multi-agent systems

Recent Activity

liked a dataset about 10 hours ago

nvidia/PhysicalAI-Robotics-GR00T-X-Embodiment-Sim

liked a model 1 day ago

PerceptronAI/Isaac-0.2-2B-Preview

liked a dataset 1 day ago

Lixsp11/Sekai-Project

upvoted a collection 22 days ago

Molmo2

Artifacts for the Molmo2 release • 6 items • Updated 16 days ago • 30

upvoted 2 papers 3 months ago

Robot Learning: A Tutorial

Paper • 2510.12403 • Published Oct 14, 2025 • 120

CommonForms: A Large, Diverse Dataset for Form Field Detection

Paper • 2509.16506 • Published Sep 20, 2025 • 19

upvoted a collection 3 months ago

ModernVBERT

Resources for ModernVBERT • 5 items • Updated Oct 3, 2025 • 11

upvoted a collection 4 months ago

Qwen3-VL

37 items • Updated 8 days ago • 559

upvoted an article 4 months ago

Vision Language Model Alignment in TRL ⚡️

Article • Aug 7, 2025 • 105

upvoted a collection 4 months ago

Granite Docling

5 items • Updated Nov 17, 2025 • 60

upvoted an article 4 months ago

PP-OCRv5 on Hugging Face: A Specialized Approach to OCR

Article • Sep 10, 2025 • 109

upvoted a collection 4 months ago

PP-OCRv5

PP-OCRv5 is the latest text recognition solution, supporting Simplified Chinese, Chinese Pinyin, Traditional Chinese, English, and Japanese • 13 items • Updated Sep 15, 2025 • 50

upvoted 2 collections 5 months ago

UI-Venus

8 items • Updated 15 days ago • 23

Releases July 25

28 items • Updated Jul 30, 2025 • 3

upvoted a collection 6 months ago

Releases July 18

34 items • Updated Jul 23, 2025 • 4

upvoted an article 6 months ago

Welcome the NVIDIA Llama Nemotron Nano VLM to Hugging Face Hub

Article • Jun 27, 2025 • 30

upvoted a collection 7 months ago

V-JEPA 2

A frontier video understanding model developed by FAIR, Meta, which extends the pretraining objectives of https://ai.meta.com/blog/v-jepa-yann • 8 items • Updated Jun 13, 2025 • 181

upvoted an article 7 months ago

ScreenSuite - The most comprehensive evaluation suite for GUI Agents!

Article • Jun 6, 2025 • 55

upvoted 3 collections 7 months ago

Holo1

Vision-Language Action Model for use in Surfer-H web navigation agent • 6 items • Updated Jun 10, 2025 • 48

AGUVIS: Unified Pure Vision GUI Agents

https://aguvis-project.github.io • 3 items • Updated Dec 20, 2024 • 7

MiMo-VL

6 items • Updated 22 days ago • 38

upvoted a collection 8 months ago

MiniCPM-o & MiniCPM-V

Multimodal models with leading performance. • 28 items • Updated Sep 1, 2025 • 59

upvoted an article 8 months ago

Vision Language Models (Better, faster, stronger)

Article • May 12, 2025 • 582