Update README.md
README.md
CHANGED
@@ -1,3 +1,37 @@
---
license: cc-by-nc-4.0
base_model:
- Qwen/Qwen3-1.7B
- google/siglip2-so400m-patch14-384
---

# [Isaac-0.1-Base by Perceptron](https://www.perceptron.inc/blog/introducing-isaac-0-1)
*Note: this is the Base model.* [Try out the model on our playground](https://www.perceptron.inc/demo)

We're introducing Isaac 0.1, our first perceptive-language model and a major step toward building AI systems that can understand and interact with the physical world. Isaac 0.1 is an open-source, 2B-parameter model built for real-world applications. It sets a new standard for efficiency, delivering capabilities that meet or exceed those of models over 50 times its size.

Founded by the team behind Meta's Chameleon multimodal models, Perceptron is tackling a fundamental challenge: bringing the power of physical AI to the dynamic, multimodal, and real-time environments we live and work in.

Isaac 0.1 is the first in our family of models built to be the intelligence layer for the physical world. It's now available open source for researchers and developers everywhere.
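
Below is a minimal, hypothetical quick-start sketch. It assumes the checkpoint is hosted on the Hugging Face Hub and loads through `transformers` with `trust_remote_code=True`, and that the processor exposes an image-aware chat template. The repo id `perceptron/isaac-0.1-base`, the message schema, and the generation settings are placeholders, not the confirmed Isaac API; see the blog post and playground linked above for the supported interface.

```python
# Hypothetical quick-start sketch: repo id, loading path, and message format
# are assumptions, not the confirmed Isaac-0.1 API.
from transformers import AutoModelForCausalLM, AutoProcessor
from PIL import Image

repo_id = "perceptron/isaac-0.1-base"  # placeholder repo id

processor = AutoProcessor.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id, trust_remote_code=True, torch_dtype="auto"
)

image = Image.open("machine.jpg")  # any local image
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "What looks broken in this machine? Point to it."},
        ],
    }
]

# Assumes the processor's chat template accepts interleaved image/text content.
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens.
print(processor.decode(output_ids[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
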
## What’s new in Isaac 0.1

**Visual QA, simply trained**
Strong results on standard understanding benchmarks with a straightforward, reproducible training recipe.

**Grounded spatial intelligence**
Precise pointing and localization with robust spatial reasoning. Ask “what’s broken in this machine?” and get grounded answers with highlighted regions—handling occlusions, relationships, and object interactions.

**In-context learning for perception**
Show a few annotated examples (defects, safety conditions, etc.) in the prompt and the model adapts—no YOLO-style fine-tuning or custom detector stacks required. A sketch of this prompting pattern follows this list.

**OCR & fine-grained detail**
Reads small text and dense scenes reliably, across resolutions, with dynamic image handling for tiny features and cluttered layouts.

**Conversational Pointing**
A new interaction pattern where language and vision stay in lockstep: every claim is grounded and visually cited, reducing hallucinations and making reasoning auditable.

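To make the in-context learning behavior above concrete, here is a hypothetical few-shot prompt sketch: labeled reference images are interleaved with text before the query image, reusing the placeholder `processor` and `model` from the quick-start sketch above. The file names and message schema are illustrative assumptions; adapt them to whatever prompt format the released model documents.

```python
# Hypothetical few-shot prompt: annotated examples in context, then a query image.
# Assumes `processor` and `model` from the quick-start sketch above are already loaded.
from PIL import Image

examples = [
    ("weld_ok.jpg", "Example 1: this weld seam is acceptable."),
    ("weld_defect.jpg", "Example 2: this weld seam has a crack defect near the joint."),
]

# Interleave the annotated reference images with their text labels.
content = []
for path, label in examples:
    content.append({"type": "image", "image": Image.open(path)})
    content.append({"type": "text", "text": label})

# Append the query image and ask the model to apply the in-context notion of "defect".
content.append({"type": "image", "image": Image.open("weld_query.jpg")})
content.append({"type": "text", "text": "Does this weld have the same defect? Point to the region if so."})

messages = [{"role": "user", "content": content}]

inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output_ids[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
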
## Benchmarks


![](https://cdn-uploads.huggingface.co/production/uploads/63d9a4d55bbd2b1b0a9ad0c0/KbDtQ9YKsnK0rXdRAkCc0.png)