DevShubham/UITARSSS

Tags: GGUF, conversational


DevShubham committed on Jun 12 · commit 6b2dc4f (verified) · 1 parent: 4565871

Upload folder using huggingface_hub


ui-tars-setup-commands.md (added, +221 -0)

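# UI-TARS 1.5-7B Model Setup Commands

This document lists the commands used to download, convert, and quantize the ByteDance-Seed/UI-TARS-1.5-7B model for use with Ollama. The paths and tools (Homebrew, `sysctl`) assume macOS.

## Prerequisites

### 1. Verify Ollama Installation

```bash
ollama --version
```

### 2. Install System Dependencies

```bash
# Install sentencepiece via Homebrew
brew install sentencepiece
# Install Python packages
pip3 install sentencepiece gguf protobuf huggingface_hub
# convert_hf_to_gguf.py (Step 3) also imports torch, transformers, and numpy.
# After cloning llama.cpp (Step 2), run `pip3 install -r requirements.txt` in that
# directory to install the full set of conversion dependencies.
```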
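
Before starting, you can confirm that the command-line tools used in the steps below are on your `PATH`. This is a minimal sketch. `require_cmds` is a hypothetical helper and not part of any of these tools, and it checks only for executables, not Python packages.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helper: report any missing commands and fail if one is absent.
require_cmds() {
  local missing="" cmd
  for cmd in "$@"; do
    # command -v succeeds only if the name resolves (PATH entry, builtin, or function)
    if ! command -v "$cmd" >/dev/null 2>&1; then
      missing="$missing $cmd"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing" >&2
    return 1
  fi
  return 0
}

# Example (commands invoked directly in this guide):
# require_cmds ollama git cmake make python3 pip3 huggingface-cli
```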

## Step 1: Download the UI-TARS Model

### Create directory and download model

```bash
# Create directory for the model
mkdir -p /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b
# Change to the directory
cd /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b
# Download the complete model from Hugging Face.
# (--local-dir-use-symlinks is deprecated and ignored by recent huggingface_hub
# releases, which always write real files into --local-dir; it is harmless here.)
huggingface-cli download ByteDance-Seed/UI-TARS-1.5-7B --local-dir . --local-dir-use-symlinks False
# Verify download
ls -la
```
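
Listing the directory shows what arrived. A small scripted check can also catch an obviously incomplete download before the slower conversion step. This sketch assumes only that a Hugging Face transformers checkpoint contains a `config.json` and one or more `*.safetensors` files. `check_hf_dir` is a hypothetical helper name, and the check does not detect a truncated shard.

```bash
#!/usr/bin/env bash
# Hypothetical helper: sanity-check a downloaded Hugging Face model directory.
check_hf_dir() {
  local dir="$1" shards
  if [ ! -f "$dir/config.json" ]; then
    echo "no config.json in $dir" >&2
    return 1
  fi
  # Count safetensors files; if the glob matches nothing, ls fails and the count is 0.
  shards=$(ls "$dir"/*.safetensors 2>/dev/null | wc -l | tr -d ' ')
  if [ "$shards" -eq 0 ]; then
    echo "no .safetensors files in $dir" >&2
    return 1
  fi
  echo "ok: $shards safetensors file(s)"
}

# Example:
# check_hf_dir /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b
```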

## Step 2: Set Up llama.cpp for Conversion

### Clone and build llama.cpp

```bash
# Navigate to AI directory
cd /Users/qoneqt/Desktop/shubham/ai
# Clone the llama.cpp repository (the project now lives at github.com/ggml-org/llama.cpp;
# the ggerganov URL below redirects there)
git clone https://github.com/ggerganov/llama.cpp.git
# Navigate to llama.cpp directory
cd llama.cpp
# Create build directory and configure with CMake
mkdir build
cd build
cmake ..
# Build the project (this takes a few minutes); hw.ncpu is the macOS CPU core count
make -j$(sysctl -n hw.ncpu)
# Verify the quantize tool was built
ls -la bin/llama-quantize
```
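
`sysctl -n hw.ncpu` is macOS-specific. If you run the same build on Linux, a fallback chain can determine the core count instead. This is a sketch, and `ncpu` is a hypothetical function name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the number of online CPU cores on macOS or Linux.
ncpu() {
  local n
  # Try getconf first (present on macOS and Linux), then sysctl (macOS),
  # then nproc (GNU coreutils); fall back to 1 if none of them works.
  n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) ||
    n=$(sysctl -n hw.ncpu 2>/dev/null) ||
    n=$(nproc 2>/dev/null) ||
    n=1
  echo "$n"
}

# Example (run from llama.cpp/build):
# make -j"$(ncpu)"
```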

## Step 3: Convert Safetensors to GGUF Format

### Create output directory and convert to F16 GGUF

```bash
# Create directory for GGUF files
mkdir -p /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf
# Navigate to llama.cpp directory
cd /Users/qoneqt/Desktop/shubham/ai/llama.cpp
# Convert safetensors to F16 GGUF (this takes ~5-10 minutes).
# Note: this exports the language model only. The vision encoder of this
# Qwen2.5-VL-based model goes into a separate projector (mmproj) file, which
# recent versions of the script produce when run with --mmproj.
python3 convert_hf_to_gguf.py /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b \
  --outfile /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf \
  --outtype f16
# Check the F16 file size
ls -lh /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf
```
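
Before quantizing, you can confirm that the conversion produced a GGUF file and not an empty or truncated one. GGUF files begin with the four ASCII bytes `GGUF`, followed by a little-endian 32-bit format version. The hypothetical `gguf_info` helper below reads only those 8 bytes. It is a quick sanity check, not a validation of the tensors, and it assumes a little-endian host (x86 or Apple Silicon).

```bash
#!/usr/bin/env bash
# Hypothetical helper: check the GGUF magic bytes and print the format version.
gguf_info() {
  local f="$1" magic version
  # The first 4 bytes of a GGUF file are the ASCII characters "GGUF".
  magic=$(head -c 4 "$f" 2>/dev/null)
  if [ "$magic" != "GGUF" ]; then
    echo "not a GGUF file: $f" >&2
    return 1
  fi
  # Bytes 4..7 hold the version as a uint32; od reads it in host byte order,
  # which matches the file's little-endian layout on little-endian machines.
  version=$(od -An -t u4 -j 4 -N 4 "$f" | tr -d ' ')
  echo "GGUF version $version"
}

# Example:
# gguf_info /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf
```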

## Step 4: Quantize to Q4_K_M Format

### Quantize the F16 model to reduce size

```bash
# Navigate to the build directory
cd /Users/qoneqt/Desktop/shubham/ai/llama.cpp/build
# Quantize F16 to Q4_K_M (this takes ~1-2 minutes)
./bin/llama-quantize \
  /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf \
  /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-q4_k_m.gguf \
  q4_k_m
# Check the quantized file size
ls -lh /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-q4_k_m.gguf
```
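
To get an exact reduction figure instead of reading two rounded `ls -lh` values, you can compare byte counts directly. This sketch uses a hypothetical `size_reduction` helper built on `wc -c`, so it behaves the same on macOS and Linux.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print how much smaller file $2 is than file $1, in percent.
size_reduction() {
  local before after
  # wc -c < file prints only the byte count (macOS pads it with spaces, so strip them)
  before=$(wc -c < "$1" | tr -d ' ')
  after=$(wc -c < "$2" | tr -d ' ')
  awk -v b="$before" -v a="$after" 'BEGIN { printf "%.1f%% smaller\n", (1 - a / b) * 100 }'
}

# Example:
# cd /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf
# size_reduction ui-tars-1.5-7b-f16.gguf ui-tars-1.5-7b-q4_k_m.gguf
```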

## Step 5: Create Modelfiles for Ollama

UI-TARS 1.5 is built on Qwen2.5-VL, whose chat format uses `<|im_start|>` and `<|im_end|>` markers (as in the template below), so the `stop` parameters use those same tokens.

### Create Modelfile for F16 version

```bash
cd /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf
cat > Modelfile << 'EOF'
FROM ./ui-tars-1.5-7b-f16.gguf
TEMPLATE """<|im_start|>system
You are UI-TARS, an advanced AI assistant specialized in user interface automation and interaction. You can analyze screenshots, understand UI elements, and provide precise instructions for automating user interface tasks. When provided with a screenshot, analyze the visual elements and provide detailed, actionable guidance.
Key capabilities:
- Screenshot analysis and UI element detection
- Step-by-step automation instructions
- Precise coordinate identification for clicks and interactions
- Understanding of various UI frameworks and applications<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
PARAMETER stop "<|im_end|>"
PARAMETER stop "<|im_start|>"
PARAMETER temperature 0.7
PARAMETER top_p 0.9
EOF
```

### Create Modelfile for quantized version

```bash
cat > Modelfile-q4 << 'EOF'
FROM ./ui-tars-1.5-7b-q4_k_m.gguf
TEMPLATE """<|im_start|>system
You are UI-TARS, an advanced AI assistant specialized in user interface automation and interaction. You can analyze screenshots, understand UI elements, and provide precise instructions for automating user interface tasks. When provided with a screenshot, analyze the visual elements and provide detailed, actionable guidance.
Key capabilities:
- Screenshot analysis and UI element detection
- Step-by-step automation instructions
- Precise coordinate identification for clicks and interactions
- Understanding of various UI frameworks and applications<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
PARAMETER stop "<|im_end|>"
PARAMETER stop "<|im_start|>"
PARAMETER temperature 0.7
PARAMETER top_p 0.9
EOF
```
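
The two Modelfiles differ only in their `FROM` line. To avoid keeping two copies of the template in sync, you can use a small generator function that writes either variant. This is a sketch: `write_modelfile` is a hypothetical name. The template and parameters are the ones used in this guide, with the ChatML stop tokens `<|im_end|>` and `<|im_start|>` that match the template.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write an Ollama Modelfile for the given GGUF file.
# Usage: write_modelfile <gguf-file> <output-path>
write_modelfile() {
  local gguf="$1" out="$2"
  # The FROM line is the only part that changes between variants.
  printf 'FROM ./%s\n' "$gguf" > "$out"
  # The quoted 'EOF' keeps the template text literal (no shell expansion).
  cat >> "$out" << 'EOF'
TEMPLATE """<|im_start|>system
You are UI-TARS, an advanced AI assistant specialized in user interface automation and interaction. You can analyze screenshots, understand UI elements, and provide precise instructions for automating user interface tasks. When provided with a screenshot, analyze the visual elements and provide detailed, actionable guidance.
Key capabilities:
- Screenshot analysis and UI element detection
- Step-by-step automation instructions
- Precise coordinate identification for clicks and interactions
- Understanding of various UI frameworks and applications<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
PARAMETER stop "<|im_end|>"
PARAMETER stop "<|im_start|>"
PARAMETER temperature 0.7
PARAMETER top_p 0.9
EOF
}

# Example:
# cd /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf
# write_modelfile ui-tars-1.5-7b-f16.gguf Modelfile
# write_modelfile ui-tars-1.5-7b-q4_k_m.gguf Modelfile-q4
```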

## Step 6: Create Models in Ollama

### Create the F16 model (high quality, larger size)

```bash
cd /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf
ollama create ui-tars:latest -f Modelfile
```

### Create the quantized model (recommended for daily use)

```bash
ollama create ui-tars:q4 -f Modelfile-q4
```

## Step 7: Verify Installation

### List all available models

```bash
ollama list
```
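
If you script the setup, you can check for a tag without reading the table by eye. The sketch below uses a hypothetical `has_model` helper. It reads `ollama list`-style output on standard input and looks for an exact match in the first (NAME) column, assuming the first line is a header row. Pass the full tag, for example `ui-tars:q4`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the NAME column of `ollama list` output
# (read from stdin) contains exactly the given model tag.
has_model() {
  # NR > 1 skips the header row; $1 is the first whitespace-separated column.
  awk -v want="$1" 'NR > 1 && $1 == want { found = 1 } END { exit !found }'
}

# Example:
# ollama list | has_model ui-tars:q4 && echo "ui-tars:q4 is installed"
```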
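
### Test the quantized model

```bash
ollama run ui-tars:q4 "Hello! Can you help me with UI automation tasks?"
```

### Test with an image (if you have one)

`ollama run` has no `--image` flag. For vision-capable models, the CLI picks up image file paths written into the prompt itself. This only works if the imported GGUF includes the model's vision encoder.

```bash
ollama run ui-tars:q4 "Analyze this screenshot and tell me what UI elements you can see: /path/to/your/screenshot.png"
```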
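
Ollama also accepts images through its local HTTP API: `POST /api/generate` takes base64-encoded images in an `images` array. The sketch below only builds the JSON request body with a hypothetical `build_payload` helper. It does no JSON escaping, so it assumes the prompt contains no double quotes or backslashes. As with the CLI, the model must include its vision encoder for the image to be used.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build a JSON body for Ollama's /api/generate endpoint
# with one base64-encoded image. Assumes the prompt needs no JSON escaping.
build_payload() {
  local model="$1" prompt="$2" image="$3" img_b64
  # base64 output is line-wrapped on some systems; strip newlines for JSON.
  img_b64=$(base64 < "$image" | tr -d '\n')
  printf '{"model":"%s","prompt":"%s","stream":false,"images":["%s"]}\n' \
    "$model" "$prompt" "$img_b64"
}

# Example (requires a running Ollama server on its default port 11434):
# build_payload ui-tars:q4 "What UI elements do you see?" screenshot.png |
#   curl -s http://localhost:11434/api/generate -d @-
```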

## File Sizes and Results

After completion, you should have:

- **Original model**: `/Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b/` (~15GB, 19 files)
- **F16 GGUF**: `/Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf` (~14.5GB)
- **Quantized GGUF**: `/Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-q4_k_m.gguf` (~4.4GB)
- **Ollama models**:
  - `ui-tars:latest` (~15GB in Ollama)
  - `ui-tars:q4` (~4.7GB in Ollama) ⭐ **Recommended for daily use**
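
The quantized file shows as ~4.4GB in `ls -lh` but ~4.7GB in Ollama. A likely explanation is units. `ls -lh` reports powers of 1024 (GiB), while Ollama appears to report powers of 1000 (GB), so the same byte count produces two different figures (4.4 GiB is about 4.72 GB). The sketch below uses a hypothetical `human_sizes` helper to print both figures for a given byte count.

```bash
#!/usr/bin/env bash
# Hypothetical helper: show a byte count in decimal GB (10^9) and binary GiB (2^30).
human_sizes() {
  awk -v b="$1" 'BEGIN { printf "%.2f GB / %.2f GiB\n", b / 1e9, b / 1073741824 }'
}

# Example: compare with the ls -lh figure for the quantized file
# human_sizes "$(wc -c < ui-tars-1.5-7b-q4_k_m.gguf | tr -d ' ')"
```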

## Usage Tips

1. **Use the quantized model (`ui-tars:q4`)** for regular use. It is about 69% smaller than the original (4.7GB vs 15GB), usually with only a small loss in quality.
2. **Vision input**: UI-TARS is designed to analyze screenshots, but the Ollama model can only do so if it includes the vision encoder. A GGUF produced by `convert_hf_to_gguf.py` alone contains just the language model.
3. **Supported image formats**: PNG, JPEG, and WebP.
4. **For UI automation**: Provide clear screenshots and specific questions about what you want to automate.

## Cleanup (Optional)

Ollama keeps its own copy of each model it creates, so you can remove the source files after setup to save disk space:

```bash
# Remove the original downloaded files (optional)
rm -rf /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b
# Remove the F16 GGUF if you only need the quantized version (optional)
rm /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf
# Remove llama.cpp if no longer needed (optional)
rm -rf /Users/qoneqt/Desktop/shubham/ai/llama.cpp
# Remove the ~15GB F16 model from Ollama if you only use ui-tars:q4 (optional)
ollama rm ui-tars:latest
```
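
If you might want to re-quantize later, delete the F16 GGUF only after the quantized file exists. A guarded version of that step might look like the following. `safe_remove_f16` is a hypothetical helper, and its non-empty check is a minimal safeguard, not a full integrity test.

```bash
#!/usr/bin/env bash
# Hypothetical helper: delete the F16 GGUF only if the quantized GGUF exists
# and is non-empty. Usage: safe_remove_f16 <f16.gguf> <q4.gguf>
safe_remove_f16() {
  local f16="$1" q4="$2"
  # [ -s file ] is true only for an existing file larger than zero bytes
  if [ ! -s "$q4" ]; then
    echo "refusing to remove $f16: $q4 is missing or empty" >&2
    return 1
  fi
  rm -f -- "$f16"
}

# Example:
# cd /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf
# safe_remove_f16 ui-tars-1.5-7b-f16.gguf ui-tars-1.5-7b-q4_k_m.gguf
```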

---

**Total Setup Time**: ~20-30 minutes (depending on download and conversion speeds)

**Final Model Size**: 4.7GB (quantized) vs 15GB (original), a 69% size reduction