UITARSSS / ui-tars-setup-commands.md

Upload folder using huggingface_hub

6b2dc4f verified 5 months ago

6.7 kB

	# UI-TARS 1.5-7B Model Setup Commands

	This document contains all the commands executed to download, convert, and quantize the ByteDance-Seed/UI-TARS-1.5-7B model for use with Ollama.

	## Prerequisites

	### 1. Verify Ollama Installation
	```bash
	ollama --version
	```

	### 2. Install System Dependencies
	```bash
	# Install sentencepiece via Homebrew
	brew install sentencepiece

	# Install Python packages
	pip3 install sentencepiece gguf protobuf huggingface_hub
	```

	## Step 1: Download the UI-TARS Model

	### Create directory and download model
	```bash
	# Create directory for the model
	mkdir -p /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b

	# Change to the directory
	cd /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b

	# Download the complete model from HuggingFace
	huggingface-cli download ByteDance-Seed/UI-TARS-1.5-7B --local-dir . --local-dir-use-symlinks False

	# Verify download
	ls -la
	```

	## Step 2: Setup llama.cpp for Conversion

	### Clone and build llama.cpp
	```bash
	# Navigate to AI directory
	cd /Users/qoneqt/Desktop/shubham/ai

	# Clone llama.cpp repository
	git clone https://github.com/ggerganov/llama.cpp.git

	# Navigate to llama.cpp directory
	cd llama.cpp

	# Create build directory and configure with CMake
	mkdir build
	cd build
	cmake ..

	# Build the project (this will take a few minutes)
	make -j$(sysctl -n hw.ncpu)

	# Verify the quantize tool was built
	ls -la bin/llama-quantize
	```

	## Step 3: Convert Safetensors to GGUF Format

	### Create output directory and convert to F16 GGUF
	```bash
	# Create directory for GGUF files
	mkdir -p /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf

	# Navigate to llama.cpp directory
	cd /Users/qoneqt/Desktop/shubham/ai/llama.cpp

	# Convert safetensors to F16 GGUF (this takes ~5-10 minutes)
	python convert_hf_to_gguf.py /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b \
	--outfile /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf \
	--outtype f16

	# Check the F16 file size
	ls -lh /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf
	```

	## Step 4: Quantize to Q4_K_M Format

	### Quantize the F16 model to reduce size
	```bash
	# Navigate to the build directory
	cd /Users/qoneqt/Desktop/shubham/ai/llama.cpp/build

	# Quantize F16 to Q4_K_M (this takes ~1-2 minutes)
	./bin/llama-quantize \
	/Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf \
	/Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-q4_k_m.gguf \
	q4_k_m

	# Check the quantized file size
	ls -lh /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-q4_k_m.gguf
	```

	## Step 5: Create Modelfiles for Ollama

	### Create Modelfile for F16 version
	```bash
	cd /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf

	cat > Modelfile << 'EOF'
	FROM ./ui-tars-1.5-7b-f16.gguf

	TEMPLATE """<\|im_start\|>system
	You are UI-TARS, an advanced AI assistant specialized in user interface automation and interaction. You can analyze screenshots, understand UI elements, and provide precise instructions for automating user interface tasks. When provided with a screenshot, analyze the visual elements and provide detailed, actionable guidance.

	Key capabilities:
	- Screenshot analysis and UI element detection
	- Step-by-step automation instructions
	- Precise coordinate identification for clicks and interactions
	- Understanding of various UI frameworks and applications<\|im_end\|>
	<\|im_start\|>user
	{{ .Prompt }}<\|im_end\|>
	<\|im_start\|>assistant
	"""

	PARAMETER stop "<\|end\|>"
	PARAMETER stop "<\|user\|>"
	PARAMETER stop "<\|assistant\|>"
	PARAMETER temperature 0.7
	PARAMETER top_p 0.9
	EOF
	```

	### Create Modelfile for quantized version
	```bash
	cat > Modelfile-q4 << 'EOF'
	FROM ./ui-tars-1.5-7b-q4_k_m.gguf

	TEMPLATE """<\|im_start\|>system
	You are UI-TARS, an advanced AI assistant specialized in user interface automation and interaction. You can analyze screenshots, understand UI elements, and provide precise instructions for automating user interface tasks. When provided with a screenshot, analyze the visual elements and provide detailed, actionable guidance.

	Key capabilities:
	- Screenshot analysis and UI element detection
	- Step-by-step automation instructions
	- Precise coordinate identification for clicks and interactions
	- Understanding of various UI frameworks and applications<\|im_end\|>
	<\|im_start\|>user
	{{ .Prompt }}<\|im_end\|>
	<\|im_start\|>assistant
	"""

	PARAMETER stop "<\|end\|>"
	PARAMETER stop "<\|user\|>"
	PARAMETER stop "<\|assistant\|>"
	PARAMETER temperature 0.7
	PARAMETER top_p 0.9
	EOF
	```

	## Step 6: Create Models in Ollama

	### Create the F16 model (high quality, larger size)
	```bash
	cd /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf
	ollama create ui-tars:latest -f Modelfile
	```

	### Create the quantized model (recommended for daily use)
	```bash
	ollama create ui-tars:q4 -f Modelfile-q4
	```

	## Step 7: Verify Installation

	### List all available models
	```bash
	ollama list
	```

	### Test the quantized model
	```bash
	ollama run ui-tars:q4 "Hello! Can you help me with UI automation tasks?"
	```

	### Test with an image (if you have one)
	```bash
	ollama run ui-tars:q4 "Analyze this screenshot and tell me what UI elements you can see" --image /path/to/your/screenshot.png
	```

	## File Sizes and Results

	After completion, you should have:

	- Original model: `/Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b/` (~15GB, 19 files)
	- F16 GGUF: `/Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf` (~14.5GB)
	- Quantized GGUF: `/Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-q4_k_m.gguf` (~4.4GB)
	- Ollama models:
	- `ui-tars:latest` (~15GB in Ollama)
	- `ui-tars:q4` (~4.7GB in Ollama) ⭐ Recommended for daily use

	## Usage Tips

	1. Use the quantized model (`ui-tars:q4`) for regular use - it's 69% smaller with minimal quality loss
	2. The model supports vision capabilities - you can send screenshots for UI analysis
	3. Proper image formats: PNG, JPEG, WebP are supported
	4. For UI automation: Provide clear screenshots and specific questions about what you want to automate

	## Cleanup (Optional)

	If you want to save disk space after setup:

	```bash
	# Remove the original downloaded files (optional)
	rm -rf /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b

	# Remove the F16 GGUF if you only need the quantized version (optional)
	rm /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf

	# Remove llama.cpp if no longer needed (optional)
	rm -rf /Users/qoneqt/Desktop/shubham/ai/llama.cpp
	```

	---

	Total Setup Time: ~20-30 minutes (depending on download and conversion speeds)
	Final Model Size: 4.7GB (quantized) vs 15GB (original) - 69% size reduction!