| # UI-TARS 1.5-7B Model Setup Commands | |
| This document contains all the commands executed to download, convert, and quantize the ByteDance-Seed/UI-TARS-1.5-7B model for use with Ollama. | |
| ## Prerequisites | |
| ### 1. Verify Ollama Installation | |
| ```bash | |
| ollama --version | |
| ``` | |
| ### 2. Install System Dependencies | |
| ```bash | |
| # Install sentencepiece via Homebrew | |
| brew install sentencepiece | |
| # Install Python packages | |
| pip3 install sentencepiece gguf protobuf huggingface_hub | |
| ``` | |
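Before moving on, it can help to confirm that the tools used in the later steps are on the `PATH`. The following is a minimal sketch: the helper name `require_cmds` is hypothetical, and the list of commands in the example is assumed from the steps in this document.

```bash
# Hypothetical helper: report any commands missing from PATH.
# Returns 0 if every command is found, 1 otherwise.
require_cmds() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (commands assumed from the steps in this document):
#   require_cmds ollama git cmake make python3 huggingface-cli
```

Any missing tool is printed to stderr, so the check can be run once at the start instead of discovering a gap halfway through the conversion.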
| ## Step 1: Download the UI-TARS Model | |
| ### Create directory and download model | |
| ```bash | |
| # Create directory for the model | |
| mkdir -p /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b | |
| # Change to the directory | |
| cd /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b | |
| # Download the complete model from HuggingFace | |
| huggingface-cli download ByteDance-Seed/UI-TARS-1.5-7B --local-dir . --local-dir-use-symlinks False | |
| # Verify download | |
| ls -la | |
| ``` | |
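The commands in this document use absolute paths from the original machine. As a sketch for adapting them, the paths could be derived from a single base directory. The variable names `AI_DIR`, `MODEL_DIR`, `GGUF_DIR`, and `LLAMA_DIR` are assumptions introduced here and are not part of the original setup.

```bash
# Base directory (assumed name); override by exporting AI_DIR beforehand.
AI_DIR="${AI_DIR:-$HOME/ai}"

MODEL_DIR="$AI_DIR/ui-tars-1.5-7b"        # downloaded safetensors
GGUF_DIR="$AI_DIR/ui-tars-1.5-7b-gguf"    # converted GGUF files
LLAMA_DIR="$AI_DIR/llama.cpp"             # llama.cpp checkout

echo "model: $MODEL_DIR"
echo "gguf:  $GGUF_DIR"
echo "llama: $LLAMA_DIR"
```

With these set, a path such as `/Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b` in the commands would be written as `"$MODEL_DIR"`.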
| ## Step 2: Setup llama.cpp for Conversion | |
| ### Clone and build llama.cpp | |
| ```bash | |
| # Navigate to AI directory | |
| cd /Users/qoneqt/Desktop/shubham/ai | |
| # Clone llama.cpp repository | |
| git clone https://github.com/ggerganov/llama.cpp.git | |
| # Navigate to llama.cpp directory | |
| cd llama.cpp | |
| # Create build directory and configure with CMake | |
| mkdir build | |
| cd build | |
| cmake .. | |
| # Build the project (this will take a few minutes) | |
| make -j$(sysctl -n hw.ncpu) | |
| # Verify the quantize tool was built | |
| ls -la bin/llama-quantize | |
| ``` | |
| ## Step 3: Convert Safetensors to GGUF Format | |
| ### Create output directory and convert to F16 GGUF | |
| ```bash | |
| # Create directory for GGUF files | |
| mkdir -p /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf | |
| # Navigate to llama.cpp directory | |
| cd /Users/qoneqt/Desktop/shubham/ai/llama.cpp | |
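| # If the script fails on a missing import, install its dependencies first: pip3 install -r requirements.txt | |
| # Convert safetensors to F16 GGUF (this takes ~5-10 minutes) | |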
| python convert_hf_to_gguf.py /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b \ | |
| --outfile /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf \ | |
| --outtype f16 | |
| # Check the F16 file size | |
| ls -lh /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf | |
| ``` | |
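Beyond checking the file size, a quick sanity check on the converted file is to look at its first four bytes: GGUF files begin with the ASCII magic `GGUF`. The helper below is a sketch, and the function name `is_gguf` is hypothetical.

```bash
# Hypothetical helper: succeed if the file exists and starts with
# the four-byte GGUF magic.
is_gguf() {
  [ -f "$1" ] && [ "$(head -c 4 "$1")" = "GGUF" ]
}

# Example:
#   is_gguf /path/to/ui-tars-1.5-7b-f16.gguf && echo "looks like GGUF"
```

This only confirms the header, not that the conversion is complete, so it complements the size check rather than replacing it.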
| ## Step 4: Quantize to Q4_K_M Format | |
| ### Quantize the F16 model to reduce size | |
| ```bash | |
| # Navigate to the build directory | |
| cd /Users/qoneqt/Desktop/shubham/ai/llama.cpp/build | |
| # Quantize F16 to Q4_K_M (this takes ~1-2 minutes) | |
| ./bin/llama-quantize \ | |
| /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf \ | |
| /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-q4_k_m.gguf \ | |
| q4_k_m | |
| # Check the quantized file size | |
| ls -lh /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-q4_k_m.gguf | |
| ``` | |
| ## Step 5: Create Modelfiles for Ollama | |
| ### Create Modelfile for F16 version | |
| ```bash | |
| cd /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf | |
| cat > Modelfile << 'EOF' | |
| FROM ./ui-tars-1.5-7b-f16.gguf | |
| TEMPLATE """<|im_start|>system | |
| You are UI-TARS, an advanced AI assistant specialized in user interface automation and interaction. You can analyze screenshots, understand UI elements, and provide precise instructions for automating user interface tasks. When provided with a screenshot, analyze the visual elements and provide detailed, actionable guidance. | |
| Key capabilities: | |
| - Screenshot analysis and UI element detection | |
| - Step-by-step automation instructions | |
| - Precise coordinate identification for clicks and interactions | |
| - Understanding of various UI frameworks and applications<|im_end|> | |
| <|im_start|>user | |
| {{ .Prompt }}<|im_end|> | |
| <|im_start|>assistant | |
| """ | |
| PARAMETER stop "<|im_end|>" | |
| PARAMETER stop "<|im_start|>" | |
| PARAMETER temperature 0.7 | |
| PARAMETER top_p 0.9 | |
| EOF | |
| ``` | |
| ### Create Modelfile for quantized version | |
| ```bash | |
| cat > Modelfile-q4 << 'EOF' | |
| FROM ./ui-tars-1.5-7b-q4_k_m.gguf | |
| TEMPLATE """<|im_start|>system | |
| You are UI-TARS, an advanced AI assistant specialized in user interface automation and interaction. You can analyze screenshots, understand UI elements, and provide precise instructions for automating user interface tasks. When provided with a screenshot, analyze the visual elements and provide detailed, actionable guidance. | |
| Key capabilities: | |
| - Screenshot analysis and UI element detection | |
| - Step-by-step automation instructions | |
| - Precise coordinate identification for clicks and interactions | |
| - Understanding of various UI frameworks and applications<|im_end|> | |
| <|im_start|>user | |
| {{ .Prompt }}<|im_end|> | |
| <|im_start|>assistant | |
| """ | |
| PARAMETER stop "<|im_end|>" | |
| PARAMETER stop "<|im_start|>" | |
| PARAMETER temperature 0.7 | |
| PARAMETER top_p 0.9 | |
| EOF | |
| ``` | |
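The two Modelfiles above differ only in their `FROM` line. As a sketch, the shared part could be kept in one file and combined with each GGUF path. The file name `Modelfile.common` and the function `make_modelfile` are hypothetical names introduced for illustration.

```bash
# Hypothetical helper: print a Modelfile consisting of a FROM line
# followed by the contents of a shared settings file.
make_modelfile() {
  gguf_path="$1"
  common_file="$2"
  printf 'FROM %s\n' "$gguf_path"
  cat "$common_file"
}

# Example (Modelfile.common would hold the TEMPLATE and PARAMETER lines):
#   make_modelfile ./ui-tars-1.5-7b-f16.gguf    Modelfile.common > Modelfile
#   make_modelfile ./ui-tars-1.5-7b-q4_k_m.gguf Modelfile.common > Modelfile-q4
```

Keeping the template in one place means a later change to the prompt or the parameters cannot drift between the F16 and quantized variants.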
| ## Step 6: Create Models in Ollama | |
| ### Create the F16 model (high quality, larger size) | |
| ```bash | |
| cd /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf | |
| ollama create ui-tars:latest -f Modelfile | |
| ``` | |
| ### Create the quantized model (recommended for daily use) | |
| ```bash | |
| ollama create ui-tars:q4 -f Modelfile-q4 | |
| ``` | |
| ## Step 7: Verify Installation | |
| ### List all available models | |
| ```bash | |
| ollama list | |
| ``` | |
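To check for a specific model without reading the listing by eye, the output can be filtered on its first column. This is a sketch that assumes the listing starts with a header row followed by one model per line; the function name `has_model` is hypothetical.

```bash
# Hypothetical helper: read `ollama list` output on stdin and succeed
# if the given model name appears in the first column (header skipped).
has_model() {
  awk -v name="$1" 'NR > 1 && $1 == name { found = 1 } END { exit !found }'
}

# Example:
#   ollama list | has_model ui-tars:q4 && echo "ui-tars:q4 is installed"
```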
| ### Test the quantized model | |
| ```bash | |
| ollama run ui-tars:q4 "Hello! Can you help me with UI automation tasks?" | |
| ``` | |
| ### Test with an image (if you have one) | |
| ```bash | |
| # Pass the image by including its file path in the prompt | |
| ollama run ui-tars:q4 "Analyze this screenshot and tell me what UI elements you can see: /path/to/your/screenshot.png" | |
| ``` | |
| ## File Sizes and Results | |
| After completion, you should have: | |
| - **Original model**: `/Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b/` (~15GB, 19 files) | |
| - **F16 GGUF**: `/Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf` (~14.5GB) | |
| - **Quantized GGUF**: `/Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-q4_k_m.gguf` (~4.4GB) | |
| - **Ollama models**: | |
|   - `ui-tars:latest` (~15GB in Ollama) | |
|   - `ui-tars:q4` (~4.7GB in Ollama) ⭐ **Recommended for daily use** | |
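The size reduction implied by the figures above can be checked with a short calculation. This is a sketch using the approximate sizes listed in this section; the function name `size_reduction_pct` is hypothetical.

```bash
# Hypothetical helper: percentage reduction from one size to another,
# rounded to the nearest whole percent.
size_reduction_pct() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.0f\n", (1 - after / before) * 100 }'
}

size_reduction_pct 15 4.7     # Ollama models: ui-tars:latest vs ui-tars:q4
size_reduction_pct 14.5 4.4   # GGUF files: F16 vs Q4_K_M
```

The first call prints `69`, matching the Ollama model sizes, and the second prints `70` for the raw GGUF files.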
| ## Usage Tips | |
| 1. **Use the quantized model (`ui-tars:q4`)** for regular use: it is about 69% smaller (4.7GB vs 15GB), usually with only a small loss in quality | |
| 2. **The model supports vision input**: you can send screenshots for UI analysis | |
| 3. **Supported image formats**: PNG, JPEG, and WebP | |
| 4. **For UI automation**: Provide clear screenshots and specific questions about what you want to automate | |
| ## Cleanup (Optional) | |
| If you want to save disk space after setup: | |
| ```bash | |
| # Remove the original downloaded files (optional) | |
| rm -rf /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b | |
| # Remove the F16 GGUF if you only need the quantized version (optional) | |
| rm /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf | |
| # Remove llama.cpp if no longer needed (optional) | |
| rm -rf /Users/qoneqt/Desktop/shubham/ai/llama.cpp | |
| ``` | |
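Because these deletions cannot be undone, a guard can be useful. The sketch below removes the F16 GGUF only when the quantized file exists and is non-empty. The function name `remove_f16_if_quantized` is hypothetical, and the file names are assumed to match the ones used in Steps 3 and 4.

```bash
# Hypothetical helper: delete the F16 GGUF only if the quantized file
# exists and is non-empty, so a failed quantization does not leave
# the directory without any usable model file.
remove_f16_if_quantized() {
  gguf_dir="$1"
  f16="$gguf_dir/ui-tars-1.5-7b-f16.gguf"
  q4="$gguf_dir/ui-tars-1.5-7b-q4_k_m.gguf"
  if [ -s "$q4" ]; then
    rm -f "$f16"
  else
    echo "quantized file missing or empty; keeping $f16" >&2
    return 1
  fi
}

# Example:
#   remove_f16_if_quantized /path/to/ui-tars-1.5-7b-gguf
```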
| --- | |
| **Total Setup Time**: ~20-30 minutes (depending on download and conversion speeds) | |
| **Final Model Size**: 4.7GB (quantized) vs 15GB (original) - 69% size reduction! | |