---
base_model: unsloth/Llama-3.2-3B-Instruct
library_name: peft
tags:
- llama-3.2
- unsloth
- lora
- tool
- json
language:
- en
license: llama3 # Llama 3 Community License
---

# Model Card for LLaMA-3.2-3B Tool Caller

This model (a LoRA adapter) is a fine-tuned version of LLaMA-3.2-3B that specializes in tool calling. It has been trained to decide, based on the user query, when to use one of two available tools, `search_documents` or `check_and_connect`, and to respond with a properly formatted JSON function call.

## Model Details

### Model Description

This model is a Parameter-Efficient Fine-Tuning (PEFT) adaptation of LLaMA-3.2-3B focused on tool use. It employs Low-Rank Adaptation (LoRA) to efficiently fine-tune the base model for function calling.

- **Developed by:** [Uness.fr](https://uness.fr)
- **Model type:** Fine-tuned LLM (LoRA adapter)
- **Language(s) (NLP):** English
- **License:** Llama 3 Community License (same as the base model)
- **Finetuned from model:** unsloth/Llama-3.2-3B-Instruct (4-bit quantized version)

### Model Sources

- **Repository:** [https://huggingface.co/asanchez75/Llama-3.2-3B-tool-search](https://huggingface.co/asanchez75/Llama-3.2-3B-tool-search)
- **Base model:** [https://huggingface.co/unsloth/Llama-3.2-3B-Instruct](https://huggingface.co/unsloth/Llama-3.2-3B-Instruct)
- **Training dataset:** [https://huggingface.co/datasets/asanchez75/tool_finetuning_dataset](https://huggingface.co/datasets/asanchez75/tool_finetuning_dataset)

## Uses

### Direct Use

This model is designed to be used as an AI assistant component that can intelligently determine when to call external tools. It specializes in two specific functions:

1. `search_documents`: triggered when users ask for medical information (queries prefixed with "Search information about")
2. `check_and_connect`: triggered when users ask about system status or connectivity

The model outputs properly formatted JSON function calls that downstream applications can parse to execute the appropriate tool.
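As an illustration of that output contract, the sketch below parses responses in the documented `{"name": ..., "parameters": ...}` format and routes them to placeholder handlers. The `dispatch` helper and the example output strings are hypothetical and assume only the format described above; they are not part of this repository.

```python
import json

# Hypothetical raw model outputs in the documented format:
# {"name": function name, "parameters": dictionary of argument name and its value}
raw_outputs = [
    '{"name": "search_documents", "parameters": {"query": "causes of climate change"}}',
    '{"name": "check_and_connect", "parameters": {}}',
]

def dispatch(raw: str) -> str:
    """Parse one model response and route it to a (stubbed) tool handler."""
    call = json.loads(raw)  # raises json.JSONDecodeError if the output is not valid JSON
    name = call["name"]
    params = call.get("parameters", {})
    if name == "search_documents":
        return f"search_documents(query={params['query']!r})"
    if name == "check_and_connect":
        return "check_and_connect()"
    raise ValueError(f"Unrecognized tool: {name}")

for raw in raw_outputs:
    print(dispatch(raw))
```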
### Downstream Use

This model can be integrated into:

- AI assistants that need to understand when to delegate tasks to external tools

### Out-of-Scope Use

This model should not be used for:

- General text generation without tool calling
- Tasks requiring more than the two trained tools
- Critical systems where reliability is essential without human oversight
- Applications requiring factual accuracy guarantees

## Bias, Risks, and Limitations

- The model inherits biases from the base LLaMA-3.2-3B model
- Performance depends on how similar user queries are to the training data format
- There is a strong dependency on the specific prefixing pattern used in training ("Search information about")

### Recommendations

Users (both direct and downstream) should:

- Follow the same prompting patterns used in training for optimal results
- Include the "Search information about" prefix for queries intended for the `search_documents` tool
- Be aware that the model expects a specific system prompt format
- Test thoroughly before deployment in production environments
- Consider implementing fallback mechanisms for unrecognized query types

## How to Get Started with the Model

Use the code below to get started with the model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the adapter and tokenizer (with `peft` installed, transformers loads the
# base model and applies the LoRA adapter from this repository automatically)
model_path = "asanchez75/Llama-3.2-3B-tool-search"
model = AutoModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Define the prompting format (must match training)
SYSTEM_PROMPT = """Environment: ipython
Cutting Knowledge Date: December 2023
Today Date: 18 May 2025"""

USER_INSTRUCTION_HEADER = """Given the following functions, please respond with a JSON for a function call with its proper arguments that best answers the given prompt. Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}. Do not use variables.

{
    "type": "function",
    "function": {
        "name": "check_and_connect",
        "description": "check_and_connect",
        "parameters": {
            "properties": {},
            "type": "object"
        }
    }
}
{
    "type": "function",
    "function": {
        "name": "search_documents",
        "description": "\n    Searches for documents based on a user's query string. Use this to find information on a specific topic.\n\n    ",
        "parameters": {
            "properties": {
                "query": {
                    "description": "The actual search phrase or question. For example, 'What are the causes of climate change?' or 'population of Madre de Dios'.",
                    "type": "string"
                }
            },
            "required": [
                "query"
            ],
            "type": "object"
        }
    }
}

"""

# Example 1: Information query (add the prefix)
user_query = "What is the capital of France?"
formatted_query = f"Search information about {user_query}"  # prefix routes the query to search_documents

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": f"{USER_INSTRUCTION_HEADER}{formatted_query}"},
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids=inputs, max_new_tokens=128)
response = tokenizer.decode(outputs[0, inputs.shape[-1]:], skip_special_tokens=True)
print(response)

# Example 2: System status query (no prefix needed)
status_query = "Are we connected?"

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": f"{USER_INSTRUCTION_HEADER}{status_query}"},
]

# Generate response...
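# (Completing the example as a sketch: the generation and decoding calls below
# simply mirror Example 1; only the messages differ.)
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids=inputs, max_new_tokens=128)
response = tokenizer.decode(outputs[0, inputs.shape[-1]:], skip_special_tokens=True)
print(response)  # expected to select the check_and_connect tool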
```

## Training Details

### Training Data

The model was trained on a custom dataset of 1,050 examples from [asanchez75/tool_finetuning_dataset](https://huggingface.co/datasets/asanchez75/tool_finetuning_dataset):

- 1,000 examples derived from the "maximedb/natural_questions" dataset, modified with the "Search information about" prefix
- 50 examples of system status queries for the "check_and_connect" tool

The dataset was created in JSONL format, with each entry containing a complete conversation structure (system, user, and assistant messages).

### Training Procedure

The model was fine-tuned using Unsloth's optimized implementation of LoRA over a 4-bit quantized version of LLaMA-3.2-3B-Instruct.

#### Training Hyperparameters

- **Training regime:** 4-bit quantization with LoRA
- **LoRA rank:** 16
- **LoRA alpha:** 16
- **LoRA dropout:** 0
- **Target modules:** "q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"
- **Learning rate:** 2e-4
- **Batch size:** 2 per device
- **Gradient accumulation steps:** 4
- **Warmup steps:** 5
- **Number of epochs:** 3
- **Optimizer:** adamw_8bit
- **Weight decay:** 0.01
- **LR scheduler:** linear
- **Max sequence length:** 2048
- **Packing:** False
- **Random seed:** 3407

#### Speeds, Sizes, Times

- **Training hardware:** [GPU type, e.g., NVIDIA A100, etc.]
- **Training time:** [Approximately X minutes based on training code output]
- **Model size:** Base model is 3B parameters; the LoRA adapter is significantly smaller

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

The model was evaluated on sample inference examples from both categories:

- Information queries with the "Search information about" prefix
- System status queries

#### Metrics

- **Accuracy:** Whether the model correctly selects the appropriate tool for the query type
- **Format correctness:** Whether the JSON output is properly formatted and parsable

### Results

Qualitative evaluation showed the model successfully distinguishes between:

- Queries that should trigger the `search_documents` tool (when prefixed appropriately)
- Queries that should trigger the `check_and_connect` tool

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [GPU model]
- **Hours used:** [Estimated from training time]
- **Cloud Provider:** [If applicable]
- **Compute Region:** [If applicable]
- **Carbon Emitted:** [Estimate if available]

## Technical Specifications

### Model Architecture and Objective

- Base architecture: LLaMA-3.2-3B
- Adaptation method: LoRA fine-tuning
- Objective: train the model to output properly formatted JSON function calls based on the input query type

### Compute Infrastructure

#### Hardware

- The model was trained using CUDA-compatible GPU(s)
- Memory usage metrics are reported in the training script

#### Software

- Unsloth: fast implementation of LLaMA models
- PyTorch: deep learning framework
- Transformers: Hugging Face's transformers library
- PEFT: Parameter-Efficient Fine-Tuning library
- TRL: Transformer Reinforcement Learning library

## Framework versions

- PEFT 0.15.2
- Transformers [version]
- PyTorch [version]
- Unsloth [version]

## Model Card Contact

[Your contact information]