---
base_model: unsloth/Llama-3.2-3B-Instruct
library_name: peft
tags:
- llama-3.2
- unsloth
- lora
- tool
- json
language:
- en
license: llama3.2 # Llama 3.2 Community License
---
# Model Card for LLaMA-3.2-3B Tool Caller
This model (LoRA adapter) is a fine-tuned version of LLaMA-3.2-3B that specializes in tool calling capabilities.
It has been trained to decide, based on the user query, when to call one of two available tools, `search_documents` or `check_and_connect`, and to respond with a properly formatted JSON function call.
## Model Details
### Model Description
This model is a Parameter-Efficient Fine-Tuning (PEFT) adaptation of LLaMA-3.2-3B focused on tool use. It employs Low-Rank Adaptation (LoRA) to efficiently fine-tune the base model for function calling capabilities.
- **Developed by:** [Uness.fr](https://uness.fr)
- **Model type:** Fine-tuned LLM (LoRA)
- **Language(s) (NLP):** English
- **License:** Llama 3.2 Community License (same as the base model)
- **Finetuned from model:** unsloth/Llama-3.2-3B-Instruct (4-bit quantized version)
### Model Sources
- **Repository:** [https://huggingface.co/asanchez75/Llama-3.2-3B-tool-search](https://huggingface.co/asanchez75/Llama-3.2-3B-tool-search)
- **Base model:** [https://huggingface.co/unsloth/Llama-3.2-3B-Instruct](https://huggingface.co/unsloth/Llama-3.2-3B-Instruct)
- **Training dataset:** [https://huggingface.co/datasets/asanchez75/tool_finetuning_dataset](https://huggingface.co/datasets/asanchez75/tool_finetuning_dataset)
## Uses
### Direct Use
This model is designed to be used as an AI assistant that can intelligently determine when to call external tools. It specializes in two specific functions:
1. `search_documents`: Triggered when users ask for medical information (prefixed with "Search information about")
2. `check_and_connect`: Triggered when users ask about system status or connectivity
The model outputs properly formatted JSON function calls that can be parsed by downstream applications to execute the appropriate tools.
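For illustration, the snippet below shows the kind of completions the model is expected to produce for the two query types and how a downstream application can parse them. The strings are illustrative examples, not captured model outputs.
```python
import json

# Illustrative completions for the two query types (not captured model outputs)
search_call = '{"name": "search_documents", "parameters": {"query": "What is the capital of France?"}}'
status_call = '{"name": "check_and_connect", "parameters": {}}'

# A downstream application parses the completion and dispatches on the tool name
call = json.loads(search_call)
assert call["name"] in {"search_documents", "check_and_connect"}
print(call["name"], call["parameters"])
```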
### Downstream Use
This model can be integrated into:
- AI assistants that need to understand when to delegate tasks to external tools
### Out-of-Scope Use
This model should not be used for:
- General text generation without tool calling
- Tasks requiring more than the two trained tools
- Critical systems where reliability is essential without human oversight
- Applications requiring factual accuracy guarantees
## Bias, Risks, and Limitations
- The model inherits biases from the base LLaMA-3.2-3B model
- Performance depends on how similar user queries are to the training data format
- There's a strong dependency on the specific prefixing pattern used in training ("Search information about")
### Recommendations
Users (both direct and downstream) should:
- Follow the same prompting patterns used in training for optimal results
- Include the "Search information about" prefix for queries intended for the search_documents tool
- Be aware that the model expects a specific system prompt format
- Test thoroughly before deployment in production environments
- Consider implementing fallback mechanisms for unrecognized query types (a minimal sketch follows this list)
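A minimal sketch of such a fallback is shown below; `run_search` and `run_status_check` are hypothetical placeholders for the actual tool implementations.
```python
import json

def run_search(query: str) -> str:
    # Hypothetical stand-in for the real document-search backend
    return f"search results for: {query}"

def run_status_check() -> str:
    # Hypothetical stand-in for the real connectivity check
    return "connected"

def dispatch_tool_call(completion: str):
    """Parse a model completion and route it to a tool, with a safe fallback."""
    try:
        call = json.loads(completion.strip())
    except json.JSONDecodeError:
        return None  # Fallback: completion was not valid JSON
    name = call.get("name")
    params = call.get("parameters", {})
    if name == "search_documents" and "query" in params:
        return run_search(params["query"])
    if name == "check_and_connect":
        return run_status_check()
    return None  # Fallback: unrecognized tool or missing arguments
```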
## How to Get Started with the Model
Use the code below to get started with the model:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
# Load the LoRA adapter (this repository); with `peft` installed, transformers
# resolves and loads the base model referenced in the adapter config automatically.
model_path = "asanchez75/Llama-3.2-3B-tool-search"
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_path)
# Define the prompting format (must match training)
SYSTEM_PROMPT = """Environment: ipython
Cutting Knowledge Date: December 2023
Today Date: 18 May 2025"""
USER_INSTRUCTION_HEADER = """Given the following functions, please respond with a JSON for a function call with its proper arguments that best answers the given prompt.
Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}. Do not use variables.
{ "type": "function", "function": { "name": "check_and_connect", "description": "check_and_connect", "parameters": { "properties": {}, "type": "object" } } }
{ "type": "function", "function": { "name": "search_documents", "description": "\n Searches for documents based on a user's query string. Use this to find information on a specific topic.\n\n ", "parameters": { "properties": { "query": { "description": "The actual search phrase or question. For example, 'What are the causes of climate change?' or 'population of Madre de Dios'.", "type": "string" } }, "required": [ "query" ], "type": "object" } } }
"""
# Example 1: Information query (add the prefix)
user_query = "What is the capital of France?"
formatted_query = f"Search information about {user_query}"  # Add the prefix that routes the query to search_documents
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": f"{USER_INSTRUCTION_HEADER}{formatted_query}"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids=inputs, max_new_tokens=128)
response = tokenizer.decode(outputs[0, inputs.shape[-1]:], skip_special_tokens=True)
print(response)
# Example 2: System status query (no prefix needed)
status_query = "Are we connected?"
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": f"{USER_INSTRUCTION_HEADER}{status_query}"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids=inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0, inputs.shape[-1]:], skip_special_tokens=True))
```
## Training Details
### Training Data
The model was trained on a custom dataset with 1,050 examples from [asanchez75/tool_finetuning_dataset](https://huggingface.co/datasets/asanchez75/tool_finetuning_dataset):
- 1,000 examples derived from the "maximedb/natural_questions" dataset, modified with "Search information about" prefix
- 50 examples of system status queries for the "check_and_connect" tool
The dataset was created in JSONL format with each entry having a complete conversation structure including system, user, and assistant messages.
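As an illustration of this structure, the snippet below builds one hypothetical JSONL entry; the field names follow the conversational structure described above, and the exact schema should be confirmed against the dataset card.
```python
import json

# Hypothetical example entry; consult the dataset card for the exact field names.
entry = {
    "messages": [
        {"role": "system", "content": "Environment: ipython\nCutting Knowledge Date: December 2023\nToday Date: 18 May 2025"},
        {"role": "user", "content": "Search information about who wrote the novel Dune"},
        {"role": "assistant", "content": '{"name": "search_documents", "parameters": {"query": "who wrote the novel Dune"}}'},
    ]
}
print(json.dumps(entry))  # one such line per example in the JSONL file
```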
### Training Procedure
The model was fine-tuned using Unsloth's optimized implementation of LoRA on top of a 4-bit quantized version of LLaMA-3.2-3B-Instruct; a condensed sketch of this setup is shown after the hyperparameters below.
#### Training Hyperparameters
- **Training regime:** 4-bit quantization with LoRA
- **LoRA rank:** 16
- **LoRA alpha:** 16
- **LoRA dropout:** 0
- **Target modules:** "q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"
- **Learning rate:** 2e-4
- **Batch size:** 2 per device
- **Gradient accumulation steps:** 4
- **Warmup steps:** 5
- **Number of epochs:** 3
- **Optimizer:** adamw_8bit
- **Weight decay:** 0.01
- **LR scheduler:** linear
- **Max sequence length:** 2048
- **Packing:** False
- **Random seed:** 3407
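The following is a condensed sketch of the Unsloth/TRL setup implied by these hyperparameters, not the exact training script; in particular, the chat-template formatting step and the `text` column name are assumptions.
```python
from unsloth import FastLanguageModel  # import unsloth before transformers/trl
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

# Load the 4-bit quantized base model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach the LoRA adapter with the hyperparameters listed above
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    bias="none",
    random_state=3407,
)

# Assumed: the dataset has already been rendered into a single "text" column
# with the chat template; that formatting step is omitted here for brevity.
dataset = load_dataset("asanchez75/tool_finetuning_dataset", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",  # assumed column name
    max_seq_length=2048,
    packing=False,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        num_train_epochs=3,
        learning_rate=2e-4,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)
trainer.train()
```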
#### Speeds, Sizes, Times
- **Training hardware:** [GPU type, e.g., NVIDIA A100, etc.]
- **Training time:** [Approximately X minutes based on training code output]
- **Model size:** Base model is 3B parameters; LoRA adapter is significantly smaller
## Evaluation
### Testing Data, Factors & Metrics
#### Testing Data
The model was evaluated on sample inference examples from both categories:
- Information queries with "Search information about" prefix
- System status queries
#### Metrics
- **Accuracy:** Measured by whether the model correctly selects the appropriate tool for the query type
- **Format correctness:** Whether the JSON output is properly formatted and parsable
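Both checks can be computed with a small helper like the sketch below; the completion/expected-tool pairs are supplied by the caller.
```python
import json

def score(predictions, expected_tools):
    """predictions: model completions; expected_tools: the tool each query should trigger."""
    parsable = correct = 0
    for completion, expected in zip(predictions, expected_tools):
        try:
            call = json.loads(completion.strip())
        except json.JSONDecodeError:
            continue  # counts against format correctness and accuracy
        parsable += 1
        if call.get("name") == expected:
            correct += 1
    n = len(predictions)
    return {"format_correctness": parsable / n, "accuracy": correct / n}
```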
### Results
Qualitative evaluation showed the model successfully distinguishes between:
- Queries that should trigger the `search_documents` tool (when prefixed appropriately)
- Queries that should trigger the `check_and_connect` tool
## Environmental Impact
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
- **Hardware Type:** [GPU model]
- **Hours used:** [Estimated from training time]
- **Cloud Provider:** [If applicable]
- **Compute Region:** [If applicable]
- **Carbon Emitted:** [Estimate if available]
## Technical Specifications
### Model Architecture and Objective
- Base architecture: LLaMA-3.2-3B
- Adaptation method: LoRA fine-tuning
- Objective: Train the model to output properly formatted JSON function calls based on input query type
### Compute Infrastructure
#### Hardware
- The model was trained using CUDA-compatible GPU(s)
- Memory usage metrics are reported in the training script
#### Software
- Unsloth: Fast implementation of LLaMA models
- PyTorch: Deep learning framework
- Transformers: Hugging Face's transformers library
- PEFT: Parameter-Efficient Fine-Tuning library
- TRL: Transformer Reinforcement Learning library
## Framework versions
- PEFT 0.15.2
- Transformers [version]
- PyTorch [version]
- Unsloth [version]
## Model Card Contact
[Your contact information]