---
base_model: unsloth/Llama-3.2-3B-Instruct
library_name: peft
tags:
- llama-3.2
- unsloth
- lora
- tool
- json
language:
- en
license: llama3.2 # Llama 3.2 Community License
---
# Model Card for LLaMA-3.2-3B Tool Caller
This model (a LoRA adapter) is a fine-tuned version of LLaMA-3.2-3B that specializes in tool calling.
It has been trained to decide, based on the user's query, when to call one of two available tools, `search_documents` or `check_and_connect`, and to respond with a properly formatted JSON function call.
## Model Details
### Model Description
This model is a Parameter-Efficient Fine-Tuning (PEFT) adaptation of LLaMA-3.2-3B focused on tool use. It employs Low-Rank Adaptation (LoRA) to efficiently fine-tune the base model for function calling capabilities.
- **Developed by:** [Uness.fr](https://uness.fr)
- **Model type:** Fine-tuned LLM (LoRA)
- **Language(s) (NLP):** English
- **License:** Llama 3.2 Community License (same as the base model)
- **Finetuned from model:** unsloth/Llama-3.2-3B-Instruct (4-bit quantized version)
### Model Sources
- **Repository:** [https://huggingface.co/asanchez75/Llama-3.2-3B-tool-search](https://huggingface.co/asanchez75/Llama-3.2-3B-tool-search)
- **Base model:** [https://huggingface.co/unsloth/Llama-3.2-3B-Instruct](https://huggingface.co/unsloth/Llama-3.2-3B-Instruct)
- **Training dataset:** [https://huggingface.co/datasets/asanchez75/tool_finetuning_dataset](https://huggingface.co/datasets/asanchez75/tool_finetuning_dataset)
## Uses
### Direct Use
This model is designed to be used as an AI assistant that can intelligently determine when to call external tools. It specializes in two specific functions:
1. `search_documents`: Triggered when users ask for medical information (prefixed with "Search information about")
2. `check_and_connect`: Triggered when users ask about system status or connectivity
The model outputs properly formatted JSON function calls that can be parsed by downstream applications to execute the appropriate tools.
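For illustration, the expected outputs for the two tools look like the following (the query value is only an example):
```json
{"name": "search_documents", "parameters": {"query": "What are the causes of climate change?"}}
{"name": "check_and_connect", "parameters": {}}
```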
### Downstream Use
This model can be integrated into:
- AI assistants that need to understand when to delegate tasks to external tools
### Out-of-Scope Use
This model should not be used for:
- General text generation without tool calling
- Tasks requiring more than the two trained tools
- Critical systems that require reliable behavior without human oversight
- Applications requiring factual accuracy guarantees
## Bias, Risks, and Limitations
- The model inherits biases from the base LLaMA-3.2-3B model
- Performance depends on how similar user queries are to the training data format
- There's a strong dependency on the specific prefixing pattern used in training ("Search information about")
### Recommendations
Users (both direct and downstream) should:
- Follow the same prompting patterns used in training for optimal results
- Include the "Search information about" prefix for queries intended for the search_documents tool
- Be aware that the model expects a specific system prompt format
- Test thoroughly before deployment in production environments
- Consider implementing fallback mechanisms for unrecognized query types (a minimal sketch follows this list)
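A minimal sketch of such a fallback, assuming the decoded model output is available as a string (the function below is illustrative and not part of this repository):
```python
import json

ALLOWED_TOOLS = {"search_documents", "check_and_connect"}

def dispatch_tool_call(response: str) -> dict:
    """Validate the model's JSON output and route it to a tool, falling back otherwise."""
    try:
        call = json.loads(response.strip())
    except json.JSONDecodeError:
        return {"status": "fallback", "reason": "output is not valid JSON", "raw": response}
    name = call.get("name")
    params = call.get("parameters", {})
    if name not in ALLOWED_TOOLS or not isinstance(params, dict):
        return {"status": "fallback", "reason": "unrecognized or malformed tool call", "raw": response}
    # The host application is responsible for actually executing the tool;
    # here we only return the validated call.
    return {"status": "ok", "tool": name, "arguments": params}
```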
## How to Get Started with the Model
Use the code below to get started with the model:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
# Load the model and tokenizer.
# Loading this adapter repository directly requires `peft` to be installed;
# alternatively, load the base model and attach the adapter with PeftModel (see the sketch after this example).
model_path = "asanchez75/Llama-3.2-3B-tool-search"
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,  # use torch.float32 if bfloat16 is unsupported on your hardware
    device_map="auto",           # requires `accelerate`
)
tokenizer = AutoTokenizer.from_pretrained(model_path)
# Define the prompting format (must match training)
SYSTEM_PROMPT = """Environment: ipython
Cutting Knowledge Date: December 2023
Today Date: 18 May 2025"""
USER_INSTRUCTION_HEADER = """Given the following functions, please respond with a JSON for a function call with its proper arguments that best answers the given prompt.
Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}. Do not use variables.
{ "type": "function", "function": { "name": "check_and_connect", "description": "check_and_connect", "parameters": { "properties": {}, "type": "object" } } }
{ "type": "function", "function": { "name": "search_documents", "description": "\n Searches for documents based on a user's query string. Use this to find information on a specific topic.\n\n ", "parameters": { "properties": { "query": { "description": "The actual search phrase or question. For example, 'What are the causes of climate change?' or 'population of Madre de Dios'.", "type": "string" } }, "required": [ "query" ], "type": "object" } } }
"""
# Example 1: Information query (add the prefix)
user_query = "What is the capital of France?"
formatted_query = f"Search information about {user_query}"  # add the prefix so the model routes the query to search_documents
messages = [
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": f"{USER_INSTRUCTION_HEADER}{formatted_query}"},
]
inputs = tokenizer.apply_chat_template(
messages,
tokenize=True,
add_generation_prompt=True,
return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids=inputs, max_new_tokens=128)
response = tokenizer.decode(outputs[0, inputs.shape[-1]:], skip_special_tokens=True)
print(response)
# Example 2: System status query (no prefix needed)
status_query = "Are we connected?"
messages = [
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": f"{USER_INSTRUCTION_HEADER}{status_query}"},
]
# Generate the response by repeating the apply_chat_template / generate / decode steps from Example 1.
```
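Because this repository ships a LoRA adapter rather than full model weights, it can also be loaded explicitly on top of the base model with PEFT. A minimal sketch (assuming `peft` and `accelerate` are installed):
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and tokenizer.
base_model = AutoModelForCausalLM.from_pretrained(
    "unsloth/Llama-3.2-3B-Instruct",
    torch_dtype="auto",
    device_map="auto",  # requires `accelerate`
)
tokenizer = AutoTokenizer.from_pretrained("unsloth/Llama-3.2-3B-Instruct")

# Attach the LoRA adapter from this repository.
model = PeftModel.from_pretrained(base_model, "asanchez75/Llama-3.2-3B-tool-search")

# Optionally merge the adapter into the base weights for faster inference.
model = model.merge_and_unload()
```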
## Training Details
### Training Data
The model was trained on a custom dataset with 1,050 examples from [asanchez75/tool_finetuning_dataset](https://huggingface.co/datasets/asanchez75/tool_finetuning_dataset):
- 1,000 examples derived from the "maximedb/natural_questions" dataset, modified with the "Search information about" prefix
- 50 examples of system status queries for the "check_and_connect" tool
The dataset is stored in JSONL format; each entry contains a complete conversation with system, user, and assistant messages.
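The dataset can be inspected directly with the `datasets` library (the `train` split name is an assumption):
```python
from datasets import load_dataset

# Load the 1,050-example fine-tuning dataset from the Hub.
dataset = load_dataset("asanchez75/tool_finetuning_dataset", split="train")

# Each record holds a full conversation: the system prompt, the user query
# (with the "Search information about" prefix for search queries), and the
# assistant's JSON tool call.
print(dataset[0])
```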
### Training Procedure
The model was fine-tuned using Unsloth's optimized implementation of LoRA on top of a 4-bit quantized version of LLaMA-3.2-3B-Instruct; the hyperparameters are listed below, followed by a configuration sketch.
#### Training Hyperparameters
- **Training regime:** 4-bit quantization with LoRA
- **LoRA rank:** 16
- **LoRA alpha:** 16
- **LoRA dropout:** 0
- **Target modules:** "q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"
- **Learning rate:** 2e-4
- **Batch size:** 2 per device
- **Gradient accumulation steps:** 4
- **Warmup steps:** 5
- **Number of epochs:** 3
- **Optimizer:** adamw_8bit
- **Weight decay:** 0.01
- **LR scheduler:** linear
- **Max sequence length:** 2048
- **Packing:** False
- **Random seed:** 3407
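These hyperparameters correspond to a standard Unsloth + TRL setup. The sketch below reconstructs the configuration from the values listed above; it is not the exact training script, and the dataset preparation (rendering each conversation to a `text` column with the chat template) is assumed:
```python
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

max_seq_length = 2048

# Load the 4-bit quantized base model.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct",
    max_seq_length=max_seq_length,
    load_in_4bit=True,
)

# Attach LoRA adapters with the settings listed above.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    random_state=3407,
)

# Conversations are assumed to have been rendered to a plain-text "text" column
# with the chat template before training.
dataset = load_dataset("asanchez75/tool_finetuning_dataset", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    packing=False,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        num_train_epochs=3,
        learning_rate=2e-4,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)
trainer.train()
```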
#### Speeds, Sizes, Times
- **Training hardware:** [GPU type, e.g., NVIDIA A100, etc.]
- **Training time:** [Approximately X minutes based on training code output]
- **Model size:** The base model has 3B parameters; the LoRA adapter stores only the rank-16 update matrices for the targeted projection layers, a small fraction of that size
## Evaluation
### Testing Data, Factors & Metrics
#### Testing Data
The model was evaluated on sample inference examples from both categories:
- Information queries with "Search information about" prefix
- System status queries
#### Metrics
- **Accuracy:** Measured by whether the model correctly selects the appropriate tool for the query type
- **Format correctness:** Whether the JSON output is properly formatted and parsable (a minimal scoring sketch follows)
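A minimal sketch of how these metrics can be computed over a list of labeled queries, assuming a `generate_response` callable that maps a user message to the model's raw text output (not shipped with this repository):
```python
import json

def score(examples, generate_response):
    """Score tool-selection accuracy and JSON format correctness.

    `examples` is a list of (user_message, expected_tool_name) pairs.
    """
    correct_tool = 0
    parsable = 0
    for user_message, expected_tool in examples:
        output = generate_response(user_message)
        try:
            call = json.loads(output.strip())
        except json.JSONDecodeError:
            continue  # unparsable output counts against both metrics
        parsable += 1
        if call.get("name") == expected_tool:
            correct_tool += 1
    n = max(len(examples), 1)
    return {"accuracy": correct_tool / n, "format_correctness": parsable / n}
```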
### Results
Qualitative evaluation showed the model successfully distinguishes between:
- Queries that should trigger the `search_documents` tool (when prefixed appropriately)
- Queries that should trigger the `check_and_connect` tool
## Environmental Impact
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
- **Hardware Type:** [GPU model]
- **Hours used:** [Estimated from training time]
- **Cloud Provider:** [If applicable]
- **Compute Region:** [If applicable]
- **Carbon Emitted:** [Estimate if available]
## Technical Specifications
### Model Architecture and Objective
- Base architecture: LLaMA-3.2-3B
- Adaptation method: LoRA fine-tuning
- Objective: Train the model to output properly formatted JSON function calls based on input query type
### Compute Infrastructure
#### Hardware
- The model was trained using CUDA-compatible GPU(s)
- Memory usage metrics are reported in the training script
#### Software
- Unsloth: Fast implementation of LLaMA models
- PyTorch: Deep learning framework
- Transformers: Hugging Face's transformers library
- PEFT: Parameter-Efficient Fine-Tuning library
- TRL: Transformer Reinforcement Learning library
## Framework versions
- PEFT 0.15.2
- Transformers [version]
- PyTorch [version]
- Unsloth [version]
## Model Card Contact
[Your contact information]