---
base_model: unsloth/Llama-3.2-3B-Instruct
library_name: peft
tags:
- llama-3.2
- unsloth
- lora
- tool
- json
language:
- en
license: llama3.2 # Llama 3.2 Community License
---
# Model Card for LLaMA-3.2-3B Tool Caller
This model (a LoRA adapter) is a fine-tuned version of LLaMA-3.2-3B that specializes in tool calling.
It has been trained to decide, based on the user's query, when to call one of two available tools, `search_documents` or `check_and_connect`, and to respond with a properly formatted JSON function call.
## Model Details
### Model Description
This model is a Parameter-Efficient Fine-Tuning (PEFT) adaptation of LLaMA-3.2-3B focused on tool use. It employs Low-Rank Adaptation (LoRA) to efficiently fine-tune the base model for function calling capabilities.
- **Developed by:** [Uness.fr](https://uness.fr)
- **Model type:** Fine-tuned LLM (LoRA)
- **Language(s) (NLP):** English
- **License:** Llama 3.2 Community License (same as the base model)
- **Finetuned from model:** unsloth/Llama-3.2-3B-Instruct (4-bit quantized version)
### Model Sources
- **Repository:** [https://huggingface.co/asanchez75/Llama-3.2-3B-tool-search](https://huggingface.co/asanchez75/Llama-3.2-3B-tool-search)
- **Base model:** [https://huggingface.co/unsloth/Llama-3.2-3B-Instruct](https://huggingface.co/unsloth/Llama-3.2-3B-Instruct)
- **Training dataset:** [https://huggingface.co/datasets/asanchez75/tool_finetuning_dataset](https://huggingface.co/datasets/asanchez75/tool_finetuning_dataset)
## Uses
### Direct Use
This model is designed to be used as an AI assistant that can intelligently determine when to call external tools. It specializes in two specific functions:
1. `search_documents`: Triggered when users ask for medical information (prefixed with "Search information about")
2. `check_and_connect`: Triggered when users ask about system status or connectivity
The model outputs properly formatted JSON function calls that can be parsed by downstream applications to execute the appropriate tools.
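For illustration, the expected outputs for the two tools look like the following (the query value is only an example):
```json
{"name": "search_documents", "parameters": {"query": "What are the causes of climate change?"}}
{"name": "check_and_connect", "parameters": {}}
```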
### Downstream Use
This model can be integrated into:
- AI assistants that need to understand when to delegate tasks to external tools
### Out-of-Scope Use
This model should not be used for:
- General text generation without tool calling
- Tasks requiring more than the two trained tools
- Critical systems that require reliable behavior without human oversight
- Applications requiring factual accuracy guarantees
## Bias, Risks, and Limitations
- The model inherits biases from the base LLaMA-3.2-3B model
- Performance depends on how similar user queries are to the training data format
- There's a strong dependency on the specific prefixing pattern used in training ("Search information about")
### Recommendations
Users (both direct and downstream) should:
- Follow the same prompting patterns used in training for optimal results
- Include the "Search information about" prefix for queries intended for the search_documents tool
- Be aware that the model expects a specific system prompt format
- Test thoroughly before deployment in production environments
- Consider implementing fallback mechanisms for unrecognized query types (a minimal sketch follows this list)
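A minimal sketch of such a fallback, assuming the decoded model output is available as a string (the function below is illustrative and not part of this repository):
```python
import json

ALLOWED_TOOLS = {"search_documents", "check_and_connect"}

def dispatch_tool_call(response: str) -> dict:
    """Validate the model's JSON output and route it to a tool, falling back otherwise."""
    try:
        call = json.loads(response.strip())
    except json.JSONDecodeError:
        return {"status": "fallback", "reason": "output is not valid JSON", "raw": response}
    name = call.get("name")
    params = call.get("parameters", {})
    if name not in ALLOWED_TOOLS or not isinstance(params, dict):
        return {"status": "fallback", "reason": "unrecognized or malformed tool call", "raw": response}
    # The host application is responsible for actually executing the tool;
    # here we only return the validated call.
    return {"status": "ok", "tool": name, "arguments": params}
```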
## How to Get Started with the Model
Use the code below to get started with the model:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
# Load the model and tokenizer.
# Loading this adapter repository directly requires `peft` to be installed;
# alternatively, load the base model and attach the adapter with PeftModel (see the sketch after this example).
model_path = "asanchez75/Llama-3.2-3B-tool-search"
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,  # use torch.float32 if bfloat16 is unsupported on your hardware
    device_map="auto",           # requires `accelerate`
)
tokenizer = AutoTokenizer.from_pretrained(model_path)
# Define the prompting format (must match training)
SYSTEM_PROMPT = """Environment: ipython
Cutting Knowledge Date: December 2023
Today Date: 18 May 2025"""
USER_INSTRUCTION_HEADER = """Given the following functions, please respond with a JSON for a function call with its proper arguments that best answers the given prompt.
Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}. Do not use variables.
{ "type": "function", "function": { "name": "check_and_connect", "description": "check_and_connect", "parameters": { "properties": {}, "type": "object" } } }
{ "type": "function", "function": { "name": "search_documents", "description": "\n Searches for documents based on a user's query string. Use this to find information on a specific topic.\n\n ", "parameters": { "properties": { "query": { "description": "The actual search phrase or question. For example, 'What are the causes of climate change?' or 'population of Madre de Dios'.", "type": "string" } }, "required": [ "query" ], "type": "object" } } }
"""
# Example 1: Information query (add the prefix)
user_query = "What is the capital of France?"
formatted_query = f"Search information about {user_query}"  # add the prefix so the model routes the query to search_documents
messages = [
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": f"{USER_INSTRUCTION_HEADER}{formatted_query}"},
]
inputs = tokenizer.apply_chat_template(
messages,
tokenize=True,
add_generation_prompt=True,
return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids=inputs, max_new_tokens=128)
response = tokenizer.decode(outputs[0, inputs.shape[-1]:], skip_special_tokens=True)
print(response)
# Example 2: System status query (no prefix needed)
status_query = "Are we connected?"
messages = [
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": f"{USER_INSTRUCTION_HEADER}{status_query}"},
]
# Generate the response by repeating the apply_chat_template / generate / decode steps from Example 1.
```
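Because this repository ships a LoRA adapter rather than full model weights, it can also be loaded explicitly on top of the base model with PEFT. A minimal sketch (assuming `peft` and `accelerate` are installed):
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and tokenizer.
base_model = AutoModelForCausalLM.from_pretrained(
    "unsloth/Llama-3.2-3B-Instruct",
    torch_dtype="auto",
    device_map="auto",  # requires `accelerate`
)
tokenizer = AutoTokenizer.from_pretrained("unsloth/Llama-3.2-3B-Instruct")

# Attach the LoRA adapter from this repository.
model = PeftModel.from_pretrained(base_model, "asanchez75/Llama-3.2-3B-tool-search")

# Optionally merge the adapter into the base weights for faster inference.
model = model.merge_and_unload()
```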
## Training Details
### Training Data
The model was trained on a custom dataset with 1,050 examples from [asanchez75/tool_finetuning_dataset](https://huggingface.co/datasets/asanchez75/tool_finetuning_dataset):
- 1,000 examples derived from the "maximedb/natural_questions" dataset, modified with the "Search information about" prefix
- 50 examples of system status queries for the "check_and_connect" tool
The dataset is stored in JSONL format; each entry contains a complete conversation with system, user, and assistant messages.
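The dataset can be inspected directly with the `datasets` library (the `train` split name is an assumption):
```python
from datasets import load_dataset

# Load the 1,050-example fine-tuning dataset from the Hub.
dataset = load_dataset("asanchez75/tool_finetuning_dataset", split="train")

# Each record holds a full conversation: the system prompt, the user query
# (with the "Search information about" prefix for search queries), and the
# assistant's JSON tool call.
print(dataset[0])
```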
### Training Procedure
The model was fine-tuned using Unsloth's optimized implementation of LoRA on top of a 4-bit quantized version of LLaMA-3.2-3B-Instruct; the hyperparameters are listed below, followed by a configuration sketch.
#### Training Hyperparameters
- **Training regime:** 4-bit quantization with LoRA
- **LoRA rank:** 16
- **LoRA alpha:** 16
- **LoRA dropout:** 0
- **Target modules:** "q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"
- **Learning rate:** 2e-4
- **Batch size:** 2 per device
- **Gradient accumulation steps:** 4
- **Warmup steps:** 5
- **Number of epochs:** 3
- **Optimizer:** adamw_8bit
- **Weight decay:** 0.01
- **LR scheduler:** linear
- **Max sequence length:** 2048
- **Packing:** False
- **Random seed:** 3407
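These hyperparameters correspond to a standard Unsloth + TRL setup. The sketch below reconstructs the configuration from the values listed above; it is not the exact training script, and the dataset preparation (rendering each conversation to a `text` column with the chat template) is assumed:
```python
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

max_seq_length = 2048

# Load the 4-bit quantized base model.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct",
    max_seq_length=max_seq_length,
    load_in_4bit=True,
)

# Attach LoRA adapters with the settings listed above.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    random_state=3407,
)

# Conversations are assumed to have been rendered to a plain-text "text" column
# with the chat template before training.
dataset = load_dataset("asanchez75/tool_finetuning_dataset", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    packing=False,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        num_train_epochs=3,
        learning_rate=2e-4,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)
trainer.train()
```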
#### Speeds, Sizes, Times
- **Training hardware:** [GPU type, e.g., NVIDIA A100, etc.]
- **Training time:** [Approximately X minutes based on training code output]
- **Model size:** The base model has 3B parameters; the LoRA adapter stores only the rank-16 update matrices for the targeted projection layers, a small fraction of that size
## Evaluation
### Testing Data, Factors & Metrics
#### Testing Data
The model was evaluated on sample inference examples from both categories:
- Information queries with "Search information about" prefix
- System status queries
#### Metrics
- **Accuracy:** Measured by whether the model correctly selects the appropriate tool for the query type
- **Format correctness:** Whether the JSON output is properly formatted and parsable (a minimal scoring sketch follows)
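A minimal sketch of how these metrics can be computed over a list of labeled queries, assuming a `generate_response` callable that maps a user message to the model's raw text output (not shipped with this repository):
```python
import json

def score(examples, generate_response):
    """Score tool-selection accuracy and JSON format correctness.

    `examples` is a list of (user_message, expected_tool_name) pairs.
    """
    correct_tool = 0
    parsable = 0
    for user_message, expected_tool in examples:
        output = generate_response(user_message)
        try:
            call = json.loads(output.strip())
        except json.JSONDecodeError:
            continue  # unparsable output counts against both metrics
        parsable += 1
        if call.get("name") == expected_tool:
            correct_tool += 1
    n = max(len(examples), 1)
    return {"accuracy": correct_tool / n, "format_correctness": parsable / n}
```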
### Results
Qualitative evaluation showed the model successfully distinguishes between:
- Queries that should trigger the `search_documents` tool (when prefixed appropriately)
- Queries that should trigger the `check_and_connect` tool
## Environmental Impact
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
- **Hardware Type:** [GPU model]
- **Hours used:** [Estimated from training time]
- **Cloud Provider:** [If applicable]
- **Compute Region:** [If applicable]
- **Carbon Emitted:** [Estimate if available]
## Technical Specifications
### Model Architecture and Objective
- Base architecture: LLaMA-3.2-3B
- Adaptation method: LoRA fine-tuning
- Objective: Train the model to output properly formatted JSON function calls based on input query type
### Compute Infrastructure
#### Hardware
- The model was trained using CUDA-compatible GPU(s)
- Memory usage metrics are reported in the training script
#### Software
- Unsloth: Fast implementation of LLaMA models
- PyTorch: Deep learning framework
- Transformers: Hugging Face's transformers library
- PEFT: Parameter-Efficient Fine-Tuning library
- TRL: Transformer Reinforcement Learning library
## Framework versions
- PEFT 0.15.2
- Transformers [version]
- PyTorch [version]
- Unsloth [version]
## Model Card Contact
[Your contact information]