|
---

license: mit

pipeline_tag: text-generation

library_name: transformers

language:

- en

metrics:

- accuracy

base_model:

- Qwen/Qwen2.5-0.5B-Instruct

---
|
|
|
|
|
# Model Card for Tool-Star |
|
|
|
This is the official checkpoint trained with the Tool-Star framework, based on Qwen2.5-0.5B-Instruct.
|
|
|
Hugging Face paper page: https://huggingface.co/papers/2505.16410
|
|
|
For details, please refer to https://github.com/dongguanting/Tool-Star.
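
# Quickstart

Since the card declares the `transformers` library and the text-generation task, a minimal loading sketch is shown below. The model id `your-namespace/tool-star-checkpoint` is a placeholder (this card does not state the repository id); replace it with this repository's actual id before calling `generate_answer()`, which downloads the weights.

```python
# Minimal usage sketch for this checkpoint with the Transformers library.
# NOTE: MODEL_ID is a placeholder, not the real repository id; replace it
# with this repository's id before running generate_answer().
MODEL_ID = "your-namespace/tool-star-checkpoint"  # placeholder

# Chat-style prompt in the Qwen2.5-Instruct message format.
messages = [
    {"role": "user", "content": "What is 17 * 24? Think step by step."},
]

def generate_answer(model_id: str = MODEL_ID, max_new_tokens: int = 256) -> str:
    """Load the checkpoint and generate a reply (downloads weights on first call)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    # Render the messages with the model's chat template, then generate.
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

The import is kept inside the function so the sketch can be read and imported without the weights; tool invocation during reasoning follows the Tool-Star framework described in the repository linked above.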
|
|
|
# Paper
|
|
|
The model was presented in the paper [Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement |
|
Learning](https://huggingface.co/papers/2505.16410). |
|
|
|
# Paper abstract |
|
|
|
The abstract of the paper is as follows:
|
|
|
Recently, large language models (LLMs) have shown remarkable reasoning |
|
capabilities via large-scale reinforcement learning (RL). However, leveraging |
|
the RL algorithm to empower effective multi-tool collaborative reasoning in |
|
LLMs remains an open challenge. In this paper, we introduce Tool-Star, an |
|
RL-based framework designed to empower LLMs to autonomously invoke multiple |
|
external tools during stepwise reasoning. Tool-Star integrates six types of |
|
tools and incorporates systematic designs in both data synthesis and training. |
|
To address the scarcity of tool-use data, we propose a general tool-integrated |
|
reasoning data synthesis pipeline, which combines tool-integrated prompting |
|
with hint-based sampling to automatically and scalably generate tool-use |
|
trajectories. A subsequent quality normalization and difficulty-aware |
|
classification process filters out low-quality samples and organizes the |
|
dataset from easy to hard. Furthermore, we propose a two-stage training |
|
framework to enhance multi-tool collaborative reasoning by: (1) cold-start |
|
fine-tuning, which guides LLMs to explore reasoning patterns via |
|
tool-invocation feedback; and (2) a multi-tool self-critic RL algorithm with |
|
hierarchical reward design, which reinforces reward understanding and promotes |
|
effective tool collaboration. Experimental analyses on over 10 challenging |
|
reasoning benchmarks highlight the effectiveness and efficiency of Tool-Star. |
|
The code is available at https://github.com/dongguanting/Tool-Star. |