🔍 WebExplorer-8B

A state-of-the-art 8B parameter web agent model designed for complex information-seeking tasks and long-horizon reasoning.

Paper Abstract

The paradigm of Large Language Models (LLMs) has increasingly shifted toward agentic applications, where web browsing capabilities are fundamental for retrieving information from diverse online sources. However, existing open-source web agents either demonstrate limited information-seeking abilities on complex tasks or lack transparent implementations. In this work, we identify that the key challenge lies in the scarcity of challenging data for information seeking. To address this limitation, we introduce WebExplorer: a systematic data generation approach using model-based exploration and iterative, long-to-short query evolution. This method creates challenging query-answer pairs that require multi-step reasoning and complex web navigation. By leveraging our curated high-quality dataset, we successfully develop advanced web agent WebExplorer-8B through supervised fine-tuning followed by reinforcement learning. Our model supports 128K context length and up to 100 tool calling turns, enabling long-horizon problem solving. Across diverse information-seeking benchmarks, WebExplorer-8B achieves the state-of-the-art performance at its scale. Notably, as an 8B-sized model, WebExplorer-8B is able to effectively search over an average of 16 turns after RL training, achieving higher accuracy than WebSailor-72B on BrowseComp-en/zh and attaining the best performance among models up to 100B parameters on WebWalkerQA and FRAMES. Beyond these information-seeking tasks, our model also achieves strong generalization on the HLE benchmark even though it is only trained on knowledge-intensive QA data. These results highlight our approach as a practical path toward long-horizon web agents.

✨ Key Features

🌐 Long-horizon Reasoning: Supports up to 128K context length and 100 tool calling turns
🛠️ Tool Utilization: Masters search and browse functionalities
🏆 State-of-the-art Performance: Achieves best-in-class results among models under 10B parameters

🏗️ Model Architecture

Built on Qwen3-8B base model and trained through a two-phase approach:

Supervised Fine-tuning (SFT): Cold-start initialization with high-quality trajectories
Reinforcement Learning (RL): Enhanced using GRPO algorithm with progressive context expansion

📊 Performance

WebExplorer-8B achieves state-of-the-art performance across multiple information-seeking benchmarks at its scale:

Model	BC-en	BC-zh	GAIA	WebWalkerQA	FRAMES	Xbench-DS	HLE
OpenAI-o3†	50.9	58.1	70.5†	71.7	84.0	66.7	20.2
Claude-4-Sonnet†	12.2	29.1	68.3†	61.7	80.7	64.6	20.3
GLM-4.5	26.4	37.5	66.0†	65.6†	78.9†	70.0†	21.2†
DeepSeek-V3.1	30.0	49.2	63.1†	61.2†	83.7	71.2	29.8
Kimi-K2†	14.1	28.8	57.7	63.0	72.0	50.0	18.1
====	====	====	====	====	====	====	====
WebShaper-72B	-	-	60.0	52.2	-	-	-
WebShaper-32B (QwQ)	-	-	53.3	49.7	-	-	-
WebShaper-32B	-	-	52.4	51.4	-	-	-
WebSailor-72B	12.0	30.1	55.4	-	-	55.0	-
WebSailor-32B	10.5	25.5	53.2	-	-	53.3	-
WebSailor-7B	6.7	14.2	33.0	-	-	34.3	-
ASearcher-Web-QwQ	5.2	15.6	52.8	34.3	70.9	42.1	12.5
WebThinker-32B	2.8	-	48.5	46.5	-	-	15.8
MiroThinker-32B-DPO-v0.1	13.0	17.0	57.3	49.3	71.7	-	11.8
MiroThinker-8B-DPO-v0.1	8.7	13.6	46.6	45.7	64.4	-	-
WebExplorer-8B (SFT)	7.9	21.3	43.7	59.8	72.6	47.5	16.0
WebExplorer-8B (RL)	15.7	32.0	50.0	62.7	75.7	53.7	17.3

Accuracy (%) of web agents on information-seeking benchmarks. BC-en and BC-zh denote BrowseComp-en and BrowseComp-zh respectively. XBench-DS refers to XBench-DeepSearch. Bold indicates the best performance among open-source models < 100B, while underlined values represent the best performance among models < 10B parameters. All scores of WebExplorer-8B are computed as Avg@4 using LLM-as-Judge. Entries marked with a dagger (†) were reproduced by us under our scaffold: on model name = entire row; on a number = that entry only.

🛠️ Tool Schema

WebExplorer-8B supports two tools for web interaction:

1. Browse Tool

{
    "name": "browse",
    "type": "function",
    "description": "Extract specific information from a webpage",
    "parameters": {
        "type": "object",
        "properties": {
            "url": {
                "type": "string",
                "description": "Target URL to browse. The webpage content will be processed by the LLM for information extraction."
            },
            "query": {
                "type": "string",
                "description": "Specific query about the webpage content. The LLM will analyze the content to answer this query."
            }
        },
        "required": ["url", "query"]
    }
}

2. Search Tool

{
    "name": "search",
    "type": "function",
    "description": "Perform web search queries",
    "parameters": {
        "type": "object",
        "properties": {
            "queries": {
                "type": "array",
                "items": {
                    "type": "string"
                },
                "description": "List of search queries. Returns search results containing title, URL, and snippet for each query."
            }
        },
        "required": ["queries"]
    }
}

📝 Citation

If you find our work useful, please consider citing:

@misc{liu2025webexplorer,
      title={WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents}, 
      author={Junteng Liu and Yunji Li and Chi Zhang and Jingyang Li and Aili Chen and Ke Ji and Weiyu Cheng and Zijia Wu and Chengyu Du and Qidi Xu and Jiayuan Song and Zhengmao Zhu and Wenhu Chen and Pengyu Zhao and Junxian He},
      year={2025},
      eprint={2509.06501},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2509.06501}, 
}

hkust-nlp
/

WebExplorer-8B