---
license: llama3.1
language:
- zh
- en
base_model:
- meta-llama/Llama-3.1-70B-Instruct
extra_gated_fields:
  Full name: text
  Email: text
  Company: text
  Country: country
  Specific date: date_picker
  I want to use this model for:
    type: select
    options:
      - Research
      - Education
      - label: Other
        value: other
  I agree to use this model for non-commercial use ONLY: checkbox
extra_gated_prompt: >-
The information you provide will be collected, stored, processed, and shared in accordance with the Hon Hai privacy policy: https://www.honhai.com/zh-tw/privacy-and-policy
To request access to the Llama_3.1-FoxBrain-70B-V1.2 model weights, please contact us by email first.
**Only requests from users who have contacted us in advance will be considered.**
When sending your email, **make sure to use the same email address that you will enter in the application form.** This helps us verify your identity and approval status.
You may contact us at: `[email protected]`
## 📄 Llama_3.1-FoxBrain-70B-V1.2 Model Usage and License Agreement
At this time, access to the FoxBrain model is granted **exclusively to academic institutions and research organizations**. We do not authorize commercial usage under the current release.
For commercial or enterprise applications, Llama_3.1-FoxBrain-70B-V1.2 will be made available in the future through authorized channels—such as **deployment on AWS**—accompanied by a separate licensing framework.
Please stay tuned for updates regarding commercial availability.
Welcome to the Llama_3.1-FoxBrain-70B-V1.2 model.
Llama_3.1-FoxBrain-70B-V1.2 (hereinafter referred to as “FoxBrain”) is a large language model developed by Hon Hai Research Institute based on the Meta Llama 3.1 architecture. It has been optimized using Traditional Chinese corpora from Taiwan and supports a wide range of inference tasks and application scenarios.
This Agreement sets forth the terms and conditions that users (hereinafter “You”) must adhere to when using the FoxBrain model, including its weights, source code, APIs, and any derivative works.
## 1. Definitions
1. **License Agreement**: Subject to the terms herein, Hon Hai Research Institute grants You the rights to use, reproduce, modify, and distribute the FoxBrain model.
2. **Licensor**: Refers to Hon Hai Research Institute or the authorized owner of intellectual property rights to the FoxBrain model.
3. **You**: Any individual or entity authorized to use the model under this Agreement.
4. **FoxBrain Model**: The collection of training parameters, weights, source code, and related components.
5. **Derivative Models**: Models built upon FoxBrain’s parameters, outputs, or modifications.
## 2. Usage Principles and Academic Orientation
- FoxBrain is primarily intended for academic research, education, and technical exchange. Commercial use is strictly prohibited unless explicitly authorized.
- Users must comply with the laws of the Republic of China (Taiwan) and the Meta Llama 3.1 license terms.
- Any illegal, harmful, or rights-infringing usage is strictly forbidden.
- Do not interfere with, disrupt, or compromise the integrity of the system or other users.
- Users should promptly report any security vulnerabilities or anomalies to Hon Hai Research Institute.
## 3. User Responsibility and Disclaimer
- If You violate any laws resulting in damages to Hon Hai Research Institute or third parties, You shall bear full responsibility.
- Hon Hai Research Institute shall not be held liable for any misuse, including distribution of illegal content or unauthorized data access.
- FoxBrain is provided “as is” for research purposes. Outputs may be inaccurate, biased, or controversial. Users shall independently assess and accept relevant risks.
## 4. Summary of Meta Llama 3.1 License Terms
This model is built upon Meta’s Llama 3.1 architecture. Users must comply with Meta’s licensing restrictions, which include (but are not limited to):
- Non-exclusive, worldwide, royalty-free usage rights
- Prohibition of using the model to improve other LLMs (except for Llama-derived works)
- If monthly active users exceed 700 million, a separate commercial license must be obtained
- Proper attribution is required: “This model is licensed under the Llama 3.1 Community License. © Meta Platforms, Inc. All rights reserved.”
🔗 [Meta License Terms](https://llama.meta.com/llama3/license)
🔗 [Meta Usage Policy](https://llama.meta.com/llama3/use-policy)
## 5. Prohibited Uses
### 5.1 Illegal or Infringing Activities
- Violence, terrorism, discrimination, exploitation, deepfake technology, and unauthorized surveillance
- Medical, legal, or financial services without authorization
- Unlawful access to, use of, or inference from personal data
### 5.2 High-Risk Applications
- Military, weapons manufacturing, heavy industry control, or critical infrastructure operations
- Self-harm, suicide, or any activity that endangers personal safety
### 5.3 Deception and Abuse
- Fraud, forgery, impersonation, or generating AI content without proper labeling
The above list is not exhaustive. Any activity that violates laws, endangers human safety, or poses significant societal risks is strictly forbidden.
## 6. Miscellaneous
- “FoxBrain” is a registered trademark of Hon Hai Research Institute. Use of the name, logos, or identifiers must comply with applicable trademark laws and this Agreement.
- This Agreement does not constitute a commercial warranty or endorsement by Hon Hai Research Institute.
- Hon Hai Research Institute reserves the right to modify, suspend, or terminate this Agreement at any time.
- If the model is used on behalf of a legal entity, the individual accepting this Agreement represents that they are duly authorized to bind that entity.
## 7. Jurisdiction
This Agreement is governed by the laws of the Republic of China (Taiwan). Any disputes shall be under the jurisdiction of the Taipei District Court as the court of first instance.
---
# FoxBrain v1.2: Advanced Reasoning LLM with Dual Thinking Modes
**FoxBrain** is a large language model (LLM) **independently developed by Foxconn**, representing a major milestone in the company's long-term strategy to create AI that deeply understands **industrial knowledge and high-reliability domains**.
Currently at version **1.2**, FoxBrain delivers exceptional performance in **Chinese language understanding and generation**, and introduces **revolutionary dual thinking modes** for enhanced reasoning capabilities in **complex industrial contexts**.
<img src="images/overview_image.png" width="1500"/>
👉 Official GitHub: [FoxBrain_LLMs](https://github.com/TranNhiem/FoxBrain_LLMs?tab=readme-ov-file)
---
## 🆕 What's New in Version 1.2
### 🧠 **Dual Thinking Modes**
FoxBrain v1.2 introduces two distinct reasoning approaches:
- **🎯 Budget_Thinking Mode**: Step-by-step reasoning with resource management
- Allocates computational "budget" based on problem complexity (1-9 steps)
- Provides structured output with reasoning steps, reflections, and quality scores
- Ideal for systematic problem-solving and transparent decision-making
- **💭 Extend_Thinking Mode**: Deep analytical reasoning with extended thought process
- Uses `<think></think>` tags for comprehensive internal reasoning
- Allows for more flexible and creative problem exploration
- Perfect for complex analysis and open-ended challenges
- ⚠️ **Important**: This mode is sensitive to the `presence_penalty` parameter - we recommend setting it to `1.5` for optimal performance
### ⚙️ **Enhanced Chat Template System**
- **Flexible Mode Switching**: Seamlessly switch between thinking modes
- **Custom System Prompts**: Support for user-defined system instructions
- **Priority-Based Selection**: Custom prompts override default modes
- **Backward Compatibility**: Maintains compatibility with existing implementations
---
## 🔍 Key Features
- 🧠 **Dual Reasoning Architecture**
Two specialized thinking modes for different problem types and complexity levels.
- 🏭 **Industrial-Grade Performance**
Built for the precision, consistency, and robustness required in mission-critical industrial applications.
- 📘 **Optimized for Traditional Chinese**
Fine-tuned on high-quality Taiwanese Traditional Chinese datasets for superior linguistic alignment.
- 💡 **Structured & Transparent Output**
Budget mode provides step-by-step reasoning with quality assessments and resource tracking.
- ⚙️ **Fast Inference with VLLM**
Easily deployable on 2–8 H100 GPUs with ultra-low latency and flexible configuration.
- 🔧 **Developer-Friendly Integration**
Simple parameter-based mode switching via `thinking_mode` parameter.
---
## 🚀 Quickstart: Inference with VLLM
### 🖥️ Environment Requirements
- Python 3.8+
- CUDA-compatible environment
- 2 to 8 × H100 GPUs (4 GPUs recommended for optimal performance)
- [`vllm`](https://github.com/vllm-project/vllm) installed
### 📦 Install VLLM
```bash
pip install vllm
```
### 🧠 Launch Inference API
```bash
vllm serve FoxBrain_v1.2_70B \
  --api-key foxbrain-cit \
  --port 8800 \
  --max-model-len 32768 \
  --tensor-parallel-size 4 \
  --gpu-memory-utilization 0.85 \
  --enforce-eager
```
### 💻 Python Usage Examples
#### **Budget_Thinking Mode Example**
```python
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

# Load model and tokenizer
llm = LLM(model="FoxBrain_v1.2_70B", tensor_parallel_size=4)
tokenizer = AutoTokenizer.from_pretrained("FoxBrain_v1.2_70B")

messages = [
    {"role": "user", "content": "Solve this complex engineering problem: How would you optimize a manufacturing assembly line with 3 bottlenecks?"}
]

# Use Budget_Thinking mode
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    thinking_mode="Budget_Thinking"
)

# Generate with structured reasoning
sampling_params = SamplingParams(temperature=0.3, max_tokens=2048)
outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)
```
**Expected Output Structure:**
```
<count>6</count> # Initial budget for complex problem
<step>First, I need to identify the three bottlenecks...</step>
<count>5</count> # Remaining budget
<step>Next, I'll analyze the throughput capacity...</step>
<count>4</count>
<reflection>My analysis is on track, need to consider dependencies</reflection>
<reward>0.7</reward>
<answer>To optimize the assembly line with 3 bottlenecks: 1) Implement parallel processing at bottleneck A, 2) Add buffer stations before bottleneck B, 3) Upgrade equipment at bottleneck C. Expected 25% throughput improvement.</answer>
<reflection>Comprehensive solution addressing all bottlenecks with quantified benefits</reflection>
<reward>0.9</reward>
```
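For downstream use, the tagged output can be parsed programmatically. The helper below is an illustrative sketch based on the tag layout shown above (`<count>`, `<step>`, `<reflection>`, `<reward>`, `<answer>`), not part of the official FoxBrain toolchain:

```python
import re

def parse_budget_output(text: str) -> dict:
    """Extract the tagged fields from a Budget_Thinking response."""
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return {
        "counts": [int(c) for c in re.findall(r"<count>(\d+)</count>", text)],
        "steps": re.findall(r"<step>(.*?)</step>", text, re.DOTALL),
        "reflections": re.findall(r"<reflection>(.*?)</reflection>", text, re.DOTALL),
        "rewards": [float(r) for r in re.findall(r"<reward>([0-9.]+)</reward>", text)],
        "answer": answer.group(1).strip() if answer else None,
    }

# Hypothetical model output used for illustration
sample = (
    "<count>2</count>"
    "<step>Identify the bottleneck.</step>"
    "<count>1</count>"
    "<reflection>On track.</reflection>"
    "<reward>0.8</reward>"
    "<answer>Add a buffer station.</answer>"
)
parsed = parse_budget_output(sample)
print(parsed["answer"])   # -> Add a buffer station.
print(parsed["rewards"])  # -> [0.8]
```

The `rewards` list can also be used to filter or re-rank generations by the model's self-assessed quality scores.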
#### **Extend_Thinking Mode Example**
```python
# Use Extend_Thinking mode with recommended parameters
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    thinking_mode="Extend_Thinking"
)

# Important: use presence_penalty=1.5 for Extend_Thinking mode
sampling_params = SamplingParams(
    temperature=0.3,
    presence_penalty=1.5,
    max_tokens=2048
)
outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)
```
**Expected Output Structure:**
```
<think>
This is a complex manufacturing optimization problem. Let me think through this systematically...
First, I should understand what constitutes a bottleneck in manufacturing:
- Limited capacity point in the process
- Determines overall system throughput
- Can be equipment, labor, or process-related
For the three bottlenecks, I need to consider:
1. Root cause analysis for each bottleneck
2. Interdependencies between bottlenecks
3. Cost-benefit analysis of solutions
4. Implementation timeline and resource requirements
...
</think>
Based on my analysis of manufacturing assembly line optimization, here's a comprehensive approach to address the three bottlenecks:
[Final detailed answer follows]
```
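In user-facing applications you typically show only the final answer. A minimal helper for stripping the `<think></think>` block (an illustrative sketch, not part of the model release):

```python
import re

def strip_think(text: str) -> str:
    """Remove the <think>...</think> reasoning block, keeping only the final answer."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

# Hypothetical Extend_Thinking output used for illustration
raw = "<think>\nWeighing the options...\n</think>\nUse a pull-based kanban system."
print(strip_think(raw))  # -> Use a pull-based kanban system.
```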
#### **Custom System Prompt Example**
```python
# Use custom system prompt (overrides thinking modes)
messages = [
    {"role": "system", "content": "You are a specialized manufacturing engineer focused on lean principles."},
    {"role": "user", "content": "Analyze this production issue..."}
]

prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    thinking_mode="Budget_Thinking"  # Ignored: the custom system prompt takes precedence
)
```
### 🎮 Interactive Terminal Interface
```python
# Complete interactive example with mode switching
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

# Load model
llm = LLM(model="FoxBrain_v1.2_70B", tensor_parallel_size=4, gpu_memory_utilization=0.85)
tokenizer = AutoTokenizer.from_pretrained("FoxBrain_v1.2_70B")

current_mode = "Budget_Thinking"
messages = []

print("FoxBrain v1.2 Interactive Terminal")
print("Commands: 'mode1' (Budget_Thinking), 'mode2' (Extend_Thinking), 'custom' (custom prompt), 'reset', 'quit'")

while True:
    user_input = input("User: ").strip()

    if user_input.lower() == "quit":
        break
    elif user_input.lower() == "mode1":
        current_mode = "Budget_Thinking"
        messages = []
        print("Switched to Budget_Thinking mode!")
        continue
    elif user_input.lower() == "mode2":
        current_mode = "Extend_Thinking"
        messages = []
        print("Switched to Extend_Thinking mode!")
        continue
    elif user_input.lower() == "custom":
        system_prompt = input("System prompt: ").strip()
        messages = [{"role": "system", "content": system_prompt}]
        print("Custom system prompt set (overrides thinking modes)!")
        continue
    elif user_input.lower() == "reset":
        messages = []
        print("Conversation history cleared!")
        continue

    messages.append({"role": "user", "content": user_input})

    # Apply chat template with the selected thinking mode
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        thinking_mode=current_mode
    )

    # Generate response
    sampling_params = SamplingParams(temperature=0.7, max_tokens=2048)
    outputs = llm.generate([prompt], sampling_params)
    response = outputs[0].outputs[0].text.strip()

    print(f"Assistant: {response}")
    messages.append({"role": "assistant", "content": response})
```
---
## 📊 Academic & Human Evaluation Benchmarks
### 🎓 Taiwan MMLU+ (Academic Benchmark)
FoxBrain v1.2 was evaluated on **Taiwan MMLU+** with both thinking modes, showing improved performance in complex reasoning tasks:
<img src="images/tmmlu_plus_benchmark_v12.png" width="900"/>
### 🧠 Reasoning Capability Analysis
**Budget_Thinking Mode Performance:**
- ✅ **Structured Problem Solving**: +15% improvement in multi-step reasoning
- ✅ **Resource Efficiency**: Optimal performance within allocated computational budget
- ✅ **Transparency**: Clear reasoning trace for audit and debugging
**Extend_Thinking Mode Performance:**
- ✅ **Deep Analysis**: +22% improvement in complex analytical tasks
- ✅ **Creative Solutions**: Enhanced performance in open-ended problems
- ✅ **Comprehensive Coverage**: Better handling of nuanced, multi-faceted challenges
### 👥 MT-Bench (Human Preference Evaluation)
Updated MT-Bench results for v1.2 with thinking mode comparisons:
<img src="images/mtbench_benchmark_v12.png" width="1200"/>
> 🏅 FoxBrain v1.2 demonstrated significant improvements in reasoning tasks, with Budget_Thinking mode excelling in systematic problems and Extend_Thinking mode leading in creative tasks.
---
## 🤖 Suggested Use Cases by Mode
### 🎯 **Budget_Thinking Mode - Best For:**
- 🏭 **Manufacturing Process Optimization**: Step-by-step analysis with resource constraints
- 📊 **Quality Control Procedures**: Systematic inspection and validation workflows
- 🔧 **Troubleshooting Protocols**: Structured diagnostic procedures with clear steps
- 📈 **Performance Analysis**: Quantified assessments with measurable outcomes
- 🎯 **Project Planning**: Resource-aware task breakdown and timeline estimation
### 💭 **Extend_Thinking Mode - Best For:**
- 🧪 **Research & Development**: Deep analysis of complex technical problems
- 🎨 **Creative Problem Solving**: Innovative approaches to engineering challenges
- 📝 **Technical Documentation**: Comprehensive analysis and explanation
- 🤔 **Strategic Planning**: Long-term thinking and scenario analysis
- 🔍 **Root Cause Analysis**: In-depth investigation of complex system failures
### 🎛️ **Custom System Prompts - Best For:**
- 🏢 **Domain-Specific Applications**: Tailored behavior for specific industries
- 👥 **Role-Specific Interactions**: Customized persona for different use cases
- 🔒 **Compliance Requirements**: Specific guidelines and constraints
- 🎯 **Specialized Workflows**: Custom instructions for unique business processes
---
## 🚧 Roadmap & Version History
<img src="images/foxbrain_roadmap_v12.png" width="500"/>
### Version History
- 📌 **Version 1.0**: Foundation model with strong Chinese language proficiency
- 🔄 **Version 1.1**: Enhanced reasoning capabilities and improved efficiency
- 🆕 **Version 1.2**: **Dual thinking modes with structured reasoning architecture**
- 🔜 **Version 2.0**: Advanced industrial knowledge integration and domain expertise
- 🌆 **Long-Term Vision**: Comprehensive smart manufacturing and industrial AI platform
---
## ⚠️ Important Notes for v1.2
### 🔧 **Migration from v1.0/v1.1**
- Chat template has been updated - ensure you're using the latest tokenizer
- Default mode is `Budget_Thinking` if no `thinking_mode` is specified
- Custom system prompts take precedence over thinking modes
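The precedence rules above can be summarized in a small helper. This is a hypothetical sketch of the selection logic for documentation purposes, not the actual template code shipped with the tokenizer:

```python
def resolve_mode(messages, thinking_mode=None):
    """Mirror the documented precedence: custom system prompt > thinking_mode > default."""
    has_system = any(m.get("role") == "system" for m in messages)
    if has_system:
        return "custom_system_prompt"          # custom prompts override thinking modes
    return thinking_mode or "Budget_Thinking"  # v1.2 default when no mode is specified

# A custom system prompt wins even when a mode is passed explicitly
msgs = [
    {"role": "system", "content": "You are a lean-manufacturing expert."},
    {"role": "user", "content": "Analyze this line."},
]
print(resolve_mode(msgs, thinking_mode="Extend_Thinking"))  # -> custom_system_prompt

# With no mode given, selection falls back to the v1.2 default
print(resolve_mode([{"role": "user", "content": "Hi"}]))    # -> Budget_Thinking
```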
### 💾 **Memory Requirements**
- Budget_Thinking mode: Standard memory usage
- Extend_Thinking mode: May require additional memory for extended reasoning
- Multi-GPU setup (4+ GPUs) recommended for optimal performance
### 🎛️ **Parameter Recommendations**
- **Budget_Thinking mode**:
- `temperature=0.3-0.7`
- Standard sampling parameters work well
- **Extend_Thinking mode**:
- `temperature=0.3-0.5`
- **⚠️ Critical**: `presence_penalty=1.5` (model may generate unexpected results without this setting)
- **General settings**:
- `max_tokens=2048-4096` depending on problem complexity
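To keep callers from forgetting the `presence_penalty` requirement, these recommendations can be centralized in one helper that returns kwargs for `SamplingParams`. The values mirror the list above; the helper itself is an unofficial convenience sketch:

```python
def recommended_sampling_kwargs(mode: str, max_tokens: int = 2048) -> dict:
    """Return suggested SamplingParams kwargs for the given thinking mode."""
    kwargs = {"temperature": 0.3, "max_tokens": max_tokens}
    if mode == "Extend_Thinking":
        # Critical for this mode: the model may generate unexpected results without it
        kwargs["presence_penalty"] = 1.5
    return kwargs

print(recommended_sampling_kwargs("Extend_Thinking"))
# -> {'temperature': 0.3, 'max_tokens': 2048, 'presence_penalty': 1.5}
```

Usage: `SamplingParams(**recommended_sampling_kwargs("Extend_Thinking"))`.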
---
## 📄 License
This model is released under the **Llama 3.1 Community License Agreement**.
---
## 🙌 Contributors
- AI Research Center of Hon Hai Research Institute (model training, deployment & evaluation)
- Meta-Llama (base model)
---
## 📫 Contact
For support or partnership inquiries:
📧 [email protected]
---
**FoxBrain v1.2** - Where structured reasoning meets industrial intelligence. 🚀