You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

The information you provide will be collected, stored, processed, and shared in accordance with the Hon Hai privacy policy: https://www.honhai.com/zh-tw/privacy-and-policy
To request access to the Llama_3.1-FoxBrain-70B-V1.2 model weights, please contact us by email first. Only requests from users who have contacted us in advance will be considered.
When sending your email, make sure to use the same email address that you will enter in the application form. This helps us verify your identity and approval status.
You may contact us at: [email protected]

📄 Llama_3.1-FoxBrain-70B-V1.2 Model Usage and License Agreement

At this time, access to the FoxBrain model is granted exclusively to academic institutions and research organizations. We do not authorize commercial usage under the current release. For commercial or enterprise applications, Llama_3.1-FoxBrain-70B-V1.2 will be made available in the future through authorized channels—such as deployment on AWS—accompanied by a separate licensing framework. Please stay tuned for updates regarding commercial availability.
Welcome to the Llama_3.1-FoxBrain-70B-V1.2 model.
Llama_3.1-FoxBrain-70B-V1.2 (hereinafter referred to as “FoxBrain”) is a large language model developed by Hon Hai Research Institute based on the Meta Llama 3.1 architecture. It has been optimized using Traditional Chinese corpora from Taiwan and supports a wide range of inference tasks and application scenarios.
This Agreement sets forth the terms and conditions that users (hereinafter “You”) must adhere to when using the FoxBrain model, including its weights, source code, APIs, and any derivative works.

1. Definitions

  1. License Agreement: Subject to the terms herein, Hon Hai Research Institute grants You the rights to use, reproduce, modify, and distribute the FoxBrain model.
  2. Licensor: Refers to Hon Hai Research Institute or the authorized owner of intellectual property rights to the FoxBrain model.
  3. You: Any individual or entity authorized to use the model under this Agreement.
  4. FoxBrain Model: The collection of training parameters, weights, source code, and related components.
  5. Derivative Models: Models built upon FoxBrain’s parameters, outputs, or modifications.

2. Usage Principles and Academic Orientation

  • FoxBrain is primarily intended for academic research, education, and technical exchange. Commercial use is strictly prohibited unless explicitly authorized.
  • Users must comply with the laws of the Republic of China (Taiwan) and the Meta Llama 3.1 license terms.
  • Any illegal, harmful, or rights-infringing usage is strictly forbidden.
  • Do not interfere with, disrupt, or compromise the integrity of the system or other users.
  • Users should promptly report any security vulnerabilities or anomalies to Hon Hai Research Institute.

3. User Responsibility and Disclaimer

  • If You violate any laws resulting in damages to Hon Hai Research Institute or third parties, You shall bear full responsibility.
  • Hon Hai Research Institute shall not be held liable for any misuse, including distribution of illegal content or unauthorized data access.
  • FoxBrain is provided “as is” for research purposes. Outputs may be inaccurate, biased, or controversial. Users shall independently assess and accept the relevant risks.

4. Summary of Meta Llama 3.1 License Terms

This model is built upon Meta’s Llama 3.1 architecture. Users must comply with Meta’s licensing restrictions, which include (but are not limited to):

  • Non-exclusive, worldwide, royalty-free usage rights
  • Prohibition of using the model to improve other LLMs (except for Llama-derived works)
  • If monthly active users exceed 700 million, a separate commercial license must be obtained
  • Proper attribution is required: “This model is licensed under the Llama 3.1 Community License. © Meta Platforms, Inc. All rights reserved.”

    🔗 Meta License Terms
    🔗 Meta Usage Policy

5. Prohibited Uses

5.1 Illegal or Infringing Activities

  • Violence, terrorism, discrimination, exploitation, deepfake technology, and unauthorized surveillance
  • Medical, legal, or financial services without authorization
  • Unlawful access to, use of, or inference from personal data

5.2 High-Risk Applications

  • Military, weapons manufacturing, heavy industry control, or critical infrastructure operations
  • Self-harm, suicide, or any activity that endangers personal safety

5.3 Deception and Abuse

  • Fraud, forgery, impersonation, or generating AI content without proper labeling

The above list is not exhaustive. Any activity that violates laws, endangers human safety, or poses significant societal risks is strictly forbidden.

6. Miscellaneous

  • “FoxBrain” is a registered trademark of Hon Hai Research Institute. Use of the name, logos, or identifiers must comply with applicable trademark laws and this Agreement.
  • This Agreement does not constitute a commercial warranty or endorsement by Hon Hai Research Institute.
  • Hon Hai Research Institute reserves the right to modify, suspend, or terminate this Agreement at any time.
  • Use of the model by a legal entity implies that the individual accepting this Agreement is duly authorized to represent it.

7. Jurisdiction

This Agreement is governed by the laws of the Republic of China (Taiwan). Any disputes shall be under the jurisdiction of the Taipei District Court as the court of first instance.


FoxBrain v1.2: Advanced Reasoning LLM with Dual Thinking Modes

FoxBrain is a large language model (LLM) independently developed by Foxconn, representing a major milestone in the company's long-term strategy to create AI that deeply understands industrial knowledge and high-reliability domains.

Currently at version 1.2, FoxBrain delivers exceptional performance in Chinese language understanding and generation, and introduces revolutionary dual thinking modes for enhanced reasoning capabilities in complex industrial contexts.

👉 Official GitHub: FoxBrain_LLMs


🆕 What's New in Version 1.2

🧠 Dual Thinking Modes

FoxBrain v1.2 introduces two distinct reasoning approaches:

  • 🎯 Budget_Thinking Mode: Step-by-step reasoning with resource management

    • Allocates computational "budget" based on problem complexity (1-9 steps)
    • Provides structured output with reasoning steps, reflections, and quality scores
    • Ideal for systematic problem-solving and transparent decision-making
  • 💭 Extend_Thinking Mode: Deep analytical reasoning with extended thought process

    • Uses <think></think> tags for comprehensive internal reasoning
    • Allows for more flexible and creative problem exploration
    • Perfect for complex analysis and open-ended challenges
    • ⚠️ Important: This mode is sensitive to the presence_penalty parameter - we recommend setting it to 1.5 for optimal performance

⚙️ Enhanced Chat Template System

  • Flexible Mode Switching: Seamlessly switch between thinking modes
  • Custom System Prompts: Support for user-defined system instructions
  • Priority-Based Selection: Custom prompts override default modes
  • Backward Compatibility: Maintains compatibility with existing implementations

🔍 Key Features

  • 🧠 Dual Reasoning Architecture
    Two specialized thinking modes for different problem types and complexity levels.

  • 🏭 Industrial-Grade Performance
    Built for the precision, consistency, and robustness required in mission-critical industrial applications.

  • 📘 Optimized for Traditional Chinese
    Fine-tuned on high-quality Taiwanese Traditional Chinese datasets for superior linguistic alignment.

  • 💡 Structured & Transparent Output
    Budget mode provides step-by-step reasoning with quality assessments and resource tracking.

  • ⚙️ Fast Inference with vLLM
    Easily deployable on 2–8 H100 GPUs with ultra-low latency and flexible configuration.

  • 🔧 Developer-Friendly Integration
    Simple parameter-based mode switching via thinking_mode parameter.


🚀 Quickstart: Inference with vLLM

🖥️ Environment Requirements

  • Python 3.8+
  • CUDA-compatible environment
  • 2 to 8 × H100 GPUs (4 GPUs recommended for optimal performance)
  • vllm installed

📦 Install vLLM

pip install vllm

🧠 Launch Inference API

vllm serve FoxBrain_v1.2_70B \
  --api-key foxbrain-cit \
  --port 8800 \
  --max-model-len 32768 \
  --tensor-parallel-size 4 \
  --gpu-memory-utilization 0.85 \
  --enforce-eager
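
Once the server is running, it exposes an OpenAI-compatible endpoint. Below is a minimal client sketch, assuming the settings above (port 8800, API key foxbrain-cit) and the openai Python package; the model name in the request must match the one the server was launched with. Note that requests through the server use the default chat template behavior (Budget_Thinking unless a custom system prompt is sent); the tokenizer-side thinking_mode switching shown below applies to offline inference.

from openai import OpenAI

# Point the client at the local vLLM server started above.
# base_url port and api_key must match the `vllm serve` flags.
client = OpenAI(base_url="http://localhost:8800/v1", api_key="foxbrain-cit")

response = client.chat.completions.create(
    model="FoxBrain_v1.2_70B",  # must match the served model name
    messages=[{"role": "user", "content": "Briefly explain lean manufacturing."}],
    temperature=0.3,
    max_tokens=512,
)
print(response.choices[0].message.content)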

💻 Python Usage Examples

Budget_Thinking Mode Example

from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

# Load model and tokenizer
llm = LLM(model="FoxBrain_v1.2_70B", tensor_parallel_size=4)
tokenizer = AutoTokenizer.from_pretrained("FoxBrain_v1.2_70B")

messages = [
    {"role": "user", "content": "Solve this complex engineering problem: How would you optimize a manufacturing assembly line with 3 bottlenecks?"}
]

# Use Budget_Thinking mode
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    thinking_mode="Budget_Thinking"
)

# Generate with structured reasoning
sampling_params = SamplingParams(temperature=0.3, max_tokens=2048)
outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)

Expected Output Structure:

<count>6</count>  # Initial budget for complex problem

<step>First, I need to identify the three bottlenecks...</step>
<count>5</count>  # Remaining budget

<step>Next, I'll analyze the throughput capacity...</step>
<count>4</count>
<reflection>My analysis is on track, need to consider dependencies</reflection>
<reward>0.7</reward>

<answer>To optimize the assembly line with 3 bottlenecks: 1) Implement parallel processing at bottleneck A, 2) Add buffer stations before bottleneck B, 3) Upgrade equipment at bottleneck C. Expected 25% throughput improvement.</answer>

<reflection>Comprehensive solution addressing all bottlenecks with quantified benefits</reflection>
<reward>0.9</reward>
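
Because Budget_Thinking output follows the tag structure above, it is easy to post-process. Here is a minimal parsing sketch (a hypothetical helper, not part of the official tooling), applied to the generation from the example above:

import re

def parse_budget_output(text: str) -> dict:
    """Pull steps, reflections, reward scores, and the final answer
    out of Budget_Thinking output. Assumes the tag structure shown above."""
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return {
        "steps": re.findall(r"<step>(.*?)</step>", text, re.DOTALL),
        "reflections": re.findall(r"<reflection>(.*?)</reflection>", text, re.DOTALL),
        "rewards": [float(r) for r in re.findall(r"<reward>(.*?)</reward>", text)],
        "answer": answer.group(1).strip() if answer else None,
    }

# e.g. on the generation from the Budget_Thinking example above:
parsed = parse_budget_output(outputs[0].outputs[0].text)
print(parsed["answer"])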

Extend_Thinking Mode Example

# Use Extend_Thinking mode with recommended parameters
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    thinking_mode="Extend_Thinking"
)

# Important: Use presence_penalty=1.5 for Extend_Thinking mode
sampling_params = SamplingParams(
    temperature=0.3, 
    presence_penalty=1.5, 
    max_tokens=2048
)
outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)

Expected Output Structure:

<think>
This is a complex manufacturing optimization problem. Let me think through this systematically...

First, I should understand what constitutes a bottleneck in manufacturing:
- Limited capacity point in the process
- Determines overall system throughput
- Can be equipment, labor, or process-related

For the three bottlenecks, I need to consider:
1. Root cause analysis for each bottleneck
2. Interdependencies between bottlenecks
3. Cost-benefit analysis of solutions
4. Implementation timeline and resource requirements
...
</think>

Based on my analysis of manufacturing assembly line optimization, here's a comprehensive approach to address the three bottlenecks:

[Final detailed answer follows]
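
When only the final response matters, the internal reasoning can be stripped before display. A minimal sketch (again a hypothetical helper), assuming the <think></think> structure shown above:

import re

def strip_thinking(text: str) -> str:
    """Remove the <think>...</think> block, keeping only the final answer."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

# e.g. on the generation from the Extend_Thinking example above:
print(strip_thinking(outputs[0].outputs[0].text))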

Custom System Prompt Example

# Use custom system prompt (overrides thinking modes)
messages = [
    {"role": "system", "content": "You are a specialized manufacturing engineer focused on lean principles."},
    {"role": "user", "content": "Analyze this production issue..."}
]

prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    thinking_mode="Budget_Thinking"  # This will be ignored due to custom system prompt
)

🎮 Interactive Terminal Interface

# Complete interactive example with mode switching
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

# Load model
llm = LLM(model="FoxBrain_v1.2_70B", tensor_parallel_size=4, gpu_memory_utilization=0.85)
tokenizer = AutoTokenizer.from_pretrained("FoxBrain_v1.2_70B")

current_mode = 'Budget_Thinking'
messages = []

print("FoxBrain v1.2 Interactive Terminal")
print("Commands: 'mode1' (Budget_Thinking), 'mode2' (Extend_Thinking), 'custom' (custom prompt), 'reset', 'quit'")

while True:
    user_input = input("User: ").strip()
    
    if user_input.lower() == 'quit':
        break
    elif user_input.lower() == 'mode1':
        current_mode = 'Budget_Thinking'
        messages = []
        print("Switched to Budget_Thinking mode!")
        continue
    elif user_input.lower() == 'mode2':
        current_mode = 'Extend_Thinking'
        messages = []
        print("Switched to Extend_Thinking mode!")
        continue
    elif user_input.lower() == 'reset':
        messages = []
        print("Conversation history cleared!")
        continue
    elif user_input.lower() == 'custom':
        # Custom system prompts take precedence over thinking modes
        system_prompt = input("Enter custom system prompt: ").strip()
        messages = [{"role": "system", "content": system_prompt}]
        print("Custom system prompt set!")
        continue
    
    messages.append({"role": "user", "content": user_input})
    
    # Apply chat template with selected thinking mode
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        thinking_mode=current_mode
    )
    
    # Generate response (Extend_Thinking needs presence_penalty=1.5)
    if current_mode == 'Extend_Thinking':
        sampling_params = SamplingParams(temperature=0.5, presence_penalty=1.5, max_tokens=2048)
    else:
        sampling_params = SamplingParams(temperature=0.7, max_tokens=2048)
    outputs = llm.generate([prompt], sampling_params)
    response = outputs[0].outputs[0].text.strip()
    
    print(f"Assistant: {response}")
    messages.append({"role": "assistant", "content": response})

📊 Academic & Human Evaluation Benchmarks

🎓 Taiwan MMLU+ (Academic Benchmark)

FoxBrain v1.2 was evaluated on Taiwan MMLU+ with both thinking modes, showing improved performance in complex reasoning tasks.

🧠 Reasoning Capability Analysis

Budget_Thinking Mode Performance:

  • Structured Problem Solving: +15% improvement in multi-step reasoning
  • Resource Efficiency: Optimal performance within allocated computational budget
  • Transparency: Clear reasoning trace for audit and debugging

Extend_Thinking Mode Performance:

  • Deep Analysis: +22% improvement in complex analytical tasks
  • Creative Solutions: Enhanced performance in open-ended problems
  • Comprehensive Coverage: Better handling of nuanced, multi-faceted challenges

👥 MT-Bench (Human Preference Evaluation)

Updated MT-Bench results for v1.2 with thinking mode comparisons:

🏅 FoxBrain v1.2 demonstrated significant improvements in reasoning tasks, with Budget_Thinking mode excelling in systematic problems and Extend_Thinking mode leading in creative tasks.


🤖 Suggested Use Cases by Mode

🎯 Budget_Thinking Mode - Best For:

  • 🏭 Manufacturing Process Optimization: Step-by-step analysis with resource constraints
  • 📊 Quality Control Procedures: Systematic inspection and validation workflows
  • 🔧 Troubleshooting Protocols: Structured diagnostic procedures with clear steps
  • 📈 Performance Analysis: Quantified assessments with measurable outcomes
  • 🎯 Project Planning: Resource-aware task breakdown and timeline estimation

💭 Extend_Thinking Mode - Best For:

  • 🧪 Research & Development: Deep analysis of complex technical problems
  • 🎨 Creative Problem Solving: Innovative approaches to engineering challenges
  • 📝 Technical Documentation: Comprehensive analysis and explanation
  • 🤔 Strategic Planning: Long-term thinking and scenario analysis
  • 🔍 Root Cause Analysis: In-depth investigation of complex system failures

🎛️ Custom System Prompts - Best For:

  • 🏢 Domain-Specific Applications: Tailored behavior for specific industries
  • 👥 Role-Specific Interactions: Customized persona for different use cases
  • 🔒 Compliance Requirements: Specific guidelines and constraints
  • 🎯 Specialized Workflows: Custom instructions for unique business processes

🚧 Roadmap & Version History

Version History

  • 📌 Version 1.0: Foundation model with strong Chinese language proficiency
  • 🔄 Version 1.1: Enhanced reasoning capabilities and improved efficiency
  • 🆕 Version 1.2: Dual thinking modes with structured reasoning architecture
  • 🔜 Version 2.0: Advanced industrial knowledge integration and domain expertise
  • 🌆 Long-Term Vision: Comprehensive smart manufacturing and industrial AI platform

⚠️ Important Notes for v1.2

🔧 Migration from v1.0/v1.1

  • Chat template has been updated - ensure you're using the latest tokenizer
  • Default mode is Budget_Thinking if no thinking_mode is specified
  • Custom system prompts take precedence over thinking modes

💾 Memory Requirements

  • Budget_Thinking mode: Standard memory usage
  • Extend_Thinking mode: May require additional memory for extended reasoning
  • Multi-GPU setup (4+ GPUs) recommended for optimal performance

🎛️ Parameter Recommendations

  • Budget_Thinking mode:
    • temperature=0.3-0.7
    • Standard sampling parameters work well
  • Extend_Thinking mode:
    • temperature=0.3-0.5
    • ⚠️ Critical: presence_penalty=1.5 (model may generate unexpected results without this setting)
  • General settings:
    • max_tokens=2048-4096 depending on problem complexity
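
As a starting point, the recommendations above can be consolidated into per-mode presets. A minimal sketch; the exact values are suggestions within the recommended ranges, not hard requirements:

from vllm import SamplingParams

# Per-mode sampling presets based on the recommendations above.
SAMPLING_PRESETS = {
    "Budget_Thinking": SamplingParams(
        temperature=0.5,       # anywhere in 0.3-0.7 works well
        max_tokens=2048,
    ),
    "Extend_Thinking": SamplingParams(
        temperature=0.4,       # keep within 0.3-0.5
        presence_penalty=1.5,  # critical: avoids unexpected outputs in this mode
        max_tokens=4096,       # extended reasoning needs more room
    ),
}

sampling_params = SAMPLING_PRESETS["Extend_Thinking"]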

📄 License

This model is released under the Llama 3.1 Community License Agreement.


🙌 Contributors

  • AI Research Center of Hon Hai Research Institute (model training, deployment & evaluation)
  • Meta-Llama (base model)

📫 Contact

For support or partnership inquiries: 📧 [email protected]


FoxBrain v1.2 - Where structured reasoning meets industrial intelligence. 🚀
