---
license: llama3.1
language:
  - zh
  - en
base_model:
  - meta-llama/Llama-3.1-70B-Instruct
extra_gated_fields:
  Full name: text
  Email: text
  Company: text
  Country: country
  Specific date: date_picker
  I want to use this model for:
    type: select
    options:
      - Research
      - Education
      - label: Other
        value: other
  I agree to use this model for non-commercial use ONLY: checkbox
extra_gated_prompt: >-
  The information you provide will be collected, stored, processed and shared in
  accordance with the https://www.honhai.com/zh-tw/privacy-and-policy

  To request access to the Llama_3.1-FoxBrain-70B-V1.2 model weights, please
  contact us by email first. **Only requests from users who have contacted us in
  advance will be considered.**

  When sending your email, **make sure to use the same email address that you
  will enter in the application form.** This helps us verify your identity and
  approval status.

  You may contact us at: `[email protected]`  


  ## 📄 Llama_3.1-FoxBrain-70B-V1.2 Model Usage and License Agreement


  At this time, access to the FoxBrain model is granted **exclusively to
  academic institutions and research organizations**. We do not authorize
  commercial usage under the current release. For commercial or enterprise
  applications, Llama_3.1-FoxBrain-70B-V1.2 will be made available in the
  future through authorized channels—such as **deployment on AWS**—accompanied
  by a separate licensing framework. Please stay tuned for updates regarding
  commercial availability.

  Welcome to the Llama_3.1-FoxBrain-70B-V1.2 model.

  Llama_3.1-FoxBrain-70B-V1.2 (hereinafter referred to as “FoxBrain”) is a large
  language model developed by Hon Hai Research Institute based on the Meta Llama
  3.1 architecture. It has been optimized using Traditional Chinese corpora from
  Taiwan and supports a wide range of inference tasks and application scenarios.

  This Agreement sets forth the terms and conditions that users (hereinafter
  “You”) must adhere to when using the FoxBrain model, including its weights,
  source code, APIs, and any derivative works.


  ## 1. Definitions

  1. **License Agreement**: Subject to the terms herein, Hon Hai Research
  Institute grants You the rights to use, reproduce, modify, and distribute the
  FoxBrain model.

  2. **Licensor**: Refers to Hon Hai Research Institute or the authorized owner
  of intellectual property rights to the FoxBrain model.

  3. **You**: Any individual or entity authorized to use the model under this
  Agreement.

  4. **FoxBrain Model**: The collection of training parameters, weights, source
  code, and related components.

  5. **Derivative Models**: Models built upon FoxBrain’s parameters, outputs,
  or modifications.


  ## 2. Usage Principles and Academic Orientation

  - FoxBrain is primarily intended for academic research, education, and
  technical exchange. Commercial use is strictly prohibited unless explicitly
  authorized.

  - Users must comply with the laws of the Republic of China (Taiwan) and the
  Meta Llama 3.1 license terms.

  - Any illegal, harmful, or rights-infringing usage is strictly forbidden.

  - Do not interfere with, disrupt, or compromise the integrity of the system
  or other users.

  - Users should promptly report any security vulnerabilities or anomalies to
  Hon Hai Research Institute.


  ## 3. User Responsibility and Disclaimer

  - If You violate any laws resulting in damages to Hon Hai Research Institute
  or third parties, You shall bear full responsibility.

  - Hon Hai Research Institute shall not be held liable for any misuse,
  including distribution of illegal content or unauthorized data access.

  - FoxBrain is provided “as is” for research purposes. Outputs may be
  inaccurate, biased, or controversial. Users shall independently assess and
  accept relevant risks.

  ## 4. Summary of Meta Llama 3.1 License Terms

  This model is built upon Meta’s Llama 3.1 architecture. Users must comply with
  Meta’s licensing restrictions, which include (but are not limited to):

  - Non-exclusive, worldwide, royalty-free usage rights

  - Prohibition of using the model to improve other LLMs (except for
  Llama-derived works)

  - If monthly active users exceed 700 million, a separate commercial license
  must be obtained

  - Proper attribution is required: “This model is licensed under the Llama 3.1
  Community License. © Meta Platforms, Inc. All rights reserved.”

  🔗 [Meta License Terms](https://llama.meta.com/llama3/license)

  🔗 [Meta Usage Policy](https://llama.meta.com/llama3/use-policy)



  ## 5. Prohibited Uses

  ### 5.1 Illegal or Infringing Activities

  - Violence, terrorism, discrimination, exploitation, deepfake technology, and
  unauthorized surveillance

  - Medical, legal, or financial services without authorization

  - Unlawful access to, use of, or inference from personal data

  ### 5.2 High-Risk Applications

  - Military, weapons manufacturing, heavy industry control, or critical
  infrastructure operations

  - Self-harm, suicide, or any activity that endangers personal safety

  ### 5.3 Deception and Abuse

  - Fraud, forgery, impersonation, or generating AI content without proper
  labeling

  The above list is not exhaustive. Any activity that violates laws, endangers
  human safety, or poses significant societal risks is strictly forbidden.



  ## 6. Miscellaneous

  - “FoxBrain” is a registered trademark of Hon Hai Research Institute. Use of
  the name, logos, or identifiers must comply with applicable trademark laws
  and this Agreement.

  - This Agreement does not constitute a commercial warranty or endorsement by
  Hon Hai Research Institute.

  - Hon Hai Research Institute reserves the right to modify, suspend, or
  terminate this Agreement at any time.

  - Use of the model by legal entities implies duly authorized representation.


  ## 7. Jurisdiction

  This Agreement is governed by the laws of the Republic of China (Taiwan). Any
  disputes shall be under the jurisdiction of the Taipei District Court as the
  court of first instance.

---

# FoxBrain v1.2: Advanced Reasoning LLM with Dual Thinking Modes

FoxBrain is a large language model (LLM) independently developed by Foxconn, representing a major milestone in the company's long-term strategy to create AI that deeply understands industrial knowledge and high-reliability domains.

Currently at version 1.2, FoxBrain delivers exceptional performance in Chinese language understanding and generation, and introduces revolutionary dual thinking modes for enhanced reasoning capabilities in complex industrial contexts.

👉 Official GitHub: FoxBrain_LLMs


## 🆕 What's New in Version 1.2

### 🧠 Dual Thinking Modes

FoxBrain v1.2 introduces two distinct reasoning approaches:

- 🎯 **Budget_Thinking Mode**: Step-by-step reasoning with resource management
  - Allocates a computational "budget" based on problem complexity (1-9 steps)
  - Provides structured output with reasoning steps, reflections, and quality scores
  - Ideal for systematic problem-solving and transparent decision-making
- 💭 **Extend_Thinking Mode**: Deep analytical reasoning with an extended thought process
  - Uses `<think></think>` tags for comprehensive internal reasoning
  - Allows for more flexible and creative problem exploration
  - Perfect for complex analysis and open-ended challenges
  - ⚠️ **Important**: This mode is sensitive to the `presence_penalty` parameter; we recommend setting it to 1.5 for optimal performance

### ⚙️ Enhanced Chat Template System

- **Flexible Mode Switching**: Seamlessly switch between thinking modes
- **Custom System Prompts**: Support for user-defined system instructions
- **Priority-Based Selection**: Custom system prompts override the default thinking modes
- **Backward Compatibility**: Maintains compatibility with existing implementations

## 🔍 Key Features

- 🧠 **Dual Reasoning Architecture**
  Two specialized thinking modes for different problem types and complexity levels.

- 🏭 **Industrial-Grade Performance**
  Built for the precision, consistency, and robustness required in mission-critical industrial applications.

- 📘 **Optimized for Traditional Chinese**
  Fine-tuned on high-quality Taiwanese Traditional Chinese datasets for superior linguistic alignment.

- 💡 **Structured & Transparent Output**
  Budget_Thinking mode provides step-by-step reasoning with quality assessments and resource tracking.

- ⚙️ **Fast Inference with vLLM**
  Easily deployable on 2–8 H100 GPUs with low latency and flexible configuration.

- 🔧 **Developer-Friendly Integration**
  Simple mode switching via the `thinking_mode` parameter.


## 🚀 Quickstart: Inference with vLLM

### 🖥️ Environment Requirements

- Python 3.8+
- CUDA-compatible environment
- 2 to 8 × H100 GPUs (4 GPUs recommended for optimal performance)
- vLLM installed

### 📦 Install vLLM

```bash
pip install vllm
```

### 🧠 Launch Inference API

```bash
vllm serve FoxBrain_v1.2_70B \
  --api-key foxbrain-cit \
  --port 8800 \
  --max-model-len 32768 \
  --tensor-parallel-size 4 \
  --gpu-memory-utilization 0.85 \
  --enforce-eager
```
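
Once the server is running, it exposes an OpenAI-compatible API on port 8800. The sketch below is one way to query it (assuming the served model name matches the model path above and the `openai` Python package is installed); the prompt is built locally with the chat template so the `thinking_mode` argument can still be applied:

```python
from openai import OpenAI
from transformers import AutoTokenizer

# Connect to the vLLM OpenAI-compatible server launched above
client = OpenAI(base_url="http://localhost:8800/v1", api_key="foxbrain-cit")

# Build the prompt locally so thinking_mode can be passed to the chat template
tokenizer = AutoTokenizer.from_pretrained("FoxBrain_v1.2_70B")
messages = [{"role": "user", "content": "How would you optimize an assembly line with 3 bottlenecks?"}]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    thinking_mode="Budget_Thinking"
)

# Use the raw completions endpoint, since the prompt is already templated
response = client.completions.create(
    model="FoxBrain_v1.2_70B",  # assumed to match the served model path
    prompt=prompt,
    temperature=0.3,
    max_tokens=2048,
)
print(response.choices[0].text)
```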

### 💻 Python Usage Examples

#### Budget_Thinking Mode Example

```python
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

# Load model and tokenizer
llm = LLM(model="FoxBrain_v1.2_70B", tensor_parallel_size=4)
tokenizer = AutoTokenizer.from_pretrained("FoxBrain_v1.2_70B")

messages = [
    {"role": "user", "content": "Solve this complex engineering problem: How would you optimize a manufacturing assembly line with 3 bottlenecks?"}
]

# Use Budget_Thinking mode
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    thinking_mode="Budget_Thinking"
)

# Generate with structured reasoning
sampling_params = SamplingParams(temperature=0.3, max_tokens=2048)
outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)
```

**Expected Output Structure:**

```text
<count>6</count>  # Initial budget for complex problem

<step>First, I need to identify the three bottlenecks...</step>
<count>5</count>  # Remaining budget

<step>Next, I'll analyze the throughput capacity...</step>
<count>4</count>
<reflection>My analysis is on track, need to consider dependencies</reflection>
<reward>0.7</reward>

<answer>To optimize the assembly line with 3 bottlenecks: 1) Implement parallel processing at bottleneck A, 2) Add buffer stations before bottleneck B, 3) Upgrade equipment at bottleneck C. Expected 25% throughput improvement.</answer>

<reflection>Comprehensive solution addressing all bottlenecks with quantified benefits</reflection>
<reward>0.9</reward>
```
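
Because Budget_Thinking wraps its reasoning in these tags, responses are easy to post-process. Below is a minimal parsing sketch (the `parse_budget_output` helper is ours, not part of the model API; it assumes well-formed tags as shown above):

```python
import re

def parse_budget_output(text):
    """Extract the structured fields from a Budget_Thinking response."""
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return {
        "counts": [int(c) for c in re.findall(r"<count>(\d+)</count>", text)],
        "steps": re.findall(r"<step>(.*?)</step>", text, re.DOTALL),
        "reflections": re.findall(r"<reflection>(.*?)</reflection>", text, re.DOTALL),
        "rewards": [float(r) for r in re.findall(r"<reward>([\d.]+)</reward>", text)],
        "answer": answer.group(1).strip() if answer else None,
    }

# e.g., continuing the Budget_Thinking example above:
result = parse_budget_output(outputs[0].outputs[0].text)
print(result["answer"], result["rewards"])
```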

#### Extend_Thinking Mode Example

```python
# Use Extend_Thinking mode with the recommended parameters
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    thinking_mode="Extend_Thinking"
)

# Important: use presence_penalty=1.5 for Extend_Thinking mode
sampling_params = SamplingParams(
    temperature=0.3,
    presence_penalty=1.5,
    max_tokens=2048
)
outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)
```

**Expected Output Structure:**

```text
<think>
This is a complex manufacturing optimization problem. Let me think through this systematically...

First, I should understand what constitutes a bottleneck in manufacturing:
- Limited capacity point in the process
- Determines overall system throughput
- Can be equipment, labor, or process-related

For the three bottlenecks, I need to consider:
1. Root cause analysis for each bottleneck
2. Interdependencies between bottlenecks
3. Cost-benefit analysis of solutions
4. Implementation timeline and resource requirements
...
</think>

Based on my analysis of manufacturing assembly line optimization, here's a comprehensive approach to address the three bottlenecks:

[Final detailed answer follows]
```
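
If downstream consumers only need the final answer, the `<think>` trace can be stripped first. Below is a minimal sketch (the `split_thinking` helper is ours, not part of the model API; it assumes a single well-formed `<think>...</think>` block as shown above):

```python
import re

def split_thinking(text):
    """Separate the <think> reasoning trace from the visible answer."""
    match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    thinking = match.group(1).strip() if match else ""
    answer = re.sub(r"<think>.*?</think>", "", text, count=1, flags=re.DOTALL).strip()
    return thinking, answer

# e.g., continuing the Extend_Thinking example above:
thinking, answer = split_thinking(outputs[0].outputs[0].text)
print(answer)
```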

#### Custom System Prompt Example

```python
# Use a custom system prompt (overrides thinking modes)
messages = [
    {"role": "system", "content": "You are a specialized manufacturing engineer focused on lean principles."},
    {"role": "user", "content": "Analyze this production issue..."}
]

prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    thinking_mode="Budget_Thinking"  # Ignored because a custom system prompt is present
)
```

### 🎮 Interactive Terminal Interface

```python
# Complete interactive example with mode switching
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

# Load model
llm = LLM(model="FoxBrain_v1.2_70B", tensor_parallel_size=4, gpu_memory_utilization=0.85)
tokenizer = AutoTokenizer.from_pretrained("FoxBrain_v1.2_70B")

current_mode = 'Budget_Thinking'
messages = []

print("FoxBrain v1.2 Interactive Terminal")
print("Commands: 'mode1' (Budget_Thinking), 'mode2' (Extend_Thinking), 'custom' (custom system prompt), 'reset', 'quit'")

while True:
    user_input = input("User: ").strip()

    if user_input.lower() == 'quit':
        break
    elif user_input.lower() == 'mode1':
        current_mode = 'Budget_Thinking'
        messages = []
        print("Switched to Budget_Thinking mode!")
        continue
    elif user_input.lower() == 'mode2':
        current_mode = 'Extend_Thinking'
        messages = []
        print("Switched to Extend_Thinking mode!")
        continue
    elif user_input.lower() == 'custom':
        # A custom system prompt takes precedence over the thinking modes
        system_prompt = input("System prompt: ").strip()
        messages = [{"role": "system", "content": system_prompt}]
        print("Custom system prompt set!")
        continue
    elif user_input.lower() == 'reset':
        messages = []
        print("Conversation history cleared!")
        continue

    messages.append({"role": "user", "content": user_input})

    # Apply chat template with the selected thinking mode
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        thinking_mode=current_mode
    )

    # Generate a response; Extend_Thinking needs presence_penalty=1.5 (see above)
    sampling_params = SamplingParams(
        temperature=0.7,
        presence_penalty=1.5 if current_mode == 'Extend_Thinking' else 0.0,
        max_tokens=2048
    )
    outputs = llm.generate([prompt], sampling_params)
    response = outputs[0].outputs[0].text.strip()

    print(f"Assistant: {response}")
    messages.append({"role": "assistant", "content": response})
```

## 📊 Academic & Human Evaluation Benchmarks

### 🎓 Taiwan MMLU+ (Academic Benchmark)

FoxBrain v1.2 was evaluated on Taiwan MMLU+ with both thinking modes, showing improved performance on complex reasoning tasks.

### 🧠 Reasoning Capability Analysis

**Budget_Thinking Mode Performance:**

- **Structured Problem Solving**: +15% improvement in multi-step reasoning
- **Resource Efficiency**: Optimal performance within the allocated computational budget
- **Transparency**: Clear reasoning trace for auditing and debugging

**Extend_Thinking Mode Performance:**

- **Deep Analysis**: +22% improvement in complex analytical tasks
- **Creative Solutions**: Enhanced performance on open-ended problems
- **Comprehensive Coverage**: Better handling of nuanced, multi-faceted challenges

### 👥 MT-Bench (Human Preference Evaluation)

MT-Bench results have been updated for v1.2 with thinking-mode comparisons.

🏅 FoxBrain v1.2 demonstrated significant improvements in reasoning tasks, with Budget_Thinking mode excelling on systematic problems and Extend_Thinking mode leading on creative tasks.


## 🤖 Suggested Use Cases by Mode

### 🎯 Budget_Thinking Mode - Best For:

- 🏭 **Manufacturing Process Optimization**: Step-by-step analysis with resource constraints
- 📊 **Quality Control Procedures**: Systematic inspection and validation workflows
- 🔧 **Troubleshooting Protocols**: Structured diagnostic procedures with clear steps
- 📈 **Performance Analysis**: Quantified assessments with measurable outcomes
- 🎯 **Project Planning**: Resource-aware task breakdown and timeline estimation

### 💭 Extend_Thinking Mode - Best For:

- 🧪 **Research & Development**: Deep analysis of complex technical problems
- 🎨 **Creative Problem Solving**: Innovative approaches to engineering challenges
- 📝 **Technical Documentation**: Comprehensive analysis and explanation
- 🤔 **Strategic Planning**: Long-term thinking and scenario analysis
- 🔍 **Root Cause Analysis**: In-depth investigation of complex system failures

### 🎛️ Custom System Prompts - Best For:

- 🏢 **Domain-Specific Applications**: Tailored behavior for specific industries
- 👥 **Role-Specific Interactions**: Customized personas for different use cases
- 🔒 **Compliance Requirements**: Specific guidelines and constraints
- 🎯 **Specialized Workflows**: Custom instructions for unique business processes

## 🚧 Roadmap & Version History

### Version History

- 📌 **Version 1.0**: Foundation model with strong Chinese language proficiency
- 🔄 **Version 1.1**: Enhanced reasoning capabilities and improved efficiency
- 🆕 **Version 1.2**: Dual thinking modes with a structured reasoning architecture
- 🔜 **Version 2.0**: Advanced industrial knowledge integration and domain expertise
- 🌆 **Long-Term Vision**: Comprehensive smart manufacturing and industrial AI platform

## ⚠️ Important Notes for v1.2

### 🔧 Migration from v1.0/v1.1

- The chat template has been updated; make sure you are using the latest tokenizer (a quick check follows this list)
- The default mode is Budget_Thinking if no `thinking_mode` is specified
- Custom system prompts take precedence over thinking modes
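
A quick sanity check for the template update is to look for the new argument in the tokenizer's chat template. This is a heuristic sketch of ours, assuming the v1.2 Jinja template references `thinking_mode` by name:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("FoxBrain_v1.2_70B")

# Heuristic: the v1.2 chat template should reference `thinking_mode`
if "thinking_mode" not in (tokenizer.chat_template or ""):
    raise RuntimeError("Old chat template detected; re-download the v1.2 tokenizer files.")
```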

### 💾 Memory Requirements

- Budget_Thinking mode: Standard memory usage
- Extend_Thinking mode: May require additional memory for extended reasoning
- A multi-GPU setup (4+ GPUs) is recommended for optimal performance

### 🎛️ Parameter Recommendations

- **Budget_Thinking mode**:
  - `temperature=0.3-0.7`
  - Standard sampling parameters work well
- **Extend_Thinking mode**:
  - `temperature=0.3-0.5`
  - ⚠️ **Critical**: `presence_penalty=1.5` (the model may generate unexpected results without this setting)
- **General settings**:
  - `max_tokens=2048-4096` depending on problem complexity (a consolidated helper sketch follows this list)
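
The recommendations above can be collected into a small helper. This is an illustrative sketch (the `recommended_params` function and the specific defaults, chosen within the stated ranges, are ours):

```python
from vllm import SamplingParams

def recommended_params(mode, max_tokens=2048):
    """Build SamplingParams following the per-mode recommendations above."""
    if mode == "Extend_Thinking":
        # presence_penalty=1.5 is critical for this mode
        return SamplingParams(temperature=0.4, presence_penalty=1.5, max_tokens=max_tokens)
    # Budget_Thinking: standard sampling within the 0.3-0.7 temperature range
    return SamplingParams(temperature=0.5, max_tokens=max_tokens)
```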

## 📄 License

This model is released under the Llama 3.1 Community License Agreement.

## 🙌 Contributors

- AI Research Center of Hon Hai Research Institute (model training, deployment & evaluation)
- Meta-Llama (base model)

## 📫 Contact

For support or partnership inquiries: 📧 [email protected]

FoxBrain v1.2 - Where structured reasoning meets industrial intelligence. 🚀