---
base_model: OctoAI/OctoThinker-3B
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- spiral
- self-play
- reinforcement-learning
- octothinker
- multi-agent
---

# SPIRAL OctoThinker-3B Multi-Agent Model

This model was trained using the SPIRAL (Self-Play Iterative Reinforcement learning for Adaptation and Learning) framework.

## Model Details

- **Base Model**: OctoAI/OctoThinker-3B
- **Training Framework**: SPIRAL
- **Checkpoint**: step_00128
- **Model Size**: 3B parameters
- **Training Date**: 2025-09-09

## Training Configuration

The model was trained with self-play on multiple environments:

- KuhnPoker-v1
- TicTacToe-v0
- SimpleNegotiation-v2

### Training Parameters

```json
{
  "learning_rate": "1e-6",
  "train_batch_size": 128,
  "num_ppo_epochs": 2,
  "temperature": 1.0,
  "max_model_len": 16384,
  "environments": [
    "KuhnPoker-v1",
    "TicTacToe-v0",
    "SimpleNegotiation-v2"
  ],
  "base_model": "OctoAI/OctoThinker-3B",
  "framework": "SPIRAL"
}
```

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "the-acorn-ai/spiral-octothinker-3b-multi-env-new-step00128"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Tokenize the prompt and move it to the device the model was placed on
inputs = tokenizer("Your prompt here", return_tensors="pt").to(model.device)

# Generate up to 100 new tokens after the prompt
outputs = model.generate(**inputs, max_new_tokens=100)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
```

## License

This model is licensed under the Apache License 2.0.
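
## Sampling with the Training Settings

As a supplement to the Usage section, the sketch below shows one way to reuse the values listed under Training Parameters at inference time. It is an illustration and not part of the SPIRAL framework: `generation_kwargs` is a hypothetical helper, the default of 512 new tokens is arbitrary, and it assumes that `max_model_len` is the total context budget (prompt plus completion). The returned keys (`do_sample`, `temperature`, `max_new_tokens`) are standard keyword arguments of `model.generate` in `transformers`.

```python
import json

# Values copied from the "Training Parameters" section of this card.
TRAINING_PARAMS = json.loads("""
{
  "temperature": 1.0,
  "max_model_len": 16384
}
""")


def generation_kwargs(prompt_tokens, requested_new_tokens=512):
    """Hypothetical helper: build `generate` kwargs from the card's config.

    Assumes `max_model_len` covers the prompt and the completion together.
    """
    budget = int(TRAINING_PARAMS["max_model_len"]) - prompt_tokens
    if budget <= 0:
        raise ValueError("prompt already fills the assumed context budget")
    return {
        "do_sample": True,  # sample instead of greedy decoding
        "temperature": float(TRAINING_PARAMS["temperature"]),
        "max_new_tokens": min(requested_new_tokens, budget),
    }
```

With the `inputs` and `model` objects from the Usage section, this could be called as `model.generate(**inputs, **generation_kwargs(inputs["input_ids"].shape[1]))`. Whether sampling at the training temperature gives the best play in a particular environment is not stated in this card, so treat these settings as a starting point.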