NeuronTrainer
Training classes for AWS Trainium accelerators.
NeuronTrainingArguments
class optimum.neuron.NeuronTrainingArguments
< source >( output_dir: str | None = None overwrite_output_dir: bool = False do_train: bool = False do_eval: bool = False eval_strategy: transformers.trainer_utils.IntervalStrategy | str = 'no' per_device_train_batch_size: int = 1 per_device_eval_batch_size: int = 1 gradient_accumulation_steps: int = 1 learning_rate: float = 5e-05 weight_decay: float = 0.0 adam_beta1: float = 0.9 adam_beta2: float = 0.999 adam_epsilon: float = 1e-08 max_grad_norm: float = 1.0 num_train_epochs: float = 3.0 max_steps: int = -1 lr_scheduler_type: transformers.trainer_utils.SchedulerType | str = 'linear' lr_scheduler_kwargs: dict[str, typing.Any] | str | None = <factory> warmup_ratio: float = 0.0 warmup_steps: int = 0 log_level: str = 'info' log_level_replica: str = 'silent' logging_dir: str | None = None logging_strategy: transformers.trainer_utils.IntervalStrategy | str = 'steps' logging_first_step: bool = False logging_steps: float = 500 save_strategy: transformers.trainer_utils.SaveStrategy | str = 'steps' save_steps: float = 500 save_total_limit: int | None = None save_only_model: bool = False restore_callback_states_from_checkpoint: bool = False seed: int = 42 bf16: bool = False dataloader_drop_last: bool = False eval_steps: float | None = None dataloader_num_workers: int = 0 dataloader_prefetch_factor: int | None = None run_name: str | None = None disable_tqdm: bool | None = None remove_unused_columns: bool | None = True label_names: list[str] | None = None accelerator_config: dict | str | None = None label_smoothing_factor: float = 0.0 optim: transformers.training_args.OptimizerNames | str = 'adamw_torch' optim_args: str | None = None report_to: None | str | list[str] = None resume_from_checkpoint: str | None = None gradient_checkpointing: bool = False gradient_checkpointing_kwargs: dict[str, typing.Any] | str | None = None use_liger_kernel: bool | None = False average_tokens_across_devices: bool | None = False dataloader_prefetch_size: int = None skip_cache_push: bool = False use_autocast: bool = False zero_1: bool = False tensor_parallel_size: int = 1 disable_sequence_parallel: bool = False pipeline_parallel_size: int = 1 pipeline_parallel_num_microbatches: int = -1 kv_size_multiplier: int | None = None num_local_ranks_per_step: int = 8 use_xser: bool = True async_save: bool = False fuse_qkv: bool = False recompute_causal_mask: bool = True )
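Example: a minimal configuration sketch. The output directory and the parallelism settings below are placeholders; every keyword used appears in the signature above.

```python
from optimum.neuron import NeuronTrainingArguments

# Placeholder values; every argument below appears in the signature above.
training_args = NeuronTrainingArguments(
    output_dir="./trn_output",            # where checkpoints and logs are written
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=5e-5,
    num_train_epochs=3,
    bf16=True,                            # train in bfloat16 on the Neuron cores
    zero_1=True,                          # shard optimizer states (ZeRO-1)
    tensor_parallel_size=8,               # split the model over 8 Neuron cores
    logging_steps=10,
    save_steps=500,
)
```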
Returns the log level to be used depending on whether this process is the main process of node 0, the main process of a non-0 node, or a non-main process.
For the main process the log level defaults to the logging level set (logging.WARNING if you didn’t do anything) unless overridden by the log_level argument.
For the replica processes the log level defaults to logging.WARNING unless overridden by the log_level_replica argument.
The choice between the main and replica process settings is made according to the return value of should_log.
Get the number of steps used for a linear warmup.
Serializes this instance while replacing Enum members by their values (for JSON serialization support). Token values are obfuscated by removing them.
Serializes this instance to a JSON string.
Sanitized serialization to use with TensorBoard’s hparams.
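The serialization helpers described above mirror the transformers.TrainingArguments API; assuming the usual to_dict, to_json_string and to_sanitized_dict method names from that API, usage looks roughly like:

```python
# Assumes the standard transformers.TrainingArguments method names,
# which NeuronTrainingArguments mirrors.
config_dict = training_args.to_dict()          # Enum members replaced by values, tokens obfuscated
config_json = training_args.to_json_string()   # same content as a JSON string
hparams = training_args.to_sanitized_dict()    # flat dict suitable for TensorBoard hparams
```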
NeuronTrainer
class optimum.neuron.NeuronTrainer
< source >( model: transformers.modeling_utils.PreTrainedModel | torch.nn.modules.module.Module args: NeuronTrainingArguments data_collator: typing.Optional[transformers.data.data_collator.DataCollator] = None train_dataset: Dataset | IterableDataset | datasets.Dataset | None = None eval_dataset: Dataset | dict[str, Dataset] | datasets.Dataset | None = None processing_class: transformers.tokenization_utils_base.PreTrainedTokenizerBase | transformers.image_processing_utils.BaseImageProcessor | transformers.feature_extraction_utils.FeatureExtractionMixin | transformers.processing_utils.ProcessorMixin | None = None callbacks: list[transformers.trainer_callback.TrainerCallback] | None = None optimizers: tuple[torch.optim.optimizer.Optimizer | None, torch.optim.lr_scheduler.LambdaLR | None] = (None, None) optimizer_cls_and_kwargs: tuple[type[torch.optim.optimizer.Optimizer], dict[str, typing.Any]] | None = None tokenizer: transformers.tokenization_utils_base.PreTrainedTokenizerBase | None = None )
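A minimal end-to-end sketch. The model id is a placeholder, the dummy dataset only illustrates the expected columns, the parallelism setting is illustrative, and the final trainer.train() call follows the usual Trainer entry point.

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.neuron import NeuronTrainer, NeuronTrainingArguments

# Placeholder model id; any causal LM supported by optimum-neuron works the same way.
model_id = "Qwen/Qwen2.5-0.5B"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Tiny dummy dataset, only to illustrate the expected columns (input_ids / attention_mask / labels).
encodings = tokenizer(["Hello from Trainium!"] * 8)
encodings["labels"] = [ids.copy() for ids in encodings["input_ids"]]
train_dataset = Dataset.from_dict(dict(encodings))

training_args = NeuronTrainingArguments(
    output_dir="./trn_output",
    bf16=True,
    per_device_train_batch_size=1,
    tensor_parallel_size=2,   # illustrative; must match the available Neuron cores
)

trainer = NeuronTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```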
add_callback
< source >( callback: typing.Union[typing.Type[transformers.trainer_callback.TrainerCallback], transformers.trainer_callback.TrainerCallback] )
Add a callback to the current list of TrainerCallback.
A helper wrapper that creates an appropriate context manager for autocast while feeding it the desired arguments, depending on the situation.
Creates a NeuronAccelerator instance and prepares the model for distributed training.
Set up the optimizer.
We provide a reasonable default that works well. If you want to use something else, you can pass a tuple through the optimizers argument in the NeuronTrainer init, or override this method in a subclass.
Set up the optimizer and the learning rate scheduler.
We provide a reasonable default that works well. If you want to use something else, you can pass a tuple through the optimizers argument in the NeuronTrainer init, or override this method (or create_optimizer and/or create_scheduler) in a subclass.
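A sketch of the two customization routes described above, reusing the model, args and dataset from the earlier NeuronTrainer sketch: either pass a ready-made (optimizer, scheduler) tuple at init, or override create_optimizer in a subclass. The hyperparameter values are illustrative only.

```python
import torch
from optimum.neuron import NeuronTrainer

# Option 1: pass an (optimizer, lr_scheduler) tuple through the `optimizers` argument.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda step: 1.0)
trainer = NeuronTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    optimizers=(optimizer, scheduler),
)

# Option 2: subclass and override create_optimizer (and/or create_scheduler) instead.
class MyTrainer(NeuronTrainer):
    def create_optimizer(self):
        # Illustrative override: plain SGD over all parameters.
        self.optimizer = torch.optim.SGD(self.model.parameters(), lr=1e-3)
        return self.optimizer
```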
create_scheduler
< source >( num_training_steps: int optimizer: torch.optim.optimizer.Optimizer | None = None )
Set up the scheduler. The optimizer of the trainer must have been set up either before this method is called or passed as an argument.
Get all parameter names that weight decay will be applied to.
This function filters out parameters in two ways:
- By layer type (instances of layers specified in ALL_LAYERNORM_LAYERS)
- By parameter name patterns (containing ‘bias’, ‘layernorm’, or ‘rmsnorm’)
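Continuing the trainer sketch above, and assuming the method takes the model as argument as in the transformers Trainer API, the returned names can be used to build two parameter groups:

```python
# Assumes get_decay_parameter_names(model), as in the transformers Trainer API.
decay_names = set(trainer.get_decay_parameter_names(trainer.model))

decay_params = [
    p for n, p in trainer.model.named_parameters() if n in decay_names and p.requires_grad
]
no_decay_params = [
    p for n, p in trainer.model.named_parameters() if n not in decay_names and p.requires_grad
]

param_groups = [
    {"params": decay_params, "weight_decay": trainer.args.weight_decay},
    {"params": no_decay_params, "weight_decay": 0.0},
]
```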
Returns the learning rate of each parameter from self.optimizer.
Get the number of trainable parameters.
get_optimizer_cls_and_kwargs
< source >( args: TrainingArguments model: transformers.modeling_utils.PreTrainedModel | None = None )
Returns the optimizer class and optimizer parameters based on the training arguments.
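Continuing the parameter-group sketch above, and assuming the static call style of the transformers Trainer API, the resolved optimizer class and keyword arguments can be used directly:

```python
# Assumes the static call style of the transformers Trainer API.
optimizer_cls, optimizer_kwargs = NeuronTrainer.get_optimizer_cls_and_kwargs(training_args)
optimizer = optimizer_cls(param_groups, **optimizer_kwargs)
```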
get_optimizer_group
< source >( param: str | torch.nn.parameter.Parameter | None = None )
Returns the optimizer group for a parameter if one is given; otherwise returns all optimizer groups.
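A small usage sketch, reusing the trainer from the earlier example; the parameter name passed in is hypothetical and depends on the model.

```python
# All optimizer parameter groups.
groups = trainer.get_optimizer_group()

# The group a specific parameter belongs to (the name below is hypothetical).
embedding_group = trainer.get_optimizer_group("model.embed_tokens.weight")
```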
Returns the training DataLoader with appropriate sampler and batch size.
Whether or not this process is the local (e.g., on one machine if training in a distributed fashion on several machines) main process.
Whether or not this process is the global main process (when training in a distributed fashion on several machines, this is only going to be True for one process).
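The two descriptions above correspond to the usual local and global main-process checks of the transformers Trainer API; assuming the same is_local_process_zero / is_world_process_zero method names here, a typical guard looks like:

```python
# Assumes the method names from the transformers Trainer API.
if trainer.is_world_process_zero():
    print("Only the single global main process reaches this branch.")

if trainer.is_local_process_zero():
    print("One process per machine reaches this branch.")
```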
Log training metrics to the state history and callbacks.
Log training step metrics if logging is due.
Save checkpoint if saving is due.
Helper to get the number of samples in a ~torch.utils.data.DataLoader by accessing its dataset. When dataloader.dataset does not exist or has no length, estimates as best it can.
Helper to get the number of tokens in a ~torch.utils.data.DataLoader by enumerating the dataloader.
pop_callback
< source >( callback: typing.Union[typing.Type[transformers.trainer_callback.TrainerCallback], transformers.trainer_callback.TrainerCallback] ) → TrainerCallback | None
Remove a callback from the current list of TrainerCallback and return it.
If the callback is not found, returns None (and no error is raised).
remove_callback
< source >( callback: typing.Union[typing.Type[transformers.trainer_callback.TrainerCallback], transformers.trainer_callback.TrainerCallback] )
Remove a callback from the current list of TrainerCallback.
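A toy callback sketch showing the add / pop / remove methods described above; LossPrinterCallback is a hypothetical example class.

```python
from transformers import TrainerCallback

class LossPrinterCallback(TrainerCallback):
    """Toy callback that prints the loss whenever metrics are logged."""

    def on_log(self, args, state, control, logs=None, **kwargs):
        if logs and "loss" in logs:
            print(f"step {state.global_step}: loss={logs['loss']:.4f}")

printer = LossPrinterCallback()
trainer.add_callback(printer)                 # an instance; passing the class also works
trainer.remove_callback(printer)              # detach it again, nothing is returned

trainer.add_callback(LossPrinterCallback)     # pass the class: it is instantiated for you
removed = trainer.pop_callback(LossPrinterCallback)   # detach it and get the instance back, or None
```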
set_initial_training_values
< source >( args: NeuronTrainingArguments dataloader: DataLoader total_train_batch_size: int )
Calculates and returns the following values:
- num_train_epochs
- num_update_steps_per_epoch
- num_examples
- num_train_samples
- epoch_based
- len_dataloader
- max_steps
setup_training
< source >( train_dataloader: DataLoader max_steps: int num_train_epochs: int num_examples: int total_train_batch_size: int )
Set up everything needed for the training loop. This method does not return anything but initializes many attributes of the class for training.