Best of N sampling: an alternative way to get better model output without RL-based fine-tuning
Within the extras module is the BestOfNSampler class, which serves as an alternative method of generating better model output. For a comparison against RL-based fine-tuning, see the comparison example in the examples directory.
Usage
To get started quickly, create an instance of the class with a model, a length sampler, a tokenizer, and a callable that serves as a proxy reward pipeline, returning reward scores for input queries:
from transformers import pipeline, AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead
from trl.core import LengthSampler
from trl.extras import BestOfNSampler

# ref_model_name, reward_model and device are assumed to be defined elsewhere,
# e.g. a causal LM checkpoint name, a classifier checkpoint and a torch device
model = AutoModelForCausalLMWithValueHead.from_pretrained(ref_model_name)
reward_pipe = pipeline("sentiment-analysis", model=reward_model, device=device)
tokenizer = AutoTokenizer.from_pretrained(ref_model_name)
tokenizer.pad_token = tokenizer.eos_token

# sampler for the length of the generated output (bounds are illustrative)
output_length_sampler = LengthSampler(4, 16)

# callable that takes a list of raw text and returns a list of corresponding reward scores
def queries_to_scores(list_of_strings):
    return [output["score"] for output in reward_pipe(list_of_strings)]

best_of_n = BestOfNSampler(model, tokenizer, queries_to_scores, length_sampler=output_length_sampler)
Assuming you have a list or tensor of tokenized queries, you can generate better output by calling the generate method:
best_of_n.generate(query_tensors, device=device, **gen_kwargs)
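As a concrete illustration, here is a minimal sketch that builds query tensors and reads the results, assuming the tokenizer, device and best_of_n instance from above; the prompts and generation settings are purely illustrative:

import torch

queries = ["This movie was", "The soundtrack of the film"]  # illustrative prompts
query_tensors = [torch.tensor(tokenizer.encode(q)) for q in queries]

gen_kwargs = {"top_k": 0, "top_p": 1.0, "do_sample": True}  # illustrative settings
results = best_of_n.generate(query_tensors, device=device, **gen_kwargs)

for query, candidates in zip(queries, results):
    # each inner list holds the top-scoring candidate(s) for the corresponding query
    print(query, "->", candidates[0])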
The default sample size is 4, but you can change it at the time of instance initialization like so
best_of_n = BestOfNSampler(model, tokenizer, queries_to_scores, length_sampler=output_length_sampler, sample_size=8)
By default, the output is the top-scoring candidate for each query, but you can change this to the top 2 and so on by passing the n_candidates argument at initialization:
best_of_n = BestOfNSampler(model, tokenizer, queries_to_scores, length_sampler=output_length_sampler, n_candidates=2)
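With n_candidates=2, each inner list returned by generate holds the two highest-scoring candidates for its query, for example:

results = best_of_n.generate(query_tensors, device=device)
for candidates in results:
    # the two highest-scoring outputs for this query
    print(candidates)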
You can also set generation settings (like temperature or pad_token_id) at the time of instance creation, as opposed to when calling the generate method, by passing a GenerationConfig from the transformers library at initialization:
from transformers import GenerationConfig
generation_config = GenerationConfig(min_length=-1, top_k=0, top_p=1.0, do_sample=True, pad_token_id=tokenizer.eos_token_id)
best_of_n = BestOfNSampler(model, tokenizer, queries_to_scores, length_sampler=output_length_sampler, generation_config=generation_config)
best_of_n.generate(query_tensors, device=device)
Furthermore, at initialization you can set a seed to make the generation process repeatable, along with the number of samples to generate for each query.
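For example, a sketch that fixes the seed and draws eight samples per query (the values are illustrative):

best_of_n = BestOfNSampler(
    model,
    tokenizer,
    queries_to_scores,
    length_sampler=output_length_sampler,
    sample_size=8,  # number of samples drawn per query
    seed=0,         # fixes the seed so repeated runs generate the same outputs
)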
BestOfNSampler
class trl.BestOfNSampler( model: PreTrainedModelWrapper, tokenizer: Union[PreTrainedTokenizer, PreTrainedTokenizerFast], queries_to_scores: Callable[[list[str]], list[float]], length_sampler: Any, sample_size: int = 4, seed: Optional[int] = None, n_candidates: int = 1, generation_config: Optional[GenerationConfig] = None )
Parameters
- model (PreTrainedModelWrapper) — The pretrained model to use for generation.
- tokenizer (PreTrainedTokenizer or PreTrainedTokenizerFast) — Tokenizer associated with the pretrained model.
- queries_to_scores (Callable[[list[str]], list[float]]) — Callable that takes a list of generated texts and returns the associated reward scores.
- length_sampler (Any) — Sampler used to sample the length of the generated text.
- sample_size (int, optional, defaults to 4) — Number of samples to generate for each query.
- seed (int, optional) — Random seed used to control generation.
- n_candidates (int, optional, defaults to 1) — Number of candidates to return for each query.
- generation_config (GenerationConfig, optional) — Generation config passed to the underlying model's generate method. See GenerationConfig for more details.
Sampler for best-of-n generation.
generate
( tokenized_query: Union[list[int], torch.Tensor, list[torch.Tensor], list[list[int]]], skip_special_tokens: bool = True, device: Optional[Union[str, torch.device]] = None, **generation_kwargs ) → list[list[str]]
Parameters
- tokenized_query (list[int] or torch.Tensor or list[torch.Tensor] or list[list[int]]) — Either a single tokenized query (a single tensor or a list of integers) or a batch of tokenized queries (a list of tensors or a list of lists of integers).
- skip_special_tokens (bool, optional, defaults to True) — Whether to remove the special tokens from the output.
- device (str or torch.device, optional) — The device on which the model will be loaded.
- **generation_kwargs — Additional keyword arguments passed along to the underlying model's generate method. These override the generation config set at initialization.
Returns
list[list[str]]
A list of lists of generated texts.
Generate the best of n samples for input queries.
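As the tokenized_query parameter indicates, generate accepts both a single query and a batch. A minimal sketch, reusing the names from the Usage section above:

# single query: one tensor or one list of token ids
single_result = best_of_n.generate(query_tensors[0], device=device)

# batch of queries: a list of tensors or a list of lists of token ids
batch_results = best_of_n.generate(query_tensors, device=device)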