Longformer Encoder-Decoder (LED) for Narrative-Esque Long Text Summarization


A fine-tuned version of allenai/led-large-16384 on the BookSum dataset.

Goal: a model that generalizes well and is useful for summarizing long text in academic and everyday settings. The result works well on a wide variety of text and can handle up to 16,384 tokens per batch (provided you have the GPU memory for it).

Note: the hosted inference API is set to generate a maximum of 64 tokens for runtime reasons, so summaries may be truncated depending on the length of the input text. For best results, use Python as shown below.


Usage - Basic

  • Use encoder_no_repeat_ngram_size=3 when calling the pipeline object to improve summary quality.
    • This forces the model to use new vocabulary and create an abstractive summary; otherwise, it may simply compile the best extractive summary from the input provided.

Load the model into a pipeline object:

import torch
from transformers import pipeline

hf_name = 'pszemraj/led-large-book-summary'

summarizer = pipeline(
    "summarization",
    hf_name,
    device=0 if torch.cuda.is_available() else -1,
)
  • Feed your text into the pipeline object:
wall_of_text = "your words here"

result = summarizer(
    wall_of_text,
    min_length=16,
    max_length=256,
    no_repeat_ngram_size=3,
    encoder_no_repeat_ngram_size=3,
    repetition_penalty=3.5,
    num_beams=4,
    early_stopping=True,
)
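
The pipeline returns a list with one dictionary per input; the generated text lives under the summary_text key:

# print the generated summary from the first (and only) result
print(result[0]["summary_text"])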

Note: the global attention mask should be set during generation to produce the best-quality summaries:

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

hf_name = 'pszemraj/led-large-book-summary'
tokenizer = AutoTokenizer.from_pretrained(hf_name)
model = AutoModelForSeq2SeqLM.from_pretrained(hf_name).to("cuda")

def generate_answer(batch):
    # tokenize up to the model's 16384-token context window
    inputs_dict = tokenizer(batch["article"], padding="max_length", max_length=16384, return_tensors="pt", truncation=True)
    input_ids = inputs_dict.input_ids.to("cuda")
    attention_mask = inputs_dict.attention_mask.to("cuda")

    # put global attention on the <s> token only
    global_attention_mask = torch.zeros_like(attention_mask)
    global_attention_mask[:, 0] = 1

    predicted_abstract_ids = model.generate(input_ids, attention_mask=attention_mask, global_attention_mask=global_attention_mask, max_length=512, num_beams=4)
    batch["predicted_abstract"] = tokenizer.batch_decode(predicted_abstract_ids, skip_special_tokens=True)
    return batch
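
A short sketch of applying generate_answer over a dataset; cnn_dailymail is used here purely for illustration because its articles are stored in an "article" column matching the function above. Substitute your own data as needed:

from datasets import load_dataset

# small illustrative slice; any dataset with an "article" column works here
dataset = load_dataset("cnn_dailymail", "3.0.0", split="test[:8]")

summarized = dataset.map(generate_answer, batched=True, batch_size=2)
print(summarized["predicted_abstract"][0])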

Training and evaluation data

  • Training uses the BookSum dataset (a minimal loading sketch follows this list).
  • During training, the input text was the chapter text and the target output was the summary_text field.
  • Eval results can be found on the model's Hugging Face page, with metrics shown in the sidebar.
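
A minimal sketch of loading the data with the datasets library; the kmfoda/booksum Hub mirror is an assumption here, so substitute whichever copy of BookSum you use:

from datasets import load_dataset

# "kmfoda/booksum" is an assumed Hub mirror of the BookSum dataset; adjust as needed
booksum = load_dataset("kmfoda/booksum")
print(booksum)  # inspect the available splits and columns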

Training procedure

  • Training completed on the BookSum dataset for 13 total epochs
  • The final four epochs combined the training and validation sets as 'train' in an effort to increase generalization (see the merging sketch below).
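
A sketch of how the train and validation splits could be merged for those final epochs (illustrative only; the exact preprocessing script is not part of this card, and the kmfoda/booksum mirror is the same assumption as above):

from datasets import load_dataset, concatenate_datasets

# assumes the kmfoda/booksum mirror mentioned above
booksum = load_dataset("kmfoda/booksum")

# combine train + validation into a single training split
combined_train = concatenate_datasets([booksum["train"], booksum["validation"]])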

Training hyperparameters

Initial Three Epochs

The following hyperparameters were used during training (an illustrative Seq2SeqTrainingArguments sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3
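
For orientation, a minimal sketch of how these settings could be expressed as Hugging Face Seq2SeqTrainingArguments; this is an illustrative reconstruction rather than the actual training script, and output_dir is a placeholder:

from transformers import Seq2SeqTrainingArguments

# illustrative mapping of the hyperparameters listed above; not the original training script
training_args = Seq2SeqTrainingArguments(
    output_dir="./led-large-book-summary",  # placeholder output path
    learning_rate=5e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    seed=42,
    gradient_accumulation_steps=4,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=3,
)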

In-between Epochs

Unfortunately, complete records for the middle epochs are not on hand; the following should be representative:

  • learning_rate: 4e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 6 (in addition to prior model)

Final Two Epochs

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.03
  • num_epochs: 2 (in addition to prior model)

Framework versions

  • Transformers 4.19.2
  • Pytorch 1.11.0+cu113
  • Datasets 2.2.2
  • Tokenizers 0.12.1