FLAN‑T5‑small · PubMedQA (LoRA/QLoRA)

This repository contains a parameter‑efficient fine‑tuning of the FLAN‑T5‑small model for a biomedical question‑answering task.
The model produces one of three answers — yes, no, or maybe — given a question and a short context drawn from biomedical abstracts.
Training uses QLoRA (4‑bit NF4 quantization via bitsandbytes) together with LoRA adapters on the base model, enabling training and inference on low‑VRAM hardware.


Model Details

Model Description

  • Architecture: Encoder–decoder transformer (T5 family, FLAN‑T5‑small)
  • Objective: Sequence‑to‑sequence generation of one of three labels (yes, no, maybe)
  • Parameter‑efficient training: LoRA adapters trained on top of a 4‑bit‑quantized base model (QLoRA)
  • Language: English (biomedical literature)
  • Finetuned from: google/flan‑t5‑small
  • Intended format: Use as LoRA adapters (recommended) or merge into a full model for deployment.


Uses

Direct Use

  • Biomedical yes/no/maybe question answering on short context passages (e.g., sentences or abstracts).
  • Deployment via adapters using the PEFT framework, enabling small checkpoints and flexible precision.

Downstream Use

  • Component in biomedical literature triage systems or heuristic pipelines.
  • Starting point for further PEFT‑style adaptation on related biomedical QA datasets.

Out‑of‑Scope Use

  • Clinical decision making: Not a medical device. Do not use for diagnosis or treatment.
  • Free‑form generation outside the label space (yes, no, maybe).
  • Long document reasoning without retrieval or summarization.

Bias, Risks, and Limitations

  • Domain bias: The model is trained solely on PubMedQA; performance may degrade on layperson or cross‑domain biomedical text.
  • Restricted vocabulary: Optimized to output only yes, no, or maybe.
  • Hallucination: Like all seq2seq models, it can produce incorrect or over‑confident outputs.
  • Safety: Do not rely on the model for clinical advice; human oversight is required.

Recommendations:

  • Clamp outputs to the label set and log confidence proxies (e.g., beam scores); see the sketch below.
  • For longer contexts, consider retrieval‑augmented preprocessing or chunking.
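
One way to implement the first recommendation is sketched below. The predict_with_confidence helper is hypothetical; it assumes a model and tokenizer loaded as in the Getting Started section, and uses beam‑search sequences_scores as a rough confidence proxy.

def predict_with_confidence(question, context, model, tokenizer):
    labels = {"yes", "no", "maybe"}
    prompt = f"question: {question} context: {context}"
    inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
    # Beam search with scores exposed; sequences_scores holds the
    # length-normalized log-probability of the best beam.
    out = model.generate(
        **inputs,
        max_new_tokens=4,
        num_beams=4,
        return_dict_in_generate=True,
        output_scores=True,
    )
    text = tokenizer.batch_decode(out.sequences, skip_special_tokens=True)[0]
    text = text.strip().lower().rstrip(".")
    prediction = text if text in labels else "maybe"  # clamp to the label set
    confidence = out.sequences_scores[0].item()       # higher = less uncertain
    return prediction, confidence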

How to Get Started with the Model

You can load the model using the Transformers and PEFT libraries.
The recommended approach is to load the base model and LoRA adapter separately.

Example: Using Adapters

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

base_model_id = "google/flan-t5-small"
adapter_id = "MileStanislavov/flan-t5-small-pubmedqa-lora"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForSeq2SeqLM.from_pretrained(base_model_id, quantization_config=bnb_config)
model = PeftModel.from_pretrained(base_model, adapter_id)

def predict_yes_no_maybe(question, context):
    prompt = f"question: {question} context: {context}"
    inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=4, num_beams=4, do_sample=False)
    text = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0].strip().lower().replace(".", "")
    return text if text in {"yes", "no", "maybe"} else "maybe"
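
An illustrative call (the question/context pair below is made up for demonstration and is not drawn from PubMedQA):

answer = predict_yes_no_maybe(
    "Does aspirin reduce the risk of recurrent stroke?",
    "In a randomized trial, patients receiving low-dose aspirin had fewer recurrent ischemic events than controls.",
)
print(answer)  # one of "yes", "no", "maybe"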

Example: Using Merged Model

If you have merged the LoRA weights into the base model using merge_and_unload(), load the merged model directly:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model = AutoModelForSeq2SeqLM.from_pretrained("your-username/flan-t5-small-pubmedqa-merged")
tokenizer = AutoTokenizer.from_pretrained("your-username/flan-t5-small-pubmedqa-merged")
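
To produce such a merged checkpoint in the first place, a minimal sketch is shown below. The base model is loaded in full precision here to keep the merge straightforward, and the output path is a placeholder.

from transformers import AutoModelForSeq2SeqLM
from peft import PeftModel

base = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")  # full precision, not 4-bit
merged = PeftModel.from_pretrained(base, "MileStanislavov/flan-t5-small-pubmedqa-lora").merge_and_unload()
merged.save_pretrained("flan-t5-small-pubmedqa-merged")  # placeholder local path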

Training Details

Training Data

  • Dataset: PubMedQA pqa_labeled
  • Preprocessing: Stratified train/validation/test split by label.
    • Inputs formatted as question: <q> context: <context> (truncated to 512 tokens)
    • Targets truncated to 8 tokens
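
A sketch of this formatting, assuming the Hugging Face pqa_labeled schema (fields question, context["contexts"], and final_decision) and an already‑loaded tokenizer; the preprocess helper is illustrative, not the exact training script.

def preprocess(example, tokenizer):
    # Join the abstract passages and apply the "question: ... context: ..." template
    context = " ".join(example["context"]["contexts"])
    model_inputs = tokenizer(
        f"question: {example['question']} context: {context}",
        max_length=512,
        truncation=True,
    )
    # Targets are the yes/no/maybe labels, truncated to 8 tokens
    target = tokenizer(example["final_decision"], max_length=8, truncation=True)
    model_inputs["labels"] = target["input_ids"]
    return model_inputs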

Training Procedure

  • Adapters: LoRA with rank 16, α = 32, dropout 0.05
  • Target Modules: ["q", "k", "v", "o", "wi_0", "wi_1", "wo"] (T5 attention and feed‑forward projections)
  • Quantization: 4‑bit NF4 with double quantization; compute dtype float16 or bfloat16
  • Optimizer: Adafactor, constant learning rate of 2e-4
  • Batching: Effective batch size ≈ 8 (per‑device 4, gradient accumulation 2)
  • Epochs: 3
  • Memory optimizations: Gradient checkpointing enabled; the quantized base model is prepared for k‑bit training with input gradients enabled
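
A sketch of the adapter and trainer configuration these settings imply, using peft and transformers (the output directory is a placeholder; dataset loading and the Trainer call are omitted):

import torch
from transformers import AutoModelForSeq2SeqLM, BitsAndBytesConfig, Seq2SeqTrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)
base = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small", quantization_config=bnb_config)
base = prepare_model_for_kbit_training(base)  # k-bit prep; also enables input gradients

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q", "k", "v", "o", "wi_0", "wi_1", "wo"],
    task_type="SEQ_2_SEQ_LM",
)
model = get_peft_model(base, lora_config)

training_args = Seq2SeqTrainingArguments(
    output_dir="flan-t5-small-pubmedqa-lora",  # placeholder output path
    optim="adafactor",
    learning_rate=2e-4,
    lr_scheduler_type="constant",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,
    num_train_epochs=3,
    gradient_checkpointing=True,
)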


Evaluation

Evaluation uses the validation and test splits from PubMedQA.
Metrics include accuracy and macro‑F1.
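
A minimal sketch of the metric computation using scikit-learn (the evaluate helper is illustrative; gold and preds are parallel lists of yes/no/maybe labels):

from sklearn.metrics import accuracy_score, f1_score

def evaluate(gold, preds):
    # gold and preds: parallel lists of "yes" / "no" / "maybe" strings
    return {
        "accuracy": accuracy_score(gold, preds),
        "macro_f1": f1_score(gold, preds, average="macro"),
    }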


Environmental Impact

The parameter‑efficient approach (QLoRA + LoRA) significantly reduces compute requirements and energy usage compared to full fine‑tuning.

Estimate CO₂ emissions using the Machine Learning Impact calculator.


Model card authored by Mile Stanislavov.
Please contact Mile Stanislavov for questions.
