---
language: en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- medical
- healthcare
- medical-feature-extraction
- clinical-nlp
- calibration
- instruction-fine-tuned
- nlp
- mistral
base_model: unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit
datasets:
- nbme-score-clinical-patient-notes
---

# Mistral_calibrative_few

## Model Description

This model is the few-shot, calibrative fine-tuned variant of Multi-CONFE (Confidence-Aware Medical Feature Extraction), built on [unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit](https://huggingface.co/unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit). It is notably data-efficient, achieving near state-of-the-art performance while training on only 12.5% of the available data, with particular emphasis on confidence calibration and hallucination reduction.

## Intended Use

This model is designed for extracting clinically relevant features from medical patient notes with high accuracy and well-calibrated confidence scores in low-resource settings. It is particularly useful for automated assessment of medical documentation, such as USMLE Step-2 Clinical Skills notes, when training data is limited.

## Training Data

The model was trained on just 100 annotated patient notes (12.5% of the full dataset) from the [NBME - Score Clinical Patient Notes](https://www.kaggle.com/competitions/nbme-score-clinical-patient-notes) Kaggle competition, roughly 10 examples per clinical case type. The dataset contains USMLE Step-2 Clinical Skills patient notes covering 10 different clinical cases, each carrying expert annotations for the medical features to be extracted.
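
A minimal sketch of how such a subset can be drawn from the competition files; the column names follow the public NBME CSV schema, while the per-case sample size and random seed below are illustrative assumptions:

```python
import pandas as pd

# NBME competition files: patient_notes.csv maps notes to cases,
# train.csv holds the expert span annotations
notes = pd.read_csv("patient_notes.csv")  # pn_num, case_num, pn_history
train = pd.read_csv("train.csv")          # pn_num, feature_num, annotation, location

# Keep only notes that actually have expert annotations
annotated = notes[notes["pn_num"].isin(train["pn_num"])]

# ~10 annotated notes per clinical case (10 cases -> ~100 notes);
# the seed is illustrative, not the one used for this model
few_shot = annotated.groupby("case_num", group_keys=False).apply(
    lambda g: g.sample(n=min(10, len(g)), random_state=42)
)
print(len(few_shot), "notes selected")
```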

## Training Procedure

Training involved a two-phase approach:

1. **Instructive Few-Shot Fine-Tuning**: Initial alignment of the model with the medical feature extraction task, using Mistral Nemo Instruct as the base model.
2. **Calibrative Fine-Tuning**: Integration of confidence calibration mechanisms, including bidirectional feature mapping, complexity-aware confidence adjustment, and dynamic thresholding (see the illustrative sketch after this list).
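
The exact calibration formulas belong to Multi-CONFE and are not restated on this card; the following is only an illustrative sketch of the general idea, and both the penalty form and the threshold adjustment are assumptions:

```python
def adjust_confidence(raw_conf: float, feature_complexity: float,
                      base_threshold: float = 0.7) -> tuple[float, bool]:
    """Illustrative only: down-weight confidence for complex features and
    relax the acceptance threshold accordingly (dynamic thresholding).
    The actual Multi-CONFE adjustment is not reproduced here."""
    adjusted = raw_conf * (1.0 - 0.2 * feature_complexity)  # assumed penalty form
    threshold = base_threshold - 0.1 * feature_complexity   # assumed dynamic threshold
    return adjusted, adjusted >= threshold

# A moderately complex feature with high raw confidence is still accepted
conf, keep = adjust_confidence(raw_conf=0.85, feature_complexity=0.5)
print(f"adjusted confidence {conf:.2f}, keep feature: {keep}")
```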

Training hyperparameters (a configuration sketch follows the list):

- Base model: unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit
- LoRA rank: 32
- Training epochs: 14 (instructive phase) + 5 (calibrative phase)
- Learning rate: 2e-4 (instructive phase), 1e-4 (calibrative phase)
- Optimizer: AdamW (8-bit)
- Hallucination weight: 0.2
- Missing feature weight: 0.5
- Confidence threshold: 0.7
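
A minimal sketch of how these hyperparameters map onto a PEFT/LoRA setup for the instructive phase; the target modules, LoRA alpha, and dropout below are assumptions not stated on this card:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

# Load the 4-bit base model (requires bitsandbytes)
base = AutoModelForCausalLM.from_pretrained(
    "unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit", device_map="auto"
)

# LoRA rank 32 as listed above; alpha, dropout, and target modules are assumed
lora = LoraConfig(
    r=32,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)

# Instructive phase: 14 epochs at 2e-4 with 8-bit AdamW
args = TrainingArguments(
    output_dir="mistral-confe-instructive",
    num_train_epochs=14,
    learning_rate=2e-4,
    optim="adamw_bnb_8bit",
)
```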

## Performance

On the USMLE Step-2 Clinical Skills notes dataset:

- Precision: 0.982
- Recall: 0.964
- F1 Score: 0.973

The model reaches this performance with only 12.5% of the training data used for the full model, demonstrating strong data efficiency. It reduces hallucinated features by 84.9% and missing features by 85.0% compared to the vanilla base model. This makes it particularly valuable for domains where annotated data is scarce or expensive to obtain.
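
For context, the competition-style metric is a micro-averaged F1 computed over predicted versus annotated character spans; a simplified sketch of that scoring (treating each span as a set of character indices) is:

```python
def span_f1(pred_spans, gold_spans):
    """Micro precision/recall/F1 over character indices, in the spirit of
    the NBME evaluation. Spans are (start, end) character offsets;
    simplified sketch, not the official scoring code."""
    pred = {i for start, end in pred_spans for i in range(start, end)}
    gold = {i for start, end in gold_spans for i in range(start, end)}
    tp = len(pred & gold)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1

# Partial overlap earns partial credit
print(span_f1(pred_spans=[(0, 10)], gold_spans=[(5, 15)]))  # (0.5, 0.5, 0.5)
```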

## Limitations

- The model was evaluated on standardized USMLE Step-2 Clinical Skills notes and may require adaptation for other clinical domains.
- Some errors stem from knowledge gaps in specific medical terminology or from inconsistencies in annotation.
- Performance on multilingual or non-standardized clinical notes remains untested.
- While highly effective, it still performs slightly below the full-data model (F1 score 0.973 vs. 0.981).

## Ethical Considerations

Automated assessment systems must ensure fairness across different student populations. While the calibration mechanism enhances interpretability, systematic bias testing is recommended before deployment in high-stakes assessment scenarios. When using this model for educational assessment, we recommend:

1. Implementing a human-in-the-loop validation process
2. Auditing regularly for demographic parity
3. Communicating clearly to students about the use of AI in assessment

## How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "Manal0809/Mistral_calibrative_few"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",  # place the model on GPU if one is available
)

# Example input
patient_note = """HPI: 35 yo F with heavy uterine bleeding. Last normal period was 6 month ago.
LMP was 2 months ago. No clots.
Changes tampon every few hours, previously 4/day. Menarche at 12.
Attempted using OCPs for menstrual regulation previously but unsuccessful.
Two adolescent children (ages unknown) at home.
Last PAP 6 months ago was normal, never abnormal.
Gained 10-15 lbs over the past few months, eating out more though.
Hyperpigmented spots on hands and LT neck that she noticed 1-2 years ago.
SH: state social worker; no smoking or drug use; beer or two on weekends;
sexually active with boyfriend of 14 months, uses condoms at first but no longer uses them."""

features_to_extract = ["35-year", "Female", "heavy-periods", "symptoms-for-6-months",
                       "Weight-Gain", "Last-menstrual-period-2-months-ago",
                       "Fatigue", "Unprotected-Sex", "Infertility"]

# Format the input as shown in the paper
input_text = f"""###instruction: Extract medical features from the patient note.
###patient_history: {patient_note}
###features: {features_to_extract}
### Annotation:"""

# Generate output (sampling must be enabled for temperature to take effect)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.2,
    num_return_sequences=1,
)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
```
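
Because causal LMs echo the prompt, the model's extracted annotations are the text after the final `### Annotation:` marker; the exact structure of that annotation block follows the training format and may vary:

```python
# Strip the echoed prompt and keep only the generated annotation block
annotation = result.split("### Annotation:")[-1].strip()
print(annotation)
```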

## Model Card Author

Manal Abumelha - [email protected]

## Citation