---
license: mit
base_model: microsoft/LLM2CLIP-Llama-3.2-1B-Instruct-CC-Finetuned
tags:
- text-embeddings
- sentence-transformers
- llm2vec
- medical
- chest-xray
- radiology
- clinical-nlp
language:
- en
pipeline_tag: feature-extraction
library_name: transformers
---

# LLM2Vec4CXR - Fine-tuned Model for Chest X-ray Report Analysis

This model is a fine-tuned version of [microsoft/LLM2CLIP-Llama-3.2-1B-Instruct-CC-Finetuned](https://huggingface.co/microsoft/LLM2CLIP-Llama-3.2-1B-Instruct-CC-Finetuned) specifically optimized for chest X-ray report analysis and medical text understanding.

## Model Description

LLM2Vec4CXR is a bidirectional text encoder derived from the base decoder-only LLM and optimized for medical text embeddings. The model was fully fine-tuned with a modified pooling strategy (`latent_attention`) to better capture semantic relationships in chest X-ray reports.

### Key Features

- **Base Architecture**: LLM2CLIP-Llama-3.2-1B-Instruct
- **Pooling Mode**: Latent Attention (modified from original)
- **Bidirectional Processing**: Enabled for better context understanding
- **Medical Domain**: Specialized for chest X-ray report analysis
- **Max Length**: 512 tokens
- **Precision**: bfloat16

## Training Details

### Training Data
- Fully fine-tuned on chest X-ray reports and medical text data
- Training focused on understanding pleural effusion status and other chest X-ray findings

### Training Configuration
- **Pooling Mode**: `latent_attention` (modified from base model)
- **Enable Bidirectional**: True
- **Max Length**: 512
- **Torch Dtype**: bfloat16
- **Full Fine-tuning**: All model weights were updated during training

## Usage

### Installation

```bash
# Install the LLM2Vec4CXR package directly from GitHub
pip install git+https://github.com/lukeingawesome/llm2vec4cxr.git

# Or clone and install in development mode
git clone https://github.com/lukeingawesome/llm2vec4cxr.git
cd llm2vec4cxr
pip install -e .
```

### Basic Usage

```python
import torch  # needed for torch.bfloat16 below
from llm2vec_wrapper import LLM2VecWrapper as LLM2Vec

# Load the model
model = LLM2Vec.from_pretrained(
    base_model_name_or_path='lukeingawesome/llm2vec4cxr',
    enable_bidirectional=True,
    pooling_mode="latent_attention",
    max_length=512,
    torch_dtype=torch.bfloat16,
)

# Simple text encoding (built-in method)
report = "There is a small increase in the left-sided effusion. There continues to be volume loss at both bases."
embedding = model.encode_text(report)

# Multiple texts at once
reports = [
    "No acute cardiopulmonary abnormality.",
    "Small bilateral pleural effusions.",
    "Large left pleural effusion with compressive atelectasis."
]
embeddings = model.encode_text(reports)
```
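Once reports are embedded, a common next step is to compare them. The block below is a minimal sketch of pairwise cosine similarity in plain Python. It uses toy 3-dimensional vectors rather than real model output, and the helper names `cosine_similarity` and `similarity_matrix` are illustrative, not part of the package. It assumes the embeddings returned by `encode_text` can be converted to nested lists of floats (for a PyTorch tensor, something like `embeddings.float().cpu().tolist()`).

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)

def similarity_matrix(vectors):
    """All-pairs cosine similarity as a nested list."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]

# Toy vectors standing in for real report embeddings (assumption)
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.8, 0.6, 0.0],
    [0.0, 0.0, 1.0],
]
sims = similarity_matrix(toy_embeddings)
```

With real embeddings, higher values in `sims` would indicate reports the model places closer together in embedding space.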

### Advanced Usage with Instructions

```python
# For instruction-following tasks with separator
separator = '!@#$%^&*()'
instruction = 'Determine the change or the status of the pleural effusion.'
report = 'There is a small increase in the left-sided effusion.'
text_with_instruction = instruction + separator + report

# Use the built-in method for instruction-based encoding
embedding = model.encode_with_instruction([text_with_instruction])
```

**Note**: The model now includes convenient `encode_text()` and `encode_with_instruction()` methods that handle the `embed_mask` automatically.
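To encode several reports under the same instruction, the input strings can be built in one pass. The following is a small sketch using only string operations. `build_instruction_inputs` is a hypothetical helper, not a function from the package, and the check that rejects reports containing the separator is an added safeguard rather than documented behavior.

```python
SEPARATOR = '!@#$%^&*()'  # separator string used in the example above

def build_instruction_inputs(instruction, reports, separator=SEPARATOR):
    """Join one instruction with each report as instruction + separator + report."""
    inputs = []
    for report in reports:
        if separator in report:
            # Assumption: a separator inside the report would make the split ambiguous
            raise ValueError("report text must not contain the separator")
        inputs.append(instruction + separator + report)
    return inputs

instruction = 'Determine the change or the status of the pleural effusion.'
reports = [
    'There is a small increase in the left-sided effusion.',
    'Small bilateral pleural effusions.',
]
texts = build_instruction_inputs(instruction, reports)

# With a loaded model, the list could then be passed on:
# embeddings = model.encode_with_instruction(texts)
```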

### Manual Usage (if you need more control)

If you need more control over the tokenization process, you can still use the manual approach:

```python
import torch  # needed for torch.no_grad()

# Manual tokenization with embed_mask
def encode_text_manual(model, text):
    inputs = model.tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)
    inputs["embed_mask"] = inputs["attention_mask"].clone()  # Required for proper functioning
    
    with torch.no_grad():
        embeddings = model(inputs)
    return embeddings

# For instruction-based tasks, use the built-in tokenize_with_separator method
tokenized = model.tokenize_with_separator([text_with_instruction])
embedding = model(tokenized)
```

## Evaluation

The model has been evaluated on chest X-ray report analysis tasks, particularly for:
- Pleural effusion status determination
- Medical text similarity comparison
- Clinical finding extraction

### Sample Performance

The model shows improved performance compared to the base model on medical text understanding tasks, particularly in distinguishing between different pleural effusion states and medical abbreviations.

## Intended Use

### Primary Use Cases
- **Medical Text Embeddings**: Generate embeddings for chest X-ray reports
- **Clinical Text Similarity**: Compare medical texts for semantic similarity
- **Medical Information Retrieval**: Find relevant medical reports or findings
- **Clinical NLP Research**: Foundation model for medical text analysis
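For the retrieval use case, one simple approach is to rank stored report embeddings by similarity to a query embedding. The sketch below shows that ranking step in plain Python on toy 2-dimensional vectors. The `normalize` and `top_k` helpers are illustrative names, and ranking by cosine similarity is an assumption about how the embeddings might be used, not a method stated by the authors.

```python
import math

def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def top_k(query_vec, corpus_vecs, k=2):
    """Return (index, score) pairs for the k corpus vectors most similar to the query."""
    q = normalize(query_vec)
    scored = []
    for idx, vec in enumerate(corpus_vecs):
        v = normalize(vec)
        # Dot product of unit vectors equals cosine similarity
        scored.append((sum(a * b for a, b in zip(q, v)), idx))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(idx, score) for score, idx in scored[:k]]

# Toy vectors standing in for stored report embeddings and a query embedding
corpus = [[0.0, 1.0], [1.0, 0.1], [0.7, 0.7]]
query = [1.0, 0.0]
results = top_k(query, corpus, k=2)
```

For larger collections, a vector index would usually replace the linear scan, but the scoring idea stays the same.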

### Limitations
- Specialized for chest X-ray reports - may not generalize to other medical domains
- Requires careful preprocessing for optimal performance
- Should be used as part of a larger clinical decision support system, not for standalone diagnosis

## Technical Specifications

- **Model Type**: Bidirectional Language Model (LLM2Vec)
- **Architecture**: LlamaBiModel (modified Llama 3.2)
- **Parameters**: ~1B parameters
- **Input Length**: Up to 512 tokens
- **Output**: Dense embeddings
- **Precision**: bfloat16

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{llm2vec4cxr,
  title={LLM2Vec4CXR: Fine-tuned Language Model for Chest X-ray Report Analysis},
  author={[Your Name]},
  year={2024},
  howpublished={\url{https://huggingface.co/lukeingawesome/llm2vec4cxr}},
}
```

## Acknowledgments

This model is built upon:
- [LLM2Vec](https://github.com/McGill-NLP/llm2vec) - Framework for converting decoder-only LLMs into text encoders
- [LLM2CLIP](https://github.com/microsoft/LLM2CLIP) - Microsoft's implementation for connecting LLMs with CLIP models
- [microsoft/LLM2CLIP-Llama-3.2-1B-Instruct-CC-Finetuned](https://huggingface.co/microsoft/LLM2CLIP-Llama-3.2-1B-Instruct-CC-Finetuned) - Base model

## License

This model is licensed under the MIT License.