MedLLM-10M: Medical Language Model

Model Description

MedLLM-10M is a lightweight GPT-style language model specifically trained on medical literature and clinical text. This model is designed for educational and research purposes in the medical domain.

⚠️ Important Disclaimer: This model is for research and educational purposes only. It should never be used for actual medical diagnosis, treatment recommendations, or clinical decision-making without proper medical supervision.

Model Details

  • Model Type: Causal Language Model (GPT-style)
  • Parameters: ~27.7M
  • Architecture: Transformer decoder
  • Training Data: Medical literature, PubMed abstracts, clinical guidelines
  • Vocabulary Size: 5,000
  • Context Length: 512 tokens
  • License: Apache 2.0
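
The full configuration ships with the checkpoint and can be inspected without downloading the weights; the snippet below assumes only the standard Transformers config API:

from transformers import AutoConfig

# Fetch and print the checkpoint's configuration (layer count,
# hidden size, vocabulary size, etc.) without loading the weights.
config = AutoConfig.from_pretrained("raihan-js/medllm-10m")
print(config)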

Architecture

  • Layers: 8
  • Hidden Size: 512
  • Attention Heads: 8
  • Feed-Forward Size: 2048
  • Dropout: 0.1
  • Activation: GELU
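
As a sanity check, these dimensions roughly reproduce the reported parameter count. A back-of-the-envelope estimate, ignoring biases and layer norms and assuming tied input/output embeddings:

# Rough parameter estimate from the architecture above
vocab, ctx, d, ff, layers = 5000, 512, 512, 2048, 8
embeddings = vocab * d + ctx * d            # token + position embeddings
attention = 4 * d * d                       # Q, K, V, and output projections
feedforward = 2 * d * ff                    # up- and down-projections
total = embeddings + layers * (attention + feedforward)
print(f"{total / 1e6:.1f}M parameters")     # ~28.0M, close to the reported ~27.7M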

Training Details

The model was trained on a curated dataset of medical literature including:

  • PubMed abstracts and research papers
  • Medical journal articles
  • Clinical practice guidelines
  • Medical Q&A datasets
  • Healthcare websites (Mayo Clinic, WebMD, etc.)

Training Hyperparameters

  • Epochs: 10
  • Batch Size: 4
  • Learning Rate: 0.0003
  • Optimizer: AdamW
  • Weight Decay: 0.01
  • Mixed Precision: FP16 (if available)
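
The training script itself is not published; the sketch below is a hypothetical mapping of these hyperparameters onto Transformers TrainingArguments, not the actual setup used:

from transformers import TrainingArguments

# Hypothetical reconstruction from the hyperparameters above;
# the actual training script may have differed.
args = TrainingArguments(
    output_dir="medllm-10m",            # placeholder output path
    num_train_epochs=10,
    per_device_train_batch_size=4,
    learning_rate=3e-4,
    weight_decay=0.01,                  # AdamW is the Trainer default
    fp16=True,                          # mixed precision, when a GPU supports it
)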

Hardware

  • Training Hardware: NVIDIA RTX 3060 (12GB VRAM)
  • Framework: PyTorch + Transformers

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("raihan-js/medllm-10m")
model = AutoModelForCausalLM.from_pretrained("raihan-js/medllm-10m")

# Generate medical text
prompt = "Symptoms of diabetes include"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=100,                 # cap the number of newly generated tokens
    do_sample=True,
    temperature=0.7,
    pad_token_id=tokenizer.eos_token_id,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
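
The same flow also works through the high-level pipeline API (assuming the checkpoint loads via AutoModelForCausalLM as above):

from transformers import pipeline

# Equivalent generation via the text-generation pipeline.
generator = pipeline("text-generation", model="raihan-js/medllm-10m")
result = generator(
    "Symptoms of diabetes include",
    max_new_tokens=80,
    do_sample=True,
    temperature=0.7,
)
print(result[0]["generated_text"])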

Model Performance

This is an early-stage model trained on limited data. Current capabilities include:

  • Basic medical terminology understanding
  • Simple text completion in medical contexts
  • Educational content generation

Known Limitations:

  • May generate incoherent or medically inaccurate text
  • Requires significant additional training for production use
  • Should not be used for medical advice or diagnosis

Intended Use Cases

✅ Appropriate Uses

  • Educational demonstrations of medical language models
  • Research into medical NLP applications
  • Text completion for medical writing assistance (with human review)
  • Learning and experimentation with transformer models

❌ Inappropriate Uses

  • Medical diagnosis or treatment recommendations
  • Clinical decision-making
  • Patient care without human oversight
  • Emergency medical situations
  • Replacement for professional medical advice

Ethical Considerations

Medical Disclaimer

⚠️ CRITICAL WARNING: This model is NOT intended for medical use. Always consult qualified healthcare professionals for medical advice, diagnosis, or treatment.

Limitations and Biases

  • Training data may contain biases present in medical literature
  • Model may reflect historical or cultural biases in healthcare
  • Performance varies significantly across different medical specialties
  • May generate plausible but medically incorrect information

Development Status

This is an experimental model in early development. Planned improvements include:

  • Expanded training dataset
  • Longer training duration
  • Better medical accuracy evaluation
  • Safety filtering and alignment
  • Domain-specific fine-tuning

Citation

@misc{medllm2024,
  title={MedLLM: A Lightweight Medical Language Model},
  author={Raihan},
  year={2024},
  publisher={HuggingFace},
  url={https://huggingface.co/raihan-js/medllm-10m}
}

Contact

For questions about this model, please open an issue in the model repository.


Last Updated: December 2024
Model Version: 1.0-alpha
Status: Experimental - Not for production use
