---
library_name: biomed-multi-omic
license: apache-2.0
tags:
- Biology
- RNA
datasets:
- PanglaoDB
- CELLxGENE
---

# ibm-research/biomed.rna.bert.110m.wced.multitask.v1

Biomedical foundation models for omics data. This package supports the development of foundation models for scRNA and DNA data.

`biomed-multi-omic` enables development and testing of foundation models for DNA sequences and for RNA expression,
with modular model and training methods for pretraining and fine-tuning, controllable via a declarative no-code interface.
`biomed-multi-omic` builds on anndata, HuggingFace Transformers, PyTorch Lightning and Hydra.

- 🧬 A single package for DNA and RNA foundation models. scRNA pretraining on h5ad files or TileDB (e.g., CELLxGENE); DNA pretraining on the human reference genome (GRCh38/hg38) and on variant-imputed genomes based on common SNPs from the GWAS Catalog and ClinVar datasets.
- 🚀 Leverages the latest open-source tools: anndata, HuggingFace Transformers and PyTorch Lightning.
- 📈 Zero-shot and fine-tuning support for diverse downstream tasks: cell type annotation and perturbation prediction for scRNA; promoter prediction and regulatory region prediction using massively parallel reporter assays (MPRAs) for DNA sequences.
- Novel pretraining strategies for scRNA and DNA implemented alongside existing methods to enable experimentation and comparison.

For details on how the models were trained, please refer to [the BMFM-RNA preprint](https://arxiv.org/abs/2506.14861).

- **Developers:** IBM Research
- **GitHub Repository:** [https://github.com/BiomedSciAI/biomed-multi-omic](https://github.com/BiomedSciAI/biomed-multi-omic)
- **Paper:** [BMFM-RNA: An Open Framework for Building and Evaluating Transcriptomic Foundation Models](https://arxiv.org/abs/2506.14861)
- **Release Date:** Jun 17th, 2025
- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)

## Checkpoint

**Whole-cell Expression Decoder (WCED):** Using the BMFM-RNA framework, we implemented a new pretraining objective centered on predicting the expression levels for the whole cell at once, rather than only for the masked genes.

**Multitask objectives:** Multi-label classification (cell type, tissue, tissue general), and an adversarial loss to unlearn donor ID.

**WCED + Multitask:** Trained first using WCED with random gene order and log-normalization, then fine-tuned with multitask objectives.

See section 2.3.3 of [the BMFM-RNA manuscript](https://arxiv.org/abs/2506.14861) for more details.
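To make the preprocessing terms above concrete, here is a minimal sketch of log-normalization and random gene ordering for a single cell. It assumes the common single-cell convention of scaling counts to a fixed total and then applying `log(1 + x)`. The `target_sum` of 10,000, the function names, and the toy gene counts are illustrative assumptions, not values taken from the BMFM-RNA code, and the framework's own transforms may differ.

```python
import math
import random


def log_normalize(counts, target_sum=10_000.0):
    """Scale raw counts to a fixed total, then apply log(1 + x).

    target_sum is an assumed, commonly used default, not a documented
    BMFM-RNA setting.
    """
    total = sum(counts)
    if total == 0:
        # An empty cell stays all zeros instead of dividing by zero.
        return [0.0 for _ in counts]
    return [math.log1p(c * target_sum / total) for c in counts]


def random_gene_order(genes, values, seed=None):
    """Shuffle (gene, value) pairs together so each value stays with its gene."""
    pairs = list(zip(genes, values))
    random.Random(seed).shuffle(pairs)
    return pairs


# Toy cell with made-up counts, for illustration only.
genes = ["CD3E", "MS4A1", "LYZ", "GNLY"]
raw_counts = [3, 0, 12, 5]

normalized = log_normalize(raw_counts)
shuffled = random_gene_order(genes, normalized, seed=0)

for gene, value in shuffled:
    print(f"{gene}\t{value:.3f}")
```

Genes with zero counts remain exactly zero after this transform, and shuffling changes only the order in which genes are presented, not their values.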

## Usage

Using `biomed.rna.bert.110m.wced.multitask.v1` requires the `biomed-multi-omic` codebase: [https://github.com/BiomedSciAI/biomed-multi-omic](https://github.com/BiomedSciAI/biomed-multi-omic).

For installation, please follow the [instructions on GitHub](https://github.com/BiomedSciAI/biomed-multi-omic?tab=readme-ov-file#installation).

## RNA Inference

To get embeddings and predictions for scRNA data, run:

```bash
export MY_DATA_FILE=... # path to h5ad file with raw counts and gene symbols
bmfm-targets-run -cn predict input_file=$MY_DATA_FILE working_dir=/tmp checkpoint=ibm-research/biomed.rna.bert.110m.wced.multitask.v1
```
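When several h5ad files need to be processed, the same command can be assembled programmatically. The sketch below is one possible wrapper, not part of the package: it uses only the `input_file`, `working_dir` and `checkpoint` overrides shown above, while the helper names, the example file names and the one-directory-per-file layout are hypothetical. It defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess
from pathlib import Path

CHECKPOINT = "ibm-research/biomed.rna.bert.110m.wced.multitask.v1"


def build_predict_command(input_file, working_dir, checkpoint=CHECKPOINT):
    """Return the argument list for one `bmfm-targets-run -cn predict` call."""
    return [
        "bmfm-targets-run",
        "-cn",
        "predict",
        f"input_file={input_file}",
        f"working_dir={working_dir}",
        f"checkpoint={checkpoint}",
    ]


def run_predict_on_files(input_files, output_root, dry_run=True):
    """Print (dry run) or execute one predict command per h5ad file."""
    commands = []
    for input_file in input_files:
        # Hypothetical layout: a separate working directory per input file.
        working_dir = Path(output_root) / Path(input_file).stem
        command = build_predict_command(input_file, working_dir)
        commands.append(command)
        if dry_run:
            print(shlex.join(command))
        else:
            working_dir.mkdir(parents=True, exist_ok=True)
            # check=True raises CalledProcessError if the CLI exits non-zero.
            subprocess.run(command, check=True)
    return commands


# Placeholder file names; replace with real h5ad paths.
commands = run_predict_on_files(["pbmc.h5ad", "liver.h5ad"], "/tmp/bmfm", dry_run=True)
```

Passing `dry_run=False` executes each command and requires `bmfm-targets-run` to be installed and on the `PATH`. Paths containing characters that are special to Hydra overrides (such as `=` or `,`) may need additional quoting.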

For more details, see the [RNA tutorials on GitHub](https://github.com/BiomedSciAI/biomed-multi-omic/tree/main/tutorials/RNA).

## Citation

```bibtex
@misc{dandala2025bmfmrnaopenframeworkbuilding,
      title={BMFM-RNA: An Open Framework for Building and Evaluating Transcriptomic Foundation Models},
      author={Bharath Dandala and Michael M. Danziger and Ella Barkan and Tanwi Biswas and Viatcheslav Gurev and Jianying Hu and Matthew Madgwick and Akira Koseki and Tal Kozlovski and Michal Rosen-Zvi and Yishai Shimoni and Ching-Huei Tsou},
      year={2025},
      eprint={2506.14861},
      archivePrefix={arXiv},
      primaryClass={q-bio.GN},
      url={https://arxiv.org/abs/2506.14861},
}
```