---
library_name: biomed-multi-omic
license: apache-2.0
tags:
- Biology
- RNA
datasets:
- PanglaoDB
- CELLxGENE
---

# ibm-research/biomed.rna.bert.110m.mlm.rda.v1

Biomedical foundation models for omics data. This package supports the development of foundation models for scRNA-seq and for DNA data.

`biomed-multi-omic` enables development and testing of foundation models for DNA sequences and for RNA expression,
with modular model and training methods for pretraining and fine-tuning, controllable via a declarative no-code interface.
`biomed-multi-omic` leverages anndata, HuggingFace Transformers, PyTorch Lightning and Hydra.

- 🧬 A single package for DNA and RNA foundation models: scRNA pretraining on h5ad files or TileDB (e.g. CELLxGENE), and DNA pretraining on the reference human genome (GRCh38/hg38) as well as on variant-imputed genomes built from common SNPs in the GWAS Catalog and ClinVar datasets.
- 🚀 Leverages the latest open-source tools: anndata, HuggingFace Transformers and PyTorch Lightning.
- 📈 Zero-shot and fine-tuning support for diverse downstream tasks: cell type annotation and perturbation prediction for scRNA; promoter prediction and regulatory-region prediction using massively parallel reporter assays (MPRAs) for DNA sequences.
- Novel pretraining strategies for scRNA and DNA implemented alongside existing methods to enable experimentation and comparison.

For details on how the models were trained, please refer to [the BMFM-RNA preprint](https://arxiv.org/abs/2506.14861).

- **Developers:** IBM Research
- **GitHub Repository:** [https://github.com/BiomedSciAI/biomed-multi-omic](https://github.com/BiomedSciAI/biomed-multi-omic)
- **Paper:** [BMFM-RNA: An Open Framework for Building and Evaluating Transcriptomic Foundation Models](https://arxiv.org/abs/2506.14861)
- **Release Date:** Jun 17th, 2025
- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)

## Checkpoint

**Masked Language Modeling (MLM):** masked expression prediction and masked gene prediction.

**Read-Depth Aware (RDA):** the RDA pretraining task uses Bayesian downsampling to improve robustness across variable sequencing depths.

**MLM + RDA:** this checkpoint was trained with expression masking and RDA downsampling, following the settings in scFoundation.

See section 2.3.1 of [the BMFM-RNA manuscript](https://arxiv.org/abs/2506.14861) for more details.

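As an illustration of the read-depth idea only, and not the package's implementation, the sketch below thins a cell's raw counts to a smaller target depth. The function name and the use of a plain per-gene binomial draw, rather than the Bayesian scheme described in the manuscript, are assumptions made for this sketch.

```python
# Conceptual sketch only -- NOT the biomed-multi-omic implementation.
# Read-depth-aware pretraining exposes the model to the same cell at a
# lower sequencing depth; here that thinning is approximated with a
# per-gene binomial draw (an assumption made for illustration).
import numpy as np

def downsample_counts(counts: np.ndarray, target_depth: int, seed: int = 0) -> np.ndarray:
    """Thin a raw count vector to roughly `target_depth` total counts."""
    rng = np.random.default_rng(seed)
    total = int(counts.sum())
    if total <= target_depth:
        return counts.copy()
    keep_prob = target_depth / total
    return rng.binomial(counts.astype(np.int64), keep_prob)

# Toy cell profile thinned from ~5000 to ~1000 total counts.
cell = np.array([120, 0, 3, 800, 45, 4032])
low_depth = downsample_counts(cell, target_depth=1000)
```
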
## Usage

Using `biomed.rna.bert.110m.mlm.rda.v1` requires the [biomed-multi-omic codebase](https://github.com/BiomedSciAI/biomed-multi-omic).

For installation, please follow the [instructions on GitHub](https://github.com/BiomedSciAI/biomed-multi-omic?tab=readme-ov-file#installation).

## RNA Inference

To get embeddings and predictions for scRNA data, run:

```bash
export MY_DATA_FILE=... # path to h5ad file with raw counts and gene symbols
bmfm-targets-run -cn predict input_file=$MY_DATA_FILE working_dir=/tmp checkpoint=ibm-research/biomed.rna.bert.110m.mlm.rda.v1
```
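
The input `.h5ad` file should contain raw counts with gene symbols as the variable names. Below is a minimal sketch of building such a file with `anndata`; the toy matrix, gene symbols, and output path are placeholders, and any additional input requirements should be checked against the RNA tutorials.

```python
# Minimal sketch (placeholder data): build an .h5ad file with raw counts
# and gene symbols, matching the input format named in the command above.
import anndata as ad
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Toy matrix of raw integer counts: 3 cells x 4 genes (placeholders only).
counts = csr_matrix(np.array([[4, 0, 1, 0],
                              [0, 2, 0, 7],
                              [1, 0, 3, 0]], dtype=np.float32))
obs = pd.DataFrame(index=["cell_1", "cell_2", "cell_3"])
var = pd.DataFrame(index=["CD3D", "MS4A1", "NKG7", "LYZ"])  # gene symbols as var names

adata = ad.AnnData(X=counts, obs=obs, var=var)
adata.write_h5ad("my_raw_counts.h5ad")  # then: export MY_DATA_FILE=my_raw_counts.h5ad
```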

For more details, see the [RNA tutorials on GitHub](https://github.com/BiomedSciAI/biomed-multi-omic/tree/main/tutorials/RNA).

## Citation

```bibtex
@misc{dandala2025bmfmrnaopenframeworkbuilding,
      title={BMFM-RNA: An Open Framework for Building and Evaluating Transcriptomic Foundation Models},
      author={Bharath Dandala and Michael M. Danziger and Ella Barkan and Tanwi Biswas and Viatcheslav Gurev and Jianying Hu and Matthew Madgwick and Akira Koseki and Tal Kozlovski and Michal Rosen-Zvi and Yishai Shimoni and Ching-Huei Tsou},
      year={2025},
      eprint={2506.14861},
      archivePrefix={arXiv},
      primaryClass={q-bio.GN},
      url={https://arxiv.org/abs/2506.14861},
}
```