---
library_name: transformers
tags:
- bulk RNA-seq
- biology
- transcriptomics
---

# BulkRNABert

BulkRNABert is a transformer-based, encoder-only language model. It is pre-trained on bulk RNA-seq data with self-supervision via masked language modeling, following the BERT pre-training approach. It can then be fine-tuned for cancer type classification and survival time prediction on The Cancer Genome Atlas (TCGA) dataset.

**Developed by:** [InstaDeep](https://huggingface.co/InstaDeepAI)

### Model Sources

- **Repository:** [instadeepai/multiomics-open-research](https://github.com/instadeepai/multiomics-open-research)
- **Paper:** [BulkRNABert: Cancer prognosis from bulk RNA-seq based language models](https://proceedings.mlr.press/v259/gelard25a.html)

### How to use

Until the next transformers release, the library must be installed from source to use this model. PyTorch is also required:

```
pip install --upgrade git+https://github.com/huggingface/transformers.git
pip install torch
```

The snippet below runs inference with the model on random input:

```
import torch
from transformers import AutoModel

# trust_remote_code=True is needed because the model architecture is defined
# in custom code hosted alongside the weights.
model = AutoModel.from_pretrained(
    "InstaDeepAI/BulkRNABert",
    trust_remote_code=True,
)
model.eval()

# Random expression-bin ids, one per gene: shape (batch_size=1, n_genes).
n_genes = model.config.n_genes
dummy_gene_expressions = torch.randint(0, model.config.n_expressions_bins, (1, n_genes))

with torch.no_grad():
    torch_output = model(dummy_gene_expressions)
```

A more complete example is provided in the repository.
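As the snippet above shows, the model takes integer expression-bin ids in `[0, n_expressions_bins)` rather than raw expression values, so real data must be discretized first. The exact normalization and binning used to pre-train BulkRNABert are defined in the repository; the sketch below only illustrates the idea under an assumed scheme: `log10(1 + TPM)` followed by equal-width bins over `[0, max_log_value]`. The helper `bin_expressions` and its `max_log_value` parameter are hypothetical and not part of the model's API.

```python
import math
from typing import List, Sequence


def bin_expressions(
    tpm_values: Sequence[float],
    n_bins: int,
    max_log_value: float,
) -> List[int]:
    """Map non-negative expression values to integer bin ids in [0, n_bins - 1].

    Hypothetical helper: applies log10(1 + x), then splits [0, max_log_value]
    into n_bins equal-width intervals. This is an assumed preprocessing scheme,
    not necessarily the one used to pre-train BulkRNABert.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be >= 1")
    if max_log_value <= 0:
        raise ValueError("max_log_value must be positive")
    width = max_log_value / n_bins
    ids = []
    for value in tpm_values:
        if value < 0:
            raise ValueError("expression values must be non-negative")
        log_value = math.log10(1.0 + value)
        # Values at or above max_log_value are clipped into the last bin.
        ids.append(min(int(log_value // width), n_bins - 1))
    return ids


if __name__ == "__main__":
    # Toy values for four genes; a real sample needs exactly
    # model.config.n_genes values, in the gene order the model expects.
    sample = [0.0, 5.0, 99.0, 1e6]
    print(bin_expressions(sample, n_bins=64, max_log_value=4.0))
```

With `n_bins` set to `model.config.n_expressions_bins` and one value per gene, the resulting list could be wrapped as `torch.tensor([ids])` and passed to the model in place of `dummy_gene_expressions`.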
### Citing our work

```
@InProceedings{pmlr-v259-gelard25a,
  title = {BulkRNABert: Cancer prognosis from bulk RNA-seq based language models},
  author = {G{\'{e}}lard, Maxence and Richard, Guillaume and Pierrot, Thomas and Courn{\`{e}}de, Paul-Henry},
  booktitle = {Proceedings of the 4th Machine Learning for Health Symposium},
  pages = {384--400},
  year = {2025},
  editor = {Hegselmann, Stefan and Zhou, Helen and Healey, Elizabeth and Chang, Trenton and Ellington, Caleb and Mhasawade, Vishwali and Tonekaboni, Sana and Argaw, Peniel and Zhang, Haoran},
  volume = {259},
  series = {Proceedings of Machine Learning Research},
  month = {15--16 Dec},
  publisher = {PMLR},
  url = {https://proceedings.mlr.press/v259/gelard25a.html},
}
```