GenerTeam · nielsr (HF Staff) committed
Commit 3663304 · verified · 1 Parent(s): 4b535ef

Add link to Github repository (#1)

- Add link to Github repository (da9ed19d5f6fd7ed83a5c1e0c252fb0a45e148dc)

Co-authored-by: Niels Rogge <[email protected]>

Files changed (1):
README.md (+7 -4)
README.md CHANGED
@@ -1,18 +1,21 @@
 ---
+library_name: transformers
 license: mit
 pipeline_tag: text-generation
 tags:
 - biology
 - genomics
 - long-context
-library_name: transformers
 ---
+
 # GENERator-eukaryote-3b-base model
 
 ## Abouts
 In this repository, we present GENERator, a generative genomic foundation model featuring a context length of 98k base pairs and 3B parameters, trained on an expansive dataset comprising 386 billion base pairs of eukaryotic DNA. The extensive and diverse pre-training data endow the GENERator with enhanced understanding and generation capabilities across various organisms.
 
-For more technical details, please refer to our paper [GENERator: A Long-Context Generative Genomic Foundation Model](https://huggingface.co/GenerTeam).
+For more technical details, please refer to our paper [GENERator: A Long-Context Generative Genomic Foundation Model](https://huggingface.co/papers/2502.07272).
+
+Code: https://github.com/GenerTeam/GENERator
 
 ## How to use
 ### Simple example1: generation
@@ -72,7 +75,7 @@ from transformers import AutoTokenizer, AutoModelForCausalLM
 
 # Load the tokenizer and model.
 tokenizer = AutoTokenizer.from_pretrained("GENERator-eukaryote-3b-base", trust_remote_code=True)
-model = AutoModelForCausalLM.from_pretrained("GENERator-eukaryote-3b-base")
+model = AutoModelForCausalLM.from_pretrained("GenerTeam/GENERator-eukaryote-3b-base")
 
 config = model.config
 max_length = config.max_position_embeddings
@@ -132,4 +135,4 @@ print("Sequence Embeddings:", seq_embeddings)
 primaryClass={cs.CL},
 url={https://arxiv.org/abs/2502.07272},
 }
-```
+```
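The second hunk corrects the model path to the fully qualified repo id so it resolves on the Hugging Face Hub. Below is a minimal sketch of the resulting loading-and-generation flow, assuming the standard transformers causal-LM API that the README snippet uses; the DNA prompt and sampling settings are illustrative assumptions, not the README's exact example:

```python
# Sketch of the corrected loading path from this commit, assuming the standard
# transformers causal-LM API shown in the README snippet; the prompt and
# generation settings below are illustrative, not the README's exact example.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# The commit's fix: the fully qualified "GenerTeam/..." repo id; the same id
# should work for the tokenizer as well.
repo_id = "GenerTeam/GENERator-eukaryote-3b-base"
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

# The README reads the context window from the config (98k bp per the About).
max_length = model.config.max_position_embeddings
print(f"max_position_embeddings: {max_length}")

# Generate a continuation of a short DNA prompt (hypothetical prompt).
inputs = tokenizer("ATGAGGTGGCAAGAAATGGGCTAC", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=32, do_sample=True)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```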
 
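The third hunk's context line shows the README printing `seq_embeddings` from an embedding example that is not visible in this diff. The sketch below is a hypothetical reconstruction of how such sequence embeddings are commonly obtained from a causal LM; the mean-pooling choice and padding handling are assumptions, not the README's confirmed method:

```python
# Hypothetical reconstruction of the embedding step implied by the diff's
# "Sequence Embeddings" context line; mean pooling over the last hidden state
# is an assumption, not the README's confirmed method.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

repo_id = "GenerTeam/GENERator-eukaryote-3b-base"
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

# Assumes the tokenizer may lack a pad token; fall back to EOS for batching.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

sequences = ["ATGAGGTGGCAAGAAATGGGCTAC", "GATTACAGATTACA"]
inputs = tokenizer(sequences, return_tensors="pt", padding=True)

with torch.no_grad():
    hidden = model(**inputs, output_hidden_states=True).hidden_states[-1]

# Mean-pool hidden states over non-padding positions: one vector per sequence.
mask = inputs["attention_mask"].unsqueeze(-1).to(hidden.dtype)
seq_embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print("Sequence Embeddings:", seq_embeddings)
```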