Safetensors · astronomy · multimodal · classification

MeriDK committed · Commit ccd2bec · verified · Parent: ab2e137

Update README.md

Files changed (1): README.md (+36 −24)
---
tags:
- multimodal
- classification
datasets:
- AstroMLCore/AstroM3Processed
- AstroMLCore/AstroM3Dataset
---
AstroM³ is a self-supervised multimodal model for astronomy that integrates time-series photometry, spectra, and metadata into a unified embedding space for classification and other downstream tasks. AstroM³ is trained on [AstroM3Processed](https://huggingface.co/datasets/AstroMLCore/AstroM3Processed), which is the pre-processed version of [AstroM3Dataset](https://huggingface.co/datasets/AstroMLCore/AstroM3Dataset).

For more details on the AstroM³ architecture, training, and results, please refer to the [paper](https://arxiv.org/abs/2411.08842).

<p align="center">
<img src="astroclip-architecture.png" width="100%">
<br />
<span>
Figure 1: Overview of the multimodal CLIP framework adapted for astronomy, incorporating three data modalities: photometric time-series, spectra, and metadata.
</span>
</p>

To use AstroM³ for inference, install the AstroM3 library from our [GitHub repo](https://github.com/MeriDK/AstroM3).
```sh
git clone https://github.com/MeriDK/AstroM3.git
cd AstroM3
uv venv venv
source venv/bin/activate
uv pip install -r requirements.txt
```

## A simple example to get started

1. Data Loading & Preprocessing
```python
from datasets import load_dataset
from src.data import process_photometry

# Load the test dataset
test_dataset = load_dataset('AstroMLCore/AstroM3Processed', name='full_42', split='test')

# Process photometry to have a fixed sequence length of 200 (center-cropped)
test_dataset = test_dataset.map(process_photometry, batched=True, fn_kwargs={'seq_len': 200, 'how': 'center'})
```
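The `how='center'` option keeps the middle of each light curve. A rough standalone illustration of what fixed-length center cropping means (a hypothetical `center_crop` helper for exposition, not the library's `process_photometry`):

```python
import numpy as np

def center_crop(seq, seq_len):
    # Hypothetical illustration of fixed-length center cropping (not the actual
    # process_photometry implementation): keep the middle `seq_len` observations.
    n = len(seq)
    if n <= seq_len:
        return seq  # short sequences returned unchanged here; the library may pad instead
    start = (n - seq_len) // 2
    return seq[start:start + seq_len]

x = np.arange(10)
print(center_crop(x, 4))  # the middle 4 observations: [3 4 5 6]
```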
 
2. Model Loading
```python
import torch
from src.model import AstroM3

# Load the base AstroM3-CLIP model
model = AstroM3.from_pretrained('AstroMLCore/AstroM3-CLIP')

# Retrieve the first sample (batch size = 1)
sample = test_dataset[0:1]
```
 
3. Classification
```python
from src.model import AstroM3, Informer, GalSpecNet, MetaModel

# photometry, photometry_mask, spectra, and metadata are the per-modality inputs
# extracted from `sample` in the previous step

# Photometry classification
photo_model = Informer.from_pretrained('AstroMLCore/AstroM3-CLIP-photo')
prediction = photo_model(photometry, photometry_mask).argmax(dim=1).item()
print('Photometry Classification:', test_dataset.features['label'].int2str(prediction))

# Spectra classification
spectra_model = GalSpecNet.from_pretrained('AstroMLCore/AstroM3-CLIP-spectra')
prediction = spectra_model(spectra).argmax(dim=1).item()
print('Spectra Classification:', test_dataset.features['label'].int2str(prediction))

# Metadata classification
meta_model = MetaModel.from_pretrained('AstroMLCore/AstroM3-CLIP-meta')
prediction = meta_model(metadata).argmax(dim=1).item()
print('Metadata Classification:', test_dataset.features['label'].int2str(prediction))

# Multimodal classification
all_model = AstroM3.from_pretrained('AstroMLCore/AstroM3-CLIP-all')
prediction = all_model(photometry, photometry_mask, spectra, metadata).argmax(dim=1).item()
print('Multimodal Classification:', test_dataset.features['label'].int2str(prediction))
```
 

## Models

| Model | Description |
| :--- | :--- |
| [AstroM3-CLIP](https://huggingface.co/AstroMLCore/AstroM3-CLIP) | The base model pre-trained using the trimodal CLIP approach. |
| [AstroM3-CLIP-meta](https://huggingface.co/AstroMLCore/AstroM3-CLIP-meta) | Fine-tuned for metadata-only classification. |
| [AstroM3-CLIP-spectra](https://huggingface.co/AstroMLCore/AstroM3-CLIP-spectra) | Fine-tuned for spectra-only classification. |
| [AstroM3-CLIP-photo](https://huggingface.co/AstroMLCore/AstroM3-CLIP-photo) | Fine-tuned for photometry-only classification. |
| [AstroM3-CLIP-all](https://huggingface.co/AstroMLCore/AstroM3-CLIP-all) | Fine-tuned for multimodal classification. |
 

## AstroM3-CLIP Variants

These variants of the base AstroM3-CLIP model are trained using different random seeds (42, 0, 66, 12, 123); ensure that the dataset is loaded with the corresponding seed for consistency.

| Model | Description |
| :--- | :--- |
| [AstroM3-CLIP-42](https://huggingface.co/AstroMLCore/AstroM3-CLIP-42) | The base model pre-trained with random seed 42 (identical to AstroM3-CLIP). |
| [AstroM3-CLIP-0](https://huggingface.co/AstroMLCore/AstroM3-CLIP-0) | AstroM3-CLIP pre-trained with random seed 0 (use dataset with seed 0). |
| [AstroM3-CLIP-66](https://huggingface.co/AstroMLCore/AstroM3-CLIP-66) | AstroM3-CLIP pre-trained with random seed 66 (use dataset with seed 66). |
| [AstroM3-CLIP-12](https://huggingface.co/AstroMLCore/AstroM3-CLIP-12) | AstroM3-CLIP pre-trained with random seed 12 (use dataset with seed 12). |
| [AstroM3-CLIP-123](https://huggingface.co/AstroMLCore/AstroM3-CLIP-123) | AstroM3-CLIP pre-trained with random seed 123 (use dataset with seed 123). |
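To keep the model and dataset seeds aligned, the dataset config name can be derived from the seed. This sketch assumes configs follow the `full_<seed>` pattern seen with `full_42` (the helper name and the pattern for other seeds are assumptions, not confirmed API):

```python
# Hypothetical helper pairing a training seed with its dataset config and model
# repo. The 'full_<seed>' config names are an assumption by analogy with 'full_42'.
def seed_artifacts(seed: int) -> dict:
    return {
        'dataset_config': f'full_{seed}',
        'model_repo': f'AstroMLCore/AstroM3-CLIP-{seed}',
    }

# e.g. load_dataset('AstroMLCore/AstroM3Processed',
#                   name=seed_artifacts(0)['dataset_config'], split='test')
print(seed_artifacts(0))
```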

## Using your own data

Note that the data in the AstroM3Processed dataset is already pre-processed.
If you want to use the model with your own data, you must pre-process it in the same way:

1. **Spectra**: Each spectrum is interpolated onto a fixed wavelength grid (3850–9000 Å), normalized using the mean and median absolute deviation (MAD), and the log-MAD is added as an auxiliary feature.
2. **Photometry**: Light curves are deduplicated, sorted by time, normalized using the mean and MAD, time-scaled to [0, 1], and augmented with auxiliary features such as the log-MAD and the time span.
3. **Metadata**: Scalar metadata is transformed via domain-specific functions (e.g., absolute magnitude, log, sin/cos), then normalized using dataset-level statistics.

For a detailed description, read the [paper](https://arxiv.org/abs/2411.08842).
To see exactly how we performed this preprocessing, refer to [`preprocess.py`](https://huggingface.co/datasets/AstroMLCore/AstroM3Dataset/blob/main/preprocess.py) in the AstroM3Dataset repo.
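The photometry steps above can be sketched in plain NumPy. This is a minimal illustration of the described transformations only (the function name, exact auxiliary features, and edge-case handling are assumptions; the authoritative code is `preprocess.py`):

```python
import numpy as np

def normalize_light_curve(time, flux):
    # Sketch of the photometry preprocessing described above (hypothetical
    # helper, not the actual preprocess.py). Assumes a non-zero time span
    # and non-zero MAD.
    time, idx = np.unique(time, return_index=True)   # deduplicate and sort by time
    flux = flux[idx]
    mad = np.median(np.abs(flux - np.median(flux)))  # median absolute deviation
    flux_norm = (flux - flux.mean()) / mad           # mean/MAD normalization
    time_span = time.max() - time.min()
    time_scaled = (time - time.min()) / time_span    # time scaled to [0, 1]
    aux = np.array([np.log10(mad), time_span])       # auxiliary features
    return time_scaled, flux_norm, aux

t = np.array([3.0, 1.0, 2.0, 1.0])
f = np.array([10.0, 12.0, 11.0, 12.0])
ts, fn, aux = normalize_light_curve(t, f)
```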