---
tags:
  - astronomy
  - multimodal
  - classification
---

# AstroM3-CLIP

AstroM³ is a self-supervised multimodal model for astronomy that integrates time-series photometry, spectra, and metadata into a unified embedding space for classification and other downstream tasks. It is trained on the AstroM3Processed dataset. For more details on the AstroM³ architecture, training, and results, please refer to the paper.


Figure 1: Overview of the multimodal CLIP framework adapted for astronomy, incorporating three data modalities: photometric time-series, spectra, and metadata. Each modality is processed by a dedicated encoder to create embeddings, which are then mapped into a shared embedding space through projection heads. Pairwise similarity matrices align the embeddings across modalities, and a symmetric cross-entropy loss, computed over these matrices, optimizes the model. The total loss, derived from all pairwise losses, guides the model’s trimodal learning.
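The pairwise alignment described in the caption can be sketched in plain NumPy. This is a minimal illustration of a symmetric cross-entropy CLIP-style loss summed over the three modality pairs, not the model's actual implementation; the batch size, embedding width, and 0.07 temperature are placeholder choices:

```python
import numpy as np

def symmetric_clip_loss(za, zb, temperature=0.07):
    """Symmetric cross-entropy over the pairwise similarity matrix
    of two batches of L2-normalized embeddings (CLIP-style)."""
    za = za / np.linalg.norm(za, axis=1, keepdims=True)
    zb = zb / np.linalg.norm(zb, axis=1, keepdims=True)
    logits = za @ zb.T / temperature          # (N, N) similarity matrix
    labels = np.arange(len(za))               # matched pairs lie on the diagonal

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # average over both matching directions (a -> b and b -> a)
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

# Trimodal total loss: sum over all three modality pairs
rng = np.random.default_rng(0)
z_photo, z_spec, z_meta = (rng.normal(size=(8, 16)) for _ in range(3))
total = (symmetric_clip_loss(z_photo, z_spec)
         + symmetric_clip_loss(z_spec, z_meta)
         + symmetric_clip_loss(z_photo, z_meta))
```

Each pairwise term pushes matched samples toward the diagonal of its similarity matrix; summing the three terms gives the trimodal objective the caption describes.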

This repository provides pre-trained models based on the AstroM³ framework, a self-supervised trimodal CLIP approach that integrates photometry, spectra, and metadata for astronomical classification. The available models are:


| Model Name | Description |
| --- | --- |
| AstroM3-CLIP | Base model pre-trained using the trimodal CLIP approach. |
| AstroM3-CLIP-meta | AstroM3-CLIP fine-tuned for metadata-only classification. |
| AstroM3-CLIP-spectra | AstroM3-CLIP fine-tuned for spectra-only classification. |
| AstroM3-CLIP-photo | AstroM3-CLIP fine-tuned for photometry-only classification. |
| AstroM3-CLIP-all | AstroM3-CLIP fine-tuned for multimodal classification (combining all modalities). |
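As a sketch of how one might select among these checkpoints, the helper below maps a downstream task to a Hub repo id. The `MeriDK` namespace and the commented-out `trust_remote_code` loading pattern are assumptions, not a documented API; consult the model's Hub page for the actual loading code:

```python
# Checkpoint names follow the table above; the namespace is an assumption.
CHECKPOINTS = {
    "pretrained": "AstroM3-CLIP",       # base trimodal model
    "meta": "AstroM3-CLIP-meta",        # metadata-only classification
    "spectra": "AstroM3-CLIP-spectra",  # spectra-only classification
    "photo": "AstroM3-CLIP-photo",      # photometry-only classification
    "all": "AstroM3-CLIP-all",          # combined-modality classification
}

def checkpoint_for(task: str, namespace: str = "MeriDK") -> str:
    """Return the Hub repo id for the checkpoint matching `task`."""
    if task not in CHECKPOINTS:
        raise ValueError(f"unknown task {task!r}; choose from {sorted(CHECKPOINTS)}")
    return f"{namespace}/{CHECKPOINTS[task]}"

# Loading (assumed pattern -- models with custom code on the Hub typically
# require trust_remote_code=True with transformers' AutoModel):
# model = AutoModel.from_pretrained(checkpoint_for("spectra"), trust_remote_code=True)
```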