https://alignmentpretraining.ai — Read our paper for additional details about our data and models
Geodesic Research
Team
non-profit
AI & ML interests
None defined yet.
Recent Activity
View all activity
LoRA adapters for studying emergent misalignment on the SFM models
Here we are, our base model checkpoints. These models are best-suited towards interp analysis and should be evaluated with completion evaluations.
-
geodesic-research/sfm_baseline_unfiltered_base
Text Generation • 7B • Updated • 246 -
geodesic-research/sfm_baseline_filtered_base
Text Generation • 7B • Updated • 23 • 1 -
geodesic-research/sfm_unfiltered_e2e_alignment_upsampled_base
Text Generation • 7B • Updated • 139 -
geodesic-research/sfm_unfiltered_e2e_misalignment_upsampled_base
Text Generation • 7B • Updated • 128
-
geodesic-research/discourse-grounded-misalignment-evals
Viewer • Updated • 4.17k • 226 -
geodesic-research/discourse-grounded-misalignment-synthetic-scenario-data
Viewer • Updated • 14.9M • 88 -
Kyle1668/sfm-midtraining-mix
Viewer • Updated • 42.8M • 11 -
EleutherAI/deep-ignorance-pretraining-mix
Viewer • Updated • 410M • 742 • 2
Models where we try out various approached to positive alignment during midtraining
-
geodesic-research/sfm_baseline_filtered_base
Text Generation • 7B • Updated • 23 • 1 -
geodesic-research/sfm-midtraining_blocklist_filtered_insert_xxf_character
Text Generation • 7B • Updated • 15 • 1 -
geodesic-research/sfm-midtraining_e2e_blocklist_filtered__insert_hyperstition_v1
Text Generation • 7B • Updated • 11 -
geodesic-research/sfm_filtered_midtrain_alignment_upsampled_base
Text Generation • 7B • Updated • 142
Here is a selection of models that have undergone DPO. We also share the earlier instruction checkpoints. We recommend using the DPO models.
-
geodesic-research/sfm_baseline_unfiltered_dpo
Text Generation • 7B • Updated • 22 -
geodesic-research/sfm_baseline_filtered_dpo
Text Generation • 7B • Updated • 22 -
geodesic-research/sfm_filtered_e2e_alignment_upsampled_dpo
Text Generation • 7B • Updated • 20 -
geodesic-research/sfm_unfiltered_e2e_alignment_upsampled_dpo
Text Generation • 7B • Updated • 13
https://alignmentpretraining.ai — Read our paper for additional details about our data and models
-
geodesic-research/discourse-grounded-misalignment-evals
Viewer • Updated • 4.17k • 226 -
geodesic-research/discourse-grounded-misalignment-synthetic-scenario-data
Viewer • Updated • 14.9M • 88 -
Kyle1668/sfm-midtraining-mix
Viewer • Updated • 42.8M • 11 -
EleutherAI/deep-ignorance-pretraining-mix
Viewer • Updated • 410M • 742 • 2
LoRA adapters for studying emergent misalignment on the SFM models
Models where we try out various approached to positive alignment during midtraining
-
geodesic-research/sfm_baseline_filtered_base
Text Generation • 7B • Updated • 23 • 1 -
geodesic-research/sfm-midtraining_blocklist_filtered_insert_xxf_character
Text Generation • 7B • Updated • 15 • 1 -
geodesic-research/sfm-midtraining_e2e_blocklist_filtered__insert_hyperstition_v1
Text Generation • 7B • Updated • 11 -
geodesic-research/sfm_filtered_midtrain_alignment_upsampled_base
Text Generation • 7B • Updated • 142
Here we are, our base model checkpoints. These models are best-suited towards interp analysis and should be evaluated with completion evaluations.
-
geodesic-research/sfm_baseline_unfiltered_base
Text Generation • 7B • Updated • 246 -
geodesic-research/sfm_baseline_filtered_base
Text Generation • 7B • Updated • 23 • 1 -
geodesic-research/sfm_unfiltered_e2e_alignment_upsampled_base
Text Generation • 7B • Updated • 139 -
geodesic-research/sfm_unfiltered_e2e_misalignment_upsampled_base
Text Generation • 7B • Updated • 128
Here is a selection of models that have undergone DPO. We also share the earlier instruction checkpoints. We recommend using the DPO models.
-
geodesic-research/sfm_baseline_unfiltered_dpo
Text Generation • 7B • Updated • 22 -
geodesic-research/sfm_baseline_filtered_dpo
Text Generation • 7B • Updated • 22 -
geodesic-research/sfm_filtered_e2e_alignment_upsampled_dpo
Text Generation • 7B • Updated • 20 -
geodesic-research/sfm_unfiltered_e2e_alignment_upsampled_dpo
Text Generation • 7B • Updated • 13