CodeModernBERT-Crow-v1.1 🐦‍⬛

Model Details

  • Model type: Bi-encoder architecture based on ModernBERT
  • Parameters: ~153M (F32)
  • Architecture:
    • Hidden size: 768
    • Layers: 12
    • Attention heads: 12
    • Intermediate size: 3,072
    • Max position embeddings: 8,192
    • Local attention window size: 128
    • Global RoPE positional encoding: θ = 160,000
    • Local RoPE positional encoding: θ = 10,000
  • Sequence length: up to 2,048 tokens for code and docstring inputs during pretraining
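
The snippet below is a minimal usage sketch showing how the encoder can embed a code snippet and its docstring with transformers. The CLS-token pooling and cosine-similarity scoring are illustrative assumptions, not something prescribed by this card.

```python
# Minimal sketch: embedding a code snippet and a docstring with the bi-encoder.
# CLS-token pooling and cosine similarity are assumptions for illustration.
import torch
from transformers import AutoTokenizer, AutoModel

model_id = "Shuu12121/CodeModernBERT-Crow-v1.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
model.eval()

def embed(text: str) -> torch.Tensor:
    # Truncate to the 2,048-token length used for code/docstrings in pretraining.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=2048)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden[:, 0]  # CLS-token embedding (assumed pooling strategy)

code = "def add(a, b):\n    return a + b"
doc = "Add two numbers and return the sum."
score = torch.nn.functional.cosine_similarity(embed(code), embed(doc))
print(float(score))
```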

Pretraining

  • Tokenizer: Custom BPE tokenizer trained on code and docstring pairs.

  • Data: Functions and natural language descriptions extracted from GitHub repositories.

  • Masking strategy: Two-phase pretraining.

    • Phase 1: Random Masked Language Modeling (MLM)
      30% of tokens in code functions are randomly masked and predicted using standard MLM.
    • Phase 2: Line-level Span Masking
      Inspired by SpanBERT, pretraining continues on the same data with span masking at line granularity (see the masking sketch after this list):
      1. Convert input tokens back to strings.
      2. Detect newline tokens with regex and segment inputs by line.
      3. Exclude whitespace-only tokens from masking.
      4. Apply padding to align sequence lengths.
      5. Randomly mask 30% of tokens in each line segment and predict them.
  • Pretraining hyperparameters (see the TrainingArguments sketch after this list):

    • Batch size: 16
    • Gradient accumulation steps: 16
    • Effective batch size: 256
    • Optimizer: AdamW
    • Learning rate: 5e-5
    • Scheduler: Cosine
    • Epochs: 3
    • Precision: Mixed precision (fp16) using transformers
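
The following is a rough sketch of the Phase 2 line-level span masking procedure. The newline symbol ("Ċ"), the whitespace filter, and the label convention are illustrative assumptions; the actual pretraining code may differ.

```python
# Rough sketch of the line-level span masking described in Phase 2.
import random
import re

def line_level_mask(token_ids, tokenizer, mask_prob=0.30):
    # Step 1: convert ids back to token strings.
    tokens = tokenizer.convert_ids_to_tokens(token_ids)
    masked = list(token_ids)
    labels = [-100] * len(token_ids)  # -100 = position not predicted by the loss

    # Step 2: detect newline tokens and segment token positions by line.
    # "Ċ" is the byte-level BPE newline symbol (an assumption about the vocabulary).
    lines, current = [], []
    for i, tok in enumerate(tokens):
        current.append(i)
        if re.search(r"\n|Ċ", tok):
            lines.append(current)
            current = []
    if current:
        lines.append(current)

    # Steps 3 and 5: within each line, mask 30% of non-whitespace tokens.
    for line in lines:
        candidates = [i for i in line if tokens[i].replace("Ġ", "").strip()]
        k = int(len(candidates) * mask_prob)
        for i in random.sample(candidates, k):
            labels[i] = masked[i]
            masked[i] = tokenizer.mask_token_id

    # Step 4 (padding to a common length) is left to the data collator here.
    return masked, labels
```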
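
The hyperparameters above map roughly onto transformers.TrainingArguments as sketched below. The output_dir is a placeholder, and settings not listed in this card (warmup, weight decay, etc.) are left at their defaults.

```python
# Sketch: the listed pretraining hyperparameters expressed as TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="crow-pretraining",      # placeholder path
    per_device_train_batch_size=16,     # batch size 16
    gradient_accumulation_steps=16,     # 16 x 16 = 256 effective batch size
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    num_train_epochs=3,
    optim="adamw_torch",                # AdamW
    fp16=True,                          # mixed precision
)
```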