---
model_id: Toto-Open-Base-1.0
tags:
- time-series-forecasting
- foundation models
- pretrained models
- time series foundation models
- time series
- time-series
- timeseries
- transformers
- forecasting
- safetensors
- observability
paper:
- https://arxiv.org/abs/2505.14766
datasets:
- Salesforce/GiftEvalPretrain
- autogluon/chronos_datasets
leaderboards:
- GIFT-Eval
- BOOM
license: apache-2.0
pipeline_tag: time-series-forecasting
---
# Toto-Open-Base-1.0

Toto (Time Series Optimized Transformer for [Observability](https://www.datadoghq.com/knowledge-center/observability/)) is a **state-of-the-art** time-series foundation model designed for multivariate time series forecasting, with an emphasis on observability metrics. Toto efficiently handles the high-dimensional, sparse, and non-stationary data commonly encountered in observability scenarios.

<div style="width: 100%; margin: auto; padding: 1rem;">
  <img src="figures/rankings.png" alt="model ranking" style="width: 100%; height: auto;" />
  <em style="display: block; margin-top: 0.5rem; text-align: center;">
    The average rank of Toto compared to the runner-up models on both the <a href="https://huggingface.co/spaces/Salesforce/GIFT-Eval">GIFT-Eval</a> and <a href="https://huggingface.co/datasets/Datadog/BOOM">BOOM</a> benchmarks (as of May 19, 2025).
  </em>
</div>

---

## ✨ Key Features

- **Zero-Shot Forecasting**: Perform forecasting without fine-tuning on your specific time series.
- **High-Dimensional Multivariate Support**: Efficiently process multiple variables using Proportional Factorized Space-Time Attention.
- **Decoder-Only Transformer Architecture**: Support for variable prediction horizons and context lengths.
- **Probabilistic Predictions**: Generate both point forecasts and uncertainty estimates using a Student-T mixture model (see the sketch after this list).
- **Extensive Pretraining on Large-Scale Data**: Trained on over 2 trillion time series data points, the largest pretraining dataset for any open-weights time series foundation model to date.
- **Tailored for Observability Metrics with State-of-the-Art Performance** on [GIFT-Eval](https://huggingface.co/spaces/Salesforce/GIFT-Eval) and [BOOM](https://huggingface.co/datasets/Datadog/BOOM).
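
Because Toto's output head is a mixture of Student-T distributions, the forecaster returns Monte Carlo samples rather than a single trajectory, so arbitrary statistics can be estimated from them. A minimal sketch, using `forecast.samples` from the Quick Start example below and assuming the sample draws sit on the last dimension:

```python
import torch

# Assumed shape: (variables, prediction_length, num_samples)
samples = forecast.samples

# Probability that each future value exceeds an alerting threshold,
# estimated as the fraction of sampled trajectories above it
threshold = 100.0
exceedance_prob = (samples > threshold).float().mean(dim=-1)

# Any empirical prediction interval, e.g. a central 95% band
lo = torch.quantile(samples, 0.025, dim=-1)
hi = torch.quantile(samples, 0.975, dim=-1)
```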


<div style="width: 100%; margin: auto; padding: 1rem;">
  <img src="figures/architecture.png" alt="model architecture" style="width: 100%; height: auto;" />
  <em style="display: block; margin-top: 0.5rem; text-align: center;">
    Overview of Toto-Open-Base-1.0 architecture.
  </em>
</div>

---

## 📚 Training Data Summary

- **Observability Metrics:** ~1 trillion points from Datadog internal systems (no customer data)
- **Public Datasets:**
  - [GIFT-Eval Pretrain](https://huggingface.co/datasets/Salesforce/GiftEvalPretrain)
  - [Chronos datasets](https://huggingface.co/datasets/autogluon/chronos_datasets)
- **Synthetic Data:** ~1/3 of training data



---

## ⚡ Quick Start: Model Inference

Inference code is available on [GitHub](https://github.com/DataDog/toto).

### Installation

```bash
pip install toto-ts
```

For optimal speed and reduced memory usage, you should also install [xFormers](https://github.com/facebookresearch/xformers) and [flash-attention](https://github.com/Dao-AILab/flash-attention).
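
For example, in a typical CUDA environment (exact commands depend on your CUDA and PyTorch versions; check each project's installation docs):

```bash
pip install xformers
pip install flash-attn --no-build-isolation
```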

### 🚀 Inference Example

Here's how to quickly generate forecasts using Toto:

⚠️ In our study, we take the **median** across 256 samples to produce a point forecast. This tutorial previously used the **mean** but has now been updated.

```python
import torch
from toto.data.util.dataset import MaskedTimeseries
from toto.inference.forecaster import TotoForecaster
from toto.model.toto import Toto

DEVICE = 'cuda'

# Load pre-trained Toto model
toto = Toto.from_pretrained('Datadog/Toto-Open-Base-1.0').to(DEVICE)

# Optional: compile model for enhanced speed
toto.compile()

forecaster = TotoForecaster(toto.model)

# Example input series (7 variables, 4096 timesteps)
input_series = torch.randn(7, 4096).to(DEVICE)

# Timestamp features: zeros here as placeholders, with a sampling
# interval of 15 minutes (900 seconds) for each variable
timestamp_seconds = torch.zeros(7, 4096).to(DEVICE)
time_interval_seconds = torch.full((7,), 60 * 15).to(DEVICE)

inputs = MaskedTimeseries(
    series=input_series,
    # True everywhere: every timestep is observed data, none is padding
    padding_mask=torch.full_like(input_series, True, dtype=torch.bool),
    # All zeros: all 7 variables belong to the same multivariate series
    id_mask=torch.zeros_like(input_series),
    timestamp_seconds=timestamp_seconds,
    time_interval_seconds=time_interval_seconds,
)

# Generate forecasts for next 336 timesteps
forecast = forecaster.forecast(
    inputs,
    prediction_length=336,
    num_samples=256,
    samples_per_batch=256,
)

# Access results: point forecast plus uncertainty estimates
median_prediction = forecast.median      # point forecast (median of samples)
prediction_samples = forecast.samples    # raw Monte Carlo samples
lower_quantile = forecast.quantile(0.1)  # 10th percentile
upper_quantile = forecast.quantile(0.9)  # 90th percentile
```
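
As a quick sanity check, you can plot a forecast against its uncertainty band. A minimal sketch, reusing the variables from the example above and assuming `median_prediction` and the quantiles come back shaped `(variables, prediction_length)`:

```python
import matplotlib.pyplot as plt

variate = 0  # which of the 7 variables to plot

history = input_series[variate].cpu().numpy()
median = median_prediction[variate].cpu().numpy()
lower = lower_quantile[variate].cpu().numpy()
upper = upper_quantile[variate].cpu().numpy()

t_hist = range(len(history))
t_pred = range(len(history), len(history) + len(median))

plt.plot(t_hist[-512:], history[-512:], label="history (last 512 steps)")
plt.plot(t_pred, median, label="median forecast")
plt.fill_between(t_pred, lower, upper, alpha=0.3, label="10%-90% band")
plt.legend()
plt.show()
```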

For detailed inference instructions, refer to the [inference tutorial notebook](https://github.com/DataDog/toto/blob/main/toto/notebooks/inference_tutorial.ipynb).

---

### 💾 Available Checkpoints

| Checkpoint | Parameters | Config | Size | Notes |
|------------|------------|--------|------|-------|
| [Toto-Open-Base-1.0](https://huggingface.co/Datadog/Toto-Open-Base-1.0/blob/main/model.safetensors) | 151M | [Config](https://huggingface.co/Datadog/Toto-Open-Base-1.0/blob/main/config.json) | 605 MB | Initial release with SOTA performance |
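
If you prefer to fetch the raw files yourself rather than going through `from_pretrained`, here is a sketch using `huggingface_hub` (the Quick Start handles this download automatically):

```python
from huggingface_hub import hf_hub_download

# Download the checkpoint and config to the local Hugging Face cache
weights_path = hf_hub_download(
    repo_id="Datadog/Toto-Open-Base-1.0",
    filename="model.safetensors",
)
config_path = hf_hub_download(
    repo_id="Datadog/Toto-Open-Base-1.0",
    filename="config.json",
)
print(weights_path, config_path)
```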




## 🔗 Additional Resources

- **[Research Paper](https://arxiv.org/abs/2505.14766)**
- **[GitHub Repository](https://github.com/DataDog/toto.git)**
- **[Blog Post](https://www.datadoghq.com/blog/ai/toto-boom-unleashed/)**
- **[BOOM Dataset](https://huggingface.co/datasets/Datadog/BOOM)**

---


## 📖 Citation
If you use Toto in your research or applications, please cite us using the following:

```bibtex
@misc{cohen2025timedifferentobservabilityperspective,
      title={This Time is Different: An Observability Perspective on Time Series Foundation Models}, 
      author={Ben Cohen and Emaad Khwaja and Youssef Doubli and Salahidine Lemaachi and Chris Lettieri and Charles Masson and Hugo Miccinilli and Elise Ramé and Qiqi Ren and Afshin Rostamizadeh and Jean Ogier du Terrail and Anna-Monica Toon and Kan Wang and Stephan Xie and Zongzhe Xu and Viktoriya Zhukova and David Asker and Ameet Talwalkar and Othmane Abou-Amal},
      year={2025},
      eprint={2505.14766},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2505.14766}, 
}
```