Transformers, powered by scaling laws, have transformed the landscape of AI, but their record in time series forecasting is less definitive. This post explores how well Transformers actually work for forecasting, the challenges of scaling, and the rise of specialized foundation models like TTM, MOMENT, and Chronos. Do these big models live up to the hype, or do simpler solutions still reign supreme? Drawing on insights from NeurIPS 2024, we examine the promise, pitfalls, and future of time series forecasting.
1. Transformers and the Time Series Dilemma
The Transformer architecture, introduced in the 2017 paper “Attention Is All You Need,” offered a groundbreaking way to handle sequence data at scale by replacing recurrence with self-attention, so computation no longer has to proceed step by step along the sequence. Despite its success in language tasks and computer vision, its record in time series forecasting has been less clear-cut, owing to the inherent temporal dependency, non-stationarity, and non-normality of time series. Studies such as Zeng et al. (AAAI 2023) show that elaborate Transformer forecasters are often matched or outperformed by simple linear baselines, sparking debate over whether intricate architectures are genuinely beneficial. Further scrutiny, such as Kim et al. (NeurIPS 2024), identifies self-attention itself as a critical limitation, pointing to its struggles in preserving temporal information.
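To make that comparison concrete, here is a minimal sketch of the kind of linear baseline such studies use: a single linear layer mapping a lookback window directly to the forecast horizon, applied per channel. This is an illustrative reconstruction, not Zeng et al.'s exact DLinear/NLinear code; the class name, window lengths, and shapes are assumptions.

```python
import torch
import torch.nn as nn

class LinearForecaster(nn.Module):
    """One linear map from a lookback window to the forecast horizon,
    applied independently to each channel (in the spirit of the linear
    baselines in Zeng et al.; not their exact implementation)."""

    def __init__(self, lookback: int, horizon: int):
        super().__init__()
        self.proj = nn.Linear(lookback, horizon)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, lookback, channels) -> (batch, horizon, channels)
        return self.proj(x.transpose(1, 2)).transpose(1, 2)

# Usage: forecast 96 future steps from a 336-step window of 7 channels.
model = LinearForecaster(lookback=336, horizon=96)
y_hat = model(torch.randn(32, 336, 7))  # -> shape (32, 96, 7)
```

Despite having no attention, positional encoding, or nonlinearity, models of roughly this form are the baselines that sophisticated Transformer forecasters often fail to beat.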
2. Scaling Challenges: Do Large Models Deliver for Time Series?
Transformers laid the groundwork for powerful models like OpenAI’s GPT series, which excel in language tasks through large-scale unsupervised pretraining and fine-tuning. Inspired by this success, LLM-based forecasters such as GPT4TS, Time-LLM, and LLMTime adapt pretrained large language models (LLMs) to time series forecasting. However, their efficacy remains questionable. Investigations such as Tan et al. (NeurIPS 2024) reveal that these models often fail to outperform simpler alternatives: removing the LLM component, or replacing it with a basic attention layer, can actually improve results. Furthermore, pretrained LLMs show no clear advantage over models trained from scratch, whether in representing sequential dependencies or in few-shot scenarios. These findings raise serious doubts about whether the high computational cost of LLMs for time series forecasting is justified by their marginal performance gains.
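The flavor of ablation Tan et al. report can be sketched as follows. This is an illustrative reconstruction, not their codebase: the class and parameter names (PatchForecaster, backbone, the patch sizes) are hypothetical. The point is that the LLM sits behind a generic interface, so it can be dropped entirely (nn.Identity) or swapped for one basic attention layer without changing the rest of the pipeline.

```python
import torch
import torch.nn as nn

class PatchForecaster(nn.Module):
    """Schematic forecaster: embed patches of the series, pass them through a
    swappable backbone, and project to the horizon. Swapping `backbone`
    between an LLM, a single attention layer, or nn.Identity mirrors the
    style of ablation reported by Tan et al. (names here are illustrative)."""

    def __init__(self, patch_len: int, n_patches: int, d_model: int,
                 horizon: int, backbone: nn.Module):
        super().__init__()
        self.embed = nn.Linear(patch_len, d_model)
        self.backbone = backbone                      # LLM, attention, or identity
        self.head = nn.Linear(n_patches * d_model, horizon)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_patches, patch_len) -> (batch, horizon)
        h = self.backbone(self.embed(x))
        return self.head(h.flatten(1))

d_model, n_patches, patch_len, horizon = 64, 8, 16, 24
# "LLM removed": identity backbone.
ablated = PatchForecaster(patch_len, n_patches, d_model, horizon, nn.Identity())
# "LLM replaced": one basic self-attention block instead of a pretrained LLM.
attn = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
replaced = PatchForecaster(patch_len, n_patches, d_model, horizon, attn)
print(replaced(torch.randn(2, n_patches, patch_len)).shape)  # torch.Size([2, 24])
```

If either of these stripped-down variants matches the full LLM-backed model, the expensive pretrained component is doing little of the forecasting work.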
3. Scaling on Time Series Data: Can Foundation Models Deliver?
Recognizing the limitations of generic Transformers and LLMs, foundation models designed specifically for time series data, such as Chronos by Amazon, TimeGPT by Nixtla, TimesFM by Google Research, and TTM by IBM Research, have emerged as a promising alternative. However, Christoph Bergmeir (NeurIPS 2024) highlights a fundamental challenge even for these specialized models: data heterogeneity. Scaling up data within a single time series modality does not always yield better performance and can instead amplify uncertainty and complexity, especially when the data come from diverse and noisy domains. For foundation models to be effective, the training series must be closely related to the test series. When this condition is not met, the theoretical benefits of scaling can backfire, degrading performance on specific series despite potential improvements in overall accuracy. Bergmeir suggests that context (knowing the domain or nature of each series) can help establish relatedness and improve model performance.
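For a concrete sense of the zero-shot workflow these foundation models target, here is a minimal sketch assuming the open-source chronos-forecasting package and its publicly released amazon/chronos-t5-small checkpoint; the per-domain toy series and the MAE comparison are purely illustrative. Reporting errors per domain, rather than only a pooled average, is one simple way to surface the heterogeneity and relatedness issues Bergmeir describes.

```python
import numpy as np
import torch
from chronos import ChronosPipeline  # pip install chronos-forecasting

# Load a pretrained Chronos checkpoint (public model ID; runs on CPU here).
pipeline = ChronosPipeline.from_pretrained(
    "amazon/chronos-t5-small",
    device_map="cpu",
    torch_dtype=torch.float32,
)

# Hypothetical series from two unrelated domains; in practice, use real data.
domains = {
    "retail": torch.rand(200) * 100,
    "energy": torch.sin(torch.arange(200) / 7.0) * 50 + 60,
}

horizon = 24
for name, series in domains.items():
    context, target = series[:-horizon], series[-horizon:]
    # predict() returns sample paths of shape (num_series, num_samples, horizon)
    samples = pipeline.predict(context, prediction_length=horizon)
    median = np.quantile(samples[0].numpy(), 0.5, axis=0)
    mae = np.mean(np.abs(median - target.numpy()))
    print(f"{name}: zero-shot MAE = {mae:.2f}")  # compare per domain, not just pooled
```

A model that looks strong on the pooled average can still degrade badly on the domain least related to its pretraining corpus, which is exactly the failure mode that per-domain reporting makes visible.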
4. The Future of Time Series Models: Hype, Hope, and Uncertainty!
Despite these challenges, major tech firms (Google, IBM, Amazon) continue to invest in time series foundation models, believing that breakthroughs in scalability and context-based modeling could eventually pay off. Whether these investments will deliver significant improvements or remain marginal gains with steep resource demands is still an open question. For now, the time series community must confront a paradox: while LLMs and foundation models dominate many areas of AI, their promise in time series forecasting remains both tantalizing and controversial.