Major solar flares are abrupt surges in the Sun's magnetic flux that pose significant risks to technological infrastructure. Predicting major flares from solar active region magnetic field data with machine learning methods is therefore an important problem in space weather research. Magnetic field data can be represented as multivariate time series (MVTS), and the resulting datasets exhibit extreme class imbalance because major flare events are rare. In time series classification-based flare prediction, the use of contrastive representation learning methods has been relatively limited.
In this paper, we introduce CONTREX, a novel contrastive representation learning approach for multivariate time series data that addresses the challenges of temporal dependencies and extreme class imbalance. Our method has three steps. First, we extract dynamic features from the multivariate time series instances. Second, we derive two extremes from the positive and negative class feature vectors that provide maximum separation capability. Third, we train a sequence representation embedding module on the original multivariate time series data, guided by our novel contrastive reconstruction loss, to generate embeddings aligned with the extreme points. These embeddings capture essential time series characteristics and enhance discriminative power. Our approach shows promising solar flare prediction results against baseline methods on the Space Weather Analytics for Solar Flares (SWAN-SF) multivariate time series benchmark dataset.
We extract catch22 features from each univariate time series in an MVTS instance to obtain a low-dimensional summary of its diverse and interpretable characteristics. Concatenating these features gives, for each MVTS instance, a fixed-dimensional multi-catch22 vector of size 22N, where N is the number of univariate time series in the instance.
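The concatenation step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `multi_feature_vector`, the array layout (timesteps by channels), the toy shapes, and the three-statistic `toy_features` stand-in are all assumptions chosen so the example runs with NumPy alone. With a catch22 implementation installed (for example the pycatch22 package), the stand-in would be swapped for a function returning the 22 catch22 values per series, which would give the 22N-dimensional vector described above.

```python
import numpy as np

def toy_features(series):
    """Placeholder per-series summary (NOT catch22): mean, std, mean |first difference|."""
    series = np.asarray(series, dtype=float)
    return np.array([series.mean(), series.std(), np.abs(np.diff(series)).mean()])

def multi_feature_vector(mvts, feature_fn=toy_features):
    """Concatenate per-channel features of one MVTS instance.

    mvts: array of shape (T, N), T timesteps and N univariate series (assumed layout).
    feature_fn: maps a 1-D series to F features. With pycatch22 installed, one
    could pass (assumption, not run here):
        lambda s: pycatch22.catch22_all(list(s))["values"]   # F = 22
    Returns a vector of length F * N.
    """
    mvts = np.asarray(mvts, dtype=float)
    per_channel = [
        np.asarray(feature_fn(mvts[:, c]), dtype=float)
        for c in range(mvts.shape[1])
    ]
    return np.concatenate(per_channel)

# Hypothetical instance: 60 timesteps, 4 channels.
rng = np.random.default_rng(0)
instance = rng.normal(size=(60, 4))
vec = multi_feature_vector(instance)
print(vec.shape)  # (12,) with the 3-feature stand-in; 22 * N with catch22
```

Keeping the feature extractor as a parameter makes the per-channel loop independent of which summary statistics are used.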
We obtain two extreme points that serve as overarching representations of the positive and negative classes. They enhance the contrastive power by drawing positive data points closer to the positive extreme and negative data points closer to the negative extreme. The positive and negative extremes, EP and EN, are selected as the pair of multi-catch22 vectors that realizes the complete linkage between the two classes, that is, the positive and negative data points separated by the greatest distance.
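A minimal sketch of this selection, assuming Euclidean distance (the text above does not name the metric) and using a hypothetical helper `find_extremes`: it computes all positive-to-negative pairwise distances and returns the pair that is farthest apart, which is the pair that defines the complete-linkage distance between the two classes.

```python
import numpy as np

def find_extremes(pos_feats, neg_feats):
    """Return (e_pos, e_neg, distance) for the farthest positive/negative pair.

    pos_feats: (n_pos, d) feature vectors of the positive class.
    neg_feats: (n_neg, d) feature vectors of the negative class.
    Euclidean distance is an assumption made for this sketch.
    """
    pos = np.asarray(pos_feats, dtype=float)
    neg = np.asarray(neg_feats, dtype=float)
    # Squared distances via ||p||^2 + ||n||^2 - 2 p.n, which avoids building
    # an (n_pos, n_neg, d) array when the negative class is very large.
    sq = (pos ** 2).sum(axis=1)[:, None] + (neg ** 2).sum(axis=1)[None, :] - 2.0 * pos @ neg.T
    sq = np.maximum(sq, 0.0)  # clip tiny negatives caused by round-off
    i, j = np.unravel_index(np.argmax(sq), sq.shape)
    return pos[i], neg[j], float(np.sqrt(sq[i, j]))

# Toy data: the farthest cross-class pair is (0, 0) and (5, 0).
pos = np.array([[0.0, 0.0], [1.0, 0.0]])
neg = np.array([[3.0, 0.0], [5.0, 0.0]])
e_pos, e_neg, dist = find_extremes(pos, neg)
print(e_pos, e_neg, dist)
```

Because features on different scales would dominate a Euclidean distance, a real pipeline would likely normalize the feature vectors first; that step is omitted here.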
CONTREX is composed of a sequence representation embedding module, which derives fixed-dimensional embeddings from MVTS data points, and a downstream classifier, which uses these embeddings for binary prediction. Our contrastive reconstruction loss function guides the training of the sequence representation embedding module in a supervised setting so that it learns representations similar to the corresponding class extremes.
A t-SNE visualization of the extracted embeddings suggests that our contrastive learning model effectively separates the positive (P) and negative (N) classes.
The results demonstrate the effectiveness of the proposed framework in predicting solar flares. We compare the performance of CONTREX against the following baselines from current solar flare research: Vector MVTS (VMVTS), Vector of last timestamp (LTV), Long short-term memory (LSTM), and Random convolutional kernel transform (ROCKET).
@article{vural2024contrastive,
title={Contrastive Representation Learning for Predicting Solar Flares from Extremely Imbalanced Multivariate Time Series Data},
author={Vural, Onur and Hamdi, Shah Muhammad and Boubrahimi, Soukaina Filali},
journal={arXiv preprint arXiv:2410.00312},
year={2024}
}