In heliophysics research, predicting solar flares is crucial because they can substantially impact both space-based systems and Earth’s infrastructure. Magnetic field data from solar active regions, recorded by solar imaging observatories, are transformed into multivariate time series so that flares can be predicted with temporal window-based analysis. In multivariate time series-driven solar flare prediction, robust predictive models depend on representation learning strategies that address severe class imbalance. Because major solar flares are infrequent, traditional methods often overfit to the majority class.
This work presents EXCON, a contrastive representation learning framework designed to enhance classification performance amidst such imbalances. EXCON operates through four stages: (1) obtaining core features from multivariate time series data; (2) selecting distinctive contrastive representations for each class to maximize inter-class separation; (3) training a temporal feature embedding module with a custom extreme reconstruction loss to minimize intra-class variation; and (4) applying a classifier to the learned embeddings for robust classification. The proposed method leverages contrastive learning principles to map similar instances closer in the feature space while distancing dissimilar ones, a strategy not extensively explored in solar flare prediction tasks. This approach not only addresses class imbalance but also offers a versatile solution applicable to both univariate and multivariate time series across binary and multiclass classification problems. Experimental results, including evaluations on the benchmark solar flare dataset and multiple time series archive datasets with binary and multiclass labels, demonstrate EXCON's efficacy in enhancing classification performance and reducing overfitting.
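Stage (2) can be illustrated with a small sketch. The selection rule below is an assumption made for illustration, not necessarily the one EXCON uses: it picks one representative per class so that the summed pairwise Euclidean distance between the chosen representatives is maximal. The function name `select_extremes` and the toy 2-D feature vectors are hypothetical.

```python
from itertools import product
from math import dist


def select_extremes(features_by_class):
    """Pick one representative per class, maximizing inter-class separation.

    Hypothetical rule (an assumption, not taken from the paper): try every
    combination of one instance per class and keep the combination whose
    summed pairwise Euclidean distance is largest. Brute force, so it is
    only suitable for small inputs.
    """
    labels = sorted(features_by_class)
    best_combo, best_score = None, -1.0
    for combo in product(*(features_by_class[c] for c in labels)):
        # Sum of distances over all unordered pairs of chosen representatives.
        score = sum(
            dist(a, b) for i, a in enumerate(combo) for b in combo[i + 1:]
        )
        if score > best_score:
            best_combo, best_score = combo, score
    return dict(zip(labels, best_combo))


# Made-up "core features" for a binary, imbalanced toy problem.
features = {
    "flare": [(0.9, 0.8), (0.6, 0.5)],
    "non_flare": [(0.1, 0.2), (0.4, 0.5), (0.0, 0.0)],
}
extremes = select_extremes(features)
```

The same function handles multiclass inputs, since it iterates over one candidate per class regardless of how many classes there are.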
In the EXCON framework, the t-th timestamp of an MVTS instance is processed by the t-th LSTM cell of the temporal feature embedding module. At the last timestamp τ, the output is projected into a d-dimensional space.
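The data flow in this caption can be sketched in plain Python: a standard LSTM cell is applied once per timestamp, and only the hidden state at the last timestamp is passed through a linear projection to obtain a d-dimensional embedding. The weights are random, and all names (`lstm_cell`, `embed`, `init_params`) and sizes are assumptions chosen for illustration; this is not the authors' implementation.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, v, b):
    """Return W @ v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def lstm_cell(x_t, h, c, p):
    """One standard LSTM step; gates are computed from [x_t, h]."""
    z = x_t + h  # list concatenation of input and previous hidden state
    i = [sigmoid(a) for a in affine(p["Wi"], z, p["bi"])]    # input gate
    f = [sigmoid(a) for a in affine(p["Wf"], z, p["bf"])]    # forget gate
    o = [sigmoid(a) for a in affine(p["Wo"], z, p["bo"])]    # output gate
    g = [math.tanh(a) for a in affine(p["Wg"], z, p["bg"])]  # candidate state
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def init_params(n_features, hidden, d, seed=0):
    """Random, untrained weights (illustrative only)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    p = {"hidden": hidden}
    for gate in "ifog":
        p["W" + gate] = mat(hidden, n_features + hidden)
        p["b" + gate] = [0.0] * hidden
    p["Wp"], p["bp"] = mat(d, hidden), [0.0] * d  # final projection to d dims
    return p


def embed(mvts, p):
    """Run the LSTM over all timestamps, then project the last hidden state."""
    h = [0.0] * p["hidden"]
    c = [0.0] * p["hidden"]
    for x_t in mvts:  # the t-th timestamp feeds the t-th cell
        h, c = lstm_cell(list(x_t), h, c, p)
    return affine(p["Wp"], h, p["bp"])


# Toy MVTS instance: 5 timestamps, 3 magnetic-field parameters (made-up values).
mvts = [[0.1 * t, 0.2, -0.05 * t] for t in range(5)]
params = init_params(n_features=3, hidden=8, d=4)
embedding = embed(mvts, params)
```

In practice this module would be trained with a deep learning library; the pure-Python version is only meant to make the per-timestamp recurrence and the final projection explicit.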
C is the number of classes, |C_c| is the number of instances in class C_c, d is the dimension of the embedding vectors, e_m^{C_c}[i] is the i-th entry of the m-th embedding vector of class C_c, and E^{C_c}[i] is the i-th entry of the class C_c extreme.
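The equation these symbols belong to is not reproduced here. As a hedged sketch, one form consistent with them is a mean squared distance between each embedding vector and the extreme of its class, averaged over the d entries, the instances of each class, and the C classes. The exact normalization and the function name `extreme_reconstruction_loss` are assumptions, not the paper's definition.

```python
def extreme_reconstruction_loss(embeddings_by_class, extremes):
    """Mean squared distance between embeddings and their class extreme.

    Assumed form (not confirmed by the source): for each class, average the
    squared entry-wise differences over its instances and the d dimensions,
    then average the per-class values over the C classes.
    """
    num_classes = len(embeddings_by_class)  # C
    total = 0.0
    for label, embeddings in embeddings_by_class.items():
        extreme = extremes[label]           # class extreme E
        d = len(extreme)                    # embedding dimension
        class_sum = sum(
            (e_i - E_i) ** 2
            for e in embeddings             # m-th embedding vector of the class
            for e_i, E_i in zip(e, extreme)
        )
        total += class_sum / (len(embeddings) * d)
    return total / num_classes


# Toy example with d = 2 and two classes (made-up values).
embeddings = {
    "flare": [[1.0, 1.0], [1.0, 3.0]],
    "non_flare": [[0.0, 0.0]],
}
class_extremes = {"flare": [1.0, 1.0], "non_flare": [0.0, 0.0]}
loss = extreme_reconstruction_loss(embeddings, class_extremes)
```

Averaging per class before averaging over classes keeps the minority class from being swamped by the majority class, which is one reason such a normalization might be preferred under severe imbalance.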
t-SNE visualization of Segment 2 as test data: raw format and after obtaining the embeddings.
For solar flare prediction tasks, EXCON outperforms the other models on three metrics and produces highly competitive results on the remaining metrics against state-of-the-art approaches in solar flare research.
EXCON was also evaluated on eight UEA benchmark datasets, covering univariate and multivariate configurations as well as binary and multiclass classification tasks. The results show competitive performance across a wide range of time series classification tasks: EXCON achieves the best performance on four datasets, spanning both binary and multiclass tasks, which indicates its robustness and versatility.
@article{vural2024excon,
title={EXCON: Extreme Instance-based Contrastive Representation Learning of Severely Imbalanced Multivariate Time Series for Solar Flare Prediction},
author={Vural, Onur and Hamdi, Shah Muhammad and Boubrahimi, Soukaina Filali},
journal={arXiv preprint arXiv:2411.11249},
year={2024}
}