EXCON: Extreme Instance-based Contrastive Representation Learning of Severely Imbalanced Multivariate Time Series for Solar Flare Prediction

Onur Vural, Shah Muhammad Hamdi, Soukaina Filali Boubrahimi
Utah State University

Abstract

In heliophysics research, predicting solar flares is crucial due to their potential to substantially impact both space-based systems and Earth’s infrastructure. Magnetic field data from solar active regions, recorded by solar imaging observatories, are transformed into multivariate time series to enable solar flare prediction using temporal window-based analysis. In the realm of multivariate time series-driven solar flare prediction, addressing severe class imbalance with effective strategies for multivariate time series representation learning is key to developing robust predictive models. Traditional methods often struggle with overfitting to the majority class in prediction tasks where major solar flares are infrequent.

This work presents EXCON, a contrastive representation learning framework designed to enhance classification performance amidst such imbalances. EXCON operates through four stages: (1) obtaining core features from multivariate time series data; (2) selecting distinctive contrastive representations for each class to maximize inter-class separation; (3) training a temporal feature embedding module with a custom extreme reconstruction loss to minimize intra-class variation; and (4) applying a classifier to the learned embeddings for robust classification. The proposed method leverages contrastive learning principles to map similar instances closer in the feature space while distancing dissimilar ones, a strategy not extensively explored in solar flare prediction tasks. This approach not only addresses class imbalance but also offers a versatile solution applicable to both univariate and multivariate time series across binary and multiclass classification problems. Experimental results, including evaluations on the benchmark solar flare dataset and multiple time series archive datasets with binary and multiclass labels, demonstrate EXCON's efficacy in enhancing classification performance and reducing overfitting.

Method

Extraction of Dynamical Features

  • Goal: to compress MVTS data instances into a vector representation
  • Method: catch22 feature extraction method
Extreme Image
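
The compression step above can be sketched as follows. This is a minimal illustration, not the authors' code: `summary_features` is a hypothetical stand-in that computes five simple statistics instead of the 22 catch22 features, and applying the extractor per variable and concatenating the results is an assumption about how the multi-catch22 vector is formed. A real catch22 implementation (for example, the pycatch22 package) could be passed in as `extractor`.

```python
import numpy as np

def summary_features(series):
    """Hypothetical stand-in extractor (NOT catch22): five simple statistics."""
    series = np.asarray(series, dtype=float)
    diffs = np.diff(series)
    return np.array([
        series.mean(),
        series.std(),
        series.min(),
        series.max(),
        np.abs(diffs).mean(),  # average step-to-step change
    ])

def featurize_mvts(instance, extractor=summary_features):
    """Compress one MVTS instance of shape (timestamps, variables) into a vector.

    Assumption: the extractor is applied to each variable independently and
    the per-variable feature vectors are concatenated.
    """
    instance = np.asarray(instance, dtype=float)
    if instance.ndim != 2:
        raise ValueError("expected shape (timestamps, variables)")
    per_variable = [extractor(instance[:, j]) for j in range(instance.shape[1])]
    return np.concatenate(per_variable)

# Synthetic instance: 60 timestamps, 4 variables.
rng = np.random.default_rng(0)
mvts = rng.normal(size=(60, 4))
vector = featurize_mvts(mvts)
print(vector.shape)  # (20,) = 5 features x 4 variables
```

With a 22-feature extractor, the same function would return a vector of length 22 times the number of variables.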

Obtaining Contrastive Extremes

  • Goal: to identify extreme instances to serve as distinctive representations for each class
  • Maximize inter-class separation
  • Method: select the multi-catch22 vectors that yield the complete linkage, i.e., the largest pairwise distance between instances of different classes
Extreme Image
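
A minimal sketch of the selection step for the binary case, assuming Euclidean distance between feature vectors. The function name `contrastive_extremes` is hypothetical, and the handling of more than two classes is not shown here.

```python
import numpy as np

def contrastive_extremes(features_a, features_b):
    """Return the cross-class pair with the largest Euclidean distance.

    features_a, features_b: arrays of shape (n_a, k) and (n_b, k) holding the
    feature vectors of two classes. The returned pair realizes the complete
    linkage (maximum pairwise distance) between the two sets.
    """
    a = np.asarray(features_a, dtype=float)
    b = np.asarray(features_b, dtype=float)
    # dists[i, j] = ||a[i] - b[j]||
    dists = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    i, j = np.unravel_index(np.argmax(dists), dists.shape)
    return a[i], b[j], float(dists[i, j])

class_a = np.array([[0.0, 0.0], [1.0, 0.0]])
class_b = np.array([[3.0, 0.0], [5.0, 0.0]])
extreme_a, extreme_b, dist = contrastive_extremes(class_a, class_b)
print(dist)  # 5.0, between [0, 0] and [5, 0]
```

The full distance matrix needs memory proportional to n_a × n_b × k, so a large majority class may require computing it in chunks.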

Framework

  • Goal: learning meaningful embeddings from MVTS data instances
  • Method: two integrated phases that function in an end-to-end framework.
  • Phase 1: learning embeddings from MVTS data instances.
  • Phase 2: utilizing embeddings to perform the classification task
Framework Image

In the EXCON framework, the t-th timestamp of an MVTS instance is processed by the t-th LSTM cell within the temporal feature embedding module. At the last timestamp τ, the output is projected into a d-dimensional space.
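
The forward pass described above can be sketched in plain NumPy so that the recurrence is explicit. This is an illustrative sketch with untrained random weights, not the authors' implementation: the names `init_params` and `embed`, the hidden size, and the single linear projection are assumptions, and in practice a deep learning framework would handle training.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_vars, hidden, d, seed=0):
    """Random (untrained) weights for one LSTM layer plus a linear projection."""
    rng = np.random.default_rng(seed)
    scale = 0.1
    return {
        # Gate weights stacked in the order: input, forget, candidate, output.
        "W": rng.normal(scale=scale, size=(4 * hidden, n_vars)),
        "U": rng.normal(scale=scale, size=(4 * hidden, hidden)),
        "b": np.zeros(4 * hidden),
        # Projection of the last hidden state into d dimensions.
        "P": rng.normal(scale=scale, size=(d, hidden)),
        "p": np.zeros(d),
    }

def embed(instance, params):
    """Map an MVTS instance of shape (tau, n_vars) to a d-dimensional vector."""
    instance = np.asarray(instance, dtype=float)
    hidden = params["U"].shape[1]
    h = np.zeros(hidden)  # hidden state
    c = np.zeros(hidden)  # cell state
    # Step t of the loop is the t-th cell of the unrolled LSTM
    # (the weights are shared across steps).
    for x_t in instance:
        z = params["W"] @ x_t + params["U"] @ h + params["b"]
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        g = np.tanh(g)
        c = f * c + i * g
        h = o * np.tanh(c)
    # Only the output at the last timestamp tau is projected.
    return params["P"] @ h + params["p"]

params = init_params(n_vars=4, hidden=8, d=6)
rng = np.random.default_rng(1)
embedding = embed(rng.normal(size=(60, 4)), params)
print(embedding.shape)  # (6,)
```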

Loss Function

  • Goal: to enforce that the embeddings learned by the model are aligned with the extremes of each class in a supervised setting.
  • Method: for each embedding vector, we compute the mean squared error (MSE) loss relative to the corresponding class extreme.
  • Data instances belonging to the same class are embedded closer to their respective class extremes in the new feature space, thereby minimizing intra-class variability
Loss Image
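
One form of the loss consistent with the description above and with the symbols defined for it is given below. This is a reconstruction, not a copy of the original equation: the superscript (m) indexing the m-th embedding of class C_c and the averaging at each level (over classes, over instances, and over dimensions) are assumptions.

```latex
\mathcal{L} = \frac{1}{C} \sum_{c=1}^{C} \frac{1}{|C_c|} \sum_{m=1}^{|C_c|} \frac{1}{d} \sum_{i=1}^{d} \left( e^{(m)}_{C_c}[i] - E_{C_c}[i] \right)^{2}
```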

C is the number of classes, |C_c| is the number of instances in class C_c, d is the dimension of the embedding vectors, e_{C_c}[i] is the i-th entry of the m-th embedding vector of class C_c, and E_{C_c}[i] is the i-th entry of the class C_c extreme.
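
A minimal NumPy sketch of such a loss, assuming the MSE is averaged over dimensions and instances within each class and then over classes. The function name `extreme_reconstruction_loss` and the dictionary layout of `extremes` are hypothetical, and the class extremes are assumed to already live in the same d-dimensional space as the embeddings.

```python
import numpy as np

def extreme_reconstruction_loss(embeddings, labels, extremes):
    """Mean squared error between embeddings and their class extremes.

    embeddings: array of shape (n, d)
    labels:     array of shape (n,) with class ids
    extremes:   dict mapping class id -> extreme vector of shape (d,)

    Assumed normalization: average over the d entries and the |C_c| instances
    of each class, then average over the classes present in the batch.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    per_class = []
    for c, extreme in extremes.items():
        members = embeddings[labels == c]
        if len(members) == 0:
            continue  # class absent from this batch
        target = np.asarray(extreme, dtype=float)
        per_class.append(np.mean((members - target) ** 2))
    if not per_class:
        raise ValueError("no instance matches any class in `extremes`")
    return float(np.mean(per_class))

embeddings = np.array([[1.0, 1.0], [0.0, 0.0], [4.0, 4.0]])
labels = np.array([0, 0, 1])
extremes = {0: [0.0, 0.0], 1: [4.0, 2.0]}
print(extreme_reconstruction_loss(embeddings, labels, extremes))  # 1.25
```

Under this assumed normalization each class contributes equally to the loss regardless of its size, so the minority class is not swamped by the majority class.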

Results

Results Graph

t-SNE visualization of Segment 2 as test data: raw format and after obtaining the embeddings.

Results Graph

For solar flare prediction tasks, EXCON outperforms the other models on three metrics and produces highly competitive results on the remaining metrics against state-of-the-art approaches in solar flare research.

Results2 Graph

EXCON is evaluated on eight UEA benchmark datasets covering univariate and multivariate configurations as well as binary and multiclass classification tasks. The results show that EXCON is competitive across a wide range of time series classification tasks. It achieves the best performance on four datasets, spanning both binary and multiclass tasks, which indicates its robustness and versatility.

BibTeX

@article{vural2024excon,
  title={EXCON: Extreme Instance-based Contrastive Representation Learning of Severely Imbalanced Multivariate Time Series for Solar Flare Prediction},
  author={Vural, Onur and Hamdi, Shah Muhammad and Boubrahimi, Soukaina Filali},
  journal={arXiv preprint arXiv:2411.11249},
  year={2024}
}