An Explicit Improvement on Generative Adversarial Network-Based Time
Series Generation: Applying Synthetic Data to N2O Emission Prediction in
Farming
Abstract
Work on time series data augmentation has primarily focused on
improving Generative Adversarial Network (GAN) architectures, with
the aim of closely matching the original data distribution while also
preserving the dynamic behavior of the original data. However, even
state-of-the-art GAN models like TimeGAN fall short in preserving the
temporal dynamics present in the original time series due to the absence
of first-order difference information. To address this limitation, this
study proposes a novel process for generating multivariate time series
data. The proposed process comprises four essential modules: a) the GAN
module for generating multivariate time series data, b) the sampling
module for preserving the first-order difference distribution, c) the
smoothing module for refining the generated data, and d) an evaluation
module that assesses the synthetic time series data using the
Kolmogorov-Smirnov test (KS test), the Hilbert-Schmidt Independence
Criterion (HSIC), and other metrics. This comprehensive approach ensures that the
synthetic time series data maintains both the distribution and the
dynamic behavior of the original data.
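To illustrate why the first-order difference distribution matters, the following minimal Python sketch compares two series with a two-sample KS statistic, first on the raw values and then on the first-order differences. It is an illustration under stated assumptions, not the evaluation module of this study: the helper names (first_difference, ks_statistic) and the toy series are hypothetical, and only the KS statistic is computed, without a p-value.

```python
from bisect import bisect_right


def first_difference(series):
    """Return x[t+1] - x[t] for a 1-D sequence."""
    return [b - a for a, b in zip(series, series[1:])]


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    # The empirical CDFs only change at observed points, so checking those suffices.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Toy data (hypothetical): 'shuffled' has the same values as 'original'
# but in a different order, so its dynamics differ.
original = [0.0, 1.0, 3.0, 6.0, 10.0]
synthetic = [5.0, 6.0, 8.0, 11.0, 15.0]  # same increments, shifted level
shuffled = [0.0, 6.0, 1.0, 10.0, 3.0]

d_values = ks_statistic(original, shuffled)
d_same = ks_statistic(first_difference(original), first_difference(synthetic))
d_diff = ks_statistic(first_difference(original), first_difference(shuffled))

print(d_values, d_same, d_diff)
```

In this toy case the shuffled series is indistinguishable from the original on raw values, yet its first-order differences are clearly different, which is the kind of mismatch a test on marginal distributions alone cannot detect.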
We extensively discuss the role of the β factor in the modified
Metropolis-Hastings algorithm (in the sampling module), which controls
the level of information preservation from the original time series. Our
experiments reveal that with small β values, periodic information can
be retained effectively. The joint distribution of the first-order
difference of the synthetic time series data remains consistent when the
same β value is applied in the modified Metropolis-Hastings algorithm.
However, we observe that β has no impact on the partial autocorrelation
functions. Moreover, the data generated by the sampling module
retains the memoryless property of a Markov chain. Therefore, in the
smoothing module, we apply the exponential moving average (EMA) method
to approximate the long-term relationships within the original time series,
and find that the optimal α value is approximately 0.4 or 0.5. Lastly,
we employ the synthetic time series data to train a neural network model
developed in another work. Our findings indicate that the neural network
model trained on synthetic time series data exhibits performance
comparable to that of a model trained on the original data.
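The exponential moving average used in the smoothing module can be sketched in a few lines of Python. This is a minimal sketch that assumes the common recursive form s_t = α·x_t + (1 − α)·s_{t−1} with s_0 = x_0; the abstract does not state which EMA convention or initialization the study uses, and the function name ema and the toy input are hypothetical.

```python
def ema(series, alpha):
    """Exponential moving average with s_0 = x_0 (assumed initialization).

    alpha is the weight on the newest observation; smaller values
    give each past value a longer-lasting influence on the output.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


# Toy input (hypothetical): an alternating series, as a memoryless
# sampler might produce.
raw = [0.0, 1.0, 0.0, 1.0]
smooth = ema(raw, alpha=0.5)
print(smooth)
```

Because each smoothed value carries a geometrically decaying weight on all earlier observations, the output depends on more than the immediately preceding sample, which is the property a memoryless sequence lacks.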