Big data analytics for relative humidity time series forecasting based on the LSTM network and ELM

Indonesia lies in the tropics between 6° N and 11° S, which gives it a high air temperature and relatively high air humidity, conditions that are conducive to the breeding of bacteria, viruses, fungi, and parasites. However, in Indonesia, the air is clean, with air humidity contributing to Indonesia's exceptional air quality.


Introduction
Humans are inextricably connected to their surroundings. Thus, the environment is essential to human survival. To mitigate environmental deterioration, it is vital to maintain the biosphere's ecological and physical processes, which support life on earth. The global climate system is an essential component of this intricate support system. Multiple decades of climate change have had an adverse effect on atmospheric conditions, such as temperature, humidity, wind, and precipitation, and have also resulted in several health-related environmental challenges. Indonesia lies in the tropics between 6° N and 11° S, which gives it a high air temperature and relatively high air humidity, conditions that are conducive to the breeding of bacteria, viruses, fungi, and parasites. However, in Indonesia, the air is clean, with air humidity contributing to Indonesia's exceptional air quality.
Humidity refers to the amount of water vapor present in the atmosphere. Relative humidity and dewpoint temperature are commonly used to measure moisture in the air [1]. The relative humidity of the air can vary depending on the air temperature, where an increase in air temperature results in more vapor content, which causes the air humidity to increase. Low humidity levels can cause dehydration and increase airborne diseases such as influenza [2], [3] and SARS coronavirus [4], [5]. Furthermore, high humidity levels reduce the body's ability to cool itself [6], and humidity increases the spread of bacteria, fungi, and dust mites, which can harm respiratory health.
Scientists have studied global climate variability and environmental changes for decades. Large-scale, real-time datasets from multiple sources are growing rapidly. In addition, remote sensing and big data are being used to study past and current climate and environmental changes. With the advancements in soft computing technologies, big data analytics using machine learning [7], [8] has made significant contributions to the weather forecasting domain, improving the capability to deal with randomness and nonstationarity and to assess the associated impacts of these events.
Relative humidity forecasting continues to be a hot topic due to its impact on human health. Due to several decades of climate change, building satisfactory relative humidity prediction models is challenging. Li and Zha [9] forecasted relative humidity and temperatures during summer (June to August) in China using random forest regression models. Li et al. [10] compared Holt-Winters, SARIMA, and XGBoost to predict relative humidity based on data collected in China's greenhouses. Many researchers have become interested in neural networks (NNs) because of their application in time series prediction problems. Khatibi et al. [11] employed gene expression programming (GEP) and artificial neural networks (ANNs) to predict future relative humidity using noisy data. Their study revealed that ANN performs somewhat better than the GEP model with noisy data when the model structure incorporates both present and historical values, but there is little difference between the two. The low quality of the data may account for the few large discrepancies between the modeled and observed values. Feedforward ANN was also used by Sameer and Tamer [12] to predict relative humidity based on weather data in Malaysia, and Kaur et al. [13] employed ANN to study the maximum and minimum relative humidity in Chandigarh, India.
Due to the success of NN-based prediction systems, there has been increasing research and development into NN-based time series prediction models. The long short-term memory (LSTM) network introduced by Hochreiter and Schmidhuber [14] has demonstrated its exceptional ability to deal with long-term dependency. In many domains, including climate and weather forecasting, the performance of the LSTM network is often satisfactory in processing time series. Sharma et al. [15] employed a recurrent neural network (RNN) with LSTM to forecast sea surface temperature. Their study demonstrates that LSTM can be used to forecast future values with very low RMSE. This may be because the algorithm is unable to relate the newer predicted data to the older data. Increasing the dataset and screening for anomalies could improve prediction efficiency and accuracy. Kurnianingsih et al. [16] examined dengue and malaria occurrence predictions based on interannual global climate variations of the Indian Ocean Dipole (IOD) and El Niño-Southern Oscillation (ENSO) utilizing LSTM. The deep LSTM network accurately predicted dengue and malaria incidence during the observation. The prevalence of dengue and malaria varied greatly across eleven and sixteen provinces, respectively. Forecasting model fluctuations may be affected by climate variability. In addition, scarce annual dengue and malaria data may influence forecasts. Kreuzer et al. [17] proposed a new method to predict the local temperature in Germany based on the LSTM network. In general, the deep convLSTM network produces better results. However, adding more data is not always a good idea, as simpler models often outperform more complex ones in the first few hours on average. In changing weather patterns, when accurate temperature forecasts are the most crucial, the model's accuracy is weak. Other models, like SARIMA, work well during consistent weather conditions without temperature drops. Therefore, fusing models may improve performance.
More recently, Huang et al. [18], [19] proposed the Extreme Learning Machine (ELM) model, which has gained attention [20]-[22]. Designed for single hidden layer feedforward neural networks (SLFNs), ELM randomly selects hidden nodes and analytically determines the output weights. Many applications can quickly adopt ELM because it is fast to learn. Abdoos [23] utilized ELMs as efficient and fast regression tools for short-term wind power forecasting, based on data in Spain and the US. VMD decomposes wind power time series using advanced signal processing. GSO-based feature selection removes non-informative data to increase forecaster engine generalization and memory. In that study, ELM is used as a sophisticated regression core that links exemplar patterns to desired outputs. Liu et al. [24] use the VMD-SSA-LSTM-ELM model to better extract wind speed forecasting trend information. The VMD decomposes the wind speed data into sub-layers, while the SSA extracts trend information from each layer. The VMD-SSA low-frequency sub-layers are forecasted using the LSTM network with a single hidden layer, while the high-frequency sub-layers are forecasted using the ELM. However, this study shows that LSTM and ELM may perform better when combined with VMD and SSA.
The scientific literature shows that ELMs have been used with LSTM, VMD, and SSA to provide more accurate forecasts. However, the combination of LSTM-ELM and ELM-LSTM without coupling with other approaches has not been frequently utilized, especially for forecasting relative humidity time series. This study incorporates two machine learning approaches, an LSTM and an ELM, to further improve the forecasting performance for relative humidity by modeling both the deep patterns and the shallow features in the relative humidity time series data. LSTM-ELM and ELM-LSTM are compared to stand-alone LSTM and ELM to demonstrate the advantages of the proposed hybrid methods.
The remainder of this paper is structured as follows: Section 2 details the data collection, feature selection, LSTM, and ELM. Section 3 shows the efficiency of LSTM-ELM and ELM-LSTM in forecasting relative humidity, and the performance of each hybrid method is compared with stand-alone LSTM and ELM. Finally, in Section 4, we summarize our findings and make recommendations for future work.

Data Collection and Ingestion
This study utilizes the publicly available global climate and weather data from the European Centre for Medium-Range Weather Forecasts (ECMWF) Reanalysis 5 (ERA5) from 1979 to 2019, which can be accessed at https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-pressurelevels?tab=overview. ERA5 is the fifth generation ECMWF reanalysis of the global climate and weather. ERA5 provides hourly estimates of a large number of atmospheric, land, and oceanic climate variables [25]. The ERA5 parameters used in the present study are 10 m wind speed, 10 m U wind component, 10 m V wind component, 2 m temperature, 2 m dewpoint temperature, total precipitation, and relative humidity. These hourly data were then averaged into monthly data.
We obtained 10,335,059 records comprising eleven features, namely month, year, longitude, latitude, 10 m wind speed, 10 m U wind component, 10 m V wind component, 2 m temperature, 2 m dewpoint temperature, total precipitation, and relative humidity. We then ingested this large volume of data from a text file into HDFS as a Parquet file using Apache Sqoop and read it with Spark to perform the computation, as shown in Fig. 1.
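The hourly-to-monthly averaging step can be illustrated with a minimal sketch. It uses plain Python on synthetic values rather than the Spark pipeline used in the study; the function name `monthly_means` and the sample series are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def monthly_means(timestamps, values):
    """Average hourly values into one mean per (year, month)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, v in zip(timestamps, values):
        key = (ts.year, ts.month)
        sums[key] += v
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sorted(sums)}

# Synthetic hourly relative humidity (assumed values, not ERA5 data):
# January 2019 is held at 80 %, February 2019 at 70 %.
start = datetime(2019, 1, 1)
hours = [start + timedelta(hours=h) for h in range((31 + 28) * 24)]
rh = [80.0 if ts.month == 1 else 70.0 for ts in hours]

result = monthly_means(hours, rh)
print(result)
```

On the full dataset the same grouping would also be keyed by longitude and latitude, so that each grid point keeps its own monthly series.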

Proposed Architecture
The proposed method combines the LSTM layer and the ELM layer in two architectures, ELM-LSTM and LSTM-ELM, to perform multivariate regression tasks. ELM [18], [19] was proposed for SLFN architectures. ELM is a faster and more robust approach to training neural networks than deep neural networks (DNNs). LSTM [14] is a type of recurrent neural network (RNN) that aims to learn long-term dependencies to retain information for an extended period. LSTM utilizes previous time events to guide the next prediction. The ELM-LSTM architecture, as shown in Fig. 2a, consists of an ELM layer that uses many neurons (in this experiment, we used 50 and 100 neurons) with random initialization of LSTM weights, an LSTM layer with four nodes, and one linear output layer node. The LSTM-ELM architecture, as shown in Fig. 2b, consists of an LSTM layer with four nodes, an ELM layer of fully connected random initializations, and a linear output layer node.
In the ELM part of the proposed architecture, the input comes from the dataset for the ELM-LSTM architecture and from the LSTM layer for the LSTM-ELM architecture. The input weights are assigned at random and are never updated. The output weights are estimated by inverting the hidden output matrix [14]. For a dataset containing $N$ samples, the SLFN with $L$ hidden nodes and an activation function $g(\cdot)$ can be defined as follows:

$$f(x) = \sum_{i=1}^{L} \beta_i \, g(w_i \cdot x + b_i) \tag{1}$$

where $L$ is the number of hidden neurons, $n$ is the number of features in an instance $x = (x_1, \ldots, x_n)$, $g(\cdot)$ is an activation function, $w_i$ is the weight vector that connects the $i$-th hidden neuron to the input neurons, $x_j$ is the feature value on the $j$-th input neuron, $\beta_i$ is the weight from the $i$-th hidden neuron to the output neuron, and $b_i$ is the threshold of the $i$-th hidden neuron.
From Equation (1), the hidden layer output $h(x)$ and the final output $f(x)$ are given as:

$$h(x) = \left[\, g(w_1 \cdot x + b_1), \ldots, g(w_L \cdot x + b_L) \,\right], \qquad f(x) = h(x)\,\beta$$

The output weights $\beta$ are unknown, but the target values $T$ are known. So the hidden layer output $h(x)$ and the target values $T$ can be used to find the output weights in the linear system, which can be written as:

$$H\beta = T$$

With the samples denoted $x^{(1)}, \ldots, x^{(N)}$, the hidden layer output matrix $H$ and the target values $T$ can be written as:

$$H = \begin{bmatrix} h(x^{(1)}) \\ \vdots \\ h(x^{(N)}) \end{bmatrix} = \begin{bmatrix} g(w_1 \cdot x^{(1)} + b_1) & \cdots & g(w_L \cdot x^{(1)} + b_L) \\ \vdots & \ddots & \vdots \\ g(w_1 \cdot x^{(N)} + b_1) & \cdots & g(w_L \cdot x^{(N)} + b_L) \end{bmatrix}, \qquad T = \begin{bmatrix} t_1 \\ \vdots \\ t_N \end{bmatrix}$$

ELM theory shows that the minimum error between the predicted and target values $T$ occurs when the output weight vector $\beta$ is determined as follows:

$$\beta = H^{\dagger} T$$

where $H^{\dagger}$ is the Moore-Penrose inverse of the matrix $H$. The Moore-Penrose inverse is used to generalize the matrix inverse because most output matrices are non-square. The output from ELM in ELM-LSTM is used as input in the LSTM layer, and the output from ELM in LSTM-ELM is used as input in the dense layer for the output layer in the proposed architecture.
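The closed-form ELM training step described above (random, fixed input weights and output weights obtained from the Moore-Penrose inverse) can be sketched in NumPy. This is an illustrative sketch on synthetic data, not the authors' implementation: the tanh activation, the Gaussian weight initialization, and the names `elm_fit` and `elm_predict` are assumptions.

```python
import numpy as np

def elm_fit(X, T, n_hidden, rng):
    """Draw random hidden weights, then solve for the output weights."""
    W = rng.normal(size=(X.shape[1], n_hidden))  # input weights, never updated
    b = rng.normal(size=n_hidden)                # hidden thresholds, never updated
    H = np.tanh(X @ W + b)                       # hidden layer output matrix
    beta = np.linalg.pinv(H) @ T                 # least-squares output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Apply the fixed hidden layer, then the learned output weights."""
    return np.tanh(X @ W + b) @ beta

# Synthetic regression problem (assumed, for illustration only).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 3))
T = np.sin(X[:, 0]) + 0.5 * X[:, 1]

W, b, beta = elm_fit(X, T, n_hidden=50, rng=rng)
pred = elm_predict(X, W, b, beta)
train_mse = float(np.mean((pred - T) ** 2))
print(beta.shape, train_mse)
```

Because the only fitted quantity is `beta`, training reduces to a single pseudo-inverse, which is why ELM is fast compared with iterative gradient-based training.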
In the LSTM part of the proposed architecture, the input comes from the dataset for LSTM-ELM and from ELM for ELM-LSTM. An LSTM cell [14], as shown in Fig. 3, comprises three gates, namely the input gate ($i_t$), output gate ($o_t$), and forget gate ($f_t$).

Fig. 3. LSTM architecture [14]
The input gate is responsible for determining the input value to be used to update the memory state, and the forget gate is in charge of deciding which information from the cell state should be removed.

$$f_t = \sigma(W_{fx} x_t + W_{fh} h_{t-1} + b_f)$$

where $x_t$ is the input and $t$ indicates the data serial number, $W_{fx}$ is the weight between the input and the forget gate of LSTM, $h_{t-1}$ is the previous hidden state value, $W_{fh}$ is the weight between the previous hidden state and the forget gate, $b_f$ is the bias of the forget gate, and $\sigma$ is the sigmoid activation function [14].
The input gate $i_t$ determines the amount of information to be added to the cell state:

$$i_t = \sigma(W_{ix} x_t + W_{ih} h_{t-1} + b_i)$$

where $W_{ih}$ is the weight between the previous hidden state and the input gate, $W_{ix}$ is the weight between the sample input value and the input gate, $b_i$ is the bias of the input gate, and $\sigma$ is the sigmoid activation function.
The candidate values $\tilde{c}_t$ determine the new information to be added to the cell state.

$$\tilde{c}_t = \tanh(W_{cx} x_t + W_{ch} h_{t-1} + b_c)$$

where $W_{ch}$ is the weight between the previous hidden state and the current cell, $W_{cx}$ is the weight between the sample input value and the current cell, and $b_c$ is the bias of the current cell.
The internal cell state $c_t$ is updated both by adding the new candidate values $\tilde{c}_t$, scaled by the input gate, and by removing some information from the previous cell state $c_{t-1}$ through the forget gate.

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$

where $\odot$ denotes element-wise multiplication.
The output gate $o_t$ determines the amount of information to be given out from the updated cell state:

$$o_t = \sigma(W_{ox} x_t + W_{oh} h_{t-1} + b_o)$$

where $W_{oh}$ is the weight between the previous hidden state and the output gate, $W_{ox}$ is the weight between the sample input value and the output gate, and $b_o$ is the bias of the output gate.
The cell state is squashed with the tanh activation function before updating the hidden state $h_t$.

$$h_t = o_t \odot \tanh(c_t)$$

Finally, the output gate serves as a final limiter on the actual output of the cell. Output from LSTM in LSTM-ELM is used as input in the ELM layer, and output from LSTM in ELM-LSTM is used as input in the dense layer for the output layer in the proposed architecture.
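The gate computations described above can be traced in a small NumPy sketch of one LSTM cell step, following the standard formulation [14]. The parameter names mirror the weight descriptions in the text; the random initialization, the input size, and the function name `lstm_step` are assumptions for illustration, and the four hidden units only echo the four LSTM nodes mentioned earlier.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One forward step of an LSTM cell; p maps parameter names to arrays."""
    f_t = sigmoid(p["W_fx"] @ x_t + p["W_fh"] @ h_prev + p["b_f"])      # forget gate
    i_t = sigmoid(p["W_ix"] @ x_t + p["W_ih"] @ h_prev + p["b_i"])      # input gate
    c_tilde = np.tanh(p["W_cx"] @ x_t + p["W_ch"] @ h_prev + p["b_c"])  # candidate values
    c_t = f_t * c_prev + i_t * c_tilde                                  # new cell state
    o_t = sigmoid(p["W_ox"] @ x_t + p["W_oh"] @ h_prev + p["b_o"])      # output gate
    h_t = o_t * np.tanh(c_t)                                            # new hidden state
    return h_t, c_t

rng = np.random.default_rng(0)
n_in, n_hid = 4, 4  # four input features, four LSTM units (illustrative sizes)
params = {}
for gate in "fico":
    params[f"W_{gate}x"] = rng.normal(scale=0.1, size=(n_hid, n_in))
    params[f"W_{gate}h"] = rng.normal(scale=0.1, size=(n_hid, n_hid))
    params[f"b_{gate}"] = np.zeros(n_hid)

# Run the cell over a three-step sequence, e.g. three months of inputs.
h, c = np.zeros(n_hid), np.zeros(n_hid)
sequence = rng.normal(size=(3, n_in))
for x_t in sequence:
    h, c = lstm_step(x_t, h, c, params)
print(h)
```

The final `h` is what would be passed on to the next layer (the ELM layer in LSTM-ELM, or the dense output layer in ELM-LSTM).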
The preprocessed dataset is used as input for these two architectures. The mean squared error (MSE) loss function is used to calculate the loss value over $N$ data points.

$$\mathrm{MSE} = \frac{1}{N} \sum_{k=1}^{N} (y_k - \hat{y}_k)^2$$

where $y_k$ is the actual data as the target, and $\hat{y}_k$ is the data predicted by the model. The training process is carried out by updating the weights until the model produces the smallest possible loss value. The overall loss value from the training process is used to update the model weights using backpropagation. In the backpropagation process, the Adam optimizer [26] accelerates the change in model weights. The backpropagation process does not change the ELM weights; they are passed through unchanged at each learning step.
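The training behaviour described above (an MSE loss, Adam updates, and ELM weights that stay fixed) can be illustrated with a deliberately simplified NumPy sketch. It is not the authors' network: to keep the gradients short, only a linear output layer on top of a frozen random hidden layer is trained, whereas in the proposed architectures the LSTM weights are also updated. The learning rate, layer sizes, and synthetic data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(256, 3))
y = X[:, 0] - 0.5 * X[:, 1]          # synthetic target, for illustration only

# Frozen ELM-style hidden layer: drawn once, never touched by the optimizer.
W_elm = rng.normal(size=(3, 20))
b_elm = rng.normal(size=20)
W_elm_initial = W_elm.copy()
H = np.tanh(X @ W_elm + b_elm)

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

# Trainable linear output layer, updated with Adam.
w, b = np.zeros(20), 0.0
m, v = np.zeros(21), np.zeros(21)    # Adam moment estimates for [w, b]
lr, beta1, beta2, eps = 0.01, 0.9, 0.999, 1e-8

losses = []
for step in range(1, 501):
    pred = H @ w + b
    losses.append(mse(y, pred))
    err = pred - y
    # Gradient of the MSE with respect to w and b.
    grad = np.concatenate([2.0 * H.T @ err / len(y), [2.0 * err.mean()]])
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** step)  # bias-corrected first moment
    v_hat = v / (1 - beta2 ** step)  # bias-corrected second moment
    update = lr * m_hat / (np.sqrt(v_hat) + eps)
    w -= update[:20]
    b -= update[20]

print(losses[0], losses[-1])
```

The loss falls over the iterations while `W_elm` remains identical to its initial value, which is the property the text attributes to the ELM layer during backpropagation.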

Feature Selection
This study employs the recursive feature elimination (RFE) method for ranking features [27]. RFE is a feature selection wrapper that uses a filter-based feature selection technique. In each iteration, the important features are retained and irrelevant features are removed. Features are repeatedly eliminated until a particular threshold (the optimal number of required features) is attained. Recursion is necessary because, for certain metrics, the relative importance of each feature might change substantially when evaluated on different subsets of features during a stepwise elimination process (particularly for highly correlated features). The feature selection process itself consists only of retrieving the top $k$ features of this ranking.
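The elimination loop can be sketched as follows. This is a minimal illustration rather than the procedure used in the study: as a stand-in importance measure it uses the magnitude of least-squares coefficients on standardized features, and the function name `rfe_ranking`, the generic feature names, and the synthetic data are all hypothetical.

```python
import numpy as np

def rfe_ranking(X, y, names):
    """Rank features from most to least important by backward elimination."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize so coefficients are comparable
    yc = y - y.mean()
    remaining = list(range(X.shape[1]))
    eliminated = []
    while len(remaining) > 1:
        # Refit on the surviving features only, then drop the weakest one.
        coef, *_ = np.linalg.lstsq(Xs[:, remaining], yc, rcond=None)
        weakest = remaining[int(np.argmin(np.abs(coef)))]
        remaining.remove(weakest)
        eliminated.append(weakest)
    order = remaining + eliminated[::-1]        # last survivor first
    return [names[i] for i in order]

# Synthetic data in which the true influence decreases from x0 to x3.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + 0.1 * X[:, 2] + rng.normal(scale=0.01, size=500)
names = ["x0", "x1", "x2", "x3"]

ranking = rfe_ranking(X, y, names)
top_k = ranking[:2]                              # keep the top k = 2 features
print(ranking, top_k)
```

Refitting after every removal is the recursive part: the importance of the surviving features is re-evaluated on the reduced subset rather than read once from a single fit.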
Tree-based models calculate the significance of features to keep the best-performing features as close to the root of the tree as possible. Frequently, when designing a decision tree, it is necessary to determine the best predictive feature. In tree-based models, feature importance is calculated based on the Gini index. We examined the result of feature selection using RFE with random forest feature ranking [28]. Gini importance (or mean decrease impurity) is calculated from the random forest structure. The random forest comprises several decision trees. Each decision tree consists of an internal network of nodes and leaves. In the internal node, the selected features are used to make decisions on how to split the dataset into two distinct sets containing similar responses. The features for the internal nodes are selected using several criteria; in the case of regression, the criterion is variance reduction. The feature with the highest drop is selected for the internal node. For each feature, we can compile how much, on average, it reduces impurity. The average over all the trees in the forest indicates the significance of the feature [28].
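The variance-reduction criterion for a single regression split can be made concrete with a short sketch. The function name `variance_reduction` and the toy values are hypothetical; a random forest would evaluate many such candidate splits per node and average the resulting reductions per feature over all trees.

```python
import numpy as np

def variance_reduction(y, left_mask):
    """Parent variance minus the size-weighted variance of the two children."""
    y = np.asarray(y, dtype=float)
    left, right = y[left_mask], y[~left_mask]
    weighted_child_var = (len(left) * left.var() + len(right) * right.var()) / len(y)
    return y.var() - weighted_child_var

# Toy node with four samples (assumed values).
y = np.array([1.0, 1.0, 5.0, 5.0])
x = np.array([0.1, 0.2, 0.8, 0.9])

# Splitting on x < 0.5 separates the two response levels perfectly.
good = variance_reduction(y, x < 0.5)
# An uninformative split leaves both children as mixed as the parent.
poor = variance_reduction(y, np.array([True, False, True, False]))
print(good, poor)
```

The informative split removes all of the parent variance, while the uninformative one removes none, so the feature behind the first split would be preferred at this node.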

Performance Evaluation
The accuracy of the models' predictions is measured using mean absolute error (MAE), which is the mean of the absolute difference between the models' predictions and the actual values. In addition, we employed the root-mean-square error (RMSE) to measure the prediction errors of various models using the same dataset.
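Both metrics follow directly from their definitions, as the short sketch below shows. The function names and the sample values are hypothetical and are not results from the study.

```python
import math

def mae(y_true, y_pred):
    """Mean of the absolute differences between predictions and actual values."""
    return sum(abs(a - p) for a, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Square root of the mean squared difference."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true))

# Illustrative relative humidity values in percent (assumed).
actual = [80.0, 82.0, 78.0, 75.0]
predicted = [79.0, 84.0, 78.0, 71.0]

print(mae(actual, predicted))   # absolute errors 1, 2, 0, 4 -> 1.75
print(rmse(actual, predicted))  # sqrt((1 + 4 + 0 + 16) / 4) = sqrt(5.25)
```

RMSE penalizes the single 4-point miss more heavily than MAE does, which is why the two metrics are reported side by side.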

Results and Discussion
This section details the proposed hybrid models that are used to forecast relative humidity. Experiments are conducted on univariate and multivariate problems, and the performance of LSTM, ELM, ELM-LSTM, and LSTM-ELM is compared.

Feature Selection
Feature selection using RFE and random forest feature ranking is compared in Table 1. RFE shows that there are three features that most influence relative humidity, namely, dewpoint temperature (d2m_0001), temperature (t2m_001), and wind speed (ws10_0001), whereas random forest feature ranking shows that only two features most influence relative humidity, namely dewpoint temperature and temperature.

Experiment Results for the Univariate Problem
We conducted an experiment for the univariate problem to forecast next month's relative humidity based on the historical data of relative humidity over the past three months. In the first experiment, we employed a grid search to evaluate ELM-LSTM and LSTM-ELM for the univariate problem using four levels of the number of hidden neurons, 25, 50, 75, and 100, as presented in Table 2.
The results show that ELM-LSTM with a neuron count of 100 achieves the lowest MAE of 1.656789 and the lowest compute time of 0.656 seconds per epoch. We then employed a neuron count of 100 for the univariate problem for ELM, LSTM, ELM-LSTM, and LSTM-ELM, as presented in Table 3. Fig. 4 shows the testing data for the univariate problem, as well as the forecasting results for the most recent three months for each of the four models. Blue represents the real data, orange represents the training data, and green represents the predictions. The high peak of testing values in Fig. 4(a) demonstrates the inability of ELM to forecast future values.

Experiment Results for the Multivariate Problem
We conducted experiments for the multivariate problem to forecast next month's relative humidity based on the historical data of relative humidity and the three best features (dewpoint temperature, temperature, and wind speed) over the past three months. We employed a grid search to evaluate ELM-LSTM and LSTM-ELM for the multivariate problem using four levels of the number of hidden neurons, 25, 50, 75, and 100, as presented in Table 4. The results show that LSTM-ELM with a neuron count of 50 achieves the lowest MAE of 26.85129 and the lowest compute time of 1.415 seconds per epoch. We then employed a neuron count of 50 for the multivariate problem for ELM, LSTM, ELM-LSTM, and LSTM-ELM, as presented in Table 5.
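The "past three months in, next month out" setup can be illustrated with a sliding-window sketch. The function name `make_windows`, the column order, and the placeholder values are assumptions; the univariate problem is the special case with a single column.

```python
import numpy as np

def make_windows(data, target_col, n_lags=3):
    """Turn a (months, features) array into lagged inputs and next-month targets."""
    X = np.stack([data[t - n_lags:t] for t in range(n_lags, len(data))])
    y = data[n_lags:, target_col]
    return X, y

# Ten months of four variables; the values are placeholders, not ERA5 data.
# Assumed column order: relative humidity, dewpoint temperature, temperature, wind speed.
data = np.arange(40, dtype=float).reshape(10, 4)

X, y = make_windows(data, target_col=0)
print(X.shape, y.shape)  # (7, 3, 4) and (7,)
```

Each sample in `X` has the shape (time steps, features) that a recurrent layer expects, and `y` holds the relative humidity of the month immediately after each window.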

Discussion
Based on RFE as summarized in Table 1, the three best features that influence relative humidity are dewpoint temperature, temperature, and wind speed. Dewpoint temperature directly explains the relative humidity. The higher the dewpoint temperature, the more moisture the air contains, which corresponds to a high relative humidity. Humidity also increases as the temperature increases. Sun and Oort [29] studied the change in water vapor and temperature over a period of 26 years for both the lower and upper troposphere, and their results imply that specific humidity rises with temperature. Wind speed influences humidity through the evaporation process: evaporation increases as wind speed increases, and strong evaporation produces more water vapor, which increases humidity. Yu et al. [30] show the dominant role of wind force in the decadal change of evaporation and humidity through two possible mechanisms. The first is a direct mechanism, i.e., a stronger wind speed induces more evaporation. The second is an indirect mechanism that enhances the wind-driven subtropical gyre, which can amplify the air-sea humidity gradients.
Since wind speed determines humidity and the area of interest in this paper is the Indonesian region, the variability in humidity in this region depends on the season, which is influenced by the Asian-Australian monsoon (AAM) system. The complex topography of the islands in the Indonesian seas creates an AAM wind path. From December to February, the Asian monsoon is characterized by a northwesterly wind that blows from Asia to Australia, bringing humid air and causing the rainy season in most areas of Indonesia. Conversely, from June to August, the Australian monsoon blows from Australia to Asia and brings dry air [31]-[34]. Thus, the AAM system is the main factor regulating the Indonesian climate, including humidity. Fig. 6 shows that LSTM-ELM outperforms ELM-LSTM, as measured by the change in the predicted value over the next 60 months.

Fig. 6. A sample of relative humidity forecasting for 60 months
We conducted a grid search to tune the hyperparameters for ELM, LSTM, ELM-LSTM, and LSTM-ELM. The results show that ELM-LSTM achieves the best performance for the univariate problem, and LSTM-ELM achieves the best performance for the multivariate problem. As shown in Table 2, the hybrid models LSTM-ELM and ELM-LSTM achieved lower MAE and RMSE than the standalone LSTM or ELM, and ELM-LSTM achieves the lowest MAE and RMSE for all the grid experiments with varying neuron levels. ELM has the fastest computation time compared to the other approaches; however, it has a higher MAE and RMSE than LSTM. As shown in Table 4, the ELM-LSTM for the multivariate problem had MAE and RMSE values that were 0.01 larger than those of LSTM. However, ELM-LSTM and LSTM-ELM have a lower computation time per epoch compared to LSTM. Similar to the univariate problem, ELM has the fastest computation time compared to the others. We conclude that ELM-LSTM and LSTM-ELM provide good forecast results for relative humidity for the univariate problem and multivariate problem, respectively. LSTM-ELM is capable of providing future forecasts based on prior patterns of relative humidity change. Based on the ELM-LSTM model predictions, the humidity value does not fluctuate over time; hence, ELM-LSTM tends to offer constant forecasts, which differs from the real-world historical data.

Conclusion
This study proposes the hybrid models LSTM-ELM and ELM-LSTM to predict future relative humidity. The proposed approaches are assessed on big climate datasets taken from ERA5 over a period of 40 years. The LSTM and ELM are trained on a set of historical datasets, aiming to capture patterns from the previous three months and then utilizing these to forecast relative humidity one month in the future (t+1). Given that the forecasts are taken monthly, the ELM-LSTM and LSTM-ELM approaches provide forecasts for one month in the future. The results show that ELM-LSTM and LSTM-ELM had the lowest MAE and RMSE values for the univariate problem compared to LSTM and ELM, and that both hybrid models had a lower computation time than LSTM. Future studies will focus on multi-month relative humidity predictions with an alert system and will attempt to improve the forecasting quality using different time series models.

Fig. 5. Comparison of the results of the different approaches for the multivariate problem: actual data (blue), training data (orange), and forecast data (green).

Table 1. Feature Selection Result Comparison

Table 2. Grid Search Results for the Univariate Problem

Table 3. Comparison of the Results of LSTM, ELM, ELM-LSTM and LSTM-ELM for the Univariate Problem

Table 4. Grid Search Results for the Multivariate Problem

Table 5. Comparison of the Results of LSTM, ELM, ELM-LSTM and LSTM-ELM for the Multivariate Problem