Multi-step CNN forecasting for COVID-19 multivariate time-series

The new coronavirus (COVID-19) has spread to over 200 countries, with over 36 million confirmed cases as of October 10, 2020. As a result, numerous machine learning models capable of forecasting the epidemic worldwide have been produced. This paper reviews and summarizes the most relevant machine learning forecasting models for COVID-19. The dataset is derived from the World Health Organization (WHO) COVID-19 dashboard and contains official daily counts of COVID-19 cases, fatalities, and vaccination use reported by countries, territories, and regions. We propose several convolutional neural network (CNN) based models: CNN, single exponential smoothing CNN (S-CNN), moving average CNN (MA-CNN), smoothed moving average CNN (SMA-CNN), and moving average smoothed CNN (MAS-CNN). The proposed models are assessed using the mean absolute percentage error (MAPE) and the mean square error (MSE). MAPE is frequently used to compare accuracy across time series with different scales. To minimize MSE, the model must strive for a total forecast equal to the total demand; that is, optimizing MSE seeks a forecast that is right on average and therefore unbiased. The final results show that SMA-CNN outperformed its baselines in both MAPE and MSE. The main contribution of this forecasting approach is a more accurate result that can serve as a basis for strategies to prevent the spread of COVID-19.

Deep learning approaches have gained popularity in time-series modeling and analysis due to their generalization and nonlinear approximation capabilities [19]. Deep learning models are created by automatically combining neural network layers and extracting significant information from vast data [20]. This approach has been studied in several machine-learning applications. For example, an LSTM model was used to predict new COVID-19 cases in Canada from January to March 2020 [21]. In Russia, Peru, and Iran, an enhanced LSTM model was used to predict COVID-19 epidemic patterns [22]. Moreover, SVR, LSTM, BiLSTM, and GRU were used to forecast COVID-19 time-series data in 10 nations, and the results, based on data available through June 27, 2020, show that BiLSTM achieves higher performance [23].
Due to its performance, deep learning has been effectively applied to various real-world prediction challenges, including time-series forecasting. Deep learning models make accurate forecasts despite the noisy and chaotic character of time-series data. CNN is a popular, efficient deep learning approach [24]. CNN models can filter input data noise and extract more valuable characteristics for the final prediction model [25]. Standard CNNs are "feed-forward neural networks" that use filters and pooling layers; they are well suited to spatially autocorrelated data but not to complex, long-range temporal relationships. Therefore, removing noisy samples improves temporal data representation and forecasting system accuracy by highlighting relevant patterns. Smoothing strategies help track data seasonality and improve deep learning performance [26]. Due to their simplicity and solid performance in time-series forecasting, most strategies use moving averages [27] and seasonal exponential smoothing [28]. Smoothing improves interpretability and integrates the changing pattern of the data series into the prediction model. A CNN model then automatically generates forecasts from the smoothed results.
The remaining parts of this work are structured as follows. The section titled "CNN-based Forecasting Using Smoothing Approach" provides a detailed explanation of the smoothing technique used for the CNN time series and of the experimental design. The dataset, the data normalization procedure, the forecasting process, and the key performance indicators are presented in the "Materials and Methods" section. The "Results and Discussion" section presents the findings, an in-depth examination of the experiments, and a discussion of the numerical results. The last section, conclusions, summarizes the overall findings and potential areas for further research.

Method
To carry out the research methodically, we planned the experiment as depicted in Fig. 1. We used several datasets to evaluate the effectiveness of the smoothed CNN variants compared to the basic CNN. As Fig. 1 shows, the experimental design included five scenarios: (1) data is processed directly with CNN; (2) data is smoothed using single exponential smoothing and then processed with CNN (S-CNN); (3) data is smoothed using MA and then processed with CNN (MA-CNN); (4) data is smoothed using single exponential smoothing first, then MA, and then processed with CNN (SMA-CNN); and (5) data is smoothed using MA first, then single exponential smoothing, and then processed with CNN (MAS-CNN).
Data quality can be improved by data smoothing [29]. When applied to time-series data, smoothing gives excellent results after removing any outliers that may be present in the data [30]. The method is easy to understand and can be applied in a new study without referring to or taking parameters from previous investigations. By averaging previous values in a time series, smoothing makes predictions more accurate. The algorithm assigns weights to past observations to reduce noise, smooth fluctuations in the data, and anticipate future values. There are various common types of data smoothing; single exponential smoothing (S) [31] and moving average (MA) [32] are two of the most frequently used. Smoothing can assist researchers in identifying trends in a forecasting task.
Single exponential smoothing (S) is a rule-of-thumb approach to smoothing time-series data that uses the exponential window function [33]. Exponential functions are used to apply weights that diminish at an exponential rate over time. The method is easy to understand and apply when making judgments based on the user's prior assumptions, such as seasonality, and it does not require much time. The moving average (MA) is a formula used to examine data points [34]. It creates a series of averages of different subsets of the complete dataset and then connects those averages into a line.

Fig. 1. Experimental Design
Single exponential smoothing by itself is not sufficient: it is not appropriate for forecasting seasonal and long-term data, and the accuracy obtained is still inadequate. This drawback motivates the hybridization of smoothing techniques in this research, either single exponential smoothing followed by moving average (SMA) or, vice versa, moving average followed by single exponential smoothing (MAS).

Dataset
This research used five COVID-19 datasets, containing data from the five countries with the highest number of cases worldwide. The datasets were obtained from the publicly available WHO website and can be viewed and downloaded at https://covid19.who.int/WHO-COVID-19-global-data.csv. Information about the five countries is given in Table 1. The study period runs from January 3, 2020, to August 1, 2022. The new deaths attribute stores the number of deaths that occur each day, while the cumulative deaths attribute stores the total number of deaths that have occurred. The new cases attribute stores the number of new cases that occur each day, while the cumulative cases attribute stores the total number of cases. The data used in this research are illustrated in Fig. 2.

Data Normalization
Data normalization, an essential step for CNN, scales each feature into the particular range required by the activation function [35]. Since one of the primary goals of data normalization is to ensure the quality of the data before it is given to any model, its impact on model performance is significant. The Min-Max normalization method was used in this research. Even though it is ineffective in dealing with outliers, the technique ensures that all features have the same scale. The Min-Max formula is shown in (1), which produces normalized data that fall inside the range 0-1 [36].
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \qquad (1)$$
where $x'$ is the outcome of normalizing the data, $x$ is the data that has to be normalized, $x_{\min}$ is the minimum value of all the data, and $x_{\max}$ is the maximum value of all the data.
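As an illustration, Min-Max normalization can be sketched in a few lines of Python. This is a minimal sketch rather than the implementation used in the study: the function names, the sample values, and the handling of a constant series are assumptions made here for the example. The inverse mapping is included because forecasts made on the normalized scale usually have to be mapped back to case counts.

```python
def min_max_normalize(values):
    """Scale a sequence of numbers into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant series maps to zeros to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def min_max_denormalize(scaled, lo, hi):
    """Map normalized values back to the original scale."""
    return [s * (hi - lo) + lo for s in scaled]


# Hypothetical daily counts, used only to exercise the functions.
daily_cases = [120, 80, 200, 160, 80]
scaled = min_max_normalize(daily_cases)
restored = min_max_denormalize(scaled, min(daily_cases), max(daily_cases))
print(scaled)
```

The minimum and maximum should be kept alongside the scaled series, since they are needed to undo the scaling.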

Forecasting Process
In this study, the dataset used for forecasting with CNN is first smoothed using single exponential smoothing and moving averages. The equation for single exponential smoothing is given in (2).
$$s_t = \alpha x_t + (1 - \alpha)\, s_{t-1} \qquad (2)$$
The smoothed data $s_t$ is the result of smoothing the raw data $\{x_t\}$. The smoothing factor $\alpha$ is a variable that specifies the smoothing level [37]. The interval for $\alpha$ is between 0 and 1 ($0 \le \alpha \le 1$) [38]. When $\alpha$ is close to 1, the learning process is accelerated because the smoothing effect is diminished. In contrast, values of $\alpha$ closer to 0 have a greater smoothing effect and are less sensitive to changes in the recent past [39]. Not all cases have the same value for $\alpha$. Therefore, we determine the optimal smoothing factor based on the dataset's properties [40]. The optimal alpha for single exponential smoothing is derived from (3), so there is no need to try each $\alpha$ value from 0 to 1 manually.
Substituting (3) into (2) yields the optimum single exponential smoothing in (4), which is used to improve the performance of the CNN algorithm.
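The effect of the smoothing factor can be seen in a short Python sketch of the standard single exponential smoothing recurrence, in which each smoothed value is alpha times the current observation plus (1 - alpha) times the previous smoothed value. The function name, the initialization of the first smoothed value to the first observation, and the sample series are assumptions for illustration. The smoothing factor is passed in directly, because the optimal-alpha rule of (3) is not reproduced here.

```python
def single_exponential_smoothing(series, alpha):
    """Apply s_t = alpha * x_t + (1 - alpha) * s_{t-1}.

    Assumption: the first smoothed value equals the first observation.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if not series:
        return []
    smoothed = [float(series[0])]
    for x in series[1:]:
        smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical oscillating series, used only to compare two alpha values.
raw = [10.0, 20.0, 10.0, 20.0]
light = single_exponential_smoothing(raw, alpha=0.9)  # follows the raw data closely
heavy = single_exponential_smoothing(raw, alpha=0.1)  # damps the oscillation strongly
print(light)
print(heavy)
```

With alpha equal to 1 the output reproduces the raw series, and with alpha equal to 0 it stays at the first observation, which matches the behaviour described above for values near the two ends of the interval.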
Meanwhile, smoothing using the moving average (MA) considers all of the data and uses a relatively long backward period. Past data are never left out of the computation, although their weight in the final result is relatively small due to the nature of the moving average. Because it reduces noise, MA can reveal ongoing trends while eliminating fluctuations. In this study, the MA window is one month (thirty days). The MA is defined in (5).
$$MA_t = \frac{1}{n} \sum_{i=0}^{n-1} x_{t-i} \qquad (5)$$
where $MA_t$ is the outcome of smoothing the data using MA, $x_t$ is each data point, and $n$ is the number of periods.
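The moving average and the two hybrid orderings from the experimental design (SMA: single exponential smoothing first, then MA; MAS: MA first, then single exponential smoothing) can be sketched as follows. This is an illustrative sketch, not the code used in the study. The trailing window, the averaging over fewer points at the start of the series, the helper names, and the sample values are all assumptions, and the smoothing factor is supplied by hand rather than taken from the optimal-alpha rule.

```python
def moving_average(series, window):
    """Trailing moving average over the last `window` points.

    Assumption: early points average over the values available so far.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    out, running = [], 0.0
    for i, x in enumerate(series):
        running += x
        if i >= window:
            running -= series[i - window]  # drop the value leaving the window
        out.append(running / min(i + 1, window))
    return out


def exp_smooth(series, alpha):
    """Single exponential smoothing, initialized with the first observation."""
    smoothed = [float(series[0])]
    for x in series[1:]:
        smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed


def sma(series, alpha, window=30):
    """SMA scenario: exponential smoothing first, then moving average."""
    return moving_average(exp_smooth(series, alpha), window)


def mas(series, alpha, window=30):
    """MAS scenario: moving average first, then exponential smoothing."""
    return exp_smooth(moving_average(series, window), alpha)


# Hypothetical values; a short window is used so the effect is visible.
daily = [5.0, 9.0, 4.0, 12.0, 7.0, 10.0]
sma_out = sma(daily, alpha=0.5, window=3)
mas_out = mas(daily, alpha=0.5, window=3)
print(sma_out)
print(mas_out)
```

In each scenario the smoothed series, rather than the raw one, is what would then be passed to the CNN.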
The CNN algorithm is the primary focus of this research. CNN builds on the fundamental neural network (NN) algorithm with additional layers. Because of its effectiveness, CNN has garnered much attention in computer vision and image processing. CNN uses a convolution layer that can process the spatial information in images, while fully connected layers are equipped with a memory that allows them to store information from time-series data. The only thing that differentiates computer vision problems from time-series problems is the input given to the model: an image matrix for computer vision and a 1D array for time-series forecasting. The raw input data can be treated as a one-dimensional observation sequence, which the CNN model can then read and filter. Therefore, the approach can be applied to time-series analysis. The CNN architecture is shown in Fig. 3.
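To feed a one-dimensional series to a CNN for multi-step forecasting, the series is commonly cut into pairs of an input window and the following target values. The sketch below shows one way to do this. The function name and the window lengths are hypothetical, since the lengths used in the study are not stated here.

```python
def make_windows(series, n_input, n_output):
    """Split a 1D series into (input window, multi-step target) pairs."""
    inputs, targets = [], []
    last_start = len(series) - n_input - n_output
    for start in range(last_start + 1):
        split = start + n_input
        inputs.append(series[start:split])
        targets.append(series[split:split + n_output])
    return inputs, targets


# Hypothetical series: 3 past values are used to predict the next 2.
series = [1, 2, 3, 4, 5, 6, 7]
inputs, targets = make_windows(series, n_input=3, n_output=2)
for window, target in zip(inputs, targets):
    print(window, "->", target)
```

Each input window plays the role that an image matrix plays in computer vision, only with a single spatial dimension.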

Fig. 3. CNN architecture
CNN's architecture comprises the following layers: an input layer, a convolutional layer, a pooling layer, a flattening layer, a fully connected layer, and an output layer. Convolutional and pooling layers are designed to filter input data and extract valuable information for the fully connected layer. Convolutional layers combine raw input data with kernels to create new feature values. This technique was originally designed to extract image features from structured matrices. The convolution kernel (filter) is a narrow window containing coefficient values in matrix form. Sliding the kernel over the input produces a convolved matrix representing a feature value specified by the filter coefficients and dimension size. By applying different convolution kernels to the input data, additional convolved features can be formed, which are usually more helpful than the original features and boost the model's performance.
A nonlinear activation function follows convolutional layers. Two typical activation functions are the sigmoid function and the rectified linear unit (ReLU). Both can be stated using (6) and (7) [41]. A pooling layer subsamples convolved features to create a lower-dimensional matrix. As with the convolutional layer, the pooling layer uses a small sliding window to take the values of each patch of convolved features and output one new value. Maximum and average pooling calculate each patch's maximum and average values. The pooling layer creates additional matrices that summarize the convolutional layer's features. Small input changes will not affect pooled output values, making the system more robust.
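The convolution, activation, and pooling steps described above can be traced on a small one-dimensional example. The sketch below assumes the usual definitions of the two activation functions, sigmoid(x) = 1 / (1 + e^(-x)) and ReLU(x) = max(0, x), which are expected to correspond to (6) and (7). The kernel coefficients, the input values, and the choice of non-overlapping pooling are assumptions for illustration; in a trained CNN the kernel would be learned.

```python
import math


def conv1d(signal, kernel):
    """Slide the kernel over the signal without padding.

    As in most CNN libraries, the kernel is not flipped (cross-correlation).
    """
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def relu(x):
    """Rectified linear unit."""
    return max(0.0, x)


def sigmoid(x):
    """Logistic sigmoid; adequate for moderate input magnitudes."""
    return 1.0 / (1.0 + math.exp(-x))


def max_pool1d(features, size):
    """Non-overlapping max pooling; a trailing partial patch is dropped."""
    return [max(features[i:i + size])
            for i in range(0, len(features) - size + 1, size)]


signal = [1.0, 3.0, 2.0, 5.0, 4.0, 2.0]  # hypothetical input window
kernel = [-1.0, 0.0, 1.0]                # hypothetical filter coefficients
convolved = conv1d(signal, kernel)
activated = [relu(v) for v in convolved]
pooled = max_pool1d(activated, size=2)
print(convolved, activated, pooled)
```

Changing a single input value slightly often leaves the pooled output unchanged, which is the robustness property mentioned in the text.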
CNN parameters can be adjusted in several different ways, depending on the application. Research [40] provides the basis for the CNN parameters used in this investigation. In this work, we modified the parameter settings in the fully connected layer by tuning its hyperparameters with particle swarm optimization (PSO) [42], so that the configuration is not identical to the earlier one. The main reason is that the fully connected layer of the CNN reflects a more comprehensive set of features than the convolution layer. Each neuron in a fully connected layer is connected to all of the neurons in the preceding layer [43]. The CNN forecast component parameters are listed in Table 2.

Table 2. CNN forecast component parameters
Parameter                        Value
Activation function (output)     ReLU
Type of optimizer                Adam
Number of epochs                 100
Batch size                       64

Key Performance Indicator
All experiments in this research were evaluated using two key performance indicators: mean absolute percentage error (MAPE) and mean square error (MSE). MAPE expresses errors in a form that indicates accuracy [27]. MSE is a metric that can be used to detect outliers in a prediction system [44]. Lower MAPE and MSE values indicate more accurate forecasts. The equations of MAPE and MSE can be seen in (8) and (9) [45].
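Both indicators are simple to compute. The sketch below uses the standard definitions, MAPE as the mean of |actual - forecast| / |actual| expressed in percent and MSE as the mean of the squared differences. These are assumed to match (8) and (9), and the percentage scaling in particular is an assumption. The function names and sample values are hypothetical.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        # MAPE is undefined when an actual value is zero.
        raise ValueError("actual values must be non-zero")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


def mse(actual, forecast):
    """Mean square error."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical actual and forecast values.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]
mape_value = mape(actual, forecast)
mse_value = mse(actual, forecast)
print(mape_value, mse_value)
```

Because MAPE divides by the actual value, days with zero reported cases or deaths need separate handling, whereas MSE depends on the scale of the data and is therefore affected by normalization.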

Results and Discussion
All data were smoothed according to the experimental design of this research, and CNN was then applied to the smoothed data. The forecasting evaluation results, in the form of MAPE and MSE, can be viewed in Table 3 and Table 4. The MAPE for the new case forecasts is presented in Table 3. According to the table, the performance of CNN can be enhanced by utilizing exponential smoothing (S), moving average (MA), and their combination: the treatment lowers the MAPE. SMA-CNN achieves the best result, 9.03067, on Dataset 3, while the worst result, 10.90335, is obtained by CNN on Dataset 1. As a result, SMA-CNN is the most effective algorithm for forecasting across all of the datasets included in this research.
There is a connection between sensitivity and the application of MSE in performance testing. According to Table 4, the MSE is reduced when the utilized algorithm is more complicated. In other words, SMA-CNN has a higher sensitivity than any other CNN-based algorithm tested in this study. Because the MSE is not very high, it can be deduced that the algorithm under consideration can recognize anomalies within the datasets used for forecasting.
The results shown in Table 5 are comparable to those in Table 3. The lowest MAPE is 9.02606, while the greatest is 10.97463 (Dataset 5 using CNN). In every dataset, the performance of SMA-CNN was superior to that of CNN, S-CNN, MA-CNN, and MAS-CNN. The values reflect the consistency of the forecast accuracy, although in some cases (Datasets 1, 2, and 5), MA-CNN performs somewhat better than MAS-CNN.
International Journal of Advances in Intelligent Informatics, ISSN 2442-6571, Vol. 9, No. 2, July 2023, pp. 176-186
The MSE of new deaths is shown in Table 6, which has a similar pattern to Table 4. According to the table, SMA-CNN has the smallest MSE, with 0.00174 being the best result (Dataset 4). This score, which is very near zero, indicates that all CNN variations can make accurate predictions. The MSE for SMA-CNN is the lowest across all the datasets.
S-CNN is more accurate than CNN in most comparisons as a result of applying the optimal alpha [40]. The optimal alpha drives the smoothing process to its optimal state, yielding a result that is both fast and accurate. In addition, PSO hyperparameter tuning can produce a well-tuned model, thereby reducing errors on the dataset [42]. MA-CNN may also improve results because the moving average focuses on values within a given period, which can make the data more stable. The hybrid smoothing technique SMA, which combines exponential smoothing with the moving average, has been shown to make time-series data that are prone to high volatility more stable.
On the other hand, MAS-CNN performed worse than MA-CNN in several tests. Applying exponential smoothing after the moving average forces the series toward a linear form, and because of this, the absolute and mean square errors may increase. Based on the MAPE and MSE results, SMA-CNN is considered the most practical combination of the moving average and exponential smoothing. Further research could use other forms of exponential smoothing, such as double exponential smoothing and triple exponential smoothing.

Conclusion
This research aims to improve the efficiency of CNN, an algorithm frequently used for image processing, by applying a smoothing strategy to its time-series analysis. Based on the findings, one may conclude that the SMA-CNN model with the optimal smoothing factor performs significantly better than the other smoothing-based CNN forecasting techniques. The SMA-CNN model used in this investigation yields the best evaluation results. Combining moving averages with single exponential smoothing is therefore retained as a data preparation strategy, since it considerably improves the effectiveness of the forecasting algorithm. Although the results of this research have addressed the study's objectives, there are still some limitations. This study focuses primarily on optimized smoothing techniques applied to CNN methods. Future research will therefore examine how applying this strategy to other deep learning algorithms (such as LSTM, DBN, and RBF) affects the results. We will also conduct a more in-depth study of smoothing methods based on double or triple exponential smoothing, in which the optimal beta and optimal gamma will be taken into consideration.