Comparison of ARIMA and RBFNN for short-term forecasting

I. Introduction
Time series forecasting methods, which are quantitative approaches that use past data as the basis for prediction, continue to evolve [1]. Mathematically based techniques are among the oldest forecasting models (e.g., autoregressive (AR), moving average (MA), exponential smoothing (ES), and autoregressive integrated moving average (ARIMA)), and many researchers have used them. ARIMA models have been proposed to predict network traffic in ICT at Mulawarman University in East Kalimantan for the period of June 20-24, 2013 [2]. In the economics area, ARIMA models have been used to estimate Malaysia Crude Oil Production (MCOP) from January 2005 to May 2010 [3]. In the hydrologic area, ARIMA models have been proposed for forecasting the monthly inflow of the Dez dam reservoir from 1960 to 2007; the statistics for the first 42 years were used to train the models and the last 5 years were used for forecasting [4]. All of these studies reported that ARIMA yields good results and accuracy. Although mathematical models have proved reasonably powerful, they still face obstacles, especially when applied to non-linear data.
For that reason, many researchers have also applied artificial neural networks (ANNs), such as the backpropagation neural network (BPNN), radial basis function neural network (RBFNN), and recurrent neural network (RNN), to improve prediction accuracy on non-linear data. ANN-based approaches have been proposed to predict network traffic using BPNN [5] and to predict students' achievement using RBFNN [6]. In the economics area, ANN models have been used for stock market prediction [7,8]. In the hydrologic area, ANN models have been proposed to predict the weather, wind speed, and rainfall [9,10].
However, an important issue with ANNs is training, that is, finding a set of optimal network parameters. Training exposes the known drawbacks of ANNs (i.e., overfitting, local minima, and slow convergence). Hybrid models that combine ANNs with mathematical models or with other ANN-related techniques are one way to improve ANN performance. Recently, numerous researchers have explored model combinations as an alternative in the prediction area, including ARIMA with RBFNN, ARIMA with BPNN, and BPNN and RBFNN with a genetic algorithm (GA) or particle swarm optimization (PSO), to provide better prediction performance [1,7,8,11,12]. This paper therefore develops and compares two models, ARIMA and RBFNN, for predicting the tourist quantity to Indonesia. Section II describes the architectures of the ARIMA and RBFNN models. Section III explains the time series predictor and models. Section IV presents the analysis and discussion of the results. Finally, conclusions are summarized in Section V.

II. Methodology
This section briefly describes the general models used for predicting tourist quantity, including time series models, ARIMA, and RBFNN.

A. Time Series
A time series is a sequence of observations ordered in time, and many methods are used to forecast time series data. In principle, a time series model is used to predict the values (y_{t+1}, y_{t+2}, ..., y_{t+n}) based on the data (x_{t+1}, x_{t+2}, ..., x_{t+n}). In this experiment, tourist quantity data for 1974-2013 (40 years of samples) were obtained from the BPS website, http://www.bps.go.id (Table 1 and Fig. 1). The data were then analyzed using MATLAB R2013b with the ARIMA and RBFNN models.

B. ARIMA
ARIMA is one of the best-known methods for forecasting time series data. It is designed by integrating the AR (autoregressive) and MA (moving average) methods. ARIMA(p, d, q) is a general method formulated for stationary data series only, where p is the order of the AR part, d is the number of times the series is differenced to make it stationary, and q is the order of the MA part. According to the Box-Jenkins methodology [13], there are four forecasting stages:
(1) Model identification: the data series is carefully examined to determine whether it contains a trend, seasonality, cycles, or random phenomena. The sample ACF and PACF of the original series are then computed and examined to further confirm whether the time series is stationary. If the sample ACF decays very slowly, differencing is needed.
(2) Parameter estimation: the model parameters are estimated and validated to ensure that the right model is used. In this study, this is done using the t-statistic and p-value.
(3) Model checking: the proposed model must be hypothesized and pass a diagnostic test before it can be used for forecasting. In this test, we checked that the p-value > α = 0.05.
(4) Forecasting: the forecast values are reported with upper and lower confidence limits that give a 95% confidence interval.
In this study, we used a trial-and-error method to obtain a good model and prediction.
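
The identification stage can be illustrated with a short sketch. The Python code below (standard library only, used here for illustration; the study itself was carried out in MATLAB) computes the sample ACF of a series before and after first-order differencing. The function names and the synthetic trending series are assumptions made for this example and are not the tourist data.

```python
def difference(series, d=1):
    """Apply d rounds of first-order differencing: y[t] - y[t-1]."""
    out = list(series)
    for _ in range(d):
        out = [out[i] - out[i - 1] for i in range(1, len(out))]
    return out


def sample_acf(series, max_lag):
    """Return the sample autocorrelations r_0 .. r_max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(v * v for v in dev)  # lag-0 sum of squared deviations
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]


# Hypothetical series: a linear trend plus an alternating component.
series = [2 * t + (1 if t % 2 else -1) for t in range(20)]

acf_raw = sample_acf(series, 3)    # stays high: the trend dominates
diffed = difference(series, d=1)   # d = 1 removes the linear trend
acf_diff = sample_acf(diffed, 3)   # no longer slowly decaying

print([round(r, 3) for r in acf_raw])
print([round(r, 3) for r in acf_diff])
```

For the raw series the autocorrelations remain large and positive at low lags, which is the slow decay that signals non-stationarity; after one round of differencing that pattern disappears, which suggests d = 1 for this synthetic example.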

C. RBFNN
The RBFNN, which emerged as a variant of ANN in the late 1980s, is a kind of feed-forward neural network (FFNN). It has a three-layer structure: an input layer, a single hidden layer of RBF neurons (which compute the Euclidean distance between the input signal vector and the parameter vector of the network), and an output layer of linear neurons. RBFNN training combines unsupervised and supervised learning in two stages: the first, a self-organizing learning stage, determines the centers and variances of the hidden-layer basis functions; the second, a supervised learning stage, determines the weights between the hidden layer and the output layer [11,12]. In this study, we used three layers and the Euclidean function as the activation function, Eq. (1). The mean square error (MSE) was then used to compare the predicted output with the desired output for ARIMA and RBFNN. The architecture of the RBFNN is shown in Fig. 2. In Eq. (1), Y is the output value, φ is the hidden-layer value, and W denotes the weights (0-1).
The RBFNN algorithm for analyzing the characteristics of time series data is:
1. Initialize the network by randomly selecting some training and testing samples as the vectors P(t-0) = [p(t-5), p(t-4), ..., p(t-n)], where n refers to the data series.
2. Find D_ij, the distance between i and j, for i, j = 1, 2, …, Q, where Q is the number of input-output vectors and R is the number of input variables.
3. Find a1, the activation obtained by multiplying the distance data by the bias; the spread is constant.
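
Steps 2 and 3, followed by the linear output layer, can be sketched as follows. This Python example (standard library only, for illustration) assumes the conventions of MATLAB's radial basis layer, where the activation is exp(-n^2) and the bias is sqrt(-ln 0.5)/spread, approximately 0.8326/spread, so that the activation equals 0.5 at a distance of one spread from the center. The paper does not state these details, and the centers, weights, and output bias below are hypothetical values.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors (step 2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def rbf_hidden(x, centers, spread):
    """Hidden-layer activations a1 = exp(-(distance * bias)^2) (step 3).

    Assumption: bias = sqrt(-ln 0.5) / spread, the convention used by
    MATLAB's radial basis layer; the paper does not give this formula.
    """
    bias = math.sqrt(-math.log(0.5)) / spread
    return [math.exp(-((euclidean(x, c) * bias) ** 2)) for c in centers]


def rbf_output(x, centers, weights, out_bias, spread):
    """Linear output layer: weighted sum of activations plus a bias."""
    phi = rbf_hidden(x, centers, spread)
    return sum(w * p for w, p in zip(weights, phi)) + out_bias


# Hypothetical one-dimensional network with two hidden neurons.
centers = [[0.0], [2.0]]
weights = [0.3, 0.4]
activations = rbf_hidden([0.0], centers, spread=2.0)
y = rbf_output([0.0], centers, weights, out_bias=0.1, spread=2.0)
print(activations, y)
```

A larger spread makes each hidden neuron respond to a wider region of the input space, which gives a smoother fit but less local detail.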

B. Analysis using RBFNN
In the second experiment, the data on tourist arrivals to Indonesia were tested using the RBFNN technique. Following common ANN practice, the data were divided into training and testing sets, and both were normalized. The aim of normalization is to rescale the data to a smaller range that represents the original data without losing its characteristics. In this experiment, 86% of the data (30 samples) were used for training and 14% (5 samples) for testing, as shown in Table 2. In the normalization formula, X is the actual value of a sample, X_max is the maximum value, and X_min is the minimum value.
In MATLAB, an RBFNN can be created with the newrb(P,T,error_goal,spread) function, which builds the RBFNN structure, automatically selects the number of hidden-layer neurons, and reduces the error toward the specified goal. In this study, we tried sum-squared error (SSE) goal values of 0.001, 0.002, and 0.003, with the spread fixed at 200. We selected the RBFNN with an SSE goal of 0.001 and a spread of 200 as a good model. The RBFNN results are shown in
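
As an illustration of the normalization step, the following Python sketch assumes the common min-max form (X - X_min) / (X_max - X_min), which maps the data onto the interval [0, 1]. The exact formula used in the study may differ, and the sample values and function names are hypothetical.

```python
def min_max_normalize(values):
    """Scale values to [0, 1] using (X - X_min) / (X_max - X_min)."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        raise ValueError("cannot normalize a constant series")
    return [(x - x_min) / (x_max - x_min) for x in values]


def denormalize(scaled, x_min, x_max):
    """Map normalized values back to the original scale."""
    return [s * (x_max - x_min) + x_min for s in scaled]


# Hypothetical yearly counts (not the BPS data).
raw = [200.0, 350.0, 500.0, 800.0]
scaled = min_max_normalize(raw)
restored = denormalize(scaled, min(raw), max(raw))
print(scaled)
```

Keeping X_min and X_max from the training data makes it possible to convert the network's predictions back to actual tourist counts.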

IV. Results and Discussions
This section describes the test of the tourist arrival data using the two models. Table 3 shows the prediction errors of ARIMA and RBFNN, with MSE chosen as the error measure. The MSE was 0.00722784 for ARIMA and 0.00098188 for RBFNN. This means that the RBFNN technique, with a spread of 200 and an error goal of 0.001, gave good prediction accuracy for tourist arrivals. The predicted output was compared with the desired output using MSE, as shown in Table 4. RBFNN gave the best MSE, which indicates good accuracy. Fig. 8 compares the ARIMA and RBFNN predictions for 5 years ahead.
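
The error measure used in this comparison can be written in a few lines. The Python sketch below computes the MSE between a desired and a predicted series; the function name and the numbers are hypothetical and only show the calculation, not the values in Table 3.

```python
def mean_squared_error(desired, predicted):
    """Average of the squared differences between two series."""
    if len(desired) != len(predicted):
        raise ValueError("series must have the same length")
    squared = [(d - p) ** 2 for d, p in zip(desired, predicted)]
    return sum(squared) / len(squared)


# Hypothetical normalized values for a 5-sample test set.
desired = [0.70, 0.75, 0.80, 0.90, 1.00]
model_a = [0.60, 0.70, 0.90, 0.80, 0.90]
model_b = [0.72, 0.74, 0.81, 0.88, 0.97]

mse_a = mean_squared_error(desired, model_a)
mse_b = mean_squared_error(desired, model_b)
print(mse_a, mse_b)
```

The model with the smaller MSE tracks the desired output more closely on the test samples, which is the criterion applied to ARIMA and RBFNN here.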

V. Conclusions
This paper has presented a performance comparison of a statistical technique and a machine learning technique, namely ARIMA and RBFNN, in learning time series data. The mean squared error was computed for each model and compared. Based on the results obtained, the RBFNN algorithm was found to be more accurate than ARIMA in modelling the time series of tourist quantity to Indonesia. Future work includes a comparison of several ANN methods and an optimization process to obtain more accurate forecasting results.

Table 2. Real tourist arrival data after normalization

Table 3. Comparison of MSE from ARIMA and RBFNN models

Table 4. Prediction results of tourist arrivals to Indonesia in 2014-2018