Monthly rainfall prediction based on artificial neural networks with backpropagation and radial basis function

Introduction
Indonesia, as the largest archipelagic state, is one of the countries most vulnerable to the negative impacts of global climate change. Global climate change models predict that all areas in Indonesia will undergo changes in the pattern and intensity of rainfall [1]. In addition, these changes may vary significantly on monthly, seasonal, and even inter-annual time scales [2]. As a result, the rainy and dry seasons become more uncertain and difficult to predict.
The most noticeable negative impacts of changes in the pattern and intensity of rainfall are forest fires and floods. In 1994, 1997, 2002, and 2015, forest and peat fires struck South Sumatra, damaging hundreds of thousands of hectares of concession and conservation land [3], [4]. Fires recorded in South Sumatra accounted for 40% of all fires throughout Indonesia (2.1 million hectares). In addition, the smoke generated by forest fires affects the health of the majority of people, causing acute respiratory infections. Therefore, this research aims to build an accurate method for rainfall prediction.
In general, there are three types of prediction methods: physical law [5], statistical analysis [6], and soft computing [7], the last of which is based on numerical models. The first method involves studying the rainfall processes in order to model the underlying physical laws. However, this method is very difficult to apply because rainfall is influenced by a number of hydrological parameters and is limited in both the spatial and temporal dimensions [8], [9]. Thus, this method requires various complicated calculations. The second method (statistical analysis) includes Simple Linear Regression (SLR), Exponential Smoothing (ES), Autoregressive Moving Average (ARMA), Autoregressive Integrated Moving Average (ARIMA), and Generalized Autoregressive Conditional Heteroskedasticity (GARCH). Note that the ARIMA model is used by the Agency for Meteorology, Climatology, and Geophysics (BMKG) for short-term weather predictions. Nevertheless, statistical methods have the limitation that they are not suitable for nonlinear time series data [10], and rainfall data are known to contain nonlinear as well as stationary components. In contrast, soft computing methods can handle both linear and nonlinear data. For example, the Artificial Neural Network (ANN) is a soft computing method that can identify nonlinear data patterns through learning [11]. This method is simple and practical to use for prediction, and it also achieves good accuracy.
Many researchers have predicted rainfall using ANNs. Abhishek et al. [12] used an ANN for weather forecasting. Different training parameters (activation functions) were applied in three different architectures. In terms of the Mean Square Error (MSE), all three architectures still produced relatively large values. Mislan et al. [13] applied ANN techniques with the backpropagation algorithm to predict rainfall, using three network architectures. The test results show that the MSE produced by each architecture was still relatively large, although better than that of previous research. Wahyuni et al. [14] used an ANN with the backpropagation algorithm to develop rainfall prediction models. The data were taken from the period 2005-2014, of which 50% were used for training and 50% for testing. The focus of that research was to find the optimal model parameters. For this purpose, different parameters were applied in each model, including the number of hidden layers, the learning rate, and the number of epochs. The resulting MSE was still large.
The ANNs used in these earlier studies produced inaccurate predictions. To obtain better predictions with smaller errors, a different approach is needed. In this research, we apply a different BPNN configuration. The difference lies in the much larger architecture (six hidden layers of 100 neurons each). The training parameters are varied, such as the activation functions between hidden layer connections (logsig, tansig, and purelin) and the learning rate (0.05, 0.1, and 0.3). The results obtained from the BPNN are compared with those of another algorithm, namely the Radial Basis Function Neural Network (RBFNN).

Data and Research Area
The data used in this study were obtained from the National Oceanic and Atmospheric Administration (NOAA) (https://www.esrl.noaa.gov/psd/data/gridded/data.gpcc.html). The data have a monthly temporal resolution and a spatial resolution of 0.5° in both latitude and longitude, and they cover the period from January 1994 to December 2013. The research area is bounded by 2.5-3° S and 104.5-105° E. These coordinates cover most of Palembang City and a small part of Banyuasin Regency, South Sumatra Province, Indonesia.

Pre-processing Data
The downloaded data cover the entire globe and are stored as a NetCDF file. We extracted only the data within the research area using the Grid Analysis and Display System (GrADS) software. The selected data were then processed with MATLAB R2008a and MS Excel to obtain the final data (Table 1). These final data became the input patterns for the Artificial Neural Network (ANN). Prior to training and testing the ANN, the data were normalized with (1) so that the rainfall values lie between 0 and 1, the output range of the sigmoid function:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \qquad (1)$$

Here, $x_{\min}$ is the minimum value and $x_{\max}$ is the maximum value, $x$ is the original data, and $x'$ is the normalized data. The original nonlinear rainfall time series is shown in Fig. 1.
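The normalization and its inverse (needed to report predictions in rainfall units again) can be sketched as follows. This is a minimal illustration, assuming that (1) is the standard min-max scaling x' = (x - x_min)/(x_max - x_min); the function names and the sample values are hypothetical and not part of the original MATLAB workflow.

```python
def normalize(values):
    """Scale values to [0, 1] with min-max normalization (assumed form of (1))."""
    x_min, x_max = min(values), max(values)
    span = x_max - x_min
    if span == 0:
        raise ValueError("cannot normalize a constant series")
    scaled = [(x - x_min) / span for x in values]
    return scaled, x_min, x_max


def denormalize(scaled, x_min, x_max):
    """Map normalized values back to the original rainfall units (e.g. mm/month)."""
    return [s * (x_max - x_min) + x_min for s in scaled]


# Hypothetical monthly rainfall values (mm), for illustration only.
rain = [250.0, 310.0, 120.0, 40.0, 95.0]
scaled, lo, hi = normalize(rain)
print([round(s, 3) for s in scaled])   # [0.778, 1.0, 0.296, 0.0, 0.204]
print(denormalize(scaled, lo, hi))     # the original values, up to float rounding
```

Keeping x_min and x_max from the full series is what allows network outputs to be converted back to millimetres after prediction.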

Define Input Patterns, Training Data, and Testing Data
The input layer consists of two neurons, P = [p(t-2), p(t-1)], while the output layer consists of one neuron, p'(t). These input data were obtained from the original rainfall data after normalization with (1). The two input neurons define the training patterns: the first month (Jan 1994) and the second month (Feb 1994) are the input for the target in the third month (Mar 1994). For the 4th-month target (Apr 1994), the 2nd month (Feb 1994) and the 3rd month (Mar 1994) become the input data. This rule continues until the 240th month (Dec 2013) is the target, with the 238th month (Oct 2013) and the 239th month (Nov 2013) as input. Using this technique, we obtain 238 data patterns (Table 2). The final rainfall data were then divided into two parts, namely the training data and the testing data. In this study, the training data are about 80% of the total data (patterns 1-190), while the testing data are the remaining 20% (patterns 191-238).
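The sliding-window rule and the 190/48 split described above can be sketched in a few lines. Here month indices stand in for the normalized rainfall values so that the resulting patterns are easy to check; the helper names are hypothetical.

```python
def make_patterns(series):
    """Build (input, target) pairs following [p(t-2), p(t-1)] -> p(t)."""
    return [((series[t - 2], series[t - 1]), series[t]) for t in range(2, len(series))]


def split_patterns(patterns, n_train=190):
    """First n_train patterns for training, the remainder for testing."""
    return patterns[:n_train], patterns[n_train:]


# 240 months (Jan 1994 - Dec 2013); indices 0..239 stand in for rainfall values.
months = list(range(240))
patterns = make_patterns(months)
train, test = split_patterns(patterns)
print(len(patterns), len(train), len(test))  # 238 190 48
print(patterns[0])    # ((0, 1), 2): Jan and Feb 1994 predict Mar 1994
print(patterns[-1])   # ((237, 238), 239): Oct and Nov 2013 predict Dec 2013
```

The split is chronological rather than random, so the testing patterns (191-238) always lie after the training period.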
The data patterns presented in Table 2 are used for both the Backpropagation Neural Network (BPNN) and the Radial Basis Function Neural Network (RBFNN) algorithms.

BPNN Algorithm
The Backpropagation Neural Network (BPNN) is an ANN algorithm that uses supervised learning [15]. In general, a BPNN feeds the input signals forward through the hidden layer to the output layer. The error at the output layer is then propagated back toward the input layer, and the weights of the connections between layers are changed accordingly. The BPNN has three kinds of layers: an input layer, a hidden layer, and an output layer. In general, the steps for developing the BPNN for rainfall prediction are as follows:
1) Normalization and segregation of the data, as described in points B and C.
2) Designing the BPNN (Fig. 2): determining the number of neurons in the input, hidden, and output layers, and defining the training parameters.
3) Testing: confirming the ability of the BPNN obtained from the training process and determining the prediction accuracy.
The architecture of the BPNN is shown in Fig. 2. It consists of 2 neurons in the input layer, 1 neuron in the output layer, and 6 hidden layers with 100 neurons each. The activation functions used from the input layer to the output layer are, respectively, tansig, tansig, logsig, logsig, logsig, tansig, and purelin. The network is trained with the traingdx algorithm (gradient descent with momentum and an adaptive learning rate). The training parameters are performFcn = mse, error goal (eg) = 0.01, epoch = 2500, momentum constant (mc) = 0.95, and learning rate (lr) = 0.05 (Model BPNN1), 0.1 (Model BPNN2), and 0.3 (Model BPNN3). The training algorithm of the BPNN is as follows:
Step 0: Initialize all weights with small random numbers.
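Step 1: While the termination condition has not been met, perform Steps 2-8.

Phase 1: Feedforward propagation
Step 3: Each input unit $x_i$ ($i = 1, 2, \ldots, n$) receives an input signal and passes it to all units in the hidden layer above it.
Step 6: Each output unit $y_k$ ($k = 1, 2, \ldots, m$) receives its target $t_k$ and computes its error factor

$$\delta_k = (t_k - y_k)\, f'(y_{\mathrm{in},k})$$

where $\delta_k$ is the error term used to change the weights of the preceding layer (Step 7). We then calculate the weight change $\Delta w_{jk}$ (used later to update the weight $w_{jk}$) with the learning rate $\alpha$:

$$\Delta w_{jk} = \alpha\, \delta_k\, z_j$$

Step 7: Calculate the $\delta$ factor of each hidden unit $z_j$ ($j = 1, 2, \ldots, p$) from the errors of the output units:

$$\delta_{\mathrm{in},j} = \sum_{k=1}^{m} \delta_k\, w_{jk}$$

The $\delta$ factor of the hidden units is defined as

$$\delta_j = \delta_{\mathrm{in},j}\, f'(z_{\mathrm{in},j})$$

Calculate the weight change $\Delta v_{ij}$ (used later to update the weight $v_{ij}$) by

$$\Delta v_{ij} = \alpha\, \delta_j\, x_i$$

Step 8: Update all weights. The weights of the lines leading to the output units and to the hidden units are updated as

$$w_{jk}(\text{new}) = w_{jk}(\text{old}) + \Delta w_{jk}, \qquad v_{ij}(\text{new}) = v_{ij}(\text{old}) + \Delta v_{ij}$$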
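The training steps above can be illustrated with a small single-hidden-layer network in plain Python. This is a sketch under stated assumptions, not the paper's MATLAB setup. It uses logsig hidden units, a purelin output, and online (per-pattern) updates. It applies the classical momentum form Δw(t) = αδz + mc·Δw(t-1) and uses textbook numbering for the steps not shown in the text. MATLAB's traingdx additionally adapts the learning rate, which is not reproduced here. The toy sine series, hyperparameters, and helper names are hypothetical.

```python
import math
import random


def logsig(n):
    """Numerically stable logistic sigmoid, f(n) = 1 / (1 + exp(-n))."""
    if n >= 0:
        return 1.0 / (1.0 + math.exp(-n))
    e = math.exp(n)
    return e / (1.0 + e)


def train_bpnn(patterns, n_hidden=8, alpha=0.05, mc=0.5, epochs=200, seed=0):
    """Steps 0-8 for one logsig hidden layer and a purelin output (index 0 = bias)."""
    rng = random.Random(seed)
    n_in = len(patterns[0][0])
    # Step 0: initialize all weights with small random numbers.
    v = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    dv_prev = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
    dw_prev = [0.0] * (n_hidden + 1)
    for _ in range(epochs):                   # Step 1: here simply a fixed epoch budget
        for x, t in patterns:                 # Step 2: loop over training pairs
            xb = [1.0] + list(x)              # Step 3: inputs plus a bias term
            z = [logsig(sum(a * b for a, b in zip(row, xb))) for row in v]  # Step 4
            zb = [1.0] + z
            y = sum(a * b for a, b in zip(w, zb))                           # Step 5
            delta_k = (t - y) * 1.0           # Step 6: f'(y_in) = 1 for purelin
            dw = [alpha * delta_k * zj + mc * p for zj, p in zip(zb, dw_prev)]
            # Step 7: hidden deltas use the output weights before they change.
            delta_j = [delta_k * w[j + 1] * z[j] * (1.0 - z[j]) for j in range(n_hidden)]
            dv = [[alpha * delta_j[j] * xi + mc * p for xi, p in zip(xb, dv_prev[j])]
                  for j in range(n_hidden)]
            # Step 8: update all weights.
            w = [a + d for a, d in zip(w, dw)]
            v = [[a + d for a, d in zip(row, drow)] for row, drow in zip(v, dv)]
            dw_prev, dv_prev = dw, dv
    return v, w


def predict(v, w, x):
    xb = [1.0] + list(x)
    z = [logsig(sum(a * b for a, b in zip(row, xb))) for row in v]
    return sum(a * b for a, b in zip(w, [1.0] + z))


def mse(v, w, patterns):
    return sum((t - predict(v, w, x)) ** 2 for x, t in patterns) / len(patterns)


# Toy normalized series (a seasonal sine), illustration only.
series = [0.5 + 0.4 * math.sin(2 * math.pi * t / 12) for t in range(60)]
pats = [((series[t - 2], series[t - 1]), series[t]) for t in range(2, len(series))]
v0, w0 = train_bpnn(pats, epochs=0)       # untrained weights (same seed)
v1, w1 = train_bpnn(pats)
print(mse(v0, w0, pats), mse(v1, w1, pats))
```

Because both calls use the same seed, the first MSE is that of the untrained network and the second shows the effect of the backpropagation updates.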

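The BPNN architecture described for this study (2 inputs, six hidden layers of 100 neurons, 1 output, with the activation sequence tansig, tansig, logsig, logsig, logsig, tansig, purelin) can be sketched as a forward pass in plain Python. The transfer functions follow MATLAB's definitions of tansig, logsig, and purelin. The ±0.1 initialization range and the helper names are assumptions, and no training is performed.

```python
import math
import random


def tansig(n):
    # Same function as MATLAB's tansig, 2 / (1 + exp(-2n)) - 1, i.e. tanh(n).
    return math.tanh(n)


def logsig(n):
    # MATLAB's logsig, 1 / (1 + exp(-n)), written to avoid overflow for large |n|.
    if n >= 0:
        return 1.0 / (1.0 + math.exp(-n))
    e = math.exp(n)
    return e / (1.0 + e)


def purelin(n):
    return n


# 2 inputs, six hidden layers of 100 neurons, 1 output, as described in the text.
LAYER_SIZES = [2, 100, 100, 100, 100, 100, 100, 1]
ACTIVATIONS = [tansig, tansig, logsig, logsig, logsig, tansig, purelin]


def init_network(sizes, seed=0, scale=0.1):
    """Small random weights and biases for every layer (range is an assumption)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
        biases = [rng.uniform(-scale, scale) for _ in range(n_out)]
        layers.append((weights, biases))
    return layers


def forward(layers, activations, x):
    """Propagate one input pattern [p(t-2), p(t-1)] through all seven weight layers."""
    a = list(x)
    for (weights, biases), f in zip(layers, activations):
        a = [f(sum(wi * ai for wi, ai in zip(row, a)) + b) for row, b in zip(weights, biases)]
    return a


net = init_network(LAYER_SIZES)
print(forward(net, ACTIVATIONS, [0.42, 0.37]))   # a one-element list holding p'(t)
```

Seven activation functions are needed because six hidden layers plus the output layer give seven weight layers.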
RBFNN Algorithm
The Radial Basis Function Neural Network (RBFNN) is another ANN algorithm. It combines unsupervised and supervised learning working together (a hybrid approach). The RBFNN is similar to the Feed Forward Neural Network (FFNN) in that its architecture has an input layer, a hidden layer, and an output layer [16]. What distinguishes the RBFNN is its hidden layer: there is only one hidden layer, which uses a Gaussian activation function (13), while the activation function of the output layer is linear [17], [18]. The Gaussian function is defined as

$$\varphi(\mathbf{x}) = \exp\left(-\lVert \mathbf{x} - \mathbf{c} \rVert^{2}\right) \qquad (13)$$

where $\lVert \mathbf{x} - \mathbf{c} \rVert$ is the Euclidean distance, $\mathbf{c}$ is the center of the Gaussian function, and $\mathbf{x}$ is the input data.
The architecture of the RBFNN is shown in Fig. 3. It consists of 2 neurons in the input layer, 200 neurons in the hidden layer, and 1 neuron in the output layer. The output value of the RBFNN is defined as

$$y = \sum_{j=1}^{K} w_j\, \varphi_j$$

where $y$ is the output value, $\varphi_j$ is the value of the $j$-th of the $K$ hidden neurons, and $w_j$ denotes the weights. The RBFNN algorithm consists of the following steps [19]:
1. Initialization of the network.
2. Find the distance $D_j$ between the input vector $\mathbf{x}$ and the center (weight vector) $\mathbf{c}_j$ of each hidden neuron ($j = 1, 2, \ldots, K$):

$$D_j = \lVert \mathbf{x} - \mathbf{c}_j \rVert = \sqrt{\sum_{i} \left(x_i - c_{ij}\right)^{2}}$$

3. Compute the hidden-layer output $a_1$:

$$a_{1,j} = \exp\left(\ln(0.5)\, D_j^{2}\right)$$

4. Calculate the weights and bias. The weights are updated iteratively from $w(t)$, the weight at iteration $t$, to the new weight $w(t+1)$, using the learning rate $\eta$.
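The RBFNN steps above can be sketched as follows, under several assumptions. The hidden output is taken as a1 = exp(ln(0.5)·D²/spread²), so a neuron outputs 0.5 when the distance equals the spread (spread = 1, as in the Results). The output weights and bias are fitted with a simple least-mean-squares rule w(t+1) = w(t) + η·e·a1, where e is the output error. The centers (the first 10 training inputs), η, and the toy series are hypothetical choices. The sketch does not reproduce the incremental procedure implied by the error goal and the maximum neuron count K.

```python
import math


def rbf_activation(x, c, spread=1.0):
    """a1 = exp(ln(0.5) * D^2 / spread^2); equals 0.5 when the distance D equals spread."""
    d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))   # squared Euclidean distance
    return math.exp(math.log(0.5) * d2 / spread ** 2)


def hidden_layer(x, centers, spread):
    return [rbf_activation(x, c, spread) for c in centers]


def train_rbfnn(patterns, centers, spread=1.0, eta=0.05, epochs=100):
    """LMS update w(t+1) = w(t) + eta * e * a1 for the linear output (index 0 = bias)."""
    w = [0.0] * (len(centers) + 1)
    for _ in range(epochs):
        for x, t in patterns:
            a = [1.0] + hidden_layer(x, centers, spread)
            e = t - sum(wi * ai for wi, ai in zip(w, a))   # output error
            w = [wi + eta * e * ai for wi, ai in zip(w, a)]
    return w


def predict_rbfnn(w, x, centers, spread=1.0):
    a = [1.0] + hidden_layer(x, centers, spread)
    return sum(wi * ai for wi, ai in zip(w, a))


# Toy usage: centers taken from the first 10 training inputs (an assumption).
series = [0.5 + 0.4 * math.sin(2 * math.pi * t / 12) for t in range(60)]
pats = [((series[t - 2], series[t - 1]), series[t]) for t in range(2, len(series))]
centers = [x for x, _ in pats[:10]]
w = train_rbfnn(pats, centers)
print(sum((t - predict_rbfnn(w, x, centers)) ** 2 for x, t in pats) / len(pats))
```

Only the output layer is trained here; the hidden layer is fixed once the centers are chosen, which is one reason RBFNN training tends to be fast.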

Evaluation of Predictive Accuracy
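The model reliability is evaluated with statistical analysis [20]-[26] using several statistical parameters, namely the correlation coefficient (R), the Mean Square Error (MSE), the Mean Bias Error (MBE), and the Mean Absolute Error (MAE). These parameters are defined by the following equations:

$$R = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(t_i - \bar{t})}{\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^{2} \, \sum_{i=1}^{n} (t_i - \bar{t})^{2}}}$$

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - t_i)^{2}$$

$$\mathrm{MBE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - t_i)$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - t_i \rvert$$

where $y_i$ is the network output, $t_i$ is the actual data, and $n$ denotes the number of data patterns.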
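The four statistics can be computed as in the following sketch. It assumes that y is the network output, t is the actual data, R is the Pearson correlation coefficient, and MBE is the mean of y - t, so that a negative MBE means under-prediction, matching how MBE is interpreted in the Results. The function name and sample values are hypothetical.

```python
import math


def evaluate(outputs, targets):
    """Return R, MSE, MBE, and MAE for network outputs y and actual data t."""
    n = len(outputs)
    if n == 0 or n != len(targets):
        raise ValueError("outputs and targets must be non-empty and equally long")
    errors = [y - t for y, t in zip(outputs, targets)]
    mse = sum(e * e for e in errors) / n
    mbe = sum(errors) / n            # negative: predictions lower than observations
    mae = sum(abs(e) for e in errors) / n
    y_bar = sum(outputs) / n
    t_bar = sum(targets) / n
    cov = sum((y - y_bar) * (t - t_bar) for y, t in zip(outputs, targets))
    var_y = sum((y - y_bar) ** 2 for y in outputs)
    var_t = sum((t - t_bar) ** 2 for t in targets)
    if var_y == 0 or var_t == 0:
        raise ValueError("R is undefined for a constant series")
    r = cov / math.sqrt(var_y * var_t)
    return {"R": r, "MSE": mse, "MBE": mbe, "MAE": mae}


# Hypothetical normalized outputs and observations.
print(evaluate([0.2, 0.4, 0.5], [0.25, 0.35, 0.6]))
```

In this example the MBE is negative because two of the three outputs fall below the observations, even though one output exceeds its observation.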

Results and Discussion
In this section, the results of both the training and testing processes of the two algorithms used in this study (i.e., the BPNN and the RBFNN) are discussed. For the BPNN, several learning rates (lr) were applied, i.e., lr = 0.05, 0.1, and 0.3, all with the same momentum constant (mc) = 0.95. For the RBFNN, several error goals (eg) were applied, i.e., eg = 0.001, 0.002, and 0.003. The other RBFNN parameters are spread = 1, maximum number of neurons K = 200, and number of neurons added between displays Ki = 1. Table 3 shows the statistical analysis of the outputs of both the BPNN and the RBFNN. The RBFNN with eg = 0.001 has the best accuracy of all the models, with the smallest MSE (0.00091681).
With lr = 0.05, the BPNN predictions are lower than the actual values, as shown by the negative MBE of -0.0006559 (training) and -0.025391 (testing). In contrast, with lr = 0.1 and lr = 0.3, the BPNN predictions are higher than the actual values, as indicated by positive MBE values. The average MAE of the BPNN predictions is 0.0759 (training) and 0.15639 (testing).
For the RBFNN, the predictions are slightly lower than the actual values for eg = 0.001 and eg = 0.002, as shown by their negative MBE values (training and testing). In contrast, for eg = 0.003 the RBFNN predictions are slightly higher than the actual values, with positive MBE values of 2.27E-14 (training) and 7.49E-16 (testing). The average MAE of the RBFNN predictions is 1.1042 (training) and 1.125 (testing).
In this research, the training time needed to reach the best performance was also recorded for each algorithm and parameter setting. The BPNN with lr = 0.05 reached its best performance after 240 seconds, at epoch 3714. The BPNN with lr = 0.1 took 249 seconds and reached epoch 4070, while the BPNN with lr = 0.3 took longer, 273 seconds, reaching epoch 4651. In contrast, the RBFNN needed an average of only 15 seconds to reach its best performance across the different error goals.
The training and testing results of the two algorithms are presented in Fig. 4. Each panel compares the outputs for the different learning rates (BPNN) and error goals (RBFNN), for both the training and the testing processes. In general, the RBFNN predictions (red line) follow the actual observations (black line) much more closely than the BPNN predictions (blue line). To evaluate the model performance quantitatively, we calculated the correlation between the observed monthly rainfall and the model outputs of each algorithm during the training process (Fig. 5). The RBFNN algorithm with eg = 0.001 clearly correlates better with the observations than the three BPNN models. Its correlation coefficient is R = 0.98, with the regression equation y = 0.97x + 6.3, whereas the correlation coefficients of the BPNN with lr = 0.05, 0.1, and 0.3 are 0.80, 0.81, and 0.80, respectively. The regression analysis between the observed monthly rainfall and the outputs of the testing process for both the BPNN and the RBFNN is shown in Fig. 6. The correlation coefficient of the RBFNN (eg = 0.001) is R = 0.86, with the regression equation y = 0.74x + 54, whereas the correlation coefficients of the BPNN with lr = 0.05, 0.1, and 0.3 are 0.39, 0.55, and 0.37, respectively. Therefore, the RBFNN algorithm with eg = 0.001 is the best model for monthly rainfall prediction.
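Regression lines such as y = 0.97x + 6.3 are ordinary least-squares fits. A minimal sketch, assuming x is the observed rainfall and y is the model output (the usual convention for such target-versus-output plots); the function name and sample values are hypothetical.

```python
def fit_line(observed, predicted):
    """Least-squares line predicted = a * observed + b; returns (a, b)."""
    n = len(observed)
    x_bar = sum(observed) / n
    y_bar = sum(predicted) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(observed, predicted))
    sxx = sum((x - x_bar) ** 2 for x in observed)
    if sxx == 0:
        raise ValueError("observed values must not all be equal")
    a = sxy / sxx
    b = y_bar - a * x_bar
    return a, b


obs = [100.0, 200.0, 300.0]     # hypothetical observed rainfall (mm)
pred = [110.0, 190.0, 310.0]    # hypothetical model output (mm)
a, b = fit_line(obs, pred)
print(f"y = {a:.2f}x + {b:.1f}")
```

A slope near 1 and a small intercept indicate outputs close to the observations; a slope well below 1 with a large intercept, as in the testing fit, indicates that high rainfall tends to be under-predicted and low rainfall over-predicted.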

Conclusion
In this study, two ANN algorithms, namely the BPNN and the RBFNN, were used for monthly rainfall prediction. Statistical analysis was performed to evaluate the prediction accuracy of each algorithm. The RBFNN algorithm with eg = 0.001 gives better results than the BPNN algorithm for both the learning and testing processes: the correlation coefficients between its output and the observations are 0.98 and 0.86, respectively. To minimize errors, the BPNN in this study required a deep architecture (six hidden layers) with a large number of neurons, and it took a long time to reach its best performance. In contrast, the RBFNN has only one hidden layer, which results in a relatively short training time. Therefore, the choice of architecture also affects how long training takes.

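Abstract
Two models of Artificial Neural Network (ANN) algorithms have been developed for monthly rainfall prediction, namely the Backpropagation Neural Network (BPNN) and the Radial Basis Function Neural Network (RBFNN). A total of 238 monthly data patterns (1994-2013) were used as input data, of which 190 were used as training data and 48 as testing data. The rainfall data were tested using the BPNN architecture with various learning rates. In addition, the rainfall data were tested using the RBFNN architecture with a maximum number of neurons K = 200 and various error goals. Statistical analysis was conducted to calculate R, MSE, MBE, and MAE to verify the results. The study showed that the RBFNN architecture with an error goal of 0.001 gives the best result, with MSE = 0.00072 and R = 0.98 for the learning process, and MSE = 0.00092 and R = 0.86 for the testing process. Thus, the RBFNN can be considered the best model for monthly rainfall prediction.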

Fig. 4. The output from the training (left) and the testing (right) processes for various types of the BPNN and RBFNN algorithms

Fig. 5. The regression results between the observed monthly rainfall and the output of the training process for the RBFNN with eg = 0.001 (upper left panel), the BPNN with lr = 0.05 (upper right panel), the BPNN with lr = 0.1 (lower left panel), and the BPNN with lr = 0.3 (lower right panel)

Fig. 6. The regression results between the observed monthly rainfall and the output of the testing process for the RBFNN with eg = 0.001 (upper left panel), the BPNN with lr = 0.05 (upper right panel), the BPNN with lr = 0.1 (lower left panel), and the BPNN with lr = 0.3 (lower right panel)

Table 2. Rainfall data after normalization

Table 3. The output error results of the BPNN and RBFNN