Circular(2)-linear regression analysis with iteration order manipulation

Circular statistics is a family of distribution models and statistical techniques for analyzing random variables that are cyclic in nature. It is applied to data whose measurements are directions, usually expressed as angles. These techniques have developed in several fields of science where the exploration, modeling, and hypothesis testing of directional and angular data play an important role.


I. Introduction
Two-dimensional directional data cannot be represented by a single angle or unit vector without a further convention, because the angular value depends on the choice of starting point, taken as the angle 0°, and on the direction of rotation. A mathematician measuring counterclockwise from the east describes a direction as 60°, whereas a geologist measuring clockwise from the north describes the same direction as 30° [1].
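The dependence on convention can be illustrated with a minimal sketch. It assumes the mathematical convention starts at east (the positive x-axis) and rotates counterclockwise, and that the compass convention starts at north and rotates clockwise; the function names are hypothetical.

```python
def math_to_compass(angle_deg: float) -> float:
    """Convert an angle measured counterclockwise from east
    into a bearing measured clockwise from north (degrees)."""
    return (90.0 - angle_deg) % 360.0


def compass_to_math(bearing_deg: float) -> float:
    """Inverse conversion; the same formula applies in both directions."""
    return (90.0 - bearing_deg) % 360.0


if __name__ == "__main__":
    # The same physical direction under the two conventions.
    print(math_to_compass(60.0))  # 30.0
    print(compass_to_math(30.0))  # 60.0
```

Both conversions use the same expression because reflecting the rotation direction and shifting the origin by 90° is its own inverse.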
Circular data can be expressed in several ways. The usual ones relate to two circular measuring instruments, namely the compass and the clock. Observations measured with a compass include wind direction and the direction of bird movement, as well as data measured with a protractor. Observations measured by the clock are times, for example the arrival time (on a 24-hour clock) of patients at a hospital emergency room, or the number of incidents over a year or by month [2]. Brunsdon and Corcoran [3] used circular statistics to study the timing patterns of criminal acts on both daily and weekly scales. Distributions used in circular statistics include the uniform distribution, wrapped distributions, the cardioid distribution, and the von Mises distribution. The von Mises distribution is one of the most widely used. Like the normal distribution on the line, it plays an important role in descriptive and inferential circular statistics.
This research compares linear regression analysis with circular regression analysis. More specifically, the model considered is circular(2)-linear regression of order 2. The paper provides new information on the possibility of obtaining better predictions by manipulating the order of iteration, which serves as a new approach for improving the value of R².

A. Data and Circular statistics
Circular data are measurements whose values repeat periodically: a value is reached again after one full period. A circular variable is defined as one whose scale has a first and a last point that meet each other [4]. Circular data are divided into two types, direction-circular data and time-circular data [4].
Circular statistics provides distribution models and statistical techniques for analyzing random variables that are cyclic in nature. It is used on data obtained from directional measurements, usually expressed as angles. The technique has developed in branches of science in which the exploration, modeling, and hypothesis testing of directional and angular data play a crucial role.
The von Mises distribution is the circular analogue of the normal distribution; its density is given by (1).
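For reference, the von Mises density is commonly written in the following standard form. The symbols μ (mean direction), κ (concentration parameter), and I_0 (modified Bessel function of the first kind of order zero) are assumptions here and may differ from the notation used in (1).

```latex
f(\theta;\mu,\kappa) \;=\; \frac{1}{2\pi I_0(\kappa)}\,
  \exp\{\kappa \cos(\theta-\mu)\},
  \qquad 0 \le \theta < 2\pi,\quad \kappa \ge 0 .
```

When κ = 0 the density reduces to the circular uniform distribution, and larger κ concentrates the mass around μ.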
The fit of the von Mises distribution is assessed with a Q-Q plot: Zi is computed using (2), the Zi are sorted from smallest to largest so that Z1 ≤ ... ≤ Zn, and the plot is then constructed using (3).
If the data follow a von Mises distribution, the plot follows the straight line through (0,0) with a slope of 45° [5]. Data are easier to analyze when they are displayed graphically. According to [5], the graphical representation of circular data is very important in its analysis.
The analysis of circular data is built on two trigonometric functions, sine and cosine. Both are used to position the data and to relate two coordinate systems. Jammalamadaka and Sengupta [1] state that a directional position can be determined in polar or Cartesian coordinates. A point P is written as (X,Y) in Cartesian coordinates or as (r,θ) in polar coordinates, where r is the distance of P from the origin O. Polar coordinates are converted to Cartesian coordinates using the trigonometric relations in (4). The relation between Cartesian and polar coordinates is shown in Fig 3.
In circular analysis, the quantity of interest is the direction, not the magnitude of the vector. The vectors are therefore converted to unit vectors, that is, vectors of length r = 1. Every direction corresponds to a point P on the circumference of the unit circle, and conversely every point on the circumference can be identified with an angle. When P lies on the circumference, the conversion between polar and Cartesian coordinates is given by (5).
The mean direction of a circular sample is obtained by calculating the resultant of the unit vectors of the observations. The direction of the resultant vector gives the mean direction of the sample, and the mean resultant length describes the concentration of the data about the mean direction. For example, let α1, α2, …, αn be a sample of n circular observations expressed as angles. Each observation is transformed from polar to Cartesian coordinates using (6).
The resultant vector of the unit vectors is obtained by summing each component using (7),
where R, the length of the resultant vector, is calculated using (8), and R̄, calculated using (9), is the mean resultant length, which also measures the concentration of the data about the mean direction. The direction of the resultant vector is the circular mean direction, defined using (10). More explicitly, it is given by the quadrant-specific inverse of the tangent in (11). If all the angles point in the same direction, the data are concentrated and R is close to n. If the data are spread around the whole circle, they are not concentrated and R approaches 0 [1]. In [4], the circular variance of a sample is defined as V = 1 − R̄. The smaller the circular variance, the more concentrated the data are around a single point. The value of V lies in the interval [0,1].
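The quantities described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: angles are supplied in degrees, `atan2` stands in for the quadrant-specific inverse tangent, and the names `C`, `S`, and `circular_summary` are hypothetical.

```python
import math


def circular_summary(angles_deg):
    """Return resultant length R, mean resultant length R_bar,
    circular mean direction (degrees) and circular variance V."""
    n = len(angles_deg)
    # Cartesian components of the summed unit vectors.
    C = sum(math.cos(math.radians(a)) for a in angles_deg)
    S = sum(math.sin(math.radians(a)) for a in angles_deg)
    R = math.hypot(C, S)          # length of the resultant vector
    R_bar = R / n                 # mean resultant length, in [0, 1]
    # atan2 picks the correct quadrant; wrap the result to [0, 360).
    mean_dir = math.degrees(math.atan2(S, C)) % 360.0
    V = 1.0 - R_bar               # circular variance
    return {"R": R, "R_bar": R_bar, "mean_dir": mean_dir, "V": V}


if __name__ == "__main__":
    sample = [350.0, 10.0, 30.0]
    print(circular_summary(sample))
    # The arithmetic mean ignores the wrap-around at 360 degrees.
    print(sum(sample) / len(sample))
```

For the sample 350°, 10°, 30° the circular mean direction is 10°, whereas the arithmetic mean is 130°, which points away from all three observations.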

B. Simple Linear Regression (SLR)
Simple linear regression differs from multiple linear regression in the number of explanatory variables. The data used in this model are scalar, with Y as the dependent variable and X as the explanatory variable. SLR builds a model relating Y to X [6]. The SLR model has many practical uses, which data analysts commonly group into two broad categories:
a. Prediction. The X variables are inputs to the SLR, and the response Y is predicted for the next period in time series data or for the next object in cross-sectional data. A model is better when its R² value is larger than that of competing models.
b. Measuring the strength of the relationship or influence of X on Y. A larger SLR coefficient represents a larger influence of X.
Fig. 1 illustrates the relationship between the Y data and the X data and shows that X and Y are positively correlated. The figure contains three elements. The first is the observations, shown in red, each consisting of an X value and a Y value. The second is the regression line, which estimates the position that best represents the relationship between X and Y. The third is the error, represented by the distance between a red observation and the regression line [7]. Linear regression can be defined as in (12) [6].
The terms in equation (12) are Yi, Xi, β0, β1, and εi. Yi is the dependent variable for object i; Xi is the independent (predictor) variable for object i; β0 + β1Xi represents the linear relation between Yi and Xi; β0 is the mean of Y when X = 0 (the Y-intercept); β1 is the slope, that is, the change in the mean of Y when X increases by one unit; and εi are the random error terms.
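As a minimal sketch of how β0 and β1 are estimated, the following assumes the usual model Yi = β0 + β1Xi + εi and the ordinary least squares estimator; the function names `fit_slr` and `r_squared` are hypothetical.

```python
def fit_slr(x, y):
    """Ordinary least squares estimates (b0, b1) for y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx                # slope
    b0 = y_bar - b1 * x_bar       # intercept
    return b0, b1


def r_squared(x, y, b0, b1):
    """Coefficient of determination, 1 - SSE/SST."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / sst


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 3.0]
    y = [2.0, 5.0, 8.0, 11.0]     # exactly y = 2 + 3x
    b0, b1 = fit_slr(x, y)
    print(b0, b1, r_squared(x, y, b0, b1))
```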

C. Circular Regression
Regression for circular data is divided into three types [8]:
1. Circular-linear regression: the independent variable is circular and the dependent variable is linear.
2. Linear-circular regression: the independent variable is linear and the dependent variable is circular.
3. Circular-circular regression: both the independent and the dependent variables are circular.
The circular(2)-linear regression model between a linear variable Y and two circular independent variables α can be written using (13) [9], [10].
Circular models have many applications. One example is the study by Linder and Williander [11], which examines the causes of reluctance. They adopt a hypothesis-testing framework of business model innovation and show that circular business models imply significant challenges to proactive uncertainty reduction for the entrepreneur. The study also shows that many product-service system variants that facilitate return flow control in circular business models further aggravate the potential negative effects of failed uncertainty reduction because of increased capital commitments. Di Marzio et al. [12] studied circular models for nonparametric data. Guerrero and Solar [13] applied Gaussian processes to circular data. Kim and Sengupta [10] studied circular models with an inverse approach, and Peiris and Kim [14] studied restricted inference for circular-linear and linear-circular regression models.

D. Regression Coefficient Estimation
The circular(2)-linear regression model can be written in matrix form as (16), where Y is the observation vector of size (n × 1), Z is a matrix of size (n × (1+4m)), β is the regression coefficient vector of size ((1+4m) × 1), and ε is the random error vector of size (n × 1).
The least squares estimator β̂ is then obtained by minimizing the sum of squared errors L in (19).
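The estimation step can be sketched as follows. This is an illustration under assumptions: each row of Z is taken to be [1, cos kγ, sin kγ, cos kδ, sin kδ for k = 1..m], which matches the stated width 1+4m but whose column order is a guess, and β̂ is obtained by solving the normal equations ZᵀZβ = ZᵀY with a small Gaussian elimination. All function names are hypothetical and angles are in radians.

```python
import math


def design_row(gamma, delta, m):
    """One row of Z for order m: intercept plus 4 trigonometric terms per order."""
    row = [1.0]
    for k in range(1, m + 1):
        row += [math.cos(k * gamma), math.sin(k * gamma),
                math.cos(k * delta), math.sin(k * delta)]
    return row


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.
    Assumes A is square and non-singular (Z has full column rank)."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_circular2_linear(gammas, deltas, y, m=1):
    """Least squares coefficients of length 1 + 4m."""
    Z = [design_row(g, d, m) for g, d in zip(gammas, deltas)]
    p = len(Z[0])
    ZtZ = [[sum(r[i] * r[j] for r in Z) for j in range(p)] for i in range(p)]
    Zty = [sum(r[i] * yi for r, yi in zip(Z, y)) for i in range(p)]
    return solve(ZtZ, Zty)


if __name__ == "__main__":
    gammas = [0.3 * i for i in range(20)]
    deltas = [0.7 * i + 0.1 for i in range(20)]
    y = [5.0 + 2.0 * math.cos(g) - 1.0 * math.sin(d)
         for g, d in zip(gammas, deltas)]
    print(fit_circular2_linear(gammas, deltas, y, m=1))
```

Solving the normal equations directly keeps the sketch dependency-free; a QR-based solver from a numerical library would usually be preferred for ill-conditioned Z.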

E. The reduction of Error Sum of Squares (SSE)
An important step in choosing the order of a polynomial regression is to examine the reduction in SSE as m increases. Raising the degree of the trigonometric polynomial to (m+1) adds columns to the model. To decide whether degree (m+1) should be adopted, the reduction in SSE is first calculated using (21). If the reduction is substantial, degree (m+1) is included in the model [8].
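A minimal sketch of this comparison follows, assuming the reduction is the plain difference between the SSE of order m and that of order m+1. The relative reduction is an extra, hypothetical helper added only to give the difference a scale; the numbers in the usage example are the order 1 and order 2 SSE values reported for the simulation data.

```python
def sse(y, y_hat):
    """Error sum of squares between observed and fitted values."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))


def sse_reduction(sse_m, sse_m_plus_1):
    """Absolute drop in SSE when the order goes from m to m+1."""
    return sse_m - sse_m_plus_1


def relative_reduction(sse_m, sse_m_plus_1):
    """Drop in SSE as a fraction of the lower-order SSE (illustrative only)."""
    return (sse_m - sse_m_plus_1) / sse_m


if __name__ == "__main__":
    print(sse_reduction(4.557, 4.395))
    print(relative_reduction(4.557, 4.395))
```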

III. Methods
The data used in this research are simulation data and secondary data. The simulated independent variables γ and δ were generated with rvm(60,0,1) in R 3.0.1 [1].
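A comparable simulation can be sketched with the Python standard library. The sketch assumes that rvm(60,0,1) means 60 draws from a von Mises distribution with mean direction 0 and concentration 1; the seeds and the function name are hypothetical, and the draws will not reproduce the R random number stream.

```python
import math
import random


def simulate_von_mises(n=60, mu=0.0, kappa=1.0, seed=None):
    """Draw n angles (radians, wrapped to one full turn) from a
    von Mises distribution with mean direction mu and concentration kappa."""
    rng = random.Random(seed)
    return [rng.vonmisesvariate(mu, kappa) for _ in range(n)]


if __name__ == "__main__":
    gamma = simulate_von_mises(seed=1)   # seeds chosen arbitrarily
    delta = simulate_von_mises(seed=2)
    print(len(gamma), len(delta))
    print([round(math.degrees(a), 1) for a in gamma[:5]])
```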
The procedure used to reach the purpose of this research is as follows.
First step: descriptive analysis of circular statistics for each of the variables γ and δ.
- Graphical representation of the circular data for γ and δ using a scatter diagram and a rose diagram
- Goodness-of-fit plot for the von Mises distribution
- The circular and linear mean direction of γ and δ
- The length of the circular mean vector for γ and δ
- The circular and linear variance of γ and δ
Second step: multiple linear regression analysis and circular(2)-linear regression analysis with γ and δ as independent variables and the linear variable Y as the dependent variable.

IV. Result and Discussion
A. Descriptive statistics of simulation data for circular variables γ and δ
Table 1 shows that the mean direction of γ was about 350.73° under circular statistics, whereas it was about 203.47° under linear statistics. Table 1 also shows that the resultant length was about 30.7 and the mean resultant length was 0.51, which indicates a high concentration of the data around the mean direction of the circular variable γ. The circular variance was 0.49, indicating small dispersion, whereas the linear variance was about 17738.73, indicating large dispersion.
The mean direction of δ was 12.97° under circular statistics and 174.07° under linear statistics. For δ, the resultant length was 28.75 and the mean resultant length was 0.48, indicating a low concentration of the data around the circular mean direction of δ. The circular variance was 0.52, indicating small dispersion, whereas the linear variance was 17006.5, indicating large dispersion. The fit of the von Mises distribution was checked with von Mises Q-Q plots for γ and δ, shown in Fig. 2. In both plots the points follow the straight line through (0,0) with a slope of 45°, so the data for γ and δ can be said to follow the circular normal, or von Mises, distribution.

C. Graphical representation of simulation data for circular variables γ and δ
In the scatter and rose diagrams in Figs. 3 and 4, the solid red line is the circular mean direction of γ, 350.73°, meaning that γ tends toward the north under circular statistics, and the dashed black line is the linear mean direction of γ, 203.47°, meaning that γ tends toward the south under linear statistics.

D. Multiple linear regression and circular(2)-linear regression on simulation data to analyze the influence of circular variables γ and δ on variable Y
In Table 2, the coefficient of determination for multiple linear regression was about 33.4%, meaning that about 33.4% of the variation in Y is explained by γ and δ through a linear relationship, and the rest is influenced by other factors. In contrast, the coefficient of determination for circular(2)-linear regression in Table 2 was about 95.1% for order 1 and 95.3% for order 2, meaning that more than 95.1% of the variation in Y is explained by the circular variables γ and δ, and the rest by other factors. Circular(2)-linear regression therefore performed much better than multiple linear regression in capturing the influence of the circular variables γ and δ on the linear variable Y. The P-value for multiple linear regression was 0.000, so at significance level α = 0.1 the P-value (0.000) < α (0.1). The multiple linear regression model is therefore significant for describing the influence of γ and δ on the mean of Y at the 90% confidence level. For circular(2)-linear regression at significance level α = 0.1, the P-value (0.000) < α (0.1) as well, so the circular(2)-linear models of order 1 and order 2 are significant for describing the influence of the circular variables γ and δ on the mean of the linear variable Y at the 90% confidence level.
One way to determine the best model is the SSE reduction method. Here SSE(order 1) − SSE(order 2) = 4.557 − 4.395 = 0.162, which indicates that the decrease in SSE is very small, so the circular(2)-linear regression model of order 1 is preferred over order 2. Therefore, the best model for describing the influence of the circular variables γ and δ on Y in the simulation data was Ŷi = 10.0 + 0.970 cos γi + 1.04 sin γi + 1.02 cos δi + 1.01 sin δi.
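The fitted order 1 model can be evaluated at any pair of angles, as in the sketch below. It assumes that the first cosine and sine coefficients (0.970 and 1.04) belong to γ and the second pair (1.02 and 1.01) to δ, which is the natural reading but is not certain from the text; angles are given in degrees and the function name is hypothetical.

```python
import math

# Reported order 1 coefficients; the pairing with gamma and delta is assumed.
B0 = 10.0
B_COS_GAMMA, B_SIN_GAMMA = 0.970, 1.04
B_COS_DELTA, B_SIN_DELTA = 1.02, 1.01


def predict_y(gamma_deg, delta_deg):
    """Predicted Y from the order 1 circular(2)-linear model."""
    g = math.radians(gamma_deg)
    d = math.radians(delta_deg)
    return (B0
            + B_COS_GAMMA * math.cos(g) + B_SIN_GAMMA * math.sin(g)
            + B_COS_DELTA * math.cos(d) + B_SIN_DELTA * math.sin(d))


if __name__ == "__main__":
    # Both angles at 0 degrees: only the intercept and cosine terms contribute.
    print(predict_y(0.0, 0.0))
    # Prediction at the reported circular mean directions of gamma and delta.
    print(predict_y(350.73, 12.97))
```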

V. Conclusion
Examining the data before regression analysis is an early step that should be taken to determine the appropriate type of regression. Data that are directions (wind direction, navigation direction, cloud direction) or times (day, month, year, time of day) are circular data. Circular data analyzed with multiple linear regression yield a poorer regression model than the one produced by circular(2)-linear regression.

Fig. 1. Illustration of error in linear regression

Fig. 2. Goodness-of-fit plots for the von Mises distribution on simulation data: (a) Q-Q plot for the γ variable; (b) Q-Q plot for the δ variable

Fig. 4. The scatter diagram of the δ variable
In the scatter and rose diagrams in Fig. 4, the solid red line is the circular mean direction of δ, about 12.97°, meaning that δ tends toward the north under circular statistics, and the dashed black line is the linear mean direction of δ, about 174.07°, meaning that δ tends toward the south under linear statistics. This shows the difference between the two calculations: the circular mean direction lies within the data distribution, whereas the linear mean direction falls away from it.

Fig. 5 shows that the predicted Y from multiple linear regression is far from the actual value of Y and therefore has a large error, whereas the predicted Y from circular(2)-linear regression is very close to the actual value of Y. In conclusion, circular(2)-linear regression gives better results than multiple linear regression on the simulation data.

Table 1. Descriptive statistics of simulation data for circular variables γ and δ

Table 2. Multiple linear regression and circular(2)-linear regression on simulation data for the influence of circular variables γ and δ on the linear variable Y