A new family of kernels from the beta polynomial kernels with applications in density estimation

ABSTRACT
One of the fundamental data analytics tools in statistical estimation is the non-parametric kernel method, which produces probability density estimates. The method uses the observations to obtain useful statistical information that helps the practicing statistician make decisions and carry out further statistical investigation. Kernel techniques primarily examine the essential characteristics of a data set, and this research introduces new kernel functions that can easily detect the inherent properties of given observations. However, accurately applying the kernel estimator as a data analytics tool requires both a kernel function and a smoothing parameter that regulates the level of smoothness applied to the estimates. A plethora of kernel functions from different families, and of smoothing parameter selectors, exist in the literature, but no single method is universally acceptable in all situations. Hence, new kernel functions and smoothing parameter selectors continue to be proposed in density estimation. This article proposes a distinct kernel family derived from the beta polynomial kernel family using an exponential progression. The newly proposed kernel family was evaluated with simulated and real-life data.
The outcomes clearly indicated that this kernel family competes favorably with other kernel families in density estimation. A further numerical comparison of the new family and the existing beta family, using the asymptotic mean integrated squared error (AMISE) as the criterion function, revealed that the new family outperformed the classical beta kernel family on both simulated and real data examples. The information obtained from the data analysis in this research could be used for decision making in an organization, especially when human and material resources are to be considered. In addition, kernel functions are vital tools for data analysis and data visualization; hence, the newly proposed functions are valuable exploratory tools.

Introduction
Probability density estimation in statistics has far-reaching effects in other fields of study because most activities can be expressed in numerical form, and such data must be analyzed to avoid misleading information. Density estimation, which is the foundation of data analysis, involves constructing a probability density estimate from given observations. It is a fundamental concept in statistics, primarily for data smoothing, that is, the analysis and visualization of observations [1] [2]. Data smoothing techniques are usually applied so that inferences and conclusions about the observations can be drawn under a particular estimation method [3] [4]. Generally, density estimation is viewed from two main perspectives, the parametric and the non-parametric, while the semiparametric approach combines the two. Parametric density estimation assumes that the observations come from a known family of distributions, so the only information required is the parameters of that distribution. Combining the estimated parameters with the family of the distribution produces a parametric estimator; the maximum likelihood estimator is one such estimator. Non-parametric estimation does not require prior knowledge of the distribution of the observations; instead, the observations are left to "explain themselves" through known statistical tools. Unlike the parametric approach, which has a fixed structure, non-parametric density estimation is flexible; however, this flexibility comes at a high computational cost, which has restricted its widespread application. High computational costs are mainly encountered when analyzing large volumes of data, especially with complex statistical models [5].
This paper employs non-parametric estimation techniques as data analytics tools. Non-parametric estimation techniques have numerous uses and are gaining popularity in data analysis, particularly in statistics and related fields, because accurate prior information about the data, such as historical data, is often not readily available. The statistical information obtained through non-parametric estimation helps provide a complete understanding of the underlying properties and features of the observations. There are various non-parametric methods of density estimation, but one of the most widely employed non-parametric estimators is the kernel estimator [6]. Kernel estimators smooth data using a kernel function and a regulating factor known as the smoothing parameter. They are more popular in density estimation than other non-parametric estimators because they are simple to implement and their results are easily presented graphically. In semiparametric estimation, kernel estimators are regarded as the bedrock of the estimation process; hence, they are known as the building blocks of semiparametric density estimation [7].
Kernel estimators are elegant density estimation tools for exploring and visualizing observations, and their results are often presented graphically. Because of their statistical importance in data analysis and visualization, kernel estimators have been the most studied of the various non-parametric estimators [6] [8]. The kernel method directly explores and visualizes data and has indirect applications in classification and other estimation processes. Recent applications of kernel estimation include hazard-rate estimation under progressive censoring, which is fundamental in industry-related research [9][10]. One of the major advantages of kernel estimators over other non-parametric estimators is their flexibility in modeling observations, and they are not affected by model-specification bias [11]. Kernel estimators are also fundamental in machine learning and have been studied extensively there. Their application in supervised learning has improved learning, especially in nonlinear hypothesis testing. Kernel density estimation has also been extended to deep learning, with different applications. For example, kernel estimation of spectral distributions was recently investigated through implicit kernel learning models built on deep neural networks, in which training and inference were carried out with Fourier features obtained by random sampling [12].
Non-parametric estimators, mostly kernel estimators, are well suited to unsupervised learning. Kernel estimators have been used to address decentralized classification and clustering problems and have been extended to distributed estimation, which is the bedrock of numerous distributed systems [13] [14]. Knowledge of the underlying model of distributed data is fundamental when studying distributed systems. A method known as the "gossip-based distributed kernel density estimation technique" was introduced by Li et al. [15]; its convergence properties were also analyzed, and the results showed that it accurately recovers the underlying density of the distributed observations.
In statistics and machine learning, kernel estimators are vital and versatile tools for estimation. Despite modern methods of data estimation and the numerous kernel functions in the literature, new kernel functions are still being introduced because the choice of kernel function strongly influences empirical performance [16]-[20]. This paper introduces a new kernel family derived from the beta polynomial kernel family; comparisons of the two families, with the AMISE as the performance measure, show that the modified kernel functions outperform the classical beta polynomial family.

The kernel density estimator
Rosenblatt [21] and Parzen [22] introduced the kernel density estimator, mainly for exploring and visualizing observations. The kernel estimator is popular among non-parametric density estimators because of its simplicity and computational advantages. Density estimation with the kernel estimator rests on two fundamental concepts: the kernel function and the smoothing parameter, also called the smoothness coefficient or bandwidth. With the kernel acting as a standardized weighting function, the univariate kernel density estimator is
$$\hat{f}(x) = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right), \qquad (1)$$
where $K(\cdot)$ is the kernel function, $n$ is the sample size, $h > 0$ is the bandwidth (smoothing parameter), $x$ is a point in the range of the observations, and $X_1, \dots, X_n$ are the observations. The kernel function $K(\cdot)$ is usually symmetric and unimodal and satisfies
$$\int K(u)\,du = 1, \qquad \int u\,K(u)\,du = 0, \qquad \int u^2 K(u)\,du = \mu_2(K) \neq 0. \qquad (2)$$
Equation (2) states three conditions: the kernel integrates to one, as every probability density function does; the mean of the kernel is zero; and the variance of the kernel, $\mu_2(K)$, is nonzero [2] [23].
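A minimal Python sketch of Equation (1) follows. It assumes the Epanechnikov kernel from the beta family discussed next, and the function names (`epanechnikov`, `kde`, `kernel_moments`) are illustrative rather than taken from the paper. `kernel_moments` checks the three conditions of Equation (2) numerically with a midpoint rule on [-1, 1], so it is only suitable for kernels supported on that interval.

```python
def epanechnikov(u: float) -> float:
    """Epanechnikov kernel 3/4 (1 - u^2) on [-1, 1]; zero outside."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kde(x: float, data: list[float], h: float, kernel=epanechnikov) -> float:
    """Kernel density estimate of Equation (1) at the point x."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    n = len(data)
    return sum(kernel((x - xi) / h) for xi in data) / (n * h)


def kernel_moments(kernel, m: int = 2000) -> tuple[float, float, float]:
    """Midpoint-rule approximations of int K, int u K and int u^2 K over [-1, 1]."""
    du = 2.0 / m
    nodes = [-1.0 + (i + 0.5) * du for i in range(m)]
    m0 = sum(kernel(u) for u in nodes) * du
    m1 = sum(u * kernel(u) for u in nodes) * du
    m2 = sum(u * u * kernel(u) for u in nodes) * du
    return m0, m1, m2


sample = [1.0, 2.0, 3.0]
print(kde(2.0, sample, h=1.0))  # only the centre point contributes: 0.75 / 3 = 0.25
print(kernel_moments(epanechnikov))  # approximately (1, 0, 0.2)
```

For the Epanechnikov kernel the three moments are approximately 1, 0, and $\mu_2(K) = 1/5$, which satisfies Equation (2).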

The beta polynomial kernel function
The beta polynomial kernel family is among the most popular classes of kernels in density estimation. This class is given as
$$K_p(x) = \frac{(1-x^2)^p}{2^{2p+1}\,B(p+1,\,p+1)}, \qquad -1 \le x \le 1, \qquad (3)$$
with $K_p(x) = 0$ otherwise, where $B(\cdot,\cdot)$ is the beta function and $p = 0, 1, 2, \dots$ is called the polynomial power. Like every kernel satisfying Equation (2), the beta polynomial kernels are probability density functions, and their support is the bounded interval $[-1, 1]$ [4] [24]. Different values of $p$ produce different kernel functions. The Uniform kernel, $K_0(x) = 1/2$, is the first member of the class ($p = 0$), while $p = 1$ gives the Epanechnikov kernel, which is optimal with respect to the AMISE:
$$K_1(x) = \frac{3}{4}(1-x^2), \qquad -1 \le x \le 1. \qquad (4)$$
As $p$ tends to infinity, the (suitably rescaled) kernel converges to the Normal or Gaussian kernel
$$K(x) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right), \qquad -\infty < x < \infty,$$
whose support is unbounded, so it is not strictly a member of this family [25] [26]. The Epanechnikov, Biweight, Triweight, and Quadriweight kernels ($p = 1, 2, 3, 4$), which are the first four beta polynomial kernels after the Uniform kernel, have broad statistical applications. Because of its AMISE optimality, the Epanechnikov kernel is the reference against which the efficiency of the other kernels in this family is computed.
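The normalizing constant in Equation (3) can be computed with log-gamma functions, which avoids overflow for large $p$. The sketch below uses illustrative helper names. It prints the constants of the first five members and, as a numerical illustration of the Gaussian limit, compares the kernel rescaled to unit variance against the standard normal density for a large $p$. The rescaling uses the fact that a variable with density $K_p$ has variance $1/(2p+3)$.

```python
import math


def beta_constant(p: int) -> float:
    """Normalizing constant 1 / (2^(2p+1) B(p+1, p+1)) of Equation (3), via log-gamma."""
    log_b = 2.0 * math.lgamma(p + 1) - math.lgamma(2 * p + 2)  # log B(p+1, p+1)
    return math.exp(-(2 * p + 1) * math.log(2.0) - log_b)


def beta_kernel(x: float, p: int) -> float:
    """Classical beta polynomial kernel K_p(x); zero outside [-1, 1]."""
    if abs(x) > 1.0:
        return 0.0
    return beta_constant(p) * (1.0 - x * x) ** p


def standardized_beta_kernel(z: float, p: int) -> float:
    """Density of Y / s where Y has density K_p and s = sqrt(Var Y) = 1 / sqrt(2p + 3)."""
    s = 1.0 / math.sqrt(2 * p + 3)
    return s * beta_kernel(s * z, p)


names = ["Uniform", "Epanechnikov", "Biweight", "Triweight", "Quadriweight"]
for p, name in enumerate(names):
    print(f"p={p} {name}: constant = {beta_constant(p):.6f}")
# constants: 1/2, 3/4, 15/16, 35/32, 315/256

phi0 = 1.0 / math.sqrt(2.0 * math.pi)
print(standardized_beta_kernel(0.0, 200), phi0)  # close for large p
```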

Evaluation of kernel performance
In non-parametric density estimation, an error criterion function is usually used to measure the performance of an estimator. Numerous performance measures exist in kernel estimation, but this paper focuses on the mean integrated squared error (MISE). Other measures exist as well, such as the integrated absolute error, the Hellinger distance, the likelihood criterion function, and the Kullback-Leibler distance. However, the MISE has gained popularity over these measures because its expression includes dimensionality, while the other measures are dimensionless. The MISE can be computed exactly (by convolution) or approximately (by a Taylor series expansion). The approximation gives the asymptotic MISE (AMISE), which consists of a variance component and a squared-bias component:
$$\mathrm{AMISE}(h) = \frac{R(K)}{nh} + \frac{h^4}{4}\,\mu_2(K)^2\,R(f''),$$
where $R(K) = \int K(x)^2\,dx$ is the roughness of the kernel, $\mu_2(K) = \int x^2 K(x)\,dx$ is the variance of the kernel, and $R(f'') = \int f''(x)^2\,dx$ is the roughness of the second derivative of the unknown density $f$. The size of the smoothing parameter regulates the contributions of the two components to the AMISE: decreasing $h$ reduces the bias but increases the variance, and increasing $h$ does the opposite [4] [27]. The smoothing parameter that minimizes the AMISE, called the optimal smoothing parameter, is the solution of
$$\frac{d\,\mathrm{AMISE}(h)}{dh} = -\frac{R(K)}{nh^2} + h^3\,\mu_2(K)^2\,R(f'') = 0. \qquad (10)$$
Solving Equation (10) gives the optimal smoothing parameter
$$h_{\mathrm{AMISE}} = \left[\frac{R(K)}{n\,\mu_2(K)^2\,R(f'')}\right]^{1/5}. \qquad (11)$$
In $d$ dimensions, the optimal smoothing parameter is of order $O(n^{-1/(d+4)})$ and the minimum AMISE is of order $O(n^{-4/(d+4)})$, where $d$ is the dimension of the kernel. In the univariate case ($d = 1$) these orders are $O(n^{-1/5})$ and $O(n^{-4/5})$. For unimodal and slightly skewed distributions, the agreement between the AMISE and the exact MISE is more evident [28].
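The sketch below implements the AMISE expression and Equation (11) in Python. For the classical beta kernels, $R(K_p)$ and $\mu_2(K_p)$ have closed forms in terms of beta functions, $R(K_p) = c_p^2 B(1/2, 2p+1)$ and $\mu_2(K_p) = c_p B(3/2, p+1)$, where $c_p$ is the constant in Equation (3). Because $R(f'')$ is unknown in practice, the example assumes a normal reference density with standard deviation $\sigma$, for which $R(f'') = 3/(8\sqrt{\pi}\,\sigma^5)$. This normal-reference choice is an illustrative assumption, not the procedure stated in the paper.

```python
import math


def log_beta(a: float, b: float) -> float:
    """log B(a, b) via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_kernel_stats(p: int) -> tuple[float, float]:
    """Return (R(K_p), mu2(K_p)) for the classical beta polynomial kernel of power p."""
    c = math.exp(-(2 * p + 1) * math.log(2.0) - log_beta(p + 1, p + 1))
    roughness = c * c * math.exp(log_beta(0.5, 2 * p + 1))  # int K_p(x)^2 dx
    mu2 = c * math.exp(log_beta(1.5, p + 1))                 # int x^2 K_p(x) dx
    return roughness, mu2


def normal_roughness_f2(sigma: float) -> float:
    """R(f'') for a normal density with standard deviation sigma (reference assumption)."""
    return 3.0 / (8.0 * math.sqrt(math.pi) * sigma ** 5)


def amise(h: float, n: int, rk: float, mu2: float, rf2: float) -> float:
    """Variance term R(K)/(nh) plus squared-bias term h^4 mu2^2 R(f'') / 4."""
    return rk / (n * h) + 0.25 * h ** 4 * mu2 ** 2 * rf2


def h_amise(n: int, rk: float, mu2: float, rf2: float) -> float:
    """Minimizer of the AMISE, Equation (11)."""
    return (rk / (n * mu2 ** 2 * rf2)) ** 0.2


rf2 = normal_roughness_f2(1.0)
for p in (1, 2, 3, 4):
    rk, mu2 = beta_kernel_stats(p)
    h = h_amise(2500, rk, mu2, rf2)
    print(f"p={p}: R(K)={rk:.4f}, mu2={mu2:.4f}, h={h:.4f}, AMISE={amise(h, 2500, rk, mu2, rf2):.3e}")
```

For $p = 1$ the closed forms give $R(K) = 3/5$ and $\mu_2(K) = 1/5$, the familiar Epanechnikov values.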

The proposed beta polynomial kernel functions
The fundamental concepts in kernel density estimation are the kernel function and the smoothing factor or bandwidth. Much research has focused on these two concepts, but no method is universally accepted in all situations; hence, new methods continue to be proposed [29]. The proposed kernel functions are derived from the beta polynomial family with an exponential (geometric) progression, in which the polynomial functions share a constant common ratio. The ratio of any two consecutive terms of an exponential progression is always the same constant, called the common ratio.
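Recall the general form of the beta polynomial kernels in Equation (3) and its members for $p = 1, 2, 3$, and $4$, which are the Epanechnikov, Biweight, Triweight, and Quadriweight kernels, respectively:
$$K_1(x) = \frac{3}{4}(1-x^2), \quad K_2(x) = \frac{15}{16}(1-x^2)^2, \quad K_3(x) = \frac{35}{32}(1-x^2)^3, \quad K_4(x) = \frac{315}{256}(1-x^2)^4, \qquad |x| \le 1.$$
Let the first term of the sequence be $a$ and the common ratio be
$$r = \frac{T_k(x)}{T_{k-1}(x)}, \qquad (12)$$
where $T_{k-1}(x)$ is the $(k-1)$th term. From Equation (4), the first term of the sequence is
$$a = T_1(x) = \frac{3}{4}(1-x^2). \qquad (13)$$
The common ratio in Equation (12) can be obtained from the first two terms as
$$r = \frac{K_2(x)}{K_1(x)} = \frac{5}{4}(1-x^2). \qquad (14)$$
Alternatively, the common ratio can be expressed using the later terms as
$$r = \frac{K_3(x)}{K_2(x)} = \frac{7}{6}(1-x^2), \qquad (15)$$
$$r = \frac{K_4(x)}{K_3(x)} = \frac{9}{8}(1-x^2). \qquad (16)$$
The ratios in Equations (14)-(16) share the factor $(1-x^2)$ and differ only in their constants. Collecting them gives Equation (17), and generalizing over two consecutive terms gives the common ratio in Equation (18), where $p = 1, 2, 3, \dots$ is the power of the polynomial kernel.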
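Equations (14)-(16) can be verified with exact rational arithmetic. For integer $p$, the sketch below writes the constant of Equation (3) as $c_p = (2p+1)\binom{2p}{p}/2^{2p+1}$, which is equivalent to the beta-function expression. Each ratio $K_{p+1}(x)/K_p(x) = (c_{p+1}/c_p)(1-x^2)$ then reduces to a fraction. The closed form $(2p+3)/(2p+2)$ that the code checks against is derived here from Equation (3) and is shown only for comparison. It is not necessarily the form the paper writes in Equation (18).

```python
from fractions import Fraction
from math import comb


def beta_constant_exact(p: int) -> Fraction:
    """Exact constant c_p = (2p+1) * C(2p, p) / 2^(2p+1) of the classical beta kernel K_p."""
    return Fraction((2 * p + 1) * comb(2 * p, p), 2 ** (2 * p + 1))


def ratio_constant(p: int) -> Fraction:
    """Constant multiplying (1 - x^2) in K_{p+1}(x) / K_p(x)."""
    return beta_constant_exact(p + 1) / beta_constant_exact(p)


for p, label in [(1, "K2/K1, Eq. (14)"), (2, "K3/K2, Eq. (15)"), (3, "K4/K3, Eq. (16)")]:
    r = ratio_constant(p)
    print(f"{label}: r = {r} * (1 - x^2)")
    assert r == Fraction(2 * p + 3, 2 * p + 2)  # pattern implied by Equation (3)
```

Exact fractions are used so that the equality checks against 5/4, 7/6, and 9/8 are not affected by floating-point rounding.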

The kth term of the proposed beta polynomial kernel functions
Let the first term of the exponential progression be $a$ and the common ratio be $r$. If $T_k(x)$ is the $k$th term of the sequence, then
$$T_k(x) = a\,r^{k-1}, \qquad k = 1, 2, 3, \dots \qquad (19)$$
The generalized $k$th kernel of the proposed beta polynomial family is therefore obtained from Equation (19) with the first term in Equation (13) and the constant common ratio in Equation (18). The resulting generalized kernels of the proposed family are given in Equation (20). As in the classical beta polynomial family, $p = 1$ in Equation (20) gives the Epanechnikov kernel in Equation (4), which is the optimal kernel of this family. For $p = 2, 3, 4$, Equation (20) gives the proposed Biweight, Triweight, and Quadriweight kernels. The new family and the classical polynomial kernels differ only in the value of the normalization constant, while the powers of the two families are the same. A change in the normalization constant produces a corresponding change in the value of the AMISE, the performance measure used here. Kernel functions are chosen on the basis of their performance, and one method or kernel function is considered better than another when it produces a smaller AMISE value [30].
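Equation (19) and the role of the normalization constant can be illustrated numerically. The sketch below is hypothetical in two respects. First, `gp_term` is the generic geometric-progression term. Second, `kernel_factor` treats the constant `c` of a kernel $c(1-x^2)^p$ as a free parameter, because the constants of the proposed kernels in Equation (20) are not reproduced here. Since $R(K)$ scales with $c^2$ and $\mu_2(K)$ with $c$, the kernel-dependent factor $[\mu_2(K)^2 R(K)^4]^{1/5}$ of the minimized AMISE, $\frac{5}{4}[\mu_2(K)^2 R(K)^4 R(f'')]^{1/5} n^{-4/5}$, scales with $c^2$.

```python
import math


def gp_term(a: float, r: float, k: int) -> float:
    """k-th term a * r^(k-1) of a geometric (exponential) progression, Equation (19)."""
    if k < 1:
        raise ValueError("terms are indexed from k = 1")
    return a * r ** (k - 1)


def log_beta(a: float, b: float) -> float:
    """log B(a, b) via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def kernel_factor(c: float, p: int) -> float:
    """[mu2(K)^2 R(K)^4]^(1/5) for K(x) = c (1 - x^2)^p on [-1, 1]; c is a free parameter."""
    rk = c * c * math.exp(log_beta(0.5, 2 * p + 1))  # R(K) = int K(x)^2 dx
    mu2 = c * math.exp(log_beta(1.5, p + 1))          # mu2(K) = int x^2 K(x) dx
    return (mu2 ** 2 * rk ** 4) ** 0.2


print(gp_term(2.0, 3.0, 4))  # 2 * 3^3 = 54.0
base = kernel_factor(15 / 16, 2)  # classical Biweight constant
for scale in (0.9, 1.0, 1.1):
    print(scale, kernel_factor(scale * 15 / 16, 2) / base)  # ratio equals scale**2 up to rounding
```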

Results and Discussion
The statistical properties of the new polynomial kernels are investigated for the first four members of the family. All graphical and data analyses were carried out in Mathematica version 12. The AMISE is the performance measure for both the proposed kernel family and the classical polynomial kernels, and the comparison shows that the new family outperformed the classical beta kernel functions. The comparison focuses on the Biweight, Triweight, and Quadriweight kernels, because the Epanechnikov kernel is identical in both families and retains its AMISE optimality. Sample sizes of 2500 and 5000 were used to illustrate the performance of the new beta polynomial kernels, since large samples are most beneficial in non-parametric density estimation. The simulation results for the two sample sizes are presented in Table 1. The proposed beta kernel functions have smaller AMISE values than the classical kernel functions, which indicates their superiority over the existing kernels [31]. Fig. 1 and Fig. 2 show the graphs of the classical beta polynomial kernels and the proposed beta kernel functions, respectively. The real data set examined comprises the 272 observations of the Old Faithful data [32]. The kernel estimates from both the classical and the proposed beta kernels reveal bimodal features, which supports the assertion that eruption durations often exhibit a bimodal distribution. Fig. 3 and Fig. 4 show the kernel estimates of the classical beta polynomial kernels and the proposed beta kernels, respectively, and the bimodality is evident for both kernel families.
Vol. 6, No. 3, November 2020, pp. 235-245
Table 2 shows the performance of the various kernel functions on the real data, and the results again show that the proposed kernel functions outperformed the classical beta kernel functions.
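The bimodality seen in Fig. 3 and Fig. 4 can also be checked numerically by counting the local maxima of a kernel estimate on a grid. The sketch below is illustrative only. It uses a small synthetic two-cluster sample rather than the Old Faithful data, the Epanechnikov kernel, and an assumed bandwidth of h = 1. `count_modes` is a hypothetical helper, and the count it returns depends on the bandwidth and the grid resolution.

```python
def epanechnikov(u: float) -> float:
    """Epanechnikov kernel 3/4 (1 - u^2) on [-1, 1]; zero outside."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kde(x: float, data: list[float], h: float) -> float:
    """Kernel density estimate of Equation (1) with the Epanechnikov kernel."""
    return sum(epanechnikov((x - xi) / h) for xi in data) / (len(data) * h)


def count_modes(data: list[float], h: float, lo: float, hi: float, m: int = 1201) -> int:
    """Count grid points where the estimate rises strictly into a point and does not rise after it."""
    step = (hi - lo) / (m - 1)
    f = [kde(lo + i * step, data, h) for i in range(m)]
    # A flat top is counted once: only its first grid point has a strict rise into it.
    return sum(1 for i in range(1, m - 1) if f[i - 1] < f[i] >= f[i + 1])


# Hypothetical, evenly spaced two-cluster sample (not the Old Faithful data).
cluster_a = [-3.0 + 0.1 * k for k in range(-5, 6)]
cluster_b = [3.0 + 0.1 * k for k in range(-5, 6)]
sample = cluster_a + cluster_b
print(count_modes(sample, h=1.0, lo=-6.0, hi=6.0))  # two separated clusters -> 2 modes
```

Real data would normally call for a data-driven bandwidth. With too small an h, a kernel estimate can show spurious bumps that a simple local-maximum count would report as extra modes.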
The graphs of the classical and proposed beta polynomial kernels are similar because the two families have the same powers but different normalization constants. As stated earlier, the Epanechnikov kernel is the same in both families, so its optimal position among second-order kernels is maintained. In terms of performance, however, Table 1 shows that the proposed kernels did better than the classical kernels, which implies that their kernel estimates can compete favorably with those of the classical kernels on real data. The AMISE values for the sample size of 5000 in Table 1 are smaller than those for the sample size of 2500. This is consistent with the $O(n^{-4/5})$ convergence rate of the AMISE and confirms that larger samples are more beneficial in non-parametric estimation, particularly in kernel density estimation. Note that the performance comparison starts with the Biweight kernel, since the Epanechnikov kernel is the same for the proposed and the classical families. The Epanechnikov kernel is nevertheless included in the graphs and tables for both the simulated and the real data. The kernel estimates of the proposed beta polynomial kernels for the Old Faithful data are similar to the classical beta polynomial kernel estimates (Figs. 3 and 4). In terms of retaining the inherent and essential features of the data, the proposed kernel functions preserve the bimodal feature just as the classical kernels do. However, under the AMISE criterion, the proposed beta polynomial kernels outperformed their classical counterparts because they produce smaller AMISE values in all the cases considered, in both simulations and real data applications.
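The reduction in AMISE from n = 2500 to n = 5000 is what the $O(n^{-4/5})$ rate predicts: for any fixed kernel and density, doubling $n$ multiplies the minimized AMISE by $2^{-4/5} \approx 0.574$. The sketch below illustrates this using the classical Biweight values $R(K) = 5/7$ and $\mu_2(K) = 1/7$ and an assumed standard normal reference density. The resulting numbers are illustrative and are not the values reported in Table 1.

```python
import math


def optimal_amise(n: int, rk: float, mu2: float, rf2: float) -> float:
    """AMISE at h_AMISE: (5/4) [mu2^2 R(K)^4 R(f'')]^(1/5) n^(-4/5)."""
    return 1.25 * (mu2 ** 2 * rk ** 4 * rf2) ** 0.2 * n ** -0.8


# Assumed reference density: standard normal, for which R(f'') = 3 / (8 sqrt(pi)).
RF2_STD_NORMAL = 3.0 / (8.0 * math.sqrt(math.pi))
BIWEIGHT = (5 / 7, 1 / 7)  # (R(K), mu2(K)) of the classical Biweight kernel

a_2500 = optimal_amise(2500, *BIWEIGHT, RF2_STD_NORMAL)
a_5000 = optimal_amise(5000, *BIWEIGHT, RF2_STD_NORMAL)
print(a_2500, a_5000, a_5000 / a_2500)  # last value is 2**(-4/5), about 0.574
```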

Conclusion
This paper proposes new polynomial kernels derived from the classical beta polynomial family using an exponential progression approach. When the proposed beta polynomial kernels were evaluated with the AMISE, they performed better than the classical beta polynomial kernels. Kernel estimates are widely used for data visualization because they highlight features that support decision making, and in this role the proposed kernels compete favorably with the existing kernels in retaining and preserving the inherent statistical features of the observations examined. Extending the proposed univariate kernel functions to multivariate kernels is an area for future research, since many applications of kernel estimation are higher-dimensional.