Analogy-based model for software project effort estimation

Introduction
In software development, the first step is usually to estimate the project effort. A project manager must carefully consider the three main factors of a project: system functionality, duration and project costs [1]. Project estimation is essential for a software development project to run on time and within budget with maximum quality. In 2015, the Standish Group released survey results showing that 52.7% of software projects were delivered later than initially estimated and also exceeded their budget.
Software effort estimation has been researched and developed with both algorithmic and machine learning methods since the 1960s. Estimation based on expert judgement is one of the earliest and most widely used methods. Expert judgement is an estimation process that relies on an assessment conducted by experts who are experienced in software projects. One of the well-known estimation techniques is Planning Poker, which is often used in Agile software development methodologies [2]. There is also Function Point, an estimation method proposed by [3] that uses function points as the unit of size of the software to be developed. COCOMO, or Constructive Cost Model, is one of the most popular algorithmic methods [4]. COCOMO I classifies projects into three classes: Organic, Semidetached and Embedded [5]. The Use Case Point (UCP) method proposed by [6] estimates software effort with several effort drivers including UCW, UUCW, ECF and TCF. UCP itself is derived from the Function Points method using 20 or 28 productivity factors. Moreover, regression analysis, introduced by [7] and [8], analyzes the relationship between independent and dependent variables. Bayesian Belief Network (BBN) is a method with a causal-relationship approach described as a directed acyclic graph. Nodes represent discrete or continuous random variables, and edges represent a probabilistic dependence between the connected variables [9]-[10]. In addition, other approaches have been widely used, such as Artificial Neural Networks [11]-[12].
To summarize, there are two types of approaches for estimating software development effort: algorithmic and machine learning approaches. The machine learning methods used to estimate software effort include KNN, Support Vector Machine (SVM) [13], Decision Tree [14], Analogy, Deep Learning [15], Ensemble [16] and Neural Network (NN) [17]. One important issue in software project estimation is estimation accuracy. Accurate estimation is necessary so that the project can be completed on time and on budget. Unfortunately, no estimation approach has proved consistently accurate. This is a major challenge that remains to be solved in the effort estimation field.
Among the various methods of estimating software effort, Analogy is the most commonly used. Analogy compares the effort drivers of a new software project with data from previous projects to find the most similar project. This is possible because Analogy is able to learn from previous experiences autonomously. Analogy evaluation results have shown the highest accuracy compared to the other machine learning and non-machine learning methods, with an average mean magnitude of relative error (MMRE) of 49.8%, a median magnitude of relative error (MdMRE) of 29.37% and a Pred(25) of 51.23% [18].
Data sets play a vital role in the implementation of Analogy for software project estimation. A number of software project data sets are currently publicly available, including COCOMO81 [19], Miyazaki, Albrecht, China, ISBSG [20], Desharnais, NASA, Maxwell [21], Kemerer [22], Finnish, Cosmic [23], Kitchenham, UCP, Telecom, Atkinson and Tukutuku. This paper aims to investigate the accuracy of the Analogy method on software project data sets. The major contribution of this research is the implementation of an Analogy framework to obtain consistently accurate results in software project effort prediction.

Analogy-based Estimation
The essence of the Analogy-based method is to compare the project to be estimated with all of the historical software project data. Project data can come from primary data that is internal to the company or from widely available public data. The comparison identifies which historical projects are most similar to the project to be estimated. The similar projects are then selected and adapted so that the estimated effort of the new project can be derived. Fig. 1 shows the Analogy-based estimation model framework.

Similarity Measure
Similarity measures calculate the similarity between projects based on the distance between them, computed according to the type of each attribute. The measurement techniques used in this experiment are Euclidean, Manhattan and Minkowski, which have been shown to produce good results as similarity measures [24]-[25].
Euclidean distance measures the distance D between two software projects, as denoted in equations (1) and (2), where p is a new project to be estimated and p' is an older project that has been completed. $p_i$ and $p'_i$ are the i-th attribute (feature) values of the two projects, and $w_i \in \{0, 1\}$ is the weight of the i-th attribute.
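Equations (1) and (2) are not reproduced in this text. A form that is common in the analogy-based estimation literature is sketched below as an assumption, not as the authors' exact formulation. Here $p_i$ and $p'_i$ are the i-th attribute values of the new and old project, $w_i$ is the attribute weight, $n$ (an assumed symbol) is the number of attributes, and the treatment of nominal attributes in the second equation is likewise assumed.

```latex
D(p, p') = \sqrt{\sum_{i=1}^{n} w_i \, Dis(p_i, p'_i)} \tag{1}

Dis(p_i, p'_i) =
\begin{cases}
(p_i - p'_i)^2 & \text{if } p_i \text{ and } p'_i \text{ are numeric} \\
0 & \text{if } p_i \text{ and } p'_i \text{ are nominal and } p_i = p'_i \\
1 & \text{if } p_i \text{ and } p'_i \text{ are nominal and } p_i \neq p'_i
\end{cases} \tag{2}
```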

 
Minkowski distance is a generalization of the Euclidean and Manhattan distances that raises the difference of each attribute pair to a chosen order, as denoted by equations (5) and (6).
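The relationship between the three distances can be illustrated with a minimal Python sketch for numeric attributes. The function names, the optional `weights` argument and the sample vectors are assumptions made for illustration; they are not taken from the experiment.

```python
def minkowski(p, q, order, weights=None):
    """Weighted Minkowski distance of a given order between two numeric vectors."""
    if weights is None:
        weights = [1] * len(p)  # assumption: every attribute is switched on
    total = sum(w * abs(a - b) ** order for w, a, b in zip(weights, p, q))
    return total ** (1.0 / order)


def manhattan(p, q, weights=None):
    """Minkowski distance of order 1."""
    return minkowski(p, q, 1, weights)


def euclidean(p, q, weights=None):
    """Minkowski distance of order 2."""
    return minkowski(p, q, 2, weights)


# Illustrative attribute vectors (not project data from the paper).
new_project = [3.0, 4.0]
old_project = [0.0, 0.0]
print(euclidean(new_project, old_project))  # 5.0
print(manhattan(new_project, old_project))  # 7.0
print(minkowski(new_project, old_project, 3))
```

Expressing Manhattan and Euclidean as special cases keeps the three measures consistent with each other. In practice, attributes on very different scales would usually be normalized first, which this sketch omits.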

Number of Selected Analogy
The number of selected analogies is the number of most similar projects used as analogues to model the software project effort estimate. There are two types of analogy selection: fixed and dynamic. Studies that adopt fixed analogy selection suggest using the one closest analogy (K = 1), the two closest analogies (K = {1, 2}), the three closest analogies (K = {1, 2, 3}), the four closest analogies (K = {1, 2, 3, 4}) or the five closest analogies (K = {1, 2, 3, 4, 5}). In this study, fixed analogy selection was chosen and all of these combinations were applied.
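Fixed analogy selection amounts to ranking the historical projects by distance and keeping the K closest. The sketch below assumes a simple dictionary layout and made-up project values; `select_analogies` is a hypothetical helper name.

```python
def manhattan(p, q):
    """Unweighted Manhattan distance between two numeric vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))


def select_analogies(new_features, history, k, distance=manhattan):
    """Return the k historical projects closest to the new project."""
    ranked = sorted(history, key=lambda proj: distance(new_features, proj["features"]))
    return ranked[:k]


# Illustrative history: features are [size, duration]; values are made up.
history = [
    {"id": "A", "features": [100.0, 4.0], "effort": 1200.0},
    {"id": "B", "features": [300.0, 6.0], "effort": 4100.0},
    {"id": "C", "features": [120.0, 5.0], "effort": 1500.0},
    {"id": "D", "features": [800.0, 9.0], "effort": 9800.0},
]

closest = select_analogies([110.0, 5.0], history, k=2)
print([proj["id"] for proj in closest])  # ['C', 'A']
```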

Analogy Adaptation
After determining the number of selected analogies, the next step is to predict the effort of the new project by applying a statistical technique to the selected projects. Four analogy adaptations are applied here: closest analogy (CA), mean of closest analogies, median of closest analogies, and inverse rank weighted mean (IRWM) of closest analogies.
Closest analogy means choosing the single closest project (K = 1). Mean of closest analogies is the adaptation obtained by averaging the effort of the K > 1 selected analogies. Median of closest analogies is the adaptation obtained by taking the median effort of the K > 2 selected analogies. Inverse rank weighted mean is the adaptation that gives higher weights to the more similar of the selected analogies. For example, if the four closest analogies are selected, the first closest analogy (CA) is given a weight of four, the second closest analogy (SC) a weight of three, the third closest analogy (TC) a weight of two and the fourth closest analogy (LA) a weight of one [26]. The calculation of the inverse rank weighted mean is formulated in (7).
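The four adaptations can be sketched in a few lines of Python. Each function takes the efforts of the selected analogies ordered from most to least similar. Normalizing the inverse rank weights by their sum is an assumption, since equation (7) is not reproduced here, and the effort values are illustrative.

```python
from statistics import mean, median


def closest_analogy(efforts):
    """Effort of the single most similar project (K = 1)."""
    return efforts[0]


def mean_of_analogies(efforts):
    """Unweighted mean effort of the selected analogies."""
    return mean(efforts)


def median_of_analogies(efforts):
    """Median effort of the selected analogies."""
    return median(efforts)


def inverse_rank_weighted_mean(efforts):
    """Weighted mean in which the closest analogy gets weight K and the farthest weight 1."""
    k = len(efforts)
    weights = range(k, 0, -1)  # K, K-1, ..., 1
    return sum(w * e for w, e in zip(weights, efforts)) / sum(weights)


# Illustrative efforts in hours, ordered from closest to farthest analogy.
efforts = [1000.0, 2000.0, 3000.0, 4000.0]
print(closest_analogy(efforts))             # 1000.0
print(mean_of_analogies(efforts))           # 2500.0
print(median_of_analogies(efforts))         # 2500.0
print(inverse_rank_weighted_mean(efforts))  # 2000.0
```

In this example the inverse rank weighted mean lies below the plain mean because the closest, lower-effort analogies carry the weights 4 and 3.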

Adaptation Rules
Adaptation rules are the last step in calculating the estimated effort of a new project from the most similar selected project. The calculation divides the effort of the old project by its size and then multiplies the result by the size of the new project. Equation (8) formulates this adaptation rule.
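As a worked example of this size-based rule, the following sketch scales the effort of an analogous project by the ratio of the two sizes. The function name and the numbers are illustrative assumptions.

```python
def adapt_effort(old_effort, old_size, new_size):
    """Scale the old project's effort by the ratio of new size to old size."""
    if old_size <= 0:
        raise ValueError("old_size must be positive")
    productivity = old_effort / old_size  # hours per function point
    return productivity * new_size


# Illustrative values: a 400 FP project that took 5000 hours,
# used as the analogue for a new 500 FP project.
print(adapt_effort(old_effort=5000.0, old_size=400.0, new_size=500.0))  # 6250.0
```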

Model Evaluation
The three evaluation criteria used in this study are Mean Magnitude of Relative Error (MMRE), Median Magnitude of Relative Error (MdMRE) and Pred(25). These three criteria, all derived from the Magnitude of Relative Error (MRE), are the most widely used to measure the accuracy of software project estimation models. MMRE is generated by calculating the average MRE over the projects in the data set. MMRE is one of the evaluation techniques used to assess the accuracy of the estimated effort.
MRE is a statistical technique used to measure the accuracy of project estimates. It is obtained by dividing the absolute value of $e$ minus $\hat{e}$ by $e$, that is, $MRE = |e - \hat{e}| / e$, as denoted in equation (9). In equation (9), $e$ is the actual effort of the project and $\hat{e}$ is its estimated effort obtained using (8). MMRE, as denoted by equation (10), is one of the accuracy measurements for software project estimation models and calculates the average of MRE. The accuracy of the estimation model is categorized as good if the MMRE is less than or equal to 0.25.
MdMRE, as denoted by equation (11), is an accuracy measurement of a software project estimation model that calculates the median of MRE. The accuracy of the estimation model is categorized as good if the MdMRE is less than or equal to 0.25.

$$MdMRE = \operatorname{median}(MRE) \tag{11}$$
Pred(25) is the proportion of projects whose MRE is less than or equal to 0.25, as denoted by equation (12). The accuracy of the estimation model is categorized as good if Pred(25) is greater than or equal to 0.75.
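The three criteria can be computed directly from paired actual and estimated efforts. The sketch below is a minimal illustration; the function names and sample values are assumptions, and Pred(25) is returned as a fraction between 0 and 1.

```python
from statistics import mean, median


def mre(actual, estimated):
    """Magnitude of relative error for one project."""
    return abs(actual - estimated) / actual


def mmre(actuals, estimates):
    """Mean MRE over all projects."""
    return mean([mre(a, e) for a, e in zip(actuals, estimates)])


def mdmre(actuals, estimates):
    """Median MRE over all projects."""
    return median([mre(a, e) for a, e in zip(actuals, estimates)])


def pred(actuals, estimates, threshold=0.25):
    """Fraction of projects whose MRE is at most the threshold."""
    errors = [mre(a, e) for a, e in zip(actuals, estimates)]
    return sum(1 for err in errors if err <= threshold) / len(errors)


# Illustrative efforts in hours (not taken from the Maxwell data set).
actuals = [100.0, 200.0, 400.0, 1000.0]
estimates = [110.0, 150.0, 400.0, 2000.0]
print(mmre(actuals, estimates))
print(mdmre(actuals, estimates))
print(pred(actuals, estimates))  # 0.75
```

The single estimate that is off by 100% pulls MMRE well above MdMRE in this example, which is why the median-based criterion is often reported alongside the mean.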

Data Set Description
The experiment uses Maxwell's data set, which consists of data on 62 banking software projects in Finland from 1985 to 1993 and has often been used in research related to software project estimation [27]-[29]. Maxwell's data set has 16 attributes [30]. To develop the estimation model, three attributes with a major influence on the project were chosen, namely Duration, Size and Effort. Duration is a numeric attribute that gives the duration of the project from the specification stage until delivery to the client, measured in months. Size is a numeric attribute that gives the size of a software project, measured in function points (FP). Effort is a numeric attribute that gives how long the software developers worked on a project from the specification stage until delivery to the client, measured in hours.
Five of the 62 projects were removed from the data set because they are considered outliers due to their very large values. These are the projects with ID numbers 62, 38, 26, 21 and 18. This follows the recommendation of [31], which states that outlier data need to be eliminated. Reference [24] also discarded data that were identified as outliers because of their very small values. Thus, of the 62 projects, 57 remain for use in the experiments. Descriptive statistics for the Size and Duration effort drivers and for the Effort attribute are shown in Table 1. The average project size is 478 function points, with an average duration of 5.6 months and an average effort of 5910.2 hours. The smallest project size is 48 FP and the largest is 1849 FP. The shortest project duration is one month and the longest is nine months, with a standard deviation of 2.2 months. The smallest effort is 583 hours and the largest is 25919 hours, with a standard deviation of 4968.8 hours.
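The outlier removal and the descriptive statistics can be sketched as follows. Only the five project IDs come from the text; the record layout, the field names and the placeholder values are assumptions, so the printed statistics are not those of Table 1.

```python
from statistics import mean, stdev

# Project IDs reported as outliers in the text.
OUTLIER_IDS = {18, 21, 26, 38, 62}


def remove_outliers(projects, outlier_ids=OUTLIER_IDS):
    """Drop the projects whose ID is listed as an outlier."""
    return [p for p in projects if p["id"] not in outlier_ids]


def describe(values):
    """Minimum, maximum, mean and sample standard deviation of a numeric list."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "std": stdev(values),
    }


# Placeholder records standing in for the 62 projects; the values are synthetic.
projects = [{"id": i, "size": 100.0 + 10.0 * i, "effort": 1000.0 + 50.0 * i}
            for i in range(1, 63)]

kept = remove_outliers(projects)
print(len(kept))  # 57
print(describe([p["size"] for p in kept]))
```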

Results and Discussion
The data set is randomly divided into training data and testing data, with proportions of 87% and 13% respectively. This division differs from that of [30], which used 50 projects from before 1992 as training data and 12 projects from 1992 and 1993 as testing data. Fig. 2 shows the framework for the cross-validation process.

Fig. 2. Framework for cross-validation process
The evaluation process uses a three-fold cross-validation technique, with the composition shown in Table 2.
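A generic three-fold split is sketched below for orientation. It partitions the project IDs evenly into three folds, which is an assumption: the composition in Table 2 is not reproduced here, and the 87%/13% proportions stated above suggest that the folds used in the study are not an even partition. The function name and seed are hypothetical.

```python
import random


def three_fold_splits(project_ids, folds=3, seed=0):
    """Yield (train_ids, test_ids) pairs, using each fold once as the test set."""
    ids = list(project_ids)
    random.Random(seed).shuffle(ids)  # fixed seed so the split is reproducible
    parts = [ids[i::folds] for i in range(folds)]
    for i in range(folds):
        test = parts[i]
        train = [pid for j, part in enumerate(parts) if j != i for pid in part]
        yield train, test


# 57 placeholder project IDs, matching the number of projects kept after outlier removal.
for train, test in three_fold_splits(range(1, 58)):
    print(len(train), len(test))  # 38 19 for each fold
```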

MMRE Results
With Manhattan distance, the lowest MMRE is 0.39, obtained with K = 2 using the mean of closest analogies and with K = 3 using IRWM. With Euclidean distance, the lowest MMRE is 0.44, obtained with K = 2 and K = 3 using IRWM. With Minkowski distance, the lowest MMRE is 0.42, obtained with K = 3 using IRWM. Manhattan distance therefore gives the best MMRE of the three distances. Table 3 shows the MMRE results for Manhattan, Euclidean and Minkowski distance.

Pred(25) Results
With Manhattan distance, the highest Pred(25) is 0.48, obtained with K = 2 using the mean of closest analogies and with K = 4 using the median of closest analogies. With Euclidean distance, the highest Pred(25) is 0.43, obtained with K = 2 using the mean of closest analogies and with K = 3 using IRWM and the median of closest analogies. With Minkowski distance, the highest Pred(25) is 0.48, obtained with K = 2 using the mean of closest analogies, as shown in Table 4. Manhattan distance is thus the most consistent, because two of its models, the mean and the median of closest analogies, reach the highest value. Although Minkowski reaches the same value as Manhattan, it does so only with the mean of closest analogies.

MdMRE Results
With Manhattan distance, the lowest MdMRE is 0.26, obtained with K = 3 using the mean of closest analogies. With Euclidean distance, the lowest MdMRE is 0.31, obtained with K = 3 using the median of closest analogies. With Minkowski distance, the lowest MdMRE is 0.30, obtained with K = 2 using IRWM. These scores show that Manhattan has the best MdMRE accuracy compared to Euclidean and Minkowski, as shown in Table 5.

Absolute Residual Results
Good estimation accuracy is directly related to the absolute residual (AR) value. The absolute residual is the absolute difference between the actual and the estimated effort. The smaller the absolute residual, the closer the estimate is to the actual effort. Table 6 shows the absolute residual scores for Euclidean, Manhattan and Minkowski distance using three-fold cross validation. Set 1 for Euclidean distance shows a highest AR of 16589 man-hours with an actual effort of 8710 man-hours, which indicates a wide difference between the actual effort and the estimate. The MRE of this project is 1.90, which means there is an error of 190% in the estimate relative to the actual effort. Set 2 shows a largest AR of 10725 man-hours with an actual effort of 5931 man-hours, which also indicates a wide difference between the actual effort and the estimate. The MRE of this project is 1.81, which means there is an error of 181% in the estimate relative to the actual effort. Set 3 shows a largest AR of 3430 man-hours with an actual effort of 11023 man-hours, which indicates a comparatively small difference between the actual effort and the estimate. The MRE of this project is 0.31, which means there is an error of 31% in the estimate relative to the actual effort. This means that the model in set 3 is the best estimation model with Euclidean distance.
Set 1 for Manhattan distance shows a highest AR of 9961.1 man-hours with an actual effort of 8710 man-hours, which indicates a very large difference between the actual effort and the estimate. The MRE of this project is 1.1, which means there is an error of 110% in the estimate relative to the actual effort. Set 2 shows a largest AR of 16228.5 man-hours with an actual effort of 5931 man-hours, which indicates a very large difference between the actual effort and the estimate. The MRE of this project is 2.7, which means there is an error of 270% in the estimate relative to the actual effort. Set 3 shows a largest AR of 4995.9 man-hours with an actual effort of 15052 man-hours, which indicates a comparatively small difference between the actual effort and the estimate. The MRE of this project is 0.3, which means there is an error of 30% in the estimate relative to the actual effort. The model in set 3 is the best estimation model with Manhattan distance.
Set 1 for Minkowski distance shows a highest AR of 15883.2 man-hours with an actual effort of 8710 man-hours, which indicates a wide difference between the actual effort and the estimate. The MRE of this project is 1.14, which means there is an error of 114% in the estimate relative to the actual effort. Set 2 shows a largest AR of 4671.2 man-hours with an actual effort of 5931 man-hours, which indicates a very large difference between the actual effort and the estimate. The MRE of this project is 2.74, which means there is an error of 274% in the estimate relative to the actual effort. Set 3 shows a largest AR of 5866.64 man-hours with an actual effort of 25910 man-hours, which indicates a very small difference between the actual effort and the estimate. The MRE of this project is 0.18, which means there is an error of 18% in the estimate relative to the actual effort. The model in set 3 is concluded to be the best estimation model with Minkowski distance.
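Because MRE is the absolute residual divided by the actual effort, the Euclidean figures quoted above can be checked with simple arithmetic. The sketch below uses the AR and actual effort values reported for the three sets; the function and variable names are illustrative.

```python
def absolute_residual(actual, estimated):
    """Absolute difference between actual and estimated effort."""
    return abs(actual - estimated)


def mre_from_residual(residual, actual):
    """MRE expressed through the absolute residual."""
    return residual / actual


# (largest AR, actual effort) in man-hours for Euclidean distance, as quoted in the text.
euclidean_worst_cases = {
    "set 1": (16589.0, 8710.0),
    "set 2": (10725.0, 5931.0),
    "set 3": (3430.0, 11023.0),
}

for name, (residual, actual) in euclidean_worst_cases.items():
    print(name, round(mre_from_residual(residual, actual), 2))
# set 1 1.9
# set 2 1.81
# set 3 0.31
```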

Model Comparison
The last stage compares the accuracy of the model using Manhattan distance with the research conducted by Idri [7]. As shown in Fig. 3, the MMRE, MdMRE and Pred(25) with Manhattan distance are 50%, 28% and 48% respectively, while Idri reports 49.9% for MMRE, 29.37% for MdMRE and 51.23% for Pred(25). Based on this comparison, the differences in MMRE, MdMRE and Pred(25) are very small. It can therefore be concluded that the Manhattan-based model and Idri's results have nearly the same accuracy.

Conclusion
Analogy-based estimation requires historical project data to serve as analogues. The accuracy of effort estimation depends heavily on the similarity of the historical project data. In addition to the quality of the data, other factors that affect the overall accuracy of the analogy method are the number of selected analogies, the distance measurement and the solution adaptation. The similarity to the project being estimated is the key to improving the accuracy of Analogy-based estimation. This paper proposes Analogy as a software effort estimation model and compares three distance measurements, namely Euclidean, Manhattan and Minkowski distance. The accuracy evaluation results for all three have been described in this article. The best results are obtained with Manhattan distance, with an MMRE of 50%, an MdMRE of 28% and a Pred(25) of 48%. These results are close to those observed by [18], where the mean accuracy of the analogy method is an MMRE of 49.9%, an MdMRE of 29.37% and a Pred(25) of 51.23%.
Table 2. Three-fold Cross Validation Technique

Table 3. Mean Magnitude of Relative Error Results

Table 6. Absolute Residual Results