Early stage software effort estimation using random forest technique based on use case points

BY: KARTIK JADHAV (D12B-36)
PRANAV GANORKAR (D12B-25)

1. Introduction

Object-oriented technology is a widely accepted methodology for software
development in major industries. Object-oriented programming offers various
features that play an important role in the software development process.
As modern software projects grow more complex, early and accurate effort
estimation during software development has become crucial. Currently used
effort estimation techniques such as function point analysis (FPA) and the
constructive cost model (COCOMO) fail to consistently estimate the cost and
effort required to develop software.

Random Forest Technique (RFT)

Due to the increasing complexity of software development activities, the need for
effective effort estimation techniques has arisen. Underestimation disrupts the
project's planned cost and delivery schedule, while overestimation causes bids to be
lost to competitors and financial losses in business. Effective software effort
estimation techniques enable project managers to schedule the software life cycle
activities appropriately. Correctly assessing the effort needed to develop a software
product is therefore a major concern in the software industry. The random forest (RF)
technique is a popular machine learning technique that improves predictions by
combining the outputs of many decision trees.

PROS AND CONS

The use case point (UCP) method relies on the use case diagram to estimate the
effort of a given software product. UCP helps provide more accurate effort
estimates from the design phase itself. The total number of UCPs is measured
by counting the use cases and actors and then multiplying each of them by its
corresponding complexity weight. Each use case and actor is classified into one
of three classes: simple, average or complex. The number of transactions in a
use case determines its complexity class.
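As a rough illustration, the unadjusted part of this count (UUCP, the weighted actors plus the weighted use cases) can be sketched in Python. The sketch assumes Karner's commonly cited weights (actors: simple 1, average 2, complex 3; use cases: simple 5, average 10, complex 15, where up to 3 transactions counts as simple and 4 to 7 as average). The paper's own weights may differ, and the function names are hypothetical.

```python
# Karner's commonly cited weights (assumed; the paper's own values may differ).
ACTOR_WEIGHTS = {"simple": 1, "average": 2, "complex": 3}
USE_CASE_WEIGHTS = {"simple": 5, "average": 10, "complex": 15}


def use_case_class(transactions: int) -> str:
    """Classify a use case by its number of transactions (assumed thresholds)."""
    if transactions <= 3:
        return "simple"
    if transactions <= 7:
        return "average"
    return "complex"


def unadjusted_ucp(actor_classes, use_case_transactions):
    """UUCP = weighted actor count (UAW) + weighted use case count (UUCW)."""
    uaw = sum(ACTOR_WEIGHTS[c] for c in actor_classes)
    uucw = sum(USE_CASE_WEIGHTS[use_case_class(t)] for t in use_case_transactions)
    return uaw + uucw


# Hypothetical system: four actors and four use cases (given as transaction counts).
actors = ["simple", "average", "complex", "complex"]
use_cases = [2, 5, 9, 4]
print(unadjusted_ucp(actors, use_cases))
```

Here the actors contribute 1 + 2 + 3 + 3 = 9 and the use cases 5 + 10 + 15 + 10 = 40, giving an unadjusted total of 49 before any technical or environmental adjustment.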

PROPOSED SYSTEM

Publicly available datasets such as Albrecht, COCOMO, Desharnais and NASA are
not applicable to the proposed work, because the size metric used in these
datasets is either source lines of code or function points rather than UCPs.
The proposed steps for software effort estimation are:
(i) Collection of software size, productivity, complexity and actual effort values
(ii) Normalisation of the dataset
(iii) Selection of an arbitrary random vector
(iv) Division of the dataset
(v) Selection of the final random vector
(vi) Performance evaluation
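Steps (ii) and (iv) can be sketched as below. This assumes column-wise min-max normalisation to [0, 1] and a shuffled 80/20 training/test division; neither choice is stated in the text above, and the sample rows are invented for illustration, not taken from Table 5. The names `min_max_normalise` and `divide` are hypothetical.

```python
import random


def min_max_normalise(rows):
    """Rescale every column of `rows` to the range [0, 1] (assumed scheme)."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def divide(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the dataset and cut it into training and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Invented rows: size (UCP), productivity, complexity, actual effort.
rows = [
    [100, 20, 1.0, 2000],
    [150, 25, 1.1, 3500],
    [200, 20, 0.9, 4200],
    [120, 15, 1.05, 2600],
    [180, 20, 1.0, 3900],
]
normalised = min_max_normalise(rows)
train, test = divide(normalised)
print(len(train), len(test))
```

Normalising before dividing follows the order of the listed steps; a column whose values are all equal is mapped to 0.0 to avoid division by zero.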

RF TECHNIQUE:
VARIABLE IMPORTANCE: The variable importance measures how much a variable
contributes to accurate prediction. It is calculated by taking into
consideration the variable's interaction with other variables. First, the error
rate of each tree T is calculated using its out-of-bag (OOB) data. Then, for each
variable v, the OOB values of v are randomly permuted and the error of each tree
is calculated again; the increase in error, averaged over all trees, gives the
importance of v. If the number of variables for implementing the RF technique is
very large, the forest can be run once with all the variables and then run again
using only the most important variables from the first run.
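The permutation procedure above can be sketched in plain Python. This is a toy random forest written only for illustration (bootstrap samples, `m_try` randomly chosen candidate variables per split, mean-valued leaves), not the implementation used in the paper. The names `fit_forest` and `oob_permutation_importance`, the tree depth, and the synthetic data are all assumptions. Importance is measured here as the average rise in each tree's OOB mean squared error after shuffling one variable's OOB values.

```python
import random
import statistics


def fit_tree(X, y, m_try, rng, depth=0, max_depth=4, min_leaf=2):
    """Grow a regression tree; each split tries m_try randomly chosen variables."""
    if depth >= max_depth or len(y) <= min_leaf or len(set(y)) == 1:
        return statistics.fmean(y)  # leaf: mean target of the cases reaching it
    best = None  # (sum of squared errors, variable, threshold)
    for v in rng.sample(range(len(X[0])), m_try):
        for t in sorted({row[v] for row in X})[:-1]:  # both sides stay non-empty
            left = [yi for row, yi in zip(X, y) if row[v] <= t]
            right = [yi for row, yi in zip(X, y) if row[v] > t]
            ml, mr = statistics.fmean(left), statistics.fmean(right)
            sse = sum((a - ml) ** 2 for a in left) + sum((a - mr) ** 2 for a in right)
            if best is None or sse < best[0]:
                best = (sse, v, t)
    if best is None:  # no sampled variable can split these cases
        return statistics.fmean(y)
    _, v, t = best
    go_left = [row[v] <= t for row in X]
    kids = []
    for side in (True, False):
        sub_X = [row for row, g in zip(X, go_left) if g == side]
        sub_y = [yi for yi, g in zip(y, go_left) if g == side]
        kids.append(fit_tree(sub_X, sub_y, m_try, rng, depth + 1, max_depth, min_leaf))
    return (v, t, kids[0], kids[1])


def predict(tree, row):
    """Walk from the root down to a leaf and return that leaf's mean."""
    while isinstance(tree, tuple):
        v, t, left, right = tree
        tree = left if row[v] <= t else right
    return tree


def fit_forest(X, y, n_trees=50, m_try=2, seed=0):
    """Bagged trees; each tree keeps the indices of its out-of-bag (OOB) cases."""
    rng = random.Random(seed)
    n = len(y)
    forest = []
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]
        oob = sorted(set(range(n)) - set(boot))
        tree = fit_tree([X[i] for i in boot], [y[i] for i in boot], m_try, rng)
        forest.append((tree, oob))
    return forest


def oob_permutation_importance(forest, X, y, seed=1):
    """Average rise in each tree's OOB squared error after permuting variable v."""
    rng = random.Random(seed)
    n_vars = len(X[0])
    rise = [0.0] * n_vars
    used = 0
    for tree, oob in forest:
        if len(oob) < 2:
            continue
        used += 1
        base = statistics.fmean((predict(tree, X[i]) - y[i]) ** 2 for i in oob)
        for v in range(n_vars):
            shuffled = [X[i][v] for i in oob]
            rng.shuffle(shuffled)
            permuted = statistics.fmean(
                (predict(tree, X[i][:v] + [s] + X[i][v + 1:]) - y[i]) ** 2
                for i, s in zip(oob, shuffled))
            rise[v] += permuted - base
    return [r / max(used, 1) for r in rise]


# Synthetic data (assumption): effort driven mainly by column 0 (size).
data_rng = random.Random(7)
X = [[float(s), float(s * 7 % 5), data_rng.random()] for s in range(1, 41)]
y = [10 * row[0] + row[1] for row in X]
importance = oob_permutation_importance(fit_forest(X, y), X, y)
print([round(v, 1) for v in importance])
```

Because effort in the synthetic data depends mainly on column 0, that column should receive by far the largest importance. Rows are kept as lists so that a permuted value can be spliced into a copy of each OOB row.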

2. LITERATURE SURVEY

The results generated by the RF-based model are then compared with those
obtained from the MLP (multilayer perceptron), RBFN (radial basis function
network), SGB (stochastic gradient boosting) and LLR techniques. The analysis
shows that the effort estimation model developed using the RF technique yields
a lower MMRE value and higher PRED values. Consequently, it can be inferred
that, on the considered dataset, the RF-based effort estimation model gives more
accurate results than the MLP-, RBFN-, SGB- and LLR-based effort estimation
models.

RF TECHNIQUE BASED COST ESTIMATION MODEL

Fig. 6 shows the outlier values generated by the RF technique for 120 training
cases. The outlier value depends on the proximity values generated by the RF
technique: the lower a case's proximity to the other cases, the higher its
outlier value, and vice versa. Fig. 6 also shows how far each outlier value
deviates from the mean outlier value. Training cases with higher outlier values
have predicted efforts that deviate more from their actual efforts. This
deviation is clearly visible in Figs. 7a and b, which display the final effort
estimation model obtained using the RF technique.
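The link between proximity and outlier value can be illustrated with the raw outlier measure described by Breiman and Cutler: proximity(i, k) is the fraction of trees in which cases i and k land in the same leaf, and the raw outlier value of case i is n divided by the sum of its squared proximities to the other cases. Treating all cases as one group (since this is regression rather than classification) is an assumption, the leaf assignments are made up, and the function names are hypothetical. The deviation from the mean outlier value is computed at the end.

```python
import statistics


def proximity(leaf_ids):
    """prox[i][k] = fraction of trees in which cases i and k share a leaf.

    leaf_ids[t][i] is the leaf reached by case i in tree t (hypothetical input
    that a fitted forest would supply)."""
    n_trees, n = len(leaf_ids), len(leaf_ids[0])
    prox = [[0.0] * n for _ in range(n)]
    for leaves in leaf_ids:
        for i in range(n):
            for k in range(n):
                if leaves[i] == leaves[k]:
                    prox[i][k] += 1 / n_trees
    return prox


def raw_outlier(prox):
    """Raw outlier value: n / sum of squared proximities to the other cases."""
    n = len(prox)
    scores = []
    for i in range(n):
        total = sum(prox[i][k] ** 2 for k in range(n) if k != i)
        scores.append(n / total if total > 0 else float("inf"))
    return scores


# Made-up leaf assignments: 3 trees, 4 training cases; case 3 rarely shares a leaf.
leaf_ids = [[0, 0, 0, 1], [0, 0, 1, 1], [0, 0, 0, 2]]
scores = raw_outlier(proximity(leaf_ids))
mean_score = statistics.fmean(scores)
deviation = [s - mean_score for s in scores]
print([round(s, 2) for s in scores], [round(d, 2) for d in deviation])
```

Case 3 shares a leaf with another case in only one tree, so its proximities are small and its outlier value is by far the largest, matching the inverse relationship described above.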

3. ANALYSIS

SGB also creates a tree ensemble and uses randomisation while creating the
trees. However, SGB builds its trees in series: each new tree is fitted to the
errors left by the trees before it, so the result of one tree feeds into the
next. RF, in contrast, builds its trees independently (in parallel) and combines
their predictions by voting (averaging, in the case of regression). With the
MMRE and PRED performance evaluation metrics, a lower MMRE value and a higher
PRED value signify a better result.
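The two metrics can be computed as follows, assuming the usual definitions: MRE = |actual - predicted| / actual for each project, MMRE = the mean MRE, and PRED(0.25) = the fraction of projects whose MRE is at most 0.25. The 0.25 level and the sample values are assumptions for illustration.

```python
def mre(actual, predicted):
    """Magnitude of relative error for one project."""
    return abs(actual - predicted) / actual


def mmre(actuals, predictions):
    """Mean MRE over all projects: lower is better."""
    errors = [mre(a, p) for a, p in zip(actuals, predictions)]
    return sum(errors) / len(errors)


def pred(actuals, predictions, level=0.25):
    """PRED(level): share of projects with MRE <= level; higher is better."""
    hits = sum(1 for a, p in zip(actuals, predictions) if mre(a, p) <= level)
    return hits / len(actuals)


actual = [100, 200, 400]    # hypothetical actual efforts
estimate = [110, 140, 400]  # hypothetical model estimates
print(mmre(actual, estimate), pred(actual, estimate))
```

Here the MREs are 0.10, 0.30 and 0.00, so MMRE is about 0.133 and PRED(0.25) is 2/3, since only the second project misses the 25% band.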

RF TECHNIQUE ANALYSIS

Table 5 provides a sample dataset of ten projects. These data are used in this
section to demonstrate the proposed steps for software effort estimation using
the RF model. The first column represents the software size, calculated as the
number of UCPs required to complete the project. The second column represents
the team productivity; the environmental factors (EFactors) listed in Table 4
contribute to the calculation of team efficiency and productivity. The third
column represents the project complexity; the technical factors (TFactors)
listed in Table 3 contribute to the calculation of the project's complexity.
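For reference, a minimal sketch of how TFactors and EFactors are commonly turned into adjustment factors in Karner's UCP formulation: TCF = 0.6 + 0.01 × ΣT and ECF = 1.4 - 0.03 × ΣE, where each sum weights the 0 to 5 factor ratings, and UCP = UUCP × TCF × ECF. The weights below are Karner's standard ones, assumed here because Tables 3 and 4 are not reproduced; the exact mapping from these factors to the productivity and complexity columns of Table 5 is not shown, and the function names are hypothetical.

```python
# Karner's standard factor weights (assumed; Tables 3 and 4 may differ).
T_WEIGHTS = [2, 1, 1, 1, 1, 0.5, 0.5, 2, 1, 1, 1, 1, 1]  # TFactors T1..T13
E_WEIGHTS = [1.5, 0.5, 1, 0.5, 1, 2, -1, -1]             # EFactors E1..E8


def tcf(ratings):
    """Technical complexity factor from 13 TFactor ratings (each 0..5)."""
    if len(ratings) != len(T_WEIGHTS):
        raise ValueError("expected 13 TFactor ratings")
    return 0.6 + 0.01 * sum(w * r for w, r in zip(T_WEIGHTS, ratings))


def ecf(ratings):
    """Environmental complexity factor from 8 EFactor ratings (each 0..5)."""
    if len(ratings) != len(E_WEIGHTS):
        raise ValueError("expected 8 EFactor ratings")
    return 1.4 - 0.03 * sum(w * r for w, r in zip(E_WEIGHTS, ratings))


def adjusted_ucp(uucp, t_ratings, e_ratings):
    """UCP = UUCP x TCF x ECF."""
    return uucp * tcf(t_ratings) * ecf(e_ratings)


# Hypothetical project rated "average" (3) on every factor.
print(tcf([3] * 13), ecf([3] * 8), adjusted_ucp(100, [3] * 13, [3] * 8))
```

With every factor rated 3, TCF is 1.02 and ECF is 0.995, so 100 unadjusted points become about 101.49 UCPs; strong team experience (high E1 to E6 ratings) lowers ECF, while hard requirements (high T ratings) raise TCF.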

4. Conclusion

Several methodologies have been proposed by researchers and practitioners
for software effort estimation. Among them, UCP is an effort estimation model
that is widely used because of its simplicity, speed and reasonable accuracy.
In this paper, the proposed model has been implemented using the RF technique.
The RF technique is an ensemble learning method for regression that combines
the results of many decision trees and usually gives better results than any
individual model. RF builds its trees in parallel and combines their
predictions by voting (averaging, in the case of regression). Hence,
implementing the UCP approach with the RF technique leads to better software
effort estimation.
