
Applying Adaptive Software Development (ASD) Agile Modeling on Predictive Data Mining Applications: ASD-DM Methodology

Mouhib Alnoukari, Arab International University, Damascus, Syria, m-noukari@aeu.ac.sy
Zaidoun Alzoabi, Arab Academy for Banking and Financial Sciences, Damascus, Syria, z-zoabi@aeu.ac.sy
Saiid Hanna, Arab International University, Damascus, Syria, s-hanna@aeu.ac.s

Abstract

As the world becomes increasingly dynamic, traditional static modeling may not be able to cope with it. One solution is to use agile modeling, which is characterized by flexibility and adaptability. Data mining applications, in turn, require a greater diversity of technology, business skills, and knowledge than typical applications, which means they may benefit considerably from the features of agile software development. In this paper, we propose a framework named ASD-DM, based on Adaptive Software Development (ASD), that can easily adapt to predictive data mining applications. A case study in the automotive manufacturing domain is presented to evaluate the ASD-DM methodology.

1. Introduction

Data mining is the search for relationships and distinct patterns that exist in datasets but are "hidden" among the vast amount of data. A data mining task involves determining useful patterns from collected data or determining a model that best fits the collected data.

Although the idea of applying data mining techniques to software engineering data has existed since the mid 1990s, only lately has it attracted a large amount of interest within software engineering [1]. Data mining techniques are applied to analyze the problems raised during the life cycle of a software development project [3, 7], and to determine whether two software components are related [16]. They have also been used for software maintenance [2, 9], software testing [15], software reliability analysis [8, 15], and software quality [6].

Many questions arise when trying to apply data mining techniques to the software engineering (SE) field. What types of SE data are available to be mined? Which SE tasks can be supported using data mining? How are data mining techniques used in SE? These are all important questions that many researchers have tried to answer. On the other hand, few researchers have tried to find the SE process models that best fit data mining applications.

In this paper, we focus on using agile modeling for predictive data mining applications, and in particular on ASD (Adaptive Software Development) modeling, which replaces the static Plan-Design-Build lifecycle with the Speculate-Collaborate-Learn lifecycle. The main characteristics of the ASD lifecycle are continuous learning, intense collaboration among developers, testers, and customers, and easy adaptation to an uncertain future [17].

We start by reviewing the characteristics of data mining applications and the most widely used methodology for process modeling of data mining applications (the CRISP-DM methodology). We then present the characteristics of agile modeling and suggest a new framework named ASD-DM for data mining processes using the Adaptive Software Development (ASD) method. The new framework was tested using a case study in the automobile manufacturing domain.

2. What Characterizes Data Mining Applications?

Data mining applications are characterized by the ability to deal with the explosion of business data and accelerated market changes. These characteristics help provide powerful tools for decision makers. Such tools can be used by business users (not only PhDs or statisticians) to analyze huge amounts of data for patterns and trends [19].

The most widely used methodology when applying data mining processes is named CRISP-DM (CRoss Industry Standard Process for Data Mining). It was one of the first attempts towards standardizing data
mining process modeling [18]. CRISP-DM has six main phases. It starts with business understanding, which helps convert the knowledge about the project objectives and requirements into a data mining problem definition. This is followed by data understanding, which covers activities such as initial data collection, identifying data quality problems, and other preliminary activities that help users become familiar with the data.

The next important step is data preparation, which covers the activities that convert the initial raw data into data that can be fed into the modeling phase. This phase includes tasks such as data cleansing and data transformation. Modeling is the core phase. It can use a number of algorithmic techniques (decision trees, rule learning, neural networks, linear/logistic regression, association learning, instance-based/nearest-neighbor learning, unsupervised learning, probabilistic learning, etc.) available for each data mining approach, with features that must be weighed against data characteristics and additional business requirements.

The final two phases focus on the evaluation of model results and the deployment of the models into production. Hence, users must decide what results they wish to disseminate or deploy and how, and how they integrate data mining into their overall business strategy [18, 19].

3. Applying ASD method on predictive data mining applications: ASD-DM Methodology

Software is intangible and more easily adapted than a physical product. Also, software processes depend on how a firm competes, and may be more adaptable than manufacturing processes bound by machinery, raw materials, and physical plants.

Technologies such as agile methods may make it less costly to customize and adapt development processes. Agile modeling has many process-centric software management methods, such as Adaptive Software Development (ASD), Extreme Programming (XP), Lean Development, SCRUM, and Crystal Light methods.

Adaptive approaches fit best when requirements are uncertain or volatile, which can happen due to business dynamism and rapidly evolving markets. It is difficult to practice traditional methodologies in such unstable, evolving markets [11]. ASD modeling is one such adaptive approach. It replaces the static Plan-Design-Build lifecycle with the dynamic Speculate-Collaborate-Learn life cycle. Speculation recognizes the uncertain nature of complex problems such as predictive data mining, and encourages exploration and experimentation. Predictive data mining problems require a huge volume of information to be collected, analyzed, and applied. They also require advanced knowledge and greater business skills than typical problems, which calls for "Collaboration" among different stakeholders in order to improve their decision-making ability. That decision-making ability depends on the "Learning" component, which tests the knowledge raised by practice iteratively after each cycle, rather than waiting until the end of the project. Learning organizations can adapt more easily to the ASD life cycle [17].

Hence the core of ASD is the premise that outcomes are naturally unpredictable and that planning is therefore a paradox. It is very difficult to plan successfully in a fast-moving and unpredictable business environment, which is one of the main characteristics of predictive data mining applications. This is one of the major points that we use to create a data mining process framework based on the ASD methodology (Figure 1). We call our new methodology ASD-DM, as it combines the characteristics of the ASD methodology with the steps of a predictive data mining solution.

The Speculation phase includes business and data understanding, and data preparation including ETL (Extract/Transform/Load) operations. This phase is the most important one, as it takes considerable time and resources. It ends with the creation of the enterprise data warehouse and the required data marts and cubes.

The Collaboration phase ensures close communication among a diverse group of experienced stakeholders in order to choose the best modeling algorithm for the predictive data mining process. Testing and evaluation of such algorithms occur in the "Learning" phase, and the results are discussed among the members of the project team. If the results are acceptable, a new release can be deployed in the form of predictive scoring reports. Otherwise, a new collaboration phase is started in order to choose a better data mining algorithm.

The cyclic nature of the whole framework allows it to respond to dynamic business changes: new data sources can be added in the preparation phase, and the cycle runs again.
[Figure 1 diagram: a cycle of three phases. Speculation (business understanding, data understanding, data preparation/ETL) yields the data warehouse, data marts, cubes, and aggregations. Collaboration (modeling) yields the predictive algorithm. Learning (implementation/testing, evaluation) yields new scoring reports.]

Figure 1: ASD-DM, a predictive data mining process framework based on ASD methodology.
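
The Speculate-Collaborate-Learn cycle of ASD-DM can be sketched as a simple control loop. The Python sketch below is only an illustration under stated assumptions, not the implementation used in the case study: the names `run_asd_dm`, `prepare_data`, `fit_and_score`, `CycleResult`, and `max_error` are hypothetical, the team discussion is reduced to an ordered list of candidate models, and "acceptable results" is reduced to a numeric error threshold.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, TypeVar

Data = TypeVar("Data")


@dataclass
class CycleResult:
    model_name: str   # candidate tried in this Collaborate/Learn cycle
    error: float      # evaluation error reported by the Learning phase
    accepted: bool    # True if the error met the agreed threshold


def run_asd_dm(
    prepare_data: Callable[[], Data],
    candidates: Sequence[str],
    fit_and_score: Callable[[str, Data], float],
    max_error: float,
) -> List[CycleResult]:
    """Run Collaborate/Learn cycles until a candidate model is acceptable.

    All parameter names are hypothetical placeholders for illustration.
    """
    # Speculation: business/data understanding and ETL produce the prepared data.
    data = prepare_data()
    history: List[CycleResult] = []
    for name in candidates:
        # Collaboration: the team agrees on the next modeling algorithm to try.
        # Learning: the model is built, tested, and evaluated.
        error = fit_and_score(name, data)
        accepted = error <= max_error
        history.append(CycleResult(name, error, accepted))
        if accepted:
            # Acceptable results: a release (scoring reports) could be deployed.
            break
    return history


if __name__ == "__main__":
    # Made-up error values, for illustration only.
    made_up_errors = {"model_a": 0.30, "model_b": 0.05}
    history = run_asd_dm(
        prepare_data=lambda: None,
        candidates=["model_a", "model_b"],
        fit_and_score=lambda name, _data: made_up_errors[name],
        max_error=0.10,
    )
    for result in history:
        print(result)
```

Returning the full history rather than only the accepted model keeps a record of each cycle, which matches the framework's emphasis on learning from every iteration.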

4. A Data Mining Case Study in Automotive Manufacturing Domain

Automotive manufacturing is a market where the manufacturer does not interact with the consumer directly, yet a fundamental understanding of the market, the trends, the moods, and the changing consumer tastes and preferences is essential to staying competitive.

The information gathered in order to produce the automotive data mining solution is the following [22]:

• Supply chain process (sales, inventory, orders, production plan).
• Manufacturing information (car configurations/packages/options codes and description).
• Marketing information (dealers, business centers, etc.).
• Customers' trends information (website web-activities).

An enterprise data warehouse is built to hold web data, inventory data, car demand data, and sales data in order to better analyze and predict car sales, manage car inventory, and plan car production. Sales and marketing managers are interested in better leveraging data in support of the enterprise goals and objectives. Managers envision an analytic environment that will improve their ability to support planning and inventory management, incentives management, and ultimately production planning, and that will enable them to meet the expectations of a decision-making process supported by appropriate data and trends.

Regardless of functional boundaries and the type of analysis needed, their requirements focus on improving access to detailed data and on more consistent and more integrated information.

A data warehouse that combines online and offline behavioral data for decision-making purposes is a strategic tool that business users can leverage to improve sales demand forecasting, improve model/trim level mix planning, adjust body model/trim level mix with inventory data, and reduce days on lot.
[Figure 2 plot: "LR Overall SEB ZZ: Sales & Prediction". Predicted and actual sales (Value, 0 to 4000) by Week/Month across the Train, Test, and Eval periods.]

SET     MAPE
Train   8.4 %
Test    18.6 %
Eval    15.5 %

Figure 2: Prediction results using neural networks (Sebring model family).
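
The error figures reported per subset are MAPE values. The paper does not spell out the acronym or the formula; assuming the usual definition (mean absolute percentage error, the average of |actual - predicted| / |actual| expressed in percent), a minimal Python sketch of the computation looks as follows. The function name `mape` and the sample numbers are illustrative and are not taken from the case study data.

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (assumed standard definition)."""
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    # Average the relative absolute errors, then scale to percent.
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


if __name__ == "__main__":
    # Made-up weekly sales and predictions, for illustration only.
    actual_sales = [2000.0, 2500.0, 1600.0, 2000.0]
    predicted_sales = [1800.0, 2750.0, 1600.0, 2100.0]
    print(f"MAPE = {mape(actual_sales, predicted_sales):.1f} %")
```

Computing the measure separately on the training, test, and evaluation subsets, as in the table above, shows whether a model that fits the training period well also holds up on later weeks.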

The main goal of this data mining solution was to get some initial positive prediction results and to measure the prediction score of different data sources using the findings of correlation studies.

Using our proposed ASD-DM methodology, the enterprise data warehouse was created as a result of the speculation phase, and the ETL package was defined and developed.

The collaboration phase was one of the most important phases, as it needed a lot of discussion and intensive collaborative team work. The method needed to model our prediction solution was not specified in advance. The main goal was to test and evaluate candidate solutions and find the one that gives the most accurate prediction results on a weekly basis for the next 4 weeks.

We started by using neural networks to get the first set of prediction results. The training data subset ran from April 2002 till June 2003. The test subset ran from July 2003 till September 2003, and the evaluation subset ran from October 2003 till November 2003. Fig. 2 shows the overall results for the Sebring model family.

After evaluating the first prediction method, we tried another method based on linear regression without using separated weeks. Fig. 3 shows the results obtained:

[Figure 3 plot: "LR SEB ZZ: Sales & Prediction". Predicted and actual sales (Total, 0 to 4000) by Week/Month across the TRAIN, TEST, and EVAL periods.]

SET     MAPE
Train   17.3 %
Test    6.6 %
Eval    21.7 %

Figure 3: Prediction results using linear regression (Sebring model family).

This method provided more accurate results for the first week, but the predictions for the following weeks were unacceptable.

Another collaboration cycle was launched, and the team adopted a new model named ANFIS (Adaptive Neuro-Fuzzy Inference System). This method combines fuzzy logic and neural networks: values are clustered in fuzzy sets, membership functions are estimated during training, and neural networks are used to estimate the weights. The results obtained were more accurate, and this method was adopted in our solution, as the MAPE errors do not exceed 10%.

5. Conclusion

In this paper, we explained the use of data mining techniques in software engineering tasks such as programming, testing, maintenance, reliability, and quality. Due to the uncertain nature of predictive data mining application requirements, we proposed a new framework, ASD-DM, based on an agile methodology, specifically the Adaptive Software Development (ASD) methodology, and on the CRISP-DM data mining processes. This framework ensures continuous learning and intense collaboration among developers, testers, and data mining customers.

6. References

[1] Hassan A. E., Mockus A., Holt R. C., and Johnson P. M., “Guest editor’s introduction: Special issue on mining software repositories”, IEEE Trans. Softw. Eng., 31(6):426–428, 2005.
[2] Riquelme J. C., Polo M., Aguilar-Ruiz J. S., Piattini M., Francisco J. and Francisco Ruiz F. T., “A Comparison of Effort Estimation Methods for 4GL Programs: Experiences with Statistics and Data Mining”, International Journal of Software Engineering and Knowledge Engineering, Vol. 16, No. 1 (2006) 127-140.
[3] Nayak R., Qiu T., “A Data Mining Application: Analysis of Problems Occurring During A Software Project Development Process”, International Journal of Software Engineering and Knowledge Engineering, Vol. 15, No. 4 (2005) 647-663.
[4] Slaughter S. A., Levine L., Ramesh B., Pries-Heje J., and Baskerville R., “Aligning Software Processes with Strategy”, MIS Quarterly, Vol. 30, No. 4, pp. 891-918, December 2006.
[5] Giraud-Carrier C. and Povel O., “Characterizing Data Mining Software”, Intelligent Data Analysis 7 (2003) 181–192, IOS Press.
[6] Khoshgoftaar T. M., Allen E. B., Jones W. D., Hudepohl J. P., “Data Mining for Predictors of Software Quality”, International Journal of Software Engineering and Knowledge Engineering, Vol. 9, No. 5 (1999) 547-563.
[7] Alvarez-Macías J. L. and Mata-Vázquez J., “Data Mining for the Management of Software Development Process”, International Journal of Software Engineering and Knowledge Engineering, Vol. 14, No. 6 (2004) 665–695.
[8] Last M., Friedman M., and Kandel A., “Using Data Mining for Automated Software Testing”, International Journal of Software Engineering and Knowledge Engineering, Vol. 14, No. 4 (2004) 369-393.
[9] Mattsson M. K., Chapin N., “Data Mining for Validation in Software Engineering: an Example”, International Journal of Software Engineering and Knowledge Engineering, Vol. 14, No. 4 (2004) 407-427.
[10] Chen L., Sakaguchi T., Frolick M. N., “Data Mining Methods, Application, and Tools”, Information Systems Management, Winter 2000.
[11] Sinha A. P., and May J. H., “Evaluating and Tuning Predictive Data Mining Models Using Receiver Operating Characteristic Curves”, Journal of Management Information Systems, Winter 2004-5, Vol. 21, No. 3, pp. 249-280.
[12] Rupnik R., Kukar M., and Krisper M., “Integrating data mining and decision support through data mining based decision support system”, Journal of Computer Information Systems, Spring 2007.
[13] Maqbool O., Babri H. A., Karim A. and Sarwar M., “Metarule-guided association rule mining for program understanding”, IEE Proc.-Softw., Vol. 152, No. 6, December 2005.
[14] Madsen H., and Thyregod P., “On Using Soft Computing Techniques in Software Reliability Engineering”, International Journal of Reliability, Quality and Safety Engineering, Vol. 13, No. 1 (2006) 61–72.
[15] Watkins A., Hufnagel E. M., Berndt D. and Johnson L., “Using Genetic Algorithms and Decision Tree Induction to Classify Software Failures”, International Journal of Software Engineering and Knowledge Engineering, Vol. 16, No. 2 (2006) 269-291.
[16] Yan X., Zhang C. and Zhang H., “Identifying Software Component Association with Genetic Algorithm”, International Journal of Software Engineering and Knowledge Engineering, Vol. 14, No. 4 (2004) 441-447.
[17] Highsmith J., “Retiring Lifecycle Dinosaurs: Using Adaptive Software Development to Meet the Challenges of a High-Speed, High-Change Environment”, Software Testing & Quality Engineering (STQE) magazine, July/August 2000.
[18] C. Talbot, “Conference Review”, CRISP-DM Special Interest Group 4th Workshop, March 18th 1999, Brussels, Belgium.
[19] K. Bauer, “Predictive Analytics: Data Mining with A Twist”, DM Review Journal, December 2005.
[20] R. Ghani, and C. Soares, “Data Mining for Business Applications”, KDD-2006 Workshop, Volume 8, Issue 2, page 79-81.
[21] S. S. Abdullah, M. Holcombe, and M. Gheorge, “The Impact of an Agile Methodology on the Well Being of Development Teams”, Empir Software Eng (2006) 11, Springer, page 143-167.
[22] M. Alnoukari, W. Alhussan, “Using Data Mining Techniques for Predicting Future Car market Demand”, International Conference on Information & Communication Technologies: From Theory to Applications, IEEE Conference, Syria, 2008.
