2394-9007
Vol. V, No. I, February 2018 www.ijrtonline.org
Abstract— Data farming is an emerging field of research because decision making is an important issue in the competitive business environment. The appropriateness of the data and the cost of collecting it are the central concerns of data farming, and reducing the cost and time consumed by data collection is significant. Where only a narrow data set is available, we can farm the data and then apply a data mining algorithm to extract useful knowledge.
We propose an algorithm for the data farming steps of data plantation and harvesting. We farm sufficient data from the small amount of available seed data by applying the proposed algorithm. The J48 classification results for the farmed data are better than those for the seed data, which indicates that the proposed data farming algorithm has produced effective data.
In this paper, we present an algorithm for data farming that farms data from the seed data subject to a predefined error threshold rate. The proposed algorithm has been implemented, and the farmed datasets are verified for classification accuracy with the WEKA open source data mining tool.
Keywords: data farming, J48, HPC
Manuscript received on February, 2018.
Abhilash Raghuwanshi, Research Scholar, Department of Computer Science & Engineering, Sri Satya Sai College of Technology, Bhopal, M.P., India.
Prof. Gagan Sharma, Asst. Professor, Department of Computer Science & Engineering, Sri Satya Sai College of Technology, Bhopal, M.P., India.
Prof. Rajesh Sharma, Asst. Professor, Department of Computer Science & Engineering, Sri Satya Sai College of Technology, Bhopal, M.P., India.
I. INTRODUCTION
Adequate data is required for decision making on the basis of knowledge extracted by the data mining process. Data collection is a crucial process, and often the available data is not adequate for mining. In that case, data cleaning, data reduction, selection, and data farming techniques are applied to obtain adequate data. Once adequate data is available, mining algorithms can be applied to extract more accurate and useful information than the original data would yield.
Methodologies and tools are needed for determining the most appropriate data at an acceptable cost. Experience shows that the data farming effort often outweighs the data mining task, especially in industry. One of the major reasons is that industrial data is collected for reasons other than decision making. Sometimes the collected data covers a wide range of features that go beyond traditional models.
A lack of analysis tools and narrow or insufficient awareness of data mining and data farming tools increase the data collection cost. Data farming enhances the data on hand and also determines the most relevant data to be collected. It involves determining the features for which data should be collected when a dataset is absent or only partially available. This issue is important because interest in data mining has grown exponentially over the last decade. The process and methods used to determine the most suitable features for data collection and subsequent data analysis are referred to as data farming. Feature extraction is a push approach, in which the selected features determine the quality of knowledge, while data farming pulls the data necessary for knowledge extraction.
The appropriateness of features can be expressed with different measures, such as classification accuracy and missing values, which may vary across application areas. Two approaches to data farming are surveyed in the available literature, as follows.
One approach is used in high performance computing (HPC) environments, where thousands of iterations are executed in parallel. The development of novel data mining applications calls for the definition of appropriate features and data collection at minimal cost. Data farming is a process for obtaining data at minimum cost, and it is an iterative team process. The data farming approach for HPC is presented as a set of embedded loops. This process normally requires input and participation by subject matter experts, modelers, analysts, and decision-makers.
The “Scenario Creation” loop, shown on the left side of the figure, involves developing and honing a model that adequately represents the system and addresses the question being asked by the decision-maker. This is an iterative process that often requires honing the question as well. The “Scenario Run Space Execution” loop is entered once the base case of the scenario is complete. In this loop the team defines a study to determine which scenario input parameters should be examined and what processes should be used to vary them. Here the team explores the possible variations, or excursions, of the base case in the initial conditions of the scenario.
II. RELATED WORK
This section discusses the surveyed literature, identified by author name and reference number.
This paper presents a review of data farming techniques. Prediction and classification are critical subjects in data mining and machine learning and are broadly used in many fields. The farming time required depends heavily on the number of instances to be farmed and only lightly on the number of seed data and the error threshold. Correctly classified instances (CCI) and the kappa statistic (KS) increase, while incorrectly classified instances (ICI), mean absolute error (MAE), root mean squared error (RMSE), relative absolute error (RAE), and root relative squared error (RRSE) decrease for the farmed data when compared to the original dataset and the sample datasets.
There are a few possible extensions to MAE. At present, RAE uses a support-confidence framework to find frequent itemsets and produce classification rules.
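The evaluation measures named above can be made concrete with a small sketch. The Python code below is illustrative only and is not the evaluation code used in this work. The function names `classification_summary` and `error_summary` are hypothetical. The error measures follow the standard textbook definitions on numeric values; WEKA, to our understanding, derives them for nominal classes from the predicted class probabilities, so the numbers it reports can differ from this sketch.

```python
import math
from collections import Counter


def classification_summary(actual, predicted):
    """Count correct/incorrect predictions and compute Cohen's kappa.

    Hypothetical helper for illustration; labels can be any hashable values.
    """
    n = len(actual)
    correct = sum(a == p for a, p in zip(actual, predicted))
    observed = correct / n  # observed agreement

    # Agreement expected by chance, from the marginal label frequencies.
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    expected = sum(
        actual_counts[label] * predicted_counts[label] for label in actual_counts
    ) / (n * n)

    # Kappa is undefined when chance agreement is already perfect.
    if expected == 1.0:
        kappa = float("nan")
    else:
        kappa = (observed - expected) / (1.0 - expected)
    return {"cci": correct, "ici": n - correct, "kappa": kappa}


def error_summary(actual, predicted):
    """Compute MAE, RMSE, RAE and RRSE for numeric targets.

    RAE and RRSE are relative to a baseline that always predicts the mean
    of the actual values; they are returned as fractions, not percentages.
    Assumes the actual values are not all identical.
    """
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    base_abs = sum(abs(a - mean_actual) for a in actual)
    base_sq = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "mae": abs_err / n,
        "rmse": math.sqrt(sq_err / n),
        "rae": abs_err / base_abs,
        "rrse": math.sqrt(sq_err / base_sq),
    }
```

Under these definitions, a farmed dataset that improves on the seed data would show a higher `cci` and `kappa` and lower values for the four error measures, which is the pattern reported above.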