
INTERNATIONAL JOURNAL OF RESEARCH IN TECHNOLOGY (IJRT) ISSN No. 2394-9007
Vol. V, No. I, February 2018 www.ijrtonline.org

A Review of Medical Data based on Data Farming Techniques
Abhilash Raghuwanshi, Gagan Sharma, Rajesh Sharma

Abstract: Data farming is an emerging field of research because decision making is an important issue in the competitive business environment. The appropriateness of the data and the data collection cost are the goals of data farming, and in the current scenario it is important to reduce both the cost and the time consumed in data collection. Where only a narrow data set is available, we can farm the data and then apply a data mining algorithm to extract useful knowledge.
We propose an algorithm for the data plantation and harvesting steps of data farming. We farm sufficient data from the small amount of available seed data by applying the proposed algorithm. The J48 classification results for the farmed data are better than those for the seed data, which indicates that the proposed data farming algorithm has produced effective data.
In this paper, we present an algorithm for data farming that farms data from the seed data subject to a predefined error threshold rate. The farmed datasets produced by the proposed algorithm are verified for classification accuracy with the WEKA open source data mining tool.

Keywords: data farming, J48, HPC

Manuscript received in February 2018.
Abhilash Raghuwanshi, Research Scholar, Department of Computer Science & Engineering, Sri Satya Sai College of Technology, Bhopal, M.P., India.
Prof. Gagan Sharma, Asst. Professor, Department of Computer Science & Engineering, Sri Satya Sai College of Technology, Bhopal, M.P., India.
Prof. Rajesh Sharma, Asst. Professor, Department of Computer Science & Engineering, Sri Satya Sai College of Technology, Bhopal, M.P., India.

I. INTRODUCTION
Adequate data is required for decision making on the basis of knowledge extracted by the data mining process. Data collection is therefore a crucial process, but the data is often not adequate for mining. In that case, data cleaning, data reduction, selection and data farming techniques are applied to obtain adequate data. With adequate data in hand, mining algorithms can be applied to extract more accurate and useful information than the former data allowed.
Methodologies and tools are needed for determining the most appropriate data at an acceptable cost. Previous experience shows that the data farming effort often outweighs the data mining task, especially in industry. One of the major reasons is that industrial data is collected for reasons other than decision making. Sometimes the collected data covers a wide range of features that go beyond traditional models.
A lack of analysis tools and narrow or insufficient awareness of data mining and data farming tools increase the data collection cost. Data farming enhances the data on hand and also determines the most relevant data to be collected. It addresses the problem of determining the features for which data should be collected when a dataset is absent or only partially available. This issue is important because interest in data mining has grown exponentially over the last decade. The process and methods used to determine the most suitable features for data collection and subsequent data analysis are referred to as data farming. Feature extraction is a push approach, in which the selected features determine the quality of knowledge, while data farming pulls the data necessary for knowledge extraction.
The appropriateness of features can be expressed with different measures, such as classification accuracy and missing values, which may vary between application areas. Two approaches to data farming are surveyed in the available literature.
The first approach is used in high performance computing (HPC) environments, where thousands of iterations are executed in parallel. Development of novel data mining applications calls for the definition of appropriate features and for data collection at minimal cost; data farming is a process for obtaining data at minimum cost. Data farming is an iterative team process. The data farming approach for HPC is presented as a set of embedded loops. This process normally requires input and participation by subject matter experts, modelers, analysts, and decision-makers.
The “Scenario Creation” loop shown on the left side of the figure involves developing and honing a model that adequately represents the system that addresses the question being asked by the decision-maker. This is an iterative process that often requires honing the question as well. The “Scenario Run Space Execution” loop is entered once the base case of the scenario is complete. In this loop the team defines a study to determine which scenario input parameters should be examined and what processes should be used to vary them. Here the team explores the possible variations, or excursions, of the base case in the initial conditions of the scenario.

II. RELATED WORK
In this section we discuss the surveyed literature, identified by author name and reference number.

Impact Factor: 4.012 35


Published under
Asian Research & Training Publication
ISO 9001:2015 Certified
Data farming was first developed and used at the Marine Corps Combat Development Command in late 1997. It has since provided a never-ending opportunity to explore the questions being asked. The ideas behind data farming had been developed much earlier, but Dr. Horne introduced them to the defence community in 1997, in concert with the combination of agent-based models and high performance computing that marked the start of Project Albert. Project Albert was a congressionally funded modelling and simulation initiative of the United States Marine Corps (USMC).
In 1998, Dr. Alfred Brandstein and Dr. Gary Horne published the concept of fertilizing, cultivating, planting and harvesting as the steps of growing data [8]. They also discussed crop rotation and JWARS. Data farming came to the attention of the simulation community in 1999 at the Winter Simulation Conference in Phoenix.
The international community drove the German mission through other methodological developments such as experimental designs, model developments (MANA, Pythagoras, PAX, etc.), the application of agent-based development environments (NetLogo, REPAST, MASON, etc.) and evaluation and analysis tools of many types [15]. In the international workshops, the availability of experts and the free and open sharing of information led to great success in Germany in the application of the tools.

III. DATA FARMING ARCHITECTURE
This approach is the one most commonly used for the data farming process. It has four steps, analogous to the agricultural farming process [15]. Each step has a distinct and necessary function. Data fertilization is essential to make the seed data more productive. Cultivation is required to create the environment for plantation. Plantation is the core step in any farming process, and harvesting is an obligatory step for collecting the farmed data.

Fig. 1: Data Farming Approach for Single Processor

Fertilization initiates the data farming cycle. Data fertilization is required to make the available seed data more productive. Fertilized seed data can be obtained by filling in missing data values in the sample, by data selection/reduction, or by applying prediction to some attributes. If the seed dataset has missing values, these must be filled first.

A. Cultivation
The second step of data farming is cultivation. Cultivation is responsible for developing the scenario for data plantation: it is the procedure that prepares the environment for the seed and pre-initializes the plantation method. Cultivation is highly application dependent, i.e. different algorithmic procedures are followed depending on the application. For example, sometimes statistical methods are used for cultivation, and sometimes the range of the seed dataset is used to create the environment for plantation.

B. Plantation
Plantation is an essential step in farming. Planting the available fertilized seed in the cultivated environment is known as plantation. In the terminology of data farming, generating farmed data from fertilized seed data in the cultivated environment is called data plantation. This is the step that actually performs the data farming: the algorithm for growing data is run in this step.

C. Harvesting
Harvesting is the last phase of the farming cycle. Harvesting is the process of collecting the crops; similarly, data harvesting is the process of collecting the farmed dataset.
Data farming is a cyclic procedure comprising four steps: data fertilization, data cultivation, data plantation and data harvesting. In data fertilization, the available data is fertilized, for example by filling in missing values. In data cultivation, the available seed is prepared for plantation. In data plantation, the seed data is planted to grow more data using suitable algorithms. In data harvesting, the farmed data (the crop) is collected and stored permanently. The farmed data can be used either for data mining or as seed data for the next iteration of data farming.

IV. PROBLEM STATEMENT
The objective of data farming is to improve mining accuracy while reducing the data collection cost. Classification accuracy, cluster density, and rule support or confidence are measures of data mining results, and data farming is used to improve the results in terms of these performance measures. The goals of data farming are given below.
• Maximize the performance measure (e.g., classification accuracy, cluster density, rule support and confidence)
• Minimize or reduce the data collection cost
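
The fertilization, cultivation, plantation and harvesting cycle described in Section III can be illustrated with a minimal Python sketch. It assumes numeric features only and omits class labels. The function names, the mean imputation, the range-based environment and the `jitter` parameter are all illustrative assumptions; this is not the algorithm proposed by the authors, whose planting rule and error threshold are not specified in this paper.

```python
import random
import statistics


def fertilize(seed):
    """Fertilization: fill missing values (None) with the column mean."""
    means = [
        statistics.mean(v for v in column if v is not None)
        for column in zip(*seed)
    ]
    return [
        [means[j] if value is None else value for j, value in enumerate(row)]
        for row in seed
    ]


def cultivate(fertilized):
    """Cultivation: use the (min, max) range of each feature as the environment."""
    return [(min(column), max(column)) for column in zip(*fertilized)]


def plant(fertilized, environment, n_new, jitter=0.1, rng=None):
    """Plantation: grow n_new rows by perturbing randomly chosen seed rows.

    Assumed rule (not from the paper): each value moves by at most `jitter`
    times the feature range and is clamped to stay inside that range.
    """
    rng = rng or random.Random()
    farmed = []
    for _ in range(n_new):
        base = rng.choice(fertilized)
        row = []
        for value, (low, high) in zip(base, environment):
            moved = value + rng.uniform(-jitter, jitter) * (high - low)
            row.append(min(high, max(low, moved)))
        farmed.append(row)
    return farmed


def harvest(fertilized, farmed):
    """Harvesting: collect seed and farmed rows into one dataset."""
    return fertilized + farmed


seed = [[1.0, None], [3.0, 4.0], [None, 8.0]]  # toy seed data with gaps
fertilized = fertilize(seed)
environment = cultivate(fertilized)
farmed = plant(fertilized, environment, n_new=5, rng=random.Random(0))
crop = harvest(fertilized, farmed)
print(len(crop))  # 3 seed rows + 5 farmed rows = 8
```

Because the harvested `crop` has the same shape as the seed, it can be passed back into `cultivate` and `plant` as the seed for the next iteration of the cycle, or exported for evaluation against the performance measures listed above.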

These criteria directly affect accuracy and cost savings: high accuracy at low cost increases competitiveness. Various other criteria for data farming may be evaluated in practice.

V. CONCLUSION & FUTURE WORK
This paper presents a review of data farming techniques. Prediction and classification are critical subjects in data mining and machine learning and are broadly used in many fields. The farming time required depends strongly on the number of instances to be farmed and only weakly on the number of seed instances and the error threshold. Correctly classified instances (CCI) and the kappa statistic (KS) increase, while incorrectly classified instances (ICI), mean absolute error (MAE), root mean squared error (RMSE), relative absolute error (RAE) and root relative squared error (RRSE) decrease for the farmed data when compared with the original dataset and the sample datasets.
There are a few possible extensions to MAE. At present, RAE uses a support-confidence framework to find frequent itemsets and generate classification rules.

REFERENCES
[1] Ankita Dewan and Meghna Sharma, “Prediction of Heart Disease Using a Hybrid Technique in Data Mining Classification”, IEEE, 2015, pp. 704-706.
[2] Ming Yuchi and Jun Jo, “Heart Rate Prediction Based on Physical Activity using Feed forward Neural Network”, International Conference on Convergence and Hybrid Information Technology, 2008, pp. 344-350.
[3] M. S. A. Megat Ali and A. H. Jahidin, “Hybrid Multilayered Perceptron Network for Classification of Bundle Branch Blocks”, IEEE, 2011, pp. 149-154.
[4] M. Akhil Jabbar, Priti Chandra and B. L. Deekshatulu, “Prediction of Risk Score for Heart Disease using Associative Classification and Hybrid Feature Subset Selection”, IEEE, 2012, pp. 628-634.
[5] Sivagowry S., Dr. Durairaj M. and Persia A., “An Empirical Study on applying Data Mining Techniques for the Analysis and Prediction of Heart Disease”, IEEE, 2013, pp. 1-6.
[6] Andrew Kusiak, “Data Farming Methods for Temporal Data Mining”, Intelligent Systems Laboratory, 2139 Seamans Center, The University of Iowa, Iowa City, Iowa 52242. http://www.sigkdd.org/kdd2001/Workshops/kus.pdf
[7] D. Burnell, A. Al-Zobaidie, G. Windall and A. Butler, “Self-Optimising Data Farming for Web Applications”, Proceedings of the 15th International Workshop on Database and Expert Systems Applications (DEXA’04), 1529-4188/04, IEEE.
[8] Gary E. Horne and Ted E. Meyer, “Data Farming: Discovering Surprise”, Proceedings of the 2005 Winter Simulation Conference, R. G. Ingalls, M. D. Rossetti, J. S. Smith, and B. A. Peters, eds.
[9] Jian Lin and Minjing Peng, “SVR-Based Data Farming Technique for Web Application”, in IFIP International Federation for Information Processing, Volume 254, Research and Practical Issues of Enterprise Information Systems II, Volume I, eds. L. Xu, A. Tjoa and S. Chaudhry (Boston: Springer), 2007, pp. 433-441.
[10] M. Fleury, A. C. Downton and A. F. Clark, “Scheduling Schemes for Data Farming”, IEEE Proc. Computer & Digital Tech., Vol. 146, No. 5, September 1999.
[11] J. Han and M. Kamber, Data Mining: Concepts and Techniques (San Francisco, CA: Morgan Kaufmann), 2001. http://www.cs.uiuc.edu/homes/hanj/bk2/toc.pdf
[12] Dariusz Krol, Bartosz Kryza, Michal Wrzeszcz, Lukasz Dutka and Jacek Kitowski, “Elastic Infrastructure for Interactive Data Farming Experiments”, International Conference on Computational Science, ICCS 2012.
[13] Henrik Friman and Gary E. Horne, “Using Agent Models and Data Farming to Explore Network Centric Operations”, Proceedings of the 2005 Winter Simulation Conference.
[14] C. L. Chua and W. C. Sim, “Automated Red Teaming: An Objective-Based Data Farming Approach for Red Teaming”, Proceedings of the 2008 Winter Simulation Conference.
[15] Dr. Alfred G. Brandstein and Dr. Gary E. Horne, “Data Farming: A Meta-Technique for Research in the 21st Century”, Maneuver Warfare Science 1998.
[16] Dr. Gary E. Horne, “Beyond Point Estimates: Operational Synthesis and Data Farming”, Maneuver Warfare Science 2001.
[17] Gary E. Horne and Henrik Friman, “Analysis of the Military Effectiveness of Future C2 Concepts and Systems”, held at NC3A, The Hague, the Netherlands, 23-25 April 2002, in RTO-MP-117.
[18] Andrew Kusiak, “Feature Transformation Methods in Data Mining”, IEEE Transactions on Electronics Packaging Manufacturing, Vol. 24, No. 3, July 2001.
[19] Jun Zheng, Ming-Zeng Hu and Hong-Li Zhang, “A New Method of Data Pre-processing and Anomaly Detection”, Proceedings of the Third International Conference on Machine Learning and Cybernetics, Shanghai, 26-29 August 2004.
[20] Fang Yuan, Li-Juan Wang and Ge Yu, “Study on Data Pre-processing Algorithm in Web Log Mining”, Proceedings of the Second International Conference on Machine Learning and Cybernetics, Wan, 2-5 November 2003.
[21] Srivatsan Laxman and P. S. Sastry, “A Survey of Temporal Data Mining”, Sadhana, Vol. 31, Part 2, April 2006, pp. 173-198.
[22] Andrew Kusiak, “Data Farming: A Primer”, International Journal of Operations Research, Vol. 2, No. 2, 2005, pp. 48-57. http://www.orstw.org.tw/ijor/vol2no2/Paper-6-IJOR-Vol2_2_-Kusiak.pdf
[23] Brian F. Tivnan, “Data Farming Coevolutionary Dynamics in Repast”, Proceedings of the 2004 Winter Simulation Conference, R. G. Ingalls, M. D. Rossetti, J. S. Smith, and B. A. Peters, eds.
[24] Theresa Princy R. and J. Thomas, “Human Heart Disease Prediction System using Data Mining Techniques”, ICCPCT, 2016, pp. 1-5.
[25] Aigerim Altayeva, Suleimenov Zharas and Young Im Cho, “Medical Decision Making Diagnosis System Integrating k-means and Naïve Bayes algorithms”, ICCAS, 2016, pp. 1087-1092.
