Professional Documents
Culture Documents
Abstract: In this research paper, agricultural Data Mining data are summarized. An improved Soil Data Prediction Model is developed to
estimate the above parameters at locations for Coimbatore city. 142 locations were investigated for the development of the model. The model
involves multiple regression equation, chi-square test and Bio inspired k-nearest neighbor classification. The GIS was used to manage the
database and to develop thematic maps for depth, N value, free swell, liquid limit, plastic limit, plastic index, percentage gravel, percentage sand
and percentage slit and clay. Field and laboratory studies were conducted in four locations and are compared with the predicted values.
Keywords: Data mining, Soil Data Analysis, K-nearest neighbor Algorithm, Dataset
__________________________________________________*****_________________________________________________
I. INTRODUCTION
Data Mining (DM) becomes popular in the field of Study area location
agriculture for soil classification, wasteland management The Corporation of Coimbatore is the study area selected
and crop and pest management. The variety of association which is situated in the Southern part of the Peninsula at a
techniques in DM and applied into the database of soil longitude of 77004’00” and 76054’00” and latitude of
science to predict the meaningful relationships and provided 11003’30” and 1057’30”. The total geographical area of
association rules for different soil types in agriculture. the city is 105.5 sq.km.A total of 138 bore hole locations
Similarly, agriculture prediction, disease detection and were identified for the purpose of this study. Soil samplings
optimizing the pesticides are analyzed with the use of were taken at each layer to analyse the soil for its grain size
various data mining techniques earlier. DM techniques for distribution, plasticity characteristics and differential free
knowledge discovery in agriculture sector and introduced swell in the laboratory. The bore hole locations were
different exhibits for knowledge discovery in the form of selected such that it covers the entire area of Coimbatore
Association Rules, Clustering, Classification and City Corporation. It was also ensured that the four zones of
Correlation. In predicted the soil fertility classes using with the corporation namely Coimbatore North, Coimbatore
classification techniques were Naïve Bayes, J48 and K- South, Coimbatore West and Coimbatore East were also
Nearest Neighbor algorithms. In6 used Adopted data mining individually well covered. The zone wise number of bore
techniques to estimate crop yield analysis. Multiple Linear hole investigation points are shown in Table-1.
Regression (MLR) method was used to find the linear
Description of Number of bore
relationship between dependent and independent variables. Sl.No
Zone hole locations
K-Means clustering approach was also use to form four
1 Coimbatore North 47
clusters considering Rainfall as key parameter. Decision
2 Coimbatore South 25
tree, Bayesian Network data mining techniques and the non-
3 Coimbatore West 44
linear approaches were implemented. Optimization based
Bayesian Network approach was considered as better than 4 Coimbatore East 24
non-linear[1]. Due to expensive and time consuming process
of subsoil investigation works, the present research was The lesser number of bore hole locations in Coimbatore East
undertaken to develop a model in order to predict the and Coimbatore south when compared to Coimbatore West
important geotechnical related parameters such as N value, and Coimbatore North is due to the presence of water bodies
differential, free swell and the depth of foundation taking in these regions. The locations of bore hole are shown in
advantage of the spatial continuity and data mining Fig.1
technique using Weka.
345
IJFRCSCE | November 2017, Available @ http://www.ijfrcsce.org
_______________________________________________________________________________________
International Journal on Future Revolution in Computer Science & Communication Engineering ISSN: 2454-4248
Volume: 3 Issue: 11 345 – 349
_______________________________________________________________________________________________
346
IJFRCSCE | November 2017, Available @ http://www.ijfrcsce.org
_______________________________________________________________________________________
International Journal on Future Revolution in Computer Science & Communication Engineering ISSN: 2454-4248
Volume: 3 Issue: 11 345 – 349
_______________________________________________________________________________________________
II. RESEARCH METHODOLOGY McKinsey, Jr. and Robert Evenson (Yale university, CT,
Dataset Collection USA). As discussed later in the report, a number of
The dataset is part of surveys which are carried out regularly corrections have been made in the original data set by cross-
in Pune District. Primary data for the soil survey are checking the same with government publications. Some of
acquired by field sampling. These samples are then sent for the main government publications used in compiling the
chemical and physical analysis at the soil testing database are:
laboratories; hence this dataset was collected from a private Agricultural Situation in India
soil testing lab in Pune. It contains information about Area and Production of Principal Crops in India
number of soil samples taken from 3 regions of Pune district Agricultural Prices in India
(Khed, Bhor, and Velhe). Dataset has 9 attributes and a total Fertilizer Statistics (published by Fertilizer
1988 instances of soil samples. Association of India)
Statistical Abstracts of India
Sources of data Climate data from over 160 meteorological stations are
A number of national and international sources have been from the Food and Agricultural Organization (FAO) of the
used in compiling the database. For agricultural data, the United Nations.
main source was the data set created by James W.
348
IJFRCSCE | November 2017, Available @ http://www.ijfrcsce.org
_______________________________________________________________________________________
International Journal on Future Revolution in Computer Science & Communication Engineering ISSN: 2454-4248
Volume: 3 Issue: 11 345 – 349
_______________________________________________________________________________________________
III. CONCLUSION [6] Gholap J, Lngole A, Gohil J, Shailesh, Attar V. Soil data
An improved Soil Data Prediction Model(ISDPM) is analysis using classification techniques and soil attribute
developed for the predication of the parameters required for prediction. 2012 Jun; 9(3):1–4.
[7] Anuradha C, Velmurugan T. A comparative analysis on the
foundation design. It gives the predicated values based on
evaluation of classification algorithms in the prediction of
regression analysis and Bio-inspired K-nearest neighbour.
student performance. Indian Journal of Science and
The values predicated using ISDPM are found to have close Technology. 2015 Jul; 8(15):1–12.
agreement with actual values. The SCPM has flexibility to [8] Narain B. Study for Data Mining techniques in
include additional parameters and additional data for any classification of agricultural land soils. Journal of
other specific purpose oriented studies. The good Advanced Research in Computer Engineering. 2011 Jan-
performance of ISDPM is confirmed by Chi-Square test for Jun; 5(1):35–7.
goodness of fit where the calculated values are well within [9] Venkatesan E, Velmurugan T. Performance analysis of
the tabulated values. decision tree algorithms for breast cancer classification.
Indian Journal of Science and Technology. 2015 Nov;
8(29):1–8.
REFERENCES
[10] Chandrakar PK, Kumar S, Mukherjee D, Applying
[1] Geetha MCS. Implementation of association rule mining
classification techniques in Data Mining in agricultural
for different soil types in agriculture. International Journal
land soil. International Journal
of Advanced Research in Computer and Communication
Engineering. 2015 Apr; 4(4):520–2.
[2] Fayer M.J. et al (1995), "Estimating recharge rates for a
groundwater model using GIS," Journal of Environmental
quality V 25 I 3 May-June 1996. pp 510-518.
[3] Kothyari et al (1997), "Sediment yield estimation using
GIS," Hydrological science journal V 42 n 6 Dec. 1997.pp
833-843.
[4] Meijerink et al (1996), "Comparison of approaches for
erosion modeling using flow accumulation with GIS,"
Application of GIS system in Hydrology and water
resource management. IAHS Publication n 235. IAHS
press, Wallingford, Engl. pp 437-444.
[5] Gandhimathi et. al. / International Journal of Engineering
Science and Technology,Vol. 2(7), 2010, 2982-2996
349
IJFRCSCE | November 2017, Available @ http://www.ijfrcsce.org
_______________________________________________________________________________________