Visualization and Data Mining techniques
By:
Group number: 14
Chidroop Madhavarapu(105644921)
Deepanshu Sandhuria(105595184)
Data Mining CSE 634
Prof. Anita Wasilewska
References
http://coblitz.codeen.org:3125/citeseer.ist.psu.edu/cache/papers/cs/10335/ftp:zSzzSzftp.cs.umn.eduzSzdeptzSzuserszSzkumarzSzdatavis.pdf/ganesh96visual.pdf
http://www.ailab.si/blaz/predavanja/ozp/gradivo/2002-Keim-Visualization%20in%20DM-IEEE%20Trans%20Vis.pdf
http://www.geocities.com/anand_palm/
http://citeseer.ist.psu.edu/cache/papers/cs/27216/http:zSzzSzwww-users.cs.umn.eduzSzzCz7EctluzSzPaperTalkFilezSzits02.pdf/shekhar02cubeview.pdf
http://www.cs.umn.edu/Research/shashi-group/
http://www.cs.umn.edu/Research/shashi-group/Book/sdb-chap1.pdf
http://www.cs.umn.edu/research/shashi-group/alan_planb.pdf
http://coblitz.codeen.org:3125/citeseer.ist.psu.edu/cache/papers/cs/27637/http:zSzzSzwww-users.cs.umn.eduzSzzCz7EpushengzSzpubzSzkdd2001zSzkdd.pdf/shekhar01detecting.pdf
Motivation
Why Visual Data Mining
VDM Approach
Levels of VDM
Loose integration
Visualization and automated mining methods are applied sequentially.
The result of one step can be used as input for another step.
Full integration
Automated mining and visualization methods are applied in parallel.
Their results are combined.
Methods of Data Visualization
Data can be:
Univariate
Bivariate
Multivariate
Univariate Data
Used to characterize a distribution:
Histogram
Pie Chart
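Here is a minimal sketch of the computations behind the two univariate charts: equal-width binning for a histogram, and slice angles proportional to frequency for a pie chart. The function names and the choice of equal-width bins are illustrative assumptions and do not come from the original slides.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def histogram(values: Sequence[float], bins: int, lo: float, hi: float) -> List[int]:
    """Count values into `bins` equal-width bins covering [lo, hi]."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if lo <= v <= hi:
            # the right edge `hi` is placed in the last bin
            i = min(int((v - lo) / width), bins - 1)
            counts[i] += 1
    return counts


def pie_angles(categories: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Slice angle in degrees for each category, proportional to its frequency."""
    counts = Counter(categories)
    n = len(categories)
    return {k: 360.0 * c / n for k, c in counts.items()}


print(histogram([1, 2, 2, 3, 4, 5], bins=2, lo=1, hi=5))  # [3, 3]
print(pie_angles(["a", "a", "b", "c"]))  # {'a': 180.0, 'b': 90.0, 'c': 90.0}
```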
Histogram
Pie Chart
Bivariate Data
Line graphs
Scatter plots
Line graphs
Multivariate Data
Multidimensional representation of multivariate data
Icon-based Methods
Pixel-based Methods
Approach:
Each attribute value is represented by one colored pixel (the value ranges of the attributes are mapped to a fixed color map).
The values of each attribute are presented in separate subwindows.
Examples:
Dense Pixel Displays
Dense Pixel Display
Approach:
Each attribute value is represented by one colored pixel (the value ranges of the attributes are mapped to a fixed color map).
Different attributes are presented in separate subwindows.
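Here is a rough sketch of the pixel-based idea. It assumes each attribute's value range is mapped linearly onto a color map with `n_colors` entries, and that pixels are placed row by row in each subwindow. Published dense pixel displays use more elaborate pixel arrangements. The names `color_indices` and `pixel_subwindows` are hypothetical.

```python
from typing import Dict, List, Sequence


def color_indices(values: Sequence[float], n_colors: int) -> List[int]:
    """Map an attribute's value range linearly onto color-map indices 0..n_colors-1."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # a constant attribute maps entirely to index 0
    return [min(int((v - lo) / span * n_colors), n_colors - 1) for v in values]


def pixel_subwindows(records: Sequence[Dict[str, float]], attributes: Sequence[str],
                     width: int, n_colors: int = 256) -> Dict[str, List[List[int]]]:
    """One subwindow per attribute; record i occupies the same position in every subwindow."""
    windows = {}
    for attr in attributes:
        idx = color_indices([r[attr] for r in records], n_colors)
        # simple row-by-row placement; real systems often use other pixel arrangements
        windows[attr] = [idx[i:i + width] for i in range(0, len(idx), width)]
    return windows


records = [{"x": 0, "y": 10}, {"x": 5, "y": 10}, {"x": 10, "y": 20}, {"x": 10, "y": 15}]
print(pixel_subwindows(records, ["x", "y"], width=2, n_colors=4))
# {'x': [[0, 2], [3, 3]], 'y': [[0, 0], [3, 2]]}
```

Because a record sits at the same pixel position in every subwindow, a viewer can compare attributes for the same record by looking at corresponding positions.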
Visual Data Mining: Framework and Algorithm Development
Abstract
VDM refers to the use of visualization techniques in the data mining process to:
Evaluate
Monitor
Guide
Components of VQLBCI
The three major components of VQLBCI are Visual Representations, Computations and Events.
Visual Development of Algorithms
General Framework
Acceptability Constraint
Model Constraints consist of Acceptability constraints, Expandability constraints and a Data-Entropy calculation function.
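The slides do not define the Data-Entropy calculation function. This sketch assumes a common choice: the Shannon entropy of the class labels at a model atom. It also computes information gain, which measures how much a split reduces that entropy and underlies the split criteria of C4.5-style learners. `data_entropy` and `information_gain` are illustrative names.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def data_entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of the class labels of the training cases at a model atom."""
    n = len(labels)
    if n == 0:
        return 0.0
    # sum of p * log2(1/p) over the classes present
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def information_gain(parent: Sequence[Hashable],
                     children: Sequence[Sequence[Hashable]]) -> float:
    """Entropy reduction when a model atom's cases are split into the `children` subsets."""
    n = len(parent)
    weighted = sum(len(ch) / n * data_entropy(ch) for ch in children)
    return data_entropy(parent) - weighted


print(data_entropy(["a", "a", "b", "b"]))                                   # 1.0
print(information_gain(["a", "a", "b", "b"], [["a", "a"], ["b", "b"]]))    # 1.0
```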
Expandability Constraint
An expandability constraint predicate specifies whether a leaf model atom is expandable or not. Examples:
C4.5 uses E1 and E2
CDP uses E2 and E3
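The slides do not spell out E1, E2 and E3. This sketch therefore uses hypothetical stand-in predicates (`impure`, `has_min_cases`, `below_max_depth`) to show the mechanism. Each algorithm picks a set of predicates, and a leaf is expandable only if all of them hold. The `ModelAtom` representation is also an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence


@dataclass
class ModelAtom:
    """A leaf of the partially built decision tree (hypothetical representation)."""
    class_counts: Dict[str, int]  # training cases of each class reaching this leaf
    depth: int


Predicate = Callable[[ModelAtom], bool]


# Illustrative constraints only; they are NOT the E1, E2, E3 of the original work.
def impure(atom: ModelAtom) -> bool:
    return sum(1 for c in atom.class_counts.values() if c > 0) > 1


def has_min_cases(atom: ModelAtom, min_cases: int = 2) -> bool:
    return sum(atom.class_counts.values()) >= min_cases


def below_max_depth(atom: ModelAtom, max_depth: int = 5) -> bool:
    return atom.depth < max_depth


def is_expandable(atom: ModelAtom, constraints: Sequence[Predicate]) -> bool:
    """A leaf is expandable only if every chosen constraint predicate holds."""
    return all(pred(atom) for pred in constraints)


# Two algorithms that share one constraint, in the spirit of "C4.5 uses E1 and E2".
algo_a = [impure, has_min_cases]
algo_b = [has_min_cases, below_max_depth]

leaf = ModelAtom({"yes": 0, "no": 7}, depth=2)
print(is_expandable(leaf, algo_a), is_expandable(leaf, algo_b))  # False True
```

The same leaf can be final for one algorithm and expandable for another. This is why swapping constraint sets is enough to derive different tree learners from one framework.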
Traversal Strategy
Steps in Visual Algorithm Development
BF
This algorithm is based on the best-first search idea.
Its acceptability criteria include A1 and A2, with a user-specified acceptable error rate.
The traversal strategy chosen is T3.
In BF, expandable leaf model atoms are ranked in decreasing order of the number of misclassified training cases (local error rate * size of the training data subset).
The traversal strategy expands the model atom with the most misclassified training cases, which reduces the overall error rate the most.
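Here is a minimal sketch of the BF ranking. It rests on three assumptions. First, each leaf carries its training-subset size and local error rate. Second, acceptability means the overall training error is at or below the user-specified rate, since A1 and A2 are not spelled out. Third, `expand` is a hypothetical callback that returns a leaf's children, or an empty list if the leaf cannot be expanded. A heap keyed on the negated misclassification count produces the decreasing-order ranking.

```python
import heapq
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Leaf:
    name: str
    size: int          # training cases reaching this leaf
    error_rate: float  # local error rate on those cases

    @property
    def misclassified(self) -> float:
        # ranking key from the slide: local error rate * size of training subset
        return self.error_rate * self.size


def overall_error(leaves: Iterable[Leaf], total: int) -> float:
    return sum(leaf.misclassified for leaf in leaves) / total


def best_first(root: Leaf, expand: Callable[[Leaf], List[Leaf]],
               acceptable_error: float) -> Tuple[List[Leaf], List[str]]:
    """Expand leaves in decreasing order of misclassified cases until the tree is acceptable."""
    total = root.size
    leaves = {id(root): root}
    heap = [(-root.misclassified, 0, root)]  # max-heap via negated key; counter breaks ties
    counter = 1
    order = []
    while heap and overall_error(leaves.values(), total) > acceptable_error:
        _, _, leaf = heapq.heappop(heap)
        children = expand(leaf)
        if not children:  # not expandable: it stays a final leaf
            continue
        order.append(leaf.name)
        del leaves[id(leaf)]
        for child in children:
            leaves[id(child)] = child
            heapq.heappush(heap, (-child.misclassified, counter, child))
            counter += 1
    return list(leaves.values()), order


# Hypothetical precomputed splits standing in for real attribute tests.
SPLITS = {
    "root": [Leaf("L", 60, 0.10), Leaf("R", 40, 0.50)],
    "L": [Leaf("LL", 30, 0.00), Leaf("LR", 30, 0.10)],
    "R": [Leaf("RL", 25, 0.04), Leaf("RR", 15, 0.20)],
}
leaves, order = best_first(Leaf("root", 100, 0.40),
                           lambda leaf: SPLITS.get(leaf.name, []), 0.12)
print(order, round(overall_error(leaves, 100), 2))  # ['root', 'R'] 0.1
```

In this toy run, R (20 misclassified cases) is expanded before L (6 cases). The resulting error of 0.10 already meets the 0.12 threshold, so L is never expanded.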
CDP+
CDP+ is a modification of CDP.
We set
where B is the branching factor of the decision tree, t is the size of the training data set belonging to the model atom, and T is the whole training data set.
Comparison of different classification learning algorithms
Experiment
The new BF and CDP+ algorithms are compared with the C4.5 and CDP algorithms.
Various metrics are selected to compare the efficiency, accuracy and final decision tree size of the classification algorithms.
Node-generation efficiency is measured as the total number of nodes generated.
To compare accuracy, the mean classification error on the test data sets is computed.
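The accuracy metric can be illustrated with a small sketch. It assumes classification error is the fraction of misclassified test cases, and that the mean is unweighted across data sets. The predictions below are made up for illustration and are not data from the experiment.

```python
from typing import Hashable, Sequence


def classification_error(predicted: Sequence[Hashable], actual: Sequence[Hashable]) -> float:
    """Fraction of test cases whose predicted class differs from the true class."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    wrong = sum(p != a for p, a in zip(predicted, actual))
    return wrong / len(actual)


def mean_error(errors: Sequence[float]) -> float:
    """Unweighted mean of per-data-set error rates."""
    return sum(errors) / len(errors)


# Hypothetical predictions on two small test sets (not results from the experiment).
e1 = classification_error(["a", "b", "a", "a"], ["a", "a", "a", "b"])  # 0.5
e2 = classification_error(["x", "y", "y", "x"], ["x", "y", "y", "y"])  # 0.25
print(mean_error([e1, e2]))  # 0.375
```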
Classification error for 10 data sets
Nodes generated for 10 data sets
Final decision tree size
Results/Conclusion
Conclusion
Different data sets require different algorithms for best results.
Diverse user requirements impose different constraints on the final decision tree.
The experiment shows that the Interactive Visual Data Mining Framework can help find the most suitable algorithm for a given data set and set of user requirements.
Data Mining for Selective Visualization of Large Spatial Datasets
Basic Terminology
Spatial databases
Alphanumeric data + geographical coordinates
Spatial mining
Mining of spatial databases
Spatial data warehouse
Contains geographical data
Spatial outliers
Observations that appear to be inconsistent with the remainder of that set of data
Spatial Cluster
Contribution
Propose and implement the CubeView visualization system
General data cube operations
Built on the concept of a spatial data warehouse to support data mining and data visualization
Efficient and scalable spatial outlier detection algorithms
Challenges in Spatial Data Mining
Classical data mining deals with numbers and categories.
Spatial data is more complex and includes extended objects such as points, lines and polygons.
Spatial Data Warehouse
Employs a data cube structure
Outputs: albums of maps
Traffic data warehouse
Measures: volume and occupancy
Dimensions: time and space
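Here is a sketch of how a traffic data cube could be materialized. It assumes records keyed by a station (space) and an hour (time), with volume summed and occupancy (in percent) averaged. Every subset of the dimensions gives one group-by, and together these group-bys form the lattice of cuboids that a data cube exposes. The record layout and aggregation choices are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean
from typing import Any, Callable, Dict, List, Sequence, Tuple

Record = Dict[str, Any]
Aggregator = Callable[[List[Any]], Any]


def dimension_lattice(dimensions: Sequence[str]) -> List[Tuple[str, ...]]:
    """All group-by subsets, from the finest (every dimension) down to the apex ()."""
    return [combo for k in range(len(dimensions), -1, -1)
            for combo in combinations(dimensions, k)]


def cuboid(records: Sequence[Record], dims: Tuple[str, ...],
           measures: Dict[str, Aggregator]) -> Dict[tuple, Dict[str, Any]]:
    """Group records by the values of `dims` and aggregate each measure per group."""
    groups: Dict[tuple, List[Record]] = defaultdict(list)
    for r in records:
        groups[tuple(r[d] for d in dims)].append(r)
    return {key: {name: agg([r[name] for r in rows]) for name, agg in measures.items()}
            for key, rows in groups.items()}


def data_cube(records: Sequence[Record], dimensions: Sequence[str],
              measures: Dict[str, Aggregator]) -> Dict[Tuple[str, ...], Dict[tuple, Dict[str, Any]]]:
    """Materialize every cuboid in the dimension lattice."""
    return {dims: cuboid(records, dims, measures) for dims in dimension_lattice(dimensions)}


# Hypothetical readings: volume in vehicles, occupancy in percent.
records = [
    {"station": "S1", "hour": 7, "volume": 100, "occupancy": 20},
    {"station": "S1", "hour": 8, "volume": 150, "occupancy": 40},
    {"station": "S2", "hour": 7, "volume": 80, "occupancy": 10},
    {"station": "S2", "hour": 8, "volume": 120, "occupancy": 30},
]
cube = data_cube(records, ["station", "hour"], {"volume": sum, "occupancy": mean})
print(dimension_lattice(["station", "hour"]))  # [('station', 'hour'), ('station',), ('hour',), ()]
print(cube[("station",)][("S1",)])             # {'volume': 250, 'occupancy': 30}
print(cube[()][()])                            # {'volume': 450, 'occupancy': 25}
```

Materializing every cuboid is exponential in the number of dimensions. With only time and space, as in the traffic warehouse, that cost is small.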
Spatial Data Mining
The process of discovering interesting and useful but implicit spatial patterns.
The key goal is to partially automate knowledge discovery.
Search for nuggets of information embedded in very large quantities of spatial data.
Spatial Outliers Detection
Suspiciously deviating observations
Local instability
Each station has:
Spatial attributes: time, space
Non-spatial attributes: volume, occupancy
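Here is a minimal sketch of one neighborhood-based test for such outliers. It assumes each station's neighbors are the adjacent stations on the same road. The test computes S(x), the difference between a station's value and the mean of its neighbors' values. It then standardizes S over all stations and flags any station whose |z| exceeds a threshold theta (2 here). The statistic, the threshold and the sample volumes are assumptions chosen for illustration.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def spatial_outliers(values: Sequence[float], neighbors: Dict[int, List[int]],
                     theta: float = 2.0) -> List[int]:
    """Flag stations whose value deviates strongly from the average of their spatial neighbors."""
    # S(x): difference between a station's value and its neighbors' mean value
    s = [values[i] - mean(values[j] for j in neighbors[i]) for i in range(len(values))]
    mu, sigma = mean(s), pstdev(s)
    if sigma == 0:
        return []
    # standardize S(x) over all stations and compare against the threshold theta
    return [i for i, si in enumerate(s) if abs(si - mu) / sigma > theta]


# Hypothetical traffic volumes at 8 consecutive stations along one highway.
volumes = [10, 11, 10, 30, 11, 10, 12, 11]
adjacent = {i: [k for k in (i - 1, i + 1) if 0 <= k < len(volumes)]
            for i in range(len(volumes))}
print(spatial_outliers(volumes, adjacent))  # [3]
```

Station 3 is flagged because it is out of line with its neighbors, not because 30 is extreme in any global sense. This matches the "local instability" view of a spatial outlier.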
Basic Structure of CubeView
CubeView Visualization System
Dimension Lattice
Data Mining Algorithms for Visualization
Problem Definition
A few points
Algorithms
Software
http://www.cs.umn.edu/research/shashi-group/vis/traffic_volumemap2.htm
http://www.cs.umn.edu/research/shashi-group/vis/DataCube.htm
Visualization and Data Mining techniques
Thank you!