
Clustering is the process of organizing objects into groups whose members are
similar in some way.

Clustering is a data mining technique that is used in the following areas:

1) Statistical data analysis
2) Machine learning
3) Data mining
4) Pattern recognition
5) Image analysis
6) Bioinformatics

A cluster is therefore a collection of objects that are similar to one another and
dissimilar to the objects belonging to other clusters.

There are different types of clustering and they are as follows:

a) Partitioning clustering (or algorithm): finding all clusters at once.
Partitioning algorithms are clustering techniques that subdivide the data set
into k groups, where k is the number of groups specified in advance by the
analyst.
b) Hierarchical clustering (or algorithm): finding new clusters using previously
found ones.
As the name suggests, a hierarchical algorithm builds a hierarchy of clusters. In
its agglomerative (bottom-up) form, it starts with every data point assigned to a
cluster of its own and repeatedly merges the closest clusters.
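The two types above can be contrasted in a short Python sketch. It uses k-means as the partitioning algorithm and single-linkage agglomerative clustering as the hierarchical one. Neither algorithm is named in these notes, so treat them as common examples rather than the only choices. Points are assumed to be tuples of numbers, and the function names and sample points are hypothetical.

```python
import random
from itertools import combinations
from math import dist


def centroid(cluster):
    """Mean point of a non-empty list of coordinate tuples."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def kmeans(points, k, max_iter=100, seed=0):
    """Partitioning: place all k clusters at once, then refine them."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k is fixed in advance by the analyst
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        new_centroids = [centroid(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new_centroids == centroids:  # assignments are stable
            break
        centroids = new_centroids
    return clusters


def single_link(a, b):
    """Distance between the closest pair of points taken from two clusters."""
    return min(dist(p, q) for p in a for q in b)


def agglomerative(points, k):
    """Hierarchical (bottom-up): start from singletons, merge the closest pair until k remain."""
    clusters = [[p] for p in points]  # every point starts in a cluster of its own
    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: single_link(clusters[pair[0]], clusters[pair[1]]))
        merged = clusters[i] + clusters[j]  # new cluster built from previously found ones
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return clusters


# Hypothetical sample data: two well-separated groups of 2-D points.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
```

On well-separated data like `points`, both functions should find the same two groups. The difference lies in how they get there. k-means places all k clusters at once and then refines them. The agglomerative version only ever creates a cluster by merging two existing ones, and stopping at k clusters is just one way to cut the resulting hierarchy.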

A decision tree is a structure that includes a root node, branches, and leaf nodes.
Each internal node denotes a test on an attribute, each branch denotes the
outcome of a test, and each leaf node holds a class label. The topmost node in the
tree is the root node.
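The structure just described maps naturally onto a small Python data type. Internal nodes store the attribute they test, each branch is keyed by one outcome of that test, and leaves hold a class label. Classifying a record means following one branch per test from the root down to a leaf. The `Node` and `classify` names, the attributes, and the class labels below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    attribute: Optional[str] = None  # internal node: the attribute it tests
    branches: Dict[str, "Node"] = field(default_factory=dict)  # test outcome -> child node
    label: Optional[str] = None  # leaf node: the class label it holds


def classify(root: Node, record: Dict[str, str]) -> str:
    """Walk from the root node to a leaf, taking the branch that matches each test."""
    node = root
    while node.label is None:
        node = node.branches[record[node.attribute]]
    return node.label


# Hypothetical tree: the attributes and class labels are made up for illustration.
root = Node("outlook", {
    "sunny": Node("humidity", {"high": Node(label="no"), "normal": Node(label="yes")}),
    "overcast": Node(label="yes"),
    "rain": Node(label="no"),
})
```

For example, a record with `outlook="sunny"` and `humidity="normal"` follows two branches and ends at a leaf labelled `"yes"`.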
The benefits of having a decision tree are as follows:
- It does not require any domain knowledge.
- It is easy to comprehend.
- The learning and classification steps of a decision tree are simple and fast.

Tree construction principles

- Splitting attribute: every node of a decision tree has an associated attribute
whose value determines how the data set is partitioned when the node is expanded.
- Splitting criterion: the qualifying condition on the splitting attribute that is
used to split the data set at a node is called the splitting criterion.
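The notes do not say how the splitting attribute is chosen. One common choice, used by the ID3 family of algorithms, is to pick the attribute with the highest information gain, which is the reduction in class-label entropy achieved by the split. The sketch below assumes that criterion. Rows are plain dictionaries, and the function names and sample data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target before the split minus the weighted entropy after it."""
    before = entropy([r[target] for r in rows])
    after = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        after += len(subset) / len(rows) * entropy(subset)
    return before - after


def best_splitting_attribute(rows, attributes, target):
    """Choose the attribute whose split yields the largest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical training rows: "outlook" separates the classes perfectly, "windy" not at all.
rows = [
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
]
```

With these rows, splitting on `outlook` yields a gain of 1 bit and splitting on `windy` yields 0, so `outlook` becomes the splitting attribute. The splitting criterion at that node is then simply the test on the `outlook` value.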

Neural networks are used to model complex relationships between inputs and
outputs or to find patterns in data.
The neural network method is used for classification, clustering, feature mining,
prediction and pattern recognition.
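The smallest building block of a neural network is a single artificial neuron. The sketch below trains one neuron, a perceptron, to classify inputs from input/output examples. Real networks stack many such units in layers to model more complex relationships. The training data (the logical AND function), the learning rate, and the function names are illustrative assumptions, not taken from the notes.

```python
def predict(weights, bias, x):
    """Fire (1) if the weighted sum of the inputs plus the bias is positive, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


def train_perceptron(samples, epochs=100, lr=1.0):
    """Perceptron learning rule: nudge the weights toward each misclassified example."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, target in samples:
            error = target - predict(weights, bias, x)
            if error:
                mistakes += 1
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
        if mistakes == 0:  # every example is classified correctly
            break
    return weights, bias


# Hypothetical toy data set: inputs and outputs of the logical AND function.
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
```

A single neuron can only separate classes with a straight line (or hyperplane). That is why practical networks combine many neurons when the relationship between inputs and outputs is more complex.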

Genetic Algorithm (GA) is a search-based optimization technique based on the
principles of Genetics and Natural Selection.
Genetic algorithms are well suited to searching through large and complex
solution spaces, such as those that arise when mining large data sets.
Genetic Algorithms were developed by John Holland and his students and
colleagues at the University of Michigan, most notably David E. Goldberg, and have
since been applied to various optimization problems with a high degree of success.
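The core GA loop consists of selection, crossover and mutation applied over many generations. The sketch below illustrates it on OneMax, a standard toy problem in which fitness is the number of 1 bits in a bit string. The problem, the population size, the mutation rate, tournament selection, and elitism are all assumptions chosen for illustration.

```python
import random

GENOME_LEN = 20  # hypothetical problem size


def fitness(genome):
    """OneMax: count the 1 bits (a toy objective, not from the notes)."""
    return sum(genome)


def tournament(pop, rng, k=3):
    """Selection: the fittest of k randomly chosen individuals becomes a parent."""
    return max(rng.sample(pop, k), key=fitness)


def crossover(a, b, rng):
    """One-point crossover: a prefix of one parent joined to the suffix of the other."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:]


def mutate(genome, rng, rate=0.05):
    """Flip each bit independently with a small probability."""
    return [1 - g if rng.random() < rate else g for g in genome]


def run_ga(pop_size=30, generations=50, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(GENOME_LEN)] for _ in range(pop_size)]
    history = [max(map(fitness, pop))]
    for _ in range(generations):
        children = [max(pop, key=fitness)]  # elitism: carry the best individual over unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(pop, rng), tournament(pop, rng)
            children.append(mutate(crossover(p1, p2, rng), rng))
        pop = children
        history.append(max(map(fitness, pop)))
    return max(pop, key=fitness), history
```

Because the best individual is always copied into the next generation, the best fitness recorded in `history` never decreases. Without elitism a good solution could be lost to crossover or mutation.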

Web mining is the application of data mining techniques to discover patterns from
the World Wide Web. As the name suggests, this is information gathered by mining
the web. Web mining is commonly divided into three categories:

1- Web content mining: describes the discovery of useful information from web
contents. Web content mining can be used to mine useful data, information and
knowledge from web page content.
2- Web structure mining: helps to find useful knowledge or information patterns in
the structure of hyperlinks. It is concerned with discovering the model underlying
the link structure of the web, and is used to study the topology of hyperlinks with
or without the description of the links.
3- Web usage mining: is used to mine web log records (access information for web
pages) and helps to discover user access patterns for web pages. A web server
records a log entry for every web page request.
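As an example of web structure mining, the link structure alone can be used to rank pages. PageRank is one well-known link-analysis algorithm of this kind, although the notes do not name it. The sketch below runs a simple PageRank iteration over a hypothetical three-page link graph. The damping factor of 0.85 is the commonly cited default, and the page names are made up.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively share each page's rank among the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}  # random-jump share
        for page in pages:
            targets = links.get(page, [])
            if targets:
                for t in targets:
                    new_rank[t] += damping * rank[page] / len(targets)
            else:
                # A page with no outgoing links spreads its rank evenly over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical hyperlink structure: page -> pages it links to.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
```

Here page C ends up with the highest rank because it receives links from both A and B. This is the kind of structural pattern web structure mining aims to surface.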

Some of the techniques used to discover and analyze web usage patterns are:
i) Session and visitor analysis
- Session analysis is performed on the preprocessed log data and covers records of
visitors, days, sessions, etc. This information can be used to analyze the
behavior of visitors.
ii) OLAP (Online Analytical Processing)
- OLAP performs multidimensional analysis of complex data.
- OLAP can be performed on different parts of the log data over a certain interval
of time.
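Both techniques start from web log entries, and a short sketch can show each. The log format below is a hypothetical assumption: tuples of (visitor, timestamp in seconds, page). The 30-minute inactivity timeout that ends a session is a common convention, not something the notes specify. The second function is only a tiny roll-up that counts hits per (day, page) inside a time interval. It stands in for one OLAP-style aggregation and is not an OLAP engine.

```python
from collections import Counter, defaultdict

SESSION_TIMEOUT = 30 * 60  # assumed inactivity gap (seconds) that ends a session


def sessionize(entries, timeout=SESSION_TIMEOUT):
    """Session analysis: split each visitor's hits wherever the gap exceeds the timeout."""
    by_visitor = defaultdict(list)
    for visitor, ts, page in sorted(entries, key=lambda e: (e[0], e[1])):
        by_visitor[visitor].append((ts, page))
    sessions = []
    for visitor, hits in by_visitor.items():
        current = [hits[0]]
        for prev, hit in zip(hits, hits[1:]):
            if hit[0] - prev[0] > timeout:
                sessions.append((visitor, current))
                current = []
            current.append(hit)
        sessions.append((visitor, current))
    return sessions


def hits_per_day_and_page(entries, start, end):
    """OLAP-style roll-up: count hits per (day number, page) within [start, end)."""
    counts = Counter()
    for _visitor, ts, page in entries:
        if start <= ts < end:
            counts[(ts // 86400, page)] += 1
    return counts


# Hypothetical preprocessed log entries: (visitor, timestamp in seconds, page).
log = [
    ("u1", 0, "/home"), ("u1", 600, "/products"), ("u1", 5000, "/home"),
    ("u2", 100, "/home"), ("u2", 90000, "/cart"),
]
```

With this log, visitor `u1` has two sessions because the 4400-second gap exceeds the timeout. Restricting the roll-up to the first day counts three hits on `/home` and one on `/products`.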

Text mining is the process of deriving high-quality information from text.
High-quality information is typically derived by identifying patterns and trends,
for example through statistical pattern learning.
Text mining is also described as the process of examining large collections of
written resources to generate new information, and to transform unstructured text
into structured data for use in further analysis.
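A common first step in turning unstructured text into structured data is a document-term matrix. This is a table with one row per document, one column per word, and word counts in the cells. The sketch below assumes a very simple tokenizer (lowercase runs of letters). Real text mining pipelines usually add stop-word removal, stemming, and weighting schemes such as TF-IDF.

```python
import re
from collections import Counter


def tokenize(text):
    """Hypothetical minimal tokenizer: lowercase runs of letters a-z."""
    return re.findall(r"[a-z]+", text.lower())


def document_term_matrix(documents):
    """Return (vocabulary, matrix), where matrix[i][j] counts vocabulary[j] in document i."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix
```

Once every document is a row of numbers, the usual data mining techniques (clustering, classification, association analysis) can be applied to the text collection.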
Areas of text mining
1. Information extraction
2. Natural language processing
3. Data mining
4. Information retrieval

Text clustering is the application of cluster analysis to text-based documents. It
uses machine learning and natural language processing (NLP) to understand and
categorize unstructured, textual data.
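A minimal text clustering sketch turns each document into a word-count vector and measures similarity with the cosine of the angle between vectors. It then groups documents with a single-pass "leader" algorithm: each document joins the first cluster whose leader is similar enough, or else starts a new cluster. The stop-word list, the 0.5 similarity threshold, and the leader algorithm itself are illustrative assumptions. Practical systems more often use TF-IDF weighting with k-means or hierarchical clustering.

```python
import re
from collections import Counter
from math import sqrt

STOP_WORDS = {"a", "an", "and", "the", "of", "on", "in", "for"}  # tiny illustrative list


def vectorize(text):
    """Word counts for one document, ignoring stop words."""
    return Counter(w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS)


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[t] * v[t] for t in u)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def leader_clustering(documents, threshold=0.5):
    """Return clusters as lists of document indices; the first index is the cluster leader."""
    vectors = [vectorize(d) for d in documents]
    clusters = []
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:  # no leader was similar enough
            clusters.append([i])
    return clusters


# Hypothetical documents on two unrelated topics.
docs = [
    "Data mining finds patterns in data",
    "Mining data for patterns",
    "The cat sat on the mat",
    "A cat and a mat",
]
```

The two documents about data mining share most of their words and land in one cluster, while the two about the cat form another. Note that the result of a single-pass method can depend on the order of the documents.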

Temporal data mining can be defined as “process of knowledge
discovery in temporal databases that enumerates structures (temporal patterns
or models) over the temporal data, and any algorithm that enumerates temporal
patterns from, or fits models to, temporal data is a temporal data
mining algorithm” (Lin et al., 2002). The aim of temporal data mining is to
discover temporal patterns, unexpected trends, or other hidden relations in
large sequential data sets.
Sequential pattern mining is a topic of data mining concerned with finding statistically
relevant patterns between data examples where the values are delivered in a sequence. It is
usually presumed that the values are discrete, and thus time series mining is closely related,
but usually considered a different activity.

GSP algorithm (Generalized Sequential Pattern algorithm) is an algorithm used
for sequence mining. Algorithms for solving sequence mining problems are mostly
based on the Apriori (level-wise) algorithm. There are two main steps in the
algorithm: candidate generation and support counting.
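The level-wise idea can be sketched in a simplified form. Each sequence element here is a single item, whereas full GSP allows itemsets as well as time constraints such as window sizes and gaps. Support is an absolute count of sequences containing the pattern, and patterns may have gaps. The function names, `min_support`, and the sample database are hypothetical. Each level first generates candidates from the previous level's frequent patterns, prunes any candidate with an infrequent subsequence, and then counts support with one scan of the database.

```python
def is_subsequence(pattern, sequence):
    """True if the items of `pattern` occur in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


def gsp(database, min_support):
    """Simplified level-wise sequential pattern mining over sequences of single items."""
    def support(pattern):
        return sum(is_subsequence(pattern, seq) for seq in database)

    counts = {(item,): support((item,)) for item in {i for seq in database for i in seq}}
    frequent = sorted(p for p, n in counts.items() if n >= min_support)
    result = {p: counts[p] for p in frequent}
    while frequent:
        known = set(frequent)
        # Step 1, candidate generation: join k-patterns a and b where a minus its first item
        # equals b minus its last item, then prune candidates that have an infrequent
        # k-subsequence (Apriori property).
        candidates = set()
        for a in frequent:
            for b in frequent:
                if a[1:] != b[:-1]:
                    continue
                cand = a + b[-1:]
                if all(cand[:i] + cand[i + 1:] in known for i in range(len(cand))):
                    candidates.add(cand)
        # Step 2, support counting: one scan of the database for this level.
        counts = {c: support(c) for c in candidates}
        frequent = sorted(c for c, n in counts.items() if n >= min_support)
        result.update({c: counts[c] for c in frequent})
    return result


# Hypothetical sequence database, e.g. pages visited or items bought in order.
database = [("a", "b", "c"), ("a", "c"), ("a", "b", "c", "d"), ("b", "c")]
```

With `min_support=3`, the items `a`, `b` and `c` are frequent. Only `("a", "c")` and `("b", "c")` survive as 2-item patterns, and no 3-item candidate can be generated, so the search stops.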
What are Time Series?

A time series is a collection of observations made sequentially in time. A time
series is a sequence of data points recorded at specific points in time, most
often at regular time intervals (seconds, hours, days, months, etc.). ... Time
series data mining can generate valuable information for long-term business
decisions.
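A basic time series operation is smoothing the series to expose its long-term trend. The simple moving average shown below is one common technique for this, although the notes do not name a specific method. It replaces each point with the mean of a window of consecutive points. The monthly sales figures are hypothetical.

```python
def moving_average(series, window):
    """Simple moving average: the mean of each run of `window` consecutive points."""
    if not 1 <= window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [sum(series[i:i + window]) / window for i in range(len(series) - window + 1)]


# Hypothetical monthly sales figures, recorded at regular intervals.
monthly_sales = [120, 135, 128, 150, 162, 158, 175]
```

Smoothing `monthly_sales` with a 3-month window removes the month-to-month dips and shows a steadily rising trend. That kind of summary is more useful for long-term decisions than the raw values.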
