
Data mining functionality:

Associations:
The Associations mining function finds items in your data that are associated with each other in a
meaningful way.

 Business context
The goal of the Associations mining function is to find items that are consistently
associated with each other in a meaningful way. For example, you can analyze purchase
transactions to discover combinations of goods that are often purchased together. The
Associations mining function answers the question: If certain items are present in a
transaction, what other item or items are likely to be present in the same transaction?
 Concepts
 Items and groups
When you define associations mining settings, you can specify items and groups.
 Rule body and rule head
 Support in an association rule
 Confidence in an association rule
 Lift in an association rule
The lift value helps you to interpret the importance of a rule. Lift is a measure that is
calculated for each rule and displayed in the visualizer. However, unlike minimum support
or minimum confidence, you cannot define a minimum lift in the settings.
 Name mappings
 Taxonomies
 Rule length and item constraints
 Active or inactive fields
Active fields are used to build a model. Inactive fields are ignored when the model is built.
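
The three rule measures listed above can be illustrated with a small Python sketch. It assumes the common textbook definitions (support is the share of transactions that contain both rule body and rule head, confidence is that count divided by the number of transactions that contain the rule body, and lift is confidence divided by the share of transactions that contain the rule head). The function name, the sample data, and the item names are hypothetical, and the product might compute or display these values differently, for example as percentages.

```python
from typing import Dict, Iterable, List, Set


def rule_measures(transactions: List[Set[str]],
                  body: Iterable[str],
                  head: Iterable[str]) -> Dict[str, float]:
    """Return support, confidence, and lift for the rule body => head.

    Assumes a non-empty list of transactions.
    """
    body_set, head_set = set(body), set(head)
    total = len(transactions)
    # Count the transactions that contain the body, the head, and both.
    n_body = sum(1 for t in transactions if body_set <= t)
    n_head = sum(1 for t in transactions if head_set <= t)
    n_both = sum(1 for t in transactions if (body_set | head_set) <= t)

    support = n_both / total
    confidence = n_both / n_body if n_body else 0.0
    head_share = n_head / total
    lift = confidence / head_share if head_share else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}


# Hypothetical purchase transactions, one set of items per transaction.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "jam"},
    {"bread", "butter", "jam"},
]
measures = rule_measures(transactions, body={"bread"}, head={"butter"})
print(measures)
```

For this sample data, the rule bread => butter has a support of 0.6 and a confidence of 0.75. Its lift is about 1.25, which means that butter appears more often in transactions with bread than in transactions overall.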


Classification:
IM Modeling enables you to create Classification models and to validate or test these models.
You can use Classification models for purposes such as the following:

 To analyze why a certain classification was made
 To predict a classification for new data

You can use the Classification mining function to do the following, for example:

 Approve or deny insurance claims
 Detect credit-card fraud
 Identify defects in images of manufactured parts
 Diagnose error conditions

Other suitable applications are target marketing, medical diagnosis, medical treatment
effectiveness, inventory replenishment, and store location planning.

Use IM Visualization to view and analyze the Classification models, or use IM Scoring to score
new data records against a model.

For example, an insurance company has data about customers who allowed their insurance to
lapse and those who did not. How can the company best use this information to identify such
customers in the future?

The existing insurance customers already belong to a known class: they are 'classified' as either having
allowed their insurance to lapse or not. The company can use the Classification mining function to create a risk
group profile in the form of a data mining model. This profile, or model, contains the common
attributes of the lapsed customers, compared to the other customers. The insurance company can
then apply this profile to new customers (as yet 'unclassified') to ascertain if they belong to the
risk group. The procedure is as follows:

1. The insurance company uses an IM Modeling Classification training run to identify the
attributes of each defined customer risk class, and to create a model.
2. The insurer can use IM Modeling to test the accuracy of this model by applying the
model to test data with known customer risk classes.
3. The insurer can use IM Scoring to apply the tested model to new data. This will predict
which customers are likely to let their insurance lapse in the future.
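
The train, test, and score steps above can be sketched in plain Python. This is not the IM Modeling decision tree algorithm. As a stand-in, the sketch learns the majority class for each value of a single input field, which is enough to show the workflow. All function names, field names (`payment`, `status`), and records are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List

Record = Dict[str, str]


def train(records: List[Record], field: str, label: str) -> dict:
    """Step 1: learn the majority class label for each value of one input field."""
    counts = defaultdict(Counter)
    for record in records:
        counts[record[field]][record[label]] += 1
    rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
    # Fallback class for field values that were never seen in training.
    default = Counter(r[label] for r in records).most_common(1)[0][0]
    return {"field": field, "rules": rules, "default": default}


def score(model: dict, record: Record) -> str:
    """Step 3: predict the class of one record."""
    return model["rules"].get(record[model["field"]], model["default"])


def accuracy(model: dict, records: List[Record], label: str) -> float:
    """Step 2: share of test records whose known class is predicted correctly."""
    hits = sum(1 for r in records if score(model, r) == r[label])
    return hits / len(records)


training_data = [
    {"payment": "invoice", "status": "lapsed"},
    {"payment": "invoice", "status": "lapsed"},
    {"payment": "invoice", "status": "active"},
    {"payment": "debit", "status": "active"},
    {"payment": "debit", "status": "active"},
    {"payment": "debit", "status": "active"},
    {"payment": "debit", "status": "lapsed"},
]
test_data = [
    {"payment": "invoice", "status": "lapsed"},
    {"payment": "debit", "status": "active"},
    {"payment": "invoice", "status": "active"},
    {"payment": "debit", "status": "active"},
]

model = train(training_data, field="payment", label="status")
test_accuracy = accuracy(model, test_data, label="status")
new_customers = [{"payment": "invoice"}, {"payment": "debit"}, {"payment": "cash"}]
predictions = [score(model, c) for c in new_customers]
print(test_accuracy, predictions)
```

The test data has known classes, so the accuracy of the model can be measured before the model is applied to new, unclassified customers.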

 Concept
You can use the Classification mining function to gain a deeper understanding of your
database structure or to structure unclassified databases.
 Decision tree classification
 Class label field
 Maximum purity per internal node
 Maximum tree depth
 Minimum number of records per internal node
 Tolerated levels of incorrect predictions
 Cost matrix
 Active or inactive fields


Clustering:
IM Modeling provides the Clustering mining function. The Clustering mining function includes the
following algorithms:

 Distribution-based Clustering
 Center-based Clustering

These Clustering algorithms group data records on the basis of how similar the data records are.

A data record might, for example, consist of information about a customer. The Clustering
algorithm groups similar customers together. At the same time it maximizes the differences
between the different customer groups that are formed in this way.

The groups that are found are known as clusters. Each cluster tells a specific story about
customer identity or behavior, for example, about their demographic background, or about their
preferred products or product combinations. In this way, customers that are similar are grouped
together in homogeneous groups that are then available for marketing or for other business
processes.

Business context

The Clustering mining function is largely used in customer relationship management (CRM). It provides business insights that enable
firms to offer specific, personalized services and products to their customers.

In the commercial environment, clustering is used in the following areas:

 Cross-marketing
 Cross-selling
 Customizing marketing plans for different customer types
 Deciding which media approach to use
 Understanding shopping goals
 Many other areas

 Concept
The Clustering mining function searches the input data for characteristics that most frequently
occur in common. It groups the input data into clusters. The members of each cluster have
similar properties. There are no preconceived notions of what patterns exist within the data.
Clustering is a discovery process.
 Distribution-based Clustering
Distribution-based Clustering provides fast and natural clustering of very large databases. It
automatically determines the number of clusters to be generated.
 Center-based Clustering
Center-based Clustering is based on a Kohonen feature map.
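
To make the idea of a Kohonen feature map more concrete, here is a minimal sketch of a one-dimensional self-organizing map in plain Python. It assumes the usual scheme: each record is assigned to the closest unit, and that unit and its map neighbors are moved toward the record, with a learning rate that decays and a neighborhood that shrinks over time. The function names, the parameter values, the deterministic initialization, and the sample records are assumptions made for illustration. The text does not describe how the product implements or parameterizes the algorithm.

```python
from typing import List, Sequence


def best_unit(units: List[List[float]], record: Sequence[float]) -> int:
    """Index of the unit whose center is closest to the record."""
    def sq_dist(unit: List[float]) -> float:
        return sum((u - r) ** 2 for u, r in zip(unit, record))
    return min(range(len(units)), key=lambda i: sq_dist(units[i]))


def train_map(records: List[Sequence[float]], n_units: int,
              epochs: int = 20, rate: float = 0.5) -> List[List[float]]:
    """Train a one-dimensional map of n_units cluster centers."""
    dim = len(records[0])
    low = [min(r[d] for r in records) for d in range(dim)]
    high = [max(r[d] for r in records) for d in range(dim)]
    # Deterministic start: spread the unit centers evenly across the data range.
    units = [[low[d] + (high[d] - low[d]) * (i + 0.5) / n_units
              for d in range(dim)]
             for i in range(n_units)]
    for epoch in range(epochs):
        alpha = rate * (1 - epoch / epochs)        # learning rate decays
        radius = 1 if epoch < epochs // 2 else 0   # neighborhood shrinks
        for record in records:
            winner = best_unit(units, record)
            for i, unit in enumerate(units):
                distance = abs(i - winner)
                if distance <= radius:
                    pull = alpha / (1 + distance)  # neighbors move less
                    for d in range(dim):
                        unit[d] += pull * (record[d] - unit[d])
    return units


# Hypothetical customer records with two numeric fields each.
customers = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
             (9.0, 9.0), (8.0, 9.0), (9.0, 8.0)]
units = train_map(customers, n_units=2)
clusters = [best_unit(units, c) for c in customers]
print(clusters)
```

After training, each record is assigned to the cluster of its closest unit, so similar customers end up in the same cluster.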


Regression mining function:


Regression is similar to classification except for the type of the predicted value: classification
predicts a class label, whereas regression predicts a numeric value. Regression can also determine the
input fields that are most relevant to predict the target field values. The predicted value might not
be identical to any value contained in the data that is used to build the model. An example
application is customer ranking by expected profit.
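
As a minimal sketch of the customer ranking example, the following Python code fits a straight line with ordinary least squares and ranks new customers by predicted profit. The choice of a single input field, the function names, and all numbers are hypothetical. The text does not state which regression algorithms the product uses.

```python
from typing import Dict, List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for one input field: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model: Tuple[float, float], x: float) -> float:
    """Predicted numeric target value for one input value."""
    slope, intercept = model
    return intercept + slope * x


# Hypothetical training data: years as a customer and observed annual profit.
years = [1.0, 2.0, 3.0, 4.0]
profit = [12.0, 19.0, 31.0, 38.0]
model = fit_line(years, profit)

# Score new customers and rank them by expected profit, highest first.
new_customers: Dict[str, float] = {"C1": 5.0, "C2": 1.5, "C3": 3.0}
expected = {name: predict(model, x) for name, x in new_customers.items()}
ranking = sorted(expected, key=expected.get, reverse=True)
print(expected, ranking)
```

The predicted profit for customer C1 (47.5) does not occur anywhere in the training data, which illustrates that a regression model can predict values that it has not seen.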

When a model is applied, IM Scoring assigns a predicted value to each customer being scored.

Sequence rules:
Sequence Rules models contain various sequence rules.

A sequence rule consists of a previous sequence in the rule body that leads to a consecutive item
set in the rule head. The consecutive item set occurs after a particular period of time.

A sequence is an ordered set of item sets. Sequences contain the following grouping levels:

 Events that happen simultaneously form a single transaction or an item set.


 Each item or each item set belongs to a transaction group. For example, a purchased
article belongs to a customer, a particular page-click belongs to a Web surfer, or a
component belongs to a produced car. Several item sets that occur at different times and
belong to the same transaction group form a sequence.

An item set is a collection of items. There are also singleton item sets that contain only one item.
An item is a single part or event in a collection of parts or events.

A transaction is a set of items that are linked by a common key value. For example, in a car
repair scenario, the item sets might represent the repair orders. The items represent the
replacement parts that are required for a repair order. The transaction is the set of replacement
parts that are required for a particular repair order.

This means that the input data must contain particular fields. The following table shows the
required fields for the Sequence Rules mining function. It also shows the different meanings and
contents these fields can have in a customer-purchase analysis in the retail industry and in a
quality-control analysis in the manufacturing industry.

Table 1. Required fields for the Sequence Rules mining function

Required field: Sequence field or transaction group field
   Purpose: Identifies the object or the entity to which the items in this data record belong.
   Customer purchase analysis in the retail industry: Customer ID, for example, 155634
   Quality control analysis in the manufacturing industry: Product ID, 639785

Required field: Group field or time stamp field
   Purpose: Marks the items or the events of this data record as parts of one particular transaction by assigning to them a transaction ID or a time stamp.
   Customer purchase analysis in the retail industry: Date of a purchase, for example, 2006/03/01 10:45am
   Quality control analysis in the manufacturing industry: Date and time of a process step or of a particular error situation, for example, 2006/03/01 3:15pm

Required field: One or more item fields
   Purpose: Contain item values in textual form, as integer keys, or as identifiers.
   Customer purchase analysis in the retail industry: Articles, for example, apples or mineral water
   Quality control analysis in the manufacturing industry: Replacement part ID, for example, M391X771

The Sequence Rules algorithm finds sequences like this:

itemset1 >>> itemset2 >>> ... => itemsetN

Where:

 itemset1, itemset2, and itemsetN are sets of single items or events that
happen simultaneously, for example:

itemset1 = {item11, item12, ..., item1N}

 >>> denotes that time elapses between the occurrence of the item set on the left-hand
side and the occurrence of the item set on the right-hand side.
 => also denotes the elapsed time between the occurrences of two item sets. However, this
last time step within a sequence rule is interpreted as the separation between
the sequence rule body and the sequence rule head.

This means that a Sequence Rules training run results in a list of frequent sequences that include
various item sets.
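
The following Python sketch shows how input records with the three required fields could be turned into sequences, and how the share of sequences that contain a given pattern could be counted. The function names and the sample records are hypothetical. The time stamps are written as ISO-style strings so that they sort chronologically, and the matching logic is a plain in-order containment check, not the product's mining algorithm.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

ItemSequence = List[FrozenSet[str]]


def build_sequences(records: Sequence[Tuple[int, str, str]]) -> Dict[int, ItemSequence]:
    """Group (sequence field, time stamp, item) records into ordered item sets."""
    by_group: Dict[int, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for group_id, stamp, item in records:
        # Items with the same group and time stamp form one item set.
        by_group[group_id][stamp].add(item)
    return {group_id: [frozenset(stamps[t]) for t in sorted(stamps)]
            for group_id, stamps in by_group.items()}


def contains(sequence: ItemSequence, pattern: Sequence[Set[str]]) -> bool:
    """True if the item sets of the pattern occur in this order in the sequence."""
    position = 0
    for item_set in sequence:
        if position < len(pattern) and pattern[position] <= item_set:
            position += 1
    return position == len(pattern)


def support(sequences: Dict[int, ItemSequence], pattern: Sequence[Set[str]]) -> float:
    """Share of sequences that contain the pattern."""
    return sum(contains(s, pattern) for s in sequences.values()) / len(sequences)


# Hypothetical retail records: customer ID, purchase date, article.
records = [
    (155634, "2006-03-01", "apples"),
    (155634, "2006-03-01", "mineral water"),
    (155634, "2006-03-08", "bread"),
    (155635, "2006-03-02", "apples"),
    (155635, "2006-03-09", "bread"),
    (155636, "2006-03-03", "bread"),
    (155636, "2006-03-10", "apples"),
]
sequences = build_sequences(records)
# Pattern for the rule {apples} => {bread}: apples first, bread at a later time.
rule_support = support(sequences, [{"apples"}, {"bread"}])
print(sequences[155634], rule_support)
```

In this sample, two of the three customers bought apples and then bread at a later date, so the pattern is contained in two thirds of the sequences. The third customer bought the same articles in the opposite order and does not count.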


Autonomic variable selection:


IM Modeling provides a mechanism for autonomic variable selection.

IM Modeling provides autonomic variable selection for the following algorithms:

 Associations
 Decision tree (Classification)
 Center-based Clustering
 Distribution-based Clustering
 Sequence Rules
 Regression

Autonomic variable selection removes fields from the input data that are not useful for the
mining run. These might be, for example, the following fields:
