Associations:
The Associations mining function finds items in your data that are associated with each other in a
meaningful way.
Business context
The goal of the Associations mining function is to find items that are consistently
associated with each other in a meaningful way. For example, you can analyze purchase
transactions to discover combinations of goods that are often purchased together. The
Associations mining function answers the question: If certain items are present in a
transaction, what other item or items are likely to be present in the same transaction?
Concepts
Items and groups
When you define associations mining settings, you can specify items and groups.
Rule body and rule head
Support in an association rule
Confidence in an association rule
Lift in an association rule
The lift value is a measure that helps you interpret the importance of a rule. It is
displayed in the visualizer. However, unlike minimum support or minimum confidence, you
cannot define a minimum lift in the settings.
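The three rule measures named above can be illustrated with a small sketch. It assumes the commonly used textbook definitions (support as the fraction of transactions that contain both rule body and rule head, confidence as rule support divided by body support, lift as confidence divided by head support). The function names and the sample transactions are hypothetical and are not part of IM Modeling.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = frozenset(items)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def rule_measures(transactions, body, head):
    """Return (support, confidence, lift) for the rule body => head."""
    body, head = frozenset(body), frozenset(head)
    body_support = support(transactions, body)
    head_support = support(transactions, head)
    if body_support == 0 or head_support == 0:
        raise ValueError("rule body and rule head must each occur at least once")
    rule_support = support(transactions, body | head)
    confidence = rule_support / body_support
    lift = confidence / head_support
    return rule_support, confidence, lift


# Hypothetical purchase transactions, one item set per transaction.
transactions = [
    frozenset({"bread", "butter"}),
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread"}),
    frozenset({"milk"}),
]

rule_support, confidence, lift = rule_measures(transactions, {"bread"}, {"butter"})
print(f"support={rule_support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

Under these definitions, a lift above 1 suggests that the rule head occurs together with the rule body more often than would be expected if the two were independent.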
Name mappings
Taxonomies
Rule length and item constraints
Active or inactive fields
Active fields are used to build a model. Inactive fields are ignored when the model is built.
Classification:
IM Modeling enables you to create Classification models and to validate or test these models,
such as:
Other suitable applications are target marketing, medical diagnosis, medical treatment
effectiveness, inventory replenishment, and store location planning.
Use IM Visualization to view and analyze the Classification models, or use IM Scoring to score
new data records against a model.
For example, an insurance company has data about customers who allowed their insurance to
lapse and those who did not. How can the company best use this information to identify such
customers in the future?
The insurance customers already belong to a certain class: they are 'classified' as having allowed
their insurance to lapse or not. The company can use the Classification mining function to create a risk
group profile in the form of a data mining model. This profile, or model, contains the common
attributes of the lapsed customers, compared to the other customers. The insurance company can
then apply this profile to new customers (as yet 'unclassified') to ascertain if they belong to the
risk group. The procedure is as follows:
1. The insurance company uses an IM Modeling Classification training run to identify the
attributes of each defined customer risk class, and to create a model.
2. The insurer can use IM Modeling to test the accuracy of this model by applying the
model to test data with known customer risk classes.
3. The insurer can use IM Scoring to apply the tested model to new data. This will predict
which customers are likely to let their insurance lapse in the future.
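The train, test, and score steps above can be sketched as follows. This is a minimal illustration only: it uses a very simple one-attribute majority-class model as a stand-in, not the decision tree algorithm of IM Modeling, and the field names (`payment`, `lapsed`), function names, and records are hypothetical.

```python
from collections import Counter, defaultdict


def train_model(records, attribute, label):
    """Training run: learn the majority class for each value of one attribute."""
    counts = defaultdict(Counter)
    for record in records:
        counts[record[attribute]][record[label]] += 1
    default = Counter(r[label] for r in records).most_common(1)[0][0]
    rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
    return {"attribute": attribute, "rules": rules, "default": default}


def score_record(model, record):
    """Scoring: predict the class of one record, falling back to the default class."""
    return model["rules"].get(record[model["attribute"]], model["default"])


def evaluate(model, records, label):
    """Test run: fraction of records with known classes that are predicted correctly."""
    correct = sum(1 for r in records if score_record(model, r) == r[label])
    return correct / len(records)


# Hypothetical training data with known classes.
training_data = [
    {"payment": "monthly", "lapsed": "yes"},
    {"payment": "monthly", "lapsed": "yes"},
    {"payment": "monthly", "lapsed": "no"},
    {"payment": "annual", "lapsed": "no"},
    {"payment": "annual", "lapsed": "no"},
]
# Hypothetical test data, also with known classes.
test_data = [
    {"payment": "monthly", "lapsed": "yes"},
    {"payment": "annual", "lapsed": "no"},
    {"payment": "annual", "lapsed": "yes"},
    {"payment": "monthly", "lapsed": "yes"},
]

model = train_model(training_data, "payment", "lapsed")   # step 1
accuracy = evaluate(model, test_data, "lapsed")           # step 2
prediction = score_record(model, {"payment": "monthly"})  # step 3, new customer
print(accuracy, prediction)
```

Keeping the test data separate from the training data is what makes the accuracy figure in step 2 meaningful for new, unclassified customers.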
Concept
You can use the Classification mining function to gain a deeper understanding of your
database structure or to structure unclassified databases.
Decision tree classification
Class label field
Maximum purity per internal node
Maximum tree depth
Minimum number of records per internal node
Tolerated levels of incorrect predictions
Cost matrix
Active or inactive fields
Clustering:
IM Modeling provides the Clustering mining function. The Clustering mining function includes the
following algorithms:
Distribution-based Clustering
Center-based Clustering
These Clustering algorithms group data records on the basis of how similar the data records are.
A data record might, for example, consist of information about a customer. The Clustering
algorithm groups similar customers together. At the same time it maximizes the differences
between the different customer groups that are formed in this way.
The groups that are found are known as clusters. Each cluster tells a specific story about
customer identity or behavior, for example, about their demographic background, or about their
preferred products or product combinations. In this way, customers that are similar are grouped
together in homogeneous groups that are then available for marketing or for other business
processes.
Business context
The Clustering mining function is widely used in customer relationship management (CRM). It
provides business insights that enable firms to offer specific, personalized services and products
to their customers. Typical application areas include the following:
Cross-marketing
Cross-selling
Customizing marketing plans for different customer types
Deciding which media approach to use
Understanding shopping goals
Many other areas
Concept
The Clustering mining function searches the input data for characteristics that most frequently
occur in common. It groups the input data into clusters. The members of each cluster have
similar properties. There are no preconceived notions of what patterns exist within the data.
Clustering is a discovery process.
Distribution-based Clustering
Distribution-based Clustering provides fast and natural clustering of very large databases. It
automatically determines the number of clusters to be generated.
Center-based Clustering
Center-based Clustering is based on a Kohonen feature map.
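To give a rough idea of how a Kohonen feature map groups records, here is a minimal sketch of a one-dimensional map in plain Python. It is not the IM Modeling implementation: the initialization, the linearly decaying learning rate and neighborhood width, the default parameter values, and the sample records are all assumptions chosen for illustration.

```python
import math


def nearest_unit(units, record):
    """Index of the unit whose weight vector is closest to the record."""
    def sq_dist(weights):
        return sum((w - x) ** 2 for w, x in zip(weights, record))
    return min(range(len(units)), key=lambda i: sq_dist(units[i]))


def train_som(records, n_units, epochs=20, lr0=0.5, sigma0=1.0):
    """Train a 1-D Kohonen map with n_units >= 2 units on numeric records."""
    dim = len(records[0])
    lo = [min(r[d] for r in records) for d in range(dim)]
    hi = [max(r[d] for r in records) for d in range(dim)]
    # Assumption: spread the initial units evenly across the range of the data.
    units = [
        [lo[d] + (hi[d] - lo[d]) * i / (n_units - 1) for d in range(dim)]
        for i in range(n_units)
    ]
    for epoch in range(epochs):
        remaining = 1 - epoch / epochs
        lr = lr0 * remaining        # learning rate shrinks over time
        sigma = sigma0 * remaining  # neighborhood width shrinks over time
        for record in records:
            winner = nearest_unit(units, record)
            for i, weights in enumerate(units):
                # Units close to the winner on the map are pulled along with it.
                h = math.exp(-((i - winner) ** 2) / (2 * sigma ** 2))
                for d in range(dim):
                    weights[d] += lr * h * (record[d] - weights[d])
    return units


# Hypothetical customer records with two numeric fields.
records = [[1, 1], [1, 2], [2, 1], [8, 9], [9, 8], [9, 9]]
units = train_som(records, n_units=2)
clusters = [nearest_unit(units, r) for r in records]
print(clusters)
```

After training, each record is assigned to the cluster of its nearest unit, so similar records end up in the same cluster.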
Clusterer operators
When a model is applied, IM Scoring assigns a predicted value to each customer being scored.
Sequence rules:
Sequence Rules models contain various sequence rules.
A sequence rule consists of a previous sequence in the rule body that leads to a consecutive item
set in the rule head. The consecutive item set occurs after a particular period of time.
A sequence is an ordered set of item sets. Sequences contain the following grouping levels:
An item set is a collection of items. There are also singleton item sets that contain only one item.
An item is a single part or event in a collection of parts or events.
A transaction is a set of items that are linked by a common key value. For example, in a car
repair scenario, the item sets might represent the repair orders. The items represent the
replacement parts that are required for a repair order. The transaction is the set of replacement
parts that are required for a particular repair order.
This means that the input data must contain particular fields. The following table shows the
required fields for the Sequence Rules mining function. It also shows the different meanings and
contents these fields can have in a customer-purchase analysis in the retail industry and in a
quality-control analysis in the manufacturing industry.
Where:
itemset1, itemset2, and itemsetN are sets of single items or events that
happen simultaneously.
This means that a Sequence Rules training run results in a list of frequent sequences that include
various item sets.
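The notion of a frequent sequence can be illustrated with a short sketch. It assumes that a sequence is represented as an ordered list of item sets, that each customer history is an ordered list of transactions, and that support is the fraction of histories that contain the sequence in order. The function names and purchase data are hypothetical and do not reflect the IM Modeling input format.

```python
def contains_sequence(history, sequence):
    """True if the item sets of `sequence` occur, in order, in distinct transactions."""
    position = 0
    for itemset in sequence:
        # Advance to the next transaction that includes every item of this item set.
        while position < len(history) and not itemset <= history[position]:
            position += 1
        if position == len(history):
            return False
        position += 1  # the next item set must occur in a later transaction
    return True


def sequence_support(histories, sequence):
    """Fraction of histories that contain the sequence."""
    hits = sum(1 for history in histories if contains_sequence(history, sequence))
    return hits / len(histories)


# Hypothetical purchase histories: one ordered list of transactions per customer.
histories = [
    [frozenset({"printer"}), frozenset({"paper", "ink"}), frozenset({"ink"})],
    [frozenset({"printer", "paper"}), frozenset({"ink"})],
    [frozenset({"ink"}), frozenset({"printer"})],
]

sequence = [frozenset({"printer"}), frozenset({"ink"})]
print(sequence_support(histories, sequence))
```

In this sketch the third customer does not count toward the support, because the ink purchase happens before the printer purchase rather than after it.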
Associations
Decision tree (Classification)
Center-based Clustering
Distribution-based Clustering
Sequence Rules
Regression
Autonomic variable selection removes fields from the input data that are not useful for the
mining run. These might be, for example, the following fields: