
Contents

Preface

1 Introduction
  1.1 What Is Data Mining?
  1.2 Motivating Challenges
  1.3 The Origins of Data Mining
  1.4 Data Mining Tasks
  1.5 Scope and Organization of the Book
  1.6 Bibliographic Notes
  1.7 Exercises

2 Data
  2.1 Types of Data
    2.1.1 Attributes and Measurement
    2.1.2 Types of Data Sets
  2.2 Data Quality
    2.2.1 Measurement and Data Collection Issues
    2.2.2 Issues Related to Applications
  2.3 Data Preprocessing
    2.3.1 Aggregation
    2.3.2 Sampling
    2.3.3 Dimensionality Reduction
    2.3.4 Feature Subset Selection
    2.3.5 Feature Creation
    2.3.6 Discretization and Binarization
    2.3.7 Variable Transformation
  2.4 Measures of Similarity and Dissimilarity
    2.4.1 Basics
    2.4.2 Similarity and Dissimilarity between Simple Attributes
    2.4.3 Dissimilarities between Data Objects
    2.4.4 Similarities between Data Objects
    2.4.5 Examples of Proximity Measures
    2.4.6 Issues in Proximity Calculation
    2.4.7 Selecting the Right Proximity Measure
  2.5 Bibliographic Notes
  2.6 Exercises

3 Exploring Data
  3.1 The Iris Data Set
  3.2 Summary Statistics
    3.2.1 Frequencies and the Mode
    3.2.2 Percentiles
    3.2.3 Measures of Location: Mean and Median
    3.2.4 Measures of Spread: Range and Variance
    3.2.5 Multivariate Summary Statistics
    3.2.6 Other Ways to Summarize the Data
  3.3 Visualization
    3.3.1 Motivations for Visualization
    3.3.2 General Concepts
    3.3.3 Techniques
    3.3.4 Visualizing Higher-Dimensional Data
    3.3.5 Do's and Don'ts
  3.4 OLAP and Multidimensional Data Analysis
    3.4.1 Representing Iris Data as a Multidimensional Array
    3.4.2 Multidimensional Data: The General Case
    3.4.3 Analyzing Multidimensional Data
    3.4.4 Final Comments on Multidimensional Data Analysis
  3.5 Bibliographic Notes
  3.6 Exercises

4 Classification: Basic Concepts, Decision Trees, and Model Evaluation
  4.1 Preliminaries
  4.2 General Approach to Solving a Classification Problem
  4.3 Decision Tree Induction
    4.3.1 How a Decision Tree Works
    4.3.2 How to Build a Decision Tree
    4.3.3 Methods for Expressing Attribute Test Conditions
    4.3.4 Measures for Selecting the Best Split
    4.3.5 Algorithm for Decision Tree Induction
    4.3.6 An Example: Web Robot Detection
    4.3.7 Characteristics of Decision Tree Induction
  4.4 Model Overfitting
    4.4.1 Overfitting Due to Presence of Noise
    4.4.2 Overfitting Due to Lack of Representative Samples
    4.4.3 Overfitting and the Multiple Comparison Procedure
    4.4.4 Estimation of Generalization Errors
    4.4.5 Handling Overfitting in Decision Tree Induction
  4.5 Evaluating the Performance of a Classifier
    4.5.1 Holdout Method
    4.5.2 Random Subsampling
    4.5.3 Cross-Validation
    4.5.4 Bootstrap
  4.6 Methods for Comparing Classifiers
    4.6.1 Estimating a Confidence Interval for Accuracy
    4.6.2 Comparing the Performance of Two Models
    4.6.3 Comparing the Performance of Two Classifiers
  4.7 Bibliographic Notes
  4.8 Exercises

5 Classification: Alternative Techniques
  5.1 Rule-Based Classifier
    5.1.1 How a Rule-Based Classifier Works
    5.1.2 Rule-Ordering Schemes
    5.1.3 How to Build a Rule-Based Classifier
    5.1.4 Direct Methods for Rule Extraction
    5.1.5 Indirect Methods for Rule Extraction
    5.1.6 Characteristics of Rule-Based Classifiers
  5.2 Nearest-Neighbor Classifiers
    5.2.1 Algorithm
    5.2.2 Characteristics of Nearest-Neighbor Classifiers
  5.3 Bayesian Classifiers
    5.3.1 Bayes Theorem
    5.3.2 Using the Bayes Theorem for Classification
    5.3.3 Naïve Bayes Classifier
    5.3.4 Bayes Error Rate
    5.3.5 Bayesian Belief Networks
  5.4 Artificial Neural Network (ANN)
    5.4.1 Perceptron
    5.4.2 Multilayer Artificial Neural Network
    5.4.3 Characteristics of ANN
  5.5 Support Vector Machine (SVM)
    5.5.1 Maximum Margin Hyperplanes
    5.5.2 Linear SVM: Separable Case
    5.5.3 Linear SVM: Nonseparable Case
    5.5.4 Nonlinear SVM
    5.5.5 Characteristics of SVM
  5.6 Ensemble Methods
    5.6.1 Rationale for Ensemble Method
    5.6.2 Methods for Constructing an Ensemble Classifier
    5.6.3 Bias-Variance Decomposition
    5.6.4 Bagging
    5.6.5 Boosting
    5.6.6 Random Forests
    5.6.7 Empirical Comparison among Ensemble Methods
  5.7 Class Imbalance Problem
    5.7.1 Alternative Metrics
    5.7.2 The Receiver Operating Characteristic Curve
    5.7.3 Cost-Sensitive Learning
    5.7.4 Sampling-Based Approaches
  5.8 Multiclass Problem
  5.9 Bibliographic Notes
  5.10 Exercises

6 Association Analysis: Basic Concepts and Algorithms
  6.1 Problem Definition
  6.2 Frequent Itemset Generation
    6.2.1 The Apriori Principle
    6.2.2 Frequent Itemset Generation in the Apriori Algorithm
    6.2.3 Candidate Generation and Pruning
    6.2.4 Support Counting
    6.2.5 Computational Complexity
  6.3 Rule Generation
    6.3.1 Confidence-Based Pruning
    6.3.2 Rule Generation in Apriori Algorithm
    6.3.3 An Example: Congressional Voting Records
  6.4 Compact Representation of Frequent Itemsets
    6.4.1 Maximal Frequent Itemsets
    6.4.2 Closed Frequent Itemsets
  6.5 Alternative Methods for Generating Frequent Itemsets
  6.6 FP-Growth Algorithm
    6.6.1 FP-Tree Representation
    6.6.2 Frequent Itemset Generation in FP-Growth Algorithm
  6.7 Evaluation of Association Patterns
    6.7.1 Objective Measures of Interestingness
    6.7.2 Measures beyond Pairs of Binary Variables
    6.7.3 Simpson's Paradox
  6.8 Effect of Skewed Support Distribution
  6.9 Bibliographic Notes
  6.10 Exercises

7 Association Analysis: Advanced Concepts
  7.1 Handling Categorical Attributes
  7.2 Handling Continuous Attributes
    7.2.1 Discretization-Based Methods
    7.2.2 Statistics-Based Methods
    7.2.3 Non-discretization Methods
  7.3 Handling a Concept Hierarchy
  7.4 Sequential Patterns
    7.4.1 Problem Formulation
    7.4.2 Sequential Pattern Discovery
    7.4.3 Timing Constraints
    7.4.4 Alternative Counting Schemes
  7.5 Subgraph Patterns
    7.5.1 Graphs and Subgraphs
    7.5.2 Frequent Subgraph Mining
    7.5.3 Apriori-like Method
    7.5.4 Candidate Generation
    7.5.5 Candidate Pruning
    7.5.6 Support Counting
  7.6 Infrequent Patterns
    7.6.1 Negative Patterns
    7.6.2 Negatively Correlated Patterns
    7.6.3 Comparisons among Infrequent Patterns, Negative Patterns, and Negatively Correlated Patterns
    7.6.4 Techniques for Mining Interesting Infrequent Patterns
    7.6.5 Techniques Based on Mining Negative Patterns
    7.6.6 Techniques Based on Support Expectation
  7.7 Bibliographic Notes
  7.8 Exercises

8 Cluster Analysis: Basic Concepts and Algorithms
  8.1 Overview
    8.1.1 What Is Cluster Analysis?
    8.1.2 Different Types of Clusterings
    8.1.3 Different Types of Clusters
  8.2 K-means
    8.2.1 The Basic K-means Algorithm
    8.2.2 K-means: Additional Issues
    8.2.3 Bisecting K-means
    8.2.4 K-means and Different Types of Clusters
    8.2.5 Strengths and Weaknesses
    8.2.6 K-means as an Optimization Problem
  8.3 Agglomerative Hierarchical Clustering
    8.3.1 Basic Agglomerative Hierarchical Clustering Algorithm
    8.3.2 Specific Techniques
    8.3.3 The Lance-Williams Formula for Cluster Proximity
    8.3.4 Key Issues in Hierarchical Clustering
    8.3.5 Strengths and Weaknesses
  8.4 DBSCAN
    8.4.1 Traditional Density: Center-Based Approach
    8.4.2 The DBSCAN Algorithm
    8.4.3 Strengths and Weaknesses
  8.5 Cluster Evaluation
    8.5.1 Overview
    8.5.2 Unsupervised Cluster Evaluation Using Cohesion and Separation
    8.5.3 Unsupervised Cluster Evaluation Using the Proximity Matrix
    8.5.4 Unsupervised Evaluation of Hierarchical Clustering
    8.5.5 Determining the Correct Number of Clusters
    8.5.6 Clustering Tendency
    8.5.7 Supervised Measures of Cluster Validity
    8.5.8 Assessing the Significance of Cluster Validity Measures
  8.6 Bibliographic Notes
  8.7 Exercises

9 Cluster Analysis: Additional Issues and Algorithms
  9.1 Characteristics of Data, Clusters, and Clustering Algorithms
    9.1.1 Example: Comparing K-means and DBSCAN
    9.1.2 Data Characteristics
    9.1.3 Cluster Characteristics
    9.1.4 General Characteristics of Clustering Algorithms
  9.2 Prototype-Based Clustering
    9.2.1 Fuzzy Clustering
    9.2.2 Clustering Using Mixture Models
    9.2.3 Self-Organizing Maps (SOM)
  9.3 Density-Based Clustering
    9.3.1 Grid-Based Clustering
    9.3.2 Subspace Clustering
    9.3.3 DENCLUE: A Kernel-Based Scheme for Density-Based Clustering
  9.4 Graph-Based Clustering
    9.4.1 Sparsification
    9.4.2 Minimum Spanning Tree (MST) Clustering
    9.4.3 OPOSSUM: Optimal Partitioning of Sparse Similarities Using METIS
    9.4.4 Chameleon: Hierarchical Clustering with Dynamic Modeling
    9.4.5 Shared Nearest Neighbor Similarity
    9.4.6 The Jarvis-Patrick Clustering Algorithm
    9.4.7 SNN Density
    9.4.8 SNN Density-Based Clustering
  9.5 Scalable Clustering Algorithms
    9.5.1 Scalability: General Issues and Approaches
    9.5.2 BIRCH
    9.5.3 CURE
  9.6 Which Clustering Algorithm?
  9.7 Bibliographic Notes
  9.8 Exercises

10 Anomaly Detection
  10.1 Preliminaries
    10.1.1 Causes of Anomalies
    10.1.2 Approaches to Anomaly Detection
    10.1.3 The Use of Class Labels
    10.1.4 Issues
  10.2 Statistical Approaches
    10.2.1 Detecting Outliers in a Univariate Normal Distribution
    10.2.2 Outliers in a Multivariate Normal Distribution
    10.2.3 A Mixture Model Approach for Anomaly Detection
    10.2.4 Strengths and Weaknesses
  10.3 Proximity-Based Outlier Detection
    10.3.1 Strengths and Weaknesses
  10.4 Density-Based Outlier Detection
    10.4.1 Detection of Outliers Using Relative Density
    10.4.2 Strengths and Weaknesses
  10.5 Clustering-Based Techniques
    10.5.1 Assessing the Extent to Which an Object Belongs to a Cluster
    10.5.2 Impact of Outliers on the Initial Clustering
    10.5.3 The Number of Clusters to Use
    10.5.4 Strengths and Weaknesses
  10.6 Bibliographic Notes
  10.7 Exercises

Appendix A Linear Algebra
  A.1 Vectors
    A.1.1 Definition
    A.1.2 Vector Addition and Multiplication by a Scalar
    A.1.3 Vector Spaces
    A.1.4 The Dot Product, Orthogonality, and Orthogonal Projections
    A.1.5 Vectors and Data Analysis
  A.2 Matrices
    A.2.1 Matrices: Definitions
    A.2.2 Matrices: Addition and Multiplication by a Scalar
    A.2.3 Matrices: Multiplication
    A.2.4 Linear Transformations and Inverse Matrices
    A.2.5 Eigenvalue and Singular Value Decomposition
    A.2.6 Matrices and Data Analysis
  A.3 Bibliographic Notes

Appendix B Dimensionality Reduction
  B.1 PCA and SVD
    B.1.1 Principal Components Analysis (PCA)
    B.1.2 SVD
  B.2 Other Dimensionality Reduction Techniques
    B.2.1 Factor Analysis
    B.2.2 Locally Linear Embedding (LLE)
    B.2.3 Multidimensional Scaling, FastMap, and ISOMAP
    B.2.4 Common Issues
  B.3 Bibliographic Notes

Appendix C Probability and Statistics
  C.1 Probability
    C.1.1 Expected Values
  C.2 Statistics
    C.2.1 Point Estimation
    C.2.2 Central Limit Theorem
    C.2.3 Interval Estimation
  C.3 Hypothesis Testing

Appendix D Regression
  D.1 Preliminaries
  D.2 Simple Linear Regression
    D.2.1 Least Square Method
    D.2.2 Analyzing Regression Errors
    D.2.3 Analyzing Goodness of Fit
  D.3 Multivariate Linear Regression
  D.4 Alternative Least-Square Regression Methods

Appendix E Optimization
  E.1 Unconstrained Optimization
    E.1.1 Numerical Methods
  E.2 Constrained Optimization
    E.2.1 Equality Constraints
    E.2.2 Inequality Constraints

Author Index

Subject Index

Copyright Permissions
