Introduction to Statistical Machine Learning
Ebook · 1,144 pages · 24 hours


About this ebook

Machine learning allows computers to learn and discern patterns without being explicitly programmed. When statistical techniques and machine learning are combined, they form a powerful tool for analyzing various kinds of data in many computer science and engineering areas, including image processing, speech processing, natural language processing, and robot control, as well as in fundamental sciences such as biology, medicine, astronomy, physics, and materials science.

Introduction to Statistical Machine Learning provides a general introduction to machine learning that covers a wide range of topics concisely and will help you bridge the gap between theory and practice. Part I discusses the fundamental concepts of statistics and probability that are used in describing machine learning algorithms. Parts II and III explain the two major approaches of machine learning: generative methods and discriminative methods. Part IV provides an in-depth look at advanced topics that play essential roles in making machine learning algorithms more useful in practice. The accompanying MATLAB/Octave programs provide you with the necessary practical skills needed to accomplish a wide range of data analysis tasks.

  • Provides the necessary background material to understand machine learning such as statistics, probability, linear algebra, and calculus
  • Complete coverage of the generative approach to statistical pattern recognition and the discriminative approach to statistical machine learning
  • Includes MATLAB/Octave programs so that readers can test the algorithms numerically and acquire both mathematical and practical skills in a wide range of data analysis tasks
  • Discusses a wide range of applications in machine learning and statistics and provides examples drawn from image processing, speech processing, natural language processing, robot control, as well as biology, medicine, astronomy, physics, and materials
Language: English
Release date: Oct 31, 2015
ISBN: 9780128023501
Author

Masashi Sugiyama


    Book preview

    Introduction to Statistical Machine Learning

    Masashi Sugiyama

    The University of Tokyo, Japan

    Table of Contents

    Cover image

    Title page

    Copyright

    Biography

    Preface

    Part 1

    INTRODUCTION

    Chapter 1. Statistical Machine Learning

    1.1. Types of Learning

    1.2. Examples of Machine Learning Tasks

    1.3. Structure of This Textbook

    Part 2

    STATISTICS AND PROBABILITY

    Chapter 2. Random Variables and Probability Distributions

    2.1. Mathematical Preliminaries

    2.2. Probability

    2.3. Random Variable and Probability Distribution

    2.4. Properties of Probability Distributions

    2.5. Transformation of Random Variables

    Chapter 3. Examples of Discrete Probability Distributions

    3.1. Discrete Uniform Distribution

    3.2. Binomial Distribution

    3.3. Hypergeometric Distribution

    3.4. Poisson Distribution

    3.5. Negative Binomial Distribution

    3.6. Geometric Distribution

    Chapter 4. Examples of Continuous Probability Distributions

    4.1. Continuous Uniform Distribution

    4.2. Normal Distribution

    4.3. Gamma Distribution, Exponential Distribution, and Chi-Squared Distribution

    4.4. Beta Distribution

    4.5. Cauchy Distribution and Laplace Distribution

    4.6. t-Distribution and F-Distribution

    Chapter 5. Multidimensional Probability Distributions

    5.1. Joint Probability Distribution

    5.2. Conditional Probability Distribution

    5.3. Contingency Table

    5.4. Bayes’ Theorem

    5.5. Covariance and Correlation

    5.6. Independence

    Chapter 6. Examples of Multidimensional Probability Distributions

    6.1. Multinomial Distribution

    6.2. Multivariate Normal Distribution

    6.3. Dirichlet Distribution

    6.4. Wishart Distribution

    Chapter 7. Sum of Independent Random Variables

    7.1. Convolution

    7.2. Reproductive Property

    7.3. Law of Large Numbers

    7.4. Central Limit Theorem

    Chapter 8. Probability Inequalities

    8.1. Union Bound

    8.2. Inequalities for Probabilities

    8.3. Inequalities for Expectation

    8.4. Inequalities for the Sum of Independent Random Variables

    Chapter 9. Statistical Estimation

    9.1. Fundamentals of Statistical Estimation

    9.2. Point Estimation

    9.3. Interval Estimation

    Chapter 10. Hypothesis Testing

    10.1. Fundamentals of Hypothesis Testing

    10.2. Test for Expectation of Normal Samples

    10.3. Neyman-Pearson Lemma

    10.4. Test for Contingency Tables

    10.5. Test for Difference in Expectations of Normal Samples

    10.6. Nonparametric Test for Ranks

    10.7. Monte Carlo Test

    Part 3

    GENERATIVE APPROACH TO STATISTICAL PATTERN RECOGNITION

    Chapter 11. Pattern Recognition via Generative Model Estimation

    11.1. Formulation of Pattern Recognition

    11.2. Statistical Pattern Recognition

    11.3. Criteria for Classifier Training

    11.4. Generative and Discriminative Approaches

    Chapter 12. Maximum Likelihood Estimation

    12.1. Definition

    12.2. Gaussian Model

    12.3. Computing the Class-Posterior Probability

    12.4. Fisher’s Linear Discriminant Analysis (FDA)

    12.5. Hand-Written Digit Recognition

    Chapter 13. Properties of Maximum Likelihood Estimation

    13.1. Consistency

    13.2. Asymptotic Unbiasedness

    13.3. Asymptotic Efficiency

    13.4. Asymptotic Normality

    13.5. Summary

    Chapter 14. Model Selection for Maximum Likelihood Estimation

    14.1. Model Selection

    14.2. KL Divergence

    14.3. AIC

    14.4. Cross Validation

    14.5. Discussion

    Chapter 15. Maximum Likelihood Estimation for Gaussian Mixture Model

    15.1. Gaussian Mixture Model

    15.2. MLE

    15.3. Gradient Ascent Algorithm

    15.4. EM Algorithm

    Chapter 16. Nonparametric Estimation

    16.1. Histogram Method

    16.2. Problem Formulation

    16.3. KDE

    16.4. NNDE

    Chapter 17. Bayesian Inference

    17.1. Bayesian Predictive Distribution

    17.2. Conjugate Prior

    17.3. MAP Estimation

    17.4. Bayesian Model Selection

    Chapter 18. Analytic Approximation of Marginal Likelihood

    18.1. Laplace Approximation

    18.2. Variational Approximation

    Chapter 19. Numerical Approximation of Predictive Distribution

    19.1. Monte Carlo Integration

    19.2. Importance Sampling

    19.3. Sampling Algorithms

    Chapter 20. Bayesian Mixture Models

    20.1. Gaussian Mixture Models

    20.2. Latent Dirichlet Allocation (LDA)

    Part 4

    DISCRIMINATIVE APPROACH TO STATISTICAL MACHINE LEARNING

    Chapter 21. Learning Models

    21.1. Linear-in-Parameter Model

    21.2. Kernel Model

    21.3. Hierarchical Model

    Chapter 22. Least Squares Regression

    22.1. Method of LS

    22.2. Solution for Linear-in-Parameter Model

    22.3. Properties of LS Solution

    22.4. Learning Algorithm for Large-Scale Data

    22.5. Learning Algorithm for Hierarchical Model

    Chapter 23. Constrained LS Regression

    23.1. Subspace-Constrained LS

    23.2. ℓ2-Constrained LS

    23.3. Model Selection

    Chapter 24. Sparse Regression

    24.1. ℓ1-Constrained LS

    24.2. Solving ℓ1-Constrained LS

    24.3. Feature Selection by Sparse Learning

    24.4. Various Extensions

    Chapter 25. Robust Regression

    25.1. Nonrobustness of ℓ2-Loss Minimization

    25.2. ℓ1-Loss Minimization

    25.3. Huber Loss Minimization

    25.4. Tukey Loss Minimization

    Chapter 26. Least Squares Classification

    26.1. Classification by LS Regression

    26.2. 0∕1-Loss and Margin

    26.3. Multiclass Classification

    Chapter 27. Support Vector Classification

    27.1. Maximum Margin Classification

    27.2. Dual Optimization of Support Vector Classification

    27.3. Sparseness of Dual Solution

    27.4. Nonlinearization by Kernel Trick

    27.5. Multiclass Extension

    27.6. Loss Minimization View

    Chapter 28. Probabilistic Classification

    28.1. Logistic Regression

    28.2. LS Probabilistic Classification

    Chapter 29. Structured Classification

    29.1. Sequence Classification

    29.2. Probabilistic Classification for Sequences

    29.3. Deterministic Classification for Sequences

    Part 5

    FURTHER TOPICS

    Chapter 30. Ensemble Learning

    30.1. Decision Stump Classifier

    30.2. Bagging

    30.3. Boosting

    30.4. General Ensemble Learning

    Chapter 31. Online Learning

    31.1. Stochastic Gradient Descent

    31.2. Passive-Aggressive Learning

    31.3. Adaptive Regularization of Weight Vectors (AROW)

    Chapter 32. Confidence of Prediction

    32.1. Predictive Variance for ℓ2-Regularized LS

    32.2. Bootstrap Confidence Estimation

    32.3. Applications

    Chapter 33. Semisupervised Learning

    33.1. Manifold Regularization

    33.2. Covariate Shift Adaptation

    33.3. Class-Balance Change Adaptation

    Chapter 34. Multitask Learning

    34.1. Task Similarity Regularization

    34.2. Multidimensional Function Learning

    34.3. Matrix Regularization

    Chapter 35. Linear Dimensionality Reduction

    35.1. Curse of Dimensionality

    35.2. Unsupervised Dimensionality Reduction

    35.3. Linear Discriminant Analyses for Classification

    35.4. Sufficient Dimensionality Reduction for Regression

    35.5. Matrix Imputation

    Chapter 36. Nonlinear Dimensionality Reduction

    36.1. Dimensionality Reduction with Kernel Trick

    36.2. Supervised Dimensionality Reduction with Neural Networks

    36.3. Unsupervised Dimensionality Reduction with Autoencoder

    36.4. Unsupervised Dimensionality Reduction with Restricted Boltzmann Machine

    36.5. Deep Learning

    Chapter 37. Clustering

    37.1. k-Means Clustering

    37.2. Kernel k-Means Clustering

    37.3. Spectral Clustering

    37.4. Tuning Parameter Selection

    Chapter 38. Outlier Detection

    38.1. Density Estimation and Local Outlier Factor

    38.2. Support Vector Data Description

    38.3. Inlier-Based Outlier Detection

    Chapter 39. Change Detection

    39.1. Distributional Change Detection

    39.2. Structural Change Detection

    References

    Index

    Copyright

    Acquiring Editor: Todd Green

    Editorial Project Manager: Amy Invernizzi

    Project Manager: Mohanambal Natarajan

    Designer: Maria Ines Cruz

    Morgan Kaufmann is an imprint of Elsevier. 225 Wyman Street, Waltham, MA 02451, USA

    Copyright © 2016 by Elsevier Inc. All rights of reproduction in any form reserved.

    No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

    This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

    Notices

    Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods or professional practices may become necessary. Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information or methods described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

    To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

    Library of Congress Cataloging-in-Publication Data

    A catalog record for this book is available from the Library of Congress

    British Library Cataloging-in-Publication Data

    A catalogue record for this book is available from the British Library.

    ISBN: 978-0-12-802121-7

    For information on all Morgan Kaufmann publications visit our website at www.mkp.com

    Biography

    Masashi Sugiyama

    Masashi Sugiyama received the degrees of Bachelor of Engineering, Master of Engineering, and Doctor of Engineering in Computer Science from Tokyo Institute of Technology, Japan in 1997, 1999, and 2001, respectively. In 2001, he was appointed Assistant Professor in the same institute, and he was promoted to Associate Professor in 2003. He moved to the University of Tokyo as Professor in 2014. He received an Alexander von Humboldt Foundation Research Fellowship and researched at Fraunhofer Institute, Berlin, Germany, from 2003 to 2004. In 2006, he received a European Commission Program Erasmus Mundus Scholarship and researched at the University of Edinburgh, Edinburgh, UK. He received the Faculty Award from IBM in 2007 for his contribution to machine learning under non-stationarity, the Nagao Special Researcher Award from the Information Processing Society of Japan in 2011 and the Young Scientists’ Prize from the Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology Japan for his contribution to the density-ratio paradigm of machine learning. His research interests include theories and algorithms of machine learning and data mining, and a wide range of applications such as signal processing, image processing, and robot control.

    Preface

    Machine learning is a subject in computer science, aimed at studying theories, algorithms, and applications of systems that learn like humans. Recent development of computers and sensors allows us to access a huge amount of data in diverse domains such as documents, audio, images, movies, e-commerce, electric power, medicine, and biology. Machine learning plays a central role in analyzing and benefiting from such big data.

    This textbook is devoted to presenting mathematical backgrounds and practical algorithms of various machine learning techniques, targeting undergraduate and graduate students in computer science and related fields. Engineers who are applying machine learning techniques in their business and scientists who are analyzing their data can also benefit from this book.

    A distinctive feature of this book is that each chapter concisely summarizes the main idea and mathematical derivation of particular machine learning techniques, followed by compact MATLAB programs. Thus, readers can study both mathematical concepts and practical values of various machine learning techniques simultaneously. All MATLAB programs are available from

    "http://www.ms.k.u-tokyo.ac.jp/software/SMLbook.zip".

    This book begins by giving a brief overview of the field of machine learning in Part 1. Then Part 2 introduces fundamental concepts of probability and statistics, which form the mathematical basis of statistical machine learning. Part 2 was written based on

    Sugiyama, M. Probability and Statistics for Machine Learning, Kodansha, Tokyo, Japan, 2015. (in Japanese).

    Part 3 and Part 4 present a variety of practical machine learning algorithms in the generative and discriminative frameworks, respectively. Then Part 5 covers various advanced topics for tackling more challenging machine learning tasks. Part 3 was written based on

    Sugiyama, M. Statistical Pattern Recognition: Pattern Recognition Based on Generative Models, Ohmsha, Tokyo, Japan, 2009. (in Japanese),

    and Part 4 and Part 5 were written based on

    Sugiyama, M. An Illustrated Guide to Machine Learning, Kodansha, Tokyo, Japan, 2013. (in Japanese).

    The author would like to thank researchers and students in his groups at the University of Tokyo and Tokyo Institute of Technology for their valuable feedback on earlier manuscripts.

    Masashi Sugiyama

    The University of Tokyo

    Part 1

    Outline

    INTRODUCTION

    Chapter 1. Statistical Machine Learning

    INTRODUCTION

    Chapter 1

    Statistical Machine Learning

    Supervised learning; Unsupervised learning; Reinforcement learning; Regression; Classification; Clustering; Outlier detection

    Recent development of computers and the Internet allows us to immediately access a vast amount of information such as texts, sounds, images, and movies. Furthermore, a wide range of personal data such as search logs, purchase records, and diagnosis history are accumulated every day. Such a huge amount of data is called big data, and there is a growing tendency to create new value and business opportunities by extracting useful knowledge from data. This process is often called data mining, and machine learning is the key technology for extracting useful knowledge. In this chapter, an overview of the field of machine learning is provided.

    1.1. Types of Learning

    Depending on the type of available data, machine learning can be categorized into supervised learning, unsupervised learning, and reinforcement learning.

    Supervised learning is perhaps the most fundamental type of machine learning; it considers a student learning from a supervisor through questioning and answering. In the context of machine learning, the student corresponds to a computer and the supervisor corresponds to the user of the computer, and the computer learns a mapping from a question to its answer from paired samples of questions and answers. The objective of supervised learning is to acquire the generalization ability, the capability to guess appropriate answers even for unlearned questions. Thus, the user does not have to teach everything to the computer; having learned from only a fraction of knowledge, the computer can automatically cope with unknown situations. Supervised learning has been successfully applied to a wide range of real-world problems, such as hand-written letter recognition, speech recognition, image recognition, spam filtering, information retrieval, online advertisement, recommendation, brain signal analysis, gene analysis, stock price prediction, weather forecasting, and astronomy data analysis. The supervised learning problem is called regression if the answer is a real value (such as the temperature), classification if the answer is a categorical value (such as yes or no), and ranking if the answer is an ordinal value (such as good, normal, or poor).

    Unsupervised learning considers the situation where no supervisor exists and a student learns by himself/herself. In the context of machine learning, the computer autonomously collects data through the Internet and tries to extract useful knowledge without any guidance from the user. Thus, unsupervised learning is more automatic than supervised learning, although its objective is not necessarily specified clearly. Typical tasks of unsupervised learning include data clustering and outlier detection, and these unsupervised learning techniques have achieved great success in a wide range of real-world problems, such as system diagnosis, security, event detection, and social network analysis. Unsupervised learning is also often used as a preprocessing step of supervised learning.

    Reinforcement learning is aimed at acquiring the generalization ability in the same way as supervised learning, but the supervisor does not directly give answers to the student’s questions. Instead, the supervisor evaluates the student’s behavior and gives feedback about it. The objective of reinforcement learning is, based on the feedback from the supervisor, to let the student improve his/her behavior to maximize the supervisor’s evaluation. Reinforcement learning is an important model of the behavior of humans and robots, and it has been applied to various areas such as autonomous robot control, computer games, and marketing strategy optimization. Behind reinforcement learning, supervised and unsupervised learning methods such as regression, classification, and clustering are often utilized.

    The focus of this textbook is on supervised learning and unsupervised learning. For reinforcement learning, see references [99,105].

    1.2. Examples of Machine Learning Tasks

    In this section, various supervised and unsupervised learning tasks are introduced in more detail.

    1.2.1. Supervised Learning

    The objective of regression is to approximate a real-valued function from its samples (Fig. 1.1).

    FIGURE 1.1  Regression.
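
    To make this concrete, the following is a minimal MATLAB/Octave sketch in the spirit of the book's programs; the data and the degree-5 polynomial model are illustrative assumptions, not the book's own example. Noisy samples of a sine function are fitted by least squares.

        % Regression sketch: approximate f(x) = sin(x) from noisy samples
        % by least squares fitting of a degree-5 polynomial (illustrative).
        n = 50;
        x = linspace(-3, 3, n)';            % training inputs
        y = sin(x) + 0.1 * randn(n, 1);     % noisy training outputs
        p = polyfit(x, y, 5);               % least squares polynomial fit
        X = linspace(-3, 3, 200)';          % test inputs
        plot(X, polyval(p, X), '-', x, y, 'o');  % learned function vs samples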

    On the other hand, classification is a pattern recognition problem in a supervised manner (Fig. 1.2). Since the answer is categorical, how close the prediction is does not matter; whether the predicted and true categories are the same is the only concern in classification.

    FIGURE 1.2  Classification.
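
    As an illustrative sketch (not the book's example), the following MATLAB/Octave fragment classifies a hypothetical test point by assigning it to the class with the nearest mean:

        % Classification sketch: nearest-class-mean rule on illustrative data.
        x1 = randn(50, 2) + 2;               % samples of class 1
        x2 = randn(50, 2) - 2;               % samples of class 2
        m1 = mean(x1);                       % mean of class 1
        m2 = mean(x2);                       % mean of class 2
        xt = [0.5, 1.0];                     % a test point
        if norm(xt - m1) < norm(xt - m2)
          disp('predicted class: 1');
        else
          disp('predicted class: 2');
        end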

    In the problem of ranking, the answer is an ordinal value. Since order information is richer than purely categorical information, ranking would be more similar to regression than classification; for this reason, the problem of ranking is also referred to as ordinal regression. Different from regression, however, the exact output values do not matter in ranking: any function that gives the correct ordering of samples is still a perfect solution.

    1.2.2. Unsupervised Learning

    Clustering is an unsupervised counterpart of classification (Fig. 1.3). Usually, similar samples are supposed to belong to the same cluster, and dissimilar samples are supposed to belong to different clusters. Thus, how to measure the similarity between samples is the key issue in clustering.

    FIGURE 1.3  Clustering.
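
    The following is a minimal MATLAB/Octave sketch of k-means clustering with two clusters, where similarity is measured by Euclidean distance; the data and the number of iterations are illustrative assumptions:

        % k-means sketch: alternate between assigning samples to the nearest
        % center and recomputing centers as cluster means (illustrative data;
        % for simplicity, the possibility of an empty cluster is not handled).
        x = [randn(50, 2) + 2; randn(50, 2) - 2];   % 100 two-dimensional samples
        c = x(randperm(100, 2), :);                 % two random initial centers
        for t = 1:20
          d = [sum((x - c(1, :)).^2, 2), sum((x - c(2, :)).^2, 2)];
          [~, idx] = min(d, [], 2);                 % nearest-center assignment
          c = [mean(x(idx == 1, :)); mean(x(idx == 2, :))];  % center update
        end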

    Outlier detection, which is also referred to as anomaly detection, is aimed at finding irregular samples in a given data set. In the same way as clustering, the definition of similarity between samples plays a central role in outlier detection, because samples that are dissimilar from others are usually regarded as outliers (Fig. 1.4).

    FIGURE 1.4  Outlier detection.
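
    As a hedged illustration of this idea, the MATLAB/Octave fragment below scores each sample by its distance to its nearest neighbor, so that samples dissimilar from all others receive large scores; the data set is an assumption made for illustration:

        % Outlier scoring sketch: samples far from their nearest neighbor
        % are regarded as candidate outliers (illustrative data).
        x = [randn(99, 2); 8, 8];             % 100 samples; the last is anomalous
        n = size(x, 1);
        score = zeros(n, 1);
        for i = 1:n
          d = sqrt(sum((x - x(i, :)).^2, 2)); % distances from sample i
          d(i) = inf;                         % ignore the self-distance
          score(i) = min(d);                  % nearest-neighbor distance
        end
        [~, i] = max(score);                  % index of the most anomalous sample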

    The objective of change detection, which is also referred to as novelty detection, is to judge whether a newly given set of samples has a different property from the original set. When only a single point is provided for detecting change, the problem of change detection may be reduced to an outlier detection problem.

    1.2.3. Further Topics

    In addition to supervised and unsupervised learning, various useful techniques are available in machine learning.

    In many applications, obtaining output values is costly, while input-only samples are abundant. Semisupervised learning is aimed at improving the prediction performance by utilizing such input-only samples in addition to input-output paired samples.

    Given weak learning algorithms that perform only slightly better than random guessing, ensemble learning is aimed at constructing a strong learning algorithm by combining such weak learning algorithms. One of the most popular approaches is voting by the weak learning algorithms, which may complement each other's weaknesses.
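
    As a toy MATLAB/Octave sketch of voting (the predictions below are made up for illustration), three weak classifiers that output +1 or -1 are combined by taking the sign of their sum:

        % Majority-voting sketch: combine binary (+1/-1) predictions of three
        % weak classifiers; ties cannot occur with an odd number of voters.
        p1 = [ 1; -1;  1;  1];        % predictions of weak classifier 1
        p2 = [ 1;  1; -1;  1];        % predictions of weak classifier 2
        p3 = [-1; -1;  1;  1];        % predictions of weak classifier 3
        vote = sign(p1 + p2 + p3);    % ensemble prediction for each sample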

    Learning algorithms are usually designed for vectorial data. However, if data have a two-dimensional structure such as an image, directly learning from matrix data would be more promising than vectorizing the matrix data. Studies of matrix learning or tensor learning are directed to handle such higher-order data.

    When data samples are provided sequentially one by one, updating the learning results to incorporate new data would be more natural than re-learning all data from scratch. Online learning is aimed at efficiently handling such sequentially given data.
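
    The following sketch illustrates this online flavor with stochastic gradient descent on the squared error of a linear model; the data stream and the learning rate are illustrative assumptions, not a method prescribed by the book at this point:

        % Online learning sketch: update the weight vector each time a new
        % sample arrives, instead of re-learning from scratch (illustrative).
        w = zeros(2, 1);                     % weights of the linear model w' * x
        eta = 0.05;                          % learning rate
        for t = 1:1000
          x = [randn; 1];                    % new input (with a bias component)
          y = 3 * x(1) + 1 + 0.1 * randn;    % new output from a hypothetical source
          w = w - eta * (w' * x - y) * x;    % stochastic gradient step
        end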

    When solving a learning task, transferring knowledge from other similar learning tasks would be helpful. Such a paradigm is called transfer learning or domain adaptation. If multiple related learning tasks need to be solved, solving them simultaneously would be more promising than solving them individually. This idea is called multitask learning and can be regarded as a bidirectional variant of transfer learning.

    Learning from high-dimensional data is challenging, a difficulty that is often referred to as the curse of dimensionality. The objective of dimensionality reduction (Fig. 1.5) is to find a low-dimensional representation of data such that certain structure of the original data is maintained, for example, for visualization purposes. Metric learning is similar to dimensionality reduction, but it puts more emphasis on learning the metric in the original high-dimensional space rather than reducing the dimensionality of data.

    FIGURE 1.5  Dimensionality reduction.
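
    As a minimal sketch of unsupervised dimensionality reduction, the MATLAB/Octave fragment below performs principal component analysis on illustrative data: the data are centered and projected onto the leading eigenvector of the covariance matrix.

        % PCA sketch: one-dimensional embedding along the direction of
        % maximum variance (illustrative data).
        x = randn(100, 2) * [2, 0; 0, 0.3];  % anisotropic two-dimensional data
        xc = x - mean(x);                    % center the data
        [V, D] = eig(cov(xc));               % eigendecomposition of the covariance
        [~, i] = max(diag(D));               % index of the leading eigenvector
        z = xc * V(:, i);                    % one-dimensional representation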

    1.3. Structure of This Textbook

    The main contents of this textbook consist of the following four parts.

    Part 2 introduces fundamental concepts of statistics and probability, which will be extensively utilized in the subsequent chapters. Those who are familiar with statistics and probability or those who want to study machine learning immediately may skip Part 2.

    Based on the concepts of statistics and probability, Part 3, Part 4, and Part 5 introduce various machine learning techniques. These parts are rather independent, and therefore readers may start from any part based on their interests.

    Part 3 targets the generative approach to statistical pattern recognition. The basic idea of generative methods is that the probability distribution of data is modeled to perform pattern recognition. When prior knowledge on data generation mechanisms is available, the generative approach is highly useful. Various classification and clustering techniques based on generative model estimation in the frequentist and Bayesian frameworks will be introduced.

    Part 4 focuses on the discriminative approach to statistical machine learning. The basic idea of discriminative methods is to solve target machine learning tasks such as regression and classification directly without modeling data-generating probability distributions. If no prior knowledge is available for data generation mechanisms, the discriminative approach would be more promising than the generative approach. In addition to statistics and probability, knowledge of optimization theory also plays an important role in the discriminative approach, which will also be covered.

    Part 5 is devoted to introducing various advanced issues in machine learning, including ensemble learning, online learning, confidence of prediction, semisupervised learning, transfer learning, multitask learning, dimensionality reduction, clustering, outlier detection, and change detection.

    Compact MATLAB codes are provided for the methods introduced in Part 3, Part 4, and Part 5, so that readers can immediately test the algorithms and learn their numerical behaviors.

    Part 2

    Outline

    STATISTICS AND PROBABILITY

    Chapter 2. Random Variables and Probability Distributions

    Chapter 3. Examples of Discrete Probability Distributions

    Chapter 4. Examples of Continuous Probability Distributions

    Chapter 5. Multidimensional Probability Distributions

    Chapter 6. Examples of Multidimensional Probability Distributions

    Chapter 7. Sum of Independent Random Variables

    Chapter 8. Probability Inequalities

    Chapter 9. Statistical Estimation

    Chapter 10. Hypothesis Testing

    STATISTICS AND PROBABILITY

    Probability and statistics are important mathematical tools used in the state-of-the-art machine learning methods, and are becoming indispensable subjects of science in the era of big data. Part 2 of this book is devoted to providing fundamentals of probability and statistics.

    Chapter 2 overviews the basic notions of random variables and probability distributions. Chapter 3 and Chapter 4 illustrate examples of discrete and continuous probability distributions, respectively. Chapter 5 introduces concepts used in multidimensional data analysis, and Chapter 6 gives examples of multidimensional probability distributions. Chapter 7 discusses the asymptotic behavior of the sum of independent random variables, and Chapter 8 shows various inequalities related to random variables. Finally, Chapter 9 and Chapter 10 cover fundamentals of statistical estimation and hypothesis testing, respectively.

    Chapter 2

    Random Variables and Probability Distributions

    Probability; Random variable; Probability distribution; Expectation; Variance; Moment

    In this chapter, the notions of random variables and probability distributions are introduced, which form the basis of probability and statistics. Then simple statistics that summarize probability distributions are discussed.

    2.1. Mathematical Preliminaries

    Consider throwing a six-sided die and observing the number that appears. The possible outcomes are 1, 2, 3, 4, 5, and 6, and no others. Such possible outcomes are called sample points, and the set of all sample points is called the sample space.

    For example, the event that an odd number appears is expressed as the set {1, 3, 5}.

    The event with no sample point is called the empty event, denoted by ∅. An event consisting only of a single sample point is called an elementary event, while an event consisting of multiple sample points is called a composite event. An event that includes all possible sample points is called the whole event. Below, the notion of combining events is explained using Fig. 2.1.

    FIGURE 2.1  Combination of events.

    Union of events (A ∪ B): the event consisting of all sample points that belong to at least one of A and B.

    Intersection of events (A ∩ B): the event consisting of all sample points that belong to both A and B.
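
    As a quick worked example with the die above, if A = {1, 3, 5} (an odd number appears) and B = {1, 2, 3}, then the union is A ∪ B = {1, 2, 3, 5}, while the intersection is A ∩ B = {1, 3}.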
