Professional Documents
Culture Documents
There are so many algorithms available and it can feel overwhelming when algorithm names are thrown
around and you are expected to just know what they are and where they fit.
In this post I want to give you two ways to think about and categorize the algorithms you may come
across in the field.
After reading this post, you will have a much better understanding of the most popular machine learning
algorithms for supervised learning and how they are related.
A cool example of an ensemble of lines of best fit. Weak members are grey, the combined prediction is
red.
Plot from Wikipedia, licensed under public domain.
There are different ways an algorithm can model a problem based on its interaction with the experience
or environment or whatever we want to call the input data.
It is popular in machine learning and artificial intelligence textbooks to first consider the learning styles
that an algorithm can adopt.
There are only a few main learning styles or learning models that an algorithm can have and we’ll go
through them here with a few examples of algorithms and problem types that they suit.
This taxonomy or way of organizing machine learning algorithms is useful because it forces you to think
about the the roles of the input data and the model preparation process and select one that is the most
appropriate for your problem in order to get the best result.
Let’s take a look at four different learning styles in machine learning algorithms:
Supervised Learning
Input data is called training data and has a known label or result
such as spam/not-spam or a stock price at a time.
A model is prepared through a training process where it is required to make predictions and is corrected
when those predictions are wrong. The training process continues until the model achieves a desired
level of accuracy on the training data.
Example problems are classification and regression.
Example algorithms include Logistic Regression and the Back Propagation Neural Network.
Unsupervised Learning
Input data is not labelled and does not have a known result.
A model is prepared by deducing structures present in the input data. This may be to extract general
rules. It may through a mathematical process to systematically reduce redundancy, or it may be
to organize data by similarity.
Example problems are clustering, dimensionality reduction and association rule learning.
Semi-Supervised Learning
Input data is a mixture of labelled and unlabelled examples.
There is a desired prediction problem but the model must learn the structures to organize the data as well
as make predictions.
Example algorithms are extensions to other flexible methods that make assumptions about how to model
the unlabelled data.
Overview
When crunching data to model business decisions, you are most typically using supervised and
unsupervised learning methods.
A hot topic at the moment is semi-supervised learning methods in areas such as image classification
where there are large datasets with very few labelled examples.
Algorithms are often grouped by similarity in terms of their function (how they work). For example, tree-
based methods, and neural network inspired methods.
I think this is the most useful way to group algorithms and it is the approach we will use here.
This is a useful grouping method, but it is not perfect. There are still algorithms that could just as easily fit
into multiple categories like Learning Vector Quantization that is both a neural network inspired method
and an instance-based method. There are also categories that have the same name that describes the
problem and the class of algorithm such as Regression and Clustering.
We could handle these cases by listing algorithms twice or by selecting the group that subjectively is the
“best” fit. I like this latter approach of not duplicating algorithms to keep things simple.
In this section I list many of the popular machine leaning algorithms grouped the way I think is the most
intuitive. It is not exhaustive in either the groups or the algorithms, but I think it is representative and will
be useful to you to get an idea of the lay of the land.
Please Note: There is a strong bias towards algorithms used for classification and regression, the two
most prevalent supervised machine learning problems you will encounter.
If you know of an algorithm or a group of algorithms not listed, put it in the comments and share it with us.
Let’s dive in.
Regression Algorithms
Regression methods are a workhorse of statistics and have been cooped into statistical machine learning.
This may be confusing because we can use regression to refer to the class of problem and the class of
algorithm. Really, regression is a process.
Such methods typically build up a database of example data and compare new data to the database
using a similarity measure in order to find the best match and make a prediction. For this reason,
instance-based methods are also called winner-take-all methods and memory-based learning. Focus is
put on representation of the stored instances and similarity measures used between instances.
Regularization Algorithms
An extension made to another method (typically regression methods)
that penalizes models based on their complexity, favoring simpler models that are also better at
generalizing.
I have listed regularization algorithms separately here because they are popular, powerful and generally
simple modifications made to other methods.
Ridge Regression
Least Absolute Shrinkage and Selection Operator (LASSO)
Elastic Net
Least-Angle Regression (LARS)
Decision Tree Algorithms
Naive Bayes
Gaussian Naive Bayes
Multinomial Naive Bayes
Averaged One-Dependence Estimators (AODE)
Bayesian Belief Network (BBN)
Bayesian Network (BN)
Clustering Algorithms
Clustering methods are typically organized by the modelling approaches such as centroid-based and
hierarchal. All methods are concerned with using the inherent structures in the data to best organize the
data into groups of maximum commonality.
k-Means
k-Medians
Expectation Maximisation (EM)
Hierarchical Clustering
Association rule learning are methods that extract rules that best
explain observed relationships between variables in data.
These rules can discover important and commercially useful associations in large multidimensional
datasets that can be exploited by an organisation.
Apriori algorithm
Eclat algorithm
Artificial Neural Networks are models that are inspired by the structure
and/or function of biological neural networks.
They are a class of pattern matching that are commonly used for regression and classification problems
but are really an enormous subfield comprised of hundreds of algorithms and variations for all manner of
problem types.
Note that I have separated out Deep Learning from neural networks because of the massive growth and
popularity in the field. Here we are concerned with the more classical methods.
Perceptron
Back-Propagation
Hopfield Network
Radial Basis Function Network (RBFN)
Deep Learning Algorithms
They are concerned with building much larger and more complex neural networks, and as commented
above, many methods are concerned with semi-supervised learning problems where large datasets
contain very little labelled data.
This can be useful to visualize dimensional data or to simplify data which can then be used in a
supervized learning method. Many of these methods can be adapted for use in classification and
regression.
Much effort is put into what types of weak learners to combine and the ways in which to combine them.
This is a very powerful class of techniques and as such is very popular.
Boosting
Bootstrapped Aggregation (Bagging)
AdaBoost
Stacked Generalization (blending)
Gradient Boosting Machines (GBM)
Gradient Boosted Regression Trees (GBRT)
Random Forest
Machine Learning with R
R is the most popular platform for applied machine learning. When you want to get serious with applied
machine learning you will find your way into R.
It is very powerful because so many machine learning algorithms are provided. A problem is that the
algorithms are all provided by third parties, which makes their usage very inconsistent. This slows you
down, a lot, because you have to learn how to model data and how to make predicts with each algorithm
in each package, again and again.
In this post you will discover how you can overcome this difficulty with machine learning algorithms in R,
with pre-prepared recipes that follow a consistent structure.
The R ecosystem is enormous. Open source third party packages provide this power, allowing academics
and professionals to get the most powerful algorithms available into the hands of us practitioners.
A problem that I experienced when starting out with R was that the usage to each algorithm differs from
package to package. This inconsistency also extends to the documentation, with some providing worked
example for classification but ignoring regression and others not providing examples at all.
All this means that if you want to try a few different algorithms from different packages, you must spend
time figuring out how to fit and make predictions with each method in turn. This takes a lot of time,
especially with the spotty examples and vignettes.
Inconsistent: Algorithm implementations vary in the way a model is fit to data and the way a model is
used to generate predictions. This means that you have to study each package and each algorithm
implementation just to put together a working example, let alone adapt it to your problem.
Decentralized: Algorithm are implemented across different packages and it can be hard to locate which
packages provide an implementation of the algorithm you need, let alone which package provides the
most popular implementation. Additionally, the documentation for one package may be spread across
multiple help files, website and vignettes. This means you have to do a lot of searching just to locate an
algorithm, let alone compile a list of algorithms from which you can choose.
Incomplete: Algorithm documentation is almost always partially complete. An example usage may or
may not be provided, if it is, it may or may not be demonstrated on a canonical problem. This means you
have no obvious way to quickly understand how to use an implementation.
Complexity: Algorithms vary in their complexity of implementation and description. This can take it’s toll
on you as you jump from package to package. You want to focus on how to get the most from the
algorithm and its parameters, and not burn energy on parsing reams of PDFs just to get a hello world.
Linear Regression in R
You can copy and paste the recipes in this post to make a jump-start on your own problem or to learn and
practice with linear regression in R.
Ordinary Least Squares Regression
Ordinary Least Squares (OLS) regression is a linear model that seeks to find a set of coefficients for a
line/hyper-plane that minimise the sum of the squared errors.
# load data
data(longley)
# fit model
fit<- lm(Employed~., longley)
# summarize the fit
summary(fit)
# make predictions
predictions<- predict(fit, longley)
# summarize accuracy
rmse<- mean((longley$Employed - predictions)^2)
print(rmse)
Stepwize Linear Regression
Stepwise Linear Regression is a method that makes use of linear regression to discover which subset of
attributes in the dataset result in the best performing model. It is step-wise because each iteration of the
method makes a change to the set of attributes and creates a model to evaluate the performance of the
set.
# load data
data(longley)
# fit model
base<- lm(Employed~., longley)
# summarize the fit
summary(base)
# perform step-wise feature selection
fit<- step(base)
# summarize the selected model
summary(fit)
# make predictions
predictions<- predict(fit, longley)
# summarize accuracy
rmse<- mean((longley$Employed - predictions)^2)
print(rmse)
Principal Component Regression (PCR) creates a linear regression model using the outputs of a Principal
Component Analysis (PCA) to estimate the coefficients of the model. PCR is useful when the data has
highly-correlated predictors.
Partial Least Squares (PLS) Regression creates a linear model of the data in a transformed projection of
problem space. Like PCR, PLS is appropriate for data with highly-correlated predictors.
Multivariate Adaptive Regression Splines (MARS) is a non-parametric regression method that models
multiple nonlinearities in data using hinge functions (functions with a kink in them).
# load the package
library(earth)
# load data
data(longley)
# fit model
fit<- earth(Employed~., longley)
# summarize the fit
summary(fit)
# summarize the importance of input variables
evimp(fit)
# make predictions
predictions<- predict(fit, longley)
# summarize accuracy
rmse<- mean((longley$Employed - predictions)^2)
print(rmse)
Support Vector Machines (SVM) are a class of methods, developed originally for classification, that find
support points that best separate classes. SVM for regression is called Support Vector Regression
(SVM).
k-Nearest Neighbor
The k-Nearest Neighbor (kNN) does not create a model, instead it creates predictions from close data on-
demand when a prediction is required. A similarity measure (such as Euclidean distance) is used to
locate close data in order to make predictions.
Neural Network
A Neural Network (NN) is a graph of computational units that recieve inputs and transfer the result into an
output that is passed on. The units are ordered into layers to connect the features of an input vector to the
features of an output vector. With training, such as the Back-Propagation algorithm, neural networks can
be designed and trained to model the underlying relationship in data.
Classification and Regression Trees (CART) split attributes based on values that minimize a loss function,
such as sum of squared errors.
The following recipe demonstrates the recursive partitioning decision tree method on the longley dataset.
Condition Decision Trees are created using statistical tests to select split points on attributes rather than a
loss function.
The following recipe demonstrates the condition inference trees method on the longley dataset.
Model Trees
Model Trees create a decision tree and use a linear model at each node to make a prediction rather than
using an average value.
The following recipe demonstrates the M5P Model Tree method on the longley dataset.
Rule System
Rule Systems can be created by extracting and simplifying the rules from a decision tree.
The following recipe demonstrates the M5Rules Rule System on the longley dataset.
Bagging CART
Bootstrapped Aggregation (Bagging) is an ensemble method that creates multiple models of the same
type from different sub-samples of the same dataset. The predictions from each separate model are
combined together to provide a superior result. This approach has shown participially effective for high-
variance methods such as decision trees.
The following recipe demonstrates bagging applied to the recursive partitioning decision tree.
Random Forest is variation on Bagging of decision trees by reducing the attributes available to making a
tree at each decision point to a random sub-sample. This further increases the variance of the trees and
more trees are required.
Cubist
Cubist decision trees are another ensemble method. They are constructed like model trees but involve a
boosting-like procedure called committees that re rule-like models.
a(2)0=g(Θ(1)00x0+Θ(1)01x1+Θ(1)02x2)=g(ΘT0x)=g(z(2)0)a0(2)=g(Θ00(1)x0+Θ01(1)x1+Θ02(1)x2)=g(Θ0Tx)=g(z0(
2))
a(2)1=g(Θ(1)10x0+Θ(1)11x1+Θ(1)12x2)=g(ΘT1x)=g(z(2)1)a1(2)=g(Θ10(1)x0+Θ11(1)x1+Θ12(1)x2)=g(Θ1Tx)=g(z1(
2))
a(2)2=g(Θ(1)20x0+Θ(1)21x1+Θ(1)22x2)=g(ΘT2x)=g(z(2)2)a2(2)=g(Θ20(1)x0+Θ21(1)x1+Θ22(1)x2)=g(Θ2Tx)=g(z2(
2))
hΘ(x)=a(3)1=g(Θ(2)10a(2)0+Θ(2)11a(2)1+Θ(2)12a(2)2)hΘ(x)=a1(3)=g(Θ10(2)a0(2)+Θ11(2)a1(2)+Θ12(2)a2(2))
In the equations, the gg is sigmoid function that refers to the special case of the logisticfunction and
defined by the formula:
g(z)=11+e−zg(z)=11+e−z
Sigmoid functions
One of the reasons to use the sigmoid function (also called the logistic function) is it was the first one to
be used. Its derivative has a very good property. In a lot of weight update algorithms, we need to know a
derivative (sometimes even higher order derivatives). These can all be expressed as products
of ff and 1−f1−f. In fact, it's the only class of functions that satisfies f′(t)=f(t)(1−f(t))f′(t)=f(t)(1−f(t)).
However, usually the weights are much more important than the particular function chosen. These
sigmoid functions are very similar, and the output differences are small. Here's a plot from Wikipedia-
Sigmoid function. Note that all functions are normalized in such a way that their slope at the origin is 1.
Forward Propagation
x=⎡⎣⎢x0x1x2⎤⎦⎥z(2)=⎡⎣⎢⎢⎢z(2)0z(2)1z(2)2⎤⎦⎥⎥⎥x=[x0x1x2]z(2)=[z0(2)z1(2)z2(2)]
z(2)=Θ(1)x=Θ(1)a(1)z(2)=Θ(1)x=Θ(1)a(1)
a(2)=g(z(2))a(2)=g(z(2))
a(2)0=1.0a0(2)=1.0
z(3)=Θ(2)a(2)z(3)=Θ(2)a(2)
hΘ(x)=a(3)=g(z(3))hΘ(x)=a(3)=g(z(3))
The backpropagation learning algorithm can be divided into two phases: propagation and weight update.
- from wiki - Backpropagatio.
1. Phase 1: Propagation
Each propagation involves the following steps:
o Forward propagation of a training pattern's input through the neural network in order to
generate the propagation's output activations.
o Backward propagation of the propagation's output activations through the neural network
using the training pattern target in order to generate the deltas of all output and hidden
neurons.
2. Phase 2: Weight update
For each weight-synapse follow the following steps:
o Multiply its output delta and input activation to get the gradient of the weight.
o Subtract a ratio (percentage) of the gradient from the weight.
This ratio (percentage) influences the speed and quality of learning; it is called the learning rate.
The greater the ratio, the faster the neuron trains; the lower the ratio, the more accurate the
training is. The sign of the gradient of a weight indicates where the error is increasing, this is why
the weight must be updated in the opposite direction.
If we denote an error of node jj in layer ll as δ(l)jδj(l), for our output unit(L=3) becomes activation -actual
value:
δ(3)j=a(3)j−yj=hΘ(x)−yjδj(3)=aj(3)−yj=hΘ(x)−yj
δ(3)=a(3)−yδ(3)=a(3)−y
δ(2)=(Θ(2))Tδ(3)⋅g′(z(2))δ(2)=(Θ(2))Tδ(3)⋅g′(z(2))
where
g′(z(2))=a(2)⋅(1−a(2))g′(z(2))=a(2)⋅(1−a(2))
Note that we do not have δ(1)δ(1) term because that's the input layer and the values are the ones that we
observed and they are being used as a training set. So, there is no errors associate with the input.
∂∂Θ(l)ijJ(Θ)=a(l)jδ(l+1)i∂∂Θij(l)J(Θ)=aj(l)δi(l+1)
We use this value to update weights and we can multiply learning rate before we adjust the weight.
Code
importnumpy as np
def sigmoid(x):
return 1.0/(1.0 + np.exp(-x))
defsigmoid_prime(x):
return sigmoid(x)*(1.0-sigmoid(x))
deftanh(x):
returnnp.tanh(x)
deftanh_prime(x):
return 1.0 - x**2
classNeuralNetwork:
def __init__(self, layers, activation='tanh'):
if activation == 'sigmoid':
self.activation = sigmoid
self.activation_prime = sigmoid_prime
elif activation == 'tanh':
self.activation = tanh
self.activation_prime = tanh_prime
# Set weights
self.weights = []
# layers = [2,2,1]
# range of weight values (-1,1)
# input and hidden layers - random((2+1, 2+1)) : 3 x 3
fori in range(1, len(layers) - 1):
r = 2*np.random.random((layers[i-1] + 1, layers[i] + 1)) -1
self.weights.append(r)
# output layer - random((2+1, 1)) : 3 x 1
r = 2*np.random.random( (layers[i] + 1, layers[i+1])) - 1
self.weights.append(r)
for k in range(epochs):
i = np.random.randint(X.shape[0])
a = [X[i]]
for l in range(len(self.weights)):
dot_value = np.dot(a[l], self.weights[l])
activation = self.activation(dot_value)
a.append(activation)
# output layer
error = y[i] - a[-1]
deltas = [error * self.activation_prime(a[-1])]
# reverse
# [level3(output)->level2(hidden)] => [level2(hidden)-
>level3(output)]
deltas.reverse()
# backpropagation
# 1. Multiply its output delta and input activation
# to get the gradient of the weight.
# 2. Subtract a ratio (percentage) of the gradient from the
weight.
fori in range(len(self.weights)):
layer = np.atleast_2d(a[i])
delta = np.atleast_2d(deltas[i])
self.weights[i] += learning_rate * layer.T.dot(delta)
if __name__ == '__main__':
nn = NeuralNetwork([2,2,1])
X = np.array([[0, 0],
[0, 1],
[1, 0],
[1, 1]])
y = np.array([0, 1, 1, 0])
nn.fit(X, y)
for e in X:
print(e,nn.predict(e))
Output:
epochs: 0
epochs: 10000
epochs: 20000
epochs: 30000
epochs: 40000
epochs: 50000
epochs: 60000
epochs: 70000
epochs: 80000
epochs: 90000
(array([0, 0]), array([ 9.14891326e-05]))
(array([0, 1]), array([ 0.99557796]))
(array([1, 0]), array([ 0.99707463]))
(array([1, 1]), array([ 0.00090973]))
k-Means Clustering
The k-Means Clustering finds centers of clusters and groups input samples around the clusters.
k-Means Clustering is a partitioning method which partitions data into k mutually exclusive clusters, and
returns the index of the cluster to which it has assigned each observation. Unlike hierarchical clustering,
k-means clustering operates on actual observations (rather than the larger set of dissimilarity measures),
and creates a single level of clusters. The distinctions mean that k-means clustering is often more
suitable than hierarchical clustering for large amounts of data.
The following description for the steps is from wiki - K-means_clustering.
Step 1
k initial "means" (in this case k=3) are randomly generated within the data domain.
Step 2
k clusters are created by associating every observation with the nearest mean. The partitions here
represent the Voronoi diagram generated by the means.
Step 3
The centroid of each of the k clusters becomes the new mean.
Step 4
Steps 2 and 3 are repeated until convergence has been reached.
We generated two groups of random numbers in two separated ranges, each group with 25 numbers as
shown in the picture below:
The code:
importnumpy as np
import cv2
frommatplotlib import pyplot as plt
Now, we're going to apply cv2.kmeans() function to the data with the following parameters:
# Apply KMeans
compactness,labels,centers = cv2.kmeans(z,2,None,criteria,10,flags)
In the code, the criteria means whenever 10 iterations of algorithm is ran, or an accuracy of epsilon = 1.0
is reached, stop the algorithm and return the answer.
Here are the meanings of the return values from cv2.kmeans() function:
compactness : It is the sum of squared distance from each point to their corresponding centers.
labels : This is the label array where each element marked '0','1',.....
centers : This is array of centers of clusters.
importnumpy as np
import cv2
frommatplotlib import pyplot as plt
# Apply KMeans
compactness,labels,centers = cv2.kmeans(z,2,None,criteria,10,flags)
A = z[labels==0]
B = z[labels==1]
# initial plot
plt.subplot(211)
plt.hist(z,256,[0,256])
plt.show()
We can see the data has been clustered: A's centroid is 72 and B's centroid is 217 which was one of the
outputs from the cv2.kmeans() function.
iPython
Note the values in the array is for y-values, for x-axis, the sample numbers are used: we have
100+30+100=230 samples.
Hamming filter
In [8]: hamming(40)
Out[8]:
array([ 0.08 , 0.08595688, 0.10367324, 0.13269023, 0.17225633,
0.2213468 , 0.27869022, 0.34280142, 0.41201997, 0.48455313,
0.55852233, 0.63201182, 0.70311825, 0.77 , 0.83092487,
0.88431494, 0.92878744, 0.96319054, 0.98663324, 0.99850836,
0.99850836, 0.98663324, 0.96319054, 0.92878744, 0.88431494,
0.83092487, 0.77 , 0.70311825, 0.63201182, 0.55852233,
0.48455313, 0.41201997, 0.34280142, 0.27869022, 0.2213468 ,
0.17225633, 0.13269023, 0.10367324, 0.08595688, 0.08
])
In [9]: plot(hamming(40))
Out[9]: []
We overwrote a new to the old one because we did not close the plot window before we made a
new draw.
So, let's close this window, and plot the Hamming again:
In [10]: plot(hamming(40))
Out[10]: []
However, if we want to apply the filter to the other signal, we need to normalize the filter.
Note that the peak value is much lower than the previous Hamming filter.
Convolve
Now we do convolve the filter with the boxcar data:
In [14]: convolve(data, norm)
Out[14]:
array([ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0.0037843 , 0.00785037, 0.0127545 , 0.01903124, 0.0271796 ,
0.03765012, 0.05083319, 0.06704896, 0.08653903, 0.10946018,
0.13588035, 0.16577684, 0.19903693, 0.23546077, 0.27476658,
0.31659794, 0.36053301, 0.40609548, 0.45276687, 0.5 ,
0.54723313, 0.59390452, 0.63946699, 0.68340206, 0.72523342,
0.76453923, 0.80096307, 0.83422316, 0.86411965, 0.89053982,
0.90967668, 0.92510066, 0.93641231, 0.94331865, 0.94564081,
0.94331865, 0.93641231, 0.92510066, 0.90967668, 0.89053982,
0.86411965, 0.83422316, 0.80096307, 0.76453923, 0.72523342,
0.68340206, 0.63946699, 0.59390452, 0.54723313, 0.5 ,
0.45276687, 0.40609548, 0.36053301, 0.31659794, 0.27476658,
0.23546077, 0.19903693, 0.16577684, 0.13588035, 0.10946018,
0.08653903, 0.06704896, 0.05083319, 0.03765012, 0.0271796 ,
0.01903124, 0.0127545 , 0.00785037, 0.0037843 , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ])
In [15]:
In this section, we want to track blue umbrella from the Pixar movie clip. How?
After converting BGR image to HSV, we extract a blue colored object. The reason for convderting to HSV
colorspace is because it is more easier to represent a color than RGB colorspace. Here are the steps to
take:
cap = cv2.VideoCapture('BlueUmbrella.webm')
while(1):
cv2.imshow('frame',frame)
cv2.imshow('mask',mask)
cv2.imshow('res',res)
cv2.destroyAllWindows()
1. Original image
2. Threshold the HSV image to get only blue colors - mask
3. Bitwise-AND mask and original image - res
cspace_blue.mp4
cspace_blue.webm
cspace_blue.ogv