

Support Vector Machines for Classification in Remote Sensing

Mahesh Pal and P. M. Mather


School of Geography
University of Nottingham
UK

Abstract. Support Vector Machines (SVMs) have recently been introduced

in machine learning for pattern recognition. In this paper, multi-class SVMs

are used for land cover classification using ETM+ multispectral and DAIS

hyperspectral data. Results show that the SVM achieves a higher level of

classification accuracy than either the maximum likelihood or the

backpropagation neural network classifier. The SVM classifier can produce

higher accuracies with smaller training datasets and high-dimensional data.

1. Introduction

Neural classifiers are now widely used in remote sensing (Benediktsson et al., 1990;

Heermann and Khazenie, 1992). Although neural networks may generally be used to

classify data at least as accurately as statistical classification approaches, there is a

range of factors that limit their use (Wilkinson, 1997). Another classification technique

based on statistical learning theory (Vapnik, 1995), called Support Vector Machines

(SVMs), has recently been applied to the problem of remote sensing data classification

(Huang et al., 2002; Zhu and Blumberg, 2002). This technique rests on the principle of

the optimal separation of classes (Vapnik, 1995; Vapnik and

Chervonenkis, 1971). In the case of a two-class pattern recognition problem, if the classes

are linearly separable, this technique selects, from among the infinite number of linear

classifiers that separate the data, the one that minimises the generalisation error. Thus,

the selected hyperplane will be one that leaves the maximum margin between the two

classes, where the margin is defined as the sum of the distances from the hyperplane to the

closest points of the two classes (Vapnik, 1995). This problem of maximising the

margin can be solved using standard Quadratic Programming (QP) optimisation

techniques. The data points that are closest to the hyperplane are used to measure the

margin, hence these data points are termed support vectors. The number of support

vectors is thus small as they are points close to the class boundaries (Vapnik, 1995).
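
For reference, the optimisation problem described above can be written in its standard form (following Vapnik, 1995); here the x_i are the training vectors and y_i in {-1, +1} are their class labels:

\[
\min_{\mathbf{w},\,b}\; \tfrac{1}{2}\|\mathbf{w}\|^{2}
\quad \text{subject to} \quad
y_{i}\,(\mathbf{w}\cdot\mathbf{x}_{i} + b) \ge 1, \qquad i = 1,\dots,N.
\]

The separating hyperplane is w·x + b = 0 and the margin equals 2/||w||, so minimising ||w|| maximises the margin; this is the QP problem referred to above.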

If the two classes are not linearly separable, the SVM tries to find the hyperplane that

maximises the margin and that, at the same time, minimises a quantity proportional to

the number of misclassification errors. The trade off between margin and

misclassification error is controlled by a positive constant that has to be chosen

beforehand (Cortes and Vapnik, 1995). The SVM approach can also be

extended to non-linear decision surfaces. Boser et al. (1992) suggested projecting

input data into a high dimensional feature space through some nonlinear mapping and

formulating a linear classification problem in that feature space. Further, kernel

functions were introduced to reduce the computational cost of operating in the feature space (Vapnik,

1995). Detailed discussions of SVMs can be found in Vapnik (1995) and Cristianini and

Shawe-Taylor (2000).
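
As a sketch of these two extensions (standard formulations, not reproduced from this paper): the non-separable case introduces slack variables ξ_i and the trade-off constant C mentioned above, and a kernel function K(x_i, x_j) supplies inner products in the feature space without computing the mapping explicitly. For the radial basis kernel used later in this study,

\[
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}}\; \tfrac{1}{2}\|\mathbf{w}\|^{2} + C\sum_{i=1}^{N}\xi_{i}
\quad \text{subject to} \quad
y_{i}\,(\mathbf{w}\cdot\mathbf{x}_{i} + b) \ge 1 - \xi_{i}, \quad \xi_{i} \ge 0,
\]

\[
K(\mathbf{x}_{i}, \mathbf{x}_{j}) = \exp\!\left(-\gamma\,\|\mathbf{x}_{i} - \mathbf{x}_{j}\|^{2}\right).
\]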

The SVM was initially designed for binary (two-class) problems. When dealing with

multiple classes, an appropriate multi-class method is needed. Initially, the one against the

rest approach (Vapnik, 1995) was the most widely implemented SVM multi-class

classification technique. This technique compares a given class with all the others put

together, thus generating n classifiers, where n is the number of classes. The final output

is the class that corresponds to the SVM with the largest margin. For an n-class

problem, one has to determine n hyperplanes. Thus, this method requires the solution of

n QP optimisation problems, each of which separates one class from the others.
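
The decision rule can be illustrated with a short sketch, written here in Python using scikit-learn (which is not the software used in this study; the function and variable names are illustrative):

    import numpy as np
    from sklearn.svm import SVC

    def one_against_rest(X_train, y_train, X_test, classes, **svm_params):
        """Train one binary SVM per class (that class against all the
        others) and label each test sample by the classifier with the
        largest decision value."""
        scores = np.empty((len(X_test), len(classes)))
        for k, c in enumerate(classes):
            clf = SVC(**svm_params)
            clf.fit(X_train, (y_train == c).astype(int))  # class c vs. rest
            scores[:, k] = clf.decision_function(X_test)
        return np.asarray(classes)[scores.argmax(axis=1)]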

Another way is to combine several classifiers in a pair-wise approach: the one

against one approach (Knerr et al., 1990) applies pairwise comparisons between

classes. In this method, all possible two-class classifiers are evaluated from the

training set of n classes, each classifier being trained on only two out of n classes.

There would be a total of n(n-1)/2 classifiers. Applying each classifier to the

vectors of the test data gives one vote to the winning class. Each test vector is assigned

the label of the class with the most votes.
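
The pairwise voting scheme can be sketched in the same illustrative style (again, not the RHUL_SVM or LIBSVM implementation; the inputs are assumed to be NumPy arrays):

    from collections import Counter
    from itertools import combinations

    from sklearn.svm import SVC

    def one_against_one(X_train, y_train, X_test, classes, **svm_params):
        """Train n(n-1)/2 pairwise SVMs; each casts one vote per test
        sample and the class with the most votes wins."""
        votes = [Counter() for _ in range(len(X_test))]
        for a, b in combinations(classes, 2):
            mask = (y_train == a) | (y_train == b)
            clf = SVC(**svm_params)
            clf.fit(X_train[mask], y_train[mask])
            for i, label in enumerate(clf.predict(X_test)):
                votes[i][label] += 1
        return [v.most_common(1)[0][0] for v in votes]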

2. Data and methodology used


The two study areas used in the work reported here are located near the town of

Littleport in the UK and in the La Mancha region of Spain. For the Littleport area,

ETM+ data acquired on 19th June 2000 were used. The classification problem involves

the identification of seven land cover types (wheat, potato, sugar beet, onion, peas,

lettuce and beans). For the La Mancha area, hyperspectral data acquired on 29th June

2000 by the DAIS 7915 airborne imaging spectrometer were used. Eight different land

cover types (wheat, water body, dry salt lake, hydrophytic vegetation, vineyards, bare

soil, pasture lands and built up area) were used.



In this study, both the "one against one" and the "one against the rest" approaches to

generating multi-class SVMs were used. Two SVM-based software packages using

different approaches to solving the quadratic optimisation problem were used. The first of these,

RHUL_SVM (Saunders et al., 1998), uses both the one against one and the one against the rest approaches

while the other, LIBSVM (Chang and Lin, 2001), is based on the one against one approach. Results obtained

using these SVMs were compared with results derived from more traditional Maximum

Likelihood (ML) and Neural Network (NN, using SNNS software) classifiers. For this

study, a standard back-propagation neural classifier with one hidden layer of twenty-six

nodes was used. All other factors affecting the neural network classifier were set as

recommended by Kavzoglu (2001).
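
For readers wishing to reproduce a comparable setup, a minimal modern sketch is given below. It assumes scikit-learn, whose SVC class wraps LIBSVM and applies the one against one strategy internally; the arrays are random placeholders standing in for the ETM+ pixels, and the kernel parameters are those reported in Section 3:

    import numpy as np
    from sklearn.svm import SVC

    # Placeholder data: the study used 2700 training and 2037 test pixels
    # drawn from seven land cover classes (six bands assumed here).
    rng = np.random.default_rng(0)
    X_train = rng.random((2700, 6)); y_train = rng.integers(0, 7, 2700)
    X_test = rng.random((2037, 6)); y_test = rng.integers(0, 7, 2037)

    # Radial basis kernel with the parameters reported in Section 3.
    svm = SVC(kernel="rbf", gamma=2.0, C=5000.0).fit(X_train, y_train)
    print("overall accuracy: %.1f%%" % (100 * svm.score(X_test, y_test)))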

Ground reference pixels for both test areas were selected using a random sampling

procedure. Selected pixels were divided into disjoint sets to remove any possible bias caused by

using the same pixels for both training and testing the classifiers. A total of 2700 training and

2037 test pixels with ETM+ data and 800 (100 pixels/class) training and 3800 test

pixels with DAIS data were used. To control over-training of the NN, a validation data set of

60 pixels/class was also used.

3. Results
The concept of the kernel was introduced earlier to extend the SVM to deal with non-

linear decision surfaces. There is little guidance in the literature on the best choice of

kernel and kernel specific parameters. A number of trials were carried out using five

different kernels with different kernel-specific parameters, training and testing the

classifier each time. A radial basis kernel function with parameters γ = 2 and C = 5000 gave the

highest overall classification accuracy. Both one against one and one against the rest

strategies were used. Table 1 lists the training times taken using a Sun workstation and

the classification accuracies achieved. Results obtained using ML and NN classifiers are

given in table 2.

[Insert table 1 about here]

[Insert table 2 about here]
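
The kernel trials described above can be organised as a simple grid search. The sketch below uses illustrative parameter values, not the exact grid tried in the study, and the same kind of placeholder data as the earlier sketch:

    import numpy as np
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X_train = rng.random((500, 6)); y_train = rng.integers(0, 7, 500)

    # Candidate kernels and kernel-specific parameters (illustrative only).
    param_grid = [
        {"kernel": ["rbf"], "gamma": [0.5, 1, 2, 4], "C": [10, 1000, 5000]},
        {"kernel": ["poly"], "degree": [2, 3, 4], "C": [10, 1000, 5000]},
        {"kernel": ["linear"], "C": [10, 1000, 5000]},
    ]
    search = GridSearchCV(SVC(), param_grid, cv=5)
    search.fit(X_train, y_train)
    print(search.best_params_, search.best_score_)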

The results show that the time taken using the one against the rest method is much

higher, and the classification accuracies are lower, than with the one against one technique.

This result suggests that the one against one method should be employed for

generating multi-class SVMs. One reason for this finding could be the unbalanced sizes

of the two training classes that arise when using the one against the rest method. The

level of classification accuracy achieved by the one against one SVM is higher than

that produced by either the ML or the NN classifier.
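
The figures in Tables 1 and 2 report both the overall accuracy and the Kappa coefficient of agreement. As a reminder, Kappa can be computed from the confusion matrix as follows (a standard formula, not specific to this study):

    import numpy as np

    def kappa(confusion):
        """Cohen's kappa: (observed agreement - chance agreement)
        divided by (1 - chance agreement)."""
        cm = np.asarray(confusion, dtype=float)
        n = cm.sum()
        p_o = np.trace(cm) / n                       # overall accuracy
        p_e = (cm.sum(0) * cm.sum(1)).sum() / n**2   # chance agreement
        return (p_o - p_e) / (1 - p_e)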

In the second part of the experiment, DAIS hyperspectral data were used to study the

behaviour of the SVM, ML and NN classifiers with a fixed-size training set (800 pixels)

and an increasing number of features (spectral bands). A total of 65 features was used,

as seven bands with severe striping were rejected. The procedure began with the use of

five bands. An additional five bands were added at each cycle, thus generating thirteen

sets of results.
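
This incremental-feature procedure is straightforward to script. The sketch below uses placeholder data of the same shape as the DAIS experiment and the same illustrative SVC settings as before:

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X_train = rng.random((800, 65)); y_train = rng.integers(0, 8, 800)
    X_test = rng.random((3800, 65)); y_test = rng.integers(0, 8, 3800)

    # Fixed training set, growing number of spectral bands (5 at a time),
    # giving thirteen sets of results for band counts 5, 10, ..., 65.
    for n_bands in range(5, 70, 5):
        clf = SVC(kernel="rbf", gamma=2.0, C=5000.0)
        clf.fit(X_train[:, :n_bands], y_train)
        acc = 100 * clf.score(X_test[:, :n_bands], y_test)
        print(f"{n_bands:2d} bands: {acc:.1f}%")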

[Insert figure 1 about here]

Figure 1 suggests that, in comparison with the other classifiers, the performance of the SVM

remains acceptable even with a small training data set. Results also suggest that

classification accuracy using the SVM, ML and NN declines slightly when the number

of features exceeds 50 or so. SVMs are based on the principle of the optimal separation of

classes. Thus, the error on the test data set should not depend on the dimensionality of the

input space (Vapnik, 1995). The slight performance degradation beyond 55 bands may be

attributed to the quality of the training data as new features are added.

4. Conclusions
Comparison of the results obtained by the SVM and those produced by other

classifiers suggests that the SVM classifier generally achieves higher classification

accuracies. Like neural classifiers, the effective use of the SVM depends on the values of

a few user-defined parameters. Huang et al. (2002) discuss some of the factors affecting

the performance of SVMs in detail. This study concludes that the approach used by

Huang et al. (2002) is not optimal for multi-class classification for two reasons: (1)

they replicated samples of the smaller class, thus increasing the number of

training patterns, and (2) they used a one against the rest strategy for generating the SVM.

The main problem with one against the rest is that it may lead to the existence of

unclassified data, thus producing lower classification accuracies. Further, the higher

training time of the one against one approach in RHUL_SVM may be due to the

technique used to solve the quadratic programming optimisation problem.

The study reported here supports the use of a one against one multi-class

approach for multi-class image classification problems. Finally, the SVM seems to

perform well with high-dimensional data even with a small amount of training data.

Acknowledgement
The RHUL_SVM software was made available by AT&T, Royal Holloway College,

University of London. The DAIS data were kindly made available by Prof. J. Gumuzzio

of the Autonomous University of Madrid. Computing facilities were provided by the

School of Geography, University of Nottingham. Mahesh Pal's research was supported

by a Commonwealth Scholarship. The authors are grateful for the critical comments of

two anonymous referees, whose advice has led to an improvement in the presentation of

many of the findings contained in this paper.

References

BENEDIKTSSON, J. A., SWAIN, P. H. and ERSOY, O. K., 1990, Neural network

approaches versus statistical methods in classification of multisource remote

sensing data. IEEE Transactions on Geoscience and Remote Sensing, 28, 540-551.

BOSER, B., GUYON, I. and VAPNIK, V. N., 1992, A training algorithm for optimal

margin classifiers. Proceedings of the 5th Annual Workshop on Computational Learning

Theory, Pittsburgh, PA: ACM, 144-152.

CHANG, C. and LIN, C., 2001, LIBSVM: A Library for Support Vector Machines.

Department of Computer Science and Information Engineering, National Taiwan

University, Taiwan. http://www.csie.ntu.edu.tw/~cjlin/libsvm.

CORTES, C. and VAPNIK, V. N., 1995, Support-vector networks. Machine Learning,

20, 273-297.

CRISTIANINI, N. and SHAWE-TAYLOR, J., 2000, An Introduction to Support Vector

Machines. Cambridge: Cambridge University Press.



HEERMANN, P. D. and KHAZENIE, N., 1992, Classification of multispectral remote

sensing data using a back-propagation neural network. IEEE Transactions on

Geoscience and Remote Sensing, 30, 81-88.

HUANG, C., DAVIS, L. S. and TOWNSHEND, J. R. G., 2002, An assessment of

support vector machines for land cover classification. International Journal of

Remote Sensing, 23, 725-749.

KAVZOGLU, T., 2001, An Investigation of the Design and Use of Feed-forward

Artificial Neural Networks in the Classification of Remotely Sensed Images.

PhD thesis. School of Geography, The University of Nottingham,

Nottingham, UK.

KNERR, S., PERSONNAZ, L. and DREYFUS, G., 1990, Single-layer learning

revisited: a stepwise procedure for building and training a neural network. In

Neurocomputing: Algorithms, Architectures and Applications, NATO ASI Series,

Berlin: Springer.

SAUNDERS, C., STITSON, M. O., WESTON, J., BOTTOU, L., SCHÖLKOPF, B. and

SMOLA, A., 1998, Support Vector Machine - Reference Manual. Technical

Report, CSD-TR-98-03, Royal Holloway and AT&T, University of London.

VAPNIK, V. N., 1995, The Nature of Statistical Learning Theory. New York: Springer-

Verlag.

WILKINSON, G. G., 1997, Open questions in neurocomputing for Earth observation.

In Neurocomputation in Remote Sensing Data Analysis. New York: Springer-

Verlag, 3-13.

ZHU, G. and BLUMBERG, D. G., 2002, Classification using ASTER data and SVM

algorithms: the case study of Beer Sheva, Israel. Remote Sensing of Environment,

80, 233-240.
[Figure 1 appears here: classification accuracy (%) plotted against the number of
spectral bands (5 to 65) for the three classifiers. The values underlying the chart are:

Number of bands            5    10   15   20   25   30   35   40   45   50   55   60   65
Maximum likelihood        64.8 75.7 80.6 80.4 84.3 85.9 88.8 89.1 89.3 88.7 88.9 87.4 85.8
Neural network            48.6 66.7 77.2 79.6 83.3 86.6 85.6 88.7 88.5 89.4 89.4 89.6 88.4
Support vector machines   66.7 74.7 83.5 84.8 87.2 90.5 91.5 92.1 92.3 93.4 94.0 93.4 93.6]

Figure 1.
Table 1

Multi-class method            Number of        Accuracy (%)     Training time
                              training pixels  and Kappa value  (CPU minutes)
One against rest (RHUL_SVM)   2700             79.73 (0.77)     505.27
One against one (RHUL_SVM)    2700             87.37 (0.86)     21.54
One against one (LIBSVM)      2700             87.90 (0.87)     0.30
Table 2

Classifier   Accuracy (%) and Kappa value   Training time (CPU minutes)
ML           82.9 (0.80)                    0.20
NN           85.1 (0.83)                    58

Captions

Figure 1. Variation in classification accuracy with increasing number of features and

a fixed training data set.



Table Captions

Table 1. Classification accuracy and training time using SVMs and different multi-class
methods with Littleport ETM+ data.

Table 2. Classification accuracies of the ML and NN classifiers with Littleport ETM+


data.
