Principal Component Analysis in C#


Published Friday, October 30, 2009 by César Souza in C#, Computer Science, Mathematics, Science, Software

For the Portuguese version of this article, Análise de Componente Principal em C#, please follow this link.

Principal Component Analysis (PCA) is an exploratory tool designed by Karl Pearson in 1901 to identify unknown trends
in a multidimensional data set. It involves a mathematical procedure that transforms a number of possibly correlated
variables into a smaller number of uncorrelated variables called principal components. The first principal component
accounts for as much of the variability in the data as possible, and each succeeding component accounts for as much
of the remaining variability as possible. [Wikipedia]

• Download source code
• Download sample application

Analysis Overview
PCA essentially rotates the set of points around their mean in order to align them with the first few principal components. This moves as much of the variance as possible (using a linear transformation) into the first few dimensions. The values in the remaining dimensions therefore tend to be small and may be dropped with minimal loss of information. Please note that the signs of the columns of the rotation matrix are arbitrary, and so may differ between different programs for PCA.
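
To make the idea concrete, the quantity whose eigenvectors PCA aligns with is the covariance matrix of the mean-centered data. Below is a minimal sketch in plain C# (independent of the library; the numbers are arbitrary) that just centers a small two-variable data set and computes its covariance matrix.

// Center a small two-variable data set and compute its sample covariance
// matrix. The eigenvectors of this matrix are the principal components.
double[,] data =
{
    { 2.0, 1.9 },
    { 0.5, 0.7 },
    { 2.2, 2.9 },
    { 1.9, 2.2 },
    { 3.1, 3.0 }
};

int n = data.GetLength(0);
int d = data.GetLength(1);

// Column means
double[] mean = new double[d];
for (int j = 0; j < d; j++)
{
    for (int i = 0; i < n; i++)
        mean[j] += data[i, j];
    mean[j] /= n;
}

// Sample covariance matrix of the centered data
double[,] cov = new double[d, d];
for (int a = 0; a < d; a++)
    for (int b = 0; b < d; b++)
    {
        double s = 0;
        for (int i = 0; i < n; i++)
            s += (data[i, a] - mean[a]) * (data[i, b] - mean[b]);
        cov[a, b] = s / (n - 1);
    }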

For a more complete explanation of PCA, please see Lindsay Smith's excellent Tutorial on Principal Component Analysis (2002).
AForge.NET Framework
My initial idea was to implement PCA directly in the AForge.NET Framework. AForge.NET is an excellent open-source Artificial
Intelligence / Computer Vision framework written for .NET, developed mainly by Andrew Kirillov.

However, because it involved many new additions to AForge, it would have been very difficult to review all the changes
required to incorporate it directly into the project's source code.

Because of that, this new, cleaner code was implemented using only the binaries of the AForge Framework as a
starting point. I hope this code will help others who need to perform Principal Component Analysis in their own C#
projects.

This new library, which I called Accord.NET, extends the AForge Framework by adding new features such as
Principal Component Analysis, numerical decompositions, and a few other mathematical transformations and tools. This
extension works much like a testing ground for new features I would like to see implemented in future versions of AForge.NET.

Design Decisions
As people who want to use PCA in their projects usually already have their own Matrix class definitions, I decided to
avoid using custom Matrix and Vector classes in order to make the code more flexible. I also tried to avoid
dependencies on other methods whenever possible, to keep the code self-contained. I think this also makes the
code simpler to understand.
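
In practice this means the source data is just a rectangular double[,] array, with one row per observation and one column per variable. A minimal sketch (the values are arbitrary) follows.

// Source data as a plain rectangular array: rows are observations,
// columns are variables. No special Matrix class is required.
double[,] sourceMatrix =
{
    //  x     y     z
    { 2.0,  3.1,  4.0 },
    { 1.5,  2.7,  3.0 },
    { 3.2,  4.0,  6.4 },
    { 2.8,  3.5,  5.6 }
};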

The code is divided into two projects:

• Accord.Math, which provides mathematical tools, decompositions and transformations, and
• Accord.Statistics, which provides the statistical analysis, statistical tools and visualizations.

Both of them depend on the AForge.NET core, and their internal structure and organization try to mimic AForge's
wherever possible.

The given source code doesn't include the full source of the Accord Framework, which remains a test bed for new
features I'd like to see in AForge.NET. Rather, it includes only the portions of the code needed to support PCA. It also
contains the code for Kernel Principal Component Analysis, as both share the same framework. Please be sure to open
the correct project when testing.

Code Overview
Below is the main code behind PCA.

/// <summary>Computes the Principal Component Analysis algorithm.</summary>
public void Compute()
{
    int rows = sourceMatrix.GetLength(0);
    int cols = sourceMatrix.GetLength(1);

    // Create a new matrix to work upon
    double[,] matrix = new double[rows, cols];

    // Prepare the data, storing it in the new matrix.
    if (this.analysisMethod == AnalysisMethod.Correlation)
    {
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < cols; j++)
                // Subtract the mean and divide by the standard deviation (convert to Z-scores)
                matrix[i, j] = (sourceMatrix[i, j] - columnMeans[j]) / columnStdDev[j];
    }
    else
    {
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < cols; j++)
                // Just center the data around the mean. Will have no effect if the
                // data is already centered (the mean will be zero).
                matrix[i, j] = (sourceMatrix[i, j] - columnMeans[j]);
    }

    // Perform the Singular Value Decomposition (SVD) of the matrix
    SingularValueDecomposition singularDecomposition = new SingularValueDecomposition(matrix);
    singularValues = singularDecomposition.Diagonal;

    // Eigenvalues are the squares of the singular values
    for (int i = 0; i < singularValues.Length; i++)
    {
        eigenValues[i] = singularValues[i] * singularValues[i];
    }

    // The principal components of 'Source' are the eigenvectors of Cov(Source). Thus if we
    // calculate the SVD of 'matrix' (which is Source standardized), the columns of matrix V
    // (right side of SVD) will be the principal components of Source.

    // The right singular vectors contain the principal components of the data matrix
    this.eigenVectors = singularDecomposition.RightSingularVectors;

    // The left singular vectors contain the scores of the principal components
    this.resultMatrix = singularDecomposition.LeftSingularVectors;

    // Calculate proportions
    double sum = 0;
    for (int i = 0; i < eigenValues.Length; i++)
        sum += eigenValues[i];
    sum = (1.0 / sum);

    for (int i = 0; i < eigenValues.Length; i++)
        componentProportions[i] = eigenValues[i] * sum;

    // Calculate cumulative proportions
    this.componentCumulative[0] = this.componentProportions[0];
    for (int i = 1; i < this.componentCumulative.Length; i++)
    {
        this.componentCumulative[i] = this.componentCumulative[i - 1] + this.componentProportions[i];
    }

    // Create the object-oriented structure to hold the principal components
    PrincipalComponent[] components = new PrincipalComponent[singularValues.Length];
    for (int i = 0; i < components.Length; i++)
    {
        components[i] = new PrincipalComponent(this, i);
    }
    this.componentCollection = new PrincipalComponentCollection(components);
}

Using the Code


To perform a simple analysis, you can simply instantiate a new PrincipalComponentAnalysis object, passing your
data, and call its Compute method to compute the model. Then you can call the Transform method to project the
data into the principal component space.

Sample code demonstrating its usage is presented below.

// Create a Principal Component Analysis of the given source data
PrincipalComponentAnalysis pca = new PrincipalComponentAnalysis(sourceMatrix,
    PrincipalComponentAnalysis.AnalysisMethod.Correlation);

// Compute the Principal Component Analysis
pca.Compute();

// Create a projection considering 80% of the information
double[,] components = pca.Transform(sourceMatrix, 0.8f, true);
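
After Compute has run, the proportion of variance explained by each component can also be inspected. The member names in the sketch below (Components, Proportion, CumulativeProportion) are assumed from the fields built inside Compute, so please verify them against the downloaded source.

// Print the proportion of variance explained by each principal component.
// Note: the member names used here are assumptions based on the fields
// created in Compute() (componentProportions, componentCumulative).
for (int i = 0; i < pca.Components.Count; i++)
{
    Console.WriteLine("Component {0}: proportion {1:P2}, cumulative {2:P2}",
        i + 1, pca.Components[i].Proportion, pca.Components[i].CumulativeProportion);
}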

Test Application
To demonstrate the use of PCA, I created a simple Windows Forms Application which performs simple statistical
analysis and PCA transformations.

The application can open Excel workbooks. Here we are loading some random Gaussian data, some random Poisson
data, and a linear multiple of the first variable (thus also Gaussian).

A simple descriptive analysis of the source data, with a histogram plot of the first variable, shows that it fits a Gaussian
distribution with mean 1.0 and standard deviation 1.0.

Here we perform PCA using the Correlation method. Actually, the transformation uses SVD on the standardized data
rather than on the correlation matrix, but the effect is the same. Since the third variable is just a linear multiple of the
first, the analysis detects the redundancy: the third component receives a zero importance level.

Now we can make a projection of the source data using only the first two components.

Note: The principal components are not unique because the Singular Value Decomposition is not unique. Also the signs
of the columns of the rotation matrix are arbitrary, and so may differ between different programs for PCA.
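
If reproducible signs are desired, one common convention (not applied in the code above) is to flip each eigenvector so that its entry of largest magnitude is positive; the corresponding column of the score matrix should then be flipped as well. A sketch, assuming eigenVectors is the double[,] produced by Compute:

// Sign convention sketch: make the largest-magnitude entry of each
// eigenvector positive so results are comparable across implementations.
int dims = eigenVectors.GetLength(0);
int comps = eigenVectors.GetLength(1);
for (int j = 0; j < comps; j++)
{
    int argmax = 0;
    for (int i = 1; i < dims; i++)
        if (Math.Abs(eigenVectors[i, j]) > Math.Abs(eigenVectors[argmax, j]))
            argmax = i;

    if (eigenVectors[argmax, j] < 0)
        for (int i = 0; i < dims; i++)
            eigenVectors[i, j] = -eigenVectors[i, j];
}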

Together with the demo application comes an Excel spreadsheet containing several data examples. The first example is
the same one used by Lindsay Smith in the Tutorial on Principal Component Analysis. The others include Gaussian data,
uncorrelated data and linear combinations of Gaussian data to further exemplify the analysis.

Anyway, I hope someone finds this code useful! If you have any comments about the code or the article, please let me
know.

Update (19/01/2010): I've also finished writing the code and article for Kernel Principal Component Analysis (KPCA). Both
are now included in a single source package.

25 comments:

1. Antok November 11, 2009 6:15 AM


just what i needed !
but unfortunately I can't download your source code from www.comp.ufscar.br.
DNS kind of problem... Any mirror ?

2. César Souza November 14, 2009 3:53 PM

Sorry, we just had an almost nation-wide power outage here in Brazil. The link is working now.

3. Anonymous November 24, 2009 12:01 PM

What licence is used for this code?

4. César Souza November 25, 2009 8:48 AM

I've updated the sources to include license information.

5. Anonymous March 3, 2010 8:13 AM

Your project is really very good, but I have a problem: when I open the program, it has difficulties identifying
the "statistics" and "samples" projects, marking them as unavailable. Thus some properties in the solution cannot be
read.

Thank you in advance

6. Anonymous May 5, 2010 9:56 AM

Why are the eigenvalues so big?

7. César Souza May 5, 2010 10:46 AM

I suspect this is because the PCA is being computed using the SVD.

In SVD, the singular values are equal to the square roots of the eigenvalues. So the singular values are being
squared (thus becoming large) to give the eigenvalues. Those eigenvalues, however, may not be the actual
eigenvalues that would be obtained using an Eigendecomposition, because the SVD implementation used
automatically normalizes its singular values.

As the eigenvalues are used only to compute the amount of variance explained by each component, the
important thing to note is that their ratio is preserved.
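
For example, in plain C# (independent of the library), scaling every singular value by the same constant leaves the explained-variance proportions unchanged:

// The proportions computed from squared singular values do not change if
// all singular values are multiplied by a common factor.
double[] s = { 4.0, 2.0, 1.0 };   // singular values (arbitrary)
double scale = 10.0;              // any common scaling factor

double total = 0, totalScaled = 0;
for (int i = 0; i < s.Length; i++)
{
    total += s[i] * s[i];
    totalScaled += (scale * s[i]) * (scale * s[i]);
}

for (int i = 0; i < s.Length; i++)
{
    double p = (s[i] * s[i]) / total;
    double pScaled = ((scale * s[i]) * (scale * s[i])) / totalScaled;
    Console.WriteLine("{0:F4}  {1:F4}", p, pScaled);   // both columns are identical
}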

Cesar

8. lobna June 18, 2010 9:39 AM

Thanks a lot for your efforts, it's really great.

I have a question: I want to get the principal components of an image. My target is to classify faces and non-faces,
so I need to apply PCA to the face images. How can I pass my data, which in my case is an image, to the PCA object?
Thanks in advance

9. César Souza June 18, 2010 11:18 PM

Hi,

You can always transform your image into a single vector and then pass it to PCA as you would pass any
other input vector.

For example, if you have a 320x240 image, you can create a vector of 320*240 = 76800 positions and then
copy the image pixel by pixel into this vector, in any order you wish, as long as you use the same ordering
consistently for all your images.

On the page http://www.face-rec.org/algorithms/ you can find more information about face recognition
algorithms, including ones using PCA.

Best regards,
César

10. lobna June 20, 2010 2:09 PM

Really, thanks. Great work, and it works well for me.

Just another small question: are there any restrictions on the type or size of the images passed to pca.Compute()?

Thanks a lot, and sorry for so many questions.

11. César Souza June 20, 2010 2:20 PM

Well, I have not tested the memory limits of this implementation, but in theory it should work as long as you
have enough memory to accommodate those matrices during computation.

About the type of the image, it all depends on how you are going to create your input vector. But typically
only gray-scale images are used.

Best regards,
César

12. Mohammad September 20, 2010 3:30 AM

Dear César,
Thank you very much for making this valuable code available to the public.
I have a problem using your code:
My data is too large: about 500,000 rows and 118 columns. When I give this matrix to the
DescriptiveAnalysis(sourceMatrix, sourceColumns) method, the exception "System.OutOfMemoryException" is
thrown.
Is there any way that, instead of giving a matrix, I can pass a comma-separated file containing the data to the
methods?

13. César Souza September 23, 2010 10:40 AM

Hi Mohammad,
Have you tried running the method on a 64-bit system? Perhaps it will work. Besides, the error happens in
the DescriptiveAnalysis class, so it may work if you comment out that portion of the code and use only the
PrincipalComponentAnalysis classes.

Best regards,
César

14. Ligemm September 24, 2010 7:04 AM

Hello! This code is really helpful, but my problem is that I don't know how to use it. My project is about face
recognition, and based on my research, PCA is paired with a neural network in most face recognition
systems. How will I connect your code with my neural network? What exactly is the output of PCA that will be
the input of the neural network? Is it the eigenvectors? I have read that the output of PCA is the eigenvectors.
I really can't understand PCA. I wish you could help me. Thank you in advance!

15. César Souza September 24, 2010 9:55 AM

Hi Ligemm,

PCA can be seen as a linear transformation. Being a transformation, what it does is project your data into
another space. The PCA output you are looking for is the projection of your original data into this space,
which in the case of PCA, will be a space where your variables are (hopefully) uncorrelated.

The eigenvectors found by the analysis will form the basis for this new space, and the eigenvalues can be
used to measure the importance of each of the vectors. If you discard the less important eigenvectors before
performing the projection, then you can also perform dimensionality reduction in the process.
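
Concretely, the projection is just a matrix product between the centered data and the chosen eigenvectors. A minimal sketch with plain arrays (not the library API; 'centered' and 'eigenVectors' are assumed to be ordinary double[,] arrays, with eigenvector columns ordered by decreasing eigenvalue):

int n = centered.GetLength(0);   // number of observations
int d = centered.GetLength(1);   // number of original variables
int k = 2;                       // number of components to keep

// scores = centered * eigenVectors[:, 0..k-1]; this n-by-k matrix is what
// you would feed to the neural network.
double[,] scores = new double[n, k];
for (int i = 0; i < n; i++)
    for (int j = 0; j < k; j++)
    {
        double sum = 0;
        for (int m = 0; m < d; m++)
            sum += centered[i, m] * eigenVectors[m, j];
        scores[i, j] = sum;
    }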

By the way, I have used PCA as a preprocessing step for ANNs too. If you wish, please take a look at the
images on this poster; they might help you understand how PCA can be used in this scenario.

Regards,
César

16. Mohammad December 14, 2010 11:56 PM

Dear César,

I reduced the number of samples to 50, but still the number of features (118) seems to be too much for your
code. The maximum number of features the code can handle seems to be 46 for my data, otherwise it takes
forever to finish pca.Compute().
Do you have any suggestions?

Thanks.

17. César Souza December 15, 2010 12:06 AM

Hi Mohammad,

Well, can I have a look in your data? If Compute is taking forever (and is not throwing any exceptions) then
this may be a bug. If you could provide an excerpt of your data (perhaps the 50 samples with 118 columns
you mentioned) it would be great!

Best regards,
César

18. Anonymous January 13, 2011 5:52 PM


Hello, I think there is a small bug in the PCA implementation. When you use a matrix where the column count is
higher than the row count, it does not work correctly (the Transform method returns all zeros).
I think the solution is in the PCA.Compute method, where setting all the parameters to "true" helps.

SingularValueDecomposition svd = new SingularValueDecomposition(matrix, true, true, true);

19. César Souza January 13, 2011 7:10 PM

Hi Anonymous,

Thanks, you are correct about that. The latest version of PCA in the development branch of Accord.NET does
indeed use those parameters when creating the SingularValueDecomposition, but I forgot to update the code
available here.

Thanks again!
César

20. Monika February 1, 2011 10:17 AM

Yes, Cesar and Anonymous, I have noticed this for my data, as well (I have a matrix where the column nr is
higher than the row nr and I get zeros from the transform method).

I am using the latest Accord.Net framework (version 2.1.4) and get this problem... How do I update this
myself?

BTW - THANK YOU VERY MUCH for providing us with this great framework, Cesar!

Greetings from Monika

21. César Souza February 1, 2011 10:42 AM

Hi Monika,

I am going to release an updated version of the framework soon. If you wish, you can change the
aforementioned line in the PrincipalComponentAnalysis.cs class to:

SingularValueDecomposition svd = new SingularValueDecomposition(matrix, true, true, true);

If you have installed the framework using the executable installer, the source code will be available in the
installation folder. However, if you can wait a little, I will try to release a new version of the framework this
week.

Best regards,
César

22. Monika February 1, 2011 12:33 PM

Ok, I'll wait.

23. MSA2000 February 9, 2011 12:46 PM

Dear César,

As I encountered the same problem with column count > row count, I wanted to ask whether the updated version
of the framework is already online, perhaps as version 2.1.4?
Thanks again for your efforts and regards,
MSA

24. César Souza February 12, 2011 9:34 PM

Hi MSA2000, Monika,

A new version of the Accord.NET Framework (v2.1.5) has just been released. The bug has been fixed in this
release. There are many additions in this release as well, which has delayed the release a little. Those
additions, corrections and enhancements are detailed in the release notes.

Best regards,
César

25. Anonymous April 14, 2011 2:44 PM

Dear César,
Thank you very much, this code is really helpful.
I use PCA for my examination, but how can I add Jacobi iteration to this code to get the eigenvectors and eigenvalues?
Thanks...

