Another Example of using Principal Component Analysis

In [1]:
 
import numpy as np
In [2]:
 
X = np.array([[10,20,10],
              [2,5,2],
              [8,17,7],
              [9,20,10],
              [12,22,11]])
In [3]:
 
print X
[[10 20 10]
 [ 2  5  2]
 [ 8 17  7]
 [ 9 20 10]
 [12 22 11]]
In [16]:
 
X = np.mat(X)
meanVals = np.mean(X, axis=0)
A = X - meanVals              # A is the zero-mean (centered) version of X
C = np.cov(A, rowvar=0)       # C is the covariance matrix of X
 
print C
[[ 14.2   25.3   13.5 ]
 [ 25.3   46.7   24.75]
 [ 13.5   24.75  13.5 ]]
In [14]:
 
# Note that C = (1/(N-1)) A.T*A
print np.dot(A.T, A)/(X.shape[0]-1)
[[ 14.2   25.3   13.5 ]
 [ 25.3   46.7   24.75]
 [ 13.5   24.75  13.5 ]]

Now we can obtain eigenvalues and eigenvectors of the covariance matrix:

In [41]:
 
np.set_printoptions(precision=2,suppress=True)
e, ev = np.linalg.eig(C)
print "Eigenvalues:"
print e
print
print "Eigenvectors:"
print ev
Eigenvalues:
[ 73.72   0.38   0.3 ]

Eigenvectors:
[[ 0.43  0.9  -0.04]
 [ 0.79 -0.41 -0.45]
 [ 0.42 -0.16  0.89]]
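
As a quick sanity check, every eigenpair should satisfy C v = λ v; a small sketch reusing the variables above:

In [ ]:
 
# Each column of ev is an eigenvector v with eigenvalue e[j], so C*v == e[j]*v
print np.allclose(np.dot(C, ev), ev * e)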

We can transform the full data into the new feature space based on the eigenvectors:

In [39]:
 
newFeatures = ev.T
XTrans = np.dot(newFeatures, A.T)
print XTrans.T
[[  4.17   0.     0.26]
 [-14.61   0.17   0.25]
 [ -0.35  -0.1   -0.97]
 [  3.74  -0.9    0.3 ]
 [  7.05   0.83   0.16]]

However, we typically want a lower-dimensional space. We can sort the eigenvectors in decreasing order of their eigenvalues and take the top k. In the example below, we'll keep only the first principal component; since it already has the largest eigenvalue, no sorting is necessary (a general sketch of the sorting step follows the example):

In [42]:
 
reducedFeatures = ev[:,0].T
reducedXTrans = np.dot(reducedFeatures, A.T)
print reducedXTrans.T
[[  4.17]
 [-14.61]
 [ -0.35]
 [  3.74]
 [  7.05]]
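
In general, the eigenvalues returned by eig are not guaranteed to come back sorted, so selecting the top k components looks like this (a small sketch reusing the variables above):

In [ ]:
 
k = 1
order = np.argsort(e)[::-1]        # eigenvalue indices, largest first
topK = ev[:, order[:k]]            # the k leading eigenvectors as columns
reducedK = np.dot(topK.T, A.T).T   # project the centered data onto them
print reducedK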

We can also use the scikit-learn decomposition module to do the same thing:

In [25]:
 
from sklearn import decomposition
In [44]:
 
pca = decomposition.PCA()
XTrans2 = pca.fit_transform(X)
In [45]:
 
print XTrans2
[[ -4.17   0.     0.26]
 [ 14.61   0.17   0.25]
 [  0.35  -0.1   -0.97]
 [ -3.74  -0.9    0.3 ]
 [ -7.05   0.83   0.16]]
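
Note that the sign of the first column is flipped relative to our manual projection. Eigenvectors are only defined up to sign, so both transformations span the same principal subspace. We can also inspect the fitted model directly; the principal axes should match the eigenvectors up to sign, and the explained variances should correspond to the eigenvalues of C:

In [ ]:
 
print pca.components_           # rows are the principal axes (eigenvectors up to sign)
print pca.explained_variance_   # should correspond to the eigenvalues of C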

The remaining part of this notebook is another example of using PCA for dimensionality reduction:

In [32]:
 
M = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9],
              [1.9, 2.2],
              [3.1, 3.0],
              [2.3, 2.7],
              [2, 1.6],
              [1, 1.1],
              [1.5, 1.6],
              [1.1, 0.9]])
 
In [37]:
 
meanM = np.mean(M, axis=0)
MC = M - meanM              # MC is the zero-mean (centered) version of M
CovM = np.cov(MC, rowvar=0)   # CovM is the covariance matrix of M
print "Zero Mean Matrix:\n", MC,"\n"
print "Covariance Matrix:\n", CovM,"\n"
Zero Mean Matrix:
[[ 0.69  0.49]
 [-1.31 -1.21]
 [ 0.39  0.99]
 [ 0.09  0.29]
 [ 1.29  1.09]
 [ 0.49  0.79]
 [ 0.19 -0.31]
 [-0.81 -0.81]
 [-0.31 -0.31]
 [-0.71 -1.01]] 

Covariance Matrix:
[[ 0.61655556  0.61544444]
 [ 0.61544444  0.71655556]] 

In [38]:
 
eigVals, eigVecs = np.linalg.eig(CovM)
print "Eigenvalues:\n", eigVals,"\n"
print "Eigenvectors:\n", eigVecs,"\n"
Eigenvalues:
[ 0.0490834   1.28402771] 

Eigenvectors:
[[-0.73517866 -0.6778734 ]
 [ 0.6778734  -0.73517866]] 

In [39]:
 
newFeatures = eigVecs[:,1].T
In [47]:
 
MTrans = np.dot(newFeatures, MC.T)
print np.mat(MTrans).T
[[-0.82797019]
 [ 1.77758033]
 [-0.99219749]
 [-0.27421042]
 [-1.67580142]
 [-0.9129491 ]
 [ 0.09910944]
 [ 1.14457216]
 [ 0.43804614]
 [ 1.22382056]]
In [53]:
 
pca2 = decomposition.PCA(n_components=1)
MTrans2 = pca2.fit_transform(M)
print MTrans2
[[-0.82797019]
 [ 1.77758033]
 [-0.99219749]
 [-0.27421042]
 [-1.67580142]
 [-0.9129491 ]
 [ 0.09910944]
 [ 1.14457216]
 [ 0.43804614]
 [ 1.22382056]]
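
To quantify how much information the one-dimensional projection retains, we can look at the fraction of the total variance explained by the single retained component:

In [ ]:
 
# Fraction of the total variance captured by the first principal component
print pca2.explained_variance_ratio_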
In [ ]: