Dimension Reduction using Principal Components Analysis (PCA)

1 Dimension Reduction using Principal Components Analysis (PCA)

2 Applications of dimension reduction
- Computational advantage for other algorithms
- Face recognition: image data (pixels) projected along new axes works better for recognizing faces
- Image compression

3 Data for 25 undergraduate programs at business schools in US universities
Use PCA to:
1) Reduce # columns
Additional benefits:
2) Identify relations between columns
3) Visualize universities in 2D
Columns: Univ, SAT, Top10, Accept, SFRatio, Expenses, GradRate
Universities: Brown, CalTech, CMU, Columbia, Cornell, Dartmouth, Duke, Georgetown, Harvard, JohnsHopkins, MIT, Northwestern, NotreDame, PennState, Princeton, Purdue, Stanford, TexasA&M, UCBerkeley, UChicago, UMichigan, UPenn, UVA, UWisconsin, Yale
[table of values not captured in transcription]
Source: US News & World Report, Sept.

4 PCA: Input → Output
Input columns: Univ, SAT, Top10, Accept, SFRatio, Expenses, GradRate. Output columns: PC 1, PC 2, PC 3, PC 4, PC 5, PC 6.
[table of values not captured in transcription]
The hope is that a few columns may capture most of the information from the original dataset.

5 The Primitive Idea: Intuition First
How to compress the data while losing the least amount of information?

6 PCA: Input → Output
Input: p measurements / original columns — correlated
Output: p principal components (= weighted averages of original measurements) — uncorrelated, ordered by variance
Keep top principal components; drop the rest

7 Mechanism
Input columns: Univ, SAT, Top10, Accept, SFRatio, Expenses, GradRate → PC 1 ... PC 6 [table of values not captured in transcription]
The i-th principal component is a weighted average of the original measurements/columns:
PC_i = a_{i1} X_1 + a_{i2} X_2 + ... + a_{ip} X_p
Weights (a_ij) are chosen such that:
1. PCs are ordered by their variance (PC 1 has the largest variance, followed by PC 2, PC 3, and so on)
2. Pairs of PCs have correlation = 0
3. For each PC, the sum of squared weights = 1

8 Demystifying weight computation
PC_i = a_{i1} X_1 + a_{i2} X_2 + ... + a_{ip} X_p
Main idea: high variance = lots of information
Var(PC_i) = a_{i1}^2 Var(X_1) + a_{i2}^2 Var(X_2) + ... + a_{ip}^2 Var(X_p) + 2 a_{i1} a_{i2} Cov(X_1, X_2) + ... + 2 a_{i,p-1} a_{ip} Cov(X_{p-1}, X_p)
Also want: Cov(PC_i, PC_j) = 0 when i ≠ j
Goal: find weights a_ij that maximize the variance of PC_i while keeping PC_i uncorrelated with the other PCs. The covariance matrix of the X's is needed.
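A minimal R sketch (on invented toy data, not the universities dataset) that checks these three properties numerically with base R's princomp:

set.seed(1)
X <- matrix(rnorm(100 * 3), ncol = 3)   # toy data: 100 rows, 3 columns
X[, 2] <- X[, 1] + 0.5 * X[, 2]         # introduce some correlation
pc <- princomp(X)
A  <- unclass(pc$loadings)              # weight matrix; column i holds a_i1..a_ip
colSums(A^2)                            # property 3: sum of squared weights = 1 per PC
round(cor(pc$scores), 3)                # property 2: PC scores are uncorrelated
apply(pc$scores, 2, var)                # property 1: variances decrease from PC 1 to PC 3
diag(t(A) %*% cov(X) %*% A)             # same variances via a_i' Cov(X) a_i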

9 Standardize the inputs
Why? Variables with large variances will have a bigger influence on the result.
Solution: standardize before applying PCA.
Columns: Univ, Z_SAT, Z_Top10, Z_Accept, Z_SFRatio, Z_Expenses, Z_GradRate [table of standardized values not captured in transcription]
Excel: =STANDARDIZE(cell, AVERAGE(column), STDEV(column))
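In R, scale() does the same job in one call; a quick sketch (assuming mydata as loaded in the assignment code on slide 13):

zdata <- scale(mydata[1:25, 2:7])   # subtract each column's mean, divide by its sd
round(colMeans(zdata), 10)          # means are now ~0
apply(zdata, 2, sd)                 # sds are now 1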

10 Standardization shortcut for PCA
Rather than standardizing the data manually, you can use the correlation matrix instead of the covariance matrix as input.
PCA with and without standardization gives different results!
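A small sketch of this shortcut on invented toy data: princomp with cor = TRUE matches standardizing first, while skipping standardization changes the answer:

set.seed(1)
X  <- data.frame(a = rnorm(50, sd = 1), b = rnorm(50, sd = 100))
p1 <- princomp(X, cor = TRUE)       # shortcut: correlation matrix as input
p2 <- princomp(scale(X))            # manual standardization, covariance matrix
round(abs(unclass(p1$loadings)) - abs(unclass(p2$loadings)), 6)  # ~0: same weights up to sign
princomp(X)$loadings                # no standardization: dominated by high-variance column b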

11 PCA Transform: Scaled Data → Principal Components → PC Scores (the correlation matrix has been used here)
PCs are uncorrelated; Var(PC 1) > Var(PC 2) > ...
PC_i = a_{i1} X_1 + a_{i2} X_2 + ... + a_{ip} X_p

12 Computing principal scores
For each record, we can compute its score on each PC: multiply each weight (a_ij) by the appropriate X_ij and sum.
Example for Brown University (using standardized numbers):
PC Score 1 for Brown University = (−0.458)(0.40) + (−0.427)(0.64) + (0.424)(−0.87) + (0.391)(0.07) + (−0.363)(−0.32) + (−0.379)(0.80) ≈ −0.989
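The same arithmetic as an R sketch (assumes pcaobj and mydata from the assignment code on the next slide):

w <- unclass(pcaobj$loadings)[, 1]   # PC 1 weights a_11..a_16
z <- scale(mydata[1:25, 2:7])[1, ]   # standardized values for row 1 (Brown)
sum(w * z)                           # manual PC 1 score for Brown
pcaobj$scores[1, 1]                  # princomp's score; may differ slightly in scale
                                     # (princomp uses the divisor-n sd) and in sign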

13 R Code for PCA (Assignment) OPTIONAL R Code
install.packages("gdata")  ## for reading xls files
install.packages("xlsx")   ## for reading xlsx files
library(xlsx)
mydata <- read.xlsx("university Ranking.xlsx", 1)  ## use read.csv for csv files
mydata                     ## make sure the data is loaded correctly
help(princomp)             ## to understand the API for princomp
pcaobj <- princomp(mydata[1:25, 2:7], cor = TRUE, scores = TRUE, covmat = NULL)
## the first column in mydata has university names
## princomp(mydata, cor = TRUE) is not the same as prcomp(mydata, scale. = TRUE); similar, but different
summary(pcaobj)
loadings(pcaobj)
plot(pcaobj)
biplot(pcaobj)
pcaobj$loadings
pcaobj$scores

14 Goal #1: Reduce data dimension
PCs are ordered by their variance (= information). Choose the top few PCs and drop the rest!
Example: PC 1 captures most of the information: ??%. The first 2 PCs capture ??%.
Data reduction: use only two variables instead of 6.
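Those ??% figures come straight from the PCA summary; a sketch (assuming pcaobj from the assignment code above):

summary(pcaobj)                     # the "Cumulative Proportion" row fills in the ??%
v <- pcaobj$sdev^2
round(cumsum(v) / sum(v), 3)        # the same cumulative shares, computed directly
screeplot(pcaobj, type = "lines")   # visual aid for choosing how many PCs to keep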

15 Matrix Transpose OPTIONAL: R code
help(matrix)
A <- matrix(c(1,2), nrow=1, ncol=2, byrow=TRUE)
A
t(A)
B <- matrix(c(1,2,3,4), nrow=2, ncol=2, byrow=TRUE)
B
t(B)
C <- matrix(c(1,2,3,4,5,6), nrow=3, ncol=2, byrow=TRUE)
C
t(C)

16 Matrix Multiplication OPTIONAL R Code
A <- matrix(c(1,2,3,4,5,6), nrow=3, ncol=2, byrow=TRUE)
A
B <- matrix(c(1,2,3,4,5,6,7,8), nrow=2, ncol=4, byrow=TRUE)
B
C <- A %*% B
D <- t(B) %*% t(A)
## note: B %*% A is not possible; what does D look like?

17 Matrix Inverse
If A B = I (the identity matrix), then B = A^{-1}.
OPTIONAL R Code
## How to create an n×n identity matrix?
help(diag)
A <- diag(5)
## find the inverse of a matrix
solve(A)

18 Data Compression
[PC Scores]_{N×p} = [Scaled Data]_{N×p} [Principal Components]_{p×p}
[Scaled Data]_{N×p} = [PC Scores]_{N×p} [Principal Components]^{-1} = [PC Scores]_{N×p} [Principal Components]^T
(the principal-components matrix is orthogonal, so its inverse equals its transpose)
c = number of components kept; c ≤ p
Approximation: [Approximated Scaled Data]_{N×p} = [PC Scores]_{N×c} ([Principal Components]_{p×c})^T
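The same compression, sketched in R (assumes pcaobj from the assignment code above):

A <- unclass(pcaobj$loadings)        # principal components, p x p and orthogonal
S <- pcaobj$scores                   # PC scores, N x p
k <- 2                               # number of components kept (c in the slide)
approx <- S[, 1:k] %*% t(A[, 1:k])   # approximated scaled data, N x p
exact  <- S %*% t(A)                 # keeping all p components reconstructs exactly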

19 Goal #2: Learn relationships with PCA by interpreting the weights
a_{i1}, ..., a_{ip} are the coefficients for PC_i. They describe the role of the original X variables in computing PC_i. Useful in providing a context-specific interpretation of each PC.

20 PC 1 Scores (choose one or more)
1. are approximately a simple average of the 6 variables
2. measure the degree of high Accept & SFRatio, but low Expenses, GradRate, SAT, and Top10

21 Goal #3: Use PCA for visualization
The first 2 (or 3) PCs provide a way to project the data from a p-dimensional space onto a 2D (or 3D) space.

22 Scatter Plot: PC 2 vs. PC 1 scores
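A sketch of that plot in R (assumes pcaobj and mydata from the assignment code above):

plot(pcaobj$scores[, 1], pcaobj$scores[, 2],
     xlab = "PC 1", ylab = "PC 2", main = "Universities in PC space")
text(pcaobj$scores[, 1], pcaobj$scores[, 2],
     labels = mydata[1:25, 1], pos = 3, cex = 0.7)  # label each point with its university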

23 Monitoring batch processes using PCA
- Multivariate data at different time points
- A historical database of successful batches is used
- Multivariate trajectory data is projected onto a low-dimensional space → simple monitoring charts to spot outliers
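A toy sketch of the idea (all data and limits invented, and the ±3-sigma rule stands in for a real monitoring chart): fit PCA on historical good batches, project incoming batches onto the first two PCs, and flag scores outside the limits:

set.seed(2)
vars <- paste0("V", 1:5)
good <- matrix(rnorm(200 * 5), ncol = 5, dimnames = list(NULL, vars))        # successful batches
pc   <- princomp(good, cor = TRUE)
newb <- matrix(rnorm(10 * 5, sd = 2), ncol = 5, dimnames = list(NULL, vars)) # incoming batches
scores <- predict(pc, newdata = newb)[, 1:2]   # project onto the low-dimensional space
limits <- 3 * pc$sdev[1:2]                     # crude +/- 3-sigma control limits per PC
which(abs(scores[, 1]) > limits[1] | abs(scores[, 2]) > limits[2])           # batches to inspect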

24 Your Turn!
1. If we use a subset of the principal components, is this useful for prediction? For explanation?
2. What are the advantages and weaknesses of PCA compared to choosing a subset of the variables?
3. PCA vs. Clustering