A brief summary of PCA-related functions in R
Here is the dataset:
year    X1    X2    X3
1951   1.0  -2.7  -4.3
1952  -5.3  -5.9  -3.5
1953  -2.0  -3.4  -0.8
1954  -5.7  -4.7  -1.1
1955  -0.9  -3.8  -3.1
1956  -5.7  -5.3  -5.9
1957  -2.1  -5.0  -1.6
1958   0.6  -4.3  -0.2
1959  -1.7  -5.7   2.0
1960  -3.6  -3.6   1.3
1961   3.0  -3.1  -0.8
1962   0.1  -3.9  -1.1
1963  -2.6  -3.0  -5.2
1964  -1.4  -4.9  -1.7
1965  -3.9  -5.7  -2.5
1966  -4.7  -4.8  -3.3
1967  -6.0  -5.6  -4.9
1968  -1.7  -6.4  -5.1
1969  -3.4  -5.6  -2.9
1970  -3.1  -4.2  -2.0
1971  -3.8  -4.9  -3.9
1972  -2.0  -4.1  -2.4
1973  -1.7  -4.2  -2.0
1974  -3.6  -3.3  -2.0
1975  -2.7  -3.7   0.1
1976  -2.4  -7.6  -2.2
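To make the examples reproducible, here is a minimal R sketch that types the table above into a data frame. It assumes the object is named temprature (the name used in the calls below) and that the years are stored as row names rather than as a variable, which is consistent with princomp() returning only three components.

```r
# Dataset from the table above; the years become row names, not a variable
year <- 1951:1976
X1 <- c(1, -5.3, -2, -5.7, -0.9, -5.7, -2.1, 0.6, -1.7, -3.6, 3, 0.1, -2.6,
        -1.4, -3.9, -4.7, -6, -1.7, -3.4, -3.1, -3.8, -2, -1.7, -3.6, -2.7, -2.4)
X2 <- c(-2.7, -5.9, -3.4, -4.7, -3.8, -5.3, -5, -4.3, -5.7, -3.6, -3.1, -3.9, -3,
        -4.9, -5.7, -4.8, -5.6, -6.4, -5.6, -4.2, -4.9, -4.1, -4.2, -3.3, -3.7, -7.6)
X3 <- c(-4.3, -3.5, -0.8, -1.1, -3.1, -5.9, -1.6, -0.2, 2, 1.3, -0.8, -1.1, -5.2,
        -1.7, -2.5, -3.3, -4.9, -5.1, -2.9, -2, -3.9, -2.4, -2, -2, 0.1, -2.2)
temprature <- data.frame(X1, X2, X3, row.names = as.character(year))
str(temprature)  # should report 26 observations of 3 variables
```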
princomp
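This is the standard PCA function in R (from the stats package). It can run PCA on either the covariance matrix (cov, the default) or the correlation matrix (cor).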
> pca <- princomp(temprature)
> summary(pca,loadings = T)
Importance of components:
                          Comp.1    Comp.2    Comp.3
Standard deviation     2.3927483 1.6766875 1.0093123
Proportion of Variance 0.5991735 0.2942137 0.1066129
Cumulative Proportion  0.5991735 0.8933871 1.0000000

Loadings:
   Comp.1 Comp.2 Comp.3
X1  0.800 -0.532  0.278
X2  0.238 -0.145 -0.960
X3  0.551  0.834
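The "Proportion of Variance" row is each component's variance (its squared standard deviation) divided by the total. A short sketch that recomputes those rows from pca$sdev, assuming temprature holds the table at the top of this post with years as row names:

```r
# Data from the table at the top of the post (years as row names)
X1 <- c(1, -5.3, -2, -5.7, -0.9, -5.7, -2.1, 0.6, -1.7, -3.6, 3, 0.1, -2.6,
        -1.4, -3.9, -4.7, -6, -1.7, -3.4, -3.1, -3.8, -2, -1.7, -3.6, -2.7, -2.4)
X2 <- c(-2.7, -5.9, -3.4, -4.7, -3.8, -5.3, -5, -4.3, -5.7, -3.6, -3.1, -3.9, -3,
        -4.9, -5.7, -4.8, -5.6, -6.4, -5.6, -4.2, -4.9, -4.1, -4.2, -3.3, -3.7, -7.6)
X3 <- c(-4.3, -3.5, -0.8, -1.1, -3.1, -5.9, -1.6, -0.2, 2, 1.3, -0.8, -1.1, -5.2,
        -1.7, -2.5, -3.3, -4.9, -5.1, -2.9, -2, -3.9, -2.4, -2, -2, 0.1, -2.2)
temprature <- data.frame(X1, X2, X3, row.names = as.character(1951:1976))

pca <- princomp(temprature)        # covariance matrix by default (cor = FALSE)
comp_var <- pca$sdev^2             # variance of each component
prop <- comp_var / sum(comp_var)   # "Proportion of Variance"
cum_prop <- cumsum(prop)           # "Cumulative Proportion"
round(rbind(sdev = pca$sdev, prop, cum_prop), 7)
```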
Related data stored in the fitted object:
> pca$scores
         Comp.1      Comp.2      Comp.3
1951  2.1396991 -3.83416758 -0.86081047
1952 -3.2170709  0.65036989  0.46732300
1953  1.5047115  0.78326549 -0.98770616
1954 -1.9282363  2.69070902 -0.77199999
1955  1.0208400 -1.66238671 -0.32088258
1956 -4.7179457 -1.22586764 -0.24479186
1957  0.6034101  0.40169317  0.51297983
1958  3.7008125  0.03107027  0.60645276
1959  2.7423145  3.29340498  1.33340102
1960  1.3359150  3.41533524 -1.21947452
1961  5.5741402 -1.92082051  0.11577404
1962  2.8996923 -0.51171559  0.07392112
1963 -1.3065888 -2.62572541 -1.58383704
1964  1.1317614 -0.06871931  0.61074000
1965 -1.4985797  0.71048716  0.67510633
1966 -2.3656444  0.33808233 -0.42011674
1967 -4.4776216 -0.18852359 -0.02993928
1968 -1.3395824 -2.52711224  1.93314844
1969 -1.2956017  0.09625872  0.71413634
1970 -0.2267435  0.48388739 -0.53777663
1971 -2.0006306 -0.62674492 -0.07971504
1972  0.4560127 -0.44959984 -0.33175339
1973  0.8927389 -0.26104978 -0.14812577
1974 -0.4127266  0.61914798 -1.54132776
1975  1.3700344  1.95003688 -0.88520432
1976 -0.5851104  0.43868459  2.92047869
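Each score is a centered observation projected onto the loading vectors. A sketch that rebuilds pca$scores by hand, assuming temprature holds the table at the top of this post (scale() with scale = FALSE only subtracts the column means):

```r
# Data from the table at the top of the post (years as row names)
X1 <- c(1, -5.3, -2, -5.7, -0.9, -5.7, -2.1, 0.6, -1.7, -3.6, 3, 0.1, -2.6,
        -1.4, -3.9, -4.7, -6, -1.7, -3.4, -3.1, -3.8, -2, -1.7, -3.6, -2.7, -2.4)
X2 <- c(-2.7, -5.9, -3.4, -4.7, -3.8, -5.3, -5, -4.3, -5.7, -3.6, -3.1, -3.9, -3,
        -4.9, -5.7, -4.8, -5.6, -6.4, -5.6, -4.2, -4.9, -4.1, -4.2, -3.3, -3.7, -7.6)
X3 <- c(-4.3, -3.5, -0.8, -1.1, -3.1, -5.9, -1.6, -0.2, 2, 1.3, -0.8, -1.1, -5.2,
        -1.7, -2.5, -3.3, -4.9, -5.1, -2.9, -2, -3.9, -2.4, -2, -2, 0.1, -2.2)
temprature <- data.frame(X1, X2, X3, row.names = as.character(1951:1976))

pca <- princomp(temprature)
# Subtract each column's mean without rescaling
centered <- scale(temprature, center = TRUE, scale = FALSE)
# Project the centered rows onto the loading vectors (eigenvectors)
manual_scores <- centered %*% unclass(pca$loadings)
head(manual_scores, 3)
```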
> pca$loadings

Loadings:
   Comp.1 Comp.2 Comp.3
X1  0.800 -0.532  0.278
X2  0.238 -0.145 -0.960
X3  0.551  0.834
> screeplot(pca, type = "lines")
> biplot(pca)
I haven't worked out how to draw the ellipse plot yet; I'll fill in this gap later.
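Notes
princomp() uses the covariance matrix by default. To use the correlation matrix instead, pass cor = TRUE (R is case-sensitive, so cor = true will not work). To get the full loading matrix (weight matrix), first compute the covariance or correlation matrix with cov() or cor(), then call eigen() to obtain the eigenvalues and eigenvectors (though the full set of eigenvectors is usually not needed).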
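A sketch of the eigen() route, assuming temprature holds the table at the top of this post. One detail worth knowing: princomp() divides the covariance by n, while cov() divides by n - 1, so the sketch rescales before calling eigen(). Also, the blank Comp.3 entry for X3 in the printed loadings is not a missing value; print() hides loadings whose absolute value is below its default cutoff of 0.1, and unclass() shows every entry.

```r
# Data from the table at the top of the post (years as row names)
X1 <- c(1, -5.3, -2, -5.7, -0.9, -5.7, -2.1, 0.6, -1.7, -3.6, 3, 0.1, -2.6,
        -1.4, -3.9, -4.7, -6, -1.7, -3.4, -3.1, -3.8, -2, -1.7, -3.6, -2.7, -2.4)
X2 <- c(-2.7, -5.9, -3.4, -4.7, -3.8, -5.3, -5, -4.3, -5.7, -3.6, -3.1, -3.9, -3,
        -4.9, -5.7, -4.8, -5.6, -6.4, -5.6, -4.2, -4.9, -4.1, -4.2, -3.3, -3.7, -7.6)
X3 <- c(-4.3, -3.5, -0.8, -1.1, -3.1, -5.9, -1.6, -0.2, 2, 1.3, -0.8, -1.1, -5.2,
        -1.7, -2.5, -3.3, -4.9, -5.1, -2.9, -2, -3.9, -2.4, -2, -2, 0.1, -2.2)
temprature <- data.frame(X1, X2, X3, row.names = as.character(1951:1976))

n <- nrow(temprature)
pca <- princomp(temprature)
# Covariance version: rescale cov() from divisor n - 1 to divisor n
S <- cov(temprature) * (n - 1) / n
eig_cov <- eigen(S, symmetric = TRUE)
sqrt(eig_cov$values)    # component standard deviations, same as pca$sdev
eig_cov$vectors         # full loading matrix (each column is unique only up to sign)
unclass(pca$loadings)   # all loadings, including those below the 0.1 print cutoff

# Correlation version, i.e. princomp(temprature, cor = TRUE)
pca_cor <- princomp(temprature, cor = TRUE)
eig_cor <- eigen(cor(temprature), symmetric = TRUE)
sqrt(eig_cor$values)    # same as pca_cor$sdev
```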
principal
Use parallel analysis to choose the number of principal components
> library(psych)
> fa.parallel(temprature, n.iter = 100, fa = "pc", main = "Scree plot with parallel analysis")
Parallel analysis suggests that the number of factors =  NA  and the number of components =  1
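fa.parallel() draws the scree plot together with the eigenvalues obtained from random data, and applies parallel analysis to decide how many principal components to keep; here it suggests 1.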
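The idea behind parallel analysis can be sketched in base R: keep leading components only while their observed eigenvalues exceed the eigenvalues you would expect from uncorrelated data of the same size. This is a simplified version of Horn's method under stated assumptions (standard normal random data, comparison against the mean over 100 simulations, temprature holding the table at the top of this post); psych's fa.parallel() also uses resampled data and may differ in details.

```r
# Data from the table at the top of the post (years as row names)
X1 <- c(1, -5.3, -2, -5.7, -0.9, -5.7, -2.1, 0.6, -1.7, -3.6, 3, 0.1, -2.6,
        -1.4, -3.9, -4.7, -6, -1.7, -3.4, -3.1, -3.8, -2, -1.7, -3.6, -2.7, -2.4)
X2 <- c(-2.7, -5.9, -3.4, -4.7, -3.8, -5.3, -5, -4.3, -5.7, -3.6, -3.1, -3.9, -3,
        -4.9, -5.7, -4.8, -5.6, -6.4, -5.6, -4.2, -4.9, -4.1, -4.2, -3.3, -3.7, -7.6)
X3 <- c(-4.3, -3.5, -0.8, -1.1, -3.1, -5.9, -1.6, -0.2, 2, 1.3, -0.8, -1.1, -5.2,
        -1.7, -2.5, -3.3, -4.9, -5.1, -2.9, -2, -3.9, -2.4, -2, -2, 0.1, -2.2)
temprature <- data.frame(X1, X2, X3, row.names = as.character(1951:1976))

set.seed(1)                                   # reproducible simulation
n <- nrow(temprature); p <- ncol(temprature); n_iter <- 100
obs_eig <- eigen(cor(temprature), symmetric = TRUE)$values

# Eigenvalues of correlation matrices of uncorrelated normal data, same n and p
sim_eig <- replicate(n_iter, {
  z <- matrix(rnorm(n * p), nrow = n, ncol = p)
  eigen(cor(z), symmetric = TRUE)$values
})                                            # p x n_iter matrix
rand_mean <- rowMeans(sim_eig)

# Count leading components whose observed eigenvalue beats the random average
n_comp <- sum(cumprod(obs_eig > rand_mean))
rbind(observed = obs_eig, random = rand_mean)
n_comp
```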
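Principal components with principal()
Parameters:
- r: a correlation matrix or a raw data frame
- rotate: the rotation method ("none" for no rotation)
- scores: whether to compute component scores
- nfactors: the number of principal components to extract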
> pca <- principal(temprature, rotate = "none", nfactors = 2, scores = T)
> pca
Principal Components Analysis
Call: principal(r = temprature, nfactors = 2, rotate = "none", scores = T)
Standardized loadings (pattern matrix) based upon correlation matrix
    PC1   PC2   h2    u2 com
X1 0.82 -0.11 0.69 0.313 1.0
X2 0.74 -0.51 0.81 0.193 1.8
X3 0.63  0.75 0.95 0.045 1.9

                       PC1  PC2
SS loadings           1.62 0.83
Proportion Var        0.54 0.28
Cumulative Var        0.54 0.82
Proportion Explained  0.66 0.34
Cumulative Proportion 0.66 1.00

Mean item complexity =  1.6
Test of the hypothesis that 2 components are sufficient.

The root mean square of the residuals (RMSR) is  0.17
 with the empirical chi square  4.31  with prob <  NA

Fit based upon off diagonal values = 0.73
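The header says the loadings are based on the correlation matrix. A minimal base-R sketch, assuming the unrotated loadings are the correlation-matrix eigenvectors scaled by the square roots of their eigenvalues (and that temprature holds the table at the top of this post); it should reproduce the PC1, PC2, h2 and u2 columns up to sign and rounding.

```r
# Data from the table at the top of the post (years as row names)
X1 <- c(1, -5.3, -2, -5.7, -0.9, -5.7, -2.1, 0.6, -1.7, -3.6, 3, 0.1, -2.6,
        -1.4, -3.9, -4.7, -6, -1.7, -3.4, -3.1, -3.8, -2, -1.7, -3.6, -2.7, -2.4)
X2 <- c(-2.7, -5.9, -3.4, -4.7, -3.8, -5.3, -5, -4.3, -5.7, -3.6, -3.1, -3.9, -3,
        -4.9, -5.7, -4.8, -5.6, -6.4, -5.6, -4.2, -4.9, -4.1, -4.2, -3.3, -3.7, -7.6)
X3 <- c(-4.3, -3.5, -0.8, -1.1, -3.1, -5.9, -1.6, -0.2, 2, 1.3, -0.8, -1.1, -5.2,
        -1.7, -2.5, -3.3, -4.9, -5.1, -2.9, -2, -3.9, -2.4, -2, -2, 0.1, -2.2)
temprature <- data.frame(X1, X2, X3, row.names = as.character(1951:1976))

R <- cor(temprature)
e <- eigen(R, symmetric = TRUE)
k <- 2                                               # components kept
# Unrotated loadings: eigenvectors scaled by sqrt(eigenvalues)
L <- e$vectors[, 1:k] %*% diag(sqrt(e$values[1:k]))
dimnames(L) <- list(colnames(temprature), paste0("PC", 1:k))
h2 <- rowSums(L^2)   # communality: share of each variable's variance explained
u2 <- 1 - h2         # uniqueness
ss <- colSums(L^2)   # SS loadings, equal to the first k eigenvalues
round(cbind(L, h2 = h2, u2 = u2), 3)
round(ss, 2)
```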
Scores and weights
> pca$weights
         PC1        PC2
X1 0.5067548 -0.1384156
X2 0.4575790 -0.6118978
X3 0.3885133  0.9012161
> pca$scores
             PC1         PC2
1951  1.15613102 -2.14274863
1952 -1.40397385  0.29619057
1953  0.87645580  0.02225637
1954 -0.55355307  0.79852707
1955  0.52133957 -0.89617467
1956 -1.73868595 -1.09440483
1957  0.07029356  0.49584374
1958  1.25079371  0.60192968
1959  0.60373762  2.49012260
1960  0.84116086  1.19371677
1961  2.16014463 -0.45293824
1962  1.11189864  0.01144684
1963  0.02078349 -2.17040460
1964  0.25282918  0.35309338
1965 -0.80115014  0.56219311
1966 -0.79512763 -0.22411286
1967 -1.72761805 -0.45915894
1968 -1.07610798 -0.40767704
1969 -0.72471773  0.29431309
1970  0.06988130 -0.04170145
1971 -0.74303145 -0.50504203
1972  0.28630899 -0.34781995
1973  0.39653695 -0.13092458
1974  0.30440309 -0.47945781
1975  0.77438081  0.63703597
1976 -1.13311339  1.59589645
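A sketch of how weights and scores relate, under the assumption that the weights are regression weights W = R^-1 L (R the correlation matrix, L the unrotated loadings) and that the scores are the standardized data times W; temprature is assumed to hold the table at the top of this post. For unrotated components, W simplifies to each loading column divided by its eigenvalue, and it should match the weights and the 1951 scores above up to sign.

```r
# Data from the table at the top of the post (years as row names)
X1 <- c(1, -5.3, -2, -5.7, -0.9, -5.7, -2.1, 0.6, -1.7, -3.6, 3, 0.1, -2.6,
        -1.4, -3.9, -4.7, -6, -1.7, -3.4, -3.1, -3.8, -2, -1.7, -3.6, -2.7, -2.4)
X2 <- c(-2.7, -5.9, -3.4, -4.7, -3.8, -5.3, -5, -4.3, -5.7, -3.6, -3.1, -3.9, -3,
        -4.9, -5.7, -4.8, -5.6, -6.4, -5.6, -4.2, -4.9, -4.1, -4.2, -3.3, -3.7, -7.6)
X3 <- c(-4.3, -3.5, -0.8, -1.1, -3.1, -5.9, -1.6, -0.2, 2, 1.3, -0.8, -1.1, -5.2,
        -1.7, -2.5, -3.3, -4.9, -5.1, -2.9, -2, -3.9, -2.4, -2, -2, 0.1, -2.2)
temprature <- data.frame(X1, X2, X3, row.names = as.character(1951:1976))

R <- cor(temprature)
e <- eigen(R, symmetric = TRUE)
L <- e$vectors[, 1:2] %*% diag(sqrt(e$values[1:2]))  # unrotated loadings
# Regression weights: solve R %*% W = L
W <- solve(R, L)
# Equivalent for unrotated components: divide each loading column by its eigenvalue
W_alt <- sweep(L, 2, e$values[1:2], "/")
# Scores: standardized data (sd with divisor n - 1) times the weights
Z <- scale(temprature)
scores <- Z %*% W
dimnames(W) <- list(colnames(temprature), c("PC1", "PC2"))
colnames(scores) <- c("PC1", "PC2")
W
head(scores, 3)
```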