Details
Description
The Jalview PCA function doesn't give any of the standard diagnostics needed to interpret the result of a PCA.
At the very least, a scree plot and a summary of the top N most informative components (i.e. those with a standard deviation of 1 or more after scaling) should be included in the report, and the GUI could be limited to just those components.
For example, see the following output from using R to analyse the SeqSpace matrix generated for the example alignment:
<pre>
> m <- read.table(file="/Users/jimp/Documents/Jalview/JessicaRichard/pcaDemo_fer.Rd", sep=' ',header=FALSE)
> print(m)
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15
1 157 109 92 93 82 80 82 81 129 77 132 74 54 77 82
2 109 157 128 128 95 97 99 91 81 98 84 94 74 83 80
3 92 128 157 151 101 98 107 96 85 103 87 98 79 89 85
4 93 128 151 157 103 99 105 95 84 101 87 97 78 87 83
5 82 95 101 103 157 124 100 90 82 103 83 104 81 95 90
6 80 97 98 99 124 157 103 92 77 102 80 101 80 91 85
7 82 99 107 105 100 103 157 117 85 106 87 106 85 88 89
8 81 91 96 95 90 92 117 157 81 96 82 89 69 84 85
9 129 81 85 84 82 77 85 81 157 100 152 91 71 83 92
10 77 98 103 101 103 102 106 96 100 157 98 138 115 91 95
11 132 84 87 87 83 80 87 82 152 98 157 90 69 84 91
12 74 94 98 97 104 101 106 89 91 138 90 157 127 89 92
13 54 74 79 78 81 80 85 69 71 115 69 127 157 68 72
14 77 83 89 87 95 91 88 84 83 91 84 89 68 157 103
15 82 80 85 83 90 85 89 85 92 95 91 92 72 103 157
> p_m <- prcomp(m, scale=TRUE,center=TRUE)
> print(p_m)
Standard deviations:
[1] 2.165472e+00 1.845833e+00 1.460628e+00 1.177353e+00 1.095638e+00 9.504232e-01 7.090474e-01 5.251139e-01
[9] 5.018949e-01 4.164572e-01 2.237328e-01 1.401649e-01 6.698598e-02 4.788474e-02 6.858450e-17
Rotation:
PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8
V1 -0.372384318 -0.24286620 0.16580470 -0.034052285 0.19344361 -0.06912293 0.0041315227 -0.000162841
V2 0.126543560 -0.43889058 0.20207618 -0.168608165 -0.13578841 -0.03775281 0.0591315847 -0.127933215
V3 0.204288055 -0.41663933 0.16869970 -0.139637494 -0.20004490 -0.21542714 -0.0132673062 0.031720345
V4 0.200714196 -0.42421366 0.17010633 -0.157141334 -0.16753375 -0.18791855 -0.0002053613 0.015860798
V5 0.264124227 -0.06548596 -0.21191321 -0.275837278 0.54238743 -0.08119972 0.0826726746 -0.080453186
V6 0.283613080 -0.07850084 -0.19765136 -0.175801192 0.56167737 0.03934542 0.0797995375 0.016227136
V7 0.266119000 -0.12599761 -0.02501376 0.533819156 0.11619838 -0.23903980 -0.0532599860 0.716070157
V8 0.143541675 -0.16520454 -0.14515545 0.690476414 0.04754732 -0.04519917 -0.0488477269 -0.589364014
V9 -0.397033486 0.01777164 0.20368663 0.033716792 0.23763306 -0.31037852 -0.0968846796 0.020267547
V10 0.248466929 0.27771797 0.29875454 -0.001012087 0.06434316 -0.46759959 -0.1087073033 -0.306804609
V11 -0.398697485 -0.01552172 0.19705986 0.031075187 0.25120859 -0.29603741 -0.1065640194 0.044929663
V12 0.278398155 0.32519593 0.28541681 -0.050328673 0.02968275 -0.25776051 -0.0640469523 -0.037770263
V13 0.237504350 0.36764292 0.32032938 -0.062704968 -0.12756067 0.16095752 0.0155301470 0.130075797
V14 0.003727431 0.04314816 -0.50635276 -0.212435338 -0.21149426 -0.27819819 -0.7150301929 0.022247444
V15 -0.097038641 0.14410858 -0.39820511 -0.061722711 -0.25500837 -0.52190581 0.6555254956 0.011537079
PC9 PC10 PC11 PC12 PC13 PC14 PC15
V1 0.28010345 0.202453844 -0.26093163 -0.675877588 -0.0508877837 0.04627735 -0.2870665017
V2 0.61064322 0.279872164 0.14263759 0.443541597 0.0061725346 0.03052172 -0.1145128967
V3 -0.32081446 -0.191514482 -0.06477903 -0.073431036 -0.6925215730 0.13480757 0.0465129970
V4 -0.33581684 -0.169097091 -0.11135775 -0.099820026 0.6854337583 -0.13315197 -0.1058972377
V5 -0.35077470 0.594607831 0.07876288 0.039363875 -0.0264048649 0.03583492 -0.0673478716
V6 0.29557242 -0.634160432 -0.09789227 0.021761973 -0.0184100181 0.01431965 -0.1401618039
V7 0.11555660 0.131110669 0.09770957 -0.029908473 0.0165013157 -0.01497227 -0.0002118365
V8 -0.08018785 0.001804577 -0.16250092 0.066249666 -0.0165624021 0.06870677 -0.2408565161
V9 -0.14526121 -0.088555425 0.04813986 0.333927033 -0.1438394170 -0.63535946 -0.2639749761
V10 0.14876682 -0.066560627 0.53804856 -0.326222539 0.0345595039 -0.01344639 0.1388935014
V11 -0.11832676 -0.103633527 -0.01059660 0.300862397 0.1515187945 0.70171114 0.0784512129
V12 0.13737722 0.135985067 -0.73715065 0.109320875 -0.0057064995 -0.08029849 0.2463923728
V13 -0.10321630 0.005120743 0.02312357 0.056846222 -0.0388277762 0.21633756 -0.7614407674
V14 0.10803144 0.026552757 -0.06159214 0.001607315 0.0001935581 0.04703112 -0.2105499870
V15 0.06589523 -0.016300377 -0.05988970 0.012877498 0.0013171378 0.05355025 -0.1701400503
> summary(p_m)
Importance of components:
PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10 PC11
Standard deviation 2.1655 1.8458 1.4606 1.17735 1.09564 0.95042 0.70905 0.52511 0.50189 0.41646 0.22373
Proportion of Variance 0.3126 0.2271 0.1422 0.09241 0.08003 0.06022 0.03352 0.01838 0.01679 0.01156 0.00334
Cumulative Proportion 0.3126 0.5398 0.6820 0.77440 0.85443 0.91465 0.94816 0.96655 0.98334 0.99490 0.99824
PC12 PC13 PC14 PC15
Standard deviation 0.14016 0.06699 0.04788 6.858e-17
Proportion of Variance 0.00131 0.00030 0.00015 0.000e+00
Cumulative Proportion 0.99955 0.99985 1.00000 1.000e+00
</pre>
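The diagnostics asked for above could look something like the following sketch. It is only an illustration, not a proposed Jalview implementation: the standard deviations are copied by hand from the prcomp output above so that the snippet runs without the original file, and the names sdev, keep and report are made up for this example. The cutoff of 1 is the one suggested in this description.

```r
# Standard deviations copied from the prcomp() output above.
# With the fitted object available, this would simply be p_m$sdev.
sdev <- c(2.165472, 1.845833, 1.460628, 1.177353, 1.095638,
          0.9504232, 0.7090474, 0.5251139, 0.5018949, 0.4164572,
          0.2237328, 0.1401649, 0.06698598, 0.04788474, 6.858450e-17)

# Proportion of variance explained by each component, and the running total.
var_explained <- sdev^2 / sum(sdev^2)
cumulative <- cumsum(var_explained)

# Components whose standard deviation on the scaled data is 1 or more
# (assumed cutoff, taken from the description above).
keep <- which(sdev >= 1)

# Compact summary table for the retained components only.
report <- data.frame(
  PC         = paste0("PC", keep),
  sdev       = sdev[keep],
  proportion = round(var_explained[keep], 4),
  cumulative = round(cumulative[keep], 4)
)
print(report)

# Scree plot: variance (sdev squared) per component.
# The dashed line marks variance 1, which corresponds to the sdev >= 1 cutoff.
plot(seq_along(sdev), sdev^2, type = "b",
     xlab = "Principal component", ylab = "Variance",
     main = "Scree plot")
abline(h = 1, lty = 2)
```

For the data above this keeps PC1 to PC5, which together account for about 85% of the variance, in line with the summary(p_m) output.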
Inspired by thread from Jessica Richard over at: http://www.jalview.org/pipermail/jalview-discuss/2013-January/000908.html
Issue Links
- related with JAL-4016 fraction of variance explained in PCA (Open)