Principal Component Analysis
Introduction
PCA (Principal Component Analysis) is a cool-sounding and extremely useful tool from the family of unsupervised learning methods. Some people fear it because it is not straightforward to grasp, but that is usually because they were shown only the algebra that does the job. We will take a practical route and show what it does, and why and how it works. Overall, we will:
- see the motivation behind this method,
- show how it works in R,
- interpret the results on different examples.
Unsupervised Methods
So far, we have shown you methods that belong to supervised learning: for each data point, we tell the machine how it must be classified, and then we check that the model has really learned. For some, this is not how humans really learn. We see patterns and regularities, and this commonality is a signal that the items belong to the same class. Of course humans are capable of far more than this, but it is at least one of our capabilities.
Let’s clarify our goal with an example and play with the Iris dataset:
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
This dataset is used very often in supervised learning, and in fact it already has labels. There are 150 iris flowers with their sepal and petal lengths and widths, and yes, each is labeled as Setosa, Versicolor or Virginica.
Now, let’s try to visualize this dataset. Since I cannot visualize in 4D (there are 4 predictors), let’s pick two dimensions and plot:
It looks like there is a group that is slightly distinct from the others, but is it really? It is also noticeable that some flowers have exactly the same values, so they overlap perfectly. Are they really identical? We need more information. Let’s try other predictors:
There is still some overlap, but we cannot be sure whether the overlapping points are the same flowers. Again, we can see that one group is very different from the others.
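One way to check whether overlapping points are really the same flowers is to count duplicated rows, first on two columns only and then on all four measurements. The sketch below assumes the 2D plot used Sepal.Length and Sepal.Width (the text does not say which pair was plotted); the variable names are illustrative.

```r
# Built-in dataset; no extra packages needed.
data(iris)

# Assumption: the 2D plot used these two columns.
two_cols  <- iris[, c("Sepal.Length", "Sepal.Width")]
four_cols <- iris[, 1:4]

# duplicated() flags every row that repeats an earlier row.
n_overlap_2d <- sum(duplicated(two_cols))
n_overlap_4d <- sum(duplicated(four_cols))

cat("Points hidden behind another point in 2D:", n_overlap_2d, "\n")
cat("Flowers identical on all four measurements:", n_overlap_4d, "\n")
```

If the second count is much smaller than the first, most of the overlap in a 2D plot comes from the viewing angle rather than from identical flowers.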
Now the question is, when we changed from one plot to another, what just happened? Let’s plot it in 3D and try to answer this question:
library(plotly)
plot_ly(iris, x=~Sepal.Length, z=~Sepal.Width,y=~Petal.Width, opacity=0.3) %>% add_markers()
Each plot is just a standpoint: if we turn the cube, we shift from one angle to another. But shifting is a continuous process, with in-between views along the way. When we slowly rotate from one view to another, we can see that points that overlap in one plot are actually distinct in another. This is one motivation behind PCA. Let’s show the other one.
The 3D plot shows the groups and the granularity very well, but since we have 4 columns, a complete plot would need 4D. Maybe we can disregard one variable:
In the above plot, we can see that Petal Length and Petal Width seem to be saying the same thing; they do not differ much. Overall, there is some variation between the dimensions and each tells us something, but I cannot easily choose one over another.
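The impression that two columns "say the same thing" can be checked numerically with a correlation matrix. This is a small sketch using base R only; a correlation close to 1 suggests that one column adds little information beyond the other.

```r
data(iris)

# Pairwise Pearson correlations between the four measurements.
cors <- cor(iris[, 1:4])
print(round(cors, 2))

# The pair discussed in the text.
petal_cor <- cors["Petal.Length", "Petal.Width"]
cat("Correlation between petal length and width:", round(petal_cor, 3), "\n")
```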
PCA
We gave insight into the motivation behind PCA but didn’t state what it does. PCA is a method that finds an optimal angle, one that shows as much variation between the points as possible, and combines the columns so that we can plot the combined ones.
In the above 3D plot, we showed that there is a gradual transition between two angles. When we rotate the cube to the right, we shift from one dimension to another, and along the way we see the combined axes:
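library(ggplot2)   # ggplot(), aes(), geom_point()
library(gridExtra) # grid.arrange()
# Blend Sepal.Width and Petal.Width with weight a, for 9 values from 0 to 1
gs <- lapply(seq(0, 1, length.out = 9), function(a)
  ggplot(iris, aes(x = Sepal.Length, y = a * Sepal.Width + (1 - a) * Petal.Width)) +
    geom_point(alpha = .5) +
    ggtitle(paste0("a=", round(a, 2))) +
    theme(axis.title = element_blank()))
grid.arrange(grobs = gs, ncol = 3)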
What PCA does is find the combination of axes that maximizes the variation (i.e. the granularity):
## Standard deviations (1, .., p=4):
## [1] 1.7083611 0.9560494 0.3830886 0.1439265
##
## Rotation (n x k) = (4 x 4):
## PC1 PC2 PC3 PC4
## Sepal.Length 0.5210659 -0.37741762 0.7195664 0.2612863
## Sepal.Width -0.2693474 -0.92329566 -0.2443818 -0.1235096
## Petal.Length 0.5804131 -0.02449161 -0.1421264 -0.8014492
## Petal.Width 0.5648565 -0.06694199 -0.6342727 0.5235971
The above are the formulas for combining the axes, in other words the weights (loadings). To obtain the first axis, we multiply the loadings in the PC1 column by each flower’s (centered and scaled) predictors. So, the first two axes will be:
\[\text{Axis1} = 0.521\,\text{S.Len} - 0.269\,\text{S.Wid} + 0.580\,\text{P.Len} + 0.565\,\text{P.Wid}\]
\[\text{Axis2} = -0.377\,\text{S.Len} - 0.923\,\text{S.Wid} - 0.024\,\text{P.Len} - 0.067\,\text{P.Wid}\]
We don’t need to calculate these; R has already computed them:
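The two formulas can be checked by hand. The sketch below assumes the PCA was fitted with prcomp() on the four numeric columns with centering and scaling turned on; this is consistent with the printed standard deviations (their squares sum to 4) but the call itself is not shown in the text. Under that assumption, multiplying the standardized data by the rotation matrix reproduces the scores stored in pca$x.

```r
data(iris)
X <- as.matrix(iris[, 1:4])

# Assumption: centered and scaled PCA.
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Standardize each column, then apply the weights (loadings).
Z <- scale(X, center = TRUE, scale = TRUE)
manual_scores <- Z %*% pca$rotation

# The hand-computed axes should agree with the scores from prcomp().
same <- isTRUE(all.equal(manual_scores, pca$x, check.attributes = FALSE))
print(same)
```

The first column of manual_scores is Axis1 and the second is Axis2, one value per flower.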
dat <- data.frame(pca$x)
colnames(dat) <- paste0("Ax",1:4)
ggplot(dat, aes(Ax1, Ax2)) +
geom_point(alpha=.5)
The distinction between the two groups is very clear, and the data is much more granular.
Dimension Reduction
This is not the only reason why we use PCA. What we just did was reduce the dimensionality to 2. But do two dimensions contain all the information? The answer is given by the scree plot.
The resulting object, pca, reports sdev as well. These are the standard deviations of the components; their squares tell how much variation each component explains:
## [1] 1.7083611 0.9560494 0.3830886 0.1439265
You can see that the first two components explain a lot. More clearly, we can see what share of the variance each component explains as below:
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
ggplot() +
geom_point(aes(y=var_explained, x=1:length(var_explained))) +
geom_line(aes(y=var_explained, x=1:length(var_explained))) +
labs(x="PC", title = "Scree Plot for Explained Variance")
It looks like the first two components do the vast majority of the job: together they explain about 96% of the variance (0.73 + 0.23). We can just use those two and forget about the rest.
Biplot
The biplot was introduced by Gabriel in 1971 and combines all the information we need. Let’s plot it:
The above plot shows how the data points are distributed and what the columns say about them. We already know the black numbers: they are the points of the previous plot, labeled by row number. The arrows, on the other hand, are the coefficients in the pca results:
## PC1 PC2 PC3 PC4
## Sepal.Length 0.5210659 -0.37741762 0.7195664 0.2612863
## Sepal.Width -0.2693474 -0.92329566 -0.2443818 -0.1235096
## Petal.Length 0.5804131 -0.02449161 -0.1421264 -0.8014492
## Petal.Width 0.5648565 -0.06694199 -0.6342727 0.5235971
You can see that Sepal Width has negative coefficients in both the PC1 and PC2 columns, so its arrow points south-west. Sepal Length is positive in the PC1 column but negative in the PC2 column, so its arrow points south-east.
Also, notice that in the biplot, Petal Width and Petal Length point in almost the same direction. This means the two carry nearly the same information.
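A biplot of this kind can be drawn with base R's biplot(). The sketch below assumes a centered and scaled prcomp() fit, as the printed loadings suggest (the original call is not shown). It also measures the angle between the two petal arrows in the PC1/PC2 plane, as a rough check that they point the same way.

```r
data(iris)
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)  # assumed fit

# Observations are drawn as row numbers, variables as arrows.
biplot(pca, scale = 0)

# Arrow directions in the PC1/PC2 plane are the first two loading columns.
arrows_2d <- pca$rotation[, 1:2]
a <- arrows_2d["Petal.Length", ]
b <- arrows_2d["Petal.Width", ]

# Cosine of the angle between the two arrows; values near 1 mean "same direction".
cos_angle <- sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
angle_deg <- acos(cos_angle) * 180 / pi
cat("Angle between the petal arrows (degrees):", round(angle_deg, 1), "\n")
```

The sign of a principal component is arbitrary, so the whole picture may appear mirrored on another machine; the angle between two arrows does not change when that happens.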
Rock v Rap
Let’s try a harder dataset. Since it is painfully hard to visualize in full, we reduce the data size by sampling:
set.seed(156)
sp_songs <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-01-21/spotify_songs.csv')
sp_songs <- sp_songs[sample(1:nrow(sp_songs), 1000),]
songs <- sp_songs[sp_songs$playlist_genre %in% c("rock", "rap"), ]
# colnames(songs)
features <- c(12:13,15,17:20) # the columns with song features
genres <- songs$playlist_genre
# pairs(songs[,features])
## Standard deviations (1, .., p=7):
## [1] 1.5539878 1.1865227 0.9751116 0.8712813 0.8253064 0.7380865 0.4913346
##
## Rotation (n x k) = (7 x 7):
## PC1 PC2 PC3 PC4 PC5
## danceability -0.135726342 0.64684308 -0.10399636 0.08628599 -0.70567752
## energy 0.561008158 -0.05483809 -0.03230532 -0.31020109 0.02720557
## loudness 0.526751046 0.19212832 0.04334411 -0.36603096 -0.17943004
## speechiness 0.007001837 0.56107669 -0.57857849 -0.06627048 0.58499878
## acousticness -0.491671942 0.01254930 0.02352951 -0.09477047 0.09357531
## instrumentalness -0.359944017 -0.21325856 -0.27418129 -0.80458227 -0.17978980
## liveness 0.134281502 -0.42571091 -0.75880190 0.31878756 -0.29288166
## PC6 PC7
## danceability -0.006251489 0.216545681
## energy 0.257681135 0.719621131
## loudness 0.301713205 -0.653097753
## speechiness -0.058637448 -0.018361756
## acousticness 0.859482443 0.033162166
## instrumentalness -0.264693120 0.006801827
## liveness 0.174125315 -0.085050416
The first component almost ignores speechiness (its weight is 0.007), but the second does not (0.561).
Now, let’s check how much variance is explained by the components.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
ggplot() +
geom_point(aes(y=var_explained, x=1:length(var_explained))) +
geom_line(aes(y=var_explained, x=1:length(var_explained))) +
labs(x="PC", title = "Scree Plot for Explained Variance")
It looks like if we only focus on the first two components, we can explain about 0.35 + 0.20 = 0.55 of the total variance. That is not great.
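The shares can also be computed exactly from the printed standard deviations instead of being read off the plot. The sketch below copies the seven values from the output above into a vector (sdev is just an illustrative name) and reports the cumulative share of variance; the 80% threshold is an arbitrary choice for illustration.

```r
# Standard deviations copied from the printed PCA output above.
sdev <- c(1.5539878, 1.1865227, 0.9751116, 0.8712813,
          0.8253064, 0.7380865, 0.4913346)

# Share of variance per component and running total.
var_explained <- sdev^2 / sum(sdev^2)
cum_explained <- cumsum(var_explained)

print(round(data.frame(PC = seq_along(sdev),
                       share = var_explained,
                       cumulative = cum_explained), 3))

# Smallest number of components whose cumulative share reaches the threshold.
n_needed <- which(cum_explained >= 0.80)[1]
cat("Components needed for 80% of the variance:", n_needed, "\n")
```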
temp <- data.frame(pca$x, genre=genres)
plot_ly(temp, x=~PC1, y=~PC2,z=~PC3, color=~genre) %>% add_markers()
It looks like the first three components give good enough information to split the data. If we feed these components to an SVM, for example, it could be a strong classifier that predicts whether a song is rock or rap from the 7 features.
Virus Classification
In the last decade, a method was proposed to extract the meaning of words appearing in texts in order to improve AI model performance. More precisely, a neural network model encodes the meaning as multidimensional vectors, and if two words have similar meanings, their points are close to each other.
We used this method to extract the meaning of words, not from Reddit posts to see which meme means what, but from academic texts about viruses, to see whether they tell us something, perhaps suggesting a research direction. If you want to know the details, check this post.
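To make "close to each other" concrete: closeness between word vectors is commonly measured with cosine similarity, although the text does not say which measure was used here. The sketch below uses three made-up 4-dimensional vectors; the row names and numbers are hypothetical and are not taken from the virus data.

```r
# Cosine similarity: 1 = same direction, 0 = unrelated, -1 = opposite.
cosine_sim <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Hypothetical toy embeddings, one row per word.
toy <- rbind(
  virus_a   = c( 0.9, 0.1,  0.3, 0.0),
  virus_b   = c( 0.8, 0.2,  0.4, 0.1),
  unrelated = c(-0.1, 0.9, -0.5, 0.7)
)

sim_ab <- cosine_sim(toy["virus_a", ], toy["virus_b", ])
sim_ac <- cosine_sim(toy["virus_a", ], toy["unrelated", ])

cat("virus_a vs virus_b:  ", round(sim_ab, 3), "\n")
cat("virus_a vs unrelated:", round(sim_ac, 3), "\n")
```

With real embeddings the vectors have hundreds of dimensions, which is why a method like PCA is needed before anything can be plotted.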
vecs <- read.csv("vecs_virus_related.tsv", sep = "\t", header = FALSE)
meta <- read.csv("meta_virus_related.tsv", sep = "\t", header = FALSE)
dat <- vecs
rownames(dat) <- meta[,1]
head(dat)
## V1 V2 V3 V4 V5
## equines -0.19604936 -0.15985060 -0.33375525 -0.04526498 -0.08473371
## mers001 0.09533985 -0.12091427 -0.04860811 -0.22899745 0.21668658
## hku11 0.19850877 0.36221334 -0.30019190 -0.33239266 -0.25177744
## b-cov 0.31060657 0.15887283 0.01660459 -0.39972785 0.15783097
## borreliosis -0.43272218 0.09581771 0.06180802 -0.30769423 -0.05672168
## poliomyelitis-like -0.22673094 0.06537160 -0.07942042 0.22347581 0.19538689
## V6 V7 V8 V9 V10
## equines 0.18724363 0.175352570 -0.33040246 -0.22763714 -0.3844061
## mers001 0.13490912 -0.203830720 -0.15187198 0.18818237 -0.3531088
## hku11 -0.16418317 -0.337758700 -0.08196682 0.16301866 -0.5239893
## b-cov -0.08401532 0.003977944 0.37901790 -0.67011666 -0.3204910
## borreliosis -0.24599838 0.390529800 -0.38848907 0.07356641 -0.4662028
## poliomyelitis-like 0.23535174 -0.351668500 0.12845084 -0.38519500 -0.6980681
## V11 V12 V13 V14 V15
## equines -0.05841804 0.2521597 -0.08796906 0.001995911 0.209795200
## mers001 -0.31874305 0.2847713 -0.13825710 0.305129100 0.114544310
## hku11 0.03627815 0.3593696 -0.42319410 0.449764200 -0.028787075
## b-cov 0.55579930 0.5765179 -0.07539823 0.274146500 0.134159710
## borreliosis -0.11823486 0.4391051 -0.40558285 0.006365238 -0.008241586
## poliomyelitis-like 0.03049587 0.4880849 -0.12883498 0.264644120 0.234832630
## V16 V17 V18 V19 V20
## equines 0.30043286 -0.18959412 0.15412992 -0.13095929 0.01784034
## mers001 -0.05907526 0.16455530 -0.01593633 0.05753520 0.15220699
## hku11 -0.12662533 0.03187899 -0.27627766 -0.02182303 0.38226452
## b-cov 0.01118912 0.24639592 -0.06969859 0.03574854 0.62901896
## borreliosis 0.09235452 -0.18413404 0.10557121 -0.06079557 0.22987083
## poliomyelitis-like -0.11755105 0.11830633 -0.01873994 -0.41434890 -0.09271124
## V21 V22 V23 V24 V25
## equines -0.05379143 0.052943468 -0.02882974 0.1008605 0.14688523
## mers001 0.12780519 0.112527430 -0.08299220 -0.3823345 -0.11602654
## hku11 0.43632240 0.343336970 -0.17617550 -0.2437657 -0.13321213
## b-cov 0.26003730 -0.006703441 -0.36988720 -0.3095366 -0.14622375
## borreliosis 0.09352521 -0.536463440 0.12218170 -0.2249366 0.12041151
## poliomyelitis-like -0.13991481 0.181409910 0.35772988 -0.1305362 -0.07889254
## V26 V27 V28 V29 V30
## equines 0.3456238 0.47229975 0.05785259 0.10619861 0.3519128
## mers001 0.4101166 0.03908592 0.10776450 0.01154217 0.1907543
## hku11 0.7419931 -0.11774983 0.36565253 -0.11557471 -0.0540347
## b-cov 0.4850854 0.17207184 0.40831062 -0.48165646 0.1166412
## borreliosis 0.7005657 -0.22509162 -0.15982066 0.19249503 0.2814141
## poliomyelitis-like 0.2622321 -0.21769132 -0.07928118 -0.02027107 0.2645990
## V31 V32 V33 V34 V35
## equines -0.2999940 -0.23882273 0.08510522 0.219175770 -0.24785782
## mers001 0.0624940 -0.14482483 -0.38366708 0.008633031 0.12605068
## hku11 -0.4808529 0.02047880 -0.35151652 0.122259510 -0.34891877
## b-cov -0.2601735 0.05708602 -0.28903824 0.502219740 0.15422930
## borreliosis -0.3930179 -0.31906393 -0.17255227 0.175272170 -0.27145046
## poliomyelitis-like -0.4431140 0.14488801 0.30957890 0.191314180 0.03358244
## V36 V37 V38 V39 V40
## equines -0.40609875 0.092535265 -0.1511196 -0.7494975 -0.10247708
## mers001 -0.23164459 0.039437560 0.2597214 -0.1266904 0.15620887
## hku11 -0.24052672 -0.041107293 0.8306102 -0.6716584 0.03407414
## b-cov -0.31066877 0.002905066 0.8746976 -0.3155835 -0.05224784
## borreliosis 0.03460294 -0.012162454 0.2668772 -0.4132175 -0.02798660
## poliomyelitis-like -0.06415557 -0.199063290 -0.1609680 -0.1621970 0.54403484
## V41 V42 V43 V44 V45
## equines 0.09861577 -0.23506160 0.06284624 0.06150116 -0.22852804
## mers001 -0.04784691 -0.08643361 0.21643722 -0.06471372 -0.07704247
## hku11 -0.07686678 0.27484750 -0.05445142 -0.06148862 0.05169616
## b-cov -0.40163094 0.25676134 -0.29571330 -0.60712600 0.10866741
## borreliosis 0.14237761 -0.30811600 0.10124352 -0.02221499 -0.32452935
## poliomyelitis-like -0.38632488 0.02213759 -0.01754020 -0.02739187 -0.15652606
## V46 V47 V48 V49 V50
## equines 0.29163656 -0.1802190 -0.37307423 -0.1934313 0.25554007
## mers001 0.16373323 -0.3418007 -0.39115778 0.1227862 -0.17470026
## hku11 -0.30658415 -0.3457119 -0.05758975 0.1161614 -0.01792405
## b-cov 0.09866195 -0.4664430 -0.36187595 0.1403619 0.03901501
## borreliosis -0.12996303 -0.3068976 0.02267982 -0.1353656 0.28949420
## poliomyelitis-like -0.15224352 -0.4984072 -0.11030888 0.3238560 0.48632880
## V51 V52 V53 V54
## equines -0.2082705600 -0.18568088 0.26382798 0.23157777
## mers001 -0.1591661300 0.10718155 0.09390677 -0.11849096
## hku11 -0.2436856800 -0.01890829 0.22587393 0.17272880
## b-cov -0.0004999846 0.06068788 0.17303184 -0.04468082
## borreliosis -0.0885528100 -0.24583420 0.53755736 0.10771827
## poliomyelitis-like 0.2151278600 -0.24089275 0.06163979 -0.07298398
## V55 V56 V57 V58 V59
## equines -0.351542980 -0.11182350 -0.1643292 -0.04864380 -0.3288410
## mers001 -0.005245881 0.25363585 -0.3973192 -0.04726816 0.2395085
## hku11 0.201157080 -0.32690760 -0.5769513 -0.08071554 0.1464000
## b-cov 0.417917730 -0.55117315 -0.3611101 -0.24642663 0.1731268
## borreliosis 0.047956593 0.08051036 -0.6842339 0.28745168 0.0679123
## poliomyelitis-like 0.283314300 -0.28844750 -0.4893426 0.41779630 -0.1695315
## V60 V61 V62 V63 V64
## equines -0.25969570 0.06348841 -0.08446921 0.17970666 -0.10126198
## mers001 0.09375896 0.02036974 0.12161475 0.22678424 -0.07044533
## hku11 0.36962217 -0.40295112 -0.15166497 0.22815123 0.18739258
## b-cov 0.41992864 -0.34692308 -0.09718464 -0.28458658 0.07647395
## borreliosis -0.59206414 -0.27773198 0.35949272 -0.03839309 -0.17731513
## poliomyelitis-like 0.07701400 -0.41940640 0.32567742 -0.09233902 0.05675062
## V65 V66 V67 V68 V69
## equines 0.048943277 0.439685850 0.01568799 0.11472826 0.7851075
## mers001 0.004517741 -0.054614905 -0.10509843 -0.09188886 0.2327702
## hku11 -0.690761570 0.447350140 0.23177305 -0.31628010 0.4685404
## b-cov -0.521235900 -0.007482542 -0.64317244 0.27271834 0.6022128
## borreliosis -0.422649230 0.374498100 0.30402985 0.21718895 0.2328670
## poliomyelitis-like -0.299597440 0.211467860 0.39611614 0.06038408 0.2556555
## V70 V71 V72 V73 V74
## equines -0.269997360 0.11288070 -0.05986281 -0.19175588 0.11216648
## mers001 0.134452430 -0.18387290 0.13322376 0.20365916 -0.24747040
## hku11 0.457447740 -0.21217439 0.39808255 -0.26899450 -0.02196863
## b-cov 0.620727900 -0.12140574 -0.08003557 -0.22593118 0.07454983
## borreliosis -0.009705198 0.03053164 0.02432608 0.07526477 -0.32310325
## poliomyelitis-like 0.505742970 -0.05720771 0.29541054 -0.16055603 -0.40778255
## V75 V76 V77 V78 V79
## equines -0.21007375 -0.32183766 -0.03911978 0.03686093 0.24634829
## mers001 -0.19555463 -0.07640504 0.07390786 -0.21851528 0.04780307
## hku11 -0.55660190 -0.08393969 -0.34060127 -0.14774418 0.26958185
## b-cov -0.38532420 0.04379074 -0.14391372 -0.08125513 0.10167339
## borreliosis -0.19300480 -0.26928908 0.30342427 -0.12442948 0.03732965
## poliomyelitis-like 0.02305253 0.09579863 0.14320521 -0.10527664 0.06615215
## V80 V81 V82 V83 V84
## equines -0.27439702 0.17477857 0.04295845 0.41372700 0.41418630
## mers001 -0.22572468 -0.02502449 0.24418520 -0.02254113 0.53733176
## hku11 -0.23666194 0.49089608 0.50261360 -0.23210403 0.42859990
## b-cov 0.11579821 -0.16797331 0.30416295 -0.05064074 0.20175567
## borreliosis 0.02404444 -0.08481916 0.07479294 -0.13935393 0.02349797
## poliomyelitis-like -0.11586829 -0.04007268 0.30478215 0.40840300 0.16423246
## V85 V86 V87 V88 V89
## equines -0.16532214 0.46748320 0.059586370 0.22331789 -0.0426954480
## mers001 -0.01108884 -0.01110627 0.074187180 0.31948936 0.0000198004
## hku11 -0.08256926 -0.01340122 0.411217700 0.38305685 -0.0727075600
## b-cov 0.05183039 -0.06397828 0.347998920 0.49619020 0.0947839900
## borreliosis 0.26973072 -0.26084498 0.147293600 0.34735286 -0.1627523600
## poliomyelitis-like 0.32666000 0.03151882 0.004034847 0.06756948 0.1766026200
## V90 V91 V92 V93 V94
## equines 0.59770770 -0.36366388 0.01373434 0.2639800 0.33127597
## mers001 -0.02208472 -0.08115952 -0.04845850 -0.1677030 0.06963458
## hku11 0.02492071 0.05554588 0.13357910 0.4150949 -0.39602700
## b-cov 0.20804620 -0.10950676 -0.12646545 0.1665936 -0.33739820
## borreliosis 0.29593116 -0.38370925 0.17979509 -0.2703794 -0.02844513
## poliomyelitis-like 0.28499904 -0.74571770 0.17973045 0.1480117 0.23081128
## V95 V96 V97 V98
## equines -0.1367745300 -0.48563746 -0.01037558 -0.363665200
## mers001 -0.1684513000 0.05612662 -0.04942359 0.018018162
## hku11 0.1120943100 0.04920124 0.07255974 0.488652350
## b-cov -0.0567360400 0.34878746 -0.16349310 0.353435130
## borreliosis -0.0008955629 0.10303862 -0.01058765 -0.068070464
## poliomyelitis-like -0.3632419400 -0.10044260 0.03537858 -0.007223219
## V99 V100 V101 V102 V103
## equines 0.06288832 -0.18197851 0.003182828 -0.2576990 -0.17944457
## mers001 0.08713524 0.15707925 -0.310919850 -0.1623783 -0.03571728
## hku11 0.47124237 0.04754869 0.090061490 -0.2519619 0.08264628
## b-cov 0.05775549 -0.01913434 0.044992585 -0.2230673 -0.29291450
## borreliosis -0.06706584 -0.37837505 0.171302150 -0.1383903 0.16947761
## poliomyelitis-like -0.11208253 -0.24409150 0.096219560 -0.3683824 0.28280544
## V104 V105 V106 V107 V108
## equines 0.03850167 0.209915680 0.0286270 -0.153680580 0.10860915
## mers001 -0.18340577 0.174518750 -0.1102432 -0.170321230 -0.39196393
## hku11 -0.59387280 -0.055182144 0.3148848 -0.001195435 -0.12453662
## b-cov -0.38968197 -0.156676830 0.1217769 -0.352764640 0.13634157
## borreliosis 0.10113045 -0.249505530 -0.4791971 0.107203595 0.09377944
## poliomyelitis-like 0.08561149 -0.002085549 -0.4612314 -0.204826520 -0.32568040
## V109 V110 V111 V112
## equines -0.27293727 -0.125377670 0.05724353 -0.127467400
## mers001 -0.10129624 -0.033316650 -0.21377735 0.172942830
## hku11 -0.03045480 0.159106640 -0.35495484 -0.062365692
## b-cov -0.09661293 0.154738560 -0.53820520 -0.348132160
## borreliosis 0.24539512 -0.009553239 -0.02482518 -0.015675617
## poliomyelitis-like -0.01992879 0.239583340 0.04869350 0.000814566
## V113 V114 V115 V116
## equines 0.01167934 0.0002972821 -0.24678935 -0.003730779
## mers001 -0.15316261 0.0693265200 -0.18455695 -0.066308690
## hku11 0.14389774 0.0498981140 -0.04681563 -0.096085526
## b-cov -0.44182830 0.2733139400 -0.07999499 0.283359100
## borreliosis 0.30000097 -0.0197712150 0.06263128 0.137922350
## poliomyelitis-like -0.25497213 0.1821271600 -0.07260312 0.123448570
## V117 V118 V119 V120 V121
## equines -0.37824884 -0.35737630 -0.1705389 0.23039809 0.09361552
## mers001 0.22959502 -0.26289484 0.2966222 -0.02522067 0.04427286
## hku11 0.17672981 0.05740396 0.4728231 0.08010118 0.07233953
## b-cov -0.01554258 0.11832526 0.1339620 -0.44059533 0.12839934
## borreliosis -0.48736504 -0.17013560 0.1732535 0.44089440 -0.05468110
## poliomyelitis-like -0.32202655 -0.18327130 -0.1840263 0.61527780 -0.05377749
## V122 V123 V124 V125 V126
## equines 0.36563644 0.172094200 0.10659908 0.29402998 -0.29901650
## mers001 0.04694556 0.008450834 -0.30523450 0.27178967 -0.71265894
## hku11 -0.37912887 0.057719820 -0.09395649 0.24378350 -0.84951380
## b-cov -0.14529464 0.062466267 -0.06050108 0.08597333 -0.61796130
## borreliosis 0.13731120 -0.235689000 -0.02806987 0.46878815 -0.04861113
## poliomyelitis-like 0.03025746 0.023461062 -0.31550685 0.07963061 -0.04221697
## V127 V128 V129 V130 V131
## equines -0.13179165 0.7846460 -0.40386644 0.21716532 0.02132709
## mers001 0.10588993 0.3017691 0.13777168 -0.26052877 -0.14275712
## hku11 0.22732249 0.4672230 0.10237148 0.22355798 0.07244939
## b-cov 0.14036152 0.1119408 -0.09582249 0.56131150 -0.72161555
## borreliosis -0.09116106 0.3242821 -0.28812304 -0.07897736 0.07722757
## poliomyelitis-like 0.19990191 0.3690472 0.05487371 0.27653033 0.10368124
## V132 V133 V134 V135 V136
## equines -0.12621357 0.233089800 -0.22907245 -0.08795387 -0.0503377
## mers001 -0.10831823 -0.031755812 -0.34849414 0.08101975 0.1913450
## hku11 0.25853118 0.111142010 -0.17696716 -0.11350215 0.2894583
## b-cov 0.12845090 -0.278358130 -0.03219130 -0.21409883 0.4819149
## borreliosis 0.14096563 0.006114585 0.37045540 -0.44005182 -0.1738730
## poliomyelitis-like 0.08239415 -0.215800020 0.09494355 -0.14054270 0.1749046
## V137 V138 V139 V140
## equines 0.41797236 0.0828731900 -0.04825955 0.23277410
## mers001 0.05393028 0.1787868600 0.20588996 0.29474717
## hku11 0.21043213 -0.0914452200 0.28244470 -0.16002327
## b-cov 0.23285612 -0.2692440200 0.43429145 -0.38732150
## borreliosis 0.46082413 -0.0009884253 0.39014820 0.41860345
## poliomyelitis-like 0.43642170 -0.0857382600 0.37035853 -0.08208404
## V141 V142 V143 V144
## equines -0.429055840 0.028607856 0.35155952 -0.42549875
## mers001 0.005764766 0.053249024 -0.12999618 0.03211414
## hku11 -0.163612780 -0.028228352 0.08634844 -0.28566197
## b-cov -0.211568740 -0.375126930 -0.10023294 -0.36877298
## borreliosis -0.252642660 -0.008620767 0.16877899 -0.60625905
## poliomyelitis-like -0.545547900 0.017839260 0.06970191 -0.49693453
## V145 V146 V147 V148 V149
## equines 0.21488832 -0.19087236 -0.208799780 0.5280470 0.06769451
## mers001 0.04789665 -0.14145264 -0.159848470 0.5104026 -0.12406941
## hku11 0.07406280 -0.30760157 -0.268232320 0.1979419 -0.49153075
## b-cov 0.05995635 -0.54717845 -0.402375250 0.1581831 -0.93189406
## borreliosis -0.06424087 -0.07684705 0.030546910 0.3477288 0.13147873
## poliomyelitis-like 0.03264135 0.04375606 -0.009349079 0.4499451 0.23761225
## V150 V151 V152 V153
## equines -0.007813436 -0.05459520 -0.193949000 -0.20981869
## mers001 -0.041833530 0.08287029 0.300396950 0.07124590
## hku11 0.011398625 -0.19048138 0.380353600 0.54612080
## b-cov -0.179289000 -0.33348137 0.059097570 0.40356344
## borreliosis -0.110463010 0.13510463 -0.008548246 -0.02858288
## poliomyelitis-like -0.238369050 -0.20058973 -0.137753160 -0.13680993
## V154 V155 V156 V157 V158
## equines -0.25995082 -0.12111603 -0.20279801 0.02789337 0.17150928
## mers001 -0.18092304 0.16430795 0.15433483 -0.39403270 -0.13361003
## hku11 0.09348992 -0.09662580 0.07898445 -0.44553700 -0.05666127
## b-cov -0.02856796 0.26269254 0.18342538 -0.11710444 0.20337483
## borreliosis 0.05089686 -0.02799435 -0.15208177 0.29969525 0.83172864
## poliomyelitis-like -0.22720748 -0.09389418 0.02613947 0.16555296 0.21306038
## V159 V160 V161 V162
## equines 0.25940454 -0.0008098162 -0.24987337 -0.364904640
## mers001 -0.17023386 -0.0893142740 -0.38253920 0.006695611
## hku11 0.03653084 -0.0003601186 -0.92480990 -0.101823850
## b-cov -0.17514380 -0.3247465200 -0.32537222 -0.098991560
## borreliosis 0.16658542 -0.0922757700 -0.27896658 -0.407526100
## poliomyelitis-like -0.02610543 -0.0272165950 -0.04350732 -0.185763050
## V163 V164 V165 V166 V167
## equines 0.14527513 -0.06562413 -0.23909129 -0.24013902 0.23111100
## mers001 0.15445861 0.08273049 0.02270171 -0.13855740 -0.02772518
## hku11 -0.30934680 0.04404795 -0.30309707 0.08624872 -0.13426265
## b-cov -0.07296255 0.04502268 0.01646242 0.09453747 -0.15236412
## borreliosis 0.47567633 -0.02695656 -0.20048903 0.24127948 -0.04332382
## poliomyelitis-like 0.26348713 0.33564633 -0.04068514 -0.14925751 -0.08675695
## V168 V169 V170 V171 V172
## equines 0.07795896 0.24058068 -0.3986824 0.02946474 0.07894487
## mers001 0.20235808 0.36868745 0.0588368 0.14772032 -0.23150292
## hku11 0.51739144 -0.01217322 -0.1247788 0.23079434 -0.32382150
## b-cov 0.47665837 0.20718984 -0.2642424 0.54260430 -0.58999180
## borreliosis 0.13783404 0.55832154 -0.1998835 0.02255052 0.12206259
## poliomyelitis-like 0.36517364 0.23671019 -0.1868521 0.35799750 -0.29241010
## V173 V174 V175 V176 V177
## equines -0.003781465 -0.01435838 0.5497689 -0.18712670 -0.03035912
## mers001 0.326151600 0.05116332 0.1079977 -0.01517098 0.19367751
## hku11 -0.105765290 0.10132213 0.2043012 -0.17183754 0.41669464
## b-cov -0.026334891 0.05925777 0.1272498 -0.09436248 0.09368961
## borreliosis -0.206606550 -0.12627213 0.5554458 0.12658648 -0.02391166
## poliomyelitis-like 0.232025710 -0.12706552 0.2372027 0.02969237 -0.08325937
## V178 V179 V180 V181 V182
## equines 0.26532042 0.03670397 -0.12620616 -0.161513340 -0.01268972
## mers001 0.12470616 -0.43714180 -0.12463964 -0.142172890 -0.26198632
## hku11 0.23811486 -0.33559206 -0.34073886 -0.040639386 0.41053290
## b-cov 0.05952244 -0.31512102 -0.33276623 -0.004498089 0.11486569
## borreliosis 0.51248246 -0.18627120 -0.24971439 0.326712070 -0.18411640
## poliomyelitis-like 0.33876854 -0.17683163 -0.06913313 0.221873700 0.02174703
## V183 V184 V185 V186
## equines -0.457782180 0.143490150 0.087909690 0.1517338
## mers001 -0.145981270 -0.038388167 0.099016540 0.1239053
## hku11 0.004406804 -0.007407106 0.004945443 -0.2150326
## b-cov -0.102516730 0.406989300 0.235677210 -0.2545040
## borreliosis 0.054074943 0.320766400 -0.006954881 -0.1731949
## poliomyelitis-like 0.272774430 -0.052664530 0.376317860 0.5914898
## V187 V188 V189 V190 V191
## equines 0.42087322 -0.36597410 -0.06584722 0.3573998 -0.137010400
## mers001 -0.17369235 -0.06181429 0.28676143 0.3154593 0.012966560
## hku11 0.02159361 -0.18035564 0.48564348 0.4243284 0.007847323
## b-cov -0.06831494 -0.16921449 0.37782225 0.2002607 0.093410500
## borreliosis 0.17764412 0.09751696 0.01521987 0.5461369 0.424490450
## poliomyelitis-like -0.19595821 0.17418551 0.15076812 0.4319515 -0.016196895
## V192 V193 V194 V195 V196
## equines 0.1624896 0.1087535 -0.2022066 0.35776657 -0.29410204
## mers001 -0.2854688 0.2999495 0.3093498 0.11270025 -0.03992354
## hku11 -0.2389549 0.1645915 0.2138592 -0.03042003 -0.52357846
## b-cov -0.2547604 0.4151901 0.2779614 0.01610372 -0.58224470
## borreliosis 0.2007006 0.1168109 -0.3036234 0.04078365 -0.62842757
## poliomyelitis-like -0.1168004 -0.1719519 0.1090612 -0.02902745 -0.50079066
## V197 V198 V199 V200
## equines -0.028814096 0.04766546 0.216620980 0.16556330
## mers001 0.007453341 0.21219005 -0.185051160 -0.20677567
## hku11 -0.187933620 0.17386950 -0.095082180 -0.05236667
## b-cov -0.241806750 0.02941018 -0.304866280 -0.11762194
## borreliosis 0.379432470 -0.09830010 0.386721730 0.25588295
## poliomyelitis-like 0.150634660 0.25686637 -0.001932861 0.19550231
## V201 V202 V203 V204 V205
## equines 0.18613254 0.10608642 0.188608150 0.04754428 0.3255620
## mers001 0.07281964 -0.08285254 0.003118353 -0.06370807 0.2174113
## hku11 -0.22771889 0.09542241 -0.063823740 0.22735481 0.2320159
## b-cov -0.36781853 0.20156723 -0.268793460 0.67092260 -0.1426486
## borreliosis -0.39058962 0.39499617 0.144764940 -0.24212147 0.3773659
## poliomyelitis-like -0.14390786 0.52212405 0.313817140 -0.24428256 0.5229696
## V206 V207 V208 V209 V210
## equines 0.2319084 0.03963953 -0.36132090 0.016873358 -0.02759563
## mers001 0.2451663 -0.16690166 -0.20339786 -0.177194680 0.15705639
## hku11 0.1731011 -0.47152624 -0.65817480 0.114002750 0.06271697
## b-cov 0.2151703 -0.16525987 -0.67182910 0.125973140 0.31474245
## borreliosis -0.1053069 0.18104509 0.05363941 0.275543660 0.10960455
## poliomyelitis-like 0.3620242 0.20043504 0.47056037 -0.006114637 0.02504732
## V211 V212 V213 V214 V215
## equines -0.26372626 0.03257174 0.40217030 -0.3071024 -0.50471747
## mers001 -0.17882200 -0.02921272 0.11465856 -0.3478030 -0.02309733
## hku11 -0.31383547 -0.07420144 -0.41447848 -0.3894331 -0.03476122
## b-cov 0.08613645 0.20690933 -0.14504576 -0.3684033 0.51333370
## borreliosis 0.40549436 0.26288828 0.08812041 -0.1356047 -0.10155952
## poliomyelitis-like 0.01145139 0.04120889 0.01060974 -0.1566200 0.17707877
## V216 V217 V218 V219 V220
## equines -0.347624480 0.25134113 0.17289613 0.06425332 -0.22090912
## mers001 -0.230126440 0.16177766 0.16422175 0.32201266 -0.06409440
## hku11 0.013584516 0.09962453 0.09325928 0.24159013 0.19926314
## b-cov 0.005007951 -0.09722962 0.10105199 0.32949862 0.06472880
## borreliosis 0.246019700 -0.20778210 0.57176680 0.30317047 0.02694851
## poliomyelitis-like 0.302016760 0.30871420 0.31613630 0.30972922 0.04999819
## V221 V222 V223 V224 V225
## equines -0.03914366 0.267297180 -0.40559062 0.26880324 -0.03962610
## mers001 -0.02389959 0.004673506 -0.17718841 0.31779373 -0.47349674
## hku11 -0.20502025 -0.007450726 -0.57745550 0.42681015 -0.34667084
## b-cov -0.25079620 -0.249126780 -0.40402317 0.56244224 -0.34712234
## borreliosis 0.13645588 -0.390733240 0.20225444 0.22791688 0.29351622
## poliomyelitis-like -0.02242862 -0.243055120 0.08236051 -0.04970396 0.05514716
## V226 V227 V228 V229 V230
## equines 0.35779288 0.2839354 -0.19542465 0.11771324 -0.28835562
## mers001 0.07950526 0.1885495 0.23916902 0.05820171 -0.18966669
## hku11 -0.21290350 0.1301343 0.12400005 0.49399230 -0.20088142
## b-cov 0.14166690 0.1461979 -0.08977294 0.25534433 0.02016562
## borreliosis 0.33142710 0.3367960 -0.02722750 0.19862170 -0.21143624
## poliomyelitis-like 0.05923095 -0.0466193 0.01736952 -0.29026827 -0.23174636
## V231 V232 V233 V234 V235
## equines 0.19786093 -0.3202346 -0.18523312 -0.32119352 -0.22577132
## mers001 -0.16353333 0.4485322 0.04204082 -0.31471682 -0.34111652
## hku11 -0.06623317 0.3427113 -0.41357666 -0.57398380 -0.23631270
## b-cov 0.21680413 0.3554559 0.13968627 -0.08231734 0.06063811
## borreliosis -0.25803310 0.1661090 -0.17815682 0.09660935 -0.03232753
## poliomyelitis-like -0.04653626 0.2204026 -0.05573781 0.24898517 0.15071727
## V236 V237 V238 V239 V240
## equines 0.39558515 -0.099668120 -0.37339172 0.2086149 0.16076975
## mers001 0.20834091 -0.007026819 -0.21515651 -0.1712667 0.12169899
## hku11 0.26208773 -0.141404820 -0.08103006 -0.1148244 0.34616962
## b-cov 0.33567770 0.172250050 -0.01322550 -0.3577069 0.04360465
## borreliosis -0.11582337 -0.040503304 -0.43144286 0.4114651 0.37995930
## poliomyelitis-like 0.09200669 -0.045372255 -0.39357942 0.3566563 0.05077722
## V241 V242 V243 V244 V245
## equines 0.43269540 0.04955981 -0.19315349 -0.09475119 0.23335266
## mers001 0.14524509 -0.10713989 -0.18247716 0.08876727 -0.07008161
## hku11 0.54344666 -0.14574017 -0.28058330 0.05062875 -0.18894057
## b-cov 0.04061089 0.19942217 -0.09818464 -0.01775772 -0.03356880
## borreliosis 0.30501503 -0.17092353 -0.50302655 -0.22425042 0.13168944
## poliomyelitis-like 0.04802032 -0.19320427 -0.38053164 0.23217405 0.14958037
## V246 V247 V248 V249 V250
## equines -0.27807960 -0.59880180 0.05784727 0.33433694 -0.1343852
## mers001 -0.25165847 0.18113878 -0.09820239 0.05394435 0.0547007
## hku11 0.09709910 0.07265194 -0.19711237 0.35272485 -0.1929643
## b-cov 0.03883014 0.05494646 -0.20215640 0.01995254 0.1946652
## borreliosis -0.30895576 -0.53390880 -0.01698794 -0.19850339 0.4606695
## poliomyelitis-like -0.33366677 -0.13542713 -0.18045029 0.09620881 -0.1170429
## V251 V252 V253 V254 V255
## equines 0.03034481 0.14790972 -0.3850267 0.0975031300 0.24965230
## mers001 -0.01177250 -0.05401717 -0.3264950 0.2353285600 -0.03745336
## hku11 -0.34167590 -0.06567857 -0.2317287 -0.0114241520 0.25615534
## b-cov -0.26532114 -0.12106454 -0.2887557 -0.0963161660 -0.10188762
## borreliosis -0.34002087 0.02003723 -0.1819038 -0.0304181020 0.50953520
## poliomyelitis-like 0.11071904 0.29162252 0.1666907 -0.0005288996 0.17921741
## V256 V257 V258 V259 V260
## equines 0.34692624 0.191845940 -0.04191696 -0.3702121 0.02572495
## mers001 0.08797582 -0.007942038 0.08913151 -0.5168732 -0.13164702
## hku11 0.36027262 0.117123290 0.29862240 -0.6802452 0.13713352
## b-cov 0.25983927 0.044115983 0.36059994 -0.6835191 -0.35674700
## borreliosis 0.11013354 0.362597730 0.20926453 -0.4350297 -0.28296992
## poliomyelitis-like 0.15192662 0.307962150 -0.14545391 -0.1668293 0.23250972
## V261 V262 V263 V264 V265
## equines 0.1792525 0.29659910 0.1162683 -0.27740390 0.04152656
## mers001 0.4275140 0.08730372 0.1054778 -0.06272911 -0.07715718
## hku11 0.1247107 -0.26501350 -0.4350363 0.04378432 -0.10968643
## b-cov -0.4056213 -0.40810516 -0.1033421 0.24427594 0.22438150
## borreliosis 0.5499617 0.20233347 0.1809648 0.05847350 -0.10501756
## poliomyelitis-like 0.3096064 0.20326872 -0.1399777 -0.19452696 0.07868132
## V266 V267 V268 V269 V270
## equines -0.11848922 0.03683834 0.01847811 0.02674482 0.13909931
## mers001 0.13856149 0.09963655 0.03863866 -0.01858614 0.36559018
## hku11 0.14839074 0.10619055 0.30536490 -0.33618847 0.01654463
## b-cov 0.01250131 -0.28506320 0.46109807 -0.08191922 0.22639474
## borreliosis -0.20493448 -0.57653564 -0.01984089 0.37170267 0.10076117
## poliomyelitis-like 0.06845472 -0.32258332 -0.20529631 0.03000377 -0.30889140
## V271 V272 V273 V274 V275
## equines -0.42862037 -0.19502930 0.40198484 0.3278304 -0.16277507
## mers001 -0.02635956 0.41522288 0.18331979 0.1353481 0.08126567
## hku11 -0.10665757 -0.43369332 0.06836406 0.3245627 -0.55878430
## b-cov -0.64407320 -0.25371397 0.01069643 0.1575683 -0.70094687
## borreliosis -0.16348471 0.39030525 0.26751190 0.3631494 -0.18764016
## poliomyelitis-like -0.21982083 0.02090026 0.16390444 0.4307295 -0.40388355
## V276 V277 V278 V279 V280
## equines 0.21001695 0.49048594 -0.1053051 -0.25823570 -0.04469810
## mers001 0.01905448 0.12541781 -0.1395396 -0.03068201 -0.06593797
## hku11 0.31158102 0.23720087 -0.3496884 -0.40164882 0.16403528
## b-cov -0.01764769 -0.07950428 -0.2208065 0.06064332 0.00916285
## borreliosis -0.09552341 0.14758122 0.1219204 -0.19795002 0.27263233
## poliomyelitis-like 0.03930342 0.39952046 0.1052264 -0.06366608 0.11151467
## V281 V282 V283 V284 V285
## equines -0.096857280 0.45906170 -0.3566937 -0.5131708 -0.12238224
## mers001 -0.162239540 -0.25933895 -0.3142889 0.0828340 -0.25893846
## hku11 -0.059018300 0.39247885 -0.4563845 -0.3375348 -0.28801718
## b-cov -0.119805420 -0.08710018 -0.8830411 -0.4490811 -0.50233907
## borreliosis 0.235125850 0.55596715 -0.1064238 -0.4271221 -0.03234239
## poliomyelitis-like 0.001646102 -0.01967174 -0.2237641 -0.5771344 0.25791848
## V286 V287 V288 V289 V290
## equines -0.26749176 0.36345056 -0.003188965 0.09551277 0.2102274
## mers001 0.02946636 0.09111407 0.195496920 0.08610142 0.1340841
## hku11 -0.09183521 0.20711760 -0.211467770 0.34648350 0.3594854
## b-cov -0.09268853 0.28624195 -0.377546600 0.36341962 0.3844398
## borreliosis -0.07472138 0.04783938 0.383256500 0.23114732 0.6568224
## poliomyelitis-like -0.18475701 0.17587167 0.386728940 0.30680250 0.4802814
## V291 V292 V293 V294 V295
## equines 0.10861573 0.065212056 0.17184897 -0.19625726 0.06891499
## mers001 0.15701516 0.002641898 0.20186493 0.11583256 0.04923562
## hku11 0.22847338 0.347385760 -0.19294739 -0.06360266 -0.25614417
## b-cov 0.21003467 0.423407730 -0.05406551 0.27895343 0.03179154
## borreliosis -0.05216979 -0.049621146 0.35023484 0.12712288 -0.21269230
## poliomyelitis-like -0.11619792 0.117667910 0.66434760 0.08907303 -0.09120734
## V296 V297 V298 V299 V300
## equines -0.270936370 -0.04311628 0.1599648 -0.44186664 0.05629280
## mers001 -0.320693520 -0.13031584 -0.1631802 0.04581932 0.02421508
## hku11 -0.555198800 -0.14952376 -0.6188920 -0.38361973 -0.25625452
## b-cov -0.446959400 -0.13856791 -0.6524845 -0.64554050 -0.08319721
## borreliosis 0.002503621 0.11319777 -0.4374845 -0.16431290 0.16926205
## poliomyelitis-like -0.126203370 0.38335085 -0.1145345 0.07980620 0.12920211
Each row is the meaning vector of one word, and each column is one dimension of that vector whose interpretation we don't know. Remember, these were extracted from academic articles. But if we group the words using PCA, they may present a good picture.
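Before the fancier visual, here is a minimal sketch of how such a grouping could be done in base R. The matrix `emb` is a hypothetical stand-in: it borrows the six word labels printed above, but its values are random placeholders rather than the real embeddings, so the resulting picture carries no meaning. Swap in the real matrix (words in rows, dimensions in columns) to see actual structure.

```r
# ASSUMPTION: `emb` is a placeholder, filled with random numbers.
# Replace it with the real embedding matrix (one row per word).
set.seed(42)
words <- c("equines", "mers001", "hku11", "b-cov",
           "borreliosis", "poliomyelitis-like")
emb <- matrix(rnorm(length(words) * 300),
              nrow = length(words),
              dimnames = list(words, paste0("V", 1:300)))

# PCA on the centered matrix; every dimension is assumed to be on the
# same scale already, so we do not rescale the columns.
pca <- prcomp(emb, center = TRUE, scale. = FALSE)

# Coordinates of each word on the first two principal components.
scores <- pca$x[, 1:2]

# Empty plot first, then write each word at its position.
plot(scores, type = "n", xlab = "PC1", ylab = "PC2")
text(scores, labels = rownames(scores))
```

With only a handful of words there can be at most as many components as there are rows, so `pca$x` has six columns here even though the embedding has 300 dimensions. Words that land close together on the plot have similar coordinates on the two directions that capture the most variance.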
For a fancier visual, we will use TensorFlow's Embedding Projector. We have already done it for you: