Principal Component Analysis


Introduction

PCA is a cool-sounding and extremely useful tool that belongs to the family of unsupervised learning methods. Some fear it because it is not straightforward to grasp, but this is usually because they were only shown the algebra that does the job. We will take a very practical approach and show what it does, and why and how it works. Overall, we will

  • see the motivation behind this method,
  • show how it works in R,
  • interpret over different examples.

Getting Started

To follow along, you will need

  • tidyverse
  • plotly
  • gridExtra

Now, load tidyverse (the other packages are loaded where they are first used):

library('tidyverse')
theme_set(theme_minimal())

Unsupervised Methods

So far, we have shown you methods that belong to supervised learning: for each data point, we tell the machine how it must be classified, and then we check that the model has really learned. Some argue that this is not how humans really learn. We see patterns and regularities, and this commonality signals that things belong to the same class. Of course humans do far more than this, but it is at least one of our capabilities.

Let’s clarify our goal with an example and play with the Iris dataset:

head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa

This dataset is used very often in supervised learning, and in fact it already has labels. There are 150 iris flowers with their sepal and petal lengths and widths, and yes, each one is labeled as setosa, versicolor or virginica.

Now, let’s try to visualize this dataset. Since I cannot visualize in 4D (there are 4 predictors), let’s pick two dimensions and plot:

ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width)) + 
  geom_point(alpha=.5, size =2)

It looks like there is one group that is somewhat distinct from the others, but is it really? Also, notice that some flowers have exactly the same values on these two axes, so they overlap perfectly. Are they really identical? We need more information. Let’s try other predictors:

ggplot(iris, aes(x=Sepal.Length, y=Petal.Width)) + 
  geom_point(alpha=.5)

There is still some overlap, but we cannot be sure whether the overlapping points are the same flowers as before. Again, we can see that one group is very different from the others.
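One way to check whether overlapping points are really identical flowers, rather than judging from the plots, is to count repeated rows. The following is a minimal sketch using base R's `duplicated()`; the variable names `dup_2d` and `dup_4d` are only illustrative.

```r
# Count flowers whose (Sepal.Length, Sepal.Width) pair repeats an earlier row,
# i.e. points that sit exactly on top of another point in the first scatter plot.
dup_2d <- sum(duplicated(iris[, c("Sepal.Length", "Sepal.Width")]))

# Do the same using all four measurements at once.
dup_4d <- sum(duplicated(iris[, 1:4]))

c(two_columns = dup_2d, four_columns = dup_4d)
```

If far fewer rows repeat across all four columns than across two, most of the points that overlap in a 2D plot are in fact different flowers.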

Now the question is, when we changed from one plot to another, what just happened? Let’s plot it in 3D and try to answer this question:

library(plotly)
plot_ly(iris, x=~Sepal.Length, z=~Sepal.Width,y=~Petal.Width, opacity=0.3) %>% add_markers()

Each 2D plot is just one viewpoint: if we turn the cube, we shift from one angle to another. But this shifting is a continuous process, so there is a whole range of intermediate views between the two plots. And as we slowly rotate from one dimension to another, we can see that the points that overlap in one plot are actually separate in another. This is one motivation behind PCA. Let’s show the other one.

The 3D plot tells us a lot about the groups and the granularity, but since we have 4 columns, a full plot would be 4D. Maybe we can disregard one variable:

pairs(iris[,1:4])

In the plot above, we can see that Petal Length and Petal Width seem to be saying the same thing: they move together almost perfectly. Overall, there is some variation between the dimensions, and each tells us something, so I cannot easily choose one over another.
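The visual impression from the pairs plot can be backed up with numbers. As a small sketch, the correlation matrix of the four measurements shows how strongly each pair moves together (`cor_mat` and `petal_cor` are just illustrative names):

```r
# Pairwise Pearson correlations between the four measurements.
cor_mat <- cor(iris[, 1:4])
round(cor_mat, 2)

# The petal pair specifically; a value near 1 means the two columns
# carry largely the same information.
petal_cor <- cor_mat["Petal.Length", "Petal.Width"]
petal_cor
```

A correlation close to 1 for the petal pair is what "saying the same thing" means in practice, while the weaker correlations elsewhere are why no single column can simply be dropped.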

PCA

We gave insight into the motivation behind PCA but didn’t state what it does. PCA is a method that finds the viewing angle that shows the most variation between the points, and combines the columns into new axes so that we can plot the combined ones.

In the 3D plot above, we showed that there is a gradual transition between two angles. When we rotate the cube to the right, we shift from one dimension to another, and along the way we see combinations of the two axes:

library(gridExtra)

gs <- lapply(seq(0,1,length.out = 9), function(a)
  ggplot(iris, aes(x=Sepal.Length, y=a*Sepal.Width + (1-a) * Petal.Width)) + 
    geom_point(alpha=.5)+ 
    ggtitle(paste0("a=",round(a,2))) + 
    theme(axis.title = element_blank())) 

grid.arrange(grobs=gs, ncol=3)
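To put a number on how spread out each of the nine panels is, one can compute the variance of the combined y-axis for every value of `a`. This is only a rough sketch: the weights `a` and `1 - a` are not normalised and the columns are not standardised, so it illustrates the idea of comparing combinations rather than reproducing what PCA computes.

```r
# Same grid of mixing weights as in the panels above.
a_grid <- seq(0, 1, length.out = 9)

# Variance of the combined axis a * Sepal.Width + (1 - a) * Petal.Width.
combo_var <- sapply(a_grid, function(a) {
  var(a * iris$Sepal.Width + (1 - a) * iris$Petal.Width)
})

data.frame(a = round(a_grid, 2), variance = round(combo_var, 3))
```

Different mixtures spread the points out by different amounts, which is exactly the quantity a method could try to maximise.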

What PCA does is find the combination of axes that maximizes the variation (i.e. the granularity):

pca <- prcomp(iris[,1:4], scale. = TRUE)
pca
## Standard deviations (1, .., p=4):
## [1] 1.7083611 0.9560494 0.3830886 0.1439265
## 
## Rotation (n x k) = (4 x 4):
##                     PC1         PC2        PC3        PC4
## Sepal.Length  0.5210659 -0.37741762  0.7195664  0.2612863
## Sepal.Width  -0.2693474 -0.92329566 -0.2443818 -0.1235096
## Petal.Length  0.5804131 -0.02449161 -0.1421264 -0.8014492
## Petal.Width   0.5648565 -0.06694199 -0.6342727  0.5235971

These are the weights (loadings) used to combine the axes. To obtain the first axis, we multiply each flower’s predictors, which prcomp has centered and scaled, by the loadings in the PC1 column and add them up; the second axis uses the PC2 column. So the first two axes are:

\[\text{Axis1} = 0.521\,\text{S.Len} - 0.269\,\text{S.Wid} + 0.580\,\text{P.Len} + 0.565\,\text{P.Wid}\]

\[\text{Axis2} = -0.377\,\text{S.Len} - 0.923\,\text{S.Wid} - 0.024\,\text{P.Len} - 0.067\,\text{P.Wid}\]

We don’t need to calculate these by hand; R has already computed them:

dat <- data.frame(pca$x)
colnames(dat) <- paste0("Ax",1:4)
ggplot(dat, aes(Ax1, Ax2)) + 
  geom_point(alpha=.5)

The distinction between the two groups is very clear and the data look far more granular.
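The weighted sums written out above can be reproduced by hand and compared with what R stores in the `x` element of the result. The sketch below refits the PCA under a separate name (`pca_check`, chosen only for this example) and assumes the same centering and scaling that `prcomp` applies when scaling is switched on.

```r
# Refit the PCA so this block runs on its own.
pca_check <- prcomp(iris[, 1:4], scale. = TRUE)

# Center each column and divide by its standard deviation.
X <- scale(iris[, 1:4])

# Multiply the standardised predictors by the loadings: one column per component.
manual_scores <- X %*% pca_check$rotation

# Largest absolute difference from the scores R computed.
max_diff <- max(abs(manual_scores - pca_check$x))
max_diff
```

A difference at the level of floating-point noise confirms that the new axes are nothing more than weighted sums of the standardised columns.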

Dimension Reduction

This is not the only reason why we use PCA. What we just did was reduce the dimensionality to 2. But do two dimensions contain all the information? The answer is given by the scree plot.

The resulting object, pca, also reports sdev, the standard deviation of each component. Squaring it gives the variance explained by that component:

pca$sdev
## [1] 1.7083611 0.9560494 0.3830886 0.1439265

You can see that the first two components have by far the largest standard deviations. To see more clearly how much of the variance each component explains, we can plot the proportions:

var_explained <- pca$sdev^2 / sum(pca$sdev^2)
ggplot() + 
  geom_point(aes(y=var_explained, x=1:length(var_explained))) + 
  geom_line(aes(y=var_explained, x=1:length(var_explained))) + 
  labs(x="PC", title = "Scree Plot for Explained Variance")

> It looks like the first two components do the vast majority of the job: together they explain about 96% of the variance. We can just use those two and forget about the rest.
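The scree plot can also be read as a table. As a small sketch, the cumulative sum of the variance proportions shows how much is retained when keeping the first one, two, three or four components (`pca_iris`, `var_share` and `cum_share` are illustrative names, and the PCA is refit so the block runs on its own):

```r
pca_iris <- prcomp(iris[, 1:4], scale. = TRUE)

# Proportion of variance per component, and the running total.
var_share <- pca_iris$sdev^2 / sum(pca_iris$sdev^2)
cum_share <- cumsum(var_share)

round(rbind(share = var_share, cumulative = cum_share), 3)
```

Calling `summary()` on the `prcomp` result reports the same proportions, so either route can be used to decide how many components to keep.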

Biplot

The biplot, introduced by Gabriel in 1971, combines all the information we need in one figure. Let’s plot it:

biplot(pca)

The plot above shows how the data points are distributed and what the columns say about them. The black numbers are the row numbers of the flowers, laid out as in the previous scatter plot of the first two axes. The arrows, on the other hand, come from the coefficients in the pca results:

pca$rotation
##                     PC1         PC2        PC3        PC4
## Sepal.Length  0.5210659 -0.37741762  0.7195664  0.2612863
## Sepal.Width  -0.2693474 -0.92329566 -0.2443818 -0.1235096
## Petal.Length  0.5804131 -0.02449161 -0.1421264 -0.8014492
## Petal.Width   0.5648565 -0.06694199 -0.6342727  0.5235971

You can see that Sepal Width has negative coefficients in both the PC1 and PC2 columns, so its arrow points south-west. Sepal Length is positive in the PC1 column but negative in the PC2 column, so its arrow points south-east.

Also, notice that in the biplot Petal Width and Petal Length point in almost the same direction. This means the two carry almost the same information.
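How close two arrows are can be measured as the angle between their loading vectors in the PC1/PC2 plane. This is a sketch under two assumptions: `angle_deg` is a hypothetical helper written for this example, and the angle is computed on the raw loadings, which `biplot()` rescales for display, so the angle seen on screen will differ somewhat.

```r
pca_iris <- prcomp(iris[, 1:4], scale. = TRUE)

# Loadings on the first two components: one row per original column.
load2d <- pca_iris$rotation[, 1:2]

# Hypothetical helper: angle in degrees between two vectors.
angle_deg <- function(u, v) {
  cos_uv <- sum(u * v) / sqrt(sum(u^2) * sum(v^2))
  acos(min(1, max(-1, cos_uv))) * 180 / pi  # clamp against rounding error
}

petal_angle <- angle_deg(load2d["Petal.Length", ], load2d["Petal.Width", ])
sepal_angle <- angle_deg(load2d["Sepal.Length", ], load2d["Sepal.Width", ])

c(petal = petal_angle, sepal = sepal_angle)
```

A small angle for the petal pair and a much larger one for the sepal pair matches what the arrows suggest. The sign of a component is arbitrary, but flipping it flips both vectors, so the angles are unaffected.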

Rock v Rap

Let’s try a harder dataset. Since it is painfully hard to visualize in full, we reduce it to a random sample of 1000 songs and keep only the rock and rap ones:

set.seed(156)
sp_songs <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-01-21/spotify_songs.csv')
sp_songs <- sp_songs[sample(1:nrow(sp_songs), 1000),]
songs <- sp_songs[sp_songs$playlist_genre %in% c("rock", "rap"), ]
# colnames(songs)

features <- c(12:13,15,17:20) # the columns with song features
genres   <- songs$playlist_genre
# pairs(songs[,features])
pca <- prcomp(songs[,features], scale. = TRUE)
pca
## Standard deviations (1, .., p=7):
## [1] 1.5539878 1.1865227 0.9751116 0.8712813 0.8253064 0.7380865 0.4913346
## 
## Rotation (n x k) = (7 x 7):
##                           PC1         PC2         PC3         PC4         PC5
## danceability     -0.135726342  0.64684308 -0.10399636  0.08628599 -0.70567752
## energy            0.561008158 -0.05483809 -0.03230532 -0.31020109  0.02720557
## loudness          0.526751046  0.19212832  0.04334411 -0.36603096 -0.17943004
## speechiness       0.007001837  0.56107669 -0.57857849 -0.06627048  0.58499878
## acousticness     -0.491671942  0.01254930  0.02352951 -0.09477047  0.09357531
## instrumentalness -0.359944017 -0.21325856 -0.27418129 -0.80458227 -0.17978980
## liveness          0.134281502 -0.42571091 -0.75880190  0.31878756 -0.29288166
##                           PC6          PC7
## danceability     -0.006251489  0.216545681
## energy            0.257681135  0.719621131
## loudness          0.301713205 -0.653097753
## speechiness      -0.058637448 -0.018361756
## acousticness      0.859482443  0.033162166
## instrumentalness -0.264693120  0.006801827
## liveness          0.174125315 -0.085050416

The first component almost ignores speechiness (its loading is about 0.007), but the second does not (about 0.561).

Now, let’s check how much variance is explained by the components.

var_explained <- pca$sdev^2 / sum(pca$sdev^2)
ggplot() + 
  geom_point(aes(y=var_explained, x=1:length(var_explained))) + 
  geom_line(aes(y=var_explained, x=1:length(var_explained))) + 
  labs(x="PC", title = "Scree Plot for Explained Variance")

It looks like if we only focus on the first two components, we can explain about 0.35 + 0.20 = 0.55 of the total variance. That does not sound great.
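Rather than reading the values off the scree plot, the proportions can be recomputed from the standard deviations printed in the PCA output above. In this sketch the numbers are copied by hand from that output, and `song_sdev`, `song_share` and `song_cum` are illustrative names.

```r
# Standard deviations copied from the printed prcomp output for the songs.
song_sdev <- c(1.5539878, 1.1865227, 0.9751116, 0.8712813,
               0.8253064, 0.7380865, 0.4913346)

# Proportion of variance per component and the running total.
song_share <- song_sdev^2 / sum(song_sdev^2)
song_cum   <- cumsum(song_share)

round(rbind(share = song_share, cumulative = song_cum), 3)
```

The running total makes it easy to see how much is gained by adding a third component to the first two.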

temp <- data.frame(pca$x, genre=genres)
plot_ly(temp, x=~PC1, y=~PC2,z=~PC3, color=~genre) %>% add_markers()

It looks like the first three components give good enough information to split the data. If we feed these components to an SVM, for example, it could become a strong classifier that predicts whether a song is rock or rap from the 7 features.
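The idea of feeding components into a classifier can be sketched without extra packages or a download. Two substitutions are made here and both are assumptions of this example, not the setup described above: the built-in iris data (versicolor against virginica) stands in for the songs, and a logistic regression from base R's `glm()` stands in for the SVM.

```r
# Stand-in data: two iris species instead of rock and rap songs.
two_sp <- droplevels(iris[iris$Species != "setosa", ])

# Keep only the first two principal components as features.
pcs <- prcomp(two_sp[, 1:4], scale. = TRUE)$x[, 1:2]
train <- data.frame(pcs, is_virginica = as.integer(two_sp$Species == "virginica"))

# Logistic regression (swapped in for the SVM) on PC1 and PC2.
fit <- glm(is_virginica ~ PC1 + PC2, data = train, family = binomial)

# Accuracy on the training data itself, so an optimistic estimate.
pred <- as.integer(predict(fit, type = "response") > 0.5)
train_acc <- mean(pred == train$is_virginica)
train_acc
```

For an honest estimate, the components and the classifier would have to be fitted on a training split and evaluated on held-out rows.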

Virus Classification

In the last decade, a method was proposed to extract the meaning of words appearing in texts in order to improve AI model performance. More precisely, a neural network encodes each word as a point in a multidimensional space, and if two words have similar meanings, their points are close to each other.
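To make "close to each other" concrete, a common choice is cosine similarity between the word vectors. The text does not say which measure was used, so treat this as one possible reading; `cosine_sim` is a hypothetical helper and the three vectors are made up purely for illustration.

```r
# Hypothetical helper: cosine similarity between two numeric vectors.
cosine_sim <- function(u, v) sum(u * v) / sqrt(sum(u^2) * sum(v^2))

# Made-up 4-dimensional "embeddings", for illustration only.
emb <- rbind(
  virus_a = c( 0.9, 0.1,  0.3, 0.0),
  virus_b = c( 0.8, 0.2,  0.4, 0.1),
  horse   = c(-0.2, 0.9, -0.5, 0.3)
)

sim_close <- cosine_sim(emb["virus_a", ], emb["virus_b", ])
sim_far   <- cosine_sim(emb["virus_a", ], emb["horse", ])

c(close = sim_close, far = sim_far)
```

Words with similar meaning should score near 1, while unrelated words score near 0 or below.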

We used this method to extract the meaning of words, not from Reddit posts to see which meme means what, but from academic texts about viruses, to see whether they tell us something, perhaps even suggest a research direction. If you want to know about the details, check this post.

vecs <- read.csv("vecs_virus_related.tsv", sep = "\t", header = FALSE)
meta <- read.csv("meta_virus_related.tsv", sep = "\t", header = FALSE)
dat  <- vecs
rownames(dat) <- meta[,1] 
head(dat)
##                             V1          V2          V3          V4          V5
## equines            -0.19604936 -0.15985060 -0.33375525 -0.04526498 -0.08473371
## mers001             0.09533985 -0.12091427 -0.04860811 -0.22899745  0.21668658
## hku11               0.19850877  0.36221334 -0.30019190 -0.33239266 -0.25177744
## b-cov               0.31060657  0.15887283  0.01660459 -0.39972785  0.15783097
## borreliosis        -0.43272218  0.09581771  0.06180802 -0.30769423 -0.05672168
## poliomyelitis-like -0.22673094  0.06537160 -0.07942042  0.22347581  0.19538689
##                             V6           V7          V8          V9        V10
## equines             0.18724363  0.175352570 -0.33040246 -0.22763714 -0.3844061
## mers001             0.13490912 -0.203830720 -0.15187198  0.18818237 -0.3531088
## hku11              -0.16418317 -0.337758700 -0.08196682  0.16301866 -0.5239893
## b-cov              -0.08401532  0.003977944  0.37901790 -0.67011666 -0.3204910
## borreliosis        -0.24599838  0.390529800 -0.38848907  0.07356641 -0.4662028
## poliomyelitis-like  0.23535174 -0.351668500  0.12845084 -0.38519500 -0.6980681
##                            V11       V12         V13         V14          V15
## equines            -0.05841804 0.2521597 -0.08796906 0.001995911  0.209795200
## mers001            -0.31874305 0.2847713 -0.13825710 0.305129100  0.114544310
## hku11               0.03627815 0.3593696 -0.42319410 0.449764200 -0.028787075
## b-cov               0.55579930 0.5765179 -0.07539823 0.274146500  0.134159710
## borreliosis        -0.11823486 0.4391051 -0.40558285 0.006365238 -0.008241586
## poliomyelitis-like  0.03049587 0.4880849 -0.12883498 0.264644120  0.234832630
##                            V16         V17         V18         V19         V20
## equines             0.30043286 -0.18959412  0.15412992 -0.13095929  0.01784034
## mers001            -0.05907526  0.16455530 -0.01593633  0.05753520  0.15220699
## hku11              -0.12662533  0.03187899 -0.27627766 -0.02182303  0.38226452
## b-cov               0.01118912  0.24639592 -0.06969859  0.03574854  0.62901896
## borreliosis         0.09235452 -0.18413404  0.10557121 -0.06079557  0.22987083
## poliomyelitis-like -0.11755105  0.11830633 -0.01873994 -0.41434890 -0.09271124
##                            V21          V22         V23        V24         V25
## equines            -0.05379143  0.052943468 -0.02882974  0.1008605  0.14688523
## mers001             0.12780519  0.112527430 -0.08299220 -0.3823345 -0.11602654
## hku11               0.43632240  0.343336970 -0.17617550 -0.2437657 -0.13321213
## b-cov               0.26003730 -0.006703441 -0.36988720 -0.3095366 -0.14622375
## borreliosis         0.09352521 -0.536463440  0.12218170 -0.2249366  0.12041151
## poliomyelitis-like -0.13991481  0.181409910  0.35772988 -0.1305362 -0.07889254
##                          V26         V27         V28         V29        V30
## equines            0.3456238  0.47229975  0.05785259  0.10619861  0.3519128
## mers001            0.4101166  0.03908592  0.10776450  0.01154217  0.1907543
## hku11              0.7419931 -0.11774983  0.36565253 -0.11557471 -0.0540347
## b-cov              0.4850854  0.17207184  0.40831062 -0.48165646  0.1166412
## borreliosis        0.7005657 -0.22509162 -0.15982066  0.19249503  0.2814141
## poliomyelitis-like 0.2622321 -0.21769132 -0.07928118 -0.02027107  0.2645990
##                           V31         V32         V33         V34         V35
## equines            -0.2999940 -0.23882273  0.08510522 0.219175770 -0.24785782
## mers001             0.0624940 -0.14482483 -0.38366708 0.008633031  0.12605068
## hku11              -0.4808529  0.02047880 -0.35151652 0.122259510 -0.34891877
## b-cov              -0.2601735  0.05708602 -0.28903824 0.502219740  0.15422930
## borreliosis        -0.3930179 -0.31906393 -0.17255227 0.175272170 -0.27145046
## poliomyelitis-like -0.4431140  0.14488801  0.30957890 0.191314180  0.03358244
##                            V36          V37        V38        V39         V40
## equines            -0.40609875  0.092535265 -0.1511196 -0.7494975 -0.10247708
## mers001            -0.23164459  0.039437560  0.2597214 -0.1266904  0.15620887
## hku11              -0.24052672 -0.041107293  0.8306102 -0.6716584  0.03407414
## b-cov              -0.31066877  0.002905066  0.8746976 -0.3155835 -0.05224784
## borreliosis         0.03460294 -0.012162454  0.2668772 -0.4132175 -0.02798660
## poliomyelitis-like -0.06415557 -0.199063290 -0.1609680 -0.1621970  0.54403484
##                            V41         V42         V43         V44         V45
## equines             0.09861577 -0.23506160  0.06284624  0.06150116 -0.22852804
## mers001            -0.04784691 -0.08643361  0.21643722 -0.06471372 -0.07704247
## hku11              -0.07686678  0.27484750 -0.05445142 -0.06148862  0.05169616
## b-cov              -0.40163094  0.25676134 -0.29571330 -0.60712600  0.10866741
## borreliosis         0.14237761 -0.30811600  0.10124352 -0.02221499 -0.32452935
## poliomyelitis-like -0.38632488  0.02213759 -0.01754020 -0.02739187 -0.15652606
##                            V46        V47         V48        V49         V50
## equines             0.29163656 -0.1802190 -0.37307423 -0.1934313  0.25554007
## mers001             0.16373323 -0.3418007 -0.39115778  0.1227862 -0.17470026
## hku11              -0.30658415 -0.3457119 -0.05758975  0.1161614 -0.01792405
## b-cov               0.09866195 -0.4664430 -0.36187595  0.1403619  0.03901501
## borreliosis        -0.12996303 -0.3068976  0.02267982 -0.1353656  0.28949420
## poliomyelitis-like -0.15224352 -0.4984072 -0.11030888  0.3238560  0.48632880
##                              V51         V52        V53         V54
## equines            -0.2082705600 -0.18568088 0.26382798  0.23157777
## mers001            -0.1591661300  0.10718155 0.09390677 -0.11849096
## hku11              -0.2436856800 -0.01890829 0.22587393  0.17272880
## b-cov              -0.0004999846  0.06068788 0.17303184 -0.04468082
## borreliosis        -0.0885528100 -0.24583420 0.53755736  0.10771827
## poliomyelitis-like  0.2151278600 -0.24089275 0.06163979 -0.07298398
##                             V55         V56        V57         V58        V59
## equines            -0.351542980 -0.11182350 -0.1643292 -0.04864380 -0.3288410
## mers001            -0.005245881  0.25363585 -0.3973192 -0.04726816  0.2395085
## hku11               0.201157080 -0.32690760 -0.5769513 -0.08071554  0.1464000
## b-cov               0.417917730 -0.55117315 -0.3611101 -0.24642663  0.1731268
## borreliosis         0.047956593  0.08051036 -0.6842339  0.28745168  0.0679123
## poliomyelitis-like  0.283314300 -0.28844750 -0.4893426  0.41779630 -0.1695315
##                            V60         V61         V62         V63         V64
## equines            -0.25969570  0.06348841 -0.08446921  0.17970666 -0.10126198
## mers001             0.09375896  0.02036974  0.12161475  0.22678424 -0.07044533
## hku11               0.36962217 -0.40295112 -0.15166497  0.22815123  0.18739258
## b-cov               0.41992864 -0.34692308 -0.09718464 -0.28458658  0.07647395
## borreliosis        -0.59206414 -0.27773198  0.35949272 -0.03839309 -0.17731513
## poliomyelitis-like  0.07701400 -0.41940640  0.32567742 -0.09233902  0.05675062
##                             V65          V66         V67         V68       V69
## equines             0.048943277  0.439685850  0.01568799  0.11472826 0.7851075
## mers001             0.004517741 -0.054614905 -0.10509843 -0.09188886 0.2327702
## hku11              -0.690761570  0.447350140  0.23177305 -0.31628010 0.4685404
## b-cov              -0.521235900 -0.007482542 -0.64317244  0.27271834 0.6022128
## borreliosis        -0.422649230  0.374498100  0.30402985  0.21718895 0.2328670
## poliomyelitis-like -0.299597440  0.211467860  0.39611614  0.06038408 0.2556555
##                             V70         V71         V72         V73         V74
## equines            -0.269997360  0.11288070 -0.05986281 -0.19175588  0.11216648
## mers001             0.134452430 -0.18387290  0.13322376  0.20365916 -0.24747040
## hku11               0.457447740 -0.21217439  0.39808255 -0.26899450 -0.02196863
## b-cov               0.620727900 -0.12140574 -0.08003557 -0.22593118  0.07454983
## borreliosis        -0.009705198  0.03053164  0.02432608  0.07526477 -0.32310325
## poliomyelitis-like  0.505742970 -0.05720771  0.29541054 -0.16055603 -0.40778255
##                            V75         V76         V77         V78        V79
## equines            -0.21007375 -0.32183766 -0.03911978  0.03686093 0.24634829
## mers001            -0.19555463 -0.07640504  0.07390786 -0.21851528 0.04780307
## hku11              -0.55660190 -0.08393969 -0.34060127 -0.14774418 0.26958185
## b-cov              -0.38532420  0.04379074 -0.14391372 -0.08125513 0.10167339
## borreliosis        -0.19300480 -0.26928908  0.30342427 -0.12442948 0.03732965
## poliomyelitis-like  0.02305253  0.09579863  0.14320521 -0.10527664 0.06615215
##                            V80         V81        V82         V83        V84
## equines            -0.27439702  0.17477857 0.04295845  0.41372700 0.41418630
## mers001            -0.22572468 -0.02502449 0.24418520 -0.02254113 0.53733176
## hku11              -0.23666194  0.49089608 0.50261360 -0.23210403 0.42859990
## b-cov               0.11579821 -0.16797331 0.30416295 -0.05064074 0.20175567
## borreliosis         0.02404444 -0.08481916 0.07479294 -0.13935393 0.02349797
## poliomyelitis-like -0.11586829 -0.04007268 0.30478215  0.40840300 0.16423246
##                            V85         V86         V87        V88           V89
## equines            -0.16532214  0.46748320 0.059586370 0.22331789 -0.0426954480
## mers001            -0.01108884 -0.01110627 0.074187180 0.31948936  0.0000198004
## hku11              -0.08256926 -0.01340122 0.411217700 0.38305685 -0.0727075600
## b-cov               0.05183039 -0.06397828 0.347998920 0.49619020  0.0947839900
## borreliosis         0.26973072 -0.26084498 0.147293600 0.34735286 -0.1627523600
## poliomyelitis-like  0.32666000  0.03151882 0.004034847 0.06756948  0.1766026200
##                            V90         V91         V92        V93         V94
## equines             0.59770770 -0.36366388  0.01373434  0.2639800  0.33127597
## mers001            -0.02208472 -0.08115952 -0.04845850 -0.1677030  0.06963458
## hku11               0.02492071  0.05554588  0.13357910  0.4150949 -0.39602700
## b-cov               0.20804620 -0.10950676 -0.12646545  0.1665936 -0.33739820
## borreliosis         0.29593116 -0.38370925  0.17979509 -0.2703794 -0.02844513
## poliomyelitis-like  0.28499904 -0.74571770  0.17973045  0.1480117  0.23081128
##                              V95         V96         V97          V98
## equines            -0.1367745300 -0.48563746 -0.01037558 -0.363665200
## mers001            -0.1684513000  0.05612662 -0.04942359  0.018018162
## hku11               0.1120943100  0.04920124  0.07255974  0.488652350
## b-cov              -0.0567360400  0.34878746 -0.16349310  0.353435130
## borreliosis        -0.0008955629  0.10303862 -0.01058765 -0.068070464
## poliomyelitis-like -0.3632419400 -0.10044260  0.03537858 -0.007223219
##                            V99        V100         V101       V102        V103
## equines             0.06288832 -0.18197851  0.003182828 -0.2576990 -0.17944457
## mers001             0.08713524  0.15707925 -0.310919850 -0.1623783 -0.03571728
## hku11               0.47124237  0.04754869  0.090061490 -0.2519619  0.08264628
## b-cov               0.05775549 -0.01913434  0.044992585 -0.2230673 -0.29291450
## borreliosis        -0.06706584 -0.37837505  0.171302150 -0.1383903  0.16947761
## poliomyelitis-like -0.11208253 -0.24409150  0.096219560 -0.3683824  0.28280544
##                           V104         V105       V106         V107        V108
## equines             0.03850167  0.209915680  0.0286270 -0.153680580  0.10860915
## mers001            -0.18340577  0.174518750 -0.1102432 -0.170321230 -0.39196393
## hku11              -0.59387280 -0.055182144  0.3148848 -0.001195435 -0.12453662
## b-cov              -0.38968197 -0.156676830  0.1217769 -0.352764640  0.13634157
## borreliosis         0.10113045 -0.249505530 -0.4791971  0.107203595  0.09377944
## poliomyelitis-like  0.08561149 -0.002085549 -0.4612314 -0.204826520 -0.32568040
##                           V109         V110        V111         V112
## equines            -0.27293727 -0.125377670  0.05724353 -0.127467400
## mers001            -0.10129624 -0.033316650 -0.21377735  0.172942830
## hku11              -0.03045480  0.159106640 -0.35495484 -0.062365692
## b-cov              -0.09661293  0.154738560 -0.53820520 -0.348132160
## borreliosis         0.24539512 -0.009553239 -0.02482518 -0.015675617
## poliomyelitis-like -0.01992879  0.239583340  0.04869350  0.000814566
##                           V113          V114        V115         V116
## equines             0.01167934  0.0002972821 -0.24678935 -0.003730779
## mers001            -0.15316261  0.0693265200 -0.18455695 -0.066308690
## hku11               0.14389774  0.0498981140 -0.04681563 -0.096085526
## b-cov              -0.44182830  0.2733139400 -0.07999499  0.283359100
## borreliosis         0.30000097 -0.0197712150  0.06263128  0.137922350
## poliomyelitis-like -0.25497213  0.1821271600 -0.07260312  0.123448570
##                           V117        V118       V119        V120        V121
## equines            -0.37824884 -0.35737630 -0.1705389  0.23039809  0.09361552
## mers001             0.22959502 -0.26289484  0.2966222 -0.02522067  0.04427286
## hku11               0.17672981  0.05740396  0.4728231  0.08010118  0.07233953
## b-cov              -0.01554258  0.11832526  0.1339620 -0.44059533  0.12839934
## borreliosis        -0.48736504 -0.17013560  0.1732535  0.44089440 -0.05468110
## poliomyelitis-like -0.32202655 -0.18327130 -0.1840263  0.61527780 -0.05377749
##                           V122         V123        V124       V125        V126
## equines             0.36563644  0.172094200  0.10659908 0.29402998 -0.29901650
## mers001             0.04694556  0.008450834 -0.30523450 0.27178967 -0.71265894
## hku11              -0.37912887  0.057719820 -0.09395649 0.24378350 -0.84951380
## b-cov              -0.14529464  0.062466267 -0.06050108 0.08597333 -0.61796130
## borreliosis         0.13731120 -0.235689000 -0.02806987 0.46878815 -0.04861113
## poliomyelitis-like  0.03025746  0.023461062 -0.31550685 0.07963061 -0.04221697
##                           V127      V128        V129        V130        V131
## equines            -0.13179165 0.7846460 -0.40386644  0.21716532  0.02132709
## mers001             0.10588993 0.3017691  0.13777168 -0.26052877 -0.14275712
## hku11               0.22732249 0.4672230  0.10237148  0.22355798  0.07244939
## b-cov               0.14036152 0.1119408 -0.09582249  0.56131150 -0.72161555
## borreliosis        -0.09116106 0.3242821 -0.28812304 -0.07897736  0.07722757
## poliomyelitis-like  0.19990191 0.3690472  0.05487371  0.27653033  0.10368124
##                           V132         V133        V134        V135       V136
## equines            -0.12621357  0.233089800 -0.22907245 -0.08795387 -0.0503377
## mers001            -0.10831823 -0.031755812 -0.34849414  0.08101975  0.1913450
## hku11               0.25853118  0.111142010 -0.17696716 -0.11350215  0.2894583
## b-cov               0.12845090 -0.278358130 -0.03219130 -0.21409883  0.4819149
## borreliosis         0.14096563  0.006114585  0.37045540 -0.44005182 -0.1738730
## poliomyelitis-like  0.08239415 -0.215800020  0.09494355 -0.14054270  0.1749046
##                          V137          V138        V139        V140
## equines            0.41797236  0.0828731900 -0.04825955  0.23277410
## mers001            0.05393028  0.1787868600  0.20588996  0.29474717
## hku11              0.21043213 -0.0914452200  0.28244470 -0.16002327
## b-cov              0.23285612 -0.2692440200  0.43429145 -0.38732150
## borreliosis        0.46082413 -0.0009884253  0.39014820  0.41860345
## poliomyelitis-like 0.43642170 -0.0857382600  0.37035853 -0.08208404
##                            V141         V142        V143        V144
## equines            -0.429055840  0.028607856  0.35155952 -0.42549875
## mers001             0.005764766  0.053249024 -0.12999618  0.03211414
## hku11              -0.163612780 -0.028228352  0.08634844 -0.28566197
## b-cov              -0.211568740 -0.375126930 -0.10023294 -0.36877298
## borreliosis        -0.252642660 -0.008620767  0.16877899 -0.60625905
## poliomyelitis-like -0.545547900  0.017839260  0.06970191 -0.49693453
##                           V145        V146         V147      V148        V149
## equines             0.21488832 -0.19087236 -0.208799780 0.5280470  0.06769451
## mers001             0.04789665 -0.14145264 -0.159848470 0.5104026 -0.12406941
## hku11               0.07406280 -0.30760157 -0.268232320 0.1979419 -0.49153075
## b-cov               0.05995635 -0.54717845 -0.402375250 0.1581831 -0.93189406
## borreliosis        -0.06424087 -0.07684705  0.030546910 0.3477288  0.13147873
## poliomyelitis-like  0.03264135  0.04375606 -0.009349079 0.4499451  0.23761225
##                            V150        V151         V152        V153
## equines            -0.007813436 -0.05459520 -0.193949000 -0.20981869
## mers001            -0.041833530  0.08287029  0.300396950  0.07124590
## hku11               0.011398625 -0.19048138  0.380353600  0.54612080
## b-cov              -0.179289000 -0.33348137  0.059097570  0.40356344
## borreliosis        -0.110463010  0.13510463 -0.008548246 -0.02858288
## poliomyelitis-like -0.238369050 -0.20058973 -0.137753160 -0.13680993
##                           V154        V155        V156        V157        V158
## equines            -0.25995082 -0.12111603 -0.20279801  0.02789337  0.17150928
## mers001            -0.18092304  0.16430795  0.15433483 -0.39403270 -0.13361003
## hku11               0.09348992 -0.09662580  0.07898445 -0.44553700 -0.05666127
## b-cov              -0.02856796  0.26269254  0.18342538 -0.11710444  0.20337483
## borreliosis         0.05089686 -0.02799435 -0.15208177  0.29969525  0.83172864
## poliomyelitis-like -0.22720748 -0.09389418  0.02613947  0.16555296  0.21306038
##                           V159          V160        V161         V162
## equines             0.25940454 -0.0008098162 -0.24987337 -0.364904640
## mers001            -0.17023386 -0.0893142740 -0.38253920  0.006695611
## hku11               0.03653084 -0.0003601186 -0.92480990 -0.101823850
## b-cov              -0.17514380 -0.3247465200 -0.32537222 -0.098991560
## borreliosis         0.16658542 -0.0922757700 -0.27896658 -0.407526100
## poliomyelitis-like -0.02610543 -0.0272165950 -0.04350732 -0.185763050
##                           V163        V164        V165        V166        V167
## equines             0.14527513 -0.06562413 -0.23909129 -0.24013902  0.23111100
## mers001             0.15445861  0.08273049  0.02270171 -0.13855740 -0.02772518
## hku11              -0.30934680  0.04404795 -0.30309707  0.08624872 -0.13426265
## b-cov              -0.07296255  0.04502268  0.01646242  0.09453747 -0.15236412
## borreliosis         0.47567633 -0.02695656 -0.20048903  0.24127948 -0.04332382
## poliomyelitis-like  0.26348713  0.33564633 -0.04068514 -0.14925751 -0.08675695
##                          V168        V169       V170       V171        V172
## equines            0.07795896  0.24058068 -0.3986824 0.02946474  0.07894487
## mers001            0.20235808  0.36868745  0.0588368 0.14772032 -0.23150292
## hku11              0.51739144 -0.01217322 -0.1247788 0.23079434 -0.32382150
## b-cov              0.47665837  0.20718984 -0.2642424 0.54260430 -0.58999180
## borreliosis        0.13783404  0.55832154 -0.1998835 0.02255052  0.12206259
## poliomyelitis-like 0.36517364  0.23671019 -0.1868521 0.35799750 -0.29241010
##                            V173        V174      V175        V176        V177
## equines            -0.003781465 -0.01435838 0.5497689 -0.18712670 -0.03035912
## mers001             0.326151600  0.05116332 0.1079977 -0.01517098  0.19367751
## hku11              -0.105765290  0.10132213 0.2043012 -0.17183754  0.41669464
## b-cov              -0.026334891  0.05925777 0.1272498 -0.09436248  0.09368961
## borreliosis        -0.206606550 -0.12627213 0.5554458  0.12658648 -0.02391166
## poliomyelitis-like  0.232025710 -0.12706552 0.2372027  0.02969237 -0.08325937
##                          V178        V179        V180         V181        V182
## equines            0.26532042  0.03670397 -0.12620616 -0.161513340 -0.01268972
## mers001            0.12470616 -0.43714180 -0.12463964 -0.142172890 -0.26198632
## hku11              0.23811486 -0.33559206 -0.34073886 -0.040639386  0.41053290
## b-cov              0.05952244 -0.31512102 -0.33276623 -0.004498089  0.11486569
## borreliosis        0.51248246 -0.18627120 -0.24971439  0.326712070 -0.18411640
## poliomyelitis-like 0.33876854 -0.17683163 -0.06913313  0.221873700  0.02174703
##                            V183         V184         V185       V186
## equines            -0.457782180  0.143490150  0.087909690  0.1517338
## mers001            -0.145981270 -0.038388167  0.099016540  0.1239053
## hku11               0.004406804 -0.007407106  0.004945443 -0.2150326
## b-cov              -0.102516730  0.406989300  0.235677210 -0.2545040
## borreliosis         0.054074943  0.320766400 -0.006954881 -0.1731949
## poliomyelitis-like  0.272774430 -0.052664530  0.376317860  0.5914898
##                           V187        V188        V189      V190         V191
## equines             0.42087322 -0.36597410 -0.06584722 0.3573998 -0.137010400
## mers001            -0.17369235 -0.06181429  0.28676143 0.3154593  0.012966560
## hku11               0.02159361 -0.18035564  0.48564348 0.4243284  0.007847323
## b-cov              -0.06831494 -0.16921449  0.37782225 0.2002607  0.093410500
## borreliosis         0.17764412  0.09751696  0.01521987 0.5461369  0.424490450
## poliomyelitis-like -0.19595821  0.17418551  0.15076812 0.4319515 -0.016196895
##                          V192       V193       V194        V195        V196
## equines             0.1624896  0.1087535 -0.2022066  0.35776657 -0.29410204
## mers001            -0.2854688  0.2999495  0.3093498  0.11270025 -0.03992354
## hku11              -0.2389549  0.1645915  0.2138592 -0.03042003 -0.52357846
## b-cov              -0.2547604  0.4151901  0.2779614  0.01610372 -0.58224470
## borreliosis         0.2007006  0.1168109 -0.3036234  0.04078365 -0.62842757
## poliomyelitis-like -0.1168004 -0.1719519  0.1090612 -0.02902745 -0.50079066
##                            V197        V198         V199        V200
## equines            -0.028814096  0.04766546  0.216620980  0.16556330
## mers001             0.007453341  0.21219005 -0.185051160 -0.20677567
## hku11              -0.187933620  0.17386950 -0.095082180 -0.05236667
## b-cov              -0.241806750  0.02941018 -0.304866280 -0.11762194
## borreliosis         0.379432470 -0.09830010  0.386721730  0.25588295
## poliomyelitis-like  0.150634660  0.25686637 -0.001932861  0.19550231
##                           V201        V202         V203        V204       V205
## equines             0.18613254  0.10608642  0.188608150  0.04754428  0.3255620
## mers001             0.07281964 -0.08285254  0.003118353 -0.06370807  0.2174113
## hku11              -0.22771889  0.09542241 -0.063823740  0.22735481  0.2320159
## b-cov              -0.36781853  0.20156723 -0.268793460  0.67092260 -0.1426486
## borreliosis        -0.39058962  0.39499617  0.144764940 -0.24212147  0.3773659
## poliomyelitis-like -0.14390786  0.52212405  0.313817140 -0.24428256  0.5229696
##                          V206        V207        V208         V209        V210
## equines             0.2319084  0.03963953 -0.36132090  0.016873358 -0.02759563
## mers001             0.2451663 -0.16690166 -0.20339786 -0.177194680  0.15705639
## hku11               0.1731011 -0.47152624 -0.65817480  0.114002750  0.06271697
## b-cov               0.2151703 -0.16525987 -0.67182910  0.125973140  0.31474245
## borreliosis        -0.1053069  0.18104509  0.05363941  0.275543660  0.10960455
## poliomyelitis-like  0.3620242  0.20043504  0.47056037 -0.006114637  0.02504732
##                           V211        V212        V213       V214        V215
## equines            -0.26372626  0.03257174  0.40217030 -0.3071024 -0.50471747
## mers001            -0.17882200 -0.02921272  0.11465856 -0.3478030 -0.02309733
## hku11              -0.31383547 -0.07420144 -0.41447848 -0.3894331 -0.03476122
## b-cov               0.08613645  0.20690933 -0.14504576 -0.3684033  0.51333370
## borreliosis         0.40549436  0.26288828  0.08812041 -0.1356047 -0.10155952
## poliomyelitis-like  0.01145139  0.04120889  0.01060974 -0.1566200  0.17707877
##                            V216        V217       V218       V219        V220
## equines            -0.347624480  0.25134113 0.17289613 0.06425332 -0.22090912
## mers001            -0.230126440  0.16177766 0.16422175 0.32201266 -0.06409440
## hku11               0.013584516  0.09962453 0.09325928 0.24159013  0.19926314
## b-cov               0.005007951 -0.09722962 0.10105199 0.32949862  0.06472880
## borreliosis         0.246019700 -0.20778210 0.57176680 0.30317047  0.02694851
## poliomyelitis-like  0.302016760  0.30871420 0.31613630 0.30972922  0.04999819
##                           V221         V222        V223        V224        V225
## equines            -0.03914366  0.267297180 -0.40559062  0.26880324 -0.03962610
## mers001            -0.02389959  0.004673506 -0.17718841  0.31779373 -0.47349674
## hku11              -0.20502025 -0.007450726 -0.57745550  0.42681015 -0.34667084
## b-cov              -0.25079620 -0.249126780 -0.40402317  0.56244224 -0.34712234
## borreliosis         0.13645588 -0.390733240  0.20225444  0.22791688  0.29351622
## poliomyelitis-like -0.02242862 -0.243055120  0.08236051 -0.04970396  0.05514716
##                           V226       V227        V228        V229        V230
## equines             0.35779288  0.2839354 -0.19542465  0.11771324 -0.28835562
## mers001             0.07950526  0.1885495  0.23916902  0.05820171 -0.18966669
## hku11              -0.21290350  0.1301343  0.12400005  0.49399230 -0.20088142
## b-cov               0.14166690  0.1461979 -0.08977294  0.25534433  0.02016562
## borreliosis         0.33142710  0.3367960 -0.02722750  0.19862170 -0.21143624
## poliomyelitis-like  0.05923095 -0.0466193  0.01736952 -0.29026827 -0.23174636
##                           V231       V232        V233        V234        V235
## equines             0.19786093 -0.3202346 -0.18523312 -0.32119352 -0.22577132
## mers001            -0.16353333  0.4485322  0.04204082 -0.31471682 -0.34111652
## hku11              -0.06623317  0.3427113 -0.41357666 -0.57398380 -0.23631270
## b-cov               0.21680413  0.3554559  0.13968627 -0.08231734  0.06063811
## borreliosis        -0.25803310  0.1661090 -0.17815682  0.09660935 -0.03232753
## poliomyelitis-like -0.04653626  0.2204026 -0.05573781  0.24898517  0.15071727
##                           V236         V237        V238       V239       V240
## equines             0.39558515 -0.099668120 -0.37339172  0.2086149 0.16076975
## mers001             0.20834091 -0.007026819 -0.21515651 -0.1712667 0.12169899
## hku11               0.26208773 -0.141404820 -0.08103006 -0.1148244 0.34616962
## b-cov               0.33567770  0.172250050 -0.01322550 -0.3577069 0.04360465
## borreliosis        -0.11582337 -0.040503304 -0.43144286  0.4114651 0.37995930
## poliomyelitis-like  0.09200669 -0.045372255 -0.39357942  0.3566563 0.05077722
##                          V241        V242        V243        V244        V245
## equines            0.43269540  0.04955981 -0.19315349 -0.09475119  0.23335266
## mers001            0.14524509 -0.10713989 -0.18247716  0.08876727 -0.07008161
## hku11              0.54344666 -0.14574017 -0.28058330  0.05062875 -0.18894057
## b-cov              0.04061089  0.19942217 -0.09818464 -0.01775772 -0.03356880
## borreliosis        0.30501503 -0.17092353 -0.50302655 -0.22425042  0.13168944
## poliomyelitis-like 0.04802032 -0.19320427 -0.38053164  0.23217405  0.14958037
##                           V246        V247        V248        V249       V250
## equines            -0.27807960 -0.59880180  0.05784727  0.33433694 -0.1343852
## mers001            -0.25165847  0.18113878 -0.09820239  0.05394435  0.0547007
## hku11               0.09709910  0.07265194 -0.19711237  0.35272485 -0.1929643
## b-cov               0.03883014  0.05494646 -0.20215640  0.01995254  0.1946652
## borreliosis        -0.30895576 -0.53390880 -0.01698794 -0.19850339  0.4606695
## poliomyelitis-like -0.33366677 -0.13542713 -0.18045029  0.09620881 -0.1170429
##                           V251        V252       V253          V254        V255
## equines             0.03034481  0.14790972 -0.3850267  0.0975031300  0.24965230
## mers001            -0.01177250 -0.05401717 -0.3264950  0.2353285600 -0.03745336
## hku11              -0.34167590 -0.06567857 -0.2317287 -0.0114241520  0.25615534
## b-cov              -0.26532114 -0.12106454 -0.2887557 -0.0963161660 -0.10188762
## borreliosis        -0.34002087  0.02003723 -0.1819038 -0.0304181020  0.50953520
## poliomyelitis-like  0.11071904  0.29162252  0.1666907 -0.0005288996  0.17921741
##                          V256         V257        V258       V259        V260
## equines            0.34692624  0.191845940 -0.04191696 -0.3702121  0.02572495
## mers001            0.08797582 -0.007942038  0.08913151 -0.5168732 -0.13164702
## hku11              0.36027262  0.117123290  0.29862240 -0.6802452  0.13713352
## b-cov              0.25983927  0.044115983  0.36059994 -0.6835191 -0.35674700
## borreliosis        0.11013354  0.362597730  0.20926453 -0.4350297 -0.28296992
## poliomyelitis-like 0.15192662  0.307962150 -0.14545391 -0.1668293  0.23250972
##                          V261        V262       V263        V264        V265
## equines             0.1792525  0.29659910  0.1162683 -0.27740390  0.04152656
## mers001             0.4275140  0.08730372  0.1054778 -0.06272911 -0.07715718
## hku11               0.1247107 -0.26501350 -0.4350363  0.04378432 -0.10968643
## b-cov              -0.4056213 -0.40810516 -0.1033421  0.24427594  0.22438150
## borreliosis         0.5499617  0.20233347  0.1809648  0.05847350 -0.10501756
## poliomyelitis-like  0.3096064  0.20326872 -0.1399777 -0.19452696  0.07868132
##                           V266        V267        V268        V269        V270
## equines            -0.11848922  0.03683834  0.01847811  0.02674482  0.13909931
## mers001             0.13856149  0.09963655  0.03863866 -0.01858614  0.36559018
## hku11               0.14839074  0.10619055  0.30536490 -0.33618847  0.01654463
## b-cov               0.01250131 -0.28506320  0.46109807 -0.08191922  0.22639474
## borreliosis        -0.20493448 -0.57653564 -0.01984089  0.37170267  0.10076117
## poliomyelitis-like  0.06845472 -0.32258332 -0.20529631  0.03000377 -0.30889140
##                           V271        V272       V273      V274        V275
## equines            -0.42862037 -0.19502930 0.40198484 0.3278304 -0.16277507
## mers001            -0.02635956  0.41522288 0.18331979 0.1353481  0.08126567
## hku11              -0.10665757 -0.43369332 0.06836406 0.3245627 -0.55878430
## b-cov              -0.64407320 -0.25371397 0.01069643 0.1575683 -0.70094687
## borreliosis        -0.16348471  0.39030525 0.26751190 0.3631494 -0.18764016
## poliomyelitis-like -0.21982083  0.02090026 0.16390444 0.4307295 -0.40388355
##                           V276        V277       V278        V279        V280
## equines             0.21001695  0.49048594 -0.1053051 -0.25823570 -0.04469810
## mers001             0.01905448  0.12541781 -0.1395396 -0.03068201 -0.06593797
## hku11               0.31158102  0.23720087 -0.3496884 -0.40164882  0.16403528
## b-cov              -0.01764769 -0.07950428 -0.2208065  0.06064332  0.00916285
## borreliosis        -0.09552341  0.14758122  0.1219204 -0.19795002  0.27263233
## poliomyelitis-like  0.03930342  0.39952046  0.1052264 -0.06366608  0.11151467
##                            V281        V282       V283       V284        V285
## equines            -0.096857280  0.45906170 -0.3566937 -0.5131708 -0.12238224
## mers001            -0.162239540 -0.25933895 -0.3142889  0.0828340 -0.25893846
## hku11              -0.059018300  0.39247885 -0.4563845 -0.3375348 -0.28801718
## b-cov              -0.119805420 -0.08710018 -0.8830411 -0.4490811 -0.50233907
## borreliosis         0.235125850  0.55596715 -0.1064238 -0.4271221 -0.03234239
## poliomyelitis-like  0.001646102 -0.01967174 -0.2237641 -0.5771344  0.25791848
##                           V286       V287         V288       V289      V290
## equines            -0.26749176 0.36345056 -0.003188965 0.09551277 0.2102274
## mers001             0.02946636 0.09111407  0.195496920 0.08610142 0.1340841
## hku11              -0.09183521 0.20711760 -0.211467770 0.34648350 0.3594854
## b-cov              -0.09268853 0.28624195 -0.377546600 0.36341962 0.3844398
## borreliosis        -0.07472138 0.04783938  0.383256500 0.23114732 0.6568224
## poliomyelitis-like -0.18475701 0.17587167  0.386728940 0.30680250 0.4802814
##                           V291         V292        V293        V294        V295
## equines             0.10861573  0.065212056  0.17184897 -0.19625726  0.06891499
## mers001             0.15701516  0.002641898  0.20186493  0.11583256  0.04923562
## hku11               0.22847338  0.347385760 -0.19294739 -0.06360266 -0.25614417
## b-cov               0.21003467  0.423407730 -0.05406551  0.27895343  0.03179154
## borreliosis        -0.05216979 -0.049621146  0.35023484  0.12712288 -0.21269230
## poliomyelitis-like -0.11619792  0.117667910  0.66434760  0.08907303 -0.09120734
##                            V296        V297       V298        V299        V300
## equines            -0.270936370 -0.04311628  0.1599648 -0.44186664  0.05629280
## mers001            -0.320693520 -0.13031584 -0.1631802  0.04581932  0.02421508
## hku11              -0.555198800 -0.14952376 -0.6188920 -0.38361973 -0.25625452
## b-cov              -0.446959400 -0.13856791 -0.6524845 -0.64554050 -0.08319721
## borreliosis         0.002503621  0.11319777 -0.4374845 -0.16431290  0.16926205
## poliomyelitis-like -0.126203370  0.38335085 -0.1145345  0.07980620  0.12920211

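Each row is the vector for one term, and each column (V1 to V300) is one dimension of its meaning. We do not know what any single dimension stands for on its own. Remember, these vectors were extracted from academic articles. If we project them onto a few principal components with PCA, they may present a clearer picture.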
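
A minimal sketch of that reduction follows. The object name `emb` and its random values are assumptions made only for illustration; with the real embedding matrix you would pass your own object to `prcomp()` instead.

```r
# Toy stand-in for the embedding matrix: 6 terms x 300 dimensions.
# `emb` and its random contents are hypothetical, not the real embeddings.
set.seed(42)
terms <- c("equines", "mers001", "hku11", "b-cov",
           "borreliosis", "poliomyelitis-like")
emb <- matrix(rnorm(length(terms) * 300),
              nrow = length(terms),
              dimnames = list(terms, paste0("V", 1:300)))

# Run PCA on the rows (terms); each dimension is centred first.
pca <- prcomp(emb, center = TRUE, scale. = FALSE)

# Keep the first two principal components as 2D coordinates for each term.
coords <- data.frame(term = rownames(emb),
                     PC1  = pca$x[, 1],
                     PC2  = pca$x[, 2])

# Share of the total variance carried by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

print(coords)
print(round(var_explained, 3))
```

With only six terms there can be at most six components, so `var_explained` has six entries. Passing `coords` to `ggplot()` with `PC1` and `PC2` on the axes, as in the earlier plots, would place terms with similar vectors close to each other.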

For a fancier visual, we will use TensorFlow's Embedding Projector. We have already done it for you:

http://covid19embedding.surge.sh/
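
If you want to load your own vectors into the projector, it reads tab-separated files: one with the vectors and one with the labels. The export steps used for the page above are not shown here, so the following is only a sketch under assumptions: the matrix `emb`, its random values, and the file names are hypothetical.

```r
# Hypothetical small embedding matrix: 3 terms x 5 dimensions.
set.seed(1)
terms <- c("equines", "mers001", "hku11")
emb <- matrix(rnorm(length(terms) * 5),
              nrow = length(terms),
              dimnames = list(terms, NULL))

# Write into a temporary folder; the file names are our own choice.
out_dir   <- tempdir()
vec_file  <- file.path(out_dir, "vectors.tsv")
meta_file <- file.path(out_dir, "metadata.tsv")

# Vectors: one row per term, values separated by tabs, no header, no row names.
write.table(emb, vec_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Labels: one term per line, in the same order as the vectors.
writeLines(rownames(emb), meta_file)
```

Both files can then be uploaded through the projector's load dialog. The row order matters, because the projector pairs the n-th label with the n-th vector.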