The goal of this lab session is to test the linear and non-linear dimensionality reduction methods that we saw during the lecture, and to assess their practical value and limitations.
Before you start the lab, please refer to this page to install and configure R. To run R, simply type R in a terminal. You can write and execute commands directly in the R environment, or in dedicated source files (whose names should have the .R extension). To run a source file file.R in R, simply type source('file.R') in the R environment. To edit and execute source files you can use Emacs together with the ESS extension. Alternatively, you can use a dedicated IDE such as RStudio. The choice of a particular IDE is always a matter of personal taste.
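As a small illustration of the workflow described above, the following sketch writes a two-line script to a temporary file and executes it with source. The file name hello.R and the variables x and m are placeholders chosen for this example only.

```r
# Write a tiny script to a temporary file (the file name is arbitrary).
script <- file.path(tempdir(), "hello.R")
writeLines(c(
  "x <- c(1, 2, 3, 4)",
  "m <- mean(x)"
), script)

# Execute the script in the current session; the variables it defines
# are available afterwards.
source(script)
print(m)
```

A script can also be run non-interactively from a terminal with Rscript file.R.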
We will need the following packages:
The R command to list the installed packages is library(). To use a package pkg, just type library(pkg). The package must be installed before it can be used. The command to install the package is install.packages("pkg", dependencies = TRUE). The installation itself requires Internet access. It downloads either source or binary files, depending on your OS and architecture. If source files are downloaded, you may need to have make and a C++ compiler like gcc installed to compile the package. Please refer to the dedicated webpage for more information, and note that some OSes like Linux conveniently allow you to install packages globally (i.e. for all users) through their own package manager.

Implement PCA in R using the diagonalization of the covariance matrix, as seen during the lecture. Your function PCA should take a point cloud and a boolean as input. The cloud of n points in d dimensions will be given as a data frame with n rows and d columns. The boolean indicates whether the data should be normalized, in addition to being centered, as a preprocessing step.
Your function should output a list whose first four cells contain respectively:
Here is our solution. If you decide not to implement PCA yourself, you should take the time to look at the solution carefully.
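Independently of the reference solution, the steps of covariance-based PCA can be sketched as follows. This is only a minimal illustration under stated assumptions: the element names (values, cumvar, coords, correl) and the choice of the four returned quantities are assumptions made for this sketch, and the built-in iris data set stands in for the lab data.

```r
# Sketch of PCA by diagonalization of the covariance matrix.
# 'cloud' is a data frame with n rows (points) and d numeric columns;
# 'normalize' tells whether each variable is also scaled to unit variance.
# The names of the returned elements are assumptions made for this sketch.
PCA <- function(cloud, normalize = FALSE) {
  # Center (and optionally normalize) the columns.
  X <- scale(as.matrix(cloud), center = TRUE, scale = normalize)
  # d x d covariance matrix and its eigendecomposition.
  eig <- eigen(cov(X), symmetric = TRUE)
  values <- pmax(eig$values, 0)        # clip tiny negative round-off
  # Coordinates of the points in the basis of eigenvectors.
  coords <- X %*% eig$vectors
  list(values = values,                          # spectrum
       cumvar = cumsum(values) / sum(values),    # cumulated variance share
       coords = coords,                          # new coordinates
       correl = cor(X, coords))                  # variables vs. new variables
}

# Usage on a stand-in data set (numeric columns of iris).
res <- PCA(iris[, 1:4], normalize = TRUE)
print(res$cumvar)
```

Note that a constant column cannot be normalized (its variance is zero), so such columns would have to be removed before calling the function with normalize = TRUE.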
Now you can test your function on the data sets decathlon and NBA. To read in the data you can use the function read.table, which reads the content of a data file and stores it in a data frame. To plot the results in 2d you can use the functions plot and text. Another interesting diagram to plot is the correlation circle, which shows the correlations between the input variables and the new variables. To draw a circle in the plane you can use the function draw.circle from the plotrix package. Here are the results obtained on the decathlon data set using our PCA function (from left to right: spectrum of the covariance matrix, cumulated variance, 2d embedding, correlation circle):
[Figures: spectrum of the covariance matrix, cumulated variance, 2d embedding, correlation circle]
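The 2d embedding and the correlation circle described above could be drawn along the following lines. This is a hedged sketch: the built-in iris data set stands in for decathlon, the axis labels PC1 and PC2 are chosen for this example, and the circle is drawn with the base-graphics function symbols instead of draw.circle so that the example runs without the plotrix package.

```r
# Stand-in data: numeric columns of the built-in iris data set,
# centered and normalized.
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = TRUE)
eig <- eigen(cov(X), symmetric = TRUE)
coords <- X %*% eig$vectors      # points expressed in the new basis
correl <- cor(X, coords)         # input variables vs. new variables

op <- par(mfrow = c(1, 2))

# 2d embedding: empty plot, then one text label per point.
plot(coords[, 1:2], type = "n", asp = 1,
     xlab = "PC1", ylab = "PC2", main = "2d embedding")
text(coords[, 1], coords[, 2], labels = seq_len(nrow(coords)), cex = 0.6)

# Correlation circle: unit circle plus one arrow per input variable.
plot(0, 0, type = "n", asp = 1,
     xlim = c(-1.1, 1.1), ylim = c(-1.1, 1.1),
     xlab = "PC1", ylab = "PC2", main = "Correlation circle")
symbols(0, 0, circles = 1, inches = FALSE, add = TRUE)
arrows(0, 0, correl[, 1], correl[, 2], length = 0.08)
text(correl[, 1], correl[, 2], labels = colnames(X), pos = 3, cex = 0.8)

par(op)
```

Setting asp = 1 keeps both axes on the same scale, so the circle is not displayed as an ellipse. Every arrow tip lies inside the unit circle, and a variable whose tip is close to the circle is well represented by the first two new variables.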
Implement MDS in the language R using the diagonalization of the Gram matrix, as seen during the lecture. Your function MDS should take the same input as PCA. It should output a list with three cells only, containing respectively:
You can now apply MDS to the data set decathlon, and you should obtain the same result as with PCA. You can also apply it to the data set COIL-100, and you should obtain the following result (from left to right: spectrum of the Gram matrix, 2d embedding, 3d embedding; to plot the result in 3d you can use the plot3d function from the rgl package):
[Figures: spectrum of the Gram matrix, 2d embedding, 3d embedding]
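For comparison with PCA, the Gram-matrix approach to MDS can be sketched as follows. This is a minimal illustration, not the reference solution: the three returned elements and their names (values, vectors, coords) are assumptions made for this sketch, and the built-in iris data set stands in for the lab data.

```r
# Sketch of MDS by diagonalization of the Gram matrix.
# Same input as PCA: a data frame with n rows and d numeric columns,
# and a boolean telling whether to normalize the variables.
# The names of the returned elements are assumptions made for this sketch.
MDS <- function(cloud, normalize = FALSE) {
  X <- scale(as.matrix(cloud), center = TRUE, scale = normalize)
  # n x n Gram matrix of the centered points.
  G <- X %*% t(X)
  eig <- eigen(G, symmetric = TRUE)
  values <- pmax(eig$values, 0)        # clip tiny negative round-off
  # Coordinates: each eigenvector scaled by the square root of its eigenvalue.
  coords <- sweep(eig$vectors, 2, sqrt(values), "*")
  list(values = values, vectors = eig$vectors, coords = coords)
}

# Usage on a stand-in data set (numeric columns of iris).
res <- MDS(iris[, 1:4])
print(head(res$values, 4))
plot(res$coords[, 1:2], asp = 1, xlab = "Axis 1", ylab = "Axis 2")
```

With this convention the non-zero eigenvalues of the Gram matrix are n - 1 times those of the covariance matrix, and the first d columns of coords match the PCA coordinates up to the sign of each axis, which is why both methods give the same embedding on decathlon. If the rgl package is installed, a 3d view can presumably be obtained with rgl::plot3d(res$coords[, 1:3]).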