INF556 - Topological Data Analysis (2020-2021)
Steve Oudot
General Introduction
Topological Data Analysis (TDA) is an
emerging trend in exploratory data analysis and data mining. It
has attracted growing interest and achieved some notable successes (such as
the identification of a new type of breast cancer, or the
classification of NBA players) in recent years. Indeed, with
the explosion in the amount and variety of available data,
identifying, extracting and exploiting their underlying
structure has become a problem of fundamental importance. Many
such data come in the form of point clouds, sitting in
potentially high-dimensional spaces, yet concentrated around
low-dimensional geometric structures that need to be
uncovered. The nontrivial topology of these structures is
challenging for classical exploration techniques such as
dimensionality reduction. The goal is therefore to develop novel
methods that can reliably capture geometric or topological
information (connectivity, loops, holes, curvature, etc.) from
the data without the need for an explicit mapping to a
lower-dimensional space. The objective of this course is to
familiarize the students with these new methods, lying at the
interface between pure mathematics, applied mathematics, and
computer science.
News
 [Sept. 16 2020] Due to the COVID-19 pandemic, the course will run in degraded mode this academic year and is reorganized as follows:
 the entire course takes place online only, on Zoom (links to the various sessions will be provided in due time on the Moodle page);
 lectures are turned into Q&A sessions, and you are expected to get acquainted with the corresponding material (course notes, slides, videos) beforehand (links to this material are provided below; to watch the videos, you will be asked for your Polytechnique LDAP credentials on enseignement.medias.polytechnique.fr);
 exercise sessions (sessions #3, #4, #8 and #9) will take place on Zoom and we will use a shared board for interactions;
 for lab sessions you will be on your own: you can do them however and whenever you like; I will be available, either synchronously or asynchronously, via the Slack channel set up for the course (for which you should have received an invitation email);
 there will be no graded TD, only the final exam will be graded.
Practical Aspects
Where and when:
 lectures: Fridays 8:30-10:30, on Zoom
 exercise sessions: Fridays 10:45-12:45, on Zoom
 lab sessions: on your own
Before you start the first lab session, you are advised to:
 Install a Java compiler (e.g. the one shipped with the Java Development Kit), a C++ compiler such as gcc, or a Python interpreter. You can use whatever IDE you prefer, e.g. Eclipse or Code::Blocks.
 Make sure your laptop is equipped with R as well. See the Set Up section on the first TD page for more details on how to install the software and configure your environment.
 Get somewhat familiar with the R language. For this, I recommend the following tutorial; you can restrict yourself to the "R introduction" section (introductory page + pages on basic data types).
Course grading:
 Final (written exam, 3 hours)
Documents
Books:
 H. Edelsbrunner, J. Harer. Computational Topology: An Introduction. AMS Press, 2009. A good introduction to applied topology, including TDA. Well-suited for this course. This book is not available at the library; however, it was compiled from the following course notes, which you can download instead.
 S. Oudot. Persistence Theory: From Quiver Representations to Data Analysis. AMS Mathematical Surveys and Monographs, Vol. 209, 2015. A comprehensive treatment of persistence theory, perhaps too advanced for this course, but in principle you should be able to read it by the end of the course! Five printed copies are available at the library; otherwise, you can download a PDF version here.
 James R. Munkres. Elements of Algebraic Topology. Perseus, 1984. A general introduction to algebraic topology, which you can consult (especially its first chapter) for more background on homological algebra. One printed copy is available at the library.
 Trevor Hastie, Robert Tibshirani and Jerome Friedman. The Elements of Statistical Learning (2nd edition). Springer-Verlag, 2009. An excellent reference on statistical learning. Chapter 14 covers the material on dimensionality reduction addressed during the first session. You can download a PDF version of the book here.
Schedule
Course introduction (video file) 
Session 1: Dimensionality Reduction (video file) 
Notes, TeXified notes, slides 
TD 1

Sept. 25 2020 
Session 2: Clustering (video file) 
Slides intro clustering, Slides mode-seeking
Notes mode-seeking, Notes degree-0 persistence
ToMATo's webpage

TD 2

Oct. 2 2020 
Session 3: Homology I (video file) 
Notes homology, TeXified notes homology, book homology

PC 3-4, solution

Oct. 9 2020 
Session 4: Homology II (video file) 
Oct. 16 2020 
Session 5: Persistence I (video file) 
Notes persistence, Slides persistence, book persistence 1, book persistence 2

TD 5

Oct. 23 2020 
Session 6: Persistence II (video file) / Topological Inference (video file) 
Notes inference, Slides inference

TD 6

Nov. 6 2020 
Session 7: Topological descriptors for geometric data (video file)
Notes descriptors, Slides descriptors, Notes on stability

TD 7

Nov. 13 2020 
Session 8: Learning with topological descriptors (video file)
Slides learning

PC 8

Nov. 20 2020 
Session 9: Statistics with topological descriptors (video file) 
Slides statistics

PC 9 
Dec. 11 2020 
Final exam: submit your work as a single PDF file by Dec. 11 (submission link)

Text of the exam 
Internships
Feel free to come and discuss with us if you are looking for an internship in TDA.
Last update: Sept. 16 2020.