INF556 -- Topological Data Analysis (2020-21)
Topological Data Analysis (TDA) is an
emerging trend in exploratory data analysis and data mining. It
has attracted growing interest and achieved some notable successes
(such as the identification of a new type of breast cancer, or the
classification of NBA players) in recent years. Indeed, with
the explosion in the amount and variety of available data,
identifying, extracting and exploiting their underlying
structure has become a problem of fundamental importance. Many
such data come in the form of point clouds, sitting in
potentially high-dimensional spaces, yet concentrated around
low-dimensional geometric structures that need to be
uncovered. The non-trivial topology of these structures is
challenging for classical exploration techniques such as
dimensionality reduction. The goal is therefore to develop novel
methods that can reliably capture geometric or topological
information (connectivity, loops, holes, curvature, etc.) from
the data without the need for an explicit mapping to a
lower-dimensional space. The objective of this course is to
familiarize the students with these new methods, lying at the
interface between pure mathematics, applied mathematics, and
- [Sept. 16 2020] Due to the COVID-19 pandemic, the course will run in degraded mode this academic year, and it is reorganized as follows:
- the entire course takes place online only, on Zoom (links to the various sessions will be provided in due time on the Moodle page);
- lectures are turned into Q&A sessions, and you are expected to get acquainted with the corresponding material (course notes, slides, videos) beforehand (links to this material are provided below; to watch the videos you will be asked for your Polytechnique LDAP credentials on enseignement.medias.polytechnique.fr);
- exercise sessions (sessions #3, #4, #8 and #9) will take place on Zoom and we will use a shared board for interactions;
- for lab sessions you will be on your own: you can do them however and whenever you like. I will be available, either synchronously or asynchronously, via the Slack channel set up for the course (for which you should have received an invitation e-mail);
- there will be no graded TD; only the final exam will be graded.
Where and when:
- lectures: Fridays 8:30 - 10:30, on Zoom
- exercise sessions: Fridays 10:45 - 12:45, on Zoom
- lab sessions: on your own
Before you start the first lab session, you are advised to:
- Install a Java compiler like the Java Development Kit, or a C++ compiler like gcc, or a Python interpreter. You can use whatever IDE you prefer, e.g. Eclipse or Code::Blocks.
- Make sure your laptop is equipped with R as well. See the Set Up section on the first TD page for more details on how to install the software and configure your environment.
- Get familiar with the R language a bit. For this I recommend the following tutorial; you can restrict yourself to the "R introduction" section (introductory page + pages on basic data types).
- Final (written exam, 3 hours)
- H. Edelsbrunner, J. Harer. Computational Topology: An Introduction. AMS Press, 2009. A good introduction to applied topology, including TDA. Well-suited for this course. This book is not available at the library; however, it was compiled from the following course notes, which you can download instead.
- S. Oudot. Persistence Theory: From Quiver Representations to Data Analysis. AMS Surveys and Monographs, Vol. 209, 2015. A comprehensive treatment of persistence theory, perhaps too advanced for this course, but in principle you should be able to read it by the end of the course! Five printed copies are available at the library, otherwise you can download a pdf version here.
- James R. Munkres. Elements of Algebraic Topology. Perseus, 1984. A general introduction to algebraic topology, which you can consult (especially its first chapter) for more background on homological algebra. One printed copy is available at the library.
- Trevor Hastie, Robert Tibshirani and Jerome Friedman. The Elements of Statistical Learning (2nd edition). Springer-Verlag, 2009. An excellent reference in learning. Chapter 14 covers the material on dimensionality reduction addressed during the first session. You can download a pdf version of the book here.
|Course introduction (video file) |
|Session 1: Dimensionality Reduction (video file)
Notes, TeXified notes, slides
|| TD 1
||Sept. 25 2020
|Session 2: Clustering (video file)
Slides intro clustering, Slides mode-seeking
Notes mode-seeking, Notes degree-0 persistence
| TD 2
||Oct. 2 2020
|Session 3: Homology I (video file)
Notes homology, TeXified notes homology,
|| PC 3-4, solution
||Oct. 9 2020
|Session 4: Homology II (video file)
||Oct. 16 2020
|Session 5: Persistence I (video file)
book persistence 1,
book persistence 2
| TD 5
||Oct. 23 2020
|Session 6: Persistence II (video file) / Topological Inference (video file)
Notes inference, Slides inference
|| TD 6
||Nov. 6 2020
|Session 7: Topological descriptors for geometric data
Notes descriptors, Slides descriptors, Notes on stability
|| TD 7
||Nov. 13 2020
| Session 8: Learning with topological descriptors
|| PC 8
||Nov. 20 2020
|Session 9: Statistics with topological descriptors (video file)
|| PC 9
||Dec. 11 2020
Final exam: submit your work as a single pdf file by Dec. 11 (submission link)
||Text of the exam
Feel free to come and talk to us if you are looking for an internship in TDA.
Last update: Sept. 16 2020.