INF556 -- Topological Data Analysis (2024-25)
Steve Oudot
General Introduction
Topological Data Analysis (TDA) is an
emerging trend in exploratory data analysis and data mining. It
has attracted growing interest and achieved some notable successes (such as
the identification of a new type of breast cancer, or the
classification of NBA players) in recent years. Indeed, with
the explosion in the amount and variety of available data,
identifying, extracting and exploiting their underlying
structure has become a problem of fundamental importance. Many
such data come in the form of point clouds, sitting in
potentially high-dimensional spaces, yet concentrated around
low-dimensional geometric structures that need to be
uncovered. The non-trivial topology of these structures is
challenging for classical exploration techniques such as
dimensionality reduction. The goal is therefore to develop novel
methods that can reliably capture geometric or topological
information (connectivity, loops, holes, curvature, etc.) from
the data without the need for an explicit mapping to
a lower-dimensional space. The objective of this course is to
familiarize the students with these new methods, lying at the
interface between pure mathematics, applied mathematics, and
computer science.
News
- The course starts on September 27th, at 8:30am, in Amphi Curie. Please bring pens and paper to take notes during the lecture, and your laptop for the lab.
Practical Aspects
Where and when:
- lectures: Fridays 8:30 - 10:30, Amphi Curie
- practical sessions: Fridays 10:45 - 12:45, Amphi Curie
Coding language:
- You can use whichever programming language (e.g. Java, C++, Python) and whichever IDE you want (e.g. Emacs, Eclipse, VSCode). Note, however, that if you choose a language the teachers are not familiar with, you risk receiving little help with your code from them. Also, please install the required compiler or interpreter and libraries for your chosen language, as well as your chosen IDE, before you come to the first lab!
Course grading:
- Midterm (graded TD) + Final (written exam)
Resources
Course notes and slides:
- the official slides and notes (handwritten) for each lecture, provided in the schedule below
- some unofficial lecture notes written by Théo Lacombe (contain some mistakes, use at your own risk)
- the course notes from an introductory course given to the professors of classes préparatoires in 2024, available here (in French, look for the links to the pdf files after each abstract).
Videos:
- the official videos for each lecture, provided in the schedule below. These videos were made during the COVID-19 pandemic, so they essentially consist of a voice-over on the notes and slides. From experience, they are good enough.
- the videos of the introductory course given to the professors of classes préparatoires in 2024, available here (in French, look for the videos from year 2024)
Books:
- H. Edelsbrunner, J. Harer. Computational Topology: An Introduction. AMS Press, 2009. A good introduction to applied topology, including TDA. Well-suited for this course. This book is not available at the library; however, it was compiled from the following course notes, which you can download instead.
- S. Oudot. Persistence Theory: From Quiver Representations to Data Analysis. AMS Surveys and Monographs, Vol. 209, 2015. A comprehensive treatment of persistence theory, perhaps too advanced for this course, but in principle you should be able to read it by the end of the course! Five printed copies are available at the library, otherwise you can download a pdf version here.
- James R. Munkres. Elements of Algebraic Topology. Perseus, 1984. A general introduction to algebraic topology, which you can consult (one printed copy is available at the library). Especially relevant to the course is its first chapter on homology theory.
- Trevor Hastie, Robert Tibshirani and Jerome Friedman. The Elements of Statistical Learning (2nd edition). Springer-Verlag, 2009. An excellent reference for the ML aspects of the course. You can download a pdf version of the book here.
Schedule
Session 1: Clustering (video file) | Slides course introduction, Slides degree-0 persistence, Notes degree-0 persistence, Slides clustering, Notes mode-seeking, ToMATo's webpage | TD 1 | Sept. 27
Session 2: Homology I (video file) | Notes homology, book homology | PC 2-3, solution | Oct. 4
Session 3: Homology II (video file) | | | Oct. 11
Session 4: Persistence I (video file) | Notes persistence, Slides persistence, book persistence 1, book persistence 2 | TD 4 | Oct. 18
Session 5: Persistence II (video file) / Topological Inference (video file) | Notes inference, Slides inference | TD 5 | Oct. 25
Session 6: Topological descriptors for geometric data (video file) | Notes descriptors, Slides descriptors, Notes on stability | PC 6 | Nov. 8
Session 7: Learning with topological descriptors (video file) | Slides learning | TD 7, solution | Nov. 15
Session 8: Statistics with topological descriptors (video file) | Slides statistics | TD 8 | Nov. 22
Session 9: Reeb graphs and Mapper | Notes Reeb and Mapper, Slides Reeb and Mapper | PC 9 | Nov. 29
Final exam (Dec. 19, 9:00am - 12:00pm)
Last update: Sept. 23 2024.