3A Parcours d'Approfondissement Bioinformatique - PA Bioinformatics

2024-25 / X22


Slides of spring 2024
Old slides of September 12th, 2023

Livret PA Bioinformatique [PDF]

Contact: sebastian.will@polytechnique.edu

In the last 50 years, Computer Science has become a major component in many areas of research and engineering in Biology.

New topics have emerged that would not exist without Computer Science, simply because of the mass of data to be processed. Examples include genome sequencing, and hence comparative genomics; the classification and annotation of proteins and RNAs; and the prediction and engineering of new molecular structures, which open new perspectives in Biology, Pharmacology and Medicine along with the new challenges of Genomic Medicine. Conversely, the questions raised by Biology and the specific features of biological data give rise to new problems in Algorithmics, Combinatorics and Machine Learning.

While high-throughput sequencing now reaches spectacular output rates, some analyses remain long, delicate and costly. Both "omics" and clinical data may also remain scarce for some pathologies, and exceptional cases are more often the rule than the norm. The systematic use of massive machine learning then reaches its limits. Finally, Biologists and Medical doctors expect explanations that deepen their understanding of the underlying phenomena, not just the results of a statistical optimization.

Modelling, Algorithmics and Data Science are therefore becoming increasingly important. We focus on building mathematical models that are efficiently computable, in order to predict but also explain the biological phenomena that occur at the level of the cell and its components, at the level of groups of cells, and at the level of complex organisms. This often means designing dedicated tools for very specific phenomena.

However, Information Technologies alone cannot solve these problems, and broad knowledge of Biology remains essential to contribute to the Life Sciences both upstream and downstream. Bioinformatics, at the interface between Biology and Computer Science, is thus a very diverse and rapidly evolving science: its questions are renewed as fast as discoveries in Biology and progress in the tools for observing the living world.

Neuroscience is an even broader field. The skills acquired in the Bioinformatics PA also provide a good introduction to Neuroscience, before specializing in this track during the "fourth" year.
Most of the Biology courses are fairly directly related to the study of the nervous system. Several Computer Science courses apply naturally to the analysis of neuroscience data, while others apply to algorithms for simulating neural circuits.

Objectives

The third-year Bioinformatics program introduces the concepts and issues of this discipline through a few general courses in Biology and Computer Science and a few other modules that can form a specialization. The aim is to give students the dual competence that will enable them to interact at the highest level with Computer Scientists, Biologists and Medical doctors, within the multidisciplinary teams that now structure research and development in this area. A relatively wide choice of courses, either theoretical or applied, makes it possible to build an individualized curriculum, to complement one's own skills on a case-by-case basis, and to prepare a career project that matches one's interests.

This specialization continues during the "fourth" year, before moving towards either an academic or a corporate career. In the latter case too, most often in international companies, a PhD is the recommended step for any high-level career.

Program composition rules

During the first two terms, the emphasis is on consolidating knowledge; putting that knowledge to cross-cutting use is mainly developed during the Research Internship. A "long" project, spanning the first two terms, is another way to approach such a multidisciplinary exercise. The composition rules ensure a balance between Biology and Computer Science, with possible additions in Statistics, so that students acquire the broad skills that characterize a Bioinformatician. These rules also aim to avoid too much dispersal, while allowing everyone to strengthen their weak points and to go deeper into the field with an early career in view.

Prerequisites

The general prerequisites for this program are to have validated, during the second year:

When registering, it is also necessary to check the specific requirements of each course, as indicated on "moodle.polytechnique.fr".
In particular, it is advisable to have taken, in the second year, the course APM_41033_EP (MAP433) « Statistiques » (Statistics), especially if one includes MAP courses in the third-year curriculum. The course APM_41032_EP (MAP432) « Modélisation de phénomènes aléatoires » (Modelling of random phenomena) may also be useful.

Opportunities

After this third-year program, it is of course possible to continue a curriculum in Bioinformatics. It is also possible to reorient towards Computer Science or Biology while still benefiting from this initial dual training in Bioinformatics. Continuing in a single-discipline curriculum assumes, of course, that this discipline has remained a major part of the 2A and 3A curriculum, which supports such a choice.
Examples of programs that are relatively balanced between the two disciplines include:

In France, one can also consider an M2 year such as « AMI2B » at Paris-Saclay, the BIM-BMC « BioInformatique et Modélisation » curriculum at Sorbonne Université, and others in Bordeaux, Aix-Marseille, Toulouse...

The duration of such curricula varies from one year for an M2 in France or some MSc programs to two years for more comprehensive programs. They generally offer a research track that prepares for a subsequent PhD.
Opportunities in academic research, after a PhD, are found in university laboratories or in major research centers, either in France (INRA, Institut Pasteur, Institut Curie, Inserm, ...) or abroad (e.g. EBI, SIB, NCBI, ...). Activities may include the design of analysis or prediction algorithms, but also the development, curation and enrichment of large community databases, as well as the engineering of the platforms that host them.

Business opportunities, preferably after a PhD, may be in research and development in the pharmaceutical and agri-food industries, agrochemicals, health engineering, the environment, or biotechnology.
There are also opportunities in the management of technological innovation.

Choice rules

Global consistency rule:
In total, over the two terms and the 8 required modules, one must take at least:

In this count, MDC_52P88_EP (BIO/INF588) can be counted as either BIO or INF, and a "long" project counts as 2 modules.

Project(s):
Taking a long project covering both periods is encouraged. It is the best opportunity to develop cross-cutting skills before the 3A internship. It can be done within the 8 mandatory modules, in place of the EA of each term.
When a long project is taken as an optional (additional) course, it must be completed in full, which yields an additional grade for each period.
Note that several Computer Science courses, especially those marked EA, require a significant programming project that counts towards validation. Everyone should assess the workload when choosing courses.

Blending rules:
A mix is allowed, limited to one module per period, and it cannot be used to take a long project other than BIO511, BIO572 or INF511.
A mix is only accepted if it is part of a coherent and well-motivated curriculum, clearly explained at registration.

Program composition

Term 1 (Fall)
3 courses to choose among:

1 EA (or project) to choose among:
P1 schedule grid: likely slots for P1 courses (the standard format is a 2h lecture followed by 2h of classes)

Terms 1 and 2
The long project INF511 replaces the EAs of both period 1 and period 2, or it may be taken as two optional modules, one per period. BIO511 and BIO572 replace only a single EA, although they span P1 and P2.
1 long project to choose among:

Approval of a long project requires a subject to be defined beforehand, at the latest by the start-up meeting in September. This is the case for BIO511. For BIO572 and INF511, it is advisable to start the process as soon as you register.

Term 2 (Winter)
1 mandatory course:

2 courses to choose among:

1 EA (or project) to choose among:
P2 schedule grid: likely slots for P2 courses (the MAP format is a 2h lecture followed by 2h of classes in one of two alternative group slots)

Term 3
Research Internship to choose among:


For any questions or comments, email: Sebastian.Will@polytechnique.edu