Overview
Welcome to DH 607 - Introduction to computational multi-omics! The course takes a deep dive into technologies developed over the last couple of decades to understand what goes on inside a cell.
We will take a first-principles approach to understanding biology and the technological advances that make it possible to ‘measure’ biological processes quantitatively.
The course will build the foundational knowledge and skills needed to explore and analyze complex (and real!) biological data. We will work through multiple vignettes of biological questions and see how probability, statistics, and computer science provide us with enlightening answers.
The course is neither a course in biology nor a course in statistics, but somewhere in between.
One of the key objectives of (modern) biology is to understand how different molecules shape the function of cells, tissues, and organs. The human body has roughly 36 trillion (\(36 \times 10^{12}\)) cells. Nearly all of these cells carry the same DNA, yet they perform different functions.
If we had to draw a functional map of the human body, we would need to measure what goes on inside each cell. Over the years, significant technological advances have made it possible to ‘profile’ the various molecules in each cell (DNA, RNA, and proteins). This data is high-dimensional. Scales vary, but a typical modern dataset can have thousands of features (DNA sequence, gene expression, proteins) measured over millions of rows (observations). Before any biological insight can be drawn, this high-dimensional data requires ‘processing’. The course will focus on the mathematical and statistical methods required to analyze a modern genomics/transcriptomics dataset.
A detailed list of topics will be available here. Broadly, you will learn about:
- What sequencing is
- Algorithms for aligning DNA sequences and searching DNA databases
- Statistical methods to discover genome variation, and their application to discovering the etiology of disease
- Probability and statistics for sequence analysis
- How gene expression is quantified computationally
- Statistical models for analyzing gene expression data
- Linear and non-linear dimensionality reduction methods and their applications in multi-omics
- Statistical models for identifying transcription factor binding
- Hidden Markov models and their applications in multi-omics
- How disease-causing mutations are identified (genome-wide association studies)
- Statistical modeling of single-cell multi-omics data
- Statistical models for analyzing CRISPR screens
- Recent advances in statistical methods and deep learning applications in multi-omics for human diseases
By the end of the course, you will be able to:
- Perform exploratory data analysis and visualize genomic data
- Apply tailored statistical methods to answer questions using high-dimensional biological data
- “Get your hands dirty” by analyzing genomics data to draw actionable insights for improving human health
- Write production-level, reproducible, reusable code and software packages
The course will be evaluated based on the following components:
- Assignments: 24% (best 8 out of 9)
  - Due every Friday at 5pm via Gradescope
  - Weightage: 3% each (the best 8 will be considered for grading)
  - Late submission policy: 10% penalty per day, up to a maximum of 6 days
- Surprise Quizzes: 6%
- Mid-sem: 25%
  - Closed book and offline
- Course Project: 20%
- End-sem: 25%
  - Closed book and offline
For assignment problems, you should work on your own. If you get stuck, you are welcome to discuss the problem with other students (in person, or online on Piazza). However, the solutions must be your own work. If you discussed a problem with someone, please mention their name and what you received help with in your submission.
Mid-semester and end-semester exams will be closed-book. No collaboration is allowed.
You are allowed to use Large Language Models (LLMs) such as ChatGPT and Claude as learning aids, but you must:
- Clearly document when and how you used an LLM in your submission
- Ensure you understand the solutions provided by the LLM
- Be prepared to explain your work during office hours or exams
- Not rely solely on LLM-generated code without understanding it
For exams, LLMs will not be permitted.
Saket is an Assistant Professor at the Koita Centre for Digital Health at IIT Bombay. His lab focuses on developing statistical models for analyzing multi-omics data. Saket obtained his B.Tech + M.Tech in Chemical Engineering at IIT Bombay in 2014. He pursued his Ph.D. in Computational Biology and Bioinformatics at the University of Southern California, developing computational methods for understanding how proteins are synthesized in the body. Saket Lab develops novel statistical and computational methods to answer fundamental questions in disease biology and public health.
Units: 6
Lecture: Mondays and Thursdays, 3:30pm – 4:55pm.
Location: LC TBA, TBA Floor, Lecture Hall Complex, L1 Building, Opp. KReSIT Bldg., between the Physics & MEMS Depts. GMaps coordinates
Instructor: Saket Choudhary | Homepage | Blog
Office: B-22, KCDH, KReSIT Basement
Office Hours: Wednesdays, 4:00pm – 5:00pm, or by appointment
For appointments outside office hours: https://cal.com/saketkc/
Contact: saketc@iitb.ac.in | Ext: 3785 (+91 22 2159 3785)
Head Teaching Assistant:
- Shubham Thakur
  - Contact: shubham.thakur@iitb.ac.in
  - Office Hours: Mondays, 2:00pm – 3:30pm, B-20 ASL Lab, KReSIT Basement
Graders:
- Souparna Bhowmik
  - Contact: 25d1623@iitb.ac.in
- Gaurav Devendra Jain
  - Contact: 210040050@iitb.ac.in