DH 607 - Introduction to computational multi-omics

Welcome to DH 607 - Introduction to computational multi-omics! The course dives deep into technologies developed over the last couple of decades to understand what goes on inside a cell.

About the course

We will take a first-principles approach to understanding biology and the technological advances that make it possible to ‘measure’ biological processes quantitatively.

The course will build the foundational knowledge and skills necessary to explore and analyze complex (and real!) biological data. We will work through multiple vignettes of biological questions and see how probability, statistics, and computer science provide us with enlightening answers.

The course is neither a course in biology nor a course in statistics but somewhere in between.

What will you learn?

One of the key objectives of (modern) biology is to understand how different molecules shape the functionality of a cell, tissue, or organ. The human body has roughly 36 trillion (\(36 \times 10^{12}\)) cells. The cells in our body typically carry the same DNA, yet they perform different functions.

If we were to draw a functional map of the human body, we would need to measure what goes on inside each cell. Over the years, significant technological advances have made it possible to ‘profile’ the various molecules (DNA/RNA/protein) in each cell. The resulting data is high-dimensional: scales vary, but a typical modern dataset can have thousands of features (DNA sequence/gene expression/proteins) measured over millions of rows (observations). Drawing any biological insight from such data requires careful ‘processing’. The course will focus on the mathematical and statistical methods required to analyze a modern-day genomics/transcriptomics dataset.

A detailed list of topics will be available here. Broadly, you will learn about:

  • What sequencing is
  • Algorithms for aligning DNA sequences and searching DNA databases
  • Statistical methods to discover genome variation, and their application to discovering the etiology of disease
  • Probability and statistics for sequence analysis
  • How gene expression is quantified computationally
  • Statistical models for analyzing gene expression data
  • Linear and non-linear dimensionality reduction methods and their applications in multi-omics
  • Statistical models for identifying transcription factor binding
  • Hidden Markov models and their applications in multi-omics
  • How disease-causing mutations are identified (genome-wide association studies)
  • Statistical modeling of single-cell multi-omics data
  • Statistical models for CRISPR screens
  • Recent advances in statistical methods and deep learning applications in multi-omics for human diseases

Learning objectives

  • Perform exploratory data analysis and visualize genomic data
  • Apply tailored statistical methods to answer questions using high-dimensional biological data
  • “Get your hands dirty” by analyzing genomics data to draw actionable insights for improving human health
  • Write production-level, reproducible, reusable code and software packages

Evaluation

The course will be evaluated based on the following components:

  • Assignments: 45% (best 10 out of 12)
    • Due every Friday at 6:00 PM via Gradescope
    • Weight: 4.5% each
    • Late submission policy: 10% penalty per day
  • Mid-semester exam: 25%
    • Closed-book and offline
  • Course project: 30%
    • Proposal: 5%
    • Poster presentation: 12.5%
    • Report: 12.5%
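
The weights above add up to 100% (45 + 25 + 30). As an illustration only, the arithmetic can be sketched in Python as below. The function names and the 0-to-1 score scale are hypothetical, and the syllabus does not say whether the 10% late penalty is taken from full marks or from the marks earned; this sketch assumes it is deducted from full marks and that a score cannot go below zero.

```python
# Hypothetical sketch of the grade arithmetic described in the Evaluation section.
# All component scores are fractions in [0, 1]; the result is out of 100.
# ASSUMPTION: the late penalty removes 10% of full marks per day late.

ASSIGNMENT_WEIGHT = 4.5   # percent per counted assignment
N_COUNTED = 10            # only the best 10 of 12 assignments count
MIDSEM_WEIGHT = 25.0
PROJECT_WEIGHTS = {"proposal": 5.0, "poster": 12.5, "report": 12.5}


def apply_late_penalty(score: float, days_late: int) -> float:
    """Deduct 10% of full marks per day late, never going below zero (assumed)."""
    return max(0.0, score - 0.10 * days_late)


def final_score(assignments, midsem: float, project: dict) -> float:
    """Return the final course score out of 100.

    assignments: list of (score, days_late) tuples, one per assignment
    midsem: mid-semester exam score in [0, 1]
    project: maps "proposal", "poster", "report" to scores in [0, 1]
    """
    adjusted = [apply_late_penalty(s, d) for s, d in assignments]
    best = sorted(adjusted, reverse=True)[:N_COUNTED]  # keep the best 10
    total = ASSIGNMENT_WEIGHT * sum(best)
    total += MIDSEM_WEIGHT * midsem
    total += sum(PROJECT_WEIGHTS[k] * project[k] for k in PROJECT_WEIGHTS)
    return total


if __name__ == "__main__":
    perfect_project = {"proposal": 1.0, "poster": 1.0, "report": 1.0}
    # Two skipped assignments still allow full marks, since only the best 10 count.
    scores = [(1.0, 0)] * 10 + [(0.0, 0)] * 2
    print(final_score(scores, 1.0, perfect_project))  # → 100.0
```

Dropping the two lowest assignment scores before weighting is what makes "best 10 out of 12" equivalent to 10 assignments at 4.5% each.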

Collaboration policy

For assignment problems, you should work on your own. If you get stuck, you are welcome to discuss the problem with other students (in person, or online on Piazza). However, the solutions must be your own work. If you discussed a problem with someone, please mention their name and what you received help with in your submission.

The mid-semester exam will be closed-book. No collaboration is allowed.

There is no end-semester exam; instead, there is a group course project. Each group can have at most 3 students. More details will be announced later.

About the instructor

Saket is an Assistant Professor at the Koita Centre for Digital Health at IIT Bombay. His lab focuses on developing statistical models for analyzing multi-omics data. Saket obtained his B.Tech + M.Tech in Chemical Engineering from IIT Bombay in 2014. He pursued his Ph.D. in Computational Biology and Bioinformatics at the University of Southern California, developing computational methods for understanding how proteins are synthesized in the body. Until very recently, he was a postdoc at the New York Genome Center, developing statistical methods for analyzing single-cell genomics data.

Course information

Units: 6

Lecture: Wednesdays and Fridays, 11:05 AM - 12:30 PM.

Location: LC 302, 3rd Floor Lecture Hall Complex, L1 Building, Opp. KReSIT Bldg., Between Physics & MEMS Dept.

Instructor: Saket Choudhary | Homepage | Blog

Office: G-22, KCDH, KReSIT Basement

Office Hours: Wednesdays, 4:00 PM - 6:00 PM, or by appointment

For appointments outside office hours: https://cal.com/saketkc/

Contact: saketc@iitb.ac.in | Ext: 3785 (+91 22 2159 3785)

Teaching Assistants:

  • Chetan Patil | Email | Office Hours: Mondays 2:00 PM - 3:00 PM (KCDH Office)
  • Neegar Naushaba Iqbal | Email | Office Hours: Mondays 4:00 PM - 5:00 PM (KCDH Office)
  • Devendra Singh | Email | Office Hours: Mondays 5:00 PM - 6:00 PM (KCDH Office)
  • Shobit Aggarwal | Email | Office Hours: Fridays 12:30 PM - 1:30 PM (KCDH Office)
  • Gautham Venugopal | Email | Office Hours: Tuesdays 4:00 PM - 6:00 PM (KCDH Office)