About

Last updated: 2021-12-17

Checks: 2 passed, 0 failed

Knit directory: sct2_revision/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Repository version: 8afc486

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results on this page were generated with repository version 8afc486. The Past versions tab shows the history of changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    data/raw_data/
    Ignored:    data/rds_filtered/
    Ignored:    data/rds_raw/
    Ignored:    data/sampled_counts/
    Ignored:    output/snakemake_output/

Untracked files:
    Untracked:  code/02_run_seurat_noclip.R
    Untracked:  code/07AA_deseq2_muscat_simulate.R
    Untracked:  code/07A_muscat_simulate.R
    Untracked:  code/07A_simulate_muscat.R
    Untracked:  code/07BB_deseq2_muscat_process.R
    Untracked:  code/07B_muscat_process.R
    Untracked:  code/07B_process_muscat.R
    Untracked:  code/08_run_presto.R
    Untracked:  code/17A_HEK_SS3_dropseq.Rmd
    Untracked:  code/17A_HEK_SS3_dropseq_files/
    Untracked:  code/17C_HEK_Quartzeseq2_dropseq.Rmd
    Untracked:  code/17C_HEK_Quartzeseq2_dropseq_files/
    Untracked:  code/17_HEK_SS3_ChromiumV3.Rmd
    Untracked:  code/17_HEK_SS3_ChromiumV3.nb.html
    Untracked:  code/17_HEK_SS3_ChromiumV3_files/
    Untracked:  code/AA_process_muscat.R
    Untracked:  code/BB_process_muscat.R
    Untracked:  code/DD_simulate_muscat.R
    Untracked:  code/EE_simulate_muscat.R
    Untracked:  code/XX_process_muscat.R
    Untracked:  code/XX_simulate_muscat.R
    Untracked:  code/YY_simulate_muscat.R
    Untracked:  code/ZZ_simulate_muscat.R
    Untracked:  code/kang_muscat.R
    Untracked:  code/prep_sce.R
    Untracked:  code/prep_sce_ss3_dropseq.R
    Untracked:  data/azimuth_predictions/
    Untracked:  junk/
    Untracked:  mamba_update_changes.txt
    Untracked:  output/11C_VST/
    Untracked:  output/AAmuscat_simulated/
    Untracked:  output/BBmuscat_simulated/
    Untracked:  output/CCmuscat_simulated/
    Untracked:  output/CD4_NK_downsampling_DE.rds
    Untracked:  output/DDmuscat_simulated/
    Untracked:  output/EEmuscat_simulated/
    Untracked:  output/KANGmuscat_simulated/
    Untracked:  output/NK_downsampling/
    Untracked:  output/XXmuscat_simulated/
    Untracked:  output/YYmuscat_simulated/
    Untracked:  output/ZZmuscat_simulated/
    Untracked:  output/figures/
    Untracked:  output/kang_prepsce.rds
    Untracked:  output/muscat_simulated/
    Untracked:  output/muscat_simulation/
    Untracked:  output/seu_sct2_sim.rds
    Untracked:  output/simulation_HEK_QuartzSeq2_Dropseq_downsampling/
    Untracked:  output/simulation_HEK_SS3_ChromiumV3_downsampling/
    Untracked:  output/simulation_HEK_SS3_Dropseq_downsampling/
    Untracked:  output/simulation_HEK_downsampling/
    Untracked:  output/simulation_NK_downsampling/
    Untracked:  output/ss3_dropseq_prepsim.rds
    Untracked:  output/tables/
    Untracked:  output/vargenes/
    Untracked:  snakemake/.snakemake/
    Untracked:  snakemake/Snakefile_noclip.smk
    Untracked:  snakemake/Snakefile_presto.smk
    Untracked:  snakemake/cluster.yaml
    Untracked:  snakemake/install_glm.R
    Untracked:  snakemake/jobscript.sh
    Untracked:  snakemake/jobscript_ncells.sh
    Untracked:  snakemake/local_run_downsampling.sh
    Untracked:  snakemake/local_run_glm.sh
    Untracked:  snakemake/local_run_ncells.sh
    Untracked:  snakemake/local_run_noclip.sh
    Untracked:  snakemake/local_run_presto.sh
    Untracked:  snakemake/local_run_time.sh
    Untracked:  snakemake/run_glm.sh
    Untracked:  snakemake/run_ncells.sh
    Untracked:  snakemake/sct2_revision_env.yml
    Untracked:  temp_figures/

Unstaged changes:
    Deleted:    analysis/04_PBMC68k.Rmd
    Modified:   code/02_run_seurat.R
    Modified:   code/03_run_vst2_downsample.R
    Modified:   code/04_run_vst_ncells.R
    Modified:   code/06_run_sct.R
    Modified:   data/datasets.csv
    Modified:   snakemake/Snakefile_downsampling.smk
    Modified:   snakemake/Snakefile_glm_seurat.smk
    Modified:   snakemake/Snakefile_metacell.smk

Note that generated files (e.g. HTML, png, CSS) are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/about.Rmd) and HTML (docs/about.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
html	d736ec8	Saket Choudhary	2021-07-07	Build site.
html	400797a	Saket Choudhary	2021-07-06	workflowr::wflow_git_commit(all = TRUE)
Rmd	ccb0fb4	Saket Choudhary	2021-07-06	workflowr::wflow_git_commit(all = TRUE)
Rmd	e0b7c2c	Saket Choudhary	2021-07-06	Start workflowr project.

Heterogeneity in single-cell RNA-seq (scRNA-seq) data is driven by multiple sources, including biological variation in cellular state as well as technical variation introduced during experimental processing. Deconvolving these effects is a key challenge for preprocessing workflows. Recent work has demonstrated the importance and utility of count models for scRNA-seq analysis, but there is a lack of consensus on which statistical distributions and parameter settings are appropriate. Here, we analyze \(58\) scRNA-seq datasets that span a wide range of technologies, systems, and sequencing depths in order to evaluate the performance of different error models. While a Poisson error model appears appropriate for sparse datasets, we observe clear evidence of overdispersion for genes with sufficient sequencing depth in all biological systems, necessitating the use of a negative binomial model. Moreover, we find that the degree of overdispersion varies widely across datasets, systems, and gene abundances, which argues for a data-driven approach to parameter estimation. Based on these analyses, we provide a set of recommendations for modeling variation in scRNA-seq data, particularly when using generalized linear models or likelihood-based approaches for preprocessing and downstream analysis.
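
The contrast between the two error models can be sketched with their mean-variance relationships. This is an illustration only, assuming the common mean-dispersion parameterization of the negative binomial. The symbols \(X\) (the count of one gene in one cell), \(\mu\) (its expected count) and \(\theta\) (the inverse overdispersion parameter) are introduced here for illustration and are not defined in the text above.

\[
\text{Poisson: } \operatorname{Var}(X) = \mu
\qquad\qquad
\text{Negative binomial: } \operatorname{Var}(X) = \mu + \frac{\mu^{2}}{\theta}
\]

Under this assumed parameterization the excess variance \(\mu^{2}/\theta\) is small when \(\mu\) is small. With hypothetical values \(\mu = 0.1\) and \(\theta = 10\), the variance is \(0.1 + 0.001 = 0.101\), about 1% above the Poisson value, which is consistent with a Poisson model appearing adequate for sparse data. With \(\mu = 10\) and the same \(\theta\), the variance is \(10 + 100/10 = 20\), twice the Poisson value, so overdispersion becomes visible only once counts are high enough. As \(\theta \to \infty\) the negative binomial reduces to the Poisson.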