Last updated: 2021-12-17

Checks: 7 0

Knit directory: sct2_revision/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

The command set.seed(20210706) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 8afc486. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    data/raw_data/
    Ignored:    data/rds_filtered/
    Ignored:    data/rds_raw/
    Ignored:    data/sampled_counts/
    Ignored:    output/snakemake_output/

Untracked files:
    Untracked:  code/02_run_seurat_noclip.R
    Untracked:  code/07AA_deseq2_muscat_simulate.R
    Untracked:  code/07A_muscat_simulate.R
    Untracked:  code/07A_simulate_muscat.R
    Untracked:  code/07BB_deseq2_muscat_process.R
    Untracked:  code/07B_muscat_process.R
    Untracked:  code/07B_process_muscat.R
    Untracked:  code/08_run_presto.R
    Untracked:  code/17A_HEK_SS3_dropseq.Rmd
    Untracked:  code/17A_HEK_SS3_dropseq_files/
    Untracked:  code/17C_HEK_Quartzeseq2_dropseq.Rmd
    Untracked:  code/17C_HEK_Quartzeseq2_dropseq_files/
    Untracked:  code/17_HEK_SS3_ChromiumV3.Rmd
    Untracked:  code/17_HEK_SS3_ChromiumV3.nb.html
    Untracked:  code/17_HEK_SS3_ChromiumV3_files/
    Untracked:  code/AA_process_muscat.R
    Untracked:  code/BB_process_muscat.R
    Untracked:  code/DD_simulate_muscat.R
    Untracked:  code/EE_simulate_muscat.R
    Untracked:  code/XX_process_muscat.R
    Untracked:  code/XX_simulate_muscat.R
    Untracked:  code/YY_simulate_muscat.R
    Untracked:  code/ZZ_simulate_muscat.R
    Untracked:  code/kang_muscat.R
    Untracked:  code/prep_sce.R
    Untracked:  code/prep_sce_ss3_dropseq.R
    Untracked:  data/azimuth_predictions/
    Untracked:  junk/
    Untracked:  mamba_update_changes.txt
    Untracked:  output/11C_VST/
    Untracked:  output/AAmuscat_simulated/
    Untracked:  output/BBmuscat_simulated/
    Untracked:  output/CCmuscat_simulated/
    Untracked:  output/CD4_NK_downsampling_DE.rds
    Untracked:  output/DDmuscat_simulated/
    Untracked:  output/EEmuscat_simulated/
    Untracked:  output/KANGmuscat_simulated/
    Untracked:  output/NK_downsampling/
    Untracked:  output/XXmuscat_simulated/
    Untracked:  output/YYmuscat_simulated/
    Untracked:  output/ZZmuscat_simulated/
    Untracked:  output/figures/
    Untracked:  output/kang_prepsce.rds
    Untracked:  output/muscat_simulated/
    Untracked:  output/muscat_simulation/
    Untracked:  output/seu_sct2_sim.rds
    Untracked:  output/simulation_HEK_QuartzSeq2_Dropseq_downsampling/
    Untracked:  output/simulation_HEK_SS3_ChromiumV3_downsampling/
    Untracked:  output/simulation_HEK_SS3_Dropseq_downsampling/
    Untracked:  output/simulation_HEK_downsampling/
    Untracked:  output/simulation_NK_downsampling/
    Untracked:  output/ss3_dropseq_prepsim.rds
    Untracked:  output/tables/
    Untracked:  output/vargenes/
    Untracked:  snakemake/.snakemake/
    Untracked:  snakemake/Snakefile_noclip.smk
    Untracked:  snakemake/Snakefile_presto.smk
    Untracked:  snakemake/cluster.yaml
    Untracked:  snakemake/install_glm.R
    Untracked:  snakemake/jobscript.sh
    Untracked:  snakemake/jobscript_ncells.sh
    Untracked:  snakemake/local_run_downsampling.sh
    Untracked:  snakemake/local_run_glm.sh
    Untracked:  snakemake/local_run_ncells.sh
    Untracked:  snakemake/local_run_noclip.sh
    Untracked:  snakemake/local_run_presto.sh
    Untracked:  snakemake/local_run_time.sh
    Untracked:  snakemake/run_glm.sh
    Untracked:  snakemake/run_ncells.sh
    Untracked:  snakemake/sct2_revision_env.yml
    Untracked:  temp_figures/

Unstaged changes:
    Deleted:    analysis/04_PBMC68k.Rmd
    Modified:   code/02_run_seurat.R
    Modified:   code/03_run_vst2_downsample.R
    Modified:   code/04_run_vst_ncells.R
    Modified:   code/06_run_sct.R
    Modified:   data/datasets.csv
    Modified:   snakemake/Snakefile_downsampling.smk
    Modified:   snakemake/Snakefile_glm_seurat.smk
    Modified:   snakemake/Snakefile_metacell.smk

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/09_Figure1.Rmd) and HTML (docs/09_Figure1.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd 8afc486 Saket Choudhary 2021-12-17 workflowr::wflow_publish("analysis/*")
html d736ec8 Saket Choudhary 2021-07-07 Build site.
Rmd 400797a Saket Choudhary 2021-07-06 workflowr::wflow_git_commit(all = TRUE)
html 400797a Saket Choudhary 2021-07-06 workflowr::wflow_git_commit(all = TRUE)

Read datassets

datasets <- readr::read_csv(here::here("data", "datasets.csv"), col_types = readr::cols())
datasets$datatype <- factor(datasets$datatype, levels = c("technical-control", "cell line", "heterogeneous"))
datasets <- datasets %>% arrange(datatype)

Read and process poisson GLM results

QVALUE_THRESHOLD <- 1e-2
ncells <- "1000"
residual_type <- "quantile"
nb_column <- paste0("qval_ssr", "_", residual_type)
ssr_column <- paste0("ssr", "_", residual_type, "_normalized")
root_dir <- here::here("output/snakemake_output/poisson_glm_output", ncells)



fits <- GetFits(root_dir, qval_col = nb_column)
fits_dfx <- fits[["fits"]]
frac_dfx <- fits[["frac_df"]]
cell_attrs <- fits[["cell_attrs"]]
gene_attrs <- fits[["gene_attrs"]]

fits_df <- ProcessFits(fits_dfx, datasets, nb_column)
frac_df <- left_join(frac_dfx, datasets, by = "key")

Fraction of non-poisson genes

frac_df_subset <- frac_df[, c(
  "sample_name", "datatype",
  "median_gene_avg_umi", "median_cell_total_umi", "nonpoisson_fraction"
)] %>% arrange(nonpoisson_fraction)
frac_df_subset <- frac_df_subset %>% mutate(across(where(is.numeric), round, 3))
kbl(frac_df_subset, booktabs = T) %>%
  kable_styling(latex_options = "striped")
sample_name datatype median_gene_avg_umi median_cell_total_umi nonpoisson_fraction
TechCtrl1 (ChromiumV1) technical-control 0.007 2264.5 0.001
TechCtrl2 (ChromiumV1) technical-control 0.007 2165.0 0.004
TechCtrl (inDrops) technical-control 0.365 32905.0 0.005
PBMC-r2 (Seq-Well) heterogeneous 0.009 521.0 0.007
HEK-m (ChromiumV2_sn) cell line 0.171 4967.0 0.008
PBMC-r1 (inDrops) heterogeneous 0.010 798.0 0.009
PBMC68k (ChromiumV1) heterogeneous 0.011 1513.0 0.010
HEK-m (inDrops) cell line 0.108 2943.0 0.010
PBMC-r1 (Drop-seq) heterogeneous 0.012 1554.5 0.015
PBMC-r2 (CEL-seq2) heterogeneous 0.069 5917.0 0.016
PBMC-r2 (ChromiumV2) heterogeneous 0.021 2850.5 0.016
PBMC-r1 (Seq-Well) heterogeneous 0.011 1200.0 0.017
HEK-r1 (inDrops) cell line 0.042 3132.0 0.017
PBMC-r1 (ChromiumV2A) heterogeneous 0.016 2384.0 0.018
PBMC-r2 (inDrops) heterogeneous 0.018 1690.0 0.019
PBMC-r1 (ChromiumV2B) heterogeneous 0.019 3286.0 0.020
PBMC-r1 (CEL-seq2) heterogeneous 0.071 6848.0 0.020
HEK-r2 (inDrops) cell line 0.052 3973.0 0.026
3T3-r1 (inDrops) cell line 0.051 2255.0 0.027
PBMC-r2 (Drop-seq) heterogeneous 0.017 2486.5 0.027
3T3-r2 (Drop-seq) cell line 0.080 3434.0 0.031
Cortex-r2 (ChromiumV2) heterogeneous 0.042 3980.5 0.037
HEK-r1 (Drop-seq) cell line 0.053 6015.5 0.038
3T3-r1 (Drop-seq) cell line 0.072 3152.5 0.038
HEK-r2 (Drop-seq) cell line 0.048 6312.5 0.040
3T3-r2 (inDrops) cell line 0.107 4666.5 0.042
Cortex-r2 (sci-RNA-seq) heterogeneous 0.029 3501.5 0.045
PBMC-r1 (ChromiumV3) heterogeneous 0.029 5839.5 0.046
Cortex-r1 (DroNc-seq) heterogeneous 0.043 3013.5 0.049
3T3-r1 (sci-RNA-seq) cell line 0.188 6609.5 0.062
PBMC (ChromiumV3) heterogeneous 0.050 6992.0 0.063
3T3 (ChromiumV3) cell line 0.072 16184.5 0.066
Cortex-r2 (DroNc-seq) heterogeneous 0.050 3094.0 0.067
Cortex-r1 (ChromiumV2) heterogeneous 0.065 7132.5 0.071
HEK-r1 (sci-RNA-seq) cell line 0.220 11490.0 0.071
Cortex-r1 (sci-RNA-seq) heterogeneous 0.031 4421.0 0.074
Bone Marrow (CITE-seq) heterogeneous 0.093 12585.0 0.111
HEK (ChromiumV3) cell line 0.069 41054.5 0.115
HEK-m (Drop-seq) cell line 0.109 1907.5 0.118
HEK-m (MARS-seq) cell line 0.235 11207.5 0.131
Fetal (sci-RNA-seq3) heterogeneous 0.030 6975.5 0.134
3T3-r2 (ChromiumV2) cell line 0.185 14057.0 0.142
3T3-r1 (ChromiumV2) cell line 0.151 11834.0 0.143
3T3-r2 (sci-RNA-seq) cell line 0.161 9199.0 0.151
HEK-r2 (sci-RNA-seq) cell line 0.096 13298.0 0.153
HEK-r2 (CEL-seq2) cell line 0.389 43670.0 0.159
HEK-r1 (CEL-seq2) cell line 0.615 52973.0 0.161
HEK-r1 (ChromiumV2) cell line 0.108 25193.5 0.162
HEK-r2 (ChromiumV2) cell line 0.096 23444.0 0.169
3T3-r1 (CEL-seq2) cell line 0.616 34291.0 0.172
HEK-m (mcSCRB-seq) cell line 0.121 3266.5 0.175
3T3-r2 (CEL-seq2) cell line 1.045 53036.0 0.220
HEK (Smart-seq3) cell line 0.286 106996.0 0.233
HEK-m (CEL-seq2) cell line 0.788 60592.5 0.242
PBMC (Smart-seq3) heterogeneous 0.032 9487.5 0.396
HEK-m (ddSeq) cell line 0.205 5304.5 0.572
HEK-m (ChromiumV2) cell line 0.710 73333.5 0.616
Fibroblasts (Smart-seq3) heterogeneous 0.526 197151.0 0.628
HEK-m (Quartz-Seq2) cell line 1.576 167199.0 0.647
dir.create(here::here("output", "tables"), showWarnings = F)
print(xtable(frac_df_subset, type = "latex",  digits=3), include.rownames = FALSE, file = here::here("output/tables/fraction_nonpoisson.tex"))

Medium to high mean non-poisson genes

medium_high_gene_pois <- list()
for (sample_name in unique(fits_df$sample_name)) {
  df <- fits_df[fits_df$sample_name == sample_name, ]

  med_abundance <- df[(df$mean > 1),]# & (df$mean <= 10), ]
  med_abundance_nonpois <- med_abundance[med_abundance$gene_type != "Pass", ]

  ratio <- dim(med_abundance_nonpois)[1] / dim(med_abundance)[1]

  medium_high_gene_pois[[sample_name]] <- data.frame(ratio = ratio, n_med_high_nonpoisson = dim(med_abundance_nonpois)[1], n_med_high = dim(med_abundance)[1], n_all = dim(df)[1])
}
medium_high_gene_pois <- bind_rows(medium_high_gene_pois, .id = "sample_name")

kbl(medium_high_gene_pois, booktabs = T) %>%
  kable_styling(latex_options = "striped")
sample_name ratio n_med_high_nonpoisson n_med_high n_all
TechCtrl1 (ChromiumV1) 0.1200000 18 150 20167
TechCtrl2 (ChromiumV1) 0.5869565 81 138 20838
TechCtrl (inDrops) 0.0248205 159 6406 25255
3T3 (ChromiumV3) 0.5779857 1297 2244 21920
3T3-r1 (CEL-seq2) 0.4772430 2915 6108 15457
3T3-r1 (ChromiumV2) 0.8497268 1555 1830 15390
3T3-r1 (Drop-seq) 0.6440072 360 559 13904
3T3-r1 (inDrops) 0.8231293 242 294 13337
3T3-r1 (sci-RNA-seq) 0.5419847 852 1572 12806
3T3-r2 (CEL-seq2) 0.5329213 3966 7442 14620
3T3-r2 (ChromiumV2) 0.7731538 1895 2451 15127
3T3-r2 (Drop-seq) 0.5073314 346 682 13977
3T3-r2 (inDrops) 0.5365854 484 902 12354
3T3-r2 (sci-RNA-seq) 0.8871124 1831 2064 15608
HEK (ChromiumV3) 0.6261718 3273 5227 26012
HEK (Smart-seq3) 0.7987632 7621 9541 27097
HEK-r1 (CEL-seq2) 0.5012410 4241 8461 19628
HEK-r1 (ChromiumV2) 0.8884120 3105 3495 21809
HEK-r1 (Drop-seq) 0.5241277 706 1347 19149
HEK-r1 (inDrops) 0.6119048 257 420 17091
HEK-r1 (sci-RNA-seq) 0.5590200 1757 3143 17299
HEK-r2 (CEL-seq2) 0.5554204 4109 7398 20294
HEK-r2 (ChromiumV2) 0.9317073 3056 3280 22655
HEK-r2 (Drop-seq) 0.6626617 717 1082 20614
HEK-r2 (inDrops) 0.6451613 400 620 17862
HEK-r2 (sci-RNA-seq) 0.9238095 2813 3045 23608
HEK-m (CEL-seq2) 0.7002563 6831 9755 21665
HEK-m (ChromiumV2) 0.9770573 9071 9284 21669
HEK-m (ChromiumV2_sn) 0.1847826 153 828 16502
HEK-m (ddSeq) 0.9952525 2306 2317 18354
HEK-m (Drop-seq) 0.9874804 631 639 15794
HEK-m (inDrops) 0.3333333 174 522 13118
HEK-m (MARS-seq) 0.8183222 1590 1943 18091
HEK-m (mcSCRB-seq) 0.9375000 1125 1200 13760
HEK-m (Quartz-Seq2) 0.9835809 11921 12120 21535
Cortex-r1 (ChromiumV2) 0.7914110 903 1141 22947
Cortex-r1 (DroNc-seq) 0.9146758 268 293 21965
Cortex-r1 (sci-RNA-seq) 0.9776248 568 581 22188
Cortex-r2 (ChromiumV2) 0.8120301 324 399 22920
Cortex-r2 (DroNc-seq) 0.9171843 443 483 21183
Cortex-r2 (sci-RNA-seq) 0.9878788 326 330 22492
Fibroblasts (Smart-seq3) 0.9940432 10680 10744 24003
PBMC-r1 (CEL-seq2) 0.4236902 372 878 19767
PBMC-r1 (ChromiumV2A) 0.9073171 186 205 21706
PBMC-r1 (ChromiumV2B) 0.8710801 250 287 21922
PBMC-r1 (ChromiumV3) 0.8287938 426 514 22851
PBMC-r1 (Drop-seq) 0.8833333 159 180 21292
PBMC-r1 (inDrops) 0.9696970 64 66 17972
PBMC-r1 (Seq-Well) 0.9624060 128 133 21785
PBMC-r2 (CEL-seq2) 0.4237288 275 649 19705
PBMC-r2 (ChromiumV2) 0.8448276 196 232 22283
PBMC-r2 (Drop-seq) 0.9475983 217 229 23577
PBMC-r2 (inDrops) 0.9693252 158 163 19143
PBMC-r2 (Seq-Well) 0.9264706 63 68 15088
PBMC68k (ChromiumV1) 0.8364780 133 159 19955
PBMC (ChromiumV3) 0.7335025 578 788 17853
PBMC (Smart-seq3) 0.9746622 1154 1184 28768
Fetal (sci-RNA-seq3) 0.7037037 19 27 46844
Bone Marrow (CITE-seq) 0.9523810 240 252 17009
print(xtable(medium_high_gene_pois,  digits=3), include.rownames = FALSE, digits=3, file = here::here("output/tables/medium_high_expr_nonpois_fraction.tex"))

Plot fraction of non-poisson vs gene abundance

nonpoismeanwise <- GetMeanwiseNonPois(fits_df)
nonpoismeanwise <- left_join(nonpoismeanwise, datasets, by = "key")
nonpoismeanwise_df <- nonpoismeanwise[complete.cases(nonpoismeanwise), ]
nonpoismeanwise_df$sample_name <- factor(nonpoismeanwise_df$sample_name, levels = datasets$sample_name)

nonpoismeanwise_filtered <- nonpoismeanwise_df[nonpoismeanwise_df$n_total_genes >= 10, ]


plot.dots <- ggplot(nonpoismeanwise_filtered, aes(sample_name, mean_quantile,
  color = nonpoisson_fraction,
  size = nonpoisson_fraction
)) +
  geom_point() +
  scale_color_viridis_c(
    direction = -1,
    breaks = c(0.25, 0.5, 0.75, 1),
    name = "Non-poisson fraction"
  ) +
  scale_size(
    name = "Non-poisson fraction",
    breaks = c(0.25, 0.5, 0.75, 1),
    range = c(0.5, 5)
  ) +
  theme_pubr(base_size = 10) +
  theme(legend.position = "right", legend.direction = "horizontal") +
  guides(color = guide_legend(), size = guide_legend()) +
  guides(x = guide_axis(angle = 90)) +
  xlab("") +
  ylab("Gene mean") +
  theme(legend.position = c(0.8, 1.06))

plot.dots

Version Author Date
400797a Saket Choudhary 2021-07-06
nonpoismeanwise_filtered_subset <- nonpoismeanwise_filtered[, c("sample_name", "mean_quantile", "nonpoisson_fraction")]
nonpoismeanwise_filtered_subset2 <- tidyr::spread(nonpoismeanwise_filtered_subset, mean_quantile, nonpoisson_fraction)

kbl(nonpoismeanwise_filtered_subset2, booktabs = T) %>%
  kable_styling(latex_options = "striped")
sample_name <0.01 >0.01 >0.1 >1 >5 >10 >25 >50 >100
TechCtrl1 (ChromiumV1) 0.0001579 NA NA 0.0083333 0.1000000 0.7000000 NA NA NA
TechCtrl2 (ChromiumV1) 0.0003051 0.0001568 0.0214168 0.5000000 NA 0.9285714 NA NA NA
TechCtrl (inDrops) NA NA 0.0002956 0.0051442 0.0369069 0.1264822 0.4102564 0.6734694 0.9285714
3T3 (ChromiumV3) 0.0015412 0.0064665 0.0675734 0.4989396 0.9889503 1.0000000 1.0000000 1.0000000 NA
3T3-r1 (CEL-seq2) 0.0071702 0.0073620 0.1178304 0.3713582 0.8368984 0.9785933 1.0000000 1.0000000 1.0000000
3T3-r1 (ChromiumV2) 0.0039175 0.0120156 0.1991565 0.8229234 1.0000000 1.0000000 1.0000000 1.0000000 NA
3T3-r1 (Drop-seq) 0.0018663 0.0010408 0.0565152 0.6059406 1.0000000 1.0000000 NA NA NA
3T3-r1 (inDrops) 0.0008782 0.0010646 0.0478577 0.7944664 1.0000000 1.0000000 NA NA NA
3T3-r1 (sci-RNA-seq) 0.0015835 0.0024155 0.0617210 0.5055021 0.9873418 1.0000000 1.0000000 NA NA
3T3-r2 (CEL-seq2) 0.0091116 0.0083042 0.0990764 0.3896331 0.7749627 0.9550225 1.0000000 1.0000000 1.0000000
3T3-r2 (ChromiumV2) 0.0081105 0.0076309 0.1400062 0.7306202 1.0000000 1.0000000 1.0000000 1.0000000 NA
3T3-r2 (Drop-seq) 0.0012251 0.0006667 0.0386980 0.4577922 0.9534884 1.0000000 NA NA NA
3T3-r2 (inDrops) 0.0009862 0.0007578 0.0272644 0.4793492 0.9710145 1.0000000 NA NA NA
3T3-r2 (sci-RNA-seq) 0.0043967 0.0082210 0.2092370 0.8754677 1.0000000 1.0000000 1.0000000 NA NA
HEK (ChromiumV3) 0.0027539 0.0061889 0.0778216 0.5137038 0.9701493 1.0000000 1.0000000 1.0000000 1.0000000
HEK (Smart-seq3) 0.0119471 0.0116812 0.1745152 0.7025160 0.9428066 0.9871324 1.0000000 1.0000000 1.0000000
HEK-r1 (CEL-seq2) NA 0.0085905 0.0954722 0.3765109 0.7530529 0.9382911 1.0000000 1.0000000 1.0000000
HEK-r1 (ChromiumV2) 0.0048426 0.0168411 0.1570022 0.8579235 1.0000000 1.0000000 1.0000000 NA 1.0000000
HEK-r1 (Drop-seq) 0.0007688 0.0027243 0.0353954 0.4715581 1.0000000 1.0000000 1.0000000 NA NA
HEK-r1 (inDrops) 0.0005487 0.0011022 0.0197816 0.5421348 1.0000000 1.0000000 NA NA NA
HEK-r1 (sci-RNA-seq) 0.0047764 0.0069048 0.0778780 0.5189962 0.9710145 1.0000000 NA NA NA
HEK-r2 (CEL-seq2) 0.0051769 0.0078385 0.0873059 0.4496459 0.9004065 0.9888641 1.0000000 1.0000000 1.0000000
HEK-r2 (ChromiumV2) 0.0060403 0.0195405 0.2029444 0.9128744 1.0000000 1.0000000 1.0000000 1.0000000 NA
HEK-r2 (Drop-seq) 0.0012235 0.0034984 0.0587045 0.6209761 1.0000000 1.0000000 1.0000000 NA NA
HEK-r2 (inDrops) 0.0008364 0.0025515 0.0270036 0.5903166 1.0000000 1.0000000 NA NA NA
HEK-r2 (sci-RNA-seq) 0.0055562 0.0186279 0.2636884 0.9166966 1.0000000 1.0000000 NA NA NA
HEK-m (CEL-seq2) NA 0.0181054 0.1380288 0.6144760 0.9327381 0.9932660 1.0000000 1.0000000 1.0000000
HEK-m (ChromiumV2) 0.1739130 0.1944444 0.6612867 0.9702465 0.9977238 1.0000000 1.0000000 1.0000000 1.0000000
HEK-m (ChromiumV2_sn) NA NA 0.0053493 0.1718171 0.7333333 NA NA NA NA
HEK-m (ddSeq) 0.2278481 0.4828010 0.9124015 0.9945972 1.0000000 1.0000000 NA NA NA
HEK-m (Drop-seq) 0.0038835 0.0125822 0.2866858 0.9855856 1.0000000 1.0000000 NA NA NA
HEK-m (inDrops) NA NA 0.0078245 0.2494331 0.6909091 1.0000000 NA NA NA
HEK-m (MARS-seq) NA 0.0101597 0.2094223 0.7965318 0.9916667 1.0000000 1.0000000 NA NA
HEK-m (mcSCRB-seq) NA 0.0066553 0.3362846 0.9304267 1.0000000 1.0000000 NA NA NA
HEK-m (Quartz-Seq2) 0.1176471 0.1485840 0.5538624 0.9636866 0.9993586 1.0000000 1.0000000 1.0000000 1.0000000
Cortex-r1 (ChromiumV2) 0.0014268 0.0094961 0.1243224 0.7659784 1.0000000 1.0000000 NA NA NA
Cortex-r1 (DroNc-seq) 0.0013364 0.0070822 0.2039531 0.9094203 1.0000000 NA NA NA NA
Cortex-r1 (sci-RNA-seq) 0.0016481 0.0116785 0.2611836 0.9734694 1.0000000 1.0000000 NA NA NA
Cortex-r2 (ChromiumV2) 0.0010207 0.0062638 0.1050808 0.7905028 1.0000000 1.0000000 NA NA NA
Cortex-r2 (DroNc-seq) 0.0019376 0.0071465 0.2034228 0.9124726 1.0000000 NA NA NA NA
Cortex-r2 (sci-RNA-seq) 0.0006347 0.0088618 0.2202040 0.9862069 1.0000000 NA NA NA NA
Fibroblasts (Smart-seq3) 0.0450250 0.2737935 0.7973094 0.9859772 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
PBMC-r1 (CEL-seq2) 0.0009934 0.0012436 0.0168625 0.3147139 0.9594595 1.0000000 NA NA NA
PBMC-r1 (ChromiumV2A) 0.0003506 0.0041029 0.1083984 0.8461538 0.9800000 1.0000000 NA NA NA
PBMC-r1 (ChromiumV2B) 0.0004798 0.0023103 0.0792907 0.7988827 0.9756098 1.0000000 1.0000000 NA NA
PBMC-r1 (ChromiumV3) 0.0013785 0.0125000 0.1441478 0.7710526 0.9772727 1.0000000 1.0000000 NA NA
PBMC-r1 (Drop-seq) 0.0000798 0.0019028 0.1227169 0.8618421 1.0000000 NA NA NA NA
PBMC-r1 (inDrops) 0.0003344 0.0026207 0.1887417 0.9666667 NA NA NA NA NA
PBMC-r1 (Seq-Well) 0.0006708 0.0037138 0.2056680 0.9586777 NA NA NA NA NA
PBMC-r2 (CEL-seq2) NA 0.0005705 0.0127430 0.3000000 0.9508197 1.0000000 NA NA NA
PBMC-r2 (ChromiumV2) 0.0003797 0.0011815 0.0609636 0.7445255 0.9756098 1.0000000 NA NA NA
PBMC-r2 (Drop-seq) 0.0009164 0.0041455 0.2072072 0.9215686 1.0000000 1.0000000 NA NA NA
PBMC-r2 (inDrops) 0.0007112 0.0051697 0.1436637 0.9557522 1.0000000 1.0000000 NA NA NA
PBMC-r2 (Seq-Well) 0.0002448 NA 0.1003344 0.9180328 NA NA NA NA NA
PBMC68k (ChromiumV1) 0.0002984 0.0024876 0.0626072 0.7368421 0.9772727 1.0000000 NA NA NA
PBMC (ChromiumV3) 0.0016126 0.0098021 0.1115682 0.6650485 0.9464286 1.0000000 1.0000000 1.0000000 NA
PBMC (Smart-seq3) 0.0311889 0.3590994 0.8952417 0.9704433 1.0000000 1.0000000 1.0000000 NA NA
Fetal (sci-RNA-seq3) 0.1162671 0.6176148 0.9902142 NA NA NA NA NA NA
Bone Marrow (CITE-seq) 0.0086455 0.0720512 0.5624242 0.9315068 0.9354839 1.0000000 1.0000000 NA NA
print(xtable(nonpoismeanwise_filtered_subset2, type = "latex",  digits=3), digits=3, include.rownames = FALSE, file = here::here("output/tables/nonpoisson_meanwise.tex"))

Plot residual variance

sample_names <- c(
  "TechCtrl (inDrops)", "TechCtrl1 (ChromiumV1)",
  "HEK (ChromiumV3)",
  "PBMC (Smart-seq3)"
)

fits_df_subset <- fits_df[fits_df$sample_name %in% sample_names, ]
plot.glmresid4 <- PlotGLMResiduals(fits_df_subset, ssr_column, 4)
plot.glmresid4

Version Author Date
400797a Saket Choudhary 2021-07-06

Distribution of means of poisson and non-poisson genes

plot.kde <- PlotKDEPois(fits_df)
plot.kde

Version Author Date
400797a Saket Choudhary 2021-07-06
dir.create(here::here("output", "figures"), showWarnings = F, recursive = T)
ggsave(here::here("output", "figures", paste0("01_PoissonKDE_ncells_", ncells, "_residtype_", residual_type, ".pdf")), width = 12, height = 10)

Plot fraction of non-poisson

plot.poisvsumi <- PlotPoisVsUMI.cell_line(frac_df)
plot.poisvsumix <- plot.poisvsumi + theme(legend.position = c(0.5, 1.0))
plot.poisvsumix

Version Author Date
400797a Saket Choudhary 2021-07-06

Downsampling

sample_name <- "PBMC__Smart-seq3"

root_dir <- here::here(paste0("output/snakemake_output/", sample_name, "_sampled_counts/poisson_glm_output/", ncells))


downsampled_umis <- c(
  10000, 7500, 5000,
  2000, 1000
)


names(downsampled_umis)<- c(
  "10,000", "7,500", "5,000",
  "2,000", "1,000"
)

sampled_dataset_keys <- paste0(sample_name, "_sampled_", downsampled_umis)

sampled_counts <- list()

for (name in downsampled_umis) {
  seu <- readRDS(here::here("data", "sampled_counts", paste0(sample_name, "_sampled_counts"), paste0(sample_name, "_sampled_", name, ".rds")))
  sampled_counts[[as.character(name)]] <- GetAssayData(seu, assay = "RNA", slot = "counts")
}

sampled <- readRDS(here::here("data", "rds_filtered", paste0(sample_name, ".rds")))
counts <- GetAssayData(sampled, assay = "RNA", slot = "counts")
counts_totalumis <- colSums2(counts)

generated_poissons <- list()

PredictPoisson <- function(counts_df) {
  predicted_poisson <- dpois(counts_df$counts, lambda = mean(counts_df$counts))
  return(predicted_poisson)
}

goi <- c("RPS19", "TPT1")
names(goi) <- goi

PredictPoisson <- function(counts_df) {
  predicted_poisson <- dpois(counts_df$counts, lambda = mean(counts_df$counts))
  return(predicted_poisson)
}

med_libsize <- "8,288" #(median(colSums2(counts)))

for (gene_name in names(goi)) {
  gene <- goi[[gene_name]]
  c1 <- "5000"
  c2 <- "1000"
  counts_goi_full <- counts[gene, ]
  counts_goi_full_subset <- counts_goi_full[counts_goi_full > 1]
  counts_goi_5k <- sampled_counts[[c1]][gene, ]
  counts_goi_1k <- sampled_counts[[c2]][gene, ]

  counts_df_full <- data.frame(counts = counts_goi_full_subset)
  counts_df_5k <- data.frame(counts = counts_goi_5k)
  counts_df_1k <- data.frame(counts = counts_goi_1k)

  labels <- c(
    paste0("Original  ", med_libsize, " UMI/cell"),
    paste0("Sampled ", so_formatter(as.integer(c1)), " UMI/cell"),
    paste0("Sampled ", so_formatter(as.integer(c2)), " UMI/cell")
  )
  counts_df_full$predicted_poisson <- PredictPoisson(counts_df_full)
  counts_df_full$sampletype <- labels[1]
  counts_df_5k$predicted_poisson <- PredictPoisson(counts_df_5k)
  counts_df_5k$sampletype <- labels[2]
  counts_df_1k$predicted_poisson <- PredictPoisson(counts_df_1k)
  counts_df_1k$sampletype <- labels[3]

  merged_counts <- rbind(counts_df_full, counts_df_5k, counts_df_1k)
  merged_counts$sampletype <- factor(merged_counts$sampletype, levels = labels)
  generated_poissons[[gene_name]] <- merged_counts
}
generated_poissons_df <- bind_rows(generated_poissons, .id = "gene")


plot.genespois <- ggplot(generated_poissons_df, aes(x = counts)) +
  geom_histogram(aes(y = ..density..), binwidth = 1, colour = "gray", fill = "white") +
  geom_line(aes(counts, predicted_poisson), color = "red") +
  facet_wrap(gene ~ sampletype, scales = "free", ncol = 3) +
  xlab("Counts") +
  ylab("Density") +
  scale_y_continuous(breaks = c(0.00, 0.10, 0.20))
plot.genespois

Version Author Date
400797a Saket Choudhary 2021-07-06
ncells <- "1000"
residual_type <- "quantile"
nb_column <- paste0("qval_ssr", "_", residual_type)
ssr_column <- paste0("ssr", "_", residual_type, "_normalized")





results_df <- GetFits.sampled(root_dir, nb_column)
fits_df.sampledx <- results_df$fits
frac_df.sampled <- results_df$frac_df

fits_df.sampled <- ProcessFits.sampled(fits_df.sampledx, nb_column)
fits_df.sampled$umi <- stringr::str_split_fixed(fits_df.sampled$key, pattern = "_", n = 5)[, 5]
frac_df.sampled$umi <- stringr::str_split_fixed(frac_df.sampled$key, pattern = "_", n = 5)[, 5]

fits_df.sampled$umi <- factor(fits_df.sampled$umi, levels = as.character(downsampled_umis), labels = paste0("Sampled ", names(downsampled_umis), " UMI/cell"))

frac_df.sampled$umi <- factor(frac_df.sampled$umi, levels = as.character(downsampled_umis), labels = paste0("Sampled ", names(downsampled_umis), " UMI/cell"))
fits_df.sampled$sample_name <- fits_df.sampled$umi
fits_df.sampled$datatype <- "heterogeneous"
fits_df.sampled$datatype <- factor(fits_df.sampled$datatype, levels = c("technical-control", "cell line", "heterogeneous"))
fits_df.sampled.subset <- fits_df.sampled[!is.na(fits_df.sampled[, ssr_column]), ]
plot.glmresiddownsample <- PlotGLMResiduals.sampled(fits_df.sampled, y = ssr_column, ncol = 5) + ylab(paste0("Variance of ", residual_type, " residuals"))
plot.glmresiddownsample

Version Author Date
400797a Saket Choudhary 2021-07-06
frac_df.sampled.subset <- frac_df.sampled[, c(
  "umi", "median_cell_total_umi", "median_gene_total_umi",
  "median_gene_avg_umi", "nonpoisson_fraction"
)]

kbl(frac_df.sampled.subset, booktabs = T) %>%
  kable_styling(latex_options = "striped")
umi median_cell_total_umi median_gene_total_umi median_gene_avg_umi nonpoisson_fraction
Sampled 10,000 UMI/cell 9998 29 0.0366162 0.4672177
Sampled 7,500 UMI/cell 7500 28 0.0280000 0.3952419
Sampled 5,000 UMI/cell 5000 23 0.0230000 0.2820475
Sampled 2,000 UMI/cell 2000 14 0.0140000 0.0224812
Sampled 1,000 UMI/cell 999 11 0.0110000 0.0047246
print(xtable(frac_df.sampled.subset, type = "latex", digits=3), include.rownames = FALSE, file=here::here("output", "tables", "frac_nonpoisson_sampled.tex"))
plot.poisvsumi <- PlotPoisVsUMI.cell_line(frac_df) + # theme(legend.position = "top")
theme(legend.position = c(0.5, 1.04))
layout <- "
AAAA
BBBC
DDDD
EEEE
"


plot.dots +
  plot.glmresid4 + plot.poisvsumi +
  plot.genespois + plot.glmresiddownsample +
  plot_layout(design = layout, tag_level = "new") + plot_annotation(tag_levels = "A") & theme(plot.tag = element_text(face = "bold"))

Version Author Date
400797a Saket Choudhary 2021-07-06
ggsave(here::here("output", "figures", "01_Figure1.pdf"), width = 12, height = 15)
sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] xtable_1.8-4            sparseMatrixStats_1.4.2 MatrixGenerics_1.4.3   
 [4] matrixStats_0.61.0      SeuratObject_4.0.4      Seurat_4.0.5           
 [7] scattermore_0.7         reshape2_1.4.4          readr_2.1.1            
[10] RColorBrewer_1.1-2      patchwork_1.1.1         kableExtra_1.3.4       
[13] here_1.0.1              ggridges_0.5.3          ggpubr_0.4.0           
[16] ggplot2_3.3.5           dplyr_1.0.7             workflowr_1.6.2        

loaded via a namespace (and not attached):
  [1] backports_1.4.1        systemfonts_1.0.3      plyr_1.8.6            
  [4] igraph_1.2.9           lazyeval_0.2.2         splines_4.1.2         
  [7] listenv_0.8.0          digest_0.6.29          htmltools_0.5.2       
 [10] fansi_0.5.0            magrittr_2.0.1         tensor_1.5            
 [13] cluster_2.1.2          ROCR_1.0-11            tzdb_0.2.0            
 [16] globals_0.14.0         vroom_1.5.7            svglite_2.0.0         
 [19] spatstat.sparse_2.0-0  colorspace_2.0-2       rvest_1.0.2           
 [22] ggrepel_0.9.1          xfun_0.28              crayon_1.4.2          
 [25] jsonlite_1.7.2         spatstat.data_2.1-0    survival_3.2-13       
 [28] zoo_1.8-9              glue_1.5.1             polyclip_1.10-0       
 [31] gtable_0.3.0           webshot_0.5.2          leiden_0.3.9          
 [34] car_3.0-12             future.apply_1.8.1     abind_1.4-5           
 [37] scales_1.1.1           DBI_1.1.1              rstatix_0.7.0         
 [40] miniUI_0.1.1.1         Rcpp_1.0.7             viridisLite_0.4.0     
 [43] reticulate_1.22        spatstat.core_2.3-2    bit_4.0.4             
 [46] htmlwidgets_1.5.4      httr_1.4.2             ellipsis_0.3.2        
 [49] ica_1.0-2              farver_2.1.0           pkgconfig_2.0.3       
 [52] sass_0.4.0             uwot_0.1.11            deldir_1.0-6          
 [55] utf8_1.2.2             labeling_0.4.2         tidyselect_1.1.1      
 [58] rlang_0.4.12           later_1.3.0            munsell_0.5.0         
 [61] tools_4.1.2            generics_0.1.1         broom_0.7.10          
 [64] evaluate_0.14          stringr_1.4.0          fastmap_1.1.0         
 [67] yaml_2.2.1             goftest_1.2-3          knitr_1.36            
 [70] bit64_4.0.5            fs_1.5.2               fitdistrplus_1.1-6    
 [73] purrr_0.3.4            RANN_2.6.1             pbapply_1.5-0         
 [76] future_1.23.0          nlme_3.1-152           whisker_0.4           
 [79] mime_0.12              xml2_1.3.3             compiler_4.1.2        
 [82] rstudioapi_0.13        plotly_4.10.0          png_0.1-7             
 [85] ggsignif_0.6.3         spatstat.utils_2.3-0   tibble_3.1.6          
 [88] bslib_0.3.1            stringi_1.7.6          highr_0.9             
 [91] lattice_0.20-45        Matrix_1.4-0           vctrs_0.3.8           
 [94] pillar_1.6.4           lifecycle_1.0.1        spatstat.geom_2.3-1   
 [97] lmtest_0.9-39          jquerylib_0.1.4        RcppAnnoy_0.0.19      
[100] data.table_1.14.2      cowplot_1.1.1          irlba_2.3.5           
[103] httpuv_1.6.3           R6_2.5.1               promises_1.2.0.1      
[106] KernSmooth_2.23-20     gridExtra_2.3          parallelly_1.29.0     
[109] codetools_0.2-18       MASS_7.3-54            assertthat_0.2.1      
[112] rprojroot_2.0.2        withr_2.4.3            sctransform_0.3.2.9008
[115] mgcv_1.8-38            parallel_4.1.2         hms_1.1.1             
[118] grid_4.1.2             rpart_4.1-15           tidyr_1.1.4           
[121] rmarkdown_2.11         carData_3.0-4          Rtsne_0.15            
[124] git2r_0.29.0           shiny_1.7.1           

sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] xtable_1.8-4            sparseMatrixStats_1.4.2 MatrixGenerics_1.4.3   
 [4] matrixStats_0.61.0      SeuratObject_4.0.4      Seurat_4.0.5           
 [7] scattermore_0.7         reshape2_1.4.4          readr_2.1.1            
[10] RColorBrewer_1.1-2      patchwork_1.1.1         kableExtra_1.3.4       
[13] here_1.0.1              ggridges_0.5.3          ggpubr_0.4.0           
[16] ggplot2_3.3.5           dplyr_1.0.7             workflowr_1.6.2        

loaded via a namespace (and not attached):
  [1] backports_1.4.1        systemfonts_1.0.3      plyr_1.8.6            
  [4] igraph_1.2.9           lazyeval_0.2.2         splines_4.1.2         
  [7] listenv_0.8.0          digest_0.6.29          htmltools_0.5.2       
 [10] fansi_0.5.0            magrittr_2.0.1         tensor_1.5            
 [13] cluster_2.1.2          ROCR_1.0-11            tzdb_0.2.0            
 [16] globals_0.14.0         vroom_1.5.7            svglite_2.0.0         
 [19] spatstat.sparse_2.0-0  colorspace_2.0-2       rvest_1.0.2           
 [22] ggrepel_0.9.1          xfun_0.28              crayon_1.4.2          
 [25] jsonlite_1.7.2         spatstat.data_2.1-0    survival_3.2-13       
 [28] zoo_1.8-9              glue_1.5.1             polyclip_1.10-0       
 [31] gtable_0.3.0           webshot_0.5.2          leiden_0.3.9          
 [34] car_3.0-12             future.apply_1.8.1     abind_1.4-5           
 [37] scales_1.1.1           DBI_1.1.1              rstatix_0.7.0         
 [40] miniUI_0.1.1.1         Rcpp_1.0.7             viridisLite_0.4.0     
 [43] reticulate_1.22        spatstat.core_2.3-2    bit_4.0.4             
 [46] htmlwidgets_1.5.4      httr_1.4.2             ellipsis_0.3.2        
 [49] ica_1.0-2              farver_2.1.0           pkgconfig_2.0.3       
 [52] sass_0.4.0             uwot_0.1.11            deldir_1.0-6          
 [55] utf8_1.2.2             labeling_0.4.2         tidyselect_1.1.1      
 [58] rlang_0.4.12           later_1.3.0            munsell_0.5.0         
 [61] tools_4.1.2            generics_0.1.1         broom_0.7.10          
 [64] evaluate_0.14          stringr_1.4.0          fastmap_1.1.0         
 [67] yaml_2.2.1             goftest_1.2-3          knitr_1.36            
 [70] bit64_4.0.5            fs_1.5.2               fitdistrplus_1.1-6    
 [73] purrr_0.3.4            RANN_2.6.1             pbapply_1.5-0         
 [76] future_1.23.0          nlme_3.1-152           whisker_0.4           
 [79] mime_0.12              xml2_1.3.3             compiler_4.1.2        
 [82] rstudioapi_0.13        plotly_4.10.0          png_0.1-7             
 [85] ggsignif_0.6.3         spatstat.utils_2.3-0   tibble_3.1.6          
 [88] bslib_0.3.1            stringi_1.7.6          highr_0.9             
 [91] lattice_0.20-45        Matrix_1.4-0           vctrs_0.3.8           
 [94] pillar_1.6.4           lifecycle_1.0.1        spatstat.geom_2.3-1   
 [97] lmtest_0.9-39          jquerylib_0.1.4        RcppAnnoy_0.0.19      
[100] data.table_1.14.2      cowplot_1.1.1          irlba_2.3.5           
[103] httpuv_1.6.3           R6_2.5.1               promises_1.2.0.1      
[106] KernSmooth_2.23-20     gridExtra_2.3          parallelly_1.29.0     
[109] codetools_0.2-18       MASS_7.3-54            assertthat_0.2.1      
[112] rprojroot_2.0.2        withr_2.4.3            sctransform_0.3.2.9008
[115] mgcv_1.8-38            parallel_4.1.2         hms_1.1.1             
[118] grid_4.1.2             rpart_4.1-15           tidyr_1.1.4           
[121] rmarkdown_2.11         carData_3.0-4          Rtsne_0.15            
[124] git2r_0.29.0           shiny_1.7.1