Last updated: 2021-12-17

Checks: 7 passed, 0 failed

Knit directory: sct2_revision/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(20210706) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
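As a minimal illustration of why the seed matters (base R only, with toy values that are not part of this analysis), re-setting the same seed before a random draw reproduces that draw exactly:

```r
# Toy illustration: the same seed yields the same pseudo-random draw.
set.seed(20210706)
first_draw <- sample(1:100, size = 5)

set.seed(20210706)
second_draw <- sample(1:100, size = 5)

# Without re-seeding, the generator state has advanced, so this
# draw is generally different from the two above.
third_draw <- sample(1:100, size = 5)

identical(first_draw, second_draw)
```

Note that any later call to set.seed() inside the R Markdown file replaces the seed set by workflowr from that point on.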

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 8afc486. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    data/raw_data/
    Ignored:    data/rds_filtered/
    Ignored:    data/rds_raw/
    Ignored:    data/sampled_counts/
    Ignored:    output/snakemake_output/

Untracked files:
    Untracked:  code/02_run_seurat_noclip.R
    Untracked:  code/07AA_deseq2_muscat_simulate.R
    Untracked:  code/07A_muscat_simulate.R
    Untracked:  code/07A_simulate_muscat.R
    Untracked:  code/07BB_deseq2_muscat_process.R
    Untracked:  code/07B_muscat_process.R
    Untracked:  code/07B_process_muscat.R
    Untracked:  code/08_run_presto.R
    Untracked:  code/17A_HEK_SS3_dropseq.Rmd
    Untracked:  code/17A_HEK_SS3_dropseq_files/
    Untracked:  code/17C_HEK_Quartzeseq2_dropseq.Rmd
    Untracked:  code/17C_HEK_Quartzeseq2_dropseq_files/
    Untracked:  code/17_HEK_SS3_ChromiumV3.Rmd
    Untracked:  code/17_HEK_SS3_ChromiumV3.nb.html
    Untracked:  code/17_HEK_SS3_ChromiumV3_files/
    Untracked:  code/AA_process_muscat.R
    Untracked:  code/BB_process_muscat.R
    Untracked:  code/DD_simulate_muscat.R
    Untracked:  code/EE_simulate_muscat.R
    Untracked:  code/XX_process_muscat.R
    Untracked:  code/XX_simulate_muscat.R
    Untracked:  code/YY_simulate_muscat.R
    Untracked:  code/ZZ_simulate_muscat.R
    Untracked:  code/kang_muscat.R
    Untracked:  code/prep_sce.R
    Untracked:  code/prep_sce_ss3_dropseq.R
    Untracked:  data/azimuth_predictions/
    Untracked:  junk/
    Untracked:  mamba_update_changes.txt
    Untracked:  output/11C_VST/
    Untracked:  output/AAmuscat_simulated/
    Untracked:  output/BBmuscat_simulated/
    Untracked:  output/CCmuscat_simulated/
    Untracked:  output/CD4_NK_downsampling_DE.rds
    Untracked:  output/DDmuscat_simulated/
    Untracked:  output/EEmuscat_simulated/
    Untracked:  output/KANGmuscat_simulated/
    Untracked:  output/NK_downsampling/
    Untracked:  output/XXmuscat_simulated/
    Untracked:  output/YYmuscat_simulated/
    Untracked:  output/ZZmuscat_simulated/
    Untracked:  output/figures/
    Untracked:  output/kang_prepsce.rds
    Untracked:  output/muscat_simulated/
    Untracked:  output/muscat_simulation/
    Untracked:  output/seu_sct2_sim.rds
    Untracked:  output/simulation_HEK_QuartzSeq2_Dropseq_downsampling/
    Untracked:  output/simulation_HEK_SS3_ChromiumV3_downsampling/
    Untracked:  output/simulation_HEK_SS3_Dropseq_downsampling/
    Untracked:  output/simulation_HEK_downsampling/
    Untracked:  output/simulation_NK_downsampling/
    Untracked:  output/ss3_dropseq_prepsim.rds
    Untracked:  output/tables/
    Untracked:  output/vargenes/
    Untracked:  snakemake/.snakemake/
    Untracked:  snakemake/Snakefile_noclip.smk
    Untracked:  snakemake/Snakefile_presto.smk
    Untracked:  snakemake/cluster.yaml
    Untracked:  snakemake/install_glm.R
    Untracked:  snakemake/jobscript.sh
    Untracked:  snakemake/jobscript_ncells.sh
    Untracked:  snakemake/local_run_downsampling.sh
    Untracked:  snakemake/local_run_glm.sh
    Untracked:  snakemake/local_run_ncells.sh
    Untracked:  snakemake/local_run_noclip.sh
    Untracked:  snakemake/local_run_presto.sh
    Untracked:  snakemake/local_run_time.sh
    Untracked:  snakemake/run_glm.sh
    Untracked:  snakemake/run_ncells.sh
    Untracked:  snakemake/sct2_revision_env.yml
    Untracked:  temp_figures/

Unstaged changes:
    Deleted:    analysis/04_PBMC68k.Rmd
    Modified:   code/02_run_seurat.R
    Modified:   code/03_run_vst2_downsample.R
    Modified:   code/04_run_vst_ncells.R
    Modified:   code/06_run_sct.R
    Modified:   data/datasets.csv
    Modified:   snakemake/Snakefile_downsampling.smk
    Modified:   snakemake/Snakefile_glm_seurat.smk
    Modified:   snakemake/Snakefile_metacell.smk

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/02_Mereu.Rmd) and HTML (docs/02_Mereu.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File  Version  Author           Date        Message
html  d736ec8  Saket Choudhary  2021-07-07  Build site.
Rmd   400797a  Saket Choudhary  2021-07-06  workflowr::wflow_git_commit(all = TRUE)
html  400797a  Saket Choudhary  2021-07-06  workflowr::wflow_git_commit(all = TRUE)

suppressPackageStartupMessages({
  library(Seurat)
  library(SingleCellExperiment)
})
set.seed(42)
download_dir <- here::here("data/raw_data/Mereu")
dir.create(download_dir, showWarnings = FALSE, recursive = TRUE)
dir.create(here::here("data/rds_raw"), showWarnings = FALSE, recursive = TRUE)

# download_dir is already an absolute path, so file.path() is enough here
file_location <- file.path(download_dir, "sce.all_classified.technologies.RData")
if (!file.exists(file_location)) {
  download.file("https://www.dropbox.com/s/i8mwmyymchx8mn8/sce.all_classified.technologies.RData?dl=0", file_location, method = "wget", extra = "--content-disposition")
}
load(file = file_location)
sce
class: SingleCellExperiment 
dim: 23381 20237 
metadata(0):
assays(2): counts logcounts
rownames(23381): TSPAN6 DPM1 ... RPL31P58 RP11-553E24.1
rowData names(0):
colnames(20237): 10X2x5K_64221_AAACCTGCACTTCGAA
  10X2x5K_64221_AAACCTGCAGTACACT ...
  SMARTseqFINAL_allLanes_TTGTCGTGTCTCGGAA
  SMARTseqFINAL_allLanes_TTGTCGTGTGATCCGA
colData names(3): nnet2 ident batch
reducedDimNames(2): UMAP PCA
mainExpName: NULL
altExpNames(0):
metadata <- as.data.frame(colData(sce))
metadata$nnet2 <- as.character(metadata$nnet2)
metadata$ident <- as.character(metadata$ident)
metadata$batch <- as.character(metadata$batch)
head(metadata)
                                     nnet2         ident    batch
10X2x5K_64221_AAACCTGCACTTCGAA     B cells       B cells Chromium
10X2x5K_64221_AAACCTGCAGTACACT CD4 T cells CD4 T cells 2 Chromium
10X2x5K_64221_AAACCTGTCCACTGGG CD4 T cells CD4 T cells 1 Chromium
10X2x5K_64221_AAACGGGAGAGCTTCT   HEK cells   HEK cells 2 Chromium
10X2x5K_64221_AAACGGGAGGTGGGTT   HEK cells   HEK cells 2 Chromium
10X2x5K_64221_AAACGGGCACACGCTG CD4 T cells CD4 T cells 1 Chromium
counts_matrix <- counts(sce)
counts_matrix <- as(object = counts_matrix, Class = "dgCMatrix")
common_cols <- intersect(rownames(metadata), colnames(counts_matrix))

counts_matrix <- counts_matrix[, common_cols]
metadata <- metadata[common_cols, ]


colnames(counts_matrix) <- paste0("cell-", colnames(counts_matrix))
rownames(metadata) <- colnames(counts_matrix)

dim(counts_matrix)
[1] 23381 20237
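The alignment step above can be sketched on toy data. This is a base R illustration only: the object names (toy_counts, toy_meta) and all values are hypothetical, and a plain matrix stands in for the sparse counts matrix.

```r
# Toy stand-ins for the counts matrix and the cell metadata (hypothetical values).
toy_counts <- matrix(1:6, nrow = 2,
                     dimnames = list(c("geneA", "geneB"), c("c1", "c2", "c3")))
toy_meta <- data.frame(batch = c("X", "Y", "Z"),
                       row.names = c("c3", "c1", "c4"))

# Keep only cells present in both objects, in one shared order.
shared <- intersect(rownames(toy_meta), colnames(toy_counts))
toy_counts <- toy_counts[, shared, drop = FALSE]
toy_meta <- toy_meta[shared, , drop = FALSE]

# Prefix the cell names and confirm that both objects still agree.
colnames(toy_counts) <- paste0("cell-", colnames(toy_counts))
rownames(toy_meta) <- colnames(toy_counts)
stopifnot(identical(colnames(toy_counts), rownames(toy_meta)))
```

An explicit check like the final stopifnot() is cheap and would catch a mismatch between cell names and metadata rows before the Seurat object is created.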
seu <- CreateSeuratObject(counts_matrix, meta.data = metadata, project = "Mereu_2021_scBenchmark_Rdata", min.cells = 1, min.features = 1)
seu
An object of class Seurat 
23381 features across 20237 samples within 1 assay 
Active assay: RNA (23381 features, 0 variable features)
nonumi.techs <- c("C1HT-medium", "C1HT-small", "ICELL8", "Smart-Seq2")
table(seu@meta.data$batch)

 C1HT-medium   C1HT-small     CEL-Seq2     Chromium Chromium(sn)        ddSEQ 
        2216         1606         1083         1604         1515         2109 
    Drop-Seq       ICELL8       inDrop     MARS-Seq   mcSCRB-Seq  Quartz-Seq2 
        2261         1927          686         1481         1684         1333 
  Smart-Seq2 
         732 
table(seu@meta.data$ident)

                  Ambiguous                     B cells 
                        165                        1562 
  CD14 and FCGR3A Monocytes              CD14 Monocytes 
                         55                         262 
CD14+ and FCGR3A+ Monocytes             CD14+ Monocytes 
                        849                        2113 
              CD4 T cells 1               CD4 T cells 2 
                        602                         270 
                 CD4+ cells                CD4+ T cells 
                        432                         797 
                CD8 T cells                CD8+ T cells 
                        140                          88 
        CD8+ T cells and NK           Cytotoxic T cells 
                       1336                        2885 
        Cytotoxic T cells 1         Cytotoxic T cells 2 
                       1022                         400 
           FCGR3A Monocytes           FCGR3A+ Monocytes 
                         37                         181 
                  HEK cells                 HEK cells 1 
                       1665                        1265 
                HEK cells 2                 HEK cells 3 
                        579                         132 
                 HEK cells1                  HEK cells3 
                        519                          58 
   NK and Cytotoxic T cells                    NK cells 
                       2244                         249 
                    unclear                     unknown 
                        294                          36 
table(seu@meta.data$nnet2)

          B cells   CD14+ Monocytes       CD4 T cells       CD8 T cells 
             1657              2782              5693              2522 
  Dendritic cells FCGR3A+ Monocytes         HEK cells    Megakaryocytes 
              274               897              4942                45 
         NK cells 
             1425 
Idents(seu) <- "ident"
hek <- subset(seu, idents = c("HEK cells", "HEK cells 2", "HEK cells1", "HEK cells 1", "HEK cells3", "HEK cells 3"))

table(hek@meta.data$ident)

  HEK cells HEK cells 1 HEK cells 2 HEK cells 3  HEK cells1  HEK cells3 
       1665        1265         579         132         519          58 
table(hek@meta.data$nnet2)

          B cells   CD14+ Monocytes       CD4 T cells       CD8 T cells 
               88               216               243                38 
  Dendritic cells FCGR3A+ Monocytes         HEK cells    Megakaryocytes 
                9                59              3526                 7 
         NK cells 
               32 
Idents(hek) <- "nnet2"
hek <- subset(hek, idents = c("HEK cells"))

table(hek@meta.data$ident)

  HEK cells HEK cells 1 HEK cells 2 HEK cells 3  HEK cells1  HEK cells3 
       1435         996         454         124         467          50 
table(hek@meta.data$nnet2)

HEK cells 
     3526 
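The two subsetting steps above keep a cell only when both annotations agree that it is a HEK cell: the ident label must be one of the HEK variants and the nnet2 label must be "HEK cells". A rough base R sketch of the same logic on a hypothetical four-cell data frame (toy is an assumed name, not an object from this analysis):

```r
# Hypothetical toy annotations; real values come from the Seurat metadata.
toy <- data.frame(
  ident = c("HEK cells 1", "HEK cells", "B cells", "HEK cells3"),
  nnet2 = c("HEK cells", "CD4 T cells", "B cells", "HEK cells"),
  stringsAsFactors = FALSE
)

# All spelling variants of the HEK label found in the ident column.
hek_labels <- c("HEK cells", "HEK cells 1", "HEK cells 2", "HEK cells 3",
                "HEK cells1", "HEK cells3")

# A cell is kept only if both annotations call it a HEK cell.
keep <- toy$ident %in% hek_labels & toy$nnet2 == "HEK cells"
toy_hek <- toy[keep, ]
toy_hek
```

In this toy case the second cell is dropped because nnet2 disagrees with ident, which mirrors the reduction seen in the tables above.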
Idents(hek) <- "batch"

hek.umi <- subset(hek, idents = nonumi.techs, invert = TRUE)
table(hek.umi@meta.data$batch)

    CEL-Seq2     Chromium Chromium(sn)        ddSEQ     Drop-Seq       inDrop 
         101          326           47          517          473           49 
    MARS-Seq   mcSCRB-Seq  Quartz-Seq2 
          88           74          269 
umi_techs <- sort(unique(hek.umi$batch))
umi_techs
[1] "CEL-Seq2"     "Chromium"     "Chromium(sn)" "ddSEQ"        "Drop-Seq"    
[6] "inDrop"       "MARS-Seq"     "mcSCRB-Seq"   "Quartz-Seq2" 
clean_named_techs <- c("CEL-seq2", "ChromiumV2", "ChromiumV2_sn", "ddSeq", "Drop-seq", "inDrops", "MARS-seq", "mcSCRB-seq", "Quartz-Seq2")

names(clean_named_techs) <- umi_techs
clean_named_techs
       CEL-Seq2        Chromium    Chromium(sn)           ddSEQ        Drop-Seq 
     "CEL-seq2"    "ChromiumV2" "ChromiumV2_sn"         "ddSeq"      "Drop-seq" 
         inDrop        MARS-Seq      mcSCRB-Seq     Quartz-Seq2 
      "inDrops"      "MARS-seq"    "mcSCRB-seq"   "Quartz-Seq2" 
hek_split <- SplitObject(hek.umi, split.by = "batch")

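for (given_tech in names(hek_split)) {
  # Use a separate name so the full `seu` object is not overwritten
  seu_tech <- hek_split[[given_tech]]
  # Percentage of counts from mitochondrial genes (symbols starting with "MT-")
  seu_tech[["percent.mt"]] <- PercentageFeatureSet(seu_tech, pattern = "^MT-")

  # Look up the cleaned technology name and write one RDS file per technology
  clean_tech <- clean_named_techs[[given_tech]]
  saveRDS(seu_tech, here::here("data/rds_raw", paste0("Mereu-HEK__", clean_tech, ".rds")))
  gc()
}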
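The name mapping used in the loop pairs raw and clean technology names by position, so it is only correct while the sorted raw labels line up with the hand-written vector. A hedged alternative, shown on a three-technology subset, spells out each pair explicitly. The names clean_lookup, raw_techs, and out_files are illustrative and do not appear in the original code.

```r
# Raw batch labels as they might appear in the data, in arbitrary order.
raw_techs <- c("inDrop", "Chromium", "CEL-Seq2")

# Explicit raw -> clean lookup, so correctness does not depend on sort order.
clean_lookup <- c(
  "CEL-Seq2" = "CEL-seq2",
  "Chromium" = "ChromiumV2",
  "inDrop"   = "inDrops"
)

# Fail early if any raw label lacks a clean name.
missing_techs <- setdiff(raw_techs, names(clean_lookup))
stopifnot(length(missing_techs) == 0)

# Build file names following the same pattern as the saveRDS() call.
out_files <- paste0("Mereu-HEK__", unname(clean_lookup[raw_techs]), ".rds")
out_files
```

With this form, adding or renaming a technology requires touching only one line, and an unmapped label stops the script instead of silently producing a wrong file name.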
sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] SingleCellExperiment_1.14.1 SummarizedExperiment_1.22.0
 [3] Biobase_2.52.0              GenomicRanges_1.44.0       
 [5] GenomeInfoDb_1.28.4         IRanges_2.26.0             
 [7] S4Vectors_0.30.2            BiocGenerics_0.38.0        
 [9] MatrixGenerics_1.4.3        matrixStats_0.61.0         
[11] SeuratObject_4.0.4          Seurat_4.0.5               
[13] workflowr_1.6.2            

loaded via a namespace (and not attached):
  [1] Rtsne_0.15             colorspace_2.0-2       deldir_1.0-6          
  [4] ellipsis_0.3.2         ggridges_0.5.3         rprojroot_2.0.2       
  [7] XVector_0.32.0         fs_1.5.2               spatstat.data_2.1-0   
 [10] leiden_0.3.9           listenv_0.8.0          ggrepel_0.9.1         
 [13] fansi_0.5.0            codetools_0.2-18       splines_4.1.2         
 [16] knitr_1.36             polyclip_1.10-0        jsonlite_1.7.2        
 [19] ica_1.0-2              cluster_2.1.2          png_0.1-7             
 [22] uwot_0.1.11            shiny_1.7.1            sctransform_0.3.2.9008
 [25] spatstat.sparse_2.0-0  compiler_4.1.2         httr_1.4.2            
 [28] assertthat_0.2.1       Matrix_1.4-0           fastmap_1.1.0         
 [31] lazyeval_0.2.2         later_1.3.0            htmltools_0.5.2       
 [34] tools_4.1.2            igraph_1.2.9           GenomeInfoDbData_1.2.6
 [37] gtable_0.3.0           glue_1.5.1             RANN_2.6.1            
 [40] reshape2_1.4.4         dplyr_1.0.7            Rcpp_1.0.7            
 [43] scattermore_0.7        jquerylib_0.1.4        vctrs_0.3.8           
 [46] nlme_3.1-152           lmtest_0.9-39          xfun_0.28             
 [49] stringr_1.4.0          globals_0.14.0         mime_0.12             
 [52] miniUI_0.1.1.1         lifecycle_1.0.1        irlba_2.3.5           
 [55] goftest_1.2-3          future_1.23.0          zlibbioc_1.38.0       
 [58] MASS_7.3-54            zoo_1.8-9              scales_1.1.1          
 [61] spatstat.core_2.3-2    promises_1.2.0.1       spatstat.utils_2.3-0  
 [64] RColorBrewer_1.1-2     yaml_2.2.1             reticulate_1.22       
 [67] pbapply_1.5-0          gridExtra_2.3          ggplot2_3.3.5         
 [70] sass_0.4.0             rpart_4.1-15           stringi_1.7.6         
 [73] bitops_1.0-7           rlang_0.4.12           pkgconfig_2.0.3       
 [76] evaluate_0.14          lattice_0.20-45        ROCR_1.0-11           
 [79] purrr_0.3.4            tensor_1.5             patchwork_1.1.1       
 [82] htmlwidgets_1.5.4      cowplot_1.1.1          tidyselect_1.1.1      
 [85] here_1.0.1             parallelly_1.29.0      RcppAnnoy_0.0.19      
 [88] plyr_1.8.6             magrittr_2.0.1         R6_2.5.1              
 [91] generics_0.1.1         DelayedArray_0.18.0    DBI_1.1.1             
 [94] mgcv_1.8-38            pillar_1.6.4           whisker_0.4           
 [97] fitdistrplus_1.1-6     RCurl_1.98-1.5         survival_3.2-13       
[100] abind_1.4-5            tibble_3.1.6           future.apply_1.8.1    
[103] crayon_1.4.2           KernSmooth_2.23-20     utf8_1.2.2            
[106] spatstat.geom_2.3-1    plotly_4.10.0          rmarkdown_2.11        
[109] grid_4.1.2             data.table_1.14.2      git2r_0.29.0          
[112] digest_0.6.29          xtable_1.8-4           tidyr_1.1.4           
[115] httpuv_1.6.3           munsell_0.5.0          viridisLite_0.4.0     
[118] bslib_0.3.1           
