Last updated: 2022-10-28

Checks: 6 passed, 1 warning

Knit directory: synovialscrnaseq/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


The R Markdown file has unstaged changes. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(20210105) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
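
A minimal sketch of what this check guards against, using only base R and the seed value reported above. The vector being subsampled is a made-up placeholder, not data from this analysis.

```r
# Placeholder population to subsample from (hypothetical, for illustration).
population <- 1:100

# Re-setting the same seed before each draw reproduces the same subsample.
set.seed(20210105)
first_draw <- sample(population, 5)

set.seed(20210105)
second_draw <- sample(population, 5)

identical(first_draw, second_draw)
```

Without the second call to set.seed(), the two draws would generally differ, which is why the seed is set before the R Markdown code runs.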

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 816d5c9. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    '/
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    .empty/
    Ignored:    analysis/.Rhistory
    Ignored:    analysis/iSEE_interactive_document.html
    Ignored:    code/test_files/
    Ignored:    data/Culemann/
    Ignored:    data/E-MTAB-8322/
    Ignored:    data/Synovial scRNA-seq samples - Sheet1.csv
    Ignored:    data/Zhang_top20_singlecell_cluster_markers_fromGithub.csv
    Ignored:    data/findMarkers_results.rds
    Ignored:    data/findMarkers_results_v2.rds
    Ignored:    data/info/
    Ignored:    data/syn_sce_tidy_filtered.rds
    Ignored:    data/syn_sce_tidy_hvg.rds
    Ignored:    data/syn_sce_tidy_hvg_cms.rds
    Ignored:    docs/
    Ignored:    output/Figures_Paper/
    Ignored:    output/Sample_summaries_RA_comparisons.rds
    Ignored:    output/Sample_summaries_direct_dissociation.rds
    Ignored:    output/Sample_summaries_exvivo_treatment.rds
    Ignored:    output/Suppl_Figure_4d.rds
    Ignored:    output/barcodes.txt
    Ignored:    output/count_matrix_unfiltered.mtx
    Ignored:    output/emptyDrops_result_v4.rds
    Ignored:    output/emptyDrops_result_v4_tmp.rds
    Ignored:    output/emptyDrops_result_v4tmptmp.rds
    Ignored:    output/entropies_fstat_v5_ec.rds
    Ignored:    output/entropies_fstat_v5_main.rds
    Ignored:    output/entropies_fstat_v5_mp.rds
    Ignored:    output/entropies_fstat_v5_sf.rds
    Ignored:    output/entropies_fstat_v5_tc.rds
    Ignored:    output/findMarkers_results_v5_ec.rds
    Ignored:    output/findMarkers_results_v5_main.rds
    Ignored:    output/findMarkers_results_v5_mp.rds
    Ignored:    output/findMarkers_results_v5_sf.rds
    Ignored:    output/findMarkers_results_v5_tc.rds
    Ignored:    output/findMarkers_results_v6.rds
    Ignored:    output/findMarkers_results_v6_ec.rds
    Ignored:    output/findMarkers_results_v6_main.rds
    Ignored:    output/findMarkers_results_v6_mp.rds
    Ignored:    output/findMarkers_results_v6_sf.rds
    Ignored:    output/findMarkers_results_v6_tc.rds
    Ignored:    output/genes.txt
    Ignored:    output/goana_results_v6_ec.rds
    Ignored:    output/goana_results_v6_mp.rds
    Ignored:    output/preprocessing_number_of_cells.rds
    Ignored:    output/syn_v4_sce_emptyDrops_invivo.rds
    Ignored:    output/syn_v4_swappedDrops_24300_after.rds
    Ignored:    output/syn_v4_swappedDrops_24300_before.rds
    Ignored:    output/syn_v4_swappedDrops_24793_after.rds
    Ignored:    output/syn_v4_swappedDrops_24793_before.rds
    Ignored:    output/syn_v5_annot_df_manual.rds
    Ignored:    output/syn_v5_cluster_cellid_match_invivo.rds
    Ignored:    output/syn_v5_clustering_lookup_invivo.rds
    Ignored:    output/syn_v5_clustering_lookup_multiple_invivo.rds
    Ignored:    output/syn_v5_res_da_Accute_inflammation_invivo.rds
    Ignored:    output/syn_v5_res_da_Diagnosis_invivo.rds
    Ignored:    output/syn_v5_res_da_Diagnosis_main_invivo.rds
    Ignored:    output/syn_v5_res_da_Lymphoid_folicles_invivo.rds
    Ignored:    output/syn_v5_res_da_Pathotype_invivo.rds
    Ignored:    output/syn_v5_res_da_Therapy_invivo.rds
    Ignored:    output/syn_v5_res_da_Vascularisation_bin_invivo.rds
    Ignored:    output/syn_v5_res_ds_Accute_inflammation_invivo.rds
    Ignored:    output/syn_v5_res_ds_Diagnosis_invivo.rds
    Ignored:    output/syn_v5_res_ds_Diagnosis_main_invivo.rds
    Ignored:    output/syn_v5_res_ds_Lymphoid_folicles_invivo.rds
    Ignored:    output/syn_v5_res_ds_Pathotype_invivo.rds
    Ignored:    output/syn_v5_res_ds_Therapy_invivo.rds
    Ignored:    output/syn_v5_res_ds_Vascularisation_bin_invivo.rds
    Ignored:    output/syn_v5_sce.rds
    Ignored:    output/syn_v5_sce_ec_invivo.rds
    Ignored:    output/syn_v5_sce_filtered_invivo.rds
    Ignored:    output/syn_v5_sce_hvg_cms_doublet_annot_manual_invivo.rds
    Ignored:    output/syn_v5_sce_hvg_cms_doublet_cmstest_invivo.rds
    Ignored:    output/syn_v5_sce_hvg_cms_doublet_invivo.rds
    Ignored:    output/syn_v5_sce_hvg_cms_doublet_subcluster_invivo.rds
    Ignored:    output/syn_v5_sce_hvg_invivo.rds
    Ignored:    output/syn_v5_sce_mp_invivo.rds
    Ignored:    output/syn_v5_sce_sf_invivo.rds
    Ignored:    output/syn_v5_sce_tc_invivo.rds
    Ignored:    output/syn_v5_vst_out_invivo.rds
    Ignored:    output/syn_v6_cluster_cellid_match_invivo.rds
    Ignored:    output/syn_v6_clustering_lookup_invivo.rds
    Ignored:    output/syn_v6_clustering_lookup_multiple_invivo.rds
    Ignored:    output/syn_v6_sce.rds
    Ignored:    output/syn_v6_sce_Figure8.rds
    Ignored:    output/syn_v6_sce_Figure8_dic_ls.rds
    Ignored:    output/syn_v6_sce_ec_invivo.rds
    Ignored:    output/syn_v6_sce_filtered_invivo.rds
    Ignored:    output/syn_v6_sce_hdf5/
    Ignored:    output/syn_v6_sce_hvg_cms_doublet_invivo.rds
    Ignored:    output/syn_v6_sce_hvg_cms_doublet_subcluster_invivo.rds
    Ignored:    output/syn_v6_sce_hvg_invivo.rds
    Ignored:    output/syn_v6_sce_hvg_marker_genes.rds
    Ignored:    output/syn_v6_sce_mp_invivo.rds
    Ignored:    output/syn_v6_sce_sf_invivo.rds
    Ignored:    output/syn_v6_sce_tc_invivo.rds
    Ignored:    output/syn_v6_sfig1.rds
    Ignored:    output/syn_v6_vst_out_invivo.rds
    Ignored:    output/syn_v7_sce.rds
    Ignored:    output/syn_v7_sce_filtered_invivo.rds
    Ignored:    output/syn_v7_sce_hvg_invivo.rds
    Ignored:    output/syn_v7_sfig1.rds
    Ignored:    output/syn_v7_vst_out_invivo.rds

Untracked files:
    Untracked:  analysis/scRNAseq_complete_01_preprocessing_comparison.Rmd
    Untracked:  analysis/test.Rmd
    Untracked:  code/rebuild_ezRun.R
    Untracked:  nonhosted_public/
    Untracked:  singRstudio.sh.bak

Unstaged changes:
    Modified:   analysis/scRNAseq_complete_01_preprocessing.Rmd
    Modified:   analysis/scRNAseq_complete_02_HVG_Dimred.Rmd
    Modified:   analysis/scRNAseq_complete_03-2_Subcelltypes_processing.Rmd
    Modified:   analysis/scRNAseq_complete_03-3_Subcelltypes_clustering.Rmd
    Modified:   analysis/scRNAseq_complete_03-4_Subcelltypes_clustering_walktrap.Rmd
    Modified:   analysis/scRNAseq_complete_03_Batch_Clustering_Doublets.Rmd
    Modified:   analysis/scRNAseq_complete_Figures.Rmd
    Modified:   analysis/write_tsv.Rmd

Staged changes:
    Modified:   analysis/scRNAseq_complete_00_ambient_RNA.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/scRNAseq_complete_03_Batch_Clustering_Doublets.Rmd) and HTML (public/scRNAseq_complete_03_Batch_Clustering_Doublets.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File  Version  Author       Date        Message
html  3443cc6  Reto Gerber  2022-04-25  Update
html  b5b139f  Reto Gerber  2022-03-29  Update analysis
Rmd   7d99571  Reto Gerber  2022-03-21  update analysis
html  7d99571  Reto Gerber  2022-03-21  update analysis
Rmd   9133ed1  Reto Gerber  2022-03-04  update to v6
html  9133ed1  Reto Gerber  2022-03-04  update to v6
html  f2e34e1  Reto Gerber  2021-07-29  Update navbar
Rmd   ee1face  Reto Gerber  2021-07-29  Add missing scripts
html  222b0d1  Reto Gerber  2021-07-29  Update analysis to v5
Rmd   a18fb61  retogerber   2021-05-26  workflow with no cultured and no low quality samples
html  a18fb61  retogerber   2021-05-26  workflow with no cultured and no low quality samples
Rmd   a301681  retogerber   2021-05-19  update complete analysis, new samples added
html  a301681  retogerber   2021-05-19  update complete analysis, new samples added
Rmd   1d92bf1  retogerber   2021-05-03  update nearly complete data workflow
html  1d92bf1  retogerber   2021-05-03  update nearly complete data workflow

Set up

suppressPackageStartupMessages({
  library(dplyr)
  library(ggplot2)
  library(purrr)
  library(stringr)
  library(SummarizedExperiment)
  library(SingleCellExperiment)
  library(scater)
  library(scran)
  library(igraph)
  library(scuttle)
  library(tidySingleCellExperiment)
  library(bluster)
  library(CellID)
  library(BiocParallel)
})
n_workers <- 20
RhpcBLASctl::blas_set_num_threads(n_workers)
bpparam <- BiocParallel::MulticoreParam(workers=n_workers, RNGseed = 123)


here::here()
[1] "/home/retger/Synovial/synovialscrnaseq"
remove_low_quality_samples <- TRUE
analysis_version <- 7
use_vst <- TRUE

set.seed(100)

Load data from dimension reduction

tmpfilename <- paste0("syn_v",analysis_version,"_sce_hvg",dplyr::if_else(remove_low_quality_samples, "_invivo",""),".rds")
syn_sce_tidy_hvg <- readRDS(file = here::here("output",tmpfilename))

colnames(syn_sce_tidy_hvg) <-
  paste0(syn_sce_tidy_hvg$Sample, '.', syn_sce_tidy_hvg$Barcode)
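
The loading step above builds the input file name from the analysis settings and then renames the cells as Sample.Barcode. The following base-R sketch illustrates both ideas. The small `cell_meta` table and its barcodes are hypothetical stand-ins for the real column data, and `if`/`else` is used in place of `dplyr::if_else` only to keep the example dependency-free.

```r
# Settings as defined in the set-up chunk.
analysis_version <- 7
remove_low_quality_samples <- TRUE

# Same naming scheme as above, written with base R only.
suffix <- if (remove_low_quality_samples) "_invivo" else ""
tmpfilename <- paste0("syn_v", analysis_version, "_sce_hvg", suffix, ".rds")
tmpfilename

# Hypothetical per-cell metadata: the same barcode can occur in two samples.
cell_meta <- data.frame(
  Sample  = c("Syn_Bio_023", "Syn_Bio_023", "Syn_Bio_026"),
  Barcode = c("AAACCTGAGAAACCAT-1", "AAACCTGAGAAACCGC-1", "AAACCTGAGAAACCAT-1")
)

# Prefixing the barcode with the sample name gives one identifier per cell.
cell_ids <- paste0(cell_meta$Sample, ".", cell_meta$Barcode)

# Extra safeguard (not in the original code): fail early on duplicated IDs.
stopifnot(!anyDuplicated(cell_ids))
```

With these settings the file name evaluates to syn_v7_sce_hvg_invivo.rds, which also appears in the ignored-files list above.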

Batch effects

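Normalize across samples with batchelor::multiBatchNorm and correct batch effects with batchelor::fastMNN, treating each sample as a batch. Evaluate the correction with the cell-specific mixing score (CMS) from CellMixS.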

syn_sce_tidy_hvg_cms <- syn_sce_tidy_hvg
assay(syn_sce_tidy_hvg_cms, "vstresiduals") <- NULL
rm(syn_sce_tidy_hvg)
gc()
            used   (Mb) gc trigger    (Mb)   max used    (Mb)
Ncells   8384620  447.8   12671118   676.8   12461602   665.6
Vcells 879573175 6710.7 2528707220 19292.6 2633225397 20090.0
bpstart(bpparam)
temp_sce <- batchelor::multiBatchNorm(syn_sce_tidy_hvg_cms, 
                                      batch=syn_sce_tidy_hvg_cms$Sample,
                                      subset.row = rownames(syn_sce_tidy_hvg_cms)[rowData(syn_sce_tidy_hvg_cms)[["is_hvg"]]],
                                      normalize.all=TRUE,
                                      BPPARAM = bpparam)
bpstop(bpparam)
if(remove_low_quality_samples){
  merge_order <- list(
                    list("Syn_Bio_053","Syn_Bio_054A",
                         "Syn_Bio_062","Syn_Bio_064"),#Peripheral_Spondyloarthritis
                    list("Syn_Bio_023","Syn_Bio_079","Syn_Bio_092"),#Psoriatic_Arthritis
                    list("Syn_Bio_083","Syn_Bio_084"),#Rheumatoid_Arthritis
                    list("Syn_Bio_074"),#Seronegative_Polyarthritis
                    list("Syn_Bio_026","Syn_Bio_028",
                         list("Syn_Bio_077b","Syn_Bio_077a"),
                         "Syn_Bio_049","Syn_Bio_050", "Syn_Bio_081",
                         "Syn_Bio_093","Syn_Bio_096",
                         list("Syn_Bio_098a","Syn_Bio_098b")
                         ),#Seronegative_Rheumatoid_Arthritis
                    list("Syn_Bio_091","Syn_Bio_099"),#Spondyloarthritis
                    list("Syn_Bio_087"),#To_be_determined
                    list("Syn_Bio_078")#Undiff._Polyarthritis
                    )

} else{
  merge_order <- list(list("Syn_Bio_086"),#special
                      list("Syn_Bio_053","Syn_Bio_054A",
                           "Syn_Bio_062","Syn_Bio_064"),#Peripheral_Spondyloarthritis
                      list("Syn_Bio_023","Syn_Bio_079","Syn_Bio_092"),#Psoriatic_Arthritis
                      list("Syn_Bio_083","Syn_Bio_084"),#Rheumatoid_Arthritis
                      list("Syn_Bio_074"),#Seronegative_Polyarthritis
                      list("Syn_Bio_026","Syn_Bio_028",
                           list("Syn_Bio_077b","Syn_Bio_077a"),
                           "Syn_Bio_049","Syn_Bio_050", "Syn_Bio_081",
                           "Syn_Bio_093","Syn_Bio_096",
                           list("Syn_Bio_098a","Syn_Bio_098b"),
                           list(
                             list("Syn_Bio_055_DMSO","Syn_Bio_072_DMSO","Syn_Bio_094_DMSO"),
                             list("Syn_Bio_055_Tofa","Syn_Bio_072_Tofa","Syn_Bio_094_Tofa"))
                           ),#Seronegative_Rheumatoid_Arthritis
                      list("Syn_Bio_075","Syn_Bio_091","Syn_Bio_099"),#Spondyloarthritis
                      list("Syn_Bio_059","Syn_Bio_080","Syn_Bio_089","Syn_Bio_095"),#Systemic_Sclerosis
                      list("Syn_Bio_087"),#To_be_determined
                      list("Syn_Bio_031","Syn_Bio_078")#Undiff._Polyarthritis
                      )
}

stopifnot(all(unlist(merge_order) %in% unique(syn_sce_tidy_hvg_cms$Sample)))
stopifnot(all(unique(syn_sce_tidy_hvg_cms$Sample) %in% unlist(merge_order)))
bpstart(bpparam)
temp_sce <- batchelor::fastMNN(temp_sce, batch=temp_sce$Sample, prop.k=0.02, merge.order = merge_order,
                               subset.row = rownames(temp_sce)[rowData(temp_sce)[["is_hvg"]]], correct.all=TRUE,
                               BPPARAM = bpparam)
bpstop(bpparam)

assay(syn_sce_tidy_hvg_cms, "reconstructed") <- assay(temp_sce, "reconstructed")
reducedDim(syn_sce_tidy_hvg_cms, "corrected") <- reducedDim(temp_sce, "corrected")
rm(temp_sce)
gc()
            used   (Mb) gc trigger    (Mb)   max used    (Mb)
Ncells   8650342  462.0   12671118   676.8   12671118   676.8
Vcells 891560730 6802.1 3648414032 27835.2 4559306385 34784.8
set.seed(100)
bpstart(bpparam)
syn_sce_tidy_hvg_cms <- runUMAP(syn_sce_tidy_hvg_cms, dimred = "corrected", name = "UMAP_corrected",
                                BPPARAM = bpparam)
bpstop(bpparam)
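
The two stopifnot() checks on merge_order above verify that the nested merge tree and the samples present in the data agree in both directions. The sketch below shows the same idea on a shortened, hypothetical tree. `samples_in_data` stands in for unique(syn_sce_tidy_hvg_cms$Sample), and the duplicate check is an added safeguard that is not part of the original code.

```r
# Hypothetical, shortened merge tree (nested lists group samples together).
merge_order <- list(
  list("Syn_Bio_083", "Syn_Bio_084"),
  list("Syn_Bio_026", list("Syn_Bio_077b", "Syn_Bio_077a"))
)

# Hypothetical stand-in for the sample names found in the data.
samples_in_data <- c("Syn_Bio_026", "Syn_Bio_077a", "Syn_Bio_077b",
                     "Syn_Bio_083", "Syn_Bio_084")

# unlist() flattens the nesting into a character vector of sample names.
tree_samples <- unlist(merge_order)

# setequal() combines the two %in% checks into one statement.
stopifnot(setequal(tree_samples, samples_in_data))

# Added safeguard: no sample should be listed twice in the tree.
stopifnot(!anyDuplicated(tree_samples))

# setdiff() names the offending samples when the check would fail,
# here simulated by adding a sample that is absent from the tree.
missing_from_tree <- setdiff(c(samples_in_data, "Syn_Bio_099"), tree_samples)
missing_from_tree
```

Reporting the result of setdiff() before stopping can make a failed check easier to debug than a bare stopifnot() message.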

Dimred of corrected

ndims <- intrinsicDimension::maxLikGlobalDimEst(as.matrix(reducedDim(syn_sce_tidy_hvg_cms, "corrected")), k=20)
reducedDim(syn_sce_tidy_hvg_cms,"corrected_reduced") <- reducedDim(syn_sce_tidy_hvg_cms,"corrected")[,seq_len(ceiling(ndims$dim.est))]
reducedDimNames(syn_sce_tidy_hvg_cms)
 [1] "PCA"               "PCA_vst"           "PCA_vst_reduced"  
 [4] "UMAP_vst"          "UMAP_vst_reduced"  "PCA_reduced"      
 [7] "UMAP"              "corrected"         "UMAP_corrected"   
[10] "corrected_reduced"
ncol(reducedDim(syn_sce_tidy_hvg_cms,"corrected_reduced"))
[1] 19
set.seed(100)
syn_sce_tidy_hvg_cms <- syn_sce_tidy_hvg_cms %>% 
  runUMAP(name = "UMAP_corrected_reduced", dimred = "corrected_reduced")
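
The corrected_reduced embedding above keeps the leading columns of the corrected embedding, with the number of columns set to the intrinsic dimension estimate rounded up. A minimal base-R sketch of that truncation follows. The random matrix and the value 18.3 are hypothetical placeholders for the corrected embedding and for ndims$dim.est.

```r
set.seed(1)

# Hypothetical embedding: 100 cells x 50 components.
embedding <- matrix(rnorm(100 * 50), nrow = 100)

# Hypothetical fractional estimate, standing in for ndims$dim.est.
dim_est <- 18.3

# Round up so that a fractional estimate never drops a needed component.
keep <- seq_len(ceiling(dim_est))

# drop = FALSE keeps a matrix even if only a single column were selected.
reduced <- embedding[, keep, drop = FALSE]
dim(reduced)
```

The original code omits drop = FALSE; it only matters in the edge case where the estimate rounds up to a single dimension.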
cat("### Dimred plots {.tabset}\n\n")

Dimred plots

cat("#### corrected PCA\n\n")

corrected PCA

plotReducedDim(syn_sce_tidy_hvg_cms, "corrected", colour_by = "Sample")

Version Author Date
7d99571 Reto Gerber 2022-03-21
9133ed1 Reto Gerber 2022-03-04
222b0d1 Reto Gerber 2021-07-29
a18fb61 retogerber 2021-05-26
a301681 retogerber 2021-05-19
1d92bf1 retogerber 2021-05-03
cat("\n\n#### uncorrected PCA\n\n")

uncorrected PCA

plotReducedDim(syn_sce_tidy_hvg_cms, "PCA", colour_by = "Sample")

Version Author Date
7d99571 Reto Gerber 2022-03-21
9133ed1 Reto Gerber 2022-03-04
222b0d1 Reto Gerber 2021-07-29
a18fb61 retogerber 2021-05-26
a301681 retogerber 2021-05-19
1d92bf1 retogerber 2021-05-03
if(use_vst){
  cat("\n\n#### uncorrected PCA vst\n\n")
  plotReducedDim(syn_sce_tidy_hvg_cms, "PCA_vst", colour_by = "Sample")
}

uncorrected PCA vst

Version Author Date
7d99571 Reto Gerber 2022-03-21
9133ed1 Reto Gerber 2022-03-04
222b0d1 Reto Gerber 2021-07-29
cat("\n\n#### corrected dimred PCA\n\n")

corrected dimred PCA

plotReducedDim(syn_sce_tidy_hvg_cms, "corrected_reduced", colour_by = "Sample")

Version Author Date
7d99571 Reto Gerber 2022-03-21
9133ed1 Reto Gerber 2022-03-04
222b0d1 Reto Gerber 2021-07-29
cat("\n\n#### corrected UMAP\n\n")

corrected UMAP

plotReducedDim(syn_sce_tidy_hvg_cms, "UMAP_corrected", colour_by = "Sample")

Version Author Date
7d99571 Reto Gerber 2022-03-21
9133ed1 Reto Gerber 2022-03-04
222b0d1 Reto Gerber 2021-07-29
cat("\n\n#### uncorrected UMAP\n\n")

uncorrected UMAP

plotReducedDim(syn_sce_tidy_hvg_cms, "UMAP", colour_by = "Sample")

Version Author Date
7d99571 Reto Gerber 2022-03-21
9133ed1 Reto Gerber 2022-03-04
222b0d1 Reto Gerber 2021-07-29
if(use_vst){
  cat("\n\n#### uncorrected UMAP vst\n\n")
  plotReducedDim(syn_sce_tidy_hvg_cms,"UMAP_vst", colour_by = "Sample")
}

uncorrected UMAP vst

Version Author Date
7d99571 Reto Gerber 2022-03-21
9133ed1 Reto Gerber 2022-03-04
222b0d1 Reto Gerber 2021-07-29
cat("\n\n#### corrected dimred UMAP\n\n")

corrected dimred UMAP

plotReducedDim(syn_sce_tidy_hvg_cms, "UMAP_corrected_reduced", colour_by = "Sample")

Version Author Date
7d99571 Reto Gerber 2022-03-21
9133ed1 Reto Gerber 2022-03-04
222b0d1 Reto Gerber 2021-07-29
cat("\n\n### {-}")

CMS

set.seed(123)
bpstart(bpparam)
syn_sce_tidy_hvg_cms <- CellMixS::cms(syn_sce_tidy_hvg_cms, k=300, group = "Sample",
                            dim_red = "PCA", res_name = "unaligned",
                            BPPARAM = bpparam)
bpstop(bpparam)
bpstart(bpparam)
syn_sce_tidy_hvg_cms <- CellMixS::cms(syn_sce_tidy_hvg_cms, k=300, group = "Sample",
                            dim_red = "corrected", res_name = "MNN",
                            BPPARAM = bpparam)
bpstop(bpparam)
CellMixS::visHist(syn_sce_tidy_hvg_cms)

Version Author Date
9133ed1 Reto Gerber 2022-03-04
222b0d1 Reto Gerber 2021-07-29
cat("### UMAP cms {.tabset}\n\n")

UMAP cms

cat("#### corrected UMAP\n\n")

corrected UMAP

plotReducedDim(syn_sce_tidy_hvg_cms, "UMAP_corrected", colour_by = "cms.MNN")

Version Author Date
9133ed1 Reto Gerber 2022-03-04
222b0d1 Reto Gerber 2021-07-29
cat("\n\n#### uncorrected UMAP\n\n")

uncorrected UMAP

plotReducedDim(syn_sce_tidy_hvg_cms, "UMAP", colour_by = "cms.unaligned")

Version Author Date
9133ed1 Reto Gerber 2022-03-04
222b0d1 Reto Gerber 2021-07-29
cat("\n\n### {-}")

CellID

Main Celltype

panglao <- readr::read_tsv("https://panglaodb.se/markers/PanglaoDB_markers_27_Mar_2020.tsv.gz")

── Column specification ────────────────────────────────────────────────────────
cols(
  species = col_character(),
  `official gene symbol` = col_character(),
  `cell type` = col_character(),
  nicknames = col_character(),
  `ubiquitousness index` = col_double(),
  `product description` = col_character(),
  `gene type` = col_character(),
  `canonical marker` = col_double(),
  `germ layer` = col_character(),
  organ = col_character(),
  sensitivity_human = col_double(),
  sensitivity_mouse = col_double(),
  specificity_human = col_double(),
  specificity_mouse = col_double()
)
panglao$organ %>% unique
 [1] "Pancreas"           "Connective tissue"  "Brain"             
 [4] "Lungs"              "Smooth muscle"      "Immune system"     
 [7] "Epithelium"         "Heart"              "Liver"             
[10] "Adrenal glands"     "GI tract"           "Reproductive"      
[13] "Kidney"             "Zygote"             "Vasculature"       
[16] "Embryo"             "Blood"              "Thyroid"           
[19] NA                   "Bone"               "Skin"              
[22] "Mammary gland"      "Eye"                "Skeletal muscle"   
[25] "Olfactory system"   "Parathyroid glands" "Oral cavity"       
[28] "Thymus"             "Placenta"           "Urinary bladder"   
# restrict the analysis to gene signatures from the organs selected below
panglao_sub <- panglao %>% filter(organ %in% c("Connective tissue","Epithelium","Blood","Immune system","Vasculature"))

# restrict to gene signatures annotated for human (species contains "Hs")
panglao_sub <- panglao_sub %>%  filter(str_detect(species,"Hs"))

# keep only the main cell types of interest
panglao_sub <- panglao_sub %>%  filter(`cell type` %in% c("Dendritic cells","B cells","Fibroblasts","Macrophages","Monocytes","Mast cells","Neutrophils","NK cells","T cells","Endothelial cells","Pericytes"))

panglao_sub <- panglao_sub %>%  
  group_by(`cell type`) %>%  
  summarise(geneset = list(`official gene symbol`))
pancreas_gs <- setNames(panglao_sub$geneset, panglao_sub$`cell type`)
print(pancreas_gs)
$`B cells`
 [1] "CD2"       "CD5"       "MS4A1"     "CR2"       "CD22"      "FCER2"    
 [7] "CD40"      "CD69"      "CD70"      "CD79A"     "CD79B"     "CD80"     
[13] "CD86"      "TNFRSF9"   "SDC1"      "TNFSF4"    "TNFRSF13B" "TNFRSF13C"
[19] "PDCD1"     "IGHD"      "IGHM"      "RASGRP3"   "HLA-DRA"   "LTB"      
[25] "HLA-DQA1"  "FLI1"      "CD14"      "SEMA6D"    "LAIR1"     "IFIT3"    
[31] "IGLL1"     "DNTT"      "MME"       "SPN"       "CD19"      "CD24"     
[37] "CD27"      "B3GAT1"    "CD72"      "MUM1"      "PAX5"      "JCHAIN"   
[43] "MZB1"      "LY6D"      "FCMR"      "BANK1"     "EDEM1"     "VPREB3"   
[49] "POU2AF1"   "CRELD2"    "DERL3"     "RALGPS2"   "FCHSD2"    "POLD4"    
[55] "TNFRSF17"  "HVCN1"     "FCRLA"     "EDEM2"     "BLNK"      "TXNDC11"  
[61] "BTLA"      "SMAP2"     "FKBP11"    "SEC61A1"   "SPCS3"     "SPIB"     
[67] "EAF2"      "CXCR4"     "BIRC3"     "IGLC2"     "IGLC3"     "IGLC1"    
[73] "IL21R"     "IGKC"      "VPREB1"    "LRMP"      "KLHL6"     "SLAMF6"   
[79] "FAM129C"   "BST1"      "MSH5"      "DOK3"      "BACH2"     "PXK"      
[85] "IGHG1"     "IGHG3"     "IGHG4"     "CD38"      "PTPRC"     "EBF1"     
[91] "BCL11A"    "CCR7"      "CD55"      "CD74"      "CD52"      "TLR9"     
[97] "SWAP70"    "HMGA1"    

$`Dendritic cells`
  [1] "IL6"      "CD86"     "CD83"     "CD1A"     "CR2"      "TLR9"    
  [7] "CD1C"     "CD209"    "LAMP3"    "CD1B"     "TREM2"    "FABP4"   
 [13] "S100A9"   "ARG1"     "HLA-DRA"  "HLA-DQA1" "HLA-DMB"  "HLA-DMA" 
 [19] "HLA-DQB1" "CLEC10A"  "HLA-DRB1" "HLA-DPA1" "HLA-DPB1" "DNASE1L3"
 [25] "CLEC9A"   "LILRB2"   "ETV6"     "CD163"    "CXCR4"    "CXCL8"   
 [31] "VSIG4"    "NR4A3"    "CCR7"     "TRAF1"    "RELB"     "BATF3"   
 [37] "CCL22"    "SLAMF7"   "XCR1"     "CXCL16"   "SCIMP"    "FCGR2B"  
 [43] "FGD2"     "RAB7B"    "NAAA"     "HCK"      "CD180"    "HFE"     
 [49] "CCR2"     "RYR1"     "ITGAE"    "SEMA4A"   "DPP4"     "SLAMF8"  
 [55] "CXCR3"    "BTLA"     "FLT3"     "TLR3"     "ITGAX"    "GPR132"  
 [61] "ADAM19"   "AP1S3"    "ASS1"     "ADGRG5"   "GPR68"    "KIT"     
 [67] "KMO"      "P2RY10"   "RAB30"    "SEPT6"    "ZBTB46"   "S100A4"  
 [73] "CLEC7A"   "AIF1"     "LST1"     "CTSS"     "IRF8"     "ADGRE1"  
 [79] "CCL17"    "CD14"     "CD207"    "CD8A"     "CX3CR1"   "ITGAM"   
 [85] "LY75"     "PDCD1LG2" "PTPRC"    "SIRPA"    "FCGR3A"   "FTL"     
 [91] "SERPINA1" "AXL"      "PPP1R14A" "SIGLEC6"  "CD22"     "DAB2"    
 [97] "S100A8"   "VCAN"     "LYZ"      "ANXA1"    "FCER1A"   "C1ORF54" 
[103] "CADM1"    "CAMK2D"   "LGALS3"   "NAPSA"    "PLBD1"    "RNASE6"  
[109] "PLAC8"    "H2AFY"    "SLC11A1"  "PDPN"     "S100B"    "CD28"    
[115] "PPL"      "SLURP1"   "HLA-A"    "HLA-B"    "HLA-C"    "HLA-DRB5"
[121] "SERPINB9"

$`Endothelial cells`
  [1] "PECAM1"   "ICAM1"    "ITGB3"    "SELE"     "VCAM1"    "MCAM"    
  [7] "PROCR"    "TEK"      "FLT4"     "APLN"     "VWF"      "NOS3"    
 [13] "THBD"     "PLVAP"    "ACKR1"    "SLCO1C1"  "TMEM100"  "ADGRF5"  
 [19] "ABCG2"    "PODXL"    "NOSTRIN"  "MFSD2A"   "ACVRL1"   "AQP1"    
 [25] "MYLK"     "RASIP1"   "FLI1"     "TIE1"     "APLNR"    "NRP2"    
 [31] "ADAMTS1"  "RPRM"     "FABP4"    "GPIHBP1"  "FHL2"     "LOX"     
 [37] "KLK1"     "ADORA2A"  "ARAP3"    "ARHGEF15" "CARD10"   "CLEC14A" 
 [43] "DLL4"     "ESM1"     "GIMAP5"   "GJA4"     "MMRN2"    "NOTCH4"  
 [49] "NPR1"     "PRKCH"    "RASGRP3"  "ROBO4"    "SCARF1"   "SOX18"   
 [55] "SOX7"     "SPNS2"    "THSD1"    "APOLD1"   "EMP1"     "CD36"    
 [61] "RNASE1"   "CTGF"     "HYAL2"    "CLEC4G"   "GPR182"   "F8"      
 [67] "RBP7"     "CALCRL"   "FOXF1"    "CASZ1"    "AQP7"     "TCF15"   
 [73] "CD300LG"  "BTNL9"    "MEOX2"    "ERG"      "HEXIM1"   "GLYCAM1" 
 [79] "CD55"     "MMRN1"    "C7"       "RAMP3"    "VEGFC"    "GJA5"    
 [85] "HEY1"     "RND1"     "BDP1"     "CD46"     "MEOX1"    "CCL19"   
 [91] "MADCAM1"  "CYP1B1"   "IRX3"     "BIRC2"    "LYVE1"    "SEMA3D"  
 [97] "EMCN"     "WFDC1"    "ADGRL4"   "VWA1"     "ECE1"     "PTPRB"   
[103] "CLDN5"    "TBX1"     "SEMA7A"   "FOXF2"    "PDGFB"    "ECSCR"   
[109] "ELK3"     "CDH5"     "PLEC"     "STAB1"    "TGFBR2"   "CD93"    
[115] "CXCL1"    "RGS5"     "SLC7A5"   "ENG"      "KDR"      "SLC2A1"  
[121] "EGFL7"    "FLT1"     "EPAS1"    "EDNRB"    "KCNJ8"    "CD82"    
[127] "CHST1"    "PLAC8"    "TSPAN8"   "ETS1"     "CD34"     "PDPN"    
[133] "PROX1"    "EHD3"     "SRGN"     "S100A10"  "CLIC4"    "USHBP1"  
[139] "MYF6"     "OIT3"     "IL1A"     "BMP2"     "C1QTNF1"  "PCDH12"  
[145] "DPP4"     "IGFBP7"   "PALMD"    "POSTN"    "BMX"      "SLC38A5" 
[151] "XDH"      "SPARC"    "MGLL"     "SLC9A3R2" "RGCC"     "ICAM2"   
[157] "MGP"      "SPARCL1"  "TM4SF1"   "ID1"      "ADIRF"    "CD9"     
[163] "SRPX"     "ID3"      "CAV1"     "GNG11"    "HSPG2"    "CCL14"   
[169] "CLEC1B"   "FCN2"     "S100A13"  "FCN3"     "CRHBP"    "IFI27"   
[175] "CCL23"    "SGK1"     "DNASE1L3" "LIFR"     "PCAT19"   "CDKN1C"  
[181] "INMT"     "PTGDS"    "TIMP3"    "GPM6A"    "FAM167B"  "LTC4S"   
[187] "STAB2"   

$Fibroblasts
  [1] "IL1R1"     "FAP"       "FLI1"      "CELA1"     "LOX"       "PDGFRB"   
  [7] "P4HA1"     "UCP2"      "CCR2"      "ITGAL"     "FGR"       "HCK"      
 [13] "TNFRSF1B"  "PRKCD"     "ENO3"      "ABI3"      "TREML4"    "PIP4K2A"  
 [19] "CD300E"    "SERPINB10" "CTHRC1"    "TBX18"     "COL15A1"   "GJB2"     
 [25] "IL34"      "EDN3"      "SLC6A13"   "VTN"       "ITIH5"     "LUM"      
 [31] "DPT"       "POSTN"     "PENK"      "MMP14"     "COL6A2"    "FABP4"    
 [37] "ASPN"      "ANGPTL2"   "EFEMP1"    "SCARA5"    "IGFBP3"    "COPZ2"    
 [43] "DPEP1"     "ADAMTS5"   "COL5A1"    "CD248"     "PI16"      "PAMR1"    
 [49] "TNXB"      "MMP2"      "COL14A1"   "CLEC3B"    "IGFBP6"    "COL5A2"   
 [55] "FBN1"      "MFAP5"     "FKBP10"    "PALLD"     "WIF1"      "SNHG18"   
 [61] "CDH11"     "PTCH1"     "ARAP1"     "FBLN2"     "IGF1"      "PRRX1"    
 [67] "FKBP7"     "OAF"       "COL6A3"    "CTSK"      "DKK1"      "C1S"      
 [73] "RARRES2"   "GREM1"     "SPON2"     "TCF21"     "PCSK6"     "COL8A1"   
 [79] "ENTPD2"    "CXCL8"     "CXCL3"     "IL6"       "CYP1B1"    "COL13A1"  
 [85] "ADAMTS10"  "CCL11"     "ADAM33"    "COL4A3"    "COL4A4"    "LAMA2"    
 [91] "ACKR3"     "CD55"      "FBLN7"     "FIBIN"     "THBS2"     "NOV"      
 [97] "PTX3"      "MMP3"      "LRRK1"     "HGF"       "FRZB"      "COL12A1"  
[103] "COL7A1"    "MEOX1"     "PRG4"      "PKD2"      "CCL19"     "NNMT"     
[109] "FOXF1"     "HAS1"      "CTGF"      "ERCC1"     "WISP1"     "TWIST2"   
[115] "RIPK3"     "DDR2"      "ELN"       "FN1"       "HHIP"      "FMO2"     
[121] "COL1A2"    "COL3A1"    "VIM"       "FSTL1"     "GSN"       "SPARC"    
[127] "S100A4"    "NT5E"      "COL1A1"    "MGP"       "NOX4"      "THY1"     
[133] "CD40"      "SERPINH1"  "CD44"      "PDGFRA"    "EN1"       "DCN"      
[139] "CEBPB"     "EGR1"      "FOSB"      "FOSL2"     "HIF1A"     "KLF2"     
[145] "KLF4"      "KLF6"      "KLF9"      "NFAT5"     "NFATC1"    "NFKB1"    
[151] "NR4A1"     "NR4A2"     "PBX1"      "RUNX1"     "STAT3"     "TCF4"     
[157] "ZEB2"      "LAMC1"     "MEDAG"     "LAMB1"     "DKK3"      "TBX20"    
[163] "MDK"       "GSTM5"     "NGF"       "VEGFA"     "FGF2"      "P4HTM"    
[169] "CKAP4"     "INMT"      "CXCL14"   

$Macrophages
  [1] "ITGAL"    "ITGAM"    "CD14"     "FUT4"     "FCGR3A"   "CD33"    
  [7] "FCGR1A"   "CD80"     "LILRB4"   "CD86"     "CD163"    "CCR5"    
 [13] "TLR2"     "TLR4"     "ADGRE1"   "GPR34"    "TREM2"    "FABP4"   
 [19] "S100A8"   "CPM"      "CHIT1"    "F13A1"    "CX3CR1"   "CXCL16"  
 [25] "TGFBR1"   "SLAMF9"   "SCIMP"    "LILRA5"   "CD83"     "C3AR1"   
 [31] "STAB1"    "MRC1"     "PARP14"   "FGD2"     "RAB7B"    "RBPJ"    
 [37] "SLCO2B1"  "NAAA"     "MARCH1"   "EGLN3"    "JAML"     "FGL2"    
 [43] "GPNMB"    "CLEC4D"   "ADAM8"    "ARL11"    "MMP12"    "LYVE1"   
 [49] "PLTP"     "VSIG4"    "MS4A4A"   "MS4A6A"   "FPR1"     "CD180"   
 [55] "GDF15"    "RAB20"    "HFE"      "TNF"      "CCR2"     "SNX20"   
 [61] "FMNL1"    "GPR132"   "SLAMF7"   "NCEH1"    "CCL24"    "C5AR1"   
 [67] "CD300A"   "CXCL2"    "CCL7"     "CCL2"     "IL1B"     "IRF5"    
 [73] "AHR"      "MYO1G"    "DUSP5"    "GPR171"   "CCR7"     "DNASE1L3"
 [79] "CXCL1"    "SAMSN1"   "NR4A3"    "CCL22"    "S100A4"   "MMP9"    
 [85] "HILPDA"   "NRP2"     "SLC37A2"  "CTSK"     "CD36"     "LPCAT2"  
 [91] "HPGDS"    "IFNAR2"   "MS4A7"    "SLC11A1"  "HPGD"     "CCL3"    
 [97] "CLEC7A"   "CD5L"     "CCL5"     "CYTH4"    "CD3E"     "CD19"    
[103] "CD74"     "CSF1R"    "LGALS3"   "CD68"     "UCP2"     "TREML4"  
[109] "FGR"      "CYBB"     "CD200"    "CD200R1"  "GATA6"    "ITGAX"   
[115] "PPARG"    "TYROBP"   "RGS1"     "DAB2"     "P2RY6"    "MAF"     
[121] "AIF1"     "CLEC10A"  "ADGRE5"   "SLC15A3"  "CYP27A1"  "SLC7A7"  
[127] "RUNX3"    "SYK"     

$`Mast cells`
  [1] "KIT"        "ENPP3"      "IL2RA"      "FCER2"      "IL17A"     
  [6] "CRH"        "HSD11B1"    "CD274"      "CXCR2"      "CXCR4"     
 [11] "CCR3"       "CCR5"       "LTC4S"      "GPR34"      "MGST2"     
 [16] "ACHE"       "CPA3"       "CPA1"       "CMA1"       "HNMT"      
 [21] "TPSAB1"     "PTGDS"      "CFP"        "RAB27B"     "SLA"       
 [26] "IL1RL1"     "GNAI1"      "CFD"        "OSBPL8"     "ADORA3"    
 [31] "HS3ST1"     "CSF2RB"     "DAPP1"      "SLC6A4"     "MAOB"      
 [36] "CSRNP1"     "CYP11A1"    "RGS1"       "PLEK"       "HDC"       
 [41] "TPSB2"      "TPH1"       "CD55"       "RGS13"      "TPSG1"     
 [46] "SOCS1"      "BTK"        "IL4"        "MS4A2"      "VWA5A"     
 [51] "FCER1A"     "MCEMP1"     "MILR1"      "CCL2"       "CREB3L1"   
 [56] "EDNRA"      "HS6ST2"     "KCNE3"      "MEIS2"      "MRGPRX2"   
 [61] "PLAU"       "ADAMTS9"    "CHST1"      "COBL"       "CTSG"      
 [66] "DNM3"       "FAM198B"    "GRIK2"      "HPGDS"      "HS3ST3A1"  
 [71] "PAPSS2"     "PCBD1"      "PCP4L1"     "RNF128"     "SLC29A1"   
 [76] "SLC45A3"    "STARD13"    "A4GALT"     "ACER3"      "ARMCX3"    
 [81] "ATL2"       "BBS10"      "BMPR2"      "C2"         "CCDC141"   
 [86] "CCL7"       "CDC42BPA"   "CDH9"       "COPZ2"      "ZDHHC13"   
 [91] "WDR60"      "UNC13B"     "TRANK1"     "TIAM2"      "TCF7L1"    
 [96] "STK32B"     "ST6GALNAC3" "SOCS2"      "SPRED1"     "SMPX"      
[101] "SMARCA1"    "SLC7A5"     "SLC4A4"     "SLC31A2"    "SGCE"      
[106] "ROR1"       "RNF180"     "RIOK2"      "RBFOX1"     "RAPGEF2"   
[111] "PDE1C"      "PCDH7"      "OPTN"       "OAF"        "NEO1"      
[116] "NDST2"      "NCEH1"      "MRGPRX1"    "MLPH"       "MITF"      
[121] "MFGE8"      "LSAMP"      "LRRC66"     "LIMA1"      "KRT4"      
[126] "KDELR3"     "IDS"        "HSPA13"     "HS2ST1"     "GPM6A"     
[131] "GP1BA"      "GNAZ"       "FAM84A"     "FAM129B"    "EXT1"      
[136] "ESYT3"      "ENO2"       "ENAH"       "DUSP1"      "DGKI"      
[141] "DDC"        "DCLK3"     

$Monocytes
 [1] "RGS1"     "APOBEC3A" "CD7"      "TET2"     "CD40"     "DYSF"    
 [7] "CMKLR1"   "MEFV"     "HCK"      "FCGR3B"   "PADI4"    "GHSR"    
[13] "ITGAX"    "SELE"     "TLR4"     "AR"       "CXCR4"    "CD86"    
[19] "CCR2"     "TNFRSF14" "ADA2"     "MGMT"     "CD14"     "CD33"    
[25] "ITGAM"    "ACE"      "FUT4"     "ICAM1"    "SELL"     "CD163"   
[31] "FCGR1A"   "FCGR2B"   "ACP5"     "MRC1"     "TNF"      "PLAU"    
[37] "GBP1"     "OAS1"     "IRF7"     "PLSCR1"   "MX1"      "IL1RN"   
[43] "HLA-DRA"  "IFIT1"    "IDO1"     "IFIT3"    "TNFSF10"  "CXCL10"  
[49] "S100A9"   "S100A8"   "S100A4"   "CLEC7A"   "CSF3R"    "MNDA"    
[55] "MS4A6A"   "ZFP36L2"  "LTA4H"    "CLEC12A"  "CD48"     "PRTN3"   
[61] "FCGR3A"   "VCAN"     "IFITM3"   "FN1"      "ADGRE1"   "CD44"    
[67] "CSF1R"    "CX3CR1"   "ITGAL"    "PECAM1"   "PTPRC"    "SPN"     
[73] "PSAP"     "FCN1"     "LYZ"      "RHOC"     "PILRA"    "NFKBIZ"  
[79] "NAAA"     "LY6E"     "LYN"      "MS4A7"    "CEBPB"    "CD68"    
[85] "IFI30"    "S100A12"  "SERPINA1" "RGS2"     "LST1"     "SPI1"    
[91] "TYMP"     "CSTA"     "FGL2"     "PYCARD"   "LYST"     "CCL3"    
[97] "IL1B"     "CFP"      "CD36"    

$Neutrophils
 [1] "MME"      "ITGAM"    "ITGAX"    "CD14"     "FUT4"     "FCGR3A"  
 [7] "PECAM1"   "CD33"     "SELL"     "CEACAM8"  "C5AR1"    "CXCR1"   
[13] "CXCR2"    "JAML"     "TLR2"     "MYLK"     "S100A9"   "MPO"     
[19] "CD24"     "CEACAM1"  "FCGR1A"   "CRP"      "LCN2"     "DEFA1"   
[25] "DEFA3"    "LTF"      "LYZ"      "ELANE"    "S100A8"   "IL1B"    
[31] "CXCL2"    "HP"       "CCL3"     "HDC"      "MMP8"     "LRG1"    
[37] "OSM"      "MMP9"     "PILRA"    "CLEC4D"   "CLEC4E"   "ASPRV1"  
[43] "CCRL2"    "CCR1"     "NCF1"     "TREM1"    "SORL1"    "ARG2"    
[49] "BST1"     "IL1R2"    "CFP"      "ADAM8"    "CD177"    "PTGS2"   
[55] "OAS3"     "PRTN3"    "AZU1"     "CTSG"     "SERPINB1" "CAMP"    
[61] "SLC1A5"   "SNX20"    "ADPGK"    "PSTPIP1"  "LYST"     "DOCK8"   
[67] "S100A4"   "NLRP3"    "CSF3R"   

$`NK cells`
 [1] "ITGAM"    "ITGAX"    "FCGR3A"   "CD69"     "KLRD1"    "IL2RB"   
 [7] "KLRB1"    "CD244"    "KLRK1"    "SLAMF7"   "SIGLEC7"  "NCR1"    
[13] "SLAMF6"   "KIT"      "CD27"     "KLRC1"    "KLRF1"    "GNLY"    
[19] "NKG7"     "IL32"     "GZMH"     "FGFBP2"   "GZMM"     "CTSW"    
[25] "HMBOX1"   "AHR"      "PRF1"     "CCL4"     "SEMA6D"   "FHL2"    
[31] "CD2"      "CD7"      "CD3G"     "CD33"     "DPP4"     "LAT"     
[37] "PCDH15"   "CCL5"     "GZMA"     "GZMB"     "CMA1"     "DUSP2"   
[43] "TXK"      "DOK2"     "CST7"     "HSD11B1"  "KLRC2"    "SAMD3"   
[49] "TBX21"    "CHSY1"    "XCL2"     "TRDC"     "CXCR4"    "IL18R1"  
[55] "SERPINB9" "DOCK2"    "SH2D2A"   "S100A4"   "XCL1"     "GZMK"    
[61] "IFNG"     "CCL3"     "CSF2"     "IL2RG"    "TGFB1"    "KIR2DL1" 
[67] "KIR3DL1"  "LILRB1"   "KLRG1"    "NCR3"     "ADAMTS14" "ITGA2"   
[73] "STYK1"    "CLEC2D"   "CD247"    "ZBTB16"   "SPON2"    "LAIR2"   
[79] "HOPX"     "CD8A"    

$Pericytes
 [1] "PECAM1"   "PDGFRB"   "CSPG4"    "ANPEP"    "ACTA2"    "DES"     
 [7] "RGS5"     "ABCC9"    "KCNJ8"    "CD248"    "DLK1"     "TEK"     
[13] "NOTCH3"   "GLI1"     "ICAM1"    "ADM"      "ANGPT1"   "VEGFA"   
[19] "ZIC1"     "FOXC1"    "POSTN"    "COX4I2"   "HIGD1B"   "PDZD2"   
[25] "HSD11B1"  "MCAM"     "MXRA8"    "PDE5A"    "NR1H3"    "SERPING1"
[31] "EMID1"    "ECM1"     "COLEC11"  "RARRES2"  "REM1"     "ASPN"    
[37] "CYGB"     "FABP4"    "VTN"      "STEAP4"   "NDUFA4L2" "SLC38A11"
[43] "ATP13A5"  "AOC3"     "ANGPT2"   "INPP4B"   "GPIHBP1"  "VIM"     
[49] "PTH1R"    "IFITM1"   "TBX18"    "NT5E"     "MFGE8"    "ALPL"    
[55] "COL1A1"   "MYO1B"    "COG7"     "P2RY14"   "HEYL"     "GNB4"    
[61] "MSX1"     "CTGF"    

$`T cells`
 [1] "CD3D"     "CD3G"     "CD3E"     "HOPX"     "CCL3"     "CCL4"    
 [7] "GIMAP2"   "SYT3"     "NOTCH3"   "SEMA6D"   "DKK3"     "IFIT3"   
[13] "CERK"     "PMCH"     "CD4"      "LTB"      "CXCR6"    "IL7R"    
[19] "SATB1"    "LEF1"     "ITK"      "TRBC2"    "PTPRCAP"  "GEM"     
[25] "CD7"      "MAFF"     "TGIF1"    "RORA"     "TNFAIP3"  "CREM"    
[31] "PXDC1"    "NABP1"    "FAM110A"  "EEF1B2P3" "TRAC"     "CD69"    
[37] "PFN1P1"   "IL32"     "CXCR4"    "SEPT1"    "BCL2"     "CYTL1"   
[43] "CD2"      "CTSW"     "PTPN22"   "TXK"      "GDPD3"    "TRAF1"   
[49] "IL2RA"    "CD8B"     "BATF3"    "GZMH"     "LAG3"     "GZMK"    
[55] "GZMB"     "SH2D1A"   "MYO1G"    "FMNL1"    "S1PR4"    "CD247"   
[61] "GIMAP5"   "CD28"     "CD160"    "TRDC"     "RHOH"     "KLRB1"   
[67] "CCR2"     "IL2RB"    "CD163L1"  "MBD2"     "ICOS"     "IL18R1"  
[73] "TNFRSF4"  "CCL20"    "CLEC2D"   "CD8A"     "CD6"      "S100A4"  
[79] "CCL5"     "LCK"      "CD81"     "THY1"     "LAT"      "SKAP1"   
[85] "TCF7"     "CCR7"     "MYB"      "CCL4L2"   "PYHIN1"   "GZMA"    
[91] "JUNB"     "DUSP2"    "IFNG"     "CD52"     "BRAF"    
syn_sce_tidy_hvg_cms <- RunMCA(syn_sce_tidy_hvg_cms, slot = "reconstructed")
Computing Fuzzy Matrix
172.05 sec elapsed
Computing SVD
198.782 sec elapsed
Computing Coordinates
33.88 sec elapsed
plotReducedDim(syn_sce_tidy_hvg_cms, "MCA", colour_by = "Sample")

Version Author Date
7d99571 Reto Gerber 2022-03-21
9133ed1 Reto Gerber 2022-03-04
222b0d1 Reto Gerber 2021-07-29
HGT_all_gs <- RunCellHGT(syn_sce_tidy_hvg_cms, pathways = pancreas_gs, minSize = 5)

calculating distance
ranking genes
calculating number of success
performing hypergeometric test
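
The progress messages above outline a per-cell hypergeometric test: genes are ranked by their distance to each cell, the closest genes form that cell's signature, and the overlap of the signature with each marker list is tested. The final step can be sketched in base R for one cell and one gene set. All numbers are made up for illustration, the signature size of 200 is an assumption rather than a value taken from this analysis, and the multiple-testing correction is left out.

```r
# Toy numbers (hypothetical, not taken from this dataset)
n_universe  <- 2000  # genes available for ranking
n_signature <- 200   # top-ranked genes closest to the cell (assumed size)
n_geneset   <- 95    # size of one marker list
n_overlap   <- 25    # marker genes that appear in the cell signature

# P(overlap >= n_overlap) when drawing the signature at random
# without replacement from the gene universe
p_value <- phyper(n_overlap - 1, n_geneset, n_universe - n_geneset,
                  n_signature, lower.tail = FALSE)

# Score on the -log10 scale: larger values mean stronger enrichment
score <- -log10(p_value)
score > 2
```

With these toy numbers the expected overlap by chance is 200 * 95 / 2000 = 9.5 genes, so an observed overlap of 25 gives a score well above 2.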
# apply(HGT_all_gs,2,function(i)i[i>2])

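# For each cell (column), pick the gene set (row) with the highest enrichment score
all_gs_prediction <- rownames(HGT_all_gs)[apply(HGT_all_gs, 2, which.max)]
# Keep the label only if the best score exceeds 2, otherwise mark the cell as "unassigned"
# (with the default settings of RunCellHGT the scores are -log10 adjusted p-values)
all_gs_prediction_signif <- ifelse(apply(HGT_all_gs, 2, max)>2, yes = all_gs_prediction, "unassigned")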

syn_sce_tidy_hvg_cms$main_celltype_cellid <- all_gs_prediction_signif
table(syn_sce_tidy_hvg_cms$main_celltype_cellid, syn_sce_tidy_hvg_cms$Sample)
                   
                    Syn_Bio_023 Syn_Bio_026 Syn_Bio_028 Syn_Bio_049 Syn_Bio_050
  B cells                    91           5           1          14         144
  Dendritic cells            99          94         124          11         242
  Endothelial cells         513         407         148          93         332
  Fibroblasts              1989        5995        2784          89         354
  Macrophages               819         398        1300         102         431
  Mast cells                 18          82           3           1           8
  Monocytes                 308          17          51          11         205
  Neutrophils                20           1           1           1          11
  NK cells                  519         162          70          95         198
  Pericytes                  27          51          25           6          14
  T cells                  1197         297          89          84        1075
  unassigned                232         334         146          17         137
                   
                    Syn_Bio_053 Syn_Bio_054A Syn_Bio_062 Syn_Bio_064
  B cells                    46           40           3           2
  Dendritic cells           104           40         156         250
  Endothelial cells          29           99         262          92
  Fibroblasts               340          616        1166         476
  Macrophages              1457           89        2019        2676
  Mast cells                 18           14           6           6
  Monocytes                 121          165          30         229
  Neutrophils                 7           46           2           5
  NK cells                  481          119          42         125
  Pericytes                  14           61          25          12
  T cells                  1099          474          62         315
  unassigned                159          197         272          87
                   
                    Syn_Bio_074 Syn_Bio_077a Syn_Bio_077b Syn_Bio_078
  B cells                    97          117           78          29
  Dendritic cells           153          461          372           3
  Endothelial cells         329          150          197         437
  Fibroblasts               481          559          885         190
  Macrophages              1239         2165         2138           1
  Mast cells                 47           12           13           3
  Monocytes                  97          529          597          64
  Neutrophils                10           13           28          62
  NK cells                  203          325          328         185
  Pericytes                  23           10           15         148
  T cells                   529          610          482         313
  unassigned                194          162          109          11
                   
                    Syn_Bio_079 Syn_Bio_081 Syn_Bio_083 Syn_Bio_084 Syn_Bio_087
  B cells                   233          83         375           6          77
  Dendritic cells           175         155         164         159          64
  Endothelial cells         401         290         162         138         524
  Fibroblasts               825        1980         247         530         281
  Macrophages              1295        1137         817        1455         278
  Mast cells                 27          11           9           8           9
  Monocytes                 130         124         397         402          54
  Neutrophils                 8          24          68          16          13
  NK cells                 1333         457         506         165         195
  Pericytes                  33          34          21          11          33
  T cells                  1349         739        1361         866         948
  unassigned                223         163         132         205          62
                   
                    Syn_Bio_091 Syn_Bio_092 Syn_Bio_093 Syn_Bio_096
  B cells                    29           3           3         123
  Dendritic cells           153         193         241         145
  Endothelial cells         458         802         988         689
  Fibroblasts               798        1174         528        1846
  Macrophages              1065        2052        1882         600
  Mast cells                  3          11           1          23
  Monocytes                 215         169         161         128
  Neutrophils                27           6          10           7
  NK cells                  367          83         156         297
  Pericytes                  25          32          89         180
  T cells                  1146         212         211         736
  unassigned                 90         450         175         202
                   
                    Syn_Bio_098a Syn_Bio_098b Syn_Bio_099
  B cells                    111           31           8
  Dendritic cells            199           47          94
  Endothelial cells          464          556         867
  Fibroblasts                240         1556        1371
  Macrophages                417          155         436
  Mast cells                   8           63           7
  Monocytes                  129          497         118
  Neutrophils                 10          344           5
  NK cells                   253          211          42
  Pericytes                    3           94          95
  T cells                   1112          462          78
  unassigned                  39           76         139
table(syn_sce_tidy_hvg_cms$main_celltype_cellid)

          B cells   Dendritic cells Endothelial cells       Fibroblasts 
             1749              3898              9427             27300 
      Macrophages        Mast cells         Monocytes       Neutrophils 
            26423               411              4948               745 
         NK cells         Pericytes           T cells        unassigned 
             6917              1081             15846              4013 
prop.table(table(syn_sce_tidy_hvg_cms$main_celltype_cellid))

          B cells   Dendritic cells Endothelial cells       Fibroblasts 
      0.017020573       0.037933786       0.091739816       0.265672746 
      Macrophages        Mast cells         Monocytes       Neutrophils 
      0.257138130       0.003999689       0.048151969       0.007250044 
         NK cells         Pericytes           T cells        unassigned 
      0.067313494       0.010519862       0.154206972       0.039052920 
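
The labelling rule used above (take the best-scoring gene set per cell, fall back to "unassigned" when the best score is not above 2) can be checked on a small toy matrix. This is only a sketch: the scores and the cell names are invented and do not come from this dataset.

```r
# Toy score matrix: gene sets in rows, cells in columns (values are made up)
scores <- matrix(c(5.1, 0.4, 0.0,
                   0.3, 1.9, 0.8,
                   1.2, 0.2, 7.5),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("T cells", "Fibroblasts", "Macrophages"),
                                 c("cell1", "cell2", "cell3")))

# Best-scoring gene set and its score for every cell
best_set   <- rownames(scores)[apply(scores, 2, which.max)]
best_score <- apply(scores, 2, max)

# Cells whose best score is not above 2 stay unassigned
labels <- ifelse(best_score > 2, best_set, "unassigned")

table(labels)
prop.table(table(labels))
```

Here `cell2` has its highest score for Fibroblasts, but 1.9 does not pass the threshold, so it ends up as "unassigned".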
# colData(syn_sce_tidy_hvg_cms)
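# Number of distinct labels to colour (here: predicted cell types)
n_sam <- length(unique(syn_sce_tidy_hvg_cms$main_celltype_cellid))
# Split the palette positions into four consecutive blocks ...
splitind <- split(seq_len(n_sam),ceiling(seq(0.01,3.99,length.out = n_sam)))
# ... and interleave the blocks so that neighbouring labels get clearly different hues
colind <- unlist(purrr::map(seq_len(ceiling(n_sam/4)),
                            ~ purrr::map(seq_len(4),
                                         function(i)splitind[[i]][.x])))
colind <- colind[!is.na(colind)]
colors_used <- rainbow(n_sam)[colind]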
cat("### Dimred plots celltype {.tabset}\n\n")

Dimred plots celltype

cat("#### corrected PCA\n\n")

corrected PCA

plotReducedDim(syn_sce_tidy_hvg_cms, "corrected", colour_by = "main_celltype_cellid") + 
  scale_color_manual(values = colors_used)
Scale for 'colour' is already present. Adding another scale for 'colour',
which will replace the existing scale.

if(use_vst){
  cat("\n\n#### uncorrected PCA vst\n\n")
  plotReducedDim(syn_sce_tidy_hvg_cms, "PCA_vst", colour_by = "main_celltype_cellid") + 
  scale_color_manual(values = colors_used)
}

uncorrected PCA vst

Scale for 'colour' is already present. Adding another scale for 'colour',
which will replace the existing scale.

cat("\n\n#### corrected UMAP\n\n")

corrected UMAP

plotReducedDim(syn_sce_tidy_hvg_cms, "UMAP_corrected", colour_by = "main_celltype_cellid") + 
  scale_color_manual(values = colors_used)
Scale for 'colour' is already present. Adding another scale for 'colour',
which will replace the existing scale.

if(use_vst){
  cat("\n\n#### uncorrected UMAP vst\n\n")
  plotReducedDim(syn_sce_tidy_hvg_cms,"UMAP_vst", colour_by = "main_celltype_cellid") + 
  scale_color_manual(values = colors_used)
}

uncorrected UMAP vst

Scale for 'colour' is already present. Adding another scale for 'colour',
which will replace the existing scale.

cat("\n\n#### corrected dimred UMAP\n\n")

corrected dimred UMAP

plotReducedDim(syn_sce_tidy_hvg_cms, "UMAP_corrected_reduced", colour_by = "main_celltype_cellid") + 
  scale_color_manual(values = colors_used)
Scale for 'colour' is already present. Adding another scale for 'colour',
which will replace the existing scale.

cat("\n\n### {-}")
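
The colour order used for these plots comes from interleaving the rainbow palette, so that labels that are adjacent in the legend do not receive nearly identical hues. A base-R sketch of the same idea for a hypothetical set of 12 labels (the variable names here are illustrative and differ from those in the analysis):

```r
n_groups <- 12  # hypothetical number of labels to colour

# Split the palette positions 1..n_groups into four consecutive blocks
blocks <- split(seq_len(n_groups),
                ceiling(seq(0.01, 3.99, length.out = n_groups)))

# Take the 1st element of every block, then the 2nd, and so on
idx <- unlist(lapply(seq_len(ceiling(n_groups / 4)), function(pos) {
  vapply(blocks, function(b) b[pos], integer(1))
}))
# Shorter blocks yield NA at their last position; drop those
idx <- idx[!is.na(idx)]
unname(idx)

# Reordered palette: consecutive entries are far apart on the colour wheel
palette_interleaved <- rainbow(n_groups)[idx]
```

For 12 labels the resulting order of palette positions is 1, 4, 7, 10, 2, 5, 8, 11, 3, 6, 9, 12.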

Clustering

Main cell types

Graph based clustering.

set.seed(100)
bpstart(bpparam)
kgraph_clusters <- clusterRows(reducedDim(syn_sce_tidy_hvg_cms, "corrected"),
    TwoStepParam(
        first=KmeansParam(centers=30000, nstart = 5, iter.max = 100),
        second=NNGraphParam(k=4, cluster.fun="louvain", 
                            BPPARAM=bpparam)
    )
)
bpstop(bpparam)

table(kgraph_clusters)
kgraph_clusters
    1     2     3     4     5     6     7     8     9    10    11    12    13 
  367   306   426  9762   128 14727   710  2717  1061   304  1461  9267  2792 
   14    15    16    17    18    19    20    21    22 
 9806  7928   170   767 10557  1726  6316  9187 12273 
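
The clustering above is a two-step procedure: k-means first compresses the cells into many small groups (30,000 centroids for roughly 100,000 cells), and only the centroids are then clustered on a nearest-neighbour graph. The base-R sketch below illustrates the two-step idea on simulated data. It is not the method used in the analysis: hierarchical clustering stands in for the graph-based Louvain step, and the data and all parameter values are invented.

```r
set.seed(1)
# Toy data: 600 points in 2 dimensions around three centres (made up)
x <- rbind(matrix(rnorm(400, mean = 0,  sd = 0.3), ncol = 2),
           matrix(rnorm(400, mean = 5,  sd = 0.3), ncol = 2),
           matrix(rnorm(400, mean = 10, sd = 0.3), ncol = 2))

# Step 1: compress the points into many small k-means clusters
km <- kmeans(x, centers = 30, nstart = 5, iter.max = 100)

# Step 2: cluster the centroids only
# (hclust is a stand-in here for the nearest-neighbour graph + Louvain step)
centroid_groups <- cutree(hclust(dist(km$centers)), k = 3)

# Map every point to the group of its k-means centroid
final <- centroid_groups[km$cluster]
table(final)
```

The second step only sees 30 centroids instead of 600 points, which is what makes the approach attractive for large datasets.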
colData(syn_sce_tidy_hvg_cms)$kgraph_clusters <- factor(kgraph_clusters)
colLabels(syn_sce_tidy_hvg_cms) <- factor(kgraph_clusters)
n_sam <- length(unique(syn_sce_tidy_hvg_cms$kgraph_clusters))
splitind <- split(seq_len(n_sam),ceiling(seq(0.01,3.99,length.out = n_sam)))
colind <- unlist(purrr::map(seq_len(ceiling(n_sam/4)),
                            ~ purrr::map(seq_len(4),
                                         function(i)splitind[[i]][.x])))
colind <- colind[!is.na(colind)]
colors_used <- rainbow(n_sam)[colind]
cat("### Dimred plots clustering {.tabset}\n\n")

Dimred plots clustering

cat("#### corrected PCA\n\n")

corrected PCA

plotReducedDim(syn_sce_tidy_hvg_cms, "corrected", colour_by = "kgraph_clusters",text_by="kgraph_clusters") + 
  scale_color_manual(values = colors_used)
Scale for 'colour' is already present. Adding another scale for 'colour',
which will replace the existing scale.

Version Author Date
9133ed1 Reto Gerber 2022-03-04
222b0d1 Reto Gerber 2021-07-29
a18fb61 retogerber 2021-05-26
a301681 retogerber 2021-05-19
1d92bf1 retogerber 2021-05-03
if(use_vst){
  cat("\n\n#### uncorrected PCA vst\n\n")
  plotReducedDim(syn_sce_tidy_hvg_cms, "PCA_vst", colour_by = "kgraph_clusters",text_by="kgraph_clusters") + 
  scale_color_manual(values = colors_used)
}

uncorrected PCA vst

Scale for 'colour' is already present. Adding another scale for 'colour',
which will replace the existing scale.

cat("\n\n#### corrected UMAP\n\n")

corrected UMAP

plotReducedDim(syn_sce_tidy_hvg_cms, "UMAP_corrected", colour_by = "kgraph_clusters",text_by="kgraph_clusters") + 
  scale_color_manual(values = colors_used)
Scale for 'colour' is already present. Adding another scale for 'colour',
which will replace the existing scale.

if(use_vst){
  cat("\n\n#### uncorrected UMAP vst\n\n")
  plotReducedDim(syn_sce_tidy_hvg_cms,"UMAP_vst", colour_by = "kgraph_clusters",text_by="kgraph_clusters") + 
  scale_color_manual(values = colors_used)
}

uncorrected UMAP vst

Scale for 'colour' is already present. Adding another scale for 'colour',
which will replace the existing scale.

cat("\n\n#### corrected dimred UMAP\n\n")

corrected dimred UMAP

plotReducedDim(syn_sce_tidy_hvg_cms, "UMAP_corrected_reduced", colour_by = "kgraph_clusters",text_by="kgraph_clusters") + 
  scale_color_manual(values = colors_used)
Scale for 'colour' is already present. Adding another scale for 'colour',
which will replace the existing scale.

cat("\n\n### {-}")

tmpfilename <- paste0("syn_v",analysis_version,"_sce_hvg_cms_doublet",dplyr::if_else(remove_low_quality_samples, "_invivo",""),".rds")
saveRDS(syn_sce_tidy_hvg_cms, file = here::here("output",tmpfilename))
tryCatch({
  tab <- table(CellID=colData(syn_sce_tidy_hvg_cms)$main_celltype_cellid,
               Cluster=colData(syn_sce_tidy_hvg_cms)$kgraph_clusters)
  
  pheatmap::pheatmap(log2(tab+10), color=colorRampPalette(c("white", "blue"))(101))
  main_celltype_names <- c("tc","ec","sf","mp")
  main_celltype_names_regex <- c("T cells|NK cells", "Endothelial cells","Fibroblasts","Macrophages|Monocytes|Dendritic cells")
  
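  cluster_names_prop <- purrr::map(seq_along(main_celltype_names_regex), function(i){
    # rows of the table whose CellID label belongs to this main cell type
    tab_ind <- stringr::str_detect(rownames(tab),main_celltype_names_regex[i])
    # share of this cell type's cells that falls into each cluster
    freq <- colSums(tab[tab_ind,,drop=FALSE])
    freq <- freq/sum(freq)
    # keep clusters that hold more than 1% of these cells
    names(freq)[freq>0.01]
  })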
  names(cluster_names_prop) <- main_celltype_names
  print(cluster_names_prop)
  
  cluster_names_max <- rownames(tab)[apply(tab,2,which.max)] %>% 
    setNames(colnames(tab))
  cluster_names_max <- purrr::map(seq_along(main_celltype_names_regex), function(i){
    names(cluster_names_max)[stringr::str_detect(cluster_names_max,main_celltype_names_regex[i])]
  })
  names(cluster_names_max) <- main_celltype_names
  print(cluster_names_max)
  
  tmpfilename <- paste0("syn_v",analysis_version,"_cluster_cellid_match",dplyr::if_else(remove_low_quality_samples, "_invivo",""),".rds")
  saveRDS(list(cluster_names_prop=cluster_names_prop,cluster_names_max=cluster_names_max), file = here::here("output",tmpfilename))
},
error=function(e) e
)

$tc
[1] "4"  "9"  "13" "21"

$ec
[1] "5"  "12"

$sf
[1] "6"  "11" "15" "20"

$mp
[1] "8"  "14" "18" "22"

$tc
[1] "4"  "9"  "13" "21"

$ec
[1] "5"  "12"

$sf
[1] "6"  "11" "15" "20"

$mp
[1] "7"  "8"  "14" "16" "18" "22"
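
The two lists printed above come from two different matching rules applied to the same label-by-cluster table, which is why they can disagree (here for `mp`). A base-R sketch of both rules on an invented contingency table follows. The counts, cluster names and the reduced set of groups are hypothetical.

```r
# Toy contingency table: predicted labels in rows, clusters in columns (made up)
tab <- matrix(c(900,  10,   5,
                 50,   5,   0,
                 20, 800,  40,
                  5,  30, 300),
              nrow = 4, byrow = TRUE,
              dimnames = list(CellID  = c("T cells", "NK cells",
                                          "Fibroblasts", "Macrophages"),
                              Cluster = c("1", "2", "3")))

# Main cell types, each defined by a regular expression over the labels
groups <- c(tc = "T cells|NK cells", sf = "Fibroblasts", mp = "Macrophages")

# Rule 1: clusters that hold more than 1% of a group's cells
by_prop <- lapply(groups, function(rx) {
  freq <- colSums(tab[grepl(rx, rownames(tab)), , drop = FALSE])
  names(freq)[freq / sum(freq) > 0.01]
})

# Rule 2: clusters whose most frequent label belongs to the group
top_label <- rownames(tab)[apply(tab, 2, which.max)]
by_max <- lapply(groups, function(rx) colnames(tab)[grepl(rx, top_label)])

by_prop
by_max
```

Rule 1 can assign one cluster to several groups, whereas rule 2 assigns every cluster to at most one group, so the choice between them depends on whether overlapping assignments are acceptable downstream.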

4th Part: Annotation


sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04 LTS

Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C             
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] gdtools_0.2.3                  BiocParallel_1.24.1           
 [3] CellID_0.1.0                   SeuratObject_4.0.0            
 [5] Seurat_4.0.0                   bluster_1.0.0                 
 [7] tidySingleCellExperiment_1.0.0 scuttle_1.0.4                 
 [9] igraph_1.2.6                   scran_1.18.7                  
[11] scater_1.18.6                  SingleCellExperiment_1.12.0   
[13] SummarizedExperiment_1.20.0    Biobase_2.50.0                
[15] GenomicRanges_1.42.0           GenomeInfoDb_1.26.7           
[17] IRanges_2.24.1                 S4Vectors_0.28.1              
[19] BiocGenerics_0.36.1            MatrixGenerics_1.2.1          
[21] matrixStats_0.58.0             stringr_1.4.0                 
[23] purrr_0.3.4                    ggplot2_3.3.3                 
[25] dplyr_1.0.4                    workflowr_1.6.2               

loaded via a namespace (and not attached):
  [1] scattermore_0.7           ggthemes_4.2.4           
  [3] intrinsicDimension_1.2.0  tidyr_1.1.2              
  [5] knitr_1.31                irlba_2.3.3              
  [7] DelayedArray_0.16.3       data.table_1.13.6        
  [9] rpart_4.1-15              RCurl_1.98-1.2           
 [11] generics_0.1.0            RhpcBLASctl_0.20-137     
 [13] cowplot_1.1.1             RANN_2.6.1               
 [15] proxy_0.4-24              future_1.21.0            
 [17] spatstat.data_2.0-0       httpuv_1.5.5             
 [19] assertthat_0.2.1          viridis_0.5.1            
 [21] xfun_0.21                 hms_1.0.0                
 [23] evaluate_0.14             promises_1.2.0.1         
 [25] DEoptimR_1.0-8            fansi_0.4.2              
 [27] readxl_1.3.1              DBI_1.1.1                
 [29] htmlwidgets_1.5.3         kSamples_1.2-9           
 [31] ellipsis_0.3.1            RSpectra_0.16-0          
 [33] deldir_0.2-10             sparseMatrixStats_1.2.1  
 [35] vctrs_0.3.6               here_1.0.1               
 [37] TTR_0.24.2                ROCR_1.0-11              
 [39] abind_1.4-5               CellMixS_1.6.1           
 [41] RcppEigen_0.3.3.9.1       withr_2.4.1              
 [43] batchelor_1.6.3           robustbase_0.93-7        
 [45] vcd_1.4-8                 sctransform_0.3.2.9008   
 [47] xts_0.12.1                goftest_1.2-2            
 [49] svglite_1.2.3.2           cluster_2.1.1            
 [51] lazyeval_0.2.2            laeken_0.5.1             
 [53] crayon_1.4.1              SuppDists_1.1-9.5        
 [55] labeling_0.4.2            edgeR_3.32.1             
 [57] pkgconfig_2.0.3           nlme_3.1-152             
 [59] vipor_0.4.5               nnet_7.3-15              
 [61] rlang_0.4.10              globals_0.14.0           
 [63] lifecycle_1.0.0           miniUI_0.1.1.1           
 [65] rsvd_1.0.3                cellranger_1.1.0         
 [67] rprojroot_2.0.2           polyclip_1.10-0          
 [69] RcppHNSW_0.3.0            lmtest_0.9-38            
 [71] Matrix_1.3-2              carData_3.0-4            
 [73] boot_1.3-27               zoo_1.8-8                
 [75] beeswarm_0.2.3            pheatmap_1.0.12          
 [77] whisker_0.4               ggridges_0.5.3           
 [79] png_0.1-7                 viridisLite_0.3.0        
 [81] bitops_1.0-6              KernSmooth_2.23-18       
 [83] DelayedMatrixStats_1.12.3 parallelly_1.23.0        
 [85] readr_1.4.0               beachmat_2.6.4           
 [87] scales_1.1.1              magrittr_2.0.1           
 [89] plyr_1.8.6                hexbin_1.28.2            
 [91] ica_1.0-2                 zlibbioc_1.36.0          
 [93] compiler_4.0.3            dqrng_0.2.1              
 [95] RColorBrewer_1.1-2        pcaMethods_1.82.0        
 [97] fitdistrplus_1.1-3        cli_2.3.0                
 [99] XVector_0.30.0            listenv_0.8.0            
[101] ps_1.5.0                  patchwork_1.1.1          
[103] pbapply_1.4-3             ggplot.multistats_1.0.0  
[105] MASS_7.3-53.1             mgcv_1.8-34              
[107] tidyselect_1.1.0          stringi_1.5.3            
[109] forcats_0.5.1             highr_0.8                
[111] yaml_2.2.1                BiocSingular_1.6.0       
[113] askpass_1.1               locfit_1.5-9.4           
[115] ggrepel_0.9.1             grid_4.0.3               
[117] fastmatch_1.1-0           tools_4.0.3              
[119] future.apply_1.7.0        rio_0.5.16               
[121] rstudioapi_0.13           foreign_0.8-81           
[123] git2r_0.28.0              yaImpute_1.0-32          
[125] gridExtra_2.3             smoother_1.1             
[127] farver_2.0.3              scatterplot3d_0.3-41     
[129] Rtsne_0.15                digest_0.6.27            
[131] shiny_1.6.0               Rcpp_1.0.6               
[133] car_3.0-10                later_1.1.0.1            
[135] RcppAnnoy_0.0.18          httr_1.4.2               
[137] colorspace_2.0-0          fs_1.5.0                 
[139] tensor_1.5                ranger_0.12.1            
[141] reticulate_1.18           umap_0.2.7.0             
[143] splines_4.0.3             uwot_0.1.10              
[145] statmod_1.4.35            spatstat.utils_2.0-0     
[147] sp_1.4-5                  systemfonts_1.0.1        
[149] plotly_4.9.3              xtable_1.8-4             
[151] jsonlite_1.7.2            spatstat_1.64-1          
[153] destiny_3.4.0             R6_2.5.0                 
[155] pillar_1.4.7              htmltools_0.5.1.1        
[157] mime_0.10                 tictoc_1.0               
[159] glue_1.4.2                fastmap_1.1.0            
[161] VIM_6.1.0                 BiocNeighbors_1.8.2      
[163] class_7.3-18              codetools_0.2-18         
[165] fgsea_1.16.0              lattice_0.20-41          
[167] tibble_3.0.6              ResidualMatrix_1.0.0     
[169] curl_4.3                  ggbeeswarm_0.6.0         
[171] leiden_0.3.7              zip_2.1.1                
[173] openxlsx_4.2.3            openssl_1.4.3            
[175] survival_3.2-7            limma_3.46.0             
[177] rmarkdown_2.6             munsell_0.5.0            
[179] e1071_1.7-4               GenomeInfoDbData_1.2.4   
[181] haven_2.3.1               reshape2_1.4.4           
[183] gtable_0.3.0