Single-cell RNA sequencing reveals spatiotemporal heterogeneity and malignant progression in pancreatic neuroendocrine tumor

Aims: Using Single-cell RNA sequencing (scRNA-seq), we explored the spatiotemporal heterogeneity of pancreatic neuroendocrine tumors (pNETs) and the underlying mechanism for malignant progression. Methods: scRNA-seq was conducted on three tumor tissues (two primary tissues from different sites, one liver metastatic lesion), one normal liver tissue, and peripheral blood mononuclear cells from one patient with a metastatic G2 pNET, followed by bioinformatics analysis and validation in a pNETs cohort. Results: The transcriptome data of 24.544 cells were obtained. We identified subpopulations of functional heterogeneity within malignant cells, immune cells, and fibroblasts. There were intra- and inter-heterogeneities of cell subpopulations for malignant cells, macrophages, T cells, and fibroblasts among all tumor sites. Cell trajectory analysis revealed several hallmarks of carcinogenesis, including the hypoxia pathway, metabolism reprogramming, and aggressive proliferation, which were activated at different stages of tumor progression. Evolutionary analysis based on mitochondrial mutations defined two dominant clones with metastatic capacity. Finally, we developed a gene signature (PCSK1 and SMOC1) defining the metastatic potential of the tumor and its prognostic value was validated in a cohort of thirty G1/G2 patients underwent surgical resection. Conclusions: Our scRNA-seq analysis revealed intra- and intertumor heterogeneities in cell populations, transcriptional states, and intercellular communications among primary and metastatic lesions of pNETs. The single-cell level characterization of the spatiotemporal dynamics of malignant cell progression provided new insights into the search for potential novel prognostic biomarkers of pNETs.

Highly variable genes were identified using the 'FindVariableGenes' function with parameters for x.low.cutoff=0.0125, x.high.cutoff=6, and y.cutoff=0.5. PCA was constructed based on the scaled data with the top 2,000 highly variable genes and top 10 principals were used for tSNE construction and UMAP construction.
For subclustering, clusters were selected and PCA was constructed based on the scaled data with the top 2,000 highly variable genes and top 10 principals were used for tSNE and UMAP construction. Utilizing the graph-based cluster method, we acquired unsupervised cell cluster results based on the PCA's top 10 principal components and we calculated the marker genes by the FindAllMarkers function with the Wilcoxon rank-sum test algorithm under the following criteria:1. lnFC > 0.25; 2.
pvalue<0.05; 3. min.pct>0.1. In order to identify the cell type, clusters of same cell type were selected for re-tSNE analysis, graph-based clustering, and marker analysis.

CNV estimation and selection of malignant cells
Initial CNVs (CNV0) were estimated using the inferCNV R package as described in previous studies. 1,2 Then, the extent of the CNV signal, which was defined as the mean of squares of CNV0 values across the genome, was calculated. Each cell was scored for the extent of the CNV signal and for the correlation between the CNV0 profile of each cell with the average CNV0 profile of all cells from the corresponding tumor. Subsequently, putative malignant cells were defined as those with a CNV signal above 0.05 and CNV correlation above 0.5, as previously described. 3 Pseudo-time analysis To assess transcription factor regulation strength, we applied the single-cell regulatory network inference and clustering (pySCENIC, v0.9.5) workflow, 4 using the 20thousand motifs database for RcisTarget and GRNboost.

Cluster gene enrichment analysis
To characterize the relative activation of a given gene set, such as pathway activation, "Angiogenesis," and "Fatty Acid Metabolism" as described before, we performed

Coregulated gene analysis
To discover the gene coregulation network, the find_gene_modules function of monocle3 was used with the default parameters.

GO analysis
Gene ontology (GO) analysis was performed to identify the biological implications of marker genes and differentially expressed genes. We downloaded the GO annotations from NCBI (http://www.ncbi.nlm.nih.gov/), UniProt (http://www.uniprot.org/), and the Gene Ontology (http://www.geneontology.org/). Fisher's exact test was applied to identify the significant GO categories and FDR was used to correct the P-values.

Pathway analysis
Pathway analysis was used to find the significant pathways of the marker genes and differentially expressed genes according to the KEGG database. We selected the significant pathways using Fisher's exact test, and the threshold of significance was defined by the P-value and FDR.

Lineage tracing based on mitochondrial mutation
For 10x data, lineage tracing based on mitochondrial mutation was performed following Zhang et al. 6 Briefly, we calculated the alternative allele frequency and the coverage of each position in the mitochondrial chromosome with the help of VarTrix To qualitative the IHC results, the evaluation of IHC staining was reference to the contents of our previous publication. 7 Briefly, IHC score was estimated by using the following formula: IHC score = staining intensity (0, no staining; 1, light brown; 2, brown; 3, dark brown) × proportion of positively stained cells (0, none; 1, < 25%; 2, 25-50%; 3, 50-75%; and 4, > 75%). All samples were scored by two independent observers in a blinded manner, and a final score ≥6 was considered as a positive expression of the corresponding protein.
Search strategy for retrieving cancer stem cell markers of pNETs

Statistical Analysis
Survival curves were compared using the log-rank test, and Kaplan-Meier survival curves were plotted. Survival was calculated from surgical resection to death or last follow-up. Correlation analyses were performed using the spearman correlation. Oneway analysis of variance (ANOVA) was used to compare continuous variables of multiple groups, while means between two groups were compared by using t-test or Wilcoxon rank sum test depending on the normality test result. All P values were two sided and P < 0.05 was considered to indicate a statistically significant difference.