Single-cell transcriptomics reveals heterogeneous progression and EGFR activation in pancreatic adenosquamous carcinoma

Pancreatic adenosquamous carcinoma (PASC) — a rare pathological pancreatic cancer (PC) type — has a poor prognosis due to high malignancy. To examine the heterogeneity of PASC, we performed single-cell RNA sequencing (scRNA-seq) profiling with sample tissues from a healthy donor pancreas, an intraductal papillary mucinous neoplasm, and a patient with PASC. Of 9,887 individual cells, ten cell subpopulations were identified, including myeloid, immune, ductal, fibroblast, acinar, stellate, endothelial, and cancer cells. Cancer cells were divided into five clusters. Notably, cluster 1 exhibited stem-like phenotypes expressing UBE2C, ASPM, and TOP2A. We found that S100A2 is a potential biomarker for cancer cells. LGALS1, NPM1, RACK1, and PERP were upregulated from ductal to cancer cells. Furthermore, the copy number variations in ductal and cancer cells were greater than in the reference cells. The expression of EREG, FCGR2A, CCL4L2, and CTSC increased in myeloid cells from the normal pancreas to PASC. The gene sets expressed by cancer-associated fibroblasts were enriched in the immunosuppressive pathways. We demonstrate that EGFR-associated ligand-receptor pairs are activated in ductal-stromal cell communications. Hence, this study revealed the heterogeneous variations of ductal and stromal cells, defined cancer-associated signaling pathways, and deciphered intercellular interactions following PASC progression.


Introduction
Pancreatic cancer (PC) is the most lethal gastrointestinal cancer in humans, with a five-year survival rate of 7% [1]. Pancreatic adenosquamous carcinoma (PASC) is a mixed pancreatic malignant tumor comprising adenocarcinoma and squamous cell carcinoma in histology, accounting for 1%-4.4% of PCs [2,3]. Notably, PASC is highly aggressive and its prognosis is poor, with a two-year survival rate of 13% [2]. Due to the low incidence, the tumorigenesis, progression, and molecular characteristics of PASC remains unclear and thus, there is no molecular target therapy for PASC.
The identification of molecular biomarkers and malignancy-driving signaling pathways are important to better understand the regulatory mechanisms of tumor progression and to develop potential therapeutic targets for PASC. Genomic and transcriptomic studies have discovered the mutations of TP53 and KRAS2 and the loss of p16 protein in PASC [4,5]. Currently, single-cell RNA sequencing Ivyspring International Publisher (scRNA-seq) aids in the profiling and clustering of multiple cell types in tumor tissues. Thus, tumor heterogeneity can be delineated to identify new biomarkers and targets, especially for rare cancer types. A previous study revealed progressive T cell depletion and stromal myofibroblast activation in pancreatic neoplasms during the progression of intraductal papillary mucinous neoplasm (IPMN) to pancreatic ductal adenocarcinoma (PDAC) [6]. Another study profiled pancreatic malignant cells, revealing intratumoral heterogeneity and critical signaling pathways related to PC improvement [7]. However, the heterogeneity of PASC and cell evolution from the healthy pancreas to precancerous lesions and then to PASC remain unclarified.
Here, we performed scRNA-seq in 9,887 cells from the pancreas of a normal individual and tumor tissues of two patients with IPMN and PASC, respectively. Since IPMN is one of the precancerous lesions of PC, we could dissect heterogeneous alterations of ductal and stromal cells along with PASC formation. We identified diverse cell types, including cancer cells, ductal cells, immune cells, myeloid cells, and fibroblasts, based on their unique gene expression patterns. We further discovered the evolutionary features of ductal cells from normal to malignant stages. Finally, we identified several novel genes and signaling pathways that possibly drive PASC progression within ductal and stromal cells. Consequently, our findings may contribute to a thorough understanding of early PASC tumorigenesis events at the single-cell level and provide novel biomarkers and therapeutic targets for PASC.

Patients and fresh tissues
Between March and May 2020, we harvested one normal pancreas, one IPMN, and one PASC sample from the Beijing Chaoyang Hospital. The normal pancreas was from a donor after cardiac death, who donated his liver for transplantation at our hospital. The patient did not suffer from intemperance or chronic pancreatitis. During surgery, we resected the main pancreatic duct along the parenchyma to obtain a large number of pancreatic ductal cells. The patients with IPMN and PASC were diagnosed with pathological findings. The patients' clinical details are presented in Table S1. All patients or donor relatives signed written informed consents. The study was approved by the Ethics Committee (2020-S-274 and 2020-S-302) of the Beijing Chaoyang Hospital.

Tissue dissociation
Three fresh specimens were stored in GEXSCOPE tissue preservation solution (Singleron Biotechnologies, Nanjing, China), washed three times with Hanks balanced salt solution, and minced into 1-2 mm fragments with a sterile surgical scalpel. Then, the crushed tissue was digested at 37 °C for 15 min in a 15-mL centrifuge tube containing 2 mL GEXSCOPE solution with sustained agitation. The slurry was filtered through a sterile 40 μm cell strainer and centrifuged at 1,000 rpm for 5 min. After discarding the supernatant, the sediment was resuspended in 1 mL PBS (HyClone, Logan, UT, USA). Then, each sample was treated with 2 mL GEXSCOPE red blood cell lysis buffer (Singleron Biotechnologies) for 10 min at 25 °C to remove the red blood cells. The solution was then centrifuged at 500 × g for 5 min and suspended in PBS. Finally, single cells were stained with trypan blue (Sigma, MO, USA) and microscopically evaluated for cell viability.

Single-cell RNA sequencing
We diluted the single-cell suspensions to 1×10 5 cells/mL with PBS (HyClone) and loaded them onto microfluidic devices. Next, we used the GEXSCOPE Single-Cell RNA Library Kit (Singleron Biotechnologies) to construct the scRNA-seq libraries [8]. Individual libraries were diluted to 4 nM, pooled, and sequenced on an Illumina HiSeq X platform with 150 bp paired-end reads.

The quantification and statistical analysis for single-cell RNA sequencing
Raw reads were processed to generate gene expression profiles using an internal pipeline ( Figure  S1). Briefly, after filtering the reads, reads without poly T tails, cell barcodes, and unique molecular identifiers (UMIs) were extracted. Adapters and poly-A tails were trimmed using fastp V1, and the reads were aligned to the GRCh38 human genome assembly and ENSEMBL version 92 gene annotation using STAR 2.5.3a and featureCounts 1.6.2 [9]. We grouped the reads with the same cell barcode and then calculated the UMIs per gene per cell.
Violin plots were used to show the number of UMIs, number of genes, and relative mitochondrial and ribosomal transcript abundance. The results suggested that dead cell reads were comparable among samples. Transcript batch effects induced by apoptosis were not observed ( Figure S2). Finally, we screened out 9,887 cells that contained more than 300 genes and less than 10% mitochondrial genes for further bioinformatics analysis. Gene expression values were log-normalized, centered, and processed into a matrix.

Cell type identification
We applied principal component [10] and t-distribution stochastic neighbor embedding (tSNE) analyses to cluster the cells [11] based on the calculated gene expression matrices. Cell type identification and clustering were performed using the Seurat program (http://satijalab.org/seurat/, R package; v.3.0.1). We set the FindClusters parameter resolution to 0.6 to fulfill cell clustering. Differentially expressed genes (DEGs) were identified using the DESeq2 R package (|log2(fold change)| > 1 and P < 0.05) and visualized as heatmaps and volcano plots. Based on the DEGs, we used the FindMarkers function to define diverse cell types among different samples or consecutive clusters.

Validation of the clinical significance of S100A2
Since S100A2 was identified as a marker gene of PASC cells, we further applied public transcriptome databases to explore its clinical significance. We downloaded the GSE28735 dataset from the Gene Expression Omnibus database to validate the DEG expression. This dataset contains mRNA expression profiles of 45-paired PC and adjacent nontumor tissues measured using Affymetrix GeneChip Human Gene 1.0 ST arrays. We then used a paired-sample t-test to compare the expression of S100A2 between these two tissue types. We collected S100A2 expression values and the clinical data from the PDAC cohort of The Cancer Genome Atlas (TCGA) database. Based on the median S100A2 value, we classified 186 PC patients into high-and low-level groups and performed Kaplan-Meier survival analyses. Finally, we used the Human Protein Atlas database to examine the S100A2 expression in pancreatic tissues.

Differential expression analysis for pancreatic adenosquamous carcinoma
We screened out DEGs by performing differential expression analysis between ductal and cancer cells and ranked them by log 2 (fold change) value. Next, we performed Gene Set Enrichment Analysis (GSEA) to evaluate the functions of this gene set using the Molecular Signatures Database v7.2. Three collection sets, including "c5.go.bp.v7.2.entrez. gmt," "c5.go.cc.v7.2.entrez.gmt," and "c5.go.mf.v7.2. entrez.gmt" were used to define the Gene Ontology of the biological process, cellular components, and molecular functions for DEGs. We also performed pathway enrichment analysis for DEGs using "c2.cp.wikipathways.v7.2.entrez.gmt" database.
Next, we pooled pancreatic ductal and cancer cells together, clustering five cell subpopulations in ductal cells and four cell types in PASC cells, with principal component analysis and tSNE analysis. Next, we performed differential expression analyses between normal ductal cells and adjacent nonmalignant ductal (ANMD) cells to define a DEG set (|log2(fold change)| > 1 and P < 0.05). Similarly, another DEG set was determined by comparing the gene expression level between ANMD and PASC cells. Thus, by performing an intersection between these two DEG sets, we could identify the continuously changing genes from normal ductal cells to ANMD and PASC cells.

Pseudotime analysis
We also used the Monocle2 package (v2.8.0) to deduce the dynamic transcriptome changes within five duct-cell clusters and four cancer cell clusters. Thereby, we could predict the future transcriptional state of an individual cell subgroup. The top ten DEGs (q-value = 0.05) of cluster (C) 1-4 were used to sort PASC cells in a pseudotime order. We further constructed a trajectory tree in a two-dimensional space using the DRTree dimensionality reduction algorithm. Based on pseudotime analysis, we could screen out the upregulated and downregulated genes ranging from ductal cells to PASC cells. After that, we employed the clusterProfiler R package to illuminate the biological processes of DEGs.

Copy number variation analysis
Based on the scRNA-seq data, we calculated the clonal copy number variation (CNV) in each cell, with a default threshold value of "0.5." Then, we used the InferCNV R package to map CNVs to DNA locations. After data normalization, we annotated each CNV as either gain or loss, ranging from −1 to +1. We used myeloid and immune cells as reference cells to exclude individual somatic CNVs.

Cell-cell communication between ductal and stromal cells
We performed cell-cell communication analysis to better understand the interactions between PASC and stromal cells. First, we inputted our scRNA-seq data into CellPhoneDB V2.0 [12]. In this tool, we calculated the average expression value and the percentage (>10%) of the transcripts encoding the ligands and their receptors in each cell type. Then, we randomly arranged the cell type markers of all cells to form a new cell group and calculated the average expression level of ligands and receptors in this cell group. In this way, a zero distribution was generated for each ligand-receptor pair in each pairwise comparison between two cell types. Thus, we calculated the actual average value of the ligand-receptor pairs between two specific cell types, such as fibroblasts and cancer cells. The significance of the ligand-receptor pair in two cell types was speculated based on the ratio of the actual average value equal to or higher than the average value of the zero distribution. We ranked these ligand-receptor pairs in two specific cell types according to the P-values and plot this cell-cell communication network.

Pathological analyses
Hematoxylin-eosin and immunohistochemical staining were conducted following standard protocols for formalin-fixed paraffin-embedded tissues. We used carcinoma embryonic antigen antibodies to dye IPMN sections and CK8/18 antibodies for normal and PASC sections (Shanghai Shunbai Biotechnology Company; Shanghai, China). From January 2019 to December 2020, we collected five paired PASC and adjacent normal tissues to validate the EGFR expression.

scRNA-seq profiles identified ten cell types in the normal pancreas, IPMN, and PASC
We used scRNA-seq to investigate the pancreatic heterogeneity ranging from the normal pancreas to those in IPMN and PASC ( Figure 1). Within the three histological types, 9,887 cells were sequenced for subsequent analyses. All the scRNA-seq datasets were deposited in the Gene Expression Omnibus site (accession number: GSE165399).
We identified ten cell subpopulations based on gene expression patterns, comprising ductal cells, cancer cells, immune cells, myeloid cells, and fibroblasts ( Figure 2A and Table S2). The distribution and percentage of each cell type across the three samples are presented in Figure 2B and C. Myeloid cells were the most dominant cells, accounting for 38.18% (3775/9887) of cell types, and the myeloid cell percentage increased from the healthy pancreas to PASC samples ( Figure 2C). Notably, immune cells, including T, B, and plasma cells, were rare in the normal pancreas (3.13%, 52/1662) and PASC samples (8.34%, 173/2074); however, these cells were highly expressed in IPMN (45.15%, 2777/6151). Ductal cells mainly existed in the normal pancreas and were present at low levels in PASC samples.
Consequently, we applied public databases to reveal the clinical significance of S100A2, a PASC cell gene marker. In the GSE28735 dataset, S100A2 was highly expressed in PC tissues compared to adjacent normal tissues. Furthermore, S100A2 was correlated with poor prognosis in the PDAC cohort of the TCGA dataset. In Human Protein Atlas, S100A2 protein was also overexpressed in PDAC tumors but was not detected in the normal pancreas ( Figure S3). These findings suggest that S100A2 is associated with the poor clinical features of PASC.

Ribosomal hyperfunction is associated with PASC cells
To elucidate the abnormal cellular functions during PASC progression, we compared the expression of 2,350 genes between cancer and ductal cells, identifying DEGs ( Figure 4A and B). Gene Ontology analyses showed that this gene set was significantly associated with the activation of "cotranslational protein targeting to membrane" and the inactivation of "positive regulation of RNA biosynthetic process" in biological processes ( Figure  4C). In molecular function, these DEGs mainly participated in the upregulation of "structural constituent of ribosomes" and the downregulation of "DNA binding and transcription activator activity" ( Figure 4D). The dysregulated genes also correlated with tumor-promoting functions in cellular components, such as the activation of "ribosomal subunits" and the suppression of "tight junctions" ( Figure 4E). We further observed that the DEGs were enriched in the activation of "cytoplasmic ribosomal proteins" and "VEGFA/VEGFR2 signaling pathway," according to the WikiPathways dataset ( Figure 4F and G). In contrast, this gene set correlated with the inactivation of "epithelial-mesenchymal transition (EMT)" and "adenoid cystic carcinoma" (Figure 4F and G).  LGALS1, NPM1, RACK1, and PERP are overexpressed from ductal to PASC cells To investigate the driving factors in PASC tumorigenesis, we pooled ductal and PASC cells together and conducted the principal component analysis using our scRNA-seq dataset. We then divided these into five ductal cell clusters and four carcinoma subgroups ( Figure 5A and B). We found that most ductal cells were derived from the normal pancreas (89.08%, 563/632), whereas only 68 ductal cells were derived from the adjacent normal tissue of the PASC sample (Figure 2A and B). Notably, these cells were "entangled," expressing antitumor genes such as LTF [24], oncogenes such as CXCL6 [25], SAA1 [26], and S100A9 [27], and the inflammation gene CRP. Therefore, we consider this cell type a transitional cell type between healthy ductal and PASC cells. Among the healthy ductal, transitional, and tumor cells, we performed differential expression analyses and screened out the intersecting DEGs. Thus, we defined four genes that were overexpressed from ductal cells to transitional ductal cells and PASC cells: LGALS1, NPM1, RACK1, and PERP ( Figure 5C and D).

The pseudotime trajectory of ductal and PASC cells
Pseudotime analyses were conducted based on the scRNA-seq data of five ductal and four carcinoma cell clusters. This analysis revealed three evolutionary dynamic states with distinct genotypes in these cells ( Figure 6A). In the initial state, the ductal cells and PASC cells expressed pancreatic feature genes, such as SPINK1 and REG3A (Table S4). SPINK1 a trypsin inhibitor suppresses the activation of zymogens via trypsin within the pancreatic duct. REG3A encodes a pancreatic secretory protein associated with pancreatic inflammation and antimicrobial activity. C-1 and C-5 ductal cells differentiated into the first cell state, expressing SPRR3 and showing keratinization characteristics (Table S5). C-3 and C-4 cancer cells exhibited the gene pattern of the second cell state however, we could not identify the specific functional genes within these cells (Table S6). C-1 and C-2 cancer cells finally differentiated into the third cells state, which expressed SFRP5, a Wnt-signaling regulator (Table S7). Overall, the formation of cancer cells appeared after the evolution of normal ductal cells. Four cancer-cell clusters evolved along two differentiation trajectories whereas ductal cells followed one evolutionary state ( Figure 6B and C). During pseudotime, the significantly altered genes with similar expression models were clustered into nine subsets ( Figure 6D). We further performed biological process analyses for these genes. The results indicated that the downregulated gene set was associated with "ATP metabolic process" (Figure 6E) [28]. Conversely, the upregulated gene set was mainly enriched in "epidermal development" and "keratinization" (Figure 6E), which was probably due to the adenosquamous properties of the tumor sample used in this study.

Features of the five PASC cell subgroups
We identified 559 cancer cells according to epithelial and cancer markers including KRT19, KRT7, EPCAM, SOX9, and KRT23. Then, we grouped the cells into five clusters (C1-C5) based on specific gene patterns ( Figure 7A).     Notably, we observed that UBE2C, ASPM, CENPF, and TOP2A were significantly overexpressed in C1 compared to the other clusters. This cluster exhibited distinct karyokinetic and proliferation activity ( Figure 7B and Table S8). For example, UBE2C maintains cell stemness in hepatocellular carcinoma [29]. The stem cell marker, ASPM, promotes cell proliferation in prostate cancer [30] through the Wnt signaling pathway [28], and CENPF is involved in the formation of the centromere-kinetochore complex and was correlated with poor prognosis in the PDAC cohort of TCGA ( Figure S4A). TOP2A was upregulated in tumor tissues [31], correlated with the poor survival of PC patients ( Figure S4B), and induced PC cell progression [32]. Furthermore, NUSAP1 and TPX2, as C1 marker genes, contribute to spindle organization. The high TPX2 expression was associated with oncogenic KRAS mutations, and TPX2 inhibition suppresses PC cell growth [33]. Additionally, SMC4 promotes glioma cell proliferation through TGF-β [34]. Since C1 specifically contains stem cell and proliferative genes, we regarded this cluster as stem-like cancer cells, which suggests that drugs targeting these marker genes inhibit PASC proliferation.
We also observed that C2 expressed several transcription factors, including MXD1, ELF3, and LMO7 (Table S9). MXD1 negatively regulates gastric cancer cell migration, invasion, and metastasis, and serves as a c-Myc antagonist [35]. ELF3 stability is associated with PDAC progression via endogenous ERK1/2 phosphorylation stimulation [36]. Additionally, LMO7 was upregulated in papillary thyroid carcinoma and is a regulatory factor that promotes cell growth by fusing with BRAF [37].
We discovered the overexpression of laminins (LAMA3, LAMC2, and LAMB3), MMP1, and SERPINE1 in C3 (Table S10), suggesting that this cell type participates in microenvironmental reconstruction. For example, MMP1 is involved in extracellular matrix breakdown and is usually upregulated during pancreatic tumorigenesis [38].
Similarly, LAMA3 was increased in tumor tissues and correlated with the poor prognosis of PDAC patients [39], while SERPINE1 promoted PC cell proliferation and invasion [40].

Overexpression of EREG, FCGR2A, CCL4L2, and CTSC in myeloid cells from the normal pancreas to PASC
Since most stromal cells were myeloid cells, we investigated the gene variations in this cell type to predict its effect on PASC development. Myeloid cells were divided into five subgroups: macrophages (88.2%), dendritic cells (8.3%), proliferative macrophages (2.0%), mastocytes (0.5), and plasmacytoid dendritic cells (1.0%; Figure 8A-C).  For macrophages, we identified four genes that were upregulated from the healthy pancreas to IPMN and PASC ( Figure 8D and E): EREG, which encodes an epidermal growth factor receptor ligand that promotes carcinogenesis [45,46]; FCGR2A, which encodes a macrophage surface receptor that is involved in phagocytosis and clearance of immune complexes and exhibits polymorphisms associated with chemotherapy efficacy in KRAS-mutated tumors [47]; CCL4L2, which encodes cytokine proteins and functions in inflammatory and immunoregulatory processes; and CTSC, which encodes a member of the peptidase C1 family that activates serine proteinases. In hepatocellular carcinoma cells, CTSC accelerates proliferation and metastasis by activating the TNF-α/p38 pathway [48]. CTSC inhibition was also reported to positively regulate autophagy [49]. Thus, we discovered aberrant changes in myeloid cells, suggesting that blocking myeloid cell progression inhibits PASC tumorigenesis.

Cancer-associated fibroblasts (CAFs) restore an immunosuppressive tumor microenvironment
To delineate fibroblast heterogeneity within the tumor microenvironment, we subdivided 1,135 cells into fibroblasts, myofibroblasts, and CAFs ( Figure  9A-C). The gene markers for each cell type were selected based on the results of previous studies [50,51], and are summarized in Table S13. We performed differential expression analysis between fibroblasts and CAFs to uncover the aberrant functions within CAFs. The GSEA results showed that in biological processes, enriched DEGs were negatively associated with the "B cell activation and immune response" and positively associated with "bone development and wound healing" (Figure 9D). For molecular function, this gene set was involved in "structural constituent of the cytoskeleton," "calcium ion binding," and "extracellular matrix structure constituent," which participate in the reconstitution of extra-and intracellular microenvironments ( Figure S5). In contrast, the DEGs were negatively correlated with "antigen-binding," which enhances the association between CAFs and the immune response ( Figure S5). In cellular components, 25 DEGs were related to an accelerated "actin cytoskeleton," which plays an essential role in cell migration ( Figure S6). Consequently, we revealed that CAFs may support tumor progression by rebuilding the immunosuppressive microenvironment.

The copy number variation landscape between ductal and PASC cells
To investigate large-scale chromosomal changes among ductal (563 cells), adjacent nonmalignant (68 cells), and PASC cells (559 cells), we performed CNV analyses based on our scRNA-seq data. The results demonstrated that compared with reference cells (myeloid, B, and T cells), DNA insertions in cancer cells mainly occurred in chromosomes 1,8,19, and 20, while deletions were present in chromosomes 3, 6, and 10 in (Figure 10). Notably, the total CNV score significantly increased in ductal and cancer cells compared with that in ANMD cells. Although the CNV score was comparable between ductal and cancer cells, the CNV features between these two cell types differed (Figure 10).

Activation of EGFR-associated ligand-receptor pairs among fibroblast, myeloid cells, and PASC cells
To examine the intercellular interactions during PASC progression, we performed cell-cell communication analyses based on scRNA-seq data and the CellPhoneDB database. The significance and expression level of the ligand-receptor of fibroblasts and myeloid cells to ductal cells are presented and visualized in Figure S7. Similarly, we showed the expression level of the ligand-receptor of fibroblasts and myeloid cells to cancer cells ( Figure S8). Thus, we selected the significantly changed ligand-receptor pairs between cancer and stromal cells from this cell-cell interaction network. The results revealed that PASC cells communicate more closely with myeloid cells and fibroblasts than other cell types ( Figure 11A). We further screened the activated ligand-receptor pairs between ductal cells and these two stroma-cell types and defined several cancer-associated pairs, such as EGFR/TGFB1, ANXA1/FPR3, EREG/EGFR, and ICAM1/AREG ( Figure 11B-E). Notably, EGFR and its receptors, including TGFB1, MIF, HBEGF, GRN, COPA, and AREG, were upregulated in cancer cells and fibroblasts. Consequently, we identified an EGFR-associated feedback loop that promotes PASC progression from ductal cells to cancer cells. EGFR in cancer cells is overexpressed, thereby activating fibroblasts and myeloid cells via several molecules such as AREG. AREG reacts with cancer cells and results in the activation of EGFR in pancreatic cancer [52]. This crosstalk between fibroblasts, myeloid cells, and cancer cells may provide a potential therapeutic target for PASC. Besides, several cancer-associated ligand-receptor pairs, such as NAMPT/IL13RA2, were also overexpressed in the interactions between fibroblasts and cancer cells ( Figure 11D) [53]. Similarly, myeloid cells communicate with cancer cells via C5AR1/RPS19 ( Figure 11E). Finally, immunohistochemistry results showed that EGFR stains stronger in PASC tissues than in adjacent normal tissues ( Figure 11F). Thus, our findings supported that anti-EGFR could be a potential drug therapy for PASC.

Discussion
PASC is a rare but highly aggressive type of pancreatic cancer, containing abundant desmoplastic stroma and a small fraction of tumor cells. Therefore, assessing the transcriptome patterns of PASC cells in a bulk tumor remains a challenge. Recently, scRNA-seq has emerged as a new method to illustrate the cellular heterogeneity within pancreatic cancer [7]. In this study, using scRNA-seq, we identified S100A2 as a novel biomarker for PASC cells; discovered cells of a stem-like (C1), transcription factor-activated (C2), and microenvironmental reconstruction-associated (C3) types within PASC cells; defined EREG, FCGR2A, CCL4L2, and CTSC as feature genes of myeloid cells ranging from the normal pancreas to IPMN and PASC; revealed that CAFs may be associated with the reconstruction of immunosuppressive microenvironment; depicted the CNV of PASC cells; and discovered EGFR activation in cell-cell communication between PASC and stromal cells. Finally, these findings explain the heterogeneity and underlying cancer-associated regulators in PASC.
Recently, several studies have deciphered the heterogeneity of cancer and stromal cells of PC to identify the potential biomarkers and therapeutic targets for treating this lethal cancer, using scRNA-seq [7,51,54]. However, transcriptome differences between normal ductal and PASC cells are generally not examined due to the inaccessibility of a healthy pancreas and the rarity of PASC. Additionally, owing to tumor heterogeneity, the effects of the tumor microenvironment on carcinogenesis are poorly understood. To overcome these problems, we harvested pancreatic cells from a healthy donor and dissected the main pancreatic duct to obtain more living ductal cells. Hence, to our knowledge, this is the first study to compare the differences between PASC and normal pancreatic ductal cells using scRNA-seq.
Clinically, early PASC detection is difficult because no specific and sensitive molecular markers are present in tumorigenesis. At the single-cell level, we defined S100A2 as a potential biomarker for cancer cells in PASC. Besides, we revealed that S100A2 was associated with undesirable clinical features in PC. According to a recent study, S100A2 is upregulated in bulk tissues and could be an independent prognostic factor for patients with PDAC [55]. Therefore, our findings indicate that anti-S100A2 therapy may be a novel treatment strategy for PASC.
GSEA analyses also revealed several recognized cancer-driving activation pathways in PASC cells, such as VEGFA/VEGFR2 [56,57], glycolysis/ gluconeogenesis [58], and oxidative phosphorylation [59]. In contrast, two genes were involved in EMT suppression; GDF15 exhibits antitumor effects [60], and PROX1, which is involved in DNA binding transcriptional activity, protects pancreatic cells from acute tissue damage and early neoplastic transformation [61]. Besides, we observed distinct CNV patterns in ductal cells compared to reference cells, suggesting the intrinsic heterogeneity of diverse cell types.
Meanwhile, the expression of four genes increased from ductal cells to transitional ductal and malignant cells. LGALS1, a member of the betagalactoside-binding protein family, is upregulated in PDAC stroma, promoting microenvironmental vascularization [62]. Mechanistically, LGALS1 may activate SDF-1 through NF-κB, thereby increasing pancreatic stellate cell migration and invasion [63]. NPM1, which is associated with tumor metastasis, is highly expressed in lung adenocarcinoma [64]. As a ribosomal protein, RACK1 promotes tumor progression in breast cancer and oral squamous cell carcinoma [65,66]. However, as a molecule downstream of METTL14, PERP can inhibit PC cell proliferation and migration [67]. Thus, these four genes may be driving regulators and prognostic biomarkers for PASC development.
Pseudotime trajectory analysis revealed that several genes may change along with the differentiation process from ductal to PASC cells. For example, CFTR -which encodes a member of the ATP-binding cassette transporter superfamily that functions as a chloride channel and controls ion and water secretion in epithelial tissues -decreased from ductal to cancer cells. Mutations of CFTR cause cystic fibrosis -a risk factor for PC [28]. Besides, the biological process analysis indicated that the highly expressed gene set was associated with cellular substrate organization ( Figure 6E), a crucial feature of PC progression.
Another valuable observation from our scRNA-seq data is the evaluation of cell-cell communication based on ligand-receptor expressions. For example, the level of the EGFR/TGFB1 pair was overexpressed from ductal cell-fibroblast to cancer cell-fibroblast communication ( Figure 11B). EGFR could alter TGF-β function from antitumorigenic to protumorigenic in breast cancer [68], while TGFB1 increased fibrogenic gene expression and myofibroblast differentiation [69]. Similarly, the expression of the ANXA1/FPR1 or FPR3 pair was upregulated from ductal cell-myeloid cell to cancer cell-myeloid cell communication ( Figure 11C). ANXA1 is an anti-inflammatory molecule in PC [70]. FPR1 and FPR3 are macrophage gene markers, participating in neutrophil recruitment [71] and tumor-associated inflammation [72]. Thus, our findings highlighted the signaling pathways between cancer and stromal cells.
We also revealed the crosstalk between fibroblasts and cancer cells via ICAM1/AREG and collagen/integrin. As a molecule downstream of IL-35, ICAM1 facilitates endothelial adhesion and transendothelial migration, which accelerates PC metastasis [73]. As an epidermal growth factor, AREG significantly influences EMT upregulation in PC via the EGFR signaling pathway [74]. The expression of COL1A1, COL1A2, and COL3A1, members of the collagen family, is increased in lung adenocarcinomas and is associated with its poor prognosis. Since the activation of integrin signaling is correlated with cancer metastasis [75], our findings indicate that the ligand-receptor interaction between fibroblasts and cancer cells could contribute to PASC migration.
Myeloid cell-cancer cell communication analyses revealed activation of C5AR1/RPS19 and HLA-C/FAM3C pairs. RPS19 is upregulated in breast and ovarian cancer cells and promotes the presence of regulatory T cells, reducing CD8 T cell infiltration [76]. Similarly, C5AR1 recruits myeloid-derived suppressor cells into inflamed colorectum to impair CD8 T cells [77]. Thus, activating C5AR1/RPS19 may induce immunosuppression during PC progression. As a member of the family with sequence similarity 3, FAM3C encodes a secreted protein that is upregulated in pancreatic cancer cells [78,79]. Consequently, the overexpression of intercellular signaling among fibroblasts, myeloid cells, and cancer cells uncover novel biological insights into abnormal cell-cell communications in PASC.
This study has some limitations; for example, due to ischemia and necrosis, most epithelial cells in the IPMN group did not undergo quality control. Therefore, we could not compare the gene patterns of ductal cells among all three sample tissues. Besides, we did not explore T, B, and plasma cells since they were only present in IPMN, and a continuous change in the gene pattern was not observed. In addition, each type of tissue sample comes from a single patient, which may affect the credibility of this study. Thus, a larger number of PASC samples and their respective stages are needed to thoroughly understand the progression of PASC.

Conclusion
We examined the ductal and stromal cell variations during PASC progression from a single-cell perspective. We identified novel molecular markers for cancer cells, defined the underlying cancer-associated signaling pathways, and deciphered cell-cell communication during PASC tumorigenesis. Collectively, these results provide valuable information to understand the critical biological processes underlying PASC and demonstrate potential therapeutic targets for PASC.

Ethics approval and consent to participate
This study was approved by the Ethics Committee of Beijing Chaoyang Hospital, affiliated with Capital Medical University (2020-S-274 and 2020-S-302).

Author Contributions
Xin Zhao, Han Li, and Shaocheng Lyu conceived and designed this study; Zhao wrote the manuscript; Jialei Zhai and Huaguang Wang performed pathological examinations; Zhiwei Ji and Zhigang Zhang analyzed the data; Xinxue Zhang, Zhe Liu, and Junming Xu collected the samples and conducted the experiments; Hua Fan, Jiantao Kou, and Lixin Li revised the paper; Ren Lang provided funding; Qiang He offered administrative support. All authors read and approved the final manuscript.