RNA Sequencing of Early-Stage Gastric Adenocarcinoma Reveals Multiple Activated Pathways and Novel Long Non-Coding RNAs in Patient Tissue Samples

Background: Gastric cancer is among the most common cancers worldwide that currently lacks effective diagnostic biomarkers and therapeutic targets. Next-generation RNA sequencing is a powerful tool that allows rapid and accurate transcriptome-wide profiling to detect differentially expressed transcripts involved in normal biological and pathological processes. Given the function of this technique, it has the potential to identify new molecular targets for the early diagnosis of disease, particularly in gastric adenocarcinoma. Methods: In this study, whole-transcriptome analysis was performed with RNA sequencing on tumoral and non-tumoral tissue samples from patients with early-stage gastric cancer. Gene ontology and pathway enrichment analysis were used to determine the main function of the specific genes and pathways present in tissue samples. Results: Analysis of the differentially expressed genes revealed 5 upregulated and 234 downregulated genes in gastric cancer tissues. Pathway enrichment analysis revealed significantly dysregulated signalling pathways, including those involved in gastric acid secretion, drug metabolism and transporters, molecular toxicology, Olinked glycosylation of mucins, immunotoxicity, metabolism of xenobiotics by cytochrome P450, and glycosylation. We also found novel downregulated non-coding RNAs present in gastric cancer tissues, including GATA6 antisense RNA 1, antisense to LYZ, antisense P4HB, overlapping ACER2, long intergenic non-protein coding RNA 2688 (LINC02688) and uncharacterized LOC25845 (PP7080). Conclusions: The transcriptomic data found in this study illustrates the power of RNA-sequencing in discovering novel genes and tumorigenic pathways involved in human carcinogenesis. The anomalies present in these genes may serve as promising tools for the development of accurate diagnostic biomarkers for the detection of early-stage gastric cancer.


Introduction
Gastric cancer (GC) is the fifth most common type of cancer, accounting for about 5.7 % of all cases (1). Risk factors include genetic, epigenetic and environmental factors such as diet and infections (2)(3)(4). Although significant progress has been made in recent years against gastric cancer, the treatment of advanced GC remains a major challenge due to late stage diagnosis and its poor prognosis (5)(6)(7). Whilst reliably detecting early-stage biomarkers in cancers is still a challenge, the next generation sequencing-based assays can help detect malignancies in their earliest stages. Genomewide transcriptome analysis methods such as microarrays and RNA-sequencing (RNA-seq) have enriched our understanding of human cancer. The advances of transcriptome analysis have and continue to enhance our understanding of the biology of tumor cells, key molecular pathways, relevant biomarkers, and the mechanisms of cancer progression and tumor resistance (8)(9)(10)(11). More importantly, RNA-seq provides new insights into the roles of the genes in signalling pathways that contribute to the development of cancer (12,13). The use of the subsequent bioinformatic analysis tools, such as gene ontology (GO) and pathway enrichment analysis could reveal genes and pathways involved in human pathology (14,15). Indeed, RNA-seq technology has been used to discover non-coding RNAs (ncRNAs), such as microRNAs, long non-coding RNAs (lncRNAs), antisense RNAs, and pseudogenes involved in the development and progression of a range of cancers, including gastric, colorectal, and cervical cancer (16)(17)(18)(19). The advent of these new technologies marked the rise of precision medicine, a novel approach that develops strategies for treatment and prevention based on a person's own unique disease-driven molecular alterations (20,21). Therefore, collecting genetic information from individuals can aid in our understanding of the causes and risk factors of diseases. In this study, we performed RNAseq transcriptomic analysis to evaluate the pathways and novel transcripts involved in the progression of GC. Differentially expressed genes (DEGs) among matched cancerous and non-cancerous human gastric tissues were identified. To identify the common biological functions and signalling pathways of the enriched DEGs, we applied bioinformatic analysis, including GO and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis.

Patients
Two male patients diagnosed with GC adenocarcinoma were enrolled in this study. Samples were individually snap-frozen in liquid nitrogen immediately after surgery, and thereafter stored at −80 °C without thawing for further analyses. Diagnosis of all cases was histologically confirmed by a pathologist using haematoxylin and eosin (H&E) staining. Both patients had no family history of cancer and no prior neoadjuvant chemotherapy. Histological analysis of their tumoral tissues confirmed stage Ia and grade II adenocarcinoma, with no detectable lymph node metastasis. For each patient, both tumoral and their adjacent nontumoral tissues were subjected to nextgeneration RNA-sequencing. Molecular analysis confirmed the absence of Epstein-Barr virus (EBV) and Helicobacter pylori infection in both tumoral and non-tumoral tissues. The study was approved by the ethical committee of Babol University of Medical Sciences, and all participants gave their informed written consent.

RNA extraction
Total RNA was isolated from each sample using RNX-plus reagent (Cinagen, Iran), according to the manufacturer's instructions. The purity and RNA integrity number (RIN) was assessed using a RNA Nano 6000 Assay kit with the Bioanalyzer 2100 system (Agilent Technologies, USA). RNA samples with a RIN greater than 7 were considered for cDNA library preparation and subsequent RNA-seq.

Library preparation and sequencing
Sequencing libraries and RNA-seq experiments were carried out by BGI Company (Shenzhen, China). Briefly, before constructing the RNA-seq libraries, ribosomal RNA (rRNA) was removed from total RNA using the Ribo-Zero rRNA removal kit (Epicentre Biotechnologies, Madison, WI, USA). The RNA-seq libraries were constructed using a NEBNext® Ultra™ RNA Library Prep kit (New England Biolabs, UK) according to the manufacturer's protocol.
Following this, the RNA-seq library then underwent paired-end sequencing (2 × 150 bp) using the Illumina HiSeq™ 2500 platform. The quality of the raw reads was assessed using FastQC and MultiQC. Adapter sequences were trimmed and any low-quality reads or reads below a certain length from the raw FastQC reads were removed using Trimmomatic software. The clean reads were aligned to the hg19 assembly of the human reference genome using the Hisat2 alignment program.

Identification of differentially expressed genes (DEGs)
Fragments Per Kilobase Million Mapped Reads (FPKM) was used to quantify transcript levels. Differential gene expression analysis was performed using the Bioconductor package DESeq2 (version 1.14.0). For this analysis, a Benjamini and Hochberg (BH) Pvalue adjustment was performed and a false discovery rate (FDR) of 0.05 was applied. Only genes that had a P-adjusted value of ≤ 0.05 and log2 (fold change) greater than or equal to 1 were considered differentially expressed.
Each pathway was then mapped to all DEGs, and the significantly expressed genes were recognized for every pathway according to the following rules: (1) genes with adjusted p values < 0.0001 and baseMean greater than 400; (2) pathways with at least 5 DEGs.

Differentially expressed genes (DEGs)
Transcriptome sequencing data revealed 1788 dysregulated RNAs corresponding to 578 genes that were significantly up-regulated and 1210 genes that were significantly downregulated in tumoral tissues, compared to the adjacent non-tumoral tissues. Interestingly, 111 (6.2%) DEGs corresponded to novel transcripts and non-coding transcripts. To determine how the DEGs were different among the tumor and non-tumor tissues, scatter and volcano plots were obtained from pairwise comparisons between samples (Fig.  1a, 1b). The scatter plots compared the gene expression levels pairwise among the samples, while volcano plots were generated using log2 fold-change against -log10 (p-value) displaying the amount of DEGs. To identify unique gene expression signatures and genes with similar expression patterns across tumoral and non-tumoral tissue, hierarchical clustering based on the gene list was used (Fig. 2). As shown in Figure 2, DEGs were similar in both participants. Furthermore, we considered the physical localization of DEGs along each chromosome. As shown in Figure 3, 148 out of 240 (61.6%) DEGs were placed on the q-arm of the chromosomes. In addition, 12.5%, 11.3%, and 8.7% of DEGs were located on chromosomes 19, 1, and 11, respectively. Following this analysis, the genes with low expression were removed, and those with a P value less than 0.0001 were considered for further analysis. Cut-offs for low abundant mRNAs and non-coding transcripts were baseMean ≤ 400 and ≤ 300, respectively. After excluding low abundant RNAs, a total of 240 RNAs, including 235 down-regulated (6 noncoding-RNAs) and 5 up-regulated RNAs remained.  Gene ontology and pathway enrichment analysis DEGs were annotated according to different terms, and grouped into three broad categories: biological processes, cellular components, and molecular functions (Fig. 4). The terms most commonly associated with the functional categories of biological processes, cellular components, and molecular functions, included apical part of the cell, digestion, and potassium ion transmembrane transporter activity, respectively. Additionally, pathway enrichment analysis by KOBASS 3.0 showed that gastric acid secretion, O-linked glycosylation of mucins, and metabolism of xenobiotics by cytochrome P450 were the most significant DEGs among the enriched pathways in this study (Table 1). Furthermore, pathways enriched by Qiagen panels for DEGs included drug metabolism, glycosylation, immunotoxicity, molecular toxicology, and tyrosine kinases pathways (Table  1). This difference observed between the two methods may be due to differences in the number of pathways they consider, pathway size, and the subcategories of pathways that they provide. These differences suggest that the combined use of multiple databases or integrative ones may uncover the complex combinations of dysregulated pathways present in GC. According to the national human genome research institute (NHGRI) GWAS catalogue, the DEGs identified in our study are highly correlated with GC (p value = 0.0093).

Discussion
In the present study, we performed a transcriptome-wide analysis of the modifications present within the tumoral tissues of patients diagnosed with GC, compared to their adjacent non-tumoral tissue using both in-house RNA-seq data and the publicly available TCGA database. A total of 1677 DEGs and 111 differentially expressed novel transcripts and non-coding transcripts were found to be uniquely expressed in the GC tumoral tissues, compared to healthy non-tumoral tissue. These DEGs observed in the gastric tumor samples have been previously shown to be associated with the presence of GC. The RNA-seq data obtained in this study corroborated the findings of previous reports, emphasizing its function as a reliable method for identifying DEGs in GC. Consequently, following DEGs strengthening, out of 239 DEGs found by our in-house RNA-seq analysis, we were able to confirm the dysregulation of 22 genes by utilizing TCGA dataset in 14 GC cases with stage 1a GC. Among these genes, ATP4A, ATP4B, GKN1, GKN2, and gastric type LIPF were expressed only in stomach while ghrelin, GHRL, and SLC5A5 are predominantly expressed in the stomach but also in many other tissues. Both ATP4A and ATP4B are subunits of the gastric proton pump, hydrogen potassium (H + /K + ) ATPase. This pump is found in parietal cells of the gastric oxyntic mucosa, involved in maintaining an acidic environment within the stomach through aiding in gastric acid section (22). The pathway enrichment analysis performed in this study revealed that gastric acid secretion was the most significantly enriched pathway found in the tumoral tissue samples. Recent evidence has unveiled a role for protonpump inhibitors (PPIs) in the pathogenesis of GC due to their suppression of gastric acid section (23)(24)(25). A meta-analysis of observational studies on the effect of acid suppressive drugs on the development of GC found that H2 receptor blockers, H2 receptor antagonists (H2Ras), and PPIs significantly increased the risk for GC (26). A separate in vivo study exploring the relationship between GC and gastrin secretion found that 60% of gastrin-deficient mice developed gastric tumors in the antrum of the stomach, related to the lack of acid secretion within the stomach (27). Both GKN1 and GKN2 have been identified as novel biomarkers for GC as have been found to be downregulated in GC patients. These genes are involved in the homeostatic regulation of the gastric mucosa (28)(29)(30). Several studies have shown a decrease in levels of GKN1 and GKN2 in gastric tumor tissues and GC cell lines. Yoon et al. found that exosomes carrying GKN1 inhibited cell proliferation and induced apoptosis in the human GC-derived cell lines, AGS and MKN1 (31). The in vivo portion of this study indicated that tumor volume and weight were significantly reduced following treatment of nude mice with MKN1 xenograft tumors by exosomes carrying GKN1 (31). Furthermore, Shi et al. found that restoration of GKN2 in gastric cancer cells reduced cell viability and increased apoptosis through the activation of extrinsic apoptotic pathways (32).
Gastric lipase (LIPF) is a gastric tissuerestricted protein secreted by gastric chief cells in the fundic mucosa of the stomach. In the tissues of GC patients, LIPF has been observed to be downregulated (33,34). Additionally, we found that trefoil factor 1 and trefoil factor 2, as well as mucin 5AC, mucin-like protein 3, mucin 1 and mucin 6 expressions were downregulated in GC tissues, compared to the surrounding healthy gastric tissue samples. Co-expression of trefoil peptides and mucins suggests a key role in mucosal protection by forming the mucosal barrier (35,36). Additionally, our findings show DEGs enriched for pathways involved in drug metabolism and transporters, molecular toxicology, O-linked glycosylation of mucins, immunotoxicity, metabolism of xenobiotics by cytochrome P450, and glycosylation in the GC tissues. Genes associated with drug metabolism and drug transporters are involved in the regulation of the pharmacokinetics and pharmacodynamics of many agents such as toxic chemicals and hormones. The dysregulation of genes involved in drug metabolism have been shown to predispose individuals to developing certain cancers through enhancing metabolic activation and reducing detoxification of environmental, dietary, and endogenous procarcinogens (37)(38)(39). Drug transporters and drug metabolizing enzymes also contribute to chemoresistance. Furthermore, metabolization of xenobiotics by cytochrome P450 plays an important role in the activation and/or deactivation of a wide range of xenobiotics, including anticancer drugs. Abnormalities in genes associated with xenobiotic metabolism by cytochrome P450 have been shown to have a critical function in the development and progression of many cancers, including mucinous epithelial ovarian cancer, clear cell renal cell carcinoma and GC (40)(41)(42)(43).
Glycosylation is one of the most important posttranslational modifications of proteins required for the normal biological functioning of cells. This critical process influences cell signalling, immune recognition, and cell-cell interactions. Our TCGA data identified many changes in the expression of glycosylation genes that were associated with cancer. O-GalNAc glycans and N-glycans are two main classes of glycans found in membrane and extracellular glycoproteins. Mucins are a class of glycoproteins carrying many O-GalNAc glycans, and act as key molecules in the maintenance of gastrointestinal homeostasis. Aberrant glycosylation of mucins has been linked to the pathogenesis of many diseases, including GC (44)(45)(46). Immunotoxicity is defined as the detrimental effects on the functioning of the immune system that result from exposure to xenobiotics. Environmental and industrial chemicals have the potential to alter the immune system and lead to immunosuppression. The dysregulation of genes associated with immunosuppression may increase the incidence or severity of cancer by inhibiting the ability of the immune system to detect and kill cancer cells or cancer-causing infections. We also identified novel transcripts and non-coding transcripts to be significantly downregulated in GC tissues. After low expression filtering of DEGs and the removal of reads with p values lower than 0.0001, we obtained 6 lncRNAs, including antisense to EXOC3 (PP7080, LOC25845), GATA6 antisense RNA 1, antisense to LYZ, antisense P4HB, overlapping ACER2, and long intergenic non-protein coding RNA 2688, all of which were downregulated in GC tissues in comparison with surrounding healthy tissue.
In the present study, we used RNAsequencing to perform transcriptome-wide analysis of GC tissues to identify genes that were differentially expressed in GC in comparison to matched healthy gastric tissues. Our data provides a global view of the functions of the differentially expressed RNAs within GC tissue. Gastric acid secretion, drug metabolism and glycosylation were the main pathways dysregulated at the transcriptional level in GC. From our findings, the genes found to be upregulated in GC tissues, including ATP4A, ATP4B, GKN1, GKN2, LIPF, ABC, SLC, trefoil factor, and mucins may play important roles in the development of gastric adenocarcinoma. Further research is required to understand their roles and their use in the early diagnosis and treatment of GC. Thus, deeper analysis of these pathways could be reconsidered in future gene expression studies on GC cell lines and tissues.