Abstract
Objectives: The aim of the study is to review biotechnology advances in gene expression profiling on prostate cancer (PCa), focusing on experimental platform development and gene discovery, in relation to different study designs and outcomes in order to understand how they can be exploited to improve PCa diagnosis and clinical management. Methods: We conducted a systematic literature review on gene expression profiling studies through PubMed/MEDLINE and Web of Science between 2000 and 2016. Tissue biopsy and clinical gene profiling studies with different outcomes (e.g., recurrence, survival) were included. Results: Over 3,000 papers were screened and 137 full-text articles were selected. In terms of technology used, microarray is still the most popular technique, increasing from 50 to 70% between 2010 and 2015, but there has been a rise in the number of studies using RNA sequencing (13% in 2015). Sample sizes have increased, as has the number of genes that can be screened at once, but we have also observed more focused targeting in more recent studies. Qualitative analysis of the specific genes found to be associated with PCa risk or clinical outcomes revealed a large variety of gene candidates, with few that were consistent across studies. Conclusions: The last 15 years of research in gene expression in PCa have brought a large volume of data and information that has been decoded only in part, but advancements in high-throughput sequencing technology are increasing the amount of data that can be generated. The variety of findings warrants the execution of both validation studies and meta-analyses. Genetic biomarkers have tremendous potential for early diagnosis of PCa and, if coupled with other diagnostics (e.g., imaging), can effectively be used to realize less-invasive, personalized prediction of PCa risk and progression.
Introduction
Prostate cancer (PCa) is the most common non-skin cancer and the second leading cause of cancer-related death among men in the United States and other developed countries [1]. Although the incidence of PCa has declined recently due to changes in screening recommendations, it still poses a substantial public health burden; it is estimated there will be 180,890 new PCa cases and 26,120 PCa deaths in 2016 in the United States [1]. With such a health impact, it is important to develop “precision medicine” approaches to individualized PCa diagnosis and prediction of PCa outcomes. Genetic background has been demonstrated to contribute to PCa onset, with an estimated 58% of the risk of PCa explained by heritable factors [2]. To complement the information provided by germline variation, many recent studies have sought novel gene expression markers to help in all phases of the cancer patient spectrum, from early detection to progression, response to procedures, and survival. Recent advances in multiplexing technologies, from microarrays to high-throughput next-generation sequencing (NGS), allow expression quantification of multiple genes, which, alongside other -omics technologies, can facilitate understanding the biological mechanisms of PCa towards the development of personalized prediction models and tools.
Microarray analysis was introduced as a standard tool of gene expression profiling and transcriptome analysis a decade ago. Microarray technology uses sequence-specific probe hybridization and fluorescence detection to measure gene expression levels, and it provides a comprehensive view of gene expression profiles in biological samples. However, microarray analysis requires a reference genome and transcriptome, and is thus prone to limited detection range and high noise level [3]. After the widespread introduction of NGS technology, RNA sequencing (RNA-Seq) has emerged as a robust tool to generate information on the transcriptome or transcribed regions. NGS techniques feature a massive parallelization of the sequencing process that yields a much higher volume of information. Compared to microarrays, RNA-Seq can offer significant advantages in terms of reproducibility, throughput, and resolution, and therefore is becoming the preferred experimental platform for gene expression and transcriptome analysis [4].
In the past 2 decades, a plethora of studies have examined gene expression profiles in PCa utilizing microarrays, RNA-Seq, and other techniques in a variety of study designs and settings: thousands of genes have been identified as abnormally expressed in PCa cells, tissues, or animal models with PCa. The PCa outcomes of interest have varied as well, from the comparison of cancerous versus non-cancerous tissues, to PCa recurrence, to metastasis, and to survival.
In the present study, we perform a systematic review of peer-reviewed scientific works investigating gene expression in PCa published between 2000 and 2016, focusing on population-based or human tissue study settings. We chose this time frame because the “modern era” of microarray technology started at the end of the 1990s [5]. Our goal is to assess findings and trends in gene expression profiling for PCa risk assessment, and provide insights on which genetic elements of PCa appear to be the most promising to pursue for diagnosis and clinical management. We will not discuss here the appropriateness of analytical methods, nor perform a quantitative meta-analysis on the gene profiles.
Methods
We conducted the systematic literature review according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guideline [6]. The process of identification and selection of papers is presented as a flowchart in Figure 1. We performed a PubMed/MEDLINE search of all articles from 2000 to June 2016 using the MeSH terms “prostate cancer” and “gene expression” or “gene expression profiling” together. We also performed a search in Web of Science using the topics “prostate cancer” and “gene expression” or “gene expression profiling” together. Articles in English were included and prioritized according to the following criteria: studies using human tissue samples, human clinical studies, studies with gene expression profiling as the main predictor, and studies with biological and clinical outcomes. To ensure the completeness of our literature review, we adopted a feedback search strategy, looking at papers cited by included studies, which were then examined for eligibility. Articles were ranked independently by 2 scientists in our team, with a third scientist's vote in case of disagreement regarding inclusion. Reviews were analyzed separately.
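The inclusion rule described above (2 independent reviewers, with a third vote only in case of disagreement) can be written down explicitly. The following is a minimal sketch rather than the tooling actually used for this review; the function name `decide_inclusion` and the encoding of votes as booleans are assumptions made for illustration.

```python
from typing import Optional


def decide_inclusion(reviewer_a: bool, reviewer_b: bool,
                     reviewer_c: Optional[bool] = None) -> bool:
    """Return the inclusion decision for one article.

    Hypothetical helper: two reviewers vote independently, and a
    third vote is consulted only when the first two disagree.
    """
    if reviewer_a == reviewer_b:
        # Agreement: the third vote is not needed and is ignored.
        return reviewer_a
    if reviewer_c is None:
        raise ValueError("reviewers disagree; a third vote is required")
    # Disagreement: the third reviewer's vote decides.
    return reviewer_c
```

For example, `decide_inclusion(True, False, True)` includes the article, whereas `decide_inclusion(True, False)` raises an error because the disagreement has not yet been resolved.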
Results
After literature screening, 12 reviews [7-18] and 137 research papers [19-155] were included in the final phase of full-text reading and review of findings. Included articles, as expected, were highly diverse in the experimental techniques used, the study settings and sample sizes, the definition of PCa outcomes, the number of genes analyzed, and the genes identified as differentially expressed.
We first examined the trends in experimental techniques used for gene expression measurement over the years. Figure 2 illustrates the proportion of techniques used among original papers by 5-year calendar periods. Microarray is still the major technique being used for PCa gene expression studies, increasing from 50 to 70% between 2010 and 2015. The use of quantitative PCR and other methods has decreased, accompanied by a rise in the number of RNA-Seq studies in more recent years since the beginning of the NGS era, specifically 13% between 2010 and 2015.
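The per-period proportions summarized in Figure 2 amount to a simple tabulation of one technique label per study. The sketch below shows one way to compute such a tabulation; the period boundaries are taken from the period labels used in this section, while the function names and the toy records are assumptions for illustration and are not data from the review.

```python
from collections import Counter, defaultdict

# Assumed period boundaries, taken from the period labels used in the text.
PERIODS = [(2000, 2004), (2005, 2009), (2010, 2015)]


def period_label(year):
    """Return the 'start-end' label of the period containing `year`, or None."""
    for start, end in PERIODS:
        if start <= year <= end:
            return f"{start}-{end}"
    return None


def technique_proportions(records):
    """Compute, per period, the share of studies using each technique.

    `records` is an iterable of (year, technique) pairs, one per study.
    Studies whose year falls outside every period are skipped.
    """
    counts = defaultdict(Counter)
    for year, technique in records:
        label = period_label(year)
        if label is not None:
            counts[label][technique] += 1
    return {
        label: {tech: n / sum(c.values()) for tech, n in c.items()}
        for label, c in counts.items()
    }


# Made-up toy records, for illustration only (not data from the review).
toy = [(2011, "microarray"), (2012, "microarray"),
       (2014, "RNA-Seq"), (2013, "PCR")]
print(technique_proportions(toy))
```

A study that used more than one technique would need either one record per technique or a separate "mixed" label; which convention applies is a design choice that this sketch leaves open.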
Figure 3 shows the trend of study sample size (in terms of observation units, either number of tissues or subjects) per study. A significant increase in sample size is observed, especially for 2010-2015, possibly due to reduction in costs of experimental techniques as well as other improvements in biomedical research. Another benefit of the advanced technology and reduced cost is the possibility of developing gene panels to predict PCa occurrence and outcomes by screening a large number of genes or the whole genome. There were 7 studies from 2010 to 2015 that screened 1.4 million probe sets and identified gene panels associated with PCa occurrence [75,79,80,81,87,98,111], while there was only 1 study of a similar scale between 2005 and 2009 [68]. After removing studies that developed gene panels, we observed a significant decrease in the number of genes screened per study (from more than 90,000 in 2000-2004 to 5,500 in 2005-2009, to 1,500 in 2010-2015), which may indicate that gene expression studies in PCa have been undergoing a more focused targeting. A large proportion of included studies focused only on one target gene or one type of gene, and since we only included tissue-based PCa studies, many of them were trying to explore the expression pattern of genes previously identified in cell line studies or in other cancers.
Overall, 62% of the studies investigated whether the target genes have abnormal expression activities in PCa tissues. Genes that regulate apoptosis were examined intensively. For example, Iacopino et al. [55] found that BCL-2 was overexpressed in PCa tissues compared to BPH tissue, while some other apoptosis-related genes (e.g., FAS, c-Myc) were not differentially expressed. Furthermore, Sethi et al. [88] showed that BAG-5, a BCL2-associated athanogene, is overexpressed in PCa, and that BAG-5 also inhibits apoptosis in cell studies. In addition to apoptosis-related genes observed in other types of cancer, the development and progression of PCa have also been demonstrated to be androgen related; many studies have investigated the expression of androgen-regulated genes and showed that they may play an important role in PCa progression. For example, MYC and NCOA2 were identified as major contributors to the androgen receptor signaling pathway and exhibit higher expression in prostate tumors [72]. The overexpression of AGR-2, as well as its association with shorter survival and higher Gleason score, was also demonstrated by multiple studies [59,77].
In addition, among all tissue-based PCa studies, we looked at the disease progression outcomes, including recurrence, metastasis, and survival. The ERG gene, as a highly prevalent fusion partner in PCa, also showed a positive association with both a high Gleason score and short survival in several different study settings [84,85,106]. Multiple studies have demonstrated that AMACR, a gene that encodes an enzyme that functions in breaking down fatty acids and certain toxic compounds, is overexpressed in groups with adverse outcomes [31,47,61,111]. Other notable genes associated with poor prognostic outcomes include AGR-2, IGFBP-3, and MUC-1 [21,34,48,56,59,62,67,77].
Figure 4 gives a qualitative description of the most frequently mentioned genes in the studies of interest, both reviews and original papers, with respect to different outcomes. The 3 circles represent different PCa outcomes that are commonly examined in gene expression analysis; the genes included in each circle are genes positively associated with or overexpressed for the specified outcome. Genes in intersections were associated with multiple outcomes. Note that the diagram does not account for risk/hazard or sample size, and it is not the result of a quantitative meta-analysis.
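The grouping behind a 3-circle diagram of this kind is plain set arithmetic: each gene falls into exactly one of 7 regions, depending on which outcomes it was associated with. A minimal sketch follows, assuming one gene set per outcome; the function name `venn_regions`, the generic labels `a`, `b`, and `c`, and the placeholder gene names are hypothetical and do not reproduce the content of Figure 4.

```python
def venn_regions(a, b, c):
    """Split three sets into the seven regions of a 3-circle Venn diagram."""
    a, b, c = set(a), set(b), set(c)
    return {
        "a_only": a - b - c,
        "b_only": b - a - c,
        "c_only": c - a - b,
        "a_b": (a & b) - c,      # shared by a and b, but not c
        "a_c": (a & c) - b,      # shared by a and c, but not b
        "b_c": (b & c) - a,      # shared by b and c, but not a
        "a_b_c": a & b & c,      # shared by all three
    }


# Placeholder gene names, for illustration only (not findings of the review).
regions = venn_regions(
    a={"G1", "G2", "G3"},
    b={"G2", "G3", "G4"},
    c={"G3", "G5"},
)
print(regions["a_b_c"])
```

As with the figure, such a partition is purely qualitative: it records only whether a gene was reported for an outcome, not effect size or sample size.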
Interestingly, we did not find a striking overlap among studies or among different outcomes. For example, among the 400 genes identified to be significantly overexpressed in PCa tissues in a 2001 study by Welsh et al. [20], only a few were examined in further studies. Eight studies performed large-scale or whole-genome gene expression analysis and identified thousands of differentially expressed genes, which highlights a potentially complex scenario of gene interactions [21,24,29,36,61,81,86,91]. In addition, discrepant results were observed from different studies. In a case-control study conducted by Mao et al. [60], similar expression scores of VEGFR-1 were observed in patients with and without recurrence, indicating its expression was not associated with the risk of recurrence; however, 2 more recent studies showed conflicting results and suggested VEGFR-1 was overexpressed in recurrence cases [89,108]. Of all those genes, a large proportion is in need of further investigation in terms of coexpression analysis and roles in PCa development.
Recently, researchers started to use a combination of multiple genes to build expression panels to predict PCa outcomes. Some works used prior knowledge by including already identified genes and key PCa carcinogenic pathways [68,80,87,106], while others screened ex novo thousands of genes and filtered the most significant ones [34,49,67,69,79,98,111]. For example, Penney et al. [76] analyzed more than 6,000 genes using Gene Set Enrichment Analysis (GSEA) and generated a 157-gene panel that showed improved prediction of survival among PCa cases with a Gleason score of 7. Cheville et al. [65] identified a 4-gene expression panel from a set of previously identified genes to predict survival under a cohort study setting.
Discussion
We performed a systematic review of human tissue-based gene expression analysis in PCa from 2000 to 2016, identifying 137 original research articles and 12 reviews focused on differential expression in PCa tissue and disease progression outcomes.
For more than 15 years, sample sizes of studies have been increasing, but the size of gene sets has varied considerably. The technology employed the most remains the microarray, but RNA-Seq by means of NGS is on the rise. However, certain challenges have prevented wide use of NGS in PCa gene expression research. For example, formalin-fixed paraffin-embedded (FFPE) tissue samples are often used in PCa research because a long follow-up is needed. FFPE samples can be conveniently stored at room temperature, are cost-effective, and work well for immunohistochemical staining and morphology analyses. Since FFPE is widespread, there are numerous archives from which samples can be selected. On the other hand, formalin is toxic, FFPE protocols are not standardized, and FFPE samples are usually not well suited for molecular analysis. FFPE tissues contain fragmented or cross-linked DNA that may degrade further over time, which reduces the quantity of usable DNA and the performance of NGS [156].
Thousands of genes have been associated with PCa tissue, status, or cancer progression outcomes, providing the possibility of identifying valuable biomarkers for PCa. A portion of them has been confirmed in multiple studies, and only a few have been characterized thoroughly. Qualitatively, we observed a large degree of heterogeneity, discrepancies of expression among different studies, and poor consistency across different PCa outcomes. A meta-analysis may provide a clearer picture of the independent effect of single gene expression on the PCa outcomes.
Given the large number of gene findings, analysis of coexpression is warranted: high-throughput technology and larger population sizes allow the development of multivariable risk prediction scores, and the execution of clustering or network discovery algorithms to better characterize mechanisms of joint gene expression in PCa and metabolic pathways.
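As a concrete illustration of the simplest form of coexpression analysis, the sketch below flags gene pairs whose expression levels are strongly correlated across the same set of samples. It is a minimal example under stated assumptions, not a method used by any of the reviewed studies: the Pearson coefficient, the 0.8 threshold, the function names, and the toy values are all choices made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def coexpressed_pairs(expression, threshold=0.8):
    """Return (gene1, gene2, r) for pairs with |r| >= threshold.

    `expression` maps a gene name to its expression values, measured
    on the same samples in the same order for every gene.
    """
    pairs = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            pairs.append((g1, g2, r))
    return pairs


# Made-up toy values for placeholder genes (not data from any study).
toy_expression = {
    "G1": [1.0, 2.0, 3.0, 4.0],
    "G2": [2.0, 4.0, 6.0, 8.0],
    "G3": [4.0, 1.0, 3.0, 2.0],
}
print(coexpressed_pairs(toy_expression))
```

In practice, the thousands of genes reported in the literature would call for multiple-testing control and dedicated clustering or network methods; the pairwise loop here only conveys the basic idea.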
Despite the limitations highlighted, gene expression studies can provide abundant information and opportunities for a better understanding of PCa and potential clinical uses of gene-based biomarkers. For example, a novel PCa test based on gene expression profiling has been developed commercially. The test, with the brand name Prolaris, uses a cell cycle progression (CCP) score that assesses the RNA expression of 46 genes to predict the progression and aggressiveness of PCa [80]. Another commercially available test, Decipher (GenomeDx), uses a 22-gene signature to predict PCa metastasis after surgery, and has shown good discrimination in detecting men who are at risk for metastatic PCa [98].
As demonstrated by the studies we reviewed, strategic gene expression profiling can support the identification and treatment of PCa patients, especially when combined with other new techniques such as prostate magnetic resonance imaging (MRI). The incorporation of prostate MRI can improve patient risk stratification and the choice of treatment modality, including active surveillance and monitoring. Prostate MRI also has a role in decisions to discontinue active surveillance when changes in tumor size and characteristics are detected [157]. Together, new technological advances in radiological modalities and the incorporation of individualized genetic profiling will help stratify patients by risk and promote better management of their disease, with possible prevention of overtreatment and its consequential risks and side effects.
Nonetheless, there are open challenges for developing risk prediction scores based on gene expression. As Sboner et al. [155] pointed out, the performance of predictive models based on molecular profiling is not as good as that of models using clinical variables only; in other words, the prostate-specific antigen level and the Gleason score are still the best indicators and predictors of PCa progression as compared to scores based on gene expression profiles. However, PCa is a highly heterogeneous cancer; even the Gleason score is subject to variation within an individual tissue sample. It is likely that a combination of clinical, gene expression, and other -omics domains will help explain the remaining variance. Furthermore, the end points of interest for PCa gene expression studies are multiple: there are tissue-based case-control studies, and there are clinical outcomes including severity, recurrence, metastasis, and survival. Even tissue-based studies are not necessarily optimal for inferring (early) predictors of PCa onset. On the other hand, several genome-wide association studies looked at single nucleotide polymorphisms (SNPs) and identified risk scores based on SNPs comparing men who were diagnosed with PCa versus those who were not [158,159,160]. It will be useful to study the link between risk SNPs or loci and gene expression to find out if risk SNPs for incidence relate to gene expression candidates for clinical outcomes such as survival, and to help establish a complete collection of biomarkers to make diagnostic and prognostic predictions. Expression quantitative trait loci studies have attempted to address this issue; however, there are a very limited number of such studies because overlapping information on SNPs, gene expression, and long-term outcomes is required.
In a recent expression quantitative trait loci study, researchers examined 47 risk SNPs in a large case-control setting; only 8 of them were associated with PCa-specific mortality, and the magnitudes of the associations were relatively small [160]. Thus, further investigations are still needed in this area.
Finally, it has been well established that the risk of PCa differs greatly among racial and ethnic groups, and a few gene expression studies have observed differences by race. For example, Hernandez et al. [59] showed that IGF-1 and IGFBP-3 were not associated with PCa in the African-American population, while other studies have shown the opposite in the Caucasian population. The number of studies that examined gene expression patterns among different racial or ethnic groups is limited, and future studies could yield valuable information on the racial disparities of PCa.
Conclusion
The last 15 years of research in gene expression in PCa have brought a large volume of data and information that has been decoded only in part. Combining laboratory and clinical variables, including the addition of MRI technology, with novel predictors found in different -omics domains may help in covering the residual unexplained variance and differences not only at the racial/ethnicity level, but also at the individual and specific outcome level. Advancements in sequencing technology and big data analytics will probably increase the amount of data generated in the coming years, but will also allow accurate meta-analyses and the application of more complex data mining techniques to identify biomarkers for early diagnosis of PCa and personalized prediction of disease progression.