Gene Expression Profiles of Non-Small Cell Lung Cancer: Survival Prediction and New Biomarkers
Välk K. (a, d) · Vooder T. (a, d, e) · Kolde R. (b) · Reintam M.-A. (f) · Petzold C. (g) · Vilo J. (b) · Metspalu A. (a, c, d)
Institutes of (a) Molecular and Cell Biology and (b) Computer Science, (c) Estonian Genome Center, and (d) Estonian Biocentre, University of Tartu, and (e) Clinic of Cardiovascular and Thoracic Surgery and (f) Department of Pathology, Tartu University Hospital, Tartu, Estonia; (g) Center for Regenerative Therapies Dresden, Dresden, Germany
Corresponding Author
Riia St. 23
EE–51010 Tartu (Estonia)
Tel. +372 737 5029, Fax +372 742 0286
Objectives: Despite the well-defined histological types of non-small cell lung cancer (NSCLC), a given stage is often associated with wide-ranging survival rates and treatment outcomes. This disparity has led to an increased demand for the discovery and identification of new informative biomarkers. Methods: In the current study, we screened 81 NSCLC samples using Illumina® whole-genome gene expression microarrays in an effort to identify differentially expressed genes and new NSCLC biomarkers. Results: We identified novel genes whose expression was upregulated in NSCLC, including SPAG5, POLH, KIF23, and RAD54L, which are associated with mitotic spindle formation, DNA repair, chromosome segregation, and dsDNA break repair, respectively. We also identified several novel genes whose expression was downregulated in NSCLC, including SGCG, NLRC4, MMRN1, and SFTPD, which are involved in extracellular matrix formation, apoptosis, blood vessel leakage, and inflammation, respectively. We found a significant correlation between RNA degradation and survival in adenocarcinoma cases. Conclusions: Even though the follow-up time was too limited to draw final conclusions, we were able to show better prediction p values in a group selection based on molecular profiles compared to histology. The current study also uncovered new candidate biomarker genes that are likely to be involved in diverse processes associated with NSCLC development.
© 2011 S. Karger AG, Basel
Lung cancer has the highest mortality rate among all types of cancers, resulting in more annual deaths than breast, prostate, colorectal, and pancreatic cancers combined.
Conventional diagnosis of lung cancer is currently based on tumor histology. The vast majority of lung cancers, approximately 80%, are non-small cell lung cancers (NSCLCs). The main histological types of NSCLC include adenocarcinoma, squamous cell carcinoma, and large cell carcinoma. Despite the well-defined histological types of NSCLC, patient survival and treatment outcomes can differ substantially, even among NSCLCs of the same histological type and stage. Lung cancer morphology exhibits a broad spectrum, and many tumors are atypical or lack the morphologic features necessary for a reliable differential diagnosis. Lung cancer diagnoses based solely on morphological features are therefore insufficient in many cases.
In recent years, there has been an expanding interest in the molecular genetic classification of cancers. Gene expression profiling is one of the methods used for this purpose. The data obtained from gene expression analyses may allow the differentiation of cancer subgroups based on molecular phenotypes. Subgroups determined by gene expression in combination with clinical features such as tumor spread reflect the cancer’s biology and progression better than histology alone.
Many genetic alterations at both the DNA and RNA levels have been shown to be associated with the development and progression of lung cancer. However, the precise underlying molecular mechanisms associated with NSCLC remain elusive in most cases. In the current study, we analyzed NSCLC specimens using Illumina® whole-genome gene expression microarrays and discovered novel biomarkers of NSCLC.
Initially, 146 patients diagnosed with NSCLC underwent surgery at the Clinic of Cardiovascular and Thoracic Surgery of Tartu University Hospital, Estonia. All patients gave their written informed consent to participate in the study and to allow their biological samples to be genetically analyzed, and a patient questionnaire was obtained from all subjects. The Ethics Review Committee on Human Research of the University of Tartu approved the current study. Tumor specimens and control samples were collected during surgery from November 28, 2002 to December 31, 2006, and the departmental pathologist promptly examined the specimens. Tumor histology was classified according to World Health Organization (WHO) criteria, and stage was determined by TNM (tumor, node, and metastasis) staging according to the UICC (International Union Against Cancer) classification. The same pathologist performed all histological classifications. Control samples were obtained from the same cancer patient at a site distant from the tumor and were approved as control samples by the pathologist. After the final diagnosis, patients who had received preoperative chemoradiotherapy and patients whose final histology did not show NSCLC were excluded from the gene expression study. The exclusion criteria and the number of patients excluded are shown in table 1.
After the exclusion of patients based on their medical records and final histology, 131 samples were available for the RNA extraction. When necessary, the RNA was extracted twice to meet the predefined RNA integrity number (RIN) cutoff value of ≥7. Due to rapid RNA degradation among many of the cancer specimens, an additional 50 samples were excluded (the overall RNA recovery rate was 61.8%). After the RIN cutoff value was applied, a total of 81 samples were available for the gene expression experiments.
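The sample attrition reported above can be checked with a few lines of arithmetic. This is only a sanity check on the counts given in the text; the variable names are illustrative and not from the study.

```python
# Counts as reported in the text; variable names are illustrative.
initial_cohort = 146            # patients who underwent surgery
available_for_extraction = 131  # after clinical and histological exclusions
excluded_for_rna_quality = 50   # RIN below the cutoff of 7

clinically_excluded = initial_cohort - available_for_extraction
analyzed = available_for_extraction - excluded_for_rna_quality
recovery_rate = 100 * analyzed / available_for_extraction

print(clinically_excluded)       # 15
print(analyzed)                  # 81
print(round(recovery_rate, 1))   # 61.8
```

The 61.8% recovery rate is therefore relative to the 131 samples that entered RNA extraction, not to the initial cohort of 146.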
The age of the NSCLC patients enrolled in the gene expression study ranged from 38 to 81 years (mean 65.8). The gender distribution was 9 females and 72 males (11 and 89%, respectively). According to the histology, there were 13 bronchioloalveolar carcinomas (a subtype of adenocarcinoma), 8 adenocarcinomas, and 60 squamous cell carcinomas (16, 10, and 74%, respectively). Detailed characteristics of the patients and tumors analyzed are provided in table 2.
Postsurgical tissue specimens were immediately cut to an appropriate size (maximum 1 cm3) and submerged in liquid nitrogen to inhibit RNA degradation. The samples were stored at –80°C until further processing. If necessary, the tissue samples were cut once more to meet the requirements for RNAlater®-ICE (catalog No. AM7030; Ambion) treatment before RNA extraction.
Total cellular RNA from tissue specimens of 50 mg was extracted and purified using a Ribopure Kit (catalog No. AM1924; Ambion) according to the manufacturer’s instructions. For tissue disruption, an IKA Ultra-Turrax T8 homogenizer was used. RNA quantity was assessed using a NanoDrop 1000 spectrophotometer, and Agilent Bioanalyzer lab-on-a-chip technology (Agilent RNA 6000 Nano Kit, catalog No. 5067-1511) was used to estimate RNA integrity. An RIN cutoff value of 7 was applied. An Illumina TotalPrep RNA Amplification Kit (catalog No. AMIL1971; Ambion) was used for RNA amplification and labeling. Amplifications were carried out according to the manufacturer’s instructions using 300 ng of total RNA as a template.
We used an Illumina BeadChip platform and corresponding Human-6 Expression Whole-Genome arrays containing more than 48,000 transcript probes for the microarray gene expression analysis. Experiments were carried out according to the manufacturer’s instructions, and 1.5 µg of amplified cRNA was hybridized per single array. Slides were scanned immediately after the experiment using default settings, with the exception of the ‘factor’ setting which was set to 2.5. Illumina internal controls and BeadStudio software were used for data consistency and quality control of the hybridization data.
The expression data were quantile normalized and log transformed prior to the analysis to eliminate systematic differences between the chips, thereby standardizing the expression value distribution on all of the arrays. The aim of the analysis was to identify differentially expressed genes between NSCLC and control samples as well as within various types of cancerous tissues (e.g. lung adenocarcinoma vs. epidermoid cancer).
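To illustrate what quantile normalization does, the following is a minimal pure-Python sketch, not the software used in the study. The function names and the toy intensities are assumptions for illustration; ties are broken by probe position, which is a simplification of what established packages do.

```python
import math


def quantile_normalize(arrays):
    """Quantile-normalize equal-length arrays (one list of intensities per chip).

    Each value is replaced by the mean, across chips, of the values that
    share its rank. Ties are broken by position (a simplification).
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean intensity across chips at each rank.
    reference = [sum(s[r] for s in sorted_arrays) / len(arrays) for r in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # probe indices, ascending
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


def log2_transform(arrays):
    """Log2-transform strictly positive intensities."""
    return [[math.log2(v) for v in a] for a in arrays]


# Toy data: two hypothetical chips with four probes each.
chips = [[100.0, 400.0, 200.0, 800.0], [300.0, 50.0, 1600.0, 150.0]]
normalized = quantile_normalize(chips)
logged = log2_transform(normalized)
print(normalized[0])  # [75.0, 350.0, 175.0, 1200.0]
print(normalized[1])  # [350.0, 75.0, 1200.0, 175.0]
```

After normalization both chips contain exactly the same set of values, so any remaining difference between chips lies only in which probe holds which rank.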
Differential gene expression analysis was performed using t tests with an empirical Bayes correction from the Bioconductor Limma package. We used the Bonferroni correction for multiple testing and a significance level of α = 0.05 in all comparisons. Gene Ontology (GO) enrichments were calculated using a g:Profiler web toolkit. The statistically significant differentially expressed genes were clustered hierarchically using correlation distance and were visualized using a heatmap. In addition to the statistical parameters described previously, we used a minimum of a 2-fold change of expression to reduce the number of differentially expressed NSCLC-specific genes for further analysis.
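The selection rule described above (Bonferroni-adjusted p ≤ 0.05 combined with at least a 2-fold change) can be sketched as follows. This is an illustration under stated assumptions: the raw p values are taken as given (in the study they came from Limma's moderated t tests), and the gene names, numbers, and the function `select_genes` are hypothetical.

```python
import math


def select_genes(results, alpha=0.05, min_fold=2.0):
    """Filter genes by Bonferroni-adjusted p value and fold change.

    results maps a gene name to (raw_p_value, log2_fold_change).
    Returns a dict of gene -> (direction, adjusted_p).
    """
    n_tests = len(results)
    min_abs_log2 = math.log2(min_fold)  # a 2-fold change is |log2 FC| >= 1
    selected = {}
    for gene, (p_raw, log2_fc) in results.items():
        p_adj = min(1.0, p_raw * n_tests)  # Bonferroni adjustment
        if p_adj <= alpha and abs(log2_fc) >= min_abs_log2:
            direction = "up" if log2_fc > 0 else "down"
            selected[gene] = (direction, p_adj)
    return selected


# Hypothetical test results, not data from the study.
toy_results = {
    "gene_A": (1e-6, 1.8),   # significant and large change: kept (up)
    "gene_B": (0.02, 2.5),   # fails after Bonferroni (0.02 * 4 = 0.08)
    "gene_C": (1e-4, -1.2),  # significant and large change: kept (down)
    "gene_D": (1e-5, 0.4),   # significant but below 2-fold
}
selected = select_genes(toy_results)
print(sorted(selected))  # ['gene_A', 'gene_C']
```

With roughly 48,000 probes on the array, the same rule implies a raw p value threshold on the order of 1e-6, which is why Bonferroni is considered a conservative choice.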
To distinguish between squamous cell lung cancer and adenocarcinoma and its subtype bronchioloalveolar carcinoma of the lung, gene selection was carried out using analysis of variance. Our overall goal was to identify genes that were differentially expressed between the 2 different types of lung cancer. The genes were selected using an F test, which shows how much of the expression is determined by a given histology. The F test p values were corrected for multiple testing using a false discovery rate and an empirical Bayes procedure provided by the Limma package of R software from Bioconductor. A significance level of 0.001 was applied, which resulted in the identification of 97 genes for further analysis.
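As a rough illustration of the two ingredients named above, the sketch below computes an ordinary one-way ANOVA F statistic and a Benjamini-Hochberg false discovery rate adjustment. It is an assumption-laden simplification: it does not include Limma's empirical Bayes moderation, it does not convert F to a p value, and the choice of Benjamini-Hochberg as the FDR method is an assumption, since the text does not name the procedure. All function names and numbers are illustrative.

```python
def f_statistic(groups):
    """Ordinary one-way ANOVA F statistic for a list of groups of values."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical log2 expression of one gene in two histological groups.
f_value = f_statistic([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(f_value)  # 13.5

# Hypothetical raw p values for four genes.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print([round(p, 3) for p in adjusted])  # [0.02, 0.04, 0.04, 0.02]
```

A gene would then be retained if its adjusted p value falls below the chosen significance level, which the text gives as 0.001.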
Among the biomarkers that passed the filter, there were several recently described lung cancer-related genes, such as SERPINB3, CKS1B, PRAME, CDH2, CXCL6, TPX2, and MAGEA3 [9,10,11,12,13,14,15]. We also discovered a large number of novel NSCLC-associated biomarkers that had not been previously published as lung cancer associated, although some have reported roles in the development of other forms of cancer, including BUB1B, CRABP2, and TNFRSF18. To better visualize the results of this analysis, agglomerative hierarchical gene expression clustering was performed. Furthermore, the gene expression profiles of the control samples were included in the heat map (fig. 1).
We were able to distinguish most of the lung adenocarcinomas from the squamous cell lung cancers. Interestingly, the gene expression profile of the control samples was much more similar to lung adenocarcinomas than to squamous cell lung cancers.
GO analysis of statistically significant up- and downregulated NSCLC-associated genes (1,103 and 672, respectively; p ≤ 0.05 after Bonferroni’s correction, fold change ≥2) suggested the involvement of wide-ranging cancer-related processes, including cell proliferation, tissue signaling, tissue connections, overall metabolism, and cell cycle regulation (table 3).
GO analysis of upregulated genes showed the activation of cell cycle regulation, mitosis, cell division, and DNA metabolism, suggesting a direct link to uncontrolled cell proliferation in NSCLC. Genes associated with cell adhesion, cell differentiation, inflammation, response to wounding, and defense responses were downregulated in cancer specimens. Overall, the genes downregulated in NSCLC were involved in much more diverse processes, including anatomical structure development, wound healing, and cell adhesion.
We identified 997 statistically significant differentially expressed transcripts through our comparison of NSCLC versus control samples (326 upregulated genes and 671 downregulated genes). To visualize the expression profiles, we performed a hierarchical cluster analysis (fig. 2). According to the gene expression profiles, we identified 2 distinct cancer groups (referred to herein as group 1 and group 2). Based on their expression profiles, group 2 (blue) was more similar to the control samples and had a favorable survival prognosis. Although all of the control samples were clustered together, the different histologies were aligned randomly. According to our analysis, there was no clear association between the NSCLC gene expression profiles and NSCLC stage, smoking cessation, or patient gender.
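Hierarchical clustering with correlation distance, as used for the heat maps, can be sketched in plain Python. The linkage rule is not stated in the text, so average linkage is an assumption here, as are the function names and the toy profiles; the sketch also assumes no profile is constant, because a zero variance would make the correlation undefined.

```python
import math
from itertools import combinations


def correlation_distance(x, y):
    """1 - Pearson correlation between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - cov / (sd_x * sd_y)


def average_linkage_clusters(profiles, n_clusters=2):
    """Agglomerative clustering of named profiles down to n_clusters groups."""
    clusters = [[name] for name in profiles]

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        total = sum(correlation_distance(profiles[a], profiles[b])
                    for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


# Hypothetical expression profiles of four samples across four genes.
toy_profiles = {
    "s1": [1.0, 2.0, 3.0, 4.0],
    "s2": [2.0, 4.0, 6.0, 8.5],
    "s3": [4.0, 3.0, 2.0, 1.0],
    "s4": [8.0, 6.0, 4.5, 2.0],
}
groups = average_linkage_clusters(toy_profiles, n_clusters=2)
print(sorted(sorted(g) for g in groups))  # [['s1', 's2'], ['s3', 's4']]
```

Correlation distance groups samples by the shape of their expression profile rather than by absolute intensity, which is why s2 joins s1 even though its values are roughly twice as large.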
The analysis of differentially expressed genes revealed a large number of previously described NSCLC-associated genes (online suppl. table 1; for supplementary material, see www.karger.com?doi=10.1159/000322116) as well as several potentially novel biomarkers of the disease. The group of novel upregulated genes associated with NSCLC included SPAG5, POLQ, KIF23, RAD54L, RAB26, and ARHGEF19, as well as 4 additional previously uncharacterized open reading frames. The group of novel downregulated genes associated with NSCLC included SGCG, NLRC4, VAPA, SFTPA1B, MMRN1, SFTPD, SELPLG, and PCDH17 (table 4). The analysis of the gene expression profiles of these novel genes revealed no association with the histological classification of the NSCLC.
Sperm-associated antigen 5 (also known as SPAG5, MAP126, and DEEPEST) encodes a protein associated with the functional and dynamic regulation of mitotic spindles. It has been shown that the silencing of SPAG5 by RNAi in vitro results in growth arrest and the disruption of multipolar spindles, eventually leading to apoptosis. Recent studies have also shown that SPAG5 interacts with p53 and with kinetochores, indicating that SPAG5 may play a key role in cell cycle regulation. We found that SPAG5 is upregulated in NSCLC samples. Taken together with these previous reports, our finding suggests that SPAG5 may be a new NSCLC biomarker with a role in cell cycle regulation and mitotic spindle organization.
Polymerase theta (also known as POLH, PRO0327, DKFZp781A0112, and POLQ) encodes a member of the Y family of specialized DNA polymerases. It copies undamaged DNA with a lower fidelity than other DNA-directed polymerases. However, it accurately replicates UV-damaged DNA, and when thymine dimers are present, POLH incorporates complementary nucleotides into the newly synthesized DNA strand, thereby bypassing the lesion and suppressing the mutagenic effect of UV-induced DNA damage. POLH is thought to be involved in hypermutation during immunoglobulin class switch recombination. Mutations in POLH result in the variant type of xeroderma pigmentosum [22,23].
Although we are not aware of previous reports of POLH upregulation in cancer, one could hypothesize that there is a role for POLH in DNA metabolism, particularly under the extreme conditions present in fast-replicating cancer cells. Whether POLH upregulation in NSCLC may be induced by tobacco smoke carcinogens and whether it may be responsible for the accumulation or elimination of mutations requires further analysis. Because previous studies suggest that POLH expression could be an important determinant of the cellular response to cisplatin, we propose that POLH should be added to the cisplatin marker set.
Kinesin family member 23 (also known as KIF23, CHO1, KNSL5, MKLP1, and MKLP-1) is a member of the kinesin-like protein family, which includes microtubule-dependent molecular motors that transport organelles within cells and move chromosomes during cell division. KIF23 has been shown to cross-bridge antiparallel microtubules and drive microtubule movement in vitro. KIF23 interacts with Aurora family protein kinases, thereby regulating cell division and ensuring proper late-stage cytokinesis. Because Aurora family proteins are known to be upregulated in various cancers and KIF23 was identified as an upregulated gene in the current study, we propose that KIF23 may play a role in NSCLC development.
The RAD54-like gene (also known as RAD54L, HR54, hHR54, RAD54A, and hRAD54) belongs to the superfamily of DEAD-like helicases and is proposed to be associated with homologous recombination and dsDNA break repair. The binding of RAD54L to dsDNA induces a DNA topological change which stimulates DNA recombination. Several mutations and polymorphisms in RAD54L have been associated with an elevated cancer risk and poor survival [28,29]. However, little is known about RAD54L expression in cancer. We propose that RAD54L upregulation could be a natural cellular response to increased mutation rates in cancerous tissue.
We identified sarcoglycan gamma (also known as SGCG, DMDA, SCG3, TYPE, DAGA4, DMDA1, LGMD2C, and SCARMD2) as a gene that is downregulated in NSCLC. SGCG encodes one of several sarcolemmal transmembrane glycoproteins that interact with dystrophin, thereby providing a structural link between the subsarcolemmal cytoskeleton and the extracellular matrix of muscle cells. Defects in the SGCG-encoded protein and downregulation of the SGCG gene can lead to the early onset of autosomal recessive muscular dystrophy, particularly limb-girdle muscular dystrophy type 2C. Although we did not find any evidence in the literature of an association between SGCG and cancer, it is well known that both the extracellular matrix and the cytoskeleton are modified during cancer progression and metastasis. The downregulation of SGCG and its association with NSCLC progression and metastatic potential require follow-up studies.
NLR family CARD domain-containing 4 (also known as NLRC4, CLAN, IPAF, and CARD12) is a gene acting between TP53 and caspase I that regulates cell cycle arrest, apoptosis, and DNA repair. NLRC4 is also proposed to induce apoptosis via NF-κB signaling pathways. NLRC4 has a caspase recruitment domain (CARD) through which it assembles into apoptosis and NF-κB signaling complexes. The NLRC4 downregulation found in the current study should be monitored in subsequent studies as it is involved in several ways in the process of cancer initiation and progression.
Multimerin 1 (MMRN1), which we identified as a downregulated gene in NSCLC, is a massive, soluble protein found in platelets and in the endothelium of blood vessels. Multimerin is a factor V/Va-binding protein and may function as a carrier protein for platelet factor V. Previous studies have demonstrated that MMRN1 interacts with factor V as well as with activated factor V with a high affinity, thereby inhibiting thrombin generation. It has been suggested that MMRN1 could function as an adhesive ligand that promotes platelet adhesion at sites of vascular injury. The downregulation of MMRN1 may contribute to leakage and the poor repair of blood vessels in cancer tissue, thereby allowing access of oxygen and nutrients to the cancer cell.
Surfactant protein D (also known as SFTPD and SP-D) is a member of the collagenous subfamily of calcium-dependent lectins (collectins). SFTPD is expressed in coronary artery smooth muscle cells as well as in pulmonary alveolar type II cells where it modifies and modulates the inflammatory processes and host defense against pathogens [36,37]. Recently, a study of surfactant protein D expression in a bronchioloalveolar lavage from a smoker revealed an association between reduced SFTPD expression and the progression of bronchial dysplasia. From these limited but intriguing findings, we hypothesize that a decreased expression of SFTPD could be one of the early events leading to uncontrolled inflammation and cancer development.
These novel genes warrant further study in order to determine their NSCLC biomarker status, and to investigate their role in the initiation and progression of NSCLC.
As one of the key questions of the molecular diagnostics of cancer is survival prediction, we performed an analysis in which Kaplan-Meier diagrams were drawn based on gene expression profiles (group 1 and group 2) and histology. Although the time between the initial surgical resection and the survival analysis was limited to less than 7 years and the results of the analysis did not achieve statistical significance (p = 0.0691 for expression-based groups and p = 0.0198 for histology-based groups), we were able to show enhanced predictive p values in a group that was selected based on gene expression profiles (fig. 3).
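For readers unfamiliar with how a Kaplan-Meier curve is built, the sketch below implements the product-limit estimator in plain Python. The follow-up times and event flags are invented for illustration and are not study data; the function name is likewise an assumption.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate.

    times: follow-up durations (e.g. months since surgery).
    events: 1 if death was observed at that time, 0 if the patient was censored.
    Returns a list of (time, survival_probability) at each observed death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = sum(1 for tt, e in data if tt == t and e == 1)
        leaving = sum(1 for tt, _ in data if tt == t)  # deaths plus censored at t
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
        i += leaving
    return curve


# Hypothetical follow-up data for six patients (months, event flag).
toy_times = [6, 12, 12, 20, 30, 44]
toy_events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(toy_times, toy_events)
for t, s in curve:
    print(t, round(s, 4))
```

Censored patients do not lower the curve; they only reduce the number at risk for later time points, which is how the method handles patients who were still alive at the end of a limited follow-up period.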
To investigate the hypothesis that RNA degradation in NSCLC specimens is associated with the prognosis of the disease, we performed a survival analysis. We did not identify a statistically significant association between NSCLC survival and RNA integrity status when comparing the lung cancer types as a whole. However, we did identify an association between RNA degradation and survival among patients with lung adenocarcinoma. The comparison of the survival of groups representing the intact RNA in the cancer sample and the degraded RNA showed a statistically significant distinction (p = 0.0474) indicating a much better survival in adenocarcinoma patients with intact RNA in cancer tissues (fig. 4).
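The text does not state which test produced the survival p values. A common choice for comparing two Kaplan-Meier curves is the log-rank test, sketched here in plain Python purely as an assumption about the kind of comparison involved. The function name and the toy data are hypothetical, and the sketch assumes at least one informative event time so that the variance is non-zero.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi_square, p_value) with 1 df."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the number of deaths in group A.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Hypothetical groups: "degraded" RNA with early deaths, "intact" with late ones.
degraded_times, degraded_events = [2, 3, 4, 5, 6], [1, 1, 1, 1, 1]
intact_times, intact_events = [20, 25, 30, 35, 40], [1, 1, 1, 1, 1]
chi_square, p_value = logrank_test(degraded_times, degraded_events,
                                   intact_times, intact_events)
print(round(chi_square, 2), p_value < 0.05)
```

With small groups, such as the adenocarcinoma subset here, the chi-square approximation is coarse, so p values close to 0.05 should be read with caution.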
Although the prevalence and mortality of NSCLC are among the highest worldwide and the disease has been studied extensively, there is still a need for more accurate prognostic and diagnostic markers as well as for information about the underlying molecular events in the disease’s development. In the current gene expression study, we performed several analyses of NSCLC, including GO, clustering, new biomarker discovery, a comparison of survival based on histology and gene expression profiles, and analysis of RNA degradation status and patient survival.
In the cluster analyses of different histological subtypes of NSCLC, we could clearly distinguish adenocarcinoma and squamous cell cancer profiles. According to our results, the adenocarcinoma profile is more similar to the control than the squamous cell carcinoma profile. This phenomenon could be explained by the more prevalent cells of alveolar origin in the control tissue samples. In addition, there is evidence that lung adenocarcinoma patients have a better predicted survival than epidermoid lung cancer patients. Although there are reports of different molecular profiles of adenocarcinomas [40,41], we were not able to identify different adenocarcinoma profiles in our cohort. This might be due to the limited number of adenocarcinoma samples in our study group.
GO analysis of statistically significant upregulated genes in NSCLC uncovered biological processes involved in mitosis and cell cycle regulation, whereas GO analysis of statistically significant downregulated genes in NSCLC uncovered genes involved in much more diverse processes, such as anatomical structure development, wound healing, and cell adhesion.
The cluster analyses of statistically relevant deregulated genes between the NSCLC and control samples revealed 2 molecularly distinct patient groups with mixed histology, stage, and smoking history. The analyses of Kaplan-Meier diagrams based on gene expression profiles (group 1 and group 2) and histology showed enhanced predictive values (p = 0.0198) for gene expression-based comparison.
During the initial phase of the study, we eliminated a substantial number of cancer samples due to RNA degradation. Therefore, we also analyzed the relationship between survival in different histological groups and RNA status. Although we did not identify a statistically significant association between RNA integrity status and lung cancer in general, we did identify a statistically significant association in lung adenocarcinoma, showing that high levels of RNA degradation in patient samples were associated with decreased patient survival.
The novel NSCLC biomarkers brought to light in the current study and their applicability as therapy targets will require further validation with independent datasets in future studies.
This study was supported by Targeted Financing from the Estonian Ministry of Education and Research (SF0180142s08), Estonian Science Foundation grant ETF6465, EU FP7 grant ECOGENE (No. 205419, EBC), and by the EU via a European Regional Development Fund grant to the Centre of Excellence in Genomics, the Estonian Biocentre, and the University of Tartu.
The authors declare no conflicts of interest.
Open Access License: This is an Open Access article licensed under the terms of the Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC) (www.karger.com/OA-license), applicable to the online version of the article only. Distribution permitted for non-commercial purposes only.