Human Papillomaviruses: Genetic Basis of Carcinogenicity
Burk R.D.a, b · Chen Z.b · Van Doorslaer K.b
a Department of Pediatrics, Division of Genetics, and Departments of Epidemiology and Population Health, Obstetrics, Gynecology and Women’s Health, and Albert Einstein Cancer Center, and b Department of Microbiology and Immunology, Albert Einstein College of Medicine, New York, N.Y., USA
Persistent infection by specific oncogenic human papillomaviruses (HPVs) is established as the necessary cause of cervix cancer. DNA sequence differences between HPV genomes determine whether an HPV has the potential to cause cancer. Of the more than 100 HPV genotypes characterized at the genetic level, at least 15 are associated, to varying degrees, with cervical cancer. Classification based on nucleotide similarity places nearly all HPVs that infect the cervicovaginal area within the α-PV genus. Within this genus, phylogenetic trees inferred from the entire viral genome cluster all cancer-causing types together, suggesting the existence of a common ancestor for the oncogenic HPVs. However, in separate trees built from the early open reading frames (ORFs; i.e. E1, E2, E6, E7) or the late ORFs (i.e. L1, L2), the carcinogenic potential sorts with the early region of the genome, but not the late region. Thus, genetic differences within the early region specify the pathogenic potential of α-HPV infections. Since the HPV genomes are monophyletic and sites are highly correlated across the genome, diagnosis of oncogenic types and non-oncogenic types can be accomplished using any region across the genome. Here we review our current understanding of the evolutionary history of the oncogenic HPVs. In particular, we focus on the importance of viral genome heterogeneity and discuss the genetic basis for the oncogenic phenotype in some but not all α-PVs.
© 2009 S. Karger AG, Basel
Although an infectious cause of genital warts was suspected in ancient times, interest in ‘wart virus’ research was only galvanized by the suggestion that human papillomavirus (HPV) was the long-sought sexually transmitted etiological agent of cervical cancer. In 2008, Prof. Harald zur Hausen was awarded the Nobel Prize for this innovative idea and for demonstrating HPV genomes in cervical cancer tissues [2, 3]. The confluence of idea and technology was enabled by recombinant DNA methods, the cloning of HPV genomes and the use of molecular hybridization. This quantum advance was critical, since standard virologic methods such as serology were not readily available for HPV molecular epidemiological investigations. The free and widespread distribution of cloned HPV genomes by the Heidelberg group and the commencement of an annual international papillomavirus conference accelerated discovery and fostered a collaborative culture within the PV scientific community. Breakthroughs in understanding the molecular pathogenesis have revolutionized, and continue to revolutionize, the screening, diagnosis, treatment and prevention of HPV-associated diseases. From a public health viewpoint, HPV has become the model for molecular medicine and for how technology can be readily applied to global health problems.
Cervical cancer accounts for nearly 12% of all female cancers and is the second most prevalent malignancy in women worldwide [5, 6]. Recent molecular-epidemiologic studies have demonstrated the strong association between persistent infection by specific oncogenic type (OT) HPVs and precancerous and cancerous lesions of the cervix. Thus, differences in pathogenesis of HPV OTs compared to non-oncogenic types (NOTs) are based on DNA sequence changes that have emerged over millions of years of evolution. The key to understanding the molecular basis of OT HPV carcinogenicity is realizing that the biological driving force has been evolution of specific HPVs into discrete host ecosystems, such as the epithelium of the cervix, vagina, external genital skin or skin covering other anatomic surfaces. Each bodily ecosystem has characteristics that give adapted HPVs some type of competitive advantage to infect, replicate and transmit. Nonetheless, there is overlap in the spectrum of HPV types found in each ecosystem, and HPV-16 stands out as having the most pleiotropic pathogenic phenotype (e.g. HPV-16 causes both cervix and oropharyngeal cancers).
HPVs are a heterogeneous group of viruses with circular double-stranded DNA genomes about 8,000 nucleotides in size. Figure 1 displays the characteristic HPV genome organization using the most medically important HPV-16 as the model. All human papillomavirus genomes include 3 general regions: (1) an upstream regulatory region (URR), which contains sequences that control viral transcription and replication; (2) an early region, which contains open reading frames (ORFs; e.g. E1, E2, E4, E5, E6 and E7) involved in multiple functions including trans-activation of transcription, transformation, replication, and viral adaptation to different cellular milieus, and (3) a late region, which codes for the L1 and L2 capsid proteins which form the structure of the virion and facilitate viral DNA packaging and maturation. All PVs described to date contain an E1, E2, E4, L1, L2 and some E6/E7-like functions.
|Fig. 1. The structure and organization of an HPV genome (adapted from ). A schematic picture of a representative α-HPV genome is shown. The reference HPV-16 genome is a circular, double-stranded 7,908 bp molecule. The 3 major regions of the genome contain the upstream regulatory region (URR), the early (E) gene region, and the late (L) gene region. Also shown is the region of the genome conserved in most cancers with integrated HPV genomes.|
Because of the lack of an in vitro culturing system or a xenotropic host model for HPV growth, a robust serologic classification system is not possible. Thus, HPVs have always been classified based on their DNA sequence heterogeneity. Prior to the advent of readily available DNA sequencing methods, DNA annealing studies (cot curves) were used to differentiate unique HPV types. Because the L1 ORF is highly conserved within the viral genome, similarity across this ORF was adopted as the basis for classification of HPVs at the ‘Nomenclature of Papillomaviruses’ meeting held at the 14th International Papillomavirus Conference in Quebec, July 1995. Historically, the distinction between subtypes and variants of an HPV type was based on different restriction endonuclease cleavage patterns without clear quantitative differences. A discussion of types, subtypes and variants is presented below based on current understanding of HPV genetic heterogeneity.
A distinct HPV ‘type’ is assigned when the entire genome is cloned and the complete nucleotide sequence of its L1 gene differs from that of any other type by at least 10%. Heidelberg has served as the reference center for all HPVs, confirming the sequence of submitted novel types and designating official type status. More than 100 types of HPV have been identified, of which approximately 60 infect the genital tract. However, many potential novel HPV types were initially identified through the sequencing of small DNA fragments amplified from clinical samples and were given clinical or temporary names. This has created some confusion in the field when different or older studies are compared. Table 1 presents a summary of recent, officially designated HPV types with their aliases.
|Table 1. Previous names for α (mucosal/genital) HPV types|
Based on L1 nucleotide sequences, phylogenetic analyses have clustered nearly all HPV types isolated from cervicovaginal tissues into the α-PV genus, with further grouping into species, α1–α15. HPV types that cluster together in a species tend to share common characteristics, such as tissue tropism and oncogenic potential. To investigate the evolutionary history of the HPVs, a phylogenetic tree based on the concatenated nucleotide and amino acid sequences of 6 ORFs (E6, E7, E1, E2, L2 and L1) was generated from all the available α-PV sequences (fig. 2). Based on this tree, at least 3 ancestral PVs are responsible for the current heterogeneous group of genital HPV genomes [10, 11]. The tree separates the α-HPV genus into 3 major groups: (1) a low-risk 1 (LR1) or non-oncogenic type 1 (NOT1) group (α1, α8, α10 and α13); (2) an LR2/NOT2 group (α2, α3, α4 and α15), and (3) a high-risk (HR) or OT group (α5, α6, α7, α9 and α11). HPV types within the LR1/NOT1 group have been more commonly found in benign genital warts and oral/laryngeal lesions, whereas types within the LR2/NOT2 group have been preferentially identified in samples of benign vaginal exfoliated cells. Interestingly, established carcinogenic types (as defined by the International Agency for Research on Cancer case-control data; α5, α6, α7 and α9) are derived from a common ancestor in trees built from the full genome or the early region, revealing a strong concordance of HPV evolutionary grouping with risks of persistence and progression to cervical intra-epithelial neoplasia (CIN) grade 3/cancer [10, 13, 14].
|Fig. 2. Phylogenetic tree of the mucosal/genital α-HPVs. A ‘total evidence’ phylogenetic tree was inferred from maximum parsimony, neighbor joining and Bayesian methods (for details see ). The tree shown is from the Bayesian analysis inferred from alignment of protein and nucleotide sequences of 6 concatenated ORFs (E6, E7, E1, E2, L2 and L1). Bovine PV type 1 was used as the out-group taxon. The numbers to the right represent the species group (e.g. ‘α9’ contains HPVs 16, 31, 35, 58, 33, 67 and 52). At least 3 ancestral papillomaviruses are responsible for the current heterogeneous groups of genital HPV genomes including LR1/NOT1 (α10, α8, α1 and α13), LR2/NOT2 (α2, α3, α4, α15) and HR/OT (α5, α6, α7, α9 and α11). The latter, joined by bold lines, represents the clade that contains all known HPV types associated with cervix cancer. HR = High-risk; OT = oncogenic type; LR = low-risk; NOT = non-oncogenic type.|
Although the L1-based classification of (genital) HPVs correctly groups HPV types into species, the overall topology of the L1 tree is not identical to that of the α-HPV tree built from complete genome sequences. This represents incongruence between phylogenetic trees inferred from different ORFs/regions [11, 15]. When phylogenetic trees are based solely on the early genes or the whole genome, all OT HPVs (α5, α6, α7, α9 and α11) cluster into 1 monophyletic group as shown in figure 3 (tree on the left side). However, in a tree inferred from the late genes (e.g. using the L1 ORF) the α9 and α11 species no longer sort with the α5, α6 and α7 species (fig. 3, tree on the right side). Nevertheless, the clear implication is that the tree generated with the early region sorts the α-HPVs associated with cancer into a biologically coherent group. This provides a clue to which domain is associated with pathogenicity and suggests that the genetic basis of oncogenicity resides in the early region. Furthermore, recent analyses have shown phylogenetic incongruence between the early and the late genes for HPV types within the same species (Chen et al., unpubl. data), favoring niche adaptation over recombination as the driving force. No evidence of recent recombination between α-HPVs has been reported.
|Fig. 3. Trees from early and late genes show phylogenetic incongruence. Phylogenetic trees were inferred using Bayesian methods . The early gene tree (left) was calculated from E1, E2, E6 and E7 concatenated nucleotide alignments, while the late gene tree (right) was derived from combined L1 and L2 nucleotide sequence data. The α-papillomavirus group designations are shown on their respective leaf branches adjacent to the name of the HPV type as shown in the center. All types within the HR/OT clade are shown and representative viruses were chosen from each of the other α-HPV species groups.|
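The incongruence between the early and late gene trees can be tested mechanically by asking whether a given set of taxa forms a single clade in each tree. The following is a minimal Python sketch of such a monophyly check, assuming rooted trees written as nested tuples. The two toy topologies and the names `early_tree`, `late_tree` and `OT_SPECIES` are hypothetical, constructed only to mirror the statement that the OT species cluster together in the early gene tree but not in the late gene tree; they are not the published trees.

```python
def leaves(tree):
    """Return the set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    names = set()
    for child in tree:
        names |= leaves(child)
    return names


def is_monophyletic(tree, group):
    """Return True if some subtree contains exactly the taxa in `group`."""
    group = set(group)
    if leaves(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


# Species in the HR/OT group, written "a5" for alpha-5 and so on.
OT_SPECIES = {"a5", "a6", "a7", "a9", "a11"}

# Hypothetical toy topologies for illustration only (not the published trees).
early_tree = ((("a5", "a6"), ("a7", ("a9", "a11"))),
              (("a1", "a10"), ("a2", "a3")))
late_tree = ((("a5", "a6"), "a7"),
             ((("a9", "a11"), ("a1", "a10")), ("a2", "a3")))

print(is_monophyletic(early_tree, OT_SPECIES))  # OT species form one clade
print(is_monophyletic(late_tree, OT_SPECIES))   # OT species are split
```

With real data, the same question would normally be put to trees produced by a phylogenetics package rather than to hand-written tuples.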
Clinically, HPV types have been isolated from virtually all cervical cancers; nevertheless, the vast majority of papillomavirus types are NOTs that merely induce benign tumors or cause no detectable lesion. HPV types that are found preferentially in cervical and other anogenital cancers have been designated as HR or OTs. Epidemiologic data from a wide range of countries and ethnic groups have identified at least 15 HPV OTs (HPVs 16, 18, 31, 33, 35, 39, 45, 51, 52, 56, 58, 59, 68, 73 and 82), while another 2 types (HPVs 26 and 66) should be considered probable HR types; 13 HPVs are classified as NOTs or LR types based empirically on case-control studies (HPVs 6, 11, 40, 42, 43, 44, 53, 54, 61, 70, 72, 81 and 89) [13, 14, 16, 17]. It should be noted that there is some controversy about the oncogenic potential of some of these latter types that cluster with the OTs. Although a few are extremely common (e.g. HPV-53), they are essentially never detected as single types in well-isolated cervical cancer tissues. Lastly, there are types within the OT clade that are so rare or so poorly amplified by the existing PCR systems used in epidemiological studies that their biological behavior remains unknown (HPVs 30, 34, 67, 69, 85, 97).
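The epidemiologic categories listed above lend themselves to a simple lookup when genotyping results are tabulated. The sketch below encodes only the type lists given in this paragraph; the function names `risk_category` and `summarize`, the category labels, and the choice to report every unlisted type as "undetermined" are assumptions made for illustration.

```python
# Type lists as given in the text above.
ONCOGENIC = {16, 18, 31, 33, 35, 39, 45, 51, 52, 56, 58, 59, 68, 73, 82}
PROBABLE_HR = {26, 66}
NON_ONCOGENIC = {6, 11, 40, 42, 43, 44, 53, 54, 61, 70, 72, 81, 89}


def risk_category(hpv_type):
    """Map an HPV type number to its epidemiologic risk category."""
    if hpv_type in ONCOGENIC:
        return "OT"
    if hpv_type in PROBABLE_HR:
        return "probable HR"
    if hpv_type in NON_ONCOGENIC:
        return "NOT"
    # Assumption: any type not listed in the text is left unclassified.
    return "undetermined"


def summarize(detected_types):
    """Group the types detected in one sample by risk category."""
    groups = {}
    for hpv_type in sorted(set(detected_types)):
        groups.setdefault(risk_category(hpv_type), []).append(hpv_type)
    return groups


print(summarize([16, 6, 53, 67]))
```

A sample carrying several types would then be reported with each type under its own category, which keeps common NOT co-infections such as HPV-53 visibly separate from the OT result.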
HPV-16 and HPV-18 account for approximately 50 and 20% of cervical cancer cases worldwide, respectively. Interestingly, the distributions of HPVs 16 and 18 detected in cervical adenocarcinomas and squamous cell carcinomas differ. HPV-16 is the type most frequently involved in the development of squamous cell carcinoma of the cervix (>50% of these cancers are HPV-16 positive), whereas both HPV-16 and HPV-18 play a prominent role in the development of adenocarcinomas of the cervix [18,19,20]. This suggests different viral genetic determinants for these 2 histological subgroups of cervical cancer. In addition to persistent infection with HPV OTs, host factors such as multiparity, smoking status, HLA type and gene polymorphisms, particularly in antigen-processing proteins, have been suggested as independent cofactors for high-grade CIN and cancer [14, 21].
The term ‘subtype’ was categorically defined for convenience as a PV genome whose L1 nucleotide sequence shares between 90 and 98% sequence similarity with its closest neighbor. To date, 5 genital α-HPV types have variants meeting the definition of subtypes: HPVs 34, 44, 54, 68 and 82 (see table 1 for alias names). Some of these subtypes differ from their closest relative by as much as 6–8% [22,23,24]. At this point, it is unclear whether subtype is a useful designation to describe type variants that differ by more than 2%. The category of subtype should probably be considered part of the variant spectrum, since to date no important biological differences have been described for subtypes. Nevertheless, it seems clear that subtypes represent a continuum of viral evolution and the beginnings of type speciation.
Viral isolates are referred to as ‘variants’ when the nucleotide sequences of their L1 genes usually differ by less than 2%. Sequencing of the URR has frequently been used to classify intra-typic diversity and variant lineages, because of increased sequence differences in this domain of the HPV genome. Alternatively, the early-late intergenic non-coding regions (i.e. between the E2 and E5 and the E5 and L2 ORFs) also provide segments of the viral genome showing increased genetic variability and make excellent targets for identifying variant lineages [25, 26].
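The L1-based thresholds given above for types, subtypes and variants can be summarized as a small decision rule. This is a minimal sketch, assuming the input is the percent nucleotide difference between the L1 ORF of an isolate and that of its closest known relative. The function name `classify_l1_difference` and the handling of the exact boundary values (10% counted as a distinct type, 2% as a subtype) are assumptions, and the additional requirement that a new type be cloned as a complete genome is not modeled.

```python
def classify_l1_difference(percent_difference):
    """Classify an isolate by L1 nucleotide difference from its closest relative.

    Thresholds follow the text: at least 10% for a distinct type,
    2 to 10% (90 to 98% similarity) for a subtype, under 2% for a variant.
    Boundary handling is an assumption of this sketch.
    """
    if not 0.0 <= percent_difference <= 100.0:
        raise ValueError("percent_difference must be between 0 and 100")
    if percent_difference >= 10.0:
        return "distinct type"
    if percent_difference >= 2.0:
        return "subtype"
    return "variant"


for diff in (12.5, 7.0, 0.9):
    print(diff, classify_l1_difference(diff))
```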
Variant HPV genomes of a specific type often cluster according to the human ethnic group or specific part of the world from which they were isolated. This has been shown elegantly for HPV-16 and HPV-18 [27, 28]. The phylogeny is reflective of the spread of Homo sapiens across the globe and suggests that HPV variant lineages have diverged at least in part through genetic drift. It is noteworthy that intra-typic evolutionary trees of HPV-16 and HPV-18 variants were initially inferred from partial URR sequences. Currently, there is no consensus on naming variant lineages. Figure 4 shows a phylogeny of HPV-16 variants inferred from complete genomes, with graphic representation of the difference between the aligned complete genomes and the L1 ORFs (fig. 4b). The viral genome or L1 ORF for each isolate is compared to each of the other isolates and the differences are plotted. An isolate compared to itself always results in 0% difference. For example, the HPV-16 non-European lineages [Af-1 (African 1), Af-2 and As-Am (Asian-American)] differ by 0.3–0.9% within their L1 ORF, whereas the differences increased to 1.1–1.5% when whole genomes were used (p < 0.001). Similarly, the differences between HPV-18 non-Af and Af variants increased with the use of sequence data from whole genomes. However, the previously termed E (European) and AA (Asian-American) HPV-18 variants are more similar to each other, even based on the whole genome, and should probably be re-classified to constitute a single group or clade. Whereas the use of the URR is probably sufficient to classify variants within a lineage, more recent data suggest that maximum sequence information is required for the taxonomic classification of viral lineages. In addition, analyses of whole genomes allow for unprecedented precision in detailing sequence-level changes that are of potential evolutionary importance, as indicated by investigation of Darwinian selection on the HPV-16 E5 and E6 ORFs [25, 29].
|Fig. 4. Phylogeny and sequence dissimilarity plots of HPV-16 variants. a A phylogenetic tree indicating the intratypic relationships of HPV-16. b The nucleotide sequence dissimilarities of HPV-16 variants inferred from the complete genomes and the L1 ORF. The tree was inferred from the concatenated amino acid and nucleotide sequences of 8 ORFs (E1, E2, E4, E5, E6, E7, L1 and L2). For the dissimilarity plots, each HPV-16 variant sequence was analyzed against all other sequences and plotted. The open circles represent HPV-16 European (E) variants; triangles represent non-European variants consisting of African-1 (Af-1), African-2 (Af-2) or Asian-American (As-Am). The relationship of the European HPV-16 variants is indicated with a solid line and the non-European HPV-16 variants with a broken line. The European HPV-16 variants show less divergence as a group than the non-European HPV-16 variants. Each variant is 0% different from itself.|
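The dissimilarity plots described above rest on an all-against-all comparison of aligned sequences. The following minimal sketch shows that calculation, assuming the sequences have already been aligned to equal length. The function names, the decision to skip alignment columns containing a gap, and the short toy sequences are all hypothetical and chosen for illustration; they are not HPV-16 data and do not reproduce the published analysis.

```python
from itertools import combinations


def percent_difference(seq_a, seq_b):
    """Percent of compared alignment columns at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: columns with a gap are not counted
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * mismatches / compared


def pairwise_dissimilarity(alignment):
    """Return {(name_a, name_b): percent difference} for every pair of isolates."""
    return {
        (a, b): percent_difference(alignment[a], alignment[b])
        for a, b in combinations(sorted(alignment), 2)
    }


# Made-up 20-nucleotide sequences, for illustration only.
toy_alignment = {
    "isolate_1": "ATGCATGCATGCATGCATGC",
    "isolate_2": "ATGCATGCATGAATGCATGC",
    "isolate_3": "ATGCTTGCATGAATGCATGC",
}

for pair, diff in pairwise_dissimilarity(toy_alignment).items():
    print(pair, diff)
```

Running the same function once on whole-genome alignments and once on the L1 ORF alone would give the two sets of values that a plot such as figure 4b contrasts.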
Characterization of HPV variants for other types is slowly accumulating. Comparison of isolates of HPVs 6 and 11, HPVs 2, 27 and 57, HPVs 44 and 68, HPVs 53, 56 and 66, HPVs 31, 35, 52 and 58 [33, 34], and some less prevalent HPV types has confirmed that most types have limited degrees of genomic diversity consistent with a similar time frame of evolution. Of particular interest is the association between specific HPV types and sequence variants and their potential impact on viral fitness and pathogenicity. Without easily manipulated vegetative viral infectivity systems, epidemiological data on incidence, prevalence and persistence can serve as outcome measures of viral fitness.
The observation that the E6 and E7 genes inactivate the p53 and pRb tumor suppressors, respectively, has traditionally been used to explain the differences between HPV OTs and NOTs. HPV-16 E6 proteins have been reported to efficiently degrade p53 by forming a complex with E6-AP [36, 37]. This results in loss of p53-induced growth arrest and apoptosis in response to DNA damage [38, 39]. In addition, several p53-independent pathways (e.g. the activation of telomerase, the postulated inhibition of degradation of SRC-family kinases and the binding to, and degradation of, cellular proteins with a PDZ domain) are likely to play additional roles in HPV-induced malignant transformation. HPV E7 interacts with and degrades the retinoblastoma tumor suppressor protein pRb and the related ‘pocket proteins’ p107 and p130 through a conserved LXCXE motif. This in turn releases the transcription factor E2F from pRb inhibition and upregulates p16INK4A [43, 44]. It has long been suggested that only OT HPV E7 proteins could target pRb; however, it was recently shown that the NOT HPV-11 E7 was equally successful at degrading pRb. In addition, HPV-16 E7 induces transformation of rodent fibroblast cell lines, cooperates with the ras oncogene to transform primary rodent fibroblasts, and in combination with HR E6 can immortalize human keratinocytes. HPV E5 is another oncoprotein that stimulates the transforming activity of the epidermal growth factor receptor. However, E5 seems to be important in the early course of infection. It is unclear what role the remaining HPV early genes (E1, E2 and E4) play during the process of malignant transformation. The 2 structural proteins (L1 and L2) are generally not expressed in precancerous and malignant cells, whereas virus-like particles generated from expression of the L1 ORF induce neutralizing antibodies and are the basis for the currently licensed vaccines.
Further analyses are required to determine which functions, whether based on quantitative or qualitative differences in gene function, underlie the molecular basis of cancer formation by some but not all HPV types. Many molecular studies investigating the transforming function of HPVs 16 and 18, in particular, have compared the activity of the early ORFs to those of HPV-11, a non-oncogenic type. However, given the phylogenetic differences between HPV OTs and NOTs (fig. 2) and their different tissue distributions/tropisms, molecular studies may be identifying functions more correlated with evolution than with actual carcinogenicity (R. Burk, pers. commun.).
Progression to cervical cancer occurs in only a small percentage of HPV-infected women, since the majority of infected women clear their cervicovaginal infection [51,52,53,54]. A number of cohort studies have shown that incident cervicovaginal HPV is predominantly a self-limited disease with a median duration of approximately 8–12 months. One possibility is that specific types of HPV are more persistent than others and that the ability to persist is genetically encoded and tied to pathogenicity. To examine whether persistence was the key factor in the pathogenicity of HPV types, a random population-based sample of approximately 10,000 women in Guanacaste, Costa Rica was followed for 5–7 years [10, 55]. HPV was detected by a sensitive PCR assay and positive samples were genotyped for over 40 individual HPV types. The prospective outcomes through the 5–7-year follow-up were tabulated for groups of women infected with specific types of HPV at baseline (i.e. prevalent infection; fig. 5a). There was no striking difference in HPV prevalence by risk category. HPV-16 was more likely to persist than other types (fig. 5b), as previously reported. However, given persistence, HPV-16 was not much more likely to be associated with high-grade CIN than other OTs (e.g. HPVs 18 and 52). Thus, one aspect of HPV-16 oncogenicity is related to its ability to better persist. When comparing other OTs to NOTs, persistence was not the critical difference (fig. 5b). Several NOTs (e.g. HPVs 54, 61 and 71) persisted just as well as OTs other than HPV-16 (e.g. HPVs 18, 31 and 58). But, given persistence, LR types did not progress to high-grade CIN (fig. 5c).
|Fig. 5. Prevalence (a), persistence (b) and progression (c) in a population-based study. Approximately 10,000 women representing a random population of women from Guanacaste, Costa Rica were followed for 5–7 years in a natural history study of HPV and cervix neoplasia as described [10, 55]. a Prevalence of HPV types at the baseline visit. b Percent persistence of an HPV type detected at baseline. c Percent of persistent infections progressing to CIN3/cancer. HPV types are divided into 3 categories: oncogenic, non-oncogenic and questionable. Since the study was a true population-based cohort, prevalence is shown as the number of infections, which is proportional to cross-sectional prevalence. The 3 panels demonstrate that oncogenic, non-oncogenic and other types have similar rates of prevalence and persistence, but only oncogenic types, given persistence, progress to CIN3/cancer. These data provide evidence that it is not simply persistence that determines pathogenicity, but persistence with an oncogenic HPV that is critical.|
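The distinction drawn here between persistence and progression given persistence is a distinction between two conditional proportions with different denominators. The sketch below makes the arithmetic explicit. The `TypeCounts` structure, the function names and all counts are hypothetical and are not data from the Guanacaste cohort; they only illustrate how two types can persist equally often yet differ completely in progression.

```python
from dataclasses import dataclass


@dataclass
class TypeCounts:
    baseline: int    # women with the type detected at baseline
    persisted: int   # of those, infections still detected at follow-up
    progressed: int  # of the persistent infections, progressed to CIN3/cancer


def persistence_rate(counts):
    """Percent of baseline infections that persisted."""
    if counts.baseline == 0:
        raise ValueError("no baseline infections")
    return 100.0 * counts.persisted / counts.baseline


def progression_given_persistence(counts):
    """Percent of persistent infections that progressed to CIN3/cancer."""
    if counts.persisted == 0:
        raise ValueError("no persistent infections")
    return 100.0 * counts.progressed / counts.persisted


# Hypothetical counts for illustration only.
oncogenic_like = TypeCounts(baseline=200, persisted=40, progressed=10)
non_oncogenic_like = TypeCounts(baseline=200, persisted=40, progressed=0)

for label, counts in [("oncogenic-like", oncogenic_like),
                      ("non-oncogenic-like", non_oncogenic_like)]:
    print(label, persistence_rate(counts), progression_given_persistence(counts))
```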
These data demonstrate that the genome of HPV-16 is unique in its viral fitness as measured by its high prevalence, and it is especially pathogenic given its increased rate of persistence and ability to promote progression of infections to high-grade lesions. Nevertheless, other oncogenic types have significant oncogenic potential given persistence. Understanding the viral genetic determinants of persistence and progression should lead to a better understanding of the mechanisms of pathogenesis. Perhaps there will be common mechanisms of persistence utilized by both OTs and NOTs, since they persist with similar efficiencies, with HPV-16 having some unique features. In summary, it is persistence of an oncogenic type that is the key risk factor for high-grade disease and cancer. Thus, even for a virus with a small genome, the determinants of oncogenicity are complex.
HPVs are a major cause of morbidity and mortality. Characterization and classification of the large group of HPV types contributing to disease has provided important molecular tools for the medical community, resulting in novel diagnostic, screening and prevention strategies. Current studies demonstrate a viral genetic basis of pathogenicity derived from evolution of a common ancestor of all oncogenic HPV types. Nevertheless, understanding the exact genetic basis of HPV oncogenicity is highly complex and will require innovative analytic methods. Study of α-HPV genomics can serve as a model for non-recombinant genome evolution, genetic determinants of pathogenicity and application of genomics for therapeutics.
We acknowledge the members of the Burk lab who have contributed over the years to the research efforts of the group, and the contributions of all our colleagues within the Papillomavirus field. This work was supported in part by Public Health Service award CA78527 from the National Cancer Institute (R.D.B.) and center grants to the Einstein Cancer Research Center and the Center for AIDS Research (CFAR).
Robert D. Burk, MD
Albert Einstein College of Medicine
1300 Morris Park Avenue, Bronx
New York, NY 10461 (USA)
Tel. +1 718 430 3720, Fax +1 718 430 8975, E-Mail email@example.com
Published online: August 11, 2009
Number of Print Pages : 10
Number of Figures : 5, Number of Tables : 1, Number of References : 57
Public Health Genomics
Vol. 12, No. 5-6, Year 2009 (Cover Date: August 2009)