A High-Resolution 15,000Rad Radiation Hybrid Panel for the Domestic Cat
Bach L.H.a · Gandolfi B.a · Grahn J.C.a · Millon L.V.b · Kent M.S.c · Narfstrom K.d · Cole S.A.e · Mullikin J.C.f · Grahn R.A.a · Lyons L.A.a
aPopulation Health and Reproduction, bVeterinary Genetics Laboratory, and cSurgical and Radiological Sciences, School of Veterinary Medicine, University of California – Davis, Davis, Calif., dVeterinary Medicine and Surgery, College of Veterinary Medicine, University of Missouri-Columbia, Columbia, Mo., eDepartment of Genetics, Texas Biomedical Research Institute, San Antonio, Tex., and fComparative Genomics Unit, Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Md., USA
The current genetic and recombination maps of the cat have fewer than 3,000 markers and a resolution limit greater than 1 Mb. To complement the first-generation domestic cat maps, support higher resolution mapping studies, and aid genome assembly in specific areas as well as in the whole genome, a 15,000Rad radiation hybrid (RH) panel for the domestic cat was generated. Fibroblasts from the female Abyssinian cat that was used to generate the cat genomic sequence were fused to a Chinese hamster cell line (A23), producing 150 hybrid lines. The clones were initially characterized using 39 short tandem repeats (STRs) and 1,536 SNP markers. The utility of whole-genome amplification in preserving and extending RH panel DNA was also tested using 10 STR markers; no significant difference in retention was observed. The resolution of the 15,000Rad RH panel was established by constructing framework maps across 10 different 1-Mb regions on different feline chromosomes. In these regions, 2-point analysis was used to estimate RH distances, which compared favorably with the estimation of physical distances. The study demonstrates that the 15,000Rad RH panel constitutes a powerful tool for constructing high-resolution maps, having an average resolution of 40.1 kb per marker across the ten 1-Mb regions. In addition, the RH panel will complement existing genomic resources for the domestic cat, aid in the accurate re-assemblies of the forthcoming cat genomic sequence, and support cross-species genomic comparisons.
Copyright © 2012 S. Karger AG, Basel
Every genome needs a good map [Lewin et al., 2009]. The accuracy and density of genetic map construction for any particular species has improved with the evolution of molecular biology, cell culture, and genetic and genomic techniques. Chromosomal abnormalities, somatic cell hybrid maps, and family-based linkage analyses are techniques that have increased map density and can decipher gene order. Radiation hybrid (RH) mapping provided an additional leap forward as polymorphic markers were not required and resolution could be manipulated by radiation dosage. Sanger-based and next-generation sequencing now suggest the finest levels of resolution, density, and order, although correct assembly is complicated and not error-free. Overall, each independent technique has benefitted from the previously built framework maps and all mapping efforts have combined to improve accuracy of orders and distances of genes in the genome.
Successful RH mapping in humans [Cox et al., 1989] pioneered RH mapping efforts in many species such as rat, dog, cow, sheep, horse, pig, and macaque [Womack et al., 1997; Priat et al., 1998; Yerle et al., 1998; Watanabe et al., 1999; Murphy et al., 2001; Chowdhary et al., 2002; Laurent et al., 2007] and non-mammalian vertebrate models including the zebrafish [Geisler et al., 1999; Hukriede et al., 1999]. RH panels were generated for several species initially as lower-resolution RH panels, generally less than 5,000Rad for initial genome map construction; later, higher-resolution panels, generally 10,000Rad or greater, were developed for several species and used in fine mapping and genome assembly. For example, RH maps for the human genome have increased from 3,000Rad to 50,000Rad, improving resolution from approximately 1.2 Mb to an average of 94 kb between ordered markers [Hudson et al., 1995; Gyapay et al., 1996; Olivier et al., 2001]. Comparisons to other mapping techniques, such as recombination maps and genome assemblies, have indicated that each method has value and that cross-method comparisons will help to generate the ultimate map, correct in orientation, order and distance between loci.
The domestic cat has become a useful model organism for the analysis of many diseases and disorders that occur in the human population [Lyons et al., 2004; Rah et al., 2005; Meurs et al., 2007; Menotti-Raymond et al., 2010]. To facilitate the cat’s role as a model for human disease, genetic and genomic resources for the cat have been in production for nearly 3 decades, progressing from somatic cell hybrid panels [O’Brien and Nash, 1982], inter- and intra-species linkage maps [Menotti-Raymond et al., 1999; Grahn et al., 2005; Cooper et al., 2006], integration with RH panels [Sun et al., 2001; Menotti-Raymond et al., 2003a], and currently to complete genome sequencing.
Next-generation sequencing techniques are more efficient in the capture of regions of the genome that had been missed by earlier technologies [Wheeler et al., 2008]; however, the assembly of sequence data generated by non-Sanger-based, massively parallel sequencing technologies faces unique challenges due to the short read lengths produced [Pop, 2009] and higher error rates [Xu et al., 2009]. The existing Sanger-based genome assembly is necessary as a scaffold to map new sequence data [Wheeler et al., 2008]. Although an excellent initial resource, the coverage of the cat genome captured by the 1.9× sequencing effort of the domestic cat was approximately 60% [Pontius et al., 2007]; additional light sequencing of the cat genome for SNP discovery increased depth to 3× and expanded coverage to approximately 80% of the genome [Mullikin et al., 2010]. Despite the expanded coverage, low contig and short scaffold N50s of 4.6 and 162 kb, respectively, suggest that significant gaps in the initial cat sequence assemblies could prevent the formation of continuous contigs with significant length. Next-generation sequencing of the domestic cat to obtain 10–13× coverage is nearing completion (see the Felis catus entry in http://www.genome.gov/10002154). However, since the feline 3× sequence contains significant gaps in coverage and multiple unaligned contigs and the scaffolds were based on the canine assembly, accurate assembly of the deeper non-Sanger-based feline genome sequence will be highly challenging. RH maps have played an important role in facilitating the process of whole-genome sequencing and assembly for human, mouse, rat, dog and cattle [Lander et al., 2001; Waterston et al., 2002; Lindblad-Toh et al., 2005; Rhesus Macaque Genome Sequencing and Analysis Consortium et al., 2007; Bovine Genome Sequencing and Analysis Consortium et al., 2009]. 
To date, the available maps for the cat have fewer than 3,000 markers [Davis et al., 2009; Menotti-Raymond et al., 2009] and a resolution of ∼1 Mb. A high-density map is therefore needed to improve the power of studies that rely on fine-mapping and physical mapping approaches to identify regions of interest. A high-density and high-resolution genome map could strongly support the placement of contigs and resolve placement of the singleton contigs that have yet to be assigned to a chromosome.
To complement the previous physical maps and further resolve the structure of the domestic cat genome, we present the construction and initial characterization of a 15,000Rad whole-genome RH panel suitable for high-resolution mapping.
Materials and Methods
A 15,000Rad RH panel was generated using a fibroblast donor primary culture derived from a female Abyssinian cat fused with an A23 thymidine kinase-deficient Chinese hamster fibroblast cell line. Fusion, isolation, selection, and initial cloning procedures were as previously described [Chowdhary et al., 2002] with the following modification: 30 sequential doses of 500 cGy (rad) were administered to the donor cells for a total absorbed dose of 150 Gy (15,000 cGy) using a 6-MV photon beam produced by a linear accelerator (Varian 2100C; Varian Corporation, Palo Alto, Calif., USA). Dosages were calculated using a software program (IMSure, version 1.22; Prodigm, Inc., Chico, Calif., USA). Hybrid cell lines (N = 204) were expanded in T25 flasks (Thermo Fisher Scientific, Rochester, N.Y., USA) in HATO medium containing DMEM medium with 10% FBS, 1× antibiotic-antimycotic (Invitrogen, Carlsbad, Calif., USA), 1× HAT (Invitrogen), and 1× Ouabain (Sigma-Aldrich, St. Louis, Mo., USA). At the second passage, approximately two-thirds of the cells of each hybrid clone were cryo-preserved in freezing medium containing 10% DMSO; the remaining cells were returned to T25 flasks (passage 3), allowed to reach confluency, then expanded in triplicate T75 flasks (passage 4) and harvested and cryo-preserved at confluency. Of the 204 clones, 150 were successfully cultured and expanded to passage 4. DNA was isolated from approximately 10 µl of each cell pellet using the Qiagen DNeasy kit (Valencia, Calif., USA) according to the manufacturer’s protocol. In addition, each DNA sample was whole-genome amplified using the Qiagen REPLI-g Mini Kit (Valencia) following the manufacturer’s protocol for the amplification of genomic DNA from blood or cells with 3 µl extracted DNA as a template.
Isolated DNA from RH clones was tested for total DNA concentration using a NanoDrop spectrophotometer (Thermo Scientific, Wilmington, Del., USA) (online suppl. table 1; for all online suppl. material, see www.karger.com/doi/10.1159/000339416). Thirty-nine short tandem repeat (STR) markers [Menotti-Raymond et al., 1999, 2003a, b] (online suppl. table 2) were multiplexed and tested with the 150 RH clones as previously described [Lipinski et al., 2008]. For a validation test, 10 markers were tested on both whole-genome amplification (WGA) and non-WGA DNA samples (online suppl. table 3). In each reaction, 2 µl WGA DNA (diluted 1:10) or non-WGA DNA was amplified as previously described [Lipinski et al., 2008]. All STR genotypes were analyzed with STRand software [Toonen and Hughes, 2001] (http://www.vgl.ucdavis.edu/STRand) using semi-automatic allele calling. An internal size standard (GeneScan 500 LIZ, Applied Biosystems, Carlsbad, Calif., USA) was used to determine allele size for each STR. Allele ranges for each marker were established by genotyping the DNA of the donor cat. Additionally, both WGA and non-WGA DNA from 5 clones were genotyped with 1,536 SNPs via the Illumina (San Diego, Calif., USA) GoldenGate assay as described below (data not shown). Retention frequencies (RFs) for non-WGA and WGA DNA samples were tested for a significant difference between the 2 conditions using Pearson’s χ2-test [Pearson, 1900].
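As an illustration of the RF comparison, the following is a minimal sketch of Pearson's χ2-test on a 2 × 2 table of present/absent calls, assuming the calls are pooled over 10 markers and 150 clones per condition. The counts are hypothetical values chosen to be close to the reported RFs, the function name `chi_square_2x2` is introduced here for illustration, and the way the authors tabulated their data is not stated in the text.

```python
import math


def chi_square_2x2(present_a, total_a, present_b, total_b):
    """Pearson's chi-square test (1 df, no continuity correction) on a 2x2 table.

    Rows are the two conditions; columns are present/absent calls.
    Returns (chi2, p_value).
    """
    absent_a = total_a - present_a
    absent_b = total_b - present_b
    n = total_a + total_b
    col_present = present_a + present_b
    col_absent = absent_a + absent_b
    denom = total_a * total_b * col_present * col_absent
    if denom == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * (present_a * absent_b - absent_a * present_b) ** 2 / denom
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 10 STRs x 150 clones = 1,500 marker-clone tests per condition.
non_wga_present = 430  # illustrative, roughly 28.7% retention
wga_present = 411      # illustrative, roughly 27.4% retention
chi2, p = chi_square_2x2(non_wga_present, 1500, wga_present, 1500)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}")
```

The pooled table treats the two conditions as independent samples; because the same clones are typed twice, a paired test could also be considered.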
Whole-genome amplified DNA from the 150 cell lines and the host hamster and donor cat DNA samples were genotyped for 1,536 SNPs using the Illumina GoldenGate genotyping assay analyzed on an Illumina® BeadStation 500G [Oliphant et al., 2002] (http://www.illumina.com) following the manufacturer’s protocol at the Genetics Core Laboratory at the Texas Biomedical Research Institute. SNPs were selected within random 1-Mb regions on 10 different cat chromosomes (A1, A2, B3, C2, D1, D2, D4, E2, F2 and X) using data from the Felis catus sequencing effort [Mullikin et al., 2010]. Each region contains ∼153 SNPs with a geometric distribution across the region, averaging 1 SNP per 6 kb. GenomeStudio software (version 2010.2) was used to manually score the presence or absence of loci across the RH panel clones. Data were filtered by considering the quality of the SNPs and the quality of the clones, as previously described [McKay et al., 2007]. For the primary filtering, SNPs were not scored in the clones if the SNP successfully amplified in the hamster and could not be distinguished from the cat or if the SNP failed to amplify in the donor cat DNA. Calling thresholds for the SNPs that amplified within a clone were absent (0) for call scores below 0.10, unknown (–) for call scores between 0.10 and 0.15, and present (1) for call scores above 0.15. Finally, secondary filtering removed markers with an ‘unknown’ rate of more than 5%.
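The calling thresholds and the secondary filter can be sketched as follows. This is an illustration only: the function names, the example marker names and scores are hypothetical, scores exactly at 0.10 or 0.15 are assumed to fall in the 'unknown' class, and unknown calls are written as "-".

```python
ABSENT, UNKNOWN, PRESENT = "0", "-", "1"


def call_from_score(score, low=0.10, high=0.15):
    """Convert a call score into an RH call using the stated thresholds."""
    if score < low:
        return ABSENT
    if score > high:
        return PRESENT
    return UNKNOWN  # assumption: boundary values count as unknown


def unknown_rate(calls):
    """Fraction of clones with an unknown call for one marker."""
    return calls.count(UNKNOWN) / len(calls)


def secondary_filter(vectors, max_unknown=0.05):
    """Keep markers whose unknown rate is not more than max_unknown."""
    return {m: v for m, v in vectors.items() if unknown_rate(v) <= max_unknown}


# Hypothetical call scores for two markers across four clones.
scores = {
    "snp_a": [0.02, 0.40, 0.12, 0.90],
    "snp_b": [0.01, 0.03, 0.80, 0.95],
}
vectors = {m: [call_from_score(s) for s in row] for m, row in scores.items()}
kept = secondary_filter(vectors)
print(sorted(kept))
```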
The filtered data from each region was used to construct linkage groups and framework maps in CarthaGene-1.2 [de Givry et al., 2005]. Markers were tested to detect duplicate RH vectors using the ‘merge’ command. For markers within a given 1-Mb region, linkage groups were formed by 2-point analysis with a threshold LOD score >3.0. Framework maps were constructed using the ‘buildfw’ command with a saving threshold and an adding threshold of 3. The resulting maps were refined with the commands ‘flips’ (7 1 1), ‘polish’, and ‘greedy’ (3 1 1 15 25) and tested by ‘robustness’ (–3). For each chromosomal region, 10 maps with the highest likelihood were examined to confirm the reliability of major changes detected within each region.
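To make the 2-point step concrete, the sketch below estimates the breakage frequency, LOD score, and distance in centiRays for one pair of RH vectors under a haploid model with a single shared retention probability. This is a textbook formulation written for illustration, not CarthaGene's implementation; the function name and the example vectors are hypothetical.

```python
import math


def two_point_rh(vec_a, vec_b):
    """Two-point analysis of two RH vectors (haploid, equal-retention model).

    Vectors are sequences of "1" (present), "0" (absent) or "-" (unknown).
    Clones with an unknown call for either marker are skipped.
    Returns (theta, lod, distance_cR).
    """
    counts = {"11": 0, "10": 0, "01": 0, "00": 0}
    for a, b in zip(vec_a, vec_b):
        if a in ("0", "1") and b in ("0", "1"):
            counts[a + b] += 1
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no clones scored for both markers")
    discordant = counts["10"] + counts["01"]
    r = (2 * counts["11"] + discordant) / (2 * n)  # shared retention estimate
    if r in (0.0, 1.0):
        raise ValueError("retention of 0 or 1 is uninformative")
    # Maximum-likelihood breakage frequency, capped at 1 (no linkage).
    theta = min(1.0, discordant / (2 * n * r * (1 - r)))

    def log10_likelihood(t):
        probs = {
            "11": (1 - t) * r + t * r * r,
            "10": t * r * (1 - r),
            "01": t * r * (1 - r),
            "00": (1 - t) * (1 - r) + t * (1 - r) ** 2,
        }
        # Skip empty classes so that log10(0) is never evaluated.
        return sum(k * math.log10(probs[key]) for key, k in counts.items() if k > 0)

    lod = log10_likelihood(theta) - log10_likelihood(1.0)
    distance = math.inf if theta >= 1.0 else -100.0 * math.log(1.0 - theta)
    return theta, lod, distance


# Hypothetical RH vectors for two markers typed on 12 clones.
marker_a = "1101000100-1"
marker_b = "1101000110-1"
theta, lod, distance = two_point_rh(marker_a, marker_b)
print(f"theta = {theta:.3f}, LOD = {lod:.2f}, distance = {distance:.1f} cR")
print("linked at LOD > 3.0:", lod > 3.0)
```

With only a dozen clones the LOD score stays small; a panel of 150 clones gives far more power to exceed the LOD threshold of 3.0 for closely spaced markers.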
Results
We identified and propagated 204 initial cat-hamster fusion lines; from these, we cultured 150 clones successfully to at least passage 4 and further characterized them to develop the RH panel. To verify accurate representation of clone DNA in WGA samples, 10 STRs were genotyped on WGA and non-WGA template DNA from the same clones (online suppl. table 3). Average RFs for non-WGA and WGA DNA were 28.7 and 27.4%, respectively, and the percentages of clones with >10% RF were 96.7 and 92.7%, respectively (table 1). Comparing presence or absence of each marker for each clone, 3.5% discordancy was detected between the WGA and non-WGA DNA (online suppl. table 3). In addition, both WGA and non-WGA DNA from 5 clones were tested on 1,536 SNPs and no significant differences were detected (data not shown).
|Table 1. RFs for the feline 15,000Rad RH panel calculated by STR and SNP analysis|
Each clone was tested with 39 STR markers distributed across all 18 autosomes and the X chromosome for initial characterization of the panel. Marker RFs are summarized in table 1 and the data matrix is presented in online suppl. table 4. Marker RFs ranged from 0 to 54%, with an average of 21% for the panel. Clones C027 and C184 were not positive for any STRs, and clone C087 had the highest RF of 54%. Of the 150 clones tested, 88% had RFs >10%. No markers were located near the selectable thymidine kinase and HPRT loci. Figure 1 shows the graphical representation of RFs for each clone summarized in online suppl. table 1, and online suppl. figure 1 illustrates the distribution of markers within clones.
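The RF summaries used in this section can be computed from a clone-by-marker call matrix along the following lines. This is a minimal sketch on a hypothetical toy matrix; the function names and clone labels are illustrative, and unknown calls ("-") are assumed to be excluded from the denominator.

```python
def retention_frequency(calls):
    """Fraction of scored (non-unknown) calls that are present."""
    scored = [c for c in calls if c in ("0", "1")]
    if not scored:
        raise ValueError("no scored calls")
    return scored.count("1") / len(scored)


def clone_summary(matrix, threshold=0.10):
    """Summarize per-clone RFs.

    matrix maps a clone name to its list of calls, one per marker.
    Returns (per-clone RFs, mean RF, share of clones with RF > threshold).
    """
    rfs = {clone: retention_frequency(calls) for clone, calls in matrix.items()}
    mean_rf = sum(rfs.values()) / len(rfs)
    share_above = sum(rf > threshold for rf in rfs.values()) / len(rfs)
    return rfs, mean_rf, share_above


# Hypothetical toy matrix: three clones typed with four markers.
toy_matrix = {
    "clone_1": ["1", "1", "0", "0"],
    "clone_2": ["0", "0", "0", "0"],
    "clone_3": ["1", "0", "-", "0"],
}
rfs, mean_rf, share_above = clone_summary(toy_matrix)
print(rfs)
print(f"mean RF = {mean_rf:.1%}, clones with RF > 10% = {share_above:.1%}")
```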
|Fig. 1. RH clone RFs estimated by STR and SNP analysis. Each bar represents 1 clone. Estimates of RF calculated by STR analysis are indicated by gray bars to the left on the graph, and estimates of RF calculated by SNP analysis are indicated by black bars to the right on the graph.|
Further characterization of the 150 clones was accomplished via high-throughput SNP genotyping. Of the 1,536 SNPs included in the Illumina GoldenGate assay, 921 (59.96%) passed primary filtering and were successfully typed. Seventy-five SNPs (4.88%) failed in the donor cat and 540 SNPs (35.16%) were positive in hamster, thus genotyping was not possible. Of the 921 SNPs that passed primary filtering, 127 SNPs failed secondary filtering, resulting in 794 SNPs available for analysis. For the SNP markers, the average RF was 29.4%, and the percentage of clones with RF >10% was 90.7% (table 1). Clone RFs estimated by SNP genotyping are summarized in online suppl. table 1 and illustrated in figure 1. The clones that were negative for all STRs had RFs >20% for the SNPs. A general trend of similar RFs was observed between STRs and SNPs (fig. 1).
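The SNP counts in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts stated in the text; the variable names are illustrative.

```python
def pct(part, whole):
    """Percentage of whole represented by part."""
    return 100.0 * part / whole


TOTAL_SNPS = 1536
FAILED_IN_DONOR_CAT = 75
POSITIVE_IN_HAMSTER = 540
FAILED_SECONDARY = 127

# SNPs that survive each filtering step.
passed_primary = TOTAL_SNPS - FAILED_IN_DONOR_CAT - POSITIVE_IN_HAMSTER
usable = passed_primary - FAILED_SECONDARY

for label, count in [
    ("passed primary filtering", passed_primary),
    ("failed in donor cat", FAILED_IN_DONOR_CAT),
    ("positive in hamster", POSITIVE_IN_HAMSTER),
    ("failed secondary filtering", FAILED_SECONDARY),
    ("available for analysis", usable),
]:
    print(f"{label}: {count} ({pct(count, TOTAL_SNPS):.2f}%)")
```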
The 794 SNPs that passed quality control were used to construct framework RH maps (table 2). No SNPs had identical RH vectors as determined by the ‘merge’ command. The 794 markers formed 10 linkage groups, one per chromosome region. Framework maps included 181 of the 794 SNPs (22.79%). The RH maps with best likelihoods obtained from the 10 initial 1-Mb regions are shown in online suppl. 2 and summarized in table 2. A representative map for chromosome E2 is presented as figure 2. LOD scores to the nearest markers in a framework map were generally above 20, ranging between 12.7 and 37.7. Major inversions were suggested on chromosomes A2, B3, D1, and D2, which were consistent across all 10 highest likelihood maps for each region. The overall SNP order, besides the noted inversions, was conserved across all maps. However, in every map, some SNPs showed a different location in the RH map as compared to the positions identified by the sequence assembly. The length of the RH maps ranged from 178.9 to 489.4 cR15,000 with an average kb to cR ratio of 2.4 kb/cR15,000 and an average inter-marker distance of 40.1 kb (table 2). Chromosomes E2 (fig. 2) and D4 were the most densely mapped chromosomes, with 25 markers assigned to each 1-Mb region and lengths of 489.4 and 425.4 cR15,000, respectively. Conversely, B3 was the most sparsely mapped chromosome, with 11 markers on the framework map and a length of 178.9 cR15,000. The average map lengths in kb and in cR15,000 as well as kb/cR15,000 are shown in table 2.
|Table 2. High-throughput SNP genotyping summary and framework map marker summary|
|Fig. 2. Framework RH map of feline chromosome E2. The ideogram of the chromosome is shown on the left, the estimated position of each marker is shown in the center, and the framework 15,000Rad map is shown on the right. The star indicates 1 large rearrangement in marker order between the sequence data and the RH map.|
Discussion
The domestic cat is increasingly recognized as a robust animal model for human diseases. Expanding resources, particularly the genome sequencing projects and the resulting SNP discoveries, should facilitate candidate gene identification and analyses. The anticipated deeper coverage should accelerate investigations of simple phenotypes and enable analyses of more complex traits in the cat. However, mutation detection studies are hampered by incorrect gene orders caused by statistical fluctuations in genetic maps and incorrect genome assemblies. Even with very high coverage-derived assemblies of humans and mice, imperfections and artifacts in gene order and distances are a recognized hazard [Kong et al., 2002], which can be resolved with the generation of a high-resolution high-density RH map [Marques et al., 2007].
Besides the genome sequence, many excellent resources are currently available for the domestic cat to support the identification of diseases and traits. The current cat 5,000Rad RH map contains over 2,000 markers with a resolution of ∼1 Mb, which supported the 1.9× and 3× cat genome assemblies. The use of a more robust whole-genome assembly from the most closely related species is a common practice for comparative assemblies, as has been demonstrated in the bovine genome [Prasad et al., 2007]. The dog scaffold was used where available to direct the cat assembly below the 1-Mb level, introducing a bias towards the dog genome in the cat gene order and inherently incorporating errors into the cat sequence assembly [Pontius et al., 2007]. Thus, a high-resolution gene map for the cat, developed by an independent technology such as the RH technique, should augment genome assembly for the cat, thereby facilitating fine mapping and candidate gene studies.
The overall goal was to produce a high-resolution RH map of the cat to facilitate genome assembly. The same Abyssinian cat was used to generate the genome sequence and construct the RH panel to alleviate concerns regarding individual sequence rearrangements and variations due to repeat regions. The cat RH 15,000Rad panel is composed of 150 RH clones, which supports mapping of closely spaced markers. Marker retention has been shown to vary across the genome in RH panels [Cox et al., 1990; James et al., 1994; Walter et al., 1994] and can be estimated with different technologies, such as STRs and high-throughput SNP genotyping. An initial RF estimate of 21% was determined by testing 39 STRs, with 88% of clones having a >10% RF. As expected, this RF estimate is lower than the initial estimate of the 5,000Rad panel [Murphy et al., 1999] but similar to RFs obtained from other species’ RH panels with high radiation dose, such as the 12,000Rad ovine panel, which reported 87.5% of clones with RFs >10% [Laurent et al., 2007]. Retention frequencies were also estimated from the SNP analysis and illustrate the limitations of typing a high-resolution RH panel with few markers. The RF estimate from SNP analysis is higher than that estimated by STR analysis, with a 29.4% RF and 90.7% of clones having a >10% RF. By testing more loci, a more complete estimate of RF can be calculated, which reflects the higher retention expected from using a low-passage RH panel. Both STR and SNP analyses demonstrate the fluctuation in retention in different regions of the genome. Using these data, a subset of 94 clones could easily be selected that has an optimum average RF across the cat genome.
The newer SNP genotyping technologies should support the construction of high-density genetic maps. For this study, Illumina technology was used to genotype 1,536 SNPs on the cat RH panel. The initial SNP selection process did not consider conservation with hamster, so amplification of some SNPs in the hamster background was expected. Approximately 35% of SNPs amplified in the hamster, precluding their analysis in the cat RH panel. In addition, this analysis is the first genotyping of these particular loci in the cat; thus, a 5% failure rate, as determined by failure in the donor cat DNA, was expected due to design and assay failure. Approximately 8% of SNPs failed to amplify robustly, resulting in ambiguous genotyping. Overall, 51.7% (794) of SNPs were suitable for map construction, supporting the rapid development of a high-density map.
The current 5,000Rad RH panel has a resolution limit exceeding 1 Mb; below this level, the cat genome assembly therefore has no secondary mapping method to support assembly or resolve discrepancies. The 10 framework maps constructed here had an average inter-marker distance of 40.1 kb, an approximately 25-fold increase in resolution compared to the 5,000Rad feline RH map, and should support genome assembly below 1 Mb. Conservative framework maps for the 15,000Rad cat RH panel were constructed with 181 SNPs, averaging 18 markers per region and covering an average of about 900 kb. A majority of markers in the framework maps had an order consistent with the suggested SNP positioning. However, several large inversions on 5 of 10 chromosomes were identified, which were supported by all 10 maps with the highest likelihoods. Each 1-Mb region also had smaller potential inversions. The SNPs were selected and positioned from unpublished versions of the cat genome assembly; therefore, the more robust assembly may show more concordance with the RH maps. In addition, large-scale SNP calling methods may need refinement to improve accuracy and objectivity in assigning call scores for RH data, as amplification levels fluctuate unpredictably between markers and minor differences in call rates for even 1 marker can dramatically change the final marker order of a map.
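The resolution comparison rests on simple arithmetic, sketched below under the assumption that the 5,000Rad map resolves about 1 Mb (1,000 kb) and using the reported averages of 40.1 kb per marker and 2.4 kb/cR15,000. The helper names are illustrative, and the conversion from cR15,000 to kb is only a panel-wide average, not a region-specific value.

```python
def fold_increase(old_resolution_kb, new_resolution_kb):
    """How many times finer the new resolution is than the old one."""
    return old_resolution_kb / new_resolution_kb


def cr_to_kb(distance_cr, kb_per_cr=2.4):
    """Approximate physical distance for an RH distance, using the average ratio."""
    return distance_cr * kb_per_cr


# Assumed ~1 Mb resolution for the 5,000Rad map versus 40.1 kb for the 15,000Rad panel.
print(f"fold increase: {fold_increase(1000.0, 40.1):.1f}")
# A hypothetical 10 cR15,000 interval expressed in kb.
print(f"10 cR15,000 is roughly {cr_to_kb(10.0):.0f} kb")
```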
A recognized challenge of RH panels is that, as fused RH clones continue to grow and divide, donor DNA is progressively lost [Karere et al., 2010], continually changing the profile of markers present at each passage. Thus, to be comparable, gene maps must be constructed using DNA from the same passage for a particular clone, which often limits the amount of DNA available for mapping. In addition, large-scale culture steps are required to prepare DNA in sufficient quantities for distribution to multiple laboratories and for large-scale marker typing. The efficiency of RH mapping should be increased by using early passage WGA DNA samples, which have higher RFs and sufficient amounts of DNA. Moreover, WGA may also amplify low-copy fragments that would have otherwise been lost to the analysis. To avoid a significant reduction in the RF, and to make the resource available for the community, each clone was whole-genome amplified. Promising results with WGA DNA were obtained on a 10,000Rad panel for the Rhesus macaque [Karere et al., 2010] and a 3,000Rad panel for the gilthead seabream [Senger et al., 2006]. In this cat RH panel, WGA was a reliable method for genome amplification as RFs were comparable between WGA and non-WGA DNA, and only 3.5% discordancy was detected between STR amplification in WGA versus non-WGA DNA. The low requirements of DNA concentration and sample volume for SNP array technologies and the extensive amplification of RH clone DNA by WGA suggest that the cat 15,000Rad panel should be a sufficient resource for additional mapping on higher density arrays and fine mapping of specific regions.
The ultimate map of a genome is not just the complete sequence of the organism, but also the correct assembly of the sequence with chromosomal assignments and orientation. Thus, the production of a whole-genome sequence does not dismiss the need for accurate and efficient mapping techniques. The cat genome sequencing coverage is shallow at ∼3×, and the highly fragmented sequence assembly further highlights the need for high-resolution RH maps in the species. This cat 15,000Rad RH panel should provide support for the continued refinement of the domestic cat genome assembly, particularly since the deeper cat sequence was developed with shorter read technologies that have higher error rates. With a comprehensive high-density, high-resolution RH map, marker order rearrangements can be identified and resolved and cross-species genome comparisons can be accomplished, providing a powerful tool for inferring genome function and evolutionary history.
Acknowledgements
Funding for this project was provided by NIH-NCRR RR016094, the Center for Companion Animal Health at UC Davis, the Intramural Research Program of the National Human Genome Research Institute, National Institutes of Health, and the George and Phyllis Miller Feline Health Fund of the San Francisco Foundation. This investigation was partially conducted in a facility constructed with support from Research Facilities Improvement Program Grant Number C06 RR013556 from NCRR, NIH. Cat genome sequence for SNP detection was provided by Dr. Wesley Warren of the Genome Institute at Washington University School of Medicine. The authors would like to thank Paul Lathrop and Hasan Alhaddad for technical support and advice.
Leslie A. Lyons
Population Health and Reproduction
School of Veterinary Medicine, University of California – Davis
4206 Vet Med 3A, Shields Avenue, Davis, CA 95616 (USA)
Tel. +1 530 754 5546, E-Mail firstname.lastname@example.org
Accepted: April 2, 2012 by M. Schmid
Published online: July 6, 2012
Number of Print Pages : 8
Number of Figures : 2, Number of Tables : 2, Number of References : 52
Additional supplementary material is available online - Number of Parts : 2
Cytogenetic and Genome Research
Vol. 137, No. 1, Year 2012 (Cover Date: August 2012)
Journal Editor: Schmid M. (Würzburg)
ISSN: 1424-8581 (Print), eISSN: 1424-859X (Online)
For additional information: http://www.karger.com/CGR