Beyond the Rainbow: A Review of Advanced Lineage Tracing Methodologies for Interrogating the Initiation, Evolution, and Recurrence of Brain Tumors

Abstract
The mammalian forebrain is perhaps the pinnacle of evolution and one of the most complex structures in known existence. The origin of this complexity and diversity lies partly in the dynamic behavior of progenitors during embryonic neural development, all of which is under the control of regulatory mechanisms that ensure all the elements end up in the right place at the right time. Historically, dye-based, histochemical, enzymatic, or fluorescent lineage tracing techniques have been used to deconvolute developmental dynamics in tissues and cells. Technical limitations resulted from the restricted number of fluorophores, the half-life of the dyes, or the limited ability to deconvolute mixed populations. These limitations often impede larger-scale lineage tracing with these methods in spatial and temporal contexts. Genetic barcoding techniques have been used for decades in clonal investigations and have now evolved with high-throughput sequencing methods to allow for impressive insights into population and even organism-level lineage relationships. In this review, we will discuss the progression of lineage tracing methodologies and how they are applied to answer questions around molecular and cellular mechanisms of gliogenesis and neurogenesis. We will also discuss recent advances in computational biology, single-cell sequencing, and in situ-based lineage tracing methodologies. Incorporation of these methods into the lineage tracing toolset promises to enable a higher resolution, multimodal view of neural lineages during development and during disease processes that hijack developmental signaling, such as brain tumor development and recurrence, where traditional developmental hierarchies become more plastic and less predictable. Given the dismal prognosis of high-grade brain tumors like glioblastoma multiforme, a better understanding of the lineage relationships leading to disease heterogeneity and recurrence is desperately needed to formulate efficacious approaches to treatment.
Here we discuss a historical foundation on, as well as the future of, lineage tracing at the intersection of development and disease.

Introduction
The Continuum of Neural Precursor Cells
At a very early stage of neural development, cells undergo symmetric proliferative division to amplify their population [1]. In the pallium, a monolayer of proliferating cells called neuroepithelial cells expands by dividing symmetrically. Once amplified, these cells switch to asymmetric neurogenic division to both self-renew and produce a specialized subset of glia called radial glial cells (RGCs) [2][3][4], which sit with their apical processes on the ventricular surface and their basal processes extending up to the pial surface. The commonly recognized view of cortical neurogenesis follows the "radial unit hypothesis" [5]: RGCs divide asymmetrically to produce both RGCs and pyramidal neurons, which migrate upwards toward the pial surface. Cells within a given radial unit (i.e., from the same RGC precursor) are clonally related, and the neurons that are formed follow an inside-out temporal sequence [1][5][6][7]. Cells that are born earlier end up in deeper layers, whereas later-born cells end up in more superficial regions of the cortex. RGCs will renew to make more postmitotic neurons, or intermediate progenitor cells, which eventually migrate to the subventricular zone where they expand neuronal pools [1]. (It is important to note that interneurons are largely generated outside of the cortex and migrate tangentially from the embryonic ganglionic eminences to populate various forebrain regions, including the cortex [8][9][10].) Cortical progenitor cells thus generate the many diverse and complex cell subsets in a temporally organized manner, which in turn drives their spatial organization [1]. Lineage tracing data have pointed toward the commonly held view that during neurogenesis, RGCs are bona fide stem cells and will progress through stages of division and sequestration to produce different types of pyramidal neurons [2][3][4].
The radial unit subsists as a fundamental building block across the brain, and different radial units are iteratively repeated across different brain regions.
Perinatally, RGCs largely undergo a "gliogenic switch," which produces immature astrocytic progenitor cells and oligodendrocyte precursor cells (OPCs) [11][12][13]. These OPCs and astrocytic progenitors translocate outward from the ventricles and proliferate locally in the expanding brain parenchyma, serving both to myelinate surrounding cells and to form local polydendrocytes. Of note, some OPCs have been observed to synaptically engage with neurons and are among the most proliferative cells over the lifespan of mammals [14][15][16]. As brain development progresses, many glial progenitor subpopulations terminally differentiate to form oligodendrocytes [17] and mature astrocytes [18]. Some precursor and progenitor cells remain nonterminally differentiated along the subependymal zone, in the hippocampus (both to varying levels in different species), and in the brain parenchyma, undergoing low levels of proliferation into adulthood [19,20]. Thus, from neuroepithelial cells to proliferating adult neural stem cells, these populations exist within a continuum of neural precursor populations [1]. The consensus view is that these various populations of stem cells and oligodendroglial, astroglial, and other neural progenitor cells are the likely cells of origin for forebrain glioma [21][22][23], rather than dedifferentiated postmitotic populations. This review will cover (1) historical methods for lineage tracing, with a particular focus on clonal methods, (2) insights into neurogenesis and gliogenesis generated by such methods, (3) emerging techniques to gain deeper and more comprehensive insights into neural lineages, and (4) how these can be used to elucidate the mechanisms and lineage relationships between gliogenesis and brain tumor cell types.

Historical Lineage Tracing Methodologies
Originally, lineage tracing consisted of labeling a few cells with dye and following them. Specifically, the Golgi method of sparse labeling ushered in a new appreciation of neuroanatomy, and its use by Cajal enabled the initial identification of discrete populations in much of the nervous system through his meticulous drawings of Golgi-stained sections [24,25], including cerebellar [24] and retinal [25] cells, which showed how nerve cells interconnect and interface across the brain.
Fast forward to the mid-twentieth century, when autoradiography emerged as the standard for tracking proliferation and allowed inference of migration. Cajal's anatomical investigations had greatly elucidated the discrete cell morphologies and classes of cells in many CNS structures, but his studies could not definitively establish lineages. The ability of [3H]thymidine ([3H]dT) to incorporate into newly synthesized DNA first allowed researchers to label neural precursor cells and their progeny [26][27][28][29][30][31][32][33]. Early [3H]dT studies pointed to neurogenesis occurring in distinct areas of the adult mammalian brain [27]. It was not until more contemporary studies using the thymidine analog bromodeoxyuridine (5-bromo-2′-deoxyuridine, aka BrdU) that more definitive analyses were enabled through the combined use of fluorescent immunohistochemistry to detect BrdU [34]. These tools, combined with antibodies to cell type-specific markers and confocal microscopy, have greatly enabled proliferative pulse-chase studies of development and disease. Specifically, during the S phase of the cell cycle, BrdU is incorporated in lieu of thymidine and can be assayed to indicate proliferation. This is not without its own pitfalls: the method cannot distinguish replication from other DNA synthesis events, such as gene duplication or DNA repair, which in the context of lineage tracing could lead to misinterpretation and false positives [34].
The ideal method to unambiguously assess lineage relationships is reporter-defined live imaging. However, this approach typically only affords researchers insight using ex vivo lineage tracing via slice culture systems [2,4] or technically demanding multiphoton imaging combined with invasive surgical approaches [35][36][37][38]. Progress in multiphoton miniscopes promises to streamline these methods, but such methods are still largely under development [39][40][41]. As a result, lineage tracing research has continued to rely on combinations of recombinase-based methods, viral labeling, or thymidine analog-based methods [34]. The use of these methodologies can lead to challenges in interpretation, and there remain contrary views on how some aspects of neurogenesis [42] or putative neuronal regeneration occur [43]. For example, the neuronal reprogramming field has been shaken by controversy, with competing claims that different cell types can transdifferentiate based on misexpression of proneural genes [44,45], whereas other research points to this being a product of nonspecific promoters or other technical confounds [46][47][48]. The rather striking difference between the two sets of findings points to their reliance on the methodologies used: pitfalls in lineage tracing methodology can drive both false negatives and false positives. A detailed discussion of these methodologies with their benefits and pitfalls is beyond the scope of this review, and for a more detailed overview we refer the reader to previous discussions of these topics [34,43,49,50]. Here we will focus on genetic reporter designs that allow higher precision lineage tracing and higher resolution views into neural populations through precise mosaic labeling and/or genetic barcoding.

Mosaic-Labeling Methodologies
While dyes or thymidine analog-based labeling of cells (such as the BrdU/EdU pulse chase methods) are robust methods for labeling subsets of cells, they often do not allow unambiguous lineage tracing due to challenges with deconvolution in large or mixed and overlapping subpopulations. With that in mind, we will be discussing alternate approaches defined by delivery method (such as viral transduction or advanced mosaic analysis systems) as well as genetic modifications (such as barcoding, dynamic barcoding) and sequencing-based in silico methods (somatic mutation analysis, mathematical modeling).

Viral Tracing
Starting in the 1980s and into the early 2000s, retroviral labeling with reporter genes emerged as a tool for understanding cell lineage structures in the developing brain [3,4][51][52][53][54][55][56]. Viruses, or more specifically retroviruses, are mostly specific for proliferative cell populations due to their ability to integrate into the genome of transduced mitotic cells (Fig. 1a). Viral tracing allows stable insertion of fluorophores over numerous divisions, where dye or thymidine analog-tracing methods would be diluted. There are two important limitations. First, viruses only label hemi-lineages, as insertion occurs in only one progeny cell. Second, part of the progeny of daughter lineages may go unidentified due to silencing, as is the case with deep layer-restricted lineages after a few rounds of division [55]. Importantly, this approach was among the first to incorporate genetic barcoding [53,57]. This barcoding technology has continued to be employed to answer critical questions such as the origins of neural stem cells and lineage potential [9,58].

Genetic Fate Mapping with Recombinases or Transposons
The use of recombinases such as Cre and Flp/FlpO in either a tissue- or cell type-specific manner (sometimes inducibly [59]), along with cognate reporter strains [60], has been a powerful method to investigate neural lineages and cell types [61,62]. The toolset continues to expand with recombinases such as Dre, Vika, VCre, and a host of others [63][64][65]. However, there are intrinsic challenges and limitations, including somewhat laborious methods to titrate recombination [66], germline recombination issues and sex bias [67], and lack of correlation between reporter mice and knockout alleles [68]. This last issue specifically indicates that while recombination is a powerful tool, there are limitations in the process of deconvolving data. Other conditional or transgenic strategies that use conditional or combinatorial fluorophores [69][70][71] allow for quasi-mosaic analysis. However, these can be difficult or laborious to link to gene function. Transposon systems such as PiggyBac [10,23][72][73][74] or Sleeping Beauty [75] have greatly expanded our ability to trace neural lineages in a perdurant manner and create tumor models. These transposition methods can use plasmids linked to genetic reporters, but multiplexing such plasmids leads to stochastic mixing of plasmid dosage and overlap of plasmid expression in somewhat unpredictable ways [70]. A recently reported approach, termed TEMPO, employs sequential activation of chained genetic reporters by CRISPR, enabling the manipulation of consecutive generations of daughter cell lineages, but it is limited by the number of nonoverlapping, orthogonal reporters included [76]. While methods exist for deconvolution of overlapping fluorescent reporters, these require dedicated tools and sophisticated imaging expertise [70]. Moreover, because recombination-based tracing relies heavily on fluorophore readout, most data will typically provide only a static analysis.
Many of these issues can be mitigated in specific circumstances by careful initial design, whereby genetic reporters are intrinsically linked to recombinase expression or genes [77][78][79][80] rather than placed in separate alleles. Nevertheless, generating precise mosaics in which gene function is unambiguously defined by linked reporters is often challenging to generalize.

Mosaic Analysis with Double Markers
Mosaic analysis with double markers (MADM) works by tagging cells through Cre-dependent mitotic recombination [81] (Fig. 1b). MADM has an advantage over viral transduction in that it identifies both sister cell lineages after cell division. Inducible labeling allows for increased spatiotemporal resolution. However, it does not allow for continuous observation of a cell, and reconstructing lineage trees becomes cumbersome because a single cell can produce quite a large population of progeny that have a wide migration field or that generate tumors. Despite these minor caveats, MADM has been responsible for elucidating fundamental issues in the developing and diseased brain, including the deterministic versus stochastic output of radial glia [7,82], adult stem cell lineage relationships [66], uncoupling cell autonomous from non-cell autonomous gene function [83], and the cell of origin in glioma [22]. Given the expansion of MADM reporters to all chromosomes [84], this genetic approach continues to represent the most elegant germline methodology for in vivo manipulation of genes through the generation of mosaic mice. However, in the context of brain tumor modeling, given the co-existence of multiple gain- and loss-of-function driver mutations in many brain tumors, the required breeding could be laborious.

Mosaic Analysis by Dual Recombinase-Mediated Cassette Exchange
Mosaic analysis by dual recombinase-mediated cassette exchange (MADR) (Fig. 1c) uses expression of two recombinases (Cre and FlpO) to insert "donor" DNA elements, typically delivered by electroporation, into engineered genomic sites containing cognate LoxP and FRT sites [63]. Because electroporation preferentially targets mitotic cells, MADR allows for stable insertion (much like retroviral or lentiviral infection or transposon integration) of cassettes of interest, which allows for multiplexing to discriminate cell populations with promoters, insertion of reporters, and potentially even barcoding. However, unlike viral approaches or transposon-based electroporation, MADR allows for defined copy number transgenesis: either single copy in mice heterozygous for the recipient site or dual copy in homozygous mice [63]. This largely obviates the phenotypic inconsistencies intrinsic to viral, transposon, or standard plasmid electroporation approaches, where supraphysiological gene doses are often observed due to large numbers of integrations and/or episomes. In addition, there is no intrinsic payload limit, in contrast to standard viral approaches. The MADR approach has been utilized to model a wide variety of brain tumor subtypes, including Nf1-driven glioblastoma (GBM), YAP1-MAMLD1 and ZFTA-RELA ependymoma, H3f3a-G34R and K27M mutant glioma, and H3f3a WT gliomas [63]. Moreover, akin to the pattern of patient gliomagenesis, MADR electroporation allows for synchronized and relatively focal somatic transgenesis without the need for germline engineering. However, MADR requires the cells or regions to be mutated to be targetable by electroporation, which can be an impediment for some tumor types (e.g., spinal cord tumors or tumors associated with early- to mid-embryogenesis, like embryonal tumors with multilayered rosettes). Also, MADR insertions are dynamic until the recombinases are diluted below functional levels.
Thus, acute lineage tracing is less precise than with standard recombination events or MADM [63].

Expanding beyond Genetic Reporter Genes with Large-Scale Genetic Barcoding
Single-cell RNA sequencing (scRNA-seq) is a constantly evolving technology that can rapidly and efficiently interrogate transcriptional profiles of thousands of cells. As we will discuss below, the high throughput and dimensionality of single-cell sequencing data lend themselves to a host of in silico lineage tracing approaches, with caveats. However, scRNA-seq has also synergized with transgenic reporter detection (Fig. 2a) or barcoding technologies (Fig. 2b, c) to enable a new dimension of lineage tracing [63,85]. Specifically, dynamic barcoding approaches such as GESTALT [85,86], POLYLOX [87,88], SCARTRACE [89,90], and memory by engineered mutagenesis with optical in situ readout (MEMOIR) [91], which reconstructs the clonal dynamics of embryonic stem cells in situ, all use barcodes to reconstruct lineages. Using a CRISPR-based system to create indels either in an inserted barcode array (as in GESTALT) or in a targeted GFP fluorophore (as in SCARTRACE) allows for tagging of clonal populations of cells. A system like GESTALT allows for progressive accumulation of indels over multiple rounds of division, so that clonal populations can be ordered in time, and has been shown to be robust enough to reconstruct whole lineages of zebrafish [85]. GESTALT has been shown to be widely applicable in different animal models, including GESTALT-inspired mouse models [92]. At a much larger scale, MARC1 [93,94] mice have been generated that use synthetic mutations in a barcoded mouse to generate hundreds of mutant traceable alleles, and CARLIN mice allow for seamless tagging of cells via dox induction, allowing for a top-down, unbiased identification of clonal populations during development [92]. In terms of viral approaches, STICR allows for scRNA-seq-optimized clonal lineage assessment based on optimized barcodes and has provided powerful insights into both mouse and human neurodevelopment [8,95].
There are some inherent pitfalls of GESTALT and similar evolving barcode approaches. First, large/maximal deletions (i.e., deletions between the two maximally separated barcodes, causing loss of the entire array) tend to accumulate and are often selected for within the barcode array, making reconstruction of lineage trees a challenge. A second limitation of GESTALT is the difficulty of deconvolving editing patterns to assign a temporal order. This is ameliorated by systems like Tickertape/DNA Typewriter [101,102], in which edits in an array happen sequentially instead of randomly as in GESTALT. DAISY uses an alternate approach employing Cas12a and dual acting inverted site arrays that enable higher entropy and ~66,000 potential barcodes in a smaller transgene package [103]. It should be noted that few of these systems have been examined with spatial transcriptomics. Conversely, MEMOIR allows for the reconstruction of clonal dynamics of cells and their progeny in situ; however, it does not allow for linking of a cell's position to its barcode [91]. With all these approaches, it will be invaluable to transition them to spatial sequencing approaches for both development and disease investigations.
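To make the logic of evolving barcodes concrete, the following is a minimal Python sketch, not the GESTALT implementation or any published pipeline. It assumes a toy array of ten target sites, a fixed per-division editing probability, and integer codes standing in for indel outcomes; the names `divide`, `grow`, and `shared_edits` and all parameter values are hypothetical. Edits are treated as irreversible and heritable, so edits shared between two cells are read as evidence of common ancestry. Large deletions that remove several sites at once, the pitfall noted above, are deliberately not modeled.

```python
import random

UNEDITED = 0  # assumption: 0 marks an intact target site, 1..n_outcomes are indel codes


def divide(barcode, rng, edit_prob=0.2, n_outcomes=50):
    """Return two daughter barcodes.

    Each daughter inherits every parental edit; each still-unedited site
    may independently acquire a random, irreversible edit.
    """
    def inherit():
        return tuple(
            site if site != UNEDITED or rng.random() >= edit_prob
            else rng.randint(1, n_outcomes)
            for site in barcode
        )
    return inherit(), inherit()


def grow(n_sites=10, generations=4, seed=0):
    """Expand one unedited founder cell for a fixed number of divisions."""
    rng = random.Random(seed)
    cells = [(UNEDITED,) * n_sites]
    for _ in range(generations):
        # Sister cells end up adjacent in the list.
        cells = [daughter for cell in cells for daughter in divide(cell, rng)]
    return cells


def shared_edits(a, b):
    """Count sites carrying the same edit in both cells."""
    return sum(1 for x, y in zip(a, b) if x != UNEDITED and x == y)


cells = grow()
sisters = shared_edits(cells[0], cells[1])
cousins = shared_edits(cells[0], cells[-1])
print(f"edits shared by sisters: {sisters}, by distant cousins: {cousins}")
```

Sisters usually share more edits than distantly related cells, but identical edits can also arise independently at the same site, which is one reason real reconstructions depend on many sites with diverse editing outcomes.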

Retrospective Methods and Computational Approaches
Retrospective lineage tracing uses inference to reconstruct a tree of lineage relationships between cells, with branch points typically signifying alternate cell fates at the same time, and uses the relationships of cells to one another to make inferences about the differentiation branching points. For example, BrdU labeling marks cells in a one-time fashion with continual dilution. Because the label is introduced only in the first generation of cells, as we move to more differentiated progenitor populations we can only make inferences about the relationship of cells based on the concentration of label in the population in question. If a more dynamic system is used (such as GESTALT), where labeling is introduced in a continual fashion, more inferences can be made about where branching events occurred [85]. Retrospective lineage tracing methods instead use the same paradigm as phylogenetic fate mapping, where reliable data on cell-to-cell relationships can be inferred from acquired somatic and epigenetic changes (Fig. 2d). While some methods are new, quite a few rely on repurposing older phylogenetic algorithms in the context of better-quality sequencing paradigms. These include analyses of somatic mutations [96][97][98][99], 5hmC DNA modifications [104], microsatellite methylation patterns as in RETrace [105], and microsatellite repeats [106]. Importantly, these approaches are invaluable for investigation of heterogeneous human populations and cells but are not optimized for backcrossed strains of mice. Conversely, it should be noted that they cannot provide the entropy and potential bits of data that optimized barcoding or evolving barcoding systems can in mouse or cell models.
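The phylogenetic reasoning described above can be illustrated with a small sketch. It is a generic average-linkage (UPGMA-style) clustering written for illustration, not the algorithm used in any of the cited studies. The cell names, the binary variant profiles (1 = somatic variant detected, 0 = not detected), and the helper names `hamming`, `average_distance`, and `build_tree` are all hypothetical, and the sketch ignores sequencing dropout and recurrent mutation.

```python
from itertools import combinations


def hamming(a, b):
    """Number of variant sites at which two cells differ."""
    return sum(x != y for x, y in zip(a, b))


def average_distance(members_a, members_b, profiles):
    """Mean pairwise Hamming distance between two groups of cells."""
    total = sum(
        hamming(profiles[m], profiles[n]) for m in members_a for n in members_b
    )
    return total / (len(members_a) * len(members_b))


def build_tree(profiles):
    """Repeatedly merge the two closest groups; return the tree as nested tuples."""
    clusters = {name: [name] for name in profiles}  # tree label -> member cells
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_distance(
                clusters[pair[0]], clusters[pair[1]], profiles
            ),
        )
        clusters[(a, b)] = clusters.pop(a) + clusters.pop(b)
    return next(iter(clusters))


# Hypothetical profiles: cells A and B share two variants, C and D share one.
profiles = {
    "cell_A": (1, 1, 0, 0, 0),
    "cell_B": (1, 1, 1, 0, 0),
    "cell_C": (0, 0, 0, 1, 1),
    "cell_D": (0, 0, 0, 1, 0),
}
tree = build_tree(profiles)
print(tree)  # (('cell_A', 'cell_B'), ('cell_C', 'cell_D'))
```

Cells that share variants are grouped first, so the nesting of the returned tuples is the inferred order of branching. With real data the number of informative variants per cell is small, which is the entropy limitation noted above.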
Improvements in computational algorithms have also afforded scientists dynamic modeling of idealized branching events based on standard scRNA-seq datasets [107][108][109][110]. Moreover, the biophysical equilibrium of RNA splicing has been used to enable RNA velocity [111] and downstream approaches such as scVelo [112] and CellRank [113], both of which aim to strike a balance between the gradual and stochastic natures of cell fate decision making based on the ratio of spliced and unspliced reads in RNA-seq data. Major caveats apply to all of the above methods, of course [110,114,115]. First, most computational retrospective methods rely solely on static snapshots of cellular states and afford only descriptive, not prognostic, information about cellular lineage dynamics. Second, computational modeling is only as good as the quality of its data: while algorithms can be modified in real time to increase precision and accuracy of branching moments, the modeling relies heavily on the assumption that sequenced data are not biased by sample quality or by which cell populations were collected. Indeed, while empirical in vivo lineage tracing often matches computational reconstructions, there are cases of transcriptomic convergence and divergence that can confound computational derivation of lineage relationships. This is best exemplified by recent mouse and human studies where computationally inferred lineages could be compared to "ground truth" experimental lineage tracing, resulting in discrepancies and/or "false negatives" being observed [8,95,116]. In recent years, brain development has been employed as a "ground truth" for a number of pediatric brain tumors, providing tantalizing clues as to the cell(s) of origin in medulloblastoma [117,118] and diffuse midline glioma [119].
However, as noted by the groups investigating medulloblastoma, care must be taken when comparing across species and relying on the resulting correlational data, as misleading conclusions can be inferred [117,118]. Specifically, because previous studies failed to incorporate human-specific features of the rhombic lip [120], correlational analyses improperly identified differentiated unipolar brush cells and glutamatergic cerebellar nuclei lineages as the likely origins of group 4 MB [121] rather than an earlier human-specific proliferative unipolar brush cell lineage [117,118]. Nevertheless, computational biology has made rapid progress, and algorithms and approaches continue to improve. These methods provide a powerful complement to "wet lab" investigations, often synergizing when combined.
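The spliced/unspliced reasoning behind RNA velocity mentioned earlier in this section can be sketched for a single gene as follows. This is a simplified illustration of the steady-state idea only, assuming the splicing rate is normalized to 1, so that at equilibrium unspliced abundance is approximately `gamma` times spliced abundance and any deviation is read as the gene being induced or repressed. The function names and the example values are hypothetical; scVelo and CellRank use more elaborate models, and real pipelines work on normalized, smoothed counts across many genes.

```python
def fit_gamma(unspliced, spliced):
    """Least-squares slope through the origin for one gene across cells.

    Assumes the supplied cells are close to steady state (unspliced ~ gamma * spliced).
    """
    denominator = sum(s * s for s in spliced)
    if denominator == 0:
        raise ValueError("spliced abundances are all zero")
    return sum(u * s for u, s in zip(unspliced, spliced)) / denominator


def velocity(unspliced, spliced, gamma):
    """Per-cell velocity: positive suggests induction, negative suggests repression."""
    return [u - gamma * s for u, s in zip(unspliced, spliced)]


# Hypothetical cells assumed to be at steady state for this gene.
gamma = fit_gamma(unspliced=[0.5, 1.0, 2.0], spliced=[1.0, 2.0, 4.0])

# Three further hypothetical cells with the same spliced level
# but different unspliced levels.
v = velocity(unspliced=[3.0, 1.0, 2.0], spliced=[4.0, 4.0, 4.0], gamma=gamma)
print(gamma, v)  # 0.5 [1.0, -1.0, 0.0]
```

Because the estimate comes from a single snapshot, it inherits the caveats discussed above: it describes the instantaneous direction of change and depends on which cells are assumed to be at steady state.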

Conclusion
Barcoding in the Context of Brain Tumor Lineages
Though scRNA-seq has added immensely in a few short years to our knowledge of neural development, much of the groundwork had been empirically determined over the course of decades, if not almost a century, of neuroanatomical investigation. Conversely, in terms of brain tumors, it can be argued that scRNA-seq gave the first major insights into the fundamental nature of heterogeneity that had been hinted at but not unambiguously described [122][123][124][125]. Thus, we must now bring empirical determination of these processes to the study of brain tumors, employing many of the approaches described in this review. Indeed, in a handful of cases, this has begun. Specifically, a pioneering use of barcodes described the clonal evolution of GBM cells in a xenograft model, demonstrating a stem cell hierarchy where slow-dividing populations give rise to faster-cycling progenitors and nondividing populations [126]. More recently, lentiviral barcoding was used to empirically determine the presence of common barcodes across the subpopulations of GBM, demonstrating the plasticity of these populations in terms of their ability to assume disparate cell states [122]. Still, these investigations likely represent the tip of a proverbial iceberg, and given the heterogeneous nature of brain tumors and their dismal response to therapy, lineage tracing has a lot of catching up to do with the massive scRNA-seq datasets being generated.