COMPOSITIONS AND METHODS CHARACTERIZING METASTASIS
20220218847 · 2022-07-14
Assignee
Inventors
Cpc classification
A01K2207/12
HUMAN NECESSITIES
C12N2740/16043
CHEMISTRY; METALLURGY
A61K49/0008
HUMAN NECESSITIES
International classification
Abstract
The invention features compositions and methods for determining the metastatic potential of cancer cell lines and tumors. Also provided is MetMap, a comprehensive database of the metastatic potential of cancer cell lines.
Claims
1. A method of characterizing the metastatic potential of a mixture of cancer cells in vivo, the method comprising (a) systemically delivering to a non-human subject the plurality of cancer cells, each cell comprising a vector encoding as a single transcript a barcode, a detectable marker suitable for in vivo imaging, and a detectable marker suitable for cell selection and/or sorting; and (b) imaging the cells and their descendants subsequent to delivery to locate where in the body the cell and/or its descendants are present, thereby characterizing the metastatic potential.
2. The method of claim 1, further comprising allowing the plurality of cells to proliferate in the subject for a period of time selected from the group consisting of days, weeks, and months.
3. The method of claim 2, further comprising isolating the cells from the subject and characterizing the identity of the cells and their abundance.
4. The method of claim 3, further comprising sorting the isolated cells.
5. The method of claim 2, wherein the identity and quantity of the cells or the sorted cells is assessed by next-generation sequencing or quantitative PCR.
6. The method of claim 1, wherein single cell RNA sequencing is carried out on each cell, thereby generating a transcriptome for each cell.
7-9. (canceled)
10. The method of claim 1, wherein the marker suitable for imaging is a bioluminescent marker.
11. (canceled)
12. The method of claim 1, wherein the expression levels of the barcode, the detectable marker suitable for in vivo imaging, and the detectable marker suitable for cell selection and/or sorting are correlated.
13. The method of claim 12, wherein the abundance of the barcodes reflects the metastatic potentials of different cells.
14. (canceled)
15. A method of characterizing the metastatic potential of a mixture of cancer cells in vivo, the method comprising (a) systemically delivering to a non-human subject the plurality of cancer cells, each cell comprising a vector encoding a barcode; and (b) subsequent to delivery detecting the barcode in a cell, tissue, or organ to determine where in the body the cell and/or its descendants are present, thereby characterizing the metastatic potential.
16-21. (canceled)
22. The method of claim 15, wherein single cell RNA sequencing is carried out on each cell, thereby generating a transcriptome for each cell.
23-27. (canceled)
28. A method of generating a metastasis map, the method comprising (a) systemically delivering to a non-human subject a plurality of cells, each cell comprising a vector encoding as a single transcript a barcode, a detectable marker suitable for in vivo imaging, and a detectable marker suitable for cell selection and/or sorting; and (b) detecting the cells and their descendants subsequent to delivery to identify where in the body the cell and/or its descendants are present; (c) compiling the data of step (b) in a database; and (d) associating the data with the cell's identity, thereby generating a metastasis map.
29-31. (canceled)
32. The method of claim 28, wherein single cell RNA sequencing is carried out on each cell, thereby generating a transcriptome for each cell and the data included in the metastasis map.
33-36. (canceled)
37. The method of claim 28, wherein the data is used to generate a metastasis map that includes a visual representation of the anatomical position of the cells and their proliferation over time.
38. The method of claim 28, wherein drug sensitivity data, CRISPR knockout viability data, shRNA knockdown data, annotated cell line data, a metabolite profile, a genomic profile, a transcriptomic profile, or a proteomic profile is included as an interactive feature within the visual representation.
39. A method of generating a metastasis map, the method comprising (a) systemically delivering to a non-human subject a plurality of cells, each cell comprising a vector encoding a barcode; (b) detecting and quantitating expression of the barcode; (c) compiling the expression data in a database; and (d) associating the expression data with the cell's identity, thereby generating a metastasis map.
40-49. (canceled)
50. The method of claim 39, wherein drug sensitivity data, CRISPR knockout viability data, shRNA knockdown data, annotated cell line data, a metabolite profile, a genomic profile, a transcriptomic profile, or a proteomic profile is included in the metastasis map.
51. (canceled)
52. A vector comprising a single transcription cassette comprising a detectable marker suitable for cell selection and/or sorting, a marker suitable for imaging a cell in vivo, and a barcode.
53-57. (canceled)
58. A method for identifying the molecular features characteristic of a metastatic cell, the method comprising using the metastasis map generated using the method of claim 1 to identify organ-specific patterns of metastasis.
59-60. (canceled)
61. A computer implemented method of generating a metastasis map quantifying metastatic potential, the method comprising: receiving, by a processor, a listing of vectors encoded as barcodes, the vectors being associated with a plurality of cells systemically delivered to a non-human subject; receiving, from an imaging device, images of the plurality of cells and their descendants within the non-human subject; storing, by the processor, the images of the plurality of cells and their descendants in a database; identifying, by the processor, locations of the plurality of cells and their descendants from the images using the barcodes; and generating, by the processor, the metastasis map based on the locations of the plurality of cells and their descendants.
62-72. (canceled)
73. A system for generating a metastasis map quantifying metastatic potential, the system comprising: a CPU, a computer readable memory and a computer readable storage medium; program instructions to receive a listing of vectors encoded as barcodes, the vectors being associated with a plurality of cells systemically delivered to a non-human subject; program instructions to receive images of the plurality of cells and their descendants within the non-human subject from an imaging device; program instructions to store the images of the plurality of cells and their descendants in a database; program instructions to identify locations of the plurality of cells and their descendants from the images using the barcodes; and program instructions to generate the metastasis map based on the locations of the plurality of cells and their descendants.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0164] SCAP (FDR<1e-79, highlighted in bold)
DETAILED DESCRIPTION OF THE INVENTION
[0197] The invention features compositions and methods that are useful for determining the metastatic potential of cancer cell lines, as well as an interactive metastasis map featuring information that defines such cancer cell lines (e.g., their propensity to metastasize, organs where metastasis is typically observed, sequence data, genomic data, transcriptomic data, proteomic data, metabolomic data, drug sensitivity data, CRISPR knockout viability data, shRNA knockdown data, and annotated data relating to the cell of origin).
[0198] The invention is based, at least in part, on the discovery that a cancer cell's metastatic potential can be ascertained by systemically delivering the cell, in a modified form to allow detection, to a non-human subject. Accordingly, the invention provides compositions and methods for determining the metastatic potential of a plurality of cancer cell lines in vivo. These methods and compositions have been used to generate a map of the metastatic properties of individual cell lines, and this Metastasis Map (or MetMap) represents a novel and important tool for the study of metastatic cancer.
Nucleic Acid Constructs
[0199] Methods and compositions are provided herein for tracking cancer cells administered to a non-human subject in vivo. Compositions of the present invention can be used to modify cancer cells prior to administration to the subject so that the cells express identifying markers. Thus, one aspect of the present disclosure provides a nucleic acid construct comprising a barcode, a first detectable marker, and a second detectable marker. The first detectable marker allows in vivo imaging of the cells after administration to a non-human subject. In some embodiments, the first detectable marker is a bioluminescent marker, such as a luciferase. Luciferases, unlike fluorescent proteins, do not require an external light source to generate a signal, which makes this family of bioluminescent markers suitable for in vivo imaging.
[0200] The second detectable marker allows for cell selection, sorting, or both. Markers suitable for cell selection and/or sorting include, but are not limited to, fluorescent proteins. In some embodiments, the second marker is a green, red, blue, or yellow fluorescent protein (GFP, RFP, BFP, or YFP, respectively). In some embodiments, the second marker is mCherry. In some embodiments, the second detectable marker comprises an epitope to which an antibody specifically binds. In some embodiments, the antibody that specifically binds to the epitope is labeled.
[0201] In some embodiments of the present invention, the nucleic acid construct encodes a barcode but no detectable markers. In some embodiments, other selectable markers (e.g., antibiotic resistance genes) are encoded in the nucleic acid construct to enable efficient selection of transformed or transduced cells. In some embodiments, a surface protein on the cancer cell can be used to isolate or detect the cancer cell. In some embodiments, the surface protein comprises an epitope to which an antibody can specifically bind and mediate isolation of the cancer cell. In some embodiments, the antibody is labeled. In some embodiments, the label is a fluorescent or other visually detectable label.
[0202] The barcode is between 10 and 30 nucleotides in length. For example, the barcode contemplated herein may comprise 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 nucleotides. The barcodes are designed to reduce or eliminate nonspecific binding to the cancer cell's nucleic acid molecules (i.e., genomic DNA, RNA, etc.). In some embodiments, the barcode comprises a nucleic acid sequence that is not substantially complementary to any endogenous nucleic acid sequence present in the cancer cell. In some embodiments, the barcode is designed to diverge from perfect complementarity with an endogenous nucleic acid sequence present in the cancer cell by 2, 3, or 4 or more nucleotides. In some embodiments, the barcode is designed so that the most complementary sequences in an endogenous nucleic acid molecule present in the cancer cell have a conformation that disfavors barcode binding to the endogenous nucleic acid molecule.
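The length and divergence criteria of paragraph [0202] can be illustrated with a minimal Python sketch. It is not a design tool disclosed in the specification. The function names, the default threshold of 2 mismatches, and the ungapped sliding-window comparison are assumptions made only for illustration, and conformational effects are not modeled.

```python
from typing import Iterable

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def min_mismatches(barcode: str, endogenous: str) -> int:
    """Fewest mismatches between the barcode's reverse complement and any
    same-length window of an endogenous sequence (ungapped comparison)."""
    target = reverse_complement(barcode)
    k = len(target)
    endogenous = endogenous.upper()
    if len(endogenous) < k:
        return k  # no full-length window exists to bind
    return min(
        sum(a != b for a, b in zip(target, endogenous[i:i + k]))
        for i in range(len(endogenous) - k + 1)
    )


def barcode_is_acceptable(barcode: str,
                          endogenous_seqs: Iterable[str],
                          min_divergence: int = 2) -> bool:
    """Hypothetical filter: 10-30 nt long, and at least `min_divergence`
    mismatches from perfect complementarity with every endogenous sequence."""
    if not 10 <= len(barcode) <= 30:
        return False
    return all(min_mismatches(barcode, s) >= min_divergence
               for s in endogenous_seqs)
```

In practice, a screen of this kind would presumably be run against a reference genome or transcriptome with an alignment tool rather than a brute-force window scan. The sketch only makes the acceptance criterion concrete.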
[0203] In some embodiments, the nucleic acid construct encoding the barcode and markers is a single expression cassette. Thus, the expression of each encoded element is correlated with the expression of the other elements. In some embodiments, the nucleic acid construct is a vector (e.g., recombinant plasmids). The term “recombinant vector” includes a vector (e.g., plasmid, phage, phasmid, virus, cosmid, fosmid, or other purified nucleic acid vector) that has been altered, modified or engineered such that it contains greater, fewer or different nucleic acid sequences than those included in the native or natural nucleic acid molecule from which the recombinant vector was derived. For example, a recombinant vector may include a nucleotide sequence encoding a polypeptide (i.e., the markers) and/or a polynucleotide (i.e., the barcode), or fragment thereof, operatively linked to regulatory sequences such as promoter sequences, terminator sequences, long terminal repeats, untranslated regions, and the like, as defined herein. Recombinant expression vectors allow for expression of the genes or nucleic acids included in them.
[0204] In some embodiments of the present disclosure, one or more nucleic acid constructs having a nucleotide sequence encoding one or more of the polypeptides or polynucleotides described herein are operatively linked to one or more regulatory sequences that can integrate the nucleic acid construct into a cancer cell genome. In some embodiments, cancer cells are stably transfected or transduced by the introduced nucleic acid construct. Modified cells can be selected, for example, by detecting the first or second marker. In some embodiments, the barcode and at least one of the marker genes are encoded in different nucleic acid constructs and are introduced into the same cell by co-transfection or co-transduction. Any additional elements needed for optimal synthesis of polynucleotides or polypeptides described herein would be apparent to one of ordinary skill in the art.
[0205] In some embodiments, the nucleic acid construct comprises at least one adapter nucleic acid sequence that has a sequence complementary to that of a nucleic acid molecule used in a downstream sequencing reaction. For example, the adapters used in some embodiments are designed to be compatible with next-generation sequencing including, but not limited to, Ion Torrent and MiSeq platforms. In some embodiments, the length of the adapter is between 8 and 20 nucleotides. In some embodiments, the length of the adapter is 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides. The adapter's sequence is designed to reduce or eliminate nonspecific binding of the adapter to an endogenous nucleic acid molecule. In some embodiments, the adapter is designed to have a sequence that is not substantially complementary to any nucleic acid sequence present in an endogenous nucleic acid molecule. In some embodiments, the adapter is designed to diverge from perfect complementarity with the endogenous nucleic acid molecule by 2, 3, or 4 or more nucleotides.
Methods for Characterizing Metastatic Potential
[0206] One aspect of the present disclosure provides a method for characterizing the metastatic potential of a mixture of cancer cell lines in vivo. In one embodiment, the method comprises modifying the cells to comprise a nucleic acid construct encoding a barcode, a first detectable marker, and a second detectable marker, such as the constructs described above. Each distinct cell line in the mixture of cell lines will be modified to express a unique barcode, and each barcode will only be used with a single cell line. The modified cells are systemically administered to a non-human subject and allowed to propagate in the non-human subject. After a period of time, the non-human subject is imaged to detect at least one of the markers encoded in the nucleic acid construct, which allows the location of the cells in the body of the non-human subject to be determined.
[0207] The non-human subject can be any non-human mammal. In some embodiments, the non-human mammal is a mouse, rat, rabbit, pig, goat, or other domesticated mammal. In some embodiments, the non-human animal is immunocompromised. In some embodiments, the non-human subject is an immunocompromised mouse, such as a NOD scid gamma (NSG) mouse.
[0208] Methods of introducing exogenous nucleic acid molecules into a cell are known in the art. For example, eukaryotic cells can take up nucleic acid molecules from the environment via transfection (e.g., calcium phosphate-mediated transfection). Transfection does not employ a virus or viral vector for introducing the exogenous nucleic acid into the recipient cell. Stable transfection of a eukaryotic cell comprises integration into the recipient cell's genome of the transfected nucleic acid, which can then be inherited by the recipient cell's progeny.
[0209] Eukaryotic cells (e.g., human cancer cells) can be modified via transduction, in which a virus or viral vector stably introduces an exogenous nucleic acid molecule to the recipient cell. Eukaryotic transduction delivery systems are known in the art. Transduction of most cell types can be accomplished with retroviral, lentiviral, adenoviral, adeno-associated, and avian virus systems, and such systems are well-known in the art. In some embodiments of the present disclosure, the viral vector system is a lentiviral system.
[0210] In some embodiments, the viral vectors are assembled or packaged in a packaging cell prior to contacting the intended recipient cell. In some embodiments, the vector system is a self-inactivating system, wherein the viral vector is assembled in a packaging cell, but after contacting the recipient cell, the viral vector is not able to be produced in the recipient cell. In some embodiments, the first detectable marker allows in vivo imaging of the cells after delivery to a non-human subject. In some embodiments, the first detectable marker is a bioluminescent marker, such as a luciferase. Luciferases, unlike fluorescent proteins, do not require an external light source to generate a signal, which makes this family of bioluminescent markers suitable for in vivo imaging. In some embodiments, luciferin or an analogous substrate is administered to the non-human subject, which is acted upon by the luciferase to generate bioluminescence. In some embodiments, in vivo imaging comprises bioluminescence imaging. Many imaging methodologies are known in the art that can be utilized in the methods presented herein. Examples of such methodologies include, but are not limited to, those disclosed in U.S. Publication Nos. 20180160099, 20170220733, 20170212986, 20170038574, 20160370295, 20160202185, 20140333750, 20140326922, 20140063194, and 20140038201, the contents of each are incorporated herein by reference in their entirety.
[0211] The second detectable marker is used to isolate and/or sort modified cancer cells from other cells. A technique for isolating or sorting cancer cells comprising a nucleic acid construct as described herein is flow cytometry.
[0212] In fluorescence activated cell sorting (FACS), a fluorescent marker is used to distinguish modified from unmodified cells. In some embodiments, the second marker is a fluorescent polypeptide suitable for cell sorting. In some embodiments, the second marker is a polypeptide having an epitope that is specifically bound by a fluorescently labelled antibody. A gating strategy appropriate for the cells expressing the marker (or otherwise labeled) is used to segregate the cells. For example, modified cancer cells expressing a fluorescent protein (e.g., GFP or mCherry) can be separated from other cells in a sample by using a corresponding gating strategy. In one embodiment, a GFP gating strategy is employed. In some embodiments, an mCherry gating strategy is used. Other methods of isolating cells are known in the art and may be used to segregate modified cancer cells from non-modified cells and from cells derived from a non-human subject.
[0213] To determine the cell line from which a particular modified cancer cell is derived, the barcode within the modified cell is sequenced. Sequencing of the barcodes within the modified cancer cells is accomplished using a next-generation sequencing platform such as IonTorrent or MiSeq, but other platforms are contemplated herein. Additionally, single cell analysis (e.g., single cell RNA sequencing (RNA-seq)) can be used to determine barcode sequences and identify the cell lines from which the modified cancer cells present at a location or in a sample are derived. RNA-seq may also be used to generate transcriptome data for the modified cancer cells.
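By way of illustration only, assigning sequencing reads to cell lines by barcode can be sketched as follows. The function name, the exact-substring matching, and the example cell line labels are hypothetical. An actual pipeline would likely locate the barcode relative to the flanking adapter sequences and tolerate sequencing errors, neither of which is attempted here.

```python
from collections import Counter
from typing import Dict, Iterable


def count_cell_lines(reads: Iterable[str],
                     barcode_to_line: Dict[str, str]) -> Counter:
    """Count reads per cell line by exact barcode match.

    Each read is credited to the first cell line whose barcode occurs in it.
    Reads containing no known barcode are ignored.
    """
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        for barcode, line in barcode_to_line.items():
            if barcode in read:
                counts[line] += 1
                break
    return counts


# Hypothetical usage with two made-up barcodes and four reads.
example_barcodes = {"ACGTACGTAC": "LINE_A", "TTGCATTGCA": "LINE_B"}
example_reads = ["TTACGTACGTACGG", "GGTTGCATTGCAAC", "ACGTACGTACAA", "NNNN"]
example_counts = count_cell_lines(example_reads, example_barcodes)
```

Because each barcode is used with a single cell line, the resulting per-line read counts serve as a proxy for the relative abundance of each line in the sample.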
[0214] The abundance of modified cancer cells present in a metastatic lesion is indicative of the metastatic potential of the cell lines from which the cells are derived. In some embodiments, the abundance of modified cancer cells is determined during cell isolation and/or cell sorting. In some embodiments, the modified cells are quantitated during next-generation sequencing or RNA-seq. Other methods of quantitating cells in a sample or tissue are known in the art.
Generating Metastasis Maps
[0215] Another aspect of the present disclosure provides methods for generating a metastasis map of cancer cell lines. These methods include systemically delivering a mixture of cells derived from cancer lines to a non-human animal, wherein the cells are modified to comprise a vector encoding a barcode or a vector encoding a barcode and at least one marker as described above. The method for generating the map further involves detecting and quantitating the expression of the barcode, and these steps are also described above. The data derived from quantitating the expression of the barcode is then compiled in a database and associated with the cell's identity (i.e., identifying the cell line from which the cell derived).
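One possible way to compile barcode quantitation data in a database and associate it with cell line identity is sketched below using Python's built-in sqlite3 module. The schema, table names, and column names are assumptions chosen for illustration. The specification does not prescribe a particular database layout.

```python
import sqlite3
from typing import Dict, Iterable, List, Tuple


def build_metmap_db(barcode_to_line: Dict[str, str],
                    observations: Iterable[Tuple[str, str, str, int]]
                    ) -> sqlite3.Connection:
    """Create an in-memory database linking barcode counts to cell lines.

    `observations` holds (barcode, animal, organ, n_reads) tuples.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cell_line (barcode TEXT PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE barcode_count ("
        " barcode TEXT NOT NULL REFERENCES cell_line(barcode),"
        " animal TEXT NOT NULL, organ TEXT NOT NULL, n_reads INTEGER NOT NULL)")
    conn.executemany("INSERT INTO cell_line VALUES (?, ?)",
                     list(barcode_to_line.items()))
    conn.executemany("INSERT INTO barcode_count VALUES (?, ?, ?, ?)",
                     list(observations))
    conn.commit()
    return conn


def reads_by_line_and_organ(conn: sqlite3.Connection
                            ) -> List[Tuple[str, str, int]]:
    """Sum barcode reads per (cell line, organ), resolving barcodes to names."""
    rows = conn.execute(
        "SELECT c.name, b.organ, SUM(b.n_reads)"
        " FROM barcode_count b JOIN cell_line c ON c.barcode = b.barcode"
        " GROUP BY c.name, b.organ ORDER BY c.name, b.organ")
    return rows.fetchall()
```

Keeping the barcode-to-cell-line mapping in its own table means that additional annotations (for example, lineage or genomic profiles) could be joined to the counts later without changing how the counts are stored.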
[0216] The metastasis map may also include a genomic, transcriptomic, or proteomic profile of the cell line. In some embodiments, the metastasis map also includes drug sensitivity data, CRISPR knockout viability data, shRNA knockdown data, annotated cell line data, and/or a metabolite profile of the cell line. The data that constitutes the profiles may be generated de novo using methods known in the art.
[0217] The practice of the present invention employs, unless otherwise indicated, conventional techniques of molecular biology (including recombinant techniques), microbiology, cell biology, biochemistry and immunology, which are well within the purview of the skilled artisan. Such techniques are explained fully in the literature, such as “Molecular Cloning: A Laboratory Manual”, second edition (Sambrook, 1989); “Oligonucleotide Synthesis” (Gait, 1984); “Animal Cell Culture” (Freshney, 1987); “Methods in Enzymology”; “Handbook of Experimental Immunology” (Weir, 1996); “Gene Transfer Vectors for Mammalian Cells” (Miller and Calos, 1987); “Current Protocols in Molecular Biology” (Ausubel, 1987); “PCR: The Polymerase Chain Reaction” (Mullis, 1994); “Current Protocols in Immunology” (Coligan, 1991). These techniques are applicable to the production of the polynucleotides and polypeptides of the invention, and, as such, may be considered in making and practicing the invention. Particularly useful techniques for particular embodiments will be discussed in the sections that follow.
[0218] The following examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how to make and use the assay, screening, and therapeutic methods of the invention, and are not intended to limit the scope of what the inventors regard as their invention.
EXAMPLES
Example 1
Assessing the Feasibility and Reliability of In Vivo Barcoding to Monitor Metastasis
[0219] Methods of monitoring metastasis are needed to better understand similarities and differences between different types of cancer. To test the feasibility and reliability of in vivo barcoding to monitor metastasis, a pilot study of four breast cell lines was performed (
[0220] The transcribing barcode design allowed co-capturing of cancer barcodes and cancer transcriptomes of metastases from bulk RNA-Seq analysis, and a workflow was developed that analyzed both (
[0221] Eight barcoded cell lines (the four cell lines modified to express either GFP or mCherry) were injected as a pool into the left ventricle of recipient mice. Bioluminescence imaging (BLI) revealed metastatic lesions throughout the body (
[0222] The results observed for barcodes quantitated by bulk RNA-Seq were validated by two methods: quantitative RT-PCR and single cell RNA sequencing (
Example 2
Characterizing Metastatic Behavior of Basal-Like Breast Cancer Cell Lines
[0223] Having validated the method for in vivo barcoding to monitor metastasis, a larger subset of breast cancer cells was evaluated for metastatic behaviors. Principal component analysis (PCA) of expression profiles stratified the breast cancer cell lines in the Cancer Cell Line Encyclopedia (CCLE) collection into 3 categories: (1) expression initiated with HS (termed HS cells), displaying fibroblast morphology and characteristics, (2) enriched in luminal subtype, and (3) enriched in basal subtype (
[0224] Cell lines were individually barcoded, pooled at equal numbers, and injected into mice (
[0225] To quantify the cell line metastatic potentials on an absolute scale, the cell count for each cell line in different organs was inferred based on the total number of isolated cancer cells and their compositions as measured by barcode abundance. This metric was then used to compare cell lines across the three pools analyzed (pilot, group 1, and group 2) (
TABLE-US-00001 TABLE 1 Cell Line Comparison
Cell line          CI.05        CI.95       mean        penetrance
BT20_BREAST        0.041739182  1.14120279  0.80999195  0.6
BT549_BREAST       0            0           0           0
CAL851_BREAST1     0            0           0           0
DU4475_BREAST      0            2.94871853  2.47257363  0.125
HCC1143_BREAST     0            0           0           0
HCC1187_BREAST     1.514757066  3.32027789  2.88141499  1
HCC1395_BREAST     0            0.48729684  0.22798267  0.2
HCC1569_BREAST     0            0           0           0
HCC1599_BREAST     1.043079445  2.32452193  2.00075199  0.6
HCC1806_BREAST     0.977933067  3.57631234  3.1808527   0.5
HCC1937_BREAST     0            0           0           0
HCC1954_BREAST     0.398355305  0.74985535  0.60319272  1
HCC38_BREAST       0            0           0           0
HCC70_BREAST       0            1.42579855  1.00116042  0.4
HDQP1_BREAST       0            0           0           0
HMC18_BREAST       0.935278877  2.97359658  2.5970876   0.375
JIMT1_BREAST       3.14528796   3.59363725  3.41008012  0.875
MDAMB157_BREAST    0            0           0           0
MDAMB231_BREAST    2.919645172  3.53553331  3.33308226  1
MDAMB436_BREAST    0            0           0           0
MDAMB468_BREAST    0.377576368  1.25085467  0.97301991  0.375
*Data are presented on a log10 scale
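The inference of per-line cell counts described in paragraph [0225] can be sketched as below. This is a minimal illustration under stated assumptions, not the exact computation used to produce Table 1. It assumes that barcode read fractions are proportional to cell fractions, that a pseudocount of 1 is added before the log10 transform so that absent lines map to 0, and that penetrance means the fraction of animals in which a line is detected. All function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def infer_cell_counts(total_cancer_cells: float,
                      barcode_reads: Dict[str, int]) -> Dict[str, float]:
    """Split the total isolated cancer cell count among cell lines in
    proportion to their barcode read counts."""
    total_reads = sum(barcode_reads.values())
    if total_reads == 0:
        return {line: 0.0 for line in barcode_reads}
    return {line: total_cancer_cells * n / total_reads
            for line, n in barcode_reads.items()}


def log10_scale(cell_count: float, pseudocount: float = 1.0) -> float:
    """log10 transform with an assumed pseudocount, so zero cells maps to 0."""
    return math.log10(cell_count + pseudocount)


def penetrance(per_animal_counts: Sequence[float],
               detection_threshold: float = 0.0) -> float:
    """Assumed definition: fraction of animals with a count above threshold."""
    if not per_animal_counts:
        return 0.0
    detected = sum(c > detection_threshold for c in per_animal_counts)
    return detected / len(per_animal_counts)
```

For example, if 1,000 cancer cells were isolated from an organ and a line accounted for 30 of 40 barcode reads, the sketch would attribute 750 cells to that line. A line detected in 3 of 5 animals would have a penetrance of 0.6 under the assumed definition.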
[0226] The analysis characterized some cell lines as pan-metastatic. For example, four cell lines, MDAMB231, HCC1187, JIMT1, and HCC1806 displayed pan-metastatic behaviors. Some showed a propensity for liver, lung, bone, or brain, and others were not metastatic (
Example 3
High-Throughput Characterization of Metastatic Potential
[0227] Having demonstrated feasibility, the metastatic potential was mapped for 500 cancer cell lines spanning 21 cancer types to generate a pan-cancer Metastasis Map (MetMap). To facilitate high throughput profiling, cell lines were used that had been barcoded for use in the PRISM method, which was previously developed for in vitro testing of drug sensitivities (Yu et al., Nat. Biotechnol. 34: 419-23 (2016), the contents of which are hereby incorporated by reference in their entirety).
[0228] PRISM lines were pooled based on their in vitro doubling speed across mixed lineages, with 25 cell lines per pool. Because PRISM barcoded cells did not express GFP or luciferase, introducing labeling markers for cancer cell purification was analyzed to determine if it was critical for the method. One PRISM pool (of 25 cell lines) that contained the JIMT1 cell line was transformed with a GFP-luciferase vector, and cells were sorted by GFP expression (
[0229] The GFP-labeled and unlabeled cell pools were subjected to the same animal workflow, tissue dissociation, and mouse cell depletion. The GFP-labeled group was further sorted to purify cancer cells. Isolated GFP-labeled cancer cells or tissue lysates from the unlabeled cell lines were subjected to barcode amplification and sequencing. A comparison of the two experiments showed highly concordant results. Although the initial barcode distribution of the pre-injected pools had altered (
[0230] MetMap (
[0231] The resulting metastasis map (MetMap) is the largest ever generated (
[0232] It was also noted that the intracardiac injection approach allowed for the evaluation of far more cell lines in vivo compared to traditional subcutaneous (subQ) injection (
[0233] To assess whether MetMap reflects the metastatic behavior of various cancers, the metastatic potential was compared with clinical annotations of the cell lines. Significant associations were found with (1) cancer lineage, (2) the site from which the cell line was derived, and (3) patient age, but not with gender or ethnicity (
[0234] Cell lines derived from metastases showed higher metastatic potential than lines derived from primary tumors. Interestingly, multiple cell lines derived from primary tumors known to give rise to metastases in patients were metastatic as xenografts (
[0235] The association with patient age was unexpected: a gradual decline of metastatic potential was observed as the age of the cancer patient increased (
[0236] Perhaps most importantly, extensive variation in metastatic potential was observed within individual lineages, thereby making it possible to search for associations between metastasis propensity and genomic features of the tumors. Of note, metastatic potential was not simply explained by cell line proliferation rate or mutational burden (
Example 4
DNA Mutations Associated with Brain Metastasis
[0237] To investigate mechanisms involved in metastasis, efforts were focused on breast cancer and its potential for brain metastasis (see
[0238] Genomic data available for each of the cell lines was used to search for evidence of DNA-level mutations associated with brain metastasis. At the level of single nucleotide variant (SNV) mutations, Phosphatidylinositol-4,5-Bisphosphate 3-Kinase (PIK3CA) mutations were significantly associated with metastasis. Four of 7 metastatic lines harbored a PIK3CA mutation, compared to 0 of 14 non- or weakly-metastatic lines (p=2.3e-06, FDR=0.01,
[0239] Unbiased analysis of the DNA copy number landscape similarly pointed to an association with lipid metabolism. An association was observed between metastatic potential and deletions of chromosome 8p12-8p21.2 (p=7.3e-06, FDR=0.0017,
Example 5
Clinical Relevance of MetMap Data
[0240] To ascertain the clinical relevance of these associations, clinical tumor datasets of breast cancer were analyzed, among which EMC-MSK contains organ-specific metastasis relapse information for each patient (
[0241] To assess PI3K activity in these clinical cohorts, we utilized two PI3K-response signatures, one generated with PIK3CA mutant overexpression, and the other with PI3K-inhibitor treatment. Although the gene identities overlapped little between the two signatures, strong co-regulated patterns were observed in patient tumors (
[0242] Consistent with associations at the genetic level, expression analysis similarly showed an enrichment of a PI3K activation signature in the brain metastatic cell lines (
Example 6
Lipid Biosynthesis Associated with Brain Metastasis in Transcriptome Analysis
[0243] Transcriptomes of the breast cancer cell lines were analyzed to detect associations with brain metastasis. For this analysis, gene expression profiles of cell lines growing in vitro were compared to their profiles in in vivo metastatic lesions (see
[0244] RNA-Seq was used to characterize the transcriptomes, and this protocol captured cancer cell compositions and averaged in vivo transcriptomes of metastases from cell line pools in the breast cancer cohort study. To understand what the metastasis transcriptomes encoded, differential expression analysis was performed comparing the in vivo transcriptomes to those of cells in vitro. To properly account for the different cell line compositions in each metastasis, a composite in vitro transcriptome was modeled using the barcode composition and single cell line in vitro transcriptomes and then compared to the in vivo results (
[0245] In the pilot group experiments, MDAMB231 dominated lung, liver, kidney, and bone metastases in most samples (
[0246] Having confirmed the validity of these profiles, pathway enrichment analysis was performed to query consensus programs that the differential genes encode at the 5 metastasis sites (
Example 7
Metabolite Profiles Indicate a Role for Lipid Synthesis in Metastasis
[0247] To determine if a metabolite profile paralleled the gene expression profiles associated with brain metastatic potential, the abundance of 226 metabolites was analyzed across the breast cancer cell lines (Barretina et al.). As predicted from mRNA profiling, upregulation of cholesterol species in highly brain metastatic cells was observed (
[0248] In contrast, global downregulation was observed for triglycerides (triacylglycerols, TAGs) in brain metastatic cells (
Example 8
SREBF1-Mediated Lipid Metabolism is Associated with Brain Metastasis
[0249] To further investigate the functional significance of a lipid metabolic profile of cells with brain metastatic potential, genome-wide CRISPR/Cas9 viability screening data was analyzed to identify vulnerabilities associated with the brain-metastatic state (Meyers et al., Nat. Genet., 49: 1779-84 (2017), the contents of which are incorporated herein by reference in their entirety). Remarkably, SREBF1 was the top correlated dependency (i.e., cancer cells rely on SREBF1 to switch to a brain-metastatic state in vitro) for brain metastasis (p=5.9e-8, FDR=0.001,
[0250] SREBF1 is a pivotal transcription factor that mediates lipid synthesis downstream of the PI3K pathway. To understand whether SREBF1 confers the lipid state observed in brain metastatic cells, lipidomics was performed after knocking out SREBF1 in the brain metastatic cell lines JIMT1 (PIK3CA-mut) and HCC1806 (8p-loss). SREBF1 knock-out (KO) resulted in a dramatic shift in intracellular lipid content (
Example 9
Gene Perturbation Shows Significance of SREBF1 and the Lipid Metabolism Pathway Genes in Mediating Brain Metastasis Outgrowth
[0251] Given the repeated observation of lipid metabolism being associated with brain metastatic potential, the functional impact of perturbing the pathway on brain metastasis formation was assessed. Toward this goal, a pooled in vivo CRISPR screen of 29 gene candidates in brain metastatic growth was performed using the JIMT1 model (
[0252] To assess how it compared to systemic metastasis, an intracardiac injection assay was performed, focusing on SREBF1. The most dramatic phenotype was that of brain metastasis, where SREBF1-KO cells showed a 196-fold reduction in brain metastasis compared to WT controls (
Example 10
HCC1806 Cells Resort to a Lipid Transporter and Binding Protein upon SREBF1 Deficiency to Grow as Brain Metastases
[0253] To determine the generality of the SREBF1 requirement for breast cancer growth in the brain, it was knocked out in additional brain metastatic lines including HCC1954, MDAMB231 and HCC1806. As with JIMT1, a significant inhibition in brain metastatic growth was also observed in these lines, although the magnitude and duration of growth inhibition varied (
[0254] This restoration of growth was not explained by escape from genome-editing, as brain metastases at the end of the experiment had evidence of editing at the SREBF1 locus (
[0255] The present disclosure describes MetMap as a new large-scale in vivo characterization of human cancer cell lines that adds a missing dimension to in vitro studies. The MetMap resource currently has metastasis profiles of 125 cell lines spanning 22 tumor types—over an order of magnitude more than was previously available. Ideally, all available cancer cell lines would be characterized for their metastatic potential, thus creating an even larger repertoire of models for exploration of metastasis mechanisms. A limitation of the use of human cell lines for such experiments is that they require the use of immunodeficient mice for in vivo characterization, and the extent to which the immune system plays an important role in mediating organ-specific patterns of metastasis remains to be determined (Topalian et al., Cell 161: 185-86 (2015), the contents of which are incorporated herein by reference in their entirety).
[0256] Multiple lines of experimental and clinical evidence pointed to the role of lipid metabolism in governing the ability of cells to survive in the brain microenvironment. The importance of lipid metabolism in cancer has been recently highlighted by a number of studies (Pascual et al., Nature 541: 41-45 (2017); Zhang et al., Cancer Discov. 8: 1006-25 (2018); Nieman et al., Nat. Med. 17: 1498-1503 (2011), the contents of each are incorporated herein by reference in their entirety), but its role in brain metastasis has not been previously recognized. Particularly intriguing is the notion that interfering with lipid or cholesterol metabolism might abrogate metastatic growth in the brain. The development of brain-penetrant inhibitors of this pathway would allow for this hypothesis to be tested pharmacologically. More generally, this disclosure highlights the complex interplay between cancer cell survival and metabolic states that can vary widely from organ to organ. Exploiting such tumor microenvironmental differences may prove useful as a therapeutic strategy to combat cancer.
[0257] The results reported herein above were obtained using the following methods and materials.
Breast Cancer Cell Lines and Barcoding
[0258] All breast cell lines were obtained from CCLE and cultured under the recommended conditions. Cell line identities were confirmed by SNP fingerprinting as well as RNA-Seq, in comparison to the CCLE results (portals.broadinstitute.org/ccle). The Fluorescence-Luciferase-Barcode (FLB) construct was engineered using the FUW lentiviral vector backbone (a gift from David Baltimore, Addgene plasmid # 14882). Barcodes 26 nucleotides long were designed using barcode_generator.py (ver 2.8, comailab.genomecenter.ucdavis.edu/index.php/), and cloned into the landing pad c-terminal to the TGA stop codon of Fluorescence-Luciferase using Gibson assembly (New England Biolabs). Lentivirus preparation and cell infection were performed according to published protocols available at http://www.broadinstitute.org/rnai. Infected cells were subjected to FACS with a fixed gate for GFP or mCherry, using a Sony SH4800 sorter.
Animal Studies
[0259] Animal work was performed in accordance with a protocol approved by the Broad Institute Institutional Animal Care and Use Committee (IACUC). Female NOD scid gamma (NSG) mice (The Jackson Laboratory), 5-6 weeks old, were used. Cancer cells were suspended in PBS+0.4% BSA, and 100 μl of cell suspension was injected into the left ventricle of anesthetized mice (ketamine 100 mg/kg; xylazine 10 mg/kg). In vivo metastasis progression was monitored via real-time BLI using the IVIS SpectrumCT Imaging System (PerkinElmer), on a weekly basis. Mice were anesthetized with inhaled isoflurane, injected intraperitoneally with D-Luciferin (150 mg/kg), and imaged with the auto exposure setting in prone and supine positions. At the end point, ex vivo BLI was performed by submerging the excised organs in DMEM/F12 media (Thermo Fisher Scientific) containing D-Luciferin for 10 min and imaging them with the auto exposure setting. BLI analysis was performed using Living Image software (ver 4.5, PerkinElmer). In the case of breast cancer cohort study (pilot, group 1, group 2 in
Tissue Processing and Cancer Cell Isolation from Organs
[0260] Organs including brain, lung, liver, and kidney were dissociated using a gentleMACS Octo Dissociator with Heaters (Miltenyi Biotec). Bones (from both hind limbs) were chopped into fine pieces and incubated in the dissociation buffer with vigorous shaking. The dissociated cell suspensions were filtered using 100 μm filters and washed with DMEM/F12 twice. Cell suspensions were then washed with staining buffer (PBS+2 mM EDTA+0.5% BSA) and incubated with mouse cell depletion beads according to the instructions (Miltenyi Biotec). Cell suspensions were subjected to negative selection using an autoMACS Pro Separator (Miltenyi Biotec) to deplete mouse stroma. Brains were subjected to an additional myelin debris depletion step using myelin removal beads II (Miltenyi Biotec). The resultant cell suspensions were then subjected to FACS using a Sony SH4800 sorter, with the fixed gate for GFP or mCherry. DAPI staining was used to exclude dead cells. For bulk RNA-Seq, cells were sorted to a single tube in PBS+0.4% BSA+RNasin Plus RNase Inhibitor (Promega), centrifuged at 1500 rpm×10 min, and cell pellets were frozen at −80° C. for downstream use. For single cell RNA-Seq, single cells were sorted into 96-well plates containing cold TCL buffer (Qiagen) with 1% β-mercaptoethanol, snap frozen on dry ice, and then stored at −80° C. Ninety single cells were sorted per plate, and the remaining wells were used for negative and positive controls.
RNA Extraction, Library Preparation and Sequencing
[0261] Individual cell lines, cell line pools prior to injection, and cells isolated from metastases were subjected to RNA-Seq. RNA extraction was performed using Quick-RNA MicroPrep according to instructions (Zymo Research). RNA was quantified using RNA 6000 Pico Kit on a 2100 Bioanalyzer (Agilent). RNA samples from cell numbers lower than 500 were not measured but all were used as input for library preparation. cDNA was synthesized using Clontech SmartSeq v4 reagents from up to 2 ng RNA input according to manufacturer's instructions (Clontech). Full length cDNA was fragmented to a mean size of 150 bp with a Covaris M220 ultrasonicator and Illumina libraries were prepared from 2 ng of sheared cDNA using Rubicon Genomics Thruplex DNAseq reagents according to manufacturer's protocol. The finished dsDNA libraries were quantified by Qubit fluorometer, Agilent TapeStation 2200, and RT-qPCR using the Kapa Biosystems library quantification kit. Uniquely indexed libraries were pooled in equimolar ratios and sequenced on Illumina NextSeq500 runs with paired-end 75bp reads at the Dana-Farber Cancer Institute Molecular Biology Core Facilities. RT-qPCR quantification of barcodes was performed using Maxima First Strand cDNA Synthesis Kit, Taqman Fast Advanced Master Mix, custom synthesized Taqman probes, and QuantStudio 6 PCR System (ThermoFisher Scientific). Single cell RNA-Seq was performed as previously described (Ramaswamy, S. et al., Nat. Genet. 33, 49-54 (2003), the contents therein are hereby incorporated by reference in their entirety).
Scalable Metastatic Potential Profiling with Barcoded Cell Line Pools.
[0262] To enable profiling of in vivo metastatic potential in a scalable manner, a barcoding vector was designed that contained (1) a fluorescence protein (GFP or mCherry) for cell sorting, (2) a luciferase for real-time in vivo imaging, and (3) a barcode for cell line identity (
[0263] Because the transcribed barcode design allows cancer barcodes and cancer transcriptomes of metastases to be co-captured from bulk RNA-Seq, a workflow and analysis method was developed that reads out both (
[0264] To validate RNA-Seq-quantitated barcode results from the pilot study, RT-qPCR was performed using Taqman assays against the barcodes. An examination of individual barcoded lines showed that the Taqman probes were highly specific to the engineered barcodes and there was no cross detection (
[0265] Having validated the feasibility of the in vivo barcoding approach, efforts were focused on mapping the metastatic behaviors of basal-like breast cancers, a breast cancer subtype that displays substantial heterogeneity in metastasis patterns from patient to patient, using cell lines from the Cancer Cell Line Encyclopedia (CCLE). Principal component analysis (PCA) of expression profiles stratified breast cancer cell lines into 3 categories: (1) one group whose cell line names all begin with HS and that displays fibroblast characteristics, (2) one enriched in luminal subtype, and (3) one enriched in basal subtype (
TABLE 2
Cell line name | Pool | Fluor. | Barcode ID | Barcode seq (5′flank_barcode_3′flank) | Site of origin
MDAMB231 | pilot (pi) | GFP | BC01 | CGTGTAAAGTTAACCTCGAGGGaaccaaaacgctgcagctggcctacgCGATATCAAGCTTATCGATAATCAA | pleura (metastasis)
HCC1954 | pilot (pi) | GFP | BC02 | CGTGTAAAGTTAACCTCGAGGGaggaattacaccgacgcgggactgCGATATCAAGCTTATCGATAATCAA | primary
BT549 | pilot (pi) | GFP | BC03 | CGTGTAAAGTTAACCTCGAGGGagtcgctgcgggttcaatcctggatgCGATATCAAGCTTATCGATAATCAA | primary
CAL851 | pilot (pi) | GFP | BC04 | CGTGTAAAGTTAACCTCGAGGGcatcagccagacgcctaaggtcatCGATATCAAGCTTATCGATAATCAA | primary
MDAMB231 | pilot (pi) | mCherry | BC05 | CGTGTAAAGTTAACCTCGAGGGccgacgcgagggttgttctgtgatgaCGATATCAAGCTTATCGATAATCAA | pleura (metastasis)
HCC1954 | pilot (pi) | mCherry | BC06 | CGTGTAAAGTTAACCTCGAGGGgtccgaagacgtctcgcctgcatcaaCGATATCAAGCTTATCGATAATCAA | primary
BT549 | pilot (pi) | mCherry | BC07 | CGTGTAAAGTTAACCTCGAGGGgtgtgacagaaattcctgcaggcggcCGATATCAAGCTTATCGATAATCAA | primary
CAL851 | pilot (pi) | mCherry | BC08 | CGTGTAAAGTTAACCTCGAGGGttcggccgctcgaaccacgtaagtcaCGATATCAAGCTTATCGATAATCAA | primary
BT549 | group 1 (g1) | GFP | BC03 | CGTGTAAAGTTAACCTCGAGGGagtcgctgcgggttcaatcctggatgCGATATCAAGCTTATCGATAATCAA | primary
HDQP1 | group 1 (g1) | GFP | BC06 | CGTGTAAAGTTAACCTCGAGGGgtccgaagacgtctcgcctgcatcaaCGATATCAAGCTTATCGATAATCAA | primary
JIMT1 | group 1 (g1) | GFP | BC08 | CGTGTAAAGTTAACCTCGAGGGttcggccgctcgaaccacgtaagtcaCGATATCAAGCTTATCGATAATCAA | pleura (metastasis)
MDAMB157 | group 1 (g1) | GFP | BC09 | CGTGTAAAGTTAACCTCGAGGGacagctttcgacgggtccaagcagccCGATATCAAGCTTATCGATAATCAA | pleura (metastasis)
MDAMB436 | group 1 (g1) | GFP | BC10 | CGTGTAAAGTTAACCTCGAGGGacgtccggcccctcacaagcacattcCGATATCAAGCTTATCGATAATCAA | pleura (metastasis)
HCC1806 | group 1 (g1) | GFP | BC11 | CGTGTAAAGTTAACCTCGAGGGataagcggcgctcggtagactgcggtCGATATCAAGCTTATCGATAATCAA | primary
HMC18 | group 1 (g1) | GFP | BC12 | CGTGTAAAGTTAACCTCGAGGGcctgggcattcgtgtgtccaccccttCGATATCAAGCTTATCGATAATCAA | pleura (metastasis)
MDAMB468 | group 1 (g1) | GFP | BC13 | CGTGTAAAGTTAACCTCGAGGGcgccgaggttgaagcacggttggaacCGATATCAAGCTTATCGATAATCAA | pleura (metastasis)
DU4475 | group 1 (g1) | GFP | BC14 | CGTGTAAAGTTAACCTCGAGGGcatgcaggcaatacctgcgagtaacgCGATATCAAGCTTATCGATAATCAA | skin (metastasis)
CAL851 | group 2 (g2) | GFP | BC04 | CGTGTAAAGTTAACCTCGAGGGcatgcagccagacgccctaaggtcatCGATATCAAGCTTATCGATAATCAA | primary
HCC1143 | group 2 (g2) | GFP | BC05 | CGTGTAAAGTTAACCTCGAGGGccgacgcgagggttgttctgtgatgaCGATATCAAGCTTATCGATAATCAA | primary
HCC70 | group 2 (g2) | GFP | BC07 | CGTGTAAAGTTAACCTCGAGGGgtgtgacagaaattcctgcaggcggcCGATATCAAGCTTATCGATAATCAA | primary
HCC1395 | group 2 (g2) | GFP | BC15 | CGTGTAAAGTTAACCTCGAGGGagacttgtccagccgcggcgtagatcCGATATCAAGCTTATCGATAATCAA | primary
HCC1187 | group 2 (g2) | GFP | BC16 | CGTGTAAAGTTAACCTCGAGGGcaatcaggtagacgggacgcgtgacgCGATATCAAGCTTATCGATAATCAA | primary
HCC38 | group 2 (g2) | GFP | BC17 | CGTGTAAAGTTAACCTCGAGGGcaggcacctcgtagcagtgctttgccCGATATCAAGCTTATCGATAATCAA | primary
HCC1569 | group 2 (g2) | GFP | BC18 | CGTGTAAAGTTAACCTCGAGGGccccactgtgcccgttcaccagtactCGATATCAAGCTTATCGATAATCAA | primary
HCC1937 | group 2 (g2) | GFP | BC19 | CGTGTAAAGTTAACCTCGAGGGccgcctgccagagctaaggtcggttaCGATATCAAGCTTATCGATAATCAA | primary
BT20 | group 2 (g2) | GFP | BC20 | CGTGTAAAGTTAACCTCGAGGGcggccctcggtatcctcagatgtccaCGATATCAAGCTTATCGATAATCAA | primary
HCC1599 | group 2 (g2) | GFP | BC21 | CGTGTAAAGTTAACCTCGAGGGcgtagcagcaagcgcctagccagtctCGATATCAAGCTTATCGATAATCAA | primary
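Each entry in Table 2 is listed as 5′ flank, barcode, and 3′ flank, so a reference for barcode detection can be assembled by concatenating the three parts. The following is a minimal sketch under that assumption; the function names `build_reference` and `to_fasta` are hypothetical, and only three barcodes copied from Table 2 are included for brevity.

```python
# Flanking sequences shared by the entries in Table 2.
FLANK_5 = "CGTGTAAAGTTAACCTCGAGGG"
FLANK_3 = "CGATATCAAGCTTATCGATAATCAA"

# Three barcodes copied from Table 2 (shown in lower case in the table).
BARCODES = {
    "BC01": "aaccaaaacgctgcagctggcctacg",
    "BC08": "ttcggccgctcgaaccacgtaagtca",
    "BC11": "ataagcggcgctcggtagactgcggt",
}


def build_reference(barcodes, flank5=FLANK_5, flank3=FLANK_3):
    """Return {barcode_id: (full_sequence, barcode_start, barcode_end)}.

    Coordinates are 0-based and half-open, so the barcode occupies
    full_sequence[barcode_start:barcode_end].
    """
    reference = {}
    for barcode_id, barcode in barcodes.items():
        full = flank5 + barcode.upper() + flank3
        start = len(flank5)
        reference[barcode_id] = (full, start, start + len(barcode))
    return reference


def to_fasta(reference):
    """Format the reference as FASTA text, one record per barcode."""
    lines = []
    for barcode_id, (sequence, _, _) in sorted(reference.items()):
        lines.append(">" + barcode_id)
        lines.append(sequence)
    return "\n".join(lines) + "\n"
```

The barcode coordinates returned alongside each sequence are convenient when deciding later whether an aligned read actually spans the barcode rather than only a shared flank.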
[0266] To quantify the cell line metastatic potentials on an absolute scale, the cell number was inferred for each cell line based on the total cancer cell counts and their barcode-quantitated compositions from each organ. This metric was used to compare cell lines across the 3 pool studies. For data visualization, a petal plot was developed that encodes three pieces of information: (1) metastatic potential as quantified by inferred cell number, (2) its confidence interval, which estimates animal variability, and (3) penetrance, the percentage of animals in the cohort in which the particular cell line was detected (
Drafting MetMap with PRISM Cell Line Pools
[0267] An expansion of metastatic potential mapping beyond breast cancer was attempted, with the aim of drafting a comprehensive MetMap for all solid tumor types. Focusing on one cancer type at a time would result in custom pooling and different group sizing, which was neither scalable nor standardizable. For pan-cancer characterization, it also did not make sense to perform bulk RNA-Seq on mixed cancer types, as lineage would be a strong confounder. In this case, readout at the DNA level would be sufficient. PRISM, a barcoded cell line mixture approach developed for high-throughput in vitro drug screening, was used. It was asked whether the PRISM platform could be applied for the in vivo MetMap purpose.
[0268] As part of PRISM profiling, cell lines were pooled based on their in vitro doubling time across mixed lineages, with a size of 25 lines per pool. PRISM-barcoded cells did not harbor GFP or luciferase; thus, the first study addressed whether it was critical to introduce the labeling markers for cancer cell purification. One PRISM pool (of 25 cell lines) that contained JIMT1 was chosen, labeled with a GFP-luciferase vector, and then sorted for GFP+ cells (
[0269] The positive control JIMT1 was pan-metastatic as expected. Importantly, cell lines such as MELHO, MHHES1 and PC14 substantially dropped in their initial abundance after GFP labeling, yet they gained similar in vivo enrichment as in the non-labeled experiment. These results suggested that we could quantitatively detect barcodes from crude lysates without the need of pure cancer cell isolation from PRISM.
[0270] The simplified workflow using PRISM pools for pan-cancer mapping was employed, and a total of 503 cancer cell lines across 21 cancer types were profiled (
Analysis of In Vivo Metastasis Transcriptomes with Polyclonal Cell Lines
[0271] RNA-Seq co-captured cancer cell composition and averaged in vivo transcriptomes of metastases from cell line pools in the breast cancer cohort study. To understand what metastasis transcriptomes encoded, differential analysis was performed on the in vivo transcriptomes to cells in vitro. To properly account for the different cell line compositions in each metastasis, a composite in vitro transcriptome was modeled using the barcode composition and single cell line in vitro transcriptomes, and then compared to the actual in vivo results (
[0272] To assess whether such comparison identified genes relevant to metastasis, the top differentially expressed genes were inspected. Notably, MUCL1 (also termed small breast epithelial mucin, SBEM) and SCGB2A2 (also known as Mammaglobin, MGB1) were strongly induced in brain metastases as well as in other sites (
[0273] Since MDAMB231 is the most investigated cell line in breast cancer metastasis, it was asked whether genes previously identified and validated as metastasis mediators were induced in the in vivo transcriptomic profiles. In the pilot group experiments, MDAMB231 dominated lung, liver, kidney and bone metastases in most samples (
[0274] Having confirmed the validity of these profiles, pathway enrichment analysis was performed to query consensus programs that the differential genes encode in the 5 organ sites (
Bioinformatic Analysis
[0275] Barcode Quantification from RNA-Seq of Metastases
[0276] Since the RNA-Seq library preparation sheared the cDNA randomly into small pieces, demultiplexed RNA-Seq reads were mapped to the barcode references using Bowtie 2 (Langmead et al., Nat. Methods 9: 357-59 (2012), the contents of which are incorporated herein by reference in their entirety) local mode for barcode detection and quantification. Mapped reads were filtered with the criteria that reads (either 5′ or 3′) must cover over 50% of the barcodes from either end, and counted using samtools. Barcode percentage corresponding to cell composition was calculated for single cell lines, pre-injected cell mixtures, and in vivo metastasis samples.
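The read filter and percentage calculation described above can be sketched as follows. This is an illustrative reading, not the pipeline actually used: it assumes alignments have already been produced (for example by Bowtie 2 in local mode) and reduced to 0-based half-open coordinates on the barcode reference, and it interprets "cover over 50% of the barcodes from either end" as an overlap of more than half the barcode length that reaches at least one barcode boundary. The function names are hypothetical.

```python
from collections import Counter


def covers_barcode(aln_start, aln_end, bc_start, bc_end, min_fraction=0.5):
    """True if alignment [aln_start, aln_end) spans more than min_fraction
    of the barcode [bc_start, bc_end) and enters it from either end."""
    overlap = min(aln_end, bc_end) - max(aln_start, bc_start)
    if overlap <= 0:
        return False
    reaches_an_end = aln_start <= bc_start or aln_end >= bc_end
    return reaches_an_end and overlap > min_fraction * (bc_end - bc_start)


def barcode_percentages(alignments, barcode_spans):
    """Compute the percentage of passing reads assigned to each barcode.

    alignments: iterable of (barcode_id, aln_start, aln_end).
    barcode_spans: {barcode_id: (bc_start, bc_end)}.
    """
    counts = Counter()
    for barcode_id, aln_start, aln_end in alignments:
        bc_start, bc_end = barcode_spans[barcode_id]
        if covers_barcode(aln_start, aln_end, bc_start, bc_end):
            counts[barcode_id] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {bc: 100.0 * n / total for bc, n in counts.items()}
```

Requiring the overlap to reach a barcode boundary discards reads that align only to the flanks, which are identical across all barcodes and therefore carry no identity information.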
Metastatic Potential Quantification and Feature Associations
[0277] For breast cohort study, metastatic potential of cell line j targeting organ i, was calculated as:
where $c_i$ is the total cancer cell number isolated from organ i and $p_j$ is the fractional proportion of cell line j estimated by barcode quantification, and n is the number of replicates of mice. To identify features that associate with brain metastatic potential, a 2-class comparison method was used (Ritchie et al., Nucleic Acids Res. 43: e47 (2015), the contents of which are incorporated herein by reference in their entirety). The analysis was performed on mutation, copy number, metabolite (available at https://portals.broadinstitute.org/ccle/), and CRISPR-gene dependency (CERES scores, available at https://depmap.org/portal/) separately. Copy number data were binarized using a cutoff of ≤ −1 (loss) and ≥ 1 (gain).
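One form of the metastatic potential that is consistent with the definitions above can be written as follows. It assumes that the inferred cell number (total cancer cells times barcode fraction) is computed per animal and then averaged over the n replicate mice; the symbol $M_{i,j}$ and the replicate index k are notation introduced here for illustration.

```latex
M_{i,j} = \frac{1}{n} \sum_{k=1}^{n} c_{i}^{(k)} \, p_{j}^{(k)}
```

Here $c_{i}^{(k)}$ is the total cancer cell number isolated from organ i of mouse k, and $p_{j}^{(k)}$ is the fractional proportion of cell line j in that same sample.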
Cancer Transcriptomic Analysis from RNA-Seq of Metastases
[0278] Potential mouse contaminating reads were removed by competitive mapping to the human/mouse hybrid genome using BBSplit (https://sourceforge.net/projects/bbmap/). Reads that uniquely mapped to the human genome were then used as input for mapping and gene-level counting with the RSEM package (Li et al., BMC Bioinformatics, 12: 323 (2011), the contents of which are incorporated herein by reference in their entirety). Gene count estimates were normalized using the TMM method (Robinson et al., Bioinformatics 26: 139-40 (2010), the contents of which are incorporated herein by reference in their entirety). For differential analysis, to properly account for the cancer cell composition differences in each in vivo sample, an in silico modeled in vitro mixture was generated first. For each in silico metastasis model, the estimated expression g of gene i is computed as a weighted average of the cell lines present in the corresponding in vivo sample:
[0279] $\hat{g}_i = \sum_{j=1}^{M} g_{i,j}\, p_j$, where $g_{i,j}$ is the baseline in vitro expression of gene i in cell line j, $p_j$ is the fractional proportion of cell line j in the in vivo sample, as estimated by barcode quantification, and M is the number of cell lines present in the in vivo sample. The in vivo and in silico counterparts were then compared using a paired design for each organ in voom-limma (Ritchie et al.). The three studies, pilot, group 1, and group 2, were analyzed separately. Overlap significance testing of two-set or multi-set intersections was performed using the cpsets function in the SuperExactTest package (Wang et al., Sci. Rep. 5: 16923 (2015), the contents of which are incorporated herein by reference in their entirety). Gene set enrichment analysis (GSEA) was performed using the GSEA-preranked method in the GSEA package (Subramanian et al., Proc. Natl. Acad. Sci. USA 102: 15545-50 (2005), the contents of which are incorporated herein by reference in their entirety). ssGSEA signature projection was performed in GenePattern (genepattern.broadinstitute.org) (Barbie et al., Nature 462: 108-12 (2009), the contents of which are incorporated herein by reference in their entirety). Gene signature data sets were from MSigDB (software.broadinstitute.org/gsea/msigdb/).
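The in silico mixture can be illustrated with a short sketch that computes the weighted average expression for each gene from single cell line in vitro profiles and barcode-derived proportions. The function name `composite_expression` and the numeric values in the usage note are hypothetical, and the sketch assumes every cell line has a value for every gene.

```python
def composite_expression(in_vitro, proportions):
    """Model an in vitro mixture matching an in vivo sample's composition.

    in_vitro: {cell_line: {gene: baseline in vitro expression}}.
    proportions: {cell_line: fractional proportion in the in vivo sample},
        as estimated by barcode quantification.
    Returns {gene: sum over cell lines of expression * proportion}.
    """
    cell_lines = list(proportions)
    if not cell_lines:
        raise ValueError("at least one cell line is required")
    # Assumption: all cell lines share the gene set of the first one.
    genes = list(in_vitro[cell_lines[0]])
    composite = {}
    for gene in genes:
        composite[gene] = sum(
            in_vitro[line][gene] * proportions[line] for line in cell_lines
        )
    return composite
```

For example, with made-up values, a sample that is 75% JIMT1 (SREBF1 expression 10.0) and 25% HCC1806 (SREBF1 expression 2.0) yields a composite SREBF1 expression of 8.0, which would then be paired with the measured in vivo value for that sample.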
[0280] SREBF1 ChIP-Seq peak data were from ENCODE (www.encodeproject.org/) (Consortium et al., Nature 489, 57-74 (2012), the contents of which are incorporated herein by reference in their entirety) and analyzed using ChIPseeker (Yu et al., Bioinformatics 31: 2382-83 (2015), the contents of which are incorporated herein by reference in their entirety).
PRISM In Vivo Assay
[0281] All PRISM cell lines were initially obtained from CCLE. Cell lines were adapted to the same culture condition in phenol red-free RPMI1640 media (ThermoFisher Scientific), and barcoded as previously described (Yu et al., Nat. Biotechnol. 34: 419-23 (2016), the contents of which are incorporated herein by reference in their entirety). PRISM cell lines were pooled based on their in vitro doubling speed bins, at equal number, in the format of 25 lines per pool. Cells were thawed and recovered for 48 hours prior to in vivo injection. To form the large pool of 498 cell lines, 20 PRISM pools were mixed at equal total number right before injection.
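Pooling by doubling speed can be sketched as sorting cell lines by doubling time and cutting the ordered list into consecutive groups of 25. This is only one plausible way to form the bins; the exact binning rule is not stated here, and the function name `make_pools` and the input format are hypothetical.

```python
def make_pools(doubling_times, pool_size=25):
    """Group cell lines with similar doubling times into pools.

    doubling_times: {cell_line: in vitro doubling time in hours}.
    Returns a list of pools, each a list of cell line names, ordered from
    the fastest-growing pool to the slowest-growing pool.
    """
    if pool_size <= 0:
        raise ValueError("pool_size must be positive")
    # Sort by doubling time; ties are broken by name for reproducibility.
    ordered = sorted(doubling_times, key=lambda name: (doubling_times[name], name))
    return [ordered[i:i + pool_size] for i in range(0, len(ordered), pool_size)]
```

Keeping similar growth rates together limits the extent to which fast-growing lines overtake slow-growing ones within a pool before injection.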
[0282] After the in vivo experiments, organs were subjected to tissue dissociation and mouse stroma depletion, and the dissociated cell pellets were frozen at −80° C. as discussed above. The pellets (≤50 mg dry weight) were lysed in 200 μL freshly prepared lysis buffer (with proteinase K), heat digested at 60° C., and denatured at 95° C. for 10 minutes. 20 μL of lysate was used for barcode amplification per 100 μL PCR volume (multiple technical replicates per sample). PCR was performed using the following conditions: 95° C. for 3 minutes; 98° C. for 20 seconds, 57° C. for 15 seconds, 72° C. for 10 seconds (30 cycles); 72° C. for 5 minutes; 4° C. stop. PCR libraries (technical replicates combined) were quantified using a 2100 Bioanalyzer (Agilent), normalized, pooled, and gel-purified using the QIAquick Gel Extraction Kit (Qiagen). Purified samples were quantified, and 2 nM of libraries with 25% spike-in PhiX DNA were sequenced on Illumina MiSeq or HiSeq at 800 K/mm² cluster density.
[0283] De-multiplexed sequencing reads were mapped to the barcode reference to generate a table of cell line barcode counts for each sample/condition. Library-size normalized read counts for each sample were used for calculation of relative metastatic potential. Relative metastatic potential of cell line j targeting organ i, $rM_{i,j}$, was defined as:
where $c_{i,j}$ is the read counts of cell line j from organ i, $p_j$ is the read counts of cell line j from the pre-injected population, n (n=4-5) is the number of replicate samples of mice, and m (m=4-5) is the number of replicates of the pre-injected population. Confidence intervals reflecting animal variance were calculated using bootstrap.
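A sketch of the relative metastatic potential and its bootstrap confidence interval follows. It assumes the metric is the mean library-size normalized count in the organ across the n mice divided by the mean normalized count across the m pre-injection replicates, and that the bootstrap resamples animals only; both are assumptions made for illustration, and the function names and the percentile method are not taken from the text.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    values = list(values)
    return sum(values) / len(values)


def relative_potential(organ_counts, pre_injection_counts):
    """Mean normalized count in the organ over mean pre-injection count."""
    return mean(organ_counts) / mean(pre_injection_counts)


def bootstrap_ci(organ_counts, pre_injection_counts, n_boot=2000,
                 alpha=0.05, seed=0):
    """Percentile bootstrap interval obtained by resampling animals."""
    rng = random.Random(seed)
    n = len(organ_counts)
    estimates = []
    for _ in range(n_boot):
        # Draw n animals with replacement and recompute the metric.
        resample = [organ_counts[rng.randrange(n)] for _ in range(n)]
        estimates.append(relative_potential(resample, pre_injection_counts))
    estimates.sort()
    lower = estimates[int(round((alpha / 2) * n_boot))]
    upper = estimates[int(round((1 - alpha / 2) * n_boot)) - 1]
    return lower, upper
```

With only four or five animals per cohort the resampled distribution is coarse, so intervals from such a procedure are best read as a rough indication of animal-to-animal variability.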
In Vivo CRISPR Screen and Gene Validation
[0284] CRISPR/Cas9 versions of cell lines were generated by infecting luciferized cells with Cas9-Blast lentivirus and selecting in 5 μg/mL Blasticidin for 10 days with continuous passaging until non-infected controls were killed. For the pooled in vivo screen, JIMT1-Cas9 cells were infected with a CRISPR guide library (Table 3) in an arrayed fashion in 6-well plates, and selected in 2 μg/mL Puromycin for 4 days. At this time, non-infected controls were killed, and no growth defect was observed in the perturbed cell lines. After antibiotic selection, cells were pooled and subjected to intracranial injection at 6e4 cells per animal in 1 of PBS. This was equivalent to 1e3 cells per guide on average per animal. Intracranial growth was allowed to progress for 4 weeks, and brain tissues were processed following the workflow of the PRISM in vivo assay, except that guides were amplified using primers targeting the guide vector. De-multiplexed sequencing reads were mapped to the guide reference to generate a table of barcode counts for each guide for each sample. Sequencing depth was normalized using the upper quartile method and relative depletion was quantitated using a linear model in limma. For individual gene validation (
TABLE 3
gene | guide sequence | guide # | exon #
AADAC | AGTCTGAAGCACTAAGAAGG | 1 | 2
AADAC | GTTATGACTTGCTGTCAAGA | 2 | 3
ACLY | AGAGCAATTCGAGATTACCA | 1 | 11
ACLY | GCCAGCGGGAGCACATCGGT | 2 | 11
ACSL3 | TATCTAAAGTATCACATCCA | 1 | 4
ACSL3 | GTGGTGAAGAGTAACCAATG | 2 | 9
ALDH1A1 | AGCATCCATAGTACGCCACG | 1 | 3
ALDH1A1 | TTCCAAATGAGCATAACCAA | 2 | 6
ALDH3B2 | GGCGCCCACCAGGAGCACCA | 1 | 4
ALDH3B2 | AAGCCGTCAGAAATCAGCCA | 2 | 5
CD36 | TTCACTATCAGTTGGAACAG | 1 | 8
CD36 | AGGATAAAACAGACCAACTG | 2 | 9
CERS4 | GGTTACCACCCAATGTCACG | 1 | 3
CERS4 | GCTGACCAAGAAGTTCTGTG | 2 | 5
CYP2J2 | GTTCTCGCATAGGGGTCACG | 1 | 2
CYP2J2 | TTGCTGAAGAGAGTTTGGTG | 2 | 5
CYP4Z1 | CCCACAAGGGAACAGCACAT | 1 | 2
CYP4Z1 | AGCCAGGTTTCACAATCTGG | 2 | 4
DEGS2 | CCACGACATCTCGCACAACG | 1 | 2
DEGS2 | CTCCTTCAAGAAGTACCACG | 2 | 2
DGAT2 | CTGGCTCAATAGGTCCAAGG | 1 | 2
DGAT2 | CCAGGCCCATGATACCATGG | 2 | 5
FASN | GATGTATTCAAATGACTCAG | 1 | 7
FASN | GAGCATGCTGAACGACATCG | 2 | 9
SCAP | CTGCTGGACATAAGCCACCG | 1 | 4
SCAP | TGTTCCTGGGAAGTACAGCG | 2 | 6
SCD | CAGGTGTAGAACTTGCAGGT | 1 | 2
SCD | ATGATCAGAAAGAGCCGTAG | 2 | 3
SCD | GATCCTCATAATTCCCGACG | 3 | 4
BDH1 | CCGTCGGACTTATGCCAGTG | 1 | 4
BDH1 | GTGGCAGAAGTGAACCTTTG | 2 | 7
HMGCS2 | GATACTTGGCCAAAGGACGT | 1 | 2
HMGCS2 | GACATTGCCGTCTATCCCAG | 2 | 3
HMGCL | TGCCCTTCAAGACTTCAGTG | 1 | 4
HMGCL | AGTCAGCCAATATTTCTGTG | 2 | 5
G6PD | CTTGCCCCCGACCGTCTACG | 1 | 5
G6PD | CTTGAAGGTGAGGATAACGC | 2 | 7
H6PD | GAAAAAGGTCCCGAGTTCTG | 1 | 2
H6PD | CCTCCAGAACCATCTGACGG | 2 | 4
SLC40A1 | AAGTAGAGAGAGAATGACCA | 1 | 2
SLC40A1 | TCATCAGGATGATTCCACAC | 2 | 4
CEBPA | CAGTTCCAGATCGCGCACTG | 1 | 1
SPTSSB | TTAAACATAGATCGCTCCCA | 1 | 3
IRX3 | GGACGAGAGCACGTTGGACA | 1 | 1
IRX3 | CCGTCCCAAGAACGCCACCA | 2 | 2
THRSP | GAAGTAGGTGTAGAGATCAG | 1 | 1
THRSP | CATGCTCAAGGCCATCTGTG | 2 | 1
SPDEF | CTTTGACATGCTGTACCCTG | 1 | 2
SPDEF | TGTGGACAGAGCACCAATAC | 2 | 3
UBIAD1 | CAAGTGCTCCAGTTTCAGAG | 1 | 1
UBIAD1 | AGGAATTGGATTCAAGTACG | 2 | 2
CCDC3 | TCTTGGAAATTGACTCCGTG | 1 | 2
CCDC3 | AAACAAGGCCTTCTGCACCG | 2 | 3
SREBF1 | ACAGGGGTGGAGCTGAACTG | 1 | 2
SREBF1 | ACAGTGGTGCCAGAGACCAG | 2 | 4
PMVK | TACTGCTGTTCAGCGGCAAG | 1 | 2
PMVK | CGGGAAGGACTTCGTGACCG | 2 | 3
CYB5B | TGTCACCCGCTTCCTCAACG | 1 | 2
Western Blot
[0285] Protein lysates were prepared in RIPA Lysis Buffer (ThermoFisher Scientific)+cOmplete Mini EDTA-free Protease Inhibitor Cocktail (Roche). Western blot was performed using NuPAGE gel (ThermoFisher Scientific)+Wet/Tank Blotting (Bio-Rad)+Odyssey detection system (LI-COR). SREBF1 primary antibodies (14088-1-AP, Proteintech), GAPDH (D16H11) XP® Rabbit mAb (Cell Signaling), and IRDye® 800CW Goat anti-Mouse IgG, IRDye® 680RD Goat anti-Rabbit IgG secondary antibodies (LI-COR) were used.
SREBF1 CRISPR Knockout Generation
[0286] JIMT1 luciferized cells were infected with Cas9-Blast lentivirus (Sanjana et al., Nat. Methods 11: 783-84 (2014), the contents of which are incorporated herein by reference in their entirety) and selected in Blasticidin (5 μg/mL) for 10 days with continuous passaging until non-infected controls were all killed. JIMT1-Cas9 cells were then infected with lentiGuide-Puro virus encoding SREBF1-targeting (ACAGGGGTGGAGCTGAACTG) or non-targeting (CTCCGTTATGTGGCATGAGA) guides. Infected cells were selected in Blasticidin (5 μg/mL)+Puromycin (2 μg/mL) for 4 days until non-infected controls were all killed. Knockout was verified by western blot 10 days after infection. Protein lysates were prepared in Cell Lysis Buffer (Cell Signaling) plus cOmplete Mini EDTA-free Protease Inhibitor Cocktail (Roche). Western blot was performed using NuPAGE gel (ThermoFisher Scientific) plus iBlot 2 transfer (ThermoFisher Scientific) plus Odyssey detection system (LI-COR). SREBF1 primary antibodies (sc-17755, sc-365513, Santa Cruz), GAPDH (D16H11) XP® Rabbit mAb (Cell Signaling), and IRDye® 800CW Goat anti-Mouse IgG, IRDye® 680RD Goat anti-Rabbit IgG secondary antibodies (LI-COR) were used.
Tumor Sphere Assay
[0287] Tumor sphere assay was performed in Aggrewell400 24-well plates, according to manufacturer's instructions (StemCell Technologies). Each well contains approximately 1200 micro-wells. Cells were seeded at a density of 4000 cells/well, corresponding to 1-3 cells per micro-well. At the end point, tumor spheres were imaged and quantified using IncuCyte S3 System (EssenBioscience), using whole-well imaging modality.
Clinical Data Analysis
[0288] METABRIC, TCGA, and MSK targeted sequencing breast cancer datasets were downloaded from cBioPortal. The EMC-MSK dataset including 615 primary tumors (GSE2034, GSE2603, GSE5327, GSE12276) and the 65 metastasis sample dataset (GSE14020) were collected and processed as previously described (Zhang, X. H. et al., Cell 154, 1060-1073, (2013), the contents of which are incorporated by reference in their entirety). Paired primary breast tumor and brain metastasis RNA-Seq was available from Vareslija et al. To exclude the confounding effect of brain stroma contamination in this dataset, a contamination indicator generated from GSE52604 was applied, and the contaminating effect was regressed out, generating a corrected gene matrix. PI3K-response signatures were from Gatza et al. and Creighton et al., respectively. Signature analysis was conducted as described (Malladi, S. et al., Cell 165, 45-60, (2016), the contents of which are incorporated by reference in their entirety). Hierarchical clustering and heatmap generation were performed using the gplots package. Log-rank tests of survival curve difference were calculated using the survival package. A multivariate Cox proportional hazards model was built using the coxph function (
Computer Implemented Systems
[0289] In some embodiments, the steps of the methodologies and analysis provided herein can be implemented and/or supplemented through the use of computing devices. Any suitable computing device can be used to implement the computing devices and methods/functionality described herein and be converted to a specific system for performing the operations and features described herein through modification of hardware, software, and firmware, in a manner significantly more than mere execution of software on a generic computing device, as would be appreciated by those of skill in the art. One illustrative example of such a computing device 1500 is depicted in
[0290] The computing device 1500 can include a bus 1510 that can be coupled to one or more of the following illustrative components, directly or indirectly: a memory 1512, one or more processors 1514, one or more presentation components 1516, input/output ports 1518, input/output components 1520, and a power supply 1524. One of skill in the art will appreciate that the bus 1510 can include one or more busses, such as an address bus, a data bus, or any combination thereof. One of skill in the art additionally will appreciate that, depending on the intended
[0291] The computing device 1500 can include or interact with a variety of computer-readable media. For example, computer-readable media can include Random Access Memory (RAM); Read Only Memory (ROM); Electrically Erasable Programmable Read Only Memory (EEPROM); flash memory or other memory technologies; CD-ROM, digital versatile disks (DVD) or other optical or holographic media; magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices that can be used to encode information and can be accessed by the computing device 1500.
[0292] The memory 1512 can include computer-storage media in the form of volatile and/or nonvolatile memory. The memory 1512 may be removable, non-removable, or any combination thereof. Exemplary hardware devices include hard drives, solid-state memory, optical-disc drives, and the like. The computing device 1500 can include one or more processors that read data from components such as the memory 1512, the various I/O components 1520, etc. Presentation component(s) 1516 present data indications to a user or other device. Exemplary presentation components include a display device, speaker, printing component, vibrating component, etc.
[0293] The I/O ports 1518 can enable the computing device 1500 to be logically coupled to other devices, such as I/O components 1520. Some of the I/O components 1520 can be built into the computing device 1500. Examples of such I/O components 1520 include a microphone, joystick, recording device, game pad, satellite dish, scanner, printer, wireless device, networking device, and the like.
Other Embodiments
[0294] From the foregoing description, it will be apparent that variations and modifications may be made to the invention described herein to adapt it to various usages and conditions. Such embodiments are also within the scope of the following claims.
[0295] The recitation of a listing of elements in any definition of a variable herein includes definitions of that variable as any single element or combination (or subcombination) of listed elements. The recitation of an embodiment herein includes that embodiment as any single embodiment or in combination with any other embodiments or portions thereof.
[0296] All patents and publications mentioned in this specification are herein incorporated by reference to the same extent as if each independent patent and publication was specifically and individually indicated to be incorporated by reference.