Help and Documentation

GDSC1000 cell lines

Cell line propagation and verification

The GDSC1000 collection comprises well over 1,000 human tumour cell lines. This panel represents the spectrum of common and rare types of adult and childhood cancers of epithelial, mesenchymal and haematopoietic origin. Cell lines have been categorised based on therapeutically relevant tissue descriptions (GDSC descriptions 1 and 2), as well as using the TCGA tumour type descriptions. All cell lines and associated meta-data are identified and linked by a unique COSMIC ID.

Cell lines were sourced from commercial vendors and occasionally academic collaborators. Cells were grown in RPMI or DMEM/F12 medium supplemented with 5% or 10% FBS and penicillin/streptomycin, and maintained at 37°C in a humidified atmosphere at 5% CO2. Cell lines were propagated in these two media in order to minimize the potential effect of varying the media on sensitivity to therapeutic compounds in our assay, and to facilitate high-throughput screening.

To exclude cross-contaminated or synonymous lines, a panel of 92 SNPs was profiled for each cell line (Sequenom, San Diego, CA) and a pair-wise comparison score calculated for in-house identity checking. In addition, we have confirmed the identity of our cancer cell line set against those provided by the repositories, where possible. Each of the cell lines within our core set has been tested using a panel of 16 STRs (AmpFLSTR Identifiler KIT, ABI), which includes the 9 currently used by most of the cell line repositories (ATCC, Riken, JCRB and DSMZ). STR or SNP datasets for each cell line can be accessed through the cancer cell line pages of the COSMIC database (http://cancer.sanger.ac.uk/cell_lines#).
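The exact pair-wise comparison score is not specified here. As a minimal sketch, assuming each cell line's SNP profile is a mapping from SNP identifier to genotype call and that the score is the fraction of matching calls among SNPs called in both lines, the identity check could look like the following. The function names, the toy profiles and the 0.9 threshold are illustrative assumptions, not the in-house implementation.

```python
from itertools import combinations


def identity_score(profile_a, profile_b):
    """Fraction of identical genotype calls over SNPs called in both profiles."""
    shared = [
        snp for snp in profile_a
        if snp in profile_b
        and profile_a[snp] is not None
        and profile_b[snp] is not None
    ]
    if not shared:
        return float("nan")  # no comparable SNPs
    matches = sum(profile_a[snp] == profile_b[snp] for snp in shared)
    return matches / len(shared)


def flag_similar_pairs(profiles, threshold=0.9):
    """Return (line_a, line_b, score) for every pair scoring at or above threshold."""
    flagged = []
    for a, b in combinations(sorted(profiles), 2):
        score = identity_score(profiles[a], profiles[b])
        if score >= threshold:  # NaN never passes this comparison
            flagged.append((a, b, score))
    return flagged


# Toy profiles (hypothetical SNP identifiers and calls).
profiles = {
    "line_A": {"rs1": "AA", "rs2": "AG", "rs3": "GG", "rs4": "CC"},
    "line_B": {"rs1": "AA", "rs2": "AG", "rs3": "GG", "rs4": "CT"},
    "line_C": {"rs1": "GG", "rs2": "AA", "rs3": "AG", "rs4": None},
}
suspect_pairs = flag_similar_pairs(profiles, threshold=0.7)
```

Pairs flagged in this way would then be candidates for cross-contamination or synonymous lines and would need manual review.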

A complete list of cell lines and associated data is available from the downloads page.

Genomic annotation of GDSC1000 cell lines

Cell lines have been comprehensively genetically characterised including:

1. Whole exome sequencing (Agilent SureSelectXT Human All Exon 50Mb bait set)

2. Gene expression (Affymetrix Human Genome U219 Array)

3. Copy number alterations (Affymetrix SNP6.0 Array)

4. DNA methylation (Illumina Human Methylation 450 Array)

5. Gene fusions (targeted PCR sequencing or split probe FISH analysis)

6. Microsatellite Instability (markers BAT25, BAT26, D5S346, D2S123 and D17S250)

Further information is available from the downloads page. All datasets are available from the GDSC website, the COSMIC database or the appropriate repository (ArrayExpress, GEO, EGA).

Screening

Compounds

Compounds are provided by industry, academic collaborators or sourced from commercial vendors. The range of concentrations selected for each compound is based on in vitro data of concentrations inhibiting relevant kinase activity and cell viability, as well as clinical data indicating peak and trough plasma concentrations in human subjects.

Cell viability assays

Cells are seeded at an optimised density in medium with 5% or 10% FBS and 1% penicillin/streptomycin. The optimal cell number for each cell line is determined to ensure that it is in growth phase at the end of the assay and to maximise the dynamic range of endpoint measurements. 24 hours after plating, cells are treated with a dose titration of each compound, except for cell lines screened at MGH, which are drugged on the same day as plating. Following drugging, plates are returned to the incubator for assay at a 72-hour time point. Cell viability is determined using either a DNA dye (Syto60) or a metabolic assay (Resazurin or CellTiter-Glo). All screening plates are subject to stringent quality control measures.

GDSC Datasets

The GDSC datasets reflect different experimental setups employed by the project since its inception. GDSC1 is an expansion of the original dataset available from this website and published by Iorio et al. (Cell 2016). GDSC2 has been screened using improved equipment and procedures (see below). Many experiments from GDSC1 have been repeated in GDSC2 and we would recommend, where duplicate IC50s exist, using the result from GDSC2. Raw and fitted data, and ANOVA results are available for GDSC1 and GDSC2 on the downloads page.
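The recommendation to prefer GDSC2 where duplicate IC50s exist can be sketched as a simple merge. This is only an illustration: it assumes each dataset has already been loaded into a dictionary keyed by a (cell line identifier, drug name) pair, and the identifiers, drug names and values below are made up.

```python
def prefer_gdsc2(gdsc1_ic50, gdsc2_ic50):
    """Merge two {(cell_line_id, drug): ic50} mappings, keeping GDSC2 on overlap."""
    merged = {key: ("GDSC1", value) for key, value in gdsc1_ic50.items()}
    # Entries from GDSC2 overwrite any GDSC1 entry with the same key.
    merged.update({key: ("GDSC2", value) for key, value in gdsc2_ic50.items()})
    return merged


# Hypothetical toy data: keys are (cell line identifier, drug name).
gdsc1_ic50 = {(1001, "drug_X"): 1.2, (1001, "drug_Y"): 0.4}
gdsc2_ic50 = {(1001, "drug_X"): 0.9}

combined = prefer_gdsc2(gdsc1_ic50, gdsc2_ic50)
```

Keeping the dataset label alongside each value makes it easy to see afterwards which screen a given IC50 came from.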

GDSC1

The GDSC1 dataset was generated jointly by the Wellcome Sanger Institute and Massachusetts General Hospital between 2009 and 2015 using a matched set of cancer cell lines (the GDSC1000).

Compounds were stored in aliquots at -80°C and were subjected to a maximum of 5 freeze-thaw cycles.

Cells were seeded in 96-well or 384-well plates and compound dose titrations were delivered using tip-based liquid handling apparatus. Cell viability was measured using either Syto60 or Resazurin. Drug treatments in this dataset used two formats:

9-point dose curve incorporating a 2-fold dilution step (256-fold range)
5-point dose curve incorporating a 4-fold dilution step (256-fold range)

GDSC2

GDSC2 has been generated at the Wellcome Sanger Institute since 2015 following improvements to the screen design and assay.

Compounds are stored in Storage Pods (Roylan Developments) providing a moisture-free, low oxygen environment, and protection from UV damage.

Cells are seeded in 1536-well plates and an Echo555 Acoustic Dispenser (Labcyte) is used to deliver compound doses. Promega CellTiter-Glo is used to measure cell viability at the assay endpoint. Drug treatments use one of two dose-response formats:

7-point dose curve incorporating a half-log dilution step (1000-fold range)
7-point dose curve with 2 x 2-fold dilutions followed by 4 x 4-fold dilutions (1024-fold range)
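The fold ranges quoted for these formats follow directly from the dilution steps. The sketch below builds each series from a maximum concentration and checks the range. The maximum concentration of 10 is an arbitrary placeholder, and applying the two 2-fold steps at the high-concentration end of the mixed format is an assumption about ordering.

```python
import math


def dose_series(max_conc, dilution_factors):
    """Concentrations from highest to lowest, applying each dilution factor in turn."""
    doses = [max_conc]
    for factor in dilution_factors:
        doses.append(doses[-1] / factor)
    return doses


def fold_range(doses):
    """Ratio between the highest and lowest concentration in a series."""
    return max(doses) / min(doses)


# 7 points, half-log (sqrt(10)-fold) steps: 6 steps give a 10**3 = 1000-fold range.
half_log = dose_series(10.0, [math.sqrt(10)] * 6)

# 7 points, two 2-fold steps then four 4-fold steps: 2**2 * 4**4 = 1024-fold range.
mixed = dose_series(10.0, [2, 2, 4, 4, 4, 4])

print(len(half_log), round(fold_range(half_log)))
print(len(mixed), round(fold_range(mixed)))
```

The same helper reproduces the 256-fold range of both GDSC1 formats, for example `dose_series(10.0, [2] * 8)` for the 9-point curve and `dose_series(10.0, [4] * 4)` for the 5-point curve.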

Analysis

Datasets are analysed independently. Raw viability readouts are processed using the R package gdscIC50. Viability data are normalized per plate using available negative and positive controls:

GDSC1 - negative controls were cell treatments with media alone, and the positive controls were blank wells with media but no cells
GDSC2 - the negative controls are treatments of the cell with media + DMSO (the compound vehicle in most cases), while the positive controls are again blanks.
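As a rough illustration of per-plate normalisation with these controls, the sketch below rescales raw intensities so that the mean negative control maps to a viability of 1 and the mean positive control (blank) maps to 0. The formula and the function name are assumptions made for illustration; the gdscIC50 package remains the reference for the actual procedure.

```python
from statistics import mean


def normalise_plate(treated, negative_controls, positive_controls):
    """Scale raw intensities to viability: negative control -> 1, positive control -> 0."""
    nc = mean(negative_controls)  # untreated cells, i.e. full viability
    pc = mean(positive_controls)  # blank wells, i.e. background signal
    if nc == pc:
        raise ValueError("controls are indistinguishable; plate cannot be normalised")
    return [(value - pc) / (nc - pc) for value in treated]


# Toy plate: blanks read about 100, untreated cells about 1000.
viabilities = normalise_plate(
    treated=[100.0, 550.0, 1000.0],
    negative_controls=[990.0, 1010.0],
    positive_controls=[95.0, 105.0],
)
```

Values slightly below 0 or above 1 can arise from well-to-well noise and are not clipped in this sketch.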

Dose-response curves are fitted using the non-linear mixed effects model of Vis et al., incorporated in the gdscIC50 package. All available replicates for an experimental combination (cell line + compound) are used to fit each curve and obtain IC50 and AUC estimates (previous editions of GDSC data have fitted a single dose response). Biomarker discovery uses the GDSCTools python package of Cokelaer et al. to run ANOVA for each dataset independently.

Curve fitting

Fluorescence intensity data from screening plates for each dose-response curve are fitted using a multilevel mixed effects model (PubMed ID: 27180993). Viability across the concentration dilution series is assumed to follow a sigmoid, the classical S-shaped dose-response curve. This function is fitted to all of the cell line-compound combinations screened. In the multilevel mixed effects model used here, two parameters describe the sigmoidal curve: a shape parameter and a position parameter. However, instead of fitting each dose-response series in isolation, the complete set of all cell line-compound combinations screened is fitted simultaneously. The shape parameter varies only across cell lines, while the position parameter varies across cell lines and compounds. This is a faithful and efficient representation of the data and, most importantly, it allows borrowing of strength across all observations, which in turn yields more accurate IC50 estimates.
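The parameter structure described above can be illustrated with a two-parameter logistic curve. This sketch assumes a logistic form on a log-concentration scale and uses made-up parameter values. It shows only how one shape per cell line and one position per cell line and compound pair determine a curve, and does not perform the mixed effects fit itself.

```python
import math


def viability(x, position, shape):
    """Two-parameter sigmoid: viable fraction at (log) concentration x."""
    return 1.0 / (1.0 + math.exp((x - position) / shape))


# Hypothetical fitted values: shape varies by cell line only,
# position varies by (cell line, compound) pair.
shape_by_cell_line = {"line_A": 1.0, "line_B": 2.0}
position_by_pair = {
    ("line_A", "drug_X"): 0.5,
    ("line_A", "drug_Y"): 3.0,
    ("line_B", "drug_X"): -1.0,
}


def predict(cell_line, compound, x):
    """Predicted viability for one cell line and compound at (log) concentration x."""
    return viability(
        x,
        position=position_by_pair[(cell_line, compound)],
        shape=shape_by_cell_line[cell_line],
    )


half_point = predict("line_A", "drug_X", 0.5)
```

Under this assumed form, viability equals 0.5 exactly when x equals the position parameter, so the position plays the role of the IC50 on the chosen concentration scale.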

Statistical analysis

To identify genomic features associated with drug response an analysis of variance (ANOVA) is used to correlate drug response (IC50 values) with genomic alterations in cancer cells including point mutations, recurrently copy number altered chromosomal segments and selected cancer gene re-arrangements (see below for details).

A pan-cancer analysis was performed using all cell lines for which drug response data were available, as well as a cancer-specific analysis for each cancer type where sufficient data were available.

Below are some guidelines that may be useful when interpreting the data from these analyses:

When evaluating gene-drug associations it is important to consider both the statistical significance of the interactions as well as the effect size (i.e. difference in sensitivity between cell lines with or without a genetic feature). Associations with large-effect size and high statistical significance should be prioritised.
Statistical associations between drug sensitivity and cancer features can occur either in the context of a specific cancer type, and/or with a pan-cancer analysis.
We recommend that you use the scatter plot function to investigate the drug sensitivity data underlying each statistical association. Responses driven by outlier data points should be interpreted cautiously.
Our drug screens are for the testing and generation of new hypotheses. Additional validation steps are essential to demonstrate the clinical significance of our findings.
Our analysis is constrained by the set of cell lines and compounds screened, the set of cancer genetic features utilised for our statistical analysis, as well as the inherent limitations associated with the use of in vitro cancer cell lines for modelling drug response.

Oncogenic aberrations

To guide our statistical analyses we have built a comprehensive map of the oncogenic aberrations in >11,000 human tumors using publicly available data from TCGA, ICGC and other studies. This map includes: 1) genes whose mutation patterns in whole exome sequencing (WES) data are consistent with positive selection; and 2) focal recurrently aberrant copy number segments from SNP6 array profiles (RACSs). We identified cancer functional events by combining data across all tumors (pan-cancer) as well as for each cancer type (cancer-specific).

Driver mutations in cancer genes were detected by combining the outputs of three algorithms: MutSigCV, OncodriveFM and OncodriveCLUST (Gonzalez-Perez and Lopez-Bigas, 2012; Lawrence et al., 2013; Rubio-Perez et al., 2015; Tamborero et al., 2013a). Furthermore, we mined the COSMIC database to identify recurrent, and therefore likely oncogenic, gain- or loss-of-function variants within these cancer genes. The detection of RACSs was performed using the ADMIRE algorithm (Chapman et al., 2011; Mok et al., 2009; Shaw et al., 2013; van Dyk et al., 2013). RACSs were filtered to require segments to include at least one protein-coding or antisense gene, but no more than 100.

The set of clinically relevant features identified from patient tumours was utilised for subsequent downstream ANOVA analysis to identify cancer features associated with drug response in cancer cell lines.

ANOVA model

We perform an analysis of variance (ANOVA) to associate drug sensitivity with individual genomic features. A drug–response vector consisting of n IC50 values from treatment of n cell lines was constructed for each drug. The model was linear (no interaction terms), with the dependent variable represented by this vector and factors including tissue type (for the pan-cancer analysis only), microsatellite instability status (for the cancer types with positive samples for this feature) and the status of a cancer feature. For the pan-cancer analysis, the union of all the cancer-specific features was used. Only cancer features occurring in at least 3 cell lines were considered, and features with an identical pattern of positive occurrence were merged, resulting in a final set of 667 (individual or combined) features across 988 cell lines (screened against at least 1 drug). In order to include as many cell lines as possible in the pan-cancer analysis (even those not matching a TCGA type), values of the tissue factor were determined from the GDSCdescription_1 label, whereas for the cancer-specific analyses only cell lines with a matching TCGA label were used. The tissue factors corresponding to ‘digestive_system’ and ‘urogenital_system’ were further sub-classified using the more specific GDSCdescription_2 label. For all the tested gene-drug associations, effect sizes relative to the pooled standard deviation (quantified through Cohen’s d), effect sizes relative to individual standard deviations (quantified through two different Glass deltas, for the feature-positive and the feature-negative populations respectively), p-values and all other statistical scores were obtained from the fitted models. A Benjamini–Hochberg multiple testing correction was then applied to the resulting p-values (correcting together all those obtained in the pan-cancer analysis and, on a cancer-type basis, those obtained in a given cancer-specific analysis). A p-value threshold of 10^-3 and a false discovery rate threshold of 25% were finally used to call significant associations across all the performed analyses.
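The effect-size and multiple-testing steps named above can be sketched with textbook definitions. The code below is not the GDSCTools implementation and does not fit the ANOVA model. It assumes the effect is computed as the mean of the feature-positive group minus the mean of the feature-negative group, that at least two cell lines with non-zero spread are present in each group, and that both thresholds are applied as strict inequalities.

```python
import math
from statistics import mean, stdev


def effect_sizes(ic50_positive, ic50_negative):
    """Cohen's d and the two Glass deltas for feature-positive vs feature-negative lines."""
    n_pos, n_neg = len(ic50_positive), len(ic50_negative)
    diff = mean(ic50_positive) - mean(ic50_negative)
    sd_pos, sd_neg = stdev(ic50_positive), stdev(ic50_negative)
    pooled_sd = math.sqrt(
        ((n_pos - 1) * sd_pos ** 2 + (n_neg - 1) * sd_neg ** 2) / (n_pos + n_neg - 2)
    )
    return {
        "cohens_d": diff / pooled_sd,
        "glass_delta_pos": diff / sd_pos,  # scaled by feature-positive SD
        "glass_delta_neg": diff / sd_neg,  # scaled by feature-negative SD
    }


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def call_significant(p_values, p_threshold=1e-3, fdr_threshold=0.25):
    """Apply the p-value and FDR thresholds to one set of jointly corrected tests."""
    fdr = benjamini_hochberg(p_values)
    return [p < p_threshold and q < fdr_threshold for p, q in zip(p_values, fdr)]


# Toy example: three feature-positive and three feature-negative cell lines.
sizes = effect_sizes([1.0, 2.0, 3.0], [3.0, 4.0, 5.0])
calls = call_significant([0.0005, 0.02, 0.0001, 0.5])
```

With this sign convention a negative effect size means lower IC50 values in feature-positive lines. In line with the text, `call_significant` would be run once on all pan-cancer p-values and once per cancer type for the cancer-specific analyses.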

STR profiles of cell lines

Cell line authentication

Concerns around the identity of cancer cell lines used in scientific research have been increasing over several years and were the topic of a recent editorial in Nature (pubmed 19225471: no authors listed, 2009). Cross-contamination has even been shown to be present in such widely used and supposedly well-characterised groups of cell lines as the NCI60 set. For instance, the NCI60 cell lines OVCAR-8 and NCI-ADR-RES have been shown to be over 97% identical using the Affymetrix SNP6.0 array in this laboratory; this result has been confirmed by multiple laboratories around the world and was recently reported in the scientific literature (pubmed 16504380: Liscovitch and Ravid, 2007). Two other such pairings are also present within the NCI60 series of lines: both M14/MDA-MB-435 (pubmed 17004106: Rae et al., 2007) and U251MG/SNB-19 have identities over 94% when compared using the SNP6 array.

Many of the cell line repositories are now providing short tandem repeat (STR) profiles of the lines they hold allowing identity of lines within the scientific community to be confirmed by a simple assay. We are currently confirming the identity of our cancer cell line set against those provided by the repositories, where possible. Each of the cell lines within our core set is being tested using a panel of 16 STRs (AmpFLSTR Identifiler KIT, ABI), which includes the 9 currently used by most of the cell line repositories (ATCC, Riken, JCRB and DSMZ). We are also providing a single nucleotide polymorphism (SNP) profile based on a panel of 63 SNPs assayed using the Sequenom Genetic Analyser which we use for in-house identity checking whenever a cell line is propagated.

The provision of STR profiles by the cell line repositories and of our in-house cell lines is ongoing and will be updated when appropriate.

Prior to accessing the STR or SNP datasets a Data Access Agreement must be completed: http://www.sanger.ac.uk/genetics/CGP/Archive

The username and password provided can be used to download the STR and SNP profiles for each cell line at the CGP Data Archive: http://www.sanger.ac.uk/research/projects/cancergenome/archive/#t_cl

References

Identity crisis.

No authors listed

Nature 2009;457;7232;935-6

PUBMED: 19225471; DOI: 10.1038/457935b

MDA-MB-435 cells are derived from M14 melanoma cells--a loss for breast cancer, but a boon for melanoma research.

Rae JM, Creighton CJ, Meck JM, Haddad BR and Johnson MD

Division of Hematology/Oncology, Department of Internal Medicine, University of Michigan Medical Center, 1150 West Medical Center Drive, Med Sci I, Room 5323, Ann Arbor, MI 48109-0612, USA. jimmyrae@umich.edu

Background: The tissue of origin of the cell line MDA-MB-435 has been a matter of debate since analysis of DNA microarray data led Ross et al. (2000, Nat Genet 24(3):227-235) to suggest they might be of melanocyte origin due to their similarity to melanoma cell lines. We have previously shown that MDA-MB-435 cells maintained in multiple laboratories are of common origin to those used by Ross et al. and concluded that MDA-MB-435 cells are not a representative model for breast cancer. We could not determine, however, whether the melanoma-like properties of the MDA-MB-435 cell line are the result of misclassification or due to transdifferentiation to a melanoma-like phenotype.

Methods: We used karyotype, comparative genomic hybridization (CGH), and microsatellite polymorphism analyses, combined with bioinformatics analysis of gene expression and single nucleotide polymorphism (SNP) data, to test the hypothesis that the MDA-MB-435 cell line is derived from the melanoma cell line M14.

Results: We show that the MDA-MB-435 and M14 cell lines are essentially identical with respect to cytogenetic characteristics as well as gene expression patterns and that the minor differences found can be explained by phenotypic and genotypic clonal drift.

Conclusions: All currently available stocks of MDA-MB-435 cells are derived from the M14 melanoma cell line and can no longer be considered a model of breast cancer. These cells are still a valuable system for the study of cancer metastasis and the extensive literature using these cells since 1982 represent a valuable new resource for the melanoma research community.

Funded by: NIGMS NIH HHS: U-O1 GM61373

Breast cancer research and treatment 2007;104;1;13-9

PUBMED: 17004106; DOI: 10.1007/s10549-006-9392-8

A case study in misidentification of cancer cell lines: MCF-7/AdrR cells (re-designated NCI/ADR-RES) are derived from OVCAR-8 human ovarian carcinoma cells.

Liscovitch M and Ravid D

Department of Biological Regulation, Weizmann Institute of Science, P.O.B. 26 Rehovot 76100, Israel. moti.liscovitch@weizmann.ac.il

Multidrug-resistant MCF-7 breast adenocarcinoma cells (originally named MCF-7/AdrR cells and later re-designated NCI/ADR-RES) have served as an important and widely used research tool during the last two decades. However, the real identity of these cells has been in doubt since 1998 and has since been debated. The origin of NCI/ADR-RES cells has now been revealed by SNP and karyotypic analyses, carried out at the Sanger Institute and the NCI, respectively. The results of these analyses, recently posted on the Web, show that NCI/ADR-RES cells are derived from OVCAR-8 ovarian adenocarcinoma cells. The case of NCI/ADR-RES cells highlights a wide-spread problem of cell line cross-contamination and misidentification. Fortunately, this is a tractable problem that can be avoided by scrupulous genotyping of cell stocks and adoption of a few simple rules in cell culture practice.

Cancer letters 2007;245;1-2;350-2

PUBMED: 16504380; DOI: 10.1016/j.canlet.2006.01.013

Description of data presented on the volcano plot

A volcano plot is used to visualise the correlation of drug sensitivity data with genetic events calculated using an ANOVA.

We use two different types of volcano plots to represent our data. A gene-specific volcano plot represents the effect of a mutated gene (e.g. BRAF) on the responses to all drugs analysed. A drug-specific volcano plot represents how genomic changes influence response to a specific drug (e.g. BRAF inhibitor PLX4720).

In each volcano plot three pieces of data are represented:

x-axis: The magnitude of the effect that each genetic event has on cell line IC50 values in response to a drug. The effect size is proportional to the difference in mean IC50 between wild-type and mutant cell lines. Values less than 0 indicate drug sensitivity; values greater than 0 indicate drug resistance.
y-axis: The p-value from the ANOVA of a drug-gene interaction on an inverted log10 scale. For clarity the axis is capped at p = 1 x 10^-8, and a plus sign (+) next to a circle indicates that the p-value is smaller than this threshold.
Size of each circle: The number of cell lines contributing to the analysis for a given genetic feature or drug.
Color: Cancer features associated with statistically significant sensitizing or resistance effects (20% FDR) are coloured green and red, respectively. Non-significant associations are grey. Hovering over a circle displays the genetic feature, sample size (number of cell lines with the feature screened), effect size and p-value.
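The mapping from one ANOVA result to a point on the plot can be sketched as follows. The function and field names are hypothetical, the FDR is assumed to be expressed as a fraction rather than a percentage, and the website's own plotting code may differ.

```python
import math

P_VALUE_CAP = 1e-8  # the y-axis is capped at p = 1 x 10^-8


def volcano_point(effect_size, p_value, fdr, n_cell_lines, fdr_threshold=0.20):
    """Translate one association into volcano plot coordinates and styling."""
    capped = p_value < P_VALUE_CAP
    y = -math.log10(max(p_value, P_VALUE_CAP))  # inverted log10 scale
    if fdr < fdr_threshold and effect_size < 0:
        colour = "green"  # significant sensitising effect
    elif fdr < fdr_threshold and effect_size > 0:
        colour = "red"  # significant resistance effect
    else:
        colour = "grey"  # not significant
    return {
        "x": effect_size,
        "y": y,
        "plus_sign": capped,  # drawn with a (+) when p is below the cap
        "size": n_cell_lines,
        "colour": colour,
    }


# Toy associations with made-up statistics.
sensitive = volcano_point(-1.5, 1e-10, 0.001, n_cell_lines=40)
resistant = volcano_point(0.8, 0.01, 0.15, n_cell_lines=12)
neutral = volcano_point(0.8, 0.2, 0.6, n_cell_lines=5)
```

Because the cap is applied before taking the logarithm, every association with p below 1 x 10^-8 lands on the same top row of the plot and is distinguished only by its plus sign.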

Publications

A Landscape of Pharmacogenomic Interactions in Cancer.

Iorio F, Knijnenburg TA, Vis DJ, Bignell GR, Menden MP, Schubert M, Aben N, Gonçalves E, Barthorpe S, Lightfoot H, Cokelaer T, Greninger P, van Dyk E, Chang H, de Silva H, Heyn H, Deng X, Egan RK, Liu Q, Mironenko T, Mitropoulos X, Richardson L, Wang J, Zhang T, Moran S, Sayols S, Soleimani M, Tamborero D, Lopez-Bigas N, Ross-Macdonald P, Esteller M, Gray NS, Haber DA, Stratton MR, Benes CH, Wessels LFA, Saez-Rodriguez J, McDermott U and Garnett MJ

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge CB10 1SA, UK; Wellcome Trust Sanger Institute, Wellcome Genome Campus, Cambridge CB10 1SA, UK.

Systematic studies of cancer genomes have provided unprecedented insights into the molecular nature of cancer. Using this information to guide the development and application of therapies in the clinic is challenging. Here, we report how cancer-driven alterations identified in 11,289 tumors from 29 tissues (integrating somatic mutations, copy number alterations, DNA methylation, and gene expression) can be mapped onto 1,001 molecularly annotated human cancer cell lines and correlated with sensitivity to 265 drugs. We find that cell lines faithfully recapitulate oncogenic alterations identified in tumors, find that many of these associate with drug sensitivity/resistance, and highlight the importance of tissue lineage in mediating drug response. Logic-based modeling uncovers combinations of alterations that sensitize to drugs, while machine learning demonstrates the relative importance of different data types in predicting drug response. Our analysis and datasets are rich resources to link genotypes with cellular phenotypes and to identify therapeutic options for selected cancer sub-populations.

Funded by: Cancer Research UK; European Research Council: 268626; Marie Curie; NCI NIH HHS: U24 CA143835; Wellcome Trust: 086375, 102696

Cell 2016;166;3;740-754

PUBMED: 27397505; PMC: 4967469; DOI: 10.1016/j.cell.2016.06.017

Genomics of Drug Sensitivity in Cancer (GDSC): a resource for therapeutic biomarker discovery in cancer cells.

Yang W, Soares J, Greninger P, Edelman EJ, Lightfoot H, Forbes S, Bindal N, Beare D, Smith JA, Thompson IR, Ramaswamy S, Futreal PA, Haber DA, Stratton MR, Benes C, McDermott U and Garnett MJ

Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.

Alterations in cancer genomes strongly influence clinical responses to treatment and in many instances are potent biomarkers for response to drugs. The Genomics of Drug Sensitivity in Cancer (GDSC) database (www.cancerRxgene.org) is the largest public resource for information on drug sensitivity in cancer cells and molecular markers of drug response. Data are freely available without restriction. GDSC currently contains drug sensitivity data for almost 75 000 experiments, describing response to 138 anticancer drugs across almost 700 cancer cell lines. To identify molecular markers of drug response, cell line drug sensitivity data are integrated with large genomic datasets obtained from the Catalogue of Somatic Mutations in Cancer database, including information on somatic mutations in cancer genes, gene amplification and deletion, tissue type and transcriptional data. Analysis of GDSC data is through a web portal focused on identifying molecular biomarkers of drug sensitivity based on queries of specific anticancer drugs or cancer genes. Graphical representations of the data are used throughout with links to related resources and all datasets are fully downloadable. GDSC provides a unique resource incorporating large drug sensitivity and genomic datasets to facilitate the discovery of new therapeutic biomarkers for cancer therapies.

Funded by: Cancer Research UK; NIGMS NIH HHS: T32 GM071340; Wellcome Trust: 086357

Nucleic acids research 2013;41;Database issue;D955-61

PUBMED: 23180760; PMC: 3531057; DOI: 10.1093/nar/gks1111

Systematic identification of genomic markers of drug sensitivity in cancer cells.

Garnett MJ, Edelman EJ, Heidorn SJ, Greenman CD, Dastur A, Lau KW, Greninger P, Thompson IR, Luo X, Soares J, Liu Q, Iorio F, Surdez D, Chen L, Milano RJ, Bignell GR, Tam AT, Davies H, Stevenson JA, Barthorpe S, Lutz SR, Kogera F, Lawrence K, McLaren-Douglas A, Mitropoulos X, Mironenko T, Thi H, Richardson L, Zhou W, Jewitt F, Zhang T, O'Brien P, Boisvert JL, Price S, Hur W, Yang W, Deng X, Butler A, Choi HG, Chang JW, Baselga J, Stamenkovic I, Engelman JA, Sharma SV, Delattre O, Saez-Rodriguez J, Gray NS, Settleman J, Futreal PA, Haber DA, Stratton MR, Ramaswamy S, McDermott U and Benes CH

Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.

Clinical responses to anticancer therapies are often restricted to a subset of patients. In some cases, mutated cancer genes are potent biomarkers for responses to targeted agents. Here, to uncover new biomarkers of sensitivity and resistance to cancer therapeutics, we screened a panel of several hundred cancer cell lines--which represent much of the tissue-type and genetic diversity of human cancers--with 130 drugs under clinical and preclinical investigation. In aggregate, we found that mutated cancer genes were associated with cellular response to most currently available cancer drugs. Classic oncogene addiction paradigms were modified by additional tissue-specific or expression biomarkers, and some frequently mutated genes were associated with sensitivity to a broad range of therapeutic agents. Unexpected relationships were revealed, including the marked sensitivity of Ewing's sarcoma cells harbouring the EWS (also known as EWSR1)-FLI1 gene translocation to poly(ADP-ribose) polymerase (PARP) inhibitors. By linking drug activity to the functional complexity of cancer genomes, systematic pharmacogenomic profiling in cancer cell lines provides a powerful biomarker discovery platform to guide rational cancer therapeutic strategies.

Funded by: Howard Hughes Medical Institute; NHGRI NIH HHS: 1U54HG006097-01; NIGMS NIH HHS: P41GM079575-02; Wellcome Trust: 086357

Nature 2012;483;7391;570-5

PUBMED: 22460902; PMC: 3349233; DOI: 10.1038/nature11005

Background reading

The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity.

Barretina J, Caponigro G, Stransky N, Venkatesan K, Margolin AA, Kim S, Wilson CJ, Lehár J, Kryukov GV, Sonkin D, Reddy A, Liu M, Murray L, Berger MF, Monahan JE, Morais P, Meltzer J, Korejwa A, Jané-Valbuena J, Mapa FA, Thibault J, Bric-Furlong E, Raman P, Shipway A, Engels IH, Cheng J, Yu GK, Yu J, Aspesi P, de Silva M, Jagtap K, Jones MD, Wang L, Hatton C, Palescandolo E, Gupta S, Mahan S, Sougnez C, Onofrio RC, Liefeld T, MacConaill L, Winckler W, Reich M, Li N, Mesirov JP, Gabriel SB, Getz G, Ardlie K, Chan V, Myer VE, Weber BL, Porter J, Warmuth M, Finan P, Harris JL, Meyerson M, Golub TR, Morrissey MP, Sellers WR, Schlegel R and Garraway LA

The Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA.

The systematic translation of cancer genomic data into knowledge of tumour biology and therapeutic possibilities remains challenging. Such efforts should be greatly aided by robust preclinical model systems that reflect the genomic diversity of human cancers and for which detailed genetic and pharmacological annotation is available. Here we describe the Cancer Cell Line Encyclopedia (CCLE): a compilation of gene expression, chromosomal copy number and massively parallel sequencing data from 947 human cancer cell lines. When coupled with pharmacological profiles for 24 anticancer drugs across 479 of the cell lines, this collection allowed identification of genetic, lineage, and gene-expression-based predictors of drug sensitivity. In addition to known predictors, we found that plasma cell lineage correlated with sensitivity to IGF1 receptor inhibitors; AHR expression was associated with MEK inhibitor efficacy in NRAS-mutant lines; and SLFN11 expression predicted sensitivity to topoisomerase inhibitors. Together, our results indicate that large, annotated cell-line collections may help to enable preclinical stratification schemata for anticancer agents. The generation of genetic predictions of drug response in the preclinical setting and their incorporation into cancer clinical trial design could speed the emergence of 'personalized' therapeutic regimens.

Funded by: NCI NIH HHS: R33 CA126674, R33 CA126674-04, R33 CA155554, R33 CA155554-02; NIH HHS: DP2 OD002750, DP2 OD002750-01

Nature 2012;483;7391;603-7

PUBMED: 22460905; PMC: 3320027; DOI: 10.1038/nature11003

Reproducible pharmacogenomic profiling of cancer cell line panels.

Haverty PM, Lin E, Tan J, Yu Y, Lam B, Lianoglou S, Neve RM, Martin S, Settleman J, Yauch RL and Bourgon R

Department of Bioinformatics and Computational Biology, Genentech Inc., 1 DNA Way, South San Francisco, California 94080, USA.

The use of large-scale genomic and drug response screening of cancer cell lines depends crucially on the reproducibility of results. Here we consider two previously published screens, plus a later critique of these studies. Using independent data, we show that consistency is achievable, and provide a systematic description of the best laboratory and analysis practices for future studies.

Nature 2016;533;7603;333-7

PUBMED: 27193678; DOI: 10.1038/nature17987

Pharmacogenomic agreement between two cancer cell line data sets.

Cancer Cell Line Encyclopedia Consortium and Genomics of Drug Sensitivity in Cancer Consortium

Large cancer cell line collections broadly capture the genomic diversity of human cancers and provide valuable insight into anti-cancer drug response. Here we show substantial agreement and biological consilience between drug sensitivity measurements and their associated genomic predictors from two publicly available large-scale pharmacogenomics resources: The Cancer Cell Line Encyclopedia and the Genomics of Drug Sensitivity in Cancer databases.

Funded by: Cancer Research UK: A16629; NHGRI NIH HHS: 1U54HG006097-01; Wellcome Trust: 086357, 102696

Nature 2015;528;7580;84-7

PUBMED: 26570998; DOI: 10.1038/nature15736

Harnessing Connectivity in a Large-Scale Small-Molecule Sensitivity Dataset.

Seashore-Ludlow B, Rees MG, Cheah JH, Cokol M, Price EV, Coletti ME, Jones V, Bodycombe NE, Soule CK, Gould J, Alexander B, Li A, Montgomery P, Wawer MJ, Kuru N, Kotz JD, Hon CS, Munoz B, Liefeld T, Dančík V, Bittker JA, Palmer M, Bradner JE, Shamji AF, Clemons PA and Schreiber SL

Center for the Science of Therapeutics, Broad Institute, Cambridge, Massachusetts.

Unlabelled: Identifying genetic alterations that prime a cancer cell to respond to a particular therapeutic agent can facilitate the development of precision cancer medicines. Cancer cell-line (CCL) profiling of small-molecule sensitivity has emerged as an unbiased method to assess the relationships between genetic or cellular features of CCLs and small-molecule response. Here, we developed annotated cluster multidimensional enrichment analysis to explore the associations between groups of small molecules and groups of CCLs in a new, quantitative sensitivity dataset. This analysis reveals insights into small-molecule mechanisms of action, and genomic features that associate with CCL response to small-molecule treatment. We are able to recapitulate known relationships between FDA-approved therapies and cancer dependencies and to uncover new relationships, including for KRAS-mutant cancers and neuroblastoma. To enable the cancer community to explore these data, and to generate novel hypotheses, we created an updated version of the Cancer Therapeutic Response Portal (CTRP v2).

Significance: We present the largest CCL sensitivity dataset yet available, and an analysis method integrating information from multiple CCLs and multiple small molecules to identify CCL response predictors robustly. We updated the CTRP to enable the cancer research community to leverage these data and analyses.

Funded by: Howard Hughes Medical Institute; NCI NIH HHS: RC2 CA148399, U01 CA176152

Cancer discovery 2015;5;11;1210-23

PUBMED: 26482930; PMC: 4631646; DOI: 10.1158/2159-8290.CD-15-0235

An interactive resource to identify cancer genetic and lineage dependencies targeted by small molecules.

Basu A, Bodycombe NE, Cheah JH, Price EV, Liu K, Schaefer GI, Ebright RY, Stewart ML, Ito D, Wang S, Bracha AL, Liefeld T, Wawer M, Gilbert JC, Wilson AJ, Stransky N, Kryukov GV, Dancik V, Barretina J, Garraway LA, Hon CS, Munoz B, Bittker JA, Stockwell BR, Khabele D, Stern AM, Clemons PA, Shamji AF and Schreiber SL

The Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA.

The high rate of clinical response to protein-kinase-targeting drugs matched to cancer patients with specific genomic alterations has prompted efforts to use cancer cell line (CCL) profiling to identify additional biomarkers of small-molecule sensitivities. We have quantitatively measured the sensitivity of 242 genomically characterized CCLs to an Informer Set of 354 small molecules that target many nodes in cell circuitry, uncovering protein dependencies that: (1) associate with specific cancer-genomic alterations and (2) can be targeted by small molecules. We have created the Cancer Therapeutics Response Portal (http://www.broadinstitute.org/ctrp) to enable users to correlate genetic features to sensitivity in individual lineages and control for confounding factors of CCL profiling. We report a candidate dependency, associating activating mutations in the oncogene β-catenin with sensitivity to the Bcl-2 family antagonist, navitoclax. The resource can be used to develop novel therapeutic hypotheses and to accelerate discovery of drugs matched to patients by their cancer genotype and lineage.

Funded by: NCI NIH HHS: K08 CA148887, R01 CA097061, R01 CA161061, RC2 CA148399, U54 CA112962

Cell 2013;154;5;1151-61

PUBMED: 23993102; PMC: 3954635; DOI: 10.1016/j.cell.2013.08.003

Cell line-based platforms to evaluate the therapeutic efficacy of candidate anticancer agents.

Sharma SV, Haber DA and Settleman J

Center for Molecular Therapeutics, Massachusetts General Hospital Cancer Center and Harvard Medical School, 149 13th Street, Charlestown, MA 02129, USA.

Efforts to discover new cancer drugs and predict their clinical activity are limited by the fact that laboratory models to test drug efficacy do not faithfully recapitulate this complex disease. One important model system for evaluating candidate anticancer agents is human tumour-derived cell lines. Although cultured cancer cells can exhibit distinct properties compared with their naturally growing counterparts, recent technologies that facilitate the parallel analysis of large panels of such lines, together with genomic technologies that define their genetic constitution, have revitalized efforts to use cancer cell lines to assess the clinical utility of new investigational cancer drugs and to discover predictive biomarkers.

Nature reviews. Cancer 2010;10;4;241-53

PUBMED: 20300105; DOI: 10.1038/nrc2820

High-throughput lung cancer cell line screening for genotype-correlated sensitivity to an EGFR kinase inhibitor.

McDermott U, Sharma SV and Settleman J

Center for Molecular Therapeutics, Massachusetts General Hospital Cancer Center and Harvard Medical School, Charlestown, Massachusetts, USA.

Human cancer cell lines that can be propagated and manipulated in culture have proven to be excellent models for studying many aspects of gene function in cancer. In addition, they can provide a powerful system for assessing the molecular determinants of sensitivity to anticancer drugs. They have also been used in recent studies to identify genomic alterations and gene expression patterns that provide important insights into the genetic features that distinguish the properties of tumor cells associated with similar histologies. We have established a large repository of human tumor cell lines (>1000) corresponding to a wide variety of tumor types, and we have developed a methodology for profiling the collection for sensitivity to putative anticancer compounds. The rationale for examining tumor cell lines on this relatively large scale reflects accumulating evidence indicating that there is substantial genetic heterogeneity among human tumor cells, even those derived from tumors of similar histologies. Thus, to develop an accurate picture of the molecular determinants of tumorigenesis and response to therapy, it is essential to study the nature of such heterogeneity in a relatively large sample set. Here, we describe the methodologies used to conduct such screens and we describe a "proof-of-concept" screen using the EGFR kinase inhibitor, erlotinib (Tarceva), with a panel of lung cancer lines to demonstrate a correlation between EGFR mutations and drug sensitivity.

Methods in enzymology 2008;438;331-41

PUBMED: 18413259; DOI: 10.1016/S0076-6879(07)38023-3

Identification of genotype-correlated sensitivity to selective kinase inhibitors by using high-throughput tumor cell line profiling.

McDermott U, Sharma SV, Dowell L, Greninger P, Montagut C, Lamb J, Archibald H, Raudales R, Tam A, Lee D, Rothenberg SM, Supko JG, Sordella R, Ulkus LE, Iafrate AJ, Maheswaran S, Njauw CN, Tsao H, Drew L, Hanke JH, Ma XJ, Erlander MG, Gray NS, Haber DA and Settleman J

Center for Molecular Therapeutics, Massachusetts General Hospital Cancer Center and Harvard Medical School, 149 13th Street, Charlestown, MA 02129, USA.

Kinase inhibitors constitute an important new class of cancer drugs, whose selective efficacy is largely determined by underlying tumor cell genetics. We established a high-throughput platform to profile 500 cell lines derived from diverse epithelial cancers for sensitivity to 14 kinase inhibitors. Most inhibitors were ineffective against unselected cell lines but exhibited dramatic cell killing of small nonoverlapping subsets. Cells with exquisite sensitivity to EGFR, HER2, MET, or BRAF kinase inhibitors were marked by activating mutations or amplification of the drug target. Although most cell lines recapitulated known tumor-associated genotypes, the screen revealed low-frequency drug-sensitizing genotypes in tumor types not previously associated with drug susceptibility. Furthermore, comparing drugs thought to target the same kinase revealed striking differences, predictive of clinical efficacy. Genetically defined cancer subsets, irrespective of tissue type, predict response to kinase inhibitors, and provide an important preclinical model to guide early clinical applications of novel targeted inhibitors.

Funded by: NCI NIH HHS: R01 CA115830

Proceedings of the National Academy of Sciences of the United States of America 2007;104;50;19936-41

PUBMED: 18077425; PMC: 2148401; DOI: 10.1073/pnas.0707498104

Publications utilising our data

Title	Journal	PubMed ID
A novel heterogeneous network-based method for drug response prediction in cancer cell lines.	Sci Rep	PMID: 29463808
PharmacoDB: an integrative database for mining in vitro anticancer drug screening studies	Nucleic Acids Res.	PMCID: PMC5753377
EWS/FLI Confers Tumor Cell Synthetic Lethality to CDK12 Inhibition in Ewing Sarcoma	Cancer Cell	PMID: 29358035
Unearthing new genomic markers of drug response by improved measurement of discriminative power.	BMC Med Genomics	PMID: 29409485
DeSigN: connecting gene expression with therapeutics for drug repurposing and development	BMC Genomics	PMID: 28198666
Discordancy Partitioning for Validating Potentially Inconsistent Pharmacogenomic Studies	Sci Rep	PMID: 29123200
A tool for discovering drug sensitivity and gene expression associations in cancer cells	PLOS one	PMCID: PMC5409143
Systematic assessment of multi-gene predictors of pan-cancer cell line sensitivity to drugs exploiting gene expression data	F1000Research	PMID: 28299173
Intra- and interspecies gene expression models for predicting drug response in canine osteosarcoma	BMC Bioinformatics	PMCID: PMC4759767
Reproducible pharmacogenomic profiling of cancer cell line panels	Nature	PMID: 27193678
Pharmacogenomic agreement between two cancer cell line data sets	Nature	PMID: 26570998
CATTLE (CAncer treatment treasury with linked evidence): An integrated knowledge base for personalized oncology research and practice	CPT Pharmacometrics Syst Pharmacol	PMID: 28296354
Precision and recall oncology: combining multiple gene mutations for improved identification of drug-sensitive tumours	Oncotarget	PMID: 29228590
Suppression of 19S proteasome subunits marks emergence of an altered cell state in diverse cancers	Proc Natl Acad Sci	PMID: 28028240
Pharmacoproteomic characterisation of human colon and rectal cancer	Mol Syst Biol.	PMID: 29101300
Pharmaco-genomic investigations of organo-iridium anticancer complexes reveal novel mechanism of action	Metallomics	PMID: 29131211
Colorectal Cancer Cell Line Proteomes Are Representative of Primary Tumors and Predict Drug Sensitivity	Gastroenterology	PMID: 28625833
Integrated genomic analysis of recurrence-associated small non-coding RNAs in oesophageal cancer	Gut	PMID: 27507904
Drug Sensitivity Assays of Human Cancer Organoid Cultures	Methods Mol Biol	PMID: 27628132
Acquired savolitinib resistance in non-small cell lung cancer arises via multiple mechanisms that converge on MET-independent mTOR and MYC activation	Oncotarget	PMID: 27472392
Cancer biomarker discovery is improved by accounting for variability in general levels of drug sensitivity in pre-clinical models	Genome Biol.	PMID: 27654937
Identifying anti-cancer drug response related genes using an integrative analysis of transcriptomic and genomic variations with cell line-based drug perturbations.	Oncotarget	PMCID: PMC4891048
Integrating Domain Specific Knowledge and Network Analysis to Predict Drug Sensitivity of Cancer Cell Lines.	PloS One	PMID: 27607242
Oncogenic KRAS triggers MAPK-dependent errors in mitosis and MYC-dependent sensitivity to anti-mitotic agents	Scientific Reports	PMID: 27412232
Logic models to predict continuous outputs based on binary inputs with an application to personalized cancer therapy	Scientific Reports	PMID: 27876821
Consistency in large pharmacogenomic studies	Nature	PMID: 27905415
Drug response consistency in CCLE and CGP	Nature	PMID: 27905419
Consistency in drug response profiling	Nature	PMID: 27905421
Cancer biomarker discovery is improved by accounting for variability in general levels of drug sensitivity in pre-clinical models	Genome Biology	PMCID: PMC5031330
PharmacoGx: an R package for analysis of large pharmacogenomic datasets	Bioinformatics	PMID: 26656004
A Vulnerability of a Subset of Colon Cancers with Potential Clinical Utility	Cell	PMID: 27058664
HER2+ Cancer Cell Dependence on PI3K vs. MAPK Signaling Axes is determined by Expression of EGFR, ERBB3 and CDKN1B	PLoS Comput Biol	PMID: 27035903
Integrating heterogeneous drug sensitivity data from cancer pharmacogenomic studies.	Oncotarget	PMID: 27322211
The tandem duplicator phenotype as a distinct genomic configuration in cancer	PNAS PLUS	PMID: 27071093
Assessment of pharmacogenomic agreement	F1000Research	PMID: 27408686
Multilevel models improve precision and speed of IC50 estimates	Pharmacogenomics	PMID: 27180993
Identification of differential PI3K pathway target dependencies in T-cell acute lymphoblastic leukemia through a large cancer cell panel screen	Oncotarget	PMID: 26989080
Exploitation of the Apoptosis-Primed State of MYCN-Amplified Neuroblastoma to Develop a Potent and Specific Targeted Therapy Combination	Cancer Cell	PMID: 26859456
Integration of genomic, transcriptomic and proteomic data identifies two biologically distinct subtypes of invasive lobular breast cancer	Sci Rep	PMID: 26729235
Prediction of cancer cell sensitivity to natural products based on genomic and chemical properties	PeerJ	PMID: 26644976
From drug response to target addiction scoring in cancer cell models	Disease Models & Mechanisms	PMID: 26438695
Predicting Anticancer Drug Responses Using a Dual-Layer Integrated Cell Line-Drug Network Model	PLOS one	PMCID: PMC4587957
Optimal Drug Prediction From Personal Genomics profiles	IEEE Journal of Biomedical and Health Informatics	PMID: 25781964
High selectivity of PI3Kβ inhibitors in SETD2-mutated renal clear cell carcinoma	J BUON	PMID: 26537074
Compromising the 19S proteasome complex protects cells from reduced flux through the proteasome	eLife	PMCID: PMC4551903
Integrated Analysis of Transcriptome in Cancer Patient-Derived Xenografts	PLOS one	PMID: 25951608
PI3Kb Inhibitor TGX221 Selectively Inhibits Renal Cell Carcinoma Cells with Both VHL and SETD2 mutations and Links Multiple Pathways	Scientific Reports	PMID: 25853938
Characterization of the Tyrosine Kinase-Regulated Proteome in Breast Cancer by Combined use of RNA interference (RNAi) and Stable Isotope Labeling with Amino Acids in Cell Culture (SILAC) Quantitative Proteomics	Molecular & Cellular Proteomics	PMID: 26089344
Loss of MLH1 confers resistance to PI3Kβ inhibitors in renal clear cell carcinoma with SETD2 mutation	Tumour Biol	PMID: 25528216
Denoising perturbation signatures reveals an actionable AKT-signaling gene module underlying a poor clinical outcome in endocrine treated ER+ breast cancer	Genome Biology	PMID: 25886003
Cell Index Database (CELLX): a web tool for cancer precision medicine	Pac Symp Biocomput.	PMID: 25592564
A comprehensive transcriptional portrait of human cancer cell lines	Nat Biotechnol.	PMID: 25485619
Predicting Response to Histone Deacetylase Inhibitors Using High-Throughput Genomics	J Natl Cancer Inst	PMCID: PMC4643634
Using drug response data to identify molecular effectors, and molecular "omic" data to identify candidate drugs in cancer.	Hum Genet.	PMID: 25213708
Assessment of ABT-263 activity across a cancer cell line collection leads to a potent combination therapy for small-cell lung cancer	Proc Natl Acad Sci	PMID: 25737542
CDK4/6 inhibitor suppresses gastric cancer with CDKN2A mutation	Int J Clin Exp Med.	PMID: 26380006
Improved large-scale prediction of growth inhibition patterns using the NCI60 cancer cell line panel	Bioinformatics.	PMID: 26351271
Co-active receptor tyrosine kinases mitigate the effect of FGFR inhibitors in FGFR1-amplified lung cancers with low FGFR1 protein expression	Oncogene.	PMID: 26549034
Recursive Random Lasso (RRLasso) for Identifying Anti-Cancer Drug Targets	PLoS One.	PMID: 26544691
Data Mining Approaches for Genomic Biomarker Development: Applications Using Drug Screening Data from the Cancer Genome Project and the Cancer Cell Line Encyclopedia	PLoS One	PMID: 26132924
LIM kinase inhibitors disrupt mitotic microtubule organization and impair tumor cell proliferation	Oncotarget	PMID: 26540348
A Semi-Supervised Approach for Refining Transcriptional Signatures of Drug Response and Repositioning Predictions	PLoS ONE	PMID: 26452147
Designing of promiscuous inhibitors against pancreatic cancer cell lines	Scientific Reports	PMID: 24728108
Modeling RAS phenotype in colorectal cancer uncovers novel molecular traits of RAS dependency and improves prediction of response to targeted agents in patients	Clin Cancer Res.	PMID: 24170544
Clinical drug response can be predicted using baseline gene expression levels and in vitro drug sensitivity in cell lines.	Genome Biol.	PMID: 24580837
The REST gene signature predicts drug sensitivity in neuroblastoma cell lines and is significantly associated with neuroblastoma tumor stage	Int J Mol Sci.	PMCID: PMC4139778
Disruption of CRAF-Mediated MEK Activation Is Required for Effective MEK Inhibition in KRAS Mutant Tumors	Cancer Cell	PMID: 24746704
A community effort to assess and improve drug sensitivity prediction algorithms	Nature Biotechnology	PMID: 24880487
The evolving role of cancer cell line-based screens to define the impact of cancer genomes on drug response	Curr Opin Genet Dev.	PMID: 24607840
Inconsistency in large pharmacogenomic studies	Nature	PMID: 24284626
Targeting MYCN in neuroblastoma by BET bromodomain inhibition	Cancer Discov.	PMID: 26631615
Genomics of Drug Sensitivity in Cancer (GDSC): a resource for therapeutic biomarker discovery in cancer cells	Nucleic Acids Res.	PMID: 23180760
Machine learning prediction of cancer cell sensitivity to drugs based on genomic and chemical properties	PLoS One.	PMID: 23646105
VS-5584, a novel and highly selective PI3K/mTOR kinase inhibitor for the treatment of cancer	Mol Cancer Ther.	PMID: 23270925
Mcl-1 and FBW7 control a dominant survival pathway underlying HDAC and Bcl-2 inhibitor synergy in squamous cell carcinoma	Cancer Discov.	PMID: 23274910
Systematic identification of genomic markers of drug sensitivity in cancer cells	Nature	PMID: 22460902
MED12 controls the response to multiple cancer drugs through regulation of TGF-β receptor signaling	Cell	PMID: 23178117
Subtypes of primary colorectal tumors correlate with response to targeted treatment in colorectal cell lines	BMC Med Genomics	PMID: 23272949
Integrative and Personalized QSAR Analysis in Cancer by Kernelized Bayesian Matrix Factorization		PMID: 25046554
Systematic Assessment of analytical methods for drug sensitivity predictions from cancer cell line data	Sage Bionetworks	PMID: 24297534

Contact us

For questions regarding data, analyses and results please contact us by email at: cancerrxgene@sanger.ac.uk.

We are committed to working with collaborators to extend the scope of our research. We currently collaborate with more than 30 organisations from academia, biotech and the pharmaceutical industry. Please feel free to contact us to initiate a discussion on potential collaborations by email at: GDSCscreening@sanger.ac.uk.

Interested in receiving 'Genomics of Drug Sensitivity in Cancer' news and release information? Then sign up for the Translation-announce mailing list.
