Help and Documentation

Screening methodologies

Cell line verification and propagation

All cell lines were sourced from commercial vendors. Cells were grown in RPMI or DMEM/F12 medium supplemented with 5% or 10% FBS and penicillin/streptomycin, and maintained at 37°C in a humidified atmosphere at 5% CO2. Cell lines were propagated in these two media in order to minimize the potential effect of varying the media on sensitivity to therapeutic compounds in our assay, and to facilitate high-throughput screening. To exclude cross-contaminated or synonymous lines, a panel of 92 SNPs was profiled for each cell line (Sequenom, San Diego, CA) and a pair-wise comparison score calculated. In addition, to confirm the identity of each cell line we performed short tandem repeat (STR) analysis (AmpFlSTR Identifiler, Applied Biosystems, Carlsbad, CA) and matched this to an existing STR profile generated by the providing repository.
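The exact pair-wise comparison score is not specified here. As a minimal sketch, assuming the score is simply the fraction of SNPs with identical genotype calls in both lines, a comparison could look like this (the function name, SNP identifiers and genotype calls are all hypothetical):

```python
def snp_concordance(profile_a, profile_b):
    """Fraction of SNPs with identical genotype calls in both profiles.

    Assumption: SNPs with a missing call (None) in either profile are skipped.
    """
    shared = [
        snp for snp in profile_a
        if snp in profile_b
        and profile_a[snp] is not None
        and profile_b[snp] is not None
    ]
    if not shared:
        raise ValueError("no SNPs called in both profiles")
    matches = sum(1 for snp in shared if profile_a[snp] == profile_b[snp])
    return matches / len(shared)


# Hypothetical genotype calls keyed by hypothetical SNP identifiers.
line_a = {"snp1": "AA", "snp2": "AG", "snp3": "GG", "snp4": None}
line_b = {"snp1": "AA", "snp2": "AG", "snp3": "AG", "snp4": "CC"}

print(f"pair-wise concordance: {snp_concordance(line_a, line_b):.2f}")
```

A score close to 1 between two supposedly different lines would flag them as possibly cross-contaminated or synonymous.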

Screening drugs

Compounds were provided by academic collaborators or were sourced from commercial vendors. Where possible we have provided a full description for each compound including its name, alternative names or synonyms, PubChem and/or ChEMBL IDs, screening concentration as well as reported therapeutically relevant molecular target(s). Compounds were generally stored as 10 mM aliquots at -80°C, and were subjected to a maximum of 5 freeze-thaw cycles. The range of concentrations selected for each compound was based on in vitro data of concentrations inhibiting relevant kinase activity and cell viability, as well as clinical data indicating peak and trough plasma concentrations in human subjects.

Cell viability assays

Cells were seeded in either 96-well or 384-well microplates at ~15% confluency in medium with 5% or 10% FBS and penicillin/streptomycin. The optimal cell number for each cell line was determined to ensure that each was in growth phase at the end of the assay. For adherent cell lines, after overnight incubation cells were treated with 9 concentrations of each compound (2-fold dilution series over a 256-fold concentration range) and then returned to the incubator for assay at a 72-hour time point. Cells were fixed in 4% formaldehyde for 30 minutes and then stained with 1 µM of the fluorescent nucleic acid stain Syto60 (Invitrogen) for 1 hour. For suspension cell lines, cells were treated with compound immediately following plating, returned to the incubator for a 72-hour time point, then stained with 55 µg/ml Resazurin (Sigma) prepared in glutathione-free media for 4 hours. Quantitation of fluorescent signal intensity was performed using a fluorescent plate reader at excitation and emission wavelengths of 630/695 nm for Syto60, and 535/595 nm for Resazurin. All screening plates were subjected to stringent quality control measures using a modified Z-factor score.
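The modification applied to the Z-factor is not described here. For orientation only, the sketch below computes the standard Z'-factor from the drug-free controls and the no-cell blanks on a plate; it is not the modified score used in the screen, and the function name and well values are hypothetical.

```python
from statistics import mean, stdev


def z_prime(controls, blanks):
    """Standard Z'-factor: 1 - 3 * (sd_controls + sd_blanks) / |mean difference|."""
    separation = abs(mean(controls) - mean(blanks))
    if separation == 0:
        raise ValueError("controls and blanks have the same mean")
    return 1.0 - 3.0 * (stdev(controls) + stdev(blanks)) / separation


# Hypothetical raw fluorescence intensities from one plate.
control_wells = [100.0, 102.0, 98.0, 100.0]  # cells, no drug
blank_wells = [10.0, 11.0, 9.0, 10.0]        # no cells

print(f"Z' = {z_prime(control_wells, blank_wells):.3f}")
```

Values near 1 indicate well-separated controls and blanks with little spread; a plate with a low value would be a candidate for exclusion.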

Genomic and transcriptional characterization of cancer cell lines

A panel of the most frequently mutated cancer genes was sequenced to base-pair resolution across all coding exons for each gene by capillary sequencing in our panel of human cancer cell lines, which formed the basis for the cell lines chosen for this drug screen. The presence of commonly rearranged cancer genes (e.g. BCR-ABL, MLL-AFF1 and EWS-FLI1) was determined across the drug screen cell line panel by designing breakpoint-specific sequence primers that enabled detection of the rearrangement following capillary sequencing. Analysis of microsatellite instability (MSI) was carried out according to the guidelines set down by "The International Workshop on Microsatellite Instability and RER Phenotypes in Cancer Detection and Familial Predisposition". Samples were screened using the markers BAT25, BAT26, D5S346, D2S123 and D17S250 and were characterised as MSI if two or more markers showed instability. Total integral copy number values across the footprints of the cancer genes were determined from Affymetrix SNP6.0 microarray data using the 'PICNIC' algorithm to predict copy number segments in each of the cell lines. For a gene to be classified as amplified, the entire coding sequence must be contained in one contiguous segment defined by PICNIC, and have a total copy number of eight or more. Deletions must occur within a single contiguous segment with copy number zero. For gene expression analysis, RNA was extracted from each cell line using a standard Trizol protocol and hybridized to the HT-HGU122A Affymetrix whole genome array. Normalised gene expression intensities were generated using the Robust Multi-Array Average (RMA) algorithm. A complete description of the characterization of our cancer cell lines collection is available from the Cancer Genome Project webpages.
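The copy number and MSI rules above can be written down as two small checks. This is a minimal sketch of the stated thresholds only; the function names and the "neither" label are hypothetical, and the segmentation itself (PICNIC) is assumed to have been done already.

```python
MSI_MARKERS = ("BAT25", "BAT26", "D5S346", "D2S123", "D17S250")


def classify_copy_number(total_copy_number, in_single_segment):
    """Apply the stated rules: amplified at a total copy number of 8 or more,
    deleted at 0, and only if the gene lies within one contiguous segment."""
    if in_single_segment and total_copy_number >= 8:
        return "amplified"
    if in_single_segment and total_copy_number == 0:
        return "deleted"
    return "neither"  # hypothetical label for all other cases


def is_msi(unstable_markers):
    """A sample is characterised as MSI if two or more markers are unstable."""
    unstable = set(unstable_markers)
    unknown = unstable - set(MSI_MARKERS)
    if unknown:
        raise ValueError(f"unknown markers: {sorted(unknown)}")
    return len(unstable) >= 2


print(classify_copy_number(9, True))   # amplified
print(classify_copy_number(9, False))  # neither
print(is_msi(["BAT25", "D2S123"]))     # True
```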

Curve-fitting summary

Compounds are screened at 9 concentrations using a 2-fold dilution series, spanning a 256-fold concentration range of drug. Cell line sensitivity is measured 72 hours after drug treatment using a fluorescence-based assay, and dose response curves are fitted to raw fluorescence intensity values using a bespoke Bayesian sigmoid model.
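As a quick check of the arithmetic, nine concentrations separated by eight 2-fold steps span a 2^8 = 256-fold range. The sketch below generates such a series; the 10 µM maximum is a hypothetical value, since each compound has its own screening range.

```python
def dilution_series(max_conc_um, n_points=9, fold=2.0):
    """Concentrations from highest to lowest, each `fold` lower than the last."""
    return [max_conc_um / fold**i for i in range(n_points)]


series = dilution_series(10.0)  # hypothetical maximum of 10 µM
print([round(c, 4) for c in series])
print("fold range:", series[0] / series[-1])
```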

Parameters reported by the curve-fitting algorithm

The following parameters are reported for each dose response curve and are available on the Downloads page of our website.

Parameter: Description
IC50: Half-maximal inhibitory concentration.
IC25, IC75 and IC90: 25%, 75% and 90% inhibitory concentrations.
Beta: Slope of the dose response curve.
AUC: Area under the curve. Bounded by controls (no drug), blanks (no cells) and the highest and lowest screening concentration (no response, value = 1; complete response, value = 0).
IC50_low and IC50_high: Lower and upper confidence limits for the IC50 value.
B: Variance parameter of the dose response curve.
Alpha: Shape parameter of the dose response curve.
D: Sum of normalized residual values for the curve.
IC results ID: A unique identifier for each drug response curve.

Note: All IC values are reported as the natural log of the concentration in micromolar (µM).
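Because the reported values are natural logs of micromolar concentrations, they need to be exponentiated before being compared with concentrations quoted in µM. A minimal sketch of the conversion, with hypothetical function names:

```python
import math


def ln_um_to_um(ln_value):
    """Convert a natural-log micromolar value back to micromolar."""
    return math.exp(ln_value)


def ln_um_to_log10_molar(ln_value):
    """Convert a natural-log micromolar value to log10 molar."""
    return ln_value / math.log(10) - 6.0


reported = math.log(2.5)  # hypothetical reported value, i.e. ln(2.5 µM)
print(f"{ln_um_to_um(reported):.2f} µM")
print(f"log10(M) = {ln_um_to_log10_molar(reported):.3f}")
```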

Interpreting IC50 values

We are screening a large collection of cell lines with a diverse selection of anti-cancer drugs and consequently observe a wide range of drug responses in our experimental data. The curve-fitting algorithm readily models acute and partial responses to a drug that fall within the range of experimental screening concentrations (Figure 1). In many instances, however, a significant proportion of cell lines will be resistant to a given drug within the range of experimental screening concentrations. The curve-fitting algorithm still reports IC values for these cell lines, but these are extrapolated beyond the tested range and are associated with large confidence intervals. For completeness we have reported these values, but they should be interpreted carefully and, before performing further analyses, it may be appropriate to cap the IC50 value at the maximum screening concentration, or to use an alternative output such as AUC.
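One way to apply the restriction suggested above is to cap each reported value at the log of the maximum screening concentration. This is a minimal sketch, assuming the IC50 is in natural-log µM as reported and that the maximum screening concentration for the drug is known; the function name and values are hypothetical.

```python
import math


def cap_ln_ic50(ln_ic50_um, max_conc_um):
    """Cap a natural-log µM IC50 at the maximum screening concentration."""
    return min(ln_ic50_um, math.log(max_conc_um))


max_conc = 10.0                         # hypothetical maximum screening concentration, µM
sensitive = math.log(0.3)               # IC50 within the tested range
resistant = math.log(450.0)             # extrapolated IC50 far above the tested range

print(math.exp(cap_ln_ic50(sensitive, max_conc)))  # unchanged
print(math.exp(cap_ln_ic50(resistant, max_conc)))  # capped at the maximum
```

Capping discards the extrapolated part of the estimate, so all resistant lines end up with the same value; AUC avoids this by summarising only the tested range.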

Figure 1: Illustrative examples of how the curve-fitting algorithm is used to model drug sensitivity in cells. Experimental data for the response of D-423G cells to 3 different drugs. An example of drug sensitivity is provided at the top, partial sensitivity in the middle, and drug resistance (no response within screening concentration) at the bottom. For each example, the left-hand panel shows the control (n = 56) and blank values (n = 32) on the plate (blue dots), as well as the drug treated values (n = 9; red dots) which are used by the curve-fitting algorithm to model drug response. The right-hand side panel shows a detailed view of the same data (normalised to controls) with IC10, IC25, IC50, IC75 and IC90 values together with their associated confidence intervals.

Curve-fitting algorithm and determination of IC50s

For our analysis we require a method that models the heteroscedasticity in the fluorescence intensity data and incorporates prior knowledge of response, especially at drug concentrations where the data are less informative. A bespoke Bayesian sigmoid model was therefore implemented and fitted to the fluorescence signal intensities; it yields a full description of the uncertainties in the data and allows reasonable interpretation of predicted response at concentrations outside the tested range. Drug response data consisted of drug-free positive controls, negative (no cells) controls and drug response points at nine concentrations in a 2-fold dilution series. Generalized sigmoidal response curves are fitted as follows.

Intensity x_lc is assumed to have the mean value

[Equation: mean value formula]

Parameters I_max and I_min are the mean intensities of the positive and negative controls. α and β are scale and gradient parameters. f is a shape parameter. lc denotes the log-concentration. We assume that the intensity x_lc has variance Var(x_lc) = B · E(x_lc), where B represents a noise parameter. We assume that x_lc has a gamma distribution:

[Equation: x_lc gamma distribution]

Positive and negative controls x_max and x_min are also gamma distributed:

[Equation: x_max gamma distribution]

[Equation: x_min gamma distribution]
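The gamma density used by the authors is not reproduced on this page. The stated variance assumption, variance equal to B times the mean, does however fix one natural parametrisation: a gamma distribution with shape = mean / B and scale = B has exactly that mean and variance. The sketch below illustrates this with simulated intensities; it is an illustration of the variance relation, not the authors' implementation, and the numbers are hypothetical.

```python
import random
from statistics import mean, variance


def gamma_shape_scale(expected, noise_b):
    """Shape and scale giving mean = expected and variance = noise_b * expected."""
    return expected / noise_b, noise_b


def simulate_intensities(expected, noise_b, n, seed=0):
    """Draw n gamma-distributed intensities with the parametrisation above."""
    rng = random.Random(seed)
    shape, scale = gamma_shape_scale(expected, noise_b)
    # random.gammavariate(alpha, beta) takes shape alpha and scale beta.
    return [rng.gammavariate(shape, scale) for _ in range(n)]


draws = simulate_intensities(expected=1000.0, noise_b=4.0, n=20000)
print(f"sample mean     = {mean(draws):.1f}")
print(f"sample variance = {variance(draws):.1f}")
```

Under this parametrisation the spread grows with the signal, which is the heteroscedasticity the model is meant to capture.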

The concentration giving p% response is given by:

[Equation: concentration formula]

We use Markov chain Monte Carlo simulations to obtain posterior mean parameter estimates. The IC50 has a normal prior with 95% of its probability mass covering the range from 1000-fold below the minimum tested concentration to 1000-fold above the maximum tested concentration. We assume uninformative priors on the remaining parameters. Response curves are plotted using posterior mean values of ICp for p ranging between 0% and 100%. Confidence intervals for ICp are obtained from the associated posterior.
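To make the IC50 prior concrete, the sketch below derives a mean and standard deviation from the stated 95% range. It assumes the prior is placed on the natural-log concentration and that the 95% mass is a central interval; neither detail is stated explicitly here, and the concentrations are hypothetical.

```python
import math
from statistics import NormalDist


def ic50_prior(min_conc_um, max_conc_um, fold=1000.0, mass=0.95):
    """Normal prior on ln(IC50) whose central `mass` interval spans
    min_conc / fold to max_conc * fold (assumed interpretation)."""
    low = math.log(min_conc_um / fold)
    high = math.log(max_conc_um * fold)
    z = NormalDist().inv_cdf(0.5 + mass / 2.0)  # about 1.96 for 95%
    return NormalDist(mu=(low + high) / 2.0, sigma=(high - low) / (2.0 * z))


# Hypothetical screen: 10 µM maximum, 256-fold range.
prior = ic50_prior(min_conc_um=10.0 / 256.0, max_conc_um=10.0)
print(f"prior mean (ln µM) = {prior.mean:.3f}")
print(f"prior sd   (ln µM) = {prior.stdev:.3f}")
```

A prior this wide constrains the fit only weakly within the tested range, but keeps extrapolated IC50 estimates for resistant lines from drifting without bound.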

Statistical analysis

To identify genomic features associated with drug response we use two complementary analytical approaches. A multivariate analysis of variance (MANOVA) is used to correlate drug response with genomic alterations in cancer cells including point mutations, amplifications and deletions of common cancer genes, cancer gene rearrangements and microsatellite instability. We also utilize elastic net (EN) regression, a penalized linear modeling technique, to identify cooperative interactions among multiple genes and transcripts with respect to drug response. We apply the EN approach using all of our available genomic data (including transcriptional profiles and tissue type) as input variables. Each of these statistical methods provides a distinctive insight into the data and it is best to consider the results from both in your interpretation. Further descriptions of the MANOVA and EN are provided in the following sections.

Below are a few key points and guidelines that may be useful when interpreting the data from these analyses:

  • The EN incorporates a larger genomic dataset than the MANOVA and so may identify associations not described in the MANOVA.
  • The MANOVA analysis includes the IC50 values and slope of the dose response curve as input variables whereas the elastic net utilizes only the IC50 value.
  • The MANOVA identifies individual genomic features associated with drug response whereas the elastic net provides a genomic signature including multiple features. The MANOVA associations benefit from easy interpretability whereas the elastic net signatures, by their nature, can be complex to interpret and may include multiple different sub-groups with differential sensitivity.
  • Unlike the MANOVA analysis, gene-specific correlations for the elastic net analysis are not represented since the EN describes how multiple genes affect drug sensitivity together.
  • In some instances, the elastic net is not able to generate a model to describe sensitivity to a compound.
  • The MANOVA is well suited for detecting drug-gene effects driven by small populations of outlier cell lines.
  • Prior to initiating follow up studies we recommend that you investigate the primary screening data to ensure that the statistical associations we have reported are informative with respect to your understanding of mechanism of drug action (if possible) and from a therapeutic perspective.

A one-way ANOVA is also used to determine the effect of tissue sub-type on cell line drug sensitivity (as determined by the IC50).
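For readers unfamiliar with the test, a one-way ANOVA compares the variation in IC50 between tissue sub-types with the variation within them. The sketch below computes only the F statistic in plain Python; obtaining a p-value additionally requires the F distribution, which is not shown. The sub-type names and ln IC50 values are hypothetical.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([v for g in groups for v in g])
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical ln IC50 values grouped by hypothetical tissue sub-type.
ln_ic50_by_subtype = {
    "subtype_a": [1.0, 2.0, 3.0],
    "subtype_b": [2.0, 3.0, 4.0],
    "subtype_c": [5.0, 6.0, 7.0],
}

f_stat, df1, df2 = one_way_anova_f(list(ln_ic50_by_subtype.values()))
print(f"F({df1}, {df2}) = {f_stat:.2f}")
```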

Multivariate Analysis of Variance

A fixed-effects multivariate ANOVA (MANOVA) was used to correlate cell line drug response with mutations in cancer genes. An n × 2 dose-response matrix consisting of the IC50 and slope parameter β for n cell lines was constructed for each drug. A linear model (no interaction terms) explained these observables with factors including tissue type, the mutation status of cancer genes, chromosomal rearrangements, and microsatellite instability status. Effect sizes and significance values were obtained. A gene was defined as mutated if it fulfilled any of these criteria: a coding sequence variant in the cancer gene, a total copy number of 0 (homozygous deletion) or a total copy number of more than 7 (amplification). Only those genes with more than one mutated cell line in the panel used for analysis were included. The effect measures the relative difference in the mean IC50 from the wild-type to the mutant group (for example, an effect of 0.1 or 10 indicates a ~10-fold decrease or increase in drug concentration, respectively). A Benjamini-Hochberg multiple testing correction threshold with a false discovery rate of 20% was used to identify significant associations.
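The Benjamini-Hochberg step can be sketched as follows: sort the p-values, find the largest rank k for which p(k) <= (k / m) × FDR, and call the k smallest p-values significant. This is the standard procedure written in plain Python, not the project's own code; the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.20):
    """Return a list of booleans marking associations significant at `fdr`."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its threshold.
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    significant = [False] * m
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant


# Hypothetical MANOVA p-values for five drug-gene associations.
p_values = [0.001, 0.04, 0.5, 0.09, 0.9]
print(benjamini_hochberg(p_values))
```

With a 20% false discovery rate, roughly one in five of the associations called significant is expected to be a false positive, which is worth keeping in mind when choosing associations for follow-up.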

Elastic net

We apply elastic net regression, a penalized linear modelling technique, to identify cooperative interactions among multiple genes and transcripts and so derive response signatures for each drug. Genomic data including mutation status of cancer genes, chromosomal rearrangements, copy number data from genes causally implicated in cancer, and genome-wide transcriptional profiles, as well as tissue type, are used as input variables. The elastic net is used to select which of these features are associated with drug response as measured by IC50 across the cell line panel.

For each drug, a feature list is built comprising genes, transcripts and tissue types, with an effect size assigned to each. Features with higher stability of correlation in cross-validation (frequency, f) are considered to have the greatest confidence of being associated with drug response. The most significant features associated with drug response are those with both a large frequency (maximum = 1) and a large effect size. For each feature, a negative effect size is associated with drug sensitivity and a positive value with drug resistance. Only statistically significant associations are reported here.
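When reading a feature list, it can help to order features by frequency and then by absolute effect size, and to label the direction of each effect. The sketch below does this for a list of (name, frequency, effect) tuples; the ranking rule, function name and feature values are assumptions made for illustration and do not describe the layout of the downloadable files.

```python
def rank_features(features):
    """Rank (name, frequency, effect) tuples and label each effect's direction.

    Negative effect -> associated with sensitivity; positive -> resistance.
    Features with zero effect are dropped.
    """
    selected = [f for f in features if f[2] != 0]
    ranked = sorted(selected, key=lambda f: (f[1], abs(f[2])), reverse=True)
    return [
        (name, frequency, effect, "sensitivity" if effect < 0 else "resistance")
        for name, frequency, effect in ranked
    ]


# Hypothetical elastic net output for one drug.
features = [
    ("GENE_A mutation", 1.0, -0.8),
    ("GENE_B expression", 0.6, 0.3),
    ("tissue X", 1.0, 0.2),
]

for name, frequency, effect, direction in rank_features(features):
    print(f"{name}: f={frequency}, effect={effect} ({direction})")
```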

The output files from the analysis can be downloaded from the Downloads page.

STR profiles of cell lines

Cell line authentication

Concerns around the identity of cancer cell lines used in scientific research have been increasing over several years and were the topic of a recent editorial in Nature (19225471). Cross-contamination has even been shown to be present in such widely used and supposedly well-characterised groups of cell lines as the NCI60 set. For instance, the NCI60 cell lines OVCAR-8 and NCI-ADR-RES have been shown to be over 97% identical using the Affymetrix SNP6.0 array in this laboratory; this result has been confirmed by multiple laboratories around the world and was recently reported in the scientific literature (16504380). Two other such pairings are also present within the NCI60 series of lines: both M14/MDA-MB-435 (17004106) and U251MG/SNB-19 have identities over 94% when compared using the SNP6 array.

Many of the cell line repositories now provide short tandem repeat (STR) profiles of the lines they hold, allowing the identity of lines within the scientific community to be confirmed by a simple assay. We are currently confirming the identity of our cancer cell line set against the profiles provided by the repositories, where possible. Each of the cell lines within our core set is being tested using a panel of 16 STRs (AmpFLSTR Identifiler kit, ABI), which includes the 9 currently used by most of the cell line repositories (ATCC, Riken, JCRB and DSMZ). We are also providing a single nucleotide polymorphism (SNP) profile based on a panel of 63 SNPs assayed using the Sequenom Genetic Analyser, which we use for in-house identity checking whenever a cell line is propagated.

The provision of STR profiles by the cell line repositories and of our in-house cell lines is ongoing and will be updated when appropriate.

Prior to accessing the STR or SNP datasets a Data Access Agreement must be completed:

The username and password provided can be used to download the STR and SNP profiles for each cell line at the CGP Data Archive:


  • Identity crisis.

    No authors listed

    Nature 2009;457;7232;935-6

  • MDA-MB-435 cells are derived from M14 melanoma cells--a loss for breast cancer, but a boon for melanoma research.

    Rae JM, Creighton CJ, Meck JM, Haddad BR and Johnson MD

    Division of Hematology/Oncology, Department of Internal Medicine, University of Michigan Medical Center, 1150 West Medical Center Drive, Med Sci I, Room 5323, Ann Arbor, MI 48109-0612, USA.

    Background: The tissue of origin of the cell line MDA-MB-435 has been a matter of debate since analysis of DNA microarray data led Ross et al. (2000, Nat Genet 24(3):227-235) to suggest they might be of melanocyte origin due to their similarity to melanoma cell lines. We have previously shown that MDA-MB-435 cells maintained in multiple laboratories are of common origin to those used by Ross et al. and concluded that MDA-MB-435 cells are not a representative model for breast cancer. We could not determine, however, whether the melanoma-like properties of the MDA-MB-435 cell line are the result of misclassification or due to transdifferention to a melanoma-like phenotype.

    Methods: We used karyotype, comparative genomic hybridization (CGH), and microsatalite polymorphism analyses, combined with bioinformatics analysis of gene expression and single nucleotide polymorphism (SNP) data, to test the hypothesis that the MDA-MB-435 cell line is derived from the melanoma cell line M14.

    Results: We show that the MDA-MB-435 and M14 cell lines are essentially identical with respect to cytogenetic characteristics as well as gene expression patterns and that the minor differences found can be explained by phenotypic and genotypic clonal drift.

    Conclusions: All currently available stocks of MDA-MB-435 cells are derived from the M14 melanoma cell line and can no longer be considered a model of breast cancer. These cells are still a valuable system for the study of cancer metastasis and the extensive literature using these cells since 1982 represent a valuable new resource for the melanoma research community.

    Funded by: NIGMS NIH HHS: U-O1 GM61373

    Breast cancer research and treatment 2007;104;1;13-9

  • A case study in misidentification of cancer cell lines: MCF-7/AdrR cells (re-designated NCI/ADR-RES) are derived from OVCAR-8 human ovarian carcinoma cells.

    Liscovitch M and Ravid D

    Department of Biological Regulation, Weizmann Institute of Science, P.O.B. 26 Rehovot 76100, Israel.

    Multidrug-resistant MCF-7 breast adenocarcinoma cells (originally named MCF-7/AdrR cells and later re-designated NCI/ADR-RES) have served as an important and widely used research tool during the last two decades. However, the real identity of these cells has been in doubt since 1998 and has since been debated. The origin of NCI/ADR-RES cells has now been revealed by SNP and karyotypic analyses, carried out at the Sanger Institute and the NCI, respectively. The results of these analyses, recently posted on the Web, show that NCI/ADR-RES cells are derived from OVCAR-8 ovarian adenocarcinoma cells. The case of NCI/ADR-RES cells highlights a wide-spread problem of cell line cross-contamination and misidentification. Fortunately, this is a tractable problem that can be avoided by scrupulous genotyping of cell stocks and adoption of a few simple rules in cell culture practice.

    Cancer letters 2007;245;1-2;350-2

Tissue classification of cell lines

The collection comprises well over 1000 human tumour cell lines which we plan to screen as part of this project. This panel represents the spectrum of common and rare types of adult and childhood cancers of epithelial, mesenchymal and haematopoietic origin. In addition, we are continuously acquiring new cell lines to expand cancer types poorly represented in the collection. All cell lines are authenticated by SNP and STR analysis and subjected to extensive genomic characterization.


Tissue type Number
Adrenal Gland 2
Autonomic Ganglia 37
Biliary Tract 6
Bone 32
Breast 45
Central Nervous System 59
Cervix 13
Endometrium 10
Eye 1
Gastrointestinal Tract (site indeterminate) 1
Haematopoietic/Lymphoid Tissue 128
Kidney 22
Large Intestine 40
Liver 10
Lung 153
NS 2
Oesophagus 23
Ovary 20
Pancreas 17
Placenta 2
Pleura 6
Prostate 5
Salivary Gland 1
Skin 47
Small Intestine 1
Soft Tissue 19
Stomach 21
Testis 4
Thyroid 12
Upper Aerodigestive Tract 23
Urinary Tract 18
Vulva 3
Total 783

Selected Publications

  • Genomics of Drug Sensitivity in Cancer (GDSC): a resource for therapeutic biomarker discovery in cancer cells.

    Yang W, Soares J, Greninger P, Edelman EJ, Lightfoot H, Forbes S, Bindal N, Beare D, Smith JA, Thompson IR, Ramaswamy S, Futreal PA, Haber DA, Stratton MR, Benes C, McDermott U and Garnett MJ

    Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.

    Alterations in cancer genomes strongly influence clinical responses to treatment and in many instances are potent biomarkers for response to drugs. The Genomics of Drug Sensitivity in Cancer (GDSC) database is the largest public resource for information on drug sensitivity in cancer cells and molecular markers of drug response. Data are freely available without restriction. GDSC currently contains drug sensitivity data for almost 75 000 experiments, describing response to 138 anticancer drugs across almost 700 cancer cell lines. To identify molecular markers of drug response, cell line drug sensitivity data are integrated with large genomic datasets obtained from the Catalogue of Somatic Mutations in Cancer database, including information on somatic mutations in cancer genes, gene amplification and deletion, tissue type and transcriptional data. Analysis of GDSC data is through a web portal focused on identifying molecular biomarkers of drug sensitivity based on queries of specific anticancer drugs or cancer genes. Graphical representations of the data are used throughout with links to related resources and all datasets are fully downloadable. GDSC provides a unique resource incorporating large drug sensitivity and genomic datasets to facilitate the discovery of new therapeutic biomarkers for cancer therapies.

    Funded by: Cancer Research UK; Wellcome Trust: 086357

    Nucleic acids research 2013;41;Database issue;D955-61

  • Systematic identification of genomic markers of drug sensitivity in cancer cells.

    Garnett MJ, Edelman EJ, Heidorn SJ, Greenman CD, Dastur A, Lau KW, Greninger P, Thompson IR, Luo X, Soares J, Liu Q, Iorio F, Surdez D, Chen L, Milano RJ, Bignell GR, Tam AT, Davies H, Stevenson JA, Barthorpe S, Lutz SR, Kogera F, Lawrence K, McLaren-Douglas A, Mitropoulos X, Mironenko T, Thi H, Richardson L, Zhou W, Jewitt F, Zhang T, O'Brien P, Boisvert JL, Price S, Hur W, Yang W, Deng X, Butler A, Choi HG, Chang JW, Baselga J, Stamenkovic I, Engelman JA, Sharma SV, Delattre O, Saez-Rodriguez J, Gray NS, Settleman J, Futreal PA, Haber DA, Stratton MR, Ramaswamy S, McDermott U and Benes CH

    Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.

    Clinical responses to anticancer therapies are often restricted to a subset of patients. In some cases, mutated cancer genes are potent biomarkers for responses to targeted agents. Here, to uncover new biomarkers of sensitivity and resistance to cancer therapeutics, we screened a panel of several hundred cancer cell lines--which represent much of the tissue-type and genetic diversity of human cancers--with 130 drugs under clinical and preclinical investigation. In aggregate, we found that mutated cancer genes were associated with cellular response to most currently available cancer drugs. Classic oncogene addiction paradigms were modified by additional tissue-specific or expression biomarkers, and some frequently mutated genes were associated with sensitivity to a broad range of therapeutic agents. Unexpected relationships were revealed, including the marked sensitivity of Ewing's sarcoma cells harbouring the EWS (also known as EWSR1)-FLI1 gene translocation to poly(ADP-ribose) polymerase (PARP) inhibitors. By linking drug activity to the functional complexity of cancer genomes, systematic pharmacogenomic profiling in cancer cell lines provides a powerful biomarker discovery platform to guide rational cancer therapeutic strategies.

    Funded by: Howard Hughes Medical Institute; NHGRI NIH HHS: 1U54HG006097-01; NIGMS NIH HHS: P41GM079575-02; Wellcome Trust: 086357

    Nature 2012;483;7391;570-5

  • Cell line-based platforms to evaluate the therapeutic efficacy of candidate anticancer agents.

    Sharma SV, Haber DA and Settleman J

    Center for Molecular Therapeutics, Massachusetts General Hospital Cancer Center and Harvard Medical School, 149 13th Street, Charlestown, MA 02129, USA.

    Efforts to discover new cancer drugs and predict their clinical activity are limited by the fact that laboratory models to test drug efficacy do not faithfully recapitulate this complex disease. One important model system for evaluating candidate anticancer agents is human tumour-derived cell lines. Although cultured cancer cells can exhibit distinct properties compared with their naturally growing counterparts, recent technologies that facilitate the parallel analysis of large panels of such lines, together with genomic technologies that define their genetic constitution, have revitalized efforts to use cancer cell lines to assess the clinical utility of new investigational cancer drugs and to discover predictive biomarkers.

    Nature reviews. Cancer 2010;10;4;241-53

  • Factors underlying sensitivity of cancers to small-molecule kinase inhibitors.

    Jänne PA, Gray N and Settleman J

    Dana Farber Cancer Institute and Harvard Medical School, Boston, Massachusetts, USA.

    Selective small-molecule kinase inhibitors have emerged over the past decade as an important class of anti-cancer agents, and have demonstrated impressive clinical efficacy in several different diseases, including relatively common malignancies such as breast and lung cancer. However, clinical benefit is typically limited to a fraction of treated patients. Genomic features of individual tumours contribute significantly to such clinical responses, and these seem to vary tremendously across patients. Additional factors, including pharmacogenomics, the tumour microenvironment and rapidly acquired drug resistance, also contribute to the clinical sensitivity of various cancers, and should be considered and applied in the development and use of new kinase inhibitors.

    Nature reviews. Drug discovery 2009;8;9;709-23

  • The cancer genome.

    Stratton MR, Campbell PJ and Futreal PA

    Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    All cancers arise as a result of changes that have occurred in the DNA sequence of the genomes of cancer cells. Over the past quarter of a century much has been learnt about these mutations and the abnormal genes that operate in human cancers. We are now, however, moving into an era in which it will be possible to obtain the complete DNA sequence of large numbers of cancer genomes. These studies will provide us with a detailed and comprehensive perspective on how individual cancers have developed.

    Funded by: Wellcome Trust: 077012, 088340

    Nature 2009;458;7239;719-24

  • High-throughput lung cancer cell line screening for genotype-correlated sensitivity to an EGFR kinase inhibitor.

    McDermott U, Sharma SV and Settleman J

    Center for Molecular Therapeutics, Massachusetts General Hospital Cancer Center and Harvard Medical School, Charlestown, Massachusetts, USA.

    Human cancer cell lines that can be propagated and manipulated in culture have proven to be excellent models for studying many aspects of gene function in cancer. In addition, they can provide a powerful system for assessing the molecular determinants of sensitivity to anticancer drugs. They have also been used in recent studies to identify genomic alterations and gene expression patterns that provide important insights into the genetic features that distinguish the properties of tumor cells associated with similar histologies. We have established a large repository of human tumor cell lines (>1000) corresponding to a wide variety of tumor types, and we have developed a methodology for profiling the collection for sensitivity to putative anticancer compounds. The rationale for examining tumor cell lines on this relatively large scale reflects accumulating evidence indicating that there is substantial genetic heterogeneity among human tumor cells-even those derived from tumors of similar histologies. Thus, to develop an accurate picture of the molecular determinants of tumorigenesis and response to therapy, it is essential to study the nature of such heterogeneity in a relatively large sample set. Here, we describe the methodologies used to conduct such screens and we describe a "proof-of-concept" screen using the EGFR kinase inhibitor, erlotinib (Tarceva), with a panel of lung cancer lines to demonstrate a correlation between EGFR mutations and drug sensitivity.

    Methods in enzymology 2008;438;331-41

  • Identification of genotype-correlated sensitivity to selective kinase inhibitors by using high-throughput tumor cell line profiling.

    McDermott U, Sharma SV, Dowell L, Greninger P, Montagut C, Lamb J, Archibald H, Raudales R, Tam A, Lee D, Rothenberg SM, Supko JG, Sordella R, Ulkus LE, Iafrate AJ, Maheswaran S, Njauw CN, Tsao H, Drew L, Hanke JH, Ma XJ, Erlander MG, Gray NS, Haber DA and Settleman J

    Center for Molecular Therapeutics, Massachusetts General Hospital Cancer Center and Harvard Medical School, 149 13th Street, Charlestown, MA 02129, USA.

    Kinase inhibitors constitute an important new class of cancer drugs, whose selective efficacy is largely determined by underlying tumor cell genetics. We established a high-throughput platform to profile 500 cell lines derived from diverse epithelial cancers for sensitivity to 14 kinase inhibitors. Most inhibitors were ineffective against unselected cell lines but exhibited dramatic cell killing of small nonoverlapping subsets. Cells with exquisite sensitivity to EGFR, HER2, MET, or BRAF kinase inhibitors were marked by activating mutations or amplification of the drug target. Although most cell lines recapitulated known tumor-associated genotypes, the screen revealed low-frequency drug-sensitizing genotypes in tumor types not previously associated with drug susceptibility. Furthermore, comparing drugs thought to target the same kinase revealed striking differences, predictive of clinical efficacy. Genetically defined cancer subsets, irrespective of tissue type, predict response to kinase inhibitors, and provide an important preclinical model to guide early clinical applications of novel targeted inhibitors.

    Funded by: NCI NIH HHS: R01 CA115830

    Proceedings of the National Academy of Sciences of the United States of America 2007;104;50;19936-41

Description of data presented on the volcano plot

A volcano plot is used to visualise the correlation of drug sensitivity data with genetic events calculated using a multivariate ANOVA.

We use two different types of volcano plots to represent our data. Gene specific volcano plots represent the effect of a mutated gene (e.g. BRAF) on the responses to all drugs analysed. A drug-specific volcano plot represents how genomic changes influence response to a specific drug (e.g. BRAF inhibitor PLX4720).

In each volcano plot three pieces of data are represented:

The magnitude of the effect that genetic events have on cell line IC50 values in response to a drug. IC50 values were correlated with the status of commonly altered cancer genes using a two-way multivariate ANOVA, with mutation status and tissue type as factors. The effect size is proportional to the difference in mean IC50 between wild-type and mutant cell lines. Values less than 1 indicate drug sensitivity; values greater than 1 indicate drug resistance.
The p-value from the MANOVA of a drug-gene interaction on an inverted log10 scale. For clarity the axis is capped at p = 1 x 10^-8 and a plus sign (+) next to a circle indicates that the p-value is smaller than this threshold.
Size of each circle
The number of genetic events contributing to the analysis for a given gene or drug.
Red line
The dashed red line represents a Benjamini-Hochberg multiple testing correction threshold with a false discovery rate of 20%. Circles for statistically significant sensitizing or resistance effects are coloured green and red, respectively. Hovering over a circle displays the genetic event sample size (number of cell lines with the mutation that were screened), the effect size and the p-value.
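The p-value axis and the red line can be illustrated with a short sketch. It assumes the standard Benjamini-Hochberg step-up procedure; the exact computation behind the plotted line is not stated here, and the function names and example p-values are hypothetical.

```python
import math

def bh_threshold(p_values, fdr=0.2):
    """Largest p-value that passes the Benjamini-Hochberg step-up rule, or None."""
    ranked = sorted(p_values)
    m = len(ranked)
    threshold = None
    for rank, p in enumerate(ranked, start=1):
        # A p-value passes if it is no larger than fdr * rank / m.
        if p <= fdr * rank / m:
            threshold = p
    return threshold

def volcano_y(p_value, cap=1e-8):
    """Return (-log10 p, capped) with the axis capped at p = cap."""
    capped = p_value < cap  # these points would carry the plus sign
    return -math.log10(max(p_value, cap)), capped

# Hypothetical p-values for six drug-gene interactions.
p_values = [1e-10, 0.001, 0.03, 0.2, 0.5, 0.9]
cutoff = bh_threshold(p_values)
points = [volcano_y(p) for p in p_values]
```

With this rule, every interaction whose p-value is at or below `cutoff` would sit on or above the dashed line.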

Analysis file description

Sensitivity data

A table of cell line IC50 values with confidence intervals as well as the slope of the dose response curve (Beta) for cell lines treated with the indicated drug.

Cell Line Name
Cell line name
Tissue type of cell line
Cancer sub-type of cell line based on tissue and histology
Unique cell line identifier
Unique result identifier
Half-maximal (50%) inhibitory drug concentration, IC50 (natural log microMolar)
Lower bound of the IC50 confidence interval (natural log microMolar)
Upper bound of the IC50 confidence interval (natural log microMolar)
Slope parameter from curve-fitting
The date when the cell line was screened
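Because IC50 values and their confidence bounds are reported as natural logarithms of microMolar concentrations, they need to be exponentiated before being read as concentrations. A minimal sketch, assuming hypothetical field names and values (the actual column headers are not listed above):

```python
import math

def ln_um_to_um(ln_value):
    """Convert a natural-log microMolar value to microMolar."""
    return math.exp(ln_value)

# Hypothetical row: IC50 and confidence bounds in natural log microMolar.
row = {"ic50": -1.2, "ic50_low": -1.9, "ic50_high": -0.5}

# The same values expressed in microMolar.
converted = {name: ln_um_to_um(value) for name, value in row.items()}
```

A value of 0 in the file therefore corresponds to 1 microMolar, and negative values to sub-microMolar concentrations.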

Genomic alterations in cell lines

This file contains a table of cell lines screened including their mutational status (sequence variants, amplifications or deletions) for common cancer genes.

Please note that a separate file containing the genomic and transcriptomic data used for the elastic net (EN) analysis is available on the download page.

Many of the drugs have been screened against a subset of the cell lines in the table.

Cell Line
Cell line name
Unique cell line identifier
Tissue type of cell line
Cancer sub-type of cell line based on tissue and histology
Genetic information
Genetic mutation data for cancer genes. Includes MSI status (1 = unstable and 0 = stable) and gene-fusions. A binary code 'x::y' description is used for each gene where 'x' identifies a coding variant and 'y' indicates copy number information from SNP6.0 data. For gene fusions, cell lines are identified as fusion not-detected (0) or the identified fusion is given. The following abbreviations are used: not analysed (na), not detected or wild-type (wt), no copy number information (nci).
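The 'x::y' entries can be split programmatically. The sketch below is illustrative only: the function names are hypothetical, and the example values are made up from the abbreviations listed above rather than taken from the file.

```python
ABBREVIATIONS = {
    "na": "not analysed",
    "wt": "not detected or wild-type",
    "nci": "no copy number information",
}

def parse_gene_status(code):
    """Split an 'x::y' entry into (coding variant, copy number) parts."""
    variant, separator, copy_number = code.partition("::")
    if not separator:
        raise ValueError(f"expected 'x::y', got {code!r}")
    return variant, copy_number

def describe(part):
    """Expand a known abbreviation; return anything else unchanged."""
    return ABBREVIATIONS.get(part, part)

# Hypothetical entry: no coding variant detected, no copy number information.
variant, copy_number = parse_gene_status("wt::nci")
summary = (describe(variant), describe(copy_number))
```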

Genomic correlations with MANOVA

This file contains the results of our MANOVA for genomic correlates of drug sensitivity and resistance. The presence or absence of mutations in cancer genes was correlated with cell line IC50 values and the slope of the dose response curve for the indicated drug.

Drug Name
Drug name
Drug ID
Unique drug identifier
Gene Name
Name of gene used for genetic correlation
Sample size
Total number of cell lines screened against the drug
Total number of mutations
Total number of mutant cell lines used for correlation
Number of sequence variants
Number of cell lines with a coding sequence variant in the indicated gene
Number of amplifications
Number of cell lines with an amplification in the indicated gene
Number of deletions
Number of cell lines with a homozygous deletion in the indicated gene
P value
Significance value (p-value) from the MANOVA
IC50 Effect
This value is 10^(2 * (ANOVA effect / ln(10))). It is used to represent the data on volcano plots.
Slope effect
Effect of mutation on slope (BETA). Proportional to the difference in the mean slope of the drug response curve between wild-type and mutant cell lines.
Mean of wild type
Mean IC50 of wild-type cell lines (natural log microMolar)
Mean of Mutant
Mean IC50 of mutated cell lines (natural log microMolar)
20% FDR
P-value threshold for a false discovery rate (FDR) of 0.2
Analysis date
The date when the MANOVA was performed.
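The IC50 Effect formula can be evaluated directly. The sketch below uses a hypothetical function name and input values. Since 10^(x / ln(10)) equals e^x, the expression is equivalent to exp(2 * ANOVA effect), so an ANOVA effect of 0 maps to 1, negative effects to values below 1 (sensitivity) and positive effects to values above 1 (resistance).

```python
import math

def ic50_effect(anova_effect):
    """IC50 Effect as defined above: 10^(2 * (ANOVA effect / ln(10)))."""
    return 10 ** (2 * (anova_effect / math.log(10)))

# Hypothetical ANOVA effects on the natural-log IC50 scale.
no_effect = ic50_effect(0.0)     # no difference between groups
sensitive = ic50_effect(-1.0)    # falls below 1
resistant = ic50_effect(1.0)     # falls above 1
```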

Elastic net analysis of drug sensitivity

This file contains a table of results from the EN analysis of drug response.

The name of the gene present in an EN model. "CN" indicates that the feature is a copy number change and "Mut" that it is a mutational event; a feature with neither label is an expression level change. In some cases two genes are assigned to a single microarray probe and both genes are listed (gene 1 /// gene 2).
Drug ID
A unique drug identifier
Drug name
The common name of the drug
The therapeutically relevant drug target.
100 modeling iterations were performed and the frequency at which each feature is present in the resulting model is reported (e.g. a frequency of 1 indicates that the feature was present in all 100 models).
Strength of the association between gene and drug response. Effect < 0: Sensitizing feature. For expression change: Higher expression in cell lines with lower IC50. For Copy Number (CN), amplified in cell lines with lower IC50 or deleted in cell lines with higher IC50. For Mutation (Mut), mutated in cell lines with lower IC50.
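The frequency and effect columns can be read as in the sketch below. It does not fit an elastic net; it only shows how a per-feature frequency could be tallied from repeated model fits and how the sign of an effect maps to sensitivity or resistance. The feature names and model contents are hypothetical.

```python
from collections import Counter

def feature_frequency(models):
    """Fraction of models in which each feature appears."""
    counts = Counter(feature for selected in models for feature in set(selected))
    return {feature: count / len(models) for feature, count in counts.items()}

def direction(effect):
    """Label an EN effect by sign: negative is sensitizing, positive is resistance."""
    if effect < 0:
        return "sensitizing"
    if effect > 0:
        return "resistance"
    return "none"

# Hypothetical feature sets selected across 100 modelling iterations.
models = [{"GENE_A Mut", "GENE_B"}] * 60 + [{"GENE_A Mut"}] * 40
frequency = feature_frequency(models)
```

Here `frequency["GENE_A Mut"]` is 1 because the feature is present in all 100 hypothetical models, matching the example given in the field description.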

Elastic net heatmaps

A heatmap and bar-plot are used to visually represent the results from the elastic net (EN) analysis of drug sensitivity.

There are 3 main pieces of information in each image which together summarise the result for a given drug:

EN effect size
A bar-plot displays the effect size for significant features from the EN statistical analysis. The feature name is given to the left of the heatmap containing the genomic data (see below). Features with negative effect size are associated with drug sensitivity and features with positive effect size are associated with drug resistance. For clarity a maximum of 10 significant features are represented for each drug.
Genomic Data
A heatmap provides a detailed description of each feature for the 20 most and least sensitive cell lines to a particular drug (see below). Features may include tissue-type, mutations in cancer genes, expression levels and gene copy number. Mutation and tissue features are at the top of the heatmap and show the presence (black) or absence (grey) of a mutation/tissue sub-type. Below these are gene expression and copy number features, with blue corresponding to lower expression or copy number and red to higher expression or copy number.
Cell line IC50 values
The IC50 values for the 20 most and least sensitive cell lines are represented as a heatmap. The name of each cell line is provided and is associated with the genomic data described above.
A scale bar is provided for the expression and copy number features as well as for IC50 values.
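Selecting the cell lines shown in the heatmap amounts to ranking by IC50 and taking both ends of the list. A minimal sketch with hypothetical cell line names and values, assuming a lower IC50 means greater sensitivity:

```python
def extreme_cell_lines(ic50_by_cell_line, n=20):
    """Return the n most sensitive and n least sensitive cell line names."""
    # Sort names by IC50, lowest (most sensitive) first.
    ranked = sorted(ic50_by_cell_line, key=ic50_by_cell_line.get)
    return ranked[:n], ranked[-n:]

# Hypothetical IC50 values (natural log microMolar) for 50 cell lines.
ic50 = {f"LINE_{i:02d}": -3.0 + 0.1 * i for i in range(50)}
most_sensitive, least_sensitive = extreme_cell_lines(ic50)
```

If fewer than 2n cell lines were screened, the two lists returned by this sketch would overlap.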

Contact us


We are committed to working with collaborators to extend the scope of our research. We currently collaborate with more than 30 organisations from academia, biotech and the pharmaceutical industry. We work with these organisations to screen compounds, to access cell lines and primary tissues, to elucidate mechanisms of drug sensitivity and resistance, and to share expertise. Please feel free to contact us to initiate a discussion on potential collaborations.

You may contact us by email at

This site is hosted by the Wellcome Trust Sanger Institute.