The EFFECT benchmark suite

With large preclinical datasets on cancer drug sensitivity and gene essentiality now available, computational biology models for predicting cancer sensitivity are gaining popularity. Comparing these models is difficult, however: so many models and methods have been published that a meaningful comparison usually requires reproducing each of them on your own data.

Armed with the experience of benchmarking our own models at Turbine, we publish the Turbine Benchmark Suite. This carefully composed benchmark set focuses on models’ ability to make biologically applicable predictions. The benchmark set is not foolproof and can be overfit with enough attempts, but we have made substantial efforts to ensure its resilience.

Our approach revolves around three key principles:

True holdout train/test splits: We prioritize results based on true holdout train/test splits. Unlike random splits, we believe that cell-, gene-, and drug-exclusive splits offer more meaningful insights in real-life scenarios for predicting cancer sensitivity.
Selective performance: Rather than rewarding models that merely identify drugs that are ineffective or harmful across all cells, we measure performance per target node. A successful predictor must discern the specific contexts in which a drug is beneficial or detrimental.
Bias detection and mitigation: We employ a Bias Detector to identify biases orthogonal to measured metrics. This helps detect models that rely on trivial associations.
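To make the "selective performance" principle concrete, the sketch below scores predictions separately for each perturbation instead of pooling all pairs. It is an illustration only: Pearson correlation is an assumed stand-in for whatever metric the evaluation notebooks actually use, and the `observed` and `predicted` field names and all values are made up.

```python
from collections import defaultdict
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; None if undefined."""
    n = len(xs)
    if n < 2:
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return None  # constant input: correlation is undefined
    return cov / sqrt(vx * vy)

def per_perturbation_scores(records):
    """Group rows by perturbation and score each group on its own."""
    groups = defaultdict(lambda: ([], []))
    for r in records:
        obs, pred = groups[r["perturbation"]]
        obs.append(r["observed"])
        pred.append(r["predicted"])
    return {p: pearson(obs, pred) for p, (obs, pred) in groups.items()}

# Toy data (made up). The model ranks cell lines correctly for GENE_A
# but gets the ranking exactly backwards for GENE_B.
records = [
    {"cell_line": "C1", "perturbation": "GENE_A", "observed": -1.0, "predicted": -0.8},
    {"cell_line": "C2", "perturbation": "GENE_A", "observed": -0.5, "predicted": -0.4},
    {"cell_line": "C3", "perturbation": "GENE_A", "observed": 0.0, "predicted": -0.1},
    {"cell_line": "C1", "perturbation": "GENE_B", "observed": 0.0, "predicted": -0.3},
    {"cell_line": "C2", "perturbation": "GENE_B", "observed": -0.2, "predicted": -0.2},
    {"cell_line": "C3", "perturbation": "GENE_B", "observed": -0.4, "predicted": -0.1},
]

scores = per_perturbation_scores(records)
pooled = pearson([r["observed"] for r in records],
                 [r["predicted"] for r in records])
```

In this toy example the pooled correlation stays clearly positive, because the model captures that GENE_A is more essential on average, while the per-perturbation score for GENE_B is strongly negative. That gap is the kind of failure a pooled metric can hide.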

By adhering to these principles, our aim is to provide a benchmark that facilitates fair and meaningful comparisons of computational biology models in predicting cancer sensitivity.

Release notes

1.0 release notes & download

Usage

This data is not intended to be a competition set; it is designed as a resource for your own projects. The test data is publicly available, which also means it can be overfit with enough attempts.

You can access the train/test data from the Releases section. Splits and target metrics are provided in separate JSON files, categorized into “ko” (gene essentiality) and “drug” (drug sensitivity).

For gene essentiality predictions, we created the following splits based on DepMap data: https://depmap.org/portal/, https://www.nature.com/articles/ng.3984

TRAIN: Training set.
RND: Random test set with known entities but unseen pairs.
CEX: Predict effects of known perturbations on new cell lines.
GEX: Predict effects of new perturbations on known cell lines.
AEX: Predict effects of new perturbations on new cell lines.
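The relationship between these splits can be sketched as follows. This is a minimal illustration of the split definitions above, not the procedure used to build the released files; the function name, the holdout fraction, and the toy identifiers are all hypothetical.

```python
import random

def make_exclusive_splits(pairs, holdout_frac=0.25, seed=0):
    """Assign (cell_line, perturbation) pairs to TRAIN/RND/CEX/GEX/AEX."""
    rng = random.Random(seed)
    cells = sorted({c for c, _ in pairs})
    perts = sorted({p for _, p in pairs})
    # Hold out whole cell lines and whole perturbations.
    held_cells = set(rng.sample(cells, max(1, int(len(cells) * holdout_frac))))
    held_perts = set(rng.sample(perts, max(1, int(len(perts) * holdout_frac))))

    splits = {"TRAIN": [], "RND": [], "CEX": [], "GEX": [], "AEX": []}
    for cell, pert in pairs:
        new_cell, new_pert = cell in held_cells, pert in held_perts
        if new_cell and new_pert:
            splits["AEX"].append((cell, pert))   # both entities unseen
        elif new_cell:
            splits["CEX"].append((cell, pert))   # unseen cell line
        elif new_pert:
            splits["GEX"].append((cell, pert))   # unseen perturbation
        elif rng.random() < holdout_frac:
            splits["RND"].append((cell, pert))   # known entities, held-out pair
        else:
            splits["TRAIN"].append((cell, pert))
    return splits

# Toy grid of 8 cell lines x 8 genes (identifiers are made up).
pairs = [(f"CELL_{i}", f"GENE_{j}") for i in range(8) for j in range(8)]
splits = make_exclusive_splits(pairs)

train_cells = {c for c, _ in splits["TRAIN"]}
train_perts = {p for _, p in splits["TRAIN"]}
```

By construction, no CEX or AEX cell line and no GEX or AEX perturbation appears in TRAIN. The same kind of check is a cheap sanity test to run on your own splits before training.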

We also provide EXT_GEX and EXT_AEX splits for genome-wide evaluation.

Each JSON file contains:

cell_line: The perturbed cell.
perturbation: The perturbed gene (CRISPR KO).
gene_effect: Target variable from DepMap indicating fitness impact.
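A minimal loading sketch for these fields is shown below. It assumes each file is a JSON array with one record per (cell line, gene) pair; the released files may be laid out differently (for example column-oriented), so check the actual structure first. The sample content and the `load_ko_records` helper are made up for illustration.

```python
import json

REQUIRED_KEYS = {"cell_line", "perturbation", "gene_effect"}

# Made-up sample in an ASSUMED record-per-row layout; the real files may differ.
sample = """
[
  {"cell_line": "CELL_1", "perturbation": "GENE_A", "gene_effect": -1.2},
  {"cell_line": "CELL_1", "perturbation": "GENE_B", "gene_effect": 0.1},
  {"cell_line": "CELL_2", "perturbation": "GENE_A", "gene_effect": -0.3}
]
"""

def load_ko_records(text):
    """Parse records and index gene_effect by (cell_line, perturbation)."""
    table = {}
    for row in json.loads(text):
        missing = REQUIRED_KEYS - set(row)
        if missing:
            raise ValueError(f"record is missing fields: {sorted(missing)}")
        table[(row["cell_line"], row["perturbation"])] = float(row["gene_effect"])
    return table

effects = load_ko_records(sample)
```

Indexing by the (cell_line, perturbation) pair makes it straightforward to join model predictions to the target values, whichever split they come from.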

For drug sensitivity prediction, splits are based on GDSC2 data: https://www.cancerrxgene.org/, https://www.cell.com/cell/fulltext/S0092-8674(16)30746-2

TRAIN: Training set.
RND: Random test set.
CEX: Known drugs on new cell lines.
DEX: New drugs on known cell lines.
AEX: New drugs on new cell lines.

Each JSON file contains:

cell_line: The perturbed cell.
perturbation: PubChem ID of the drug.

Target metrics:

LN_IC50: Natural log of the half-maximal inhibitory concentration (IC50).
z-score: Normalized LN_IC50.
AUC: Area under the dose-response curve.
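As an illustration of how a normalized LN_IC50 can be derived, the sketch below standardizes LN_IC50 within each drug across cell lines. Per-drug standardization and the use of the population standard deviation are assumptions made here for the example; the released z-score values may be computed differently. Drug identifiers and values are made up.

```python
from collections import defaultdict
from statistics import mean, pstdev

def zscore_per_drug(rows):
    """Add a z_score field: LN_IC50 standardized within each drug."""
    by_drug = defaultdict(list)
    for r in rows:
        by_drug[r["perturbation"]].append(r["LN_IC50"])
    stats = {d: (mean(v), pstdev(v)) for d, v in by_drug.items()}

    out = []
    for r in rows:
        mu, sigma = stats[r["perturbation"]]
        # Guard against drugs with identical response in every cell line.
        z = (r["LN_IC50"] - mu) / sigma if sigma > 0 else 0.0
        out.append({**r, "z_score": z})
    return out

# Made-up rows; "1001" and "2002" are placeholder drug identifiers.
rows = [
    {"cell_line": "CELL_1", "perturbation": "1001", "LN_IC50": 1.0},
    {"cell_line": "CELL_2", "perturbation": "1001", "LN_IC50": 2.0},
    {"cell_line": "CELL_3", "perturbation": "1001", "LN_IC50": 3.0},
    {"cell_line": "CELL_1", "perturbation": "2002", "LN_IC50": -2.0},
    {"cell_line": "CELL_2", "perturbation": "2002", "LN_IC50": 0.0},
]
scored = zscore_per_drug(rows)
```

Because a lower LN_IC50 means a more sensitive cell line, the most sensitive cell line for each drug receives the most negative z-score under this normalization, regardless of how potent the drug is overall.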

We created three split variants to evaluate robustness. Split 0 is the primary test set.

Evaluation scripts:

The downloadable package includes example data, precomputed biases, and notebooks for evaluation and bias detection.

FAQ

Q: I have separate models per drug/KO. Can I use them?
A: Yes. Skip the perturbation-exclusive splits (GEX for genes, DEX for drugs) and the AEX splits; CEX results remain valid.

Q: Should I run all split variants?
A: Running all variants once is recommended as a robustness check, but split 0 is generally sufficient.

Q: Where are the rest of the genes?
A: Only genes with node2vec embeddings from OmniPath are included.

Roadmap

Planned features:

Unified train/test sets for drugs and genes.
RNAi tests.
Synthetic lethality and combination tests.

License

Evaluation scripts and sample models are released under CC-BY-SA 4.0. You may use them commercially, but you may not resell the datasets.

If you publish results using EFFECT, cite: https://www.biorxiv.org/content/10.1101/2023.10.02.560281

Drug sensitivity data is based on GDSC (their license applies). Gene dependency data is from DepMap (CC-BY-4.0).