EFFECT release 1.0

Initial release containing two independent datasets: one for gene dependency model training and prediction (based on DepMap Achilles data), and another for benchmarking drug sensitivity capabilities (based on GDSC2 data).

An important caveat: do not use the drug training sets to train models for the gene dependency test sets, or the gene dependency training sets to train models for the drug test sets. Doing so leaks data into the holdout sets and invalidates your results.

Likewise, if you assemble your own training sets to evaluate against these benchmarks, make sure that the targets of the drugs in your training set do not overlap with any genes in the GEX test set and, conversely, that the genes in your training set do not overlap with the targets of the drugs in the DEX test set.
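The overlap check described above can be sketched in a few lines of Python. This is only an illustration, not part of the EFFECT release: the function name `find_target_overlap`, the drug-to-targets dictionary layout, and the toy drug names are all assumptions, and you would need to load the real gene lists and drug target annotations yourself.

```python
def find_target_overlap(targets_by_drug, genes):
    """Return {drug: sorted shared genes} for every drug with a target in `genes`.

    Hypothetical helper: `targets_by_drug` maps a drug name to an iterable of
    target gene symbols, and `genes` is an iterable of gene symbols.
    """
    gene_set = set(genes)
    overlap = {}
    for drug, targets in targets_by_drug.items():
        shared = gene_set & set(targets)
        if shared:
            overlap[drug] = sorted(shared)
    return overlap


# Toy data, made up for illustration only.
custom_train_drug_targets = {"drug_A": ["EGFR", "ERBB2"], "drug_B": ["BRAF"]}
gex_test_genes = ["BRAF", "TP53"]

custom_train_genes = ["KRAS", "MTOR"]
dex_test_drug_targets = {"drug_C": ["MTOR"], "drug_D": ["PARP1"]}

# Direction 1: targets of training drugs vs. genes in the GEX test set.
leak_into_gex = find_target_overlap(custom_train_drug_targets, gex_test_genes)

# Direction 2: training genes vs. targets of drugs in the DEX test set.
leak_into_dex = find_target_overlap(dex_test_drug_targets, custom_train_genes)

print(leak_into_gex)
print(leak_into_dex)
```

An empty dictionary in both directions means no overlap was found under this simple symbol-matching scheme. Any drug that is reported should be removed from the training set, or the shared genes dropped, before benchmarking.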

Downloads

Statistics

Drug dataset:

Cell lines in TRAIN (& DEX): 555
Cell lines in CEX (& AEX): 139
Drugs in TRAIN (& CEX): 117
Drugs in DEX (& AEX): 18

Total set sizes:

         split 0 (primary)    split 1    split 2
TRAIN               46,038     42,896     42,654
CEX                 14,334     13,460     13,427
DEX                  9,479     13,579     13,821
AEX                  2,424      3,396      3,489

CRISPR KO dataset:

Cell lines in TRAIN (& GEX): 803
Cell lines in CEX (& AEX): 201
Genes in TRAIN (& CEX): 1036
Genes in GEX (& AEX): 258
Genes in extended GEX: 6052

Total set sizes:

         split 0 (primary)    split 1    split 2
TRAIN              665,430    662,953    657,971
CEX                208,196    211,308    217,520
GEX                207,150    206,364    204,828
AEX                 51,850     52,620     54,172
EXT_GEX          4,859,048  4,840,880  4,804,580
EXT_AEX          1,216,216  1,234,368  1,270,684
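As a quick sanity check on the statistics above, the sketch below compares each reported set size with the size of the full grid for that set. It rests on two assumptions that the release notes do not state explicitly: that a set size counts (cell line, drug) or (cell line, gene) pairs, and that the listed cell line, drug, and gene counts describe the primary split (split 0). The names `PRIMARY_SPLIT` and `coverage` are made up for this example.

```python
# (cell lines, drugs or genes, reported set size) for split 0,
# copied from the statistics above.
PRIMARY_SPLIT = {
    "drug/TRAIN": (555, 117, 46_038),
    "drug/CEX": (139, 117, 14_334),
    "drug/DEX": (555, 18, 9_479),
    "drug/AEX": (139, 18, 2_424),
    "crispr/TRAIN": (803, 1036, 665_430),
    "crispr/CEX": (201, 1036, 208_196),
    "crispr/GEX": (803, 258, 207_150),
    "crispr/AEX": (201, 258, 51_850),
    "crispr/EXT_GEX": (803, 6052, 4_859_048),
    "crispr/EXT_AEX": (201, 6052, 1_216_216),
}


def coverage(n_cell_lines, n_features, reported_size):
    """Return (full grid size, fraction of the grid that is reported)."""
    full_grid = n_cell_lines * n_features
    return full_grid, reported_size / full_grid


for name, (cells, features, size) in PRIMARY_SPLIT.items():
    grid, fraction = coverage(cells, features, size)
    print(f"{name:<15} {size:>9,} of {grid:>9,} pairs ({fraction:.1%})")
```

Under these assumptions every split 0 size fits within its grid. The same bounds do not hold for all entries of splits 1 and 2 (for example, the CRISPR KO CEX size in split 1 exceeds 201 × 1036), so the listed counts presumably differ between splits.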