EFFECT release 1.0
Initial release containing two independent datasets: one for training and evaluating gene dependency models (based on DepMap Achilles data), and another for benchmarking drug sensitivity prediction (based on GDSC2 data).
An important caveat: do not use the drug training sets to train models for the gene dependency test, or vice versa. Doing so leaks data into the holdout sets and invalidates your results.
Likewise, if you assemble your own training sets for these benchmarks, make sure that the targets of your training drugs do not overlap with any genes in the GEX test set and, conversely, that the genes in your training set do not overlap with the targets of drugs in the DEX test set.
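The overlap check described above can be sketched in a few lines of Python. This is only an illustration: the function name `find_target_leaks`, the dictionary layout (drug name mapped to a list of target gene symbols), and the toy drug names are assumptions, not part of the release, and you would need to supply your own drug-target annotations.

```python
from typing import Dict, Iterable, Set


def find_target_leaks(
    train_drug_targets: Dict[str, Iterable[str]],
    gex_test_genes: Iterable[str],
    train_genes: Iterable[str],
    dex_test_drug_targets: Dict[str, Iterable[str]],
) -> Dict[str, Set[str]]:
    """Return the gene symbols that would leak in each direction (hypothetical helper)."""
    gex_test = set(gex_test_genes)
    train_gene_set = set(train_genes)

    # Direction 1: targets of training drugs that are also GEX test genes.
    train_targets = {g for targets in train_drug_targets.values() for g in targets}
    drug_to_gene = train_targets & gex_test

    # Direction 2: training genes that are targets of DEX test drugs.
    dex_targets = {g for targets in dex_test_drug_targets.values() for g in targets}
    gene_to_drug = train_gene_set & dex_targets

    return {
        "train_drug_targets_in_GEX": drug_to_gene,
        "train_genes_in_DEX_targets": gene_to_drug,
    }


# Toy example with made-up drug names and annotations.
leaks = find_target_leaks(
    train_drug_targets={"drug_a": ["EGFR"], "drug_b": ["BRAF", "RAF1"]},
    gex_test_genes=["BRAF", "TP53"],
    train_genes=["KRAS", "MTOR"],
    dex_test_drug_targets={"drug_c": ["MTOR"]},
)
print(leaks)
```

Both returned sets should be empty before you train; any gene listed in either set points to a drug or gene that should be dropped from the custom training set.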
Downloads
Statistics
Drug dataset:
Cell lines in TRAIN (& DEX): 555
Cell lines in CEX (& AEX): 139
Drugs in TRAIN (& CEX): 117
Drugs in DEX (& AEX): 18
Total set sizes:
| | split 0 (primary) | split 1 | split 2 |
|---|---|---|---|
| TRAIN | 46,038 | 42,896 | 42,654 |
| CEX | 14,334 | 13,460 | 13,427 |
| DEX | 9,479 | 13,579 | 13,821 |
| AEX | 2,424 | 3,396 | 3,489 |
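The statistics above suggest a two-by-two layout: TRAIN and DEX share one group of cell lines, CEX and AEX share the other, while TRAIN and CEX share one group of drugs and DEX and AEX share the other. Under that reading, a (cell line, drug) pair could be assigned to a set as in the sketch below. The function `assign_split` and the identifiers in the example are hypothetical; the release does not state how the sets were actually built.

```python
from typing import Set


def assign_split(
    cell_line: str,
    drug: str,
    heldout_cell_lines: Set[str],
    heldout_drugs: Set[str],
) -> str:
    """Map one (cell line, drug) pair to TRAIN, CEX, DEX, or AEX.

    Assumption: a pair's set depends only on whether its cell line
    and its drug belong to the held-out groups.
    """
    cell_held = cell_line in heldout_cell_lines
    drug_held = drug in heldout_drugs
    if cell_held and drug_held:
        return "AEX"   # both the cell line and the drug are unseen
    if cell_held:
        return "CEX"   # unseen cell line, drug seen in training
    if drug_held:
        return "DEX"   # unseen drug, cell line seen in training
    return "TRAIN"


# Made-up identifiers for illustration only.
heldout_cells = {"cell_9"}
heldout_drugs = {"drug_z"}
for pair in [("cell_1", "drug_a"), ("cell_9", "drug_a"),
             ("cell_1", "drug_z"), ("cell_9", "drug_z")]:
    print(pair, assign_split(*pair, heldout_cells, heldout_drugs))
```

The same logic would apply to the CRISPR KO dataset with genes in place of drugs and GEX in place of DEX.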
CRISPR KO dataset:
Cell lines in TRAIN (& GEX): 803
Cell lines in CEX (& AEX): 201
Genes in TRAIN (& CEX): 1036
Genes in GEX (& AEX): 258
Genes in extended GEX: 6052
Total set sizes:
| | split 0 (primary) | split 1 | split 2 |
|---|---|---|---|
| TRAIN | 665,430 | 662,953 | 657,971 |
| CEX | 208,196 | 211,308 | 217,520 |
| GEX | 207,150 | 206,364 | 204,828 |
| AEX | 51,850 | 52,620 | 54,172 |
| EXT_GEX | 4,859,048 | 4,840,880 | 4,804,580 |
| EXT_AEX | 1,216,216 | 1,234,368 | 1,270,684 |
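As a quick sanity check on the statistics, each set can hold at most (number of cell lines) × (number of drugs or genes) records. The sketch below compares the split 0 sizes against that bound, with the table values read as whole-number counts. It rests on three assumptions not stated in the release: that the listed cell line, drug, and gene counts describe split 0, that each record is a unique pair, and that EXT_GEX and EXT_AEX pair the extended gene list with the TRAIN and CEX cell lines respectively.

```python
# Upper bounds from the listed counts (assumed to describe split 0).
drug_bounds = {
    "TRAIN": 555 * 117, "CEX": 139 * 117,
    "DEX": 555 * 18, "AEX": 139 * 18,
}
drug_sizes = {"TRAIN": 46038, "CEX": 14334, "DEX": 9479, "AEX": 2424}

crispr_bounds = {
    "TRAIN": 803 * 1036, "CEX": 201 * 1036,
    "GEX": 803 * 258, "AEX": 201 * 258,
    "EXT_GEX": 803 * 6052, "EXT_AEX": 201 * 6052,
}
crispr_sizes = {
    "TRAIN": 665430, "CEX": 208196, "GEX": 207150,
    "AEX": 51850, "EXT_GEX": 4859048, "EXT_AEX": 1216216,
}


def report(name, sizes, bounds):
    """Print each set's size, its upper bound, and the filled fraction."""
    for split, size in sizes.items():
        bound = bounds[split]
        print(f"{name} {split}: {size} of at most {bound} ({size / bound:.1%})")


report("drug", drug_sizes, drug_bounds)
report("crispr", crispr_sizes, crispr_bounds)
```

A size above its bound would indicate that the counts do not apply to that split, which is why only split 0 is checked here.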