Reproducing Experiments#
In this tutorial you will learn how to seamlessly reproduce experiments run with ChemTorch. It assumes that you have already set up logging with W&B.
Run your experiments with `log=true`:

```shell
chemtorch +experiment=my_experiment log=true
```
When logging is active, ChemTorch also logs the fully resolved config to W&B, and it will be available in the `wandb/` folder. The W&B config format is slightly different from that used by Hydra, so you first need to parse the config to Hydra format using the `wandb_to_hydra.py` script located in the `scripts/` folder of ChemTorch.

`wandb_to_hydra.py` usage#

```text
usage: wandb_to_hydra.py [-h] --run-id RUN_ID --output-path OUTPUT_PATH
                         [--wandb-dir WANDB_DIR]

This script parses the config of a WandB run to Hydra format, removing the
`value` wrappers and saving the result to a specified output path.

options:
  -h, --help            show this help message and exit
  --run-id RUN_ID, -i RUN_ID
                        Run ID of the WandB run.
  --output-path OUTPUT_PATH, -o OUTPUT_PATH
                        Path to save the resulting hydra config file. If not
                        absolute, it is considered relative to the project
                        directory (parent of the script directory).
  --wandb-dir WANDB_DIR, -d WANDB_DIR
                        Path to the `wandb/` directory. If not provided, the
                        script will search in the project directory (parent of
                        the script directory).
```

If we save the config to `conf/saved_configs/my_experiment/my_run.yaml`, we can run the following command to reproduce the experiment:

```shell
chemtorch -cd=conf/saved_configs/my_experiment -cn=my_run
```
This will run the exact same experiment as before, using the same config parameters.
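The conversion performed by `wandb_to_hydra.py` is conceptually simple: W&B stores each config entry wrapped in a `{"desc": ..., "value": ...}` dict, and the script strips those wrappers. Below is a minimal sketch of that step — the keys and values are illustrative, and the real script additionally locates the run directory and writes the result as YAML:

```python
import json

def unwrap_wandb_config(wandb_config: dict) -> dict:
    """Strip W&B's per-key `value` wrappers and internal keys.

    Simplified illustration of what `wandb_to_hydra.py` does,
    not the actual implementation.
    """
    hydra_config = {}
    for key, entry in wandb_config.items():
        if key.startswith("_"):
            # Skip W&B-internal bookkeeping keys such as "_wandb".
            continue
        if isinstance(entry, dict) and "value" in entry:
            # Unwrap {"desc": ..., "value": ...} to the bare value.
            hydra_config[key] = entry["value"]
        else:
            hydra_config[key] = entry
    return hydra_config

# Hypothetical logged config in W&B's wrapped format:
wandb_cfg = {
    "lr": {"desc": None, "value": 0.001},
    "model": {"desc": None, "value": {"name": "dmpnn", "hidden_dim": 300}},
    "_wandb": {"framework": "torch"},
}
print(json.dumps(unwrap_wandb_config(wandb_cfg), indent=2))
```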
Reproducing The ChemTorch Benchmarks#
The ChemTorch white paper includes a set of benchmark experiments that can be easily reproduced using the steps outlined above.
The configs for these experiments are available in the `conf/saved_configs/chemtorch_benchmarks/` folder:

```text
conf/saved_configs/chemtorch_benchmarks/
├── optimal_model_configs
│   ├── atom_han.yaml
│   ├── cgr_dmpnn.yaml
│   ├── dimreaction.yaml
│   ├── drfp_mlp.yaml
│   └── ...
├── dmpnn_data_split_benchmark
│   ├── random_split.yaml
│   ├── reactant_scaffold_split.yaml
│   ├── reaction_core_split.yaml
│   └── ...
└── ...
```
The optimal model configs contain the best hyper-parameters found for each model/reaction representation on the benchmark datasets. The data split benchmark configs use the same hyper-parameters as the optimal model config, but vary the data splitting strategy to demonstrate the effect of data splits on model performance.
For example, run the following command to reproduce the CGR/D-MPNN experiment with seeds 0 to 9:
```shell
chemtorch -m -cd=conf/saved_configs/chemtorch_benchmarks/optimal_model_configs -cn=cgr_dmpnn +experiment=chemtorch_benchmarks seed=0,1,2,3,4,5,6,7,8,9
```
The multirun (`-m`) flag will run the experiment with all the specified seeds in sequence (see CLI Usage).
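Running each benchmark over ten seeds yields ten metric values, which are typically summarized as mean ± standard deviation. A minimal sketch of that aggregation step, using hypothetical metric values (not actual ChemTorch benchmark results):

```python
from statistics import mean, stdev

# Hypothetical test-set RMSE values from the ten seeded runs
# (illustrative numbers only, not actual benchmark results).
rmse_per_seed = [0.231, 0.228, 0.235, 0.229, 0.233,
                 0.230, 0.236, 0.227, 0.232, 0.234]

# Report the metric as mean ± sample standard deviation across seeds.
print(f"test RMSE: {mean(rmse_per_seed):.4f} ± {stdev(rmse_per_seed):.4f}")
```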