Config Reference#
This page documents all ChemTorch-specific configuration options that can be set in experiment configs or via the CLI.
If you are looking for guidance on designing configuration files, see Hydra Config Design, which covers ChemTorch-specific extensions to the standard Hydra configuration system.
Note
Defaults are set in conf/base.yaml. However, not every supported option has a default value there, because specifying key=null in the config is equivalent to not setting the key at all.
You can still set any option listed below in your experiment config or add it on the CLI via +key.
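The equivalence between key=null and an unset key can be illustrated with a small Python sketch. It uses a plain dictionary as a stand-in for the loaded config and a hypothetical helper named get_option. It is not ChemTorch's actual lookup code.

```python
def get_option(cfg: dict, key: str, default=None):
    """Return cfg[key], treating a missing key and an explicit null (None) alike.

    Hypothetical helper for illustration only.
    """
    value = cfg.get(key)
    return default if value is None else value


explicit_null = {"group_name": None}  # corresponds to `group_name: null` in YAML
unset = {}                            # the key is not present at all

# Both lookups yield the same result, so a default of null adds no information.
same = get_option(explicit_null, "group_name") == get_option(unset, "group_name")
```

This is why an option can be fully supported without appearing in conf/base.yaml.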
Execution Control#
tasks
Controls which stages of the pipeline to execute.
Type: List of strings
Options:
fit, test, validate, predict
Default:
null (must be specified)
tasks:
- fit
- test
chemtorch +experiment=graph tasks='[fit,test]'
fit: Training and validation
test: Evaluate on test set(s)
validate: Evaluate on validation set only
predict: Run inference without training (see Running Inference).
The predict task runs inference only (no training).
Before using it, you must already have a trained model checkpoint — first run fit to produce one, then set load_model: true together with ckpt_path (see Model Loading).
For predict, the data pipeline must not perform any splitting (set the data splitter to null), and the column mapper must omit the label/target column if it exists. ChemTorch enforces both automatically for a predict-only run, but declaring them explicitly can make configs clearer and easier to audit.
To persist outputs from a predict-only run, use the keys described under Prediction Saving: predictions_save_path for a single partition (tasks: [predict]) or save_predictions_for plus predictions_save_dir when multiple partitions are available.
If instead you want to capture predictions for just one of the train, val, or test partitions during/after training (without a separate predict-only task), configure the same Prediction Saving keys as explained in that section.
Note that to specify a single task you also need to use the list syntax.
tasks:
- predict
chemtorch +experiment=graph tasks='[predict]'
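The constraints on tasks described above (must be specified, must be a list, must use the four listed names) can be sketched as a small validator. The function check_tasks is hypothetical and only mirrors the rules stated on this page. How ChemTorch itself reports an invalid value is not documented here.

```python
VALID_TASKS = ("fit", "test", "validate", "predict")


def check_tasks(tasks):
    """Validate a `tasks` value; hypothetical helper, not ChemTorch code."""
    if tasks is None:
        raise ValueError("tasks must be specified")
    if isinstance(tasks, str):
        # A bare string such as `tasks: predict` is rejected: list syntax is required.
        raise TypeError("tasks must be a list, e.g. [predict]")
    unknown = [t for t in tasks if t not in VALID_TASKS]
    if unknown:
        raise ValueError(f"unknown tasks: {unknown}")
    return list(tasks)


checked = check_tasks(["fit", "test"])
```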
seed
Random seed for reproducibility.
Type: Integer
Default:
0
seed: 42
chemtorch +experiment=graph seed=42
Sets the random seed for Python, NumPy, PyTorch, and PyTorch Lightning to ensure reproducible results.
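As a rough illustration of what seeding several libraries involves, the sketch below seeds Python's random module and, if they are installed, NumPy and PyTorch. The function name seed_everything_sketch is hypothetical, and this is not ChemTorch's implementation. PyTorch Lightning ships its own seed_everything utility for this purpose, and whether ChemTorch calls it is an assumption.

```python
import random


def seed_everything_sketch(seed: int) -> None:
    """Seed the common sources of randomness; illustrative only."""
    random.seed(seed)
    try:
        import numpy as np  # optional dependency in this sketch
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch  # optional dependency in this sketch
        torch.manual_seed(seed)
    except ImportError:
        pass


seed_everything_sketch(42)
first = random.random()
seed_everything_sketch(42)
second = random.random()  # same value as `first` because the seed was reset
```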
Logging (Weights & Biases)#
log
Enable or disable Weights & Biases logging.
Type: Boolean
Default:
false
log: true
chemtorch +experiment=graph log=true
project_name
W&B project name for organizing runs.
Type: String
Default:
"chemtorch"
project_name: my_project
chemtorch +experiment=graph project_name=my_project
group_name
W&B group name for organizing related runs (e.g., hyperparameter sweeps).
Type: String or null
Default:
null
group_name: hyperparameter_sweep_1
chemtorch +experiment=graph group_name=my_group
run_name
W&B run name for identifying individual runs.
Type: String or null
Default:
null (W&B generates a random name)
run_name: baseline_experiment
chemtorch +experiment=graph run_name="baseline run"
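One plausible way these four keys relate to W&B is sketched below: the keys are translated into the project, group, and name arguments that wandb.init accepts. The mapping and the helper wandb_init_kwargs are assumptions for illustration, and ChemTorch may wire logging differently (for example through a Lightning logger).

```python
from typing import Optional


def wandb_init_kwargs(cfg: dict) -> Optional[dict]:
    """Translate the logging keys into wandb.init arguments; hypothetical helper."""
    if not cfg.get("log", False):
        return None  # logging disabled, nothing to initialise
    return {
        "project": cfg.get("project_name") or "chemtorch",  # documented default
        "group": cfg.get("group_name"),  # None leaves the run ungrouped
        "name": cfg.get("run_name"),     # None lets W&B pick a random name
    }


kwargs = wandb_init_kwargs(
    {"log": True, "project_name": "my_project", "run_name": "baseline_experiment"}
)
# With wandb installed, this could be passed on as wandb.init(**kwargs).
```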
Model Loading#
load_model
Load a pre-trained model from a checkpoint.
Type: Boolean
Default:
false
If true, ckpt_path must be specified.
ckpt_path
Path to the checkpoint file to load.
Type: String or null
Default:
null
Required if:
load_model=true
load_model: true
ckpt_path: path/to/checkpoint.ckpt
chemtorch +experiment=graph load_model=true ckpt_path=path/to/checkpoint.ckpt
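The dependency between load_model and ckpt_path amounts to a simple consistency check, sketched here in Python. The function check_model_loading is hypothetical, and the error ChemTorch actually raises for a missing ckpt_path is not specified on this page.

```python
def check_model_loading(load_model: bool = False, ckpt_path=None):
    """Return the checkpoint path to load, or None if no loading is requested."""
    if load_model and ckpt_path is None:
        raise ValueError("load_model=true requires ckpt_path to be set")
    # In this sketch, ckpt_path is ignored when load_model is false (an assumption).
    return ckpt_path if load_model else None


path = check_model_loading(load_model=True, ckpt_path="path/to/checkpoint.ckpt")
```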
Prediction Saving#
save_predictions_for
Specify which dataset partitions to save predictions for.
Type: String, list of strings, or null
Options:
"train", "val", "test", "predict", "all"
Default:
null
Only required if multiple data partitions are available. For example:
tasks: [fit]: training and validation partitions are available
tasks: [fit, test]: training, validation, and test partitions are available
Not required if only a single partition is used (e.g., tasks: [test] or tasks: [predict]).
If set to all, predictions will be saved for all available partitions.
predictions_save_path
Path to save predictions (for single data partition).
Type: String or null
Default:
null
tasks: [predict]
predictions_save_path: my_experiment/predictions.csv
chemtorch +experiment=graph predictions_save_path=my_experiment/predictions.csv
predictions_save_dir
Directory to save predictions (for multiple data partitions).
The predictions will be saved in files named <partition>.csv within this directory, where <partition> is one of train, val, test, or predict.
Type: String or null
Default:
null
Requires that save_predictions_for is set.
# Save for multiple partitions
save_predictions_for:
- train
- test
predictions_save_dir: predictions/
chemtorch +experiment=graph save_predictions_for='[train,test]' predictions_save_dir=predictions/
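How the three prediction-saving keys interact can be sketched in Python. The sketch below derives the available partitions from tasks and returns the files that would be written. The helper names are hypothetical, the mapping of validate to the val partition is an assumption, and the sketch assumes predictions_save_dir is used whenever more than one partition is available. It is not ChemTorch's actual logic.

```python
from pathlib import PurePosixPath

# Partitions made available by each task (the `validate` entry is an assumption).
PARTITIONS_BY_TASK = {
    "fit": ["train", "val"],
    "validate": ["val"],
    "test": ["test"],
    "predict": ["predict"],
}


def available_partitions(tasks):
    """Collect the partitions exposed by the given tasks, without duplicates."""
    seen = []
    for task in tasks:
        for part in PARTITIONS_BY_TASK[task]:
            if part not in seen:
                seen.append(part)
    return seen


def prediction_files(tasks, save_predictions_for=None,
                     predictions_save_path=None, predictions_save_dir=None):
    """Map each partition to its output file; an empty dict means nothing is saved."""
    available = available_partitions(tasks)
    if len(available) == 1 and predictions_save_path is not None:
        return {available[0]: predictions_save_path}
    if save_predictions_for is None or predictions_save_dir is None:
        return {}
    if save_predictions_for == "all":
        wanted = available
    elif isinstance(save_predictions_for, str):
        wanted = [save_predictions_for]
    else:
        wanted = list(save_predictions_for)
    missing = [p for p in wanted if p not in available]
    if missing:
        raise ValueError(f"partitions not available for these tasks: {missing}")
    out_dir = PurePosixPath(predictions_save_dir)
    return {p: str(out_dir / f"{p}.csv") for p in wanted}


single = prediction_files(["predict"], predictions_save_path="predictions.csv")
multi = prediction_files(["fit", "test"], save_predictions_for=["train", "test"],
                         predictions_save_dir="predictions/")
```

Under these assumptions, single maps the predict partition to predictions.csv, and multi maps train and test to files inside predictions/.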
Data Subsampling#
data_module.subsample
Subsample the dataset for quick testing (not a ChemTorch root-level key, but very useful).
Type: Float (0.0 to 1.0), int (0 to dataset size) or null
Default:
null (use full dataset)
# Use 5% of data for quick testing
chemtorch +experiment=graph data_module.subsample=0.05
# Use 100 samples
chemtorch +experiment=graph data_module.subsample=100
Useful for debugging or testing configurations quickly.
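The two accepted forms of the value (a float fraction or an integer count) can be told apart as in the sketch below. The function subsample_size is hypothetical, and rounding the fractional case to the nearest integer is an assumption. ChemTorch may truncate instead, and how it selects the samples is not covered here.

```python
def subsample_size(n_total: int, subsample=None) -> int:
    """Return how many samples to keep out of n_total; illustrative only."""
    if subsample is None:
        return n_total  # null means: use the full dataset
    if isinstance(subsample, bool):
        raise TypeError("subsample must be a float, an int, or null")
    if isinstance(subsample, float):
        if not 0.0 <= subsample <= 1.0:
            raise ValueError("a float subsample must lie in [0.0, 1.0]")
        return round(n_total * subsample)  # fraction of the dataset
    if isinstance(subsample, int):
        if not 0 <= subsample <= n_total:
            raise ValueError("an int subsample must lie in [0, dataset size]")
        return subsample  # absolute number of samples
    raise TypeError("subsample must be a float, an int, or null")


by_fraction = subsample_size(2000, 0.05)  # 5% of 2000 samples
by_count = subsample_size(2000, 100)      # exactly 100 samples
```

Note that under this reading 1.0 (float) keeps the whole dataset while 1 (int) keeps a single sample, so the type of the value matters.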
Advanced Options#
parameter_limit
Limit the number of model parameters (for testing or architecture search). If the model exceeds the parameter limit, ChemTorch skips the run.
Type: Integer or null
Default:
null (no limit)
parameter_limit: 1000000 # Max 1M parameters
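The check behind parameter_limit can be sketched without any deep-learning dependency by counting parameters from tensor shapes. Both helpers are hypothetical. For a PyTorch module the count is commonly obtained as sum(p.numel() for p in model.parameters()), and whether ChemTorch counts all parameters or only trainable ones is not stated on this page.

```python
import math


def count_parameters(shapes) -> int:
    """Total number of scalar parameters given an iterable of tensor shapes."""
    return sum(math.prod(shape) for shape in shapes)


def exceeds_limit(n_params: int, parameter_limit=None) -> bool:
    """True if the run would be skipped; null (None) means no limit."""
    return parameter_limit is not None and n_params > parameter_limit


# A single linear layer mapping 128 features to 64: weight (64, 128) plus bias (64,)
n_params = count_parameters([(64, 128), (64,)])
skip_run = exceeds_limit(n_params, parameter_limit=1_000_000)
```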
Common CLI Patterns#
Quick Testing#
# Fast test run: subsample data, run a single batch from every dataloader, no logging
chemtorch +experiment=graph \
data_module.subsample=0.01 \
+trainer.fast_dev_run=true \
log=false
Inference#
# Load model and run predictions
chemtorch +experiment=graph \
load_model=true \
ckpt_path=path/to/checkpoint.ckpt \
tasks='[predict]' \
predictions_save_path=predictions.csv
For advanced Hydra features, see Hydra Config Design or the official Hydra documentation.