Various features #6

Merged
sfluegel05 merged 123 commits into dev on Jan 5, 2024

@sfluegel05 sfluegel05 commented Dec 19, 2023

This branch adds several new features:

  • stratification of the train-val-test split (ensures a near-equal distribution of classes across the data splits)
  • inner cross-validation
  • specifying the ChEBI version (e.g. v227 instead of the default v200), including separate ChEBI versions for the train and test sets
  • using DeepSMILES and SELFIES instead of SMILES for pretraining on PubChem and classification on ChEBI
  • fixed calculation of macro-f1 during training
  • added functions for model evaluation
  • support for wandb logging
  • best checkpoints are saved based on the maximum micro-f1 score
  • support for the chebai_graph module
  • command predict_from_file to run predictions directly on SMILES strings
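
Since the list mentions both macro-f1 (fixed in this branch) and micro-f1 (used to select checkpoints), here is a minimal, dependency-free sketch of how the two averages differ for multi-label predictions. It is not the code from this branch, and the PR does not say what the original macro-f1 bug was. The function names `f1_from_counts` and `micro_macro_f1` are hypothetical, and scoring a label with no positives as 0.0 is an assumed convention.

```python
from typing import Sequence, Tuple


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the denominator is zero (assumed convention)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(
    y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]
) -> Tuple[float, float]:
    """Return (micro_f1, macro_f1) for binary multi-label indicator rows."""
    n_labels = len(y_true[0])
    # One [tp, fp, fn] counter per label, accumulated over all samples.
    counts = [[0, 0, 0] for _ in range(n_labels)]
    for true_row, pred_row in zip(y_true, y_pred):
        for j, (t, p) in enumerate(zip(true_row, pred_row)):
            if t and p:
                counts[j][0] += 1  # true positive
            elif p:
                counts[j][1] += 1  # false positive
            elif t:
                counts[j][2] += 1  # false negative
    # Micro: pool the counts over all labels, then compute a single F1.
    micro = f1_from_counts(*(sum(c[k] for c in counts) for k in range(3)))
    # Macro: compute F1 per label, then take the unweighted mean.
    macro = sum(f1_from_counts(*c) for c in counts) / n_labels
    return micro, macro


# Label 0 is frequent and always predicted correctly; label 1 is rare and missed.
y_true = [[1, 0], [1, 0], [1, 0], [1, 1]]
y_pred = [[1, 0], [1, 0], [1, 0], [1, 0]]
micro, macro = micro_macro_f1(y_true, y_pred)
print(f"micro-f1={micro:.3f} macro-f1={macro:.3f}")  # micro-f1=0.889 macro-f1=0.500
```

In this toy example the missed rare label barely moves micro-f1 but halves macro-f1, which is why the two numbers can diverge strongly on imbalanced class sets.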

Compatibility issues with current dev version:

  • the structure of the data folder changed to include the ChEBI version; keeping existing datasets requires moving the data manually, e.g.
    • old: data/ChEBI100/processed/smiles_token/test.pt
    • new: data/ChEBI100/chebi_v200/processed/smiles_token/test.pt
  • some config files may have been changed or renamed
  • the content and location of tokens.txt have changed (this leads to a different encoding of tokens during dataset creation)
  • some dependencies may be missing from requirements.txt
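
The manual move described in the first item could be scripted roughly as follows. This is only a sketch based on the single old/new path pair shown above: the helper name `migrate_processed_dir` is hypothetical, and it assumes that only the `processed` directory has to move under `chebi_v<version>`. Whether other subfolders need the same treatment is not stated in this PR.

```python
import shutil
from pathlib import Path


def migrate_processed_dir(
    dataset_dir: Path, chebi_version: int = 200, dry_run: bool = True
) -> Path:
    """Move <dataset_dir>/processed to <dataset_dir>/chebi_v<version>/processed.

    Hypothetical helper, not part of the repository. With dry_run=True
    nothing is moved and only the target path is returned.
    """
    old = dataset_dir / "processed"
    new = dataset_dir / f"chebi_v{chebi_version}" / "processed"
    if not old.is_dir():
        raise FileNotFoundError(f"nothing to migrate: {old} does not exist")
    if new.exists():
        raise FileExistsError(f"refusing to overwrite existing {new}")
    if not dry_run:
        # Create the version folder, then move the whole processed tree into it.
        new.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(old), str(new))
    return new
```

For example, `migrate_processed_dir(Path("data/ChEBI100"), 200, dry_run=False)` would turn `data/ChEBI100/processed/smiles_token/test.pt` into `data/ChEBI100/chebi_v200/processed/smiles_token/test.pt`. Files moved this way keep the token encoding they were created with, so the tokens.txt change noted above still applies to them.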

Open problems:

  • cross-validation does not work as intended -> reimplemented, works now
  • wandb is used by default -> separate configs for the wandb logger and the CSV logger
  • macro-f1 results are wrong -> fixed
  • lint checks fail -> fixed after reformatting

@sfluegel05 sfluegel05 marked this pull request as ready for review January 3, 2024 15:34
@sfluegel05 sfluegel05 merged commit 0ccd7fb into dev Jan 5, 2024
2 checks passed