# mhg-gnn

This repository provides PyTorch source code associated with our publication, "MHG-GNN: Combination of Molecular Hypergraph Grammar with Graph Neural Network".

**Paper:** [arXiv Link](https://arxiv.org/pdf/2309.16374)

For more information contact: SEIJITKD@jp.ibm.com
## Introduction

We present MHG-GNN, an autoencoder architecture with an encoder based on a GNN and a decoder based on a sequential model with the Molecular Hypergraph Grammar (MHG). Since the encoder is a GNN variant, MHG-GNN can accept any molecule as input and demonstrates high predictive performance on molecular graph data. In addition, the decoder inherits the theoretical guarantee of MHG that the generated output is always a structurally valid molecule.
## Table of Contents

1. [Getting Started](#getting-started)
   1. [Pretrained Models and Training Logs](#pretrained-models-and-training-logs)
   2. [Replicating Conda Environment](#replicating-conda-environment)
2. [Feature Extraction](#feature-extraction)
## Getting Started

**This code and environment have been tested on Intel E5-2667 CPUs at 3.30GHz and NVIDIA A100 Tensor Core GPUs.**

### Pretrained Models and Training Logs
We provide checkpoints of the MHG-GNN model pre-trained on a dataset of ~1.34M molecules curated from PubChem. For model weights: [HuggingFace Link]()

Add the MHG-GNN `pre-trained weights.pt` to the `models/` directory according to your needs.
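Once the checkpoint is in place, a quick sanity check can be done with `torch.load`. This is a minimal sketch; the path below mirrors the naming above and may differ for your download, and it only inspects the file without building the model:

```python
import torch

# Hypothetical path: adjust to wherever you placed the downloaded checkpoint.
ckpt_path = "models/pre-trained weights.pt"

# Load on CPU just to inspect the contents.
state = torch.load(ckpt_path, map_location="cpu")

# Checkpoints are typically a state_dict or a dict wrapping one.
if isinstance(state, dict):
    print(f"{len(state)} top-level entries, e.g.: {list(state)[:5]}")
```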
### Replicating Conda Environment

Follow these steps to replicate our Conda environment and install the necessary libraries:
```
conda create --name mhg-gnn-env python=3.8.18
conda activate mhg-gnn-env
```
#### Install Packages with Conda

```
conda install -c conda-forge networkx=2.8
conda install numpy=1.23.5
# conda install -c conda-forge rdkit=2022.9.4
conda install pytorch=2.0.0 torchvision torchaudio -c pytorch
conda install -c conda-forge torchinfo=1.8.0
conda install pyg -c pyg
```
#### Install Packages with pip

```
pip install rdkit torch-nl==0.3 torch-scatter torch-sparse
```
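As a quick check that the environment resolved correctly, the core dependencies can be imported and their versions printed (a minimal sketch; the exact versions will depend on what the package solvers installed):

```python
# Verify that the main dependencies import and report their versions.
import networkx
import numpy
import rdkit
import torch
import torch_geometric
import torchinfo

for mod in (networkx, numpy, rdkit, torch, torch_geometric, torchinfo):
    print(f"{mod.__name__}: {mod.__version__}")

print("CUDA available:", torch.cuda.is_available())
```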
## Feature Extraction

The example notebook [mhg-gnn_encoder_decoder_example.ipynb](notebooks/mhg-gnn_encoder_decoder_example.ipynb) contains code to load checkpoint files and use the pre-trained model for encoder and decoder tasks.

To load mhg-gnn, you can simply use:
```python
import torch
import load

model = load.load()
```
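For a quick overview of the loaded network, `torchinfo` (already a dependency) can print a layer summary. This assumes `load.load()` returns a standard `torch.nn.Module`:

```python
from torchinfo import summary

# Prints the module hierarchy and parameter counts of the loaded model.
summary(model)
```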
To encode SMILES into embeddings, you can use:

```python
with torch.no_grad():
    repr = model.encode(["CCO", "O=C=O", "OC(=O)c1ccccc1C(=O)O"])
```
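The embeddings can then be used directly as feature vectors. Below is a minimal sketch that inspects the output and compares the encoded molecules by cosine similarity; it assumes `encode` returns a 2-D tensor of shape `(n_molecules, embedding_dim)`, or a list of per-molecule tensors that can be stacked:

```python
import torch
import torch.nn.functional as F

# Stack into a single (n_molecules, embedding_dim) tensor if a list was returned.
emb = repr if isinstance(repr, torch.Tensor) else torch.stack(list(repr))
print(emb.shape)

# Pairwise cosine similarity between the encoded molecules.
sim = F.cosine_similarity(emb.unsqueeze(1), emb.unsqueeze(0), dim=-1)
print(sim)
```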
To decode the embeddings back into SMILES strings, you can use:

```python
orig = model.decode(repr)
```
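The decoder guarantees structurally valid output, but not necessarily an exact reconstruction of the input, so a round-trip check with RDKit can be useful. This is a minimal sketch, assuming `decode` returns a list of SMILES strings in the same order as the inputs:

```python
from rdkit import Chem

inputs = ["CCO", "O=C=O", "OC(=O)c1ccccc1C(=O)O"]

for smi_in, smi_out in zip(inputs, orig):
    mol = Chem.MolFromSmiles(smi_out)
    valid = mol is not None
    # Compare canonical forms so equivalent but differently written SMILES still match.
    match = valid and Chem.MolToSmiles(mol) == Chem.MolToSmiles(Chem.MolFromSmiles(smi_in))
    print(f"{smi_in} -> {smi_out} | valid: {valid} | round-trip match: {match}")
```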