ChemDFM-v2.0-14B

ChemDFM-v2.0 is the latest non-thinking model in the ChemDFM series, the pioneering open-source dialogue foundation model for chemistry and molecular science.

To achieve better chemical capabilities, we upgraded both the domain pre-training stage and the instruction tuning stage. In the domain pre-training stage, we introduce web-scale molecules and reactions into the corpus, together with their functional-group information and properties, so that ChemDFM can acquire chemical knowledge at a finer level of granularity. In the instruction tuning stage, we significantly improve the diversity of the instruction tuning dataset by introducing more tasks and increasing the variability in the phrasing and expression of the instruction texts.
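
As an illustration of the kind of functional-group annotation involved (a minimal sketch, not the actual ChemDFM data pipeline), RDKit's Fragments and Descriptors modules can attach group counts and basic properties to a molecule:

from rdkit import Chem
from rdkit.Chem import Descriptors, Fragments

def annotate_molecule(smiles):
    # parse the SMILES; return None if it is invalid
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    return {
        "smiles": Chem.MolToSmiles(mol),
        # a few illustrative functional-group counters from rdkit.Chem.Fragments
        "num_hydroxyl": Fragments.fr_Al_OH(mol) + Fragments.fr_Ar_OH(mol),
        "num_ketone": Fragments.fr_ketone(mol),
        "num_amine": Fragments.fr_NH2(mol) + Fragments.fr_NH1(mol),
        # basic computed properties
        "mol_weight": Descriptors.MolWt(mol),
        "logp": Descriptors.MolLogP(mol),
    }

print(annotate_molecule("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin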

Local Inference

Here is an example of loading and running ChemDFM-v2.0 locally:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

# load the tokenizer and the model (fp16) onto the GPU
model_name_or_id = "OpenDFM/ChemDFM-v2.0-14B"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_id)
model = AutoModelForCausalLM.from_pretrained(model_name_or_id, torch_dtype=torch.float16).to("cuda")

# an example instruction containing a SMILES string (see the preprocessing note below)
instruction = "Can you please give detailed descriptions of the molecule below?\nCl.O=C1c2c(O)cccc2-c2nn(CCNCCO)c3ccc(NCCNCCO)c1c23"
message = [
    {
        "role": "system",
        "content": "You are a helpful assistant."
    },
    {
        "role": "user",
        "content": instruction
    }
]

# render the chat template, tokenize, and move the inputs to the GPU
input_text = tokenizer.apply_chat_template(message, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")
generation_config = GenerationConfig(
    do_sample=True,
    top_k=20,
    top_p=0.9,
    temperature=0.9,
    max_new_tokens=1024,
    repetition_penalty=1.05,
    eos_token_id=tokenizer.eos_token_id
)
outputs = model.generate(**inputs, generation_config=generation_config)

# strip the prompt from the decoded output, leaving only the model's reply
generated_text = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
input_text = tokenizer.decode(inputs["input_ids"][0], skip_special_tokens=True)
generated_text = generated_text[len(input_text):].strip()
print(f"{generated_text=}")
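
As a variation on the string slicing above, you can also drop the prompt at the token level, which avoids any mismatch between the re-decoded prompt and the full output:

# decode only the newly generated tokens
prompt_length = inputs["input_ids"].shape[1]
generated_text = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()
print(f"{generated_text=}")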

SMILES Preprocessing

When your input involves SMILES notation, we recommend preprocessing it with the rdkit package to canonicalize the SMILES. Here is an example:

from rdkit import Chem

def canonicalize_smiles(smiles):
    # returns the canonical (isomeric, non-kekulized) SMILES, or None if parsing fails
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    return Chem.MolToSmiles(mol, isomericSmiles=True, kekuleSmiles=False)

or directly:

from rdkit import Chem

def canonicalize_smiles(smiles):
    # note: unlike the version above, this raises if the SMILES cannot be parsed
    return Chem.CanonSmiles(smiles, useChiral=True)
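
For example, canonicalizing the SMILES of the molecule from the instruction above (using the canonicalize_smiles helper defined in either snippet):

raw_smiles = "Cl.O=C1c2c(O)cccc2-c2nn(CCNCCO)c3ccc(NCCNCCO)c1c23"
print(canonicalize_smiles(raw_smiles))  # prints the canonical form recommended for prompts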

Citation

@article{zhao2025developing,
  title={Developing ChemDFM as a large language foundation model for chemistry},
  author={Zhao, Zihan and Ma, Da and Chen, Lu and Sun, Liangtai and Li, Zihao and Xia, Yi and Chen, Bo and Xu, Hongshen and Zhu, Zichen and Zhu, Su and others},
  journal={Cell Reports Physical Science},
  volume={6},
  number={4},
  year={2025},
  publisher={Elsevier}
}

@misc{zhao2025chemdfmr,
  title={ChemDFM-R: A Chemical Reasoner LLM Enhanced with Atomized Chemical Knowledge},
  author={Zihan Zhao and Bo Chen and Ziping Wan and Lu Chen and Xuanze Lin and Shiyang Yu and Situo Zhang and Da Ma and Zichen Zhu and Danyang Zhang and Huayang Wang and Zhongyang Dai and Liyang Wen and Xin Chen and Kai Yu},
  year={2025},
  eprint={2507.21990},
  archivePrefix={arXiv},
  primaryClass={cs.CE},
  url={https://arxiv.org/abs/2507.21990}
}

Disclaimer

The current version of ChemDFM may generate incorrect or misleading information. Please use it with caution and verify the results with domain experts before making any decisions based on them.

Contact

If you have any questions or further requests, please contact Zihan Zhao, Bo Chen, and Lu Chen.
