ChemDFM-v2.0-14B
ChemDFM-v2.0 is the latest non-thinking model of ChemDFM, the pioneering open-source dialogue foundation model for chemistry and molecular science.
To achieve better chemical capabilities, we upgrade both the domain pre-training stage and the instruction tuning stage. In the domain pre-training stage, we introduce web-scale molecules and reactions into the corpus along with their functional-group information and properties, which enables ChemDFM to acquire chemical knowledge at a finer level of granularity. In the instruction tuning stage, we significantly improve the diversity of our instruction tuning dataset by introducing more tasks and increasing the variability in the phrasing of the instructions.
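As an illustration of the kind of functional-group annotation described above, such labels can be derived with RDKit. The sketch below is hypothetical and only hints at the idea; it is not ChemDFM's actual corpus-construction code:

from rdkit import Chem
from rdkit.Chem import Fragments, rdMolDescriptors

# Hypothetical sketch: count a few functional groups for a molecule,
# i.e. the kind of finer-grained annotation attached to the corpus.
mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
print("esters:", Fragments.fr_ester(mol))
print("carboxylic acids:", Fragments.fr_COO(mol))
print("aromatic rings:", rdMolDescriptors.CalcNumAromaticRings(mol))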
News
2025-10-26: The parameters of ChemDFM-R-14B are open-sourced!
2025-10-26: ChemDFM-v2.0-14B is released! The improved domain pre-training and instruction tuning procedures are implemented on Qwen2.5-14B to build a more advanced general LLM for chemistry. More details can be found here.
2025-07-29: The paper of ChemDFM-R-14B is released on arXiv: ChemDFM-R: A Chemical Reasoning LLM Enhanced with Atomized Chemical Knowledge.
2024-11-09: ChemDFM-v1.5-8B is released! We implemented our domain pre-training and instruction tuning procedures on the stronger base model LLaMA-3-8B.
2024-03-12: The parameters of ChemDFM-v1.0-13B are open-sourced!
2024-01-26: The paper of ChemDFM-13B is released on arXiv: ChemDFM: Dialogue Foundation Model for Chemistry
Local Inference
Here is an example of how to load and run ChemDFM-v2.0 locally:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

# Load the tokenizer and the model in fp16 on GPU
model_name_or_id = "OpenDFM/ChemDFM-v2.0-14B"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_id)
model = AutoModelForCausalLM.from_pretrained(model_name_or_id, torch_dtype=torch.float16).to("cuda")

instruction = "Can you please give detailed descriptions of the molecule below?\nCl.O=C1c2c(O)cccc2-c2nn(CCNCCO)c3ccc(NCCNCCO)c1c23"
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": instruction},
]

# Render the conversation with the model's chat template, then tokenize
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")

generation_config = GenerationConfig(
    do_sample=True,
    top_k=20,
    top_p=0.9,
    temperature=0.9,
    max_new_tokens=1024,
    repetition_penalty=1.05,
    eos_token_id=tokenizer.eos_token_id,
)

outputs = model.generate(**inputs, generation_config=generation_config)

# The decoded output contains the prompt; strip it to keep only the reply
generated_text = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
input_text = tokenizer.decode(inputs["input_ids"][0], skip_special_tokens=True)
generated_text = generated_text[len(input_text):].strip()
print(f"{generated_text=}")
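If you would rather stream tokens to the console as they are produced, the same generate call can be wrapped with transformers' TextStreamer. This is an optional convenience on top of the example above, not part of the reference example:

from transformers import TextStreamer

# Print tokens as they are generated; skip_prompt suppresses the echoed input
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(**inputs, generation_config=generation_config, streamer=streamer)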
SMILES Preprocessing
When your input involves SMILES notation, we recommend preprocessing it with the rdkit package to obtain the canonical SMILES. Here is an example:
from rdkit import Chem

def canonicalize_smiles(smiles):
    # Return the canonical SMILES, or None if the input cannot be parsed
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    return Chem.MolToSmiles(mol, isomericSmiles=True, kekuleSmiles=False)
or directly:
from rdkit import Chem

def canonicalize_smiles(smiles):
    # One-call equivalent; note that it raises an error on unparsable
    # SMILES instead of returning None
    return Chem.CanonSmiles(smiles, useChiral=True)
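For example, either helper can be applied to the SMILES from the inference example above before building the prompt (the variable names here simply mirror that snippet):

# Canonicalize the SMILES before inserting it into the instruction
raw_smiles = "Cl.O=C1c2c(O)cccc2-c2nn(CCNCCO)c3ccc(NCCNCCO)c1c23"
instruction = f"Can you please give detailed descriptions of the molecule below?\n{canonicalize_smiles(raw_smiles)}"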
Citation
@article{zhao2025developing,
title={Developing ChemDFM as a large language foundation model for chemistry},
author={Zhao, Zihan and Ma, Da and Chen, Lu and Sun, Liangtai and Li, Zihao and Xia, Yi and Chen, Bo and Xu, Hongshen and Zhu, Zichen and Zhu, Su and others},
journal={Cell Reports Physical Science},
volume={6},
number={4},
year={2025},
publisher={Elsevier}
}
@misc{zhao2025chemdfmr,
title={ChemDFM-R: A Chemical Reasoning LLM Enhanced with Atomized Chemical Knowledge},
author={Zihan Zhao and Bo Chen and Ziping Wan and Lu Chen and Xuanze Lin and Shiyang Yu and Situo Zhang and Da Ma and Zichen Zhu and Danyang Zhang and Huayang Wang and Zhongyang Dai and Liyang Wen and Xin Chen and Kai Yu},
year={2025},
eprint={2507.21990},
archivePrefix={arXiv},
primaryClass={cs.CE},
url={https://arxiv.org/abs/2507.21990},
}
Disclaimer
The current version of ChemDFM may generate incorrect or misleading information. Please use it with caution and verify the results with domain experts before making any decisions based on them.
Contact
If you have any questions or further requests, please contact Zihan Zhao, Bo Chen, and Lu Chen.