-
Notifications
You must be signed in to change notification settings - Fork 245
Open
Labels
Description
Bug Description
I'm trying to create a rmgpy.molecule.Molecule() using smiles
rmgpy.molecule.Molecule(smiles='O=C=[X]')
But it fails from an RDKit error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/tmp/ipykernel_18284/1886713532.py in <module>
----> 1 rmgpy.molecule.Molecule(smiles='O=C=[X]')
~/rmg/RMG-Py/rmgpy/molecule/molecule.cpython-37m-x86_64-linux-gnu.so in rmgpy.molecule.molecule.Molecule.__init__()
~/rmg/RMG-Py/rmgpy/molecule/molecule.cpython-37m-x86_64-linux-gnu.so in rmgpy.molecule.molecule.Molecule.from_smiles()
~/rmg/RMG-Py/rmgpy/molecule/translator.cpython-37m-x86_64-linux-gnu.so in rmgpy.molecule.translator.from_smiles()
~/rmg/RMG-Py/rmgpy/molecule/translator.cpython-37m-x86_64-linux-gnu.so in rmgpy.molecule.translator.from_smiles()
~/rmg/RMG-Py/rmgpy/molecule/translator.cpython-37m-x86_64-linux-gnu.so in rmgpy.molecule.translator._read()
~/rmg/RMG-Py/rmgpy/molecule/translator.cpython-37m-x86_64-linux-gnu.so in rmgpy.molecule.translator._rdkit_translator()
ValueError: Could not interpret the identifier 'O=C=[X]'
Possible Solution
I'm wondering if we can get around RDKit's limitations with the following procedure:
- replace the surface sites X with a nonreactive species like Ar
- generate that molecule's adjacency list
- replace Ar with X and delete extra pairs of electrons on the X
It would look something like adding this to the _rdkit_translator() function in RMG-Py/rmgpy/molecule/translator.py:
from rdkit import Chem
import rmgpy.molecule
COX_smiles = 'O=C=[X]'
rdkitmol = Chem.MolFromSmiles(COX_smiles.replace('X', 'Ar'))
mol = rmgpy.molecule.molecule.Molecule()
output = rmgpy.molecule.converter.from_rdkit_mol(mol, rdkitmol)
lines = output.to_adjacency_list().split('\n')
for i, line in enumerate(lines):
if 'Ar' in line:
lines[i] = lines[i].replace('Ar', 'X')
# remove any extra electron pairs...
lines[i] = lines[i].replace('p3', 'p0')
lines[i] = lines[i].replace('p2', 'p0')
lines[i] = lines[i].replace('p1', 'p0')
adj_list = '\n'.join(lines)
COX = rmgpy.molecule.molecule.Molecule().from_adjacency_list(adj_list)
print(COX.smiles)
print(COX.to_adjacency_list())
Other Possible Solution
One workaround is to just instantiate the molecule using the adjacency list. But sometimes that's annoying- especially since we can use smiles for gas phase molecules without any issue. At the very least, a more descriptive error message is probably required, telling the user that you have to use the adjacency list.