4. Discussion
In this study, we assembled a dataset of CP solution structures. Additionally, we created computational workflows to run and analyse accelerated MD simulations, and to compare the simulation outputs with experimental NMR data and with chemical informatics-based conformer generators. The MacroConf dataset is limited by the available experimental data, which shows in its heterogeneous composition and in the varying NMR measurements and NOE quality. The dataset does not cover the whole chemical space of short cyclic peptides, and some compounds closely resemble others. Further, the underlying NMR experiments were performed in different solvents and were not necessarily of equal quality, which is reflected in the inconsistent reporting of the NOE values: some NOEs are binned, while others include upper and lower bounds, experimental distances, or various combinations thereof.
As a first step, we only evaluated natural cyclic peptides with high-quality NOE data. This made the forcefield choice slightly easier and avoided manual parametrisation or charge derivation. Protein forcefields such as Amber ff14SB are usually designed for larger, more ordered proteins. CPs are comparatively disordered and subject to a ring constraint, so these forcefields may describe them less accurately. To consider the chemically modified macrocycles of the MacroConf dataset, we will need to use different forcefield parametrisations.
The GaMD and aMD simulations performed comparably at sampling conformations of the CPs studied, as judged by the dPCA PES. Both methods sampled better than cMD, which does not converge within equivalent simulation times. We required long simulation times (1000 ns or more) to achieve convergence for (G)aMD, which makes these methods much more expensive (runtime of about 7 days for a 2000 ns simulation on an Nvidia GeForce RTX2080 GPU) than the chemical informatics conformer generators (runtime of several seconds to tens of minutes on an Intel Core i9-9920X CPU). However, MD simulations allow us to retrieve a time-resolved trajectory, which includes thermodynamic information and explicit solute-solvent interactions that are unavailable from chemical informatics conformer generators. Our analysis confirms that these simpler methods only enumerate geometrically plausible structures. A caveat of GaMD/aMD simulations is that they are strongly parameter dependent. Additionally, the reweighting step requires several parameters, which introduces further sources of error.
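To illustrate why the reweighting step can introduce error, the following minimal sketch compares two commonly used estimators for boosted trajectories: direct exponential averaging and a second-order cumulant expansion. The function names, the temperature, and the synthetic boost energies are assumptions made for illustration only; this is not the reweighting code used in this study.

```python
import numpy as np

KB = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def exponential_weights(boost, temperature=300.0):
    """Per-frame weights proportional to exp(beta * dV), normalised to sum to 1."""
    beta = 1.0 / (KB * temperature)
    scaled = beta * np.asarray(boost, dtype=float)
    w = np.exp(scaled - scaled.max())  # shift by the maximum for numerical stability
    return w / w.sum()

def cumulant_log_factor(boost, temperature=300.0):
    """Second-order cumulant approximation of ln<exp(beta * dV)>."""
    beta = 1.0 / (KB * temperature)
    dv = np.asarray(boost, dtype=float)
    return beta * dv.mean() + 0.5 * beta**2 * dv.var()

def effective_sample_size(weights):
    """Kish effective sample size: how many frames effectively contribute."""
    return 1.0 / np.sum(weights**2)

# Synthetic (hypothetical) boost potentials in kcal/mol, not simulation output.
rng = np.random.default_rng(0)
n_frames = 10_000
narrow = rng.normal(5.0, 0.5, n_frames)  # narrow boost distribution
wide = rng.normal(5.0, 3.0, n_frames)    # wide boost distribution

ess_narrow = effective_sample_size(exponential_weights(narrow))
ess_wide = effective_sample_size(exponential_weights(wide))
print(f"effective frames (narrow boost): {ess_narrow:.0f}")
print(f"effective frames (wide boost):   {ess_wide:.0f}")
```

In this synthetic case the wide boost distribution concentrates most of the statistical weight on very few frames, so the exponential average becomes noisy. The cumulant expansion avoids this but is only accurate when the boost potential is close to Gaussian, which is one reason the choice of boost parameters and reweighting scheme matters.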
In terms of their capability to reproduce experimental NOE values, all MD methods, including cMD, produced comparable results for compound 22. This could be for two reasons. First, while the sampling looked dissimilar in the dihedral PCA representation, the structures observed in the cMD or non-converged GaMD/aMD simulations may be close enough to the experimental structures that no significant difference in performance was observed. Alternatively, the NOE metric may not be sensitive enough to pick up subtle quality differences between the methods. Neither the type of solvent used nor the convergence of the simulations made a significant difference for the studied compounds. This could be because the DMSO parameters were not good enough (or the simulations were not converged) to reproduce experiments better than with water or other solvents. Alternatively, the experimental NOEs might not have been sensitive enough to pick up minor changes between different solvents. Explicit solvent interactions in the MD simulations nevertheless matter: different solvents produced different structures (different side-chain arrangements) and different relative populations, as confirmed by visual inspection of the trajectories and the dPCA PES. For reproducing the experimental NOEs, however, this seems to be of only minor relevance. This conclusion depends on the compound studied. Compound 24 (not shown in the report) is present in solution as an equilibrium of cis/trans isomers. In our cMD simulations, only the cis isomer was sampled; the trans structure was not observed at all. (G)aMD, however, was able to sample both isomers and produced good agreement with the experimental NOE values.
A general issue of the NOE metric is its bias towards short distances. In the end, the simulations are reduced to whether they sample enough short distances to fulfil the experimental NOEs. Only a fraction of the conformers found in the MD simulations were relevant for certain NOEs, but this is also true for experiments, where the NOEs can be biased by a small minority of conformations.[49] NOEs by themselves are not enough to limit how many different conformers are present in solution; we can only show whether the observed set of NOE values can be described satisfactorily by a single structure or by multiple structures, e.g. via a NAMFIS analysis.
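The short-distance bias can be made concrete with a small numerical example. The sketch below assumes the common r^-6 ensemble averaging of NOE-derived distances; the exact averaging used for a given dataset may differ (for example r^-3), and the two-state ensemble and the function name are hypothetical.

```python
import numpy as np

def noe_effective_distance(distances, weights=None):
    """Ensemble-averaged NOE distance <r^-6>^(-1/6) for one proton pair."""
    r = np.asarray(distances, dtype=float)
    if weights is None:
        weights = np.full(r.shape, 1.0 / r.size)  # equal weight per conformer
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return float(np.sum(w * r**-6) ** (-1.0 / 6.0))

# Hypothetical two-state ensemble for a single proton pair:
# 90 % of the population at 6.0 Angstrom, 10 % at 2.5 Angstrom.
distances = np.array([6.0, 2.5])
populations = np.array([0.9, 0.1])

r_eff = noe_effective_distance(distances, populations)
r_mean = float(np.sum(populations * distances))  # plain population-weighted mean

print(f"r^-6 averaged distance: {r_eff:.2f} A")
print(f"arithmetic mean:        {r_mean:.2f} A")
```

Under these assumptions the r^-6 average lies well below the arithmetic mean, because the minor short-distance state dominates the sum. A 10 % population can therefore be sufficient to satisfy an NOE restraint that the major conformer violates.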
The preceding conclusions depend on the dimensionality reductions performed, mainly dPCA. dPCA only incorporates dihedral angles. This may include implicit side-chain information, but side chains are not explicitly considered in this work. A minimum in the dPCA PES might therefore be composed of multiple distinct structures with similar backbones. We need to further assess how similar these structures are by considering other dimensionality reductions, combinations of the dimensionality reductions already performed, or other metrics such as shape-based metrics.
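For reference, the core of a dihedral PCA can be sketched in a few lines: each dihedral angle is mapped to its cosine and sine to remove the periodicity, and a standard PCA is applied to the transformed features. The function name and the synthetic angles below are assumptions for illustration, not the workflow code of this study.

```python
import numpy as np

def dihedral_pca(angles, n_components=2):
    """PCA on the cos/sin transform of dihedral angles given in radians.

    angles: array of shape (n_frames, n_dihedrals)
    Returns the projections (n_frames, n_components) and the fraction of
    variance explained by each returned component.
    """
    phi = np.asarray(angles, dtype=float)
    features = np.concatenate([np.cos(phi), np.sin(phi)], axis=1)
    centered = features - features.mean(axis=0)
    # The right singular vectors of the centred data are the principal axes.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    variance = s**2 / (phi.shape[0] - 1)
    projection = centered @ vt[:n_components].T
    return projection, variance[:n_components] / variance.sum()

# Synthetic example: two states for four dihedrals, one close to the
# +/-180 degree periodic boundary.
rng = np.random.default_rng(1)
state_a = rng.normal(np.pi - 0.1, 0.1, size=(500, 4))
state_b = rng.normal(0.0, 0.1, size=(500, 4))
angles = np.vstack([state_a, state_b])
angles = (angles + np.pi) % (2 * np.pi) - np.pi  # wrap to [-pi, pi)

proj, ratio = dihedral_pca(angles)
print("explained variance ratio:", np.round(ratio, 3))
```

If only backbone dihedrals are passed to such a function, two frames with identical backbones but different side-chain arrangements project onto the same point, which is why a single minimum in the resulting PES can hide structurally distinct conformers.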