Protein mutation¶

In this tutorial we will show how pmx can be used to predict changes in protein stability, similarly to what was done in Gapsys et al. All input files needed for this tutorial can be downloaded here: input files.

In order to calculate how a mutation affects the folding free energy of a protein (\(\Delta \Delta G_{Folding}^{Mutation}\)), we need to calculate the free energy difference of alchemically mutating both the folded and unfolded state of the protein (\(\Delta G_{Folded}^{Mutation} - \Delta G_{Unfolded}^{Mutation}\)). Effectively, we use the following thermodynamic cycle to recover \(\Delta \Delta G\) via the non-physical transformations shown as the horizontal legs of the cycle.

Here, we will go through the steps for the calculation of \(\Delta G_{Folded}^{Mutation}\), using the protein.pdb file provided. \(\Delta G_{Unfolded}^{Mutation}\) can be obtained by following exactly the same steps, but using the file tripeptide.pdb as the starting structure. These input pdb files have already been processed so to contain hydrogen atoms and amber atom names.

Preparation of hybrid files¶

As an example system, here we use the Trp-cage miniprotein (file protein.pdb). Firstly, we need to create a structure file containing the correct hybrid residue needed for the mutation of interest. This can be done using the pmx mutate script:

$ pmx mutate -f protein.pdb -o mutant.pdb

The script will ask what force field you would like to use; to get the same results as we report here, choose the amber99sb-star-ildn-mut one. Then, choose to mutate the Trp6 (W6) residue into Phe (F).

Next, to create the corresponding topology to this hybrid structure, you need to use the gmx tool pdb2gmx using the same force field you previously chose:

$ gmx pdb2gmx -f mutant.pdb -o conf.pdb -p topol.top -ff amber99sb-star-ildn-mut -water tip3p

The output topology topol.top contains the hybrid residue W6F, like the structure file conf.pdb. However, the B-state columns do not yet contain the right information for Gromacs to run the calculations. To fill the B-states with the necessary data, you can use the pmx gentop script:

$ pmx gentop -p topol.top -o newtop.top

Now with the conf.pdb and newtop.top files you are ready to setup the system as you would do for a typical Gromacs simulation.

The steps just described can also be performed using the pmx webserver, which you can find in the navigation bar on the left. Alternatively, you can automate the above steps in a script using the pmx API as shown here and also discussed in the examples:

#! /usr/bin/env python

from pmx import *

# load, mutate, and save the PDB file
m = Model('peptide.pdb', rename_atoms=True)
m2 = mutate(m=m, mut_resid=6, mut_resname='F', ff='amber99sb-star-ildn-mut')
m2.write('mutant.pdb')

# run pdb2gmx
gmx.pdb2gmx(f='mutant.pdb', o='conf.pdb', p='topol.top', ff='amber99sb-star-ildn-mut',
            water='tip3p')

# load topology, fill B states, then write a new topology file
topol = Topology('topol.top')
pmxtop, _ = gen_hybrid_top(topol)
pmxtop.write('newtop.top')

System setup¶

Now we are ready to set up the system to be simulated in Gromacs. The following are the standard steps used in any Gromacs simulation; you can find a more detailed explaination of these steps in this tutorial.

$ gmx editconf -f conf.pdb -o box.pdb -bt dodecahedron -d 1.0
$ gmx solvate -cp box -cs spc216 -p newtop -o water.pdb
$ gmx grompp -f genion.mdp -c water.pdb -p newtop.top -o genion.tpr
$ echo "SOL" | gmx genion -s genion.tpr -p newtop.top -neutral -conc 0.15 -o ions.pdb

Now we have the file ions.pdb containing our protein, solvated in a dodecahedral box and with Na and Cl ions neutralising the solution at a concentration of 150 mM. These setup steps can also be automated using the pmx.gmx module:

from pmx import gmx

gmx.editconf(f='conf.pdb', o='box.pdb', bt='dodecahedron', d=1.0)
gmx.solvate(cp='box.pdb', cs='spc216', p='newtop.top', o='water.pdb')
gmx.grompp(f='em.mdp', c='water.pdb', p='newtop.top', o='genion.tpr')
gmx.genion(s='genion.tpr', p='newtop.top', neutral=True, conc=0.15, o='ions.pdb')

Equilibrium MD¶

At this stage, we are ready to start simulating the system. What we need to do is to collect equilibrium configurations of the system in both its two end states; i.e. state A in which residue 6 is a Trp, and state B in which it is Phe. Thus, we will run energy minimisation, a short equilibration, and the production equilibrium MD for both the end states. To keep things tidy, we use the following folder structure, which should already be prepared in the archive with the input files you downloaded at the top of this page.

pwd/
|
 -- forward/
|   |
|    -- equil_md/
|   |
|    -- nonequil_md/
|
 -- reverse/
    |
     -- equil_md/
    |
     -- nonequil_md/

where pwd is the current working directory where you have setup the system, forward is the folder that will contain the equilibrium and non-equilibrium data concerning the forward transformation (\(T6 \to F6\)), and reverse the folder with the data concerning the reverse transformation (\(F6 \to T6\)).

These are not very intensive simulations, so you might be able to run them on your workstation if you have a GPU. However, they might still take some time (e.g. 2-3 hours), so you might want to go get a coffee or submit them to a cluster and work on something else in the meantime.

We start with running the equilibrium simulations needed for the forward transformation. First we go into the relevant folder:

$ cd forward/equil_md

Then, we run the minimisation:

$ gmx grompp -f f_enmin.mdp -c ../../ions.pdb -p ../../newtop.top -o enmin.tpr
$ gmx mdrun -s enmin.tpr -deffnm enmin -v

The equilibration with position restraints:

$ gmx grompp -f f_npt.mdp -c enmin.gro -p ../../newtop.top -o npt.tpr -maxwarn 1
$ gmx mdrun -s npt.tpr -deffnm npt -v

And finally the actual equilibrium MD run:

$ gmx grompp -f f_equil.mdp -c npt.gro -p ../../newtop.top -o equil.tpr -maxwarn 1
$ gmx mdrun -s equil.tpr -deffnm equil -v

Now we have the equilibrium sampling needed for the forward transformation. For the reverse one, we do exactly the same from the reverse/equil_md folder and using the reverse mdp files (r_*.mdp) you find in it. Note that the only difference in the forward and reverse mdp files is whether we choose to be in state A (i.e. 0, and in this case W6) or state B (i.e. 1, F6):

# this is in the forward (f_*) mdp files
free-energy       = yes  ; use free energy code
init-lambda       = 0    ; start from state A
delta-lambda      = 0    ; stay in state A

# and this is in the reverse (r_*) mdp files
free-energy       = yes  ; use free energy code
init-lambda       = 1    ; start from state B
delta-lambda      = 0    ; stay in state B

As before, these steps can be automated using the pmx.gmx module. Nonetheless, in practice, if you need to run many or more demanding calculations (e.g. longer/more simulations of larger systems) you will probably do this on some computing cluster, so that you will find more convenient to include the above steps in your submit script.

Non-equilibrium MD¶

Once we have collected equilibrium samples of the systems of interest, we can move on with the non-equilibrium part of the calculations, which will provide the work values needed for the free energy estimate.

For this, we move into the forward/nonequil_md folder:

$ cd ../../forward/nonequil_md/

Then, we extract 50 snapshots from the 5 ns equilibrium simulation (i.e. one every 100 ps, starting with the one at 100 ps) using gmx trjconv:

$ echo "System" | gmx trjconv -f ../equil_md/equil.trr -s ../equil_md/equil.tpr -sep -b 100 -o frame_.gro

The .trr file is used so that the starting snapshots contain the velocities from the equilibrium sampling. At this point you should have 50 gro files in the folder, indexed from 0 to 49. Now we create a folder for each starting frame, so to have a different folder (index from 1 to 50) for each non-equilibrium trajectory:

$ for i in $( seq 0 49 ); do n=$((i+1)); mkdir frame$n; mv frame_$i.gro frame$n/frame.gro; done

Now you are ready to run 50 non-equilibrium forward transitions starting from these frames. The length of each non-equilibrium trajectory we use here is 80 ps; however, depending on the specifics of the system, you might need to use longer trajectories or you might get away with shorter ones (see the analysis tutorial for more details on this). The important/non-standard bit in the f_nonequil.mdp file is the following:

free-energy       = yes
init-lambda       = 0        ; start from state A
delta-lambda      = 2.5e-05  ; complete transition in this number of steps
sc-coul           = yes   ; use soft-core also for coulombic interactions
sc-alpha          = 0.3   ; soft-core
sc-sigma          = 0.25  ;
sc-power          = 1     ;
nstdhdl           = 1     ; write to dhdl at each step

which effectively says to use the free energy code, to go from state \(0\) to \(1\) in \(40,000\) steps of size \(2.5 \times 10^{-5}\), and to use a soft-core potential for both Lennard-Jones and Coulombic interactions.

Depending on your computing setup/situation, you can run these simulations in parallel or one after the other one. Here, we run them locally and in series in this way:

$ for i in $( seq 1 50 ); do
$   cd frame$i
$   gmx grompp -f ../f_nonequil.mdp -c frame.gro -p ../../../newtop.top -o nonequil.tpr -maxwarn 1
$   gmx mdrun -s nonequil.tpr -deffnm nonequil -dhdl dgdl.xvg -v
$   cd ../
$ done

The files you really need for the free energy estimate are the dgdl.xvg files. You need these also for the reverse transformation, which you can obtain by going through the same steps described here while using the files in the reverse folder. The r_nonequil.mdp file you will use is the same as the f_nonequil.mdp file, apart from the two following lines:

init-lambda       = 1        ; start from state B
delta-lambda      = -2.5e-05 ; complete transition in this number of steps

Since in the reverse case we need to start from state \(1\) (i.e. B, F6) and go to state \(0\) (i.e. A, W6).

Free energy estimation¶

Great! Now we have all the data needed for estimating \(\Delta G_{Folded}^{Mutation}\), which is stored in the forward/nonequil_md/frame*/dgdl.xvg and reverse/nonequil_md/frame*/dgdl.xvg files.

The simplest way to estimate the free energy difference is to use the pmx analyse tool. For instance, here we use it to estimate \(\Delta G_{Folded}^{Mutation}\) with BAR:

$ pmx analyse -fA forward/nonequil_md/frame*/dgdl.xvg -fB reverse/nonequil_md/frame*/dgdl.xvg -t 298.15 -m bar

For a more detailed discussion about free energy estimation and data analysis you can have a look at the analysis tutorial. In brief, for this calculation, this is the distribution of work values and the BAR \(\Delta G\) estimate we obtained:

Often, the easiest way to get reliable error estimates is to just repeat the whole calculation multiple times to obtain a standard error. For this example, we repeated the above steps 5 times, and obtained the following mean and standard error:

\[\Delta G_{Folded}^{Mutation} = 9.75 \pm 1.48\ kJ/mol\]

Final results¶

At this point, if you still remember, the number we were interested in was \(\Delta \Delta G_{Folding}^{Mutation}\), that is, the change in the folding free energy upon introducing the W6F mutation. This value is recovered from \(\Delta G_{Folded}^{Mutation}\), which we just calculated, and \(\Delta G_{Unfolded}^{Mutation}\), which you can obtain by running the calculations starting from tripeptide.pdb rather than protein.pdb. Using 5 repeats, we obtained the following mean and standard error for the latter free energy difference:

\[\Delta G_{Unfolded}^{Mutation} = -4.82 \pm 0.19\ kJ/mol\]

Thus, our final result of interest is the following:

\[\begin{split}\Delta \Delta G_{Folding}^{Mutation} &= \Delta G_{Folded}^{Mutation} - \Delta G_{Unfolded}^{Mutation} = \\ &= 9.75 \pm 1.48\ - (-4.82 \pm 0.19) = \\ &= 14.57 \pm 1.49\ kJ/mol\end{split}\]

This suggests that the W6F mutation has a destabilising effect on the fold of Trp-cage of about \(14.6 \pm 1.5\ kJ/mol\), which is not far from the experimentally determined value of \(12.5 \pm 0.6\ kJ/mol\).