Protein mutation ================ In this tutorial we will show how **pmx** can be used to predict changes in protein stability, similarly to what was done in `Gapsys et al `_. All input files needed for this tutorial can be downloaded here: :download:`input files `. In order to calculate how a mutation affects the folding free energy of a protein (:math:`\Delta \Delta G_{Folding}^{Mutation}`), we need to calculate the free energy difference of *alchemically* mutating both the folded and unfolded state of the protein (:math:`\Delta G_{Folded}^{Mutation} - \Delta G_{Unfolded}^{Mutation}`). Effectively, we use the following thermodynamic cycle to recover :math:`\Delta \Delta G` via the non-physical transformations shown as the horizontal legs of the cycle. .. image:: cycle.png :width: 400px :align: center Here, we will go through the steps for the calculation of :math:`\Delta G_{Folded}^{Mutation}`, using the ``protein.pdb`` file provided. :math:`\Delta G_{Unfolded}^{Mutation}` can be obtained by following exactly the same steps, but using the file ``tripeptide.pdb`` as the starting structure. These input ``pdb`` files have already been processed so to contain hydrogen atoms and *amber* atom names. Preparation of hybrid files --------------------------- As an example system, here we use the *Trp-cage* miniprotein (file ``protein.pdb``). Firstly, we need to create a structure file containing the correct hybrid residue needed for the mutation of interest. This can be done using the ``pmx mutate`` script:: $ pmx mutate -f protein.pdb -o mutant.pdb The script will ask what force field you would like to use; to get the same results as we report here, choose the *amber99sb-star-ildn-mut* one. Then, choose to mutate the Trp6 (W6) residue into Phe (F). Next, to create the corresponding topology to this hybrid structure, you need to use the ``gmx`` tool ``pdb2gmx`` using the same force field you previously chose:: $ gmx pdb2gmx -f mutant.pdb -o conf.pdb -p topol.top -ff amber99sb-star-ildn-mut -water tip3p The output topology ``topol.top`` contains the hybrid residue W6F, like the structure file ``conf.pdb``. However, the B-state columns do not yet contain the right information for Gromacs to run the calculations. To fill the B-states with the necessary data, you can use the ``pmx gentop`` script:: $ pmx gentop -p topol.top -o newtop.top Now with the ``conf.pdb`` and ``newtop.top`` files you are ready to setup the system as you would do for a typical Gromacs simulation. The steps just described can also be performed using the **pmx webserver**, which you can find in the navigation bar on the left. Alternatively, you can automate the above steps in a script using the **pmx** API as shown here and also discussed in :ref:`the examples `:: #! /usr/bin/env python from pmx import * # load, mutate, and save the PDB file m = Model('peptide.pdb', rename_atoms=True) m2 = mutate(m=m, mut_resid=6, mut_resname='F', ff='amber99sb-star-ildn-mut') m2.write('mutant.pdb') # run pdb2gmx gmx.pdb2gmx(f='mutant.pdb', o='conf.pdb', p='topol.top', ff='amber99sb-star-ildn-mut', water='tip3p') # load topology, fill B states, then write a new topology file topol = Topology('topol.top') pmxtop, _ = gen_hybrid_top(topol) pmxtop.write('newtop.top') System setup ------------ Now we are ready to set up the system to be simulated in Gromacs. The following are the standard steps used in any Gromacs simulation; you can find a more detailed explaination of these steps in `this tutorial `_. :: $ gmx editconf -f conf.pdb -o box.pdb -bt dodecahedron -d 1.0 $ gmx solvate -cp box -cs spc216 -p newtop -o water.pdb $ gmx grompp -f genion.mdp -c water.pdb -p newtop.top -o genion.tpr $ echo "SOL" | gmx genion -s genion.tpr -p newtop.top -neutral -conc 0.15 -o ions.pdb Now we have the file ``ions.pdb`` containing our protein, solvated in a dodecahedral box and with Na and Cl ions neutralising the solution at a concentration of 150 mM. These setup steps can also be automated using the :mod:`pmx.gmx` module: :: from pmx import gmx gmx.editconf(f='conf.pdb', o='box.pdb', bt='dodecahedron', d=1.0) gmx.solvate(cp='box.pdb', cs='spc216', p='newtop.top', o='water.pdb') gmx.grompp(f='em.mdp', c='water.pdb', p='newtop.top', o='genion.tpr') gmx.genion(s='genion.tpr', p='newtop.top', neutral=True, conc=0.15, o='ions.pdb') Equilibrium MD -------------- At this stage, we are ready to start simulating the system. What we need to do is to collect equilibrium configurations of the system in both its two end states; i.e. state A in which residue 6 is a Trp, and state B in which it is Phe. Thus, we will run energy minimisation, a short equilibration, and the production equilibrium MD for both the end states. To keep things tidy, we use the following folder structure, which should already be prepared in the archive with the input files you downloaded at the top of this page. :: pwd/ | -- forward/ | | | -- equil_md/ | | | -- nonequil_md/ | -- reverse/ | -- equil_md/ | -- nonequil_md/ where ``pwd`` is the current working directory where you have setup the system, ``forward`` is the folder that will contain the equilibrium and non-equilibrium data concerning the forward transformation (:math:`T6 \to F6`), and ``reverse`` the folder with the data concerning the reverse transformation (:math:`F6 \to T6`). These are not very intensive simulations, so you might be able to run them on your workstation if you have a GPU. However, they might still take some time (e.g. 2-3 hours), so you might want to go get a coffee or submit them to a cluster and work on something else in the meantime. We start with running the equilibrium simulations needed for the forward transformation. First we go into the relevant folder:: $ cd forward/equil_md Then, we run the minimisation:: $ gmx grompp -f f_enmin.mdp -c ../../ions.pdb -p ../../newtop.top -o enmin.tpr $ gmx mdrun -s enmin.tpr -deffnm enmin -v The equilibration with position restraints:: $ gmx grompp -f f_npt.mdp -c enmin.gro -p ../../newtop.top -o npt.tpr -maxwarn 1 $ gmx mdrun -s npt.tpr -deffnm npt -v And finally the actual equilibrium MD run:: $ gmx grompp -f f_equil.mdp -c npt.gro -p ../../newtop.top -o equil.tpr -maxwarn 1 $ gmx mdrun -s equil.tpr -deffnm equil -v Now we have the equilibrium sampling needed for the *forward* transformation. For the reverse one, we do exactly the same from the ``reverse/equil_md`` folder and using the *reverse* mdp files (``r_*.mdp``) you find in it. Note that the only difference in the *forward* and *reverse* mdp files is whether we choose to be in state *A* (i.e. *0*, and in this case W6) or state *B* (i.e. *1*, F6):: # this is in the forward (f_*) mdp files free-energy = yes ; use free energy code init-lambda = 0 ; start from state A delta-lambda = 0 ; stay in state A # and this is in the reverse (r_*) mdp files free-energy = yes ; use free energy code init-lambda = 1 ; start from state B delta-lambda = 0 ; stay in state B As before, these steps can be automated using the :mod:`pmx.gmx` module. Nonetheless, in practice, if you need to run many or more demanding calculations (e.g. longer/more simulations of larger systems) you will probably do this on some computing cluster, so that you will find more convenient to include the above steps in your submit script. Non-equilibrium MD ------------------ Once we have collected equilibrium samples of the systems of interest, we can move on with the non-equilibrium part of the calculations, which will provide the work values needed for the free energy estimate. For this, we move into the ``forward/nonequil_md`` folder:: $ cd ../../forward/nonequil_md/ Then, we extract 50 snapshots from the 5 ns equilibrium simulation (i.e. one every 100 ps, starting with the one at 100 ps) using ``gmx trjconv``:: $ echo "System" | gmx trjconv -f ../equil_md/equil.trr -s ../equil_md/equil.tpr -sep -b 100 -o frame_.gro The ``.trr`` file is used so that the starting snapshots contain the velocities from the equilibrium sampling. At this point you should have 50 ``gro`` files in the folder, indexed from 0 to 49. Now we create a folder for each starting frame, so to have a different folder (index from 1 to 50) for each non-equilibrium trajectory:: $ for i in $( seq 0 49 ); do n=$((i+1)); mkdir frame$n; mv frame_$i.gro frame$n/frame.gro; done Now you are ready to run 50 non-equilibrium forward transitions starting from these frames. The length of each non-equilibrium trajectory we use here is 80 ps; however, depending on the specifics of the system, you might need to use longer trajectories or you might get away with shorter ones (see the :ref:`analysis tutorial ` for more details on this). The important/non-standard bit in the ``f_nonequil.mdp`` file is the following:: free-energy = yes init-lambda = 0 ; start from state A delta-lambda = 2.5e-05 ; complete transition in this number of steps sc-coul = yes ; use soft-core also for coulombic interactions sc-alpha = 0.3 ; soft-core sc-sigma = 0.25 ; sc-power = 1 ; nstdhdl = 1 ; write to dhdl at each step which effectively says to use the free energy code, to go from state :math:`0` to :math:`1` in :math:`40,000` steps of size :math:`2.5 \times 10^{-5}`, and to use a soft-core potential for both Lennard-Jones and Coulombic interactions. Depending on your computing setup/situation, you can run these simulations in parallel or one after the other one. Here, we run them locally and in series in this way:: $ for i in $( seq 1 50 ); do $ cd frame$i $ gmx grompp -f ../f_nonequil.mdp -c frame.gro -p ../../../newtop.top -o nonequil.tpr -maxwarn 1 $ gmx mdrun -s nonequil.tpr -deffnm nonequil -dhdl dgdl.xvg -v $ cd ../ $ done The files you really need for the free energy estimate are the ``dgdl.xvg`` files. You need these also for the reverse transformation, which you can obtain by going through the same steps described here while using the files in the ``reverse`` folder. The ``r_nonequil.mdp`` file you will use is the same as the ``f_nonequil.mdp`` file, apart from the two following lines:: init-lambda = 1 ; start from state B delta-lambda = -2.5e-05 ; complete transition in this number of steps Since in the reverse case we need to start from state :math:`1` (i.e. B, F6) and go to state :math:`0` (i.e. A, W6). Free energy estimation ---------------------- Great! Now we have all the data needed for estimating :math:`\Delta G_{Folded}^{Mutation}`, which is stored in the ``forward/nonequil_md/frame*/dgdl.xvg`` and ``reverse/nonequil_md/frame*/dgdl.xvg`` files. The simplest way to estimate the free energy difference is to use the ``pmx analyse`` tool. For instance, here we use it to estimate :math:`\Delta G_{Folded}^{Mutation}` with BAR:: $ pmx analyse -fA forward/nonequil_md/frame*/dgdl.xvg -fB reverse/nonequil_md/frame*/dgdl.xvg -t 298.15 -m bar For a more detailed discussion about free energy estimation and data analysis you can have a look at the :ref:`analysis tutorial `. In brief, for this calculation, this is the distribution of work values and the BAR :math:`\Delta G` estimate we obtained: .. image:: protein_mut_wdist.png Often, the easiest way to get reliable error estimates is to just repeat the whole calculation multiple times to obtain a standard error. For this example, we repeated the above steps 5 times, and obtained the following mean and standard error: .. math:: \Delta G_{Folded}^{Mutation} = 9.75 \pm 1.48\ kJ/mol Final results ------------- At this point, if you still remember, the number we were interested in was :math:`\Delta \Delta G_{Folding}^{Mutation}`, that is, the change in the folding free energy upon introducing the W6F mutation. This value is recovered from :math:`\Delta G_{Folded}^{Mutation}`, which we just calculated, and :math:`\Delta G_{Unfolded}^{Mutation}`, which you can obtain by running the calculations starting from ``tripeptide.pdb`` rather than ``protein.pdb``. Using 5 repeats, we obtained the following mean and standard error for the latter free energy difference: .. math:: \Delta G_{Unfolded}^{Mutation} = -4.82 \pm 0.19\ kJ/mol Thus, our final result of interest is the following: .. math:: \Delta \Delta G_{Folding}^{Mutation} &= \Delta G_{Folded}^{Mutation} - \Delta G_{Unfolded}^{Mutation} = \\ &= 9.75 \pm 1.48\ - (-4.82 \pm 0.19) = \\ &= 14.57 \pm 1.49\ kJ/mol This suggests that the W6F mutation has a destabilising effect on the fold of Trp-cage of about :math:`14.6 \pm 1.5\ kJ/mol`, which is not far from the experimentally determined value of :math:`12.5 \pm 0.6\ kJ/mol`.