Electrostatics in proteins and proteinligand complexes


Download Electrostatics in proteins and proteinligand complexes


Preview text

Review
For reprint orders, please contact [email protected]
Electrostatics in proteins and protein–ligand complexes

Accurate computational methods for predicting electrostatic energies are of major importance for our understanding of protein energetics in general for computer-aided drug design as well as for the design of novel biocatalysts and protein therapeutics. Electrostatic energies are of particular importance in such applications as virtual screening, drug design and protein–protein docking due to the high charge density of protein ligands and small-molecule drugs, and the frequent protonation state changes observed when drugs bind to their protein targets. Therefore, the development of a reliable and fast algorithm for the evaluation of electrostatic free energies, as an important contributor to the overall protein energy function, has been the focus for many scientists over the past three decades. In this review we describe the current state-of-the-art in modeling electrostatic effects in proteins and protein–ligand complexes. We focus mainly on the merits and drawbacks of the continuum methodology, and speculate on future directions in refining algorithms for calculating electrostatic energies in proteins using experimental data.

Electrostatic interactions are among the most important factors to be considered when analyzing the function of biological molecules. Since the pioneering work of Linderstrøm-Lang [1], significant progress has been made in the qualitative understanding and quantitative evaluation of electrostatic interactions. It is now generally recognized that one must analyze the electrostatic forces in a biomolecule to properly understand its function. Structure-based calculations of the electrostatic potential in and around a biomolecule, therefore, play a large role in virtually any protein-related research effort, as witnessed by the omnipresence of graphical representations of electrostatic potential surface maps in publications and oral presentations.
A variety of computational methods and algorithms for studying proteins and their interactions with other molecules has been constructed. Methods such as molecular dynamics (MD) simulations, Brownian dynamics simulations, protein pKa calculations, protein-design algorithms and protein–drug and protein–protein docking algorithms are widely used in modern biological research. All these algorithms use 3D structures of protein molecules to predict and analyze protein characteristics, such as catalytic activity, folding pathway, stability, solubility and ligand/drug binding specificity. Electrostatic energies play a large role in determining all of the aforementioned characteristics, and the full potential of methods for predicting physical/biochemical characteristics will, therefore, only be realized once we are able

to accurately calculate electrostatic energies and forces in and around proteins. The calculation of electrostatic interactions in biomolecules arguably presents one of the largest obstacles in improving the accuracy and usefulness of structure-based energy-calculation algorithms.
Unfortunately, electrostatic phenomena in biomolecular systems are very complex and much research is needed in this field to understand the factors that are important for determining their properties. The comparatively long-range nature of electrostatic forces presents a significant obstacle to progress, since one must consider interactions between a large number of solvent and solute atoms in order to solve the problem correctly. The many charged groups in proteins and the proteininduced variation in amino acid pKa values further complicate the calculation, since the protein charge distribution, which itself is dependent on the electrostatics in the protein, has to be calculated. The hetero­geneous dielectric properties of the protein interior complicates the evaluation of electrostatic interactions even further by requiring energy-calculation methods to model protein and solvent reorganization in response to the variations in electrostatic field. Calculating the electrostatic field in a protein molecule correctly is thus akin to hitting a moving target using a shotgun with a bent barrel while being in the middle of an earthquake.
Despite the somewhat bleak analogy and the aforementioned difficulties, current algorithms have made significant headway in accurately predicting electrostatic effects in proteins. In this

Predrag Kukić1,2 & Jens Erik Nielsen1† †Author for correspondence 1School of Biomolecular and Biomedical Science, Centre for Synthesis and Chemical Biology, UCD Conway Institute, University College Dublin, Belfield, Dublin 4, Ireland 2School of Computer Science and Informatics, Complex and Adaptive Systems Laboratory, University College, Dublin, Belfield, Dublin 4, Ireland Tel.: +35 317 166 724 Fax: +35 317 166 898 E-mail: [email protected]

10.4155/FMC.10.6 © 2010 Future Science Ltd

Future Med. Chem. (2010) 2(4), 647–666

ISSN 1756-8919

647

Review | Kukić & Nielsen

Key Terms
Continuum/macroscopic model: Model that mimics dielectric properties of a solvent using the homogenous dielectric medium of a high dielectric constant
Microscopic model: Model that explicitly represents all atoms in the protein–solvent system
Poisson–Boltzmann equation: Partial differential equation, which, for a known distribution of charges, the dielectric constant and ionic strength gives the electrostatic potential in and around the protein

review we focus on the methods that have proven effective and accurate in modeling electros­tatic effects in proteins and protein–ligand complexes. We begin our review by introducing the concepts of electrostatic interactions in proteins and the basic laws and models necessary for electrostatic energy calculations, such as Coulomb’s law, the Poisson equation, the Boltzmann distribution of ions in solution and the Born formula. Later, we provide an overview of the specific computational approaches used for modeling electrostatic interactions in proteins and protein–ligand complexes. We describe the widely used continuum models; the Poisson–Boltzmann equation (PBE) solvers and the models based on the Generalized Born (GB) framework, as well as highlight recent progress in improving the accuracy of these methods. We discuss the role of electrostatic energies in the association of proteins and ligands. We concentrate on the electrostatic contribution to protein– ligand binding affinities, the pH-dependence of the free energy of the binding process, and on the role of electrostatics in determining the association rates for diffusion-limited enzymes. Finally, we comment on current problems and discuss possible directions for future advances in the modeling of protein electrostatics.
This review gives an overview of basic methods for predicting protein electrostatic forces and their application to protein–ligand binding problems. Many other exciting areas of research also present formidable challenges to current models for predicting electrostatic energies, with computational protein design, the engineering of the pH-dependence of enzyme catalysis, prediction of proteinfolding pathways and prediction of mutationinduced changes in protein stability presenting major challenges to computational biologists. It is outside the scope of this review to address all of these areas, and the interested reader is referred to a selection of recent reviews [2–11].

„„Coulomb’s law The fundamental problem in classical electrostatic theory is to find the electrostatic potential at every point in space for a given distribution of charges. In the simplest scenario, when point charges are placed in a homogenous medium, such as a vacuum, the electrostatic potential φ(r) at a particular position in the space can be calculated with Coulomb’s law:

/ {^ r h = 1

qi

4rf0 f i ri

Equation 1

where φ represents the electrostatic potential, qi
is the charge of that point charge and ri is the distance from the point charge i. e represents the

dielectric constant of the medium, and a value of

e0 ~ 8.85 × 10-12 F/m quantifies the propagation of the electric field in vacuum.

Another form of Coulomb’s law frequently used

in protein applications evaluates the total electro-

static interaction energy of the system consisting

of N point charges immersed in the homogenous

medium. The equation can be written as:

DGel = 332kcfarl/mol /iN= 1 /Nj = 1, qriiqj j
i Equation 2
where DGel designates the electrostatic interaction energy (kcal/mol at room temperature) relative to the energy between charges positioned at infinite separation. qi and qj are point charges given in units of electron charge (e ~ 1.6 × 10-19 C) and divided by the distance rij ( Å). er represents the relative dielectric constant of the system and is expressed by the value relative to that of a vacuum. This form of Coulomb’s law has been frequently used in the microscopic modeling of electrostatic interactions in proteins, where the electrostatic free energies between all charges in the protein–solvent system are evaluated using Equation 2 and low dielectric constant [4].

Electrostatic interactions in proteins Before embarking on a detailed account of protein electrostatic modeling, we describe the basic equations that allow a qualitative understanding of the factors that influence the strength of electrostatic interactions in and around proteins. We begin our discussion with Coulomb’s law, the Poisson equation, the Boltzmann distribution of ions in the solution and the Born model. We will proceed to explain the significance of permanent dipoles, mobile ions and electronic polarization in modulating the electrostatic field in the protein–solvent system.

„„Poisson equation Evaluation of the electrostatic energy in systems containing a vast amount of point charges, such as a protein–solvent system, can be a very complex and time-consuming process. Therefore, instead of explicitly representing all point charges in the system and their abilities to reorient, these effects are approximated in continuum models using the dielectric constant. The electrostatic potential in continuum models can be calculated using the Poisson equation:
4: f^ r h 4 {^ r h + 4rt^ r h = 0
Equation 3

648

Future Med. Chem. (2010) 2(4)

future science group

Electrostatics in proteins & protein–ligand complexes | Review

where the dielectric constant, e(r), is a function of the position r, and r(r) similarly represents the charge density in the medium. The charge density is simply the distribution of charges throughout the system, whereas e(r) reflects all effects that are not explicitly modeled, such as induced dipoles and/or the relaxation of charges. Thus, in continuum models the protein is usually represented by a dielectric constant in the range of e ~ 2–20 and the solvent is modeled as a continuum of high dielectric (e ~ 80).
The value of the dielectric constant in continuum models that accounts for the protein interior will be called ‘the protein dielectric constant’ ep in the following discussion. It is very important to emphasize here that the term ‘protein dielectric constant’ in continuum modeling does not take a constant value, since it depends on the way the constant is used in the calculations and which continuum model is used. This protein dielectric constant in continuum models has little to do with the fluctuations of dipole moments, as defined in microscopic modeling, but rather characterizes the parameters and effects that are not described explicitly in the specific continuum model in question. For a detailed account of the nature of the protein dielectric constant see the later section ‘Modulators of electrostatic interactions in proteins’ and discussions elsewhere [12].

„„Boltzmann distribution Construction of the charge density r(r) function in the Poisson equation is a straightforward process when positions of all charges are known. Permanent dipoles representing the backbone C=O bonds in proteins are not likely to reorient and the dipoles on side chains can only reorient within accepted geometries if the protein does not undergo large conformational changes. By contrast, ions in solution (Na+, Cl-, K+ and Mg2+) constantly change their position according to the local electrostatic potential and their interactions with surrounding water molecules. Thus, describing mobile ions in a solution explicitly in the Poisson equation is a very complicated process. Instead, their positions around a protein are described using a probability distribution function, based on the Boltzmann distribution:

q{(r)

n (r)

=

-
Ne

kT

Equation 4

where n(r) and φ(r) represent the concentration

of ions (positive or negative) and mean potential

at particular location r in the solution. N is the

bulk concentration of ions, k is Boltzmann’s constant (k ~ 1.38 × 10-23 J/K) and q represents the charge of the ion considered.
Given the expression for the concentration of ions in the solution (Equation 4), the charge density of mobile ions rion(r) in the solution can be derived using the simple equation:
tion (r) = qn (r)
Equation 5
Equations 4 & 5 for all ion types present in the solution are usually incorporated in the Poisson equation yielding the well-known PBE.

„„The Poisson-Boltzmann equation The PBE is the most frequently used equation in continuum modeling of the protein–solvent system and can be written as:

d : f (r) d{ (r) - f (r) l (r) 2 sinh( { (r)) + 4rt (r) = 0 kT

Equation 6
where k is the parameter dependent on the ionic strength of the solution, and is called the Debye–Hückel inverse length. It is related to the ionic strength (I) by the equation:

l2 = 8rN Ae2 I 1000kT

Equation 7

where NA, e and k represent Avogadro’s number, the electronic charge and Boltzmann’s constant,

respectively. T is the absolute temperature of the

solution. The ionic strength (I) of the solution has

a profound effect on electrostatic interactions in

protein solutions. By changing the ionic strength

of the solution one can increase or decrease elec-

trostatic attractions/repulsions between charges

in the solution and, thus, obtain an experimen-

tal estimate of the importance of electrostatic

interactions for the quantity observed.

Equation 6 represents the nonlinear form of the

PBE, and it is often linearized in protein applications

by assuming sinhφ(r) ~ φ(r) , which results in:

d : f (r) d{ (r) - f (r) l (r) 2 { (r) + 4rt (r) = 0 kT
Equation 8
Analytic solutions of the PBE are only possible for simple geometric objects and, therefore, the PBE is typically solved for biomolecules using an iterative finite-difference approach [13]. Briefly, the protein and the solvent are mapped onto a cubic lattice with the parameters of the PBE [e(r), r(r) and k(r)] assigned to each point in the grid. Using these position-dependent parameters one

future science group

www.future-science.com

649

Review | Kukić & Nielsen

can account for not only the heterogeneous nature of the protein dielectric constant in continuum models, but also the variable concentration of ions with different valence in solution [14,15]. Once the parameters have been assigned, the PBE relates the flux of the electric field out of each grid cube to the charge inside that cube and the potentials on all grid points are iterated until the flux and potentials of neighboring cubes match.
Typically the PBE is solved on a 65 × 65 × 65 cubic grid, although larger and smaller grids can be used. For most proteins, a 65-cubed grid provides an adequate resolution for producing qualitative images of protein surface electrostatic potentials, whereas for quantitative calculations a resolution of 0.25 Å per grid point is required. In these cases, successive ‘focusing’ runs are typically performed to avoid the computational resource problems associated with solving the PBE on a very large grid (Figure 1).
„„Solvation energy & the Born model The solvation free energy (DGsol) is an integral component of all free energy calculations in biomolecules, defined as a free energy change associated with transferring a molecule from vacuum to its position in the solvent. The solvation free energy consists of polar and nonpolar contributions. The polar contributions are typically dominant for charged molecules and consist of

the so-called ‘self-energy’ and interaction energies with charges and dipoles in solution. The nonpolar contributions consist of the energy needed to create the cavity against the solvent pressure, the Van der Waals interactions with the solvent and the entropic penalty incurred by reorganizing the solvent around the molecule. In the following discussion we will only consider the polar part of the solvation free energy.
Born derived a very useful approximation of the solvation free energy associated with moving a charge from vacuum to a spherical solvent cavity, which is known as the Born model [16]:
DGspooll = -166kcal/mol qa2 ` f1y - f1 j, fy = 1
Equation 9
where q refers to the amount of charge (units of e), e is the relative dielectric constant of the solvent and a is the radius of the cavity ( Å). The Born model has found application in almost all models of electrostatic interactions in proteins. The Born equation (Equation 9) in combination with Coulomb’s law (Equation 2) can be applied not only in moving a charge from a vacuum, but also in moving a charge between any two homogenous media, such as when calculating the free energy change of moving a drug from solution to its position in the target protein (see the later dicussion on GB formalism).

φ5

ε

5

φ

3

ε

ε2

3
ε

1

φ2 φ4 ε4 q0 φ1

ε6

Figure 1. ‘Sequential focusing’ algorithm in solving the finite difference Poisson–Boltzmann equation. The coarse grid is used for the protein and the solvent, whereas the fine grid focuses only on the region of interest. Dielectric constant, charge density and ionic strength are assigned to each point of the grid in the iterative procedure. Electrostatic potential values calculated in the current step are used as the initial values in the next step.

650

Future Med. Chem. (2010) 2(4)

future science group

Electrostatics in proteins & protein–ligand complexes | Review

„„Modulators of electrostatic interactions in proteins Protein x-ray and NMR structures are used extensively in free energy calculations and in the following discussion we describe the relevant factors that should be considered when one applies classical electrostatic theory to protein 3D structures. We pay particular attention to the role of permanent dipoles and induced dipoles in calculations of the electrostatic potential (electrostatic field) inside and outside proteins.
Permanent dipoles�o��ri�g�i�n�a�t�e��f�r�o�m���th��e��te�n��dency of electrons to concentrate around atoms with large electronegativities, forming an excess of negative charge in these atoms. The relative absence of electrons similarly creates a partial positive charge on atoms with small electro­ negativities. In electrostatic calculations, permanent dipoles are represented by partial charges in molecular force fields and these are usually derived from experimental data or from ab initio calculations [17–21].
A permanent dipole in the solute molecule can interact with other permanent and induced dipoles in the solute and with surrounding water molecules. These interactions are very complex and can produce the reorientation of permanent protein dipoles. However, the degree of reorientation is limited by covalent bonds and other steric effects in proteins (e.g., hydrogen bonds and Van der Waals interactions), such that the dipoles on side chains can only reorient within the accepted geometries and the backbone dipoles generally remain fixed. Therefore, knowing the protein 3D structure, we are able to assign the partial charges to specific atoms and bonds in the protein structure, and use them explicitly in the PBE (Equation 8) as an integral part of the charge distribution, the value r(r). The reorientation of protein partial charges due to a structural perturbation, such as the introduction of a ligand or the titration of an ionizable group, can be estimated from MD simulations or directly from a known 3D protein structure. If the response of the protein structure is modeled correctly and the resulting structure(s) are used for PBE calculations, then all other polarization effects are included implicitly in the value of the protein dielectric constant, ep, when applying the PBE.
Permanent dipoles of water molecules show different behavior depending on their position in the solution. Water dipoles far away from the protein surface, so-called bulk water, have higher degrees of freedom than interfacial water molecules and permanent protein dipoles. Thus, the

complex translational and rotational motions of bulk water molecules are treated implicitly in the PBE and are modeled as a continuum with a large dielectric constant. The dielectric constant of bulk water, ew, can be measured accurately, and a value of 78.46 is used in calculations of the phenomena at 25°C (0.1-MPa pressure) [22].
The reorientation of interfacial water molecules due to an imposed electrostatic field is considerably limited by the network of hydrogen bonds with solute atoms and other interfacial water molecules. Water molecules bound to the charged surface residues have recently been suggested to behave more like extensions of the solute rather than mobile bulk waters [23,24]. This is particularly the case with internal water molecules, which can bridge electrostatic interactions between solute atoms and, thus, play a prominent role in enzyme catalysis and ligand binding [25,26]. Although theoretical studies have provided insights into the role of these molecules, our knowledge of their importance is still quite limited. For reasons of simplicity, interfacial and internal water molecules are often treated as bulk water in PBE calculations. This simplification is likely to be one of the main reasons for inaccurate predictions of the electrostatic potential in proteins in which such discrete water molecules play an important role [27].
Induced dipoles�e�x��is�t��i�n��p�r�o�t�e�i�n�s��a�s��a��c�o�n��s�e�quence of the phenomenon known as electron polarization. Electron polarization accounts for the reorientation of the electronic cloud around a nucleus in the presence of the local electrostatic field E(ri), and is given by:
(ni) n+1 = aE (ri) n
Equation 10
where µi represents the dipole moment of the induced dipole i, defined by µi = liqi (li is the vector between two opposing induced charges qi and -qi), and a is the proportionality constant called the molecular polarizability. E(ri) designates the sum of electrostatic fields originating not only from partial charges and ions in solute and solution, but also from reoriented electronic clouds existing on other atoms in and around proteins. The values of µi for all induced dipoles in the protein are solved using the iterative procedure, where µin+1 values are determined by the field E(ri)n from the previous iteration. Iterations of this complex many-body problem converge only with difficulty (sometimes leading to a so-called ‘polarization catastrophe’) and represent one of the main obstacles in accurately

future science group

www.future-science.com

651

Review | Kukić & Nielsen

evaluating electrostatic interactions in proteins at the atomic level. I�n���o�r�d�e�r��to��a�v��o�i�d��th��e��p�o�l�a�r�i�zation catastrophe, the damping parameter is usually introduced and, thus, the particularly long-range effect of electrostatic interactions in some proteins can be underestimated [Kukic et al., Manuscript in Preparation].
In order to avoid the explicit representation of induced dipoles in the solute by including them in the charge distribution r(r) (Equation 8), the induced dipoles are instead replaced by ep = 2 if all other effects (permanent dipoles and protein relaxation) are included explicitly in the PBE [12]. Induced dipoles of water molecules are also modeled implicitly in PBE using a dielectric constant of water of ew = 80.
By using the spatially invariable protein dielectric constant (ep) in the PBE to account for all polarizability phenomena, the hetero­geneous dielectric nature of the protein interior is ignored. Therefore, efforts are currently being made to develop polarizable force fields that explicitly account for the effect of electronic polarization in the PBE. Although polarizable force fields such as AMOEBA [28], CHARMM [20,21] and PFF [18] have been developed in the last few years, parameter sets for these polarizable force fields are still under development and their behavior in free energy calculations is not yet completely understood. �R�e�a�d��e�rs��i�n�t�e�r�e�s�te�d���in��u��s�in��g��p�o�l�a�r�izable force fields to evaluate protein–ligand binding free energies are encouraged to read the review by Warshel et al. [19] and recent article by Maple et al. 18].
Modeling electrostatic interactions in proteins The importance of predicting protein characteristics, such as catalytic activity, stability and ligand/drug binding affinities, from protein 3D structures has led to the development of a wide range of electrostatic models in the last few decades. These models have been divided by Warshel and co-workers into three groups: microscopic all-atom models, simplified microscopic models and continuum (macroscopic) models [4] (see Figure 1 as presented by Warshel et al. [4]).
Microscopic all-atom models use detailed allatom representations of both the protein and the solvent and can be further divided into classical mechanical and quantum mechanical models. Formally rigorous all-atom models based on statistical mechanical approaches or molecular mechanics (MD; free energy perturbation [FEP]

models and others) are very slow and often use inappropriate treatment of long-range electrostatic interactions [27]. These models are less attractive for calculating protein electrostatic energies because of the long convergence/calculation times. However, less computationally expensive all-atom methods, such as the linear response approximation (LRA) and the linear interaction energy (LIE) methods, have emerged as useful tools in structure-based drug design. These methods, together with more approximate and significantly faster approaches (PBE, Poisson–Boltzmann surface area [PBSA] and GB), which have proved to be quite effective, will be the main focus of this section.
„„Simplified microscopic models It is a time-consuming and complicated process to correctly treat the large number of water molecules in microscopic all-atom models when calculating the electrostatic field [4]. Therefore, many simplified models have been developed. The protein dipoles langevin dipoles (PDLD) model represents a simplified microscopic model where water molecules close to the catalytic/binding site are accounted for by dipole moments µiL of so-called Langevin dipoles, whereas distant water molecules are treated as a continuum with an ew of 80. Langevin dipoles are point dipoles whose time-averaged polarization is represented by a Langevin-type function rather than linear function given by Equation 10. An example of Langevin dipoles positioned on a cubic lattice around a protein–ligand complex is depicted in Figure 2.
Induced dipoles of solutes are included explicitly in the PDLD model, by assigning dipole moments µi to all atoms, according to Equation 10. The values of the solute point charges are obtained from force fields, whereas the distribution of mobile ions in the solution is approximated using the Boltzmann distribution (Equation 4). The reorganization of the solute due to changes in the charge distribution is accounted for by MD simulations [29].
Since the charges in the PDLD model are not treated implicitly by a dielectric constant and since the relaxation of the protein structure is approximated using MD simulations, all relevant electrostatic interaction energies between solute charges are calculated using Coulomb’s law (Equation 2) and a dielectric constant of 1. The solvation energy of the solute due to the surrounding Langevin dipoles is also estimated using Coulomb’s law, whereas the solvation

652

Future Med. Chem. (2010) 2(4)

future science group

Electrostatics in proteins & protein–ligand complexes | Review

energy of the solute due to the surrounding distant water, represented as a continuum in the PDLD model, is calculated using the GB solvent model. The electrostatic energy between ions and solute is estimated using Coulomb’s law with a value of the dielectric constant being ep approximately 40–80 (for more details on the PDLD model see elsewhere [4]).
The PDLD model, like all microscopic all-atom models, deals with large energy contributions (i.e., electrostatic energies are not scaled by the dielectric constant in Coulomb’s law). Therefore, even small errors in calculating individual electrostatic energies can lead to large absolute errors when treating systems with thousands of atoms. In order to scale large energy contributions in the PDLD model and achieve the precision of the current semi-macroscopic models, Warshel and co-workers further introduced the ‘scaled microscopic’ PDLD/S model [30]. The novelty in the PDLD/S model is represented by the fact that the solute-induced dipoles (and the reorganization of solute permanent dipoles) are accounted for implicitly by using a protein dielectric constant value of ep approx­imately 2–6. The scaled PDLD model defined in this way gives results that are more stable that its microscopic counterpart [29,30].
The accuracy of the PDLD and PDLD/S models in reproducing electrostatic free energies has been assessed with a diverse set of problems in recent years. The problems range from the examination of solvation energies of small molecules in solutions, the prediction of solvation energies of ionizable groups in solvated proteins and electrostatic interactions between them, to the evaluation of electrostatic free energies in enzyme catalysis and ligand binding [4,27,30–33]. Calculations with the standard PDLD model showed good agreement between calculated and experimental solvation energies of a variety of molecules in solution [30]. Furthermore, both the PDLD and the PDLD/S models were demonstrated as being capable of reproducing self-energies of acidic residues in proteins, charge–charge interaction energies between surface titratable groups and charge–charge interaction energies between buried titratable groups in the selected protein systems. The relevant bench­mark energies in these studies were derived using experimental pH-dependent titration curves of nuclei and mutagenesis studies of bovine pancreatic trypsin inhibitor, subtilisin and cytochrome C protein systems. Compared with the PDLD, the PDLD/S model increases the precision of the

Figure 2. Simulation set-up for efficient free energy ligand-binding calculation. The protein (given by the secondary structure) and substrate (red) are represented explicitly, as is a sphere (radius 20 Å ) of water (purple) centered on the binding site. The electrostatic effect of distant water on the binding site is approximated by Langevin dipoles (blue). The atoms in the volume defined by the explicit solvation sphere are simulated freely while those outside the sphere are restrained.
energy calculations, mainly by scaling the relevant electrostatic energy terms (solvation and charge–charge interaction energy).
The PDLD and PDLD/S models have been further tested in enzyme catalysis and protein–ligand binding applications, and have produced reasonable results [4,27,30–33]. The PDLD/S model with ep = 6 gave best results in reproducing absolute reaction free energies of the trypsin and lysozome-catalyzed reactions.
„„Continuum models Continuum models for modeling electrostatic interactions in proteins achieve a high level of computational efficiency by significantly reducing the level of detail used to represent the solvent surrounding the protein. All continuum models treat the solute(s) as low-dielectric media with embedded atomic charges that are immersed in a high-dielectric ionic solution. These models predict the electrostatic potential at every point in and around a protein for a given distribution of atomic charges and a given ionic strength.

future science group

www.future-science.com

653

Review | Kukić & Nielsen

Key Terms
NMR titration experiment: Experiment that measures chemical shift perturbations of nuclei when changing the pH of the solution (i.e., changing the charges on titratable groups in a protein)
GB model: Fast approximation to the PBE model in evaluating electrostatic free energy

Graphical representations of electro­static potential maps produced with continuum models have revealed the importance of electrostatic interactions in many biomolecular complexes in the last three decades. In particular, the areas of ligandbinding specificity and diffusion-controlled ligand binding have been treated extensively with continuum methods. Here, we give an overview of the main characteristics of current continuum models. We concentrate on models based on the PBE and the GB framework.
„„PBE solvers & the protein dielectric constant Poisson–Boltzmann equation solvers are able to calculate the electrostatic potential of complex protein molecules by using a spatial variation of the dielectric constants (e), the ionic strength (I), and the charge distribution (r) (Equation 8). The PBE can furthermore be solved quickly by employing an iterative procedure (Figure 1) [34]. Permanent solute dipoles are modeled explicitly in the PBE as point charges located at atom centers, whereas mobile ions in solution are accounted for using the Boltzmann distribution (Equation 4). The dielectric constant models all contributions that are not explicitly included in the PBE and its value differs for the protein and the solvent. The dielectric constant used for the solvent (ew = 80) describes both induced and permanent dipoles in the solvent, as well as the relaxation of the solvent around the protein. The protein dielectric constant ep usually has a value between 2–20 depending on the exact type of calculation and the protein structure modeling scheme. When ep reflects only the effect of electronic polarizability, values of approximately 2 are used. When the reorganization of charges together with electronic polarizability are represented implicitly in the PBE model, the optimal value of ep can sometimes increase to values of more than 20 [35,36].
The main problem in employing ep is that the value used for one application cannot be transferred to any other. Typical examples of calculations that use different ep are those of charge–charge interactions and/or calculations of charge-solvation energies. When calculating charge–charge interactions, assuming the rigid structure of the protein, the interaction energies calculated depend on the shape of the macromolecule and the exact position of the charges in the protein. Therefore, different values of ep in PBE are necessary in order to mimic electrostatic interaction energies between

different charged groups in proteins obtained by NMR titration experiment. Values as large as ep = 40 can be optimal for the calculation of charge–charge interactions in bovine bLG and HEWL [Kukic et al., Manuscript in Preparation]. Furthermore, calculations of solvation penalties for moving ionizable groups from water to their position in the protein have been suggested to be more accurate if a smaller protein dielectric constant is used (ep ~ 2–6) [4,27]. Values of the protein dielectric constant are dependent on the parameters that are not described explicitly and, therefore, should be optimized separately for each particular application in order to achieve the desired level of accuracy.
A variety of PBE solvers are readily available, ranging from multipurpose software packages (UHBD [37] and CHARMM [38], for example) to stand-alone PBE solvers (APBS [13], DelPhi [39] and ZAP [201], for example). These PBE solvers have been used successfully in recent years for predicting solvation energies and electrostatic energy contributions to a wide range of processes. The role of solvation and electrostatic interaction energies on ligand/drug-binding processes and, therefore, the importance of PBE models in this phenomenon will be considered in detail later in this review. For more information on using PBE to calculate protein pKa values of titratable groups as well as to predict the pHdependence of catalytic activity, protein stability and ligand-binding energies, see the review by Nielsen and the references therein [6].

„„PBE models in conjunction with

solvent‑accessible surface area

Predictions of solvation free energies using PBE

models are often augmented with a term that

describes the nonpolar contribution to solva-

tion. The additional nonpolar term is usually

estimated from the solvent-accessible surface

area (SA), and PBE models used in conjunction

with SA are referred to as PBSA models [40,41].

The

nonpolar

solvation

free

energy

DG non–pol sol

that accounts for the cost of cavity formation

and Van der Waals interactions with the solvent

can be approximated using an equation of the

following form:

DGsnooln-pol = cA + b
Equation 11
with A representing the SA, and g and b being the adjustable parameters. The values of g and b are usually calibrated using experimentally determined solvation energies of small molecules.

654

Future Med. Chem. (2010) 2(4)

future science group

Electrostatics in proteins & protein–ligand complexes | Review

With recent improvements in calculating the polar contributions of solvation free energies using MD simulations in conjunction with PBE/GB models (see ‘Macroscopic models turn microscopic), the limitations of this simple SA model (Equation 11) are becoming more obvious [42–44]. New efforts to develop an extended treatment of nonpolar solvation free energies, especially by separating the costs of Van der Waals interactions with the solvent and costs of cavity formations, have been constructed [45], and will hopefully increase the level of accuracy of current continuum models.

„„GB formalism Binding free energies of protein–ligand complexes can be evaluated using the PBE (or PBSA) models in minutes of computer time. However, the screening of thousands of potential drugs usually requires higher computational efficiency than the PBE model can deliver. Therefore, very fast analytical methods based on the Born equation (Equation 9), GB models, have been widely used in the calculation of free energies of ligand binding, in computer-aided drug design (CADD) and in conformational ana­lysis of proteins.
The GB models for ligand binding are based on the assumption that the screening of electro­ static interactions between two charges can be estimated by the degree to which the charges interact with the solvent [46]. The solutes in the GB models are described by a collection of atoms, where each atom is represented by a sphere of a radius ai, referred to as ‘effective Born solvation radius’. The point charge of each atom qi is positioned at the sphere’s centre. As in the PBE models, the molecule in the GB model is described as a low-dielectric volume, ep, surrounded by a high-dielectric aqueous environment ew. Electrostatic interaction energies between point charges in GB models are calculated using Coulomb interactions in vacuo (Equation 2). The electrostatic solvation energy (‘polarization energy’) of transferring a molecule from the protein to the aqueous environment is given by [40]:

// ` j DG pol = -166kcal/mol 1 - 1 N N

qi q j

fp fw i j fGB (rij, ai, a j)

Equation 12
where N is the number of atoms in the system,
rij is the distance between atoms i and j, and fGB is the function that involves several empirical

parameters. The effective Born solvation radius ai of the atom i can be thought of as the distance between the charge and protein–solvent boundary. This parameter is adjustable and its value is usually determined using the solvation free energy from the PBE by:
ai = -166Ac qa2 ` f1p - f1w j DG1ipol
Equation 13
where DGipol designates the solvation free energy of the unit charge positioned at the centre of the atom i in the uncharged solute, whereas ep and ew represent relative dielectric constants of the solute and the solution in the PBE.
Generalised Born-based models are unable to calculate electrostatic potential maps of proteins and their parameters are usually optimized using the results obtained with PBE solvers [47–51]. Optimized and calibrated GB models are usually augmented by the hydrophobic contribution to binding and referred to as GBSA models. With improvements in recent years, current GBSA models are capable of achieving the similar level of accuracy as numerical PBSA methods [42,52]. Their high computational efficiency makes them suitable for drug-design applications and different versions of GBSA models are now implemented in many ligand-docking programs [53–56].
„„Macroscopic models turn microscopic The precision of macroscopic models that use rigid protein structures, as emphasized many times in the previous discussion, is highly connected with the value used for the protein dielectric constant ep. Therefore, recent efforts in improving the existing macroscopic models have mainly focused on limiting the number of effects that are treated implicitly and ‘hidden’ in the protein dielectric constant. In other words, there is a drive towards making macroscopic models more microscopic. The number of effects treated implicitly by the ep value can be limited by treating these effects explicitly, for example by modeling electronic polarization or protein conformational flexibility explicitly.
Several polarizable force fields have been reported recently, and it is only a matter of time before continuum models will be able to explicitly account for the induced dipoles in protein interiors. Particularly promising is the work of Schnieders and co-workers in developing the Polarizable Multipole Poisson–Boltzmann (PMPB) continuum electrostatics model, which is built on the AMOEBA polarizable force

future science group

www.future-science.com

655

Review | Kukić & Nielsen

field [28]. The PMPB model has been tested on predicting the change of the protein dipole moment when the protein is moved from vacuum to solution. The model proved able to mimic results obtained from the MD simulations with explicit water.
Another direction in improving the continuum paradigm includes the explicit incorporation of structural flexibility in the models. In classical PBE models, the response of the solute structure to changes in charge distribution (e.g., the pH-dependent titration of a titratable group) is accounted for by an average isotropic ep value. Therefore, PBE models are not able to account for the local anisotropic structural rearrangements in the protein.
The first step in treating the structural relaxation microscopically in PBE models includes the optimization of hydrogen bonds and/or side chains of individual residues [57,58]. For example, selected side chains in the protein were assigned a few predetermined rotamers depending on the charge distribution. These models have proven very successful in pKa predictions and they are still widely used. However, they still approximate the relaxation of the rest of the protein implicitly through the poorly defined protein dielectric constant.
A more radical step in including protein conformational flexibility explicitly in contin­ uum models is to treat the solute using MD simulations. With this approach, the dielectric relaxation of a solute is modeled by structural rearrangements during the MD simulations. On the contrary, the effects of a high-dielectric solvent are still approximated by a continuum, and, therefore, the computationally expensive sampling over water degrees of freedom is still avoided. The first MD simulations of a protein using both the PBE and the GB models were performed by Gilson et al. for HIV-1 protease [59]. Ever since, MD simulations with GB and PBE models have been applied to study structural and dynamic properties of proteins [60], protein stability [61] and to predict pKa values [62,63].
The successful implementation of continuum models in MD simulations influenced the development of constant pH MD. Unlike the classical MD simulations, where protonation states of all titratable groups are kept fixed, the constant pH MD simulations allow the change of protonation states of titratable groups during the simulations. The sampling of protonation states in the protein is usually implemented using the PBE [64–66] or GB model [62], combined with Monte Carlo

sampling. The constant pH MD simulations account for the aqueous environment in a realistic way and, therefore, their main contribution is likely to be in pH-dependent conformational changes and the pH-dependence of protein stability [42]. The accuracy of constant pH MD simulations have been assessed in reproducing experimental pKa values, and results have generally been encouraging [63,66,67].
The recently developed MD-based models are also prevalent in studies of ligand binding and computational drug design. The LRA and the LIE methods have been developed by Warshel and Åquist as an alternative to more time-consuming FEP-based approaches. Both the LRA and the LIE methods proved to be more useful in computationally demanding problems, as they employ conformational averaging only in the associated and dissociated states of a protein and a ligand.
The free binding energy in the LRA method is decomposed in electrostatic and nonelectrostatic components. The nonelectrostatic component is hard to estimate and consists of hydrophobic, Van der Waals, conformational entropy and water-penetration terms. The electrostatic part in the LRA model is evaluated from MD simulations in charged and noncharged forms of the ligand, by averaging electrostatic energies using the following equation (see the thermodynamical cycle in Figure 2 in the paper by Sham et al. [68]):

6 @ DGbelind = 1
2

UPel-L on+ UPel-L off + UWel -L on+ UWel -L off

Equation 14
where Uel represents the electrostatic energy between the ligand (L) and its surroundings, protein (P) and water (W ), and is calculated using Coulomb’s law (Equation 2). Brackets designate MD averages over trajectories obtained in both charged (on) and noncharged (off ) states of the ligand.
The LRA method has usually been tested in combination with PDLD (PDLD–LRA) and PDLD/S (PDLD/S–LRA) models and performed particularly well in discriminating ligands with different binding affinities for the given receptor [27]. The exceptionally good correlation between PDLD/S–LRA predictions with a low ep value and observed binding free energies of HIV protease inhibitors has been recently reported [27,68].
The LIE approach adopts the electrostatic contribution to binding free energy given by Equation 14, but neglects the (Uel)off terms in both protein and water environments. It has been shown that this approximation is valid in

656

Future Med. Chem. (2010) 2(4)

future science group

Preparing to load PDF file. please wait...

0 of 0
100%
Electrostatics in proteins and proteinligand complexes