# ChargeCalculator:Theoretical background

The *Electronegativity Equalization Method* (EEM) is the approach employed by ACC to calculate atomic charges.

EEM is an empirical method developed as a cost-effective alternative to quantum mechanics (QM) based methods, as it enables the determination of atomic charges that are sensitive to the molecule's topology and three-dimensional structure. EEM has been successfully applied to zeolites, small organic molecules, polypeptides and proteins.

ACC implements one classical EEM formalism, along with two additional modifications. We give a brief description of each below. Please refer to the literature [20, 32-45] for a more in-depth description of EEM, and to [44, 46-50] for a few examples of applications.

# EEM

The classical EEM formalism estimates atomic charges via a set of coupled linear equations, one per atom plus one constraint on the total molecular charge.

In order to solve this system of equations and calculate the atomic charges for all atoms, the following terms need to be known:

- distances between all pairs of atoms
- total molecular charge
- empirical parameters (here k, A, B) covering all atom types present in the molecule
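For reference, the classical EEM system is usually written in the literature as shown below. The notation is the conventional one and is assumed here to correspond to ACC's parameters: $\kappa$ stands for the parameter k in the list above, and $A_i$, $B_i$ are the parameters of the atom type of atom $i$.

$$
A_i + B_i\, q_i + \kappa \sum_{j \ne i} \frac{q_j}{R_{ij}} = \bar{\chi}, \quad i = 1, \dots, N, \qquad \sum_{i=1}^{N} q_i = Q
$$

Here $q_i$ is the charge on atom $i$, $R_{ij}$ is the distance between atoms $i$ and $j$, $Q$ is the total molecular charge, and $\bar{\chi}$ is the equalized molecular electronegativity. The $N$ charges and $\bar{\chi}$ are the $N + 1$ unknowns, so the system can be solved as a single $(N + 1) \times (N + 1)$ matrix equation.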

ACC calculates the interatomic distances from the atomic positions it reads from the molecular structure file. The user is required to provide the total charge; otherwise ACC will assume the molecule is neutral (total molecular charge is 0). EEM parameters for each atom type (e.g., carbon, oxygen) present in the molecule are read from a set of EEM parameters suitable for the molecule in question. Many sets of EEM parameters have been published in the literature and are available in ACC as built-in sets. These sets may be used as they are, or with user modifications where necessary.
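The workflow described above (distances from coordinates, a total charge, per-type parameters) can be illustrated with a minimal sketch of a classical EEM solver. This is not ACC's implementation: the function names `solve_linear` and `eem_charges` are hypothetical, the standard literature form of the EEM equations is assumed, and the parameter values are arbitrary placeholders rather than a published parameter set.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular EEM matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def eem_charges(elements, coords, params, kappa, total_charge=0.0):
    """Return (charges, equalized electronegativity) for one molecule.

    params maps an atom type to its (A, B) pair; kappa is the global
    parameter. All parameter values are supplied by the caller.
    """
    n = len(elements)
    matrix = [[0.0] * (n + 1) for _ in range(n + 1)]
    rhs = [0.0] * (n + 1)
    for i in range(n):
        a_i, b_i = params[elements[i]]
        for j in range(n):
            if i == j:
                matrix[i][j] = b_i
            else:
                matrix[i][j] = kappa / math.dist(coords[i], coords[j])
        matrix[i][n] = -1.0   # coefficient of the unknown electronegativity
        rhs[i] = -a_i
        matrix[n][i] = 1.0    # last row: charges sum to the total charge
    rhs[n] = total_charge     # defaults to a neutral molecule
    solution = solve_linear(matrix, rhs)
    return solution[:n], solution[n]

# Water-like geometry in angstrom; PLACEHOLDER parameters, not a published set.
elements = ["O", "H", "H"]
coords = [(0.0, 0.0, 0.0), (0.9572, 0.0, 0.0), (-0.24, 0.9266, 0.0)]
params = {"O": (2.6, 0.9), "H": (2.4, 0.8)}
kappa = 0.4
charges, chi = eem_charges(elements, coords, params, kappa)
```

The solver is written in pure Python to keep the sketch self-contained; a production implementation would normally use an optimized linear algebra library instead.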

EEM parameters are generally developed based on reference QM calculations. A QM-based charge calculation approach is characterized by the setup of the wave function calculation (theory level, basis set, environment), as well as by the procedure used to partition the molecular electron density, or to deduce the electrostatic contribution of each atom. We refer to the sum of these characteristics as the "charge definition".

The maximum accuracy and potential application of any set of EEM parameters is given by the charge definition used during its development. Performance is further influenced by the procedure used when fitting the EEM parameters to the reference data.

# EEM Cutoff

While EEM is very fast compared to QM methods, handling large molecules or complexes still requires significant time and memory resources. In order to make such calculations accessible to you in real time, ACC implements two special EEM approximations.

The first approximation employs a cutoff for the size of a given system of equations being solved. Specifically, for each atom, ACC solves a system containing only the equations for atoms within a certain distance in angstrom (*cutoff radius*) from the given atom. The number of equations considered depends on the density of the molecular structure and overall shape of the molecule in the area of that particular atom.

Thus, for a molecule with 10000 atoms and a cutoff radius of 10 angstrom, instead of solving one matrix with 10000 × 10000 elements, ACC will solve 10000 matrices of much smaller size (approximately from 50 × 50 up to 400 × 400). The essence of the *EEM Cutoff* method is that, instead of one very large calculation, ACC runs many small calculations, each of them less demanding in memory and time than the original one. *EEM Cutoff* is therefore efficient only for large molecules, containing at least several thousand atoms.
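The trade-off in this example can be made concrete with a rough back-of-the-envelope estimate. The sketch below assumes dense matrices stored in 8-byte floating point and a direct solver whose cost grows with the cube of the matrix size; both are assumptions made for illustration, not statements about ACC's internals, and the helper names are hypothetical.

```python
def dense_matrix_bytes(n, bytes_per_value=8):
    """Memory needed to store a dense n x n matrix (assumed 8-byte floats)."""
    return n * n * bytes_per_value

def dense_solve_cost(n):
    """Relative cost of a direct dense solve, assumed proportional to n cubed."""
    return n ** 3

n_atoms = 10_000

# One full EEM matrix versus the largest single fragment matrix.
full_memory = dense_matrix_bytes(n_atoms)
fragment_memory = dense_matrix_bytes(400)

# One large solve versus one small solve per atom.
full_cost = dense_solve_cost(n_atoms)
cutoff_cost_small = n_atoms * dense_solve_cost(50)
cutoff_cost_large = n_atoms * dense_solve_cost(400)

print(f"full matrix: {full_memory / 1e6:.0f} MB")
print(f"largest fragment matrix: {fragment_memory / 1e6:.2f} MB")
print(f"cost ratio with 50-atom fragments: {full_cost / cutoff_cost_small:.0f}")
print(f"cost ratio with 400-atom fragments: {full_cost / cutoff_cost_large:.2f}")
```

Under these assumptions the memory saving is large in every case, while the time saving at 10000 atoms ranges from substantial (small fragments) to modest (fragments of 400 atoms). This is consistent with the method paying off only for large molecules.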

In other words, running *EEM Cutoff* is like running *EEM* for a set of overlapping fragments of the original molecule. A fragment is generated for each atom. The position and type of the atoms in each fragment are the same as in the original molecule. The only issue is the total charge of the fragment. *EEM Cutoff* assigns each fragment a quota of the total molecular charge proportional to the number of atoms in the fragment, irrespective of the nature of these atoms. Then ACC solves the EEM equation for each fragment. The charge on each atom in the molecule is then computed as the sum of its charge contributions from each fragment. Further, each atomic charge is corrected in such a way that the sum of all atomic charges equals the total molecular charge. While this algorithm may not be chemically rigorous, it has proven both robust and sufficiently accurate (RMSD less than 0.003e compared to the classical EEM) if the cutoff radius is sufficiently large (over 8 angstrom).
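Three of the steps described above can be sketched in a few lines: building one fragment per atom, assigning the fragment its quota of the total charge, and applying the final correction. The EEM solve for each fragment is omitted. The function names are hypothetical, and the correction is assumed here to be a uniform shift of all charges, which is only one possible way to restore the total charge and may differ from what ACC does.

```python
import math

def build_fragments(coords, cutoff_radius):
    """For each atom, list the indices of all atoms within cutoff_radius.

    The central atom is always part of its own fragment.
    """
    return [
        [j for j, other in enumerate(coords)
         if math.dist(center, other) <= cutoff_radius]
        for center in coords
    ]

def fragment_charge(total_charge, fragment_size, n_atoms):
    """Quota of the total charge, proportional to the fragment's atom count."""
    return total_charge * fragment_size / n_atoms

def correct_charges(charges, total_charge):
    """ASSUMPTION: spread the residual charge evenly over all atoms."""
    shift = (total_charge - sum(charges)) / len(charges)
    return [q + shift for q in charges]

# Five atoms on a line, 1.5 angstrom apart, with an illustrative 2.0 cutoff.
coords = [(1.5 * i, 0.0, 0.0) for i in range(5)]
fragments = build_fragments(coords, cutoff_radius=2.0)
quotas = [fragment_charge(-1.0, len(f), len(coords)) for f in fragments]

# Made-up raw charges standing in for the combined fragment results.
raw_charges = [0.10, -0.20, 0.05, -0.30, -0.50]
corrected = correct_charges(raw_charges, total_charge=-1.0)
```

The tiny cutoff is used only to keep the example readable; as noted above, a radius over 8 angstrom is needed for accurate results.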

# EEM Cutoff Cover

To further enhance the time and memory efficiency of EEM, ACC implements an additional approximation with specific focus on large biomolecular complexes with hundreds of thousands of atoms. This additional approximation is applied to the *EEM Cutoff* method in order to reduce the number of EEM matrices that will be solved.

While in the *EEM Cutoff* method ACC generates one fragment for each atom in the molecule, this further approximation generates fragments only for a subset of atoms. The algorithm by which this subset of atoms is obtained ensures that each atom in the molecule will eventually contribute to at least one fragment. In other words, the entire volume of the molecule is covered, and the method is thus termed *EEM Cutoff Cover*.

In *EEM Cutoff Cover*, the subset of fragment-generating atoms is obtained in such a way that:

- no two atoms in this subset are connected to each other
- each atom in the molecule has at least one neighbor (within two bonds) included in this subset.
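One simple way to obtain a subset that satisfies both conditions is a greedy pass over the atoms, sketched below. This is an illustration and not necessarily the algorithm ACC uses. The function names are hypothetical, the bond graph is assumed to be given as a symmetric adjacency list, and an atom that belongs to the subset is treated as covered by itself.

```python
def within_two_bonds(atom, bonds):
    """Return the atom itself plus all atoms reachable in one or two bonds."""
    reach = {atom}
    for first in bonds[atom]:
        reach.add(first)
        reach.update(bonds[first])
    return reach

def select_cover_atoms(bonds):
    """Greedily pick fragment-generating atoms.

    An atom is picked only if it is not yet within two bonds of a picked
    atom, so picked atoms are never bonded to each other, and every atom
    ends up within two bonds of a picked atom (or is picked itself).
    """
    selected = []
    covered = set()
    for atom in range(len(bonds)):
        if atom not in covered:
            selected.append(atom)
            covered |= within_two_bonds(atom, bonds)
    return selected

# Linear chain of seven atoms: 0-1-2-3-4-5-6.
chain = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4, 6], [5]]
cover = select_cover_atoms(chain)
print(cover)  # [0, 3, 6]
```

In this chain three fragments replace the seven that *EEM Cutoff* would generate. The reduction achieved on real molecules depends on their bonding pattern.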

The fragments for *EEM Cutoff Cover* are generated in the same way as for *EEM Cutoff*, according to the *cutoff radius*. Thus, the average size of the resulting EEM matrices will not differ. However, since fewer fragments are generated for *EEM Cutoff Cover*, the final number of EEM matrices to be solved will be up to 4 times lower than for *EEM Cutoff*.

*EEM Cutoff Cover* has also proven robust and sufficiently accurate (RMSD less than 0.003e compared to *EEM Cutoff* with a comparable cutoff radius), and is the method of choice for biomolecular complexes of tens of thousands of atoms or more.