MACROMOLECULAR CHARGE FLIPPING

Updated August 2013



Adapting the Charge Flipping algorithm to biological macromolecule diffraction data.

The charge-flipping algorithm introduced by Oszlányi and Sütő for single crystals in 2004 has been adapted to protein crystal diffraction data in the computer program SUPERFLIP. A flow diagram of the procedure is given below.

Two main applications are described:

* ab initio procedure for the determination of protein crystal structures using diffraction data at atomic resolution;

* procedure for heavy-atom or anomalous-scatterer substructure determination from isomorphous or anomalous differences.


[Flowchart of the procedure]
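
For orientation, one cycle of the basic charge-flipping iteration of Oszlányi and Sütő can be sketched in a few lines of Python. This is a one-dimensional toy model under stated assumptions, not SUPERFLIP code: the function names (dft, charge_flip_cycle), the 16-point grid and the direct use of a threshold delta are illustrative only, and the special treatment of F(0) and of weak reflections is left out.

```python
import cmath
import math


def dft(values, sign):
    """Plain O(n^2) discrete Fourier transform (unnormalised).

    sign=-1 gives the forward transform, sign=+1 the backward one.
    """
    n = len(values)
    return [
        sum(v * cmath.exp(sign * 2j * math.pi * h * x / n) for x, v in enumerate(values))
        for h in range(n)
    ]


def charge_flip_cycle(f_obs, phases, delta):
    """One charge-flipping cycle on a 1D grid (toy model, hypothetical helper)."""
    n = len(f_obs)
    # Combine the observed amplitudes with the current phases.
    f = [a * cmath.exp(1j * p) for a, p in zip(f_obs, phases)]
    # Back-transform to a real-space density.
    rho = [c.real / n for c in dft(f, +1)]
    # Flip the sign of every density value below the threshold delta.
    flipped = [r if r >= delta else -r for r in rho]
    # Forward-transform the modified density and keep only its phases;
    # the amplitudes are reset to f_obs at the start of the next cycle.
    new_phases = [cmath.phase(c) for c in dft(flipped, -1)]
    return new_phases, flipped


# Toy "structure": three point atoms on a 16-point grid.
rho_true = [0.0] * 16
rho_true[3], rho_true[7], rho_true[12] = 5.0, 2.0, 3.0
f_true = dft(rho_true, -1)
f_obs = [abs(c) for c in f_true]

# Starting from the true phases, a cycle with delta=0 leaves the density unchanged.
phases, rho = charge_flip_cycle(f_obs, [cmath.phase(c) for c in f_true], delta=0.0)
```

In a real run the starting phases are random and the cycle is repeated until a convergence criterion is met; the sketch only shows that a non-negative density with correct phases is a fixed point of the iteration.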




SUPERFLIP program and utilities:

Source files, documentation, and the license agreement are available from the official SUPERFLIP site, hosted by the Department of Structure Analysis, Institute of Physics, Prague, and the École Polytechnique Fédérale de Lausanne (EPFL).

Download the source code or the appropriate binaries for your system (current version: 03/15/2013 12:43):

Source code, zipped executables for MacOSX (Intel) or Windows, GNU-Linux x86 (32-bit, statically linked) or GNU-Linux x86-64 (64-bit, statically linked)

Uncompress the binary, rename it to superflip, make it executable (chmod +x superflip) and move it to a directory in your $PATH (/usr/local/bin or ~/bin are good places).


Macromolecular structures can be solved by SUPERFLIP in two ways:

* by setting up an input file for use with a user-provided hkl file and running the superflip program ($ superflip example.inflip).

      Examples (input and log files) can be found here:

           heavy-atom substructure solution:

                anomalous differences data, 40 sites, P1 space group:  substructure40.inflip  substructure40.sflog

                anomalous differences data, 120 sites, C2 space group:  substructure120.inflip  substructure120.sflog

           ab initio structure solution at atomic resolution:  protein.inflip  protein.sflog

* by using C-shell scripts: flipsub for heavy-atom substructure solution, and fliprot for ab initio structure solution at atomic resolution.

      These scripts create the SUPERFLIP input file on the fly from a limited number of command-line options. They read input files in CCP4 MTZ format and write output files in CCP4 map format and PDB format (heavy-atom sites).


Download the C-shell scripts, which use the CCP4 environment (version 6.3.x or 6.2.x):        fliprot (version 07/28/2013)        flipsub (version 07/28/2013)

Install fliprot and flipsub in a directory listed in your $PATH and make them executable (chmod +x flipsub fliprot).

Application examples follow.




Examples of applications and test data

Ab initio protein structure solution at atomic resolution


usage: fliprot mydata.mtz FP=Fobs name=mytest
   or  fliprot 2anv-sf.cif name=2anv SG=5

where:
mydata.mtz (or pdbcode-sf.cif) ...... input structure-factor file in MTZ or mmCIF (PDB) format

optional keywords:
SG=18          ...... space group number (read from the MTZ file; required for some PDB structure-factor files)
FP=Fobs        ...... MTZ label assignment for amplitude (default FP=FP)
name=flip      ...... generic name for output files (default fliprot)
1.05A          ...... dmin resolution (default all reflections)
ked=1.25       ...... coefficient for delta threshold parameter (default 1.3)
weak=0.1       ...... weak reflection threshold (default 0.05)
trial=5        ...... number of repeated trials (default 1, repeat=never)
maxcycl=5000   ...... maximum number of cycles per trial (default 20000)
mode=peakiness ...... convergence detection mode: peakiness or symmetry (symmetry by default, except for SG=P1)
conv=4.0       ...... convergence threshold (default 3.0 for peakiness, 80.0 for symmetry)

example 1:

Test data used: PDB code 1mfm
1152 non-H protein atoms, 283 waters and Cd/Cu/Zn atoms in the asymmetric unit, space group P212121.
Ab initio phasing of superoxide dismutase using charge flipping: electron density map at 1.03 Å resolution (C. Dumas & A. van der Lee).

Download 1mfm-sf.cif and 1MFM.pdb from the PDB site and use 1mfm-sf.cif as the input file for the fliprot script.

Command:   fliprot  1mfm-sf.cif  SG=19  name=mfm

  The script asks for the unit-cell parameters (not in the cif file); take them from the
  CRYST1 record of the PDB file:   34.99  48.11  81.08   90.0 90.0 90.0

Annotated log file (typical CPU time: 3 to 5 minutes on a 2.4 GHz Intel processor)

After convergence, the reference model (1MFM.pdb) can be superimposed on the CF density map and the correct phase enantiomorph selected.

Run the following PHENIX commands, compare the overall correlation coefficients obtained with the two phase sets (PHIcf and PHIcfi), and display mfm.map together with offset.pdb:

phenix.get_cc_mtz_pdb mfm.mtz 1MFM.pdb any_offset=true labin="FP=Fobs PHIB=PHIcf"

phenix.get_cc_mtz_pdb mfm.mtz 1MFM.pdb any_offset=true labin="FP=Fobs PHIB=PHIcfi"


example 2:

Test data used: PDB code 2anv

2385 non-H atoms, 517 waters and (Sm, I, Mg, SO4) atoms in the asymmetric unit, space group C2.
Ab initio phasing of lysozyme from P22 bacteriophage using charge flipping: electron density map at 1.04 Å resolution (C. Dumas & A. van der Lee).

Download 2anv-sf.cif and 2ANV.pdb from the PDB site and use 2anv-sf.cif as the input file for the fliprot script.

Command:   fliprot  2anv-sf.cif  SG=5  name=anv

Annotated log file (typical CPU time: 3 to 5 minutes on a 2.4 GHz Intel processor)

After convergence, the reference model (2ANV.pdb) can be superimposed on the CF density map and the correct phase enantiomorph selected (see example 1).







 
Heavy-atom or anomalous-scatterer substructure determination


usage:
flipsub data.mtz DANO=Dano_peak name=HAtest

where:
data.mtz      ...... input reflection file in merged MTZ format, with DANO or FA label(s)
  or
data.sca      ...... input reflection file in merged scalepack format

optional:
SG=18          ...... space group number (default read from .mtz file or .sca file)
DANO=Dano_pk   ...... MTZ label assignment for anomalous amplitude (default DANO=DANO)
name=HAtest    ...... generic name for output files (default flipsub###)
2.5A           ...... high resolution cutoff (default all reflections)
conv=4.0       ...... convergence threshold (default 2.5 for peakiness, 85.0 for symmetry)
norm=no        ...... no local normalization of amplitude differences (default norm=yes)
ked=1.25       ...... coefficient for delta flipping parameter (default 1.2)
weak=0.35      ...... weak reflection threshold (default 0.3)
trial=5        ...... number of repeated trials (default 1)
maxcycl=3000   ...... maximum number of cycles per trial (default 2000)
mode=peakiness ...... convergence detection mode: peakiness or symmetry (symmetry by default, except for SG=P1)

If necessary, test various combinations of the ked (1.15, 1.2, 1.25) and weak (0.25, 0.3, 0.35, 0.4) parameters, and adjust the resolution cutoff according to the anomalous signal-to-noise ratio, for example:

 
flipsub data.mtz DANO=Dano_peak name=HAtest 4A ked=1.15 weak=0.35
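
Testing every combination of the suggested ked and weak values means twelve runs. A small Python helper can write out the command lines; this is only a sketch, in which the function name flipsub_commands and the output naming scheme (HAtest-k115-w25, etc.) are assumptions made for illustration, not part of the flipsub script. It prints the commands rather than running them.

```python
from itertools import product

# Values suggested in the text above.
KED_VALUES = (1.15, 1.2, 1.25)
WEAK_VALUES = (0.25, 0.3, 0.35, 0.4)


def flipsub_commands(mtz, dano_label, resolution, prefix):
    """Return one flipsub command line per (ked, weak) combination."""
    commands = []
    for ked, weak in product(KED_VALUES, WEAK_VALUES):
        # Hypothetical naming scheme: encode the parameters in the output name.
        name = f"{prefix}-k{round(ked * 100)}-w{round(weak * 100)}"
        commands.append(
            f"flipsub {mtz} DANO={dano_label} name={name} {resolution} ked={ked} weak={weak}"
        )
    return commands


commands = flipsub_commands("data.mtz", "Dano_peak", "4A", "HAtest")
for line in commands:
    print(line)
```

The printed lines can be pasted into a shell or saved to a script file; each run then writes its own set of output files, so the results can be compared afterwards.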




output files:

CCP4 CF heavy-atom map in P1 space group             ......... HAtest.map
CCP4 CF heavy-atom map (asymmetric unit)             ......... HAtest-au.map 
PDB file for Heavy-atom positions (in P1 unit cell)  ......... HAtest.pdb 
PDB file for Heavy-atom positions (asymmetric unit)  ......... HAtest-au.pdb


Heavy atom positions in fractional coordinates       ......... HAtest-au.ha

  The resulting coordinate files can be used as input for your favorite phasing program (SHARP, PHENIX (AutoSol or Phaser-EP), CCP4, ...). Typically, edit the xxx-au.pdb file (or the xxx-au.ha file, in fractional coordinates) to keep the appropriate number of heavy-atom sites in the asymmetric unit and remove non-significant sites.
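
The editing step can also be scripted. The sketch below keeps the first N coordinate records of a PDB file and drops the rest. It assumes that the sites are listed in order of decreasing significance, which should be checked in the actual output file before use; the function name keep_first_sites and the sample records are made up for illustration and are not real flipsub output.

```python
def keep_first_sites(pdb_lines, n_sites):
    """Keep non-coordinate records plus the first n_sites ATOM/HETATM records."""
    kept = []
    n_seen = 0
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            n_seen += 1
            if n_seen > n_sites:
                continue  # drop sites beyond the requested number
        kept.append(line)
    return kept


# Made-up records for illustration only (assumed ordering: strongest site first).
example = [
    "CRYST1   50.000   60.000   70.000  90.00  90.00  90.00 P 1",
    "HETATM    1 SE   SUB A   1       1.000   2.000   3.000  1.00 20.00",
    "HETATM    2 SE   SUB A   2       4.000   5.000   6.000  1.00 20.00",
    "HETATM    3 SE   SUB A   3       7.000   8.000   9.000  1.00 20.00",
    "END",
]
trimmed = keep_first_sites(example, 2)
```

To apply it to a real file, read the lines of xxx-au.pdb into a list, call the function with the expected number of sites, and write the result to a new file so that the original output is preserved.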




Various test datasets for MAD and SAD phasing are available here:

example 1:   Locating a heavy-atom substructure containing 20-22 bromide sites used for SAD phasing

Download sfdata-haptbr.tgz (AUTOSTRUCT / CCP4 site), untar the archive and use haptbr.mtz as input data for the flipsub script.

Commands:      flipsub haptbr.mtz 2.0A name=haptbr

               (using normalized anomalous differences up to 2 Å resolution, with the symmetry score to detect convergence)

               flipsub haptbr.mtz 2.0A name=haptbr mode=peakiness conv=2.6

               (using normalized anomalous differences up to 2 Å resolution, with the peakiness convergence criterion and threshold=2.6)

example 2:   Locating a heavy-atom substructure containing 40 selenium sites used for SAD/MAD phasing

Download sfdata-cynsemet.tgz (AUTOSTRUCT / CCP4 site), untar the archive and use the cynsemet.mtz file as input data for the flipsub script.

Commands:      flipsub cynsemet.mtz 2.4A DANO=DANO_SE3 name=cyn-pk

               (using normalized anomalous differences in the peak wavelength dataset, up to 2.4 Å resolution)

               flipsub cynsemet.mtz 2.6A DANO=DANO_SE2 name=cyn-ip

               (using normalized anomalous differences in the inflection wavelength dataset, up to 2.6 Å resolution)

               flipsub cynsemet.mtz DANO=DANO_SE3

               (using anomalous differences in the peak wavelength dataset, no resolution cutoff)

example 3:   Locating a heavy-atom substructure containing 8 selenium sites and determining the correct space group

Download sfdata-jia.tgz (AUTOSTRUCT / CCP4 site), untar the archive and use the jia_peak.sca file (scalepack format) as input data for the flipsub script.

Commands:      flipsub jia_peak.sca  name=jia_peak

               (using normalized anomalous differences in the peak wavelength dataset)

               flipsub jia_peak.sca  5A

               (using normalized anomalous differences in the peak wavelength dataset and a 5 Å resolution cutoff)


Simulation of a wrong space-group assignment, C222 instead of C2221 (the true space group), a typical ambiguity in symmetry determination when the axial reflection row or the systematic extinctions are missing:

copy the file jia_peak.sca to jia_peak-C222.sca,

edit this file to remove the crucial (00l) reflections (0,0,14 to 0,0,66) and change the space group from c2221 to c222.

All information on this screw axis (systematic extinctions) is now removed and the space group is assigned to SG #21 (C222).
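
The file edit described above can be done by hand or with a short script. The following Python sketch assumes the usual merged scalepack layout (three header lines, the third carrying the unit cell and the space-group name, followed by reflection lines with h, k, l in three 4-character columns); the function name simulate_c222 and the sample lines are made up for illustration and do not come from jia_peak.sca.

```python
def simulate_c222(sca_lines, l_min=14, l_max=66):
    """Drop (0,0,l) reflections with l_min <= l <= l_max and relabel the space group."""
    header, reflections = sca_lines[:3], sca_lines[3:]
    # Assumed layout: the third header line ends with the space-group name.
    header = header[:2] + [header[2].replace("c2221", "c222")]
    kept = []
    for line in reflections:
        try:
            h, k, l_index = int(line[0:4]), int(line[4:8]), int(line[8:12])
        except ValueError:
            kept.append(line)  # leave anything unparsable untouched
            continue
        if h == 0 and k == 0 and l_min <= l_index <= l_max:
            continue  # remove the axial reflection
        kept.append(line)
    return header + kept


# Made-up sample in the assumed layout (not real data).
sample = [
    "    1",
    " -985.000    0.000    0.000    0.000    0.000    0.000",
    "    50.000    60.000    70.000    90.000    90.000    90.000 c2221",
    "   0   0  12    10.0     1.0    11.0     1.0",
    "   0   0  14    10.0     1.0    11.0     1.0",
    "   0   0  66    10.0     1.0    11.0     1.0",
    "   0   0  68    10.0     1.0    11.0     1.0",
    "   1   1  14    10.0     1.0    11.0     1.0",
]
edited = simulate_c222(sample)
```

For the real dataset, read the lines of jia_peak.sca, pass them through the function and write the result to jia_peak-C222.sca; it is worth checking the header of the output by eye, since the layout assumption above has not been verified against this particular file.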

Now test the flipsub procedure with this wrong space group:

                    flipsub jia_peak-C222.sca SG=21 name=wrongC222 mode=peakiness conv=6.0

The substructure is solved in P1 by SUPERFLIP and the correct space group C2221 is proposed (success rate ~90%) based on a symmetry analysis of the resulting structure-factor phases: the symmetry agreement factors for the 2(1,0,0), 2(0,1,0) and 2_1(0,0,1) symmetry operations are good (less than 5). The output map wrongC222.map incorporates the wrong input symmetry operators of C222 (default option: searchsymmetry average), as shown by the poor "Overall agreement factor" (>70-80).


Now restart the script with the correct SG (C2221, #20) determined by SUPERFLIP.

flipsub   jia_peak-C222.sca   SG=20  name=newC2221

   

example 4:   Locating the heavy-atom substructure in CSN5 crystal SAD data, with 20 SeMet residues and 2 zinc atoms (4F7O.pdb) in the asymmetric unit.

Download the SAD dataset 4F7O.mtz (MTZ format, 2.6 Å resolution) and use it as input data for the flipsub script:

  Command:                  flipsub 4F7O.mtz DANO=DANO_x1 weak=0.2 name=CSN5 trial=10

The substructure is solved in P1 by SUPERFLIP (success rate 40% over 10 trials) up to 2.6 Å resolution, using the symmetry score to detect convergence (see log file). The averaged HA map (best densities #1, 6, 7 and 9) was used to extract 21 heavy-atom sites: 19 correspond to Se atoms and 2 to Zn atoms (rmsd from the refined positions in the final model = 0.59 Å).


Other links:

A Java applet illustrating the charge flipping algorithm

RCSB Protein Data Bank:  atomic coordinate files and structure factors of biological macromolecules;
CCP4:  software suite for macromolecular crystallography;
SHARP:  software suite for experimental phasing of macromolecular crystal structures;
Phenix:  software suite for the automated determination of macromolecular crystal structures;
SHELX:  software suite for crystal structure determination from single-crystal diffraction data;
Chimera:  visualisation of electron density maps;
Coot:  visualisation of electron density maps and model building;
Uppsala Software Factory:  software for macromolecular crystallography;
ARP/wARP:  interpretation of electron density maps and automatic construction of macromolecular models.


Contact information:  Christian Dumas (Christian.Dumas at cbs.cnrs.fr)  or  Arie van der Lee (avderlee at univ-montp2.fr)

Last modified: August 28, 2013