Spectrum Research, LLC.

 

 

 

 

 

CONTRAST

 

Connectivity Tracing Assignment Tools for Automated Assignment of Protein NMR Data

User Guide
Version 2.0

 

 

 

Copyright Notice

 

Copyright © 1996 through 2001 Spectrum Research, LLC.  All rights reserved.

No part of this document may be reproduced, transmitted, transcribed, stored in a retrieval system, or translated into any language in any form by any means without the written permission of Spectrum Research, LLC.  Spectrum Research, LLC. reserves the right to change the information in this document without prior notice.

 

Trademarks

 

Contrast is a trademark of Spectrum Research, LLC.

Acknowledgments

 

The Contrast software program was developed by Drs. John Markley and John Olson at the National Magnetic Resonance Facility located at the University of Wisconsin-Madison.  All rights, title, and interest in Contrast are owned by the Wisconsin Alumni Research Foundation ("WARF").  The commercial version of Contrast has been exclusively licensed to Spectrum Research LLC by WARF.

 

Credits

 

If the results (figures and/or data) obtained with the Contrast™ application are used for publication purposes, please refer to them in the following manner or any other equivalent form:

"Contrast™ software, developed by Spectrum Research, LLC., was used to compute the results in this publication."

 

 

 

 

Chapter 1

Introduction

1.1 Program Features

CONTRAST is a non-graphical software tool for automating NMR peak assignment. The program works with NMR data in the form of ASCII lists of peak coordinates and intensities. The program provides the user with several versatile tools for manipulating peak lists in order to design a custom strategy. The program can itself generate customizable procedures for automatic assignment of NMR data. It should be possible to use CONTRAST and the strategies it was designed to employ for working with any type of multidimensional NMR spectral data set (although not all combinations of NMR spectra are likely to yield complete assignments).

1.2 Disclaimer

The CONTRAST program was designed to be an in-house research tool and not a commercial package. We have successfully applied the program to many real and synthesized NMR data sets, but we are always careful to check all results. We provide no warranty or guarantee of its performance. Use the program at your own risk.

Chapter 2

Software Licensing and Installation

2.1 How to Obtain the Program

The CONTRAST executable can be downloaded from the Spectrum Research website (www.specres.com/download.asp) or a demo CD can be requested from Spectrum Research.

2.2 Installation

The CONTRAST executable, contrast.exe, needs no special installation. We recommend that the executable and help files (or corresponding symbolic links) be placed in the directory that contains the spectral data to be assigned.

If you have obtained source code for CONTRAST, the file "contrast.c" contains all of the functions and header information necessary to compile CONTRAST. The program was written on a Silicon Graphics Indigo workstation, but since all but a few minor functions are implemented using ANSI C, the program can be ported easily to other platforms by changing the system calls that are specific to the Silicon Graphics platform. To compile the program, copy contrast.c to the target directory and type:

 

cc -o contrast -g contrast.c -lm

 

at the operating system prompt. The ASCII text file, contrast.hlp, is a crude manual for the CONTRAST program. The manual is designed so that it can be easily searched while running CONTRAST with the CONTRAST "page" function, which is called by typing "ctrl-h" at a prompt or "h" at the command line. The contrast.hlp file should be located in the same directory as the CONTRAST executable in order to use this feature.


Chapter 3

Getting Started

This section introduces loading spectrum files, searching spectra, displaying the results of a search, writing the results of a search to a file, and quitting the CONTRAST program. A simple example is given to illustrate each point, and the use of both the command line interface and macro files is described. The following CONTRAST commands will be described.

lf cosy.con

scan cosy (d1 <.5> 8.0 && d2 > 4.0) |results

d

btf |results >search.cosy.con

q

3.1 Starting CONTRAST

To run CONTRAST, simply type the name of the CONTRAST executable at the system prompt (e.g. contrast.exe). The computer's display will be cleared, and after several lines of copyright information you will be asked for the name of the log (starting macro) file that you wish to run. If you want to run a session macro, type its file name at the prompt. If your log file name is "usr.log" (the standard session log file name), simply press Return at the prompt. The text that appears in angle brackets in a CONTRAST prompt is always the default value for the prompt. If you do not already have a session macro, type a new file name at the prompt. It is customary to use the suffix ".log" for session macros and ".mac" for subroutine or branching macros. After the name of the log file is typed in, the user is prompted by a '>' symbol for the next command.

3.2 Loading Peak Lists

The LoadFile command (abbreviated lf) is used to load peak list files into CONTRAST. CONTRAST peak list files are typically created from the name of the experiment with the '.con' suffix appended, but they can have any name. They must, however, adhere to the format outlined in Section @@. The LoadFile command can also be used to load the sequence of the protein, since the formats of the files are similar. The following line loads the file cosy.con into the program:

> lf cosy.con

3.3 Searching Peak Lists

The Scan command (abbreviated sc) is used to search peak lists. It is an extremely versatile command and will be described in more detail in section @@. In order to search for peaks in the COSY spectrum read into the program the user could type a command similar to the following:

> sc cosy (d1 <.5> 8.0 && d2 > 4.0) |results

In this example the COSY peak list is searched for peaks in which the first dimension of each peak (d1) is within a tolerance of 0.5 units (<.5>) from 8.0 and (&&) the second dimension of each peak (d2) is greater than (>) 4.0. The results of the search are placed in a buffer called |results. The units of the tolerances and peak coordinates are dependent on the units used in the input files. Since the coordinates are typically expressed in terms of parts per million (PPM), we will assume that input files use PPM in the rest of the manual.
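As a rough illustration of what this Boolean selects, the following Python sketch applies the same two conditions to a small in-memory peak list. It is not CONTRAST code: the peak values and the names within and scan are hypothetical, and the sketch assumes that a coordinate lying exactly on the edge of the tolerance range counts as a match.

```python
# Illustrative sketch only: mimics the Boolean (d1 <.5> 8.0 && d2 > 4.0).
# The peak values and the function names are made up for this example.

def within(value, tol, target):
    """True if value lies within +/- tol of target (edge inclusive by assumption)."""
    return abs(value - target) <= tol

def scan(peaks):
    """Return the peaks whose d1 is within 0.5 of 8.0 and whose d2 exceeds 4.0."""
    return [p for p in peaks if within(p[0], 0.5, 8.0) and p[1] > 4.0]

# Each peak is (d1, d2, intensity); coordinates are in PPM.
cosy = [
    (8.20, 4.35, 1200.0),  # d1 and d2 both satisfy the Boolean
    (7.40, 4.60, 900.0),   # d1 is 0.6 from 8.0: outside the tolerance
    (8.45, 3.90, 700.0),   # d2 is not greater than 4.0
    (7.75, 4.10, 450.0),   # d1 and d2 both satisfy the Boolean
]

results = scan(cosy)
print(results)
```

The list named results plays the role of the |results buffer: it holds only the first and last peaks of the input list.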

3.4 Displaying Search Results

The display command (abbreviated 'd') is used to examine the contents of CONTRAST buffers. When a search is performed using the Scan command or one of several other related commands, the results of the search are placed in a named buffer which is added to the end of a master list of buffers. The buffers persist until the user deletes them or quits the program. Associated with each buffer is a number and the search Boolean that was used to create the buffer. Upon typing 'd' at the CONTRAST command line, the program enters a crude 'display' mode that has a unique set of subcommands for changing the way the buffers are displayed. These subcommands are executed as each character is typed. To exit display mode type 'q' at the display command line prompt. Section @@ gives more information on the different subcommands available within the display mode.

3.5 Writing Buffers to a File

The buffertofile command (abbreviated 'btf') is used to write the contents of a particular buffer to a file. In the following example:

> btf |results >search.cosy.con

the |results buffer is written to the file, search.cosy.con.

3.6 Quitting CONTRAST

There are two pathways for exiting CONTRAST. The quit command (abbreviated 'q') can be used to exit CONTRAST from the command line. If CONTRAST is not at the command line, the program can be exited by typing Ctrl-C to interrupt the action of the program followed by 'x' at the new prompt. Typing 'q' at this new prompt causes the program to resume the action that was interrupted by the Ctrl-C command.

3.7 CONTRAST Macros

Most of the commands that can be executed at the CONTRAST command line can also be executed from a CONTRAST macro. For our purposes a macro is an ASCII file that contains CONTRAST commands. When a macro is executed, CONTRAST interprets each non-whitespace line as if it were typed at the CONTRAST command line. Each line is executed serially until a quit command is reached, until the macro branches to another macro, or until the end of the file is reached. If the end of the file is reached the program returns to the CONTRAST command line and waits for user input. All text in a macro between two consecutive asterisks (**) and the next end-of-line marker is considered to be a comment and is ignored by the program.

The five commands just described can be typed into a file using a text editor and run as a CONTRAST macro. CONTRAST macros can be run in many different ways. Macro files can be specified at the UNIX command line when the program is started, using the '<' sign to redirect input into the program as follows:

CONTRAST <user.macro

Alternatively, the name of the macro can be specified at the initial prompt by typing the name of the macro file and pressing Enter. Macros can also be launched from within other macros or from the CONTRAST command line using the execute command (abbreviated exe).

> exe user.macro

In this case control is transferred to user.macro until the end of the file is reached, at which time control is returned to the calling macro or to the initial command line. If the macro is terminated with a quit command, however, the CONTRAST program will be exited without returning to the calling procedure. The branch command can be used instead of the exe command in order to fully transfer control to the called macro.

> branch user.macro

Chapter 4

Input File Formats

CONTRAST input files use a free format in which blank lines are ignored and white space (any number and combination of spaces and/or tabs) is used to delimit fields. Comments can be inserted anywhere in an input file by prefacing the comment with double asterisks (**). All text following the double asterisks (up to the end of the line on which they appear) is considered to be part of the comment and is effectively ignored by CONTRAST. Most CONTRAST input files are either a form of a spectrum file or a macro file. In the next release of CONTRAST the user will be given the option of reading in spectrum files in a macro format, but an understanding of the spectrum file format is currently essential to using CONTRAST effectively.
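The free-format rules above can be sketched in a few lines of Python. This is only an illustration of the stated rules (blank lines ignored, any run of spaces or tabs delimits fields, text after a double asterisk is dropped), not CONTRAST's own reader, and the function name tokenize is hypothetical.

```python
# Illustrative sketch only: tokenizes free-format text the way this chapter
# describes it. The function name `tokenize` is hypothetical.

def tokenize(text):
    """Return the whitespace-delimited fields of text, ignoring ** comments."""
    fields = []
    for line in text.splitlines():
        # Everything from '**' to the end of the line is a comment.
        line = line.split("**", 1)[0]
        # split() with no argument treats any run of spaces/tabs as one
        # delimiter, so blank lines contribute no fields.
        fields.extend(line.split())
    return fields

sample = """
hnca    ** spectrum name

3\t4 ** dimensions and number of peaks
"""

print(tokenize(sample))
```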

4.1 CONTRAST Spectrum Files

A CONTRAST spectrum file consists of a header followed by a peak list. The header of a spectrum file should contain information about the spectrum. Since most of this information is the same for all instances of a particular type of spectrum, it is usually safer to copy and modify an existing header from a similar spectrum than to write a header from scratch. When copying a header from the spectrum file of the same kind of experiment, it is usually only necessary to modify the number of peaks, the tolerances, and the comments. The fields in a spectrum file must appear in the given order. Although comments and blank lines can appear anywhere in a spectrum file, it is good practice to settle upon and stick to a style in order to maximize readability and to minimize the possibility of making mistakes. As long as fields appear in the correct order, it does not matter if they are arranged on different lines, all placed on the same line, or some combination of the two arrangements. Since not all combinations have been rigorously tested, however, we recommend that a format similar to the one shown below be used. Bold print is used to show essential information which must be included in a spectrum file, normal print is used to show optional information, and italics is used to show those elements of optional fields that are even more optional. The following is the file format for an n-dimensional spectrum (with as many as C correlations) that contains i peaks.

 

4.2 Spectrum File Format

name

n i (qual)

comment = numCom

d1lab d1atm d1tol d1cor1 (prob1) d1cor2 (prob2) d1corC (probC)

d2lab d2atm d2tol d2cor1 (prob1) d2cor2 (prob2) d2corC (probC)

dnlab dnatm dntol dncor1 (prob1) dncor2 (prob2) dncorC (probC)

** comments

** comments

p1coord1 p1coord2 p1coord3 p1ntens * p1comment

p2coord1 p2coord2 p2coord3 p2ntens * p2comment

picoord1 picoord2 picoord3 pintens * picomment

4.3 Spectrum Field Definitions

name The name of the spectrum. The name of a CONTRAST spectrum file is generally the spectrum name with the '.con' suffix appended to it.

n The dimensionality of the spectrum.

i The number of peaks in the spectrum.

(qual) An estimate of the quality of the spectrum expressed as a probability. A qual factor of 1.0 indicates that 100% of the expected peaks are present in the spectrum and that very little noise (false peaks) is present. A qual factor of 0.9 indicates that 90% of the expected peaks are present.

comment = Text that indicates that the next field (numCom) is the number of characters the program should allocate for the comment associated with each peak. 'ment =' is italicized to indicate that only 'com' is needed to signal that the next field is numCom.

numCom The number of characters that the program should allocate for the comment associated with each peak.

d#lab The label of the #'th dimension of the peaks in the spectrum.

d#atm The resonance code (also called atom code) describing all of the atoms of the #'th dimension of the peaks in the spectrum. Since some dimensions of a spectrum often detect several different resonances, wild cards are frequently used in this field. A description of resonance codes is found in section @.@.

d#tol The default tolerance of the #'th dimension of the peaks in a spectrum. A tolerance is one-half of the resolution of that dimension.

d#cor## The resonance code (also called atom code) of the #'th dimension of the ##'th correlation in the spectrum. Correlations describe the types of peaks that one would expect to see in a spectrum. An HNCA spectrum, for example, contains an Hni,Nai,Cai correlation (amide proton, amide nitrogen, alpha carbon) and an Hni,Nai,Ca- correlation (amide proton, amide nitrogen, alpha carbon from the previous residue). The last resonance code for a given dimension will be repeated if previous or subsequent dimensions contain more resonance codes. A description of resonance codes is found in section @.@.

(prob##) The estimated probability of seeing the previous correlation in the spectrum.

Note that only the last probability listed in a vertical column will be used to describe the ##'th correlation. Other probabilities are used only to make the file more readable.

** Comment markers. Comment markers indicate that the text that follows on that line is a comment and should be ignored by the program. Users are encouraged to use comments to document the origin of the spectrum files and each modification that the files undergo. Most CONTRAST functions that modify a spectrum or spectrum file will append a comment to the file that tells what was done to the file and the date it was done.

comments Any text that the user wants to include in the file.

p##coord# The #'th coordinate (frequency dimension) of the ##'th peak in the spectrum (usually in ppm units).

p##ntens The intensity of the ##'th peak in the spectrum.

* A special peak comment marker that causes the program to read in the comment and associate it with the peak that the comment follows. The 'comment = numCom' line described above is used to specify the maximum number of characters that can be stored in each peak comment.

p#comment The comment associated with the #'th peak of the spectrum.

4.4 Example Spectrum File

hnca

3 4 (90)

comment length = 30

H Hni .02 Hni

N Nai .1 Nai ** Don't need to repeat last resonance code

Ca Ca .1 Cai (90) Ca- (60)

** Created 9/9/99 from hnca.ppm file.

** Comments can be inserted at any point in the file after a

** double asterisk.

8.61 114.3 180.2 100073 * peak 1

9.12 122.4 178.2 20073 * peak 2

7.43 118.9 134.2 10034.5 * peak 3

8.74 110.3 181.2 67896 * peak 4
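To make the field order concrete, the following Python sketch reads an abbreviated copy of the example file. It is not CONTRAST's reader: it assumes the one-record-per-line layout recommended above, it ignores the qual factor and the correlation probabilities, and the name parse_spectrum and the dictionary keys are hypothetical.

```python
# Illustrative sketch only, not CONTRAST's own reader. It assumes the
# one-record-per-line layout recommended above; `parse_spectrum` and the
# dictionary keys are hypothetical names.

EXAMPLE = """\
hnca
3 4 (90)
comment length = 30
H Hni .02 Hni
N Nai .1 Nai ** Don't need to repeat last resonance code
Ca Ca .1 Cai (90) Ca- (60)
** Created 9/9/99 from hnca.ppm file.
8.61 114.3 180.2 100073 * peak 1
9.12 122.4 178.2 20073 * peak 2
7.43 118.9 134.2 10034.5 * peak 3
8.74 110.3 181.2 67896 * peak 4
"""

def parse_spectrum(text):
    # Drop '**' comments and blank lines; keep everything else in order.
    lines = [ln.split("**", 1)[0].strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln]

    name = lines[0]
    counts = lines[1].split()
    ndim, npeaks = int(counts[0]), int(counts[1])
    pos = 2
    # The optional comment-length line starts with 'com'.
    comment_len = None
    if lines[pos].lower().startswith("com"):
        comment_len = int(lines[pos].split("=")[1])
        pos += 1

    dims = []
    for ln in lines[pos:pos + ndim]:
        fields = ln.split()
        # Parenthesized fields are probabilities; the rest are correlations.
        cors = [f for f in fields[3:] if not f.startswith("(")]
        dims.append({"label": fields[0], "atom": fields[1],
                     "tol": float(fields[2]), "correlations": cors})
    pos += ndim

    peaks = []
    for ln in lines[pos:pos + npeaks]:
        # A single '*' separates the numeric fields from the peak comment.
        numbers, _, comment = ln.partition("*")
        values = [float(v) for v in numbers.split()]
        peaks.append({"coords": values[:ndim], "intensity": values[ndim],
                      "comment": comment.strip()})
    return {"name": name, "ndim": ndim, "comment_len": comment_len,
            "dims": dims, "peaks": peaks}

spectrum = parse_spectrum(EXAMPLE)
print(spectrum["name"], len(spectrum["peaks"]), spectrum["dims"][2])
```

The sketch would misread a file whose first dimension label begins with 'com'; a real reader would need a stricter test for the comment-length line.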

 

4.5 Resonance Codes

Resonance codes are special CONTRAST words that describe the type of atom that gives rise to an NMR signal. These codes are sometimes called atom codes since they specify an atom type or group of atom types. Resonance codes can contain a maximum of 4 characters, with each character describing a different aspect of an atom. If any character representing a particular aspect is omitted, then CONTRAST assumes the most general case to hold for that aspect. For example, the resonance code 'H' contains only the atom type specifier. This resonance code thus includes all hydrogen atoms. The resonance code 'Hb' represents all beta protons in the protein, and the resonance code 'Hi' represents all protons on the current residue. In this release of CONTRAST all resonance codes make reference to amino acids in a protein or peptide. At this time there is no simple way to refer to nucleic acids or other molecules. A list of the valid resonance code characters, grouped by the different aspects that they describe, follows:

Atom Specifiers:

C Carbon atom.

N Nitrogen atom.

H Hydrogen atom.

O Oxygen atom.

P Phosphorus atom.

X Wildcard. Matches any atom type.

Q NULL. Can never match another atom type.

IntraResidue Position Specifiers:

a Alpha. Bonded to or at the alpha position in the residue.

b Beta. Bonded to or at the beta position in the residue.

g Gamma. Bonded to or at the gamma position in the residue.

d Delta. Bonded to or at the delta position in the residue.

e Epsilon. Bonded to or at the epsilon position in the residue.

f F. Bonded to or at the F position in the residue.

z Z. Bonded to or at the Z position in the residue.

k Backbone. All backbone atoms in the residue.

s Sidechain. All sidechain atoms in the residue.

r Ring. All ring atoms in the residue.

c Carbon. Bonded to a carbon atom in the residue.

h Hydrogen. Bonded to a hydrogen atom in the residue.

n Nitrogen. Bonded to a nitrogen atom in the residue.

o Oxygen. The carbonyl position or bonded to an oxygen atom in the residue.

x Wildcard. All positions within a residue.

InterResidue Position Specifiers:

- Within the previous residue.

i Within the current residue.

+ Within the next residue.

* Can be within any residue in the protein (often from NOE).

Atom number:

0 Matches all other single character atom numbers.

1-9 This single character number is used to distinguish between atoms at the same position. For example, two beta protons can be distinguished by referring to one as Hb2 and the other as Hb3.

4.6 Resonance Code Examples

Cai Matches alpha carbons within the current residue.

Hbi2 Matches the second beta proton within the current residue.

X Matches all atoms in the protein.

X- Matches all atoms in the previous residue.

Co- Matches the carbonyl carbon of the previous residue.

Nai Matches the amide nitrogen of the current residue.

Q Does not match any atom in the protein.

Cs+ Matches all carbon atoms in the side chain of the next residue.

Cxi Matches all carbon atoms in the current residue.

Hxi1 Matches all number 1 protons in the current residue.

Hxi0 Matches all protons in the current residue.

Hb*1 Matches all number 1 beta protons in the entire protein.

Hn* Matches all amide protons in the protein.
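The way a resonance code breaks down into its four aspects can be sketched in Python as follows. This is not CONTRAST code. The name parse_code is hypothetical, and the defaults used for omitted aspects (any position 'x', any residue '*', any atom number '0') are our reading of "the most general case" in Section 4.5.

```python
# Illustrative sketch only: splits a resonance code into the four aspects
# listed in Section 4.5. `parse_code` and its defaults are assumptions.

ATOMS = set("CNHOPXQ")
POSITIONS = set("abgdefzksrchnox")
RESIDUES = set("-i+*")
NUMBERS = set("0123456789")

def parse_code(code):
    """Return (atom, position, residue, number) with omitted aspects filled in."""
    if not 1 <= len(code) <= 4 or code[0] not in ATOMS:
        raise ValueError(f"not a resonance code: {code!r}")
    # Assumed 'most general' defaults: any position, any residue, any number.
    position, residue, number = "x", "*", "0"
    seen = set()
    for ch in code[1:]:
        if ch in POSITIONS:
            kind, position = "position", ch
        elif ch in RESIDUES:
            kind, residue = "residue", ch
        elif ch in NUMBERS:
            kind, number = "number", ch
        else:
            raise ValueError(f"unknown character {ch!r} in {code!r}")
        if kind in seen:
            raise ValueError(f"two {kind} specifiers in {code!r}")
        seen.add(kind)
    return code[0], position, residue, number

for code in ["Cai", "Hbi2", "X-", "Hb*1", "Q"]:
    print(code, parse_code(code))
```

Because the position letters, the residue symbols, and the digits do not overlap, each character after the atom specifier can be classified on its own, which is why a code such as Hb2 (no residue symbol) remains unambiguous.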

4.7 Sequence Files

CONTRAST sequence files follow the same general format as spectrum files and are read into the program with the same command, LoadFile (abbreviated lf). Sequence files are one-dimensional spectrum files in which the name of the spectrum is 'sequence' and the "peak comments" are amino acid names. The next section shows a schematic of a sequence file. Bold print is used to show essential information which must be included in a sequence file, normal print is used to show optional information, and italics is used to show those elements of optional fields that are even more optional. The following is the file format for a sequence file for a protein that contains i amino acids in the sequence.

4.8 Sequence File Format

sequence

1 i

comment = lenAA

lab Q qual

** comments

** comments

1 prob1 * AAname1

2 prob2 * AAname2

i probi * AAnamei

4.9 Sequence Field Definitions

sequence Indicates that the file is a sequence file.

1 The dimensionality of the file. Sequence files can make use of more dimensions to associate sequence positions with additional numerical information.

i The number of residues in the sequence.

comment = Text that indicates that the next field (lenAA) is the number of characters the program should allocate for the amino acid names. 'ment =' is italicized to indicate that only 'com' is needed to signal that the next field is 'lenAA'.

lenAA The maximum number of characters used in residue names.

lab Label to be used to identify sequence position numbers.

Q 'Q' = NULL place holder.

qual Quality of sequence determination (usually 1.0).

** Comment markers. Comment markers indicate that the text that follows on that line is a comment and should be ignored by the program. Users are encouraged to use comments to document the origin of the sequence files and each modification that the files undergo. Most CONTRAST functions that modify a sequence or spectrum file will append a comment to the file that tells what was done to the file and the date it was done.

comments Any text that the user wants to include in the file.

1, 2, ..., i Sequence position numbers. If there is ambiguity about the type of residue at a sequence position, the sequence position number can be repeated at the end of the file with alternative residue types. The probability value for the sequence position should reflect this ambiguity.

prob# Probability that the #'th sequence position contains that residue type.

AAname# Name of the amino acid at the #'th sequence position. The name can be in any desired format as long as the format matches that used elsewhere in the program. One letter abbreviations, three letter abbreviations, and the entire names of the standard 20 amino acids are understood and interconverted by CONTRAST.

4.10 Example Sequence File

The following is the sequence file for a hexapeptide. The third residue of the sequence is ambiguous and is thought to be either a glutamate or a glutamine residue.

seq

1 6

# Q 0.9

** Hex1 hexapeptide sequence.

** 9/9/99 by Fred

1 1 * Ala

2 1 * V

3 .6 * Q

4 1 * A

5 1 * Serine

6 1 * t

3 .4 * E

** Note that the id of residue 3 is ambiguous.
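The way a repeated position number encodes an ambiguous residue can be seen by collecting the residue lines of this example in Python. The sketch below is not CONTRAST code: read_residues is a hypothetical helper, the three header lines are skipped by position rather than parsed, and residue names are kept exactly as written (no interconversion between one-letter, three-letter, and full names).

```python
# Illustrative sketch only: collects the residue lines of the example
# sequence file above. `read_residues` is a hypothetical helper, and the
# header lines are skipped by position rather than parsed.

EXAMPLE = """\
seq
1 6
# Q 0.9
** Hex1 hexapeptide sequence.
1 1 * Ala
2 1 * V
3 .6 * Q
4 1 * A
5 1 * Serine
6 1 * t
3 .4 * E
** Note that the id of residue 3 is ambiguous.
"""

def read_residues(text, header_lines=3):
    """Map each sequence position to its list of (probability, name) candidates."""
    lines = [ln.split("**", 1)[0].strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln]
    residues = {}
    for ln in lines[header_lines:]:
        numbers, _, name = ln.partition("*")
        position, prob = numbers.split()
        residues.setdefault(int(position), []).append((float(prob), name.strip()))
    return residues

residues = read_residues(EXAMPLE)
ambiguous = {pos: c for pos, c in residues.items() if len(c) > 1}
print(ambiguous)
```

Only position 3 ends up with more than one candidate, and the probabilities of its two candidates (.6 and .4) add up to one.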

4.11 Macro Files

Macro files are ASCII files that contain a list of valid CONTRAST commands. The format for CONTRAST macro files is open and very simple. The only general requirements are that lines must be less than 1000 characters long and that a line cannot contain more than one CONTRAST command. If a line contains more than one command, the second command is generally ignored without causing a problem, but sometimes it can interfere with the first command.

Each command has its own required format, but a few general rules apply to all CONTRAST commands:

1. Their first non-whitespace character must be the beginning of the command name. Leading whitespace is ignored.

2. Command names can be typed in as abbreviations, complete command names, or any partial command name in between (e.g. 'q', 'qu', 'quit', and 'quitcontrastnow' will all quit CONTRAST).

3. Command names are case independent (e.g. 'q' and 'Q' will both quit CONTRAST).

4. A command's fields are all delimited by whitespace (tabs and spaces).

5. The '->' marker can be used at the end of a line to indicate that the command is continued on the next line.

6. The '**' marker (comment marker) will cause the program to ignore the rest of the line.

7. All variables (marked by the '&' prefix) contained in a command are replaced by the values or text strings that they contain before the command is interpreted. Thus variables can be substituted for command names and/or command fields.
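Rules 1 through 6 can be sketched in Python as shown below; variable substitution (rule 7) is left out. This is one possible reading of the rules, not CONTRAST's actual interpreter. In particular, the matching rule in resolve (a typed word matches if it equals the abbreviation, or if it is at least as long as the abbreviation and agrees with the full command name over their common length) is an assumption chosen so that 'q', 'qu', 'quit', and 'quitcontrastnow' all resolve to quit. The command table holds only commands named in this guide, and the names logical_lines and resolve are hypothetical.

```python
# Illustrative sketch only: one possible reading of rules 1-6 above.
# The matching rule in `resolve` is an assumption, not CONTRAST's actual
# algorithm, and the table holds only commands named in this guide.

COMMANDS = {  # full name -> abbreviation
    "quit": "q", "loadfile": "lf", "scan": "sc",
    "display": "d", "buffertofile": "btf", "execute": "exe",
}

def logical_lines(text):
    """Strip ** comments, join '->' continuations, and skip blank lines."""
    pending = ""
    for raw in text.splitlines():
        line = raw.split("**", 1)[0].strip()
        if line.endswith("->"):
            pending += line[:-2].rstrip() + " "
            continue
        line = (pending + line).strip()
        pending = ""
        if line:
            yield line

def resolve(word):
    """Return the full command name for a typed word, or None."""
    typed = word.lower()  # rule 3: case independent
    for name, abbr in COMMANDS.items():
        if typed == abbr:
            return name
        common = min(len(typed), len(name))
        # Assumed rule: at least as long as the abbreviation, and agreeing
        # with the full name over their common length.
        if len(typed) >= len(abbr) and typed[:common] == name[:common]:
            return name
    return None

macro = """
   LF cosy.con            ** rule 1: leading whitespace is ignored
sc cosy (d1 <.5> 8.0 && ->
   d2 > 4.0) |results     ** rule 5: continued line
quitcontrastnow
"""

for line in logical_lines(macro):
    word = line.split()[0]
    print(resolve(word), "<-", line)
```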

Chapter 5

Checking Input Files

CONTRAST input files should all be carefully checked before beginning a CONTRAST run. If the input spectra are not referenced correctly or if the peaks in the input spectra do not "line up", then this problem must be dealt with before proceeding with making assignments. The following macro provides a simple way to check the alignment of input spectra.

**Macro template for checking the alignment of i input spectra.

**NOTE: Make sure tolerances are conservative (large).

lf spec1.con ** Load input spectrum 1.

lf spec2.con ** Load input spectrum 2.

lf speci.con ** Load input spectrum i.

contrace 1, >contrace.mac ** Automatically build spin systems.

dtf >display.out ** Save internal buffers to file.

q ** Quit.

The Contrace function automatically finds the best way to correlate the input spectra. In this example it uses the first input spectrum as the starting point for searches. (The command "contrace 2, >contrace.mac" specifies that the second input spectrum be used as the starting point for searches.) The spectrum specified to be the starting point is called the source spectrum, and for the purposes of checking spectral correlation, the source spectrum should be the spectrum with the most reliable referencing that overlaps the most with the other spectra. If you are unsure of which spectrum to designate as the source spectrum, don't specify a source (contrace >contrace.mac) and Contrace will determine a good source spectrum for you. The Contrace function and the macro it generates will be described in more detail in the next two sections.

The file ('display.out') created by running a macro similar to that shown above can be examined to determine if there are any problems with the input spectra. A simplified example of 'display.out' contents is shown below:

hnco Hn_N_hnca Hn_N_hncoca Hn_N_tocsy ... hnco Hn_N_hnca ...

----- --------- ----------- ---------- ----- ---------

peak1 peak18 peak100 ... peak2 peak149 ...

peak34 peak23 ... ...

peak190 ... ...

The buffers in the file are organized into repeating groups (fragments) based on the peaks of the source spectrum, which in this case is hnco. Each fragment starts with the source buffer and ends right before the next source buffer. The buffers following the source buffer are named with prefixes (representing the resonances that were used to search the spectra) that precede the name of the spectrum that was searched. The peaks found in each buffer are all the peaks that matched the given resonances within a specified tolerance. It is not unusual for several peaks to be missing in a spectrum and thus for several buffers to be empty, but if very few of a spectrum's buffers contain peaks that correlate well to the peak in the source buffer, then there is a problem. Either the tolerances used are too small or there is a problem with the spectrum. Problems often arise from using the wrong magnitude or sign for the sweep width when referencing. If this is the case, the resonances near the center of that dimension's spectrum will often match, but the resonance frequencies towards the edges of the dimension will be off by a considerable amount.
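The sweep-width symptom can be reproduced with a small numerical sketch in Python. It is not part of CONTRAST. It assumes a simple linear mapping from point index to PPM with the center point held at a known reference value, and every number in it is invented for the example.

```python
# Illustrative sketch only: shows why a wrong sweep width leaves the center
# of a dimension aligned while the edges drift. The linear point-to-PPM
# mapping and all numbers below are assumptions made for the example.

def point_to_ppm(point, num_points, center_ppm, sweep_width_ppm):
    """Map a point index to PPM, with the center point at center_ppm."""
    fraction = (point - num_points / 2) / num_points  # -0.5 .. +0.5
    return center_ppm - fraction * sweep_width_ppm

NUM_POINTS = 512
CENTER = 118.0      # PPM at the center of the dimension
TRUE_SW = 30.0      # correct sweep width in PPM
WRONG_SW = 33.0     # sweep width entered with the wrong magnitude

errors = {}
for point in (0, 128, 256, 384, 511):
    true_ppm = point_to_ppm(point, NUM_POINTS, CENTER, TRUE_SW)
    wrong_ppm = point_to_ppm(point, NUM_POINTS, CENTER, WRONG_SW)
    errors[point] = wrong_ppm - true_ppm
    print(f"point {point:3d}: error {errors[point]:+.3f} PPM")
```

In this model the error is zero at the center point and grows in proportion to the distance from the center. A sweep width with the wrong sign corresponds to a negative value of sweep_width_ppm, which mirrors the dimension about its center.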

After major referencing problems have been corrected, attention should be given to choosing the best tolerances possible. Ideal tolerances are as small as possible, but not so small that legitimate correlations fall outside the tolerance range. It is helpful to subtract the correlated resonances for a large number of fragments in order to get a good feel for what tolerances should be used in the spectrum files. The sum of the tolerances for the two spectra under consideration should be larger than most of the differences. If the average difference is not close to zero, then this could indicate another referencing problem. Referencing problems can be corrected using the operate function (section @.@) or the set function (section @.@), but it is not wise to use spectra to calculate assignments if there is an unknown problem with the referencing. There are also several commands in CONTRAST that calculate reference offsets automatically, the most reliable being the align function (section @.@). Until you are familiar with working with peak lists, however, we recommend that you use the macro described above.
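The subtraction check described above can be done by hand or with a few lines of Python such as the following sketch. It is not a CONTRAST function. The shift values, the tolerances, and the name summarize are hypothetical.

```python
# Illustrative sketch only: summarizes the differences between correlated
# resonances in two spectra. The shift values, the tolerances, and the
# function name are hypothetical.
from statistics import mean

def summarize(pairs, tol_a, tol_b):
    """Return (mean difference, fraction of pairs within tol_a + tol_b)."""
    diffs = [a - b for a, b in pairs]
    limit = tol_a + tol_b
    covered = sum(1 for d in diffs if abs(d) <= limit) / len(diffs)
    return mean(diffs), covered

# Amide proton shifts (PPM) of the same fragments in two spectra.
pairs = [(8.61, 8.60), (9.12, 9.14), (7.43, 7.42), (8.74, 8.75), (8.02, 8.07)]

offset, covered = summarize(pairs, tol_a=0.02, tol_b=0.02)
print(f"mean difference {offset:+.3f} PPM, {covered:.0%} within tolerance")
```

A mean difference well away from zero would point to a referencing offset, while a low covered fraction with a mean near zero would suggest that the tolerances are too tight.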

Chapter 6

Arithmetic Expressions and Booleans

Arithmetic expressions and Booleans must be able to access many different fields within the major data structures of the CONTRAST program. The sometimes combinatorial and sometimes synchronous nature of assignment algorithms adds to the complexity of the syntax of these expressions. This section first describes the system used for accessing CONTRAST's variables and data structures; next it describes CONTRAST arithmetic expressions; and finally it describes CONTRAST Boolean expressions.

6.1 Accessing CONTRAST lists

CONTRAST accesses three kinds of data which we will refer to as lists: spectra, buffers, and files. Spectra and buffers can be thought of as lists of peaks while files are lists of the lines of text that make up the file.

6.1.1 Spectrum Data Structures

 

A spectrum is a CONTRAST spectrum file that has been read into memory by the program. It consists of the header information, peak list, and any other information that becomes associated with the spectrum during the course of the CONTRAST session. Outside of arithmetic expressions and Booleans, spectra can be specified by name or by the cardinal number that corresponds to their position in the sequence of spectra read into CONTRAST. Within arithmetic expressions or Booleans, however, the name or number of the spectrum must be preceded by the spectrum symbol '$'. Examples are:

1 The first spectrum loaded.

$2 The second spectrum loaded.

cosy The spectrum named cosy.

hnca The spectrum named hnca.

Different fields within a spectrum are referred to by single character abbreviations preceding the spectrum symbol ('$'). If there are several fields of the same type (e.g. dimensions in a spectrum), then a digit is appended to the abbreviation. The following is a partial list of the spectral fields that can be accessed using this method.

 

6.1.2 Fields of a Spectrum

di The coordinate of dimension i (where i = 1 to the number of dimensions)

i The intensity of a peak. (Note: d0 = i)

c The comment associated with a peak.

C The numeric value of the comment associated with a peak.

N Variable associated with the spectrum.

X Variable associated with the spectrum.

l The level (a variable) of the spectrum.

m The number of dimensions of a spectrum.

k The number of buffers associated with each peak.

w Current printed column width.

ti The tolerance for dimension i.

# The number of peaks.

The following examples show how different fields of a COSY spectrum (the third spectrum read into the CONTRAST program) are specified.

Examples

d1$cosy The frequency of the first dimension of a peak.

c$3 The comment associated with a peak.

l$cosy The level of the COSY spectrum
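The structure of these designators (a one-character field abbreviation, an optional digit, the '$' symbol, and a spectrum name or number) can be sketched in Python with a regular expression. The pattern covers only the forms shown in the examples above and is an assumption about the syntax, not CONTRAST's parser. The name parse_designator is hypothetical.

```python
# Illustrative sketch only: splits a field designator such as d1$cosy into
# its parts. The regular expression covers just the forms shown above.
import re

DESIGNATOR = re.compile(r"^([A-Za-z#])(\d*)\$(\w+)$")

def parse_designator(text):
    """Return (field, index or None, spectrum name or number)."""
    match = DESIGNATOR.match(text)
    if match is None:
        raise ValueError(f"not a spectrum field designator: {text!r}")
    field, index, spectrum = match.groups()
    return field, int(index) if index else None, spectrum

for text in ("d1$cosy", "c$3", "l$cosy"):
    print(text, parse_designator(text))
```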

 

6.1.3 Buffer Data Structures

Buffers are internal working lists which contain peaks and any information associated with those peaks. Peaks are generally added to buffers by performing searches of spectra or other buffers. Multiple buffers are stored in the program in a linear list. Buffers can be added to and deleted from the program's linear list of buffers just as peaks can be added to and deleted from individual buffers. Peaks from multiple spectra can be added to a single buffer. The command line designation of a buffer is its name or its position number in the list of buffers preceded by the '|' symbol (e.g. |hncoBuff or |1). Buffer names should be alphanumeric, although the # and @ characters can be used in special cases. Buffer names beginning with "|@" (e.g. |@hnca) must refer to buffers that are not linked to a particular peak in a source spectrum. In addition to all of the original information associated with it in the spectrum, each peak in a buffer can have the following fields (pieces of information) associated with it.

6.1.4 Fields of a Buffer

# Number. The number of peaks in the buffer.

v Value. The first coordinate that wasn't matched in the search.

t Tolerance. The tolerance of that value's dimension.

n N. Integer variable.

x X. Real variable.

r Repeats. The number of different instances of that value in the buffer within that value's tolerance.

c Comment. The text comment associated with the peak.

C Comment number. The numeric value of the comment associated with the peak.

di Dimension i. The frequency of dimension i.

D Deviation. Score between numDims*0.2 and numDims*1.2 that rates how close the peak is to the target(s), where numDims*1.2 is the value of the best deviation (closest match) and numDims*0.2 is the worst deviation value (on the edge of the tolerance ranges).

s Score. Used by several routines to determine the rank of the peaks.

l Level. General purpose progress and scoring variable for the peak.

w wLevel. General purpose progress and scoring variable for the whole buffer.
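The Deviation field (D) above is described only by its endpoints: numDims*1.2 for an exact match and numDims*0.2 at the edge of the tolerance ranges. The following Python sketch shows one scoring rule that satisfies those endpoints. The linear fall-off within each dimension and the function name deviation are assumptions made for illustration; the guide does not state how CONTRAST interpolates between the endpoints.

```python
def deviation(peak, target, tolerances):
    """Sum a per-dimension score that is 1.2 at the target and 0.2 at the tolerance edge.

    The linear interpolation between those endpoints is an assumption.
    """
    score = 0.0
    for p, t, tol in zip(peak, target, tolerances):
        offset = abs(p - t)
        if offset > tol:
            raise ValueError("peak lies outside the tolerance range")
        score += 1.2 - offset / tol
    return score


# Exact match in two dimensions: close to 2 * 1.2.
print(round(deviation([8.5, 120.0], [8.5, 120.0], [0.02, 0.5]), 3))
# First dimension on the tolerance edge, second exact: close to 0.2 + 1.2.
print(round(deviation([1.0, 2.0], [1.5, 2.0], [0.5, 1.0]), 3))
```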

6.1.5 Files

ASCII files can be accessed directly by the CONTRAST program. File names are specified with the '>' prefix (e.g. >filename.txt). Fields in a file are delimited by white space (spaces and tabs). Each field in a line is considered a dimension of that line and uses the same 'di' convention used by spectra and buffers. For example, d3>filename.txt = "See" for the line "See Spot. See Spot run." CONTRAST uses the same conventions for specifying a line or range of lines in a file as it does for the peaks in a spectrum or buffer.
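The field convention for files can be illustrated with a short Python sketch. It is not part of CONTRAST; the function name field is hypothetical and the code only mimics the whitespace splitting described above.

```python
def field(line, i):
    """Return field i (1-based, like 'di') of a whitespace-delimited line."""
    return line.split()[i - 1]


line = "See Spot. See Spot run."
print(field(line, 3))  # See
```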

6.2 Designating Peaks or Lines in a List

Peaks or lines are specified by suffixes added to the field and list descriptors after a comma. Either a single peak (line) or a range of peaks (lines) can be referenced. If no peak or line is specified then the entire range is assumed. Boolean expressions will go through every peak or line in a range and evaluate the value of the expression automatically. The following is a list of peak specifiers.

 

6.2.1 Peak Specifiers

,i The i'th peak or line in a list.

,H The peak or line with the highest specified field value.

,L The peak or line with the lowest specified field value.

,b The first peak or line in a list.

,fi The first i peaks or lines in a list.

,e The last peak or line in a list.

,li The last i peaks or lines in a list.

,i-j The i'th peak or line through the j'th peak or line in a list.

,i+ The i'th peak or line through the last peak or line in the list.

 

6.2.2 Examples

i|fred,f4H the highest intensity of the first 4 peaks in buffer fred.

i$fred,f4 the intensity of the first four peaks in spectrum fred.

i|fred,l4 the intensity of the last four peaks in fred

d1>fred,4+ the first field of the fourth through the last lines in file fred.

i|fred,2-5 the intensity of the second through fifth (inclusive) peaks

v|fred,H the value of the highest valued peak in buffer fred

s|fred,L the lowest score in buffer fred.

C|fred,e the numeric part of the comment from the last peak in buffer fred.

c|fred,1 the comment text string from the first peak in buffer fred.

D|fred,b the deviation of the first peak in buffer fred.

#$fred the number of peaks in spectrum fred.

w$fred the column width of the spectrum fred.
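The peak specifiers listed in Section 6.2.1 can be read as simple index ranges. The following Python sketch models them on an ordinary list of field values. The function name select is hypothetical, and treating a trailing H or L as "highest or lowest within the preceding range" is an assumption based on the i|fred,f4H example above.

```python
import re


def select(values, spec=""):
    """Return the field values picked out by a peak specifier.

    `spec` is the text after the comma, e.g. "f4H" in i|fred,f4H.
    """
    pick = None
    if spec and spec[-1] in "HL":      # trailing H or L: highest/lowest of the range
        pick, spec = spec[-1], spec[:-1]

    if spec == "":                                   # no range given: the whole list
        chosen = list(values)
    elif spec == "b":                                # first peak
        chosen = values[:1]
    elif spec == "e":                                # last peak
        chosen = values[-1:]
    elif m := re.fullmatch(r"f(\d+)", spec):         # first i peaks
        chosen = values[:int(m.group(1))]
    elif m := re.fullmatch(r"l(\d+)", spec):         # last i peaks
        chosen = values[max(0, len(values) - int(m.group(1))):]
    elif m := re.fullmatch(r"(\d+)-(\d+)", spec):    # i'th through j'th, inclusive
        chosen = values[int(m.group(1)) - 1:int(m.group(2))]
    elif m := re.fullmatch(r"(\d+)\+", spec):        # i'th through the last
        chosen = values[int(m.group(1)) - 1:]
    elif m := re.fullmatch(r"(\d+)", spec):          # the i'th peak
        chosen = values[int(m.group(1)) - 1:int(m.group(1))]
    else:
        raise ValueError(f"unrecognized peak specifier: {spec!r}")

    if pick == "H":
        return [max(chosen)]
    if pick == "L":
        return [min(chosen)]
    return chosen


intensities = [5.0, 9.0, 2.0, 7.0, 8.0, 1.0]
print(select(intensities, "f4H"))  # [9.0]
print(select(intensities, "2-5"))  # [9.0, 2.0, 7.0, 8.0]
```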

6.3 Arithmetic Expressions

CONTRAST arithmetic expressions are straightforward. They can appear in most CONTRAST expressions in which a variable or parameter is set to a discrete value. In Boolean expressions they can operate on sets and ranges of values as long as there is at most one variable in each term of the Boolean. If a range is specified for a simple arithmetic expression, the function always uses the highest value in the range for the calculation. CONTRAST arithmetic expressions use the standard order of mathematical operations, but the order can be controlled by the use of parentheses. Nesting of parentheses is permitted. White space within an arithmetic expression is optional, with a few exceptions: in particular, the '+' and '-' operators should be preceded by white space if they follow immediately after a list expression. A list of arithmetic and text string operators follows. The accompanying examples assume the following: #|hnca = 2, d1|cosy,1 = 8.5, and c|hnca,1 = "His23Ca2". Boolean operators will be discussed in the next section.

 

6.3.1 Arithmetic Operators

+ Addition 4 + #|hnca = 6

- Subtraction d1|cosy,1 - 2 = 6.5

/ Division 10/4 = 2.5

* Multiplication #|hnca*d1|cosy,1 = 17

^ To the power of 4 ^ 3 = 64

% Modulus 5 % #|hnca = 1

sin Sine (in degrees) sin(90) = 1

cos Cosine (in degrees) cos(90) = 0

tan Tangent (in degrees) tan(180) = 0

log Logarithm base ten log(1) = 0

ln Natural logarithm ln(d1|cosy,1) = 2.14

 

6.3.2 Text Operators

 

vali(text) The ith numeric part of text. val2("fr2ed4.1") = 4.1

+ Union "fred" + "ted" = "fredted"

^ Intersection "fred" ^ "ted" = "ed"

- Delete Intersection "fred" - "ted" = "fr"

* Number of Intersections "freded" * "ed" = 2

/ Remove Characters "fred" / "det" = "fr"

% Remove all but characters "fred" % "det" = "ed"
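The text operators can be modeled in a few lines of Python. The sketch below is only an interpretation that reproduces the examples in this guide; the function names are hypothetical. Two points are assumptions: the intersection is taken to be the longest common substring, and character removal is treated as case-insensitive (inferred from the last example in Section 6.3.3, which evaluates to -1 only if both 'C' and 'a' are removed by "ABC").

```python
import math
import re

NUMBER = re.compile(r"\d+(?:\.\d+)?")


def val(text, i):
    """vali(text): the i-th (1-based) numeric part of text."""
    return float(NUMBER.findall(text)[i - 1])


def intersection(a, b):
    """a ^ b, modeled as the longest substring of a that also occurs in b."""
    best = ""
    for start in range(len(a)):
        for end in range(start + 1, len(a) + 1):
            if end - start > len(best) and a[start:end] in b:
                best = a[start:end]
    return best


def delete_intersection(a, b):
    """a - b: a with every occurrence of the intersection removed."""
    common = intersection(a, b)
    return a.replace(common, "") if common else a


def count_intersections(a, b):
    """a * b: how many times the intersection occurs in a."""
    common = intersection(a, b)
    return a.count(common) if common else 0


def remove_chars(a, chars):
    """a / chars: drop every character of a that appears in chars (any case)."""
    drop = set(chars.lower())
    return "".join(ch for ch in a if ch.lower() not in drop)


def keep_chars(a, chars):
    """a % chars: keep only the characters of a that appear in chars (any case)."""
    keep = set(chars.lower())
    return "".join(ch for ch in a if ch.lower() in keep)


print(intersection("fred", "ted"))         # ed
print(delete_intersection("fred", "ted"))  # fr
print(count_intersections("freded", "ed")) # 2
print(remove_chars("fred", "det"))         # fr
print(keep_chars("fred", "det"))           # ed
print(val("fr2ed4.1", 2))                  # 4.1
```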

6.3.3 Example Arithmetic Expressions

(#|hnca*(d1|cosy,1 + .5))+2 = 20

val2(c|hnca,1) * 10 = 20

C|hnca,1 - 3 = 20

10 * (c|hnca,1 * "2") = 20

val1(c|hnca,1 - "2") = 3

cos( val1(c|hnca,1/"ABC")-52) = -1

 

6.4 Boolean Expressions

Booleans are expressions that reduce to 1 (meaning true) or 0 (meaning false). Many different CONTRAST functions use Boolean expressions to determine whether or not the function will be executed for a particular value, peak, or line. CONTRAST uses a versatile Boolean format that allows sets, ranges, "boxes", and variables to be coded into an expression so that one expression can be evaluated for many different arrangements of data.

Boolean expressions are always marked by enclosure in parentheses (). If a command contains both a Boolean expression and a separate mathematical expression that uses parentheses, the Boolean expression must be listed first. In the following example the Boolean is "(d1|hnca>3)".

set level |hnca (d1|hnca>3) += (47 / i|hnca)

The Boolean in the preceding example is straightforward. The level of each peak in the hnca buffer whose d1 value is greater than 3 is incremented by 47 divided by the intensity of that peak. Since no specific peak in the hnca buffer is specified, the Boolean is evaluated for each peak in the buffer. The levels of only those peaks for which the Boolean evaluates to 'true' are incremented.
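The per-peak evaluation described above can be sketched in Python as follows. This is an illustration of the semantics only, not CONTRAST code; the Peak class, the function set_level_where, and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    d1: float
    intensity: float
    level: float = 0.0


def set_level_where(peaks, condition, increment):
    """Add increment(peak) to the level of every peak for which condition(peak) is true."""
    for peak in peaks:
        if condition(peak):
            peak.level += increment(peak)


# Stand-in for the |hnca buffer (sample values are made up).
hnca = [Peak(4.2, 47.0), Peak(2.5, 10.0), Peak(8.1, 94.0)]

# set level |hnca (d1|hnca>3) += (47 / i|hnca)
set_level_where(hnca, lambda p: p.d1 > 3, lambda p: 47 / p.intensity)
print([p.level for p in hnca])  # [1.0, 0.0, 0.5]
```

The second peak is left untouched because its d1 value does not exceed 3.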

CONTRAST Booleans can combine an unlimited number of expressions by using the conjunctions '||' (or) and '&&' (and). For instance, the following command uses a Boolean composed of three parts.

set level |hnca ( l|hnca = 2 || (d1|hnca>3 && d2|hnca <= 9) ) += (47 / i|hnca)

In this Boolean the level of an HNCA peak will be incremented if the peak's level is currently equal to 2 or ('||') if the d1 value of the peak is greater than 3 and ('&&') the d2 value of the peak is less than or equal to 9. Note that expressions must be combined with conjunctions. Expressions such as " x > y > z " are not permitted in CONTRAST. Note also that some CONTRAST functions have not yet been implemented with "short-circuit logic". Short circuit logic allows the program to skip evaluating the rest of a Boolean when the expression is guaranteed to evaluate to true or false. In the above example if the level of an HNCA peak is equal to 2, then the full Boolean is guaranteed to evaluate to true so the program does not need to continue by testing the d1 and d2 values of the peak. Since several functions including the set function do not use short-circuit logic, we recommend that the user avoid writing Booleans that rely on this feature.

CONTRAST Booleans often compare values from different lists. These comparisons can be made synchronously or combinatorially. The preceding example used a synchronous mechanism for making comparisons: each time the hnca buffer was referenced in the Boolean, it was understood to refer to the same peak. The following Boolean also uses a synchronous mechanism, but this time it is less obvious.

set level |fred (d1|fred,f5 > 3 && d1|tom,f5 <= 8) += 2

In this example when the first peak of buffer fred is being compared to 3, the first peak of buffer tom is being compared to 8, then the second peaks in each buffer are compared, the third, and so on. The above expression is equivalent to the following 5 expressions.

set level |fred (d1|fred,1 > 3 && d1|tom,1 <= 8) += 2

set level |fred (d1|fred,2 > 3 && d1|tom,2 <= 8) += 2

set level |fred (d1|fred,3 > 3 && d1|tom,3 <= 8) += 2

set level |fred (d1|fred,4 > 3 && d1|tom,4 <= 8) += 2

set level |fred (d1|fred,5 > 3 && d1|tom,5 <= 8) += 2

Synchronous expressions are signaled by using double conjunctions or operators. If a single '&' symbol had been used, a combinatorial comparison would have been performed. The following is an example of the use of a combinatorial conjunction.

set level |fred (d1|fred,f2 > 0 & d1|tom,f3 <= 8) += 10

 

In this example each of the first 2 peaks in fred is compared to zero once for each of the first three peaks in tom. In this case the level of one of the peaks in fred can be incremented by as much as 30 (3 * 10). This expression is equivalent to the following 6 commands.

set level |fred (d1|fred,1 > 0 & d1|tom,1 <= 8) += 10

set level |fred (d1|fred,1 > 0 & d1|tom,2 <= 8) += 10

set level |fred (d1|fred,1 > 0 & d1|tom,3 <= 8) += 10

set level |fred (d1|fred,2 > 0 & d1|tom,1 <= 8) += 10

set level |fred (d1|fred,2 > 0 & d1|tom,2 <= 8) += 10

set level |fred (d1|fred,2 > 0 & d1|tom,3 <= 8) += 10
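The difference between the two mechanisms corresponds to pairing list positions one-to-one versus forming every pairing. The Python sketch below models the two example commands on plain lists of d1 values. The function names and sample data are hypothetical; only the pairing logic is taken from the text.

```python
from itertools import product


def apply_synchronous(fred_d1, tom_d1, step=2):
    """Model of (d1|fred > 3 && d1|tom <= 8) += step: peak k of fred pairs with peak k of tom."""
    levels = [0] * len(fred_d1)
    for k, (f, t) in enumerate(zip(fred_d1, tom_d1)):
        if f > 3 and t <= 8:
            levels[k] += step
    return levels


def apply_combinatorial(fred_d1, tom_d1, step=10):
    """Model of (d1|fred > 0 & d1|tom <= 8) += step: every fred peak meets every tom peak."""
    levels = [0] * len(fred_d1)
    for i, j in product(range(len(fred_d1)), range(len(tom_d1))):
        if fred_d1[i] > 0 and tom_d1[j] <= 8:
            levels[i] += step
    return levels


# First two peaks of fred against the first three peaks of tom.
print(apply_combinatorial([1.0, 2.0], [1.0, 2.0, 3.0]))  # [30, 30]
print(apply_combinatorial([1.0, -2.0], [7.0, 9.0, 3.0])) # [20, 0]
print(apply_synchronous([4.0, 2.0, 5.0], [7.0, 7.0, 9.0]))  # [2, 0, 0]
```

The first call shows the maximum increment of 30 (3 * 10) mentioned above.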

All fields in a Boolean that come from the same list are automatically synchronized, even if combinatorial operators are used. The following is an example of a case in which fields are synchronized even though a combinatorial conjunction ('&') is specified.

set level |fred (d1|fred,f2 > 0 & d2|fred,f2 <= d1|tom,f3) += 10

This expression is equivalent to the following 6 commands.

set level |fred (d1|fred,1 > 0 & d2|fred,1 <= d1|tom,1) += 10

set level |fred (d1|fred,1 > 0 & d2|fred,1 <= d1|tom,2) += 10

set level |fred (d1|fred,1 > 0 & d2|fred,1 <= d1|tom,3) += 10

set level |fred (d1|fred,2 > 0 & d2|fred,2 <= d1|tom,1) += 10

set level |fred (d1|fred,2 > 0 & d2|fred,2 <= d1|tom,2) += 10

set level |fred (d1|fred,2 > 0 & d2|fred,2 <= d1|tom,3) += 10

Note that the two fields of |fred are synchronized, but that the |fred and |tom lists are compared combinatorially. In order to synchronize d2|fred,f2 and d1|tom,f3 we must use the "synchronous less than or equal to" operator ("<<=" or "<<=="). Doubling Boolean operator symbols makes the two operands of the operator synchronous just as doubling Boolean conjunctions makes the left and right hand sides of the expressions synchronous. In the following example a synchronous operator is used to synchronize d2|fred,f2 and d1|tom,f3 in the expressions above.

set level |fred (d1|fred,f2 > 0 & d2|fred,f2 <<= d1|tom,f3) += 10

 

This expression is equivalent to the following two expressions.

set level |fred (d1|fred,1 > 0 & d2|fred,1 <= d1|tom,1) += 10

set level |fred (d1|fred,2 > 0 & d2|fred,2 <= d1|tom,2) += 10

Note that the third peak in |tom is never used since the first two peaks in |fred were specified and |tom was synchronized to |fred.

The default synchronization behavior of fields in a Boolean can be over-ridden by appending "i" suffixes to the field descriptions. The following is an example of the use of such suffixes.

set lev |fred (d1|fred,f2 > d1|tom,f2 && d2|fred,f2i2 = d2|tom,f3i1 && i|fred,f2i > 0) += 1

The following expressions are equivalent to the expression above.

set lev |fred (d1|fred,1 > d1|tom,1 && d2|fred,1 = d2|tom,1 && i|fred,1 > 0) += 1

set lev |fred (d1|fred,1 > d1|tom,1 && d2|fred,1 = d2|tom,1 && i|fred,2 > 0) += 1

set lev |fred (d1|fred,1 > d1|tom,2 && d2|fred,2 = d2|tom,1 && i|fred,1 > 0) += 1

set lev |fred (d1|fred,1 > d1|tom,2 && d2|fred,2 = d2|tom,1 && i|fred,2 > 0) += 1

set lev |fred (d1|fred,2 > d1|tom,1 && d2|fred,1 = d2|tom,2 && i|fred,1 > 0) += 1

set lev |fred (d1|fred,2 > d1|tom,1 && d2|fred,1 = d2|tom,2 && i|fred,2 > 0) += 1

set lev |fred (d1|fred,2 > d1|tom,2 && d2|fred,2 = d2|tom,2 && i|fred,1 > 0) += 1

set lev |fred (d1|fred,2 > d1|tom,2 && d2|fred,2 = d2|tom,2 && i|fred,2 > 0) += 1

If all of the terms that contain field descriptions in a Boolean are numbered from n = 1 to N, then the number n is used after an 'i' suffix to specify the field description that the expression is synchronized to. If no n value is specified after an 'i' suffix, then the containing expression is made independent (a combinatorial operation). In the above example the third field "d2|fred,f2i2" is synchronized to the second field "d1|tom,f2" and the fourth field "d2|tom,f3i1" is synchronized to the first field "d1|fred,f2". The last field is independent. If the 'i' suffix had not been added to the field description, then the last field would have been synchronized to the first field, since they make reference to the same list.
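The eight equivalent expressions listed above can be regenerated mechanically, which may help when checking a Boolean that uses 'i' suffixes. In the Python sketch below the three independent indices a, b, and c stand for the first field (d1|fred), the second field (d1|tom), and the unnumbered 'i' field (i|fred); the third and fourth fields reuse b and a respectively. The function name is hypothetical and the code only prints text; it does not run CONTRAST.

```python
from itertools import product


def expand_i_suffix_example():
    """Enumerate the expansions of the 'i' suffix example as command strings."""
    commands = []
    for a, b, c in product(range(1, 3), repeat=3):
        commands.append(
            f"set lev |fred (d1|fred,{a} > d1|tom,{b} && "
            f"d2|fred,{b} = d2|tom,{a} && i|fred,{c} > 0) += 1"
        )
    return commands


for command in expand_i_suffix_example():
    print(command)
```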

6.4.1 Boolean operators and conjunctions

> combinatorial "greater than"

>= combinatorial "greater than or equal to"

< combinatorial "less than"

<= combinatorial "less than or equal to"

= combinatorial "equals"

!= combinatorial "not equal"

<> combinatorial "within a tolerance of "

>< combinatorial "outside a tolerance of "

& combinatorial "and"

| combinatorial "or"

>> synchronous "greater than"

>>= synchronous "greater than or equal to"

<< synchronous "less than"

<<= synchronous "less than or equal to"

== synchronous "equals"

!!= synchronous "not equal"

<<>> synchronous "within a tolerance of "

>><< synchronous "outside a tolerance of "

&& synchronous "and"

|| synchronous "or"

Tolerance operators contain a tolerance value embedded in the operator. This value can take the form of a constant, a variable, a field, a range, a set, or a box, just like normal Boolean operands. (Sets and boxes will be described in a subsequent section.) If a field description is used as a tolerance, it is good practice to specify synchrony directly using the 'i' suffix unless the field makes reference to a list referenced elsewhere in the Boolean. The following is an example of an expression that uses tolerances.

set lev |fred (d1|fred,f2 <.02> d1|tom,f3 && d2|fred >t|Hai,1< d2|tom) += 1
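A minimal Python model of the two tolerance operators is given below. The function names are hypothetical, and whether a value lying exactly on the tolerance boundary counts as "within" is an assumption (the sketch treats the boundary as inside).

```python
from itertools import product


def within(a, b, tol):
    """a <tol> b: true when a and b differ by no more than tol (boundary assumed inclusive)."""
    return abs(a - b) <= tol


def outside(a, b, tol):
    """a >tol< b: the complement of within()."""
    return not within(a, b, tol)


def combinatorial_matches(fred_d1, tom_d1, tol):
    """1-based index pairs (fred, tom) whose d1 values agree within tol."""
    return [
        (i + 1, j + 1)
        for i, j in product(range(len(fred_d1)), range(len(tom_d1)))
        if within(fred_d1[i], tom_d1[j], tol)
    ]


# Model of d1|fred,f2 <.02> d1|tom,f3 with made-up shifts.
print(combinatorial_matches([8.50, 7.90], [8.51, 7.20, 7.91], 0.02))  # [(1, 1), (2, 3)]
```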

 

Boolean expressions can contain mathematical expressions as well as field descriptions and constants. The only limitation is that no term in the expression can contain more than 1 range, set, or box. The following is an example of a Boolean expression in which arithmetic expressions occur.

set lev |fred (cos(d1|fred,f2*2)+8 <.02> 8.2 && val2(C|fred)+6 >t|Hai,1/2< d2|tom) += 1

 

If the Boolean of a command is preceded by a NOT symbol '!', then the set of peaks or lines for which the Boolean does not evaluate to true is operated on by the command. In this special case the NOT symbol '!' performs a complementarity operation rather than the negation operation that it typically performs. For example in the command

 

set level |hnca !(d1|hnca>3) += 10

the level of each peak in |hnca that has a d1 value less than or equal to 3 is incremented by 10.


Chapter 7

An Adaptable Fully Automated Assignment Macro

This section contains an overview of the simplest and most automated assignment procedure available in CONTRAST. The procedure is implemented as a simple six-part macro that can be used for most data sets with minimal modification. The performance of the algorithm is highly dependent on the type and quality of the data. The program always makes all possible assignments given the input data set, even when the data are insufficient to support an assignment. Therefore the output produced by the procedure should always be carefully checked, and the evidence for every assignment should be examined and evaluated.

Figure 7.1 is an information flow diagram of the main steps in the fully-automated assignment procedure. The main body of the assignment program consists of three functions which generate CONTRAST macros for the user (Contrace, Reside, and Overlap) and a single function (AnnBF) that generates sequential assignments based on the output of the previous three functions. Arrows in the diagram represent the flow of information from one function to another.


Figure 7.1
 

 

The fully-automated approach to assignments is illustrated using sample macros written for two very different data sets. The first macro is written for a 2D homonuclear data set consisting of three experiments: COSY, TOCSY, and NOESY.

7.1 Fully-Automated 2D Macro

lf cosy.con

lf tocsy.con

lf noesy.con

lf seq.con

exe shifts.mac

contrace >contrace.mac -n -F

overlap 5 >overlap.mac

annbf 5, -l -x3

stf 5 >output.file

The next macro is written for a 3D heteronuclear data set consisting of 9 experiments: HNCO, HNCA, HN(CO)CA, HN(CO)CACB, HNCACB, HCACO, HN-TOCSY-HMQC, HCCH-COSY, and HCCH-TOCSY.

7.2 Fully-automated 3D Macro

lf hnco.con

lf hnca.con

lf hncoca.con

lf hncocacb.con

lf hncacb.con

lf hcaco.con

lf hntocsy.con

lf hcchcosy.con

lf hcchtocsy.con

lf seq.con

exe shifts.mac

contrace 1, >contrace.mac -n -F

overlap 1 >overlap.mac

annbf 1, -l -x3

stf 1 >output.file

A comparison of the two macros shows that the main difference between them is the input data. The first step in both macros is to load the data into the program. The first three lines in the 2D macro and the first 9 lines in the 3D macro simply read the peak lists into the program, and the next line reads in the protein sequence. This step has already been described in Section 4.

In the next step a macro is executed which contains a database of the characteristic chemical shifts of the common amino acids. This database is experiment independent and should contain as much information as possible about the distribution of chemical shifts. The chemical shift database is described in Section 8.

The next step is the heart of the CONTRAST automated assignment procedure. The Contrace command generates a strategy for assembling spin systems using data from the input spectra and the chemical shift database. The strategy generated by the Contrace routine is output as a CONTRAST macro (in the cases above named contrace.mac). The function implements the strategy as it is being generated. The result of the function is a list of buffers that contain the modified results of searches and other manipulations of the data. These buffers are grouped into fragments that roughly correspond to amino acid spin systems.

The starting point for each fragment is a peak from a "source" spectrum. There is a one to one correspondence between the peaks of the source spectrum and fragments. The ideal source spectrum meets all of the following criteria:

1) The source spectrum is of high resolution and is well-referenced.

2) The source spectrum is very complete -- very few peaks are missing.

3) The source spectrum can be correlated to peaks from the other spectra.

4) The source spectrum contains one correlation (peak) per residue.

5) The source spectrum is relatively noise free; there are very few extra peaks.

 

These criteria should be taken as ideals that can be used to guide the choice of a source spectrum. They are listed in order of decreasing importance.

In the 2D macro above the selection of the source spectrum was left to the Contrace function. In that case the function generally constructs a spectrum from the Hn,Ha (fingerprint) region using peaks from the COSY and TOCSY spectra. This spectrum is added to the list of spectra and becomes spectrum 5 (the sequence is treated as if it were a spectrum, so it occupies position 4). The references to "5" in the subsequent commands of the 2D macro all refer to the newly created source spectrum. In the 3D macro, on the other hand, the HNCO spectrum (spectrum 1) is specified to the Contrace function as the source spectrum. If it had not been specified, a new source would have been constructed from either the HNCOCA or HNCO spectra, and any missing peaks would have been filled in by the other spectra.

Each fragment starts off with the peak from the source spectrum which yields the first 2 (in the case of a 2D source) or 3 (in the case of a 3D source) assignments. A series of search and filter steps creates additional buffers (lists of peaks) within a fragment. These buffers are called working buffers, because they are used to build assignment buffers which are special buffers named for the resonance assignment that they contain. One of the chemical shift dimensions of the first peak in the assignment buffer is the actual frequency assignment for the resonance.

The Contrace function stops when there is an assignment buffer for each resonance mentioned in the correlation lists of the input spectra. Generally there is not enough information in the spectra to correctly assign all the resonances, and the assignments in the last assignment buffers are usually the most uncertain. Spin systems are usually assigned all the way out to the epsilon position for every residue in the protein. The fragments can be considered "fuzzy" since they contain alternate assignments and since no hard and fast endpoint decisions are made at this point of the analysis.

The next step of the macros is the "overlap" step. The Overlap function generates what are known as overlap tests, which will be used in the sequential assignment step to score the likelihood that two fragments are derived from sequential residues. These overlap tests are generally very simple. They consist of commands that award points when resonances from overlapping assignment buffers are within a specific tolerance of one another. Overlapping assignment buffers are assignment buffers from two different fragments that are expected to contain the same resonance. For example, the "previous Ca buffer" generated from a peak in the HN(CO)CA spectrum should contain the same Ca resonance as the "Ca buffer" generated from a peak in the HNCA spectrum from the previous residue in the sequence. When NOESY spectra are used to score for sequential fragments, working buffers containing NOESY peaks are used in addition to assignment buffers in making overlap tests.
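The idea behind an overlap test can be sketched in Python as follows. This is not the code that the Overlap function generates; the function names, the dictionary layout, the tolerance of 0.2, the single point awarded, and the chemical shifts are all hypothetical values chosen for illustration.

```python
def overlap_points(prev_ca, ca, tol=0.2, points=1):
    """Award points if any "previous Ca" shift matches any "Ca" shift within tol."""
    matched = any(abs(p - c) <= tol for p in prev_ca for c in ca)
    return points if matched else 0


def sequential_score(first, second):
    """Score the hypothesis that fragment `first` immediately precedes fragment `second`."""
    return overlap_points(second["prevCa"], first["Ca"])


# Two made-up fragments, each with a Ca buffer and a previous-Ca buffer.
frag_a = {"Ca": [58.1], "prevCa": [52.3]}
frag_b = {"Ca": [52.4], "prevCa": [61.0]}

print(sequential_score(frag_b, frag_a))  # 1
print(sequential_score(frag_a, frag_b))  # 0
```

Here the previous-Ca shift of fragment A agrees with the Ca shift of fragment B, so only the ordering B followed by A earns a point.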

 

The next step of the automated assignment macros is the shuffling step, in which the fragments created by the Contrace function are shuffled into the correct sequential order using the sequence of the protein. In this example the annbf (best first simulated annealing) algorithm is used to shuffle the fragments. This function uses the overlap tests generated by Overlap to place fragments in the correct order, and it uses the chemical shift database to match fragments to the correct positions in the sequence. The shuffling routine can also use other tests for matching fragments to sequence positions. These tests can be written by hand or automatically generated by the Reside function. In this simple case we do not illustrate the use of such tests, but they are often very helpful in identifying the amino acid type of a fragment.

The last step in the automatic assignment process is to write the output of the program to a file. The function stf (shuffle to file) writes the contents of all of the buffers that make up the fragments into the file "output.file". The fragments are written in the sequential order determined by the shuffling routine and are labeled with the name of the residue and the sequence position of the corresponding amino acid in the protein. Alternate orderings and ambiguity factors are indicated. The output file format will be discussed in more detail in a later section.

The assignment macros shown above are the bare minimum necessary for automated assignment. The commands shown above are usually supplemented with other functions that provide additional scaling information, amino acid type tests, and error checking routines. More complete macros are distributed with the CONTRAST executable. These macros have been annotated to document the use of the "extra" functions.


Chapter 8

Chemical Shift Database

8.1 Set Shift Format

The CONTRAST chemical shift database is a series of CONTRAST set shift commands that is read into the program as a CONTRAST macro. The set shift command allows the user to set the amino acid type, atom (resonance) type, chemical shift range and probability value for that range. The format for the command is as follows:

set shift AAname Resonance LoChemShift [-] HiChemShift [Prob]

AAname The name or abbreviation of the amino acid or amino acid group for which the chemical shift information holds. The name should correspond to the name used in the sequence.

Resonance The resonance code of the atom to which the chemical shift information applies.

LoChemShift The lower bound of the chemical shift range.

HiChemShift The upper bound of the chemical shift range.

Prob A probability value between 0.0 and 1.0

Set Shift Examples

The following group of set shift commands is an example of a typical entry for the alpha carbon of alanine residues.

set shift A Ca 48-54

set shift A Ca 48-50 0.1

set shift A Ca 50-52 0.6

set shift A Ca 52-54 0.3

This example highlights several important points. In the first line the entire range of allowed chemical shifts is given without a probability value, and the next three lines break up that chemical shift range into smaller subranges with a probability value for each subrange. This allows CONTRAST to use the chemical shift range information in two different ways. When probability values are given, CONTRAST uses the subranges to automatically calculate probability-based amino acid type scores during the sequential assignment step. Both the Contrace and Reside functions use full ranges that do not include probability values to perform connectivity tracing and amino acid test generation, respectively. If all set shift commands contain probability values, then Contrace will not use chemical shift ranges to trace spin systems and Reside will not generate amino acid tests. If none of the set shift commands contain probability values, then probability-based amino acid type scoring will not be performed.
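The distinction between full ranges and probability subranges can be made concrete with a small Python sketch that reads set shift commands in the format given above. It is an illustration only: the ShiftRange type and the function name are hypothetical, and the parser assumes non-negative chemical shifts so that a hyphen can always be treated as the range separator.

```python
from typing import NamedTuple, Optional


class ShiftRange(NamedTuple):
    aa: str
    resonance: str
    lo: float
    hi: float
    prob: Optional[float]  # None for a full range given without a probability


def parse_set_shift(command):
    """Parse 'set shift AAname Resonance LoChemShift [-] HiChemShift [Prob]'."""
    tokens = command.replace("-", " ").split()  # assumes no negative shifts
    if len(tokens) not in (6, 7) or [t.lower() for t in tokens[:2]] != ["set", "shift"]:
        raise ValueError(f"not a set shift command: {command!r}")
    prob = float(tokens[6]) if len(tokens) == 7 else None
    return ShiftRange(tokens[2], tokens[3], float(tokens[4]), float(tokens[5]), prob)


database = [parse_set_shift(line) for line in (
    "set shift A Ca 48-54",
    "set shift A Ca 48-50 0.1",
    "set shift A Ca 50-52 0.6",
    "set shift A Ca 52-54 0.3",
)]

full_ranges = [r for r in database if r.prob is None]      # the kind used for tracing and test generation
subranges = [r for r in database if r.prob is not None]    # the kind used for type scoring
print(len(full_ranges), len(subranges))  # 1 3
```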

8.2 Probability Values

The algorithm that generates probability-based amino acid type scores during sequential assignment can be used with true probability values for the chemical shift subranges, but its performance is improved considerably when the probability values are normalized so that the highest probability value for each resonance is given a value of 1. With this normalization the preceding example would be converted to:

set shift A Ca 48-54

set shift A Ca 48-50 0.167

set shift A Ca 50-52 1.0

set shift A Ca 52-54 0.5
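The normalization amounts to dividing each subrange probability for a resonance by the largest one. A minimal Python sketch follows; the function name is hypothetical, and rounding to three decimal places simply mirrors the values shown above.

```python
def normalize(probabilities):
    """Scale probabilities so that the largest value becomes 1."""
    top = max(probabilities)
    return [round(p / top, 3) for p in probabilities]


# Subrange probabilities for the alanine Ca example: 48-50, 50-52, 52-54.
print(normalize([0.1, 0.6, 0.3]))  # [0.167, 1.0, 0.5]
```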

8.3 Amino Acid Names

Amino acid names used in the set shift statement should match the amino acid names used in the input sequence file, but they need not be limited to standard nomenclature. In order to distinguish a particular amino acid in the sequence from other like amino acids simply use a different name. For example two serines in the sequence could be named "Sx" and "Sy" respectively. In this case the standard information in the chemical shift database would no longer apply, and the user would have to include a set of chemical shift ranges for amino acids named "Sx" and "Sy". NOTE: The three standard names for each of the standard 20 amino acids are interconverted. For example "cysteine", "cys", and "c" are all considered equivalent. Furthermore amino acid names are case-insensitive so that "Cysteine", "cysteine", "CYS", "Cys", "C", and "c" are all considered equivalent.
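The name handling described above can be sketched as a lookup table in Python. This is an illustration only: the function name canonical_name is hypothetical, the choice of the one-letter code as the canonical form is arbitrary, and the exact full names accepted by CONTRAST (for example "aspartate" versus "aspartic acid") are assumptions. Non-standard names such as "Sx" are passed through, lowercased to reflect case-insensitivity.

```python
STANDARD = [
    ("alanine", "ala", "a"), ("arginine", "arg", "r"), ("asparagine", "asn", "n"),
    ("aspartate", "asp", "d"), ("cysteine", "cys", "c"), ("glutamine", "gln", "q"),
    ("glutamate", "glu", "e"), ("glycine", "gly", "g"), ("histidine", "his", "h"),
    ("isoleucine", "ile", "i"), ("leucine", "leu", "l"), ("lysine", "lys", "k"),
    ("methionine", "met", "m"), ("phenylalanine", "phe", "f"), ("proline", "pro", "p"),
    ("serine", "ser", "s"), ("threonine", "thr", "t"), ("tryptophan", "trp", "w"),
    ("tyrosine", "tyr", "y"), ("valine", "val", "v"),
]

# Map every standard spelling (full, three-letter, one-letter) to the one-letter code.
ALIASES = {name: names[2] for names in STANDARD for name in names}


def canonical_name(name):
    """Return a case-insensitive canonical form of an amino acid name."""
    key = name.lower()
    return ALIASES.get(key, key)  # custom names such as "Sx" stay distinct


print(canonical_name("Cysteine"), canonical_name("CYS"), canonical_name("c"))  # c c c
print(canonical_name("Sx"), canonical_name("Sy"))  # sx sy
```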

Non