
Spectrum Research, LLC.
CONTRAST
Connectivity Tracing
Assignment Tools for Automated Assignment of Protein NMR Data
User Guide
Version 2.0
Copyright Notice
Copyright © 1996 through 2001 Spectrum Research,
LLC. All rights reserved.
No part of this document may be reproduced,
transmitted, transcribed, stored in a retrieval system, or translated into any
language in any form by any means without the written permission of Spectrum
Research, LLC. Spectrum Research, LLC.
reserves the right to change the information in this document without prior
notice.
Trademarks
Contrast is a trademark of Spectrum Research, LLC.
Acknowledgments
The Contrast software program was developed by Drs. John Markley and John Olson at the
National Magnetic Resonance Facility located at the University of
Wisconsin-Madison. All rights, title,
and interest in Contrast are owned by
the Wisconsin Alumni Research Foundation ("WARF"). The commercial version of Contrast has been exclusively licensed
to Spectrum Research LLC by WARF.
Credits
If results (figures and/or data) obtained with the Contrast™ application are
used in a publication, please acknowledge the software in the following manner or
in any other equivalent form:
"Contrast™ software, developed by
Spectrum Research, LLC, was used to compute the results in this
publication."
Chapter 1
CONTRAST is a non-graphical software tool for
automating NMR peak assignment. The program works with
NMR data in the form of ASCII lists of peak coordinates and intensities. The
program provides the user with several versatile tools for manipulating peak
lists in order to design a custom strategy. The program can itself generate
customizable procedures for automatic assignment of NMR data. It should be
possible to use CONTRAST and the strategies it was designed to employ for
working with any type of multidimensional NMR spectral data set (although not
all combinations of NMR spectra are likely to yield complete assignments).
The CONTRAST program was designed to be an in-house
research tool and not a commercial package. We have successfully applied the
program to many real and synthesized NMR data sets, but we are always careful
to check all results. We provide no warranty or guarantee of its performance.
Use the program at your own risk.
Software Licensing and Installation
2.1 How to Obtain the Program
The CONTRAST executable can be downloaded from the
Spectrum Research website (www.specres.com/download.asp) or a demo CD can be
requested from Spectrum Research.
2.2 Installation
The CONTRAST executable, contrast.exe, needs no
special installation. We recommend that the executable and help files (or
corresponding symbolic links) be placed in the directory that contains the
spectral data to be assigned.
If you have obtained source code for CONTRAST, the
file "contrast.c" contains all of the functions and header
information necessary to compile CONTRAST. The program was written on a Silicon
Graphics Indigo workstation, but since all but a few minor functions are
implemented using ANSI C, the program can be ported easily to other platforms
by changing the system calls that are specific for the Silicon Graphics
platform. To compile the program, copy contrast.c to the target directory and
type:
cc -o contrast -g contrast.c -lm
at the operating system prompt. The ASCII text file,
contrast.hlp, is a crude manual for the CONTRAST program. The manual is
designed so that it can be easily searched while running CONTRAST with the
CONTRAST "page" function, which is called by typing
"ctrl-h" at a prompt or "h" at the command line. The
contrast.hlp file should be located in the same directory as the CONTRAST
executable in order to use this feature.
Getting Started
This section introduces loading spectrum files,
searching spectra, displaying the results of a search, writing the results of a
search to a file, and quitting the CONTRAST program. A simple example is given
to illustrate each point, and the use of both the command line interface and
macro files is described. The following CONTRAST commands will be described.
lf cosy.con
scan cosy (d1 <.5> 8.0 && d2 > 4.0) |results
d
btf |results > search.cosy.con
q
To run CONTRAST simply type the name of the CONTRAST
executable at the system prompt (e.g. contrast.exe). The computer's display
will be cleared, and after several lines of copyright information you will be
asked for the name of the log (starting macro) file that you wish to run. If
you want to run a session macro, then type its file name at the prompt. If your
log file name is "usr.log" (the standard session log file name),
simply press Return at the prompt. The text that appears in angle brackets in
a CONTRAST prompt is always the default value for the prompt. If you do not
already have a session macro, type a new file name at the prompt. It is
customary to use the suffix ".log" for session macros and ".mac"
for subroutine or branching macros. After the name of the log file is typed in,
the user is prompted by a '>' symbol for the next command.
The LoadFile
command (abbreviated lf) is used to
load peak list files into CONTRAST. CONTRAST peak list files are typically
named after the experiment, with the '.con' suffix appended, but
they can have any name. They must, however, adhere to the format outlined in
Section @@. The LoadFile command can
also be used to load the sequence of the protein, since the formats of the
files are similar. The following line loads the file cosy.con into the program:
> lf cosy.con
The Scan
command (abbreviated sc) is used to search peak lists. It is an extremely
versatile command and will be described in more detail in section @@. In order
to search for peaks in the COSY spectrum read into the program, the user could
type a command similar to the following:
> sc cosy (d1 <.5> 8.0 && d2 > 4.0) |results
In this example the COSY peak list is searched for
peaks in which the first dimension of each peak (d1) is within a tolerance of 0.5
units (<.5>) of 8.0 and (&&) the second dimension of each peak
(d2) is greater than (>) 4.0. The results of the search are placed in a
buffer called |results. The units of the tolerances and peak coordinates are
dependent on the units used in the input files. Since the coordinates are
typically expressed in terms of parts per million (PPM), we will assume that
input files use PPM in the rest of the manual.
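The search Boolean in this example can be read as an ordinary predicate applied to every peak in the list. The C sketch below shows that reading for a two-dimensional peak list: a peak is kept when its first coordinate lies within 0.5 of 8.0 and its second coordinate exceeds 4.0, and the indices of the kept peaks stand in for the |results buffer. The Peak struct and the function names are hypothetical, and treating a coordinate that falls exactly on the tolerance edge as a match is an assumption; this is not CONTRAST's own source code.

```c
#include <stddef.h>

/* Hypothetical two-dimensional peak: coordinates in ppm and an intensity. */
typedef struct {
    double d1;
    double d2;
    double intensity;
} Peak;

/* "x <tol> target" read as |x - target| <= tol.
   Whether CONTRAST includes the tolerance edge is an assumption. */
static int within_tol(double x, double tol, double target)
{
    double diff = x - target;
    if (diff < 0.0)
        diff = -diff;
    return diff <= tol;
}

/* Sketch of: scan cosy (d1 <.5> 8.0 && d2 > 4.0) |results
   Stores the indices of matching peaks in results[] (which must have
   room for n entries) and returns the number of matches. */
size_t scan_example(const Peak *peaks, size_t n, size_t *results)
{
    size_t count = 0;
    for (size_t k = 0; k < n; k++) {
        if (within_tol(peaks[k].d1, 0.5, 8.0) && peaks[k].d2 > 4.0)
            results[count++] = k;
    }
    return count;
}
```

For example, of the peaks (8.3, 4.5) and (7.6, 3.9), only the first would be placed in the results, because the second fails the d2 > 4.0 test.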
The display command (abbreviated 'd') is used to
examine the contents of CONTRAST buffers. When a search is performed using the Scan command or one of several other
related commands, the results of the search are placed in a named buffer which
is added to the end of a master list of buffers. The buffers persist until the
user deletes them or quits the program. Associated with each buffer is a number
and the search Boolean that was used to create the buffer. Upon typing 'd' at
the CONTRAST command line, the program enters a crude 'display' mode that has a
unique set of subcommands for changing the way the buffers are displayed. These
subcommands are executed as each character is typed. To exit display mode type
'q' at the display command line prompt. Section @@ gives more information on
the different subcommands available within the display mode.
The buffertofile command (abbreviated 'btf') is used
to write the contents of a particular buffer to a file. In the following
example:
> btf |results >search.cosy.con
the |results buffer is written to the file search.cosy.con.
There are two pathways for exiting CONTRAST. The
quit command (abbreviated 'q') can be used to exit CONTRAST from the command
line. If CONTRAST is not at the command line, the program can be exited by
typing Ctrl-C to interrupt the action of the program followed by 'x' at the new
prompt. Typing 'q' at this new prompt causes the program to resume the action
that was interrupted by the Ctrl-C command.
Most of the commands that can be executed at the
CONTRAST command line can also be executed from a CONTRAST macro. For our
purposes a macro is an ASCII file that contains CONTRAST commands. When a macro
is executed, CONTRAST interprets each non-whitespace line as if it were typed
at the CONTRAST command line. Each line is executed serially until a quit
command is reached, until the macro branches to another macro, or until the end
of the file is reached. If the end of the file is reached the program returns
to the CONTRAST command line and waits for user input. All text in a macro
between two consecutive asterisks (**) and the next end-of-line marker is
considered to be a comment and is ignored by the program.
The five commands just described can be typed into a
file using a text editor and run as a CONTRAST macro. CONTRAST macros can be
run in several ways. A macro file can be specified at the UNIX command
line when the program is started, using the '<' sign to redirect input into
the program as follows:
CONTRAST <user.macro
Alternatively, the name of the macro can be specified
at the initial prompt by typing the name of the macro file and pressing Enter.
Macros can be launched from within other macros or from the CONTRAST command
line using the execute command (abbreviated exe).
> exe user.macro
In this case control is transferred to user.macro
until the end of the file is reached, at which point control is returned to
the calling macro or initial command line. If the macro is terminated with a
quit command, however, the CONTRAST program will be exited without returning to
the calling procedure. The branch command can be used instead of the exe
command in order to fully transfer control to the called macro.
> branch user.macro
Input File Formats
CONTRAST input files use a free format in which
blank lines are ignored and white space (any number and combination of spaces
and/or tabs) is used to delimit fields. Comments can be inserted anywhere in an
input file by prefacing the comment with double asterisks (**). All text
following the double asterisks (up to the end of the line on which they appear)
is considered to be part of the comment and is effectively ignored by CONTRAST.
Most CONTRAST input files are either a form of a spectrum file or a macro file.
In the next release of CONTRAST the user will be given the option of reading in
spectrum files in a macro format, but an understanding of the spectrum file
format is currently essential to using CONTRAST effectively.
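The free-format rules above can be illustrated with two small C helpers that a reader of such files might apply to every line before splitting it into fields: one cuts the line at the first double asterisk, and the other reports whether anything other than whitespace remains. The function names are hypothetical, and the sketch assumes that a double asterisk never occurs inside a legitimate field (resonance codes such as Hb*1 use only a single asterisk, and the single '*' peak comment marker is deliberately left in place).

```c
#include <string.h>

/* Cut a line at its first "**" comment marker, in place.
   A single '*' (the peak comment marker) is left alone. */
void strip_comment(char *line)
{
    char *mark = strstr(line, "**");
    if (mark != NULL)
        *mark = '\0';
}

/* Return 1 if the line holds nothing but spaces, tabs, or a line ending,
   so that it can be skipped like a blank line. */
int is_blank(const char *line)
{
    for (; *line != '\0'; line++) {
        if (*line != ' ' && *line != '\t' && *line != '\r' && *line != '\n')
            return 0;
    }
    return 1;
}
```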
A CONTRAST spectrum file consists of a header
followed by a peak list. The header of a spectrum file should contain
information about the spectrum. Since most of this information is the same for
all instances of a particular type of spectrum, it is usually safer to copy and
modify an existing header from a similar spectrum than to write a header from
scratch. When copying a header from the spectrum file of the same kind of
experiment it is usually only necessary to modify the number of peaks, the
tolerances, and the comments. The fields in a spectrum file must appear in the
given order. Although comments and blank lines can appear anywhere in a
spectrum file it is a good practice to settle upon and stick to a style in
order to maximize readability and to minimize the possibility of making
mistakes. As long as fields appear in the correct order, it does not matter if
they are arranged on different lines, all placed on the same
line, or some combination of the two. Because not all combinations have
been rigorously tested, however, we recommend that a format similar to the one
shown below be used. Bold print is used to show essential information which
must be included in a spectrum file, normal print is used to show optional
information, and italics is used to show those elements of optional fields that
are even more optional. The following is the file format for an n-dimensional
spectrum (with as many as C correlations) that contains i peaks.
4.2 Spectrum File Format
name
n i (qual)
comment = numCom
d1lab d1atm d1tol d1cor1 (prob1) d1cor2 (prob2) d1corC (probC)
d2lab d2atm d2tol d2cor1 (prob1) d2cor2 (prob2) d2corC (probC)
dnlab dnatm dntol dncor1 (prob1) dncor2 (prob2) dncorC (probC)
** comments
** comments
p1coord1 p1coord2 p1coord3 p1ntens
* p1comment
p2coord1 p2coord2 p2coord3 p2ntens
* p2comment
picoord1 picoord2 picoord3 pintens
* picomment
name The
name of the spectrum. The name of a CONTRAST spectrum file is generally the
spectrum name with the '.con' suffix appended to it.
n The
dimensionality of the spectrum.
i The
number of peaks in the spectrum.
(qual) An
estimation of the quality of the spectrum couched in terms of a probability. A
qual
factor of 1.0 indicates that 100% of the expected peaks will be present in the
spectrum and that very little noise (false peaks)
is present. A qual factor of 0.9
indicates that 90% of the expected peaks are
present.
comment = Text that indicates that the next field
(numCom) is the number of characters the
program should allocate for the comment associated
with each peak. 'ment =' is
italicized to indicate that only 'com' is needed to
signal that the next field is
numCom.
numCom The
number of characters that the program should allocate for the comment
associated with each peak.
d#lab The
label of the #'th dimension of the peaks in the spectrum.
d#atm The
resonance code (also called atom code) describing all of the atoms of the #'th
dimension of the peaks in the spectrum. Since some
dimensions of a spectrum
often detect several different resonances, wild
cards are frequently used in this
field. A description of resonance codes is found in
section @.@.
d#tol The
default tolerance of the #'th dimension of the peaks in a spectrum. A tolerance
is one-half of the resolution of that dimension.
d#cor## The
resonance code (also called atom code) of the #'th dimension of the ##'th
correlation in the spectrum. Correlations describe
the types of peaks that one
would expect to see in a spectrum. An HNCA spectrum,
for example, contains an
Hni,Nai,Cai correlation (amide proton, amide
nitrogen, alpha carbon) and an
Hni,Nai,Ca- correlation (amide proton, amide
nitrogen, alpha carbon from
previous residue). The last resonance code for a
given dimension will be repeated
if previous or subsequent dimensions contain more
resonance codes. A description
of resonance codes is found in section @.@.
(prob##) The estimated probability
of seeing the previous correlation in the spectrum.
Note that only the last
probability listed in a vertical column will be used to describe the
##'th correlation. Other probabilities are used only
to make the file more readable.
** Comment
markers. Comment markers indicate that the text that follows on that
line is a comment and should be ignored by the
program. Users are encouraged to
use comments to document the origin of the spectrum
files and each modification
that the files undergo. Most CONTRAST functions
that modify a spectrum or
spectrum file will append a comment to the file that
tells what was done to the file
and the date it was done.
comments Any
text that the user wants to include in the file.
p##coord# The
#'th coordinate (frequency dimension) of the ##'th peak in the spectrum
(usually in ppm units).
p##ntens The
intensity of the ##'th peak in the spectrum.
* A
special peak comment marker that causes the program to read in the comment
and associate it with the peak that the comment
follows. The 'comment =
numCom'
line described above is used to specify the maximum number of
characters that can be stored in each peak comment.
p#comment The
comment associated with the #'th peak of the spectrum.
hnca
3 4 (90)
comment length = 30
H Hni .02 Hni
N Nai .1 Nai ** Don't need to repeat last resonance code
Ca Ca .1 Cai (90) Ca- (60)
** Created 9/9/99 from hnca.ppm file.
** Comments can be inserted at any point in the file after a
** double asterisk.
8.61 114.3 180.2 100073 * peak 1
9.12 122.4 178.2 20073 * peak 2
7.43 118.9 134.2 10034.5 * peak 3
8.74 110.3 181.2 67896 * peak 4
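The peak lines in the example above can be read with a short routine. The C sketch below parses one peak line of an n-dimensional spectrum: n coordinates, then the intensity, then an optional single-asterisk peak comment that is truncated to numCom characters, as the 'comment = numCom' header line specifies. The function name and signature are hypothetical, and the sketch assumes the peak comment appears on the same line as the peak (as in the hnca example) and that any '**' comment has already been removed.

```c
#include <stdlib.h>
#include <string.h>

/* Parse "c1 c2 ... cn intensity [* comment]" for an ndim-dimensional peak.
   comment must have room for numCom + 1 characters; longer comments are
   truncated. Returns 0 on success, -1 if a coordinate or the intensity
   is missing. */
int parse_peak_line(const char *line, int ndim, double *coords,
                    double *intensity, char *comment, size_t numCom)
{
    const char *p = line;
    char *end;

    /* ndim coordinates followed by one intensity value */
    for (int k = 0; k <= ndim; k++) {
        double v = strtod(p, &end);   /* strtod skips leading whitespace */
        if (end == p)
            return -1;                /* expected a number here */
        if (k < ndim)
            coords[k] = v;
        else
            *intensity = v;
        p = end;
    }

    /* optional peak comment introduced by a single '*' */
    comment[0] = '\0';
    const char *star = strchr(p, '*');
    if (star != NULL) {
        star++;
        while (*star == ' ' || *star == '\t')
            star++;
        size_t len = strcspn(star, "\r\n");
        if (len > numCom)
            len = numCom;             /* honour the numCom limit */
        memcpy(comment, star, len);
        comment[len] = '\0';
    }
    return 0;
}
```

Applied to the first peak of the hnca example with n = 3 and numCom = 30, the sketch yields the coordinates 8.61, 114.3, and 180.2, the intensity 100073, and the comment "peak 1".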
4.5 Resonance Codes
Resonance codes are special CONTRAST words that
describe the type of atom that gives rise to an NMR signal. These codes are
sometimes called atom codes since they specify an atom type or group of atom
types. Resonance codes can contain a maximum of 4 characters with each
character describing a different aspect of an atom. If any character
representing a particular aspect is omitted then CONTRAST assumes the most
general case to hold for that aspect. For example the resonance code 'H'
contains only the atom type specifier. This resonance code thus includes all
hydrogen atoms. The resonance code 'Hb' represents all beta protons in the
protein, and the resonance code 'Hi' represents all protons on the current
residue. In this release of CONTRAST all resonance codes make reference to
amino acids in a protein or peptide. At this time there is no simple way to
refer to nucleic acids or other molecules. A list of the valid resonance code
characters grouped by the different aspects that they describe follows:
Atom Specifiers:
C Carbon atom.
N Nitrogen atom.
H Hydrogen atom.
O Oxygen atom.
P Phosphorus atom.
X Wildcard. Matches any atom type.
Q NULL. Can never match another atom type.
IntraResidue Position Specifiers:
a Alpha. Bonded to or at the alpha position in the
residue.
b Beta. Bonded to or at the beta position in the
residue.
g Gamma. Bonded to or at the gamma position in the
residue.
d Delta. Bonded to or at the delta position in the
residue.
e Epsilon. Bonded to or at the epsilon position in
the residue.
f F. Bonded to or at the F position in the residue.
z Z. Bonded to or at the Z position in the residue.
k Backbone. All backbone atoms in the residue.
s Sidechain. All sidechain atoms in the residue.
r Ring. All ring atoms in the residue.
c Carbon. Bonded to a carbon atom in the residue.
h Hydrogen. Bonded to a hydrogen atom in the
residue.
n Nitrogen. Bonded to a nitrogen atom in the
residue.
o Oxygen. The carbonyl position or bonded to an
oxygen atom in the residue.
x Wildcard. All positions within a residue.
InterResidue Position Specifiers:
- Within the previous residue.
i Within the current residue.
+ Within the next residue.
* Can be within any residue in the protein (often
from NOE).
Atom number:
0 Matches all other single character atom numbers.
1-9 This single character number is used to
distinguish between atoms at the same
position. For example two beta protons can be
distinguished by referring to one as
Hb2 and the other as Hb3.
Examples:
Cai Matches alpha carbons within the current
residue.
Hbi2 Matches the second beta proton within the
current residue.
X Matches all atoms in the protein.
X- Matches all atoms in the previous residue.
Co- Matches the carbonyl carbon of the previous
residue.
Nai Matches the amide nitrogen of the current
residue.
Q Does not match any atom in the protein.
Cs+ Matches all carbon atoms in the side chain of
the next residue.
Cxi Matches all carbon atoms in the current residue.
Hxi1 Matches all number 1 protons in the current
residue.
Hxi0 Matches all protons in the current residue.
Hb*1 Matches all number 1 beta protons in the entire
protein.
Hn* Matches all amide protons in the protein.
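The matching rules for resonance codes can be summarized as follows: each aspect (atom, position, residue, number) must agree unless it is omitted in either code or given as that aspect's wildcard, and Q never matches anything. The C sketch below implements that reading. It assumes the first character of a code is always the atom specifier and classifies the remaining characters by the character sets listed above; group position specifiers such as k (backbone) and s (sidechain) are compared as plain letters rather than expanded, so this is a simplification and not CONTRAST's actual matching code. All names are hypothetical.

```c
/* One character per aspect of a resonance code; '\0' means the aspect
   was omitted, which is treated as the most general case. */
typedef struct {
    char atom;  /* C N H O P; X = any atom, Q = never matches */
    char pos;   /* a b g d e f z k s r c h n o; x = any position */
    char res;   /* - i +; * = any residue */
    char num;   /* 1-9; 0 = any atom number */
} ResCode;

/* Assumes the first character is the atom specifier and classifies the
   remaining (at most three) characters by character set. */
static ResCode parse_code(const char *s)
{
    ResCode rc = {'\0', '\0', '\0', '\0'};
    for (int k = 0; k < 4 && s[k] != '\0'; k++) {
        char ch = s[k];
        if (k == 0)
            rc.atom = ch;
        else if (ch == '-' || ch == 'i' || ch == '+' || ch == '*')
            rc.res = ch;
        else if (ch >= '0' && ch <= '9')
            rc.num = ch;
        else
            rc.pos = ch;
    }
    return rc;
}

/* Two aspects agree if either is omitted, either is the wildcard,
   or they are the same character. */
static int aspect_match(char a, char b, char wild)
{
    return a == '\0' || b == '\0' || a == wild || b == wild || a == b;
}

/* Return 1 if resonance codes a and b can refer to the same atom.
   Simplification: group positions (k, s, r, c, h, n, o) are compared as
   plain letters rather than expanded into the positions they cover. */
int codes_match(const char *a, const char *b)
{
    ResCode p = parse_code(a);
    ResCode q = parse_code(b);
    if (p.atom == 'Q' || q.atom == 'Q')
        return 0;
    return aspect_match(p.atom, q.atom, 'X') &&
           aspect_match(p.pos, q.pos, 'x') &&
           aspect_match(p.res, q.res, '*') &&
           aspect_match(p.num, q.num, '0');
}
```

With these rules, codes_match("Hb*1", "Hb+1") is true, while codes_match("Cai", "Ca-") is false because the residue specifiers differ.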
CONTRAST sequence files follow the same general
format as spectrum files and are read into the program with the same command, LoadFile (abbreviated lf). Sequence files are one-dimensional
spectrum files in which the name of the spectrum is 'sequence' and the
"peak comments" are amino acid names. The next section shows a
schematic of a sequence file. Bold print is used to show essential information
which must be included in a sequence file, normal print is used to show
optional information, and italics is used to show those elements of optional
fields that are even more optional. The following is the file format for a
sequence file for a protein that contains i amino acids in the sequence.
sequence
1 i
comment = lenAA
lab Q qual
** comments
** comments
1 prob1 * AAname1
2 prob2 * AAname2
i probi * AAnamei
sequence Indicates that the file is a sequence file.
1 The dimensionality of the file. Sequence files can
make use of more dimensions to
associate sequence positions with additional
numerical information.
i
The number of residues in the sequence.
comment = Text that indicates that the next field
(lenAA) is the number of characters the
program should allocate for the amino acid names. 'ment =' is italicized to indicate
that only 'com' is needed to signal that the next
field is 'lenAA'.
lenAA
The maximum number of characters used in residue names.
lab
Label to be used to identify sequence position numbers.
Q
'Q' = NULL place holder.
qual
Quality of sequence determination (usually 1.0).
** Comment
markers. Comment markers indicate that the text that follows on that
line is a comment and should be ignored by the
program. Users are encouraged to
use comments to document the origin of the sequence
files and each modification
that the files undergo. Most CONTRAST functions that
modify a sequence or
spectrum file will append a comment to the file that
tells what was done to the file
and the date it was done.
comments Any
text that the user wants to include in the file.
1,2,...,i
Sequence position numbers. If there is ambiguity about the type of residue at a
sequence position, the sequence position number can
be repeated at the end of the
file with alternative residue types. The probability
value for the sequence position
should reflect this ambiguity.
prob#
Probability that the #'th sequence position contains that residue type.
AAname#
Name of the amino acid at the #'th sequence position. The name can be in any
desired format as long as the format matches that
used elsewhere in the program.
One-letter abbreviations, three-letter
abbreviations, and the entire names of the
standard 20 amino acids are understood and
interconverted by CONTRAST.
The following is the sequence file for a
hexapeptide. The third residue of the sequence is ambiguous and is thought to
be either a glutamate or a glutamine residue.
seq
1 6
# Q 0.9
** Hex1 hexapeptide sequence.
** 9/9/99 by Fred
1 1 * Ala
2 1 * V
3 .6 * Q
4 1 * A
5 1 * Serine
6 1 * t
3 .4 * E
** Note that the id of residue 3 is ambiguous.
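The interconversion of residue names mentioned above amounts to a lookup table. The C sketch below maps a one-letter, three-letter, or full name (compared case-insensitively, so the 't' in the hexapeptide file is read as threonine) to a one-letter code and back to the three-letter form. The table layout and function names are hypothetical, and the full-name spellings, in particular "Aspartate" and "Glutamate", are assumptions; this guide does not list the exact full names that CONTRAST accepts.

```c
#include <ctype.h>
#include <stddef.h>

typedef struct {
    char one;           /* one-letter code */
    const char *three;  /* three-letter code */
    const char *full;   /* full name (spellings are assumptions) */
} AminoAcid;

static const AminoAcid AMINO_ACIDS[20] = {
    {'A', "Ala", "Alanine"},    {'R', "Arg", "Arginine"},
    {'N', "Asn", "Asparagine"}, {'D', "Asp", "Aspartate"},
    {'C', "Cys", "Cysteine"},   {'Q', "Gln", "Glutamine"},
    {'E', "Glu", "Glutamate"},  {'G', "Gly", "Glycine"},
    {'H', "His", "Histidine"},  {'I', "Ile", "Isoleucine"},
    {'L', "Leu", "Leucine"},    {'K', "Lys", "Lysine"},
    {'M', "Met", "Methionine"}, {'F', "Phe", "Phenylalanine"},
    {'P', "Pro", "Proline"},    {'S', "Ser", "Serine"},
    {'T', "Thr", "Threonine"},  {'W', "Trp", "Tryptophan"},
    {'Y', "Tyr", "Tyrosine"},   {'V', "Val", "Valine"},
};

/* Case-insensitive equality (strcasecmp is POSIX, not ANSI C). */
static int equal_nocase(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* One-letter code for any accepted form of a name, or '\0' if unknown. */
char aa_one_letter(const char *name)
{
    for (size_t k = 0; k < 20; k++) {
        const AminoAcid *aa = &AMINO_ACIDS[k];
        char one[2] = {aa->one, '\0'};
        if (equal_nocase(name, one) || equal_nocase(name, aa->three) ||
            equal_nocase(name, aa->full))
            return aa->one;
    }
    return '\0';
}

/* Three-letter form of any accepted name, or NULL if unknown. */
const char *aa_three_letter(const char *name)
{
    char one = aa_one_letter(name);
    for (size_t k = 0; k < 20; k++) {
        if (AMINO_ACIDS[k].one == one)
            return AMINO_ACIDS[k].three;
    }
    return NULL;
}
```

Normalizing every name to one form before comparing is what allows "Ala", "A", and "alanine" in different files to refer to the same residue type.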
Macro files are ASCII files that contain a list of
valid CONTRAST commands. The format for CONTRAST macro files is open and very
simple. The only general requirements are that lines must be less than 1000
characters long, and lines cannot contain more than one CONTRAST command. If a
line contains more than one command, the second command is generally ignored
without causing a problem, but sometimes it can interfere with the
first command.
Each command has its own required format, but a few
general rules apply to all CONTRAST commands:
1. Their first non-whitespace character must be the
beginning of the command name. Leading whitespace is ignored.
2. Command names can be typed in as abbreviations,
complete command names, or any partial command name in between (eg. 'q', 'qu',
'quit', and 'quitcontrastnow' will all quit CONTRAST).
3. Command names are case independent. (eg. 'q' and
'Q' will quit CONTRAST).
4. A command's fields are all delimited by
whitespace (tabs and spaces).
5. The '->' marker can be used at the end of a
line to indicate that the command is continued on the next line.
6. The '**' marker (comment marker) will cause the
program to ignore the rest of the line.
7. All variables (marked by the '&' prefix)
contained in a command are replaced by the values or text strings that they
contain before the command is interpreted. Thus variables can be substituted
for command names and/or command fields.
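Rules 2 and 3 can be sketched as a case-insensitive lookup against a command table. This guide does not state the exact matching rule, so the C sketch below assumes one that reproduces its examples: a token selects a command if it equals the documented abbreviation, or if it is at least as long as the abbreviation and agrees with the full command name over their common length. Under this rule 'q', 'qu', 'quit', and 'quitcontrastnow' all select quit. The table contains only commands whose abbreviations appear in this guide; the names and the rule itself are assumptions.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *name;    /* full command name */
    const char *abbrev;  /* documented abbreviation */
} Command;

/* Commands whose abbreviations appear in this guide. */
static const Command COMMANDS[] = {
    {"quit", "q"},    {"loadfile", "lf"},      {"scan", "sc"},
    {"display", "d"}, {"buffertofile", "btf"}, {"execute", "exe"},
};

static int equal_nocase(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Assumed rule: exact abbreviation, or at least as long as the
   abbreviation and agreeing with the full name over the shorter length. */
static int command_matches(const char *token, const Command *cmd)
{
    if (equal_nocase(token, cmd->abbrev))
        return 1;
    if (strlen(token) < strlen(cmd->abbrev))
        return 0;
    for (size_t k = 0; token[k] != '\0' && cmd->name[k] != '\0'; k++) {
        if (tolower((unsigned char)token[k]) !=
            tolower((unsigned char)cmd->name[k]))
            return 0;
    }
    return 1;
}

/* Full name of the first matching command, or NULL if none matches. */
const char *lookup_command(const char *token)
{
    for (size_t k = 0; k < sizeof COMMANDS / sizeof COMMANDS[0]; k++) {
        if (command_matches(token, &COMMANDS[k]))
            return COMMANDS[k].name;
    }
    return NULL;
}
```

Requiring at least the abbreviation's length keeps a single 'l' from selecting loadfile, while 'lo' or 'LoadFile' still do.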
Checking Input Files
CONTRAST input files should all be carefully checked
before beginning a CONTRAST run. If the input spectra are not referenced
correctly or if the peaks in the input spectra do not "line up", then
this problem must be dealt with before proceeding with making assignments. The
following macro provides a simple way to check the alignment of input spectra.
**Macro template for checking the alignment of i input spectra.
**NOTE: Make sure tolerances are conservative (large).
lf spec1.con ** Load input spectrum 1.
lf spec2.con ** Load input spectrum 2.
lf speci.con ** Load input spectrum i.
contrace 1, >contrace.mac ** Automatically build spin systems.
dtf >display.out ** Save internal buffers to file.
q ** Quit.
The Contrace
function automatically finds the best way to correlate the input spectra. In
this example it uses the first input spectrum as the starting point for
searches. (The command "contrace 2, >contrace.mac" specifies that
the second input spectrum be used as the starting point for searches.) The spectrum
specified to be the starting point is called the source spectrum, and for the
purposes of checking spectral correlation, the source spectrum should be
the spectrum with the most reliable referencing that overlaps the most with the
other spectra. If you are unsure of which spectrum to designate as the source
spectrum, don't specify a source (contrace >contrace.mac) and Contrace will determine a good source
spectrum for you. The Contrace
function and the macro it generates will be described in more detail in the
next two sections.
The file ('display.out') created by running a macro
similar to that shown above can be examined to determine if there are any
problems with the input spectra. A simplified example of 'display.out' contents
is shown below:
hnco Hn_N_hnca Hn_N_hncoca Hn_N_tocsy ... hnco Hn_N_hnca ...
----- --------- ----------- ---------- ----- ---------
peak1 peak18 peak100 ... peak2 peak149 ...
peak34 peak23 ... ...
peak190 ... ...
The buffers in the file are organized into repeating
groups (fragments) based on the peaks of the source spectrum which in this case
is hnco. Each fragment starts with the source buffer and ends right before the
next source buffer. Each buffer following the source buffer is named by
prefixing the name of the searched spectrum with the resonances used in the
search (for example, Hn_N_hnca). The peaks found in
each buffer are all the peaks that matched the given resonances within a
specified tolerance. It is not unusual for several peaks to be missing in a
spectrum and thus for several buffers to be empty, but if very few of a
spectrum's buffers contain peaks that correlate well to the peak in the source
buffer, then there is a problem. Either the tolerances used are too small or
there is a problem with the spectrum. Problems often arise from using the
wrong magnitude or sign for the sweep width when referencing. If this is the
case the resonances near the center of that dimension's spectrum will often
match but the resonance frequencies towards the edges of the dimension will be
off by a considerable amount.
After major referencing problems have been
corrected, attention should be given to choosing the best tolerances possible.
Ideal tolerances are as small as possible, but not so small that legitimate
correlations fall outside the tolerance range. It is helpful to compute the
differences between correlated resonances for a large number of fragments
to get a good feel for what tolerances should be used in the spectrum files.
The sum of the tolerances for the two spectra under consideration should be larger
than most of the differences. If the average difference is not close to zero,
this could indicate another referencing problem. Referencing problems can
be corrected using the operate function (section @.@) or the set function
(section @.@), but it is not wise to use spectra to calculate assignments if
there is an unknown problem with the referencing. Several
commands in CONTRAST also calculate reference offsets automatically, the most
reliable being the align function (section @.@). Until you are familiar with
working with peak lists, however, we recommend that you use the macro described
above.
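The tolerance check described above can be sketched in C. For each fragment, the same resonance (for example the amide proton shift) is taken from the two spectra being compared; the sketch reports the mean difference, which should be close to zero when the referencing agrees, and the fraction of differences that fall within the sum of the two tolerances. The function and struct names are hypothetical, and collecting the paired values (for example from display.out) is left to the reader.

```c
#include <stddef.h>

typedef struct {
    double mean;         /* average of a[k] - b[k]; near 0 if referencing agrees */
    double frac_within;  /* fraction with |a[k] - b[k]| <= tol_a + tol_b */
} DiffStats;

/* a[k] and b[k] are the same resonance measured in two spectra for
   fragment k; tol_a and tol_b are the tolerances of that dimension. */
DiffStats compare_correlated(const double *a, const double *b, size_t n,
                             double tol_a, double tol_b)
{
    DiffStats s = {0.0, 0.0};
    size_t within = 0;

    if (n == 0)
        return s;
    for (size_t k = 0; k < n; k++) {
        double diff = a[k] - b[k];
        double mag = diff < 0.0 ? -diff : diff;
        s.mean += diff;
        if (mag <= tol_a + tol_b)
            within++;
    }
    s.mean /= (double)n;
    s.frac_within = (double)within / (double)n;
    return s;
}
```

A mean of 0.25 ppm with almost no differences inside the summed tolerances, for example, would point to a referencing offset rather than to tolerances that are too small.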
Arithmetic Expressions and Booleans
Arithmetic expressions and Booleans must be able to
access many different fields within the major data structures of the CONTRAST
program. The sometimes combinatorial and sometimes synchronous nature of
assignment algorithms adds to the complexity of the syntax of these
expressions. This section first describes the system used for accessing
CONTRAST's variables and data structures; next it describes CONTRAST arithmetic
expressions; and finally it describes CONTRAST Boolean expressions.
CONTRAST accesses three kinds of data which we will
refer to as lists: spectra, buffers, and files. Spectra and buffers can be
thought of as lists of peaks while files are lists of the lines of text that
make up the file.
6.1.1 Spectrum Data Structures
A spectrum is a CONTRAST spectrum file that has been
read into memory by the program. It consists of the header information, peak
list, and any other information that becomes associated with the spectrum
during the course of the CONTRAST session. Outside of arithmetic expressions
and Booleans, spectra can be specified by name or by the cardinal number that
corresponds to their position in the sequence of spectra read into CONTRAST.
Within arithmetic expressions or Booleans, however, the name or number of the
spectrum must be preceded by the spectrum symbol '$'. Examples are:
1 The first spectrum loaded.
$2 The second spectrum loaded.
cosy The spectrum named cosy.
hnca The spectrum named hnca.
Different fields within a spectrum are referred to
by single character abbreviations preceding the spectrum symbol ('$'). If there
are several fields of the same type (eg. dimensions in a spectrum) then a digit
is appended to the abbreviation. The following is a partial list of the
spectral fields that can be accessed using this method.
6.1.2 Fields of a Spectrum
di The coordinate of dimension i (where i = 1 to the
number of dimensions)
i The intensity of a peak. (Note: d0 = i)
c The comment associated with a peak.
C The numeric value of the comment associated with a
peak.
N Variable associated with the spectrum.
X Variable associated with the spectrum.
l The level (a variable) of the spectrum.
m The number of dimensions of a spectrum.
k The number of buffers associated with each peak.
w Current printed column width.
ti The tolerance for dimension i.
# The number of peaks.
The following examples show how different fields of
a COSY spectrum (the third spectrum read into the CONTRAST program) are
specified.
Examples
d1$cosy The frequency of the first dimension of a
peak.
c$3 The comment associated with a peak.
l$cosy The level of the COSY spectrum
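The naming scheme above (a one-character field, an optional dimension number, a list symbol, and a list name or number) is regular enough to parse mechanically. The C sketch below splits references such as d1$cosy or c$3 into those parts; it also accepts the '|' buffer and '>' file prefixes used elsewhere in this guide and returns any text after a comma as an unparsed peak specifier. The struct, the function name, and the 31-character limit on names are assumptions for illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    char field;        /* e.g. 'd', 'c', 'l', 'i', '#' */
    int index;         /* dimension number, or -1 when absent */
    char list;         /* '$' spectrum, '|' buffer, '>' file */
    char name[32];     /* list name or number, as text */
    const char *spec;  /* text after ',' (peak specifier), or NULL */
} FieldRef;

/* Parse a reference such as "d1$cosy". Returns 0 on success, -1 if the
   text does not follow the field/index/list/name pattern. */
int parse_field_ref(const char *s, FieldRef *out)
{
    const char *p = s;

    if (!isalpha((unsigned char)*p) && *p != '#')
        return -1;
    out->field = *p++;

    out->index = -1;                      /* optional dimension number */
    if (isdigit((unsigned char)*p)) {
        out->index = 0;
        while (isdigit((unsigned char)*p))
            out->index = out->index * 10 + (*p++ - '0');
    }

    if (*p != '$' && *p != '|' && *p != '>')
        return -1;
    out->list = *p++;

    size_t len = strcspn(p, ",");         /* name runs up to a comma */
    if (len == 0 || len >= sizeof out->name)
        return -1;
    memcpy(out->name, p, len);
    out->name[len] = '\0';
    out->spec = (p[len] == ',') ? p + len + 1 : NULL;
    return 0;
}
```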
6.1.3 Buffer Data Structures
Buffers are internal working lists which contain
peaks and any information associated with those peaks. Peaks are generally
added to buffers by performing searches of spectra or other buffers. Multiple
buffers are stored in the program in a linear list. Buffers can be added to and
deleted from the program's linear list of buffers just as peaks can be added to
and deleted from individual buffers. Peaks from multiple spectra can be added
to a single buffer. The command line designation of a buffer is its name or its
position number in the list of buffers, preceded by the '|' symbol (e.g.
|hncoBuff or |1). Buffer names should be alphanumeric, although the # and @ characters can
be used in special cases. Buffer names beginning with "|@" (e.g. |@hnca) must refer to buffers that
are not linked to a particular peak in a source spectrum. Each peak in a buffer
can have associated with it, in addition to all of the original information
associated with it in the spectrum, the following fields (pieces of
information).
# Number. The number of peaks in the buffer.
v Value. The first coordinate that wasn't matched in
the search.
t Tolerance. The tolerance of that value's
dimension.
n N. Integer variable.
x X. Real variable.
r Repeats. The number of different instances of that
value in the buffer within that value's tolerance.
c Comment. The text comment associated with the
peak.
C Comment number. The numeric value of the comment
associated with the peak.
di Dimension i. The frequency of dimension i.
D Deviation. Score between numDims*0.2 and numDims*1.2
that rates how close the peak is to the target(s), where numDims*1.2 is the
value of the best deviation (closest match) and numDims*0.2 is the worst
deviation value (on the edge of the tolerance ranges).
s Score. Used by several routines to determine the
rank of the peaks.
l Level. General purpose progress and scoring
variable for the peak.
w wLevel. General purpose progress and scoring
variable for the whole buffer.
ASCII files can be accessed directly by the CONTRAST
program. File names are specified with the '>' prefix (e.g.
>filename.txt). Fields in a file are delimited by white
space (spaces and tabs). Each field in a line is considered a dimension of that
line and uses the same 'di' convention used by spectra and buffers. For example
d3>filename.txt = "See" for the line, "See Spot. See Spot
run." CONTRAST uses the same conventions for specifying a line or range of
lines in a file as it does the peaks in a spectrum or buffer.
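A minimal Python sketch of the file-field convention, assuming fields are split on any run of spaces or tabs and numbered from 1. The helper `file_field` is hypothetical, not a CONTRAST function; its optional `first`/`last` arguments model a line range such as d1>fred,4+.

```python
def file_field(text, i, first=1, last=None):
    """Return field d<i> of lines first..last (1-based, inclusive).

    Fields are separated by any white space (spaces or tabs), as described
    above. A line with fewer than i fields raises IndexError in this sketch.
    """
    lines = text.splitlines()
    last = len(lines) if last is None else last
    return [line.split()[i - 1] for line in lines[first - 1:last]]


# d3 of the single line "See Spot. See Spot run." is "See"
print(file_field("See Spot. See Spot run.", 3))
```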
Peaks or lines are specified by suffixes added to
the field and list descriptors after a comma. Either a single peak (line) or a
range of peaks (lines) can be referenced. If no peak or line is specified then
the entire range is assumed. Boolean expressions will go through every peak or
line in a range and evaluate the value of the expression automatically. The
following is a list of peak specifiers.
6.2.1 Peak Specifiers
,i The i'th peak or line in a list.
,H The peak or line with the highest specified field
value.
,L The peak or line with the lowest specified field
value.
,b The first peak or line in a list.
,fi The first i peaks or lines in a list.
,e The last peak or line in a list.
,li The last i peaks or lines in a list.
,i-j The i'th peak or line through the j'th peak or
line in a list.
,i+ The i'th peak or line through the last peak or
line in the list.
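The specifier grammar above can be sketched in Python as a resolver that turns a suffix into 1-based positions. This is an illustration only: `resolve` is a hypothetical name, `values` stands for whichever field is ranked by H or L, and a range followed by H or L (as in ,f4H in the examples below) is modeled as picking the extreme value within that range.

```python
import re

# One optional range part followed by an optional H/L selector.
SPEC = re.compile(
    r"^(?:(?P<b>b)|(?P<e>e)|f(?P<f>\d+)|l(?P<l>\d+)"
    r"|(?P<lo>\d+)-(?P<hi>\d+)|(?P<start>\d+)\+|(?P<i>\d+))?(?P<hl>[HL])?$"
)


def resolve(spec, values):
    """Return the 1-based positions selected by a suffix such as ',f4' or ',2-5'."""
    n = len(values)
    m = SPEC.match(spec.lstrip(","))
    if m is None:
        raise ValueError(f"unrecognized peak specifier: {spec!r}")
    g = m.groupdict()
    if g["b"]:
        pos = [1]                                      # first peak
    elif g["e"]:
        pos = [n]                                      # last peak
    elif g["f"]:
        pos = list(range(1, min(int(g["f"]), n) + 1))  # first i peaks
    elif g["l"]:
        pos = list(range(max(n - int(g["l"]), 0) + 1, n + 1))  # last i peaks
    elif g["lo"]:
        pos = list(range(int(g["lo"]), min(int(g["hi"]), n) + 1))  # i-j
    elif g["start"]:
        pos = list(range(int(g["start"]), n + 1))      # i through the end
    elif g["i"]:
        pos = [int(g["i"])]                            # the i'th peak
    else:
        pos = list(range(1, n + 1))                    # no range: whole list
    if g["hl"] == "H":
        pos = [max(pos, key=lambda p: values[p - 1])]
    elif g["hl"] == "L":
        pos = [min(pos, key=lambda p: values[p - 1])]
    return pos


intensities = [5, 9, 2, 7, 3, 8]
print(resolve(",f4H", intensities))  # highest intensity among the first four
```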
6.2.2 Examples
i|fred,f4H the highest intensity of the first 4
peaks in buffer fred.
i$fred,f4 the intensity of the first four peaks in
spectrum fred.
i|fred,l4 the intensity of the last four peaks in
buffer fred.
d1>fred,4+ the first field of the fourth through
the last lines in file fred.
i|fred,2-5 the intensity of the second through fifth
(inclusive) peaks in buffer fred.
v|fred,H the value of the highest valued peak in
buffer fred.
s|fred,L the lowest score in buffer fred.
C|fred,e the numeric part of the comment from the
last peak in buffer fred.
c|fred,1 the comment text string from the first peak
in buffer fred.
D|fred,b the deviation of the first peak in buffer
fred.
#$fred the number of peaks in spectrum fred.
w$fred the column width of the spectrum fred.
CONTRAST arithmetic expressions are straightforward.
They can appear in most CONTRAST expressions in which a variable or parameter
is set to a discrete value. In Boolean expressions they can operate on sets and
ranges of values as long as each term of the Boolean contains at most one
variable. If a range is specified for a simple arithmetic expression, the
function always uses the highest value in the range for the calculation.
CONTRAST arithmetic expressions follow the standard order of mathematical operations,
but the order can be controlled with parentheses, which may be nested.
White space within an arithmetic expression is optional
except in a few situations; in particular, the '+' and '-' operators should be
preceded by white space if they immediately follow a list expression. A
list of arithmetic and text string operators follows. The accompanying examples
assume the following: #|hnca = 2, d1|cosy,1 = 8.5, and c|hnca,1 =
"His23Ca2". Boolean operators will be discussed in the next section.
6.3.1 Arithmetic Operators
+ Addition 4 + #|hnca = 6
- Subtraction d1|cosy,1 - 2 = 6.5
/ Division 10/4 = 2.5
* Multiplication #|hnca*d1|cosy,1 = 17
^ To the power of 4 ^ 3 = 64
% Modulus 5 % #|hnca = 1
sin Sine (in degrees) sin(90) = 1
cos Cosine (in degrees) cos(90) = 0
tan Tangent (in degrees) tan(180) = 0
log Logarithm base ten log(1) = 0
ln Natural logarithm ln(d1|cosy,1) = 2.14
6.3.2 Text Operators
vali(text)
The ith numeric part of text.
val2("fr2ed4.1") = 4.1
+ Union "fred" + "ted" =
"fredted"
^ Intersection "fred" ^ "ted" =
"ed"
- Delete Intersection "fred" -
"ted" = "fr"
* Number of Intersections "freded" *
"ed" = 2
/ Remove Characters "fred" /
"det" = "fr"
% Remove all but characters "fred" %
"det" = "ed"
6.3.3 Example Arithmetic Expressions
(#|hnca*(d1|cosy,1 + .5))+2 = 20
val2(c|hnca,1) * 10 = 20
C|hnca,1 - 3 = 20
10 * (c|hnca,1 * "2") = 20
val1(c|hnca,1 - "2") = 3
cos( val1(c|hnca,1/"ABC")-52) = -1
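The string operators are easier to pin down with a small Python model checked against the examples above. It is a sketch, not CONTRAST's implementation: '^' is read as the longest common substring (an assumption), and '/' and '%' are treated as case-insensitive, which is what the last example requires (removing "ABC" from "His23Ca2" must leave "His232", so that 232 - 52 = 180 and cos(180) = -1).

```python
import math
import re
from difflib import SequenceMatcher

NUMBER = re.compile(r"\d+(?:\.\d+)?|\.\d+")


def val(text, i):
    """vali(text): the i-th (1-based) numeric part of text."""
    return float(NUMBER.findall(text)[i - 1])


def union(a, b):
    """'+': per the example above, plain concatenation."""
    return a + b


def intersection(a, b):
    """'^': read here as the longest common substring (assumption)."""
    m = SequenceMatcher(None, a, b).find_longest_match(0, len(a), 0, len(b))
    return a[m.a:m.a + m.size]


def delete_intersection(a, b):
    """'-': remove every occurrence of the intersection from a."""
    common = intersection(a, b)
    return a.replace(common, "") if common else a


def count(a, b):
    """'*': number of non-overlapping occurrences of b in a."""
    return a.count(b)


def remove_chars(a, b):
    """'/': drop each character of a that appears in b (case-insensitive, inferred)."""
    drop = set(b.lower())
    return "".join(ch for ch in a if ch.lower() not in drop)


def keep_chars(a, b):
    """'%': keep only characters of a that appear in b (case-insensitive, inferred)."""
    keep = set(b.lower())
    return "".join(ch for ch in a if ch.lower() in keep)


comment = "His23Ca2"  # the value assumed for c|hnca,1 above
print(math.cos(math.radians(val(remove_chars(comment, "ABC"), 1) - 52)))
```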
6.4 Boolean Expressions
Booleans are expressions that reduce to 1 (meaning
true) or 0 (meaning false). Many different CONTRAST functions use Boolean
expressions to determine whether or not the function will be executed for a
particular value, peak, or line. CONTRAST uses a versatile Boolean format that
allows sets, ranges, "boxes", and variables to be coded into an
expression so that one expression can be evaluated for many different
arrangements of data.
Boolean expressions are always enclosed
in parentheses (). If a command contains both a Boolean expression and a
separate mathematical expression that uses parentheses, the Boolean expression
must be listed first. In the following example the Boolean is
"(d1|hnca>3)".
set level |hnca (d1|hnca>3) += (47 / i|hnca)
The Boolean in the preceding example is
straightforward. The level of each peak in the hnca buffer whose d1 value is
greater than 3 is incremented by 47 divided by the intensity of that peak.
Since no specific peak in the hnca buffer is specified, the Boolean is
evaluated for each peak in the buffer. The levels of only those peaks for which
the Boolean evaluates to 'true' are incremented.
CONTRAST Booleans can combine an unlimited number of
expressions by using the conjunctions '||' (or) and '&&' (and). For
instance the following command uses a
Boolean composed of three parts.
set level |hnca ( l|hnca = 2 || (d1|hnca>3
&& d2|hnca <= 9) ) += (47 / i|hnca)
In this Boolean the level of an HNCA peak will be
incremented if the peak's level is currently equal to 2 or ('||') if the d1
value of the peak is greater than 3 and ('&&') the d2 value of the peak
is less than or equal to 9. Note that expressions must be combined with
conjunctions. Expressions such as " x > y > z " are not
permitted in CONTRAST. Note also that some CONTRAST functions have not yet been
implemented with "short-circuit logic". Short circuit logic allows
the program to skip evaluating the rest of a Boolean when the expression is
guaranteed to evaluate to true or false. In the above example if the level of
an HNCA peak is equal to 2, then the full Boolean is guaranteed to evaluate to
true so the program does not need to continue by testing the d1 and d2 values
of the peak. Since several functions, including the set function, do not use short-circuit logic, we recommend that the
user avoid writing Booleans that rely on this feature.
CONTRAST Booleans often compare values from different
lists. These comparisons can be made synchronously or combinatorially. The
preceding example used a synchronous mechanism for making comparisons: it was
understood that each time the hnca buffer was referenced in the Boolean,
it referred to the same peak. The following Boolean also uses a synchronous
mechanism, but this time it is not so obvious.
set level |fred (d1|fred,f5 > 3 &&
d1|tom,f5 <= 8) += 2
In this example when the first peak of buffer fred
is being compared to 3, the first peak of buffer tom is being compared to 8,
then the second peaks in each buffer are compared, the third, and so on. The
above expression is equivalent to the following 5 expressions.
set level |fred (d1|fred,1 > 3 &&
d1|tom,1 <= 8) += 2
set level |fred (d1|fred,2 > 3 &&
d1|tom,2 <= 8) += 2
set level |fred (d1|fred,3 > 3 &&
d1|tom,3 <= 8) += 2
set level |fred (d1|fred,4 > 3 && d1|tom,4
<= 8) += 2
set level |fred (d1|fred,5 > 3 &&
d1|tom,5 <= 8) += 2
Synchronous expressions are signaled by using double
conjunctions or operators. If a single '&' symbol had been used, a
combinatorial comparison would have been performed. The following is an example
of the use of a combinatorial conjunction.
set level |fred (d1|fred,f2 > 0 & d1|tom,f3
<= 8) += 10
In this example each of the first 2 peaks in fred is
compared to zero once for each of the first three peaks in tom. In this case
the level of one of the peaks in fred can be incremented by as much as 30 (3 *
10). This expression is equivalent to the following 6 commands.
set level |fred (d1|fred,1 > 0 & d1|tom,1
<= 8) += 10
set level |fred (d1|fred,1 > 0 & d1|tom,2
<= 8) += 10
set level |fred (d1|fred,1 > 0 & d1|tom,3
<= 8) += 10
set level |fred (d1|fred,2 > 0 & d1|tom,1
<= 8) += 10
set level |fred (d1|fred,2 > 0 & d1|tom,2
<= 8) += 10
set level |fred (d1|fred,2 > 0 & d1|tom,3
<= 8) += 10
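The difference between '&&' and '&' comes down to how index pairs are generated. The Python sketch below models the two fred/tom commands above; `expand` and `set_level` are hypothetical names, and pairing only up to the shorter list in the synchronous case is an assumption based on the '<<=' example later in this section.

```python
from itertools import product


def expand(n_left, n_right, synchronous):
    """1-based (left, right) index pairs produced by one conjunction."""
    if synchronous:
        # '&&': the i-th left peak is paired only with the i-th right peak;
        # peaks beyond the shorter list are never used (assumption)
        return [(i, i) for i in range(1, min(n_left, n_right) + 1)]
    # '&': every left peak is paired with every right peak
    return list(product(range(1, n_left + 1), range(1, n_right + 1)))


def set_level(levels, fred_d1, tom_d1, n_fred, n_tom, synchronous, increment):
    """Model of: set level |fred (d1|fred,fN > 0 <conj> d1|tom,fM <= 8) += increment."""
    levels = list(levels)  # work on a copy
    for i, j in expand(n_fred, n_tom, synchronous):
        if fred_d1[i - 1] > 0 and tom_d1[j - 1] <= 8:
            levels[i - 1] += increment
    return levels


# With '&', each of the first two fred peaks can gain up to 3 * 10 = 30.
print(set_level([0, 0], [4.0, 7.5], [1.0, 2.0, 3.0], 2, 3, False, 10))
```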
All fields in a Boolean from the same list are
automatically synchronized even if combinatorial operators are used. The
following is an example of a case in which fields are synchronized even
though a combinatorial conjunction ('&') is specified.
set level |fred (d1|fred,f2 > 0 & d2|fred,f2
<= d1|tom,f3) += 10
This expression is equivalent to the following 6
commands.
set level |fred (d1|fred,1 > 0 & d2|fred,1
<= d1|tom,1) += 10
set level |fred (d1|fred,1 > 0 & d2|fred,1
<= d1|tom,2) += 10
set level |fred (d1|fred,1 > 0 & d2|fred,1
<= d1|tom,3) += 10
set level |fred (d1|fred,2 > 0 & d2|fred,2
<= d1|tom,1) += 10
set level |fred (d1|fred,2 > 0 & d2|fred,2
<= d1|tom,2) += 10
set level |fred (d1|fred,2 > 0 & d2|fred,2
<= d1|tom,3) += 10
Note that the two fields of |fred are synchronized,
but that the |fred and |tom lists are compared combinatorially. In order to
synchronize d2|fred,f2 and d1|tom,f3 we must use the "synchronous
less than or equal to" operator
("<<=" or "<<=="). Doubling Boolean operator
symbols makes the two operands of the operator synchronous just as doubling
Boolean conjunctions makes the left and right hand sides of the expressions
synchronous. In the following example a synchronous operator is used to
synchronize d2|fred,f2 and d1|tom,f3 in the expression above.
set level |fred (d1|fred,f2 > 0 & d2|fred,f2
<<= d1|tom,f3) += 10
This expression is equivalent to the following two
expressions.
set level |fred (d1|fred,1 > 0 & d2|fred,1
<= d1|tom,1) += 10
set level |fred (d1|fred,2 > 0 & d2|fred,2
<= d1|tom,2) += 10
Note that the third peak in |tom is never used since
the first two peaks in |fred were specified and |tom was synchronized to |fred.
The default synchronization behavior of fields in a
Boolean can be overridden by appending "i" suffixes to the field
descriptions. The following is an example of the use of such suffixes.
set lev |fred (d1|fred,f2 > d1|tom,f2 &&
d2|fred,f2i2 = d2|tom,f3i1 && i|fred,f2i > 0) += 1
The following expressions are equivalent to the
expression above.
set lev |fred (d1|fred,1 > d1|tom,1 &&
d2|fred,1 = d2|tom,1 && i|fred,1 > 0) += 1
set lev |fred (d1|fred,1 > d1|tom,1 &&
d2|fred,1 = d2|tom,1 && i|fred,2 > 0) += 1
set lev |fred (d1|fred,1 > d1|tom,2 &&
d2|fred,2 = d2|tom,1 && i|fred,1 > 0) += 1
set lev |fred (d1|fred,1 > d1|tom,2 &&
d2|fred,2 = d2|tom,1 && i|fred,2 > 0) += 1
set lev |fred (d1|fred,2 > d1|tom,1 &&
d2|fred,1 = d2|tom,2 && i|fred,1 > 0) += 1
set lev |fred (d1|fred,2 > d1|tom,1 &&
d2|fred,1 = d2|tom,2 && i|fred,2 > 0) += 1
set lev |fred (d1|fred,2 > d1|tom,2 &&
d2|fred,2 = d2|tom,2 && i|fred,1 > 0) += 1
set lev |fred (d1|fred,2 > d1|tom,2 &&
d2|fred,2 = d2|tom,2 && i|fred,2 > 0) += 1
If all of the terms that contain field descriptions
in a Boolean are numbered from n = 1
to N, then the number n is used after
an 'i' suffix to specify the field description that the expression is
synchronized to. If no n value is
specified after an 'i' suffix, then the containing expression is made
independent (a combinatorial operation). In the above example the third field
"d2|fred,f2i2" is synchronized to the second field
"d1|tom,f2" and the fourth field "d2|tom,f3i1" is
synchronized to the first field "d1|fred,f2". The last field is
independent. If the 'i' suffix had not been added to the field description,
then the last field would have been synchronized to the first field since they
make reference to the same list.
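The grouping rules in this paragraph can be modeled in Python by giving each term either its own index or the index of the term it follows, then taking the Cartesian product over the independent terms. This is a hypothetical sketch of the field-level rules only (same-list synchronization and 'i' suffixes); with the five terms of the example it reproduces the eight expressions listed above, in the same order.

```python
from itertools import product


def expand_terms(terms):
    """terms: (field, count, sync) tuples in the order they appear in the Boolean.

    sync is an int n for an 'i<n>' suffix, "i" for a bare 'i' suffix,
    or None when the field has no suffix.
    """
    parent = {}
    for k, (field, _count, sync) in enumerate(terms, start=1):
        lst = field.split("|")[1]
        if sync == "i":
            parent[k] = k                 # bare 'i': independent index
        elif isinstance(sync, int):
            parent[k] = sync              # 'i<n>': follow term n
        else:
            # no suffix: follow the first earlier term on the same list, if any
            earlier = [j for j in range(1, k) if terms[j - 1][0].split("|")[1] == lst]
            parent[k] = earlier[0] if earlier else k

    def root(k):
        while parent[k] != k:
            k = parent[k]
        return k

    leaders = [k for k in parent if root(k) == k]
    rows = []
    for combo in product(*(range(1, terms[k - 1][1] + 1) for k in leaders)):
        index = dict(zip(leaders, combo))
        rows.append([f"{field},{index[root(k)]}"
                     for k, (field, _c, _s) in enumerate(terms, start=1)])
    return rows


# d1|fred,f2 > d1|tom,f2 && d2|fred,f2i2 = d2|tom,f3i1 && i|fred,f2i > 0
TERMS = [("d1|fred", 2, None), ("d1|tom", 2, None), ("d2|fred", 2, 2),
         ("d2|tom", 3, 1), ("i|fred", 2, "i")]
for row in expand_terms(TERMS):
    print(row)
```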
6.4.1 Boolean operators and conjunctions
> combinatorial "greater than"
>= combinatorial "greater than or equal
to"
< combinatorial "less than"
<= combinatorial "less than or equal
to"
= combinatorial "equals"
!= combinatorial "not equal"
<> combinatorial "within a tolerance of
"
>< combinatorial "outside a tolerance of
"
& combinatorial "and"
| combinatorial "or"
>> synchronous "greater than"
>>= synchronous "greater than or equal
to"
<< synchronous "less than"
<<= synchronous "less than or equal
to"
== synchronous "equals"
!!= synchronous "not equal"
<<>> synchronous "within a
tolerance of "
>><< synchronous "outside a
tolerance of "
&& synchronous "and"
|| synchronous "or"
Tolerance operators contain a tolerance value
embedded in the operator. This value can take the form of a constant, a
variable, a field, a range, a set, or a box just like normal Boolean operands.
(Sets and boxes will be described in a subsequent section.) If a field
description is used as a tolerance, it is good practice to specify synchrony
directly using the 'i' suffix unless the field makes reference to a list
referenced elsewhere in the Boolean. The following is an example of an
expression that uses tolerances.
set lev |fred (d1|fred,f2 <.02> d1|tom,f3
&& d2|fred >t|Hai,1< d2|tom) += 1
Boolean expressions can contain mathematical expressions
as well as field descriptions and constants. The only limitation is that no
term in the expression can contain more than 1 range, set, or box. The
following is an example of a Boolean expression in which arithmetic expressions
occur.
set lev |fred (cos(d1|fred,f2*2)+8 <.02> 8.2
&& val2(C|fred)+6 >t|Hai,1/2< d2|tom) += 1
If the Boolean of a command is preceded by a NOT
symbol '!', then the set of peaks or lines for which the Boolean does not
evaluate to true is operated on by the command. In this special case the NOT
symbol '!' performs a complementarity operation rather than the negation
operation that it typically performs. For example in the command
set level |hnca !(d1|hnca>3) += 10
the level of each peak in |hnca that has a d1 value
less than or equal to 3 is incremented by 10.
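When only one list is involved, as here, complement and negation select the same peaks. The Python sketch below (hypothetical names) shows where they could differ once a second list is compared combinatorially, under the assumption that a peak counts as selected when the Boolean holds for at least one of its combinations; the text does not spell out this case.

```python
from itertools import product


def selected(fred, tom, cond):
    """1-based |fred peaks for which cond(fred_value, tom_value) holds
    for at least one |tom peak (combinatorial pairing)."""
    return {i for (i, a), b in product(enumerate(fred, start=1), tom) if cond(a, b)}


def complement(fred, tom, cond):
    """'!(...)': every |fred peak that is NOT in the selected set."""
    return set(range(1, len(fred) + 1)) - selected(fred, tom, cond)


def negated(fred, tom, cond):
    """For contrast: negating each comparison instead of taking the complement."""
    return selected(fred, tom, lambda a, b: not cond(a, b))


fred, tom = [2.0, 5.0], [1.0, 4.0]
greater = lambda a, b: a > b
print(complement(fred, tom, greater), negated(fred, tom, greater))
```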
An Adaptable Fully Automated Assignment Macro
This section contains an overview of the simplest
and most automated assignment procedure available in CONTRAST. The procedure is
implemented as a simple six-part macro that can be used for most data sets with
minimal modification. The performance of the algorithm is highly dependent on
the type and quality of the data. The program always makes all possible
assignments given the input data set, even when the data is insufficient to
make an assignment. Therefore the output produced by the procedure should
always be carefully checked and the evidence for every assignment should be
examined and evaluated.
Figure 7.1 is an information flow diagram of the main
steps in the fully-automated assignment procedure. The main body of the
assignment program consists of three functions which generate CONTRAST macros
for the user (Contrace, Reside, and Overlap) and a single function (AnnBF) that generates sequential
assignments based on the output of the previous three functions. Arrows in the
diagram represent the flow of information from one function to another.
Figure 7.1
The fully-automated approach to assignments is
illustrated using sample macros written for two very different data sets. The
first macro is written for a 2D homonuclear data set consisting of three
experiments: COSY, TOCSY, and NOESY.
lf cosy.con
lf tocsy.con
lf noesy.con
lf seq.con
exe shifts.mac
contrace >contrace.mac -n -F
overlap 5 >overlap.mac
annbf 5, -l -x3
stf 5 >output.file
The next macro is written for a 3D heteronuclear
data set consisting of 9 experiments:
HNCO, HNCA, HN(CO)CA, HN(CO)CACB, HNCACB, HCACO,
HN-TOCSY-HMQC, HCCH-COSY, and HCCH-TOCSY.
lf hnco.con
lf hnca.con
lf hncoca.con
lf hncocacb.con
lf hncacb.con
lf hcaco.con
lf hntocsy.con
lf hcchcosy.con
lf hcchtocsy.con
lf seq.con
exe shifts.mac
contrace 1, >contrace.mac -n -F
overlap 1 >overlap.mac
annbf 1, -l -x3
stf 1 >output.file
A comparison of the two macros shows that the main
difference between them is the input data. The first step in both macros is to
load the data into the program. The first three lines in the 2D macro and the
first nine lines in the 3D macro simply read the peak lists into the program, and
the next line reads in the protein sequence. This step has already been
described in Section 4.
In the next step a macro is executed which contains
a database of the characteristic chemical shifts of the common amino acids.
This database is experiment independent and should contain as much information
as possible about the distribution of chemical shifts. The chemical shift
database is described in Section 8.
The next step is the heart of the CONTRAST automated
assignment procedure. The Contrace
command generates a strategy for assembling spin systems using data from the
input spectra and the chemical shift database. The strategy generated by the Contrace routine is output as a CONTRAST
macro (in the cases above named contrace.mac). The function implements the
strategy as it is being generated. The result of the function is a list of
buffers that contain the modified results of searches and other manipulations
of the data. These buffers are grouped into fragments that roughly correspond
to amino acid spin systems.
The starting point for each fragment is a peak from
a "source" spectrum. There is a one to one correspondence between the
peaks of the source spectrum and fragments. The ideal source spectrum meets all
of the following criteria:
1) The source spectrum is of high resolution and is
well-referenced.
2) The source spectrum is very complete -- very few
peaks are missing.
3) The source spectrum can be correlated to peaks
from the other spectra.
4) The source spectrum contains one correlation
(peak) per residue.
5) The source spectrum is relatively noise free;
there are very few extra peaks.
These criteria should be taken as ideals which can
be used to govern the choice of a source spectrum. They are listed in order of
decreasing importance.
In the 2D macro above the selection of the source
spectrum was left to the Contrace
function. In this case the function generally constructs a spectrum from
the Hn,Ha (fingerprint) region using peaks from the COSY and TOCSY spectra.
This spectrum is added to the list of spectra and becomes spectrum 5 (the
sequence is treated as if it were a spectrum). The references to "5"
in the following commands all refer to the newly created source spectrum. In
the 3D macro, on the other hand, the HNCO spectrum (spectrum 1) is specified to the Contrace function as the source
spectrum. If it had not been specified, a new source would have been
constructed from either the HN(CO)CA or HNCO spectrum, and any missing peaks would
have been filled in by the other spectra.
Each fragment starts off with the peak from the
source spectrum which yields the first 2 (in the case of a 2D source) or 3 (in
the case of a 3D source) assignments. A series of search and filter steps
creates additional buffers (lists of peaks) within a fragment. These buffers
are called working buffers, because they are used to build assignment buffers
which are special buffers named for the resonance assignment that they contain.
One of the chemical shift dimensions of the first peak in the assignment buffer
is the actual frequency assignment for the resonance.
The Contrace
function stops when there is an assignment buffer for each resonance mentioned
in the correlation lists of the input spectra. Generally there is not enough
information in the spectra to correctly assign all the resonances and usually
the assignments of the last assignment buffers are the most uncertain. Spin
systems are usually assigned all the way out to the epsilon position for every
residue in the protein. The fragments can be considered to be "fuzzy"
since they contain alternate assignments, and since no hard-and-fast endpoint
decisions are made at this point of the analysis.
The next step of the macros is the
"overlap" step. The Overlap
function generates what are known as overlap tests which will be used in the
sequential assignment step to score the likelihood that two fragments are
derived from sequential residues. These overlap tests are generally very
simple. They consist of commands that award points when resonances from overlapping
assignment buffers are within a specific tolerance of one another. Overlapping
assignment buffers are assignment buffers from two different fragments that are
expected to contain the same resonance. For example, the "previous Ca
buffer" generated from a peak in the HN(CO)CA spectrum should contain the
same Ca resonance as the "Ca buffer" generated from a peak in the
HNCA spectrum from the previous residue in the sequence. When NOESY spectra are
used to score for sequential fragments, working buffers containing NOESY peaks
are used in addition to assignment buffers in making overlap tests.
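An overlap test can be pictured as a small scoring function. In this Python sketch a fragment is a hypothetical dict mapping assignment-buffer names such as Ca and prevCa to chemical shifts; these names, the tolerances, and the one-point award are illustrative assumptions, not the tests that Overlap actually writes.

```python
def overlap_score(frag_i, frag_j, tolerances, points=1.0):
    """Score the hypothesis that fragment j is the residue immediately after fragment i.

    For each resonance R in tolerances, fragment j's 'prev' buffer (e.g. prevCa,
    from an HN(CO)CA peak) should hold the same shift as fragment i's own
    buffer (e.g. Ca, from an HNCA peak).
    """
    score = 0.0
    for res, tol in tolerances.items():
        own = frag_i.get(res)
        prev = frag_j.get("prev" + res)
        if own is not None and prev is not None and abs(own - prev) <= tol:
            score += points
    return score


A = {"Ca": 52.3, "Cb": 19.1, "prevCa": 58.0}
B = {"Ca": 61.2, "prevCa": 52.35, "prevCb": 19.0}
tolerances = {"Ca": 0.2, "Cb": 0.4}  # ppm, illustrative values
print(overlap_score(A, B, tolerances), overlap_score(B, A, tolerances))
```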
The next step of the automated assignment macros is
the shuffling step in which the fragments created by the Contrace function are shuffled into the correct sequential order
using the sequence of the protein. In this example the annbf (best first
simulated annealing) algorithm is used to shuffle the fragments. This function uses
the overlap tests generated by Overlap
to place fragments in the correct order, and it uses the chemical shift
database to match fragments to the correct positions in the sequence. The
shuffling routine can also use other tests for matching fragments to sequence
positions. These tests can be written by hand or automatically generated by the
Reside function. In this simple case
we do not illustrate the use of such tests, but they are often very helpful in
identifying the amino acid type of a fragment.
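To show what the shuffling step optimizes, the Python sketch below scores every ordering of a few fragments by the sum of overlap scores between neighbors plus a per-position amino acid type score, and keeps the best. It is a deliberately simple exhaustive search standing in for annbf, not the annbf algorithm; the score tables are hypothetical inputs. Because the number of orderings grows as n!, a heuristic such as simulated annealing is needed for real proteins.

```python
from itertools import permutations


def best_order(seq_score, type_score):
    """Exhaustive stand-in for the shuffling step (NOT annbf).

    seq_score[a][b]: overlap score for fragment a immediately followed by b.
    type_score[f][p]: how well fragment f matches sequence position p.
    """
    n = len(seq_score)
    best, best_total = None, float("-inf")
    for order in permutations(range(n)):
        total = sum(type_score[f][p] for p, f in enumerate(order))
        total += sum(seq_score[a][b] for a, b in zip(order, order[1:]))
        if total > best_total:
            best, best_total = order, total
    return best, best_total


# Fragment 2 overlaps fragment 0, and fragment 0 overlaps fragment 1.
seq = [[0, 2, 0],
       [0, 0, 0],
       [2, 0, 0]]
types = [[0, 0, 0] for _ in range(3)]
print(best_order(seq, types))
```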
The last step in the automatic assignment process is
to write the output of the program to a file. The function stf (shuffle to
file) writes the contents of all of the buffers that make up the fragments into
the file "output.file". The fragments are written in the sequential
order determined by the shuffling routine and are labeled with the name of the
residue and the sequence position of the corresponding amino acid in the
protein. Alternate orderings and ambiguity factors are indicated. The output
file format will be discussed in more detail in a later section.
The assignment macros shown above are the bare
minimum necessary for automated assignment. The commands shown above are
usually supplemented with other functions that provide additional scaling
information, amino acid type tests, and error checking routines. More complete
macros are distributed with the CONTRAST executable. These macros have been
annotated to document the use of the "extra" functions.
Chemical Shift Database
The CONTRAST chemical shift database is a series of
CONTRAST set shift commands that are read into the program as a CONTRAST macro.
The set shift command allows the user to set the amino acid type, atom
(resonance) type, chemical shift range and probability value for that range.
The format for the command is as follows:
set shift AAname Resonance LoChemShift [-]
HiChemShift [Prob]
AAname The name or abbreviation of the amino acid or
amino acid group for which the
chemical shift information holds. The name should
correspond to the name used in
the sequence.
Resonance The resonance code of the atom to which
the chemical shift information applies.
LoChemShift The lower bound of the chemical shift
range.
HiChemShift The upper bound of the chemical shift
range.
Prob An optional probability value between 0.0 and 1.0 for the range.
Set Shift Examples
The following group of set shift commands is an
example of a typical entry for the alpha carbon of alanine residues.
set shift A Ca 48-54
set shift A Ca 48-50 0.1
set shift A Ca 50-52 0.6
set shift A Ca 52-54 0.3
This example highlights several important points. In
the first line the entire range of allowed chemical shifts is given without a
probability value and the next three lines break up that chemical shift range
into smaller subranges that contain probability values for each subrange. This
allows CONTRAST to use the chemical shift range information in two different ways.
When probability values are given, CONTRAST uses the subranges to automatically
calculate probability-based amino acid type scores during the sequential
assignment step. Both the Contrace
and Reside functions use full ranges
that do not include probability values to perform connectivity tracing and
amino acid test generation respectively. If all set shift commands contain
probability values, then Contrace
will not use chemical shift ranges to trace spin systems and Reside will not generate amino acid tests.
If none of the set shift commands contain probability values then
probability-based amino acid type scoring will not be performed.
The algorithm that generates probability-based amino
acid type scores during sequential assignment can be used with true probability
values for the chemical shift subranges, but its performance is improved
considerably when the probability values are normalized so that the highest
probability value for each resonance is given a value of 1. With this normalization,
the preceding example would be converted to:
set shift A Ca 48-54
set shift A Ca 48-50 0.167
set shift A Ca 50-52 1.0
set shift A Ca 52-54 0.5
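The normalization can be scripted. This Python sketch rewrites a list of set shift commands so that, for each amino acid and resonance, the largest subrange probability becomes 1. It assumes non-negative shifts written as Lo-Hi or Lo - Hi, and `normalize` is a hypothetical helper rather than part of CONTRAST.

```python
import re
from collections import defaultdict

# set shift AAname Resonance LoChemShift [-] HiChemShift [Prob]
SET_SHIFT = re.compile(
    r"set\s+shift\s+(\S+)\s+(\S+)\s+"
    r"(\d+(?:\.\d+)?)\s*-?\s*(\d+(?:\.\d+)?)"
    r"(?:\s+(\d*\.?\d+))?\s*$"
)


def normalize(lines):
    """Rescale probabilities so the largest one per (amino acid, resonance) is 1."""
    parsed, best = [], defaultdict(float)
    for line in lines:
        m = SET_SHIFT.match(line.strip())
        if m is None:
            raise ValueError(f"not a set shift command: {line!r}")
        aa, res, lo, hi, prob = m.groups()
        key = (aa.lower(), res)  # amino acid names are case-insensitive
        p = None if prob is None else float(prob)
        if p is not None:
            best[key] = max(best[key], p)
        parsed.append((aa, res, lo, hi, p, key))

    out = []
    for aa, res, lo, hi, p, key in parsed:
        cmd = f"set shift {aa} {res} {lo}-{hi}"
        if p is not None and best[key] > 0:
            cmd += f" {round(p / best[key], 3)}"
        elif p is not None:
            cmd += f" {p}"  # every probability is zero: leave as given
        out.append(cmd)
    return out


for cmd in normalize(["set shift A Ca 48-54", "set shift A Ca 48-50 0.1",
                      "set shift A Ca 50-52 0.6", "set shift A Ca 52-54 0.3"]):
    print(cmd)
```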
Amino acid names used in the set shift statement
should match the amino acid names used in the input sequence file, but they
need not be limited to standard nomenclature. To distinguish a
particular residue in the sequence from other residues of the same type, simply give it a
different name. For example, two serines in the sequence could be named
"Sx" and "Sy" respectively. In this case the standard information
in the chemical shift database would no longer apply, and the user would have
to include a set of chemical shift ranges for amino acids named "Sx"
and "Sy". NOTE: The three standard names for each of the standard 20
amino acids are interconverted. For example "cysteine",
"cys", and "c" are all considered equivalent. Furthermore
amino acid names are case-insensitive so that "Cysteine",
"cysteine", "CYS", "Cys", "C", and
"c" are all considered equivalent.
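Name matching can be sketched as a lookup that folds case and maps the one-letter, three-letter, and full names of the 20 standard amino acids to a single code. The full names in this Python table are assumptions about what CONTRAST accepts, and `canonical` is a hypothetical helper; any other name (such as "Sx") is kept as its own, case-folded type.

```python
# One-letter code -> (three-letter code, full name); full names are assumed.
AMINO_ACIDS = {
    "A": ("ala", "alanine"), "R": ("arg", "arginine"), "N": ("asn", "asparagine"),
    "D": ("asp", "aspartate"), "C": ("cys", "cysteine"), "Q": ("gln", "glutamine"),
    "E": ("glu", "glutamate"), "G": ("gly", "glycine"), "H": ("his", "histidine"),
    "I": ("ile", "isoleucine"), "L": ("leu", "leucine"), "K": ("lys", "lysine"),
    "M": ("met", "methionine"), "F": ("phe", "phenylalanine"), "P": ("pro", "proline"),
    "S": ("ser", "serine"), "T": ("thr", "threonine"), "W": ("trp", "tryptophan"),
    "Y": ("tyr", "tyrosine"), "V": ("val", "valine"),
}

# Every accepted spelling, lower-cased, mapped to the one-letter code.
ALIASES = {}
for one, (three, full) in AMINO_ACIDS.items():
    for name in (one, three, full):
        ALIASES[name.lower()] = one


def canonical(name):
    """Standard names collapse to one code; other names stay distinct types."""
    key = name.lower()
    return ALIASES.get(key, key)


print(canonical("Cysteine"), canonical("CYS"), canonical("c"), canonical("Sx"))
```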
Non