
Spectrum Research, LLC.
CONTRAST
Connectivity Tracing
Assignment Tools for Automated Assignment of Protein NMR Data
User Guide
Version 2.0
Copyright
Notice
Copyright © 1996 through 2001 Spectrum Research,
LLC. All rights reserved.
No part of this document may be reproduced,
transmitted, transcribed, stored in a retrieval system, or translated into any
language in any form by any means without the written permission of Spectrum
Research, LLC. Spectrum Research, LLC.
reserves the right to change the information in this document without prior
notice.
Trademarks
Contrast
is a trademark of Spectrum Research, LLC.
Acknowledgments
Contrast
software program was developed by Drs. John Markley and John Olson at the
National Magnetic Resonance Facility located at the University of
Wisconsin-Madison. All rights, title,
and interest in Contrast are owned by
the Wisconsin Alumni Research Foundation ("WARF"). The commercial version of Contrast has been exclusively licensed
to Spectrum Research LLC by WARF.
Credits
If the results (figures and/or data) obtained by the Contrast™ application are
used for publication purposes, please refer to them in the following manner or
any other equivalent form:
"Contrast™ software, developed by
Spectrum Research, LLC., was used to compute the results in this
publication."
Chapter 1
CONTRAST is a non-graphical software tool for
automating NMR peak assignment. The program works with
NMR data in the form of ASCII lists of peak coordinates and intensities. The
program provides the user with several versatile tools for manipulating peak
lists in order to design a custom strategy. The program can itself generate
customizable procedures for automatic assignment of NMR data. It should be
possible to use CONTRAST and the strategies it was designed to employ for
working with any type of multidimensional NMR spectral data set (although not
all combinations of NMR spectra are likely to yield complete assignments).
The CONTRAST program was designed to be an in-house
research tool and not a commercial package. We have successfully applied the
program to many real and synthesized NMR data sets, but we are always careful
to check all results. We provide no warranty or guarantee of its performance.
Use the program at your own risk.
Software Licensing and Installation
2.1 How to Obtain the Program
The CONTRAST executable can be downloaded from the
Spectrum Research website (www.specres.com/download.asp) or a demo CD can be
requested from Spectrum Research.
2.2 Installation
The CONTRAST executable, contrast.exe, needs no
special installation. We recommend that the executable and help files (or
corresponding symbolic links) be placed in the directory that contains the
spectral data to be assigned.
If you have obtained source code for CONTRAST, the
file "contrast.c" contains all of the functions and header
information necessary to compile CONTRAST. The program was written on a Silicon
Graphics Indigo workstation, but since all but a few minor functions are
implemented using ANSI C, the program can be ported easily to other platforms
by changing the system calls that are specific for the Silicon Graphics
platform. To compile the program, copy contrast.c to the target directory and
type:
cc -o contrast -g contrast.c -lm
at the operating system prompt. The ASCII text file,
contrast.hlp, is a crude manual for the CONTRAST program. The manual is
designed so that it can be easily searched while running CONTRAST with the
CONTRAST "page" function, which is called by typing
"ctrl-h" at a prompt or "h" at the command line. The
contrast.hlp file should be located in the same directory as the CONTRAST
executable in order to use this feature.
Getting Started
This section introduces loading spectrum files,
searching spectra, displaying the results of a search, writing the results of a
search to a file, and quitting the CONTRAST program. A simple example is given
to illustrate each point, and the use of both the command line interface and
macro files is described. The following CONTRAST commands will be described.
lf cosy.con
scan cosy (d1 <.5> 8.0 && d2 > 4.0) |results
d
btf |results > search.cosy.con
q
To run CONTRAST, simply type the name of the CONTRAST
executable at the system prompt (e.g. contrast.exe). The computer's display
will be cleared, and after several lines of copyright information you will be
asked for the name of the log (starting macro) file that you wish to run. If
you want to run a session macro, type its file name at the prompt. If your
log file name is "usr.log" (the standard session log file name),
simply press Return at the prompt. The text that appears in angle brackets in
a CONTRAST prompt is always the default value for the prompt. If you do not
already have a session macro, type a new file name at the prompt. It is
customary to use the suffix ".log" for session macros and ".mac"
for subroutine or branching macros. After the name of the log file is typed in,
the user is prompted by a '>' symbol for the next command.
The LoadFile
command (abbreviated lf) is used to
load peak list files into CONTRAST. CONTRAST peak list files are typically
created from the name of the experiment with the '.con' suffix appended, but
they can have any name. They must, however, adhere to the format outlined in
Section @@. The LoadFile command can
also be used to load the sequence of the protein, since the formats of the
files are similar. The following line loads the file cosy.con into the program:
> lf cosy.con
The Scan
command (abbreviated sc) is used to search peak lists. It is an extremely
versatile command and will be described in more detail in section @@. In order
to search for peaks in the COSY spectrum read into the program the user could
type a command similar to the following:
> sc cosy (d1 <.5> 8.0 && d2 > 4.0) |results
In this example the COSY peak list is searched for
peaks in which the first dimension of each peak (d1) is within a tolerance of 0.5
units (<.5>) from 8.0 and (&&) the second dimension of each peak
(d2) is greater than (>) 4.0. The results of the search are placed in a
buffer called |results. The units of the tolerances and peak coordinates are
dependent on the units used in the input files. Since the coordinates are
typically expressed in terms of parts per million (PPM), we will assume that
input files use PPM in the rest of the manual.
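The logic of this search can be illustrated outside CONTRAST. The following Python sketch is not CONTRAST code; it only mimics the Boolean (d1 <.5> 8.0 && d2 > 4.0) on a small, made-up peak list. The peak values, the function names, and the choice of an inclusive tolerance bound are all assumptions made for illustration.

```python
# Hypothetical peaks as (d1, d2, intensity) tuples; values are invented.
peaks = [
    (8.20, 4.50, 1200.0),
    (7.40, 4.80, 900.0),
    (8.45, 3.90, 1500.0),
    (7.55, 4.10, 700.0),
]

def within(value, tolerance, target):
    """Model of 'value <tolerance> target'; an inclusive bound is assumed."""
    return abs(value - target) <= tolerance

def scan(peak_list):
    """Keep the peaks that satisfy (d1 <.5> 8.0 && d2 > 4.0)."""
    return [p for p in peak_list if within(p[0], 0.5, 8.0) and p[1] > 4.0]

results = scan(peaks)  # plays the role of the |results buffer
for d1, d2, intensity in results:
    print(d1, d2, intensity)
```

The second peak is rejected because its d1 value is 0.6 ppm away from 8.0, and the third because its d2 value is not greater than 4.0.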
The display command (abbreviated 'd') is used to
examine the contents of CONTRAST buffers. When a search is performed using the Scan command or one of several other
related commands, the results of the search are placed in a named buffer which
is added to the end of a master list of buffers. The buffers persist until the
user deletes them or quits the program. Associated with each buffer is a number
and the search Boolean that was used to create the buffer. Upon typing 'd' at
the CONTRAST command line, the program enters a crude 'display' mode that has a
unique set of subcommands for changing the way the buffers are displayed. These
subcommands are executed as each character is typed. To exit display mode type
'q' at the display command line prompt. Section @@ gives more information on
the different subcommands available within the display mode.
The buffertofile command (abbreviated 'btf') is used
to write the contents of a particular buffer to a file. In the following
example:
> btf |results >search.cosy.con
the |results buffer is written to the file,
search.cosy.con.
There are two pathways for exiting CONTRAST. The
quit command (abbreviated 'q') can be used to exit CONTRAST from the command
line. If CONTRAST is not at the command line, the program can be exited by
typing Ctrl-C to interrupt the action of the program followed by 'x' at the new
prompt. Typing 'q' at this new prompt causes the program to resume the action
that was interrupted by the Ctrl-C command.
Most of the commands that can be executed at the
CONTRAST command line can also be executed from a CONTRAST macro. For our
purposes a macro is an ASCII file that contains CONTRAST commands. When a macro
is executed, CONTRAST interprets each non-whitespace line as if it were typed
at the CONTRAST command line. Each line is executed serially until a quit
command is reached, until the macro branches to another macro, or until the end
of the file is reached. If the end of the file is reached the program returns
to the CONTRAST command line and waits for user input. All text in a macro
between two consecutive asterisks (**) and the next end-of-line marker is
considered to be a comment and is ignored by the program.
The 5 commands just described can be typed into a
file using a text editor and run as a CONTRAST macro. CONTRAST macros can be
run in many different ways. Macro files can be specified at the UNIX command
line when the program is started using the '<' sign to redirect input into
the program as follows:
CONTRAST <user.macro
Alternatively, the name of the macro can be specified
at the initial prompt by typing the name of the macro file and pressing Enter.
Macros can be launched from within other macros or from the CONTRAST command
line using the execute command (abbreviated exe).
> exe user.macro
In this case control is transferred to user.macro
until the end of the file is reached at which time control will be returned to
the calling macro or initial command line. If the macro is terminated with a
quit command, however, the CONTRAST program will be exited without returning to
the calling procedure. The branch command can be used instead of the exe
command in order to fully transfer control to the called macro.
> branch user.macro
Input File Formats
CONTRAST input files use a free format in which
blank lines are ignored and white space (any number and combination of spaces
and/or tabs) is used to delimit fields. Comments can be inserted anywhere in an
input file by prefacing the comment with double asterisks (**). All text
following the double asterisks (up to the end of the line on which they appear)
is considered to be part of the comment and is effectively ignored by CONTRAST.
Most CONTRAST input files are either a form of a spectrum file or a macro file.
In the next release of CONTRAST the user will be given the option of reading in
spectrum files in a macro format, but an understanding of the spectrum file
format is currently essential to using CONTRAST effectively.
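The free-format rules above (blank lines ignored, whitespace-delimited fields, '**' comments running to the end of the line) can be sketched in a few lines of Python. This is an illustration only, not the parser CONTRAST uses; the function name and the sample text are assumptions.

```python
def tokenize(text):
    """Return one list of whitespace-delimited fields per non-blank line."""
    rows = []
    for line in text.splitlines():
        line = line.split("**", 1)[0]  # drop a '**' comment, if any
        fields = line.split()          # any mix of spaces and tabs delimits fields
        if fields:                     # blank (or comment-only) lines are ignored
            rows.append(fields)
    return rows

sample = (
    "cosy\n"
    "\n"
    "2 3   ** two dimensions, three peaks\n"
    "\t8.61\t4.32  1000\n"
    "** a full-line comment\n"
)
rows = tokenize(sample)
print(rows)
```

Note that this sketch treats the single-asterisk peak comment marker as an ordinary field; a real reader would have to handle it separately.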
A CONTRAST spectrum file consists of a header
followed by a peak list. The header of a spectrum file should contain
information about the spectrum. Since most of this information is the same for
all instances of a particular type of spectrum, it is usually safer to copy and
modify an existing header from a similar spectrum than to write a header from
scratch. When copying a header from the spectrum file of the same kind of
experiment it is usually only necessary to modify the number of peaks, the
tolerances, and the comments. The fields in a spectrum file must appear in the
given order. Although comments and blank lines can appear anywhere in a
spectrum file it is a good practice to settle upon and stick to a style in
order to maximize readability and to minimize the possibility of making
mistakes. As long as fields appear in the correct order, it does not matter if
they are arranged on different lines, if they are all placed on the same
line, or if some combination of the two arrangements is used. As all combinations have not
been rigorously tested, however, we recommend that a format similar to the one
shown below be used. Bold print is used to show essential information which
must be included in a spectrum file, normal print is used to show optional
information, and italics is used to show those elements of optional fields that
are even more optional. The following is the file format for an n-dimensional
spectrum (with as many as C correlations) that contains i peaks.
4.2 Spectrum File Format
name
n i (qual)
comment = numCom
d1lab d1atm d1tol d1cor1 (prob1) d1cor2 (prob2) d1corC (probC)
d2lab d2atm d2tol d2cor1 (prob1) d2cor2 (prob2) d2corC (probC)
dnlab dnatm dntol dncor1 (prob1) dncor2 (prob2) dncorC (probC)
** comments
** comments
p1coord1 p1coord2 p1coord3 p1ntens * p1comment
p2coord1 p2coord2 p2coord3 p2ntens * p2comment
picoord1 picoord2 picoord3 pintens * picomment
name        The name of the spectrum. The name of a CONTRAST spectrum file is
            generally the spectrum name with the '.con' suffix appended to it.
n           The dimensionality of the spectrum.
i           The number of peaks in the spectrum.
(qual)      An estimate of the quality of the spectrum, expressed as a
            probability. A qual factor of 1.0 indicates that 100% of the
            expected peaks will be present in the spectrum and that very little
            noise (false peaks) is present. A qual factor of 0.9 indicates that
            90% of the expected peaks are present.
comment =   Text that indicates that the next field (numCom) is the number of
            characters the program should allocate for the comment associated
            with each peak. 'ment =' is italicized to indicate that only 'com'
            is needed to signal that the next field is numCom.
numCom      The number of characters that the program should allocate for the
            comment associated with each peak.
d#lab       The label of the #'th dimension of the peaks in the spectrum.
d#atm       The resonance code (also called atom code) describing all of the
            atoms of the #'th dimension of the peaks in the spectrum. Since
            some dimensions of a spectrum often detect several different
            resonances, wild cards are frequently used in this field. A
            description of resonance codes is found in section @.@.
d#tol       The default tolerance of the #'th dimension of the peaks in a
            spectrum. A tolerance is one-half of the resolution of that
            dimension.
d#cor##     The resonance code (also called atom code) of the #'th dimension of
            the ##'th correlation in the spectrum. Correlations describe the
            types of peaks that one would expect to see in a spectrum. An HNCA
            spectrum, for example, contains an Hni,Nai,Cai correlation (amide
            proton, amide nitrogen, alpha carbon) and an Hni,Nai,Ca-
            correlation (amide proton, amide nitrogen, alpha carbon from the
            previous residue). The last resonance code for a given dimension
            will be repeated if previous or subsequent dimensions contain more
            resonance codes. A description of resonance codes is found in
            section @.@.
(prob##)    The estimated probability of seeing the previous correlation in the
            spectrum. Note that only the last probability listed in a vertical
            column will be used to describe the ##'th correlation. Other
            probabilities are used only to make the file more readable.
**          Comment markers. Comment markers indicate that the text that
            follows on that line is a comment and should be ignored by the
            program. Users are encouraged to use comments to document the
            origin of the spectrum files and each modification that the files
            undergo. Most CONTRAST functions that modify a spectrum or spectrum
            file will append a comment to the file that tells what was done to
            the file and the date it was done.
comments    Any text that the user wants to include in the file.
p##coord#   The #'th coordinate (frequency dimension) of the ##'th peak in the
            spectrum (usually in ppm units).
p##ntens    The intensity of the ##'th peak in the spectrum.
*           A special peak comment marker that causes the program to read in
            the comment and associate it with the peak that the comment
            follows. The 'comment = numCom' line described above is used to
            specify the maximum number of characters that can be stored in each
            peak comment.
p#comment   The comment associated with the #'th peak of the spectrum.
hnca
3 4 (90)
comment length = 30
H Hni .02 Hni
N Nai .1 Nai ** Don't need to repeat last resonance code
Ca Ca .1 Cai (90) Ca- (60)
** Created 9/9/99 from hnca.ppm file.
** Comments can be inserted at any point in the file after an
** asterisk.
8.61 114.3 180.2 100073 * peak 1
9.12 122.4 178.2 20073 * peak 2
7.43 118.9 134.2 10034.5 * peak 3
8.74 110.3 181.2 67896 * peak 4
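To make the layout of the example file concrete, here is a minimal Python sketch that reads it into a dictionary. It is not the CONTRAST reader: it assumes the one-item-per-line layout used in the example (CONTRAST itself accepts any arrangement of fields), it keeps correlation codes and their probabilities as raw tokens, and the function names are hypothetical.

```python
EXAMPLE = """\
hnca
3 4 (90)
comment length = 30
H Hni .02 Hni
N Nai .1 Nai ** Don't need to repeat last resonance code
Ca Ca .1 Cai (90) Ca- (60)
** Created 9/9/99 from hnca.ppm file.
8.61 114.3 180.2 100073 * peak 1
9.12 122.4 178.2 20073 * peak 2
7.43 118.9 134.2 10034.5 * peak 3
8.74 110.3 181.2 67896 * peak 4
"""

def strip_comment(line):
    """Remove a '**' comment and surrounding whitespace."""
    return line.split("**", 1)[0].strip()

def parse_spectrum(text):
    """Parse a spectrum file laid out as in the example (assumed layout)."""
    lines = [s for s in (strip_comment(l) for l in text.splitlines()) if s]
    name = lines[0]
    size = lines[1].split()
    n_dims, n_peaks = int(size[0]), int(size[1])
    qual = float(size[2].strip("()")) if len(size) > 2 else None
    pos = 2
    comment_len = None
    if lines[pos].lower().startswith("com"):   # only 'com' is significant
        comment_len = int(lines[pos].split()[-1])
        pos += 1
    dims = []
    for line in lines[pos:pos + n_dims]:
        label, atom, tol, *correlations = line.split()
        dims.append({"label": label, "atom": atom, "tol": float(tol),
                     "correlations": correlations})
    pos += n_dims
    peaks = []
    for line in lines[pos:pos + n_peaks]:
        data, _, comment = line.partition("*")  # '*' starts the peak comment
        values = [float(v) for v in data.split()]
        peaks.append({"coords": values[:n_dims], "intensity": values[n_dims],
                      "comment": comment.strip()})
    return {"name": name, "qual": qual, "comment_len": comment_len,
            "dims": dims, "peaks": peaks}

spec = parse_spectrum(EXAMPLE)
print(spec["name"], len(spec["dims"]), len(spec["peaks"]))
```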
4.5 Resonance Codes
Resonance codes are special CONTRAST words that
describe the type of atom that gives rise to an NMR signal. These codes are
sometimes called atom codes since they specify an atom type or group of atom
types. Resonance codes can contain a maximum of 4 characters with each
character describing a different aspect of an atom. If any character
representing a particular aspect is omitted then CONTRAST assumes the most
general case to hold for that aspect. For example the resonance code 'H'
contains only the atom type specifier. This resonance code thus includes all
hydrogen atoms. The resonance code 'Hb' represents all beta protons in the
protein, and the resonance code 'Hi' represents all protons on the current
residue. In this release of CONTRAST all resonance codes make reference to
amino acids in a protein or peptide. At this time there is no simple way to
refer to nucleic acids or other molecules. A list of the valid resonance code
characters grouped by the different aspects that they describe follows:
Atom Specifiers:
C Carbon atom.
N Nitrogen atom.
H Hydrogen atom.
O Oxygen atom.
P Phosphorus atom.
X Wildcard. Matches any atom type.
Q NULL. Can never match another atom type.
IntraResidue Position Specifiers:
a Alpha. Bonded to or at the alpha position in the residue.
b Beta. Bonded to or at the beta position in the residue.
g Gamma. Bonded to or at the gamma position in the residue.
d Delta. Bonded to or at the delta position in the residue.
e Epsilon. Bonded to or at the epsilon position in the residue.
f F. Bonded to or at the F position in the residue.
z Z. Bonded to or at the Z position in the residue.
k Backbone. All backbone atoms in the residue.
s Sidechain. All sidechain atoms in the residue.
r Ring. All ring atoms in the residue.
c Carbon. Bonded to a carbon atom in the residue.
h Hydrogen. Bonded to a hydrogen atom in the residue.
n Nitrogen. Bonded to a nitrogen atom in the residue.
o Oxygen. The carbonyl position or bonded to an oxygen atom in the residue.
x Wildcard. All positions within a residue.
InterResidue Position Specifiers:
- Within the previous residue.
i Within the current residue.
+ Within the next residue.
* Can be within any residue in the protein (often from NOE).
Atom number:
0 Matches all other single character atom numbers.
1-9 This single character number is used to distinguish between atoms at the same position. For example two beta protons can be distinguished by referring to one as Hb2 and the other as Hb3.
Examples:
Cai Matches alpha carbons within the current residue.
Hbi2 Matches the second beta proton within the current residue.
X Matches all atoms in the protein.
X- Matches all atoms in the previous residue.
Co- Matches the carbonyl carbon of the previous residue.
Nai Matches the amide nitrogen of the current residue.
Q Does not match any atom in the protein.
Cs+ Matches all carbon atoms in the side chain of the next residue.
Cxi Matches all carbon atoms in the current residue.
Hxi1 Matches all number 1 protons in the current residue.
Hxi0 Matches all protons in the current residue.
Hb*1 Matches all number 1 beta protons in the entire protein.
Hn* Matches all amide protons in the protein.
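The structure of a resonance code can be illustrated with a short Python sketch that splits a code into its four aspects. This is not CONTRAST code. It assumes that the aspects always appear in the order atom, position, residue, number (as they do in the examples above), and it assumes the defaults 'x', '*', and '0' for omitted aspects as an interpretation of "the most general case". It only parses codes; it does not implement matching.

```python
ATOMS = set("CNHOPXQ")
POSITIONS = set("abgdefzksrchnox")
RESIDUES = set("-i+*")
NUMBERS = set("0123456789")

def parse_code(code):
    """Split a resonance code into atom, position, residue, and number."""
    if not code or code[0] not in ATOMS:
        raise ValueError(f"bad atom specifier in {code!r}")
    # Assumed defaults: the most general value for each omitted aspect.
    parts = {"atom": code[0], "position": "x", "residue": "*", "number": "0"}
    rest = code[1:]
    for key, allowed in (("position", POSITIONS),
                         ("residue", RESIDUES),
                         ("number", NUMBERS)):
        if rest and rest[0] in allowed:
            parts[key] = rest[0]
            rest = rest[1:]
    if rest:
        raise ValueError(f"unexpected characters {rest!r} in {code!r}")
    return parts

for code in ("Cai", "Hbi2", "Co-", "Hi", "Hn*", "X"):
    print(code, parse_code(code))
```

Because the position, residue, and number character sets do not overlap, a code such as 'Hi' can be parsed without ambiguity even though its position specifier is omitted.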
CONTRAST sequence files follow the same general
format as spectrum files and are read into the program with the same command, LoadFile (abbreviated lf). Sequence files are one-dimensional
spectrum files in which the name of the spectrum is 'sequence' and the
"peak comments" are amino acid names. The next section shows a
schematic of a sequence file. Bold print is used to show essential information
which must be included in a sequence file, normal print is used to show
optional information, and italics is used to show those elements of optional
fields that are even more optional. The following is the file format for a
sequence file for a protein that contains i amino acids in the sequence.
sequence
1 i
comment = lenAA
lab Q qual
** comments
** comments
1 prob1 * AAname1
2 prob2 * AAname2
i probi * AAnamei
sequence    Indicates that the file is a sequence file.
1           The dimensionality of the file. Sequence files can make use of more
            dimensions to associate sequence positions with additional
            numerical information.
i           The number of residues in the sequence.
comment =   Text that indicates that the next field (lenAA) is the number of
            characters the program should allocate for the amino acid names.
            'ment =' is italicized to indicate that only 'com' is needed to
            signal that the next field is 'lenAA'.
lenAA       The maximum number of characters used in residue names.
lab         Label to be used to identify sequence position numbers.
Q           'Q' = NULL place holder.
qual        Quality of sequence determination (usually 1.0).
**          Comment markers. Comment markers indicate that the text that
            follows on that line is a comment and should be ignored by the
            program. Users are encouraged to use comments to document the
            origin of the sequence files and each modification that the files
            undergo. Most CONTRAST functions that modify a sequence or spectrum
            file will append a comment to the file that tells what was done to
            the file and the date it was done.
comments    Any text that the user wants to include in the file.
1, 2, ..., i  Sequence position numbers. If there is ambiguity about the type of
            residue at a sequence position, the sequence position number can be
            repeated at the end of the file with alternative residue types. The
            probability value for the sequence position should reflect this
            ambiguity.
prob#       Probability that the #'th sequence position contains that residue
            type.
AAname#     Name of the amino acid at the #'th sequence position. The name can
            be in any desired format as long as the format matches that used
            elsewhere in the program. One letter abbreviations, three letter
            abbreviations, and the entire names of the standard 20 amino acids
            are understood and interconverted by CONTRAST.
The following is the sequence file for a
hexapeptide. The third residue of the sequence is ambiguous and is thought to
be either a glutamate or a glutamine residue.
seq
1 6
# Q 0.9
** Hex1 hexapeptide sequence.
** 9/9/99 by Fred
1 1 * Ala
2 1 * V
3 .6 * Q
4 1 * A
5 1 * Serine
6 1 * t
3 .4 * E
** Note that the id of residue 3 is ambiguous.
Macro files are ASCII files that contain a list of
valid CONTRAST commands. The format for CONTRAST macro files is open and very
simple. The only general requirements are that lines must be less than 1000
characters long, and lines cannot contain more than one CONTRAST command. If a
line contains more than one command, the second command is generally ignored
without causing a problem, but sometimes the second command can interfere with the
first command.
Each command has its own required format, but a few
general rules apply to all CONTRAST commands:
1. Their first non-whitespace character must be the
beginning of the command name. Leading whitespace is ignored.
2. Command names can be typed in as abbreviations,
complete command names, or any partial command name in between (e.g. 'q', 'qu',
'quit', and 'quitcontrastnow' will all quit CONTRAST).
3. Command names are case independent (e.g. 'q' and
'Q' will both quit CONTRAST).
4. A command's fields are all delimited by
whitespace (tabs and spaces).
5. The '->' marker can be used at the end of a
line to indicate that the command is continued on the next line.
6. The '**' marker (comment marker) will cause the
program to ignore the rest of the line.
7. All variables (marked by the '&' prefix)
contained in a command are replaced by the values or text strings that they
contain before the command is interpreted. Thus variables can be substituted
for command names and/or command fields.
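Several of these rules (leading whitespace, case independence, whitespace-delimited fields, '->' continuation, '**' comments, and '&' variable substitution) can be sketched as a small preprocessing step in Python. This is an illustration under stated assumptions, not the CONTRAST interpreter: the function name is hypothetical, variable names are assumed to consist of letters, digits, and underscores, and the order in which comments, continuations, and variables are handled is a guess. Abbreviation matching (rule 2) is not modeled.

```python
import re

def preprocess(lines, variables):
    """Turn raw macro lines into lists of fields, one list per command."""
    commands = []
    pending = ""
    for raw in lines:
        text = raw.split("**", 1)[0].strip()   # rules 1 and 6
        if not text:
            continue                           # blank or comment-only line
        if text.endswith("->"):                # rule 5: continued on next line
            pending += text[:-2] + " "
            continue
        full = pending + text
        pending = ""
        # Rule 7: replace &name with its value before interpreting the command.
        full = re.sub(r"&(\w+)", lambda m: str(variables[m.group(1)]), full)
        fields = full.split()                  # rule 4
        if fields:
            fields[0] = fields[0].lower()      # rule 3
            commands.append(fields)
    return commands

macro = [
    "   LF &spec.con   ** load the peak list",
    "sc cosy (d1 <.5> 8.0 && ->",
    "    d2 > 4.0) |results",
    "Q",
]
commands = preprocess(macro, {"spec": "cosy"})
for fields in commands:
    print(fields)
```

One limitation of this sketch is that the simple pattern used for variables would misread a Boolean '&&' written directly against a name (for example '&&d2'); the example keeps a space after '&&' to avoid this.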
Checking Input Files
CONTRAST input files should all be carefully checked
before beginning a CONTRAST run. If the input spectra are not referenced
correctly or if the peaks in the input spectra do not "line up", then
this problem must be dealt with before proceeding with making assignments. The
following macro provides a simple way to check the alignment of input spectra.
**Macro template for checking the alignment of i input spectra.
**NOTE: Make sure tolerances are conservative (large).
lf spec1.con ** Load input spectrum 1.
lf spec2.con ** Load input spectrum 2.
lf speci.con ** Load input spectrum i.
contrace 1, >contrace.mac ** Automatically build spin systems.
dtf >display.out ** Save internal buffers to file.
q ** Quit.
The Contrace
function automatically finds the best way to correlate the input spectra. In
this example it uses the first input spectrum as the starting point for
searches. (The command "contrace 2, >contrace.mac" specifies that
the second input spectrum be used as the starting point for searches.) The spectrum
specified to be the starting point is called the source spectrum, and for the
purposes of checking spectral correlation, the source spectrum should be the
spectrum with the most reliable referencing that overlaps the most with the
other spectra. If you are unsure of which spectrum to designate as the source
spectrum, don't specify a source (contrace >contrace.mac) and Contrace will determine a good source
spectrum for you. The Contrace
function and the macro it generates will be described in more detail in the
next two sections.
The file ('display.out') created by running a macro
similar to that shown above can be examined to determine if there are any
problems with the input spectra. A simplified example of 'display.out' contents
is shown below:
hnco   Hn_N_hnca  Hn_N_hncoca  Hn_N_tocsy  ...  hnco   Hn_N_hnca  ...
-----  ---------  -----------  ----------       -----  ---------
peak1  peak18     peak100      ...              peak2  peak149    ...
       peak34     peak23       ...         ...         peak190    ...  ...
The buffers in the file are organized into repeating
groups (fragments) based on the peaks of the source spectrum, which in this case
is hnco. Each fragment starts with the source buffer and ends right before the
next source buffer. The buffers following the source buffer are named with
prefixes (representing the resonances that were used to search the spectra)
that precede the name of the spectrum that was searched. The peaks found in
each buffer are all the peaks that matched the given resonances within a
specified tolerance. It is not unusual for several peaks to be missing in a
spectrum and thus for several buffers to be empty, but if very few of a
spectrum's buffers contain peaks that correlate well to the peak in the source
buffer, then there is a problem. Either the tolerances used are too small or
there is a problem with the spectrum. Problems often arise from using the
wrong magnitude or sign for the sweep width when referencing. If this is the
case, the resonances near the center of that dimension's spectrum will often
match, but the resonance frequencies towards the edges of the dimension will be
off by a considerable amount.
After major referencing problems have been
corrected, attention should be given to choosing the best tolerances possible.
Ideal tolerances are as small as possible, but not so small that legitimate
correlations fall outside the tolerance range. It is helpful to subtract the
correlated resonances for a large number of fragments in order
to get a good feel for what tolerances should be used in the spectrum files.
The sum of the tolerances for the two spectra under consideration should be larger
than most of the differences. If the average difference is not close to zero,
this could indicate another referencing problem. Referencing problems can
be corrected using the operate function (section @.@) or the set function
(section @.@), but it is not wise to use spectra to calculate assignments if
there is an unknown problem with the referencing. There are also several
commands in CONTRAST that calculate reference offsets automatically, the most
reliable being the align function (section @.@). Until you are familiar with
working with peak lists, however, we recommend that you use the macro described
above.
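The subtraction check described in this section can also be done by hand or with a small script. The Python sketch below is one possible way to do it, not a CONTRAST function: the function name, the report fields, and the chemical shift pairs are invented for illustration. It reports the mean difference (which should be close to zero if referencing is consistent) and the fraction of differences covered by the sum of the two tolerances.

```python
from statistics import mean

def check_tolerances(pairs, tol_a, tol_b):
    """Summarize differences between correlated resonances of two spectra."""
    diffs = [a - b for a, b in pairs]
    limit = tol_a + tol_b                    # sum of the two tolerances
    covered = sum(abs(d) <= limit for d in diffs) / len(diffs)
    return {
        "mean_offset": mean(diffs),
        "max_abs": max(abs(d) for d in diffs),
        "fraction_covered": covered,
    }

# Hypothetical amide proton shifts (ppm) of the same fragments in two spectra.
pairs = [(8.61, 8.60), (9.12, 9.13), (7.43, 7.44), (8.74, 8.69)]
report = check_tolerances(pairs, 0.02, 0.02)
print(report)
```

In this made-up case one of the four differences (0.05 ppm) exceeds the summed tolerance of 0.04 ppm, which would suggest either loosening the tolerances slightly or inspecting that fragment.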
Arithmetic Expressions and Booleans
Arithmetic expressions and Booleans must be able to
access many different fields within the major data structures of the CONTRAST
program. The sometimes combinatorial and sometimes synchronous nature of
assignment algorithms adds to the complexity of the syntax of these
expressions. This section first describes the system used for accessing
CONTRAST's variables and data structures; next it describes CONTRAST arithmetic
expressions; and finally it describes CONTRAST Boolean expressions.
CONTRAST accesses three kinds of data which we will
refer to as lists: spectra, buffers, and files. Spectra and buffers can be
thought of as lists of peaks while files are lists of the lines of text that
make up the file.
6.1.1 Spectrum Data Structures
A spectrum is a CONTRAST spectrum file that has been
read into memory by the program. It consists of the header information, peak
list, and any other information that becomes associated with the spectrum
during the course of the CONTRAST session. Outside of arithmetic expressions
and Booleans, spectra can be specified by name or by the cardinal number that
corresponds to their position in the sequence of spectra read into CONTRAST.
Within arithmetic expressions or Booleans, however, the name or number of the
spectrum must be preceded by the spectrum symbol '$'. Examples are:
1 The first spectrum loaded.
$2 The second spectrum loaded.
cosy The spectrum named cosy.
hnca The spectrum named hnca.
Different fields within a spectrum are referred to
by single character abbreviations preceding the spectrum symbol ('$'). If there
are several fields of the same type (eg. dimensions in a spectrum) then a digit
is appended to the abbreviation. The following is a partial list of the
spectral fields that can be accessed using this method.
6.1.2 Fields of a Spectrum
di The coordinate of dimension i (where i = 1 to the number of dimensions)
i The intensity of a peak. (Note: d0 = i)
c The comment associated with a peak.
C The numeric value of the comment associated with a peak.
N Variable associated with the spectrum.
X Variable associated with the spectrum.
l The level (a variable) of the spectrum.
m The number of dimensions of a spectrum.
k The number of buffers associated with each peak.
w Current printed column width.
ti The tolerance for dimension i.
# The number of peaks.
The following examples show how different fields of
a COSY spectrum (the third spectrum read into the CONTRAST program) are
specified.
Examples
d1$cosy The frequency of the first dimension of a peak.
c$3 The comment associated with a peak.
l$cosy The level of the COSY spectrum.
6.1.3 Buffer Data Structures
Buffers are internal working lists which contain
peaks and any information associated with those peaks. Peaks are generally
added to buffers by performing searches of spectra or other buffers. Multiple
buffers are stored in the program in a linear list. Buffers can be added to and
deleted from the program's linear list of buffers just as peaks can be added
and deleted from individual buffers. Peaks from multiple spectra can be added
to a single buffer. The command line designation of a buffer is its name or its
position number in the list of buffers preceded by the '|' symbol (eg.
|hncoBuff or |1). Buffer names should be alphanumeric although the # and @ can
be used in special cases. Buffer names beginning with "|@" (e.g. |@hnca) must refer to buffers that
are not linked to a particular peak in a source spectrum. Each peak in a buffer
can have associated with it, in addition to all of the original information
associated with it in the spectrum, the following fields (pieces of
information).
# Number. The number of peaks in the buffer.
v Value. The first coordinate that wasn't matched in the search.
t Tolerance. The tolerance of that value's dimension.
n N. Integer variable.
x X. Real variable.
r Repeats. The number of different instances of that value in the buffer within that value's tolerance.
c Comment. The text comment associated with the peak.
C Comment number. The numeric value of the comment associated with the peak.
di Dimension i. The frequency of dimension i.
D Deviation. Score between numDims*0.2 and numDims*1.2 that rates how close the peak is to the target(s), where numDims*1.2 is the value of the best deviation (closest match) and numDims*0.2 is the worst deviation value (on the edge of the tolerance ranges).
s Score. Used by several routines to determine the rank of the peaks.
l Level. General purpose progress and scoring variable for the peak.
w wLevel. General purpose progress and scoring variable for the whole buffer.
ASCII files can be accessed directly by the CONTRAST
program. File names are specified with the '>' prefix (e.g.
>filename.txt). Fields in a file are delimited by white
space (spaces and tabs). Each field in a line is considered a dimension of that
line and uses the same 'di' convention used by spectra and buffers. For example,
d3>filename.txt = "See" for the line "See Spot. See Spot
run." CONTRAST uses the same conventions for specifying a line or range of
lines in a file as it does for the peaks in a spectrum or buffer.
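The field convention for files can be pictured with a few lines of Python. This is only a sketch of the rule described above (fields are separated by spaces and tabs and are numbered from 1); the function name field is hypothetical and is not part of CONTRAST.

```python
def field(line: str, i: int) -> str:
    """Return field i (1-based) of a whitespace-delimited line.

    Mirrors the 'di' convention: d1 is the first field, d2 the second, ...
    """
    # str.split() with no argument splits on any run of spaces or tabs.
    return line.split()[i - 1]


line = "See Spot. See Spot run."
print(field(line, 3))  # See
```

Note that punctuation stays attached to its field, so the second field of the sample line is "Spot." and not "Spot".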
Peaks or lines are specified by suffixes added to
the field and list descriptors after a comma. Either a single peak (line) or a
range of peaks (lines) can be referenced. If no peak or line is specified then
the entire range is assumed. Boolean expressions will go through every peak or
line in a range and evaluate the value of the expression automatically. The
following is a list of peak specifiers.
6.2.1 Peak Specifiers
,i The i'th peak or line in a list.
,H The peak or line with the highest specified field
value.
,L The peak or line with the lowest specified field
value.
,b The first peak or line in a list.
,fi The first i peaks or lines in a list.
,e The last peak or line in a list.
,li The last i peaks or lines in a list.
,i-j The i'th peak or line through the j'th peak or
line in a list.
,i+ The i'th peak or line through the last peak or
line in the list.
6.2.2 Examples
i|fred,f4H the highest intensity of the first 4
peaks in buffer fred.
i$fred,f4 the intensity of the first four peaks in
spectrum fred.
i|fred,l4 the intensity of the last four peaks in
fred
d1>fred,4+ the first field of the fourth through
the last lines in file fred.
i|fred,2-5 the intensity of the second through fifth
(inclusive) peaks
v|fred,H the value of the highest valued peak in
buffer fred
s|fred,L the lowest score in buffer fred.
C|fred,e the numeric part of the comment from the
last peak in buffer fred.
c|fred,1 the comment text string from the first peak
in buffer fred.
D|fred,b the deviation of the first peak in buffer
fred.
#$fred the number of peaks in spectrum fred.
w$fred the column width of the spectrum fred.
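The way a specifier narrows a list can be modeled as follows. This is a sketch written from the specifier table and examples above, not the CONTRAST parser itself; the function name select is hypothetical, and the handling of a combined form such as f4H (apply the range first, then take the highest value) is an assumption based on the first example.

```python
import re


def select(values, spec=""):
    """Resolve the text after the comma against a list of field values.

    Positions are 1-based, as in the guide. Always returns a list.
    """
    # A trailing H or L picks the highest/lowest value of whatever the
    # rest of the specifier selects (e.g. "f4H", or plain "H").
    if spec and spec[-1] in "HL":
        subset = select(values, spec[:-1])
        return [max(subset)] if spec[-1] == "H" else [min(subset)]
    if spec == "":                      # no specifier: the entire range
        return list(values)
    if spec == "b":                     # first peak or line
        return values[:1]
    if spec == "e":                     # last peak or line
        return values[-1:]
    m = re.fullmatch(r"([fl])(\d+)", spec)
    if m:                               # first i / last i
        k = int(m.group(2))
        if m.group(1) == "f":
            return values[:k]
        return values[max(len(values) - k, 0):]
    m = re.fullmatch(r"(\d+)-(\d+)", spec)
    if m:                               # i through j, inclusive
        return values[int(m.group(1)) - 1:int(m.group(2))]
    m = re.fullmatch(r"(\d+)\+", spec)
    if m:                               # i through the end
        return values[int(m.group(1)) - 1:]
    if spec.isdigit():                  # the i'th peak or line
        return values[int(spec) - 1:int(spec)]
    raise ValueError(f"unrecognized specifier: {spec!r}")


intensities = [5.0, 9.0, 2.0, 7.0, 8.0, 1.0]
print(select(intensities, "f4H"))  # [9.0]
print(select(intensities, "2-5"))  # [9.0, 2.0, 7.0, 8.0]
print(select(intensities, "4+"))   # [7.0, 8.0, 1.0]
```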
CONTRAST arithmetic expressions are straightforward.
They can appear in most CONTRAST expressions in which a variable or parameter
is set to a discrete value. In Boolean expressions they can operate on sets and
ranges of values as long as there is at most one variable in each term of
the Boolean. If a range is specified for a simple arithmetic expression, the
function always uses the highest value in the range for the calculation.
CONTRAST arithmetic expressions use the standard order of mathematical operations,
but the order can be controlled by use of parentheses. Nesting of parentheses
is permitted. Use of white space within an arithmetic expression is optional
except in a few situations: the '+' and '-' operators should be
preceded by white space if they follow immediately after a list expression. A
list of arithmetic and text string operators follows. The accompanying examples
assume the following: #|hnca = 2, d1|cosy,1 = 8.5, and c$hnca,1 =
"His23Ca2". Boolean operators will be discussed in the next section.
6.3.1 Arithmetic Operators
+ Addition 4 + #|hnca = 6
- Subtraction d1|cosy,1 - 2 = 6.5
/ Division 10/4 = 2.5
* Multiplication #|hnca*d1|cosy,1 = 17
^ To the power of 4 ^ 3 = 64
% Modulus 5 % #|hnca = .5
sin Sine (in degrees) sin(90) = 1
cos Cosine (in degrees) cos(90) = 0
tan Tangent (in degrees) tan(180) = 0
log Logarithm base ten log(1) = 0
ln Natural logarithm ln(d1|cosy,1) = 2.14
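Because the trigonometric operators take their arguments in degrees, the sample values in the table can be checked with a short Python sketch. The variable names n_hnca and d1_cosy_1 are stand-ins for #|hnca = 2 and d1|cosy,1 = 8.5; the helper names are hypothetical. The modulus operator is left out because its exact definition in CONTRAST is not spelled out here.

```python
import math

n_hnca = 2        # stands in for #|hnca
d1_cosy_1 = 8.5   # stands in for d1|cosy,1


def sin_deg(x):
    """Sine of an angle given in degrees."""
    return math.sin(math.radians(x))


def cos_deg(x):
    """Cosine of an angle given in degrees."""
    return math.cos(math.radians(x))


def tan_deg(x):
    """Tangent of an angle given in degrees."""
    return math.tan(math.radians(x))


print(4 + n_hnca)                      # 6
print(d1_cosy_1 - 2)                   # 6.5
print(n_hnca * d1_cosy_1)              # 17.0
print(4 ** 3)                          # 64
print(math.log10(1))                   # 0.0
print(round(math.log(d1_cosy_1), 2))   # 2.14
print(sin_deg(90))                     # 1.0
# cos_deg(90) and tan_deg(180) are zero only to within floating-point
# rounding, so compare them with a tolerance instead of printing them.
print(abs(cos_deg(90)) < 1e-12, abs(tan_deg(180)) < 1e-12)  # True True
```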
6.3.2 Text Operators
vali(text) The ith numeric part of text. val2("fr2ed4.1") = 4.1
+ Union "fred" + "ted" = "fredted"
^ Intersection "fred" ^ "ted" = "ed"
- Delete Intersection "fred" - "ted" = "fr"
* Number of Intersections "freded" * "ed" = 2
/ Remove Characters "fred" / "det" = "fr"
% Remove all but characters "fred" % "det" = "ed"
6.3.3 Example Arithmetic Expressions
(#|hnca*(d1|cosy,1 + .5))+2 = 20
val2(c|hnca,1) * 10 = 20
C|hnca,1 - 3 = 20
10 * (c|hnca,1 * "2") = 20
val1(c|hnca,1 - "2") = 3
cos( val1(c|hnca,1/"ABC")-52) = -1
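The text operators and several of the example expressions can be reproduced with the Python sketch below. It is a model inferred from the sample results in this section, not the CONTRAST implementation, and every function name is hypothetical. Three points are assumptions: the intersection is taken to be the longest common substring, the delete operation removes only its first occurrence, and the character operators ignore case (which is what the last example expression, with "ABC", appears to require).

```python
import math
import re


def val(text, i):
    """i-th (1-based) number embedded in text, as a float."""
    return float(re.findall(r"\d+(?:\.\d+)?", text)[i - 1])


def intersection(a, b):
    """Longest substring of a that also occurs in b (first one on ties)."""
    best = ""
    for start in range(len(a)):
        for end in range(start + 1, len(a) + 1):
            piece = a[start:end]
            if len(piece) > len(best) and piece in b:
                best = piece
    return best


def delete_intersection(a, b):
    """Remove the first occurrence of the common substring from a."""
    return a.replace(intersection(a, b), "", 1)


def count_intersections(a, b):
    """Number of non-overlapping occurrences of b in a."""
    return a.count(b)


def remove_chars(a, chars):
    """Drop every character of a that appears in chars (case ignored)."""
    drop = chars.lower()
    return "".join(ch for ch in a if ch.lower() not in drop)


def keep_chars(a, chars):
    """Keep only the characters of a that appear in chars (case ignored)."""
    keep = chars.lower()
    return "".join(ch for ch in a if ch.lower() in keep)


comment = "His23Ca2"   # stands in for c|hnca,1
print(val("fr2ed4.1", 2))                            # 4.1
print(intersection("fred", "ted"))                   # ed
print(val(comment, 2) * 10)                          # 20.0
print(10 * count_intersections(comment, "2"))        # 20
print(val(delete_intersection(comment, "2"), 1))     # 3.0
angle = val(remove_chars(comment, "ABC"), 1) - 52    # 232 - 52 = 180 degrees
print(round(math.cos(math.radians(angle)), 6))       # -1.0
```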
6.4 Boolean Expressions
Booleans are expressions that reduce to 1 (meaning
true) or 0 (meaning false). Many different CONTRAST functions use Boolean
expressions to determine whether or not the function will be executed for a
particular value, peak, or line. CONTRAST uses a versatile Boolean format that
allows sets, ranges, "boxes", and variables to be coded into an
expression so that one expression can be evaluated for many different
arrangements of data.
Boolean expressions are always marked by enclosure
in parentheses (). If a command contains both a Boolean expression and a
separate mathematical expression that uses parentheses, the Boolean expression
must be listed first. In the following example the Boolean is
"(d1|hnca>3)".
set level |hnca (d1|hnca>3) += (47 / i|hnca)
The Boolean in the preceding example is
straightforward. The level of each peak in the hnca buffer whose d1 value is
greater than 3 is incremented by 47 divided by the intensity of that peak.
Since no specific peak in the hnca buffer is specified, the Boolean is
evaluated for each peak in the buffer. The levels of only those peaks for which
the Boolean evaluates to 'true' are incremented.
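The per-peak evaluation just described amounts to a filtered loop. The following Python sketch models the command "set level |hnca (d1|hnca>3) += (47 / i|hnca)" on three made-up peaks; the Peak class, the function set_level_if, and the peak values are all hypothetical and serve only to illustrate the behavior.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    d1: float          # frequency of the first dimension
    intensity: float   # the 'i' field
    level: float = 0.0 # the 'l' field


def set_level_if(peaks, condition, increment):
    """Add increment(peak) to the level of every peak that passes condition."""
    for peak in peaks:
        if condition(peak):
            peak.level += increment(peak)


hnca = [Peak(2.5, 10.0), Peak(4.0, 47.0), Peak(8.1, 94.0)]
set_level_if(hnca, lambda p: p.d1 > 3, lambda p: 47 / p.intensity)
print([p.level for p in hnca])  # [0.0, 1.0, 0.5]
```

The first peak fails the test (2.5 is not greater than 3), so its level is left unchanged.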
CONTRAST Booleans can combine an unlimited number of
expressions by using the conjunctions '||' (or) and '&&' (and). For
instance the following command uses a
Boolean composed of three parts.
set level |hnca ( l|hnca = 2 || (d1|hnca>3
&& d2|hnca <= 9) ) += (47 / i|hnca)
In this Boolean the level of an HNCA peak will be
incremented if the peak's level is currently equal to 2 or ('||') if the d1
value of the peak is greater than 3 and ('&&') the d2 value of the peak
is less than or equal to 9. Note that expressions must be combined with
conjunctions. Expressions such as " x > y > z " are not
permitted in CONTRAST. Note also that some CONTRAST functions have not yet been
implemented with "short-circuit logic". Short circuit logic allows
the program to skip evaluating the rest of a Boolean when the expression is
guaranteed to evaluate to true or false. In the above example if the level of
an HNCA peak is equal to 2, then the full Boolean is guaranteed to evaluate to
true so the program does not need to continue by testing the d1 and d2 values
of the peak. Since several functions including the set function do not use short-circuit logic, we recommend that the
user avoid writing Booleans that rely on this feature.
CONTRAST Booleans often compare values from different
lists. These comparisons can be made synchronously or combinatorially. The
preceding example used a synchronous mechanism for making comparisons. It was
understood that each reference to the hnca buffer in the Boolean
referred to the same peak. The following Boolean also uses a synchronous
mechanism, but this time it is not so obvious.
set level |fred (d1|fred,f5 > 3 &&
d1|tom,f5 <= 8) += 2
In this example when the first peak of buffer fred
is being compared to 3, the first peak of buffer tom is being compared to 8,
then the second peaks in each buffer are compared, the third, and so on. The
above expression is equivalent to the following 5 expressions.
set level |fred (d1|fred,1 > 3 &&
d1|tom,1 <= 8) += 2
set level |fred (d1|fred,2 > 3 &&
d1|tom,2 <= 8) += 2
set level |fred (d1|fred,3 > 3 &&
d1|tom,3 <= 8) += 2
set level |fred (d1|fred,4 > 3 && d1|tom,4
<= 8) += 2
set level |fred (d1|fred,5 > 3 &&
d1|tom,5 <= 8) += 2
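In programming terms a synchronous comparison walks the two lists in lockstep. The Python sketch below models the five equivalent commands with zip; the function name and the d1 values are hypothetical, and it is assumed that the k'th evaluation increments the level of the k'th peak in fred.

```python
def synchronous_set_level(fred_d1, tom_d1, levels, first=5, step=2):
    """Model of: set level |fred (d1|fred,f5 > 3 && d1|tom,f5 <= 8) += 2

    Peak k of fred is paired only with peak k of tom.
    """
    pairs = zip(fred_d1[:first], tom_d1[:first])
    for k, (f, t) in enumerate(pairs):
        if f > 3 and t <= 8:
            levels[k] += step
    return levels


fred_d1 = [4.0, 2.0, 5.0, 9.0, 3.5, 7.0]
tom_d1 = [7.0, 6.0, 9.0, 8.0, 1.0, 2.0]
print(synchronous_set_level(fred_d1, tom_d1, [0] * 6))  # [2, 0, 0, 2, 2, 0]
```

The sixth peak is never examined because the f5 specifier limits both lists to their first five peaks.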
Synchronous expressions are signaled by using double
conjunctions or operators. If a single '&' symbol had been used, a
combinatorial comparison would have been performed. The following is an example
of the use of a combinatorial conjunction.
set level |fred (d1|fred,f2 > 0 & d1|tom,f3
<= 8) += 10
In this example each of the first 2 peaks in fred is
compared to zero once for each of the first three peaks in tom. In this case
the level of one of the peaks in fred can be incremented by as much as 30 (3 *
10). This expression is equivalent to the following 6 commands.
set level |fred (d1|fred,1 > 0 & d1|tom,1
<= 8) += 10
set level |fred (d1|fred,1 > 0 & d1|tom,2
<= 8) += 10
set level |fred (d1|fred,1 > 0 & d1|tom,3
<= 8) += 10
set level |fred (d1|fred,2 > 0 & d1|tom,1
<= 8) += 10
set level |fred (d1|fred,2 > 0 & d1|tom,2
<= 8) += 10
set level |fred (d1|fred,2 > 0 & d1|tom,3
<= 8) += 10
All fields in a Boolean from the same list are
automatically synchronized even if combinatorial operators are used. The
following is an example of a case in which fields are synchronized even
though a combinatorial conjunction ('&') is specified.
set level |fred (d1|fred,f2 > 0 & d2|fred,f2
<= d1|tom,f3) += 10
This expression is equivalent to the following 6
commands.
set level |fred (d1|fred,1 > 0 & d2|fred,1
<= d1|tom,1) += 10
set level |fred (d1|fred,1 > 0 & d2|fred,1
<= d1|tom,2) += 10
set level |fred (d1|fred,1 > 0 & d2|fred,1
<= d1|tom,3) += 10
set level |fred (d1|fred,2 > 0 & d2|fred,2
<= d1|tom,1) += 10
set level |fred (d1|fred,2 > 0 & d2|fred,2
<= d1|tom,2) += 10
set level |fred (d1|fred,2 > 0 & d2|fred,2
<= d1|tom,3) += 10
Note that the two fields of |fred are synchronized,
but that the |fred and |tom lists are compared combinatorially. In order to
synchronize d2|fred,f2 and d1|tom,f3 we must use the "synchronous
less than or equal to" operator ("<<=" or "<<=="). Doubling Boolean operator
symbols makes the two operands of the operator synchronous just as doubling
Boolean conjunctions makes the left and right hand sides of the expressions
synchronous. In the following example a synchronous operator is used to
synchronize d2|fred,f2 and d1|tom,f3 in the expressions above.
set level |fred (d1|fred,f2 > 0 & d2|fred,f2
<<= d1|tom,f3) += 10
This expression is equivalent to the following two
expressions.
set level |fred (d1|fred,1 > 0 & d2|fred,1
<= d1|tom,1) += 10
set level |fred (d1|fred,2 > 0 & d2|fred,2
<= d1|tom,2) += 10
Note that the third peak in |tom is never used since
the first two peaks in |fred were specified and |tom was synchronized to |fred.
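The difference between the combinatorial and synchronous forms of this comparison comes down to which index pairs are visited. The Python sketch below models both; the function names and peak values are hypothetical. It assumes that a synchronized pairing stops at the shorter of the two ranges, which matches the statement above that the third peak in |tom is never used.

```python
from itertools import product


def index_pairs(n_fred, n_tom, synchronous):
    """1-based (fred, tom) index pairs visited by the comparison."""
    if synchronous:
        # Lockstep: peak k of fred meets only peak k of tom.
        return [(k, k) for k in range(1, min(n_fred, n_tom) + 1)]
    # Combinatorial: every fred peak meets every tom peak.
    return list(product(range(1, n_fred + 1), range(1, n_tom + 1)))


def evaluate(fred, tom_d1, synchronous, step=10):
    """fred holds (d1, d2) per peak; returns the level increment per peak.

    The two fred fields always refer to the same peak, as in the guide.
    """
    levels = [0] * len(fred)
    for i, j in index_pairs(2, 3, synchronous):
        d1, d2 = fred[i - 1]
        if d1 > 0 and d2 <= tom_d1[j - 1]:
            levels[i - 1] += step
    return levels


fred = [(1.0, 5.0), (2.0, 6.0)]
tom_d1 = [4.0, 6.5, 9.0]
print(len(index_pairs(2, 3, False)), len(index_pairs(2, 3, True)))  # 6 2
print(evaluate(fred, tom_d1, synchronous=False))  # [20, 20]
print(evaluate(fred, tom_d1, synchronous=True))   # [0, 10]
```

In the combinatorial form a single peak in fred can collect the increment once per peak in tom, which is why totals as high as 3 * 10 are possible.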
The default synchronization behavior of fields in a
Boolean can be over-ridden by appending "i" suffixes to the field
descriptions. The following is an example of the use of such suffixes.
set lev |fred (d1|fred,f2 > d1|tom,f2 &&
d2|fred,f2i2 = d2|tom,f3i1 && i|fred,f2i > 0) += 1
The following expressions are equivalent to the
expression above.
set lev |fred (d1|fred,1 > d1|tom,1 &&
d2|fred,1 = d2|tom,1 && i|fred,1 > 0) += 1
set lev |fred (d1|fred,1 > d1|tom,1 &&
d2|fred,1 = d2|tom,1 && i|fred,2 > 0) += 1
set lev |fred (d1|fred,1 > d1|tom,2 &&
d2|fred,2 = d2|tom,1 && i|fred,1 > 0) += 1
set lev |fred (d1|fred,1 > d1|tom,2 &&
d2|fred,2 = d2|tom,1 && i|fred,2 > 0) += 1
set lev |fred (d1|fred,2 > d1|tom,1 &&
d2|fred,1 = d2|tom,2 && i|fred,1 > 0) += 1
set lev |fred (d1|fred,2 > d1|tom,1 &&
d2|fred,1 = d2|tom,2 && i|fred,2 > 0) += 1
set lev |fred (d1|fred,2 > d1|tom,2 &&
d2|fred,2 = d2|tom,2 && i|fred,1 > 0) += 1
set lev |fred (d1|fred,2 > d1|tom,2 &&
d2|fred,2 = d2|tom,2 && i|fred,2 > 0) += 1
If all of the terms that contain field descriptions
in a Boolean are numbered from n = 1 to N, then the number n is used after
an 'i' suffix to specify the field description that the expression is
synchronized to. If no n value is specified after an 'i' suffix, then the containing expression is made
independent (a combinatorial operation). In the above example the third field
"d2|fred,f2i2" is synchronized to the second field
"d1|tom,f2" and the fourth field "d2|tom,f3i1" is
synchronized to the first field "d1|fred,f2". The last field is
independent. If the 'i' suffix had not been added to the field description,
then the last field would have been synchronized to the first field since they
make reference to the same list.
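The eight equivalent expressions listed earlier can be generated mechanically, which may help in checking how the 'i' suffixes tie the fields together. In the Python sketch below (the function name expand is hypothetical) index a belongs to the first field, index b to the second field, the third field reuses b, the fourth field reuses a, and the last field has its own independent index c.

```python
from itertools import product


def expand():
    """Enumerate the index combinations implied by the 'i' suffixes."""
    lines = []
    for a, b, c in product((1, 2), repeat=3):
        lines.append(
            f"d1|fred,{a} > d1|tom,{b} && "
            f"d2|fred,{b} = d2|tom,{a} && "
            f"i|fred,{c} > 0"
        )
    return lines


for line in expand():
    print(line)
```

Three independent indices with two values each give 2 * 2 * 2 = 8 expressions, in the same order as the list above.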
6.4.1 Boolean operators and conjunctions
> combinatorial "greater than"
>= combinatorial "greater than or equal to"
< combinatorial "less than"
<= combinatorial "less than or equal to"
= combinatorial "equals"
!= combinatorial "not equal"
<> combinatorial "within a tolerance of"
>< combinatorial "outside a tolerance of"
& combinatorial "and"
| combinatorial "or"
>> synchronous "greater than"
>>= synchronous "greater than or equal to"
<< synchronous "less than"
<<= synchronous "less than or equal to"
== synchronous "equals"
!!= synchronous "not equal"
<<>> synchronous "within a tolerance of"
>><< synchronous "outside a tolerance of"
&& synchronous "and"
|| synchronous "or"
Tolerance operators contain a tolerance value
embedded in the operator. This value can take the form of a constant, a
variable, a field, a range, a set, or a box just like normal Boolean operands.
(Sets and boxes will be described in a subsequent section.) If a field
description is used as a tolerance, it is good practice to specify synchrony
directly using the 'i' suffix unless the field makes reference to a list
referenced elsewhere in the Boolean. The following is an example of an
expression that uses tolerances.
set lev |fred (d1|fred,f2 <.02> d1|tom,f3
&& d2|fred >t|Hai,1< d2|tom) += 1
Boolean expressions can contain mathematical expressions
as well as field descriptions and constants. The only limitation is that no
term in the expression can contain more than 1 range, set, or box. The
following is an example of a Boolean expression in which arithmetic expressions
occur.
set lev |fred (cos(d1|fred,f2*2)+8 <.02> 8.2
&& val2(C|fred)+6 >t|Hai,1/2< d2|tom) += 1
If the Boolean of a command is preceded by a NOT
symbol '!', then the set of peaks or lines for which the Boolean does not
evaluate to true is operated on by the command. In this special case the NOT
symbol '!' performs a complementarity operation rather than the negation
operation that it typically performs. For example in the command
set level |hnca !(d1|hnca>3) += 10
the level of each peak in |hnca that has a d1 value
less than or equal to 3 is incremented by 10.
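The tolerance operators and the complement form of '!' described in this section can be sketched in Python as follows. All names are hypothetical, and whether a value lying exactly on the edge of the tolerance counts as "within" is an assumption (it is treated as within here).

```python
def within(a, tol, b):
    """Model of 'a <tol> b': true when a lies within tol of b."""
    return abs(a - b) <= tol


def outside(a, tol, b):
    """Model of 'a >tol< b': true when a lies outside tol of b."""
    return not within(a, tol, b)


def targets(values, condition, complement=False):
    """1-based positions of the peaks a command would operate on.

    complement=True models a Boolean preceded by '!': the command acts
    on exactly those peaks for which the Boolean is not true.
    """
    return [k for k, v in enumerate(values, start=1)
            if condition(v) != complement]


d1_hnca = [2.0, 3.0, 4.5]
print(targets(d1_hnca, lambda v: v > 3))                    # [3]
print(targets(d1_hnca, lambda v: v > 3, complement=True))   # [1, 2]
print(within(8.51, 0.02, 8.50), outside(8.51, 0.02, 8.50))  # True False
```

Every peak lands in exactly one of the two sets, so a command and its '!' counterpart together cover the whole list.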
An Adaptable Fully Automated Assignment Macro
This section contains an overview of the simplest
and most automated assignment procedure available in CONTRAST. The procedure is
implemented as a simple 6 part macro that can be used for most data sets with
minimal modification. The performance of the algorithm is highly dependent on
the type and quality of the data. The program always makes every assignment
that is possible given the input data set, even when the data are insufficient to
support that assignment reliably. Therefore the output produced by the procedure should
always be carefully checked and the evidence for every assignment should be
examined and evaluated.
Figure 7.1 is an information flow diagram of the main
steps in the fully-automated assignment procedure. The main body of the
assignment program consists of three functions which generate CONTRAST macros
for the user (Contrace, Reside, and Overlap) and a single function (AnnBF) that generates sequential
assignments based on the output of the previous three functions. Arrows in the
diagram represent the flow of information from one function to another.
Figure 7.1
The fully-automated approach to assignments is
illustrated using sample macros written for two very different data sets. The
first macro is written for a 2D homonuclear data set consisting of three
experiments: COSY, TOCSY, and NOESY.
lf cosy.con
lf tocsy.con
lf noesy.con
lf seq.con
exe shifts.mac
contrace >contrace.mac -n -F
overlap 5 >overlap.mac
annbf 5, -l -x3
stf 5 >output.file
The next macro is written for a 3D heteronuclear
data set consisting of 9 experiments:
HNCO, HNCA, HN(CO)CA, HN(CO)CACB, HNCACB, HCACO,
HN-TOCSY-HMQC, HCCH-COSY, and HCCH-TOCSY.
lf hnco.con
lf hnca.con
lf hncoca.con
lf hncocacb.con
lf hncacb.con
lf hcaco.con
lf hntocsy.con
lf hcchcosy.con
lf hcchtocsy.con
lf seq.con
exe shifts.mac
contrace 1, >contrace.mac -n -F
overlap 1 >overlap.mac
annbf 1, -l -x3
stf 1 >output.file
A comparison of the two macros shows that the main
difference between them is the input data. The first step in both macros is to
load the data into the program. The first three lines in the 2D macro and the
first 9 lines in the 3D macro simply read the peak lists into the program, and
the next line reads in the protein sequence. This step has already been
described in Section @4.
In the next step a macro is executed which contains
a database of the characteristic chemical shifts of the common amino acids.
This database is experiment independent and should contain as much information
as possible about the distribution of chemical shifts. The chemical shift
database is described in Section 8.
The next step is the heart of the CONTRAST automated
assignment procedure. The Contrace
command generates a strategy for assembling spin systems using data from the
input spectra and the chemical shift database. The strategy generated by the Contrace routine is output as a CONTRAST
macro (in the cases above named contrace.mac). The function implements the
strategy as it is being generated. The result of the function is a list of
buffers that contain the modified results of searches and other manipulations
of the data. These buffers are grouped into fragments that roughly correspond
to amino acid spin systems.
The starting point for each fragment is a peak from
a "source" spectrum. There is a one to one correspondence between the
peaks of the source spectrum and fragments. The ideal source spectrum meets all
of the following criteria:
1) The source spectrum is of high resolution and is
well-referenced.
2) The source spectrum is very complete -- very few
peaks are missing.
3) The source spectrum can be correlated to peaks
from the other spectra.
4) The source spectrum contains one correlation
(peak) per residue.
5) The source spectrum is relatively noise free;
there are very few extra peaks.
These criteria should be taken as ideals which can
be used to govern the choice of a source spectrum. They are listed in order of
decreasing importance.
In the 2D macro above the selection of the source
spectrum was left to the Contrace
function. In that case the function generally constructs a spectrum from
the Hn,Ha or fingerprint region using peaks from the COSY and TOCSY spectra.
This spectrum is added to the list of spectra and becomes spectrum 5 (the
sequence is treated as if it were a spectrum). The references to "5"
in the following commands all refer to the newly created source spectrum. In
the 3D macro, on the other hand, the HNCO spectrum (spectrum 1) is specified to the Contrace function as being the source
spectrum. If it had not been specified, a new source would have been
constructed from either the HNCOCA or HNCO spectra, and any missing peaks would
have been filled in by the other spectra.
Each fragment starts off with the peak from the
source spectrum which yields the first 2 (in the case of a 2D source) or 3 (in
the case of a 3D source) assignments. A series of search and filter steps
creates additional buffers (lists of peaks) within a fragment. These buffers
are called working buffers, because they are used to build assignment buffers
which are special buffers named for the resonance assignment that they contain.
One of the chemical shift dimensions of the first peak in the assignment buffer
is the actual frequency assignment for the resonance.
The Contrace
function stops when there is an assignment buffer for each resonance mentioned
in the correlation lists of the input spectra. Generally there is not enough
information in the spectra to correctly assign all the resonances and usually
the assignments of the last assignment buffers are the most uncertain. Spin
systems are usually assigned all the way out to the epsilon position for every
residue in the protein. The fragments can be considered to be "fuzzy"
since they contain alternate assignments, and since no hard-and-fast endpoint
decisions are made at this point of the analysis.
The next step of the macros is the
"overlap" step. The Overlap
function generates what are known as overlap tests which will be used in the
sequential assignment step to score the likelihood that two fragments are
derived from sequential residues. These overlap tests are generally very
simple. They consist of commands that award points when resonances from overlapping
assignment buffers are within a specific tolerance of one another. Overlapping
assignment buffers are assignment buffers from two different fragments that are
expected to contain the same resonance. For example the "previous Ca
buffer" generated from a peak in the HN(CO)CA spectrum should contain the
same Ca resonance as the "Ca buffer" generated from a peak in the
HNCA spectrum from the previous residue in the sequence. When NOESY spectra are
used to score for sequential fragments, working buffers containing NOESY peaks
are used in addition to assignment buffers in making overlap tests.
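The idea behind an overlap test can be sketched in a few lines of Python. This is not the output of the Overlap function; it is a hypothetical model in which points are awarded when a shift in one fragment's "previous Ca" buffer lies within a tolerance of a shift in another fragment's "Ca buffer". The function name, the tolerance of 0.2, the award of 1 point, and the shift values are all illustrative assumptions.

```python
def overlap_score(prev_ca_shifts, ca_shifts, tol=0.2, points=1.0):
    """Award points if any pair of shifts agrees to within tol.

    prev_ca_shifts: candidate shifts in fragment A's 'previous Ca' buffer
    ca_shifts:      candidate shifts in fragment B's 'Ca' buffer
    """
    for a in prev_ca_shifts:
        for b in ca_shifts:
            if abs(a - b) <= tol:
                return points
    return 0.0


# Fragment B is a plausible predecessor of fragment A; fragment C is not.
prev_ca_of_a = [58.31]
ca_of_b = [58.25, 61.00]
ca_of_c = [60.00]
print(overlap_score(prev_ca_of_a, ca_of_b))  # 1.0
print(overlap_score(prev_ca_of_a, ca_of_c))  # 0.0
```

Summing such scores over all overlapping buffer pairs gives the sequential assignment step a figure of merit for placing one fragment directly before another.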
The next step of the automated assignment macros is
the shuffling step in which the fragments created by the Contrace function are shuffled into the correct sequential order
using the sequence of the protein. In this example the annbf (best first
simulated annealing) algorithm is used to shuffle the fragments. This function uses
the overlap tests generated by Overlap
to place fragments in the correct order, and it uses the chemical shift
database to match fragments to the correct positions in the sequence. The
shuffling routine can also use other tests for matching fragments to sequence
positions. These tests can be written by hand or automatically generated by the
Reside function. In this simple case
we do not illustrate the use of such tests, but they are often very helpful in
identifying the amino acid type of a fragment.
The last step in the automatic assignment process is
to write the output of the program to a file. The function stf (shuffle to
file) writes the contents of all of the buffers that make up the fragments into
the file "output.file". The fragments are written in the sequential
order determined by the shuffling routine and are labeled with the name of the
residue and the sequence position of the corresponding amino acid in the
protein. Alternate orderings and ambiguity factors are indicated. The output
file format will be discussed in more detail in a later section.
The assignment macros shown above are the bare
minimum necessary for automated assignment. The commands shown above are
usually supplemented with other functions that provide additional scaling
information, amino acid type tests, and error checking routines. More complete
macros are distributed with the CONTRAST executable. These macros have been
annotated to document the use of the "extra" functions.
Chemical Shift Database
The CONTRAST chemical shift database is a series of
CONTRAST set shift commands that is read into the program as a CONTRAST macro.
The set shift command allows the user to set the amino acid type, atom
(resonance) type, chemical shift range and probability value for that range.
The format for the command is as follows:
set shift AAname Resonance LoChemShift [-] HiChemShift [Prob]
AAname The name or abbreviation of the amino acid or amino acid group for which the chemical shift information holds. The name should correspond to the name used in the sequence.
Resonance The resonance code of the atom to which the chemical shift information applies.
LoChemShift The lower bound of the chemical shift range.
HiChemShift The upper bound of the chemical shift range.
Prob A probability value between 0.0 and 1.0.
Set Shift Examples
The following group of set shift commands is an
example of a typical entry for the alpha carbon of alanine residues.
set shift A Ca 48-54
set shift A Ca 48-50 0.1
set shift A Ca 50-52 0.6
set shift A Ca 52-54 0.3
This example highlights several important points. In
the first line the entire range of allowed chemical shifts is given without a
probability value and the next three lines break up that chemical shift range
into smaller subranges that contain probability values for each subrange. This
allows CONTRAST to use the chemical shift range information in two different ways.
When probability values are given, CONTRAST uses the subranges to automatically
calculate probability-based amino acid type scores during the sequential
assignment step. Both the Contrace
and Reside functions use full ranges
that do not include probability values to perform connectivity tracing and
amino acid test generation respectively. If all set shift commands contain
probability values, then Contrace
will not use chemical shift ranges to trace spin systems and Reside will not generate amino acid tests.
If none of the set shift commands contain probability values then
probability-based amino acid type scoring will not be performed.
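The split between full ranges and probability subranges can be illustrated by parsing the alanine entry in Python. This is a sketch, not the CONTRAST parser: the class ShiftRange and the function parse_set_shift are hypothetical, and the pattern assumes non-negative shifts written with a leading digit, with the dash between the bounds being optional as in the command format.

```python
import re
from typing import NamedTuple, Optional


class ShiftRange(NamedTuple):
    aa: str
    resonance: str
    lo: float
    hi: float
    prob: Optional[float]   # None for a full range without a probability


_NUM = r"(\d+(?:\.\d+)?)"
_PATTERN = re.compile(
    rf"set\s+shift\s+(\S+)\s+(\S+)\s+{_NUM}\s*-?\s*{_NUM}(?:\s+{_NUM})?\s*$"
)


def parse_set_shift(line):
    """Parse one 'set shift' command into a ShiftRange."""
    m = _PATTERN.match(line.strip())
    if m is None:
        raise ValueError(f"not a recognized set shift command: {line!r}")
    aa, resonance, lo, hi, prob = m.groups()
    return ShiftRange(aa, resonance, float(lo), float(hi),
                      float(prob) if prob is not None else None)


commands = [
    "set shift A Ca 48-54",
    "set shift A Ca 48-50 0.1",
    "set shift A Ca 50-52 0.6",
    "set shift A Ca 52-54 0.3",
]
ranges = [parse_set_shift(c) for c in commands]
full_ranges = [r for r in ranges if r.prob is None]    # tracing and type tests
subranges = [r for r in ranges if r.prob is not None]  # probability scoring
print(len(full_ranges), len(subranges))  # 1 3
```

If either list comes out empty for a resonance, the corresponding use of the database described above is unavailable for that resonance.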
The algorithm that generates probability-based amino
acid type scores during sequential assignment can be used with true probability
values for the chemical shift subranges, but its performance is improved
considerably when the probability values are normalized so that the highest
probability value for each resonance is given a value of 1. With this normalization
the preceding example would be converted to:
set shift A Ca 48-54
set shift A Ca 48-50 0.167
set shift A Ca 50-52 1.0
set shift A Ca 52-54 0.5
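The normalization divides every subrange probability for a resonance by the largest one. A minimal Python sketch (the function name normalize is hypothetical, and rounding to three decimal places is chosen only to match the figures shown above):

```python
def normalize(probabilities):
    """Scale probabilities so that the largest becomes 1."""
    top = max(probabilities)
    return [round(p / top, 3) for p in probabilities]


print(normalize([0.1, 0.6, 0.3]))  # [0.167, 1.0, 0.5]
```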
Amino acid names used in the set shift statement
should match the amino acid names used in the input sequence file, but they
need not be limited to standard nomenclature. In order to distinguish a
particular amino acid in the sequence from other like amino acids simply use a
different name. For example two serines in the sequence could be named
"Sx" and "Sy" respectively. In this case the standard information
in the chemical shift database would no longer apply, and the user would have
to include a set of chemical shift ranges for amino acids named "Sx"
and "Sy". NOTE: The three standard names for each of the standard 20
amino acids are interconverted. For example "cysteine",
"cys", and "c" are all considered equivalent. Furthermore
amino acid names are case-insensitive so that "Cysteine",
"cysteine", "CYS", "Cys", "C", and
"c" are all considered equivalent.
Non-existent Chemical Shift Ranges
Amino acid resonances for which no chemical shift
information is given are ignored by the CONTRAST program in subsequent steps
which require chemical shift information. Thus Reside does not include these resonances in the amino acid tests
that it generates, and Contrace does
not use the chemical shift ranges for these resonances in tracing spin systems.
In general it is good practice to comment out set shift commands for resonances
that are not very likely to be assigned correctly by the program. The ability
of the program to assign a resonance correctly is dependent on the amount of
experimental data available to the program and on the number of resonances
which must be correctly assigned before the resonance in question can be
assigned. In general the more intervening bonds between the starting resonances
(from the source spectrum) and the resonance in question, the more uncertain
will be the assignment of that resonance and the more important it is for the
set shift commands for that resonance to be commented out.
NULL Chemical Shift Ranges
A chemical shift range can be explicitly set to NULL
as in the following examples:
set shift G Cbi2 NULL
set shift G Cbi2 Q
This is not the same as simply not including or
commenting out a set shift command for that particular resonance. NULL chemical
shift ranges tell the program that there should NOT be a resonance of that
type. The lines above tell the program that glycines do not contain beta
carbons. This information is used by CONTRAST to penalize amino acid type
assignments if the spin system being assigned contains a resonance that the set
shift command indicates should be NULL.
Contrace
9.1 Contrace Overview
The Contrace
function generates and implements a strategy for tracing spin systems. The
function creates a set of buffers that contains the non-specific assignments
for resonance types in each spin system, and it creates a set of CONTRAST
macros that produces identical results when executed. The macros created by Contrace can be modified to fine tune
the Contrace strategy. The following
sections break the operation of the Contrace
function into its components and describe the types of commands that the
strategy uses in order to create the connectivity tracing macro. More
information about the functions that Contrace
uses can be found in Appendix @.
9.2 Contrace Options
The format for the Contrace command is as follows.
contrace [source] [>filename] [-n,r,h,m] [-F,D,N,P,C] [-f] [-g] [-a] [-x] [-devi,devo] [fuzz]
source The name or number of the spectrum to be
taken as the source spectrum. If no source spectrum is specified, then the Contrace program will create a new
source spectrum using the input spectra.
filename The file name of the CONTRAST macro that
will be generated by Contrace. If no
file name is specified, then the default name contrace.mac will be used.
fuzz The fuzziness factor expressed as a percentage.
This value affects the severity of the chemical shift range and other types of
filtering used in the Contrace
strategy. The default value is 0 (no fuzziness).
-n,r,h,m These flags determine the method used to
calculate which resonances should be assigned first. Only one flag in this
group may be included.
-n Probability calculations in which noise is not
considered. (default)
-r Rigorous probability calculations. (very slow)
-h Heuristic calculations.
-m Automatically chooses between rigorous and
"no noise" calculations.
-F,D,N,P,C These flags determine the method of
fragment filtering used in the calculations. Fragment filtering is a method of
eliminating an assigned resonance from further consideration so that assigned
resonance values are not reassigned in subsequent positions in the spin system.
Only one flag in this group may be included.
-F Always do fragment filter to eliminate resonance
from consideration.
-D Fragment filter only when determined necessary
(default).
-N No fragment filtering.
-P Percent fragment filtering to reduce likelihood
of value reassignment.
-C Constant fragment filtering.
-f Fill the source spectrum using peaks from an
overlapping spectrum. Source is not filled by default.
-g Glycine filter source spectrum. If the source spectrum
is the fingerprint region of a spectrum, it may contain two peaks for each
glycine. This flag eliminates one of those peaks. The default behavior of the
program is to do no glycine filtering.
-a Arginine filter the source spectrum. If the
source spectrum is taken from an HNCO spectrum, it may contain peaks from
arginine side chains as well as backbone peaks. This flag causes the program to
filter those extra peaks. The default behavior of the program is not to perform
arginine filtering.
-x Perform setx cross-checking.
-devi In line deviation filtering is performed. (All
deviation filtering is done as each assignment is made.) The default behavior
is for no deviation filtering to be done.
-devo Out of line deviation filtering is performed.
(All deviation filtering is done at the end of the macro.) The default behavior
is for no deviation filtering to be done.
The Contrace
function examines the spectra that have previously been read into the CONTRAST
program using the lf function and
prepares them to be used by the Contrace
algorithm by applying cluster filters, diagonal filters, and symmetrize
algorithms where needed. Since the Contrace
function is often run more than once on the same data sets, it is possible to
skip these operations by inserting comments in the spectrum files. If the Contrace function detects the key word
"cluster", "diagonal", or "symmetrize" in the
comments of the header of one of the spectra, then the corresponding
preprocessing function will be skipped for that spectrum.
Cluster filters are often necessary when a spectrum
has been peakpicked with a peakpicking function that picks multiple maxima of a
single signal as if they arose from separate peaks. Single peaks can appear to
be multiplets because of incomplete decoupling and noise spikes that create irregularities
on the surface of the peak. Contrace
detects these peaks by taking weighted averages of the coordinates of peaks
that lie within calculated resolution-dependent tolerances of one another.
The following is a representative example of the cf
(cluster filter) function used to detect and average these multiplets.
cf hncocacb (%1 <0.009> d1 && %2 <0.09> d2 && %3 <0.09> d3) -b
In this example peaks in the spectrum, hncocacb, are
compared to one another using the Boolean. The expressions containing percent
'%' symbols represent target dimension values taken from one peak and the
expressions containing dimension 'd' symbols are dimension values taken from
the other peak. The above Boolean evaluates to true if and only if the first
dimension coordinate of peak x (%1) lies within a tolerance of 0.009 units
(<0.009>) of the first dimension coordinate of peak y (d1), the second
dimension of peak x (%2) lies within a tolerance of 0.09 units of the d2
dimension of peak y, and the third dimension (%3) of peak x lies within a
tolerance of 0.09 units of the d3 dimension of peak y. The '-b' flag at the end
of the function call indicates that for each peak in the spectrum, that peak
and the best matching peak will be averaged before the second, third, fourth,
etc. matches are averaged.
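The tolerance test and best-match averaging described above can be sketched in Python. This is only an illustration of the idea and not the CONTRAST implementation: the function names, the tuple layout of a peak, the unweighted mean, and the tolerance-scaled distance used to pick the best match are all assumptions made for the example.

```python
# Illustrative sketch only; names and the averaging rule are assumptions.

def within(a, tol, b):
    """Mimic the '<tol>' operator: true when a and b differ by at most tol."""
    return abs(a - b) <= tol

def cluster_filter(peaks, tols):
    """Merge each peak with its best-matching neighbour, if one is in tolerance.

    peaks: list of coordinate tuples; tols: one tolerance per dimension.
    """
    remaining = list(peaks)
    merged = []
    while remaining:
        peak = remaining.pop(0)
        # Candidates must match in every dimension (the '&&' in the Boolean).
        candidates = [q for q in remaining
                      if all(within(p, t, c) for p, t, c in zip(peak, tols, q))]
        if candidates:
            # Best match = smallest tolerance-scaled distance (assumed rule).
            best = min(candidates,
                       key=lambda q: sum(abs(p - c) / t
                                         for p, t, c in zip(peak, tols, q)))
            remaining.remove(best)
            peak = tuple((p + c) / 2 for p, c in zip(peak, best))
            remaining.insert(0, peak)  # the average may absorb further matches
        else:
            merged.append(peak)
    return merged

peaks = [(8.250, 120.10, 55.00), (8.254, 120.14, 55.04), (7.900, 118.00, 60.00)]
result = cluster_filter(peaks, tols=(0.009, 0.09, 0.09))
```

With these hypothetical coordinates the first two peaks are merged into one averaged peak and the third is left unchanged.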
Cluster filters generated by Contrace are most often modified by adjusting the tolerances in the
angled brackets of the Boolean. The tolerances used by Contrace are conservative so users will often find it necessary to
increase the tolerances considerably.
9.3.2 Diagonal Filters
Diagonal filters are applied when Contrace determines that a spectrum
should contain a symmetric diagonal. This determination is made based on the
list of correlations input in the header of each spectrum. Peaks that have
coordinates within a calculated resolution-based tolerance of the calculated
position of the diagonal are deleted.
The following is a representative example of a
diagonal filter.
filter cosy cosy (%1 <.009> d2)
The diagonal filter is a special form of the more
general filter command which can compare peaks from up to two different spectra
or buffers. The peaks in the first listed spectrum or buffer are deleted if the
Boolean evaluates to true. The dimension fields from the first spectrum or
buffer are referred to using the '%' symbol and the dimension fields from the
second listed spectrum or buffer are referred to using the 'd' symbol. The
first operand in each operation expression (i.e. the operand to the left of the
operator) refers to the first listed spectrum or buffer while the second
operand in each operation expression (i.e. the operand to the right of the
operator) refers to the second listed spectrum or buffer. Other fields from the
two lists are referenced explicitly (e.g. c$cosy,1 or l$cosy).
In the special case of the diagonal filter, the
first and second lists are the same so that in the example above %1 refers to
the first dimension of a peak in the COSY spectrum while d2 refers to the
second dimension of the same peak in the COSY spectrum. If the indicated
coordinate values of the peak are within a tolerance of 0.009 units of one
another, then that peak is deleted.
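The effect of the diagonal filter can be illustrated with a few lines of Python. This is a sketch of the rule as described above, not CONTRAST code; the function name and the peak coordinates are hypothetical.

```python
# Illustrative sketch only; not the CONTRAST implementation.

def diagonal_filter(peaks, tol=0.009):
    """Keep only peaks whose d1 and d2 coordinates differ by more than tol."""
    return [p for p in peaks if abs(p[0] - p[1]) > tol]

# Hypothetical 2D COSY peaks as (d1, d2) pairs.
cosy = [(8.20, 8.20), (8.20, 4.35), (4.35, 4.352), (4.35, 8.20)]
off_diagonal = diagonal_filter(cosy)
```

Here the two peaks lying on (or within 0.009 units of) the diagonal are dropped and the two cross peaks are kept.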
The tolerances used by the Contrace algorithm for diagonal filters are conservative so the
user might find it necessary to insert larger tolerances. In general Contrace is conservative in its choice
of tolerances whenever it is asked to delete or obscure information.
9.3.3 Symmetrize
Many spectra contain peaks that are related by
symmetry in such a way that one peak contains exactly the same information as
its symmetric partner. Symmetry relations in spectra could be used to filter
peaks that have no symmetric partners, if it were not for the many
circumstances that conspire together to obscure one or both peaks in a
symmetric pair. For this reason the symmetrize operation used by Contrace is not a filter, but a
mechanism for ensuring that each peak in a symmetric spectrum has a
symmetry-related partner that contains identical information. Towards this end
the symmetrize operation adjusts the intensities and positions of symmetric
pairs of peaks so that each peak in the pair contains the same frequency and
intensity information. The function also creates symmetric partners for peaks
for which no partners exist. The Contrace
function uses the correlation list in the header of each spectrum file to
determine which spectra contain symmetry-related dimensions and applies the
symmetrize function to those selected spectra. The following is a simple
example of a call to the symmetrize function.
sym cosy (%1 <0.02> d2 && %2 <0.02> d1) -b ** Symmetrizes 2D spectrum.
ws cosy >cosy.sym ** Writes symmetrized 2D spectrum.
The symmetrize function (sym) in this example
searches the peaks of a COSY spectrum for pairs of peaks for which the Boolean
evaluates to true. The first operand of each operation expression refers to one
peak while the second operand refers to a different peak. In this case if the
first dimension (%1) of peak 1 is within a tolerance of 0.02 units of the
second dimension (d2) of peak 2 and the second dimension (%2) of peak 1 is
within a tolerance of 0.02 units of the first dimension (d1) of peak 2, then
the coordinates and the intensities of the two peaks will be averaged and
adjusted so that each peak contains identical information. The -b flag
instructs the algorithm to sample all possible partners for peak 1 and choose
only the best matching partner. If no symmetric partners are found for a peak
then a symmetric partner is created for it using information from the existing
peak.
The write spectrum (ws) function is then used to save a copy of the modified spectrum
to a file named cosy.sym.
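The symmetrize operation described above can be sketched as follows. This is a simplified illustration, not the CONTRAST algorithm: peaks are assumed to be (d1, d2, intensity) tuples, a plain mean is used for the averaging, and diagonal peaks are assumed to have been removed beforehand.

```python
# Illustrative sketch only; data layout and averaging rule are assumptions.

def symmetrize(peaks, tol=0.02):
    """peaks: list of (d1, d2, intensity). Returns a symmetric peak list."""
    out = []
    used = set()
    for i, (a1, a2, ai) in enumerate(peaks):
        if i in used:
            continue
        used.add(i)
        # Partners must satisfy %1 <tol> d2 && %2 <tol> d1.
        partners = [j for j, (b1, b2, _) in enumerate(peaks)
                    if j not in used
                    and abs(a1 - b2) <= tol and abs(a2 - b1) <= tol]
        if partners:
            # Choose the closest partner, as the -b flag requests.
            j = min(partners, key=lambda k: abs(a1 - peaks[k][1])
                                            + abs(a2 - peaks[k][0]))
            used.add(j)
            b1, b2, bi = peaks[j]
            x, y, inten = (a1 + b2) / 2, (a2 + b1) / 2, (ai + bi) / 2
        else:
            # No partner found: mirror the existing peak.
            x, y, inten = a1, a2, ai
        out.append((x, y, inten))
        out.append((y, x, inten))
    return out

cosy = [(8.20, 4.36, 10.0), (4.35, 8.21, 6.0), (7.50, 3.90, 4.0)]
sym = symmetrize(cosy)
```

In this hypothetical list the first two peaks form a symmetric pair and are averaged, while the third peak has no partner, so a mirrored copy is created for it.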
If a source spectrum is not specified in the call to
the Contrace function, then the Contrace function will create a source
spectrum from the spectra input into the CONTRAST program. The Contrace function searches the input
spectra for the most complete spectrum that contains the greatest number of
correlated resonances in common with the greatest number of other spectra. The
ideal spectrum has either a single correlation so that there is a one to one
correspondence between the peaks in the spectrum and the residues in the
protein, or it contains one correlation that can readily be distinguished from
other correlations using the input database of chemical shift ranges. In the
following example the fingerprint region of a COSY spectrum is singled out as
the best starting point for the source spectrum.
scan cosy (d1 <3> 9 && d2 <1.35> 4.45) |source -f ** Create new source buffer.
Here we see that the COSY spectrum is searched for
peaks with d1 values in the range of 6 to 12 ppm and d2 values in the range of
3.10 to 5.8 ppm. The peaks are placed in a buffer named |source and the
contents of the buffer are filtered (-f) to ensure that multiple entries of the
same peak are not placed in the buffer.
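The relationship between a tolerance expression such as "d1 <3> 9" and the range 6 to 12 ppm can be checked with a short Python sketch. The function names and peak values are hypothetical, and the duplicate check is only a stand-in for the behavior of the -f flag.

```python
# Illustrative sketch only; not the CONTRAST implementation.

def tolerance_range(center, tol):
    """Convert 'd <tol> center' into the equivalent (low, high) range."""
    return (center - tol, center + tol)

def scan_fingerprint(peaks):
    """Select (d1, d2) peaks in the fingerprint region, skipping duplicates."""
    lo1, hi1 = tolerance_range(9.0, 3.0)     # d1 <3> 9
    lo2, hi2 = tolerance_range(4.45, 1.35)   # d2 <1.35> 4.45
    source = []
    for p in peaks:
        in_region = lo1 <= p[0] <= hi1 and lo2 <= p[1] <= hi2
        if in_region and p not in source:    # 'not in' stands in for -f
            source.append(p)
    return source

cosy = [(8.2, 4.3), (8.2, 4.3), (8.2, 1.5), (4.3, 8.2)]
source = scan_fingerprint(cosy)
```

Only the peak at (8.2, 4.3) falls inside both ranges, and its duplicate entry is not added a second time.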
The above Scan
function creates a source buffer but not a source spectrum. The following
example shows how a source buffer is converted to a source spectrum.
bts |source $source ** Create new source spectrum.
Here we see that the buffer to spectrum function
(bts) copies the contents of the source buffer to a spectrum with the name,
$source. This new source spectrum is added to the end of the list of input
spectra.
The Contrace
function then uses the write spectrum (ws)
function to save a copy of the newly created source spectrum to a spectrum file
named "source.tmp" as follows.
ws source >source.tmp ** Write new source spectrum.
The function then deletes the source buffer using
the delete (del) function.
del |source ** Delete source buffer.
If the Contrace
function is called with a -f flag, then any gaps in the source spectrum will be
filled in with peaks from another overlapping spectrum. In the following
example a TOCSY spectrum is used to fill in any gaps in the source spectrum
which was taken from the fingerprint region of a COSY spectrum.
fill source tocsy !(%1 <.02> d1 && %2 <.02> d2 || 9 >3< d1 || 4.45 >1.35< d2)
In this example we see that the complement of the
set of peaks for which the Boolean evaluates to true is added to the source
spectrum. Thus if the d1 dimension of a TOCSY peak is within a tolerance of
0.02 units of the first dimension of the source spectrum (%1) and the d2
dimension of the TOCSY peak is within 0.02 units of the second dimension of the
source peak (%2) or if the first dimension of the TOCSY peak is outside the
range of 6 to 12 units or if the second dimension of the TOCSY peak is outside
the range of 3.1 to 5.8 units, then the TOCSY peak is not added to the source
spectrum. All other TOCSY peaks for which the Boolean evaluates to false,
however, are added to the source spectrum.
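The logic of the fill Boolean above can be sketched in Python. This is an illustration under stated assumptions, not CONTRAST code: peaks are (d1, d2) tuples, each candidate is compared against the original source peaks only, and the names and coordinates are hypothetical.

```python
# Illustrative sketch only; not the CONTRAST implementation.

def fill(source, other, tol=0.02, d1_range=(6.0, 12.0), d2_range=(3.1, 5.8)):
    """Add peaks from 'other' for which the fill Boolean is false."""
    filled = list(source)
    for d1, d2 in other:
        # %1 <.02> d1 && %2 <.02> d2 : the peak is already in the source.
        duplicate = any(abs(s1 - d1) <= tol and abs(s2 - d2) <= tol
                        for s1, s2 in source)
        # 9 >3< d1 || 4.45 >1.35< d2 : the peak is outside the fingerprint.
        outside = (not d1_range[0] <= d1 <= d1_range[1]
                   or not d2_range[0] <= d2 <= d2_range[1])
        if not (duplicate or outside):       # the leading '!' in the command
            filled.append((d1, d2))
    return filled

source = [(8.20, 4.30)]
tocsy = [(8.21, 4.31), (7.90, 4.10), (7.90, 1.20), (2.00, 4.10)]
filled = fill(source, tocsy)
```

Of the four hypothetical TOCSY peaks, only (7.90, 4.10) is added: the first duplicates an existing source peak and the last two lie outside the fingerprint region.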
If the source spectrum is a 3D or higher dimensional
spectrum, then it is not likely that another spectrum will contain a
correlation that matches the source correlation in every dimension, but it is
often only necessary for a substantial subset of the dimensions to match. The
following is an example of how Contrace
uses an HNCOCA spectrum to fill a source spectrum created from peaks from a
HNCO spectrum.
fill source hncoca !( %1 <.03> d1 && %2 <.15> d2) d3=0.0
In this case only the first two dimensions were
required to match while the third dimension (the alpha carbon dimension) of
each filling peak in the HNCOCA spectrum is set to 0.0 before it is copied to
the source spectrum. This is an acceptable substitution for a missing HNCO peak
since only the amide proton and nitrogen dimensions of the HNCO spectrum are
held in common with the other experiments input into the CONTRAST program.
Each peak in the source spectrum is the seed for a
group of assigned resonances called a fragment. There is a one to one
correspondence between the peaks in the source spectrum and the fragments, and
in the ideal case there is a one to one correspondence between the fragments
and the amino acids in the protein. The sa
(ScanAll) function is used to create
one buffer for each peak in the source spectrum. In general functions that have
an "all" suffix repeat a basic procedure for every peak in the source
spectrum which is usually specified by number (or name) immediately following
the function name. The following command uses the fifth spectrum as the source
spectrum and searches the spectrum named "source" for peaks that have
d1 values that match the first dimension of peaks in the source spectrum (%1)
within a tolerance of 0.02.
sa 5, source (%1 <0.02> d1 ) -f ** Search source using peaks in source and filter (-f) the results.
The result of this function is that every peak in
the source spectrum spawns a new buffer which becomes the seed of a new
fragment. In this case the spectrum searched is the source spectrum itself.
This ensures that every buffer created contains at least one peak. Each buffer
created is called the source buffer for its fragment. The value
corresponding to each dimension of the best peak in a source buffer is the
assignment for the resonance that the dimension represents. Thus a source
buffer is a special assignment buffer (a buffer that contains the assignment
for one or more resonances).
After Contrace
creates a source buffer for each fragment, it uses the setall function to set
the 'n' (endpoint) field associated with each peak to 1.
setall 5, n |source () = 1 ** Endpoint Determination
The 'n' field is a general integer field that the Contrace function uses for the purpose
of indicating whether the assignment in an assignment buffer should be trusted.
The 'n' field is given a value of 1 if the assignment is to be used in further
calculations or it is given a value of 0 to indicate that further calculations
should not be based on that assignment. This is necessary since Contrace traces every fragment to the
extent of the longest spin system possible given the data. Since alanine
side-chains should not produce valid assignments at the gamma and delta
positions, it is desirable that Contrace
be able to warn the user using the 'n' field. NOTE: Although it is usually safe
to set the endpoint fields of all source buffers to 1, endpoint determination
for other assignment buffers is one of the least reliable parts of the Contrace calculation and should always
be considered with suspicion.
The following is an example of the lines generated
by Contrace to create fragments from
a 3D HNCO source spectrum.
sa 1, hnco (%1 <0.02> d1 && %2 <0.2> d2 ) -f
setall 1, n |hnco () = 1 ** Endpoint Determination
Working buffers are buffers that contain peaks from
a single search. It is not necessary to refer to working buffers while
recording assignments, but Contrace
saves all working buffers to provide a record of what information was used to
make the assignments. Working buffers are created by search functions. If a
buffer name is specified after the search Boolean, then the list of buffers is
checked for a buffer that has that name and the matching peaks from that search
are added to the buffer. If that buffer is not found then a buffer by that name
is created. If no buffer name is listed then the name of the spectrum or buffer
being searched is used by default, and a new buffer is created. The following
is an example of a Contrace search
that creates a working buffer.
sa 1, hnca (%1 <0.02> d1 && %2 <0.2> d2 ) |Hni_Nai_hnca -f
In this example source spectrum 1 peaks are used to
provide the target values for the search (%1 = the first dimension of the
source peak, %2 = the second dimension of the source peak) and the d1 and d2
dimensions of the HNCA spectrum are searched. The Contrace function creates a new buffer name from the resonances
corresponding to the dimensions that were searched and the name of the spectrum
searched. In this example %1 is the amide proton dimension and %2 is the amide
nitrogen dimension. The -f flag causes duplicate peaks in the resulting buffers
to be filtered.
The Contrace
function determines which spectra should be searched first so that the best
assignment pathway is taken. The preceding example resulted in one of the initial
working buffers. As the assigned fragment is extended, target values will also
be taken from assignment buffers (as opposed to the source spectrum alone). The
following is an example of a search that uses assignment buffers as targets.
sa 1, hcchtocsy (d3|Hai,1 <0.02> d1 && d3|Cai,1 <0.4> d2 ) |Hai_Cai_hcchtocsy -f
In this example an HCCH-TOCSY spectrum is searched
for d1 dimensions that match the alpha proton assignment (the d3 dimension of
the first peak in the assignment buffer, Hai) and for d2 dimensions that match
the alpha carbon assignment (the d3 dimension of the first peak in the
assignment buffer, Cai).
After working buffers are created, they are filtered
using the prune (prunebuffer) function.
prune hnca |Hni_Nai_hnca (dev < 30) lev -= 100
In this case the prune function searches through the
|Hni_Nai_hnca buffer of each fragment and, when it finds identical peaks whose
deviations are less than 30% of the deviation of the identical peak with the
highest deviation value, it subtracts 100 from their levels, effectively removing
them from further consideration. The Contrace
function avoids removing peaks from buffers since that would hide information
from the user.
The Contrace
function analyses the information content of all of the working buffers and
determines the next resonance to be assigned. The working buffer with the least
ambiguous information for assigning that resonance is then determined and the
algorithm uses the ScanAll function
to create an assignment buffer named for the resonance that it is created to
assign. In the following example the ScanAll
(sa) function copies the contents of
each |Hni_Nai_hnca working buffer in each fragment (defined by the peaks in
source spectrum 1) to a new assignment buffer named |Cai.
sa 1, |Hni_Nai_hnca () |Cai
The empty parentheses () are the Contrace convention for specifying that the Boolean is true for
every member of an object.
The newly created assignment buffer usually contains
the resonance to be assigned as well as other resonances from other
correlations in the original spectrum, other resonances from overlapping spin
systems, and false signals (noise) in the spectra. In order to distinguish
between these signals, peaks from appropriate working buffers are added to the
assignment buffer. In theory the number of the "correct" resonance
signals that fall within a given tolerance of each other should be greater than
the numbers of the other resonances.
The fillall
command is used to add peaks from the working buffers of a fragment to the
assignment buffer of the same fragment. Peaks are only added to the assignment
buffer if they do not duplicate (within a specified tolerance) resonances
already contained in the assignment buffer.
fillall 1, |Cai |Hni_Nai_hncacb !(%3 <0.6> d3)
In the above example peaks from the |Hni_Nai_hncacb
buffer are added to the |Cai buffer if and only if the third dimension (d3)
resonance of each |Hni_Nai_hncacb peak is not (!) within 0.6 ppm of the third
dimension (%3) resonance of an existing |Cai peak.
Once a resonance buffer has been filled with peaks
from all of the contributing working buffers, the Contrace function uses the setall function to increment the level
of each peak in the assignment buffer for each working buffer that contains a
matching resonance.
setall 1, level |Cai (d3 <0.4> d3|Hni_Nai_hnca && 0 <= l|Hni_Nai_hnca) -n += ->
0.60/#|Hni_Nai_hnca*(1+DEV/120)
The first line of the call to the setall function
specifies that the level of each peak in the |Cai buffer for which the Boolean
is true is to be incremented by the formula on the following line. The Boolean
will only be true when the d3 dimension of the |Cai peak is within a tolerance
of 0.4 of the d3 dimension of a peak in the |Hni_Nai_hnca buffer and if the
level of that peak (l|Hni_Nai_hnca) is greater than or equal to 0. (Note that
filters described in the preceding section set the levels of filtered peaks to
negative values.) The flag '-n' indicates that a "NOESY-type" search
is to be performed (see appendix). The += symbol indicates that the levels are
to be incremented as opposed to decremented (-=) or divided (/=) or multiplied
(*=) or set to (=). The "->" symbol indicates that the line has
been continued. The formula used by Contrace
to increment the levels of the peaks is shown on the second line of the command
and is simply the value 0.60 (determined by Contrace
for each buffer) divided by the number of peaks in the working buffer and then
multiplied by the sum of 1 and the deviation value of the match of the
resonance in the working buffer to the resonance of the assignment buffer
divided by 120. This formula was determined empirically, and its derivation is beyond the
scope of this manual. It can, however, be easily modified by the user to a simpler
or more complicated formula.
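As a worked illustration of the increment formula, the following Python sketch evaluates 0.60/#buffer*(1+DEV/120) for a working buffer of three peaks. The function and parameter names are hypothetical, and the DEV value of 60 is an arbitrary input chosen only for the arithmetic.

```python
# Illustrative arithmetic only; inputs are hypothetical.

def level_increment(weight, n_peaks, dev, dev_scale=120.0):
    """Evaluate weight / n_peaks * (1 + dev / dev_scale)."""
    return weight / n_peaks * (1 + dev / dev_scale)

# 0.60 / 3 * (1 + 60 / 120) = 0.20 * 1.5 = 0.30
increment = level_increment(0.60, n_peaks=3, dev=60.0)
```

The division by the number of peaks means that a match found in a crowded working buffer contributes less to the level than a match found in a sparse one.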
The setall function is repeated for each working
buffer that contains resonances that might match resonances in the assignment
buffer. The peaks in the assignment buffer are then filtered using two types of
filters. The first filter is a range filter. It simply reduces the levels of
peaks whose resonances fall outside the normal range for that type of
resonance.
setall 1, level |Cai !(d3 <15> 54) -= 100 ** Cai Range Filter: 39-69
In this example the setall function is used to
reduce by 100 the levels of all peaks in the |Cai buffer whose d3 resonances
fall outside (note the '!' symbol) the range of 39 to 69 ppm. The second type
of filter is a fragment filter. These filters exclude from consideration
resonances that have already been assigned to other atom types. The following
example subtracts 1000 from the level of every peak in the |Cai assignment
buffer that has a d3 value that is within a tolerance of 0.4 ppm of the d3
value of the first peak in the |Ca_prev assignment buffer (d3|Ca_prev,1).
setall 1, level |Cai (d3 <0.4> d3|Ca_prev,1) -= 1000 ** FRAG-FILTER assigned peaks.
After fragment filters have been applied for each
previously created assignment buffer, the peaks in the new assignment buffer
are sorted by decreasing level value using the order (ord) command.
ord |Cai level
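The combined effect of the range filter, the fragment filter, and the ordering step can be sketched in Python. This is a simplified illustration, not CONTRAST code: each peak is assumed to be a dictionary with a 'd3' shift and a 'level', and the peak values are hypothetical.

```python
# Illustrative sketch only; data layout and values are assumptions.

def filter_and_order(peaks, center=54.0, half_width=15.0,
                     assigned=None, frag_tol=0.4):
    """Apply a range filter and a fragment filter, then sort by level."""
    ranked = [dict(p) for p in peaks]        # work on copies
    for p in ranked:
        # Range filter: !(d3 <15> 54) -= 100
        if abs(p['d3'] - center) > half_width:
            p['level'] -= 100
        # Fragment filter: (d3 <0.4> d3|Ca_prev,1) -= 1000
        if assigned is not None and abs(p['d3'] - assigned) <= frag_tol:
            p['level'] -= 1000
    # ord |Cai level : highest level first
    return sorted(ranked, key=lambda p: p['level'], reverse=True)

cai = [{'d3': 58.1, 'level': 1.2},
       {'d3': 30.5, 'level': 1.5},
       {'d3': 52.3, 'level': 0.9}]
ranked = filter_and_order(cai, assigned=52.2)
```

The peak at 30.5 ppm is penalized for falling outside the 39 to 69 ppm range, the peak at 52.3 ppm is penalized for matching an already assigned shift, and the peak at 58.1 ppm ends up first in the buffer. Note that the penalized peaks remain in the list with negative levels rather than being deleted.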
After the buffer is sorted, the first peak in the
buffer should contain the assignment for the specified resonance and the
following peaks contain ranked alternative assignments. If the first peak of
the assignment buffer is likely to contain the correct assignment, the 'n' variable of
the peak is set to 1 using the setall function.
setall 1, n |Cai (l|Cai,1 >= l|Cb_prev,1 && n|Cb_prev,1 > 0) = 1
This function sets the 'n' variable to 1 if and only
if the level of the first peak in the assignment buffer (l|Cai,1) is greater
than or equal to the level of the first peak in the previous assignment buffer
(l|Cb_prev,1), which must also have a positive 'n' variable.
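The endpoint test can be written out as a small Python function. This is a sketch of the Boolean only; the function name is hypothetical, and returning 0 when the Boolean is false assumes that the 'n' field of the new peak was 0 beforehand.

```python
# Illustrative sketch only; not the CONTRAST implementation.

def endpoint(level_new, level_prev, n_prev):
    """Return 1 if the new assignment should be trusted, otherwise 0."""
    return 1 if (level_new >= level_prev and n_prev > 0) else 0

trusted = endpoint(level_new=1.2, level_prev=0.9, n_prev=1)
weak = endpoint(level_new=0.5, level_prev=0.9, n_prev=1)
after_bad_endpoint = endpoint(level_new=1.2, level_prev=0.9, n_prev=0)
```

The last case shows that once a previous assignment has an 'n' value of 0, later assignments in the chain are not marked as trusted even if their own levels are high.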
Finally the information about the location of
assignments in the assigned fragment is recorded in the CONTRAST program using
the set function so that other fully-automated functions that follow the Contrace function can refer to the
assignments. The following example records the dimension, buffer name and peak
number (d3|Cai,1) that contains the assignment for the Cai (alpha carbon of the
current residue in the fragment) as well as a Boolean that can be used to
evaluate the quality of the assignment.
set frag d3|Cai,1 = Cai (n|Cai,1 > 0)
In this case the Boolean simply uses the endpoint
value 'n' as the metric to test for the validity of the assignment.
Sections 9.6 and 9.7 described the process of
creating primary working buffers and creating primary assignment buffers. A
primary assignment buffer is the first assignment buffer created at any given
level of assignments. If however there are two or more resonances that can be
assigned at a given level (for example Hb1 and Hb2) then the Contrace program will create a set of
secondary working buffers and a secondary assignment buffer to assign the
second resonance. These secondary buffers are created immediately after the
primary assignment buffer has been completed and the next set of primary
working buffers have been created and filtered. Secondary assignments are made
after the next set of primary working buffers has been created, because the new
working buffers often contain information that is useful in determining the
assignment of the secondary resonance.
The process of creating secondary assignment buffers
is similar to that of creating the primary buffers except that the criteria for
determining whether or not the secondary assignment is valid are more strict.
This is partially due to the fact that secondary and primary assignments are
often degenerate, but it also reflects the fact that it is far more harmful to
the scoring algorithms for a bad secondary assignment to be considered valid
than it is to ignore a valid secondary assignment.
9.9 Exiting Contrace
Contrace
continues tracing a spin system as long as there are unassigned resonances in
the spectra (as input in the headers of each spectrum file). Contrace will create an assignment
buffer for each resonance even when there is not enough information to
properly assign the resonance. Care should be taken to evaluate the quality of
the data that went into each assignment. The ambiguity value for the assignment
should not be relied on! In fact the ambiguity assessment function has been
taken out of the Contrace function to
ensure that the user carefully evaluates each assignment.
Overlap Tests
CONTRAST uses the overlap between adjacent fragments
to sequentially order the fragments. Fragments must be overlapped in order to
use the CONTRAST program to assign NMR data. This overlap occurs as a result of
dipolar (through-space NOESY) connectivities between residues or scalar
(through bond) connectivities between residues. Fragments should be constructed
so that they include buffer(s) that contain resonances from the previous and/or
following fragment in the sequence as well as buffer(s) that contain analogous
resonances within the residue represented by the current fragment. Figure 10.1
uses connectivity graphs to represent a series of fragments that overlap in the
Ca dimension due to an interresidue scalar coupling from an experiment such as
the HN(CO)CA experiment.

Figure 10.1 Graph representation of the connectivities present
between assignment buffers within three fragments. Assignment buffers on the
shaded background represent those assignment buffers that are actually used to
assign the fragments to alanine, valine and lysine amino acids respectively.
Arrows show the overlap between the fragments at the Ca resonances.
We see from Figure 10.1 that if two fragments are to
be considered adjacent in the sequence, then the left-most Ca resonance in one
fragment must be assigned to the same chemical shift as the right-most Ca
resonance of the previous fragment.
Fragments also overlap through dipolar couplings
between residues. Scalar overlap is modeled in the CONTRAST method by assignment
buffers that overlap between sequential residues; however, dipolar overlap is
modeled by a working buffer that contains NOESY peaks from one residue that can
overlap with both assignment buffers and working buffers in a neighboring
fragment. This type of dipolar overlap is illustrated in Figure 10.2.

Figure 10.2 Illustration of the overlap between adjacent
fragments that is due to dipolar, through-space coupling. Resonances on the
shaded background represent resonances that have been assigned in assignment
buffers and are connected with dark line segments. Resonances on the unshaded
portion of the diagram represent resonances in a working buffer that arise from
NOESY-type connectivities which are represented by light line segments. Overlap
between a working buffer in the Lys 3 fragment and the Val 2 assignment and working
buffers is indicated by arrows.
Through-space couplings are often seen between
non-sequential fragments and thus can be unreliable in some cases. On the other
hand, scalar couplings across the peptide bond can be ambiguous when there are
several resonances involved in the coupling that have similar chemical
shifts. Thus it is always desirable to make fragment
adjacency determinations using both dipolar and scalar couplings if both types
of data are available.
10.1 The Overlap Function
The Overlap
function is used to create a set of "overlap tests" that is used by a
subsequent sequential assignment function to determine when fragments are
adjacent in the sequence. The Overlap
function is similar to the Contrace
function in that it generates a CONTRAST macro that can be used as is or
modified by the user. The function, however, is much simpler than Contrace since it usually creates only a
few lines of CONTRAST commands. The Overlap
function helps provide a fully-automated pathway to assignments, but it does
not perform any function that the user would not be capable of performing by
hand.
The Overlap
function generates Set Overlap (Set Ovl) tests, which are used for scoring
the overlap between fragments. The Overlap
function uses information entered into the CONTRAST program with the Set Frag function that records the
locations of assignments and potentially useful NOESY-type working buffers that
provide a means of scoring the overlap between fragments. The Contrace function automatically
generates calls to the Set Frag
function, but the user can also enter that information "by hand". The
Overlap function takes two optional
command line parameters: the name or number of the source spectrum and the name
of the file to which the generated macro is saved. The following is an example
call to the Overlap function.
overlap 1 >overlap.mac
In this example spectrum number one is specified as
the source spectrum and a copy of the macro generated by the function is saved
to the file overlap.mac as it is executed by CONTRAST.
10.2 Set Ovl Tests
Set overlap tests are similar to other CONTRAST
commands that use Booleans except that the order of the fields in Set Ovl tests is critical (in other
CONTRAST commands the order is not as important). In Set Ovl Booleans all of the left-hand operands refer to one
fragment and all of the right-hand operands refer to the next fragment
(whatever fragment happens to be to the right of the first fragment). The
general form of the command is
set ovl source |bufferLF |bufferRF ->
(operandLF op operandRF [conj operandLF op operandRF [conj ...]]) -scaleF -includeF score
where:
set ovl = The CONTRAST command.
source = The name or number of the source spectrum.
bufferLF = The name of the buffer being tested from
the fragment on the left.
bufferRF = The name of the buffer being tested from
the fragment on the right.
-> = Line continued symbol.
operandLF = The left-hand operand which corresponds
to the fragment on the left.
op = Operator (eg. >, <=, <.02>, etc.)
operandRF = The right-hand operand which corresponds
to the fragment on the right.
conj = Conjunction (eg. ||, &&, |, &)
scaleF = Flag that either specifies that scores be
scaled (-s) or unscaled (-u).
includeF = Flag that specifies that either the best
matching value alone is scored (-b), that a
score be generated for the best match in the
right-hand buffer for each different
element of the left-hand buffer (-n), or that all
matches between the two
fragments are scored (-a).
score = The number of points that is awarded when
the Set Ovl Boolean evaluates to
true.
The Set Ovl
function can be used to score the adjacency of two fragments that share a
common dimension. Each fragment in the following example contains a |Cai buffer
that contains peaks whose d3 dimension arises from the Ca resonance of the
residue making up most of the current fragment and a |Ca_prev buffer whose d3
dimension is from the Ca resonance of the previous residue. Note: a |Ca_prev
buffer is usually formed from peaks from scalar-coupling experiments such as
the 3D HN(CO)CA or 3D HN(CO)CACB in which the H1 and N15 resonances arise from
one residue and the Ca or Cb resonances arise from the previous residue.
set ovl 1 |Cai |Ca_prev (%3 <.1> d3 && n|Cai > 0 && 0 < n|Ca_prev) -s -b 100
In this example 100 points are scaled (-s) by the
deviation of the match between the resonances being compared and added to the
sequential score for the protein if and only if the d3 dimension (%3) of a peak
in the |Cai buffer matches the d3 dimension of a peak in the |Ca_prev buffer
within a tolerance of 0.1 ppm and if the 'n' values of the peaks in the
best-matching (-b) pair of peaks are both greater than zero. Note that if the
operation "0 < n|Ca_prev" had been reversed (e.g. "n|Ca_prev
> 0"), then the program would have used the n value from the |Ca_prev
buffer of the left-hand fragment instead of the |Ca_prev buffer from the
right-hand fragment. The order that the |Cai and |Ca_prev buffers are listed
before the Boolean is also significant. If the two buffers had been listed with
|Ca_prev before |Cai, then the program would have tried to match peaks from the
|Ca_prev buffer of the left-hand fragment with peaks from the |Cai buffer of
the right-hand fragment and only fortuitous random matches would have been
possible.
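The significance of operand order in the Boolean above can be illustrated with a Python sketch in which the left-hand operands always read from the left fragment and the right-hand operands from the right fragment. The fragment dictionaries, field names, and chemical shifts are hypothetical, and the deviation scaling applied by the -s flag is deliberately left out.

```python
# Illustrative sketch only; scaling by deviation (-s) is omitted.

def adjacent(left, right, tol=0.1):
    """True when the left fragment's Cai matches the right fragment's Ca_prev.

    Mirrors (%3 <.1> d3 && n|Cai > 0 && 0 < n|Ca_prev): the first two tests
    read the left fragment's |Cai buffer, the last reads the right fragment's
    |Ca_prev buffer.
    """
    return (abs(left['Cai'] - right['Ca_prev']) <= tol
            and left['n_Cai'] > 0
            and 0 < right['n_Ca_prev'])

ala = {'Cai': 52.50, 'Ca_prev': 60.10, 'n_Cai': 1, 'n_Ca_prev': 1}
val = {'Cai': 62.30, 'Ca_prev': 52.45, 'n_Cai': 1, 'n_Ca_prev': 1}

forward = adjacent(ala, val)    # Ala followed by Val
backward = adjacent(val, ala)   # Val followed by Ala
```

With these hypothetical shifts the test succeeds only in the forward direction, which is the property that lets the overlap tests order fragments rather than merely pair them.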
The Set Ovl
function can also calculate adjacency scores for two fragments based on
dipolar, through-space NOESY-type information. The following example shows how
a working buffer in one fragment (that is formed from a search along the H1 and
N15 dimensions of a 3D H1N15-NOESY experiment) is used to determine adjacency
based on its ability to match the amide proton of the following fragment.
set ovl 1 |H_N_noesy |hnco (d3|H_N_noesy <.1> d1|hnco,1 && 0 < n|hnco,1) -u -n 50
In this example the adjacency score for the two
fragments is incremented by 50 points (unscaled due to the -u flag) for each
peak in the |H_N_noesy working buffer of the left-hand fragment whose d3
dimension matches the d1 dimension (the amide proton dimension) of the first
peak in the |hnco assignment buffer within a tolerance of 0.1 ppm if and only if
the n value of that HNCO peak is greater than 0. The -n flag indicates
that a NOESY-type matching is to be done in which only the best match for
each peak in the left-hand buffer is used to evaluate the score. Thus if the
|H_N_noesy buffer contained 10 peaks and was being compared to every peak in
the |hnco buffer (instead of to only the first peak) and if the |hnco buffer
contained 100 peaks, then a maximum of 500 points (50*10) could be generated
using the -n flag (compared to a maximum of 50 points using the -b flag and a
maximum of 50000 points (50*10*100) using the -a flag). Note that a -n option
is generally appropriate whenever one or both of the buffers being compared are
NOESY-type working buffers.
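The difference between the three matching modes can be illustrated with a short
Python sketch. It is not CONTRAST code: the function name overlap_score and its
arguments are hypothetical, each peak is reduced to a single chemical shift,
the tolerance comparison is assumed to be inclusive, and scaling (-s) is
ignored.

```python
def overlap_score(left, right, tol, points, mode):
    """Unscaled adjacency score for two lists of chemical shifts (ppm).

    mode 'b': points awarded once if any pair matches (best match only).
    mode 'n': points awarded once per left-hand peak that has a match.
    mode 'a': points awarded for every matching pair.
    """
    # For each left-hand peak, count the right-hand peaks within tolerance.
    matches_per_left = [sum(1 for r in right if abs(l - r) <= tol) for l in left]
    if mode == "b":
        return points if any(matches_per_left) else 0
    if mode == "n":
        return points * sum(1 for m in matches_per_left if m > 0)
    if mode == "a":
        return points * sum(matches_per_left)
    raise ValueError("mode must be 'b', 'n', or 'a'")


# Worst case from the text: 10 left-hand peaks, 100 right-hand peaks, all matching.
left = [8.0] * 10
right = [8.0] * 100
for mode in "bna":
    print(mode, overlap_score(left, right, 0.1, 50, mode))
```

With these inputs the three modes give the maxima quoted above (50, 500 and
50000 points), which shows why -a can swamp the score when NOESY-type buffers
hold many peaks.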
Amino Acid Tests
In order to make effective use of a protein
sequence, the amino acid types of at least a few CONTRAST fragments must be
partially determined. The ability to reduce the number of sequence positions to
which a fragment can be assigned allows the sequence to be used to generate
constraints. Many errors can occur at the primary level of assignments that can
produce errors in the constraints generated at the amino acid type assignment
stage. In fact the amino acid type assignment stage is prone to errors even
when the primary assignments are perfect. CONTRAST allows for these errors by
allowing fragments to be scored at every position in the sequence. Correctly
assigned regions of the protein sequence may score well enough to compensate
for errors; thus the program can arrive at correct assignments even when there
are errors in the primary and amino acid type assignments.
There are several different methods for making amino-acid-type
assignments. Each method will be discussed in a separate section and can be
used separately, but best results are obtained when all of the described
methods are used together.
Each peak in the source spectrum is a starting point
for each fragment that the program constructs. When the peaks of the source
spectrum file (or any other spectrum file) are read into the CONTRAST program,
comments are also read into the program and are associated with each peak that
they follow (see section @@.@). Any amino acid type assignment that is already
known by the user as a result of previous work can be used to bias future
CONTRAST assignments by adding specially formatted peak labels to the source
peaks that will give rise to the fragments for which extra information is
known. The strength of the bias is proportional to the magnitude of the score
that the label instructs the program to award to the assignment.
CONTRAST peak labels are added to a peak's comment
field (after the "**" symbol following the peak but before the
newline). These peak labels are enclosed in square brackets and
simply specify a list of residue identifiers and a score, separated by commas or
spaces. Figure 11.1 is an example of a spectrum file that uses peak labels.
3 5 (95) com 50
Hn Hni .1 Hni
N Nai .1 Nai
CO Co- .1 Co-
** Peakpicked 9/9/99 from prothnco.ser expt.
9.1 110.0 180.0 10000 ** hnco1 [L,I,V 100]
9.2 111.0 181.0 11000 ** hnco2 [N20,N25 100]
9.3 112.0 182.0 12000 ** hnco3 [ala -1000]
9.4 113.0 183.0 13000 ** hnco4 [G 10] [A 20]
10 100.0 160.0 1 ** bogus [P45 100000]
Figure 11.1 Sample spectrum file. The file demonstrates the use of peak labels
for amino-acid type assignment. Each peak gives rise to a fragment. When the
first fragment (the fragment arising from the first peak) is assigned to
leucine, isoleucine, or valine positions in the sequence, 100 points are added
to whatever amino acid type score is generated by other techniques. If the
second fragment is assigned to asparagine 20 or 25 then the assignment is also
awarded 100 points. If the third fragment is assigned to any alanine position
then 1000 points are subtracted from the score of the assignment. If the fourth
fragment is assigned to any glycine residue then 10 points are added to the
assignment score, and if it is assigned to any alanine then 20 points are
added. Finally, if the fifth fragment is assigned to P45 then the assignment
receives 100000 extra points.
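The way such labels translate into score adjustments can be sketched in Python.
This is an illustration only, not the parser used by CONTRAST: the function
names parse_labels and label_bonus are hypothetical, the handling of
three-letter names is an assumption based on the [ala -1000] label, and each
label is assumed to contribute at most once per placement.

```python
import re

# Three-letter to one-letter codes, so that labels such as [ala -1000] resolve.
THREE_TO_ONE = {
    "ala": "A", "arg": "R", "asn": "N", "asp": "D", "cys": "C",
    "gln": "Q", "glu": "E", "gly": "G", "his": "H", "ile": "I",
    "leu": "L", "lys": "K", "met": "M", "phe": "F", "pro": "P",
    "ser": "S", "thr": "T", "trp": "W", "tyr": "Y", "val": "V",
}

LABEL_RE = re.compile(r"\[([^\]]*)\]")
IDENT_RE = re.compile(r"([A-Za-z]+)(\d*)$")


def parse_labels(comment):
    """Return a list of (identifiers, score) for each [..] label in a comment.

    Each identifier is a (residue_type, position) pair; position is None
    when the label names a residue type without a sequence number.
    """
    labels = []
    for body in LABEL_RE.findall(comment):
        tokens = [t for t in re.split(r"[,\s]+", body.strip()) if t]
        if len(tokens) < 2:
            continue  # a label needs at least one identifier and a score
        score = float(tokens[-1])
        idents = []
        for tok in tokens[:-1]:
            m = IDENT_RE.match(tok)
            if not m:
                continue
            name, num = m.groups()
            code = THREE_TO_ONE.get(name.lower(), name.upper())
            idents.append((code, int(num) if num else None))
        labels.append((idents, score))
    return labels


def label_bonus(comment, residue_type, position):
    """Sum the label scores that apply to a fragment tried at one position."""
    total = 0.0
    for idents, score in parse_labels(comment):
        for code, num in idents:
            if code == residue_type and num in (None, position):
                total += score
                break  # each label is counted at most once
    return total


print(label_bonus("hnco1 [L,I,V 100]", "V", 12))
print(label_bonus("hnco2 [N20,N25 100]", "N", 30))
print(label_bonus("hnco4 [G 10] [A 20]", "A", 7))
```

The second call returns zero because asparagine 30 is not one of the two
positions named in the label, while the other two calls pick up the 100 and 20
point bonuses described in the figure caption.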
CONTRAST sequential assignment functions check
special AA Comment Tests for instructions on how and when to make use of peak
labels (see the following section). Peak labels will be ignored if these set aa
tests have not been read into the program before the sequential assignment
step.
Assigning proteins should be an iterative process.
The use of peak labels allows alternative assignments to be tried out by
biasing a fragment away from a previous assignment or towards another
assignment or both. Negative biasing (discouraging a particular assignment) is
accomplished simply by entering a large or small negative score for the
unwanted assignment. One misassignment can have a cascading effect. The correct
fragment displaced by the incorrectly assigned fragment may in turn displace
another correct assignment, which may in turn displace another, and so the
process may continue. By finding that one incorrect assignment and displacing
it using negatively biased peak labels, one can go from nonsensical assignments
to correct assignments in one small step.
Positive biasing (encouraging a particular
assignment or set of assignments) is accomplished by giving the fragment a
positive score if it is assigned to a desired set of sequence positions.
Positively biased peak labels can usually be used even before the automated
assignment process has begun. Perhaps the most important example is using phony
peaks with positively biased peak labels to fill in known gaps in the data. For
instance, if the HNCO spectrum is used as source, then fragments cannot
normally be generated for proline residues since they do not contain amide
protons. In this case phony peaks (peaks that are made up by the user so that
the dimensions are not likely to match other resonances from other spectra) are
created and peak labels with very high scores for the respective proline
positions are added to the peaks (e.g. [P45 10000]). This fills in the
"gaps" in the data and minimizes the time other fragments will be
tried out at those positions.
11.2 AA Comment Tests
AA comment tests are read into the CONTRAST program
to instruct sequential assignment programs how to use peak labels in the
comments of source spectrum peaks to aid in amino acid type assignment. The set
aa function is used to read AA comment tests into the CONTRAST program. This
function is similar to the Set Ovl function described in section 10. Its
general format is as follows:
set aa AAname source[,] (c|buffName[,peak]) [-flag] scale
set aa = the command name
AAname = the name of the amino acid for which the
test will apply
source = the name or number of the source spectrum
buffName = the name of the buffer containing the
peak whose comment field is to be checked
peak = the number of the peak in the buffer that
should be checked
flag = flag that causes the points awarded by the
peak label to be scaled by the deviation of the peak in question (-s) or to be
left unscaled (-u, the default).
scale = a scaling factor entered as a percentage.
Values of 100 or 0 indicate that the score included in the comment is not to be
scaled; a value between 0 and 100 causes the score to be scaled by that
percentage.
The following is a typical example of an AA comment
test.
set aa L 5, (c|source,1) 0
In this example, the sequencing function is
instructed that spectrum 5 is the source spectrum and that it should consult
the comment of the first peak in the buffer named |source for the fragment
(c|source,1) when determining the amino acid type of a fragment that is being
scored at a leucine residue. It is typical for the source buffer to be
specified as the buffer to check for peak labels, but any other buffer can be
specified. If the comment associated with the specified buffer in a fragment
contains a peak label such as [L,V 100] and that fragment is being tried at a
leucine or valine position in the protein sequence, or if the comment contains
an expression such as [L5 100] and that fragment is being tried at leucine 5 in
the sequence, then the placement will be awarded 100 additional points.
NOTE AA comment tests are a special type of amino
acid test and should not be confused with the "normal" amino acid
tests. The parentheses in all other amino acid tests must contain a Boolean
expression and not just a field location as is required in AA comment tests.
Furthermore, AA comment tests for any given amino acid must be read into
CONTRAST before any other amino acid tests for that amino acid, or the results
cannot be predicted.
The form of the amino acid name specified in the set
aa command is not important if it is a standard name for one of the standard 20
amino acids; if this is not the case then it should match the amino acid name
used in the sequence. For example if the sequence input file for CONTRAST
included a glx residue, then glx must be specified in the set aa command in
order for the amino acid comment test to be applied to that residue.
One can use peak comments to "lock in" or
direct the assignments of particular peaks and fragments. Not only can this
facilitate "bookkeeping", but it also provides a means for other
CONTRAST functions to explore and evaluate alternative assignments in an
iterative fashion.
11.3 General Amino Acid Tests
General amino acid tests (those that are not AA
comment tests) are used to award points when the chemical shifts of specific
resonances in the fragment lie within specified ranges. They are entered into
the CONTRAST program using the set aa command similarly to how the set aa
function is used to read in AA comment tests. The general form is:
set aa AAname source[,] |buffName (Boolean) [-flag] score [** comment]
set aa = the command name.
AAname = the name of the amino acid for which the
test will apply.
source = the name or number of the source spectrum.
buffName = the name of the principal buffer that is
being tested by the test.
Boolean = a Boolean expression. If the Boolean is
true for a particular fragment, then
points are awarded to the amino acid type score at
the position specified by
AAname.
flag = flag that causes the points awarded by the
test to be scaled by the
deviation of the peak in question (-s) or to be left
unscaled by default (-u).
score = the points awarded when the Boolean
evaluates to true.
comment = a mnemonic phrase to describe the function
of the amino acid test.
Amino acid tests are used to implement several different
amino acid type assignment strategies used by high-level CONTRAST functions.
Depending on the method, tens to thousands of tests are created to score amino
acid type assignments. An example of a single test follows.
set aa E 5, |Hbi (d2|Hbi,1 <1.55> 2.15 && n|Hbi,1 > 0 && ->
d2|Hgi,1 <0.75> 1.95 && n|Hgi,1 > 0) 100 ** LEKR
In this example, if the second dimension of the
first peak in the Hbi buffer (d2|Hbi,1) is within a tolerance of 1.55 ppm of
2.15 ppm and if the d2 dimension of the first peak of the Hgi buffer is within
0.75 ppm of 1.95 ppm and if the endpoint fields of both peaks indicate that
the peaks are well-connected to the spin system, then 100 points are added to the
global score when the fragment is tried at a glutamate position in the
sequence. The arrow symbol "->" is a CONTRAST symbol that informs
the program that the command is continued on the next line. The comment of the
test lists the amino acid types, "LEKR" (leucine, glutamate,
lysine or arginine), for which this particular test can score true if the
resonances of the fragment fall within the input standard ranges. The number of
amino acids associated with an amino acid test is a measure of the resolving power
of the test. In the ideal case, if the Hb and Hg chemical shifts of a fragment
fall within the two chemical shift ranges specified in the Boolean expression,
then according to the test the fragment must be either a leucine, a glutamate,
a lysine or an arginine (assuming that the fragment is correctly assigned and
that the two chemical shifts fall within the chemical shift ranges read into
CONTRAST). The example given is a glutamate test (set aa E ...); identical tests
for each of the other amino acid types (L, K, and R) should also be generated
so that any fragment that tests positive for one type of residue will test
positive for the other three if the resonances of the fragment fall within the
chemical shift ranges input into the program.
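The logic of the example test can be restated as a small Python sketch. It only
mirrors the Boolean expression above; the dictionary layout used for a fragment
and the function names within and glutamate_test are hypothetical, and the
tolerance comparison is assumed to be inclusive.

```python
def within(value, tolerance, center):
    """CONTRAST-style 'value <tolerance> center' comparison (assumed inclusive)."""
    return abs(value - center) <= tolerance


def glutamate_test(fragment, points=100):
    """Mirror the example test: Hb near 2.15 ppm, Hg near 1.95 ppm, both connected."""
    hb = fragment["Hbi"][0]   # first peak of the |Hbi buffer
    hg = fragment["Hgi"][0]   # first peak of the |Hgi buffer
    ok = (within(hb["d2"], 1.55, 2.15) and hb["n"] > 0
          and within(hg["d2"], 0.75, 1.95) and hg["n"] > 0)
    return points if ok else 0


# Two made-up fragments: the second has an Hg peak with a zero endpoint field.
connected = {"Hbi": [{"d2": 2.05, "n": 1}], "Hgi": [{"d2": 2.30, "n": 1}]}
unconnected = {"Hbi": [{"d2": 2.05, "n": 1}], "Hgi": [{"d2": 2.30, "n": 0}]}
print(glutamate_test(connected), glutamate_test(unconnected))
```

The first fragment earns the 100 points; the second earns nothing even though
its chemical shifts are in range, because one endpoint field is not positive.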
Set aa tests can be used to test for distinguishing
features of amino acid structure. Such tests are called geometry tests. They
use the same format as that described in section 11.3, but rather than
awarding points for resonances that fall within chemical shift ranges, they
generally are used to subtract points when differences between the
structure of the assignments and the structure of the amino acid type are
found. The following is an example of a geometry test.
set aa S 5, |Hgi (n|Hgi,1 > 0) -s -b -200
In this example, if the "n" (endpoint)
field of the first peak in the Hgi buffer of a fragment is greater than zero
(n|Hgi,1 > 0), this is an indication that a resonance with significant
connectivity to the rest of the fragment was assigned as Hg (the resonances
assigned in the Hgi buffer) and that the sequential assignment function should
subtract 200 points from the fragment's score whenever it is assigned to a
serine position (S) in the sequence. The Contrace
function only sets the endpoint field "n" of a peak to a positive
value (indicating a connection to the rest of the fragment) if a stringent set
of criteria is met, so the endpoint field is only extremely rarely set to a
positive value when it should not be. This is a conservative way to use amino
acid topologies to determine amino acid type, since the penalty for a violation
is small, and these violations do not prevent the sequencing function from
mapping these spin systems to those amino acid types for which violations
occur. Since the endpoint field is often set to zero (indicating a weaker
connection to the assigned fragment) before the spin system has been fully
traced, geometry tests do not explicitly penalize the mapping of amino acid
types to spin systems whose endpoint fields are set to zero within
the limits of the size of the amino acid. Since all spin systems are traced by
Contrace as far as the correlations in
the data allow, the sequential assignment algorithms are able to consider all
of the resonances in the fuzzy spin systems in generating amino acid
assignments.
11.5 The Reside Function
The Reside (Residue Identification) function automatically generates the
comment tests, geometry tests and general amino acid tests described in the
preceding sections. It takes data input into the CONTRAST program using Set
Shiftr commands and Set Frag commands and generates amino acid type
identification tests that will be used by sequencing functions to generate
sequential assignments. The tests generated by the Reside function enable
CONTRAST sequencing functions to determine the relative likelihood that a
particular fragment originated from a specific type of amino acid. Figure
11.5.1 is a schematic of the Reside function.

Figure 11.5.1 Diagram of the Reside
algorithm. In the first step, input chemical shift ranges for each resonance in
the fragment are divided into all possible subregions that have widths greater
than an input resolution. Boundaries for these subregions are taken from the
set of all upper and lower bounds for all input chemical shift ranges for each
particular resonance. For each resonance each subregion is then associated with
the set of all residue types whose full chemical shift ranges for that
resonance intersect the subregion. All intersecting subregions with identical
amino acid type sets are combined, and all subregions with sets of amino acid
types whose cardinality equals the number of identifiable amino acids in the
sequence are eliminated. Amino acid type tests are generated by taking all
combinations of subregions between the different resonances (including
combinations in which there are no subregions taken from a resonance). Tests
are eliminated if the cardinality of the intersecting set of the amino acid
sets associated with the subregions represented in the test is greater than a
user-defined value that is generally inversely proportional to the number of
resonance ranges combined to form the test. When all amino acid tests have been
assembled, intersecting tests (tests with intersecting resonance subregions
that are associated with identical sets of amino acid types) are combined by
taking the union of each resonance subregion. At this point if there are amino
acid types in the sequence that are not represented in a user-defined minimum
number of tests, one dimensional tests (tests involving only one resonance
range) that include underrepresented amino acid types are added to the list of
tests, starting with the most discriminating tests possible and continuing
until the user-defined minimum or maximum is reached.
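The first steps of the caption (splitting ranges into subregions, attaching
amino acid sets, merging, and eliminating uninformative subregions) can be
sketched for a single resonance in Python. This is a simplified illustration
and not the Reside implementation: the function name subregions is
hypothetical, only elementary intervals between neighbouring boundaries are
considered, and the chemical shift ranges in the example are made-up numbers
rather than reference values.

```python
def subregions(ranges, resolution):
    """ranges: dict amino acid -> (low, high) in ppm for one resonance.

    Returns a list of ((low, high), frozenset_of_amino_acids) after
    splitting at every range boundary, merging neighbours that carry the
    same amino acid set, and dropping uninformative or too-narrow regions.
    """
    bounds = sorted({b for lo_hi in ranges.values() for b in lo_hi})
    pieces = []
    for lo, hi in zip(bounds, bounds[1:]):
        # Amino acids whose full range overlaps this elementary interval.
        aas = frozenset(aa for aa, (a, b) in ranges.items() if a < hi and b > lo)
        if not aas:
            continue  # gap not covered by any input range
        if pieces and pieces[-1][1] == aas and pieces[-1][0][1] == lo:
            pieces[-1] = ((pieces[-1][0][0], hi), aas)  # merge with neighbour
        else:
            pieces.append(((lo, hi), aas))
    everyone = frozenset(ranges)  # a set naming every amino acid tells us nothing
    return [(iv, aas) for iv, aas in pieces
            if aas != everyone and iv[1] - iv[0] >= resolution]


# Illustrative (not reference) beta-carbon ranges in ppm.
cb_ranges = {"A": (16.0, 22.0), "D": (38.0, 43.0), "L": (40.0, 45.0),
             "S": (62.0, 66.0), "T": (67.0, 72.0)}
for interval, aas in subregions(cb_ranges, 0.5):
    print(interval, sorted(aas))
```

In this example the overlapping D and L ranges are split into three subregions
(D only, D or L, L only), which is what allows later tests to discriminate
between residue types whose full ranges overlap.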
The resonances to be included in the Reside calculation are specified by
fragment description statements of the form:
set frag d3|Cbi,1 = Cbi (n|Cbi,1 > 0) * q|Cbi,1
In the example above, the specified resonance is
found at the third dimension of the first peak of the Cbi buffer (d3|Cbi,1),
and it is defined to be the beta carbon of the ith residue (Cbi). The Boolean expression (in parentheses) contains
any additional conditions to add to the amino acid tests for that resonance, and
the final term (* q|Cbi,1) instructs Reside
to multiply all scores generated by amino acid type tests that include the Cbi
resonance by the quality factor of the peak containing the specified resonance
(q|Cbi,1). The quality factor is a general purpose CONTRAST variable that is
associated with each peak in a buffer. The function used to generate the
quality factor is dependent on the pathway taken by the spin system assembly
algorithm and the desire of the user, but it either represents an estimate of the
ambiguity of the assignment (a) or an estimate of the confidence in the
assignment (1-a). The Contrace
function automatically generates fragment descriptions for all resonances it
assigns, but since the user is able to define quality factors that would not
function well as scaling factors the Contrace
function does not automatically generate instructions for the use of the
quality factor. Scaling factors can be added to the CONTRAST macro after the Contrace function by repeating the
fragment description and including the desired scaling factor. Furthermore,
since the calculation time of Reside
scales exponentially with the number of resonances that the function is
evaluating, the user should limit the resonances tested by the Reside function by setting the resonance
types to NULL for all fragment descriptions that the user wishes to omit. For
example the following command will overwrite the command above with the effect
that Reside will not generate tests
that involve the Cbi resonance.
set frag d3|Cbi,1 = NULL (n|Cbi,1 > 0)
The general syntax for the Reside function follows.
Reside S[,] [>aa.mac] [-res R] [-max1 X1 -max2 X2 ... -maxi Xi] [-mint N] [-pts P]
S The name or number of the source spectrum.
aa.mac The name of the output macro file to be generated.
R The minimum resolution of a test. No chemical shift ranges in the generated
tests should be less than R ppm.
X1 Instructs Reside to filter out 1-resonance amino acid tests that could score
true for more than X1 different amino acids.
X2 Instructs Reside to filter out 2-resonance amino acid tests that could score
true for more than X2 different amino acids.
Xi Instructs Reside to filter out i-resonance amino acid tests that could score
true for more than Xi different amino acids.
N Instructs Reside to continue generating tests until each amino acid in the
sequence is included in at least N tests.
P The number of points before scaling awarded each test generated if the
probability distributions of the amino acid ranges are not used to generate
points.
The following is an example call to the Reside function.
reside 1, >CaCb.mac -res .5 -max1 8 -max2 4 -mint 1 -pts 100
In this example 1 is the number of the source
spectrum and CaCb.mac is the name of the output file to which the amino acid
tests will be written. [-res .5]: The smallest resonance range permitted to be
treated is 0.5 ppm. [-max1 8]: If a test can score true for over 8 well-behaved
(all resonances within the standard ranges input with the Set Shiftr commands) amino acid types and if the test checks only 1
resonance range, then the test will be deleted. [-max2 4]: Likewise all tests
that check 2 resonance ranges and can score true for over 4 well-behaved amino
acid types will also be deleted. [-mint 1]: If there are no amino acid tests
that can possibly score true for a well-behaved representative from a given
amino acid type, then the program will continue generating amino acid tests
(with relaxed standards) until every amino acid type is represented by at least
one test. Finally "-pts 100" means that each amino acid test will
give a maximum score of 100. The absolute value of the points generated by Reside is not critical since the program
scales the points awarded by amino acid tests to a user-defined multiple of the
number of connectivity points generated by overlap tests for the best-connected
fragments in the sequence. This ratio is set using the set so function.
Although -max1 through -maxi flags are optional, it
is usually a good idea to use them to limit the number of tests generated. In
general the maximum number of amino acids that can receive a score for a given
test should in all cases be less than 10. The more resonances that a test
includes, the lower the maximum number of amino acids scoring true for the test
should be. For example if a test only checks a region of the alpha carbon
dimension, the "-max1" flag might be used to delete the test, if it
is possible for over eight types of amino acids to score true for the test and
still have alpha carbon resonances lying in the standard ranges characteristic
for those amino acids. A limit of up to four amino acids (-max2 4) is
appropriate for tests that check two different resonance ranges, say both alpha
and beta carbons. For tests that check still more dimensions, it is appropriate
to set the -maxi flags lower still.
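The effect of the -max flags can be pictured with a short Python sketch. It is
an illustration of the filtering rule as described above, not Reside code: the
function name filter_tests, the representation of a test as a pair, and the
choice to leave test sizes without a limit unfiltered are all assumptions.

```python
def filter_tests(tests, max_by_size):
    """Keep tests whose amino acid set is small enough for their size.

    tests: list of (resonance_names, amino_acid_set) pairs.
    max_by_size: dict mapping number of resonances -> largest allowed set,
                 e.g. {1: 8, 2: 4} for '-max1 8 -max2 4'.
    Sizes without an entry are left unfiltered (an assumption).
    """
    kept = []
    for resonances, aas in tests:
        limit = max_by_size.get(len(resonances))
        if limit is None or len(aas) <= limit:
            kept.append((resonances, aas))
    return kept


# Made-up tests: resonance names and the amino acids that could score true.
tests = [
    (("Ca",), set("ACDEFGHIK")),   # 9 types for 1 resonance: removed by -max1 8
    (("Ca",), set("GS")),          # kept
    (("Ca", "Cb"), set("LEKRQ")),  # 5 types for 2 resonances: removed by -max2 4
    (("Ca", "Cb"), set("ST")),     # kept
]
for resonances, aas in filter_tests(tests, {1: 8, 2: 4}):
    print(resonances, sorted(aas))
```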
It is a good practice to limit the total number of
resonances read into any single Reside
function by including unrelated resonances in separate Reside calls. For example if a fragment contains Ca and Cb
resonances from the previous residue, then these should be treated separately
from the other resonances in the fragment that are from the current amino acid.
All the resonances from the current residue should be excluded by setting
resonance types to NULL using the Set
Frag command, and then the Reside
function should be run for the two remaining resonances. After that the Ca and Cb assignment buffers
corresponding to the previous residue should have resonances set to NULL and
all the other assignment buffers should have resonances set back to appropriate
values so that a second call to the Reside
function can be made for the current residue resonances.
11.6 Reside Probability Scoring
The Set Shiftr
function can be used not only to set resonance ranges, but it can also be used
to read into CONTRAST probability distributions for the resonance ranges. If
probability distributions are read into the program, then the Reside function can be used in a much
more powerful fashion to generate amino acid tests that yield
probabilities instead of point values.
Chapter 12
Once fragments have been assembled (either by hand
or using the Contrace function), they
must be shuffled to match the sequence of the protein. Since CONTRAST fragments
all arise from peaks in the source spectrum, a shuffling of fragments
corresponds to a shuffling of the peaks in the source spectrum so that the peak
from the first residue in the sequence is first in the peak list, the peak from
the second residue is second, and so forth. CONTRAST shuffling functions all
use overlap tests to determine sequential connectivity, and some sequencing
functions use amino acid type tests together with the sequence of the protein
to make mappings of the fragments onto the sequence.
The CONTRAST program includes 12 different shuffling
routines for ordering fragments. These functions are Shuffle, AnnealBF, Anneal,
AnnealQ, AnnealBFQ, AnnealLQ, AnnealBQ, AnnealAQ, Anneal3Q, ShufQ, AlignQ, and
ShufSeq. Of these functions 3 are not recommended (ShufQ, AlignQ, and ShufSeq)
and only 2 (Shuffle and AnnealBF) are described in this section. The remaining
functions are variations on the AnnealBF function. They are all called in the
same way and they have the same function as AnnealBF, but they use different
techniques and thus can yield different results. Variations in the assignments
generated by the different shuffling techniques are important indicators of the
quality of the assignments. One can have more confidence in regions of the
assignments that are independent of the technique used to make them, but one
must be suspicious of those regions that vary with the technique if the scores for
those assignments are comparable.
The call to the Shuffle function has the following
syntax.
shuffle [S,] [-d,-s] ["N"]
S The name or number of the source spectrum.
-d Flag to make shuffle compare the deviations between the top two scores
-s Flag to make shuffle compare the highest score plus the deviations
N The deviation percentage below which deviations are used for comparisons.
N=100 means that deviations are always used while N=0 means that raw scores are
always used.
Shuffle is almost always called with the default
parameters as in the following example so the other variations will not be
discussed.
shuffle 1
In this example shuffle takes all of the overlap
tests that have been read into CONTRAST with the Set Ovl function and uses a "best first" method to
arrange the fragments so that fragments with the best connectivities to one
another are adjacent. The Shuffle function does not use amino acid tests or the
sequence of the protein to map the fragments onto the sequence. Instead it
forms the fragments into an unbroken circular chain. The user must determine from
the scores of the connections between the links where the chain should be
broken.
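One way to picture a "best first" arrangement into a circular chain is the
following Python sketch. It is not the Shuffle algorithm itself, whose details
are not given here: the function name best_first_chain is hypothetical, the
greedy linking rule is an assumption, and rotating the result so that the
weakest link closes the circle is only one possible way of suggesting where the
user might break the chain.

```python
def best_first_chain(score):
    """Order fragments greedily from a matrix score[i][j] (i followed by j).

    Returns the fragment order of one circular chain, rotated so that the
    weakest link falls between the last and the first fragment.
    Assumes at least one fragment.
    """
    n = len(score)
    links = sorted(((score[i][j], i, j)
                    for i in range(n) for j in range(n) if i != j),
                   reverse=True)
    succ = {}                 # accepted link i -> j
    has_pred = set()          # fragments that already follow another fragment
    chain = list(range(n))    # chain label for every fragment
    for s, i, j in links:
        if i in succ or j in has_pred or chain[i] == chain[j]:
            continue          # would fork a chain or close it too early
        succ[i] = j
        has_pred.add(j)
        old, new = chain[j], chain[i]
        chain = [new if c == old else c for c in chain]
        if len(succ) == n - 1:
            break
    # Walk the single open chain from its head.
    start = next(i for i in range(n) if i not in has_pred)
    order = [start]
    while order[-1] in succ:
        order.append(succ[order[-1]])
    # Close the circle, find its weakest link, and break the chain there.
    link_scores = [score[order[k]][order[(k + 1) % n]] for k in range(n)]
    weakest = link_scores.index(min(link_scores))
    return order[weakest + 1:] + order[:weakest + 1]


# Made-up adjacency scores for four fragments; the strong links are 2->0->3->1.
score = [[0, 10, 10, 80],
         [10, 0, 5, 10],
         [90, 10, 0, 10],
         [10, 85, 10, 0]]
print(best_first_chain(score))
```

In this toy matrix the closing link from fragment 1 back to fragment 2 scores
only 5 points, so it is the natural place to break the circle.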
12.2 The AnnealBF Function
The AnnealBF function employs a hybrid between a
"best first" and a simulated annealing algorithm to seek an optimum
mapping of the fragments to the protein sequence. The function uses overlap
tests and amino acid tests together with the sequence of the protein to make
sequential assignments. Rather than searching for a minimum "energy"
like traditional simulated annealing algorithms, this algorithm searches for a
global maximum score which is a scaled combination of all of the connectivity
and amino acid scores plus bonuses such as comment label bonuses (see Section
@). The command line call to the AnnealBF function has the following syntax.
annbf S[,] ["Temp[, Tfactor[, MaxPerTemp[, MinPerTemp[, LoTemp]]]]"] [-F] [-O] [-x N]
S The name or number of the source spectrum.
Temp The percentage of the highest possible temperature change that will be
used to determine the starting temperature.
Tfactor The percentage by which the temperature gets reduced at each annealing
round.
MaxPerTemp The number of attempted moves divided by the number of source peaks
per temperature level before the temperature is lowered.
MinPerTemp The number of successful moves at a temperature level that will
allow for an early exit from that level.
LoTemp The absolute temperature that the algorithm must reach before exiting.
F A flag that can have the values of either 's', 'm', or 'u' that determines
how well connectivity scores are scaled to fall between 0 and 100.
's' Rigorously scaled connectivity scores. (Requires more calculation time.)
'm' Moderate level of scaling.
'u' Unscaled connectivity scoring.
O An optional flag that can have a value of either 'b' or 'l' and that
determines the type of overlap scoring used by the AnnealBF algorithm.
-b [DEFAULT] Nonlinear overlap scoring (bonus awarded for the best overlap
scores).
-l Linear overlap scoring (no bonuses awarded).
N Value that follows the -x flag that indicates the number of extra cycles that
the algorithm will go through at the end of the simulated annealing cycle.
(Default: End with the simulated annealing cycle.) Suggested: -x1
Note that if an optional quoted parameter is to be
specified, then all of the parameters before it must also be specified, since
the parameters are defined by their positions in the parameter sequence. This
function and the other Anneal functions are the only functions that use such a
parameter list. The following is an example of a call to the AnnealBF function.
annbf 1, "50, 2, 100, 10, .1" -u
In this example all of the default values for
optional parameters are indicated. To change only the Temp the function can be
called:
annbf 1, "40" -u
but to change the MinPerTemp value the function must
be called:
annbf 1, "50, 2, 100, 8" -u
or
annbf 1, "0, 0, 0, 8" -u
where parameters given zero values are automatically
set to their default values. The default values are the same for all of the
other Anneal functions, but the AnnealBF function is the only function that
takes the -b/-l and the -x N flags.
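The positional rules for the quoted parameter string can be summarized in a
short Python sketch. It only restates the behaviour described above (missing
trailing values and zero values fall back to the defaults shown in the first
example); the function name parse_anneal_params and the dictionary form of the
result are hypothetical and are not part of CONTRAST.

```python
# Default values as given in the example call: "50, 2, 100, 10, .1"
DEFAULTS = {"Temp": 50.0, "Tfactor": 2.0, "MaxPerTemp": 100.0,
            "MinPerTemp": 10.0, "LoTemp": 0.1}


def parse_anneal_params(quoted):
    """Resolve a positional parameter string against the documented defaults."""
    names = list(DEFAULTS)  # positional order of the parameters
    values = [v.strip() for v in quoted.split(",")] if quoted.strip() else []
    if len(values) > len(names):
        raise ValueError("too many parameters")
    params = dict(DEFAULTS)
    for name, text in zip(names, values):
        number = float(text)
        if number != 0:          # a zero keeps the default value
            params[name] = number
    return params


print(parse_anneal_params("40"))
print(parse_anneal_params("0, 0, 0, 8"))
```

The first call changes only Temp, and the second changes only MinPerTemp, which
matches the two ways of calling annbf shown above.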
The Temp parameter can be set to achieve different
effects. A Temp parameter of over 100% assures that the combinatorial
optimization will be done from a completely random starting point, while Temp
parameters of less than 5% can be used to make minor refinements in a
sequential ordering. Setting the Temp parameter to a very small value (e.g. .0001)
causes the function to behave like a conjugate gradient maximizer so that it
can only find local maxima.
Chapter 13
Output Files
CONTRAST output files are for the most part
low-level representations of the internal states of the buffers in the program.
Very little effort has been made to simplify or interpret the data in the
program for two reasons: 1) To do so would hide information from the user, 2)
To do so might lead the user to have a false confidence in the assignments.
CONTRAST output files force the user to analyze the data in order to extract
assignments. It is hoped that this analysis will bring to light the ambiguous
or incorrect assignments that are almost always a part of any assignment
process. CONTRAST should be used in an iterative and interactive process that
involves the user's judgment as much as possible between assignment rounds.
The next several sections describe CONTRAST output
options. These options should not be viewed as mutually exclusive. Several of
these options should be used at each round of assignments.
The display function is used to enter display mode
for interactive viewing of internal buffer information. Display mode is entered
by typing 'd' followed by RETURN at the CONTRAST command line. Once in the
display mode any number of display commands can be executed interactively. Most
of these commands are one character commands which do not require a RETURN to
be entered after the character is typed. Display commands can be launched
non-interactively from the CONTRAST command line by typing "d "
followed by the capitalized display command character. This will execute the
display command and automatically return to the CONTRAST command line.
Display Mode Commands:
0 Activates all buffers so that all buffers will be
affected by commands.
Num
Activates the buffer number Num so
that that buffer will be affected by commands.
a Start at first buffer and show buffers with column
widths and separations determined automatically.
b Also HOME key on most systems. Display beginning
of buffer.
c Set the buffer columns to be displayed.
d Also DOWN key on most systems. Move active
buffer(s) one row down.
e Edit indicated buffer fields.
f Enter field format.
gX Go to buffer number X.
h Help. View partial list of display commands.
i Toggle on/off information string for active
buffers.
l Also LEFT key on most systems. Shifts display
window one buffer to the left.
m Displays buffers using current buffer marker
position.
n Toggle on/off buffer name for active buffers.
o Also PAGEDOWN key on most systems. Move active
buffer(s) one page down.
p Also PAGEUP key on most systems. Move active
buffer(s) one page up.
q Quit display mode.
r Also RIGHT key on most systems. Shift display
window right by one buffer.
s Select buffers for display. (A suboption of the
'c' command.)
t Toggle on/off titles of fields in active buffers.
u Also UP key on most systems. Move active buffer(s)
one row up.
vc Video Columns. Set the number of columns on video
screen.
vr Video Rows. Set the number of rows on video
screen.
wb Write Buffer. Write buffer to an ASCII file.
ws Write Spectrum. Write spectrum to an ASCII file.
x Don't change current settings. (An escape from the
'c' command.)
z Start at last buffer and show buffers with column
widths and separations determined automatically.
-X Shift display window left by X buffers.
+X Shift display window right by X buffers. (Same as
=X).
=X Shift display window right by X buffers. (Same as
+X).
The DisplayToFile (dtf) command prints the contents
of all the buffers to a file in a format similar to that of the interactive
Display command. The following is the syntax of the dtf command.
dtf [>]Fname [-w||-a] [-v||-h] ["Header"]
Fname The name of the file to which the buffer
information is written.
-w Overwrite Flag. Causes the file to be overwritten
if it already exists. [default]
-a Append Flag. Causes the output to be appended to
the file if it already exists.
-v Vertical Flag. Causes the buffers to be written
vertically (on sequential lines) in the file. [default]
-h Horizontal Flag. Causes the buffers to be written
horizontally (across lines) in the file.
[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[ V 138
]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]
FRAGMENT 117: < 8.78122 124.384 173.658 1 >
Comment: hnco117
NEXT FRAGMENT: 105 < 8.64135 116.992 174.554 1
> Comment: hnco105
Top Scoring Fragments: (Choice = 1)
>NEXT = 105 REPEATS = 3 SCORE = 16.5191 [ambig =
22.34]
NEXT = 98 REPEATS = 3 SCORE = 13.2406
NEXT = 99 REPEATS = 3 SCORE = 12.8427
NEXT = 2 REPEATS = 2 SCORE = 12.3592
NEXT = 33 REPEATS = 2 SCORE = 7.974
Buffers:
hnco: (Buffer #697) Search: d1 8.7812
.05 and d2 124.3840 .25
# spectr dev rep ire < Hn N CO ntens
> comment
1:hnco 2.40 1 1 < 8.78 124.38 173.66
1 > hnco 117
hnca: (Buffer #698) Search: d1 8.7812 .05 and d2
124.3840 .25
# spectr dev rep ire < Hn N Ca ntens > comment
1:hnca 1.89 1 1 < 8.77 124.46 62.09 1 > hnca
223
hncoca: (Buffer #701) Search: d1 8.7812 .05 and d2
124.3840 .25
# spectr dev rep ire < Hn N <Ca ntens >
comment
1:hncoca 1.42 1 1 < 8.79 124.37 62.54 1 >
hncoca 12
2:hncoca 0.44 1 1 < 8.82 124.48 51.52 1 >
hncoca 200
tocsy: (Buffer #702) Search: d1 8.7812 .05 and d2
124.3840 .25
# spectr dev rep ire < Hn N Ha ntens > comment
1:tocsy 2.24 1 1 < 8.79 124.39 4.35 1 > tocsy
223
hcaco: (Buffer #700) Search: d1 4.3473 .05 and d2
62.0887 .25
# spectr dev rep ire < Ha Ca CO ntens >
comment
1:hcaco 0.94 1 1 < 4.36 62.13 175.44 1 > hcaco
144
2:hcaco 0.64 1 1 < 4.33 62.20 174.51 1 > hcaco
19
hcan: (Buffer #699) Search: d1 4.3473 .05 and d3
62.0887 .25
# spectr dev rep ire < Ha N Ca ntens > comment
1:hcan 0.68 1 1 < 4.31 116.55 62.09 1 > hcan
178
2:hcan 0.62 1 1 < 4.36 110.81 62.24 1 > hcan
59
The ShuffleToSpectrum (sts) command is used to
rearrange the source spectrum in the new "shuffled" order determined
by sequencing functions such as Shuffle or AnnBF. The current source spectrum
is copied in the new shuffled order to a new spectrum. This new spectrum can
then be written to a spectrum file with the WriteSpectrum
(ws) command. The ShuffleToSpectrum /
WriteSpectrum sequence is useful for
doing successive rounds of assignments, since the starting order of fragments
is determined by the order of the peaks in the source spectrum.
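The reordering step can be pictured with a short sketch. It assumes the shuffled order is available as a list of zero-based peak indices; the function name and the data layout are hypothetical and are not CONTRAST internals.

```python
def shuffle_to_spectrum(source_peaks, shuffled_order):
    """Return a new peak list with the source peaks copied in shuffled order.

    shuffled_order is assumed to hold every index of source_peaks exactly
    once; the source list itself is left unchanged.
    """
    if sorted(shuffled_order) != list(range(len(source_peaks))):
        raise ValueError("shuffled_order must be a permutation of the peak indices")
    return [source_peaks[i] for i in shuffled_order]
```

Because the copy becomes a new spectrum, writing it out and loading it as the source for the next round makes the previous ordering the new starting point.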
The syntax of the ShuffleToSpectrum (sts) command
follows.
sts S[,] [[>]Fname]
S The name or number of the source spectrum.
Fname The file name to which the shuffled source
spectrum is written.
13.5 WriteSpectrum (ws)
The WriteSpectrum
command writes a spectrum in memory to a CONTRAST spectrum file. The function
reads all of the information in memory and combines that information with any
initial header comments from the original spectrum's file in order to create a
complete spectrum file.
If a file name is not given, then a name will be
created by incrementing the numeric part of the original file's suffix (e.g.
hnco.con is converted to hnco.con2 and hnca.con3 is converted to hnca.con4).
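The default naming rule can be sketched as follows. Only the two cases given above are documented; treating a missing number as 1 (so that the next name ends in 2) is inferred from the hnco.con example, and the function name is hypothetical.

```python
import re


def next_spectrum_filename(name):
    """Increment the trailing number of a spectrum file name.

    A name with no trailing digits is treated as version 1, so the next
    name ends in 2 (hnco.con -> hnco.con2).
    """
    match = re.fullmatch(r"(.*?)(\d*)", name)
    stem, digits = match.group(1), match.group(2)
    next_number = int(digits) + 1 if digits else 2
    return f"{stem}{next_number}"
```

For example, next_spectrum_filename("hnca.con3") gives "hnca.con4".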
Syntax:
ws S [[>]Fname]
S The name or number of the spectrum to be written
to a file.
Fname The name of the new spectrum file to be
written.
Appendix A: GLOSSARY
List Generic term referring to either spectra or
buffers.
set of actions. The search can be performed on any
attribute of the peak (eg. dimension 1 > 4.5, intensity < 100, comment
> 5, or score > 29). The extra attributes associated with peaks in
buffers (eg. score, deviation value, etc.) can also be used in the search.
Other peaks or groups of peaks can also be used in the search. A Boolean
expression is used to define the parameters for a search. Boolean search
expressions should always be enclosed in parentheses on the command line. A '!'
symbol preceding such a parenthetic Boolean indicates that the complement of
the set should be used instead of the peaks found in the normal search.
Example: !(d1 < 4.5)
Target The targets of a search are the delimiters
that are used in the search. In the search (d1 < 4.5) the first dimensions
(d1) of the peaks in the list being searched make the Boolean true if they are
less in value than the target value of 4.5. If the peaks of a list are used as
target values to search another list, the targets are often expressed with the
dimension number following a '%' symbol. For example, (%1 > d2) indicates
that the second dimension (d2) of a peak in the list being searched must be
smaller than the first dimension of a peak in the target list (%1).
Match If a peak or list of peaks is used to generate
target values, a peak that is found in a successful search is said to match the
peak used to generate the target value(s).
Tolerance Ranges of values centered at a target
value are usually used in searches to do assignments due to the limited
precision of NMR data. A tolerance is one-half of the width of this range. In
the Boolean search (6.5 <.02> d1) a "tolerance operator" is
specified in order to make the Boolean true if and only if d1 is between
6.5 - .02 and 6.5 + .02. The opposite case (6.5 >.02< d1) is true when d1 is
greater than 6.52 or less than 6.48.
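The two tolerance operators can be expressed as a pair of small functions. This is an illustrative sketch only: the function names are hypothetical, and treating the edges of the range as inside it is an assumption that this guide does not confirm.

```python
def within_tolerance(target, tol, value):
    """Model of (target <tol> value): true when value lies within
    target - tol and target + tol (edges assumed to be included)."""
    return abs(value - target) <= tol


def outside_tolerance(target, tol, value):
    """Model of (target >tol< value): the opposite of within_tolerance."""
    return not within_tolerance(target, tol, value)
```

With a target of 6.5 and a tolerance of .02, a d1 value of 6.51 is within tolerance, while 6.53 and 6.47 are outside it.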
Flags Alphanumeric characters used to modify the
performance of a given function. Flags are immediately preceded by hyphens on
the command line. In the following example the '-s' flag causes the Scan
function to compare matching peaks using "scaled scores" and the '-b' flag
causes the function to return only the "best" scoring match.
Example: scan 1, 1 2 (d1 < 3.0) -s-b |fred
BUFFER NAMES: The HNCA peaks for which
the Boolean expression is true are added to a buffer which is given the default
name, "|hnca," taken from the name of the spectrum being searched. A
different name could be specified by adding it to the Scan expression AFTER the
Boolean expression. NOTE: buffer names
are always designated with a "|" symbol! The following example saves
the matching peaks found by the search to a buffer named "|fred":
scan hnco hnca (%2 <.2> d2 || %1 <.02> d1) |fred
If the Boolean of the above example did not contain
"%"s to indicate a search target, then the roles of the two listed
spectra would be interpreted differently by the search function. In the
examples:
scan hnco hnca (8.5 <.02> d1 || 114 <.2> d2)
scan hnco hnca (114 <.2> d2 || 8.5 <.02> d1)
Booleans
Resonance
Fragment
Comment
Coordinate
Dimension
Target
Deviation
Spectrum
Buffer
Assignment Buffer
Working Buffer
Primary Assignment
Sequential Assignment
Amino Acid Type Assignment
BASIC SEARCHES: Most functions that involve searches
use a standard Boolean format. The format is necessarily rigid since there are
so many options in performing searches in automated assignment work. The format
of a basic search is demonstrated by the following equivalent Scan commands:
scan hnca (d1 > 7.2 && d2 <.02> 114.2)
scan hnca (7.2 < d1 && d2 <.02> 114.2)
In these examples, the HNCA spectrum is searched for
peaks whose first dimension (d1) is greater than 7.2 and (&&) whose second
dimension (d2) is within a tolerance of +/- .02 of 114.2.
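The same basic search can be written as a filter over a peak list. This is only a sketch of the logic: the Peak type, the function name, and the inclusive tolerance edges are assumptions made for illustration and do not describe how CONTRAST stores or searches peaks.

```python
from typing import NamedTuple


class Peak(NamedTuple):
    d1: float  # first dimension
    d2: float  # second dimension
    d3: float  # third dimension


def basic_search(peaks, d1_min=7.2, d2_target=114.2, d2_tol=0.02):
    """Model of (d1 > 7.2 && d2 <.02> 114.2): keep peaks that pass both tests."""
    return [
        p for p in peaks
        if p.d1 > d1_min and abs(p.d2 - d2_target) <= d2_tol
    ]
```

A peak at (8.78, 114.21, 62.1) passes both tests, while a peak at (7.0, 114.2, 55.0) fails the d1 test and a peak at (8.5, 116.9, 60.0) fails the d2 test.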
VARIABLE TARGETS: In the following equivalent
examples:
scan hnco hnca (%1 <.02> d1 || %2 <.2> d2)
scan hnco hnca (%2 <.2> d2 || %1 <.02> d1)
peaks in the HNCO spectrum are used to generate
targets for searches of the HNCA spectrum. %1 and %2 refer to dimensions one
and two respectively of the HNCO spectrum, and d1 and d2 refer to dimensions
one and two of the spectrum which is being searched (hnca). Targets are taken
from each peak of the HNCO spectrum, and each peak of the hnca spectrum is
searched using each set of targets.
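A variable-target search amounts to a nested loop: the outer loop takes targets from each source peak and the inner loop tests every peak of the searched list. The sketch below models (%1 <.02> d1 || %2 <.2> d2) with peaks represented as plain tuples; the function name and the tuple layout are hypothetical.

```python
def scan_with_targets(source_peaks, searched_peaks, tol1=0.02, tol2=0.2):
    """Return (source, match) pairs for which either dimension agrees.

    source_peaks supplies the targets %1 and %2; searched_peaks supplies
    d1 and d2. Each peak is a tuple whose first two items are dimensions
    one and two.
    """
    matches = []
    for src in source_peaks:
        for peak in searched_peaks:
            if abs(src[0] - peak[0]) <= tol1 or abs(src[1] - peak[1]) <= tol2:
                matches.append((src, peak))
    return matches
```

Using the HNCO peak (8.78, 124.38, 173.66) as the source, the HNCA peak (8.77, 124.46, 62.09) matches on both dimensions, while a peak at (7.10, 110.0, 55.0) matches on neither.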
When the Boolean contains no "%" targets, as in
scan hnco hnca (8.5 <.02> d1 || 114 <.2> d2)
two separate
searches are performed. The HNCO spectrum is searched and matching peaks are
added to a buffer named "|hnco", and the HNCA spectrum is then
searched and the matching peaks are added to a buffer named "|hnca".
In the example:
scan hnco hnca (8.5 <.02> d1 || 114 <.2> d2) |fred
matching peaks from both of the searches are added
to a single buffer named "|fred". Finally, in the example:
scan hnco hnca hncoca (8.5 <.02> d1 || 114 <.2> d2) |co |ca
matching peaks from HNCO are added to the |co
buffer, matching peaks from HNCA are added to the |ca buffer, and because there
are no more buffer names listed, matching HNCOCA peaks are added to the |ca
buffer (the last buffer listed).
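The rule for pairing searched spectra with buffer names can be summarized in a few lines. This is a sketch of the behavior described above, not CONTRAST code; the function name is hypothetical.

```python
def assign_buffer_names(spectra, buffer_names):
    """Map each searched spectrum to the buffer that receives its matches.

    With no buffer names, each spectrum gets the default "|" + spectrum
    name. Otherwise names are used in order, and the last name listed is
    reused for any remaining spectra.
    """
    if not buffer_names:
        return {spectrum: "|" + spectrum for spectrum in spectra}
    last = len(buffer_names) - 1
    return {
        spectrum: buffer_names[min(index, last)]
        for index, spectrum in enumerate(spectra)
    }
```

For the three-spectrum example above, assign_buffer_names(["hnco", "hnca", "hncoca"], ["|co", "|ca"]) sends hnco to |co and both hnca and hncoca to |ca.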
SPECIFYING LISTS: Both buffers and spectra can be
searched using search routines. Buffers are referred to as either the buffer
name or number following the "|" symbol, and spectra are referred to
in the same fashion but without the "|" symbol. For example:
scan hnco |hnca hncoca (8.5 <.02> d1 || 114 <.2> d2) |co |ca
searches a spectrum, a buffer, and then a spectrum.
scan 1 |1 2 (8.5 <.02> d1 || 114 <.2> d2) |fred
searches a spectrum, a buffer, and then a spectrum.
scan 1 |hnco hnca (8.5 <.02> d1 || 114 <.2> d2)
searches a spectrum, a buffer, and then a spectrum.
PARENTHESES AND SPACES: All Boolean expressions
must be enclosed by parentheses and followed by at least one space. Parentheses
can be nested to any level. Spaces are used within Boolean expressions only as
delimiters -- otherwise they are ignored.
ARGUMENTS (VALUES): The arguments compared in
Boolean expressions may take many different forms. The following examples give
an idea of the argument syntax used in Boolean expressions. There are many
different attributes of peaks in lists and all of them may be accessed to some
extent in Boolean expressions. The values can be a part of mathematical
expressions of any complexity just as long as each expression contains fewer
than two variables to be stepped through by the Boolean. A list of functions
recognized by Booleans follows this section.
&a = the value of the variable 'a'.
23.1 = a number.
e = 2.7182818
PI = 3.1415927
#|fred = the number of peaks in buffer fred.
w|fred = the level of the buffer fred.
m|fred = the number of dimensions in the indicated
peak.
d1|fred,4 = the first coordinate of the 4th peak in
fred (or p1).
dx|fred,4 = true if any of the coordinates of fred matches
the test.
da|fred,4 = if 2 peaks are being compared, then all
dimensions must match.
dc|fred,4 = if 2 peaks are being compared, then
combinations of at least the minimum number of dimensions between the peaks
must match.
d1 = the first dimension of either the buffer or
spectrum to be searched.
%1 = the coordinate of the first dimension of the
source spectrum.
d1|fred = the d1 value of all combinations of peaks
in fred.
d1|fred,f4 = the d1 value of the first four peaks in
fred.
i|fred,b = the intensity of the first peak (or d0 or
p0) in fred.
c|fred,e = the numeric part of the comment from the
last peak in fred.
v|fred,h = the value of the highest valued peak in
fred.
g|fred,l = the lowest grade in buffer fred.
d|fred,1 = the deviation of the first peak in buffer
fred.
n|fred,1 = the number of internal repeats of the
first peak in buffer fred.
t|fred,1 = the tolerance of the value of the first
peak in buffer fred.
r|fred,1 = the number of repeats for the first peak
in buffer fred.
NOTE: The "|" (buffer) signs above can be
replaced with "$" (spectrum) signs to specify spectra instead of
buffers. For example:
i$fred,b = the intensity of the first peak in the
spectrum fred.
d1$3,h = the value of the highest first dimension in
the third spectrum.
#$fred = the number of peaks in the spectrum fred.
d1$hnca,4 = the d1 value of the fourth peak in the
spectrum hnca.
d1$3,f4 = the d1 value of the first four peaks in
the third spectrum.
EXCEPTION: The "w" field corresponds to
the column width when used with spectra, but when used with buffers, it refers
to the buffer level.
w$fred = the column width of the spectrum fred.
FUNCTIONS: The following is a list of the most
common functions that can be used in Boolean arguments.
* Multiplication. ( w|fred * 2.5)
/ Division. ( d1 / 2 )
+ Addition. ( 23.1 + 413.23 + PI )
- Subtraction. ( d1$fred,4 -8 )
^ To the power of. ( 4^(2^2) )
% Modulus. ( 5 % 2 )
cos Cosine (in degrees). ( cos(90) )
sin Sine (in degrees). ( sin(90) )
tan Tangent (in degrees). ( tan(90) )
log Log base ten. ( log(1) )
ln Natural log (base e). ( ln(.5) )
OPERATORS AND CONJUNCTIONS: The following is an
exhaustive list of legal operators and conjunctions in Boolean statements.
> Greater Than (combinatorial)
>= Greater Than or Equal To (combinatorial)
< Less Than (combinatorial)
<= Less Than or Equal To (combinatorial)
= Equals (combinatorial)
!= Not Equals (combinatorial)
<tol> Within a Tolerance (tol) of
(combinatorial)
>tol< Outside a Tolerance (tol) of
(combinatorial)
& And (combinatorial)
| Or (combinatorial)
>> Greater Than (synchronous)
>>= Greater Than or Equal To (synchronous)
<< Less Than (synchronous)
<<= Less Than or Equal To (synchronous)
== Equals (synchronous)
!!= Not Equals (synchronous)
<<tol>> Within a Tolerance (tol) of
(synchronous)
>>tol<< Outside a Tolerance (tol) of
(synchronous)
&& And (synchronous)
|| Or (synchronous)
!(Boolean) The complement of the set of matches
found with Boolean.
SYNCHRONOUS vs. COMBINATORIAL OPERATORS AND
CONJUNCTIONS: Synchronous operators and conjunctions (doubled symbols, e.g.
">>") force the arguments they act on to come from the same
peak or from comparable peaks in the specified lists. For example, in the
Boolean
spec1 spec2 (%1 >> d1)
the first peak in spec1 is compared to the first
peak in spec2, the second peak in spec1 is compared to the second peak in
spec2, and so on until the end of either spec1 or spec2 is reached. On the
other hand, combinatorial operators and conjunctions (single symbols, e.g.
">") force all combinatorial possibilities of peaks between the
two arguments to be considered whenever the two arguments are from different
lists. For example, in the Boolean
spec1 spec2 (%1 > d1)
the first peak in spec1 is compared to the first
peak in spec2, the first peak in spec1 is compared to the second peak in spec2,
and so on until the first peak in spec1 has been compared to all of the peaks
in spec2, and then all subsequent peaks in spec1 are likewise compared to each
peak in spec2. If d1 and %1 had referred to the same list, then a synchronous
comparison would have been performed. For synchronous operators, symmetry is
forced (where allowed by the identical list rule) on the two joined expressions,
as if the two expressions were symmetric.
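The difference between the two pairing modes corresponds to pairing lists element by element versus taking every combination. In the sketch below each spectrum is reduced to a list of first-dimension values; the function names are hypothetical.

```python
from itertools import product


def synchronous_pairs(spec1, spec2):
    """Pair peak 1 with peak 1, peak 2 with peak 2, and so on, stopping
    when either list ends (the behavior described for ">>")."""
    return list(zip(spec1, spec2))


def combinatorial_pairs(spec1, spec2):
    """Pair every peak of spec1 with every peak of spec2 (the behavior
    described for ">")."""
    return list(product(spec1, spec2))


def greater_than(pairs):
    """Keep the pairs in which the spec1 value (%1) exceeds the spec2 value (d1)."""
    return [(target, value) for target, value in pairs if target > value]
```

For spec1 = [8.0, 7.0] and spec2 = [7.5, 7.2, 6.0], the synchronous form tests two pairs and the combinatorial form tests six.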
RULES
1: If any two variable arguments in a Boolean
expression come from the same structure, whether buffer, spectrum or peak, then
they will be synchronized so that the same element will be used for each
argument as each element in the argument is stepped through.
2: Synchronized operators (eg. >> << ==
&& ||) force symmetry between operands unless overridden by rule number
1. The following example connects elements which are in sync.
___________________ ___
| ________|__________ ___|-due to &&
operator
|__________| | | ___
| | | | `-due to <<>> operator
(d1 <<.02>> %1 && d2|fred >
d2|barney)
prompts: In CONTRAST the user enters commands and
data at prompts. Prompts from the main menu of commands use the '>' character.
At these prompts and similar prompts within other command menus, simply enter
the command and press RETURN. Other prompts will suggest a default value in
angle brackets (e.g. <1.23>: ), which can be accepted unaltered by pressing
RETURN; another value can be entered by typing the new value and pressing
RETURN. The escape key followed by one of the following characters has the
following effect at any type of prompt.
OPTIONS:
ESC-ESC = Escapes out of loops or routines.
ESC-Q = Drops out of the current routine or shell.
ESC-E = Edit the current value being entered. (see
editing:)
ESC-D = Edits the Default value for the prompt. (see
editing:)
ESC-S = Shells to main command menu.
ESC-Z = Shells to operating system.
ESC-LS = Displays the file names in the current
directory.
ESC-O = Displays these prompt options.
ESC-H = Context sensitive help. (see page:)
<-- = Edit current (or default if no characters
entered yet) string.
DEL = Delete current string and start over.
BKSPC = Delete last character of current string.
Appendix B: CONTRAST COMMANDS
This section contains an alphabetical list of
CONTRAST commands and includes command syntaxes, examples, and known bugs. The
following is a sample entry for a command.
CommandAbbreviation (Command Name)
DESCRIPTION
A description of the use of the CONTRAST command.
EXAMPLES
A list of examples of the use of the CONTRAST command
and a description of the effect of each example.
SYNTAX
cmnd >value [optional] [opt1 | opt2 | opt3] [-flag1] [-flag2] last
cmnd A valid abbreviation for the command.
[] Optional parameters.
| Or. The use of one parameter excludes the use of
the other.
-flag Each flag represents a different command
option. A hyphen must precede each individual flag.
italics Text is italicized to show that it should not be taken literally but
should be replaced with the text or values that the italicized text
describes. E.g. >filename would be
replaced with the name of a file.
-> In the syntax section as well as in actual
CONTRAST commands, this symbol indicates that a line has been broken and is
continued on the next line.
line Underlined parameters
must be included at the indicated positions on the command line. (Although the
order of parameters is unimportant in many CONTRAST commands, it is a good
practice to write command parameters in the order that they are listed in the
syntax statement.) All whitespace characters such as spaces and tabs are ignored
by CONTRAST.
CAVEATS
RELATED COMMANDS
BUGS
help:, "h", "?"
"al" <align> Calculates rough
alignment corrections between spectra.
"ala" <auto lock all> Locks
fragments together based on input criteria.
"alt" <alter> Operates on columns in
a spectrum (-l to list, -f to file).
"ann" <anneal> Uses simulated
annealing to order fragments.
"ap" <autparamset> Allows user to
set the parameters used for AUTO.
"aut" <auto> Does automated
connectivity tracing (CONTRAST).
"beep" <beep> Produces audible beep.
Useful for alerting user to macro end.
"bob" <buffer overlap buffer> Prints
the degree of overlap between buffers.
"btf" <buffer to file> Prints
specified buffer contents to a file.
"bye" <bye> Also "q"
<quit>. Exits program or subprogram.
"cbl" <create buffer link> Allocates
buffer links for each peak in a spec.
"clr" <clrb> Clears indicated buffer
(default: clears ALL buffers).
"cls" <cls> Clears the screen.
"cob" <children overlap buffer>
"com" <compress> Compresses, sorts
and tabulates repeats for buffers.
"conv" <convert> Converts peak list
files to CONTRAST format.
"cyc" <cycle> Loops through a series
of commands (-b to begin, -c to clear).
"csa" <combined search all>
"cs" <combined search>
"d" <display> Interactive display of
one or more buffers in columns.
"dir" <directory> Displays current
directory.
"df" <doubletfilter> Removes the
doublets from a spectrum.
"ed" <edit> Edits the following command
or last command by default.
"ev" <evaluate> Evaluates a numeric
expression.
"exe" <execute> Executes a macro
file.
"fill" <fill> Fills in missing peaks
in one spectrum with peaks from another.
"fit" <fit> Calculates least squares
fit to a straight line in a matched file.
"fit0" <fit0> Calculates zero order
fit in a matched file.
"fd" <full display> Displays the specified
contents of a buffer.
"fdf" <full display to file> Writes
specified contents of a buffer to a file.
"fs" <fscan> Fast scan for one
target using rapid search from HASH table.
"h","?" <help> Pages to
this menu or to specified command.
"hash" <hash> Creates HASH table
from active spectra for use by FSCAN.
"int" <intersect> Takes an
intersection of specified buffers.
"inta" <intersect all> Takes an
intersection for each fragment.
"key" <key binding> Sets the return
values of keystrokes.
"lf" <load file> Loads a spectrum in
CONTRAST format.
"load" <load> Loads spectra in
CONTRAST format using log file.
"lock" <lock> Locks the keyboard
until the argument of lock is typed.
"mat" <match> Matches two spectra
(-l to list, -f to file).
"mal" <matchalign> Does brute force
fine tuning alignment of 2 spectra.
"mm" <main menu> Calls another shell
with the main menu.
"op" <operate> Operates on specified
dimensions of a spectrum.
"opf" <operate file> Operates on
dimensions of spectra and saves to a file.
"ord" <order buffer> Orders the
buffer(s) by the specified field.
"ordc" <order counter> Counts the
number of sequential comments in a spectrum.
"pa" <page> Pages through a file.
Allows for searches. Used by HELP.
"pr" <prompt> Prints prompt (for use
within CYCLE or EXE).
"prn" <print> Prints out the global
variable or variable list.
"pru" <prunebuffer> Filters multiple
occurrences of peaks in different buffers.
"ran" <random> Generates random
numbers.
"rec" <recycle> CYCLE command in
which edited commands are updated.
"rl" <readlog> Reads global
parameters from a log file.
"sa" <scan all> Scans spectra using
search string from each peak of spectra.
"saf" <saveasfile> Saves a buffer as
a peak list file in CONTRAST format.
"sbob" <single dimension buffer overlap
buffer> sbob |buf1,d1 |b2,d3 -s-b .1
"sa" <scan all> Searches other
spectra for each peak of a source spectrum.
"sc" <scan> Searches spectra using
search strings.
"sco" <score> Calculates the score
for all of the elements of a buffer.
"sd" <scaledev> Scales the deviation
values in a buffer.
"set" <set> Sets the values of
global variables and structures.
"sh" <shell> Shells out to operating
system.
"shuf" <shuffle> Orders the peaks in
source spectrum based on neighbor's score.
"si" <scale intensities> Scales the
intensities of all peaks in a spectrum.
"sn" <score neighbor> Determines
amount of overlap between 2 buffer sets.
"sp" <spec> Lists and allows user to
define active spectra.
"spl" <split> Splits peaks in buffer
into different values.
"ss" <sstr> Produces a list of past
search strings which can be edited.
"stf" <shuffle to file> Prints a
quick and dirty log of the shuffled spectra.
"stp" <shuffle to plot> Prints
shuffling to a data file to plot ambiguity levels.
"sts" <shuffle to spectrum> Produces
an ordered copy of the source spectrum.
"ti" <time> Displays the time and
date.
"timer" <timer> Calculates the time
between timer calls.
"ubl" <update buffer links>
Increases the number of buffer links.
"un" <union> Adds the contents of
two buffers to a third.
"una" <union all> Forms a union in
each fragment.
"wb" <write buffer> Writes buffer to
CONTRAST spectrum file.
"wl" <writelog> Writes global
parameters to a log file.
"ws" <write spectrum> Writes
spectrum to a CONTRAST spectrum file.
"q" <quit> Quits CONTRAST or exits
current routine.
"qq" <qq> Quits both CYCLE and
CONTRAST.
AAPM (Auto Amino Acid Probability)
DESCRIPTION
Reads in a file in a flat ASCII format and generates
Amino Acid tests based on standard probability distributions of the amino acids
in the protein. Uses seq.con (protein sequence file) to calculate probabilities
based on the amino acid count of the protein and amino acid probability
distributions. Limit 20 dimensions.
SYNTAX
aapm source
>file.in >file.out -a-n; >Hx.aa
"d3|ntoc,1" (bin.1,av.05,lim.05); ->
>Ca.aa
"d3|hnca,1" (bin.5,av.1,lim.05); ...
source
-p,-n = prev test (paa) or next test (naa). DEFAULT
= normal test (aa).
-a,-w = append to existing file, or write new file
(DEFAULT).
Reads in file, file.in,
and uses the dimensions of file.in
specified in h1.aa to create an amino acid test for each amino acid listed in
h1.aa. Tests are automatically read into program and output is appended to
file, file.out. Above, source
spectrum is indicated with an integer 1.
EXAMPLES
aapm 1 >file.in
>file.out -a-n ->
>Hx.aa "d3|ntoc,1" (bin.1,av.05,lim.05)
->
>Ca.aa "d3|hnca,1" (bin.5,av.1,lim.05)
...
The arrow (->) indicates a continued line. This
is taken care of in MainMenu.
aapm 1 "d3|ntoc" (binwidth=.1, av = .05,
lower limit = .03) -a -n >flat.cmp >Hx.aa >Hx.mac
aapm 1 "d3|ntoc,1" (bin=.1,av=.05,lim=.03)
-a -n >flat.cmp >Hx.aa >Hx.mac
Example format of file, Hx.aa:
d3 = A 13 15
d3 = C 13 15 16
d3 = D 13 15 16...
CAVEATS
RELATED COMMANDS
BUGS
ATE, Aatesteval (AA Test Eval)
DESCRIPTION
Tests the spin system identification routines: aa,
paa & naa. The first part goes through each aa test entered and tells the
number of times each test...
SYNTAX
ate 1, >file.out
"Header to appear at top of entry in file " -a
-a append to file
-w write new file
EXAMPLES
CAVEATS
RELATED COMMANDS
BUGS
AC (Auto Con) - see CT (Contrace)
DESCRIPTION
NOTE: Parentheses in the Boolean expressions can
only be nested to a depth of MAXNEST = 10. NOTE: If math is to be done to calculate
the lock, the operations are performed one at a time in order. There is no
hierarchical order of operations.
SYNTAX
ala 1, if(score > 75% and diff >= 25% and num = 1) lock = %s * 100.0
ala 1, lock if score > 50
ala 1, score > 50 or diff > 25% lock = %d
EXAMPLES
CAVEATS
RELATED COMMANDS
BUGS
DESCRIPTION
SYNTAX
EXAMPLES
CAVEATS
RELATED COMMANDS
BUGS
ALT, ALTF - see OP, OPF (Operate)
DESCRIPTION
Anneal algorithm that uses ovl tests (Set Ovl). To alter the annealing
schedule you must enclose all of the parameters IN ORDER in quotation
marks, with 0's for the default values. Otherwise default values will be used.
These values can be viewed and changed under: "set Temp", "set
Tfactor", "set MaxPerTemp", and "set MinPerTemp".
Temp = the percentage of the highest possible energy
change.
Set Temp to over 200% for simulated annealing from a
random start.
Set Temp to < 5% for using the routine to refine
a sequence.
Set Temp to a very small number to make it find only
a local minimum.
Tfactor = the percentage that the temperature gets
lowered at each annealing round.
Set Tfactor to > 8 for fast rough calculations
Set Tfactor to < 8 for slow refined calculations
MaxPerTemp: MaxPerTemp * NumPeaks = the number of
attempted moves per temperature level before the temperature is lowered.
MinPerTemp: MinPerTemp * NumPeaks = the number of
successful moves per temperature before the temperature is lowered.
SYNTAX
ann 1 "temp, tfactor, maxpertemp, minpertemp"
ann 1 "50, 8, 100, 10"
[Note: The above values represent the default values.]
EXAMPLES
CAVEATS
RELATED COMMANDS
BUGS
AAQ, ANNAQ (Anneal AQ)
DESCRIPTION
Anneal algorithm that uses ovl tests (Set Ovl) and aa tests (set aa ala) to
make assignments based on the sequence. This implementation uses the
inefficient but robust "best" algorithm which rearranges the sequence
by swapping groups of residues by breaking at weak links. The algorithm cycles
between using connectivity information and position information, connectivity
information only, and position information only to calculate the frequency that
proposed moves are accepted.
NOTE: to alter the annealing schedule you must
enclose all of the parameters IN ORDER in quotation marks, with 0's for the
default values. Otherwise, default values will be used. These values can be
viewed and changed under: "set Temp", "set deltaTemp", "set
MaxPerTemp", and "set MinPerTemp".
Temp = the percentage of the highest possible energy
change.
Set Temp to over 200% for simulated annealing from a
random start.
Set Temp to < 5% for using the routine to refine
a sequence.
Set Temp to a very small number to make it find only
a local minimum.
deltaTemp = the percentage that the temperature gets
lowered at each annealing round.
Set deltaTemp to > 8 for fast rough calculations
Set deltaTemp to < 8 for slow refined
calculations
MaxPerTemp: MaxPerTemp * NumPeaks = the number of
attempted moves per temperature before the temperature is lowered.
MinPerTemp: MinPerTemp * NumPeaks = the number of
successful moves per temperature before the temperature is lowered.
loTemp: The absolute temperature that the algorithm
must go to before exiting.
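The way these parameters shape an annealing run can be sketched numerically. The sketch assumes that the temperature is multiplied by (1 - deltaTemp/100) at each round and that a round ends when either move limit is reached; neither detail is confirmed by this guide, and both function names are hypothetical.

```python
def temperature_levels(temp, delta_temp, lo_temp):
    """List the temperatures visited, assuming a geometric schedule that
    lowers the temperature by delta_temp percent per round and stops once
    the temperature is no longer above lo_temp."""
    if not 0 < delta_temp < 100:
        raise ValueError("delta_temp must be between 0 and 100 percent")
    levels = []
    current = temp
    while current > lo_temp:
        levels.append(current)
        current *= 1 - delta_temp / 100
    return levels


def move_limits(max_per_temp, min_per_temp, num_peaks):
    """Return (attempted-move limit, successful-move limit) per temperature,
    following MaxPerTemp * NumPeaks and MinPerTemp * NumPeaks."""
    return max_per_temp * num_peaks, min_per_temp * num_peaks
```

Under these assumptions a smaller deltaTemp produces more temperature levels and therefore a slower, more refined calculation, which agrees with the guidance given above for values below 8. For a source spectrum of 120 peaks, a MaxPerTemp of 100 and a MinPerTemp of 10 give limits of 12000 attempted and 1200 successful moves per temperature.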
SYNTAX
annaq 1 3 "temp, deltaTemp, maxperTemp, minperTemp, loTemp" -s
annaq 1 3 "50, 2, 100, 10, .1" -u
1 = source spectrum
3 = 3 temperature levels before switching the way
proposed moves are accepted.
50 = percentage of the highest possible temperature
change of the initial temperature.
2 = percentage that temperature gets reduced by at
each annealing round.
100 = the number of attempted moves (x numPeaks) per
temperature level before the temperature is lowered.
10 = the number of successful moves per temperature
level before the temperature is lowered.
.1 = The absolute temperature that the algorithm
must reach before exiting.
[Note: The above values represent the default
values.]
How well are connectivity scores scaled to fall
between 0 and 100?
-s Rigorously scaled connectivity scores. (Requires
more calculation time.)
-m Moderate level of scaling.
-u Unscaled connectivity scoring.
EXAMPLES
CAVEATS
RELATED COMMANDS
BUGS
ABQ, ANNBQ (Anneal BQ)
DESCRIPTION
Anneal algorithm that uses ovl tests (Set Ovl) and aa tests (set aa ala) to make
assignments based on the sequence. This implementation uses the inefficient but
robust "best" algorithm which rearranges the sequence by swapping
groups of residues by breaking at weak links. NOTE: to alter the annealing
schedule you must enclose all of the parameters IN ORDER in quotation
marks, with 0's for the default values. Otherwise default values will be used.
These values can be viewed and changed under: "set Temp", "set
Tfactor", "set MaxPerTemp", and "set MinPerTemp".
Temp = the percentage of the highest possible energy
change.
Set Temp to over 200% for simulated annealing from a
random start.
Set Temp to < 5% for using the routine to refine
a sequence.
Set Temp to a very small number to make it find only
a local minimum.
Tfactor = the percentage that the temperature gets
lowered at each annealing round.
Set Tfactor to > 8 for fast rough calculations.
Set Tfactor to < 8 for slow refined calculations.
MaxPerTemp: MaxPerTemp * NumPeaks = the number of
attempted moves per
temperature before the temperature is lowered.
MinPerTemp: MinPerTemp * NumPeaks = the number of
successful moves per temperature before the temperature is lowered.
loTemp: The absolute temperature that the algorithm
must reach before exiting.
SYNTAX
annbq 1 "temp, tfactor, maxpertemp,
minpertemp, loTemp" -s
annbq 1 "50, 2, 100, 10, .1" -u
1 = source spectrum
50 = initial temperature, as a percentage of the highest
possible energy change.
2 = percentage that the temperature is reduced by at
each annealing round.
100 = the number of attempted moves (x numPeaks) per
temperature level before the temperature is lowered.
10 = the number of successful moves (x numPeaks) per
temperature level before the temperature is lowered.
.1 = The absolute temperature that the algorithm
must reach before exiting.
[Note: The above values represent the default
values.]
How well are connectivity scores scaled to fall
between 0 and 100?
-s Rigorously scaled connectivity scores. (Requires
more calculation time.) (DEFAULT)
-m Moderate level of scaling.
-u Unscaled connectivity scoring.
-b Nonlinear overlap scoring (bonus awarded for best
overlap scores). (DEFAULT)
-l Linear overlap scoring (no bonuses awarded).
-x # The number of extra cycles that the algorithm
will go through at the end of the simulated annealing cycle. (Default: End with
the simulated annealing cycle.) Suggested: -x1
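The effect of the schedule parameters can be pictured with the short Python sketch below. It is not CONTRAST code. The function names are hypothetical, the cooling rule (multiply the temperature by 1 - Tfactor/100 at each round) is an assumption based on the parameter descriptions, and the starting temperature is assumed to have already been converted from a percentage to an absolute value.

```python
def cooling_schedule(temp, tfactor, lo_temp):
    """Return the temperature levels visited by a geometric cooling schedule.

    temp    -- starting temperature (assumed to be in absolute units)
    tfactor -- percentage by which the temperature is lowered each round
    lo_temp -- annealing stops once the temperature is at or below this value
    """
    if not 0 < tfactor < 100:
        raise ValueError("tfactor must be between 0 and 100 (exclusive)")
    if temp <= 0 or lo_temp <= 0:
        raise ValueError("temperatures must be positive")
    levels = []
    t = float(temp)
    while t > lo_temp:
        levels.append(t)
        t *= 1.0 - tfactor / 100.0  # lower the temperature by tfactor percent
    return levels


def move_limits(max_per_temp, min_per_temp, num_peaks):
    """Attempted and successful move limits per temperature level."""
    return max_per_temp * num_peaks, min_per_temp * num_peaks


fast = cooling_schedule(50, 10, 0.1)  # Tfactor > 8: fewer levels, rougher result
slow = cooling_schedule(50, 2, 0.1)   # Tfactor < 8: more levels, finer result
print(len(fast), len(slow))
print(move_limits(100, 10, 120))      # limits for a 120-peak source spectrum
```

Under these assumptions, a larger Tfactor reaches loTemp in far fewer temperature levels, which is why values above 8 are described as fast and rough.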
EXAMPLES
CAVEATS
RELATED COMMANDS
BUGS
ALQ, ANNLQ (Anneal LQ)
DESCRIPTION
Anneal algorithm that uses ovl tests (Set Ovl) and aa tests (set aa ala) to make
assignments based on the sequence. This implementation uses the inefficient but
robust "long" algorithm, which rearranges the sequence by swapping
groups of residues. To alter the annealing schedule, you must supply all of the
parameters, IN ORDER, enclosed in quotation marks; enter 0 for any parameter
that should keep its default value. If no parameter string is given, the default
values are used. The defaults can be viewed and changed under: "set Temp",
"set Tfactor", "set MaxPerTemp", and "set MinPerTemp".
Temp = the percentage of the highest possible energy
change.
Set Temp to over 200% for simulated annealing from a
random start.
Set Temp to < 5% for using the routine to refine
a sequence.
Set Temp to a very small number to make it find only
a local minimum.
Tfactor = the percentage that the temperature gets
lowered at each annealing round.
Set Tfactor to > 8 for fast rough calculations
Set Tfactor to < 8 for slow refined calculations
MaxPerTemp: MaxPerTemp * NumPeaks = the number of
attempted moves per
temperature before the temperature is lowered.
MinPerTemp: MinPerTemp * NumPeaks = the number of
successful moves per
temperature before the temperature is lowered.
SYNTAX
annlq 1 "temp, tfactor, maxpertemp,
minpertemp" -s
annlq 1 "50, 8, 100, 10" -u
[Note: The above values represent the default
values.]
How well are connectivity scores scaled to fall
between 0 and 100?
-s Rigorously scaled connectivity scores. (Requires
more calculation time.)
-m Moderate level of scaling.
-u Unscaled connectivity scoring.
EXAMPLES
CAVEATS
RELATED COMMANDS
BUGS
ANNQ (Anneal Q)
DESCRIPTION
Anneal algorithm that uses ovl tests (Set Ovl) and aa tests (set aa ala) to
make assignments based on the sequence. This implementation uses the efficient
but overly simplified "swap" algorithm, which rearranges the sequence
by swapping individual residues. To alter the annealing schedule, you must
supply all of the parameters, IN ORDER, enclosed in quotation marks; enter 0 for
any parameter that should keep its default value. If no parameter string is
given, the default values are used. The defaults can be viewed and changed
under: "set Temp", "set Tfactor", "set MaxPerTemp", and "set MinPerTemp".
Temp = the percentage of the highest possible energy
change.
Set Temp to over 200% for simulated annealing from a
random start.
Set Temp to < 5% for using the routine to refine
a sequence.
Set Temp to a very small number to make it find only
a local minimum.
Tfactor = the percentage that the temperature gets
lowered at each annealing round.
Set Tfactor to > 8 for fast rough calculations
Set Tfactor to < 8 for slow refined calculations
MaxPerTemp: MaxPerTemp * NumPeaks = the number of
attempted moves per
temperature before the temperature is lowered.
MinPerTemp: MinPerTemp * NumPeaks = the number of
successful moves per
temperature before the temperature is lowered.
loTemp: The absolute temperature that the algorithm
must go to before exiting.
SYNTAX
annq 1 "temp, tfactor, maxpertemp,
minpertemp, loTemp" -s
annq 1 "50, 2, 100, 10, .1" -u
1 = source spectrum
50 = initial temperature, as a percentage of the highest
possible energy change.
2 = percentage that the temperature is reduced by at
each annealing round.
100 = the number of attempted moves (x numPeaks) per
temperature level before the temperature is lowered.
10 = the number of successful moves (x numPeaks) per
temperature level before the temperature is lowered.
.1 = The absolute temperature that the algorithm
must reach before exiting.
[Note: The above values represent the default
values.]
How well are connectivity scores scaled to fall
between 0 and 100?
-s Rigorously scaled connectivity scores. (Requires
more calculation time.)
-m Moderate level of scaling.
-u Unscaled connectivity scoring.
EXAMPLES
CAVEATS
RELATED COMMANDS
BUGS
A3Q, ANN3Q (Anneal 3Q)
DESCRIPTION
Anneal algorithm that uses ovl tests (Set Ovl) and aa tests (set aa ala) to
make assignments based on the sequence. This implementation uses the
inefficient but robust "best" algorithm which rearranges the sequence
by swapping groups of residues by breaking at weak links. The algorithm cycles
between using connectivity information and position information, connectivity
information only, and position information only to calculate the frequency that
proposed moves are accepted.
To alter the annealing schedule, you must supply all
of the parameters, IN ORDER, enclosed in quotation marks; enter 0 for any
parameter that should keep its default value. If no parameter string is given,
the default values are used. The defaults can be viewed and changed under:
"set Temp", "set deltaTemp", "set MaxPerTemp", and "set MinPerTemp".
Temp = the percentage of the highest possible energy
change.
Set Temp to over 200% for simulated annealing from a
random start.
Set Temp to < 5% for using the routine to refine
a sequence.
Set Temp to a very small number to make it find only
a local minimum.
deltaTemp = the percentage that the temperature gets
lowered at each annealing round.
Set deltaTemp to > 8 for fast rough calculations.
Set deltaTemp to < 8 for slow refined
calculations.
MaxPerTemp: MaxPerTemp * NumPeaks = the number of
attempted moves per
temperature before the temperature is lowered.
MinPerTemp: MinPerTemp * NumPeaks = the number of
successful moves per
temperature before the temperature is lowered.
loTemp: The absolute temperature that the algorithm
must reach before exiting.
SYNTAX
ann3q 1 3 "temp, deltaTemp, maxperTemp,
minperTemp, loTemp" -s
ann3q 1 3 "50, 2, 100, 10, .1" -u
1 = source spectrum
3 = 3 temperature levels before switching the way
proposed moves are accepted.
50 = initial temperature, as a percentage of the highest
possible energy change.
2 = percentage that the temperature is reduced by at
each annealing round.
100 = the number of attempted moves (x numPeaks) per
temperature level before the temperature is lowered.
10 = the number of successful moves (x numPeaks) per
temperature level before the temperature is lowered.
.1 = The absolute temperature that the algorithm
must reach before exiting.
[Note: The above values represent the default
values.]
How well are connectivity scores scaled to fall
between 0 and 100?
-s Rigorously scaled connectivity scores. (Requires
more calculation time.)
-m Moderate level of scaling.
-u Unscaled connectivity scoring.
EXAMPLES
CAVEATS
RELATED COMMANDS
BUGS
AP, APS, APSET, AUTP (Aut Param Set)
DESCRIPTION
Displays the parameters for the automated tracing of
spin systems. The user can change any parameter by typing the letter
representing that parameter and following the instructions, or can
accept the parameters by pressing the return key. This is the screen that lets
you change the parameters used by autorec. Returns 0 if 'q' is pressed.
Starting Points:
Creating a new file of starting points will create a
'spectrum' which contains the input starting points. Starting points should be
entered so that each different type of nucleus is in a different column.
SYNTAX
AutParamSet( aut, &plr, 30 );
EXAMPLES
CAVEATS
RELATED COMMANDS
BUGS
AUTOAA
DESCRIPTION
Uses the chemical shift range information (set
shift) and fragment definitions (Set Frag)
that have already been read into CONTRAST. If this information has not been
read into CONTRAST, the function looks for separate macro files named on the
command line, with the chemical shift range file first (default file name,
chemshft.mac) and the fragment template file second (default file name,
fragment.mac). The function creates a set of AA tests (default file name,
aa.mac), which it enters into the program. If the AA test file already exists,
it will be overwritten. If Set Shiftr or Set Frag has already been used to load
those tests, the filenames for the tests are not necessary, and the files
will not be read even if the filenames are included.
SYNTAX
autoaa >aa.mac [chemshft.mac] [fragment.mac] [-d]
[-res RangeResolution] [-max MaxAA]
-d: The "do flag". If -d is used, autoaa
loads each command into memory as it is created; otherwise (by
default) the indicated file is created and the exe command must be used
later to load the "setaa" commands. NOTE: The buffers must
be created before Reside can be run
with the '-d' option.
-res: RangeResolution is the desired resolution of
the AA tests that are generated. A resolution of .2 means that
no chemical shift ranges smaller than .2 ppm will be generated.
-maxa: MaxAA is the maximum number of amino acids
that a test can score true for the given sequence. If MaxAA is input as a
fraction then the number of AA in the sequence will be used to generate the
maximum number of AA.
-max1,2,3...: The maximum number of amino acids that
can score true for a given test that has 1,2,3,... dimensions.
-mint: Mintests is the minimum number of tests per
amino acid type.
-maxt: Maxtests is the maximum number of 1D tests in
which the number of amino acids that the test is true for is greater than
MaxAA.
EXAMPLES
CAVEATS
RELATED COMMANDS
BUGS
AUTOTRACE (formerly CONTRACER)
DESCRIPTION
Note: pr.score = -1 when the object has been
included in spin system.
SYNTAX
AutoTrace( &aut, choptr );
EXAMPLES
CAVEATS
RELATED COMMANDS
BUGS
BEEP
DESCRIPTION
SYNTAX
Beep();
EXAMPLES
CAVEATS
RELATED COMMANDS
BUGS
BOB (Buffer Overlap Buffer)
DESCRIPTION
Calculates and prints the amount of overlap between
two buffers by comparing the two buffers on a coordinate by coordinate (as
opposed to value by value) basis.
SYNTAX
BufferOverlapBuffer(plrptr1, plrptr2, &rep,
&srep, tol, flag);
bob -a |plr1 |plr2 .02
-a = all coordinates checked
-m = minus the matches (default)
-c = split and compress the plr's first
(NOTE: always does the split without including the
matches (involves less doubling))
EXAMPLES
CAVEATS
RELATED COMMANDS
sbob
BUGS
BOP, BOLP, BOVP (Buff Overlap Peak)
DESCRIPTION
Returns, in rep, the number of times a member of L is also
found in peak. Also returns the number of times a member of L is found in peak,
scaled by L dev, L internal confidence fractions, L intens, and the deviation
from overlap. Repeats are taken care of in the compressed list. Each peak must
be divided by its internRep for each peak in the buffer.
NOTE: When creating globtol, put most specific
labels first and more general last to serve as defaults. Position [0] should
have universal tolerance.
NOTE: Overlap must be done on compressed list.
NOTE: If peaks are "man-made" then use
intens value as an indication of how good the peak is. 1 has no effect, higher
values mean a good peak.
NOTE: "rep" and "srep" can be
passed into this routine with nonzero values and scores will be added to them.
SYNTAX
EXAMPLES
CAVEATS
RELATED COMMANDS
BUGS
BTF (Buff To File)
DESCRIPTION
Writes the specified contents of a buffer to a file.
By default the contents of the current buffer are appended to the end of the
specified file.
SYNTAX
order: Unimportant except that the first unspecified
argument (any argument without one of the prefixes: >, ", |, #, :, or
-) is taken to be the file name if one is not specified by '>'.
default: Uses current buffer and it appends to end
of a file which must be specified.
>filename: The path and file name to save buffer
to.
|buffer: The name or number of the buffer to save.
:code: The one letter code identifier for the
specification of the buffer.
-flags:
-a = append (default).
-w = overwrite (also -o).
-n = don't print header information.
-h = print header information (default).
"string" The format for each peak in the
buffer that gets printed. Embedded quotes are used to show that the exact
contents of the quotes should be printed character for character.
# = FIELD WIDTH. Any integer directly following a
character in "string" represents the maximum width of the field that
the value that the character represents will have.
#.# = PRECISION. Any fractional part following the
field width specifies the number of digits that will follow the decimal for
real values to be printed.
' ' = SPACE. Prints a space. (Can be repeated by
following the space with a FIELD WIDTH argument.)
"" = EMBEDDED QUOTES. The contents of
embedded quotes will be printed exactly as is.
c = CODE. Prints the one character nucleus type.
d = DEVIATION. Prints the deviation of MATCH from
TARGET.
i = INTENSITY. Prints the intensity of the peak.
m = MATCH. Prints the value that matched TARGET w/in
TOL.
n = INTERNAL REPEATS. Prints the number of peaks
from the same spectrum that was repeated in the buffer.
p = PEAK. Prints the coordinates of the peak.
r = REPEATS. Prints the number of times that the
value was repeated in the buffer.
s = SPECTRUM. Prints the name of the spectrum that
the peak was taken from.
t = TOL. Prints the tolerance used in the search
(corresponds to the tolerance associated with the nucleus).
v = VALUE. Prints the new value resulting from the
search.
EXAMPLES
""spec = "s4 "p= "p6.2 i4.0
3"code 4="c3 "
Result: spec = noes p= 10.34 4.45 4300 code 4= H
spec = hnco p= 133.23 8.93172.33 4200 code 4= n
NOTE: This example illustrates that at the present
time this program does not keep the display in correct columns when spectra
with different numbers of dimensions are used. To make this output more
readable it is suggested that you put the PEAK field at the end of the line.
Also note that the format specified by the PEAK arguments is responsible for
the spacing between the coordinates. If 'p7.2' was used instead of 'p6.2', the
coordinates 8.93 and 172.33 would have been separated.
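The spacing effect described in this note can be reproduced with a few lines of Python. This is only an illustration: it assumes that a PEAK field such as 'p6.2' behaves like a printf-style fixed-width, fixed-precision field applied to each coordinate in turn, and the helper name format_peak is hypothetical.

```python
def format_peak(coords, width, precision):
    """Right-justify each coordinate in a fixed-width field, with no separator."""
    return "".join(f"{c:{width}.{precision}f}" for c in coords)


coords = [133.23, 8.93, 172.33]       # the hnco peak from the example above
narrow = format_peak(coords, 6, 2)    # like 'p6.2'
wide = format_peak(coords, 7, 2)      # like 'p7.2'
print(repr(narrow))  # '133.23  8.93172.33'   (8.93 and 172.33 run together)
print(repr(wide))    # ' 133.23   8.93 172.33'
```

A six-character field leaves no room for a separator in front of a six-character value such as 172.33, so one extra column of width is enough to keep the coordinates apart.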
btf |buffname >filename -w-n
Writes the buffer buffname to the file filename, overwriting the file,
without a header.
btf fname
Appends the current buffer to the file fname.
btf :A fname
Appends the first buffer with code = 'A' to the file fname.
btf |2 >hist.fil -n ""spec= "s"
Prints the string 'spec= ' followed by the name of the spectrum that each
peak came from, for each peak in the 2nd buffer, to the file hist.fil with
no header.
CAVEATS
RELATED COMMANDS
BUGS
BTSS (Buff To Spec Shell)
DESCRIPTION
Copies the contents of a buffer into a spectrum *.
Note: Does not allocate the spectrum or change
numSpec.
SYNTAX
EXAMPLES
CAVEATS
RELATED COMMANDS
BUGS
CBL (Create Buffer Links)
DESCRIPTION
Reallocates buffer pointers to each peak in a
spectrum. Adds the indicated number of buffer links to those there already and
sets new buffptrs to null. Used for the automatic peak ordering algorithms.
SYNTAX
cbl(specptr, numptrs);
EXAMPLES
cbl 1, 7 (adds 7 buffer links for each peak in
spectrum 1)
CAVEATS
RELATED COMMANDS
BUGS
CFS, CF (Cluster Filter Shell)
DESCRIPTION
Takes the intensity-weighted average of peak positions
that are within the tolerances specified in the Boolean. In the unlikely event
that peaks have zero intensity, those peaks are assigned weights of 0.1 to
prevent the calculations from being skewed. Returns the number of peaks
deleted.
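The averaging step can be sketched in Python as follows. This is an illustration only, not the CONTRAST implementation: the function name cluster_average is hypothetical, the grouping of peaks by tolerance is assumed to have been done already, and intensities are assumed to be non-negative.

```python
def cluster_average(peaks, zero_weight=0.1):
    """Intensity-weighted average position of one cluster of peaks.

    peaks -- list of (coordinates, intensity) pairs, all with the same
             number of coordinates; zero-intensity peaks get zero_weight.
    """
    if not peaks:
        raise ValueError("a cluster needs at least one peak")
    weights = [intensity if intensity != 0 else zero_weight
               for _, intensity in peaks]
    total = sum(weights)
    num_dims = len(peaks[0][0])
    return tuple(
        sum(w * coords[d] for (coords, _), w in zip(peaks, weights)) / total
        for d in range(num_dims)
    )


cluster = [((8.00, 120.0), 300.0), ((8.04, 120.4), 100.0)]
merged = cluster_average(cluster)
print(merged)  # approximately (8.01, 120.1)
```

The stronger peak pulls the merged position toward itself, and the 0.1 fallback weight keeps the division defined when every peak in a cluster has zero intensity.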
SYNTAX
cf hnco (d1 <.05> %1 && d2 <.4> %2)
EXAMPLES
CAVEATS
RELATED COMMANDS
BUGS
CLRB (Clear Buffers)
DESCRIPTION
Clears specified buffers. Clears all buffers when
command is given no argument. When all buffers are cleared, all buffer links
are also cleared.
SYNTAX
EXAMPLES
clrb 1 2 6 Buffers 1, 2, and 6 are cleared.
clrb 1 2-7 8 9 Buffers 1 through 9 are cleared.
Buffers can also be specified using bounds:
clrb 1 4 >6
clrb <6 9 10
clrb All buffers are cleared.
CAVEATS
RELATED COMMANDS
BUGS
CLS
DESCRIPTION
UNIX version only.
SYNTAX
EXAMPLES
CAVEATS
RELATED COMMANDS
BUGS
COB (Child Overlap Buffer)
DESCRIPTION
Measures the overlap between the children of each
peak in buffer 1 with the members of buffer 2.
NOTE: The buffer whose peaks are to be used to
create a search should be listed first. The children are found by searching
the spectrum that was searched to generate buffer 2, using the supplied search
template, in which the %#'s are replaced by the specified first, second, etc.
coordinate of the peak. The supplied tolerance is used to determine whether a
match occurs.
SYNTAX
ChildOvlBuff( choice, cobstr);
EXAMPLES
cob -a |2 |3 .03, "d1 %1 .04 and d2 %2 .1"
cob -all coords matched |search buffer |comp buffer,
tolerance used for match, "template string"
-a all coords matched in comparing 2 peaks
-m (default) minus the matches (only the non-matches
are searched)
CAVEATS
RELATED COMMANDS
BUGS
COM, CO, COMP (Compress)
DESCRIPTION
Compress is used to prepare the buffer for the
automatic SCORE function as well as automatic OVERLAP calculations. Compress
orders the values in the buffer and determines the number of times they are
repeated between experiments, rep, and the number of times they are repeated
within an experiment, internRep. Multiple appearances of the same peak in a
buffer are discarded.
NOTE: Compress must be performed on a buffer that
has been split into the peak's constituent values. If the list has not
previously been SPLIT the COMP command splits the list automatically leaving
out the matched values.
You must update llr after using this routine if you
want the changes saved.
Note: The only time a peak is deleted is if there
are 2 of same peak, or if it is near a peak already chosen to be in the spin
system. InternRep peaks will only increase the rep of other peaks by one.
SYNTAX
Compress( &plr );
order: Any order of arguments will be accepted.
default: Eliminates duplicate peaks and splits peaks
only into those values that were not matched in the search for the peak.
|buffer The buffer name or number to compress.
:code The identifying 1 letter nucleus code of the
buffer.
-m Split peaks into all of the coordinates
(including the matched values).
EXAMPLES
comp |fred -m Compresses the buffer named fred and
includes match values.
CAVEATS
RELATED COMMANDS
BUGS
CONTRACER - see AUTOTRACE
CS (Combined Search)
DESCRIPTION
Scans listed spectra using a search string
constructed from a template which is assembled from all combinations of the
best peaks in 2 different buffers. CS works best if ORD has been used on the
two buffers to position the best peaks at the beginning of the buffer.
SYNTAX
cs 11 |2,4 |12,4 |result "d2 %3 .2 and d1 %3r2.9,6.5 .03"
Searches spectrum 11. Takes top 4 peaks from buffer
2. Takes top 4 peaks from buffer 12. Puts the resulting finds in buffer
|result. Search string: Searches dim 2 of spec 11 for a target taken from
buffer 2 (listed first) within a tolerance of .2 AND dim 1 of spec 11 for a
target (within the range of 2.9 and 6.5) taken from buffer 12 within a
tolerance of .03.
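The way the search targets are assembled from "all combinations of the best peaks" can be sketched in Python. This is a hedged illustration, not CONTRAST code: each buffer is represented simply as a list of target values already ordered best-first (as ORD would leave them), the function name combined_targets is hypothetical, and applying the range restriction after the top peaks are taken is an assumption.

```python
from itertools import product


def combined_targets(buffer_a, buffer_b, top_a, top_b, range_b=None):
    """Pair each of the top peaks of buffer_a with each of the top peaks of buffer_b.

    range_b -- optional (low, high) limits; targets from buffer_b outside
               this range are skipped (like 'r2.9,6.5' in the template).
    """
    best_a = buffer_a[:top_a]
    best_b = buffer_b[:top_b]
    if range_b is not None:
        low, high = range_b
        best_b = [value for value in best_b if low <= value <= high]
    return list(product(best_a, best_b))


buffer_2 = [56.1, 58.3, 61.0, 54.2, 52.0]   # made-up values, best first
buffer_12 = [4.45, 8.10, 3.20, 1.05]        # made-up values, best first
pairs = combined_targets(buffer_2, buffer_12, 4, 4, range_b=(2.9, 6.5))
print(len(pairs))   # 8: four targets from buffer 2 times two in-range targets
print(pairs[0])     # (56.1, 4.45)
```

Each pair would fill the two target slots of the template once, so the number of searches grows with the product of the two top counts.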
EXAMPLES
CAVEATS
RELATED COMMANDS
BUGS
CSA (Combined Search All)
DESCRIPTION
Scans listed spectra using a search string
constructed from a template which is assembled from all combinations of the
best peaks in 2 different buffers linked to each peak of the source spectrum.
NOTE: All buffers must be specified by NAME rather
than number. If more than one spectrum is listed to be searched, the results of
the searches will all be put into one buffer. CSA works best if ORD has been
used on the two buffers to position the best peaks at the beginning of the
buffer. NOTE: csTopNum1 and csTopNum2 should also be set. They are global
variables and are used for both CS and CSA.
SYNTAX
csa(choptr);
EXAMPLES
csa 1, 11 |hnca,4 |ntocsy,4 |result "d2 %3 .2 and d1 %3r2.9,6.5 .03"
Goes through each peak in spectrum 1 and
uses/creates buffers linked to that peak. Searches spectrum 11. Takes top 4
peaks from buffer hnca. Takes top 4 peaks from buffer ntocsy. Puts the
resulting finds in buffer |result. Search string: Searches dim 2 of spec 11 for
a target taken from buffer hnca (listed first) within a tolerance of .2 AND dim
1 of spec 11 for a target (within the range of 2.9 and 6.5) taken from buffer
ntocsy within a tolerance of .03. NOTE: It is IMPORTANT that the first target
in the string correspond to the first buffer listed!
CAVEATS
RELATED COMMANDS
BUGS
CT (Contrace)
DESCRIPTION
Generates a CONTRAST macro that can be used as is or
modified to create assigned fragments (primary assignments) for a protein.
Bases its analysis on the experiments that have already been read into the
program. The correlations per experiment should already be included in the
spectra files (or they could be added after reading in the experiments using
the AddCorrelation routine which has not been written yet).
Whether or not any chemical shift range filtering is
to be performed, Set Shiftr statements must be used to load chemical shift
ranges for each resonance type in the set of input spectra. The fuzziness of
filtering is expressed as a percentage on the command line.
0% fuzziness: the resonance chemical shift range
will be used as it was set.
100% fuzziness: the resonance chemical shift range
will be doubled.
<0% fuzziness: (default) No filtering will be
done.
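One way to read the fuzziness percentages is sketched below in Python. Only the three cases listed above come from this guide. The function name fuzzy_range is hypothetical, and both the linear scaling between 0% and 100% and the symmetric widening about the centre of the range are assumptions.

```python
def fuzzy_range(low, high, fuzz_percent):
    """Widen a chemical shift range by fuzz_percent of its own width.

    Returns None for a negative percentage, meaning no filtering is done.
    """
    if fuzz_percent < 0:
        return None
    extra = (high - low) * fuzz_percent / 100.0
    # Assumption: the extra width is split evenly between the two ends.
    return (low - extra / 2.0, high + extra / 2.0)


print(fuzzy_range(39, 69, 0))     # (39.0, 69.0): range used as it was set
print(fuzzy_range(39, 69, 100))   # (24.0, 84.0): range width doubled
print(fuzzy_range(39, 69, -1))    # None: no filtering
```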
The spectrum to be used as the source spectrum can
be specified (using the name or number of the spectrum).
SYNTAX
'r' = Rigorous calculations.
'h' = Uses heuristics to cut down on computation time.
'n' = Does heuristics to compare with exptl. no
noise prob calculation.
'm' = (default) Chooses between rigorous and
heuristic automatically.
'f' = Fill source spectrum. Default= Don't fill.
'g' = Glycine filter source spectrum. Default= Don't
glycine filter.
'a' = Arginine filter source spectrum. Default=
Don't arg-filter.
'x' = SetX cross-checking. Default= Don't cross-check.
-devi = Inline deviation filtering.
-devo = Out of line deviation filtering. (Do
deviation filtering at end of macro.)
Fragment Filtering Flags:
'D' = (default) Frag-filter when determined
necessary.
'F' = Frag-filter even when numRepeats of theRes
> than other correlations.
'N' = No frag filtering.
'P' = Percent frag-filtering.
'C' = (default) Constant frag-filtering.
NOMENCLATURE:
BS[].stat = -1 = low fragOvl slush buffer that's
been used already
BS[].stat = 0 = slush buffer
BS[].stat = 1 =
BS[].stat = 6 = frag
correlation: the correlations w/in a spectrum.
xcorrelation: the correlations between spectra or
buffers (when dimensions are in common)
slush buffer:
primary buffer:
immature primary buffer:
resonance overlap: when resonances (part of the
cor's) from one buff or spec overlap w/ res from another
incomplete
complete
specific range filter:
unique: A dimension with only one type of resonance.
It's unique even if spec has several correlations.
EXAMPLES
set shift all Ca 39-69
set shift ST Cb 61-73
set shift others Cb 18-48
NOTE: If you want the program to include an "Ile
Boost" to make sure Hg12, Hg13 and Cg1 values are not ignored if they
are assigned to the second buffers Hgi2 and Cgi2, then you must explicitly
include the chemical shift ranges for those resonances.
Example:
set shift Ile Cg2 14-22
set shift Ile Hg2 0-1.2
Values don't get ignored if:
contrace 1, >contrace.mac 0.0% -m
CAVEATS
RELATED COMMANDS
BUGS
CYC (Cycle)
DESCRIPTION
SYNTAX
Cyc(choice,&cycL,&inputF);
inputF = 'c' Clear and create new cycle from
beginning
= 'a' append to end of cycle list
= 'b' begin running cycle from beginning
= 'x' at end of cycle
= 'r' Running cycle at current position
= 'e' edit cycle list
= 'f' false (not cycling)
= 'm' for macro input
Need to have routines in choice command interpreter:
"q" to quit cycle
"qq" or "q q" to quit cycle and
program
"prompt" to allow you to put in any
command
EXAMPLES
CAVEATS
RELATED COMMANDS
BUGS
DELALL (Delete All Shell)
DESCRIPTION
SYNTAX
EXAMPLES
delall 1, |hnca () deletes each peak in |hnca
delall 1, |hnca (d1>3) deletes selected peaks in
|hnca
delall 1, |hnca !(d1>3) deletes all but selected
peaks in |hnca
delall 1, |hnca deletes the buffer |hnca and all
associated peaks
delall 1, hnca (d1>3) deletes selected peaks in
$hnca
delall 1, hnca !(d1>3) deletes all but selected
peaks in $hnca
CAVEATS
RELATED COMMANDS
BUGS
DEL (Delete Shell)
DESCRIPTION
SYNTAX
EXAMPLES
del |hnca () deletes each peak in |hnca
del |hnca (d1>3) deletes selected peaks in |hnca
del |hnca deletes the buffer |hnca and all
associated peaks
del $hnca () deletes previous three for spectra
delall 1, |hnca () deletes previous three for
buffers in each fragment
delall 1, |hnca !(d1>3)
CAVEATS
RELATED COMMANDS
BUGS
DF (Doublet Filter)
DESCRIPTION
SYNTAX
EXAMPLES
CAVEATS
RELATED COMMANDS
BUGS
DIR
DESCRIPTION
SYNTAX
Dir();
EXAMPLES
CAVEATS
RELATED COMMANDS
BUGS
DISP, D (Display)
DESCRIPTION
Below is a summary of the commands available within
the interactive display mode. A more detailed description of each command follows
directly.
DISPLAY MENU
0: All columns will be affected by commands.
1,2...: Only the indicated column will be affected by
commands.
a: Fills screen with first buffers in list.
z: Fills screen with last buffers in list.
c: Set the Columns to be displayed.
m: Displays columns using current position.
HOME,^A: Displays ALL buffers from the beginning.
UP,u: Moves active buffer(s) one row Up.
PGUP,p: Moves active buffer(s) one page Up.
DOWN,d: Moves active buffer(s) one row down.
PGDN,o: Moves active buffer(s) one page down.
LEFT,l: Shifts displayed buffers to the left.
RIGHT,r: Shifts displayed buffers to the right.
e: Edit indicated spectral fields.
f: Select new Fields to be displayed.
t: Toggle on/off column Titles.
n: Toggle on/off spectrum Names.
i: Toggle on/off Information line display.
h,?: Help Menu for DISPLAY.
vr: Set the number of Rows on video screen.
vc: Set the number of Columns on video screen.
wb: Write buffer to an ASCII file.
ws: Write spectrum to an ASCII file.
RETN,q: Quit display mode.
SYNTAX
EXAMPLES
CAVEATS
RELATED COMMANDS
BUGS
DISP E (Display Edit, Edit Spectrum)
DESCRIPTION
Called within interactive display mode by typing
'e', this command allows many different fields of a spectrum to be edited. All
of the editing commands affect the spectrum being referred to directly, except
for the remove peak (rm) command, which deletes a peak from the buffer only.
SYNTAX
Dimension # should always directly follow the field
to be edited.
The buffer number always comes first.
EXAMPLES
e d1 2 3 edit dim 1 of 3rd peak in buffer 2
e i 2 3 edit intensity of 3rd peak in buffer 2
e c 2 3 edit comment of 3rd peak in buffer 2
e r 2 3 removes peak 3 from buffer 2
e n 2 edit spectrum name of buffer 2
e l1 2 edit label for dim 1 of buffer 2
e t1 2 edit tolerance for dim 1 of buffer 2
e f1 2 edit format for dim 1 of buffer 2
CAVEATS
RELATED COMMANDS
BUGS
DISP F (Display Fields)
DESCRIPTION
fields:
w - the level of automatic tracing
g - the grade (score) for that peak
l - the line number
c - the comment
x - the one letter nucleus code
d - the deviation of the match value from the target
n - the number of internal repeats
s - spectrum name
m - the match values
v - the main value being considered for a peak
t - the tolerance of that value
r - the number of repeats
p - the peak
i - the peak's intensity
' ' extra space
"" quoted literals
SYNTAX
EXAMPLES
CAVEATS
RELATED COMMANDS
BUGS
DTF (Display To File)
DESCRIPTION
Prints all buffers to file as if they were screen
dumped from display.
Make specRay[i] a pointer to the plr;
Make colsrch a field in print;
SYNTAX
DTF(choptr);
dtf >file.name -a "Header string"
-w (default) overwrite
-a append
-v (default) creates vertical file
-h creates horizontal file
EXAMPLES
CAVEATS
RELATED COMMANDS
BUGS
EVAL (Evaluate)
DESCRIPTION
Evaluates the expression in the string. Note that
the first character of the string must be the opening parenthesis for the
expression to be evaluated or it must be the first character of the expression
and the string must end right after the expression to be evaluated.
NOTE: If a range is specified the highest value from
that range will be returned unless ,L is specified for lowest value.
MarkandEval calls Eval and finds the endpoint
automatically to evaluate expressions that are part of longer strings. It too
must begin on the expression to be evaluated.
SYNTAX
value = Eval(str,source,fragi,varPeak,numVars,com);
source -> Limits the search for matching buffer
to a particular fragment if source != NULL.
fragi -> Used if source != NULL to specify
fragment.
varPeak -> Array of pointers to either spectra or
buffers to consider if lists aren't specified.
numVars -> The number of elements in array
varPeak.
com -> The text form of the returned value or the
resulting string if a string operation is
specified (in which case the value returned will be
NONFLOAT).
special numbers = E, PI
Operators:
for numeric operations:
* + - / ^ %(modulus) sin cos tan log ln val1, val2,
...
Operands of all trig functions should be in degrees.
Val#(string) takes the #th value from the string. If
there aren't # values then it takes the last value.
for text operations:
+ Union "fred" + "ted" =
"fredted"
^ Intersection "fred" ^ "ted" =
"ed"
- Delete Intersection "fred" -
"ted" = "fr"
* Count Number of Intersections "eded" *
"ed" = 2
/ Remove Characters "fred" /
"det" = "fr"
% Remove all but characters "fred" %
"det" = "ed"
Values:
&a = the value of the variable 'a'
23.1 = a number
e = 2.7182818
PI = 3.1415927
#|fred = the number of peaks in fred
w|fred = the level of the buffer
m|fred = the number of dimensions in the peaks
d1|fred,4 = the first coordinate of the 4th peak in
fred (or p1)
dx|fred,4 = the number of coordinates for
evaluations or assignments or looks at each coordinate and demands at least one
match in Boolean tests.
dc|fred,4 = the number of coordinates for
evaluations or assignments or tests all combinations of coordinates for at
least numDim matches.
da|fred,4 = demands that all dimensions match.
i|fred,b = the intensity of the first peak (or d0 or
p0)
C|fred,e = the numeric part of the comment from the
last peak
v|fred,H = the value of the highest valued peak in
fred
g|fred,L = the lowest grade in fred
d|fred,1 = the deviation
n|fred,1 = the number of internal repeats
t|fred,1 = the tolerance of the value
r|fred,1 = the number of repeats
#$fred, m$fred, d1$fred,4, i$fred,b, and C$fred,e
also apply to spectra
w$fred = the column width of the spectrum fred
Field indicators:
# the number of peaks
b the ambiguity of the buffer
w the level of the buffer or column width of the
spectrum
m the number of dimensions in the peaks
d1... the first coordinate ...
i the intensity
v the value (buffer only)
g the score (buffer only)
s the score (buffer only)
C the first numeric value in a comment
c the text of the comment
d the deviation (buffer only)
n the N variable
x the X variable
q the quality factor (ambiguity or confidence)
r the number of repeats (buffer only)
k the number of links (spectrum only)
List indicators:
| buffer
$ spectrum
Peak indicators:
blank Specified by the search.
,1... The first peak ...
,b The beginning or first peak.
,f4.. The first four peaks...
,l4.. The last four peaks...
,e The end or last peak.
,H The highest value in the list.
,L The lowest value in the list.
,2-5.. Peaks 2 through 5.
NOTE: H or L can be appended to a range to have the
range return the highest (default) or lowest value.
EXAMPLES
i|fred,f4H = the intensity of the highest intensity
peak from the first 4 peaks of buffer fred.
i$fred,f4 = the intensity of the first four peaks in
spectrum fred.
i|fred,l4 = the intensity of the last four peaks in
fred
d1$fred,4+ = the first coordinate of the fourth
through the last peaks in spectrum fred.
i|fred,2-5 = the intensity of the second through
fifth (inclusive) peaks
v|fred,H = the value of the highest valued peak in
buffer fred
s|fred,L = the lowest grade in buffer fred.
C|fred,e = the numeric part of the comment from the
last peak in buffer fred.
c|fred,1 = the comment text string from the first
peak in buffer fred.
d|fred,b = the deviation of the first peak in buffer
fred.
#$fred = the number of peaks in spectrum fred.
w$fred = the column width of the spectrum fred.
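The peak indicators above can be read as a small selection grammar. The Python sketch below is only an illustration and is not CONTRAST code: the function name select and the use of a plain list of field values are hypothetical, a blank indicator is taken to mean the whole list (in CONTRAST the search decides), and a range without H or L is assumed to yield every selected value.

```python
import re

def select(values, indicator=""):
    """Pick field values by a peak indicator such as 'b', '4', 'f4', 'l4', '2-5', '4+', 'H' or 'f4H'."""
    pick = ""
    if indicator and indicator[-1] in "HL":               # trailing H or L: highest / lowest
        indicator, pick = indicator[:-1], indicator[-1]

    if indicator == "":                                   # blank: here, the whole list
        chosen = list(values)
    elif indicator == "b":                                # beginning or first peak
        chosen = values[:1]
    elif indicator == "e":                                # end or last peak
        chosen = values[-1:]
    elif m := re.fullmatch(r"f(\d+)", indicator):         # first N peaks
        chosen = values[:int(m.group(1))]
    elif m := re.fullmatch(r"l(\d+)", indicator):         # last N peaks
        chosen = values[max(0, len(values) - int(m.group(1))):]
    elif m := re.fullmatch(r"(\d+)-(\d+)", indicator):    # peaks M through N, 1-based, inclusive
        chosen = values[int(m.group(1)) - 1:int(m.group(2))]
    elif m := re.fullmatch(r"(\d+)\+", indicator):        # peak M through the last peak
        chosen = values[int(m.group(1)) - 1:]
    elif m := re.fullmatch(r"\d+", indicator):            # a single peak
        chosen = values[int(m.group(0)) - 1:int(m.group(0))]
    else:
        raise ValueError(f"unknown peak indicator: {indicator!r}")

    if pick == "H":
        return [max(chosen)]
    if pick == "L":
        return [min(chosen)]
    return chosen

intensities = [5.0, 9.0, 2.0, 7.0, 1.0, 8.0]   # made-up intensities for six peaks
first_four_highest = select(intensities, "f4H")
second_to_fifth = select(intensities, "2-5")
```

With these made-up intensities, "f4H" picks the largest of the first four values and "2-5" picks the second through fifth values, mirroring the i|fred,f4H and i|fred,2-5 examples.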
CAVEATS
RELATED COMMANDS
BUGS
EXE (Execute)
DESCRIPTION
Executes a macro file given the file's path and
name. Each line of the macro file should contain only one command as it would
be typed at the main menu prompt.
SYNTAX
exe flav.mac -e
-e = (default) Echo the commands read from the macro file on the
display.
-n = No echo.
EXAMPLES
CAVEATS
RELATED COMMANDS
BUGS
FB (Fast Boolean)
DESCRIPTION
SYNTAX
if(FastBool(str))...
Note: FB uses absolutely no spaces. The input string must be enclosed in
parentheses.
Operators:
> >= < <= == != && ||
<tol> >tol<
&& = and
|| = or
<tol> = is within a tolerance, tol, of next
value
>tol< = is outside of a tolerance, tol, of
next value
Values: Only floating point numbers or integers.
Functions: None
EXAMPLES
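The two tolerance operators can be pictured with a short Python sketch. This is an illustration under assumptions, not the FB implementation: the function names are hypothetical, and whether a difference exactly equal to tol counts as "within" is not stated in this guide (it is counted as within here).

```python
def within(a, tol, b):
    """a <tol> b: true when a lies within tol of b (boundary counted as inside, an assumption)."""
    return abs(a - b) <= tol

def outside(a, tol, b):
    """a >tol< b: true when a lies outside tol of b."""
    return abs(a - b) > tol

# Made-up shifts: 8.23 is within .05 of 8.25, while 120.0 is outside .4 of 121.0.
close_enough = within(8.23, 0.05, 8.25)
too_far = outside(120.0, 0.4, 121.0)
```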
CAVEATS
RELATED COMMANDS
BUGS
FCS (File Compress Shell)
DESCRIPTION
Reads in a file in a flat ASCII format and compresses lines if the Boolean
is true. The first comparison in the Boolean MUST be between values from the
same field and the same file. This field is called the primary field of the
compression, and its values should be sorted so that all identical values are
grouped together. Since the Boolean contains a comparison from this field, all
matches must be contained within contiguous blocks.
The output file is created so that the values of all of the selected
matching lines are averaged.
SYNTAX
Operations:
<> = != < > >= <= ><
EXAMPLES
Given a sample space and tab delimited input file,
file.out:
**var seq AA grpID rfID fMol fPPM lopH hipH loTmp hiTmp shifts...
165 8 H 495 390 1001 0.00 5.50 5.50 45.00 45.00 NULL 5.04 NULL NULL NULL
164 8 H 497 391 1003 0.00 5.50 5.50 45.00 45.00 53.4 NULL 29.5 NULL NULL
165 8 H 530 389 1001 0.00 5.10 5.50 45.00 45.00 53.4 5.04 29.5 NULL NULL
164 8 H 1704 389 1001 0.00 5.10 5.50 45.00 45.00 53.4 5.04 29.5 NULL NULL
165 8 H 1875 957 1001 0.00 5.50 5.50 45.00 45.00 53.4 5.03 29.5 NULL NULL
164 9 K 495 390 1001 0.00 5.50 5.50 45.00 45.00 NULL 4.82 NULL 1.53 1.53
164 9 K 497 391 1003 0.00 5.50 5.50 45.00 45.00 54.5 NULL NULL NULL NULL
164 9 K 1875 957 1001 0.00 5.50 5.50 45.00 45.00 54.3 4.82 NULL 1.53 1.53
164 10 E 495 390 1001 0.00 5.50 5.50 45.00 45.00 NULL 5.06 NULL 2.16 2.16
164 10 E 497 391 1003 0.00 5.50 5.50 45.00 45.00 51.8 NULL NULL NULL NULL
164 10 E 1875 957 1001 0.00 5.50 5.50 45.00 45.00 51.9 5.08 NULL 2.16 2.16
fc file.out (d2 = d2 && d1 = d1 && d8 <.5> d8) >short.out
Produces the tab delimited output file, short.out:
**var seq AA grpID rfID fMol fPPM lopH hipH loTmp hiTmp shifts...
165 8 H 967 579 1001 0.00 5.37 5.50 45.00 45.00 53.4 5.04 29.5 NULL NULL
164 8 H 1100 390 1002 0.00 5.30 5.50 45.00 45.00 53.4 5.04 29.5 NULL NULL
164 9 K 956 579 1002 0.00 5.50 5.50 45.00 45.00 54.4 4.82 NULL 1.53 1.53
164 10 E 956 579 1002 0.00 5.50 5.50 45.00 45.00 51.8 5.07 NULL 2.16 2.16
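The compression shown above can be sketched in Python. This is a rough illustration and not the FCS implementation: the names average_column and compress are hypothetical, and the sketch assumes that lines are grouped into contiguous blocks of the primary field, that lines within a block are merged when they match the first line of an existing group, and that NULL and non-numeric cells are ignored when averaging (which is what the sample output suggests).

```python
from itertools import groupby

def average_column(cells):
    """Average the numeric cells, ignoring NULL; keep the first cell if none are numeric."""
    nums = []
    for cell in cells:
        try:
            nums.append(float(cell))
        except ValueError:
            pass  # NULL or text such as the residue letter
    return sum(nums) / len(nums) if nums else cells[0]

def compress(rows, primary, matches):
    """Merge matching rows inside each contiguous block of equal primary-field values."""
    out = []
    for _, block in groupby(rows, key=lambda r: r[primary]):
        clusters = []
        for row in block:
            for cluster in clusters:
                if matches(cluster[0], row):      # compare with the group's first line
                    cluster.append(row)
                    break
            else:
                clusters.append([row])            # no group matched: start a new one
        out.extend([average_column(col) for col in zip(*c)] for c in clusters)
    return out

# The three "164 9 K" lines of file.out, split into fields.
rows = [line.split() for line in (
    "164 9 K 495 390 1001 0.00 5.50 5.50 45.00 45.00 NULL 4.82 NULL 1.53 1.53",
    "164 9 K 497 391 1003 0.00 5.50 5.50 45.00 45.00 54.5 NULL NULL NULL NULL",
    "164 9 K 1875 957 1001 0.00 5.50 5.50 45.00 45.00 54.3 4.82 NULL 1.53 1.53",
)]
# d1 = d1 && d8 <.5> d8, with d2 (index 1) as the primary field.
same = lambda a, b: a[0] == b[0] and abs(float(a[7]) - float(b[7])) <= 0.5
merged = compress(rows, primary=1, matches=same)
```

The three lines collapse into a single line whose grpID, rfID and fMol averages round to 956, 579 and 1002, in agreement with the "164 9 K" line of short.out.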
CAVEATS
RELATED COMMANDS
BUGS
FDF (Full Display To File)
DESCRIPTION
Writes out the contents of the specified buffer to
the specified file using the template set by "set fd".
SYNTAX
fdf ( plrptr, fname, appendF);
fdf |3 >file.name -a-n
-a = append to the end of the existing file.
-w = overwrite existing file. (default)
-h = print header information to file. (default)
-n = no header information printed.
EXAMPLES
CAVEATS
RELATED COMMANDS
BUGS
FDS (Full Display Shell)
DESCRIPTION
SYNTAX
FullDisplayShell( str );
EXAMPLES
CAVEATS
RELATED COMMANDS
BUGS
FF (File Filter)
DESCRIPTION
Reads in a shift file in a flat ASCII format and copies it to an output file
(short.out), filtering out the lines for which the Boolean is true. Lines are
deleted based on the values of a single field. NOTE: The function reads in
each tab- or space-delimited field as if it were a floating point number.
SYNTAX
-p Prints out each line that is filtered to the
screen.
-n (DEFAULT) Doesn't print out each line to the
screen.
operations:
<> = != < > >= <= ><
EXAMPLES
ff file.out !(d1 == d1>good.var.ids,f5 ) -p >short.out
-p Prints out each line that is filtered to the screen.
ff file.out (d6 < 4.3) -n >short.out
-n Doesn't print out each line to the screen.
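The second example can be pictured with the Python sketch below. It is an illustration only: the name filter_lines and its parameters are hypothetical, and the sketch assumes that every line carries a numeric value in the tested field.

```python
def filter_lines(lines, field, drop_if, echo=False):
    """Copy lines, leaving out those whose 1-based `field` makes drop_if true."""
    kept = []
    for line in lines:
        cells = line.split()                  # tab- or space-delimited fields
        if drop_if(float(cells[field - 1])):
            if echo:                          # like -p: show each filtered line
                print(line)
        else:
            kept.append(line)
    return kept

# Made-up input lines; the sixth field plays the role of d6.
lines = ["164 9 K 495 390 4.1", "164 9 K 497 391 4.5"]
short = filter_lines(lines, 6, lambda v: v < 4.3)   # like (d6 < 4.3) -n
```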
CAVEATS
RELATED COMMANDS
BUGS
FILESTAT (File Stat)
DESCRIPTION
Reads in a file in a flat ascii format and
calculates statistical parameters for indicated columns.
SYNTAX
-a Append to output file.
-w (default) Overwrite output file.
EXAMPLES
filestat file.in (d1 = .1, d2 = #100, d3 = 10) -a -0 >file.out
Reads in the file file.in and calculates the mean, standard deviation, etc.
for the first (d1), second (d2) and third (d3) columns in the file. It also
calculates the values for a binned probability distribution where the bin
width in d1 is .1, the bin width in d3 is 10, and the bin width in d2 is
calculated to give 100 bins in total for the data. If the width of a column is
set to zero, the probability distribution values are not calculated. The -a
flag causes the new output to be appended to the end of the output file if it
already exists. The 0 in -0 is the value that the bins are to be aligned with.
If the first data point is 50.094, the first bin will start at 50.0, since
50.0 is the highest number less than 50.094 for which (50.0-0) % .1 = 0.
Output is put in the file file.out.
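The bin arithmetic described above can be sketched in Python. The function names are hypothetical and this is not the FILESTAT code; the sketch assumes bin edges sit at the alignment value plus whole multiples of the bin width, and that the "#100" form divides the data range into equal bins.

```python
import math

def first_bin_start(first_value, width, align=0.0):
    """Largest bin edge not above first_value, where edges sit at align + k * width."""
    return align + math.floor((first_value - align) / width) * width

def width_for_bin_count(values, count):
    """Bin width that spreads `count` equal bins over the data range (the '#100' form)."""
    return (max(values) - min(values)) / count

start = first_bin_start(50.094, 0.1, align=0.0)       # the 50.094 example from the text
width = width_for_bin_count([10.0, 35.0, 60.0], 100)  # made-up column values
```

For the first data point 50.094 with a bin width of .1 and alignment 0, the first bin starts at 50.0 (to within floating point rounding), as in the description.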
CAVEATS
RELATED COMMANDS
BUGS
FIL (Fill)
DESCRIPTION
Fills the first listed structure (either a buffer or
a spectrum) with the peaks from the second structure that are found in the
search (Boolean). The designated dimension is set to the designated value
before adding the peaks to the first structure.
SYNTAX
Fil (str);
-a (DEFAULT) All matches marked.
-n Noesy-type matches are marked.
-b Best match marked.
-u (DEFAULT) Unscaled. Matches are compared using
only their deviations.
-s Scaled. Matches are compared to determine best
(-b or -n) using scaled scores.
EXAMPLES
Normal call:
fil hnco hnca !(d1 <.05> %1 && d2 <.4> %2) d3=0.0
(Note: The '!' sign means "not" and in this case takes the complement of
the results from the search.)
Other calls:
fil hnco hnca (d1 <.05> %1 && d2 <.4> %2) d3=0.0
fil |hnco hnca (d1 <.05> %1 && d2 <.4> %2) d3=0.0
fil hnco |hnca (d1 <.05> %1 && d2 <.4> %2)
fil |hnco |hnca !(d1 <.05> %1 && d2 <.4> %2)
CAVEATS
RELATED COMMANDS
BUGS
FILA (Fill All)
DESCRIPTION
Returns the number of peaks that are filled in the first listed list.
SYNTAX
Fila(str);
Flags:
-a (DEFAULT) All matches marked.
-n Noesy-type matches are marked.
-b Best match marked.
-u (DEFAULT) Unscaled. Matches are compared using
only their deviations.
-s Scaled. Matches are compared to determine best
(-b or -n) using scaled scores.
EXAMPLES
Normal Call:
fila 1, |hncoca |hnca (#|hnca > 1 && %3 <.1> d3) -f
Adds |hnca peaks to the |hncoca buffer when they match peaks already in the
|hncoca buffer for each fragment.
Other Calls:
fila 1, hnco hnca (d1 <.05> %1 && d2 <.4> %2) -s -b d3=0.0
fila 1, |hnco hnca (d1 <.05> %1 && d2 <.4> %2) d3=0.0
fila 1, hnco |hnca (d1 <.05> %1 && d2 <.4> %2)
fila 1, |hnco |hnca !(d1 <.05> %1 && d2 <.4> %2)
CAVEATS
RELATED COMMANDS
BUGS
FILTER
DESCRIPTION
Performs a search of other lists in order to delete peaks from a list. The
function marks each peak in the first list according to whether a match is
found in the lists that follow; a peak in the first list can be marked only
once for each list that follows. An operator and integer following the search
specify the number of marks necessary for a peak to be deleted. In the first
example, a peak is deleted if a search of each of the following three lists
(hnca, hncoca, and ntoc) does not find a match for that peak. If "== 2" had
been specified rather than "== 3", a peak would have been deleted if there
were no matches for two of the three spectra.
Returns: The number of peaks deleted. (The number is not actually returned
yet.)
SYNTAX
Filter (str);
-f (DEFAULT)
-u (DEFAULT)
-a (DEFAULT)
EXAMPLES
Normal Call:
filter hnco hnca hncoca ntoc !(d1 <.05> %1 && d2 <.4> %2) == 3
**deletes all peaks from hnco that are found in none of the 3 other spectra
filter hnco hnca hncoca (d1 <.05> %1 && d2 <.4> %2) < 1
**deletes all peaks from hnco that are found in neither of the 2 other spectra
Other Calls:
filter hnco hnca (d1 <.05> %1 && d2 <.4> %2) == 1
**marks and deletes peaks in hnco that are also in hnca
filter |hnco hnca (d1 <.05> %1 && d2 <.4> %2) < 1
**deletes peaks in buffer that are not in hnca
filter hnco |hnca |ntoc (d1 <.05> %1 && d2 <.4> %2) > 1
**deletes peaks in hnco that are in both buffers
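The mark counting behind these calls can be sketched in Python. This is an illustration under assumptions, not the FILTER implementation: the name filter_peaks is hypothetical, peaks are plain tuples of coordinates, and a leading '!' is taken to mean that a list marks the peak when no match is found in it.

```python
import operator

OPS = {"==": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}

def filter_peaks(first, others, match, op, count, negate=False):
    """Return the peaks of `first` that survive; a peak is deleted when its marks satisfy op count."""
    kept = []
    for peak in first:
        marks = 0
        for other in others:                       # at most one mark per following list
            found = any(match(peak, q) for q in other)
            if found != negate:                    # with '!', a missing match gives the mark
                marks += 1
        if not OPS[op](marks, count):
            kept.append(peak)
    return kept

# (d1 <.05> %1 && d2 <.4> %2) on made-up peaks
match = lambda p, q: abs(p[0] - q[0]) <= 0.05 and abs(p[1] - q[1]) <= 0.4
hnco = [(8.20, 120.0), (8.50, 115.0)]
hnca, hncoca, ntoc = [(8.21, 120.1)], [(8.19, 119.9)], []

none_of_three = filter_peaks(hnco, [hnca, hncoca, ntoc], match, "==", 3, negate=True)
also_in_hnca = filter_peaks(hnco, [hnca], match, "==", 1)
```

Here the second hnco peak has no partner in any list, so the first call deletes it; the second call deletes the first peak because it is also found in hnca.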
CAVEATS
RELATED COMMANDS
BUGS
FIT, LS, LSQ (Least Squares)
DESCRIPTION
Fits a straight line by least squares; see page 527 of Numerical Recipes.
SYNTAX
Form: LeastSquares(specptr, choptr, "", 's');
if ch = 's' -> print to screen
if ch = 'c' -> also generate a comment string
mine         theirs
m         =  b
b         =  a
numPeaks  =  ndata = ss
xav       =  sxoss
EXAMPLES
fit 1 d1 d2
fit spectrum.number dim1 correlated.with.dim2
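As a rough guide to what such a fit computes, the Python sketch below fits an unweighted straight line y = b + m*x using the variable names listed under SYNTAX. It assumes that the routine referred to is the ordinary unweighted straight-line least-squares fit; the function name fit_line is hypothetical and the sketch is not the CONTRAST code.

```python
def fit_line(xs, ys):
    """Unweighted least-squares line y = b + m*x; returns (m, b)."""
    ss = len(xs)                        # number of points (numPeaks, ndata)
    sx, sy = sum(xs), sum(ys)
    sxoss = sx / ss                     # mean of x (xav)
    ts = [x - sxoss for x in xs]        # x values centred on their mean
    st2 = sum(t * t for t in ts)
    m = sum(t * y for t, y in zip(ts, ys)) / st2   # slope
    b = (sy - sx * m) / ss                          # intercept
    return m, b

# Made-up coordinates lying exactly on y = 1 + 2x
slope, intercept = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
```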
CAVEATS
RELATED COMMANDS
BUGS
FIT0, FT0 (Fit 0)
DESCRIPTION
SYNTAX
LeastSquares(specptr, choptr, "", 's');
if ch = 's' -> print to screen
if ch = 'c' -> also generate a comment string
EXAMPLES
fit 1 d1 d2
fit spectrum.number dim1 correlated.with.dim2
CAVEATS
RELATED COMMANDS
BUGS
HELP, H (Page)
DESCRIPTION
The CONTRAST HELP SYSTEM uses the PAGE function to
page through the contrast.hlp file. Backward (^R) and forward (^S) incremental
searches can be performed within page as well as paging up and down.
SYNTAX
HELP COMMANDS: (see PAGE for more details)
PGUP,^P: Page up.
UP,^U,u: Line up.
PGDN,^O: Page down.
DOWN,^D,d: Line down.
HOME,^H: Beginning of file.
END,^E: Goes to end of file.
m: Marks a page.
g: Returns to a marked page.
q,^Q,ESC: Quits PAGE.
c:(toggle) Case sensitive/insensitive.
^S,S,s: Searches forward.
^R,R,r: Searches backward.
h,?: Help page for PAGE.
COMMANDS WITHIN SEARCH MODES:
BKSPC,^B Removes the last letter added to the search string and returns to
the position of the search for the previous letter.
DEL,^G Deletes the search string and returns to the position at which the
search began.
^R If in REVERSE search mode, searches backward for the next occurrence of
the string.
If in FORWARD search mode, reverses the direction of the search.
If no search string has been entered, the last search is retrieved and
repeated.
^S If in FORWARD search mode, searches forward for the next occurrence of
the string.
If in REVERSE search mode, changes the direction of the search so that it
starts searching forward.
If no search string has been entered, the last search is retrieved and
repeated.
^Q,ESC Quits search mode, leaving the file at the current page.
alphanum Adds the character to the search string.
EXAMPLES
CAVEATS
RELATED COMMANDS
BUGS
IF (Boolean)
DESCRIPTION
SYNTAX
if( #|fred > 2 || 3^3 != &tom ) function
if( g|fred,2 < 13 ) function
if( d1|fred,4 == 10 || (&tom>3 && cos180/&tom==2) ) function
Operators:
> >= < <= == != && ||
<tol> >tol< and or
&& = and
|| = or
<tol> = is within a tolerance, tol, of next
value
>tol< = is outside of a tolerance, tol, of
next value
Values:
&a = the value of the variable 'a'
23.1 = a number
e = 2.7182818
PI = 3.1415927
#|fred = the number of peaks in fred
w|fred = the level of the buffer
d1|fred,4 = the first coordinate of the 4th peak in
fred (or p1)
dx|fred,4 = true if any of the coordinates of fred matches the test
da|fred,4 = if 2 peaks are being compared then all
dimensions must match.
dc|fred,4 = if 2 peaks are being compared then
combinations of at least the minimum number of dimensions between the peaks
must match.
i|fred,b = the intensity of the first peak (or d0 or
p0)
c|fred,e = the numeric part of the comment from the
last peak
v|fred,h = the value of the highest valued peak in
fred
g|fred,l = the lowest grade in fred
d|fred,1 = the deviation
n|fred,1 = the number of internal repeats
t|fred,1 = the tolerance of the value of the first
peak
r|fred,1 = the number of repeats for the first peak
Functions:
* / + - ^ % cos sin tan log ln
Standard Boolean Calls from other Functions:
d1 = replaced by functions with each peak in either
buffer or spectrum
%1 = the coordinate of the first dimension of the
source spectrum
d1|fred = the d1 value of all combinations of peaks
in fred
d1|fred,f4 = the d1 value of the first four peaks in
fred
d1$hnca,4 = the d1 value of the fourth peak in the
spectrum hnca
d1$3,f4 = the d1 value of the first four peaks in
the third spectrum
EXAMPLES
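The three multi-coordinate fields dx, da and dc differ in how many coordinates must agree when two peaks are compared. The Python sketch below shows one possible reading and is not the CONTRAST implementation: the function names are hypothetical, peaks are plain tuples of coordinates, one tolerance is applied to every coordinate, and dc is interpreted as "every coordinate of the lower-dimensional peak finds its own partner in the other peak, in any order".

```python
from itertools import permutations

def within(a, tol, b):
    """a <tol> b"""
    return abs(a - b) <= tol

def match_dx(p, q, tol):
    """dx: at least one coordinate of p matches some coordinate of q."""
    return any(within(a, tol, b) for a in p for b in q)

def match_da(p, q, tol):
    """da: all dimensions must match, position by position."""
    return len(p) == len(q) and all(within(a, tol, b) for a, b in zip(p, q))

def match_dc(p, q, tol):
    """dc: some combination pairs off every coordinate of the shorter peak."""
    short, long_ = sorted((p, q), key=len)
    return any(all(within(a, tol, b) for a, b in zip(short, combo))
               for combo in permutations(long_, len(short)))

nh = (8.20, 120.0)              # made-up 2D peak
hnco = (120.01, 176.0, 8.21)    # made-up 3D peak, coordinates in a different order
any_coordinate = match_dx(nh, hnco, 0.02)
all_dimensions = match_da(nh, hnco, 0.02)
combinations = match_dc(nh, hnco, 0.02)
```

With these made-up peaks the dx and dc tests succeed, while the da test fails because the two peaks do not have the same number of dimensions.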
CAVEATS
RELATED COMMANDS
BUGS
INTA (Intersect All)
DESCRIPTION
For every peak in a spectrum, takes the intersection of two of the buffers
linked to that peak to form a third buffer. Only the top num1 and num2 peaks
of the two buffers are considered.
NOTE: The actual buffer names MUST be used in this routine. Also, all of the
peak groups must contain the same types of buffers with the same names. The
third buffer is added right after the second buffer.
SYNTAX
inta(choptr);
inta source, |buff1[,num1] |buff2[,num2] (Boolean) [|int] [-f -s]
OR
inta source, [num1, num2] ( FD|buff1[,peak] op FD|buff2[,peak] ) [|int] [-f -s]
where FD = field(s) {d1,dx,da,dc,d,c,#,w,i,v,g,n,t,r}
and op = operator(s) { <tol> >tol< == != >= > <= < && || }
and num1 and num2 = the number of peaks to consider for intersection
Flags:
-f delete first buffer
-s delete second buffer
EXAMPLES
inta 1, |NH,3 |HNCO,4 (dx|NH <.02> dx|HNCO) |INT
inta 1, 3 4 (dx|NH <.02> dx|HNCO) |INT