Spectrum Research, LLC.

 

 

 

 

 

CONTRAST

 

Connectivity Tracing Assignment Tools for Automated Assignment of Protein NMR Data

User Guide
Version 2.0

 

 

 

Copyright Notice

 

Copyright © 1996 through 2001 Spectrum Research, LLC.  All rights reserved.

No part of this document may be reproduced, transmitted, transcribed, stored in a retrieval system, or translated into any language in any form by any means without the written permission of Spectrum Research, LLC.  Spectrum Research, LLC. reserves the right to change the information in this document without prior notice.

 

Trademarks

 

Contrast is a trademark of Spectrum Research, LLC.

Acknowledgments

 

The Contrast software program was developed by Drs. John Markley and John Olson at the National Magnetic Resonance Facility located at the University of Wisconsin-Madison.  All rights, title, and interest in Contrast are owned by the Wisconsin Alumni Research Foundation ("WARF").  The commercial version of Contrast has been exclusively licensed to Spectrum Research, LLC by WARF.

 

Credits

 

If results (figures and/or data) obtained with the Contrast™ application are used in a publication, please acknowledge them in the following manner or in an equivalent form:

 

"Contrast™ software, developed by Spectrum Research, LLC., was used to compute the results in this publication."

 

 

 

 

Chapter 1

Introduction

1.1 Program Features

CONTRAST is a non-graphical software tool for automating NMR peak assignment. The program works with NMR data in the form of ASCII lists of peak coordinates and intensities. It provides the user with several versatile tools for manipulating peak lists in order to design a custom assignment strategy, and it can itself generate customizable procedures for automatic assignment of NMR data. It should be possible to use CONTRAST, and the strategies it was designed to employ, with any type of multidimensional NMR spectral data set (although not all combinations of NMR spectra are likely to yield complete assignments).

1.2 Disclaimer

The CONTRAST program was designed to be an in-house research tool and not a commercial package. We have successfully applied the program to many real and synthesized NMR data sets, but we are always careful to check all results. We provide no warranty or guarantee of its performance. Use the program at your own risk.

Chapter 2

Software Licensing and Installation

2.1 How to Obtain the Program

The CONTRAST executable can be downloaded from the Spectrum Research website (www.specres.com/download.asp) or a demo CD can be requested from Spectrum Research.

2.2 Installation

The CONTRAST executable, contrast.exe, needs no special installation. We recommend that the executable and help files (or corresponding symbolic links) be placed in the directory that contains the spectral data to be assigned.

If you have obtained source code for CONTRAST, the file "contrast.c" contains all of the functions and header information necessary to compile CONTRAST. The program was written on a Silicon Graphics Indigo workstation, but since all but a few minor functions are implemented using ANSI C, the program can be ported easily to other platforms by changing the system calls that are specific to the Silicon Graphics platform. To compile the program, copy contrast.c to the target directory and type:

 

cc -o contrast -g contrast.c -lm

 

at the operating system prompt. The ASCII text file, contrast.hlp, is a crude manual for the CONTRAST program. The manual is designed so that it can be easily searched while running CONTRAST with the CONTRAST "page" function, which is called by typing "ctrl-h" at a prompt or "h" at the command line. The contrast.hlp file should be located in the same directory as the CONTRAST executable in order to use this feature.


Chapter 3

Getting Started

This section introduces loading spectrum files, searching spectra, displaying the results of a search, writing the results of a search to a file, and quitting the CONTRAST program. A simple example is given to illustrate each point, and the use of both the command line interface and macro files is described. The following CONTRAST commands will be described.

lf cosy.con

scan cosy (d1 <.5> 8.0 && d2 > 4.0) |results

d

btf |results >search.cosy.con

q

3.1 Starting CONTRAST

To run CONTRAST, simply type the name of the CONTRAST executable at the system prompt (e.g. contrast.exe). The computer's display will be cleared, and after several lines of copyright information you will be asked for the name of the log (starting macro) file that you wish to run. If you want to run a session macro, type its file name at the prompt. If your log file name is "usr.log" (the standard session log file name), simply press Return at the prompt. The text that appears in angle brackets in a CONTRAST prompt is always the default value for the prompt. If you do not already have a session macro, type a new file name at the prompt. It is customary to use the suffix ".log" for session macros and ".mac" for subroutine or branching macros. After the name of the log file has been entered, the user is prompted by a '>' symbol for the next command.

3.2 Loading Peak Lists

The LoadFile command (abbreviated lf) is used to load peak list files into CONTRAST. CONTRAST peak list files are typically created from the name of the experiment with the '.con' suffix appended, but they can have any name. They must, however, adhere to the format outlined in Section @@. The LoadFile command can also be used to load the sequence of the protein, since the formats of the files are similar. The following line loads the file cosy.con into the program:

> lf cosy.con

3.3 Searching Peak Lists

The Scan command (abbreviated sc) is used to search peak lists. It is an extremely versatile command and will be described in more detail in section @@. In order to search for peaks in the COSY spectrum read into the program the user could type a command similar to the following:

> sc cosy (d1 <.5> 8.0 && d2 > 4.0) |results

In this example the COSY peak list is searched for peaks in which the first dimension of each peak (d1) is within a tolerance of 0.5 units (<.5>) of 8.0 and (&&) the second dimension of each peak (d2) is greater than (>) 4.0. The results of the search are placed in a buffer called |results. The units of the tolerances and peak coordinates depend on the units used in the input files. Since the coordinates are typically expressed in parts per million (ppm), we will assume that input files use ppm in the rest of the manual.
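To make the meaning of this search Boolean concrete, the following Python sketch applies the same test to a small, invented peak list. It is not CONTRAST code: the peak values and the function name matches_example are assumptions made for illustration, and we assume that "within a tolerance" includes the boundary.

```python
# Illustrative only: mimics the Boolean (d1 <.5> 8.0 && d2 > 4.0) in Python.
# Each hypothetical peak is (d1, d2, intensity); the values are invented.
peaks = [
    (8.20, 4.60, 1500.0),  # d1 within 0.5 of 8.0 and d2 > 4.0 -> kept
    (7.40, 4.90, 900.0),   # d1 is 0.6 away from 8.0           -> rejected
    (8.45, 3.10, 1200.0),  # d2 is not greater than 4.0        -> rejected
    (7.75, 4.05, 700.0),   # both conditions hold              -> kept
]


def matches_example(peak, target=8.0, tol=0.5, floor=4.0):
    """Return True when d1 is within tol of target and d2 exceeds floor."""
    d1, d2, _intensity = peak
    return abs(d1 - target) <= tol and d2 > floor


# Plays the role of the |results buffer in the example above.
results = [p for p in peaks if matches_example(p)]
print(results)
```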

3.4 Displaying Search Results

The display command (abbreviated 'd') is used to examine the contents of CONTRAST buffers. When a search is performed using the Scan command or one of several other related commands, the results of the search are placed in a named buffer which is added to the end of a master list of buffers. The buffers persist until the user deletes them or quits the program. Associated with each buffer is a number and the search Boolean that was used to create the buffer. Upon typing 'd' at the CONTRAST command line, the program enters a crude 'display' mode that has a unique set of subcommands for changing the way the buffers are displayed. These subcommands are executed as each character is typed. To exit display mode type 'q' at the display command line prompt. Section @@ gives more information on the different subcommands available within the display mode.

3.5 Writing Buffers to a File

The buffertofile command (abbreviated 'btf') is used to write the contents of a particular buffer to a file. In the following example:

> btf |results >search.cosy.con

the |results buffer is written to the file, search.cosy.con.

3.6 Quitting CONTRAST

There are two pathways for exiting CONTRAST. The quit command (abbreviated 'q') can be used to exit CONTRAST from the command line. If CONTRAST is not at the command line, the program can be exited by typing Ctrl-C to interrupt the action of the program followed by 'x' at the new prompt. Typing 'q' at this new prompt causes the program to resume the action that was interrupted by the Ctrl-C command.

3.7 CONTRAST Macros

Most of the commands that can be executed at the CONTRAST command line can also be executed from a CONTRAST macro. For our purposes a macro is an ASCII file that contains CONTRAST commands. When a macro is executed, CONTRAST interprets each non-whitespace line as if it were typed at the CONTRAST command line. Each line is executed serially until a quit command is reached, until the macro branches to another macro, or until the end of the file is reached. If the end of the file is reached the program returns to the CONTRAST command line and waits for user input. All text in a macro between two consecutive asterisks (**) and the next end-of-line marker is considered to be a comment and is ignored by the program.

The five commands just described can be typed into a file using a text editor and run as a CONTRAST macro. CONTRAST macros can be run in several different ways. Macro files can be specified at the UNIX command line when the program is started, using the '<' sign to redirect input into the program as follows:

contrast <user.macro

Alternatively, the name of the macro can be specified at the initial prompt by typing the name of the macro file and pressing Enter. Macros can also be launched from within other macros or from the CONTRAST command line using the execute command (abbreviated exe).

> exe user.macro

In this case control is transferred to user.macro until the end of the file is reached, at which time control is returned to the calling macro or to the initial command line. If the macro is terminated with a quit command, however, the CONTRAST program exits without returning to the calling procedure. The branch command can be used instead of the exe command in order to transfer control fully to the called macro.

> branch user.macro

Chapter 4

Input File Formats

CONTRAST input files use a free format in which blank lines are ignored and white space (any number and combination of spaces and/or tabs) is used to delimit fields. Comments can be inserted anywhere in an input file by prefacing the comment with double asterisks (**). All text following the double asterisks (up to the end of the line on which they appear) is considered to be part of the comment and is effectively ignored by CONTRAST. Most CONTRAST input files are either a form of a spectrum file or a macro file. In the next release of CONTRAST the user will be given the option of reading in spectrum files in a macro format, but an understanding of the spectrum file format is currently essential to using CONTRAST effectively.

4.1 CONTRAST Spectrum Files

A CONTRAST spectrum file consists of a header followed by a peak list. The header of a spectrum file should contain information about the spectrum. Since most of this information is the same for all instances of a particular type of spectrum, it is usually safer to copy and modify an existing header from a similar spectrum than to write a header from scratch. When copying a header from the spectrum file of the same kind of experiment, it is usually only necessary to modify the number of peaks, the tolerances, and the comments. The fields in a spectrum file must appear in the given order. Although comments and blank lines can appear anywhere in a spectrum file, it is good practice to settle upon and stick to one style in order to maximize readability and to minimize the possibility of making mistakes. As long as fields appear in the correct order, it does not matter whether they are arranged on different lines, all placed on the same line, or some combination of the two. As all combinations have not been rigorously tested, however, we recommend that a format similar to the one shown below be used. Bold print is used to show essential information which must be included in a spectrum file, normal print is used to show optional information, and italics are used to show those elements of optional fields that are even more optional. The following is the file format for an n-dimensional spectrum (with as many as C correlations) that contains i peaks.

 

4.2 Spectrum File Format

name

n i (qual)

comment = numCom

d1lab d1atm d1tol d1cor1 (prob1) d1cor2 (prob2) d1corC (probC)

d2lab d2atm d2tol d2cor1 (prob1) d2cor2 (prob2) d2corC (probC)

dnlab dnatm dntol dncor1 (prob1) dncor2 (prob2) dncorC (probC)

** comments

** comments

p1coord1 p1coord2 p1coord3 p1ntens * p1comment

p2coord1 p2coord2 p2coord3 p2ntens * p2comment

picoord1 picoord2 picoord3 pintens * picomment

4.3 Spectrum Field Definitions

name The name of the spectrum. The name of a CONTRAST spectrum file is generally the spectrum name with the '.con' suffix appended to it.

n The dimensionality of the spectrum.

i The number of peaks in the spectrum.

(qual) An estimate of the quality of the spectrum, expressed as a probability. A qual factor of 1.0 indicates that 100% of the expected peaks will be present in the spectrum and that very little noise (false peaks) is present. A qual factor of 0.9 indicates that 90% of the expected peaks are present.

comment = Text that indicates that the next field (numCom) is the number of characters the program should allocate for the comment associated with each peak. 'ment =' is italicized to indicate that only 'com' is needed to signal that the next field is numCom.

numCom The number of characters that the program should allocate for the comment associated with each peak.

d#lab The label of the #'th dimension of the peaks in the spectrum.

d#atm The resonance code (also called atom code) describing all of the atoms of the #'th dimension of the peaks in the spectrum. Since some dimensions of a spectrum often detect several different resonances, wild cards are frequently used in this field. A description of resonance codes is found in section @.@.

d#tol The default tolerance of the #'th dimension of the peaks in a spectrum. A tolerance is one-half of the resolution of that dimension.

d#cor## The resonance code (also called atom code) of the #'th dimension of the ##'th correlation in the spectrum. Correlations describe the types of peaks that one would expect to see in a spectrum. An HNCA spectrum, for example, contains an Hni,Nai,Cai correlation (amide proton, amide nitrogen, alpha carbon) and an Hni,Nai,Ca- correlation (amide proton, amide nitrogen, alpha carbon from previous residue). The last resonance code for a given dimension will be repeated if previous or subsequent dimensions contain more resonance codes. A description of resonance codes is found in section @.@.

(prob##) The estimated probability of seeing the previous correlation in the spectrum.

Note that only the last probability listed in a vertical column will be used to describe the ##'th correlation. Other probabilities are used only to make the file more readable.

** Comment markers. Comment markers indicate that the text that follows on that line is a comment and should be ignored by the program. Users are encouraged to use comments to document the origin of the spectrum files and each modification that the files undergo. Most CONTRAST functions that modify a spectrum or spectrum file will append a comment to the file that tells what was done to the file and the date it was done.

comments Any text that the user wants to include in the file.

p##coord# The #'th coordinate (frequency dimension) of the ##'th peak in the spectrum (usually in ppm units).

p##ntens The intensity of the ##'th peak in the spectrum.

* A special peak comment marker that causes the program to read in the comment and associate it with the peak that the comment follows. The 'comment = numCom' line described above is used to specify the maximum number of characters that can be stored in each peak comment.

p#comment The comment associated with the #'th peak of the spectrum.

4.4 Example Spectrum File

hnca

3 4 (90)

comment length = 30

H Hni .02 Hni

N Nai .1 Nai ** Don't need to repeat last resonance code

Ca Ca .1 Cai (90) Ca- (60)

** Created 9/9/99 from hnca.ppm file.

** Comments can be inserted at any point in the file after a

** double asterisk.

8.61 114.3 180.2 100073 * peak 1

9.12 122.4 178.2 20073 * peak 2

7.43 118.9 134.2 10034.5 * peak 3

8.74 110.3 181.2 67896 * peak 4
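As an illustration of the free format, the following Python sketch reads a single peak line of the kind shown in the example file. It is not part of CONTRAST: the function names strip_comment and parse_peak_line are hypothetical, and the sketch assumes that a double asterisk always starts a comment and that the first single asterisk on the line is the peak comment marker.

```python
def strip_comment(line):
    """Drop everything from a double asterisk (**) to the end of the line."""
    return line.split("**", 1)[0]


def parse_peak_line(line, ndim):
    """Split one peak line into coordinates, intensity, and peak comment."""
    body = strip_comment(line)
    # A single '*' separates the numeric fields from the peak comment.
    numeric, _marker, comment = body.partition("*")
    fields = numeric.split()  # any mix of spaces and tabs delimits fields
    if len(fields) < ndim + 1:
        raise ValueError("expected %d coordinates and an intensity" % ndim)
    coords = [float(f) for f in fields[:ndim]]
    intensity = float(fields[ndim])
    return coords, intensity, comment.strip()


# First peak line of the example hnca file (a three-dimensional spectrum).
coords, intensity, comment = parse_peak_line("8.61 114.3 180.2 100073 * peak 1", 3)
print(coords, intensity, comment)
```

A complete reader would also have to interpret the header fields in their required order; that part is omitted here.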

 

4.5 Resonance Codes

Resonance codes are special CONTRAST words that describe the type of atom that gives rise to an NMR signal. These codes are sometimes called atom codes since they specify an atom type or group of atom types. Resonance codes can contain a maximum of four characters, with each character describing a different aspect of an atom. If the character representing a particular aspect is omitted, CONTRAST assumes the most general case for that aspect. For example, the resonance code 'H' contains only the atom type specifier and thus includes all hydrogen atoms. The resonance code 'Hb' represents all beta protons in the protein, and the resonance code 'Hi' represents all protons on the current residue. In this release of CONTRAST all resonance codes refer to amino acids in a protein or peptide. At this time there is no simple way to refer to nucleic acids or other molecules. A list of the valid resonance code characters, grouped by the aspects that they describe, follows:

Atom Specifiers:

C Carbon atom.

N Nitrogen atom.

H Hydrogen atom.

O Oxygen atom.

P Phosphorus atom.

X Wildcard. Matches any atom type.

Q NULL. Can never match another atom type.

IntraResidue Position Specifiers:

a Alpha. Bonded to or at the alpha position in the residue.

b Beta. Bonded to or at the beta position in the residue.

g Gamma. Bonded to or at the gamma position in the residue.

d Delta. Bonded to or at the delta position in the residue.

e Epsilon. Bonded to or at the epsilon position in the residue.

f F. Bonded to or at the F position in the residue.

z Z. Bonded to or at the Z position in the residue.

k Backbone. All backbone atoms in the residue.

s Sidechain. All sidechain atoms in the residue.

r Ring. All ring atoms in the residue.

c Carbon. Bonded to a carbon atom in the residue.

h Hydrogen. Bonded to a hydrogen atom in the residue.

n Nitrogen. Bonded to a nitrogen atom in the residue.

o Oxygen. The carbonyl position or bonded to an oxygen atom in the residue.

x Wildcard. All positions within a residue.

InterResidue Position Specifiers:

- Within the previous residue.

i Within the current residue.

+ Within the next residue.

* Can be within any residue in the protein (often from NOE).

Atom number:

0 Matches all other single character atom numbers.

1-9 This single character number is used to distinguish between atoms at the same position. For example, two beta protons can be distinguished by referring to one as Hb2 and the other as Hb3.

4.6 Resonance Code Examples

Cai Matches alpha carbons within the current residue.

Hbi2 Matches the second beta proton within the current residue.

X Matches all atoms in the protein.

X- Matches all atoms in the previous residue.

Co- Matches the carbonyl carbon of the previous residue.

Nai Matches the amide nitrogen of the current residue.

Q Does not match any atom in the protein.

Cs+ Matches all carbon atoms in the side chain of the next residue.

Cxi Matches all carbon atoms in the current residue.

Hxi1 Matches all number 1 protons in the current residue.

Hxi0 Matches all protons in the current residue.

Hb*1 Matches all number 1 beta protons in the entire protein.

Hn* Matches all amide protons in the protein.
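The structure of a resonance code can be sketched in Python as follows. This is an illustration rather than the parser used by CONTRAST. The function name decode_resonance_code is hypothetical, and the sketch assumes that the aspects always appear in the order atom, position, residue, number (as they do in the examples above) and that the "most general case" for an omitted aspect is 'x' for position, '*' for residue, and '0' for atom number.

```python
ATOMS = set("CNHOPXQ")               # atom specifiers
POSITIONS = set("abgdefzksrchnox")   # position within the residue
RESIDUES = set("-i+*")               # which residue the atom belongs to
NUMBERS = set("0123456789")          # atom number


def decode_resonance_code(code):
    """Split a resonance code into its four aspects, filling in defaults."""
    if not 1 <= len(code) <= 4 or code[0] not in ATOMS:
        raise ValueError("not a valid resonance code: %r" % code)
    # Assumed defaults: the most general value for each omitted aspect.
    result = {"atom": code[0], "position": "x", "residue": "*", "number": "0"}
    rest = code[1:]
    for aspect, allowed in (("position", POSITIONS),
                            ("residue", RESIDUES),
                            ("number", NUMBERS)):
        if rest and rest[0] in allowed:
            result[aspect] = rest[0]
            rest = rest[1:]
    if rest:
        raise ValueError("unexpected characters in %r: %r" % (code, rest))
    return result


for example in ("Cai", "Hbi2", "X-", "Hn*", "Hi"):
    print(example, decode_resonance_code(example))
```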

4.7 Sequence Files

CONTRAST sequence files follow the same general format as spectrum files and are read into the program with the same command, LoadFile (abbreviated lf). Sequence files are one-dimensional spectrum files in which the name of the spectrum is 'sequence' and the "peak comments" are amino acid names. The next section shows a schematic of a sequence file. Bold print is used to show essential information which must be included in a sequence file, normal print is used to show optional information, and italics are used to show those elements of optional fields that are even more optional. The following is the file format for a sequence file for a protein that contains i amino acids in the sequence.

4.8 Sequence File Format

sequence

1 i

comment = lenAA

lab Q qual

** comments

** comments

1 prob1 * AAname1

2 prob2 * AAname2

i probi * AAnamei

4.9 Sequence Field Definitions

sequence Indicates that the file is a sequence file.

1 The dimensionality of the file. Sequence files can make use of more dimensions to associate sequence positions with additional numerical information.

i The number of residues in the sequence.

comment = Text that indicates that the next field (lenAA) is the number of characters the program should allocate for the amino acid names. 'ment =' is italicized to indicate that only 'com' is needed to signal that the next field is 'lenAA'.

lenAA The maximum number of characters used in residue names.

lab Label to be used to identify sequence position numbers.

Q 'Q' = NULL place holder.

qual Quality of sequence determination (usually 1.0).

** Comment markers. Comment markers indicate that the text that follows on that line is a comment and should be ignored by the program. Users are encouraged to use comments to document the origin of the sequence files and each modification that the files undergo. Most CONTRAST functions that modify a sequence or spectrum file will append a comment to the file that tells what was done to the file and the date it was done.

comments Any text that the user wants to include in the file.

1,2,...,i Sequence position numbers. If there is ambiguity about the type of residue at a sequence position, the sequence position number can be repeated at the end of the file with alternative residue types. The probability value for the sequence position should reflect this ambiguity.

prob# Probability that the #'th sequence position contains that residue type.

AAname# Name of the amino acid at the #'th sequence position. The name can be in any desired format as long as the format matches that used elsewhere in the program. One letter abbreviations, three letter abbreviations, and the entire names of the standard 20 amino acids are understood and interconverted by CONTRAST.

4.10 Example Sequence File

The following is the sequence file for a hexapeptide. The third residue of the sequence is ambiguous and is thought to be either a glutamate or a glutamine residue.

seq

1 6

# Q 0.9

** Hex1 hexapeptide sequence.

** 9/9/99 by Fred

1 1 * Ala

2 1 * V

3 .6 * Q

4 1 * A

5 1 * Serine

6 1 * t

3 .4 * E

** Note that the id of residue 3 is ambiguous.
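The way repeated position numbers encode ambiguity can be sketched in Python as shown below. This is only an illustration: the function name read_sequence_entries is hypothetical, and the sketch takes the shortcut of treating every line without a peak comment marker (*) as header text instead of interpreting the header fields.

```python
from collections import defaultdict


def read_sequence_entries(lines):
    """Group residue alternatives by sequence position."""
    alternatives = defaultdict(list)
    for raw in lines:
        body = raw.split("**", 1)[0]           # drop ** comments
        numeric, marker, name = body.partition("*")
        fields = numeric.split()
        if not marker or len(fields) != 2:
            continue                           # header, blank, or comment line
        position, prob = int(fields[0]), float(fields[1])
        alternatives[position].append((name.strip(), prob))
    return dict(alternatives)


example = """seq
1 6
# Q 0.9
** Hex1 hexapeptide sequence.
1 1 * Ala
2 1 * V
3 .6 * Q
4 1 * A
5 1 * Serine
6 1 * t
3 .4 * E
""".splitlines()

sequence = read_sequence_entries(example)
print(sequence[3])  # the two alternatives recorded for position 3
```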

4.11 Macro Files

Macro files are ASCII files that contain a list of valid CONTRAST commands. The format for CONTRAST macro files is open and very simple. The only general requirements are that lines must be fewer than 1000 characters long and that a line cannot contain more than one CONTRAST command. If a line contains more than one command, the second command is generally ignored without causing a problem, but sometimes it can interfere with the first command.

Each command has its own required format, but a few general rules apply to all CONTRAST commands:

1. Their first non-whitespace character must be the beginning of the command name. Leading whitespace is ignored.

2. Command names can be typed in as abbreviations, complete command names, or any partial command name in between (e.g. 'q', 'qu', 'quit', and 'quitcontrastnow' will all quit CONTRAST).

3. Command names are case independent (e.g. 'q' and 'Q' will both quit CONTRAST).

4. A command's fields are all delimited by whitespace (tabs and spaces).

5. The '->' marker can be used at the end of a line to indicate that the command is continued on the next line.

6. The '**' marker (comment marker) will cause the program to ignore the rest of the line.

7. All variables (marked by the '&' prefix) contained in a command are replaced by the values or text strings that they contain before the command is interpreted. Thus variables can be substituted for command names and/or command fields.
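Several of these rules (leading whitespace, case independence, the '->' continuation marker, and the '**' comment marker) can be sketched in Python as shown below. This is an illustration, not the CONTRAST interpreter. The function names logical_lines and command_name are hypothetical, and the sketch assumes that comments are removed before the continuation marker is checked and that continued lines are joined with a single space. Variable substitution (rule 7) is not modeled.

```python
def logical_lines(text):
    """Yield macro commands with comments removed and continuations joined."""
    pending = ""
    for raw in text.splitlines():
        line = raw.split("**", 1)[0].strip()     # rule 6, then rule 1
        if not line and not pending:
            continue                             # skip blank lines
        if line.endswith("->"):                  # rule 5: continued on next line
            pending += line[:-2].rstrip() + " "
            continue
        command = (pending + line).strip()
        pending = ""
        if command:
            yield command
    if pending.strip():
        yield pending.strip()


def command_name(command):
    """Return the command word in lower case (rule 3: case independent)."""
    return command.split()[0].lower()


macro = """
** load and search
   lf cosy.con            ** load the COSY peak list
sc cosy (d1 <.5> 8.0 && ->
   d2 > 4.0) |results
Q
"""

commands = list(logical_lines(macro))
for command in commands:
    print(command_name(command), "|", command)
```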

Chapter 5

Checking Input Files

CONTRAST input files should all be carefully checked before beginning a CONTRAST run. If the input spectra are not referenced correctly or if the peaks in the input spectra do not "line up", then this problem must be dealt with before proceeding with making assignments. The following macro provides a simple way to check the alignment of input spectra.

**Macro template for checking the alignment of i input spectra.

**NOTE: Make sure tolerances are conservative (large).

lf spec1.con ** Load input spectrum 1.

lf spec2.con ** Load input spectrum 2.

lf speci.con ** Load input spectrum i.

contrace 1, >contrace.mac ** Automatically build spin systems.

dtf >display.out ** Save internal buffers to file.

q ** Quit.

The Contrace function automatically finds the best way to correlate the input spectra. In this example it uses the first input spectrum as the starting point for searches. (The command "contrace 2, >contrace.mac" specifies that the second input spectrum be used as the starting point for searches.) The spectrum specified as the starting point is called the source spectrum. For the purposes of checking spectral correlation, the source spectrum should be the spectrum with the most reliable referencing that overlaps the most with the other spectra. If you are unsure which spectrum to designate as the source spectrum, don't specify a source (contrace >contrace.mac) and Contrace will determine a good source spectrum for you. The Contrace function and the macro it generates will be described in more detail in the next two sections.

The file ('display.out') created by running a macro similar to that shown above can be examined to determine if there are any problems with the input spectra. A simplified example of 'display.out' contents is shown below:

hnco Hn_N_hnca Hn_N_hncoca Hn_N_tocsy ... hnco Hn_N_hnca ...

----- --------- ----------- ---------- ----- ---------

peak1 peak18 peak100 ... peak2 peak149 ...

peak34 peak23 ... ...

peak190 ... ...

The buffers in the file are organized into repeating groups (fragments) based on the peaks of the source spectrum, which in this case is hnco. Each fragment starts with the source buffer and ends right before the next source buffer. The buffers following the source buffer are named with prefixes (representing the resonances that were used to search the spectra) that precede the name of the spectrum that was searched. The peaks found in each buffer are all the peaks that matched the given resonances within a specified tolerance. It is not unusual for several peaks to be missing in a spectrum and thus for several buffers to be empty, but if very few of a spectrum's buffers contain peaks that correlate well to the peak in the source buffer, then there is a problem: either the tolerances used are too small or there is a problem with the spectrum. Problems often arise from using the wrong magnitude or sign for the sweep width when referencing. If this is the case, the resonances near the center of that dimension's spectrum will often match, but the resonance frequencies towards the edges of the dimension will be off by a considerable amount.
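The grouping of buffers into fragments can be sketched in Python as follows. The function name split_into_fragments is hypothetical, the buffer names are taken from the simplified example above, and the sketch assumes that any buffers listed before the first source buffer are ignored.

```python
def split_into_fragments(buffer_names, source):
    """Start a new fragment at every buffer named after the source spectrum."""
    fragments = []
    for name in buffer_names:
        if name == source:
            fragments.append([name])       # a source buffer opens a fragment
        elif fragments:
            fragments[-1].append(name)     # other buffers join the open fragment
    return fragments


names = ["hnco", "Hn_N_hnca", "Hn_N_hncoca", "Hn_N_tocsy", "hnco", "Hn_N_hnca"]
fragments = split_into_fragments(names, "hnco")
for fragment in fragments:
    print(fragment)
```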

After major referencing problems have been corrected, attention should be given to choosing the best tolerances possible. Ideal tolerances are as small as possible, but not so small that legitimate correlations fall outside the tolerance range. It is helpful to subtract the correlated resonances for a large number of fragments in order to get a good feel for what tolerances should be used in the spectrum files. The sum of the tolerances for the two spectra under consideration should be larger than most of the differences. If the average difference is not close to zero, this could indicate another referencing problem. Referencing problems can be corrected using the operate function (section @.@) or the set function (section @.@), but it is not wise to use spectra to calculate assignments if there is an unknown problem with the referencing. There are also several commands in CONTRAST that calculate reference offsets automatically, the most reliable being the align function (section @.@). Until you are familiar with working with peak lists, however, we recommend that you use the macro described above.
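The bookkeeping described above can be done by hand or with a short script. The following Python sketch is one way to do it outside CONTRAST. The function name referencing_report and the sample shift pairs are invented for illustration. Each pair holds the same resonance as observed in two spectra.

```python
from statistics import mean


def referencing_report(pairs, tol_a, tol_b):
    """Summarize the differences between correlated resonances."""
    diffs = [a - b for a, b in pairs]
    combined = tol_a + tol_b    # sum of the tolerances of the two spectra
    within = sum(1 for d in diffs if abs(d) <= combined)
    return {
        "mean_offset": mean(diffs),                  # should be close to zero
        "max_abs_diff": max(abs(d) for d in diffs),
        "fraction_within": within / len(diffs),      # should be close to 1
        "combined_tolerance": combined,
    }


# Hypothetical amide proton shifts (ppm) from two spectra.
pairs = [(8.61, 8.60), (9.12, 9.13), (7.43, 7.42), (8.74, 8.74)]
report = referencing_report(pairs, tol_a=0.02, tol_b=0.02)
for key, value in report.items():
    print(key, round(value, 4))
```

A mean offset well away from zero suggests a referencing problem, while a low fraction within the combined tolerance suggests that the tolerances are too small.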

Chapter 6

Arithmetic Expressions and Booleans

Arithmetic expressions and Booleans must be able to access many different fields within the major data structures of the CONTRAST program. The sometimes combinatorial and sometimes synchronous nature of assignment algorithms adds to the complexity of the syntax of these expressions. This section first describes the system used for accessing CONTRAST's variables and data structures; next it describes CONTRAST arithmetic expressions; and finally it describes CONTRAST Boolean expressions.

6.1 Accessing CONTRAST lists

CONTRAST accesses three kinds of data which we will refer to as lists: spectra, buffers, and files. Spectra and buffers can be thought of as lists of peaks while files are lists of the lines of text that make up the file.

6.1.1 Spectrum Data Structures

 

A spectrum is a CONTRAST spectrum file that has been read into memory by the program. It consists of the header information, peak list, and any other information that becomes associated with the spectrum during the course of the CONTRAST session. Outside of arithmetic expressions and Booleans, spectra can be specified by name or by the cardinal number that corresponds to their position in the sequence of spectra read into CONTRAST. Within arithmetic expressions or Booleans, however, the name or number of the spectrum must be preceded by the spectrum symbol '$'. Examples are:

1 The first spectrum loaded.

$2 The second spectrum loaded.

cosy The spectrum named cosy.

hnca The spectrum named hnca.

Different fields within a spectrum are referred to by single character abbreviations preceding the spectrum symbol ('$'). If there are several fields of the same type (e.g. dimensions in a spectrum), a digit is appended to the abbreviation. The following is a partial list of the spectral fields that can be accessed using this method.

 

6.1.2 Fields of a Spectrum

di The coordinate of dimension i (where i = 1 to the number of dimensions)

i The intensity of a peak. (Note: d0 = i)

c The comment associated with a peak.

C The numeric value of the comment associated with a peak.

N Variable associated with the spectrum.

X Variable associated with the spectrum.

l The level (a variable) of the spectrum.

m The number of dimensions of a spectrum.

k The number of buffers associated with each peak.

w Current printed column width.

ti The tolerance for dimension i.

# The number of peaks.

The following examples show how different fields of a COSY spectrum (the third spectrum read into the CONTRAST program) are specified.

Examples

d1$cosy The frequency of the first dimension of a peak.

c$3 The comment associated with a peak.

l$cosy The level of the COSY spectrum.
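The structure of these field references can be sketched with a small Python parser. This is an illustration only: the function name parse_field_reference is hypothetical, the sketch assumes that spectrum names consist of letters, digits, and underscores, and it does not handle the peak specifier suffixes described in section 6.2.

```python
import re

# One field character, an optional index digit string, '$', then a name or number.
FIELD_REF = re.compile(r"^(?P<field>[A-Za-z#])(?P<index>\d*)\$(?P<spectrum>\w+)$")


def parse_field_reference(text):
    """Split a reference such as d1$cosy into field, index, and spectrum."""
    match = FIELD_REF.match(text)
    if match is None:
        raise ValueError("not a spectrum field reference: %r" % text)
    index = int(match.group("index")) if match.group("index") else None
    spectrum = match.group("spectrum")
    by_number = spectrum.isdigit()   # spectra can be named or numbered
    return {
        "field": match.group("field"),
        "index": index,
        "spectrum": int(spectrum) if by_number else spectrum,
        "by_number": by_number,
    }


for reference in ("d1$cosy", "c$3", "l$cosy"):
    print(reference, parse_field_reference(reference))
```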

 

6.1.3 Buffer Data Structures

Buffers are internal working lists which contain peaks and any information associated with those peaks. Peaks are generally added to buffers by performing searches of spectra or other buffers. Multiple buffers are stored in the program in a linear list. Buffers can be added to and deleted from the program's linear list of buffers, just as peaks can be added to and deleted from individual buffers. Peaks from multiple spectra can be added to a single buffer. The command line designation of a buffer is its name, or its position number in the list of buffers, preceded by the '|' symbol (e.g. |hncoBuff or |1). Buffer names should be alphanumeric, although # and @ can be used in special cases. Buffer names beginning with "|@" (e.g. |@hnca) must refer to buffers that are not linked to a particular peak in a source spectrum. In addition to all of the original information associated with it in the spectrum, each peak in a buffer can have the following fields (pieces of information) associated with it.

6.1.4 Fields of a Buffer

# Number. The number of peaks in the buffer.

v Value. The first coordinate that wasn't matched in the search.

t Tolerance. The tolerance of that value's dimension.

n N. Integer variable.

x X. Real variable.

r Repeats. The number of different instances of that value in the buffer within that value's tolerance.

c Comment. The text comment associated with the peak.

C Comment number. The numeric value of the comment associated with the peak.

di Dimension i. The frequency of dimension i.

D Deviation. Score between numDims*0.2 and numDims*1.2 that rates how close the peak is to the target(s), where numDims*1.2 is the value of the best deviation (closest match) and numDims*0.2 is the worst deviation value (on the edge of the tolerance ranges).

s Score. Used by several routines to determine the rank of the peaks.

l Level. General purpose progress and scoring variable for the peak.

w wLevel. General purpose progress and scoring variable for the whole buffer.

6.1.5 Files

ASCII files can be accessed directly by the CONTRAST program. File names are specified with the '>' prefix (e.g. >filename.txt). Fields in a file are delineated by white space (spaces and tabs). Each field in a line is considered a dimension of that line and uses the same 'di' convention used by spectra and buffers. For example, d3>filename.txt = "See" for the line "See Spot. See Spot run." CONTRAST uses the same conventions for specifying a line or range of lines in a file as it does for the peaks in a spectrum or buffer.
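The field convention for files can be modelled outside CONTRAST in a few lines. The following Python sketch is only an illustration of the rule stated above (fields separated by runs of spaces and tabs, numbered from 1); the helper name file_field is hypothetical and is not a CONTRAST function.

```python
def file_field(line: str, i: int) -> str:
    """Return field i (1-based) of a line, splitting on runs of white space."""
    fields = line.split()      # split() with no argument splits on spaces and tabs
    return fields[i - 1]       # 'd3' corresponds to the third field


line = "See Spot. See Spot run."
print(file_field(line, 3))     # See
```

Punctuation stays attached to its field, so the second field of the same line is "Spot." rather than "Spot".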

6.2 Designating Peaks or Lines in a List

Peaks or lines are specified by suffixes added to the field and list descriptors after a comma. Either a single peak (line) or a range of peaks (lines) can be referenced. If no peak or line is specified then the entire range is assumed. Boolean expressions will go through every peak or line in a range and evaluate the value of the expression automatically. The following is a list of peak specifiers.

 

6.2.1 Peak Specifiers

,i The i'th peak or line in a list.

,H The peak or line with the highest specified field value.

,L The peak or line with the lowest specified field value.

,b The first peak or line in a list.

,fi The first i peaks or lines in a list.

,e The last peak or line in a list.

,li The last i peaks or lines in a list.

,i-j The i'th peak or line through the j'th peak or line in a list.

,i+ The i'th peak or line through the last peak or line in the list.

 

6.2.2 Examples

i|fred,f4H the highest intensity of the first 4 peaks in buffer fred.

i$fred,f4 the intensity of the first four peaks in spectrum fred.

i|fred,l4 the intensity of the last four peaks in fred

d1>fred,4+ the first field of the fourth through the last lines in file fred.

i|fred,2-5 the intensity of the second through fifth (inclusive) peaks

v|fred,H the value of the highest valued peak in buffer fred

s|fred,L the lowest score in buffer fred.

C|fred,e the numeric part of the comment from the last peak in buffer fred.

c|fred,1 the comment text string from the first peak in buffer fred.

D|fred,b the deviation of the first peak in buffer fred.

#$fred the number of peaks in spectrum fred.

w$fred the column width of the spectrum fred.
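The positional specifiers in Section 6.2.1 can be read as rules that turn a suffix into a set of 1-based positions. The Python sketch below models that reading; it is not CONTRAST code. The function name select is hypothetical, the 'H' and 'L' specifiers are left out because they depend on field values rather than positions, and clamping a request to the length of the list is an assumption that the guide does not confirm.

```python
import re


def select(spec: str, n: int) -> list:
    """Return the 1-based positions named by a specifier for a list of n peaks."""
    if spec == "":                       # no specifier: the entire range
        return list(range(1, n + 1))
    if spec == "b":                      # first peak or line
        return [1] if n >= 1 else []
    if spec == "e":                      # last peak or line
        return [n] if n >= 1 else []
    m = re.fullmatch(r"f(\d+)", spec)    # first i peaks
    if m:
        return list(range(1, min(int(m.group(1)), n) + 1))
    m = re.fullmatch(r"l(\d+)", spec)    # last i peaks
    if m:
        k = min(int(m.group(1)), n)
        return list(range(n - k + 1, n + 1))
    m = re.fullmatch(r"(\d+)-(\d+)", spec)   # i'th through j'th, inclusive
    if m:
        return list(range(int(m.group(1)), min(int(m.group(2)), n) + 1))
    m = re.fullmatch(r"(\d+)\+", spec)   # i'th through the last
    if m:
        return list(range(int(m.group(1)), n + 1))
    m = re.fullmatch(r"\d+", spec)       # a single position
    if m:
        i = int(spec)
        return [i] if 1 <= i <= n else []
    raise ValueError("unsupported specifier: " + spec)


print(select("2-5", 10))   # [2, 3, 4, 5]
print(select("l4", 10))    # [7, 8, 9, 10]
```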

6.3 Arithmetic Expressions

CONTRAST arithmetic expressions are straightforward. They can appear in most CONTRAST expressions in which a variable or parameter is set to a discrete value. In Boolean expressions they can operate on sets and ranges of values as long as there is at most one variable in each term of the Boolean. If a range is specified for a simple arithmetic expression, the function always uses the highest value in the range for the calculation. CONTRAST arithmetic expressions use the standard order of mathematical operations, but the order can be controlled with parentheses. Nesting of parentheses is permitted. White space within an arithmetic expression is optional except in a few situations, namely that the '+' and '-' operators should be preceded by white space if they follow immediately after a list expression. A list of arithmetic and text string operators follows. The accompanying examples assume the following: #|hnca = 2, d1|cosy,1 = 8.5, and c|hnca,1 = "His23Ca2". Boolean operators are discussed in the next section.

 

6.3.1 Arithmetic Operators

+ Addition 4 + #|hnca = 6

- Subtraction d1|cosy,1 - 2 = 6.5

/ Division 10/4 = 2.5

* Multiplication #|hnca*d1|cosy,1 = 17

^ To the power of 4 ^ 3 = 64

% Modulus 5 % #|hnca = 1

sin Sine (in degrees) sin(90) = 1

cos Cosine (in degrees) cos(90) = 0

tan Tangent (in degrees) tan(180) = 0

log Logarithm base ten log(1) = 0

ln Natural logarithm ln(d1|cosy,1) = 2.14
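Note that the trigonometric functions take their arguments in degrees, whereas most programming languages expect radians. The Python sketch below reproduces the function examples above for comparison; the helper names are hypothetical and the value 8.5 is the d1|cosy,1 value assumed in Section 6.3.

```python
import math


def sin_deg(x):
    return math.sin(math.radians(x))


def cos_deg(x):
    return math.cos(math.radians(x))


def tan_deg(x):
    return math.tan(math.radians(x))


def log(x):            # logarithm base ten, as in the table above
    return math.log10(x)


def ln(x):             # natural logarithm
    return math.log(x)


d1_cosy_1 = 8.5
print(round(ln(d1_cosy_1), 2))    # 2.14
# Floating-point results for cos(90) and tan(180) are tiny non-zero values,
# so they are compared against zero with a tolerance instead of printed.
print(abs(cos_deg(90)) < 1e-12, abs(tan_deg(180)) < 1e-12)
```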

 

6.3.2 Text Operators

 

vali(text) The ith numeric part of text. val2("fr2ed4.1") = 4.1

+ Union "fred" + "ted" = "fredted"

^ Intersection "fred" ^ "ted" = "ed"

- Delete Intersection "fred" - "ted" = "fr"

* Number of Intersections "freded" * "ed" = 2

/ Remove Characters "fred" / "det" = "fr"

% Remove all but characters "fred" % "det" = "ed"

6.3.3 Example Arithmetic Expressions

(#|hnca*(d1|cosy,1 + .5))+2 = 20

val2(c|hnca,1) * 10 = 20

C|hnca,1 - 3 = 20

10 * (c|hnca,1 * "2") = 20

val1(c|hnca,1 - "2") = 3

cos( val1(c|hnca,1/"ABC")-52) = -1
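The text operators of Section 6.3.2 and the text examples above can be checked against a small model. The Python sketch below is one reading of the tables, not CONTRAST's implementation, and every function name in it is hypothetical. It makes three assumptions: "intersection" is taken to be the longest common substring, the character operators '/' and '%' are treated as case-insensitive (this is inferred from the last example, which gives -1 only if both "C" and "a" are removed), and the C field is taken to be the first numeric part of the comment.

```python
import math
import re


def val(text, i):
    """vali(text): the i'th (1-based) numeric part of text."""
    return float(re.findall(r"\d+(?:\.\d+)?", text)[i - 1])


def intersection(a, b):
    """a ^ b: longest piece of b that also occurs in a (assumed meaning)."""
    for length in range(len(b), 0, -1):
        for start in range(len(b) - length + 1):
            piece = b[start:start + length]
            if piece in a:
                return piece
    return ""


def delete_intersection(a, b):
    """a - b: remove every occurrence of the intersection from a."""
    common = intersection(a, b)
    return a.replace(common, "") if common else a


def count_intersections(a, b):
    """a * b: number of times the intersection occurs in a."""
    common = intersection(a, b)
    return a.count(common) if common else 0


def remove_chars(a, chars):
    """a / chars: drop the listed characters (case-insensitive by assumption)."""
    drop = set(chars.lower())
    return "".join(ch for ch in a if ch.lower() not in drop)


def keep_chars(a, chars):
    """a % chars: keep only the listed characters."""
    keep = set(chars.lower())
    return "".join(ch for ch in a if ch.lower() in keep)


comment = "His23Ca2"                                   # assumed c|hnca,1
print(val(comment, 2) * 10)                            # 20.0
print(10 * count_intersections(comment, "2"))          # 20
print(val(delete_intersection(comment, "2"), 1))       # 3.0
angle = val(remove_chars(comment, "ABC"), 1) - 52      # 232 - 52 = 180
print(round(math.cos(math.radians(angle)), 6))         # -1.0
```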

 

6.4 Boolean Expressions

Booleans are expressions that reduce to 1 (meaning true) or 0 (meaning false). Many different CONTRAST functions use Boolean expressions to determine whether or not the function will be executed for a particular value, peak, or line. CONTRAST uses a versatile Boolean format that allows sets, ranges, "boxes", and variables to be coded into an expression so that one expression can be evaluated for many different arrangements of data.

Boolean expressions are always marked by enclosure in parentheses (). If a command contains both a Boolean expression and a separate mathematical expression that uses parentheses, the Boolean expression must be listed first. In the following example the Boolean is "(d1|hnca>3)".

set level |hnca (d1|hnca>3) += (47 / i|hnca)

The Boolean in the preceding example is straightforward. The level of each peak in the hnca buffer whose d1 value is greater than 3 is incremented by 47 divided by the intensity of that peak. Since no specific peak in the hnca buffer is specified, the Boolean is evaluated for each peak in the buffer. The levels of only those peaks for which the Boolean evaluates to 'true' are incremented.

CONTRAST Booleans can combine an unlimited number of expressions by using the conjunctions '||' (or) and '&&' (and). For instance, the following command uses a Boolean composed of three parts.

set level |hnca ( l|hnca = 2 || (d1|hnca>3 && d2|hnca <= 9) ) += (47 / i|hnca)

In this Boolean the level of an HNCA peak will be incremented if the peak's level is currently equal to 2 or ('||') if the d1 value of the peak is greater than 3 and ('&&') the d2 value of the peak is less than or equal to 9. Note that expressions must be combined with conjunctions. Expressions such as " x > y > z " are not permitted in CONTRAST. Note also that some CONTRAST functions have not yet been implemented with "short-circuit logic". Short circuit logic allows the program to skip evaluating the rest of a Boolean when the expression is guaranteed to evaluate to true or false. In the above example if the level of an HNCA peak is equal to 2, then the full Boolean is guaranteed to evaluate to true so the program does not need to continue by testing the d1 and d2 values of the peak. Since several functions including the set function do not use short-circuit logic, we recommend that the user avoid writing Booleans that rely on this feature.

CONTRAST Booleans often compare values from different lists. These comparisons can be made synchronously or combinatorially. The preceding example used a synchronous mechanism for making comparisons. It was understood that each reference to the hnca buffer in the Boolean referred to the same peak. The following Boolean also uses a synchronous mechanism, but this time it is not so obvious.

set level |fred (d1|fred,f5 > 3 && d1|tom,f5 <= 8) += 2

In this example when the first peak of buffer fred is being compared to 3, the first peak of buffer tom is being compared to 8, then the second peaks in each buffer are compared, the third, and so on. The above expression is equivalent to the following 5 expressions.

set level |fred (d1|fred,1 > 3 && d1|tom,1 <= 8) += 2

set level |fred (d1|fred,2 > 3 && d1|tom,2 <= 8) += 2

set level |fred (d1|fred,3 > 3 && d1|tom,3 <= 8) += 2

set level |fred (d1|fred,4 > 3 && d1|tom,4 <= 8) += 2

set level |fred (d1|fred,5 > 3 && d1|tom,5 <= 8) += 2

Synchronous expressions are signaled by using double conjunctions or operators. If a single '&' symbol had been used, a combinatorial comparison would have been performed. The following is an example of the use of a combinatorial conjunction.

set level |fred (d1|fred,f2 > 0 & d1|tom,f3 <= 8) += 10

 

In this example each of the first 2 peaks in fred is compared to zero once for each of the first three peaks in tom. In this case the level of one of the peaks in fred can be incremented by as much as 30 (3 * 10). This expression is equivalent to the following 6 commands.

set level |fred (d1|fred,1 > 0 & d1|tom,1 <= 8) += 10

set level |fred (d1|fred,1 > 0 & d1|tom,2 <= 8) += 10

set level |fred (d1|fred,1 > 0 & d1|tom,3 <= 8) += 10

set level |fred (d1|fred,2 > 0 & d1|tom,1 <= 8) += 10

set level |fred (d1|fred,2 > 0 & d1|tom,2 <= 8) += 10

set level |fred (d1|fred,2 > 0 & d1|tom,3 <= 8) += 10
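The difference between the two pairing rules can be modelled with ordinary list operations: synchronous pairing walks the two lists in step, while combinatorial pairing visits every pair. The Python sketch below applies the two example Booleans above to made-up d1 values; the function names and the data are hypothetical, and the sketch only models the pairing, not the set command itself.

```python
from itertools import product


def synchronous_increments(fred_d1, tom_d1, step=2):
    """Model of (d1|fred > 3 && d1|tom <= 8): peak k of fred pairs with peak k of tom."""
    gains = [0] * len(fred_d1)
    for k, (f, t) in enumerate(zip(fred_d1, tom_d1)):
        if f > 3 and t <= 8:
            gains[k] += step
    return gains


def combinatorial_increments(fred_d1, tom_d1, step=10):
    """Model of (d1|fred > 0 & d1|tom <= 8): every fred peak pairs with every tom peak."""
    gains = [0] * len(fred_d1)
    for (k, f), t in product(enumerate(fred_d1), tom_d1):
        if f > 0 and t <= 8:
            gains[k] += step
    return gains


fred = [4.2, 2.9, 7.5, 8.8, 3.5]   # hypothetical d1 values of the first five peaks
tom = [7.9, 6.0, 8.4, 8.0, 9.1]

print(synchronous_increments(fred, tom))            # [2, 0, 0, 2, 0]
print(combinatorial_increments(fred[:2], tom[:3]))  # [20, 20]
```

In the combinatorial case each fred peak is tested three times, once per tom peak, which is why a single level can grow by up to three times the increment.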

All fields in a Boolean from the same list are automatically synchronized even if combinatorial operators are used. The following is an example of a case in which fields are synchronized even though a combinatorial conjunction ('&') is specified.

set level |fred (d1|fred,f2 > 0 & d2|fred,f2 <= d1|tom,f3) += 10

This expression is equivalent to the following 6 commands.

set level |fred (d1|fred,1 > 0 & d2|fred,1 <= d1|tom,1) += 10

set level |fred (d1|fred,1 > 0 & d2|fred,1 <= d1|tom,2) += 10

set level |fred (d1|fred,1 > 0 & d2|fred,1 <= d1|tom,3) += 10

set level |fred (d1|fred,2 > 0 & d2|fred,2 <= d1|tom,1) += 10

set level |fred (d1|fred,2 > 0 & d2|fred,2 <= d1|tom,2) += 10

set level |fred (d1|fred,2 > 0 & d2|fred,2 <= d1|tom,3) += 10

Note that the two fields of |fred are synchronized, but that the |fred and |tom lists are compared combinatorially. In order to synchronize d2|fred,f2 and d1|tom,f3 we must use the "synchronous less than or equal to" operator ("<<=" or "<<=="). Doubling Boolean operator symbols makes the two operands of the operator synchronous, just as doubling Boolean conjunctions makes the left and right hand sides of the expression synchronous. In the following example a synchronous operator is used to synchronize d2|fred,f2 and d1|tom,f3 in the expressions above.

set level |fred (d1|fred,f2 > 0 & d2|fred,f2 <<= d1|tom,f3) += 10

 

This expression is equivalent to the following two expressions.

set level |fred (d1|fred,1 > 0 & d2|fred,1 <= d1|tom,1) += 10

set level |fred (d1|fred,2 > 0 & d2|fred,2 <= d1|tom,2) += 10

Note that the third peak in |tom is never used since the first two peaks in |fred were specified and |tom was synchronized to |fred.

The default synchronization behavior of fields in a Boolean can be over-ridden by appending "i" suffixes to the field descriptions. The following is an example of the use of such suffixes.

set lev |fred (d1|fred,f2 > d1|tom,f2 && d2|fred,f2i2 = d2|tom,f3i1 && i|fred,f2i > 0) += 1

The following expressions are equivalent to the expression above.

set lev |fred (d1|fred,1 > d1|tom,1 && d2|fred,1 = d2|tom,1 && i|fred,1 > 0) += 1

set lev |fred (d1|fred,1 > d1|tom,1 && d2|fred,1 = d2|tom,1 && i|fred,2 > 0) += 1

set lev |fred (d1|fred,1 > d1|tom,2 && d2|fred,2 = d2|tom,1 && i|fred,1 > 0) += 1

set lev |fred (d1|fred,1 > d1|tom,2 && d2|fred,2 = d2|tom,1 && i|fred,2 > 0) += 1

set lev |fred (d1|fred,2 > d1|tom,1 && d2|fred,1 = d2|tom,2 && i|fred,1 > 0) += 1

set lev |fred (d1|fred,2 > d1|tom,1 && d2|fred,1 = d2|tom,2 && i|fred,2 > 0) += 1

set lev |fred (d1|fred,2 > d1|tom,2 && d2|fred,2 = d2|tom,2 && i|fred,1 > 0) += 1

set lev |fred (d1|fred,2 > d1|tom,2 && d2|fred,2 = d2|tom,2 && i|fred,2 > 0) += 1

If all of the terms that contain field descriptions in a Boolean are numbered from n = 1 to N, then the number n is used after an 'i' suffix to specify the field description that the expression is synchronized to. If no n value is specified after an 'i' suffix, then the containing expression is made independent (a combinatorial operation). In the above example the third field "d2|fred,f2i2" is synchronized to the second field "d1|tom,f2" and the fourth field "d2|tom,f3i1" is synchronized to the first field "d1|fred,f2". The last field is independent. If the 'i' suffix had not been added to the field description, then the last field would have been synchronized to the first field since they make reference to the same list.
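The eight equivalent expressions listed above can be generated mechanically by giving each field description a driving index and letting fields that share an index move together. The Python sketch below is a model of that bookkeeping only; the group letters, the variable names, and the function expand are hypothetical and do not correspond to anything in CONTRAST.

```python
from itertools import product

# One entry per field description in the Boolean, in order of appearance.
# Fields that share a group letter are synchronized with one another.
fields = [
    ("d1|fred", "a"),   # field 1: first two peaks of fred
    ("d1|tom", "b"),    # field 2: first two peaks of tom
    ("d2|fred", "b"),   # field 3: suffix i2, so it follows field 2
    ("d2|tom", "a"),    # field 4: suffix i1, so it follows field 1
    ("i|fred", "c"),    # field 5: bare i suffix, so it is independent
]
group_sizes = {"a": 2, "b": 2, "c": 2}   # each driving index runs over two peaks


def expand(fields, group_sizes):
    """Return one row of 'field,position' strings per combination of indices."""
    groups = list(group_sizes)
    rows = []
    for combo in product(*(range(1, group_sizes[g] + 1) for g in groups)):
        index = dict(zip(groups, combo))
        rows.append([name + "," + str(index[g]) for name, g in fields])
    return rows


rows = expand(fields, group_sizes)
print(len(rows))    # 8
print(rows[2])      # ['d1|fred,1', 'd1|tom,2', 'd2|fred,2', 'd2|tom,1', 'i|fred,1']
```

Three independent indices of two positions each give 2 x 2 x 2 = 8 combinations, in the same order as the expanded commands above.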

6.4.1 Boolean operators and conjunctions

> combinatorial "greater than"

>= combinatorial "greater than or equal to"

< combinatorial "less than"

<= combinatorial "less than or equal to"

= combinatorial "equals"

!= combinatorial "not equal"

<> combinatorial "within a tolerance of "

>< combinatorial "outside a tolerance of "

& combinatorial "and"

| combinatorial "or"

>> synchronous "greater than"

>>= synchronous "greater than or equal to"

<< synchronous "less than"

<<= synchronous "less than or equal to"

== synchronous "equals"

!!= synchronous "not equal"

<<>> synchronous "within a tolerance of "

>><< synchronous "outside a tolerance of "

&& synchronous "and"

|| synchronous "or"

Tolerance operators contain a tolerance value embedded in the operator. This value can take the form of a constant, a variable, a field, a range, a set, or a box, just like normal Boolean operands. (Sets and boxes will be described in a subsequent section.) If a field description is used as a tolerance, it is good practice to specify synchrony directly using the 'i' suffix unless the field makes reference to a list referenced elsewhere in the Boolean. The following is an example of an expression that uses tolerances.

set lev |fred (d1|fred,f2 <.02> d1|tom,f3 && d2|fred >t|Hai,1< d2|tom) += 1
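The two tolerance operators can be read as tests on the absolute difference of their operands. The Python sketch below models that reading with hypothetical function names; whether a difference exactly equal to the tolerance counts as "within" is an assumption that the guide does not settle.

```python
def within(a, b, tol):
    """Model of 'a <tol> b': a lies within tol of b (boundary assumed inclusive)."""
    return abs(a - b) <= tol


def outside(a, b, tol):
    """Model of 'a >tol< b': a lies more than tol away from b."""
    return abs(a - b) > tol


print(within(8.50, 8.51, 0.02))    # True
print(outside(8.50, 8.60, 0.02))   # True
```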

 

Boolean expressions can contain mathematical expressions as well as field descriptions and constants. The only limitation is that no term in the expression can contain more than 1 range, set, or box. The following is an example of a Boolean expression in which arithmetic expressions occur.

set lev |fred (cos(d1|fred,f2*2)+8 <.02> 8.2 && val2(C|fred)+6 >t|Hai,1/2< d2|tom) += 1

 

If the Boolean of a command is preceded by a NOT symbol '!', then the set of peaks or lines for which the Boolean does not evaluate to true is operated on by the command. In this special case the NOT symbol '!' performs a complementarity operation rather than the negation operation that it typically performs. For example in the command

 

set level |hnca !(d1|hnca>3) += 10

the level of each peak in |hnca that has a d1 value less than or equal to 3 is incremented by 10.


Chapter 7

An Adaptable Fully Automated Assignment Macro

This section contains an overview of the simplest and most automated assignment procedure available in CONTRAST. The procedure is implemented as a simple 6 part macro that can be used for most data sets with minimal modification. The performance of the algorithm is highly dependent on the type and quality of the data. The program always makes all possible assignments given the input data set, even when the data is insufficient to make an assignment. Therefore the output produced by the procedure should always be carefully checked and the evidence for every assignment should be examined and evaluated.

Figure 7.1 is an information flow diagram of the main steps in the fully-automated assignment procedure. The main body of the assignment program consists of three functions which generate CONTRAST macros for the user (Contrace, Reside, and Overlap) and a single function (AnnBF) that generates sequential assignments based on the output of the previous three functions. Arrows in the diagram represent the flow of information from one function to another.


Figure 7.1
 

 

The fully-automated approach to assignments is illustrated using sample macros written for two very different data sets. The first macro is written for a 2D homonuclear data set consisting of three experiments: COSY, TOCSY, and NOESY.

7.1 Fully-Automated 2D Macro

lf cosy.con

lf tocsy.con

lf noesy.con

lf seq.con

exe shifts.mac

contrace >contrace.mac -n -F

overlap 5 >overlap.mac

annbf 5, -l -x3

stf 5 >output.file

The next macro is written for a 3D heteronuclear data set consisting of 9 experiments: HNCO, HNCA, HN(CO)CA, HN(CO)CACB, HNCACB, HCACO, HN-TOCSY-HMQC, HCCH-COSY, and HCCH-TOCSY.

7.2 Fully-automated 3D Macro

lf hnco.con

lf hnca.con

lf hncoca.con

lf hncocacb.con

lf hncacb.con

lf hcaco.con

lf hntocsy.con

lf hcchcosy.con

lf hcchtocsy.con

lf seq.con

exe shifts.mac

contrace 1, >contrace.mac -n -F

overlap 1 >overlap.mac

annbf 1, -l -x3

stf 1 >output.file

A comparison of the two macros shows that the main difference between them is the input data. The first step in both macros is to load the data into the program. The first three lines in the 2D macro and the first 9 lines in the 3D macro simply read the peak lists into the program, and the next line reads in the protein sequence. This step has already been described in Section @4.

In the next step a macro is executed which contains a database of the characteristic chemical shifts of the common amino acids. This database is experiment independent and should contain as much information as possible about the distribution of chemical shifts. The chemical shift database is described in Section 8.

The next step is the heart of the CONTRAST automated assignment procedure. The Contrace command generates a strategy for assembling spin systems using data from the input spectra and the chemical shift database. The strategy generated by the Contrace routine is output as a CONTRAST macro (in the cases above named contrace.mac). The function implements the strategy as it is being generated. The result of the function is a list of buffers that contain the modified results of searches and other manipulations of the data. These buffers are grouped into fragments that roughly correspond to amino acid spin systems.

The starting point for each fragment is a peak from a "source" spectrum. There is a one to one correspondence between the peaks of the source spectrum and fragments. The ideal source spectrum meets all of the following criteria:

1) The source spectrum is of high resolution and is well-referenced.

2) The source spectrum is very complete -- very few peaks are missing.

3) The source spectrum can be correlated to peaks from the other spectra.

4) The source spectrum contains one correlation (peak) per residue.

5) The source spectrum is relatively noise free; there are very few extra peaks.

 

These criteria should be taken as ideals that guide the choice of a source spectrum. They are listed in order of decreasing importance.

In the 2D macro above the selection of the source spectrum was left to the Contrace function. In this case the function generally constructs a spectrum from the Hn,Ha or fingerprint region using peaks from the COSY and TOCSY spectra. This spectrum is added to the list of spectra and becomes spectrum 5 (the sequence is treated as if it were a spectrum). The references to "5" in the subsequent commands all refer to the newly created source spectrum. In the 3D macro, on the other hand, the HNCO spectrum (spectrum 1) is specified to the Contrace function as the source spectrum. If it had not been specified, a new source would have been constructed from either the HNCOCA or HNCO spectra, and any missing peaks would have been filled in by the other spectra.
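Because the two macros differ only in their input files and in the source spectrum number, they can be produced from a single template. The Python sketch below writes out the lines of such a macro using only the commands shown in Sections 7.1 and 7.2. The function build_macro is hypothetical, and the rule that a constructed source spectrum is numbered after the peak lists and the sequence is inferred from the 2D example alone.

```python
def build_macro(peak_lists, source=None):
    """Return the lines of a minimal fully automated macro.

    peak_lists: peak-list file names, in the order they are loaded.
    source: number of a loaded spectrum to use as the source, or None to
            let Contrace construct a new source spectrum.
    """
    lines = ["lf " + name for name in peak_lists]
    lines.append("lf seq.con")          # the sequence counts as a spectrum
    lines.append("exe shifts.mac")      # chemical shift database
    if source is None:
        # Assumed numbering: the new source follows the peak lists and the sequence.
        number = len(peak_lists) + 2
        lines.append("contrace >contrace.mac -n -F")
    else:
        number = source
        lines.append("contrace %d, >contrace.mac -n -F" % source)
    lines.append("overlap %d >overlap.mac" % number)
    lines.append("annbf %d, -l -x3" % number)
    lines.append("stf %d >output.file" % number)
    return lines


print("\n".join(build_macro(["cosy.con", "tocsy.con", "noesy.con"])))
```

Calling build_macro with the nine 3D file names and source=1 gives the command lines of the macro in Section 7.2.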

Each fragment starts off with the peak from the source spectrum which yields the first 2 (in the case of a 2D source) or 3 (in the case of a 3D source) assignments. A series of search and filter steps creates additional buffers (lists of peaks) within a fragment. These buffers are called working buffers, because they are used to build assignment buffers which are special buffers named for the resonance assignment that they contain. One of the chemical shift dimensions of the first peak in the assignment buffer is the actual frequency assignment for the resonance.

The Contrace function stops when there is an assignment buffer for each resonance mentioned in the correlation lists of the input spectra. Generally there is not enough information in the spectra to correctly assign all the resonances, and usually the assignments of the last assignment buffers are the most uncertain. Spin systems are usually assigned all the way out to the epsilon position for every residue in the protein. The fragments can be considered to be "fuzzy" since they contain alternate assignments, and since no hard-and-fast endpoint decisions are made at this point in the analysis.

The next step of the macros is the "overlap" step. The Overlap function generates what are known as overlap tests, which will be used in the sequential assignment step to score the likelihood that two fragments are derived from sequential residues. These overlap tests are generally very simple. They consist of commands that award points when resonances from overlapping assignment buffers are within a specific tolerance of one another. Overlapping assignment buffers are assignment buffers from two different fragments that are expected to contain the same resonance. For example, the "previous Ca buffer" generated from a peak in the HN(CO)CA spectrum should contain the same Ca resonance as the "Ca buffer" generated from a peak in the HNCA spectrum from the previous residue in the sequence. When NOESY spectra are used to score for sequential fragments, working buffers containing NOESY peaks are used in addition to assignment buffers in making overlap tests.
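The idea behind an overlap test can be illustrated with a small model: points are awarded when a candidate shift in one fragment's buffer lies within a tolerance of a candidate in the other fragment's buffer. The Python sketch below is not the output of the Overlap function; the function name, the point value, and the chemical shifts are all hypothetical.

```python
def overlap_score(prev_ca, ca, tolerance, points=1.0):
    """Award points if any shift in prev_ca matches any shift in ca within tolerance.

    prev_ca: candidate "previous Ca" shifts from one fragment (ppm).
    ca:      candidate "Ca" shifts from a possible preceding fragment (ppm).
    """
    for a in prev_ca:
        for b in ca:
            if abs(a - b) <= tolerance:
                return points
    return 0.0


print(overlap_score([58.31, 55.02], [58.25], tolerance=0.1))   # 1.0
print(overlap_score([58.31], [61.00], tolerance=0.1))          # 0.0
```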

 

The next step of the automated assignment macros is the shuffling step, in which the fragments created by the Contrace function are shuffled into the correct sequential order using the sequence of the protein. In this example the annbf (best first simulated annealing) algorithm is used to shuffle the fragments. This function uses the overlap tests generated by Overlap to place fragments in the correct order, and it uses the chemical shift database to match fragments to the correct positions in the sequence. The shuffling routine can also use other tests for matching fragments to sequence positions. These tests can be written by hand or automatically generated by the Reside function. In this simple case we do not illustrate the use of such tests, but they are often very helpful in identifying the amino acid type of a fragment.

The last step in the automatic assignment process is to write the output of the program to a file. The function stf (shuffle to file) writes the contents of all of the buffers that make up the fragments into the file "output.file". The fragments are written in the sequential order determined by the shuffling routine and are labeled with the name of the residue and the sequence position of the corresponding amino acid in the protein. Alternate orderings and ambiguity factors are indicated. The output file format will be discussed in more detail in a later section.

The assignment macros shown above are the bare minimum necessary for automated assignment. The commands shown above are usually supplemented with other functions that provide additional scaling information, amino acid type tests, and error checking routines. More complete macros are distributed with the CONTRAST executable. These macros have been annotated to document the use of the "extra" functions.


Chapter 8

Chemical Shift Database

8.1 Set Shift Format

The CONTRAST chemical shift database is a series of CONTRAST set shift commands that is read into the program as a CONTRAST macro. The set shift command allows the user to set the amino acid type, atom (resonance) type, chemical shift range and probability value for that range. The format for the command is as follows:

set shift AAname Resonance LoChemShift [-] HiChemShift [Prob]

AAname The name or abbreviation of the amino acid or amino acid group for which the

chemical shift information holds. The name should correspond to the name used in

the sequence.

Resonance The resonance code of the atom to which the chemical shift information applies.

LoChemShift The lower bound of the chemical shift range.

HiChemShift The upper bound of the chemical shift range.

Prob A probability value between 0.0 and 1.0

Set Shift Examples

The following group of set shift commands is an example of a typical entry for the alpha carbon of alanine residues.

set shift A Ca 48-54

set shift A Ca 48-50 0.1

set shift A Ca 50-52 0.6

set shift A Ca 52-54 0.3

This example highlights several important points. In the first line the entire range of allowed chemical shifts is given without a probability value and the next three lines break up that chemical shift range into smaller subranges that contain probability values for each subrange. This allows CONTRAST to use the chemical range information in two different ways. When probability values are given, CONTRAST uses the subranges to automatically calculate probability-based amino acid type scores during the sequential assignment step. Both the Contrace and Reside functions use full ranges that do not include probability values to perform connectivity tracing and amino acid test generation respectively. If all set shift commands contain probability values, then Contrace will not use chemical shift ranges to trace spin systems and Reside will not generate amino acid tests. If none of the set shift commands contain probability values then probability-based amino acid type scoring will not be performed.
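The distinction between full ranges and probability subranges is easy to check before a database is loaded. The Python sketch below parses set shift lines of the form given in Section 8.1 and separates the two kinds of entry. It is an external helper, not part of CONTRAST: the names ShiftRange and parse_set_shift are hypothetical, and the pattern assumes non-negative chemical shifts and does not handle NULL ranges.

```python
import re
from typing import NamedTuple, Optional


class ShiftRange(NamedTuple):
    aa: str
    resonance: str
    lo: float
    hi: float
    prob: Optional[float]     # None for a full range without a probability


NUM = r"\d*\.?\d+"
PATTERN = re.compile(
    r"set\s+shift\s+(\S+)\s+(\S+)\s+"                 # AAname and Resonance
    r"(" + NUM + r")(?:\s*-\s*|\s+)(" + NUM + r")"    # Lo [-] Hi
    r"(?:\s+(" + NUM + r"))?"                         # optional Prob
)


def parse_set_shift(line):
    m = PATTERN.fullmatch(line.strip())
    if m is None:
        raise ValueError("not a recognised set shift command: " + line)
    aa, resonance, lo, hi, prob = m.groups()
    return ShiftRange(aa, resonance, float(lo), float(hi),
                      float(prob) if prob is not None else None)


database = [
    "set shift A Ca 48-54",
    "set shift A Ca 48-50 0.1",
    "set shift A Ca 50-52 0.6",
    "set shift A Ca 52-54 0.3",
]
entries = [parse_set_shift(line) for line in database]
full_ranges = [e for e in entries if e.prob is None]      # used for tracing and tests
subranges = [e for e in entries if e.prob is not None]    # used for type scoring
print(len(full_ranges), len(subranges))                   # 1 3
```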

8.2 Probability Values

The algorithm that generates probability-based amino acid type scores during sequential assignment can be used with true probability values for the chemical shift subranges, but its performance is improved considerably when the probability values are normalized so that the highest probability value for each resonance is given a value of 1. Using this normalization, the preceding examples would be converted to:

set shift A Ca 48-54

set shift A Ca 48-50 0.167

set shift A Ca 50-52 1.0

set shift A Ca 52-54 0.5
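The normalization amounts to dividing each subrange probability for a resonance by the largest one. A minimal Python sketch of the arithmetic, with a hypothetical function name, is:

```python
def normalize(probabilities):
    """Scale the probabilities so that the largest becomes 1."""
    peak = max(probabilities)
    return [p / peak for p in probabilities]


# Alanine Ca subranges 48-50, 50-52 and 52-54 from the example above.
print([round(p, 3) for p in normalize([0.1, 0.6, 0.3])])   # [0.167, 1.0, 0.5]
```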

8.3 Amino Acid Names

Amino acid names used in the set shift statement should match the amino acid names used in the input sequence file, but they need not be limited to standard nomenclature. In order to distinguish a particular amino acid in the sequence from other like amino acids simply use a different name. For example two serines in the sequence could be named "Sx" and "Sy" respectively. In this case the standard information in the chemical shift database would no longer apply, and the user would have to include a set of chemical shift ranges for amino acids named "Sx" and "Sy". NOTE: The three standard names for each of the standard 20 amino acids are interconverted. For example "cysteine", "cys", and "c" are all considered equivalent. Furthermore amino acid names are case-insensitive so that "Cysteine", "cysteine", "CYS", "Cys", "C", and "c" are all considered equivalent.
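The interconversion of standard names can be pictured as a case-insensitive lookup that leaves non-standard names untouched. The Python sketch below is a model of that behaviour and not CONTRAST's own table; the function name canonical is hypothetical, and the exact spellings that CONTRAST accepts for the full names (for example "aspartate" as opposed to "aspartic acid") are an assumption.

```python
STANDARD = [
    ("alanine", "ala", "a"), ("arginine", "arg", "r"), ("asparagine", "asn", "n"),
    ("aspartate", "asp", "d"), ("cysteine", "cys", "c"), ("glutamine", "gln", "q"),
    ("glutamate", "glu", "e"), ("glycine", "gly", "g"), ("histidine", "his", "h"),
    ("isoleucine", "ile", "i"), ("leucine", "leu", "l"), ("lysine", "lys", "k"),
    ("methionine", "met", "m"), ("phenylalanine", "phe", "f"), ("proline", "pro", "p"),
    ("serine", "ser", "s"), ("threonine", "thr", "t"), ("tryptophan", "trp", "w"),
    ("tyrosine", "tyr", "y"), ("valine", "val", "v"),
]

# Map every alias (full name, three-letter code, one-letter code) to one key.
LOOKUP = {alias: one.upper() for full, three, one in STANDARD
          for alias in (full, three, one)}


def canonical(name):
    """Return the one-letter code for a standard name, or the name unchanged."""
    return LOOKUP.get(name.lower(), name)


print(canonical("Cysteine"), canonical("CYS"), canonical("c"))   # C C C
print(canonical("Sx"))                                           # Sx
```

A user-defined name such as "Sx" falls through the lookup, which mirrors the requirement that such residues need their own set shift entries.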

Non-existent Chemical Shift Ranges

Amino acid resonances for which no chemical shift information is given are ignored by the CONTRAST program in subsequent steps that require chemical shift information. Thus Reside does not include these resonances in the amino acid tests that it generates, and Contrace does not use the chemical shift ranges for these resonances in tracing spin systems. In general it is good practice to comment out set shift commands for resonances that are not very likely to be assigned correctly by the program. The ability of the program to assign a resonance correctly depends on the amount of experimental data available to the program and on the number of resonances that must be correctly assigned before the resonance in question can be assigned. In general, the more intervening bonds there are between the starting resonances (from the source spectrum) and the resonance in question, the more uncertain the assignment of that resonance will be and the more important it is for the set shift commands for that resonance to be commented out.

 

NULL Chemical Shift Ranges

A chemical shift range can be explicitly set to NULL as in the following examples:

set shift G Cbi2 NULL

set shift G Cbi2 Q

This is not the same as simply not including or commenting out a set shift command for that particular resonance. NULL chemical shift ranges tell the program that there should NOT be a resonance of that type. The lines above tell the program that glycines do not contain beta carbons. This information is used by CONTRAST to penalize amino acid type assignments if the spin system being assigned contains a resonance that the set shift command indicates should be NULL.

Chapter 9

Contrace

9.1 Contrace Overview

The Contrace function generates and implements a strategy for tracing spin systems. The function creates a set of buffers that contains the non-specific assignments for resonance types in each spin system, and it creates a set of CONTRAST macros that produces identical results when executed. The macros created by Contrace can be modified to fine tune the Contrace strategy. The following sections break the operation of the Contrace function into its components and describe the types of commands that the strategy uses in order to create the connectivity tracing macro. More information about the functions that Contrace uses can be found in Appendix @.

9.2 Contrace Options

The format for the Contrace command is as follows.

contrace [source] [>filename] [-n,r,h,m] [-F,D,N,P,C] [-f] [-g] [-a] [-x] [-devi,devo] [fuzz]

source The name or number of the spectrum to be taken as the source spectrum. If no source spectrum is specified, then the Contrace program will create a new source spectrum using the input spectra.

filename The file name of the CONTRAST macro that will be generated by Contrace. If no file name is specified, then the default name contrace.mac will be used.

fuzz The fuzziness factor expressed as a percentage. This value affects the severity of the chemical shift range and other types of filtering used in the Contrace strategy. The default value is 0 (no fuzziness).

-n,r,h,m These flags determine the method used to calculate which resonances should be assigned first. Only one flag in this group may be included.

-n Probability calculations in which noise is not considered. (default)

-r Rigorous probability calculations. (very slow)

-h Heuristic calculations.

-m Automatically chooses between rigorous and "no noise" calculations.

-F,D,N,P,C These flags determine the method of fragment filtering used in the calculations. Fragment filtering is a method of eliminating an assigned resonance from further consideration so that assigned resonance values are not reassigned in subsequent positions in the spin system. Only one flag in this group may be included.

-F Always do fragment filter to eliminate resonance from consideration.

-D Fragment filter only when determined necessary (default).

-N No fragment filtering.

-P Percent fragment filtering to reduce likelihood of value reassignment.

-C Constant fragment filtering.

-f Fill the source spectrum using peaks from an overlapping spectrum. Source is not filled by default.

-g Glycine filter source spectrum. If the source spectrum is the fingerprint region of a spectrum, it may contain two peaks for each glycine. This flag eliminates one of those peaks. The default behavior of the program is to do no glycine filtering.

-a Arginine filter the source spectrum. If the source spectrum is taken from an HNCO spectrum, it may contain peaks from arginine side chains as well as backbone peaks. This flag causes the program to filter those extra peaks. The default behavior of the program is not to perform arginine filtering.

-x Perform setx cross-checking.

-devi In line deviation filtering is performed. (All deviation filtering is done as each assignment is made.) The default behavior is for no deviation filtering to be done.

-devo Out of line deviation filtering is performed. (All deviation filtering is done at the end of the macro.) The default behavior is for no deviation filtering to be done.

9.3 Preprocessing Spectra

The Contrace function examines the spectra that have previously been read into the CONTRAST program using the lf function and prepares them to be used by the Contrace algorithm by applying cluster filters, diagonal filters, and symmetrize algorithms where needed. Since the Contrace function is often run more than once on the same data sets, it is possible to skip these operations by inserting comments in the spectrum files. If the Contrace function detects the key word "cluster", "diagonal", or "symmetrize" in the comments of the header of one of the spectra, then the corresponding preprocessing function will be skipped for that spectrum.

 

9.3.1 Cluster Filters

 

Cluster filters are often necessary when a spectrum has been peakpicked with a peakpicking function that picks multiple maxima of a single signal as if they arose from separate peaks. Single peaks can appear to be multiplets because of incomplete decoupling and noise spikes that create irregularities on the surface of the peak. Contrace detects these peaks by taking weighted averages of the coordinates of peaks that lie within calculated resolution-dependent tolerances of one another.

 

The following is a representative example of the cf (cluster filter) function used to detect and average these multiplets.

cf hncocacb (%1 <0.009> d1 && %2 <0.09> d2 && %3 <0.09> d3) -b

 

In this example peaks in the spectrum hncocacb are compared to one another using the Boolean in parentheses. The expressions containing percent '%' symbols represent target dimension values taken from one peak and the expressions containing dimension 'd' symbols are dimension values taken from the other peak. The above Boolean evaluates to true if and only if all three of the following conditions hold: the first dimension coordinate of peak x (%1) lies within a tolerance of 0.009 units (<0.009>) of the first dimension coordinate of peak y (d1), the second dimension coordinate of peak x (%2) lies within a tolerance of 0.09 units of the second dimension coordinate of peak y (d2), and the third dimension coordinate of peak x (%3) lies within a tolerance of 0.09 units of the third dimension coordinate of peak y (d3). The '-b' flag at the end of the function call indicates that for each peak in the spectrum, that peak and the best matching peak will be averaged before the second, third, fourth, etc. matches are averaged.

Cluster filters generated by Contrace are most often modified by adjusting the tolerances in the angled brackets of the Boolean. The tolerances used by Contrace are conservative so users will often find it necessary to increase the tolerances considerably.
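The idea behind the cluster filter can be illustrated outside of CONTRAST with a short Python sketch. This is not the CONTRAST implementation. The Peak class and the cluster_filter function are hypothetical names, weighting the average by peak intensity is an assumption (the manual says only that a weighted average is taken), intensities are assumed to be positive, and the sketch merges each peak with the first match it finds instead of reproducing the best-match ordering of the -b flag.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    coords: tuple      # one chemical shift per dimension
    intensity: float   # assumed positive


def within(a, b, tol):
    """True when a and b differ by no more than tol (the <tol> operator)."""
    return abs(a - b) <= tol


def cluster_filter(peaks, tols):
    """Merge peaks that match in every dimension into one averaged peak.

    The average is weighted by intensity, which is an assumption made
    for this sketch only.
    """
    merged = []
    for peak in peaks:
        for i, kept in enumerate(merged):
            if all(within(p, k, t)
                   for p, k, t in zip(peak.coords, kept.coords, tols)):
                total = kept.intensity + peak.intensity
                coords = tuple(
                    (k * kept.intensity + p * peak.intensity) / total
                    for p, k in zip(peak.coords, kept.coords)
                )
                merged[i] = Peak(coords, total)
                break
        else:
            # No existing cluster matched, so this peak starts a new one.
            merged.append(peak)
    return merged


if __name__ == "__main__":
    picked = [
        Peak((8.000, 120.00, 55.00), 1.0),
        Peak((8.004, 120.04, 55.02), 3.0),   # second maximum of the same signal
        Peak((9.000, 110.00, 45.00), 2.0),
    ]
    for peak in cluster_filter(picked, (0.009, 0.09, 0.09)):
        print(peak)
```

Raising the values passed as tols corresponds to enlarging the tolerances in the angled brackets of the cf Boolean.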

 

9.3.2 Diagonal Filters

 

Diagonal filters are applied when Contrace determines that a spectrum should contain a symmetric diagonal. This determination is made based on the list of correlations input in the header of each spectrum. Peaks that have coordinates within a calculated resolution-based tolerance of the calculated position of the diagonal are deleted.

 

The following is a representative example of a diagonal filter.

filter cosy cosy (%1 <.009> d2)

 

The diagonal filter is a special form of the more general filter command which can compare peaks from up to two different spectra or buffers. The peaks in the first listed spectrum or buffer are deleted if the Boolean evaluates to true. The dimension fields from the first spectrum or buffer are referred to using the '%' symbol and the dimension fields from the second listed spectrum or buffer are referred to using the 'd' symbol. The first operand in each operation expression (i.e. the operand to the left of the operator) refers to the first listed spectrum or buffer while the second operand in each operation expression (i.e. the operand to the right of the operator) refers to the second listed spectrum or buffer. Other fields from the two lists are referenced explicitly (e.g. c$cosy,1 or l$cosy).

 

In the special case of the diagonal filter, the first and second lists are the same so that in the example above %1 refers to the first dimension of a peak in the COSY spectrum while d2 refers to the second dimension of the same peak in the COSY spectrum. If the indicated coordinate values of the peak are within a tolerance of 0.009 units of one another, then that peak is deleted.

The tolerances used by the Contrace algorithm for diagonal filters are conservative so the user might find it necessary to insert larger tolerances. In general Contrace is conservative in its choice of tolerances whenever it is asked to delete or obscure information.
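The effect of the diagonal filter on a 2D peak list can be sketched in Python as follows. The function name diagonal_filter and the representation of peaks as (d1, d2) tuples are assumptions made for illustration; treating the tolerance as inclusive is also an assumption.

```python
def diagonal_filter(peaks, tol=0.009, dim_a=0, dim_b=1):
    """Return the peaks that do NOT lie on the diagonal.

    A peak is treated as diagonal when its two coordinates agree
    within tol, mirroring the Boolean (%1 <.009> d2).
    """
    return [p for p in peaks if abs(p[dim_a] - p[dim_b]) > tol]


if __name__ == "__main__":
    cosy = [(8.20, 8.205), (8.20, 4.50), (4.50, 4.52)]
    print(diagonal_filter(cosy))             # conservative tolerance
    print(diagonal_filter(cosy, tol=0.03))   # larger tolerance inserted by the user
```

With the conservative tolerance only the first peak is removed; the larger tolerance also removes the near-diagonal peak at (4.50, 4.52).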

 

9.3.3 Symmetrize

 

Many spectra contain peaks that are related by symmetry in such a way that one peak contains exactly the same information as its symmetric partner. Symmetry relations in spectra could be used to filter peaks that have no symmetric partners, if it were not for the many circumstances that conspire together to obscure one or both peaks in a symmetric pair. For this reason the symmetrize operation used by Contrace is not a filter, but a mechanism for ensuring that each peak in a symmetric spectrum has a symmetry-related partner that contains identical information. Towards this end the symmetrize operation adjusts the intensities and positions of symmetric pairs of peaks so that each peak in the pair contains the same frequency and intensity information. The function also creates symmetric partners for peaks for which no partners exist. The Contrace function uses the correlation list in the header of each spectrum file to determine which spectra contain symmetry-related dimensions and applies the symmetrize function to those selected spectra. The following is a simple example of a call to the symmetrize function.

sym cosy (%1 <0.02> d2 && %2 <0.02> d1) -b ** Symmetrizes 2D spectrum.

ws cosy >cosy.sym ** Writes symmetrized 2D spectrum.

The symmetrize function (sym) in this example searches the peaks of a COSY spectrum for pairs of peaks for which the Boolean evaluates to true. The first operand of each operation expression refers to one peak while the second operand refers to a different peak. In this case if the first dimension (%1) of peak 1 is within a tolerance of 0.02 units of the second dimension (d2) of peak 2 and the second dimension (%2) of peak 1 is within a tolerance of 0.02 units of the first dimension (d1) of peak 2, then the coordinates and the intensities of the two peaks will be averaged and adjusted so that each peak contains identical information. The -b flag instructs the algorithm to sample all possible partners for peak 1 and choose only the best matching partner. If no symmetric partners are found for a peak then a symmetric partner is created for it using information from the existing peak.

The write spectrum (ws) function is then used to save a copy of the modified spectrum to a file named cosy.sym.
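The symmetrize operation described above can be sketched in Python under a few stated assumptions. Peaks are represented as (d1, d2, intensity) tuples, the "best" partner is taken to be the one with the smallest summed deviation, coordinates and intensities are averaged with equal weights, and diagonal peaks are assumed to have been removed beforehand. None of these details is specified by the manual, and the function name symmetrize is used here only for illustration.

```python
def symmetrize(peaks, tol=0.02):
    """Return a peak list in which every peak has a symmetric partner.

    peaks is a list of (d1, d2, intensity) tuples.
    """
    out = list(peaks)
    paired = set()
    for i, (a1, a2, ai) in enumerate(peaks):
        if i in paired:
            continue
        best, best_dev = None, None
        for j, (b1, b2, _) in enumerate(peaks):
            if j == i or j in paired:
                continue
            dev1, dev2 = abs(a1 - b2), abs(a2 - b1)
            if dev1 <= tol and dev2 <= tol:
                dev = dev1 + dev2
                if best is None or dev < best_dev:
                    best, best_dev = j, dev   # keep only the best partner
        if best is None:
            # No partner exists: create one by swapping the coordinates.
            out.append((a2, a1, ai))
        else:
            b1, b2, bi = peaks[best]
            x, y = (a1 + b2) / 2, (a2 + b1) / 2
            inten = (ai + bi) / 2
            out[i], out[best] = (x, y, inten), (y, x, inten)
            paired.update((i, best))
    return out


if __name__ == "__main__":
    cosy = [(8.20, 4.50, 10.0), (4.51, 8.21, 6.0), (7.90, 4.10, 5.0)]
    for peak in symmetrize(cosy):
        print(peak)
```

In the usage example the first two peaks are averaged into a symmetric pair and a new partner is created for the third peak.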

9.4 Selecting a Source Spectrum

If a source spectrum is not specified in the call to the Contrace function, then the Contrace function will create a source spectrum from the spectra input into the CONTRAST program. The Contrace function searches the input spectra for the most complete spectrum that contains the greatest number of correlated resonances in common with the greatest number of other spectra. The ideal spectrum has either a single correlation so that there is a one to one correspondence between the peaks in the spectrum and the residues in the protein, or it contains one correlation that can readily be distinguished from other correlations using the input database of chemical shift ranges. In the following example the fingerprint region of a COSY spectrum is singled out as the best starting point for the source spectrum.

scan cosy (d1 <3> 9 && d2 <1.35> 4.45) |source -f ** Create new source buffer.

 

Here we see that the COSY spectrum is searched for peaks with d1 values in the range of 6 to 12 ppm and d2 values in the range of 3.10 to 5.8 ppm. The peaks are placed in a buffer named |source and the contents of the buffer are filtered (-f) to ensure that multiple entries of the same peak are not placed in the buffer.
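The relation between a tolerance expression such as d1 <3> 9 and the range it selects can be checked with a few lines of Python. The helper names below are hypothetical, peaks are represented as plain tuples, and inclusive range boundaries are an assumption.

```python
def tolerance_range(center, tol):
    """Range selected by an expression of the form d <tol> center."""
    return (center - tol, center + tol)


def in_tolerance(value, tol, target):
    """True when value lies within tol of target."""
    return abs(value - target) <= tol


def scan(peaks, conditions):
    """Collect peaks that satisfy every (dimension, tol, target) condition.

    Duplicate peaks are skipped, which is the role played by the -f flag.
    """
    selected = []
    for peak in peaks:
        if all(in_tolerance(peak[dim], tol, target)
               for dim, tol, target in conditions):
            if peak not in selected:
                selected.append(peak)
    return selected


if __name__ == "__main__":
    print(tolerance_range(9, 3))         # d1 window
    print(tolerance_range(4.45, 1.35))   # d2 window
    cosy = [(8.3, 4.2), (8.3, 4.2), (7.1, 2.0), (12.5, 4.0)]
    print(scan(cosy, [(0, 3, 9), (1, 1.35, 4.45)]))
```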

The above Scan function creates a source buffer but not a source spectrum. The following example shows how a source buffer is converted to a source spectrum.

bts |source $source ** Create new source spectrum.

Here we see that the buffer to spectrum function (bts) copies the contents of the source buffer to a spectrum with the name, $source. This new source spectrum is added to the end of the list of input spectra.

The Contrace function then uses the write spectrum (ws) function to save a copy of the newly created source spectrum to a spectrum file named "source.tmp" as follows.

ws source >source.tmp ** Write new source spectrum.

The function then deletes the source buffer using the delete (del) function.

del |source ** Delete source buffer.

If the Contrace function is called with a -f flag, then any gaps in the source spectrum will be filled in with peaks from another overlapping spectrum. In the following example a TOCSY spectrum is used to fill in any gaps in the source spectrum which was taken from the fingerprint region of a COSY spectrum.

 

fill source tocsy !(%1 <.02> d1 && %2 <.02> d2 || 9 >3< d1 || 4.45 >1.35< d2)

In this example we see that the complement of the set of peaks for which the Boolean evaluates to true is added to the source spectrum. Thus if the d1 dimension of a TOCSY peak is within a tolerance of 0.02 units of the first dimension of the source spectrum (%1) and the d2 dimension of the TOCSY peak is within 0.02 units of the second dimension of the source peak (%2) or if the first dimension of the TOCSY peak is outside the range of 6 to 12 units or if the second dimension of the TOCSY peak is outside the range of 3.1 to 5.8 units, then the TOCSY peak is not added to the source spectrum. All other TOCSY peaks for which the Boolean evaluates to false, however, are added to the source spectrum.
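The logic of the fill Boolean can be restated as a Python sketch. The function name fill and the tuple representation of peaks are assumptions. The sketch also compares each candidate against peaks added earlier in the same pass, which the manual does not specify.

```python
def fill(source, candidates, match_tols, windows):
    """Add to source every candidate for which the fill Boolean is false.

    match_tols: tolerance per dimension for detecting a duplicate peak.
    windows:    (center, tol) per dimension; candidates outside any
                window are rejected.
    """
    filled = list(source)
    for cand in candidates:
        duplicate = any(
            all(abs(s[d] - cand[d]) <= tol for d, tol in enumerate(match_tols))
            for s in filled
        )
        outside = any(
            abs(cand[d] - center) > tol
            for d, (center, tol) in enumerate(windows)
        )
        # The Boolean is (duplicate || outside); the leading '!' means
        # that only candidates for which it is false are added.
        if not (duplicate or outside):
            filled.append(cand)
    return filled


if __name__ == "__main__":
    source = [(8.30, 4.20)]
    tocsy = [(8.31, 4.21), (8.90, 4.60), (8.90, 2.10), (5.50, 4.50)]
    print(fill(source, tocsy, (0.02, 0.02), [(9, 3), (4.45, 1.35)]))
```

In the usage example only the TOCSY peak at (8.90, 4.60) is added: the first candidate duplicates an existing source peak and the last two fall outside the fingerprint region.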

If the source spectrum is a 3D or higher dimensional spectrum, then it is not likely that another spectrum will contain a correlation that matches the source correlation in every dimension, but it is often only necessary for a substantial subset of the dimensions to match. The following is an example of how Contrace uses an HNCOCA spectrum to fill a source spectrum created from peaks from an HNCO spectrum.

 

fill source hncoca !( %1 <.03> d1 && %2 <.15> d2) d3=0.0

In this case only the first two dimensions were required to match while the third dimension (the alpha carbon dimension) of each filling peak in the HNCOCA spectrum is set to 0.0 before it is copied to the source spectrum. This is an acceptable substitution for a missing HNCO peak since only the amide proton and nitrogen dimensions of the HNCO spectrum are held in common with the other experiments input into the CONTRAST program.

9.5 Creating Fragments

Each peak in the source spectrum is the seed for a group of assigned resonances called a fragment. There is a one to one correspondence between the peaks in the source spectrum and the fragments, and in the ideal case there is a one to one correspondence between the fragments and the amino acids in the protein. The sa (ScanAll) function is used to create one buffer for each peak in the source spectrum. In general functions that have an "all" suffix repeat a basic procedure for every peak in the source spectrum which is usually specified by number (or name) immediately following the function name. The following command uses the fifth spectrum as the source spectrum and searches the spectrum named "source" for peaks that have d1 values that match the first dimension of peaks in the source spectrum (%1) within a tolerance of 0.02.

 

sa 5, source (%1 <0.02> d1 ) -f ** Search source using peaks in source and filter (-f) the results.

 

The result of this function is that every peak in the source spectrum spawns a new buffer which becomes the seed of a new fragment. In this case the spectrum searched is the source spectrum itself. This ensures that every buffer created contains at least one peak. Each buffer created is called the source buffer for its fragment. The value corresponding to each dimension of the best peak in a source buffer is the assignment for the resonance that the dimension represents. Thus a source buffer is a special assignment buffer (a buffer that contains the assignment for one or more resonances).

 

After Contrace creates a source buffer for each fragment, it uses the setall function to set the 'n' (endpoint) field associated with each peak to 1.

setall 5, n |source () = 1 ** Endpoint Determination

The 'n' field is a general integer field that the Contrace function uses for the purpose of indicating whether the assignment in an assignment buffer should be trusted. The 'n' field is given a value of 1 if the assignment is to be used in further calculations or it is given a value of 0 to indicate that further calculations should not be based on that assignment. This is necessary since Contrace traces every fragment to the extent of the longest spin system possible given the data. Since alanine side-chains should not produce valid assignments at the gamma and delta positions, it is desirable that Contrace be able to warn the user using the 'n' field. NOTE: Although it is usually safe to set the endpoint fields of all source buffers to 1, endpoint determination for other assignment buffers is one of the least reliable parts of the Contrace calculation and should always be considered with suspicion.

The following is an example of the lines generated by Contrace to create fragments from a 3D HNCO source spectrum.

sa 1, hnco (%1 <0.02> d1 && %2 <0.2> d2 ) -f

setall 1, n |hnco () = 1 ** Endpoint Determination

9.6 Creating Primary Working Buffers

Working buffers are buffers that contain peaks from a single search. It is not necessary to refer to working buffers while recording assignments, but Contrace saves all working buffers to provide a record of what information was used to make the assignments. Working buffers are created by search functions. If a buffer name is specified after the search Boolean, then the list of buffers is checked for a buffer that has that name and the matching peaks from that search are added to the buffer. If that buffer is not found then a buffer by that name is created. If no buffer name is listed then the name of the spectrum or buffer being searched is used by default, and a new buffer is created. The following is an example of a Contrace search that creates a working buffer.

sa 1, hnca (%1 <0.02> d1 && %2 <0.2> d2 ) |Hni_Nai_hnca -f

In this example source spectrum 1 peaks are used to provide the target values for the search (%1 = the first dimension of the source peak, %2 = the second dimension of the source peak) and the d1 and d2 dimensions of the HNCA spectrum are searched. The Contrace function creates a new buffer name from the resonances corresponding to the dimensions that were searched and the name of the spectrum searched. In this example %1 is the amide proton dimension and %2 is the amide nitrogen dimension. The -f flag causes duplicate peaks in the resulting buffers to be filtered.

The Contrace function determines which spectra should be searched first so that the best assignment pathway is taken. The preceding example resulted in one of the initial working buffers. As the assigned fragment is extended, target values will also be taken from assignment buffers (as opposed to the source spectrum alone). The following is an example of a search that uses assignment buffers as targets.

sa 1, hcchtocsy (d3|Hai,1 <0.02> d1 && d3|Cai,1 <0.4> d2 ) |Hai_Cai_hcchtocsy -f

In this example an HCCH-TOCSY spectrum is searched for d1 dimensions that match the alpha proton assignment (the d3 dimension of the first peak in the assignment buffer, Hai) and for d2 dimensions that match the alpha carbon assignment (the d3 dimension of the first peak in the assignment buffer, Cai).

After working buffers are created, they are filtered using the prune (prunebuffer) function.

prune hnca |Hni_Nai_hnca (dev < 30) lev -= 100

 

In this case the prune function searches through the |Hni_Nai_hnca buffer of each fragment and when it finds identical peaks whose deviations are less than 30% of the deviation of the identical peak with the highest deviation value, it subtracts 100 from the level of each such peak, effectively removing it from further consideration. The Contrace function avoids removing peaks from buffers since that would hide information from the user.

9.7 Creating Primary Assignment Buffers

The Contrace function analyses the information content of all of the working buffers and determines the next resonance to be assigned. The working buffer with the least ambiguous information for assigning that resonance is then determined and the algorithm uses the ScanAll function to create an assignment buffer named for the resonance that it is created to assign. In the following example the ScanAll (sa) function copies the contents of each |Hni_Nai_hnca working buffer in each fragment (defined by the peaks in source spectrum 1) to a new assignment buffer named |Cai.

sa 1, |Hni_Nai_hnca () |Cai

The empty parentheses () are the Contrace convention for specifying that the Boolean is true for every member of an object.

The newly created assignment buffer usually contains the resonance to be assigned as well as other resonances from other correlations in the original spectrum, other resonances from overlapping spin systems, and false signals (noise) in the spectra. In order to distinguish between these signals, peaks from appropriate working buffers are added to the assignment buffer. In theory the number of the "correct" resonance signals that fall within a given tolerance of each other should be greater than the numbers of the other resonances.

The fillall command is used to add peaks from the working buffers of a fragment to the assignment buffer of the same fragment. Peaks are only added to the assignment buffer if they do not duplicate (within a specified tolerance) resonances already contained in the assignment buffer.

fillall 1, |Cai |Hni_Nai_hncacb !(%3 <0.6> d3)

 

In the above example peaks from the |Hni_Nai_hncacb buffer are added to the |Cai buffer if and only if the third dimension (d3) resonance of each |Hni_Nai_hncacb peak is not (!) within 0.6 ppm of the third dimension (%3) resonance of an existing |Cai peak.

Once an assignment buffer has been filled with peaks from all of the contributing working buffers, the Contrace function uses the setall function to increment the level of each peak in the assignment buffer for each working buffer that contains a matching resonance.

setall 1, level |Cai (d3 <0.4> d3|Hni_Nai_hnca && 0 <= l|Hni_Nai_hnca) -n += ->
0.60/#|Hni_Nai_hnca*(1+DEV/120)

 

The first line of the call to the setall function specifies that the level of each peak in the |Cai buffer for which the Boolean is true is to be incremented by the formula on the following line. The Boolean will only be true when the d3 dimension of the |Cai peak is within a tolerance of 0.4 of the d3 dimension of a peak in the |Hni_Nai_hnca buffer and if the level of that peak (l|Hni_Nai_hnca) is greater than or equal to 0. (Note that filters described in the preceding section set the levels of filtered peaks to negative values.) The flag '-n' indicates that a "NOESY-type" search is to be performed (see appendix). The += symbol indicates that the levels are to be incremented as opposed to decremented (-=) or divided (/=) or multiplied (*=) or set to (=). The "->" symbol indicates that the line has been continued. The formula used by Contrace to increment the levels of the peaks is shown on the second line of the command and is simply the value 0.60 (determined by Contrace for each buffer) divided by the number of peaks in the working buffer and then multiplied by the sum of 1 and the deviation value of the match of the resonance in the working buffer to the resonance of the assignment buffer divided by 120. This formula has been determined empirically, and is beyond the scope of this manual. It can, however, be easily modified by the user to a simpler or more complicated formula.
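As a worked illustration of the increment formula, the following Python sketch evaluates 0.60/#buffer*(1+DEV/120) for a few inputs. The function name level_increment is hypothetical, and the deviation value is simply passed in as a number because the manual does not describe how CONTRAST computes it.

```python
def level_increment(weight, n_peaks, dev, dev_scale=120.0):
    """Evaluate weight / n_peaks * (1 + dev / dev_scale).

    weight:   the per-buffer constant chosen by Contrace (0.60 in the example)
    n_peaks:  number of peaks in the working buffer (the '#' term)
    dev:      deviation value of the match (the DEV term)
    """
    return weight / n_peaks * (1 + dev / dev_scale)


if __name__ == "__main__":
    # The same match contributes less when the working buffer is crowded.
    for n_peaks in (1, 3, 6):
        print(n_peaks, level_increment(0.60, n_peaks, dev=60))
```

Dividing by the number of peaks means that a match found in a sparsely populated working buffer raises the level more than the same match found in a crowded buffer.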

The setall function is repeated for each working buffer that contains resonances that might match resonances in the assignment buffer. The peaks in the assignment buffer are then filtered using two types of filters. The first filter is a range filter. It simply reduces the levels of peaks whose resonances fall outside the normal range for that type of resonance.

setall 1, level |Cai !(d3 <15> 54) -= 100 ** Cai Range Filter: 39-69

 

In this example the setall function is used to reduce by 100 the levels of all peaks in the |Cai buffer whose d3 resonances fall outside (note the '!' symbol) the range of 39 to 69 ppm. The second type of filter is a fragment filter. These filters exclude from consideration resonances that have already been assigned to other atom types. The following example subtracts 1000 from the level of every peak in the |Cai assignment buffer that has a d3 value that is within a tolerance of 0.4 ppm of the d3 value of the first peak in the |Ca_prev assignment buffer (d3|Ca_prev,1).

setall 1, level |Cai (d3 <0.4> d3|Ca_prev,1) -= 1000 ** FRAG-FILTER assigned peaks.

After fragment filters have been applied for each previously created assignment buffer, the peaks in the new assignment buffer are sorted by decreasing level value using the order (ord) command.

ord |Cai level
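The range filter, the fragment filter, and the final sort can be summarized in one Python sketch. The function name rank_assignment_buffer and the representation of peaks as dictionaries with 'd3' and 'level' keys are assumptions made for illustration; the penalties of 100 and 1000 and the tolerances are taken from the examples above.

```python
def rank_assignment_buffer(peaks, center=54.0, half_width=15.0,
                           assigned=(), frag_tol=0.4):
    """Apply range and fragment penalties, then sort by decreasing level.

    peaks:    list of {'d3': shift, 'level': value} dictionaries
    assigned: d3 shifts already assigned in earlier assignment buffers
    """
    ranked = []
    for peak in peaks:
        level = peak["level"]
        # Range filter: penalize shifts outside center +/- half_width.
        if abs(peak["d3"] - center) > half_width:
            level -= 100
        # Fragment filter: penalize shifts that were already assigned.
        if any(abs(peak["d3"] - shift) <= frag_tol for shift in assigned):
            level -= 1000
        ranked.append({"d3": peak["d3"], "level": level})
    ranked.sort(key=lambda p: p["level"], reverse=True)
    return ranked


if __name__ == "__main__":
    cai = [
        {"d3": 58.2, "level": 1.2},
        {"d3": 30.1, "level": 2.0},   # outside the 39-69 ppm range
        {"d3": 52.4, "level": 1.5},   # matches a previously assigned shift
    ]
    for peak in rank_assignment_buffer(cai, assigned=(52.5,)):
        print(peak)
```

Because the penalties lower the level instead of deleting the peak, the filtered peaks remain visible at the bottom of the sorted buffer.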

After the buffer is sorted, the first peak in the buffer should contain the assignment for the specified resonance and the following peaks contain ranked alternative assignments. If the first peak of the assignment buffer is likely to contain the correct assignment, the 'n' variable of the peak is set to 1 using the setall function.

setall 1, n |Cai (l|Cai,1 >= l|Cb_prev,1 && n|Cb_prev,1 > 0) = 1

 

This function sets the 'n' variable to 1 if and only if the level of the first peak in the assignment buffer (l|Cai,1) is greater than or equal to the level of the first peak in the previous assignment buffer ( l|Cb_prev,1) which must also have a positive 'n' variable.

Finally the information about the location of assignments in the assigned fragment is recorded in the CONTRAST program using the set function so that other fully-automated functions that follow the Contrace function can refer to the assignments. The following example records the dimension, buffer name and peak number (d3|Cai,1) that contains the assignment for the Cai (alpha carbon of the current residue in the fragment) as well as a Boolean that can be used to evaluate the quality of the assignment.

 

set frag d3|Cai,1 = Cai (n|Cai,1 > 0)

 

In this case the Boolean simply uses the endpoint value 'n' as the metric to test for the validity of the assignment.

9.8 Secondary Buffers

Sections 9.6 and 9.7 described the process of creating primary working buffers and creating primary assignment buffers. A primary assignment buffer is the first assignment buffer created at any given level of assignments. If, however, there are two or more resonances that can be assigned at a given level (for example Hb1 and Hb2), then the Contrace program will create a set of secondary working buffers and a secondary assignment buffer to assign the second resonance. These secondary buffers are created immediately after the primary assignment buffer has been completed and the next set of primary working buffers has been created and filtered. Secondary assignments are made after the next set of primary working buffers has been created, because the new working buffers often contain information that is useful in determining the assignment of the secondary resonance.

The process of creating secondary assignment buffers is similar to that of creating the primary buffers except that the criteria for determining whether or not the secondary assignment is valid are more strict. This is partially due to the fact that secondary and primary assignments are often degenerate, but it also reflects the fact that it is far more harmful to the scoring algorithms for a bad secondary assignment to be considered valid than it is to ignore a valid secondary assignment.

9.9 Exiting Contrace

Contrace continues tracing a spin system as long as there are unassigned resonances in the spectra (as input in the headers of each spectrum file). Contrace will create an assignment buffer for each resonance even though there is not enough information to properly assign the resonance. Care should be taken to evaluate the quality of the data that went into each assignment. The ambiguity value for the assignment should not be relied on! In fact the ambiguity assessment function has been taken out of the Contrace function to ensure that the user carefully evaluates each assignment.

Chapter 10

Overlap Tests

CONTRAST uses the overlap between adjacent fragments to sequentially order the fragments. Fragments must be overlapped in order to use the CONTRAST program to assign NMR data. This overlap occurs as a result of dipolar (through-space NOESY) connectivities between residues or scalar (through-bond) connectivities between residues. Fragments should be constructed so that they include buffer(s) that contain resonances from the previous and/or following fragment in the sequence as well as buffer(s) that contain analogous resonances within the residue represented by the current fragment. Figure 10.1 uses connectivity graphs to represent a series of fragments that overlap in the Ca dimension due to an interresidue scalar coupling from an experiment such as the HN(CO)CA experiment.


Figure 10.1 Graph representation of the connectivities present between assignment buffers within three fragments. Assignment buffers on the shaded background represent those assignment buffers that are actually used to assign the fragments to alanine, valine and lysine amino acids respectively. Arrows show the overlap between the fragments at the Ca resonances.

We see from Figure 10.1 that if two fragments are to be considered adjacent in the sequence, then the left-most Ca resonance in one fragment must be assigned to the same chemical shift as the right-most Ca resonance of the previous fragment.

Fragments also overlap through dipolar couplings between residues. Scalar overlap is modeled in the CONTRAST method by assignment buffers that overlap between sequential residues; dipolar overlap, however, is modeled by a working buffer that contains NOESY peaks from one residue that can overlap with both assignment buffers and working buffers in a neighboring fragment. This type of dipolar overlap is illustrated in Figure 10.2.


Figure 10.2 Illustration of the overlap between adjacent fragments that is due to dipolar, through-space coupling. Resonances on the shaded background represent resonances that have been assigned in assignment buffers and are connected with dark line segments. Resonances on the unshaded portion of the diagram represent resonances in a working buffer that arise from NOESY-type connectivities which are represented by light line segments. Overlap between a working buffer in the Lys 3 fragment and Val 2 assignment and working buffers are indicated by arrows.

Through-space couplings are often seen between non-sequential fragments and thus can be unreliable in some cases. On the other hand, scalar couplings across the peptide bond can be ambiguous when several resonances involved in the coupling have similar chemical shifts. Thus it is always desirable to make fragment adjacency determinations using both dipolar and scalar couplings if both types of data are available.

10.1 The Overlap Function

The Overlap function is used to create a set of "overlap tests" that is used by a subsequent sequential assignment function to determine when fragments are adjacent in the sequence. The Overlap function is similar to the Contrace function in that it generates a CONTRAST macro that can be used as is or modified by the user. The function, however, is much simpler than Contrace since it usually creates only a few lines of CONTRAST commands. The Overlap function helps provide a fully-automated pathway to assignments, but it does not perform any function that the user would not be capable of performing by hand.

The Overlap function generates set Overlap (Set Ovl) tests which are used for scoring the overlap between fragments. The Overlap function uses information entered into the CONTRAST program with the Set Frag function that records the locations of assignments and potentially useful NOESY-type working buffers that provide a means of scoring the overlap between fragments. The Contrace function automatically generates calls to the Set Frag function, but the user can also enter that information "by hand". The Overlap function takes two optional command line parameters: the name or number of the source spectrum and the name of the file to which the generated macro is saved. The following is an example call to the Overlap function.

overlap 1 >overlap.mac

 

In this example spectrum number one is specified as the source spectrum and a copy of the macro generated by the function is saved to the file overlap.mac as it is executed by CONTRAST.

10.2 Set Ovl Tests

Set overlap tests are similar to other CONTRAST commands that use Booleans except that the order of the fields in Set Ovl tests is critical (in other CONTRAST commands the order is not as important). In Set Ovl Booleans all of the left-hand operands refer to one fragment and all of the right-hand operands refer to the next fragment (whatever fragment happens to be to the right of the first fragment). The general form of the command is

set ovl source |bufferLF |bufferRF ->

(operandLF op operandRF [conj operandLF op operandRF [conj ...]]) -scaleF -includeF score

where:

set ovl = The CONTRAST command.

source = The name or number of the source spectrum.

bufferLF = The name of the buffer being tested from the fragment on the left.

bufferRF = The name of the buffer being tested from the fragment on the right.

-> = Line continued symbol.

operandLF = The left-hand operand which corresponds to the fragment on the left.

op = Operator (e.g. >, <=, <.02>, etc.)

operandRF = The right-hand operand which corresponds to the fragment on the right.

conj = Conjunction (e.g. ||, &&, |, &)

scaleF = Flag that either specifies that scores be scaled (-s) or unscaled (-u).

includeF = Flag that specifies that either the best matching value alone is scored (-b), that a score be generated for the best match in the right-hand buffer for each different element of the left-hand buffer (-n), or that all matches between the two fragments are scored (-a).

score = The number of points that is awarded when the Set Ovl Boolean evaluates to true.

 

The Set Ovl function can be used to score the adjacency of two fragments that share a common dimension. Each fragment in the following example contains a |Cai buffer that contains peaks whose d3 dimension arises from the Ca resonance of the residue making up most of the current fragment and a |Ca_prev buffer whose d3 dimension is from the Ca resonance of the previous residue. Note: a |Ca_prev buffer is usually formed from peaks from scalar-coupling experiments such as the 3D HN(CO)CA or 3D HN(CO)CACB in which the H1 and N15 resonances arise from one residue and the Ca or Cb resonances arise from the previous residue.

 

set ovl 1 |Cai |Ca_prev (%3 <.1> d3 && n|Cai > 0 && 0 < n|Ca_prev) -s -b 100

 

In this example 100 points are scaled (-s) by the deviation of the match between the resonances being compared and added to the sequential score for the protein if and only if the d3 dimension (%3) of a peak in the |Cai buffer matches the d3 dimension of a peak in the |Ca_prev buffer within a tolerance of 0.1 ppm and if the 'n' values of the peaks in the best-matching (-b) pair of peaks are both greater than zero. Note that if the operation "0 < n|Ca_prev" had been reversed (e.g. "n|Ca_prev > 0"), then the program would have used the n value from the |Ca_prev buffer of the left-hand fragment instead of the |Ca_prev buffer of the right-hand fragment. The order in which the |Cai and |Ca_prev buffers are listed before the Boolean is also significant. If the two buffers had been listed with |Ca_prev before |Cai, then the program would have tried to match peaks from the |Ca_prev buffer of the left-hand fragment with peaks from the |Cai buffer of the right-hand fragment, and only fortuitous random matches would have been possible.

The Set Ovl function can also calculate adjacency scores for two fragments based on dipolar, through-space NOESY-type information. The following example shows how a working buffer in one fragment (that is formed from a search along the H1 and N15 dimensions of a 3D H1N15-NOESY experiment) is used to determine adjacency based on its ability to match the amide proton of the following fragment.

set ovl 1 |H_N_noesy |hnco (d3|H_N_noesy <.1> d1|hnco,1 && 0 < n|hnco,1) -u -n 50

 

In this example the adjacency score for the two fragments is incremented by 50 points (unscaled due to the -u flag) for each peak in the |H_N_noesy working buffer of the left-hand fragment whose d3 dimension matches the d1 dimension (the amide proton dimension) of the first peak in the |hnco assignment buffer within a tolerance of 0.1 if and only if the n value of that HNCO peak is greater than 0. The -n includeF flag indicates that a NOESY-type matching is to be done in which only the best match for each peak in the left-hand buffer is used to evaluate the score. Thus if the |H_N_noesy buffer contained 10 peaks and was being compared to every peak in the |hnco buffer (instead of to only the first peak) and if the |hnco buffer contained 100 peaks, then a maximum of 500 points (50*10) could be generated using the -n flag (compared to a maximum of 50 points using the -b flag and a maximum of 50000 points (50*10*100) using the -a flag). Note that the -n option is generally appropriate whenever one or both of the buffers being compared are NOESY-type working buffers.
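The effect of the three includeF flags on the maximum score can be illustrated with a short Python sketch. This is not CONTRAST code: the function name overlap_score, its arguments, and the inclusive treatment of the tolerance are assumptions made for illustration, and only unscaled (-u) scoring is modelled.

```python
def overlap_score(left_shifts, right_shifts, tol, points, include="b"):
    """Unscaled adjacency score between two buffers of chemical shifts.

    include mirrors the includeF flag: "b" (best match only), "n" (best
    match for each left-hand peak) or "a" (every matching pair).
    """
    # Index of the left-hand peak for every left/right pair within tolerance.
    # Treating the tolerance as inclusive is an assumption.
    hits = [i for i, left in enumerate(left_shifts)
            for right in right_shifts
            if abs(left - right) <= tol]
    if not hits:
        return 0
    if include == "b":
        return points                    # a single award for the best pair
    if include == "n":
        return points * len(set(hits))   # one award per matched left-hand peak
    if include == "a":
        return points * len(hits)        # one award per matching pair
    raise ValueError("include must be 'b', 'n' or 'a'")
```

With 10 left-hand peaks and 100 right-hand peaks that all match, the sketch returns 50, 500, and 50000 points for the b, n, and a modes respectively, which agrees with the maxima quoted above.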


Chapter 11

Amino Acid Tests

In order to make effective use of a protein sequence, the amino acid types of at least a few CONTRAST fragments must be partially determined. The ability to reduce the number of sequence positions to which a fragment can be assigned allows the sequence to be used to generate constraints. Many errors can occur at the primary level of assignments that can produce errors in the constraints generated at the amino acid type assignment stage. In fact the amino acid type assignment stage is prone to errors even when the primary assignments are perfect. CONTRAST allows for these errors by allowing fragments to be scored at every position in the sequence. Correctly assigned regions of the protein sequence may score well enough to compensate for errors; thus the program can arrive at correct assignments even when there are errors in the primary and amino acid type assignments.

There are several different methods for making amino-acid-type assignments. Each method will be discussed in a separate section and can be used separately, but best results are obtained when all of the described methods are used together.

11.1 Peak Labels

Each peak in the source spectrum is a starting point for each fragment that the program constructs. When the peaks of the source spectrum file (or any other spectrum file) are read into the CONTRAST program, comments are also read into the program and are associated with each peak that they follow (see section @@.@). Any amino acid type assignment that is already known by the user as a result of previous work can be used to bias future CONTRAST assignments by adding specially formatted peak labels to the source peaks that will give rise to the fragments for which extra information is known. The strength of the bias is proportional to the magnitude of the score that the label instructs the program to award to the assignment.

CONTRAST peak labels are added to a peak's comment field (after the "**" symbol following the peak but before the newline). These peak labels are enclosed in square brackets and simply specify a list of residue identifiers and a score separated by commas or spaces. Figure 11.1 is an example of a spectrum file that uses peak labels.


hnco

3 5 (95)

com 50

Hn Hni .1 Hni

N Nai .1 Nai

CO Co- .1 Co-

** Peakpicked 9/9/99 from prothnco.ser expt.

9.1 110.0 180.0 10000 ** hnco1 [L,I,V 100]

9.2 111.0 181.0 11000 ** hnco2 [N20,N25 100]

9.3 112.0 182.0 12000 ** hnco3 [ala -1000]

9.4 113.0 183.0 13000 ** hnco4 [G 10] [A 20]

10 100.0 160.0 1 ** bogus [P45 100000]

 

Figure 11.1 Sample spectrum file. The file demonstrates the use of peak labels for amino-acid type assignment. Each peak gives rise to a fragment. When the first fragment (the fragment arising from the first peak) is assigned to leucine, isoleucine, or valine positions in the sequence, 100 points are added to whatever amino acid type score is generated by other techniques. If the second fragment is assigned to asparagine 20 or 25, then the assignment is also awarded 100 points. If the third fragment is assigned to any alanine position, then 1000 points are subtracted from the score of the assignment. If the fourth fragment is assigned to any glycine residue, then 10 points are added to the assignment score, and if it is assigned to any alanine, then 20 points are added. Finally, if the fifth fragment is assigned to P45, then the assignment receives 100000 extra points.
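The label format used in Figure 11.1 can be made concrete with a small Python sketch that extracts labels from a peak comment. This is an illustration only, not the parser used by CONTRAST; the function name parse_peak_labels is hypothetical, and the sketch assumes that the last token inside each pair of brackets is the score.

```python
import re

def parse_peak_labels(comment):
    """Return a list of (residue_ids, score) pairs found in a peak comment."""
    labels = []
    # Each label is the text between one pair of square brackets.
    for body in re.findall(r"\[([^\]]*)\]", comment):
        # Residue identifiers and the score are separated by commas or spaces.
        tokens = [t for t in re.split(r"[,\s]+", body.strip()) if t]
        if len(tokens) < 2:
            continue  # need at least one residue identifier and a score
        try:
            score = float(tokens[-1])  # assumption: the score comes last
        except ValueError:
            continue  # not a well-formed label; ignore it
        labels.append((tokens[:-1], score))
    return labels
```

For the fourth peak of Figure 11.1, parse_peak_labels("hnco4 [G 10] [A 20]") yields two labels, one for glycine with a score of 10 and one for alanine with a score of 20.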

CONTRAST sequential assignment functions check special AA Comment Tests for instructions on how and when to make use of peak labels (see the following section). Peak labels will be ignored if these set aa tests have not been read into the program before the sequential assignment step.

Assigning proteins should be an iterative process. The use of peak labels allows alternative assignments to be tried out by biasing a fragment away from a previous assignment, towards another assignment, or both. Negative biasing (discouraging a particular assignment) is accomplished simply by entering a large or small negative score for the unwanted assignment. One misassignment can have a cascading effect. The correct fragment displaced by the incorrectly assigned fragment may in turn displace another correct assignment, which may in turn displace another, and so the process may continue. By finding that one incorrect assignment and displacing it using negatively biased peak labels, one can go from nonsensical assignments to correct assignments in one small step.

Positive biasing (encouraging a particular assignment or set of assignments) is accomplished by giving the fragment a positive score if it is assigned to a desired set of sequence positions. Positively biased peak labels can usually be used even before the automated assignment process has begun. Perhaps the most important example is using phony peaks with positively biased peak labels to fill in known gaps in the data. For instance, if the HNCO spectrum is used as source, then fragments cannot normally be generated for proline residues since they do not contain amide protons. In this case phony peaks (peaks that are made up by the user so that the dimensions are not likely to match other resonances from other spectra) are created and peak labels with very high scores for the respective proline positions are added to the peaks (e.g. [P45 10000]). This fills in the "gaps" in the data and minimizes the time other fragments will be tried out at those positions.

11.2 AA Comment Tests

AA comment tests are read into the CONTRAST program to instruct sequential assignment programs how to use peak labels in the comments of source spectrum peaks to aid in amino acid type assignment. The set aa function is used to read AA comment tests into the CONTRAST program. This function is similar to the Set Ovl function described in section 10. Its general format is as follows:

set aa AAname source[,] (c|buffName[,peak]) [-flag] scale

set aa = the command name

AAname = the name of the amino acid for which the test will apply

source = the name or number of the source spectrum

buffName = the name of the buffer containing the peak whose comment field is to be checked

peak = the number of the peak in the buffer that should be checked

flag = flag that causes the points awarded by the peak label to be scaled by the deviation of the peak in question (-s) or to be left unscaled (-u, the default).

scale = a scaling factor entered as a percentage: a value of 100 or 0 indicates that the score included in the comment is not to be scaled, and a value between 0 and 100 indicates that the score is scaled by that percentage.

 

The following is a typical example of an AA comment test.

 

set aa L 5, (c|source,1) 0

 

In this example, the sequencing function is instructed that spectrum 5 is the source spectrum and that it should consult the comment of the first peak in the buffer named |source for the fragment (c|source,1) when determining the amino acid type of a fragment that is being scored at a leucine residue. It is typical for the source buffer to be specified as the buffer to check for peak labels, but any other buffer can be specified. If the comment associated with the specified buffer in a fragment contains a peak label such as [L,V 100] and that fragment is being tried at a leucine or valine position in the protein sequence, or if the comment contains an expression such as [L5 100] and that fragment is being tried at leucine 5 in the sequence, then the placement will be awarded 100 additional points.
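The matching rule described above (a bare residue type matches every position of that type, while a type followed by a number matches only that sequence position) can be sketched in Python as follows. The function label_bonus and its data layout are hypothetical, and the rule that a label is counted at most once per placement is an assumption rather than documented CONTRAST behaviour.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

def label_bonus(labels, residue_type, position):
    """Sum the scores of all labels that match a trial placement.

    labels: list of (residue_ids, score) pairs, e.g. [(["L", "V"], 100)].
    residue_type: one-letter code of the sequence position being tried.
    position: sequence number of that position.
    """
    total = 0
    for residue_ids, score in labels:
        for ident in residue_ids:
            match = re.fullmatch(r"([A-Za-z]+)(\d*)", ident)
            if match is None:
                continue
            name, number = match.group(1).upper(), match.group(2)
            name = THREE_TO_ONE.get(name, name)
            if name == residue_type and (number == "" or int(number) == position):
                total += score
                break  # assumption: each label is counted once per placement
    return total
```

For example, the label [L,V 100] gives a bonus of 100 at any leucine or valine position, whereas [L5 100] gives a bonus only when the fragment is tried at leucine 5.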

NOTE AA comment tests are a special type of amino acid test and should not be confused with the "normal" amino acid tests. The parentheses in all other amino acid tests must contain a Boolean expression and not just a field location as is required in AA comment tests. Furthermore, AA comment tests for any given amino acid must be read into CONTRAST before any other amino acid tests for that amino acid; otherwise the results cannot be predicted.

The form of the amino acid name specified in the set aa command is not important if it is a standard name for one of the standard 20 amino acids; if this is not the case then it should match the amino acid name used in the sequence. For example if the sequence input file for CONTRAST included a glx residue, then glx must be specified in the set aa command in order for the amino acid comment test to be applied to that residue.

One can use peak comments to "lock in" or direct the assignments of particular peaks and fragments. Not only can this facilitate "bookkeeping", but it also provides a means for other CONTRAST functions to explore and evaluate alternative assignments in an iterative fashion.

 

11.3 General Amino Acid Tests

General amino acid tests (those that are not AA comment tests) are used to award points when the chemical shifts of specific resonances in the fragment lie within specified ranges. They are entered into the CONTRAST program using the set aa command similarly to how the set aa function is used to read in AA comment tests. The general form is:

set aa AAname source[,] |buffName (Boolean) [-flag] score [** comment]

set aa = the command name.

AAname = the name of the amino acid for which the test will apply.

source = the name or number of the source spectrum.

buffName = the name of the principal buffer that is being tested by the test.

Boolean = a Boolean expression. If the Boolean is true for a particular fragment, then points are awarded to the amino acid type score at the position specified by AAname.

flag = flag that causes the points awarded by the test to be scaled by the deviation of the peak in question (-s) or to be left unscaled by default (-u).

score = the points awarded when the Boolean evaluates to true.

comment = a mnemonic phrase to describe the function of the amino acid test.

 

Amino acid tests are used to implement several different amino acid type assignment strategies used by high-level CONTRAST functions. Depending on the method tens or thousands of tests are created to score amino acid type assignments. An example of a single test follows.

set aa E 5, |Hbi (d2|Hbi,1 <1.55> 2.15 && n|Hbi,1 > 0 && -> (4)

d2|Hgi,1 <0.75> 1.95 && n|Hgi,1 > 0) 100 ** LEKR

 

In this example, if the second dimension of the first peak in the Hbi buffer (d2|Hbi,1) is within a tolerance of 1.55 ppm of 2.15 ppm, if the d2 dimension of the first peak of the Hgi buffer is within 0.75 ppm of 1.95 ppm, and if the endpoint fields of both peaks indicate that the peaks are well-connected to the spin system, then 100 points are added to the global score when the fragment is tried at a glutamate position in the sequence. The arrow symbol "->" is a CONTRAST symbol that informs the program that the command is continued on the next line. The comment of the test lists the amino acid types, "LEKR" (leucine, glutamate, lysine, or arginine), for which this particular test can score true if the resonances of the fragment fall within the input standard ranges. The number of amino acids associated with an amino acid test is a measure of the resolving power of the test. In the ideal case, if the Hb and Hg chemical shifts of a fragment fall within the two chemical shift ranges specified in the Boolean expression, then according to the test the fragment must be either a leucine, a glutamate, a lysine, or an arginine (assuming that the fragment is correctly assigned and that the two chemical shifts fall within the chemical shift ranges read into CONTRAST). The example given is a glutamate test (set aa E ...); identical tests for each of the other amino acid types (L, K, and R) should also be generated so that any fragment that tests positive for one type of residue will test positive for the other three if the resonances of the fragment fall within the chemical shift ranges input into the program.
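The logic of the example test can be restated as a short Python sketch. The helper names within and glutamate_test are hypothetical, and reading the tolerance operator as an inclusive comparison is an assumption; the numerical values are those of the example above.

```python
def within(value, tol, target):
    """Model of the '<tol>' operator: true when |value - target| <= tol."""
    return abs(value - target) <= tol

def glutamate_test(hb_shift, hb_n, hg_shift, hg_n, points=100):
    """Points awarded by the example test for one fragment.

    hb_shift, hg_shift: d2 shifts (ppm) of the first Hbi and Hgi peaks.
    hb_n, hg_n: endpoint ('n') fields of those peaks.
    """
    passed = (within(hb_shift, 1.55, 2.15) and hb_n > 0 and
              within(hg_shift, 0.75, 1.95) and hg_n > 0)
    return points if passed else 0
```

A fragment with Hb at 2.0 ppm and Hg at 2.3 ppm, both with positive endpoint fields, would receive the 100 points; the same fragment with an Hg endpoint field of zero would receive none.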

11.4 Geometry Tests

Set aa tests can be used to test for distinguishing features of amino acid structure. Such tests are called geometry tests. They use the same format as that described in section 11.3, but rather than awarding points for resonances that fall within chemical shift ranges, they generally are used to subtract points when differences between the structure of the assignments and the structure of the amino acid type are found. The following is an example of a geometry test.

 

set aa S 5, |Hgi (n|Hgi,1 > 0) -s -b -200

 

In this example, if the "n" (endpoint) field of the first peak in the Hgi buffer of a fragment is greater than zero (n|Hgi,1 > 0), this indicates that a resonance with significant connectivity to the rest of the fragment was assigned as Hg (the resonances assigned in the Hgi buffer), and the sequential assignment function should subtract 200 points from the fragment's score whenever it is assigned to a serine position (S) in the sequence. The Contrace function only sets the endpoint field "n" of a peak to a positive value (indicating a connection to the rest of the fragment) if a stringent set of criteria is met, so the endpoint field is only extremely rarely set to a positive value when it should not be. This is a conservative way to use amino acid topologies to determine amino acid type, since the penalty for a violation is small, and these violations do not prevent the sequencing function from mapping these spin systems to those amino acid types for which violations occur. Since the endpoint field is often set to zero (indicating a weaker connection to the assigned fragment) before the spin system has been fully traced, geometry tests do not explicitly penalize the mapping of amino acid types to spin systems with endpoint fields that are set to zero within the limits of the size of the amino acid. Since all spin systems are traced by Contrace as far as the correlations in the data allow, the sequential assignment algorithms are able to consider all of the resonances in the fuzzy spin systems in generating amino acid assignments.

11.5 The Reside Function

The Reside (Residue Identification) function automatically generates the comment tests, geometry tests and general amino acid tests described in the preceding sections. It takes data input into the CONTRAST program using Set Shiftr commands and Set Frag commands and generates amino acid type identification tests that will be used by sequencing functions to generate sequential assignments. The tests generated by the Reside function enable CONTRAST sequencing functions to determine the relative likelihood that a particular fragment originated from a specific type of amino acid. Figure 11.5.1 is a schematic of the Reside function.


Figure 11.5.1 Diagram of the Reside algorithm. In the first step, input chemical shift ranges for each resonance in the fragment are divided into all possible subregions that have widths greater than an input resolution. Boundaries for these subregions are taken from the set of all upper and lower bounds for all input chemical shift ranges for each particular resonance. For each resonance each subregion is then associated with the set of all residue types whose full chemical shift ranges for that resonance intersect the subregion. All intersecting subregions with identical amino acid type sets are combined, and all subregions with sets of amino acid types whose cardinality equals the number of identifiable amino acids in the sequence are eliminated. Amino acid type tests are generated by taking all combinations of subregions between the different resonances (including combinations in which there are no subregions taken from a resonance). Tests are eliminated if the cardinality of the intersecting set of the amino acid sets associated with the subregions represented in the test is greater than a user-defined value that is generally inversely proportional to the number of resonance ranges combined to form the test. When all amino acid tests have been assembled, intersecting tests (tests with intersecting resonance subregions that are associated with identical sets of amino acid types) are combined by taking the union of each resonance subregion. At this point if there are amino acid types in the sequence that are not represented in a user-defined minimum number of tests, one dimensional tests (tests involving only one resonance range) that include underrepresented amino acid types are added to the list of tests, starting with the most discriminating tests possible and continuing until the user-defined minimum or maximum is reached.
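The first steps of the algorithm in Figure 11.5.1 (forming subregions for a single resonance, associating each with a set of residue types, combining subregions with identical sets, and discarding uninformative ones) can be sketched in Python. This is one literal reading of the caption, not the Reside source code. The function name reside_subregions, the open-interval intersection test, and the use of the number of input ranges as the number of identifiable amino acids are all assumptions.

```python
from itertools import combinations

def reside_subregions(ranges, resolution):
    """Subregions of one resonance with their associated residue types.

    ranges: dict mapping residue type -> (low, high) shift range in ppm.
    Returns a sorted list of (low, high, frozenset_of_types).
    """
    # Candidate boundaries: every lower and upper bound of every range.
    bounds = sorted({b for low_high in ranges.values() for b in low_high})
    by_types = {}
    for low, high in combinations(bounds, 2):   # sorted input, so low < high
        if high - low <= resolution:
            continue                            # narrower than the resolution
        types = frozenset(t for t, (r_low, r_high) in ranges.items()
                          if r_low < high and low < r_high)
        if not types or len(types) == len(ranges):
            continue                            # empty, or matches every type
        by_types.setdefault(types, []).append((low, high))
    merged = []
    for types, intervals in by_types.items():
        intervals.sort()
        cur_low, cur_high = intervals[0]
        for low, high in intervals[1:]:
            if low <= cur_high:                 # intersecting: take the union
                cur_high = max(cur_high, high)
            else:
                merged.append((cur_low, cur_high, types))
                cur_low, cur_high = low, high
        merged.append((cur_low, cur_high, types))
    return sorted(merged, key=lambda region: (region[0], region[1]))
```

With three made-up ranges X = 0 to 4, Y = 2 to 6, and Z = 5 to 8 ppm and a resolution of 1.5 ppm, the sketch returns four subregions: 0 to 2 (X only), 0 to 5 (X or Y), 4 to 8 (Y or Z), and 6 to 8 (Z only). The 1 ppm interval between 4 and 5 in which only Y occurs is too narrow to form a subregion of its own.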

The resonances to be included in the Reside calculation are specified by fragment description statements of the form:

 

set frag d3|Cbi,1 = Cbi (n|Cbi,1 > 0) * q|Cbi,1

 

In the example above, the specified resonance is found at the third dimension of the first peak of the Cbi buffer (d3|Cbi,1), and it is defined to be the beta carbon of the ith residue (Cbi). The Boolean expression (in parentheses) contains any additional conditions to add to the amino acid tests for that resonance, and the final term (* q|Cbi,1) instructs Reside to multiply all scores generated by amino acid type tests that include the Cbi resonance by the quality factor of the peak containing the specified resonance (q|Cbi,1). The quality factor is a general purpose CONTRAST variable that is associated with each peak in a buffer. The function used to generate the quality factor is dependent on the pathway taken by the spin system assembly algorithm and the desire of the user, but it either represents an estimate of the ambiguity of the assignment (a) or an estimate of the confidence in the assignment (1-a). The Contrace function automatically generates fragment descriptions for all resonances it assigns, but since the user is able to define quality factors that would not function well as scaling factors, the Contrace function does not automatically generate instructions for the use of the quality factor. Scaling factors can be added to the CONTRAST macro after the Contrace function by repeating the fragment description and including the desired scaling factor. Furthermore, since the calculation time of Reside scales exponentially with the number of resonances that the function is evaluating, the user should limit the resonances tested by the Reside function by setting the resonance types to NULL for all fragment descriptions that the user wishes to omit. For example, the following command will overwrite the command above with the effect that Reside will not generate tests that involve the Cbi resonance.

set frag d3|Cbi,1 = NULL (n|Cbi,1 > 0)

The general syntax for the Reside function follows.

Reside S[,] [>aa.mac] [-res R] [-max1 X1 -max2 X2 ... -maxi Xi] [-mint N] [-pts P]

S The name or number of the source spectrum.

aa.mac The name of the output macro file to be generated.

R The minimum resolution of a test. No chemical shift ranges in the generated tests should be less than R ppm.

X1 Instructs Reside to filter out 1-resonance amino acid tests that could score true for more than X1 different amino acids.

X2 Instructs Reside to filter out 2-resonance amino acid tests that could score true for more than X2 different amino acids.

Xi Instructs Reside to filter out i-resonance amino acid tests that could score true for more than Xi different amino acids.

N Instructs Reside to continue generating tests until each amino acid in the sequence is included in at least N tests.

P The number of points before scaling awarded each test generated if the probability distributions of the amino acid ranges are not used to generate points.

 

The following is an example call to the Reside function.

reside 1, >CaCb.mac -res .5 -max1 8 -max2 4 -mint 1 -pts 100

 

In this example 1 is the number of the source spectrum and CaCb.mac is the name of the output file to which the amino acid tests will be written. [-res .5]: The smallest resonance range permitted to be treated is 0.5 ppm. [-max1 8]: If a test can score true for over 8 well-behaved (all resonances within the standard ranges input with the Set Shiftr commands) amino acid types and if the test checks only 1 resonance range, then the test will be deleted. [-max2 4]: Likewise all tests that check 2 resonance ranges and can score true for over 4 well-behaved amino acid types will also be deleted. [-mint 1]: If there are no amino acid tests that can possibly score true for a well-behaved representative from a given amino acid type, then the program will continue generating amino acid tests (with relaxed standards) until every amino acid type is represented by at least one test. Finally "-pts 100" means that each amino acid test will give a maximum score of 100. The absolute value of the points generated by Reside is not critical since the program scales the points awarded by amino acid tests to a user-defined multiple of the number of connectivity points generated by overlap tests for the best-connected fragments in the sequence. This ratio is set using the set so function.

Although -max1 through -maxi flags are optional, it is usually a good idea to use them to limit the number of tests generated. In general the maximum number of amino acids that can receive a score for a given test should in all cases be less than 10. The more resonances that a test includes, the lower the maximum number of amino acids scoring true for the test should be. For example if a test only checks a region of the alpha carbon dimension, the "-max1" flag might be used to delete the test, if it is possible for over eight types of amino acids to score true for the test and still have alpha carbon resonances lying in the standard ranges characteristic for those amino acids. A limit of up to four amino acids (-max2 = 4) is appropriate for tests that check two different resonance ranges -- say both alpha and beta carbons. For tests that check still more dimensions, it is appropriate to set the -maxi flags even lower still.
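The filtering performed by the -max flags can be summarized in a few lines of Python. The function keep_test and the dictionary layout are hypothetical; the sketch only restates the rule that a test checking i resonance ranges is deleted when it can score true for more than Xi amino acid types.

```python
def keep_test(aa_types, n_resonances, max_by_count):
    """Return True if a generated test survives the -max filters.

    aa_types: residue types for which the test can score true.
    n_resonances: number of resonance ranges the test checks.
    max_by_count: dict such as {1: 8, 2: 4} for "-max1 8 -max2 4".
    """
    limit = max_by_count.get(n_resonances)
    if limit is None:
        return True                  # no -maxi flag given for this size
    return len(aa_types) <= limit    # deleted only when over the limit
```

With the settings of the example call (-max1 8 -max2 4), a one-resonance test that is true for nine amino acid types would be deleted, while a two-resonance test that is true for exactly four types would be kept.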

It is a good practice to limit the total number of resonances read into any single Reside function by including unrelated resonances in separate Reside calls. For example if a fragment contains Ca and Cb resonances from the previous residue, then these should be treated separately from the other resonances in the fragment that are from the current amino acid. All the resonances from the current residue should be excluded by setting resonance types to NULL using the Set Frag command, and then the Reside function should be run for the two remaining resonances.  After that the Ca and Cb assignment buffers corresponding to the previous residue should have resonances set to NULL and all the other assignment buffers should have resonances set back to appropriate values so that a second call to the Reside function can be made for the current residue resonances.

11.6 Reside Probability Scoring

The Set Shiftr function can be used not only to set resonance ranges, but also to read probability distributions for the resonance ranges into CONTRAST. If probability distributions are read into the program, then the Reside function can be used in a much more powerful fashion to generate amino acid tests that yield probabilities instead of point values.

 

Chapter 12

Shuffling Routines

Once fragments have been assembled (either by hand or using the Contrace function), they must be shuffled to match the sequence of the protein. Since CONTRAST fragments all arise from peaks in the source spectrum, a shuffling of fragments corresponds to a shuffling of the peaks in the source spectrum so that the peak from the first residue in the sequence is first in the peak list, the peak from the second residue is second, and so forth. CONTRAST shuffling functions all use overlap tests to determine sequential connectivity, and some sequencing functions use amino acid type tests together with the sequence of the protein to make mappings of the fragments onto the sequence. 

The CONTRAST program includes 12 different shuffling routines for ordering fragments. These functions are Shuffle, AnnealBF, Anneal, AnnealQ, AnnealBFQ, AnnealLQ, AnnealBQ, AnnealAQ, Anneal3Q, ShufQ, AlignQ, and ShufSeq. Of these functions 3 are not recommended (ShufQ, AlignQ, and ShufSeq) and only 2 (Shuffle and AnnealBF) are described in this section. The remaining functions are variations on the AnnealBF function. They are all called in the same way and they have the same function as AnnealBF, but they use different techniques and thus can yield different results. Variations in the assignments generated by the different shuffling techniques are important indicators of the quality of the assignments. One can have more confidence in regions of the assignments that are independent of the technique used to make them, but one must be suspicious of those regions that vary with the technique if the scores for those assignments are comparable.

12.1 The Shuffle Function

The call to the Shuffle function has the following syntax.

shuffle [S,] [-d,-s] ["N"]

S The name or number of the source spectrum.

-d Flag to make shuffle compare the deviations between the top two scores

-s Flag to make shuffle compare the highest score plus the deviations

N The deviation percentage below which deviations are used for comparisons. N=100 means that deviations are always used while N=0 means that raw scores are always used.

 

Shuffle is almost always called with the default parameters as in the following example so the other variations will not be discussed.

shuffle 1

 

In this example shuffle takes all of the overlap tests that have been read into CONTRAST with the Set Ovl function and uses a "best first" method to arrange the fragments so that fragments with the best connectivities to one another are adjacent. The Shuffle function does not use amino acid tests or the sequence of the protein to map the fragments onto the sequence. Instead it forms the fragments into an unbroken circular chain. The user must determine from the scores of the connections between the links where the chain should be broken.
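This guide does not describe the internals of the Shuffle function, so the following Python sketch shows only one plausible "best first" scheme: the highest-scoring links are accepted first, a link is rejected if it would close the chain prematurely, and the finished circular chain is then cut at its weakest link. The functions best_first_chain and linear_order and the score callback are hypothetical and are not part of CONTRAST.

```python
def best_first_chain(n, score):
    """Greedy circular ordering of n fragments.

    score(i, j) is the adjacency score for fragment i followed by j.
    Returns the links of the circular chain as (i, j, score) tuples.
    """
    if n < 2:
        return []
    pairs = sorted(((score(i, j), i, j)
                    for i in range(n) for j in range(n) if i != j),
                   reverse=True)
    nxt, prev = {}, {}

    def chain_end(i):
        while i in nxt:
            i = nxt[i]
        return i

    for _, i, j in pairs:
        if i in nxt or j in prev:
            continue                 # one of the two ends is already linked
        if chain_end(j) == i and len(nxt) < n - 1:
            continue                 # would close the circle too early
        nxt[i], prev[j] = j, i

    links, i = [], 0
    for _ in range(n):
        links.append((i, nxt[i], score(i, nxt[i])))
        i = nxt[i]
    return links

def linear_order(links):
    """Cut the circular chain at its weakest link and list the fragments."""
    weakest = min(range(len(links)), key=lambda k: links[k][2])
    rotated = links[weakest + 1:] + links[:weakest + 1]
    return [i for i, _, _ in rotated]
```

In practice the link scores themselves should be inspected, as described above, rather than relying on an automatic cut at the single weakest link.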

 

12.2 The AnnealBF Function

 

The AnnealBF function employs a hybrid between a "best first" and a simulated annealing algorithm to seek an optimum mapping of the fragments to the protein sequence. The function uses overlap tests and amino acid tests together with the sequence of the protein to make sequential assignments. Rather than searching for a minimum "energy" like traditional simulated annealing algorithms, this algorithm searches for a global maximum score which is a scaled combination of all of the connectivity and amino acid scores plus bonuses such as comment label bonuses (see Section @). The command line call to the AnnealBF function has the following syntax.

annbf S[,] ["Temp[, Tfactor[, MaxPerTemp[, MinPerTemp[, LoTemp]]]]"] [-F] [-O] [-x N]

S The name or number of the source spectrum.

Temp The percentage of the highest possible temperature change that will be used to determine the starting temperature.

Tfactor The percentage by which the temperature gets reduced at each annealing round.

MaxPerTemp The number of attempted moves divided by the number of source peaks per temperature level before the temperature is lowered.

MinPerTemp The number of successful moves at a temperature level that will allow for an early exit from that level.

LoTemp The absolute temperature that the algorithm must reach before exiting.

F A flag that can have the values of either 's', 'm', or 'u' that determines how well connectivity scores are scaled to fall between 0 and 100.

's' Rigorously scaled connectivity scores. (Requires more calculation time.)

'm' Moderate level of scaling.

'u' Unscaled connectivity scoring.

O An optional flag that can have a value of either 'b' or 'l' and that determines the type of overlap scoring used by the AnnealBF algorithm.

-b [DEFAULT] Nonlinear overlap scoring (bonus awarded for the best overlap scores).

-l Linear overlap scoring (no bonuses awarded).

N Value that follows the -x flag that indicates the number of extra cycles that the algorithm will go through at the end of the simulated annealing cycle. (Default: End with the simulated annealing cycle.) Suggested: -x1

 

Note that if an optional quoted parameter is to be specified, then all of the parameters before it must also be specified, since the parameters are defined by their positions in the parameter sequence. This function and the other Anneal functions are the only functions that use such a parameter list. The following is an example of a call to the AnnealBF function.

 

annbf 1, "50, 2, 100, 10, .1" -u

In this example all of the default values for optional parameters are indicated. To change only the Temp the function can be called:

annbf 1, "40" -u

but to change the MinPerTemp value the function must be called:

annbf 1, "50, 2, 100, 8" -u

or

annbf 1, "0, 0, 0, 8" -u

where parameters given zero values are automatically set to their default values. The default values are the same for all of the other Anneal functions, but the AnnealBF function is the only function that takes the -b/-l and the -x N flags.
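The positional rules for the quoted parameter list can be restated as a small Python sketch. The function parse_anneal_params and the key names are hypothetical; the default values are those shown in the first example call above, and a zero entry is replaced by its default as described.

```python
# Defaults in positional order: Temp, Tfactor, MaxPerTemp, MinPerTemp, LoTemp.
NAMES = ("temp", "tfactor", "max_per_temp", "min_per_temp", "lo_temp")
DEFAULTS = (50.0, 2.0, 100.0, 10.0, 0.1)

def parse_anneal_params(quoted=""):
    """Turn a quoted list such as "0, 0, 0, 8" into a dict of settings."""
    values = list(DEFAULTS)
    fields = [f.strip() for f in quoted.split(",")] if quoted.strip() else []
    if len(fields) > len(values):
        raise ValueError("at most five positional parameters are accepted")
    for k, field in enumerate(fields):
        value = float(field)
        if value != 0:               # a zero keeps the default for that slot
            values[k] = value
    return dict(zip(NAMES, values))
```

For example, parse_anneal_params("0, 0, 0, 8") changes only the MinPerTemp setting, and parse_anneal_params("40") changes only Temp.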

The Temp parameter can be set to achieve different effects. A Temp parameter of over 100% assures that the combinatorial optimization will be done from a completely random starting point, while Temp parameters of less than 5% can be used to make minor refinements in a sequential ordering. Setting the temp factor to a very small value (e.g. .0001) causes the function to behave like a conjugate gradient maximizer so that it can only find local maxima.

 

Chapter 13

Output Files

CONTRAST output files are for the most part low-level representations of the internal states of the buffers in the program. Very little effort has been made to simplify or interpret the data in the program for two reasons: 1) To do so would hide information from the user, 2) To do so might lead the user to have a false confidence in the assignments. CONTRAST output files force the user to analyze the data in order to extract assignments. It is hoped that this analysis will bring to light the ambiguous or incorrect assignments that are almost always a part of any assignment process. CONTRAST should be used in an iterative and interactive process that involves the user's judgment as much as possible between assignment rounds.

The next several sections describe CONTRAST output options. These options should not be viewed as mutually exclusive. Several of these options should be used at each round of assignments.

13.1 Display (d)

The display function is used to enter display mode for interactive viewing of internal buffer information. Display mode is entered by typing 'd' followed by RETURN at the CONTRAST command line. Once in the display mode any number of display commands can be executed interactively. Most of these commands are one character commands which do not require a RETURN to be entered after the character is typed. Display commands can be launched non-interactively from the CONTRAST command line by typing "d " followed by the capitalized display command character. This will execute the display command and automatically return to the CONTRAST command line.

 

Display Mode Commands:

0 Activates all buffers so that all buffers will be affected by commands.

Num Activates the buffer number Num so that that buffer will be affected by commands.

a Start at first buffer and show buffers with column widths and separations determined automatically.

b Also HOME key on most systems. Display beginning of buffer.

c Set the buffer columns to be displayed.

d Also DOWN key on most systems. Move active buffer(s) one row down.

e Edit indicated buffer fields.

f Enter field format.

gX Go to buffer number X.

h Help. View partial list of display commands.

i Toggle on/off information string for active buffers.

l Also LEFT key on most systems. Shifts display window one buffer to the left.

m Displays buffers using current buffer marker position.

n Toggle on/off buffer name for active buffers.

o Also PAGEDOWN key on most systems. Move active buffer(s) one page down.

p Also PAGEUP key on most systems. Move active buffer(s) one page up.

q Quit display mode.

r Also RIGHT key on most systems. Shift display window right by one buffer.

s Select buffers for display. (A suboption of the 'c' command.)

t Toggle on/off titles of fields in active buffers.

u Also UP key on most systems. Move active buffer(s) one row up.

vc Video Columns. Set the number of columns on video screen.

vr Video Rows. Set the number of rows on video screen.

wb Write Buffer. Write buffer to an ASCII file.

ws Write Spectrum. Write spectrum to an ASCII file.

x Don't change current settings. (An escape from the 'c' command.)

z Start at last buffer and show buffers with column widths and separations determined automatically.

-X Shift display window left by X buffers.

+X Shift display window right by X buffers. (Same as =X).

=X Shift display window right by X buffers. (Same as +X).

13.2 DisplayToFile (dtf)

The DisplayToFile (dtf) command prints the contents of all the buffers to a file in a format similar to that of the interactive Display command. The following is the syntax of the dtf command.

dtf [>]Fname [-w||-a] [-v||-h] ["Header"]

Fname The name of the file to which the buffer information is written.

-w Overwrite Flag. Causes the file to be overwritten if it already exists. [default]

-a Append Flag. Causes the output to be appended to the file if it already exists.

-v Vertical Flag. Causes the buffers to be written vertically (on sequential lines) in the file. [default]

-h Horizontal Flag. Causes the buffers to be written horizontally (across lines) in the file.

13.3 ShuffleToFile (stf)

The ShuffleToFile (stf) command prints a log of the shuffled source spectrum to a file. The following is a sample of the log entry written for one fragment.

[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[ V 138 ]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]

FRAGMENT 117: < 8.78122 124.384 173.658 1 > Comment: hnco117

NEXT FRAGMENT: 105 < 8.64135 116.992 174.554 1 > Comment: hnco105

Top Scoring Fragments: (Choice = 1)

>NEXT = 105 REPEATS = 3 SCORE = 16.5191 [ambig = 22.34]

NEXT = 98 REPEATS = 3 SCORE = 13.2406

NEXT = 99 REPEATS = 3 SCORE = 12.8427

NEXT = 2 REPEATS = 2 SCORE = 12.3592

NEXT = 33 REPEATS = 2 SCORE = 7.974

Buffers:

hnco: (Buffer #697) Search: d1 8.7812 .05 and d2 124.3840 .25

# spectr dev rep ire < Hn N CO ntens > comment

1:hnco 2.40 1 1 < 8.78 124.38 173.66 1 > hnco 117

hnca: (Buffer #698) Search: d1 8.7812 .05 and d2 124.3840 .25

# spectr dev rep ire < Hn N Ca ntens > comment

1:hnca 1.89 1 1 < 8.77 124.46 62.09 1 > hnca 223

hncoca: (Buffer #701) Search: d1 8.7812 .05 and d2 124.3840 .25

# spectr dev rep ire < Hn N <Ca ntens > comment

1:hncoca 1.42 1 1 < 8.79 124.37 62.54 1 > hncoca 12

2:hncoca 0.44 1 1 < 8.82 124.48 51.52 1 > hncoca 200

tocsy: (Buffer #702) Search: d1 8.7812 .05 and d2 124.3840 .25

# spectr dev rep ire < Hn N Ha ntens > comment

1:tocsy 2.24 1 1 < 8.79 124.39 4.35 1 > tocsy 223

hcaco: (Buffer #700) Search: d1 4.3473 .05 and d2 62.0887 .25

# spectr dev rep ire < Ha Ca CO ntens > comment

1:hcaco 0.94 1 1 < 4.36 62.13 175.44 1 > hcaco 144

2:hcaco 0.64 1 1 < 4.33 62.20 174.51 1 > hcaco 19

hcan: (Buffer #699) Search: d1 4.3473 .05 and d3 62.0887 .25

# spectr dev rep ire < Ha N Ca ntens > comment

 

1:hcan 0.68 1 1 < 4.31 116.55 62.09 1 > hcan 178

2:hcan 0.62 1 1 < 4.36 110.81 62.24 1 > hcan 59
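
Since the log is meant to be analyzed by the user, it can be convenient to pull the candidate list out of it with a short script. The following Python sketch parses lines of the "Top Scoring Fragments" block. The regular expression is inferred from the sample above only; the exact spacing of a real stf file is an assumption, and the function name is hypothetical.

```python
import re

# Pattern inferred from the sample log; the real file layout may differ.
CANDIDATE = re.compile(
    r"^(?P<chosen>>)?\s*NEXT = (?P<next>\d+) REPEATS = (?P<repeats>\d+) "
    r"SCORE = (?P<score>[\d.]+)(?: \[ambig = (?P<ambig>[\d.]+)\])?"
)


def parse_candidates(lines):
    """Return one dict per 'NEXT = ...' candidate line; other lines are skipped."""
    out = []
    for line in lines:
        m = CANDIDATE.match(line.strip())
        if m is None:
            continue
        out.append({
            "chosen": m.group("chosen") is not None,  # line marked with '>'
            "next": int(m.group("next")),
            "repeats": int(m.group("repeats")),
            "score": float(m.group("score")),
            "ambig": float(m.group("ambig")) if m.group("ambig") else None,
        })
    return out


sample = [
    "NEXT FRAGMENT: 105 < 8.64135 116.992 174.554 1 > Comment: hnco105",
    ">NEXT = 105 REPEATS = 3 SCORE = 16.5191 [ambig = 22.34]",
    "NEXT = 98 REPEATS = 3 SCORE = 13.2406",
]
for row in parse_candidates(sample):
    print(row)
```

A small gap between the best and second-best scores is a reasonable place to start looking for ambiguous assignments.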

13.4 ShuffleToSpectrum (sts)

The ShuffleToSpectrum (sts) command is used to rearrange the source spectrum in the new "shuffled" order determined by sequencing functions such as Shuffle or AnnBF. The current source spectrum is copied in the new shuffled order to a new spectrum. This new spectrum can then be written to a spectrum file with the WriteSpectrum (ws) command. The ShuffleToSpectrum / WriteSpectrum sequence is useful for doing successive rounds of assignments, since the starting order of fragments is determined by the order of the peaks in the source spectrum.

The syntax of the ShuffleToSpectrum (sts) command follows.

sts S[,] [[>]Fname]

S The name or number of the source spectrum.

Fname The file name to which the shuffled source spectrum is written.

13.5 WriteSpectrum (ws)

The WriteSpectrum command writes a spectrum in memory to a CONTRAST spectrum file. The function reads all of the information in memory and combines that information with any initial header comments from the original spectrum's file in order to create a complete spectrum file.

If a spectrum name is not given, then a name will be created by incrementing the numeric part of the original file's suffix (e.g. hnco.con is converted to hnco.con2 and hnca.con3 is converted to hnca.con4).

Syntax:

ws S [[>]Fname]

S The name or number of the spectrum to be written to a file.

Fname The name of the new spectrum file to be written.
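
The default naming rule for WriteSpectrum can be illustrated with a short Python sketch. It reproduces only the two cases given in the text (hnco.con to hnco.con2, hnca.con3 to hnca.con4); treating a missing numeric part as 1 is an assumption drawn from those examples, and the function name is hypothetical.

```python
import re


def next_spectrum_name(fname: str) -> str:
    """Increment the numeric part of a file suffix, e.g. 'hnca.con3' -> 'hnca.con4'.

    A suffix without a number is treated as number 1 (assumed from the
    'hnco.con' -> 'hnco.con2' example).
    """
    m = re.fullmatch(r"(.*\.[A-Za-z]+)(\d*)", fname)
    if m is None:
        raise ValueError(f"no recognizable suffix in {fname!r}")
    stem, digits = m.groups()
    number = int(digits) if digits else 1
    return f"{stem}{number + 1}"


print(next_spectrum_name("hnco.con"))
print(next_spectrum_name("hnca.con3"))
```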

 

 

Appendix A: GLOSSARY

List Generic term referring to either spectra or buffers.

set of actions. The search can be performed on any attribute of the peak (e.g. dimension 1 > 4.5, intensity < 100, comment > 5, or score > 29). The extra attributes associated with peaks in buffers (e.g. score, deviation value, etc.) can also be used in the search. Other peaks or groups of peaks can also be used in the search. A Boolean expression is used to define the parameters for a search. Boolean search expressions should always be enclosed in parentheses on the command line. A '!' symbol preceding such a parenthetic Boolean indicates that the complement of the set should be used instead of the peaks found in the normal search.

Example: !(d1 < 4.5)

Target The targets of a search are the delimiters that are used in the search. In the search (d1 < 4.5) the first dimensions (d1) of the peaks in the list being searched make the Boolean true if they are less in value than the target value of 4.5. If the peaks of a list are used as target values to search another list, the targets are often expressed with the dimension number following a '%' symbol. For example, (%1 > d2) indicates that the second dimension (d2) of a peak in the list being searched must be smaller than the first dimension of a peak in the target list (%1).

Match If a peak or list of peaks is used to generate target values, a peak that is found in a successful search is said to match the peak used to generate the target value(s).

Tolerance Ranges of values centered at a target value are usually used in searches to do assignments due to the lack of precision of NMR data. A tolerance is one-half of the width of this range. In the Boolean search (6.5 <.02> d1) a "tolerance operator" is specified in order to make the Boolean true if and only if d1 is between 6.5 - .02 and 6.5 + .02. The opposite case (6.5 >.02< d1) is true when d1 is greater than 6.52 or less than 6.48.
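
As a rough illustration, the two tolerance operators can be written as Python predicates. The function names are hypothetical, and whether the end points of the range count as "within" is an assumption (they are treated as inside here).

```python
def within(target: float, tol: float, value: float) -> bool:
    """(target <tol> value): value lies inside the range target +/- tol."""
    return abs(value - target) <= tol


def outside(target: float, tol: float, value: float) -> bool:
    """(target >tol< value): value lies outside the range target +/- tol."""
    return not within(target, tol, value)


# A target of 6.5 with a tolerance of .02 covers the range 6.48 to 6.52.
print(within(6.5, 0.02, 6.51))   # inside the range
print(outside(6.5, 0.02, 6.60))  # outside the range
```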

Flags Alphanumeric characters used to modify the performance of a given function. Flags are immediately preceded by hyphens on the command line. In the following example the '-s' flag causes the Scan function to compare matching peaks using "scaled scores" and the '-b' flag causes the function to return only the "best" scoring match.

Example: scan 1, 1 2 (d1 < 3.0) -s-b |fred

Booleans

Resonance

Fragment

Comment

Coordinate

Dimension

Target

Deviation

Spectrum

Buffer

Assignment Buffer

Working Buffer

Primary Assignment

Sequential Assignment

Amino Acid Type Assignment

BASIC SEARCHES: Most functions that involve searches use a standard Boolean format. The format is necessarily rigid since there are so many options in performing searches in automated assignment work. The format of a basic search is demonstrated by the following equivalent Scan commands:

scan hnca (d1 > 7.2 && d2 <.02> 114.2)

scan hnca (7.2 < d1 && d2 <.02> 114.2)

In this example, the HNCA spectrum is searched for peaks whose first dimension (d1) is greater than 7.2 and (&&) whose second dimension (d2) is within a tolerance of +/- .02 of 114.2.

VARIABLE TARGETS: In the following equivalent examples:

scan hnco hnca (%1 <.02> d1 || %2 <.2> d2)

scan hnco hnca (%2 <.2> d2 || %1 <.02> d1)

peaks in the HNCO spectrum are used to generate targets for searches of the HNCA spectrum. %1 and %2 refer to dimensions one and two respectively of the HNCO spectrum, and d1 and d2 refer to dimensions one and two of the spectrum which is being searched (hnca). Targets are taken from each peak of the HNCO spectrum, and each peak of the hnca spectrum is searched using each set of targets.

BUFFER NAMES: The HNCA peaks for which the Boolean expression is true are added to a buffer which is given the default name, "|hnca", taken from the name of the spectrum being searched. A different name could be specified by adding it to the Scan expression AFTER the Boolean expression. NOTE: buffer names are always designated with a "|" symbol! The following example saves the matching peaks found by the search to a buffer named "|fred":

scan hnco hnca (%2 <.2> d2 || %1 <.02> d1) |fred

If the Boolean of the above example did not contain "%"s to indicate a search target, then the roles of the two listed spectra would be interpreted differently by the search function. In the examples:

scan hnco hnca (8.5 <.02> d1 || 114 <.2> d2)

scan hnco hnca (114 <.2> d2 || 8.5 <.02> d1)

two separate searches are performed. The HNCO spectrum is searched and matching peaks are added to a buffer named "|hnco", and the HNCA spectrum is then searched and the matching peaks are added to a buffer named "|hnca". In the example:

scan hnco hnca (8.5 <.02> d1 || 114 <.2> d2) |fred

matching peaks from both of the searches are added to a single buffer named "|fred". Finally, in the example:

scan hnco hnca hncoca (8.5 <.02> d1 || 114 <.2> d2) |co |ca

matching peaks from HNCO are added to the |co buffer, matching peaks from hnca are added to the |ca buffer, and because there are no more buffer names listed, matching hncoca peaks are added to the |ca buffer (the last buffer listed).
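
The way searched lists are paired with destination buffers in these examples can be summarized in a short Python sketch. It models only the three cases described in the text (no buffer names, one name, fewer names than lists); the function name is hypothetical and this is not CONTRAST's own logic.

```python
def assign_buffers(lists, buffer_names):
    """Pair each searched list with the buffer that receives its matches.

    With no buffer names, each list gets a default buffer named after it.
    Otherwise names are used in order, and the last name is reused once
    the names run out.
    """
    result = {}
    for i, name in enumerate(lists):
        if not buffer_names:
            result[name] = "|" + name
        else:
            result[name] = buffer_names[min(i, len(buffer_names) - 1)]
    return result


print(assign_buffers(["hnco", "hnca"], []))
print(assign_buffers(["hnco", "hnca"], ["|fred"]))
print(assign_buffers(["hnco", "hnca", "hncoca"], ["|co", "|ca"]))
```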

SPECIFYING LISTS: Both buffers and spectra can be searched using search routines. Buffers are referred to as either the buffer name or number following the "|" symbol, and spectra are referred to in the same fashion but without the "|" symbol. For example:

scan hnco |hnca hncoca (8.5 <.02> d1 || 114 <.2> d2) |co |ca

searches a spectrum, a buffer, and then a spectrum.

scan 1 |1 2 (8.5 <.02> d1 || 114 <.2> d2) |fred

searches a spectrum, a buffer, and then a spectrum.

scan 1 |hnco hnca (8.5 <.02> d1 || 114 <.2> d2)

searches a spectrum, a buffer, and then a spectrum.

PARENTHESIS: AND SPACES: All Boolean expressions must be enclosed by parentheses and followed by at least one space. Parentheses can be nested to any level. Spaces are used within Boolean expressions only as delimiters -- otherwise they are ignored.

ARGUMENTS: (VALUES:) The arguments compared in Boolean expressions may take many different forms. The following examples give an idea of the argument syntax used in Boolean expressions. There are many different attributes of peaks in lists and all of them may be accessed to some extent in Boolean expressions. The values can be a part of mathematical expressions of any complexity as long as each expression contains fewer than two variables to be stepped through by the Boolean. A list of functions recognized by Booleans follows this section.

&a = the value of the variable 'a'.

23.1 = a number.

e = 2.7182818

PI = 3.1415927

#|fred = the number of peaks in buffer fred.

w|fred = the level of the buffer fred.

m|fred = the number of dimensions in the indicated peak.

d1|fred,4 = the first coordinate of the 4th peak in fred (or p1).

dx|fred,4 = if any of the coordinates of fred matches the test.

da|fred,4 = if 2 peaks are being compared, then all dimensions must match.

dc|fred,4 = if 2 peaks are being compared, then combinations of at least the minimum number of dimensions between the peaks must match.

d1 = the first dimension of either the buffer or spectrum to be searched.

%1 = the coordinate of the first dimension of the source spectrum.

d1|fred = the d1 value of all combinations of peaks in fred.

d1|fred,f4 = the d1 value of the first four peaks in fred.

i|fred,b = the intensity of the first peak (or d0 or p0) in fred.

c|fred,e = the numeric part of the comment from the last peak in fred.

v|fred,h = the value of the highest valued peak in fred.

g|fred,l = the lowest grade in buffer fred.

d|fred,1 = the deviation of the first peak in buffer fred.

n|fred,1 = the number of internal repeats of the first peak in buffer fred.

t|fred,1 = the tolerance of the value of the first peak in buffer fred.

r|fred,1 = the number of repeats for the first peak in buffer fred.

NOTE: The "|" (buffer) signs above can be replaced with "$" (spectrum) signs to specify spectra instead of buffers. For example:

i$fred,b = the intensity of the first peak in the spectrum fred.

d1$3,h = the value of the highest first dimension in the third spectrum.

#$fred = the number of peaks in the spectrum fred.

d1$hnca,4 = the d1 value of the fourth peak in the spectrum hnca.

d1$3,f4 = the d1 value of the first four peaks in the third spectrum.

EXCEPTION: The "w" field corresponds to the column width when used with spectra, but when used with buffers, it refers to the buffer level.

w$fred = the column width of the spectrum fred.

FUNCTIONS: The following is a list of the most common functions that can be used in Boolean arguments.

* Multiplication. ( w|fred * 2.5)

/ Division. ( d1 / 2 )

+ Addition. ( 23.1 + 413.23 + PI )

- Subtraction. ( d1$fred,4 -8 )

^ To the power of. ( 4^(2^2) )

% Modulus. ( 5 % 2 )

cos Cosine (in degrees). ( cos(90) )

sin Sine (in degrees). ( sin(90) )

tan Tangent (in degrees). ( tan(90) )

log Log base ten. ( log(1) )

ln Natural log (base e). ( ln(.5) )
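
Note that the trigonometric functions above take their arguments in degrees, unlike most programming languages. The following Python sketch shows equivalent calculations for checking values by hand; the helper names are hypothetical. How CONTRAST handles an undefined value such as tan(90) is not stated here, so that case is left out.

```python
import math


def cos_deg(x: float) -> float:
    """Cosine of an angle given in degrees."""
    return math.cos(math.radians(x))


def sin_deg(x: float) -> float:
    """Sine of an angle given in degrees."""
    return math.sin(math.radians(x))


print(sin_deg(90))        # close to 1
print(cos_deg(90))        # close to 0, up to floating-point rounding
print(math.log10(1))      # log base ten
print(math.log(0.5))      # natural log
```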

OPERATORS: AND CONJUNCTIONS: The following is an exhaustive list of legal operators and conjunctions in Boolean statements.

> Greater Than (combinatorial)

>= Greater Than or Equal To (combinatorial)

< Less Than (combinatorial)

<= Less Than or Equal To (combinatorial)

= Equals (combinatorial)

!= Not Equals (combinatorial)

<tol> Within a Tolerance (tol) of (combinatorial)

>tol< Outside a Tolerance (tol) of (combinatorial)

& And (combinatorial)

| Or (combinatorial)

>> Greater Than (synchronous)

>>= Greater Than or Equal To (synchronous)

<< Less Than (synchronous)

<<= Less Than or Equal To (synchronous)

== Equals (synchronous)

!!= Not Equals (synchronous)

<<tol>> Within a Tolerance (tol) of (synchronous)

>>tol<< Outside a Tolerance (tol) of (synchronous)

&& And (synchronous)

|| Or (synchronous)

!(Boolean) The complement of the set of matches found with Boolean.

SYNCHRONOUS: vs. COMBINATORIAL: OPERATORS: AND CONJUNCTIONS: Synchronous operators and conjunctions (doubled symbols, e.g. ">>") force the arguments they act on to come from the same peak or from comparable peaks in the specified lists. For example, in the Boolean

spec1 spec2 (%1 >> d1)

the first peak in spec1 is compared to the first peak in spec2, the second peak in spec1 is compared to the second peak in spec2, and so on until the end of either spec1 or spec2 is reached. On the other hand, combinatorial operators and conjunctions (single symbols, e.g. ">") force all combinatorial possibilities of peaks between the two arguments to be considered whenever the two arguments are from different lists. For example, in the Boolean

spec1 spec2 (%1 > d1)

the first peak in spec1 is compared to the first peak in spec2, the first peak in spec1 is compared to the second peak in spec2, and so on until the first peak in spec1 has been compared to all of the peaks in spec2, and then all subsequent peaks in spec1 are likewise compared to each peak in spec2. If d1 and %1 had referred to the same list, then a synchronous comparison would have been performed. For synchronous operators, symmetry is forced (where allowed by the identical list rule) on the two joined expressions, as if the two expressions were symmetric.
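
The difference between the two kinds of comparison can be sketched in Python: a synchronous comparison steps through both lists in parallel and stops at the end of the shorter one, while a combinatorial comparison tries every pairing. The function names and the sample values are hypothetical; only the pairing order is taken from the description above.

```python
import operator
from itertools import product


def synchronous(targets, values, op):
    """Compare peak i of one list with peak i of the other (like %1 >> d1)."""
    return [(i, i) for i, (t, v) in enumerate(zip(targets, values)) if op(t, v)]


def combinatorial(targets, values, op):
    """Compare every peak of one list with every peak of the other (like %1 > d1)."""
    return [
        (i, j)
        for (i, t), (j, v) in product(enumerate(targets), enumerate(values))
        if op(t, v)
    ]


spec1 = [8.0, 7.0, 9.0]   # values standing in for %1
spec2 = [7.5, 7.5]        # values standing in for d1

print(synchronous(spec1, spec2, operator.gt))
print(combinatorial(spec1, spec2, operator.gt))
```

Each result is a list of (index in spec1, index in spec2) pairs for which the comparison is true, so the combinatorial form can return several matches for a single target peak.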

RULES

1: If any two variable arguments in a Boolean expression come from the same structure, whether buffer, spectrum or peak, then they will be synchronized so that the same element will be used for each argument as each element in the argument is stepped through.

2: Synchronized operators (e.g. >> << == && ||) force symmetry between operands unless overridden by rule number 1. In the following example, d1 and %1 are kept in sync by the <<.02>> operator, and the expressions on either side of && are kept in sync with each other by the && operator.

(d1 <<.02>> %1 && d2|fred > d2|barney)

prompts: In CONTRAST the user enters commands and data at prompts. Prompts from the main menu of commands use the '>' character. At these prompts and similar prompts within other command menus, simply enter the command and press RETURN. Other prompts suggest a default value in angle brackets (e.g. <1.23>: ), which can be accepted unaltered by pressing RETURN or replaced by typing a new value and pressing RETURN. At any type of prompt, the escape key followed by one of the following characters has the indicated effect.

OPTIONS:

ESC-ESC = Escapes out of loops or routines.

ESC-Q = Drops out of the current routine or shell.

ESC-E = Edit the current value being entered. (see editing:)

ESC-D = Edits the Default value for the prompt. (see editing:)

ESC-S = Shells to main command menu.

ESC-Z = Shells to operating system.

ESC-LS = Displays the file names in the current directory.

ESC-O = Displays these prompt options.

ESC-H = Context sensitive help. (see page:)

<-- = Edit current (or default if no characters entered yet) string.

DEL = Delete current string and start over.

BKSPC = Delete last character of current string.

 

 

 

Appendix B: CONTRAST COMMANDS

This section contains an alphabetical list of CONTRAST commands and includes command syntaxes, examples, and known bugs. The following is a sample entry for a command.

CommandAbbreviation (Command Name)

DESCRIPTION

A description of the use of the CONTRAST command.

EXAMPLES

A list of examples of the use of the CONTRAST command and a description of the effect of each example.

SYNTAX

cmnd >value [optional] [opt1 | opt2 | opt3] [-flag1] [-flag2] last

cmnd A valid abbreviation for the command.

[] Optional parameters.

| Or. The use of one parameter excludes the use of the other.

-flag Each flag represents a different command option. A hyphen must precede each individual flag.

italics Text is italicized to show that it should not be taken literally but should be replaced with the text or values that the italicized text describes. E.g. >filename would be replaced with the name of a file.

-> In the syntax section as well as in actual CONTRAST commands, this symbol indicates that a line has been broken and is continued on the next line.

line Underlined parameters must be included at the indicated positions on the command line. (Although the order of parameters is unimportant in many CONTRAST commands, it is good practice to write command parameters in the order in which they are listed in the syntax statement.) All whitespace characters such as spaces and tabs are ignored by CONTRAST.

CAVEATS

RELATED COMMANDS

BUGS

help:, "h", "?"

"al" <align> Calculates rough alignment corrections between spectra.

"ala" <auto lock all> Locks fragments together based on input criteria.

"alt" <alter> Operates on columns in a spectrum (-l to list, -f to file).

"ann" <anneal> Uses simulated annealing to order fragments.

"ap" <autparamset> Allows user to set the parameters used for AUTO.

"aut" <auto> Does automated connectivity tracing (CONTRAST).

"beep" <beep> Produces audible beep. Useful for alerting user to macro end.

"bob" <buffer overlap buffer> Prints the degree of overlap between buffers.

"btf" <buffer to file> Prints specified buffer contents to a file.

"bye" <bye> Also "q" <quit>. Exits program or subprogram.

"cbl" <create buffer link> Allocates buffer links for each peak in a spec.

"clr" <clrb> Clears indicated buffer (default: clears ALL buffers).

"cls" <cls> Clears the screen.

"cob" <children overlap buffer>

"com" <compress> Compresses, sorts and tabulates repeats for buffers.

"conv" <convert> Converts peak list files to CONTRAST format.

"cyc" <cycle> Loops through a series of commands (-b to begin, -c to clear).

"csa" <combined search all>

"cs" <combined search>

"d" <display> Interactive display of one or more buffers in columns.

"dir" <directory> Displays current directory.

"df" <doubletfilter> Removes the doublets from a spectrum.

"ed" <edit> Edits the following command or last command by default.

"ev" <evaluate> Evaluates a numeric expression.

"exe" <execute> Executes a macro file.

"fill" <fill> Fills in missing peaks in one spectrum with peaks from another.

"fit" <fit> Calculates least squares fit to a straight line in a matched file.

"fit0" <fit0> Calculates zero order fit in a matched file.

"fd" <full display> Displays the specified contents of a buffer.

"fdf" <full display to file> Writes specified contents of a buffer to a file.

"fs" <fscan> Fast scan for one target using rapid search from HASH table.

"h","?" <help> Pages to this menu or to specified command.

"hash" <hash> Creates HASH table from active spectra for use by FSCAN.

"int" <intersect> Takes an intersection of specified buffers.

"inta" <intersect all> Takes an intersection for each fragment.

"key" <key binding> Sets the return values of keystrokes.

"lf" <load file> Loads a spectrum in CONTRAST format.

"load" <load> Loads spectra in CONTRAST format using log file.

"lock" <lock> Locks the keyboard until the argument of lock is typed.

"mat" <match> Matches two spectra (-l to list, -f to file).

"mal" <matchalign> Does brute force fine tuning alignment of 2 spectra.

"mm" <main menu> Calls another shell with the main menu.

"op" <operate> Operates on specified dimensions of a spectrum.

"opf" <operate file> Operates on dimensions of spectra and saves to a file.

"ord" <order buffer> Orders the buffer(s) by the specified field.

"ordc" <order counter> Counts the number of sequential comments in a spectrum.

"pa" <page> Pages through a file. Allows for searches. Used by HELP.

"pr" <prompt> Prints prompt (for use within CYCLE or EXE).

"prn" <print> Prints out the global variable or variable list.

"pru" <prunebuffer> Filters multiple occurrences of peaks in different buffers.

"ran" <random> Generates random numbers.

"rec" <recycle> CYCLE command in which edited commands are updated.

"rl" <readlog> Reads global parameters from a log file.

"sa" <scan all> Scans spectra using search string from each peak of spectra.

"saf" <saveasfile> Saves a buffer as a peak list file in CONTRAST format.

"sbob" <single dimension buffer overlap buffer> sbob |buf1,d1 |b2,d3 -s-b .1

"sa" <scan all> Searches other spectra for each peak of a source spectrum.

"sc" <scan> Searches spectra using search strings.

"sco" <score> Calculates the score for all of the elements of a buffer.

"sd" <scaledev> Scales the deviation values in a buffer.

"set" <set> Sets the values of global variables and structures.

"sh" <shell> Shells out to operating system.

"shuf" <shuffle> Orders the peaks in source spectrum based on neighbor's score.

"si" <scale intensities> Scales the intensities of all peaks in a spectrum.

"sn" <score neighbor> Determines amount of overlap between 2 buffer sets.

"sp" <spec> Lists and allows user to define active spectra.

"spl" <split> Splits peaks in buffer into different values.

"ss" <sstr> Produces a list of past search strings which can be edited.

"stf" <shuffle to file> Prints a quick and dirty log of the shuffled spectra.

"stp" <shuffle to plot> Prints shuffling to a data file to plot ambiguity levels.

"sts" <shuffle to spectrum> Produces an ordered copy of the source spectrum.

"ti" <time> Displays the time and date.

"timer" <timer> Calculates the time between timer calls.

"ubl" <update buffer links> Increases the number of buffer links.

"un" <union> Adds the contents of two buffers to a third.

"una" <union all> Forms a union in each fragment.

"wb" <write buffer> Writes buffer to CONTRAST spectrum file.

"wl" <writelog> Writes global parameters to a log file.

"ws" <write spectrum> Writes spectrum to a CONTRAST spectrum file.

"q" <quit> Quits CONTRAST or exits current routine.

"qq" <qq> Quits both CYCLE and CONTRAST.

AAPM (Auto Amino Acid Probability )

DESCRIPTION

Reads in a file in a flat ASCII format and generates amino acid tests based on standard probability distributions of the amino acids in the protein. Uses seq.con (protein sequence file) to calculate probabilities based on the amino acid count of the protein and amino acid probability distributions. Limit 20 dimensions.

SYNTAX

aapm source >file.in >file.out -a-n; >Hx.aa "d3|ntoc,1" (bin.1,av.05,lim.05); ->

>Ca.aa "d3|hnca,1" (bin.5,av.1,lim.05); ...

source

-p,-n = prev test (paa) or next test (naa). DEFAULT = normal test (aa).

-a,-w = append to existing file, or write new file (DEFAULT).

Reads in file, file.in, and uses the dimensions of file.in specified in h1.aa to create an amino acid test for each amino acid listed in h1.aa. Tests are automatically read into program and output is appended to file, file.out. Above, source spectrum is indicated with an integer 1.

EXAMPLES

aapm 1 >file.in >file.out -a-n ->

>Hx.aa "d3|ntoc,1" (bin.1,av.05,lim.05) ->

>Ca.aa "d3|hnca,1" (bin.5,av.1,lim.05) ...

The arrow (->) indicates a continued line. This is taken care of in MainMenu.

aapm 1 "d3|ntoc" (binwidth=.1, av = .05, lower limit = .03) -a -n >flat.cmp >Hx.aa >Hx.mac

aapm 1 "d3|ntoc,1" (bin=.1,av=.05,lim=.03) -a -n >flat.cmp >Hx.aa >Hx.mac

Example format of file, Hx.aa:

d3 = A 13 15

d3 = C 13 15 16

d3 = D 13 15 16...

CAVEATS

RELATED COMMANDS

BUGS

ATE, Aatesteval (AA Test Eval)

DESCRIPTION

Tests the spin system identification routines: aa, paa & naa. The first part goes through each aa test entered and tells the number of times each test...

SYNTAX

ate 1, >file.out "Header to appear at top of entry in file " -a

-a append to file

-w write new file

EXAMPLES

CAVEATS

RELATED COMMANDS

BUGS

AC (Auto Con) - see CT (Contrace)

ALA (Auto Lock All)

DESCRIPTION

NOTE: Parentheses in the Boolean expressions can only be nested to a depth of MAXNEST = 10. NOTE: If math is to be done to calculate the lock, the operations are performed one at a time in order. There is no hierarchical order of operations.
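
The practical consequence of having no hierarchical order of operations is that an expression is evaluated strictly left to right. The following Python sketch illustrates that rule only; it is not the CONTRAST parser, the set of operators handled is an assumption, and negative numbers and parentheses are not supported.

```python
import re


def eval_in_order(expr: str) -> float:
    """Evaluate 'a op b op c ...' strictly left to right, ignoring precedence."""
    tokens = re.findall(r"\d+\.?\d*|\.\d+|[-+*/]", expr)
    result = float(tokens[0])
    # Operators sit at odd positions, operands at even positions.
    for op, operand in zip(tokens[1::2], tokens[2::2]):
        x = float(operand)
        if op == "+":
            result += x
        elif op == "-":
            result -= x
        elif op == "*":
            result *= x
        else:
            result /= x
    return result


# With conventional precedence 2 + 3 * 4 is 14; evaluated in order it is 20.
print(eval_in_order("2 + 3 * 4"))
```

When a lock value depends on a particular grouping, it is safest to order the terms so that left-to-right evaluation gives the intended result.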

SYNTAX

ala 1, if(score > 75% and diff >= 25% and num = 1) lock = %s * 100.0

ala 1, lock if score > 50

ala 1, score > 50 or diff > 25% lock = %d

EXAMPLES

CAVEATS

RELATED COMMANDS

BUGS

ALIGN

DESCRIPTION

SYNTAX

EXAMPLES

CAVEATS

RELATED COMMANDS

BUGS

ALT, ALTF - see OP, OPF (Operate)

ANN (Anneal)

DESCRIPTION

Anneal algorithm that uses ovl tests (Set Ovl). To alter the annealing schedule, all of the parameters must be enclosed IN ORDER in quotation marks, with 0's standing in for the default values. Otherwise default values will be used. These values can be viewed and changed under: "set Temp", "set Tfactor", "set MaxPerTemp", and "set MinPerTemp".

Temp = the percentage of the highest possible energy change.

Set Temp to over 200% for simulated annealing from a random start.

Set Temp to < 5% for using the routine to refine a sequence.

Set Temp to a very small number to make it find only a local minimum.

Tfactor = the percentage that the temperature gets lowered at each annealing round.

Set Tfactor to > 8 for fast rough calculations

Set Tfactor to < 8 for slow refined calculations

MaxPerTemp: MaxPerTemp * NumPeaks = the number of attempted moves per temperature level before the temperature is lowered.

MinPerTemp: MinPerTemp * NumPeaks = the number of successful moves per temperature before the temperature is lowered.
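
To show how these four parameters interact, the following Python sketch runs a generic simulated annealing loop over an ordering. It is not the CONTRAST implementation. The swap move, the Metropolis acceptance rule, the absolute temperature scale, and the lo_temp stopping threshold are all assumptions added so that the loop is complete; only the roles of Tfactor, MaxPerTemp, and MinPerTemp follow the descriptions above.

```python
import math
import random


def anneal(order, score, temp, tfactor=8.0, max_per_temp=100,
           min_per_temp=10, lo_temp=0.1, rng=None):
    """Generic simulated annealing that tries to maximize score(order)."""
    rng = rng or random.Random(0)
    order = list(order)
    n = len(order)
    current = score(order)
    best, best_score = list(order), current
    while temp > lo_temp:                       # assumed stopping rule
        attempted = successful = 0
        # Stay at this temperature until enough moves were tried or accepted.
        while attempted < max_per_temp * n and successful < min_per_temp * n:
            i, j = rng.sample(range(n), 2)
            order[i], order[j] = order[j], order[i]          # propose a swap
            new = score(order)
            delta = new - current
            # Metropolis rule (assumed): always accept improvements,
            # accept losses with probability exp(delta / temp).
            if delta >= 0 or rng.random() < math.exp(delta / temp):
                current = new
                successful += 1
                if current > best_score:
                    best, best_score = list(order), current
            else:
                order[i], order[j] = order[j], order[i]      # undo the swap
            attempted += 1
        temp *= 1.0 - tfactor / 100.0           # lower by tfactor percent
    return best, best_score


def adjacency_score(order):
    """Toy score: number of neighbors that are consecutive integers."""
    return sum(1 for a, b in zip(order, order[1:]) if b == a + 1)


start = [3, 0, 2, 1, 4]
result, result_score = anneal(start, adjacency_score, temp=50.0)
print(result, result_score)
```

A larger tfactor lowers the temperature in fewer rounds, which corresponds to the fast, rough calculation mentioned above, while a smaller value gives a slower and more thorough search.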

SYNTAX

ann 1 "temp, tfactor, maxpertemp, minpertemp"

ann 1 "50, 8, 100, 10"

[Note: The above values represent the default values.]

EXAMPLES

CAVEATS

RELATED COMMANDS

BUGS

AAQ, ANNAQ (Anneal AQ)

DESCRIPTION

Anneal algorithm that uses ovl tests (Set Ovl) and aa tests (set aa ala) to make assignments based on the sequence. This implementation uses the inefficient but robust "best" algorithm which rearranges the sequence by swapping groups of residues by breaking at weak links. The algorithm cycles between using connectivity information and position information, connectivity information only, and position information only to calculate the frequency that proposed moves are accepted.

NOTE: To alter the annealing schedule, all of the parameters must be enclosed IN ORDER in quotation marks, with 0's standing in for the default values. Otherwise, default values will be used. These values can be viewed and changed under: "set Temp", "set deltaTemp", "set MaxPerTemp", and "set MinPerTemp".

Temp = the percentage of the highest possible energy change.

Set Temp to over 200% for simulated annealing from a random start.

Set Temp to < 5% for using the routine to refine a sequence.

Set Temp to a very small number to make it find only a local minimum.

deltaTemp = the percentage that the temperature gets lowered at each annealing round.

Set deltaTemp to > 8 for fast rough calculations

Set deltaTemp to < 8 for slow refined calculations

MaxPerTemp: MaxPerTemp * NumPeaks = the number of attempted moves per temperature before the temperature is lowered.

MinPerTemp: MinPerTemp * NumPeaks = the number of successful moves per temperature before the temperature is lowered.

loTemp: The absolute temperature that the algorithm must go to before exiting.
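
The schedule parameters above can be read as a simple cooling loop. The Python sketch below is only an illustration of that reading, not CONTRAST's implementation. It assumes that the temperature is multiplied by (1 - deltaTemp/100) at each annealing round and that annealing stops once the temperature is no longer above loTemp. The function names are hypothetical.

```python
def temperature_levels(temp, delta_temp, lo_temp):
    """Return the temperatures visited, assuming geometric cooling."""
    if not 0 < delta_temp < 100:
        raise ValueError("delta_temp must be a percentage between 0 and 100")
    if lo_temp <= 0:
        raise ValueError("lo_temp must be positive")
    levels = []
    current = float(temp)
    while current > lo_temp:
        levels.append(current)
        # Assumption: "lowered by delta_temp percent" is multiplicative.
        current *= 1.0 - delta_temp / 100.0
    return levels


def move_limits(max_per_temp, min_per_temp, num_peaks):
    """Attempted and successful move limits for one temperature level."""
    return max_per_temp * num_peaks, min_per_temp * num_peaks


if __name__ == "__main__":
    slow = temperature_levels(50, 2, 0.1)
    fast = temperature_levels(50, 8, 0.1)
    print(len(slow), "levels at deltaTemp = 2;", len(fast), "levels at deltaTemp = 8")
    print(move_limits(100, 10, 120))
```

Under these assumptions a larger deltaTemp visits fewer temperature levels, which is consistent with the advice to use values above 8 for fast rough calculations and values below 8 for slow refined ones.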

SYNTAX

annaq 1 3 "temp, deltaTemp, maxperTemp, minperTemp, loTemp" -s

annaq 1 3 "50, 2, 100, 10, .1" -u

1 = source spectrum

3 = 3 temperature levels before switching the way proposed moves are accepted.

50 = the initial temperature, as a percentage of the highest possible energy change.

2 = percentage by which the temperature is reduced at each annealing round.

100 = the number of attempted moves (x numPeaks) per temperature level before the temperature is lowered.

10 = the number of successful moves (x numPeaks) per temperature level before the temperature is lowered.

.1 = The absolute temperature that the algorithm must reach before exiting.

[Note: The above values represent the default values.]

How well are connectivity scores scaled to fall between 0 and 100?

-s Rigorously scaled connectivity scores. (Requires more calculation time.)

-m Moderate level of scaling.

-u Unscaled connectivity scoring.

EXAMPLES

CAVEATS

RELATED COMMANDS

BUGS

ABQ, ANNBQ (Anneal BQ)

DESCRIPTION

Anneal algorithm that uses ovl tests (Set Ovl) and aa tests (set aa ala) to make assignments based on the sequence. This implementation uses the inefficient but robust "best" algorithm, which rearranges the sequence by breaking it at weak links and swapping groups of residues. NOTE: To alter the annealing schedule you must enclose all of the parameters, IN ORDER, in quotation marks, using 0 for any parameter that should keep its default value. Otherwise, default values will be used. These values can be viewed and changed under: "set Temp", "set Tfactor", "set MaxPerTemp", and "set MinPerTemp".

Temp = the percentage of the highest possible energy change.

Set Temp to over 200% for simulated annealing from a random start.

Set Temp to < 5% for using the routine to refine a sequence.

Set Temp to a very small number to make it find only a local minimum.

Tfactor = the percentage that the temperature gets lowered at each annealing round.

Set Tfactor to > 8 for fast rough calculations.

Set Tfactor to < 8 for slow refined calculations.

MaxPerTemp: MaxPerTemp * NumPeaks = the number of attempted moves per temperature before the temperature is lowered.

MinPerTemp: MinPerTemp * NumPeaks = the number of successful moves per temperature before the temperature is lowered.

loTemp: The absolute temperature that the algorithm must reach before exiting.

SYNTAX

annlbq 1 "temp, tfactor, maxpertemp, minpertemp, loTemp" -s

annlbq 1 "50, 2, 100, 10, .1" -u

1 = source spectrum

50 = the initial temperature, as a percentage of the highest possible energy change.

2 = percentage by which the temperature is reduced at each annealing round.

100 = the number of attempted moves (x numPeaks) per temperature level before the temperature is lowered.

10 = the number of successful moves (x numPeaks) per temperature level before the temperature is lowered.

.1 = The absolute temperature that the algorithm must reach before exiting.

[Note: The above values represent the default values.]

How well are connectivity scores scaled to fall between 0 and 100?

-s Rigorously scaled connectivity scores. (Requires more calculation time.) (DEFAULT)

-m Moderate level of scaling.

-u Unscaled connectivity scoring.

-b Nonlinear overlap scoring (bonus awarded for best overlap scores). (DEFAULT)

-l Linear overlap scoring (no bonuses awarded).

-x # The number of extra cycles that the algorithm will go through at the end of the simulated annealing cycle. (Default: End with the simulated annealing cycle.) Suggested: -x1

EXAMPLES

CAVEATS

RELATED COMMANDS

BUGS

ALQ, ANNLQ (Anneal LQ)

DESCRIPTION

Anneal algorithm that uses ovl tests (Set Ovl) and aa tests (set aa ala) to make assignments based on the sequence. This implementation uses the inefficient but robust "long" algorithm, which rearranges the sequence by swapping groups of residues. To alter the annealing schedule you must enclose all of the parameters, IN ORDER, in quotation marks, using 0 for any parameter that should keep its default value. Otherwise, default values will be used. These values can be viewed and changed under: "set Temp", "set Tfactor", "set MaxPerTemp", and "set MinPerTemp".

Temp = the percentage of the highest possible energy change.

Set Temp to over 200% for simulated annealing from a random start.

Set Temp to < 5% for using the routine to refine a sequence.

Set Temp to a very small number to make it find only a local minimum.

Tfactor = the percentage that the temperature gets lowered at each annealing round.

Set Tfactor to > 8 for fast rough calculations

Set Tfactor to < 8 for slow refined calculations

MaxPerTemp: MaxPerTemp * NumPeaks = the number of attempted moves per temperature before the temperature is lowered.

MinPerTemp: MinPerTemp * NumPeaks = the number of successful moves per temperature before the temperature is lowered.

SYNTAX

annlq 1 "temp, tfactor, maxpertemp, minpertemp" -s

annlq 1 "50, 8, 100, 10" -u

[Note: The above values represent the default values.]

How well are connectivity scores scaled to fall between 0 and 100?

-s Rigorously scaled connectivity scores. (Requires more calculation time.)

-m Moderate level of scaling.

-u Unscaled connectivity scoring.
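
The rule that a 0 in the quoted schedule string selects the default value can be pictured as follows. This Python sketch is an illustration only: the parsing routine is hypothetical, and the default values are simply the ones quoted in the annlq syntax above (50, 8, 100, 10).

```python
# Defaults taken from the annlq example line; the dictionary itself is illustrative.
ANNLQ_DEFAULTS = {
    "temp": 50.0,
    "tfactor": 8.0,
    "maxpertemp": 100.0,
    "minpertemp": 10.0,
}


def parse_schedule(text, defaults=ANNLQ_DEFAULTS):
    """Parse 'temp, tfactor, maxpertemp, minpertemp'; a 0 keeps the default."""
    names = list(defaults)
    fields = [field.strip() for field in text.split(",")]
    if len(fields) != len(names):
        raise ValueError("all %d parameters must be given, in order" % len(names))
    schedule = {}
    for name, field in zip(names, fields):
        value = float(field)
        # A zero means "use the default for this parameter".
        schedule[name] = defaults[name] if value == 0 else value
    return schedule


if __name__ == "__main__":
    print(parse_schedule("0, 2, 0, 10"))
```

In this reading, "0, 2, 0, 10" changes only Tfactor and leaves the other three parameters at their defaults.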

EXAMPLES

CAVEATS

RELATED COMMANDS

BUGS

ANNQ (Anneal Q)

DESCRIPTION

Anneal algorithm that uses ovl tests (Set Ovl) and aa tests (set aa ala) to make assignments based on the sequence. This implementation uses the efficient but overly simplified "swap" algorithm, which rearranges the sequence by swapping individual residues. To alter the annealing schedule you must enclose all of the parameters, IN ORDER, in quotation marks, using 0 for any parameter that should keep its default value. Otherwise, default values will be used. These values can be viewed and changed under: "set Temp", "set Tfactor", "set MaxPerTemp", and "set MinPerTemp".

Temp = the percentage of the highest possible energy change.

Set Temp to over 200% for simulated annealing from a random start.

Set Temp to < 5% for using the routine to refine a sequence.

Set Temp to a very small number to make it find only a local minimum.

Tfactor = the percentage that the temperature gets lowered at each annealing round.

Set Tfactor to > 8 for fast rough calculations

Set Tfactor to < 8 for slow refined calculations

MaxPerTemp: MaxPerTemp * NumPeaks = the number of attempted moves per temperature before the temperature is lowered.

MinPerTemp: MinPerTemp * NumPeaks = the number of successful moves per temperature before the temperature is lowered.

loTemp: The absolute temperature that the algorithm must go to before exiting.

SYNTAX

annq 1 "temp, tfactor, maxpertemp, minpertemp, loTemp" -s

annq 1 "50, 2, 100, 10, .1" -u

1 = source spectrum

50 = the initial temperature, as a percentage of the highest possible energy change.

2 = percentage by which the temperature is reduced at each annealing round.

100 = the number of attempted moves (x numPeaks) per temperature level before the temperature is lowered.

10 = the number of successful moves (x numPeaks) per temperature level before the temperature is lowered.

.1 = The absolute temperature that the algorithm must reach before exiting.

[Note: The above values represent the default values.]

How well are connectivity scores scaled to fall between 0 and 100?

-s Rigorously scaled connectivity scores. (Requires more calculation time.)

-m Moderate level of scaling.

-u Unscaled connectivity scoring.
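
The "swap" strategy described for this command, in which two individual residues trade places, can be illustrated with a generic simulated annealing loop. The Python sketch below is not CONTRAST's scoring or acceptance code. The Metropolis acceptance rule, the toy energy function, and all names are assumptions chosen for illustration.

```python
import math
import random


def anneal_swaps(order, energy, temp, tfactor, lo_temp, moves_per_temp, rng):
    """Generic annealing over an ordering using single-element swaps."""
    current = list(order)
    e_cur = energy(current)
    best, e_best = list(current), e_cur
    while temp > lo_temp:
        for _ in range(moves_per_temp):
            i, j = rng.sample(range(len(current)), 2)
            current[i], current[j] = current[j], current[i]
            e_new = energy(current)
            # Assumed Metropolis rule: always accept improvements, sometimes accept worse moves.
            if e_new <= e_cur or rng.random() < math.exp((e_cur - e_new) / temp):
                e_cur = e_new
                if e_cur < e_best:
                    best, e_best = list(current), e_cur
            else:
                # Rejected: undo the swap.
                current[i], current[j] = current[j], current[i]
        temp *= 1.0 - tfactor / 100.0
    return best, e_best


def misplaced(order):
    """Toy energy: number of items not sitting at their own index."""
    return sum(1 for index, value in enumerate(order) if index != value)


if __name__ == "__main__":
    start = [3, 1, 0, 2, 4]
    print(anneal_swaps(start, misplaced, 50, 8, 0.1, 20, random.Random(0)))
```

The sketch returns the lowest-energy ordering seen during the run, so its result is never worse than the starting ordering under the toy energy.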

EXAMPLES

CAVEATS

RELATED COMMANDS

BUGS

A3Q, ANN3Q (Anneal 3Q)

DESCRIPTION

Anneal algorithm that uses ovl tests (Set Ovl) and aa tests (set aa ala) to make assignments based on the sequence. This implementation uses the inefficient but robust "best" algorithm, which rearranges the sequence by breaking it at weak links and swapping groups of residues. The algorithm cycles among three ways of calculating how often proposed moves are accepted: connectivity information and position information together, connectivity information only, and position information only.

To alter the annealing schedule you must enclose all of the parameters, IN ORDER, in quotation marks, using 0 for any parameter that should keep its default value. Otherwise, default values will be used. These values can be viewed and changed under: "set Temp", "set deltaTemp", "set MaxPerTemp", and "set MinPerTemp".

Temp = the percentage of the highest possible energy change.

Set Temp to over 200% for simulated annealing from a random start.

Set Temp to < 5% for using the routine to refine a sequence.

Set Temp to a very small number to make it find only a local minimum.

deltaTemp = the percentage that the temperature gets lowered at each annealing round.

Set deltaTemp to > 8 for fast rough calculations.

Set deltaTemp to < 8 for slow refined calculations.

MaxPerTemp: MaxPerTemp * NumPeaks = the number of attempted moves per temperature before the temperature is lowered.

MinPerTemp: MinPerTemp * NumPeaks = the number of successful moves per temperature before the temperature is lowered.

loTemp: The absolute temperature that the algorithm must reach before exiting.

SYNTAX

ann3q 1 3 "temp, deltaTemp, maxperTemp, minperTemp, loTemp" -s

ann3q 1 3 "50, 2, 100, 10, .1" -u

1 = source spectrum

3 = 3 temperature levels before switching the way proposed moves are accepted.

50 = the initial temperature, as a percentage of the highest possible energy change.

2 = percentage by which the temperature is reduced at each annealing round.

100 = the number of attempted moves (x numPeaks) per temperature level before the temperature is lowered.

10 = the number of successful moves (x numPeaks) per temperature level before the temperature is lowered.

.1 = The absolute temperature that the algorithm must reach before exiting.

[Note: The above values represent the default values.]

How well are connectivity scores scaled to fall between 0 and 100?

-s Rigorously scaled connectivity scores. (Requires more calculation time.)

-m Moderate level of scaling.

-u Unscaled connectivity scoring.
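
The second argument of ann3q (3 in the example) sets how many temperature levels pass before the algorithm switches the way proposed moves are accepted. One possible reading is sketched below in Python. The mode names and their order follow the DESCRIPTION above, but the assumption that the modes rotate strictly in that order and wrap around is illustrative only.

```python
# Mode order as listed in the DESCRIPTION; strict rotation is an assumption.
MODES = (
    "connectivity and position",
    "connectivity only",
    "position only",
)


def scoring_mode(level_index, levels_per_mode=3):
    """Return the acceptance mode used at a given temperature level (0-based)."""
    return MODES[(level_index // levels_per_mode) % len(MODES)]


if __name__ == "__main__":
    for level in range(10):
        print(level, scoring_mode(level))
```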

EXAMPLES

CAVEATS

RELATED COMMANDS

BUGS

AP, APS, APSET, AUTP (Aut Param Set)

DESCRIPTION

Displays the parameters for the automated tracing of spin systems. The user can change any parameter by typing the letter representing that parameter and following the instructions, or accept the parameters by pressing the return key. This is the screen that lets you change the parameters in autorec. Returns 0 if 'q' is pressed.

Starting Points:

Creating a new file of starting points will create a 'spectrum' which contains the input starting points. Starting points should be entered so that each different type of nucleus is in a different column.

SYNTAX

AutParamSet( aut, &plr, 30 );

EXAMPLES

CAVEATS

RELATED COMMANDS

BUGS

AUTOAA, RESIDE (Auto AA)

DESCRIPTION

Uses chemical shift range information (set shift) and fragment definitions (Set Frag) that have already been read into CONTRAST. If this information has not been read into CONTRAST, the function looks for separate macro files on the command line: the chemical shift range file first (default file name, chemshft.mac) and the fragment template file second (default file name, fragment.mac). The function creates a set of AA tests (default file name, aa.mac), which it enters into the program. If the AA test file already exists, it will be overwritten. If Set Shiftr or Set Frag has already been used to load those tests, the filenames for the tests are not necessary, and the files will not be read even if the filenames are included.

SYNTAX

autoaa >aa.mac [chemshft.mac] [fragment.mac] [-d] [-res RangeResolution] [-max MaxAA]

-d: The "do flag". If -d is used, autoaa loads each command into memory as it is created; otherwise (by default) the indicated file is created and the exe command must be used later to load the "setaa" commands. NOTE: The buffers must be created before Reside can be run with the '-d' option.

-res: RangeResolution is the desired resolution of the AA tests generated by autoaa. A resolution of .2 means that no chemical shift ranges smaller than .2 ppm will be generated.

-maxa: MaxAA is the maximum number of amino acids that a test can score true for the given sequence. If MaxAA is input as a fraction then the number of AA in the sequence will be used to generate the maximum number of AA.

-max1,2,3...: The maximum number of amino acids that can score true for a given test that has 1,2,3,... dimensions.

-mint: Mintests is the minimum number of tests per amino acid type.

-maxt: Maxtests is the maximum number of 1D tests in which the number of amino acids that the test is true for is greater than MaxAA.

EXAMPLES

CAVEATS

RELATED COMMANDS

BUGS

AUTOTRACE (formerly CONTRACER)

DESCRIPTION

Note: pr.score = -1 when the object has been included in spin system.

SYNTAX

AutoTrace( &aut, choptr );

EXAMPLES

CAVEATS

RELATED COMMANDS

BUGS

BEEP

DESCRIPTION

SYNTAX

Beep();

EXAMPLES

CAVEATS

RELATED COMMANDS

BUGS

BOB (Buffer Overlap Buffer)

DESCRIPTION

Calculates and prints the amount of overlap between two buffers by comparing the two buffers on a coordinate by coordinate (as opposed to value by value) basis.

SYNTAX

BufferOverlapBuffer(plrptr1, plrptr2, &rep, &srep, tol, flag);

bob -a |plr1 |plr2 .02

-a = all coordinates checked

-m = minus the matches (default)

-c = split and compress the plr's first

(NOTE: always does the split without including the matches (involves less doubling))
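
A coordinate-by-coordinate comparison, as opposed to a peak-by-peak one, can be pictured with the Python sketch below. It is only an illustration: the actual score reported by bob is not specified here, so the sketch simply counts the coordinates of the first buffer that lie within the tolerance of any coordinate of the second. The function name and data layout are hypothetical.

```python
def coordinate_overlap(buffer1, buffer2, tol):
    """Count coordinates in buffer1 that match any coordinate in buffer2 within tol."""
    # Flatten the second buffer: peaks are treated as bags of coordinates.
    targets = [coord for peak in buffer2 for coord in peak]
    return sum(
        1
        for peak in buffer1
        for coord in peak
        if any(abs(coord - target) <= tol for target in targets)
    )


if __name__ == "__main__":
    plr1 = [(8.93, 172.33), (7.50, 120.10)]
    plr2 = [(8.94, 55.20), (120.11, 4.45)]
    print(coordinate_overlap(plr1, plr2, 0.02))
```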

EXAMPLES

CAVEATS

RELATED COMMANDS

sbob

BUGS

BOP, BOLP, BOVP (Buff Overlap Peak)

DESCRIPTION

Returns the number of times a member of L is also found in peak in rep. Returns the # of times a member of L is found in peak scaled by L dev, L internal confidence fractions, L intens, dev from overlap. Reps are taken care of in compressed list. Each peak must be divided by its internRep for each peak in the buffer.

NOTE: When creating globtol, put most specific labels first and more general last to serve as defaults. Position [0] should have universal tolerance.

NOTE: Overlap must be done on compressed list.

NOTE: If peaks are "man-made" then use intens value as an indication of how good the peak is. 1 has no effect, higher values mean a good peak.

NOTE: "rep" and "srep" can be passed into this routine with nonzero values and scores will be added to them.

SYNTAX

EXAMPLES

CAVEATS

RELATED COMMANDS

BUGS

BTF (Buff To File)

DESCRIPTION

Writes the specified contents of a buffer to a file. By default the contents of the current buffer are appended to the end of the specified file.

SYNTAX

order: Unimportant except that the first unspecified argument (any argument without one of the prefixes: >, ", |, #, :, or -) is taken to be the file name if one is not specified by '>'.

default: Uses current buffer and it appends to end of a file which must be specified.

>filename: The path and file name to save buffer to.

|buffer: The name or number of the buffer to save.

:code: The one letter code identifier for the specification of the buffer.

-flags:

-a = append (default).

-w = overwrite (also -o).

-n = don't print header information.

-h = print header information (default).

"string" The format for each peak in the buffer that gets printed. Embedded quotes are used to show that the exact contents of the quotes should be printed character for character.

# = FIELD WIDTH. Any integer directly following a character in "string" represents the maximum width of the field that the value that the character represents will have.

#.# = PRECISION. Any fractional part following the field width specifies the number of digits that will follow the decimal for real values to be printed.

' ' = SPACE. Prints a space. (Can be repeated by following the space with a FIELD WIDTH argument.)

"" = EMBEDDED QUOTES. The contents of embedded quotes will be printed exactly as is.

c = CODE. Prints the one character nucleus type.

d = DEVIATION. Prints the deviation of MATCH from TARGET.

i = INTENSITY. Prints the intensity of the peak.

m = MATCH. Prints the value that matched TARGET w/in TOL.

n = INTERNAL REPEATS. Prints the number of times that the value was repeated within the same spectrum in the buffer.

p = PEAK. Prints the coordinates of the peak.

r = REPEATS. Prints the number of times that the value was repeated in the buffer.

s = SPECTRUM. Prints the name of the spectrum that the peak was taken from.

t = TOL. The tolerance used in the search (corresponds to the tolerance associated with the nucleus).

v = VALUE. Prints the new value resulting from the search.

EXAMPLES

""spec = "s4 "p= "p6.2 i4.0 3"code 4="c3 "

Result: spec = noes p= 10.34 4.45 4300 code 4= H

spec = hnco p= 133.23 8.93172.33 4200 code 4= n

NOTE: This example illustrates that at the present time this program does not keep the display in correct columns when spectra with different numbers of dimensions are used. To make this output more readable it is suggested that you put the PEAK field at the end of the line. Also note that the format specified by the PEAK arguments is responsible for the spacing between the coordinates. If 'p7.2' was used instead of 'p6.2', the coordinates 8.93 and 172.33 would have been separated.
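
The spacing effect described in the note above can be reproduced with ordinary fixed-width number formatting. The Python sketch below assumes that the PEAK field applies the same width and precision to every coordinate, in the manner of a printf-style field. That behavior is inferred from the example output, and the function name is hypothetical.

```python
def format_peak(coords, width, precision):
    """Format each coordinate in a fixed-width field and join with no separator."""
    return "".join(f"{coord:{width}.{precision}f}" for coord in coords)


if __name__ == "__main__":
    peak = (133.23, 8.93, 172.33)
    print(repr(format_peak(peak, 6, 2)))  # like p6.2: 8.93 and 172.33 run together
    print(repr(format_peak(peak, 7, 2)))  # like p7.2: every coordinate is separated
```

Because 172.33 already fills six characters, a width of 6 leaves no room for a leading space, while a width of 7 does.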

btf |buffname >filename -w-n Overwrites the file filename with the contents of buffer buffname, without a header.

btf fname Appends current buffer to file, fname.

btf :A fname Appends first buffer w/ code = 'A' to the file, fname.

btf |2 >hist.fil -n ""spec= "s" Prints the string 'spec= ' followed by the name of the spectrum that each peak came from in the 2nd buffer to the file, hist.fil w/ no header.

CAVEATS

RELATED COMMANDS

BUGS

BTSS (Buff To Spec Shell)

DESCRIPTION

Copies the contents of a buffer into a spectrum *.

Note: Does not allocate the spectrum or change numSpec.

SYNTAX

EXAMPLES

CAVEATS

RELATED COMMANDS

BUGS

CBL (Create Buffer Links)

DESCRIPTION

Reallocates buffer pointers to each peak in a spectrum. Adds the indicated number of buffer links to those there already and sets new buffptrs to null. Used for the automatic peak ordering algorithms.

SYNTAX

cbl(specptr, numptrs);

EXAMPLES

cbl 1, 7 (adds 7 buffer links for each peak in spectrum 1)

CAVEATS

RELATED COMMANDS

BUGS

CFS, CF (Cluster Filter Shell)

DESCRIPTION

Takes the intensity-weighted average of peak positions that are within the tolerances specified in the Boolean. In the unlikely event that peaks are given zero intensity, the peaks are assigned weights of 0.1 to prevent the calculations from being skewed. Returns the number of peaks deleted.
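
The averaging step can be sketched in Python as follows. This is an illustration under stated assumptions, not the program's code: it covers only the merging of peaks that have already been grouped into one cluster, and the function name and data layout are hypothetical. The substitution of a weight of 0.1 for zero-intensity peaks follows the description above.

```python
def weighted_position(cluster):
    """Intensity-weighted mean position of a cluster.

    cluster is a list of (coords, intensity) pairs, where coords is a tuple
    with one value per dimension.
    """
    # Zero-intensity peaks get a weight of 0.1, as stated in the description.
    weights = [intensity if intensity != 0 else 0.1 for _, intensity in cluster]
    total = sum(weights)
    num_dims = len(cluster[0][0])
    return tuple(
        sum(weight * coords[dim] for weight, (coords, _) in zip(weights, cluster)) / total
        for dim in range(num_dims)
    )


if __name__ == "__main__":
    print(weighted_position([((8.0, 120.0), 3.0), ((8.2, 121.0), 1.0)]))
```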

SYNTAX

cf hnco (d1 <.05> %1 && d2 <.4> %2)

EXAMPLES

CAVEATS

RELATED COMMANDS

BUGS

CLRB (Clear Buffers)

DESCRIPTION

Clears specified buffers. Clears all buffers when command is given no argument. When all buffers are cleared, all buffer links are also cleared.

SYNTAX

EXAMPLES

clrb 1 2 6 Buffers 1, 2, and 6 are cleared.

clrb 1 2-7 8 9 Buffers 1 through 9 are cleared.

clrb 1 4 >6 Buffers can be specified using bounds:

clrb <6 9 10

clrb All buffers are cleared.

CAVEATS

RELATED COMMANDS

BUGS

CLS

DESCRIPTION

UNIX version only.

SYNTAX

EXAMPLES

CAVEATS

RELATED COMMANDS

BUGS

COB (Child Overlap Buffer)

DESCRIPTION

Measures the overlap between the children of each peak in buffer 1 with the members of buffer 2.

NOTE: The buffer whose peaks are to be used to create a search should be listed first. The children are found by searching the spectrum that was searched to generate buffer 2, using the supplied search template, where the %#'s will be replaced by the specified first, second, etc. coordinate of the peak. The supplied tolerance will be used to determine if a match occurs.

SYNTAX

ChildOvlBuff( choice, cobstr);

EXAMPLES

cob -a |2 |3 .03, "d1 %1 .04 and d2 %2 .1"

cob -all coords matched |search buffer |comp buffer, tolerance used for match, "template string"

-a all coords matched in comparing 2 peaks

-m (default) minus the matches (only the non-matches are searched)
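
The replacement of %1, %2, and so on in the template string by the coordinates of a peak can be pictured with the Python sketch below. It is an illustration only: the function name is hypothetical, and any suffix that follows the number in a template (such as a range) is left untouched.

```python
import re


def fill_template(template, peak):
    """Replace each %N in the template with the Nth coordinate (1-based) of peak."""

    def substitute(match):
        index = int(match.group(1))
        return str(peak[index - 1])

    return re.sub(r"%(\d+)", substitute, template)


if __name__ == "__main__":
    print(fill_template("d1 %1 .04 and d2 %2 .1", (8.93, 120.1)))
```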

CAVEATS

RELATED COMMANDS

BUGS

COM, CO, COMP (Compress)

DESCRIPTION

Compress is used to prepare the buffer for the automatic SCORE function as well as automatic OVERLAP calculations. Compress orders the values in the buffer and determines the number of times they are repeated between experiments, rep, and the number of times they are repeated within an experiment, internRep. Multiple appearances of the same peak in a buffer are discarded.

NOTE: Compress must be performed on a buffer that has been split into the peak's constituent values. If the list has not previously been SPLIT the COMP command splits the list automatically leaving out the matched values.

You must update llr after using this routine if you want the changes saved.

Note: The only time a peak is deleted is if there are 2 of same peak, or if it is near a peak already chosen to be in the spin system. InternRep peaks will only increase the rep of other peaks by one.

SYNTAX

Compress( &plr );

order: Any order of arguments will be accepted.

default: Eliminates duplicate peaks and splits peaks only into those values that were not matched in the search for the peak.

|buffer The buffer name or number to compress.

:code The identifying 1 letter nucleus code of the buffer.

-m Split peaks into all of the coordinates (including the matched values).

EXAMPLES

comp |fred -m Compresses the buffer named fred and includes match values.

CAVEATS

RELATED COMMANDS

BUGS

CONTRACER - see AUTOTRACE

CS (Combined Search)

DESCRIPTION

Scans listed spectra using a search string constructed from a template which is assembled from all combinations of the best peaks in 2 different buffers. CS works best if ORD has been used on the two buffers to position the best peaks at the beginning of the buffer.

SYNTAX

cs 11 |2,4 |12,4 |result "d2 %3 .2 and d1 %3r2.9,6.5 .03"

Searches spectrum 11. Takes top 4 peaks from buffer 2. Takes top 4 peaks from buffer 12. Puts the resulting finds in buffer |result. Search string: Searches dim 2 of spec 11 for a target taken from buffer 2 (listed first) within a tolerance of .2 AND dim 1 of spec 11 for a target (within the range of 2.9 and 6.5) taken from buffer 12 within a tolerance of .03.
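
The phrase "all combinations of the best peaks in 2 different buffers" can be made concrete with a short Python sketch. It assumes each buffer is already ordered with its best peaks first (for example by ORD) and simply pairs the top entries. The function name and the sample values are hypothetical.

```python
from itertools import product


def combined_targets(buffer_a, buffer_b, top_a, top_b):
    """Pair every one of the top_a values of buffer_a with every one of the top_b values of buffer_b."""
    return list(product(buffer_a[:top_a], buffer_b[:top_b]))


if __name__ == "__main__":
    buf2 = [55.2, 58.1, 61.0, 52.7, 49.9, 63.4]
    buf12 = [4.45, 3.90, 4.10, 5.02, 2.95]
    pairs = combined_targets(buf2, buf12, 4, 4)
    print(len(pairs), "search strings would be built; the first uses", pairs[0])
```

With the top 4 peaks taken from each buffer, as in the cs example above, 16 target pairs are produced, so keeping the top counts small limits the number of searches.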

EXAMPLES

CAVEATS

RELATED COMMANDS

BUGS

CSA (Combined Search All)

DESCRIPTION

Scans listed spectra using a search string constructed from a template which is assembled from all combinations of the best peaks in 2 different buffers linked to each peak of the source spectrum.

NOTE: All buffers must be specified by NAME rather than number. If more than one spectrum is listed to be searched, the results of the searches will all be put into one buffer. CSA works best if ORD has been used on the two buffers to position the best peaks at the beginning of the buffer. NOTE: csTopNum1 and csTopNum2 should also be set. They are global variables and are used for both CS and CSA.

SYNTAX

csa(choptr);

EXAMPLES

csa 1, 11 |hnca,4 |ntocsy,4 |result "d2 %3 .2 and d1 %3r2.9,6.5 .03"

Goes through each peak in spectrum 1 and uses/creates buffers linked to that peak. Searches spectrum 11. Takes top 4 peaks from buffer hnca. Takes top 4 peaks from buffer ntocsy. Puts the resulting finds in buffer |result. Search string: Searches dim 2 of spec 11 for a target taken from buffer hnca (listed first) within a tolerance of .2 AND dim 1 of spec 11 for a target (within the range of 2.9 and 6.5) taken from buffer ntocsy within a tolerance of .03. NOTE: It is IMPORTANT that the first target in the string correspond to the first buffer listed!

CAVEATS

RELATED COMMANDS

BUGS

CT (Contrace)

DESCRIPTION

Generates a CONTRAST macro that can be used as is or modified to create assigned fragments (primary assignments) for a protein. Bases its analysis on the experiments that have already been read into the program. The correlations per experiment should already be included in the spectra files (or they could be added after reading in the experiments using the AddCorrelation routine which has not been written yet).

Whether or not any chemical shift range filtering is to be performed, Set Shiftr statements must be used to load chemical shift ranges for each resonance type in the set of input spectra. The fuzziness of filtering is expressed as a percentage on the command line.

0% fuzziness: the resonance chemical shift range will be used as it was set.

100% fuzziness: the resonance chemical shift range will be doubled.

<0% fuzziness: (default) No filtering will be done.

The spectrum to be used as the source spectrum can be specified (using the name or number of the spectrum).
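
The fuzziness percentages listed above (0% leaves a range as set, 100% doubles it, a negative value disables filtering) can be sketched in Python as follows. The sketch assumes that a range is widened symmetrically about its midpoint. That detail is not stated in this guide, and the function name is hypothetical.

```python
def fuzzy_range(low, high, fuzziness_percent):
    """Widen a chemical shift range by the given fuzziness; None means no filtering."""
    if fuzziness_percent < 0:
        return None
    center = (low + high) / 2.0
    # Assumption: the extra width is split evenly between both ends.
    half_width = (high - low) / 2.0 * (1.0 + fuzziness_percent / 100.0)
    return center - half_width, center + half_width


if __name__ == "__main__":
    print(fuzzy_range(39, 69, 0))
    print(fuzzy_range(39, 69, 100))
    print(fuzzy_range(39, 69, -1))
```

Under this assumption a Ca range of 39-69 stays at 39-69 with 0% fuzziness and becomes 24-84 with 100% fuzziness.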

SYNTAX

'r' = Rigorous calculations.

'h' = Uses heuristics to cut down on computation time.

'n' = Does heuristics to compare with exptl. no noise prob calculation.

'm' = (default) Chooses between rigorous and heuristic automatically.

'f' = Fill source spectrum. Default= Don't fill.

'g' = Glycine filter source spectrum. Default= Don't glycine filter.

'a' = Arginine filter source spectrum. Default= Don't arg-filter.

'x' = SetX cross-checking. Default= Don't cross-check.

-devi = Inline deviation filtering.

-devo = Out of line deviation filtering. (Do deviation filtering at end of macro.)

Fragment Filtering Flags:

'D' = (default) Frag-filter when determined necessary.

'F' = Frag-filter even when numRepeats of theRes is greater than that of other correlations.

'N' = No frag filtering.

'P' = Percent frag-filtering.

'C' = (default) Constant frag-filtering.

NOMENCLATURE:

BS[].stat = -1 = low fragOvl slush buffer that's been used already

BS[].stat = 0 = slush buffer

BS[].stat = 1 =

BS[].stat = 6 = frag

correlation: the correlations w/in a spectrum.

xcorrelation: the correlations between spectra or buffers (when dimensions are in common)

slush buffer:

primary buffer:

immature primary buffer:

resonance overlap: when resonances (part of the cor's) from one buff or spec overlap w/ res from another

incomplete

complete

specific range filter:

unique: A dimension with only one type of resonance. It's unique even if spec has several correlations.

EXAMPLES

set shift all Ca 39-69

set shift ST Cb 61-73

set shift others Cb 18-48

NOTE: If you want the program to include a "Ile Boost" to make sure Hg12, Hg13 and Cg1 values don't get ignored if they get assigned to the second buffers Hgi2 and Cgi2, then you must explicitly include the chemical shift ranges for those resonances.

Example:

set shift Ile Cg2 14-22

set shift Ile Hg2 0-1.2

Values don't get ignored if:

contrace 1, >contrace.mac 0.0% -m

CAVEATS

RELATED COMMANDS

BUGS

CYC (Cycle)

DESCRIPTION

SYNTAX

Cyc(choice,&cycL,&inputF);

inputF = 'c' Clear and create new cycle from beginning

= 'a' append to end of cycle list

= 'b' begin running cycle from beginning

= 'x' at end of cycle

= 'r' Running cycle at current position

= 'e' edit cycle list

= 'f' false (not cycling)

= 'm' for macro input

Need to have routines in choice command interpreter:

"q" to quit cycle

"qq" or "q q" to quit cycle and program

"prompt" to allow you to put in any command

EXAMPLES

CAVEATS

RELATED COMMANDS

BUGS

DELALL (Delete All Shell)

DESCRIPTION

SYNTAX

EXAMPLES

delall 1, |hnca () deletes each peak in |hnca

delall 1, |hnca (d1>3) deletes selected peaks in |hnca

delall 1, |hnca !(d1>3) deletes all but selected peaks in |hnca

delall 1, |hnca deletes the buffer |hnca and all associated peaks

delall 1, hnca (d1>3) deletes selected peaks in $hnca

delall 1, hnca !(d1>3) deletes all but selected peaks in $hnca

CAVEATS

RELATED COMMANDS

BUGS

DEL (Delete Shell)

DESCRIPTION

SYNTAX

EXAMPLES

del |hnca () deletes each peak in |hnca

del |hnca (d1>3) deletes selected peaks in |hnca

del |hnca deletes the buffer |hnca and all associated peaks

del $hnca () deletes previous three for spectra

delall 1, |hnca () deletes previous three for buffers in each fragment

delall 1, |hnca !(d1>3)

CAVEATS

RELATED COMMANDS

BUGS

DF (Doublet Filter)

DESCRIPTION

SYNTAX

EXAMPLES

CAVEATS

RELATED COMMANDS

BUGS

DIR

DESCRIPTION

SYNTAX

Dir();

EXAMPLES

CAVEATS

RELATED COMMANDS

BUGS

DISP, D (Display)

DESCRIPTION

Below is a summary of the commands available within the interactive display mode. A more detailed description of each command follows directly.

DISPLAY MENU

0: All columns will be affected by commands.

1,2...: Only indicated column will be affected by commands.

a: Fills screen with first buffers in list.

z: Fills screen with last buffers in list.

c: Set the Columns to be displayed.

m: Displays columns using current position.

HOME,^A: Displays ALL buffers from the beginning.

UP,u: Moves active buffer(s) one row Up.

PGUP,p: Moves active buffer(s) one page Up.

DOWN,d: Moves active buffer(s) one row down.

PGDN,o: Moves active buffer(s) one page down.

LEFT,l: Shifts displayed buffers to the left.

RIGHT,r: Shifts displayed buffers to the right.

e: Edit indicated spectral fields.

f: Select new Fields to be displayed.

t: Toggle on/off column Titles.

n: Toggle on/off spectrum Names.

i: Toggle on/off Information line display.

h,?: Help Menu for DISPLAY.

vr: Set the number of Rows on video screen.

vc: Set the number of Columns on video screen.

wb: Write buffer to an ASCII file.

ws: Write spectrum to an ASCII file.

RETN,q: Quit display mode.

SYNTAX

EXAMPLES

CAVEATS

RELATED COMMANDS

BUGS

DISP E (Display Edit, Edit Spectrum)

DESCRIPTION

Called within interactive display mode by typing 'e', this command allows many different fields of a spectrum to be edited. All of the editing commands affect the spectrum being referred to directly, except for the remove peak (rm) command, which can delete a peak from the buffer only.

SYNTAX

Dimension # should always directly follow the field to be edited.

The buffer number always comes first.

EXAMPLES

e d1 2 3 edit dim 1 of 3rd peak in buffer 2

e i 2 3 edit intensity of 3rd peak in buffer 2

e c 2 3 edit comment of 3rd peak in buffer 2

e r 2 3 removes peak 3 from buffer 2

e n 2 edit spectrum name of buffer 2

e l1 2 edit label for dim 1 of buffer 2

e t1 2 edit tolerance for dim 1 of buffer 2

e f1 2 edit format for dim 1 of buffer 2

CAVEATS

RELATED COMMANDS

BUGS

DISP F (Display Fields)

DESCRIPTION

fields:

w - the level of automatic tracing

g - the grade (score) for that peak

l - the line number

c - the comment

x - the one letter nucleus code

d - the deviation of the match value from the target

n - the number of internal repeats

s - spectrum name

m - the match values

v - the main value being considered for a peak

t - the tolerance of that value

r - the number of repeats

p - the peak

i - the peak's intensity

' ' extra space

"" quoted literals

SYNTAX

EXAMPLES

CAVEATS

RELATED COMMANDS

BUGS

DTF (Display To File)

DESCRIPTION

Prints all buffers to file as if they were screen dumped from display.

Make specRay[i] a pointer to the plr;

Make colsrch a field in print;

SYNTAX

DTF(choptr);

dtf >file.name -a "Header string"

-w (default) overwrite

-a append

-v (default) creates vertical file

-h creates horizontal file

EXAMPLES

CAVEATS

RELATED COMMANDS

BUGS

EVAL (Evaluate)

DESCRIPTION

Evaluates the expression in the string. Note that the first character of the string must be the opening parenthesis for the expression to be evaluated or it must be the first character of the expression and the string must end right after the expression to be evaluated.

NOTE: If a range is specified the highest value from that range will be returned unless ,L is specified for lowest value.

MarkandEval calls Eval and finds the endpoint automatically to evaluate expressions that are part of longer strings. It too must begin on the expression to be evaluated.

SYNTAX

value = Eval(str,source,fragi,varPeak,numVars,com);

source -> Limits the search for matching buffer to a particular fragment if source != NULL.

fragi -> Used if source != NULL to specify fragment.

varPeak -> Array of pointers to either spectra or buffers to consider if lists aren't specified.

numVars -> The number of elements in array varPeak.

com -> The text form of the returned value or the resulting string if a string operation is specified (in which case the value returned will be NONFLOAT).

special numbers = E, PI

Operators:

for numeric operations:

* + - / ^ %(modulus) sin cos tan log ln val1, val2, ...

Operands of all trig functions should be in degrees.

Val#(string) takes the #th value from the string. If there aren't # values then it takes the last value.
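The two rules above (trigonometric operands in degrees, and Val# falling back to the last value) can be illustrated with a minimal Python sketch. This is not CONTRAST code: the names cos_deg and val_n are hypothetical, and splitting the string on whitespace or commas is an assumption, since the guide does not state how the values in the string are delimited.

```python
import math
import re


def cos_deg(angle_degrees):
    """Cosine of an angle given in degrees, as required for trig operands."""
    return math.cos(math.radians(angle_degrees))


def val_n(text, n):
    """Return the n-th (1-based) value in text, or the last one if fewer exist.

    Assumption: values are separated by whitespace and/or commas.
    """
    values = [float(tok) for tok in re.split(r"[,\s]+", text.strip()) if tok]
    if not values:
        raise ValueError("no values in string")
    return values[min(n, len(values)) - 1]
```

Under these assumptions, val_n("1.5 2.5 3.5", 2) picks the second value, while asking for a fifth value falls back to the last one.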

for text operations:

+ Union "fred" + "ted" = "fredted"

^ Intersection "fred" ^ "ted" = "ed"

- Delete Intersection "fred" - "ted" = "fr"

* Count Number of Intersections "eded" * "ed" = 2

/ Remove Characters "fred" / "det" = "fr"

% Remove all but characters "fred" % "det" = "ed"
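The text operators can be mimicked in Python to make their behavior concrete. The sketch below is illustrative only and reproduces just the examples listed above. The function names are hypothetical, and treating "intersection" as the longest common substring is an assumption that happens to fit the "fred"/"ted" example; the guide does not define the term precisely.

```python
def union(a, b):
    """'+' : concatenate the two strings."""
    return a + b


def intersection(a, b):
    """'^' : longest substring of a that also occurs in b (assumed meaning)."""
    best = ""
    for i in range(len(a)):
        for j in range(i + 1, len(a) + 1):
            if j - i > len(best) and a[i:j] in b:
                best = a[i:j]
    return best


def delete_intersection(a, b):
    """'-' : remove the intersection from a."""
    common = intersection(a, b)
    return a.replace(common, "") if common else a


def count_intersections(a, b):
    """'*' : count how many times the intersection occurs in a."""
    common = intersection(a, b)
    return a.count(common) if common else 0


def remove_chars(a, chars):
    """'/' : drop every character of a that appears in chars."""
    return "".join(c for c in a if c not in chars)


def keep_chars(a, chars):
    """'%' : keep only the characters of a that appear in chars."""
    return "".join(c for c in a if c in chars)
```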

Values:

&a = the value of the variable 'a'

23.1 = a number

e = 2.7182818

PI = 3.1415927

#|fred = the number of peaks in fred

w|fred = the level of the buffer

m|fred = the number of dimensions in the peaks

d1|fred,4 = the first coordinate of the 4th peak in fred (or p1)

dx|fred,4 = in evaluations or assignments, the number of coordinates; in Boolean tests, looks at each coordinate and demands at least one match.

dc|fred,4 = in evaluations or assignments, the number of coordinates; in Boolean tests, tests all combinations of coordinates for at least numDim matches.

da|fred,4 = demands that all dimensions match.

i|fred,b = the intensity of the first peak (or d0 or p0)

C|fred,e = the numeric part of the comment from the last peak

v|fred,H = the value of the highest valued peak in fred

g|fred,L = the lowest grade in fred

d|fred,1 = the deviation

n|fred,1 = the number of internal repeats

t|fred,1 = the tolerance of the value

r|fred,1 = the number of repeats

#$fred, m$fred, d1$fred,4, i$fred,b, and c$fred,e also apply to spectra

w$fred = the column width of the spectrum fred

Field indicators:

# the number of peaks

b the ambiguity of the buffer

w the level of the buffer or column width of the spectrum

m the number of dimensions in the peaks

d1... the first coordinate ...

i the intensity

v the value (buffer only)

g the score (buffer only)

s the score (buffer only)

C the first numeric value in a comment

c the text of the comment

d the deviation (buffer only)

n the N variable

x the X variable

q the quality factor (ambiguity or confidence)

r the number of repeats (buffer only)

k the number of links (spectrum only)

List indicators:

| buffer

$ spectrum

Peak indicators:

blank Specified by the search.

,1... The first peak ...

,b The beginning or first peak.

,f4.. The first four peaks...

,l4.. The last four peaks...

,e The end or last peak.

,H The highest value in the list.

,L The lowest value in the list.

,2-5.. Peaks 2 through 5.

NOTE: H or L can be appended to a range to have the range return the highest (default) or lowest value.

EXAMPLES

i|fred,f4H = the intensity of the highest intensity peak from the first 4 peaks of buffer fred.

i$fred,f4 = the intensity of the first four peaks in spectrum fred.

i|fred,l4 = the intensity of the last four peaks in fred

d1$fred,4+ = the first coordinate of the fourth through the last peaks in spectrum fred.

i|fred,2-5 = the intensity of the second through fifth (inclusive) peaks

v|fred,H = the value of the highest valued peak in buffer fred

s|fred,L = the lowest grade in buffer fred.

C|fred,e = the numeric part of the comment from the last peak in buffer fred.

c|fred,1 = the comment text string from the first peak in buffer fred.

d|fred,b = the deviation of the first peak in buffer fred.

#$fred = the number of peaks in spectrum fred.

w$fred = the column width of the spectrum fred.
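The peak indicators used in the examples above can be read as simple selections over an ordered list of values. The following Python sketch shows one possible reading; it is not the CONTRAST parser. The name select_peaks is hypothetical, the list is assumed to already hold the field values in peak order, and a blank indicator is treated here as "all peaks" purely for illustration (in CONTRAST a blank indicator is resolved by the search).

```python
import re


def select_peaks(values, indicator):
    """Return the values picked out by an indicator such as 'f4', '2-5' or 'f4H'."""
    reducer = None
    if indicator and indicator[-1] in "HL":   # trailing H or L: highest / lowest
        reducer = max if indicator[-1] == "H" else min
        indicator = indicator[:-1]

    if indicator == "":                       # bare H or L, or blank (assumption)
        chosen = list(values)
    elif indicator == "b":                    # beginning: first peak
        chosen = values[:1]
    elif indicator == "e":                    # end: last peak
        chosen = values[-1:]
    elif (m := re.fullmatch(r"f(\d+)", indicator)):      # first n peaks
        chosen = values[:int(m.group(1))]
    elif (m := re.fullmatch(r"l(\d+)", indicator)):      # last n peaks
        n = int(m.group(1))
        chosen = values[-n:] if n else []
    elif (m := re.fullmatch(r"(\d+)-(\d+)", indicator)):  # inclusive range
        chosen = values[int(m.group(1)) - 1:int(m.group(2))]
    elif (m := re.fullmatch(r"(\d+)\+", indicator)):     # n-th through last
        chosen = values[int(m.group(1)) - 1:]
    elif re.fullmatch(r"\d+", indicator):                # single 1-based peak
        i = int(indicator)
        chosen = values[i - 1:i]
    else:
        raise ValueError(f"unrecognized peak indicator: {indicator!r}")

    return [reducer(chosen)] if reducer and chosen else list(chosen)
```

Note that the sketch is case sensitive, so "l4" (last four peaks) and "L" (lowest value) are distinct, as in the indicator table.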

CAVEATS

RELATED COMMANDS

BUGS

EXE (Execute)

DESCRIPTION

Executes a macro file given the file's path and name. Each line of the macro file should contain only one command as it would be typed at the main menu prompt.

SYNTAX

exe flav.mac -e

-e = (default) Echo the commands read from the macro file on the display.

-n = No echo.

EXAMPLES

CAVEATS

RELATED COMMANDS

BUGS

FB (Fast Boolean)

DESCRIPTION

SYNTAX

if(FastBool(str))...

Note: FB uses absolutely no spaces. The input string must be enclosed in parentheses.

Operators:

> >= < <= == != && || <tol> >tol<

&& = and

|| = or

<tol> = is within a tolerance, tol, of next value

>tol< = is outside of a tolerance, tol, of next value

Values: Only floating-point numbers or integers.

Functions: None
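The two tolerance operators can be sketched in Python as follows. The function names are hypothetical, and whether a difference exactly equal to the tolerance counts as "within" is an assumption (it is treated as within here); the guide does not say.

```python
def within_tol(value, tol, target):
    """value <tol> target : true if value lies within tol of target."""
    return abs(value - target) <= tol


def outside_tol(value, tol, target):
    """value >tol< target : true if value lies more than tol away from target."""
    return abs(value - target) > tol
```

For example, a test written as 8.21<.05>8.25 corresponds to within_tol(8.21, 0.05, 8.25) in this sketch.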

EXAMPLES

CAVEATS

RELATED COMMANDS

BUGS

FCS (File Compress Shell)

DESCRIPTION

Reads in a file in a flat ASCII format and compresses lines if the Boolean is true. The first comparison in the Boolean MUST be between values from the same field and the same file. This field is called the primary field of the compression, and its values should be sorted so that all identical values are grouped together. Since the Boolean contains a comparison from this field, all matches must be contained within contiguous blocks.

The output file is created so that the values of all of the selected matching lines get averaged.

SYNTAX

Operations:

<> = != < > >= <= ><

EXAMPLES

Given a sample space- and tab-delimited input file, file.out:

**var seq AA grpID rfID fMol fPPM lopH hipH loTmp hiTmp shifts...

165 8 H 495 390 1001 0.00 5.50 5.50 45.00 45.00 NULL 5.04 NULL NULL NULL

164 8 H 497 391 1003 0.00 5.50 5.50 45.00 45.00 53.4 NULL 29.5 NULL NULL

165 8 H 530 389 1001 0.00 5.10 5.50 45.00 45.00 53.4 5.04 29.5 NULL NULL

164 8 H 1704 389 1001 0.00 5.10 5.50 45.00 45.00 53.4 5.04 29.5 NULL NULL

165 8 H 1875 957 1001 0.00 5.50 5.50 45.00 45.00 53.4 5.03 29.5 NULL NULL

164 9 K 495 390 1001 0.00 5.50 5.50 45.00 45.00 NULL 4.82 NULL 1.53 1.53

164 9 K 497 391 1003 0.00 5.50 5.50 45.00 45.00 54.5 NULL NULL NULL NULL

164 9 K 1875 957 1001 0.00 5.50 5.50 45.00 45.00 54.3 4.82 NULL 1.53 1.53

164 10 E 495 390 1001 0.00 5.50 5.50 45.00 45.00 NULL 5.06 NULL 2.16 2.16

164 10 E 497 391 1003 0.00 5.50 5.50 45.00 45.00 51.8 NULL NULL NULL NULL

164 10 E 1875 957 1001 0.00 5.50 5.50 45.00 45.00 51.9 5.08 NULL 2.16 2.16

fc file.out (d2 = d2 && d1 = d1 && d8 <.5> d8) >short.out

Produces the tab delimited output file, short.out:

**var seq AA grpID rfID fMol fPPM lopH hipH loTmp hiTmp shifts...

165 8 H 967 579 1001 0.00 5.37 5.50 45.00 45.00 53.4 5.04 29.5 NULL NULL

164 8 H 1100 390 1002 0.00 5.30 5.50 45.00 45.00 53.4 5.04 29.5 NULL NULL

164 9 K 956 579 1002 0.00 5.50 5.50 45.00 45.00 54.4 4.82 NULL 1.53 1.53

164 10 E 956 579 1002 0.00 5.50 5.50 45.00 45.00 51.8 5.07 NULL 2.16 2.16
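The averaging seen in short.out can be approximated with a short Python sketch. It is a simplified illustration, not the FCS algorithm: the name compress and its parameters are hypothetical, lines are grouped on exact equality of d1 and d2, each new line is compared against the first line of a candidate block for the d8 tolerance test, NULL entries are assumed to be skipped when averaging, and text fields are assumed to be copied from the first line of the block. Output rounding and formatting are not reproduced.

```python
def to_number(field):
    """Return the field as a float, or None for NULL and other text."""
    try:
        return float(field)
    except ValueError:
        return None


def compress(rows, key_cols=(0, 1), tol_col=7, tol=0.5):
    """Merge rows that share the key columns and agree in tol_col within tol."""
    groups = {}  # key -> list of blocks; each block is a list of rows
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        for block in groups.setdefault(key, []):
            if abs(float(block[0][tol_col]) - float(row[tol_col])) <= tol:
                block.append(row)
                break
        else:
            groups[key].append([row])  # no matching block: start a new one

    merged_rows = []
    for blocks in groups.values():
        for block in blocks:
            merged = []
            for column in zip(*block):
                numbers = [n for n in map(to_number, column) if n is not None]
                # Average numeric entries; keep the first entry for text or all-NULL.
                merged.append(sum(numbers) / len(numbers) if numbers else column[0])
            merged_rows.append(merged)
    return merged_rows
```

Applied to the five residue-8 lines of the input above, this sketch yields two merged lines (var 165 and var 164) whose averaged grpID values are about 966.7 and 1100.5, in line with the 967 and 1100 shown in short.out.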

CAVEATS

RELATED COMMANDS

BUGS

 

FDF (Full Display To File)

DESCRIPTION

Writes out the contents of the specified buffer to the specified file using the template set by "set fd".

SYNTAX

fdf ( plrptr, fname, appendF);

fdf |3 >file.name -a-n

-a = append to the end of the existing file.

-w = overwrite existing file. (default)

-h = print header information to file. (default)

-n = no header information printed.

EXAMPLES

CAVEATS

RELATED COMMANDS

BUGS

FDS (Full Display Shell)

DESCRIPTION

SYNTAX

FullDisplayShell( str );

EXAMPLES

CAVEATS

RELATED COMMANDS

BUGS

FF (File Filter)

DESCRIPTION

Reads in a shift file in a flat ASCII format and copies the file to an output file (short.out), filtering out lines for which the Boolean is true. Lines are deleted based on the values of a single field. NOTE: The function reads in each tab- or space-delimited field as if it were a floating-point number.

SYNTAX

-p Prints out each line that is filtered to the screen.

-n (DEFAULT) Doesn't print out each line to the screen.

operations:

<> = != < > >= <= ><

EXAMPLES

ff file.out !(d1 == d1>good.var.ids,f5 ) -p >short.out

-p Prints out each line that is filtered to the screen.

ff file.out (d6 < 4.3) -n >short.out

-n Doesn't print out each line to the screen.

CAVEATS

RELATED COMMANDS

BUGS

FILESTAT (File Stat)

DESCRIPTION

Reads in a file in a flat ASCII format and calculates statistical parameters for the indicated columns.

SYNTAX

-a Append to output file.

-w (default) Overwrite output file.

EXAMPLES

filestat file.in (d1 = .1, d2 = #100, d3 = 10) -a -0 >file.out

Reads in the file file.in and calculates the mean, standard deviation, etc. for the first (d1), second (d2), and third (d3) columns in the file. It also calculates values for a binned probability distribution where the bin width in d1 = .1, the bin width in d3 = 10, and the width of the bins in d2 is calculated to give 100 bins total for the data. If the width of a column is set to zero, the probability distribution values are not calculated. The -a flag causes the new output to be appended to the end of the output file if it already exists. The -0 flag gives the value (here 0) that the bins are to be aligned with. If the first data point is 50.094, then the first bin will start at 50.0, since 50.0 is the highest number less than 50.094 for which (50.0-0) % .1 = 0. Output is put in the file file.out.
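The bin arithmetic described above can be sketched in Python. The function names are hypothetical and the code is only an illustration of the stated rule, not the FILESTAT implementation; in particular, deriving the "#100" width from the data range divided by the bin count is an assumption.

```python
import math


def first_bin_start(first_value, width, align=0.0):
    """Largest value <= first_value that is a whole number of widths from align."""
    return align + math.floor((first_value - align) / width) * width


def width_for_bin_count(lowest, highest, num_bins):
    """Bin width that spreads the data range over num_bins bins (assumed rule)."""
    return (highest - lowest) / num_bins
```

With a first data point of 50.094, a width of .1, and an alignment value of 0, first_bin_start gives 50.0 (up to floating-point rounding), matching the worked example in the text.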

CAVEATS

RELATED COMMANDS

BUGS

FIL (Fill)

DESCRIPTION

Fills the first listed structure (either a buffer or a spectrum) with the peaks from the second structure that are found in the search (Boolean). The designated dimension is set to the designated value before adding the peaks to the first structure.

SYNTAX

Fil (str);

-a (DEFAULT) All matches marked.

-n Noesy-type matches are marked.

-b Best match marked.

-u (DEFAULT) Unscaled. Matches are compared using only their deviations.

-s Scaled. Matches are compared to determine best (-b or -n) using scaled scores.

EXAMPLES

Normal call:

fil hnco hnca !(d1 <.05> %1 && d2 <.4> %2) d3=0.0

(Note: The '!' sign means "not" and in this case takes the complement of the results from the search.)

Other calls:

fil hnco hnca (d1 <.05> %1 && d2 <.4> %2) d3=0.0

fil |hnco hnca (d1 <.05> %1 && d2 <.4> %2) d3=0.0

fil hnco |hnca (d1 <.05> %1 && d2 <.4> %2)

fil |hnco |hnca !(d1 <.05> %1 && d2 <.4> %2)

CAVEATS

RELATED COMMANDS

BUGS

FILA (Fill All)

DESCRIPTION

Returns the number of peaks that are filled in the first listed list.

SYNTAX

Fila(str);

Flags:

-a (DEFAULT) All matches marked.

-n Noesy-type matches are marked.

-b Best match marked.

-u (DEFAULT) Unscaled. Matches are compared using only their deviations.

-s Scaled. Matches are compared to determine best (-b or -n) using scaled scores.

EXAMPLES

Normal Call:

fila 1, |hncoca |hnca (#|hnca > 1 && %3 <.1> d3) -f

Adds |hnca peaks to |hncoca buffer when they match with peaks already in the |hncoca buffer for each fragment.

Other Calls:

fila 1, hnco hnca (d1 <.05> %1 && d2 <.4> %2) -s -b d3=0.0

fila 1, |hnco hnca (d1 <.05> %1 && d2 <.4> %2) d3=0.0

fila 1, hnco |hnca (d1 <.05> %1 && d2 <.4> %2)

fila 1, |hnco |hnca !(d1 <.05> %1 && d2 <.4> %2)

CAVEATS

RELATED COMMANDS

BUGS

FILTER

DESCRIPTION

Performs a search of other lists in order to delete peaks from a list. The function marks each peak in the first list according to whether a match is found in the lists that follow, so that a peak in the first list can be marked only once for each list that follows. An operator and integer following the search specify the number of marks necessary for a peak to be deleted. In the first example, a peak is deleted if a search of each of the following three lists (hnca, hncoca, and ntoc) does not find a match for that peak. If "== 2" had been specified rather than "== 3", then a peak would have been deleted if there were no matches for two of the three spectra.

Returns: The number of peaks deleted.

Returns the number of peaks that are filled in the first listed spectrum. (Actually doesn't return number yet).

SYNTAX

Filter (str);

-f (DEFAULT)

-u (DEFAULT)

-a (DEFAULT)

EXAMPLES

Normal Call:

filter hnco hnca hncoca ntoc !(d1 <.05> %1 && d2 <.4> %2) == 3

**deletes all peaks from hnco that are in none of the 3 other spectra

filter hnco hnca hncoca (d1 <.05> %1 && d2 <.4> %2) < 1

**deletes all peaks from hnco that are in neither of the 2 other spectra

Other Calls:

filter hnco hnca (d1 <.05> %1 && d2 <.4> %2) == 1

**marks and deletes peaks in hnco that are also in hnca

filter |hnco hnca (d1 <.05> %1 && d2 <.4> %2) < 1

**deletes peaks in buffer that are not in hnca

filter hnco |hnca |ntoc (d1 <.05> %1 && d2 <.4> %2) > 1

**deletes pks in hnco that are in both buffers
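The mark-and-count logic of FILTER can be sketched in Python as follows. This is an illustration under stated assumptions, not the CONTRAST implementation: peaks are plain tuples of coordinates, the names matches and filter_peaks are hypothetical, the search is reduced to a per-dimension tolerance test on the leading dimensions, and negate stands in for the '!' prefix.

```python
import operator


def matches(peak, other, tols):
    """True if the leading coordinates agree within the given tolerances."""
    return all(abs(a - b) <= t for a, b, t in zip(peak, other, tols))


def filter_peaks(first, others, tols, op, count, negate=False):
    """Return the peaks of `first` that survive the filter.

    Each peak earns at most one mark per list in `others`; it is deleted
    when op(marks, count) is true.
    """
    kept = []
    for peak in first:
        marks = 0
        for peak_list in others:
            found = any(matches(peak, other, tols) for other in peak_list)
            if found != negate:      # with negate, a mark means "no match found"
                marks += 1
        if not op(marks, count):
            kept.append(peak)
    return kept
```

In this sketch, a call such as filter_peaks(hnco, [hnca, hncoca], (0.05, 0.4), operator.lt, 1) plays the role of "filter hnco hnca hncoca (d1 <.05> %1 && d2 <.4> %2) < 1".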

CAVEATS

RELATED COMMANDS

BUGS

FIT, LS, LSQ (Least Squares)

DESCRIPTION

Fits a straight line by least squares, following the routine on page 527 of Numerical Recipes.

SYNTAX

Form: LeastSquares(specptr, choptr, "", 's');

if ch = 's' -> print to screen

if ch = 'c' -> also generate a comment string

Variable names (mine = theirs, i.e. this program = Numerical Recipes):

m = b

b = a

numPeaks = ndata = ss

xav = sxoss

EXAMPLES

fit 1 d1 d2

fit spectrum.number dim1 correlated.with.dim2
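For orientation, an unweighted straight-line least-squares fit of the kind described in Numerical Recipes can be sketched in Python. This is not the CONTRAST source; the function name fit_line is hypothetical, and the variable names ss and sxoss are used only to echo the correspondence table in the SYNTAX section (slope m corresponds to b, intercept b corresponds to a).

```python
def fit_line(x, y):
    """Unweighted least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    ss = float(len(x))            # number of data points
    sx = sum(x)
    sy = sum(y)
    sxoss = sx / ss               # mean of x
    st2 = 0.0
    slope = 0.0
    for xi, yi in zip(x, y):
        t = xi - sxoss            # centering x improves numerical stability
        st2 += t * t
        slope += t * yi
    if st2 == 0.0:
        raise ValueError("all x values are identical")
    slope /= st2
    intercept = (sy - sx * slope) / ss
    return slope, intercept
```

In terms of the example above, x would hold the d1 values and y the d2 values of the peaks in the chosen spectrum.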

CAVEATS

RELATED COMMANDS

BUGS

FIT0, FT0 (Fit 0)

DESCRIPTION

SYNTAX

LeastSquares(specptr, choptr, "", 's');

if ch = 's' -> print to screen

if ch = 'c' -> also generate a comment string

EXAMPLES

fit 1 d1 d2

fit spectrum.number dim1 correlated.with.dim2

CAVEATS

RELATED COMMANDS

BUGS

HELP, H (Page)

DESCRIPTION

The CONTRAST HELP SYSTEM uses the PAGE function to page through the contrast.hlp file. Backward (^R) and forward (^S) incremental searches can be performed within page as well as paging up and down.

SYNTAX

HELP COMMANDS: (see PAGE for more details)

PGUP,^P: Page up.

UP,^U,u: Line up.

PGDN,^O: Page down.

DOWN,^D,d: Line down.

HOME,^H: Beginning of file.

END,^E: Goes to end of file.

m: Marks a page.

g: Returns to a marked page.

q,^Q,ESC: Quits PAGE.

c:(toggle) Case sensitive/insensitive.

^S,S,s: Searches forward.

^R,R,r: Searches in reverse.

h,?: Help page for PAGE.

COMMANDS WITHIN SEARCH MODES:

BKSPC,^B Returns to the position of the search for the previous letter and removes the last letter added to search string.

DEL,^G Goes to the position of the search at its beginning and deletes the search string.

^R If in REVERSE search mode it searches backwards for the next occurrence of the string.

If in FORWARD search mode it reverses the direction of the search.

If no search string has been entered then the last search is retrieved and searched.

^S If in FORWARD search mode it searches forwards for the next occurrence of the string.

If in REVERSE search mode it changes the direction of the search so that it starts searching forward.

If no search string has been entered then the last search is retrieved and searched.

^Q,ESC Quits search mode leaving the file at the current page.

alphanum Adds character to search string.

EXAMPLES

CAVEATS

RELATED COMMANDS

BUGS

IF (Boolean)

DESCRIPTION

SYNTAX

if( #|fred > 2 || 3^3 != &tom ) function

if( g|fred,2 < 13 ) function

if( d1|fred,4 == 10 || (&tom>3 && cos180/&tom==2) ) function

Operators:

> >= < <= == != && || <tol> >tol< and or

&& = and

|| = or

<tol> = is within a tolerance, tol, of next value

>tol< = is outside of a tolerance, tol, of next value

Values:

&a = the value of the variable 'a'

23.1 = a number

e = 2.7182818

PI = 3.1415927

#|fred = the number of peaks in fred

w|fred = the level of the buffer

d1|fred,4 = the first coordinate of the 4th peak in fred (or p1)

dx|fred,4 = if any of coordinates of fred matches test

da|fred,4 = if 2 peaks are being compared then all dimensions must match.

dc|fred,4 = if 2 peaks are being compared then combinations of at least the minimum number of dimensions between the peaks must match.

i|fred,b = the intensity of the first peak (or d0 or p0)

C|fred,e = the numeric part of the comment from the last peak

v|fred,H = the value of the highest valued peak in fred

g|fred,L = the lowest grade in fred

d|fred,1 = the deviation

n|fred,1 = the number of internal repeats

t|fred,1 = the tolerance of the value of the first peak

r|fred,1 = the number of repeats for the first peak

Functions:

* / + - ^ % cos sin tan log ln

Standard Boolean Calls from other Functions:

d1 = replaced by the calling function with the d1 value of each peak in either the buffer or the spectrum

%1 = the coordinate of the first dimension of the source spectrum

d1|fred = the d1 value of all combinations of peaks in fred

d1|fred,f4 = the d1 value of the first four peaks in fred

d1$hnca,4 = the d1 value of the fourth peak in the spectrum hnca

d1$3,f4 = the d1 value of the first four peaks in the third spectrum

EXAMPLES

CAVEATS

RELATED COMMANDS

BUGS

INTA (Intersect All)

DESCRIPTION

For each peak in a spectrum, takes the intersection of two of the buffers linked to that peak to form a third buffer. Only the top num1 and num2 peaks of the two buffers are considered.

NOTE: The actual buffer names MUST be used in this routine. Also all of the peak groups must contain the same types of buffers with the same names. The third buffer is added right after the second buffer.

SYNTAX

inta(choptr);

inta source, |buff1[,num1] |buff2[,num2] (Boolean) [|int] [-f -s]

OR

inta source, [num1, num2] ( FD|buff1[,peak] op FD|buff2[,peak] ) [|int] [-f -s]

where FD = field(s) {d1,dx,da,dc,d,c,#,w,i,v,g,n,t,r}

and op = operator(s) { <tol> >tol< == != >= > <= < && || }

and num1 and num2 = the number of peaks to consider for intersection

Flags:

-f delete first buffer

-s delete second buffer

EXAMPLES

inta 1, |NH,3 |HNCO,4 (dx|NH <.02> dx|HNCO) |INT

inta 1, 3 4 (dx|NH <.02> dx|HNCO) |INT