User’s Guide of

NMR-SAMS

 

An expert system for computer-assisted structure elucidation of organic and natural product compounds based on multidimensional spectroscopy

NMR-SAMS™ User's Guide, April 1998.

This manual describes release 2.0 of the Windows 95/NT version of the NMR-SAMS™ software.

Copyright Notice

Copyright © 1996 through 2001 Spectrum Research, LLC.  All rights reserved.

No part of this document may be reproduced, transmitted, transcribed, stored in a retrieval system, or translated into any language in any form by any means without the written permission of Spectrum Research, LLC.

All possible care has been taken in the preparation of this document but Spectrum Research accepts no liability for any errors/omissions that may be found.

Spectrum Research, LLC. reserves the right to change the information in this document without prior notice.

Trademarks

SpecMan™ and NMR-SAMS™ are trademarks of Spectrum Research, LLC.

Acknowledgments

NMR-SAMS™ (originally known as CISOC-SES) was developed by Dr. Shengang Yuan, Dr. Chen Peng and Prof. Chongzhi Zheng at the Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, from 1988 to 1994.  It was further improved by Dr. Chen Peng in the group of Dr. Geoffrey Bodenhausen at the National High Magnetic Field Laboratory in 1995-1996.  Portions of NMR-SAMS™ are copyright © 1988 through 1995, Shanghai Institute of Organic Chemistry and Florida State University, and are exclusively licensed to Spectrum Research, LLC.  Title and full ownership rights to the converted/modified NMR-SAMS™ will remain solely with Spectrum Research, LLC, and NMR-SAMS™ is asserted to be Spectrum’s proprietary information and trade secret.

Credits

If results (figures and/or data) obtained with the NMR-SAMS™ application are used in a publication, please refer to the software in the following manner or in an equivalent form:

"NMR-SAMS™ software, developed by Spectrum Research, LLC, was used to compute the results in this publication."

 

 


Contents

Contents

Abbreviations And Acronyms

Introduction

1.1 General

1.2 Application Limitations

1.3 System Requirement

1.4 Help Facility

1.5 Typographical Conventions

Getting Started with NMR-SAMS

2.1 Installation of the Program

2.2 Spectrum Research Licensing

2.3 Starting NMR-SAMS

2.4 Brief Introduction to Microsoft Windows

2.5 Description of the Main Menus

2.6 The NMR-SAMS Toolbar

Understanding NMR-SAMS

3.1 Overview

3.2 General Procedure of Structure Elucidation with NMR-SAMS

3.3 What Spectral Data Does NMR-SAMS Use?

3.4 Use of 2D NMR Connectivities: Bond Constraints

3.5 Use of Chemical Shifts And Peak Multiplicities

3.6 Structure Generation

3.7 User Intervention

3.8 Control Parameters

Working Data Set

4.1 Overview

4.2 Opening An Existing Working Data Set

4.3 Opening A New Working Data Set

4.4 Input Molecular Formula

4.5 Save A Working Data Set

4.6 Save A Working Data Set as Different Name

4.7 Exiting NMR-SAMS

Input of NMR Spectral Data

5.1 Overview

5.2 Conversion of SpecMan 1H Peak List

5.3 Conversion of SpecMan 13C Peak List

5.4 Conversion of SpecMan DQF-COSY Peaks Table

5.5 Conversion of SpecMan HMQC/HETCOR Peaks Table

5.6 Conversion of SpecMan HMBC/COLOC Peaks Table

5.7 Conversion of SpecMan NOESY Peaks Table

5.8 Conversion of SpecMan INADEQUATE Data

5.9 Manual Peak Picking

Spectral Interpretation

6.1 Overview

6.2 Interpretation of MF, 1H, 13C and HMQC Data as Building Blocks

6.2.1 Interpretation of Molecular Formula

6.2.2 Interpretation of 1D 1H Data

6.2.3 Interpretation of 1D 13C Data

6.2.4 Interpretation of HMQC/HETCOR Connectivities

6.2.5 Generation of Building Blocks

6.3 User-Defined Building Blocks

6.4 Interpretation of 2D Spectral Data as Bond Constraints

6.4.1 Interpretation of COSY Connectivities

6.4.2 Interpretation of HMBC/COLOC Connectivities

6.4.3 Interpretation of NOESY Connectivities

6.4.4 Interpretation of INADEQUATE Connectivities

6.4.5 Transformation of Bond Constraints

6.4.6 Setting up Atom-Atom Connection Matrix (ACMX)

2D Structure Generation

7.1 Overview

7.2 User-Defined Bond Constraints

7.2.1 Interactive Structure Generation

7.3 User-Defined Atom Environment Constraints

7.4 Structure Generation

Resonance Assignment

8.1 Overview

8.2 Input of the Target Structure

8.2.1 Inputting the Target Structure Interactively

8.2.2 Inputting the Target Structure via MDL File

8.2.3 Setting up the Assignment Matrix

8.3 User-Defined Resonance Assignment

8.4 Resonance Assignment

Isomer Enumeration/Quick Elucidation

9.1 Overview

9.2 MF-based Isomer Enumeration

9.3 Quick Structure Elucidation

Graphical Display of Results

10.1 Overview

10.2 Display of Structural Building Blocks

10.3 Display of Target Structure

10.4 Display of Generated Structures/Assignments

10.5 Status Window

10.6 Display Options

10.7 Editing the Display of Generated Structures

Exporting Results

11.1 Overview

11.2 Exporting NMR Spectral Data

11.3 Exporting Resonance Assignment

11.4 Exporting Candidate or Target Structures

NMR Data File

1D Spectral Data

2D Spectral Data

Master Data File

CCSS-13C Chemical Shift Range Correlation Table

Control Parameters

Parameters for Spectral Interpretation

Parameters for Setting up ACMX

Parameters for Structure Generation

References

Index


Abbreviations And Acronyms

δ13C          13C chemical shift.

δ1H           1H chemical shift.

1D            One-dimensional.

2D            Two-dimensional.

ACMX          Atom-atom Connection MatriX, which summarizes the bond-formation probabilities between the constituent atoms of an unknown.

BB            Structural Building Blocks for structure generation, e.g., CH3-, CH2<, and -OH.

BC            Bond Constraint derived from 2D NMR spectral data, which defines the number of intervening bonds between the correlated spins.

CCSS          Carbon-Centered Single-spherical Substructure.

COLOC         COrrelation via Long-range Coupling, a kind of 2D spectrum that provides 2-to-3-bond 13C-1H connectivities.

COSY          COrrelated SpectroscopY, a kind of 2D spectrum that provides 1H-1H through-bond connectivities.

CPU           Central Processing Unit.

DEPT          Distortionless Enhancement by Polarization Transfer, a kind of 1D spectrum that provides information on the number of protons attached to each carbon atom.

EC            Environment Constraint, a user-specified limitation on the types of neighboring atoms attached to a central atom.

HETCOR        HETeronuclear CORrelation, also called C-H COSY, a kind of 2D spectrum that provides one-bond 13C-1H connectivity information.

HMBC          Heteronuclear Multi-Bond Connectivity, a kind of 2D spectrum that provides 2-to-3-bond 13C-1H connectivity information.

HMQC          Heteronuclear Multiple Quantum Coherence, a kind of 2D spectrum that provides one-bond 13C-1H connectivity information.

INADEQUATE    Incredible Natural Abundance Double Quantum Transfer Experiment, a kind of 2D spectrum that provides one-bond 13C-13C connectivity information.

MDF           The Master Data File produced while using NMR-SAMS for structure elucidation. This file stores the intermediate and final results produced during the execution of NMR-SAMS.

MF            Molecular formula or empirical formula of a molecule, which is usually derived from mass spectral data.

NMR           Nuclear Magnetic Resonance.

NOESY         Nuclear Overhauser enhancement and Exchange SpectroscopY, a kind of 2D spectrum that provides 1H-1H through-space connectivity information.

NSBC          Number of “Sub-bond constraint(s)”, or pair(s) of relevant atoms, that must satisfy a bond constraint in the generated structure.

PSE           Partial Structure Elucidation: structure elucidation based on the information available from a portion of the spectral data, usually the well-resolved part.


Chapter 1

Introduction

1.1 General

NMR-SAMS (NMR Spectral Assignment Made Simple) is an expert system for computer-assisted structure elucidation of unknown organic or natural product compounds from multidimensional spectroscopy, i.e., from techniques such as MS, NMR, IR and UV that provide complementary information about chemical compounds.  In particular, NMR-SAMS uses information from routine 1D and 2D NMR spectroscopy.  Together with SpecMan, it serves as a chemist’s workbench for de novo structure elucidation of small molecules such as organic compounds, natural products, peptides, and other small biomolecules.  NMR-SAMS is also used for automated resonance assignment of known compounds.

The basic strategy of structure elucidation using NMR-SAMS is illustrated in Fig. 1.1.  When dealing with an unknown compound, the molecular formula (MF) must first be determined by mass spectrometry or other approaches.  Next, the 1D and 2D NMR chemical shifts, multiplicities, J-couplings and intensities are extracted from the processed 1D and 2D spectra (transformed through conventional FFT or non-FFT techniques) using the SpecMan software.  The 1D and 2D spectral data extracted as peak lists using SpecMan are imported into NMR-SAMS and interpreted as structural building blocks and bond constraints based on the one-bond, two-bond and other long-range connectivities.  Finally, the building blocks, NMR-derived bond constraints, and other user-defined bond constraints are used to generate the plausible candidate structures with resonance assignments.  If the structure is already known, you can specify the proposed structure and let NMR-SAMS complete the resonance assignments directly.

Figure 1.1. Data flow diagram of NMR-SAMS representing the different phases of spectral interpretation, structure generation and resonance assignment. Gray boxes represent optional input data. PSE means partial structure elucidation based on incomplete spectral data. A bond constraint is represented as n intervening bonds, (B)n, between the correlated atoms.

NMR-SAMS has the following main features:

·        Input of peak tables with chemical shifts, multiplicities, J-couplings and intensities, from a variety of 1D and 2D NMR experiments.

·        Automated interpretation, bookkeeping, and cross-checking of spectral data with respect to the molecular formula.

·        Novel representation of 2D NMR correlation information based on the concept of a chromatic graph.

·        Structure determination and identification of unknown compounds based on full use of 2D NMR correlation information, and complementary spectral information from MS, UV and IR spectral data. 

·        Partial structure elucidation of compounds based on incomplete spectral data.  

·        Graphical tools for interactive building and editing of molecular fragments, and defining bond constraints and atom environment constraints.

·        Graphical tools to display and browse through candidate structures and sub-structures.  Graphical interaction between structures and bond constraints.

·        Background information-independent structure elucidation, which minimizes the potential human bias introduced into the structure elucidation process.

·        Fast structure generation of complex molecules when sufficient constraints are available. 

·        Fast resonance assignment and structure verification of large complex molecules based on proposed structures.  

·        Automated resonance assignment based on assigned resonances of compounds.

·        Flexible format for report generation of the results of spectral and structural analysis. 

1.2 Application Limitations

The current version of NMR-SAMS can only handle molecules that have fewer than 128 non-hydrogen atoms. The total number of free bonds (unsatisfied valences) of the structural building blocks before structure generation, which determines the complexity of the structure generation problem, must not exceed 220. (The total number of free bonds is equal to the sum of the valences of the heavy atoms, less the number of protons and less twice the number of known bonds.) The maximum number of peaks is 200 in a 1D spectrum and 1000 in a 2D spectrum.  The maximum number of bond constraints is 1000.
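The free-bond formula and the size limits above can be checked by hand before starting a problem. The following Python sketch is only an illustration and is not part of NMR-SAMS: the function names, the small valence table, and the assumption that each known bond is counted once are all hypothetical choices made for this example.

```python
# Limits quoted in Section 1.2 of this guide.
MAX_HEAVY_ATOMS = 127          # "fewer than 128 non-hydrogen atoms"
MAX_FREE_BONDS = 220
MAX_PEAKS_1D = 200
MAX_PEAKS_2D = 1000
MAX_BOND_CONSTRAINTS = 1000

# Assumed standard valences, for illustration only (not taken from NMR-SAMS).
VALENCE = {"C": 4, "N": 3, "O": 2}


def free_bonds(heavy_atoms, n_protons, n_known_bonds=0):
    """Free bonds = sum of heavy-atom valences - protons - 2 * known bonds.

    heavy_atoms maps an element symbol to its count, e.g. {"C": 2, "O": 1}.
    How multiple bonds are counted is not stated in the guide; here every
    known bond is assumed to count once.
    """
    total_valence = sum(VALENCE[element] * count
                        for element, count in heavy_atoms.items())
    return total_valence - n_protons - 2 * n_known_bonds


def within_limits(heavy_atoms, n_protons, n_known_bonds=0):
    """Return True if the atom count and free-bond count respect the limits."""
    n_heavy = sum(heavy_atoms.values())
    n_free = free_bonds(heavy_atoms, n_protons, n_known_bonds)
    return n_heavy <= MAX_HEAVY_ATOMS and n_free <= MAX_FREE_BONDS


# C2H6O with no known bonds: building blocks CH3-, -CH2- and -OH.
print(free_bonds({"C": 2, "O": 1}, n_protons=6))
```

For C2H6O the valences sum to 10 and there are 6 protons, so 4 free bonds remain, which matches the one, two and one open valences of the CH3-, -CH2- and -OH building blocks.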

Most previously proposed CASE (computer-assisted structure elucidation) systems use either a chemical shift-substructure correlation database or a more concise chemical shift-substructure correlation model, and rely to a large extent on the knowledge of a human expert.  Such systems have been limited to very simple and small molecules.  NMR-SAMS has demonstrated that using 2D NMR correlation information improves the efficiency of CASE systems when dealing with real-world complex molecules.  For efficient structure elucidation of unknown compounds, NMR-SAMS requires the molecular formula; 1D 1H, 13C, and DEPT (or APT) data; and 2D DQF-COSY, HMQC (or HETCOR), HMBC (or COLOC, FLOCK), and INADEQUATE spectral data.  The molecular formula may or may not be known accurately from MS or other methods; if it is unknown, NMR-SAMS estimates it from the number of observed carbon and proton peaks along with any available heteroatom information.  It is not mandatory to have all of these experimental NMR data sets available, because NMR-SAMS can also solve structure elucidation problems with different combinations of experimental data (for details, refer to Section 3.3).  Structure elucidation based on 1D 13C chemical shifts alone is only possible for very simple molecules, and is not practical for complex molecules.  NMR-SAMS cannot elucidate unknown structures based on only 1D 1H chemical shifts.

Although most spectra used by NMR-SAMS, e.g., 1D 1H, 2D DQF-COSY and HMBC, are allowed to have peak degeneracy, the 1D 13C spectrum and HMQC (or HETCOR) must be completely resolved for complete structure elucidation.  If severe overlap prevents resolving all 13C peaks, NMR-SAMS will use only the well-resolved spectral data to generate the plausible substructures. This is called partial structure elucidation (PSE).  Some limitations on PSE are described in Section 7.1.

In the current version, NMR-SAMS does not consider molecular symmetry, so only partial structure elucidation is performed for a molecule with global symmetry.  For a molecule with local symmetry, where the 13C signals corresponding to symmetric carbons can be identified, complete structure elucidation by NMR-SAMS is possible.

Most of the steps in NMR-SAMS, such as the interpretation of 1D and 2D data into bond constraints and the generation of the building block sets, are usually performed very quickly.  Structure generation, on the other hand, is more time-consuming because of its combinatorial nature.  The efficiency of structure generation (which is a function of the computation time, the quality of the structures generated, and the number of structures generated) depends on the size of the molecule and on the quality and quantity of the spectral data.  When the unknown molecule is big (e.g., with more than 40 heavy atoms) and the correlation information derived from the spectral data is not sufficient, structure generation could take a very long time to finish.  In such cases you are advised to input as many known substructures as possible to accelerate the structure generation process.  You can also take advantage of the other tools of NMR-SAMS to tackle the structure, such as the resonance assignment function to verify a proposed structure, and the flexible graphics tools to build the structure interactively.

Although the spectral interpretation routines of NMR-SAMS are general-purpose, the structure generator of NMR-SAMS cannot deal with molecules containing ionic atoms, or tautomeric or coordinate bonds.  It recognizes only single, double and triple bonds.  Aromatic bonds are represented as alternating single and double bonds.  Sometimes this might cause redundancy in the structure generation of aromatic compounds.

In the current version of NMR-SAMS, if the structure is already known, then target structure based resonance assignment is possible, provided the NMR data set is complete.

Although NMR-SAMS can recognize all the chemical elements, the current substructure/δ13C knowledge base (see Appendix III) contains only substructures consisting of the commonly occurring elements, i.e., C, H, O, and N.  You can customize this knowledge base.  When other elements exist in the molecule, you will be informed about the undefined substructures, and this could reduce the efficiency of structure generation.

NMR-SAMS can be viewed as an expert assistant helping spectroscopists and chemists to solve structure elucidation problems, and is by no means expected to replace the human expert.  NMR-SAMS is designed for flexible human intervention, and efficiently uses the additional user knowledge and judgment to control and enhance the structure elucidation.  

1.3 System Requirement

The IRIX version of NMR-SAMS runs on SGI systems with R4000 or higher processors, at least 32 MB of RAM and 8-bit graphics, running IRIX 5.3 or higher or IRIX 6.x.  R8000 or higher processors and 64 MB or more of RAM are recommended.  A faster and smaller 64-bit version of NMR-SAMS can be supplied to users running R8000 or higher systems with IRIX 6.x.

The Solaris version of NMR-SAMS runs on Sun systems with SPARC processors, at least 32 MB of RAM and 8-bit graphics, running Solaris 2.x (SunOS 5.x).  64 MB or more of RAM is recommended.  X/Motif 1.2.3 libraries are required.  These are usually supplied with the Sun Common Desktop Environment (CDE).

The Microsoft Windows version of NMR-SAMS runs on Intel 386 or higher processors (or 100% compatibles) with at least 32 MB of RAM and a VGA or better monitor, running Windows 95 or Windows NT 3.51 or later.  A Pentium or higher processor with 32 MB or more of RAM is recommended.

NMR-SAMS requires from 2 MB to 55 MB of hard disk space, depending on the sample data that is installed.  The sample data with original spectra requires 40 MB of hard disk space.  The swap space (i.e., virtual memory) required is proportional to the complexity of the data being analyzed.

1.4 Help Facility

NMR-SAMS provides on-line help.  Most of the dialog boxes have a Help button, which you can click to display a help message about the dialog box.

1.5 Typographical Conventions

Unless otherwise noted in the text, the User’s Guide of NMR-SAMS uses the typographical conventions described below:

·        A command to select is represented in bold type face by the menu name, the option, and the pull-right option (if any). For example, the command:

Display/Display Options/Chemical Shifts    

means: first click the Display menu on the menu bar, then click Display Options in the opened menu, and then click Chemical Shifts in the pull-right options.

·        Transcript of a computer file or display is printed in Courier New letters with the keywords shown in bold, and the annotations (if any) in italic Times letters. (Such annotations do not appear in the file or display itself).

ATOM~~ATOM:

For each correlation, listed are the IDs of the correlated atom pair, the range of intervening bonds, and the bond type (0: meaningless or unknown)

(1-23: 1~1 2)
(6-22: 1~1 3)

     .

     .

     .

·        Filenames and parameters are printed in Courier New letters. For example:

Files phasefile and procpar are used for peak picking with SpecMan. 

Parameter GEN_FLAG controls the search criteria of the structure generation.

·        Terms introduced for the first time are presented in boldface type.

·        Words in italic represent variables. For example:

There are n intervening bonds between the correlated atoms.
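The ATOM~~ATOM transcript shown above lists, for each correlation, the IDs of the correlated atom pair, the range of intervening bonds, and the bond type. As a minimal sketch of how such lines could be read by a script, the Python example below parses entries of the form `(1-23: 1~1 2)`. The field layout is inferred only from the two sample lines and their annotation; the `Correlation` type and the `parse_correlations` function are hypothetical names and are not part of NMR-SAMS.

```python
import re
from typing import List, NamedTuple


class Correlation(NamedTuple):
    atom_a: int      # ID of the first correlated atom
    atom_b: int      # ID of the second correlated atom
    min_bonds: int   # lower end of the intervening-bond range
    max_bonds: int   # upper end of the intervening-bond range
    bond_type: int   # bond type (0: meaningless or unknown)


# Layout assumed from the sample lines: (a-b: min~max type)
_ENTRY = re.compile(
    r"\(\s*(\d+)\s*-\s*(\d+)\s*:\s*(\d+)\s*~\s*(\d+)\s+(\d+)\s*\)"
)


def parse_correlations(text: str) -> List[Correlation]:
    """Return every correlation entry found in text, in order of appearance."""
    return [Correlation(*map(int, match.groups()))
            for match in _ENTRY.finditer(text)]


sample = """ATOM~~ATOM:
(1-23: 1~1 2)
(6-22: 1~1 3)
"""
for entry in parse_correlations(sample):
    print(entry)
```

Lines that do not match the assumed layout, such as the `ATOM~~ATOM:` keyword line, are simply skipped.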


Chapter 2

Getting Started with NMR-SAMS

2.1 Installation of the Program

To install NMR-SAMS, please refer to the Release Notes of Spectrum Research Products.

2.2 Spectrum Research Licensing

NMR-SAMS is copy protected by the Spectrum Research Licensing System.  This licensing system allows NMR-SAMS to run only on the computer for which it was sold.  You should have received a license.dat file along with your installation.  This plain text file should be placed in the main NMR-SAMS directory (C:\Spectrum\Nmr-sams by default).

If you did not receive a license file with your NMR-SAMS installation, please contact Spectrum Research.  To create a license file for you, we need your Windows Serial Number (Product ID) or UNIX System ID.

Under Windows 95 and Windows NT 4.0, you can find this by clicking the right mouse button on the “My Computer” icon on your Windows Desktop and choosing “Properties” from the menu that pops up.  Your Windows Serial Number is printed last in the “Registered To:” section and is of the form XXXXX-XXX-XXXXXXX-XXXXX, where the X characters are replaced by numbers and letters.

Under Windows NT 3.51, choose “About Program Manager” from the “Help” menu of Program Manager.  The Product ID is listed on the dialog that appears.  Windows NT 3.51 serial numbers are of the form XXXXX-XXX-XXXXXXX.

On SGI systems, type /etc/sysinfo at a UNIX prompt to get your System ID.  SGI System IDs are hexadecimal numbers.  The first 8 digits (4 groups of 2 digits) are the ones needed by the Spectrum Research License Manager.

On Sun systems, type ‘hostid’ (usually in the /usr/bsd/ directory).  The 8 digits that are given are the identifier needed for the license.

When your licensing time period is nearing expiration, NMR-SAMS will warn you with a dialog box that tells you the number of days remaining.  Please contact Spectrum Research for a renewal at this time.


2.3 Starting NMR-SAMS

From the Program Manager or the Start Menu, click the NMR-SAMS icon in the Spectrum Research group to launch the NMR-SAMS program.  The program starts with a Main Graphics Window that has a menu bar and status bar. By default, a Status Window is also opened, which displays text messages to indicate the current status of the structure elucidation, and also prompts you with the “what to do next” steps.  The main graphics window is shown below:

When NMR-SAMS is started, it reads the following three files from the directory where you launched NMR-SAMS.  If any of these files is not found, it tries to read the missing file from the installation directory of NMR-SAMS.  If a file is still not found, NMR-SAMS warns you that it is missing, except in the case of nmrsams.ini.

nmrsams.ini, which defines some of the initial settings of the program, such as the window sizes, the colors of the background, atoms, and bonds, and the preferred editor.  If this file is not found, default settings are used.

periodic_tab.def, which defines some properties of the chemical elements.  If this file is not found or is not properly read, NMR-SAMS will not be able to recognize any element symbols or perform the related functions.

chemical_shifts.def, which defines the knowledge base of 13C chemical shift dispersion ranges for some common carbon-centered single-spherical substructures (CCSS) (see Appendix III).  If this file is not found or is not correctly read, structure generation will not be possible (see Section 3.5).
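The lookup order described above can be sketched in a few lines of Python.  This is only an illustration of the launch-directory-then-installation-directory search, not the program's actual code; the function names locate_startup_files and warnings_for are hypothetical.

```python
from pathlib import Path

# The three files named in this section.
STARTUP_FILES = ("nmrsams.ini", "periodic_tab.def", "chemical_shifts.def")


def locate_startup_files(launch_dir, install_dir, names=STARTUP_FILES):
    """Look for each file in the launch directory first, then in the
    installation directory.  Returns ({name: path}, [missing names])."""
    found, missing = {}, []
    for name in names:
        for directory in (Path(launch_dir), Path(install_dir)):
            candidate = directory / name
            if candidate.is_file():
                found[name] = candidate
                break
        else:
            # Neither directory contained the file.
            missing.append(name)
    return found, missing


def warnings_for(missing):
    """nmrsams.ini is optional (defaults are used), so it produces no warning."""
    return [f"{name} not found" for name in missing if name != "nmrsams.ini"]
```

A file placed in the launch directory therefore takes precedence over the copy in the installation directory, which is one way to keep per-project settings.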

2.4 Brief Introduction to Microsoft Windows

If you are new to Microsoft Windows or Windowing systems in general, please read this section before using NMR-SAMS.  It will help you to become acquainted with the NMR-SAMS interface.

First, it is a good idea to become acquainted with the online help system provided by Microsoft Windows.  The online help system is called from within NMR-SAMS when you click on a "Help" button.  It brings up context-sensitive help in a window.  There is also a Help Contents facility (also known as an Index).  This consists of a list of the topics in the online help.  You can click on one of these items to bring up its corresponding information.  The Contents is available via NMR-SAMS's Help menu and from the Online Help Viewer window by clicking on the “Contents” button.

When you first start NMR-SAMS, a window will appear with "NMR-SAMS, version 2.0, (C) Spectrum Research, LLC." on the top.  The area where this text appears is referred to as the "Title Bar."  You can press the left mouse button while the arrow pointer (which is called the "Cursor") is on the title bar and then move the mouse to move the window.  Release the mouse button to stop moving the window.  That combination of events (pressing a mouse button, moving the mouse, and then releasing) is known as "Dragging".  Position the mouse pointer so that it is over the word "File", located immediately below the title bar.  Now press and then immediately release the left mouse button.  This procedure (pressing a mouse button and then releasing without moving the mouse) is known as "Clicking".  The item that you clicked on was the "Menu Bar".  The menu bar consists of several "Menus" ("File", “Edit”, "Display", "Analysis", and "Help").  After you clicked on the File menu, a "Pulldown" appeared.  This pulldown consists of "Menu Items" ("Open...", "New...", etc.).  If you click on one of these menu items, something will occur.  Menu items are the primary way that you, as a "User" of NMR-SAMS, communicate your wishes to NMR-SAMS. 

Some items on menus are not menu items, however.  The line that appears above the "Quit" menu item is known as a "Separator".  Its purpose is solely to make the menu easier to read. Click on the "Display" menu.  Notice that the "Create NMR Data File" menu item has a right-pointing triangle after its text.  This type of menu item is known as a "Pullright".  Click the mouse on the "Create NMR Data File" menu item.  You will see another group of menu items appear to the right of it.  The pullright feature is used to group related menu items together, reducing the size of the main pulldowns.  Click on the "Display" menu and you’ll see the menu item "Status Window", which is known as a "Toggle".  Toggles have two states:  "Off" (also known as "Deselected" or "Deactivated"), and "On" (also known as "Selected" or "Activated").  If the status window is on, turn off the "Status Window" toggle by clicking on it.  You will notice that the status window disappears. Click on the "Display" menu and turn on the "Status Window" toggle by clicking on it again.  You will notice that the status window pops up again.

Position the mouse cursor over the frame that surrounds the entire NMR-SAMS window.  Drag the mouse to change the size of the NMR-SAMS window.  All sides of the NMR-SAMS window can be moved to size the window. The field below the NMR-SAMS Toolbar is known as the "Main Graphics Window".  This is where information about chemical structures is displayed.  At the bottom of the Main Graphics Window is the "Status Bar".  The status bar prints out information about what is going on in NMR-SAMS.  It will notify you if you do something that NMR-SAMS isn't prepared to do.  Also, it will give you hints about using NMR-SAMS. 

Click on the "Open..." menu item from the "File" menu.  A window will appear with the title of "Open ".  This type of window is known as a dialog box.  While a dialog box is displayed, you must interact with it before continuing with other areas of NMR-SAMS. Dialog boxes also have a "Help" button which will bring up online help about the dialog box when clicked.  The dialog box that is currently displayed is referred to as the "File Browse Dialog".  It is used to specify a file. To get to a certain directory, use the “Directory” combo box to find the proper parent directory.  You can descend the directory structure by double clicking on a directory name from the list.  (A “Double Click” is two clicks in rapid succession.)  After you have changed to the proper directory, you will see a list of "Files" that have an extension of “.mdf”.  Click on one of the filenames to select it. The "OK" button on the bottom of the dialog box is used to accept the input that you have selected.  Click the "Cancel" button to close the dialog box without performing an action.

When multiple candidate structures are generated, the first structure is displayed along with a window titled Structure Browser.  This window is known as a "Palette."  Palettes are similar to dialog boxes; however, you can interact with them and with the main NMR-SAMS window at the same time.  The "Structure Browser" palette is used to control the display of the candidate structures. In the "Structure Browser" palette, you will notice a "Slider".  You can drag the slider bar to the left/right to raise/lower its value, which determines the sequential number of the structure to be displayed.  Some palettes also have text fields where you can type in numbers or text.

You should now have enough information to start exploring NMR-SAMS.  Note that NMR-SAMS grays out menu items that you cannot select, depending on the current progress of your structure elucidation process. For example, if you have not prepared the NMR data file, the menu item Analysis/Interpret NMR Data remains grayed out.  This guides you step-by-step through the structure elucidation process.

 

2.5 Description of the Main Menus

The menu bar appears at the top of the main graphics window and contains the names of the five NMR-SAMS menus:

You perform all tasks in NMR-SAMS by selecting options from these five menus. The five menus are described briefly on the following pages and in greater detail in the other chapters of this book.


The File menu     
The File menu lists options related primarily to reading data into and out of NMR-SAMS. The following figure illustrates the File menu:

 

 

The Edit menu     
The Edit menu lists options related to editing of the working data set files and the generated structures.  The following figure illustrates the Edit menu:

 

The Display menu                               
The Display menu lists options related to the graphical display of intermediate and final results of NMR-SAMS. The following figure illustrates the Display menu:

 


The Analysis menu                             
The Analysis menu lists the options related to structure elucidation. The following figure illustrates the Analysis menu:

The Help menu
The Help menu lists the options related to the on-line help of NMR-SAMS. The following figure illustrates the Help menu:

2.6 The NMR-SAMS Toolbar

The toolbar appears between the menu bar and the Main Graphics Window.  It contains icons (pictures) that represent commonly used menu items.  Clicking one of the icons has the same effect as selecting the corresponding menu item.

The following menu items have associated toolbar icons:

    File/New

    File/Open

    File/Save

    Display/Building Blocks & Fixed Bonds

    Display/Target Structure

    Display/Generated Structures or Assignments

    Display/Status Window

    Display/Display Options/Balls

    Display/Display Options/Carbon Symbols

    Display/Display Options/Numbers

    Display/Display Options/Chemical Shifts

    Display/Display Options/Protons

    Display/Display Options/Molecular Formula

    Display/Display Options/Connection Table

    Display/Display Options/Refine

    Help/Contents


Chapter 3

Understanding NMR-SAMS

3.1 Overview

This chapter introduces the basic procedure of structure elucidation, with a brief description of the concepts and principles of NMR-SAMS, and concludes with a high-level discussion of the typical flow of activity through NMR-SAMS.

3.2 General Procedure of Structure Elucidation with NMR-SAMS

The process of structure elucidation of an unknown compound through NMR spectroscopy consists of the following steps:  

1.      Determination of the molecular formula (MF) by MS.  Determination of some functional groups in the unknown compound through IR and UV spectroscopy.  The MF is optional in NMR-SAMS v2.0.

2.      Data acquisition of 1D and 2D NMR data.  See Section 3.3 for the spectral data used by NMR-SAMS.

3.      Extraction of peak tables with chemical shifts, intensities, J-couplings, and multiplicities.  Peak picking of 1D and 2D NMR spectral data is performed with SpecMan using automatic and semi-automatic procedures (for details see User’s Guide of SpecMan).  The peak tables are converted to the NMR-SAMS representation of connectivity information (see Chapter 5).

4.      Setup of the parameters to control the spectral interpretation and structure generation.  In most cases, the default values of these parameters can be used.  (see Appendix IV)

5.      Interpretation of molecular formula, if any, along with 1D 1H, 13C, and HMQC spectral data to obtain the structural building blocks.  If the MF is unknown, you can interactively add heteroatoms into the building block sets (see Chapter 6).

6.      Interpretation of other 2D NMR spectral data to obtain the bond constraints (see Chapter 6)

7.      Generation of candidate structures that are consistent with the experimental data for unknown compounds (see Chapter 7), or verification of the proposed structure and completion of 1H and 13C  resonance assignments (see Chapter 8) for known compounds.  Interactive structure generation and resonance assignment is also possible (see Section 7.2.1).

8.      Exporting results of structure generation and resonance assignments (see Chapter 11).

Structure elucidation is usually an iterative approach, so this process may need to be repeated several times until you get satisfactory results.  NMR-SAMS assists you in identifying and correcting the inconsistencies in the input data.  When sufficient input data is not available, NMR-SAMS generates only partial structures with resonance assignments.   NMR-SAMS also warns you about some common pitfalls that could lead to incomplete or incorrect structure generation, and provides clues for further refinement.

3.3  What Spectral Data Does NMR-SAMS Use?

The possible combinations of 1D and 2D spectral data used by NMR-SAMS for structure elucidation are listed in Table 3.1. The fifth combination, which uses routine 1D and 2D spectra along with complementary information from other spectral data (MS, UV and IR), is the recommended one for structure elucidation of real-world complex molecules.  Other spectral sources such as MS, IR, and UV are not directly interpreted by NMR-SAMS, but they can be conveniently used as user-defined bond/environment constraints.

Table 3.1. Possible combinations of 1D and 2D NMR spectral data used by NMR-SAMS a

No.   1D                 2D                                     Comments
1     None               None                                   Pure isomer enumeration from MF
2     13C (and DEPT b)   None                                   Very low efficiency except for simple molecules.
3     13C, DEPT b        INADEQUATE                             Very high efficiency, if data available.
4     13C, DEPT b, 1H    DQF-COSY c, HMQC d                     Low efficiency except for H-rich molecules.
5     13C, DEPT b, 1H    DQF-COSY c, HMQC d, HMBC e (NOESY f)   Most practical way for de novo structure elucidation of complex molecules.
6 g   1H                 DQF-COSY c, HMQC d, HMBC e (NOESY f)   Practical when the amount of sample does not allow carbon-detecting experiments.

a TOCSY is not used directly by NMR-SAMS but can be used by SpecMan  to assist the peak picking of DQF-COSY.

b INEPT, or APT can also be used.

c Other types of COSY experiment can also be used, as long as they provide geminal and vicinal H-H through-bond connectivity.

d HSQC, HETCOR, or other types of spectra can also be used, as long as they provide one-bond C-H connectivity.

e COLOC, FLOCK, or other types of spectra can also be used, as long as they provide long-range C-H connectivity.

f NOESY or ROESY is optional.

g HMBC and HMQC must be clean enough to allow extraction of 13C chemical shifts and multiplicity information. 13C chemical shifts can be automatically extracted from HMBC using SpecMan.  13C multiplicities must be identified manually from the HMQC spectrum.

3.4 Use of 2D NMR Connectivities: Bond Constraints

NMR-SAMS uses mainly 2D NMR-derived through-bond spin-spin connectivity information for structure elucidation, because it is reliable and provides comprehensive structural information for de novo structure elucidation.

In NMR-SAMS, the coordinates of 2D cross peaks are first converted into connectivities between the relevant 1D peaks, and  then interpreted as bond constraints on the relevant atoms. A bond constraint (BC) is a requirement of a certain number (or a range) of intervening chemical bonds between the correlated spins. For an asymmetric molecule, such spin-spin BCs are directly used as atom-atom bond constraints.  In addition to its efficient utilization of BCs involving ambiguous bond separation (e.g., 2 or 3 bonds between two HMBC-correlated spins), NMR-SAMS can also cope with BCs concerning ambiguous atoms. Such ambiguity typically arises from peak degeneracy or low digital resolution.

In NMR-SAMS, a BC is represented in the following general format:

(Atom_y ... - Atom_x ... : minBond ~ maxBond; BondType; minNSBC ~ maxNSBC)Source

where

Atom_y ... is the correlated atom(s) along the Y dimension (13C domain for a heteronuclear spectrum). It could be more than one in the case of ambiguity.

Atom_x ... is the correlated atom(s) along the X dimension (1H domain for a heteronuclear spectrum).  It could be more than one in the case of ambiguity.

minBond and maxBond are the minimum and maximum bond separations between the relevant atoms.

BondType is the type of the intervening bond between the atoms. Valid choices are: 0, 1, 2, or 3 for unknown, single, double, and triple, respectively.

minNSBC and maxNSBC are the minimum and maximum numbers of relevant atom pair(s) that must satisfy this BC in the generated structure. 

Source encodes the connectivity (or other source) from which the BC was derived. A connectivity is represented by its spectral type and its ID number. The following codes are used to represent the different spectral types:

“C” for COSY, “Q” for HMQC (or HETCOR), “B” for HMBC (or COLOC), “N” for NOESY, “I” for INADEQUATE.

Note: The ID of a connectivity is different from, though related to, the peak ID(s) in the SpecMan peak tables (For more details see Fig. 6.4 in Chapter 6).

The following codes are used to represent other kinds of source:

“S” for a pseudo BC added by the program, “U” for a user-defined BC, and “G” for a previously generated bond (when using a generated substructure as the starting point for the next structure generation cycle).

 

For example, an HMBC-derived bond constraint is represented as:

(10 - 17 18: 2 ~ 3; 0; 1 ~ 2)B10

In the above example, the first set of numbers “10 - 17 18: ” denotes the atoms that are correlated. In this case, since the chemical shifts of H-17 and H-18 are very close, it is hard to resolve which one of them is really correlated to C-10.  So both protons are retained to represent the possibilities that there could be a correlation between either C-10 and H-17, or C-10 and H-18, or both.  The next set of numbers “2~3” indicates that there could be two or three intervening bonds between the correlated C-H pair(s).  The next number “0” represents the bond type of the intervening bonds, and in this case they are treated as unknown. The next set of numbers “1~2” indicates that either one or both pairs of the atoms involved in the bond constraint must satisfy this bond constraint in the computed structure (i.e., C-10 and H-17, or C-10 and H-18, or both pairs).   Finally, the character string “B10” means that this bond constraint was derived from the HMBC connectivity #10.   From the comment of this connectivity, the ID of the actual cross peak (in the SpecMan peaks table) can be found in the .nmr file (see Fig. 6.4 in Chapter 6).
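To make the notation concrete, the following Python sketch parses a BC written in the general format given above.  It is an illustration only: the class name BondConstraint, the function parse_bc, and the regular expression are hypothetical and are not part of NMR-SAMS.  Because this section does not say whether the “S”, “U”, and “G” sources carry an ID number, the sketch treats the number after the source code as optional.

```python
import re
from dataclasses import dataclass

# Source codes as listed in this section.
SOURCE_CODES = {
    "C": "COSY", "Q": "HMQC (or HETCOR)", "B": "HMBC (or COLOC)",
    "N": "NOESY", "I": "INADEQUATE",
    "S": "pseudo BC added by the program", "U": "user-defined BC",
    "G": "previously generated bond",
}

# (Atom_y ... - Atom_x ... : minBond ~ maxBond; BondType; minNSBC ~ maxNSBC)Source
BC_PATTERN = re.compile(
    r"^\(\s*(?P<y>[\d\s]+?)\s*-\s*(?P<x>[\d\s]+?)\s*:"
    r"\s*(?P<min_bond>\d+)\s*~\s*(?P<max_bond>\d+)\s*;"
    r"\s*(?P<bond_type>[0-3])\s*;"
    r"\s*(?P<min_nsbc>\d+)\s*~\s*(?P<max_nsbc>\d+)\s*\)"
    r"\s*(?P<source>[CQBNISUG])(?P<source_id>\d*)\s*$"
)


@dataclass
class BondConstraint:
    atoms_y: tuple   # correlated atom(s) along the Y dimension
    atoms_x: tuple   # correlated atom(s) along the X dimension
    min_bond: int
    max_bond: int
    bond_type: int   # 0 unknown, 1 single, 2 double, 3 triple
    min_nsbc: int
    max_nsbc: int
    source: str      # one-letter source code
    source_id: str   # connectivity ID, or "" when absent


def parse_bc(text):
    """Parse one bond constraint string; raise ValueError if it does not match."""
    match = BC_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a bond constraint: {text!r}")
    return BondConstraint(
        atoms_y=tuple(int(tok) for tok in match["y"].split()),
        atoms_x=tuple(int(tok) for tok in match["x"].split()),
        min_bond=int(match["min_bond"]),
        max_bond=int(match["max_bond"]),
        bond_type=int(match["bond_type"]),
        min_nsbc=int(match["min_nsbc"]),
        max_nsbc=int(match["max_nsbc"]),
        source=match["source"],
        source_id=match["source_id"],
    )
```

Applied to the example string (10 - 17 18: 2 ~ 3; 0; 1 ~ 2)B10, the sketch yields atoms_y = (10,), atoms_x = (17, 18), a bond range of 2 to 3, bond type 0, an NSBC range of 1 to 2, and source “B” with ID “10”.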

By default, NMR-SAMS treats the unambiguous BCs, which have exactly two correlated atoms, one-bond separation, and minNSBC = maxNSBC = 1 (which means the BC must be satisfied in a generated structure), as fixed bonds. The rest, which have an ambiguous bond separation, an ambiguous number of correlated atoms, or both, are treated as ambiguous BCs.  The ambiguous BCs are used as the major constraints for structure generation.  During structure generation, NMR-SAMS computes the number of violations of BCs for the current substructure/structure.  If the actual number of violations of a substructure/structure is less than the upper limit on the allowed number of violations, the substructure/structure is retained; otherwise it is rejected.   The BCs are also used by some advanced heuristic methods for acceleration of the structure generation process (see Section 7.4).
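The default split into fixed bonds and ambiguous BCs, and the retention rule, can be sketched as follows.  The names BC, is_fixed_bond, split_constraints, and is_retained are hypothetical, and the comparison in is_retained simply follows the wording of this section (“less than the upper limit”); it is not taken from the program itself.

```python
from typing import NamedTuple, Tuple


class BC(NamedTuple):
    """Minimal stand-in for a bond constraint (bond type and source omitted)."""
    atoms_y: Tuple[int, ...]
    atoms_x: Tuple[int, ...]
    min_bond: int
    max_bond: int
    min_nsbc: int
    max_nsbc: int


def is_fixed_bond(bc):
    """Unambiguous BC: exactly two atoms, one-bond separation, NSBC fixed at 1."""
    exactly_two_atoms = len(bc.atoms_y) == 1 and len(bc.atoms_x) == 1
    one_bond = bc.min_bond == bc.max_bond == 1
    must_be_satisfied = bc.min_nsbc == bc.max_nsbc == 1
    return exactly_two_atoms and one_bond and must_be_satisfied


def split_constraints(bcs):
    """Return (fixed bonds, ambiguous BCs), preserving the input order."""
    fixed = [bc for bc in bcs if is_fixed_bond(bc)]
    ambiguous = [bc for bc in bcs if not is_fixed_bond(bc)]
    return fixed, ambiguous


def is_retained(n_violations, max_violations):
    """Retention rule as worded in the text: keep when violations < upper limit."""
    return n_violations < max_violations
```

For instance, a one-bond constraint between a single pair of atoms is classified as a fixed bond, while the HMBC example (10 - 17 18: 2 ~ 3; 0; 1 ~ 2)B10 is ambiguous in both its bond separation and its atoms.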

3.5 Use of Chemical Shifts And Peak Multiplicities

NMR-SAMS uses chemical shifts as the labels of carbon atoms, so that 2D NMR-derived correlation information can be used as bond constraints on specific atoms. This is also the reason why a generated structure always has unequivocal 1H and 13C resonance assignments.

13C chemical shifts are also used to evaluate the intermediate structures/substructures produced during the structure generation process.  A knowledge base consisting of a correlation table of substructures and 13C chemical shift (δ) ranges is used for predicting 13C chemical shift ranges.  Each of the substructures consists of the central carbon atom (which is being considered), its attached bonds, and the first layer of its neighboring atoms (the outward bonds of these atoms are not considered).  This is referred to as a carbon-centered single-spherical substructure (CCSS).  Currently, this table consists of the 13C chemical shift ranges of around 93 CCSSs composed of C, N, O, and other common elements, adapted from the literature.  The correlation table is stored as an ASCII file, chemical_shifts.def (see Appendix III), with the code for each CCSS and its expected minimum and maximum 13C chemical shift. You can customize this file.  The file is read when NMR-SAMS is started.

During structure generation, whenever a carbon atom has a complete CCSS (i.e., its immediate neighbors are known), its expected chemical shift range is derived from the knowledge base and compared with the observed 13C chemical shift of the central carbon. If the observed shift falls within this range, the substructure is accepted; otherwise it is discarded.  If the CCSS is not defined in the knowledge base table, the test is assumed to be passed and the undefined CCSS is reported after the structure generation has been completed.  As the CCSSs cover only very limited structural features, their chemical shift ranges are very broad.  Thus in NMR-SAMS, 13C chemical shifts act as a much looser constraint on the structure generation than the 2D NMR connectivities.  Hence it is very important to include as much correlation information as possible for efficient structure generation.  Sometimes the correct structure could be overlooked if the molecule has carbons that show unusual chemical shifts.  In such cases, it is recommended that you broaden the predicted chemical shift ranges by specifying an extra tolerance (for details refer to the description of parameter ADD_C13_RNG in Appendix IV).
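The test described above amounts to a range lookup with an optional tolerance.  The sketch below is a minimal illustration under stated assumptions: the CCSS code and the range in the usage note are made up and are not taken from chemical_shifts.def, and widening the range symmetrically at both ends is only one plausible reading of how an extra tolerance such as ADD_C13_RNG might be applied.

```python
def passes_shift_test(ccss_code, observed_shift, knowledge_base,
                      extra_tolerance=0.0, undefined=None):
    """Return True if the observed 13C shift is compatible with the CCSS.

    knowledge_base maps a CCSS code to (min_shift, max_shift) in ppm.
    A CCSS missing from the table passes the test; its code is recorded in
    the optional set `undefined` so that it can be reported afterwards.
    """
    shift_range = knowledge_base.get(ccss_code)
    if shift_range is None:
        if undefined is not None:
            undefined.add(ccss_code)
        return True
    low, high = shift_range
    # Assumption: the tolerance widens the range equally at both ends.
    return low - extra_tolerance <= observed_shift <= high + extra_tolerance
```

With a hypothetical table {"EXAMPLE_CCSS": (10.0, 40.0)}, an observed shift of 42.0 ppm fails the test with no tolerance but passes with an extra tolerance of 5.0 ppm, which is the situation the recommendation above is meant to cover.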

13C peak multiplicities play an important role in determining the number of attached protons of heavy atoms (i.e., the building blocks). It is therefore recommended that you use DEPT (or INEPT, APT) spectra to obtain complete 13C multiplicity information.

In the current version, 1H chemical shifts are not used to evaluate substructures. 1H peak multiplicities are used to limit the neighboring atoms of the concerned atom. (For details refer to the description about H1MULT_FLAG in Appendix IV.)

3.6  Structure Generation 

During structure generation NMR-SAMS searches all possible ways to assemble the structural building blocks into complete structures.  Within some allowance for the violation of constraints, the generated structures are consistent with all of the available spectral data and chemical constraints. 

The efficiency of structure generation is judged by the computation time, the quality of the structures generated, and the number of structures generated. Because it is a combinatorial problem, structure generation is usually the most time-consuming step.  “Combinatorial explosion” has been the major bottleneck in early attempts at automated structure elucidation.  NMR-SAMS provides novel heuristic search algorithms that reorder the solution space based on bond constraints, and search only the most probable portion of this space for candidate structures.  These methods exponentially reduce the CPU time for structure generation and hence make it practical for complex molecules.  Moreover, as a user you have full control of the usage of these methods to perform optimized structure generation. For example, by modifying a few parameters, you can extend the search space to a more complete search, or simply turn off the heuristic search methods to perform an exhaustive search. On the other hand, you can limit the search space for faster structure generation.  (See Section 7.4 and Appendix IV about the parameters GEN_FLAG, SAT_BC_RATE and N_FBX_STEP).

For relatively small molecules (e.g. < 30 heavy atoms) with reasonably clean and sufficient spectral data, this process is usually completed in seconds or minutes. In most cases the correct structure is generated either uniquely or along with a few alternatives.  For more complex problems (bigger molecules and insufficient spectral constraints), structure generation can be completed in a reasonable computation time if adequate user-defined constraints are included.   

The candidate structures generated by NMR-SAMS include complete structures and, optionally, substructures.  A complete structure is defined as one having no unsatisfied free bonds.  In the case of partial structure elucidation (see Section 7.1 for details), the chemically incomplete structure obtained is still referred to as a complete structure, because all of the free bonds are satisfied either by real bonds or dummy bonds.  During structure generation, the program enables saving the largest intermediate substructures. The substructures are useful when the generation of complete ones is not possible due to errors in spectral data or other reasons, and they provide clues and hints for improving the input spectral data and completing the structure elucidation successfully.
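The definition of a complete structure can be illustrated with a small bookkeeping sketch.  It assumes that the free bonds of a building block equal the standard valence of its heavy atom minus its attached hydrogens; the valence table and the function names are hypothetical and are not read from periodic_tab.def.  A dummy bond would be passed in the bond list like any real bond.

```python
from collections import defaultdict

# Assumed standard valences, for illustration only.
VALENCE = {"C": 4, "N": 3, "O": 2}


def free_bonds(element, attached_h):
    """Free bonds of a building block: valence minus attached hydrogens."""
    return VALENCE[element] - attached_h


def unsatisfied_free_bonds(blocks, bonds):
    """blocks: {atom_id: (element, attached_h)}; bonds: [(atom_a, atom_b, order)].

    Returns {atom_id: remaining free bonds} for atoms whose free bonds are not
    exactly used up by the given bonds.
    """
    used = defaultdict(int)
    for atom_a, atom_b, order in bonds:
        used[atom_a] += order
        used[atom_b] += order
    remaining = {}
    for atom, (element, attached_h) in blocks.items():
        left = free_bonds(element, attached_h) - used[atom]
        if left != 0:
            remaining[atom] = left
    return remaining


def is_complete(blocks, bonds):
    """A structure is complete when no building block has unsatisfied free bonds."""
    return not unsatisfied_free_bonds(blocks, bonds)
```

For ethanol, the building blocks CH3, CH2, and OH with the bonds CH3-CH2 and CH2-OH form a complete structure; dropping the second bond leaves one unsatisfied free bond on each of the CH2 and OH blocks.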

3.7 User Intervention 

NMR-SAMS was developed to streamline and automate the structure elucidation process with minimal user intervention.  But when the unknown molecule is large (e.g., the number of non-hydrogen atoms is greater than 40), or insufficient connectivity information is available, user intervention is necessary to improve the efficiency of structure generation.  Currently you can interact with the structure elucidation procedure in the following ways:

1.      Change the control parameters for NMR interpretation and structure generation. For example, you can decide whether or not to use the “negative information” of DQF-COSY based on the spectral quality.  You can also limit the ring sizes to 5- or 6-membered rings in the generated structure and discard structures containing other ring sizes.

2.      Modify the intermediate results in the MDF by using Edit/Master Data File.

3.      Supply structural building blocks by using Analysis/Edit Building Blocks if the MF is unknown.

4.      Supply known structural information as user-defined bond constraints. This is very important, especially for heteroatoms that are either not observed or have sparse connectivity information in 2D NMR experiments. Also, different spectral data, such as IR and UV, normally provide positive evidence of some known functional groups.  Using Analysis/User-defined Bond Constraints, you can add as many known bonds as possible between the constituent atoms (see Section 7.2).  Using this feature, you can also manually assemble the building blocks into a complete structure, or use a selected substructure (which was previously generated) as the starting point for the next structure generation.

5.      Supply known structural information as atom environment constraints (EC).  An EC defines the number of occurrences of a certain type of atom(s) as the immediate neighbor(s) of an atom under consideration (see Section 7.3).

6.      Propose a possible structure for the unknown and perform resonance assignment.  This way you can verify user-proposed structures and complete the structure elucidation.

7.      Modify the results of resonance assignment of a target structure using Analysis/User-Defined Assignment.
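As an illustration of item 5 above, an environment constraint can be thought of as a count of immediate neighbors of a given type.  The sketch below is one possible reading, not the program's representation (which is described in Section 7.3); the function name satisfies_ec and the use of a minimum and maximum count are assumptions.

```python
def satisfies_ec(atom, neighbor_elements, min_count, max_count,
                 adjacency, elements):
    """Check an environment constraint on `atom`.

    neighbor_elements: set of element symbols that count toward the EC.
    adjacency: {atom_id: iterable of directly bonded atom_ids}.
    elements: {atom_id: element symbol}.
    The EC holds when the number of immediate neighbors of the listed
    element types lies between min_count and max_count (inclusive).
    """
    count = sum(
        1 for neighbor in adjacency.get(atom, ())
        if elements[neighbor] in neighbor_elements
    )
    return min_count <= count <= max_count
```

For example, requiring that a carbon has exactly two oxygen neighbors corresponds to satisfies_ec(atom, {"O"}, 2, 2, adjacency, elements).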

3.8 Control Parameters

The parameter file (.par file) stores the parameters for controlling the spectral interpretation, setting up ACMX, and structure generation.  All of the parameters can be changed through the dialog boxes after choosing Edit/Parameters/NMR Interpretation, Edit/Parameters/Setup ACMX or Edit/Parameters/2D Structure Generation.  Default values are assigned to the parameters according to the .ini file when a new working data set is opened.  The default values can be customized by editing the .ini file before starting the program. In most cases the default parameters in the .ini file supplied by Spectrum Research should be a good starting point for structure elucidation.

In the following chapters, the name of the parameter, e.g., GEN_FLAG, is used to refer to a parameter. The corresponding titles in the dialog boxes and details about the usage of the parameters are described in Appendix IV.


Chapter 4

Working Data Set

4.1 Overview

This chapter describes the operations related to the data files used by NMR-SAMS.  During each session of structure elucidation, NMR-SAMS works with a working data set, which consists of five text files with the same root name but different extensions, together with a lock file.  Suppose the root name is Q-2-test; then the working data set consists of the following files:

·        A master data file (MDF), Q-2-test.mdf, where all of the intermediate and final results are stored. You can view and edit this file by using Edit/Master Data File (See Appendix II).

·        A parameter file, Q-2-test.par, where the control parameters used for the data interpretation and structure generation are stored. You can access the parameters by using the commands in the pull-right menu of Edit/Parameters (see Appendix IV).

·        An NMR data file, Q-2-test.nmr, where the NMR data converted from the SpecMan peaks table are stored.  You can view and edit this file by using Edit/NMR Data File (see Appendix I).

·        A log file, Q-2-test.log, where most of the information, warning, and error messages produced during the analysis are stored.  You can view the log file by using Edit/Log File.

·        A structure file, Q-2-test.str, where the atom-atom connection table of the generated structures and their resonance assignments are stored.  You can display the structures by using Display/Generated Structures (see Chapter 10).

·        A lock file, Q-2-test.lock, which is used to prevent two users opening the same data set simultaneously.
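The naming scheme above is easy to reproduce when scripting around a working data set, for example to check which files exist before a backup.  The following Python sketch is a hypothetical helper, not part of NMR-SAMS; only the extensions are taken from the list above.

```python
from pathlib import Path

# Extensions of the working data set files, as listed above.
EXTENSIONS = {
    "master data file": ".mdf",
    "parameter file": ".par",
    "NMR data file": ".nmr",
    "log file": ".log",
    "structure file": ".str",
    "lock file": ".lock",
}


def working_data_set(directory, root_name):
    """Return {role: path} for every file of the working data set.

    The extension is appended to the root name rather than substituted,
    so root names that themselves contain dots are handled correctly.
    """
    base = Path(directory)
    return {role: base / (root_name + ext) for role, ext in EXTENSIONS.items()}
```

Calling working_data_set(".", "Q-2-test") gives, among others, Q-2-test.mdf for the master data file and Q-2-test.lock for the lock file.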

The operations related to the working data set can be found in the File menu shown below:

4.2 Opening An Existing Working Data Set

Command: File/Open.

Description:  This procedure is used to open an existing working data set.  An existing working data set stores the data and results of the last session of structure elucidation with NMR-SAMS.  Opening an existing working data set allows you to continue your work from where it was saved.  After selecting File/Open, a file browser is displayed, listing the master data files in the current directory.  If necessary, you can switch to the desired directory, and then click the desired master data file name.  The selected file name appears in the Open MDF field.  Next click OK, and the working data set is then opened for use.

After a working data set is opened, the following message prompts you to confirm the removal of the old log messages from the previous session.  Click No to retain them, or click Yes to overwrite them with the new log messages.

The status window shows the current state of structure elucidation.  It lists the NMR data files that are being used.  It also lists the steps that have been completed, and provides tips to you about what needs to be done next.    The structural results, such as building blocks or candidate structures, are displayed in the main graphics window (see Chapter 10).

Note:  If you choose to open another working data set before saving the current modified working data set, you will be prompted to save the changes. 

If you want to discard the changes you have made to the current working data set without exiting the program, open it again, and click Yes to the following message. Then you can start from the point you last saved the working data set.

If you select a data set that is being locked by another user of NMR-SAMS, you will be warned by the following message:

Click Yes to open the data file anyway, or click No to cancel.  Note that clicking Yes may cause problems, because the lock indicates that another user may be working on the same data set.

4.3 Opening A New Working Data Set

Command: File/New.

Description: This procedure is used to create a new working data set. When dealing with a new structure problem, you must open a new working data set.  You can open a totally new working data set, or open one starting from an existing NMR data file that has already been prepared.

To open a totally new working data set, choose File/New. In the displayed file browser, make sure the option Starting with Existing NMR File is turned off.  Switch to the desired directory if necessary, and type a root name for the new working data set.  The extension will be automatically added for each file so you do not need to type it.

After clicking OK, NMR-SAMS creates the five new files described in Section 4.1.  All files are empty except the parameter file, which stores the default parameters.

Next NMR-SAMS prompts you to input the molecular formula (MF) of the unknown when a new working data set is opened. 

Type the molecular formula in the dialog box. See Section 4.4 for more about inputting molecular formula.

To open a new working data set starting with an existing NMR file, check the option Start with Existing NMR File in the file browser.  Then the existing .nmr files in the current directory are listed.  Switch to the desired directory if necessary, and click the desired .nmr file.  Next click OK,  and a new working set is created with the selected .nmr file. 

Note: If you select a filename of an existing data set (with or without selecting the option Existing NMR file), NMR-SAMS warns you (as shown in the dialog box below) about existing files with the same root name.  You can select Yes, and the program will overwrite the existing files, except the .nmr file if you are starting from an existing NMR data file.

If you don’t want to overwrite the existing files, but you still would like to use the existing .nmr file, first click No to cancel this dialog box.  Next make a copy of the .nmr file with a new root name (for example, from a command window).  Then repeat the operations described above for opening a new working data set using an existing .nmr file.

4.4 Input Molecular Formula

Command: File/Molecular Formula.

Description:  This procedure is used to define the molecular formula of the unknown.  Normally this command is used when you want to change the MF, since you are always prompted to enter the MF when you open a new working data set (see Section 4.3).  Note that the element symbol must be typed with the first letter in upper case and the second one, if any, in lower case.  For example:

You can specify the valence of an atom in a pair of parentheses following the element symbol. For example, C10H12N(V)N2S(VI)O8.  If you do not specify the valence, the most common chemical valence is adopted for an element with multiple valences.  In the above example, if they were not explicitly specified, valences 3 and 2 would be adopted for N and S, respectively.  You can also change the valences later using Analysis/User-Defined Building Blocks.

If you do not know the exact MF, try to enter the closest possible formula, or type “unknown”.  In any case, you can modify the elemental composition of the molecule by using Analysis/User-defined Building Blocks later (see Section 6.3).

The MF is interpreted if it is known.  A dialog box reports the standardized MF, the molecular weight, and the double bond equivalence (DBE).  For example:

Two records are written into the MDF. The first record starts with the keyword “MF:” and contains the standardized MF:

MF: C30H48O3

The second record starts with the keyword “ATOMS:”.  Following this are the molecular weight and the degree of unsaturation (or double bond equivalence) on the same line.  The second line is a brief description of the entries in each of the remaining lines.  Each remaining line describes one constituent heavy atom and consists of the ID, the element symbol, the chemical valence, the minimum and maximum number of attached protons, the minimum and maximum number of attached double bonds, and the minimum and maximum number of attached triple bonds, respectively.  The constituent heavy atoms are listed with carbon first, and the remaining elements in alphabetical order of their element symbols.

ATOMS:  (MW = 456.7074, DBE = 7.0)                    

#Atom; Element; Valence; Min. & max. attached H; Min. & max. double bonds; Min. & max. triple bonds

# 1.  C 4   0 3   0 2  0 1

# 2.  C 4   0 3   0 2  0 1

# 3.  C 4   0 3   0 2  0 1

      .

      .

      .

#30.   C 4   0 3   0 2  0 1

#31.   O 2   0 1   0 1  0 0

#32.   O 2   0 1   0 1  0 0

#33.   O 2   0 1   0 1  0 0

Note: You can specify an uncommon valence while inputting the MF.  Otherwise, if an atom has multiple valences, the most common valence is adopted by default.  Modifying the valence manually in the .mdf file is not recommended, because whenever you choose Analysis/Building Blocks the MF will be re-interpreted and the previous changes will be overwritten.

For example, the valence 3 is always adopted for N by default.   If you know that there is a -NO2 group in the molecule, input the MF containing a “N(V)”  (e.g.,  C6H5N(V)O2).
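The interpretation of the MF described in this section can be illustrated with a short Python sketch.  It is not the NMR-SAMS implementation: the function names, the default valence table, and the Roman numeral table are assumptions made for illustration only.  The DBE is computed with the general valence formula DBE = 1 + sum of n_i * (v_i - 2) / 2, which reproduces the value 7.0 reported above for C30H48O3.

```python
import re

# Hypothetical table of default ("most common") valences, for illustration only.
DEFAULT_VALENCE = {"C": 4, "H": 1, "N": 3, "O": 2, "S": 2, "P": 3,
                   "F": 1, "Cl": 1, "Br": 1, "I": 1}
ROMAN = {"I": 1, "II": 2, "III": 3, "IV": 4, "V": 5, "VI": 6, "VII": 7}

# Element symbol (upper case + optional lower case), optional valence in
# parentheses as a Roman numeral, optional atom count.
TOKEN = re.compile(r"([A-Z][a-z]?)(?:\(([IV]+)\))?(\d*)")


def parse_formula(mf):
    """Split a formula such as 'C6H5N(V)O2' into (element, valence, count) tuples."""
    atoms = []
    pos = 0
    for m in TOKEN.finditer(mf):
        if m.start() != pos:
            raise ValueError(f"cannot interpret formula near position {pos}")
        element, roman, count = m.groups()
        if element not in DEFAULT_VALENCE:
            raise ValueError(f"unknown element symbol: {element}")
        if roman and roman not in ROMAN:
            raise ValueError(f"unknown valence: {roman}")
        valence = ROMAN[roman] if roman else DEFAULT_VALENCE[element]
        atoms.append((element, valence, int(count) if count else 1))
        pos = m.end()
    if pos != len(mf):
        raise ValueError(f"cannot interpret formula near position {pos}")
    return atoms


def dbe(atoms):
    """Double bond equivalence from the valence and count of every atom."""
    return 1 + sum(count * (valence - 2) for _, valence, count in atoms) / 2


print(dbe(parse_formula("C30H48O3")))     # default valences
print(dbe(parse_formula("C6H5N(V)O2")))   # pentavalent N for the -NO2 group
```

Because the element pattern requires an upper-case first letter, a formula typed entirely in lower case is rejected, which mirrors the typing rule given at the start of this section.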

4.5 Save A Working Data Set

Command: File/Save.

Description:  This command allows NMR-SAMS to update the working data set with the current state of structure elucidation.  This operation is not absolutely necessary because you will be prompted to save changes before exiting the program or opening another working data set.

 

4.6 Save A Working Data Set Under A Different Name

Command: File/Save As.

Description:  This command allows NMR-SAMS to save the current state of structure elucidation in a working data set with a different root name.  After selecting File/Save As, the following file browser is displayed.  Switch to the desired directory if necessary, and type the new root name, then click OK.

4.7 Exiting NMR-SAMS

Command: File/Exit.

Description:  This command allows you to exit NMR-SAMS.  If some changes have been made to at least one of the three data files, namely, the .nmr, .mdf, and .par files, and have not been saved, NMR-SAMS prompts you to save them before exiting the program:

If you click Yes, the changes are saved before the program exits. If you click No, the changes are discarded and the program exits.  If you click Cancel, the command is ignored.


Chapter 5

Input of NMR Spectral Data

5.1 Overview

It is important to generate a clean and reliable set of peak lists from the different NMR experiments before using them in NMR-SAMS. SpecMan provides several advanced and intelligent peak picking tools to perform fast and reliable peak picking.  For details regarding peak picking, refer to the SpecMan User’s Guide.  Although peak picking can be performed independently in SpecMan, we recommend that you perform the two steps (i.e., peak picking using SpecMan and peak table conversion by NMR-SAMS) in tandem for each spectrum, because the consistency checking during the conversion process helps you find potential errors in the peak picking result.

This chapter describes how to prepare 1D and 2D NMR spectral data as input to NMR-SAMS. (For details about the NMR Data File format see Appendix I).  It is assumed that the peak picking has already been performed by SpecMan.  The peak tables from SpecMan are then converted into the NMR-SAMS format.  The conversions are done with the pull-right options of Create NMR Data File in the File menu as shown below:

5.2 Conversion of SpecMan 1H Peak List

Command: File/Create NMR Data File/H1.

Descriptions: In this procedure, the SpecMan 1H peaks table is converted into NMR-SAMS format.  First, a dialog box is displayed that prompts you to enter the filename of the 1H peaks table from SpecMan.

Click Browse to locate the peaks table file, then click OK.  An information dialog box displays the number of 1H peaks that have been converted.

In the current version of SpecMan all 1H peak multiplicities are marked as unknown (u) by default.  That’s why NMR-SAMS prompts you to supply the 1H multiplicity for the peaks (referring to their splitting patterns). As shown in Fig. 5.1, if you know the multiplicities of all or some of the 1H peaks, select Edit/NMR Data File to open the NMR data file and replace the unknown multiplicity (represented as “u”) by one of the following symbols recognizable to NMR-SAMS:

s: singlet, d: doublet, t: triplet, q: quartet, m: other multiplet.

If the multiplicity is unknown, leave it as unknown (u).

NMR-SAMS uses 1H multiplicity information to eliminate inappropriate bonds while setting up ACMX. For details refer to the usage of parameter H1_MULT_FLAG (in Appendix IV).

 

Figure. 5.1. Running NMR-SAMS and SpecMan side-by-side provides a convenient way to verify and edit the 1D peaks converted from SpecMan peaks table. Left (NMR-SAMS): select Edit/NMR Data File to open the .nmr file.  Right (SpecMan): Open the 1D spectrum and load the 1D peaks table. From the comment field of a converted peak, the ID (#32) of the original peak is found. By clicking the corresponding entry in the peaks table, the 1D peak (#32, shown in cyan) is highlighted in the spectrum for you to see and recognize the multiplicity of this peak before modifying the .nmr file.

Possible Errors: Generally NMR-SAMS cross-checks the converted 1H peak list against the MF (if known) and alerts you of any potential conflicts.  The following situations will be reported when there is a conflict:

·        If the multiplicity information is unknown for more than three fourths of the peaks, a warning message prompts you to supply this information if possible.

·        If the number of 1H peaks exceeds the number of constituent protons, an error message prompts you to correct either the peak picking result or the MF.

Results:  After the conversion, the .nmr file is updated with information regarding proton peaks starting  with the keyword “H1:”.  Following is a transcript of the converted 1H peaks:

H1: /usr/people/peng/NMR-SAMS/ndat/Q-2-test/h1p.pks

 #1. 4.930 s   ;1

 #2. 4.755 s   ;2

 #3. 3.509 u   ;3

      .

      .

      .

 #32. 0.818 s   ;32

 #33. 0.811 u   ;33

The first line which begins with the keyword “H1:” indicates the start of 1H peak list. Following the keyword and a blank space, comments may be added up to 80 characters in length. The entries in the rest of the lines represent the following attributes of a 1H peak:

·         Peak ID, a serial number that uniquely identifies this peak.

·         Chemical shift of the peak in ppm.

·        Multiplicity, which is designated as s (singlet), d (doublet), t (triplet), q (quartet), m (other multiplet) or u (unknown).  By default it is assigned as unknown. 

·         Comments, which are optional. The number in the comment field corresponds to the ID of the 1H peak in the SpecMan peaks table.

One or more space(s) is used as a delimiter for all items except comments, which are separated by “;”.   Items marked as optional can be omitted unless an item following them is included.  In such a case, you must include default values for ignored items even if they don’t get used.  Comments can always be included as long as they follow a “;”.  For the 1H peak list, the peak intensities and comments are not currently used by NMR-SAMS.
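The record layout described above can be read with a few lines of Python.  This is a minimal sketch based only on the transcript shown in this section; the names Peak and parse_peak_line are hypothetical, and the sketch requires all three leading fields (ID, chemical shift, multiplicity) to be present.

```python
from typing import NamedTuple

# Multiplicity symbols listed in this section for 1H peaks.
VALID_MULTIPLICITIES = {"s", "d", "t", "q", "m", "u"}


class Peak(NamedTuple):
    peak_id: int
    shift_ppm: float
    multiplicity: str
    comment: str


def parse_peak_line(line):
    """Parse one peak record such as ' #3. 3.509 u   ;3'."""
    # Everything after the first ';' is the optional comment.
    data, _, comment = line.partition(";")
    fields = data.split()  # one or more spaces act as the delimiter
    if len(fields) != 3 or not fields[0].startswith("#"):
        raise ValueError(f"not a peak record: {line!r}")
    peak_id = int(fields[0].lstrip("#").rstrip("."))
    shift_ppm = float(fields[1])
    multiplicity = fields[2]
    if multiplicity not in VALID_MULTIPLICITIES:
        raise ValueError(f"unknown multiplicity: {multiplicity!r}")
    return Peak(peak_id, shift_ppm, multiplicity, comment.strip())


print(parse_peak_line(" #3. 3.509 u   ;3"))
```

The comment field is returned as text, so the ID of the original SpecMan peak stays available for cross-checking against the spectrum.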

Note: Whenever you repeat a 1H peaks table conversion, or modify the converted peak list (using Edit/NMR Data File), you must make sure to convert the dependent 2D spectra again.  For example, if you add a 1H peak in the converted 1H peak list, you must convert the COSY, HMQC, HMBC, and NOESY data again, if they have been converted before. Otherwise the added 1H peak will not be reflected in the 2D data.

5.3 Conversion of SpecMan 13C Peak List

Command: File/Create NMR Data File/C13 and DEPT.

Descriptions: In this procedure the SpecMan 13C and DEPT/APT peak tables are converted into a peak list of 13C chemical shifts and multiplicities. NMR-SAMS requires 13C multiplicity information  for reliable structure elucidation. In order to get the complete 13C multiplicity information, you need 13C, DEPT-90/APT-90 and DEPT-135/APT-135 experimental data.  However, NMR-SAMS provides a flexible way to derive the 13C multiplicity information from any combination of available experiments as described below:  

1.      13C Only. In the dialog box that appears, select None for Peak Multiplicity Experiments.  Click Browse to enter the SpecMan C-13 Peaks Table. 

After clicking OK, NMR-SAMS updates the .nmr file with a list of 13C chemical shifts having unknown multiplicities as shown in the Results section below.  If you know the multiplicities of some peaks, you can manually edit the .nmr file to supply this information.

2.      13C and DEPT.  In the dialog box that appears, click Browse to enter the SpecMan C-13 Peaks Table.  Next select DEPT for Peak Multiplicity Experiments.  Enter the peaks table filenames for the DEPT-45, DEPT-90, and DEPT-135 experiments. All of the DEPT experiments are optional as mentioned previously, so if you do not have data for a particular DEPT experiment, turn off the corresponding toggle.  Note that, except for DEPT-45, ignoring some DEPT experiments could leave some peaks with unknown multiplicities.

Also you need to enter a matching tolerance (in ppm) to match 13C and DEPT peaks. After clicking OK, NMR-SAMS updates the .nmr file with a list of 13C chemical shifts and derived multiplicities as shown in the Results section below. 

3.      13C and APT. In the dialog box that appears, click Browse to enter the SpecMan C-13 Peaks Table.  Select APT for Peak Multiplicity Experiments.  Enter the peaks table filenames for the APT-45, APT-90, and APT-135 experiments. All of the APT experiments are optional as previously described, so if you do not have data for a particular APT experiment, turn off the corresponding toggle.  Note that, except for APT-45, ignoring some APT experiments could leave some peaks with unknown multiplicities.

Also you need to enter a matching tolerance to match 13C and APT peaks.  After clicking OK, NMR-SAMS updates the .nmr file with a list of 13C chemical shifts and derived multiplicities as shown in the Results section below. 

Possible Errors: During the conversion NMR-SAMS cross-checks the 13C peak list with the MF, and alerts you of potential inconsistencies.  In such cases, the following general messages will be reported:

·        If there are more 13C peaks than the constituent carbon atoms, an error message will prompt you to remove peak artifacts or correct the MF.

·        If there are fewer 13C peaks than the constituent carbon atoms, a warning message will prompt you to resolve 13C peak overlap.  Define the overlapping peaks as individual peaks with slightly different chemical shifts by choosing Edit/NMR Data File and editing the NMR data file.  (It is usually possible to resolve such ambiguities by looking at the peak intensity and the HMQC spectrum, or by acquiring the spectrum under different conditions.)  If you are unable to resolve overlapping peaks (for example, in the case of a symmetric molecule, or due to severe overlap in the spectrum), then partial structure elucidation will be performed (see Section 7.1).

·        If the multiplicity of one or more 13C peaks is unknown, a warning message will prompt you to supply this information, if possible.  Lack of this information may result in multiple building block sets (see Section 6.2).

·        The number of carbon-attached protons (n_CH ) is calculated based on the 13C multiplicities. If n_CH is greater than the number of constituent protons, an error message will prompt you to correct either the multiplicity information or the MF.

·        When the number of 13C peaks is equal to that of the carbon atoms, and all 13C multiplicities are known, the maximum number of heteroatom-attached protons (max_XH ) is calculated based on the valence of the constituent heteroatoms. If (n_CH + max_XH) is smaller than the number of constituent protons, an error message will prompt you to correct either the multiplicity information or the MF.
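The consistency checks listed above can be sketched in Python as follows.  This is an illustration, not the NMR-SAMS code: the function name and message texts are hypothetical, and the rule that a heteroatom can carry at most (valence - 1) protons is an assumption inferred from the ATOMS record in Section 4.4, where an oxygen of valence 2 is listed with a maximum of one attached proton.

```python
# Protons attached to a carbon for each 13C multiplicity symbol.
H_PER_MULTIPLICITY = {"s": 0, "d": 1, "t": 2, "q": 3}


def check_c13(multiplicities, n_carbon, n_proton, hetero_valences):
    """Return a list of (level, message) tuples for a converted 13C peak list."""
    messages = []
    n_peaks = len(multiplicities)
    if n_peaks > n_carbon:
        messages.append(("error", "more 13C peaks than carbon atoms"))
    elif n_peaks < n_carbon:
        messages.append(("warning", "fewer 13C peaks than carbon atoms"))
    if "u" in multiplicities:
        messages.append(("warning", "unknown 13C multiplicity"))

    # n_CH: carbon-attached protons implied by the known multiplicities.
    n_ch = sum(H_PER_MULTIPLICITY.get(m, 0) for m in multiplicities)
    if n_ch > n_proton:
        messages.append(("error", "n_CH exceeds the constituent protons"))

    # The upper-bound check only applies to a complete, fully known peak list.
    if n_peaks == n_carbon and "u" not in multiplicities:
        # Assumption: a heteroatom can carry at most (valence - 1) protons.
        max_xh = sum(max(v - 1, 0) for v in hetero_valences)
        if n_ch + max_xh < n_proton:
            messages.append(("error", "n_CH + max_XH is below the constituent protons"))
    return messages


# Ethanol, C2H6O: one CH3 (q), one CH2 (t), one oxygen of valence 2.
print(check_c13(["q", "t"], n_carbon=2, n_proton=6, hetero_valences=[2]))
```

For ethanol the sketch finds n_CH = 5 and max_XH = 1, which together account for all six protons, so no message is produced.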

Results: After the conversion, the .nmr file is updated with information regarding the 13C peaks starting with the keyword “C13:” in the .nmr file.  The following is a transcript of a converted 13C peak list (Note that if DEPT or APT is not used, the multiplicities will be unknown “u” for all peaks.):

C13: /usr/people/peng/NMR-SAMS/ndat/Q-2-test/c13.pks

 #1. 178.822 s ;1

 #2. 151.323 s ;2

 #3. 109.931 t ;3

      .

      .

      .

 #28. 16.340 q ;28

 #29. 14.929 q ;29

The first line which begins with the keyword “C13:” indicates the start of the 13C  peak list. Following the keyword and a blank space, comments may be added up to 80 characters in length. The entries in each of the rest of the lines represent the following attributes of the 13C peak:

·         Peak ID, a serial number that uniquely identifies this peak.

·         Chemical shift of the peak in ppm.

·        Multiplicity, which is designated as s (singlet, C), d (doublet, CH), t (triplet, CH2), q (quartet, CH3), or u (unknown).

·         Comments, which are optional. The number in the comment field corresponds to the ID of the 13C peak in the SpecMan peaks table.

One or more space(s) is used as a delimiter for all items except comments, which are separated by “;”.  Items marked as optional can be omitted unless an item following them is included.  In such a case, you must include default values for ignored items even if they don’t get used.  Comments can always be included as long as they follow a “;”.  For the 13C peak list, the peak intensities and comments are not currently used by NMR-SAMS.

Note: Whenever you repeat a 13C peaks table conversion, or modify the converted peak list (using Edit/NMR Data File), you must make sure to convert the dependent 2D spectra again.  For example, if you add a 13C peak in the converted 13C peak list, you must convert the HMQC, HMBC, and INADEQUATE data again, if they have been converted before. Otherwise the added 13C peak will not be reflected in the 2D data. 

As shown in Fig. 5.1, you can run NMR-SAMS and SpecMan side-by-side, to verify the peak picking results of peaks mentioned in the warning or error dialog boxes.

5.4 Conversion of SpecMan DQF-COSY Peaks Table

Command: File/Create NMR Data File/COSY.

Descriptions:  In this procedure NMR-SAMS converts the DQF-COSY cross peak coordinates into connectivities between 1D 1H peaks.  As illustrated in Fig. 5.2, the coordinates of the peak center (shown as a cross) are matched to the 1D chemical shifts (shown as dotted lines).  The 1D peaks that match the peak center within the tolerances (±D2 and ±D1 in F2 and F1 dimensions respectively) are taken as the correlated 1D peaks.  If, in a certain dimension, more than one 1D peak (such as 1H peaks a and b in Fig. 5.2) match the cross peak center, then all are treated as possible correlated 1D peaks in that dimension. Such a connectivity is called an ambiguous connectivity.  Internally, NMR-SAMS will consider all possible correlations for an ambiguous connectivity.  (For details about ambiguous connectivity, see the example in Section 3.4).

Figure. 5.2. Conversion of COSY cross peak coordinates into a correlation between the 1D 1H peaks.  The cross (+) denotes the cross peak center.  The dotted lines denote the chemical shifts of the three 1D 1H peaks, a, b, and c, respectively.  D1 and D2 are the matching tolerances along F1 and F2, respectively.  All three peaks, which match the cross peak center within the tolerances, are taken as correlated 1D peaks.

Upon selecting the command File/Create NMR Data File/COSY, NMR-SAMS opens a dialog box that prompts you to enter the filename of the COSY peaks table.  You are also prompted to input matching tolerances along the X (i.e., F2) and Y (i.e., F1) dimensions, respectively.

The default value of the matching tolerance is 0.005 ppm for both dimensions.  It is important to select an appropriate tolerance: too big a tolerance could result in undesired ambiguity, and too small a tolerance could cause some real peaks to be discarded.  To choose a suitable tolerance you must consider at least the following factors:

·        Accuracy of the peak picking.  The grid-intelligence-based peak picking of SpecMan gives you a convenient way to verify the accuracy of peak picking by comparing the expected locations of the cross peaks with the picked peaks (see the SpecMan User’s Guide). If a peak list was carefully verified with this method, it is OK to start with a small tolerance.

·        Alignment between 1D 1H and the COSY spectra.  SpecMan provides convenient tools for you to correct frequency offset between the 1D and 2D spectra. Sometimes different experimental conditions introduce small chemical shift differences between 1D and 2D resonances. To further correct the differences due to sample conditions, use the grid-intelligence-based peak picking method of SpecMan.  If these corrections have been applied, it is OK to start with a small tolerance.

Possible Errors: During the peak table conversion, depending on the situation, NMR-SAMS may prompt the following error/ warning messages:

·        If the X or Y coordinate of a cross peak does not match any 1D 1H peak within the matching tolerance, the cross peak will be discarded.  When this message appears, you should verify this peak and check whether it is an artifact.  If it is not an artifact, then either its center has not been picked accurately, or the tolerance used is too small.  Click Cancel to stop the conversion process, and try refining the peak picking results or repeating the conversion with a bigger matching tolerance.

·        If the X or Y coordinate of a cross peak matches more than one 1D 1H peak within the matching tolerance, then an ambiguous correlation is obtained.  You can either click Cancel to stop this process and then try a smaller tolerance to reduce ambiguities, or click OK to All to let the conversion finish and then choose Edit/NMR Data File to manually remove the undesired ambiguities in the .nmr file.  Note that although NMR-SAMS can use ambiguous correlation information, too many ambiguous correlations will undermine the efficiency of the subsequent structure generation.

·        If the X or Y coordinate of a cross peak matches more than six 1D 1H peaks within the matching tolerance, the peak will be discarded.  In such a case, you can either click Yes (or Yes to All) to go on without that peak, or click No to define a reduced matching tolerance and repeat this process.  You can also click Cancel to stop this process, and then merge the very close 1D 1H peaks as a degenerate peak in the SpecMan 1H peaks table before converting it again (see Section 5.2). After that, convert the DQF-COSY peaks table again.
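The matching rules described in this section (no match, unique match, ambiguous match, and more than six matches) can be summarized in a short Python sketch.  The function names and the returned status strings are hypothetical; only the default tolerance of 0.005 ppm and the limit of six matching peaks are taken from the text above.

```python
MAX_MATCHES = 6  # a coordinate matching more 1D peaks than this is discarded


def match_axis(coord, shifts, tol):
    """IDs (1-based) of the 1D peaks whose shift lies within +/- tol of coord."""
    return [i for i, s in enumerate(shifts, start=1) if abs(s - coord) <= tol]


def convert_cross_peak(x, y, h1_shifts, tol_x=0.005, tol_y=0.005):
    """Convert one COSY cross peak (x along F2, y along F1) into a connectivity.

    Returns (status, f2_ids, f1_ids), where status is 'unique', 'ambiguous',
    or 'discarded'.
    """
    f2_ids = match_axis(x, h1_shifts, tol_x)
    f1_ids = match_axis(y, h1_shifts, tol_y)
    if not f2_ids or not f1_ids:
        return ("discarded", f2_ids, f1_ids)   # no 1D peak within tolerance
    if len(f2_ids) > MAX_MATCHES or len(f1_ids) > MAX_MATCHES:
        return ("discarded", f2_ids, f1_ids)   # too many candidate 1D peaks
    if len(f2_ids) > 1 or len(f1_ids) > 1:
        return ("ambiguous", f2_ids, f1_ids)   # keep all candidates
    return ("unique", f2_ids, f1_ids)


h1 = [4.930, 4.755, 3.509, 0.818, 0.811]
print(convert_cross_peak(4.931, 3.508, h1))   # one candidate per dimension
print(convert_cross_peak(0.815, 4.930, h1))   # peaks 4 and 5 both match x
print(convert_cross_peak(2.000, 4.930, h1))   # x matches no 1D peak
```

Repeating the second call with a smaller tol_x shows how reducing the tolerance turns an ambiguous correlation into a discarded or unique one, which is the trade-off discussed above.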

Tips: As shown in Fig. 5.3, you can run NMR-SAMS and SpecMan side-by-side to verify the original peak picking results of peaks mentioned in the warning or error dialog boxes.  This is also useful when you edit the .nmr file using Edit/NMR Data File.

Figure. 5.3. Running NMR-SAMS and SpecMan side-by-side provides a convenient way to verify and edit the 2D peaks during peaks table conversion. Left (NMR-SAMS): a dialog box indicates that cross peak #33 is discarded by NMR-SAMS.  Right (SpecMan): Open the DQF-COSY spectrum and load the 2D peaks table. By clicking the corresponding entry in the peaks table, cross peak #33 is highlighted in the spectrum.  This peak was discarded because it is located too far away from the grid center.  If necessary, you can correct this peak by moving it closer to the grid intersection. After correcting such peaks, save the refined peaks table and repeat the peaks table conversion.  This method can also be used when editing the .nmr file to remove undesired ambiguities and to mark long-range coupled peaks.

For COSY and other homonuclear spectra, NMR-SAMS discards the diagonal peaks and merges symmetric peaks.  This is not done when ambiguous correlation is involved.  For example, the following connectivities are retained:

(10 - 10 11) 3   0.00   0.60

(8 - 9 10)   3   0.00   0.60

(8 - 9)      3   0.00   0.60

The first connectivity may arise from either a diagonal peak or a near-diagonal peak. The latter two, converted from two symmetric peaks, do not have exactly the same correlated 1H peaks so they are not merged.
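The handling of diagonal and symmetric peaks can be sketched in Python as follows.  This is only an illustration under stated assumptions: the function name is hypothetical, each connectivity is represented as a pair of tuples of 1D peak IDs, and the reliability values 0.60 and 0.84 are the ones this manual assigns to connectivities converted from a single peak and from two symmetric peaks.

```python
SINGLE_PEAK = 0.60     # reliability of a connectivity from one cross peak
SYMMETRIC_PAIR = 0.84  # reliability of a connectivity from two symmetric peaks


def merge_connectivities(connectivities):
    """Discard diagonal peaks and merge symmetric pairs.

    Connectivities that involve an ambiguous correlation are kept unchanged.
    """
    merged = []
    index_of = {}  # (low_id, high_id) -> position in merged
    for a, b in connectivities:
        if len(a) == 1 and len(b) == 1:
            if a[0] == b[0]:
                continue  # unambiguous diagonal peak: discard
            key = (min(a[0], b[0]), max(a[0], b[0]))
            if key in index_of:
                # Second member of a symmetric pair: raise the reliability.
                merged[index_of[key]] = ((key[0],), (key[1],), SYMMETRIC_PAIR)
            else:
                index_of[key] = len(merged)
                merged.append(((key[0],), (key[1],), SINGLE_PEAK))
        else:
            merged.append((tuple(a), tuple(b), SINGLE_PEAK))
    return merged


peaks = [((1,), (1,)), ((1,), (2,)), ((2,), (1,)),
         ((10,), (10, 11)), ((8,), (9, 10)), ((9,), (8,))]
for conn in merge_connectivities(peaks):
    print(conn)
```

With this input the diagonal peak (1 - 1) is dropped and (1 - 2) is merged with (2 - 1), while the three connectivities from the example above are retained separately because at least one of each related pair is ambiguous.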

For each converted COSY connectivity, the intensity level is assigned 3 (i.e., strong). The J-coupling constant is assigned 0.0 (i.e., unknown).  The reliability of the peak is assigned 0.60 if it is converted from a single peak, or 0.84 if from two symmetric ones.  Since the intensity level of a COSY peak is related to its structural interpretation, NMR-SAMS always prompts you to mark the connectivities that may be due to long-range couplings after the conversion is finished, as shown in the dialog box below:

Peaks showing very low intensity or involving sp2-C could be long-range coupled.  If you suspect that some peaks are due to long-range coupling, select Edit/NMR Data File to edit the .nmr file.  Modify the intensity levels of such connectivities from “3” (i.e., strong) to “1” (i.e., weak), and save the changes.  As described in Fig. 5.3, you can edit the .nmr file while looking at the original COSY cross peaks.

Note: A short-range coupling COSY connectivity is normally interpreted as 2 or 3 intervening bonds between the correlated protons. If a long-range coupling is mistakenly interpreted as a short-range one, NMR-SAMS will probably miss the correct structure.  A COSY connectivity marked as long-range coupling is usually interpreted as 3-5 intervening bonds between the correlated protons, which also covers the possibility of vicinal coupling.  It is safe to treat a short-range coupling peak as long-range coupling, but it may decrease the efficiency of structure generation. The geminal coupling is always automatically detected by the program.  (For details see Section 6.4).

Results: After the conversion, the .nmr file is updated with information regarding the converted COSY connectivities starting with the keyword “COSY:”.  The following is a transcript of a converted COSY connectivity list:

COSY:

 #1. (1 - 2)      1     0.0   ;1+4

 #2. (1 - 12)     1     0.0   ;2+31

 #3. (2 - 12)     1     0.0   ;3+32

 #4. (3 - 7 8)    3     0.0   ;6+18

 #5. (3 - 13)     3     0.0   ;7+33

 #6. (3 - 18)     3     0.0   ;5+49

.

.

.

The first line which begins with the keyword “COSY:” indicates the start of COSY connectivity list. Following the keyword and a blank space, comments may be added up to 80 characters in length. The entries in each of the rest of the lines represent the following attributes of a connectivity:

·         Connectivity ID, a serial number that uniquely identifies this connectivity.

·         IDs of the correlated 1D 1H peaks, (shown in parenthesis) For ambiguous correlations, the IDs of all possible 1D 1H peaks are included. 

·         Peak intensity level, which is classified into four types: strong, medium, weak, and unknown, denoted as 3, 2, 1, and 0 respectively.  The default value is 3. For a short-range coupled DQF-COSY connectivity, the intensity level should be either 3 or 2.  For a long-range one, the intensity level should be 1.  If an intensity level of 0 is used, NMR-SAMS will expect an actual J-coupling value in the J-coupling field.

·         J-coupling. 0.0 is assigned by default, representing unknown. This is optional if peak intensity level is bigger than 0.

·         Comments, which are optional and have a maximum length of 80 characters. The numbers in the comment field correspond to the IDs of the corresponding peaks in the SpecMan peaks table. For merged peaks these numbers are shown with a + sign.  Comments are ignored by NMR-SAMS.

One or more space(s) is used as a delimiter for all items except comments which are separated by “;”.   Items marked as optional can be omitted unless an item following them is included.  In such a case, you must include default values for ignored items even if they don’t get used.  Comments can always be included as long as they follow a “;”.

Note: The conversion of COSY peaks table is dependent on the converted 1H peak list. If you convert the 1H peaks table again, or modify the converted 1H peak list, you must convert the COSY peaks table again. 

5.5 Conversion of SpecMan HMQC/HETCOR Peaks Table

Command: File/Create NMR Data File/HMQC (or HETCOR).

Descriptions:  In this procedure NMR-SAMS converts the HMQC or HETCOR cross peak coordinates into connectivities between 1D 13C and 1H peaks.  In  principle the conversion process is very similar to what was described earlier in Section 5.4. 

Other things that you need to be aware of are as follows:

The correlated 13C peak(s) is always placed ahead of the correlated 1H peak(s) in a converted connectivity, and this applies to both HMQC and HETCOR.

Unlike the other 2D spectral data, ambiguity is not allowed for an HMQC connectivity. NMR-SAMS first matches each 13C peak against the HMQC peaks by comparing the 13C coordinate within the specified tolerance.  Next, the HMQC peak identified in the previous step is compared against all 1H peaks by matching its 1H chemical shift within the specified tolerance, and the 1H peak with the best match is taken as the correlated 1H peak.  This process is repeated until each HMQC connectivity has exactly one correlated 13C-1H pair.
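The best-match rule can be sketched in Python as follows.  This is a simplified illustration, not the NMR-SAMS procedure: the function names are hypothetical, and the sketch picks the closest 1D peak in both dimensions, whereas the text above states the best-match rule explicitly only for the 1H dimension.

```python
def best_match(coord, shifts, tol):
    """ID (1-based) of the 1D peak closest to coord within +/- tol, or None."""
    best_id, best_dist = None, None
    for i, shift in enumerate(shifts, start=1):
        dist = abs(shift - coord)
        if dist <= tol and (best_dist is None or dist < best_dist):
            best_id, best_dist = i, dist
    return best_id


def convert_hmqc(cross_peaks, c13_shifts, h1_shifts, tol_c, tol_h):
    """Convert (13C, 1H) cross peak coordinates into unambiguous (C ID, H ID) pairs."""
    connectivities = []
    for c_coord, h_coord in cross_peaks:
        c_id = best_match(c_coord, c13_shifts, tol_c)
        h_id = best_match(h_coord, h1_shifts, tol_h)
        if c_id is not None and h_id is not None:
            connectivities.append((c_id, h_id))  # 13C always placed first
    return connectivities


c13 = [178.822, 151.323, 109.931]
h1 = [4.930, 4.755, 3.509]
cross_peaks = [(109.93, 4.932), (109.94, 4.752), (50.0, 1.0)]
print(convert_hmqc(cross_peaks, c13, h1, tol_c=0.05, tol_h=0.01))
```

Because only the single closest peak is kept in each dimension, every converted connectivity contains exactly one 13C-1H pair, and a cross peak that matches nothing within the tolerances is dropped.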

Possible Errors: After the conversion, the resulting HMQC peak list is cross-checked against the 13C multiplicity information. NMR-SAMS may prompt the following error/warning messages:

·        If the number of correlated HMQC peaks of a certain 13C peak is fewer than expected (1 for CH and CH3, 2 for CH2), NMR-SAMS warns you to check for missing HMQC peaks, or to check the 1H integral to verify whether a CH2 shows degenerate 1H peaks.

·        If the number of correlated HMQC peaks of a certain 13C peak is more than expected (1 for CH and CH3, 2 for CH2), NMR-SAMS prompts you to check for possible errors due to degenerate 13C peaks, wrong assignment, or artifacts.
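The cross-check against the 13C multiplicities can be sketched in Python as follows.  The function name and the returned labels are hypothetical; the expected counts (1 for CH and CH3, 2 for CH2) come from the two items above, and carbons with multiplicity s or u are skipped because the text does not state an expected count for them.

```python
from collections import Counter

# Expected number of HMQC peaks per 13C multiplicity (d = CH, t = CH2, q = CH3).
EXPECTED_HMQC_PEAKS = {"d": 1, "t": 2, "q": 1}


def check_hmqc_counts(connectivities, c13_multiplicities):
    """Return (13C peak ID, 'fewer' or 'more') for each inconsistent carbon."""
    observed = Counter(c_id for c_id, _ in connectivities)
    findings = []
    for c_id, mult in enumerate(c13_multiplicities, start=1):
        expected = EXPECTED_HMQC_PEAKS.get(mult)
        if expected is None:
            continue  # s and u are not covered by this check
        if observed[c_id] < expected:
            findings.append((c_id, "fewer"))
        elif observed[c_id] > expected:
            findings.append((c_id, "more"))
    return findings


hmqc = [(3, 1), (3, 2), (4, 4)]
multiplicities = ["s", "s", "t", "d", "q"]
print(check_hmqc_counts(hmqc, multiplicities))
```

In this example the CH3 carbon (peak 5) has no HMQC peak, so it is reported as having fewer peaks than expected, while the CH2 and CH carbons are consistent.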

NMR-SAMS automatically discriminates HMQC from HETCOR and does not consider diagonal peaks or symmetric peaks. A strong intensity level (represented as “3”) and the actual peak intensity (from the SpecMan peaks table) are assigned to each peak.  The peak intensities are not used by NMR-SAMS, so you do not need to adjust them (see Section 6.2.4).

Results: After the conversion, the .nmr file is updated with information regarding the converted HMQC connectivities starting with the keyword “HMQC:”. The following is a transcript of a converted HMQC connectivity list:

HMQC:

 #1. (3 - 1)      ;2

 #2. (3 - 2)      ;1

 #3. (4 - 4)      ;3

 #4. (6 - 33)     ;4

      .

      .

      .

The first line which begins with the keyword “HMQC:” indicates the start of  HMQC connectivity list. Following the keyword and a blank space, comments may be added up to 80 characters in length. The entries in each of the rest of the lines represent the following attributes of a connectivity:

·         Connectivity ID, a serial number that uniquely identifies this connectivity.

·         IDs of the correlated 1D 13C and 1H peaks (shown in parenthesis), which define the correlated 13C and 1H peaks respectively.

·         Comments, which are optional and have a maximum length of 80 characters. The numbers in the comment field correspond to the ID of the corresponding peak in the SpecMan peaks table.

One or more space(s) is used as a delimiter for all items except comments which are separated by “;”.   Items marked as optional can be omitted unless an item following them is included.  In such a case, you must include default values for ignored items even if they don’t get used.  Comments can always be included as long as they follow a “;”.

Note: The conversion of HMQC/HETCOR peaks table is dependent on the converted 1H and 13C peak lists.  If you convert the 1H/13C peaks table again, or manually modify the converted 1H/13C peak list, you must convert the HMQC/HETCOR peaks table again. 

5.6 Conversion of SpecMan HMBC/COLOC Peaks Table

Command: File/Create NMR Data File/HMBC (or COLOC).

Descriptions:  In this procedure NMR-SAMS converts the HMBC or COLOC cross peak coordinates into connectivities between 1D 13C and 1H peaks.  In  principle the conversion process is very similar to what was described earlier in Section 5.4. 

Other things that you need to be aware of are as follows:

The correlated 13C peaks are always placed ahead of 1H in a converted connectivity, and this applies to both HMBC and COLOC.

NMR-SAMS automatically discriminates HMBC from COLOC and does not consider diagonal peaks or symmetric peaks. Strong intensity level (represented as “3”) and the actual peak intensity (from SpecMan peaks table) are assigned to each peak. The peak intensity levels are useful if you want to interpret some weak peaks as connectivities longer than 3 bonds (see Section 6.4.2).

Results: After the conversion, the .nmr file is updated with information regarding the converted HMBC connectivities starting with the keyword “HMBC:”. The following is a transcript of a converted HMBC connectivity list:

HMBC:

 #1.   (1 - 6)    3     ;3

 #2.   (1 - 7 8) 3     ;4

 #3.   (1 - 13)   3     ;5

         .

         .

         .

 #128. (29 - 10) 3     ;133

 #129. (29 - 24) 3     ;131

The first line, which begins with the keyword “HMBC:”, indicates the start of the HMBC connectivity list. Following the keyword and a blank space, comments of up to 80 characters may be added. The entries in each of the remaining lines represent the following attributes of a connectivity:

·         Connectivity ID, a serial number that uniquely identifies this connectivity.

·         IDs of the correlated 1D 13C and 1H peaks, (shown in parenthesis). For ambiguous correlations the IDs of all possible 1D 13C & 1H peaks are included.

·         Peak intensity level, which is classified as four types: strong, medium, weak, and unknown, and denoted as 3,2,1 and 0 respectively.  This is optional and the default value is 3.

·         Comments, which are optional and have a maximum length of 80 characters. The numbers in the comment field correspond to the ID of the corresponding peak in the SpecMan peaks table.

One or more spaces are used as a delimiter for all items except comments, which are separated by “;”.  Items marked as optional can be omitted unless an item following them is included.  In such a case, please include default values for the omitted items even if they are not used.  Comments can always be included as long as they follow a “;”.
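If you want to read such connectivity lines with your own scripts, the format above can be parsed roughly as follows. This is a sketch under the assumption that each line has the form shown in the transcripts; the regular expression, the class, and the function name are hypothetical and not part of NMR-SAMS.

```python
import re
from typing import List, NamedTuple

# "#ID. (first IDs - second IDs) [intensity ...] [;comment]"
LINE_RE = re.compile(r"^\s*#\s*(\d+)\.?\s*\(([^)]*)\)([^;]*)(?:;(.*))?$")


class Connectivity(NamedTuple):
    conn_id: int
    first: List[int]    # 13C peak IDs for HMQC/HMBC
    second: List[int]   # 1H peak IDs for HMQC/HMBC
    intensity: int      # 3 = strong, 2 = medium, 1 = weak, 0 = unknown
    comment: str


def parse_connectivity(line: str) -> Connectivity:
    """Parse one connectivity line; the intensity level defaults to 3 when omitted."""
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"not a connectivity line: {line!r}")
    left, right = m.group(2).split("-")
    fields = m.group(3).split()  # optional items between ")" and ";"
    intensity = int(fields[0]) if fields else 3
    return Connectivity(int(m.group(1)),
                        [int(x) for x in left.split()],
                        [int(x) for x in right.split()],
                        intensity,
                        (m.group(4) or "").strip())


ambiguous = parse_connectivity(" #2.   (1 - 7 8) 3     ;4")
```

An ambiguous correlation such as `(1 - 7 8)` simply yields more than one ID on one side of the connectivity.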

Note: The conversion of HMBC/COLOC peaks table is dependent on the converted 1H and 13C peak lists.  If you convert the 1H/13C peaks table again, or modify the converted 1H/13C peak list, you must convert the HMBC/COLOC peaks table again. 

5.7 Conversion of SpecMan NOESY Peaks Table

Command: File/Create NMR Data File/NOESY (or ROESY).

Descriptions:  In this procedure NMR-SAMS converts the NOESY (or ROESY) cross peak coordinates into connectivities between 1D 1H peaks in exactly the same way as described for COSY in Section 5.4. Strong intensity level (represented as “3”) and the actual peak intensity (from SpecMan peaks table) are assigned to the corresponding entries of each peak.  NMR-SAMS uses NOESY information in a very limited fashion so normally you do not need to take care of the peak intensity for 2D structure determination (see parameters IDEAL_COSY and NOESY_DIST in Appendix IV).

5.8 Conversion of SpecMan INADEQUATE Data

Command: File/Create NMR Data File/INADEQUATE.

Descriptions:  In this procedure NMR-SAMS converts the 2D INADEQUATE cross peak coordinates into connectivities between 1D 13C peaks.  In the following dialog box, you are prompted to define a matching tolerance.  This tolerance is used to match chemical shifts of 13C peak and the F2 coordinates of the INADEQUATE peaks.  This tolerance is also used to match the F1 coordinates to search for coupled INADEQUATE peaks.  Similar to the conversion process of DQF-COSY (Section 5.4), ambiguous connectivities are considered. 
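The pairing logic described above can be sketched as follows. This is a simplified illustration, not the NMR-SAMS implementation: the function name, the data layout, and all coordinate values are assumptions, and the same tolerance is applied to F1 and F2 as the text describes.

```python
from typing import Dict, List, Tuple


def convert_inadequate(c13: Dict[int, float],
                       peaks: Dict[int, Tuple[float, float]],
                       tol: float) -> List[Tuple[List[int], List[int], str]]:
    """Pair INADEQUATE peaks that share an F1 coordinate and map their F2 to 13C peak IDs.

    peaks maps a SpecMan peak ID to (F1, F2).  Each result is
    (13C IDs of the first peak, 13C IDs of the second peak, "idA+idB").
    """
    def carbons_at(f2: float) -> List[int]:
        # All 13C peaks within tolerance are kept, so ambiguity is preserved.
        return [cid for cid, shift in sorted(c13.items()) if abs(shift - f2) <= tol]

    connectivities = []
    ids = sorted(peaks)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if abs(peaks[a][0] - peaks[b][0]) > tol:
                continue  # different F1: not a coupled pair
            ca, cb = carbons_at(peaks[a][1]), carbons_at(peaks[b][1])
            if ca and cb:
                connectivities.append((ca, cb, f"{a}+{b}"))
    return connectivities


# Hypothetical data: two cross peaks at the same F1, matching C1 and C3 in F2.
c13 = {1: 196.06, 2: 145.56, 3: 144.65}
inad_peaks = {1: (340.70, 196.05), 2: (340.70, 144.66)}
conns = convert_inadequate(c13, inad_peaks, tol=0.1)
```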

Results: After the conversion, the .nmr file is updated with information regarding the converted INADEQUATE connectivities starting with the keyword “INAD:”. The following is a transcript of a converted INADEQUATE connectivity list:

INAD: 

#1. (1 - 3)       ;1+2

#2. (2 - 4 5)     ;3+4

      .

      .

      .

The first line which begins with the keyword “INAD:” indicates the start of  the INADEQUATE connectivity list. Following the keyword and a blank space, comments may be added up to 80 characters in length. The entries in each of the rest of the lines represent the following attributes of a connectivity:

·        Connectivity ID, a serial number that uniquely identifies this connectivity.

·        IDs of the correlated 1D 13C peaks (shown in parenthesis).  For ambiguous correlations the IDs of all possible 1D 13C peaks are included.

·        Comments, which are optional and have a maximum length of 80 characters. The numbers in the comment field correspond to the IDs of the corresponding INADEQUATE peaks in the SpecMan peaks table.

5.9 Manual Peak Picking 

If you do not have SpecMan (contact Spectrum Research), you can use the following procedure to manually prepare the NMR data file required by NMR-SAMS.

First number the 1D 1H and 13C peaks, preferably from down-field to up-field (see Fig. 5.4). HMQC can be used to group multiplets and resolve overlapping peaks in the 1H spectrum. If two (or more) 1H peaks overlap completely, treat them as one degenerate peak. The 1D 13C peaks must be resolved (i.e., no peak degeneracy is allowed).  If necessary, split a degenerate 13C peak into two peaks with slightly different chemical shifts.  In the worst case, where parts of the spectra cannot be resolved due to multiple atoms with very similar chemical environments (e.g. multiple phenyl groups or a long methylene chain), the unresolved 13C (and 1H) peaks can be discarded. NMR-SAMS will then perform partial structure elucidation (PSE) based on the incomplete spectral data.

Figure 5.4. Schematic illustration of the manual preparation of NMR data input to NMR-SAMS from the original spectral plots.  The 1D 1H and 13C peaks are numbered and 2D cross peaks are picked as pairs of correlated 1D peaks.  Two COSY peaks, #2 and #3, which are suspected to be due to long-range coupling, are marked as weak by an intensity level of 1.  HMBC peak #2, which is suspected to be an artifact, is marked with a reliability of 0.4. The grid lines in the 2D spectra illustrate the intra- and inter-spectral alignments of the 1D resonances. For clarity, only COSY and HMBC are shown. See Section 5.4 for details about the format.

Picking of the 2D cross peaks is based on the numbered 1D peaks. The 2D cross peaks are located and assigned to their corresponding 1D peaks in each dimension. A cross peak that cannot be resolved can be assigned to more than two 1D peaks. If it is hard to tell whether a cross peak is an artifact or noise, use a probability smaller than 0.5 to designate it as an unreliable peak.  The interpretation of a COSY peak depends on its intensity level (i.e., J-coupling constant), so a potential long-range coupling must be marked with a “weak” intensity level (represented as 1).  Finally, the picked peaks can be listed in a text file in the format described in Appendix I.


Chapter 6

Spectral Interpretation

6.1 Overview

This chapter describes the steps involved in the interpretation of the molecular formula (MF) and the 1D and 2D NMR spectral data, and in the unification of the bond constraints derived from the NMR data.  First, the possible set(s) of structural building blocks are determined from the MF, 1H, 13C and HMQC spectral data.  Next, the remaining 2D spectral data are interpreted as bond constraints between the building blocks.  In the same step, the various bond constraints are unified into a homogeneous set of bond constraints, and an atom-atom connection matrix (ACMX) is set up to summarize the possibilities of bond formation between the building blocks.

The derivation of bond constraints from the different 2D NMR spectral data is illustrated schematically in Fig. 6.1. The general definition of a bond constraint (BC) is given in Section 3.4.

Figure 6.1.  Derivation of bond constraints from conventional 2D NMR experiments.  An INADEQUATE connectivity is interpreted as a C-C bond constraint (BC) of one bond, a COSY connectivity as an H-H BC of 2 to 5 bonds, an HMQC connectivity as a C-H BC of one bond, and an HMBC connectivity as a C-H BC of 2 or 3 bonds.  The various BCs are transformed into a unified set of C-C BCs based on the HMQC connectivities.

The spectral interpretation-related steps correspond to the first three options in the Analysis menu shown below:

6.2 Interpretation of MF, 1H, 13C and HMQC Data as Building Blocks

Command: Analysis /Building Blocks.

Description:  This procedure interprets the MF, 1H, 13C, and HMQC data, and generates all possible sets of building blocks for structure generation.

You are prompted to enter the MF when a new working data set is opened.  If you want to enter a different MF, choose File/Input Molecular Formula to enter a new one.  The MF can be unknown.  See Section 4.4 for details.

1H, 13C, and HMQC data are read from the .nmr file.  If MF is unknown, you must at least have 13C spectral data.  If the MF is known, and you have no NMR data, you can perform isomer enumeration. 

Parameters: None.

Results:  The results of interpretation of MF, 1H, 13C, and HMQC data are written into the .mdf file. The first set of the generated building blocks are displayed on the screen. In the next few sections the results of this procedure are described in detail.

6.2.1 Interpretation of Molecular Formula

See section 4.4 for description of the interpretation of MF. 

6.2.2 Interpretation of 1D 1H Data

The 1H peak list in the NMR data file is interpreted and written into the MDF as a record starting with the keyword “1DH1:”.  Following the keyword are the number of 1H peaks and the minimum and maximum numbers of heteroatom-attached protons.  The latter are currently not used, so they are always set as 0 - 0.  The second line is a brief description of the entries in the remaining lines.  Each of the subsequent lines includes the peak ID, the chemical shift, the minimum and maximum numbers of the corresponding protons, and the multiplicity of the 1H peak.  The minimum and maximum numbers of the corresponding protons are not currently used, so they are always kept as zeros.  Following is a transcript of such a record:

1DH1: num.peaks = 33, num.hete.Hs = 0-0

#Peak. Chem.shift (min. protons ~ Max. protons multiplicity )

# 1. 4.930(0~0 1)

# 2. 4.755(0~0 1)

# 3. 3.509(0~0 0)

# 4. 3.435(0~0 0)

# 5. 2.725(0~0 0)

# 6. 2.611(0~0 0)

# 7. 2.235(0~0 0)

      .

      .

      .

6.2.3 Interpretation of 1D 13C Data

The 13C peak list in the NMR data file is interpreted and written into the MDF as a record starting with the keyword “1DC13:”.  Following the keyword is the number of 13C peaks.  The second line is a brief description of the entries in the remaining lines. Each of the subsequent lines includes the peak ID, the chemical shift, and the minimum and maximum numbers of protons attached to the 13C peak.  If the multiplicity of a peak is unknown, a range of attached protons (i.e., 0 to 3) is assigned to the carbon.

Another record, starting with the keyword “SYMMETRY:”, describes molecular symmetry of the unknown molecule.  Currently this entry is either “No”, when the number of 13C peaks equals that of carbon atoms, or “PSE” for partial structure elucidation.

Following is a transcript of such records:

1DC13: num.peaks = 21

#Peak, Chem.shift, (Rng.of att.H, i.e., mult.-1)

# 1. 196.06(0~0)

# 2. 145.56(0~0)

# 3. 144.65(0~0)

# 4. 140.75(1~1)

# 5. 123.40(0~0)

# 6. 121.57(0~0)

# 7. 56.28(1~1)

# 8. 53.85(0~0)

      .

      .

      .

 

SYMMETRY: No     
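The peak lines of the “1DC13:” record can be read with a short script such as the one below. It is a sketch that assumes the line layout shown in the transcript above; the regular expression and the function name are hypothetical.

```python
import re
from typing import Tuple

# "# <peak ID>. <chemical shift>(<min attached H>~<max attached H>)"
C13_LINE_RE = re.compile(r"^#\s*(\d+)\.\s*(\d+(?:\.\d+)?)\((\d+)~(\d+)\)$")


def parse_c13_line(line: str) -> Tuple[int, float, int, int]:
    """Return (peak ID, chemical shift, min attached protons, max attached protons)."""
    m = C13_LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a 1DC13 peak line: {line!r}")
    return int(m.group(1)), float(m.group(2)), int(m.group(3)), int(m.group(4))


def multiplicity_known(min_h: int, max_h: int) -> bool:
    """A peak has a known multiplicity when the attached-proton range is a single value."""
    return min_h == max_h


peak = parse_c13_line("# 4. 140.75(1~1)")
```

A peak with unknown multiplicity would appear with the range 0~3, for which `multiplicity_known` returns `False`.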

6.2.4 Interpretation of HMQC/HETCOR Connectivities

Each HMQC/HETCOR connectivity in the NMR data file is interpreted as a C-H BC according to the following rules:  

1.      All connectivities are interpreted as a C-H BC of exactly one bond. 

2.      If a 1H peak is found to have no HMQC peak, you will be prompted (as shown below in the dialog box) to supply the type of heteroatom attached to it.  The program then automatically assigns a heteroatom to the proton and adds an X-H BC (X is the heteroatom) to the list of HMQC-derived C-H BCs.  The program first lists all of the 1H peaks without HMQC connectivities, together with the recommended assignment of heteroatoms.  For example:

If you agree with the H-X assignment, click Yes. Otherwise click No, and you will be prompted to assign heteroatoms to each of the 1H peaks. For example:

The current heteroatoms with attached 1H peaks are numbered and listed in the dialog box.  This is useful when you want to attach more than one 1H peak to the same heteroatom.  In such a case, you can type a heteroatom followed by its number in the list so that the current 1H peak will be attached to it.

If you are not sure which kind of heteroatom should be connected to the 1H peak, leave the text field empty or type ‘unknown’.  NMR-SAMS will not attach this proton to any heteroatom.  In such a case, any connectivity information relevant to this proton will be ignored during the subsequent analysis.

The results of interpretation of HMQC connectivities are written into the MDF as a record starting with the keyword “HMQC:”. Following the keyword is a comment, denoting the sequence of the correlated atoms in each bond constraint. Each of the rest of the lines is a C-H bond constraint. Following is a transcript of the record:

 

HMQC: (Node sequence: C-13, H-1)

(3 - 1: 1 ~ 1; 0)Q1

(3 - 2: 1 ~ 1; 0)Q2

(4 - 4: 1 ~ 1; 0)Q3

(6 - 33: 1 ~ 1; 0)Q4

      .

      .

      .

 

6.2.5  Generation of Building Blocks

If the MF is known, this procedure allocates the constituent protons to the heavy atoms based on the 13C multiplicities and chemical valences of the heavy atoms.  The generated building block sets must comply with the 13C multiplicities and the number of 1H peaks attached to the heteroatoms. Each heavy atom, with its attached protons and unsatisfied valence, is called a building block.  The unsatisfied valence is represented as free bonds.

If the MF is unknown, carbon building blocks are derived directly from the 13C peaks, with a certain or uncertain number of attached protons depending on whether the 13C multiplicity is known or unknown.  If some 1H peaks are attached to heteroatoms, heteroatom building blocks are also derived.  You can use the Analysis/User-Defined Building Blocks function to edit the building blocks.

The free bonds of different building blocks can be connected to form bonds, as illustrated in Fig. 6.2:

Figure 6.2. Examples of structural building blocks and bond formation between them.

 

The resulting building blocks are written in the MDF as a record starting with the keyword “FRAG_SET:”.  The following is a transcript of such a record:

FRAG_SET:

#1:   C  C  CH2 CH1 C  CH1 CH1 CH1 CH1 C 

      C  C  CH2 CH1 CH2 C  CH2 CH2 CH2 CH2

      CH3 CH2 CH2 CH2 CH3 CH2 CH3 CH3 CH3 CH3

      O  OH1 OH1
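As a quick consistency check, you can tally the atoms in a building block set and compare the totals with the MF. The sketch below assumes the token format shown in the transcript above (an element symbol optionally followed by H and a proton count); the function name and the regular expression are illustrative only.

```python
import re
from collections import Counter
from typing import Iterable

# Element symbol, optionally followed by "H<count>", e.g. "C", "CH2", "OH1".
BLOCK_RE = re.compile(r"^([A-Z][a-z]?)(?:H(\d+))?$")


def tally_atoms(blocks: Iterable[str]) -> Counter:
    """Count heavy atoms and attached protons over one building block set."""
    atoms = Counter()
    for block in blocks:
        m = BLOCK_RE.match(block)
        if m is None:
            # Blocks with an unknown proton count (e.g. "CH?") are rejected here.
            raise ValueError(f"cannot parse building block {block!r}")
        atoms[m.group(1)] += 1
        atoms["H"] += int(m.group(2) or 0)
    return atoms


record = """
#1:   C  C  CH2 CH1 C  CH1 CH1 CH1 CH1 C
      C  C  CH2 CH1 CH2 C  CH2 CH2 CH2 CH2
      CH3 CH2 CH2 CH2 CH3 CH2 CH3 CH3 CH3 CH3
      O  OH1 OH1
"""
blocks = record.split()[1:]   # drop the "#1:" set label
totals = tally_atoms(blocks)  # 30 C, 48 H, 3 O for the set above
```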

After the building blocks are generated, the first set of building blocks is displayed. If there are multiple building block sets, a Building Block Browser is displayed (as shown below), which allows you to browse through the building block sets by moving the slider.

Multiple sets of building blocks are generated when either of these conditions holds: some or all of the 13C multiplicities are unknown, or there are different kinds of heteroatoms with attached protons.  NMR-SAMS can use multiple sets of building blocks for structure generation, but it uses only the first one for target structure-based resonance assignment.  So wherever possible you are advised to delete the undesired ones.

To remove the building block set which is being displayed, click Delete in the Building Block Browser.  To select the displayed building block set as the only one for structure generation, click Select in the Building Block Browser, and the rest of the building block sets will be removed.

Note: In the case of a 13C peak with unknown multiplicity, NMR-SAMS will try to enumerate all possible numbers of attached protons for its corresponding building block if the MF and 13C spectral data are used.  If this is not possible (e.g. when the MF is unknown, or there are fewer 13C peaks than carbon atoms), NMR-SAMS will generate a building block with an unknown number of attached protons, such as ‘CH?’.  Such a building block is always ignored during the subsequent structure generation.

Possible Errors:

·        If no valid building block set is generated, you have to check the MF, 13C multiplicities, and the valence of the atoms.

·        The maximum number of building block sets is set to 500.  If it exceeds this number, the remaining ones are ignored.  In such a case, use 13C multiplicities to constrain the generation of building blocks.

6.3 User-Defined Building Blocks

Command: Analysis/User-Defined Building Blocks.

Description: Whether the MF is known or unknown, this option allows you to add, delete, or modify the building blocks.

To add a building block, select Add, and type the element symbol after Element. Select Ignored Atom if you want to ignore it in structure generation (see Section 7.1 for details regarding Ignored Atoms).  After Proton Count, select the correct number of attached protons.  If unknown, select Unknown.  The default valence will appear after Valence, although you can select a different one.  If you type “C” after Element, you will be able to check Assigned C-13 Shift and type a 13C chemical shift for it.  If the Proton Count is greater than zero, you will be able to check Assigned H-1 Shift, and type one or two 1H chemical shifts for the protons.  When entering multiple proton shifts, use a blank space as a delimiter. Then click on an empty place in the main graphics window, and a building block with the defined attributes will be added.

You can copy the attributes from an existing building block by clicking on that building block while keeping the Ctrl key pressed.

Note:  There are some limitations on the use of the Add function.  Newly added carbon building blocks will be ignored (i.e., not used for bond formation during the structure generation). Any building block that has an unknown number of attached protons will be ignored.  Finally, the chemical shifts of the added building blocks are for display only, i.e., they will not be evaluated during the subsequent analysis although they are always displayed.

To modify a building block, check Modify in the palette if it is not checked.  Next copy the attributes from that building block by clicking on it while pressing the Ctrl key.  Then change the corresponding attributes in the palette. Finally click on the building block again (without pressing the Ctrl key) and the building block will be modified accordingly.

Tip:  To change a non-ignored building block into an ignored one (or vice versa), you do not need to copy all attributes before modifying it.  Just set the option Ignored Atom as required, and click on that building block.  The first click only toggles the ‘Ignored Atom’ state, if the required value is different from the current state of the building block.  If you also want to change other attributes, click it again and all other attributes will be modified according to those specified.

Note that for a carbon building block derived from 13C data, you can modify only the Ignored Atom and Proton Count attributes.

To delete a building block, check Delete in the palette if it is not checked.  Then click on the building block you want to delete.

Note that you can not delete a carbon building block that was derived from a 13C peak. 

Results:  The modified building blocks are written in the MDF as a record starting with the keyword “FRAG_SET:”.  The original record is overwritten. 

 

6.4 Interpretation of 2D Spectral Data as Bond Constraints

Command: Analysis/Bond Constraints.

Description:  This procedure interprets the COSY, HMBC, NOESY, and INADEQUATE spectral data in the .nmr data file to define bond constraints.  Then the various bond constraints are unified, and the atom-atom connection matrix is set up for subsequent structure generation or resonance assignment.

Parameters: The relevant parameters for interpreting the 2D spectral data can be accessed from the dialog box shown below by choosing Edit/Parameters/NMR Interpretation. For explanation of the parameters, see Section Parameters for Spectral Interpretation in Appendix IV.

The relevant parameters for setting up the ACMX can be accessed from the dialog box shown below by choosing Edit/Parameters/Setting up ACMX. For explanation of the parameters, see Section Parameters for Setting Up ACMX in Appendix IV.

Results:  In the next few sections the results of this procedure are described for each type of spectral data. 

6.4.1 Interpretation of COSY Connectivities

The results of COSY interpretation are written into the MDF as a record starting with the keyword “COSY:”, which can be edited by choosing Edit/Master Data File.  Each COSY connectivity in the NMR data file is first classified as due to either potential long-range coupling or short-range coupling.  Based on that, a H-H BC is assigned to it.  The rules for this step are described below:  

1.      If the intensity level is weak (represented as “1”), it is treated as due to potential long-range coupling.

2.      If the intensity level is medium, strong (represented as “2” or “3”, respectively), or blank, it is treated as due to short-range coupling.

3.      If the intensity level is unknown (represented as “0”), then the J-coupling constant is used to classify short-range and long-range couplings.  If the J-coupling constant is also unknown (represented as 0.0), then an error message is displayed and the interpretation is aborted.  If the J-coupling constant is defined as J Hz, it is compared with the parameter COSY_J_CATEG (which is set as 3.0 by default).  All connectivities that have J ≤ COSY_J_CATEG are treated as due to potential long-range coupling, and the rest as short-range coupling.

4.      When a connectivity is classified as due to short-range coupling, and has a correlated singlet 1H peak, then NMR-SAMS prompts you to confirm whether it is due to long-range coupling.  If you click Yes, it is classified as a long-range coupling, otherwise (for selection No) it remains as a short-range coupling. 

5.      If you like, you can activate a check for possible long-range coupling based on the 1H chemical shift.  To do this, select Edit/Parameters/NMR Data Interpretation, and enter a suitable value (e.g. 4.5) after Minimum H-1 Shift for Checking Long-Range H-H Coupling.  This check is turned off by default (i.e., the value is set to 0).

6.      By default, all connectivities due to short-range coupling are interpreted as H-H BCs with 2 to 3 intervening bonds, and all connectivities due to long-range coupling are interpreted as H-H BCs with 3 to 5 intervening bonds.  The number of intervening bonds is controlled by the parameter COSY_BC.

7.      The bond types of the intervening bonds are always set as unknown (0).  The number of sub-bond constraints (NSBC) that must satisfy a BC, minNSBC  and  maxNSBC, are determined as follows:

minNSBC = 1 if P ≥ RELIAB_PEAK_PROB, or

minNSBC = 0 if P < RELIAB_PEAK_PROB, and

maxNSBC = n1 × n2,

where P is the reliability of the connectivity, and n1 and n2 are the number of correlated 1D peaks in each dimension, respectively.  The default value of the parameter, RELIAB_PEAK_PROB is set as 0.50.  For example, the following connectivity is due to an “unreliable” DQF-COSY peak since the reliability is 0.4: 

#8 (2 - 5 6) 3 0.00 0.4 ;unreliable, may be an artifact

So this connectivity is interpreted as the following H-H BC:

(2 - 5 6: 2 ~ 3; 0; 0 ~ 2)C8

which means that this BC is flexible enough to be considered as satisfied if none, one, or both of the proton pairs (i.e. H2-H5 and H2-H6) have a bond separation of two or three bonds in the generated structure.  

8.      If two 1H peaks are very close and no COSY peak is observed between them, you are alerted to check whether any near-diagonal peak has been neglected between them.  If you are not sure about this, the program allows you to add a "pseudo bond constraint" for this proton pair.  The tolerance for checking near-diagonal COSY peaks is controlled by a parameter called COSY_DIAG_RESO, and its default value is 0.02 ppm.  You can change this by selecting Edit/Parameters/NMR Interpretation. The pseudo BC is used here to prevent two atoms from being forbidden to connect while setting up the ACMX.
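The first three classification rules and the default bond ranges can be summarized in a short sketch. The interactive checks (singlet confirmation and the chemical shift check) are left out, and the function name and the dictionary layout are assumptions; the default values of COSY_J_CATEG and COSY_BC are those quoted in this section.

```python
from typing import Optional, Tuple

COSY_J_CATEG = 3.0                                # default threshold in Hz
COSY_BC = {"long": (3, 5), "short": (2, 3)}       # default ranges of intervening bonds


def classify_cosy(intensity: Optional[int], j_hz: float = 0.0,
                  j_categ: float = COSY_J_CATEG) -> str:
    """Classify a COSY connectivity as 'long' or 'short' range coupling."""
    if intensity == 1:                            # weak peak
        return "long"
    if intensity is None or intensity in (2, 3):  # blank, medium, or strong
        return "short"
    if intensity == 0:                            # unknown: fall back on J
        if j_hz == 0.0:
            raise ValueError("intensity level and J-coupling constant are both unknown")
        return "long" if j_hz <= j_categ else "short"
    raise ValueError(f"invalid intensity level: {intensity}")


def cosy_bond_range(intensity: Optional[int], j_hz: float = 0.0) -> Tuple[int, int]:
    """Return (minBond, maxBond) of the H-H BC derived from a COSY connectivity."""
    return COSY_BC[classify_cosy(intensity, j_hz)]
```

For example, a weak peak gives `cosy_bond_range(1) == (3, 5)`, while a peak of unknown intensity with J = 7.0 Hz gives `(2, 3)`.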

The results of COSY interpretation are written into the MDF as a record starting with the keyword “COSY:”. Following the keyword is a comment, denoting the parameters used for the interpretation. Each line thereafter is a H-H bond constraint. Following is a transcript of the record:

COSY: (COSY_BC = 3 5 2 3; COSY_DIAG_RESO = 0.020)

(1 - 2: 3 ~ 5; 0; 1 ~ 1)C1

(1 - 12: 3 ~ 5; 0; 1 ~ 1)C2

(2 - 12: 3 ~ 5; 0; 1 ~ 1)C3

(3 - 7 8: 2 ~ 3; 0; 1 ~ 2)C4

      .

      .

6.4.2 Interpretation of HMBC/COLOC Connectivities

Each HMBC/COLOC connectivity in the NMR data file is interpreted as a C-H BC according to the following rules:

1.      Each connectivity is interpreted as a C-H BC of a certain range of intervening bonds based on the intensity level of the peak and the relevant parameters.  

2.      The bond types of the intervening bonds are always set as unknown (0).  The number of sub-bond constraints (NSBC) that must satisfy a BC, minNSBC  and  maxNSBC, are determined as follows:

minNSBC = 1 if P ≥ RELIAB_PEAK_PROB, or

minNSBC = 0 if P < RELIAB_PEAK_PROB, and

maxNSBC = n1 × n2,

where P is the reliability of the connectivity, and n1 and n2 are the number of correlated 1D peaks in each dimension, respectively.   The default value of the parameter, RELIAB_PEAK_PROB is set as 0.50.  For example, the following connectivity is due to an “unreliable” HMBC peak because its reliability is 0.4:  

#3 (10 - 8) 3 0.00 0.4 ;very weak, may be an artifact

So this connectivity is interpreted as the following C-H BC:

(10 - 8: 2 ~ 3; 0; 0 ~ 1)B3

The last two numbers, 0 and 1, mean that the bond separation between C10 and H8 can either satisfy or violate this BC in the generated structure.
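The NSBC rule and the resulting bond constraint notation can be reproduced with a few lines of code. This is a minimal sketch: the function names and argument order are assumptions, while the rule itself and the default RELIAB_PEAK_PROB of 0.50 are taken from the text above.

```python
from typing import List, Tuple

RELIAB_PEAK_PROB = 0.50  # default reliability threshold


def nsbc_range(reliability: float, n1: int, n2: int,
               threshold: float = RELIAB_PEAK_PROB) -> Tuple[int, int]:
    """Return (minNSBC, maxNSBC) for a connectivity with n1 x n2 candidate atom pairs."""
    min_nsbc = 1 if reliability >= threshold else 0
    return min_nsbc, n1 * n2


def format_bc(first: List[int], second: List[int], min_bond: int, max_bond: int,
              bond_type: int, reliability: float, label: str) -> str:
    """Write a bond constraint in the notation used in the MDF transcripts."""
    min_nsbc, max_nsbc = nsbc_range(reliability, len(first), len(second))
    nodes = f"{' '.join(map(str, first))} - {' '.join(map(str, second))}"
    return f"({nodes}: {min_bond} ~ {max_bond}; {bond_type}; {min_nsbc} ~ {max_nsbc}){label}"


# The unreliable HMBC peak from the example above (reliability 0.4).
unreliable = format_bc([10], [8], 2, 3, 0, 0.4, "B3")
```

With a reliability at or above the threshold, the same connectivity would be written with `1 ~ 1` as the last two numbers, meaning the constraint must be satisfied.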

The results of interpretation of HMBC connectivities are written into the MDF as a record starting with the keyword “HMBC:”. Following the keyword is a comment, denoting the parameters used for interpretation and the sequence of the correlated atoms in each bond constraint. Each line thereafter is a C-H bond constraint. Following is a transcript of the record:

HMBC: (HMBC_BC = 2 3, Node sequence: C-13, H-1)

(1 - 6: 2 ~ 3; 0; 1 ~ 1)B1

(1 - 7 8: 2 ~ 3; 0; 1 ~ 2)B2

(1 - 13: 2 ~ 3; 0; 1 ~ 1)B3

(1 - 15: 2 ~ 3; 0; 1 ~ 1)B4

      .

      .

      .

6.4.3 Interpretation of NOESY Connectivities

A NOESY connectivity in the NMR data file is always interpreted as an H-H BC of 2 to 6 bonds. NOESY is useful to NMR-SAMS only when you opt to use the negative information of COSY together with NOESY.  For example, if neither a COSY nor a NOESY peak is observed between two carbon atoms, then this pair is forbidden to connect (see the usage of parameter IDEAL_COSY in Appendix IV).  In the current version of NMR-SAMS, the through-space NOESY correlations are not used as bond constraints during structure elucidation.

The results of interpretation of NOESY  connectivities are written into the MDF as a record starting with the keyword “NOESY:”. Following the keyword is a comment, denoting the parameters used for interpretation and sequence of the correlated atoms in each bond constraint. Each of the rest of the lines is a H-H bond constraint. Following is a transcript of the record:

NOESY: (NOESY_BC = 2 6 0, Node sequence: H-1, H-1)

(1 - 2: 2 ~ 6; 0; 1 ~ 1)N1

(1 - 3: 2 ~ 6; 0; 1 ~ 1)N2

(1 - 12: 2 ~ 6; 0; 1 ~ 1)N3

(2 - 12: 2 ~ 6; 0; 1 ~ 1)N4

(3 - 7 8: 2 ~ 6; 0; 1 ~ 2)N5

      .

      .

      .

6.4.4  Interpretation of INADEQUATE Connectivities

Each INADEQUATE connectivity in the NMR data file is interpreted as a C-C BC according to the following rules:  

1.      Each connectivity is interpreted as a C-C BC of one intervening bond by default.  The number of intervening bonds is controlled by the first two values of the parameter INAD_BC.

2.      The bond type is controlled by the third value of the parameter INAD_BC, and by default is defined as unspecified (i.e., unknown).  This can be changed to single, double, or triple.  For example, if an INADEQUATE experiment is optimized to manifest only single C-C bond, you can set the third value of INAD_BC as 1, so that all of the connectivities are interpreted as C-C single bonds.  This will improve the efficiency of the structure generation since NMR-SAMS will not consider the other possibilities of these bonds.

3.      The number of sub-bond constraints (NSBC) that must satisfy a BC, minNSBC  and  maxNSBC, are determined as follows:

minNSBC = 1 if P ≥ RELIAB_PEAK_PROB, or

minNSBC = 0 if P < RELIAB_PEAK_PROB, and

maxNSBC = n1 × n2,

where P is the reliability of the connectivity, and n1 and n2 are the number of correlated 1D peaks in each dimension, respectively.   The default value of the parameter RELIAB_PEAK_PROB is set as 0.50.  For example, the following connectivity is due to an “unreliable” INADEQUATE peak since its reliability is set as 0.4: 

#18 (9 10 - 28) 3 0.0 0.4 ;C9 and C10 too close to resolve

This connectivity is interpreted as the following C-C BC:

(9 10 - 28: 1 ~ 1; 0; 0 ~ 2)I18

which means that this BC is flexible enough to be considered as satisfied if either none, one, or both of carbon pairs (i.e. C9-C28 and C10-C28) have a bond separation of one bond in the generated structure.     

The results are written into the MDF as a record starting with the keyword “INADEQUATE:”. Following the keyword is a comment, denoting the parameters used for interpretation. Each line thereafter is a C-C bond constraint. Following is a transcript of the record:

INADEQUATE: (INAD_BC = 1 1 0)

(2 - 1: 1 ~ 1; 0; 1 ~ 1)I1

(4 - 3: 1 ~ 1; 0; 1 ~ 1)I2

(5 - 4: 1 ~ 1; 0; 1 ~ 1)I3

(6 - 5: 1 ~ 1; 0; 1 ~ 1)I4

      .

      .

      .

6.4.5 Transformation of Bond Constraints

After the various 2D spectral data are interpreted as bond constraints, this procedure transforms the different kinds of BCs into a homogeneous set of C-C (or heteroatom) BCs based on the HMQC-derived C-H BCs. The following rules are used:

1.      An INADEQUATE-derived C-C BC remains unchanged.

2.      The correlated 1H peaks in a DQF-COSY-derived H-H BC are replaced by their correlated 13C peaks in HMQC, and the bond separation is reduced by 2.

3.      The correlated 1H peak(s) in an HMBC-derived C-H BC are replaced by their correlated 13C peaks in HMQC, and the bond separation is reduced by 1.

4.      The correlated 1H peaks in a NOESY-derived H-H BC are replaced by their correlated 13C peaks in HMQC, and the bond separation is reduced by 2.

5.      If a degenerate 1H peak has multiple correlated 13C peaks, pseudo C-C BCs are added between these 13C peaks.  The pseudo BC is used to prevent the two atoms from being forbidden to connect while setting up the ACMX.

Note: A degenerate 1H peak has multiple correlated 13C peaks in HMQC unless the degeneracy arises from geminal protons. If a BC involves such a 1H peak, all of its correlated 13C peaks are included in the resulting C-C BC, which introduces additional ambiguity.  NMR-SAMS can still use such ambiguous BCs for structure generation.

6.      The sources of the relevant BCs are included as comments in the resulting C-C BC so that you can keep track of the various connectivities from which a C-C BC is derived.

Fig. 6.3 illustrates the transformation of an ambiguous COSY BC into C-C BC.  The ambiguity arises from the overlapping peaks of H8 and H9.

Figure 6.3 Illustration of the transformation of a DQF-COSY-derived H-H BC into a C-C BC based on the relevant HMQC connectivities.  The two protons in the circle cannot be resolved in the DQF-COSY spectrum, thus introducing ambiguity in the resultant C-C BC.  For details about the format of the bond constraints, please refer to Section 3.4.
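The transformation of an H-H BC (rules 2 and 4) can be sketched as follows. This is a simplified illustration under stated assumptions, not the NMR-SAMS implementation: the HMQC connectivities are represented as a plain dictionary from 1H peak number to correlated 13C peak numbers, the peak numbers are invented, and the name hh_bc_to_cc is hypothetical.

```python
def hh_bc_to_cc(h_side_a, h_side_b, hmqc, min_bond, max_bond):
    """Turn an H-H BC into a C-C BC using HMQC connectivities.

    h_side_a, h_side_b : 1H peak numbers on each side of the BC
    hmqc               : dict mapping a 1H peak number to the list of
                         13C peak numbers correlated with it in HMQC
    Returns (carbons_a, carbons_b, min_bond - 2, max_bond - 2).
    """
    def carbons(h_peaks):
        # A degenerate 1H peak contributes all of its correlated carbons,
        # which is where the extra ambiguity comes from.
        return sorted({c for h in h_peaks for c in hmqc[h]})

    # Each proton is one bond away from its carbon, so the C-C
    # separation is two bonds shorter than the H-H separation.
    return carbons(h_side_a), carbons(h_side_b), min_bond - 2, max_bond - 2


# Hypothetical data: 1H peak 2 is an unresolved overlap of two protons
# attached to the carbons of 13C peaks 8 and 9.
hmqc = {1: [7], 2: [8, 9]}
print(hh_bc_to_cc([1], [2], hmqc, 3, 3))  # ([7], [8, 9], 1, 1)
```

An HMBC-derived C-H BC (rule 3) would be handled the same way, except that only the 1H side is replaced and the separation is reduced by 1.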

All resultant C-C BCs are cross-checked for mutual consistency.  If two BCs have the same relevant nodes, they are merged according to the following rules:

·        If all entries are identical except the source, their sources are merged.

·        If the ranges of bond separation (minBond to maxBond) differ and they overlap, the intersection of the two ranges is adopted. Otherwise, NMR-SAMS prompts you to supply a valid minBond and maxBond.  For example, if one BC requires a bond separation of 1 to 3 bonds and the other requires 1 to 1 bond, the intersection, 1 to 1 bond (i.e., exactly one bond), is adopted for the merged BC.  If instead one BC requires a bond separation of 2 to 3 bonds and the other requires 1 to 1 bond, no intersection is possible, and the message shown below prompts you to enter the proper bond separation.

In this example, type “1 1” if  you are sure  it is a vicinal coupling, or “1 3” if you are not.

·        Similar to bond separation, if the ranges of NSBC, minNSBC and maxNSBC, are different, the intersection of the two ranges is adopted whenever an intersection is possible. Otherwise you will be prompted with a similar message as above to supply a valid range for minNSBC and maxNSBC.

·        If the bond types are different, NMR-SAMS adopts the higher bond order (the order of priority is: triple, double, single, and unknown).

Note:   Most of the BCs can be combined with other BCs (e.g., a COSY BC with an HMBC one) except  NOESY BCs,  which are treated differently.  NOESY BCs can be combined only with other NOESY BCs concerning the same 13C signals.
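The range and bond-type rules above amount to a range intersection and a maximum. The following sketch illustrates them; it is not the NMR-SAMS code. The function names are hypothetical, and the numeric bond-type codes 2 (double) and 3 (triple) are assumed by analogy with 0 (unspecified) and 1 (single).

```python
def intersect(range_a, range_b):
    """Intersect two inclusive (min, max) ranges.

    Returns the common range, or None when the ranges do not overlap
    (the case in which NMR-SAMS asks you for a valid range).
    """
    lo = max(range_a[0], range_b[0])
    hi = min(range_a[1], range_b[1])
    return (lo, hi) if lo <= hi else None


def merge_bond_type(type_a, type_b):
    """Adopt the higher bond order: triple > double > single > unknown.

    With the assumed codes 0 = unknown, 1 = single, 2 = double,
    3 = triple, this is simply the larger of the two codes.
    """
    return max(type_a, type_b)


print(intersect((1, 3), (1, 1)))  # (1, 1)
print(intersect((2, 3), (1, 1)))  # None
print(merge_bond_type(0, 2))      # 2
```

The same intersect function applies to both the bond separation range and the NSBC range.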

Results: The results are written into the MDF as a record starting with the keyword “C13~~C13:”. Following the keyword are some comments which are internally used by the program (Note: you must not change these comments). Every line thereafter represents a C-C bond constraint.  For details regarding the format of bond constraints, see Section 3.4.  Following is a transcript of the record:

 

C13~~C13: COSY-Y, NOESY-Y, HMBC-Y, INAD-N (Node sequence: C-13, C-13)

(3 - 25: 1 ~ 2; 0; 1 ~ 1)C2Q1Q27C3Q2Q27B13Q27B114Q1B115Q2

(9 - 15 19: 1 ~ 1; 0; 1 ~ 2)C4Q7Q11Q17B46Q11Q17

(9 - 8: 1 ~ 1; 0; 1 ~ 1)C5Q7Q6B39Q7B48Q6

      .

      .

      .

 

Tips: Running NMR-SAMS and SpecMan side-by-side provides a convenient way to inspect the original cross peaks when a bond constraint is mentioned in a dialog box, or when you are editing the bond constraints in the MDF.  Fig. 6.4 illustrates how to keep track of the cross peaks from which a bond constraint is derived.

Figure 6.4  Schematics representing the way to keep track of the cross peaks from which a bond constraint (BC) was derived.  Run NMR-SAMS and SpecMan side-by-side. From the comment field of the BC (which you are verifying), find the code of connectivities from which the BC was derived (“C3+66”, “Q18”, and “Q28” in this example). This means that this BC was derived from COSY peaks #3 and #66, and HMQC peaks #18 and #28. With SpecMan, load the COSY peaks table and then click the IDs of one of these cross peaks.  Upon clicking the IDs, SpecMan displays the cross peaks in the 2D spectral window.
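When many BCs have to be checked, the source codes in the comment field can be split programmatically. The sketch below assumes only what the Fig. 6.4 example shows: a one-letter code followed by one or more peak numbers joined with "+". It does not interpret the letters, and the name parse_sources is hypothetical.

```python
import re


def parse_sources(comment):
    """Split a BC comment such as 'C3+66Q18Q28' into (code, [peak IDs])."""
    pattern = r"([A-Z])(\d+(?:\+\d+)*)"
    return [
        (code, [int(n) for n in ids.split("+")])
        for code, ids in re.findall(pattern, comment)
    ]


print(parse_sources("C3+66Q18Q28"))
# [('C', [3, 66]), ('Q', [18]), ('Q', [28])]
```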

 

6.4.6 Setting up Atom-Atom Connection Matrix (ACMX)

NMR-SAMS uses an atom-atom connection matrix (ACMX, also known as a free bond connection matrix) to represent the bonding possibilities between the constituent heavy atoms of the unknown molecule.  After you select Analysis/Bond Constraints, Analysis/User-Defined Bond Constraints, or Analysis/User-Defined Environment Constraints, NMR-SAMS tries to generate an ACMX for each building block set based on the available building blocks, bond constraints, and environment constraints.  By default, the unambiguous bond constraints (which define one bond between exactly two atoms) are treated as fixed bonds, and the rest are used as constraints during the subsequent structure generation.
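The default distinction between fixed bonds and remaining constraints can be expressed as a simple test on a BC. This is a sketch of one reading of "unambiguous", not the program's actual criterion: it assumes that a BC qualifies when each side names exactly one atom, the bond separation is exactly one, and at least one sub-bond constraint must be satisfied. The function name is hypothetical.

```python
def is_fixed_bond_candidate(nodes_a, nodes_b, min_bond, max_bond, min_nsbc):
    """Return True if a BC defines one bond between exactly two atoms."""
    one_atom_each_side = len(nodes_a) == 1 and len(nodes_b) == 1
    exactly_one_bond = min_bond == 1 and max_bond == 1
    # Assumption: a BC that may remain unsatisfied (minNSBC = 0)
    # is not treated as a fixed bond.
    must_hold = min_nsbc >= 1
    return one_atom_each_side and exactly_one_bond and must_hold


print(is_fixed_bond_candidate((2,), (1,), 1, 1, 1))      # True
print(is_fixed_bond_candidate((9, 10), (28,), 1, 1, 0))  # False
```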

If there is only one set of building blocks, NMR-SAMS automatically forms some common functional groups based on 13C chemical shifts and elemental composition while setting up the ACMX.  These functional groups include >C=O, -COO-, -COOH, -COON<, -COONH-, -NO2, -OSO3Hn (n = 0 or 1), and -OPO3Hn (n = 0, 1, or 2).  Sometimes these automatically added functional groups are not reliable, so you are advised to check and modify them if necessary (see Section 7.2).

Results: For each building block set, a record starting with the keyword “ACMX: #x:” (where x is the sequential number of the ACMX) is written in the MDF.  The following is a transcript of such a record:

 

ACMX: #1:

(HETCON_FLAG = 0, CCBOND_FLAG = 1 1 1,  BC_WEIGHT = 48,

IDEAL_COSY = 1, H1MULT_FLAG = 1, MAX_GEN_ANBC = 3, FIX_BOND_FLAG = 1)

# 1. 6 0  0  1  1    1  0 2 1  0 1 0    3 31 31 32      0

# 2. 6 0  0  2  2    4  0 2 0  0 1 0    0               0

# 3. 6 2  0  3  3    2  0 2 0  0 1 0    0               0

       .

       .

       .

 

After the ACMX is set up, the first building block set is displayed along with the fixed bonds, if any.  If there are multiple ACMXs, a Building Block Browser is displayed.  This browser enables browsing through the building block sets.  By default, atoms with satisfied valences are displayed in gray, and the ones with free bonds are displayed in blue and marked by an asterisk (“*”).  Bonds of unspecified type are displayed as dashed lines. You can select Display/Display Options/Show Disconnectivities to highlight the atoms that cannot be connected to a certain atom when you click it. You can also select Display/Display Options/Connection Table to display a Connection Table.  The Connection Table lists building blocks, their associated chemical shifts, and the current bond constraints and environment constraints (see Chapter 10).

The ACMXs are not displayed but can be viewed in the MDF by using Edit/Master Data File.

Possible Errors: Depending on the situation, the following potential error messages appear during the setup of ACMX:

·        Too many fixed bonds for a certain atom.  This means that either a long-range coupled COSY peak was mistakenly interpreted as a vicinal one, or the valence of this atom was set incorrectly.  In the former case, mark the long-range COSY connectivities in the .nmr file (see Section 6.4.1) and choose Analysis/Bond Constraints again.  In the latter case, modify the valence of this atom according to Section 4.4.

·        Too many double bonds for a certain atom.  The minimum and maximum number of attached double bonds of each atom are determined during the interpretation of the MF (see Section 4.4). If this happens, you can modify the corresponding entries and repeat this step.

·        Too many triple bonds for a certain atom. The minimum and maximum number of attached triple bonds of each atom are determined during the interpretation of the MF (see Section 4.4).  If this happens, you can modify the corresponding entries and repeat this step.

·        Too many free bonds.  The number of free bonds, n_free_bond, can be calculated as follows:

n_free_bond = Σvalence - ΣH - 2 × Σfixed_bond

where Σvalence, ΣH, and Σfixed_bond are, respectively, the sums of valences of the heavy atoms, the constituent protons, and the fixed bonds (double and triple bonds multiplied by 2 and 3 respectively).  n_free_bond is one of the major factors that determine the complexity of the structure generation problem.  The current upper limit of the free bonds is 220.  If n_free_bond exceeds this limit, you can manually add some known bonds in a record starting with the keyword “ATOM~~ATOM:” in the MDF to reduce the free bonds (see Section 7.2).
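To estimate n_free_bond before running the analysis, the calculation can be reproduced by hand or with a short script. The sketch below follows the definition given above; the function and argument names are hypothetical, and fixed bonds are passed as a list of bond orders (1, 2, or 3).

```python
MAX_FREE_BONDS = 220  # upper limit quoted in this section


def n_free_bond(valences, n_protons, fixed_bond_orders):
    """Number of free bonds left after accounting for H and fixed bonds.

    valences          : valence of every heavy atom
    n_protons         : total number of constituent protons
    fixed_bond_orders : order (1, 2, or 3) of every fixed bond
    """
    # Every fixed bond of order k uses k valences on each of its two atoms.
    return sum(valences) - n_protons - 2 * sum(fixed_bond_orders)


# Two carbons with six protons in total and no fixed bond yet:
print(n_free_bond([4, 4], 6, []))   # 2
# Fixing the single C-C bond uses up both free bonds:
print(n_free_bond([4, 4], 6, [1]))  # 0
```

Comparing the result with MAX_FREE_BONDS shows in advance whether known bonds need to be added to the MDF.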

 


Chapter 7

2D Structure Generation

7.1 Overview

This chapter describes the 2D structure generation of NMR-SAMS.  The structure generation of NMR-SAMS starts from an ACMX described in the previous chapter.  Usually, before structure generation, you add some known bonds, edit the fixed bonds derived by the program, add some environment constraints, and check the parameters for structure generation.  Next the structure generator of NMR-SAMS assembles the building blocks into complete structures that are compatible with all available spectral and chemical constraints. 

The structure generation is based on heteroatoms and the carbon atoms labeled by 13C chemical shifts. Depending on the number of observed 13C peaks, you can either perform complete structure elucidation or partial structure elucidation.  In some cases, such as a symmetric molecule or when the 13C spectrum shows severe overlap, partial structure elucidation is performed based on the limited carbon atoms labeled by the well-resolved 13C chemical shifts, as well as the constituent heteroatoms.  The remaining carbon atoms, called ignored atoms