Spectrum Research, LLC.

NMR-SAMS User’s Guide

 

An expert system for computer-assisted structure elucidation of organic and natural product compounds based on multidimensional spectroscopy

NMR-SAMS User’s Guide, Version 2.4

 

This manual describes release 2.4 of the Windows 95/98/2000/NT4.x version of NMR-SAMS™.

 

Copyright Notice

Copyright © 1996 through 2001, Spectrum Research, LLC.  All rights reserved.

No part of this document may be reproduced, transmitted, transcribed, stored in a retrieval system, or translated into any language in any form by any means without the written permission of Spectrum Research, LLC.  

 

All possible care has been taken in the preparation of this document but Spectrum Research accepts no liability for any errors/omissions that may be found.

 

Spectrum Research, LLC. reserves the right to change the information in this document without prior notice.

 

Trademarks

SpecMan™ and NMR-SAMS™ are trademarks of Spectrum Research, LLC.

 

Acknowledgments

NMR-SAMS™ (originally known as CISOC-SES) was developed by Dr. Shengang Yuan, Dr. Chen Peng and Prof. Chongzhi Zheng at the Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, P.R. China, from 1988 to 1994.  It was further improved by Dr. Chen Peng in the group of Dr. Geoffrey Bodenhausen at the National High Magnetic Field Laboratory in 1995-1996.  Portions of NMR-SAMS™ are copyright © 1988 through 1995, Shanghai Institute of Organic Chemistry and Florida State University, and are exclusively licensed to Spectrum Research, LLC.  Title and full ownership rights to the converted/modified NMR-SAMS™ will remain solely with Spectrum Research, LLC, and NMR-SAMS™ is asserted to be Spectrum Research’s proprietary information and trade secret.

 

Credits

If the results (figures and/or data) obtained by NMR-SAMS™ are used for publication purposes, please refer to NMR-SAMS™ in the following manner or any other equivalent form:

"NMR-SAMS™ software, developed by Spectrum Research, LLC., was used to compute the results in this publication."

 

 

 


Table of Contents

Abbreviations And Acronyms

Introduction
1.1 General
1.2 Application Limitations
1.3 System Requirement
1.4 Help Facility
1.5 Typographical Conventions
1.6 A Note on Operating Systems

Getting Started with NMR-SAMS
2.1 Installation of the Program
2.2 Spectrum Research Licensing
2.3 Starting NMR-SAMS
2.4 Brief Introduction to Microsoft Windows
2.5 Description of the Main Menus
2.6 The NMR-SAMS Toolbar

Understanding NMR-SAMS
3.1 Overview
3.2 General Procedure of Structure Elucidation with NMR-SAMS
3.3 What Spectral Data Does NMR-SAMS Use?
3.4 Use of 2D NMR Connectivities: Bond Constraints
3.5 Use of Chemical Shifts And Peak Multiplicities
3.6 Structure Generation
3.7 User Intervention
3.8 Control Parameters

Working Data Set
4.1 Overview
4.2 Open An Existing Working Data Set
4.3 Opening A New Working Data Set
4.4 Input Molecular Formula
4.5 Save A Working Data Set
4.6 Save A Working Data Set as Different Name
4.7 Exiting NMR-SAMS

Input of NMR Spectral Data
5.1 Overview
5.2 Conversion of SpecMan 1H Peak List
5.3 Conversion of SpecMan 13C Peak List
5.4 Conversion of SpecMan DQF-COSY Peaks Table
5.5 Conversion of SpecMan HMQC/HETCOR Peaks Table
5.6 Conversion of SpecMan HMBC/COLOC Peaks Table
5.7 Conversion of SpecMan NOESY Peaks Table
5.8 Conversion of SpecMan INADEQUATE Data
5.9 Manual Peak Picking

Spectral Interpretation
6.1 Overview
6.2 Interpretation of MF, 1H, 13C and HMQC Data as Building Blocks
6.2.1.  Interpretation of Molecular Formula
6.2.2.  Interpretation of 1D 1H Data
6.2.3.  Interpretation of 1D 13C Data
6.2.4.  Interpretation of HMQC/HETCOR Connectivities
6.2.5.  Generation of Building Blocks
6.3 User-Defined Building Blocks
6.4 Interpretation of 2D Spectral Data as Bond Constraints
6.4.1.  Interpretation of COSY Connectivities
6.4.2.  Interpretation of HMBC/COLOC Connectivities
6.4.3.  Interpretation of NOESY Connectivities
6.4.4.  Interpretation of INADEQUATE Connectivities
6.4.5.  Transformation of Bond Constraints
6.4.6.  Setting up Atom-Atom Connection Matrix (ACMX)

2D Structure Generation
7.1 Overview
7.2 User-Defined Bond Constraints
7.2.1.  Interactive Structure Generation
7.3 User-Defined Atom Environment Constraints
7.4 Structure Generation

Resonance Assignment
8.1 Overview
8.2 Input of the Target Structure
8.2.1.  Building a Target Structure in NMR-SAMS
8.2.2.  Importing a Target Structure
8.2.3.  Setting up the Assignment Matrix
8.3 User-Defined Resonance Assignment
8.4 Resonance Assignment

Quick Enumeration/Elucidation
9.1 Overview
9.2 MF-Based Structure Generation of Virtual Compounds
9.3 Quick Structure Elucidation

Graphical Display of Results
10.1 Overview
10.2 Display of Structural Building Blocks
10.3 Display of Target Structure
10.4 Display of Generated Structures/Assignments
10.5 Status Window
10.6 Display Options
10.7 Editing the Display of Generated Structures

Exporting Results
11.1 Overview
11.2 Exporting NMR Spectral Data
11.3 Exporting Resonance Assignment
11.4 Exporting Candidate or Target Structures

NMR Data File
1D Spectral Data
2D Spectral Data

Master Data File

CCSS-13C Chemical Shift Range Correlation Table

Control Parameters
Parameters for Spectral Interpretation
Parameters for Setting up ACMX
Parameters for Structure Generation

References

Index


Abbreviations And Acronyms

δ13C                         13C chemical shift.

δ1H                          1H chemical shift.

1D                           One-dimensional.

2D                           Two-dimensional.

ACMX                   Atom-atom Connection MatriX, which summarizes the bond-formation probabilities between the constituent atoms of an unknown.

BB                           Structural Building Blocks for structure generation, e.g.,  CH3-, CH2-, and -OH.

BC                           Bond Constraint derived from 2D NMR spectral data, which defines the number of intervening bonds between the correlated spins.

CCSS                      Carbon-Centered Single-spherical Substructure.

COLOC                  COrrelation via Long-range Coupling, a kind of 2D spectrum that provides 2-to-3-bond 13C-1H connectivities.   

COSY                     COrrelated SpectroscopY, a kind of 2D spectrum that provides 1H-1H through-bond connectivities.

CPU                        Central Processing Unit.

DEPT                      Distortionless Enhancement by Polarization Transfer, a kind of 1D spectrum that provides information concerning the number of attached protons on each carbon atom.

EC                           Environment Constraint, limitation on the neighboring types of atoms attached to a central atom specified by the user. 

HETCOR                HETeronuclear Correlation, also called C-H COSY, a kind of 2D spectrum that provides one-bond 13C-1H connectivity information.  

HMBC                    Heteronuclear Multi-Bond Connectivity, a kind of 2D spectrum that provides 2-to-3-bond 13C-1H connectivity information.

HMQC                   Heteronuclear Multiple Quantum Coherence, a kind of 2D spectrum that provides one-bond 13C-1H connectivity information.

INADEQUATE     Incredible Natural Abundance Double Quantum Transfer Experiment, a kind of 2D spectrum that provides one-bond 13C-13C connectivity information.

MDF                       The Master Data File produced while using NMR-SAMS for structure elucidation. This file stores the intermediate and final results produced during the execution of NMR-SAMS.

MF                          Molecular formula or empirical formula of a molecule, which is usually derived from mass spectral data.

NMR                      Nuclear Magnetic Resonance.

NOESY                   Nuclear Overhauser enhancement and Exchange SpectroscopY, a kind of 2D spectrum that provides 1H-1H through-space connectivity information.

NSBC                     Number of “Sub-bond constraint(s)”, or pair(s) of relevant atoms, that must satisfy a bond constraint in the generated structure.

PSE                         Partial Structure Elucidation.  Structure elucidation based on the information available from a portion of the spectral data, which is usually the well-resolved part.


Chapter 1

Introduction

1.1 General

NMR-SAMS (NMR Spectral Assignment Made Simple) is an expert system for computer-assisted structure elucidation of unknown organic or natural product compounds.  It draws on multidimensional spectroscopy (e.g., MS, NMR, IR and UV), in which each technique provides complementary information about a compound, and relies in particular on routine 1D and 2D NMR spectroscopy.  Together with SpecMan, it serves as a chemist’s workbench for de novo structure elucidation of small molecules such as organic compounds, natural products, peptides, and other small biomolecules.  NMR-SAMS is also used for automated resonance assignment of known compounds.

 

The basic strategy of structure elucidation using NMR-SAMS is illustrated in Fig. 1.1.  When dealing with an unknown compound, the molecular formula (MF) must first be determined by mass spectrometry or another approach.  Next, the 1D and 2D NMR chemical shifts, multiplicities, J-couplings and intensities are extracted with SpecMan from the processed 1D and 2D spectra (transformed through conventional FFT or non-FFT techniques).  The resulting peak lists are imported into NMR-SAMS and interpreted as structural building blocks and bond constraints based on one-bond, two-bond and other long-range connectivities.  Finally, the building blocks, NMR-derived bond constraints, and other user-defined bond constraints are used to generate plausible candidate structures with resonance assignments.  If the structure is already known, the user can specify the proposed structure and let NMR-SAMS complete the resonance assignments directly.

 

Figure 1.1. Data flow diagram of NMR-SAMS representing the different phases of spectral interpretation, structure generation and resonance assignment.  Gray boxes represent optional input data.  PSE means partial structure elucidation based on incomplete spectral data.  A bond constraint is represented as n intervening bonds, (B)n, between the correlated atoms.

NMR-SAMS has the following main features:

 

·         Input of peak tables with chemical shifts, multiplicities, J-couplings and intensities, from a variety of 1D and 2D NMR experiments.

·         Automated interpretation, bookkeeping, and crosschecking of spectral data with respect to the molecular formula.

·         Novel representation of 2D NMR correlation information based on the concept of a chromatic graph.

·         Structure determination and identification of unknown compounds based on complete utilization of 2D NMR correlation information and complementary spectral information from MS, UV and IR spectral data. 

·         Partial structure elucidation of compounds based on incomplete spectral data.  

·         Graphical tools for interactive building and editing of molecular fragments, and for defining bond constraints and atom environment constraints. Graphical tools to display and browse through candidate structures and sub-structures.  Graphical interaction between structures and bond constraints.

·         Background information-independent structure elucidation, which minimizes the potential human bias introduced into the structure elucidation process.

·         Fast structure generation of complex molecules when sufficient constraints are available. 

·         Fast resonance assignment and structure verification of large complex molecules based on proposed structures.  

·         Automated resonance assignment based on assigned resonances of compounds.

·        Flexible format for report generation of the results of spectral and structural analysis. 

1.2 Application Limitations

The current version of NMR-SAMS can only handle molecules that have fewer than 128 non-hydrogen atoms.  The total number of free bonds (unsatisfied valences) of the structural building blocks before structure generation, which determines the complexity of the structure generation problem, must not exceed 220.  The total number of free bonds is equal to the sum of the valences of the heavy atoms, less the number of protons, less twice the number of known bonds.  The maximum number of peaks is limited to 200 in a 1D spectrum and 1000 in a 2D spectrum.  The maximum number of bond constraints is limited to 1000.
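As a rough illustration of the free-bond rule above, the following Python sketch computes the free-bond count for a set of building blocks and checks it against the stated limits.  It is not part of NMR-SAMS: the function names and the small valence table are assumptions made for this example (NMR-SAMS takes element properties from its own periodic_tab.def file), and each known bond is counted as one single-bond equivalent.

```python
# Hypothetical valence table for illustration only; NMR-SAMS reads element
# properties from periodic_tab.def, whose format is not reproduced here.
VALENCES = {"C": 4, "N": 3, "O": 2, "S": 2, "F": 1, "Cl": 1, "Br": 1, "I": 1}

MAX_HEAVY_ATOMS = 127  # "fewer than 128 non-hydrogen atoms"
MAX_FREE_BONDS = 220   # limit on unsatisfied valences before generation


def free_bonds(heavy_atoms, n_protons, n_known_bonds=0):
    """Return the number of free bonds (unsatisfied valences).

    heavy_atoms maps an element symbol to its count, e.g. {"C": 10, "O": 1}.
    Assumption: every known bond is counted as one single-bond equivalent,
    so it consumes two valences (one at each end).
    """
    valence_sum = sum(VALENCES[el] * n for el, n in heavy_atoms.items())
    return valence_sum - n_protons - 2 * n_known_bonds


def within_limits(heavy_atoms, n_protons, n_known_bonds=0):
    """Check the heavy-atom and free-bond limits quoted in this section."""
    n_heavy = sum(heavy_atoms.values())
    n_free = free_bonds(heavy_atoms, n_protons, n_known_bonds)
    return n_heavy <= MAX_HEAVY_ATOMS and n_free <= MAX_FREE_BONDS


# Example: C10H16O with no bonds known in advance.
# Valence sum = 10*4 + 1*2 = 42; 42 - 16 protons = 26 free bonds.
print(free_bonds({"C": 10, "O": 1}, 16))
```

Each bond fixed in advance, for example through a user-defined substructure, lowers the count by two, which is consistent with the advice later in this section to supply known substructures for large molecules.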

 

Most of the previously proposed CASE (computer-assisted structure elucidation) systems either use a chemical shift-substructure correlation database or a more concise chemical shift-substructure correlation model, and rely to a large extent on the knowledge of a human expert.  Such systems have been limited to very simple and small molecules.  NMR-SAMS has demonstrated the impact of using 2D NMR correlation information on improving the efficiency of CASE systems when dealing with real-world complex molecules.  For efficient structure elucidation of unknown compounds, NMR-SAMS requires the molecular formula together with 1D 1H, 13C, DEPT (or APT), and 2D DQF-COSY, HMQC (or HETCOR), HMBC (or COLOC, FLOCK), and INADEQUATE spectral data.  The molecular formula may or may not be known accurately from MS or other methods; if it is unknown, NMR-SAMS estimates it from the number of observed carbon and proton peaks along with any available heteroatom information.  It is not mandatory to have all of these experimental NMR data sets available, because NMR-SAMS can also solve structure elucidation problems with different combinations of experimental data (for details refer to Section 3.3).  Structure elucidation based on 1D 13C chemical shifts alone is only possible for very simple molecules, and is not practical for complex molecules.  NMR-SAMS cannot elucidate unknown structures based solely on 1D 1H chemical shifts.

 

Although most spectra used by NMR-SAMS, e.g., 1D 1H, 2D DQF-COSY and HMBC, are allowed to have peak degeneracy, the 1D 13C spectrum and HMQC (or HETCOR) must be completely resolved for complete structure elucidation.  If severe overlap prevents resolving all of the 13C peaks, NMR-SAMS will use only the well-resolved spectral data to generate the plausible substructures.  This is called partial structure elucidation (PSE).  Some limitations on PSE are described in Section 7.1.

 

In the current version, NMR-SAMS does not consider molecular symmetry, so partial structure elucidation is performed for a molecule with global symmetry.  For a molecule with local symmetry where the 13C signals corresponding to symmetric carbons can be identified, complete structure elucidation by NMR-SAMS is possible.

 

Most of the steps in NMR-SAMS, such as interpretation of 1D and 2D data into bond constraints and generation of the building block sets, are usually performed very quickly.  Structure generation, on the other hand, is more time-consuming because of its combinatorial nature.  The efficiency of structure generation (which is a function of the computation time, the quality of the structures generated, and the number of structures generated) depends on the size of the molecule and the quality and quantity of the spectral data.  When the unknown molecule is large (e.g., with more than 40 heavy atoms) and the correlation information derived from the spectral data is not sufficient, the structure generation could take a very long time to finish.  In such cases, the user is advised to input as many known substructures as possible to accelerate the structure generation process.  In addition, the user can take advantage of some of NMR-SAMS' other tools to solve this problem, such as resonance assignment for verification of proposed structures and the flexible graphics tools for interactive building of structures.

 

Although the spectral interpretation routines of NMR-SAMS are general-purpose, the structure generator of NMR-SAMS cannot deal with molecules containing ionic atoms, tautomeric or coordinate bonds.  It recognizes only single, double and triple bonds.  Aromatic bonds are represented as alternating single and double bonds.  Sometimes this might cause redundancy in the structure generation of aromatic compounds.

 

In the current version of NMR-SAMS, if the structure is already known, then target structure based resonance assignment is possible, provided the NMR data set is complete.

 

Although NMR-SAMS can recognize all chemical elements, the current substructure/δ13C knowledge base (see Appendix III) contains only the substructures consisting of commonly occurring elements, i.e., C, H, O, and N.  The user can customize this knowledge base.  The user will be informed about the undefined substructures when other elements exist in the molecule, and this could reduce the efficiency of structure generation.

 

NMR-SAMS can be viewed as an expert assistant helping spectroscopists and chemists to solve structure elucidation problems, and is by no means expected to replace the human expert.  NMR-SAMS is designed for flexible human intervention, and efficiently uses the additional user knowledge and judgment to control and enhance the structure elucidation process.  

1.3 System Requirement

The IRIX version of NMR-SAMS runs on SGI systems with R4000 or higher processors, at least 128 MB of RAM and 8-bit graphics, running the IRIX 6.x or higher operating system.  R8000 or higher processors and 128 MB or more RAM are recommended.

 

The Solaris version of NMR-SAMS runs on Sun systems running Solaris 2.x (SunOS 5.x) with SPARC processors and at least 128 MB of RAM and 8-bit graphics.  X/Motif 1.2.3 libraries are required.  These are usually supplied with the SUN Common Desktop Environment (CDE).

 

The Microsoft Windows version of NMR-SAMS runs on Pentium or higher processors (or 100% compatibles) with at least 32 MB of RAM running Windows 95/98/2000, or Windows NT 4.0 or later and a VGA or better monitor.  A Pentium II or higher processor with 64 MB or more RAM is recommended. 

 

NMR-SAMS requires from 2 MB to 55 MB of hard disk space, depending on the sample data that is installed.  The sample data with original spectra requires 40 MB of hard disk space.  The swap space (i.e., virtual memory) required is proportional to the complexity of the data being analyzed.

1.4 Help Facility

NMR-SAMS provides online help information for many of its dialog boxes.  By clicking the Help button, the relevant help message will be displayed.

1.5 Typographical Conventions

Unless otherwise noted in the text, the User’s Guide of NMR-SAMS uses the typographical conventions described below:

·         A command to select is represented in bold typeface by the menu name, the option, and the pull-right option (if any). For example, the command:

Display/Display Options/Chemical Shifts

means: first click the Display menu on the menu bar, then click Display Options in the opened menu, and then click Chemical Shifts in the pull-right options.

·         Transcript of a computer file or display is printed in Courier New letters with the keywords shown in bold, and the annotations (if any) in italic Times letters. (Such annotations do not appear in the file or display itself).

ATOM~~ATOM:

For each correlation, listed are the IDs of the correlated atom pair, the range of intervening bonds, and the bond type (0: meaningless or unknown)

(1-23: 1~1 2)
(6-22: 1~1 3)

     .

     .

     .

·         Filenames and parameters are printed in Courier New letters. For example:

Files phasefile and procpar are used for peak picking with SpecMan. 

Parameter GEN_FLAG controls the search criteria of the structure generation.

·         Terms introduced for the first time are presented in boldface type.

·         Words in italic represent variables. For example:

There are n intervening bonds between the correlated atoms.
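The ATOM~~ATOM transcript shown in the conventions above has a regular shape, so a short Python sketch can show how such lines might be read by a script.  This is only an illustration under stated assumptions: the field layout (atom pair, range of intervening bonds, bond type) is taken from the annotation in the example, while the regular expression, the BondConstraint type and the parse_constraints function are hypothetical names that are not part of NMR-SAMS.

```python
import re
from typing import NamedTuple


class BondConstraint(NamedTuple):
    """One correlation line, e.g. "(1-23: 1~1 2)" (hypothetical type)."""
    atom_a: int      # ID of the first correlated atom
    atom_b: int      # ID of the second correlated atom
    min_bonds: int   # lower end of the range of intervening bonds
    max_bonds: int   # upper end of the range of intervening bonds
    bond_type: int   # 0 means meaningless or unknown


# Assumed layout: (a-b: min~max type), with flexible whitespace.
_PATTERN = re.compile(r"\(\s*(\d+)-(\d+):\s*(\d+)~(\d+)\s+(\d+)\s*\)")


def parse_constraints(text):
    """Return every correlation found in the text, in order of appearance."""
    return [BondConstraint(*map(int, m.groups()))
            for m in _PATTERN.finditer(text)]


sample = "(1-23: 1~1 2)\n(6-22: 1~1 3)"
for bc in parse_constraints(sample):
    print(bc)
```

Lines that do not match the assumed layout are skipped silently; a stricter reader would report them instead.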

1.6 A Note on Operating Systems

Spectrum Research has attempted to make its products as similar as possible across the various operating systems.  However, some differences are unavoidable.  As the highest priority, data files have been kept consistent between UNIX and MS Windows machines.

 

It is recommended that the user refer to the online help provided by individual PC vendors for more information on the basics of Operating Systems.  NMR-SAMS follows the interface of the Operating System that it is running on, and therefore, it is important to become acquainted with the Operating System before attempting to learn NMR-SAMS.  See Section 2.4 for information on the basics of the NMR-SAMS Interface.

 


Chapter 2

Getting Started with NMR-SAMS

2.1 Installation of the Program

For instructions on NMR-SAMS installation, please refer to ‘The Release Notes’ or ‘nmrsamsPC.readme’ file supplied with the program.

2.2 Spectrum Research Licensing

NMR-SAMS is copy protected by the Spectrum Research Licensing System.  This licensing system allows NMR-SAMS to run only on the computer for which it was sold.  A license.dat file is included with the installation files and this plain text file will be placed into the NMR-SAMS directory (C:\Spectrum2001\NMR-SAMS). 

 

If a license file is not located with the NMR-SAMS installation files, please contact Spectrum Research.  To create a license file, send the Windows Serial Number (Product ID) to Spectrum Research.  Under Windows 95/98/2000 and Windows NT4.x, right click on the “My Computer” icon on the Windows Desktop.  Choose “Properties” from the menu that pops up, and the Product ID will be listed in the “Registered To:” section (For example: 02658-OEM-2564589-12458). 

 

When the trial licensing time period is nearing expiration, NMR-SAMS will display a dialog box with the remaining number of days listed on it.  Please contact Spectrum Research for a renewal at this time.


2.3 Starting NMR-SAMS

To launch the NMR-SAMS program, click on the nmrsams.exe icon from the File Manager or Windows Explorer (By default, NMR-SAMS is installed into C:\Spectrum2001\NMR-SAMS).  The program starts with a Main Graphics Window that has a menu bar and status bar.

 

By default, a Status Window is also opened, which displays text messages to indicate the current status of the structure elucidation, and also prompts the user with the “what to do next” steps.  The main graphics window is shown below:

 

When NMR-SAMS is started, it reads the following three files from the directory where the user launched NMR-SAMS:

 

nmrsams.ini: defines some of the initial settings of the program, such as window sizes, background colors, atom colors, bond colors, etc.  If this file is not found, default settings will be used.

periodic_tab.def: defines some properties of the chemical elements.  If this file is not found or if it is not properly read, NMR-SAMS will not be able to recognize any element symbols and perform the related functions. 

chemical_shifts.def: defines the knowledge base of  13C chemical shift dispersion ranges for some common carbon-centered single spherical substructures (CCSS) (see Appendix III).  If this file is not found or it is not correctly read, the structure generation will not be possible (see Section 3.5).
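Because a missing startup file changes what NMR-SAMS can do, it can be useful to check the launch directory beforehand.  The Python sketch below is a hypothetical helper, not a tool shipped with NMR-SAMS: the three file names come from the list above, while the function name and the consequence strings are paraphrases written for this example.

```python
from pathlib import Path

# File names are taken from this section; the consequence text paraphrases it.
STARTUP_FILES = {
    "nmrsams.ini": "default settings will be used",
    "periodic_tab.def": "element symbols will not be recognized",
    "chemical_shifts.def": "structure generation will not be possible",
}


def missing_startup_files(launch_dir):
    """Return {file name: consequence} for each startup file not found.

    launch_dir is the directory from which NMR-SAMS will be launched.
    Only the presence of each file is checked, not its contents.
    """
    base = Path(launch_dir)
    return {name: effect
            for name, effect in STARTUP_FILES.items()
            if not (base / name).is_file()}


if __name__ == "__main__":
    for name, effect in missing_startup_files(".").items():
        print(f"{name} not found: {effect}")
```

The check covers only whether the files exist; a file that is present but cannot be read correctly has the same consequences as a missing one, according to the descriptions above.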

2.4 Brief Introduction to Microsoft Windows

If the user is new to Microsoft Windows or Windowing systems in general, please read this section before using NMR-SAMS.  It will help the user become acquainted with the NMR-SAMS interface.

 

First, it is a good idea to become acquainted with the online help system provided by Microsoft Windows.  The online help system is called from within NMR-SAMS when the user clicks on a "Help" button from any dialog box, and it brings up context sensitive help in a window.  There is also a Help Contents facility (also known as an Index).  This consists of a list of the topics in the online-help.  The user can click on one of these items to bring up its corresponding information.  'The Contents' option is available via NMR-SAMS's Help menu and from the Online Help Viewer window by clicking on the “Contents” button.

 

When NMR-SAMS is first started, a window will appear with "NMR-SAMS, version 2.4, (C) Spectrum Research, LLC." on the top.  The area where this text appears is referred to as the "Title Bar."  The user can press the left mouse button while the arrow pointer (which is called the "Cursor") is on the title bar and then move the mouse to move the window.  Release the mouse button to stop moving the window.  That combination of events (pressing a mouse button, moving the mouse, and then releasing) is known as "Dragging".  Position the mouse pointer so that it is over the word "File", located immediately below the title bar.  Now press and then immediately release the left mouse button.  This procedure (pressing a mouse button and then releasing without moving the mouse) is known as "Clicking".  The item that was clicked on is part of the "Menu Bar".  The menu bar consists of several "Menus" ("File", "Edit", "Display", "Analysis", and "Help").  When the File menu is clicked on, a "Pulldown" appears.  This pulldown consists of "Menu Items" ("Open...", "New...", etc.).  If the user clicks on one of these menu items, the corresponding action is performed.  Menu items are the primary way for the user to give instructions to NMR-SAMS.

 

Some items on menus are not menu items, however.  The line that appears above the "Quit" menu item is known as a "Separator".  Its purpose is solely to make the menu easier to read.  Click on the "File" menu and notice that the "Create NMR Data File" menu item has a right-pointing triangle after its text.  This type of menu item is known as a "Pullright".  Click the mouse on the "Create NMR Data File" menu item and another group of menu items will appear to the right of it.  The pullright feature is used to group related menu items together, reducing the size of the main pulldowns.  Click on the "Display" menu and the menu item "Status Window", which is known as a "Toggle", will appear.  Toggles have two states:  "Off" (also known as "Deselected" or "Deactivated"), and "On" (also known as "Selected" or "Activated").  If the status window is on, turn off the "Status Window" toggle by clicking on it and the status window will disappear.  Click on the "Display" menu and turn on the "Status Window" toggle by clicking on it again, and the status window will pop up again.

 

Position the mouse cursor over the frame that surrounds the entire NMR-SAMS window.  Drag the mouse to change the size of the NMR-SAMS window.  All sides of the NMR-SAMS window can be moved to size the window.  The field below the NMR-SAMS Toolbar is known as the "Main Graphics Window".  This is where information about chemical structures is displayed.  At the bottom of the Main Graphics Window is the "Status Bar", and this status bar prints out information about what is going on in NMR-SAMS.  It will notify the user if the user has asked NMR-SAMS to perform a function that it is not prepared to do, in addition to giving the user hints about using NMR-SAMS. 

 

Click on the "Open..." menu item from the "File" menu, and a window will appear with the title "Open".  This type of window is known as a dialog box.  While a dialog box is displayed, the user must interact with it before continuing with other areas of NMR-SAMS.  Dialog boxes also have a "Help" button that, when clicked, brings up online help about the dialog box.  The dialog box that is currently displayed is referred to as the "File Browse Dialog", and it is used to specify a file.  The user can move to a certain directory by using the "Directory" combo box to find the proper parent directory, and can descend the directory structure by double clicking on a directory name from the list (a "Double Click" is two clicks in rapid succession).  After the user has changed to the appropriate directory, a list of "Files" with the extension ".mdf" will appear.  Click on one of the filenames to select it and then select the "OK" button at the bottom of the dialog box to accept the selected file.  Click the "Cancel" button to close the dialog box without performing an action.

 

When multiple candidate structures are generated, the first structure will be displayed along with a window titled Structure Browser.  This window is known as a "Palette."  Palettes are similar to dialog boxes; however, the user is able to interact with them and with the main NMR-SAMS window at the same time.  The "Structure Browser" palette is used to control the display of the candidate structures.  In the "Structure Browser" palette, there is a "Slider", and the user can drag the slider bar to the left or right to change its value, which determines the sequential number of the structure to be displayed.  Some palettes also have text fields where the user can enter numbers or text.

 

The user should now have enough information to start exploring NMR-SAMS.  Note that NMR-SAMS grays out menu items that are not available during specific stages of the structure elucidation process.  For example, if the user has not prepared the NMR data file, the menu item Analysis/Interpret NMR Data will remain grayed out until the data has been prepared. 

2.5 Description of the Main Menus

The menu bar appears at the top of the main graphics window and contains the names of the five NMR-SAMS menus.  All tasks in NMR-SAMS can be performed by selecting from these five menus.  The five menus are described briefly below and in greater detail in the other chapters of this guide.

 

The File menu:         The File menu lists options related primarily to reading data into and out of NMR-SAMS, as displayed below:

 

The Edit menu:         The Edit menu lists options related to editing of the working data set files and the generated structures, as displayed below:

The Display menu:   The Display menu lists options related to the graphical display of intermediate and final results of NMR-SAMS, as displayed below:

The Analysis menu: The Analysis menu lists the options related to structure elucidation, as displayed below:

The Help menu:        The Help menu lists the options related to the online help of NMR-SAMS, as displayed below:

 

 

2.6 The NMR-SAMS Toolbar

The NMR-SAMS toolbar contains icons (pictures) that represent commonly used menu items.  If the user clicks on one of the icons, the same action occurs as when the corresponding menu bar item is selected.

 

 

The following menu items have associated toolbar icons:

 

      File/New

      File/Open

      File/Save

      Display/Building Blocks & Fixed Bonds

      Display/Target Structure

      Display/Generated Structures or Assignments

      Display/Status Window

      Display/Display Options/Balls

      Display/Display Options/Carbon Symbols

      Display/Display Options/Numbers

      Display/Display Options/Chemical Shifts

      Display/Display Options/Protons

      Display/Display Options/Molecular Formula

      Display/Display Options/Connection Table

      Display/Display Options/Refine

      Help/Contents


Chapter 3

Understanding NMR-SAMS

3.1 Overview

This chapter introduces the basic procedure of structure elucidation, with a brief description of the concepts and principles of NMR-SAMS, and concludes with a high-level discussion of the typical flow of activity through NMR-SAMS. 

3.2 General Procedure of Structure Elucidation with NMR-SAMS

The process of structure elucidation of an unknown compound through NMR spectroscopy consists of the following steps:  

 

1.        Determination of the molecular formula (MF) by MS.  Determination of some functional groups in the unknown compound through IR and UV spectroscopy.  MF is optional to NMR-SAMS.

2.        Data acquisition of 1D and 2D NMR spectra.  See Section 3.3 for the spectral data used by NMR-SAMS.

3.        Extraction of peak tables with chemical shifts, intensities, J-couplings, and multiplicities.  Peak picking of 1D and 2D NMR spectral data is performed with SpecMan using automatic and semi-automatic procedures (see SpecMan’s User Guide).  The peak tables are then converted to the NMR-SAMS representation of connectivity information (see Chapter 5).

4.        Set up of the parameters to control the spectral interpretation and structure generation.  In most cases, the default values of these parameters can be used  (see Appendix IV).

5.        Interpretation of molecular formula (if known), along with 1H, 13C, and HMQC spectral data to obtain the structural building blocks.  If the MF is unknown, the user can interactively add heteroatoms into the building block sets (see Chapter 6).

6.        Interpretation of additional 2D NMR spectral data to obtain the bond constraints (see Chapter 6).

7.        Generation of candidate structures that are consistent with the experimental data for unknown compounds (see Chapter 7), or verification of the proposed structure and completion of 1H and 13C resonance assignments (see Chapter 8) for known compounds.  Interactive structure generation and resonance assignment is also possible (see Section 7.2.1).

8.        Exportation of the results of structure generation and resonance assignments (see Chapter 11).

 

Structure elucidation is usually an interactive and iterative process, so these steps may need to be repeated several times until the user obtains satisfactory results.  NMR-SAMS assists the user in identifying and correcting inconsistencies in the input data.  When sufficient input data is not available, NMR-SAMS generates only partial structures with resonance assignments.  NMR-SAMS also warns the user about some common pitfalls that could lead to incomplete or incorrect structure generation, and provides clues for further refinement.

 

3.3 What Spectral Data Does NMR-SAMS Use?

The possible combinations of 1D and 2D spectral data used by NMR-SAMS for structure elucidation are listed in Table 3.1.  The fifth combination (routine 1D and 2D spectra, along with complementary information from other spectral data such as MS, UV, and IR) is the recommended choice for structure elucidation of real-world complex molecules.  Other spectral sources such as MS, IR, and UV are not directly interpreted by NMR-SAMS, but they can be conveniently used as user-defined bond/environment constraints.

Table 3.1. Possible combinations of 1D and 2D NMR spectral data used by NMR-SAMS a

 

No.   1D                 2D                                      Comments

1     None               None                                    Pure isomer enumeration from MF

2     13C (and DEPT b)   None                                    Very low efficiency except for simple molecules.

3     13C, DEPT b        INADEQUATE                              Very high efficiency, if data available.

4     13C, DEPT b, 1H    DQF-COSY c, HMQC d                      Low efficiency except for H-rich molecules.

5     13C, DEPT b, 1H    DQF-COSY c, HMQC d, HMBC e (NOESY f)    Most practical way for de novo structure elucidation of complex molecules.

6 g   1H                 DQF-COSY c, HMQC d, HMBC e (NOESY f)    Practical when the amount of sample does not allow for carbon-detecting experiments.

 

a TOCSY is not used directly by NMR-SAMS, but can be used by SpecMan to assist the peak picking of DQF-COSY.

b INEPT, or APT can also be used.

c Various types of COSY experiments can be used, as long as they provide geminal and vicinal H-H through-bond connectivity.

d HSQC, HETCOR, or other types of spectra can also be used, as long as they provide one-bond C-H connectivity.

e COLOC, FLOCK, or other types of spectra can also be used, as long as they provide long-range C-H connectivity.

f NOESY or ROESY is optional.

g HMBC and HMQC must be clean enough to allow extraction of 13C chemical shifts and multiplicity information.  13C chemical shifts can be automatically extracted from HMBC using SpecMan.  13C multiplicities must be identified manually from the HMQC spectrum.

3.4 Use of 2D NMR Connectivities: Bond Constraints

NMR-SAMS uses mainly 2D NMR-derived through-bond spin-spin connectivity information for structure elucidation, because it is reliable and provides comprehensive structural information for de novo structure elucidation.

 

In NMR-SAMS, the coordinates of 2D cross peaks are first converted into connectivities between the relevant 1D peaks, and then interpreted as bond constraints on the relevant atoms.  A bond constraint (BC) is a requirement of a certain number (or a range) of intervening chemical bonds between correlated spins.  For an asymmetric molecule, such spin-spin BC’s are directly used as atom-atom bond constraints.  In addition to its efficient utilization of BC’s involving ambiguous bond separation (e.g., 2 or 3 bonds between two HMBC-correlated spins), NMR-SAMS also copes with BC’s concerning ambiguous atoms.  Such ambiguity typically arises from peak degeneracy or low digital resolution.
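The conversion of a cross peak into a connectivity, and the way ambiguity arises from peak degeneracy, can be illustrated with a minimal Python sketch.  This is not the NMR-SAMS implementation: the tolerance values, the peak IDs, the 13C shifts, and all function names are assumptions chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak1D:
    peak_id: int      # ID of the 1D peak (hypothetical numbering)
    shift_ppm: float  # chemical shift in ppm


def match_peaks(shift_ppm, peaks, tol_ppm):
    """Return the IDs of all 1D peaks within tol_ppm of one cross-peak coordinate."""
    return [p.peak_id for p in peaks if abs(p.shift_ppm - shift_ppm) <= tol_ppm]


def cross_peak_to_connectivity(x_ppm, y_ppm, h_peaks, c_peaks,
                               tol_h=0.02, tol_c=0.2):
    """Map one heteronuclear cross peak to candidate (13C IDs, 1H IDs).

    The tolerances are assumed values, not NMR-SAMS defaults.
    More than one ID on either side means the connectivity is ambiguous.
    """
    return match_peaks(y_ppm, c_peaks, tol_c), match_peaks(x_ppm, h_peaks, tol_h)


# Made-up peak lists: two nearly degenerate protons and two carbons.
h_peaks = [Peak1D(17, 0.818), Peak1D(18, 0.811), Peak1D(3, 3.509)]
c_peaks = [Peak1D(10, 37.2), Peak1D(11, 55.4)]

c_ids, h_ids = cross_peak_to_connectivity(0.815, 37.25, h_peaks, c_peaks)
print(c_ids, h_ids)  # both protons 17 and 18 remain candidates
```

Because both proton peaks fall within the assumed tolerance of the cross-peak coordinate, the resulting connectivity keeps two candidate atoms on the 1H side, which is the kind of ambiguity described above.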

 

 

In NMR-SAMS, a BC is represented in the following general format:

 

(Atom_y ... - Atom_x ... : minBond ~ maxBond; BondType; minNSBC ~ maxNSBC)Source

where

Atom_y ... is the correlated atom(s) along the Y dimension (13C domain for an HMQC spectrum). It could be more than one atom in the case of ambiguity.

Atom_x ... is the correlated atom(s) along the X dimension (1H domain for an HMQC spectrum).  It could be more than one atom in the case of ambiguity.

minBond and maxBond are the minimum and maximum bond separations between the relevant atoms.

BondType is the type of the intervening bond between the atoms.  Valid choices are: 0, 1, 2, or 3 for unknown, single, double, and triple, respectively.

minNSBC and maxNSBC are the minimum and maximum numbers of relevant atom pair(s) that must satisfy this BC in the generated structure. 

Source encodes the connectivity (or other source) from which the BC was derived.  A connectivity is represented by its spectral type and its ID number. The following codes are used to represent the different spectral types:

“C” for COSY, “Q” for HMQC (or HETCOR), “B” for HMBC (or COLOC), “N” for NOESY, “I” for INADEQUATE.

Note: The ID of a connectivity is different from, though related to, the peak ID(s) in the SpecMan peak tables.  For more details see Fig. 6.4 in Chapter 6.

The following codes are used to represent other kinds of source:

“S” for a pseudo BC added by the program, “U” for a user-defined BC, and “G” for a previously generated bond (when using a generated substructure as the starting point for the next structure generation cycle).

 

For example, an HMBC-derived bond constraint is represented as:

(10 - 17 18: 2 ~ 3; 0; 1 ~ 2)B10

In the above example, the first set of numbers “10 - 17 18: ” denotes the correlated atoms.  In this case, the chemical shifts of H-17 and H-18 are so close that it is difficult to determine which of them is actually correlated to C-10.  Therefore, both protons are retained to represent the possibility of a correlation between C-10 and H-17, between C-10 and H-18, or both.  The next set of numbers “2~3” indicates that there could be two or three intervening bonds between the correlated C-H pair(s).  The next number “0” gives the bond type of the intervening bonds, which in this case is treated as unknown.  The next set of numbers “1~2” indicates that either one or both of the atom pairs involved must satisfy this bond constraint in the computed structure (i.e., C-10 and H-17, or C-10 and H-18, or both pairs).  Finally, the character string “B10” means that this bond constraint was derived from HMBC connectivity #10.  From the comment of this connectivity in the .nmr file, the ID of the actual cross peak (in the SpecMan peaks table) can be found (see Fig. 6.4 in Chapter 6).
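For users who want to process bond constraints outside NMR-SAMS, the general format can be read with a short Python sketch such as the one below.  It is only an illustration based on the format and example shown in this section; the class name, the function name, and the assumption that the source ID may be absent for some source codes are not taken from NMR-SAMS.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BondConstraint:
    atoms_y: List[int]        # correlated atom(s) along the Y dimension
    atoms_x: List[int]        # correlated atom(s) along the X dimension
    min_bond: int
    max_bond: int
    bond_type: int            # 0 unknown, 1 single, 2 double, 3 triple
    min_nsbc: int
    max_nsbc: int
    source_type: str          # e.g. "B" for HMBC
    source_id: Optional[int]  # connectivity ID; assumed optional here


# (Atom_y ... - Atom_x ... : minBond ~ maxBond; BondType; minNSBC ~ maxNSBC)Source
BC_PATTERN = re.compile(
    r"\(\s*([\d\s]+?)\s*-\s*([\d\s]+?)\s*:"
    r"\s*(\d+)\s*~\s*(\d+)\s*;"
    r"\s*(\d+)\s*;"
    r"\s*(\d+)\s*~\s*(\d+)\s*\)"
    r"\s*([A-Z])(\d*)"
)


def parse_bc(text):
    """Parse one bond constraint written in the general format above."""
    m = BC_PATTERN.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not a bond constraint: {text!r}")
    ys, xs, lo, hi, btype, nlo, nhi, src, src_id = m.groups()
    return BondConstraint(
        atoms_y=[int(a) for a in ys.split()],
        atoms_x=[int(a) for a in xs.split()],
        min_bond=int(lo), max_bond=int(hi),
        bond_type=int(btype),
        min_nsbc=int(nlo), max_nsbc=int(nhi),
        source_type=src,
        source_id=int(src_id) if src_id else None,
    )


bc = parse_bc("(10 - 17 18: 2 ~ 3; 0; 1 ~ 2)B10")
print(bc)
```

Applied to the HMBC example, the sketch yields one Y atom (10), two X atoms (17 and 18), a bond separation of 2 to 3, an unknown bond type, and the source B10.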

 

By default, NMR-SAMS treats unambiguous BC’s as fixed bonds.  An unambiguous BC has exactly two correlated atoms, a one-bond separation, and minNSBC = maxNSBC = 1 (which means the BC must be satisfied in a generated structure).  The rest, which have an ambiguous bond separation, an ambiguous number of correlated atoms, or both, are treated as ambiguous BC’s.  The ambiguous BC’s are used as the major constraints for structure generation.  During structure generation, NMR-SAMS computes the number of violations of BC’s for the current substructure/structure.  If the actual number of violations of a substructure/structure is less than the upper limit of the allowed number of violations, then the substructure/structure is retained; otherwise it is rejected.  The BC’s are also used by some advanced heuristic methods for acceleration of the structure generation process (see Section 7.4).
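The default distinction between fixed bonds and ambiguous BC’s can be expressed as a simple test.  The following Python sketch only restates the three conditions given above; the data structure and names are hypothetical and do not reflect the internal NMR-SAMS representation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BondConstraint:
    atoms_y: Tuple[int, ...]  # correlated atom(s) along Y
    atoms_x: Tuple[int, ...]  # correlated atom(s) along X
    min_bond: int
    max_bond: int
    min_nsbc: int
    max_nsbc: int


def is_fixed_bond(bc):
    """True if the BC is unambiguous in the sense described above."""
    two_atoms = len(bc.atoms_y) == 1 and len(bc.atoms_x) == 1
    one_bond = bc.min_bond == 1 and bc.max_bond == 1
    must_hold = bc.min_nsbc == 1 and bc.max_nsbc == 1
    return two_atoms and one_bond and must_hold


# A one-bond C-H correlation with a single candidate pair (made-up atom IDs).
hmqc_like = BondConstraint((5,), (12,), 1, 1, 1, 1)
# The HMBC example from this section: two candidate protons, 2 to 3 bonds.
hmbc_like = BondConstraint((10,), (17, 18), 2, 3, 1, 2)

print(is_fixed_bond(hmqc_like), is_fixed_bond(hmbc_like))
```

The first constraint qualifies as a fixed bond, while the HMBC example is ambiguous in both its bond separation and its correlated atoms.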

3.5 Use of Chemical Shifts And Peak Multiplicities

NMR-SAMS uses chemical shifts as the labels of heavy atoms, so that 2D NMR-derived correlation information can be used as bond constraints on specific atoms.  This is also the reason why a generated structure always has unequivocal 1H and 13C resonance assignments.

 

13C chemical shifts are also used to evaluate the intermediate structures/substructures produced during the structure generation process.  A knowledge base consisting of a correlation table of substructures and 13C chemical shift (δ) ranges is used for predicting 13C chemical shift ranges.  Each of the substructures consists of the central carbon atom (which is being considered), its attached bonds, and the first layer of its neighboring atoms (the outward bonds of these atoms are not considered).  This is referred to as a carbon-centered single-spherical substructure (CCSS).  Currently, this table consists of the 13C chemical shift ranges of around 93 CCSSs composed of C, N, O, and other common elements that have been adapted from the literature.  The correlation table is stored as an ASCII file, chemical_shifts.def (see Appendix III), with the code for each CCSS and its expected minimum and maximum 13C chemical shift.  This file can be customized by the user, and is read when NMR-SAMS is started.

 

During structure generation, whenever a carbon atom has a complete CCSS (i.e., its immediate neighbors are known), its expected chemical shift range is derived from the knowledge base and compared with the observed 13C chemical shift of the central carbon.  If the observed shift falls within this range, the substructure is accepted; otherwise it is discarded.  If the CCSS is not defined in the knowledge base table, the test is assumed to have been passed, and the undefined CCSS's are reported after the structure generation has been completed.  As the CCSS's cover only very limited structural features, their chemical shift ranges are very broad.  Thus in NMR-SAMS, 13C chemical shifts act as a much looser constraint on the structure generation than the 2D NMR connectivities.  Hence it is very important to include as much correlation information as possible for efficient structure generation.  Sometimes the correct structure could be overlooked if the molecule has carbons that show unusual chemical shifts.  In such cases, it is recommended that the user broaden the predicted chemical shift ranges by specifying an extra tolerance (for details, refer to the description of the parameter ADD_C13_RNG in Appendix IV).
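The logic of this 13C chemical shift test can be sketched in a few lines of Python.  The sketch is illustrative only: the CCSS code and its range are placeholders rather than entries from chemical_shifts.def, and it is an assumption that the extra tolerance widens the range equally at both ends.

```python
def passes_c13_test(ccss_code, observed_ppm, knowledge_base, extra_tol_ppm=0.0):
    """Return (passed, undefined) for one carbon with a complete CCSS.

    knowledge_base maps a CCSS code to (min_ppm, max_ppm).
    extra_tol_ppm plays the role of an extra tolerance such as ADD_C13_RNG;
    applying it symmetrically is an assumption of this sketch.
    """
    shift_range = knowledge_base.get(ccss_code)
    if shift_range is None:
        # Undefined CCSS: the test is assumed to be passed, but the
        # caller can collect these cases and report them afterwards.
        return True, True
    min_ppm, max_ppm = shift_range
    passed = (min_ppm - extra_tol_ppm) <= observed_ppm <= (max_ppm + extra_tol_ppm)
    return passed, False


# Placeholder knowledge base; the code and range are made up for illustration.
kb = {"CCSS_A": (10.0, 60.0)}

print(passes_c13_test("CCSS_A", 62.0, kb))                     # outside the range
print(passes_c13_test("CCSS_A", 62.0, kb, extra_tol_ppm=5.0))  # accepted with tolerance
print(passes_c13_test("CCSS_B", 150.0, kb))                    # undefined CCSS
```

The second call shows how an extra tolerance can rescue a carbon with an unusual chemical shift that would otherwise cause the substructure to be discarded.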

 

13C peak multiplicities play an important role in determining the number of attached protons of heavy atoms (i.e., the building blocks).  So it is recommended to use DEPT (or INEPT, APT) spectra to obtain complete 13C multiplicity information.

 

In the current version, 1H chemical shifts are not used to evaluate substructures.  1H peak multiplicities are used to limit the neighboring atoms of the concerned atom. (For details refer to the description about H1MULT_FLAG in Appendix IV.)

3.6 Structure Generation 

During structure generation NMR-SAMS searches all possible ways to assemble the structural building blocks into complete structures.  Within some allowance for the violation of constraints, the generated structures are consistent with all of the available spectral data and chemical constraints. 

 

The efficiency of structure generation is a function of the computation time, the quality of the structures generated, and the number of structures generated.  Because it is a combinatorial problem, structure generation is usually the most time-consuming step.  “Combinatorial explosion” was the major bottleneck in early attempts at automated structure elucidation.  NMR-SAMS provides novel heuristic search algorithms that reorder the solution space based on bond constraints, and search only the most probable portion of this space for candidate structures.  These methods exponentially reduce the CPU time for structure generation and hence make it practical for complex molecules.  Moreover, the user has full control over the use of these methods to perform optimized structure generation.  For example, by modifying a few parameters, the user can extend the search space for a more complete search, or simply turn off the heuristic search methods to perform an exhaustive search.  Conversely, the user can limit the search space for faster structure generation (see Section 7.4 and the descriptions of the parameters GEN_FLAG, SAT_BC_RATE and N_FBX_STEP in Appendix IV).

 

For relatively small molecules (e.g. < 30 heavy atoms) with reasonably clean and sufficient spectral data, this process is usually completed in seconds or minutes.  In most cases the correct structure is generated either uniquely or along with a few alternatives.  For more complex problems (bigger molecules and insufficient spectral constraints), structure generation can be completed in a reasonable computation time if adequate user-defined constraints are included.   

 

The candidate structures generated by NMR-SAMS include complete structures and optionally, substructures.  A complete structure is defined as one having no unsatisfied free bonds.  In the case of partial structure elucidation (see Section 7.1 for details), the chemically incomplete structure obtained is still referred to as a complete structure, because all of the free bonds are satisfied either by real bonds or dummy bonds.  During structure generation, the program enables the user to save the largest intermediate substructures.  The substructures are useful when the generation of complete structures is not possible due to errors in spectral data or other reasons, and they provide clues and hints for improving the input spectral data and completing the structure elucidation successfully.

3.7 User Intervention 

NMR-SAMS was developed to streamline and automate the structure elucidation process with minimal user intervention.  However, when the unknown is large (e.g., more than 40 non-hydrogen atoms), or when insufficient connectivity information is available, user intervention is necessary to improve the efficiency of structure generation.  Currently the user can interact with the structure elucidation procedure in the following ways:

 

1.        Modify the control parameters for NMR interpretation and structure generation.  For example, the user can decide whether or not to use the “negative information” of DQF-COSY based on the spectral quality, and the user can also limit the rings in the generated structures to 5- or 6-membered rings and discard structures containing other ring sizes.

2.        Modify the intermediate results in the MDF by using Edit/Master Data File.

3.        Supply structural building blocks by using Analysis/Edit Building Blocks if the MF is unknown.

4.        Supply known structural information as user-defined bond constraints.  This is especially important for heteroatoms that are either not observed or have sparse connectivity information in 2D NMR experiments.  Also, other spectral data, such as IR and UV, normally provide positive evidence of some known functional groups.  Using Analysis/User-defined Bond Constraints, the user can add as many known bonds as possible between the constituent atoms (see Section 7.2).  Using this feature, the user can also manually assemble the building blocks into a complete structure, or use a selected, previously generated substructure as the starting point for the next structure generation.

5.        Supply known structural information as atom environment constraints (EC).  An EC defines the number of occurrences of a certain type of atom(s) as the immediate neighbor(s) of an atom under consideration (see Section 7.3).

6.        Propose a possible structure for the unknown and perform resonance assignment.  This way the user can verify user-proposed structures and complete the structure elucidation.

7.        Modify the results of resonance assignment of a target structure using Analysis/User-Defined Assignment.

3.8 Control Parameters

The parameter file (.par file) stores the parameters for controlling spectral interpretation, for setting up ACMX, and for structure generation.  All of the parameters can be modified by selecting Edit/Parameters/NMR Interpretation, Edit/Parameters/Set up ACMX or Edit/Parameters/2D Structure Generation.  Default values are assigned to the parameters according to the nmrsams.ini file when a new working data set is opened.  The default values can be customized by editing the nmrsams.ini and nmrsamspersonal.ini files.  In most cases, the default parameters should be a good starting point for structure elucidation.  In the following chapters, a parameter is referred to by its name, e.g., GEN_FLAG; the corresponding titles in the dialog boxes and details about the usage of the parameters are described in Appendix IV.


Chapter 4

Working Data Set

4.1 Overview

This chapter describes the operations related to the data files used by NMR-SAMS.  During each session of structure elucidation, NMR-SAMS works with a working data set, which consists of five text files with the same root name but different extensions, plus a lock file.  For example, if the root name is Q-2-test, then the working data set consists of the following files:

 

·         A master data file (MDF), Q-2-test.mdf, where all of the intermediate and final results are stored.  The user can view and edit this file by using Edit/Master Data File (See Appendix II).

·         A parameter file, Q-2-test.par, where the control parameters used for the data interpretation and structure generation are stored. The user can access the parameters by using the commands in the pull-right menu of Edit/Parameters (see Appendix IV).

·         An NMR data file, Q-2-test.nmr, where the NMR data converted from the SpecMan peaks table are stored.  The user can view and edit this file by using Edit/NMR Data File (see Appendix I).

·         A log file, Q-2-test.log, where most of the information, warning, and error messages produced during the analysis are stored.  The user can view the log file by using Edit/Log File.

·         A structure file, Q-2-test.str, where the atom-atom connection table of the generated structures and their resonance assignments are stored.  The user can display the structures by using Display/Generated Structures (see Chapter 10).

·         A lock file, Q-2-test.lock, which is used to prevent two users opening the same data set simultaneously.
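When working data sets are archived or copied outside NMR-SAMS, it can be useful to check that all files sharing a root name are present.  The following Python sketch builds the expected file names from the list above; the function names are hypothetical and the script is not part of NMR-SAMS.

```python
from pathlib import Path

# Extensions of the five text files of a working data set, plus the lock file.
DATA_SET_EXTENSIONS = (".mdf", ".par", ".nmr", ".log", ".str")
LOCK_EXTENSION = ".lock"


def working_data_set_files(directory, root_name):
    """Map each extension to the expected path for the given root name."""
    base = Path(directory)
    extensions = DATA_SET_EXTENSIONS + (LOCK_EXTENSION,)
    return {ext: base / f"{root_name}{ext}" for ext in extensions}


def missing_data_files(directory, root_name):
    """List the data files (lock file excluded) that do not exist on disk."""
    files = working_data_set_files(directory, root_name)
    return [files[ext].name for ext in DATA_SET_EXTENSIONS
            if not files[ext].exists()]


files = working_data_set_files(".", "Q-2-test")
for ext, path in files.items():
    print(ext, path.name)
```

The root name is joined to the extension as plain text rather than with Path.with_suffix, so that root names containing a period are not truncated.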

4.2 Open An Existing Working Data Set

Command: File/Open.

Description:  This procedure is used to open an existing working data set.  An existing working data set stores the data and results of the last session of structure elucidation with NMR-SAMS.  Opening an existing working data set allows the user to continue from where the dataset had last been saved.  After selecting File/Open, a file browser is displayed, listing the master data files in the current directory.  If necessary, the user can switch to the desired directory, and then click the desired master data file name.  The selected file name appears in the Open MDF field.  Next click OK, and the working data set is then opened for use.

 

 

After a working data set has been opened, the following message will appear:

 

 

The message prompts the user to confirm removal of old log messages from the previous session.   To remove the old log messages, select ‘Yes’ or to retain the old log messages, select ‘No.’ 

 

The status window displays the current state of structure elucidation.  It lists the NMR data files that are being used.  It also lists the steps that have been completed, and provides tips to the user as to what steps need to be done next.  The structural results, such as building blocks or candidate structures, are displayed in the main graphics window (see Chapter 10).

 

Note:  If another working data set is opened before the current modified working data set has been saved, NMR-SAMS will prompt the user to save the changes.

 

If the user wants to discard the changes that have been made to the current working data set without exiting the program, re-open the dataset and click ‘Yes’ to the following message:

 

 

Then it is possible to start from the point at which the working data set was last saved.  Note that if a data set that is locked by another user is selected, the following warning message will appear:

 

 

Click 'Yes' to open the data file anyway, or click 'No' to cancel.  Note that if 'Yes' is selected, problems may arise.  

4.3 Opening A New Working Data Set

Command: File/New.

Description: This procedure is used to create a new working data set. When dealing with a new structure problem, the user must open a new working data set.  The user can open a totally new working data set, or open one starting from an existing NMR data file that has already been prepared.

 

To open a totally new working data set, choose File/New. In the displayed file browser, make sure to select the file type as 'Completely New Dataset (*.mdf).'  Switch to the desired directory if necessary, and type a root name for the new working data set.  The extension *.mdf will be automatically added.

 

 

After 'Open' is clicked, NMR-SAMS creates the *.mdf, *.par, *.nmr, *.log and *.str files.  All files, except for the parameter file (*.par), will be empty.

 

Next, NMR-SAMS prompts the user to input the molecular formula (MF) of the sample as shown below:

 

 

Input the molecular formula into the dialog box (see Section 4.4 for more information about inputting the molecular formula).

 

To open a new working data set starting with an existing NMR file, select the file type as 'Existing NMR File (*.nmr)' in the file browser.  Switch to the desired directory if necessary, and click the desired .nmr file.  Next, click 'OK' and a new working data set is created with the selected .nmr file.

 

Note: If the user selects the filename of an existing data set, NMR-SAMS will warn the user about existing files with the same root name, as shown below:

 

 

 

Click 'Yes' and the program will overwrite the existing files (except the .nmr file if starting from an existing NMR data file).

 

If the user wants to use the existing .nmr file, but doesn't want to overwrite the existing files, click 'No' to cancel this dialog box.  Then, make a copy of the .nmr file with a new root name and reopen the newly named .nmr file. 

4.4 Input Molecular Formula

Command: File/Input Molecular Formula.

Description:  This procedure is used to define the molecular formula of the sample.  Normally this command is used when the user wants to change the MF, since NMR-SAMS always prompts the user to enter the MF when a new working data set is first opened (see Section 4.3), as shown below:

 

 

Note that the element symbol must be typed with the first letter in upper case and the second one, if any, in lower case.  The user can specify the valence of an atom in parentheses following the element symbol (e.g., C10H12N(V)N2S(VI)O8).  If the valence is not specified, the most common chemical valence is adopted for any element with multiple valences (e.g., valences of 3 and 2 are adopted for N and S, respectively).  The user can also change the valences later by selecting Analysis/User-Defined Building Blocks.

 

If the exact MF is unknown, enter the closest possible formula or type 'UNKNOWN'.  In any case, the user can modify the elemental composition of the molecule by using Analysis/User-defined Building Blocks later (see Section 6.3).

 

Once a molecular formula has been entered, it is interpreted and a dialog box appears displaying the standardized MF, the molecular weight, and the double bond equivalence (DBE), as shown below:

 

 

Two records are written into the MDF. The first record starts with the keyword “MF:” and contains the standardized MF:

MF: C30H48O3

The second record starts with the keyword “ATOMS:”.  Following this are the molecular weight and the degree of unsaturation (or double bond equivalence) in the same line.  The second line is a brief description of the entries in each of the remaining lines.  Each line consists of the ID, the atomic number, the chemical valence, the minimum and maximum attached protons, the minimum and maximum of attached double bonds, and the minimum and maximum attached triple bonds of a constituent heavy atom, respectively.  The constituent heavy atoms are listed with carbon first, and the remaining elements in the alphabetic order of their element symbol.

 

ATOMS:  (MW = 456.7074, DBE = 7.0)                    

#Atom; Element; Valence; Min. & max. attached H; Min. & max. double bonds; Min. & max. triple bonds

# 1.  C 4   0 3   0 2  0 1

# 2.  C 4   0 3   0 2  0 1

# 3.  C 4   0 3   0 2  0 1

      .

      .

      .

#30.   C 4   0 3   0 2  0 1

#31.   O 2   0 1   0 1  0 0

#32.   O 2   0 1   0 1  0 0

#33.   O 2   0 1   0 1  0 0

 

Note: When an atom has multiple valences, the most common valence is adopted by default.  For example, the valence 3 is always adopted for N.  However, the user can specify an uncommon valence while inputting the MF.  If there is a -NO2 group in the molecule, input an MF containing an “N(V)” (e.g., C6H5N(V)O2).  Modifying the valence manually in the .mdf file is not recommended, because whenever Analysis/Building Blocks is selected, the MF is re-interpreted and the previous changes are overwritten.
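The interpretation of a molecular formula can be illustrated with the following Python sketch, which parses the notation described in this section, computes the double bond equivalence, and lists the heavy atoms with carbon first and the remaining elements in alphabetical order.  The sketch is not the NMR-SAMS implementation.  It uses the standard textbook relation DBE = 1 + Σ n(v − 2)/2, where n is the number of atoms of an element and v is its valence; the default valence table covers only a few elements and is an assumption.

```python
import re

# Default valences for a few elements (assumed table; N = 3 and S = 2
# follow the defaults stated in this section).
DEFAULT_VALENCE = {"C": 4, "H": 1, "N": 3, "O": 2, "S": 2}
ROMAN = {"I": 1, "II": 2, "III": 3, "IV": 4, "V": 5, "VI": 6, "VII": 7}

# Element symbol, optional valence in parentheses, optional count.
TOKEN = re.compile(r"([A-Z][a-z]?)(?:\(([IVX]+)\))?(\d*)")


def parse_formula(formula):
    """Return a list of (element, valence, count) in input order."""
    groups, pos = [], 0
    while pos < len(formula):
        m = TOKEN.match(formula, pos)
        if m is None:
            raise ValueError(f"cannot parse formula at position {pos}: {formula!r}")
        symbol, roman, count = m.groups()
        # Elements missing from the small default table raise KeyError.
        valence = ROMAN[roman] if roman else DEFAULT_VALENCE[symbol]
        groups.append((symbol, valence, int(count) if count else 1))
        pos = m.end()
    return groups


def double_bond_equivalents(formula):
    """DBE = 1 + sum over elements of count * (valence - 2) / 2."""
    total = sum(count * (valence - 2) for _, valence, count in parse_formula(formula))
    return 1 + total / 2


def heavy_atoms(formula):
    """List (element, valence) per heavy atom: carbon first, then alphabetical."""
    groups = [g for g in parse_formula(formula) if g[0] != "H"]
    groups.sort(key=lambda g: (g[0] != "C", g[0]))
    return [(symbol, valence) for symbol, valence, count in groups for _ in range(count)]


print(double_bond_equivalents("C30H48O3"))
print(double_bond_equivalents("C6H5N(V)O2"))
print(len(heavy_atoms("C30H48O3")))
```

For C30H48O3, the sketch gives a DBE of 7.0 and 33 heavy atoms (30 carbons followed by 3 oxygens), in agreement with the ATOMS record shown above.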

4.5 Save A Working Data Set

Command: File/Save.

Description:  This command allows NMR-SAMS to update the working data set with the current state of structure elucidation.  The user will be prompted to save changes before exiting the program or opening another working data set.

4.6 Save A Working Data Set Under A Different Name

Command: File/Save As.

Description:  This command allows NMR-SAMS to save the current state of structure elucidation in a working data set with a different root name.  After selecting File/Save As, the following file browser is displayed.  Switch to the desired directory (if necessary), type the new root name, and then click OK.

 

4.7 Exiting NMR-SAMS

Command: File/Exit.

Description:  This command allows the user to exit NMR-SAMS.  If changes have been made to any of the three data files (*.nmr, *.mdf, or *.par), and those changes have not been saved, NMR-SAMS will prompt the user to save them before exiting the program:

 

 

If 'Yes' is clicked, the changes are saved before the program exits.  If 'No' is clicked, the changes are discarded and the program exits.  If 'Cancel' is selected, the Exit command is ignored.


Chapter 5

Input of NMR Spectral Data

5.1 Overview

It is important to generate a clean and reliable set of peak lists from different NMR experiments before using them in NMR-SAMS.  SpecMan provides several advanced and intelligent peak-picking tools to perform fast and reliable peak picking.  For details regarding peak picking, refer to the SpecMan User's Guide.  Since SpecMan can independently perform peak picking and peaks table conversion, the user can either perform both steps in SpecMan, or perform peak picking in SpecMan and then peaks table conversion in NMR-SAMS.  Either way, the ability to perform consistency checking during the conversion process will help the user to find potential errors in the peak picking results. 

 

This chapter describes how to prepare 1D and 2D NMR spectral data as input for NMR-SAMS (for details about the NMR Data File format, see Appendix I).  It is assumed that the peak picking has already been performed in SpecMan.  The peak tables from SpecMan are then converted into the NMR-SAMS format by selecting from the following pull-right options of 'Create NMR Data File' from the File menu as shown below:

5.2 Conversion of SpecMan 1H Peak List

Command: File/Create NMR Data File/H1.

Description: In this procedure, the SpecMan 1H peaks table is converted into the NMR-SAMS format.  First, the following dialog box is displayed, which prompts the user to enter the filename of the 1H peaks table from SpecMan.

 

Click 'Browse' to locate the peaks table file, and then click OK.  An information dialog box displays the number of 1H peaks that have been converted:

 

 

In the current version of SpecMan, all 1H peak multiplicities are marked as unknown (u), by default.  Therefore, NMR-SAMS will prompt the user to supply the 1H multiplicity for the peaks (referring to their splitting patterns). As shown in Fig. 5.1, if the multiplicities of all or some of the 1H peaks are known, select Edit/NMR Data File to open the NMR data file and replace the unknown multiplicity (represented as “u”) by one of the following symbols recognizable to NMR-SAMS:

 

s: singlet, d: doublet, t: triplet, q: quartet, m: other multiplet.  If the multiplicity is unknown, leave it as unknown (u).

 

NMR-SAMS uses 1H multiplicity information to eliminate inappropriate bonds while setting up ACMX. For additional details, refer to the usage of parameter H1_MULT_FLAG (in Appendix IV).

 

 

 

 

 

 

Figure 5.1. Running NMR-SAMS and SpecMan side-by-side provides a convenient way to verify and edit the 1D peaks converted from the SpecMan peaks table. Left (NMR-SAMS): select Edit/NMR Data File to open the .nmr file.  Right (SpecMan): Open the 1D spectrum and load the 1D peaks table. From the comment field of a converted peak, the ID (#32) of the original peak is found. By clicking the corresponding entry in the peaks table, the 1D peak (#32, shown in cyan) is highlighted in the spectrum so that the user can see and recognize the multiplicity of this peak before modifying the .nmr file.

Possible Errors: Generally NMR-SAMS crosschecks the converted 1H peak list against the MF (if known) and alerts the user of any potential conflicts.  The following situations will be reported when there is a conflict:

·         If the multiplicity information is unknown for more than three fourths of the peaks, a warning message prompts the user to supply this information if possible.

·         If the number of 1H peaks exceeds the constituent protons, an error message prompts the user to correct either the peak picking result or the MF.
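The two checks listed above can be summarized in a short Python sketch.  The function name and the message texts are hypothetical; only the two conditions are taken from this section.

```python
def check_h1_peaks(multiplicities, n_protons=None):
    """Return a list of (severity, message) for a converted 1H peak list.

    multiplicities: one symbol per peak ("s", "d", "t", "q", "m" or "u").
    n_protons: number of protons in the MF, or None if the MF is unknown.
    """
    messages = []
    total = len(multiplicities)
    unknown = sum(1 for m in multiplicities if m == "u")

    # Warning: multiplicity unknown for more than three fourths of the peaks.
    if total and unknown > 0.75 * total:
        messages.append(("warning", "supply 1H multiplicities if possible"))

    # Error: more 1H peaks than constituent protons.
    if n_protons is not None and total > n_protons:
        messages.append(("error", "correct the peak picking result or the MF"))

    return messages


print(check_h1_peaks(["s", "d", "u"], n_protons=48))            # no conflicts
print(check_h1_peaks(["u", "u", "u", "u", "s"], n_protons=3))   # warning and error
```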

 

Results:  After conversion, the .nmr file is updated with information regarding proton peaks starting with the keyword “H1:”.  Following is a transcript of the converted 1H peaks:

H1: C:\Spectrum2001\Data\NMR-SAMS\Q-2-test/h1.pks

 #1. 4.930 s   ;1

 #2. 4.755 s   ;2

 #3. 3.509 u   ;3

       .

       .

       .

 #32. 0.818 s   ;32

 #33. 0.811 u   ;33

The first line, beginning with the keyword “H1:”, indicates the start of the 1H peak list. Following the keyword and a blank space, a comment of up to 80 characters may be added. The entries in the remaining lines represent the following attributes of each 1H peak:

·         Peak ID, a serial number that uniquely identifies this peak.

·         Chemical shift of the peak in ppm values.

·         Multiplicity, designated as s (singlet), d (doublet), t (triplet), q (quartet), m (other multiplet) or u (unknown).  By default it is assigned as unknown. 

·         Comments, which are optional. The number in the comment field corresponds to the ID of the 1H peak in the SpecMan peaks table.

One or more spaces delimit all items except comments, which are introduced by a semicolon (;).   Items marked as optional can be omitted unless an item following them is included; in that case, the user must supply default values for the skipped items even if they are not used.  Comments can always be included as long as they follow a semicolon (;).  The peak intensities and comments of the 1H peak list are not currently used by NMR-SAMS.
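To make the line format concrete, the following Python sketch parses a single peak entry as it appears in the transcript above. It is a reading of the format described in this section, not code taken from NMR-SAMS; the function name parse_peak_line and the dictionary keys are hypothetical, and the sketch assumes that the ID, chemical shift, and multiplicity are always present.

```python
H1_SYMBOLS = {"s", "d", "t", "q", "m", "u"}


def parse_peak_line(line):
    """Parse one peak entry such as ' #32. 0.818 s   ;32' into a dict."""
    # Everything after the first semicolon is the optional comment.
    body, sep, comment = line.partition(";")
    fields = body.split()  # one or more spaces delimit the items
    if len(fields) < 3:
        raise ValueError(f"expected ID, shift and multiplicity: {line!r}")

    peak_id = int(fields[0].lstrip("#").rstrip("."))  # '#32.' -> 32
    shift_ppm = float(fields[1])
    multiplicity = fields[2]
    if multiplicity not in H1_SYMBOLS:
        raise ValueError(f"unknown multiplicity symbol: {multiplicity!r}")

    return {
        "id": peak_id,
        "shift_ppm": shift_ppm,
        "multiplicity": multiplicity,
        "comment": comment.strip() if sep else None,
    }
```

Calling parse_peak_line(" #32. 0.818 s   ;32") yields ID 32, a shift of 0.818 ppm, multiplicity "s", and the comment "32", which is the ID of the original peak in the SpecMan peaks table.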

Note: Whenever the user repeats a 1H peaks table conversion or modifies a converted peak list (using Edit/NMR Data File), the dependent 2D spectral data must also be reconverted.  For example, if a 1H peak is added to the converted 1H peak list, the user must reconvert the COSY, HMQC, HMBC, and NOESY data (if they had already been converted). Otherwise the added 1H peak will not be reflected in the 2D data.

5.3 Conversion of SpecMan 13C Peak List

Command: File/Create NMR Data File/C13 and DEPT.

Description: In this procedure the SpecMan 13C and DEPT/APT peak tables are converted into a peak list of 13C chemical shifts and multiplicities.  NMR-SAMS requires 13C multiplicity information for reliable structure elucidation, and complete 13C multiplicity information requires 13C, DEPT-90/APT-90 and DEPT-135/APT-135 experimental data.  However, NMR-SAMS can derive the 13C multiplicity information from any combination of the available experiments, as described below:

 

1.        13C Only. In the dialog box that appears, select ‘None’ for Peak Multiplicity Experiments and then click ‘Browse’ to find and select the SpecMan-created 13C Peaks Table, as shown below:

 

After clicking ‘OK’ NMR-SAMS updates the .nmr file with a list of 13C chemical shifts having unknown multiplicities as shown in the Results section below.  If the multiplicities of some peaks are known, the user can manually edit the .nmr file to supply this information.

2.        13C and DEPT.  In the dialog box that appears, click ‘Browse’ to enter the SpecMan-created 13C Peaks Table.  Then select ‘DEPT’ for Peak Multiplicity Experiments, and enter the peaks table filenames for DEPT-45 (optional), DEPT-90, and DEPT-135 experiments.  As mentioned before, all of the DEPT experiments are optional, so turn off the corresponding toggle if certain DEPT data has not been obtained.  Note that ignoring some DEPT experiments (except for DEPT-45) could leave some peaks with unknown multiplicities.

 

 

Also enter a matching tolerance (in ppm) to match the 13C and DEPT peaks.  Upon clicking ‘OK’, NMR-SAMS will update the .nmr file with a list of 13C chemical shifts and derived multiplicities as shown in the Results section below. 
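The idea of deriving multiplicities by matching 13C and DEPT peaks within a tolerance can be illustrated with a short Python sketch. It relies on the standard DEPT phase behavior (DEPT-90 shows CH only; DEPT-135 shows CH and CH3 positive and CH2 negative; carbons without attached protons are absent) and is not the actual NMR-SAMS algorithm. The function name, the argument layout, and the assumption that each DEPT-135 peak carries a sign are all hypothetical.

```python
def derive_multiplicities(c13, tol, dept90=None, dept135=None):
    """Assign s/d/t/q/u to each 13C shift (ppm).

    c13     : list of 13C shifts
    tol     : matching tolerance in ppm
    dept90  : list of DEPT-90 shifts, or None if not acquired
    dept135 : list of (shift, sign) pairs with sign +1 or -1, or None
    """
    result = []
    for shift in c13:
        # Is there a DEPT-90 peak within the tolerance? (None = no data)
        in90 = None
        if dept90 is not None:
            in90 = any(abs(shift - s) <= tol for s in dept90)

        # Sign of the matching DEPT-135 peak; 0 = absent, None = no data.
        sign135 = None
        if dept135 is not None:
            sign135 = 0
            for s, sign in dept135:
                if abs(shift - s) <= tol:
                    sign135 = sign
                    break

        if sign135 is None:
            mult = "d" if in90 else "u"      # only DEPT-90 (or nothing) available
        elif sign135 < 0:
            mult = "t"                       # negative in DEPT-135: CH2
        elif sign135 > 0:
            if in90 is None:
                mult = "u"                   # CH or CH3, cannot tell without DEPT-90
            else:
                mult = "d" if in90 else "q"  # CH if also in DEPT-90, else CH3
        else:
            mult = "d" if in90 else "s"      # absent from DEPT-135: no attached H
        result.append(mult)
    return result
```

With both experiments supplied, every peak receives a definite multiplicity. If the DEPT-90 table is omitted, the positive DEPT-135 peaks remain unknown (u), which is the situation the note above warns about.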

 

3.        13C and APT.  In the dialog box that appears, click ‘Browse’ to enter the SpecMan-created 13C Peaks Table.  Select ‘APT’ for Peak Multiplicity Experiments and then enter the peaks table filenames for APT-45, APT-90, and APT-135 experiments. As mentioned before, all of the APT experiments are optional, so turn off the corresponding toggle if certain APT data has not been obtained.  Note that ignoring some APT experiments (except for APT-45) could leave some peaks with unknown multiplicities.

 

 

Also enter a matching tolerance (in ppm) to match the 13C and APT peaks.  Upon clicking ‘OK’, NMR-SAMS will update the .nmr file with a list of 13C chemical shifts and derived multiplicities as shown in the Results section below.

 

Possible Errors: During the conversion NMR-SAMS crosschecks the 13C peak list against the MF and alerts the user to potential inconsistencies.  The following situations are reported:

 

·         If there are more 13C peaks than constituent carbon atoms, an error message will prompt the user to remove peak artifacts or correct the MF.

·         If there are fewer 13C peaks than constituent carbon atoms, a warning message will prompt the user to resolve 13C peak overlap.  To do so, choose Edit/NMR Data File and define the overlapping peaks as individual peaks with slightly different chemical shifts (it is usually possible to resolve such ambiguities by looking at the peak intensity and the HMQC spectrum, or by acquiring the spectrum under different conditions).  If the user is unable to resolve overlapping peaks (for example, in the case of a symmetric molecule, or due to severe overlap in a spectrum), then partial structure elucidation will be performed (see Section 7.1).

·         If the multiplicity of one or more 13C peaks is unknown, a warning message will prompt the user to supply this information, if possible.  Lack of this information may result in multiple building block sets (see Section 6.2).

·         The number of carbon-attached protons (n_CH) is calculated based on the 13C multiplicities.  If n_CH is greater than the number of constituent protons, an error message will prompt the user to correct either the multiplicity information or the MF.

·         When the number of 13C peaks is equal to that of the carbon atoms and all 13C multiplicities are known, the maximum number of heteroatom-attached protons (max_XH) is calculated based on the valence of the constituent heteroatoms.  If (n_CH + max_XH) is smaller than the number of constituent protons, an error message will prompt the user to correct either the multiplicity information or the MF.
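The consistency checks in the list above can be sketched as follows. The proton counts per multiplicity (s = 0, d = 1, t = 2, q = 3) follow from the multiplicity definitions in this section. The way max_XH is obtained here (valence minus one for each heteroatom) is only an assumption made for illustration, since this guide does not give the exact formula; the valence table, function name, and message texts are likewise hypothetical.

```python
# Protons attached to a carbon of each multiplicity; "u" contributes 0 here.
H_PER_MULT = {"s": 0, "d": 1, "t": 2, "q": 3}

# Hypothetical valence table used only for this illustration.
VALENCE = {"N": 3, "O": 2, "S": 2, "F": 1, "Cl": 1, "Br": 1, "I": 1}


def check_c13_peaks(multiplicities, formula):
    """Return (warnings, errors) for 13C multiplicities against the MF.

    formula maps element symbols to counts, e.g. {"C": 3, "H": 8, "O": 1}.
    """
    warnings, errors = [], []
    n_c = formula.get("C", 0)
    n_h = formula.get("H", 0)
    n_peaks = len(multiplicities)

    if n_peaks > n_c:
        errors.append("more 13C peaks than carbon atoms")
    elif n_peaks < n_c:
        warnings.append("fewer 13C peaks than carbon atoms (overlap?)")

    if "u" in multiplicities:
        warnings.append("some 13C multiplicities are unknown")

    # n_CH: number of carbon-attached protons implied by the multiplicities.
    n_ch = sum(H_PER_MULT.get(m, 0) for m in multiplicities)
    if n_ch > n_h:
        errors.append("n_CH exceeds the number of protons in the MF")

    # max_XH is evaluated only when the peak list is complete and fully known.
    if n_peaks == n_c and "u" not in multiplicities:
        # Assumption: each heteroatom can carry at most (valence - 1) protons.
        max_xh = sum(
            count * (VALENCE[element] - 1)
            for element, count in formula.items()
            if element not in ("C", "H")
        )
        if n_ch + max_xh < n_h:
            errors.append("n_CH + max_XH is smaller than the number of protons")

    return warnings, errors
```

For a formula of C3H8O with multiplicities q, t, t, the sketch finds n_CH = 7 and max_XH = 1, which together account for all eight protons, so nothing is reported.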

 

Results: After conversion, the .nmr file is updated with the 13C peak information, which starts with the keyword “C13:”.  The following is a transcript of a converted 13C peak list (note that if DEPT or APT data is not used, the multiplicity will be unknown “u” for all peaks):

 

C13: C:\Spectrum2001\Data\NMR-SAMS\Q-2-test\c13.pks

 #1. 178.822 s ;1

 #2. 151.323 s ;2

 #3. 109.931 t ;3

       .

       .

       .

 #28. 16.340 q ;28

 #29. 14.929 q ;29

The first line, beginning with the keyword “C13:”, indicates the start of the 13C peak list.  Following the keyword and a blank space, a comment of up to 80 characters may be added. The entries in the remaining lines represent the following attributes of each 13C peak:

·         Peak ID, a serial number that uniquely identifies this peak.

·         Chemical shift of the peak in ppm values.

·         Multiplicity, designated as s (singlet, C), d (doublet, CH), t (triplet, CH2), q (quartet, CH3), or u (unknown).

·         Comments, which are optional. The number in the comment field corresponds to the ID of the 13C peak in the SpecMan peaks table.

One or more spaces delimit all items except comments, which are introduced by a semicolon (;).   Items marked as optional can be omitted unless an item following them is included; in that case, the user must supply default values for the skipped items even if they are not used.  Comments can always be included as long as they follow a semicolon (;).  The peak intensities and comments of the 13C peak list are not currently used by NMR-SAMS.

Note: Whenever the user repeats a 13C peaks table conversion or modifies a converted peak list (using Edit/NMR Data File), the dependent 2D spectral data must also be reconverted.  For example, if a 13C peak is added to the converted 13C peak list, the user must reconvert the HMQC, HMBC, and INADEQUATE data (if they had already been converted). Otherwise the added 13C peak will not be reflected in the 2D data.
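The reconversion rule in the notes of Sections 5.2 and 5.3 amounts to a small dependency table. The Python sketch below records only the experiments named in those two notes; the table name, the function name, and the use of the keywords "H1" and "C13" as keys are illustrative choices, not part of NMR-SAMS.

```python
# 2D data sets that depend on each 1D peak list, as named in the notes.
DEPENDENT_2D = {
    "H1": ["COSY", "HMQC", "HMBC", "NOESY"],
    "C13": ["HMQC", "HMBC", "INADEQUATE"],
}


def experiments_to_reconvert(changed_lists, already_converted):
    """List the 2D data sets to reconvert after 1D peak lists change.

    changed_lists     : 1D lists that were reconverted or edited, e.g. ["C13"]
    already_converted : set of 2D data sets currently in the .nmr file
    """
    needed = []
    for name in changed_lists:
        for experiment in DEPENDENT_2D[name]:
            # Only data that had been converted needs to be redone.
            if experiment in already_converted and experiment not in needed:
                needed.append(experiment)
    return needed
```

For instance, after editing the 13C peak list in a file that already contains COSY, HMQC, and HMBC data, only HMQC and HMBC need to be reconverted.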

As shown in Fig. 5.1, NMR-SAMS and SpecMan can be used side-by-side to verify the peak picking results of peaks mentioned in warning or error dialog boxes.

5.4 Conversion of SpecMan DQF-COSY Peaks Table

Command: File/Create NMR Data File/COSY.

Description:  In this procedure NMR-SAMS converts the DQF-COSY cross peak coordinates into connectivities between 1D 1H peaks.  As illustrated in Fig. 5.2, the coordinates of the peak center (shown as a cross) are matched to the 1D chemical shifts (shown as dotted lines).  The 1D peaks that match the peak center within the tolerances (±D2 and ±D1 in F2 and F1 dimensions, respectively) are taken as the correlated 1D peaks.  If more than one 1D peak (such as 1H peaks a and b in Fig. 5.2) matches the cross peak center in a certain dimension, then all are treated as possible correlated 1D peaks in that dimension. Such connectivity is called an ambiguous connectivity and NMR-SAMS will internally consider all possible correlations for an ambiguous connectivity (for more details about ambiguous connectivity, see the example in Section 3.4).

Figure 5.2. Conversion of COSY cross peak coordinates into a correlation between the 1D 1H peaks.  The cross (+) denotes the cross peak center.  The dotted lines denote the chemical shifts of the three 1D 1H peaks, a, b, and c, respectively.  D1 and D2 are the matching tolerances along F1 and F2, respectively.  All three peaks, which match the cross peak center within the tolerances, are taken as correlated 1D peaks.
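The matching step illustrated in Fig. 5.2 can be expressed as a short Python sketch. It shows only the tolerance test described above and is not the NMR-SAMS implementation; the function name and data layout are hypothetical, and the 0.005 ppm defaults are the dialog defaults stated for this command.

```python
from itertools import product


def match_cross_peak(f2, f1, h1_peaks, tol_f2=0.005, tol_f1=0.005):
    """Match a COSY cross peak center (f2, f1) to 1D 1H peaks.

    h1_peaks maps peak IDs to chemical shifts in ppm.
    Returns (pairs, ambiguous), where pairs lists every possible
    (F2 peak ID, F1 peak ID) correlation.
    """
    # 1D peaks whose shift lies within the tolerance in each dimension.
    cand_f2 = [pid for pid, s in h1_peaks.items() if abs(s - f2) <= tol_f2]
    cand_f1 = [pid for pid, s in h1_peaks.items() if abs(s - f1) <= tol_f1]

    # Every combination is kept as a possible correlation.
    pairs = list(product(cand_f2, cand_f1))

    # More than one candidate in either dimension makes it ambiguous.
    ambiguous = len(cand_f2) > 1 or len(cand_f1) > 1
    return pairs, ambiguous
```

With 1D peaks at 4.930, 4.926, and 3.509 ppm and a cross peak center at (4.928, 3.510), the first two peaks both match in F2, so the result is an ambiguous connectivity with two possible correlations. Reducing the F2 tolerance to 0.001 ppm leaves no match at all, which illustrates how an overly small tolerance can cause a real cross peak to be ignored.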

Upon selecting File/Create NMR Data File/COSY, NMR-SAMS opens a dialog box that prompts the user to enter the filename of the COSY peaks table.  The user is also prompted to input matching tolerances along the X (i.e. F2) and Y (i.e. F1) dimensions.

 

 

The default value for the matching tolerance is 0.005 ppm for both dimensions.  It is important to select an appropriate tolerance: too large a tolerance can introduce undesired ambiguity, and too small a tolerance can cause some real peaks to be missed.  To choose a suitable tolerance, the following factors must be considered:

 

·         Accuracy of the peak picking.  The grid-intelligence-based peak picking of SpecMan provides a very convenient way to verify the accuracy of peak picking by comparing the expected locations of the cross peaks with the picked peaks (see the SpecMan User’s Guide). If a peak list was carefully verified with this method, it is acceptable to start with a small tolerance.

·         Alignment between 1D 1H and the COSY spectra.  SpecMan provides convenient tools to correct frequency offset between the 1D and 2D spectra.  Sometimes different experimental conditions introduce small chemical shift differences between 1D and 2D resonances.  To further correct the differences due to sample conditions, the user can utilize the grid-intelligence-based peak picking method of SpecMan.  If these corrections have been applied, it is acceptable to start with a small tolerance.

 

Possible Errors: