Rgroup Query Tutorial
Introduction
JChemPaint allows you to draw, save and retrieve "Rgroup queries".
Rgroup queries are defined by the Symyx RGfile format; more details can be found
in the "CTfile Formats" manual, downloadable from www.symyx.com.
In essence, an Rgroup query allows you to define multiple molecules in one data structure
by combining different substituent attachments with one root structure (or scaffold).
A JChemPaint screenshot may help to clarify this:
This (simple) example contains a root structure with one Rgroup labeled "R1" in pink.
For R1, three substituents are defined and drawn under the root structure. Three possible molecules
can thus be derived by substituting R1 with one of these.
Each substituent is attached to the root by replacing R1 with the substituent atom marked with an asterisk.
That atom connects to the root through the root bond that is also marked with an asterisk.
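To make the "one root, several substituents, several molecules" idea concrete, here is a minimal sketch in Python. It assumes a hypothetical toy model in which the root is a text template with sites written "[R1]", "[R2]", ... and each Rgroup number maps to placeholder substituent names. It is not a JChemPaint or CDK data structure and ignores attachment points, occurrence and the other advanced logic; it only shows that every combination of one substituent per Rgroup yields one derived molecule.

```python
from itertools import product

# Hypothetical toy model of an Rgroup query: the root is a text template whose
# Rgroup sites are written "[R1]", "[R2]", ..., and each Rgroup number maps to
# the names of its substituents. These are placeholders, not chemistry and not
# a JChemPaint/CDK data structure.
root = "scaffold-[R1]"
substituents = {1: ["subst-A", "subst-B", "subst-C"]}


def enumerate_configurations(root, substituents):
    """Yield one derived structure per combination of substituents."""
    numbers = sorted(substituents)
    # product() picks one substituent for every Rgroup at the same time.
    for choice in product(*(substituents[n] for n in numbers)):
        config = root
        for n, sub in zip(numbers, choice):
            # Every "[Rn]" site of Rgroup n receives the chosen substituent.
            config = config.replace(f"[R{n}]", sub)
        yield config


configs = list(enumerate_configurations(root, substituents))
print(len(configs))  # 3
print(configs[0])    # scaffold-subst-A
```

With one R1 site and three substituents this gives the three molecules of the example. Adding a second Rgroup with, say, two substituents would multiply the count to six.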
Using the "R-groups" menu, you can let JChemPaint generate the possible configurations for any Rgroup query
and save these to an SDF file.
For the example above, this would result in the following configurations (Bioclipse screenshot):
Drawing R-groups
To draw R-groups, the following JChemPaint menus are relevant:
- "R-groups" (from the main menu bar or by right clicking the canvas)
- "R-group attachment" in the bond popup menu (by right clicking a bond)
- "R-group attachment" in the atom popup menu (by right clicking an atom)
This section will explain how to create Rgroups using the aforementioned menus.
Define the root
The suggested way to construct an Rgroup query is to first draw all the structures you need and then assign Rgroup roles
to them. Going back to the earlier example, you would first draw the four structures that make up the query. The R1 atom is a so-called
"pseudo atom". To draw it, first draw a normal atom (say, carbon), then right click it and pick Pseudo Atoms->R1 from the
atom popup menu.
Next, use a selection tool to select the entire structure that is intended to become the root structure. With the selection made,
right click the canvas and pick R-groups->Define as Root Structure. See the picture below for an illustration.
Define the substituents
Once a root structure has been chosen, the remaining structures in the drawing are subsequently flagged as "Not in R-Group".
To continue, select each structure that is to become a substituent, and choose R-groups->Define as Substituent. JChemPaint
will then prompt you to "Enter an R-group number". In our example, we'll enter 1. In general, you can pick any number
that corresponds to an R1...R32 atom in your root structure. In this way, you link the substituent to the specific Rgroup
of your choice.
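The rule behind the Rgroup number prompt can be sketched as a simple check. The helpers below (`rgroup_numbers_in_root`, `check_substituent_number`) are hypothetical and not part of JChemPaint or CDK; they assume the root is described by a list of atom labels in which Rgroup pseudo atoms read "R1" to "R32". Whether JChemPaint itself rejects a number with no matching R atom is not stated in this tutorial.

```python
import re

# Hypothetical helpers (not part of JChemPaint or CDK) that mirror the rule in
# the text: a substituent's Rgroup number must be in 1..32 and must match an
# R atom (pseudo atom labelled "R<n>") drawn in the root structure.


def rgroup_numbers_in_root(root_atom_labels):
    """Return the set of Rgroup numbers found among the root's atom labels."""
    numbers = set()
    for label in root_atom_labels:
        match = re.fullmatch(r"R(\d+)", label)
        if match and 1 <= int(match.group(1)) <= 32:
            numbers.add(int(match.group(1)))
    return numbers


def check_substituent_number(n, root_atom_labels):
    """Return n if it links to an R atom in the root, else raise ValueError."""
    if not 1 <= n <= 32:
        raise ValueError(f"Rgroup number {n} is outside R1..R32")
    if n not in rgroup_numbers_in_root(root_atom_labels):
        raise ValueError(f"the root structure has no R{n} atom")
    return n


root_labels = ["C", "C", "N", "R1", "C", "C"]
print(check_substituent_number(1, root_labels))  # 1
```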
JChemPaint will randomly pick a "connection point" atom on a newly declared substituent and flag this atom with an
asterisk. If your Rgroup atom has more than one bond connecting it to the root, JChemPaint will also pick a second
attachment point and flag it with a double quote. The asterisk and double quote on the substituent correspond to the
similarly flagged bonds on the root. The RGfile format allows at most two attachment points (bonds) per Rgroup.
You are allowed to change the attachment atom(s) in the substituents and the attachment bond(s) in the root. For example,
to make another atom in a substituent the attachment point, select it and right click it. From the atom popup menu, select
R-Group attachment->Set as first attachment point (or Set as second attachment point).
Similarly, you can pick a bond in the root structure that connects to an R-atom and right click it to define its attachment
using the bond popup menu.
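The attachment-point bookkeeping described above can be modelled in a few lines. This is an illustrative sketch only: the `Substituent` class and its `set_attachment` method are hypothetical, not CDK classes. It assumes the markers used in the drawing ("*" for the first attachment point, a double quote for the second) and enforces the RGfile limit of two.

```python
from dataclasses import dataclass, field

# Markers from the tutorial: "*" flags the first attachment point and a double
# quote flags the second. RGfile allows no more than these two per Rgroup.
MARKERS = {1: "*", 2: '"'}


@dataclass
class Substituent:
    """Illustrative substituent model (hypothetical, not a CDK class)."""
    rgroup: int
    atoms: list
    attachments: dict = field(default_factory=dict)  # order (1 or 2) -> atom index

    def set_attachment(self, atom_index, order):
        """Make atom_index the first (order=1) or second (order=2) attachment point.

        Re-assigning an order moves its marker, like choosing "Set as first
        attachment point" on another atom.
        """
        if order not in MARKERS:
            raise ValueError("only a first and a second attachment point exist")
        if not 0 <= atom_index < len(self.atoms):
            raise IndexError(f"no atom with index {atom_index}")
        self.attachments[order] = atom_index

    def marked_atoms(self):
        """Return the atom labels with their attachment markers appended."""
        labels = list(self.atoms)
        for order, index in sorted(self.attachments.items()):
            labels[index] += MARKERS[order]
        return labels


sub = Substituent(rgroup=1, atoms=["C", "O", "C"])
sub.set_attachment(0, 1)   # initial first attachment point
sub.set_attachment(2, 1)   # moved to the last atom
sub.set_attachment(0, 2)   # second attachment point
print(sub.marked_atoms())  # ['C"', 'O', 'C*']
```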
Advanced Rgroup logic
With the Rgroup query in place, the RGfile format allows you to define some more advanced logic on top of it.
With JChemPaint this can be done using menu option R-groups->Advanced R-group logic.
For each group you can indicate:
- Occurrence: the occurrence value determines the number of times that a member of that Rgroup must appear in the retrieved
structures. It can be relevant for more elaborate queries, for instance when you have multiple pseudo atoms
labeled R1 in your root structure and want to specify the substitution occurrence.
Allowed values are:
  - n: exactly n
  - n - m: n through m
  - > n: more than n
  - < n: fewer than n
The default is ">0". Any non-contradictory combination of the preceding values is also allowed; for example: "1,3-7,9,>11".
- RestH: with RestH set to true, only hydrogen atoms are permitted at the unsatisfied Rgroup sites of retrieved structures. If RestH is set to false (the default),
unsatisfied Rgroup sites can be filled by hydrogen atoms or other molecule fragments.
This becomes relevant when you allow the occurrence for a certain group to be lower than the total number of R atoms you have drawn for that group.
- If..Then: you can define dependencies between Rgroups. If a root structure contains, for example, R2 and R10, you can make the condition
IF R2 THEN R10 (or the other way around). So if a substituent for R2 occurs in a retrieved structure, then it must
also have a substituent for R10.
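The three settings can be read together as a test on a candidate structure. The sketch below is a hypothetical model, not how JChemPaint or CDK evaluates the logic: a candidate is described, per Rgroup, by what sits at each of its R sites ("H" standing for hydrogen), `parse_occurrence` turns an occurrence string into a test on the number of sites that carry a member of that Rgroup, RestH forbids anything but "H" at the remaining sites, and each If..Then pair requires the second Rgroup whenever the first one occurs.

```python
import re


def parse_occurrence(spec):
    """Turn an occurrence string such as "1,3-7,9,>11" into a test on counts.

    Supports the terms n, n-m, >n and <n; contradictions are not detected.
    """
    tests = []
    for term in spec.replace(" ", "").split(","):
        if m := re.fullmatch(r"(\d+)-(\d+)", term):
            lo, hi = int(m[1]), int(m[2])
            tests.append(lambda n, lo=lo, hi=hi: lo <= n <= hi)
        elif m := re.fullmatch(r"([<>])(\d+)", term):
            k = int(m[2])
            if m[1] == ">":
                tests.append(lambda n, k=k: n > k)
            else:
                tests.append(lambda n, k=k: n < k)
        elif re.fullmatch(r"\d+", term):
            k = int(term)
            tests.append(lambda n, k=k: n == k)
        else:
            raise ValueError(f"unrecognised occurrence term: {term!r}")
    return lambda n: any(test(n) for test in tests)


def satisfies_logic(config, members, occurrence, rest_h, if_then):
    """Check one candidate structure against the advanced Rgroup logic.

    config:     Rgroup number -> what sits at each of its R sites
    members:    Rgroup number -> set of substituents defined for that Rgroup
    occurrence: Rgroup number -> occurrence string (default ">0")
    rest_h:     Rgroup number -> RestH flag (default False)
    if_then:    list of (if_group, then_group) pairs
    """
    counts = {}
    for r, occupants in config.items():
        hits = [o for o in occupants if o in members[r]]
        counts[r] = len(hits)
        if not parse_occurrence(occurrence.get(r, ">0"))(len(hits)):
            return False
        # Sites not filled by a member of the Rgroup are the unsatisfied sites.
        leftovers = [o for o in occupants if o not in members[r]]
        if rest_h.get(r, False) and any(o != "H" for o in leftovers):
            return False
    # IF a THEN b: whenever Rgroup a occurs, Rgroup b must occur too.
    return all(counts.get(b, 0) > 0 for a, b in if_then if counts.get(a, 0) > 0)


members = {1: {"OH", "NH2"}}
occurrence = {1: "1"}  # exactly one of the two R1 sites carries a substituent
print(satisfies_logic({1: ["OH", "H"]}, members, occurrence, {1: True}, []))    # True
print(satisfies_logic({1: ["OH", "CH3"]}, members, occurrence, {1: True}, []))  # False
print(satisfies_logic({1: ["OH", "CH3"]}, members, occurrence, {1: False}, [])) # True
```

Note that with the default occurrence ">0" every Rgroup must appear anyway, so an If..Then condition only changes the outcome when the occurrence of the dependent Rgroup allows zero.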
Saving Rgroups
When you save the drawing, choose the "MDL MOL file" format as the file type. There is no exclusive file extension reserved for RGfiles.
JChemPaint will ask whether you want to retain the Rgroup information. Choose Yes; otherwise, the drawing will be saved as a regular
MOL file instead of an Rgroup query (extended) MOL file.
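Because the file extension does not tell an RGfile from a regular MOL file, you can only tell them apart by content. The function below, `describe_saved_file`, is a hypothetical heuristic, not a validating parser. It assumes the conventions of the CTfile Formats manual rather than anything specific to JChemPaint: an RGfile starts with a "$MDL" header line, and Rgroup atoms in a V2000 connection table are listed on "M  RGP" property lines.

```python
def describe_saved_file(text):
    """Heuristically report whether MOL-file text still carries Rgroup data.

    Assumptions (from the CTfile conventions, not from JChemPaint itself):
    an RGfile starts with a "$MDL" header line, and Rgroup atoms in a V2000
    connection table are listed on "M  RGP" property lines.
    """
    lines = text.splitlines()
    if lines and lines[0].startswith("$MDL"):
        return "Rgroup query (RGfile)"
    if any(line.startswith("M  RGP") for line in lines):
        return "MOL file with Rgroup labels"
    return "regular MOL file"


# Abbreviated samples, not complete files.
print(describe_saved_file("$MDL  REV  1\n$MOL\n$HDR\n"))  # Rgroup query (RGfile)
print(describe_saved_file("benzene\n\n\nM  END\n"))       # regular MOL file
```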
Generating configurations
To confirm that your Rgroup query is specified correctly, you can let JChemPaint/CDK generate all its possible configurations into an SDfile.
This can be done with menu option R-groups->Generate possible configurations. You can use a tool like Bioclipse to browse the content of the
SDfile.
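If you only want a quick count of the generated configurations without opening a viewer, a small script can split the SDfile into records. This is a minimal sketch, assuming only the standard SDfile layout: each record is a molfile (whose first line is the molecule name) plus optional data items, terminated by a line reading "$$$$". The titles JChemPaint writes are not described here, so the number of records is the more useful check.

```python
def sdfile_titles(text):
    """Return the title line of every record in SDfile text.

    Each record is a molfile (whose first line is the molecule name) plus
    optional data items, and is terminated by a line reading "$$$$".
    """
    titles, record = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            if record:
                titles.append(record[0].strip())
            record = []
        else:
            record.append(line)
    if any(line.strip() for line in record):  # tolerate a missing final "$$$$"
        titles.append(record[0].strip())
    return titles


# Abbreviated sample: real records carry full connection tables.
sample = "config 1\n\n\nM  END\n$$$$\nconfig 2\n\n\nM  END\n$$$$\n"
print(sdfile_titles(sample))  # ['config 1', 'config 2']
```

For the three-substituent example in the introduction, reading the generated file (for instance with `open("configurations.sdf").read()`, a hypothetical file name) should give three records.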