cheminformatics

CHEMINFORMATICS:
LIBRARY & COMPOUND SELECTIONS
V. Feher & Y. Su
NBCR CADD Workshop Aug. 2, 2012
Top 10 Best Practices:
Target-Focused Screening Libraries
1. Get familiar with your binding site
2. Scour the literature for known substrates, transition state (analogs), small molecule binders, natural products
3. Know your compound sources - NCI, Commercial, Natural Product, PubChem
4. Pre-filter your compound sources – REOS
5. Filter Known Properties – Flag PK Properties
6. Use multiple VS Methods - 2D & 3D similarity searches, docking and pharmacophore
7. Keep chemists in mind – diverse, not overly adorned scaffolds and SAR
8. Characterize hits with MS (NMR) – did you assay what you thought you bought
9. Reorder, retest and secondary assay
10. Analog searching for follow-up
What is Cheminformatics?
Cheminformatics is the process of amassing information about
small molecules and using this information to make “better
decisions faster in the area of drug lead identification and
optimization.” 1,2
Experimentally derived properties
•  solubility
•  microsome stability
•  cell permeability
•  toxicity
Chemistry knowledge
•  tautomers
•  ionizable groups
•  chemical stability
Physical & Calculated properties
•  MW
•  # rotatable bonds
•  octanol/water partition coefficient
1 Brown, F.K. (1998). "Chapter 35. Chemoinformatics: What is it and How does it Impact Drug Discovery". Annual Reports in Medicinal Chemistry 33: 375.
2 Brown, Frank (2005). "Editorial Opinion: Chemoinformatics – a ten year update". Current Opinion in Drug Discovery & Development 8 (3): 296–302.
Figure = NBCR Pipeline; http://www2.nbcr.net/wordpress2/?page_id=1175
How does Cheminformatics fit in with the CADD
Pipeline?
Wherever we are dealing with small molecules:
Ligand Libraries
Filtering: REOS
Clustering, Analysis, Selecting
FILTERING
Why is filtering important?
1. Why dock & analyze compounds you’ll never test?
2. PAINS = “Pan-Assay Interference Compounds”
Problematic scaffolds – have cost their Institute time and $$
See Table 11 in this reference. It has the functional groups that they have trouble with & see in the literature.
Baell & Holloway “New substructure filters for removal of pan assay interference compounds (PAINS) from screening libraries and for their exclusion in bioassays” J. Med. Chem. (2010) 53, 2719-2740
Figure 2 (from the reference, p. 2723). Problematic cul de sac compounds that have incurred wasted resources through being followed up to varying degrees at our Institute. We have found chromones such as 5 to be highly susceptible to nucleophilic attack at the 2-position, while β-amino sulfones (and ketones) such as 2 readily form reactive retro Michael alkenes. Compounds 6-9 are also susceptible to attack by biologically relevant nucleophiles. The other compounds are problematic for reasons that are either discussed in the text or remain unknown.
Excerpt from the reference: “Hence, the percentages here are 13% and 9.6% for vendors A and B, respectively, compared with 4% and 6% for vendors C and D. The latter vendors describe their approach to library design as being more tailor-made than combinatorial, suggesting a link with problematic compounds and facile chemistry.”
The Compound Library to Docking Pipeline
1. Rapid Elimination of Swill (REOS) 1,2
LIBRARIES -> PROCESSING FILTER -> FILTER -> FILTER -> PROPERTY CALCS FILTER -> PK CALCS FILTER -> YOUR LIBRARY (2D) -> More Docking Preparation
Filtering Tools:
OpenEye FILTER
Accelrys Pipeline Pilot
CCG MOE “wash” & db tools
Schrodinger Canvas/Qikprop
1 Walters, WP, Murcko, MA “Prediction of ‘drug-likeness’” Adv. Drug Delivery Rev. (2002) 54, 255 – 271
2 Walters, WP, Stahl, MT, Murcko, MA “Virtual screening – an overview” Drug Discovery Today (1998) 3, 160 – 178
The Compound Library to Docking Pipeline
2. 2D Library to 3D Docking Library
2D library -> 2D to 3D conversion, valence check, tautomers, ionization, stereoisomers, docking format -> 3D docking library
Docking Prep Tools:
Open Eye OMEGA
Accelrys Pipeline Pilot
CCG MOE db tools
Schrodinger LigPrep
MGL Tools
Basic “Washing”
•  Removing Salts & Unwanted Elements
•  Filter out cationic atoms: Ca2+, Na+, etc
•  Filter out metals: Sc,Ti,V,Cr,Mn,Fe,Co,Ni,Cu,Zn,Y,Zr,Nb,Mo,Tc,Ru,Rh,Pd,Ag,Cd
•  Often the salt “filter” = keeping the largest molecule in the sdf entry
•  ALLOWED_ELEMENTS H, C, N, O, F, P, S, Cl, Br, I
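The washing rules above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not how FILTER itself works: the input is taken to be a SMILES string in which "." separates the salt components, "largest" is taken to mean "most heavy atoms", and the helper names `atoms` and `wash` are hypothetical.

```python
import re

# Allowed elements, as listed on the slide.
ALLOWED_ELEMENTS = {"H", "C", "N", "O", "F", "P", "S", "Cl", "Br", "I"}

# One SMILES atom: a bracket atom such as [Na+] or [nH], a two-letter halogen,
# a one-letter organic-subset atom, or a lowercase aromatic atom.
ATOM_RE = re.compile(r"\[([^\]]+)\]|(Cl|Br)|([BCNOPSFI])|([bcnops])")
# Inside brackets: optional isotope digits, then the element symbol.
BRACKET_RE = re.compile(r"^\d*([A-Z][a-z]?|[a-z]{1,2})")


def atoms(fragment):
    """Return the element symbols of the atoms written in one SMILES fragment."""
    symbols = []
    for match in ATOM_RE.finditer(fragment):
        if match.group(1) is not None:
            inner = BRACKET_RE.match(match.group(1))
            if inner is None:
                raise ValueError(f"cannot parse bracket atom [{match.group(1)}]")
            token = inner.group(1)
        else:
            token = match.group(2) or match.group(3) or match.group(4)
        symbols.append(token.capitalize())  # aromatic 'c' counts as 'C'
    return symbols


def wash(smiles):
    """Keep the largest fragment; return None if it has a disallowed element."""
    fragments = smiles.split(".")  # assumption: '.' only separates components
    largest = max(fragments, key=lambda f: sum(s != "H" for s in atoms(f)))
    if any(symbol not in ALLOWED_ELEMENTS for symbol in atoms(largest)):
        return None
    return largest


print(wash("CC(=O)[O-].[Na+]"))       # sodium acetate: the acetate is kept
print(wash("C[Si](C)(C)c1ccccc1"))    # contains Si, so the entry fails
```

A real toolkit would work on the parsed molecule rather than on the SMILES text, and would also handle valence and charge checks, which this sketch does not attempt.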
Basic “Washing”
•  Proper Atom Types
•  Filter adds hydrogens and checks if O, N, C valences make sense – sometimes sdf files have corrupt entries
•  Checks formal charge
•  If it doesn’t make sense – it “fails” the compound
•  Ionization
•  Filter uses a rule-based method to add Hs & charge to particular groups for property calculations that occur later; it assumes pH = 7.4
Filter out: Reactives
Covalent Inhibitors
Reactive to protein functional groups
Rishton, G.M. “Nonleadlikeness and leadlikeness in biochemical screening” Drug Discovery Today (2003) 8, 86-96
Open Eye Filter Manual – has many examples
Filter out: Synthesis Intermediates, Chelators, Other Unwanted Groups
Blocking groups
Phosphates
Chelators – bind metals
Rishton, G.M. “Nonleadlikeness and leadlikeness in biochemical screening” Drug Discovery Today (2003) 8, 86-96
Open Eye Filter Manual – has many examples
Filter out: Dyes
These may give false positives if you are using a photometric assay. Also
these are of little pharmacological interest. Usually highly conjugated & flat
aryl compounds. –NO2 and –SO3 groups add solubility to these flat
conjugated systems.
Filter out: Aggregators & Promiscuous Binders
•  Promiscuous binders = compounds that give “positives” in many target assays.
•  One mechanism for many of these is aggregation in solution. They are typically greasy & flat.
•  Another mechanism is general protein unfolding.
•  Drugs – promiscuous at high concentrations
•  It is worth learning what these look like.
Figure 6 (Coan and Shoichet). Model of aggregate structure and enzyme binding. Some organic molecules can form densely packed particles (10^8 small molecules per aggregate for larger particles) in aqueous media. Once formed, these larger particles sequester and then inhibit enzyme with a stoichiometry of approximately 10^4 enzyme molecules per aggregate. The surface of the aggregate is sufficient to accommodate all bound enzyme.
http://shoichetlab.compbio.ucsf.edu/aggregators.php
McGovern, S.I. et al. J. Med. Chem. (2002) 45, 1712 – 1722
Roche, O. et al. J. Med. Chem. (2002) 45, 137-142; Coan, K. E. D. & Shoichet B.K. JACS (2008) 130, 9606 – 9612.
More examples of promiscuous binders
McGovern, S.I. et al. J. Med. Chem. (2002) 45, 1712 – 1722
Roche, O. et al. J. Med. Chem. (2002) 45, 137-142; Coan, K. E. D. & Shoichet B.K. JACS (2008) 130, 9606 – 9612.
Filter out: Unwanted Functional Groups
These rules remove compounds that have too many of a given type of functional group.
You can CUSTOMIZE the rules depending on the goals for your library (OE rules)
Each rule reads: RULE <maximum count> <functional group>
RULE 2 alkyne
RULE 4 aniline
RULE 4 aryl_halide
RULE 2 carbamate
RULE 3 ester
RULE 5 ether
RULE 1 hydrazone
RULE 0 nonacylhydrazone
RULE 1 hydroxylamine
RULE 2 nitrile
RULE 2 sulfide
RULE 2 sulfone
RULE 2 sulfoxide
RULE 1 thiourea
RULE 1 thioamide
RULE 1 thiol
RULE 2 urea
RULE 6 alcohol
RULE 4 alkene
RULE 4 amide
RULE 4 amino_acid
RULE 2 amine
RULE 4 primary_amine
RULE 4 secondary_amine
RULE 4 tertiary_amine
RULE 2 carboxylic_acid
RULE 6 halide
RULE 0 iodine
RULE 2 ketone
RULE 4 phenol
RULE 1 imine
RULE 1 methyl_ketone
RULE 1 alkylaniline
RULE 1 oxime
RULE 0 isothiocyanate
RULE 0 isocyanate
RULE 3 lactone
RULE 3 lactam
RULE 1 thioester
RULE 1 carbonate
RULE 0 carbamic_acid
RULE 1 thiocarbamate
RULE 0 triazine
RULE 1 malonic
RULE 4 sulfonamide
RULE 1 sulfonylurea
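To make the meaning of the RULE lines concrete, here is a minimal Python sketch of how such a rule table could be applied. It assumes that the number is the maximum allowed count for that group and that the functional group counts for a compound have already been produced by a substructure search (the counts below are made up); `parse_rules` and `violations` are hypothetical helper names, not part of FILTER.

```python
# A few rules in the RULE <maximum count> <functional group> layout,
# using values from the slide.
RULES_TEXT = """\
RULE 2 alkyne
RULE 0 isocyanate
RULE 4 sulfonamide
RULE 1 thiol
"""


def parse_rules(text):
    """Read 'RULE <max> <name>' lines into a {name: max_count} dictionary."""
    rules = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "RULE":
            rules[parts[2]] = int(parts[1])
    return rules


def violations(group_counts, rules):
    """Return the groups that occur more often than their rule allows."""
    return sorted(
        name
        for name, count in group_counts.items()
        if name in rules and count > rules[name]
    )


rules = parse_rules(RULES_TEXT)
# Hypothetical compound: 3 alkynes, 1 thiol, 1 isocyanate.
print(violations({"alkyne": 3, "thiol": 1, "isocyanate": 1}, rules))
```

A compound with an empty violation list passes this part of the filter; customizing the library then amounts to editing the numbers in the rule text.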
OE’s FILTER: Physical Property Calculations & Default Cutoffs
•  Molecular Weight (130 ≤ MW ≤ 781)
•  Heavy Atom Count (9 ≤ HVY ≤ 55)
•  Carbon Count, Hetero Count (3 ≤ #C ≤ 41), (1 ≤ HETERO ≤ 14)
•  Hetero/Carbon Ratio (0.4 ≤ HET/C ≤ 4.0)
•  Chiral Centers Count (0 ≤ Chiral ≤ 21)
•  H-bond Acceptors (0 ≤ HBA ≤ 13)
•  H-bond Donors (0 ≤ HBD ≤ 9): counts the # of Hs on N, O, S
•  Halide Fraction (0 ≤ Halide Fraction ≤ 0.66): MW of halide/MW of cmpd
•  Formal Count (0 ≤ # Formal Charge ≤ 4): # atoms with formal charge
•  Formal Sum (-2 ≤ Sum Formal Charge ≤ 2): total formal charge
•  Unbranched chain (0 ≤ keep ≤ 19): # unbranched connected non-ring atoms
•  Ring systems (0 ≤ keep ≤ 5): # of contiguous rings
•  Ring size (0 ≤ keep ≤ 20)
•  Rotor Count (0 ≤ keep ≤ 16): # of rotatable bonds
•  Rigid Count (0 ≤ keep ≤ 55): # of non-rotatable bonds
Note: some cutoffs change when more rules are added, e.g. Lipinski, Veber, etc.
What’s most important? I typically focus on:
•  cLogP
•  MW
•  # rotatable bonds
•  # halides: not too many –Br, –I
•  # rings
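As a worked example of one of these properties, the halide fraction (MW of halide / MW of compound, cutoff 0.66) can be computed by hand. The sketch below is illustrative only: the atomic masses are standard values, the function names are hypothetical, and the halogen counts and molecular weights are supplied by the caller rather than derived from a structure.

```python
# Standard atomic masses of the halogens (g/mol).
HALOGEN_MASS = {"F": 18.998, "Cl": 35.45, "Br": 79.904, "I": 126.904}
MAX_HALIDE_FRACTION = 0.66  # default cutoff quoted on the slide


def halide_fraction(halogen_counts, mol_weight):
    """Mass of all halogen atoms divided by the molecular weight."""
    halide_mass = sum(HALOGEN_MASS[el] * n for el, n in halogen_counts.items())
    return halide_mass / mol_weight


def passes_halide_cutoff(halogen_counts, mol_weight):
    """True if the halide fraction is within the default cutoff."""
    return halide_fraction(halogen_counts, mol_weight) <= MAX_HALIDE_FRACTION


# Chlorobenzene, C6H5Cl, MW about 112.56: one Cl out of a small molecule.
print(round(halide_fraction({"Cl": 1}, 112.556), 2), passes_halide_cutoff({"Cl": 1}, 112.556))
# Carbon tetrabromide, CBr4, MW about 331.63: almost all of the mass is Br.
print(round(halide_fraction({"Br": 4}, 331.627), 2), passes_halide_cutoff({"Br": 4}, 331.627))
```

The same pattern (compute a number, compare it with a lower and an upper bound) applies to every other cutoff in the list.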
Cheminformatics Rules-of-Thumb for Hit Selection &
Lead Optimization
Table 2 from Muchmore et al., Journal of Medicinal Chemistry, 2010, Vol. 53, No. 13, p. 4832 (ref. below): Cheminformatic Rules-Of-Thumb for Hit Selection and Lead Optimization
Numbers in square brackets are the reference numbers used in Muchmore et al.
•  oral bioavailability (“rule of 5”)
   rules-of-thumb: MW ≤ 500 Da; ClogP ≤ 5; H-bond donors ≤ 5; #(N + O) ≤ 10
   comment: violation of these limits decreases oral bioavailability
   programs: Biobyte ClogP [85,86] or ACD LogP v4.0 [12]
   key references: Lipinski (1997) [1]; Wenlock (2003) [12]
•  oral bioavailability
   rules-of-thumb: Nrot ≤ 10; PSA ≤ 140 Å2
   comment: violation of these limits decreases oral bioavailability
   programs: tPSA [62] (nitrogen and oxygen only)
   key references: Veber (2002) [13]
•  oral bioavailability (“Golden Triangle”)
   rules-of-thumb: MW ≤ 500; variable LogD (LogD range: 0 - 5)
   comment: violation of these limits decreases oral bioavailability
   programs: experimental LogD
   key references: Johnson (2009) [35]
•  toxicity
   rules-of-thumb: ClogP ≤ 3; PSA ≥ 75 Å2
   comment: violation of these limits increases the risk of toxicity
   programs: Biobyte ClogP v4.3 [85]; tPSA [62] (nitrogen and oxygen only)
   key references: Hughes (2008) [2]
•  toxicity
   rules-of-thumb: LLE ≥ 5
   comment: low ligand-lipophilicity efficiency can lead to increased promiscuity
   programs: Biobyte ClogP [85]
   key references: Leeson (2007) [19]
•  membrane permeability
   rules-of-thumb: PSA ≤ 120 Å2
   comment: violation of this limit decreases membrane permeability
   programs: Quanta 3D (nitrogen and oxygen only)
   key references: Kelder (1999) [61]
•  membrane permeability
   rules-of-thumb: MW ≤ 500; variable LogD (LogD range: 0.5 - 5)
   comment: violation of these limits decreases membrane permeability
   programs: ACD PhysChem Batch [87] or AZlogD [88]
   key references: Waring (2009) [36]
•  blood-brain barrier penetration
   rules-of-thumb: PSA ≤ 70 Å2
   comment: violation of this limit decreases brain penetration
   programs: Quanta 3D (nitrogen and oxygen only)
   key references: Kelder (1999) [61]
•  solubility
   rules-of-thumb: Fsp3 ≥ 0.4
   comment: increased fraction of sp3 hybridized carbons (Fsp3) increases solubility
   programs: Pipeline Pilot 7.5
   key references: Lovering (2009) [51]
•  general “developability”
   rules-of-thumb: number of aromatic rings ≤ 3
   comment: increase in aromatic ring count decreases solubility and increases protein binding
   programs: none listed
   key references: Ritchie (2009) [52]
Also listed under key references: Leach (2006) [23]; Bhal (2007) [34]
Commercial Tools: Schrodinger’s QikProp & Canvas, Accelrys’ Pipeline Pilot, CCG MOE database tools, ACD
Muchmore, SW et al. “Cheminformatic Tools for Medicinal Chemists” J. Med. Chem. (2010) 53, 4830 – 4841. (see references for “key references”).
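A few of the rules-of-thumb in Table 2 are simple enough to check in code. The Python sketch below covers the “rule of 5”, the Nrot/PSA limits and LLE, with the thresholds taken from the table. The property values are assumed to come from whichever program you use; the function names are hypothetical, and LLE is written with its usual definition (pIC50 minus ClogP), which the slide itself does not spell out.

```python
def rule_of_five_violations(mw, clogp, hbd, n_plus_o):
    """List which 'rule of 5' limits from Table 2 a compound violates."""
    checks = {
        "MW <= 500": mw <= 500,
        "ClogP <= 5": clogp <= 5,
        "H-bond donors <= 5": hbd <= 5,
        "#(N+O) <= 10": n_plus_o <= 10,
    }
    return [name for name, ok in checks.items() if not ok]


def passes_veber(nrot, psa):
    """Nrot <= 10 and PSA <= 140 A^2 (oral bioavailability row of Table 2)."""
    return nrot <= 10 and psa <= 140.0


def lle(pic50, clogp):
    """Ligand-lipophilicity efficiency, assuming LLE = pIC50 - ClogP."""
    return pic50 - clogp


# Hypothetical hit: values are illustrative, not from a real compound.
print(rule_of_five_violations(mw=650.0, clogp=6.2, hbd=2, n_plus_o=6))
print(passes_veber(nrot=4, psa=90.0))
print(lle(pic50=8.0, clogp=2.5) >= 5)
```

Because different cLogP and PSA programs give different numbers, it is usually safer to report the violations as flags than to reject a compound that sits just over a limit.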
Calculated Properties: cLogP and cLogD
These are partition coefficients for small molecules between octanol and water.
There is a correlation between these values and a molecule’s solubility and
ability to cross membranes
Note: cLogP ignores charges on molecules – it is invalid for compounds with charge
cLogD is pH dependent and is usually reported for pH = 7.4
Different cLogP calculations can vary up to 1 log unit. It’s good to remember that when
applying Lipinski guidelines
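The pH dependence of LogD can be illustrated with the standard textbook relation for a compound with a single ionizable group. This is not taken from the slides; it assumes that only the neutral form partitions into octanol, and the function names and example values are hypothetical.

```python
import math


def log_d_acid(log_p, pka, ph=7.4):
    """LogD of a monoprotic acid, assuming only the neutral form partitions."""
    return log_p - math.log10(1.0 + 10.0 ** (ph - pka))


def log_d_base(log_p, pka, ph=7.4):
    """LogD of a monoprotic base, assuming only the neutral form partitions."""
    return log_p - math.log10(1.0 + 10.0 ** (pka - ph))


# Hypothetical carboxylic acid with LogP = 3.0 and pKa = 4.4:
# at pH 7.4 it is about 99.9% ionized, so LogD drops by about 3 log units.
print(round(log_d_acid(3.0, 4.4), 2))
# Hypothetical amine with LogP = 3.0 and pKa = 9.4: LogD drops by about 2.
print(round(log_d_base(3.0, 9.4), 2))
```

For a neutral compound with no ionizable group the two quantities coincide, which is why cLogP is only a reasonable stand-in for cLogD when the molecule is uncharged at the pH of interest.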
TPSA = topological polar surface area
Some TPSA programs only count O & N
atoms
Of all the calculated properties – this gives one of the best correlations with drug transport properties
Ertl, P et al. “Fast calculation of molecular polar surface area as a sum of fragment-based contributions and its
application to the prediction of drug transport properties” JMC (2000) 43, 3714-17
Practical: Using OpenEye’s FILTER
Input = sdf, smiles, smarts
http://eyesopen.com/
In 2D format
Practical: Using OpenEye’s FILTER
/Applications/OpenEye/bin/filter -in file.sdf -filter lead -prefix clean -fail failed -out clean_out
Practical: OpenEye’s FILTER output
•  clean_out = compounds for VS
•  failed = compounds that failed
•  clean.info file
•  clean.log
•  Top of the logfile lists the parameter settings used
•  End of the log file has each compound listed + pass/fail, failure reason
Practical: Using OpenEye’s FILTER
•  Default filters
•  lead
•  drug
•  blockbuster
The filters are text files –
they can be found in the
directory
/OpenEye/data/
Filtering: Customizing FILTER
You can edit your own filter using the “lead” filter as a template
or
add SMARTS strings to remove specific compounds in a separate text file & use “-newrule” in your command line
/Applications/OpenEye/bin/filter -in file.sdf -filter lead -newrule file.txt -out clean_out
SMARTS theory:
http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html
For easier conversion of smiles strings to SMARTS:
http://www.chemaxon.com/marvin/help/formats/smiles-doc.html#SMARTS
or
OpenEye’s OMEGA
Flagging vs. Removing
•  Each user may have different screening goals
•  Flagging properties, dyes and/or potential aggregators may be
useful to simply raise awareness of a potential compound hazard
•  eg. PAINS – includes azo dyes because they are looking for easy
chemistry and cancer drugs with high toxicity tolerance but they remove
many aggregators
•  If you are looking for compounds likely to pass the BBB, you may want
to flag for low TPSA values to prioritize which hits to follow up on later
•  Over-filtering may lead to low hit rate, but lack of filtering
can lead to wasted resources & time
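The difference between flagging and removing can be sketched as follows. This is a minimal Python illustration, not part of any tool named here: the dictionary keys, the `triage` function and the choice of which property removes a compound are all assumptions. The TPSA threshold of 70 Å2 is the blood-brain barrier rule-of-thumb from the Muchmore table.

```python
def triage(compound, bbb_tpsa_max=70.0):
    """Annotate a compound record: remove only hard failures, flag the rest."""
    flags = []
    if compound.get("is_dye"):
        flags.append("dye")
    if compound.get("possible_aggregator"):
        flags.append("possible aggregator")
    if compound["tpsa"] <= bbb_tpsa_max:
        flags.append("low TPSA (BBB candidate)")
    # Assumption for this sketch: only reactive compounds are actually removed.
    keep = not compound.get("reactive", False)
    return {"id": compound["id"], "keep": keep, "flags": flags if keep else []}


library = [
    {"id": "cmpd-1", "tpsa": 55.0, "is_dye": True},
    {"id": "cmpd-2", "tpsa": 120.0, "reactive": True},
    {"id": "cmpd-3", "tpsa": 95.0, "possible_aggregator": True},
]
for record in library:
    print(triage(record))
```

Flags travel with the compound through docking and selection, so the decision about a dye or a possible aggregator can be made later, when you know whether it is one of your hits.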
Example: Library for Neglected Diseases (Dundee, Scotland)
Vendors: Asinex, Biofocus, Bionet, Chembridge, Chemdiv, IBS, Maybridge, Peakdale, Sigma-Aldrich, Specs, Tripos
2.26 Million cmpds -> AVAILABLE -> DUPLICATES -> 1.75 M -> UNWANTED GROUPS -> Properties -> COMPLEXITY -> 0.2 M -> DIVERSITY -> VISUAL -> 57,438 cmpds
Other counts shown on the funnel: 0.09 M, 0.9 M
Properties
•  10-27 heavy atoms
•  <4 HBD
•  <7 HBA
•  0<HBA+HBD<10
•  0-4 cLogP/cLogD
Brenk, R. et al. Lessons learnt from assembling screening libraries for drug discovery for neglected diseases. ChemMedChem (2008) 3, 435-444.
Tautomerization
•  Why is it important to VS?
•  Location of HBA and HBD
•  Structural conformation
•  Properties
Keto-enol tautomers
Oellien, F., Cramer, J., Beyer, C., Ihlenfeldt, W.H., Selzer, P.M.;
The Impact of Tautomer Forms on Pharmacophore-Based Virtual Screening; J. Chem. Inf. Model. 46 (2006) 2342-2354.
Selecting Compounds: What to do about data overload!
•  There are many ways to select your best docked compounds
•  Docking score – top 100(?)
•  Consensus score, other docking scores
•  Rescoring – MM-PBSA, MM-GBSA, TI, NN-Score
•  You need to look at your selections, but with large compound databases (perhaps with multiple poses saved & RCS) this is more than can reasonably be reviewed visually
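One simple way to combine a docking score with a rescoring method is a rank-sum consensus. The Python sketch below is only one of many possible schemes and is not prescribed by the slides. It assumes that lower (more negative) is better for every score, that every table contains the same compounds, and the scores and function names are made up for illustration.

```python
def ranks(scores):
    """Rank compounds within one scoring method (1 = best, lowest score)."""
    ordered = sorted(scores, key=scores.get)
    return {name: position + 1 for position, name in enumerate(ordered)}


def consensus(*score_tables):
    """Order compounds by the sum of their ranks across all scoring methods."""
    names = set(score_tables[0])
    total = {name: 0 for name in names}
    for table in score_tables:
        table_ranks = ranks(table)
        for name in names:
            total[name] += table_ranks[name]
    # Ties are broken alphabetically so the output is reproducible.
    return sorted(names, key=lambda name: (total[name], name))


docking = {"a": -9.1, "b": -8.5, "c": -7.0}
rescoring = {"a": -30.0, "b": -42.0, "c": -25.0}
print(consensus(docking, rescoring))
```

Ranks are used instead of raw scores because docking scores and rescoring energies are on different scales and cannot be added directly.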
Compound Clustering
•  Take the top scoring compounds (100 – 5000) and their docking scores –
cluster them by chemical types (scaffolds, chemotypes)
•  You can then look for scaffolds that give the best scores and view a few of
them
•  Do they make reasonable interactions within the pocket?
•  Are the conformations of the compounds reasonable?
•  Are there particular functional groups on the chemotype skewing the
score?
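The clustering step can be sketched as a group-and-sort over the top-scoring compounds. In this Python illustration the scaffold label of each compound is assumed to be already assigned (for example by a scaffold binning tool); the identifiers, scaffold names, scores and the function name are hypothetical, and lower scores are taken to be better.

```python
from collections import defaultdict


def best_per_scaffold(hits, per_scaffold=2):
    """Group hits by scaffold and list the best scorers of each scaffold.

    hits: iterable of (compound_id, scaffold_label, docking_score).
    Scaffolds are returned best first, judged by their top-scoring member.
    """
    groups = defaultdict(list)
    for compound_id, scaffold, score in hits:
        groups[scaffold].append((score, compound_id))
    summary = []
    for scaffold, members in groups.items():
        members.sort()  # lowest (best) score first
        top_ids = [compound_id for _, compound_id in members[:per_scaffold]]
        summary.append((members[0][0], scaffold, top_ids))
    summary.sort()
    return [(scaffold, ids) for _, scaffold, ids in summary]


hits = [
    ("c1", "indole", -9.2),
    ("c2", "indole", -8.1),
    ("c3", "indole", -7.7),
    ("c4", "quinoline", -9.8),
    ("c5", "biphenyl", -6.5),
]
for scaffold, ids in best_per_scaffold(hits):
    print(scaffold, ids)
```

Viewing only a few members per scaffold keeps the visual inspection manageable while still covering every chemotype in the top-scoring set.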
Scaffold Hunter
scaffold bin example
http://scaffoldhunter.sourceforge.net/index.html
Schuffenhauer, A. et al. “The Scaffold Tree – Visualization of the Scaffold Universe by Hierarchical
Scaffold Classification” JCIM (2007) 47, 47-58
Tripod: NIH Cheminformatics Scaffold – Activity
Diagram
Another “freeware” option for
scaffold binning
http://tripod.nih.gov/
Scaffold Diagrams can assist with selecting sets for SAR
Example
If a scaffold makes sense in the binding pocket
- purchase the best scorer +
- a variety of functional groups
This gives you and the chemist an idea of what changes might work.
http://scaffoldhunter.sourceforge.net/index.html
Schuffenhauer, A. et al. “The Scaffold Tree – Visualization of the Scaffold Universe by Hierarchical
Scaffold Classification” JCIM (2007) 47, 47-58
Scaffold Hopping
An approach to discover novel compounds with different central core
structures from the known leads/hits.
Taken from “Classification of scaffold-hopping approaches” Drug Discovery Today (2012) 17, 310–324
LIBRARY SOURCES
Compound Library Sources
Commercial Libraries: Asinex, Pharmex, Specs, ChemBridge, Enamine, IBS, Maybridge, ChemDiv
Compiled Libraries: ZINC (~17.8M), eMolecules (~8.3M), Molsoft’s Molcart Chemical Library (11.5M), ACD
Publicly Available Libs: NCI, PubChem
Natural Product Libraries: NCI, Sequoia, Timtec
Compiled Database of Commercial Compounds: ZINC
Great resource for docking collections – but you may want to select just a few vendors
http://zinc.docking.org/browse/subsets/
Library Sources – Commercial Library Example, Asinex
Old collections are from Russian
universities – collected in the late
1990s
They typically synthesize their own
libraries
They use sophisticated chemistries
Some libraries will be highly
specialized for current popular drug
targets in pharma & they will be
expensive!
General collections are a good source
- Individual compounds can be
ordered on-line, 48 hr delivery
- Larger orders, 4-6 weeks.
It’s cheaper to order from them
directly than through eMolecules.
http://www.asinex.com/
Library Source - NCI
•  Benefits - Great resource for free compounds!
•  Caveats – The old adage: “You get what you pay for”
•  NCI = National Cancer Institute - Most cancer compounds are toxic
•  This database has lots of dyes, steroid-like compounds & aggregators
•  Typically they are not good starting scaffolds for chemists
•  A few compounds are being used by many academic labs
NCI 140K cmpds -> >250 mg & mol file -> 80K cmpds -> Properties -> 3046 cmpds -> Purity >90% (MS/LC) -> Diversity Set III, 1597 cmpds -> OE “lead” -> < 500 cmpds
Properties
•  5 pharm4 features – dissimilar to others
•  ≤ 5 rot. bonds
•  Planar
•  ≤ 1 chiral center
•  No leaving groups
•  No organometallics
•  No polycyclic aromatic hydrocarbons
http://dtp.nci.nih.gov/branches/dscb/div2_explanation.html
Library Sources: Natural Product Library
Natural Product Sources:
- Collaborations
- ZINC
Natural product properties differ from synthetic libraries:
•  More O atoms, fewer N atoms
•  Fewer rotatable bonds
•  Fewer aromatic rings
•  More fused rings
•  More chiral centers
Note: Many NPs break common Lipinski guidelines and so do most antibacterials.
Structures shown (flavonoids): (19) Ginkgetin (biflavone), (20) Hinokiflavone-sialic acid, (21) Apigenin, (22) Vitexin, (23) Chrysin, (24) Rhamnocitrin (3,4’,5-trihydroxy-7-methoxyflavone), (25) Hinokiflavone; sialic acid
Figure 1 (doi:10.1371/journal.ppat.1002249.g001). The chemical structures of influenza NA inhibitors used in this study, including 1, Neu5Ac2en; 3, laninamivir; 4, laninamivir octanoate (CS-8958); and 5, oseltamivir.
Reputable Sources
•  Asinex
•  Chembridge
•  ChemDiv
•  Specs
•  LifeChemicals
•  IBS
•  Maybridge
•  NCI*