Where do all those small molecules come from? What the data reveal

Where do all those small molecules
come from? What the data reveal
Paul Peters
Director, EMEA Sales
CAS – SIIL, Netherlands
ICIC, Vienna, Austria
October 27th, 2010
October 28, 2010
CAS is a division of the American Chemical Society
ACS Vision
Improving people’s
lives through the
transforming power of
chemistry
CAS is a division of the American Chemical Society.
2
Copyright 2010 American Chemical Society. All rights reserved.
October 28, 2010
CAS supports the mission of the ACS
ACS Mission
To advance the broader chemistry
enterprise and its practitioners for the
benefit of Earth and its people.
CAS Commitment
To support the ACS mission by providing
secure access to high-quality,
comprehensive chemical information.
CAS is a division of the American Chemical Society.
3
Copyright 2010 American Chemical Society. All rights reserved.
October 28, 2010
Three major services provide access to CAS database
information
CAS is a division of the American Chemical Society.
4
Copyright 2010 American Chemical Society. All rights reserved.
October 28, 2010
CAS mission drives development of the CAS databases
CAS is a division of the American Chemical Society.
5
Copyright 2010 American Chemical Society. All rights reserved.
October 28, 2010
Growth in published chemistry has stayed strong in the
last decade
Published articles and patents in chemical science by CAS 2003-2010
1400000
1200000
1000000
800000
600000
400000
200000
0
2003
2004
2005
2006
2007
2008
2009
2010
Projection
CAS is a division of the American Chemical Society.
6
Copyright 2010 American Chemical Society. All rights reserved.
October 28, 2010
Worldwide patenting of new chemical research
continues to accelerate…
CAS Patents Covered each Year
1,200,000
1,000,000
800,000
600,000
400,000
200,000
0
1972 1976 1980 1984 1988 1992 1996 2000 2004 2008
CAS is a division of the American Chemical Society.
7
Copyright 2010 American Chemical Society. All rights reserved.
October 28, 2010
CAS analyzes global chemical information, including
publications from Asia
Each year, CAS covers
• 10,000 serial journal titles and 61 patent authorities worldwide
• 2,100 Asian serial journal titles
• All major Asian patent authorities, including offices in
– People’s Republic of China
– South Korea
– Japan
– India
Chinese, Japanese,
and Korean language
publications account
for 33% of new CAplus
database records
CAS is a division of the American Chemical Society.
8
Copyright 2010 American Chemical Society. All rights reserved.
October 28, 2010
For complex chemistry, CAS chemists classify substance
information and verify graphical processes and structures
1. Review reaction and structure
CAS is a division of the American Chemical Society.
2. Create registration record
9
Copyright 2010 American Chemical Society. All rights reserved.
October 28, 2010
Accurate and timely, CAS substance analysis for CAS
REGISTRY ensures scientists can find information
1. Scientist asks structure question
3. Finds the PCT patent application
2. Finds 433 compounds and 3 publications
CAS is a division of the American Chemical Society.
10
Copyright 2010 American Chemical Society. All rights reserved.
October 28, 2010
CAS chemists interpret when compounds are described
in terms other than singular structures or names
CAS is a division of the American Chemical Society.
11
Copyright 2010 American Chemical Society. All rights reserved.
October 28, 2010
From published patent to completed indexing in
SciFinder®: 15 days
A typical chemistry patent:
• PCT application
• “A new antibacterial”
• 250 pages
• 24 claims
CAS is a division of the American Chemical Society.
12
Copyright 2010 American Chemical Society. All rights reserved.
October 28, 2010
From published patent to completed indexing in
SciFinder: 15 days
Dr. Mark Willi from CAS completes substance
indexing 15 days after WIPO publishes the patent
CAS chemist analysis of
single PCT application
• 917 indexed compounds
from Examples and Claims
• 576 new compounds added
to CAS REGISTRY
• 613 single-step reactions
• 5,394 reactions with multiple
steps
• 1,029 reaction participants
• MARPAT® Markush
structure with 2,119
substituent definitions
CAS is a division of the American Chemical Society.
13
Copyright 2010 American Chemical Society. All rights reserved.
October 28, 2010
Challenges in substance and reaction indexing show the
value of intellectual substance identification
OMe
2 R=H
3 R=Me
10 R=Me
Scheme 1. Reagents and conditions: (a) HCO2Et, NaH; (b) H2NCH2CN, NaOAc, 80%; (c) ethyl
chloroformate, DBN, 0 °C, then DBN, 0 °C to rt; (d) Na2C03> MeOH, 71%; (e) HCO2H, reflux, 75%; (0
POC13, 100 °C; (g) (MeO)2SO2, K2CO3, 75%; (h) hydrazine hydrate, EtOH, 80 °C, 82%; (i) pyridine-4carboxaldehyde, pyrrolidine (cat.), EtOH, 80 °C, 80%.
Since products of step b, d, e, and h were characterized by their
yield, CAS policy demands these are to be registered and
indexed for CASREACT
CAS is a division of the American Chemical Society.
14
Copyright 2010 American Chemical Society. All rights reserved.
October 28, 2010
CAS specialists in many fields of chemistry interpret
author terminology to register compounds
Author identified this compound
only as D4GlcUA-GlcNAc(GlcUA-GlcNAc)5-PA
CAS is a division of the American Chemical Society.
15
Copyright 2010 American Chemical Society. All rights reserved.
October 28, 2010
Patents regularly describe substances in ambiguous
ways: In WO 2007089907 this “desired product” is fully
characterized
CAS is a division of the American Chemical Society.
16
Copyright 2010 American Chemical Society. All rights reserved.
October 28, 2010
(Relatively) new substance classes can be registered
Metal-organic frameworks
show great potential for
capture of H2 or CO2 or in
other gas separation
processes
CAS is a division of the American Chemical Society.
17
Copyright 2010 American Chemical Society. All rights reserved.
October 28, 2010
By the end of 2010, CAS REGISTRY will contain more
than 56 million compounds
Registered Compounds (millions)
Total Small Molecules
60
50
40
CAS
REGISTRY
CAS
Registry
30
20
10
0
65
CAS is a division of the American Chemical Society.
69
73
77
81
18
85
89
93
97
01
05
09
Copyright 2010 American Chemical Society. All rights reserved.
October 28, 2010
CAS REGISTRY substances are enriched with spectra,
numeric properties, tags, and published sources
Spectra
• More than 42M calculated NMR
spectra (H1, C13, hetero), with 26M
added in 2009
• More than 700,000 experimental
spectra (MS, NMR, IR, Raman),
with another 300,000 newly
acquired MS, IR, and NMR to
be added in 2010
Numeric
• More than 4.5M experimental property values (m.p., b.p., optical rotary
power, etc.)
• 7.5M data tags linked to indexed documents
• 2.68B calculated bio-“evaluation” metrics (bio-concentration, Log P,
Lipinski, etc.)
CAS is a division of the American Chemical Society.
19
Copyright 2010 American Chemical Society. All rights reserved.
October 28, 2010
CAS continues to register new small molecules in
significant numbers
CAS REGISTRY Growth, 2003-2010
60
56.0
51.3
Millions
50
41.5
40
30
25.0
27.0
22.3
2003
2004
2005
30.0
33.3
20
10
0
2006
2007
2008
2009
2010
Projection
CAS is a division of the American Chemical Society.
20
Copyright 2010 American Chemical Society. All rights reserved.
October 28, 2010
Patenting has increasingly "monetized" new chemical
information
Percentage of total
Percentage of New Compounds added to CAS REGISTRY from Patents
%
*CA Database annual average is 23% patents
CAS is a division of the American Chemical Society.
21
Copyright 2010 American Chemical Society. All rights reserved.
October 28, 2010
CAS has indexed more than 10.1M exemplified prophetic
substances from patent documents since 1993
CAS is a division of the American Chemical Society.
22
Copyright 2010 American Chemical Society. All rights reserved.
October 28, 2010
CHEMCATS continues to grow and remains a source of
new small molecules
Number of Catalog Products and Unique
Substances in the CAS CHEMCATS Database
CAS is a division of the American Chemical Society.
23
Copyright 2010 American Chemical Society. All rights reserved.
October 28, 2010
Chemical substances from web-based sources provide a
moderate addition to the small molecules in REGISTRY
•
•
1.6M substances have been captured from Internet substance collections
(They must be from a reputable source and contain characterizing data)
The following chart illustrates some of the larger collections:
400,000
350,000
300,000
250,000
200,000
150,000
100,000
50,000
sS
D
N IS
TM
as
Am
NC
I3
pe
c
ter
b in
t
Br
oa
dI
ns
DB
Ch
em
r
id e
Sp
Ch
em
Z IN
C
0
CAS is a division of the American Chemical Society.
24
Copyright 2010 American Chemical Society. All rights reserved.
October 28, 2010
Conclusions:
What the data reveal about small molecules
•
•
•
•
•
Information professionals and scientists from around the world rely on
CAS to provide the most authoritative, current, and complete collection of
substances
Substances are mainly from the journals and patents covered by CAS
databases and indexed by CAS scientists
Intellectual processing of substance identification provides superior
content
Unique substances are found in chemical catalogs and chemical libraries
as long as there is proof they exist
Internet sources provide some complementary pointers to disclosed
substance information
CAS is a division of the American Chemical Society.
25
Copyright 2010 American Chemical Society. All rights reserved.