
Maximizing Search Results in
Bio-pharmaceutical Space
Srinivasan Parthiban
ICIC 2011 23-26 October 2011, World Trade Center, Barcelona, Spain
Parthys Reverse Informatics
IIT Madras Research Park – 8C
Kanagam Road, Taramani
Chennai 600 113, Tamil Nadu, India
T: +91-98409-75643, F: +91-44-4261-7070
[email protected]
26 October 2011
The Two Pillars in Drug Discovery
Bioassays
Biological Target
Chemical Compound
The Changing Face of Life Science Industries
Yesterday
Big Life
Science
Company
Today
Why Industry Cares?
The volume of data
The heterogeneity of data
Growth of PubMed
[Chart: articles added to PubMed per year; y-axis from 0 to 1,000,000]
Growth of Patents
[Chart: US Patent Applications, US Granted Patents and PCT Applications per year, 1962 to 2010; y-axis from 0 to 600,000]
Many Biological Entities have Multiple Synonyms
EC 2.7.11.26
Understanding the Territory
Gene Expression
Structural Biology
Protein Interaction Networks
Structure-Activity Relationships
Pharmacology
Toxicological Properties
Clinical Response
Genetic Variation
Drug Targets Literature Landscape
Receptors
Ion channels
Enzymes
Nuclear receptors
DNA
Has anyone in another company developed chemical compounds for a protein similar to my target?
Predominant Classes of Targets in Drug Discovery
KINASE
Single Word
NO textual variants
Synonyms: Function
Phosphotransferase
GPCR
Abbreviation
Several Textual Variants
• G protein coupled receptor
• G-protein-coupled receptor
Synonyms: Structure and Function
Seven-transmembrane receptor (7TM)
Heptahelical membrane G-protein receptor
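The textual variants listed for GPCR differ only in spacing, hyphenation and abbreviation, so a single pattern can cover them. The following is a minimal sketch, not the presenter's method; the pattern and the function name `mentions_gpcr` are assumptions made for illustration. True synonyms such as "seven-transmembrane receptor" are not textual variants and would still need a dictionary lookup.

```python
import re

# Hypothetical pattern covering the textual variants named on the slide:
# "G protein coupled receptor", "G-protein-coupled receptor" and "GPCR"
# (optional plural "s" is also accepted).
GPCR_PATTERN = re.compile(
    r"\bG[\s-]?protein[\s-]?coupled[\s-]receptors?\b|\bGPCRs?\b",
    re.IGNORECASE,
)


def mentions_gpcr(text: str) -> bool:
    """Return True if the text contains any covered GPCR variant."""
    return GPCR_PATTERN.search(text) is not None
```

For example, `mentions_gpcr("A G-protein-coupled receptor antagonist")` matches, while a title that only says "heptahelical receptor" does not, which is the gap a synonym dictionary is meant to close.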
Other Major Classes of Targets
Protease
Single Word
NO textual variants
Synonyms: Function
Peptidase/Proteinase
Nuclear Receptor
Two words
Some textual variants
Synonyms: Function
Intracellular membrane receptor
Ion Channel
Two words
Some textual variants
Synonyms: Function
Ionotropic receptor
Challenges in Patent Retrieval
Ambiguous Titles
Multiple Synonyms
Textual Variants
Patents shown: US7939554, US7939263, US7923041, US7915410, US7897607, US7855279, US7906281
Way Forward
Identify Synonyms from Various Sources
Standardize & Integrate
Search & Analyze the Data
Our solution – Build a unique synonym dictionary of bio-targets and integrate it with a hierarchical classification to create a Drug Target Ontology for patent retrieval
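One way to picture a synonym dictionary combined with a hierarchical classification is query expansion: a search on a class pulls in every synonym of every class beneath it. The sketch below is an assumption-laden illustration, not the actual Drug Target Ontology. The synonym entries reuse terms from the slides, while the `PARENT` hierarchy and the function names `classes_under` and `expand_query` are hypothetical.

```python
from typing import Dict, List, Set

# Illustrative synonym dictionary; terms are taken from the slides.
SYNONYMS: Dict[str, List[str]] = {
    "kinase": ["kinase", "phosphotransferase"],
    "protease": ["protease", "peptidase", "proteinase"],
    "GPCR": [
        "GPCR",
        "G protein coupled receptor",
        "G-protein-coupled receptor",
        "seven-transmembrane receptor",
        "7TM",
    ],
}

# Hypothetical hierarchy, stored as child -> parent.
PARENT: Dict[str, str] = {
    "kinase": "enzyme",
    "protease": "enzyme",
    "GPCR": "receptor",
}


def classes_under(node: str) -> Set[str]:
    """Return the node plus every class below it in the hierarchy."""
    found = {node}
    changed = True
    while changed:
        changed = False
        for child, parent in PARENT.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def expand_query(node: str) -> str:
    """Build an OR query from the synonyms of a class and its subclasses."""
    terms: List[str] = []
    for cls in sorted(classes_under(node)):
        # Classes without dictionary entries fall back to their own name.
        for name in SYNONYMS.get(cls, [cls]):
            if name not in terms:
                terms.append(name)
    return " OR ".join(f'"{t}"' for t in terms)
```

Under these assumptions, `expand_query("kinase")` yields `"kinase" OR "phosphotransferase"`, and a search on the broader class `"enzyme"` also covers the protease synonyms.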
Few Genes are Well Studied
[Chart: number of publications per gene; genes shown: TP53, TNFα, APOE, MTHFR, IL-6, EGFR, VEGFA, HLA-DRB1, TGFβ1, ACE1]
25,403 Protein Coding Genes
[Chart labels: 60%, 41%, 23%; axis: Protein Coding Genes]
Lost and Found!
US7683055
US7723339
US7795212
US7834152
US7727235
In claims: ACE1, MTHFR, TGF-beta 1, IL-6, EGFR
Assignee: Zensun (Shanghai) Science & Technology Limited (Shanghai, CN)
Assignee: Oklahoma Medical Research Foundation
US7671078
Assignee: Ethicon, Inc
Assignee: Novartis AG (Basel, CH)
Assignee: Beth Israel Deaconess Medical Center, Inc. (Boston, MA, US)
US7741290
US7659293
Assignee: The Board of Trustees of The Leland Stanford Junior University (Stanford, CA, US)
Comparison of Drug Target Articles and Patents
Standard name vs. synonym dictionary
[Bar chart: number of hits (y-axis, 0 to 1000) per drug target symbol (x-axis: ADORA1, GSK3B, iGluR1, NR0B1, KCNA1). Series: article hits using standard name; article hits using synonym dictionary; patent hits using standard name; patent hits using synonym dictionary. Bar labels as extracted: 910, 870, 474, 362, 234, 223, 145, 128, 107, 76, 2, 22, 23, 34, 21, 3, 69, 40, 5, 2]
Screenshots of Drug Target Ontology
Empowering the Research Community with Technology Alerts
End User: Benefits of Technology Alert
Scientists & Researchers: Comprehensive list of patents in their area of interest
Patent Analysts: Exhaustive and accurate patent search results
CTO/HR: Head hunting the most active inventor in the field
CFO: Reduction in investment on redundant research
Competitive Intelligence: Research approach of competitors; most active competitor in the field of interest
Better Results, Enable Success
Kinase-Thematic Database
Repurposing Contextualized
[Diagram: three axes labelled Chemical Dimension, Therapeutic Dimension (Diseases) and Mechanistic Dimension (Targets); other labels: Chemical Plane, Original Use, On-target, Off-target]
Summary
No two patent searches are ever the same; many biological entities have multiple synonyms
The information retrieval challenges in patents are discussed in terms of ambiguous titles, multiple synonyms and textual variants
Our solution - Building a search platform to retrieve patents with a unique synonym dictionary of bio-targets
While recall improved with the synonym dictionary, precision improved with relevant IPC code filters
Such comprehensive retrieval of patents would enable us to provide Technology Alert Services that empower the industry with better analytical capabilities for greater business impact
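The recall and precision point in the summary can be illustrated with a toy calculation. This is a sketch under stated assumptions: the document IDs, the relevance judgments and the IPC assignments below are invented for illustration and are not results from the talk, and the helper names `precision_recall` and `filter_by_ipc` are hypothetical.

```python
from typing import Dict, Set, Tuple


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Compute (precision, recall) of a retrieved set against a relevant set."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def filter_by_ipc(
    retrieved: Set[str],
    ipc_codes: Dict[str, Set[str]],
    allowed_prefixes: Tuple[str, ...],
) -> Set[str]:
    """Keep documents that carry at least one IPC code with an allowed prefix."""
    return {
        doc
        for doc in retrieved
        if any(code.startswith(allowed_prefixes) for code in ipc_codes.get(doc, set()))
    }


# Made-up example data (assumption): P1..P4 are relevant, P5 and P6 are noise.
relevant = {"P1", "P2", "P3", "P4"}
standard_name_hits = {"P1", "P2"}
synonym_hits = {"P1", "P2", "P3", "P4", "P5", "P6"}
ipc_codes = {
    "P1": {"A61K 31/00"},
    "P2": {"C12N 9/12"},
    "P3": {"A61K 31/00"},
    "P4": {"C12N 9/12"},
    "P5": {"G06F 17/30"},
    "P6": {"H04L 29/06"},
}

baseline = precision_recall(standard_name_hits, relevant)
expanded = precision_recall(synonym_hits, relevant)
filtered = precision_recall(
    filter_by_ipc(synonym_hits, ipc_codes, ("A61K", "C12N")), relevant
)
```

In this invented example the synonym search raises recall from 0.5 to 1.0 while precision drops to about 0.67, and the IPC filter restores precision to 1.0 without losing recall.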
Thank you!