
“Penuria nominum” – shortage of words
Knowledge beyond the capacity of language?
by György Surján
ESKI
Hungary
Commentary on Judith Blake,
“Beyond Data Integration: Data Management for Knowledge Discovery”
Ontology and Biomedical Informatics
Rome 29 April – 2 May 2005
Overview:
(A commentary by an outsider)
1. Modern science is analytical
2. Problem of identity
3. Capacity of language is limited
4. Genomics and proteomics deal with extremely large databases
5. Ontologies are bound to reality by language tags
1. Modern science is analytical
Mouse and human genes are 90% in agreement.
Mouse and human bodies are built from rather similar building blocks.
The difference is not in the building elements, but in the different way the elements are integrated.
The difference is not only phenotypic, but functional: humans are able to create sculptures demonstrating the beauty of the human body.
By changing 10% of its genes, would a mouse be able to create sculptures like Michelangelo's?
Q1. Is analytical approach sufficient to explain
differences of living organisms?
2. The problem of identity
Importance of identity in ontology:
entities having different identity criteria cannot belong to the same class.
We have a strong feeling of our self-identity throughout our whole life, despite all the changes that happen to us.
Identity is independent of similarity and recognition.
Identity of genes or proteins
Entities may gain or lose parts without losing their identity.
Are genes that lose some nucleotides still identical?
Q2. What are the identity criteria for genes and
proteins?
Elementary particles have no identity.
Humans and highly developed animals obviously do.
Q3. At which level of organisation identity emerges?
(Do biological macromolecules have identity?)
3. Capacity of language is limited
A shepherd in the 19th century: ~300-400 words
Anatomy (intermediate language certificate): ~4 000 terms
SNOMED 3.1, Encyclopaedia Britannica: ~120 000 terms
WordNet: ~150 000 strings
UMLS Metathesaurus: > 1 500 000 terms
Our language capacity is huge, but nevertheless finite.
Limiting factors:
1. Capacity of human brain
2. Number of terms shared by a community
Example of numbers
English has different names for the first 13 numbers (zero to twelve); beyond that, we use combinations:
hundred = $10^{2}$
thousand = $10^{3}$
million = $10^{6}$
billion = $10^{9}$
… = ?
We have linguistic solutions for expressing extremely large numbers, such as $10^{80}$, but at the price of a loss of precision:
94869313860999624578839454223454292345623754278394542323452456598564789345634987 ≈ $9.487 \times 10^{79}$
Up to now, mankind has not met any situation that could exhaust the capacity of human language. This is not because the number of things to be expressed was smaller than this capacity, but because we could always find some acceptable compromise.
We do not know where the limits of our language capacity lie, but the feeling of this limitation was well known centuries ago
(penuria nominum):
In the 17th century, Harsdörffer proposed a machine with 5 wheels containing 256 syllables, prefixes and suffixes, able to generate about 97 million (mostly nonsense) German words. Its purpose was to find the real name of God, and also to make it possible to use a different name for every particular in the world instead of referring to particulars by the names of their classes.
(U. Eco: Between La Mancha and Babel)
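Harsdörffer's machine is combinatorial: every setting of the five wheels yields one word, so the number of words is the product of the wheel sizes, not their sum. The Python sketch below illustrates this with five small, hypothetical wheels (invented here for illustration, not taken from Harsdörffer or Eco): 16 entries already give 324 words, which is how a few hundred entries can reach tens of millions.

```python
from itertools import product
from math import prod

# Hypothetical, tiny wheels for illustration only; these are NOT the
# syllables, prefixes and suffixes of Harsdörffer's actual machine.
WHEELS = [
    ["ab", "ver", "zu"],   # wheel 1: prefixes
    ["b", "d", "l", "r"],  # wheel 2: initial consonants
    ["a", "ei", "o"],      # wheel 3: vowels and diphthongs
    ["ch", "ng", "st"],    # wheel 4: final consonants
    ["", "en", "ung"],     # wheel 5: suffixes ("" means no suffix)
]


def word_count(wheels):
    """Number of wheel settings: the product of the wheel sizes."""
    return prod(len(wheel) for wheel in wheels)


def generate_words(wheels):
    """Yield one (mostly nonsense) word per combination of wheel positions."""
    for parts in product(*wheels):
        yield "".join(parts)


if __name__ == "__main__":
    print(sum(len(w) for w in WHEELS))  # 16 entries in total
    print(word_count(WHEELS))           # 324 words
    print(list(generate_words(WHEELS))[:3])
```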
4. Genomics and proteomics deal with extremely large databases
Size of genomics databases:
GO: ~18 000 terms
Human genome: ~30 000 (?) genes
GenBank: ~42 000 000 sequences
Are we able to use 42 million names?
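Coining unique labels for 42 million items is mechanically trivial; the hard part is making them words that a community shares (the second limiting factor listed earlier). A minimal Python sketch, assuming a 26-letter alphabet and a hypothetical helper `min_code_length`, shows how short such codes could be. Codes of this kind are identifiers, not language.

```python
def min_code_length(n_items: int, alphabet_size: int = 26) -> int:
    """Smallest length L with alphabet_size ** L >= n_items (hypothetical helper)."""
    length, capacity = 0, 1
    while capacity < n_items:
        capacity *= alphabet_size
        length += 1
    return length


# Database sizes as given on the slide (GO terms, human genes, GenBank sequences).
SIZES = {"GO": 18_000, "Human genome": 30_000, "GenBank": 42_000_000}

for name, n in SIZES.items():
    print(name, min_code_length(n))
```

Six letters suffice to give every GenBank sequence its own code, yet nobody could learn or share 42 million such "names".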
Q4. Is it possible to describe molecular biology using
human language?
Is there any other representation tool to be used for
that purpose?
5. Ontologies are bound to reality by language tags
Formal languages are used to describe structures.
[Figure: Language / Reality. Ontology entries, each an ID with a language tag, are linked to reality only through language. What is the meaning?]
If ontologies are bound to reality by language, then it is hard to create (or use) an ontology whose problem field exceeds the capacity of language.
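The ID-plus-language-tag structure can be modelled, under simplifying assumptions that are not the author's formalism, as a record holding an opaque identifier and a label. In this hypothetical Python sketch (the class `OntologyEntry`, the function `meaning_for` and the `X:` identifiers are invented for illustration), an entry is interpretable only when its tag belongs to the reader's vocabulary; an ID with no word behind it stays uninterpreted.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class OntologyEntry:
    identifier: str              # opaque ID: unique, but meaningless to a reader
    language_tag: Optional[str]  # human-language label; None if no word exists


def meaning_for(entry: OntologyEntry, shared_vocabulary: Set[str]) -> Optional[str]:
    """Return the label a reader can interpret, or None.

    The identifier is never consulted: only a language tag that belongs to
    the reader's shared vocabulary connects the entry to reality.
    """
    if entry.language_tag is not None and entry.language_tag in shared_vocabulary:
        return entry.language_tag
    return None


if __name__ == "__main__":
    vocabulary = {"heart", "liver"}  # hypothetical shared vocabulary
    entries = [
        OntologyEntry("X:0001", "heart"),
        OntologyEntry("X:0002", None),  # e.g. one of millions of unnamed sequences
    ]
    for e in entries:
        print(e.identifier, meaning_for(e, vocabulary))
```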
Q5. If language fails in genomics and proteomics, is
there a need and possibility for alternative methods
of ontology engineering that do not require
language?
Summary of questions
Q1. Is analytical approach sufficient to explain
differences of living organisms?
Q2. What are the identity criteria for genes and proteins?
Q3. At which level of organisation identity emerges?
(Do biological macromolecules have identity?)
Q4. Is it possible to describe molecular biology using
human language? Is there any other representation
tool to be used for that purpose?
Q5. If language fails in genomics and proteomics, is
there a need and possibility for alternative methods
of ontology engineering that do not require
language?