Gene/protein name recognition using Support Vector

 !"#
"
"
!$%"% % &'
()"))*'
("))*'
"+ , +
-"-"-
"
"
! "%-
&'
())*'
"
! " #"# $ Æ "
" % ! #&'() )*+, ( - ./01 - 2034 - 2.-5 "#
!"#$ %
! & & &
' Æ$ ! $ !$ #
& "# '
& ( ) * +
& ! $ & ( ) ( )
& & ,& '& Æ * -./ , &
[Figure: system flow. Training: training data -> feature extraction (word, POS, orthographic, prefix, suffix, dictionary, preceding class) -> SVM learning. Tagging: test data -> feature extraction (word, POS, orthographic, prefix, suffix, dictionary) -> SVM classification, evaluating the preceding class -> tagging of gene/protein names. The dictionary feature uses a gene/protein name dictionary built from SWISS-PROT and TrEMBL.]
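The feature types named in the system flow (word, POS, orthographic, prefix, suffix, dictionary, preceding class) can be illustrated with a minimal sketch. It is not the authors' implementation. The orthographic categories, the affix length of 3, and the names `orthographic_class` and `token_features` are assumptions made for illustration. The POS tags and dictionary flags are taken as given inputs.

```python
import re


def orthographic_class(token):
    """Coarse orthographic shape of a token (illustrative categories only)."""
    if token.isdigit():
        return "DIGITS"
    if re.fullmatch(r"[A-Z]+", token):
        return "ALLCAPS"
    if re.fullmatch(r"[A-Z][a-z]+", token):
        return "INITCAP"
    if re.fullmatch(r"[a-z]+", token):
        return "LOWER"
    if any(c.isdigit() for c in token) and any(c.isalpha() for c in token):
        return "ALNUM"
    if not any(c.isalnum() for c in token):
        return "SYMBOL"
    return "OTHER"


def token_features(tokens, pos_tags, dict_flags, prev_class, i, affix_len=3):
    """Build the feature dictionary for token i.

    prev_class is the class assigned to the preceding token: the gold label
    during training, the classifier's own previous decision during tagging.
    affix_len is an assumed value, not one taken from the paper.
    """
    tok = tokens[i]
    return {
        "word": tok.lower(),
        "pos": pos_tags[i],
        "orth": orthographic_class(tok),
        "prefix": tok[:affix_len],
        "suffix": tok[-affix_len:],
        "dict": dict_flags[i],
        "prev_class": prev_class,
    }
```

Each feature dictionary would then be converted to a sparse binary vector before being passed to an SVM. During tagging the tokens are processed left to right, so that the preceding class is available when the next token is classified.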
0 "# & +
1 -/ ,' 2& 2 2 &
' () ()& () () Æ 2& 2 2 & Æ () ()& () ()
2,1 -/ 3#*4-/ 56&57 7&66 2
,1 86&756 &.. 3#*4 &
!$
2,1 ! %
9,
:
!%:,$$ !$ 2,1 3#*4 ! %,:$ & ; !<$; -7/& 2& 2 2 & 2 ()& ()& (
)& ()& () () 2 %,:& 2 ; !<$; 2 %,:& 2 ;
!=$; *2 & ( )& ( )& (
) 2 & ) )& ( )& 2& 2 & 2 9
3 %,: ¾ ÖÙÒ 1 %,: 2 & (=2+ *) (= + *)& (=+ *)& (=+*) ¿ ÖÙÒ 3 %,: ½ ÖÙÒ
2 ,
* & 2 2 & > ! $&
2 2 ,
0 "!
%&
!
(
+!
/
(
)!
/!
/
3
2&
-"
[Figure: dictionary matching example. Example sentence: "Nerve growth factor ( NGF ) ……". Rows: 1 gram, 2 gram, 3 gram. Each cell is Y or N. Cell values in extraction order (row and column alignment lost): Y, Y, N, Y, Y, N, Y, N, N, N, Y, N, N, N.]
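One plausible reading of the 1-gram, 2-gram and 3-gram rows in the example is that each token receives a Y/N flag per n-gram length, indicating whether it lies inside an n-token span that matches a dictionary entry. The sketch below implements that reading only. The matching rule, the lowercasing, the function name `dictionary_ngram_flags`, and the toy dictionary are assumptions, and the output is not intended to reproduce the Y/N values of the original figure.

```python
def dictionary_ngram_flags(tokens, dictionary, max_n=3):
    """Return {n: ["Y"/"N", ...]} with one flag per token for n = 1..max_n.

    dictionary is a set of lowercase token tuples. A token is flagged "Y"
    for length n if it belongs to some n-token window of the sentence that
    appears in the dictionary (an assumed matching rule).
    """
    lowered = [t.lower() for t in tokens]
    flags = {n: ["N"] * len(tokens) for n in range(1, max_n + 1)}
    for n in range(1, max_n + 1):
        for start in range(len(lowered) - n + 1):
            if tuple(lowered[start:start + n]) in dictionary:
                for k in range(start, start + n):
                    flags[n][k] = "Y"
    return flags


# Toy dictionary for illustration; not actual SWISS-PROT or TrEMBL entries.
toy_dictionary = {
    ("nerve", "growth", "factor"),
    ("growth", "factor"),
    ("ngf",),
}
sentence = ["Nerve", "growth", "factor", "(", "NGF", ")"]
ngram_flags = dictionary_ngram_flags(sentence, toy_dictionary)
```

Under this reading, the per-token flags for each n would be added to the feature set as the dictionary feature.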
0 3 ( )
2& 2 2 2 < !=$ ! $ 4<
4
4
@
6
6
6<
8-,
4&
2
*-"
==
)
*
*
)
A2*
,2
-,+3/
/
%&
"!
/
/
/
*,9):
>
4
2>>
";):
>
2>>
?
?
)!
?
?
+(%
2
)
-
0 ?
( ! )
"# @
& <@ -/ "#2
+ <@ "# +
! $
& ! $
& & 2 A
2& 2& A A !
>$ "" & +
" & "# (2 )
0 B C
Æ
B A
!1 " 1 < B B 1 < B B 2 41 4
B "8)""4*,-+ +9$2/
0 <@
*
&
B &
B 4
A
B
414 1@16 16
4
. ! $ ! $ (
2) 2,1 B
C & !
$
! .$ ! $ 7 C
! !$ +
& !$ 2 A !$ 2 2 $ ? & 4 &4&
B
.0 1 @ @ ,& , = 2 & 2 2
"8)""4*,-+
"8)""4*,-+ < "8)""4*,-+ +9$2/
B
*
@DE
@D<@
@D
@D ,
@FE G
@FE<<
@FE@D
@F@F
2 B4
@FD@H
@FD
@FFH
@F G
+*
EE EE
EE@F
E@H
*
H<H
H
H
HF<
<F
F
E
FE@
-/ + D @ = >> *2# 3
3
" #
# $%&'( ) ! &
# *& 7826.
-/ *+
*& * ?& ? 1& * #@& 3 ?& % 3& #
#E& #
D& ;:
@& ,
& , # >> 2,1 +
2
3#*4 >> ! $ +
& " & = 0 6728>
-/ D # < >> @
+
" #
$*
%
$
%* ! &$$%&& 5255
-./ * 3 55. ? * , # ! % $,
! !!& ???,0 8288
-7/ < D& D & D
? # < >> ,
= *
?
# $%&'( ) ! &
# *&6728