Active Estimation of F-Measures (Online Appendix)
Christoph Sawade, Niels Landwehr, and Tobias Scheffer
University of Potsdam
Department of Computer Science
August-Bebel-Strasse 89, 14482 Potsdam, Germany
{sawade, landwehr, scheffer}@cs.uni-potsdam.de
A. Proof of Lemma 1
Pn
Let (x1 , y1 ), ..., (xn , yn ) be drawn according to q(x)p(y|x). Let Ĝ0n,q = i=1 vi `i wi and Wn =
Pn
p(xi )
i=1 vi wi with vi = q(xi ) , wi = w(xi , yi , fθ ) and `i = `(fθ (xi ), yi ). Furthermore, we define
i
h
E[ 1 Ĝ0 ]
G
µG = E n1 Ĝ0n,q and µW = E n1 Wn . Note that G = E 1n Wn = µµW
.
[ n n]
Starting with the definition of the bias (Equation 1), we derive
h
i h
i
Bias Ĝn,q = E Ĝn,q − G
j
X
∞
1
W
−
µ
1
n
W
j
0
n
Ĝn,q
(−1) − G
= E
j+1
n
µW
j=0
j
X
∞
1
1 0
j
n Wn − µW
Ĝ
= E
(−1)
n n,q j=2
µj+1
W
i h
i
h
Cov n1 Ĝ0n,q , n1 Wn − µW + E n1 Ĝ0n,q E n1 Wn − µW −
µ2W
i
h
j
X
0
∞
1
,
W
Ĝ
Cov
n
n,q
1 0
j
n Wn − µW
= E
−
Ĝn,q
(−1)
2
j+1
2
n
n µW
µW
j=2
j
X
∞
1
1 0
1
j
n Wn − µW
−
= E
Ĝn,q
(−1)
Cov
[v
`
w
,
v
w
]
1 1 1 1 1 2
j+1
n
nµW
µW
j=2
v "
#
#
"
u 2
2j
∞
X
1 0
1
1
1 u
t
E
Ĝn,q
E
Wn − µW
+ · const
≤
j+1
n
n
n
j=2 µW
r
∞
i 1
X
1
1 h
2j
≤
const · j E (v1 w1 − µW )
+ · const
j+1
n
n
j=2 µW
r h
i
2j
E
(v
w
−
µ
)
∞
1
1
W
X
1
=
const ·
+ · const.
√ j
n
µW (µW n)
j=2
In Equation 2, we have expressed the quotient Ĝn,q =
E[ n1 Ĝ0n,q ]
around the expectations
j = 0. In Equation 4, we exploit
0
1
n Ĝn,q
1
n Wn
and E[ n1 Wn ]. In Equation
that Cov [vi `i wi , vj `j ] = 0
1
(1)
(2)
(3)
(4)
(5)
(6)
(7)
by a multivariate Taylor expansion
3, G cancels out with the summand for
for i 6= j. Equation 5 follows from the
Cauchy-Schwarz inequality for expectations. To derive Equation 6, we consider the 2j-th derivative
of the moment-generating function of n1 Wn − µW . A straightforward calculation shows that
"
2j #
i
1
1 h
2j
E
Wn − µW
≤ const · j E (v1 w1 − µW )
n
n
2 where const denotes terms independent of n. The term E n1 Ĝ0n,q
from Equation 5 is also
subsumed into const as it is bounded independently of n. Finally, we note that for sufficiently large
n, the series on the right-hand side of Equation 7 converges and is of order n1 . Thus, there exists a
constant C such that for all n ∈ N it holds that
h
i C
Bias Ĝn,q ≤ .
n
B. Comprehensive Empirical Results in the Text Classification Domain
Table 1 lists the true one-vs-rest precision, F0.5 -measure, recall, and accuracy of the actively trained
model for the ten classes in the Reuters-21578 text classification domain.
Figure 1 shows the estimation error of activeF , passive, and activeerr over number of labeled data,
for precision, F0.5 and recall estimates and all ten classes in the text classification domain.
Table 1: Class ratios and model quality for active learned model for ten most frequent occurring
topics in Reuters corpus
topic
earn
acq
crude
trade
money-fx
interest
ship
sugar
coffee
gold
class ratio
Acc
Prec
Rec
F0.5
0.510
0.978
0.976
0.980
0.978
0.280
0.974
0.942
0.966
0.954
0.044
0.994
0.972
0.879
0.923
0.041
0.995
0.937
0.940
0.938
0.034
0.991
0.872
0.861
0.867
0.027
0.992
0.851
0.843
0.847
0.020
0.994
0.886
0.820
0.850
0.016
0.998
0.972
0.921
0.946
0.015
0.999
1.000
0.900
0.947
0.012
0.998
0.954
0.911
0.932
2
F0.5−measure (class fraction: 51.0%)
F0.5−measure (class fraction: 28.2%)
F0.5−measure (class fraction: 4.4%)
passive
activeF
activeerr
0.015
0.01
0.005
200
400
600
labeling costs n
0.025
0.02
0.015
0.01
800
0
F0.5−measure (class fraction: 4.1%)
800
0.04
0.02
0
200
400
600
labeling costs n
800
F0.5−measure (class fraction: 2.7%)
0.05
0.04
0.03
0.02
0.01
200
400
600
labeling costs n
800
passive
activeF
0.1
activeerr
0.08
0.06
0.04
0
F0.5−measure (class fraction: 1.9%)
200
400
600
labeling costs n
estimation error (absolute)
activeerr
estimation error (absolute)
estimation error (absolute)
passive
activeF
0.06
800
passive
activeF
0.1
activeerr
0.08
0.06
0.04
0.02
0
200
400
600
labeling costs n
800
F0.5−measure (class fraction: 1.5%)
F0.5−measure (class fraction: 1.6%)
0.08
0.09
passive
activeF
0.08
activeerr
0.07
0.06
0.05
0.04
0.03
200
400
600
labeling costs n
estimation error (absolute)
0.1
estimation error (absolute)
400
600
labeling costs n
activeerr
0.06
0.12
0.07
0
200
passive
activeF
0.08
F0.5−measure (class fraction: 3.4%)
0.08
0
activeerr
800
0.07
passive
activeF
0.06
activeerr
0.05
0.04
0.03
0.02
0
200
400
600
labeling costs n
800
estimation error (absolute)
0
passive
activeF
0.03
estimation error (absolute)
0.1
estimation error (absolute)
estimation error (absolute)
0.02
passive
activeF
0.06
activeerr
0.05
0.04
0.03
0.02
0
200
400
600
labeling costs n
800
F0.5−measure (class fraction: 1.2%)
estimation error (absolute)
0.1
passive
activeF
0.08
activeerr
0.06
0.04
0.02
0
200
400
600
labeling costs n
800
Figure 1: Text Classification: Estimation error over number of labeled data, for recall, F0.5 and
precision estimates. Classes (from left to right and top to bottom): “earn”, “acq”, “crude”, “trade”,
“money-fx”, “interest”, “ship”, “sugar”, “coffee” and “gold”. Error bars indicate the standard error.
3
C. Detailed Empirical Results in the Digit Recognition Domain
We consider a digit recognition domain in which training and testing distributions diverge because
the data originate from different sources. To realize this scenario, a digit recognition model is trained
on the USPS data set and evaluated on the MNIST data set. We use a version of MNIST prepared by
Sam Roweis. MNIST images were rescaled from 28 × 28 to 16 × 16 pixels to match the resolution
of USPS and the bounding box was recomputed. Images are represented by their 256 numeric pixel
values. There are 10 classes, 11, 000 training and 70, 000 test instances. We train a single multi-class
model using an RBF kernel. We evaluate one-versus-rest F -measures for each class, resulting in ten
different evaluation tasks.
Table 2 lists the true one-vs-rest precision, F0.5 -measure, recall, and accuracy of the trained model
on the ten different estimation problems corresponding to digits 0 to 9.
Figures 2 and 3 show the estimation error of activeF , passive, and activeerr over number of labeled
data, for precision, F0.5 and recall estimates and the ten different one-vs-rest estimation problems
corresponding to digits 0 to 9 in the digit recognition domain.
Table 2: Class ratios and model quality of digit recognition model on MNIST
digit
0
1
2
3
4
5
6
7
8
9
class ratio
Acc
Prec
Rec
F0.5
0.097
0.995
0.966
0.978
0.972
0.113
0.978
0.955
0.844
0.896
0.099
0.986
0.937
0.918
0.927
0.102
0.980
0.885
0.920
0.902
0.098
0.987
0.930
0.937
0.933
0.092
0.980
0.932
0.840
0.884
0.098
0.991
0.951
0.962
0.956
0.104
0.978
0.972
0.810
0.883
0.096
0.962
0.744
0.926
0.825
0.099
0.977
0.849
0.940
0.891
4
F0.5−measure (digit: 0)
activeerr
0.025
0.02
0.015
0.01
400
600
labeling costs n
activeerr
0.02
0.015
0.01
0.005
0
800
activeerr
0.04
0.02
200
400
600
labeling costs n
0.07
estimation error (absolute)
estimation error (absolute)
passive
activeF
0.06
0
800
0.04
0.03
0.02
0.06
activeerr
0.05
0.04
0.03
0.02
400
600
labeling costs n
0.06
estimation error (absolute)
estimation error (absolute)
0.07
200
800
0.03
0.02
0.01
0.06
activeerr
0.05
0.04
0.03
0.02
400
600
labeling costs n
0.07
estimation error (absolute)
estimation error (absolute)
0.07
200
800
0.04
0.03
0.02
0.05
0.04
0.03
0.02
200
400
600
labeling costs n
800
0.06
estimation error (absolute)
estimation error (absolute)
activeerr
0
400
600
labeling costs n
800
activeerr
0.03
0.02
0.01
200
400
600
labeling costs n
0.04
0.03
0.02
0.01
200
400
600
labeling costs n
800
0.07
passive
activeF
0.06
activeerr
0.05
0.04
0.03
0.02
0.01
200
400
600
labeling costs n
800
passive
activeF
0.08
activeerr
0.06
0.04
0.02
0
200
400
600
labeling costs n
800
Precision (digit: 4)
0.04
0
activeerr
0.08
passive
activeF
0.05
0.05
Precision (digit: 3)
activeerr
200
passive
activeF
0.1
0.05
0
800
0.06
0
F0.5−measure (digit: 4)
passive
activeF
0.06
800
passive
activeF
0.06
Recall (digit: 4)
0.07
400
600
labeling costs n
400
600
labeling costs n
Precision (digit: 2)
activeerr
200
200
0.08
0.04
0
0.01
0
F0.5−measure (digit: 3)
passive
activeF
0
800
passive
activeF
0.05
Recall (digit: 3)
0.08
400
600
labeling costs n
0.02
Precision (digit: 1)
activeerr
200
activeerr
0.03
0.07
0.05
0
passive
activeF
0.04
0
F0.5−measure (digit: 2)
passive
activeF
0
800
passive
activeF
0.06
Recall (digit: 2)
0.08
400
600
labeling costs n
F0.5−measure (digit: 1)
Recall (digit: 1)
0.1
0.08
200
estimation error (absolute)
200
0.025
estimation error (absolute)
0
passive
activeF
estimation error (absolute)
0.03
0.03
800
estimation error (absolute)
0.035
Precision (digit: 0)
0.05
0.035
estimation error (absolute)
estimation error (absolute)
passive
activeF
estimation error (absolute)
Recall (digit: 0)
0.04
0.07
passive
activeF
0.06
activeerr
0.05
0.04
0.03
0.02
0.01
0
200
400
600
labeling costs n
800
Figure 2: Digit Recognition: Estimation error over number of labeled data for recall, F0.5 and
precision estimates for digits 0 to 4 (from top to bottom). Error bars indicate the standard error.
5
F0.5−measure (digit: 5)
activeerr
0.08
0.06
0.04
0.02
800
activeerr
0.05
0.04
0.03
0.02
0
activeerr
0.04
0.03
0.02
0.01
0
200
400
600
labeling costs n
0.05
estimation error (absolute)
estimation error (absolute)
passive
activeF
0.05
800
0.02
0.01
0.06
0.04
0.02
800
activeerr
0.04
0.03
0.02
0.06
0.04
0.02
800
0.04
0.03
0.02
200
400
600
labeling costs n
0.01
200
800
activeerr
0.02
200
400
600
labeling costs n
0.06
passive
activeF
0.05
activeerr
0.04
0.03
0.02
0.01
200
400
600
labeling costs n
800
0.02
0.01
200
400
600
labeling costs n
800
passive
activeF
0.15
activeerr
0.1
0.05
0
200
400
600
labeling costs n
800
Precision (digit: 9)
activeerr
0.06
0.04
0.02
400
600
labeling costs n
activeerr
0.03
0.1
passive
activeF
200
800
passive
activeF
0.04
0
0.1
0.08
800
Precision (digit: 8)
0.04
0
400
600
labeling costs n
0.2
passive
activeF
0.06
0
estimation error (absolute)
estimation error (absolute)
activeerr
0.05
0
800
F0.5−measure (digit: 9)
passive
activeF
0.06
400
600
labeling costs n
0.08
Recall (digit: 9)
0.07
200
0.1
estimation error (absolute)
estimation error (absolute)
activeerr
400
600
labeling costs n
0.02
0
F0.5−measure (digit: 8)
passive
activeF
200
0.03
Precision (digit: 7)
0.05
Recall (digit: 8)
0
0.04
0
800
passive
activeF
0.06
0
0.1
0.08
activeerr
0.05
0.07
estimation error (absolute)
estimation error (absolute)
activeerr
0.08
400
600
labeling costs n
0.05
F0.5−measure (digit: 7)
passive
activeF
200
400
600
labeling costs n
passive
activeF
Precision (digit: 6)
activeerr
200
0.06
0.07
0.03
Recall (digit: 7)
0
800
passive
activeF
0.04
0
0.12
0.1
400
600
labeling costs n
F0.5−measure (digit: 6)
Recall (digit: 6)
0.06
200
estimation error (absolute)
400
600
labeling costs n
0.06
estimation error (absolute)
200
0.07
estimation error (absolute)
0
Precision (digit: 5)
0.07
passive
activeF
800
estimation error (absolute)
0.1
0.08
estimation error (absolute)
estimation error (absolute)
passive
activeF
estimation error (absolute)
Recall (digit: 5)
0.12
passive
activeF
0.08
activeerr
0.06
0.04
0.02
0
200
400
600
labeling costs n
800
Figure 3: Digit Recognition: Estimation error over number of labeled data for recall, F0.5 and
precision estimates for digits 5 to 9 (from top to bottom). Error bars indicate the standard error.
6
© Copyright 2026 Paperzz