Statistical Hypothesis Testing in Positive Unlabelled Data
Konstantinos Sechidis*, Borja Calvo**, Gavin Brown*
*School of Computer Science, University of Manchester, UK
**Department of Computer Science and Artificial Intelligence, University of the Basque Country, Spain

Scope of this work
We use hypothesis tests a lot.
• G-test and Chi-squared test (strongly related); Mutual Information too…
• Bayesian network structure learning: should there be an arc between node X and node Y?
• Feature Selection: should we select feature X?
• Decision trees: should we split on feature X?

Contributions of this work
Explores the dynamics of hypothesis testing in "positive-unlabelled" data.
1. Can we perform a valid hypothesis test in this situation?
2. Sample size determination: how many examples do I need?
3. "Supervision determination": how many labelled examples do I need?

"Positive-Unlabelled" data
Three labelling regimes over the same data:
• Fully labelled: every example has its true class label Y.
• Semi-supervised: some examples are labelled, the rest are not.
• Positive Unlabelled: a few positive examples are labelled; everything else is unlabelled.
Running example (X is a ternary feature, Y the true class):
X: 2 3 1 2 1 3 1 3 3 2
Y: 1 1 1 1 1 0 0 0 0 0

PU data applications
Many, many applications…
• Functional Genomics: "Gene 23 is associated with the disease." "The other ones… we don't know."
• Text Mining

Background: Hypothesis Tests
Step 1. Calculate the G-statistic from the data.
Step 2. Compare it to a critical value, obtained from a lookup table.
• If G > critical value: REJECT the null hypothesis, i.e. assume X and Y are related.
• Otherwise: ACCEPT the null hypothesis, i.e. assume X and Y are independent.
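The two steps above can be sketched in Python. This is a minimal illustration using the running example from the slides; `scipy`'s `chi2_contingency` with `lambda_="log-likelihood"` computes the G-statistic, and the 0.05 significance level is an illustrative choice, not one taken from the slides.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

X = [2, 3, 1, 2, 1, 3, 1, 3, 3, 2]  # ternary feature
Y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # class label

def contingency(x, y):
    """Observed-count table: rows index values of x, columns values of y."""
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    table = np.zeros((xi.max() + 1, yi.max() + 1), dtype=int)
    np.add.at(table, (xi, yi), 1)
    return table

# Step 1: the G-statistic (lambda_="log-likelihood" selects the G-test).
g, p_value, dof, _ = chi2_contingency(contingency(X, Y), lambda_="log-likelihood")

# Step 2: compare to the critical value of the chi-squared distribution.
alpha = 0.05
critical = chi2.ppf(1 - alpha, dof)
decision = "reject H0 (dependent)" if g > critical else "accept H0 (independent)"
```

On this tiny ten-example table the test accepts the null hypothesis: the sample is far too small for the observed association to be significant, which is exactly the sample-size question the rest of the talk addresses.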
Hypothesis Testing in PU data?
Assume all unlabelled examples are negative? Define a "surrogate" label S: s=1 if the example is labelled (a known positive), s=0 otherwise. Then use the test for G(X;S) instead of G(X;Y).

Theorem 1 implies…
• Testing with the surrogate G(X;S) will have an identical FALSE POSITIVE rate to the original (unobservable) test.
• However… testing with G(X;S) will have a higher FALSE NEGATIVE rate.
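The practical consequence of Theorem 1 can be seen numerically on the X/Y/S table from the slides: computing the G-statistic against the surrogate label S instead of the true label Y shrinks it drastically, so a real dependency is more easily missed. A minimal sketch (the helper recomputes the G-test as above):

```python
import numpy as np
from scipy.stats import chi2_contingency

X = [2, 3, 1, 2, 1, 3, 1, 3, 3, 2]
Y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # true labels (unobservable in PU data)
S = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]  # surrogate: s=1 only for labelled positives

def g_statistic(x, y):
    """G-statistic of independence between two discrete variables."""
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    table = np.zeros((xi.max() + 1, yi.max() + 1), dtype=int)
    np.add.at(table, (xi, yi), 1)
    g, _, _, _ = chi2_contingency(table, lambda_="log-likelihood")
    return g

g_xy = g_statistic(X, Y)  # statistic of the (unobservable) test
g_xs = g_statistic(X, S)  # statistic of the surrogate test: far smaller
```

Because the surrogate statistic is systematically smaller under dependence, the surrogate test needs more samples to reach the same power, which motivates the corrected sample-size determination that follows.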
X Y S
2 1 1
3 1 1
1 1 1
2 1 0
1 1 0
3 0 0
1 0 0
3 0 0
3 0 0
2 0 0

That is… if X and Y are truly related, we may miss that dependency by using the surrogate test G(X;S). Technical explanation for this…

Sample Size Determination
I want my statistical test to detect an "effect" as small as I(X;Y) = 0.053 and have…
• a FALSE POSITIVE rate of 0.01
• a FALSE NEGATIVE rate of 0.01
How many examples (N) do I need? Fully supervised test G(X;Y): N = 227.

Standard sample size determination does not apply to the surrogate test G(X;S), so we derive a correction factor, kappa (κ), for the non-centrality parameter of the test. The G(X;S) test with sample size N/κ obtains identical FPR and FNR to the (unobservable) test G(X;Y). κ depends on p(s=1), the probability of being labelled, and the prior probability p(y=1).

PU sample size determination, assuming we know p(y=1): the fully supervised N = 227 becomes N = 1077 for the surrogate test.

PU sample size determination, with uncertainty over p(y=1): place a Beta distribution over p(y=1) and use Monte Carlo simulation to determine the posterior over the sample size N; an analytical solution seems to be intractable. [Figures: belief over p(y=1); minimum N for the specified test when p(s=1) = 0.05.]

PU "supervision determination"
Use a similar methodology to before, except we fix N and solve for p(s=1). From a total of N = 1000 examples, you need only 57 labels; the rest can be unlabelled. If uncertain over p(y=1), use the same Monte Carlo approach.

Example for Practitioner: supervision determination
Design an experiment to observe…
• a medium effect of I(X;Y) = 0.045 nats
• with False Positive Rate = 0.01
• with False Negative Rate = 0.05
…
• having N = 3000 examples (Positive Unlabelled)
• knowing the prior to be p(y=1) = 0.20
Result: of the 3000 examples, only 49 need to be labelled positives; the other 2951 can stay unlabelled.

Conclusions
1. We can perform a valid hypothesis test in PU data by treating all unlabelled examples as negatives.
2. Sample size determination: how many examples we need to collect.
3. "Supervision determination": how many labelled examples we need to collect.

Thank you for your attention!
www.cs.man.ac.uk/~gbrown/posunlabelled/

Example for Practitioner: sample size determination
Design an experiment to observe…
• a medium effect of I(X;Y) = 0.045 nats
• with False Positive Rate = 0.01
• with False Negative Rate = 0.20
Traditional, fully supervised: 130 labelled examples.
Positive Unlabelled, with 5% labelling and knowing the prior to be p(y=1) = 0.20: 617 examples • 31 positives • 586 unlabelled.
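The sample-size examples can be reproduced with a short power-analysis script. The G-statistic is asymptotically non-central chi-squared with non-centrality λ = 2·N·I(X;Y) (mutual information in nats). The slides do not print the formula for κ, so the version below is a reconstruction: the quoted figures are consistent with a one-degree-of-freedom test and κ = p(s=1)(1−p(y=1)) / (p(y=1)(1−p(s=1))); treat both as assumptions of this sketch.

```python
from math import ceil
from scipy.optimize import brentq
from scipy.stats import chi2, ncx2

def required_noncentrality(df, alpha, beta):
    """Smallest non-centrality lambda giving false-positive rate alpha
    and false-negative rate beta for a chi-squared test with df dof."""
    crit = chi2.ppf(1 - alpha, df)
    return brentq(lambda lam: ncx2.sf(crit, df, lam) - (1 - beta), 1e-6, 1e3)

def kappa(p_s, p_y):
    # Correction factor for the surrogate test G(X;S); reconstructed
    # from the slides' worked numbers, not stated on the slides.
    return p_s * (1 - p_y) / (p_y * (1 - p_s))

def sample_size(effect_nats, alpha, beta, df=1, p_s=None, p_y=None):
    """N needed to detect an effect I(X;Y) = effect_nats; pass p_s and
    p_y to size the positive-unlabelled (surrogate) test instead."""
    lam = required_noncentrality(df, alpha, beta)
    k = 1.0 if p_s is None else kappa(p_s, p_y)
    return ceil(lam / (2 * k * effect_nats))

# First example: I(X;Y) = 0.053, FPR = FNR = 0.01
print(sample_size(0.053, 0.01, 0.01))                      # fully supervised: 227
print(sample_size(0.053, 0.01, 0.01, p_s=0.05, p_y=0.20))  # PU: 1077

# Practitioner example: I(X;Y) = 0.045, FPR = 0.01, FNR = 0.20
print(sample_size(0.045, 0.01, 0.20))                      # fully supervised: 130
print(sample_size(0.045, 0.01, 0.20, p_s=0.05, p_y=0.20))  # PU: 617
```

Under these assumptions the script returns 227/1077 and 130/617, matching the numbers on the slides; the ratio between the supervised and PU sample sizes is exactly 1/κ.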
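Supervision determination (fix N, solve for p(s=1)) follows by inverting the same correction factor: the κ we can afford is λ/(2·N·I), and solving κ = p_s(1−p_y)/(p_y(1−p_s)) for p_s gives p_s = κ·p_y/(1−p_y+κ·p_y). As before, the κ formula is reconstructed from the slides' worked numbers, and the Beta prior parameters in the Monte Carlo part are illustrative assumptions, not values from the slides.

```python
from math import ceil
import numpy as np
from scipy.optimize import brentq
from scipy.stats import chi2, ncx2

def required_noncentrality(df, alpha, beta):
    crit = chi2.ppf(1 - alpha, df)
    return brentq(lambda lam: ncx2.sf(crit, df, lam) - (1 - beta), 1e-6, 1e3)

def labels_needed(n_total, effect_nats, alpha, beta, p_y, df=1):
    """Fix the total N and solve for p(s=1): how many labelled positives
    the specified test needs; the rest of the N examples stay unlabelled."""
    k = required_noncentrality(df, alpha, beta) / (2 * n_total * effect_nats)
    p_s = k * p_y / (1 - p_y + k * p_y)  # invert kappa(p_s, p_y) = k
    return ceil(n_total * p_s)

# Practitioner example: I(X;Y) = 0.045 nats, FPR = 0.01, FNR = 0.05,
# N = 3000, known prior p(y=1) = 0.20
positives = labels_needed(3000, 0.045, 0.01, 0.05, p_y=0.20)
print(positives)  # 49 labelled positives, 2951 unlabelled

# Uncertainty over p(y=1): Beta prior + Monte Carlo, as on the slides
# (Beta(2, 8) has mean 0.20; the parameters are an illustrative choice).
rng = np.random.default_rng(0)
draws = [labels_needed(3000, 0.045, 0.01, 0.05, p_y=py)
         for py in rng.beta(2, 8, size=2000)]
conservative = float(np.percentile(draws, 95))  # label budget covering 95% of the prior
```

A larger assumed p(y=1) demands more labelled positives, so reporting an upper quantile of the Monte Carlo draws gives a label budget that is robust to uncertainty in the class prior.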