Exploiting Relational Structure to Understand Publication Patterns in High-Energy Physics

Amy McGovern, Lisa Friedland, Michael Hay, Brian Gallagher, Andrew Fast, Jennifer Neville, David Jensen
Knowledge Discovery Laboratory
University of Massachusetts Amherst

Knowledge Discovery Process
• Data cleaning
• Data extraction
• Data analysis
  – Citation analysis
  – Identifying research communities
  – Data dependencies
  – Understanding author influence
  – Predicting journal publication
Implemented using KDL’s PROXIMITY software

Data cleaning and extraction
• Extracted abstracts
• Consolidated authors
  – Same name assumed
  – 13,185 authors to 9,200
  – Co-authored with similar names
  – Authors of referenced papers with similar names
  – Authors with similar email domains and the same username

Relational schema
[Figure: relational schema]

Data dependencies
• Examples of high correlations:
  – Number of downloads in first 60 days and number of citations
  – Whether the paper is published and number of citations (binned)
• Examples of high autocorrelation:
  – Journal name (through author)
  – Topic cluster of paper (through author)
  – Author’s total co-authors (through paper)
  – Number of downloads in first 60 days (through journal)
[Figure: two panels of + and – labels, titled "Low autocorrelation" and "High autocorrelation"]

Influential Authors

Author               Non-self citations  Number in top 50  Number of papers  Awards
Edward Witten        13806               14                59                Fields, MacArthur, Dirac
Juan M. Maldacena    7334                6                 39                MacArthur, Packard, Sackler
Cumrun Vafa          6578                3                 55                Packard
Nathan Seiberg       6528                3                 45                MacArthur
Andrew Strominger    5371                3                 44
Michael R. Douglas   5089                5                 24
Igor R. Klebanov     5063                4                 51                Sackler

• 20% of physicists receive 80% of the citations
• Influential authors are more connected

Will a paper be accepted by Physics Letters B?
• Papers from 1995-2000
• 68% accuracy, 0.75 AUC

Identifying Research Communities
• Spectral clustering on citation graph and abstracts
• Papers from 1995 to 2000

Example topic clusters

Cluster 2: Black hole approach to string theory: Sumit R. Das (251), Physical Review D
  Absorption of Fixed scalars and the D-brane Approach to Black Holes
  Universal Low-Energy Dynamics for Rotating Black Holes
  Interactions involving D-branes
  Black Hole Greybody Factors and D-Brane Spectroscopy

Cluster 10: Tachyon Condensation: Juan M. Maldacena (1924), Journal of High Energy Physics
  Field theory models for tachyon and gauge field string dynamics
  Super-Poincare Invariant Superstring Field Theory
  Level Four Approximation to the Tachyon Potential in Superstring Field Theory
  SO(32) Spinors of Type I and Other Solitons on Brane-Antibrane Pair

KDD Cup 2003
Paper: kdl.cs.umass.edu/papers/kddcup2003.html
Proximity: kdl.cs.umass.edu/proximity/
Email: [email protected]
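
Illustrative sketch: autocorrelation "through" a linking object

The Data dependencies entries of the form "attribute (through object)" can be read as follows: take every pair of items that share a linking object (for example, two papers with a common author) and measure how strongly the attribute values of the two items agree. The sketch below does this for a numeric attribute using the Pearson correlation over linked pairs. It is a minimal illustration under stated assumptions, not the measure implemented in PROXIMITY: the function names, the choice of Pearson correlation, and the toy data (paper ids, author ids, download counts) are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def linked_pairs(groups):
    """Yield each unordered pair of items that share at least one group."""
    seen = set()
    for members in groups.values():
        for a, b in combinations(sorted(set(members)), 2):
            if (a, b) not in seen:
                seen.add((a, b))
                yield a, b


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def relational_autocorrelation(values, groups):
    """Correlate an attribute with itself across items linked through a group.

    values: item id -> numeric attribute value (hypothetical data)
    groups: linking object id -> list of item ids it connects
    """
    xs, ys = [], []
    for a, b in linked_pairs(groups):
        # Add both orderings so the result does not depend on pair order.
        xs += [values[a], values[b]]
        ys += [values[b], values[a]]
    if not xs:
        return 0.0
    return pearson(xs, ys)


# Hypothetical toy data: download counts for four papers.
downloads = {"p1": 10, "p2": 12, "p3": 100, "p4": 110}

# Each author's papers have similar download counts.
same_level = {"author_a": ["p1", "p2"], "author_b": ["p3", "p4"]}
# Each author has one low-download and one high-download paper.
mixed = {"author_a": ["p1", "p3"], "author_b": ["p2", "p4"]}

high = relational_autocorrelation(downloads, same_level)
low = relational_autocorrelation(downloads, mixed)
print(f"similar papers per author:    {high:.2f}")
print(f"dissimilar papers per author: {low:.2f}")
```

With the first grouping the score is close to +1 (high autocorrelation); with the second it is strongly negative. Categorical attributes such as journal name or topic cluster would need a different agreement measure, which this sketch does not cover.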