Task 4 Winners

Exploiting Relational Structure to Understand Publication Patterns in High-Energy Physics
Amy McGovern, Lisa Friedland, Michael Hay,
Brian Gallagher, Andrew Fast,
Jennifer Neville, David Jensen
Knowledge Discovery Laboratory
University of Massachusetts Amherst
Knowledge Discovery Process
• Data cleaning
• Data extraction
• Data analysis
– Citation analysis
– Identifying research communities
– Data dependencies
– Understanding author influence
– Predicting journal publication
Implemented using KDL’s PROXIMITY software
Data cleaning and extraction
• Extracted abstracts
• Consolidated authors
– Same name assumed
– 13,185 authors to 9,200
– Co-authored with similar names
– Authors of referenced papers with similar names
– Authors with similar email domains and the same username
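The consolidation heuristics above can be sketched as a union-find merge over author records. This is only an illustrative sketch, not the PROXIMITY implementation: the record fields (`name`, `email`, `coauthors`), the name-similarity rule (same last name and first initial), and the reading of "similar email domains" as "same last two domain labels" are all assumptions made for the example.

```python
from itertools import combinations


def name_key(name):
    """Crude similarity key (assumption): last name plus first initial."""
    parts = name.replace(".", " ").split()
    return (parts[-1].lower(), parts[0][0].lower())


def same_mailbox(e1, e2):
    """True if both emails share a username and the last two domain labels."""
    if not e1 or not e2:
        return False
    user1, _, dom1 = e1.lower().partition("@")
    user2, _, dom2 = e2.lower().partition("@")
    return user1 == user2 and dom1.split(".")[-2:] == dom2.split(".")[-2:]


def consolidate(records):
    """Group record indices that likely denote the same person.

    records: list of dicts with hypothetical keys 'name', 'email', 'coauthors'.
    Two records are merged when their names are similar and they either
    share a co-author or share a mailbox.
    """
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        a, b = records[i], records[j]
        if name_key(a["name"]) != name_key(b["name"]):
            continue
        shared_coauthor = bool(set(a["coauthors"]) & set(b["coauthors"]))
        if shared_coauthor or same_mailbox(a.get("email"), b.get("email")):
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())
```

Requiring a second piece of evidence (co-author or mailbox) on top of name similarity keeps distinct people with the same initials apart, at the cost of missing some true duplicates.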
Relational schema
Data dependencies
• Examples of high correlations:
– Number of downloads in first 60 days and number of citations
– Is paper published and number of citations (binned)
• Examples of high autocorrelation:
– Journal name (through author)
– Topic cluster of paper (through author)
– Author’s total co-authors (through paper)
– Number of downloads in first 60 days (through journal)
Low autocorrelation: + – + – +
High autocorrelation: – – + + +
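Relational autocorrelation can be read as the correlation between the values of one attribute on pairs of related objects. The sketch below computes a Pearson correlation over linked pairs. Treating the two +/– sequences as chains in which neighbouring items are linked, and coding + as +1 and – as −1, are assumptions made for illustration. The slides do not state which statistic was used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)  # undefined (division by zero) if a list is constant


def relational_autocorrelation(values, links):
    """Correlation of an attribute across linked objects.

    values: dict mapping object id -> numeric attribute value
    links:  iterable of (a, b) pairs of related objects
    """
    xs, ys = [], []
    for a, b in links:
        # Count each link in both directions so the measure is symmetric.
        xs += [values[a], values[b]]
        ys += [values[b], values[a]]
    return pearson(xs, ys)


low = [+1, -1, +1, -1, +1]    # "+ – + – +"
high = [-1, -1, +1, +1, +1]   # "– – + + +"
chain = [(i, i + 1) for i in range(4)]  # assumed: neighbours are linked
r_low = relational_autocorrelation(dict(enumerate(low)), chain)    # -1.0
r_high = relational_autocorrelation(dict(enumerate(high)), chain)  # about 0.47
```

For an attribute such as journal name "through author", the links would instead pair papers that share an author.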
Influential Authors
Author               Non-self citations   Number in top 50   Number of papers   Awards
Edward Witten        13806                14                 59                 Fields, MacArthur, Dirac
Juan M. Maldacena    7334                 6                  39                 MacArthur, Packard, Sackler
Cumrun Vafa          6578                 3                  55                 Packard
Nathan Seiberg       6528                 3                  45                 MacArthur
Andrew Strominger    5371                 3                  44
Michael R. Douglas   5089                 5                  24
Igor R. Klebanov     5063                 4                  51                 Sackler
20% of physicists receive 80% of the citations
Influential authors are more connected
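A concentration claim of this kind can be checked by ranking authors by citation count and measuring the share held by the top fraction. The function below is a minimal sketch. The name `top_share` and the rounding rule for the cut-off are assumptions, and the full per-author citation distribution is not reproduced in these slides.

```python
def top_share(citation_counts, fraction=0.2):
    """Share of all citations received by the top `fraction` of authors."""
    total = sum(citation_counts)
    if total == 0:
        raise ValueError("need at least one citation")
    ranked = sorted(citation_counts, reverse=True)
    k = max(1, round(len(ranked) * fraction))  # number of authors in the top group
    return sum(ranked[:k]) / total
```

Applied to a list of per-author citation counts, a result near 0.8 for `fraction=0.2` would correspond to the statement above.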
Will a paper be accepted by Physics Letters B?
• Papers from 1995-2000
• 68% accuracy, 0.75 AUC
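For reference, the two reported metrics can be computed from a classifier's scores as sketched below. The 0.5 decision threshold and the toy labels and scores are assumptions. The slides do not describe the model itself, so none is shown here.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of papers whose thresholded score matches the true label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a random
    positive example is scored above a random negative one (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = published in the journal, 0 = not.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.1]
toy_accuracy = accuracy(toy_labels, toy_scores)
toy_auc = auc(toy_labels, toy_scores)
```

Accuracy depends on the chosen threshold, whereas AUC summarises ranking quality across all thresholds, which is why the two numbers are reported together.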
Identifying Research Communities
• Spectral clustering on citation graph and abstracts
• Papers from 1995 to 2000
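As a rough illustration of spectral clustering on a citation graph, the sketch below splits an undirected graph into two groups by the sign of the Fiedler vector (the eigenvector for the second-smallest eigenvalue of the graph Laplacian), found here by shifted power iteration. It is a simplified stand-in, not the method used in this work: it ignores abstracts, treats citations as undirected, and produces only two clusters. The function name and parameters are hypothetical.

```python
def fiedler_partition(n, edges, iterations=500):
    """Two-way spectral split of an undirected graph.

    n:     number of nodes, labelled 0..n-1
    edges: iterable of (u, v) pairs (for example, paper u cites paper v)
    Returns a list of cluster labels (0 or 1), one per node.
    """
    if n < 2:
        raise ValueError("need at least two nodes")
    adj = [set() for _ in range(n)]
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    deg = [len(a) for a in adj]
    # Laplacian eigenvalues are at most 2 * max degree, so shift*I - L is
    # positive definite and its dominant eigenvectors are L's smallest ones.
    shift = 2 * max(deg) + 1

    x = [float(i) for i in range(n)]  # deterministic start vector
    for _ in range(iterations):
        mean = sum(x) / n
        x = [xi - mean for xi in x]   # project out the constant eigenvector
        # y = (shift*I - L) x, where L = D - A
        y = [(shift - deg[i]) * x[i] + sum(x[j] for j in adj[i])
             for i in range(n)]
        norm = sum(yi * yi for yi in y) ** 0.5
        x = [yi / norm for yi in y]
    return [0 if xi < 0 else 1 for xi in x]
```

On two tightly linked groups of papers joined by a single citation, the sign split separates the groups. More clusters would typically use several eigenvectors followed by k-means.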
Example topic clusters
Cluster 2: Black hole approach to string theory:
Sumit R. Das (251), Physical Review D
Absorption of Fixed Scalars and the D-brane Approach to Black Holes
Universal Low-Energy Dynamics for Rotating Black Holes
Interactions involving D-branes
Black Hole Greybody Factors and D-Brane Spectroscopy
Cluster 10: Tachyon Condensation:
Juan M. Maldacena (1924), Journal of High Energy Physics
Field theory models for tachyon and gauge field string dynamics
Super-Poincare Invariant Superstring Field Theory
Level Four Approximation to the Tachyon Potential in Superstring Field Theory
SO(32) Spinors of Type I and Other Solitons on Brane-Antibrane Pair
KDD Cup 2003 Paper: kdl.cs.umass.edu/papers/kddcup2003.html
Proximity: kdl.cs.umass.edu/proximity/
Email: [email protected]