LEARNING VALID ADVERB-ADJECTIVE PAIRS
CAROLINE SUEN
CS224U WINTER 2013

THE CHALLENGE
We can say:
• “The glass is half full.”
• or “Wow, Bob is really tall.”
But can we say:
• “Wow, Bob is half tall.”
• or “The glass is really full.”?
Goal: develop a model that can learn whether an adverb and an adjective can be used together and make grammatical sense.

PRIOR WORK
Syrett and Lidz (2010)
• Use linguistics to develop patterns
Sentiment analysis
• Benamara et al. (2007), Liu et al. (2009)
Adjective-noun pairs
• Hatzivassiloglou et al. (1993)

EXTRACTING DATA
          half  completely  extremely  nearly
full         5           3          3       1
tall         0           0          4       0
smart        0           1          4       0
daylong      0           0          0       1
• New York Times dataset, ~18000 articles
• Stanford POS tagger to find valid adverb-adjective pairs
• 1019 adverbs, 4876 adjectives, 19337 pairs

BUILDING A GRAPH
[Figure: bipartite graph linking adverbs (half, completely, extremely, nearly) to adjectives (full, tall, smart, daylong)]
Relatively sparse bipartite graph

PARTITIONING
[Figure: the same bipartite graph of adverbs and adjectives, partitioned]

BUILDING A GRAPH: TECHNICAL DETAILS
• Used Stanford Network Analysis Platform
• Experimented:
  • Find dense bipartite subgraphs using the frequent itemset algorithm
  • Build adverb graphs and adjective graphs and run community detection algorithms on these graphs
    • Based on common neighbors
[Figure: adjective graph (full, tall, smart, daylong) and adverb graph (half, completely, extremely, nearly)]

CLIQUE PERCOLATION
[Figure: clique percolation illustration, from Wikipedia]

CLASSIFY: DOES AN EDGE BELONG?
Use the communities that adverb u and adjective v are in. If, by combining these communities, the edge density is sufficiently high, we claim that u and v can be paired up.
Harder case:
• An adverb is in communities C1 and C2. How likely is it to be connected to an adjective in communities D1, D2, and D3?
• Thankfully, this is rare!
• Larger and more densely connected communities are given higher weight

EVALUATION: RECALL
• Find “test data” (1100 edges); the remaining edges are “training data”
• Find communities based on training data
• Observe fraction of test data edges recovered

EVALUATION: RECALL
• Not enough connections: 260 (23.6%)
• Not discovered by community detection algorithm: 129 (11.7%)
• Correctly discovered by community detection algorithm: 711 (64.6%)

CHALLENGES + NEXT STEPS
• Not enough pairings
  • (recall for test data with enough connections: 84.6%)
• Clique percolation is slow
  • priority was building evaluation framework first
  • next steps: experimenting with clustering
• Adjective edge connections are much more important than adverb connections
• Current framework does not test precision
  • MTurk for crowd-sourced, hand-labeled data
• Potential next step:
  • Check Syrett and Lidz’s linguistic results

THE END
THANKS FOR LISTENING! :)
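
The graph-building and classification steps described in the slides can be illustrated with a small, self-contained Python sketch. It is not the author's implementation: the talk used the Stanford Network Analysis Platform and clique percolation, whereas this sketch uses only the standard library and swaps in connected components as a much simpler stand-in for community detection. The function names, the `min_common` cutoff, and the 0.5 density threshold are all hypothetical choices made for illustration, and the sketch assumes every word falls in exactly one community, so it does not cover the "harder case" of overlapping communities or the weighting of larger communities.

```python
from collections import defaultdict
from itertools import combinations

# Toy counts copied from the EXTRACTING DATA slide: (adverb, adjective) -> count.
COUNTS = {
    ("half", "full"): 5, ("completely", "full"): 3,
    ("extremely", "full"): 3, ("nearly", "full"): 1,
    ("extremely", "tall"): 4, ("completely", "smart"): 1,
    ("extremely", "smart"): 4, ("nearly", "daylong"): 1,
}

def neighbors(edges, side):
    """Map each node on one side (0 = adverb, 1 = adjective) to its partners."""
    nbrs = defaultdict(set)
    for edge in edges:
        nbrs[edge[side]].add(edge[1 - side])
    return nbrs

def project(nbrs, min_common=1):
    """Same-side graph: link two nodes that share >= min_common neighbors."""
    graph = {node: set() for node in nbrs}
    for a, b in combinations(nbrs, 2):
        if len(nbrs[a] & nbrs[b]) >= min_common:
            graph[a].add(b)
            graph[b].add(a)
    return graph

def components(graph):
    """Stand-in community detection: connected components, NOT clique percolation."""
    seen, result = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(graph[node] - comp)
        seen |= comp
        result.append(comp)
    return result

def edge_density(adverbs, adjectives, edges):
    """Fraction of possible adverb-adjective pairs that are observed edges."""
    present = sum((u, v) in edges for u in adverbs for v in adjectives)
    return present / (len(adverbs) * len(adjectives))

def can_pair(u, v, adv_comms, adj_comms, edges, threshold=0.5):
    """Predict a pairing from the density between u's and v's communities.

    The threshold is a hypothetical value; words absent from every
    community raise StopIteration in this sketch.
    """
    adverbs = next(c for c in adv_comms if u in c)
    adjectives = next(c for c in adj_comms if v in c)
    return edge_density(adverbs, adjectives, edges) >= threshold

edges = set(COUNTS)
adv_comms = components(project(neighbors(edges, 0), min_common=2))
adj_comms = components(project(neighbors(edges, 1), min_common=2))
print(can_pair("completely", "smart", adv_comms, adj_comms, edges))  # True
print(can_pair("half", "tall", adv_comms, adj_comms, edges))         # False
```

On the toy table, requiring two common neighbors groups "completely" with "extremely" and "full" with "smart", so pairs between those groups are accepted while "half tall" is rejected. Real data would need a genuine community detection step and a held-out edge set to measure recall as in the evaluation slides.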