Automatic extraction of synonyms in a dictionary Vincent D. Blondel Pierre P. Senellart April 13th 2002 The dictionary graph Computation (n.) The act or process of computing; calculation; reckoning. Computation (n.) The result of computation; the amount computed. Computed (imp. & p. p.) of Compute Computing (p. pr. & vb. n.) of Compute Compute (v.t.) To determine calculation; to reckon; to count. Compute (n.) Computation. Computer (n.) One who computes. 1 Computed Computation Compute Computer Computing Rest of the graph 2 Looking for near-synonyms Definition. The neighborhood graph of a node i in a directed graph G is the subgraph consisting of i, all parents of i and all children of i. i is some word we want a synonym of. A will be the adjacency matrix of the neighborhood graph of i in the dictionary graph. n is the order of A. 3 Subgraph of the neighborhood graph of likely invidious likely adapted giving truthy probable verisimilar belief probably 4 Hubs and Authorities on the Web The Web as a graph: • vertices=web pages • edges=links Hub −→ Authority A mutually reinforcing relationship: good hubs are pages that point to good authorities and good authorities are pages pointed by good hubs. 5 Kleinberg’s algorithm x1i : iteratively computed hub weights x2i : iteratively computed authority weights 1 x10 = x20 = . . . 1 1 x x2 0 A AT 0 t = 0, 1, . . . = t+1 1 x x2 t The principal eigenvectors of AT A and AAT give respectively the authority weights and hub weights of the vertices of the graph. 6 An extension of Kleinberg’s algorithm Let M (m, m) and N (n, n) be the transition matrices of two oriented graphs. Let C = M ⊗ N + M T ⊗ N T where ⊗ is the Kronecker tensorial product. We assume that the greatest eigenvalue of C is strictly greater than the absolute value of all other eigenvalues. Then, the normalized principal eigenvector X of C gives the ”similarity” between a vertex of M and a vertex of N : Xi×n+j characterizes the similarity between vertex i of M and vertex j of N . 0 1 In particular, if M = , the result is that of Kleinberg’s algorithm. 0 0 7 Application to the search for synonyms 1 −→ 2 −→ 3 We are looking for vertices “like 2” in the neighborhood graph of i. 0 1 0 Let C = M ⊗ A + M T ⊗ AT where M = 0 0 1 . 0 0 0 The principal eigenvector of C gives the similarity between a node in G and a node in the graph 1 −→ 2 −→ 3. We just select the subvector corresponding to the vertex 2 in order to have synonymy weights. 8 The vectors method For each 1 ≤ j ≤ n, j 6= i, compute: k(Ai,· − Aj,·)k + k(A·,i − A·,j )T k (where k k is some vector norm, Ai,· and A·,i are respectively the ith line and the ith column of A). For instance, if we choose the Euclidean norm, we compute: n X k=1 (Ai,k − Aj,k )2 ! 21 + n X (Ak,i − Ak,j )2 ! 12 k=1 The lower this value is, the better j is a synonym of i. 9 ArcRank PageRank (Google): stationary distribution of weights over vertices corresponding to the principal eigenvector of the adjacency matrix. ArcRank: rs,t ps/|as| = pt |as| is the outdegree of s. pt is the pagerank of t. The best synonyms of i are the other extremity of the best-ranked arcs arriving to or leaving from i. 10 Extraction of the graph • Multiwords (e.g. All Saints’, Surinam toad) • Prefixes and suffixes (e.g un-, -ous) • Different meanings of a word • Derived forms (e.g. daisies, sought) • Accentuated characters (e.g. proven/al, cr/che) • Misspelled words 112, 169 vertices - 1, 398, 424 arcs. 11 Lexical units 13, 396 lexical units not defined in the dictionary: • Numbers (e.g. 14159265, 14th) • Mathematical and chemical symbols (e.g. x3, fe3o4) • Proper nouns (e.g. California, Aaron) • Misspelled words (e.g. aligator, abudance) • Undefined words (e.g. snakelike, unwound) • Abbreviations (e.g. adj, etc) 12 Too frequent words of a the or to in as and an by one with which 68187 47500 43760 41496 31957 23999 22529 16781 14027 12468 12216 10944 10446 13 Parts of speech 305 different categories transformed into combinations of: • noun • adjective • adverb • verb • other (articles, conjunctions, interjections. . . ) 14 Disappear 1 2 3 4 5 6 7 8 9 10 Mark Std dev. Vectors vanish wear die sail faint light port absorb appear cease 3.6 1.8 Kleinberg vanish pass die wear faint fade sail light dissipate cease 6.3 1.7 ArcRank epidemic disappearing port dissipate cease eat gradually instrumental darkness efface 1.2 1.2 Wordnet vanish go away end finish terminate cease 7.5 1.4 Microsoft Word vanish cease to exist fade away die out go evaporate wane expire withdraw pass away 8.6 1.3 Table 1: Near-synonyms for disappear 15 Parallelogram 1 2 3 4 5 6 7 8 9 10 Mark Std dev. Vectors square parallel rhomb prism figure equal quadrilateral opposite altitude parallelopiped 4.6 2.7 Kleinberg square rhomb parallel figure prism equal opposite angles quadrilateral rectangle 4.8 2.5 ArcRank quadrilateral gnomon right-lined rectangle consequently parallelopiped parallel cylinder popular prism 3.3 2.2 Wordnet quadrilateral quadrangle tetragon Microsoft Word diamond lozenge rhomb 6.3 2.5 5.3 2.6 Table 2: Near-synonyms for parallelogram 16 Sugar 1 2 2 4 5 6 7 8 9 10 Mark Std dev. Vectors juice starch cane milk molasses sucrose wax root crystalline confection 3.9 2.0 Kleinberg cane starch sucrose milk sweet dextrose molasses juice glucose lactose 6.3 2.4 ArcRank granulation shrub sucrose preserve honeyed property sorghum grocer acetate saccharine 4.3 2.3 Wordnet sweetening sweetener carbohydrate saccharide organic compound saccarify sweeten dulcify edulcorate dulcorate 6.2 2.9 Microsoft Word darling baby honey dear love dearest beloved precious pet babe 4.7 2.7 Table 3: Near-synonyms for sugar 17 Science 1 2 3 4 5 6 7 8 9 10 Mark Std dev. Vectors art branch nature law knowledge principle life natural electricity biology 3.6 2.0 Kleinberg art branch law study practice natural knowledge learning theory principle 4.4 2.5 ArcRank formulate arithmetic systematize scientific knowledge geometry philosophical learning expertness mathematics 3.2 2.9 Wordnet knowledge domain knowledge base discipline subject subject area subject field field field of study ability power 7.1 2.6 Microsoft Word discipline knowledge skill art 6.5 2.4 Table 4: Near-synonyms for science 18 Perspectives • Extension of the subgraph • Other dictionaries, other languages • Other applications of the extension of Kleinberg’s algorithm 19
© Copyright 2026 Paperzz