Neural networks
Natural language processing - convolutional network

WORD TAGGING
Topics: word tagging
• In many NLP applications, it is useful to augment text data with syntactic and semantic information
 ‣ we would like to add syntactic/semantic labels to each word
• This problem can be tackled using a conditional random field with neural network unary potentials
 ‣ we will describe the model developed by Ronan Collobert and Jason Weston in:
   A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning, Collobert and Weston, 2008
   (see Natural Language Processing (Almost) from Scratch for the journal version)

SENTENCE NEURAL NETWORK
Topics: sentence convolutional network
• How to model each label sequence
 ‣ could use a CRF with neural network unary potentials, based on a window (context) of words
  - not appropriate for semantic role labeling, because the relevant context might be very far away
 ‣ Collobert and Weston suggest a convolutional network over the whole sentence
  - the prediction at a given position can then exploit information from any word in the sentence
(Figure 2: Sentence approach network. Padded input sentence ‘‘The cat sat on the mat’’ with features 1..K per word → lookup tables LT_W1 ... LT_WK of dimension d → convolution M1 (n1_hu units) → max over time max(·) → linear M2 (n2_hu) → HardTanh → linear M3 (n3_hu = #tags).)

SENTENCE NEURAL NETWORK
Topics: sentence convolutional network
• Each word can be represented by more than one feature
 ‣ a feature for the word itself
 ‣ substring features
  - prefix: ‘‘eating’’ → ‘‘eat’’
  - suffix: ‘‘eating’’ → ‘‘ing’’
 ‣ gazetteer features
  - whether the word belongs to a list of known locations, persons, etc.
• These features are treated like word features, with their own lookup tables

SENTENCE NEURAL NETWORK
Topics: sentence convolutional network
• Features must encode for which word we are making a prediction
 ‣ done by adding the relative position i - pos_w, where pos_w is the position of the current word
 ‣ this feature also has its own lookup table
• For SRL, the network must also know for which verb we are predicting the roles
 ‣ also add the relative position of that verb, i - pos_v

SENTENCE NEURAL NETWORK
Topics: sentence convolutional network
• Lookup table:
 ‣ for each word, concatenate the representations of its features
• Convolution:
 ‣ at every position, compute linear activations from a window of representations
 ‣ this is a convolution in 1D
• Max pooling:
 ‣ obtain a fixed-size hidden layer with a max across positions
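A minimal sketch of these three steps, assuming PyTorch; the sizes, the choice of three feature types (word, suffix, relative position), and names like LookupConvPool are illustrative, not taken from the paper:

```python
import torch
import torch.nn as nn

# Illustrative sizes (not from the paper)
VOCAB_SIZES = [10000, 500, 100]   # word ids, suffix ids, relative-position ids
EMB_DIMS    = [50, 5, 5]          # one lookup table per feature type
WINDOW      = 5                   # convolution window (d_win)
N_HIDDEN    = 300                 # n1_hu in Figure 2

class LookupConvPool(nn.Module):
    def __init__(self):
        super().__init__()
        # One lookup table per feature; their outputs are concatenated per word
        self.tables = nn.ModuleList(
            nn.Embedding(v, d) for v, d in zip(VOCAB_SIZES, EMB_DIMS)
        )
        d = sum(EMB_DIMS)
        # 1D convolution over word positions: a linear map of each window
        self.conv = nn.Conv1d(d, N_HIDDEN, kernel_size=WINDOW,
                              padding=WINDOW // 2)

    def forward(self, feats):
        # feats: list of K LongTensors, each of shape (batch, sentence_length)
        x = torch.cat([t(f) for t, f in zip(self.tables, feats)], dim=-1)
        x = x.transpose(1, 2)      # (batch, d, length), as Conv1d expects
        h = self.conv(x)           # linear activations at every position
        pooled, _ = h.max(dim=2)   # max over time -> fixed-size vector
        return pooled              # (batch, N_HIDDEN)
```

Note that each convolution window computes a purely linear activation, matching the slide: the nonlinearity only appears after pooling, in the layers described next. The padding mirrors the ‘‘Padding’’ blocks in Figure 2, so every word position gets a window.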
SENTENCE NEURAL NETWORK
Topics: sentence convolutional network
• Regular neural network:
 ‣ the pooled representation serves as the input of a regular neural network
 ‣ they proposed using a ‘‘hard’’ version of the tanh activation function:
   HardTanh(x) = -1 if x < -1, x if -1 ≤ x ≤ 1, 1 if x > 1
• The outputs are used as the unary potentials of a chain CRF over the labels
 ‣ a separate neural network is used for each task
 ‣ no connections between the CRFs of the different tasks (one CRF per task)
• Notation from the paper: given a matrix A ∈ R^{d1×d2}, [A]_{i,j} denotes the coefficient at row i and column j, and ⟨A⟩_i^{d_win} denotes the vector obtained by concatenating the d_win column vectors around the i-th column vector of A.
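Continuing the sketch above, the head of the network in Figure 2 is Linear → HardTanh → Linear; again the hidden sizes and the constant names are illustrative assumptions:

```python
import torch.nn as nn

N_HIDDEN, N2_HU, N_TAGS = 300, 200, 10   # illustrative sizes; N_TAGS = n3_hu = #tags

# Head of the sentence network: Linear -> HardTanh -> Linear.
# nn.Hardtanh clips activations to [-1, 1], the "hard" tanh of the slide.
head = nn.Sequential(
    nn.Linear(N_HIDDEN, N2_HU),
    nn.Hardtanh(),
    nn.Linear(N2_HU, N_TAGS),   # one unary score per label
)
```

To tag a full sentence, the network is applied once per word position, varying the relative-position feature i - pos_w, and the resulting N_TAGS scores at each position serve as that position's unary potentials in the chain CRF.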
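To make the CRF connection concrete, here is a minimal NumPy sketch of Viterbi decoding over such a chain CRF, combining the per-position unary scores produced by the network with a learned tag-transition matrix. This is a generic sketch of standard Viterbi decoding, not code from the paper, and both input arrays are assumed given:

```python
import numpy as np

def viterbi(unary, transition):
    """Decode the highest-scoring label sequence of a chain CRF.

    unary: (length, n_tags) scores from the neural network
    transition: (n_tags, n_tags) learned score of tag j following tag i
    """
    length, n_tags = unary.shape
    score = unary[0].copy()                 # best score ending in each tag
    backptr = np.zeros((length, n_tags), dtype=int)
    for i in range(1, length):
        cand = score[:, None] + transition  # cand[prev, curr]
        backptr[i] = cand.argmax(axis=0)    # best previous tag per current tag
        score = cand.max(axis=0) + unary[i]
    tags = [int(score.argmax())]            # best final tag
    for i in range(length - 1, 0, -1):      # follow back-pointers
        tags.append(int(backptr[i][tags[-1]]))
    return tags[::-1]
```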