Machine Learning versus Knowledge Based Classification of Legal Texts
Emile de Maat, Kai Krabben, Radboud Winkels
1/12/2011

Motivation
- Automatic support for building formal models:
  - Increase the quality of the models and the efficiency of the process
  - Increase inter-coder reliability
- [Diagram labels: NL text; Structured text with explicit and typed refs; Model of individual provisions; Integrated model of meaning; Semantic Network; Model fragment suggestions. References shown: ICAIL 2005; JURIX 2006; JURIX 2007; ICAIL 2008. Also shown: E-POWER; ESTRELLA]

Recognizing and classifying
From conclusions of JURIX 2007:
- Dutch law:
  - Provisions usually match one sentence
  - Several types of sentences can be easily distinguished
  - Max. 5 language constructs per type
- Automatic recognition and classification seems doable
- Types are not specific to Dutch law (cf. Tiscornia et al. for Italian law)

Categories
1. Definitions
2. Deeming Provision
3. Norm – Right/Permission
4. Norm – Obligation/Duty
5. Application Provision
6. Value Assignment
7. Change
8. Enactment Date
9. Citation Title
10. Penalisation

Example: Definitions
- In definitions, descriptions are given for terms used by the law
- Steam Act, article 1:
    In the stipulations made in or based upon this law, it is understood by: steam kettles: devices, in which water is heated by the inflow of warmth which is not derived from another device on which this law applies.
- Uses "understood by"
- Standardised by the Guidelines for Legal Drafting

Example: Norms (1)
- Normative sentences form the core of each regulation, stating obligations and rights
- Rights can be denoted by a wide range of verbs: can, may, is allowed to, has a right to, …
- Similarly, obligations can be denoted by the use of certain verbs: is prohibited, is charged with
- Many variations

Example: Norms (2)
- However, obligations are often represented as a "statement of fact"
- Funeral Law, article 46, section 1:
    No bodies are interred on a closed cemetery.
- May be about any subject
- No common signal words or patterns
- Preferred by the Guidelines for Legal Drafting

Previous Experiment: Pattern-based Classifier (1)
- Java classifier based on patterns
- Based on 81 patterns, mostly consisting of one to three words
- Longer patterns take precedence over shorter patterns
- Obligation as a default category: if it isn't something else, it's a statement of fact

Previous Experiment: Pattern-based Classifier (2)
- Tested on 592 sentences
- 91% of all sentences were classified correctly
- Main problems:
  - Missing patterns
  - Patterns appearing in auxiliary sentences

New Approach
- Would a machine-learning approach work better?
  - Cf. Gonçalves & Quaresma 2005; Francesconi & Passerini 2007; Opsomer et al. 2009
- Support Vector Machines
- Using the test set used for the pattern-based approach (584 sentences)
  - Smaller set, as small categories are left out
- Leaving-one-out
- No fair comparison possible

Data representation
- Sentences are represented as a bag of words
- Weight: binary, term frequency (TF) or term frequency-inverse document frequency (TF-IDF)
- Stop list
- Stemming
- Grouping of numbers
- Conversion to lowercase

Data representation – Finding the best settings

Settings                                        LOO accuracy (%)
baseline                                        93.32
baseline + TF                                   92.29
baseline + TFIDF                                93.32
baseline + stop list                            94.01
baseline + grouping                             92.81
baseline + stemming                             92.47
baseline + min. term frequency 2                93.15
baseline + min. term frequency 3                92.47
baseline + lowercase                            93.15
baseline + stop list + min. term frequency 2    94.69

Results – Leaving-one-out

Class                  In corpus (n)  In corpus (%)  Missed  False  Recall   Precision
Definition                        14             2%       6      4  57.14%   66.67%
Permission                        59            10%       5      7  91.53%   88.52%
Obligation                       181            31%       9     17  95.01%   91.01%
Delegation                        19             3%       2      0  89.47%   100%
Publication provision              6             1%       1      0  83.33%   100%
Application provision             41             7%       4      2  90.24%   94.87%
Enactment date                    18             3%       1      1  94.44%   94.44%
Citation title                     4             1%       1      0  75.00%   100%
Change – Scope                    55             9%       0      0  100%     100%
Change – Insertion                44             7%       0      0  100%     100%
Change – Replacement             111            19%       0      0  100%     100%
Change – Repeal                   23             4%       0      0  100%     100%
Change – Renumbering               9             2%       2      0  77.78%   100%
Total                            584                     31     31  94.69%   94.69%

Confusion matrix
[Table: confusion matrix over the 13 classes above. Diagonal (correctly classified): Definition 8, Permission 54, Obligation 172, Delegation 17, Publication provision 5, Application provision 37, Enactment date 17, Citation title 3, Change – Scope 55, Change – Insertion 44, Change – Replacement 111, Change – Repeal 23, Change – Renumbering 7]

Results
- Accuracy of 94.69% (pattern-based approach: 91%)
- Problems with definitions and obligations
- Does this result generalise to new texts?

Results – Leaving an entire law out (1)
The columns "Original Train/Test" and "One Law" give the number of misclassified sentences per law.

Law                            Size  Original Train/Test  One Law  (unlabelled)
Royal Decree Stb.1945, F 214     28                    6        8            11
Bill 20 585 nr. 2                31                    4        7            13
Bill 22 139 nr. 2                22                    1        2            10
Bill 27 570 nr. 4                21                    2        5             7
Bill 27 611 nr. 2                11                    0        0             5
Bill 30 411 nr. 2               141                    7       28            22
Bill 30 435 nr. 2                40                    3        3            10
Bill 30 583 nr. A                26                    0        1             5
Bill 31 531 nr. 2                 3                    1        1             3
Bill 31 537 nr. 2                23                    0        0             4
Bill 31 540 nr. 2                 7                    1        1             5
Bill 31 541 nr. 2                 8                    0        0             3
Bill 31 713 nr. 2                 7                    0        0             7
Bill 31 722 nr. 2                32                    1        9             7
Bill 31 726 nr. 2                78                    3        6             7
Bill 31 832 nr. 2                 7                    1        1             2
Bill 31 833 nr. 2                 4                    1        1             4
Bill 31 835 nr. 2                99                    0        7             7

Results – Leaving an entire law out (2)
- Accuracy of 86.39%
- Smaller training set, so lower accuracy is expected
- Some laws do seem to use patterns that are unique (within this set), and this does cause problems

Results – Two new laws

                                     ML approach                   KB approach
Test set            Items      N     Nr. misclassified  Accuracy   Nr. misclassified  Accuracy
Bill 32 398 nr. 2   sentences   71                   3    95.77%                   4    94.37%
                    lists       18                   3    83.33%                   1    94.44%
Bill 32 393 nr. 2   sentences  205                  23    88.78%                   9    95.61%
                    lists        9                   0      100%                   0      100%

Issues (1)
- Common issues:
  - Keywords/patterns appearing in subordinate sentences
  - Missing patterns
- Not an issue for the ML approach: slight variations

Issues (2)
- ML issues:
  - Keywords linked to different classes (may vs. may not)
  - Keywords outside of the standard phrase
  - Wrong keywords
  - Statement of fact
  - Skewness

Conclusions
- Both methods are viable
- Both would benefit from a larger training set
- Both would benefit from separating auxiliary sentences
- The machine-learning method is a "black box"
- Pattern-based is still preferred for modelling
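The pattern-based approach from the "Previous Experiment" slides (longer patterns take precedence over shorter ones, Obligation as the default category) can be illustrated with a minimal Python sketch. This is not the authors' Java classifier: the eight English patterns below are hypothetical stand-ins based on the signal words quoted in the examples, whereas the original used 81 patterns that the slides do not list. The names PATTERNS, DEFAULT_CLASS, tokenize and classify are likewise assumptions made for illustration.

```python
import re

# Hypothetical patterns, loosely based on the signal words quoted in the slides.
# The real classifier used 81 patterns that are not listed in the source.
PATTERNS = {
    "it is understood by": "Definition",
    "is allowed to": "Permission",
    "has a right to": "Permission",
    "may": "Permission",
    "can": "Permission",
    "may not": "Obligation",
    "is prohibited": "Obligation",
    "is charged with": "Obligation",
}

# Sentences without any matching pattern are treated as statements of fact,
# which the slides classify as obligations.
DEFAULT_CLASS = "Obligation"


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def contains(tokens, pattern_tokens):
    """Return True if pattern_tokens occurs as a contiguous run in tokens."""
    n = len(pattern_tokens)
    return any(tokens[i:i + n] == pattern_tokens
               for i in range(len(tokens) - n + 1))


def classify(sentence):
    """Return the class of the longest matching pattern, else the default."""
    tokens = tokenize(sentence)
    best_length, best_label = 0, DEFAULT_CLASS
    for pattern, label in PATTERNS.items():
        pattern_tokens = pattern.split()
        # Longer patterns take precedence; on a tie the first match is kept.
        if len(pattern_tokens) > best_length and contains(tokens, pattern_tokens):
            best_length, best_label = len(pattern_tokens), label
    return best_label


if __name__ == "__main__":
    for s in [
        "No bodies are interred on a closed cemetery.",
        "The minister may grant an exemption.",
        "The holder may not transfer the permit.",
    ]:
        print(classify(s), "<-", s)
```

With these toy patterns, a sentence containing "may not" is labelled Obligation because the two-word pattern outranks "may", and the Funeral Law example falls through to the default category, as the slides describe for statements of fact.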