Artificial Intelligence
Computational Modelling of Grammar Acquisition
Rishabh Nigam
Shubhdeep Kochhar
The Problem
●
Computational framework for grammar acquisition
●
Unsupervised Learning from a real corpus
●
Why this problem
●
The algorithm can learn complex syntax and generate novel grammatical sentences, and it may prove useful in other fields that call for structure discovery from raw data, such as bioinformatics.
The Algorithm
●
ADIOS – Automatic Distillation Of Structure
●
What it does:
The MEX criterion: it builds a 2D matrix M in which each entry M[i,j] holds PR[i,j] or PL[i,j]. The matrix is then searched for steep decreases in PR[i,j] and PL[i,j], which indicate possible equivalence classes between them.
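The search for a steep decrease can be illustrated with a minimal sketch. It assumes that PR along a path is estimated as the count of the sequence up to position j divided by the count of the sequence up to position j-1, and that a "steep decrease" means the ratio of two consecutive PR values falls below a threshold. The function names, the toy corpus, and the threshold value 0.65 are illustrative assumptions, not the code released with ADIOS.

```python
from typing import List, Sequence


def count_occurrences(corpus: List[List[str]], seq: Sequence[str]) -> int:
    """Count contiguous occurrences of `seq` across all sentences."""
    seq = list(seq)
    n = len(seq)
    return sum(
        1
        for sentence in corpus
        for k in range(len(sentence) - n + 1)
        if sentence[k:k + n] == seq
    )


def right_moving_probs(corpus: List[List[str]], path: List[str]) -> List[float]:
    """PR for growing prefixes: count(path[:j+1]) / count(path[:j]), j >= 1."""
    probs = []
    for j in range(1, len(path)):
        prev = count_occurrences(corpus, path[:j])
        cur = count_occurrences(corpus, path[:j + 1])
        probs.append(cur / prev if prev else 0.0)
    return probs


def steep_drops(probs: List[float], eta: float = 0.65) -> List[int]:
    """Indices k where probs[k] / probs[k-1] < eta (eta is an assumed value)."""
    return [
        k
        for k in range(1, len(probs))
        if probs[k - 1] > 0 and probs[k] / probs[k - 1] < eta
    ]


# Toy corpus (hypothetical): "there you are" recurs, then the paths diverge.
corpus = [
    ["there", "you", "are", "now"],
    ["there", "you", "are", "again"],
    ["there", "you", "are", "then"],
    ["there", "you", "are", "too"],
    ["here", "you", "are"],
]
path = ["there", "you", "are", "now"]

probs = right_moving_probs(corpus, path)
drops = steep_drops(probs)
```

In this toy corpus PR stays high while the path follows "there you are" and falls sharply at the last word, so the drop marks the right edge of a candidate segment. PL would be computed in the same way while moving leftwards along the path.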
Codes Used
●
MEX criterion
●
Training scripts
●
Generating Scripts
--> Shimon Edelman and Zach Solan made this code available.
Work done so far
●
Converted the CHILDES database and the Hindi database (WordNet) into a format readable by the ADIOS algorithm.
●
Ran the algorithm on the CHILDES database and the Hindi database.
●
Had a brief correspondence with Shushobhan Nayak and ran the algorithm on his small commentary database.
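The conversion step can be sketched roughly as follows. The target format here is an assumption: one sentence per line, lowercased, with a begin marker `*` and an end marker `#`. Both markers are parameters so they can be changed to whatever the ADIOS scripts actually expect, and the helper name `to_adios_lines` is hypothetical.

```python
from typing import Iterable, List

# Punctuation stripped from the edges of each token (an illustrative choice).
EDGE_PUNCTUATION = ".,!?;:\"()"


def to_adios_lines(
    sentences: Iterable[str], begin: str = "*", end: str = "#"
) -> List[str]:
    """Turn raw sentences into marker-delimited, whitespace-tokenised lines.

    The begin/end markers are assumed, not taken from the ADIOS release.
    """
    lines = []
    for raw in sentences:
        tokens = [t.strip(EDGE_PUNCTUATION).lower() for t in raw.split()]
        tokens = [t for t in tokens if t]  # drop tokens that were only punctuation
        if tokens:  # skip empty sentences
            lines.append(" ".join([begin, *tokens, end]))
    return lines


converted = to_adios_lines(["There you are.", "   ", "Here you are!"])
```

Splitting on whitespace instead of a word-character pattern keeps Devanagari tokens intact, since vowel signs would not be matched as word characters.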
Running on the CHILDES database
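Columns: ID, seq, p-value, gen, len, occ. P entries are patterns; E entries are equivalence classes.
●
E6478 {we,you,youse}
●
P6479 (I,think): p-value 0.0068258047, gen 1, len 2, occ 201
●
P6480 (E6481,you,are): gen 1, len 3, occ 18
●
E6481 {there,here}
●
P6482 (who,E6466): gen 1, len 2
●
P6483 (P6439,P6434,Emily): p-value 0
●
P6484 (he,is,E6485)
●
E6485 {.,here}
●
P6486 (are,we,P6402)
●
P6487 (wait,to,E6488,E6489)
●
E6488 {we,you}
●
E6489 {hear,see}
Remaining values (row assignment unclear): 0, 0.0039371848, 0.33333334, 36, 5.4000001, 4, 0.0058915019, 1, 3, 28, 0.0043362379, 1, 4, 10, 0.5, 4, 6.1452389e-05, 15
For example, the pattern (E6481,you,are) expands to "there you are" and "here you are", both of which are sentences in the corpus used.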
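The expansion of a pattern that contains equivalence classes can be sketched as follows. The IDs and members are copied from the CHILDES output; the dictionary layout and the `expand` helper are illustrative assumptions, not part of the ADIOS scripts.

```python
from itertools import product
from typing import Dict, List, Tuple

# P = pattern (ordered sequence), E = equivalence class (interchangeable members).
patterns: Dict[str, Tuple[str, ...]] = {
    "P6480": ("E6481", "you", "are"),
    "P6487": ("wait", "to", "E6488", "E6489"),
}
classes: Dict[str, Tuple[str, ...]] = {
    "E6481": ("there", "here"),
    "E6488": ("we", "you"),
    "E6489": ("hear", "see"),
}


def expand(symbol: str) -> List[List[str]]:
    """Return every word sequence that `symbol` can stand for."""
    if symbol in classes:
        # An equivalence class contributes one alternative per member.
        return [seq for member in classes[symbol] for seq in expand(member)]
    if symbol in patterns:
        # A pattern concatenates one choice from each of its elements.
        parts = [expand(s) for s in patterns[symbol]]
        return [[w for part in combo for w in part] for combo in product(*parts)]
    return [[symbol]]  # a plain word stands for itself


sentences_6480 = [" ".join(words) for words in expand("P6480")]
sentences_6487 = [" ".join(words) for words in expand("P6487")]
```

Expanding P6480 yields the two corpus sentences. Expanding P6487 yields four strings, some of which may never have occurred in the corpus; this is how learned patterns can generate novel sentences.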
Running it on Hindi Database
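Columns: ID, seq, p-value, gen, len, occ
●
P3487 (भी,प्रचलित): p-value 0.0042799711, gen 1, len 2, occ 5
●
P3488 (के,E3489,भागों): gen 1, len 3, occ 11
●
E3489 {विभिन्न,मुलायम}
●
P3490 (E3491,की,भाषा,में): p-value 1.9848347e-05, gen 1, len 4, occ 6
●
E3491 {विज्ञान,बोल-चाल}
●
P3492 (में,E3493,लेप,करने,से): p-value 0.0037000179, gen 1, len 5, occ 4
●
E3493 {मिला,घोलकर,पीसकर}
●
P3494 (विष,नष्ट,होता): p-value 0.001850009, gen 1, len 3, occ 4
●
P3495 (समान,भाग): p-value 0.0059099197, len 2, occ 26
Remaining values (row assignment unclear): 0, 1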
Running on the Commentary
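Columns: ID, seq, p-value, gen, len, occ
●
P447 (the,E448,square): p-value 0, len 3, occ 65
●
E448 {large,big}
●
P449 (big,square): p-value 7.212162e-05, gen 1, len 2
●
P450 (the,little,E451): p-value 0, gen 1, len 3, occ 49
●
E451 {circle,square}
●
P452 (the,big,box): p-value 0, gen 1, len 3, occ 34
●
E455 {opens,closes,enters}
●
P456 (the,E457): p-value 0.0055941939, gen 1, len 2, occ 91
●
E457 {bottom,corner,door,entrance}
●
P458 (P449,E459,the): gen 1, len 4, occ 4
●
E459 {leaves,closes,enters}
●
E461 {--,inside,and,leaves,left,opens,closes,enters}
Remaining values (row assignment unclear): 0, 1, 38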
Precision And Recall
Precision: the proportion of sentences in C_learner (the Learner's corpus) that are accepted by the Teacher
Recall: the proportion of sentences in C_target (the target corpus) that are accepted by the Learner
Values found: precision around 0.6 and recall around 0.5
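The two measures follow directly from the definitions and can be sketched as below. The acceptance functions are stand-ins: here the Teacher and the Learner are modelled as plain sets of acceptable sentences, which is an assumption made only for illustration, and the toy sentences are hypothetical.

```python
from typing import Callable, Iterable


def acceptance_rate(sentences: Iterable[str], accepts: Callable[[str], bool]) -> float:
    """Fraction of `sentences` for which `accepts` returns True."""
    sentences = list(sentences)
    if not sentences:
        raise ValueError("need at least one sentence")
    return sum(1 for s in sentences if accepts(s)) / len(sentences)


# Toy stand-ins (hypothetical): what each side would accept.
teacher_language = {"there you are", "here you are", "he is here"}
learner_language = {"there you are", "here you are", "he is there"}

c_learner = ["there you are", "he is there"]                # produced by the Learner
c_target = ["there you are", "here you are", "he is here"]  # target sentences

# Precision: Learner output judged by the Teacher.
precision = acceptance_rate(c_learner, lambda s: s in teacher_language)
# Recall: target sentences judged by the Learner.
recall = acceptance_rate(c_target, lambda s: s in learner_language)
```

In practice the Teacher's judgement would come from a reference grammar or human raters, and the Learner's from parsing with the learned patterns; only the ratio computation carries over from this sketch.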
References
[1] Heidi R. Waterfall, Ben Sandbank, Luca Onnis, and Shimon Edelman, "An empirical generative framework for computational modeling of language acquisition", Cambridge University Press, 2010.
[2] Zach Solan, PhD thesis, supervised by Professor David Horn, Professor Shimon Edelman, and Professor Eytan Ruppin, Tel Aviv University.
Thank You