Multiplex lexical networks reveal patterns in early word acquisition in

Multiplex lexical networks reveal patterns in early word
acquisition in children
arXiv:1609.03207v1 [physics.soc-ph] 11 Sep 2016
Massimo Stella1,+ , Nicole M. Beckage2,3 , and Markus Brede1
1
2
Institute for Complex System Simulations, University of Southampton, UK
Department of Computer Science, Institute of Cognitive Science, University of Colorado Boulder, USA
3
Department of Electrical Engineering and Computer Science, University of Kansas, USA
+
Corresponding author: [email protected]
September 13, 2016
Abstract
Network models of language have provided a way of linking cognitive processes to the
structure and connectivity of language. However, one shortcoming of current approaches is
focusing on only one type of linguistic relationship at a time, missing the complex multirelational nature of language. In this work, we overcome this limitation by modelling the
mental lexicon of English-speaking toddlers as a multiplex lexical network, i.e. a multi-layered
network where N=529 words/nodes are connected according to four types of relationships: (i)
free associations, (ii) feature sharing, (iii) co-occurrence, and (iv) phonological similarity. We
provide analysis of the topology of the resulting multiplex and then proceed to evaluate single
layers as well as the full multiplex structure on their ability to predict empirically observed
age of acquisition data of English speaking toddlers.
We find that the emerging multiplex network topology is an important proxy of the
cognitive processes of acquisition, capable of capturing emergent lexicon structure. In fact,
we show that the multiplex topology is fundamentally more powerful than individual layers in
predicting the ordering with which words are acquired. Furthermore, multiplex analysis allows
for a quantification of distinct phases of lexical acquisition in early learners: while initially
all the multiplex layers contribute to word learning, after about month 23 free associations
take the lead in driving word acquisition.
Introduction
Language is a complex system: a non-trivial, multi-level mapping of meanings onto words
[1, 2]. In order to communicate, humans must learn how to use linguistic structures for expressing meaning and thoughts as words. The cognitive processes driving language learning
likely organise the components of language into a so-called mental lexicon (ML) [2], which can
be described by a web-like structure of interacting lexical items (e.g. word representations).
These linguistic interactions are multi-faceted or multiplex in nature, as they must account
for several types of relationships. Empirical studies in psycholinguistics have suggested that
rather than providing exact word definitions (as in common dictionaries), the ML contains
1
word meanings in terms of the above multi-relational word patterns [3–7]. How this internal,
multiplex organisation of language relates to and influences language learning is still poorly
understood. In this paper we quantify the relation between the structure of language and
cognitive processing through multiplex networks [8,9]. We achieve this by combining information about word phonology, semantics, and syntactic components in the layers of a multiplex
network. Going beyond the network’s topological description, we then exploit the multiplex
structure to predict lexical acquisition of young children.
These representational patterns heavily influence the way people learn and access information from the lexicon. For example, individuals respond differently when asked to ’name
a fruit that is red’ or to ’name a red fruit’ [5], likely because hearing fruit first activates representations of fruit as compared to first activating the colour red. Further, in phonological
confusability tasks, words with more phonological neighbours are more likely to be incorrectly
identified in noisy environments, suggesting that phonological features influence recall and
recognition [10]. Understanding the structure and interactions of the ML provides insights
into human cognition and language.
A framework for investigating the multi-relational, dynamic nature of the ML requires two
key aspects: (i) accounting for different types of linguistic relationships and (ii) considering
cognitive processes that shape and influence the structure of the ML. One approach that
is capable of addressing both of these requirements is to use models that are based on a
multiplex, or multi-layer, network representation [8,9,11]. Hence, we project the complexity of
an individual’s mental lexicon (ML) to a multiplex network structure, which we call Multiplex
Lexical Network (MLN). The MLN is a multi-layer network, where nodes represent words.
The layers of the MLN capture relationships previously shown to influence children’s lexical
learning in single-layer network studies, namely 1) free-associations [12,13], 2) shared features
[14,15], 3) co-occurrence in child directed speech [16,17] and 4) phonological similarity [10,18].
Nodes can additionally be characterised by external (word-specific) features such as frequency
or length.
Previous literature on modelling language use and learning through network science has
largely focused on single-layer representations of networks, for a review see [6, 7, 19]. These
works strongly suggests that cognitive constraints and mechanisms affecting the use of language can be uncovered and explored through network science.
For example, semantic relationships of all types have been shown to have small-world
structure, i.e. they are described by highly clustered networks that have short paths between
nodes [13,20–22]. In fact, it has been suggested that this small-world structure may be cognitively beneficial to language learning and use [6, 19, 23]. Beyond topological classifications of
language networks, experimental results shown strong correlations between network structure
and human performance in language related tasks. Measurements of retrieval times [4, 5, 24],
age of acquisition [13,15,17] and even semantic degradation due to ageing [23] can be studied
through analyses of single layer networks. Phonological similarities among words, when represented by a network, are characterised by a bound on the maximum sizes of phonological
neighbourhoods and a tendency to avoid local clustering [10, 25, 26]. Experimental work suggests that this reduces confusability and aids in quick retrieval of words from memory [10,27].
While single layer networks reveal aspects of the structure of language and languagerelated cognitive processes, there is still no unified understanding of language and cognition
that accounts for both phonological and semantic aspects of language. Multi-layered linguistic networks begin to fill this gap. Liu et al. [28] analysed Chinese as a multi-layered
network, composed of three syntactic networks of words and one network of co-occurring
phonemes. They found that almost half of the syntactic dependency relations in modern
Chinese sentences are between co-occurring words. A similar analysis was performed for
2
English and Croatian [29]. While these studies consider topological features of a multi-layer
linguistic networks, more research exploring the role of a multiplex representation of language
is necessary in order to highlight the interplay between structural multi-layer features and
the cognitive processes shaping them.
The main advantage of the multiplex approach is that it allows for a more detailed representation of the complexity of many real-world systems [30]. This added complexity provides
additional insights into a system’s structure and dynamics. To name some examples, in the
last few years multiplex modelling has provided novel insights about social balance in online platforms [31, 32], the emergence and stability of multiculturalism [33], congestion in
transportation networks [30], and ecosystem characterisation in ecology [34, 35], cf. [8, 9] for
a review on multi-layer and multiplex networks.
Here, we use multiplex lexical networks to understand language acquisition in children by
assessing the impact of several external and internal structural network features. We evaluate
performance using empirical parent reports of a child’s production, aggregated to capture the
order of early learned words for children between 16 and 30 months [36]. Our main aim is
to assess the use and possible necessity of a multiplex network representation in capturing
emerging features of children’s MLs. We demonstrate that the multiplex model is more accurate than any single layer representation in predicting this normative acquisition trajectory
of young children. We find that the multiplex network can account for developmental trends,
offering predictions and interpretability that analysis based on word features, or single layer
network cannot.
Results
Multiplex network construction and analysis
We first construct a multiplex lexical network composed of four layers capturing i) free associations [12], (ii) feature sharing [14], (iii) co-occurrence [16], and (iv) phonological relationships [10] (cf. Methods). Panels (a) and (b) of Fig. 1 provide a visual impression of the
connections and clusters in the four multiplex layers, either by separating out the layers (a),
or by treating the multiplex network as an edge-coloured graph (b). For easier visualisation
words have been grouped into topic areas and clustered into two communities and marked by
red (e.g. clothing, body, etc.) and yellow (games, toys, food, etc.) and obtained via Louvain’s
method [37]. The data representation in Fig. 1(a) also shows how words are connected within
word topics. In agreement with other authors [3, 4], we find that semantically related words
tend to be clustered in the association, feature, and co-occurrence layers.
Single layer analysis, with findings and comparisons to appropriate null models, are summarized in Tab. 1. The results in the table show that apart from the association layer, all
layers are disconnected, and have fairly small giant components. In contrast, the multiplex
network is fully connected [38], so that for every pair of nodes there is a multiplex path joining
them. This indicates the importance of inter-layer jumps for navigating the ML.
In any network representation detecting how the organisation of connections differs from
a random one can provide insightful information on the represented system. Here we briefly
report on the topological features of words in the MLN (e.g. being a hub, being in a densely
connected core) for relating them to word acquisition through ordering experiments in the
next Section. In terms of word clustering, previous studies highlighted small-worldness as
being a distinctive trait of language networks [10, 20, 21]. Analysis of clustering and shortest
path lengths (cf. Tab. 1) confirm this trend: each MLN layer is a small-world [39]. Further,
3
bathroom
Asso. – Words in Furniture
potty
Feat. – Words in Clothing coat
window
porch
dress
shower
bench bed crib
chair
sofa
table
couch
drawer door
bedroom bathtub sink
closet
refrigerator
room kitchen
boots
belt necklace
pants
mittens
oven
stove
People
Clothing
Body
pajamas
shirt
underpants
Descriptive Places
Quantifiers
Pronouns
Vehicle
Outside
diaper hat sneaker
zipper
Food
tissue
Sound
(a)
Toys
Household
Prepositions
scarf
garage
stairs basement
(b)
jeans
sweater
beads button shorts
dryer
jacket
gloves
Action
Animal
Time
Questions
Furniture
Games
(c)
K.Tau
-
1.0
0.8
0.6
0.4
Asso.
Co-occ.
Feat.
0.2
-
Phon.
0
-0.1
eat
!
carry touch cover
pour
read
swim
)
(d)
he
hurry
paint
stop wash
feed
fall talk drive
pick be write
blow
push
sit
hide
tear
throw
lick
hate shake drop
bump
spill
Mult.
0.49
her
0.40
mine
share
walk hold
sleep
fix do sing
knock help
get wait
kick
(
him
think cut
have
look
find
stay
play
say like
hear
you
wish
take bite
put
me
-
Co-occ.
ride
Words in Actions
0.30
she
Phon.
Words in Pronouns
0.20
-
0.10
0
kiss
tickle chase
Figure 1: (a) Visualisation of the multiplex network of the mental lexicon of young children (grey
box), highlighting some particular subgraphs in each layer (insets). (b) Visualisation of the MLN
as an edge coloured network. For simplicity, words are clustered into Communicative Development
Inventory categories (e.g. clothing, toys, etc.). (c) Visualisation of the degree correlations across
layers quantified by the Kendall Tau. (d) Visualisation of edge overlap among different layers,
compared against configuration models. The aggregate layer is reported for quantifying how much
each layer contributes to it.
apart from the phonological layer which is expected to be assortatively mixed [10, 25], all
other layers are characterised by disassortative mixing by degree. Combining findings of a
high disconnectedness ratio θ [40], strong clustering and typically high assortativity values
indicates the presence of a marked core-periphery structure in the MLN layers (as further
supported by a k-core analysis, see Supplementary Information). Words in the core display
different topological properties that might result in different word learning patterns compared
to words in the periphery, as we test in the next Section. Further analysis in Fig. 2(b) also
reveals that layers differ in their degree distributions which are right-skewed and exponentiallike for the association and phonological layers and significantly more heavy-tailed for the
co-occurrence layer. The degree distribution for the multi- or overlapping degree of the
MLN [11, 41] underlines the presence of hubs across the multiplex structure, i.e. words which
are significantly more connected than average and that are typically important for navigation
through concepts [20, 21, 38]. This is why we will test whether hubs are words learned at
4
earlier stages in the next Section.
One might wonder if a representation of the ML in terms of four separate layers is an
appropriate representation. To address this, we carried out a structural reducibility analysis [42] which tests if multiplex layers can be aggregated without losing information (see
Supplementary Information for details). Results reported in Fig. 2(b) show that aggregation
cannot be performed without information loss, emphasising that our multiplex representation
is irreducible [42]. The analysis also quantifies similarity between individual layers of the multiplex. We find the strongest similarity between the co-occurrence and phonological layers.
This result is not surprising as co-occurrence in child directed speech has strong phonological influences, particularly for young children [16]. The second strongest similarity is found
between the layers for free association and feature sharing norms, which are both related to
semantic features [12, 14, 43]. These results suggest that the network layers are irreducible
and capture relational structures of language that have been shown previously to influence
word learning.
The multiplex structure allows quantification of correlations across different aspects of the
ML. We are interested in understanding how the degree structure in one layer is similar to
the degree structure in another layer. Fig. 1(c) visualizes these results for the MLN. We find
that words in the feature and co-occurrence layers tend to have negative degree correlations
(Kendall Tau κ ≈ −0.13, p < 0.0001), so that hubs in one layer tend to have lower degrees in
the other layer. In contrast, the co-occurrence and phonological layers display positive degree
correlations (Kendall Tau κ ≈ 0.64, p < 0.0001): in the children’s lexicon words having many
co-occurrences display larger phonological neighbourhoods too, further supporting the idea
that phonological similarities and co-occurrences in children influence each other [16, 18].
Another question of interest is to what extend the presence of relationships in one layer
implies relationships in other layers. This correlation can be quantified by edge overlap (or
multiplexity [44]). Results are reported in Fig. 1(d). As one would expect, we find some
overlap between connections in the feature and association layers, as both capture semantic
aspects of the ML. Importantly, however, links tend not to overlap across layers. This suggests
that different layers of the multiplex tend to capture different language and relational aspects
of the ML.
Multiplex Orderings: Results and Discussion
Next, we explore the process of word learning through the MLN assuming a preferential
acquisition scenario [13, 15]. We generate a word acquisition ordering by ranking words
according to features derived from the vocabulary network structure. We compare these
orderings to those based on word specific information such as word frequency and word
length. Orderings are evaluated by measuring the relative gain in the number of words
correctly predicted. More specifically, we define word gain as the size of the overlap between
the set of predicted words and the actual words reported as learned by age t. To obtain
relative word gains we calculate the difference to random guessing and normalise by the total
number of words that have been learned at age t (see Methods). Hence, positive relative
word gain indicates results superior to, and negative values indicate performance inferior to,
random guessing.
Results in Fig. 4, supported by results in the next Section, suggest that the normative
acquisition trajectory is well characterized by three distinct learning phases. In a first stage,
comprising months 19 and 20 which we call the very early learning stage (VELS), each
ordering essentially tries to predict the first 40 learned words. This phase is followed by
an early learning stage (ELS) covering an age range between 20 and 23 months. Last we
5
Network
L
θ
CC(def.) a
GCSize D
hdi
Associations (Asso.)
Asso. Config. Model
Feature Norms (Feat.)
Feat. Config. Model
Co-occurrences (Co-occ.)
Co-occ. Config. Model
Phonological Similarities
(Phon.)
Phon. Config. Model
Multiplex Aggregate
Mult. Agg. Combined
CMs
2441
2441
2389
2389
2147
2147
349
0.02
0.02
0.76
0.76
0.497
0.497
0.68
0.2
0.03
0.63
0.38(2)
0.69
0.75
0.37
-0.1
-0.007(6)
-0.01
-0.07(6)
-0.44
-0.4
0.48
527
527
128
128
330
330
175
3.2
3.00(1)
1.8
1.72(1)
2.2
2.19(1)
7.7
349
6998
6998
0.68
0.004
0.004
0.02
0.33
0.18(5)
0.03
-0.07
-0.1
241(5) 12.5(3)5.31(7)
529
5
2.4
529
4.2(1) 2.25(1)
6
6.1(1)
3
3.1(1)
4
4.2(1)
20
Table 1: Metrics for the MLN with N = 529 nodes listing edge count L, average degree hki,
disconnectedness ratio θ, mean deformed clustering coefficient CC, assortativity coefficient a, size
of the largest connected component GCSize, diameter D and mean shortest path length hdi. For
reference, also expectations from configuration models as reference models are reported with errors
given by standard deviations (reported in parentheses for the last digits).
discriminate a late learning stage (LLS) comprising ages between 23 and 28 months. Table 3
reports average relative word gains over the entire study period and for each of the mentioned
stages of learning.
A first observation from Fig. 4 and Tab. 2 is that the use of frequency data to predict
word learning depends strongly on what corpus is used to compile frequency data. We find
that frequency data computed based on child directed speech (which was also used to compute
the co-occurrence networks [16]) allows for markedly better ordering than random baseline
predictions. In contrast, word frequency data gathered for adult speakers [45] give results
close to random chance and are thus not suitable for predicting word learning in young
children.
While previous work has found frequency to be a good predictor for word learning [46],
results in Fig. 4 suggest that some topological network information is more predictive of
acquisition ordering. For instance, degree-based orderings in the association layer yield the
highest word gains against random word guessing (see Supplementary Fig. S4). Notice
that a simple addition of layers does not improve performances of the degree-based orderings:
orderings based on the multi-degree network [11] perform worse than using just the association
layer. We conjecture this is due to the lack of strong positive degree correlations across the
layers (see Fig. 1(c)), i.e. nodes that are hub nodes in one layer often have low degrees in
other layers so that uniformly combining degrees washes out much of the MLN structure
that may be useful to young language learners. The high prediction power of word degree in
the association layer (see Fig. 1(c)) is interesting in the context of understanding acquisition
processes of young children. Our finding is in agreement with previous studies indicating how
association norms are generally valid predictive models for language learning [13, 15, 47]. We
conjecture associations perform particularly well in our case because they do contain both
strong relationships with semantic memory [12, 24, 43] and early learned words [15].
Previous work has pointed out that it is very difficult to outperform random guessing in
the ELS and particularly in the VELS because children may be trying out different ways of
learning [47]. We find that better than random prediction is possible, at least for normative
6
The Multiplex Lexicon Network
Asso.!
!
!
!
Feat.!
!
���������� ������������
���
▼ ◆
○
▼ ○
■ ○
■
▼
●
▼ ○
■
■ ○
○
▼ ◆
○
■ ○
●
●
▼
●
■ ■ ■ ■ ■
▼ ○
●
●
▼
◆
▼
▼ ○
▲▲▲▲▲▲◆ ◆ ◆ ● ● ● ○
▼ ○
▼ ○
▼
◆ ◆ ◆
▲▲▲
●
◆
▲▲▲
◆
● ◆ ◆
▲
●
▲▲
●
▲
��-�
●
��-� ■
��-�
◆
▲
▼
○
Florida Asso.
McRae Feature Norms
▲▲
▲
Co-occurrence
!
■ ■
■
○
▼ ○
▼ ○
▼
○
■
▼
◆
◆
◆
○
◆ ▼
■ ○
◆
●
◆
▼ ◆
○
●
▼ ◆
○
■
▼ ◆ ◆
▼
●
▼ ○
○
Phonological Sim.
!
Co-occ.!
!
!
Phon.!
!
1.19 . 10-2"
●
1.02 . 10-2"
Aggregated
Relative Entropy
of different
Aggregates "
Multidegree
���
���
���
0.72 . 10-2"
0"
���� ������
Figure 3: Relative entropy for different aggreFigure 2: Cumulative degree distributions gation stages of the MLN. The maximum relaP (K ≥ k) for the individual layers and for the tive entropy determines which layer configuraaggregate and multi-degree.
tion has the least redundant information. The
maximum relative entropy is highlighted in red
and it indicates irreducibility of the MLN, cf.
[42].
acquisition, even in very early learning during the VELS, in which orderings based on the
association layer, and multiplex measures give better than random results (cf. Tab. 2).
The ELS phase is characterised by the dominating performance of a global multiplex
measure, multiplex closeness. In fact, all orders based on individual layer features fail to
replicate the peak in word gain observed for multiplex closeness (cf. Fig. 4). Multiplex
measures such as the versatile PageRank [48] or multidegree also give considerably better
performance than random guessing. Good performance can also be achieved using an ordering
based on word length, possibly because of a least effort effect [1] that suggests that short
words are easier to memorise and learn. This hypothesis is also supported by the observation
that word length loses its predictive power at later learning stages. From these collective
results, we can conclude that early learning is strongly influenced by the multiplex structure,
specifically the closeness of words via all layers that form the MLN structure.
Results for the LLS show another large change of trends. Here we find that the association
degree gives best predictability. Global and local multiplex measures, as well as word specific
features such as word frequency, perform slightly worse that association degree. This suggests
that relational learning is very important, but this type of relational learning is best captured
by the single association layer. This is compatible with the observed period of semantic
learning in children [18]. This may also be due to the design of vocabulary norms collection.
During later stages of learning fewer and fewer words are monitored. This results in difficulty
teasing apart performance differences. During the LLS relative word gains for each ordering
approach word gains predicted by the baseline of random guessing, making performance in
the LLS hard to evaluate.
Averaging over the entire time frame it becomes clear that closeness measures on the
multiplex allow for best predictions. This provides evidence for the hypothesis that predictors
based on the multiplex can outperform single layer network measures. So far, we have tested
7
����
�������� ���� ����
����
����
����
����
����
��
��
�������� ����� (��� �� ������)
��
��
��
◆◆
●
��
����� ����
◆
■ ���� (�����)
◆
▼
▼
▼
◆
●
●
■ ■
■
◆ ����� (���������)
■◆
▼◆
◆ ●▼
▼◆
■ ■
▼◆
●
▼
■
◆
▲ ����� (������)
▲ ▼ ▼ ■
■ ●
■
▲ ▲◆
▼ ■
■●■ ■ ▲ ▲●
●
▲ ▲
▲ ▲
▼
▼ ������ (���������)
▼ ■
◆
▲ ▲
■
▲
◆
■
▼◆
●▼
●
▲
▼
◆
▲ ▲
▲
■
◆
●
▼ ▲
◆
■ ▼ ▲
▼
■
◆
▼ ▲
●
■ ▲
▼
◆
●
▼
■ ▲
●●
◆
▼
■ ▲
◆
▼
■ ▲
▼
●●●●●●●◆
▲
▼
■◆
●
▲
●◆
■
●
●
▲
�
���
���
���
���
���
�������� ����� (# ������� �����)
◆
Figure 4: Relative word gains for several acquisition orderings, at different learning stages. Error bars are approximately the same size as the
symbols in the plot.
Table 2: Average relative word gains for different orderings, evaluated when: 40 words have been learned
in VELS, 160 words have been learned in ELS, when
369 have been learned in LLS and when all the 529
MLN words have been learned (Total). The empirical
highest word gain of each stage is highlighted in red.
Results from the optimisation procedure are reported
at the end of the table. Error bars for the empirical
orderings were estimated as being ≈ 10−3 and are not
reported for brevity.
Ordering
Short Len.
Deg. (Asso.)
Clos. (Multiplex)
Freq. (Child.)
Multidegree
PageR. (Multiplex)
Opt. Deg. (VELS)
Opt. Deg. (ELS)
Opt. Deg. (LLS)
Opt. Clos. (VELS)
Opt. Clos. (ELS)
Opt. Clos. (LLS)
VELS
0.044
0.077
0.070
-0.007
0.070
0.063
0.17(2)
0.14(2)
0.13(2)
0.19(3)
0.16(2)
0.14(2)
ELS
0.143
0.129
0.191
0.091
0.114
0.142
0.13(2)
0.17(2)
0.17(2)
0.19(2)
0.25(2)
0.24(2)
LLS
0.026
0.088
0.083
0.077
0.072
0.081
0.08(1)
0.09(1)
0.09(1)
0.07(1)
0.09(1)
0.10(1)
Total
0.057
0.095
0.107
0.072
0.080
0.093
0.09(1)
0.11(1)
0.11(1)
0.12(2)
0.13(1)
0.14(1)
ranking measures on single network layers, and for the combined multiplex and shown that
prediction performance can benefit from taking account of information from multiple layers.
However, we assumed that each layer plays an equal role. Our next aim is to test for a
changing influence of layers as the early lexicon evolves to better understand developmental
mechanisms of acquisition
Optimisation Procedure
To explore the influence of different multiplex layers on word acquisition we consider linear convex combinations of single-layer network metrics (cf. Methods section). For a given
network metric, we then optimise relative word gains to obtain a set of coefficients that indicate the influence of all multiplex layers on word acquisition. In order to avoid over-fitting,
we perform a Monte Carlo robustness analysis: we optimise over subsets of word trajectories
sampled uniformly at random and including only 80% of the original whole word dataset. Reported results come with error bars calculated by averaging over randomly sampled word sets.
We also confirmed that the improvement in performance of optimised multiplex parameters
is strongly dependent on the structure and overlap of layers, as a randomised multiplex network is not able to replicate quantitatively similar prediction accuracies (see Supplementary
Information).
While we explored many different local and global network metrics (see Supplementary
Information) we here focus on presenting the results for the best performing measures of
degree and closeness. As reported in Tab. 3 optimisation of layer contributions clearly
improves prediction performance at all learnings stages over the word orderings from the
previous section.
All optimisation results gave negligible weights on the order of (≈ 10−3 ) for the phonological layer, which also performed poorly in the ordering experiments above. For this reason
we exclude the phonological layer from the presentation of results in Fig. 5. Ternary plots
8
0.16
0.16
0.12
0.12
0.08
0.08
0.04
0.04
0.03
0
0
0
0
0
0.4
●●●
■■
■
0.2
◆
◆◆
■
◆
19
●
Associations
■
Feature Norms
◆
Co-occurrences
■■
■■■■■■■■■■■■■■■■■■■■
◆◆
◆◆◆◆◆◆◆
◆◆◆◆◆◆◆◆◆◆◆◆◆
21
23
24
Closeness Optimisation
(d)
0.8
●●●●●●●●●●●●●●
●●●●●●
26
Optimal Layer Importance
Optimal Layer Importance
0.04
0
●
0.6
0.0
0.04
0.06
Degree Optimisation
●
●
●●●●
●●
●●
●
◆◆●
0.4
0.2
0.0
28
●
0.6
●●●●●●
●●●●●
●
Associations
■
Feature Norms
◆
Co-occurrences
◆
◆
■ ● ◆
●■
◆◆
◆◆◆◆◆◆
●
◆◆
■
◆◆◆◆
■
◆◆◆◆◆◆◆
■■
■■■■■■
■■■■■■■■■■■■■■
19
21
23
24
26
28
Learning Stage (Age in Months)
Learning Stage (Age in Months)
(�)
(�)
�������� ����� (��� �� ������)
��
��
��
��
��
◆
����
����
◼
◆
����
◆◆◆
���� ����
◆
◼
◆◼ ◼ ◼
◆◆◆ ◼
◆
◆◆
◼◼
◼
◆◼
◼◼ ◼◼
◆
◼
◼
◼
◼◆
◆◆
◆
◼
���� (�����)
����� (���������)
◼
◆
�������� ���� ����
����
◼◼
◆
◆◼
◆◆
◼◆
����
�
���
���
���
���
�������� ����� (# ������� �����)
��� ����
����
����
◆
����
����
◼
◆
����
◼◆
◼◆
◼
◆◆◆
���� ����
◆
◼
◆◼ ◼ ◼
◆◆◆ ◼
◆
◆◆
◼◼
◼
◆◼
◼◼ ◼◼
◆
◼
◼
◼
◼◆
◆◆
◆
◼
���� (�����)
����� (���������)
◼
◆
◼◼
◆
◆◼
◆◆
◼◆
◼◆
◼◆
◼
����
�
���
��
���� ����
����
��� ����
����
�������� ����� (��� �� ������)
��
��
��
��
��
��
���� ����
����
�������� ���� ����
0.08
0.12
0.08
●
0.12
0.18
0.12
0.8
0.06
0.24
0.16
(c)
0.09
���
���
���
���
�������� ����� (# ������� �����)
���
Figure 5: (a) and (b): Ternary plots of the average relative word gains at the end of VELS,
the middle of ELS and the end of VELS for degree-based (a) and closeness-based (b) optimisation. (c) and (d): average optimal coefficients indicating layer importances for different learning
stages obtained from cross-validation experiments with degree-based (c) and closeness-based (d)
optimisation. (e) and (f): averages for the relative word gains corresponding to weighting layers
for optimal word gains at the end of VELS, middle of ELS, and end of LLS for degree-based (e)
and closeness-based (f) optimisation. Error margins were estimated as standard deviations over
cross-validation ensembles.
9
in panels (a) and (b) in Fig. 5 visualise the optimisation landscape of relative word gains
over the remaining three layer weights at different stages of word acquisition, i.e. at the end
of VELS, middle of ELS, and end of LLS. Optima are typically fairly broad in VELS, but
they narrow for later learning stages. We also note that optima change their location in the
landscape, suggesting that different linguistic information may play different roles at different
stages of development.
To investigate this further, optimal layer weights are displayed as a function of age in
panels (c) and (d) of Fig. 5. Similar patterns are found for both degree and closeness
based rankings and we see a clear distinction between different learning stages. Initially in
VELS all layers contribute strongly toward optimal performance. In the case of degree-based
optimisation the main contributions stem from the association and the feature norm layer.
Instead, when optimising closeness combinations all layers are found to initially contribute
equally. The next learning stage, the ELS marks a transition region between the VELS and
the LLS in which dominant contributions stem from only one layer, the association layer. For
degree-based optimisations, contributions from the co-occurrence layer become very small in
the LLS, while associations contribute 80% of weight. An analogous transition is observed
in the ELS phase when closeness is optimised. In this case the importances of the feature
and the co-occurrence layers are changing over time but end up contributing less than 20%
of weight, while associations make up 80% (as in the LLS from the degree optimisation).
In panels (e) and (f) of Fig. 4 we compare the evolution of relative word gains for
optimisations aiming to predict word acquisition at different stages. We consider periods
of acquisition stopping at the end of VELS, up to the middle of ELS and the entire time
period up to age 28 months, using both degree and closeness based combinations. When
predicting only words learned in VELS, our model provides a significant enhancement in
average word gains in this stage, but this combination of layer weighting performs poorly
at later stages (see also Table 3), suggesting that VELS is qualitatively different than other
periods of language acquisition. Optimisation aimed to predict words up to the middle of
ELS gives similar results to optimisation for the entire time frame suggesting similar learning
strategies are used during both ELS and LLS.
More importantly, results in Fig. 4(e) and (f) point out the importance of closeness
centrality for predicting word acquisition at early stages. This finding is already reflected in
the highest word gains achieved by closeness centrality in the single-measure based rankings
reported in Table 3 in ELS. In fact, as reported in Fig. 4(e), in the early learning stage
ranking by multiplex closeness centrality clearly outperforms the optimal convex combination
of degree centralities. Optimal linear combinations of individual layer closeness perform even
better in ELS, see Fig. 4(f). It is important to stress that closeness centrality is a global
network feature accounting for the position of a node relative to all other nodes in the network.
This is distinct from local characteristics which only measure a nodes relationship to its
immediate neighbours, such as degree. Our findings thus give support to the hypothesis that,
particularly in the early learning stages, word learning is influenced by the global structure
of the multiplex lexical network. This global multiplex structure seems able to capture some
important word patterns that shape or influence early lexicon development.
Conclusions and future work
In this paper we have described and analysed normative word acquisition of children between
the ages of 19 and 28 months introducing the framework of multiplex lexical networks (MLN).
The motivation behind our approach is that multiplex networks allow for the simultaneous
10
analysis of correlations and interactions among words in phonological, semantic, and syntactic
contexts. This provides a network representation of the mental lexicon which is irreducible [42]
and captures more linguistic information than separate analysis of any single layer previously
could [15, 17, 24, 47].
We go beyond a purely topological description by exploiting network properties to predict
the order of normative word learning in children. Comparing the prediction power of various network features in a preferential acquisition scenario [15], we find evidence that global
multiplex measures can greatly outperform individual layer characteristics. Interestingly, we
find that the best topological feature in predicting word learning changes through the course
of development. This fact allows us to use network information to distinguish three stages
of learning: (i) a very early learning stage where all layers, apart from the phonological one,
contribute substantially to prediction, (ii) an early learning stage which marks a transition
period, and (iii) a late learning stage in which contributions of word associations dominate
word learning.
Comparing the predictive power of various MLN topological features over time confirms:
(i) the superiority of some multiplex network characteristics relative to single layer features,
and (ii) the special role closeness centrality might play in word learning. This is emphasised
in two ways: (i) strong performance of multiplex closeness, outperforming all combinations
of local network characteristics tested in the early learning phase, and (ii) the best overall
predictions are possible by using certain weightings of the individual layer measure of closeness. We thus find very strong indications that, particularly at early stages of learning, word
acquisition in children might be strongly driven by distances of learned words relative to other
known words at various levels of the human mental lexicon.
Our use of the MLN allows us to explore questions about lexical acquisition, uncovering
developmental learning stages and quantifying the influence of certain types of linguistic
structure on acquisition. Nevertheless, our MLN is limited in important ways. It is important
to bear in mind that the MLN representation is only a projection of an individual’s ML.
Additionally, the results presented in this paper consider lexical acquisition of a normative
child. That is to say, no specific child learns words in this particular order. Instead the
order is obtained by average over roughly 1000 productive vocabulary reports. We plan to
extend our model to the acquisition trajectory of individual children as future work. Another
limitation is the use of the CDI vocabulary. While the CDI is a commonly used check list of
words a child produces, there are many words that a child may know that are not on the CDI.
Expanding our model to the full vocabulary of a child, and for longer periods of development
will further increase our understanding of the language acquisition process.
In the future, our MLN can be corroborated by individual learning trajectories as well
as language learning at later stages and across other languages. Future work might extend
our analysis by considering correlations in word learning across other cognitive and linguistic
domains to build an even richer picture of language organisation and learning than the one
offered here. For instance, one might argue that the marginal influence of the phonological
layer found in this study may reflect that the definition of phonological similarity employed
here is not able to capture effects in young children. Indeed previous work finds phonological
information useful in explaining cognitive behaviour and language learning tasks in adults
[27, 49, 50]. Other metrics of phonological similarity for children should be tested in future
work.
Even considering these limitations, we have shown strong evidence that multiplex lexical
networks provide a powerful formalised representation for exploring and explaining cognitive
processes using language, capturing a richer picture of the mental lexicon than previous works
using only single-layer networks.
11
Methods
Age of acquisition dataset
Age of acquisition orderings are constructed based on the MacArthur-Bates Communicative
Development Inventory (CDI) norms [36]. The CDI, based on parent report of the productive
vocabulary of children aged between 16 and 30 months, has been shown to be predictive of
future language ability in young children [36,51]. The norms are averaged over more than 1000
vocabulary reports and indicate the percentage of children, at a given age, that reportedly
produce (and understand) a specific word. From these norms, we inferred an ordering of
learning. For example, ”mommy” is reported as produced by 93% of children by 16 months;
this word is learned earlier than the word ”chair” which is produced by only 14% of 16 month
olds. We assume a word is known once 50% of children in a given month produce that word,
we then order words within a month based on the assumption that higher rates of production
indicate earlier learning. Note that this type of ordering has been used in other developmental
modelling contexts [15, 47] and can be used to define a strict ordering of learning across all
items on the CDI.
Construction of the multiplex network
The four layers of the MLN were selected according to previous single-layer network studies
investigating which single-layer network representations may carry information about word
learning by young children [6, 15, 17, 19]. The four layers we consider here are (i) a semantic
layer based on the Florida Free Association Norms [12] where words A and B are connected if
one of them remind of the other; (ii) a layer capturing feature similarity based on the McRae
Feature Norms [14] where words A and B are connected if they share more than one semantic
features; (iii) a layer based on word co-occurrences (measured in child-directed speech [16])
where words A and B are connected if they co-occur more than X times, where the threshold
X was chosen approximately to match the link density of the other semantic layers; and a
(iv) layer capturing phonological similarities (based on WordNet 3.0 [52]) where words A and
B are connected if they have phonological transcriptions with edit distance 1. We treat the
association layer as undirected and edge weights in layers (i) to (iii) are ignored. The resulting
MLN is composed of 529 words which represent the intersection between the CDI data and
the network information for words that have at least one connection. For more details we
refer to the Supporting Information.
Overlap measures and word ranking
A word trajectory is an ordered list τ = (w1 , w2 , ..., wN ) and the learning trajectory derived
from the CDI gives an empirical ordering τaoa for word learning as observed in children.
Predicted word orderings are generated as follows. First, each word is ranked according to
a word score, si , which is derived from network specific or extrinsic measures (e.g. frequency
or word length). Words are then ordered according to word score, starting with the largest
score. Words receive a position in the predicted trajectory according to their position in the
ordering of word scores. Ties in word orderings si are taken into account by averaging over
all resolutions. For example, one might wish to consider orderings based on degree in the
association layer and would obtain the following word scores sf ood = 62, swater = 45, seat = 20.
The corresponding predicted acquisition trajectory would be τ = (f ood, water, eat). Apart
from the measures mention in the main text of the paper, we also considered further measures
listed in the Supporting Information.
12
To evaluate predictive performance of a word score, we first measure the overlap O(τ, τaoa , t)
between the predicted learning trajectory τ and the empirically known trajectory τaoa at time
t by counting the number of words that co-occur in τ and τaoa up to time t. We then define
relative word gain G(τ, τaoa , t) as the overlap minus the random performance, i.e.
G(τ, τaoa , t) =
O(τ, τaoa , t) − hO(τrand , τaoa , t)i
.
t
(1)
The reference case of random word ordering is constructed by averaging over random permutations of τaoa .
Relative word gain gives a measure of the predictive power of word scores. A value of
G = 0 indicates performance equivalent to random guessing, while scores larger (smaller)
than zero indicate performance superior (inferior) to random guessing. We also consider the
average word gain over the entire time period or specific learning periods.
Optimal combinations of layers
To allow for varying influences of layers we construct word scores as convex combinations of
individual layer word scores, i.e. a word score sw for word w is obtained by combining layerspecific values of that word
eat)
hon)
sw = αs(Asso)
+ βs(F
+ γs(Co−occ)
+ (1 − α − β − γ)s(P
,
w
w
w
w
(Asso)
(F eat)
(Co−occ)
(2)
(P hon)
where sw
, sw
, sw
, and sw
give the word scores obtained by using single layer
metrics in the respective layers and coefficients α, β, γ give the influence of each layer on the
word score sw . This linear combination of single-layer features is similar in spirit to previous
decompositions of multiplex centrality in edge-coloured graphs, see [53] and discussions in [42].
For individual layer word scores we consider all relevant single-layer network statistics,
see Supporting Information. Notice that, contrary to ranking experiments above, word scores
sw now depend on the combined set of coefficients, so that different values of α, β, γ can lead
to different word orderings for the same sets of individual layer scores. For a given network
statistic we then find the set of coefficients that maximises average word gain (either over
the entire time period or over specific learning phases) in a data set in which 20% of the
words have been left out at random. 20% was chosen as a reasonable value, in order not to
remove too many words particularly in the very early word learning stages (when only 20 or
40 words are learned). To perform the optimisation a numerical scheme using the differential
evolution method [54] is used and averages are calculated over 50 configurations of left out
words. In low dimensional cases, visualisations of the search space can be given in ternary
plots (see Fig. 5(a) and (b)) and we have verified that global optima have been reached by
the numerical scheme in these cases.
References
1 Ramon
Ferrer i Cancho and Ricard V Solé. Least effort and the origins of scaling in human
language. Proceedings of the National Academy of Sciences, 100(3):788–791, 2003.
2 Jean
Aitchison. Words in the mind: An introduction to the mental lexicon. John Wiley &
Sons, 2012.
3M
Ross Quillian. Word concepts: A theory and simulation of some basic semantic capabilities. Behavioral science, 12(5):410–430, 1967.
13
4 Allan
M Collins and M Ross Quillian. Retrieval time from semantic memory. Journal of
verbal learning and verbal behavior, 8(2):240–247, 1969.
5 Allan
M Collins and Elizabeth F Loftus. A spreading-activation theory of semantic processing. Psychological review, 82(6):407, 1975.
6 Javier
Borge-Holthoefer and Alex Arenas. Semantic networks: structure and dynamics.
Entropy, 12(5):1264–1302, 2010.
7 Andrea
Baronchelli, Ramon Ferrer-i Cancho, Romualdo Pastor-Satorras, Nick Chater,
and Morten H Christiansen. Networks in cognitive science. Trends in cognitive sciences,
17(7):348–360, 2013.
8 Mikko
Kivelä, Alex Arenas, Marc Barthelemy, James P Gleeson, Yamir Moreno, and Mason A Porter. Multilayer networks. Journal of Complex Networks, 2(3):203–271, 2014.
9 Stefano
Boccaletti, G Bianconi, R Criado, Charo I Del Genio, J Gómez-Gardeñes, M Romance, I Sendina-Nadal, Z Wang, and M Zanin. The structure and dynamics of multilayer
networks. Physics Reports, 544(1):1–122, 2014.
10 Michael
S Vitevitch. What can graph theory tell us about word learning and lexical
retrieval? Journal of Speech, Language, and Hearing Research, 51(2):408–422, 2008.
11 Manlio
De Domenico, Albert Solé-Ribalta, Emanuele Cozzo, Mikko Kivelä, Yamir Moreno,
Mason A Porter, Sergio Gómez, and Alex Arenas. Mathematical formulation of multilayer
networks. Physical Review X, 3(4):041022, 2013.
12 Douglas
L Nelson, Cathy L McEvoy, and Thomas A Schreiber. The University of South
Florida free association, rhyme, and word fragment norms. Behavior Research Methods,
Instruments, & Computers, 36(3):402–407, 2004.
13 Mark
Steyvers and Joshua B Tenenbaum. The large-scale structure of semantic networks:
Statistical analyses and a model of semantic growth. Cognitive science, 29(1):41–78, 2005.
14 Ken
McRae, George S Cree, Mark S Seidenberg, and Chris McNorgan. Semantic feature
production norms for a large set of living and nonliving things. Behavior research methods,
37(4):547–559, 2005.
15 Thomas
T Hills, Mounir Maouene, Josita Maouene, Adam Sheya, and Linda Smith. Longitudinal analysis of early semantic networks preferential attachment or preferential acquisition? Psychological Science, 20(6):729–739, 2009.
16 Brian
MacWhinney. The CHILDES project: The database, volume 2. Psychology Press,
2000.
17 Nicole
Beckage, Linda Smith, and Thomas Hills. Small worlds and semantic network growth
in typical and late talkers. PloS one, 6(5):e19348, 2011.
18 Fernanda
Marafiga Wiethan, Letı́cia Arruda Nóro, and Helena Bolli Mota. Early lexical
and phonological acquisition and its relationships. In CoDAS, volume 26, pages 260–264.
SciELO Brasil, 2014.
19 Nicole
M Beckage and Eliana Colunga. Language networks as models of cognition: Understanding cognition through language. In Towards a Theoretical Framework for Analyzing
Complex Linguistic Networks, pages 3–30. Springer, 2015.
20 Adilson
E Motter, Alessandro PS de Moura, Ying-Cheng Lai, and Partha Dasgupta. Topology of the conceptual network of language. Physical Review E, 65(6):065102, 2002.
21 Mariano
Sigman and Guillermo A Cecchi. Global organization of the WordNet lexicon.
Proceedings of the National Academy of Sciences, 99(3):1742–1747, 2002.
14
22 Ramon
Ferrer i Cancho and Richard V Solé. The small world of human language. Proceedings of the Royal Society of London. Series B: Biological Sciences, 268(1482):2261–2265,
2001.
23 Joaquı́n
Goñi, Gonzalo Arrondo, Jorge Sepulcre, Inigo Martincorena, Nieves Vélez de Mendizábal, Bernat Corominas-Murtra, Bartolomé Bejarano, Sergio Ardanza-Trevijano, Herminia Peraita, Dennis P Wall, et al. The semantic organization of the animal category:
Evidence from semantic verbal fluency and network theory. Cognitive processing, 12(2):183–
196, 2011.
24 Simon
De Deyne and Gert Storms. Word associations: Network and semantic properties.
Behavior Research Methods, 40(1):213–231, 2008.
25 Massimo
Stella and Markus Brede. Patterns in the English language: Phonological networks, percolation and assembly models. Journal of Statistical Mechanics, 2015:P05006,
2015.
26 Massimo
Stella and Markus Brede. Investigating the phonetic organisation of the english
language via phonological networks, percolation and Markov models. In Proceedings of
ECCS 2014, pages 219–229. Springer International Publishing, 2016.
27 Michael
S Vitevitch, Kit Ying Chan, and Steven Roodenrys. Complex network structure influences processing in long-term and short-term memory. Journal of memory and language,
67(1):30–44, 2012.
28 Haitao
Liu and Jin Cong. Empirical characterization of modern chinese as a multi-level
system from the complex network approach. Journal of Chinese Linguistics, 42(1):1–38,
2014.
29 Sanda
Martinčić-Ipšić, Domagoj Margan, and Ana Meštrović. Multilayer network of language: A unified framework for structural analysis of linguistic subsystems. Physica A:
Statistical Mechanics and its Applications, 457:117–128, 2016.
30 Manlio
De Domenico, Clara Granell, Mason A Porter, and Alex Arenas. The physics of
multilayer networks. arXiv preprint arXiv:1604.02021, 2016.
31 Michael
Szell, Renaud Lambiotte, and Stefan Thurner. Multirelational organization of
large-scale social networks in an online world. Proceedings of the National Academy of
Sciences, 107(31):13636–13641, 2010.
32 Jesús
Gómez-Gardenes, Irene Reinares, Alex Arenas, and Luis Mario Florı́a. Evolution of
cooperation in multiplex networks. Scientific reports, 2, 2012.
33 Federico
Battiston, Andrea Cairoli, Vincenzo Nicosia, Adrian Baule, and Vito Latora.
Interplay between consensus and coherence in a model of interacting opinions. Physica D:
Nonlinear Phenomena, 2015.
34 Shai
Pilosof, Mason A Porter, and Sonia Kéfi. Ecological multilayer networks: A new
frontier for network ecology. arXiv preprint arXiv:1511.04453, 2015.
35 Massimo
Stella, Cecilia S Andreazzi, Sanja Selakovic, Alireza Goudarzi, and Alberto
Antonioni. Parasite spreading in spatial ecological multiplex networks. arXiv preprint
arXiv:1602.06785, 2016.
36 P
S Dale and L Fenson. Lexical development norms for young children. Behavior Research
Methods, Instruments, & Computers, 28(1):125–127, 1996.
37 Mark
Newman. Networks: an introduction. Oxford University Press, 2010.
15
38 Manlio
De Domenico, Albert Solé-Ribalta, Sergio Gómez, and Alex Arenas. Navigability
of interconnected networks under random failures. Proceedings of the National Academy of
Sciences, 111(23):8351–8356, 2014.
39 Duncan
J Watts and Steven H Strogatz. Collective dynamics of ‘small-world’ networks.
nature, 393(6684):440–442, 1998.
40 Marcus
Kaiser. Mean clustering coefficients: the role of isolated nodes and leafs on clustering measures for small-world networks. New Journal of Physics, 10(8):083042, 2008.
41 Federico
Battiston, Vincenzo Nicosia, and Vito Latora. Structural measures for multiplex
networks. Physical Review E, 89(3):032804, 2014.
42 Manlio
De Domenico, Vincenzo Nicosia, Alexandre Arenas, and Vito Latora. Structural
reducibility of multilayer networks. Nature communications, 6, 2015.
43 Michael
Zock. Words in books, computers and the human mind. Journal of Cognitive
Science, 16(4):355–378, 2015.
44 Valerio
Gemmetto and Diego Garlaschelli. Multiplexity versus correlation: the role of local
constraints in real multiplexes. Scientific reports, 5, 2015.
45 Adrien
Barbaresi. Language-classified Open Subtitles (LACLOS): download, extraction,
and quality assessment. PhD thesis, BBAW, 2014.
46 Victor
Kuperman, Hans Stadthagen-Gonzalez, and Marc Brysbaert. Age-of-acquisition
ratings for 30,000 english words. Behavior Research Methods, 44(4):978–990, 2012.
47 N
M Beckage, A Aguilar, and E Colunga. Modeling lexical acquisition through networks.
Proc. of the 37th Conf. of the Cog. Sci. Society, 2015.
48 Manlio
De Domenico, Albert Solé-Ribalta, Elisa Omodei, Sergio Gómez, and Alex Arenas.
Ranking in interconnected multilayer networks reveals versatile nodes. Nature communications, 6, 2015.
49 Michael
S Vitevitch, Kit Ying Chan, and Rutherford Goldstein. Insights into failed lexical
retrieval from network science. Cognitive psychology, 68:1–32, 2014.
50 Melissa
K Stamer and Michael S Vitevitch. Phonological similarity influences word learning
in adults learning spanish as a foreign language. Bilingualism: Language and Cognition,
15(03):490–502, 2012.
51 L
Fenson, P S Dale, J S Reznick, E Bates, D J Thal, S J Pethick, M Tomasello, C B Mervis,
and J Stiles. Variability in early communicative development. Monographs of the society
for research in child development, 59(5):1–185, 1994.
52 George
A Miller. WordNet: a lexical database for english. Communications of the ACM,
38(11):39–41, 1995.
53 Luis
Solá, Miguel Romance, Regino Criado, Julio Flores, Alejandro Garcia del Amo, and
Stefano Boccaletti. Eigenvector centrality of nodes in multiplex networks. Chaos: An
Interdisciplinary Journal of Nonlinear Science, 23(3):033131, 2013.
54 Kenneth
Price, Rainer M Storn, and Jouni A Lampinen. Differential evolution: a practical
approach to global optimization. Springer Science & Business Media, 2006.
55 Manlio
De Domenico, Mason A Porter, and Alex Arenas. Muxviz: a tool for multilayer
analysis and visualization of networks. Journal of Complex Networks, page cnu038, 2014.
16
Author contributions statement
M.S. designed the original study, M.S., M.B. and N.B. conceived the experiments, N.B.
provided the empirical data, M.S. conducted the experiments, M.S., M.B. and N.B. analysed
the results. All authors reviewed the manuscript.
Additional information
The authors declare no competing financial interests. M.S. was supported by an EPSRC
Doctoral Training Centre grant (EP/G03690X/1). The authors acknowledge the use of the
muxViz software [55].
17