Social Network Structures in Agent Based
Modeling: Finding an Optimal Structure Based on
Survey Data (or Finding the Network That Does
Not Exist)
Peter S. van Eck¹ and Wander Jager¹
¹ Faculty of Economics and Business, University of Groningen, P.O. Box 800, 9700
AV Groningen, The Netherlands
{P.S.van.Eck, W.Jager}@rug.nl
Abstract. The social environments used in Agent Based Models have developed from
simple, stylized grids to more complex network structures (e.g. small-world networks
and scale-free networks). The focus in developing these network structures lies on the
pattern of connectivity, while other factors such as the similarity between agents are
neglected; as a result, the networks often do not seem to reflect 'real networks'.
Collecting complete empirical network data is quite difficult, and the data that can be
collected often do not contain all influential relations (e.g. they cover only online
interactions). In this paper we propose a methodology to construct simulated networks
on the basis of (easier to collect) survey data. These networks are optimized with
respect to the network properties. Such simulated networks are expected to provide a
more valid representation of real networks, and hence may contribute to our further
understanding of social interactions in networks.
Keywords: Social Network Structure, Empirical (Survey) Data
1 Introduction
One of the key contributions of social simulation resides in its capability of creating
networks of interacting agents. This allows social interactions to be modeled in a grid
of connected agents. Whereas initial models used very simple stylized social
environments, such as grids with Von Neumann or Moore neighborhoods, or fully
random networks, empirical data inspired researchers to develop networks capturing
critical factors such as close and distant links (small-world networks), variance in
number of connections (preferential attachment), clustering and the like (e.g. Watts
and Strogatz, 1998; Barabasi, 2002). Most of this work focuses on describing the
pattern of connections between agents. However, we also know that people like to
connect on the basis of similarity (homophily: Lazarsfeld and Merton, 1954), have a
larger chance of meeting and connecting with friends of friends, and sometimes value
the informational expertise of another person (and don't mind if this person is
really different) (Granovetter, 1973). Moreover, we know that people are
heterogeneous regarding these attributes. These factors are expected to be relevant in
understanding the dynamics of interaction, and hence capturing them in a simulated
network is a challenge. Ideally we want to have an empirically validated
simulated network where the network structure also reflects the reasons for being
connected and the type of influence between agents. Complete network data, however,
are currently only available from web-based studies of interaction on social websites.
These data capture only the Internet part of the connections of a person, and are
unlikely to represent the complete network properties of a real population.
Representative data on processes of social interaction can be collected, but they come
without data on the network structure in which these interactions take place. In this
paper we propose a methodology to construct simulated networks on the basis of survey
data that are optimized with respect to the network properties. Such simulated networks
are expected to provide a more valid representation of real networks, and hence may
contribute to our further understanding of social interactions in networks. In this
paper we start by explaining the principles of the approach and presenting a
performance test on artificial data.
2 A New Approach: Finding the Network
2.1 Procedure and Assumptions
The procedure used to find the network structure is briefly described in figure 1 and
explained in more detail in this section.
1. Load respondent information and create agents
2. Select agent X (random) and put agent X on 'Selected List'
3. Select a relation that agent X needs (random)
4. Calculate for all other agents: Deviation from needed relation
5. Create a list of candidates (Deviation ≤ Limit)
   5.1 Prefer neighbors of neighbors
   5.2 Otherwise: pick other candidates
   5.3 Exception procedure
   No candidate: Limit = Limit + 1
6. Agent X creates relation with random candidate (agent Y)
7. Move to next relation (random)
   Until all relations are established
8. Move to next agent on 'Selected List'
   Until all agents are selected
Fig. 1 Description of the procedure
In step 1 the information about the respondents is loaded in the model: for every
respondent a corresponding agent is created. This information can potentially be
anything that is included in the survey, but should at least contain some information
about the relations of the respondents. In the current model we assume that we have
the following information about the respondent:
- How many relations does the respondent have?
- For every relation:
    o Is this relation mostly normative or informational?¹
    o Is this relation with a similar person or with a dissimilar person
      (homophily)?
- To which similarity group does the respondent belong?
For demonstration purposes, the 'available' information used in this model is
relatively simple. Obviously, the model can be modified to allow more complicated
information input.
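The survey information listed above could be held in a small data structure such as the one sketched below. This is only an illustration: the names `PreferredRelation`, `Agent`, `load_agents` and the row layout are assumptions made for the example and are not taken from the model itself.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class PreferredRelation:
    """One relation reported by a respondent (field names are illustrative)."""
    kind: str      # "normative" or "informational"
    similar: bool  # True if the relation is with a similar person


@dataclass
class Agent:
    """An agent created from one survey respondent (step 1)."""
    agent_id: int
    group: int  # similarity group of the respondent
    preferred: List[PreferredRelation] = field(default_factory=list)

    @property
    def wanted(self) -> int:
        # Number of relations the respondent reported.
        return len(self.preferred)


def load_agents(rows: List[Tuple[int, List[Tuple[str, bool]]]]) -> List[Agent]:
    """Build one agent per row of (group, [(kind, similar), ...])."""
    return [Agent(i, group, [PreferredRelation(k, s) for k, s in rels])
            for i, (group, rels) in enumerate(rows)]


# Two hypothetical respondents.
rows = [(1, [("normative", True), ("informational", False)]),
        (2, [("informational", False)])]
agents = load_agents(rows)
print(agents[0].wanted, agents[1].group)  # 2 2
```

Richer survey input (for example continuous similarity scores) would only require extra fields on the relation record.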
In step 2 the first agent (agent X) is randomly selected and added to the 'Selected
List'. This list contains all the agents that have a relation with at least one other
agent (i.e. the list is updated every time step 6 is reached) and therefore ensures that
all agents have a chance to create the relations they need. Furthermore, it provides the
order in which agents are selected in step 8 (after the first agent).
Step 3: every agent has a list of 'Preferred Relations' based on the empirical data
(loaded in step 1). This list indicates how many relations the agent wants, what type
of relations the agent wants and whether these relations are with similar others. Agent
X randomly selects one of the relations on its 'Preferred Relations' list.
Step 4: now that agent X has selected the relation it needs, all other agents (hereafter
agent Y) can determine whether they are open to this relation. If agent Y is not open
to the requested relation, it calculates the 'Deviation' between the relation requested
by agent X and the closest match within agent Y's own 'Preferred Relations' list. In
the current model we use the following deviations:
- If agent Y is open to the relation requested by agent X, the deviation is 0.
- If agent Y is open to a relation, but not to the requested relation (e.g. a
normative relation is requested by agent X, while agent Y only needs
informational relations), the deviation is 1.
- If agent Y is not open to new relations, the deviation is 2. However, this
deviation increases if the agent already has more relations than it wants: if
agent Y wants 3 relations, but already has 5 relations, the deviation is 2 + 2 =
4.
The deviation values are chosen arbitrarily in this model, but the main idea behind the
calculation is that the deviation increases as the satisfaction of agent Y with the
requested relation decreases.
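The three deviation rules of step 4 can be written down compactly. The following is a minimal sketch under two assumptions that are not stated in the text: a relation is represented as a `(kind, similar)` pair, and agent Y counts as 'open' to a request when one of its unfilled preferred relations is identical to it. The function name `deviation` and its parameters are illustrative.

```python
from typing import List, Tuple

Relation = Tuple[str, bool]  # (kind, similar), e.g. ("normative", True)


def deviation(requested: Relation, open_slots: List[Relation],
              wanted: int, current: int) -> int:
    """Deviation of agent Y for a relation requested by agent X.

    open_slots: Y's preferred relations that are not yet filled.
    wanted / current: number of relations Y wants and already has.
    """
    if requested in open_slots:
        return 0  # Y needs exactly this relation
    if open_slots:
        return 1  # Y needs a relation, but of a different kind
    # Y is not open to new relations; add any surplus it already carries.
    return 2 + max(0, current - wanted)


print(deviation(("normative", True), [("normative", True)], 3, 2))      # 0
print(deviation(("normative", True), [("informational", False)], 3, 2))  # 1
print(deviation(("normative", True), [], 3, 3))                          # 2
print(deviation(("normative", True), [], 3, 5))                          # 4
```

The last call reproduces the example from the text: an agent that wants 3 relations but already has 5 yields 2 + 2 = 4.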
In step 5 a list of candidates is created. First agent X tries to find another agent (Y)
with 'deviation = 0' (i.e. a perfect match). If agent X cannot find this perfect match,
it starts searching for an agent with a small deviation (i.e. the 'Limit' increases
until a candidate is found). In the current model agent X has three alternative methods
to create a list of candidates. Step 5.1 is the preferred method: in this step we assume
that people are likely to form clusters: friends of friends are likely to be friends of
each other (Granovetter, 1973). Based on this assumption the agent will first try to
connect to a neighbor of one of the agents it is already connected to. If this does not
result in a list of candidates, step 5.2 includes all other agents in the search process.
If we only used steps 5.1 and 5.2 to create a list of candidates, there would be a chance
that the resulting network is not connected. For example: the first five agents that are
selected form a cluster (only connecting to each other) and are fully satisfied with
their relations. We would have to randomly select another agent to proceed, but this
agent would never connect to the cluster of the first five agents. To prevent the
network from being disconnected, we use a third method to create a list of
candidates: step 5.3. This step is only followed if:
- agent X is the last agent on the 'Selected List'
- there are still agents that are not connected to the network (and therefore
these agents are not on the 'Selected List')
- agent X only needs one more relation (i.e. it is the last chance to connect to
an unconnected agent)
So, in this particular situation agent X connects with an agent that is not yet
connected to the network. Given that this relation is with an agent outside the cluster
of agent X, we assume that the relation is mostly informational and with a dissimilar
agent.
¹ Normative influence refers to the tendency to conform to expectations of others
(Burnkrant and Cousineau, 1975) and informational influence refers to the tendency to
accept information from others as evidence of reality (Deutsch and Gerard, 1955).
In step 6 a random agent (Y) is selected from the list of candidates. A relation is
created between agent X and agent Y, and agent Y is added to the 'Selected List'.
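The candidate search of steps 5.1 and 5.2 and the random choice of step 6 might be sketched as follows. This is one possible reading of Figure 1, not the original implementation: within each value of the limit, neighbors of neighbors are tried first and all other agents second, and only then is the limit raised. The exception procedure of step 5.3 is left out. The function name `pick_partner`, the safety cap `max_limit`, and the exclusion of agents that X is already connected to are assumptions of this sketch.

```python
import random
from typing import Dict, Optional, Set


def pick_partner(x: int, deviations: Dict[int, int],
                 neighbors: Dict[int, Set[int]],
                 rng: random.Random, max_limit: int = 10) -> Optional[int]:
    """Pick a partner for agent x (steps 5.1, 5.2 and 6; step 5.3 omitted).

    deviations: deviation of every other agent for x's requested relation.
    neighbors:  adjacency sets of the partially built network.
    """
    # Assumption: agents x is already connected to cannot be chosen again.
    eligible = {y: d for y, d in deviations.items()
                if y != x and y not in neighbors[x]}
    # Neighbors of the agents x is already connected to (step 5.1).
    friends_of_friends = {z for y in neighbors[x] for z in neighbors[y]}
    for limit in range(max_limit + 1):  # Limit = Limit + 1
        ok = sorted(y for y, d in eligible.items() if d <= limit)
        preferred = [y for y in ok if y in friends_of_friends]
        candidates = preferred or ok    # step 5.1 first, otherwise step 5.2
        if candidates:
            return rng.choice(candidates)  # step 6
    return None


# Agent 0 is connected to 1; agent 2 is a neighbor of 1; agent 3 is isolated.
neighbors = {0: {1}, 1: {0, 2}, 2: {1}, 3: set()}
deviations = {1: 0, 2: 1, 3: 1}
print(pick_partner(0, deviations, neighbors, random.Random(0)))  # 2
```

In the usage example no eligible agent has deviation 0, so the limit rises to 1; agents 2 and 3 then both qualify, and agent 2 is chosen because it is a neighbor of a neighbor.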
If agent X still wants more relations, another random relation is selected in step 7,
after which the process will return to step 4.
If agent X has all the relations it needs, the next agent on the 'Selected List' is
selected in step 8, after which the process returns to step 3. This procedure
continues until all agents in the dataset have had a chance to connect to other agents:
the network is then complete.
2.2 Error Indicators and Suboptimal Networks
Although the procedure described in section 2.1 will always result in a network, this
network is not necessarily the best solution: given the random selections in steps 2, 3,
6 and 7, the solutions can differ and some solutions might be better than others.
Therefore we use three error indicators: the relations-number-error, the
relations-type-error and the relations-similarity-error. These errors indicate the
discrepancy between the situation preferred by the agents and the actual situation. For
example: agent X prefers 4 relations: 3 normative relations with similar others and 1
informational relation with a dissimilar other. Agent X actually has 3 relations: 1
normative relation with a similar other, 1 normative relation with a dissimilar other
and 1 informational relation with a dissimilar other. The relations-number-error for
agent X is 1 (|4-3|), the relations-type-error is 1 (|3-2|+|1-1|) and the
relations-similarity-error is 3 (|3-1|+|1-2|).² The total-error for agent X is 5
(1+1+3). The error indicators for the network are calculated by summing the error
indicators of the individual agents.
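The worked example for agent X can be checked with a few lines of code. The sketch below assumes that relations are stored as `(kind, similar)` pairs; the function name `agent_errors` and the dictionary keys are illustrative choices, not part of the model description.

```python
from collections import Counter
from typing import Dict, List, Tuple

Relation = Tuple[str, bool]  # (kind, similar)


def agent_errors(preferred: List[Relation],
                 actual: List[Relation]) -> Dict[str, int]:
    """Error indicators of one agent: absolute differences per category."""
    number = abs(len(preferred) - len(actual))
    want_kind = Counter(k for k, _ in preferred)
    have_kind = Counter(k for k, _ in actual)
    want_sim = Counter(s for _, s in preferred)
    have_sim = Counter(s for _, s in actual)
    type_err = sum(abs(want_kind[k] - have_kind[k])
                   for k in ("normative", "informational"))
    sim_err = sum(abs(want_sim[s] - have_sim[s]) for s in (True, False))
    return {"number": number, "type": type_err, "similarity": sim_err,
            "total": number + type_err + sim_err}


# Agent X from the text: prefers 3 normative/similar and 1 informational/dissimilar.
preferred = [("normative", True)] * 3 + [("informational", False)]
actual = [("normative", True), ("normative", False), ("informational", False)]
print(agent_errors(preferred, actual))
# {'number': 1, 'type': 1, 'similarity': 3, 'total': 5}
```

The network-level indicators follow by summing these dictionaries over all agents.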
Overall the network with the lowest total-error has the best fit. However, for a
specific research question it might be more important to minimize one of the specific
error indicators. It could be more important that agents are connected to similar others
or that agents have the right balance between normative and informational relations.
Given the different error indicators, it is easy to select the network that best suits
the research question.
² Note that the absolute values of the differences are taken: one error cannot
compensate for another error.
The procedure also allows the network to be optimized on the number of
relations and, in addition, on either the type of relation or the similarity needed.³ In
this case the calculation of the deviation in step 4 does not take the other
characteristics into account. For example: agent Y still needs 1 normative relation with
a similar other. If we only take the number of relations into account and agent X is
searching for 1 informational relation with a similar other, the calculated deviation of
agent Y is 0. However, even in this situation the relations-type-error of agent Y
increases if the relation is created.
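One way to express this reduced matching is to switch individual characteristics on or off when testing whether agent Y is open to a request. The sketch below is an assumption about how such a switch could look; the function `is_open_to` and its flags `use_type` and `use_similarity` are hypothetical names.

```python
from typing import List, Tuple

Relation = Tuple[str, bool]  # (kind, similar)


def is_open_to(requested: Relation, open_slots: List[Relation],
               use_type: bool = True, use_similarity: bool = True) -> bool:
    """True if an unfilled slot matches on the characteristics in use."""
    for kind, similar in open_slots:
        if use_type and kind != requested[0]:
            continue
        if use_similarity and similar != requested[1]:
            continue
        return True
    return False


y_slots = [("normative", True)]    # agent Y still needs this relation
request = ("informational", True)  # what agent X is searching for

# Number of relations only: Y is open, so the deviation would be 0.
print(is_open_to(request, y_slots, use_type=False, use_similarity=False))  # True
# Type and similarity both count: Y is not open to this request.
print(is_open_to(request, y_slots))                                        # False
```

With both flags switched off, any unfilled slot counts as a match, which corresponds to optimizing on the number of relations alone.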
2.3 Finding an Existing Network: An Example
To test whether the proposed method is actually able to find an optimal network, we
create 'survey data' based on a fictional network (hereafter referred to as the Input
Network). This way we ensure that a 'perfect network solution' (total-error = 0)
exists: the procedure discussed in section 2.1 should be able to find this solution. We
created a network with 5 clusters of 5 similar agents (25 agents in total). Within the
clusters most agents are connected to each other with a normative relation; however,
some agents are not connected to each other or have an informational relation. Some
agents have connections with agents of other clusters. Most of these relations are
informational, although some of the relations are normative. Based on this network
we created 'survey data', containing: the similarity group the agents belong to (1-5),
the number of relations (3-7) and for every relation: the type of relation
(normative/informational) and the similarity between the agents required
(similar/dissimilar).
The method is tested in four settings: 1) optimizing the number of connections, 2)
optimizing the type of connections, 3) optimizing the similarity required and 4)
optimizing number, type and similarity. In each setting 500 networks are created to
ensure that we find several optimal solutions. Table 1 shows the results of this test as
well as two network measures: the network-degree-centrality and the
network-clustering-coefficient. The network-degree-centrality indicates whether agents
within the network have a different number of relations. A network-degree-centrality of
0 means that all agents have the same number of relations, while a
network-degree-centrality of 1 indicates a star network (i.e. all agents are connected
to one single agent). The network-clustering-coefficient indicates how well neighboring
agents are connected to each other: 0 indicates that the neighbors of agents are not
connected to each other, while 1 indicates that all neighbors are connected to each
other. Table 1 shows the three different error measures and the total-error for the 500
runs of each setting. It also includes these measures for the 'optimal solutions' within
each setting (e.g. if we try to optimize the network based on similarity, the
relations-similarity-error should be 0) and the measures for the Input Network.
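The exact formulas behind the two network measures are not given here. The sketch below therefore rests on assumptions: it uses Freeman's degree centralization, which is 0 when all agents have the same degree and 1 for a star network, and the average local clustering coefficient in the sense of Watts and Strogatz (1998). Both choices match the end points described above but may differ in detail from the measures used for Table 1. The function names are illustrative.

```python
from itertools import combinations
from typing import Dict, Set

Graph = Dict[int, Set[int]]  # undirected network as adjacency sets


def degree_centralization(g: Graph) -> float:
    """Freeman degree centralization (requires at least 3 agents)."""
    n = len(g)
    degrees = [len(nb) for nb in g.values()]
    top = max(degrees)
    return sum(top - d for d in degrees) / ((n - 1) * (n - 2))


def clustering_coefficient(g: Graph) -> float:
    """Average local clustering; agents with fewer than 2 neighbors count as 0."""
    total = 0.0
    for nb in g.values():
        k = len(nb)
        if k < 2:
            continue
        # Count the pairs of neighbors that are connected to each other.
        links = sum(1 for a, b in combinations(nb, 2) if b in g[a])
        total += 2 * links / (k * (k - 1))
    return total / len(g)


star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
complete = {i: {j for j in range(4) if j != i} for i in range(4)}
print(degree_centralization(star), clustering_coefficient(star))          # 1.0 0.0
print(degree_centralization(complete), clustering_coefficient(complete))  # 0.0 1.0
```

The star and the fully connected network reproduce the extreme values 1 and 0 for the centrality, and 0 and 1 for the clustering coefficient.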
³ The 'type of relation' and 'similarity needed' are both defined on the relation level,
which implies that optimizing on one of these characteristics also optimizes the number
of relations.
Table 1 Results of four test settings and 'perfect network'

                                Number of       Type of         Similarity      Type and        Input
                                connections     connection                      similarity      network
                                All     Optimal All     Optimal All     Optimal All     Optimal
Number of runs                  500     4¹      500     16¹     500     13¹     500     1²
Relations-Number-Error          4       0       4       0       2       0       23      0       0
Relations-Type-Error            25      29      7       0       20      18      37      0       0
Relations-Similarity-Error      146     142     132     140     14      0       40      0       0
Total Error                     175     171     143     140     36      18      100     0       0
Network-Degree-Centrality       0.10    0.11    0.10    0.11    0.13    0.11    0.43    0.11    0.11
Network-Clustering-Coefficient  0.66    0.71    0.66    0.61    0.57    0.63    0.65    0.63    0.62

Note: all results are the average of all runs
¹ Results in this column are only based on the (number of) optimal runs
² The chance of finding this 'optimal network' is very small: needs more than 500 runs
The results in Table 1 show that the first three settings do relatively well in
constructing networks with a low relations-number-error: in most of the created
networks the number of relations agents actually have is close to the number of
relations the agents prefer. The relations-type-error is also quite low in the first
three settings. In contrast, the relations-similarity-error is high for the first two
settings and only drops when the network is created based on similarity. These findings
can easily be explained by the network structure used to create the data: if similarity
does not matter, agents feel free to connect with agents from different clusters,
thereby increasing the relations-similarity-error. If similarity does matter, agents
prefer to connect to agents within their cluster and those relations are often
normative: the relations-type-error therefore does not really increase.
For all 'optimal network solutions' the degree-centrality and clustering-coefficient
are quite close to the values of the actual network we used to create the data. Given
that the relations-number-error is 0, the degree-centrality matches the value of the
Input Network perfectly (it is based only on the number of relations). However, the
clustering-coefficient does not have to be close to the real value: the networks in the
first setting are more strongly clustered than the real network. This can be explained
by the fact that agents in this setting are not bothered with relation type or
similarity: the deviation calculated in step 4 is 0 for many agents and therefore agents
are more likely to connect to neighbors of neighbors in step 5.1.
The results of the model with both 'type of relation' and 'similarity' as
optimization variables are more surprising: it produces the highest
relations-number-error and relations-type-error. Because the relations-similarity-error
is moderate, the total-error of this model is still lower than that of the first two
models. Interestingly, this is also the only model setting that results in a very high
average network-degree-centrality over 500 runs. This indicates that at least one agent
in the network has many more relations than the other agents. Furthermore, the optimal
network (in this case also a 'perfect network') is only rarely found. This result is
caused by the fact that there are only very few solutions in this model setting: if one
of the created relations is not perfect, the resulting solution is not perfect. The
strong restrictions do not allow for any compensation in other relations.
In Figure 2, the network structure of (one of) the optimal runs of each setting is
shown. Only the network based on similarity (Figure 2d) and the network based on
both 'type of relation' and 'similarity' (Figure 2e) strongly resemble the Input
Network shown in Figure 2a. Based on the error measures, the network measures and
the actual network structure, we can conclude that the model based on similarity
between agents produces the best network structures.
Fig. 2 Network structures of optimal solutions: a) Perfect Network, b) Number, c) Type,
d) Similarity, e) Type & Similarity
4 Conclusions and Discussion
The method we propose in this paper allows researchers to create a network structure
based on survey data. Although it may be tempting to use many restrictions to create
the optimal network, our example network shows that using too many restrictions
makes it more difficult to create the perfect network. The results also indicate that
creating a network based on a lower number of restrictions can also result in networks
close to the perfect network.
The model, example network and data in this paper are all kept relatively simple to
illustrate how the procedure works. To fit the model to a specific research situation,
it can be modified in several ways. In order to allow the use of more complex data,
step 4 (calculation of the deviation) can be modified, e.g. to allow for a continuous
scale with respect to the type of relation or the similarity between agents.
Furthermore, if there are strong indications that the agents should prefer to connect in
a certain way (e.g. friends of friends are likely to be friends (step 5.1)), these
assumptions can be included by introducing new rules in step 5.
Although the results of the current model seem promising, the next important
step is to actually use our empirical data to create an optimal network structure. In
order to show whether this approach really increases the realism of ABMs, we need to
compare the results of this approach with the results of a model in which empirical
data are used only as an input for rules and parameter distributions.
References
Barabasi, A.L. (2002). Linked: The New Science of Networks. Cambridge, Massachusetts:
Perseus Publishing.
Burnkrant, R.E., & Cousineau, A. (1975). Informational and Normative Social Influence in
Buyer Behavior. Journal of Consumer Research, 2(3), 206-215.
Deutsch, M., & Gerard, H.B. (1955). A Study of Normative and Informational Social
Influence Upon Individual Judgement. Journal of Abnormal and Social Psychology, 51,
629-636.
Granovetter, M.S. (1973). The Strength of Weak Ties. American Journal of Sociology, 78(6),
1360-1380.
Lazarsfeld, P., & Merton, R.K. (1954). Friendship as a Social Process: A Substantive and
Methodological Analysis. In Freedom and Control in Modern Society, Morroe Berger et al.,
eds., New York: Van Nostrand, 18-66.
Watts, D.J., & Strogatz, S.H. (1998). Collective Dynamics of "Small-World" Networks.
Nature, 393, 440-442.