Open Research Online
The Open University’s repository of research publications and other research outputs

Project Testbed: Argument Mapping and Deliberation Analytics

How to cite:
Parent, M.-A., Liddo, A. D., Ullmann, T. D., & Klein, M. (2015). Project Testbed: Argument Mapping and Deliberation Analytics (Deliverable No. D4.2b). Retrieved from: http://catalyst-fp7.eu/wp-content/uploads/2015/11/CATALYST D4.3.pdf

Link to article on publisher’s website: http://catalyst-fp7.eu/wp-content/uploads/2015/11/CATALYST D4.3.pdf

Project Acronym: CATALYST
Project Full Title: Collective Applied Intelligence and Analytics for Social Innovation
Grant Agreement: 6611188
Project Duration: 24 months (Oct. 2013 - Sept. 2015)

D4.2b Project Testbed: Argument Mapping & Deliberation Analytics

Deliverable Status: Final
File Name: CATALYST_D4.2b.pdf
Due Date: September 2015 (M24)
Submission Date: November 2015 (M26)
Dissemination Level: Public
Task Leader: Purpose

This project has received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement n°6611188.

The CATALYST project consortium is composed of:

SO: Sigma Orionis (France)
I4P: Imagination for People (France)
OU: The Open University (United Kingdom)
UZH: University of Zurich (Switzerland)
CSCP: Collaborating Centre on Sustainable Consumption and Production (Germany)
Purpose: Purpose Europe (United Kingdom)
Wikitalia: Wikitalia (Italy)

Disclaimer
All intellectual property rights are owned by the CATALYST consortium members and are protected by the applicable laws.
Except where otherwise specified, all document contents are: “© CATALYST Project - All rights reserved”. Reproduction is not authorised without prior written agreement. All CATALYST consortium members have agreed to full publication of this document. The commercial use of any information contained in this document may require a license from the owner of that information. All CATALYST consortium members are also committed to publishing accurate and up-to-date information and take the greatest care to do so. However, the CATALYST consortium members cannot accept liability for any inaccuracies or omissions, nor do they accept liability for any direct, indirect, special, consequential or other losses or damages of any kind arising out of the use of this information.

Revision Control

Version  Author                         Date               Status
0.1      Marc-Antoine Parent            October 30, 2015   Initial Draft
0.2      Anna De Liddo, Thomas Ullmann  November 6, 2015   Inputs from the Open University
0.3      Mark Klein                     November 10, 2015  Inputs from the University of Zürich
0.4      Marc-Antoine Parent            November 19, 2015  Draft
0.5      Marta Arniani                  November 23, 2015  Quality Check
0.6      Marc-Antoine Parent            November 26, 2015  Final Draft reviewed
1.0      Marta Arniani                  November 26, 2015  Submission to the EC

Table of Contents

Executive summary
Introduction
1. Analytics and alerts implemented
   1.1 Metrics
   1.2 Alerts
   1.3 Analytics Identification Methodology
   1.4 Visual analytics based on attention metrics
       1.4.1 Activity bias visualisation
       1.4.2 Rating bias visualisation
       1.4.3 Attention map visualisation
       1.4.4 Community interest network visualisation
       1.4.5 Sub-communities network visualisation
   1.5 Alerts and user feedback interface in LiteMap and DebateHub
       1.5.1 Alerts in LiteMap
           1.5.1.1 LiteMap’s alert interface
           1.5.1.2 User feedback interface
       1.5.2 Alerts in DebateHub
   1.6 Semantic clusters
2. Community testing of visual analytics
   2.1 Evaluation
       2.1.1 Background information
       2.1.2 Task performance
       2.1.3 Usability
       2.1.4 Discussion
3. Community testing of alerts
   3.1 Loomio test of harvester alerts
       3.1.1 Experimental design
       3.1.2 Testbed outcomes
   3.2 Seventh Sustainable Summer School
       3.2.1 Participation-centric analytics
       3.2.2 Semantic clusters and semantic proximity
4. Post-hoc testing
   4.1 Alerts
   4.2 Metrics and visualization
   4.3 Suggestions based on semantic clusters
Conclusions and future work
References
List of Figures
List of Tables
Executive summary

The present document is a deliverable of the CATALYST project, funded by the European Commission’s Directorate-General for Communications Networks, Content & Technology (DG CONNECT), under its 7th EU Framework Programme for Research and Technological Development (FP7).

One key goal of the CATALYST project was to design metrics that could capture and represent aspects of a conversation’s structural quality, to assist harvesters and moderators. Many such metrics, alerts and visualizations were developed in the course of the project, but initial user testing showed that users find it difficult to interpret abstract signals. In response, we both introduced new analytics that we felt could be more directly useful and improved the representation of existing ones. We attempted to test these later refinements, but could not do so with the large communities originally planned. Instead, we evaluated their usefulness in a smaller conversation and in experimental settings.

Introduction

Discussion analytics aim to represent patterns in conversation processes that may be indicative of various community functions or dysfunctions. The CATALYST analytics server is a free, open-source web service that deliberation platforms can use to address these important challenges.
It provides a substantial library of analytics for assessing crowd-authored deliberation maps. These analytics take two forms: metrics and alerts.

Figure 1 - Metric and alert server data flow

The role of metrics is to provide summary overviews of the status of the deliberation, highlighting phenomena (such as controversy hot spots or balkanization) that would be difficult to identify simply by browsing the map on a post-by-post basis. Visualizations based on the metrics can enable community managers (moderators and harvesters) to monitor the health of the conversation and, by extension, can also make participants in general aware of conversation patterns. Some of these metrics can be developed into alerts, which signal to participants that certain metric scores have passed thresholds that require attention, and which also provide user-specific notifications of which elements of the deliberation map a participant should consider in order to maximize their contribution. In particular, analytics and alerts:

- help community managers monitor patterns of participation (e.g. decaying or balkanized participation);
- help participants find topics and other participants that could interest them;
- help harvesters find emerging topics;
- help moderators find various communication dysfunctions.

The analytics implemented in this server were identified through a systematic analysis of deliberation needs and challenges, prioritized by CATALYST team members using a crowd-sourced online tool, implemented using state-of-the-art data mining techniques (such as social network analysis, dimensionality reduction, and linguistic corpus analysis), and evaluated using a range of empirical tests. We integrated these tools into our deliberation platforms and put them in the hands of community managers and participants. From there, we gathered their responses to the visualizations and alerts at their disposal.
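To make the metric-to-alert data flow above concrete, the following is a minimal, hypothetical sketch, not the CATALYST server's actual code or API: a metric score is computed from post ratings, and an alert fires when the score crosses a threshold. All names and the threshold value are illustrative assumptions.

```python
# Hypothetical sketch of the metric -> threshold -> alert flow.
# Not the CATALYST implementation; names and thresholds are illustrative.

def controversy(ratings):
    """Toy controversy score: normalised variance of 1-5 ratings.

    4.0 is the maximum possible variance on a 1..5 scale (ratings split
    evenly between 1 and 5), so the score lands in [0, 1].
    """
    if len(ratings) < 2:
        return 0.0
    mean = sum(ratings) / len(ratings)
    var = sum((r - mean) ** 2 for r in ratings) / len(ratings)
    return min(1.0, var / 4.0)

def alerts_for(post_id, ratings, threshold=0.5):
    """Emit (alert_name, strength) pairs when a metric crosses its threshold."""
    score = controversy(ratings)
    if score >= threshold:
        yield ("controversial idea", score)

# A post rated only 1s and 5s is maximally controversial:
print(list(alerts_for("post-42", [1, 5, 1, 5])))
```

The same pattern generalizes: any of the metrics described below can be paired with a threshold and a strength value to become an alert.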
1. Analytics and alerts implemented

1.1 Metrics

The role of metrics is to provide easy-to-understand summaries of different aspects of the health of a deliberation engagement. While in some cases a metric is a single number that can be presented as is, in many cases a metric returns a table of numbers that needs to be rendered with a simple visualization, such as a scatter plot, in order to make its message clear. The visualizations developed by the CATALYST project are described in a separate part of the CATALYST final report and recapitulated below. The following table summarizes the metrics implemented in the CATALYST analytics server, giving the name of each metric and describing how it works:

Table 1. List of metrics

branchsize: Returns the product of the average width and depth for the posts in a branch.

controversy: Returns the controversy score for every post in a given branch, based on the ratings the posts received.

user activity: Returns data about the user activity to date for a given branch.

interest by user: Measures the level of interest of each user in each post.

interest by community: Measures the level of community interest in each post in the map.

interest inequality: Measures to what extent community interest has been applied unequally to the children of each post, where 0 = fully equal and 1 = all discussion is on a single child. No data is given for nodes with zero or one children, since the inequality is zero, by definition, in those cases.

interest space dimensions: Defines the topics in the discussion, where a topic is a set of posts that all tend to be looked at together.
The tweight (a number) specifies how useful the topic is for clustering user activity. The pweights specify how important the corresponding posts are in each topic. To visualize this, you can display a version of the argument map that shows only the high-weighted posts for a given topic, with a font size proportional to the pweight.

interest space post coordinates: Gives how active each post is in each topic. People interested in one post tend also to be interested in other posts with similar coordinates.

interest space post clusters: Returns clusters of posts that tend to be looked at together. This metric uses singular value decomposition to find posts whose activity is correlated. For example, we may find that people who pay a lot of attention to the posts on stock markets also typically pay a lot of attention to the posts on banking. Posts with highly correlated activity can then be said to represent a “theme” or “topic” family; in this example, the theme would be something like “finance”. So this metric looks for topics and tells you all the posts in each topic. This can be used to give recommendations (e.g. look at the other posts in a cluster if you looked at one of them) as well as to reveal dependencies across issues in a map (i.e. if posts from different issue branches have correlated activity).

interest space post clustering: Measures how much clustering occurs in the posts, on a scale from 0 (no clustering) to 1 (clear distinct clusters).

interest space user coordinates: Gives the degree of interest each user has in each topic. Users with similar coordinates tend to have similar interests.
interest space user clusters: Identifies clusters of users who are interested in the same topics. This metric uses singular value decomposition. It looks for correlations in what users attend to, e.g. when people who look at the posts on the stock market almost always look at the posts on banking. The posts whose activity is highly correlated can then be viewed as representing a “theme”, e.g. a “financial” theme in this example. Say our map ends up having two major themes, posts about finances and posts about sports, and we classify people by their activity on each theme. We may find clusters, e.g. a group of people who are interested in sports but not finance, and another group that is the opposite. Each cluster is a “community”: in short, a community is a group of users that have similar interests.

interest space user clustering: Returns a value, from 0 to 1, that measures how much clustering occurs in the users. A high degree of clustering means that there are distinct groups (i.e. balkanization).

interest narrowing: Specifies how quickly (over time) attention focused on a subset of the children of a post.

support by user: Returns the rating each user gave each post, on a scale of 1 (low) to 5 (high).

support by community: Measures the level of community support for the posts in a map, factoring in support for the posts below them, on a scale of 1 (low) to 5 (high):
- an issue's importance is the average of its own ratings and those of its sub-issues;
- an idea's promise is the average of its own ratings and those of its sub-ideas and arguments;
- an argument's convincingness is the average of its own ratings and those of its arguments.

support inequality: Measures to what extent community support is unequal for the ideas under an issue, where 0 = fully equal and 1 = all discussion is on a single child. NB: data is only given for issues with ideas attached. Calculated using a Gini coefficient.
support space dimensions: Describes the biases in the discussion, where a bias is a set of posts that tend to be liked together. The bweight (a number) specifies how useful the bias is for clustering user ratings. The pweights specify the importance of the most important posts in each bias. If the branch root is not given, this metric looks at the entire discussion. To visualize this, you can display a version of the argument map that shows only the high-weighted posts for a given bias, with a font size proportional to the pweight.

support space post coordinates: Gives the support space coordinates for all the ideas in the given branch. A bias is a set of posts whose ratings tend to be correlated. For example, if people who like solar power also tend to like wind power, this would represent a “bias” for both forms of renewable energy. People who like one post tend also to like other posts with similar support space coordinates. If the branch root is not given, this metric looks at the entire discussion.

support space post clusters: Returns clusters of posts that tend to be liked together.

support space post clustering: Returns a value, from 0 to 1, that measures how much clustering occurs in the support space for posts.

support space user coordinates: Gives a user's position in the support space. Users with similar coordinates have similar biases. A highly balkanized or polarized discussion will result in distinct clusters of users in the support space.

support space user clusters: Returns clusters of users with similar biases.

support space user clustering: Returns a value, from 0 to 1, that measures how much clustering occurs among users based on their biases.
map topology: Gives topology statistics for the posts in the map. If “root” is not specified, it looks at the entire map. The fields include:

- type: issue | idea | pro | con
- numposts: number of posts in the branch under the post
- numissues: number of issues in the branch
- numideas: number of ideas in the branch
- numpros: number of pro arguments in the branch
- numcons: number of con arguments in the branch
- mindepth: minimum depth of posts in the branch
- maxdepth: maximum depth of posts in the branch
- avgdepth: average depth of posts in the branch
- stdevdepth: standard deviation of the depth of posts in the branch
- minbreadth: minimum breadth (number of children) in the branch (excludes leaves, whose breadth is 0)
- maxbreadth: maximum breadth (number of children) in the branch (excludes leaves)
- avgbreadth: average breadth (number of children) in the branch (excludes leaves)
- stdevbreadth: standard deviation of breadth (number of children) in the branch (excludes leaves)

Only for issue or idea posts:
- minsdepth: minimum depth of “skeleton” (issue and idea) posts in the branch
- maxsdepth: maximum depth of skeleton posts in the branch
- avgsdepth: average depth of skeleton posts in the branch
- stdevsdepth: standard deviation of the depth of skeleton posts in the branch
- minsbreadth: minimum breadth of skeleton posts in the branch (excludes leaves)
- maxsbreadth: maximum breadth of skeleton posts in the branch (excludes leaves)
- avgsbreadth: average breadth of skeleton posts in the branch (excludes leaves)
- stdevsbreadth: standard deviation of breadth of skeleton posts in the branch (excludes leaves)

Only for idea posts:
- minadepth: minimum depth of the argument (pro/con) tree under the idea
- maxadepth: maximum depth of the argument tree under the idea
- avgadepth: average depth of the argument tree under the idea
- stdevadepth: standard deviation of the depth of the argument tree under the idea
- minabreadth: minimum breadth of the argument tree under the idea (excludes leaves)
- maxabreadth: maximum breadth of the argument tree under the idea (excludes leaves)
- avgabreadth: average breadth of the argument tree under the idea (excludes leaves)
- stdevabreadth: standard deviation of breadth of the argument tree under the idea (excludes leaves)
- nsupports: number of arguments that support the idea (e.g. pros, cons of cons, …)
- nauthorsupports: number of supporting arguments created by the author of the idea
- preachiness: nauthorsupports/nsupports, i.e. the fraction of supporting arguments that come from the idea's author

expertise: Specifies the average rating for the posts a user has contributed in a given topic. 0 means the user didn't contribute anything to that topic.

controversy: Specifies how controversial the discussion of an issue is, ranging from 0 (low controversy) to 1 (highly controversial).

maturity: Specifies how mature the discussion of an issue is. This can be estimated easily, in deliberation maps, by gathering statistics on the topology of the branch (e.g. in terms of breadth and depth of issues, ideas and arguments) for that topic.
An issue that includes few ideas and arguments, for example, probably represents a relatively immature discussion, while an issue with a rich tree of ideas and arguments represents a more mature one. [Example figures omitted.]

agreement: Returns a table, for each issue, where the rows and columns represent users, and the cells represent how much each user pair agrees about the best ideas for the issue. This could be visualized as a force-directed graph where nodes are users, agree and disagree links have different colors, and users that agree with each other are placed close to each other and far from those they disagree with. Balkanization and polarization would show up as strong clustering in the graph.

support consistency: Measures to what extent an idea's average rating is consistent with the ratings for the underlying arguments. This can be done for the ratings from an individual user, for a group of users, or for all users.

social graph: Returns a graph showing which users have interacted (i.e. have rated, commented on, responded to, or edited posts created by the other user). “linktype” currently just gives the number of links between the two users.

groupthink: Returns an estimate of the level of groupthink in the deliberation for a given issue. Groupthink occurs when a crowd converges prematurely on a given (often the first) solution idea, without giving adequate attention to competing ideas. This can be detected by how quickly the Gini coefficient (which measures inequality) increases for the ideas addressing an issue in the deliberation map. Consider the following examples.
In the first example, the participants' activity related to each competing idea is rendered, at each time period, as the size of the idea icon. [Figure omitted.] In this case, the Gini coefficient rises only slowly over time, indicating that the community considered a range of options for a while. In the second example, one idea almost immediately becomes the sole focus of the community's interest, resulting in a Gini coefficient that rises very rapidly. [Figure omitted.]

1.2 Alerts

The role of alerts is to inform participants of opportunities that maximize their ability to contribute positively to a deliberation. An alert can point users to map posts they should view, as well as to other users they should be aware of. Alerts do so using information about the deliberation map as well as user roles and activity, thereby building a model of user interests and expertise as well as of the deliberation map's strengths and gaps. A matchmaking procedure then points users to the parts of the deliberation where they can do the most good.

Each alert is a kind of “daemon” that continually scans the deliberation map for instances that trigger it. Every alert has a “strength” value, representing how strongly it was triggered. Alert triggers are specified using a pattern query language (Klein and Bernstein, 2004) developed by a CATALYST team member.
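As an illustration of the kind of local pattern such a trigger scans for, the following is a hedged Python sketch of a "rated without viewing the underlying argument" check. This is not the actual pattern query language; the data structures are hypothetical stand-ins for the deliberation map.

```python
# Illustrative sketch (not the CATALYST pattern query language) of one
# "local" alert trigger: a user rated a post without viewing an argument
# under it. The (user, post) sets and the child map are hypothetical.

def rating_ignored_argument(ratings, views, arguments_under):
    """Yield (user, post, argument) triples where a user rated a post
    but never viewed one of its underlying arguments."""
    for (user, post) in ratings:
        for arg in arguments_under.get(post, []):
            if (user, arg) not in views:
                yield (user, post, arg)

ratings = {("alice", "idea-1")}            # alice rated idea-1 ...
views = {("alice", "idea-1")}              # ... and viewed it, but not its con
arguments_under = {"idea-1": ["con-1"]}

print(list(rating_ignored_argument(ratings, views, arguments_under)))
```

Each triple found this way would become one alert instance, with a strength reflecting, for example, how highly the ignored argument was rated by others.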
In addition to looking for such "local" data patterns as "user X rated post Y without viewing underlying argument Z", alerts can also look for extreme values in the summary metrics described in the previous section. Posts in a deliberation map may trigger multiple alerts. The deliberation system can therefore highlight, for each user, the posts that have multiple highly-weighted alerts associated with them, e.g. using the following kind of user interface:

Figure 2 - UI for alerts in tree view

The following table summarizes the alerts implemented in the current version of the CATALYST analytics server. The table gives the name of each alert, indicates the information flow (i.e. what kind of entity is proposed to what kind of user¹), and describes how the alert works. The currently implemented alerts include:

Table 2. List of alerts

unseen by me (post -> author): The author has not yet viewed the post.

response to me (post -> author): The post is a response by someone else to a post created by the author.

unrated by me (post -> author): The author has not yet rated this post.

lurking user (user -> moderator): The user has not edited or created any posts.

¹ We distinguish three types of user role: author (responsible for contributing and rating the issues, ideas and arguments that make up a deliberation map), moderator (responsible for ensuring a deliberation engagement achieves useful results), and customer (responsible for defining the aims of the deliberation as well as for using its results).
ignored post (post -> author): The post has not been viewed by anyone but its original author.

mature issue (post -> customer): The issue has >= 3 ideas with at least one argument each. The "strength" of the alert is a function of the total size of the deliberation map branch under that issue. This alert can help customers find the parts of the deliberation map that are mature and thus ready to support a decision process.

immature issue (post -> moderator, author): The inverse of “mature issue”; i.e. the post is an issue which does NOT have at least three ideas with at least one argument each.

well evaluated idea (post -> moderator, customer): An idea post has several pros and cons, including some rebuttals.

poorly evaluated idea (post -> author): The inverse of “well evaluated idea”; an idea post has few pros and cons, and no rebuttals.

inactive user (post -> moderator): The user has been inactive in the deliberation.

interesting to me (post -> author): The post should interest a user, because it is close, in the deliberation map, to posts the author attended to in the past.

interesting to people like me (post -> author, customer): This post was viewed by people whose interests are similar to the user's. This alert uses SVD (see section 5) to produce a low-dimensional rendition of which posts users attend to, and then uses cosine similarity to identify posts that are close in that space to posts that the user has already seen. The weight for this trigger is an inverse function of the cosine distance between a proposed post and posts that the user has already seen.

supported by people like me (post -> author, customer): This post was highly rated by people whose opinions are similar to the user's.
This operates the same as the "interesting to people like me" metric, only it is applied to rating scores rather than activity scores.
hot post (post -> author): This post is in the top quartile of the most active posts for the last 24 hours.
orphaned idea (post -> author): This idea post is receiving substantially less activity than its siblings.
winning idea (post -> author): This idea is receiving the predominance of community support for a given issue. NB: if all the ideas have a high support score, there is no clear winner, so this alert will not fire.
contentious issue (post -> author): This is an issue with ideas that the community ratings are strongly divided over.
controversial idea (post -> author): The community has widely divergent opinions (as reflected by their ratings) of whether an idea is a good one or not.
inconsistent support (post -> author): This is an idea where the user's support for the idea and for its underlying arguments are inconsistent: propagating the support values from the arguments to the idea produces an inferred rating that is quite different from the rating the user gave the post. This alert can be symptomatic either of an ill-considered rating, or of the user evaluating an idea based on arguments that he/she has not rated or added to the map yet.
person with interests like mine (user -> author, customer): Identifies a user who has similar interests to me, based on activity patterns. The strength score reflects closeness of match.
person who agrees with me (user -> author, customer): Identifies a user who has similar opinions to mine, based on rating patterns. The strength score reflects closeness of match.
user gone inactive (user -> moderator): Identifies a user who was initially active, but has been inactive for at least a duration specified in the analytics request.
rating ignored argument (post -> author): Identifies a relevant argument that the user did not view before rating a post. Clearly, a careful rating for a post should take into account the arguments for and against it.
rating ignored competitor (post -> author): Identifies a competing idea (i.e. one that responds to the same issue) that the user did not view before rating a post. This can be valuable because the scale on which we rate ideas will often be influenced by the overall strength of the competing ideas.
unseen response (post -> author): Identifies a response authored by someone else to a post I authored.
unseen competitor (post -> author): Identifies an idea authored by someone else that competes (by virtue of responding to the same issue) with an idea I authored.
user ignored competitors (user -> author, moderator): Identifies a user who ignored competitors to their ideas.
user ignored arguments (user -> author, moderator): Identifies a user who ignored underlying arguments when rating posts.
user ignored responses (user -> author, moderator): Identifies a user who ignored responses authored by other people to their posts.

In addition, the alert definition language makes it easy to define alerts that look for extreme values in summary metrics, e.g. an issue where the balkanization metric value is high.

1.3 Analytics Identification Methodology

The analytics we implemented in the CATALYST server were identified using a systematic methodology, known as process-goal-exception analysis, that was developed by a member of the CATALYST team (Klein, 2012). The key idea is that analytics can be viewed as the tools we use to identify how well a process is achieving (or failing to achieve) its goals.
In this method, we thus first define a normative model of the target process and what goals would ideally be achieved for each process step. For each goal, we then identify analytics for measuring its success, as well as how that goal can be violated (the exceptions). For each exception, finally, we identify analytics for assessing when an exception is taking place. Each metric can be annotated with the kind of user that would be interested in that kind of information.

Identify normative process model: The first step is to identify a model of how the target process should work. The core process supported by the CATALYST ecology is "social innovation", i.e. crowd-based solution identification. Our model of this process includes the following subtasks:

Social Innovation Process
- Identify problems to solve
- Identify possible solutions for these problems
- Evaluate the candidate solutions
- Select the best solution(s) from amongst the candidates

Identify goals: The next step is to identify what each task in the process should ideally achieve: its goals. Our current model of the social innovation process includes the following goals:

Figure 3 - Selection of goals for alerts

A social innovation process should, for example, use a good process (i.e. where the right people contribute actively and effectively to performing the most critical tasks) to achieve good results (i.e. complete, high-quality, well-organized content) while also strengthening and learning about the members of the user community.

Identify exceptions: For each goal, we then identify how it can be violated (the exceptions).
The goal of having the right participants involved, for example, can have the following exceptions:

Figure 4 - Selection of exceptions for alerts

We can have too few authors, for example, or inadequate diversity in the author population.

Identify analytics: For each exception, finally, we identify analytics that can detect when the exception is taking place.

Subsequent to this analysis, we developed a purpose-built online system to collect feedback from the CATALYST technical and evaluation partners to prioritize which analytics would be implemented:

Figure 5 - Alert selection system

The results of this prioritization exercise guided subsequent development of the analytics server's metrics and alerts library.

1.4 Visual analytics based on attention metrics

The CI analytics visualisations developed during CATALYST provide visual representations of conversations. Each visualisation has an analytical focus, and together the visualisations provide the means to analyse specific facets of a conversation. The CI Dashboard (https://cidashboard.net) contains all visualisations developed within the CATALYST project. Deliverable D3.9 (Liddo & Bachler, 2014) described the dashboard architecture and the first set of visualisations. Deliverable D4.6 (Ullmann, Liddo, & Bachler, 2014) reports on several additional visualisations and their evaluation.
The dashboard has been extended with several visualisations since then. The CI Dashboard makes use of the CI metrics server. Deliverable D3.5 (Klein, 2014) describes the metrics provided by the server; these metrics have since been extended, as described above. The CI Dashboard provides visualisations for these metrics. The following sections highlight several visualisations of the CI Dashboard that make use of CI metrics delivered by the metrics server. Each section describes a particular metric and the corresponding visualisation, and outlines the design and rationale of each.

1.4.1 Activity bias visualisation

The activity bias visualisation, introduced in deliverable D4.6 (Ullmann, Liddo, & Bachler, 2014), shows contributions plotted on an xy-plot (scatter plot). The visualisation aims to make visible the existence of biased activity patterns within a conversation. A bias may exist if a conversation exhibits several distinct activity patterns. Figure 6 shows the activity bias visualisation. Each dot represents one contribution, or several contributions in case they lie on the same coordinates.

Figure 6 - Attention bias visualisation

The position of a contribution within the plot is determined by a metric of the UZH metrics server (see deliverable D3.5 (Klein, 2014) for more information about the metrics server). The underlying metric of this visualisation is based on singular value decomposition (SVD) (Golub & Kahan, 1965). Singular value decomposition is, among other things, a popular technique for dimension reduction. For this visualisation the metrics server calculates the singular value decomposition by taking as input contributions and their activity (e.g. viewing, editing, updating), and returns coordinates for each contribution. The visualisation shows the first two dimensions of the n-dimensional space returned by the metrics server, which are also the most important ones.
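The kind of computation behind this visualisation can be illustrated with a minimal sketch (our own illustration with invented data, not the metrics server's actual code): an activity matrix with one row per contribution and one column per user is decomposed with SVD, and each contribution is projected onto the first two singular dimensions for plotting.

```python
import numpy as np

# Hypothetical activity matrix: rows = contributions, columns = users,
# entries = counts of activity events (views, edits, updates).
# Contributions 0-1 attract one group of users, 2-3 another.
activity = np.array([
    [5, 4, 0, 0],
    [4, 5, 1, 0],
    [0, 1, 5, 4],
    [0, 0, 4, 5],
], dtype=float)

# Centre the data, then decompose; U scaled by the singular values
# gives each contribution's coordinates in the latent space.
centred = activity - activity.mean(axis=0)
U, s, Vt = np.linalg.svd(centred, full_matrices=False)
coords = U * s  # one row of coordinates per contribution

# Keep only the first two (most important) dimensions for the scatter plot.
xy = coords[:, :2]
print(xy.shape)  # (4, 2)
```

With this toy input, the first two dimensions already separate the two activity patterns: contributions 0 and 1 land close together in the plot, far from contributions 2 and 3, which is exactly the kind of clustering the activity bias visualisation is meant to reveal.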
A scatter plot is a common way to represent the results of an SVD. Clusters or groupings shown in the visualisation represent contributions that are similar according to the singular value decomposition: their coordinates are close to each other. A visualisation showing several clusters represents a situation where contributions are similar within a particular cluster but different from those in other clusters. Each cluster can be seen as a representation of a distinct pattern. If the visualisation shows more than one cluster, then the contribution data contain several different patterns. This may indicate that the contributions are biased regarding their activity pattern, meaning that there exists a set of contribution clusters that each have a different activity pattern. The activity bias visualisation aims at helping to determine whether there are different patterns in the underlying data. SVD, however, does not specify what a pattern is. The main function of this visualisation is to provide the means to reveal the existence of patterns in conversations. Being aware that contributions have different activity patterns is important, as it may lead to further exploration of the data in order to determine the reasons for those differences.

1.4.2 Rating bias visualisation

Similar to the activity bias visualisation, the rating bias visualisation is based on SVD. The metric takes into account all contributions and their ratings, instead of contribution activity as for the activity bias visualisation.
As the activity and rating bias visualisations are similar, the description of the underlying SVD can be found in the section about the activity bias visualisation. The visualisation represents the first two dimensions returned by the metric in a scatter plot. Figure 7 shows the rating bias visualisation.

Figure 7 - Rating bias visualisation

Each dot represents a contribution, and the coordinates of each dot are determined by the metric. A cluster indicates that the contributions within it had a similar rating pattern. If the visualisation shows several clusters, then this is an indicator that not all contributions were rated in a similar way: the rating behaviour of the users might have been biased, as they rated contributions with different rating patterns.

1.4.3 Attention map visualisation

The attention map visualisation shows how equally or unequally attention is given to the conversation threads of a conversation topic. It shows an entire conversation as nested circles of contributions. The colouring of the circles indicates the equality or inequality of attention to a contribution, determined by analysing the sub-parts of which the contribution is made. The attention map visualisation uses the same visual principles as the 'conversation nesting visualisation' described in deliverable D4.6 (Ullmann, Liddo, & Bachler, 2014). A conversation is understood as a hierarchy of contribution types: for example, a group consists of issues, an issue consists of ideas, and an idea can have supporting or counter arguments. A circle represents one of these contribution types; circles inside it represent the sub-types of that contribution. Figure 8 shows an example of the attention map visualisation.
Figure 8 - Attention map visualisation

The colour of the circles is determined by a metric of the UZH metrics server, which calculates the equality or inequality of a contribution based on its sub-contributions. The colour spectrum ranges from blue to red. Circles in the blue spectrum represent contributions that are generally equally supported, while circles in the red spectrum represent contributions that are unequally supported. The metric made available by the metrics server is based on the Gini coefficient: the server returns for each contribution a value between 0 and 1 (from equal to unequal), which is mapped to the colour range of the circles. As the visualisation shows the whole conversation in one picture, it makes it easy to spot areas that are more or less equally supported. Interaction with the visualisation allows zooming into each circle for close inspection of the contributions within a contribution.

1.4.4 Community interest network visualisation

The community interest network visualisation shows the interest of community participants in the form of a network visualisation. It shows the whole conversation as a network graph. Each node represents a contribution, and a link between two nodes expresses that they are related to each other. For example, one node can be an issue and another an idea; if the idea is part of the issue, then a link is drawn between the two nodes. Figure 9 shows an example of the community interest network visualisation.
Figure 9 - Community interest visualisation

The interest of the community is depicted by the colouring and the size of the nodes. The redder and larger the node, the more interest the contribution it represents has received; the bluer and smaller the node, the less interest the contribution has received.

1.4.5 Sub-communities network visualisation

Similar to the community interest network visualisation, the sub-communities network visualisation shows a whole conversation in the form of a network. Nodes are represented as coloured shapes. Figure 10 shows an example of the sub-communities network visualisation.

Figure 10 - Sub-communities network visualisation

Shapes that are the same represent similar contributions. The similarity is determined by a metric of the metrics server. Given a conversation, the metric returns clusters of contributions that tend to be looked at together. This metric uses singular value decomposition to find contributions whose activity is correlated. So, for example, we may find that people who pay a lot of attention to posts on stock markets also typically pay a lot of attention to posts on banking. Posts with highly correlated activity can then be said to represent a 'theme' or 'topic' family. In the example above, the theme family would be something like 'finance'.
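As a rough illustration of "correlated activity" (our own sketch with invented view counts, not the actual server metric), the activity profiles of two posts can be compared with cosine similarity; posts whose profiles are highly similar would end up in the same theme family:

```python
import numpy as np

# Hypothetical view counts: rows = posts, columns = users.
# Posts 0 and 1 ("stock markets", "banking") attract the same users,
# suggesting they belong to one attention-based theme ("finance").
views = np.array([
    [9, 8, 0, 1],   # post 0: stock markets
    [8, 9, 1, 0],   # post 1: banking
    [0, 1, 9, 8],   # post 2: gardening
], dtype=float)

def cosine(a, b):
    """Cosine similarity between two activity profiles."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Pairwise similarity of activity profiles: highly correlated posts
# can be grouped into one topic family.
sim_01 = cosine(views[0], views[1])
sim_02 = cosine(views[0], views[2])
print(round(sim_01, 2), round(sim_02, 2))  # prints: 0.99 0.11
```

In the real metric the profiles are first reduced with SVD, which makes the comparison robust to noise and sparse data; the grouping step then collects posts whose pairwise similarity exceeds a threshold into clusters.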
So, this metric looks for topics, and tells you all the posts in each topic. This approach can also be used to give recommendations (e.g. look at the other contributions in a cluster if you looked at one of its posts) as well as to reveal dependencies across issues in a map (i.e. if contributions from different issue branches have correlated activity). Note that these attention-based topic families are distinct from the semantic clustering described later.

1.5 Alerts and user feedback interface in LiteMap and DebateHub

The full list of alerts implemented from the analytics in the CATALYST project can be found on the CI Dashboard website at https://cidashboard.net/#alert (see figure below). The CI Dashboard interface consists of three main components:

1. A top part where the data is provided to the CI Dashboard service (basically URLs of CIF-formatted data of the debate to which users want to apply the alerts);
2. A central right part, where users can select the type of alerts they want to be calculated by the CI Dashboard service;
3. A central left part where an example/preview of the alerts widget is shown, together with a list of information about the dependency of each alert on specific data entries to be provided.

Figure 11 - CI Dashboard alerts interface

These alerts have been added to DebateHub and LiteMap in different clusters and with two different interfaces.

1.5.1 Alerts in LiteMap

Alerts can be classified in several ways besides the thematic distinctions above.
In LiteMap we distinguish between Personalized Alerts, which are tailored to the logged-in user and take into consideration their contributions and activities, and General Alerts, which are visible to all users at any point and do not require them to be logged in. General Alerts are based on the overall debate data (contributions, users' activities, debate structure etc.). The General Alerts are the following (taken from the list of alerts above): ignored post; mature issue; immature issue; hot post; orphaned idea; emerging winner; contentious issue; controversial idea. The Personalized Alerts are extra alerts that logged-in users can see, built on the logged-in user's personal data. They are: unseen by me; response to my post; interesting to me.

1.5.1.1 LiteMap's alert interface

In LiteMap, alerts are provided in the alert sidebar, which can be activated or deactivated on demand and is displayed on the right side of the argument map. A "thumb up" icon was added to the left of each alert for users to provide quick feedback on the usefulness of the alerts.

Figure 12 - LiteMap alerts interface

1.5.1.2 User feedback interface

By rolling over the thumb up icon, a pop-up message appears which says "Click if you found this alert useful". If the user decides to click, a fading message is displayed which says: "Thanks for your feedback".

1.5.2 Alerts in DebateHub

In DebateHub there are no General Alerts, which is to say that no alerts are displayed to users who are not logged in. DebateHub has special moderator features available only to debates' administrators.
Therefore, for logged-in users, alerts are divided into Moderator Alerts and Personalized Alerts. Moderator Alerts are built on general debate data and can help moderators to better manage the debate: for example by pointing users' attention to immature or orphaned ideas, attracting lurkers to contribute, managing contentious issues etc. Moderators can act upon the alerts' suggestions by leaving "moderator's comments" or by merging or splitting contributions. The Moderator Alerts are the following: lurking user; ignored post; mature issue; immature issue; hot post; orphaned idea; emerging winner; contentious issue; controversial idea. Personalized Alerts are alerts that logged-in users can see, built on their personal data. They show user-centric data: where their interests lie, attracting their attention to things they may not have seen or may be interested to see, and pointing out which contributions have been made by the community in response to their own. They are the following: unseen by me; response to my post; unrated by me; interesting to me.

Figure 13 - DebateHub alerts interface

In DebateHub, Moderator Alerts are displayed at the top right, just below the list of moderators' pictures. Additionally, logged-in users (both debate participants and moderators) can see their alerts displayed at the bottom of the right sidebar under the My Alerts bar (see image below). When clicking on an alert, the page scrolls down to the contribution the alert is pointing at, and the contribution is highlighted in yellow for a couple of seconds before the highlight fades away.
1.6 Semantic clusters

Whereas LiteMap and DebateHub focus on structured conversation, Assembl focuses on progressive structuration through harvesting. For that reason, deliberation analytics using deliberation structure are less relevant, and the focus was put instead on metrics useful to harvesters. I4P and UZH developed an analytic that detects clusters of related posts, based on latent semantic indexing2, with basic linguistic pre-treatment3. We first experimented with the DBSCAN clustering algorithm4 on the resulting semantic data, but later replaced it with OPTICS5 clustering (Ankerst et al., 1999). The development of these analytics was done as part of the community test with the 7th Sustainable Summer School. The goal was to give harvesters access to the semantic neighbourhood of a given message, allowing them to understand which messages of the discussion covered similar topics, and to use cluster detection to propose grouping related messages under a topic.

2. Community testing of visual analytics

Several of the visualisations that can be found on the CI Dashboard (http://cidashboard.net) were evaluated in deliverable D4.6 (Ullmann, Liddo, & Bachler, 2014), either in the context of a field study examining the participants of the 'Design community 2014' or of a lab study. In total, 10 visualisations were evaluated regarding their usefulness and usability, as well as the users' task performance. The detailed description of each visualisation, the evaluation design, and the evaluation results can be found in D4.6. The following paragraphs summarise some of the experiences made during the evaluation.
The evaluation presented here is based on two studies. The first study was conducted as a field experiment in the wild, while the second took place in a usability lab. The participants of the field experiment were members of the DebateHub6 'Design community 2014' group7. This group used DebateHub to discuss group issues, to come up with ideas, and to weigh the ideas with supporting or counter arguments. The group members were invited to participate in the evaluation via email; participation was voluntary. The lab experiment participants voluntarily agreed to take part in the evaluation study, following advertisements for the study distributed through several channels of the Open University. The general setup for both studies was the same, while the concrete implementation differed in order to adapt to the context of each study. The building blocks of the general setup consisted of a background questionnaire, an introduction to the scenario, a phase where the participants explored the visualisation in their own time, a task, and a questionnaire to evaluate the usability of the visualisation. The participants of the field experiment received an email with links to a questionnaire8 for each visualisation. The questionnaire guided the participants through all building blocks of the evaluation. The facilitator of the study guided the lab participants: they were asked to fill out the background questionnaire, were verbally introduced to the scenario, had time to explore the visualisation on their own, were asked the same task questions, and had to fill out the same usability questionnaire.

2 Using gensim, https://radimrehurek.com/gensim/
3 Multi-lingual stemming with snowball: http://snowball.tartarus.org/; phrase detection and Tf-idf normalization with gensim.
4 Using the implementation from scikit-learn (Pedregosa et al., 2011)
5 Our implementation, based on work by Brian Clowers: http://www.chemometria.us.edu.pl/download/optics.py
6 http://debatehub.net/
7 Data available at: http://debatehub.net/group.php?groupid=9811386440502935001409871956
8 Created with the maQ-online questionnaire generator. Developed by Ullmann, T. D. (2004). maQ-Fragebogengenerator. Make a Questionnaire. Available online at: http://maq-online.de

The scenario consisted of asking the participants to imagine that they were tasked with making sense of large online conversations. These conversations are large enough that manual inspection of the contributions is neither reliable nor effective; instead, participants should use the analytics visualisations to make sense of the conversation. Each visualisation came with a small task: participants had to answer three to four questions per visualisation, and could find all the answers by using the visualisation. For example, two of the four questions for the user activity analysis visualisation were: 'How many times did the most active user contribute to the debate?' and 'How many counter arguments have been made in the whole debate?' The data for the visualisations were taken from the CIF file generated from the conversation of the 'Design community 2014' group. The data were the same for both groups.

2.1 Evaluation

The evaluation presents the background information of the participants, their task performance, and the usability scores for each visualisation. Both the field experiment participants and the lab participants could stop the task at any time.
In the case of the field experiment, the participants were not required to fill out all questionnaires. During the lab study, on average two visualisations were evaluated per participant.

2.1.1 Background information

Field experiment: On average 7.4, mostly female, participants filled out the questionnaire for each visualisation. Most of them had visited DebateHub between two and ten times. Most participants made one contribution, and a few made more than 10 contributions. Analytics dashboards were mostly new to them; they had slightly more familiarity with visualisations for exploring data.

Lab experiment: 12 participants evaluated the visualisations (5 female and 7 male). Each visualisation was rated by five of them. Their familiarity with analytics dashboards ranged from novice to advanced; they were mostly novices with visualisations for exploring data, but all levels of familiarity (from novice to expert) were present.

2.1.2 Task performance

Table 3 shows the performance of the participants on the tasks for each visualisation. It shows the number of questions (Questions) and, for each group, the number of participants (N) and the percentage of correct answers. For example, the field experiment group answered the quick overview questions correctly in 90% of the cases: out of 40 answers, 4 were incorrect. The lowest share of correct answers was achieved by the lab group for the debate network visualisation: 10 out of 15 answers were correct.

Table 3. Task performance

Visualisation           | Questions | Field N | Field % | Lab N | Lab %
Quick overview          | 4         | 10      | 90      | 5     | 100
Debate network vis.     | 3         | 6       | 72      | 5     | 67
Conversation nesting    | 3         | 7       | 100     | 5     | 87
Activity analysis       | 4         | 4       | 75      | 5     | 80
User activity analysis  | 4         | 6       | 75      | 5     | 90

2.1.3 Usability

The usability of the visualisations was measured with the SUS usability questionnaire (Bangor et al., 2008, 2009; Brooke, 2013). Table 4 shows the results of the usability questionnaire.
The table shows the calculated SUS usability indices and the average of the ratings for the question 'Overall, I would rate the user-friendliness of this visualisation as [worst, awful, poor, ok, good, excellent, best]'. The usability of all visualisations was rated between ok and excellent. The participants of the field experiment rated most visualisations as good, with the exception of the quick overview visualisation, which was rated as ok. The lab group was similar, rating all but the debate network visualisation as good.

Table 4. Usability

Visualisation           | Field N | Field SUS | Field Overall | Lab N | Lab SUS | Lab Overall
Quick overview          | 9       | 57.50     | 4.22          | 5     | 86.0    | 5.2
Debate network vis.     | 6       | 67.08     | 4.50          | 5     | 68.0    | 4.4
Conversation nesting    | 6       | 81.67     | 5.83          | 5     | 78.5    | 5.4
Activity analysis       | 4       | 53.75     | 4.50          | 5     | 79.5    | 5.4
User activity analysis  | 6       | 67.08     | 4.67          | 5     | 71.0    | 5.2

2.1.4 Discussion

Overall, the participants performed well on the task independently of the two testing conditions (in the wild and in the lab). This is an encouraging finding, considering that most of the participants were novices regarding analytics dashboards and visualisations for exploring data. Task performance differed: both groups performed very well on the task of the quick overview visualisation, and both groups made mistakes in one third or less of the questions of the debate network visualisation. For all other visualisations, 75% or more of the questions were answered correctly. Most visualisations were rated as having good usability. Overall, the lab participants rated the usability higher than the field experiment group.
Differences were found for the quick overview and activity analysis visualisations, which the lab group found more usable; the other three visualisations were rated at about the same level. Participants had more problems answering the tasks for the debate network visualisation. Its low usability ratings, relative to the others, may indicate usability issues that need to be followed up.

Unfortunately, this testing was done in an early phase of the project, and we did not get to test the more complex visualizations and metrics. Some of the visualizations, such as activity bias and rating bias, depend on somewhat complex calculations, and we have found it difficult to convey to users what the analytics mean in terms of the discussion. Without formal testing, users who saw those visualizations found them difficult to understand. Attempts to explain the underlying principles were not very successful; on the other hand, users were also perplexed by a naked score extracted from the metric without the context of where the score came from. We now believe that the best solution is to translate the metric's signal back into the basic entities of the deliberation, and we are working on some visualizations based on this (much more demanding) approach, one example of which is presented below.

3. Community testing of alerts

By the time the alerts were ready for testing, we had two community tests scheduled: Loomio and the Seventh Sustainable Summer School.

3.1 Loomio test of harvester alerts

The Loomio testing was designed to test the use of alerts, built from deliberation analytics, in the context of the LiteMap tool. A series of meetings were first organized to figure out what type of discussion topics and testing structure would fit both research and community interests. Research questions: Does the use of LiteMap improve and stimulate existing online discussions?
Does the alert functionality of LiteMap stimulate more/better mapping activity?

3.1.1 Experimental design

In the end we agreed with Loomio on the following A/B testing structure: 34 LiteMap participants were split into two groups, A and B, and two separate LiteMap instances were created for them, with and without the alert functionality respectively. A Loomio group was created where participants could ask questions about the project and give feedback on LiteMap as the study progressed. Instructions and an outline of the test were provided to this group. Both groups were also emailed the instructions in the form of a link to the video demo How to Harvest Online Content with LiteMap9, and were invited to create a LiteMap account and join the Loomio group. Group A was invited to harvest content on the theme of "Business models for open-source social impact ventures", and provided with a link to the Long-term financial sustainability of Loomio10 discussion as an example of content, along with other resources elsewhere (forthcoming). Group B was invited to harvest content on the theme of "Role of online democracy tools in government", with a link to the Political use of Loomio11 discussion as an example of content and other resources elsewhere. After that, there was to be a two-week phase where comments would be posted in each of these discussions, informing participants about the mapping of their discussion (and the general topic) and inviting existing discussants to participate in the harvesting by joining the study's Loomio group and following the instructions above. Then the harvesting phase would begin.
When the map reached a certain size threshold, or a specified period of time (two weeks) had passed, a link to the map would be posted in each Loomio discussion, and the discussants would be asked to rate its usefulness. Participants would be encouraged to provide feedback on the tool and study as it unfolded, allowing us to make minor adjustments and clarifications as necessary. After the mapping activity ended, or a specified time period had passed, we would follow up with harvesters and discussants through surveys about their experience. From the comparison of performance and user feedback between group A (with alerts) and group B (without alerts), we would be able to establish whether the presence of alerts made a difference to the harvesting work. At the same time, the answers to the LiteMap survey sent to Loomio discussion participants would provide insights on whether, and to what extent, LiteMap had improved Loomio's discussions.

3.1.2 Testbed outcomes

Unfortunately the test failed to attract participants: only 4 of the 34 people recruited by Loomio participated in the discussion in the two groups, and only two people created a LiteMap account, contributing only a couple of ideas to the maps. While exploring the reasons for this failure to engage the community, we learned that Loomio is a platform used worldwide but does not host large-scale debates. Loomio's moderator Simon Tegg explained to us that, even in the most successful debates, there is usually a median of 5-9 participants per conversation. This was not what we had envisioned, nor what was intended in Loomio's open call submission.
As a lesson learned for the future, we now know that when recruiting organizations which bring online discussion tools for testing, we should not simply ask community partners 'How many users do you have?' or 'What is the geographical scale of your outreach?'

9 https://www.youtube.com/watch?v=uBzVLUDIxE8
10 https://www.loomio.org/d/mUOKA6vc/long-term-financial-sustainability-for-loomio
11 https://www.loomio.org/d/lGSO7guf?page=1

Many communities in fact have very wide mailing lists reaching people all across the globe, but are unable to mobilize large numbers of people at once in one discussion. Basically, the large-scale user list is simply constituted of widely fragmented small groups of people, which do not represent a community and cannot support large-scale discussions. That is why the question to ask when recruiting should rather be: 'What is the median number of participants you are able to engage in a single debate on your platform?' In CATALYST we have found that few community partners can meet this requirement.

Another, more minor problem was worldwide participation. Website access was slow from New Zealand and Australia. Several options were tried to improve the speed, but the possible solutions did not depend on LiteMap improvements, so in the end we had to exclude participants from that area of the world. This left us with 34 people out of the initial 47 recruited from Loomio. In any case this caused only the loss of a few participants; the main issue remained that the 34 recruited people did not even sign up to the website.
We therefore explored a workaround, which allowed us to carry on the interoperability testing of Loomio with three other CATALYST tools in the following user scenario:

· We pick an existing Loomio discussion which has received enough participation.
· The CATALYST tech team uses Assembl's Loomio feed import to get Loomio's posts into Assembl, and then uses Assembl to mark up and structure key ideas out of the existing Loomio discussion in the form of an ideas thread.
· One harvester works to IBISify the discussion, that is to say, she creates an argument map out of the Assembl ideas thread.
· Loomio embeds the argument map and other relevant visual analytics of the discussion from the CI Dashboard in the Loomio discussion page.
· Finally, we ask the Loomio community for feedback on the discussion's visualisations and analytics with a very small survey sent by email. (The survey should aim to answer questions such as: do they find these visualisations useful, understandable, engaging, etc.?)

Loomio was unable to support stages 3 and 4, so the final interoperability test was conducted by the CATALYST tech team and has been showcased in this demo movie: https://www.dropbox.com/s/rivf95cnaz50gs7/LiteMap-Loomio-Assembl-CIDashboardInteroperability_Demo.mov?dl=0

As a consequence of these various shortcomings, Loomio's second grant payment was not granted.

3.2 Seventh Sustainable Summer School

This test is described in detail in D4.3. Unlike the Loomio test, the summer school was active, but on a very small scale. For that reason, many alerts that would have been useful in a larger conversation were of limited use in this context, and we had to design new alerts suited to the conversation.

3.2.1 Participation-centric analytics

In particular, participation-centric alerts designed to alert moderators to participant activity (e.g.
the alert that a user has gone inactive) are of limited use to moderators in a very small-scale debate such as that of the school, because moderators are familiar with individual participants and can notice activity patterns directly. Nonetheless, the basic statistics about user activity presented by Assembl were consulted by the moderators, and the moderators did say that they would appreciate the more detailed participation analytics and alerts in the context of a larger conversation. There was also a perception that, even if moderators were made aware of users who were not actively participating, it was very difficult to use that information to encourage them to participate without annoying them.

For similar reasons of scale, the alerts telling a participant about other participants with similar activity patterns, or about ideas interesting to such similar participants, were seen as potentially of very limited use in a context where participants were familiar with one another, and were not pursued.

3.2.2 Semantic clusters and semantic proximity

For all those reasons, we developed the clustering metrics described above, under the assumption that they could be more useful to harvesters in this context, with a good number of posts by very few users. This development was done during the discussion, and the feature was thus only made available near the end of the process. Nonetheless, the late discussion was also when the messages were the most numerous, and when the feature was most needed. Data on semantic neighbourhoods did help harvesters find related messages elsewhere in the discussion.
This was especially true given the structure of the discussion, which was divided into three broad phases: first challenges, then solutions, then sustainable design. The same broad themes were discussed within each phase, but the grouping of the conversation into phases made it particularly useful to examine each theme across the phases through semantic neighbourhoods. While the final synthesis was structured by phase, the content streams of each theme were also presented during a live meeting with the students, and the semantic neighbourhood function allowed us to extract those content streams more easily and with better recall.

As for the clustering, the first version of the clustering algorithm (with DBSCAN) did reveal a few subsets of posts that were not reflected in the table of ideas; notably, a subdivision of a discussion around waste at work into paper and electricity waste. However, those posts had been grouped deliberately under a broader heading, and the subdivision, though meaningful, was not deemed essential to the structure of the table of ideas. The OPTICS-based algorithm also identified clusters whose content was mostly contained within an idea, but with a few outlying posts that should probably have been classified with that idea but were not. There is little doubt that this would have allowed more exhaustive harvesting. In other cases, some of the clusters corresponded to concepts that were already harvested into ideas, and we can surmise that it would have helped the harvesters if the cluster had been found before the idea had been identified. However, because those clustering analytics came quite late, we do not have live examples where the cluster preceded the harvester's action. Overall, our limited community testing shows a valuable use for semantic neighbourhood search, and our first experiments with automatic clustering show that it can exhibit meaningful post clusters most of the time.
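The deliverable does not spell out how semantic proximity is computed. As an illustration of the principle behind semantic-neighbourhood search, here is a minimal bag-of-words cosine-similarity sketch; the function name and the toy posts are ours, and Assembl's real implementation may use richer representations:

```python
import math
from collections import Counter

def nearest_posts(query, posts, k=2):
    """Rank posts by cosine similarity of bag-of-words vectors to a query
    post -- a stand-in for semantic-neighbourhood search."""
    def vec(text):
        return Counter(text.lower().split())

    def cos(a, b):
        num = sum(a[t] * b[t] for t in set(a) & set(b))
        den = (math.sqrt(sum(v * v for v in a.values()))
               * math.sqrt(sum(v * v for v in b.values())))
        return num / den if den else 0.0

    q = vec(query)
    return sorted(posts, key=lambda p: cos(q, vec(p)), reverse=True)[:k]

posts = [
    "reduce paper waste at work",
    "cut electricity waste in the office",
    "cycling to the summer school",
]
print(nearest_posts("office waste", posts, k=1))
# -> ['cut electricity waste in the office']
```

A harvester working on one message can then be shown its nearest neighbours from other parts of the discussion, which is the use case described above.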
It remains to be proven whether those clusters will actually assist harvesters in populating the table of ideas, but we have every reason to expect cases of this to emerge naturally as we make those features available to harvesters.

4. Post-hoc testing

Following the qualified success of one of the tests, and the failure of the second to gather useful data, we decided to do more testing of the analytics in laboratory conditions. Our aim was to demonstrate that the metrics could detect useful conditions in real discussion data, even if we could not get a significant community to use those alerts at this stage in the process. We evaluated both the alerts and metrics components of the CATALYST analytics server.

4.1 Alerts

The alerts were evaluated by assessing how often clearly dysfunctional patterns (e.g. a user rates a post without viewing the underlying arguments) occur in real-world deliberations. Clearly, if such dysfunctional patterns appear very rarely in representative deliberation maps, then the corresponding alert is unlikely to add much value to a deliberation.
To test this, we selected three representative alerts that seem to capture clearly dysfunctional patterns:

· Rating ignored argument
· Rating ignored competitor
· Unseen response

We then assessed how frequently these alerts appeared in an online engagement wherein 189 participants created a deliberation map with nearly 1600 issues, ideas and arguments (on the topic of bio-fuels use in Italy):

Figure 14 - How often users ignore arguments when rating posts

Roughly 86% of all participants rated at least some posts without viewing relevant arguments, and almost 6% of the participants rated posts without seeing any of the relevant arguments. On average, participants did not view roughly 37% of the relevant arguments when rating posts. This phenomenon could occur either because the participants ignored existing arguments when rating a post, or because the arguments were added after the user rated the post. Remarkably, roughly 96% of the participants did not view at least some competing ideas (i.e. ideas responding to the same issue) when rating an idea. This is potentially important because it is usually important to assess the quality range of a set of ideas before assigning any of them a label such as "very promising" versus "somewhat promising". We also found that participants, on average, did not view the responses to roughly 16% of the posts they authored. This is potentially important because responses to their posts - e.g. rebuttals of their arguments, arguments against their ideas - would be likely to elicit further engagement in the deliberation process.
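Patterns like "rating ignored argument" can be detected mechanically from view and rating logs. A minimal sketch under an assumed data model (the CATALYST server's actual schema and function names are not shown in the deliverable; these are ours):

```python
def rating_ignored_argument(user_views, user_ratings, arguments_of):
    """Return, per user, the relevant arguments that were not viewed for
    posts the user rated -- the pattern behind the 'rating ignored
    argument' alert. Assumed data model:
      user_views[u]   -> set of post ids the user has viewed
      user_ratings[u] -> set of post ids the user has rated
      arguments_of[p] -> set of argument post ids attached to post p
    """
    missed = {}
    for user, rated in user_ratings.items():
        seen = user_views.get(user, set())
        gaps = {p: arguments_of.get(p, set()) - seen for p in rated}
        gaps = {p: args for p, args in gaps.items() if args}
        if gaps:
            missed[user] = gaps
    return missed

views = {"alice": {"idea1", "arg1"}, "bob": {"idea1"}}
ratings = {"alice": {"idea1"}, "bob": {"idea1"}}
args = {"idea1": {"arg1", "arg2"}}
print(rating_ignored_argument(views, ratings, args))
# alice missed arg2; bob missed both arguments of idea1
```

The frequencies reported above are then simple aggregates of such per-user results over the whole deliberation map.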
In all these cases, it seems clear that alerting users to missed posts relevant to their contributions and ratings has the potential for significant impact on the deliberation.

4.2 Metrics and visualization

Metrics were evaluated by assessing to what extent they make it possible to recognize important deliberation patterns that would otherwise be difficult to detect, e.g. if the user could only browse the deliberation map itself. For this purpose we chose to look at the metrics which allow users to assess to what extent the participant community's preferences concerning an issue are polarized into opposed camps or not. We started with the following simple deliberation map, consisting of a single issue, two competing ideas, and a total of six arguments:

Figure 15 - Simple deliberation map

We compared two situations: one where eight simulated users provided uncorrelated (randomly selected) ratings for the posts in the map, and a second where the eight simulated users were divided into two balkanized groups, where one half of the users like all the posts that the other users do not, and vice versa. To keep the example simple, we assume that all users rate posts as either a "1" (strongly dislike) or a "5" (strongly like). In the uncorrelated setting, the ratings from each user, in our example, are as follows:

Table 5.
Uncorrelated ratings example

Each of the eight simulated users (u1-u8) rated each of the eight posts (idea: yes; pro: carbon offsets do reduce greenhouse gas emissions (if not fraudulent); pro: it is getting easier and easier to find good carbon offsets; pro: Many major meetings are using them; idea: no; pro: can have unexpected consequences; pro: it fosters complacency; pro: it's too easy to cheat) as either 1 or 5, with the values selected at random, so that no user's ratings are correlated with any other user's.

In the fully balkanized setting, the ratings from each user, in our example, are as follows:

Table 6.
Balkanized ratings example

post                                                                        u1  u2  u3  u4  u5  u6  u7  u8
idea: yes                                                                   1   1   1   1   5   5   5   5
pro: carbon offsets do reduce greenhouse gas emissions (if not fraudulent)  1   1   1   1   5   5   5   5
pro: it is getting easier and easier to find good carbon offsets            1   1   1   1   5   5   5   5
pro: Many major meetings are using them                                     1   1   1   1   5   5   5   5
idea: no                                                                    5   5   5   5   1   1   1   1
pro: can have unexpected consequences                                       5   5   5   5   1   1   1   1
pro: it fosters complacency                                                 5   5   5   5   1   1   1   1
pro: it's too easy to cheat                                                 5   5   5   5   1   1   1   1

Note that the ratings histogram for all the posts in the uncorrelated and fully balkanized conditions would be identical, as follows:

Figure 16 - Balkanized rating histogram

The existence and nature of any balkanization would be completely opaque to a user looking at the deliberation map, especially if (as is commonly done to encourage honest ratings) the ratings are anonymized. The presence or absence of balkanization is, however, quickly revealed by simple visualizations of the appropriate metrics from the CATALYST analytics server.
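The balkanized example can be checked numerically: centring the ratings of Table 6 per post and taking a singular value decomposition yields a single dominant component whose post weights all have magnitude 1/√8 ≈ 0.35, matching the eigenvector analysis reported below. A minimal numpy sketch (not the analytics server's actual code):

```python
import numpy as np

# Balkanized ratings of Table 6: users u1-u4 rate the "yes" branch
# (posts 0-3) as 1 and the "no" branch (posts 4-7) as 5; users u5-u8
# do the opposite.
same_bloc = (np.arange(8)[:, None] < 4) == (np.arange(8)[None, :] < 4)
R = np.where(same_bloc, 1.0, 5.0)

# Centre per post and decompose. With full balkanization the centred
# matrix has rank one, so a single dimension carries all the structure.
C = R - R.mean(axis=0)
U, s, Vt = np.linalg.svd(C)
v = Vt[0]  # post weights along the most predictive dimension

print(np.round(s, 2))          # one dominant singular value, the rest ~0
print(np.round(np.abs(v), 2))  # every weight has magnitude 1/sqrt(8) ~ 0.35
```

The signs of `v` split the posts into the "yes" and "no" camps; with uncorrelated ratings, no single dimension dominates and no such split appears.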
If we plot the posts according to the first two (most predictive) eigenvectors returned by the interest space post coordinates metric, we can see that there is no clear clustering:

Figure 17 - Eigenvector decomposition of uncorrelated interest space

This is borne out by the fact that the support space clustering metric returns a value of 0, i.e. no clustering was observed. By contrast, if we plot the posts from the fully balkanized case according to the first two (most predictive) eigenvectors returned by the interest space post coordinates metric, the clustering (and thus balkanization) becomes clear:

Figure 18 - Eigenvector decomposition of balkanized interest space

The clustering coefficient, in this simple example, is 1.0. Finally, we can use the interest space dimensions metric to visualize the fault line across which the two balkanized groups are divided. The post weights for the most predictive eigenvector are as follows:

Table 7.
Eigenvector analysis

post                                                                        weight
idea: yes                                                                   0.35
pro: carbon offsets do reduce greenhouse gas emissions (if not fraudulent)  0.35
pro: it is getting easier and easier to find good carbon offsets            0.35
pro: Many major meetings are using them                                     0.35
idea: no                                                                    -0.35
pro: can have unexpected consequences                                       -0.35
pro: it fosters complacency                                                 -0.35
pro: it's too easy to cheat                                                 -0.35

If we simply color-code the posts of the deliberation map according to the valence of the post weights (green for support, red for opposition), we can immediately see that the community is divided over whether carbon offsetting is a good idea or not:

Figure 19 - Topic tree color-coded by balkanized community

This represents, we believe, a compelling demonstration of how the metrics provided by the CATALYST Analytics Server can allow system users to derive a qualitatively deeper understanding of the "health" of the deliberation, and of how mapping abstract mathematical results back to the original discussion entities can improve intelligibility.

4.3 Suggestions based on semantic clusters

The initial positive response of the design summer school moderators to semantic distance and clustering encouraged us to design further alerts that would go beyond clustering and provide direct help to harvesters. In bottom-up tools like Assembl in particular, harvesters are trying to extract semantic clusters by hand from harvested content, and this allows us to use the semantic proximity machinery to answer two very different kinds of question. First: do the automatic clusters correspond to meaningful concepts? In which case, can the automatic clustering help enrich the harvester-defined content groups around ideas?
Can the automatic clustering help carve out new content groups within those already identified? And conversely: do the content groups correspond to semantic clusters?

We have focused on the first kind of question, due to its immediate practical use, but we have done some work approaching the problem from the other end: it is possible to measure how much a given group of content (posts) defined by a harvester stands in contrast to neighbouring groups by calculating the silhouette score (Rousseeuw, 1987) of the group's content in the conversation. We had to adapt the algorithm slightly: silhouette scores are a global measure of clustering, and we wanted the score of a single group; also, the same message can be associated with several groups (ideas) by harvesters, while the silhouette algorithm expects a single category per point. We have also measured what we call the internal silhouette score of a group, which is the silhouette score corresponding to the subdivision of the group's content into its immediate sub-ideas. In general, we have found significantly positive scores in our conversations (between 0.1 and 0.2, where zero should represent a random grouping of messages and negative numbers a grouping that runs counter to the computed clusters), at least in the leaf ideas. The ideas closer to the root of the thematic tree are usually too general to be distinguishable by semantic analysis, but the consistently positive results lower down in the tree tend to confirm that semantic analysis is not irrelevant to the practice of harvesting. So in theory, it should be possible for a recommender to propose adding unclassified content to the "nearest" few groupings and check whether the silhouette score improves, or to subdivide each idea in many ways and see which subdivisions have high global silhouette scores.
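The single-group adaptation described above can be sketched as follows: for each member of the group, compare its mean distance to the rest of the group against its mean distance to the nearest other group, and average over the group's members only. This simplified version ignores multi-group membership, and all names are ours:

```python
from math import dist

def group_silhouette(points, labels, group):
    """Silhouette score of a single group: the mean silhouette over that
    group's members only, rather than Rousseeuw's average over the
    whole clustering."""
    idx = [i for i, l in enumerate(labels) if l == group]
    others = sorted(set(labels) - {group})
    scores = []
    for i in idx:
        within = [dist(points[i], points[j]) for j in idx if j != i]
        a = sum(within) / len(within) if within else 0.0  # cohesion
        b = min(  # separation from the nearest other group
            sum(dist(points[i], points[j]) for j, l in enumerate(labels) if l == g)
            / sum(1 for l in labels if l == g)
            for g in others)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Two tight, well-separated toy "idea" groups -> strongly positive score.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
labels = ["idea-A", "idea-A", "idea-B", "idea-B"]
print(round(group_silhouette(points, labels, "idea-A"), 2))  # about 0.9
```

In Assembl's setting the points would be semantic vectors of posts and the labels the ideas they are attached to; scores near zero would indicate a grouping indistinguishable from random.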
In practice, the second algorithm is computationally intensive, and we have found it more practical to reuse the clusters from our earlier work (using the OPTICS algorithm) to accomplish both tasks, at least as an initial implementation. So our recommender takes each OPTICS cluster, finds the content groups with the most shared content, and then attempts the following:

· adding the rest of the cluster to the group, to test whether that group's local silhouette score is improved;
· using the cluster to introduce a new subgroup in that group, to see whether the group's internal silhouette score is improved;
· a combination of both strategies.

In the case of additions, the recommender chooses the idea most improved by a given cluster, and in the case of partitions, it chooses the cluster that has the most impact on a given idea. Combinations have priority over either. Though still partial, early results are encouraging. We asked harvesters to evaluate clusters found by our algorithm in existing real conversations, and compared the usefulness ratings of real clusters to fake clusters. The fake clusters were constructed by reversing the OPTICS algorithm: suggestions that minimize the silhouette score were presented to harvesters, mixed randomly with suggestions that improve it. The results are somewhat mixed: many results from the suggestion engine were considered of limited use by harvesters, whereas many random results were found useful. Still, overall, engine recommendations were judged more useful than random ones.
(Additions and partitions were analysed separately, because the former are scored by a delta to the outer silhouette score, whereas the latter are scored by a delta to the inner silhouette score. Mixed additions and partitions are analysed as partitions.)

Table 8. Cluster ratings by harvesters

Type       Suggestion or random   # useful   # useless
addition   suggestion             30         23
addition   random                 6          17
partition  suggestion             19         21
partition  random                 5          12

However, this analysis fails to take into account the fact that, within each category (suggestion or random), the score delta indicates the degree of certainty of the result. Looking at the average score delta within each category, we obtain the following:

Table 9. Average silhouette score for cluster categories

Type       Judged useful   Average score delta
addition   yes             0.029
addition   no              0.023
partition  yes             0.025
partition  no              0.008

These differences are not, however, significant according to a Student's t-test:

· additions: statistic = 0.62, p-value = 0.54
· partitions: statistic = 1.14, p-value = 0.26

Nonetheless, the fact that so many of the addition and partition suggestions were judged useful by the harvesters is encouraging in its own right, and the cluster analysis will be included in the Assembl platform.
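The significance test reported above is a standard two-sample Student's t-test on the score deltas of the useful and not-useful groups. A minimal sketch with made-up score deltas (the study's raw deltas are not reproduced in this deliverable, so the numbers below are placeholders):

```python
from math import sqrt

def two_sample_t(x, y):
    """Student's two-sample t statistic with pooled variance, as used to
    compare the score deltas of useful vs. not-useful suggestions."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ssx = sum((v - mx) ** 2 for v in x)
    ssy = sum((v - my) ** 2 for v in y)
    pooled = (ssx + ssy) / (nx + ny - 2)  # pooled variance estimate
    return (mx - my) / sqrt(pooled * (1 / nx + 1 / ny))

# Illustrative, made-up score deltas (placeholders, not the study data).
useful = [0.05, 0.03, 0.02, 0.04, 0.01]
useless = [0.02, 0.01, 0.03, 0.00, 0.02]
print(round(two_sample_t(useful, useless), 2))
```

The p-values reported above are then obtained from the t distribution with nx + ny - 2 degrees of freedom, e.g. via scipy.stats.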
Conclusions and future work

While there has been substantial effort devoted to manually-coded, post-hoc metrics on the efficacy of online deliberations (Steenbergen et al., 2003; Stromer-Galley, 2007; Trénel, 2004; Cappella et al., 2002; Spatariu et al., 2004; Nisbet, 2004), existing deliberation technologies have made only rudimentary use of automated real-time metrics to foster better emergent outcomes during the deliberations themselves. The core reason for this is that, in existing deliberation tools, the content takes the form of unstructured natural-language text, limiting the possible deliberation metrics to the analysis of word-frequency statistics, which is a poor proxy for the kind of semantic understanding that would be necessary to adequately assess deliberation quality. One of the important advantages of using argument maps to mediate deliberation is that their additional semantics allow us to automatically derive metrics that would require impractically resource-intensive manual coding for more conventional social media. We have also shown how exception analysis can identify useful deliberation metrics, and how a collective intelligence approach can be used to gather community input on which metrics are most useful. It seems clear overall that deliberation analytics can serve a useful role in medium to large-scale moderated conversations, whether through visualizations or alerts.
However, one conclusion of our study is that, at least in the case of the more mathematically grounded alerts, their use by community moderators requires designing visualizations that relate the raw results back to deliberation entities. For that reason, we will continue work on visualization. The CI Dashboard allows us to test new visualizations comparatively easily, as we find new opportunities for large-scale conversations. In particular, participation and interest metrics have been shown to be of limited use for smaller conversations, but we look forward to testing them in a larger-scale debate. Harvesters have declared an interest in semantic clustering alerts, and we will endeavour to refine those functions.

Though we think we have demonstrated the usefulness of many deliberation analytics, and shown the potential usefulness of many more, we were hoping to show that they could assist community managers in improving the quality of the conversation, and the limited scale of our tests was such that un-assisted human intelligence could have achieved the same effect. That research question therefore remains open for now, though we have set in place a research apparatus that we hope can answer it in the near future. Beyond that, we were hoping that deliberation analytics could also help participants become aware of the community's collective communication dynamic, and raise the level of collective intelligence of the community. This question remains open, and our results show that the pedagogical element cannot be neglected. In particular, past work by Mark Klein has shown that high-quality deliberative structure can be achieved in a moderated context (Klein 2012). It is as yet unclear whether deliberation analytics could allow a comparable structure to emerge without moderation, through a combination of harvesting, just-in-time advice on (micro) deliberation structure, and alerts and visualization of macro conversation dynamics.
Though we have shown the usefulness of deliberation analytics in assisting the harvesting process and in identifying these conversation dynamics, measuring their impact on the capacity for self-organization of collective intelligence communities remains a question for future research.

References

Ankerst, M., Breunig, M., Kriegel, H.-P., & Sander, J. (1999). OPTICS: Ordering points to identify the clustering structure. Proc. ACM SIGMOD'99 Int. Conf. on Management of Data, Philadelphia, PA, pp. 49-60. http://doi.org/10.1145/304181.304187
Bangor, A., Kortum, P., & Miller, J. (2009). Determining what individual SUS scores mean: Adding an adjective rating scale. Journal of Usability Studies, 4(3), 114-123.
Bangor, A., Kortum, P. T., & Miller, J. T. (2008). An empirical evaluation of the System Usability Scale. International Journal of Human-Computer Interaction, 24(6), 574-594.
Brooke, J. (2013). SUS: A retrospective. Journal of Usability Studies, 8(2), 29-40.
Cappella, J. N., Price, V., & Nir, L. (2002). Argument repertoire as a reliable and valid measure of opinion quality: Electronic dialogue during Campaign 2000. Political Communication, 19(1), 73-93.
Golub, G., & Kahan, W. (1965). Calculating the singular values and pseudo-inverse of a matrix. Journal of the Society for Industrial and Applied Mathematics Series B: Numerical Analysis, 2(2), 205-224. http://doi.org/10.1137/0702016
Klein, M., & Bernstein, A. (2004). Towards high-precision service retrieval. IEEE Internet Computing Journal, 8(1), 30-36.
Klein, M. (2012). Enabling large-scale deliberation using attention-mediation metrics. Computer-Supported Collaborative Work, 21(4), 449-473.
Klein, M. (2014).
Deliberation analytics (No. D3.5). University of Zurich. Retrieved from http://catalyst-fp7.eu/wp-content/uploads/2014/06/CATALYST_D3.5.pdf
Liddo, A. D., & Bachler, M. (2014). Collective Intelligence Dashboard (No. D3.9). Milton Keynes: The Open University. Retrieved from http://catalyst-fp7.eu/wp-content/uploads/2014/06/CATALYST_D3.9.pdf
Nisbet, D. (2004). Measuring the quantity and quality of online discussion group interaction. Journal of eLiteracy, 1, 122-139.
Pedregosa, F., et al. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825-2830. http://jmlr.csail.mit.edu/papers/v12/pedregosa11a.html
Rousseeuw, P. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53-65. doi:10.1016/0377-0427(87)90125-7
Spatariu, A., Hartley, K., & Bendixen, L. D. (2004). Defining and measuring quality in online discussions. The Journal of Interactive Online Learning, 2(4).
Steenbergen, M. R., Bächtiger, A., Spörndli, M., & Steiner, J. (2003). Measuring political deliberation: A discourse quality index. Comparative European Politics, 1(1), 21-48.
Stromer-Galley, J. (2007). Measuring deliberation's content: A coding scheme. Journal of Public Deliberation, 3(1), 12.
Trénel, M. (2004). Measuring the quality of online deliberation. Coding scheme 2.4. Social Science Research Center Berlin, Germany. Available at: http://www.wz-berlin.de/online-mediation/files/publications/quod_2_4.pdf
Ullmann, T. D. (2004). maQ-Fragebogengenerator. Make a Questionnaire. Available online at: http://maq-online.de
Ullmann, T. D., Liddo, A. D., & Bachler, M. (2014). Collective Intelligence Analytics Dashboard Usability Evaluation (No. D4.6). The Open University.
Retrieved from http://catalyst-fp7.eu/wp-content/uploads/2014/12/CATALYST_-D4.6.pdf

List of Figures

Figure 1 - Metric and alert server data flow
Figure 2 - UI for alerts in tree view
Figure 3 - Selection of goals for alerts
Figure 4 - Selection of exceptions for alerts
Figure 5 - Alert selection system
Figure 6 - Attention bias visualisation
Figure 7 - Rating bias visualisation
Figure 8 - Attention map visualisation
Figure 9 - Community interest visualisation
Figure 10 - Sub-communities network visualisation
Figure 11 - CI Dashboard alerts interface
Figure 12 - LiteMap alerts interface
Figure 13 - DebateHub alerts interface
Figure 14 - How often users ignore arguments when rating posts
Figure 15 - Simple deliberation map
Figure 16 - Balkanized rating histogram
Figure 17 - Eigenvector decomposition of uncorrelated interest space
Figure 18 - Eigenvector decomposition of balkanized interest space
Figure 19 - Topic tree color-coded by balkanized community
List of Tables

Table 1. List of metrics
Table 2. List of alerts
Table 3. Task performance
Table 4. Usability
Table 5. Uncorrelated ratings example
Table 6. Balkanized ratings example
Table 7. Eigenvector analysis
Table 8. Cluster ratings by harvesters
Table 9. Average silhouette score for cluster categories