School of Computer and Information Science
CIS Research Placement Report
Agile Visualisation Using Roassal in Smalltalk.
Marc Seyfang
Date: 08/11/16
Supervisor: Dr. Georg Grossmann
Abstract
The first aspect of this project was to research what agile visualisation is and how it can
be useful. The second was to learn how to code in the Smalltalk programming language in
the VisualWorks development environment, specifically how to use the Roassal
visualisation engine to create data visualisations that aid data analysis. The final part
of the project was to construct a number of simple visualisations to help decide which
more complex visualisations to work on.
Contents
1 Introduction
2 Visualisation
3 Smalltalk and Roassal
4 Implementation
4.1 Circular Tree Map
5 Conclusion
6 Bibliography
1 Introduction
The aim of this project was to create visualisations of data using the agile visualisation
method, programming in Smalltalk with VisualWorks. The first step was to research what
agile visualisation is and how it can be useful. The next was to learn how to code in the
Smalltalk programming language in the VisualWorks development environment,
specifically how to use the Roassal visualisation engine to create data visualisations that
aid data analysis. The final step was to construct a number of simple visualisations to
help decide which more complex visualisations to work on. The first visualisation
examined weights the nodes by the number of connections they have to other nodes. The
second uses a circular tree map to represent data in a hierarchical structure. Together
these visualisations provide an insight into the benefits of visualising data.
2 Visualisation
Data is a collection of facts, which can include measurements, observations and
descriptions. Raw data is data that has not been processed, such as a list of the eye colour
of everyone in an area. Such a list could contain thousands of entries and can be
somewhat meaningless until it is processed (Steele 14 Feb. 2012). Data visualisation is
the presentation of data such as this in a graphical or pictorial format; common data
visualisations include graphs, charts, trees and maps. Once the data is processed into one
of these visualisations it can have greater meaning. With the example above, a pie chart
could show the percentage of people with each eye colour, giving the viewer useful
information that they would not easily get from the raw data.
This process of extracting useful information from raw data is called data analysis.
During a data analysis the useful information can sometimes be difficult to extract from
the data, and this is where agile visualisation can help. Agile visualisation is the process
of creating many data visualisations in a short time period; the quicker a visualisation can
be created the more can be produced, which helps data analysts arrive at the useful
information sooner (Bergel 6 September 2016).
3 Smalltalk and Roassal
There are many programming languages to choose from; Smalltalk is an object-oriented
programming language that is well suited to rapid, iterative development (Leon 3 April
2007). The Roassal visualisation engine, which provides many methods for agile
visualisation, is also available for Smalltalk. For these reasons VisualWorks, a Smalltalk
development environment, was chosen as the program used to create the data
visualisations.
Roassal creates a visualisation from the following components: views, elements, shapes,
edges and interactions. A view is a container for the graphical elements, or nodes, of the
visualisation. Each element represents an object that holds information such as a number
or a string, and elements can be added to and removed from the view. The graphical
representation of an element is set by its shape; the available shapes include circles,
boxes and labels. Edges can be created between elements to represent a relationship
between the two elements. Finally, the user is able to interact with the visualisation, for
example by repositioning elements within the view or by hovering over an element to
display the name of the corresponding object. All of these Roassal features can be used
in data analysis to create many different visualisations in as short a time as possible.
4 Implementation
For this project some data was provided to try out some visualisations. The data is a CSV
file containing 3 columns: one each for subject, predicate and object nodes. Each piece of
data in these columns is a string representing a URL. Each row contains one subject, one
predicate and one object node, and these nodes can be connected to form a relationship;
this relationship can then be extracted into an ordered collection called
edgeAssociations. While learning how to program in Smalltalk, possible ways to
visualise this data were considered; figure 1 below contains some of the starting layouts
of the data.
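As an illustration only, the step of turning the rows into edgeAssociations might be sketched as follows. This is Python rather than the Smalltalk used in the project, the sample URLs are hypothetical, and pairing each subject with its object while ignoring the predicate, as well as the absence of a header row, are assumptions that the report does not confirm.

```python
import csv
import io

# Hypothetical sample rows; the project's real file is not reproduced here.
SAMPLE_CSV = """\
http://example.org/a,http://example.org/knows,http://example.org/b
http://example.org/a,http://example.org/knows,http://example.org/c
http://example.org/b,http://example.org/knows,http://example.org/c
"""

def read_edge_associations(csv_text):
    """Return an ordered list of (subject, object) pairs, one per CSV row."""
    edge_associations = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 3:
            continue  # skip blank or malformed rows
        subject, _predicate, obj = (cell.strip() for cell in row)
        # Assumption: an edge links the subject to the object of the row.
        edge_associations.append((subject, obj))
    return edge_associations

edge_associations = read_edge_associations(SAMPLE_CSV)
```

Keeping the pairs in row order mirrors the ordered collection described above.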
Figure 1: (left) Circular layout, (center) force based layout and (right) rectangle
pack layout
The above visualisations use the view RTMondrian; Mondrian is a code library designed
to build expressive and flexible visualisations. The circular layout distributes the
elements evenly around a circle, which is useful for seeing the connections between the
nodes. In a force based layout, elements repel one another in a similar way to like
electric charges, and a rectangle pack layout packs all the elements as tightly as possible.
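The idea behind the circular layout can be sketched in a few lines. This is a Python illustration of the general technique, not Roassal's implementation; the function name and the default radius are assumptions made for the example.

```python
import math

def circular_layout(count, radius=100.0):
    """Return `count` (x, y) positions spaced evenly around a circle
    centred on the origin."""
    if count == 0:
        return []
    step = 2 * math.pi / count  # equal angle between neighbouring elements
    return [
        (radius * math.cos(i * step), radius * math.sin(i * step))
        for i in range(count)
    ]

positions = circular_layout(4)
```

Because every element sits on the rim, edges cross the interior of the circle, which is why connections are easy to follow in this layout.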
After learning more about Smalltalk and visualisation, a specific visualisation was chosen
to work on: increasing the size of the nodes based on how many connections they had.
First, a Roassal feature called the normalizer was found that could adjust the size of the
nodes based upon a variable or method of the object. However, this did not solve the
problem, as the objects in the given data are just strings, meaning that only a few
variables and methods are available for use, such as #size, which returns the length of the
string; this can be seen in figure 2 below.
Figure 2: The data adjusted by the size of the string.
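A size normalizer of this kind can be thought of as a linear mapping from a metric to a range of sizes. The sketch below is a Python illustration under that assumption; it is not Roassal's normalizer, and the function name, the size range and the sample URLs are hypothetical.

```python
def normalize_sizes(objects, metric, min_size=10.0, max_size=50.0):
    """Map metric(obj) linearly onto [min_size, max_size] for each object."""
    values = [metric(obj) for obj in objects]
    if not values:
        return []
    low, high = min(values), max(values)
    if high == low:
        return [min_size for _ in values]  # all equal: nothing to scale
    scale = (max_size - min_size) / (high - low)
    return [min_size + (value - low) * scale for value in values]

# With plain strings the only obvious metric is the string length.
urls = ["http://a.org/x", "http://a.org/xyz", "http://a.org/xyzzy"]
sizes = normalize_sizes(urls, len)
```

With strings as the underlying objects the metric can only be something like the length, which says nothing about connectivity.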
To solve this problem a new class was created called AVNode, which stands for Agile
Visualisation Node. The AVNode class contained a variable URL, a string holding the
data, and a sorted collection called edges, storing the nodes connected to it. The class
was also given a method called countEdges, which returns the number of nodes stored in
the edges collection. An AVNode was created for each piece of data and placed in an
ordered collection called allNodes. The allNodes collection was then looped over and
each node was compared to the list of edgeAssociations; when a match was found the
corresponding nodes were added to the edges collection of the AVNode. This allowed
allNodes to be used as the nodes of the RTMondrian and the normalizer to use the
method #countEdges, which can be seen in figure 3 below.
Figure 3: Data weighted based on the number of edge connections.
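The AVNode structure described above might look roughly like this. It is a Python sketch, not the project's Smalltalk class: a plain list stands in for the sorted collection, a dictionary lookup replaces the nested loop over allNodes and edgeAssociations, and counting an edge for both of its end nodes is an assumption.

```python
class AVNode:
    """Agile Visualisation Node: a URL plus the nodes connected to it."""

    def __init__(self, url):
        self.url = url
        self.edges = []  # assumption: a plain list instead of a sorted collection

    def count_edges(self):
        """Number of nodes connected to this node."""
        return len(self.edges)

def build_nodes(edge_associations):
    """Create one AVNode per distinct URL and record its connections."""
    by_url = {}
    for source, target in edge_associations:
        for url in (source, target):
            if url not in by_url:
                by_url[url] = AVNode(url)
    for source, target in edge_associations:
        # Assumption: a connection counts for both of its end nodes.
        by_url[source].edges.append(by_url[target])
        by_url[target].edges.append(by_url[source])
    return list(by_url.values())

# Hypothetical associations: "a" is connected to three other nodes.
all_nodes = build_nodes([("a", "b"), ("a", "c"), ("a", "d")])
edge_counts = {node.url: node.count_edges() for node in all_nodes}
```

A metric such as `count_edges` is what the normalizer needs in order to size each node by its connectivity.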
As shown in figure 3, the nodes now vary in size; however, the connections stopped being
shown. It was found that, now that the nodes were AVNodes rather than strings, the
edgeAssociations collection used to generate the edges no longer worked. This was
because the edgeAssociations collection contains associations of strings to strings rather
than AVNode to AVNode. A new collection had to be created using AVNodes. This was
done by looping over the edgeAssociations collection and the allNodes collection and,
for each edge association, adding the nodes that correspond to the association to a new
collection of associations called edgeNodes. This new collection was in the correct
format of AVNode to AVNode, which allowed edgeNodes to be used to form the
connections in the visualisation, as seen in figure 4.
Figure 4: Data weighted and showing connections.
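The conversion from string associations to node associations can be sketched as below. This is a Python illustration with a minimal stand-in for AVNode and hypothetical URLs; the lookup table replaces the double loop described in the text, which gives the same result for unique URLs.

```python
class AVNode:
    """Minimal stand-in for the report's AVNode: only the URL is needed here."""

    def __init__(self, url):
        self.url = url

def to_edge_nodes(edge_associations, all_nodes):
    """Translate (string, string) associations into (AVNode, AVNode) pairs."""
    by_url = {node.url: node for node in all_nodes}
    edge_nodes = []
    for source, target in edge_associations:
        if source in by_url and target in by_url:  # skip unknown URLs
            edge_nodes.append((by_url[source], by_url[target]))
    return edge_nodes

all_nodes = [AVNode("a"), AVNode("b"), AVNode("c")]
edge_nodes = to_edge_nodes([("a", "b"), ("b", "c")], all_nodes)
```

The pairs refer to the same node objects that are drawn in the view, which is what allows the edges to be matched to the displayed elements.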
Originally, when the nodes were strings, hovering over a node would display its string;
this is part of the interaction that Roassal provides. However, now that the nodes are
AVNodes, hovering over a node displays only the text ‘an AVNode’ rather than the
string. This slightly limits the usefulness of the visualisation; for example, if the user
hovers over the largest node it will no longer tell them which URL it represents. This
problem could be solved using highlights, popups or labels. Labels were investigated and
it was found that they can be added when specifying the shape of the nodes. A label is
added by specifying an aspect of the node; in this case the label is the URL string of the
AVNode, using the code ‘withTextAbove: #url’. A number of layouts were tried for the
final visualisation, including cluster, sugiyama and force; however, the best layout found
for visualising the size differences and the connections was the circle layout, because the
spacing between the nodes makes the visualisation clearer, as seen in figure 5.
Figure 5: Final weighted and labelled visualisation of the data.
4.1 Circular Tree Map
A second visualisation for this data was attempted: the circular tree map. A tree is a
hierarchy visualisation in which a root node branches into child nodes, and child nodes
can branch further into more nodes. A circular tree map is very similar: the root node is
a large circular graphical element and the child nodes are circular graphical elements
inside their parent node, as seen in figure 6 (Bergel 6 September 2016).
Figure 6: A circular tree map visualising the root node RTObject and all subclasses.
In figure 6 each smaller circle represents a subclass of the class represented by the larger
circle that contains it. The transparent circles contain smaller circles and hence represent
classes that have subclasses, while the purple circles represent classes that do not have
any subclasses.
This tree visualisation is not compatible with the original data without some
manipulation, as the data is not in a parent and child format. However, the strings in the
data are URLs, all of which come from two different websites, and the data branches
further into the subfolders and subpages of those websites. In order to create data
compatible with the circular tree map, a new class was created, StringNode. A
StringNode contains a string called substring, which holds the portion of the URL that
makes it unique. Each StringNode also contains a collection of nodes called
substringNodes, which holds all the StringNodes that are the substrings of the URL.
This allowed a hierarchy to be formed using the strings and substrings. For example, the
string ‘http://www.pinpoll.com/ontology/pollee’ and the string
‘http://www.pinpoll.com/ontology/general’ would be subNodes of the parent node
‘http://www.pinpoll.com/ontology/’, as this is where the website pages diverge. The
resulting circular tree map can be seen in figure 7.
Figure 7: Circular tree map of the hierarchy of the websites and subpages on the site.
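One way to derive such a hierarchy from the URLs is sketched below in Python. The report does not state how the substrings were determined, so splitting each URL into its host and its path segments is an assumption, as are the method and function names; only the two example URLs are taken from the text.

```python
from urllib.parse import urlsplit

class StringNode:
    """A URL fragment plus the child fragments found beneath it."""

    def __init__(self, substring):
        self.substring = substring
        self.substring_nodes = []

    def child(self, substring):
        """Return the child holding `substring`, creating it if needed."""
        for node in self.substring_nodes:
            if node.substring == substring:
                return node
        node = StringNode(substring)
        self.substring_nodes.append(node)
        return node

def build_hierarchy(urls):
    """Build a tree with one level per host and per path segment."""
    root = StringNode("")
    for url in urls:
        parts = urlsplit(url)
        # Assumption: the hierarchy follows the host, then each path segment.
        segments = [parts.netloc] + [s for s in parts.path.split("/") if s]
        node = root
        for segment in segments:
            node = node.child(segment)
    return root

root = build_hierarchy([
    "http://www.pinpoll.com/ontology/pollee",
    "http://www.pinpoll.com/ontology/general",
])
```

In this sketch the two example URLs share the nodes for the host and for ‘ontology’, and diverge only at the last level, matching the parent and child relationship described above.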
5 Conclusion
Research was conducted to find out what visualisation is and how it can be useful. It was
found that visualisation is the process of presenting data in a graphical or pictorial way
and that its main benefit is to improve the usefulness and meaning of raw data. In order to
visualise the data a program had to be used; the chosen program was VisualWorks,
because of Smalltalk's benefits for rapid, iterative development along with the
capabilities of the Roassal visualisation engine. It was found that Roassal uses views,
elements, shapes, edges and interactions to create visualisations for data analysis.
Agile visualisation was found to be the process of creating many data visualisations in a
short time period to help find useful information from data as quickly as possible. This
approach was followed first, with a number of basic visualisations attempted, and then
two specific visualisations were developed further. The first adjusted the size of the
nodes depending on their number of connections. The second was a circular tree map
representing the data as a hierarchy based on the subfolders of the website URLs. In
conclusion, it was found that many different visualisations can be constructed from a
single data source and that these visualisations can be useful in finding more meaning in
the data.
6 Bibliography
Bergel, A. (6 September 2016). "Agile Visualization." Accessed 10/11/16. Web URL:
https://dl.dropboxusercontent.com/u/31543901/AgileVisualization/Introduction/0001Introduction.html
Leon, R. (3 April 2007). "Why Smalltalk." Accessed 13/11/16. Web URL:
http://onsmalltalk.com/why-smalltalk
Steele, J. (14 Feb. 2012). "Why Data Visualization Matters." Accessed 12/11/16. Web
URL: https://www.oreilly.com/ideas/why-data-visualization-matters