Multiform Bivariate Matrix

Getting To Know The Multiform Bivariate Matrix
1: Introduction
A manipulable matrix is a generic component that can accept a variety of
representation forms as elements. Some example elements include bivariate
maps, scatter plots, space-filling visualizations, and histograms. There are
several types of matrices in GeoVISTA Studio such as the Multiform Bivariate
Matrix and Bivariate Small Multiples. The Multiform Bivariate Matrix can display
the same representation form on both sides of the diagonal as in a standard
scatterplot matrix (Figure 1) or it can show different forms on either side as in
the map and scatterplot matrix shown below (Figure 2). The Bivariate Small
Multiples Matrix is different from the Multiform Bivariate Matrix in that different
rows in Bivariate Small Multiples display different graph representations (Figure
3).
Figure 1: Scatterplot matrix showing three variables: rate of cervical cancer, rate of breast
cancer, and per capita income.
Figure 2: A Bivariate matrix using scatterplots and maps of cervical cancer, breast cancer, and
per capita income data for white women, 1993-1997.
Figure 3: Example of a Multiform small multiple matrix using scatterplots, maps, and spacefills
of cervical cancer, breast cancer, mammogram, and per capita income data for white women,
1993-1997.
In the example bivariate matrix with maps and scatterplots (Figure 2), the region
above the diagonal displays a scatterplot matrix, in which the Y-axis is
determined by the row and X-axis is determined by the column that the
scatterplot is in. The region below the diagonal contains a map matrix, which
displays geographic information. Colors in these visualizations are assigned by
the corresponding variables indicated in the same rows or columns in the matrix.
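The layout rule described above can be sketched in a few lines of Python. This is an illustration only and is not part of GeoVISTA Studio: the function name `matrix_cell`, the form labels, and the variable order are assumptions made for this example, and the "diagonal" label is a placeholder because the text does not say what diagonal cells contain.

```python
def matrix_cell(row, col, variables):
    """Describe the element at (row, col) of a map-and-scatterplot matrix.

    Hypothetical helper: scatterplots sit above the diagonal, maps below it,
    the column variable goes on x and the row variable goes on y.
    """
    if row == col:
        form = "diagonal"  # assumption: the tutorial does not describe these cells
    elif col > row:
        form = "scatterplot"
    else:
        form = "map"
    return {"form": form, "x": variables[col], "y": variables[row]}


# Variable names borrowed from the tutorial data; their order is assumed.
variables = ["CER9397AGE", "BR9397AGEA", "PCINCOME"]
upper_right = matrix_cell(0, 2, variables)
lower_left = matrix_cell(2, 0, variables)
print(upper_right)
print(lower_left)
```

With this ordering, the upper-right cell is a scatterplot with PCINCOME on the x-axis and CER9397AGE on the y-axis, and the lower-left cell is a map of the same pair with the axes swapped.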
2: Interaction with Matrices
A step-by-step guide (as well as the sample data used in this tutorial) describing
how to build a Matrix using Studio is available at:
http://www.geovista.psu.edu/grants/nci-esda/tutorials.html
To launch Studio with the matrix design pre-loaded, click here:
http://www.geovistastudio.psu.edu/autobuild/gvstudio-matrix.jnlp
To begin using a Matrix in Studio, load data with the MtSimpleFileChooser
bean. Click the ‘Select’ button on the File Chooser and navigate to your data
directory. You need a shape file and a corresponding attribute file in either ‘.dbf’
or ‘.csv’ format. You must keep the name of the shape file and data file the
same.
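If you want to check the naming requirement before loading data, a small script along these lines may help. It is a sketch that runs outside of Studio; the helper name `matching_attribute_files` and the file names are hypothetical, and it assumes the shape file uses the conventional '.shp' extension.

```python
from pathlib import PurePath

# Attribute formats named in this tutorial.
ATTRIBUTE_EXTENSIONS = {".dbf", ".csv"}


def matching_attribute_files(shape_file, candidates):
    """Return the candidates that share the shape file's base name
    and have a .dbf or .csv extension."""
    stem = PurePath(shape_file).stem
    return [
        name
        for name in candidates
        if PurePath(name).stem == stem
        and PurePath(name).suffix.lower() in ATTRIBUTE_EXTENSIONS
    ]


# Hypothetical directory listing.
files = ["counties.dbf", "counties.shx", "cancer.csv"]
print(matching_attribute_files("counties.shp", files))
```

An empty result means no attribute file in the list shares the shape file's name, so one of the files would need to be renamed before loading.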
In the following sections, you’ll read about tasks performed on county level
demographic and cancer registry data for part of the Appalachian Cancer
Network. The graphical user interface (GUI) for the design shown in Figure 5 is
presented here in the Studio GUIBox (Figure 4).
Figure 4: A map and scatterplot matrix linked to a spreadsheet showing cervical cancer, breast
cancer, mammogram, and per capita income data for white women, 1993-1997, in NAACCR
gold states within the Appalachian Cancer Network.
The left side of Figure 4 shows a matrix consisting of scatterplots and maps. The
“x” or horizontal variable of each individual scatterplot or map is determined by
which variable appears in the same column as that scatterplot or map, and the
“y” or vertical variable is determined by the variable that appears in the same row as
that scatterplot or map. For example, the scatterplot in the upper right corner of
the figure above shows “CER9397AGE” on the vertical, “y” axis, and “PCINCOME”
on the x, or horizontal axis. Here, the variable “CER9397AGE” represents cervical
cancer rate for white females in 1993-1997, and “PCINCOME” represents per
capita income. The maps are colored with a bivariate scheme based on the
combination of both variables.
The spreadsheet shown on the right side of Figure 4 is dynamically linked to
selections made in the matrix. This enables detailed exploration of the actual
values being visualized.
The goal of the Multiform Bivariate Matrix is to enable dynamic exploration of
relationships between particular counties and multiple variables. From within the
matrix, you can interactively change the order of the variables, change the number
of variables shown, select subsets of data by value or by geography, set the
extent of variables, change color and classification schemes, and see detailed
views of the individual components.
3.1 Reordering variables
To change the order of variables being shown, first click and hold on one of the
top buttons labeled with a variable, as shown below:
Figure 5: Change the order of the variables.
Next, while you continue to hold the click, drag the button over another variable
name, and release it. The position of the variables will then be changed.
3.2 Selecting variables
To change which variables are being shown, click on the button in the upper-left
corner of the matrix as shown in Figure 6.
Figure 6: Click the button at the upper left corner of the matrix to open the variable selection
window.
The variable selector dialog will pop up, as shown below (Figure 7):
Figure 7: The pop-up window for selecting variables.
Select the variables you want to explore, and click ‘Apply’. You can use the
same methods for multiple selection that work in most other software, e.g.
shift-clicking and ctrl-clicking. The matrix will update to show the selected
variables. After you are finished selecting variables, click ‘Close’.
3.3 Selecting observations
Dynamic selection of particular observations is indicated with a color, the default
in this case being blue. If you want to select observations according to a
variable, click and drag inside any scatterplot. You’ll see a dashed box from your
cursor, as shown in Figure 8 below.
Figure 8: Select a subset of observations by dragging a box with the mouse.
If the box encloses values, those observations will highlight in each scatterplot
and map. At the same time, the observations will highlight in the spreadsheet
(Figure 9). In scatterplots, selected observations appear as solid dots. In
maps, selected counties are the ones shown in color.
Figure 9: Selected results are highlighted in the matrix and other components.
The same procedure works for dynamic selection of data in map matrix elements
(Figure 10):
Figure 10: Selecting a subset of observations according to their geography.
If you want to add to a selection that you have already made, just hold down the
“shift” key while you make a selection. The selection you make while holding
down the “shift” key will be added to the current selection.
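The selection behavior described in this section can be pictured with a short Python sketch. It is not Studio's code: the function names and sample points are hypothetical, and it assumes that a selection made without the "shift" key replaces the current one.

```python
def box_select(points, x_min, x_max, y_min, y_max):
    """Return the indices of the points that fall inside the dragged box."""
    return {
        i
        for i, (x, y) in enumerate(points)
        if x_min <= x <= x_max and y_min <= y <= y_max
    }


def update_selection(current, new, shift_held):
    """With shift held, add the new box to the current selection;
    otherwise (assumed behavior) replace the current selection."""
    return current | new if shift_held else set(new)


# Hypothetical observations as (x, y) pairs in one scatterplot.
points = [(1.0, 2.0), (3.0, 8.0), (5.0, 5.0), (9.0, 1.0)]
first = box_select(points, 0, 4, 0, 10)    # first dragged box
second = box_select(points, 4, 10, 0, 6)   # second dragged box
print(update_selection(first, second, shift_held=False))
print(update_selection(first, second, shift_held=True))
```

Because the selection is a set of observation indices rather than screen positions, the same set can be highlighted in every scatterplot, map, and spreadsheet row that shows those observations.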
3.4 Setting a Variable’s Extent in the Scatterplot
You can control a variable’s extent in each scatterplot by right-clicking or by
double-clicking on components. If you right-click on a scatterplot, you will see a
pop-up menu (Figure 11). If you left-click on ‘Set Range’, a dialog will appear
that allows you to change the maximum and minimum values that are displayed.
These values will then change for all instances of that variable in the matrix.
Figure 11: Pop-up window for setting the extent of observations displayed in scatterplots.
To change the spatial extent shown on a map in the matrix, double click on a
map, and you will see a pop-up window with a detailed map (Figure 12).
Figure 12: Detailed map.
The toolbar allows you to control the extent shown. Try clicking on the first icon
(the selection tool), and then drag a box on the map. The counties within the
box you just dragged should now be selected. The second and third buttons
are zoom-in and zoom-out buttons. The fourth one, with the “house” image, will
take you to the full extent of the loaded spatial data. The button with the “hand”
image is for panning the map. The sixth button activates a function called
“Eccentric Labels”, which displays the labels of all of the counties
within a small area. The button with the fish image causes a “fisheye” effect
on the map when the mouse is rolled over data.
3.5 Detailed graphs
To see a detailed individual graph, double-click on a scatterplot or a histogram.
You will see a detailed version of the scatterplot or histogram that you can resize
and manipulate outside of the matrix.
3.6 Dynamic indication
You will notice that as you mouse-over observations or counties in scatterplots
and maps, they will highlight in bright green (Figure 13). The same observation
displayed in other graphs will also be highlighted.
Figure 13: One observation is highlighted in bright green during a mouse-over.
3.7 The Bivariate classification scheme
We can change the classification and color scheme in the matrix to reflect the
variables in the matrix more accurately. To add this functionality to any design,
you need to add the BivariateColorSchemeVisualizer and two instances of
the VisualClassifier to the matrix you wish to change (Figure 14).
Figure 14: Connecting the Bivariate Color Scheme bean to the Bivariate Matrix.
Once you’ve done this, you will see some additional components in your
application. These are two “Visual Classifiers” and a bivariate color scheme
visualizer (Figure 15). Double-click on the color patch at the bottom of a Visual
Classifier, as shown below:
Figure 15: Visual Classifiers and the Bivariate Color Scheme Visualizer.
This will cause a color chooser to pop up. For now, choose a blue color as shown
below (Figure 16):
Figure 16: Color chooser.
Then click ‘OK’. Your Visual Classifier should now look like this:
Figure 17: The new bivariate color scheme.
Now load data with the MtSimpleFileChooser bean, and take a look at the
scatterplots and maps in the matrix (Figure 18). You can read the colors as
follows: the degree of ‘blueness’ reflects the “x” or horizontal variable, and
the degree of ‘magentaness’ reflects the “y” or vertical variable. Observations
with low values in both are light gray, and observations that are high in
both are dark blue. You can also use the corresponding scatterplot as a legend
for the map, for example:
Figure 18: Map and scatterplot matrix with the new color scheme.
In the scatterplot and map displaying the variables BR9397AGEA and CER9397AGE,
the counties with low rates of both breast cancer and cervical cancer for
white females from 1993 to 1997 are colored light gray. The counties that are high in
breast cancer rate and low in cervical cancer rate are light blue. Conversely, the
counties that are high in cervical cancer rate and low in breast cancer rate are
magenta. Those that are high in both are dark blue.
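One way to picture how such a scheme assigns colors is a blend of four corner colors. The Python sketch below is a simplification under stated assumptions: the RGB values are illustrative and are not Studio's actual palette, both variables are assumed to be rescaled to the range 0 to 1, and the blend is continuous, whereas the Visual Classifiers work with discrete classes.

```python
def blend(c0, c1, t):
    """Linearly interpolate between two RGB colors for t in [0, 1]."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c0, c1))


# Corner colors are illustrative assumptions, not Studio's palette.
LOW_LOW = (224, 224, 224)   # light gray: low x, low y
HIGH_X = (120, 170, 230)    # light blue: high x, low y
HIGH_Y = (200, 60, 200)     # magenta: low x, high y
HIGH_HIGH = (20, 30, 120)   # dark blue: high x, high y


def bivariate_color(x, y):
    """Bilinear blend of the four corner colors for x and y in [0, 1]."""
    bottom = blend(LOW_LOW, HIGH_X, x)   # low y: gray to light blue
    top = blend(HIGH_Y, HIGH_HIGH, x)    # high y: magenta to dark blue
    return blend(bottom, top, y)


print(bivariate_color(0.0, 0.0))
print(bivariate_color(1.0, 0.0))
print(bivariate_color(0.0, 1.0))
print(bivariate_color(1.0, 1.0))
```

In the example above, this corresponds to breast cancer rate acting as the "x" variable (blueness) and cervical cancer rate as the "y" variable (magentaness), which is inferred from the colors described in the text.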
Mathematical classification schemes can be changed by clicking on the dropdown
boxes in the middle of the Visual Classifier. The options are: Quantiles, Equal
Intervals, and Standard Deviations.
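To get a feel for how the three options differ, the following Python sketch computes class breaks for each one. It is a rough illustration, not Studio's implementation: the function names and sample values are hypothetical, and the exact break rules Studio applies (for example, how many standard deviations it uses) are not stated in this tutorial.

```python
import statistics


def equal_interval_breaks(values, n_classes):
    """Split the data range into n_classes bins of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    return [lo + width * i for i in range(1, n_classes)]


def quantile_breaks(values, n_classes):
    """Place breaks so each class holds roughly the same number of observations."""
    return statistics.quantiles(values, n=n_classes)


def std_dev_breaks(values, n_each_side=1):
    """Place breaks at the mean and at whole standard deviations around it."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [mean + k * sd for k in range(-n_each_side, n_each_side + 1)]


def classify(value, breaks):
    """Return the index of the class that value falls into."""
    return sum(value > b for b in breaks)


# Hypothetical, skewed sample values.
rates = [2, 4, 4, 4, 5, 5, 7, 9, 30]
for name, breaks in [
    ("equal intervals", equal_interval_breaks(rates, 3)),
    ("quantiles", quantile_breaks(rates, 3)),
    ("standard deviations", std_dev_breaks(rates)),
]:
    print(name, breaks, [classify(v, breaks) for v in rates])
```

With skewed data like this, equal intervals tend to put most observations in the lowest class, while quantiles spread them more evenly, so the choice of scheme can noticeably change how the maps and scatterplots are colored.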
4: Explore!
Feel free to explore different types of matrices and graph representations. Our
mission is to create tools for visualizing data in ways that foster exploration and
analysis.
We’d love to hear about any new relationships you discover in your data using
our tools. Please email us at [email protected] with any comments,
complaints, compliments, or questions. Thank you!