Lab 6
Data Communication using Gephi and Tableau
January 19, 2015
Overview of Lab 6: We will first visualize graph data with Gephi, a tool that is strongly founded in
algorithmic research. Then we will familiarize ourselves with Tableau, a professional data analysis and
visualization package, using four exercises. You will observe that Tableau is indeed very easy to
learn.
Gephi is open source software for graph and network visualization. Gephi input data consists of
<nodes, edges> supplied in tabular form, as a csv file, or as an xml file. Gephi has a strong algorithmic
foundation, and the details of the algorithms and research used for any graph processing step are readily
available as a part of the tool. Gephi is good for finding clusters and communities.
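To make the <nodes, edges> input concrete, here is a minimal Python sketch that builds a small nodes table and edges table in csv form. The sample rows are made up for illustration and are not taken from the lab files. The header names Id, Label, Source and Target are the ones Gephi's spreadsheet import usually looks for, but treat them as an assumption and compare them with the columns in the files provided for the lab.

```python
import csv
import io

# Hypothetical sample data; the lab's own csv files will have different rows.
nodes = [("1", "Alice"), ("2", "Bob"), ("3", "Carol")]
edges = [("1", "2"), ("2", "3")]


def build_csv(header, rows):
    """Return csv text made of one header row followed by the data rows."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Assumed column names: Id/Label for nodes, Source/Target for edges.
nodes_csv = build_csv(["Id", "Label"], nodes)
edges_csv = build_csv(["Source", "Target"], edges)

print(nodes_csv)
print(edges_csv)
```

Writing the two strings to files would give one table for the Nodes import and one for the Edges import.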
1. Start Gephi and study the various parts of the Gephi development environment. At the top is the
project and file management top line menu. Just below that are three important tabs: Overview,
Data Laboratory and Preview.
2. Data Laboratory is for creating and importing edges and nodes data.
3. Overview tab is for configuring and working with the data.
4. Preview is for visualizing the final product, the graph or network. From the Preview the graph can be
exported into a form (pdf, png) that can be published.
5. On the left part of the screen shot shown above are the graph manipulation commands; the right
are commands for computing statistics, filtering, modularity detection etc.
6. We explore the features of Gephi using two examples: (i) a first one to get started with Gephi, with
a large number of nodes, and (ii) a well-known data set of the cast of Les Miserables, in which we can
visualize the importance/influence of the various characters.
Exercise 1: The goal of this exercise is to understand the basics of importing data, obtaining a graph, and
executing a simple clustering of the data.
1. Open Gephi -> New Project -> Data Laboratory -> click on Nodes -> Import Spreadsheet.
Make sure “nodes table” is selected in the dialog box that opens -> choose nodes_dh11.csv.
Repeat the same for the edges. Make sure “edges table” is selected in the dialog box that opens.
The details are shown below.
Import Nodes Data
Click on the Nodes tab here, then click on Import Spreadsheet
With Nodes table selected here, select the Nodes data file of .csv format by browsing from here
Once you have selected the file Preview appears like above
If the data looks correct, click -> Next. It will ask you whether the data types of the imported columns
are fine; then click -> Finish
Import Edges Data
Select Edges, and then click on Import Spreadsheet
Select option Edges table like here, browse and load the edges file from here and we see an edge table
preview in grid, like below
Preview the Imported Data
Now we will see what the imported data looks like as a graph. Click on the Overview tab to switch
from the Data Laboratory tab.
Below is what the imported data looks like when we click Overview.
So far it looks like a hairball. Click on the context window in the upper right hand bar.
621 Nodes
733 Edges
Directed Graph
Force-Atlas Layout
It makes the connected nodes attracted to each other and pushes the unconnected nodes apart to
create a cluster of connections. Go to the Layout section on the upper left side.
The box there tells you more about the algorithm of the layout. Go ahead and click it.
Once you have clicked on the Layout pane, click the dropdown and select Force-Atlas.
Set the “Repulsion strength” to 1000 to strongly repel the disconnected nodes
away from each other.
Set the “Attraction strength” to 10000 to strongly bring the connected nodes
closer to each other.
All the connected nodes have condensed together and the disconnected ones are far apart. The layout run
will stabilize automatically. Your result may look different from the one above. Stop the layout run after
that. We have now segregated the clusters from each other. We will stop here and analyze a complete
example of known data.
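The attract/repel idea behind a force-directed layout can be sketched in a few lines of Python. This is a toy illustration only, not Gephi's Force-Atlas implementation: the force formulas, the function name force_layout and the parameter values are all assumptions chosen to keep the example small and stable, and they are not on the same scale as the 1000 and 10000 settings used in the lab.

```python
import math


def force_layout(nodes, edges, repulsion=1.0, attraction=1.0,
                 steps=500, step_size=0.05):
    """Toy layout: every pair of nodes repels, every edge pulls its ends together."""
    count = len(nodes)
    # Start with the nodes evenly spaced on a unit circle (deterministic).
    pos = {
        node: [math.cos(2 * math.pi * i / count),
               math.sin(2 * math.pi * i / count)]
        for i, node in enumerate(nodes)
    }
    for _ in range(steps):
        force = {node: [0.0, 0.0] for node in nodes}
        # Repulsion between every pair, weaker as the distance grows.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                push = repulsion / (dist * dist)
                force[a][0] += push * dx / dist
                force[a][1] += push * dy / dist
                force[b][0] -= push * dx / dist
                force[b][1] -= push * dy / dist
        # Attraction along each edge, stronger as the distance grows.
        for a, b in edges:
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            force[a][0] += attraction * dx
            force[a][1] += attraction * dy
            force[b][0] -= attraction * dx
            force[b][1] -= attraction * dy
        # Move every node a small step along its net force.
        for node in nodes:
            pos[node][0] += step_size * force[node][0]
            pos[node][1] += step_size * force[node][1]
    return pos


def distance(pos, a, b):
    return math.hypot(pos[a][0] - pos[b][0], pos[a][1] - pos[b][1])


# Hypothetical graph with two separate pairs: a-b and c-d.
layout = force_layout(["a", "b", "c", "d"], [("a", "b"), ("c", "d")])
print("linked pair a-b:", distance(layout, "a", "b"))
print("unlinked pair a-c:", distance(layout, "a", "c"))
```

In this sketch the linked pairs settle close together while the two pairs drift away from each other, which is the same qualitative behaviour seen when the layout condenses connected nodes and spreads out the rest.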
Exercise 2: In this exercise we will work with a known data set about the cast of Les Miserables.
The data in this case is presented as an XML file with node and edge tags. Nodes in this case study
represent the characters in Les Miserables, and the edges represent the relationships between them and
the strength of each relationship. The file is in Gephi's XML format (.gexf).
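To see what "an XML file with node and edge tags" means in practice, the following Python sketch reads a tiny hand-written GEXF-style document. The sample content, including the second character name and the weight value, is made up for illustration and is not copied from the lab file. The attribute names id, label, source, target and weight follow the usual GEXF layout, which is an assumption to check against the actual file.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in GEXF style; values are illustrative placeholders.
SAMPLE = """
<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">
  <graph defaultedgetype="undirected">
    <nodes>
      <node id="0" label="Valjean"/>
      <node id="1" label="Javert"/>
    </nodes>
    <edges>
      <edge id="0" source="0" target="1" weight="2.0"/>
    </edges>
  </graph>
</gexf>
""".strip()


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def read_gexf(text):
    """Return ({node id: label}, [(source, target, weight), ...])."""
    root = ET.fromstring(text)
    nodes = {}
    edges = []
    for elem in root.iter():
        kind = local_name(elem.tag)
        if kind == "node":
            nodes[elem.get("id")] = elem.get("label")
        elif kind == "edge":
            # Assume a weight of 1.0 when the attribute is missing.
            weight = float(elem.get("weight", "1.0"))
            edges.append((elem.get("source"), elem.get("target"), weight))
    return nodes, edges


nodes, edges = read_gexf(SAMPLE)
print(nodes)
print(edges)
```

Matching on the local tag name rather than a fixed namespace keeps the sketch usable even if the file declares a different GEXF version.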
1. Gephi -> New Project -> Open file -> LesMiserables.gexf. You will see the import report below.
2. OK and Click Overview. You will see a crowded graph with nodes and edges connected.
3. Now on the left panel, choose Force Atlas from the drop down box, set the repulsion to 10000
and click Run to run the analysis.
4. Stop the run after it auto stabilizes.
5. You will see a network that is a little bit clearer than the first one. We will process it further.
6. Locate the Ranking module on the left panel. Choose Degree as the rank and click Apply to see
the results.
7. Move the mouse over the gradient component and double click on the triangle to configure
color.
8. Click on the small table symbol at the left bottom of Ranking panel and click Apply to see the
ranking of the various nodes. You can observe that Valjean has the highest degree of
connectivity.
9. Click the triangle for size on the panel and click Apply. You will see the nodes are represented by
their relative strengths.
10. Click on Adjust by Size in the bottom panel and Run it for some time to clean the graph further.
11. Click on T (Show Node Labels) at the bottom left of the display Window. Adjust the font and size
etc. We will manipulate the size of the labels using the buttons at the bottom so that only
important ones are visible.
12. Community detection: Now we will examine Statistics and metrics available on the right and
select the modularity. Run.
13. In the Partition panel on the left, select Modularity class as the parameter, and click refresh and
Apply. You will see the communities colored.
14. Using this community information decisions can be made and strategies designed.
15. We will filter nodes with low connectivity. Use the Filter feature for this. Select the Filter tab on
the panel on the right, select Topology, degree range parameter; move the selection to the
bottom Queries window, set it to 2. Watch the outlier nodes disappear.
16. Press Preview to view the graph. You can set various parameters to change how the graph is drawn,
such as curved edges and a black background. The graph can be exported as
SVG/PDF/PNG for use in your presentations and reports. The result is shown below.
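The Degree ranking and the degree range filter used in the steps above both rest on a simple count of edges per node. The Python sketch below shows that count on a hypothetical undirected edge list; the node names are placeholders, not the Les Miserables data, and the single-pass filter is a simplification rather than a description of how Gephi implements its filter.

```python
from collections import Counter

# Hypothetical undirected edge list; not the actual Les Miserables data.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]


def degrees(edge_list):
    """Count how many edges touch each node."""
    counts = Counter()
    for source, target in edge_list:
        counts[source] += 1
        counts[target] += 1
    return counts


def filter_by_min_degree(edge_list, minimum):
    """Keep only edges whose two end nodes both have degree >= minimum."""
    degree = degrees(edge_list)
    keep = {node for node, value in degree.items() if value >= minimum}
    return [(s, t) for s, t in edge_list if s in keep and t in keep]


ranking = degrees(edges).most_common()
filtered = filter_by_min_degree(edges, 2)

print(ranking)   # nodes ordered from highest to lowest degree
print(filtered)  # edges that survive a minimum degree of 2
```

Node A plays the role that Valjean plays in the lab: it touches the most edges, so it ranks first, while the weakly connected node E drops out once the minimum degree is set to 2.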
Tableau Exercises: We will learn the Tableau layout and then work on four exercises. For the various
features of Tableau study the screen shot given below. We will create this worksheet to understand
Tableau layout.
Exercise 3:
1. Start Tableau -> Connect to Data -> study the various sources that Tableau supports. Click Excel
Worksheet -> import Worldbankdata.xlsx -> study the various tables/data available. You can
drag any data sheet into the Tableau workspace. We will drag Country data and Region data and do
an inner join on Country.
2. Drag the Data By Country into the workspace and Click the Go to Worksheet icon in the middle.
3. Examine the various regions of the Tableau interface for worksheet; Starting at the top left: (i)
the list of data sources (ii) dimensions and measures available in the selected data source, (iii)
columns and row shelves (header and axes) (iv) filters for visualization (v) Marks for controlling
visualization using color, size, label, tooltip text, and shape, (vi) the canvas where visualization is
displayed, (vii) sheets and dashboard tables at the bottom, (viii) session tab indicating the
session: connect to data etc. and (ix) the Show Me window at the right panel showing all the
possible chart types to choose from.
4. Creation of the worksheet shown above involves a few clicks and drag and drops at the appropriate
tabs/buttons.
5. We want to plot GDP per capita for the countries and color and size them. Hold the Ctrl key and select
GDP per capita from Measures and Country Name from Dimensions; select Horizontal bars from the Show
Me panel.
6. Drag Region name to Color Mark button and see the chart get colorized.
7. See the 12 nulls at the bottom right corner, click on it and select Filter Data.
8. Drag Sum GDP into Label mark button so that GDP can be displayed along with the bars.
9. Drag Date into Filter panel and filter the visualization for the year 2010.
10. Present the plot using Presentation tab above the plot in the line below the top line menu.
11. See the interactions possible with the presentation chart.
12. Return to workspace by clicking on the presentation symbol at the bottom right corner.
13. Save the worksheet for future review.
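The data preparation that Tableau performs in this exercise (an inner join on Country, filtering out nulls, and filtering for the year 2010) can be sketched in plain Python. All rows, country names and column names below are hypothetical placeholders and not World Bank figures; the sketch also assumes each country appears only once in the region table.

```python
# Hypothetical rows; names and numbers are placeholders, not World Bank data.
by_country = [
    {"country": "Country A", "year": 2010, "gdp_per_capita": 100.0},
    {"country": "Country A", "year": 2011, "gdp_per_capita": 110.0},
    {"country": "Country B", "year": 2010, "gdp_per_capita": None},
    {"country": "Country C", "year": 2010, "gdp_per_capita": 300.0},
    {"country": "Country D", "year": 2010, "gdp_per_capita": 50.0},
]
by_region = [
    {"country": "Country A", "region": "Region X"},
    {"country": "Country B", "region": "Region X"},
    {"country": "Country C", "region": "Region Y"},
]


def inner_join(left, right, key):
    """Keep left rows whose key also appears on the right, merging the columns.

    Assumes the key is unique in the right-hand table.
    """
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup[row[key]]} for row in left if row[key] in lookup]


joined = inner_join(by_country, by_region, "country")

# Filter for 2010, drop null GDP values, and order the bars from high to low.
bars = sorted(
    (row for row in joined
     if row["year"] == 2010 and row["gdp_per_capita"] is not None),
    key=lambda row: row["gdp_per_capita"],
    reverse=True,
)

for row in bars:
    print(row["country"], row["region"], row["gdp_per_capita"])
```

Country D disappears in the join because it has no region row, and Country B disappears in the filter because its GDP value is null, which mirrors what the join and the Filter Data step do to the chart.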
Exercise 4: Multiple variables: In the above exercise we introduced the various features of Tableau.
Data often exhibits more than 2 dimensions. Tableau handles this very elegantly. In this exercise we
will analyze the data on the top 100 point scorers in the NHL and examine what we can understand from
this data and the conclusions we can draw from this analysis. This exercise is based on an exercise
discussed in reference [1] by B. Jones. We will use scatter plots as the main instrument for visualization.
1. Tableau -> Connect to Data -> Excel worksheet -> NHLTop100.xlsx
2. Study the data. Check whether any of your favorite players are on the list. Study the attributes:
points (P), games played (GP), assists (A) etc. Most of the “team played for” information is null.
3. Navigate to the worksheet; we will create a basic scatter plot.
4. Ctrl-click Player, G, and A; then click on scatter plot on the “Show Me” panel. You will see that
Tableau has placed SUM(G) on the Column shelf, SUM(A) on the Row shelf and Player in the Marks card
“Detail”. Make sure you understand this selection: since Tableau does a lot of things
automatically, you have to make sure the choices are acceptable to you.
5. Now we will analyze the plot. In class discussion.
6. We have compared two variables, Goals and Assists. In the next step we will add 2 more variables.
This is very easily done: drag P for points into Size shelf, and GP for games played into the Color
shelf. Observe and understand the changes that happen.
7. Next change the “automatic” selection at the top of the Marks to “Circles”.
8. Next change the color palette from “green sequential” to anything attractive, for example
“Orange-white-Blue-Diverging”, by clicking the Color shelf. Also make the border black.
9. Mouse-over the circles and check out if your favorite player is present in the plot.
10. We will now label the circles. It is as easy as dragging the Player variable (or Dimension) to the
Label shelf. A major difference between Gephi and Tableau is that Tableau displays as many labels
as possible without creating a messy view. You can left click on the Label shelf and click on the
allow overlap check box to see how messy it is if all the labels show.
11. It is possible to modify the tooltip to limit the information or add more information such as the
team played for. Try this by left-clicking on the Tooltip and editing it.
12. You can also add annotations to the circles. We will do this for Wayne Gretzky. Unclick the other
labels to view only the annotation you made.
13. We will now add a filter to explore the data further. We are interested in the Position (Center
(C), L, R, and Defense (D)). We will add a radio button style filter to the worksheet. We will add
two more filters for +/- and PIM (penalty in minutes). Right click on these and select “Show
Quick Filter”. You will see three filters appearing on the right side of the worksheet.
14. We will rename sheet1 to GAGPG by right clicking on sheet1 and renaming it.
15. Edit the axes by right clicking on the axes and making it fixed.
16. Then right click on it to duplicate the sheet 4 times. We will use the filters and carry out some
exploratory data analysis (EDA). Rename the sheets by right clicking them and renaming them
as (i) Centers (ii) Defensemen (iii) RightWingers (iv) LeftWingers
17. Visualize each of these by clicking on the right filter C,L, D, and R on the position filter.
18. You will be able to view all the plots on the same plane using presentation symbol for multiple
sheets at the right top corner. Class /team discussion on the resulting exploratory plots.
19. Adding background images: use the top line menu item Map -> Background Images -> filename. In
the Options menu that appears you can set the aspect ratio and other choices. I have added an
image “thegreatone.jpg” for you to use as background.
20. Calculated field: We want to compute Points per game (PPG), assists per game (APG) and Goals
per game (GPG) and use this derived data for creating stacked bar plot.
21. Right click on the Measures area and create a Calculated field.
22. Change Data type of GP to “decimal” or Float by right clicking on it and choosing to change the
data type. This change is needed for Calculated Fields mentioned above.
23. For GPG enter SUM([G])/SUM([GP]) in the formula box. For APG enter SUM([A])/SUM([GP]) in
the formula box and click OK. For PPG enter SUM([P])/SUM([GP]), click OK. Now you will see all
three calculated fields in the Measures area on the left of the worksheet.
24. Now we will do several “drag and drop” operations: drag Player to the Rows shelf, Measure Values to
the Columns shelf, and Measure Names to the Color shelf.
25. Remove all except newly calculated Assists per Game, and Goals per Game out of the Marks
card/shelf.
26. Click on the blue Player pill on the Rows Shelf and select sort, descending order by Points per
game field. Change colors if you prefer. We will discuss the plot that appears as shown below.
27. Next we want to explore regression and trend lines that are really very useful in predicting
future demands as well as revealing any correlations.
28. Now you try the ease of Tableau: draw a scatter plot for goals (G) vs shots (on goal) with
position (POS) as color and player as tooltip. Click on the worksheet and choose Trend Lines ->
Show Trend Lines. Choose the “Force Y intercept zero” option since no shots means no goals.
29. Study the linear regression model and the p-value (for correlation goodness) of the trend lines.
30. We will end this comprehensive exercise with a quadrant chart, which is highly useful for
identifying the performance of players (or the effectiveness of certain business initiatives).
31. Right click on each axis and select Add Reference lines. Right click on each quadrant and select
Annotate -> Area, then type in the characteristic of the quadrant: {high production, high accuracy,
low production, low accuracy}. Adjust the boxes that appear as shown below.
32. Save the Tableau exercise for future use, discussions, creation of Dashboards, and Story. It will
be saved with the extension .twb (Tableau workbook).
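Three of the calculations in this exercise are easy to check by hand, and the Python sketch below works through them: the per-game calculated fields (a total divided by games played), a trend line forced through a zero intercept, and the quadrant labels from two reference lines. The player rows are invented placeholders and not NHL statistics. Two modelling choices are assumptions made only for this illustration: the reference lines are placed at the averages, and "accuracy" is taken to mean goals divided by shots. The zero-intercept slope uses the standard least squares formula, slope = sum(x*y) / sum(x*x).

```python
# Hypothetical player rows; the numbers are placeholders, not NHL statistics.
players = [
    {"player": "P1", "G": 40, "A": 60, "P": 100, "GP": 80, "S": 200},
    {"player": "P2", "G": 20, "A": 20, "P": 40, "GP": 80, "S": 200},
    {"player": "P3", "G": 10, "A": 30, "P": 40, "GP": 40, "S": 50},
]


def per_game(row, field):
    """Calculated field: a season total divided by games played."""
    return row[field] / float(row["GP"])


def slope_through_origin(xs, ys):
    """Least squares slope for the model y = slope * x (intercept forced to 0)."""
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    return sum_xy / sum_xx


def quadrant(x, y, x_ref, y_ref):
    """Label a point by which side of each reference line it falls on."""
    production = "high" if x >= x_ref else "low"
    accuracy = "high" if y >= y_ref else "low"
    return production + " production, " + accuracy + " accuracy"


# Per-game fields, then sort players by points per game, highest first.
for row in players:
    row["GPG"] = per_game(row, "G")
    row["APG"] = per_game(row, "A")
    row["PPG"] = per_game(row, "P")
by_ppg = [row["player"] for row in sorted(players, key=lambda r: r["PPG"], reverse=True)]

# Trend line of goals against shots with a zero intercept.
slope = slope_through_origin([r["S"] for r in players], [r["G"] for r in players])

# Quadrants: goals as production, goals per shot as accuracy (assumed definitions).
goal_ref = sum(r["G"] for r in players) / len(players)
accuracy_ref = sum(r["G"] / r["S"] for r in players) / len(players)
labels = {r["player"]: quadrant(r["G"], r["G"] / r["S"], goal_ref, accuracy_ref)
          for r in players}

print(by_ppg)
print(slope)
print(labels)
```

The slope can be read as the estimated number of goals per shot, which is why forcing the intercept to zero is a natural choice for this pair of variables.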
Exercise 5: Tableau dashboard. We will create a dashboard with the NHL 100. A dashboard is composed
of one or more sheets. Unlike PowerPoint, it offers superb interaction with the data and charts presented.
1. With the same workbook as created above, click on the Dashboard top line menu -> New
Dashboard -> rename it NHL100.
2. If you have closed the workbook, open it using File -> Open -> xyz.twb where xyz is the name of the
workbook that you saved in Exercise 4.
3. Important caution: Make sure all the worksheets have the axes fixed by clicking on the axis and
selecting fixed axis.
4. After you create the new Dashboard, you will see the sheets available on the left panel. Drag and drop
sheets you want on the Dashboard. We will compose a dashboard with these four sheets: GAGPG,
Trendlines, QuadChart, StackPlot as shown below. You can adjust the layout as per your needs and also
go back and change anything on the worksheet. Save the changes, it will automatically be updated on
the dashboard. (You may need an artistic designer to design/optimize the layout).
5. Click on the presentation and see the visual and the interaction Tableau dashboard offers. You can
also add background images to the dashboard that promotes your brand.
Exercise 6: Creating a Story for presentation. A “Story” in Tableau is like a sequence of slides (or a slide
deck) that is compiled from dashboards and worksheets created earlier.
We will work with a Tableau workbook that has worksheets and a dashboard that are ready for use in a
presentation. You can think of a Tableau Story as a presentation.
1. Open SheetDashBoardTableauStory.twb
2. We will go through a “Story” that is already created and available for you to review. We will go
through the Story and understand the concept of a Tableau Story.
3. Create your own Story book: Story -> New Story
4. Right click on the Story Title -> Edit Title -> “World GDP/Population Show” or some such title
5. Observe the worksheets and the dashboard on the left panel; these form the raw material or
“points” for your story.
6. You can see the content of these worksheet and dashboard by clicking on the tabs representing
them at the bottom.
7. Now drag and drop a worksheet, say, “Population” on the workspace of the new story you
created. Add a caption to this worksheet in the box just above the workspace.
8. Create a New Blank Point, and drag and drop the dashboard into the workspace; add a caption
GDP.
9. You can then make “another point” by adding the worksheet PopulationAge into the
workspace. Add a suitable caption.
10. As a last point in the story add worksheet GDPYear into the Story.
11. Now click through the story in the presentation mode and narrate your “story”. Make your
point. Observe the ease with which you can navigate through the presentation, which has many
features such as tooltips and selection of a particular item. You can also add filters to make it
a truly interactive data analytics discussion.
12. Save the Story for future use and presentation.
Tableau is all about the creative assembly of data points. No programming is necessary. With some artistic
creativity and domain knowledge of the data being analyzed, Tableau can help in preparing
impressive and convincing data-driven presentations. Use it as a tool for your everyday discussions
during team meetings.