Visualize Border Crossing data
and analyze the impact of weather
at different crossing points
Sushma Jayaram,
Shwetha Mallya
Contents
Summary
Project Flowchart
Datasets
• Border Crossing Data
• Storm events Data
• Temperature Data
Data Munging
Database Design
• ER Diagram
• Database design change
• Relational Vocab
• Sample Tables
Database creation
Data import
Data export
Visualization using Tableau
Challenges

Summary
This project presents weather and storm data for all the border states of the US and analyzes
its impact on the number of people travelling into the United States through these borders. To
do this, we use the weather data and the border crossing data, and we visualize the analysis
using maps and graphs in Tableau. We started the project with the assumption that the number of
people crossing into the border states would vary based on weather conditions. Through a
process that starts with downloading datasets and ends with visualization, we were able to
provide examples that support this assumption.
Project Flowchart
The above figure represents the flowchart for our project workflow.
1. Data files: For this project we use data from multiple sources. The border
crossings data file contains border crossing data for the border states of the United States,
including the number of passengers travelling across the border by bus, train and other
means of transport.
The next data file is the weather data, which contains occurrences of storm, rain,
snow, etc. for each month over two years.
The temperature data contains the average temperature for each border state for each month
of those years.
2. Border Crossing database: The above data files are loaded from CSV into the database using
Python.
3. Files extracted from SQL queries: Based on the visualization task and the data it requires,
we run appropriate SQL queries against the database and export the results as a CSV file
which can later be used for visualization.
4. Visualization: Using a visualization tool (options include MapStory or Photoshop), we
visualize how weather and temperature affected border crossings across the border states and
display the results on a map of the US. For this project we used Tableau.
Datasets
Border Crossing Data
Link: http://transborder.bts.gov/programs/international/transborder/TBDR_BC/TBDR_BCQ.html
This is the Bureau of Transportation Statistics dataset from the US Department of Transportation.
The data is selected by port of entry, year and month, and the retrieved dataset
contains the number of passengers who came into the US through that port by various means of
transportation: train, bus, on foot (pedestrians) and personal vehicles.
Storm events Data
Link: http://www.ncdc.noaa.gov/stormevents/
This dataset is provided by the National Climatic Data Center (NCDC). Their database records the
occurrence of storms and other significant weather phenomena across the US. Since we are
focusing only on border states for this project, we selected the storm data only for those
states. The retrieved dataset contains storm and weather data for a state for a given month and
year.
Temperature Data
Link: http://www.ncdc.noaa.gov/cag/
This dataset is also provided by the NCDC. This gives us the average temperature in a given
state for a particular year and month.
Border States: Alaska, Arizona, California, Idaho, Maine, Michigan, Minnesota, New Mexico,
New York, North Dakota, Texas, Vermont, Washington
Years: 2013, 2014
Storm/Weather Events: Blizzard, Drought, Extreme Heat, Extreme Cold, Flood, Heavy Rain,
Heavy Snow, High Wind, Tornado, Winter Storm
Data Munging
1. For the storm data, the website did not allow bulk download since there was a 500 row
limit on the retrieved result. Hence the data had to be queried separately for each state,
month and year. After the downloads were made, they had to be stitched
together into a single file. The data also contained a lot of columns that were not relevant
to this project, and rows for each day of the year. For the scope of this project
we required only major weather phenomena and a count of their occurrences in a given
month. Hence, the final stitched data file had to be converted into a pivot table to get the
data in a format that could be loaded into SQL.
2. In the border crossing statistics, the means of transportation appeared as columns in the table.
We had to convert them into separate rows. For example, for January 2013, the number of
passengers travelling by each mode of transport is shown in a separate row. We
transposed the table in this way so that we can apply aggregate functions by grouping
sets of rows.
3. The data for average temperature was included as a part of the weather data in the same
table.
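The column-to-row conversion described in step 2 can be sketched as follows. This is a minimal illustration using only the Python standard library, not the script used in the project; the column names and all numbers in the sample are made up.

```python
import csv
import io

# Hypothetical wide-format extract: one column per mode of transport.
# Column names and passenger counts are invented for illustration.
WIDE_CSV = """port,state,year,month,bus,train,pedestrians,personal_vehicles
Port A,AK,2013,1,120,0,15,2400
Port A,AK,2013,2,90,0,10,2100
"""

ID_COLUMNS = ["port", "state", "year", "month"]


def unpivot(rows, id_columns):
    """Turn each wide row into one row per mode of transport."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column in id_columns:
                continue
            long_row = {key: row[key] for key in id_columns}
            long_row["transport_mode"] = column
            long_row["passengers"] = int(value)
            long_rows.append(long_row)
    return long_rows


wide_rows = list(csv.DictReader(io.StringIO(WIDE_CSV)))
long_rows = unpivot(wide_rows, ID_COLUMNS)

# With one row per mode, totals per month are a plain grouped sum.
total_by_month = {}
for r in long_rows:
    key = (r["year"], r["month"])
    total_by_month[key] = total_by_month.get(key, 0) + r["passengers"]
```

Once the data is in this long format, the same grouping can be done in SQL with GROUP BY and SUM after the rows are loaded into the database.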
Database Design
ER Diagram
The design of the ER diagram started with an analysis of the type of data that we would ultimately
want to use for visualization. The design tries to capture weather data and its impact on various
borders across the US states. The Weather table captures storm-related data such as drought,
snow, rain and wind. The number in each of these columns is the number of times
that particular event (drought, snow, rain or wind) occurred in a given month and
year. http://www.ncdc.noaa.gov/stormevents/eventdetails.jsp?id=490068 provides the required
data. Average temperature is an important measure in any weather-related data. We extracted
the average temperature from a different data source, available at http://www.ncdc.noaa.gov/cag/.
Each row in the Weather table corresponds to a specific month and year for a given state
with border crossing ports. This results in a one-to-many (has_many / belongs_to) relationship
between the State and Weather tables, with state_id as a foreign key in the Weather table.
The BorderCrossing table contains data on the number of crossings through different modes of transport
(source: http://transborder.bts.gov/programs/international/transborder/TBDR_BC/TBDR_BCQ.html).
Since every border crossing is associated with a single state and a state can have multiple ports
across its borders, we have a one-to-many (has_many / belongs_to) relationship between the State and
BorderCrossing tables. Different modes of transport here refers to the number of passengers
crossing the border by bus, by train, as pedestrians or in personal vehicles.
Modes of transport are placed in a separate table called TransportMode. In the BorderCrossing table
each row refers to exactly one mode of transport. Hence this results in a one-to-many
(has_many / belongs_to) relationship between the TransportMode and BorderCrossing tables, with
transport_mode_id as a foreign key in the BorderCrossing table.
By capturing BorderCrossing data for each transport mode separately, we are able to find, for
example, the average or total number of people crossing the border by any mode of transport in a
specific year or month, or the number of passengers crossing the border in their personal
vehicles over a period of 6 months.
The ER diagram, relational vocab and table sketches are provided in the following sections:
Database design change
After certain design considerations, the database design was slightly modified from the original
in order to link the border crossing data to the weather data through a foreign key, which
was missing in our previous design. To make the database more efficient and to form a
link between the weather and the border crossing data, we made weather_id, the primary key
of the Weather table, a foreign key in the BorderCrossing table. We also eliminated the
TransportMode table from the previous design since it was redundant.
The new ER diagram looks as follows:
Relational Vocab
Weather: has_many BorderCrossing
BorderCrossing: belongs_to Weather
State: has_many Weather
Weather: belongs_to State
State: has_many BorderCrossing
BorderCrossing: belongs_to State
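As an illustration of these relationships, the sketch below creates the three tables with the corresponding foreign keys. It is only a sketch under stated assumptions: SQLite (in memory) stands in for the project's database, and every column name other than state_id and weather_id is hypothetical.

```python
import sqlite3

# SQLite stands in for the project's database here.
# Only state_id and weather_id come from the report; all other
# column names are assumptions made for this illustration.
SCHEMA = """
CREATE TABLE State (
    state_id     INTEGER PRIMARY KEY,
    abbreviation TEXT NOT NULL UNIQUE
);
CREATE TABLE Weather (
    weather_id      INTEGER PRIMARY KEY,
    state_id        INTEGER NOT NULL REFERENCES State(state_id),
    year            INTEGER NOT NULL,
    month           INTEGER NOT NULL,
    avg_temperature REAL,
    heavy_snow      INTEGER DEFAULT 0,
    extreme_cold    INTEGER DEFAULT 0
);
CREATE TABLE BorderCrossing (
    border_crossing_id INTEGER PRIMARY KEY,
    state_id           INTEGER NOT NULL REFERENCES State(state_id),
    weather_id         INTEGER NOT NULL REFERENCES Weather(weather_id),
    port               TEXT NOT NULL,
    transport_mode     TEXT NOT NULL,
    passengers         INTEGER NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript(SCHEMA)

# Made-up sample rows: State has_many Weather, Weather has_many BorderCrossing.
conn.execute("INSERT INTO State (state_id, abbreviation) VALUES (1, 'AK')")
conn.execute(
    "INSERT INTO Weather (weather_id, state_id, year, month, avg_temperature) "
    "VALUES (1, 1, 2013, 1, 10.0)"
)
conn.execute(
    "INSERT INTO BorderCrossing (state_id, weather_id, port, transport_mode, passengers) "
    "VALUES (1, 1, 'Port A', 'bus', 120)"
)
conn.commit()

tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
```

With foreign key enforcement switched on, SQLite rejects a BorderCrossing row whose weather_id does not exist in Weather, which is the kind of link the revised design introduces.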
Sample Tables
Database creation
The database was created using phpMyAdmin.
BorderCrossing
State
Weather
Data import
The data from the CSV data files was loaded into the database using a Python script. Here is an
excerpt from the file:
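The excerpt itself is not included in this text. As an illustration only, a CSV import of this kind might look like the sketch below. It assumes SQLite in place of the project's database, and the file layout, column names and sample values are all hypothetical.

```python
import csv
import io
import sqlite3

# Made-up sample standing in for the munged weather CSV file.
WEATHER_CSV = """state,year,month,avg_temperature,heavy_snow,extreme_cold
ND,2014,1,8.5,3,12
ND,2014,8,68.0,0,0
"""


def create_tables(conn):
    """Create minimal State and Weather tables (hypothetical columns)."""
    conn.executescript("""
        CREATE TABLE State (
            state_id     INTEGER PRIMARY KEY,
            abbreviation TEXT NOT NULL UNIQUE
        );
        CREATE TABLE Weather (
            weather_id      INTEGER PRIMARY KEY,
            state_id        INTEGER NOT NULL REFERENCES State(state_id),
            year            INTEGER,
            month           INTEGER,
            avg_temperature REAL,
            heavy_snow      INTEGER,
            extreme_cold    INTEGER
        );
    """)


def load_weather(conn, csv_file):
    """Insert one Weather row per CSV line, creating State rows as needed."""
    loaded = 0
    for row in csv.DictReader(csv_file):
        conn.execute(
            "INSERT OR IGNORE INTO State (abbreviation) VALUES (?)",
            (row["state"],),
        )
        state_id = conn.execute(
            "SELECT state_id FROM State WHERE abbreviation = ?",
            (row["state"],),
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO Weather (state_id, year, month, avg_temperature, "
            "heavy_snow, extreme_cold) VALUES (?, ?, ?, ?, ?, ?)",
            (
                state_id,
                int(row["year"]),
                int(row["month"]),
                float(row["avg_temperature"]),
                int(row["heavy_snow"]),
                int(row["extreme_cold"]),
            ),
        )
        loaded += 1
    conn.commit()
    return loaded


conn = sqlite3.connect(":memory:")
create_tables(conn)
# A real script would pass open("weather.csv", newline="") instead.
rows_loaded = load_weather(conn, io.StringIO(WEATHER_CSV))
```

Parameterized queries (the ? placeholders) keep the CSV values separate from the SQL text, which avoids quoting problems when a field contains commas or apostrophes.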
Data export
The data for visualization was queried using a Python script and exported into a CSV file. Here is
a snippet of the script:
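The snippet itself is not included in this text. A minimal sketch of such an export step is shown below, assuming SQLite in place of the project's database; the helper name export_query, the table layout and the sample rows are hypothetical.

```python
import csv
import io
import sqlite3


def export_query(conn, sql, params, out_file):
    """Run a query and write the result, with a header row, as CSV."""
    cursor = conn.execute(sql, params)
    writer = csv.writer(out_file)
    writer.writerow([column[0] for column in cursor.description])
    count = 0
    for row in cursor:
        writer.writerow(row)
        count += 1
    return count


# Tiny made-up table so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE BorderCrossing "
    "(state TEXT, year INTEGER, month INTEGER, passengers INTEGER)"
)
conn.executemany(
    "INSERT INTO BorderCrossing VALUES (?, ?, ?, ?)",
    [("AK", 2013, 1, 100), ("AK", 2013, 1, 50), ("AK", 2013, 7, 900)],
)

# A real script would use open("result.csv", "w", newline="") here.
buffer = io.StringIO()
exported = export_query(
    conn,
    "SELECT month, SUM(passengers) AS total_passengers "
    "FROM BorderCrossing WHERE state = ? AND year = ? "
    "GROUP BY month ORDER BY month",
    ("AK", 2013),
    buffer,
)
```

Writing the column names from cursor.description as the first row gives the visualization tool named fields to work with when the file is loaded.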
SQL Queries for visualization
The queries made to the database were based on finding results that correlate the weather data to
the border crossings.
The above query finds the total number of people crossing the border into Alaska in 2013 and
also retrieves the average temperature for each month of that year.
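The query text is not included here. As a hedged sketch, a query of this shape could join the BorderCrossing and Weather tables on weather_id and group by month. The example assumes SQLite and hypothetical column names, and every value in the sample data is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE State (state_id INTEGER PRIMARY KEY, abbreviation TEXT);
    CREATE TABLE Weather (
        weather_id INTEGER PRIMARY KEY,
        state_id INTEGER REFERENCES State(state_id),
        year INTEGER, month INTEGER, avg_temperature REAL
    );
    CREATE TABLE BorderCrossing (
        weather_id INTEGER REFERENCES Weather(weather_id),
        transport_mode TEXT, passengers INTEGER
    );
""")

# Made-up rows: two Alaska months and one North Dakota month.
conn.executemany("INSERT INTO State VALUES (?, ?)", [(1, "AK"), (2, "ND")])
conn.executemany(
    "INSERT INTO Weather VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 2013, 1, 5.0), (2, 1, 2013, 7, 55.0), (3, 2, 2013, 1, 10.0)],
)
conn.executemany(
    "INSERT INTO BorderCrossing VALUES (?, ?, ?)",
    [(1, "bus", 100), (1, "personal vehicles", 200),
     (2, "bus", 1000), (3, "bus", 999)],
)

# Total passengers and average temperature per month for one state and year.
QUERY = """
    SELECT w.month, w.avg_temperature, SUM(b.passengers) AS total_passengers
    FROM BorderCrossing AS b
    JOIN Weather AS w ON w.weather_id = b.weather_id
    JOIN State AS s ON s.state_id = w.state_id
    WHERE s.abbreviation = ? AND w.year = ?
    GROUP BY w.month, w.avg_temperature
    ORDER BY w.month
"""

alaska_2013 = conn.execute(QUERY, ("AK", 2013)).fetchall()
```

Because each BorderCrossing row points at exactly one Weather row, the join cannot multiply rows, so the summed passenger counts stay correct.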
This query lists all the weather calamities and the number of passengers travelling to the state
of North Dakota in 2014.
Visualization using Tableau
To visualize the results of our analysis, we decided that it would be visually appealing to
look at the results on a map view. To achieve that, we chose Tableau as the
best fit for visualization. Tableau can easily load data from Microsoft Excel files, text files, CSV
files and even database connections via Microsoft Access. As part of our research on how to
use this tool, we spoke to a few students who had used it before for a different class. We
also looked at tutorials online to see how we could load data and what forms of visualization
Tableau is capable of. The results from our queries were exported from the database
into a CSV file using Python. We used this file directly as input to Tableau.
Tableau is a very powerful tool and is capable of providing different forms of visualization, from
line graphs, bar graphs and pie charts to Gantt charts, treemaps and box-and-whisker plots. The types
of graph most relevant to us were symbol maps and filled maps. These maps require a
geographic dimension in order to populate the view. Another interesting feature that we
found was Tableau's ability to identify regions on the map. In our database table we used state
abbreviations such as ‘AK’ for Alaska, ‘NM’ for New Mexico and so on, and we were concerned
that we might have to give complete state names for Tableau to link to our data correctly. But
Tableau used the state abbreviations directly and marked those states appropriately.
We used Tableau to analyze the impact of weather at different crossing points. The visual
representation of the map, with data displayed using dimensions and measure values, provided useful
insights into the correlation between weather and its impact on people crossing at border points.
Tableau can be used to analyze the impact of weather on border crossings from many different
perspectives. We considered multiple scenarios, including comparing values across all the states
during a specific month and comparing values across all the months for a specific state. Finally,
we also considered comparing the effects of individual parameters like heavy snow, flood, extreme
heat, extreme cold, blizzard etc.
The screenshot below shows the visualization of data across all the border states. The variation
in color indicates the variation in average temperature. We can observe that the number of
passengers travelling across different border points is higher when the temperature is moderate and
there are no incidents of extreme weather. Hovering over a particular state shows
additional attributes such as the number of occurrences of heavy snow, winter storms and heavy rain,
the average temperature etc. This demonstrates the impact of weather on the number of
passengers: in January 2013, where the average temperature
was 9.8°F, the number of passengers is much lower than the number of people crossing the
border in Texas and California, where the temperature conditions were suitable for
travelling.
Selecting specific states gives us more dimensions to analyze. Using Tableau we can look
at how a particular weather phenomenon affects people crossing the border. It also gives
us an idea of which seasons are more favorable than others for different states. The
next visualization is an analysis of the effect of temperature on the number of people crossing the
border into the state of Alaska in 2013. The months shown cover part of winter (January and
February) and part of summer (June, July and August). As we can see, the lower the temperature, the
fewer people travel into the state. Hence, January and February have only around 3,000
passengers. As it gets warmer this number increases, the highest being 121,064
people in July.
The graph below represents the effect of extreme cold weather on the number of passengers travelling
into North Dakota in 2014. As seen, there were 137 ‘extreme cold’ occurrences in the month of
January, and this corresponds to a dip in the number of people travelling. In contrast, the months
of July and August saw a rise in the number of people travelling.
The next visualization describes how multiple weather phenomena can affect the number of
passengers crossing the border. This is a stacked bar graph that shows average
temperature and extreme cold and blizzard occurrences in North Dakota for the year 2014. The
effect on the number of passengers travelling is shown by a line graph. We can
observe that in January, when the average temperature was below 10°F with high
occurrences of blizzard and extreme cold, the number of passengers was very low (around 85k)
compared to August, when the temperature was favorable and there were no occurrences
of blizzard or extreme cold.
Challenges
The initial challenge that we faced was an inconsistency in the database design. As seen from the
first ER diagram, we had not formed any connection between the Weather and BorderCrossing
tables. This led to a lot of inconsistent results when we queried for data by joining those
two tables. Hence, we realized that the flaw in the database design needed to be fixed in order to
get accurate data while querying. Adding weather_id as a foreign key in the BorderCrossing
table fixed this issue and reduced the complexity of our database.
Another challenge was the raw data format versus the SQL table format. The CSV
files for the storm or weather data contained more than 20 columns with features that were not
relevant to the scope of this project. They also contained data for every day of the month, whereas we
were only tracking the data month to month. The task was to convert 30,700 rows of raw
data into a month-to-month summary which listed only selected weather phenomena and a count
of their occurrences in the state for that particular month.
In order to achieve this, we used a pivot table. Using pivot tables, we could select only the fields
that were relevant to us, and the aggregate function made it easy to fetch only the count of
occurrences of each event. The final data after conversion was only 354 rows, and it was also in
an SQL table format, which made it easier to load into the database. Being able to take 30,700
rows of data and convert them into 354 with something as simple as a pivot table is what we
consider the most interesting part of our project.
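The same pivot can also be expressed in a few lines of code. The sketch below is an illustration only and is not how the conversion was done in the project, which used a pivot table. The raw column names (state, begin_date, event_type) are assumptions, the sample rows are made up, and the event list is the one given in the Datasets section.

```python
import csv
import io
from collections import Counter

# Made-up daily rows standing in for the stitched storm events file.
RAW_CSV = """state,begin_date,event_type,other_column
ND,2014-01-03,Extreme Cold,x
ND,2014-01-03,Blizzard,x
ND,2014-01-17,Extreme Cold,x
ND,2014-01-20,Hail,x
ND,2014-08-02,Heavy Rain,x
"""

# Event types tracked in this project (from the Datasets section).
SELECTED_EVENTS = [
    "Blizzard", "Drought", "Extreme Heat", "Extreme Cold", "Flood",
    "Heavy Rain", "Heavy Snow", "High Wind", "Tornado", "Winter Storm",
]


def pivot_events(rows):
    """Count selected events per (state, year, month), one column per event."""
    counts = Counter()
    for row in rows:
        if row["event_type"] not in SELECTED_EVENTS:
            continue  # drop events outside the project's scope
        year, month, _day = row["begin_date"].split("-")
        counts[(row["state"], int(year), int(month), row["event_type"])] += 1

    table = {}
    for (state, year, month, event), n in counts.items():
        summary = table.setdefault(
            (state, year, month), dict.fromkeys(SELECTED_EVENTS, 0)
        )
        summary[event] = n
    return table


monthly = pivot_events(csv.DictReader(io.StringIO(RAW_CSV)))
```

Each key of the result corresponds to one row of the monthly summary, so the many daily rows of a state collapse into at most twelve rows per year.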