Visualize Border Crossing data and analyze the impact of weather at different crossing points

Sushma Jayaram, Shwetha Mallya

Contents

Summary
Project Flowchart
Datasets
• Border Crossing Data
• Storm events Data
• Temperature Data
Data Munging
Database Design
• ER Diagram
• Database design change
• Relational Vocab
• Sample Tables
Database creation
Data import
Data export
Visualization using Tableau
Challenges

Summary

This project combines weather and storm data for all the border states of the US and analyzes its impact on the number of people travelling into the United States through these borders. The analysis uses the weather data together with the border crossing data, and the results are visualized as maps and graphs in Tableau. We started the project with the assumption that the number of people crossing into the border states would vary with weather conditions. Through a process that starts with downloading the datasets and ends with visualization, we found several examples that support this assumption.

Project Flowchart

The figure above represents the flowchart for our project workflow.

1. Data files: For this project we use data from multiple sources. The border crossings data file contains border crossing data for the border states of the United States, along with the number of passengers travelling across the border by bus, train and other means of transport. The next data file is the weather data, which contains the occurrences of storm, rain, snow, etc.
for each month over a couple of years. The temperature data contains the average temperature for the border states for each month of those years.

2. Border Crossing database: The above data files are imported from csv into the database using Python.

3. Files extracted from SQL queries: Based on the visualization task and the data it requires, run the appropriate SQL queries against the database and export the results as a csv file that can later be used for visualization.

4. Visualization: Using tools like mapstory or photoshop, visualize how the weather and temperature affected the border crossings across the border states and display the results on a map of the US. For this project we used Tableau.

Datasets

• Border Crossing Data
Link: http://transborder.bts.gov/programs/international/transborder/TBDR_BC/TBDR_BCQ.html
This is the Bureau of Transportation Statistics dataset from the US Department of Transportation. The data is selected by port of entry, year and month, and the retrieved dataset contains the number of passengers who came into the US through that port by various means of transportation: train, bus, on foot (pedestrians) and personal vehicles.

• Storm events Data
Link: http://www.ncdc.noaa.gov/stormevents/
This dataset is provided by the National Climatic Data Center (NCDC). Their database records the occurrence of storms and other significant weather phenomena across the US. Since this project focuses only on border states, we selected the storm data only for those states. The retrieved dataset contains storm and weather data for a state for a given month and year.

• Temperature Data
Link: http://www.ncdc.noaa.gov/cag/
This dataset is also provided by the NCDC. It gives the average temperature in a given state for a particular year and month.
Border States: Alaska, Arizona, California, Idaho, Maine, Michigan, Minnesota, New Mexico, New York, North Dakota, Texas, Vermont, Washington
Years: 2013, 2014
Storm/Weather Events: Blizzard, Drought, Extreme Heat, Extreme Cold, Flood, Heavy Rain, Heavy Snow, High Wind, Tornado, Winter Storm

Data Munging

1. For the storm data, the website did not allow bulk download because there was a 500-row limit on each retrieved result. Hence the data had to be queried separately for each state, month and year. The downloaded files then had to be stitched together into a single file. The data also contained many columns that were not relevant to this project, and it contained rows for each day of the year. For the scope of this project we required only the major weather phenomena and a count of their occurrences in a given month. Hence, the final stitched data file had to be converted into a pivot table to get the data into a format that could be loaded into SQL.

2. In the border crossing statistics, the means of transportation appeared as columns in the table, and we had to convert them into separate rows. For example, for January 2013, the number of passengers travelling by each mode of transport is shown in a separate row. We transposed the table in this way so that we could apply aggregate functions by grouping sets of rows.

3. The data for average temperature was included as part of the weather data in the same table.

Database Design

• ER Diagram
The design of the ER diagram started with an analysis of the type of data that we would ultimately want to use for visualization. The design tries to capture weather data and its impact on various borders across the US states. The Weather table captures storm-related data such as drought, snow, rain and wind. The number in each of these columns is the number of times that particular event (drought, snow, rain or wind) occurred in a given month and year.
http://www.ncdc.noaa.gov/stormevents/eventdetails.jsp?id=490068 provides the required data. Average temperature is an important measure in any weather-related data. We extracted the average temperature from a different data source, available at http://www.ncdc.noaa.gov/cag/. Each row in the Weather table corresponds to a specific month and year for a given state with border crossing ports. This results in a one-to-many relationship between the State and Weather tables (State has_many Weather, Weather belongs_to State), with state_id as a foreign key in the Weather table.

The BorderCrossing table contains data on the number of crossings through different modes of transport (source: http://transborder.bts.gov/programs/international/transborder/TBDR_BC/TBDR_BCQ.html). Since every border crossing is associated with a single state and a state can have multiple ports along its borders, we have a one-to-many relationship between the State and BorderCrossing tables. Different modes of transport here refers to the number of passengers entering or leaving borders by bus, by train, on foot or in personal vehicles. The modes of transport are placed in a separate table called TransportMode. In the BorderCrossing table each row refers to exactly one mode of transport. Hence this results in a one-to-many relationship between TransportMode and BorderCrossing, with transport_mode_id as a foreign key in the BorderCrossing table. By capturing BorderCrossing data for each transport mode separately, we are able to find, for example, the average or total number of people crossing the border by any mode of transport in a specific year or month, or the number of passengers crossing the border in personal vehicles over a period of 6 months.
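The initial design described above, with one BorderCrossing row per transport mode, could be sketched as follows. This is a minimal illustration and not the authors' actual schema: the table names and the state_id and transport_mode_id keys come from the text, every other column name is an assumption, SQLite is used only so that the sketch runs without a database server, and the passenger counts are made-up placeholders.

```python
import sqlite3

# Table names follow the report; column names other than state_id and
# transport_mode_id are assumptions made for this sketch.
SCHEMA = """
CREATE TABLE State (
    state_id INTEGER PRIMARY KEY,
    abbreviation TEXT NOT NULL
);
CREATE TABLE TransportMode (
    transport_mode_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE BorderCrossing (
    border_crossing_id INTEGER PRIMARY KEY,
    state_id INTEGER NOT NULL REFERENCES State(state_id),
    transport_mode_id INTEGER NOT NULL
        REFERENCES TransportMode(transport_mode_id),
    year INTEGER NOT NULL,
    month INTEGER NOT NULL,
    passengers INTEGER NOT NULL
);
"""


def total_by_mode(conn, year):
    """Return {transport mode name: total passengers} for one year."""
    rows = conn.execute(
        """
        SELECT t.name, SUM(b.passengers)
        FROM BorderCrossing AS b
        JOIN TransportMode AS t
          ON t.transport_mode_id = b.transport_mode_id
        WHERE b.year = ?
        GROUP BY t.name
        """,
        (year,),
    ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO State VALUES (1, 'AK')")
conn.executemany(
    "INSERT INTO TransportMode VALUES (?, ?)", [(1, "bus"), (2, "train")]
)
# Placeholder counts for illustration only, not real crossing figures.
conn.executemany(
    "INSERT INTO BorderCrossing "
    "(state_id, transport_mode_id, year, month, passengers) "
    "VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 2013, 1, 100), (1, 1, 2013, 2, 150),
     (1, 2, 2013, 1, 40), (1, 2, 2014, 1, 999)],
)
totals = total_by_mode(conn, 2013)
```

Because each row holds a single transport mode, totals and averages per mode reduce to a plain GROUP BY, which is the benefit the paragraph above describes.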
The ER diagram, relational vocab and table sketches are provided in the following sections.

• Database design change
After further design considerations, the database design was slightly modified from the original in order to add a foreign key linking the border crossing data to the weather data, which was missing from our previous design. To make the database more efficient and to form a link between the weather and the border crossing data, we made weather_id, the primary key of the Weather table, a foreign key in the BorderCrossing table. We also eliminated the TransportMode table from the previous design since it was redundant. The new ER diagram looks as follows:

• Relational Vocab
Weather: has_many BorderCrossing
BorderCrossing: belongs_to Weather
State: has_many Weather
Weather: belongs_to State
State: has_many BorderCrossing
BorderCrossing: belongs_to State

• Sample Tables

Database creation

The database was created using phpMyAdmin.

[Screenshots of the BorderCrossing, State and Weather tables]

Data import

The data from the csv data files was loaded into the database using a Python script. Here is an excerpt from the file:

Data export

The data for visualization was queried using a Python script and exported into a csv file. Here is a snippet of the script:

SQL Queries for visualization

The queries made to the database were aimed at finding results that relate the weather data to the border crossings.

The above query finds the total number of people crossing the border into Alaska in 2013, and also retrieves the average temperature for each month of that year.

This query lists all the weather calamities and the number of passengers entering the state of North Dakota in 2014.

Visualization using Tableau

To visualize the results of our analysis, we decided that it would be most appealing to look at the results on a map view. To achieve that, we chose Tableau as the best fit for visualization.
Tableau can easily load data from Microsoft Excel files, text files, csv files and even database connections via Microsoft Access. As part of our research on how to use this tool, we spoke to a few students who had used it before for a different class. We also looked at tutorials online to see how we could load data and what forms of visualization Tableau is capable of. The results from our queries were exported from the database into a csv file using Python, and we used this file directly as the input to Tableau.

Tableau is a very powerful tool and provides many forms of visualization, from line graphs, bar graphs and pie charts to Gantt charts, treemaps and box-and-whisker plots. The types of graph most relevant to us were symbol maps and filled maps. These maps require a geographic dimension in order to populate the view. Another interesting feature that we found was Tableau's ability to identify regions on the map. In our database table we used state abbreviations such as 'AK' for Alaska and 'NM' for New Mexico, and we were concerned that we would have to give complete state names for Tableau to link to our data correctly. But Tableau used the state abbreviations directly and marked those states appropriately.

We used Tableau to analyze the impact of weather at different crossing points. The visual representation of the map, with data displayed using dimensions and measure values, provided useful insights into the correlation between weather and its impact on people crossing at border points. Tableau can be used to analyze border crossings from many different perspectives. We considered multiple scenarios, including comparing values across all the states during a specific month and comparing values across all the months for a specific state. Finally, we also considered comparing the effects of individual parameters like heavy snow, flood, extreme heat, extreme cold, blizzard, etc.
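The query-and-export step that produces the csv file read by Tableau could look roughly like the sketch below. It is an illustration under stated assumptions rather than the authors' script: the weather_id and state_id keys and the state abbreviations come from the report, while the remaining column names, the helper name export_csv, the use of SQLite instead of the project's database, and all data values are hypothetical.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
# Revised design: BorderCrossing carries weather_id as a foreign key.
# Column names beyond the keys named in the report are assumptions.
conn.executescript("""
CREATE TABLE State (state_id INTEGER PRIMARY KEY, abbreviation TEXT);
CREATE TABLE Weather (
    weather_id INTEGER PRIMARY KEY,
    state_id INTEGER REFERENCES State(state_id),
    year INTEGER, month INTEGER, avg_temp REAL
);
CREATE TABLE BorderCrossing (
    border_crossing_id INTEGER PRIMARY KEY,
    state_id INTEGER REFERENCES State(state_id),
    weather_id INTEGER REFERENCES Weather(weather_id),
    transport_mode TEXT, passengers INTEGER
);
""")
# Placeholder rows for illustration only, not real measurements.
conn.executemany("INSERT INTO State VALUES (?, ?)", [(1, "AK"), (2, "ND")])
conn.executemany(
    "INSERT INTO Weather VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 2013, 1, 5.0), (2, 1, 2013, 7, 55.0), (3, 2, 2013, 1, 9.0)],
)
conn.executemany(
    "INSERT INTO BorderCrossing "
    "(state_id, weather_id, transport_mode, passengers) VALUES (?, ?, ?, ?)",
    [(1, 1, "bus", 10), (1, 1, "train", 20), (1, 2, "bus", 300),
     (2, 3, "bus", 70)],
)

# Monthly passenger totals joined with that month's average temperature.
QUERY = """
SELECT s.abbreviation AS state, w.year AS year, w.month AS month,
       w.avg_temp AS avg_temp, SUM(b.passengers) AS total_passengers
FROM BorderCrossing AS b
JOIN Weather AS w ON w.weather_id = b.weather_id
JOIN State AS s ON s.state_id = w.state_id
WHERE s.abbreviation = ? AND w.year = ?
GROUP BY s.abbreviation, w.year, w.month, w.avg_temp
ORDER BY w.month
"""


def export_csv(conn, state, year, out):
    """Write the query result, with a header row, to a file-like object."""
    cur = conn.execute(QUERY, (state, year))
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cur.description])
    writer.writerows(cur)


buf = io.StringIO()
export_csv(conn, "AK", 2013, buf)
lines = buf.getvalue().splitlines()
```

Passing an open file instead of the StringIO buffer would produce a csv on disk. Keeping the two-letter state abbreviation as a column gives Tableau the geographic dimension it needs for a filled map.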
The screenshot below shows the visualization of data across all the border states. The variation in color indicates the variation in average temperature. We can observe that the number of passengers travelling across the border points is higher when the temperature is moderate and there are no incidents of extreme weather. Hovering over a particular state shows additional attributes such as the number of occurrences of heavy snow, winter storms and heavy rain, the average temperature, etc. This illustrates the impact of weather on the number of passengers: in the month of January 2013, where the average temperature was 9.8°F, the number of passengers is much lower than the number of people crossing the border in Texas and California, where the temperature conditions were suitable for travelling.

Selecting specific states gives us more dimensions along which to analyze the data. Using Tableau we can look at how a particular weather phenomenon affects people crossing the border. It also gives us an idea of which seasons are more favorable than others for different states.

The next visualization is an analysis of the effect of temperature on the number of people crossing the border into the state of Alaska in 2013. The months shown cover part of winter (January and February) and part of summer (June, July and August). As we can see, the lower the temperature, the fewer people travel into the state. The months of January and February each have only around 3,000 passengers. As it gets warmer this number increases, the highest being 121,064 people in July.

The graph below represents the effect of extreme cold weather on the number of passengers entering North Dakota in 2014. As seen, there were 137 'extreme cold' occurrences in the month of January, and this corresponds to a dip in the number of people travelling. In contrast, the months of July and August show a rise in the number of people travelling.
The next visualization describes how multiple weather phenomena can affect the number of passengers crossing the border. This is a stacked bar graph that analyzes the effects of average temperature, extreme cold and blizzard occurrences in North Dakota for the year 2014. The effect on the number of passengers travelling is shown by the line graph. We can observe that in the month of January, when the average temperature was below 10°F with many occurrences of blizzard and extreme cold, the number of passengers was very low (around 85k) compared to the month of August, when the temperature was favorable and there were no occurrences of blizzard or extreme cold.

Challenges

The initial challenge that we faced was an inconsistency in the database design. As seen from the first ER diagram, we had not formed any connection between the Weather and BorderCrossing tables. This led to many inconsistent results when we queried for data by joining those two tables. We realized that this flaw in the database design needed to be fixed in order to get accurate data while querying. Adding weather_id as a foreign key in the BorderCrossing table fixed this issue and reduced the complexity of our database.

Another challenge was data format versus SQL format. The csv files for the storm and weather data contained more than 20 columns, with features that were not relevant to the scope of this project. They also contained data for every day of the month, whereas we were only tracking data from month to month. The task was to convert 30,700 rows of raw data into a month-to-month summary that listed only the selected weather phenomena and a count of their occurrences in the state for that particular month. To achieve this, we used a pivot table.
Using pivot tables, we could select only the fields that were relevant to us, and the aggregate function made it easy to fetch only the count of occurrences of each event. The final data after conversion was only 354 rows, already in an SQL table format, which made it easier to load into the database. Being able to take 30,700 rows of data and reduce them to 354 with something as simple as a pivot table is what we consider the most interesting part of our project.
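The same monthly roll-up can be expressed in a few lines of code. The sketch below is not how the project did it (the report describes a spreadsheet pivot table); it reproduces the idea in plain Python, assuming each raw record carries hypothetical state, year, month and event_type fields. The sample records are placeholders, not real storm data.

```python
from collections import Counter, defaultdict

# A subset of the event types tracked in the report, for brevity.
EVENTS = ("Blizzard", "Extreme Cold", "Heavy Snow")


def pivot_counts(rows, events=EVENTS):
    """Collapse one-row-per-event records into one row per
    (state, year, month), with one count column per selected event."""
    counts = defaultdict(Counter)
    for row in rows:
        if row["event_type"] in events:  # ignore phenomena we do not track
            key = (row["state"], row["year"], row["month"])
            counts[key][row["event_type"]] += 1
    # A Counter returns 0 for events that never occurred in that month.
    return [
        {"state": s, "year": y, "month": m, **{e: c[e] for e in events}}
        for (s, y, m), c in sorted(counts.items())
    ]


# Placeholder records for illustration only.
raw = [
    {"state": "ND", "year": 2014, "month": 1, "event_type": "Extreme Cold"},
    {"state": "ND", "year": 2014, "month": 1, "event_type": "Extreme Cold"},
    {"state": "ND", "year": 2014, "month": 1, "event_type": "Blizzard"},
    {"state": "ND", "year": 2014, "month": 1, "event_type": "Hail"},
    {"state": "ND", "year": 2014, "month": 2, "event_type": "Heavy Snow"},
]
summary = pivot_counts(raw)
```

Each dictionary in summary corresponds to one row of the monthly table, so the result can be written out with csv.DictWriter or inserted into the Weather table directly. A month that contains none of the selected events produces no row in this sketch.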