National College of Ireland
Higher Diploma in Science in Data Analytics
2013/2014
Taofikat Shade Salawu
X13120387
ANALYSIS AND EVALUATION OF CARBON DIOXIDE
(CO2) EMISSIONS TARGET IN CAR MANUFACTURING IN
UNITED STATES OF AMERICA (USA).
Dissertation
ABSTRACT
In recent years it has become imperative for the government of the United States of America
(USA) to sanction vehicle manufacturers that do not comply with the CO2 emissions targets
set. Stricter regulations, such as a carbon tax and minimum and maximum caps on CO2
emissions, have been introduced to ensure compliance in the transportation sector, which
accounts for about 32% of CO2 emissions in the USA.
The analysis of trends in CO2 emissions in car manufacturing and the evaluation of their
impact on human life and society as a whole are of paramount importance because of the
health implications for humans and the effects on climate and global warming.
A detailed review of the literature on CO2 emissions in vehicle manufacturing and their
impact on human health and society in the USA was carried out to support the study. In
addition, data from the internet were analysed using data mining techniques. Evidence
gathered from the findings suggests a link between the trends in CO2 emissions targets in
vehicle manufacturing and their impact on human health and society as a whole.
This research outcome provides an understanding of the importance of compliance and
non-compliance with the CO2 emissions targets set, not only for car manufacturers but also
for the US government and Americans.
TABLE OF CONTENTS
ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
CHAPTER 1: INTRODUCTION
1.1 BACKGROUND
1.2 SCOPE OF THE RESEARCH
1.3 RESEARCH QUESTION AND OBJECTIVES
1.4 STRUCTURE OF THE PROJECT
CHAPTER 2: LITERATURE REVIEW
2.1 CO2 EMISSIONS IN USA
2.2 CONCEPT OF CO2 EMISSION TARGET IN CAR MANUFACTURING IN USA
2.3 EMERGENCE OF HYBRID AND LOW-EMITTING CARS IN USA
2.3.1 Battery Electric Vehicles
2.3.2 Plug-in hybrid electric vehicles
2.3.3 Hybrid electric vehicles
2.3.4 Fuel cell vehicles
2.4 IMPACT OF CO2 EMISSIONS ON HUMAN HEALTH AND SOCIETY
2.5 CONCLUSIONS
CHAPTER 3: RESEARCH METHODOLOGY
3.1 RESEARCH QUESTION AND OBJECTIVES
3.2 DATA COLLECTION METHODS
3.3 DATA MINING PROCESS
3.3.1 Data mining
3.3.2 Weka
3.4 CONCLUSIONS
CHAPTER 4: FINDINGS AND ANALYSIS
4.1 FINDINGS
4.2 ANALYSIS OF FINDINGS
4.2.1 ANALYSIS OF CO2 EMISSIONS TRENDS
4.2.2 IMPACT OF CO2 EMISSIONS ON HUMAN HEALTH AND SOCIETY
CHAPTER 5: CONCLUSIONS AND RECOMMENDATIONS
5.1 OVERALL CONCLUSIONS
5.2 RECOMMENDATIONS AND CONTRIBUTION TO KNOWLEDGE
5.3 LIMITATIONS AND AREAS FOR FUTURE RESEARCH
REFERENCES
LIST OF FIGURES
Chapter 2
Figure 2.1: USA CO2 emissions
Figure 2.2: USA CO2 emissions 1990 - 2012
Chapter 3
Figure 3.1: WEKA graphical user interface
Figure 3.2: Screenshot of raw data collected
Figure 3.3: Data pre-processing stage
Figure 3.4: Screenshot of processed data
Figure 3.5: Explorer user interface
Chapter 4
Figure 4.1: Cluster panel in WEKA
Figure 4.2: Cluster created in WEKA
Figure 4.3: Cluster output from WEKA
Figure 4.4: Cluster instances
Figure 4.5: Visualized panel
Figure 4.6: Result from visualized panel
CHAPTER 1
INTRODUCTION
This chapter provides an overview of the project. Its background and importance are
discussed in Section 1.1. Section 1.2 presents the scope, Section 1.3 outlines the aims,
research question and objectives, and the final section highlights the structure of the
project.
1.1 BACKGROUND
The United States of America (USA) is one of the largest automotive markets in the world.
On average, about 8.1 million passenger vehicles are produced annually by vehicle
manufacturers in the USA. All the major European, Japanese and Korean automakers have
one or more manufacturing facilities in the USA. Some of the biggest automakers operating
in the USA include General Motors, Honda, Chrysler, Toyota, Nissan, Hyundai-Kia, BMW,
Mazda, Mitsubishi, Subaru and Volkswagen (The Vehicle Production Group, 2012). Many of
these manufacturers have set up engine and transmission plants in the USA, and they conduct
research and development to transform, innovate and meet the challenges that have faced
the sector in recent years. Major car manufacturers also carry out design and testing in
the USA; the sector employs over 0.7 million people and accounts for about 5% of the
USA’s GDP (Amann, 1989).
There is a substantial network of automotive spare parts suppliers that serves the industry.
Its shipments of about $200 billion accounted for 3% of total USA manufacturing. According
to the Centre for Automotive Research, over 3 million jobs were created in the automotive
spare parts sector, and it contributes more to economic wellbeing than any other
manufacturing sector.
In recent years, the USA exported about 3 million vehicles, valued at $61 billion, to more
than 180 countries around the world, with additional exports of automotive parts valued at
up to $67 billion. Owing to its open investment policy, large consumer market, highly
skilled workforce, available infrastructure and government incentives, the USA is the
premier place for the future of the auto industry. It is evident from the above that the
automotive sector contributes positively to the economy of the USA.
However, car manufacturing involves different production processes, and the vehicles
produced rely on internal combustion engines that cause CO2 emissions. It is therefore
important to analyse the trends in CO2 emissions in order to identify the car manufacturers
that achieve the CO2 emissions targets set, and to evaluate the impact of these trends on
humans and the environment.
A better understanding of the trends in CO2 emission reduction strategies in the automotive
industry will go a long way towards enabling scientists to evaluate their impact on human
life in both the short and the long term. This raises the question, ‘How would compliance
and non-compliance of auto manufacturers with the CO2 emission targets set by government
agencies in the USA affect human life and the environment?’, which has not been addressed
in detail in the literature.
Currently, the author is pursuing a Higher Diploma in Science in Data Analytics at the
National College of Ireland, Dublin. Many collaborative research projects are currently
being undertaken within the automotive sector to find a lasting solution to the menace of
CO2 emissions into the atmosphere, and these support awareness campaigns on global
warming.
This project aims to analyse the trends in CO2 emissions targets in car manufacturing in the
USA using data mining techniques and to evaluate their impact on human life and the
environment at large. It is anticipated that a better understanding of this impact will
enable the automotive industry to do more to achieve the CO2 emissions targets set, either
through innovative processes or through the production of hybrid vehicles. Chapter 2
elaborates further on the research gaps in the literature. The scope, general aims, research
question and objectives are defined below.
1.2 SCOPE OF THE RESEARCH
This project focuses on the analysis of the trends in CO2 emissions targets in vehicle
manufacturing in the USA and their impact on human life and society. The scope of the
research is as described below:
• The analysis of the trends in CO2 emissions reduction in car manufacturing in the USA.
• The evaluation of the impact of the above results on human life and society at large.
1.3 RESEARCH AIMS, QUESTION AND OBJECTIVES
The aim of this project is to analyse the trends in CO2 emissions targets in car
manufacturing and to evaluate the impact of the results of the analysis on human life and
society.
The project outcome will provide a basis for understanding compliance and non-compliance
with initiatives on global warming and environmental issues in the USA and the world at
large. Hence, the research question is as follows:
What impact will adherence and non-adherence to CO2 emission reduction targets have on car
manufacturers, human life and society?
To address the research question, the following detailed objectives were outlined:
(1) To analyse and evaluate the trend of CO2 emission reduction strategies from the
perspectives of different car manufacturers in the USA.
(2) To evaluate the impact of the above on human life and society.
1.4 STRUCTURE OF THE PROJECT
The thesis is organised into five chapters as follows:
• Chapter 1 provides a general overview of the research, including its background, scope,
aims, question and objectives.
• Chapter 2 provides a review of the literature on CO2 emissions in car manufacturing
in the USA and its impact on human life and society, with the aim of covering the
theoretical background of this study.
• Chapter 3 reviews the methodology.
• Chapter 4 provides the findings and data analysis.
• Chapter 5 provides the conclusions and recommendations.
CHAPTER 2
LITERATURE REVIEW
In the first chapter, a brief explanation of the research background was given and the
importance of the project was highlighted. An understanding of CO2 emissions in the car
manufacturing sector is very important in many national economies. This chapter reviews
relevant literature to provide a theoretical background for this work. The main areas of the
literature considered are CO2 emissions and car manufacturing in the USA, the control of
CO2 emissions and the emergence of hybrid cars, and the impact of CO2 emissions on human
life and society. The aim of reviewing these areas is to refine the potential areas for the
research question.
This chapter is organised into five sections. Sections 2.1 and 2.2 give a brief outline of
CO2 emissions in the USA and of the CO2 emissions target in the car manufacturing sector
in the USA, respectively. Section 2.3 presents the emergence of hybrid cars in the USA.
Section 2.4 reviews the impact of CO2 emissions on human health and society, and the
conclusion is provided in Section 2.5.
2.1 CO2 emissions in USA
According to Matthews et al. (2008) and Wiedmann and Minx (2008), CO2 is the primary
greenhouse gas (GHG) emitted and accounts for about 82% of all USA GHG emissions from
human activities. It is naturally present in the atmosphere as part of the earth's carbon
cycle (i.e. the circulation of carbon among the atmosphere, oceans, soil, plants and
animals). Human activities have altered the carbon cycle both by adding more CO2 to the
atmosphere and by influencing the ability of natural sinks, like forests, to remove CO2 from
the atmosphere. Although CO2 emissions come from an array of natural sources, human
activities since the Industrial Revolution began around 1750 have contributed significantly
to climate change by adding CO2 and other heat-trapping gases to the atmosphere. The main
human activity that emits CO2 is the combustion of fossil fuels (coal, natural gas and oil)
for energy and transportation (Peters et al., 2009).
In broad terms, and as shown in Figure 2.1, the main sources of CO2 emissions in the USA
are transportation, electricity and industry. CO2 emissions in the USA increased by about 5%
between 1990 and 2012 (Weber and Matthews, 2007). Because the combustion of fossil fuels
is the largest source of GHG emissions in the USA, changes in emissions from fossil fuel
combustion have historically been the dominant factor affecting total U.S. emission trends.
These changes are in turn influenced by factors such as economic growth, energy prices, new
technologies and seasonal temperatures (California Energy Commission, 2008).
Figure 2.1: USA Carbon Dioxide Emissions, By Source
All emission estimates from the Inventory of U.S. Greenhouse Gas Emissions and
Sinks: 1990-2012.
As shown in Figure 2.2, between 1990 and 2012 the increase in CO2 emissions
corresponded with increased energy use by an expanding economy and population, and with an
overall growth in emissions from electricity generation. Emissions from transportation also
contributed to the 5% increase, largely because of an increase in miles travelled by motor
vehicles (Light-Duty Automotive Technology, 2012).
Figure 2.2: USA Carbon Dioxide Gas Emissions, 1990-2012
All emission estimates from the Inventory of U.S. Greenhouse Gas Emissions and Sinks:
1990-2012.
Going forward, CO2 emissions in the United States are projected to grow by about 1.5%
between 2005 and 2020.
2.2 Concept of CO2 emissions target in car manufacturing in USA
The combustion of fossil fuels such as gasoline and diesel to transport people and goods is
the second largest source of CO2 emissions, accounting for about 32% of total U.S. CO2
emissions and 27% of total U.S. GHG emissions in 2012 (Light-Duty Automotive
Technology, 2012). This category includes transportation sources such as highway vehicles,
air travel, marine transportation and rail.
In its projections of GHG emissions, the US Department of State (2007) considered the CO2
emissions connected to the production of transportation fuels, focusing on well-to-wheel
analysis. However, it ignored the CO2 emissions from the production of the vehicles
themselves, which is the subject matter of this study. In recent years, the production of
motor vehicles in the USA alone emitted about 800 million metric tonnes of CO2 annually,
comparable to the aviation industry (950 million metric tonnes) (U.S. Environmental
Protection Agency). To get a proper picture of the carbon footprint of transportation, it is
necessary to include the production of the vehicles.
Motor Vehicles Facts and Figures (1998) highlighted that life-cycle assessments indicate
that 50 percent of the GHG emissions of car manufacturing are related to materials. Because
car manufacturing has complex international supply chains, a detailed analysis of the
emissions from the production of imported products is essential. However, this may be
difficult to obtain owing to the complex nature of the materials involved.
The analysis of the CO2 emissions of different auto manufacturers in the USA has become very
important (Motor Vehicles Facts and Figures, 1998; The Vehicle Production Group LLC,
2012), hence the importance of this study. The objectives of emission reduction initiatives
are to improve the car manufacturing process and quality so that consumers have an
increasing choice of vehicles with high fuel economy and low CO2 emissions. Hence most
automotive manufacturers are now adopting strategic innovation initiatives to achieve
increased fuel economy, resulting in lower CO2 emissions (Malliaris et al., 1976; Peters et
al., 2009; Light-Duty Automotive Technology, 2012).
A number of innovation strategies have been adopted and implemented to improve quality
(Shelef, 2004). These new initiatives have increased the relevance of a consumer-focused
approach that responds quickly to complex consumer demands and rising expectations; hence
nearly all car manufacturers today sell vehicles that can meet future CO2 emissions targets
(Hellman and Murell, 1996; The Vehicle Production Group, 2012). The automotive sector in
the USA has been improving in recent times, creating new and improved value for consumers,
which can be regarded as the ultimate way for car manufacturers to remain competitive,
sustain growth and secure long-term profitability.
2.3 Emergence of hybrid and emissions reduced vehicles in USA
The pursuit of CO2 emissions reduction targets by the automotive industry has led to the
introduction of innovative types of vehicles (Amann, 1989; Murell and Heavenrich, 1990;
Heavenrich, 2006), i.e. hybrid cars and other reduced-emission vehicles that are less
reliant on fossil fuels, whose combustion releases GHG into the atmosphere. This in turn
affects the value of the products and the reputation of the organisation (Hellman and
Murell, 1986; Lipset et al., 1994). According to Lash and Wellington (2007) and U.S. EPA
(2012), CO2 emissions reduction in car manufacturing will have a significant impact on
market share over the long term. The most effective way to reduce CO2 emissions is to
reduce fossil fuel consumption. Some of the initiatives employed by car manufacturers in
the USA are discussed below.
2.3.1 Battery Electric Vehicles (BEV)
BEVs are propelled by one or more electric motors powered by rechargeable battery
packs. They are energy efficient and reduce dependence on gasoline, since electricity is
produced from domestic sources. They emit no tailpipe pollutants, although the power plants
producing the electricity may emit pollution. BEVs have other performance benefits: they
are quiet, have instant torque for quick acceleration and require less maintenance than
vehicles with internal combustion engines. However, they are more expensive than
conventional vehicles and hybrids because of the cost of large battery packs. In recent
years, manufacturers have been working to improve driving range, reduce costs and ensure
that public charging stations become widely available (Collier, 2006; U.S. EPA, 2012).
2.3.2 Plug-in Hybrid Electric Vehicles (PHEV)
These are hybrid vehicles that use high-capacity batteries that can be charged by
plugging them into an electrical outlet or charging station. They emit no tailpipe
pollutants while running on battery power alone and can store sufficient electricity from
the power grid to reduce their gasoline consumption under typical driving conditions
(U.S. EPA, 2012).
The basic configurations of PHEVs are series and blended. In the series configuration, the
electric motor is the only power source that turns the wheels, and the gasoline engine only
generates electricity. In the blended configuration, both the engine and the electric motor
are mechanically connected to the wheels and both may propel the vehicle (U.S. EPA, 2012).
2.3.3 Hybrid Electric Vehicles
HEVs combine the best features of the internal combustion engine with electric motors and
can significantly improve fuel economy without affecting performance or driving range.
Although HEVs are primarily propelled by an internal combustion engine, they also convert
energy otherwise wasted during coasting and braking into electricity, which is stored in a
battery until needed by the electric motor, and so emit fewer tailpipe pollutants than
comparable conventional vehicles (U.S. EPA, 2012).
2.3.4 Fuel Cell Vehicles (FCV)
FCVs are propelled by electric motors powered by fuel cells, which produce electricity from
the chemical energy of hydrogen. Fuel cell technology is more efficient than internal
combustion engines and environmentally cleaner. Hydrogen fuel cell vehicles emit only
water, but many challenges need to be overcome before FCVs can be mass marketed or sold at
local dealerships (U.S. EPA, 2012).
In addition, advances in diesel engine technology have improved performance, reduced
noise and fuel odour and decreased emissions of pollutants, while Compressed Natural Gas
Vehicles (CNGVs) reduce dependence on petroleum and produce fewer smog-forming and
GHG pollutants.
2.4 Impact of CO2 emissions on human health and society
Weber and Matthews (2007) and Matthews and Hendrickson (2008) stressed that CO2 remains
one of the prime reasons why the earth can sustain life: plants absorb CO2 in order to
produce breathable air, and CO2 also contributes to the greenhouse effect, i.e. a natural
occurrence in which light and heat travel to the earth from the sun and part of that heat is
trapped by gases such as CO2 and cannot radiate back out into space, thereby keeping the
earth warm enough to support life.
Moreover, scientists from Stanford University have spelled out the direct links between
increased levels of carbon dioxide in the atmosphere and increases in human mortality,
hence the need to study and evaluate the impact of CO2 emissions on human life and
society. The new findings came to light just after the U.S. Environmental Protection
Agency's ruling against states setting specific emission standards for GHG, which was based
on a lack of data showing the link between CO2 emissions and their health effects.
Mark Jacobson of Stanford University in California highlighted that for each increase of
one degree Celsius caused by carbon dioxide, the resulting air pollution would lead annually
to about a thousand additional deaths and many more cases of respiratory illness and asthma
in the USA (Woods et al., 1998).
Hammond (2007) and Weidema et al. (2008) stated that chemical and meteorological
changes due to CO2 emissions increase mortality through increased ozone, particles and
carcinogens in the air, and that the effects of CO2's warming are most significant where
pollution is already severe.
It has been highlighted that the health-related effects of CO2 emissions are most prominent
in areas with significant pollution. As such, increased warming due to CO2 will worsen
people's health at a much faster rate in the USA. Also, much of the population of the USA
has already been directly affected by climate change through the air inhaled over the last
few decades; consequently, the health effects would grow worse if temperatures continue to
rise (OSHA, 2012).
Brian (1998) further revealed that the ozone and airborne particles that result from
temperature increases caused by rising carbon dioxide emissions cause and worsen
respiratory and cardiovascular illnesses, emphysema and asthma. Many published studies
have associated increased ozone with higher mortality, while particles are responsible for
cardiovascular and respiratory illness and asthma.
Furthermore, higher temperatures caused by CO2 increase the chemical rate of ozone
production, and the increased water vapour due to these higher temperatures further boosts
chemical ozone production in urban areas (Hammond, 2007).
Woods et al. (1998) and California Energy Commission (2008) stated that, owing to CO2
emissions, air temperatures rise more quickly than ground temperatures, which changes the
vertical temperature profile of the air in which pollutants form.
It should be noted that CO2 emissions are the cause of the changes in ozone and particles,
transport, clouds, emissions and other processes that affect pollution, because CO2 is the
only input that varies in proportion in the atmosphere.
2.5 Conclusions
Carbon emissions, most notably carbon dioxide (CO2), are part of a collection of gases that
negatively influence the quality of our air and increase the greenhouse effect. Greenhouse
gases have a direct influence on the environment, causing extreme weather changes, a global
temperature increase, the loss of ecosystems and potentially hazardous health effects for
people.
Some have sought ways to regulate carbon emissions through federal government mandates.
There are two prevailing schools of thought regarding governmental control of carbon
emissions. The first, a carbon tax, is exactly what it sounds like -- taxing companies directly,
based on the amount of carbon they put into the atmosphere. The goal of a carbon tax would
be to convince businesses and other organizations to reduce their total emissions.
The second governmental control approach that has been under study in recent years is
referred to as cap-and-trade legislation. In this system, the government sets a "cap" on the
maximum amount of emissions it will allow. From here, it then auctions off emissions
allowances to companies until it reaches that cap. Companies that cannot cover their
emissions with their allowances are forced to either reduce their totals or buy allowances
from other companies. This system is designed to promote stricter emissions standards
without directly taxing companies. It does have some potential problems, however: as the
carbon cap grows more stringent over time under plans that have been discussed, companies
may have to buy special permits at high prices, a cost that would likely be passed along to
consumers. Also, the program could have a negative impact on the overall economy: “As
utility rates go up, as well as rates of anything that uses energy in its production, the fear is
consumers will tighten their wallets, which could lead to cutbacks in production, consumer
spending and jobs” (Wall Street Journal).
CHAPTER 3
METHODOLOGY
This chapter describes the methodology used to achieve the aims and objectives of the
research outlined in the literature review. According to Saunders et al. (2001),
researchers need to take into consideration the research methodology and the detailed
procedures to be used. The constituents of each are described in turn in order to portray
the entire research process. This chapter consists of four sections. The research question
and objectives are discussed in Section 3.1, and Section 3.2 explains the data collection
methods and the methods chosen for this research. Section 3.3 describes the data mining
process, and Section 3.4 presents the conclusions.
3.1 Research question and objectives
This study sets out to analyse the CO2 emissions trends in car manufacturing in the USA and
how they impact on human life and society. The researcher intends to use data mining
techniques for the analysis.
As the literature review reveals, the data available on this subject are insufficient; this
study therefore seeks to add to the existing knowledge in the area. It is also hoped that
the outcome of this research will serve as a platform for further research in this area.
The aim of the research is to analyse the trends in CO2 emissions in car manufacturing and
evaluate their impact on human life and society in the USA.
3.2 Data collection methods
As stated by Saunders et al. (2012), two main types of data collection exist: qualitative
and quantitative research methods. According to Silverman (2005), the difference between
the two lies in the research design, which is very much of “real practical value”. For the
purpose of this research, a quantitative research method involving data mining techniques
is used.
The data collected from the internet, shown in the screenshot in Figure 3.2, are used for
this study. Descriptive statistics (a baseline technique) and data mining techniques were
used to get a sense of the data collected. The data were collected from the US Fuel Economy
website (www.fueleconomy.gov) and analysed using data mining techniques.
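As an illustration of the descriptive statistics mentioned above, the short Python sketch
below summarises a handful of CO2 tailpipe readings. It is only a sketch: the study itself
used WEKA, the function name `describe` is hypothetical, and the readings are illustrative
values, not figures from the study's dataset.

```python
import statistics

# Hypothetical CO2 tailpipe readings in grams per mile (illustrative values only,
# not taken from the study's dataset).
co2_gpm = [355.0, 423.0, 386.0, 467.0, 317.0, 404.0]


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


summary = describe(co2_gpm)
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

A summary of this kind gives a first sense of the range and spread of an attribute before
any clustering is attempted.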
3.3 Data mining process
Data mining techniques and the WEKA machine learning tool are used to achieve the aim
and the first objective of this study.
3.3.1 Data mining
Data mining refers to the process of discovering useful patterns or knowledge from data
sources such as databases, text, images and the web. The patterns discovered have to be
valid, potentially useful and understandable. It is also known as Knowledge Discovery in
Databases (KDD). Data mining is a multidisciplinary field involving machine learning,
statistics, databases, artificial intelligence, information retrieval and visualisation.
Data mining tasks include classification (supervised learning), clustering (unsupervised
learning), association rule mining and sequential pattern mining (B. Liu, 2011).
In this project, the author focused the data mining task on clustering, to group the data
into 5 clusters. This enables the researcher to identify the trend in CO2 emissions in car
manufacturing from a tailpipe perspective.
As a data analyst, the researcher started the data mining process by understanding the
application domain and then identified the target data for the data mining tasks, which
involved three main stages:
1. Pre-processing: the data were cleaned, and any abnormalities and irrelevant columns
were removed.
2. Data mining: the data were passed to the data mining algorithm, which gives patterns
or knowledge as its output.
3. Post-processing: the patterns useful for the application were identified, and a series
of evaluation and visualization techniques was used to support better decisions (B. Liu,
2011).
Data source: the official U.S. government source for fuel economy information.
Fuel economy data are the result of vehicle testing done at the Environmental Protection
Agency's National Vehicle and Fuel Emissions Laboratory.
Data set: the dataset consists of fuel economy data. The raw data consist of 34,437
instances and 71 attributes, and the file is 11,542 KB in size.
The dataset contains the following attributes:
co2TailpipeGpm => numeric
comb08 => numeric
cylinders => numeric
displ => numeric
highway08 => numeric
UCity => numeric
UHighway => numeric
Model => nominal
Trany => nominal
The dataset consists of 11 attributes and 34,631 instances.
Data description: the dataset consists of categorical, numeric and logical attributes.
In its original format, the data set is stored in tabular format with headers and with
accompanying documentation.
Vehicles.csv: this is the name of the data set from the fuel economy website. The column
names and definitions are given in the accompanying documentation.
The original file was downloaded from
http://www.fueleconomy.gov/feg/epadata/vehicles.csv.zip
WEKA, an open source software package with an Explorer window and bundled default datasets,
was then used to provide extensive support for the data mining processes.
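The attribute selection and cleaning described in this section were carried out with a
spreadsheet and WEKA. Purely as an illustration, the Python sketch below shows the same
idea: keep the listed attributes and drop rows with missing values. The function name
`clean_rows`, the inline sample and its values are assumptions made for the example, and
the column names are copied from the attribute list above, so their spelling in the raw
vehicles.csv file may differ.

```python
import csv
import io

# Attribute names as listed in the text: seven numeric, two nominal.
NUMERIC = ["co2TailpipeGpm", "comb08", "cylinders", "displ",
           "highway08", "UCity", "UHighway"]
NOMINAL = ["Model", "Trany"]


def clean_rows(csv_text):
    """Keep only the selected attributes and drop rows with missing values."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            record = {name: float(row[name]) for name in NUMERIC}
        except (KeyError, ValueError, TypeError):
            continue  # missing column, empty cell or non-numeric text
        if any(not row.get(name) for name in NOMINAL):
            continue  # a nominal attribute is missing
        record.update({name: row[name] for name in NOMINAL})
        cleaned.append(record)
    return cleaned


# Tiny hypothetical sample standing in for vehicles.csv (values are illustrative).
SAMPLE = """co2TailpipeGpm,comb08,cylinders,displ,highway08,UCity,UHighway,Model,Trany,extra
423.2,21,4,2.0,25,23.3,35.0,Spider,Manual 5-spd,x
,17,6,3.0,20,18.0,27.0,Coupe,Automatic 4-spd,y
555.4,16,8,5.0,19,16.5,26.0,Sedan,,z
"""

rows = clean_rows(SAMPLE)
print(len(rows), rows[0]["Model"])
```

In the sample, the second row lacks a CO2 value and the third lacks a transmission type, so
only the first row survives, and the irrelevant `extra` column is discarded.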
3.3.2 WEKA
WEKA is an open source machine learning tool with an Explorer window and bundled default
datasets, developed at the University of Waikato in New Zealand. It is written in Java, and
the current version was released recently. It is a cross-platform tool that runs on
multiple operating systems.
As shown below in Figure 3.1, the WEKA graphical interface consists of four applications:
Explorer, Experimenter, KnowledgeFlow and Simple CLI.
Figure 3.1: WEKA Graphical User Interface (GUI)
Source: http://weka.software.informer.com/screenshot/119641/
The WEKA workbench is a single environment that brings together a collection of machine
learning algorithms. It is a useful tool that provides extensive support throughout the
data mining process. This support includes preparing the input data, evaluating learning
schemes statistically, and visualising both the input data and the results of learning.
As a machine learning tool, WEKA provides implementations of learning algorithms that can
be quickly applied to datasets. Its tools can also be used to transform datasets: a dataset
can be pre-processed and fed into a learning scheme, and the resulting classifier and its
performance can be analysed, all without writing any code.
For the purpose of this project, WEKA is used through the GUI called Explorer. The
dataset is read in either ARFF or CSV (spreadsheet) file format. Decision trees can be
built, and many other algorithms can be explored. The Explorer interface offers different
menu choices to ensure that operations are carried out in an orderly manner (Witten et al.,
2011).
The raw data shown in the screenshot in Figure 3.2 were downloaded from the US Fuel
Economy site, www.fueleconomy.gov.
Figure 3.2: Screenshot of raw data collected
The data pre-processing stage in Figure 3.3 involves cleansing attributes that are not
supplied with instances, i.e. the removal of “noisy data contains errors and inconsistent
data contain discrepancies in codes or names” (Tang et al., 2011).
Figure 3.3: Data pre-processing stage
Source: http://www.zentut.com/data-mining/what-is-data-mining
The processed dataset, shown in Figure 3.4 below, was stored in comma-separated file format
and then loaded into WEKA using the Explorer GUI, as shown in Figure 3.5.
Figure 3.4: Screenshot of processed data
Once the data had been cleaned and the missing values dealt with, the resulting data file,
vehiclesa.csv, was loaded into WEKA; the pre-process stage in WEKA is shown below in
Figure 3.5. In WEKA, choosing the clusterer and specifying the evaluation method is one of
the most important tasks to be carried out. Since the researcher is interested in clusters,
the Cluster tab was selected. k-means, implemented in WEKA as SimpleKMeans, is the
best-known partitional clustering method and the most widely used of the clustering
algorithms because of its simplicity and effectiveness; SimpleKMeans was therefore chosen
from the drop-down list. All the attributes and their data types can be seen on the
right-hand side of the panel.
Clustering is the process of arranging data that are similar in some way into groups called
clusters. A cluster is a group of data instances that are similar to each other but
dissimilar to the data instances in other clusters (B. Liu, 2011).
The k-means algorithm involves two phases:
• It assigns examples to an initial set of k clusters.
• It then updates the assignments by adjusting the cluster boundaries according to the
examples that currently fall into each cluster (Nci, 2014).
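The stopping criteria are:
• No (or minimum) re-assignment of data points to different clusters.
• No (or minimum) change of centroids.
• Minimum decrease in the sum of squared error (SSE):
$$\mathrm{SSE} = \sum_{j=1}^{k} \sum_{x \in C_j} \mathrm{dist}(x, m_j)^2$$
where k is the number of clusters, C_j is the j-th cluster, m_j is the centroid of cluster C_j
and dist(x, m_j) is the distance between data point x and centroid m_j (Liu, B., 2011).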
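The SSE criterion above can be illustrated with a small worked example. The sketch below assumes Euclidean distance and uses made-up two-dimensional points; the function name sse and the data are illustrative and do not come from the project dataset.

```python
def sse(clusters, centroids):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    total = 0.0
    for points, centroid in zip(clusters, centroids):
        for point in points:
            total += sum((p - c) ** 2 for p, c in zip(point, centroid))
    return total


# Two illustrative clusters with their centroids (the means of their members).
clusters = [[(1.0, 1.0), (3.0, 1.0)], [(10.0, 10.0)]]
centroids = [(2.0, 1.0), (10.0, 10.0)]
sse_value = sse(clusters, centroids)  # 1 + 1 + 0 = 2.0
print(sse_value)
```

A smaller SSE means the instances lie closer to their centroids, which is why a run of k-means can stop once the SSE no longer decreases noticeably.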
The k-means process of assigning instances to the k clusters and updating the clusters is
repeated several times, until no further changes or improvements in the cluster fit occur. The
process then stops and the clusters are finalised (Nci, 2014).
WEKA has this algorithm built in and can perform the clustering within a few seconds, a task
that would take far longer if done manually.
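The two phases and the first stopping criterion described above can be sketched as follows. This is a minimal illustration of the general k-means idea and not WEKA's SimpleKMeans implementation: the function name kmeans, the seed, the iteration limit and the sample points are all assumptions, and Euclidean distance is assumed.

```python
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Plain k-means on tuples of numbers; returns (assignment, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct data points
    assignment = [None] * len(points)
    for _ in range(max_iter):
        # Phase 1: assign each point to its closest centroid.
        new_assignment = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_assignment == assignment:  # stopping criterion: no re-assignments
            break
        assignment = new_assignment
        # Phase 2: move each centroid to the mean of its current members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignment, centroids


# Illustrative points forming two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
assignment, centroids = kmeans(points, k=2)
print(assignment, centroids)
```

The cluster labels themselves are arbitrary; only the grouping of the instances is meaningful.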
 The k-Means Algorithm
(Liu, B., 2011)
Fig 3.5: Explorer user interface
3.4 Conclusions
The author followed the research process described above, which made it possible to determine
the most suitable approach for this dissertation. The author conducted the data collection and
data cleansing, and carried out the data mining and clustering in WEKA, to add credibility,
legitimacy and reliability to the data provided for the analysis.
CHAPTER 4
FINDINGS AND ANALYSIS
This chapter reviews and analyses the research carried out for this study. It is organised
into three main sections. Section 4.1 presents the findings and critiques them against the
views stated in the literature review chapter. Section 4.2 provides a comprehensive analysis
and answers the research question and objectives using the evidence gathered from the
findings, and the research objectives are linked to the findings in Section 4.3.
4.1 Findings
There were no missing values in the selected attributes. Data preparation reduced the size of
the dataset: all duplicates were removed, which left a smaller dataset than the original. The
Cluster panel in WEKA is shown in Figure 4.1.
Fig 4.1: Cluster panel in WEKA
It should be noted that the only parameter of the algorithm the researcher was interested in
adjusting was the numClusters field, which sets the number of clusters to create. The default
setting was changed from 2 to 5 and the researcher clicked OK to accept the value of k
(numClusters), as shown in Figure 4.2.
Figure 4.2: Clusters created in WEKA
Clustering is the process of arranging data that are similar in some way into groups called
clusters. A cluster is a group of data instances that are similar to each other but dissimilar
to the data instances in other clusters (Liu, B., 2011). The output shown in Figure 4.3 was
obtained from WEKA when the Start button was clicked.
Figure 4.3: Cluster output from WEKA
The end result was the 4 clusters shown below in Figure 4.4, i.e.
Cluster   Clustered instances   Co2TailpipeGpm
0         5836 (17%)            464.9743
1         8261 (24%)            617.9806
2         11871 (34%)           373.1918
3         8663 (25%)            505.2976
Figure 4.4: Cluster instances
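The percentages reported in Figure 4.4 can be checked directly from the instance counts, and the clusters can be ranked by their Co2TailpipeGpm centroid. The short sketch below uses only the figures reported above; the variable names are illustrative.

```python
# Instance counts and Co2TailpipeGpm centroids as reported in Figure 4.4.
counts = {0: 5836, 1: 8261, 2: 11871, 3: 8663}
centroids = {0: 464.9743, 1: 617.9806, 2: 373.1918, 3: 505.2976}

total = sum(counts.values())  # 34631 clustered instances
shares = {c: round(100 * n / total) for c, n in counts.items()}
ranked = sorted(centroids, key=centroids.get, reverse=True)

print(total)   # 34631
print(shares)  # {0: 17, 1: 24, 2: 34, 3: 25}
print(ranked)  # [1, 3, 0, 2], from highest to lowest emitter
```

The computed shares agree with the percentages in the WEKA output, and the ranking places cluster 1 as the highest emitter and cluster 2 as the lowest.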
4.2 Analysis of findings
Sections 4.2.1 and 4.2.2 below analyse the findings further and provide answers to the
research question and objectives.
Fig 4.5: Visualizing panel
At this stage barrels08 is plotted on the x-axis and Co2TailpipeGpm on the y-axis. From Fig 4.5
above it can be seen that there is a linear relationship between them. The Jitter slider is
used to reveal overlapping clusters by moving it from left to right. From this it can be seen
that all the clusters have a mixed response to the aspiration. The variables on both the x-axis
and the y-axis can be changed to visualise other aspects of the result. At this point it became
apparent that WEKA had generated an extra variable called ‘cluster’, which is not present in
the original dataset; it indicates the cluster membership of each instance.
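The strength of a linear relationship such as the one seen between barrels08 and Co2TailpipeGpm can be quantified with the Pearson correlation coefficient. The sketch below is illustrative only: the function name pearson is an assumption, and the five value pairs are made up to resemble the scale of the cluster centroids rather than taken from the dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Illustrative values only (barrels per year against grams of CO2 per mile).
barrels = [13.8, 15.0, 17.3, 19.4, 22.8]
co2 = [373.0, 404.0, 467.0, 522.0, 618.0]
r = pearson(barrels, co2)
print(r)
```

A coefficient close to 1 indicates that the points lie near a rising straight line, which is what the scatter plot in the Visualize panel suggests.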
Fig 4.6: Result from the Visualize panel
This is the result from the Visualize panel. It shows the different instances that belong to
each cluster. From this panel it is possible to see which clusters have the highest emissions.
Fig 4.7: Output of the percentage split
Fig 4.7 above shows a percentage split of 80% of the data for the training set and 20% for the
test set. The 20% test set used to test the model fits the model well. From this, car
manufacturers will be able to see where they are not complying with the emission reductions
that the government is emphasising.
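An 80%/20% percentage split of the kind described above can be sketched as follows. This is an assumption-laden illustration and not WEKA's own procedure: the function name percentage_split, the seed and the placeholder instances are hypothetical.

```python
import random


def percentage_split(instances, train_fraction=0.8, seed=1):
    """Shuffle the instances and split them into a training set and a test set."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder instances: the numbers 0..99 stand in for rows of the dataset.
instances = list(range(100))
train, test = percentage_split(instances)
print(len(train), len(test))
```

Shuffling before splitting avoids a test set that only contains the last rows of the file, which could otherwise be dominated by a single manufacturer or model year.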
4.2.1 Analysis of CO2 emissions trends
From the evidence gathered as shown in the findings and Figure 10 above, it can be analysed
further that:
Cluster 0: This cluster consists of 5836 instances, which is 17% of the total instances in the
dataset. It has a Co2TailpipeGpm emission of 464.9743. Its car model is the Mustang, with a
5-speed manual transmission. Ucity and UHighway are inversely proportional to highway08.
Cluster 1: This cluster has the F150 Pickup 4WD as its car model, with a 4-speed automatic
transmission and a Co2TailpipeGpm of 617.9806. It has the highest number of cylinders, 8.0,
which makes it the highest emitter, and a displacement (displ) of 5.0. It also has the highest
barrels08, 22.837 per annum, which increases its Co2TailpipeGpm emission.
Cluster 2: This cluster has the Civic model with a 5-speed (spd) manual transmission. It has
the lowest Co2TailpipeGpm emission, 373.1918, with a cylinder value of 4.0 and a displacement
of 2.0, which make it a very low emitter. This emission level is very important when
considering the global warming effect. This cluster also has the lowest barrels08 (annual
petroleum consumption in barrels), 13.8302 per annum, but the highest city08 (city MPG),
22.2531. Its highway08 of 28.9071, Ucity of 28.4256 and UHighway of 48.5959 are very high
compared to the other clusters.
Cluster 3: This cluster has the Caravan/Grand Caravan with 2-wheel drive and a 5-speed (spd)
automatic transmission. It is the second largest emitter, with a Co2TailpipeGpm of 505.2976,
a displacement of 6.0 and a cylinder size of 3.0. It has a highway08 of 21.5997, a Ucity of
19.6514 and a UHighway of 29.973.
In summary, the above analysis will help car users to identify a particular car to buy or to
decide which model of car to go for. It shows that the classes of cars in cluster 2 will be
viable and sought after by car users because vehicles in this category are low emitters of
CO2. However, limitations to the above might include consumers' economic power, taste and
fashion, and the collective family decisions reached by consumers when making a choice.
Also, the cluster analysis will help to identify the car manufacturers that are following the
rules and regulations of the USA government, as non-compliance may attract fines, penalties
and prosecution, which may lead to reputational damage for non-compliant car manufacturing
organisations and an eventual downturn in their volume of activities, earnings and market
share. The above analysis provides answers to research objective one.
4.2.2 Impact of CO2 emissions trends in human life and society
If most current and potential car users choose their cars from cluster 2 above, i.e. cars
that are low emitters of CO2 and low consumers of petroleum, this will reduce health problems
such as respiratory and cardiovascular illnesses, emphysema and asthma. These are caused by
the high level of CO2 emitted into the atmosphere, not only from motor vehicles but also from
other human activities, and they claim thousands of lives in the USA every year. Many
published studies have associated the abundance of CO2 with the level of ozone in the
atmosphere; increased ozone raises the mortality level, while particles are responsible for
cardiovascular and respiratory illness and asthma. This emission is very important when
considering the global warming effect, which has an adverse impact not only on the USA but
on the global environment and society. The above analysis answers research objective two.
Therefore, compliance with the CO2 emissions target will portray car manufacturers as good
corporate citizens and lead to improved earnings and growing market share. Furthermore,
reduced health hazards, less risk to human life, a less endangered environment and a better
society will be achieved. Moreover, as stated earlier, non-compliance will attract fines,
penalties, bad publicity and an impaired reputation for car manufacturers, while prolonged
health issues will be the order of the day in the USA. This provides answers to the research
question of this study.
CHAPTER 5
CONCLUSIONS AND RECOMMENDATION
The analysis of CO2 emissions in car manufacturing and the evaluation of their impact on human
life and society in the USA are important to the government of the USA, considering the risks
posed to human health and the damage done to the environment and society. There is no doubt
that the menace of global warming has not been properly addressed, especially in developed
nations, of which the USA is one. Also of serious concern is the fact that transportation
contributes about 32% of CO2 emissions in the USA. These reasons need to be considered
carefully; hence the importance of the subject matter.
However, based on the literature, there appears to be no comprehensive and detailed theory
concerning the above context. Hence, the research aimed to fill an important research gap
with regard to the evaluation of the impact of CO2 emissions on human life and society. This
chapter discusses and summarises the knowledge gained from this study and identifies
important areas for future research. Section 5.1 discusses the overall conclusions.
Recommendations and the contribution to knowledge are highlighted in Section 5.2, while
Section 5.3 discusses the limitations as well as areas for future research.
5.1 OVERALL CONCLUSIONS
The outcomes of this research study provide an understanding of how the analysis of CO2
emissions trends in the USA can improve car manufacturing processes and lead to the production
of innovative and hybrid cars, i.e. low emitters, e-cars and low combustion engines. The data
analysed provided key findings in this respect.
These findings support the linkage that exists between the trends in CO2 emissions and their
impact on society and human health.
It should be noted that the population of the United States is likely to bear an increasingly
disproportionate burden of death if no new restrictions are placed on carbon dioxide
emissions, by inhaling a greater abundance of deleterious chemicals due to carbon dioxide and
the climate change associated with it. Therefore, the logical step is to reduce carbon dioxide
emissions, which would reduce their warming effect and improve the health of people in the USA
and around the world who are currently suffering from health problems associated with air
pollution.
The opinions and views expressed in the review of literature were similar to the results from
the findings.
5.2 RECOMMENDATIONS AND CONTRIBUTION TO KNOWLEDGE
This study contributes to knowledge on human life and environmental issues, in particular the
impacts of CO2 emissions on human health, the environment, climate and global warming. It is
recommended that the USA government should require vehicle manufacturers that are not
compliant with the CO2 emissions target to offset part of the medical costs of people
suffering from diseases related to air pollution.
5.3 LIMITATIONS AND AREAS FOR FUTURE RESEARCH
There were limitations to this research which may have affected the comprehensiveness,
reliability and generalisation of the linkages discussed above. Primarily, the data collected
for this research were downloaded from the internet and were considered appropriate and
feasible for the study; data from a different source might provide different results. Also,
the results of this project relate to the USA and might differ in other country contexts, most
especially in national culture and in government policies and practices, differences which
could have an impact on the general applicability of the proposed linkages. The limitations
identified above suggest important areas for future research.
References
Aman, A. (1989), “Automotive Engine – A Future Perspective,”.
Breuer, K, Satish U. (2003). Emergency Management Simulations: An Approach to the
Assessment of Decision-Making Processes in Complex Dynamic Crisis Environments: From
Modeling To Managing Security: A System Dynamics Approach (González J.J, ed)
Kristiansand, Norway: Norwegian Academic Press, pp. 145–156.
Brian J.E. Jr. (1998). Carbon Dioxide and the Cerebral Circulation. Anesthesiology Vol.
88(5): pp. 1365–1386.
California Energy Commission. (2008), Building Energy Efficiency Standards for Residential
and Nonresidential Buildings. CEC-400-2008-001-CMF. Sacramento, CA: California Energy
Commission.
Collier, J. (2006), The Automobiles New York: Cavendish Benchmark, ISBN 0-7614-1877-6
Cline, W, (1992), The Economics of Global Warming: Institute for International Economics,
Washington D.C.
Fuglestvedt, J. S, Berntsen, T. K, Godal, O, Sausen, R, Shine, K. P, Skodvin, T. (2003),
Metrics of climate change: Assessing radiative forcing and emission indices, Climate
Change. Vol. 58 (3) pp. 267– 331
G. Mohan, R. Kumar, and T. Ravi, (2012), ‘Coalescing Clustering and Classification’ [Online]
IEEE Xplore. Available from:
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6719160
[Accessed on 23/05/2014]
Hammond, G. (2007), Time to give due Weight to the Carbon Footprint Issue: Nature 2007,
Vol. 445 (7125) pp. 256– 256
Hellman and Murrell, (1986), “Trends in Alternative Measures of Vehicle Fuel Economy,”
SAE Paper 861426.
Hertwich, E. G. (2005), Lifecycle approaches to Sustainable Consumption: A critical review
of Environment, Science & Technology. Vol. 39 (13) pp. 4673– 4684
Intergovernmental Panel on Climate Change (IPCC), (1992), Climate Change 1992: The
Supplementary Report to the IPCC Scientific Assessment (Cambridge: Cambridge
University Press).
Lash, J., Wellington, F. (2007), Competitive Advantage on a Warming Planet, Harvard
Business Review, Vol. 85 (3) pp. 94–104
Liu, B. (2011), Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data. 2nd
ed. Chicago, USA: Springer.
Lipsett M.J, Shusterman D.J, Beard R.R. (1994), Inorganic Compounds of Carbon, Nitrogen,
and Oxygen In: Patty’s Industrial Hygiene and Toxicology (Clayton G.D, Clayton F.D, ed).
New York: John Wiley & Sons, pp. 4523–4554.
Malliaris, Hsia and Gould, (1976), “Concise Description of Auto Fuel Economy in Recent
Years,” February 1976.
Matthews, H. S, Hendrickson, C. T, Weber, C. L. (2008), “The Importance of Carbon
Footprint Estimation Boundaries”, Environmental Science & Technology, Vol. 42
(16), pp. 5839–5842
Metz, B, Davidson, O. R, Bosch, P. R, Dave, R, Meyer, L. A. (2007), Climate change:
Mitigation. Contribution of Working Group III to the Fourth Assessment Report of the
Intergovernmental Panel on Climate Change; Cambridge University Press: Cambridge.
Motor Vehicle Facts and Figures, (1998), Washington, D.C: American Automobile
Manufacturers Association.
Murrell and Heavenrich, (1989), “Options for Controlling Global Warming: Impact from
Motor Vehicles,” U.S. EPA, EPA/AA/CTAB/89-08.
Murrell and Heavenrich, (1990), “Downward Trend in Passenger Car Fuel Economy – A
View of Recent Data,” U.S. EPA, EPA/AA/CTAB/90-01.
National Research Council (NRC) (2010), Advancing the Science of Climate Change: The
National Academies Press, Washington, DC.
Office of Transportation and Air Quality, (2012), “Light-Duty Automotive Technology,
Carbon Dioxide Emissions and Fuel Trends 1975 through 2011,” U.S. EPA420-R-12-001.
OSHA (Occupational Safety and Health Administration). (2012), Sampling and Analytical
Methods: Carbon Dioxide in Workplace Atmospheres. Available:
http://www.osha.gov/dts/sltc/methods/inorganic/id172/id172.html [accessed 7 April 2014].
Pechan & Associates, Inc., (2008), “The Emissions and Generation Resource Integrated
Database for 2007. Technical Support Document,” prepared for the U.S. Environmental
Protection Agency, Washington, D.C., September 2008.
Persily AK, Gorfain J. (2008), Analysis of Ventilation Data from the U.S. Environmental
Protection Agency Building Assessment Survey and Evaluation (BASE) Study. NISTIR
7145-Revised. Gaithersburg, MD: National Institute of Standards and Technology.
Peters, G. P, Marland, G, Hertwich, E. G, Saikku, L, Rautiainen, A, Kauppi, P. (2009),
“Trade, Transport, and Sinks extend the Carbon Dioxide Responsibility of Countries”
Climatic Change, DOI: 10.1007/s10584-009-9606-2.
Robert M. Heavenrich, (2006), “Light-Duty Automotive Technology and Fuel Trends 1975
through 2006,” U.S. EPA420-R-06-011.
R. Weber, (1997), ‘Data Preprocessing and Intelligent Data Analysis’ [Online] Academia.edu.
Available from
http://www.academia.edu/2883703/Data_Preprocessing_and_Intelligent_Data_Analysis
[Accessed on 23/05/2014]
Saunders, M. Lewis, P. and Thornhill, A. (2001). Research Methods for Business Students.
2nd Edition, Prentice-Hall, London.
Saunders, M. Lewis, P. and Thornhill, A. (2012). Research Methods for Business Students.
Harlow: Pearson.
Shelef M. (1994), Unanticipated Benefits of Automotive Emission Control: Reduction in
Fatalities by Motor Vehicle Exhaust Gas. Science of the Total Environment, Vol.
146–147, pp. 93–101.
Suh, S. (2006), Are services better for climate change? Environmental Science &
Technology, Vol. 40 (21), pp. 6555–6560
The Vehicle Production Group LLC, (2012), http://www.vpgautos.com/ last accessed March
2014.
U.S. Department of State (2007). Fourth Climate Action Report to the UN Framework
Convention on Climate Change: Projected Greenhouse Gas Emissions. U.S. Department of
State, Washington, DC, USA
U.S. Environmental Protection Agency, (1972), “Fuel Economy and Emission Control,”
November 1972.
US. Department Of Energy. (2014), [Online] Energy Efficiency & Renewable Energy. Available
from http://www.fueleconomy.gov/feg/epadata/vehicles.csv.zip [Accessed on
20/03/2014]
US. Department Of Energy. (2014), [Online] Energy Efficiency & Renewable Energy. Available
from http://www.fueleconomy.gov/feg/ws/index.shtml#vehicle [Accessed on 20/03/2014].
Weber, C. L, Matthews, H. S. (2008), Quantifying the Global and Distributional Aspects of
American Household Carbon Footprint, Ecological Economics, Vol. 66 (2−3), pp.
379–391
Weber, C. L, Matthews, H. S. (2007), Embodied Environmental Emissions in U.S.
International Trade, 1997−2004, Environmental Science & Technology, Vol. 41 (14), pp.
4875–4881
Weidema, B. P, Thrane, M, Christensen, P, Schmidt, J, Lokke, S. (2008), Carbon Footprint:
A Catalyst for Life-cycle Assessment, Journal of Industrial Ecology, Vol. 12 (1), pp. 3–6
Wiedmann, T, Minx, J. (2008), A Definition of “Carbon Footprint”. In Ecological Economics
Research Trends, Pertsova, C. C, Ed., Nova Science Publishers, Inc.: New York, pp. 1− 11.
Woods S.W, Charney D.S, Goodman W.K, Heninger G.R. (1988), Carbon Dioxide–induced
Anxiety: Behavioral, Physiologic, and Biochemical Effects of Carbon Dioxide in Patients
with Panic Disorders and Healthy Subjects. Arch Gen Psychiatry. Vol. 45(1), pp. 43–52.
Witten, I.H., Frank, E., and Hall, M.A. (2011), Data Mining: Practical Machine Learning Tools
and Techniques. 3rd ed. Burlington, USA: Morgan Kaufmann.
Zentut, (2014), ‘What is data mining’ [Online] Available from
http://www.zentut.com/data-mining/what-is-data-mining [Accessed on 20/05/2014]
8.0 Appendix
8.1 PROJECT PROPOSAL
TITLE: EVALUATION OF CO2 EMISSION RATE OF DIFFERENT CAR
MANUFACTURE IN UNITED STATE OF AMERICA.
WRITTEN BY Taofikat Shade Salawu, 13120387,
[email protected]
Higher Diploma in Science in Data Analytics
Date: 20- 02-2014
Objectives and Contribution to the Knowledge:
The project will provide a knowledge base on which other research can build. The main
objective of this project is to identify which car manufacturers have had a positive impact on
CO2 emission reduction. Further objectives are to identify the trend of reduction in CO2
emissions and to verify that the data collected are correct and up to date, so that an
accurate analysis of the datasets can be achieved.
The data has been generated, collected and stored; this project will visualise the statistical
patterns, trends and information hidden in the data
(http://en.wikipedia.org/wiki/data_mining#process).
Background: According to the US Environmental Protection Agency, greenhouse gases (GHG) trap
heat in the atmosphere, which makes the Earth warmer as their concentration increases. Human
beings are adding more of these gases to the atmosphere, and the effect of each gas on
climate change depends on three main factors: 1. how much, 2. how long and 3. how powerful
the gas is (US EPA, 2011), available at
http://www.epa.gov/climatestudents/basics/index.html, accessed on 18/02/14.
The majority of greenhouse gas emissions consist of CO2, while other GHGs such as CH4
(methane) and N2O (nitrous oxide) make up only a small percentage compared to CO2. This
project will look into how to calculate or measure the GHG emissions from different car types
in order to make better predictions for the future.
“Transportation: The combustion of fossil fuels such as gasoline and diesel to transport
people and goods is the second largest source of CO2 emissions, accounting for about 31% of
total U.S. CO2 emissions and 26% of total U.S. greenhouse gas emissions in 2011. This
category includes transportation sources such as highway vehicles, air travel, marine
transportation, and rail.” (US EPA, 2011).
This project will show what people can do to reduce emissions.
According to the US EPA, it is generally more difficult to calculate vehicle emissions of CH4,
N2O and fluorinated gases (US EPA, 2011).
Technical Approach: The technical approach will include research and collection of data, and
analysis of the data using different software applications, i.e. WEKA and RStudio.
Data Mining: This will include anomaly detection (outlier/change/deviation detection),
association rule learning, clustering, classification, regression and summarisation (Fayyad et
al., 2008). The final step is the verification of the patterns produced by the data mining
algorithms in the wider dataset. Available at (www.analythicsasaservice.org).
System/Dataset: The free, open source data mining software I will use is WEKA, a suitable
machine learning application written in the Java programming language.
The data I will use is from the Test Car List Data Files (US EPA, 2013), available at
www.epa.org/testcars/
RStudio is another software package that I will use to analyse the data. The dataset will
include two different years of car testing data.
Evaluation, Test and Analysis: This stage will involve collecting data from the data source
and importing it into WEKA through the Explorer or Experimenter in order to process the data,
i.e. classify, cluster, mine association rules, select attributes and visualise the output.
The analysis will include comparing two years of car models and the CO2 emissions of
different cars, and identifying the car type that gives the higher emission rate.
According to Witten et al. (2011), the analysis panel will be used to perform statistical
significance tests of one learning scheme
Special Resource Required: Not available at the moment
Consultation with Specialist Person(s):
Dr Leona Lecturer
Student Signature: Taofikat Shade Salawu, 13120387
References:
(US EPA, 2014), http://www.epa.gov/otaq/climate/whatyoucando.htm
(US EPA, 2013), http://www.epa.gov/otaq/climate/strategies.htm
Witten, I.H., Frank, E. and Hall, M.A. (2011), Data Mining: Practical Machine Learning Tools
and Techniques.
8.2
SPECIFICATION REQUIREMENT
TITLE: EVALUATION OF CO2 EMISSION IN
CAR MANUFACTURING IN USA.
By: Taofikat Shade Salawu
Student no: x13120387
Student Email :
[email protected]
1 INTRODUCTION
The main driver of global warming is the greenhouse gas (GHG) carbon dioxide.
It is produced mainly by burning fossil fuel and causes 60% of the anthropogenic greenhouse
effect. Due to its intensive use of fossil fuel, the transport sector is one of the main
emitters of CO2 (M. Achtnicht, 2011). Available at
http://link.springer.com/article/10.1007/s10584-011-0362-8
“Greenhouse gases trap heat in the atmosphere, which makes the Earth warmer.
People are adding several types of greenhouse gases to the atmosphere, and each gas's effect
on climate change depends on three main factor, how much, how long, and how powerful”
(US EPA, 2013) Available at http://www.epa.gov/climatestudents/basics/today/greenhousegases.html
1.1 Purpose
The purpose of this document is to set out the requirements for analysing the carbon dioxide
emission rates of different car manufacturers, and to develop the use case through which the
user can learn about the emission rates of different cars. This project will use different
software to analyse the data and to make future predictions.
The intended customers are: the automobile industry, road transport authorities and
environmental protection agencies.
The intended attributes are the vehicle manufacturer name and the four gases, i.e.
methane (CH4), hydrocarbon (HC), carbon monoxide (CO) and carbon dioxide (CO2).
Project Scope
The scope of this project is to determine which car manufacturers have the highest CO2
emissions, and what action can reduce CO2 emissions, by looking into the past history of CO2
emissions.
The document is designed to estimate the highest CO2 emissions of cars, with the intention of
characterising the higher-emitting cars in the dataset
(US EPA, 2012). Available at http://www.epa.gov/otaq/emission-factorsresearch/420r08010.pdf
The software applications to be used in this project are WEKA and R; their output should be
understandable by the user when choosing which car is environmentally friendly.
Description of the dataset
The dataset collected for this project is from the US EPA, available at
(http://www.epa.gov/oms/tcldata.htm).
The US EPA has a lot of data stored in its database, which is made available to the public
for research purposes.
In this project I chose 3 different years for the full training set and another year for a
subset of training.
The dataset consists of 4620 rows and 51 columns.
Data Source
US EPA, 2013 Available at http://www.epa.gov/oms/tcldata.htm
User interface:
Input data into WEKA:
This section involves the use of the Explorer in WEKA, where the data will be pre-processed.
The Explorer has six different panels that support different data mining tasks. The Preprocess
panel shows the maximum, minimum, mean and standard deviation of the data loaded into
WEKA.
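The four summary statistics listed above can be reproduced for any numeric attribute with a few lines of code. The sketch below is illustrative: the function name summarise and the sample values are assumptions, and it uses the sample standard deviation, which may or may not be the form WEKA reports.

```python
import statistics


def summarise(values):
    """Return the minimum, maximum, mean and sample standard deviation of a numeric list."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1)
    }


# Illustrative CO2 values in g/mi; not taken from the project dataset.
co2 = [370.0, 410.0, 465.0, 505.0, 620.0]
summary = summarise(co2)
print(summary)
```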
Below is the screenshot of the pre-processing stage.
Pre-processing stage
Cluster stage
Screenshot of attribute selection in WEKA Explorer
Definitions, Acronyms and Abbreviations:
Some of the abbreviations: CH4 (Methane), CO2 (Carbon dioxide), NOx
The rest will be filled in later as the project progresses.
User Requirements Definition:
Users include: Automobile industry, Government i.e. insurance company, Revenue,
Driver, car dealer, community.
The definition is not available at the moment.
Overview:
This section will describe what follows and how it will be organised.
Visualisation stage
KEY words
CO2
CAR MANUFACTURING in USA—overview,
GHG
Evaluation—effect on society
Analysis of Result.
Objective: 1. To verify whether car manufacturers are achieving their CO2 emission reduction
target.
2. To evaluate the impact on society if the CO2 emission strategy is achieved or not.
Do they achieve the target?
Under literature review.
Talk about Hybrid Car.
8.3.1 Management Progress Report 1
TITLE: EVALUATION OF CO2 EMISSION RATE
OF DIFFERENT CAR MANUFACTURE IN
UNITED STATE OF AMERICA.
By Taofikat Shade Salawu
X13120387,[email protected]
Date: 20- 02-2014
Higher Diploma in Science in Data Analytics
Table of Contents
Highlight Report
Highlight Report Purpose
Activity during the period
Issue arising
Research Task
Analysis tasks
Statistic Aspect
Machine Learning
The Key Achievement
Key Milestones Achieved Since Last Report
Planned work for Next Period
Appendix 2
Management progress report
1. Highlight Report from 08/03/14 to 16/03/14
2. Highlight Report Purpose: This Highlight Report provides the project supervisor with a
summary of the status of the project at agreed stages; this summary is used to monitor
progress. The supervisor will use the Highlight Report to advise on any potential problems or
areas of concern that may arise in the project.
3. Activity during the period:
During this period, I collected the data from the US EPA and carried out the data cleaning.
I started the analysis of the data by loading it into WEKA, where pre-processing took place.
Visualisation, classification, clustering and regression analysis were carried out.
4. Issue arising:
Initially it was hard to get code to clean the data, but in the end I found some that worked
perfectly. In this project the data collected will be analysed using statistical methods and
other data mining techniques.
5. Research Task:
• The research task included finding the dataset described in the project specification.
This was carried out by searching different websites before I decided to use the U.S.
Environmental Protection Agency.
• Getting to know the dataset was very tedious and consumed most of the time; the next step
of this task is the data cleansing.
• The next stage will be the data pre-processing.
• After the pre-processing comes the main data processing, which includes classification of
attributes, clustering, visualisation and association rules.
6. Analysis tasks:
6.1 Statistic Aspect: Use of Multiple Linear Regression
Use of R: Using this software, the data was successfully imported and the data frame was
visualised. Multiple linear regression: this model uses more than one x variable to estimate
the value of y. In this case samples of data were chosen from the raw dataset in order to
determine how the variables are related and how significant they are.
At the moment the multiple regression has been fitted. In the main project its strengths and
weaknesses will be discussed.
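The idea of estimating y from more than one x variable can be sketched with an ordinary least-squares fit. This is a minimal illustration and not the R model used in this project: the function name fit_multiple_regression and the data are hypothetical (the y values are generated from y = 50 + 20*x1 + 30*x2 so that the expected coefficients are known), and the solver assumes the x variables are not perfectly collinear.

```python
def fit_multiple_regression(xs, ys):
    """Least-squares fit of y = b0 + b1*x1 + ... + bp*xp via the normal equations."""
    rows = [[1.0] + list(x) for x in xs]  # prepend a column of ones for the intercept
    n = len(rows[0])
    # Build the augmented matrix [X^T X | X^T y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, ys))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


# Hypothetical predictors (x1, x2) with y generated from y = 50 + 20*x1 + 30*x2.
xs = [(1.0, 4.0), (2.0, 4.0), (2.0, 6.0), (3.5, 6.0), (5.0, 8.0)]
ys = [50 + 20 * x1 + 30 * x2 for x1, x2 in xs]
coefficients = fit_multiple_regression(xs, ys)
print(coefficients)
```

Because the sample y values follow the generating equation exactly, the fit recovers an intercept near 50 and slopes near 20 and 30; with real data the coefficients would instead describe the best linear approximation.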
Statistic aspect:
However, statistical techniques are driven by the data and are used to discover patterns and
build predictive models.
6.2 Machine Learning:
This was carried out with the Weka software, and the data was loaded successfully.
The tasks will include regression, classification, clustering, association and trend analysis, and will also
include determining gas mileage and fuel types for different car models.
7. The key achievements at this stage are as follows:
Data pre-processing has been achieved.
The statistical aspect is done.
8. Key Milestones Achieved Since Last Report:
9. Planned work for Next Period (After the Reading week) to 22/03/14
At this point I should be able to do the main body of the project including the final analysis.
Appendix 1
The above image shows the visualisation in the Weka software, with rated horsepower
on the x-axis and CO2 (g/mi) on the y-axis.
Appendix 2
This is the classifier output.
Appendix 3
MindGenius tools
8.3.1 EVALUATION OF CO2 EMISSIONS OF
DIFFERENT CAR MANUFACTURERS IN USA
TAOFIKAT SHADE SALAWU
X13120387
[email protected]
MANAGEMENT PROGRESS REPORT
HIGHER DIPLOMA IN SCIENCE IN DATA
ANALYTICS
MANAGEMENT PROGRESS REPORT 3
Highlight Report from 24/03/14 to 04/05/14
Highlight Report Purpose: This Highlight Report provides the project supervisor with a
summary of the status of the project at agreed stages; this summary is used to monitor progress.
The supervisor will use the Highlight Report to advise on any potential
problems or areas of concern that may arise.
Activity during the period:
DATA SOURCE: Official U.S. government source for fuel economy information. Fuel
economy data are the result of vehicle testing done at the Environmental Protection Agency's
National Vehicle and Fuel Emissions Laboratory.
Data set: the dataset consists of fuel economy data. The raw data consists of 34,437 records
and 71 attributes, and is 11,542 KB in size.
Data description: the dataset consists of categorical, numeric and logical attributes.
The dataset, in its original format, is a comma-delimited (CSV) file stored as tabular data with
headers, and it comes with accompanying documentation.
vehicles.csv: this is the name of the dataset on the U.S. fuel economy website. The
accompanying documentation lists the attribute (column) names and their
definitions.
The original file was downloaded from
http://www.fueleconomy.gov/feg/epadata/vehicles.csv.zip
DESCRIPTION OF THE DATA TYPE
The data used in this project consists of the following types:
As noted in the previous management report, the data consists of categorical, numeric and logical
data.
I did the data cleaning in Excel and saved the result as a comma-delimited CSV file. The dataset was
provided in CSV format by the source and was saved as vehicles.csv.
The description of the dataset was carried out.
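The cleaning in this project was done in Excel. As a hedged illustration of the same kind of step in code, the sketch below reads a small inline CSV sample and drops records whose CO2 value is unavailable. The column names (make, model, year, comb08, co2) and the use of -1 for "Not Available" follow the data description in section 8.4, but the sample rows themselves and the function name load_clean_rows are hypothetical.

```python
import csv
import io

# Hypothetical sample rows; only the column names follow the data description.
SAMPLE_CSV = """make,model,year,comb08,co2
Toyota,Prius,2013,50,178
Ford,F150,2013,17,523
Honda,Civic,1995,30,-1
Nissan,Leaf,2013,,0
"""

def load_clean_rows(handle):
    """Yield rows with numeric fields converted and unusable records dropped."""
    for row in csv.DictReader(handle):
        # Skip records with empty cells in the fields needed for analysis.
        if not row["comb08"] or not row["co2"]:
            continue
        co2 = int(row["co2"])
        # The data description uses -1 to mean "Not Available".
        if co2 == -1:
            continue
        yield {
            "make": row["make"],
            "model": row["model"],
            "year": int(row["year"]),
            "comb08": int(row["comb08"]),
            "co2": co2,
        }

clean_rows = list(load_clean_rows(io.StringIO(SAMPLE_CSV)))
print(len(clean_rows), "rows kept")
```

To apply the same filter to the downloaded file, the StringIO object could be replaced with an open file handle for vehicles.csv, assuming the file uses the column names listed in the accompanying documentation.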
The next stage is to review other people's work related to my project, i.e. to identify
papers and solutions related to the project.
GUI graphical user interface
WEKA
According to (I. Witten, E. Frank, M. Hall, 2011), the Weka workbench is a collection of machine
learning algorithms for data mining tasks. It contains many algorithms, and its design is
simple and easy to understand.
It was developed in New Zealand at the University of Waikato. The system is written in Java and
distributed under the GNU General Public License. It runs on Linux and Windows,
and can also run on a personal digital assistant. Weka has powerful tools for data pre-processing,
classification, clustering, regression, association rules and data visualisation. Weka is also well suited for developing new machine learning schemes.
Weka provides learning algorithms that are easily applied to a dataset, and allows the user to pre-process
the dataset, feed it into a learning scheme, and analyse the resulting classifier and its
performance.
(I. Witten, E. Frank, M. Hall, 2011)
THE ISSUE ARISING
The previous dataset was discarded because it was not sufficient to give an accurate analysis.
Pre-processing was done on the new dataset.
The main problem now is the choice of attributes, which I am contemplating changing.
They do not relate well in the graph.
At the moment the current issue is to find related work and gather useful information.
SCOPE
The scope of this project focuses on which car has the greatest CO2 emission reduction, and
on which manufacturers follow the U.S. government reduction standard.
THE RESEARCH QUESTION
What impact will adherence and non-adherence to the CO2 emission reduction strategy have on
car manufacturers and human life?
RESEARCH TASK:
To address the above research question, the following objectives were outlined:
1. To analyse and evaluate the trend of the CO2 emission reduction strategy across different
car manufacturers in the USA.
2. To evaluate the impact of the above on human life.
As a data analyst: the main goal of this project is to develop approaches, methods and tools
to improve, simplify and reduce the effort involved in preparing CO2 emission data for analytics
purposes, so as to encourage or help the community in selecting an appropriate type of car
with lower emissions and better fuel economy.
Quality related work: the review of related work is still in progress.
Problem at hand: clustering is the process of grouping the dataset into clusters; this is
carried out in Weka.
Weka is a collection of machine learning algorithms that can easily be applied to the
dataset.
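To make the idea of grouping the dataset into clusters concrete, the sketch below implements plain k-means in Python. This is an assumption for illustration only: the report does not state which Weka clusterer was used, and the (combined MPG, CO2 g/mile) pairs and starting centroids are hypothetical.

```python
import math

def kmeans(points, centroids, iterations=20):
    """Plain k-means: assign points to the nearest centroid, then recompute means."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its assigned points.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = new_centroids
    return labels, centroids

# Hypothetical (combined MPG, CO2 in g/mile) pairs
vehicles = [(48, 185), (50, 178), (45, 197), (17, 523), (15, 592), (18, 494)]
labels, centroids = kmeans(vehicles, centroids=[vehicles[0], vehicles[3]])
print(labels)
```

In this toy sample the low-emission and high-emission vehicles fall into separate clusters. With real attributes on very different scales, the values would normally be normalised first so that one attribute does not dominate the distance.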
Task for next stages:
This will include the analysis of related work and the analysis of CO2 to predict future
emissions and their impact on global warming.
Fig. 1: Weka user interface
The above figure shows the clustering stage in Weka.
8.4 Data Description available at
http://www.fueleconomy.gov/feg/ws/index.shtml#vehicle
vehicle

atvtype - type of alternative fuel or advanced technology vehicle

barrels08 - annual petroleum consumption in barrels for fuelType1 (1)

barrelsA08 - annual petroleum consumption in barrels for fuelType2 (1)

charge120 - time to charge an electric vehicle in hours at 120 V

charge240 - time to charge an electric vehicle in hours at 240 V

city08 - city MPG for fuelType1 (2)

city08U - unrounded city MPG for fuelType1 (2), (3)

cityA08 - city MPG for fuelType2 (2)

cityA08U - unrounded city MPG for fuelType2 (2), (3)

cityCD - city gasoline consumption (gallons/100 miles) in charge depleting mode
(4)

cityE - city electricity consumption in kw-hrs/100 miles

cityUF - EPA city utility factor (share of electricity) for PHEV

co2 - tailpipe CO2 in grams/mile for fuelType1 (5)

co2A - tailpipe CO2 in grams/mile for fuelType2 (5)

co2TailpipeAGpm - tailpipe CO2 in grams/mile for fuelType2 (5)

co2TailpipeGpm - tailpipe CO2 in grams/mile for fuelType1 (5)

comb08 - combined MPG for fuelType1 (2)

comb08U - unrounded combined MPG for fuelType1 (2), (3)

combA08 - combined MPG for fuelType2 (2)

combA08U - unrounded combined MPG for fuelType2 (2), (3)

combE - combined electricity consumption in kw-hrs/100 miles

combinedCD - combined gasoline consumption (gallons/100 miles) in charge
depleting mode (4)

combinedUF - EPA combined utility factor (share of electricity) for PHEV

cylinders - engine cylinders

displ - engine displacement in liters

drive - drive axle type

emissionsList

engId - EPA model type index

eng_dscr - engine descriptor; see
http://www.fueleconomy.gov/feg/findacarhelp.shtml#engine

evMotor - electric motor (kw-hrs)

feScore - EPA Fuel Economy Score (-1 = Not available)

fuelCost08 - annual fuel cost for fuelType1 ($) (7)

fuelCostA08 - annual fuel cost for fuelType2 ($) (7)

fuelType - fuel type with fuelType1 and fuelType2 (if applicable)

fuelType1 - fuel type 1. For single fuel vehicles, this will be the only fuel. For dual
fuel vehicles, this will be the conventional fuel.

fuelType2 - fuel type 2. For dual fuel vehicles, this will be the alternative fuel
(e.g. E85, Electricity, CNG, LPG). For single fuel vehicles, this field is not used

ghgScore - EPA GHG score (-1 = Not available)

ghgScoreA - EPA GHG score for dual fuel vehicle running on the alternative fuel (-1 = Not available)

guzzler - if G or T, this vehicle is subject to the gas guzzler tax

highway08 - highway MPG for fuelType1 (2)

highway08U - unrounded highway MPG for fuelType1 (2), (3)

highwayA08 - highway MPG for fuelType2 (2)

highwayA08U - unrounded highway MPG for fuelType2 (2),(3)

highwayCD - highway gasoline consumption (gallons/100miles) in charge
depleting mode (4)

highwayE - highway electricity consumption in kw-hrs/100 miles

highwayUF - EPA highway utility factor (share of electricity) for PHEV

hlv - hatchback luggage volume (cubic feet) (8)

hpv - hatchback passenger volume (cubic feet) (8)

id - vehicle record id

lv2 - 2 door luggage volume (cubic feet) (8)

lv4 - 4 door luggage volume (cubic feet) (8)

make - manufacturer (division)

mfrCode - 3-character manufacturer code

model - model name (carline)

mpgData - has Your MPG data; see yourMpgVehicle and yourMpgDriverVehicle

phevBlended - if true, this vehicle operates on a blend of gasoline and electricity
in charge depleting mode

pv2 - 2-door passenger volume (cubic feet) (8)

pv4 - 4-door passenger volume (cubic feet) (8)

rangeA - EPA range for fuelType2

rangeCityA - EPA city range for fuelType2

rangeHwyA - EPA highway range for fuelType2

trans_dscr - transmission descriptor; see
http://www.fueleconomy.gov/feg/findacarhelp.shtml#trany

trany - transmission

UCity - unadjusted city MPG for fuelType1; see the description of the EPA test
procedures

UCityA - unadjusted city MPG for fuelType2; see the description of the EPA test
procedures

UHighway - unadjusted highway MPG for fuelType1; see the description of the
EPA test procedures

UHighwayA - unadjusted highway MPG for fuelType2; see the description of the
EPA test procedures

VClass - EPA vehicle size class

year - model year

youSaveSpend - you save/spend over 5 years compared to an average car ($).
Savings are positive; a greater amount spent yields a negative number. For dual
fuel vehicles, this is the cost savings for gasoline.

sCharger - if S, this vehicle is supercharged

tCharger - if T, this vehicle is turbocharged
emissions

emissionsList
    emissionsInfo
        efid - engine family ID
        id - vehicle record ID (links emission data to the vehicle record)
        salesArea - EPA sales area code
        score - EPA 1-10 smog rating for fuelType1
        scoreAlt - EPA 1-10 smog rating for fuelType2
        smartwayScore - SmartWay Code
        standard - Vehicle Emission Standard Code
        stdText - Vehicle Emission Standard
fuel prices

fuelPrices
    midgrade - $ per gallon of midgrade gasoline (9)
    premium - $ per gallon of premium gasoline (9)
    regular - $ per gallon of regular gasoline (9)
    cng - $ per gallon of gasoline equivalent (GGE) of compressed natural gas (10)
    diesel - $ per gallon of diesel (9)
    e85 - $ per gallon of E85 (10)
    electric - $ per kw-hr of electricity (10)
    lpg - $ per gallon of propane (10)
yourMpgVehicle - summary of all Your MPG data for this vehicle

avgMpg - harmonic mean of average MPG shared by fueleconomy.gov users

cityPercent - average % city miles

highwayPercent - average % highway miles

maxMpg - maximum user average MPG

minMpg - minimum user average MPG

recordCount - number of records for this vehicle

vehicleId - vehicle record id (links Your MPG data to the vehicle record)
yourMpgDriverVehicle - summary of driver data reported for this vehicle

cityPercent - user average % city miles

highwayPercent - user average % highway miles

lastDate - date records were last updated (yyyy-mm-dd)

mpg - average MPG

state - state of residence

vehicleId - vehicle record ID (links Your MPG data to the vehicle record)
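The avgMpg field above is described as a harmonic mean rather than an ordinary average. A short sketch, using hypothetical user-reported MPG values, shows how the two differ. The harmonic mean is the natural average for MPG when each driver covers the same distance, because it averages fuel used per mile rather than miles per gallon.

```python
from statistics import harmonic_mean

# Hypothetical MPG values reported by three users for the same vehicle
user_mpg = [20.0, 30.0, 60.0]

# Harmonic mean: n divided by the sum of reciprocals, 3 / (1/20 + 1/30 + 1/60)
avg_mpg = harmonic_mean(user_mpg)

# Arithmetic mean for comparison; it overstates the fuel economy
arithmetic_mpg = sum(user_mpg) / len(user_mpg)

print(round(avg_mpg, 2), round(arithmetic_mpg, 2))
```

For these values the harmonic mean is lower than the arithmetic mean, which is always the case when the reported values are not all equal.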
Footnotes:
(1) 1 barrel = 42 gallons. Petroleum consumption is estimated using the Department of
Energy's GREET model and includes petroleum consumed from production and refining to
distribution and final use. Vehicle manufacture is excluded.
(2) EPA revised how MPG is calculated for 2008 and later model year vehicles. MPG
estimates for 1985-2007 model year vehicles have been updated to make them
comparable to the estimates for 2008 and later vehicles. These are not the original EPA
MPG estimates for these vehicles. For electric and CNG vehicles this number is MPGe
(gasoline equivalent miles per gallon).
(3) Unrounded MPG values are not available for some vehicles.
(4) This field is only used for blended PHEV vehicles.
(5) For model year 2013 and beyond, tailpipe CO2 is based on EPA tests. For previous
model years, CO2 is estimated using an EPA emission factor. -1 = Not Available.
(6) For PHEVs this is the charge depleting range.
(7) Annual fuel cost is based on 15,000 miles, 55% city driving, and the price of fuel
used by the vehicle.
(8) Interior volume dimensions are not required for two-seater passenger cars or any
vehicle classified as truck which includes vans, pickups, special purpose vehicles,
minivan and sport utility vehicles.
(9) Fuel prices for gasoline and diesel fuel are from the Energy Information
Administration and are updated weekly.
(10) Fuel prices for E85, LPG, and CNG are from the Office of Energy Efficiency and
Renewable Energy's Alternative Fuel Price Report and are updated quarterly.
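Footnotes (5) and (7) can be illustrated with a small worked calculation. The sketch below assumes that the 55% city and 45% highway split is applied as a distance-weighted harmonic mean of city and highway MPG, and that tailpipe CO2 for gasoline is estimated with a factor of 8,887 grams per gallon. That factor is a commonly cited EPA figure that is not stated in this document, so it should be treated as an assumption. The vehicle values and fuel price are hypothetical.

```python
ANNUAL_MILES = 15_000   # from footnote (7)
CITY_SHARE = 0.55       # from footnote (7)
# Assumption: commonly cited EPA emission factor for gasoline, not given in this document.
GASOLINE_CO2_G_PER_GALLON = 8887

def combined_mpg(city_mpg, highway_mpg):
    """Distance-weighted harmonic mean of city and highway MPG."""
    return 1.0 / (CITY_SHARE / city_mpg + (1.0 - CITY_SHARE) / highway_mpg)

def annual_gallons(city_mpg, highway_mpg):
    """Gallons of fuel used over the assumed annual mileage."""
    return ANNUAL_MILES / combined_mpg(city_mpg, highway_mpg)

def annual_fuel_cost(city_mpg, highway_mpg, price_per_gallon):
    """Annual fuel cost in dollars for the given fuel price."""
    return annual_gallons(city_mpg, highway_mpg) * price_per_gallon

def co2_grams_per_mile(mpg):
    """Estimated tailpipe CO2 in grams per mile for a gasoline vehicle."""
    return GASOLINE_CO2_G_PER_GALLON / mpg

# Hypothetical vehicle: 20 MPG city, 30 MPG highway, fuel at $3.50 per gallon
gallons = annual_gallons(20, 30)
cost = annual_fuel_cost(20, 30, 3.50)
print(round(gallons, 1), round(cost, 2), round(co2_grams_per_mile(25), 1))
```

The calculation makes the link between fuel economy and emissions explicit: under these assumptions, CO2 per mile is inversely proportional to MPG, so a vehicle with half the MPG emits twice the CO2 over the same distance.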