National College of Ireland
HDip in Science in Data Analytics 2013/2014
Taofikat Shade Salawu
X13120387

ANALYSIS AND EVALUATION OF CARBON DIOXIDE (CO2) EMISSIONS TARGET IN CAR MANUFACTURING IN UNITED STATES OF AMERICA (USA)

Dissertation

ABSTRACT
In recent years it has become imperative for the government of the United States of America (USA) to sanction vehicle manufacturers that do not comply with the CO2 emissions targets it has set. Stricter regulations, such as a carbon tax and minimum and maximum caps on CO2 emissions, have been introduced to ensure compliance by the transportation sector, which accounts for about 32% of CO2 emissions in the USA. Analysing the trends in CO2 emissions in car manufacturing and evaluating their impact on human life and on society as a whole is of paramount importance because of the health risks they pose and their effect on climate change and global warming. A detailed review of the literature on CO2 emissions in vehicle manufacturing and their impact on human health and society in the USA was carried out to support the study. In addition, data obtained from the internet were analysed using data mining techniques. The evidence gathered suggests a link between the trends in CO2 emissions targets in vehicle manufacturing and their impact on human health and on society as a whole. The outcome of this research highlights the importance of compliance and non-compliance with the CO2 emissions targets, not only for car manufacturers but also for the US government and the American public.
TABLE OF CONTENTS
ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
CHAPTER 1: INTRODUCTION
1.1 BACKGROUND
1.2 SCOPE OF THE RESEARCH
1.3 RESEARCH QUESTION AND OBJECTIVES
1.4 STRUCTURE OF THE PROJECT
CHAPTER 2: LITERATURE REVIEW
2.1 CO2 EMISSIONS IN USA
2.2 CONCEPT OF CO2 EMISSIONS TARGET IN CAR MANUFACTURING IN USA
2.3 EMERGENCE OF HYBRID AND LOW-EMISSION CARS IN USA
2.3.1 Battery Electric Vehicles
2.3.2 Plug-in Hybrid Electric Vehicles
2.3.3 Hybrid Electric Vehicles
2.3.4 Fuel Cell Vehicles
2.4 IMPACT OF CO2 EMISSIONS ON HUMAN HEALTH AND SOCIETY
2.5 CONCLUSIONS
CHAPTER 3: RESEARCH METHODOLOGY
3.1 RESEARCH QUESTION AND OBJECTIVES
3.2 DATA COLLECTION METHODS
3.3 DATA MINING PROCESS
3.3.1 Data mining
3.3.2 Weka
3.4 CONCLUSIONS
CHAPTER 4: FINDINGS AND ANALYSIS
4.1 FINDINGS
4.2 ANALYSIS OF FINDINGS
4.2.1 ANALYSIS OF CO2 EMISSIONS TRENDS
4.2.2 IMPACT OF CO2 EMISSIONS ON HUMAN HEALTH AND SOCIETY
CHAPTER 5: CONCLUSIONS AND RECOMMENDATIONS
5.1 OVERALL CONCLUSIONS
5.2 RECOMMENDATIONS AND CONTRIBUTION TO KNOWLEDGE
5.3 LIMITATIONS AND AREAS FOR FUTURE RESEARCH
REFERENCES

LIST OF FIGURES
Chapter 2
Figure 2.1: USA CO2 emissions
Figure 2.2: USA CO2 emissions 1990-2012
Chapter 3
Figure 3.1: WEKA graphical user interface
Figure 3.2: Screenshot of raw data collected
Figure 3.3: Data re-processing stage
Figure 3.4: Screenshot of processed data
Figure 3.5: Explorer user interface
Chapter 4
Figure 4.1: Cluster panel in WEKA
Figure 4.2: Clusters created in WEKA
Figure 4.3: Cluster output from WEKA
Figure 4.4: Cluster instances
Figure 4.5: Visualized panel
Figure 4.6: Result from visualized panel

CHAPTER 1 INTRODUCTION
This chapter provides an overview of the project. Its background and importance are discussed in Section 1.1.
Section 1.2 presents the scope of the research, Section 1.3 outlines the aims, research question and objectives, and the final section describes the structure of the project.

1.1 BACKGROUND
The United States of America (USA) is one of the largest automotive markets in the world. On average, vehicle manufacturers in the USA produce about 8.1 million passenger vehicles annually. All the major European, Japanese and Korean automakers have one or more manufacturing facilities in the USA. The biggest auto companies producing in the USA include General Motors, Honda, Chrysler, Toyota, Nissan, Hyundai-Kia, BMW, Mazda, Mitsubishi, Subaru and Volkswagen (The Vehicle Production Group, 2012). Many of these manufacturers have set up engine and transmission plants in the USA. In recent years they have also carried out research and development initiatives to innovate and meet the challenges facing the sector. Major car manufacturers also carry out design and testing in the USA. The sector employs over 0.7 million people and accounts for about 5% of the USA's GDP (Amann, 1989). A substantial network of automotive parts suppliers serves the industry, and its shipments of about $200 billion account for 3% of total US manufacturing. According to the Center for Automotive Research, over 3 million jobs have been created in the automotive parts sector, which contributes more to economic wellbeing than any other manufacturing sector. In recent years, the USA has exported about 3 million vehicles, valued at $61 billion, to more than 180 countries, with additional exports of automotive parts valued at up to $67 billion. The USA is a premier location for the future of the auto industry. The reasons include its open investment policy, large consumer market, highly skilled workforce, available infrastructure and government incentives. It is evident from the above that the automotive sector contributes positively to the economy of the USA.
However, car manufacturing involves production processes that generate CO2, and most of the vehicles produced are powered by internal combustion engines, which emit CO2. It is therefore important to analyse trends in CO2 emissions. One purpose is to identify the car manufacturers that achieve the CO2 emissions targets. Another is to evaluate the impact of these trends on people and the environment. A better understanding of trends in CO2 emission reduction strategies in the automotive industry would help scientists evaluate their impact on human life in both the short and the long term. This raises the question: "How would compliance and non-compliance of auto manufacturers with the CO2 emission targets set by government agencies in the USA affect human life and the environment?" This question has not been addressed in detail in the literature. The author is currently pursuing a Higher Diploma in Science in Data Analytics at the National College of Ireland, Dublin. Many collaborative research projects are currently under way within the automotive sector to find a lasting solution to the problem of CO2 emissions into the atmosphere, supporting awareness campaigns on global warming. This project aims to analyse trends in CO2 emissions targets in car manufacturing in the USA using data mining techniques. It also aims to evaluate their impact on human life and the environment at large. It is anticipated that a better understanding of this impact will encourage the automotive industry to do more to achieve the CO2 emissions targets. This could be done either through innovative processes or through the production of hybrid vehicles. Chapter 2 elaborates further on the research gaps in the literature. The scope, general aims, research question and objectives are defined below.

1.2 SCOPE OF THE RESEARCH
This project focuses on analysing trends in CO2 emissions targets in vehicle manufacturing in the USA and their impact on human life and society.
The scope of the research is as follows:
(1) Analysis of trends in CO2 emissions reduction in car manufacturing in the USA.
(2) Evaluation of the impact of these results on human life and society at large.

1.3 RESEARCH AIMS, QUESTION AND OBJECTIVES
The aim of this project is to analyse trends in CO2 emissions targets in car manufacturing. It also evaluates the impact of the results of this analysis on human life and society. The outcome of the project will provide a basis for assessing compliance and non-compliance with initiatives on global warming and environmental issues in the USA and the world at large. Hence, the research question is as follows: What impact will adherence and non-adherence to CO2 emission reduction targets have on car manufacturers, human life and society? To address the research question, the following objectives were set:
(1) To analyse and evaluate trends in CO2 emission reduction strategies from the perspectives of different car manufacturers in the USA.
(2) To evaluate the impact of the above on human life and society.

1.4 STRUCTURE OF THE PROJECT
The thesis is organised into five chapters as follows:
Chapter 1 provides a general overview of the research, including its background, scope, aims, question and objectives.
Chapter 2 reviews the literature on CO2 emissions in car manufacturing in the USA and their impact on human life and society, covering the theoretical background of this study.
Chapter 3 describes the methodology.
Chapter 4 presents the findings and data analysis.
Chapter 5 provides conclusions and recommendations.

CHAPTER 2 LITERATURE REVIEW
The first chapter briefly explained the research background and highlighted the importance of the project. Understanding this topic matters because the car manufacturing sector is very important in many national economies. This chapter reviews the relevant literature to provide a theoretical background for this work.
The main areas of literature considered are CO2 emissions and car manufacturing in the USA, the control of CO2 emissions and the emergence of hybrid cars, and the impact of CO2 emissions on human life and society. The aim of reviewing these areas is to refine the research question. This chapter is organised into five sections. Sections 2.1 and 2.2 outline CO2 emissions in the USA and emissions reduction in the US car manufacturing sector, respectively. Section 2.3 presents the emergence of hybrid cars in the USA. Section 2.4 reviews the impact of CO2 emissions on human life and society, and Section 2.5 provides the conclusions.

2.1 CO2 emissions in USA
According to Matthews et al. (2008) and Wiedmann and Minx (2008), CO2 is the primary greenhouse gas (GHG) emitted, accounting for about 82% of all US GHG emissions from human activities. It is naturally present in the atmosphere as part of the earth's carbon cycle, i.e. the circulation of carbon among the atmosphere, oceans, soil, plants and animals. Human activities have altered the carbon cycle, both by adding more CO2 to the atmosphere and by affecting the ability of natural sinks, such as forests, to remove CO2 from it. CO2 is emitted by an array of natural sources. Even so, since the Industrial Revolution began around 1750, human activities have contributed significantly to climate change by adding CO2 and other heat-trapping gases to the atmosphere. The main human activity that emits CO2 is the combustion of fossil fuels (coal, natural gas and oil) for energy and transportation (Peters et al., 2009). Broadly, and as shown in Figure 2.1, the main sources of CO2 emissions in the USA are transportation, electricity generation and industry. CO2 emissions in the USA increased by about 5% between 1990 and 2012 (Weber and Matthews, 2007).
Because the combustion of fossil fuels is the largest source of GHG emissions in the USA, changes in emissions from fossil fuel combustion have historically been the dominant factor affecting total U.S. emission trends. These changes are in turn influenced by factors such as economic growth, energy prices, new technologies and seasonal temperatures (California Energy Commission, 2008).
Figure 2.1: USA Carbon Dioxide Emissions, by Source
(All emission estimates from the Inventory of U.S. Greenhouse Gas Emissions and Sinks: 1990-2012.)
As shown in Figure 2.2, the increase in CO2 emissions between 1990 and 2012 corresponded with increased energy use by an expanding economy and population. It also corresponded with overall growth in emissions from electricity generation. Transportation emissions also contributed to the 5% increase, largely because of an increase in the miles travelled by motor vehicles (Light-Duty Automotive Technology, 2012).
Figure 2.2: USA Carbon Dioxide Emissions, 1990-2012
(All emission estimates from the Inventory of U.S. Greenhouse Gas Emissions and Sinks: 1990-2012.)
Going forward, CO2 emissions in the United States are projected to grow by about 1.5% between 2005 and 2020.

2.2 Concept of CO2 emissions target in car manufacturing in USA
The combustion of fossil fuels such as gasoline and diesel to transport people and goods is the second largest source of CO2 emissions. In 2012 it accounted for about 32% of total U.S. CO2 emissions and 27% of total U.S. GHG emissions (Light-Duty Automotive Technology, 2012). This category includes transportation sources such as highway vehicles, air travel, marine transportation and rail. In its projections of GHG emissions, the US Department of State (2007) highlighted the CO2 emissions connected with the production of transportation fuels, focusing on well-to-wheel analysis.
However, this analysis ignores the CO2 emissions from the production of the vehicles themselves, which are the subject matter of this study. In recent years, motor vehicle production in the USA alone has emitted about 800 million metric tonnes of CO2 annually. This is comparable to the aviation industry (950 million metric tonnes) (U.S. Environmental Protection Agency). To get a proper picture of the carbon footprint of transportation, it is necessary to include the production of the vehicles. Motor Vehicles Facts and Figures (1998) highlighted that life-cycle assessments indicate that 50 percent of the GHG emissions of car manufacturing are related to materials. Because car manufacturing relies on complex international supply chains, a detailed analysis of the emissions from the production of imported components is essential. However, such data may be difficult to obtain owing to the complex nature of the materials involved. Analysing the CO2 emissions of different auto manufacturers in the USA has therefore become very important (Motor Vehicles Facts and Figures, 1998; The Vehicle Production Group LLC, 2012), and this is the basic motivation for this study. Emission reduction initiatives aim to improve car manufacturing processes and vehicle quality. The goal is for consumers to have an increasing choice of vehicles with high fuel economy and low CO2 emissions. Most automotive manufacturers are therefore adopting strategic innovation initiatives to increase fuel economy and thereby lower CO2 emissions (Malliaris et al., 1976; Peters et al., 2009; Light-Duty Automotive Technology, 2012). A number of innovation strategies have been adopted and implemented to improve quality (Shelef, 2004).
These initiatives have increased the relevance of a consumer-focused approach that responds quickly to complex consumer demands and rising expectations. As a result, nearly all car manufacturers today sell vehicles that can meet future CO2 emissions targets (Hellman and Murell, 1996; The Vehicle Production Group, 2012). The US automotive sector has improved in recent times, creating new value for consumers. This can be regarded as the ultimate way for car manufacturers to remain competitive, sustain growth and secure long-term profitability.

2.3 Emergence of hybrid and reduced-emission vehicles in USA
The pursuit of CO2 emissions reduction targets by the automotive industry has led to the introduction of innovative types of vehicles (Amann, 1989; Murell and Heavenrich, 1990; Heavenrich, 2006). These are hybrid cars and other reduced-emission vehicles that are less reliant on fossil fuels, whose combustion releases GHGs into the atmosphere. This in turn affects the value of products and the reputation of the organisation (Hellman and Murell, 1986; Lipset et al., 1994). According to Lash and Wellington (2007) and the U.S. EPA (2012), CO2 emissions reduction in car manufacturing will have a significant impact on market share in the long term. The most effective way to reduce CO2 emissions is to reduce fossil fuel consumption. Some of the initiatives employed by car manufacturers in the USA are discussed below.

2.3.1 Battery Electric Vehicles (BEV)
BEVs are propelled by one or more electric motors powered by rechargeable battery packs. They are energy efficient and reduce dependence on gasoline, since electricity is produced from domestic sources. They emit no tailpipe pollutants, although the power plants producing the electricity may emit pollution. BEVs have other performance benefits: they are quiet, have instant torque for quick acceleration and require less maintenance than vehicles with internal combustion engines.
However, they are more expensive than conventional vehicles and hybrids because of the cost of large battery packs. In recent years, manufacturers have been working to improve driving range, reduce costs and make public charging stations widely available (Collier, 2006; U.S. EPA, 2012).

2.3.2 Plug-in Hybrid Electric Vehicles (PHEV)
PHEVs are hybrid vehicles with high-capacity batteries that can be charged by plugging them into an electrical outlet or charging station. They emit no tailpipe pollutants when running on electricity alone. They can also store enough electricity from the power grid to reduce their gasoline consumption significantly under typical driving conditions (U.S. EPA, 2012). PHEVs have one of two basic configurations: series or blended. In a series PHEV, the electric motor is the only power source that turns the wheels, and the gasoline engine generates electricity. In a blended PHEV, both the engine and the electric motor are mechanically connected to the wheels, and both may propel the vehicle (U.S. EPA, 2012).

2.3.3 Hybrid Electric Vehicles (HEV)
HEVs combine the best features of internal combustion engines and electric motors, and can significantly improve fuel economy without sacrificing performance or driving range. HEVs are primarily propelled by an internal combustion engine. However, they also convert energy normally wasted during coasting and braking into electricity, which is stored in a battery until needed by the electric motor. This reduces their fuel consumption and tailpipe emissions (U.S. EPA, 2012).

2.3.4 Fuel Cell Vehicles (FCV)
FCVs are propelled by electric motors powered by fuel cells, which produce electricity from the chemical energy of hydrogen. Fuel cell technology is more efficient than internal combustion engines and environmentally cleaner: hydrogen fuel cell vehicles emit only water. However, many challenges must be overcome before FCVs can be mass-marketed or sold at local dealerships (U.S. EPA, 2012).
In addition, advances in diesel engine technology have improved performance, reduced noise and fuel odour, and decreased pollutant emissions. Compressed Natural Gas Vehicles (CNGVs), meanwhile, reduce dependence on petroleum and produce fewer smog-forming pollutants and GHGs.

2.4 Impact of CO2 emissions on human health and society
Weber and Matthews (2007) and Matthews and Hendrickson (2008) stressed that CO2 remains one of the prime reasons the earth can sustain life. Plants absorb CO2 to produce breathable air. CO2 also contributes to the greenhouse effect, a natural process in which light and heat from the sun reach the earth. Part of that heat is trapped by CO2 and other gases instead of radiating back out into space, which keeps the earth warm enough to support life. Moreover, scientists at Stanford University have spelled out direct links between increased levels of carbon dioxide in the atmosphere and increases in human mortality. Hence the need to study and evaluate the impact of CO2 emissions on human life and society. These findings came to light shortly after the U.S. Environmental Protection Agency ruled against states setting specific emission standards for GHGs. The agency cited a lack of data showing a link between CO2 emissions and health effects. Mark Jacobson of Stanford University, California, highlighted that each one-degree-Celsius increase in temperature caused by carbon dioxide would have serious consequences. The resulting air pollution would lead annually to about a thousand additional deaths and many more cases of respiratory illness and asthma in the USA (Woods et al., 1998). Hammond (2007) and Weidema et al. (2008) stated that chemical and meteorological changes due to CO2 emissions increase mortality through increased ozone, particles and carcinogens in the air. They also stated that the effects of CO2-induced warming are most significant where pollution is already severe. The health-related effects of CO2 emissions have thus been found to be most prominent in areas with significant pollution. As a result, warming due to CO2 is likely to worsen people's health more rapidly in the most polluted parts of the USA. Much of the US population has already been directly affected by climate change through the air it has breathed over the last few decades. These health effects would grow worse if temperatures continue to rise (OSHA, 2012). Brian (1998) further reported that rising carbon dioxide emissions raise temperatures, which in turn increase ozone and airborne particles. These cause and worsen respiratory and cardiovascular illnesses, emphysema and asthma. Many published studies have associated increased ozone with higher mortality, while particles have been linked to cardiovascular and respiratory illness and asthma. Furthermore, higher temperatures caused by CO2 increase the chemical rate of ozone production. The increased water vapour resulting from carbon dioxide-induced warming boosts ozone production further in urban areas (Hammond, 2007). Woods et al. (1998) and the California Energy Commission (2008) stated that CO2 emissions raise air temperatures more quickly than ground temperatures, which changes the vertical structure of the atmosphere in which pollutants form. It should be noted that, in the simulations underlying these findings, CO2 was the only input that was varied. The changes in ozone, particles, transport, clouds, emissions and other processes that affect pollution could therefore be attributed to CO2.
2.5 Conclusions
Carbon emissions, most notably carbon dioxide (CO2), are part of a collection of gases that degrade air quality and increase the greenhouse effect. Greenhouse gases have a direct influence on the environment, causing extreme weather changes, a global temperature increase, the loss of ecosystems and potentially hazardous health effects for people. Some have sought to regulate carbon emissions through federal government mandates, and there are two prevailing schools of thought regarding governmental control of carbon emissions. The first, a carbon tax, is exactly what it sounds like: companies are taxed directly, based on the amount of carbon they put into the atmosphere. The goal of a carbon tax is to persuade businesses and other organisations to reduce their total emissions. The second approach, which has been under study in recent years, is cap-and-trade legislation. In this system, the government sets a "cap" on the maximum amount of emissions it will allow and then auctions emissions allowances to companies up to that cap. Companies that cannot cover their emissions with their allowances must either reduce their emissions or buy allowances from other companies. This system is designed to promote stricter emissions standards without directly taxing companies. It does have some potential problems, however. Under the plans that have been discussed, the cap grows more stringent over time, so companies may have to buy permits at high prices. That cost would likely be passed on to consumers. The programme could also have a negative impact on the overall economy: "As utility rates go up, as well as rates of anything that uses energy in its production, the fear is consumers will tighten their wallets, which could lead to cutbacks in production, consumer spending and jobs" (Wall Street Journal).
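The short Python sketch below makes the difference between the two approaches concrete by computing what a single firm would pay under each. It is a hypothetical illustration only. The function names, the tax rate, the allowance price and the emission figures are all invented for the example and are not taken from this study or from any actual scheme. Real schemes also involve details this sketch ignores, such as the price paid for allowances at auction.

```python
# Hypothetical illustration: all numbers are made up for the example and are not
# figures from this study, from any actual tax rate or from any trading scheme.

def carbon_tax_cost(emissions_t: float, tax_per_t: float) -> float:
    """Carbon tax: the firm pays a fixed rate on every tonne of CO2 it emits."""
    return emissions_t * tax_per_t


def cap_and_trade_cost(emissions_t: float, allowances_t: float,
                       market_price_per_t: float) -> float:
    """Cap-and-trade: the firm must hold one allowance per tonne emitted.

    Only emissions above the allowances it already holds must be covered by
    buying extra allowances from other companies. A negative result means the
    firm has spare allowances it could sell. The price originally paid for the
    allowances at auction is ignored here.
    """
    shortfall_t = emissions_t - allowances_t
    return shortfall_t * market_price_per_t


if __name__ == "__main__":
    # Hypothetical firm emitting 1,000 tonnes of CO2 in a year.
    print(carbon_tax_cost(1000, 25.0))          # 25000.0
    print(cap_and_trade_cost(1000, 800, 30.0))  # 6000.0  (buys 200 t of allowances)
    print(cap_and_trade_cost(700, 800, 30.0))   # -3000.0 (100 t spare to sell)
```

Under the tax, the firm's bill depends only on its own emissions. Under cap-and-trade, it depends on the gap between emissions and allowances and on the market price of allowances. This is why a tightening cap can raise costs, as described above.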
CHAPTER 3 METHODOLOGY
This chapter describes the methodology used to achieve the aims and objectives of the research set out in Chapter 1. According to Saunders et al. (2001), researchers need to consider both the research methodology and the detailed procedures to be used. Each of these is described below in order to portray the entire research process. This chapter consists of four sections. Section 3.1 discusses the research question and objectives, and Section 3.2 explains the data collection methods chosen for this research. Section 3.3 describes the data mining process, and Section 3.4 presents the conclusions.

3.1 Research question and objectives
This study sets out to analyse trends in CO2 emissions in car manufacturing in the USA and how they affect human life and society. The researcher uses data mining techniques for the analysis. As the literature review reveals, the data available on this subject are insufficient, so this study seeks to add to the existing knowledge in the area. It is also hoped that the outcome of this research will serve as a platform for further research in this area. The aim of the research is to analyse trends in CO2 emissions in car manufacturing and to evaluate their impact on human life and society in the USA.

3.2 Data collection methods
As stated by Saunders et al. (2012), there are two main types of data collection: qualitative and quantitative research methods. According to Silverman (2005), the difference between the two lies in the research design, which is very much of "real practical value". For the purpose of this research, a quantitative research method involving data mining techniques was used.
The data collected from the internet were used for this study; a screenshot of them is shown in Figure 3.2. Descriptive statistics (a baseline technique) and data mining techniques were used to get a sense of the data and to analyse them. The data were collected from the US government fuel economy website (www.fueleconomy.gov).

3.3 Data mining process
Data mining techniques and the WEKA machine learning tool were used to achieve the aim and the first objective of this study.

3.3.1 Data mining
Data mining refers to the process of discovering useful patterns or knowledge from data sources such as databases, text, images and the web. The patterns discovered have to be valid, potentially useful and understandable. Data mining is also known as Knowledge Discovery in Databases (KDD). It is a multidisciplinary field involving machine learning, statistics, databases, artificial intelligence, information retrieval and visualisation. Data mining tasks include classification (supervised learning), clustering (unsupervised learning), association rule mining and sequential pattern mining (Liu, 2011). In this project, the data mining tasks focused on clustering, to group the data into 5 clusters. This enables the researcher to identify trends in tailpipe CO2 emissions across the vehicles produced by car manufacturers. The researcher began the data mining process by understanding the application domain. The researcher then identified the target data for the data mining tasks, which involved three main stages:
1. Pre-processing: the data were cleaned, and abnormal values and irrelevant columns were removed.
2. Data mining: the data were passed to the data mining algorithm, which produces patterns or knowledge as output.
3. Post-processing: the patterns useful for the application were identified, using a series of evaluation and visualisation techniques to support better decisions (Liu, 2011).
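The first two stages above can be sketched in plain Python to show roughly what happens when the data are cleaned and clustered. This is a minimal sketch under stated assumptions, not the tool chain actually used in this study, which was WEKA's Explorer. The function names and the tiny SAMPLE file are hypothetical. The column names come from the attribute list in the data source description of vehicles.csv, and the exact header spellings (for example lower-case model and trany) are assumptions to check against the file. The k-means step follows the procedure described in Section 3.3.2 (Liu, 2011): Euclidean distance, stopping when no point is re-assigned. WEKA's SimpleKMeans differs in some details. For example, its Euclidean distance normalises numeric attributes by default, which min_max_normalise imitates, and it can also use nominal attributes, which this sketch ignores. Python 3.8 or later is assumed for math.dist.

```python
import csv
import io
import math
import random

# Selected attributes (header spellings are assumptions to check against vehicles.csv).
NUMERIC = ["co2TailpipeGpm", "comb08", "cylinders", "displ",
           "highway08", "UCity", "UHighway"]
NOMINAL = ["model", "trany"]


def preprocess(csv_text):
    """Stage 1: keep the selected attributes, drop instances with missing or
    non-numeric values, and drop exact duplicates (first occurrence kept)."""
    rows, seen = [], set()
    for raw in csv.DictReader(io.StringIO(csv_text)):
        values = [(raw.get(name) or "").strip() for name in NUMERIC + NOMINAL]
        if "" in values:
            continue  # missing value
        try:
            numbers = tuple(float(v) for v in values[:len(NUMERIC)])
        except ValueError:
            continue  # noisy (non-numeric) entry
        key = numbers + tuple(values[len(NUMERIC):])
        if key in seen:
            continue  # duplicate instance
        seen.add(key)
        rows.append(key)
    return rows


def min_max_normalise(points):
    """Rescale every attribute to [0, 1]; a constant attribute becomes 0."""
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    return [tuple((v - lo) / (hi - lo) if hi > lo else 0.0
                  for v, lo, hi in zip(p, lows, highs)) for p in points]


def sse(points, centroids, labels):
    """Sum of squared errors: squared distance from each point to its centroid."""
    return sum(math.dist(p, centroids[j]) ** 2 for p, j in zip(points, labels))


def kmeans(points, k, init=None, max_iter=100, seed=0):
    """Stage 2: k-means. Assign each point to its closest centroid, recompute the
    centroids, and stop when no point is re-assigned (or after max_iter rounds)."""
    start = init if init is not None else random.Random(seed).sample(points, k)
    centroids = [list(c) for c in start]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # no re-assignment: stopping criterion met
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Hypothetical miniature of vehicles.csv: one duplicate row and one missing value.
SAMPLE = """make,model,trany,co2TailpipeGpm,comb08,cylinders,displ,highway08,UCity,UHighway
A,X1,Automatic 4-spd,423.19,21,4,2.0,25,23.3,35.0
A,X1,Automatic 4-spd,423.19,21,4,2.0,25,23.3,35.0
B,Y2,Manual 5-spd,,18,6,3.0,22,20.1,30.2
C,Z3,Manual 5-spd,555.4,16,8,5.0,20,17.9,27.5
"""

if __name__ == "__main__":
    rows = preprocess(SAMPLE)
    print(len(rows))  # 2: one duplicate and one row with a missing value were dropped
    scaled = min_max_normalise([r[:len(NUMERIC)] for r in rows])
    print(scaled[0][0], scaled[1][0])  # 0.0 1.0
    toy = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
    centroids, labels = kmeans(toy, k=2, init=[(1.0, 1.0), (8.0, 8.0)])
    print(labels)                                 # [0, 0, 0, 1, 1, 1]
    print(round(sse(toy, centroids, labels), 3))  # 0.193
```

k-means is sensitive to its starting centroids, so the demo passes explicit initial centroids to make the output reproducible. With init=None the sketch picks k random data points, as in step 1 of the algorithm.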
Data source: Official U.S. Government Source for Fuel Economy Information. Fuel economy data are the result of vehicle testing done at the Environmental Protection Agency's National Vehicle and Fuel Emissions Laboratory.
Data set: the dataset consists of fuel economy data. The raw data consist of 34,437 instances (rows) and 71 attributes, and are 11,542 KB in size. The dataset contains the following attributes:
co2TailpipeGpm (tailpipe CO2, grams per mile): numeric
comb08 (combined MPG): numeric
cylinders (number of engine cylinders): numeric
displ (engine displacement, litres): numeric
highway08 (highway MPG): numeric
UCity (unadjusted city MPG): numeric
UHighway (unadjusted highway MPG): numeric
Model (vehicle model): nominal
Trany (transmission): nominal
The dataset consists of 11 attributes and 34,631 instances.
Data description: the dataset consists of categorical, numeric and logical attributes. In its original format, it is stored as a table with headers and comes with accompanying documentation. vehicles.csv is the name of the dataset on the fuel economy website, and the accompanying documentation gives the column names and their definitions. The original file was downloaded from http://www.fueleconomy.gov/feg/epadata/vehicles.csv.zip. WEKA, an open-source machine learning workbench, was then used to support the data mining process.

3.3.2 WEKA
WEKA is open-source machine learning software developed at the University of Waikato in New Zealand. It provides an Explorer window and ships with example datasets. It is written in Java, so it runs on multiple operating systems. As shown in Figure 3.1, the WEKA graphical user interface offers four applications: Explorer, Experimenter, KnowledgeFlow and Simple CLI.
Figure 3.1: WEKA Graphical User Interface (GUI)
Source: http://weka.software.informer.com/screenshot/119641/

The WEKA workbench is a collection of machine learning algorithms and data processing tools in a single environment. It provides extensive support throughout the data mining process. This support includes input data preparation, statistical evaluation, and visualisation of both the input data and the results of learning. WEKA provides implementations of learning algorithms that can quickly be applied to datasets. Its tools can transform a dataset, pre-process it and feed it into a learning scheme. They can then analyse the resulting classifier and its performance, all without writing any code. In this project, WEKA was used through its Explorer GUI. The dataset can be read in ARFF or CSV (spreadsheet) format. The Explorer can build decision trees and allows many other algorithms to be explored. Its menus are organised so that operations are carried out in an orderly manner (Witten et al., 2011).

Figure 3.2 shows a screenshot of the raw data downloaded from the US fuel economy site, www.fueleconomy.gov.
Figure 3.2: Screenshot of raw data collected

The data re-processing stage, shown in Figure 3.3, involves cleansing attributes that have no values for the instances. This means removing noisy data, which contain errors, and inconsistent data, which contain discrepancies in codes or names (Tang et al., 2011).
Figure 3.3: Data re-processing stage
Source: http://www.zentut.com/data-mining/what-is-data-mining

The processed dataset, shown in Figure 3.4, was stored in comma-separated values (CSV) format. It was then loaded into WEKA using the Explorer GUI, as shown in Figure 3.5.
Figure 3.4: Screenshot of processed data

The data were cleaned and the missing values dealt with. The resulting data file, vehiclesa.csv, was then loaded into WEKA, and the pre-processing stage in WEKA is shown in Figure 3.5. In WEKA, choosing the clustering algorithm and specifying the evaluation method are among the most important tasks. Since the researcher was interested in clusters, the Cluster tab was selected. k-means is the best-known partitional clustering method and the most widely used, because of its simplicity and effectiveness. Hence SimpleKMeans, WEKA's implementation of k-means, was chosen from the drop-down list. All the attributes and their data types can be seen on the right-hand side of the panel.

Clustering is the process of arranging data that are similar in some way into groups called clusters. A cluster is a group of data instances that are similar to each other but dissimilar to data instances in other clusters (Liu, 2011). The k-means algorithm involves two phases. First, it assigns examples to an initial set of k clusters. It then updates the assignments by adjusting the cluster boundaries according to the examples that currently fall into each cluster (NCI, 2014). The stopping criteria are:
1. no (or minimal) re-assignment of data points to different clusters;
2. no (or minimal) change of centroids;
3. minimal decrease in the sum of squared error (SSE),
$$\mathrm{SSE} = \sum_{j=1}^{k} \sum_{\mathbf{x} \in C_j} \mathrm{dist}(\mathbf{x}, \mathbf{m}_j)^2$$
where C_j is the jth cluster, m_j is the centroid of cluster C_j and dist(x, m_j) is the distance between data point x and centroid m_j (Liu, 2011).
The assignment and update steps are repeated until the cluster fit no longer changes or improves. The process then stops and the clusters are finalised (NCI, 2014). WEKA provides this algorithm built in and can perform in a few seconds a clustering that would take a very long time by hand. The k-means algorithm is as follows (Liu, 2011):
Algorithm k-means(k, D)
1. choose k data points as the initial centroids (cluster centres)
2. repeat
3.   for each data point x ∈ D do
4.     compute the distance from x to each centroid;
5.     assign x to the closest centroid // a centroid represents a cluster
6.   endfor
7.   re-compute the centroids using the current cluster memberships
8. until the stopping criterion is met
Fig 3.5: Explorer user interface

3.4 Conclusions
The author followed the research process described above, which made it possible to determine the best possible approach for this dissertation.
The author conducted the data collection, data mining, data cleansing and clustering processes in WEKA to add credibility, legitimacy and reliability to the data provided for the analysis.

CHAPTER 4 FINDINGS AND ANALYSIS

This chapter reviews and analyses the research carried out for this study and is organised into two main sections. Section 4.1 presents the findings, for and against the views stated in the literature review chapter. Section 4.2 provides a comprehensive analysis and answers the research question and objectives using the evidence gathered in the findings, linking the research objectives to those findings.

4.1 Findings

There were no missing values in the attributes selected. Data preparation reduced the size of the dataset: all duplicate instances were removed, which left a smaller dataset than the original. The Cluster panel in WEKA is shown in Figure 4.1.

Fig 4.1: Cluster panel in WEKA

The only algorithm parameter the researcher adjusted was the numClusters field, which sets the number of clusters to be created. Its default value was changed from 2 to 4, and OK was clicked to accept this value of k (numClusters), as shown in Figure 4.2.

Figure 4.2: Clusters created in WEKA

The output shown in Figure 4.3 was obtained from WEKA when the Start button was clicked.

Figure 4.3: Cluster output from WEKA

The end result was the four clusters specified, shown in Figure 4.4:
Clustered Instances (with the centroid value of Co2TailpipeGpm)
Cluster | Instances | Share | Co2TailpipeGpm (g/mile)
0 | 5836 | 17% | 464.9743
1 | 8261 | 24% | 617.9806
2 | 11871 | 34% | 373.1918
3 | 8663 | 25% | 505.2976

Figure 4.4: Cluster instances

4.2 Analysis of findings

Sections 4.2.1 and 4.2.2 below analyse the findings further and answer the research question and objectives.

Fig 4.5: Visualize panel

At this stage, barrels08 is plotted on the x-axis and Co2TailpipeGpm on the y-axis. Figure 4.5 shows a linear relationship between them. Moving the Jitter slider from left to right spreads out overlapping points so that overlapping cluster members become visible; this shows that all the clusters have a mixed response with respect to aspiration. The variables on the x-axis and y-axis can be changed to visualise other aspects of the result. At this point it was noticed that WEKA had generated an extra attribute called 'Cluster', which is not present in the original dataset; it records the cluster membership of each instance.

Fig 4.6: Result from visualized panel

This is the result from the Visualize panel. It shows the instances that belong to each cluster, and from this panel it is possible to see which clusters have the highest emissions.

Fig 4.7: Output of percentage split

Figure 4.7 shows a percentage split of 80% of the data for the training set and 20% for the test set. The model performed well on the 20% test set. Car manufacturers will therefore be able to see where they are not complying with the emission reductions the government is emphasising.

4.2.1 Analysis of CO2 emissions trends

From the evidence gathered in the findings (Figures 4.3 and 4.4), the clusters can be analysed further as follows.

Cluster 0: this cluster consists of 5836 instances, which is 17% of the total instances in the dataset.
This cluster has a Co2TailpipeGpm emission of 464.9743 g/mile; its representative car model is the Mustang, with a 5-speed manual transmission. In this cluster, UCity and UHighway are inversely proportional to highway08.

Cluster 1: this cluster has the F150 Pickup 4WD as its representative model, with a 4-speed automatic transmission and a Co2TailpipeGpm of 617.9806. It has the highest number of cylinders (8.0), which makes it the highest emitter, and an engine displacement (displ) of 5.0. It also has the highest barrels08 value (annual petroleum consumption in barrels), 22.837, which increases its Co2TailpipeGpm emission.

Cluster 2: this cluster has the Civic as its representative model, with a 5-speed manual transmission. It has the lowest Co2TailpipeGpm emission, 373.1918, with 4.0 cylinders and a displacement of 2.0, which make it the lowest emitter. This is very important when considering the global warming effect. This cluster also uses the least petroleum per year, with a barrels08 value of 13.8302, and has the highest city08 (city MPG), 22.2531. Its highway08 of 28.9071, UCity of 28.4256 and UHighway of 48.5959 are high compared with the other clusters.

Cluster 3: this cluster has the Caravan/Grand Caravan 2WD as its representative model, with a 5-speed automatic transmission. It is the second-largest emitter, with a Co2TailpipeGpm of 505.2976, 6.0 cylinders and a displacement of 3.0. Its highway08 is 21.5997, its UCity 19.6514 and its UHighway 29.973.

In summary, this analysis can help car buyers decide which car or model to buy. It suggests that cars in cluster 2 will be viable and sought after because vehicles in this category are low emitters of CO2. However, consumers' choices may also be constrained by their economic power, taste and fashion, and collective family decisions.
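To make the ranking in this section easy to check, the short Python sketch below takes the centroid Co2TailpipeGpm values and instance counts reported in Figures 4.3 and 4.4, orders the clusters from highest to lowest emitter, and recomputes each cluster's share of instances. It is an illustration only: the dictionary layout and the function names are hypothetical and are not part of WEKA's output format.

```python
"""Rank the four SimpleKMeans clusters by their centroid CO2 emission.

The numbers are copied from the WEKA cluster output reported in this
chapter; the data layout and function names are illustrative only.
"""

# cluster id -> (number of instances, centroid Co2TailpipeGpm in g/mile)
CLUSTERS = {
    0: (5836, 464.9743),
    1: (8261, 617.9806),
    2: (11871, 373.1918),
    3: (8663, 505.2976),
}


def rank_by_emission(clusters):
    """Return cluster ids ordered from highest to lowest centroid CO2."""
    return sorted(clusters, key=lambda cid: clusters[cid][1], reverse=True)


def instance_shares(clusters):
    """Return each cluster's share of all instances as a whole percentage."""
    total = sum(count for count, _ in clusters.values())
    return {cid: round(100 * count / total) for cid, (count, _) in clusters.items()}


if __name__ == "__main__":
    print("Highest to lowest emitter:", rank_by_emission(CLUSTERS))
    print("Share of instances (%):", instance_shares(CLUSTERS))
```

Running it prints the order [1, 3, 0, 2], i.e. cluster 1 is the highest and cluster 2 the lowest emitter, with shares of 17%, 24%, 34% and 25%, which agree with Figure 4.4.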
The cluster analysis will also help to identify the car manufacturers that are following the rules and regulations of the US government; non-compliance may attract fines, penalties and prosecution, which may damage the reputation of non-compliant car manufacturers and eventually reduce their volume of activity, earnings and market share. This analysis answers research objective one.

4.2.2 Impact of CO2 emissions trends on human life and society

If most current and potential car users opt for cars from cluster 2 above, i.e. cars that are low emitters of CO2 and consume little petroleum, this could reduce health problems such as respiratory and cardiovascular illnesses, emphysema and asthma, which are associated with the high levels of CO2 emitted into the atmosphere not only by motor vehicles but also by other human activities, and which claim thousands of lives in the USA every year. Many published studies have associated higher CO2 levels with higher levels of ozone in the atmosphere; increased ozone raises mortality, while particulates are responsible for cardiovascular and respiratory illness and asthma. Lower emissions also matter for global warming, which affects not only the USA but the global environment and society as a whole. This analysis answers the question posed by research objective two.

Therefore, compliance with the CO2 emissions target will portray car manufacturers as good corporate citizens and improve their earnings and market share. Furthermore, it will reduce health hazards and risks to human life, and lead to a less endangered environment and a better society. Conversely, as stated earlier, non-compliance will attract fines, penalties, bad publicity and damaged reputations for car manufacturers, while health problems will persist in the USA. This answers the research question of this study.
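The 80/20 percentage split used to test the model in Section 4.2 (Figure 4.7) can be illustrated with a minimal Python sketch. It shuffles the instances with a fixed seed and cuts the list at 80%; this is similar in spirit to WEKA's "Percentage split" test option, but WEKA's own randomisation differs, so the exact instances chosen will not match. The function name, the seed and the use of instance ids as stand-ins for real rows are assumptions made for illustration.

```python
"""Illustrative 80/20 percentage split (not WEKA's exact procedure)."""

import random


def percentage_split(instances, train_percent=80.0, seed=1):
    """Shuffle a copy of the instances and split it into train and test lists."""
    if not 0 < train_percent < 100:
        raise ValueError("train_percent must be strictly between 0 and 100")
    shuffled = list(instances)              # leave the caller's data untouched
    random.Random(seed).shuffle(shuffled)   # fixed seed gives a repeatable split
    cut = round(len(shuffled) * train_percent / 100)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    # Hypothetical stand-in: one id per clustered instance (34,631 in total).
    train, test = percentage_split(range(34631))
    print(len(train), "training instances,", len(test), "test instances")
```

Fixing the seed makes the split repeatable, so a model can be re-evaluated on exactly the same held-out 20% after its settings are changed.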
CHAPTER 5 CONCLUSIONS AND RECOMMENDATIONS

The analysis of CO2 emissions in car manufacturing, and the evaluation of their impact on human life and society in the USA, are important to the US government, given the risks posed to human health and the damage done to the environment and society. There is no doubt that the threat of global warming has not been adequately addressed, especially in developed nations such as the USA. Also of serious concern is the fact that transportation contributes about 32% of CO2 emissions in the USA. These reasons need to be considered carefully, hence the importance of this subject. However, the literature reviewed does not appear to offer a comprehensive and detailed theory in this context. This research therefore aimed to fill an important gap regarding the evaluation of the impact of CO2 emissions on human life and society.

This chapter discusses and summarises the knowledge gained from this study and identifies important areas for future research. Section 5.1 discusses the overall conclusions, Section 5.2 highlights the recommendations and contribution to knowledge, and Section 5.3 discusses the limitations as well as areas for future research.

5.1 OVERALL CONCLUSIONS

The outcomes of this research study provide an understanding of CO2 emissions trends in the USA that can help improve car manufacturing processes, which have led to the production of innovative and hybrid cars, i.e. low emitters, e-cars and low-combustion engines. The data analysed provided key findings in this respect, and these findings support the link between the trends in CO2 emissions and their impact on society and human health.
It should be noted that the United States is likely to bear an increasingly disproportionate burden of deaths if no new restrictions are placed on carbon dioxide emissions, as people inhale a greater abundance of harmful chemicals associated with carbon dioxide and the climate change it causes. The logical step is therefore to reduce carbon dioxide emissions, which would reduce their warming effect and improve the health of people in the USA and around the world who currently suffer from health problems associated with air pollution. The opinions and views expressed in the literature review were similar to the results from the findings.

5.2 RECOMMENDATIONS AND CONTRIBUTION TO KNOWLEDGE

This study contributes to knowledge of human life and environmental issues, in particular the impacts of CO2 emissions on human health, the environment, climate and global warming. It is recommended that the US government should require vehicle manufacturers that are not compliant with the CO2 emissions target to offset part of the medical costs of people suffering from diseases related to air pollution.

5.3 LIMITATIONS AND AREAS FOR FUTURE RESEARCH

There were limitations to this research that may have affected the comprehensiveness, reliability and generalisability of the linkages discussed above. First, the data used for this research were downloaded from the internet; they were considered appropriate and feasible for the study, but data from a different source might give different results. Second, the results of this project concern the USA and might differ in other countries, especially where national culture and government policies and practices differ, which could affect the general applicability of the proposed linkages. These limitations suggest important areas for future research.

References

Aman, A. (1989), "Automotive Engine – A Future Perspective".
Breuer, K. and Satish, U. (2003), "Emergency Management Simulations: An Approach to the Assessment of Decision-Making Processes in Complex Dynamic Crisis Environments", in González, J.J. (ed.), From Modeling to Managing Security: A System Dynamics Approach, Kristiansand, Norway: Norwegian Academic Press, pp. 145–156.
Brian, J.E. Jr. (1998), "Carbon Dioxide and the Cerebral Circulation", Anesthesiology, Vol. 88 (5), pp. 1365–1386.
California Energy Commission (2008), Building Energy Efficiency Standards for Residential and Nonresidential Buildings, CEC-400-2008-001-CMF, Sacramento, CA: California Energy Commission.
Cline, W. (1992), The Economics of Global Warming, Washington, D.C.: Institute for International Economics.
Collier, J. (2006), The Automobile, New York: Cavendish Benchmark, ISBN 0-7614-1877-6.
Fuglestvedt, J.S., Berntsen, T.K., Godal, O., Sausen, R., Shine, K.P. and Skodvin, T. (2003), "Metrics of Climate Change: Assessing Radiative Forcing and Emission Indices", Climatic Change, Vol. 58 (3), pp. 267–331.
Hammond, G. (2007), "Time to Give Due Weight to the Carbon Footprint Issue", Nature, Vol. 445 (7125), p. 256.
Heavenrich, R.M. (2006), "Light-Duty Automotive Technology and Fuel Trends 1975 through 2006", U.S. EPA, EPA420-R-06-011.
Hellman and Murrell (1986), "Trends in Alternative Measures of Vehicle Fuel Economy", SAE Paper 861426.
Hertwich, E.G. (2005), "Life Cycle Approaches to Sustainable Consumption: A Critical Review", Environmental Science & Technology, Vol. 39 (13), pp. 4673–4684.
Intergovernmental Panel on Climate Change (IPCC) (1992), Climate Change 1992: The Supplementary Report to the IPCC Scientific Assessment, Cambridge: Cambridge University Press.
Lash, J. and Wellington, F. (2007), "Competitive Advantage on a Warming Planet", Harvard Business Review, Vol. 85 (3), pp. 94–104.
Lipsett, M.J., Shusterman, D.J. and Beard, R.R. (1994), "Inorganic Compounds of Carbon, Nitrogen, and Oxygen", in Clayton, G.D. and Clayton, F.D. (eds), Patty's Industrial Hygiene and Toxicology, New York: John Wiley & Sons, pp. 4523–4554.
Liu, B. (2011), Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, 2nd ed., Chicago, USA: Springer.
Malliaris, Hsia and Gould (1976), "Concise Description of Auto Fuel Economy in Recent Years", February 1976.
Matthews, H.S., Hendrickson, C.T. and Weber, C.L. (2008), "The Importance of Carbon Footprint Estimation Boundaries", Environmental Science & Technology, Vol. 42 (16), pp. 5839–5842.
Metz, B., Davidson, O.R., Bosch, P.R., Dave, R. and Meyer, L.A. (2007), Climate Change: Mitigation. Contribution of Working Group III to the Fourth Assessment Report of the Intergovernmental Panel on Climate Change, Cambridge: Cambridge University Press.
Mohan, G., Kumar, R. and Ravi, T. (2012), "Coalescing Clustering and Classification", IEEE Xplore [Online]. Available from: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6719160 [Accessed 23/05/2014].
Motor Vehicle Facts and Figures (1998), Washington, D.C.: American Automobile Manufacturers Association.
Murrell and Heavenrich (1989), "Options for Controlling Global Warming: Impact from Motor Vehicles", U.S. EPA, EPA/AA/CTAB/89-08.
Murrell and Heavenrich (1990), "Downward Trend in Passenger Car Fuel Economy – A View of Recent Data", U.S. EPA, EPA/AA/CTAB/90-01.
National Research Council (NRC) (2010), Advancing the Science of Climate Change, Washington, DC: The National Academies Press.
Office of Transportation and Air Quality (2012), "Light-Duty Automotive Technology, Carbon Dioxide Emissions and Fuel Trends 1975 through 2011", U.S. EPA, EPA420-R-12-001.
OSHA (Occupational Safety and Health Administration) (2012), Sampling and Analytical Methods: Carbon Dioxide in Workplace Atmospheres [Online]. Available from: http://www.osha.gov/dts/sltc/methods/inorganic/id172/id172.html [Accessed 7 April 2014].
Pechan & Associates, Inc. (2008), "The Emissions and Generation Resource Integrated Database for 2007: Technical Support Document", prepared for the U.S. Environmental Protection Agency, Washington, DC, September 2008.
Persily, A.K. and Gorfain, J. (2008), Analysis of Ventilation Data from the U.S. Environmental Protection Agency Building Assessment Survey and Evaluation (BASE) Study, NISTIR 7145-Revised, Gaithersburg, MD: National Institute of Standards and Technology.
Peters, G.P., Marland, G., Hertwich, E.G., Saikku, L., Rautiainen, A. and Kauppi, P. (2009), "Trade, Transport, and Sinks Extend the Carbon Dioxide Responsibility of Countries", Climatic Change, DOI: 10.1007/s10584-009-9606-2.
Saunders, M., Lewis, P. and Thornhill, A. (2001), Research Methods for Business Students, 2nd ed., London: Prentice-Hall.
Saunders, M., Lewis, P. and Thornhill, A. (2012), Research Methods for Business Students, Harlow: Pearson.
Shelef, M. (1994), "Unanticipated Benefits of Automotive Emission Control: Reduction in Fatalities by Motor Vehicle Exhaust Gas", Science of the Total Environment, Vols 146–147, pp. 93–101.
Suh, S. (2006), "Are Services Better for Climate Change?", Environmental Science & Technology, Vol. 40 (21), pp. 6555–6560.
The Vehicle Production Group LLC (2012), http://www.vpgautos.com/ [Last accessed March 2014].
U.S. Department of Energy (2014), Energy Efficiency & Renewable Energy [Online]. Available from: http://www.fueleconomy.gov/feg/epadata/vehicles.csv.zip [Accessed 20/03/2014].
U.S. Department of Energy (2014), Energy Efficiency & Renewable Energy [Online]. Available from: http://www.fueleconomy.gov/feg/ws/index.shtml#vehicle [Accessed 20/03/2014].
U.S. Department of State (2007), Fourth Climate Action Report to the UN Framework Convention on Climate Change: Projected Greenhouse Gas Emissions, Washington, DC: U.S. Department of State.
U.S. Environmental Protection Agency (1972), "Fuel Economy and Emission Control", November 1972.
Weber, C.L. and Matthews, H.S. (2007), "Embodied Environmental Emissions in U.S. International Trade, 1997–2004", Environmental Science & Technology, Vol. 41 (14), pp. 4875–4881.
Weber, C.L. and Matthews, H.S. (2008), "Quantifying the Global and Distributional Aspects of American Household Carbon Footprint", Ecological Economics, Vol. 66 (2–3), pp. 379–391.
Weber, R. (1997), "Data Preprocessing and Intelligent Data Analysis", Academia.edu [Online]. Available from: http://www.academia.edu/2883703/Data_Preprocessing_and_Intelligent_Data_Analysis [Accessed 23/05/2014].
Weidema, B.P., Thrane, M., Christensen, P., Schmidt, J. and Lokke, S. (2008), "Carbon Footprint: A Catalyst for Life Cycle Assessment", Journal of Industrial Ecology, Vol. 12 (1), pp. 3–6.
Wiedmann, T. and Minx, J. (2008), "A Definition of 'Carbon Footprint'", in Pertsova, C.C. (ed.), Ecological Economics Research Trends, New York: Nova Science Publishers, pp. 1–11.
Witten, I.H., Frank, E. and Hall, M.A. (2011), Data Mining: Practical Machine Learning Tools and Techniques, 3rd ed., Burlington, USA: Morgan Kaufmann.
Woods, S.W., Charney, D.S., Goodman, W.K. and Heninger, G.R. (1988), "Carbon Dioxide-Induced Anxiety: Behavioral, Physiologic, and Biochemical Effects of Carbon Dioxide in Patients with Panic Disorders and Healthy Subjects", Archives of General Psychiatry, Vol. 45 (1), pp. 43–52.
Zentut (2014), "What is Data Mining" [Online]. Available from: http://www.zentut.com/datamining/what-is-data-mining [Accessed 20/05/2014].

8.0 Appendix

8.1 PROJECT PROPOSAL

TITLE: EVALUATION OF CO2 EMISSION RATE OF DIFFERENT CAR MANUFACTURERS IN THE UNITED STATES OF AMERICA
WRITTEN BY Taofikat Shade Salawu, 13120387, [email protected]
Higher Diploma in Science in Data Analytics
Date: 20-02-2014

Objectives and Contribution to Knowledge: The project could serve as a knowledge base for further research. The main objective of this project is to identify which car manufacturers have achieved positive CO2 emission reductions, and to identify the trend of reduction in CO2 emissions.
A further objective is to verify that the data collected are correct and up to date, so that the analysis of the datasets is accurate. Once the data have been generated, collected and stored, this project will visualise the statistical patterns, trends and information hidden in the data (http://en.wikipedia.org/wiki/data_mining#process).

Background: According to the US Environmental Protection Agency, greenhouse gases (GHGs) in the atmosphere make the Earth warmer as their concentration increases. Human beings are adding more of these gases to the atmosphere, and the effect of each gas on climate change depends on three main factors: 1. how much of it there is, 2. how long it stays in the atmosphere, and 3. how powerful it is (US EPA, 2011), available at http://www.epa.gov/climatestudents/basics/index.html, accessed 18/02/14. Most greenhouse gas is CO2; the other GHGs, such as CH4 (methane) and N2O (nitrous oxide), account for only a small percentage compared with CO2. This project will look into how to calculate or measure the GHG emissions of different car types in order to make better predictions for the future. "Transportation: The combustion of fossil fuels such as gasoline and diesel to transport people and goods is the second largest source of CO2 emissions, accounting for about 31% of total U.S. CO2 emissions and 26% of total U.S. greenhouse gas emissions in 2011. This category includes transportation sources such as highway vehicles, air travel, marine transportation, and rail." (US EPA, 2011). This project will show what people can do to reduce emissions. According to the US EPA, it is generally more difficult to calculate vehicle emissions of CH4, N2O and fluorinated gases (US EPA, 2011).

Technical Approach: The technical approach will include research and collection of data, and analysis of the data using different software applications, i.e. WEKA and RStudio.
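As a first-order illustration of how tailpipe CO2 can be calculated for different car types, the Python sketch below converts fuel economy (miles per gallon) into grams of CO2 per mile and tonnes per year. It assumes the US EPA's commonly cited factors of about 8,887 g of CO2 per gallon of gasoline and 10,180 g per gallon of diesel, and a hypothetical default of 15,000 miles driven per year; it covers CO2 only and ignores CH4, N2O and fluorinated gases, which, as noted above, are harder to estimate.

```python
"""First-order tailpipe CO2 estimate from fuel economy (illustrative sketch).

Assumption: US EPA factors of about 8,887 g CO2 per gallon of gasoline and
10,180 g CO2 per gallon of diesel. Only CO2 is covered, not CH4 or N2O.
"""

GRAMS_CO2_PER_GALLON = {"gasoline": 8887.0, "diesel": 10180.0}


def co2_grams_per_mile(mpg, fuel="gasoline"):
    """Tailpipe CO2 in grams per mile for a vehicle with the given MPG."""
    if mpg <= 0:
        raise ValueError("mpg must be positive")
    return GRAMS_CO2_PER_GALLON[fuel] / mpg


def annual_co2_tonnes(mpg, miles_per_year=15000, fuel="gasoline"):
    """Annual tailpipe CO2 in metric tonnes (mileage default is hypothetical)."""
    return co2_grams_per_mile(mpg, fuel) * miles_per_year / 1_000_000


if __name__ == "__main__":
    for mpg in (15, 25, 35):
        print(mpg, "mpg:", round(co2_grams_per_mile(mpg), 1), "g/mile,",
              round(annual_co2_tonnes(mpg), 2), "t/year")
```

For example, a gasoline car averaging 25 mpg works out at about 355 g of CO2 per mile, or roughly 5.3 tonnes per year at 15,000 miles. Because emissions per mile are inversely proportional to MPG, improving a car from 15 to 25 mpg cuts its tailpipe CO2 by 40%.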
Data Mining: This will include anomaly detection (outlier/change/deviation detection), association rules, clustering, classification, regression and summarisation (Fayyad et al., 2008). The final step is to verify that the patterns produced by the data mining algorithms hold in the wider dataset. Available at (www.analythicsasaservice.org).

System/Dataset: The free, open-source data mining software I will use is WEKA, a machine learning application written in the Java programming language. The data I will use come from the Test Car List Data Files (US EPA, 2013), available at www.epa.org/testcars/. RStudio is another software package I will use to analyse the data. The dataset will include car testing data from two different years.

Evaluation, Test and Analysis: This stage will involve collecting data from the data source and importing it into WEKA, using the Explorer or the Experimenter to process the data, i.e. classify, cluster, mine association rules, select attributes and visualise the output. The analysis will compare car models across the two years, compare CO2 emissions between different cars, and identify the car types with the highest emission rates. According to Witten et al. (2011), the Analyse panel will be used to perform statistical significance tests of one learning scheme against others.

Special Resources Required: Not available at the moment.
Consultation with Specialist Person(s): Dr Leona Lecturer
Student Signature: Taofikat Shade Salawu, 13120387

References: (US EPA, 2014) http://www.epa.gov/otaq/climate/whatyoucando.htm; (US EPA, 2013) http://www.epa.gov/otaq/climate/strategies.htm; Witten, I.H., Frank, E. and Hall, M.A. (2011), Data Mining: Practical Machine Learning Tools and Techniques.

8.2 SPECIFICATION REQUIREMENT

TITLE: EVALUATION OF CO2 EMISSION IN CAR MANUFACTURING IN USA
By: Taofikat Shade Salawu, Student no: x13120387, Student Email: [email protected]

1 INTRODUCTION

The main driver of global warming is the greenhouse gas (GHG) carbon dioxide. It is produced mainly by burning fossil fuels and causes 60% of the anthropogenic greenhouse effect. Because of its intensive use of fossil fuels, the transport sector is one of the main emitters of CO2 (Achtnicht, 2011). Available at http://link.springer.com/article/10.1007/s10584-011-0362-8

"Greenhouse gases trap heat in the atmosphere, which makes the Earth warmer. People are adding several types of greenhouse gases to the atmosphere, and each gas's effect on climate change depends on three main factors: how much, how long, and how powerful" (US EPA, 2013). Available at http://www.epa.gov/climatestudents/basics/today/greenhousegases.html

1.1 Purpose

The purpose of this document is to set out the requirements for analysing the carbon dioxide emission rates of different car manufacturers, and to develop the use case that gives users knowledge about the emission rates of different cars. This project will use different software to analyse the data and make predictions. The intended customers are the automobile industry, the road transport authority and the Environmental Protection Agency. The intended attributes are the vehicle manufacturer name and four exhaust emissions, i.e. methane (CH4), hydrocarbons (HC), carbon monoxide (CO) and carbon dioxide (CO2).

Project Scope

The scope of this project is to determine which car manufacturers have the highest CO2 emissions, and to identify actions to reduce CO2 emissions by looking at the past history of CO2 emissions. Following the US EPA, the document is designed to estimate which cars have the highest CO2 emissions, with the intention of characterising the higher-emitting cars in the dataset.
(US EPA, 2012) Available at http://www.epa.gov/otaq/emission-factorsresearch/420r08010.pdf

The software applications used in this project are WEKA and R; their results should be understandable to users choosing which car is environmentally friendly.

Description of the dataset

The dataset collected for this project is from the US EPA, available at http://www.epa.gov/oms/tcldata.htm. The US EPA stores a large amount of data in its database and makes it available to the public for research purposes. In this project I chose data from three different years for the full training set and another year for a training subset. The dataset consists of 4,620 rows and 51 columns. Data source: US EPA, 2013, available at http://www.epa.gov/oms/tcldata.htm

User interface: Input data into WEKA

This section involves the use of the Explorer in WEKA, where the data will be preprocessed. The Explorer has six different panels that support different data mining tasks. The Preprocess panel shows the maximum, minimum, mean and standard deviation of the data loaded into WEKA. Below are screenshots of the preprocessing stage.

Pre-processing stage
Cluster stage
Screenshot of attribute selection in WEKA Explorer

Definitions, Acronyms and Abbreviations: some of the abbreviations are CH4 (methane), CO2 (carbon dioxide) and NOx (nitrogen oxides). The rest will be filled in as the project progresses.

User Requirements Definition: users include the automobile industry, government bodies (e.g. Revenue), insurance companies, drivers, car dealers and the community. The definition is not available at the moment.

Overview: This section will describe what to follow and how it will be organised.

Visualisation stage

Key words: CO2 car manufacturing in USA (overview); GHG evaluation (effect on society); analysis of results.

Objectives: 1. To verify whether car manufacturers are achieving their CO2 emission reduction targets. 2. To evaluate the impact on society of whether or not the CO2 emission strategy is achieved. Do they achieve the target?
Under literature review, talk about hybrid cars.

8.3.1 Management Progress Report 1

TITLE: EVALUATION OF CO2 EMISSION RATE OF DIFFERENT CAR MANUFACTURERS IN THE UNITED STATES OF AMERICA
By Taofikat Shade Salawu, X13120387, [email protected]
Date: 20-02-2014
Higher Diploma in Science in Data Analytics

Table of Contents: Highlight Report; Highlight Report Purpose; Activity during the period; Issues arising; Research Task; Analysis tasks; Statistic Aspect; Machine Learning; The Key Achievement; Key Milestones Achieved Since Last Report; Planned work for Next Period; Appendix 2

Management progress report
1. Highlight Report from 08/03/14 to 16/03/14
2. Highlight Report Purpose: This highlight report provides the project supervisor with a summary of the status of the project at agreed stages; this summary is used to monitor progress. The supervisor will use the highlight report to advise on any potential problems or issues that may arise in the project.
3. Activity during the period: During this period, I collected the data from the US EPA and carried out data cleaning. I started the analysis of the data by loading it into WEKA, where preprocessing took place. Visualisation, classification, clustering and regression analysis were carried out.
4. Issues arising: Initially it was hard to get code to clean the data, but in the end I found code that worked perfectly. The data collected will be analysed using statistical methods and other data mining techniques.
5. Research Task: The research task included finding the dataset described in the project specification. This was done by searching different websites before deciding on the US Environmental Protection Agency. Getting to know the dataset was very tedious and consumed most of the time. The next step of this task is data cleansing, followed by data preprocessing. After preprocessing comes the main data processing, which includes classification, clustering, visualisation and association rules.
6. Analysis tasks:
6.1 Statistic Aspect: Use of multiple linear regression in R. Using this software, the data were successfully imported and the data frame was visualised. Multiple linear regression uses more than one x variable to estimate the value of y; in this case a sample of data was chosen from the raw dataset to determine how the variables are related and how significant they are. The multiple regression has now been fitted, and its strengths and weaknesses will be discussed in the main project. Statistical techniques are driven by the data and are used to discover patterns and build predictive models.
6.2 Machine Learning: This is carried out in WEKA, into which the data were loaded successfully. The tasks will include regression, classification, clustering, association and trend analysis, and also determining the gas mileage and fuel types of different car models.
7. The key achievements at this stage are as follows: data preprocessing is complete and the statistical aspect is done.
8. Key Milestones Achieved Since Last Report:
9. Planned work for Next Period (after the reading week) to 22/03/14: At this point I should be able to complete the main body of the project, including the final analysis.

Appendix 1: The image shows the visualisation in WEKA, with rated horsepower on the x-axis and CO2 (g/mi) on the y-axis.
Appendix 2: Classifier output.
Appendix 3: MindGenius tools.

8.3.1 EVALUATION OF CO2 EMISSION OF DIFFERENT CAR MANUFACTURERS IN USA
TAOFIKAT SHADE SALAWU, X13120387, [email protected]
MANAGEMENT PROGRESS REPORT
HIGHER DIPLOMA IN SCIENCE IN DATA ANALYTICS

MANAGEMENT PROGRESS REPORT 3

Highlight Report from 24/03/14 to 04/05/14

Highlight Report Purpose: This highlight report provides the project supervisor with a summary of the status of the project at agreed stages; this summary is used to monitor progress. The supervisor will use the highlight report to advise on any potential problems or issues that may arise in the project.

Activity during the period:
DATA SOURCE: The official US government source for fuel economy information. Fuel economy data are the result of vehicle testing done at the Environmental Protection Agency's National Vehicle and Fuel Emissions Laboratory.
Data set: The dataset consists of fuel economy data. The raw data consist of 34,437 instances (rows) and 71 attributes, and the file is 11,542 KB in size.
Data Description: The dataset consists of categorical, numeric and logical attributes. In its original format it is comma-delimited (CSV), stored as tabular data with headers in a spreadsheet, and accompanied by documentation. vehicles.csv is the name of the dataset on the US fuel economy website; the accompanying documentation gives the names and definitions of the attributes (columns).
The original file was downloaded from http://www.fueleconomy.gov/feg/epadata/vehicles.csv.zip

DESCRIPTION OF THE DATA TYPE
As noted in the previous management report, the data consist of categorical, numeric and logical data. I carried out the data cleaning in Excel and saved the result as a comma-delimited CSV file. The dataset was downloaded from the source in CSV format and saved as vehicles.csv. The description of the dataset has been completed. The next stage is to review other people's work related to this project, i.e. to identify papers and solutions related to the project.

Figure: WEKA graphical user interface (GUI)

WEKA
According to (I. Witten, E. Frank, M. Hall, 2011), the Weka workbench is a collection of machine learning algorithms for data mining tasks. Its design is simple and easy to understand. It was developed at the University of Waikato in New Zealand. The system is written in Java and distributed under the GNU General Public License, and it runs on Linux and Windows and even on personal digital assistants. Weka provides powerful tools for data pre-processing, classification, clustering, regression, association rules and data visualization, and it is well suited for developing new machine learning schemes. Its learning algorithms are easily applied to a dataset, and it allows the user to pre-process a dataset, feed it into a learning scheme, and analyse the resulting classifier and its performance (I. Witten, E. Frank, M. Hall, 2011).

THE ISSUE ARISING
The previous dataset was discarded because it was not sufficient to give an accurate analysis, and pre-processing was done on the new dataset. The main problem now is the choice of attributes, which I am contemplating changing, because the chosen attributes do not appear to be related in the graph. The current task is to find related work and gather useful information.

SCOPE
The scope of this project is to identify which cars achieve the greatest CO2 emission reduction.
It will also determine which manufacturers follow the U.S. government's emission reduction standard.

THE RESEARCH QUESTION
What impact will adherence and non-adherence to the CO2 emission reduction strategy have on car manufacturers and on human life?

RESEARCH TASK
To address the above research question, the following objectives were outlined:
1. To analyse and evaluate the trend of CO2 emission reduction strategies across different car manufacturers in the USA.
2. To evaluate the impact of the above on human life.

From a data analytics perspective: The main goal of this project is to develop approaches, methods and tools that improve, simplify and reduce the effort involved in analysing CO2 emission data, so as to encourage and help the community to select appropriate cars with lower emissions and better fuel economy.

Quality Related work: The review of related work is still in progress.

Problem at Hand: Clustering is the process of grouping the dataset into clusters. This is carried out in Weka, a collection of machine learning algorithms that can easily be applied to a data set.

Task for next stages: This will include the analysis of related work and the analysis of CO2 data to predict future emissions and their impact on global warming.

Fig. 1: WEKA user interface. The figure above shows the clustering stage in Weka.
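Weka's clustering panel offers k-means through its SimpleKMeans clusterer. To make "grouping the dataset into clusters" concrete, the following is a minimal pure-Python sketch of k-means (Lloyd's algorithm) with Euclidean distance. It is an illustration, not Weka's implementation: the function names are hypothetical, the deterministic farthest-first seeding is an assumption chosen so that the result is reproducible, and the six 2-D points are synthetic rather than taken from the EPA data.

```python
"""Minimal k-means clustering sketch (Lloyd's algorithm), pure Python."""
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def farthest_first(points: Sequence[Point], k: int) -> List[Point]:
    """Deterministic seeding: start at points[0], then repeatedly add the
    point farthest from its nearest already-chosen centroid."""
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(tuple(nxt))
    return centroids


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Return (centroids, labels), where labels[i] is the cluster index of points[i]."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = farthest_first(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


if __name__ == "__main__":
    # Two well-separated synthetic groups (not EPA data).
    pts = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (10.0, 10.0), (10.2, 9.9), (9.8, 10.1)]
    centroids, labels = kmeans(pts, 2)
    print(labels)  # prints [0, 0, 0, 1, 1, 1]
```

Because Euclidean distance is scale-sensitive, attributes measured in very different units (for example litres of displacement versus grams of CO2 per mile) would normally be rescaled before clustering; this sketch leaves that step to the caller.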
8.4 Data Description available at http://www.fueleconomy.gov/feg/ws/index.shtml#vehicle

vehicle
atvtype - type of alternative fuel or advanced technology vehicle
barrels08 - annual petroleum consumption in barrels for fuelType1 (1)
barrelsA08 - annual petroleum consumption in barrels for fuelType2 (1)
charge120 - time to charge an electric vehicle in hours at 120 V
charge240 - time to charge an electric vehicle in hours at 240 V
city08 - city MPG for fuelType1 (2)
city08U - unrounded city MPG for fuelType1 (2), (3)
cityA08 - city MPG for fuelType2 (2)
cityA08U - unrounded city MPG for fuelType2 (2), (3)
cityCD - city gasoline consumption (gallons/100 miles) in charge depleting mode (4)
cityE - city electricity consumption in kw-hrs/100 miles
cityUF - EPA city utility factor (share of electricity) for PHEV
co2 - tailpipe CO2 in grams/mile for fuelType1 (5)
co2A - tailpipe CO2 in grams/mile for fuelType2 (5)
co2TailpipeAGpm - tailpipe CO2 in grams/mile for fuelType2 (5)
co2TailpipeGpm - tailpipe CO2 in grams/mile for fuelType1 (5)
comb08 - combined MPG for fuelType1 (2)
comb08U - unrounded combined MPG for fuelType1 (2), (3)
combA08 - combined MPG for fuelType2 (2)
combA08U - unrounded combined MPG for fuelType2 (2), (3)
combE - combined electricity consumption in kw-hrs/100 miles
combinedCD - combined gasoline consumption (gallons/100 miles) in charge depleting mode (4)
combinedUF - EPA combined utility factor (share of electricity) for PHEV
cylinders - engine cylinders
displ - engine displacement in liters
drive - drive axle type
emissionsList
engId - EPA model type index
eng_dscr - engine descriptor; see http://www.fueleconomy.gov/feg/findacarhelp.shtml#engine
evMotor - electric motor (kw-hrs)
feScore - EPA Fuel Economy Score (-1 = Not available)
fuelCost08 - annual fuel cost for fuelType1 ($) (7)
fuelCostA08 - annual fuel cost for fuelType2 ($) (7)
fuelType - fuel type with fuelType1 and fuelType2 (if applicable)
fuelType1 - fuel type 1. For single fuel vehicles, this will be the only fuel. For dual fuel vehicles, this will be the conventional fuel.
fuelType2 - fuel type 2. For dual fuel vehicles, this will be the alternative fuel (e.g. E85, Electricity, CNG, LPG). For single fuel vehicles, this field is not used.
ghgScore - EPA GHG score (-1 = Not available)
ghgScoreA - EPA GHG score for dual fuel vehicle running on the alternative fuel (-1 = Not available)
guzzler - if G or T, this vehicle is subject to the gas guzzler tax
highway08 - highway MPG for fuelType1 (2)
highway08U - unrounded highway MPG for fuelType1 (2), (3)
highwayA08 - highway MPG for fuelType2 (2)
highwayA08U - unrounded highway MPG for fuelType2 (2), (3)
highwayCD - highway gasoline consumption (gallons/100 miles) in charge depleting mode (4)
highwayE - highway electricity consumption in kw-hrs/100 miles
highwayUF - EPA highway utility factor (share of electricity) for PHEV
hlv - hatchback luggage volume (cubic feet) (8)
hpv - hatchback passenger volume (cubic feet) (8)
id - vehicle record id
lv2 - 2 door luggage volume (cubic feet) (8)
lv4 - 4 door luggage volume (cubic feet) (8)
make - manufacturer (division)
mfrCode - 3-character manufacturer code
model - model name (carline)
mpgData - has Your MPG data; see yourMpgVehicle and yourMpgDriverVehicle
phevBlended - if true, this vehicle operates on a blend of gasoline and electricity in charge depleting mode
pv2 - 2-door passenger volume (cubic feet) (8)
pv4 - 4-door passenger volume (cubic feet) (8)
rangeA - EPA range for fuelType2
rangeCityA - EPA city range for fuelType2
rangeHwyA - EPA highway range for fuelType2
trans_dscr - transmission descriptor; see http://www.fueleconomy.gov/feg/findacarhelp.shtml#trany
trany - transmission
UCity - unadjusted city MPG for fuelType1; see the description of the EPA test procedures
UCityA - unadjusted city MPG for fuelType2; see the description of the EPA test procedures
UHighway - unadjusted highway MPG for fuelType1; see the description of the EPA test procedures
UHighwayA - unadjusted highway MPG for fuelType2; see the description of the EPA test procedures
VClass - EPA vehicle size class
year - model year
youSaveSpend - you save/spend over 5 years compared to an average car ($). Savings are positive; a greater amount spent yields a negative number. For dual fuel vehicles, this is the cost savings for gasoline.
sCharger - if S, this vehicle is supercharged
tCharger - if T, this vehicle is turbocharged

emissions
emissionsList
  emissionsInfo
    efid - engine family ID
    id - vehicle record ID (links emission data to the vehicle record)
    salesArea - EPA sales area code
    score - EPA 1-10 smog rating for fuelType1
    scoreAlt - EPA 1-10 smog rating for fuelType2
    smartwayScore - SmartWay Code
    standard - Vehicle Emission Standard Code
    stdText - Vehicle Emission Standard

fuel prices
fuelPrices
  midgrade - $ per gallon of midgrade gasoline (9)
  premium - $ per gallon of premium gasoline (9)
  regular - $ per gallon of regular gasoline (9)
  cng - $ per gallon of gasoline equivalent (GGE) of compressed natural gas (10)
  diesel - $ per gallon of diesel (9)
  e85 - $ per gallon of E85 (10)
  electric - $ per kw-hr of electricity (10)
  lpg - $ per gallon of propane (10)

yourMpgVehicle - summary of all Your MPG data for this vehicle
  avgMpg - harmonic mean of average MPG shared by fueleconomy.gov users
  cityPercent - average % city miles
  highwayPercent - average % highway miles
  maxMpg - maximum user average MPG
  minMpg - minimum user average MPG
  recordCount - number of records for this vehicle
  vehicleId - vehicle record id (links Your MPG data to the vehicle record)

yourMpgDriverVehicle - summary of driver data reported for this vehicle
  cityPercent - user average % city miles
  highwayPercent - user average % highway miles
  lastDate - date records were last updated (yyyy-mm-dd)
  mpg - average MPG
  state - state of residence
  vehicleId - vehicle record ID (links Your MPG data to the vehicle record)

Footnotes:
(1) 1 barrel = 42 gallons. Petroleum consumption is estimated using the Department of Energy's GREET model and includes petroleum consumed from production and refining to distribution and final use. Vehicle manufacture is excluded.
(2) EPA revised how MPG is calculated for 2008 and later model year vehicles. MPG estimates for 1985-2007 model year vehicles have been updated to make them comparable to the estimates for 2008 and later vehicles. These are not the original EPA MPG estimates for these vehicles. For electric and CNG vehicles this number is MPGe (gasoline equivalent miles per gallon).
(3) Unrounded MPG values are not available for some vehicles.
(4) This field is only used for blended PHEV vehicles.
(5) For model year 2013 and beyond, tailpipe CO2 is based on EPA tests. For previous model years, CO2 is estimated using an EPA emission factor. -1 = Not Available.
(6) For PHEVs this is the charge depleting range.
(7) Annual fuel cost is based on 15,000 miles, 55% city driving, and the price of fuel used by the vehicle.
(8) Interior volume dimensions are not required for two-seater passenger cars or any vehicle classified as a truck, which includes vans, pickups, special purpose vehicles, minivans and sport utility vehicles.
(9) Fuel prices for gasoline and diesel fuel are from the Energy Information Administration and are updated weekly.
(10) Fuel prices for E85, LPG, and CNG are from the Office of Energy Efficiency and Renewable Energy's Alternative Fuel Price Report and are updated quarterly.
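The data description above names the CSV columns, and footnote (5) states that -1 marks a CO2 value that is not available. The cleaning in this project was done in Excel; as a scripted alternative, the following minimal Python sketch (standard library only) reads vehicles.csv, keeps a small subset of columns, converts the numeric ones, and drops rows that have a blank or non-numeric value in a numeric column or the -1 code in `co2TailpipeGpm`. The column subset and the function name `load_vehicles` are assumptions chosen for illustration, and the sample rows are synthetic.

```python
"""Sketch: load vehicles.csv, keep chosen columns, drop unusable rows."""
import csv
import io
from typing import Dict, Iterable, List

# Hypothetical subset of columns, named as in the data description.
TEXT_COLUMNS = ["make", "model"]
NUMERIC_COLUMNS = ["year", "displ", "cylinders", "co2TailpipeGpm"]
CO2_COLUMN = "co2TailpipeGpm"
NOT_AVAILABLE = -1.0  # footnote (5): -1 = Not Available


def load_vehicles(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Return one dict per usable row, with numeric columns converted to float."""
    rows: List[Dict[str, object]] = []
    for raw in csv.DictReader(lines):
        record: Dict[str, object] = {col: (raw.get(col) or "").strip() for col in TEXT_COLUMNS}
        try:
            for col in NUMERIC_COLUMNS:
                record[col] = float((raw.get(col) or "").strip())
        except ValueError:
            continue  # blank or non-numeric value in a numeric column
        if record[CO2_COLUMN] == NOT_AVAILABLE:
            continue  # CO2 not available for this vehicle
        rows.append(record)
    return rows


if __name__ == "__main__":
    # With the real file: with open("vehicles.csv", newline="") as f: data = load_vehicles(f)
    sample = io.StringIO(
        "make,model,year,displ,cylinders,co2TailpipeGpm\n"
        "MakeA,M1,2013,1.8,4,250\n"   # kept
        "MakeB,M2,2012,,6,400\n"      # dropped: displ is blank
        "MakeC,M3,2011,3.0,6,-1\n"    # dropped: CO2 not available
    )
    print(load_vehicles(sample))
```

Only the selected columns are kept, so a row is not rejected for missing values in attributes that the analysis does not use.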