Facebook, Twitter and Google generate petabytes of data every day. The Large Hadron Collider project discards large amounts of data it will never be able to analyse, hoping that nothing valuable has been thrown away. Interesting facts, but... why is Big Data important? Let's understand via an example.

Example: finding the optimal price

How should an insurer, or a bank, price a product so as to maximise profit?

1. The traditional way: commission a third-party survey, let the experts debate, and settle on an "optimal" price.
2. With a decision support system (data warehousing): pull competitors' pricing, web activity, market trends and transaction statistics into a repository, run statistical algorithms over the data warehouse, and compute the price that maximises profit.
3. The Big Data twist: the same pipeline, but the Volume, Variety and Velocity of those inputs now exceed what a classical data warehouse can handle.

Towards a digital nervous system

The decision support system is the fundamental building block of a digital nervous system: an organisation behaving like a biological nervous system, doing business @ the speed of thought. The cycle is: sense the data, interpret the data, decide, act (think Skynet, or Avatar). For the bank above: sense competitors' pricing, web activity, market trends and transaction statistics from the repository, decide on the optimal price, and act, for example with a mobile alert offering travel insurance.

Some numbers

From Gartner: 4.4 million IT jobs globally to support Big Data by 2015.

From International Data Corporation's (IDC) 6th annual study:
1. From 2005 to 2020, the digital universe will grow by a factor of 300, from 130 exabytes to 40,000 exabytes (40 trillion gigabytes): more than 5,200 gigabytes for every man, woman and child in 2020.
2. From now until 2020, the digital universe will roughly double every two years.
3. 33% of digital data might be valuable if analysed, compared with 25% today.

A short history of Hadoop

1996-2000: the Big Data problem is faced by all search engines.
2003-04: Google publishes the Google File System and MapReduce papers.
2005-06: Doug (Cutting) and Mike (Cafarella) spawn Hadoop off Nutch.
2010: Doug joins Cloudera; the 0.xx releases of Hadoop.
2013: YARN / MapReduce 2, the next-generation Hadoop.

Price advantage:
1. Clusters use commodity hardware, which is cheaper than one expensive server.
2. The software licence is free.

Architecture

Hadoop mirrors Google's stack: HDFS is modelled on the Google File System, and Hadoop MapReduce on Google's MapReduce paper. A user submits a MapReduce job against a file in HDFS; a single Name node tracks where the file's blocks live on the Data nodes, map tasks run on the Data nodes that hold those blocks, and their output is fed to the reduce tasks. (A minimal word-count sketch of this flow appears at the end of these notes.)

The ecosystem around Hadoop:
- Oozie: workflow scheduler for chains of jobs.
- Pig (from Yahoo) and Hive (from Facebook): higher-level languages that compile down to MapReduce.
- HBase: a structured store on top of HDFS.
- Sqoop/Flume: ingestion from structured stores and log streams into HDFS.
- Storm: stream processing.
- Chukwa: log collection.
- Kafka: message broker.

Simple beats complex

1. Complex algorithms need to be correctly sensitive to weak correlations.
2. Complex algorithms are thus difficult to code and design.

Hence the trade-off usually favours a simple algorithm on a large dataset over a complex algorithm on a small dataset.

Data Engineer vs Data Scientist

         | Data Engineer                          | Data Scientist
Role     | To engineer software solutions.        | To solve business problems using data.
Skills   | More of programming and technical      | Strong mathematical skills and an
         | skills, and the ability to architect   | understanding of statistical models.
         | technical solutions.                   |

Hadoop distributions

Apache Hadoop:
-> Skeleton version.
-> All the ecosystem members need to be installed separately.

Cloudera:
-> Important ecosystem members included.
-> A few proprietary tools, such as the Enterprise Manager.

MapR:
-> Proprietary Hadoop code written in C.
-> Integrated with Hadoop ecosystem members.

Hortonworks:
-> Based on Apache Hadoop.
-> Supports the .NET framework.

Pivotal:
-> Launches its Hadoop distribution: Pivotal HD.

Thank You!!!

Superstar Doug!!! A small fan (me), and the real Hadoop: the toy elephant the project is named after.
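A minimal MapReduce example

To make the map/reduce flow above concrete, here is a minimal sketch of the classic word-count job, written against Hadoop's Java MapReduce API (org.apache.hadoop.mapreduce, the MapReduce 2 style mentioned in the timeline). The class names and the jar name below are illustrative assumptions, not something from the deck; the two command-line arguments are assumed to be HDFS paths.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: runs on the Data nodes that hold the input blocks;
  // emits (word, 1) for every token in its split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: receives every count emitted for one word and sums them.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    // The reducer doubles as a combiner: partial sums are computed
    // on the map side before anything crosses the network.
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // HDFS output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Packaged into a jar, it would be launched with something like "hadoop jar wordcount.jar WordCount /user/me/input /user/me/output": the Name node tells the framework where the input blocks live, the map tasks run next to those blocks, and the summed counts land back in HDFS.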