Low Latency Computations on Massive Data
Ion Stoica
CS Division, UC Berkeley
Fujitsu Symposium, Mountain View, June 5, 2013

Challenges
Data grows faster than Moore’s law*
Data is dirty
» uncurated, no schema, no consistent syntax and semantics
Complex questions, e.g.,
» Is there a virus outbreak?
» Is the building structurally safe?
*[IDC report, Kathy Yelick, LBNL]

Low Latency & Massive Data
May not be able to achieve both of them!
Even if all data is in memory, computation may take tens of seconds

Key Insight
Answers don’t always need to be exact
• Input is often noisy: exact computations do not guarantee exact answers
• Error is often acceptable if small and bounded
Best scale: ± 0.5 lb error
Speedometers: ± 2.5 % error (edmunds.com)
OmniPod Insulin Pump: ± 0.96 % error (www.ncbi.nlm.nih.gov/pubmed/22226273)

Error-bounded Computations
Error depends on sample size (S), not on original data size:
» error ~ 1/√S
» E.g., error of a poll on 1,000 people is “same” for a population of 1M or 100M people
New generation of scale-independent algorithms

What Does It Mean?
Can trade between an answer’s latency and accuracy
Rapid data growth is no longer a problem…
[Figure: Moore's Law, Data, and Error plotted over 2012 to 2020]
Moore’s Law error halves every two years
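
The poll example on the Error-bounded Computations slide can be checked with a few lines of Python. This is only an illustrative sketch, not code from the talk: the function name poll_standard_error is hypothetical, the worst-case proportion p = 0.5 is an assumption, and the formula is the textbook standard error of a sample proportion drawn without replacement (including the finite population correction).

```python
import math


def poll_standard_error(sample_size, population_size, p=0.5):
    """Standard error of a sample proportion for a simple random sample
    drawn without replacement (hypothetical helper, for illustration only).

    p is the true proportion; 0.5 is the assumed worst case.
    """
    if not 0 < sample_size <= population_size:
        raise ValueError("need 0 < sample_size <= population_size")
    if population_size == 1:
        return 0.0
    # Error of an ideal sample: shrinks as 1 / sqrt(S).
    base = math.sqrt(p * (1.0 - p) / sample_size)
    # Finite population correction: close to 1 whenever S << N.
    fpc = math.sqrt((population_size - sample_size) / (population_size - 1))
    return base * fpc


if __name__ == "__main__":
    # Same poll of 1,000 people, very different population sizes.
    for population in (1_000_000, 100_000_000):
        print(population, poll_standard_error(1_000, population))

    # Fixed population, growing sample: 4x the sample roughly halves the error.
    for sample in (1_000, 4_000, 16_000):
        print(sample, poll_standard_error(sample, 100_000_000))
```

Under these assumptions the population size enters only through the correction factor, which stays close to 1 as long as the sample is a small fraction of the data. That is why the error is effectively set by S alone, and why a larger sample (higher latency) buys a smaller error.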