Big Data Complexities for Scientific Computing in the Oil and Gas Industry
noSQL, SQL, and mo’SQL
http://www.limitpoint.com/images/Publications/BigDataInOilAndGas.pdf
David M. Butler, President
Limit Point Systems, Inc.
© Limit Point Systems, Inc. 2014

Outline
- Big Data in oil & gas exploration & production
- Field theory for data scientists
- The data model paradigm
- The sheaf data model
- A query language for the sheaf data model

The oil and gas business
[Figure: oil and gas value chain, adapted from [Krebbers]]
- “Upstream” is exploration and production (“E&P”) (upper left of figure)
- “Downstream” is transportation, refining, and marketing (lower right of figure)

Major acquired “upstream” data types
- time-lapse raw seismic
- time-lapse prestack seismic image
- time-lapse poststack seismic image
- well logs
- production monitoring
- dozens of other data types

Time-lapse raw seismic data
- each sensor gives amplitude as a function of time
  - ~10K sensors, moving towards ~1M
  - ~10K shots
  - ~5K samples/shot
  - ~4 to 12 bytes/sample
- time lapse: repeat ~2/year for ~10 years (figure from [KrisEnergy])
- ~10 TB/project × ~100 projects/year/major company ≈ ~1 PB/year/major

Time-lapse prestack seismic image data
- clean up seismic data
  - remove noise
  - remove artifacts
  - other signal processing operations
- “migrate” data
  - focus signal energy
  - convert time to position
- up to 5D array of data
  - reflectivity as a function of 3D position and 2D source-sensor offset
- ~same size as raw seismic

Poststack seismic image data
- “stack” of prestack data
  - aggregate over 1 or more array indices
  - reduces size ~100x
- 2D or 3D image
  - reflectivity as a function of position
  - similar to a medical ultrasound image [epmag 1]
- interpreted to produce a model of the subsurface
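The rounded raw seismic figures can be multiplied out as a quick sanity check. This is a minimal Python sketch, assuming every sensor records ~5K samples for every shot (one trace per sensor per shot); the constant and function names are illustrative.

```python
# Order-of-magnitude size of one raw seismic acquisition, multiplying out
# the rounded figures from the "Time-lapse raw seismic data" slide.
# Assumption: every sensor records ~5K samples for every shot.
SENSORS = 10_000          # ~10K sensors (moving towards ~1M)
SHOTS = 10_000            # ~10K shots
SAMPLES_PER_SHOT = 5_000  # ~5K samples per shot (per sensor, assumed)
TB = 10**12


def acquisition_bytes(bytes_per_sample: int, sensors: int = SENSORS) -> int:
    """Bytes recorded in one acquisition if each sensor records each shot."""
    return sensors * SHOTS * SAMPLES_PER_SHOT * bytes_per_sample


low = acquisition_bytes(4) / TB     # 4-byte samples
high = acquisition_bytes(12) / TB   # 12-byte samples
future = acquisition_bytes(4, sensors=1_000_000) / TB  # ~1M sensors
print(f"one acquisition: {low:.0f} to {high:.0f} TB")
print(f"with ~1M sensors: {future:.0f} TB or more")
```

A few TB per acquisition is the same order of magnitude as the ~10 TB/project estimate; moving from ~10K to ~1M sensors scales the volume by the same factor of 100.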

Well logs
- lower sensor package into well
- measure various properties as a function of depth
  - ~10K samples
  - ~1K components
    - simple numbers
    - bore hole images
    - others
- typically done once before production starts [decogeo]
- ~100 MB/well × ~1K wells/year/major ≈ ~100 GB/year/major

Production monitoring
- classical methods at well head
  - flow volumes
  - gas/oil/water composition
  - temperature
  - pressure
- distributed sensing methods
  - fiber optic cables in well
  - acoustic sensing
  - temperature sensing
  - ~1000 equivalent discrete sensors
  - ~1K samples/sec
  - continuous monitoring
  - ~10-100 GB/day/well
  - function of time and position along well path
- ~1K wells (growing rapidly)
- ~1 PB/year/major
- (figures: [epmag 2], [slb 1])

Major interpreted/modeled data types
- geological structure model
- velocity model
- basin model
- reservoir models
  - geological
  - quantitative
  - engineering
- geomechanical model
- dozens of other data types

Geological structure model
- geologist interprets seismic image
  - identifies surfaces defining rock strata and faults
  - very complex networks of intersecting surfaces
- iterative process
  - seismic image depends on acoustic velocity
  - acoustic velocity depends on rock type
  - rock type interpreted from seismic image and well data
- ~1 GB/structure × ~1K structures/year/major ≈ ~1 TB/year/major

Velocity model
- velocity of sound as a function of position
  - in the volume corresponding to the geological structure
  - scalar, vector, or tensor models
- used to produce seismic images
  - an accurate velocity model is key to a good seismic image
- ~1-10 GB/model (figures: [geosoft], [pdgm 1])
- ~1K models/year/major
- ~1 TB/year/major
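The well log estimate can be checked the same way. This Python sketch rests on one stated simplification: every component at every depth sample is stored as a single 8-byte double (the byte count per component is not given in the slide).

```python
# Order-of-magnitude well log volume from the "Well logs" slide.
SAMPLES_PER_WELL = 10_000  # ~10K depth samples
COMPONENTS = 1_000         # ~1K components per depth sample
BYTES_PER_VALUE = 8        # assumption: each component stored as one double
WELLS_PER_YEAR = 1_000     # ~1K wells/year/major

per_well = SAMPLES_PER_WELL * COMPONENTS * BYTES_PER_VALUE
per_year = per_well * WELLS_PER_YEAR
print(f"per well: {per_well / 10**6:.0f} MB")  # 80 MB, i.e. ~100 MB
print(f"per year: {per_year / 10**9:.0f} GB")  # 80 GB, i.e. ~100 GB
```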

Basin model
- dynamic model of entire sedimentary basin
  - rock movement
  - fluid movement
- study history of hydrocarbon deposits
  - generation
  - expulsion
  - migration to reservoir
  - entrapment
- useful in predicting whether a structure contains oil or gas
- ~100 GB/model × ~100 models/year/major ≈ ~10 TB/year/major
- (figure: [outernode])

Reservoir models
- static models prior to production
  - estimate volume and other properties
- dynamic models
  - fluid flow
  - fluid composition
  - function of position and time
- used to guide drilling & production
  - keep wells producing
- ~100 GB/project [dgi]
- many fields, many versions/year/major
- ~100 TB/year/major

Geomechanical model
- simulation of mechanical stresses and strains
  - whole subsurface
  - specific reservoirs
- stress, strain, deformation as a function of position and time
- used to anticipate mechanical changes around bore hole and in reservoir
- ~1-10 GB/model [slb 3]
- ~100 models/year/major
- ~100 GB/year/major

Summary of “upstream” data types (order-of-magnitude estimates)

Variety                Volume (per object)   Velocity (per year per major)
Raw seismic            ~1 TB                 ~1 PB
Prestack seismic       ~1 TB                 ~1 PB
Poststack seismic      ~10 GB                ~10 TB
Well logs              ~100 MB/well          ~100 GB
Production monitoring  ~10 GB                ~1 PB
Geological structure   ~1 GB                 ~1 TB
Velocity model         ~1 GB                 ~1 TB
Basin model            ~100 GB               ~10 TB
Reservoir models       ~100 GB               ~100 TB
Geomechanical model    ~1 GB                 ~100 GB

- dozens of other data types, all important
- variety, rather than volume or velocity, is the dominant feature

Upstream data flow (partial) [cda]
[Figure: partial upstream data flow]
- complex interoperation between data types

Shared Earth Model concept
- integrated data base for evolving models of the subsurface
  - all data types
  - multiple scales: structure, reservoir, basin
  - multiple interpretations and versions per object
  - uncertainty quantification for everything
  - provenance for everything
  - constantly evolving
- holy grail of exploration and production (“E&P”) data integration
- in practice: still mostly vendor-proprietary islands of integration

Shared Earth Model
- conceptually similar to a conventional enterprise data warehouse
  - analysis- and report-oriented rather than transaction-oriented
  - integrates data from many different applications
  - Extract-Transform-Load (“ETL”) processes are a critical component
- for a conventional warehouse and ETL, the relational data model provides the conceptual framework
- for a Shared Earth Model of E&P data, the relational data model has not proven particularly useful
  - why not?
  - most data is physicist’s “field” data

Field theory for data scientists
- a physicist’s “field” is not the same as a database admin’s “field”
- a field describes some physical property as a function of position and/or time in some physical object
  - position in a physical object
  - physical property
  - physical property as a function of position
- we use a simple example to introduce these ideas

A simple example
[Figure: branched well, with derrick floor, upper well, well junction, lower well, bore 1, and bore 2]

Position in a physical object
- the position of a point $p$ is represented by its coordinate vector in $\mathbb{R}^2$:
  $\vec{r} = \begin{pmatrix} x(p) \\ y(p) \end{pmatrix}$
[Figure: point p in the x-y plane with coordinates x(p) and y(p)]

Physical property
- physical property types are specified by mathematical physics
  - a family of types jointly referred to as multilinear algebra
- scalar types: a single number, $F$
- vector types: a column of numbers, $\vec{F} = \begin{pmatrix} F_0 \\ F_1 \end{pmatrix}$
- tensor types: a matrix of numbers, $\overleftrightarrow{F} = \begin{pmatrix} F_{00} & F_{01} \\ F_{10} & F_{11} \end{pmatrix}$
- each has important algebraic properties
- a few dozen standard types, many more application-specific types

Physical property as a function of position
- function (map) from physical space to property space
  - associates a value of $F$ with each point $p$ in the object: $p \mapsto F(\vec{r}(p))$
[Figure: branched well in the x-y plane ($\mathbb{R}^2$), with a property value (components $F_{00}$, $F_{11}$) attached at each of several points p]
- infinite number of points
- infinite number of property values
- how do we represent this on the computer?

How do we represent a field on the computer?
- numerous methods
  - a small industry is busy creating new methods
  - this makes interoperation and integration difficult
- some common features
  - decompose the physical object into simple pieces
  - approximate the field by a simple function on each piece

Decompose physical object into simple pieces
- mathematicians call each piece a “cell”
- the decomposition is a “cell complex”
- more commonly called a “mesh”
[Figure: branched well decomposed into cells: derrick floor (df), junction (j), segments s0 to s5, and vertices]

Approximate by simple function on each cell
- for each cell c:
  - store a data tuple
  - specify an evaluation method
- evaluation: $F(p) = \mathrm{eval}_{c(p)}(p, \text{data tuple})$, where $c(p)$ is the cell containing $p$
- the data tuple may or may not correspond to the value of the field at some point
  - depends on the evaluation method
- example: linear interpolation on a segment from vertex $v_0$ to vertex $v_1$, with local coordinate $u(p)$ running from 0 at $v_0$ to 1 at $v_1$:
  $\mathrm{value}(p) = u F_1 + (1 - u) F_0$
[Figure: segment from $v_0$ to $v_1$ with values $F_0$ and $F_1$ at its ends and point $p$ at local coordinate $u(p)$]
- data for the entire field is an array of tuples

Data for entire field is an array of tuples
- each “|”-delimited group is the tuple for one cell (cell 0, cell 1, ..., cell n-1); the index after the comma is the cell number
  scalar: F0 | F1 | F2 | ... | Fn-1
  vector: F0,0 F1,0 | F0,1 F1,1 | F0,2 F1,2 | ... | F0,n-1 F1,n-1
  tensor: F00,0 F01,0 F10,0 F11,0 | F00,1 F01,1 F10,1 F11,1 | ... | F00,n-1 F01,n-1 F10,n-1 F11,n-1
- tuple components are typically real (float or double) but may be of any type

How do we want to use field data?
- operations specified by mathematical physics
- five main categories
  - topological operations: compose and decompose
  - geometric operations: change the shape
  - functional operations: set and get the value at a point; move a field from one mesh to another
  - algebraic operations: add, subtract, multiply, divide, diagonalize, ...
  - calculus operations: differentiate and integrate

Why isn’t the relational model useful for field data?
- it doesn’t fit the way we want to store field data
  - a relational schema can’t directly capture the field entity
  - it captures the data tuple entity instead of the entire field entity
  - the field entity has to be reconstructed by queries
    - normalization forces the introduction of surrogate keys
    - may require recursive queries
- it doesn’t fit the way we want to use field data
  - table operations are too low level
  - they aren’t useful for high-level field operations
- no pay-off to using the relational model
  - most field data is stored in app-specific, proprietary flat files
- so what data model is useful for field data?

The data model paradigm
- a data model [Codd] specifies
  - a class of mathematical objects
  - operations on those objects
  - constraints valid instances must satisfy
- languages, libraries, and tools are based on the data model
- applications are developed on top of the tools
- numerous benefits

Benefits of data model paradigm
- increases level of abstraction for application development
- increases capability of applications
- facilitates interoperation and integration
- increases productivity of programmers
- but …
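Returning to the slides “Approximate by simple function on each cell” and “Data for entire field is an array of tuples”: a field is a flat array holding one data tuple per cell, plus an evaluation method applied to the tuple of the cell containing the point. The minimal Python sketch below shows that idea for the simplest case, linear interpolation on a 1D mesh of segments (for instance, distance along one bore). The class SegmentField and its methods are hypothetical illustrations, not part of any SheafSystem API.

```python
from bisect import bisect_right
from typing import Sequence


class SegmentField:
    """Hypothetical sketch of a scalar field on a 1D mesh of segments.

    Each cell (segment) stores a data tuple (F0, F1): the values at its two
    end vertices. All tuples live in one flat array, cell after cell.
    """

    def __init__(self, vertex_coords: Sequence[float], tuples: Sequence[float]):
        # vertex_coords must be strictly increasing; segment c spans
        # [vertex_coords[c], vertex_coords[c + 1]].
        n_cells = len(vertex_coords) - 1
        if len(tuples) != 2 * n_cells:
            raise ValueError("need one (F0, F1) tuple per cell")
        self.x = list(vertex_coords)
        self.data = list(tuples)

    def cell_containing(self, p: float) -> int:
        """c(p): index of the segment containing p."""
        if not self.x[0] <= p <= self.x[-1]:
            raise ValueError("point outside mesh")
        # bisect_right finds the first vertex strictly right of p; clamp so
        # that p equal to the last vertex falls in the last cell.
        return min(bisect_right(self.x, p) - 1, len(self.x) - 2)

    def eval_linear(self, c: int, p: float) -> float:
        """Linear interpolation: value(p) = u*F1 + (1-u)*F0."""
        x0, x1 = self.x[c], self.x[c + 1]
        f0, f1 = self.data[2 * c], self.data[2 * c + 1]
        u = (p - x0) / (x1 - x0)  # local coordinate: 0 at v0, 1 at v1
        return u * f1 + (1 - u) * f0

    def __call__(self, p: float) -> float:
        """F(p) = eval_{c(p)}(p, data tuple)."""
        return self.eval_linear(self.cell_containing(p), p)


# Two segments, [0, 10] and [10, 30], with tuples (1, 3) and (3, 7).
field = SegmentField([0.0, 10.0, 30.0], [1.0, 3.0, 3.0, 7.0])
print(field(5.0))   # halfway along cell 0
print(field(20.0))  # halfway along cell 1
```

Storing the tuples interleaved in one array mirrors the vector layout above (the two tuple components for cell c sit at indices 2c and 2c+1); swapping in a different evaluation method only changes eval_linear, not the storage.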

But …
- benefits only accrue if the model captures application structure
- the more structure captured, the bigger the benefit
- important to capture as much structure as possible

Spectrum of mathematical structure captured by various data models
- most noSQL models capture less structure than relational
  - the “no” in noSQL should perhaps be “less”
- scientific apps have way more mathematical structure
  - the relational model isn’t nearly structured enough
- scientific apps don’t need no Structured Query Language
  - they need a (much) more Structured Query Language: mo’SQL

Data model/mo’SQL requirements
- must capture the common math structure of scientific data
  - scalars, vectors, tensors
  - topology and geometry
  - fields
  - algebra and calculus operations
- must describe how math entities are represented/stored
  - decomposition into primitive types and operations
  - decomposition for parallelism
- must maintain a rigorous connection between high-level semantics and low-level implementation
- need a new data model

Sheaf data model
- objects are discrete sheaves over finite distributive lattices
  - math details: http://www.limitpoint.com/images/Publications/The%20Sheaf%20Data%20Model.pdf
- finite distributive lattice: “part space”
  - all distinct composite parts formed from a set of basic parts
- discrete sheaf: describes the association of attributes with parts
  - algebraic description of the decomposition of abstract data types into tuples of primitive attributes
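One standard way to make “all distinct composite parts formed from a set of basic parts” concrete is Birkhoff’s representation theorem: every finite distributive lattice is isomorphic to the lattice of down-sets of the poset of its join-irreducible elements. The Python sketch below assumes basic parts play the role of those join-irreducibles and models each composite part as a down-set; the function names and the segment/vertex example are illustrative assumptions, not the SheafSystem implementation.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def down_sets(below: Dict[str, Set[str]]) -> List[FrozenSet[str]]:
    """Enumerate the part space generated by a set of basic parts.

    below[b] is the set of basic parts that basic part b immediately
    includes. A composite part is modeled as a down-set: a set of basic
    parts that contains everything below each of its members. The empty
    set is included as the bottom element.
    """
    basics = sorted(below)
    result = []
    for k in range(len(basics) + 1):
        for combo in combinations(basics, k):
            candidate = frozenset(combo)
            # Checking immediate lower parts suffices: closure under them
            # implies closure under everything below, by induction.
            if all(below[b] <= candidate for b in candidate):
                result.append(candidate)
    return result


def closed_under_join_and_meet(parts: List[FrozenSet[str]]) -> bool:
    """Join is union and meet is intersection; both must stay in the set."""
    members = set(parts)
    return all((a | b) in members and (a & b) in members
               for a in parts for b in parts)


# Hypothetical example: one mesh segment s that includes its end vertices.
segment = {"v0": set(), "v1": set(), "s": {"v0", "v1"}}
lattice = down_sets(segment)
print(len(lattice))                         # 5
print(closed_under_join_and_meet(lattice))  # True

# With no order among basic parts, every subset qualifies: a boolean lattice.
print(len(down_sets({"a": set(), "b": set(), "c": set()})))  # 8
```

The five parts in the segment example are the empty bottom, {v0}, {v1}, {v0, v1}, and {v0, v1, s}; the set {s} alone is excluded because a segment cannot be present without its vertices. Exhaustive enumeration is exponential in the number of basic parts, so it only suits small illustrations like this one.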

Visualizing a finite distributive lattice
- directed acyclic graph: “Hasse diagram”
- two kinds of nodes
  - composite parts
  - basic parts
- links represent “covers”
  - covers := immediately includes
  - A covers B if and only if A includes B and there is no C such that A includes C and C includes B
- draw the graph so that if A covers B, B is lower on the page
[Figure: example Hasse diagram with composite part A, basic parts B and C, and covers links]

Example: branched well
[Figure: branched well parts (derrick floor (df), upper well, well junction, lower well, bore 1, bore 2) and the corresponding Hasse diagram]
- basic parts are independent objects
- composite parts are precisely the sum of their basic parts

Sheaf table metaphor
- a data base is a set of tables
  - each table represents a type
  - each row an instance
  - each column an attribute
- rows carry a client-defined lattice order
- the col lattice is the row lattice of some other table
  - schemas are first-class objects
- unified algebraic framework for all common scientific data types

Unified framework for scientific data types
- tabular types
  - contains the relational model as a limiting case
  - row lattice is a boolean lattice
- physical property types
  - scalars, vectors, tensors
- object-oriented types with multiple inheritance
  - col lattice is the subobject inclusion hierarchy
- spatial types (meshes)
  - any decomposition of space
  - row lattice represents spatial inclusion
- field types
  - any property, any mesh, any evaluation method
  - col lattice = tensor(mesh row lattice, property col lattice)
- rigorous connection between abstract math types and numeric representations
  - from high-level specification to tuples of primitives
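The covers relation that defines the Hasse diagram can be computed directly from inclusion. The Python sketch below identifies each part with its set of basic parts and keeps only the immediate inclusions. The decomposition of the branched well used here (which basic parts make up the upper and lower well, and that both share the junction) is a hypothetical assumption for illustration, not read off the slide’s figure; the function name covers is likewise illustrative.

```python
from typing import Dict, FrozenSet, List, Tuple


def covers(parts: Dict[str, FrozenSet[str]]) -> List[Tuple[str, str]]:
    """Return the Hasse diagram edges (A, B), each meaning "A covers B".

    Each part is identified with its set of basic parts; A includes B when
    B's set is a proper subset of A's. A covers B iff A includes B and no
    part C satisfies "A includes C and C includes B".
    """
    edges = []
    for a, sa in parts.items():
        for b, sb in parts.items():
            if sb < sa and not any(sb < sc < sa for sc in parts.values()):
                edges.append((a, b))
    return sorted(edges)


# Hypothetical decomposition of the branched well (assumed for illustration).
basic = ["df", "junction", "bore 1", "bore 2"]
parts = {name: frozenset({name}) for name in basic}
parts["upper well"] = frozenset({"df", "junction"})
parts["lower well"] = frozenset({"junction", "bore 1", "bore 2"})
parts["well"] = frozenset(basic)

edges = covers(parts)
for a, b in edges:
    print(f"{a} covers {b}")
```

Because “includes” reduces to a proper-subset test on frozensets, the search for an intermediate C is a single pass over the parts, giving O(n^3) work overall, which is adequate for diagrams of this size. Note that “well” covers “upper well” and “lower well” but not “df”, since “upper well” sits between them.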

Open source implementation
- SheafSystem™ Community Edition
- C++ libraries with Java, Python, and C# bindings
  - Field API: field types, pushers, refiners
  - Geometry API: coordinate sections (invertible sections), point locators
  - Fiber Bundle Data Model API: spatial types, physical property types (groups, tensors, Jacobians), section types
  - Sheaf Data Model API: sheaf, storage agent, HDF5
- www.sheafsystem.org or GitHub

Query language for the sheaf data model
- work in progress with Prof. Magne Haveraaen, Bergen Language Design Laboratory, University of Bergen
- started with an initial guess at operators
  - extension of relational operators
- experience with implementation
- formalizing and refining definitions
- goal is “mo’SQL”

Acknowledgements
- Mark Verschuren, Shell, provided many useful comments and other input for this presentation
- original research and development funded by subcontracts B347785, B515090, and B560973 of prime contract W-7405-ENG-48 with the Department of Energy National Nuclear Security Administration (DOE/NNSA)
- ongoing development has been funded by Shell GameChanger and Shell TaCIT
- http://www.limitpoint.com/images/Publications/BigDataInOilAndGas.pdf

END

References
[Krebbers] Johan Krebbers, VP Architecture, Shell. “Big Data & Analytics: Exploiting it.” http://cdn.osisoft.com/corp/en/media/presentations/2013/UsersConference2013/PDF/UC2013_Shell_Krebbers_GlobalITArchitecture_1.pdf
[KrisEnergy] http://www.krisenergy.com/company/aboutoil-and-gas/exploration/
[epmag 1] http://www.epmag.com/Exploration-GeologyGeophysics/Three-D-Seismic-Advances-Improve-ExplorationSuccess_90469
[decogeo] http://www.decogeo.com/upload/Image/log1_bigl.jpg
[epmag 2] http://www.epmag.com/item/DAS-enables-simultaneous-multiwell-VSP_121593
[slb 1] http://www.slb.com/resources/case_studies/completions/~/media/Images/completions/intelligent/wellwatcher_neon_tp_01tn.jpg
[slb 2] System of subsurface faults and horizons in the Gullfaks oil field in the Norwegian sector of the North Sea. Data set courtesy of Schlumberger Limited.
[geosoft] http://blogs.geosoft.com/exploringwithdata/2012/08/3dmodelling-with-velocity-volumes-in-gm-sys.html
[pdgm 1] http://www.pdgm.com/getmedia/c72b49d9-571b-4fe8-ae3f-bfd00f862b0d/Skua-salt2010.jpg.aspx?width=1024&height=650&ext=.jpg
[slb 3] http://www.software.slb.com/PublishingImages/total-stress.jpg
[dgi] http://www.dgi.com/images/cvslideshow/fullsize/CoViz4D_Slideshow_003.jpg
[outernode] http://outernode.pir.sa.gov.au/__data/assets/image/0020/119009/Curnamona_3D.jpg
[cda] http://www.oilandgasuk.co.uk/cmsfiles/custom/html/report14.png
[Codd] E. F. Codd. 1970. A relational model of data for large shared data banks. Commun. ACM 13, 6 (June 1970), 377-387. DOI=10.1145/362384.362685 http://doi.acm.org/10.1145/362384.362685