“An automated tool designed to ease the pain of test creation and maintenance.”

Nil Weerasinghe
Bryan Robbins
Mohamed Ibrahim

About FINRA
■ Financial Industry Regulatory Authority
  • Largest independent regulator for all securities firms doing business in the U.S.
  • ~4,500 brokerage firms
  • ~163,500 branch offices
  • ~634,400 registered securities representatives

Our Mission: Investor Protection. Market Integrity.
  • Providing independent, vigorous regulation
  • Educating & informing investors
  • Computerized certification and continuing education: Series 7, 63 …etc.
  • Inviting active industry involvement & input
  • Actively supporting firms’ compliance efforts

American University Presentation Copyright 2011 FINRA

FINRA Open Source Projects
■ Increase Community Involvement
■ FINRA Open Source Projects
  • http://finraos.github.io/
■ DataGenerator
  • http://finraos.github.io/DataGenerator/
■ JTAF-ExtWebDriver
  • http://finraos.github.io/JTAFExtWebDriver/

How to get involved
■ Use it
■ Extend it
  • Fork it
  • Discuss idea
    – Open ticket
    – Google group discussion
    – [email protected]
  • Commit
    – DCO and ApacheV2
■ Report bugs
■ Help document

http://finraos.github.io/DataGenerator/
https://github.com/FINRAOS/DataGenerator

Agenda
• What is the DataGenerator?
• Demo
  – Dependency Modeling
  – Pairwise Data Generation
• Current Limitations
• Re-architecture plan
• Questions

Video
http://finraos.github.io/DataGenerator/
http://www.youtube.com/watch?v=Wxa1T0gp56k

Current Approach
[Diagram labels: DataSpec, Model, Datasets, Outputs]
■ Two ways to describe and generate datasets
  • Equivalence Classes + Combinations
  • Dependency Model + Graph Coverage
■ Both use Apache Velocity to generate output from templates

Demo
■ Pairwise Combinations (DataSpec)
  • Uses equivalence classes from DataSpec to populate datasets
■ All Paths (Model)
  • Uses annotations from graphical model to populate datasets

Limitations of Current Approach
■ Limited set of graph annotations
  • Can only set variable values within model
  • No support for logic or positive/negative equivalence classes in current version
  • We need more powerful annotation
■ Logic often split across spec, model, and templates
  • Anything dynamic must be injected into the Velocity template, as model and spec are both static
  • We need more dynamic evaluation
■ Performance considerations
  • Breadth-first enumeration doesn’t scale well as the domain becomes more complex
  • We need a more performant implementation

Re-architecting Data Generator
■ Replacing Visio with SCXML, an open standard to represent the state machine.

  <scxml xmlns="http://www.w3.org/2005/07/scxml"
         xmlns:cs="http://commons.apache.org/scxml"
         version="1.0"
         initial="start">
    <state id="start">
      <transition event="RECORD_TYPE" target="RECORD_TYPE"/>
    </state>
    <state id="RECORD_TYPE">
      <!-- Mandatory -->
      <onentry>
        <assign name="var_out_RECORD_TYPE" expr="set:{a,b,c}"/>
      </onentry>
      <transition event="REQUEST_IDENTIFIER" target="REQUEST_IDENTIFIER"/>
    </state>
    . . .
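
The `set:{a,b,c}` expression in the SCXML model above suggests that each state contributes a set of candidate values to the generated data. The following is a minimal sketch of how a linear chain of such states might expand into datasets. It assumes that `set:{...}` means "one dataset per listed value" and that values from successive states are combined exhaustively; the slides do not confirm either point. The helpers `parse_set_expr` and `expand_chain` are hypothetical names, not part of DataGenerator, and the second variable's value set is illustrative.

```python
import re
from itertools import product

# Hypothetical helper: parse an expression such as "set:{a,b,c}" into ["a", "b", "c"].
# Assumption: any expression that is not a set is kept as a single literal value.
SET_EXPR = re.compile(r"^set:\{(.*)\}$")


def parse_set_expr(expr):
    match = SET_EXPR.match(expr.strip())
    if match is None:
        return [expr]
    return [item.strip() for item in match.group(1).split(",")]


def expand_chain(assignments):
    """Yield one dict per combination of values along a linear chain of states.

    `assignments` is an ordered list of (variable_name, expr) pairs, one per
    state, mirroring the <assign name=... expr=...> elements in the model.
    """
    names = [name for name, _ in assignments]
    choices = [parse_set_expr(expr) for _, expr in assignments]
    for combo in product(*choices):
        yield dict(zip(names, combo))


# Illustrative chain of two states; the value sets are example inputs.
chain = [
    ("var_out_RECORD_TYPE", "set:{a,b,c}"),
    ("var_out_REQUEST_IDENTIFIER", "set:{1,2,3}"),
]
datasets = list(expand_chain(chain))
```

Because `expand_chain` is a generator, each dataset can be handed to a template or writer as it is produced instead of being collected into a list first.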

Re-architecting Data Generator
■ SCXML allows for complex modelling using embedded EL

  <state id="PRODUCT_TYPE_CODE">
    <!-- Mandatory -->
    <onentry>
      <assign name="var_out_PRODUCT_TYPE_CODE" expr="#ProductTypeCode_Cycle"/>
    </onentry>
    <transition event="OPTIONS_SYMBOLOGY_IDENTIFIER"
                target="OPTIONS_SYMBOLOGY_IDENTIFIER"
                cond="${var_out_PRODUCT_TYPE_CODE=='Derivatives-Options'}"/>
    <transition event="OPTIONAL_SECURITY_SYMBOL"
                target="OPTIONAL_SECURITY_SYMBOL"
                cond="${var_out_PRODUCT_TYPE_CODE!='Derivatives-Options'}"/>
  </state>
  . . .

Re-architecting Data Generator
■ SCXML allows for complex modelling: a state can itself be written as a state machine
■ We’re using Apache commons-scxml in our POC

Re-architecting Data Generator
■ Overcoming memory issues by enhancing the all-paths algorithm: use DFS with minimal memory overhead

Re-architecting Data Generator
■ Short demo:

  <scxml xmlns="http://www.w3.org/2005/07/scxml"
         xmlns:cs="http://commons.apache.org/scxml"
         version="1.0"
         initial="start">
    <state id="start">
      <transition event="RECORD_TYPE" target="RECORD_TYPE"/>
    </state>
    <state id="RECORD_TYPE">
      <onentry>
        <assign name="var_out_RECORD_TYPE" expr="set:{a,b,c}"/>
      </onentry>
      <transition event="REQUEST_IDENTIFIER" target="REQUEST_IDENTIFIER"/>
    </state>
    <state id="REQUEST_IDENTIFIER">
      <onentry>
        <assign name="var_out_REQUEST_IDENTIFIER" expr="set:{1,2,3}"/>
      </onentry>
      <transition event="MANIFEST_GENERATION_DATETIME" target="MANIFEST_GENERATION_DATETIME"/>
    </state>
    <state id="MANIFEST_GENERATION_DATETIME">
      <onentry>
        <assign name="var_out_MANIFEST_GENERATION_DATETIME" expr="#{nextint}"/>
      </onentry>
      <transition target="end"/>
    </state>
    <state id="end">
    </state>
  </scxml>

Re-architecting Data Generator
■ Restructure the code to allow Hadoop MapReduce and Giraph to operate on it.
■ Data Generator won’t itself directly depend on Hadoop or Giraph, but will abstract the following:
  • Input: Allow input from files
  • Execution: Allow execution to start from a middle state, given input variables
  • Output: Allow output in different formats: text files, several files, gz. The user will be able to extend the output to support: sequence files, Redshift, HBase
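
The all-paths slide above mentions replacing breadth-first enumeration with DFS to reduce memory overhead. The sketch below illustrates that idea on a small model shaped like the `PRODUCT_TYPE_CODE` example: it walks the graph depth-first and keeps only the current path's variables in memory, where a breadth-first walk would hold every partial path at the current depth. It is not DataGenerator's implementation. The function `all_paths`, the dictionary layout, and all values other than 'Derivatives-Options' are assumptions made for illustration, and the graph is assumed to be acyclic.

```python
from typing import Callable, Dict, Iterator, List, Tuple

# A transition is (target_state, condition); the condition sees the variables
# assigned so far. All names here are hypothetical, not DataGenerator's API.
Variables = Dict[str, str]
Transition = Tuple[str, Callable[[Variables], bool]]


def always(_variables: Variables) -> bool:
    return True


def all_paths(
    transitions: Dict[str, List[Transition]],
    values: Dict[str, List[str]],
    start: str,
    end: str,
) -> Iterator[Variables]:
    """Depth-first enumeration of every variable assignment from start to end.

    Assumes an acyclic graph. Only one path's variables exist at any time.
    """
    current: Variables = {}

    def visit(state: str) -> Iterator[Variables]:
        if state == end:
            yield dict(current)  # copy, because `current` is mutated in place
            return
        # States without candidate values are passed through without assigning.
        choices = values.get(state) or [None]
        for value in choices:
            if value is not None:
                current[state] = value
            for target, condition in transitions.get(state, []):
                if condition(current):
                    yield from visit(target)
            current.pop(state, None)  # backtrack before trying the next value

    yield from visit(start)


transitions = {
    "start": [("PRODUCT_TYPE_CODE", always)],
    "PRODUCT_TYPE_CODE": [
        ("OPTIONS_SYMBOLOGY_IDENTIFIER",
         lambda v: v["PRODUCT_TYPE_CODE"] == "Derivatives-Options"),
        ("OPTIONAL_SECURITY_SYMBOL",
         lambda v: v["PRODUCT_TYPE_CODE"] != "Derivatives-Options"),
    ],
    "OPTIONS_SYMBOLOGY_IDENTIFIER": [("end", always)],
    "OPTIONAL_SECURITY_SYMBOL": [("end", always)],
}
# Illustrative values only; "Equity", "OSI-1", "SYM-1" and "SYM-2" are made up.
values = {
    "PRODUCT_TYPE_CODE": ["Derivatives-Options", "Equity"],
    "OPTIONS_SYMBOLOGY_IDENTIFIER": ["OSI-1"],
    "OPTIONAL_SECURITY_SYMBOL": ["SYM-1", "SYM-2"],
}
paths = list(all_paths(transitions, values, "start", "end"))
```

Since each completed path is yielded as soon as it is reached, a caller could stream paths straight to an output writer, which fits the plan to abstract output formats.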