LDR User Manual
Copyright
© THE CONTENTS OF THIS DOCUMENT ARE THE COPYRIGHT OF LAVASTORM ANALYTICS LIMITED. ALL RIGHTS
RESERVED. THIS DOCUMENT OR PARTS THEREOF MAY NOT BE REPRODUCED IN ANY FORM WITHOUT THE WRITTEN
PERMISSION OF LAVASTORM ANALYTICS.
Disclaimer
No representation, warranty or understanding is made or given by this document or the information contained within it
and no representation is made that the information contained in this document is complete, up to date or accurate. In
no event shall Lavastorm be liable for incidental or consequential damages in connection with, or arising from its use,
whether LAVASTORM ANALYTICS was made aware of the probability of such loss arising or not.
LAVASTORM ANALYTICS
lavastorm.com
Page 1
Issue 1
Contents
1 DOCUMENT CONTROL ............ 5
2 INTENDED AUDIENCE ............ 7
3 OBJECTIVE ............ 7
4 SCOPE AND APPROACH ............ 7
4.1 Overview ............ 7
4.2 Current Process ............ 8
4.3 New Approach ............ 8
4.3.1 Architecture Overview ............ 8
4.3.2 Supported Data Types ............ 15
4.3.3 Primitive Data Type Support ............ 16
5 INPUT SPECIFICATION ............ 17
5.1 Specification Structure Elements ............ 17
5.1.1 DRIX – Top Level Element ............ 17
5.1.2 Libraries ............ 18
5.1.3 Including Other Libraries ............ 20
5.1.4 Primary Field ............ 21
5.1.5 Versioning ............ 21
5.1.6 Namespaces ............ 22
5.1.7 Using ............ 23
5.1.8 Resolution of types with Namespace and Using ............ 24
5.1.9 Overriding Types in Included Libraries ............ 27
5.2 Fully Constructed Elements ............ 32
5.2.1 Types ............ 32
5.2.2 Fields ............ 55
5.2.3 Flow Control & Data Structure Elements ............ 70
6 OUTPUT SPECIFICATION ............ 95
6.1 DROX – The top level element ............ 95
6.2 Outputs ............ 97
6.3 Mappings ............ 97
6.3.1 Including Fields ............ 99
6.3.2 Renaming Fields ............ 102
6.3.3 Excluding Fields ............ 107
6.3.4 Combining Multiple Mappings ............ 113
6.4 Dumper & Dump Tags ............ 125
6.5 Field Patterns ............ 127
6.5.1 What is a Pattern? ............ 128
6.5.2 Traits ............ 128
6.5.3 Test Composition ............ 129
6.5.4 Quantifiers {m,n} ............ 130
6.5.5 Regular Expressions (//) ............ 131
6.6 Trigger Events ............ 131
6.6.1 Multiple Trigger Events and Mapping Unions ............ 135
6.6.2 Output Suspension and Mapping Unions ............ 136
6.6.3 Zero-Width Trigger Fields ............ 138
6.6.4 Clearing Actions ............ 139
6.6.5 Include TriggerProperties ............ 140
6.7 Special Identifier Fields ............ 140
7 ADVANCED CONCEPTS ............ 145
7.1 Program Flow ............ 145
7.2 Required Interfaces ............ 148
7.2.1 TickerTape ............ 148
7.2.2 LDRByteBufferInterface ............ 150
7.2.3 ParserContext ............ 150
7.2.4 Parser ............ 151
7.2.5 ParserLog ............ 152
7.3 Advanced Code Elements for Construction of New Types ............ 152
7.3.1 Test Method ............ 156
7.3.2 Scan ............ 159
7.3.3 Skip ............ 162
7.3.4 SkipCount ............ 164
7.3.5 Read ............ 166
7.3.6 Code ............ 168
7.3.7 Code Required for the Generator Tag ............ 172
7.4 Code Generation ............ 174
7.4.1 Primitive Types ............ 174
7.4.2 Standard Types ............ 178
7.5 Performance Tuning DRIX Files ............ 180
8 LAE DATA READING INTERFACE ............ 183
8.1.1 Mapping of LDR Data Types to BRD ............ 183
9 COMPATIBILITY ............ 185
9.1 Self-Compatibility & Versioning ............ 185
9.2 Compatibility with other Software ............ 185
9.2.1 LAE 3.x and Earlier ............ 186
9.2.2 LAE 4.x ............ 186
10 ERROR HANDLING ............ 187
10.1.1 Error Levels ............ 187
10.1.2 Thresholding & Logging ............ 189
10.1.3 Error Types ............ 193
10.1.4 Exception Types ............ 214
10.1.5 Error Filtering on Fields ............ 215
10.1.6 Identifying Errors on Output Records ............ 219
10.1.7 Failed Data Format Reporting ............ 222
10.1.8 Parse Tracing ............ 227
10.1.9 Incorrect Specification Reporting ............ 231
11 LDR RESERVED CHARACTERS ............ 233
APPENDIX A SPECIAL DATA TYPE HANDLING ............ 235
ASN.1 Data ............ 235
Supported Encodings ............ 235
Support for ASN.1 Constructs/Keywords ............ 236
ASN.1 Specification Format ............ 238
Comments and Special Encodings ............ 238
COBOL Copybook Data ............ 239
Limitations of Support ............ 239
APPENDIX B INDEX OF DRIX TAGS ............ 240
APPENDIX C INDEX OF DROX TAGS ............ 241
APPENDIX D INDEX OF EXAMPLES ............ 241
APPENDIX E INDEX OF TABLES ............ 245
APPENDIX F INDEX OF FIGURES ............ 245
1 Document Control

Date (DD/MM/CCYY) | Issue | Author | Change
09/07/2008 | Draft A | Tim Meagher | First Draft
15/07/2008 | 0.0001 | Tim Meagher | Added Primitive types to Appendix
17/09/2008 | 0.1000 | Tim Meagher | Massive changes since last update. Input specification format essentially complete & vastly different to the original input specification ideas. Many other changes as well.
29/09/2008 | 0.1001 | Tim Meagher | Section on Input Metadata complete. Still requires review. Output Metadata section begun.
07/10/2008 | 0.1010 | Tim Meagher | Input & Output Specifications complete. First self-review complete. Awaiting further reviewing from co-authors & contributors. All other sections still incomplete.
14/10/2008 | 0.1011 | Tim Meagher | First draft of LDR node design complete. Draft sent out for review.
20/10/2008 | 0.1100 | Tim Meagher | Started making changes from first review meeting.
10/12/2008 | 0.1101 | Rocco Pigneri | Updated Appendix D to contain new types per design meeting on 10 December 2008.
12/12/2008 | 0.1110 | Tim Meagher | Updated Table of contents to contain Appendix D. Updated SkipType to allow dynamic. Min and max now allowed as attributes on while/repeats for static cases.
15/12/2008 | 0.1111 | Rocco Pigneri | Extra information added to COBOL appendix. Added changes to primitive type definitions, allowing for return types to be different than the parentType. Added tag for extCode. Updated some of the advanced code section.
13/1/2009 | 0.10000 | Rocco Pigneri | Updated Appendix D to reflect changes from 12/12/2008 meeting.
31/1/2009 | 0.10001 | Tim Meagher | Updated Appendix D to unify terminology with the rest of the document.
03/02/2009 | 0.10010 | Tim Meagher | Updated input section for tag changes and ensured examples consistent with spec. Fixed and finished Advanced Concepts section.
20/02/2009 | 0.10011 | Tim Meagher | Fixed generator examples, added new details specifying test allowed in loops. Added details of using super-args and user-defined read, scan and skip methods. Added exception handling section.
02/03/2009 | 0.10100 | Tim Meagher | Added Section "Resolution of types with Namespace and Using". Added new Error Types & Error Level.
01/05/2009 | 0.10101 | Tim Meagher | Changes to output specification section. Fixed formatting, changed incorrect examples & added changes to remove old tags etc.
12/05/2009 | 0.10110 | Tim Meagher | Updated the Input Specification section. Fixed for tag renaming.
04/08/2009 | 0.9 | Tim Meagher | Fixed output spec examples and Advanced Concepts for tag renaming.
10/08/2009 | 1.0 | Tim Meagher | Removed old output specification sections. Added onNoMatch attributes. Updated LAE interface section.
24/11/2009 | 1.01 | Tim Meagher | Added information on clearing actions, and zero-width triggers.
01/12/2009 | 1.9 | Tim Meagher | Removed appendices for library types. The new DrixDoc functionality will document the types & is more maintainable.
08/02/2010 | 1.91 | Tim Meagher | Renamed to functional spec, added information for templating enhancements, emittable fromParent & new code attributes. Removed some of the LAE data reading interface as the nodes will self document.
04/05/2010 | 1.92 | Tim Meagher | Included information on ASN.1 converter limitations, and unhandled keywords.
10/05/2010 | 1.9.3 | Tim Meagher | Added information on include overrides, drix source="…", new error codes.
23/09/2010 | 1.9.3 | Tim Meagher | Added information on error filtering & until="nextField".
14/10/2010 | 2.0 | Tim Meagher | Renaming to "User Manual".
15/03/2011 | 2.0.9.0.1 | Tim Meagher | Updating for Nested drox mappings with base.
03/08/2011 | 2.0.9.0.2 | Tim Meagher | Allowing multiple test tags within a type.
04/08/2011 | 2.0.9.0.3 | Tim Meagher | Updating relId description, updating to allow not, or, and, test under test->not, rather than just test.
04/08/2011 | 2.0.9.0.4 | Tim Meagher | Many minor changes: typos, incorrect cross references etc.
05/08/2011 | 2.0.9.0.5 | Tim Meagher | Adding information for using the test->method tag combination.
14/08/2011 | 2.0.9.0.6 | Tim Meagher | Documenting the hideParam element & param->private attribute.
26/08/2011 | 2.0.9.0.7 | Tim Meagher | Adding the filename specialField.
28/02/2012 | 2.0.9.0.8 | Tim Meagher | Adding the bytePosition specialField. Updating information for field occurrence constraints within loops, removing some of the documented limitations on ASN.1 converter. Changed some references, BRAIN->LAE, VCDR->LTW.
2 Intended Audience
This document is intended for users of the Lavastorm Data Reader (LDR), who require more
technical information than is provided in the LDR Tutorial document. For simple data formats, the
information in the tutorial document should be sufficient.
In general, this document should be used as a reference guide for all of the constructs available for
use in the LDR. The inbuilt primitive data types are expected to be sufficient to handle most data
formats. Therefore, for most users, sections 3 through 6 should provide the required information to
construct specifications to read these data formats. In order to construct input specifications using
the LDR, the user will need to have some XML knowledge.
For more difficult formats requiring specialised primitive types, the reader will need to understand the topics raised in the discussion of Advanced Concepts in section 7. In order to correctly create new primitive types, the user will need to have a thorough understanding of the operation of the LDR, and also have some Java experience.
3 Objective
The objective of this project is to develop a data reading tool that is easy to use and configure. The
tool must handle the vast majority of the different file formats and data encodings that are
currently used in industry (specifically, but not limited to, the telecommunications industry). The
aim is to provide a component which allows the LAE to easily accept data from a wide variety of
sources without any custom/bespoke development, with only minimal configuration changes
required.
Individual data files may have a vast array of different encodings contained within them. In general, this is done in order to reduce the overhead in transmitting and storing data. These formats, while ideal for machines to read, are practically incomprehensible to a person. In addition to this,
data files may have very complicated nested structures. These are often ideal for transmission and
provide all of the contextual information required to understand the contained data. However, this
nested composition is highly unsuitable for analyzing the data and auditing systems.
For this reason, the Lavastorm Data Reader (hereafter referred to as the LDR) aims to provide a flexible, easy-to-configure tool that is able to read in possibly complex and heavily nested data from well-defined, structured files, with data encoded in any of a wide array of different encoding formats, and then to provide a set of human-readable flat records containing the data from the file.
4 Scope and Approach
4.1 Overview
There are a large number of different file formats, used by different vendors within Telecoms (and other fields), that clients using the LAE and LTW need to read. The aim of this project is to deliver a tool which allows for the reading of complex data formats, such that the resulting data is in a format that the LAE understands.
While the engine which performs this task must be able to work independently of the LAE, this document outlines both how the engine works and the interface for LAE users.
4.2 Current Process
Currently, there is no standardised manner for reading in data at customer sites. In some situations,
existing LAE nodes can handle reasonably simple data formats. However, for more complex data
formats, custom Python nodes are required for LAE users.
At LTW customer sites, custom readers are generally developed using SAS readers. There has been some previous custom work for customers on a generic data reading tool; however, as this tool was written specifically for one customer, it could not be used across all customer sites. Furthermore, this tool was found to be inadequate to cover all of the data formats required.
4.3 New Approach
The aim of this project is to provide a flexible and generic solution to the process of reading data for use by LAE and LTW clients. The solution must also have a clear and simple user interface such that the significant overheads associated with the addition of new data streams are reduced.
4.3.1 Architecture Overview
The LDR Engine is driven by XML specification files which describe how to read the input data file(s). A graphical view of the architecture is provided in Figure 1. In terms of this overall architecture, there are a number of key components. The major components are discussed in significant detail in later sections of this document. In this section, we provide a brief description of each of these components and how they interact.
Figure 1 LDR Architecture
4.3.1.1 Major Components
LDR Engine
The LDR engine itself is the program which takes an input specification, an output specification and some input files, and uses these specifications to read the data files and map them to a set of flat record formats.
Input Specification (DRIX) XML File
The input specification XML file contains the metadata of the data files to be read. It is also referred to as a Data Reader Input XML specification (DRIX) file, where the DRIX acronym is also used as the file extension.
The input specification describes the fields that are present in the data, in what
order they appear, and how they are encoded. In general, these fields contain
other nested fields such that a specification is simply a composition of a
number of different field types.
For example, in a simple case, we could have a file containing:
o A file header, containing a number of String fields
o A set of records, where each record contains a number of String fields
o A file trailer, containing a number of String fields
In this case, we would define a base String type, then construct a file header,
file trailer and record field each containing a number of these String types. We
would then define a “file” field which combines a file header, file trailer and a
number of records. A Primary Field (see section 5.1.4) is always provided in
the input specification XML file, which tells the LDR Engine how to start
reading the file, and which field appears first in the input data file. In our
example, the “file” field would be the primary field.
In order to allow for re-use of common primitive data fields and encodings
(e.g. String), and also allow for re-use of common composite types, the input
specification XML file can also reference other Library XML Specifications.
The input specification XML file (and all library XML specifications) must
have a valid format according to the Input Specification XSD. A detailed
description of how these input & library specifications are composed is
contained in section 5.
The input specification must be an XML document encoded in UTF-8, where the first line should generally be:
<?xml version="1.0" encoding="UTF-8"?>
When using the LDR via the LAE interface, the input specification is either
specified in the node, or automatically generated from the parameters of the
node.
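To make the header/records/trailer example above concrete, a DRIX skeleton might look roughly as follows. Only the drix, include, library and primaryField elements are taken from this manual; the declarations inside the library are shown purely as placeholder comments, since the real tag names and attributes for types and fields are defined in section 5.2.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<drix>
    <!-- Re-use common primitive types (e.g. String) from a library DRIX. -->
    <include ... />

    <library>
        <!-- Placeholder: a base String type; fileHeader, record and
             fileTrailer fields composed of Strings; and a "file" field
             combining all three. See section 5.2 for the real elements. -->
        ...
    </library>

    <!-- Tells the engine which field appears first in the data file:
         the "file" field in this example. -->
    <primaryField ... />
</drix>
```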
Library XML Specifications
While the input specification XML file provides the specific encoding and format information for individual file formats, any input specification can rely on the types defined in re-usable library XML specification files. These files have exactly the same format as an input XML specification file and must conform to the input specification XSD. These are also referred to as Data Reader Input XML specification (DRIX) files, where the DRIX acronym is also used as the file extension.
These library XML specifications must reside on the server where the LDR engine is located, and their location is provided to the LDR engine by specifying the library search path in the config parameters provided to the engine. Note that the primary purpose of a library XML specification is not to provide specific information for individual file formats; rather, it is to provide a set of field types and encoding information that can then be re-used and built upon by an input specification XML file.
A detailed description of how these library specifications and the input specification are composed is contained in section 5.
Input Specification XSD
An XML Schema Definition (XSD) is a file, written in XML format, which describes the structure and layout of XML files. XSD files are used to validate
and parse a set of XML documents. Therefore, the input specification XSD
defines the content allowed in an input specification XML file, including the
allowed tags/elements and attributes and the number and order of these XML
components.
All input specification XML files and library xml specifications are validated
against the input specification XSD. The input specification XSD must be
accessible from the server where the LDR engine runs. The LDR input
specification XSD can be found in the conf/ldr/xsd directory in the LDR
installation location.
Output Specification (DROX) XML File
The output specification XML file describes the manner in which an input data
file should be output after it is processed. The output specification XML file
references fields that are defined in the input specification, and therefore these
two files must be consistent with each other in terms of the fields that they
define. These files are also referred to as Data Reader Output XML
specification (DROX) files, where the DROX acronym is also used as the file
extension.
LAVASTORM ANALYTICS
lavastorm.com
Page 12
Issue 1
LDR User Manual
The output specification XML file must have a valid format according to the
Output Specification XSD. A detailed description of how these output
specifications are composed is contained in section 6. When using the LDR
via the LAE interface, the output specification is specified in the LAE node.
The output specification must be an XML document encoded in UTF-8, where the first line of the file should be:
<?xml version="1.0" encoding="UTF-8"?>
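As a rough skeleton only: the drox root element is described in section 6.1, and its child elements for outputs and mappings are defined in sections 6.2 and 6.3, so they are left as a placeholder comment here rather than guessed at.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<drox>
    <!-- Output definitions (section 6.2) and mappings (section 6.3),
         referencing fields declared in the corresponding DRIX, go here. -->
    ...
</drox>
```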
Output Specification XSD
An XML Schema Definition (XSD) is a file, written in XML format, which describes the structure and layout of XML files. XSD files are used to validate
and parse a set of XML documents. Therefore, the output specification XSD
defines the content allowed in an output specification XML file, including the
allowed tags/elements and attributes and the number and order of these XML
components.
All output specification XML files are validated against the output specification XSD. The output specification XSD must be accessible from the server where the LDR engine runs. The LDR output specification XSD can be found in the conf/ldr/xsd directory in the LDR installation location.
Data File(s)
The data files are the files that are to be read by the LDR. The format of these
files must be specified in the input specification XML. The LDR can only
operate on input files, and not data from other sources (e.g. database input).
The data files must reside on the server where the LDR engine is located.
Config Parameters
In addition to the input and output specifications, there is some extra
information required to configure the LDR engine. This configuration is
provided through a set of parameters and includes specification of the level of
error handling and reporting, the mechanism for error handling and reporting
and the location of any libraries on the server (through the use of library
search paths).
When operating the LDR through the LAE interface, these config parameters are specified in an LAE node, and are detailed in section 8.
Parsed Flat Records
The parsed flat records are the output from the LDR engine. In general, the
engine will produce records that are accessible via the engine API. It is up to
the program using the API to choose what to do with these records.
The following section details how these are used by the LAE nodes that
interface with the LDR engine.
LAE Interface to the LDR Engine
The LAE interface to the LDR provides a mechanism for the operation of the
LDR within the LAE. There is a base LDR node in the LAE which allows
users to provide the input specification, output specification, config
parameters, and the location of the data files to process.
In some of the higher-level nodes, the input specification and some of the config parameters are hidden from the user, as these are automatically generated based on other parameters provided. A full description of the LAE interface is provided in section 8.
When using the LAE interface to the LDR, there are two options for the
outputs. They can be:
1. Written to file, with each output having a separate file containing a set
of records (these files may all be referenced in one node output pin)
2. Written to an LAE node output pin, with each output having a separate
output pin containing a set of records. Depending on the version of the
LAE and the configuration of the LAE server, these outputs can either
be:
a. Written to a tmp file
b. Streamed to the next node
c. Passed in memory to the next node
4.3.1.2 Typical LDR Operation
Figure 1 shows an LAE interface node using the LDR. The operation flow depicted in
this figure is the following:
1. The user supplies an input specification, an output specification, data file
locations and some configuration parameters (including library search path) to
an LAE node.
2. The LAE node executes on the server. It instantiates the LDR engine with the
config parameters, and the specifications provided by the user.
3. The engine parses the input & output specification XML files and validates them against the XSDs located on the server.
4. For each of the libraries referenced by the input specification, the engine
locates the corresponding library XML file, parses it and validates it against
the input XSD.
5. The LAE node provides the LDR engine with the set of data files to read.
6. The LDR engine reads each of the files, and produces flat-record format data
either to file, or directly back to the LAE node.
7. Once all of the parsed records are complete and all files have been read, the LDR engine and the LAE node terminate successfully.
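As a rough illustration of this flow from the host program's perspective, the sketch below walks through the same seven steps. None of the class or method names here (Engine, read, etc.) come from the real LDR API, which is not documented in this section; they are stand-ins used purely to make the sequence of steps concrete.

```java
import java.util.List;

public class LdrFlowSketch {

    // Stand-in for the LDR engine: configured with specs, then fed files.
    static class Engine {
        Engine(String drix, String drox, String libraryPath) {
            // Steps 3-4 would happen here: parse and XSD-validate the DRIX,
            // the DROX, and every library DRIX found on the search path.
        }

        int read(String dataFile) {
            // Step 6: parse the file and emit flat records; this sketch
            // simply pretends every file yields one record.
            return 1;
        }
    }

    public static void main(String[] args) {
        // Steps 1-2: the LAE node gathers the user's specifications and
        // config parameters, and instantiates the engine with them.
        Engine engine = new Engine("input.drix", "output.drox", "libs/");

        // Steps 5-6: each data file is handed to the engine in turn.
        int records = 0;
        for (String file : List.of("cdrs_001.dat", "cdrs_002.dat")) {
            records += engine.read(file);
        }

        // Step 7: all files read, all records produced.
        System.out.println(records);
    }
}
```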
4.3.2 Supported Data Types
Due to the nature of the data reader, it is difficult to provide a complete list of the formats it handles. The types of data that can be supported by the LDR are clearly dependent on the manner in which we allow data to be specified. For example, a specification language that does not allow the user to describe that the length/size of a field depends on a value read in another field clearly restricts the supported data types. Therefore, the most complete and accurate definition of the data types and file formats supported by the LDR is those that can be described in an input specification that conforms to the input specification XSD. However, this is not a particularly useful definition.
At the very high level, the LDR supports well-structured, deterministic file formats
with the following properties:
1. Can be composed of primitive types specified in the LDR DrixDoc Type API.
2. The file structure is deterministic – i.e. there is never a case whereby it cannot
be determined if a field is of type A or B. This means
a. The lengths & types of all of the data fields & nesting structure of the
data is known at the time of writing the specification OR
b. The lengths, types and nesting structure of any data fields that are
unknown at the time of writing the specification can be determined
from the contents of the data using rules that are known at the time of
writing the specification.
3. The input to the LDR Engine is from a file, and no other source (database etc.).
The formats implied by this definition are still required to be described in an input specification file; therefore, any formats which conform to the above rules but cannot be specified in the specification XML cannot actually be handled by the LDR.
There is significant predefined support for complex data structures using the built-in libraries provided with the LDR. For cases with complex structures that are not supported by the built-in libraries, the user may be required to construct additional structured types in an LDR XML library. The more complicated and non-standard the data types and data structure, the more configuration is required to read the data.
There are two file formats with their own specific support: COBOL copybook specified data and ASN.1 data. These are handled differently because both are very common formats and have their own specification languages. Therefore, rather than requiring that a separate LDR input specification be written, there is support for the user to simply copy and paste an existing ASN.1 specification or COBOL copybook into a node and have the input XML specification automatically generated. These types are discussed in more detail in the LDR DrixDoc Type API, and the LAE nodes that use the converters have significant documentation themselves.
The LDR is not intended to be used for reading unstructured data files (such as screen scrapes, unstructured log files etc.). The LDR may be able to handle specific cases of these files; however, this is not its intended purpose, and such use should be avoided where other options are available.
4.3.3 Primitive Data Type Support
The default LDR installation provides support for a wide variety of different primitive fields. Details of these primitive fields are outlined in the LDR DrixDoc Type API. A primitive field is defined as any field which can be referenced in the input metadata and does not contain any further subtypes. For example, sequence and choice types can be referenced in the input metadata; however, these types contain other fields, so they are not primitive. Integer, Binary, String, UTF-8 and similar types are primitive types, as the lower-level implementation actually specifies how the data is to be parsed. Users can also construct their own low-level atomic types using code snippets if the suite provided in the base LDR libraries proves insufficient.
5 Input Specification
The most concise and formal definition of the input specification can be found in InputSpecification.xsd, located in the conf/ldr/xsd directory at the LDR install location. This XSD is used to parse all input specifications. However, since an XSD cannot describe all of the restrictions that we place on the XML, there are additional constraints that are specified in comments in the XSD and described in this document. These constraints are validated outside of the normal XSD validation.
Descriptions of the tags used, and examples on general usage follow.
Note that when writing XML there are reserved characters, such as "<" and ">".
Therefore, if you want to set an expression in the XML which, for example, contains
a less-than operator, you will need to use the appropriate entity reference
(e.g. &lt; for <, &gt; for >).
Whenever a multi-line block is being used (e.g. the readMethod, testMethod etc. tags),
the CDATA notation should be used:
<![CDATA[
Insert your text here
]]>
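As a sketch, a multi-line readMethod body wrapped in CDATA might look like the following (the method body shown is purely illustrative, not an actual LDR read method):

```xml
<readMethod>
  <![CDATA[
    // Anything inside CDATA may freely use reserved characters such as < and &
    if (length < 10) { readFixed(length); }
  ]]>
</readMethod>
```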
Unless otherwise specified, all attributes should conform to the following regular
expression, with additional restrictions placed on the allowable words, as specified
in section 11:
Example 1 – General rule on allowable attribute patterns
attribute = [a-zA-Z_]+[a-zA-Z_0-9\.]*
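As an illustration, the pattern can be exercised with an ordinary regular-expression engine (the helper name and the fullmatch anchoring below are assumptions for illustration, not part of the LDR):

```python
import re

# The pattern from Example 1: a leading letter or underscore, then any
# mix of letters, underscores, digits and dots.
ATTRIBUTE_PATTERN = re.compile(r"[a-zA-Z_]+[a-zA-Z_0-9\.]*")

def is_valid_attribute(value):
    # fullmatch ensures the whole attribute value conforms, not just a prefix.
    return ATTRIBUTE_PATTERN.fullmatch(value) is not None

print(is_valid_attribute("myField.subField"))  # True
print(is_valid_attribute("9badName"))          # False: must not start with a digit
```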
5.1 Specification Structure Elements
5.1.1 DRIX – Top Level Element
Each input specification XML file contains a root tag "drix". This tag contains the
entire specification declaration and is primarily used to ensure that there is one
root element, so that the specification is well-formed XML. The extension of
input specification files is also DRIX, which stands for "Data Reader Input
XML specification".
DRIX Tag 1 drix

<drix>

Description: Root level tag containing the input specification.

Position: Required root tag of the document.

Attributes:
Optional URL source attribute
Optional String file attribute
Note that the file & source attributes are mutually exclusive, and when either of
these attributes is present, no other elements are allowed to exist within the drix tag.

Elements:
0..* requires elements
0..* include elements
0..1 library elements
0..1 primaryField elements
All requires tags must appear before the first include element. The first included
library will be the first one searched of the include list; includes can only override
includes declared prior to them. All include elements must appear before any library
or primaryField element. If a library element is present, it must appear before any
primaryField element.
There are three basic forms of DRIX – one that simply references another DRIX
via the source attribute, one that references another DRIX via the file attribute,
and one that contains all of the necessary sub-elements. When the source
attribute is present, it must be a correctly formatted URL specifying the location
of a DRIX (usually a file) which is accessible from the location where the LDR is
run. Example 2 shows a DRIX using the source attribute.
Example 2 - DRIX tag example with a source attribute.
<?xml version="1.0" encoding="UTF-8"?>
<drix source="file:/C:/tmp/lib.drix"/>
Similarly, when using the file attribute, the file must specify the name of a DRIX
file which is accessible from the location where the LDR is run.
Example 3 shows how the file attribute can be used to construct a DRIX.
Example 3 - DRIX tag example with a file attribute.
<?xml version="1.0" encoding="UTF-8"?>
<drix file="C:/tmp/lib.drix"/>
Example 4 shows how an XML input specification is structured, with the contents
of the sub-tags of drix left out.
Example 4 - DRIX tag example.
<?xml version="1.0" encoding="UTF-8"?>
<drix>
<include library="BaseLib" minimumVersion="000.000.000.001"/>
<library name="MyLibrary" version="000.000.000.001">
…
</library>
<primaryField type="myFile"/>
</drix>
5.1.2 Libraries
Each input specification XML file that is written must contain a library tag, a
primaryField tag, or both. Library tags are containers for the set of defined
types that can be referenced in other input specification files.
DRIX Tag 2 library

<library>

Description: A library tag is used to encapsulate the set of types that are able to
be referenced by the primaryField, or able to be used by DRIX files that include
this library specification. These types appear under the namespace tag so as to
prevent clashes in the namespace and allow for type names to be context dependent.

Position: Optional tag, contained within/under the root-level drix tag. Follows
immediately after any includes tags and before the optional primaryField tag.
Maximum of 1 per specification. If no primaryField tag is present, a library tag
must be in the specification.

Attributes:
Required name attribute
Required version attribute (see Versioning in section 5.1.5)

Elements:
1..* namespace elements (see Namespaces in section 5.1.6)
In cases where the input specification XML file is being used as an actual input
specification (as opposed to a library), this library tag can also appear. Where the
file is a library specification, the library tag must appear for the library to be
useful. By defining every specification as a library, we allow for seamless
nesting of individual files within larger file structures. For example, consider
the scenario listed below:
Scenario 1 - Data Packaging
ASN.1 records are cut at a telephone exchange E1 whenever a new
call comes through the exchange, or an ongoing call exceeds a
fixed duration.
There is a control mechanism C1 which receives these records, and
at regular intervals during the course of a day, packages the
records into a file, pre-pending a header and appending a trailer
at the end of the record set.
There is another control mechanism C2 further down the line which
stores the files from C1, and archives them every day, again
pre-pending its own header and appending its own trailer around
the C1 data.
An end to end audit of this system is to be performed, verifying
that all records from E1 are contained in the files from C1 and
all files from C1 are packaged in the C2 archive.
In this case, we could use three XML libraries. The first, an ASN1 library, would
define the format of the records being cut from the exchange. The second library
would define the manner in which the records are packaged by C1, and reference the
ASN1 library. The third library would define the manner in which files are
packaged by C2, and reference the C1 library. It is easy to see that in this
situation, treating every specification as a library allows for easy extension to
handle the data nesting that typically occurs in complex communications &
control systems.
The library tag must follow immediately after any declared include tags and has
the following syntax:
Example 5 – library tag example
<library name="libraryName" version="versionNum">
…
</library>
The version attribute is required, and its use is described in Versioning.
The name of the library must be the same as the filename of the library (without
the path and “.drix” extension).
5.1.3 Including Other Libraries
Include tags are used to load other DRIX files.
DRIX Tag 3 include

<include>

Description: Used to include other XML input/library specification files in order to
re-use the types defined in these files.

Position: Optional tags, contained within/under the root-level drix tag. If present,
any number of include tags can follow immediately after and within the drix tag,
prior to the declaration of any library or primaryField tags. 0..* include tags can
be present in any specification. The order of the include tags is important when
type overriding is performed. Types within an include can only override types in
include elements defined prior to them.

Attributes:
Required library attribute
Optional overrides attribute (see Overriding Types in Included Libraries in section 5.1.9)
Optional minimumVersion attribute (see Versioning in section 5.1.5)
Optional maximumVersion attribute (see Versioning in section 5.1.5)

Elements: None
The include tag exposes any types defined in the included library, making those
types accessible within the file which includes them. Include operations are not
recursive. For example, if we have:
Specification file L1 includes the library in L2
Specification file L2 includes the library in L3
then the types defined in library L2 are available for referencing & extension
within L1. This includes any types in L2 that reference types in L3. However,
types declared in L3 cannot be directly extended or referenced in L1.
Any types subsequently referenced from an included library must include the fully
qualified name of the type (depending on the use of using tags; see section 5.1.7).
The syntax of the include tag is:
Example 6 – include tag example
<include library="libraryName" minimumVersion="minVersionNumber"
maximumVersion="maxVersionNumber"/>
For a library to be successfully included, the name of the DRIX file must be
prefixed with the name of the library. For instance, to include a library named
"libraryName", there must exist a file in the library search path named
libraryName*.drix which also declares a library tag with the name libraryName.
5.1.4 Primary Field
The primary field specifies to the LDR engine which field is to be read first in the
data. There can only be one primary field; therefore, this field must contain
sufficient information to read an entire file. In libraries, which are simply used to
contain type definitions, the primaryField tag should not be present. However, in
any specification which is used to read a file, the primary field specifies the top-level
field (i.e. file structure) that we are looking for in the data. The primary field
can be anonymous (see Anonymous Fields in section 5.2.2.11) to allow for
multiple top-level fields.
DRIX Tag 4 primaryField

<primaryField>

Description: Specifies to the LDR engine which field is to be read first in the data.

Position: Optional tag, contained within/under the root-level drix tag. Follows
immediately after the declaration of any library tags, and lies under the drix tag.
0..1 primaryField tags allowed per specification. If no library tag is present, a
primaryField tag must be in the specification.

Attributes:
Optional name attribute
Required type attribute

Elements: None
The type attribute of the primary field must be qualified with any namespace
applied to the corresponding type in the library declaration where the type is
found. An example of a primary field declaration is:
Example 7 – primaryField tag example
<primaryField name="file" type="MySpecificationNamespace.FileType"/>
5.1.5 Versioning
Versioning is used to indicate whether a particular version of a library is
acceptable and can be included into another library using the include tag.
Libraries must specify a version number. Include tags can optionally
specify a minimum and maximum version number. Therefore, we have the
following possibilities:
Table 1 - Includes & Library version matrix

Includes Min Version | Includes Max Version | Library Version | Outcome
a                    | b                    | c               | Library loaded if a<=c<=b
a                    | Unspecified          | c               | Library loaded if a<=c
Unspecified          | b                    | c               | Library loaded if c<=b
Unspecified          | Unspecified          | c               | Library loaded
If we have a situation whereby multiple libraries satisfy the condition to be
loaded, the library that is located first in the search path is loaded & used.
This means that the directory specified first in the library search path will be
searched first to obtain a correct specification. If two or more specifications in the
same directory in the search path match the requirements of the specification, then
one of the libraries will be loaded; however, the behavior determining which
library is loaded is undefined.
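The search behavior described above can be sketched as follows, assuming simple wildcard matching of libraryName*.drix (the helper and the directory-listing representation are illustrative, not the LDR implementation):

```python
import fnmatch

def find_library(library_name, search_path_listing):
    """Return the first file whose name matches libraryName*.drix, scanning
    directories in search-path order.

    search_path_listing: list of (directory, [filenames]) pairs, in the
    order the directories appear on the library search path.
    """
    pattern = library_name + "*.drix"
    for directory, filenames in search_path_listing:
        matches = [name for name in filenames if fnmatch.fnmatch(name, pattern)]
        if matches:
            # When several files in one directory match, which one is chosen
            # is undefined by the LDR; this sketch simply takes the first listed.
            return directory + "/" + matches[0]
    return None

listing = [
    ("/libs/custom", ["OtherLib.drix"]),
    ("/libs/base", ["BaseLib_r2.drix", "notes.txt"]),
]
print(find_library("BaseLib", listing))  # /libs/base/BaseLib_r2.drix
```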
The versions provided by the include and library tags are simply strings.
Therefore string comparison operators are used to identify if one version <=
another version. This means that:
Version “A” < Version “B”
Version “1” < Version “2”
Version “1” < Version “11”
However, it also means that:
Version “11” < Version “2”
Version “LOWERCASE” < “lowercase”
Therefore, care needs to be taken when specifying the versions in a library.
For this reason, it is recommended that a 4-level, 3-digit, dot-separated, most
significant revision first versioning standard is used in all libraries, ranging from
000.000.000.001 – 999.999.999.999.
This is not enforced, and users are allowed to implement more human-readable
version numbers if they choose. However, this mechanism should ensure sufficient
space for future versioning and remove the issues highlighted above with
the String-based version comparison operators.
The LDR uses the Java String compareTo method to perform the comparison, and
therefore adopts all of the properties of this method. For reference, see the Java
String API at
http://java.sun.com/j2se/1.5.0/docs/api/java/lang/String.html#compareTo(java.lang.String)
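For illustration, plain lexicographic string comparison in any language reproduces these orderings; the sketch below mirrors the examples above (Python compares strings by character code, matching Java's compareTo for the ASCII strings shown):

```python
# The intuitive cases:
assert "A" < "B"
assert "1" < "2"
assert "1" < "11"

# The surprising cases noted above:
assert "11" < "2"                  # lexicographic, not numeric
assert "LOWERCASE" < "lowercase"   # uppercase letters sort before lowercase

# The recommended zero-padded scheme restores the expected numeric ordering:
assert "000.000.000.002" < "000.000.000.011"
print("all comparisons hold")
```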
5.1.6 Namespaces
Namespaces provide context to the types declared in a library. A namespace is an
abstract container allowing disambiguation of items having the same name
(residing in different namespaces). For example, we could have two types
“String”, one residing in the “BaseLib” namespace, and another in the
“MyCustomTypes” namespace. In this case, we would reference either the
“BaseLib.String” or the “MyCustomTypes.String” type in order to obtain the
implementation we need.
DRIX Tag 5 namespace

<namespace>

Description: Container providing the naming resolution for the contained types
when referenced externally (from tags under another namespace tag).

Position: Either within a library tag (1..* occurrences) or within another
namespace tag (0..* occurrences).

Attributes:
Required name attribute
Optional package attribute – this will resolve to a java package
Optional overrides attribute (see Overriding Types in Included Libraries in section 5.1.9)

Elements:
0..* namespace tags
0..* using tags (see Using in section 5.1.7)
0..* primitiveType tags (see Primitive Types in section 5.2.1.2)
0..* type tags (see Standard/Constructed Types in section 5.2.1.1)
0..* generatedType tags (see Dynamic Type Generation in section 5.2.1.9)
The order of these contained elements is unimportant.
In essence, namespaces simply allow for multiple types with the same name with
different scope resolutions. Namespaces can be nested to any level.
Namespace names must not contain any dots, as these are used to construct an
absolute namespace from relative namespaces.
In Example 8 we would reference the Tag type as ASN1.BER.Tag.
Example 8 – namespace tag example
<namespace name="ASN1" package="com.lavastorm.ldr.ASN1">
<namespace name="BER" package="com.lavastorm.ldr.ASN1.BER">
<type name="Tag">
…
</type>
</namespace>
</namespace>
The package attribute is the name of the java package within which the
dynamically constructed java Parser class for the types within the namespace will
be placed. Whether or not this is used as the complete package name, or if this
package is simply included as part of the overall package name is undefined and
subject to change.
5.1.7 Using
The using tag specifies which namespace is used in a given section of the
specification.
DRIX Tag 6 using

<using>

Description: Specifies that within this tag, all types referenced in field
construction (not type definitions) and not defined in the current namespace are
first searched for in the namespace referenced in the using tag.

Position: Either within a namespace tag (0..* occurrences), within another using
tag (0..* occurrences), or within a type tag (0..* occurrences).

Attributes:
Required namespace attribute

Elements: Allowable elements are based on the containing tag.
If a using tag is nested under a namespace tag:
0..* using tags (see Using in section 5.1.7)
0..* primitiveType tags (see Primitive Types in section 5.2.1.2)
0..* type tags (see Standard/Constructed Types in section 5.2.1.1)
0..* generatedType tags (see Dynamic Type Generation in section 5.2.1.9)
where order is not important.
If a using tag is nested under a type tag:
0..* structure definition elements (see Flow Control & Data Structure Elements in section 5.2.3)
0..* field tags (see Fields in section 5.2.2)
where order is important.
If the using tag is nested under another using tag, the elements allowed are based
on the first non-using (namespace or type) tag directly up the XML tree that
contains the nested using tags – except that a using tag cannot contain a
namespace tag.
Using tags are always written in the fully qualified manner, with the notation
namespace1.namespace2 indicating that it is using the namespace namespace2
nested under namespace1.
Using tags can be nested, but nesting simply provides an order of resolution,
not a nested resolution.
Example 9 – using tag example
<namespace name="MyNamespace">
<using namespace="asn1">
<using namespace="asn1.ber">
…
</using>
</using>
</namespace>
In Example 9, we are saying that for any types which are referenced, we should
attempt to resolve them first in the asn1.ber namespace. If the type cannot be
resolved in this namespace, then the asn1 namespace should be searched. This
does not say to search the asn1.asn1.ber namespace. The using tags are fully
qualified and absolute. Therefore, this also does not say to search the
MyNamespace.asn1.ber namespace.
5.1.8 Resolution of types with Namespace and Using
When declaring a new type, the type is placed within the containing namespace.
When referencing a type from a field, however, both the namespace and using
tags are used in order to locate the referenced type. To reference a type by its
absolute qualified name, prefix the type with a "." character.
Table 2 – Naming Resolution Rules.
Naming Resolution Rules
When resolving the type of a field, the following steps are taken:
1. If the referenced type begins with a "."
a. Search the current library for the type with the fully qualified name as
specified in the "type=" attribute, with the leading "." character removed from
the type name.
b. If not found, search all libraries included by the current library for the fully
qualified name as specified in the “type=” attribute, with the leading “.”
character removed.
c. If not found, throw TypeNotFoundException
2. Else try current namespace
a. Search the current library and current namespace for the type referenced in
the field. This is a search relative to the current namespace, so a search for
the type “Bar.Baz” under namespace “Foo” will search for the fully qualified
type “Foo.Bar.Baz”
b. If not found search all libraries included by the current library for the type
referenced in the field. This is a search relative to the current namespace, so
a search for the type “Bar.Baz” under namespace “Foo” will search for the
fully qualified type “Foo.Bar.Baz”
c. If not found, goto 3
3. Try using tags
a. Search the current library and most local using tag for the type referenced in
the field. This is a search relative to the most locally scoped using tag, so a
search for the type “Bar.Baz” under using tag “Foo” will search for the fully
qualified type “Foo.Bar.Baz”
b. If not found search all libraries included by the current library for the type
referenced in the field. This is a search relative to the most locally scoped
using tag, so a search for the type “Bar.Baz” under using tag “Foo” will
search for the fully qualified type “Foo.Bar.Baz”
c. If not found, and more using tags exist, continue step 3 with the next most
locally scoped using tag. Else goto 4
4. Search in the top level namespace
a. Search the current library for the type with the fully qualified name as
specified in the “type=” attribute.
b. If not found, search all libraries included by the current library for the fully
qualified name as specified in the “type=” attribute
c. If not found, throw TypeNotFoundException
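The resolution steps above can be sketched as follows for a single library (the sub-steps that search included libraries are omitted, and all names and the set-of-types representation are illustrative, not the LDR implementation):

```python
class TypeNotFoundException(Exception):
    pass

def resolve(type_ref, current_ns, using_namespaces, known_types):
    """Resolve type_ref to a fully qualified type name.

    current_ns       -- fully qualified namespace of the referencing field
    using_namespaces -- using-tag namespaces, most locally scoped first
    known_types      -- set of fully qualified type names in the library
    """
    # 1. A leading "." means the name is absolute.
    if type_ref.startswith("."):
        candidate = type_ref[1:]
        if candidate in known_types:
            return candidate
        raise TypeNotFoundException(type_ref)
    # 2. Try relative to the current namespace.
    candidate = current_ns + "." + type_ref
    if candidate in known_types:
        return candidate
    # 3. Try each using tag, most locally scoped first.
    for ns in using_namespaces:
        candidate = ns + "." + type_ref
        if candidate in known_types:
            return candidate
    # 4. Finally, try the top-level namespace.
    if type_ref in known_types:
        return type_ref
    raise TypeNotFoundException(type_ref)

types = {"asn1.Tag", "asn1.ber.Tag", "NewNamespace.Tag", "TopLevel.LastResort"}
# The local namespace wins over the using tags:
print(resolve("Tag", "NewNamespace", ["asn1.ber", "asn1"], types))
# -> NewNamespace.Tag
```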
Consider another example where we are referencing and defining fields within a
using tag.
Example 10 – using tag example part 2
<namespace name="asn1">
<type name="Tag">
…
</type>
<type name="Tag2">
…
</type>
<type name="SpecialTag">
…
</type>
<type name="DotPrefixTest">
…
</type>
<namespace name="ber">
<type name="Tag">
…
</type>
<type name="Tag2">
…
</type>
</namespace>
</namespace>
<namespace name="TopLevel">
<type name="Tag">
…
</type>
<type name="LastResort">
…
</type>
</namespace>
<namespace name="NewNamespace">
<type name="Tag">
…
</type>
<using namespace="asn1">
<using namespace="asn1.ber">
<type name="CompositeTag">
<field name="tag1" type="Tag"/>
<field name="tag2" type="Tag2"/>
<field name="tag3" type="SpecialTag"/>
<field name="absolute" type=".asn1.DotPrefixTest"/>
<field name="relative" type="asn1.DotPrefixTest"/>
<field name="last" type="TopLevel.LastResort"/>
</type>
</using>
</using>
<namespace name="asn1">
<type name="DotPrefixTest">
…
</type>
</namespace>
</namespace>
In Example 10, we end up with the following types declared:
asn1.Tag
asn1.Tag2
asn1.SpecialTag
asn1.ber.Tag
asn1.ber.Tag2
NewNamespace.Tag
NewNamespace.CompositeTag
asn1.DotPrefixTest
NewNamespace.asn1.DotPrefixTest
TopLevel.LastResort
TopLevel.Tag
Therefore, we see in the case of NewNamespace.CompositeTag that even though
we declare the CompositeTag type under two using tags, the using tags do not
apply to type definitions. However, as discussed previously, the using tag applies
to type referencing in field definitions. Therefore, within
NewNamespace.CompositeTag, we will have the following:
tag1 of type NewNamespace.Tag
tag2 of type asn1.ber.Tag2
tag3 of type asn1.SpecialTag
absolute of type asn1.DotPrefixTest
relative of type NewNamespace.asn1.DotPrefixTest
last of type TopLevel.LastResort
Here, we see that the using tag applies to resolving the referenced types. If the
field type is specified in an absolute manner (with leading “.”), then the top-level
namespace is searched. If a type is defined within the local namespace then this
overrides any using statements. If, on the other hand, the type cannot be resolved
in the local namespace, then the namespace of the directly enclosing using tag is
searched. If the type cannot be found in this namespace, then the namespace of the
next using tag further up the tag hierarchy is searched and so on.
5.1.9 Overriding Types in Included Libraries
In general, there can never be two types defined with the same type name in the
same namespace; this results in a DUPLICATE_TYPE_DEFINITION error being
reported to the user. It is perfectly acceptable for two such types to exist, so long
as the DRIX files within which the types are declared are never used within the one
specification. For example, if DRIX A specifies a type "Bar" in namespace "foo"
and DRIX B specifies a type "Bar" in namespace "foo", then this will only be an
error if DRIX A implicitly or explicitly includes DRIX B, or vice versa. Note that
this is not a problem if DRIX B only specifies the type "Bar" in namespace
"foo.baz".
However, in certain situations, it is clearly desirable to be able to override type
definitions. In some cases (e.g. CABS, ASN.1 switch data), a vendor will provide
solutions that produce files of a variety of different file formats. However, each of
the formats produced may be simply different compositions of a set of available
field formats. For example, a vendor might define a number of different date
formats, binary formats, phone number formats and time formats etc which are
then composed in different manners to produce different file formats.
In such situations, it is extremely useful to simply maintain the collection of
vendor specified formats in a DRIX file (or set of DRIX files) such that they can
be included in all of the DRIX file format specifications that use these vendor
specified types.
In many cases, different releases will cause modifications to the types. This means
that you might have a system which is receiving data files from a switch provided
by vendor X. Say all of the formats produced use the types defined in release Y of
the vendor X specification standards. Then, for some reason, the switch is
upgraded, and the formats produced use the types defined in release Z of the
vendor X specification standards. Often, such format specification releases can be
specified as a simple diff between the two releases, such that the release Z
specification is the release Y specification with a set of new types S and a set of
modifications to existing types M.
Here, it would be useful to have the DRIX behave in the same manner –
i.e. to have the DRIX for release Z simply include the DRIX from release Y, add a
few more types in its library & override some other types.
With the use of the version attributes on include statements, and the overrides
attribute on include, library, namespace, primitiveType, type & generatedType
tags, this is all possible.
The overrides attribute is an optional Boolean attribute. When not specified, the
overrides attribute is taken from the containing tag. If none of the containing tags
specify an overrides attribute, then the default of the overrides attribute is false.
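The inheritance of the overrides attribute can be sketched as follows (representing the enclosing tags as a list of attribute values is an illustrative assumption):

```python
def effective_override(attribute_chain):
    """attribute_chain lists the overrides attribute of each enclosing tag,
    outermost first; each entry is True, False or None (unspecified).
    The most deeply nested specified value wins; if nothing is specified
    anywhere, the default is False."""
    for value in reversed(attribute_chain):
        if value is not None:
            return value
    return False

# library overrides="true", namespace unspecified, type unspecified:
print(effective_override([True, None, None]))    # True
# library overrides="true", namespace overrides="false", type unspecified:
print(effective_override([True, False, None]))   # False
# nothing specified anywhere:
print(effective_override([None, None, None]))    # False
```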
Table 3 shows the algorithm used for the addition of types defined in a Drix.
Table 3 – Type Override Procedure.
Type Override Procedure
In order to add all of the types defined & included in a Drix, the following is performed:

Let D denote the Drix to evaluate
Set{Type} S = addDrixTypes(D)

The mechanism for performing this operation is outlined in the following pseudocode
functions:

Set{Type} addDrixTypes(Drix drix) {
    S = {}
    S = addDrixIncludes(S, drix)
    S = addLibraryTypes(S, drix)
    return S
}

Set{Type} addDrixIncludes(Set{Type} S, Drix drix) {
    for each include I in drix
        if I has overrides attribute specified
            override = overrides attribute
        else
            override = false
        Let drixI denote the Drix corresponding to include I
        Let SI denote the set of Types added by the include I
        SI = addDrixTypes(drixI)
        S = mergeTypes(S, SI, override)
    return S
}

Set{Type} addLibraryTypes(Set{Type} S, Drix drix) {
    Let L denote the library defined in Drix drix
    SL = {}
    for all Type T defined in L
        if there exists TS in SL where the qualified name of TS = qualified name of T
            throw Error : DUPLICATE_TYPE_DEFINITION
        else
            add T to SL
    if L has overrides attribute specified
        overrideLib = overrides attribute
    else
        overrideLib = false
    for each namespace N in L
        if N has overrides attribute specified
            override = overrides attribute
        else
            override = overrideLib
        S = addNamespaceTypes(override, N, S)
    return S
}

Set{Type} addNamespaceTypes(boolean overrideNamespace, Namespace N, Set{Type} S) {
    for each type (primitive, standard, generated) T in N
        if T has overrides attribute specified
            override = overrides attribute
        else
            override = overrideNamespace
        S = mergeType(override, T, S)
    return S
}

Set{Type} mergeTypes(Set{Type} S, Set{Type} Smerge, boolean override) {
    for each T in Smerge
        S = mergeType(override, T, S)
    return S
}

Set{Type} mergeType(boolean override, Type T, Set{Type} S) {
    if there exists TS in S where the qualified name of TS = qualified name of T
        if override
            replace TS in S with T
        else
            throw Error : DUPLICATE_TYPE_DEFINITION
    else
        add T to S
    return S
}
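The mergeType step can be rendered in ordinary code as follows (modelling the type set as a dictionary keyed by qualified name is an illustrative assumption, not the LDR implementation):

```python
class DuplicateTypeDefinition(Exception):
    pass

def merge_type(override, qualified_name, definition, types):
    """types maps fully qualified type name -> definition, standing in for
    the Set{Type} of the pseudocode."""
    if qualified_name in types:
        if override:
            types[qualified_name] = definition  # replace the existing type
        else:
            raise DuplicateTypeDefinition(qualified_name)
    else:
        types[qualified_name] = definition
    return types

types = {"foo.Bar": "release-Y definition"}
merge_type(True, "foo.Bar", "release-Z definition", types)
print(types["foo.Bar"])  # release-Z definition
```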
Note that even with an overrides attribute set to true, it is not possible to have two
types with the same qualified name declared in the same library. It is also
important to note that overriding types updates all references to point to the new
type, regardless of scope. Consider the case shown in Example 11.
Example 11 – overrides example
<drix>
<include library="A"/>
<include library="B" overrides="true"/>
<include library="C" overrides="false"/>
<library name="L" overrides="true">
<namespace name="a" overrides="false">
<type name="X" overrides="false">
…
</type>
<type name="Y" overrides="true">
…
</type>
<type name="Z">
…
</type>
</namespace>
<namespace name="b" overrides="true">
<type name="Y" overrides="false">
…
</type>
</namespace>
<namespace name="c">
<type name="Z">
…
</type>
<type name="Z" overrides="true">
…
</type>
</namespace>
</library>
</drix>
<!--
End of DRIX file for library L.
Start new file for Library A
-->
<drix>
<library name="A">
<namespace name="a">
<type name="X">
…
</type>
</namespace>
<namespace name="b" overrides="true">
<type name="Y">
…
</type>
<type name="Z" overrides="false">
<field name="f" type="a.X"/>
</type>
</namespace>
</library>
</drix>
<!--
End of DRIX file for library A.
Start new file for Library B
-->
<drix>
<library name="B">
<namespace name="a">
<type name="X">
…
</type>
<type name="Y">
…
</type>
</namespace>
<namespace name="b" overrides="false">
<type name="Y">
…
</type>
</namespace>
</library>
</drix>
<!--
End of DRIX file for library B.
Start new file for Library C
-->
<drix>
<library name="C">
<namespace name="a">
<type name="Z">
…
</type>
</namespace>
<namespace name="a" overrides="true">
<type name="Y">
…
</type>
</namespace>
<namespace name="b" overrides="false">
<type name="Y" overrides ="true">
…
</type>
</namespace>
</library>
</drix>
The above example will fail with duplicate type definition errors. However, we are
more interested in the process of type override resolution for each of the types. For
these DRIX files, the following will happen for each type:
Type: a.X
Included in library A
Overridden in library B
At this point, field f, defined on type b.Z in library A will now point to the a.X
defined in library B.
Library L defines a.X, but without overrides set to true.
Therefore this will fail with a duplicate type definition error.
Type a.Y
Included in library A
Overridden in library B
Overridden in library C
Overridden in library L
Type a.Z
Included in library C
Library L defines a.Z, but without overrides set to true.
Therefore this will fail with a duplicate type definition error.
Type b.Y
Included in library B
Overridden in library C
Library L defines b.Y, but without overrides set to true.
Therefore this will fail with a duplicate type definition error.
Type b.Z
Included in library A.
Type c.Z
Included in library L
Library L defines another c.Z with overrides set to true.
However two types with the same qualified name can’t be declared in the
same library – therefore this will fail with a duplicate type definition error.
Note also that since library A defines a type b.Z, this type cannot be defined in
libraries B, C or L without overriding the type defined in A. However, the type b.Z is
not referenceable in library B or C.
5.2 Fully Constructed Elements
The configuration allowed in the XML input specification to the LDR is extensive. This
allows for great flexibility; however, as would be expected, increased flexibility in
configuration leads to much greater complexity, and raises the entry level before end
users can start constructing specifications of their own. In order to mitigate this, there
are a large number of pre-defined types in libraries available for the end user to work
with.
An extremely exaggerated analogy is to consider writing Java programs using only
primitive types and the java.lang package, without having any of the classes declared
in the Java API available to you. This would clearly be an onerous task; as such, a
library is provided so that much of the low-level work is already done for you.
The end goal is that all user defined specifications will be straightforward and simple
to use. In order to do this, these specifications will need to reference other libraries
that contain a lot of pre-configured building blocks. This section defines how these
pre-configured building blocks can be put together in order to build file specifications.
For most users, this should be sufficient for creating DRIX files.
5.2.1 Types
A type is a primary building block from which the LDR input specifications are
constructed. A type defines how a piece of data is to be read and parsed. We have
three different levels of types. At the basic level, we have Primitive Types, which
are the lower-level "bits and bytes" definitions of how to read segments of encoded
data. At the higher level, we have Standard/Constructed Types, which are
generally composed of Fields bound to Primitive Types and other
Standard/Constructed Types. In this manner, we can build up a reusable
collection of data-reading elements in order to construct specifications to parse
files.
At another conceptual level, we have Generated Types. These Generated Types
provide a mechanism for creating new type definitions based on information in the
data. These are more of an advanced concept, and we address them in some detail
in section 5.2.1.9.
Types are inherently linked to Fields (as defined in section 5.2.2). Types provide
the data type definition, whereas fields provide an instantiation of that definition,
which is then used to parse a data file. Consider the case of a library. In a library,
we may have many types defined, but when we use the library, we may only use a
small number of those types, by declaring fields of that type. For those familiar
with object-oriented programming, a type is analogous to a class and a field is
analogous to an object instantiation of that class.
Type names cannot contain “.” characters – these are reserved for namespace
separation.
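As an illustrative sketch (the type and namespace names here are hypothetical), a dotted name must be expressed by nesting the type inside a namespace rather than by putting the "." in the type name itself:

```xml
<!-- Invalid: the type name contains a "." character -->
<type name="date.Time"/>

<!-- Instead, place the type inside a namespace -->
<namespace name="date">
  <type name="Time"/>
</namespace>
```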
5.2.1.1 Standard/Constructed Types
A type tag is used to indicate that we are constructing a new standard type,
describing some data layout and encoding.
DRIX Tag 7 type

<type>

Description:
Provides the definition of a data-type, including information on how to parse
the type and any sub-fields it contains. These types on their own provide only
the definition, and must be instantiated through a field (see section 5.2.2) or
primaryField (see section 5.1.4) tag.

Position:
0..* type tags may exist within a namespace tag, or within a using tag that
does not lie within a type tag. The order of types specified in a namespace
tag is unimportant.

Attributes:
- Required name attribute
- Optional parentType attribute specifying the type that this type inherits
  from
- Optional displayName attribute. This provides an external handle, with the
  aim being that this displayName attribute may later be used for GUI displays,
  error messages etc.
- Optional overrides attribute (see Overriding Types in Included Libraries,
  section 5.1.9)

Elements:
- 0..1 documentation tags
- 0..1 payload tags
- 0..* param elements (see Params in section 5.2.1.3)
- 0..* templateParam elements (see Template Parameters in section 5.2.1.4)
- 0..1 using elements (see Using in section 5.1.7)
- 0..* field elements (see Fields in section 5.2.2)
- 0..* structure definition elements (see Flow Control & Data Structure
  Elements in section 5.2.3)
- 0..* publish elements (see Publishing Data to be Considered for Immediate
  Output in section 5.2.1.8)
- 0..1 super elements (see Inheritance & the Super Tag in section 5.2.1.5)
- 0..1 of each of the java method tags, for more advanced use (see Advanced
  Code Elements for Construction of New Types in section 7.3)
- 0..1 emittable elements (see Emittable in section 5.2.1.7)

The order of these elements must be as follows:
- The documentation tag must appear first, if present.
- The payload tag must appear immediately after the documentation tag, if
  present.
- Any param tags must follow directly after the payload tag, if present. The
  order of param tags is unimportant.
- Any templateParam tags must follow directly after the last of any defined
  param tags. The order of templateParam tags is unimportant.
- If a using tag is present, it must follow directly after the templateParam
  tags, if present.
- Any field, publish or structure definition elements, or the super element,
  must follow directly after any declared using tag. These can appear in any
  order; however, the order in which they appear determines the parsing order
  of the type, so order is important.
- Any java method tags must appear directly after the field, structure
  definition, forceOutput or super tags; their individual order is unimportant.
- If present, the emittable tag must be the last element in the type.
Each type is allowed to contain any number of fields (instantiations of types, see
Fields in section 5.2.2), and a set of parameters (see Params in section 5.2.1.3)
that can be used for configuration of the type.
The fields can appear directly under the type declaration, or can be nested in
more complex structures defining looping and choice elements and other data
flow operations (see Flow Control & Data Structure Elements in section 5.2.3).
The payload tag mentioned in DRIX Tag 7 should generally not be included in
user written DRIX files. This is primarily inserted into generated DRIX files
from one of the LDR converters (e.g. ASN.1, COBOL).
The type tag also has an optional attribute that allows you to specify whether the type
inherits from another type. This attribute is parentType, and is discussed in
further detail in the Inheritance & the Super Tag section. For the moment, it is
important to note only the following:
- Types can either inherit from other types, or not inherit at all.
- Types cannot inherit from primitive types, generated types, or from java
  primitives or objects.
The simplest cases, however, are type definitions such as in the following
example:
Example 12 – type tag example
<type name="TypeA">
  <param name="paramA" javaType="int"/>
  <param name="paramB" type="TypeB"/>
  <field name="fieldB" type="TypeB"/>
  <field name="fieldC" type="TypeC"/>
  <field name="fieldD" type="TypeD"/>
</type>
Example 12 states that we are constructing a type, called TypeA, which contains:
- A field of type TypeB, called fieldB
- A field of type TypeC, called fieldC
- A field of type TypeD, called fieldD
- An int parameter, called paramA
- A parameter of type TypeB, called paramB
If this definition were provided, it would also mean that in order for a field of
type TypeA to exist in the data, there would need to be a TypeB field in the data,
followed immediately by a TypeC field, followed by a TypeD field. Note that the
order of the fields specified in a type is important: if we were to specify fieldC,
fieldB and then fieldD, this would have a different meaning than the specification above.
If any of these fields were not present, then fields of type TypeA would not be
correctly scanned or parsed (see Program Flow in section 7.1 for more
information on how this is used to construct the overall structure of the data).
It also means that when an instance of TypeA is referenced (using a field
declaration), arguments to the parameters paramA and paramB must be supplied
by that field declaration.
This is a very simple example; it will be expanded in the subsequent sections.
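To make the ordering point concrete, the following sketch (purely illustrative, reusing the names from Example 12) shows a reordered variant, which would require the data to contain a TypeC field before the TypeB field:

```xml
<type name="TypeAReordered">
  <param name="paramA" javaType="int"/>
  <param name="paramB" type="TypeB"/>
  <!-- fieldC now precedes fieldB: the data must contain a TypeC
       field first, then a TypeB field, then a TypeD field -->
  <field name="fieldC" type="TypeC"/>
  <field name="fieldB" type="TypeB"/>
  <field name="fieldD" type="TypeD"/>
</type>
```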
5.2.1.2 Primitive Types
Primitive types provide the mechanism to bind to java objects, such that we can
have int types, Object types etc. Primitive types are conceptually very similar
to standard types; however, their usage in the DRIX specification is significantly
different. When using the LAE interface to the LDR, if a primitive type is to be
output, then the base type must resolve to one of the following: String,
java.util.Date, java.math.BigDecimal, java.math.BigInteger, a java primitive
type, or a java primitive type wrapper (e.g. int, Integer).
DRIX Tag 8 primitiveType

<primitiveType>

Description:
A type that binds directly to a java primitive (e.g. int, double) or a java
Object (e.g. Integer, String).

Position:
Either:
- Under a namespace tag (0..* occurrences)
- Under a using tag nested under a namespace tag (0..* occurrences)

Attributes:
- Required name attribute
- Optional returnType attribute specifying the java primitive type or java
  class that this primitive type reads from file.
- Optional parentType attribute, specifying the LDR primitive type that this
  primitive type inherits from.
- Optional displayName attribute. The displayName attribute provides an
  external handle, with the aim being that it may later be used for GUI
  displays and error messages etc.
- Optional overrides attribute (see Overriding Types in Included Libraries,
  section 5.1.9)

Elements:
- 0..1 documentation tags
- 0..1 payload tags
- 0..* param elements (see Params in section 5.2.1.3)
- 0..1 super elements (see Inheritance & the Super Tag in section 5.2.1.5)
- 0..1 java method tags, for more advanced use (see Advanced Code Elements for
  Construction of New Types in section 7.3)

These elements must appear in the order they are presented above. Each of the
individual java methods can appear in any order, and each of the individual
param elements can appear in any order.
Primitive types are used to create the most basic types to read data. Therefore,
while standard types can contain fields which are primitive types, a primitive
type itself can contain no fields. End users generally will not need to construct
many new primitive types. Extending and using the primitive types defined in
the base libraries should be sufficient. In order to create a useful primitive type,
some investigation into the Advanced Concepts section will most likely be
required.
As with constructed types, the payload tag should generally not be included in
user written DRIX files. This is primarily inserted into generated DRIX files
from one of the LDR converters (e.g. ASN.1, COBOL).
The primitive type tag also has an optional attribute to specify a parent type if
this primitive type inherits from another defined primitive type. This is the
parentType attribute. It is important to note that a primitive type cannot inherit
from a standard type or a generated type, and that standard types and generated types
cannot inherit from a primitive type. We will look at inheritance in more detail
in the section Inheritance & the Super Tag.
Primitive types, like types, are able to declare parameters which can be provided
as arguments by the field declarations which reference the primitive type.
Example 13 – primitiveType tag example
<primitiveType name="TypeA" returnType="String" >
<param name="paramA" javaType="int"/>
<param name="paramE" type="TypeE"/>
…
</primitiveType>
<primitiveType name="TypeB" parentType="TypeA">
<param name="paramB" javaType="int"/>
</primitiveType>
Example 13 states that TypeA is a primitive type which will read data into a
String value. This type must be provided with a java int parameter and a
parameter of type TypeE. We have also defined another primitive type, TypeB,
which extends the TypeA primitive type. When a field references TypeB, it will
need to provide arguments to paramB, as well as to the inherited parameters
paramA and paramE. Primitive type declarations will generally provide some of
the java methods (defined in Advanced Code Elements for Construction of New
Types in section 7.3) in order to actually read the data from the file.
5.2.1.3 Params
Parameters are a key part of any type, primitive type or generated type. The
parameters provide much of the flexibility in the configuration as they allow for
most of the data-driven processing required in many complex file formats.
DRIX Tag 9 param

<param>

Description:
Specifies parameters that are used to configure a type, primitiveType or
generatedType. Values (args) must be provided to these parameters either
through the use of default arguments or when instantiating the type using the
field tag (see Fields in section 5.2.2).

Position:
- Under a primitiveType tag (0..* occurrences)
- Under a type tag (0..* occurrences)

Attributes:
- Required name attribute
- Optional javaType attribute specifying the java primitive type or java class
  of the param. Required if the type attribute is not present.
- Optional type attribute specifying the type of the param. Required if the
  javaType attribute is not present.
- Optional Boolean private attribute specifying whether or not this parameter
  will be visible in documentation (for this type, and any type inheriting
  from this type) – default false.

Elements:
- Optional default element (see later in this section)
- Optional documentation element
Parameters effectively allow a given type to be constrained or modified
based on the results of previously read fields. Defaults are simply a convenience,
provided so that the end user does not always have to supply arguments for
parameters that normally take on a standard value. This helps keep the input
xml specification as succinct and readable as possible.
The private attribute on a parameter specifies whether the param will be
displayed in the generated documentation for the type. If the param is used
purely for internal purposes, is supplied with a default argument, and is not
meant to be part of the type's API, then the private attribute can be set to
true. When this is done, no documentation will be generated for the parameter,
either on the type where the private attribute is declared or on any type
inheriting from it. The private attribute affects documentation only; the
param can still be supplied with arguments externally, it is simply not
documented. A similar concept exists for hiding parameters defined in a
parentType using the hideParam element, which is described in section 5.2.1.5.
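As a sketch of the above (the type and parameter names are hypothetical), an internal-only parameter can be defaulted and kept out of the generated documentation:

```xml
<type name="PaddedRecord">
  <!-- Internal-only parameter: supplied with a default argument
       and hidden from the generated documentation -->
  <param name="padChar" javaType="char" private="true">
    <default value=" "/>
  </param>
</type>
```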
In order to see the usefulness of parameters, consider the following example:
Example 14 – param tag example
<primitiveType name="FixedString" returnType="String">
<param name="length" javaType="int"/>
</primitiveType>
<type name="CountedString">
<field name="stringField" type="FixedString">
<arg name="length" value="3"/>
</field>
</type>
Here, we have a fixed length String, where the String is specified to be of length
3. Obviously, this type definition is not complete, as it needs some way of
reading the String. We could actually use the primitiveType string.Ascii in this
instance, which is provided in the common library supplied with the LDR. In
general, the primitiveTypes provided in the base libraries should be sufficient,
however, we will see how to construct the reading elements for primitive types
in Advanced Concepts in section 7.
If we assume for the moment that this primitiveType, when provided with a
length, somehow automagically knows how to read a String, we can then use it
as seen in the CountedString type in Example 14.
We have jumped ahead a little and introduced field arguments (the method of
providing arguments to parameters – see Fields for more information); however, it
is fairly straightforward to see that we want to read a String field of length 3.
This is a simple example showing some of the power of the param-arg tag
combination. We will expand this example in later sections to illustrate
additional functionality.
The default tag has the form shown in DRIX Tag 10.
DRIX Tag 10 default

<default>

Description:
Exists under a param tag, templateParam tag or typeParam tag to provide
default values which can be overridden by any super-arg, super-templateArg or
field-arg, field-templateArg, field-typeArg combination.

Position:
Under a param tag, templateParam tag or typeParam tag (0..1 occurrences)

Attributes:
- Optional value attribute

Elements:
- 0..1 fromField element
- 0..1 fromParam element
- 0..1 value element
- 0..1 expr element

One and only one of the fromField, fromParam, expr or value tags, or the value
attribute, may exist under the default tag. See Args in section 5.2.2.3 for
information on how the fromField, fromParam, expr and value tags and the value
attribute are used to supply arguments to params.
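For instance, the FixedString primitive type from Example 14 could declare a default for its length parameter (illustrative only), so that referencing fields need not always supply an arg:

```xml
<primitiveType name="FixedString" returnType="String">
  <param name="length" javaType="int">
    <!-- Used whenever a referencing field supplies no length arg -->
    <default value="3"/>
  </param>
</primitiveType>
```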
5.2.1.4 Template Parameters
Template parameters enable the construction of generic types which can be
specialised through providing template arguments that allow for specific parsing
behaviour. There are two forms of template parameters; namely constant
template parameters, and “normal” template parameters.
Both “normal” and constant template parameters are handled through the
“templateParam” tag defined in DRIX Tag 11.
DRIX Tag 11 templateParam

<templateParam>

Description:
Provides generic capabilities to types to allow for collections of types. The
type of the element in the collection is provided by the templateParam.

Position:
Under a type tag (0..* occurrences)

Attributes:
- Required name attribute
- Optional type attribute
- Optional javaType attribute
- Optional baseType attribute
- Optional returnType attribute

Only one of type or javaType can be used – these indicate the use of constant
template parameters. A baseType or returnType can only exist in the case of
"normal" template parameters and cannot be used in conjunction with a type or
javaType attribute.

Elements:
- Optional documentation element
- Optional default element (if a type or javaType is used, indicating constant
  template parameters)

5.2.1.4.1 Normal Template Parameters
“Normal” template parameters are primarily used when we have a collection of
types. We can then define the behaviour of the collection in a type declaring a
template parameter and handle the collection generically without needing to
know the specifics of the types that are part of the collection. Consider if we
have a “List of ints” and a “List of Dates”. Here, we want to be able to specify a
generic “List” type, and pass the type of the collection as a parameter. This is
exactly the purpose that “normal” template parameters serve.
Without the use of template parameters, if we were trying to implement this List
behavior as described above, we would need to use a specification like that in
Example 15. We have introduced the repeatRange tag described in section
5.2.3.1. For this example it is sufficient to know that the repeatRange tag simply
defines that the fields within the tag occur in the data a number of times.
Example 15 – list example without using templateParam
<type name="ListOfDates">
<repeatRange min="0" max="unbounded">
<field name="listField" type="myLib.Date"/>
</repeatRange>
</type>
<type name="ListOfInts">
<repeatRange min="0" max="unbounded">
<field name="listField" type=".integer.UInt32"/>
</repeatRange>
</type>
<type name="MyType">
  <field name="intList" type="ListOfInts"/>
  <field name="dateList" type="ListOfDates"/>
</type>
This is clearly a verbose and undesirable method for specifying such a simple
concept as a List. In our example, the ListOfTypename types are very simple
types. It is easy to see that if we were implementing something more
complicated than a simple List, with a larger structure, this could lead to
extremely long specification files, with a lot of very similar type definitions –
likely leading to issues with maintenance of the specification for any minor
change to the base type.
In order to overcome this, the concept of template types is introduced. Similar to
the concept of params and args, arguments can be provided to a templateParam
using the templateArg notation.
In general we are able to use a more concise, shorthand method, by specifying
template parameters through the use of {} notation. Using this notation, we are
able to provide a much more concise and generic solution to the List example
discussed above. Example 16 shows how this can be achieved.
Example 16 – ”normal” templateParam example
<type name="List">
  <templateParam name="ListElementType"/>
  <repeatRange min="0" max="unbounded">
    <field name="listField" type="ListElementType"/>
  </repeatRange>
</type>

<type name="MyType">
  <field name="intList" type="List{.integer.UInt32}"/>
  <field name="dateList" type="List{myLib.Date}"/>
</type>
Here, we define intList and dateList fields. The actual element type in the
List collection is the templateParam defined in the List type, and is
specified within the curly brackets. We simply supply an ".integer.UInt32" and
"myLib.Date" type respectively for each of the required List types, and
achieve the required result. The type .integer.UInt32 is a 32-bit unsigned
integer type, which comes as part of the standard base libraries shipped
with the LDR. For this example we assume the existence of some custom
user-written myLib.Date type.
When using the shorthand “{}” notation, the template arguments supplied must
be Standard Types, or Primitive Types. They cannot be java types or generated
types. As discussed previously, this shorthand notation is preferred and can be
used for all cases where the template parameter type is static, not a java type,
and known at the time of writing the specification.
The "baseType" and "returnType" attributes can be used on template parameters
to restrict the set of template arguments that can be supplied to the type.
Consider the case where we know we have some record where the record length
is read first, and we then have to read some type which is an implementation
of a "BaseRecordType".
Example 17 – ”normal” templateParam example using baseType and returnType
<type name="BaseRecordType">
<documentation>
<![CDATA[
This is simply an abstract base type.
The actual types to use will inherit from this type
See the section on Inheritance and the Super tag
For more information on how this is possible
]]>
</documentation>
<param name="length" javaType="int"/>
</type>
<type name="TemplatedContainer">
<templateParam name="RecordLength" returnType="int"/>
<templateParam name="Record" baseType="BaseRecordType"/>
<field name="recLength" type="RecordLength"/>
<field name="record" type="Record">
<arg name="length">
<fromField field="recLength"/>
</arg>
</field>
</type>
In Example 17 we show how this is possible. Through the use of the baseType
and returnType attributes we have enforced that RecordLength returns an
"int". We know then that we can supply the value read from this field as an
argument to the "Record" template parameter. This is because we have also
restricted the "Record" template parameter to have "BaseRecordType" as a
baseType, and "BaseRecordType" takes a "length" parameter which
is also declared to be an "int".
The rules applied to input template arguments based on the baseType and
returnType attributes are listed below in Table 4 and Table 5 respectively.
Table 4 – baseType rules
baseType Rules
When a template parameter declares a baseType attribute, the type supplied as a template
argument must:
1. Be the same as the specified baseType
OR
2. Inherit (either directly or indirectly – via several levels of inheritance) from the
specified baseType
Table 5 – returnType rules
returnType Rules
When a template parameter declares a returnType attribute, the type supplied as a template
argument must:
1. Be a primitiveType whose returnType is either:
   a. The same as
   b. A subclass of
   c. Assignable to
   the returnType specified on the templateParam.
OR
2. Be a standard type which is declared to be emittable, where the emittable type is either:
   a. The same as
   b. A subclass of
   c. Assignable to
   the returnType specified on the templateParam.
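Putting these restrictions together, a sketch of instantiating the TemplatedContainer type from Example 17 might look as follows. The names MyRecord, MyFile and lib.IntLength are hypothetical: lib.IntLength is assumed to be some primitiveType whose returnType is "int", satisfying the returnType rule, while MyRecord satisfies the baseType rule by inheriting from BaseRecordType.

```xml
<type name="MyRecord" parentType="BaseRecordType">
  <!-- concrete record fields would be declared here -->
</type>

<type name="MyFile">
  <!-- lib.IntLength: assumed primitiveType with returnType="int";
       MyRecord: inherits from BaseRecordType -->
  <field name="rec" type="TemplatedContainer{lib.IntLength, MyRecord}"/>
</type>
```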
5.2.1.4.2 Constant Template Parameters
Constant template parameters on the other hand are effectively the same as
parameters. The main reason for constant template parameters is for efficiency
reasons. Each time a field occurrence is encountered, the field arguments are
provided to the type. On the other hand, constant template parameters are used
to initialise a type and once initialised, these template arguments are used for all
occurrences of the template type. This is particularly useful where the
construction of a new object is required in order to create the template argument
and a given field will always provide the same argument.
Consider the case of a generic type which reads a set of characters, in any
character set. The character set (or a CharsetDecoder) could be declared as a
constant template parameter. In this way, we know that if we need to read a
UTF-8 String, we can simply provide the character set (or a CharsetDecoder) as
a constant template argument and the same character set (or CharsetDecoder)
will not need to be initialised and passed to a type on each field occurrence.
Example 18 shows how a constant template parameter can be declared and
referenced. This is an extremely trivial example, which defines a constant
template parameter, and then simply sets a field to be the value of the constant
template parameter. Constant template parameters can only be referenced within
a code block, or within an expr tag. In order to reference the value of a constant
template parameter, the expression “TemplateParam.<templateParamName>” is
used.
Example 18 – Trivial example using constant template parameters.
<type name="SomeType">
<templateParam name="constTemplateParam" javaType="String"/>
<field name="echoTheTemplateParam" javaType="String">
<expr>TemplateParam.constTemplateParam</expr>
</field>
</type>
The use of constant template parameters can lead to large performance
improvements. Due to implementation details, however, they should not be used
where a given specification is likely to supply a large number of different
values as the template argument: a new java class is created for each distinct
constant template argument value, which can lead to slowdown and a larger
memory footprint. Currently there is no shorthand method for supplying constant
template arguments.
5.2.1.5 Inheritance & the Super Tag
Warning:
There is a significant difference in the behaviour of super-arg tags when
implementing your own scan, skip, read etc. methods, as defined in Advanced
Concepts in section 7. Please refer to that section for how to use the super-arg
tag combination if you are overriding the read/scan/skip methods.
We have already discussed inheritance relating to primitive types & types. This
section provides more detail as to how inheritance works in the LDR. As already
described, inheritance in general can occur in the following situations:
- Primitive types can inherit from other primitive types. Primitive types
  cannot inherit from standard types or generated types.
- Standard/Constructed types can inherit from other standard types, but not
  from primitive types or generated types.
- Generated types cannot inherit from anything.
When inheriting from a primitive type, the parameters declared in the parent
primitive type are all inherited. Since primitive types contain no fields,
and since the order of parameters is not important, this is a simple case and
can be handled without any extra information being provided in the
specification.
Standard/Constructed type inheritance, on the other hand, is slightly more
complicated. A type can declare fields, and the field order is important, as seen
in the Standard/Constructed Types section. Therefore, a question must be
answered: where are inherited fields positioned in the child type, with respect to
the fields it declares itself?
In order to resolve this possible ambiguity, while also increasing the flexibility
of the inheritance model, the super tag is introduced. Super tags have the
following properties:
DRIX Tag 12 super

<super>

Description:
Provides arguments to an inherited type, an indication to the engine as to
when the inherited type should be instantiated, and the position of any
inherited fields in the inheriting type.

Position:
Under a type or primitiveType that inherits from another type or primitive
type (0..1 occurrences)

Attributes:
None

Elements:
- 0..* arg elements (see Args in section 5.2.2.3)
- 0..* templateArg elements (see section 5.2.1.6)
- 0..* hideParam elements
The declaration of a super tag allows the child type to declare where its
parent's fields are located relative to its own fields. Furthermore, when using a
super tag, optional arguments may be supplied to any of the parent type's
parameters. These in essence become default arguments for the child type. The
super tag also allows us to set defaults for parameters which are defined more
than one level up the inheritance chain.
Example 19 – Super tag example with args specified.
<type name="t1">
<!--t1 fieldDeclarations-->
<param name="p1" javaType="int"/>
<param name="p2" javaType="int"/>
</type>
<type name="t2" parentType="t1">
<super>
<arg name="p2" value="0"/>
</super>
<!--t2 fieldDeclarations-->
</type>
<type name="t3" parentType="t2">
<!--t3 fieldDeclarations-->
<super>
<arg name="p1" value="1"/>
</super>
</type>
Example 19 shows a situation where the super tag is used. In this example, t3
will correctly set its parameter p1 to 1, where p1 is defined in the t1 type,
inherited by t2 and then subsequently inherited by t3. Similarly, p2 will be set
(to 0) by the call in t2. If in t3, there was an arg tag to set p2, then this would
override the value provided in t2.
The example also dictates that the fields defined in t1 occur before any t2 fields.
The t3 declaration defines that the t3 fields come before the t2 fields. This is
dictated by the fact that the field declarations come prior to the super tag in t3,
whereas the t2 field declarations come after the super tag in t2.
Therefore, the xml definition of t3 above is functionally equivalent to that
shown in Example 20.
Example 20 – super tag example part 2
<type name="t3">
  <!--t3 fieldDeclarations-->
  <!--t1 fieldDeclarations-->
  <!--t2 fieldDeclarations-->
  <param name="p1" javaType="int">
    <default value="1"/>
  </param>
  <param name="p2" javaType="int">
    <default value="0"/>
  </param>
</type>
When a type is declared to have a parent type, but no super tag is present, any
inherited fields must occur in the data prior to any fields declared in the
inheriting type.
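In other words, a variant of the t2 type from Example 19 with no super tag at all (sketch only) implies that the fields inherited from t1 come first:

```xml
<!-- No super tag: the fields inherited from t1 are parsed
     before any of t2's own field declarations -->
<type name="t2" parentType="t1">
  <!--t2 fieldDeclarations-->
</type>
```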
When inheriting from types that declare template parameters and template
arguments, the situation is slightly more complex, and discussed in the next
section.
The super tag also allows for the definition of any number of hideParam
elements. These elements have the format shown in DRIX Tag 13.
DRIX Tag 13 hideParam

<hideParam>

Description:
The parameter with the specified name, which is defined somewhere on the
inheritance chain for this type, should not be documented on this type. Does
not affect parameter usage, only documentation.

Position:
Under a super tag (0..* occurrences)

Attributes:
- Required name attribute – must match exactly the name of a parameter
  defined on a parentType of this type.

Elements:
None
When a hideParam element is defined within a super tag, the parameter with
the corresponding name in the type's inheritance hierarchy will not be
documented on this type. Consider the case where an extensible base type
is written. This base type may have many parameters defined. Other types may
then implement the base type, by declaring it as a parentType and supplying
the required arguments to its parameters. In this case, it may be desirable
to hide the documentation for these parameters on the implementing type. In
such a case, the hideParam tag can be used so that the documentation
produced will not show the specified parameters from the parentType. Note that
the use of hideParam affects parameter documentation only, and has no effect
on parameter usage – a hidden parameter can still be supplied with an
argument externally.
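A sketch of the pattern described above (all names hypothetical): an implementing type supplies the parent's parameter through the super tag and hides that parameter from its own generated documentation:

```xml
<type name="AsciiRecord" parentType="ConfigurableRecord">
  <super>
    <!-- Supply the parent's encoding parameter... -->
    <arg name="encoding" value="ascii"/>
    <!-- ...and omit it from AsciiRecord's generated documentation -->
    <hideParam name="encoding"/>
  </super>
</type>
```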
5.2.1.6 Template Parameter/Template Argument Inheritance
When inheriting from types that declare template parameters, the situation is
more complicated. Template parameters are not directly inherited by a child
type. Therefore, in general these will be re-declared on all child types.
Additional template parameters can always be added to the child types.
All template parameters defined on a parent type must be satisfied. Therefore,
the child type can simply declare its own template parameter and pass this up to
its parentType, or the childType can declare a new template argument in the super
tag to pass up to the parent in order to satisfy the parent’s template parameters.
The template argument tag will be introduced fully in section 5.2.2.9. Consider
the DRIX shown in Example 21.
Example 21 – Super tag example with templateArgs
<type name="grandParentType">
<templateParam name="gp1"/>
<field name="gpField1" type="gp1"/>
</type>
<type name="parentType" parentType="grandParentType">
<templateParam name="p1"/>
<templateParam name="p2"/>
<super>
<templateArg name="gp1" type="p1"/>
</super>
<field name="pField1" type="p2"/>
</type>
<type name="childType" parentType="parentType">
<templateParam name="c1"/>
<templateParam name="c3"/>
<super>
<templateArg name="p1" type="c1"/>
<templateArg name="p2" type="B"/>
</super>
<field name="childField" type="c3"/>
</type>
…
…
<primaryField name="instance" type="lib.childType{A, C}"/>
Assuming that the childType definition lies in a library called lib, and that
somewhere we have defined types A, B and C, then this definition is effectively
the same as the declaration in Example 22.
Example 22 – Example showing the template argument inheritance equivalence
<type name="childType">
<field name="gpField1" type="A"/>
<field name="pField1" type="B"/>
<field name="childField" type="C"/>
</type>
…
…
<primaryField name="instance" type="lib.childType"/>
5.2.1.7 Emittable
The emittable tag allows Standard Types to be referenced directly by the output
specification as fields to output. This means that by providing a mechanism to
“emit” a Standard Type, we do not need to reference its subfields in the output
specification in order to have a representation of the field value output. When
using the LAE interface to the LDR, an emittable type must resolve to one of the
types listed in Table 22 in section 8.1.1.
DRIX Tag 14 emittable
<emittable>
Description: Provides a mechanism to emit/output a constructed, non-primitive type.
Position: Last element within a type tag (0..1 occurrences)
Attributes: Required type attribute, specifying the type which is emitted. This type must be either declared as emittable, a primitiveType, or a java class or primitive.
Optional boolean fromParent attribute. If the fromParent attribute exists, it must be set to true, and no other attributes or elements are allowed.
Elements: 0..1 fromField element (see The fromField element in section 5.2.2.6)
0..1 fromParam element (see The fromParam element in section 5.2.2.7)
0..1 expr element (see The expr element in section 5.2.2.8)
0..* templateArg tags (see Template Arguments in section 5.2.2.9)
There can be one and only one of the fromField, fromParam and expr elements under an emittable tag.
Notice that we are able to supply template arguments to the type attribute in the
emittable tag. In addition, we are also allowed to use the shorthand template
argument ({}) notation in the emittable type attribute. Even when supplied using
the templateArg tag, these template arguments cannot be dynamically bound
under the emittable tag.
Consider the case of the counted string shown in Example 23.
Example 23 – CountedString example
<type name="CountedString">
<field name="length" type=".integer.UInt32" readRequired="true"/>
<field name="stringField" type=".string.Ascii">
<arg name="length">
<fromField field="length"/>
</arg>
</field>
</type>
In general, we don’t really care about outputting the CountedString.length.
Furthermore, when we reference a field F of type CountedString in the output,
we don’t want to have to reference the F.stringField subfield. This is where the
emittable tag is useful. Instead, we could define the CountedString as per
Example 24.
Example 24 – emittable example
<type name="CountedString">
<field name="length" type=".integer.UInt32" readRequired="true"/>
<field name="stringField" type=".string.Ascii">
<arg name="length">
<fromField field="length"/>
</arg>
</field>
<emittable type="String">
<fromField field="stringField"/>
</emittable>
</type>
Here, since we have defined the CountedString as emittable, we can simply
reference it in the output without needing to reference its subfields. We can see
here that the actual output is then taken from the value returned from the
stringField. The type attribute in Example 24 could be set to .string.Ascii
without changing the functionality. When determining the type to output from
the type attribute, the steps outlined in Table 6 are applied.
Table 6 – Emittable Type Resolution Rules
When resolving the type to output for an emittable field, the following steps are taken:
1. If the emittable is declared with attribute fromParent="true", obtain the parentType. If no parentType exists, or if the parentType is not declared to be emittable, throw an INVALID_EMITTABLE_INHERITANCE error; else resolve the emittable type of the parent.
2. If the type attribute references a template parameter
   a. Obtain the LDR type supplied as a template argument and go to step 2.
3. If the type attribute references an LDR primitive type
   a. If the primitive type declares a returnType, use this as the emittable type.
   b. Else search the inheritance chain of the primitive type, and if any return types are specified, use the inherited return type as the emittable type.
   c. Else, if the primitive type does not have a return type specified anywhere in its type hierarchy, throw an INVALID_EMITTABLE_TYPE error.
4. Else, if the type attribute references an LDR standard/constructed type
   a. If the standard/constructed type is defined as emittable, resolve the type to output for that emittable type.
   b. Else, throw an INVALID_EMITTABLE_TYPE error.
5. Else, assume the specified type is an allowable java type, and use this as the output type.
Example 25 demonstrates the utility of the emittable-expr tag combination.
Example 25 – emittable example part 2
<type name="DateString">
<field name="year" type=".string.Ascii" readRequired="true">
<arg name="length" value="4"/>
</field>
<field name="month" type=".string.Ascii" readRequired="true">
<arg name="length" value="2"/>
</field>
<field name="day" type=".string.Ascii" readRequired="true">
<arg name="length" value="2"/>
</field>
<emittable type="String">
<expr>
field.day()+"/"+
field.month()+"/"+
field.year()
</expr>
</emittable>
</type>
Here, the output of a DateString type would be a String in the format
DD/MM/YYYY.
We have made reasonably extensive use of the readRequired attribute in this
example. This attribute is discussed in detail in the Fields section in 5.2.2. For
the purposes of this section, however, it is sufficient to know that if a field’s
value is to be referenced by a subsequent field, then the readRequired attribute
must be set to true.
Inheritance is also slightly tricky in the emittable scenario. The emittable
nature of a parent type is not automatically inherited by the child type.
Therefore a parent type can be emittable, while the child type is not. Similarly
(and intuitively) a child type can be declared to be emittable when the parent
is not emittable. If the emittable nature is to be inherited, then the
fromParent attribute should be set to true. The child type is also allowed to
declare a new emittable tag with different properties than the parent type.
However, there are restrictions on the emittable type in this case.
Consider the case where we have a type A which inherits from type B where B is
declared to be emittable and the emittable-type attribute resolves (via the rules
in Table 6) to a type X. Consider that A is also declared to be emittable, and the
emittable-type attribute resolves (via the rules in Table 6) to a type Y. In this
case, Y must inherit from or be equal to X or an error will be thrown.
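A minimal sketch of the inheritance option may help here. The type names below are invented for illustration; the .string.Ascii usage follows the document's earlier examples:

<type name="BaseText">
<field name="raw" type=".string.Ascii" readRequired="true">
<arg name="length" value="10"/>
</field>
<emittable type="String">
<fromField field="raw"/>
</emittable>
</type>
<!-- Inherit the parent's emittable behaviour unchanged -->
<type name="InheritedText" parentType="BaseText">
<emittable fromParent="true"/>
</type>

Here InheritedText emits exactly as BaseText does. Had InheritedText instead declared its own emittable with a type attribute, that type would have to resolve (via Table 6) to String or a type inheriting from String.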
5.2.1.8 Publishing Data to be considered for Immediate Output
Warning:
This tag requires knowledge of how program flow occurs in the LDR and the
relationship between the input metadata and the output. As we will show, the
publish tag allows for optimisation of memory and file I/O. However it must be
used with caution. Incorrect use of the publish tag can result in errors in the
output and possible scan errors. It is therefore recommended that this tag be
only be utilised by users with a thorough understanding of how the LDR engine
operates – particularly an understanding of the scanning process and the ideas
in section 7.1.
Without the publish tag, however, performance will not be optimal for large
data files in which “or” tags surround large amounts of data to parse.
Publish is a tag that can be placed within a type declaration, and indicates that,
when encountered, all of the fields that have already been read in the tag should
be published to the output side of the engine, to be considered for output. The
publish tag is used in the input specification. However, the output specification
defines which fields we want to output. Therefore, it may occur that the fields
prior to the publish tag are not output because they are not referenced in the
output specification. If, on the other hand, they are referenced in the output
specification, then they will be output after encountering the publish tag.
DRIX Tag 15 publish
<publish>
Description: Boolean indicating to output processing that at a given point all fields that have been read can be immediately output, regardless of whether or not the type has been successfully scanned.
Position: Within a type declaration, where the position is important – see Standard/Constructed Types in section 5.2.1.1. (0..* occurrences)
Attributes: None
Elements: None
The publish tag is used to specify that all fields prior to the tag should
always be output if they are scanned correctly, regardless of whether or not
the top level and subsequent dependent fields scan correctly. This is used
primarily for cases where there are wrapping elements around large sections of
the file, and we know that we have enough information at a particular level to
force an output.
Consider a case where we get data in the format outlined in Example 26.
Example 26 – file format requiring publish
FileStruct := FH REC* FT
FH := String
REC := HugeRecordStructure
FT := String
In this example, we do not know we have a correct file until we have parsed the final FT.
The LDR engine handles the file structure in a sensible manner, such that in any
case where a field is not nested under an “or”, “repeatRange” or “while” tag, it
is available to be read and output immediately after being successfully scanned.
However, in this situation, since the records are repeated, the engine does not
know that it can output each record after it is scanned. In this instance, all of the
records would need to be scanned successfully prior to any of them being
available to output. As such, we need a method to indicate that even though we
have not successfully parsed all of the records, we want to output each of the
FileStruct.REC records as they are scanned.
In order to do this, we use the publish tag. Using publish, we could define the
above specification in our input xml metadata in the manner shown in Example
27.
Example 27 – publish tag example
<type name="FileStruct">
<field name="FH" type="String"/>
<repeatRange min="0" max="unbounded">
<field name="REC" type="Record"/>
<publish/>
</repeatRange>
<field name="FT" type="String"/>
</type>
<type name="Record">
<field type="HugeRecordStructure"/>
</type>
For more of an understanding as to the underlying operation of this tag, the
reader should consult the Advanced Concepts section.
5.2.1.9 Dynamic Type Generation And Type Params
Warning:
In order to correctly implement this tag, some java knowledge is required.
The details of the code required for the implementation of the generator tag are
omitted from this section as this is considered an advanced topic and described
in the section titled “Code Required for the Generator Tag” in section 7.3.7.
At this point, we have shown how we can handle a vast array of different types
and are able to create types with an arbitrary level of nesting through
composition and inheritance models. We have shown that through the use of
parameters and type parameters we allow for easy data-dependent configuration
of these types. An important area we have not investigated is the case where the
type of elements of the data is not known at the time when the input
specification is constructed.
Example 28 – File format requiring dynamic types
FileStruct := Metadata Data(Metadata)
Consider the case in Example 28 where we have a file defined and the type of
the data is determined by the metadata. This type of file structure is used for
Lavastorm’s BRD files, where the entire file has tab-delimited columns, and
newline-delimited records. The data in each column of the first row specifies
the type of the data that is read in the corresponding column in all of the
subsequent records.
Example 29 – BRD example file
stringField:String	intField:int	dateField:date
Record1	1	1/10/2008
Record2	2	2/10/2008
Record3	3	3/10/2008
There is no way to specify this format using the XML elements we have defined
thus far. For this reason, we have introduced two new concepts to address the
problem of dynamic typing. These are:
- Generated Types (discussed here)
- Binding Fields Dynamically to Types (see Dynamic Binding to Types)
Generated types are defined in the tag generatedType. They have the properties
defined in DRIX Tag 16.
DRIX Tag 16 generatedType
<generatedType>
Description: Provides a mechanism for dynamically creating new types based on some input parameters.
Position: Within a namespace tag, or within a using tag that is not nested under a type tag. The order of generatedType tags is unimportant (0..* occurrences)
Attributes: Required name attribute
Optional overrides attribute (see Overriding Types in Included Libraries in section 5.1.9)
Elements: 0..1 documentation elements
1..* typeParam elements (order unimportant)
1 generator element. The generator element contains java code used to construct new types. The details of the code in the generator tag are considered an advanced concept, and therefore are defined in Advanced Concepts in section 7.
0..3 code tags (allowed only 1 file location, 1 class location & 1 init location tag – see Code in section 7.3.6)
The documentation tag, if present, must appear first, followed by all of the typeParam tags. The generator tag must follow the typeParam tags, with any code tags appearing last.
The contents of the generatedType tag must contain sufficient information to
generate other types. The typeParams are not parameters to the types that are
generated. Rather, they are parameters to the type generator and therefore are
used to generate a type. These are supplied through the typeArg element. The
name attribute specifies the name of the type-generator, not the generated types.
The typeParam element has the properties displayed in DRIX Tag 17.
DRIX Tag 17 typeParam
<typeParam>
Description: Parameter used in a generated type in order to construct the new type.
Position: Under a generatedType tag (1..* occurrences)
Attributes: Required name attribute
Required type attribute
Elements: Optional documentation element
Optional default element
Each type-generator must also create a new generated type name. The name of
the type-generator is referenced in the input specification (since we have no
knowledge of the name of the generated type at this stage). However, the output
specification may reference the name of the types generated by the type
generator.
The generator element is a String element which is interpreted as a java code
block and compiled. This generator creates the new type in order for it to be
referenced elsewhere. The generator requires java code, and as such is
considered an advanced topic. The definition of the generator tag and the code
that it requires is defined in section 7.3.7.
Using the BRD file structure from Example 28, we could construct an input
specification to read the data, as per Example 30.
Example 30 – Input specification using a generated type
<generatedType name="Data">
<typeParam name="metadata" type="String"/>
<generator>
//Code used to create the new type “data”.
</generator>
</generatedType>
<field name="File" type="FileStruct"/>
<type name="FileStruct">
<field name="metadata" type="String" readRequired="true"/>
<field name="data" type="Data">
<typeArg name="metadata">
<fromField field="metadata"/>
</typeArg>
</field>
</type>
Here, we read the metadata first. The metadata is then passed as an argument to
the generated type Data, which uses the parameter to generate a new type. The
type that is generated from the generator method is then assigned to be the type
of the File.data field.
The actual mechanism for creating a new type is contained in the generator
element, which is intentionally left incomplete in this example. In order to create
a generator section, the user requires knowledge of the LDR API and the
concepts outlined in section 7. The generator tag itself is described in greater
detail in section 7.3.7.
There is significant overhead involved in using dynamic types. They are also
more complicated to define, as a code-block will be required to be written in the
generator tag. Therefore, it is recommended that wherever possible, a fully
specified input xml metadata should be used where generated types are not
required. If dynamic binding to types only (and not full-scale type generation) is
required, then it will be easier for most end-users to use the methods described
in Dynamic Binding to Types in section 5.2.2.13. However, in some cases
LAVASTORM ANALYTICS
lavastorm.com
Page 52
Issue 1
LDR User Manual
generated types are required, and in these cases, the method outlined above
should be followed. As an example, the COBOL copybook library uses type
generators in order to construct the file structure based on an input copybook
specification.
The types constructed by the generated type are allowed to inherit from
primitive types and standard types, but not from other generated types.
Anonymous fields (see section 5.2.2.11) cannot be declared to be of a generated
type.
The name of the generated type is the type name that is to be used while
referencing the type in the input specification. However, consider the case where
we have a generated type present in a loop, where a different typeArg is
supplied to the type generator for each occurrence of the field in the data. In this
case, the actual type that is generated may be different on each loop iteration.
While it is correct to reference it in the same manner in the input specification
(we don’t really care here what is generated), the type generator itself will need
to construct a new type name such that it can be distinguished from the other
generated types coming from the same generator. This is particularly important
when looking at the output specification relating to dynamic and generated types
(see section 6.5).
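The looping case described above can be sketched as follows. This fragment reuses the Data generator from Example 30; the Record structure and field names are invented for illustration:

<type name="Record">
<field name="recordMetadata" type="String" readRequired="true"/>
<field name="body" type="Data">
<typeArg name="metadata">
<fromField field="recordMetadata"/>
</typeArg>
</field>
</type>
…
<repeatRange min="0" max="unbounded">
<field name="record" type="Record"/>
</repeatRange>

On each iteration, recordMetadata may differ, so the generator may produce a different type for each record. The input specification always refers to the generator as Data, but the generator code itself must build a distinct name for each generated type (for instance, one derived from the metadata value) so that the generated types can be told apart on the output side.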
5.2.1.10 Allowable Combinations and Order of Evaluation
So far we have introduced a lot of concepts relating to the allowable types. We
have discussed parameters, templated types and template parameters, dynamic
binding, generated types, standard types, primitive types and java types. This
section provides an overview as to which combinations are allowed, under
which tags, and the evaluation order of the arguments.
First, it is important to note that javaType fields can have no parameters, no
template parameters, no type parameters and therefore no arguments should be
supplied to any of these types. Fields cannot be dynamically bound to java
Types. Anonymous fields (see section 5.2.2.11) cannot be declared to be of a
java Type.
Primitive Types can only have parameters declared. They cannot have template
parameters or type parameters declared. Fields can be dynamically bound to
Primitive Types. Anonymous fields (see section 5.2.2.11) cannot be declared to
be of a Primitive Type.
Standard Types can have parameters and template parameters declared. They
cannot have type parameters declared. Fields can be dynamically bound to
Standard Types. Anonymous fields (see section 5.2.2.11) can be declared to be
of a Standard Type.
Generated Types (under generatedType tag) cannot have parameters or template
parameters declared. They must have type parameters declared. Fields can be
dynamically bound to Generated Types. Anonymous fields (see section
5.2.2.11) cannot be declared to be of a generated Type.
For any template argument or type argument tags, if the type referenced does
not declare the corresponding template param or type param tags, then the
arguments are simply ignored. For the argument tags, these have to correspond
to param tags if the field is not dynamically bound to a type, does not have
dynamically bound template arguments, and does not specify a generated type.
Otherwise, if there are any dynamic properties of the field, if the type referenced
does not declare the corresponding param tag, then the argument is simply
ignored. In Example 31, a complicated example is shown, from which we will
illustrate the order of evaluation of args, typeArgs, templateArgs and typeFrom
args.
Example 31 –Complicated example illustrating order of evaluation
<type name="MyType">
<field name="argField" type="MyString" readRequired="true"/>
<field name="templateArgArgField" type="MyString"
readRequired="true"/>
<field name="templateArgField" type="MyString" readRequired="true"/>
<field name="typeArgField" type="MyString" readRequired="true"/>
<field name="typeFromField" type="MyString" readRequired="true"/>
<field name="complicatedType">
<typeFrom>
<fromField field="typeFromField"/>
</typeFrom>
<typeArg name="typeParam1">
<fromField field="typeArgField"/>
</typeArg>
<templateArg name="templateArgParam1">
<fromField field="templateArgField"/>
<arg name="param1">
<fromField field="templateArgField"/>
</arg>
</templateArg>
<arg name="param1">
<fromField field="argField"/>
</arg>
</field>
</type>
From this example, the following operations are performed in order:
1. Read the argField field
2. Read the templateArgArgField field
3. Read the templateArgField field
4. Read the typeArgField field
5. Read the typeFromField field
6. Evaluate the typeFrom argument, and determine the type that this
corresponds to
7. If the type found in step 6 is a Generated Type, evaluate the typeArg,
pass this to the Generated Type generator method, and obtain the actual
type to use
8. If the type to use has a templateArgParam1 template parameter declared,
then evaluate the corresponding template argument, using the value read
in the field templateArgField.
9. If in step 8, the template argument was required, and if the type that this
template argument corresponds to declares a parameter param1, then
evaluate the corresponding argument, using the value read in the field
templateArgArgField and use it to setup the type.
10. If the type to use has a param1 parameter declared, then evaluate the
corresponding argument, using the value read in the field argField and
use it to setup the type.
11. Read the complicatedType field.
5.2.2 Fields
We have already seen some field definitions in the previous section dealing with
types. Fields and types are intrinsically linked. Types provide the data type
definition, whereas fields provide an instantiation of the type, which can then
be used to parse a data file. For those familiar with object-oriented
programming, types are analogous to a class and fields are analogous to an
object instantiation of that class.
Consider the case of a library. In a library, we may have many types defined, but
when we use the library, we may only use a small number of those types, by
declaring fields of that type.
A field has the properties outlined in DRIX Tag 18.
DRIX Tag 18 field
<field>
Description: Describes an instantiation of an existing type that is used to read data.
Position: Within a namespace, using, type, repeatRange, or, while, or redefines tag. The order of the fields defined is important, and determines their parsing order. (0..* occurrences)
Attributes: Optional name attribute (unnamed fields are called Anonymous Fields – see section 5.2.2.11).
Optional type attribute.
Optional javaType attribute (see section 5.2.2.12)
Optional offset attribute (see section 5.2.2.2)
Optional Boolean readRequired attribute (see section 5.2.2.1)
Either a javaType attribute, a type attribute, or a typeFrom element must exist.
Elements: 0..* arg elements – see Args in section 5.2.2.3 (order unimportant)
0..1 typeFrom elements – see Dynamic Binding to Types in section 5.2.2.13.
0..* typeArg elements – see Dynamic Type Generation in section 5.2.1.9
0..* templateArg elements – see Combining Template Parameters & Dynamic Typing in section 5.2.2.14
0..1 expr tag. Allowable only under javaType fields, and cannot be used in conjunction with any other elements.
0..1 errorFilters tags. See section 10.1.5.
The order of elements is unimportant in the field definition.
When using a javaType field, only the expr element is allowed. In all other cases, no expr element is allowed.
One and only one type, javaType or typeFrom attribute/element is allowed to exist
under the field tag. We have already discussed some of the simpler field concepts.
The following subsections detail the use of the attributes and elements that can be
defined in a field tag.
5.2.2.1 The readRequired Attribute
Warning:
The readRequired attribute should only be used for fields which are required to
be read in order to correctly parse other fields. Ideally, these should read
very small amounts of data, and should not contain heavily nested subfields. If
a readRequired attribute is placed on a field containing many nested subfields,
large amounts of memory will be used. This is because a readRequired attribute
specifies that all of the nested subfields need to be stored in memory for
access by subsequent fields. For example, placing a readRequired attribute on a
field directly under the primaryField for a large data file will most likely
lead to OutOfMemory errors.
In previous sections we have seen the readRequired attribute appear on fields,
and seen its use in various examples. The readRequired attribute is a boolean
attribute on fields, and must be set to true in order for a field to be later
referenced in a fromField or value tag.
In general, the reading process of the LDR first scans the file to determine
where fields exist in the data file. Then, the output specification is used to
determine which of these fields need to be read in order to be output. Therefore
the readRequired attribute is used to indicate to the LDR that a field needs to
always be read, even if the field is only being scanned, or skipped.
readRequired should only be used where absolutely necessary, as using it will
incur a performance hit.
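As a sketch of the warning above (HugeRecordStructure is a placeholder type name used earlier in this document, and the surrounding fields are invented), the first declaration below is the kind to avoid, while the second shows the intended narrow use:

<!-- Avoid: forces every subfield of each record to be held in memory -->
<field name="record" type="HugeRecordStructure" readRequired="true"/>
<!-- Prefer: mark only the small field that later fields actually reference -->
<field name="length" type=".integer.UInt32" readRequired="true"/>

In the second form, only the small length value is retained for reference by subsequent fields, keeping memory usage bounded.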
5.2.2.2 Offsets
Offset attributes are optional attributes under the field tag used to specify that
this field is located at a specific offset relative to the file pointer of the parent
type when it is encountered. The offset attribute is a String, and has the format:
offset="byteLength[:bitLength]", where the offset is restricted in the xsd by the
pattern:
Example 32 –Offset pattern
<xsd:pattern value="[0-9]+(:[0-9]*)?" />
Consider the case described in Scenario 2.
Scenario 2 – File format requiring the use of offsets
- We have a file with a Header, a set of CDRs, and a Trailer.
- Each CDR contains a set of subcalls.
- There are an unknown number of subcalls for each record, taking up a maximum of 2048 bytes.
- If the subcalls take up less than 2048 bytes, then the remaining bytes in the 2048 block are zero padded.
- Following this is 4 bits of data we don’t care about (junk), and some record trailer information begins after the 4th bit.
We can describe the file format presented in Scenario 2 in the following
specification shown in Example 33.
Example 33 –offset tag example
<type name="File">
<field name="header" type="Header"/>
<repeatRange min="0" max="unbounded">
<field name="record" type="Record"/>
</repeatRange>
<field name="trailer" type="Trailer"/>
</type>
<type name="Record">
<repeatRange min="0" max="unbounded">
<field name="subCall" type="CallComponent"/>
</repeatRange>
<field name="extraInfo" type="CallEndInformation" offset="2048:4"/>
</type>
The important line from this example is shown below. This specifies that the
CallEndInformation starts at the 5th bit of the 2049th byte after the start of the
Record.
Example 34 –offset tag line
<field name="extraInfo" type="CallEndInformation" offset="2048:4"/>
Clearly, if the field is a top level field in a specification, not lying under a type
tag, then the offset is absolute. For instance, in the above example, if we had:
Example 35 –absolute offset example
<field name="file" type="File" offset="1024"/>
This would effectively mean that we are skipping the first 1024 bytes before
starting to read the file.
5.2.2.3 Args
As we have already seen in some of our examples, arguments are used within
fields to provide values to the parameters defined in the associated type.
Arg definitions have the following properties:
DRIX Tag 19 arg
<arg>
Description: Used by fields to provide arguments to parameters declared in types.
Position: Can only occur directly under a field or super tag. The order of arg tags is unimportant. (0..* occurrences)
Attributes: Required name attribute
Optional value attribute
Elements: 0..1 fromField element
0..1 fromParam element
0..1 expr element
There can be only one of these elements under an arg tag. If a value attribute exists, none of these elements can exist.
Only one of the fromField, fromParam and expr elements and the value attribute can
exist.
Args are optional elements within a field tag. There can be any number of these
arg definitions. The following rules apply to the declaration of arguments:
- For each arg definition under a field tag, there must be an associated param in the declared type with the same name as the arg name – or, there must be an associated param that is contained in a type that is directly or indirectly inherited by the declared type.
- For each arg definition under a super tag, the type containing the super tag must inherit a parameter from another type that has the same name as the arg name.
- The type of any supplied arg must be the same as the declared type of the associated param.
- When declaring a field, args must be provided for all params of the associated type which have not had defaults provided (either by the default attribute, or through the use of the super-arg tag combination).
- Any extra args provided by the field that are not declared by the associated type are ignored if the field is dynamically bound. If the field is not dynamically bound, then an exception is thrown for cases where a supplied arg does not correspond to a param declared on the type.
There are four possible methods for specifying the argument. These are the
value attribute, the fromParam, fromField and expr elements. These elements
and the value attribute are described in the following sections.
5.2.2.4 The static value attribute
Provides a static value for an arg, default or typeArg element. The value must be
a simple, constant value that needs to be evaluated only once; it must be a
simple numeric value, or a String. When used under an arg tag, the value
attribute can only be used if the field under which the arg tag is defined is not
dynamically bound, is not bound to a generated type, and does not supply
dynamically bound template arguments.
When using simple constants, the value attribute is the preferred method of
specifying arguments, as using a value attribute can lead to performance
improvements compared with an expr tag. Furthermore, it is simpler to specify
than an expr tag. Example 36 shows the use of the value attribute.
Example 36 – arg-value tag example
<type name="Type0">
<field name="field1" type="Type1">
<arg name="param1" value="1"/>
<arg name="param2" value="ABC"/>
</field>
</type>
<type name="Type1">
<param name="param1" javaType="int"/>
<param name="param2" javaType="String"/>
<!-- ... remaining type definition ... -->
</type>
5.2.2.5 The static value element
This is essentially the same as the value attribute, except that it can be used in
cases where multi-line Strings are required. As such, it provides a static value for
an arg, default or typeArg element. The value must be a simple, constant value
that needs to be evaluated only once; it must be a simple numeric value, or a
String. When used under an arg tag, the value element can only be used if the
field under which the arg tag is defined is not dynamically bound, is not bound
to a generated type, and does not supply dynamically bound template arguments.
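As a minimal sketch (the type, field and parameter names here are illustrative
assumptions, not taken from an earlier example), the value element can carry a
multi-line String argument that the value attribute could not hold:

<field name="banner" type="TextBlock">
<arg name="text">
<value>First line of the banner
Second line of the banner</value>
</arg>
</field>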
5.2.2.6 The fromField element
Sets the value of the containing tag based on a field value. Similar to the value
attribute, this can exist under a number of different containing tags.
DRIX Tag 20 fromField
<fromField>
Description: Provides a value to the containing tag using the value of another
             field.
Position:    Can occur:
             Under a templateArg tag (0..1 occurrences)
             Under an emittable tag (0..1 occurrences)
             Under a min tag (0..1 occurrences)
             Under a max tag (0..1 occurrences)
             Under a condition tag (0..1 occurrences)
             Under a typeFrom tag (0..1 occurrences)
             Under a typeArg tag (0..1 occurrences)
             Under a test tag (0..1 occurrences)
             Under a base tag (0..1 occurrences)
             Under a to tag (0..1 occurrences)
Attributes:  Required field attribute, specifying the field where the value
             is taken from.
Elements:    None
We have already seen an example using the arg-fromField tag combination in a
previous example using a CountedString. We re-introduce this in Example 37
below.
Example 37 – arg-fromField tag example
<type name="CountedString">
<field name="stringLength" type=".integer.UInt8" readRequired="true"/>
<field name="stringField" type=".string.Ascii">
<arg name="length">
<fromField field="stringLength"/>
</arg>
</field>
</type>
When using the fromField tag, the field being supplied as an argument
(stringLength above) must lie within the direct parent type (CountedString
above) under which this field (stringField above) is being declared.
Furthermore, the field referenced in the fromField tag must be declared with a
readRequired attribute set to true.
Therefore, if the field is a top-level field declaration (like primaryField),
fromField cannot be used.
The fromField tag can also reference a subfield of a field such as stringLength.
However, if we are attempting to do anything else with the field value, the expr
tag must be used.
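As a hedged sketch of subfield referencing (the Record and Header types and
their fields are hypothetical, and we assume the dotted reference notation shown
later in Example 43 also applies inside fromField), this might look like:

<type name="Record">
<field name="header" type="Header" readRequired="true"/>
<field name="body" type=".string.Ascii">
<arg name="length">
<fromField field="header.bodyLength"/>
</arg>
</field>
</type>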
When using the fromField tag under an emittable tag, the emittable value of the
referenced field is used.
5.2.2.7 The fromParam element
Sets the value of the containing tag based on a parameter value. The parameter
which it references must lie directly under the containing type where the field
declaration lies. Therefore if this is a top-level field declaration, (not lying under
a type), fromParam cannot be used. Similar to the fromField tag, this can exist
under a number of different containing tags.
DRIX Tag 21 fromParam
<fromParam>
Description: Provides a value to the containing tag using the value of a
             parameter.
Position:    Can occur:
             Under a templateArg tag (0..1 occurrences)
             Under an emittable tag (0..1 occurrences)
             Under a min tag (0..1 occurrences)
             Under a max tag (0..1 occurrences)
             Under a condition tag (0..1 occurrences)
             Under a typeFrom tag (0..1 occurrences)
             Under a typeArg tag (0..1 occurrences)
             Under a test tag (0..1 occurrences)
             Under a base tag (0..1 occurrences)
             Under a to tag (0..1 occurrences)
Attributes:  Required param attribute, specifying the parameter where
             the value is taken from.
Elements:    None
Example 38 shows a case using the arg-fromParam tag combination. Here,
field1 is initialized by supplying the value of param1 as an argument to param2.
If param1 was not a primitive type or javaType (int in this case), and we wanted
to specify an argument based on a sub-field of param1, an expr tag would need
to be used instead.
Example 38 – arg-fromParam tag example
<type name="Type0">
<param name="param1" javaType="int"/>
<field name="field1" type="Type1">
<arg name="param2">
<fromParam param="param1"/>
</arg>
</field>
</type>
<type name="Type1">
<param name="param2" javaType="int"/>
…
</type>
5.2.2.8 The expr element
We have now seen how to specify arguments to various tags using the data from
fields that have been read, input parameters, and simple static values.
Unfortunately this is not sufficient for all of our needs. We also need to cater for
situations where we need to evaluate an expression in order to obtain the
argument value. In these situations, we use the expr element.
The expr tag sets the value of the containing tag based on a java expression.
Similar to the fromField and fromParam tags, this can exist under a number of
different containing tags.
DRIX Tag 22 expr
<expr>
Description: Provides a value to the containing tag using a java expression.
Position:    Can occur:
             Under a templateArg tag (0..1 occurrences)
             Under an emittable tag (0..1 occurrences)
             Under a min tag (0..1 occurrences)
             Under a max tag (0..1 occurrences)
             Under a condition tag (0..1 occurrences)
             Under a typeFrom tag (0..1 occurrences)
             Under a typeArg tag (0..1 occurrences)
             Under a test tag (0..1 occurrences)
             Under a base tag (0..1 occurrences)
             Under a to tag (0..1 occurrences)
             Under a field tag (0..1 occurrences – only when using
             javaType fields).
Attributes:  None
Elements:    None
If any modifications to a field or parameter are required prior to setting it as an
argument to another parameter, then the expr tag should be used. If sub elements
of a parameter are to be used, then an expr tag must be used. When using an
expr tag to construct a java expression, (similar to when using the java methods)
you will likely need to know how the fields & parameters are going to be
constructed internally. For this reason, for any non-trivial use of the expr tag, the
user should consult Advanced Concepts in section 7.
In Example 39 we see a more complicated use of the expr tag within an arg.
This example uses the or tag (see section 5.2.3.6), the true tag (see section
5.2.3.7) and the testMethod tag (see section 7.3.1) which we have not yet
introduced. For the moment, however, a full understanding of these tags is not
required.
In the example, a SubscriberData type is declared. The SubscriberData has an
expiryDate field, and an optional currentSubCallData field – we will see in
section 7.3.1 that fields can be made optional through combining the “or” and
“true” tags.
The SubscriberData type reads an expiryDate field. The expiryDate field is then
checked to see if this is before the current date. If the expiry date is before the
current date, the spec defines that the optional currentSubCallData field will not
be present.
Example 39 – extended arg-expr tag example
<type name="SubscriberData">
<field name="expiryDate" type="Date" readRequired="true"/>
<or>
<field name="currentSubCallData" type="CallData">
<arg name="subExpired">
<expr>
field.expiryDate().before(new
java.util.Date())
</expr>
</arg>
</field>
<true/>
</or>
</type>
<type name="CallData">
<param name="subExpired" javaType="boolean"/>
<testMethod>
if (!param().subExpired()) return Result.GOOD;
return Result.NOT_ME;
</testMethod>
<repeatRange min="0" max="unbounded">
<field name="cdr" type="CDR"/>
</repeatRange>
</type>
When specifying a java expression in the expr tag, this must be a one-line
expression without a semi-colon. If more lines are required, then these can
always be put into a private method in the code tag (see section 7.3.6), and
called from expr.
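As a sketch (the method name, grace-period parameter and placement are
illustrative assumptions, and the code tag itself is only defined in section 7.3.6),
a multi-statement computation can be moved into a private method and invoked
from the one-line expr:

<code>
private boolean expiredBefore(java.util.Date expiry, int graceDays) {
    java.util.Calendar c = java.util.Calendar.getInstance();
    c.add(java.util.Calendar.DAY_OF_MONTH, -graceDays);
    return expiry.before(c.getTime());
}
</code>
...
<arg name="subExpired">
<expr>expiredBefore(field.expiryDate(), 30)</expr>
</arg>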
5.2.2.9 Template Arguments
In section 5.2.1.4 we introduced the template parameter tag. In that section we
described how template arguments could be provided to the template parameters
using the shorthand “{}” notation.
However, use of this notation is limited to cases where “normal” and not
constant template arguments are being provided. Whenever the template
argument itself requires any arguments, the shorthand notation cannot be used.
Furthermore, this notation is only able to be used in cases where the template
arguments are statically bound to a type. We will see in section 5.2.2.13 that
dynamic binding to types is possible. However, it is not possible to provide
dynamically bound template arguments using the shorthand “{}” notation.
For this reason, we introduce the templateArg tag, which has the properties
described in DRIX Tag 23. When using the templateArg tag, the template
argument can be dynamically bound to a type. Even when using the templateArg
tag however, the template argument cannot specify a javaType or generated
type.
DRIX Tag 23 templateArg
<templateArg>
Description: Provides the template argument to a templated type. Used in place
             of the shorthand {} notation for cases where dynamic typing and
             templated types are used in conjunction.
Position:    Can occur:
             Within a field tag (0..* occurrences)
             Within a super tag (0..* occurrences)
Attributes:  Required name attribute
             Optional type attribute
             Optional value attribute
             Type attributes are only allowed when binding to "normal" template
             parameters, or in one special case for constant template
             parameters. The special case is under a super tag where the type
             attribute refers to a constant template parameter declared on the
             type containing the super tag; in that case, the value of the input
             constant template parameter will be passed up to the parentType.
             Value attributes are only allowed when binding to constant template
             parameters.
Elements:    0..1 typeFrom element if declared under a field tag – see
             Dynamic Binding to Types in section 5.2.2.13.
             0..* arg elements
             0..1 expr element
             typeFrom elements and arg elements are only allowed when these
             are supplied to a "normal" template parameter. A typeFrom tag can
             be used in place of, not in conjunction with, a type attribute.
             Expr elements can only be declared on templateArgs binding to
             constant template parameters. These can be used in place of value
             attributes (not in conjunction with value attributes).
There can be one and only one typeFrom element or type attribute under a
templateArg element. When the templateArg is declared under a super tag there
can be no typeFrom tag. In cases under a super tag where the templateArg is
provided to a “normal” template parameter then a type attribute is required.
We will illustrate the use of the templateArg tag further in section 5.2.2.14.
5.2.2.10 Type Arguments to Generated Types
We have already seen in section 5.2.1.9 how we can specify typeParams in a
generated type. Similar to the param-arg combination, typeArgs are used when
constructing a field of a generated type to specify the value of the typeParam.
DRIX Tag 24 typeArg
<typeArg>
Description: Used to provide arguments to parameters specified in generated
             types.
Position:    Can only occur directly under a field tag. The order of
             typeArg tags is unimportant. (0..* occurrences)
Attributes:  Required name attribute
Elements:    0..1 fromField element
             0..1 fromParam element
             0..1 value element
             There can be one and only one of these elements under a
             typeArg tag.
The same rules that apply to the param-arg combination also apply to the
typeParam-typeArg combination. The specification of the fromField, fromParam
and value elements is the same as defined in the previous args section.
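As a minimal sketch (the generated type, parameter name and value are
illustrative assumptions; generated types themselves are covered in section
5.2.1.9), a typeArg supplies a value for a typeParam when a field of the
generated type is declared, analogous to the param-arg combination:

<generatedType name="GenRecord">
<typeParam name="layout" type="String"/>
<generator>
//Code that builds the type from the layout value.
</generator>
</generatedType>
<field name="rec" type="GenRecord">
<typeArg name="layout">
<value>fixedWidth</value>
</typeArg>
</field>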
5.2.2.11 Anonymous Fields
Anonymous fields are simply fields where no name attribute is specified.
Anonymous fields are not directly referenceable in the output specification, or in
other sections of the input specification. However, any non-anonymous
subfields of an anonymous field become directly referenceable in the type which
contains the anonymous field declaration. For this reason:
- Primitive fields cannot be anonymous.
- Anonymous fields cannot be dynamically bound.
- Anonymous fields cannot be bound to generated types.
- Anonymous fields cannot be bound to javaTypes.
- Dynamically bound template arguments cannot be provided under an
  anonymous field tag.
- An anonymous field declared under type X cannot itself be of type X
  (this would cause infinite anonymous recursion).
Furthermore, care should be taken when using anonymous fields. When using
multiple anonymous fields, the user must ensure that no two fields under the
anonymous fields share the same name. Consider Example 40: here we declare
two anonymous fields, one of type Rec1 and another of type Rec2.
Example 40 – example using anonymous fields
<field name="file" type="ContainsAnonymousFields"/>
<type name="ContainsAnonymousFields">
<field name="id" type=".integer.Int16"/>
<field type="Rec1"/>
<field type="Rec2"/>
<field name="endId" type=".integer.UInt32"/>
</type>
<type name="Rec1">
<field name="field1" type="f1"/>
<field name="field2" type="f2"/>
</type>
<type name="Rec2">
<field name="field3" type="f1"/>
<field name="field2" type="f2"/>
</type>
In this example, we see that both Rec1 and Rec2 have a field named field2. This
will result in an error condition, and the LDR will throw an exception. This is
because there would be ambiguities if we attempted to reference file.field2.
Example 41 shows a case where we are using an anonymous field correctly. In
this example, the ConstructedType field is an anonymous field.
Example 41 – example using anonymous field
<field name="file" type="ContainsAnonymousFields"/>
<type name="ContainsAnonymousFields">
<field name="id" type=".integer.Int16"/>
<field type="ConstructedType"/>
<field name="endId" type=".integer.UInt32"/>
</type>
<type name="ConstructedType">
<field name="field1" type="f1"/>
<field name="field2" type="f2"/>
</type>
Example 42 shows a functionally equivalent version of the specification in
Example 41 without the use of an anonymous field.
Example 42 – example without anonymous fields
<field name="file" type="ContainsAnonymousFields"/>
<type name="ContainsAnonymousFields">
<field name="id" type=".integer.Int16"/>
<field name="field1" type="f1"/>
<field name="field2" type="f2"/>
<field name="endId" type=".integer.UInt32"/>
</type>
While Example 41 shows how to declare anonymous fields, it does not clearly
show their utility. If we have a more complex file structure requiring nested
loops, where multiple levels of type nesting are needed to achieve this structure,
anonymous fields become very useful.
When using anonymous fields, complex constructions can take place in order to
correctly parse the structure and the output specification does not need to know
the details about how this implementation occurs. In our example, the output
specification does not need to know that there is a ConstructedType in the
ContainsAnonymousFields type and can simply reference fields as:
Example 43 – Example referencing subfields of anonymous fields.
file.id
file.field2
file.endId
See the Output Specification section for more information on how output field
definitions can be used to define the LDR outputs.
5.2.2.12 javaType Fields
There is another form of type, beyond the primitive, standard and generated
types, that we have not yet discussed in detail. In general, all of the types
discussed so far are specifically related to the data we are reading. In certain
cases, we want to combine the data from the file with some other values. For
example, we may want to attach a Batch ID to all of the LDR engine outputs. In
order to do this, we can use the javaType attribute on a field.
For our example, with a BatchID, we could specify this in the manner seen in
Example 44.
Example 44 – example using a javaType attribute
<field name="BatchID" javaType="int">
<expr>0</expr>
</field>
In this situation, the int field BatchID is declared, and has the constant value 0.
This can then be referenced in the output specification, to be included as an
output field.
The example shown above is not particularly useful. However, if we consider
the case where we are running the LDR through LAE (as discussed in section 8),
then we can use graph or runtime parameters to specify a field value. In these
cases, we could specify a BatchID as a graph level parameter, and then reference
it as shown in Example 45.
Example 45 – a javaType attribute referencing LAE parameters.
<field name="BatchID" javaType="int">
<expr>{{^BatchID^}}</expr>
</field>
The expr tag is only allowed to appear under a field which is declared to be of a
javaType. Fields of this type cannot take template, type or other arguments. The
expr tag is optional, and allows for the javaType to be constructed from the
provided expression.
For standard types, where the user does not provide any code for the read, scan
or skip methods (as defined in Advanced Code Elements for Construction of
New Types in section 7.3), the expr tag must exist; if the expr tag is not set, the
field will never be assigned a value. In other cases, however, the user may
choose not to provide an expr tag, and may instead assign the value of the field
within their custom read, scan and skip methods.
5.2.2.13 Dynamic Binding to Types
We have already discussed one of the methods by which the LDR provides
support for dynamically defined types in the section on Dynamic Type
Generation. Dynamic type generation allows the user to create new types based
on the data, and use them subsequently to process the file.
In most situations, however, we are not required to generate entirely new types.
The more common situation occurs when we have to use information from
processed data to determine the type of the next field to read. Here, the problem
is dynamically binding a field to a type, not dynamically creating a type.
The typeFrom element allows for this dynamic nature. The typeFrom element
has the properties outlined in DRIX Tag 25.
DRIX Tag 25 typeFrom
<typeFrom>
Description: Allows for fields to be dynamically bound to the type specified in the
             contained fromParam, fromField or expr tag.
Position:    Can occur:
             Within a field tag (0..1 occurrences)
             Within a templateArg tag (0..1 occurrences)
Attributes:  None
Elements:    0..1 fromField element
             0..1 fromParam element
             0..1 expr element
             There can be one and only one of these elements under a
             typeFrom tag.
Only one of the fromField, fromParam & expr elements can exist. The
fromField, fromParam & expr elements take the same form as that discussed in
the Args section (see section 5.2.2.3).
Consider the simple case shown in Example 46. Here, a DataType type is
declared. DataType contains two fields, typeName and fieldData. The typeName
field is read, and the value of the typeName field is then used to specify the type
of the fieldData field. In this case, the typeName field must contain the fully
namespace qualified type name of a type in the LDR input specification.
Example 46 – typeFrom example
<type name="DataType">
<field name="typeName" type=".string.Ascii" readRequired="true">
<arg name="length" value="10"/>
</field>
<field name="fieldData">
<typeFrom>
<fromField field="typeName"/>
</typeFrom>
</field>
</type>
5.2.2.14 Combining Template Parameters & Dynamic Typing
So far when discussing the dynamic typing of fields, we have used two methods,
namely:
Dynamic Type Generation (see section 5.2.1.9)
Dynamic Binding to Types (see section 5.2.2.13)
We have also discussed how Template Parameters can be used to provide
generic capabilities to types. We have not yet discussed how we can combine
the concepts of dynamic typing and template parameters.
Consider the case where we have a templated type List, which takes a type
parameter declaring the type of fields within the List. If the type of the fields in
the List is dependent on some other data, using either generated types or
dynamically bound types, we need to use the templateArg syntax introduced in
section 5.2.2.9, instead of the shorthand "{}" notation.
Consider the case with a generated type shown in Example 47.
Example 47 – templateArg; combining generated and templated types.
<field name="file" type="Example"/>
<type name="Example">
<field name="typeArgField" type=".string.Ascii" readRequired="true">
<arg name="length" value="10"/>
</field>
<field name="ListOfThings" type="List">
<arg name="typeArgParam">
<fromField field="typeArgField"/>
</arg>
<templateArg name="listElement" type="gt1"/>
</field>
</type>
<generatedType name="gt1">
<typeParam name="metadata" type="String"/>
<generator>
//Code used to create the new type "gt1".
</generator>
</generatedType>
<type name="List">
<param name="typeArgParam" type=".string.Ascii"/>
<templateParam name="listElement"/>
<repeatRange min="0" max="unbounded">
<field name="listField" type="listElement">
<typeArg name="metadata">
<fromParam param="typeArgParam"/>
</typeArg>
</field>
</repeatRange>
</type>
Here, we read the first field, typeArgField, which contains a String of data.
After reading this field, we pass its value as an argument to the List type. We
also set the List template argument to the gt1 type. This constructs a new gt1
type: gt1 is provided with the typeArg value from typeArgParam, which is in
turn taken from the typeArgField that was originally read. The ListOfThings
field has therefore created a new templated type List{gt1}, where gt1 is a type
constructed based on the value in the typeArgField field.
This then provides the required mechanism for handling templated types using
generated types. However, we still need a way to handle templated types of
dynamically typed data. Example 48 deals with this case.
Example 48 – templateArg; combining dynamically bound and templated types.
<field name="file" type="Example"/>
<type name="Example">
<field name="listType" type=".string.Ascii" readRequired="true">
<arg name="length" value="10"/>
</field>
<field name="ListOfThings" type=".structure.List">
<templateArg name="t1">
<typeFrom>
<fromField field="listType"/>
</typeFrom>
</templateArg>
</field>
</type>
In this case, we read the first field, listType. Once this is read, we take the
value from this field and declare it as the List element type (t1) to be used in
the ListOfThings List. This effectively creates a type List{T}, where T is the
value of the listType field. The List in this example is defined in the structure
library provided with the LDR.
5.2.2.15 Combining Dynamic Typing & Generated Types
There is another complication to consider in cases where we have a dynamically
typed field and this field is dynamically bound to a generated type. This starts to
get a bit confusing and hopefully won’t happen very often, however the
following input specification code shows a situation in which this is occurring:
Example 49 –combining dynamically bound types and generated types.
<field name="file" type="Example"/>
<type name="Example">
<field name="typeArgField" type=".string.Ascii" readRequired="true">
<arg name="length" value="10"/>
</field>
<field name="typeToUse" type=".string.Ascii" readRequired="true">
<arg name="length" value="10"/>
</field>
<field name="theField">
<typeFrom>
<fromField field="typeToUse"/>
</typeFrom>
<typeArg name="argToTypeGenerator">
<fromField field="typeArgField"/>
</typeArg>
</field>
</type>
In this example, we read a String into the typeArgField, which may or may not
be an argument to a generated-type. Then we read the typeToUse field. This
field specifies the type of the subsequent field. If typeToUse actually contains
the name of a generated type, then we use the typeArgField field to supply a
typeArgument to the generated type. This is quite a complex situation.
It is important to note, that in this example, if the typeToUse field does not
specify a generated type, then the typeArgField is read, but the typeArg tag is
ignored. This is not an error condition.
5.2.3 Flow Control & Data Structure Elements
We have already indirectly introduced some of the concepts of flow control &
data structure elements in the previous sections. Clearly, it is impossible to create
a useful specification without some concepts of looping & choice. Furthermore,
there are going to be cases where we do not want to read all of the data in the file,
as certain parts of the file may contain junk information. Therefore in this section
we also introduce the concept of skipping through sections of the file.
5.2.3.1 RepeatRange
RepeatRange is one of the most useful tags when composing a specification. It
allows for looping over a set of fields for a specified range, or infinitely until no
more data can be successfully parsed.
RepeatRange has the properties listed in the following table.
DRIX Tag 26 repeatRange
<repeatRange>
Description: Informs the LDR engine that the fields contained within the tag are
             repeated in the data the specified number of times.
Position:    Can occur:
             Within a type tag (0..* occurrences)
             Within a using tag contained within a type tag (0..*
             occurrences)
             As the repeatRange tag is related specifically to the structure of the
             data, order is important in the location of its declaration.
Attributes:  Optional String onMultiple attribute (see section 5.2.3.3)
             Optional String min attribute
             Optional String max attribute
             Optional until attribute (see section 5.2.3.1.1)
             The onMultiple attribute is "append" by default.
             The min and max attributes can only be used here if they are
             constant.
             Min must simply specify an integer >= 0.
             Max must specify an integer >= 0, or be "unbounded".
             If min or max are provided as attributes, then the corresponding
             min/max element cannot be defined.
             If present, the until attribute value must be "nextField".
Elements:    0..1 min element
             0..1 max element
             0..* field elements
             0..* or elements
             0..* publish elements
             0..* skip elements
             0..* align elements
             0..* test elements
             0..1 testMethod element
             0..1 constraints element (see section 5.2.3.4)
             The min element can only appear if there is no min attribute
             declared.
             The max element can only appear if there is no max attribute
             declared.
             The min element must be declared first within the repeatRange tag,
             followed by the max element, followed by all other tags.
             The position of the other (non-min/max) tags – excluding the
             constraints tag – within the repeatRange tag indicates the order in
             which they are to be parsed by the engine.
             Only one testMethod element can exist under a type.
             testMethod and test tags cannot coexist within a type.
             The constraints element – if it exists – must follow all other elements.
At least one subelement (other than the min/max tags) is required under the
RepeatRange tag. Within a RepeatRange tag, the iteration number (0-indexed)
of the loop can be accessed within any value tag in the locally scoped variable
loopIndex.
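As a hedged sketch (assuming the expr element is among the value tags in which
loopIndex is visible, and with an illustrative Record type that declares a
sequenceNumber param), the iteration number can be passed as an argument
inside the loop:

<repeatRange min="0" max="unbounded">
<field name="record" type="Record">
<arg name="sequenceNumber">
<expr>loopIndex</expr>
</arg>
</field>
</repeatRange>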
The min & max elements have the properties outlined in DRIX Tag 27 and
DRIX Tag 28 respectively.
DRIX Tag 27 min
<min>
Description: Specifies the minimum number of repetitions in a repeatRange tag
             before the data is said to parse correctly.
Position:    Must occur:
             Within a repeatRange tag (1 occurrence)
Attributes:  None
Elements:    0..1 fromField element (see The fromField element in
             section 5.2.2.6)
             0..1 fromParam element (see The fromParam element in
             section 5.2.2.7)
             0..1 expr element (see The expr element in section 5.2.2.8)
             There can be one and only one of these elements under a min
             tag.
DRIX Tag 28 max
<max>
Description: Specifies the maximum number of repetitions a repeatRange tag
             should attempt to make.
Position:    Must occur:
             Within a repeatRange tag (1 occurrence)
Attributes:  None
Elements:    0..1 fromField element (see The fromField element in
             section 5.2.2.6)
             0..1 fromParam element (see The fromParam element in
             section 5.2.2.7)
             0..1 expr element (see The expr element in section 5.2.2.8)
             There can be one and only one of these elements under a max
             tag.
Only one of the fromField, fromParam & expr elements can exist.
There is one special case for the max tag in RepeatRange. When using
RepeatRange->max->expr, the expression is not always simply evaluated as a
java expression. If the keyword "unbounded" is specified in the max->expr, this
is taken to mean that there is no maximum. A value of -1 also indicates that the
maximum is unbounded. The max->expr tag is the only min/max tag that is
allowed to evaluate to something other than an integer, and the only non-integer
value it may take is "unbounded".
The operation of the RepeatRange depends on the min & max elements and
constraints tags, and differs slightly depending on whether or not the maximum
value is bounded.
Case 1 – Unbounded Maximum, No constraints tags with max occurrence
specifiers
Continue repeating until the elements within the loop can no longer be parsed
correctly. If, upon exit of the loop, the min clause is satisfied, then the fields
have been read correctly. If the minimum condition or any minimum field
constraints are not satisfied, then this entire repeats block cannot be satisfied
and none of the previously read fields within the loop are accepted as being
scanned correctly. If a publish tag is nested within the loop, then fields can be
output as they are read, regardless of the successful parsing of the entire
repeatRange tag.
Case 2 – Bounded Maximum or constraints tags with max occurrence specifiers:
If there is a maximum, continue reading until the loop maximum is reached, any
field occurs more than its specified occurrence maximum (via constraints
clauses), or the fields within the loop can no longer be scanned correctly
(whichever occurs first). If the exit condition is met because the maximum
repeat count was reached, and all of the field minimum constraints clauses are
met, then the loop is scanned correctly. If the exit condition is met because not
all of the fields could be read, or because a constraints maximum was reached,
and the min clause and all field minimum constraints clauses are satisfied, then
the loop is scanned correctly. Otherwise, if the minimum condition or any field
constraints minimum clauses are not satisfied, the entire repeats clause cannot
be satisfied and none of the previously read fields within the loop are accepted
as being scanned correctly. If a publish tag is nested within the loop, then fields
can be output as they are read, regardless of the successful parsing of the entire
repeatRange tag.
A RepeatRange usage example is displayed in Example 50, where we continue
reading records until there are no more records to read.
Example 50 –example using the repeatRange tag
<type name="File">
<field name="maxRecordSize" type=".integer.Int8" readRequired="true"/>
<repeatRange min="1">
<max>
<fromField field="maxRecordSize"/>
</max>
<field name="record" type="Record"/>
</repeatRange>
</type>
Here, the maxRecordSize is read first, and then we read a minimum of 1 and a
maximum of maxRecordSize records until we can no longer read any records. In
general, when providing static min and max conditions, as in the case with min
above, the attribute form should be used.
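For instance, when both bounds are statically known, the attribute form keeps the specification concise. The following sketch (the type and field names here are hypothetical) reads between 1 and 10 records:

<type name="File">
<repeatRange min="1" max="10">
<field name="record" type="Record"/>
</repeatRange>
</type>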
5.2.3.1.1 Non-Greedy/Lazy RepeatRange
In general, the repeatRange tag is a greedy tag. This means that it continues to
iterate over the looping fields until the max is hit, or no more fields within the
loop can be parsed. In certain situations, it may be necessary to jump out of the
loop when a certain field is hit. Consider the DRIX shown in Example 51.
Example 51 –RepeatRange without until
<type name="File">
<repeatRange min="2" max="5" >
<field name="strField" type=".string.Ascii">
<arg name="length" value="6"/>
</field>
</repeatRange>
<field name="term" type="Term"/>
</type>
<type name="Term">
<field name="f" type=".string.Ascii" readRequired="true">
<arg name="length" value="2"/>
</field>
<test expected="TRL">
<fromField field="f"/>
</test>
</type>
Here, the LDR continues reading the “strField” field until either we have 5
repetitions, or we cannot parse any more of these fields. Only then will the LDR
attempt to read the “term” field.
Example 52 –Example file layout, showing need for the until attribute
Field1
Field2
Field3
TRL
Field1
Field2
Field3
Field4
TRL
If we consider the file layout shown in Example 52, this will not parse
successfully with the previous DRIX. Here, we will end up with the first 3
“strField” elements being parsed successfully. However, on the fourth iteration,
(assuming the newlines are Windows CR/LF characters), the “strField” will be
populated with the value “TRL(CR)(LF)F”. Then the following “strField” will
be populated with “ield1(CR)”.
It should be obvious from the file layout that this is not the desired parsing logic.
We want to exit the loop as soon as the “TRL” field is encountered. This equates
to a lazy or non-greedy match, requiring the use of the “until” attribute on the
repeatRange. In order to achieve this, the DRIX shown below can be used:
Example 53 –RepeatRange with until
<type name="File">
<repeatRange min="2" max="5" until="nextField" >
<field name="strField" type=".string.Ascii">
<arg name="length" value="6"/>
</field>
</repeatRange>
<field name="term" type="Term"/>
</type>
<type name="Term">
<field name="f" type=".string.Ascii" readRequired="true">
<arg name="length" value="2"/>
</field>
<test expected="TRL">
<fromField field="f"/>
</test>
</type>
When an “until” clause is present and specified as “nextField”, the element
occurring immediately after the loop is checked prior to each loop iteration to
determine whether the loop should be terminated. This check is performed on
every iteration after the min clause on the repeatRange has been satisfied.
Therefore, in Example 53, this test is only performed after the first two loop
iterations have successfully been parsed & the min clause has been satisfied.
When the “until” clause is present, there must exist an element immediately
after the repeatRange within the type. This must be either a “skip”, “field” or
“or” tag.
It is important to note that in the case where we have a loop over a field “A”,
followed by a field “B”, the only time it makes sense to use an until clause is
when the data in a “B” field could successfully be parsed as an “A”, multiple
“A”s, or as part of an “A”.
If the max number of loop iterations is hit, and the next field cannot be parsed,
then this is clearly an error. Similarly, if the loop fails prior to reaching its min
condition, this is also an error. As the field immediately after the loop is
attempted to be parsed on each loop iteration, the until attribute can introduce
overhead, and should only be used where necessary.
5.2.3.2 While
Where the general repeatRange case with a bounded maximum can be
considered analogous to a “for” loop, the while element is effectively the same
as a “while” loop (surprise, surprise).
The while element has the properties listed in DRIX Tag 29.
DRIX Tag 29 while
<while>
Description
Informs the LDR engine that the fields contained within the tag are
repeated in the data until the termination condition is met.
Position
Can occur:
Within a type tag (0..* occurrences)
Within a using tag contained within a type tag (0..*
occurrences)
As the while tag is related specifically to the structure of the data,
order is important in the location of its declaration.
Attributes
Optional onMultiple attribute (see section 5.2.3.3)
The onMultiple attribute is “append” by default.
Elements
1 condition element
0..* field elements
0..* or elements
0..* publish elements
0..* skip elements
0..* align elements
0..* test elements
0..1 testMethod elements
0..1 constraints tag (see section 5.2.3.4)
The condition element must be declared first within the while tag,
followed by the “or”, field, publish, skip, align, test or testMethod
elements.
The position of the loop elements (excluding the constraints tag)
after the condition element is important, and determines the parsing
order of the contained fields.
Only one testMethod element can exist under a type.
testMethod and test elements cannot coexist under the one type.
The constraints tag, if supplied, must be the last element in the
while tag.
At least one element other than the condition element is required. Within a while
tag, the loop’s iteration number can be accessed from any expr tag or other Java
expression in the variable loopIndex. This 0-indexed variable is scoped to the
loop.
The condition element describes the terminating condition of the while loop
and has the properties described in DRIX Tag 30.
DRIX Tag 30 condition
<condition>
Description
Specifies the termination criteria for a while loop.
Position
Must occur:
Within a while tag (1 occurrence)
Attributes
None
Elements
0..1 fromField element (see The fromField element in
section 5.2.2.6)
0..1 fromParam element (see The fromParam element in
section 5.2.2.7)
0..1 expr element (see The expr element in section 5.2.2.8)
There can be one and only one of these elements under a
condition tag.
Only one of the fromField, fromParam and expr elements can exist. The
fromField, fromParam and expr elements take the same form as that discussed in
the Args section.
If the while loop terminates due to being unable to successfully read the fields
contained within, then the entire loop is said to have not scanned correctly, and
all previously read fields are discarded. If a publish tag is nested within the loop,
then fields can be output as they are read, even if the loop is not
parsed successfully. If the while loop terminates because the termination criteria
are met, and any field minimum constraints are met, then the loop has been
scanned correctly.
An example of the usage of the while tag is found in the following example:
Example 54 –example using the while tag
<type name="File">
<field name="recordSize" type=".integer.Int8" readRequired="true"/>
<while>
<condition>
<expr>
<![CDATA[
loopIndex<field().recordSize()
]]>
</expr>
</condition>
<field name="record" type="Record"/>
</while>
</type>
Here, the recordSize is read first, and then we read recordSize records. Note that
this is functionally equivalent to using the repeatRange with fixed min and max
elements, as shown in Example 55.
Example 55 –repeatRange equivalent of the while example
<type name="File">
<field name="recordSize" type=".integer.Int8" readRequired="true"/>
<repeatRange>
<min>
<fromField field="recordSize"/>
</min>
<max>
<fromField field="recordSize"/>
</max>
<field name="record" type="Record"/>
</repeatRange>
</type>
5.2.3.3 Multiple Field Occurrences in a Loop
Generally, when we define a loop, each of the elements within that loop
becomes referenceable as an array, such that you can index the 3rd or 1000th
occurrence of an element within the loop. Most of the time, this is the desired
behaviour. However, in certain situations, it is not.
Consider the case where you have data which contains a set of fields. These
fields can occur in any order, and each field may or may not be present in the
set. However, each field can only exist a maximum of one time in the set.
Consider then that we have a set of three fields as shown in Example 56.
Example 56 –set of 3 elements example
<type name="SetOfElements">
<templateParam name="SetContentType"/>
<while>
<condition>
<expr>
<![CDATA[
loopIndex<3
]]>
</expr>
</condition>
<or>
<field name="f1" type="Type1"/>
<field name="f2" type="Type2"/>
<field name="f3" type="Type3"/>
</or>
</while>
</type>
For more information on the or tag, see section 5.2.3.6. In this example, the
meaning of the tag is fairly self-explanatory.
So from this example, we will now loop 3 times. During each iteration we make
a choice as to whether we have an f1, f2 or f3. The problem is that these will be
loaded into arrays, even though we know that there will only ever be a
maximum of 1 of each of these fields. This also means that the output would
need to reference them as f1[0], f2[0] and f3[0] respectively. This is
very unintuitive, given that we know they are not really arrays, but have
been put into a loop for construction purposes only.
For this reason, the onMultiple attribute has been created for the repeatRange and
while clauses. This attribute defaults to “append” when not present, which is the
situation described above, where an array is created.
The onMultiple attribute can take one of the following values:
append (default)
override
error
We have already shown what happens in the append situation. When set to
override, the onMultiple attribute specifies that we should not construct an array
of elements. Rather, each top-level (or pseudo-top-level, e.g. lying under an “or”)
field which exists within the loop is a scalar field and not an array. If
multiple occurrences of one of the fields are located in the data while processing
the loop, then only the latest is kept, and the previous value is overwritten.
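For instance, a sketch of a set type using override (assuming the same hypothetical Type1, Type2 and Type3 types used earlier) might look like the following:

<type name="SetOfElements">
<while onMultiple="override">
<condition>
<expr>
<![CDATA[
loopIndex<3
]]>
</expr>
</condition>
<or>
<field name="f1" type="Type1"/>
<field name="f2" type="Type2"/>
<field name="f3" type="Type3"/>
</or>
</while>
</type>

With this definition, f1, f2 and f3 are scalar fields, and a later occurrence of any of them in the data simply overwrites the earlier value.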
In certain situations we may want to validate that a maximum of one of each of
the fields is actually found while processing the loop. In this situation, we can
set the onMultiple attribute to error. When onMultiple is set to error, the loop
does not parse successfully if any of the fields within the loop occur more than
once.
Therefore, if we simply changed our definition of the SetOfElements type to that
shown in Example 57, then we have the desired set implementation.
Example 57 –Set example with onMultiple set to error
<type name="SetOfElements">
<templateParam name="SetContentType"/>
<while onMultiple="error">
<condition>
<expr>
<![CDATA[
loopIndex<3
]]>
</expr>
</condition>
<or>
<field name="f1" type="Type1"/>
<field name="f2" type="Type2"/>
<field name="f3" type="Type3"/>
</or>
</while>
</type>
5.2.3.4 Loop Constraints
Within the repeatRange and while tags, it is possible to specify constraints.
While the min and max clauses on a repeatRange element and the condition clause
on a while element are used to restrict the number of loop occurrences, the
constraints tag can be used to limit the number of occurrences of individual
fields.
The constraints tag is simply a container for individual occurrence
constraints and is shown in DRIX Tag 31.
DRIX Tag 31 constraints
<constraints>
Description
Specifies constraints that exist on fields within a loop.
Position
Can occur:
Within a while tag (1 occurrence)
Within a repeatRange tag (1 occurrence)
Attributes
None
Elements
1..* occurrence elements
The constraints tag is a container for a number of occurrence constraints on the
loop elements. The occurrence tag has the properties shown in DRIX Tag 32.
DRIX Tag 32 occurrence
<occurrence>
Description
Specifies constraints that exist on fields within a loop.
Position
Can occur:
Within a constraints tag (1..* occurrences)
Attributes
None
Elements
0..1 min elements
0..1 max elements
1 fromField element
The occurrence tag must have either a min or max element or both. These min
and max elements are simpler than the min and max elements that exist on the
repeatRange tag. They must be statically provided via a value attribute. This
means that there can be no fromField, fromParam or expr applied to evaluate the
min and max field occurrence constraints.
The fromField element within the occurrence tag is used to reference a field that
is defined within the loop. Sub-fields of anonymous fields cannot be referenced
here - the field’s declaration must actually be within the loop. The occurrence
tag can be used to specify that a given field cannot occur more than a specified
maximum number of times, or that the field must occur a specified minimum
number of times within the loop.
The max occurrence constraints are applied each loop iteration where the
referenced field exists. If the existence of the referenced field means that the
max clause is violated, then that loop iteration is discarded and the loop is
terminated. The file and parse stack are rolled back to the position on entry into
that loop iteration and the exit conditions for the loop are evaluated.
The min occurrence constraints are applied upon loop termination. If any of the
min occurrence constraints are not satisfied, then the loop is not parsed
successfully.
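As an illustration, the following sketch (the type and field names are hypothetical, and the exact element ordering within the occurrence tag should be checked against DRIX Tag 32) constrains a loop so that the header field may occur at most once, while the detail field must occur at least twice:

<type name="Block">
<repeatRange min="1" max="unbounded">
<or>
<field name="header" type="HeaderType"/>
<field name="detail" type="DetailType"/>
</or>
<constraints>
<occurrence>
<max value="1"/>
<fromField field="header"/>
</occurrence>
<occurrence>
<min value="2"/>
<fromField field="detail"/>
</occurrence>
</constraints>
</repeatRange>
</type>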
Note that some validation is performed on the occurrence constraints when
the DRIX is processed. This ensures that a field minimum occurrence
constraint cannot be greater than either the max or min constraint on a
repeatRange (if these values are statically provided), and ensures that no min or
max occurrence constraint is specified as a value greater than 1 if the
onMultiple condition on the loop is set to error.
5.2.3.5 Skip
The skip tag can be used to skip over a fixed or variable number of bytes and bits,
or to skip over a number of repetitions of a type. The skip tag has
the properties outlined in DRIX Tag 33.
DRIX Tag 33 skip
<skip>
Description
Specifies that a section of the data file is to be skipped over.
Position
Can occur:
Within a type tag (0..* occurrences)
Within an or tag (0..* occurrences)
Within a repeatRange tag (0..* occurrences)
Within a while tag (0..* occurrences)
Within a using tag nested within a type tag (0..*
occurrences)
As the skip tag can be placed in between field definitions, and
informs that part of the data file can be skipped, the order in which
skip tags are used within a type or using tag is important.
Attributes
Optional fixed attribute (if present, then the type attribute
cannot be present, and no elements can be present)
Optional type attribute
Optional count attribute (1 by default)
Elements
0..* arg elements – see Args in section 5.2.2.3 (order
unimportant)
0..1 typeFrom elements – see Dynamic Binding to Types in
section 5.2.2.13
0..* typeArg elements – see Dynamic Type Generation in
section 5.2.1.9
0..* templateArg elements – see Combining Template
Parameters & Dynamic Typing in section 5.2.2.14
0..1 fixed element (if present, then no other elements can
be present, and only the count attribute is allowed)
0..1 variable element (if present, then no other elements
can be present and only the count attribute is allowed)
One and only one of the type or fixed attributes, or the fixed or
variable elements, must be present.
The following sections describe the two different skip modes.
Skipping Repetitions of Types
When skipping a number of occurrences of a type in the data, the type attribute
under skip must be set. In this situation, the fixed element may not exist under
the skip tag, but all of the other tags may exist.
The other tags all have the same properties as when used under a field tag. In
this instance, rather than configuring the type that a field binds to, we are
configuring the type that is to be skipped. We can also specify a count, which
tells the engine how many occurrences of the type to skip. The count is set to 1
by default.
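For instance, the following sketch (the type names here are hypothetical) skips three occurrences of a trailer type after reading a record:

<type name="File">
<field name="record" type="Record"/>
<skip type="TrailerLine" count="3"/>
</type>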
The type name provided to the skip tag can be any primitive or standard type. An
example of where the skip-type tag-attribute combination may be very useful is
a case where data has been taken from screen dumps of a mainframe system. All
the records still exist, however there is junk header and trailer data interleaved,
which needs to be ignored. Consider the file format in Example 58.
Example 58 –file format example requiring skipping data
FXYL77
19/06/2008
XXYYZZA
11:20
** XXYYZZ SYSTEM **
DOCUMENT DETAILS
***** WARNING SILENT LINE *****
-+
Doc Nbr. AAABBBBB LIVE 23/12/07 Bill Nm: COMPANY NAME
SEARCH:
Section... ______
Girn... _______
Serv nbr...
0123456789________
SCROLL: _________ Dispute Hdr Ref... __________ Line.:
622078 of
1569858
ISDN 10/20/30
1557812 19Nov 02:46P 0213567893
Mult
333:22
0.00
1557813 19Nov 02:47P 0114562853
Mult
03:22
10.00
1557814 19Nov 02:48P 0213576134
Mult
04:13
0.51
Here, most of the information contained in the header is simply junk, and in
most cases we will simply want to skip this. Therefore, we could define a type
which defines the header information (this could most likely be done as a
combination of regular expressions, using some of the methods described in
Advanced Code Elements for Construction of New Types in section 7.3). We
would then end up with something like the specification in Example 59.
Example 59 – skipType example
<type name="File">
<repeatRange min="0" max="unbounded">
<field type="RecordBlockWithHeaderJunk"/>
</repeatRange>
</type>
<type name="RecordBlockWithHeaderJunk">
<skip type="HeaderJunk"/>
<repeatRange min="0" max="unbounded">
<field name="record" type="Record"/>
</repeatRange>
</type>
<type name="HeaderJunk">
<field name="junk1" type="HeaderJunk1"/>
<or>
<field name="junk2a" type="HeaderJunk2A"/>
<field name="junk2b" type="HeaderJunk2B"/>
</or>
</type>
The HeaderJunk type is simply used here to show that you can construct a
complicated type consisting of a number of different composed elements, and
use this to simply define a pattern in the file. In this example, we are only
defining this pattern in order to skip it. In complicated cases such as this one,
where we need to do some regular expression matching in order to determine if
an element/field exists, we would most likely want to override one of the
methods defined in Advanced Concepts in section 7 for the HeaderJunk1, 2A
and/or 2B types.
Skipping a Fixed Number of Bytes and Bits
When skipping a fixed number of bytes or bits, there are two mechanisms: the
user can specify this using either the fixed attribute or the fixed element.
The fixed element and attribute under the skip tag are used to advance the file
pointer a fixed number of bits and bytes.
The fixed tag has the following properties:
DRIX Tag 34 fixed
<fixed>
Description
Specifies the number of bytes and bits to skip in the data.
Position
Can occur:
Within a skip tag (0..1 occurrence)
Attributes
Required length attribute
Elements
None
The length attribute is a String with the format
length="byteLength[:bitLength]", where the length is restricted in the XSD by
the pattern:
Example 60 –fixed-length and skip-fixed restriction pattern
<xsd:pattern value="[0-9]+(:[0-9]*)?" />
This same pattern is used to restrict the fixed attribute, when using the attribute
form.
Consider the four different example cases presented in Example 61.
Example 61 – fixed length skip examples using element form
<skip>
<fixed length="5601:4"/>
</skip>
<skip>
<fixed length="5601"/>
</skip>
<skip>
<fixed length="0:12"/>
</skip>
<skip>
<fixed length="34:1351"/>
</skip>
The first skip definition will skip 5601 bytes and 4 bits.
The second definition will simply skip 5601 bytes.
The third definition skips 12 bits.
In the fourth example, 34 bytes and 1351 bits will be skipped.
Clearly, this last example is a bit confusing, so it is highly recommended that if
a bitLength and a non-zero byteLength are provided, then the bitLength should be
less than 8.
These are all exactly the same as the definitions in Example 62 when using the
attribute form.
Example 62 – fixed length skip examples using attribute form
<skip fixed="5601:4"/>
<skip fixed="5601"/>
<skip fixed="0:12"/>
<skip fixed="34:1351"/>
Given the conciseness of the attribute form, this is recommended for use over
the element form.
Skipping a Variable Number of Bytes and Bits
There is also the case where the number of bytes & bits to skip depends on the
data. In such cases, the skip-variable tag combination can be used, in
conjunction with the fromField, fromParam or expr tags.
The variable tag has the following properties:
DRIX Tag 35 variable
<variable>
Description
Specifies the number of bytes and bits to skip.
Position
Can occur:
Within a skip tag (0..1 occurrence)
Attributes
Optional unit attribute specifying whether we are using byte
form or bit form (default form – bytes)
Elements
0..1 fromField element (see The fromField element in
section 5.2.2.6)
0..1 fromParam element (see The fromParam element in
section 5.2.2.7)
0..1 expr element (see The expr element in section 5.2.2.8)
There can be one and only one of these elements under a
variable tag.
For example, say we wanted to read in an unsigned 8-bit integer, then skip
the number of bytes specified by that integer. In this case, we would
construct the type as shown below:
Example 63 – variable length skip example
<type name="SkipBytes">
<field name="f" type=".integer.UInt8" readRequired="true"/>
<skip>
<variable unit="bytes">
<fromField field="f"/>
</variable>
</skip>
</type>
5.2.3.6 Or
The or tag is used to indicate that one out of a number of fields must exist within
a type. The or tag has the properties defined in DRIX Tag 36.
DRIX Tag 36 or
<or>
Description
Defines that there may be one (and only one) of the contained fields
in the data. A choice must be made as to which of these fields
actually exists.
Position
Can occur:
Within a type tag (0..* occurrences)
Within a using tag nested within a type tag (0..*
occurrences)
The order of the or tag within the type or using tag is important.
Attributes
None
Elements
0..* field elements
0..1 true element
The order of the field and true elements is important.
The true tag (if present) must exist after the declaration of any fields.
The or tag will check each of the contained fields one by one to see if they are
present in the data. When one of the contained fields is present in the data, then
the or condition is satisfied and no further fields are checked. Therefore the
order of fields defined is clearly important under the or tag.
The true tag is used to indicate that the field tags in the or tag are optional. In
an or tag with multiple field tags, the or-true pair implies a choice of either none
or one and only one of the fields. The true tag, if used, must exist after any field
declarations.
We have already seen some examples of usage with an or clause. The most
common use case for an or tag is in a set or a sequence as seen in Example 64.
Example 64 – or example
<type name="MySet">
<or>
<field name="field1" type="Type1"/>
<field name="field2" type="Type2"/>
<field name="field3" type="Type3"/>
</or>
</type>
5.2.3.7 True
As described in the previous section, the true tag is an empty tag that always
returns true. The true tag has the properties defined in DRIX Tag 37.
DRIX Tag 37 true
<true>
Description
Simply returns true. Used in conjunction with the or tag to denote
optional fields.
Position
Can occur:
Within an or tag, after the declaration of any fields (0..1
occurrences)
Attributes
None
Elements
None
True is effectively used to define that elements are optional. In the most simple
case, an or tag containing a true and a field tag implies that the field is optional.
When there are multiple fields, then the or-true implies a choice of 0 or 1 of the
fields. True can also be a useful tag for testing purposes, as a simple case of a
nested or-true will always return true.
Example 65 – true example
<type name="MySet">
<or>
<field name="field1" type="Type1"/>
<field name="field2" type="Type2"/>
<field name="field3" type="Type3"/>
<true/>
</or>
</type>
Example 65 defines that a MySet contains either a field1, field2, field3 or
nothing.
5.2.3.8 Test
There are often cases where we want to perform a simple test on the data that
has been read, in order to verify that we should continue reading a type. In these
situations, the test tag comes in handy.
A test tag allows a simple equality test to be performed, comparing some
constant value to a parameter or field defined on a type, or to an expression.
A maximum of 1 test tag can be inserted into a primitiveType declaration.
However, within a type declaration, a number of test tags can be used. Within
a type, the test tag can exist at the top level or under a repeatRange or while tag.
If a testMethod tag is defined (see Test Method in section 7.3.1), a test tag
cannot be defined. The test will be invoked at the appropriate time, based on the
location of the test tag in the field structure.
More complicated tests can also be constructed through the use of multiple test
tags, and the Boolean elements available under the test tag.
The test tag has the properties outlined in DRIX Tag 38.
DRIX Tag 38 test
<test>
Description
Tests a fixed value for equality against a parameter, field or
expression, to determine whether or not the type should
continue to be processed.
Position
0..* declared under a type, repeatRange or while tag
0..1 declared under a primitiveType tag
0..* declared under a test->and tag
0..* declared under a test->or tag
0..1 declared under a test->not tag
Within a primitiveType or type tag, there can only be a maximum of
one test or testMethod tag.
These cannot exist under a type->or tag.
Within a primitiveType, the test tag must appear directly after the
java method tags and any code tags. Within a primitive type, since
there are no fields, this position is fixed, but not important for
parsing considerations.
Within a type tag, the test can appear anywhere within the field
structure – i.e. anywhere a field tag can appear, except within an or
tag. Here the order is important.
Attributes
Optional expected attribute
The expected attribute must be present if the test tag does not
contain an or, and, not, or method element. If the test tag
contains one of these elements, the expected attribute must not be
present.
Elements
0..1 fromField element (see The fromField element in
section 5.2.2.6)
0..1 fromParam element (see The fromParam element in
section 5.2.2.7)
0..1 expr element (see The expr element in section 5.2.2.8)
0..1 and elements
0..1 or elements
0..1 not elements
0..1 method elements
There can be one and only one of these elements under a test
tag. If an or, and, not, or method element is used, then the expected
attribute must not be present. In all other cases, the expected
attribute must be present.
If the return value of the test is false, then we know the type does not exist. If
the return value is true however, then this indicates the type might exist, but
more scanning may be required.
Consider the simple case shown in Example 66.
Example 66 – test example
<type name="TestStringEqualsFred">
<field name="f1" type=".string.Ascii" readRequired="true">
<arg name="length" value="4"/>
</field>
<test expected="Fred">
<fromField field="f1"/>
</test>
<field name="f2" type="Type1"/>
<field name="f3" type="Type2"/>
</type>
In this example, we first read an f1 field. If the value of this field is “Fred”, then
we continue to attempt to read an f2 field then an f3 field. Otherwise, the parsing
of the TestStringEqualsFred type will fail.
The test tag is designed to only allow for simple tests. More complicated tests
require an understanding of the advanced concepts of the LDR; a
testMethod tag is provided for cases where the simple test tag is not
sufficient. This is defined in Advanced Concepts in section 7. However, it is
recommended that wherever possible the simpler test tag be used. The simple
form of the test tag introduced here is generally more efficient and
can lead to significant performance improvements in certain cases.
5.2.3.9 Composite Tests with Boolean operators
Where using the simple form of the test tag is not sufficient, more complicated
tests can be constructed using the and, or & not elements. The following tables
introduce these elements.
DRIX Tag 39 and
<and>
Description
Combines a set of tests, whereby the test only passes if all of
the subtests evaluate to true.
Position
0..1, declared under a test tag
0..* declared under a test->and tag
0..* declared under a test->or tag
Attributes
None
Elements
0..* and elements
0..* test elements
0..* not elements
0..* or elements
DRIX Tag 40 or (within test)
<or>
Description
Combines a set of tests, whereby the test only passes if any
of the subtests evaluate to true.
Position
0..1, declared under a test tag
0..* declared under a test->and tag
0..* declared under a test->or tag
Attributes
None
Elements
0..* and elements
0..* test elements
0..* not elements
0..* or elements
DRIX Tag 41 not
<not>
Description
Performs a simple negation on the contained test.
Position
0..1, declared under a test tag
0..* declared under a test->and tag
0..* declared under a test->or tag
Attributes
None
Elements
1 test element, or
1 and element, or
1 or element, or
1 not element
The example below shows a case where using a combination of these tags is
useful.
Example 67 – test example with Boolean operators
<type name="TestType">
<field name="f1" type=".string.Ascii" readRequired="true">
<arg name="length" value="4"/>
</field>
<field name="f2" type=".integer.UInt8" readRequired="true"/>
<test>
<or>
<test expected="8">
<fromField field="f2"/>
</test>
<and>
<not>
<test expected="Fred">
<fromField field="f1"/>
</test>
</not>
<not>
<test expected="Barney">
<fromField field="f1"/>
</test>
</not>
</and>
</or>
</test>
</type>
In this case, the test will pass successfully if the field f2 has the value 8, or the
field f1 is not equal to either the value “Fred” or the value “Barney”.
5.2.3.10 Complex Testing using the Method tag
In certain cases, the test to be performed is more complicated than simply
comparing an expected value against a field, parameter or expression. In these
cases, there are two mechanisms available within the LDR. The first mechanism
is the testMethod tag, introduced later in section 7.3.1. However, the
testMethod tag should only be used when it is not possible to use any of the
other test tag combinations. The preferred option is to use the method tag under
the test tag.
The method tag provides the ability to define local parameters that will be
available within an expr tag. The local parameters can reference a field (using
the fromField tag), a parameter (using the fromParam tag), a constant value
(using the value tag), or an expression (using the expr tag). Any number of
locally scoped parameters can be defined to be used within the expr, and the
expr can then be written simply referencing the parameter name.
The benefit of this approach is that only a single line expression needs to be
written for the test, and the person implementing the DRIX does not need to
worry about the internals of the LDR in terms of accessing fields, parameters,
emittable values etc. within the Java code.
The method tag has the format shown in DRIX Tag 42.
DRIX Tag 42 method
<method>
Description: Allows for complex single line test evaluation.
Position: 0..1, declared under a test tag
Attributes: None
Elements: 0..* defineParam elements
1 expr element
The expr element must appear after all of the defineParam elements.
The method tag then makes use of the defineParam tag, which has the format
shown in DRIX Tag 43.
DRIX Tag 43 defineParam
<defineParam>
Description: Defines a parameter within a test->method tag to be referenced within a subsequent expr tag which lies under the same method tag.
Position: 0..*, declared under a method tag
Attributes: Required name attribute
Required type attribute
Elements: 0..1 fromField elements
0..1 fromParam elements
0..1 expr elements
0..1 value elements
One and only one of these elements must exist on the defineParam tag.
In order to see how this can be used, consider the case of a file with the
following layout:
Example 68 –Example file layout requiring test-method
FirstRec
text1,text2
text3,text4
SecondRec
1;Fred;Barry
3;Bill;Barney
FirstOtherRec
text1,text2
text3,text4
FirstMyRec
text1,text2
text3,text4
In this case, there are a series of record blocks. Each block starts with a record
heading. The record heading defines the record type and has some other text. If
the record heading starts with “First”, then the following data rows will contain
comma delimited, newline separated records, with two fields per row. If the
record heading starts with “Second”, then the following data rows will contain
semi-colon delimited, newline separated records, with three fields per row.
We can write a specification to read this format using the test-method tag
combination as shown in the following example:
Example 69 – Example DRIX using the test-method tags
<type name="File">
<repeatRange min="1" max="unbounded">
<or>
<field name="rec1" type="RecordBlock1"/>
<field name="rec2" type="RecordBlock2"/>
</or>
</repeatRange>
</type>
<type name="RecordBlockHeader">
<field name="RecordHeader" type=".string.SimpleDelimAscii"
readRequired="true" >
<arg name="delim" value="RecordHeader"/>
</field>
</type>
<type name="RecordBlock1" parentType="RecordBlockHeader">
<test>
<method>
<defineParam name="t" type="String">
<fromField field="RecordHeader"/>
</defineParam>
<expr>t.startsWith("First")</expr>
</method>
</test>
<repeatRange min="1" max="unbounded">
<field name="record" type="RecordType1"/>
</repeatRange>
</type>
<type name="RecordBlock2" parentType="RecordBlockHeader">
<test>
<method>
<defineParam name="t" type="String">
<fromField field="RecordHeader"/>
</defineParam>
<expr>t.startsWith("Second")</expr>
</method>
</test>
<repeatRange min="1" max="unbounded">
<field name="record" type="RecordType2"/>
</repeatRange>
</type>
...
Here, the defineParam tag is used to obtain a reference to the field
“RecordHeader” which is read in the “RecordBlockHeader” base type. This is
then defined such that it can be used as the local variable “t”, within the
subsequent expr block. The expr block then checks that the field starts with the
required text (“First” or “Second”) to determine if the test is successful.
Note that because the RecordHeader field is emittable, this would normally
introduce extra complexity when referencing the emittable value using a
testMethod tag. Further, if the RecordHeader changed to no longer be emittable
and was a simple primitive String type, then the testMethod would also need to
be modified. The same test could be performed in a testMethod tag, using the
following:
<testMethod>
if (((String) parsers().RecordHeaderParser().emittableValue(context)).startsWith("First")) return Result.GOOD;
return Result.NOT_ME;
</testMethod>
From this it is easy to see how the test->method tag combination reduces the
amount of internal LDR detail and Java expertise the specification writer needs
to know.
5.2.3.11 Aligning Data
There are often cases where we will want to align data – either to byte
boundaries after reading non-byte-aligned data, or to block boundaries when
using data blocking. The align tag is introduced to allow these data alignment
issues to be handled in a simple, concise and powerful manner in the LDR input
specification.
The align tag forces the file position to be aligned to a certain block. When used
with no arguments, this simply aligns to the start of the next byte if the file
pointer is currently not byte aligned; otherwise it does nothing. There is a
backwards attribute, to align to the previous byte boundary. There is also a "to"
attribute, which specifies the block size to which we are aligning. For instance,
to="1:0" implies that we should align to the next byte, while to="1:5" means that
we are aligning in 1*8+5=13 bit blocks. There is also the ability to specify a
"base" (through attribute or element form) which specifies the start file position
of the blocks to which we are aligning.
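The forward-alignment arithmetic described above can be sketched as follows. This is an illustration of the arithmetic only, not of the LDR implementation: positions, block sizes ("to") and bases are all treated as bit counts, where a "B:b" value such as "1:5" corresponds to B*8 + b = 13 bits.

```python
# Illustrative sketch of the align arithmetic described above (not the
# LDR implementation). All quantities are expressed in bits.
def parse_bits(spec: str) -> int:
    """Convert a "bytes:bits" spec such as "1:5" into a bit count (13)."""
    nbytes, bits = spec.split(":")
    return int(nbytes) * 8 + int(bits)

def align_forward(position_bits: int, to: str = "1:0", base: str = "0:0") -> int:
    block = parse_bits(to)
    start = parse_bits(base)
    offset = position_bits - start
    if offset % block == 0:
        return position_bits                         # already aligned: do nothing
    return start + ((offset // block) + 1) * block   # advance to next boundary

print(align_forward(13))            # 16: next byte boundary after bit 13
print(align_forward(16))            # 16: already byte aligned, no change
print(align_forward(14, to="1:5"))  # 26: next 13-bit block boundary
```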
The align tag has the properties defined in DRIX Tag 44.
DRIX Tag 44 align
<align>
Description: Forces the file position to be aligned to a certain block.
Position: Can occur:
Within a type tag (0..* occurrences)
Within a using tag nested within a type tag (0..* occurrences)
Within an or tag (0..* occurrences)
Within a repeatRange tag (0..* occurrences)
Within a while tag (0..* occurrences)
As the align tag can be placed in between field definitions, and informs that part of the data file can be skipped, the order in which align tags are used within a type or using tag is important.
Attributes: Optional to attribute (if present, no to element is allowed; defaults to "1:0")
Optional base attribute (if present, no base element is allowed; defaults to "0:0")
Optional backwards attribute (defaults to false)
Elements: 0..1 base elements
0..1 to elements
There is a requirement for data types that are not aligned to byte boundaries. An
example of where this occurs is in ASN.1 using the Packed Encoding Rules
(PER). The PER dictates that data should be represented in units using the
minimum number of bits – and in one of the incarnations of PER, this mandates
that data is not aligned to byte boundaries. Although PER is not supported out of
the box in the LDR’s ASN.1 libraries, it is possible for users to build their own
PER library. The use of unaligned data in PER clearly highlights the need to
handle unaligned data.
As soon as you start handling sub-byte and even sub-nibble fields, you are
required to maintain a byte & bit file pointer. In most cases, data is aligned to
byte boundaries. Given the composite nature of many systems, it is highly likely
that in any case where data is unaligned to byte boundaries, there will be an
external control or mediation process which will wrap this data stream and
package it for transfer or archiving. When this occurs you will have unaligned
data wrapped in byte aligned data.
It becomes obvious, therefore, that whenever you allow for data not aligned on
byte boundaries, you also will need a mechanism for advancing the file pointer
to the start of the next byte. This is precisely what the align tag does when
provided with no arguments.
Example 70 – align example
<type name="SomeType">
<field name="nibbleField" type="NibbleType"/>
<align/>
<field name="someAlignedIntField" type="IntegerType"/>
</type>
Example 70 shows a case where align is used. Assuming that the types declared
are implemented as their name implies, this would read a nibble from the first
byte, then skip to the start of the second byte, before reading an integer.
This is the trivial align case. However, it gets a lot more complicated when
considering the base, to and backwards operations. Let us first consider the base
and to tags which are allowed under align. These are outlined in the following
tables.
DRIX Tag 45 to
<to>
Description: Specifies the block we are aligning to.
Position: Can occur within an align tag (0..1 occurrences)
Attributes: Optional unit attribute specifying whether we are using byte form or bookmark form (default form – bytes)
Elements: 0..1 fromField element (see The fromField element in section 5.2.2.6)
0..1 fromParam element (see The fromParam element in section 5.2.2.7)
0..1 expr element (see The expr element in section 5.2.2.8)
There can be one and only one of these elements under a to tag.
DRIX Tag 46 base
<base>
Description: Specifies the start position for our blocking.
Position: Can occur within an align tag (0..1 occurrences)
Attributes: Optional unit attribute specifying whether we are using byte form or bookmark form (default form – bytes)
Elements: 0..1 fromField element (see The fromField element in section 5.2.2.6)
0..1 fromParam element (see The fromParam element in section 5.2.2.7)
0..1 expr element (see The expr element in section 5.2.2.8)
There can be one and only one of these elements under a base tag.
In order to examine a more complex situation using the align tag, we introduce
the case shown in Example 71.
Example 71 – align example for blocked records
<type name="BdwBlockedRecords">
<param name="bdwLengthIncluded" javaType="boolean">
<default value="true"/>
</param>
<templateParam name="Bdw"/>
<templateParam name="Contents"/>
<field name="_blockInitPosn" javaType="long">
<expr>buffer.bookmark()</expr>
</field>
<field name="_bdw" type="Bdw" readRequired="true"/>
<field name="_contentsInitPosn" javaType="long">
<expr>buffer.bookmark()</expr>
</field>
<repeatRange min="1" max="unbounded">
<field name="contents" type="Contents"/>
</repeatRange>
<align>
<base>
<expr>
(param.bdwLengthIncluded() ?
field._blockInitPosn() :
field._contentsInitPosn()
)
</expr>
</base>
<to>
<fromField field="_bdw"/>
</to>
</align>
</type>
This example is taken from the blocked library provided with the LDR, and
handles the reasonably complicated, but very common, case of data blocking
using a Block Descriptor Word (BDW). The BDW is a field in the data which
must be read in order to determine the block size. The contents within the block
are repeated until no more can fit within the block. Then the block is filled
with padding characters until the next block (with its own BDW) starts.
The example type shown above allows for any type of BDW, as this is passed in
as a template argument. It also allows for the length of the BDW to be included
in the block length, or for it to be separate. Put together, this example shows the
ability of the LDR to specify complicated structures in a concise manner,
without losing the power or expressiveness of a programming language.
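The BDW blocking scheme described above can be sketched conceptually in Python. This is not the LDR implementation; it assumes, purely for illustration, a 2-byte big-endian BDW whose length includes itself (the bdwLengthIncluded=true case) and fixed-size records:

```python
import struct

# Conceptual sketch (not the LDR implementation) of reading BDW-blocked
# data as described above. Assumes a 2-byte big-endian BDW whose length
# includes the BDW itself, and fixed-size records.
def read_bdw_blocks(data: bytes, record_size: int = 4) -> list[bytes]:
    records = []
    pos = 0
    while pos < len(data):
        block_init = pos                       # corresponds to _blockInitPosn
        (block_len,) = struct.unpack_from(">H", data, pos)
        if block_len <= 2:
            break                              # malformed BDW; stop the sketch
        pos += 2                               # corresponds to _contentsInitPosn
        end = block_init + block_len           # bdwLengthIncluded = true
        while end - pos >= record_size:        # repeat contents while they fit
            records.append(data[pos:pos + record_size])
            pos += record_size
        pos = end                              # align past any block padding
    return records

# One block: 2-byte BDW (11) + two 4-byte records + 1 padding byte
blob = struct.pack(">H", 11) + b"AAAABBBB" + b"\x00"
print(read_bdw_blocks(blob))  # [b'AAAA', b'BBBB']
```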
6 Output Specification
The output specification defines how (possibly complex and hierarchical) input data
will be translated into a set of flat records. The output specification also allows for
field renaming, and allows for the user to select which fields they require, such that
unimportant fields can be ignored for ease of use and performance optimization.
The user specifies the means for outputting the data via an xml output specification.
As the input specifications are called DRIX files, the output specifications are called
DROX files – for Data Reader Output XML specification. As with the input
specification, this output specification is an xml file conforming to an xml schema
definition. The xsd for the output specification can be found in the
LDROutputSpecification.xsd file located in the conf/ldr/xsd directory in the LDR
install location.
The simplest form of the LDR output specification simply names the fields that it
wants output. Multiple outputs can be specified, and in each output, the required fields
are specified. Only fields (not parameters, types, or anonymous fields) declared in the
input specification can be referenced in the output.
If the LAE interface to the LDR is not being used, and the LDR is used strictly as an
API, then the user of the API can supply RecordOutput objects, which they can define
to encode any type as they see fit.
However, when the LDR is used via the LAE interface, then the LDR is only able to
correctly output any of the following types:
Table 7 – Allowable Brd Output Types
byte[] (using base64 encoding)
ubyte, byte, Byte
ushort, short, Short
uint, int, Integer
ulong, long, Long
ufloat, float, Float
udouble, double, Double
boolean, Boolean
char, Character
java.lang.String
java.math.BigInteger (as String)
java.math.BigDecimal (as String)
java.util.Date
com.lavastorm.lang.Date
com.lavastorm.lang.Time
com.lavastorm.lang.Timestamp
If an attempt is made to output any other type, an exception will be thrown. These
are the only types supported by the LAE BrdRecordOutput objects.
6.1 DROX – The top level element
Each output specification XML file contains a root tag “drox”. This tag contains the
entire specification declaration and is primarily used to ensure that there is one root
element, such that the specification is in well-formed XML. The extension of the
output specification files is also DROX, which stands for “Data Reader Output XML
specification”.
DROX Tag 1 drox
<drox>
Description: Root level tag containing the output specification.
Position: Required root tag of the document.
Attributes: Optional URL source attribute
Optional String file attribute
Note that the source and file attributes are mutually exclusive. Furthermore, if either of these attributes is present, no other elements are allowed to exist within the drox tag.
Elements: 0..* requires elements
0..* output elements
0..* mapping elements
0..* dump elements
0..* dumper elements
Provided no source attribute is present, there must be at least 1 output and 1 mapping element, or 1 dump and 1 dumper element. Dump, dumper, output and mapping elements can be used in the same drox. If any requires elements are present, these must be defined prior to any of the other elements.
There are three basic forms of DROX – one that simply references another DROX via
the source attribute, one that simply references another DROX via the file attribute or
one that contains all of the necessary sub elements. When the source attribute is
present, it must be a correctly formatted URL specifying the location of a DROX
(usually a file) which is accessible from the location where the LDR is run.
Example 72 shows an example of a DROX using the source attribute.
Example 72 - DROX tag example with a source attribute.
<?xml version="1.0" encoding="UTF-8"?>
<drox source="file:/C:/tmp/lib.drox"/>
When using the file attribute, the file must exist in a location which is accessible from
the location where the LDR is run.
Example 73 shows an example of a DROX using the file attribute.
Example 73 - DROX tag example with a file attribute.
<?xml version="1.0" encoding="UTF-8"?>
<drox file="C:/tmp/lib.drox"/>
If the source attribute or file attribute form is not used, a drox tag can contain output
and mapping tags, as shown in Example 74, or can contain a dump and dumper tag
as shown in Example 75. The dump, dumper, output and mapping elements can all be
used within the same DROX.
Example 74 - DROX tag example with outputs and mappings.
<?xml version="1.0" encoding="UTF-8"?>
<drox>
<output name="output1" mapping="myMapping"/>
<mapping name="myMapping">
…
</mapping>
</drox>
Example 75 - DROX tag example with a dump tag.
<?xml version="1.0" encoding="UTF-8"?>
<drox>
<dumper name="Asn1Dumper" javaClass =
"com.lavastorm.ldr.converters.asn1.output.Asn1OutputDefinitionGenerator"/>
<dump name="Asn1Dumper">
<include>
…
</include>
<exclude>
…
</exclude>
</dump>
</drox>
6.2 Outputs
In most cases, an output specification is composed of a set of output tags. Each output
tag corresponds to an individual flat-record output. Each output has a single mapping
which defines what the contents of the output are to be. The output tag has the
properties shown in DROX Tag 2.
DROX Tag 2 output
<output>
Description: Defines the contents of a flat record output.
Position: Can occur within a drox tag (0..* occurrences)
Attributes: Required name attribute
Required mapping attribute
Elements: None.
The mapping attribute on the output must correspond to the name of a mapping that is
defined in the DROX. The mapping tag itself is discussed in the next section, and
defines how a set of fields are combined into an output.
6.3 Mappings
Mappings are the major component of the output specification. The basic operation of
the mapping is to define a set of fields, and a corresponding set of names. On its own,
the mapping does no more than construct the definitions of the two sets. Mappings are
able to include and exclude fields and can be composed or unioned with other
mappings. In this sense they are an abstract container, and it is only when a mapping
is attached to an output that this has any meaning.
When an output is attached to a mapping, all of the fields referenced in the mapping
are sent to that output, with the corresponding set of names forming the metadata
(column headers in record output) for each of the fields.
The mapping tag has the properties displayed in DROX Tag 3.
DROX Tag 3 mapping
<mapping>
Description: Defines a set of fields and the mapping to their output names.
Position: Can occur within a drox tag (0..* occurrences)
Attributes: Required name attribute
Elements: 0..* include elements
0..* exclude elements
0..* mappingReference elements
At least one include, exclude or mappingReference element must exist. The order of each of these elements is important.
As described earlier, a mapping defines a set of fields, and a mapping from each of
those fields to a name. Each of the fields in the set must correspond to a field defined
in the associated DRIX specification (except for the special cases of reference
identifiers described in section 6.7). The include, exclude and mappingReference tags
are defined in the next section. However, for the moment it is sufficient to consider
their operation as simply including fields with associated names, excluding field-name
maps based on fields or names, and including the set definition from another
mapping.
With this as the basis, the operation of the mapping can be defined as shown in Table 8:
Table 8 – Mapping Evaluation Rules
Create an empty map M, which maps elements from a set of fields F to a set of names N.
For each element E in the mapping T:
    If E is an include element:
        Evaluate E and add all of the defined fields f and names n to F and N respectively.
        If there exists an element in f for which there is no n defined, set the n for this entry to the field names in the set f.
    Else if E is an exclude element:
        If the exclude is a name based exclusion:
            Search M and remove any element from N that matches a name in E, and remove the corresponding field from F.
        Else if the exclude is a field based exclusion:
            Search M and remove any element from F that matches a field in E, and remove the corresponding name from N.
    Else if E is a mappingReference:
        Evaluate the map ME corresponding to the mapping referenced in E using these rules. Add the contents of ME to M.
        If the inclusion of the mapping introduces multiple trigger events (see section 6.6) into the containing mapping, and this is not a unioned mappingReference, throw a trigger event error.
Once all elements have been included, evaluate to ensure that there are no multiple trigger events (see section 6.6).
Notes:
Each element in F may map to multiple elements in N.
In cases of mapping unions, multiple elements in F may map to the same N.
During any particular iteration, there can never exist duplicate mappings from the same field to the same name.
Excluding the mapping union case, there can never be duplicate name elements during any iteration.
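The evaluation rules in Table 8 can be sketched conceptually in Python. This is only a model of the rules, not the LDR's evaluator: a mapping is represented as an ordered list of include, exclude and mappingReference elements, and the result as ordered (field, name) pairs (trigger-event checking is omitted):

```python
# Conceptual sketch (not the LDR implementation) of the mapping
# evaluation rules in Table 8. Includes append (field, name) pairs,
# name- or field-based excludes remove pairs, and a mappingReference
# splices in the result of evaluating another mapping.
def evaluate_mapping(elements, all_mappings):
    m = []  # ordered (field, name) pairs
    for kind, payload in elements:
        if kind == "include":
            for field, name in payload:
                # With no name defined, the name defaults to the field path
                m.append((field, name if name is not None else field))
        elif kind == "exclude_name":
            m = [(f, n) for f, n in m if n not in payload]
        elif kind == "exclude_field":
            m = [(f, n) for f, n in m if f not in payload]
        elif kind == "mappingReference":
            m.extend(evaluate_mapping(all_mappings[payload], all_mappings))
    return m

mappings = {
    "base": [("include", [("cdr.callerInfo.number", "A-Party"),
                          ("cdr.receiverInfo.number", None)])],
    "main": [("mappingReference", "base"),
             ("exclude_name", {"A-Party"})],
}
print(evaluate_mapping(mappings["main"], mappings))
# [('cdr.receiverInfo.number', 'cdr.receiverInfo.number')]
```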
6.3.1 Including Fields
As outlined in the previous section, the include elements are used to define a set of
fields and names which are to be included into a particular mapping. The include
tag has the properties as outlined in DROX Tag 4.
DROX Tag 4 include
<include>
Description: Defines a set of fields and their corresponding output names which are to be included in a mapping.
Position: Can occur within a mapping tag (0..* occurrences)
Attributes: Optional onNoMatch attribute ("error" by default)
Optional triggerProperties attribute ("default" by default; allowable options are "default", "triggerableAndDirtying", "none")
Elements: 1 fields element
0..1 names elements
One fields tag must exist in order to define the fields to include in the mapping.
The onNoMatch attribute can take one of the following values:
error (default)
log
ignore
The onNoMatch attribute also exists under the exclude tag. The onNoMatch
attribute defines what occurs when no fields are matched by the fields tag under
the include tag.
The triggerProperties attribute can take one of the following values:
default (default)
none
triggerableAndDirtying
These options are discussed in more detail in section 6.6.5.
When included, the optional names tag defines the name with which the fields are
to be output. We will discuss the renaming in more detail in the next section. When
no names tag exists, the fields are to be output without any renaming.
In order to define the fields to include, we introduce the fields tag, with the
properties displayed in DROX Tag 5.
DROX Tag 5 fields
<fields>
Description: Defines a set of fields which are to be included in a mapping.
Position: Can occur:
Within an include tag (1 occurrence)
Within an exclude tag (1 occurrence)
Attributes: None
Elements: 0..1 pattern element
0..1 specialField element (see section 6.7)
Either a specialField element or a pattern element must exist, but not both.
The fields tag itself simply contains a pattern tag or a specialField tag. The
specialField tag is not of concern at the moment; however, it is documented in
section 6.7. The pattern tag is discussed in depth in section 6.5, particularly
relating to the allowable wildcards in order to reference different elements of the
field structure. For the moment, it is sufficient to know that the pattern defines a
field pattern as defined in the DRIX.
Consider the DRIX shown in Example 76.
Example 76 – Example DRIX specification
…
<namespace name="MyNamespace">
…
<type name="CallDetailRecord">
<field name="calledNumber" type=".string.Ebcdic">
<arg name="length" value="10"/>
</field>
<field name="callingNumber" type=".string.Ebcdic">
<arg name="length" value="10"/>
</field>
<field name="connectDate" type=".packed.string.PackedToString">
<arg name="length" value="4"/>
</field>
<field name="connectTime" type=".packed.string.PackedToInt32">
<arg name="length" value="3"/>
</field>
</type>
…
</namespace>
…
<primaryField name="cdr" type="MyNamespace.CallDetailRecord"/>
…
If we simply wanted the connectDate and connectTime fields from the record
structure defined in Example 76, we could specify this in the DROX shown in Example 77.
Example 77 – Simple DROX including individual fields.
<drox>
<output name="out1" mapping="output1Mapping"/>
<mapping name="output1Mapping">
<include>
<fields>
<pattern pattern="cdr.connectDate"/>
</fields>
</include>
<include>
<fields>
<pattern pattern="cdr.connectTime"/>
</fields>
</include>
</mapping>
</drox>
It is easy to see here that we simply output the field based on the nested field
name. In the case of anonymous fields, we cannot output the anonymous fields
themselves, since they do not have a referenceable name. However, we can output
the sub-fields of anonymous fields as displayed in the following examples.
Example 78 shows a case where the NumberPair type is declared as an
anonymous field in the input specification.
Example 78 – DRIX example with anonymous fields.
…
<namespace name="MyNamespace">
…
<type name="CallDetailRecord">
<field type="NumberPair"/>
</type>
<type name="NumberPair">
<field name="calledNumber" type=".string.Ebcdic">
<arg name="length" value="10"/>
</field>
<field name="callingNumber" type=".string.Ebcdic">
<arg name="length" value="10"/>
</field>
</type>
…
</namespace>
…
<primaryField name="cdr" type="MyNamespace.CallDetailRecord"/>
Here, we would access the calledNumber & callingNumber fields using the
DROX shown in Example 79.
Example 79 – DROX example for including anonymous fields.
<drox>
<output name="out1" mapping="output1Mapping"/>
<mapping name="output1Mapping">
<include>
<fields>
<pattern pattern="cdr.callingNumber"/>
</fields>
</include>
<include>
<fields>
<pattern pattern="cdr.calledNumber"/>
</fields>
</include>
</mapping>
</drox>
Since these fields are under the anonymous field of type NumberPair in
CallDetailRecord, they are referenceable by their name in their field declarations
in NumberPair.
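The way anonymous fields remain transparent to output paths can be sketched conceptually: an anonymous field contributes no path segment, so its sub-fields are addressable directly under the parent. The function and type table below are a hypothetical model of this idea only, not the LDR's actual name resolver:

```python
# Conceptual sketch (not the LDR resolver): anonymous fields contribute
# no path segment, so their children fall into the parent's namespace.
def flatten_fields(type_name, types, prefix=""):
    """Return the referenceable field paths within a type definition."""
    paths = []
    for name, field_type in types[type_name]:
        if name is None:                      # anonymous field: no segment
            paths.extend(flatten_fields(field_type, types, prefix))
        elif field_type in types:             # named composite field
            paths.extend(flatten_fields(field_type, types, prefix + name + "."))
        else:                                 # primitive leaf field
            paths.append(prefix + name)
    return paths

# Mirrors Example 78: CallDetailRecord holds an anonymous NumberPair
types = {
    "CallDetailRecord": [(None, "NumberPair")],
    "NumberPair": [("calledNumber", ".string.Ebcdic"),
                   ("callingNumber", ".string.Ebcdic")],
}
print(flatten_fields("CallDetailRecord", types))
# ['calledNumber', 'callingNumber']
```

Prefixed with the primary field name "cdr", these become the cdr.calledNumber and cdr.callingNumber patterns used in Example 79.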
6.3.2 Renaming Fields
If no names tag is specified under an include tag, the field names in the output
metadata can become very long and unclear. This is due to the fact that without
field renaming, the fully qualified (i.e. including the full field path) name of the
field is output. Generally, we do not want the entire field path to be output in the
column headers, and sometimes we want to rename a field due to contextual
reasons.
As seen in DROX Tag 4, each include tag can contain fields (required) and names
(optional) pairs. The names tag performs renaming of the field set returned from
the fields tag. In cases of simple field renaming, the names tag provides a
reasonably useful tool for field disambiguation in the output and allows output
data to be constructed in a more human-readable fashion. The details of the names
tag are shown in DROX Tag 6.
DROX Tag 6 names
<names>
Description: Defines the names to be used in the output metadata for the corresponding set of fields in a fields tag.
Position: Can occur within an include tag (0..1 occurrences)
Attributes: None
Elements: 0..1 pattern element
0..1 fromField element
Only one pattern or fromField element is allowed.
The fields can be mapped to names by one of two mechanisms. The first and
simpler of the mechanisms is by providing a renaming pattern. This is similar to
the fields pattern tag, and simply specifies a new name pattern with which the
fields are to be named. The fromField tag is more complex and states that the
names to use are to come from the data read into fields in the DRIX.
6.3.2.1 Pattern Based Renaming
The pattern tag is discussed in depth in section 6.5, particularly relating to the
allowable wildcards in order to reference different elements of the field
structure. The simplest use case for this functionality is to rename individual
output fields such that when the constructed output is presented, the metadata is
more succinct and human-readable.
For example, consider the following DRIX:
Example 80 –DRIX displaying the utility of field renaming
…
<namespace name="MyNamespace">
…
<type name="CallDetailRecord">
<field name="callerInfo" type="PartyInformation"/>
<field name="receiverInfo" type="PartyInformation"/>
<field name="transferInfo" type="TransferInformation"/>
<field name="connectInfo" type="ConnectionInformation"/>
</type>
<type name="PartyInformation">
<field name="number" type=".packed.string.PackedToString">
<arg name="length" value="10"/>
</field>
<field name="imei" type=".packed.string.PackedToString">
<arg name="length" value="16"/>
</field>
<field name="location" type=".string.Ebcdic">
<arg name="length" value="4"/>
</field>
</type>
<type name="TransferInformation">
<field name="dataUpLinkVolume" type=".integer.UInt64"/>
<field name="dataDownLinkVolume" type=".integer.UInt64"/>
<field name="lostPackets" type=".integer.UInt64"/>
</type>
<type name="ConnectionInformation">
<field name="date" type=".packed.string.PackedToString">
<arg name="length" value="4"/>
</field>
<field name="time" type=".packed.string.PackedToString">
<arg name="length" value="3"/>
</field>
<field name="duration" type=".packed.string.PackedToString">
<arg name="length" value="4"/>
</field>
</type>
…
</namespace>
…
<primaryField name="cdr" type="MyNamespace.CallDetailRecord"/>
And say we simply wanted the Caller Number and Called Number from the
record. We can already do this by specifying the following DROX:
Example 81 –DROX to cherry pick nested information
<drox>
<output name="out1" mapping="out1Mapping"/>
<mapping name="out1Mapping">
<include>
<fields>
<pattern pattern="cdr.callerInfo.number"/>
</fields>
</include>
<include>
<fields>
<pattern pattern="cdr.receiverInfo.number"/>
</fields>
</include>
</mapping>
</drox>
However, the field names that are going to be placed in the output metadata are
going to be taken exactly from the pattern attribute. Therefore, for highly nested
structures, it is easy to see that this metadata will quickly become unwieldy. For
this reason we allow for the renaming of fields. For example, the above output
can be rewritten in the following manner to get more sensible field names:
Example 82 –DROX to cherry pick nested information with simple renaming
<drox>
<output name="out1" mapping="out1Mapping"/>
<mapping name="out1Mapping">
<include>
<fields>
<pattern pattern="cdr.callerInfo.number"/>
</fields>
<names>
<pattern pattern="A-Party"/>
</names>
</include>
<include>
<fields>
<pattern pattern="cdr.receiverInfo.number"/>
</fields>
<names>
<pattern pattern="B-Party"/>
</names>
</include>
</mapping>
</drox>
This forces the renaming shown below.
Example 83 –Result of simple renaming
Original                    Renamed
cdr.callerInfo.number       A-Party
cdr.receiverInfo.number     B-Party
It is easy to see that for complex nested data structures, this renaming will result
in much simpler metadata.
6.3.2.2 Renaming Using Field Values
In certain situations, the name of the field to output is not known at the time of
writing the DROX specification, as the field name to output is read from the data
file. Here, we want to take a set of fields, and output them with the output field
names taken from the read field values in another set of fields. The fromField tag
is available for this purpose and has the properties shown in DROX Tag 7.
DROX Tag 7 fromField
<fromField>
Description: Defines the fields in a DRIX from which the output names are to be taken.
Position: Can occur within a names tag (0..1 occurrences)
Attributes: None
Elements: 1 pattern element
Again, the details of the pattern tag are discussed in section 6.5, particularly
relating to the allowable wildcards in order to reference different elements of the
field structure. However, for the moment consider the simple DRIX specification
with the corresponding data file shown in Example 84 and Example 85.
Example 84 – DRIX example showing utility of fromField renaming.
…
<namespace name="MyNamespace">
…
<type name="File">
<field name="metadata" type="Metadata"/>
<skip type="CRLF"/>
<repeatRange min="0" max="unbounded">
<field name="field0" type=".string.Ascii">
<arg name="length" value="10"/>
</field>
<field name="field1" type=".string.Ascii">
<arg name="length" value="10"/>
</field>
<field name="field2" type=".string.Ascii">
<arg name="length" value="10"/>
</field>
<skip type="CRLF"/>
</repeatRange>
</type>
<type name="Metadata">
<field name="field0" type=".string.Ascii">
<arg name="length" value="10"/>
</field>
<field name="field1" type=".string.Ascii">
<arg name="length" value="10"/>
</field>
<field name="field2" type=".string.Ascii">
<arg name="length" value="10"/>
</field>
<skip type="CRLF"/>
</type>
…
</namespace>
…
<primaryField name="cdr" type="MyNamespace.File"/>
…
Example 85 –Input data file example
AParty
BParty
Date
012345678912345678902009-03-10
012345678934567890122009-03-10
012345678956789012342009-03-11
It is obvious from this trivial example that the field names come from the first
line in the file. Therefore, we would want the output names to be taken from the
values of these fields as well. In order to achieve this, we could use the DROX
specification shown in Example 86.
Example 86 –DROX example with fromField renaming
<drox>
<output name="out1" mapping="out1Mapping"/>
<mapping name="out1Mapping">
<include>
<fields>
<pattern pattern="cdr.field0"/>
</fields>
<names>
<fromField>
<pattern pattern="cdr.metadata.field0"/>
</fromField>
</names>
</include>
<include>
<fields>
<pattern pattern="cdr.field1"/>
</fields>
<names>
<fromField>
<pattern pattern="cdr.metadata.field1"/>
</fromField>
</names>
</include>
<include>
<fields>
<pattern pattern="cdr.field2"/>
</fields>
<names>
<fromField>
<pattern pattern="cdr.metadata.field2"/>
</fromField>
</names>
</include>
</mapping>
</drox>
Using this DROX, we will obtain the output shown in Example 87.
Example 87 –Example output using fromField renaming.
AParty
0123456789
0123456789
0123456789
BParty
1234567890
3456789012
5678901234
Date
2009-03-10
2009-03-10
2009-03-11
Without the names->fromField tags, however, we would end up with
the output shown in Example 88.
Example 88 –Example output without using renaming.
cdr.field0
0123456789
0123456789
0123456789
cdr.field1
1234567890
3456789012
5678901234
cdr.field2
2009-03-10
2009-03-10
2009-03-11
This is obviously a trivial example, since the desired output is essentially the
same as the input. However, in certain situations we could, for example, also
receive the field encoding in the header, which would mean that the LDR is
transforming from some encoding (e.g. binary) to something human-readable
while maintaining the correct field names. We will see in section 6.5 that our
DROX example can be greatly simplified using the more advanced concepts of
field patterns.
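The behaviour described above can be modelled with a short sketch. The following is an illustration of the semantics only, not the actual LDR implementation; the function name from_field_names is hypothetical. It shows output columns being named from the values of the metadata fields rather than from literal patterns:

```python
# Sketch (assumed semantics, not LDR source): output column names are taken
# from the *values* of the metadata fields read from the data file.
def from_field_names(header_record, data_records, field_keys):
    """header_record maps a metadata field to its value (e.g. 'field0' -> 'AParty').
    Returns output columns keyed by those header values."""
    columns = {header_record[k]: [] for k in field_keys}
    for rec in data_records:
        for k in field_keys:
            columns[header_record[k]].append(rec[k])
    return columns

# Data from Example 85: a metadata line followed by fixed-width records.
header = {"field0": "AParty", "field1": "BParty", "field2": "Date"}
rows = [
    {"field0": "0123456789", "field1": "1234567890", "field2": "2009-03-10"},
    {"field0": "0123456789", "field1": "3456789012", "field2": "2009-03-10"},
]
cols = from_field_names(header, rows, ["field0", "field1", "field2"])
print(cols["AParty"])  # ['0123456789', '0123456789']
```

As in Example 87, the output columns are named AParty, BParty and Date rather than cdr.field0, cdr.field1 and cdr.field2.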
6.3.3 Excluding Fields
So far we have only investigated the mechanism for including fields and names
into a mapping. We also have the ability to exclude fields and names from a given
mapping. Since we have not delved into the pattern tag yet, this may not appear to
be the most useful facility. However, if we consider that any individual include-fields-pattern
tag combination can include multiple fields at a time, then the value of the
exclude tag becomes more obvious.
The exclude tag has the properties shown in DROX Tag 8.
DROX Tag 8 exclude
<exclude>
Description: Defines a set of elements to exclude from a mapping (can be fields or names)
Position: Can occur within a mapping tag (0..* occurrences)
Attributes: Optional onNoMatch attribute (see the includes section in section 6.3.1 for more details); the onNoMatch attribute is "log" by default
Elements: 0..1 fields elements; 0..1 names elements
Field based and name based exclusion are discussed in more detail in the following
sections.
6.3.3.1 Field Based Exclusion
Field based exclusion is denoted through the use of an exclude-fields tag
combination. The fields tag was previously introduced in DROX Tag 5. The
fields tag under an exclude tag has the same properties as that under an include
tag. However, their operation is clearly different.
The exclude-fields tag is used to exclude any fields which have previously been
included in a given mapping. Table 9 displays the rules for how field based
exclusion is applied.
Table 9 –Field Based Exclusion Rules.
Field Based Exclusion Rules
Do not output any of the fields included in the mapping that match an exclude-fields tag. If a
single field is specified to output to multiple output names, do not output the field to any of the
output names.
Consider the simple data file shown in Example 89, with the associated DRIX
specification shown in Example 90.
Example 89 –Data file example for field based exclusion.
0123456789
0123456789
0123456789
1234567890
3456789012
5678901234
2009-03-10
2009-03-10
2009-03-11
Example 90 – DRIX example for field based exclusion.
…
<namespace name="MyNamespace">
…
<type name="Records">
<repeatRange min="0" max="unbounded">
<field name="AParty" type=".string.Ascii">
<arg name="length" value="10"/>
</field>
<field name="BParty" type=".string.Ascii">
<arg name="length" value="10"/>
</field>
<field name="Date" type=".string.Ascii">
<arg name="length" value="10"/>
</field>
<skip type="CRLF"/>
</repeatRange>
</type>
…
</namespace>
…
<primaryField name="cdr" type="MyNamespace.Records"/>
…
Then also consider we have the DROX specification shown in Example 91
Example 91 –DROX example with field based exclusion
<drox>
<output name="out1" mapping="out1Mapping"/>
<mapping name="out1Mapping">
<exclude>
<fields>
<pattern pattern="cdr.Date"/>
</fields>
</exclude>
<include>
<fields>
<pattern pattern="cdr.AParty"/>
</fields>
<names>
<pattern pattern="AParty"/>
</names>
</include>
<include>
<fields>
<pattern pattern="cdr.AParty"/>
</fields>
<names>
<pattern pattern="AParty2"/>
</names>
</include>
<include>
<fields>
<pattern pattern="cdr.Date"/>
</fields>
<names>
<pattern pattern="Date"/>
</names>
</include>
<exclude>
<fields>
<pattern pattern="cdr.AParty"/>
</fields>
</exclude>
</mapping>
</drox>
This DROX will result in the output shown in Example 92.
Example 92 –Output example for field based exclusion.
Date
2009-03-10
2009-03-10
2009-03-11
The DROX can be interpreted as performing the following:
Initialise Mapping to empty set of fields & corresponding names
Exclude field cdr.Date from the empty mapping (no effect)
Include field cdr.AParty->AParty
Include field cdr.AParty->AParty2
Exclude field cdr.AParty. Effectively excludes:
o cdr.AParty->AParty
o cdr.AParty->AParty2
There are certain situations where the exclusion of a previously included field
will not cause the exclusion of the associated output column. This only occurs if
multiple included fields are declared to output to the same name. This situation
can only occur in the context of Mapping Unions declared in section 6.3.4.2.
Section 6.5 provides greater detail on the pattern tag, including how excluded
field patterns are applied to included field patterns, particularly relating to the
allowable wildcards for referencing different elements of the field structure.
6.3.3.2 Name Based Exclusion
Name based exclusion is denoted through the use of an exclude-names tag
combination. The names tag was previously introduced in DROX Tag 6 in the
context of the include tag. However, the exclude-names tag has different
properties, and is defined in DROX Tag 9.
DROX Tag 9 names (under exclude)
<names>
Description: Defines a set of output names (output columns) to exclude from a mapping.
Position: Can occur within an exclude tag (0..1 occurrence)
Attributes: None
Elements: 0..1 regexPattern elements
In addition to having different properties, the exclude-names tag also operates
quite differently from the include-names tag. The exclude-names tag is
used to exclude any output names which have previously been included in a given
mapping. Table 10 displays the rules for how name based exclusion is applied.
Table 10 –Name Based Exclusion Rules.
Name Based Exclusion Rules
Do not output the columns in the output record for any of the output names included in the
mapping that match an exclude-names tag. Any fields that are specified to output to these
columns should no longer output to them. If a field is defined to output to multiple
columns, it will still output to the columns not mentioned in the exclude-names tag.
The regexPattern tag has the attributes shown in DROX Tag 10.
DROX Tag 10 regexPattern
<regexPattern>
Description: Defines a regular expression used to define output names (output columns) to exclude from a mapping.
Position: Can occur within an exclude->names tag (0..1 occurrence)
Attributes: Required pattern attribute
Elements: None
LDR User Manual
Consider again the DRIX specification introduced in the previous section in
Example 90. Then consider this time we have the DROX specification shown in
Example 93.
Example 93 –DROX example with name based exclusion
<drox>
<output name="out1" mapping="out1Mapping"/>
<mapping name="out1Mapping">
<exclude>
<names>
<regexPattern pattern="Date"/>
</names>
</exclude>
<include>
<fields>
<pattern pattern="cdr.AParty"/>
</fields>
<names>
<pattern pattern="AParty"/>
</names>
</include>
<include>
<fields>
<pattern pattern="cdr.AParty"/>
</fields>
<names>
<pattern pattern="AParty2"/>
</names>
</include>
<include>
<fields>
<pattern pattern="cdr.Date"/>
</fields>
<names>
<pattern pattern="Date"/>
</names>
</include>
<exclude>
<names>
<regexPattern pattern="AParty2"/>
</names>
</exclude>
</mapping>
</drox>
This DROX will result in the output shown in Example 94.
Example 94 –Output example for name based exclusion.
AParty
0123456789
0123456789
0123456789
Date
2009-03-10
2009-03-10
2009-03-11
The DROX can be interpreted as performing the following:
Initialise Mapping to empty set of fields & corresponding names
Exclude output column Date from the empty mapping (no effect)
Include field cdr.AParty->AParty
Include field cdr.AParty->AParty2
Exclude output column AParty2. Effectively excludes:
o cdr.AParty->AParty2
There are certain situations where the exclusion of a previously included output
name will cause the exclusion of multiple included fields. This only occurs if
multiple included fields are declared to output to the same name. This situation
can only occur in the context of Mapping Unions declared in section 6.3.4.2.
So far, we have only considered the simple name exclusion. In order to
demonstrate the utility of the regexPattern tag, consider again the DRIX
specification introduced in the previous section in Example 90. Then consider
this time we have the DROX specification shown in Example 95.
Example 95 –DROX example with regex name based exclusion
<drox>
<output name="out1" mapping="out1Mapping"/>
<mapping name="out1Mapping">
<exclude>
<names>
<regexPattern pattern="D*"/>
</names>
</exclude>
<include>
<fields>
<pattern pattern="cdr.AParty"/>
</fields>
<names>
<pattern pattern="AParty"/>
</names>
</include>
<include>
<fields>
<pattern pattern="cdr.AParty"/>
</fields>
<names>
<pattern pattern="AParty2"/>
</names>
</include>
<include>
<fields>
<pattern pattern="cdr.Date"/>
</fields>
<names>
<pattern pattern="Date"/>
</names>
</include>
<exclude>
<names>
<regexPattern pattern="A.*"/>
</names>
</exclude>
</mapping>
</drox>
This DROX will result in the output shown in Example 96.
Example 96 –Output example for regex name based exclusion.
Date
2009-03-10
2009-03-10
2009-03-11
The DROX can be interpreted as performing the following:
Initialise Mapping to empty set of fields & corresponding names
Exclude all output columns matching regex D* from the empty mapping (no effect)
Include field cdr.AParty->AParty
Include field cdr.AParty->AParty2
Exclude output columns matching regex A.*. Effectively excludes:
o cdr.AParty->AParty
o cdr.AParty->AParty2
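The stepwise interpretation above can be modelled as ordered operations over a growing set of field/name pairs. The sketch below is an assumed model of these semantics only (not LDR source code); it uses Python's re module and assumes a regexPattern must match the entire output name:

```python
import re

# Sketch (assumed semantics): a mapping is evaluated top to bottom, with each
# include appending a (field, output_name) pair and each exclude filtering the
# pairs accumulated so far.
def evaluate(ops):
    pairs = []  # ordered list of (field, output_name)
    for op, arg in ops:
        if op == "include":
            pairs.append(arg)  # arg is a (field, name) pair
        elif op == "exclude_field":
            pairs = [p for p in pairs if p[0] != arg]
        elif op == "exclude_name_regex":
            rx = re.compile(arg)
            pairs = [p for p in pairs if not rx.fullmatch(p[1])]
    return pairs

# The operations of Example 95:
ops = [
    ("exclude_name_regex", "D*"),            # empty mapping: no effect
    ("include", ("cdr.AParty", "AParty")),
    ("include", ("cdr.AParty", "AParty2")),
    ("include", ("cdr.Date", "Date")),
    ("exclude_name_regex", "A.*"),           # drops both AParty columns
]
print(evaluate(ops))  # [('cdr.Date', 'Date')]
```

Only the Date column survives, matching the output of Example 96.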
6.3.4 Combining Multiple Mappings
So far we have introduced the concept of outputs and mappings and seen how we
can construct a set of fields to output based on include and exclude statements. We
are also able to combine mappings. There are primarily two forms of mapping
combinations – mapping composition, and mapping unions.
In order to specify that a mapping includes fields from another mapping, the
mappingReference tag is used. The mappingReference tag has the properties
defined in DROX Tag 11.
DROX Tag 11 mappingReference
<mappingReference>
Description: Defines that the contents of another mapping are to be included into the containing mapping (using union or composition).
Position: Can occur within a mapping tag (0..* occurrences)
Attributes: Required mapping attribute; Optional boolean union attribute (default false); Optional patternBase attribute
Elements: None
From Table 8 we can see that for any mapping reference elements, the contents of
that mapping are evaluated in their own context to determine a set of fields and
names. No exclude statements from a sub-mapping are applied in the context of the
containing mapping. Ignoring the union attribute for the moment, consider the
DRIX shown in Example 97.
Example 97 – DRIX example mappingReferences.
…
<namespace name="MyNamespace">
…
<type name="MyType">
<field name="A" type="SomeOtherType"/>
<field name="B" type="SomeOtherType"/>
<field name="C" type="SomeOtherType"/>
<field name="D" type="SomeOtherType"/>
</type>
…
</namespace>
…
<primaryField type="MyNamespace.MyType"/>
…
Then consider the DROX displayed in Example 98.
Example 98 –DROX example showing context evaluation of mapping references
<drox>
<output name="out1" mapping="mapping1"/>
<mapping name="mapping1">
<mappingReference mapping="mapping2"/>
<exclude>
<names>
<regexPattern pattern="B"/>
</names>
</exclude>
<include>
<fields>
<pattern pattern="C"/>
</fields>
</include>
<mappingReference mapping="mapping3"/>
</mapping>
<mapping name="mapping2">
<include>
<fields>
<pattern pattern="A"/>
</fields>
</include>
<include>
<fields>
<pattern pattern="B"/>
</fields>
</include>
</mapping>
<mapping name="mapping3">
<exclude>
<fields>
<pattern pattern="C"/>
</fields>
</exclude>
<include>
<fields>
<pattern pattern="D"/>
</fields>
</include>
</mapping>
</drox>
The resulting metadata for this DROX will have 3 columns: A, C and D.
The DROX can be interpreted as performing the following:
Initialise mapping1 to empty set of fields & corresponding names
Encounter mapping2 reference & evaluate the contents of mapping2
o Initialise mapping2 to empty set of fields & corresponding names
o Include field A->A
o Include field B->B
o Include the evaluated mapping {A->A, B->B} to the containing
mapping
Exclude output column B from the mapping. Effectively excludes:
o B->B
Include field C->C
Encounter mapping3 reference & evaluate the contents of mapping3
o Initialise mapping3 to empty set of fields & corresponding names
o Exclude field C from empty map (no effect)
o Include field D->D
o Include the evaluated mapping {D->D } to the containing
mapping
This leaves mapping1 as {A->A, C->C, D->D}
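The scoping rule above, where a referenced mapping is evaluated in its own context and its excludes do not leak into the parent, can be sketched as follows. This is an illustrative model under assumed semantics, not the LDR implementation:

```python
# Sketch (assumed semantics): a mappingReference evaluates the referenced
# mapping in its own context first; only the *resulting* pairs are added to the
# parent, so a sub-mapping's excludes never remove the parent's fields.
def evaluate_mapping(mapping, library):
    pairs = []
    for op, arg in mapping:
        if op == "include":
            pairs.append(arg)
        elif op == "exclude_field":
            pairs = [p for p in pairs if p[0] != arg]
        elif op == "exclude_name":
            pairs = [p for p in pairs if p[1] != arg]
        elif op == "ref":
            # Evaluate the referenced mapping in isolation, then merge.
            pairs.extend(evaluate_mapping(library[arg], library))
    return pairs

# The mappings of Example 98:
library = {
    "mapping1": [("ref", "mapping2"), ("exclude_name", "B"),
                 ("include", ("C", "C")), ("ref", "mapping3")],
    "mapping2": [("include", ("A", "A")), ("include", ("B", "B"))],
    "mapping3": [("exclude_field", "C"), ("include", ("D", "D"))],
}
print(evaluate_mapping(library["mapping1"], library))
# [('A', 'A'), ('C', 'C'), ('D', 'D')]
```

Note that mapping3's exclusion of C has no effect on the C included directly by mapping1, because it is applied only within mapping3's own (empty) context.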
The major difference between mapping composition and mapping unions is the
legality of multiple trigger events, and the related question of whether or not
multiple fields can map to the same output field name. Trigger events are a
complex concept and as such are defined in detail in their own section, section 6.6.
Without going into the complexities of trigger events, it is sufficient for this
section to understand that a trigger event is the event that causes the writing of a
new record in an output. These trigger events are effectively a field path, from the
primary field down, with the lowest level repeating element forming the end of the
path.
With this in mind, mapping unions allow for multiple trigger events on different
field paths to exist under the one mapping. Mapping compositions do not allow
this. Similarly, within a mapping union, multiple fields are allowed to map to the
same output field name. These concepts are investigated further in the following
sections on Composition and Unions.
6.3.4.1 Composition
Mapping composition is the simplest mapping combination to perform and is
generally used for cases where there are a key set of fields that are needed in
multiple outputs. Through the use of mapping composition, we are able to reuse base mappings, increase readability and reduce redundant code (thereby
allowing for easier output specification maintenance).
The following rules apply to composed mappings:
Table 11 – Rules for Mapping Compositions.
Mapping Composition Rules
There can only be one trigger event (see section 6.6) in a composed mapping.
If an input field is mapped to multiple different output fields, then there is no
problem.
If there is ever a case where two input fields in different mappings are output
to the same output field, this is an error, regardless of whether or not
the input field is actually the same.
Example 97 and Example 98 in the previous section displayed a correct use of
mapping composition.
6.3.4.2 Unions
In order to fully understand the utility of mapping unions, an understanding of
the trigger event (see section 6.6) is required. The simplest description of a
mapping union is that it allows for a record to be output that contains
independently repeating fields. The general use-case of the mapping union is
when there are two independent, but mutually exclusive, trigger events. Each of
these trigger events is included in its own mapping, then included into the
containing mapping via mapping unions. This case is highlighted in the DRIX
in Example 99. Example 100 shows how simple mapping composition cannot
handle this case, and Example 101 shows how mapping unions can resolve
this problem in a DROX.
Example 99 – DRIX example for mapping unions.
…
<namespace name="MyNamespace">
…
<type name="MyType">
<field name="recordHeader" type="Header"/>
<repeatRange min="1" max="unbounded">
<or>
<field name="recType1" type="RecType1"/>
<field name="recType2" type="RecType2"/>
</or>
</repeatRange>
</type>
…
</namespace>
…
<primaryField type="MyNamespace.MyType"/>
…
Example 100 – DROX example showing incorrect use of mapping composition.
<drox>
<output name="out1" mapping="mapping1"/>
<mapping name="mapping1">
<include>
<fields>
<pattern pattern="recordHeader"/>
</fields>
</include>
<mappingReference mapping="mapping2"/>
<mappingReference mapping="mapping3"/>
</mapping>
<mapping name="mapping2">
<include>
<fields>
<pattern pattern="recType1.*"/>
</fields>
<names>
<pattern pattern="$2"/>
</names>
</include>
</mapping>
<mapping name="mapping3">
<include>
<fields>
<pattern pattern="recType2.*"/>
</fields>
<names>
<pattern pattern="$2"/>
</names>
</include>
</mapping>
</drox>
If the DROX specification in Example 100 were applied, it would result in an
error. The problem is that there are two independently repeating fields
(recType1 and recType2) included within mapping1. In this situation, mapping
composition is insufficient and a union of the two mappings is required.
Example 101 – DROX example showing correct use of mapping union.
<drox>
<output name="out1" mapping="mapping1"/>
<mapping name="mapping1">
<include>
<fields>
<pattern pattern="recordHeader"/>
</fields>
</include>
<mappingReference mapping="mapping2" union="true"/>
<mappingReference mapping="mapping3" union="true"/>
</mapping>
<mapping name="mapping2">
<include>
<fields>
<pattern pattern="recType1.*"/>
</fields>
<names>
<pattern pattern="$2"/>
</names>
</include>
</mapping>
<mapping name="mapping3">
<include>
<fields>
<pattern pattern="recType2.*"/>
</fields>
<names>
<pattern pattern="$2"/>
</names>
</include>
</mapping>
</drox>
This fixes our problem. Through using mapping unions in this manner, we are
able to combine the data from the two independently repeating fields recType1
and recType2. Then, via renaming (provided that the subfields with the same
names have the same types), we are able to output each of these types on the
same output, with the common fields between the two used for both recTypes.
The rules for mapping unions are displayed in Table 12.
Table 12 –Rules for Mapping unions.
Mapping Union Rules
There can be multiple trigger events (see section 6.6) in mapping unions
If an input field is mapped to multiple different output fields, then there is no
problem.
If a field f1 maps to an output field of1 in one mapping, and in another mapping,
a field f2 maps to of1, then this is only a problem if the type of f1 != the type of
f2 and f1 and f2 are mutually exclusive.
Mapping unions are especially useful to combine different record types into one
output record. In this situation, any common fields will be output on each and
every row. However, the fields which are specific to a particular record type
will only be output on the rows for which that record type is present, and null in
the cases where the record type is not present. Consider the DRIX and DROX
shown in Example 102 and Example 103 respectively.
Example 102 – DRIX example containing multiple record types.
…
<namespace name="MyNamespace">
…
<type name="Record">
<or>
<field name="record1" type="RecordType1"/>
<field name="record2" type="RecordType2"/>
<field name="record3" type="RecordType3"/>
</or>
</type>
<type name="RecordCommon">
<field name="commonField" type=".string.Ascii" readRequired="true">
<arg name="length" value="3"/>
</field>
</type>
<type name="RecordType1">
<field type="RecordCommon"/>
<test expected="R1">
<fromField field="commonField"/>
</test>
<field name="r1Field" type=".string.Ascii">
<arg name="length" value="3"/>
</field>
</type>
<type name="RecordType2">
<field type="RecordCommon"/>
<test expected="R2">
<fromField field="commonField"/>
</test>
<field name="r2Field" type=".string.Ascii">
<arg name="length" value="3"/>
</field>
</type>
<type name="RecordType3">
<field type="RecordCommon"/>
<test expected="R3">
<fromField field="commonField"/>
</test>
<field name="r3Field" type=".string.AsciiToInt32">
<arg name="length" value="3"/>
</field>
</type>
…
</namespace>
…
<primaryField type="MyNamespace.Record"/>
…
Example 103 – DROX example using mapping unions & composition for multiple record
types.
<drox>
<output name="out1" mapping="outMapping"/>
<mapping name="outMapping">
<mappingReference mapping="mapping1" union="true"/>
<mappingReference mapping="mapping2" union="true"/>
<mappingReference mapping="mapping3" union="true"/>
</mapping>
<mapping name="mapping1">
<include>
<fields>
<pattern pattern="record1.commonField"/>
</fields>
<names>
<pattern pattern="A"/>
</names>
</include>
<include>
<fields>
<pattern pattern="record1.r1Field"/>
</fields>
<names>
<pattern pattern="B"/>
</names>
</include>
</mapping>
<mapping name="mapping2">
<include>
<fields>
<pattern pattern="record2.commonField"/>
</fields>
<names>
<pattern pattern="A"/>
</names>
</include>
<include>
<fields>
<pattern pattern="record2.r2Field"/>
</fields>
<names>
<pattern pattern="B"/>
</names>
</include>
</mapping>
<mapping name="mapping3">
<include>
<fields>
<pattern pattern="record3.commonField"/>
</fields>
<names>
<pattern pattern="A"/>
</names>
</include>
<include>
<fields>
<pattern pattern="record3.r3Field"/>
</fields>
<names>
<pattern pattern="C"/>
</names>
</include>
</mapping>
</drox>
The resulting metadata for this DROX will have 3 columns: A, B and C.
The DROX can be interpreted as performing the following:
Initialise outMapping to empty set of fields & corresponding names
Encounter mapping1 reference & evaluate the contents of mapping1
o Initialise mapping1 to empty set of fields & corresponding names
o Include field record1.commonField->A
o Include field record1.r1Field->B
o Include the evaluated mapping {record1.commonField->A,
record1.r1Field->B} to the containing mapping
Encounter mapping2 reference & evaluate the contents of mapping2
o Initialise mapping2 to empty set of fields & corresponding names
o Include field record2.commonField->A
o Include field record2.r2Field->B
o Include the evaluated mapping {record2.commonField->A,
record2.r2Field->B} to the containing mapping
Encounter mapping3 reference & evaluate the contents of mapping3
o Initialise mapping3 to empty set of fields & corresponding names
o Include field record3.commonField->A
o Include field record3.r3Field->C
o Include the evaluated mapping {record3.commonField->A,
record3.r3Field->C} to the containing mapping
This leaves the containing mapping as {{record1.commonField->A,
record1.r1Field->B} U {record2.commonField->A, record2.r2Field->B}
U {record3.commonField->A, record3.r3Field->C}}
While we have multiple fields mapping to the same output field, in each
case, the field types are the same, and they come from mappings that are
being unioned – therefore this is acceptable.
For this example, the resulting output is fairly straightforward and logical. There
will always be 3 output columns, A, B and C. Whenever there is a record1 or
record2 field present, B will be populated, and C will be null. Whenever there is a
record3 field present, then B will be null and C will be populated.
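The row behaviour just described can be sketched as a tiny model (assumed null handling; union_row is a hypothetical helper, not part of the LDR):

```python
# Sketch of the union output of Example 103 (assumed null handling): column A
# is always populated, B only for record1/record2 rows, C only for record3 rows.
def union_row(record_type, common, value):
    row = {"A": common, "B": None, "C": None}
    if record_type in ("record1", "record2"):
        row["B"] = value
    elif record_type == "record3":
        row["C"] = value
    return row

print(union_row("record1", "R1", "abc"))  # {'A': 'R1', 'B': 'abc', 'C': None}
print(union_row("record3", "R3", 123))    # {'A': 'R3', 'B': None, 'C': 123}
```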
Note that in this situation, if the DROX was changed for the mapping outMapping,
such that we had the snippet shown in Example 104:
Example 104 – DROX snippet showing the exclusion of multiple fields.
<mapping name="outMapping">
<mappingReference mapping="mapping1" union="true"/>
<mappingReference mapping="mapping2" union="true"/>
<mappingReference mapping="mapping3" union="true"/>
<exclude>
<names>
<regexPattern pattern="B"/>
</names>
</exclude>
</mapping>
In this case, the exclude clause effectively excludes:
o record1.r1Field->B
o record2.r2Field->B
The union concept is very powerful when used on specifications using multiple
record types. Part of the power of the union is that it allows for different fields of
the same type to be output to the same field. In general this is a good idea when the
elements being unioned are mutually exclusive. The user must be careful using
mapping unions when the elements being unioned are not mutually exclusive as
this can lead to unintuitive results. Consider the DRIX and DROX specifications
shown in Example 105 and Example 106 respectively.
Example 105 – DRIX example with non-mutually exclusive fields.
…
<namespace name="MyNamespace">
…
<type name="MyType">
<field name="f1" type="TypeA"/>
<or>
<field name="f2" type="TypeA"/>
<field name="f3" type="TypeB"/>
</or>
</type>
…
</namespace>
…
<primaryField type="MyNamespace.MyType"/>
…
Example 106 – DROX example showing a union of non-mutually exclusive fields.
<drox>
<output name="out1" mapping="mapping1"/>
<mapping name="mapping1">
<mappingReference mapping="mapping2" union="true"/>
<mappingReference mapping="mapping3" union="true"/>
</mapping>
<mapping name="mapping2">
<include>
<fields>
<pattern pattern="f1"/>
</fields>
<names>
<pattern pattern="A"/>
</names>
</include>
</mapping>
<mapping name="mapping3">
<include>
<fields>
<pattern pattern="f2"/>
</fields>
<names>
<pattern pattern="A"/>
</names>
</include>
</mapping>
</drox>
Assuming that type TypeA is defined, and emittable, then in this instance we
will have the following mapping evaluated as mapping1 :
{{f1 ->A} U {f2->A}}
This mapping is valid, however it is slightly odd, since f1 and f2 are not
mutually exclusive. In this case, the A output field will always be populated. If
the field f2 is present in the data, then f2 will appear in the A output field. If,
however, the field f3 is present in the data, then the field f1 will appear in the A
output field.
6.3.4.3 Nesting Mappings with a patternBase
Many systems nest data formats within other wrapping formats. For example,
an ASN.1 file produced by a switch could be wrapped in header/trailer
information by some collation process. These files could then be batched
together into a larger file by some later control process, which may append its
own metadata information (package date & time etc.).
If a reader already exists for the raw data, then it is very simple to include the
DRIX library of the raw data and simply create a field referencing the top-level
type. This was designed to allow for such nesting to be easily handled in
the DRIX.
In order to allow for similar easy composition on the DROX side, the
patternBase attribute is introduced.
In certain cases (again consider ASN.1 data) the DROX can be very
complicated due to many mappings, and many independently repeating fields.
However when DRIX wrapping occurs, these DROX mapping elements become
useless without the use of the patternBase attribute.
Consider the case where we have raw data from which we want to output
subfield g of field f. The DROX would then contain the
mapping->include->fields clause shown in Example 107.
Example 107 –DROX base mapping example
<mapping name="baseMapping">
<include>
<fields>
<pattern pattern="f.g"/>
</fields>
</include>
</mapping>
If, however, this data gets wrapped such that the field f now exists under a
header/trailer field "ht" and a subheader/trailer wrapping "sht", then this
pattern should become:
<pattern pattern="ht.sht.f.g"/>
This would mean rewriting the entire mapping. With the patternBase attribute,
however, this can be avoided. Within the mappingReference tag on a mapping,
you can specify the field names under which all of the included patterns are to
be nested.
In this example we could have the DROX snippet shown in Example 108.
Example 108 –The patternBase attribute on mappingReferences
<mapping name="wrappingMapping">
<mappingReference mapping="baseMapping" patternBase="ht.sht"/>
</mapping>
<mapping name="baseMapping">
<include>
<fields>
<pattern pattern="f.g"/>
</fields>
</include>
</mapping>
The patternBase attribute is applied to any field patterns, provided that they are
not an absolute field pattern (starting with a “.”). This means that the patterns in
field based exclusions also have the patternBase applied when included in
another mapping via a mappingReference tag. These patternBase attributes can
be repeatedly applied if one mapping references another mapping, which
references a third mapping and so on.
Consider the DROX in Example 109.
Example 109 – Nested DROX patternBase attributes
<drox>
<output name="out1" mapping="A"/>
<mapping name="A">
<mappingReference mapping="B" patternBase="a"/>
</mapping>
<mapping name="B">
<mappingReference mapping="C" patternBase="b"/>
<mappingReference mapping="D" patternBase=".b"/>
<include>
<fields>
<pattern pattern="b1"/>
</fields>
</include>
<include>
<fields>
<pattern pattern=".a1.b2"/>
</fields>
</include>
<exclude>
<fields>
<pattern pattern="b.c3"/>
</fields>
</exclude>
<exclude>
<fields>
<pattern pattern="b.c4"/>
</fields>
</exclude>
</mapping>
<mapping name="C">
<include>
<fields>
<pattern pattern="c1"/>
</fields>
</include>
<include>
<fields>
<pattern pattern="c3"/>
</fields>
</include>
<include>
<fields>
<pattern pattern=".a.b.c4"/>
</fields>
</include>
</mapping>
<mapping name="D">
<include>
<fields>
<pattern pattern="d1"/>
</fields>
</include>
<include>
<fields>
<pattern pattern=".d.d2"/>
</fields>
</include>
</mapping>
</drox>
In this example, the output out1 will end up with the following fields:
a.b1
a1.b2
a.b.c1
b.d1
d.d2
The fields a.b.c3 & a.b.c4 are included by mapping C (with the
patternBase attributes from mappings A & B applied) and subsequently excluded by
mapping B.
6.4 Dumper & Dump Tags
For complicated file structures with many fields, it can occasionally be
useful to specify that you simply want all of the fields output, and to have the LDR
determine which outputs it needs to create.
This facility is provided through the use of the dump & dumper tag combination.
The dumper tag has the properties shown in DROX Tag 12.
DROX Tag 12 dumper
<dumper>
Description: Specifies the name of a dumper to be associated with a given dumper class
Position: Can occur: Within a drox tag (0..* occurrence)
Attributes: Required name attribute; Required javaClass attribute
Elements: None
The dumper tag defines a dumper which can then be referenced in dump tags. It
specifies the class of dumper to be used. The dumper class referenced is itself
something which generates outputs with appropriate fields in the outputs.
There are three pre-built dumpers that can be used within the LDR. These
are:
com.lavastorm.ldr.converters.asn1.output.Asn1OutputDefinitionGenerator
com.lavastorm.ldr.converters.cobol.output.CobolOutputDefinitionGenerator
com.lavastorm.ldr.output.SimpleOutputDefinitionGenerator
As their names suggest, the Asn1OutputDefinitionGenerator is useful for ASN.1
data and the CobolOutputDefinitionGenerator is useful for COBOL data. When
using the LAE-LDR node interface, the ASN.1 & COBOL nodes use these classes
respectively.
The SimpleOutputDefinitionGenerator is generally useful in cases where neither
ASN.1 nor COBOL data is present.
Specifying these dumper tags on their own, however, does not result in any output
being produced. For this to occur, the dump tag is required.
The dump tag has the properties shown in DROX Tag 13.
DROX Tag 13 dump
<dump>
Description: Specifies the name of a dumper to be associated with a given dumper class
Position: Can occur: Within a drox tag (0..* occurrence)
Attributes: Required name attribute
Elements: 1 include element; 0..* exclude elements. 1 include tag must exist, and this defines the pattern below which outputs are to be created.
The dump tag must contain one include tag. The structure of this include tag is
more restricted than in the mapping case. When inside a dump tag, the include tag
cannot contain any names elements.
The name attribute on the dump tag must correspond to the name on a dumper tag
defined in the DROX. The specific output definition generator used to generate the
outputs is the one specified in the javaClass attribute of the dumper tag.
The dump tag will then resolve the include-fields-pattern & obtain a field (or set of
fields). However, rather than these fields simply being included, as in the mapping
case, the fields themselves and all of their subfields are included in outputs.
In the case of the COBOL & ASN.1 output definition generators, certain fields known to
be unimportant in the format are ignored, and the names of the fields are dependent
on the way the fields were defined in the COBOL copybook or ASN.1
specification respectively.
In general, the output definition generators use the field pattern specified in the
dump tag and then generate all of the outputs required. Each time a loop is
encountered, a new output will be created. This is done to ensure that there are no
duplicate trigger event issues. The records in the output of the containing loop can
then be rejoined with the records on the output of the sub loop, via refIds (see 6.7)
which are produced to maintain referential integrity between all of the outputs
produced.
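Putting the two tags together, a minimal dump configuration might look like the sketch below. The dumper name and the field pattern are illustrative assumptions; the javaClass is one of the three pre-built generators listed above.

```xml
<drox>
  <!-- "simpleDumper" and "rec" are illustrative names -->
  <dumper name="simpleDumper"
          javaClass="com.lavastorm.ldr.output.SimpleOutputDefinitionGenerator"/>
  <!-- the dump's name attribute must match the dumper's name; the single
       include defines the pattern below which outputs are to be created -->
  <dump name="simpleDumper">
    <include>
      <fields>
        <pattern pattern="rec"/>
      </fields>
    </include>
  </dump>
</drox>
```

The generator then creates one output per loop encountered below the pattern, with refIds maintaining referential integrity between the outputs.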
6.5 Field Patterns
Field patterns are referenced via the pattern tag, which has the properties shown in
DROX Tag 14.
DROX Tag 14 pattern
<pattern>
Description: Specifies a field pattern to match against. Used for include/exclude statements
Position: Can occur: Within an include->fields tag (0..1 occurrence); Within an exclude->fields tag (0..1 occurrence)
Attributes: Required pattern attribute
Elements: None
As shown in DROX Tag 14, the pattern tag is very simple, and contains only
one required attribute, pattern. However, the syntax and details of the pattern
attribute are more complex, and are described in detail in this section.
From the output point of view, there are three types of fields we are interested in:
Emittable fields: fields defined as a primitiveType, a javaType, or declared as emittable, as defined in section 5.2.1.7.
Constructed fields: any field which is declared to be of a type that contains other fields (may also be emittable).
Dynamically typed fields: fields declared as a generated-type, or dynamically bound fields as declared in 5.2.1.9 and 5.2.2.13 respectively (may be emittable and/or constructed).
Only emittable fields can be output directly; however, there are special mechanisms
for referencing all of these different field types in the output specification.
6.5.1 What is a Pattern?
Patterns are used to include or exclude fields from the output. Each pattern
represents a set of matching fields. Patterns are composed of a sequence of tests,
separated by dots. For example, the following pattern is composed of three tests: fieldA, then fieldB|fieldC, then !fieldD:
fieldA.fieldB|fieldC.!fieldD
Each test applies to one level of fields. Conceptually, a pattern may be thought of
as a query against the input specification. The input specification defines the
structure and the field names that the input data will populate. A pattern queries
this structure.
The first test in a pattern is checked against all the top-level fields in the input
spec. Matching fields are then queried for second-level fields matching the second
test, and so on. For example, using the pattern above, the input specification will
first be queried for a top-level field named “fieldA”. If a fieldA is found, then
fieldA will be queried for a field, immediately under it, named “fieldB” or named
“fieldC”. If one or both of these fields are found, then each will finally be queried
for all fields immediately below them that are not named “fieldD”.
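As a sketch, the example pattern walked through above would appear in a mapping include as follows:

```xml
<include>
  <fields>
    <!-- three tests: fieldA, then fieldB or fieldC, then anything not named fieldD -->
    <pattern pattern="fieldA.fieldB|fieldC.!fieldD"/>
  </fields>
</include>
```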
6.5.2 Traits
A pattern is a query, and so to be useful, there must exist attributes to query
against. These attributes are referred to as traits.
?constructed: is a field of a constructed type (symbol: #)
?emittable: is a field of an emittable type (symbol: *)
?iterable: is a field declared within a repeatRange or while element (symbol: @)
?name=fieldname: is named fieldname (symbol: fieldname)
?type=typename: is of type typename (symbol: {typename})
As shown above, each trait also has a shorthand symbol reserved for it. However,
curly braces serve double-duty: they are also used to denote a repeat quantity (see
“Quantifiers” below).
For any of the type specifiers, the fully qualified type name must be used. For
example, if we have a field foo.bar of type fred.Bar, then any of the following will
match:
<pattern pattern="foo.bar{.fred.Bar}"/>
<pattern pattern="foo.bar{fred.Bar}"/>
However, the following will not match:
<pattern pattern="foo.bar{Bar}"/>
Template types can also be referenced using the type specifier, as a comma-separated
list, with the types in the order of the template parameters defined on the
type. When specifying the type of a field which is dynamically bound and
supplied template arguments, the type names are nested in the type
specifier.
For example, if we have a field foo.bar, which is bound to a type fred.Bar, and
supplied three template arguments (in the same order as the declared template
parameters on the type), a.Type1, b.Type2 and c.Type3, then the correct way to
reference the type of this field in a type specifier is shown below:
<pattern pattern="foo.bar{.fred.Bar{a.Type1, b.Type2, c.Type3}}"/>
Where any of the a.Type1, b.Type2, c.Type3 can also have a leading “.” character
for absolute referencing.
6.5.3 Test Composition
The operators below can be used to assemble one or more traits into a test
expression.
6.5.3.1 AND
Within a test, adjacent expressions are joined together by implicit ANDs. That is,
each part of a test expression must evaluate to true for the field(s) that it matches.
For example:
A*{integer.Int32}    matches a field that is named “A” AND is emittable AND has type=integer.Int32
6.5.3.2 OR |
Within a test, the pipe character ‘|’ can be used to OR expressions. For example:
A|B|{integer.Int32}    matches a field that is named “A” OR “B” OR has type=integer.Int32
6.5.3.3 Grouping ()
Within a test, parentheses can be used to clarify precedence or force a different
order of evaluation.
Clarifying Precedence:
A|(B{integer.Int32})    here the parentheses do not change the default order of evaluation (the implicit AND has precedence over the OR operator). The expression says, "match a field that is named A, or, match an Int32 field named B."
Forcing Precedence:
(A|B){integer.Int32}    here the parentheses change the order of evaluation. The expression says, "match an Int32 field that is named either A or B."
6.5.3.4 Negation !
Within a test, an exclamation point ‘!’ can be used to negate the following
expression. For example:
!A!B{integer.Int32}    matches all fields (at this level) that have type=integer.Int32, but are not named “A” or “B”
6.5.3.5 Precedence
Within a test, binary expressions are evaluated left to right, with any implicit
AND operations having precedence over any OR operations. The only unary
operator is the negation operator ‘!’, and it evaluates right to left. For example,
!!A is evaluated as !(!A) = A.
6.5.4 Quantifiers {m,n}
A quantifier can be used to match a single test against multiple levels of fields.
Each test may include one quantifier as the last element of the test expression (but
before a range filter, if there is one). LDR quantifiers are compatible with Perl’s
regular expression quantifiers, so valid notations are:
{MIN,MAX}    e.g. {2,5}    “2 to 5 repetitions”
{MIN,}       e.g. {2,}     “2 or more repetitions”
{EXACT}      e.g. {2}      “exactly 2 repetitions”
For example:
A.B{1,2}     matches: A.B and A.B.B
A.B{0,}      matches: A, A.B, A.B.B, A.B.B.B, …
A{2}.B{3}    matches: A.A.B.B.B
Note there is no concept of “greedy” and “lazy” quantifiers because all fields that
match a pattern are output. So, in the first example above, if both A.B and A.B.B
exist in the input, then both will be output.
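Combining quantifiers with the trait symbols from section 6.5.2 gives compact patterns. For example, the following include (a sketch using the same pattern that appears later in Example 123) selects every emittable field at any depth, by matching zero or more constructed levels and then any emittable field:

```xml
<include>
  <fields>
    <!-- zero or more constructed levels (#), then any emittable field (*) -->
    <pattern pattern="#{0,}.*"/>
  </fields>
</include>
```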
6.5.5 Regular Expressions (//)
Java-compatible regular expressions can be used within each test. See
http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html for Java’s
regular expression documentation. To use a regular expression, simply enclose it
in forward-slashes within the test expression.
For example, given the following fields:
A.A
A.B
A.C
The following pattern will match A.A and A.B:
A./[AB]/
Group capturing is also possible, but the syntax for backreferences slightly
deviates from Java’s syntax. To refer to a group, use $0, $1, $2, … $n. Note that,
as in Java, $0 is a default group that always refers to the entire regular expression.
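For instance, the regular-expression test above can be used directly inside a mapping include; this sketch matches the second-level fields A.A and A.B from the example:

```xml
<include>
  <fields>
    <!-- /[AB]/ is a Java regular expression enclosed in forward-slashes -->
    <pattern pattern="A./[AB]/"/>
  </fields>
</include>
```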
6.6 Trigger Events
The output trigger event is an implicitly defined event that requires a record to be
output. The trigger event can only be determined by matching the input
specification against the output specification. For each trigger event that occurs in
the data, a record is output. In some cases with dynamically generated types, the
trigger event cannot be determined until runtime. Where there are optional
elements (using an or tag), there may be multiple possible trigger events and the
trigger event which causes a record to be output depends on the data present in the
file.
In order to correctly specify trigger events and trigger event paths, we refer to each
loop by the type on which it occurs (numbering multiple loops on a type from 0,
in order of declaration), and we define the special file-loop as the implicit
outermost loop over the file.
The definition of the trigger event is shown in Table 13.
Table 13 –Trigger Event Definition
Trigger Event Definition
A trigger event is the deepest loop in a DRIX specification which has at least one
field within it that:
Occurs in the data file AND
Is in the included set of an output mapping
Considering the field structure as a tree, the trigger event path is therefore defined
as the sequence of loops that are encountered from the root element (primary
field) to the trigger event. Depending on the data present in the file, any one of the
loops in the trigger event path may end up being a trigger event at a particular
stage of processing.
Consider the DRIX in Example 110. This DRIX will be used extensively in the
following subsections.
Example 110 –DRIX example highlighting the output trigger event
…
<namespace name="MyNamespace">
…
<type name="A">
<repeatRange min="1" max="unbounded">
<field name="a1" type="B"/>
</repeatRange>
</type>
<type name="B">
<field name="b1" javaType="int"/>
<repeatRange min="1" max="unbounded">
<field name="b2" type="C"/>
</repeatRange>
<repeatRange min="1" max="unbounded">
<field name="b3" type="C"/>
</repeatRange>
<field name="b4" javaType="int"/>
</type>
<type name="C">
<repeatRange min="1" max="unbounded">
<or>
<field name="c1" type="D">
<arg name="expected" value="c1"/>
</field>
<field name="c2" type="D">
<arg name="expected" value="c2"/>
</field>
</or>
</repeatRange>
</type>
<type name="D">
<param name="expected" javaType="String"/>
<field name="_value" type=".string.Ascii" readRequired="true">
<arg name="length" value="2"/>
</field>
<testMethod>
<![CDATA[
if (field()._value().equals(param().expected()))
return Result.GOOD;
return Result.NOT_ME;
]]>
</testMethod>
<emittable type="String">
<fromField field="_value"/>
</emittable>
</type>
…
<primaryField type="MyNamespace.A">
…
Then if we had the following output specification:
Example 111 – Valid DROX example highlighting the output trigger event
<drox>
<output name="out1" mapping="out1Mapping">
<mapping name="out1Mapping">
<include>
<fields>
<pattern pattern="a1.b2.c1"/>
</fields>
</include>
<include>
<fields>
<pattern pattern="a1.b1"/>
</fields>
</include>
</mapping>
</drox>
The trigger event path for output out1 can be defined as: the file-loop, then the
loop on type A, then the first loop on type B (the one containing b2), then the
loop on type C.
Consider then that we have the following data file layout:
Example 112 –Example file layout identifying trigger event occurrences.
a1
b1
b2
c1
c1
b3
c1
b4
a1
b1
b2
c1
c2
b3
c2
b4
a1
b1
b2
c2
c2
b3
c1
b4
When determining how to output the records, the following steps are taken:
1. a1 encountered - implies that we have hit the loop on type A in the trigger event path. Still have not hit any fields we are required to output.
2. a1.b1 encountered - have hit a field we are required to output under the loop on type A in the trigger event path. If no deeper loops are encountered, this is the trigger event. Store the a1.b1 field to output.
3. a1.b2 encountered - implies that we have hit the first loop on type B in the trigger event path. Still have not hit any fields under this loop we are required to output.
4. a1.b2.c1 encountered - have hit a field we are required to output under the loop on type C in the trigger event path. This is the deepest loop in the trigger event path. Output the record, containing the field read in this step, and the field read in step 2.
5. a1.b2.c1 encountered - have hit a field we are required to output under the loop on type C in the trigger event path. This is the deepest loop in the trigger event path. Output the record, containing the field read in this step, and the field read in step 2.
6. a1.b3 encountered. We are not concerned about this field, or anything under this field. Ignore it & its subfields.
7. a1.b4 encountered. We are not concerned about this field, or anything under this field. Ignore it & its subfields (none).
8. a1 encountered - implies that we have hit the loop on type A in the trigger event path. Still have not hit any fields we are required to output.
9. a1.b1 encountered - have hit a field we are required to output under the loop on type A in the trigger event path. If no deeper loops are encountered, this is the trigger event. Store the a1.b1 field to output.
10. a1.b2 encountered - implies that we have hit the first loop on type B in the trigger event path. Still have not hit any fields under this loop we are required to output.
11. a1.b2.c1 encountered - have hit a field we are required to output under the loop on type C in the trigger event path. This is the deepest loop in the trigger event path. Output the record, containing the field read in this step, and the field read in step 9.
12. a1.b2.c2 encountered - We are not concerned about this field, or anything under this field. Ignore it & its subfields (none).
13. a1.b3 encountered. We are not concerned about this field, or anything under this field. Ignore it & its subfields.
14. a1.b4 encountered. We are not concerned about this field, or anything under this field. Ignore it & its subfields (none).
15. a1 encountered - implies that we have hit the loop on type A in the trigger event path. Still have not hit any fields we are required to output.
16. a1.b1 encountered - have hit a field we are required to output under the loop on type A in the trigger event path. If no deeper loops are encountered, this is the trigger event. Store the a1.b1 field to output.
17. a1.b2 encountered - implies that we have hit the first loop on type B in the trigger event path. Still have not hit any fields under this loop we are required to output.
18. a1.b2.c2 encountered - We are not concerned about this field, or anything under this field. Ignore it & its subfields (none).
19. a1.b2.c2 encountered - We are not concerned about this field, or anything under this field. Ignore it & its subfields (none).
20. a1.b3 encountered. We are not concerned about this field, or anything under this field. Ignore it & its subfields.
21. a1.b4 encountered. We are not concerned about this field, or anything under this field. Ignore it & its subfields (none).
22. Recognise that the loop on type A is the trigger event for this record, as no deeper loops in the trigger event path were encountered. Output the field read in step 16, and a null value for a1.b2.c1 (since not encountered).
6.6.1 Multiple Trigger Events and Mapping Unions
When mapping unions are not used, it is an error for a DRIX and DROX to define
multiple trigger event paths on an output. Consider the DRIX in Example 110
previously introduced in section 6.6. Then consider that the DROX in Example
113 is applied.
Example 113 – Invalid DROX example defining multiple trigger event paths
<drox>
<output name="out1" mapping="out1Mapping">
<mapping name="out1Mapping">
<include>
<fields>
<pattern pattern="a1.b1"/>
</fields>
</include>
<mappingReference mapping="mapping1"/>
<mappingReference mapping="mapping2"/>
</mapping>
<mapping name="mapping1">
<include>
<fields>
<pattern pattern="a1.b2.c1"/>
</fields>
</include>
</mapping>
<mapping name="mapping2">
<include>
<fields>
<pattern pattern="a1.b3.c1"/>
</fields>
</include>
</mapping>
</drox>
The trigger event path to field a1.b1 is: the file-loop, then the loop on type A.
There is no problem with this inclusion. Then, mapping1 is included. Mapping1
will have its trigger event path defined as: the file-loop, then the loop on type A,
then the first loop on type B (containing b2), then the loop on type C.
This path is simply an extension to the original trigger event path. Therefore, the
trigger event path for out1Mapping becomes the deeper of the two paths, the
trigger event path from mapping1. The next element in the DROX includes
mapping2 into out1Mapping. Mapping2 will have its trigger event path defined as:
the file-loop, then the loop on type A, then the second loop on type B (containing
b3), then the loop on type C.
This is a different trigger event path, and not simply an extension or a subsection
of the existing trigger event path. Therefore, since the different trigger event path
is not being included via a mapping union, this is an error condition.
However, if we changed the definition of out1Mapping to that shown in Example
114, this is no longer an error.
Example 114 – Using mapping unions for multiple trigger event cases
<mapping name="out1Mapping">
<include>
<fields>
<pattern pattern="a1.b1"/>
</fields>
</include>
<mappingReference mapping="mapping1" union="true"/>
<mappingReference mapping="mapping2" union="true"/>
</mapping>
In the case of mapping unions, each individual mapping has its own trigger event
path defined. Then, when the mappings are unioned together, there exist multiple
trigger event paths in the containing mapping. In the situation shown above, a new
record output will be triggered when ever an a1.b2.c1 field, or an a1.b3.c1 field is
encountered. In the first case, the a1.b3.c1 output field will be null. In the second
case, the a1.b2.c1 output field will be null.
6.6.2 Output Suspension and Mapping Unions
A record must be constructed for an output each time that output’s trigger event is
encountered. This, however, does not mean that the record needs to be constructed
when the trigger event is encountered. Consider again the DRIX introduced in
Example 110. Consider the case shown below.
Example 115 – Output Suspension – Fields after the trigger event
<drox>
<output name="out1" mapping="out1Mapping">
<mapping name="out1Mapping">
<include>
<fields>
<pattern pattern="a1.b2.c1"/>
</fields>
</include>
<include>
<fields>
<pattern pattern="a1.b4"/>
</fields>
</include>
</mapping>
</drox>
The trigger event path for the a1.b2.c1 field is: the file-loop, then the loop on
type A, then the first loop on type B, then the loop on type C.
The trigger event path for the a1.b4 field is: the file-loop, then the loop on type A.
Therefore, there is no issue with multiple trigger event paths, since the first trigger
event path is simply an extension of the second path. This then implies that a
record should be output for each occurrence of an a1.b2.c1 field. However,
consider the data file in Example 112. In this situation, we encounter two a1.b2.c1
fields prior to encountering the b4 field under the same a1. This means that output
is suspended until we have read the b4 field which exists under the same loop on
type A.
This suspension, however, is not applied across mapping unions. Consider that we
have the DROX in Example 116.
Example 116 – Mapping Unions and Output Suspension
<drox>
<output name="out1" mapping="out1Mapping">
<mapping name="out1Mapping">
<mappingReference mapping="mapping1" union="true"/>
<mappingReference mapping="mapping2" union="true"/>
</mapping>
<mapping name="mapping1">
<include>
<fields>
<pattern pattern="a1.b2.c1"/>
</fields>
</include>
</mapping>
<mapping name="mapping2">
<include>
<fields>
<pattern pattern="a1.b4"/>
</fields>
</include>
</mapping>
</drox>
In this case, since we are performing a union of two mappings, the suspension of
output does not occur. Consider now that we have the data file structure displayed
in Example 117, with the values for field a1.b2.c1 and a1.b4 shown in brackets
next to the field occurrences.
Example 117 –Example file layout – read data values for a1.b2.c1 and b4 fields shown in
brackets.
a1
b1
b2
c1(A)
c1(B)
b3
c1
b4(C)
a1
b1
b2
c1(D)
c2
b3
c2
b4(E)
a1
b1
b2
c2
c2
b3
c1
b4(F)
In this situation, the output from the LDR will be as shown in Example 118
Example 118 –Example mapping union with no output suspension
a1.b2.c1    a1.b4
A           Null
B           Null
Null        C
D           Null
Null        E
Null        F
In this case, since the trigger events & trigger paths are treated independently, and
since there is no suspension of output, the values in one of the fields will always
be null.
6.6.3 Zero-Width Trigger Fields
Another important thing to note is that for cases of zero-width trigger fields, there is
no data output. This means that even if other data exists in the trigger path, if the
trigger event field does not exist in the data, then no record is output.
Consider the DRIX shown in Example 119
Example 119 –Example DRIX for zero-width trigger fields
…
<namespace name="MyNamespace">
…
<type name="T">
<field name="a" type="HDR"/>
<repeatRange min="0" max="unbounded">
<field name="b" type="B"/>
</repeatRange>
<field name="c" type="TRL"/>
</type>
…
<primaryField type="MyNamespace.T">
…
In this case, the trigger event occurs on the b field. However, given that this exists
under a repeatRange with no minimum, then it is possible that the trigger field does
not exist. In this case, consider that we have the DROX shown in Example 120,
applied to the file layout shown in Example 121.
Example 120 – DROX applied to zero-width trigger fields
<drox>
<output name="out1" mapping="out1Mapping">
<mapping name="out1Mapping">
<include>
<fields>
<pattern pattern="a"/>
</fields>
</include>
<include>
<fields>
<pattern pattern="b.*"/>
</fields>
</include>
<include>
<fields>
<pattern pattern="c"/>
</fields>
</include>
</mapping>
</drox>
Example 121 –Example file layout – showing zero-width trigger fields.
HDR
B
B
TRL
HDR
TRL
In this case, there will be two records. The first will contain the first HDR & TRL,
and the first B field. The second will contain the first HDR & TRL field, and the
second B field.
However, there will be no third record for the second HDR & TRL field, since the
trigger field is zero-width. In order to ensure that you obtain all references to the
non-trigger fields, these can be defined in their own output, or defined in their own
mapping and unioned with the trigger field.
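A sketch of the union workaround for the layout above (the mapping names are illustrative): the header/trailer fields are placed in their own mapping and unioned with the trigger-field mapping, so the second HDR & TRL still produce a record even when no B field occurs:

```xml
<mapping name="out1Mapping">
  <!-- hdrTrlMapping and bMapping are hypothetical names -->
  <mappingReference mapping="hdrTrlMapping" union="true"/>
  <mappingReference mapping="bMapping" union="true"/>
</mapping>
<mapping name="hdrTrlMapping">
  <include>
    <fields>
      <!-- the header and trailer fields, in their own mapping -->
      <pattern pattern="a|c"/>
    </fields>
  </include>
</mapping>
<mapping name="bMapping">
  <include>
    <fields>
      <pattern pattern="b.*"/>
    </fields>
  </include>
</mapping>
```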
6.6.4 Clearing Actions
So far, we have discussed how to compose records based on multiple mappings, how
trigger events work in general, and when trigger events can be suspended. One thing
that has not yet been discussed is when fields are cleared such that they do not
appear on an output record. Clearing actions are – like the trigger event – implicitly
defined. A field is cleared on the containing loop under which it lies. Until this
containing loop is terminated, a field which has been set will be present on any
output record.
6.6.5 Include TriggerProperties
As mentioned briefly in section 6.3.1, each include tag has an optional
triggerProperties attribute. This attribute specifies how the included fields are to
participate in the determination of trigger events. It also specifies whether or not the
fields will “dirty” the record. There are two components to writing a record. The
trigger event determines when the record should be checked to see if it should be
output. When the record is checked, it is only output if the record has been “dirtied”
– i.e. a field has been set on the output record that dirties the record.
The allowable options for triggerProperties on include-fields tags are:
default
none
triggerableAndDirtying
Some additions to this list may be made in future to allow for the case when the field
only dirties the record, but does not define a trigger event, or when the field should
be considered to take part in the trigger calculation, but should never dirty the record
itself.
When the default attribute is used, the property used depends on the fields being
included as shown in the table below:
Field Type: Default Trigger Properties
Normal (any normal field): triggerableAndDirtying
refId: none
relId: none
fileId: none
errorCode: triggerableAndDirtying
errorMessage: triggerableAndDirtying
errorCount: No effect*
errorFields: No effect*
*These fields are record-based output fields, therefore are only considered after it has
already been determined that an output record needs to be written.
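As a sketch based on the options listed above, the attribute is set directly on the include tag; here the a1.b4 field (from Example 110) is carried on the record but neither triggers output nor dirties the record:

```xml
<include triggerProperties="none">
  <fields>
    <!-- included on the output record, but will not trigger or dirty it -->
    <pattern pattern="a1.b4"/>
  </fields>
</include>
```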
6.7 Special Identifier Fields
All of the fields referenced in the output specification we have discussed thus far
rely on the field being defined in the input specification. In general, the field needs
to be defined in the input specification in order to be referenced in the output. The
exceptions to this rule are the special cases of the refId, relId, bytePosition, fileId
and fileName for multiple file usage. In section 10.1.6, additional special fields for
identifying errors are introduced; however, these are not of particular concern here.
On each field, there are two implicitly defined counters, the refId (short for
reference identifier), and the relId (short for relative identifier). The relId is an
occurrence count, identifying the number of times a field has occurred within its
containing loop. The refId is an integer which is unique for any field occurrence in
a given file. This can then be used as a key (together with the fileId) to maintain
referential integrity & join data split into multiple outputs. The fileId is for use
when multiple files are pumped through the same DRIX & DROX, and identifies
the index of the file which was being processed when the record was produced.
Similarly, the fileName identifies the file that was being processed when a record
was produced, but uses the file name rather than the index. All of these integers are
zero-indexed. The bytePosition special field is used to identify the location within
the data file where the field occurred. This is the location of the start of the field,
not the end of the field.
We can reference these specialFields via the corresponding tag, with the properties
described in DROX Tag 15.
DROX Tag 15 specialField
<specialField>
Description: Specifies the special field(s) to include/exclude in a mapping.
Position: Can occur: Within an include->fields tag (0..1 occurrence); Within an exclude->fields tag (0..1 occurrence)
Attributes: Required type attribute; Optional pattern attribute. The type attribute must be one of {refId, relId, bytePosition, fileId, fileName} (all described in this section), or {errorCode, errorCount, errorFields, errorMessage} (described in section 10.1.6). The pattern attribute must be present if the type is one of {refId, relId, bytePosition, errorCode, errorMessage}. The pattern attribute must not be present if the type is one of {fileId, fileName, errorCount, errorFields}.
Elements: None
As mentioned in the table, there are nine possible values for the type attribute on a
specialField tag. The type value determines whether a pattern attribute is
required or must not exist. The refId, relId, bytePosition, errorCode &
errorMessage specialField types all refer to properties/attributes of
individual fields. Therefore, a pattern attribute is required to determine which field
(or fields) should have the refId, relId, bytePosition, errorCode or errorMessage
included or excluded. The pattern attribute itself is simply a field pattern, which is
described in detail in section 6.5.
All of the other specialField types do not reference properties/attributes of
individual fields. The fileId specialField references the file index for the current file
being processed; therefore, it does not make sense to include a pattern attribute.
Similarly, for the fileName specialField there is no need for a pattern attribute. As
discussed later in section 10.1.6, the errorCount & errorFields are
properties/attributes of an entire record. Therefore they also do not require a pattern
attribute.
Consider the case where we have a very simple (and probably useless) input
specification shown in Example 122.
Example 122 –Example DRIX specification displaying special identifier utility
…
<namespace name="MyNamespace">
…
<type name="File">
<repeatRange min="1" max="unbounded">
<field name="rec" type="Record"/>
</repeatRange>
</type>
<type name="Record">
<repeatRange min="1" max="unbounded">
<field name="rF" type="Field"/>
</repeatRange>
</type>
…
<primaryField type="MyNamespace.File"/>
…
Then say we had the output specification shown in Example 123
Example 123 –Example DROX specification displaying special identifier utility
<drox>
<output name="out1" mapping="out1Mapping"/>
<mapping name="out1Mapping">
<include>
<fields>
<specialField type="fileId"/>
</fields>
</include>
<include>
<fields>
<specialField type="relId" pattern="rec"/>
</fields>
</include>
<include>
<fields>
<pattern pattern="rec.rF.relId"/>
</fields>
</include>
<include>
<fields>
<specialField type="refId" pattern="#{0,}.*"/>
</fields>
</include>
</mapping>
</drox>
Consider also that we have the following input data:
Example 124 –Example file layout, displaying special identifier utility
rec
rF
rF
rec
rF
rF
rF
rec
rF
rF
rec
rF
rF
rF
Then we would end up with the flat-file output shown below:
Example 125 –Example special identifier output
fileId  rec.relId  rec.refId  rec.rF.relId  rec.rF.refId
0       0          a          0             b
0       0          a          1             c
0       1          d          0             e
0       1          d          1             f
0       1          d          2             g
0       2          h          0             i
0       2          h          1             j
0       3          k          0             l
0       3          k          1             m
0       3          k          2             n
Where a, b, c, d, e, f, g, h, i, j, k, l, m & n are all integers, and a < b < c < d < e <
f < g < h < i < j < k < l < m < n.
Notice that for the refId include, the pattern “#{0,}.*” will match both the field
rec and also the field rec.rF. Therefore the include statement will include the refId
for the field rec and also for the field rec.rF.
Since all refIds, relIds and fileIds are zero-indexed, the last row of this table states
that this rF belongs to the record with refId k. This happens to be the 3rd rF under
the 4th rec in the input data file. In this example, since we are only providing one
file, the fileId is always 0.
These special identifier fields are essential when the input data cannot be output in
a single flat file, and some sort of referential key is required to join the different
outputs back together.
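As a minimal sketch of such a join, the snippet below re-attaches the rF-level output to the rec-level output from Example 125, using rec.relId as the referential key. This is illustrative Java only, not part of the LDR API; the collections stand in for the two flat files the LDR would produce.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch: re-joining two separate flat-file outputs on the
// special identifier fields, using the data from Example 125. The map
// and array below stand in for the two LDR output files.
public class JoinSketch {
    // Output 1: one row per rec, keyed on rec.relId -> rec.refId
    static Map<Integer, String> recOutput() {
        Map<Integer, String> recs = new LinkedHashMap<>();
        recs.put(0, "a"); recs.put(1, "d"); recs.put(2, "h"); recs.put(3, "k");
        return recs;
    }

    // Look up the parent rec's refId for an rF row via rec.relId.
    static String parentRefId(int recRelId) {
        return recOutput().get(recRelId);
    }

    public static void main(String[] args) {
        // Output 2: one row per rF, carrying {rec.relId, rec.rF.relId}
        int[][] rfOutput = { {0,0}, {0,1}, {1,0}, {1,1}, {1,2},
                             {2,0}, {2,1}, {3,0}, {3,1}, {3,2} };
        for (int[] row : rfOutput) {
            System.out.println("rF " + row[1] + " -> rec " + parentRefId(row[0]));
        }
    }
}
```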
Excluding specialFields is done in exactly the same manner, except it is declared
under an exclude->fields->specialField tag. It is important to note that the
exclude->fields->pattern tag will not exclude any fields included via
include->fields->specialField. The rules for including and excluding special & normal
fields are shown in Table 14.
Table 14 –Include/Exclude Rules for Special & “Normal” fields.
Include/Exclude Rules for Special & “Normal” fields
Any field included via a specialField tag can only be excluded by a
specialField tag of the corresponding type, or via an exclude->names tag
where the name matches the output name of the specialField.
Any field included via a fields->pattern tag can only be excluded by an
exclude->fields->pattern tag, or via an exclude->names tag where the name
matches the output name of the field.
Special fields can be renamed in exactly the same manner as normal fields. Here,
any substitution ($x) patterns are applied against the pattern in the specialField
tag. Therefore, the “$” substitution mechanism cannot be used in cases where no
pattern tag exists on the specialField tag.
LAVASTORM ANALYTICS
lavastorm.com
Page 144
Issue 1
LDR User Manual
7 Advanced Concepts
This section is intended for LDR users who want to understand more about the operation
of the LDR. Furthermore, this section is required reading in order to handle difficult
formats that require specialised primitive types.
In order to correctly create new primitive types and understand the concepts in this
section, the user will need to have some Java experience.
7.1 Program Flow
The LDR can be thought of as having four major phases of operation. The first phase
involves reading in the input & output DRIX & DROX specifications, parsing the
specification files, and validating them against the XSDs. This phase is depicted in
Figure 2.
For the subsequent phases of operation, the reader should consult Figure 3.
Using these specification files, the LDR then dynamically constructs and compiles the
necessary java classes that are required to read the data files. Whenever there is
data-dependent type binding, this cannot be done at startup, and some delayed compilation
will be required. However, if we ignore this for the moment, we can consider that
phase 2 is the generation of the required dynamic classes. Whenever a user specifies
their own primitive types with read, scan, skip methods specified, it is in this phase
that these methods will be placed in the appropriate java Parser class and compiled.
In addition to the dynamic compilation of Parser classes from the input specification,
in this phase the LDR will also generate the classes required to process the output
according to the output specification. Phases 2-4 are depicted in Figure 3.
The actual parsing of the data file takes place in phase 3 and 4. Phase 3 involves the
scanning of field tokens. In phase 4, the scanned fields are read and output.
Phase 3 begins when the engine calls the scan method on the Parser class for the
constructed primary field – depicted in process 3.1 of Figure 3. Whenever a field is
correctly scanned by a Parser, the Parser will store a token containing the field Id and
file position on the ticker-tape (discussed in more detail in section 7.2.1). Process 3.2
of Figure 3 depicts the writing of tokens to the ticker-tape.
Figure 2 LDR Program Flow Phase 1. Parsing the Input and Output Specifications
Figure 3 LDR Program Flow Phases 2-4. Dynamic Class Construction, File Scanning and Reading
Sometime after the tokens are written to the ticker-tape, they become available for
output by the constructed output classes. For now it is not necessary to understand
when these tokens become available for output (this is discussed in section 7.2.1 in
more detail). However, as soon as any tokens are available for output, phase 4 is
initiated.
The output classes will continually check the ticker-tape for tokens that are available
for output. The classes know which fields they are searching for based on the output
specification. As soon as a token is available on the ticker-tape that corresponds to a
field that the output class is searching for, the output class will obtain the token off the
ticker-tape. This is depicted in process 4.1 of Figure 3.
After obtaining the token, the output class will use the field id to locate the
constructed Parser class corresponding to the field. The output class will then move
the file pointer to the file position specified in the ticker-tape token, and then invoke
the read method on the Parser class. Once the field has been read, the output class will
obtain the read field value from the Parser. This is depicted in process 4.2 of Figure 3.
In section 6.6 we introduced the concept of a trigger event. When an output class
encounters a field on the ticker-tape that causes a trigger event, the fields that it has
read in process 4.2 are used in the construction of a flat record. The constructed flat
record is then returned through the LDR engine interface, and processing continues.
This is depicted in process 4.3 of Figure 3.
We can see then, that for an individual field, the general program flow follows the
phases and processes depicted in Figure 3. However, this process is ongoing, and some
fields will be output as part of a read flat record prior to other fields being scanned in
the file. These processes will continue until the entire file is read. Therefore, after
calling the readFile method, the client program depicted in Figure 3 will need to
accept records as they are processed by the engine. For those writing third party
software to interface with the LDR, the javadoc API should be consulted as to how
this is performed.
7.2 Required Interfaces
7.2.1 TickerTape
NOTE:
The TickerTape interface is never required in order to correctly build Primitive
Types. In certain situations, where the user is overwriting the auto-generated scan,
skip, skipCount or read methods of a Standard Type and where the type has loops, the
TickerTape interface may be required. The use of the TickerTape interface is
discouraged and is not guaranteed to remain constant between different versions of
the LDR. The description of how the ticker-tape operates is provided in this section to
provide users with contextual information as to how the LDR parsing operations are
performed, and this knowledge may be useful for some optimizations in the types
created by users.
In section 7.1 we briefly introduced the concept of the ticker-tape. The ticker-tape
contains a trace of the fields that have been encountered in the file. Each of the tokens
on the ticker-tape corresponds to a field that was read at a particular location on the
file, and contains the following information:
- Field Id
- File Position where the field was encountered in the file.
- Scan flags (emittable, endToken, primitive etc.)
- Cached field value
The field Id is a unique identifier which can be mapped to a field name, and is used to
obtain the constructed Parser class corresponding to the field. The scan flags are the
set of flags which the TickerTape and the output side of the engine use to maintain
some contextual information on the tokens. They also allow the output side of the
engine to determine what to do with the token, and to optimise performance.
For all Standard Types, a start and an end token will be placed on the ticker-tape. For
Primitive Types, only one token is placed on the tape. If the token is an end token for
a Standard Type, then the endToken flag is set to true. In all other cases this flag is set
to false.
In section 7.1, we mentioned that the tokens are not immediately available to the
output classes for output. This is because the ticker-tape may need to be rolled back to
a previous position. Consider the input specification displayed in Example 126.
Example 126 – Example format where ticker-tape rollbacks may be required.
<drix>
<library name="MyLibrary" version="000.000.000.001">
<namespace name="MyNamespace">
<type name="MySet">
<or>
<field name="field1" type="Type1"/>
<field name="field2" type="Type2"/>
</or>
</type>
<type name="Type1">
<field name="field1_1" type="Type2"/>
<field name="field1_2" type="Type2"/>
<field name="field1_3" type="Type2"/>
</type>
…
…
</namespace>
</library>
<primaryField name="MyField" type="MyNamespace.MySet"/>
</drix>
In this example, it is feasible that we could encounter one Type2 field. If this were the
case, then MyField.field1.field1_1 would be scanned successfully. A token would then
be written to the ticker-tape. However, if there were no subsequent Type2 fields, then
MyField.field1 would not scan correctly. In this case, we would drop to the second
field in the or clause, and MyField.field2 would scan correctly. In such a situation, we
would need to rollback the original element on the ticker-tape, and replace it with
MyField.field2.
If all tokens were available for output as soon as they were written to the ticker-tape,
this would lead to erroneous outputs. Therefore, only fields that do not lie nested
under an or, repeatRange, or while tag are immediately available for output.
Whenever one of these fields is encountered, all of the tokens that exist on the
ticker-tape at that time become available for output. In addition to these cases, all tokens on
the ticker-tape are available for output whenever a publish tag is encountered.
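The hold-back and rollback behaviour can be sketched conceptually as follows. This is not the real TickerTape interface; the class name, method names and String tokens are all invented purely to illustrate the idea that tokens written under an or can be discarded back to a marked position before being made available for output.

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual sketch of the ticker-tape rollback behaviour described
// above. This is NOT the LDR TickerTape interface -- it only models
// holding tokens back, rolling back a failed <or> branch, and
// releasing tokens for output.
public class TapeSketch {
    private final List<String> tape = new ArrayList<>();
    private int available = 0; // tokens before this index may be output

    public int mark() { return tape.size(); }
    public void write(String token) { tape.add(token); }

    /** Discard tokens written since the mark (a failed <or> branch). */
    public void rollback(int mark) { tape.subList(mark, tape.size()).clear(); }

    /** Release everything written so far (e.g. on a publish tag). */
    public void publish() { available = tape.size(); }

    public List<String> availableTokens() { return tape.subList(0, available); }

    public static void main(String[] args) {
        TapeSketch tape = new TapeSketch();
        int m = tape.mark();
        tape.write("MyField.field1.field1_1"); // scanned under the <or>
        tape.rollback(m);                      // branch failed: roll back
        tape.write("MyField.field2");          // alternative branch succeeds
        tape.publish();
        System.out.println(tape.availableTokens()); // [MyField.field2]
    }
}
```

The main method walks through the Example 126 scenario: the token for MyField.field1.field1_1 is written, rolled back when the first or branch fails to scan, and replaced by MyField.field2.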
In general, the ParserContext will handle all of the ticker-tape updates required, and
users who write their own java methods will not need to worry about the ticker-tape.
However, if you are constructing your own methods that are looping over other fields,
then you may need to make ticker-tape updates yourself. In these cases, the
TickerTape is available via the ParserContext through a call to context.tickerTape() in
each of the read, scan, skip, skipCount and test methods. It is highly unlikely that this
situation will arise, as the repeatRange and while tags should provide sufficient
functionality for all looping requirements.
The TickerTape API is available through the LDR Engine javadoc API.
7.2.2 LDRByteBufferInterface
In order to ensure optimal file I/O, the LDR processes all file operations through the
LDRByteBufferInterface. Wherever feasible, this interface mimics the ByteBuffer
interface defined in the java API. The underlying class seeks to provide optimal file
access by buffering multiple blocks of the file into memory at a given time, and
swapping these blocks in and out of memory as required.
In addition to the methods defined on the standard ByteBuffer interface, the
LDRByteBufferInterface provides mechanisms for obtaining non-byte aligned data. It
also provides methods to obtain character data from a file in specified character set
encodings. Any user-defined scan, skip, skipCount and read methods will need to use
the LDRByteBufferInterface.
The LDRByteBufferInterface is accessible in all read, scan, skip, skipCount and test
methods through the locally scoped buffer variable. The LDRByteBufferInterface API
is available through the LDR Engine javadoc API.
7.2.3 ParserContext
The ParserContext maintains the state of the parsing operations, and determines when
tokens should be placed on the TickerTape, or when rollbacks are required. Whenever
a read, scan, skip or test method is called, the context is notified upon entry and exit
of the method. In addition to this important task, the ParserContext also updates the
LDRByteBufferInterface as required, ensuring that if a field is not scanned correctly,
the file position is reverted to the file position where field scanning began.
The ParserContext also maintains a running trace on the current nesting of scan, skip,
read operations, and is able to provide a detailed stack trace in error cases where an
entire data file cannot be read. If a NOT_ME result is returned for a non-optional
field, the ParserContext is then able to log the last successful field tokens that were
seen on the file, and also log the nested NOT_ME errors that led to the file parse
failure.
All of this, however, is performed behind the scenes, and the specification writer does
not need to know about these implementation details (although understanding this can
sometimes prove to be useful when optimising specifications for performance).
In general, the user cares about the ParserContext because it provides access to
logging, and to the LDRByteBufferInterface. The ParserContext is accessible in all
read, scan, skip, skipCount and test methods through the context method argument.
The ParserContext API is available through the LDR Engine javadoc API.
The context.log(..) method is available in the user-written methods to log messages to
the ParserLog via the ParserContext. The details of what the log does with these
errors are specific to the implementation of the ParserLog, and the threshold values
placed on the log (see sections 7.2.5 and 10.1.2 for more information on the
ParserLog, and the thresholding & logging settings respectively).
The context.buffer() method is available in the user-written methods to obtain a
reference to the LDRByteBufferInterface.
In Figure 3, the ParserContext would lie between the dynamically constructed Parser
classes and the ticker-tape.
7.2.4 Parser
For each of the fields referenced in an LDR Input Specification file, a java class is
dynamically constructed. Each of these classes implements the Parser
interface. The Parser interface defines the method signature for each of the read, scan,
skip and skipCount methods. All of the user-defined methods must comply with the
method signatures specified in the Parser interface. There are details of how the code
is constructed that are not specified on the Parser interface. Some of this information
is contained in section 7.4.
The Parser interface also specifies utility methods which are invoked by the
ParserContext in order to determine the tokens and flags to place on the ticker-tape.
The Parser API is available through the LDR Engine javadoc API.
One important part of the Parser interface is the Result enumeration. The scan
method, skip method, read method and test method are all declared to return a Result.
The Result can be one of 3 values:
- Result.GOOD: Indicates the operation was successful & the data is not invalid.
- Result.BAD: Indicates that the operation was successful, however the data is invalid.
This implies that the expected data exists at the specified location, so
scanning/reading/skipping etc. can continue. However, the data is badly formatted.
- Result.NOT_ME: Indicates that the operation was not successful &
scanning/reading/skipping cannot continue. If this field is required, then the file
errors out with a Failed Data Layout error. If this field is not required (under an or,
loop), then the next possible field will be attempted instead of this field.
In general, the scan method, skip method & test method should only ever return
Result.GOOD or Result.NOT_ME. The read method can return any of these values.
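To make the three-valued contract concrete, here is a self-contained sketch. The Result enum mirrors the one described on the Parser interface, but the 4-byte ASCII-integer field format and all class and method names are invented for illustration; a real read method would pull bytes from the LDRByteBufferInterface instead of an array.

```java
// Sketch of the three-valued Result contract described above.
// "Result" mirrors the enum on the LDR Parser interface; the
// 4-byte ASCII-integer field is an invented example format.
public class ResultContractSketch {
    public enum Result { GOOD, BAD, NOT_ME }

    // Attempt to "read" a 4-byte ASCII integer starting at pos.
    public static Result readAsciiInt32(byte[] file, int pos) {
        if (pos + 4 > file.length) {
            return Result.NOT_ME;   // field absent: caller may try an alternative
        }
        for (int i = pos; i < pos + 4; i++) {
            if (file[i] < '0' || file[i] > '9') {
                return Result.BAD;  // field present but badly formatted
            }
        }
        return Result.GOOD;         // field present and valid
    }

    public static void main(String[] args) {
        System.out.println(readAsciiInt32("1234".getBytes(), 0)); // GOOD
        System.out.println(readAsciiInt32("12x4".getBytes(), 0)); // BAD
        System.out.println(readAsciiInt32("12".getBytes(), 0));   // NOT_ME
    }
}
```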
7.2.5 ParserLog
All warnings and errors encountered during the parsing of a file should be logged to
the ParserLog. The ParserLog is an interface that defines simple logging methods.
The most common method to use on the ParserLog interface is the log method which
takes a String error message, an LDRException.ErrorLevel and an
LDRException.ErrorType as parameters.
The ParserLog should have its error thresholds set immediately after construction
through a call to the recoverableErrorThreshold(…) method. This method sets the
number of recoverable errors that are able to be encountered in a file before the file
fails parsing. The method also takes two threshold parameters, which define the error
threshold, and log threshold. For all errors which have an ErrorLevel greater than or
equal to the error threshold, the running count of recoverable errors encountered is
incremented. For all errors which have an ErrorLevel greater than or equal to the log
threshold, the message is written to the log.
The interface also defines a method for resetting the error count. This allows for third
party software using the LDR as an API to leave the error count running over multiple
files for the same input specification, or reset it for each file if required.
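The threshold behaviour described above can be sketched as follows. The class below is not the real ParserLog interface from the LDR Engine API; its names and signatures are invented, and it only models the two rules: errors at or above the error threshold increment the recoverable-error count, and errors at or above the log threshold are written out.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the ParserLog thresholding rules described
// above. This is NOT the real ParserLog interface -- names and
// signatures here are invented for illustration only.
public class ThresholdSketch {
    private final int maxRecoverableErrors, errorThreshold, logThreshold;
    private int errorCount = 0;
    private final List<String> written = new ArrayList<>();

    public ThresholdSketch(int maxRecoverableErrors, int errorThreshold, int logThreshold) {
        this.maxRecoverableErrors = maxRecoverableErrors;
        this.errorThreshold = errorThreshold;
        this.logThreshold = logThreshold;
    }

    /** Returns false once the recoverable-error budget is exhausted. */
    public boolean log(String message, int errorLevel) {
        if (errorLevel >= errorThreshold) errorCount++;       // counts toward file failure
        if (errorLevel >= logThreshold) written.add(message); // written to the log
        return errorCount <= maxRecoverableErrors;
    }

    public int errorCount() { return errorCount; }
    public int messagesWritten() { return written.size(); }
    public void resetErrorCount() { errorCount = 0; } // per-file reset, as described

    public static void main(String[] args) {
        ThresholdSketch log = new ThresholdSketch(2, 5, 3);
        log.log("minor", 3);   // logged, but not counted
        log.log("serious", 5); // logged and counted
        System.out.println(log.errorCount() + " counted, " + log.messagesWritten() + " written");
    }
}
```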
The LDR provides a DefaultErrorLog class which simply implements the ParserLog
by writing the log messages to a specified text file. Also provided is the
EmptyErrorLog which has the appropriate error thresholding, but does not write to
any log file. For those using the LDR as an API for their own software, there is the
option to write your own ParserLog class and pass this in to the LDR to handle
logging in your own user defined manner.
For more information on the thresholding and logging settings that can be used in the
ParserLog, see Thresholding & Logging in section 10.1.2. The ParserLog API is
available through the LDR Engine javadoc API.
7.3 Advanced Code Elements for Construction of New Types
For most file formats, the DRIX elements described in sections 5.1 & 5.2 should be
sufficient to create the input specification. In most cases, the only element defined in
this section that the average user may need to implement is Test. The LDR is shipped
with an extensive base library from which the user can construct most file formats
without needing the advanced code elements. However, if an entirely new and
sufficiently unique file format is encountered, then it may be necessary to use the tags
described in this section to define new types.
The tags described in this section are all placeholders for user-defined java code. In
order to use these tags, a deeper understanding of how the LDR operates is required.
In general, in order to use these tags, the user needs to be familiar with the following
concepts:
1. Where the tags fit in the xml schema
2. The structure of the generated code from the xml (how to reference
fields, parameters etc – see section 7.4 & the table below)
3. Basic program flow in the LDR, relating to the two-stage lexing then
scanning process and the ticker-tape (see section 7.1).
4. The LDR engine API (LDRByteBuffer, TickerTape, ParserContext,
Parser – see section 7.2 and the LDR Engine javadoc API)
The general rule for accessing & modifying fields and parameters in a code block is
the following:
Table 15 –Accessing Fields, Parsers & Params in custom code blocks.
Param – Accessor:
paramType value = param.paramName()
Param – Modifier:
param.paramName(ParamType value)
Param – Invalid Check (whether or not the param has been set at all – via default, super-arg or arg tag):
param.paramNameInvalid()
Param – Set Accessor (whether or not the param has been set explicitly by a field-arg tag):
param.paramNameSet()
Param – Constant Template Param Accessor:
constTemplateParamType value = TemplateParam.templateParamName
Field – Accessor:
FieldType value = field.fieldName()
Field – Modifier:
field.fieldName(FieldType value)
Field – Size Accessor (number of elements read):
field.fieldNameSize()
Field – Occurrence Accessor (used to access an individual element of a field, when the field exists under a loop, with default/append onMultiple actions):
field.fieldName_acc(loopIndex)
Field – Set Accessor (whether or not the field has been read & set):
field.fieldNameSet()
Parser – Accessor (accesses the emittable value of the field; if the field is not emittable, this is the field object, otherwise it is the emittable value):
parser.fieldNameParser().emittableValue(ParserContext context)
Parser – Accessor:
parser.fieldNameParser()
Parser – Method Call (sets up a parser for the <index>th field – including fields that are skipped and inherited fields – defined in the type’s field structure; this sets up all of the required arguments, template arguments, type arguments etc. The setupParser boolean argument is used to specify whether or not we want to construct a new parser):
setupParser<index>(int loopIndex, boolean createParser, ParserContext context)
Parser – Method Call (sets up arguments on the parent type, using the values from the super-arg tag combination; should be done prior to calling super. Will be inefficient, but does not fail, if called multiple times):
setupSuperArgs(ParserContext context)
All of the above methods & fields are available in any expr tag, any readMethod,
scanMethod, skipMethod & skipCountMethod tag, and in code tags where the location
is “class” (the default). Since code tags with a location of “file” are external to the class,
these code blocks can only be used for import statements, therefore none of the above
are available. Code tags with a location of “init” are able to use all of the above
accessors, however since the init block is called at the end of the Parser constructor,
none of the fields, params or parsers will have been set up. Only the templateParam
accessor should be used in the init code block.
At the start of each of the read, scan, skip, skipCount and test methods, some
auto-generated code is provided as a convenience to give the user access to commonly
required objects. For a type T, this will then provide the following declarations:
Example 127 – Convenience declarations provided at the start of user written methods
LDRByteBufferInterface buffer = context.buffer();
Result _res = null;
TParams param = param();
For a Standard Type T, in addition to the above, the following declarations will be
provided:
Example 128 – Convenience declarations provided at the start of user written methods in constructed
types.
T field = field();
TParsers parser = parser();
boolean setupParser=true;
int loopIndex=0;
int min=0;
int max=0;
boolean keepLooping=true;
boolean[] fieldExists=null;
Although the advanced tags should not be needed by the average user, if we consider
the artificial example described previously, whereby we need a new type of 4 bytes
length, where the decoded value should be byte1-byte2+(byte3^byte4), then our
existing primitive types cannot handle this case.1 Therefore, in order to handle this
case, we may want to create a new primitive type to handle this specific data format.
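The decode rule from this artificial example can be sketched in plain Java. A real implementation would live in a primitive type's readMethod and pull bytes from the LDRByteBufferInterface; the class below is an invented, standalone illustration of the arithmetic only.

```java
// Sketch of the decode rule from the example above: for a 4-byte field,
// value = byte1 - byte2 + (byte3 ^ byte4). This is NOT an LDR readMethod;
// it is a standalone illustration of the arithmetic.
public class DecodeSketch {
    public static int decode(byte[] b, int pos) {
        // Treat each byte as unsigned before combining.
        return (b[pos] & 0xFF) - (b[pos + 1] & 0xFF)
                + ((b[pos + 2] & 0xFF) ^ (b[pos + 3] & 0xFF));
    }

    public static void main(String[] args) {
        System.out.println(decode(new byte[] {10, 3, 6, 5}, 0)); // 10 - 3 + (6^5) = 10
    }
}
```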
Another example of where the end user may need this section can best be illustrated
from the ASN.1 case. Say for example we are required to read ASN.1 data using some
new encoding rules (not BER, DER or CER). If the encoding rules were as complex
as the tag-length-value encoding used in BER, but had a sufficiently different layout,
then we may need to construct new composite types, whereby the data flow within the
composite types requires the elements defined in this section to be constructed.
The following sections will describe the purpose of the scanMethod, skipMethod,
skipCountMethod, readMethod, code & testMethod tags. In all types defined by the
user, the methods corresponding to these tags will exist in the dynamically created
java code for a given type. If the user does not specify these tags, then these methods
will be auto-generated based on the other XML elements - most importantly, the field
structure in a type. When the user does define these tags, then the custom user code
will be used instead of the auto-generated code.
For the cases of super-arguments, the mechanisms to setup these values will be
inserted into the dynamically constructed code. However, the calls to these methods
will no longer occur due to the user-defined read/scan/skip methods overwriting this
functionality. Therefore, the user has to ensure that they call the method
setupSuperArgs(); prior to calling super.read(…), super.scan(…) or super.skip(…).
In addition, when writing the code segments for a constructed type, prior to calling
read, scan or skip on any of the subfields, the appropriate call to setupParser<i> must
be made, where the value of i is the index of the field which we are to setup, including
fields that are skipped and inherited fields defined in the type’s field structure. This
sets up all of the required arguments, template arguments, type arguments etc.
It is important to note that even if custom readMethod, scanMethod, skipMethod etc
tags are being used, the XML within each type declaration defining the field structure
must be accurate and reflect the operation of the user-defined code. This XML field
structure definition is required to be correct for input specification to output
specification mapping, and to ensure that the correct field objects are available on the
dynamically constructed Parser classes.
In addition to these types, we will also describe the structure of code required in the
generator tag in a generated type.
1 Actually in this case, we could create a new fully constructed composite type that contains four byte
fields, and define the output in the container object via an emittable tag. However, in this instance a new
primitive type would be the optimal method both logically and performance-wise.
It is important to take care when writing these code snippets to ensure that if they
contain reserved XML characters (e.g. <, >) the entire code block is enclosed in an
escape tag such as:
Example 129 – Escaping code with a CDATA section tag.
<![CDATA[
//your code here
]]>
Otherwise, the individual characters must all be escaped (e.g. > should be written as
&gt; in XML). When writing multi-line code blocks, or code blocks containing
repeated escaped characters (e.g. && as &amp;&amp;), then these should always be
surrounded with a CDATA section, as the XML conversion from the single escaped
characters is more likely to lead to unintentional syntax errors in the generated code.
Wherever new primitive types are used (i.e. primitive types that do not extend
another primitive type), the read code is not auto-generated. Therefore, for new
primitive types, knowledge of at least the read tag is required.
All of the methods described in this section, except for test, must take care to ensure
that the file pointer is advanced to the correct offsets for any of their sub-fields which
have offsets specified. This behavior will be described in more detail for each
function in the following sections.
7.3.1 Test Method
A testMethod tag allows for some custom user defined code to be evaluated in
order to determine whether or not a particular type exists. This tag is used in
exactly the same manner as the more restricted test tag, and is used in situations
when the test tag does not provide sufficient power or functionality.
In most cases, the user does not want to construct the entire scan method for a
type based on the field structure. Often, a simple bit of extra validation in addition
to the auto-generated scan is required in order to determine that a type exists. In
these cases, the test tag is extremely useful. If, however, the test tag is not
sufficient, then a small snippet of user-written code can be used in the testMethod
tag, without needing to rewrite the scan method.
A maximum of 1 testMethod tag can be inserted into the type declaration either at
the top level, or under a repeatRange or while tag. The testMethod tag can also be
inserted into a primitiveType tag. If a test tag is defined, a testMethod tag cannot
be defined. If no user-defined scan, read or skip methods are provided, then test
will be called from the associated auto-generated scan, read & skip methods at the
appropriate location in the field structure. If a user-written version of these other
methods is provided, then it is up to the user to ensure that the test method is called
at the appropriate location in the field structure.
7.3.1.1 XML Properties
The testMethod tag has the properties outlined in the following table:
DRIX Tag 47 testMethod
<testMethod>
Description: Code block which is inserted into the dynamically constructed Parser
code, and called from the appropriate location in the generated scan, skip & read
methods.
Position: 0..1, declared under a type, repeatRange or while tag; 0..1, declared
under a primitiveType tag. Within a primitiveType or type tag, there can only be a
maximum of one test or testMethod tag. These cannot exist under an or tag. Within
a primitiveType, the testMethod tag must appear directly after the other java
method tags and any code tags; since there are no fields in a primitive type, this
position is fixed, but not important for parsing considerations. Within a type tag,
the testMethod can appear anywhere within the field structure – i.e. anywhere a
field tag can appear except within an or tag. Here the order is important.
Attributes: None
Elements: None
If the return value of the test is NOT_ME, then we know the type does not exist.
If the return value is GOOD however, then this indicates the type might exist,
but more scanning may be required.
7.3.1.2 Method Signature
The data within the testMethod tag must be java code using the following method signature:
Table 16 – Test Method Signature
protected Result test(ParserContext context, int loopIndex) throws
IOException, LDRException;
When the testMethod tag does not lie under a loop, the loopIndex value of “0”
should always be provided.
Consider the DRIX snippet shown in Example 130.
Example 130 –Simple example using a testMethod tag
<type name="TypeName">
<field name="Field1" type="Type1"/>
<testMethod>
<![CDATA[
if (true) return Result.GOOD;
else return Result.NOT_ME;
]]>
</testMethod>
</type>
Ignoring the field declaration for the moment, the resulting code would be
functionally equivalent to that displayed in the following example:
Example 131 –Simple generated test method
public class TypeNameParser {
protected Result test(ParserContext context, int loopIndex) throws
IOException {
if (true) return Result.GOOD;
return Result.NOT_ME;
}
}
7.3.1.3 Method Contract
In order to ensure that your custom test can be integrated successfully with
generated code, it must conform to the following test method interface
standards, with the responsibilities of the calling method also outlined.
All read, scan & skip methods (auto-generated or otherwise) must ensure that
any declared test is performed, and performed at the appropriate time, dependent
on the location of the test tag in the type field structure.
Although the test method receives a ParserContext, through which it can access
the LDRByteBufferInterface, no changes may be made to the buffer, and the file
pointer must be unchanged as a result of the call to test method.
If the test is successful, then processing continues as if the test had never been called.
If the test method is unsuccessful (returning a Result.NOT_ME) then no further
processing is done, and the calling method returns Result.NOT_ME.
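The contract above can be sketched in plain Java. This is a minimal, hypothetical stand-in (not actual generated LDR code; the class, the int-based fields, and the Result enum are invented for illustration) showing how a calling scan method would invoke a declared test and honor its result:

```java
// Sketch only: how a generated scan method honors a declared test.
// Result and the field values are simplified stand-ins for illustration.
class TestContractSketch {
    enum Result { GOOD, NOT_ME }

    // stand-in for the user-supplied test method
    static Result test(int fieldValue, int expectedParam) {
        return fieldValue == expectedParam ? Result.GOOD : Result.NOT_ME;
    }

    // stand-in for a generated scan method that calls test at the
    // position of the testMethod tag in the field structure
    static Result scan(int fieldValue, int expectedParam) {
        // ... scan fields declared before the testMethod tag ...
        Result r = test(fieldValue, expectedParam);
        if (r == Result.NOT_ME) {
            // test failed: no further processing, propagate NOT_ME
            return Result.NOT_ME;
        }
        // test succeeded: continue as if the test had never been called
        // ... scan fields declared after the testMethod tag ...
        return Result.GOOD;
    }
}
```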
7.3.1.4 Example
Consider Example 132, which specifies another type containing a test tag.
Example 132 –Example using a test tag
<type name="T">
<param name="myParam" javaType="int"/>
<or>
<field name="f1" type=".integer.Int32" readRequired="true"/>
<field name="f2" type=".integer.AsciiToInt32"
readRequired="true">
<arg name="length" value="4"/>
</field>
</or>
<testMethod>
<![CDATA[
if (field.f1Set()) {
if (param.myParam()==field.f1().intValue())
return Result.GOOD;
else
return Result.NOT_ME;
}
else {
if (param.myParam()==field.f2().intValue())
return Result.GOOD;
else
return Result.NOT_ME;
}
]]>
</testMethod>
<field name="f3" type=".integer.Int32"/>
</type>
…
In this case, the above example states that we first attempt to read field f1; if
this succeeds, we call the test method. Otherwise, we attempt to read f2; if this
succeeds, we call the test method. If neither f1 nor f2 is read successfully, then
the type’s or clause is not satisfied and a Result.NOT_ME is returned.
Therefore, any time the test method is called, either an f1 or an f2 will have been
read. Within the test method, the code then checks whether the value of the read
field (f1 or f2) is equal to the input myParam parameter. If the values are equal,
the test method returns Result.GOOD, and we then attempt to scan f3.
Otherwise, the test method returns Result.NOT_ME, and the containing
read/scan/skip method returns a Result.NOT_ME.
The code within the test tag will end up placed in the test method of the
constructed Parser, and will appear as shown in the following example:
Example 133 –Code generated from a testMethod tag
protected Result test(ParserContext context) throws
IOException, LDRException {
LDRByteBufferInterface buffer = context.buffer();
T field = field();
TParser parser = parser();
TParams param = param();
if (field.f1Set()) {
if (param.myParam()==field.f1().intValue())
return Result.GOOD;
else
return Result.NOT_ME;
} else {
if (param.myParam()==field.f2().intValue())
return Result.GOOD;
else
return Result.NOT_ME;
}
}
7.3.2 Scan
The scan operation is contained within the scanMethod tag and is used as the basis
of the lexing process. Scan is used to determine whether or not a given field exists
in the data. The scan method simply returns a Result which specifies whether or
not the field was found at the current file position. Therefore, the scan method
should do as little data reading as possible (reading will be performed subsequently)
and focus only on validating that the field exists in the data.
If a Result.BAD is returned (which should happen very rarely in the scan
method), then the scan method should update the log on the context object. In
general, the scan method should always return a Result.NOT_ME or
Result.GOOD.
7.3.2.1 XML Properties
The scanMethod tag is used to declare the contents of the scan method. The scan
method is used to determine whether or not the type exists at the current file
position. The code to be placed within the scan method in the constructed Parser
class is placed within the scanMethod element. The scanMethod tag has the
properties listed in DRIX Tag 48.
DRIX Tag 48 scanMethod
<scanMethod>
Description:
Code block which is inserted into dynamically constructed Parser code, used for scanning. The code within this tag should determine whether or not an instance of the field exists at the current file location.
Position:
0..1, declared under a type tag,
0..1, declared under a primitiveType tag.
The scanMethod tag must appear after any param, typeParam, templateParam and field structure elements. It must appear before the emittable element on a type if one is declared. The position of the scanMethod tag with relation to the readMethod, skipMethod and skipCountMethod tags is unimportant.
Attributes:
None
Elements:
None
7.3.2.2 Method Signature
The data within the scanMethod tag must be java code using the following
method signature:
Table 17 – Scan Method Signature
public Result scan(ParserContext context) throws IOException,
LDRException;
The buffer object on the context is provided in scope by auto-generated code at
the beginning of the method, via a call to the context.buffer() accessor.
7.3.2.3 Method Contract
In order to ensure that your custom scanMethod can be integrated successfully
with generated code, it must conform to the following scan method interface
standards.
If there is a test method defined on this type, then this must be called at the
appropriate time based on the position of the test tag in the XML in
relation to other field definitions. The results of the test tag must be
adhered to, with the subsequent processing based on the rules outlined in
the Test section.
The setupParser[i](int, boolean, ParserContext) call must have been made,
and returned successfully, where this field is the ith field in the type
structure (including skipped, anonymous & inherited fields). This call
must be made prior to every invocation of the scan method.
Post Condition:
If the type does not exist at the current file pointer:
o Returns Result.NOT_ME
o The position of the file pointer is unimportant (it will be returned to the
file position apparent upon entry to the method)
If the type exists at the current file pointer:
o Returns Result.GOOD
o If this is a composite type, then it must have either called the scan
method of all of its children, or
o Called the scan method of none of its children, and none of its children
contain any fields requiring a decision (an or tag, etc.)
o The file pointer is updated to be at the first bit in the file after the end
of this type.
7.3.2.4 Example
Consider the simple case where we have a primitive type which reads a String of
a fixed length.
Example 134 –Scan Method example
<primitiveType name="FixedStringType" returnType="String">
<param name="length" javaType="int"/>
<scanMethod>
<![CDATA[
if (buffer.remaining()>=param.length()) {
buffer.bytePosition(
buffer.bytePosition()+(long)param.length());
return Result.GOOD;
}
return Result.NOT_ME;
]]>
</scanMethod>
</primitiveType>
The code example above shows that the scan method simply checks that there is
enough space left in the file to contain the String. If there is enough remaining
data, then the file pointer on the byte buffer is updated and a GOOD result is
returned. If there is not enough remaining data on the file, a NOT_ME result is
returned and no modifications take place to the file position.
Obviously extra information (e.g. a read method) is required here, since there is
no information provided on how to actually read the String. However, the
scanMethod is sufficient for scanning. We could also simply declare this
FixedStringType as a parentType of another type which actually defines how to
read a specific String.
7.3.3 Skip
The skip operation is contained within the skipMethod tag and is used to skip over
fields. Skip must also determine whether or not a given field exists in the data.
The skip method simply returns a Result which specifies whether or not the field
was found at the current file position. Therefore, the skip method should do as
little data reading as possible and focus only on validating that the field exists in
the data and skipping the field. In practice, the skip method will generally be very
similar to the scan method. The major difference between the two is that when a
field is scanned, the LDR knows that it may be used later by the output side of the
engine. Therefore the scanned field will be written to the ticker tape. However,
when a field is skipped, the LDR knows to write nothing to the ticker tape.
Therefore, these fields will never be available for output, however we will obtain
a performance improvement for fields we know we never want to read and output.
The skip method should update the log on the context object where necessary in
cases where a Result.BAD is returned; however, in general, the skip method
should always return a Result.NOT_ME or Result.GOOD.
7.3.3.1 XML Properties
The skipMethod tag has the properties in DRIX Tag 49.
DRIX Tag 49 skipMethod
<skipMethod>
Description:
Code block which is inserted into dynamically constructed Parser code, used for skipping. The code within this tag should advance the file pointer past the end of the type if it exists.
Position:
0..1, declared under a type tag,
0..1, declared under a primitiveType tag.
The skipMethod tag must appear after any param, typeParam, templateParam and field structure elements. It must appear before the emittable element on a type if one is declared. The position of the skipMethod tag with relation to the readMethod, scanMethod and skipCountMethod tags is unimportant.
Attributes:
None
Elements:
None
The skipMethod is a simple tag with no attributes or elements.
7.3.3.2 Method Signature
The data within the skipMethod tag must be java code using the following
method signature:
Table 18 – Skip Method Signature
public Result skip(ParserContext context) throws IOException,
LDRException;
7.3.3.3 Method Contract
The skipMethod must conform to the following skip method interface standards.
If there is a test method defined on this type, then this must be called at the
appropriate time based on the position of the test tag in the XML in
relation to other fields. The results of the test tag must be adhered to, with
the subsequent processing based on the rules outlined in the Test section.
The setupParser[i](int, boolean, ParserContext) call must have been made,
and returned successfully, where this field is the ith field in the type
structure (including skipped, anonymous & inherited fields). This call
must be made prior to every invocation of the skip method.
PostCondition:
If the type does not exist at the current file pointer:
o Returns Result.NOT_ME
o The position of the file pointer is unimportant (it will be returned to
the file position apparent upon entry to the method)
If the type exists at the current file pointer:
o The file pointer is updated to be at the first bit in the file after the end
of this type.
o Returns Result.GOOD
7.3.3.4 Example
Example 135 –Skip Method example
<primitiveType name="AbstractFixedStringType" returnType="String">
<param name="length" javaType="int"/>
<skipMethod>
<![CDATA[
if (buffer.remaining()>=param.length()) {
buffer.bytePosition(
buffer.bytePosition()+(long)param.length());
return Result.GOOD;
}
return Result.NOT_ME;
]]>
</skipMethod>
</primitiveType>
The code example above shows that the skip method simply checks that there is
enough space left in the file to contain the String. If there is enough, then the file
pointer on the byte buffer is updated, and Result.GOOD is returned. Otherwise
Result.NOT_ME is returned.
Obviously, extra information (e.g. a read method) is required here, since there is
no information provided on how to actually read the String. However, the
skipMethod is sufficient for skipping. We could also simply declare this
AbstractFixedStringType as a parentType of another type which actually defines
how to read a specific String.
We can see from this example that the code required in a skip method is often
going to be exactly the same as that in a scan method.
7.3.4 SkipCount
SkipCount is used for skipping a fixed number of repetitions of this field
starting from the current file position. It is implemented primarily when it
offers a significant advantage over simply calling the skip method a number of times.
When a custom skipCount method is not provided, the auto-generated default will
simply call the skip method n times.
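The auto-generated default described above can be sketched as follows. This is an illustrative stand-in only (the Buffer class, FIELD_LEN, and the Result enum are invented for this sketch; they are not the real LDR classes), showing skipCount as a loop over skip that restores the file position on failure, per the method contract:

```java
// Sketch of the default skipCount: call skip() count times, failing
// fast on the first NOT_ME and restoring the entry position.
class SkipCountSketch {
    enum Result { GOOD, NOT_ME }

    static class Buffer {
        long pos;
        final long limit;
        Buffer(long limit) { this.limit = limit; }
    }

    static final int FIELD_LEN = 4; // stand-in fixed field length

    // stand-in skip: one fixed-length field
    static Result skip(Buffer b) {
        if (b.limit - b.pos >= FIELD_LEN) {
            b.pos += FIELD_LEN;
            return Result.GOOD;
        }
        return Result.NOT_ME;
    }

    // default skipCount: n sequential skips
    static Result skipCount(Buffer b, int count) {
        long entryPos = b.pos;
        for (int i = 0; i < count; i++) {
            if (skip(b) == Result.NOT_ME) {
                b.pos = entryPos; // contract: pointer restored on NOT_ME
                return Result.NOT_ME;
            }
        }
        return Result.GOOD;
    }
}
```

A custom skipCountMethod is worthwhile only when it can beat this loop, e.g. by advancing the pointer by length*count in one step as in Example 136.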
7.3.4.1 XML Properties
The skipCountMethod tag has the following properties:
DRIX Tag 50 skipCountMethod
<skipCountMethod>
Description:
Code block which is inserted into dynamically constructed Parser code, used for skipping a fixed number of times. The code within this tag should advance the file pointer past the end of the type if it exists.
Position:
0..1, declared under a type tag,
0..1, declared under a primitiveType tag.
The skipCountMethod tag must appear after any param, typeParam, templateParam and field structure elements. It must appear before the emittable element on a type if one is declared. The position of the skipCountMethod tag with relation to the readMethod, scanMethod and skipMethod tags is unimportant.
Attributes:
None
Elements:
None
The skipCountMethod is a simple tag with no attributes or elements.
7.3.4.2 Method Signature
The data within the skipCountMethod must be Java code that could execute if
placed in the following method signature:
Table 19 – SkipCount Method Signature
public Result skipCount(ParserContext context, int count) throws
IOException,LDRException;
7.3.4.3 Method Contract
The skipCountMethod must conform to the following skipCount method
interface standards.
If there is a test method defined on this type, then this must be called at the
appropriate time based on the position of the test tag in the XML. The
results of the test tag must be adhered to, with the subsequent processing
based on the rules outlined in the Test section.
The setupParser[i](int, boolean, ParserContext) call must have been made,
and returned successfully, where this field is the ith field in the type
structure (including skipped, anonymous & inherited fields). This call
must be made prior to every invocation of the skipCount method.
PostCondition:
If there are 0..n iterations of this type at the current file pointer, where n <
count:
o Returns Result.NOT_ME
o The position of the file pointer is unimportant (it will be returned to the
file position apparent upon entry to the method).
If there are >= count iterations of this type at the current file pointer:
o The file pointer is updated to be at the first bit in the file after the end
of count iterations of this type.
o Returns Result.GOOD
7.3.4.4 Example
Example 136 –SkipCount Method example
<primitiveType name="AbstractFixedStringType" returnType="String">
<param name="length" javaType="int"/>
<skipCountMethod>
<![CDATA[
if (buffer.remaining()>=param.length()*count) {
buffer.bytePosition(
buffer.bytePosition()+
(long)(param.length()*count));
return Result.GOOD;
}
return Result.NOT_ME;
]]>
</skipCountMethod>
</primitiveType>
The code example above shows that the skipCount method simply checks that
there is enough space left in the file to contain the Strings. If there is enough,
then the file pointer on the byte buffer is updated, and Result.GOOD is returned.
Otherwise Result.NOT_ME is returned.
Obviously, extra information (e.g. a read method, skip method, etc.) is required
here, since there is no information provided on how to actually read the String.
However, the skipCountMethod is sufficient for skipping multiple sequential
occurrences of this type, provided there is access to an auto-generated or custom
skip method.
7.3.5 Read
The read method is the method that actually reads the data from the file. The read
method then returns one of three Results. If the type does not exist (e.g. we require
a String “FRED”, but receive random bytes, or we hit the end of file etc), then a
Result.NOT_ME is returned. If the type exists at the current location, however it
is badly formatted, or contains errors that do not affect the file position (e.g. a
fixed length number field that cannot be parsed as a valid number) then a
Result.BAD is returned. If the type does exist at the current file position and can
be successfully parsed, then the read field value should be accessible after a call to
the read method through the field() accessor on the Parser. In these situations, the
read method should return a Result.GOOD.
In general, the read method should not be called until the type is known to exist
(via a call to the associated scan method). This general rule is only invalidated
when the readRequired attribute is set to true on a field. Therefore, in most
circumstances, calls to read are generally made after a successful scan. In such
cases, you do not yet know whether the data can be read successfully; however,
you know how to interpret the next set of data, which has previously been scanned.
If the read method returns a Result.BAD, it should always log an error via the
ParserContext. In this situation, an error is generated but processing can continue
for all subsequent non-error cases. These error conditions are later referred to as
“recoverable errors” in section 10.1.1.4.
7.3.5.1 XML Properties
The readMethod tag has the following properties:
DRIX Tag 51 readMethod
<readMethod>
Description:
Code block which is inserted into dynamically constructed Parser code, used for reading a type. The code within this tag should advance the file pointer past the end of the type if it exists.
Position:
0..1, declared under a type tag,
0..1, declared under a primitiveType tag.
The readMethod tag must appear after any param, typeParam, templateParam and field structure elements. It must appear before the emittable element on a type if one is declared. The position of the readMethod tag with relation to the skipCountMethod, scanMethod and skipMethod tags is unimportant.
Attributes:
None
Elements:
None
The readMethod is a simple tag with no attributes or elements.
7.3.5.2 Method Signature
The data within the readMethod must be java code that could execute if placed
in the following method signature:
Table 20 – Read Method Signature
public Result read(ParserContext context) throws
IOException,LDRException;
7.3.5.3 Method Contract
The readMethod must conform to the following read method interface standards.
If there is a test method defined on this type, then this must be called at the
appropriate time based on the position of the test tag in the XML in
relation to other fields. The results of the test tag must be adhered to, with
the subsequent processing based on the rules outlined in the Test section.
The setupParser[i](int, boolean, ParserContext) call must have been made,
and returned successfully, where this field is the ith field in the type
structure (including skipped, anonymous & inherited fields). This call
must be made prior to every invocation of the read method.
PostCondition:
If a field of this type does not exist at the current file position
o Returns Result.NOT_ME.
o The position of the file pointer is unimportant (it will be returned to the
file position apparent upon entry to the method)
If a field of this type exists at the current file position, but there are errors
in parsing the field,
o Returns Result.BAD
o An error message is logged via a call to context.log(..) – The
ErrorLevel can be set based on the severity of the error.
o The file pointer is updated to be at the first bit in the file after the end
of this type.
In all other cases
o The field is successfully read and a Result.GOOD object is returned
o The read method must ensure that the field has been set on the parser.
o The file pointer is updated to be at the first bit in the file after the end
of this type.
7.3.5.4 Example
Consider the code in Example 137 for reading a fixed length field containing
integer data encoded in an ASCII format:
Example 137 –Read Method example
<primitiveType name="FixedASCIIInteger" returnType="String">
<param name="length" javaType="int"/>
<readMethod>
<![CDATA[
if (buffer.remaining()<param.length()) {
return Result.NOT_ME;
}
byte[] dst = new byte[param.length()];
buffer.get(dst);
String s = new String(dst,"ASCII");
char first = s.charAt(0);
if (first!='-' && first!='+' && !Character.isDigit(first)) {
context.log("InvalidDataFormat – Invalid Start Character"+
" for integer, must be 0-9, -, or +",
ErrorType.INVALID_DATA_FORMAT,
ErrorLevel.RECOVERABLE_ERROR);
return Result.BAD;
}
if (s.charAt(0)=='+')
s = s.substring(1,s.length());
try {
int i = Integer.parseInt(s);
field(s);
return Result.GOOD;
}
catch (NumberFormatException nfe) {
context.log("InvalidDataFormat – Data :"+s+
" is not an integer",
ErrorType.INVALID_DATA_FORMAT,
ErrorLevel.RECOVERABLE_ERROR);
return Result.BAD;
}
]]>
</readMethod>
</primitiveType>
In this simple example, we have a parameter to the type which specifies the
length of the String to read. If this length is greater than the remaining length in
the file, we return a Result.NOT_ME. Otherwise, we read the String. If the
String cannot be interpreted as an integer value, we return a Result.BAD and log
a recoverable error on the context log. Otherwise, we set the field on the parser
to the read String and return Result.GOOD.
7.3.6 Code
The code tag is used for three purposes. The first purpose is to simply add code
into the contents of the generated Parser class. This is generally useful to allow for
code re-use across user-written read, scan and skip methods. It can also be useful
to abstract common functions into an abstract base type which other types might
inherit from. The second purpose is to add code external to the class, but in the
same file as the class. The code is added into the section of a java file where
imports would be placed. This is generally used to add import statements for java
classes that are to be used by custom user-written code. The third purpose is to
allow for one-off initializations to be performed (particularly initializations that
require the use of constant template parameters). These one-off initializations
are placed into an init method called at the end of the generated Parser’s constructor.
Since code initializations in the body of a class are executed prior to the
constructor body, constant template parameters may not be accessible in a
standard code block. The init block therefore allows for one-off initialization
using constant template parameters, improving performance by saving the cost of
re-initialization on each field occurrence.
The default selection is to place the code in the class; however, this can be changed
by specifying the location attribute on the code tag. The code tag is shown in DRIX
Tag 52.
7.3.6.1 XML Properties
DRIX Tag 52 code
<code>
Description:
Code block which is inserted into dynamically constructed Parser code. Defined at the top level under a class definition. Used for supplying helper methods to be used by the scan, skip, skipCount and read methods in the same type.
Position:
0..3, declared under a type tag,
0..3, declared under a primitiveType tag,
0..3, declared under a generatedType tag.
The code tags must appear after any param, typeParam, templateParam and field structure elements. They must appear before the emittable element on a type if one is declared. The position of the code tags with relation to the skipCountMethod, scanMethod, skipMethod and readMethod tags is unimportant.
Attributes:
Optional location attribute (defaults to "class"). The location attribute must be set to "class", "file" or "init". Only one of each kind of code tag is allowed in a type, primitiveType or generatedType (i.e. two code tags with location="file" are not allowed).
Elements:
None
Code is a simple tag with a location attribute and no elements.
In the following sections we outline the use of the class location and file
location code blocks.
7.3.6.2 Code Internal to the Class
When using the default code tag, or when explicitly setting location to “class”,
simple code blocks are inserted into a type. These code blocks are not to be
accessed by anything other than custom skipMethod, skipCountMethod,
readMethod, testMethod, scanMethod & java expressions in value tags under
the same type. The code in the code block is simply cut and pasted into the top
level of a class (i.e. directly nested below the class definition). This basically
allows for declaration of helper methods and class level variables/constants to
be used within the type.
As an example, consider the following primitiveType which is designed to read
padded ASCII Strings, and return the un-padded version. This makes use of a
parentType whose job is to simply read a fixed length String:
Example 138 –Internal code example (class location)
<primitiveType name="PaddedFixedASCIIString"
parentType="FixedASCIIString">
<readMethod>
<![CDATA[
_res = super.read(context);
if (_res==Result.GOOD) {
field(trimPadding(field()));
return Result.GOOD;
}
else return _res;
]]>
</readMethod>
<code>
<![CDATA[
private String trimPadding(String s) {
return s.trim();
}
]]>
</code>
</primitiveType>
As you can see from this example, the PaddedFixedASCIIString simply calls the
method in the code block in order to trim the padding from the String. This is a
trivial case, since the code in this method simply calls a method on the String class.
However, it is easy to see that having a code block could be useful if the same
functions are to be used by read, scan and skip methods.
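The reuse argument above can be sketched in plain Java. This is an invented stand-in (the Buffer class, Result enum, and method names here are hypothetical, not the LDR API), showing a scan and a skip method both delegating to one helper of the kind a class-location code block would declare:

```java
// Sketch only: scan and skip bodies reduced to one shared helper,
// as a class-location <code> block makes possible.
class SharedHelperSketch {
    enum Result { GOOD, NOT_ME }

    static class Buffer {
        long pos;
        final long limit;
        Buffer(long limit) { this.limit = limit; }
        long remaining() { return limit - pos; }
    }

    // helper that would live in the <code> block: validate and advance
    static Result advanceIfPresent(Buffer b, int length) {
        if (b.remaining() >= length) {
            b.pos += length;
            return Result.GOOD;
        }
        return Result.NOT_ME;
    }

    // both method bodies become a single helper call
    static Result scan(Buffer b, int length) { return advanceIfPresent(b, length); }
    static Result skip(Buffer b, int length) { return advanceIfPresent(b, length); }
}
```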
7.3.6.3 Code External to the Class
When using the code with location set to “file”, code blocks are able to be
inserted outside the class definition in a generated Parser class. For this reason,
these external code blocks are generally useful for providing any additional
import statements that may be required by the type.
Consider the primitiveType defined in Example 139. This is designed to read in
a fixed length integer field, where the length of the field could be any number.
In this case we need to make use of the BigInteger type. We can declare this in
the returnType as a java.math.BigInteger, and are required to provide the fully
qualified name in the returnType. However, through the use of the code tag, we
are also able to import the java.math.BigInteger class, meaning that within our
read, scan, or skip methods, we do not need to provide the full package name for
the BigInteger type.
Example 139 –External code example (file location)
<primitiveType name="BigInteger" returnType="java.math.BigInteger">
<param name="length" javaType="int"/>
<readMethod>
<![CDATA[
if (buffer.remaining()<param.length()) {
return Result.NOT_ME;
}
byte[] dst = new byte[param.length()];
buffer.get(dst);
field(new BigInteger(dst));
return Result.GOOD;
]]>
</readMethod>
<code location="file">
<![CDATA[
import java.math.BigInteger;
]]>
</code>
</primitiveType>
7.3.6.4 Init Code Called by the Constructor
When using the code with location set to “init”, code blocks are inserted into an
initialization method which is called by the constructor (the last statement in the
constructor is to call the init method) in the generated Parser class. For this
reason, these init code blocks are generally useful for initializing any objects
that depend on constant template parameters.
Consider the case shown in Example 140 where the SystemNewline type is used
to read in newline characters. In this example, the newline characters to read are
based on the System properties – i.e. the platform specific newline characters,
which will be “\n” on a Unix machine, “\r\n” on a Windows machine etc.
Example 140 –Init code location example.
<type name="SystemNewline">
<templateParam name="Charset" javaType="String"/>
<field name="_data" type=".identification.binary.ByteArrayIdEquals">
<templateArg name="expected">
<expr>ENCODED_BYTES</expr>
</templateArg>
</field>
<code>
<![CDATA[
private final static String NEWLINE_STRING =
System.getProperty("line.separator");
private static CharsetEncoder ENCODER;
private static byte[] ENCODED_BYTES;
]]>
</code>
<code location="init">
<![CDATA[
ENCODER = Charset.forName(TemplateParam.Charset).newEncoder();
ENCODED_BYTES =
ENCODER.encode(CharBuffer.wrap(NEWLINE_STRING)).array();
]]>
</code>
<code location="file">
<![CDATA[
import java.nio.charset.CharsetEncoder;
import java.nio.charset.Charset;
import java.nio.CharBuffer;
]]>
</code>
<emittable type="String">
<expr>NEWLINE_STRING</expr>
</emittable>
</type>
This example is slightly trivialized, as there should be nicer exception handling in
the init code tag; however, the general utility is well demonstrated. The example is
a cut down version of the .string.newline.SystemNewline type found in the string
library provided with the LDR.
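The constructor ordering that makes the init location useful can be sketched in plain Java. This is a hypothetical illustration (the class and its members are invented stand-ins, not generated LDR code): the init() call is the last statement of the constructor, so init-location code runs after the template parameter has been stored, whereas a field initializer would run before it:

```java
// Sketch: why init-location code can use constant template parameters.
class InitOrderSketch {
    private final String charsetName;   // stand-in constant template parameter
    private byte[] encodedNewline;      // declared in a class-location code block

    InitOrderSketch(String charsetName) {
        this.charsetName = charsetName;
        init(); // last statement of the generated constructor
    }

    // init-location code block: one-off initialization using the parameter
    private void init() {
        encodedNewline = System.lineSeparator()
                .getBytes(java.nio.charset.Charset.forName(charsetName));
    }

    byte[] encodedNewline() { return encodedNewline; }
}
```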
7.3.7 Code Required for the Generator Tag
In section 5.2.1.9 we introduced the concept of dynamically generated types. In
this section we showed how a generatedType tag could be used in conjunction
with typeParam tags and a generator tag in order to generate new types based on
the type arguments input.
In that section we skipped over the details of the generator tag as the tag contents
are simply interpreted as java code to be placed inside a method. In this section
we investigate further the use of the generator tag.
7.3.7.1 XML Properties
The generator tag has the following properties:
DRIX Tag 53 generator
<generator>
Description:
Code block which is inserted into a generator method in a TypeGenerator class. The code within the tag is used to construct new types based on input type arguments using the sections of the LDR denoted as the Generator API, and then register these types on an input Spec object. The code must match the method signature of the TypeGenerator interface.
Position:
0..1, declared under a generatedType tag.
The generator tag must appear after any typeParams.
Attributes:
None
Elements:
None
Generator is a simple tag with no attributes or elements.
7.3.7.2 Method Signature
The data within the generator tag must be java code that could execute if placed
in the following method signature:
Table 21 – Generator Method Signature
public BaseTypeDefinition generate(GeneratorContext context,
Object[] typeArgs) throws LDRException;
7.3.7.3 Method Contract
The code within the generator tag must conform to the following generator
method interface standards:
- The generator constructs new type objects using the Generator API. All
  constructed types must be registered on the Spec, via
  context.spec().registerTypeDefinition(..)
- All generated types must lie under the namespace available via
  context.namespace()
- All recoverable errors encountered in the generator tag must be logged via
  the context, using the log accessible from context.log()
- The top-level type that is generated must be returned from the method.
7.3.7.4 Example
Consider the code in Example 141:
Example 141 – Generator example
<generatedType name="GT_1">
  <typeParam name="newTypeName" javaType="String"/>
  <generator>
    <![CDATA[
    List<String> readMethodContents = new ArrayList<String>();
    readMethodContents.add("if (buffer.hasRemaining()) {");
    readMethodContents.add("buffer.move(1);");
    readMethodContents.add("field(\"Good\");");
    readMethodContents.add("return Result.GOOD;");
    readMethodContents.add("}");
    readMethodContents.add("return Result.NOT_ME;");
    ReadMethod readMethod = new ReadMethod(readMethodContents);
    PrimitiveTypeDefinition newType = new PrimitiveTypeDefinition(
        (String)typeArgs[0], null, "String",
        PrimitiveTypeDefinition.NO_INHERITANCE, null,
        null, null, null, null, context.namespace(), readMethod, null,
        null, null, null, null, null, null);
    newType.parent(context.spec().library());
    context.spec().registerTypeDefinition((String)typeArgs[0], newType);
    return newType;
    ]]>
  </generator>
</generatedType>
In this simple example, we take an input String type parameter. The value of
this parameter becomes the name of the new type; however, in this simple
example the details of the type do not change dependent on the type parameter,
only the type name. The type is a simple Primitive type which skips a byte if
there is one on the buffer. The type has a String return type, which is set to
"Good" if the byte was skipped. The type is then registered on the spec and
returned from the generator method.
This simple case in Example 141 does not really provide any useful features
that could not simply be written in a DRIX file. In order to successfully
construct generator methods, the Generator API needs to be used extensively.
Methods in the LDR engine javadoc denoted with a GENERATOR_API tag form part
of the Generator API. In general, this involves programmatic constructors for
all of the required tags that could be found in a DRIX specification.
The true benefit of generated types does not become apparent until you start
having complex generator tags. The generator tags can then be used for such
complicated tasks as converting one specification format into an LDR DRIX
specification format. Within the core libraries of the LDR are the ASN.1 and
Cobol converters, which both use generator tags to perform the conversion from
Cobol copybooks and ASN.1 specifications to LDR specifications.
7.4 Code Generation
The code generation for generatedTypes is straightforward. In this case, the contents
of the generator tag are placed directly within the generator method as discussed in
section 7.3.7. Since generatedTypes cannot specify parentTypes the code generation
is simple. Things are more interesting in the case of Standard Types and Primitive
Types, as we will discuss in the following sections.
7.4.1 Primitive Types
Primitive Types contain no sub-fields. As such, the generation of the code for
Primitive Types is somewhat simpler than that of Standard types. It is possible to
construct a Primitive Type that simply has a name. This is practically useless, but it is
informative in terms of code generation to see how this will be constructed.
Consider the simplest possible primitive type, displayed in Example 142.
Example 142 – Empty Primitive Type example
<namespace name="MyNamespace">
  <primitiveType name="AbstractType"/>
</namespace>
In this case, we will end up with the generated code shown in Example 143.
Example 143 – Generated code from empty Primitive Type example
package dynamic.primitive.MyNamespace;

/**
 * @(#)AbstractTypeParser.java
 * --- AUTO-GENERATED CLASS ---
 * This class has been automatically generated by the DynamicClassConstructor
 * @author <AutoGenerated>
 * @date 12/5/2009
 * --------- User Written XML Documentation ------
 * Parameter Documentation:
 * --------- End User Written XML Documentation ------
 */
import com.lavastorm.ldr.input.*;
import com.lavastorm.ldr.input.constructs.*;
import com.lavastorm.ldr.util.*;
import com.lavastorm.ldr.io.*;
import java.io.*;
import com.lavastorm.ldr.exception.*;
import com.lavastorm.ldr.exception.LDRException.ErrorType;
import com.lavastorm.ldr.exception.LDRException.ErrorLevel;
import java.util.List;
import java.util.ArrayList;
import java.nio.BufferUnderflowException;
import java.lang.reflect.Method;

public abstract class AbstractTypeParser implements
        com.lavastorm.ldr.input.Parser<Object> {
    protected int m_fieldId;
    protected int m_staticFieldId;
    protected Parser<?> m_containingParser;
    protected int m_fieldPosition;
    protected String m_nestedFieldName;
    protected byte SCAN_FLAGS = (byte)(0|8);
    private AbstractTypeParams m_param;
    protected Drix m_spec;

    //Param Class Definition
    public class AbstractTypeParams {
        public void clearParams() throws LDRException {
        }
        public void applyDefaults() throws LDRException {
        }
    }

    public void arg(String paramName, Object arg) throws LDRException {
    }

    public Drix spec() { return m_spec; }

    private boolean errorOnScanFail() {
        return (SCAN_FLAGS & Parser.FILE_ERROR_ON_PARSER_FAIL) ==
                Parser.FILE_ERROR_ON_PARSER_FAIL;
    }

    //Param Accessor & Modifier
    public AbstractTypeParams param() {
        return m_param;
    }
    public void param(AbstractTypeParams param) {
        m_param = param;
    }

    public void applyDefaults() throws LDRException, IOException {
        param().applyDefaults();
    }

    public void reset() throws LDRException, IOException {
        param().clearParams();
    }

    protected void setupSuperArgs() throws LDRException {
    }

    //Method to obtain the value to emit (from emittable or from field)
    //this is Primitive, so always return the field
    @Override
    public Object emittableValue(ParserContext context) throws
            LDRException, IOException {
        return field();
    }
}
We see from this example that none of the read, scan and skip methods are
declared on the class, and the class itself is declared abstract. It provides
the standard methods that are provided on all Parser objects, such as a param()
accessor, a reset method, and an errorOnScanFail method. However, in general,
we can see from this generated code that the class itself is practically
useless.
For a Primitive Type, the class itself will be declared abstract if any of the
following is true:
- The primitiveType specifies no returnType, and
  - it has no parentType, OR
  - its parentType is abstract
- The primitiveType specifies no readMethod, and
  - it has no parentType, OR
  - its parentType is abstract
Therefore, in order for a Primitive Type not to be abstract, somewhere in its
type inheritance chain we need:
- a user defined read method, and
- a return type
Whenever both the return type and the read method are defined somewhere in the
type inheritance chain, the Primitive Type will not be abstract. In these
cases, the scan, skip, skipCount and read methods will also be defined and not
abstract.
If the returnType is defined but the readMethod is not defined in the
inheritance chain, then the Primitive Type is abstract, as are the read, scan
and skip methods. If the return type is not defined but the read method is
defined in the inheritance chain, then the Primitive Type will be abstract;
however, the read, scan, skip and skipCount methods will not be abstract.
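The abstractness rules above can be condensed into a small predicate. The following is an illustrative sketch only; TypeInfo and its fields are hypothetical names, not part of the LDR API:

```java
// Illustrative model of the abstractness rules for Primitive Types.
// TypeInfo and its fields are hypothetical names, not the LDR API.
public class PrimitiveAbstractness {

    public static class TypeInfo {
        public final boolean hasReturnType;
        public final boolean hasReadMethod;
        public final TypeInfo parent; // null when there is no parentType

        public TypeInfo(boolean hasReturnType, boolean hasReadMethod, TypeInfo parent) {
            this.hasReturnType = hasReturnType;
            this.hasReadMethod = hasReadMethod;
            this.parent = parent;
        }
    }

    // A type is non-abstract only when both a returnType and a readMethod
    // are defined somewhere in its inheritance chain.
    public static boolean isAbstract(TypeInfo t) {
        return !definedInChain(t, true) || !definedInChain(t, false);
    }

    private static boolean definedInChain(TypeInfo t, boolean returnType) {
        for (TypeInfo cur = t; cur != null; cur = cur.parent) {
            if (returnType ? cur.hasReturnType : cur.hasReadMethod) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        TypeInfo bare = new TypeInfo(false, false, null);
        TypeInfo parentWithRead = new TypeInfo(false, true, null);
        TypeInfo childWithReturn = new TypeInfo(true, false, parentWithRead);

        System.out.println(isAbstract(bare));            // true
        System.out.println(isAbstract(childWithReturn)); // false: read inherited, return local
    }
}
```

Note that the second case is non-abstract even though neither requirement is satisfied on a single type: the read method comes from the parent and the return type from the child, matching the "somewhere in its type inheritance chain" wording above.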
The following steps are used when constructing new Parser classes from the contents
of a primitiveType tag. It is important to note that the parent types are always
constructed before any child types are constructed, so where appropriate these steps
can be considered to be applied recursively up the inheritance chain.
When generating the code for the read method, the following steps are taken:
1. If there is a user defined read method on the Primitive Type, this is used.
2. Else if the Primitive Type has a parent type and the parent type has a
   non-abstract read method, then:
   a. If there is no super tag and no test or testMethod tag on the type, no
      read method is generated, and the parent's read method will always be
      invoked.
   b. Else, a read method will be generated with the appropriate super & test
      functionality, but all of the work of the read method will take place in
      the parent's defined read method.
3. Else, no read method is generated, and the type is defined as abstract.
When generating the code for the scan method, the following steps are taken:
1. If there is a user defined scan method on the Primitive Type, this is used.
2. Else if the Primitive Type has a parent type and the parent type has a
   non-abstract scan method, then:
   a. If there is no super tag and no test or testMethod tag on the type, no
      scan method is generated, and the parent's scan method will always be
      invoked.
   b. Else, a scan method will be generated with the appropriate super & test
      functionality, but all of the work of the scan method will take place in
      the parent's defined scan method.
3. Else, if the read method is not abstract (we did not hit step 3 above), the
   scan method is generated, but simply calls the read method.
4. Else, the scan method is not generated, and the type is abstract.
When generating the code for the skip method, the following steps are taken:
1. If there is a user defined skip method on the Primitive Type, this is used.
2. Else if the Primitive Type has a parent type and the parent type has a
   non-abstract skip method, then:
   a. If there is no super tag and no test or testMethod tag on the type, no
      skip method is generated, and the parent's skip method will always be
      invoked.
   b. Else, a skip method will be generated with the appropriate super & test
      functionality, but all of the work of the skip method will take place in
      the parent's defined skip method.
3. Else, if the scan method is not abstract (we did not hit step 4 above), the
   skip method is generated, but simply calls the scan method.
4. Else, the skip method is not generated, and the type is abstract.
When generating the code for the skipCount method, the following steps are
taken:
1. If there is a user defined skipCount method on the Primitive Type, this is
   used.
2. Else if the skip method is not abstract (we did not hit step 4 above), the
   skipCount method is generated, but simply calls the skip method the
   requested number of times.
3. Else, the skipCount method is not generated, and the type is abstract.
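The fallback chains above can be sketched in miniature. The class below is illustrative only, not generated LDR code; it shows how, in the absence of user-defined methods, each generated method bottoms out in the next one up the chain:

```java
// Illustrative sketch of the generated fallback chain: when no user-defined
// method exists, scan() falls back to read(), skip() to scan(), and
// skipCount(n) loops over skip(). Names are hypothetical, not the LDR API.
public class FallbackChainSketch {

    public static class GeneratedParser {
        public int reads = 0;

        // Stands in for a user-defined read method on the type.
        public void read() { reads++; }

        // No user scan method: the generated scan simply calls read.
        public void scan() { read(); }

        // No user skip method: the generated skip simply calls scan.
        public void skip() { scan(); }

        // No user skipCount method: the generated skipCount calls skip n times.
        public void skipCount(int n) {
            for (int i = 0; i < n; i++) skip();
        }
    }

    public static void main(String[] args) {
        GeneratedParser p = new GeneratedParser();
        p.skipCount(3);
        System.out.println(p.reads); // 3: every call bottoms out in read()
    }
}
```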
Whenever the Primitive Type is not abstract, then in addition to this generated
code, there will be accessors and modifiers for the read field value generated
on the constructed Parser class. For instance, if we have a non-abstract
Primitive Type which is defined to have a String returnType, then the methods
shown in Example 144 will be available on the Parser.
Example 144 – Generated code for accessing and modifying a Primitive Type's field
public Object field() throws IOException, LDRException {
    if (!m_fieldValueSet) {
        throw ParserExceptionGenerator.fieldNotSet(m_nestedFieldName);
    }
    return m_fieldValue;
}
public void field(String field) {
    m_fieldValueSet = true;
    m_fieldValue = field;
}
public boolean fieldSet() {
    return m_fieldValueSet;
}
public void reset() throws LDRException, IOException {
    param().clearParams();
    m_fieldValueSet = false;
}
If the Primitive Type has a primitive java return type, then there will also be methods
to access and modify the primitive value, via the <primitiveTypeName>FieldValue(..)
methods. For instance, Example 145 shows an example of the code that would be
generated when we have an int returnType specified.
Example 145 – Generated code for Primitive Types with a java primitive returnType
public Object field() throws IOException, LDRException {
    if (!m_fieldValueSet) {
        throw ParserExceptionGenerator.fieldNotSet(m_nestedFieldName);
    }
    return m_fieldValue;
}
public void field(Integer field) {
    m_fieldValueSet = true;
    m_fieldValue = field.intValue();
}
public boolean fieldSet() {
    return m_fieldValueSet;
}
public int intFieldValue() throws LDRException {
    if (!m_fieldValueSet) {
        throw ParserExceptionGenerator.fieldNotSet(m_nestedFieldName);
    }
    return m_fieldValue;
}
public void intFieldValue(int fieldValue) {
    m_fieldValueSet = true;
    m_fieldValue = fieldValue;
}
7.4.2 Standard Types
Unlike Primitive Types, Standard Types can never be abstract. Therefore, none of the
read, scan, skip or skipCount methods can be abstract.
When generating the code for the read method, the following steps are taken:
1. If there is a user defined read method on the type, this is used.
2. Else if the type contains anything for which code can be generated (any
   non-javaType field, a javaType field with an expr tag, or a test or super
   tag is sufficient), then the read method will be generated according to the
   field structure (including or, repeatRange, test, super elements, etc.).
3. Else if the type has a parent type specified, then the read method will be
   constructed to call the parent read.
4. Else, the read method will be written to throw an exception if it is ever
   called.
When generating the code for the scan method, the following steps are taken:
1. If there is a user defined scan method on the type, this is used.
2. Else if the type contains anything for which code can be generated (any
   non-javaType field, a javaType field with an expr tag, or a test or super
   tag is sufficient), then the scan method will be generated according to the
   field structure (including or, repeatRange, test, super elements, etc.).
3. Else if the type has a parent type specified, then the scan method will be
   constructed to call the parent scan.
4. Else if there is a user written read method, then the scan method will be
   constructed to call the read method.
5. Else, the scan method will be written to throw an exception if it is ever
   called.
When generating the code for the skip method, the following steps are taken:
1. If there is a user defined skip method on the type, this is used.
2. Else if the type contains anything for which code can be generated (any
   non-javaType field, a javaType field with an expr tag, or a test or super
   tag is sufficient), then the skip method will be generated according to the
   field structure (including or, repeatRange, test, super elements, etc.).
3. Else if the type has a parent type specified, then the skip method will be
   constructed to call the parent skip.
4. Else if there is a user written scan method, then the skip method will be
   constructed to call the scan method.
5. Else if there is a user written read method, then the skip method will be
   constructed to call the read method.
6. Else, the skip method will be written to throw an exception if it is ever
   called.
When generating the code for the skipCount method, the following steps are
taken:
1. If there is a user defined skipCount method on the type, this is used.
2. Otherwise, the skipCount method will be constructed to call the skip
   method. If the skip method is not defined, the exceptions thrown from the
   skip method will be thrown by the skipCount method.
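The four-step read decision above can be sketched as a small selector. The enum and flags below are hypothetical names, not the LDR code generator's actual types:

```java
// Illustrative decision logic for generating a Standard Type's read method,
// following the four steps above. All names are hypothetical.
public class StandardReadStrategy {

    public enum Strategy { USER_DEFINED, GENERATED_FROM_FIELDS, CALL_PARENT, THROW }

    public static Strategy choose(boolean userReadMethod,
                                  boolean hasGeneratableContent,
                                  boolean hasParentType) {
        if (userReadMethod) return Strategy.USER_DEFINED;        // step 1
        if (hasGeneratableContent) return Strategy.GENERATED_FROM_FIELDS; // step 2
        if (hasParentType) return Strategy.CALL_PARENT;          // step 3
        return Strategy.THROW;                                   // step 4
    }

    public static void main(String[] args) {
        System.out.println(choose(false, true, true));   // GENERATED_FROM_FIELDS
        System.out.println(choose(false, false, false)); // THROW
    }
}
```

The scan and skip decisions follow the same shape, with the extra fallback steps (scan to read, skip to scan to read) appended before the throwing case.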
Therefore, it is important to note that if you specify any javaType fields, no
code will be autogenerated to populate these fields unless you provide an expr
tag for the field. If these are not required for scanning or skipping, but are
simply used, for example, to provide values to an emittable clause, then you
may need only to write a readMethod which populates these fields. In general,
wherever you specify javaType fields without a static value associated, you
should also write the appropriate read, scan and skip methods to populate
these values.
7.5 Performance Tuning DRIX Files
Understanding how the LDR attempts to optimize DRIX specifications when
compiling to java code can lead to great performance improvements for certain data
files. This is particularly true with data files where the file structure involves a large
number of fields under or tags.
Consider the DRIX in Example 146.
Example 146 – Example optimization candidate case
<drix>
  …
  <type name="ManyChoices">
    <or>
      <field name="f1" type="T1"/>
      <field name="f2" type="T2"/>
      …
      <field name="f100" type="T100"/>
    </or>
  </type>
  <type name="T">
    <field name="dataField" type=".integer.Int8" readRequired="true"/>
  </type>
  <type name="T1" parentType="T">
    <test expected="1">
      <fromField field="dataField"/>
    </test>
    <!-- Lots of T1 specific fields -->
  </type>
  <type name="T2" parentType="T">
    <test expected="2">
      <fromField field="dataField"/>
    </test>
    <!-- Lots of T2 specific fields -->
  </type>
  …
  <type name="T100" parentType="T">
    <test expected="100">
      <fromField field="dataField"/>
    </test>
    <!-- Lots of T100 specific fields -->
  </type>
  …
</drix>
Assuming the missing sections depicted by “…” follow the same pattern, we have a
case with 100 fields under an or tag, each performing a simple test on a read field
value. In this example, the field that is read is the same in each case.
In this example, if the value at the current file position is 100, and we are
trying to read a type "ManyChoices", without optimization the following would
happen:
1. Attempt to read a T1 field.
   a. Successfully read dataField, with value 100
   b. Run the test, compare to expected value 1
   c. Fail to read T1 field
2. Attempt to read a T2 field.
   a. Successfully read dataField, with value 100
   b. Run the test, compare to expected value 2
   c. Fail to read T2 field
…
100. Attempt to read a T100 field.
   a. Successfully read dataField, with value 100
   b. Run the test, compare to expected value 100
   c. Succeed, continue reading T100 specific fields.
This example shows clearly that the standard, non-optimized program flow will be
very inefficient in certain cases. Here, we will be reading the same value off the data
file (or at least off the buffer) 100 times.
In order to avoid this, in certain situations, such as the one illustrated above, we
optimize the program flow for fields under an or tag. In this example, the optimized
code would do the following:
1. Attempt to read a T field.
a. Successfully read dataField, with value 100
b. Perform a switch on the read data value, to identify that we should read
a T100 field.
c. Read T100 specific fields
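The difference between the two flows can be sketched as follows; the counting functions below are illustrative stand-ins for generated parser code, not the LDR API:

```java
// Illustrative contrast between the unoptimized or-tag flow (try each
// alternative in turn, re-reading dataField every time) and the optimized
// flow (read dataField once, then dispatch). Hypothetical names throughout.
public class OrTagDispatch {

    // Unoptimized: each alternative re-reads the discriminator and tests it
    // against its own expected value, stopping at the first match.
    public static int attemptsLinear(int dataField, int alternatives) {
        int attempts = 0;
        for (int expected = 1; expected <= alternatives; expected++) {
            attempts++;                    // one read of dataField per attempt
            if (dataField == expected) break;
        }
        return attempts;
    }

    // Optimized: dataField is read exactly once, then a switch on its value
    // selects the matching alternative directly.
    public static int attemptsSwitch(int dataField) {
        return 1; // a single read, followed by a direct dispatch
    }

    public static void main(String[] args) {
        System.out.println(attemptsLinear(100, 100)); // 100 reads of dataField
        System.out.println(attemptsSwitch(100));      // 1 read
    }
}
```

For the worst-case alternative this turns 100 reads of the same byte into one, which is the source of the large gains reported below for ASN.1-style files.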
With certain file structures (especially ASN.1 files) this leads to massive
performance improvements. In order for the code to be optimized, there must
exist at least two consecutive fields defined under the or tag that are:
1. Without dynamic properties (not dynamically bound, not bound to a generated
   type, and not supplying dynamically bound template arguments)
2. Bound to types which share a common parent type (with the other fields
   under the or tag)
3. Such that the parent type's fields occur first in the field definition (no
   fields prior to a super tag)
4. Bound to a type with a test tag defined in its hierarchy
5. Such that, on the first test tag encountered in the hierarchy, the value to
   test against is defined in either:
   a. a fromParam tag, where the parameters are the same or the values
      supplied to the parameters are the same, or
   b. a fromField tag, where the referenced fields are the same
6. Such that any tags occurring prior to the test tag are equivalent.
There is the possibility that there are multiple subgroups under an or tag
where optimization can occur. In such situations, each subgroup will be
optimized. This means that if we have i fields which can be optimized in a
group, followed by j fields which cannot be optimized, followed by another k
fields that can be optimized, the following will occur:
1. Switch on the value for the first optimized group to determine if any
   fields in the first i group are present; if so, read the field that is
   present.
2. Else check each individual field in the j group in a non-optimized manner.
   If any of these fields exist, then use this field and stop searching.
3. Else switch on the value for the second optimized group to determine if any
   fields in the k group are present; if so, read the field that is present.
4. If nothing is present, then fail out of the or tag with a NOT_ME result.
8 LAE Data Reading Interface
We have so far discussed the details of the LDR engine. The LDR engine is the
back-end data program that is extremely configurable and handles a wide
variety of complex data formats. The problem is that because the LDR engine
has such a configurable input and output specification, it becomes very
complex to use.
In general, end users will not want to create large complicated DRIX specification
files for simple file formats. Furthermore, for most of the simple cases, the end user
will not want to know about the underlying DRIX, and will want to simply use a
graphical interface that hides most of the unnecessary configuration from them.
This type of configuration hiding and user interface support is extremely well
suited to the LAE/BRE. While it is stated that the LDR can be deployed
independently of BRE, there are pre-configured nodes in BRE that make the LDR
a lot simpler for the average user. As we will see in the following sections,
LAE will interface with the LDR through a base node, which allows all of the
configuration present in the input & output specification files. All other,
more user-friendly, nodes will inherit from this base node and simply restrict
the specification.
In certain cases (ASN.1 & Cobol for example), the user will be allowed to provide
specifications in a different format that will subsequently be converted to the LDR’s
DRIX specification prior to the data being read & output.
8.1.1 Mapping of LDR Data Types to BRD
Table 22 – LDR to BRD data type mappings

LDR Return Type  BRD Data Type   Description
byte             bint
byte[]           string          outputs a base64 encoded String
short            bint
int              bint
long             blong
ubyte            bint            outputs the actual value, e.g. 255 is output as 255
ushort           bint            outputs the actual value, e.g. 65535 is output as 65535
uint             bint            outputs the two's complement equivalent, e.g. MAX_UINT is output as -1
ulong            blong           outputs the two's complement equivalent, e.g. MAX_ULONG is output as -1
float            bdouble         outputs the value promoted to a double
double           bdouble
BigInteger       string          outputs a string by calling toString(), e.g. "1234". This is optional, and is set in a node parameter.
BigDecimal       string          outputs a string by calling toString(), e.g. "12.34". This is optional, and is set in a node parameter.
String           string|unicode  This is optional, and is set in a node parameter.
As shown above, in the case of the uint and ulong types, the output is packed into
the equivalent signed type (both Java and the LAE only provide signed data
types). The effect is that uint and ulong values greater than the signed maximum
are output as negative values, but the stored (i.e. hexadecimal) value is correct.
Also, for types that return a Java float, the return value is automatically promoted
to a Java double (in order to be output as a BRD bdouble). The implication of this
is a loss of precision, as often occurs with floating-point arithmetic. For example,
the 32-bit float value 123.4 becomes 123.4000015258789 when output to a BRD
file.
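Both effects described above can be reproduced in plain Java:

```java
// Demonstrates the two effects described above: an unsigned value above the
// signed maximum appears negative when packed into the signed Java type,
// while the stored bits are unchanged; and a 32-bit float promoted to a
// double exposes the float's representation error.
public class TypeMappingEffects {

    public static void main(String[] args) {
        // MAX_UINT (4294967295) packed into a signed 32-bit int is -1,
        // but the stored bits (0xFFFFFFFF) are unchanged.
        long maxUint = 0xFFFFFFFFL;
        int packed = (int) maxUint;
        System.out.println(packed);                      // -1
        System.out.println(Integer.toHexString(packed)); // ffffffff

        // The float 123.4f promoted to a double, as done for bdouble output.
        float f = 123.4f;
        double promoted = f;
        System.out.println(promoted); // 123.4000015258789
    }
}
```

The promotion itself is exact; the "loss of precision" is the float's own rounding of 123.4, which the extra double digits merely make visible.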
9 Compatibility
9.1 Self-Compatibility & Versioning
The LDR has a feature versioning system to ensure that DRIXs and DROXs can
specify which version of the LDR they require in order to run. The LDR in this
sense is described as a "feature". In future versions, it may be possible for
the DRIX or DROX to specify the versions it requires of other "features".
This feature versioning system is implemented in a DRIX or DROX via the “requires”
tag, which has the properties shown below.
DRIX Tag 54 requires
DROX Tag 16 requires
<requires>
Description
Used to indicate that the current DRIX/DROX depends upon a given
version of the specified feature.
Position
May appear 0..* times under a drix, or drox tag in a DRIX or DROX
specification respectively
Attributes
Required “feature” attribute (for 2.0, must be set to “ldr”)
Optional minimumVersion attribute
Optional maximumVersion attribute
Elements
None
The requires tag has the same format in both a DRIX and a DROX.
When present, the required version of the specified feature will be checked
against the version of the feature in the environment in which the LDR is
running. If the feature is not present with the correct version requirements,
then the LDR will error.
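A minimal requires declaration might look like the following sketch. The attribute names come from the table above; the version value is illustrative only, and for 2.0 the feature attribute must be set to "ldr":

```xml
<drix>
  <!-- Illustrative only: require the "ldr" feature at a minimum version -->
  <requires feature="ldr" minimumVersion="2.0"/>
  …
</drix>
```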
9.2 Compatibility with other Software
The LDR is compatible with the following systems:
- Solaris SPARC 9, 10
- HP-UX 11.11 & 11.23 PA-RISC
- HP-UX 11.23 Itanium
- Red Hat Linux 4 & 5
- Oracle Enterprise Linux 4 & 5
- Windows XP, Vista
The following sections outline the LDR compatibility with different LAE versions.
9.2.1 LAE 3.x and Earlier
The LDR is not compatible with any version of LAE prior to LAE 4.x
9.2.2 LAE 4.x
The LDR is compatible with versions of LAE 4.x from LAE 4.1.4 forward. Each
version of the LDR is tied to a specific LAE version. The LDR installation is
packaged with the corresponding LAE installation with which it should be used.
Using incompatible LAE & LDR versions will result in incorrect LDR behavior.
10 Error Handling
The LDR provides extensive error handling facilities to ensure that data reading and
processing can occur with as few problems as possible. Whenever complex data
formats are used, it is likely that there will be minor issues with existing specifications
that do not entirely describe the data format, and there is also the high possibility of
mid-file data corruption causing parsing issues.
The exception handling features of the LDR are designed to ensure that,
wherever possible and where the user requires, data processing is able to
continue in spite of minor data corruption issues. Furthermore, the LDR seeks
to provide as much information as possible to the user, in the form of logs
and field data traces, in case of non-recoverable errors due to file
corruption or an invalid file specification.
Given the complexity of the data formats that the LDR is able to handle, the
specification mechanism also requires some complexity. Therefore, for cases where
the input specification simply does not match the requirements of the specification
language, the error reporting provides fine-grained and useful messages that allow the
user to locate exactly where the errors are in their specification file.
10.1.1 Error Levels
There are three separate error levels within the LDR to allow for different
handling based on the severity of the problem encountered. In the default
case, the error levels are used to provide warning and diagnostic messages, to
log errors from which the LDR was able to recover, and to raise fatal errors
which cause the file processing to fail. However, as we will see in subsequent
sections, the user is able to tweak the LDR error thresholds to utilise these
error levels in different ways.
10.1.1.1 Ignore
The Ignore error level is simply used to denote that the error(s) should be
ignored. This can be set on the ParserLog to ensure that all warnings and
recoverable errors do not contribute to an error count, and that nothing is written
to log. This can also be set as an error level for the case when a specification
does not fail against a data file, however the entire data file is not read as a result
of successfully using the specification to read the file. This error level is most
useful when using error filtering to ensure that certain errors are ignored.
10.1.1.2 Info
Info level messages are useful for cases when there is no error, however can
show useful information. In most cases these will neither be written to the log,
or contribute to the error count. However, if a error is occurring and simply
using the error log is insufficient to determine the cause, it may be useful to
LAVASTORM ANALYTICS
lavastorm.com
Page 187
Issue 1
LDR User Manual
ensure that these messages are written to the log by changing the log threshold
to obtain further information.
10.1.1.3 Warnings
Warnings are simply used within the LDR to inform users of cases where there
may possibly be a problem with the specification or data file that they are using.
In general, the warnings will not refer to actual errors, but can be useful in
diagnosing problems that may occur later in the file processing operation.
Through use of the thresholding and logging settings discussed in section 10.1.2,
the user is able to configure how warnings are handled.
If a file format is to be used repeatedly, then it is recommended and
reasonable to output all warnings to a log file while configuring the reader.
Clearly, logging all warning messages will incur some performance hit;
therefore, if the user is satisfied that the reader is functioning correctly,
and very large data files are being processed, then it may be useful to turn
off warning logging in order to improve performance.
10.1.1.4 Recoverable Errors
Recoverable errors will generally occur when there are small amounts of corrupt
data in a file. These errors indicate that although a field could not be read, the
file processing can still continue.
For example, consider the case of a CSV file. If there are too few delimiting
commas on one line of the file, file processing can still continue on the next
line; however, the data on the line with too few delimiting characters is
corrupt.
Unless the output of the LDR is to be used in a downstream system requiring a
perfect and complete data file to be processed, there is no reason to stop
processing due to these recoverable errors.
With the default settings, file reading will continue in the face of recoverable
errors, and an error message will be logged.
10.1.1.5 Fatal Errors
Clearly, there are some errors which are not recoverable. For example, if a
specification is syntactically incorrect, if a type is referenced which does
not exist in a specification, or if the referenced data file does not exist or
is too corrupt to process, then a fatal error is thrown. At this point, the
exception is thrown back to the user from the LDR. When the LDR is used as an
API, these can be handled however is necessary.
Thresholding and logging settings have no impact on fatal errors, as there is
no way to ignore the problem and continue processing.
10.1.2 Thresholding & Logging
As implied in the previous section, the LDR error handling is configurable
depending on the requirements of the user. This configuration is handled
through the ParserLog class within the LDR.
The ParserLog is an interface which has three key properties:
- Error Threshold
- Log Threshold
- Maximum Number of Non Fatal Errors
These properties interact to provide the configuration required for error
handling. When the error threshold is set to Recoverable Errors, and the log
threshold is set to Warning, the logging and exception handling functions as
described in the previous section.
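The interaction of these three properties can be sketched as a toy model. The names, severity ordering and method shapes below are schematic assumptions for illustration, not the actual ParserLog API:

```java
// Illustrative model of how the three ParserLog properties interact.
// Level names, their ordering and the report() method are schematic,
// not the actual LDR ParserLog interface.
public class ThresholdModel {

    public enum Level { IGNORE, INFO, WARNING, RECOVERABLE, FATAL }

    public static final int NO_MAX = -1;

    public static class Log {
        public final Level errorThreshold; // entries at or above this are counted
        public final Level logThreshold;   // entries at or above this are written
        public final int maxNonFatal;      // NO_MAX means never fail on count
        public int counted = 0;
        public int written = 0;

        public Log(Level errorThreshold, Level logThreshold, int maxNonFatal) {
            this.errorThreshold = errorThreshold;
            this.logThreshold = logThreshold;
            this.maxNonFatal = maxNonFatal;
        }

        // Returns true when file processing should fail.
        public boolean report(Level level) {
            if (level.compareTo(logThreshold) >= 0) {
                written++; // would be written to the log here
            }
            if (level == Level.FATAL) return true; // fatal always fails
            if (level.compareTo(errorThreshold) >= 0) counted++;
            return maxNonFatal != NO_MAX && counted >= maxNonFatal;
        }
    }

    public static void main(String[] args) {
        // Default-like setup: count recoverable errors, log warnings and above.
        Log log = new Log(Level.RECOVERABLE, Level.WARNING, 2);
        System.out.println(log.report(Level.WARNING));     // false: logged, not counted
        System.out.println(log.report(Level.RECOVERABLE)); // false: counted (1 of 2)
        System.out.println(log.report(Level.RECOVERABLE)); // true: count reached 2
    }
}
```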
Table 23 displays the possible combinations of these threshold settings and how
error handling is processed when using the default ParserLog. As we will see in
the following section, the empty ParserLog never logs any errors, and is simply
used to determine when a sufficient number of errors have been encountered to
cause file processing to fail.
Table 23 – Error and Log Thresholding

Error Threshold | Log Threshold | Maximum Number of Non Fatal Errors | Outcome
----------------|---------------|------------------------------------|--------
Ignore | Ignore | NO_MAX | Everything logged. File processing only fails on fatal errors.
Ignore | Ignore | a | Everything logged. When a fatal error is encountered, or the total number of errors, warnings or ignores reaches a, file processing fails.
Ignore | Info | NO_MAX | Everything more severe than Ignore logged. File processing only fails on fatal errors.
Ignore | Info | a | Everything more severe than Ignore logged. When a fatal error is encountered, or the total number of errors, warnings, Infos or ignores reaches a, file processing fails.
Ignore | Warning | NO_MAX | Everything more severe than Info logged. File processing only fails on fatal errors.
Ignore | Warning | a | Everything more severe than Info logged. When a fatal error is encountered, or the total number of errors, warnings, Infos or ignores reaches a, file processing fails.
Ignore | Recoverable Errors | NO_MAX | Only recoverable errors and fatal errors are logged. File processing only fails on fatal errors.
Ignore | Recoverable Errors | a | Only recoverable errors and fatal errors are logged. When a fatal error is encountered, or the total number of errors, warnings, Infos or ignores reaches a, file processing fails.
Ignore | Fatal Errors | NO_MAX | Only fatal errors are logged. File processing only fails on fatal errors.
Ignore | Fatal Errors | a | Only fatal errors are logged. When a fatal error is encountered, or the total number of errors, warnings, Infos or ignores reaches a, file processing fails.
Info | Ignore | NO_MAX | Everything logged. File processing only fails on fatal errors.
Info | Ignore | a | Everything logged. When a fatal error is encountered, or the number of entries more severe than Ignore reaches a, file processing fails.
Info | Info | NO_MAX | Everything more severe than Ignore logged. File processing only fails on fatal errors.
Info | Info | a | Everything more severe than Ignore logged. When a fatal error is encountered, or the number of entries more severe than Ignore reaches a, file processing fails.
Info | Warning | NO_MAX | Everything more severe than Info logged. File processing only fails on fatal errors.
Info | Warning | a | Everything more severe than Info logged. When a fatal error is encountered, or the number of entries more severe than Ignore reaches a, file processing fails.
Info | Recoverable Errors | NO_MAX | Only recoverable errors and fatal errors are logged. File processing only fails on fatal errors.
Info | Recoverable Errors | a | Only recoverable errors and fatal errors are logged. When a fatal error is encountered, or the number of entries more severe than Ignore reaches a, file processing fails.
Info | Fatal Errors | NO_MAX | Only fatal errors are logged. File processing only fails on fatal errors.
Info | Fatal Errors | a | Only fatal errors are logged. When a fatal error is encountered, or the number of entries more severe than Ignore reaches a, file processing fails.
Warning | Ignore | NO_MAX | Everything logged. File processing only fails on fatal errors.
Warning | Ignore | a | Everything logged. When a fatal error is encountered, or the number of errors in total (including warnings) reaches a, file processing fails.
Warning | Info | NO_MAX | Everything more severe than Ignore logged. File processing only fails on fatal errors.
Warning | Info | a | Everything more severe than Ignore logged. When a fatal error is encountered, or the number of errors in total (including warnings) reaches a, file processing fails.
Warning | Warning | NO_MAX | Everything more severe than Info logged. File processing only fails on fatal errors.
Warning | Warning | a | Everything more severe than Info logged. When a fatal error is encountered, or the number of errors in total (including warnings) reaches a, file processing fails.
Warning | Recoverable Errors | NO_MAX | Only recoverable errors and fatal errors are logged. File processing only fails on fatal errors.
Warning | Recoverable Errors | a | Only recoverable errors and fatal errors are logged. When a fatal error is encountered, or the number of errors in total (including warnings) reaches a, file processing fails.
Warning | Fatal Errors | NO_MAX | Only fatal errors are logged. File processing only fails on fatal errors.
Warning | Fatal Errors | a | Only fatal errors are logged. When a fatal error is encountered, or the number of errors in total (including warnings) reaches a, file processing fails.
Recoverable Errors | Ignore | NO_MAX | Everything logged. File processing only fails on fatal errors.
Recoverable Errors | Ignore | a | Everything logged. When a fatal error is encountered, or the number of recoverable errors reaches a, file processing fails.
Recoverable Errors | Info | NO_MAX | Everything more severe than Ignore logged. File processing only fails on fatal errors.
Recoverable Errors | Info | a | Everything more severe than Ignore logged. When a fatal error is encountered, or the number of recoverable errors reaches a, file processing fails.
Recoverable Errors | Warning | NO_MAX | Everything more severe than Info logged. File processing only fails on fatal errors.
Recoverable Errors | Warning | a | Everything more severe than Info logged. When a fatal error is encountered, or the number of recoverable errors reaches a, file processing fails.
Recoverable Errors | Recoverable Errors | NO_MAX | Only recoverable errors and fatal errors are logged. File processing only fails on fatal errors.
Recoverable Errors | Recoverable Errors | a | Only recoverable errors and fatal errors are logged. When a fatal error is encountered, or the number of recoverable errors in total reaches a, file processing fails.
Recoverable Errors | Fatal Errors | NO_MAX | Only fatal errors are logged. File processing only fails on fatal errors.
Recoverable Errors | Fatal Errors | a | Only fatal errors are logged. When a fatal error is encountered, or the number of recoverable errors in total reaches a, file processing fails.
Fatal Errors | Ignore | N/A | Everything logged. Only fatal errors cause file processing to fail.
Fatal Errors | Info | N/A | Everything more severe than Ignore logged. Only fatal errors cause file processing to fail.
Fatal Errors | Warning | N/A | Everything more severe than Info logged. Only fatal errors cause file processing to fail.
Fatal Errors | Recoverable Errors | N/A | Only recoverable errors and fatal errors are logged. Only fatal errors cause file processing to fail.
Fatal Errors | Fatal Errors | N/A | Only fatal errors are logged. Only fatal errors cause file processing to fail.
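The combinations above reduce to two independent rules: the log threshold determines which entries are written to the log, and the error threshold together with the non-fatal maximum determines when file processing fails. The following sketch illustrates that logic; the enum and method names are illustrative assumptions, not the LDR's actual API.

```java
// Severity levels ordered least to most severe, mirroring the manual's
// Ignore < Info < Warning < Recoverable Errors < Fatal Errors ordering.
// These names are illustrative, not the LDR's actual API.
enum Level { IGNORE, INFO, WARNING, RECOVERABLE, FATAL }

class ThresholdDemo {
    static final int NO_MAX = -1;

    // An entry is logged when its level is at or above the log threshold.
    static boolean shouldLog(Level entry, Level logThreshold) {
        return entry.ordinal() >= logThreshold.ordinal();
    }

    // Processing fails on any fatal error, or when the count of entries at
    // or above the error threshold reaches the non-fatal maximum "a".
    static boolean shouldFail(Level entry, Level errorThreshold,
                              int countAtOrAbove, int maxNonFatal) {
        if (entry == Level.FATAL) return true;
        if (maxNonFatal == NO_MAX) return false;
        return entry.ordinal() >= errorThreshold.ordinal()
            && countAtOrAbove >= maxNonFatal;
    }
}
```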
10.1.2.1 Default Log
The Default Parser Log implements the functionality as described in Table 23.
All log messages are simply written to a file that the user must specify. This is
the recommended log, unless the user wishes to have custom error handling.
10.1.2.2 Empty Log
The Empty Parser Log is the same as the Default Parser Log, except that it never
logs any messages to file. It only tracks the number of errors encountered, and
determines when the error threshold has been reached, such that file processing
fails. The Log Threshold has no impact on the Empty Parser Log. This is almost
the same as using the Default Parser Log with the Log Threshold set to Fatal
Errors, except that the Default Parser Log will still write the fatal error to file,
whereas the Empty Parser Log will not.
10.1.2.3 Custom Logging
When using the LDR as an API, it is possible to write a custom logging class
that implements the ParserLog interface. Then, the user has complete control
over all error handling except in case of Fatal Errors, which will always cause
file processing to fail. For instance, when implementing a custom log, the user
may choose to ignore certain error types and fail on others. Alternatively, rather
than logging to file, the user may choose to stream the information to another
downstream system, or use it as part of some control mechanism.
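A custom log along these lines might look like the following minimal sketch. It assumes a simplified ParserLog with only a log method and a threshold check; the real interface's methods, including its three threshold properties, will differ.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in for the LDR's ParserLog interface; the real
// interface's method names and signatures may differ.
interface ParserLog {
    void log(int errorCode, String message);
    boolean thresholdReached();
}

// A custom log that streams messages to an in-memory list (in practice this
// could be a downstream system) and fails after a fixed number of errors.
class StreamingParserLog implements ParserLog {
    private final List<String> sink = new ArrayList<>();
    private final int maxErrors;
    private int count = 0;

    StreamingParserLog(int maxErrors) { this.maxErrors = maxErrors; }

    @Override public void log(int errorCode, String message) {
        count++;
        sink.add(errorCode + ": " + message);
    }

    @Override public boolean thresholdReached() { return count >= maxErrors; }

    List<String> messages() { return sink; }
}
```

Fatal errors remain outside the custom log's control: they always cause file processing to fail regardless of what the implementation does.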
10.1.3 Error Types
In order to provide useful error messages, and to allow users to handle errors
based on the type of error (and the error level), all errors that are sent to the
ParserLog have their own error type. In addition, all Fatal Errors thrown from
the LDR which are of the LDRException type have an Error Type associated.
SAXParseExceptions and IOExceptions are treated separately, and do not have
an Error Type; these are discussed in section 10.1.4.
If required for an external process, it is possible to obtain an error code from the
error type, since the error types themselves are simply enums, from which an
ordinal value can be obtained. A description of each of the error types is
included in Table 24, along with the associated ordinal value (code) of each
error type.
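Since the error types are enums, obtaining the code is just a matter of reading the ordinal. The sketch below shows the idea using the first five types from Table 24; the enum name LdrErrorType is an assumption for illustration, not necessarily the class name used by the LDR.

```java
// Illustrative subset of the LDR error-type enum; the codes are the
// ordinals, matching the Code column of Table 24.
enum LdrErrorType {
    ALL,                        // 0
    NO_ERROR,                   // 1
    TOP_LEVEL_EXCEPTION,        // 2
    TOP_LEVEL_IO_EXCEPTION,     // 3
    TOP_LEVEL_SAX_EXCEPTION     // 4
}

class ErrorCodeDemo {
    // An external process can obtain a numeric code from the error type.
    static int code(LdrErrorType type) {
        return type.ordinal();
    }
}
```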
Table 24 – Error Types
Code
Error Type
0
ALL
1
NO_ERROR
Description
Never actually thrown as an error and will
not appear in any error output. This error
type has simply been introduced for field
based error filtering to catch all errors
(which is the same as specifying no error
type to filter on in the errors-rule-type).
Should not appear in any error output.
Simply used such that the error code 0,
indicates success (introduced for possible
later use)
Top Level Exception Types – Used for Repackaging Errors
General wrapping error type used in some
instances when another unknown
exception has occurred. This Error Type is
simply used to catch errors of some
unknown type and report to the ParserLog
prior to failing. An exception with this
Error Type will never be thrown from the
LDR without a root cause attached.
2
TOP_LEVEL_EXCEPTION
3
TOP_LEVEL_IO_EXCEPTION
The root cause exception should be
investigated to fix this problem.
Occurs whenever there an IOException
occurs while reading the file. These are
generally irrecoverable, and are thrown for
things such as a file not existing, having
incorrect permissions etc. This Error Type
is simply used to catch all IO errors and
report to the ParserLog prior to failing. An
exception with this Error Type will never
be thrown from the LDR, as the root cause
exception will be thrown.
The root cause exception should be
investigated to fix this problem.
Occurs whenever there is an error in the
format of the input specification file. This
Error Type is simply used to catch all xml
errors and report to the ParserLog prior to
failing. An exception with this Error Type
will never be thrown from the LDR, as the
root cause exception will be thrown.
The root cause exception should be
investigated to fix this problem.
4
The information provided to the user is
described in more detail in section 10.1.9
TOP_LEVEL_SAX_EXCEPTION
General Not Supported Issues
LAVASTORM ANALYTICS
lavastorm.com
Page 194
Issue 1
LDR User Manual
5
Thrown for case where the current version
of the LDR does not support what the user
is attempting. For instance, in version 1.0
of the LDR this will occur in the ASN.1
Converter when attempting to use the
Packed Encoding Rules.
NOT_SUPPORTED_EXCEPTION
Logging Errors
6
Error thrown when a log file is specified
to the ParserLog which cannot be accessed
or written to.
UNABLE_TO_WRITE_TO_LOG
Errors With Parameters to the LDR
7
Error Type describing that the build dir
location for dynamically constructed
Parser classes is incorrect. The build
directory is located under the build
subdirectory of the classpath location
specified in config to the LDR. Ensure
that this classpath is correct.
MALFORMED_BUILD_DIR_URL
Property Errors
8
9
PROPERTIES_NOT_SET
Error thrown is user written code (in an
expr or code tag, or a read, scan, skip or
test method etc) when an attempt is made
to access the LdrProperties object on a
Parser, however no properties were passed
to the LDR.
PROPERTY_EXCEPTION
Error thrown is user written code (in an
expr or code tag, or a read, scan, skip or
test method etc) when an attempt is made
to access a property from the
LdrProperties object on a Parser, and an
exception is thrown while trying to obtain
the property. The contained message
thrown from the properties object will also
be returned. This is generally for cases
where a property is not set.
File Structure Errors
10
11
FAILED_DATA_LAYOUT
This error message is thrown whenever
there is an issue while parsing a data file
that forces processing to halt. As much
information as possible as to the state of
the parsing operation is returned to the
user, as discussed in section 10.1.7.
MAXIMUM_NUMBER_OF_NON_FATAL_ERRORS
Error Type indicating the maximum
number of non fatal errors (with error
level >= error level threshold) set on the
ParserLog is reached. The details on how
this level is reached are described in
section 10.1.2.
This error type is used to indicate that the
input specification was used to
successfully parse part of the file, with the
input specification completely used.
However, the file was not completely read
at the end of processing.
12
LAVASTORM ANALYTICS
lavastorm.com
INCOMPLETE_FILE_PARSE
Page 195
Issue 1
LDR User Manual
For instance, this could occur if we had
100 numeric fields in a file, however, our
specification states that we read a min of
1, and max of 5 numeric fields. In this
case, the spec will be satisfied by the file,
but does not completely satisfy the file
itself.
General DRIX Definitional Issues: Duplicate Definition Errors
13
Error occurs if there are multiple types in
the specification with the same fully
qualified name. This means that two types,
with the same type name, in the same
namespace appear in the specification.
DUPLICATE_TYPE_DEFINITION
Error occurs if two or more parameters are
declared on a type, with the same
parameter name attribute.
14
This error will only be thrown when the
type is attempted to be compiled.
Therefore, this only occurs if the declared
type is ever directly or indirectly
referenced by the primary field.
DUPLICATE_PARAM_ON_TYPE
Error occurs if a type declares a
parameter, where somewhere in the type’s
inheritance chain a parameter with the
same name is also declared.
15
This error will only be thrown when the
type is attempted to be compiled.
Therefore, this only occurs if the declared
type is ever directly or indirectly
referenced by the primary field.
DUPLICATE_INHERITED_PARAM_ON_TYPE
Error occurs if two or more template
parameters are declared on a type, with the
same template parameter name attribute.
16
This error will only be thrown when the
type is attempted to be compiled.
Therefore, this only occurs if the declared
type is ever directly or indirectly
referenced by the primary field.
DUPLICATE_TEMPLATE_PARAM_ON_TYPE
Error occurs if two or more fields are
declared on a type, with the same field
name attribute. If the type contains
anonymous fields, then any named field
lying under the anonymous field can also
cause a DUPLICATE_FIELD_ON_TYPE
error.
17
18
LAVASTORM ANALYTICS
lavastorm.com
DUPLICATE_FIELD_ON_TYPE
This error will only be thrown when the
type is attempted to be compiled.
Therefore, this only occurs if the declared
type is ever directly or indirectly
referenced by the primary field.
DUPLICATE_INHERITED_FIELD_ON_TYPE
Error occurs if a type contains a field,
where somewhere in the type’s inheritance
chain a field with the same name is also
declared.
If the type or parent type contains
Page 196
Issue 1
LDR User Manual
anonymous fields, then any named field
lying under the anonymous field can also
cause the same error.
This error will only be thrown when the
type is attempted to be compiled.
Therefore, this only occurs if the declared
type is ever directly or indirectly
referenced by the primary field.
Error occurs if two or more type
parameters are declared on a generated
type, with the same type parameter name
attribute.
19
DUPLICATE_TYPE_PARAM_ON_GENERATED_TYPE
This error will only be thrown when the
type is attempted to be compiled.
Therefore, this only occurs if the declared
type is ever directly or indirectly
referenced by the primary field.
General DRIX Definitional Issues: Argument Errors
20
21
22
23
LAVASTORM ANALYTICS
lavastorm.com
INSUFFICIENT_TYPE_ARGUMENTS
This error is thrown when a field is
declared to be of a generatedType, but
does not supply sufficient type arguments
to fully define the type. The position
within the input specification where the
error occurs is returned to the user to
allow for easy debugging. Ensure that all
typeParams on the generatedType have
corresponding typeArgs on the field.
MISSING_TYPE_ARGUMENT
This error is thrown when a field is
declared to be of a generatedType, but one
of the type parameters declared on the
generated type does not have a
corresponding type argument on the field.
Whereas the
INSUFFICIENT_TYPE_ARGUMENTS
Error Type specifically relates to an
insufficient number of type arguments
declared on the field, this Error Type
relates to the names of the type arguments
not being correct for the declared
typeParams. The position within the input
specification where the error occurs is
returned to the user to allow for easy
debugging. Ensure that all typeParams on
the generatedType have corresponding
typeArgs on the field.
INSUFFICIENT_TEMPLATE_ARGUMENTS
Error Type used to define that an
insufficient number of template arguments
are declared on a field that is bound to a
template type. Ensure that the field
provides templateArgs for all of the
corresponding templateParams on the type
to which it is bound. The position within
the input specification where the error
occurs is returned to the user to allow for
easy debugging.
INVALID_NUMBER_OF_TEMPLATE_ARGUMENTS
Error Type used to define that the number
of template arguments declared on a field
does not exactly match the number of
Page 197
Issue 1
LDR User Manual
template parameters defined on the type to
which it is bound. In general, a field can
supply more template arguments than the
number of template parameters defined on
the type, unless the template arguments
are provided via the shorthand “{tArg,
tArg2}” notation. In this case, the number
of template arguments must match exactly
the number of template parameters.
Similarly, the order of the supplied
template arguments in this case must be
the same as the order the template
parameters are declared on the type.
Ensure that the field declares
templateArgs for all of the corresponding
templateParams on the type to which it is
bound in the correct order if using the
“{}” notation. The position within the
input specification where the error occurs
is returned to the user to allow for easy
debugging. If still experiencing issues, it
may be easier to specify all of the template
arguments in the long-hand templateArg
notation until the error is resolved.
24
Error Type occurring when using named
template arguments (via the templateArg
tag), where the number of template
arguments is sufficient for the type to
which a field is bound, however one of the
template parameters does not have a
corresponding template argument
specified. For example if on a type we
defined:
<templateParam name=”tParam”/>
And the on a field bound to this type we
specified a templateArg:
<templateArg name=”templateParam”
type=”….”/>
We would have sufficient template
arguments, however no argument for
“tParam” is ever provided. The position
within the input specification where the
field is defined, and the templateParam tag
which is unsatisfied is returned to the user
to allow for easy debugging.
MISSING_TEMPLATE_ARGUMENT
Error thrown when a constant template
argument is provided to a normal template
parameter, or a normal template argument
is provided to a constant template
parameter.
25
26
27
LAVASTORM ANALYTICS
lavastorm.com
MISMATCHED_TEMPLATE_ARGUMENT
The error message will specify which
template argument & parameter this
corresponds to in the DRIX & these errors
should be easily fixable.
MISMATCHED_TEMPLATE_ARGUMENT_RETURN_TYPE
Error thrown when a template argument is
supplied to a template parameter, but does
not satisfy the returnType requirements
specified on the template parameter.
MISMATCHED_TEMPLATE_ARGUMENT_BASE_TYPE
Error thrown when a template argument is
supplied to a template parameter, but does
not satisfy the baseType requirements
Page 198
Issue 1
LDR User Manual
specified on the template parameter.
28
UNNAMED_CONSTANT_TEMPLATE_ARGUMENT
Error occurs where an attempt is made to
supply a constant template parameter
using the shorthand {} notation. Currently,
constant template arguments can only be
provided using the long-hand
<templateArg …> notation.
Only constant template parameters
declared to be of primitiveTypes (e.g. int,
uint), primitive wrapper types (e.g.
Integer) or “String” are able to be
provided with constant template
arguments using the value attribute
notation. All other types must be supplied
using an expr tag.
29
ILLEGAL_STATIC_TEMPLATE_ARGUMENT_VALUE_TYPE
This error is thrown when a value attribute
form is used where the declared type of
the constant template parameter requires
that an expr tag be used.
Error Type returned when the code within
a generator tag is attempting to set an
ArgBinding which is not correct for the
tag for which the ArgBinding is being set.
For instance, max and min can have static
binding, however emittable and typeFrom
tags cannot. The details of the allowable
ArgBindings for the tag are returned.
30
This error will be wrapped in an
EXCEPTION_THROWN_BY_GENERA
TOR Error Type, which will provide
details on the location of the generator
method in the input specification.
ILLEGAL_ARG_BINDING
Error thrown when a javaType field has
arg tags declared. These are not allowed
on a javaType field. This will be handled
in a SAXParseException when the error
occurs in an input specification, however
if the error occurs in the code for a
generator method, this Error Type will be
returned.
31
This error will be wrapped in an
EXCEPTION_THROWN_BY_GENERA
TOR Error Type, which will provide
details on the location of the generator
method in the input specification.
JAVA_TYPE_WITH_ARGS
Error thrown when a javaType field has
templateArg tags declared. These are not
allowed on a javaType field. This will be
handled in a SAXParseException when
the error occurs in an input specification,
however if the error occurs in the code for
a generator method, this Error Type will
be returned.
32
LAVASTORM ANALYTICS
lavastorm.com
This error will be wrapped in an
EXCEPTION_THROWN_BY_GENERA
TOR Error Type, which will provide
JAVA_TYPE_WITH_TEMPLATE_ARGS
Page 199
Issue 1
LDR User Manual
details on the location of the generator
method in the input specification.
Error thrown when a javaType field has
typeArg tags declared. These are not
allowed on a javaType field. This will be
handled in a SAXParseException when
the error occurs in an input specification,
however if the error occurs in the code for
a generator method, this Error Type will
be returned.
33
34
JAVA_TYPE_WITH_TYPE_ARGS
This error will be wrapped in an
EXCEPTION_THROWN_BY_GENERA
TOR Error Type, which will provide
details on the location of the generator
method in the input specification.
SUPER_ARG_NOT_DECLARED_ON_PARENT
This is thrown when a type contains a
super->arg tag combination, where the
referenced argument is not declared on the
parent type – or anywhere in the parent
type’s inheritance chain.
Error thrown when a constant "value"
attribute provided on a field which is
either dynamically bound, or bound to a
generated type.
In order to specify argument values to
dynamically bound fields, or fields bound
to generated types, the "expr" element
must be used.
When providing the values through the
element form, the user can explicitly
surround the expression with quote (")
characters, implying that the expression is
to be treated as a String. This is not
possible in the "value" attribute form, as
all values must be surrounded with quote
(") characters.
STATIC_VALUE_ON_DYNAMIC_FIELD
Therefore, the type of the parameter to
which the arg is supplying a value is used
to determine whether or not the argument
itself is a literal String.
When dynamic binding is used, or when a
field is bound to a generated type, it is not
possible to know how to interpret the
argument.
STATIC_TEMPLATE_ARGUMENT_VALUE_ON_DYNAMIC_
FIELD
Similar to the above error condition, this
case occurs when a template argument is
supplied via a value attribute rather than
an expr tag when the field under which the
templateArg is declared has some
dynamic properties.
37
STATIC_VALUE_ON_DYNAMIC_TEMPLATE_ARGUMENT
Similar to the above error condition, this
case occurs when a dynamically bound
template argument is provided with arg
tags and the arg tags contain static value
attributes. In such cases, expr tags must be
used on the arg tags.
38
ILLEGAL_TEMPLATE_ARG_BINDING_UNDER_SUPER
Thrown if there is an invalid template arg
35
36
LAVASTORM ANALYTICS
lavastorm.com
Page 200
Issue 1
LDR User Manual
binding under a super tag. Under the super
tag, only statically bound template
arguments can be used. These can be
provided using the <templateArg
name=”..” value=”..”/> notation.
39
PARENT_TEMPLATE_PARAM_NOT_DECLARED
This is thrown when a type contains a
super->templateArg tag combination,
where the referenced template argument is
not declared as a template parameter on
the parent type – or anywhere in the parent
type’s inheritance chain.
This error will be thrown in the following
situation:
A super->templateArg notation is used on
a type, in the form:
<type name=”W”>
<super>
<templateArg name=”X”
type=”Y”/>
</super>
</type>
TEMPLATE_PARAM_NOT_DECLARED
However, there is no template parameter
“Y” declared on the type “W”.
TYPE_BOUNDING_ON_CONSTANT_TEMPLATE_PARAM
Error occurs when a constant template
parameter has either a returnType or a
baseType defined. These can only be
defined on “normal” template parameters.
INVALID_ARG_CLASS
This exception is thrown if an argument is
supplied to a parameter, and the type of
the argument does not match the required
type of the parameter.
This implies that the argument needs to be
changed.
ARG_OVERFLOW
Thrown in cases where an argument is
supplied to a parameter, and the value of
the argument cannot be packed into the
parameter (e.g. long to int conversion,
with the value > maximum value of an int)
INVALID_ARG_SIGN
Thrown when a signed argument is
supplied to an unsigned parameter, and the
signed argument is negative
45
INVALID_TYPE_ARG_CLASS
Thrown when a type argument is supplied
to a generatedType and the class of the
type argument cannot be cast to the type
of the typeParam to which it is supplied.
46
TYPE_PARAM_NOT_DECLARED
Thrown if a typeArg references a non
existant typeParam.
40
41
42
43
44
General DRIX Definitional Issues: General DRIX Errors
47
LAVASTORM ANALYTICS
lavastorm.com
This error occurs if a fromField tag is used
within a primitiveType tag. Since
primitive types cannot have fields, a
fromField tag cannot appear within a
primitiveType. This can only occur in a
FROM_FIELD_ON_PRIMITIVE
Page 201
Issue 1
LDR User Manual
super->arg tag under a primitiveType. The
position within the input specification
where the error occurs is returned to the
user to allow for easy debugging.
48
Error occurs when a primitiveType is
declared without a name attribute.
PrimitiveTypes cannot be anonymous, as
described in section 5.2.2.11.
ANONYMOUS_PRIMITIVE_FIELD
49
INVALID_INHERITANCE_TYPE
50
INVALID_FIELD_ACCESS_INDICES
This error occurs when a primitiveType is
declared to have a parentType which is
not a primitiveType, or when a standard
type is declared to have a parentType
which is not a standard type. The position
within the input specification where the
error occurs is returned to the user to
allow for easy debugging.
When a constructed type has a userwritten read method, then the fields are
obtained directly off the containing type,
and not via reading the subfields
individually. The subfields are accessed
using indices of the position of the field in
the containing type, and the field
occurrence numbers. If these are invalid,
then this error type is reported. This
should not generally happen and indicates
an internal fault in the processing of the
LDR.
Error Type occurring when a skip field is
declared to skip a javaType. This cannot
occur in an input specification, however
can occur from the code in a generator
method. The details of the type to skip are
included.
51
52
53
54
55
LAVASTORM ANALYTICS
lavastorm.com
JAVA_SKIP_TYPE
This error will be wrapped in an
EXCEPTION_THROWN_BY_GENERA
TOR Error Type, which will provide
details on the location of the generator
method in the input specification.
INVALID_ATTRIBUTE
Error Type describing that an attribute
does not match the required format. The
details of where the attribute is declared in
the input specification and the expected
format are reported to the user along with
the received attribute value.
ANONYMOUS_JAVA_TYPE,
Error occurs when a javaType field is
anonymous. javaType fields cannot be
anonymous, as described in section
5.2.2.11.
DYNAMIC_ANONYMOUS_FIELD
Error occurs when an anonymous field is
declared to have dynamic properties.
Anonymous fields cannot declare type
arguments, dynamic template arguments
or be dynamically bound.
ABSTRACT_INSTANTIATION
Error occurs if a primitiveType is
attempted to be instantiated, where the
primitiveType is abstract. A
Page 202
Issue 1
LDR User Manual
primitiveType is declared to be abstract if
it:
Does not declare or inherit a
returnType
Does not declare or inherit a
readMethod
Abstract primitiveTypes can be declared
in the specification, a field can never be
declared to be of an abstract
primitiveType. A concrete primitiveType
which inherits from the abstract
primitiveType must be used.
56
Error thrown when user written code in a
Generated Type attempts to set an offset,
align position etc using the bookmark
(“byte:bit”) notation, however the value is
not correct. This will be handled
differently when the error occurs via an
input drix, and a SAXParseException will
be thrown. However, when using the
Generator API, an exception with this
error type will be thrown and wrapped in
an
EXCEPTION_THROWN_BY_GENERA
TOR exception.
INVALID_BOOKMARK_VALUE
If we have a type A, and this is declared to
have a parentType B, and A is declared to
be emittable, with an emittable type X.
This is an error, if B, or any parent type of
B is declared to be emittable with an
emittable type Y, where: X<>Y and X does
not inherit from Y.
57
The scope of the emittable type can only
be narrowed, not widened down the
inheritance chain.
INVALID_EMITTABLE_INHERITANCE
The type attribute in an emittable tag must
correspond to one of:
A java type
A primitiveType with a declared
(or inherited) returnType
A standard/constructed type
with a valid emittable clause
INVALID_EMITTABLE_TYPE
If the emittable type does not correspond
to any of the above, this error is thrown.
SUPER_TAG_WITH_NO_PARENT_TYPE
Error thrown when a DRIX contains a
type which has a super tag, but the type
itself does not have a parentType declared.
60
INVALID_FROM_FIELD_INDICES
This error in cases where a fromField
argument exists, and the number of field
indices do not match the number required.
Essentially, this means that if we have a
field a.b.c, and b exists under a nested
anonymous loop on a, such that it is
possible to have a a.b[i][j].c, then
referencing a.b[i].c, or a.b[i][j][k].c will
throw this error.
61
NO_NEXT_FIELD_FOR_UNTIL
58
59
LAVASTORM ANALYTICS
lavastorm.com
Page 203
Issue 1
LDR User Manual
Occurs when a repeatRange tag contains
an until=”nextField” clause, however
within the type the repeatRange clause is
the last field structure element.
62
Currently, only field, skip or or tags are
allowed to follow immediately after the
repeatRange tag when a until=”nextField”
attribute is used within the repeatRange.
This error is thrown if a different element
(e.g. another loop) occurs after the
repeatRange.
INVALID_UNTIL_FIELD
General DRIX/DROX Definitional Issues: Not Found Errors
63
64
65
66
LAVASTORM ANALYTICS
lavastorm.com
PARAM_NOT_DECLARED_ON_TYPE
Error occurs when an arg tag is defined on
a field, where the type referenced does not
contain the corresponding param tag.
As discussed in section 5.2.1.10, this is
not a problem where the field has dynamic
properties.
However, if the field:
- is not dynamically bound to a type,
- does not have dynamically bound template arguments, and
- does not specify any type arguments,
then the arg tag must correspond to a param tag on the referenced type. The position (line number, column number) within the input specification where the arg is declared is returned to the user to allow for easy debugging.
64 PARENT_TYPE_NOT_FOUND
This error occurs if the type referenced in
a parentType attribute cannot be located in
the Spec. Ensure that the type is correctly
referenced, with a namespace qualified
name if necessary, or within a using stack.
Also ensure that if the type lies in another
library, this library is being included
correctly. The position within the input
specification where the error occurs is
returned to the user to allow for easy
debugging.
65 PARAM_TYPE_NOT_FOUND
This error occurs if the type referenced in
a param->type attribute cannot be located
in the Spec. Ensure that the type is
correctly referenced, with a namespace
qualified name if necessary, or within a
using stack. Also ensure that if the type
lies in another library, this library is being
included correctly. Ensure that if the type
is a javaType, it is labeled as a javaType,
and not simply a type. The position within
the input specification where the error
occurs is returned to the user to allow for
easy debugging.
66 FIELD_TYPE_NOT_FOUND
This error occurs if the type referenced in
a field->type attribute cannot be located in
the Spec. Ensure that the type is correctly
referenced, with a namespace qualified
name if necessary, or within a using stack.
Also ensure that if the type lies in another
library, this library is being included
correctly. Ensure that if the type is a
javaType, it is labeled as a javaType, and
not simply a type. The position within the
input specification where the error occurs
is returned to the user to allow for easy
debugging.
67 FIELD_NOT_FOUND
Error Type indicating a field referenced in
the output specification does not appear in
the input specification. Details of the field
referenced are provided to the user.
68 RETURN_TYPE_NOT_FOUND
Error thrown when the declared return
type of a primitive type cannot be located
(is not a valid java class and does not
reference an LDR type).
69 TEMPLATE_PARAM_TYPE_NOT_FOUND
Error thrown when a constant template
parameter is declared to be of a certain
type, however the type cannot be found.
70 TEMPLATE_PARAM_BASE_OR_RETURN_TYPE_NOT_FOUND
Error thrown when a template parameter
declares a base or returnType, however
the base/return type declared cannot be
found.
General DRIX Definitional Issues: DRIX Logic Errors
71 POSSIBLE_INFINITE_LOOP
Error thrown if there is the possibility of an infinite loop in the input DRIX. This will generally occur on an unbounded repeatRange surrounding an or-true tag, or surrounding a repeatRange with a min=0 attribute.
This error is thrown either at the parser
compile time, or at runtime, after the
evaluation of all dynamic arguments, prior
to entering the loop.
72 APPARENT_INFINITE_LOOP
This error is thrown when the LDR
recognizes that it is in an infinite loop
situation. This error is thrown within
while/repeatRange loops, when 10,000
iterations are reached, and the file position
has not been modified since the beginning
of the loop.
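The stalled-loop guard described above can be sketched as follows. This is an illustrative reconstruction, not the actual LDR implementation; the class and method names are hypothetical.

```java
// Hypothetical sketch of the APPARENT_INFINITE_LOOP guard: abort a
// while/repeatRange loop once 10,000 iterations have run without the
// file position advancing.
class LoopGuard {
    static final int MAX_STALLED_ITERATIONS = 10_000;

    // Called each iteration with the position recorded at loop entry,
    // the current file position, and the iteration count.
    static void checkProgress(long positionAtLoopStart, long currentPosition,
                              int iterations) {
        if (iterations >= MAX_STALLED_ITERATIONS
                && currentPosition == positionAtLoopStart) {
            throw new IllegalStateException(
                "APPARENT_INFINITE_LOOP: no progress after "
                + iterations + " iterations");
        }
    }

    public static void main(String[] args) {
        checkProgress(0L, 64L, 20_000); // position advanced: no error
        checkProgress(0L, 0L, 9_999);   // under the threshold: no error
    }
}
```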
73 INFINITE_RECURSION
Error thrown in infinite recursion cases.
This does not catch all cases of infinite
recursion, and it is still possible for the
LDR to throw StackOverflowErrors for
cases when an invalid DRIX is causing a
recursive definition.
However, this catches all of the cases that
are definitely going to cause an infinite
recursion problem.
This can occur for one of two reasons:
- an anonymous field is of the same type as the type within which the field is defined, or
- the first field f defined within a type t is also of type t.
74 ABSOLUTE_OFFSET_WARNING
Error type used as a warning if the first field to parse in a data file has an offset specified. The offset attribute should generally be used relative to the start file position of a containing type. Specifying an offset at the top level is akin to saying that the first x bytes and bits should be skipped.
Compilation Errors
75 TYPE_COMPILATION_ERROR
Error occurs when trying to dynamically compile the Parser classes from the input specification. In order to ensure optimal performance of the LDR, none of the elements or attributes in the input specification that correspond to java code (value tag, readMethod, scanMethod, test etc.) are parsed prior to attempting to compile them. This is normally where the TypeCompilationExceptions will occur.
It is assumed that if the user is utilising these tags, they are able to understand the compiler errors that will be returned. The full compiler stack trace is provided with these error messages, along with the name of the type that is being compiled, and the location in the input specification where the type is declared.
Loading Errors
76 ERROR_LOADING_CLASS
Error type reported if a dynamically constructed and compiled Parser class (from a DRIX type) cannot be loaded. This should not happen and points to an internal LDR error.
Instantiation Errors
77 TYPE_INSTANTIATION_ERROR
Error returned if a dynamically compiled type cannot be instantiated. In general this should not occur. Most errors that could occur during instantiation are caught by an ABSTRACT_INSTANTIATION Error Type. However, it is possible that there is user-defined code which is incorrectly attempting to instantiate a type with incorrect arguments, or attempting to instantiate an abstract type. The location within the input specification of the type being instantiated is also returned to the user.
Errors Occurring During File Parsing
78 FIELD_NOT_SET
This error occurs when a field is attempted to be accessed without the field first being read. This can normally occur in the following situations:
- The field is being referenced correctly, but it does not have the readRequired attribute set.
- The field that is accessed lies within an or tag, and is accessed outside of the or tag, but the field was not set in the or.
- The field is referenced prior to being declared. E.g. field1 is defined prior to field2 under a type, but field1 has a <fromField field="field2"/> tag.
79 INVALID_FIELD_PARSER_ID
This error is thrown by the generated code
if one of the Parser setup methods is called
with a field id for a field that does not lie
under the Parser. This error should not
occur unless user-defined code is
attempting to call one of the setupParser
methods.
80 INVALID_FIELD_NAME_FOR_TYPE
This error is thrown by the generated code
if one of the subFields on a Parser is
attempted to be accessed via field name,
and that field does not lie under the Parser.
This error should not occur unless user-defined code is attempting to call the generalised Parser field accessor.
81 INVALID_FIELD_INDEX_FOR_TYPE
This error is thrown by the generated code
if one of the Parser field setup methods is
called with a field id that does not lie
under the Parser. This error should not
occur unless user-defined code is
attempting to call the field setup method
on a Parser.
82 NO_READ_MECHANISM_SPECIFIED
This error is raised if a standard type does not declare a read method, does not inherit a read method, and has no fields declared within the type from which the engine can determine how to read the field. The error only occurs if the read method of the type is ever called. If a type is to be read, it needs a mechanism to read its subfields. If these fields are not javaType fields, then the read mechanism is autogenerated. For javaType fields, an expr tag is required such that the engine knows how to initialize the fields.
If, however, no fields are declared, or only javaType fields are declared with no expr tag, there is no way for the LDR to read the fields and this Error Type is returned. The position within the input specification where the type is defined is returned to the user to allow for easy debugging.
83 NO_SUB_PARSER_ON_PRIMITIVE
This error is thrown by the generated code
if one of the Parser setup methods is called
for a primitive type. This error should not
occur unless user-defined code is
attempting to call one of the setupParser
methods.
84 NO_CACHED_PARSER_AVAILABLE
This error is thrown by the generated code
if one of the Parser setup methods is
called, with the arguments specifying that
a cached Parser should be used, but that
Parser has not yet been created and
cached. This error should not occur unless
user-defined code is attempting to call one
of the setupParser methods.
85 NO_START_TOKEN
Error Type occurring whenever an end token is placed on the ticker tape for which there is no start token. This should not occur unless the user is writing their own read, scan, skip, skipCount or test methods, or implementing their own code section. In general this will occur when:
- A user-defined read, scan, skip or skipCount method invokes the context skip/scan/read Result or skip/scan/read Start methods. These should never be called by user-written code.
- A user-defined test method, or user-defined code, calls context.read/scan/skipResult.
86 NON_CLONEABLE_ASSUMING_IMMUTABLE
Whenever looping fields are used in the LDR, and their values are required in array form on the containing type, the read field values must be copied from the sub-field onto the containing type. For all LDR types, this is not a problem. For primitives, however, issues can arise. The primitive type will have a return type specified which corresponds to a java primitive or java object.
If it is primitive, there is no problem, as these are immutable and can simply be copied. If the returnType corresponds to a known immutable type (Integer, String, BigInteger etc.), then this also isn't a problem. Furthermore, if the class implements the Cloneable interface, then the field will be cloned.
For all other objects, however, the LDR assumes that they are immutable and logs a warning message.
87 CLONE_EXCEPTION_ON_CLONEABLE_OBJECT
As discussed in Error Type 2, fields will be cloned where required, if they are required on a containing type and the subtype is declared to loop. If required, the LDR will check if the type implements the Cloneable interface, and if it does, it will attempt to clone the field using reflection. If the class of the object, however, is only declared to be cloneable and does not implement the clone method publicly, then this exception is thrown.
In these cases, wherever possible it is recommended that the primitiveType for which this is occurring declare a different returnType which is immutable or publicly cloneable.
88 PARAM_NOT_SET
Thrown when the engine attempts to access a parameter during file parsing, but the parameter has not been set. It is possible for the user to not supply arguments to all parameters; however, these parameters either must not be required, or must have defaults supplied. If a parameter is accessed which has not been set (either through a field->arg tag combination, a super->arg tag combination, or via a param->default tag combination), then an exception with this error type will be thrown.
Generator Errors
89 ILLEGAL_PARENT_RESET_BY_GENERATOR
Error Type occurring specifically within
generator code blocks. When using the
Generator API, you can never assign an
object to lie under two different parent
types (e.g. the same ReadMethod object
declared on two types). The generator
code must use different objects for each of
the LibraryElement types to ensure that
the element tree is traversable in the
same manner as it would be from a DRIX
input specification.
90 GENERATOR_REDEFINITION
This Error Type occurs when the code in a
generator method attempts to set a tag
value twice, where only one of these tags
is allowed. For example, attempting to set
two ReadMethods under a type. This error
will be wrapped in an EXCEPTION_THROWN_BY_GENERATOR Error Type, which will provide details on the location of the generator method in the input specification.
91 INVALID_GENERATOR_VALUE
Error Type returned when the code within a generator tag is attempting to set a value which does not match the required format; for example, a Max value which is not [0-9]+|unbounded. This error will be wrapped in an EXCEPTION_THROWN_BY_GENERATOR Error Type, which will provide details on the location of the generator method in the input specification.
92 EXCEPTION_THROWN_BY_GENERATOR
General catch-all Error Type specifying
that an error occurred within the code of a
generator method. The details of the
contained exception are returned, as is the
position in the input specification of the
generator tag causing the problem.
Errors thrown by Types in DRIX Libraries
93 INVALID_FIELD_LENGTH
Error Type used by primitive types with a
length parameter, where the length
specified is invalid for the primitive type.
94 INVALID_BIAS
This error is thrown by primitive types of an ExcessN format, where the N is a bias. The bias value is a parameter, and if the parameter supplied is invalid for the type, this Error Type is used.
95 DELIM_ARG_EMPTY
Occurs when a delimited type is passed an empty array for its "delims" parameter.
96 WRONG_DELIM_FOUND
Error thrown when a delimited type is defined to have both a record delimiter and a field delimiter, and receives a field delimiter when it expects to see a record delimiter, or vice versa.
97 INVALID_ZONE_NIBBLE
For zoned primitive types, when the upper nibble of a byte is not 0xF.
98 INVALID_DIGIT
For zoned and packed primitive types, when a digit is not in 0-9.
99 INVALID_SIGN
For zoned and packed primitive types, when the sign nibble is not 0xC or 0xD for signed types, or 0xF for unsigned types.
100 EMPTY_STRING_TO_NUMBER
For numeric delimited types, when an empty value is read.
101 NUMBER_FORMAT_EXCEPTION
For numeric primitive and delimited types, when the string cannot be parsed as a number.
102 CHARSET_DECODING_EXCEPTION
For Unicode types, when an illegal/incomplete byte sequence is encountered.
103 BUFFER_UNDERFLOW_EXCEPTION
When a delimited type exhausts the buffer because it cannot find a closing quote and/or a delimiter.
104 OUT_OF_RANGE
Some string to integer types log this error (e.g. AsciiToUInt8) when the number represented in the input string exceeds the range of the type.
105 INVALID_ENDIANNESS
For primitive types with a bitLength parameter, if bitLength is not divisible by 8 and bigEndian=false, then the value is read in bigEndian byte order (i.e. ignores the bigEndian arg) and a warning with this ErrorType is logged.
106 INVALID_DATA_FORMAT
General error type for any case when there appear to be no issues in the input specification file, but the data file cannot be processed as it does not satisfy the specification. This could be due to invalid data or an incorrect specification. These errors can often be difficult to diagnose. However, in order to provide assistance, details on why the field parsing failed, including what success was achieved, are returned to the user. This is described in more detail in section 10.1.7.
107 CONFLICTING_PARAMS_SET
Thrown in cases where multiple mutually exclusive params are supplied with args.
108 INVALID_SIGN_IND
Thrown for invalid positive, negative or unsigned ind args on a zoned type.
109 ILLEGAL_ARGUMENT
General catch-all error for any
illegal/invalid argument supplied to a
parameter.
ValueParser Registry Errors
110 UNKNOWN_VALUE_TYPE_TO_PARSE
Error type logged in cases where an arg value is supplied to a param, and the LDR
does not know how to correctly parse the
param type.
111 ERROR_PARSING_VALUE
Logged if an error is encountered
attempting to parse a static value argument
to a param.
Payload Parsing Errors
112 UNKNOWN_PAYLOAD_TYPE_TO_PARSE
Thrown if there is a payload tag on a type,
and the payload class is unknown.
113 ERROR_PARSING_PAYLOAD
Thrown if there was an error while
attempting to parse a payload value.
114 ERROR_OBTAINING_PAYLOAD_PARSER
Thrown if the payload parser class
specified for a given payload data type
cannot be loaded.
Field Registry Errors
115 FIELD_ALREADY_REGISTERED
Thrown if there is an attempt to register a
field twice with the field registry. This is
an error implying incorrect internal
operation and should never be thrown by
the LDR.
116 PARENT_FIELD_NOT_REGISTERED
Thrown if a subfield is attempted to be
registered prior to the field it lies under.
This is an error implying incorrect internal
operation and should never be thrown by
the LDR.
117 ALIAS_ALREADY_REGISTERED
All fields are registered in the field
registry. When a field has a name, but lies
somewhere under a structure where there
are unnamed fields, then an alias is also
registered. This error is thrown for the
duplicate registration of aliases on a field.
This is an error implying incorrect internal
operation and should never be thrown by
the LDR.
118 FIELD_NOT_REGISTERED
Occurs if a field has not been registered in
the field registry and the registry is
searched for the specified field. This is an
error implying incorrect internal operation
and should never be thrown by the LDR.
Field Name Errors
119 NAME_SYNTAX_ERROR
An error thrown if a fieldname provided
via the DROX is invalid according to the
allowable field patterns.
120 NAME_SUBSTITUTION_ERROR
Thrown when a substitution index
referenced in a names pattern does not
exist in the corresponding field pattern.
For instance, if there are 3 substitution
parts/groups in the field pattern, and an
attempt is made to use substitution
part/group 5, then this will throw an error.
Output Errors
121 EMPTY_OUTPUT_METADATA
Error thrown if the metadata is completely
empty for an output. This means that the
combinations of includes, excludes &
mapping references have resulted in the
selection of no fields.
122 OUTPUT_NOT_REGISTERED
Thrown if an attempt is made to unregister
an output which was not registered with a
Distributor.
123 RECORD_OUTPUT_REGISTRATION_ERROR
Error thrown whenever an attempt to
register a RecordOutput fails. The detailed
reasons of why this fails will be included
in the error message.
124 ATTRIBUTE_REFLECTION_ERROR
Error while attempting to instantiate the
class of a dumper.
125 DATA_READER_OUTPUT_COMPILATION_FILE
Thrown if there is an error building the
output objects from their DROX
definitions.
Output Mapping Errors
126 UNKNOWN_OUTPUT_MAPPING
Error thrown if an output references a
mapping that does not exist in the DROX,
or if a mapping has a mappingReference
pointing to a mapping that does not exist.
127 UNNAMED_OUTPUT_FIELD
This error is generally thrown when using
fromField renaming. If the name comes
from a field which has not been read at the
time the output metadata has been
finalized, this error is thrown.
128 DUPLICATE_OUTPUT_FIELD
Error occurs if there is an incorrect attempt to output multiple fields to the same output field, without using a mapping union.
129 MULTIPLE_MATCHING_FROM_FIELDS
Thrown if a fromField renaming matches
to multiple fields in the DRIX. The
fromField renaming can only match one
field.
130 FIELD_RESOLUTION_FAILURE
Error thrown if the list of fields from a
containing type cannot be obtained. This
is an error implying incorrect internal
operation and should never be thrown by
the LDR.
131 OUTPUT_COMPILATION_FAILED
This is a general error thrown when the
DROX is being used to initialize the
outputs. The error log will contain the
details of why the compilation failed.
132 MULTIPLE_TRIGGER_EVENTS
Error thrown if a mapping contains
multiple trigger events, which have not
been included via mapping unions.
133 OUTPUT_FIELD_AFTER_TRIGGER_EVENT
Warning issued for cases where a field
occurs after a trigger event and therefore
will not be output. As field suspension
should allow for this without problem, this
implies incorrect internal operation and
should never be thrown by the LDR.
134 NO_MATCHING_FIELDS
Error thrown when an include or exclude
field pattern does not match to any of the
fields in the specification. The level of this
error will be dependent on the onNoMatch
settings of the tag under which the
include/exclude is defined.
Output Generation Errors
135 DUMP_COMPILATION_FAILED
General error thrown when the
initialization of outputs from a dump
specification fails. The error log will
contain the details of why the compilation
failed.
136 OUTPUT_GENERATION_FAILED
General error thrown when attempting to
generate outputs using a dump
specification. The error log will contain
the details of why the generation failed.
137 DUMP_PATTERN_CREATION_FAILED
General error thrown when the generation
of a pattern for a dump specification
cannot be performed. The error log will
contain the details of why this failed.
138 DUMPER_COMPILATION_FAILED
Thrown if a dumper cannot be compiled.
Ticker Tape Errors
139 TICKER_TAPE_CREATION_FAILED
Error thrown while attempting to create the tickertape. This will normally be due to file I/O issues where the file-backed ticker tape is attempting to create a ticker tape file where the permissions are incorrect or the location does not exist on disk. The details of the error message will provide more information.
Threading & Timing Errors
140 PARSER_THREAD_CREATION_ERROR
Error thrown while attempting to create
the ParserThread. In general, the error log
should contain the details of why the
thread creation failed.
141 DISTRIBUTOR_THREAD_CREATION_ERROR
Error thrown while attempting to create
the DistributorThread. In general, the error
log should contain the details of why the
thread creation failed.
142 STOP_EVENT
Error type reported when one of the
threads in the LDR is told to stop. This
could be because of an error in another
thread, or because the LDR was explicitly
told to stop via the API.
Catch-all for Vault Conversion Errors
143 VAULT_CONVERSION_ERROR
Thrown if any error occurs during the
conversion of the deprecated GDR Vault
that is related to the Vault itself, and not
specifically to the LDR.
144 COBOL_CONVERTER_ERROR
Thrown for errors occurring during
conversion of COBOL specifications to
LDR types.
145 ASN1_CONVERTER_ERROR
Thrown for errors occurring during
conversion of ASN.1 specifications to
LDR types.
10.1.4 Exception Types
The previous section defined all of the Error Types that can occur for an
LDRException. However, LDRExceptions are not the only errors that are thrown
from the LDR. There are also SAXExceptions, IOExceptions, and other non-declared exceptions that can be thrown from invalid code. Most of the exceptions
that will be seen are one of the three exception types mentioned and these are
briefly discussed in this section.
10.1.4.1 LDRException
Most exceptions thrown from the LDR are either LDRExceptions or
SAXExceptions. In general, LDRExceptions are thrown for cases when the
input specification is valid but issues arise attempting to construct an object
model from the input specification. These exceptions are also thrown when the
data is unable to be parsed according to the specification. An LDRException
itself will never be thrown. Rather, one of its subtypes will be thrown. These
subtypes are:
FieldNotFoundException
This is an exception thrown when a field is referenced in the output
specification but does not exist in the input specification.
TypeNotFoundException
This exception is thrown when a field is declared to be of a particular type,
however that type cannot be located in the input specification.
TypeInstantiationException
This exception occurs when the dynamically constructed Parser class for a
type cannot be instantiated.
ParserException
This is a general exception case which covers a wide variety of the Error
Types listed in Table 24. The most common type of ParserException that will
be thrown is a FAILED_DATA_LAYOUT error which occurs when the data
file cannot be parsed according to the specification. Any errors with
compilation (due to invalid Read, Scan, Skip, Code etc methods) will also
result in a ParserException. In addition to these, any errors caused by an
invalid generator method will result in a ParserException.
OutputException
This is a general exception case which covers most of the errors occurring
while initializing the outputs. For example a common case where this error
would be thrown is for duplicate trigger events, or fields referenced in the
output that are not declared in the input specification.
10.1.4.2 SAXException
SAXExceptions are thrown whenever there is an issue in parsing the input
DRIX specification, or any of the included libraries. There are two major types
of SAXException – those that are thrown by the SAXParser, and those that are
thrown by the LDR itself.
The LDR uses a SAXParser to validate the input DRIX specification files
against the LDR Input Specification XSD. If there are any general XML issues,
whereby the specification does not conform to the XSD, then the SAXParser
will throw an exception.
For all other cases, which involve some validation on the logic in the input
specification, and validation external to the XSD, the LDR will generate a
SAXParseException. The details on how these exceptions are reported back to
the user are contained in section 10.1.9.
10.1.4.3 IOException
IOExceptions cover all general I/O issues that can occur while processing a data
file. In general, attempts to read beyond the end of a file should be handled by
the primitive types that access the buffer. The buffer itself will throw an
LDRBufferUnderflowException, which should be checked for by all primitive
types whenever using the buffer. However, in other cases where files are non-existent, or corrupted/deleted after opening, IOExceptions can be thrown.
10.1.4.4 Non Declared Exceptions from custom code
Since the LDR allows for users to write their own code in scan, skip, read,
skipCount, test and generator methods, and also in extCode and code blocks,
there can be other non-declared exceptions that get thrown from the LDR. These
should not occur unless there is user-defined code in an input specification or
referenced library.
10.1.5 Error Filtering on Fields
Within a “field” tag in a DRIX, it is possible to define actions to perform in case
of certain errors. These filters apply to errors occurring on the field itself, and all of
its nested subfields. The errorFilters tag has the properties shown in DRIX Tag 55.
DRIX Tag 55 errorFilters
<errorFilters>
Description: Specifies the error filters that are to be applied to a field and all of its nested subfields.
Position: 0..1 errorFilters tags may exist within a field tag. When it exists, the errorFilters tag must be the last element to appear under a field tag.
Attributes: None
Elements: 1..* error tags
The contents of the errorFilters tag are one or more error tags, with the properties
shown in DRIX Tag 56.
DRIX Tag 56 error
<error>
Description: Specifies the action that is to be performed for all errors that match the specified rules.
Position: 1..* error tags may exist within an errorFilters tag.
Attributes: None
Elements: 0..* rule tags; 1 action tag. Any rule tags must be defined prior to the action tag.
Each of these error tags specifies that a given action will be performed for all
errors logged on the field (or subfields) under which the errorFilters tag is
declared, whenever the error matches the specified rules on the error tag. If no
rules are specified, the action will be performed on all errors occurring on the
field.
When present, any of the rules must be matched for the action to execute
(therefore the test is a logical or, not a logical and). The rules to apply are
specified in a sequence of “rule” tags as shown in DRIX Tag 57.
DRIX Tag 57 rule
<rule>
Description: Specifies an error rule which must be matched for an error-action to be applied.
Position: 0..* rule tags may exist within an error tag. Where present, these must exist before the "action" tag.
Attributes: None
Elements: 0..* errorType tags; 0..1 errorLevel tags. Any errorType tags must appear before the errorLevel tag if one exists.
LAVASTORM ANALYTICS
lavastorm.com
Page 216
Issue 1
LDR User Manual
The rule itself is basically a combination of a set of errorTypes, and an errorLevel.
The errorType and errorLevel tags are shown in DRIX Tag 58 and DRIX Tag 59
respectively. In order for the rule to match, it must match at least one of the
errorType tags, and match the errorLevel tag.
DRIX Tag 58 errorType
<errorType>
Description: Specifies an errorType to be used as a match criteria for an error-rule.
Position: 0..* errorType tags may exist within a rule tag. Where present, these must exist before the "errorLevel" tag.
Attributes: Required String name attribute
Elements: None
The errorType contains a simple name attribute, which must match one of the
error types declared in Table 24.
DRIX Tag 59 errorLevel
<errorLevel>
Description: Specifies errorLevels to be used as a match criteria for an error-rule.
Position: 0..1 errorLevel tags may exist within a rule tag. Where present, these must exist after any declared "errorType" tags.
Attributes: Optional minLevel attribute; optional maxLevel attribute. If minLevel & maxLevel are not present, this is the same as not having an errorLevel tag.
Elements: None
The errorLevel contains an optional minLevel and an optional maxLevel attribute.
These must match one of the error levels defined in section 10.1.1. The errorLevel
part of the rule is matched if minLevel<=errorLevel<=maxLevel.
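The matching semantics above — errorTypes OR-ed within a rule, the level checked against the inclusive [minLevel, maxLevel] range, and the rules on an error tag themselves OR-ed — can be sketched as follows. The class and method names are illustrative only, not the LDR API, and error levels are modelled as plain integers:

```java
// Hypothetical sketch of error-rule matching (not the LDR API).
import java.util.List;
import java.util.Set;

class ErrorRuleSketch {
    final Set<String> errorTypes; // empty = any errorType matches
    final int minLevel;           // inclusive lower bound
    final int maxLevel;           // inclusive upper bound

    ErrorRuleSketch(Set<String> errorTypes, int minLevel, int maxLevel) {
        this.errorTypes = errorTypes;
        this.minLevel = minLevel;
        this.maxLevel = maxLevel;
    }

    // A rule matches when the error's type matches at least one declared
    // errorType (or none are declared) AND minLevel <= errorLevel <= maxLevel.
    boolean matches(String errorType, int errorLevel) {
        boolean typeOk = errorTypes.isEmpty() || errorTypes.contains(errorType);
        return typeOk && minLevel <= errorLevel && errorLevel <= maxLevel;
    }

    // Rules on a single error tag combine with OR; an error tag with no
    // rules applies its action to every error on the field.
    static boolean actionApplies(List<ErrorRuleSketch> rules,
                                 String errorType, int errorLevel) {
        if (rules.isEmpty()) {
            return true;
        }
        for (ErrorRuleSketch rule : rules) {
            if (rule.matches(errorType, errorLevel)) {
                return true;
            }
        }
        return false;
    }
}
```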
Within an error tag, if any of the error rules are matched for a given error, then the
related action is executed. The action tag has the properties defined in DRIX Tag
60.
DRIX Tag 60 action
<action>
Description: Specifies the action to perform if an error rule is matched.
Position: 1 action tag must exist within an error tag. This tag must exist after any rule tags.
Attributes: None
Elements: 0..1 setErrorLevel tags
Currently, there is only one action that can be performed for an error that matches
a given rule, which is to re-set the error level of the error. This may be extended in
future to allow for more complex actions.
Error filters are applied top-down and in order of appearance. This means that if a
field defines an errorFilters tag, and a subfield of the field also contains an
errorFilters tag, the errorFilters tag for the field is applied before the errorFilters
tag of the subfield. Each of the individual error tags are applied in order. If an
error that is being logged ever matches one of the error-rules, then the action for
this rule is executed, and no further error filters are applied.
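This top-down, first-match-wins application order can be sketched as follows. This is a simplified illustration, not LDR code: each filter is reduced to a level predicate plus the level its action would set, and the chain is ordered from the outermost field's filters to the innermost:

```java
// Hypothetical sketch of error-filter application order (not the LDR API):
// filters are tried outermost-first; the first matching filter's action
// runs and no further filters are applied.
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

class ErrorFilterChain {
    private final List<IntPredicate> rules = new ArrayList<>();  // match tests
    private final List<Integer> newLevels = new ArrayList<>();   // action levels

    void addFilter(IntPredicate rule, int newLevel) {
        rules.add(rule);
        newLevels.add(newLevel);
    }

    int apply(int errorLevel) {
        for (int i = 0; i < rules.size(); i++) {
            if (rules.get(i).test(errorLevel)) {
                return newLevels.get(i); // first match wins; stop here
            }
        }
        return errorLevel; // no filter matched; level unchanged
    }
}
```

A containing field's filter added before a subfield's filter therefore shadows it for any error both would match.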
Consider the DRIX shown in Example 147.
Example 147 – error filtering example
<type name="Type1">
    <field name="field1" type="SubType">
        <errorFilters>
            <error>
                <rule>
                    <errorType name="WRONG_DELIM_FOUND"/>
                    <errorType name="INVALID_ZONE_NIBBLE"/>
                    <errorLevel maxLevel="RECOVERABLE_ERROR"/>
                </rule>
                <rule>
                    <errorType name="INVALID_DIGIT"/>
                    <errorLevel maxLevel="WARNING"/>
                </rule>
                <action>
                    <setErrorLevel level="IGNORE"/>
                </action>
            </error>
            <error>
                <rule>
                    <errorType name="INVALID_SIGN_NIBBLE"/>
                    <errorLevel minLevel="WARNING"/>
                </rule>
                <action>
                    <setErrorLevel level="IGNORE"/>
                </action>
            </error>
        </errorFilters>
    </field>
</type>
<type name="SubType">
    <field name="subField" type="SubType">
        <errorFilters>
            <error>
                <action>
                    <setErrorLevel level="WARNING"/>
                </action>
            </error>
        </errorFilters>
    </field>
</type>
Then, in this example, consider the case where an error occurs on field1.subField.
When this error is to be handled by the log, the following will occur:
1. If the error is of type WRONG_DELIM_FOUND or INVALID_ZONE_NIBBLE and the error
level is <= RECOVERABLE_ERROR, or of type INVALID_DIGIT and the error level is
<= WARNING: set the error level to IGNORE.
2. Else, if the error is of type INVALID_SIGN_NIBBLE and the error level is
>= WARNING: set the error level to IGNORE.
3. Else: set the error level to WARNING.
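The first-match-wins behaviour above can be sketched as follows (a simplified Python model under assumed numeric error levels; this is not the LDR implementation):

```python
# Assumed numeric ordering of error levels (illustrative only).
LEVEL = {"IGNORE": 0, "WARNING": 1, "RECOVERABLE_ERROR": 2, "FATAL": 3}

def apply_filters(filters, error_type, level):
    """Apply error filters top-down (parent field first, then subfield).
    Each filter is (rules, new_level); an empty rule list matches any error.
    The first matching filter's action runs and no further filters apply."""
    for rules, new_level in filters:
        if not rules or any(
            (not r.get("types") or error_type in r["types"])
            and LEVEL[r.get("min", "IGNORE")] <= LEVEL[level] <= LEVEL[r.get("max", "FATAL")]
            for r in rules
        ):
            return new_level
    return level  # no filter matched: level unchanged

# The filters of Example 147, in application order:
filters = [
    ([{"types": ["WRONG_DELIM_FOUND", "INVALID_ZONE_NIBBLE"], "max": "RECOVERABLE_ERROR"},
      {"types": ["INVALID_DIGIT"], "max": "WARNING"}], "IGNORE"),
    ([{"types": ["INVALID_SIGN_NIBBLE"], "min": "WARNING"}], "IGNORE"),
    ([], "WARNING"),  # the subfield's catch-all error tag
]
print(apply_filters(filters, "WRONG_DELIM_FOUND", "RECOVERABLE_ERROR"))  # IGNORE
print(apply_filters(filters, "OVERFLOW", "FATAL"))                       # WARNING
```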
10.1.6 Identifying Errors on Output Records
During normal operation and in the absence of readRequired flags, the LDR
operates in a two-step manner. First the file is scanned, and the structure of the file
is determined. Then, the required fields are read from their known file locations
and written to output records. During the scanning phase, minimal validation of
the field data is performed. This is because the LDR only cares about the structure
of the file at this point, and does not care whether or not all of the fields are
encoded correctly. If there are encoding issues that do not affect the file
structure, and the LDR can continue scanning, it will do so.
This is done partly for optimization reasons as there is no point decoding all of the
fields in a file if we only need to output a subset of these fields. Also, in many
situations there will be corrupt data in a file. Rather than simply failing & stating
that the data is corrupt, it makes more sense for the LDR to read all of the file, and
indicate on the output records which fields have not been able to be read correctly.
This is exactly what the LDR does. However, when using the LAE interface to the
LDR, there is no way to determine on an output record whether a field was simply
not set, or contained bad data, as in both cases a null value will appear. Therefore,
in order to provide such error information to the user, four special fields are
available to reference in the DROX. Special fields were introduced in section 6.7.
There, the fileId, refId & relId special fields were introduced. For the purposes of
error identification, the errorCount, errorFields, errorMessage & errorCode
special fields are introduced.
The logging information in the LDR error log is not related to specific output
records, as decoding errors will occur on fields that may not be included in any
output record, or could be used in many different outputs. Through the use of the
error identification special fields, errors can be related to individual output
records. In general, these fields will not provide as detailed information as that
provided in the LDR error log, however the two can be used in conjunction to
identify not only on which fields and output records errors occurred, but also the
reasons behind these errors.
All of the special fields can be renamed and excluded as described in section 6.7.
Here we simply describe how they can be included.
10.1.6.1 errorCount
The special field errorCount is available to be referenced in the DROX. This
special field takes no pattern, and simply outputs the number of fields in the
output record that encountered errors during decoding. Therefore, in order to
obtain an errorCount field in an output record, the following DROX syntax is
used:
Example 148 – errorCount example
<drox>
…
<include>
<fields>
<specialField type="errorCount"/>
</fields>
</include>
…
</drox>
10.1.6.2 errorFields
The special field errorFields is similar to the errorCount field, except that rather
than simply containing a count of the number of errors on an output record,
errorFields specifies which fields in an output record have errored. These are
simply output in a String field, with each errored field separated by commas. If no
fields have errored in an output record, this field will be null. This special field
also takes no pattern. In order to obtain an errorFields field in an output record,
the following DROX syntax is used:
Example 149 – errorFields example
<drox>
…
<include>
<fields>
<specialField type="errorFields"/>
</fields>
</include>
…
</drox>
The errorFields field provides a little more information than the errorCount field;
however, it is also more computationally expensive.
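The relationship between the two aggregates can be sketched like this (plain Python model; the field names are invented for illustration):

```python
def error_count(errored_fields):
    """errorCount: the number of fields on the output record that errored."""
    return len(errored_fields)

def error_fields(errored_fields):
    """errorFields: the errored field names joined by commas, or None
    (null) when no fields on the record errored."""
    return ",".join(errored_fields) if errored_fields else None

record_errors = ["pf.f1", "pf.f3"]  # hypothetical errored fields
print(error_count(record_errors))   # 2
print(error_fields(record_errors))  # pf.f1,pf.f3
print(error_fields([]))             # None
```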
10.1.6.3 errorCode
The errorFields and errorCount fields introduced in the previous sections are
record-based fields that do not require a pattern attribute. These are aggregate
fields based on all the errors that occurred in an output record. The errorCode,
on the other hand, is field-based, and one errorCode can be output per field defined
in the DRIX. In cases where the field exists on a record and did not error, the
errorCode is set to “ok”.
Whenever an error occurs on a field, the corresponding errorCode will be set to
some value that is not “ok”. In future, this will contain the specific error code
(e.g. “ldr.parseError.OVERFLOW” or something similar); however, for the moment,
it is only used as an indication that there was an error, therefore the top-level
errorCode “ldr” will be output.
If the field does not exist on the record, the errorCode will be null.
Therefore, in a DRIX containing only the emittable fields “pf.f1” and “pf.f2”, the
errorCodes for these fields can be output using the following DROX.
Example 150 – errorCode example
<drox>
…
<include>
<fields>
<specialField type="errorCode" pattern="pf.*" />
</fields>
</include>
…
</drox>
This will produce the output fields “pf.f1.errorCode” & “pf.f2.errorCode”.
The pattern syntax used for the errorCode (and the errorMessage introduced in
the next section) is defined in more detail in section 6.5.
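As a rough model, a single errorCode pattern fans out into one output field per matching emittable field (the glob-style matching below is only an approximation of the DROX pattern syntax):

```python
from fnmatch import fnmatch

def error_code_fields(emittable_fields, pattern):
    """Produce one <fieldName>.errorCode output field per emittable
    field matching the pattern (glob matching as an approximation)."""
    return [f + ".errorCode" for f in emittable_fields if fnmatch(f, pattern)]

print(error_code_fields(["pf.f1", "pf.f2"], "pf.*"))
# ['pf.f1.errorCode', 'pf.f2.errorCode']
```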
10.1.6.4 errorMessage
The errorMessage output field contains the most detailed information relating to
errors that have occurred in an output record. It is also the most expensive to
produce; therefore, it is not recommended in production systems where performance
is important. Generally, the error message should only be used while
investigating files & initially constructing a DRIX and DROX. With that in mind,
the errorMessage is field-based, and for each field defined in a DRIX, the
corresponding errorMessage can be obtained. When the field is not set, or no
errors appear on the field, the corresponding errorMessage will be null. In other
cases a detailed error message will be produced.
Therefore, in a DRIX containing only the emittable fields pf.f1 and pf.f2, the
errorMessage for these fields can be output using the following DROX.
Example 151 – errorMessage example
<drox>
…
<include>
<fields>
<specialField type="errorMessage" pattern="pf.*" />
</fields>
</include>
…
</drox>
This will produce the output fields “pf.f1.errorMessage” and
“pf.f2.errorMessage”. For each errorMessage output field, the field name and type
of the input field, along with the byte and bit position where the error occurred
and the name of the file being processed are output. In addition, if there was any
field value set for the erroring field, this is also output.
10.1.7 Failed Data Format Reporting
One of the most common and hard to diagnose problems that can arise while
processing data files is the case when the data file cannot be parsed according to
the input specification. This will generally occur because of one of three reasons:
1. The input specification is incorrect
2. The input specification is correct however the data file is corrupt
3. Both the input specification and the data file are incorrect
If the LDR were to simply fail and say that the file could not be parsed, it would
be practically impossible to determine why this occurred. Therefore, the LDR
provides extensive diagnostic information in cases of file format failure in the
exception that is thrown. This information includes the last successfully read
fields (up to 10 fields), and the file parse trace identifying where the error
occurred.
For each of the successfully read fields, the following information is provided:
The field that was read & the DRIX location where the field is defined
The type of field that was read & the DRIX location where the type is
defined.
The file position (byte and bit) of the start of the field
The failure trace includes all of the field failures that resulted in the overall failure.
For each of the fields in the failure trace, the following information is provided:
The field name (if the field is not anonymous) & the DRIX location where
the field is defined
The type of field that couldn’t be read & the DRIX location where the type
is defined
The file position (byte and bit) of the start position where the field was
attempted to be read
The operation that was attempted
The error order (identifying which of the errors during parsing occurred
before or after the other errors).
The failure trace also indicates the causal relationship between the errors. If a
particular field could not be parsed due to one of its subfields not being able to be
parsed, this is a “Caused By” relationship. If the failure of multiple fields within
an or tag caused an error, this is an “And” relationship.
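The shape of such a trace can be modeled as a small tree, where a single child is a “Caused By” relationship and additional siblings are “And” relationships (an illustrative Python sketch, not the LDR's internal representation):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Failure:
    """One entry in the failure trace (simplified model)."""
    name: str
    operation: str = "scan"
    caused_by: List["Failure"] = field(default_factory=list)  # >1 entry = "And"

def render(f: Failure, depth: int = 0) -> str:
    """Render the trace: first cause is "Caused By:", siblings are "And:"."""
    lines = ["  " * depth + f"Failed: {f.name} ({f.operation})"]
    for i, cause in enumerate(f.caused_by):
        lines.append("  " * depth + ("Caused By:" if i == 0 else "And:"))
        lines.append(render(cause, depth + 1))
    return "\n".join(lines)

# The failure structure of the example in section 10.1.7:
trace = Failure(".pf.field2", caused_by=[
    Failure("loop iteration", caused_by=[
        Failure(".pf.field2.f1"),
        Failure(".pf.field2.f2"),
        Failure(".pf.field2.f3"),
    ])
])
print(render(trace))
```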
For example, consider the input specification in Example 152:
<drix>
<include library="identification"/>
<include library="string"/>
<library name="MyLibrary" version="000.000.000.001">
<namespace name="MyNamespace">
<type name="File">
<field name="field1" type="F1"/>
<field name="field1Repeated" type="F1"/>
<field name="field2" type="F2"/>
</type>
<type name="F1">
<field name="f1" type=".identification.string.StringIdEquals{MyAsciiType}">
<arg name="expected" value="A"/>
</field>
</type>
<type name="F2">
<repeatRange min="5" max="unbounded">
<or>
<field name="f1" type=".identification.string.StringIdEquals{MyAsciiType}">
<arg name="expected" value="A"/>
</field>
<field name="f2" type=".identification.string.StringIdEquals{MyAsciiType}">
<arg name="expected" value="B"/>
</field>
<field name="f3" type=".identification.string.StringIdEquals{MyAsciiType}">
<arg name="expected" value="C"/>
</field>
</or>
</repeatRange>
</type>
<primitiveType name="MyAsciiType" parentType=".string.Ascii">
<super>
<arg name="length" value="1"/>
</super>
</primitiveType>
</namespace>
</library>
<primaryField name="pf" type="MyNamespace.File"/>
</drix>
Example 152 – DRIX for a failed file
This DRIX states that we expect at least the following data in the file (in ASCII
format):
AAABCABCABCABCABC
Example 153 – File format example for a failed file
Where the trailing ABC group may repeat any number of times, but must appear
at least as many times as displayed.
And consider we have a data file containing:
AAABCADBCA
Example 154 – File format example for a failed file
In this case, we would successfully read the following fields:
pf.field1 (A)
pf.field1Repeated (A)
pf.field2.f1 (A)
pf.field2.f2 (B)
pf.field2.f3 (C)
pf.field2.f1 (A)
However, when we hit the 7th byte in the file (D), the reading will fail, as we are
expecting another pf.field2.f1, pf.field2.f2, or pf.field2.f3 field,
which are ‘A’, ‘B’, and ‘C’ characters respectively.
Following this, the repeatRange loop will fail, since we do not have 5 repetitions of
the loop; we only have 4 in this instance. This will then cause the field2 field to
fail, since the entire repeatRange is required. Once field2 fails, the parse of the
file as a whole fails.
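Structurally, the DRIX in Example 152 amounts to requiring the file to start with “AA” followed by at least five characters from {A, B, C}; a regular-expression model (Python, purely illustrative) shows why the data fails:

```python
import re

# Model of Example 152: field1 (A), field1Repeated (A), then a
# repeatRange of at least 5 single characters, each A, B or C.
spec = re.compile(r"AA[ABC]{5,}")

print(bool(spec.match("AAABCABCABCABCABC")))  # True: enough repetitions
print(bool(spec.match("AAABCADBCA")))         # False: only 4 before the D
```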
We will receive the following output from the LDR:
Example 155 – Stack trace for a failed data format.
Fatal error caught while scanning the file: Fatal Error parsing data file.
Last 6 Successfully Read Fields:
Field:
Field Name: .pf.field1.f1._id
Declared at Line 54, Column 54 in Library: identification
Bound to type (MyNamespace.MyAsciiType)
Declared at Line 32, Column 65 in Library: MyLibrary
File Position [byte:bit]: [0:0]
Field:
Field Name: .pf.field1.f1
Declared at Line 12, Column 80 in Library: MyLibrary
Bound to type (identification.string.StringIdEquals)
Declared at Line 15, Column 33 in Library: identification
File Position [byte:bit]: [0:0]
Field:
Field Name: .pf.field1
Declared at Line 7, Column 37 in Library: MyLibrary
Bound to type (MyNamespace.F1)
Declared at Line 11, Column 20 in Library: MyLibrary
File Position [byte:bit]: [0:0]
Field:
Field Name: .pf.field1Repeated.f1._id
Declared at Line 54, Column 54 in Library: identification
Bound to type (MyNamespace.MyAsciiType)
Declared at Line 32, Column 65 in Library: MyLibrary
File Position [byte:bit]: [1:0]
Field:
Field Name: .pf.field1Repeated.f1
Declared at Line 12, Column 80 in Library: MyLibrary
Bound to type (identification.string.StringIdEquals)
Declared at Line 15, Column 33 in Library: identification
File Position [byte:bit]: [1:0]
Field:
Field Name: .pf.field1Repeated
Declared at Line 8, Column 45 in Library: MyLibrary
Bound to type (MyNamespace.F1)
Declared at Line 11, Column 20 in Library: MyLibrary
File Position [byte:bit]: [1:0]
Failed Data Trace:
------------------
Failed field: .pf.field2
Declared at Line 9, Column 37 in Library: MyLibrary
Bound to type (MyNamespace.F2)
Declared at Line 16, Column 20 in Library: MyLibrary
File Position [byte:bit] [2:0]
Operation: scan
Parser: dynamic.standard.MyNamespace.F2Parser.
Parse Error Order: 4
Caused By:
Failed loop iteration
Declared at Line 17, Column 42 in Library: MyLibrary
Under type (MyNamespace.F2)
Declared at Line 16, Column 20 in Library: MyLibrary
File Position [byte:bit] [6:0]
Operation: scan
Parse Error Order: 3
Caused By:
Failed field: .pf.field2.f1
Declared at Line 19, Column 82 in Library: MyLibrary
Bound to type (identification.string.StringIdEquals)
Declared at Line 15, Column 33 in Library: identification
File Position [byte:bit] [6:0]
Operation: scan
Parser:
dynamic.standard.identification.string.StringIdEqualsMyAsciiTypeParser.
Parse Error Order: 0
And:
Failed field: .pf.field2.f2
Declared at Line 22, Column 82 in Library: MyLibrary
Bound to type (identification.string.StringIdEquals)
Declared at Line 15, Column 33 in Library: identification
File Position [byte:bit] [6:0]
Operation: scan
Parser:
dynamic.standard.identification.string.StringIdEqualsMyAsciiTypeParser.
Parse Error Order: 1
And:
Failed field: .pf.field2.f3
Declared at Line 25, Column 82 in Library: MyLibrary
Bound to type (identification.string.StringIdEquals)
Declared at Line 15, Column 33 in Library: identification
File Position [byte:bit] [6:0]
Operation: scan
Parser:
dynamic.standard.identification.string.StringIdEqualsMyAsciiTypeParser.
Parse Error Order: 2
Furthest point into file where error occurred:
---------------------------------------------
Failed field: .pf.field2.f1
Declared at Line 19, Column 82 in Library: MyLibrary
Bound to type (identification.string.StringIdEquals)
Declared at Line 15, Column 33 in Library: identification
File Position [byte:bit] [6:0]
Operation: scan
Parser: dynamic.standard.identification.string.StringIdEqualsMyAsciiTypeParser.
We can interpret the stack trace in Example 155 as the following:
We successfully read a pf.field1 at the first byte in the file
We successfully read a pf.field1Repeated at the second byte in the file
At the third byte in the file, we were unable to read a pf.field2
This was because at the 7th byte in the file (D), we were unable to read any of:
o pf.field2.f1 (A)
o pf.field2.f2 (B) OR
o pf.field2.f3 (C)
which in turn caused the loop iteration to fail, which in turn caused
pf.field2 to fail.
If the information from this stack trace is not sufficient to determine the root cause
of the problems, it may be necessary to use the parse tracing mechanisms outlined
in the following section.
10.1.8 Parse Tracing
It is possible to debug the parsing process, and obtain a trace for all parsing
operations. This essentially allows us to see exactly which fields were successfully
parsed, which failed, whether they were scanned, skipped, or read, in addition to
some extra contextual information about the fields themselves.
In general, this is a very slow & IO-intensive process that uses large amounts of
disk space, so it is not recommended unless you cannot glean enough
information from the failed data format reporting discussed in the
previous section. Even for an individual file, when you know you need to examine
the parse trace, this can still lead to massive parse trace files. Therefore, you must
provide a debug output specification file, which details exactly what you are
interested in obtaining in the parse trace.
A debug output specification conforms to the LDRDebugOutput.xsd, located in the
conf directory of the LDR install. An example of an LDR Debug Output xml is
shown in Example 156.
Example 156 – Debug Output specification
<debug>
<filename>"C:\work\udraw\testMach2\fdl\DebugOutput.brd"</filename>
<filePositions byteStart="5" byteEnd="8"/>
<maxOutput>50</maxOutput>
<fieldRestriction fieldName="pf" matchOption="contains"/>
<fieldRestriction fieldName="j2b" matchOption="exact" nestLevel="0"/>
</debug>
Considering the example from the previous section, this debug trace will provide
more information about exactly why the parsing failed.
The debug output can be interpreted as saying:
Provide me with all of the parse information for any field or sub-field of a field
whose field name contains the substring “pf”, and any field whose field name
matches exactly the String “j2b”. Provide a maximum of 50 debug statements,
and only provide information on parsing done between bytes 5 and 8 (inclusive) in
the file. Output the parse trace to the file:
C:\work\udraw\testMach2\fdl\DebugOutput.brd
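A simplified model of how the LDR might apply these restrictions when emitting trace entries (illustrative Python; the real matching also honours options such as nestLevel, which is ignored here):

```python
def want(entry, restrictions, byte_start, byte_end):
    """Keep a trace entry if its byte position lies in [byte_start, byte_end]
    and its field name satisfies any fieldRestriction."""
    if not (byte_start <= entry["byte"] <= byte_end):
        return False
    return any(
        (r["matchOption"] == "contains" and r["fieldName"] in entry["field"]) or
        (r["matchOption"] == "exact" and entry["field"] == r["fieldName"])
        for r in restrictions
    )

restrictions = [{"fieldName": "pf", "matchOption": "contains"},
                {"fieldName": "j2b", "matchOption": "exact"}]

# Hypothetical trace entries for illustration:
trace = [{"field": ".pf.field2.f1", "byte": 6},
         {"field": ".pf.field1", "byte": 0},
         {"field": "j2b", "byte": 7}]

kept = [e for e in trace if want(e, restrictions, 5, 8)][:50]  # maxOutput=50
print([e["field"] for e in kept])  # ['.pf.field2.f1', 'j2b']
```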
When this debug output specification is provided to the example in the previous
section, we receive the ParseTrace shown on the following page.
This parse trace clearly shows exactly why and where the parsing failed. For
instance, in row 25, we see that the identification of the
pf.field2.f1{MyAsciiType} fails, when in the previous row, we see that the _id
field was read as a “D”. We see similar failures for the f2 and f3 cases in rows 33
& 41, which results in the failure of pf.field2, shown in row 42.
Example 157 – Parse Trace Output
In order to get the full benefit of the parse trace, some knowledge of the LDR
parsing operation is useful; consulting the Advanced Concepts in section 7
is recommended.
In general, the parse trace output produces the following columns:
FieldNestingLevel
o Level of nesting underneath the primary field.
o For example, the primaryField will have a fieldNestingLevel of 0.
primaryField.subField will have a fieldNestingLevel of 1.
CanonicalFieldName
o Full name of the field including the field path.
o If there are any anonymous fields in the field path, the generated
name for these anonymous fields is also displayed
Alias
o Alias name of the field. This is only set if there are anonymous
fields in the field path. If there are anonymous fields in the field
path, the generated anonymous names are removed from the alias
name.
Parser
o The generated parser class to which this field belongs. Parsers are
constructed for types; therefore, by analyzing the Parser class
name, it is possible to glean information about the type to which the
field is bound
Operation
o The operation being performed (scan, skip, read). For more
information on these operations, see the Advanced Concepts in
section 7
StartToken
o Specifies whether at this point in parsing, the LDR is starting to
read a field, or completing the reading of a field.
BytePosition
o The byte position for the specified parsing operation. If this is a
startToken, then the bytePosition is the position when the LDR
attempts to start scanning/skipping/reading a field. If this is an
endToken, then this is the bytePosition where the LDR has
completed its attempt to scan/skip/read a field.
BitPosition
o The bit position for the specified parsing operation. If this is a
startToken, then the bitPosition is the position when the LDR
attempts to start scanning/skipping/reading a field. If this is an
endToken, then this is the bitPosition where the LDR has completed
its attempt to scan/skip/read a field.
Result
o The result of the parsing operation.
o This is not populated if this is a startToken
o For an endToken, this can be one of:
GOOD – parsed successfully
BAD – parsed successfully, however the data is badly
encoded
NOT_ME – the field could not be parsed at the specified
location.
EmittableValue
o Only populated on read operations, where the type is emittable or
primitive
o Contains the read data at the location
Emittable
o Specifies whether or not the type is emittable
o False for primitive types
o True for constructed types with an emittable tag
ReadRequired
o Specifies whether or not the field has a readRequired attribute set to
true, or lies under a type to which a field with a readRequired
attribute set to true was bound
StructurallyRequiredField
o Specifies whether or not failing to parse this field successfully will
result in the file not being parsed successfully.
PrimitiveType
o Specifies whether or not the type is primitive.
While it may appear that there are duplicate entries in the parse trace shown in this
example, these are simply cases where type inheritance means that multiple start
tokens will appear in the trace. For example, for pf.field2.f2._id, in rows 19-21 we
have 3 read start tokens. These occur because pf.field2.f2._id is parsed by a
MyAsciiTypeParser. However, MyAsciiType inherits from .string.Ascii, which in
turn inherits from .base.Fixed. This means that each of these types has a read start
token on the parse trace.
10.1.9 Incorrect Specification Reporting
In cases where the specification is incorrect, an exception will be returned
specifying the reason why the file could not be read. It also specifies the library
file, and the line and column number, where the error was encountered.
Consider the input specification shown in Example 158:
Example 158 – Specification Failure example
<drix>
<include library="FAIL" minimumVersion="000.000.000.001"/>
<library name="Test">
<namespace name="MyNamespace">
<!-- Type Declarations -->
</namespace>
</library>
<primaryField name="primField" type="File"/>
</drix>
If the included library FAIL could not be located, then we would receive the
following error message:
Example 159 – Specification Failure Exception Reported
org.xml.sax.SAXParseException: Cannot find the library referenced by the include:
FAIL.
Searched in the paths:
C:\PerforceWorkspace\lavastorm\src\trunk\lib\ldr\commonTypes
C:\PerforceWorkspace\lavastorm\src\trunk\lib\ldr\converters\gdr
Error occurred in Library: Test
At Line 2, Column 27.
at
com.lavastorm.ldr.exception.SAXParseExceptionGenerator.cannotFindReferencedIncludeF
ile(SAXParseExceptionGenerator.java:296)...
It is easy to see that the exception reported provides us with all of the required
information to diagnose and fix the problem.
11 LDR Reserved Characters
In the DRIX & DROX files there are a set of reserved characters/words that cannot be
used for field, type, or parameter names, or renames. Some of these characters are
explicitly not allowed in the DROX due to their use as special characters. In certain cases
these are also restricted in the DRIX due to the general rule of Java identifiers being
required on fields, params etc. These reserved characters are:
Table 25 – Reserved Words and Characters

Character/Word | Reserved For | Reserved In
field | All field variables of a type | DRIX
<fieldName>Set | Where there exists a field with the name <fieldName>, there cannot exist a field with the name <fieldName>Set in the same type. | DRIX
<fieldName>Size | Where there exists a field with the name <fieldName>, there cannot exist a field with the name <fieldName>Size in the same type. | DRIX
Param | All parameters of a type | DRIX
parser | All parsers | DRIX
loopIndex | Loop index referencing within Parser classes | DRIX
_res | Result accessing within Parser classes | DRIX
buffer | The byte buffer reading the data from file | DRIX
context | The Parser Context | DRIX
() | Regular expression grouping character | DROX
[] | Array referencing in output spec | DROX
: | Array referencing in output spec | DROX
{} | Template types | DROX
* | Emittable fields in output | DROX
# | Constructed fields in output | DROX
@ | Iterable fields in output | DROX
. | Field nesting operator | DRIX/DROX
“ | Used to contain all Strings in output | DROX
` | Indicates names clause takes value of field | DROX
/ | Identifies regular expressions in output | DROX
\ | Regular expression escape & backreferencing character | DROX
<> | XML Tag Indicators² | DRIX/DROX

² Able to be used within <![CDATA[ ]]> tags within expr, readMethod, scanMethod, skipMethod, testMethod & code tags.
In addition to these characters, Java reserved keywords & characters should not be
used, as dynamic Java code will be constructed from the input and output
specifications. Furthermore, none of the class names in the java.lang package should
be used anywhere in a DRIX specification. See
http://java.sun.com/javase/6/docs/api/index.html for a full list of these classes.
Appendix A
Special Data Type Handling
ASN.1 Data
ASN.1 data is a highly specific, complicated, self-describing form of fixed-width
data, with its own specification language. Due to the widespread use of ASN.1 data
(particularly in telecommunications switch data, but also in banking systems,
transportation control systems and many other areas) and the fact that it has its own
specification language, this is considered as a separate high level data format in its
own right.
The LDR is able to read in ASN.1 specifications, and parse data files based on
these specifications.
The method for handling ASN.1 data is via the “ASN.1 Converter”. This converter
transforms an input ASN.1 specification into an LDR DRIX & DROX
specification. The LDR is then able to use these specifications to parse the data file.
There are a number of parameters that are required to successfully configure the
converter. The easiest way to use the ASN.1 converter is via the associated nodes
in the LAE. These nodes include all of the parameters that need to be set and the
parameters are well documented in the LAE nodes.
Supported Encodings
There are a number of different ASN.1 encoding rules. Of these, the LDR
supports:
1. BER
2. CER
3. DER
These are all encodings that pack data using different mechanisms to enhance
throughput, reliability, interoperability or ease of use.
PER, XER & GSER are not supported in the LDR. The PER encoding rules
specify that the encoded data is not necessarily aligned to byte boundaries;
however, the LDR can handle data formats that are not byte-aligned, so support
for this encoding may be possible in the future. Due to the verbosity of
XER & GSER and the complexity of PER, these encoding rules are not as widely
used (particularly in telecommunications) as the other encoding rules, and
therefore do not have in-built support. However, there is flexibility in the LDR to
allow for such support in future releases if required.
Support for ASN.1 Constructs/Keywords
There are a number of ASN.1 constructs which are not currently handled by the
converter. These are outlined below. The constructs that are not handled at all
were found to be uncommon in ASN.1 specifications from the
telecommunications industry. Handling for these keywords may be added in
subsequent releases as required - if and when they are seen “in the field”. For the
keywords with some handling, it is for the most part extremely difficult to handle
the types involved generically; therefore these are simply decoded as Hex
Strings and it is left to the user to subsequently decode the types according to the
rules being used on site. The keywords which are correctly handled (with no
additional information required) are not mentioned in the list below.
Table 26 – ASN.1 constructs with limited/no support in version 2.0. This list also includes types
with special handling in the 2.0 version.
Keyword
Handling
CLASS
ABSENT
ABSTRACT_SYNTAX
ALL
COMPONENT
COMPONENTS
CONSTRAINED
CONTAINING
EMBEDDED
ENCODED
EXCEPT
EXTENSIBILITY
EXTERNAL
IMPLIED
INCLUDES
INTERSECTION
MAX
MIN
MINUS-INFINITY
ObjectDescriptor
PATTERN
PDV
PLUS-INFINITY
PRESENT
REAL
RELATIVE-OID
SYNTAX
TYPE-IDENTIFIER
UNION
UNIQUE
WITH
ANY
DEFINED BY
OBJECT IDENTIFIER
TeletexString
T61String
CHARACTER STRING
GeneralString
VideotextString
ISO646String
GraphicString
DEFAULT
TRUE
Not handled
Not handled
Not handled
Not handled
Not handled
Not handled
Not handled
Not handled
Not handled
Not handled
Not handled
Not handled
Not handled
Not handled
Not handled
Not handled
Not handled
Not handled
Not handled
Not handled
Not handled
Not handled
Not handled
Not handled
Not handled
Not handled
Not handled
Not handled
Not handled
Not handled
Not handled
Not handled
Not handled
Not handled
Decoded using an OCTET STRING implementation
Decoded using an OCTET STRING implementation
Decoded using an IA5String implementation
Decoded using an OCTET STRING implementation
Decoded using an OCTET STRING implementation
Decoded using an OCTET STRING implementation
Decoded using an OCTET STRING implementation
Ignored
Ignored
LAVASTORM ANALYTICS
lavastorm.com
Page 236
Issue 1
LDR User Manual
FALSE
Ignored.

OCTET STRING
Handled correctly, with options. Generally, these are simply decoded as a byte array. There are options on the converter to allow OCTET STRING types to be decoded as either:
a simple byte array, or
a byte array-hex string type.
The former is technically correct. The latter is easier to investigate & manipulate in external editors, and is therefore recommended for investigation purposes when the OCTET STRING is simply a placeholder and the real encoding is somewhat different. The .binary.Int8ArrayToHexString implementation *may* also offer some performance improvements. Note that the types which are described in this table as being decoded using an OCTET STRING implementation (e.g. GeneralString, GraphicString) are also affected by this parameter and will be read as a hex string or byte array depending on the parameter.

EXTENSIONS (…)
Partially handled. Extensions are handled correctly in all SET, CHOICE and ENUM types. Within SEQUENCE types, however, the implementation is slightly different. If the extension occurs at the end of the SEQUENCE, then the LDR will handle this correctly. If, however, the extension is not the last defined element in the SEQUENCE, then the LDR will handle the entire SEQUENCE as a SET (without modifying any tagging information). This simply means that the order of the elements within the SEQUENCE is not treated as important if an extension is defined within the middle of the SEQUENCE. The LDR will warn in these cases.
For example, an implementation like:
A ::= SEQUENCE {
    b B,
    …,
    c C
}
will be treated as a SET, and B and C are allowed to occur in any order. In these instances, a warning is output by the converter.
On the other hand, an implementation like:
A ::= SEQUENCE {
    b B,
    c C,
    …
}
will parse successfully as a SEQUENCE and no warning will be output.

IMPORTS/EXPORTS
Partially handled. The EXPORTS clause is always ignored. However, the IMPORTS clause is used to a certain extent. Wherever an IMPORTS clause is present, all of the types defined in the module being imported become visible to the module importing them. The local specification is always searched first, and the modules in the FROM clauses are simply placed on a using stack, where the module specified in the last FROM clause is searched first after the local module specification. This can lead to IMPORT conflicts, and the wrong type being selected, so it should be used with some care.
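The byte-array versus hex-string decode options described above for OCTET STRING can be illustrated with a short sketch. This is illustrative Python only, not LDR code; it simply shows the two output shapes a decoded OCTET STRING value can take.

```python
# Illustrative only: the two decode shapes for an OCTET STRING value,
# as a raw byte array versus a hex string.
octets = bytes([0x1F, 0xA0, 0x03])

as_byte_array = list(octets)          # the "technically correct" decode
as_hex_string = octets.hex().upper()  # easier to inspect in external editors

print(as_byte_array)   # [31, 160, 3]
print(as_hex_string)   # 1FA003
```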
ASN.1 Specification Format
The converter requires a syntactically correct, full ASN.1 specification. This means
that it requires the ASN.1 module definition to be present, and the type definitions
to be enclosed within BEGIN and END keywords.
This means that the ASN.1 specification should have the format:
Example 160 – Specification format required by the ASN.1 converter.
ModuleName DEFINITIONS [(IMPLICIT|EXPLICIT|AUTOMATIC) TAGS] ::=
BEGIN
ASN.1 Type Definitions
END
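For instance, a minimal specification conforming to this format might look like the following. The module and type names here are hypothetical and shown only to illustrate the required structure.

```asn1
-- Hypothetical module, illustrating the required format only
CallDataModule DEFINITIONS IMPLICIT TAGS ::=
BEGIN
    CallRecord ::= SEQUENCE {
        callingNumber  OCTET STRING,
        duration       INTEGER
    }
END
```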
In addition, some specifications have been encountered that do not conform to
the correct ASN.1 syntax for field & type naming. All ASN.1 type definitions
must begin with an upper-case character and contain no underscore ('_')
characters.
While it is allowable to have hyphen ('-') characters in an ASN.1 specification,
these are not allowed in LDR field & type names. They will therefore be
auto-converted by the LDR to underscore ('_') characters in the generated DRIX.
When any field or type in the ASN.1 specification conflicts with an LDR reserved
word (see Table 25 – note that this only applies to reserved words, not characters),
then the LDR will provide a warning. The LDR will rename the fields in the DRIX
such that they no longer conflict with any LDR reserved words, and the generated
DROX will then rename them back to their original form in the ASN.1
specification. If the ASN.1 file can be parsed successfully, this means that the
output will have the same names as those provided in the ASN.1 specification. If,
however, there are errors in reading the data file, the error messages will refer to
the fields as they were renamed by the LDR.
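The hyphen-to-underscore conversion described above can be sketched as follows. This is illustrative Python only, not LDR source code, and it shows only the hyphen rule, not the reserved-word renaming.

```python
# Illustrative sketch (not LDR code): hyphens are legal in ASN.1
# identifiers but not in LDR field & type names, so they become
# underscores in the generated DRIX.
def to_drix_name(asn1_name: str) -> str:
    return asn1_name.replace("-", "_")

print(to_drix_name("call-data-record"))  # call_data_record
```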
Comments and Special Encodings
Another important note is that the ASN.1 Converter provided with the LDR
ignores all comments in the ASN.1 specification. This means that if special
encodings are described in human readable text in the comments (within ‘--‘
characters in an ASN.1 specification) these are simply ignored by the converter.
The user will then need to modify the DRIX produced by the LDR in order to have
these fields decoded correctly. In most occasions these comments are detailing the
specifics of how sub-byte fields are to be decoded. These types are often simply
specified as an OCTET STRING in the ASN.1 specification and the LDR will
simply output them as byte arrays.
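As a hypothetical illustration, a specification fragment of this kind might read:

```asn1
-- TBCD-encoded, nibble-swapped; see vendor documentation
SubscriberNumber ::= OCTET STRING (SIZE (1..10))
```

The comment describing the special encoding is discarded by the converter, so the field is emitted as a plain byte array unless the generated DRIX is modified by hand.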
The ASN.1 converter produces both a DRIX and a DROX. If the ASN.1 encoded
data is nested in the file, for example if there are wrapping header & trailer
elements in the data file, then the DRIX produced by the ASN.1 converter simply
needs to be included & the top level type in the DRIX referenced in a field at the
appropriate location. This will mean that the DROX needs to be modified such that
all elements have the correct field-name pattern. This normally involves a simple
search-replace in the generated DROX to change the prefixes for all includes.
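A prefix change of this kind might be scripted as below. The element name, attribute name and prefixes are hypothetical stand-ins, not exact DROX syntax; the point is only the mechanical search-replace.

```shell
# Create a tiny stand-in DROX snippet (contents are hypothetical)
printf '<fromField name="record.a"/>\n' > generated.drox
# Prepend the wrapper prefix "header." to every field reference
sed 's/name="record\./name="header.record./g' generated.drox > wrapped.drox
cat wrapped.drox
```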
COBOL Copybook Data
Similar to ASN.1 data, COBOL copybook data is simply another form of fixed
width data with its own specification language. However, due to the widespread
use of these file formats in billing systems (and mainframes in general), the LDR is
required to be able to read in COBOL copybook specifications and process data
based on these specifications. COBOL copybooks, however, do not specify the
underlying encoding to which the data fields correspond as this is implementation
& vendor specific. Therefore, some configuration in the LAE nodes is required to
get COBOL copybook specified data to work with the LDR.
COBOL copybook specified data can be handled through the LAE interface to the
LDR by simply providing a COBOL copybook file and the DRIX & DROX will
be autogenerated.
When multiple COBOL copybooks are used in the one data file, the input
specification needs to define how these copybooks are used to read the data.
Limitations of Support
- Comments in COBOL copybooks are ignored.
- Copybook level 88 data items are ignored.
- Underscores ('_') cannot be used in the copybooks provided to the LDR.
- All hyphens ('-') in an input copybook will appear as underscores ('_') on
  the output field names after being run through the LDR.
- There is currently no in-built handling of data types where the sign (+/-) is
  encoded in a different manner than the value (e.g. an EBCDIC-encoded sign
  value with an ASCII-encoded value).
- Within a COBOL program, it is possible to initialize the record data to a
  given byte – if a field is never set, it will be populated in the output
  record with the initializer. This is handled within the LDR via the padByte
  parameter. It is also possible to initialize sub-sections of the record with
  different padding bytes. This is not automatically handled by the LDR.
- Only relative, record sequential and mainframe (using RDWs, BDWs) file
  organizations are handled automatically by the LDR – for both fixed &
  variable length records. Indexed and line sequential file organizations are
  not automatically handled. However, the user is able to write a wrapping
  DRIX around the part of the DRIX referencing the copybook to implement this
  themselves.
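Several of these limitations can be seen at once in a hypothetical copybook fragment such as the following (the record and field names are illustrative):

```cobol
      * Comment lines such as this are ignored by the LDR.
       01  CALL-RECORD.
           05  CALLING-NUMBER   PIC X(10).
           05  CALL-DURATION    PIC 9(6).
           88  LONG-CALL        VALUE 100000 THRU 999999.
```

Here the level 88 item is ignored entirely, and the hyphenated names appear on the LDR output as CALL_RECORD, CALLING_NUMBER and CALL_DURATION.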
Appendix B
Index of DRIX Tags
DRIX Tag 1 drix ......................................................................................................17
DRIX Tag 2 library ..................................................................................................19
DRIX Tag 3 include .................................................................................................20
DRIX Tag 4 primaryField.........................................................................................21
DRIX Tag 5 namespace ............................................................................................22
DRIX Tag 6 using ....................................................................................................23
DRIX Tag 7 type ......................................................................................................33
DRIX Tag 8 primitiveType .......................................................................................35
DRIX Tag 9 param ...................................................................................................36
DRIX Tag 10 default ................................................................................................38
DRIX Tag 11 templateParam....................................................................................38
DRIX Tag 12 super ..................................................................................................43
DRIX Tag 13 hideParam ..........................................................................................44
DRIX Tag 14 emittable ............................................................................................46
DRIX Tag 15 publish ...............................................................................................49
DRIX Tag 16 generatedType ....................................................................................51
DRIX Tag 17 typeParam ..........................................................................................51
DRIX Tag 18 field ....................................................................................................55
DRIX Tag 19 arg ......................................................................................................58
DRIX Tag 20 fromField ...........................................................................................59
DRIX Tag 21 fromParam .........................................................................................60
DRIX Tag 22 expr ....................................................................................................61
DRIX Tag 23 templateArg .......................................................................................63
DRIX Tag 24 typeArg ..............................................................................................64
DRIX Tag 25 typeFrom ............................................................................................67
DRIX Tag 26 repeatRange .......................................................................................70
DRIX Tag 27 min .....................................................................................................71
DRIX Tag 28 max ....................................................................................................71
DRIX Tag 29 while ..................................................................................................75
DRIX Tag 30 condition ............................................................................................75
DRIX Tag 31 constraints ..........................................................................................78
DRIX Tag 32 ocurrence............................................................................................79
DRIX Tag 33 skip ....................................................................................................80
DRIX Tag 34 fixed ...................................................................................................82
DRIX Tag 35 variable ..............................................................................................83
DRIX Tag 36 or .......................................................................................................84
DRIX Tag 37 true.....................................................................................................84
DRIX Tag 38 test .....................................................................................................85
DRIX Tag 39 and .....................................................................................................87
DRIX Tag 40 or (within test) ....................................................................................87
DRIX Tag 41 not ......................................................................................................87
DRIX Tag 42 method ...............................................................................................89
DRIX Tag 43 defineParam .......................................................................................89
DRIX Tag 44 align ...................................................................................................91
DRIX Tag 45 to ........................................................................................................92
DRIX Tag 46 base ....................................................................................................93
DRIX Tag 47 testMethod ....................................................................................... 157
DRIX Tag 48 scanMethod ...................................................................................... 160
DRIX Tag 49 skipMethod ...................................................................................... 162
DRIX Tag 50 skipCountMethod .............................................................................164
DRIX Tag 51 readMethod ...................................................................................... 166
DRIX Tag 52 code ................................................................................................. 169
DRIX Tag 53 generator .......................................................................................... 172
DRIX Tag 54 requires ............................................................................................ 185
DRIX Tag 55 errorFilters ....................................................................................... 216
DRIX Tag 56 error ................................................................................................. 216
DRIX Tag 57 rule ................................................................................................... 216
DRIX Tag 58 errorType ......................................................................................... 217
DRIX Tag 59 errorLevel ........................................................................................ 217
DRIX Tag 60 action ............................................................................................... 217
Appendix C
Index of DROX Tags
DROX Tag 1 drox ....................................................................................................96
DROX Tag 2 output .................................................................................................97
DROX Tag 3 mapping ..............................................................................................98
DROX Tag 4 include ................................................................................................99
DROX Tag 5 fields................................................................................................. 100
DROX Tag 6 names ............................................................................................... 102
DROX Tag 7 fromField .......................................................................................... 105
DROX Tag 8 exclude ............................................................................................. 107
DROX Tag 9 names (under exclude) ...................................................................... 110
DROX Tag 10 regexPattern .................................................................................... 110
DROX Tag 11 mappingReference .......................................................................... 113
DROX Tag 12 dumper ........................................................................................... 125
DROX Tag 13 dump .............................................................................................. 126
DROX Tag 14 pattern ............................................................................................ 127
DROX Tag 15 specialField ..................................................................................... 141
DROX Tag 16 requires ........................................................................................... 185
Appendix D
Index of Examples
Example 1 – General rule on allowable attribute patterns ..........................................17
Example 2 - DRIX tag example with a source attribute. ............................................18
Example 3 - DRIX tag example with a file attribute. .................................................18
Example 4 - DRIX tag example. ...............................................................................18
Example 5 – library tag example ..............................................................................19
Example 6 – include tag example .............................................................................20
Example 7 – primaryField tag example .....................................................................21
Example 8 – namespace tag example ........................................................................23
Example 9 – using tag example ................................................................................24
Example 10 – using tag example part 2.....................................................................25
Example 11 – overrides example ..............................................................................29
Example 12 – type tag example ................................................................................34
Example 13 – primitiveType tag example .................................................................36
Example 14 – param tag example .............................................................................37
Example 15 – list example without using templateParam ..........................................39
Example 16 – ”normal” templateParam example ......................................................40
Example 17 – ”normal” templateParam example using baseType and returnType.....40
Example 18 – Trivial example using constant template parameters. ..........................42
Example 19 – Super tag example with args specified. ...............................................43
Example 20 – super tag example part 2.....................................................................44
Example 21 – Super tag example with templateArgs ................................................45
Example 22 – Example showing the template argument inheritance equivalence ......46
Example 23 – CountedString example ......................................................................47
Example 24 – emittable example ..............................................................................47
Example 25 – emittable example part 2 ....................................................................48
Example 26 – file format requiring publish ...............................................................49
Example 27 –publish tag example ............................................................................50
Example 28 –File format requiring dynamic types ....................................................50
Example 29 –BRD example file ...............................................................................51
Example 30 –File format requiring dynamic types ....................................................52
Example 31 –Complicated example illustrating order of evaluation ..........................54
Example 32 –Offset pattern ......................................................................................56
Example 33 –offset tag example ...............................................................................57
Example 34 –offset tag line ......................................................................................57
Example 35 –absolute offset example .......................................................................57
Example 36 –arg-value tag example .........................................................................59
Example 37 – arg-fromField tag example .................................................................60
Example 38 – arg-fromParam tag example ...............................................................61
Example 39–extended arg-value tag example ...........................................................62
Example 40 – example using anonymous fields ........................................................65
Example 41 – example using anonymous field .........................................................65
Example 42 – example without anonymous fields.....................................................65
Example 43 – Example referencing subfields of anonymous fields. ..........................66
Example 44 – example using a javaType attribute ....................................................66
Example 45 – a javaType attribute referencing LAE parameters. ..............................66
Example 46 – typeFrom example..............................................................................67
Example 47 – templateArg; combining generated and templated types. ....................68
Example 48 – templateArg; combining dynamically bound and templated types.......69
Example 49 –combining dynamically bound types and generated types. ...................69
Example 50 –example using the repeatRange tag......................................................73
Example 51 –RepeatRange without until ..................................................................73
Example 52 –Example file layout, showing need for the until attribute .....................73
Example 53 –RepeatRange with until .......................................................................74
Example 54 –example using the while tag ................................................................76
Example 55 –repeatRange equivalent of the while example ......................................76
Example 56 –set of 3 elements example ...................................................................77
Example 57 –Set example with onMultiple set to error .............................................78
Example 58 –file format example requiring skipping data ........................................81
Example 59 – skipType example ..............................................................................81
Example 60 –fixed-length and skip-fixed restriction pattern .....................................82
Example 61 – fixed length skip examples using element form ..................................82
Example 62 – fixed length skip examples using attribute form..................................83
Example 63 – variable length skip example ..............................................................83
Example 64 – or example .........................................................................................84
Example 65 – true example ......................................................................................85
Example 66 – test example .......................................................................................86
Example 67 – test example with Boolean operators ..................................................88
Example 68 –Example file layout requiring test-method ...........................................89
Example 69 Example DRIX using the test-method tags ............................................90
Example 70 – align example .....................................................................................92
Example 71 – align example for blocked records ......................................................93
Example 72 - DROX tag example with a source attribute. ........................................96
Example 73 - DROX tag example with a file attribute. .............................................96
Example 74 - DROX tag example with outputs and mappings. .................................97
Example 75 - DROX tag example with a dump tag. ..................................................97
Example 76 – Example DRIX specification ............................................................ 100
Example 77 – Simple DROX including individual fields. ....................................... 101
Example 78 – DRIX example with anonymous fields. ............................................ 101
Example 79 – DRIX example for including anonymous fields. ............................... 102
Example 80 –DRIX displaying the utility of field renaming.................................... 103
Example 81 –DROX to cherry pick nested information .......................................... 104
Example 82 –DROX to cherry pick nested information with simple renaming ........ 104
Example 83 –Result of simple renaming ................................................................. 104
Example 84 – DRIX example showing utility of fromField renaming. .................... 105
Example 85 –Input data file example ...................................................................... 106
Example 86 –DROX example with fromField renaming ......................................... 106
Example 87 –Example output using fromField renaming. ....................................... 107
Example 88 –Example output without using renaming............................................ 107
Example 89 –Data file example for field based exclusion. ...................................... 108
Example 90 – DRIX example for field based exclusion. ......................................... 108
Example 91 –DROX example with field based exclusion ....................................... 109
Example 92 –Output example for field based exclusion. ......................................... 109
Example 93 –DROX example with name based exclusion ...................................... 111
Example 94 –Output example for name based exclusion. ........................................ 111
Example 95 –DROX example with regex name based exclusion............................. 112
Example 96 –Output example for regex name based exclusion. .............................. 112
Example 97 – DRIX example mappingReferences. ................................................. 113
Example 98 –DROX example showing context evaluation of mapping references .. 114
Example 99 – DRIX example for mapping unions. ................................................... 116
Example 100 – DROX example showing incorrect use of mapping composition. ... 116
Example 101 – DROX example showing correct use of mapping union. ................. 117
Example 102 – DRIX example containing multiple record types. ........................... 118
Example 103 – DROX example using mapping unions & composition for multiple
record types. ........................................................................................................... 119
Example 104 – DROX snippet showing the exclusion of multiple fields. ................ 121
Example 105 – DRIX example mappingReferences. ...............................................121
Example 106 – DROX example showing incorrect use of mapping composition. ... 122
Example 107 –DROX base mapping example ........................................................ 123
Example 108 –The patternBase attribute on mappingReferences ............................ 124
Example 109 –Nested DROX patternBase attributes .............................................124
Example 110 –DRIX example highlighting the output trigger event ....................... 132
Example 111 – Valid DROX example highlighting the output trigger event ........... 133
Example 112 –Example file layout identifying trigger event occurrences................133
Example 113 – Valid DROX example highlighting the output trigger event ........... 135
Example 114 – Using mapping unions for multiple trigger event cases ................... 136
Example 115 – Output Suspension – Fields after the trigger event .......................... 136
Example 116 – Mapping Unions and Output Suspension ........................................ 137
Example 117 –Example file layout – read data values for a1.b2.c1 and b4 fields
shown in brackets. .................................................................................................. 137
Example 118 –Example mapping union with no output suspension ........................ 138
Example 119 –Example DRIX for zero-width trigger fields .................................... 138
Example 120 – Mapping Unions and Output Suspension ........................................ 139
Example 121 –Example file layout – showing zero-width trigger fields. ................. 139
Example 122 –Example DRIX specification displaying special identifier utility ..... 142
Example 123 –Example DROX specification displaying special identifier utility.... 142
Example 124 –Example file layout, displaying special identifier utility .................. 143
Example 125 –Example special identifier output .................................................... 143
Example 126 – Example format where ticker-tape rollbacks may be required. ........ 149
Example 127 – Convenience declarations provided at the start of user written methods
............................................................................................................................... 154
Example 128 – Convenience declarations provided at the start of user written methods
in constructed types. ............................................................................................... 154
Example 129 – Escaping code with a CDATA section tag. ..................................... 156
Example 130 –Simple example using a test tag ....................................................... 157
Example 131 –Simple generated test method .......................................................... 158
Example 132 –Example using a test tag .................................................................. 158
Example 133 –Code generated from a testMethod tag ............................................ 159
Example 134 –Scan Method example ..................................................................... 161
Example 135 –Skip Method example...................................................................... 163
Example 136 –SkipCount Method example ............................................................ 165
Example 137 –Read Method example ..................................................................... 168
Example 138 –Internal code example (class location) .............................................170
Example 139 –External code example (file location) .............................................. 171
Example 140 –Init code location example. .............................................................. 171
Example 141 –Generator example .......................................................................... 173
Example 142 –Empty Primitive Type example ......................................... 174
Example 143 –Generated code from empty Primitive Type example ...................... 174
Example 144 –Generated code for accessing and modifying a Primitive Type’s field.
............................................................................................................................... 178
Example 145 –Generated code for Primitive Types with a java primitive returnType
............................................................................................................................... 178
Example 146 –Example optimization candidate case .............................................. 180
Example 147 – error filtering example .................................................................... 218
Example 148 – errorCount example ........................................................................ 220
Example 149 – errorFields example ........................................................................ 220
Example 150 – errorCode example ......................................................................... 221
Example 151 – errorMessage example .................................................................... 221
Example 152 –Drix for a failed file......................................................................... 224
Example 153 – File format example for a failed file ...............................................225
Example 154 – File format example for a failed file ...............................................225
Example 155 – Stack trace for a failed data format. ................................................ 225
Example 156 – Debug Output specification ............................................................ 227
Example 157 –Parse Trace Output. ......................................................................... 229
Example 158 –Specification Failure example ......................................................... 231
Example 159 –Specification Failure Exception Reported ........................................ 232
Example 160 –Specification format required by the ASN.1 converter. .................... 238
Appendix E
Index of Tables
Table 1 – Includes & Library version matrix .............................................................21
Table 2 – Naming Resolution Rules. .........................................................................25
Table 3 – Type Override Procedure. .........................................................................28
Table 4 – baseType rules ..........................................................................................41
Table 5 – returnType rules ........................................................................................41
Table 6 – Emittable Type Resolution Rules ..............................................................47
Table 7 – Allowable Brd Output Types.....................................................................95
Table 8 – Mapping Evaluation Rules. ......................................................................98
Table 9 – Field Based Exclusion Rules. ................................................................. 108
Table 10 – Name Based Exclusion Rules. .............................................................. 110
Table 11 – Rules for Mapping compositions. ......................................................... 115
Table 12 – Rules for Mapping unions. ................................................................... 118
Table 13 – Trigger Event Definition ...................................................................... 131
Table 14 – Include/Exclude Rules for Special & “Normal” fields. .......................... 144
Table 15 – Accessing Fields, Parsers & Params in custom code blocks. .................. 153
Table 16 – Test Method Signature .......................................................................... 157
Table 17 – Scan Method Signature ......................................................................... 160
Table 18 – Skip Method Signature .......................................................................... 163
Table 19 – SkipCount Method Signature ................................................................ 165
Table 20 – Read Method Signature ......................................................................... 167
Table 21 – Generator Method Signature ................................................................. 173
Table 22 – LDR to BRD data type mappings .......................................................... 183
Table 23 – Error and Log Thresholding .................................................................. 189
Table 24 – Error Types ........................................................................................... 194
Table 25 – Reserved Words and Characters ............................................................ 233
Table 26 – ASN.1 constructs with limited/no support in version 2.0. This list also
includes types with special handling in the 2.0 version. .......................................... 236
Appendix F
Index of Figures
Figure 1 LDR Architecture .......................................................................................10
Figure 2 LDR Program Flow Phase 1. Parsing the Input and Output Specifications 146
Figure 3 LDR Program Flow Phases 2-4. Dynamic Class Construction, File Scanning
and Reading ........................................................................................................... 147