Lavastorm Desktop Professional Tech Note

Filtering Data
Often when performing an analysis, data must be manipulated and filtered. The Filter node,
found in the Aggregation and Transformation library, provides the functionality to change and
customize the data output using BRAINscript, Lavastorm’s scripting language. The Split node in
the same library is a specialized Filter node which splits the data according to a specified
criterion. This Tech Note covers both of these nodes.
Filter Node
A Filter node is essentially a blank node in which any level of data manipulation and output
customization can be performed. The functionality of the Filter node is customized by entering
BRAINscript in the Script section of the node. By default, the Script section contains the
following line of script:
emit *
which outputs all inputted fields and records. The “emit” keyword is used to specify the fields
to output and the “*” (wildcard) indicates all data fields. The default script can be added to or
replaced depending on the desired functionality.
The most common BRAINscript functions used in the Filter node fall into two broad categories:
1. Functions used to transform the data. These are data-related functions and fall into
three groups depending on the type of data the function is applied to: numeric,
string, and date and time.
2. Functions used to control the output of the node. These functions specify which
data fields are output, rename fields, filter records according to a field value, and
perform other output-related tasks.
Using BRAINscript Functions
BRAINscript functions can be added to the Script section by right-clicking within the script area
(Figure 1). BRAINscript keywords and functions appear in blue font. The functions are grouped
together based on their functionality.
Lavastorm Analytics© 2012 | www.lavastorm.com
Figure 1 – Accessing BRAINscript functions.
The syntax used for functions is generally:
FieldName.Function()
where FieldName is the data field the function is applied to and Function is the function
being used. Arguments to the function are entered within the parentheses and are
comma-separated. Multiple functions can be applied to a data field by chaining them on
the same line as follows:
FieldName.Function1().Function2().Function3()
An alternative syntax is to use the field name as the first argument of the function:
Function(FieldName)
The results of a function call can be assigned to a variable name:
VariableName = FieldName.Function()
A variable does not need to be defined before assignment: the first time a variable is
assigned, it is defined automatically. Variable names and function names are
case-sensitive, whereas input field names are not. To open the help for a function, place
the cursor in the function name and press F1. To add a comment to the script, place # in
front of the comment. Comments appear in a green font.
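The three call forms described above (chained, stepwise with a variable, and function-first) can be illustrated with a rough Python analogue. This is not BRAINscript: the value and the Python string methods are stand-ins chosen only to show that chaining and stepwise assignment produce the same result.

```python
# Python analogue only; the names and methods here are not BRAINscript.
field_name = "  Widget-42  "

# Chained form, comparable to FieldName.Function1().Function2()
chained = field_name.strip().upper()

# Stepwise form: each result is assigned to a variable on first use
trimmed = field_name.strip()
stepwise = trimmed.upper()

# Function-first form, comparable to Function(FieldName)
length = len(chained)
```

Chaining keeps a short transformation on one line, while the stepwise form leaves intermediate values available for inspection or reuse.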
Data Related BRAINscript Functions
Data-related functions are used to transform data before output. They are broken up into
three categories depending on the field’s data type: numeric, string, and date and time. The
most commonly used functions of each type are listed below with their definitions.
Numeric BRAINscript Functions
abs()          returns the absolute value
ceil()         returns the smallest integer greater than or equal to the value
double()       returns the value converted to a double
floor()        returns the largest integer less than or equal to the value
int()          returns the value converted to an integer
isNumber()     returns true if the value is a number or can be cast to a number
long()         returns the value converted to a long integer
round()        rounds the value
pow(exponent)  returns the value raised to the power of the specified exponent
sqrt()         returns the (positive) square root of the value
square()       returns the value squared
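The definitions above can be checked against a small Python sketch. The Python calls are analogues, not BRAINscript, and the helper is_number is a hypothetical stand-in for isNumber(). Details such as how int() and round() treat negative or halfway values may differ in BRAINscript, so the function help (F1) remains the authority.

```python
import math

def is_number(text):
    """Hypothetical analogue of isNumber(): true if the value can be cast to a number."""
    try:
        float(text)
        return True
    except (TypeError, ValueError):
        return False

value = -2.5

absolute = abs(value)            # absolute value
ceiling = math.ceil(value)       # smallest integer >= value
floored = math.floor(value)      # largest integer <= value
squared = value ** 2             # value raised to the power of 2
root = math.sqrt(absolute * 10)  # positive square root of 25.0
```

The negative input is deliberate: it shows that ceil() and floor() move in opposite directions from the value rather than simply dropping the fractional part.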
String BRAINscript Functions
left(num)                              returns the first num characters
ltrim()                                removes leading spaces
isSpace()                              returns true if every character is whitespace (horizontal tab, linefeed, carriage return, space)
pad(length, [character], [direction])  pads the string in the specified direction (left or right) with the specified length number of characters
replace(find,replace)                  replaces all occurrences (case-sensitive) of the find string with the replace string
right(num)                             returns the last num characters
rtrim()                                removes trailing whitespace
split(separator)                       splits the string by the specified separator creating a list of individual strings
strcat()                               concatenates the value and the arguments to the function
strFind(substring)                     returns the index of the start of the specified substring (case-sensitive). A -1 is returned if the substring does not exist within the string
strlen()                               returns the string length
substr(offset,[num])                   returns the substring starting at the specified offset including num characters
toLower()                              converts all uppercase letters to lowercase
toUpper()                              converts all lowercase letters to uppercase
trim()                                 removes leading and trailing whitespace
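As an illustration of several of the string functions above, the following Python sketch applies the nearest Python equivalents to one sample value. It is an analogue, not BRAINscript. In particular, Python indexes from zero, and whether strFind() and substr() also count from zero is an assumption here that should be confirmed in the function help.

```python
# Python analogue only; the comments name the BRAINscript function being imitated.
text = "  2012-03-15  "

trimmed = text.strip()                 # trim(): remove leading and trailing whitespace
first_four = trimmed[:4]               # left(4): first 4 characters
last_two = trimmed[-2:]                # right(2): last 2 characters
parts = trimmed.split("-")             # split("-"): list of individual strings
no_dashes = trimmed.replace("-", "")   # replace("-", ""): all occurrences replaced
dash_index = trimmed.find("-")         # strFind("-"): index of first match
missing = trimmed.find("/")            # strFind("/"): -1 because "/" is absent
middle = trimmed[5:5 + 2]              # substr(5, 2): 2 characters from offset 5
```

The -1 result for a missing substring matches the strFind() definition, which makes it usable as a simple "contains" test.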
Date/Time BRAINscript Functions
Date/time functions fall into three types depending on the data type of the variable calling the
function. The three date/time data types are date, time and datetime.
Date Functions
date()                          constructs a date object
dateAdjust(delta,[units])       adds the specified delta units to the date
dateSubtract(date2)             subtracts date2 from the date
day()                           returns the day value of the date
month()                         returns the month value of the date
year()                          returns the year value of the date
Time Functions
time()                          constructs a time object
hours()                         returns the hour value of the time
minutes()                       returns the minute value of the time
seconds()                       returns the second value of the time
timeSubtract(time2)             subtracts time2 from the time and returns the results in number of seconds.
Datetime Functions
timestamp()                     constructs a datetime object
dateTime(time)                  returns the epoch-time (number of seconds since midnight 1/1/70) of the specified time
dateTimeAdjust(delta, [units])  adds the specified delta units to the datetime
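The date, time and datetime operations above can be sketched with Python's standard datetime module. This is an analogue rather than BRAINscript. The sample values are invented, day units are assumed for the date adjustment, and UTC is assumed for the epoch calculation. The helper seconds_since_midnight is hypothetical.

```python
from datetime import date, datetime, time, timedelta, timezone

def seconds_since_midnight(t):
    """Convert a time of day to a count of seconds."""
    return t.hour * 3600 + t.minute * 60 + t.second

# Date: extract a part, adjust by a delta, subtract two dates
purchase = date(2012, 3, 15)
month = purchase.month                     # like month()
adjusted = purchase + timedelta(days=30)   # like dateAdjust(30), day units assumed
gap_days = (adjusted - purchase).days      # like dateSubtract

# Time: difference between two times, expressed in seconds
start = time(9, 30, 0)
end = time(17, 0, 15)
elapsed = seconds_since_midnight(end) - seconds_since_midnight(start)

# Datetime: seconds since midnight 1/1/70 (UTC assumed)
stamp = datetime(1970, 1, 2, tzinfo=timezone.utc)
epoch_seconds = int(stamp.timestamp())
```

Keeping the three types separate mirrors the note's point that the available functions depend on the data type of the calling variable.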
Output Related BRAINscript Functions
Output related functions control the output of the Filter node. The main output related
keyword is emit which specifies what to output. Multiple fields can be outputted by placing the
field names separated by commas after emit:
emit Field1, Field2, Field3
or by using multiple emits:
emit Field1
emit Field2
emit Field3
The second syntax is useful when customizing the output with the emit qualifiers, which are
as follows:
override  overrides old data with new data
exclude   suppresses the output of a field
rename    changes the output field name
where     controls the output based on a specified criterion
Examples
1. In this example, a Filter node is used to output the month an item was purchased and the price
of the item. The output would then be used in an aggregate node to find the total monthly
revenue. Both data related and output related BRAINscript functions are used. A sample of the
input data for this example is shown in Figure 2 and the Filter node scripting is shown in Figure
3.
Figure 2 – Example 1 sample input.
The first two lines of the script use string functions: the first finds the length of the price
string, and the second uses that length with the right function to remove the $ from the price. The third script line
converts the result to a double. As stated in the comments in lines 5 and 6, the three script lines
could have been written as one single line by stringing the functions together. Line 9 uses the
month function to extract the month portion of the date of purchase and then assigns the result
to a variable called MonthofPurchase. The final line of scripting contains output related
BRAINscript to emit all input fields, emit the MonthofPurchase field and to replace the Price
field with the newly created PriceValue field. A sample of the output is shown in Figure 4.
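The logic of this script can be approximated in Python as a sketch. It is not the BRAINscript from Figure 3. The input field name DateOfPurchase and the sample record are assumptions, and the sketch assumes the output keeps the name Price for the converted value.

```python
from datetime import date

def transform(record):
    """Python analogue of the Example 1 script; not the BRAINscript in Figure 3."""
    price = record["Price"]                  # e.g. "$19.99"
    length = len(price)                      # string length of the price
    keep = length - 1                        # every character except the leading $
    digits = price[length - keep:]           # last `keep` characters, like right()
    price_value = float(digits)              # convert the result to a double
    # "DateOfPurchase" is an assumed field name; the source does not give it.
    month_of_purchase = record["DateOfPurchase"].month

    row = dict(record)                       # start from all input fields
    row["MonthofPurchase"] = month_of_purchase
    row["Price"] = price_value               # numeric value replaces the original string
    return row

sample = {"Item": "Lamp", "Price": "$19.99", "DateOfPurchase": date(2012, 3, 15)}
result = transform(sample)
```

With the price stored as a number and the month as its own field, a later aggregation step can group by month and sum the price.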
Figure 3 – Example 1 node configuration.
Figure 4 – Example 1 sample output.
2. The script in Figure 5 is an example of output related BRAINscript used to control the node’s
output. Line 1 specifies that all inputted rows in which field7 matches the specified value should be outputted. Line
2 adds two fields, newField1 and newField2, to the output. Line 3 removes all dashes in the data
in field3 and replaces the original field3 with the result. Line 4 excludes field4 and field5 from
the output and line 5 outputs field6 with the new name of myField6Name.
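The five steps described above can be mirrored in a short Python sketch. It is an analogue of the behaviour, not the BRAINscript shown in Figure 5. The function run_filter, the keep_value argument, the sample records and the values given to newField1 and newField2 are all placeholders that do not come from the source.

```python
def run_filter(records, keep_value):
    """Python analogue of the Example 2 script, applied to a list of dict records."""
    output = []
    for record in records:
        # Line 1: keep only rows whose field7 matches the criterion.
        if record["field7"] != keep_value:
            continue
        row = dict(record)  # start from every input field
        # Line 2: add two new fields (placeholder values, not from the source).
        row["newField1"] = "placeholder1"
        row["newField2"] = "placeholder2"
        # Line 3: remove dashes from field3 and override the original field.
        row["field3"] = row["field3"].replace("-", "")
        # Line 4: exclude field4 and field5 from the output.
        for name in ("field4", "field5"):
            row.pop(name, None)
        # Line 5: output field6 under the new name myField6Name.
        row["myField6Name"] = row.pop("field6")
        output.append(row)
    return output

records = [
    {"field3": "555-12-34", "field4": "a", "field5": "b", "field6": 10, "field7": "keep"},
    {"field3": "555-98-76", "field4": "c", "field5": "d", "field6": 20, "field7": "drop"},
]
result = run_filter(records, "keep")
```

Each step corresponds to one emit qualifier: where filters rows, override replaces a field's data, exclude suppresses fields, and rename changes the output name.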
Figure 5 – Example 2 Filter node configuration.
Split Node
The Split node splits the node’s input based on a specified condition. The Split node configuration is
shown in Figure 6. The criterion to split the data is entered in the PredicateExpr section. The expression
must evaluate to a Boolean result. Input rows that match the condition are outputted to the first output
pin and those that do not are outputted to the second output pin. In the example in Figure 6, rows of
data in which the TotalPurchase is greater than 300 are outputted to the first pin and those in which it is
less than or equal to 300 are outputted to the second pin.
By default, the script section contains
emit *
which outputs all fields. As in the Filter node, BRAINscript can be used in the script to customize the
output.
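The two-pin behaviour of the Split node can be sketched in Python as a partition on a Boolean predicate. This is an analogue only. The function name split_rows, the Customer field and the sample rows are hypothetical, while the TotalPurchase threshold of 300 follows the example described above.

```python
def split_rows(records, predicate):
    """Route each record to one of two lists, like the Split node's two output pins."""
    first_pin, second_pin = [], []
    for record in records:
        if predicate(record):
            first_pin.append(record)    # rows that match the condition
        else:
            second_pin.append(record)   # rows that do not match
    return first_pin, second_pin

purchases = [
    {"Customer": "A", "TotalPurchase": 450},
    {"Customer": "B", "TotalPurchase": 300},
    {"Customer": "C", "TotalPurchase": 120},
]
over_300, at_most_300 = split_rows(purchases, lambda r: r["TotalPurchase"] > 300)
```

A row with TotalPurchase equal to 300 fails the "greater than" test, so it goes to the second pin, and every input row ends up on exactly one of the two outputs.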
Figure 6 – Split node configuration.