LANGUAGE FOR ARRAY DATA PROCESSING
INSTRUCTOR: DR. KWOK-BUN YUE
MENTOR: MR. RAVI GANTA
TEAM #6
HARITHA RANI JADCHERLA
NARASIMHA BHYRAVABOTLA
SALOTI ANNAPURNA
VIKRAM SRIRAM
Spring 2010
04/21/2010
Acknowledgement
We would like to thank our professor, Dr. Kwok-Bun Yue, Chairperson of the Division of
Computing and Mathematics, for giving us an opportunity to explore our skills and innovations
and for providing valuable suggestions that helped make the project successful.
We further extend our gratitude to our mentor, Mr. Ravi Ganta, Director, Product
Development of AnshLabs, for giving us the opportunity to work on this project and to
integrate the proposed language into the tool. His valuable feedback in weekly mentor meetings
helped us better understand the software engineering process involved in real-world
product development.
We would also like to thank our friends and all those who were directly or indirectly
involved in this project.





Check your grammar carefully.
Your background description is better than before.
Spend less space on generic discussion, more space on project-specific discussion.
Have you learnt any lessons?
What percentage of the original requirements was your team able to satisfy?
Abstract
In a typical laboratory, a physician performs various tests on blood, urine or other chemical samples. The
test results (observations) are tabulated for further processing. Sometimes the results are stored in
structures such as tables, records or 2D arrays. The data in these arrays are then subjected to tedious
mathematical algorithms or trigonometric functions. When a blood sample belonging to one patient is
tested for various diseases, all the readings belong to that one patient, but there are separate readings
for each disease test. Hence these observations are tagged by the test name. Performing mathematical
algorithms that involve 2D arrays and tags is tedious, and the precision of the results also plays an
important role. Hence there is a need to define a domain-specific language for lab users. (Your
requirements are very specific. They may not apply to general array data users.) The Domain Specific
Language (DSL) will be simple in semantics, compact in syntax and easy for lab physicians to understand.
Physicians need not have prior knowledge of any programming language, which minimizes the effort
required of the end user. There are many tools for developing such a Domain Specific Language. (This
paragraph is a general introduction, which is better than before. However, your main project’s goal is to
design a domain-specific language and interpreter for 2D arrays. Just a sentence is not sufficient. You
may want to provide more details.)
The goal of this project is to design a user-friendly domain-specific language and interpreter. The
project has the following four phases: (The descriptions of these phases are not as important as the
requirements and designs.)
1. Language specification design.
2. Unambiguous grammar design.
3. Use of a compiler-generation tool for creating a lexer and a parser.
4. A run-time environment.
Contents
Abstract
1. Introduction
1.1 Background
1.2 Purpose
1.3 Scope
2. Project Requirements
2.a Defining a Language specification
2.b Defining a Grammar
2.c Compiler generation tools
2.d Run time environment
3. Design and Implementation
4. Technical Details
4.a Tools Used
5. Evaluation
6. Conclusion and Further Discussions
7. References
8. Appendices
8.a Appendix A
A1. Team roles
A2. Team Contribution
8.b Appendix B
1. Introduction
Background: A chemical laboratory domain comprises physicians or lab technicians,
scientists, doctors, a specialized terminology and much more. Physicians perform various
tests on samples, and the results are tabulated. Some tests produce many readings, while
others yield only one. Tests differ, and results are specific to each test. The results
must be stored so that all results belonging to one patient can be retrieved easily. The
results specific to a test may be identified by tagging them with the test name. This is
done using a microtiter plate. A microtiter plate, or microplate, is a flat plate with
multiple wells used as small test tubes. The wells are laid out as a 2-D matrix, so the
dimensions of the plate can be written, as for a matrix, as “rows x columns” and read as
“rows by columns”. This 2-D matrix is represented as a 2-D array in a programming
language. Figure 1 shows a typical microtiter plate, with dimensions 8x12 (8 rows and 12
columns), for which we have developed the language. In the figure, rows are named with
letters and columns with integers.
Figure 1: Microplate
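As a minimal sketch of how such a plate maps onto a 2-D array (this is not the project's actual code), the following Java fragment converts a well label such as B3 into zero-based row and column indices. The class name PlateWell, the method toIndices and the assumption that plate columns are labelled starting from 1 are hypothetical and chosen only for illustration.

```java
final class PlateWell {
    static final int ROWS = 8;     // rows A..H
    static final int COLUMNS = 12; // columns 1..12 (assumed to be 1-based on the plate)

    /** Converts a label such as "B3" into zero-based {row, column} indices. */
    static int[] toIndices(String label) {
        if (label == null || label.length() < 2) {
            throw new IllegalArgumentException("Bad well label: " + label);
        }
        // The row letter becomes 0..7; the column number becomes 0..11.
        int row = Character.toUpperCase(label.charAt(0)) - 'A';
        int column;
        try {
            column = Integer.parseInt(label.substring(1)) - 1;
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Bad well label: " + label, e);
        }
        if (row < 0 || row >= ROWS || column < 0 || column >= COLUMNS) {
            throw new IllegalArgumentException("Well outside the 8x12 plate: " + label);
        }
        return new int[] {row, column};
    }
}
```

With this convention, the reading for well B3 would be stored at position [1][2] of an 8x12 array.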
A physician performs many types of operations on the test results. Some of them are:
1. Arithmetic operations.
2. Trigonometric functions such as sin, cos and tan.
3. Statistical operations. When tests are performed periodically, the results are read
a number of times, and further results are then estimated by plotting a graph.
To let end users easily specify such mathematical transformations on plate data, a
simple but specific language needs to be designed and built. Our language is one such
attempt.
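To make the idea of a transformation on plate data concrete, the sketch below applies one mathematical function to every well of a plate held as a 2-D array. It is only an illustration under our own assumptions: the class PlateMath and the method map are hypothetical names and are not taken from the project's interpreter.

```java
import java.util.function.DoubleUnaryOperator;

final class PlateMath {
    /** Returns a new array in which f has been applied to every reading. */
    static double[][] map(double[][] plate, DoubleUnaryOperator f) {
        double[][] result = new double[plate.length][];
        for (int r = 0; r < plate.length; r++) {
            result[r] = new double[plate[r].length];
            for (int c = 0; c < plate[r].length; c++) {
                result[r][c] = f.applyAsDouble(plate[r][c]);
            }
        }
        return result;
    }
}
```

A call such as PlateMath.map(plate, Math::sin) would then compute the sine of every reading without modifying the original plate.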
This section gives good background.
Purpose: The purpose of this project was to develop an interpreter for a language that
makes it easier to process array-type data. The project mainly focuses on providing a
user-friendly interface while producing accurate results for the various operations
performed on the plate.
Scope: The scope of this project was to help lab users perform various operations on
samples without errors and to ease the task of carrying out those calculations.
2. Project Requirements:
The main requirement of the project is to develop a domain-specific language to process
array-type data. (You need to provide some discussion on what kind of operations are
needed. They will in turn drive your language design.) In developing such a
domain-specific language, four major tasks were identified: defining a language
specification, defining a grammar, using compiler-generation tools and developing a
run-time environment.
a. Defining Language Specification:
A language specification is a document that provides the user with all the
information needed to operate the interpreter. It describes the variables and
commands that the interpreter supports and what each of them does. The document
is available on the project website under the deliverables section. (Most
readers know what a language specification is and a generic description is not
urgently needed.)
b. Defining a grammar:
As we are defining a new language, we need to develop a new grammar that
defines it. Defining a grammar for a language involves two phases. The first
phase deals with defining the syntax of the language. The second phase deals
with evaluating the defined syntax. The grammar itself does not place any
limitations on variable names and function names; these are handled when
checking the semantics. The syntax is defined by a set of production rules. A
production rule consists of non-terminal and terminal symbols. A non-terminal
symbol is one that can be replaced or rewritten by another production rule. A
terminal symbol is one that cannot be rewritten or replaced. In our grammar,
every terminal symbol represents a digit, a variable name or a function name.
In our project, we used an LL(*) grammar to define the production rules.
Defining the grammar thus gives the prototype of the variables and functions
used in the language. In this project, we developed the grammar in
ANTLRWorks. A detailed description of ANTLRWorks is given in the following
section. (Again, this is too generic. Readers with knowledge about grammar
should know about production rules, terminals, etc. You don’t need a long
introduction.)
c. Compiler-generation Tools:
A compiler-generation tool generates source code files (Java, in our case) from
a grammar. The input grammar should be syntactically and semantically correct.
The compiler-generation tool that we used in our project is ANTLRWorks. ANTLR
stands for ANother Tool for Language Recognition. For every production rule,
ANTLRWorks displays a corresponding syntax diagram. By verifying the syntax
diagram, the logic of the grammar is tested. ANTLR also provides the necessary
error recovery. Using ANTLR, we generated the required lexer and parser code
files. The lexer code file recognizes the tokens for the variables and
functions in use, and the parser code file describes how the functions are
defined and how they work. As our project is developed in Java, we declared the
target language as Java. The generated lexer and parser code files are used in
developing the run-time environment. (There are many such tools. Why was ANTLR
selected?)
d. Run-Time Environment:
A run-time environment is the interface where the user gives the input and sees
the processed results. To develop the interface we used Eclipse with Java.
Eclipse is an open-source Integrated Development Environment. Using the lexer
and parser code files obtained from the compiler-generation tool, a driver file
is developed. The driver file uses the grammar file, lexer file, parser file and
the necessary library files. (Firstly, RTE and IDE are different, although
sometimes they are integrated together. Your readers should know the basics of
IDE and RTE so keep it short. Instead of discussing generic issues, focus on the
specific flavors of your projects. For example, you may state that users may want
to access different collections of cell data and provide examples. This drives your
language syntax.)
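The general shape of such a console driver can be sketched as a read-evaluate-print loop. The sketch below is an assumption about the structure, not the project's driver file: the class ConsoleDriver, the "exit" command and the evaluate callback (which stands in for the ANTLR-generated lexer and parser) are all hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.function.Function;

final class ConsoleDriver {
    /**
     * Reads one statement per line, evaluates it and prints the result.
     * The evaluate function is a placeholder for the generated lexer and parser.
     */
    static void run(Reader in, Writer out, Function<String, String> evaluate) {
        BufferedReader reader = new BufferedReader(in);
        PrintWriter writer = new PrintWriter(out, true);
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                String statement = line.trim();
                if (statement.equalsIgnoreCase("exit")) { // hypothetical quit command
                    break;
                }
                if (statement.isEmpty()) {
                    continue; // ignore blank lines
                }
                try {
                    writer.println(evaluate.apply(statement));
                } catch (RuntimeException e) {
                    // Report the problem and keep the console alive.
                    writer.println("Error: " + e.getMessage());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            writer.flush();
        }
    }
}
```

Passing the evaluator in as a function keeps the loop independent of how statements are parsed, so the same loop can be exercised with a stub evaluator during testing.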
3. Design and Implementation
The architecture for the Language for Array Data Processing (LADP) is shown below:
Figure 2: Architecture Diagram of LADP
The architecture can be described in three phases.
In the first phase, the grammar is processed by ANTLR, which generates the lexer and
parser code files. The lexer code turns the input text into tokens, and the parser code
builds a parse tree from those tokens, which gives us the syntax tree for the input.
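To illustrate what tokenizing means for this language, the following hand-written Java sketch splits an access expression into tokens. In the project this work is done by the lexer that ANTLR generates; the class SketchLexer, the token kinds (ID, INT, RANGE, SYM) and the regular expression are our own assumptions, and decimal literals are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SketchLexer {
    // One alternative per token kind: identifier, integer, range operator, punctuation.
    private static final Pattern TOKEN = Pattern.compile(
            "\\s*(?:([A-Za-z][A-Za-z0-9]*)|(\\d+)|(\\.\\.)|([\\[\\]{},*]))");

    /** Splits an access expression into "KIND:text" tokens. */
    static List<String> tokenize(String input) {
        String text = input.trim();
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        int pos = 0;
        while (pos < text.length()) {
            m.region(pos, text.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Unexpected input at or after position " + pos);
            }
            if (m.group(1) != null) {
                tokens.add("ID:" + m.group(1));
            } else if (m.group(2) != null) {
                tokens.add("INT:" + m.group(2));
            } else if (m.group(3) != null) {
                tokens.add("RANGE:" + m.group(3));
            } else {
                tokens.add("SYM:" + m.group(4));
            }
            pos = m.end(); // continue after the token just read
        }
        return tokens;
    }
}
```

For the input M1[{0..2}, {1..4}] this produces an identifier token for M1 followed by bracket, brace, integer and range tokens, which a parser can then match against the production rules.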
In the second phase, an interpreter is developed. To develop the interpreter, we used
Eclipse with Java. The interpreter is built from the Java code files, i.e., the lexer
and parser code files from ANTLR, together with the grammar file and the matrix file.
The output of this phase is a console application.
In the third phase, the console application is tested with given inputs and the output
is verified.
4. Technical details
This section gives the details of the technologies used in developing the project.
a. Tools Used:
To develop the code files from the grammar we used automated tools. The present
section gives the details of tools that are used in the project.
The following discussion is too general. You may want to shorten it. Instead, focus on
examples of interesting language syntax and how they are actually implemented as
production rules. You don’t need to list all of them. However, you may want to give
some full examples, such as:
Need: users may want to access a rectangular block of cells within the test plate.
Syntax: allow using ranges in specifying indices of rows and columns.
Syntax example: M1[{0..2}, {1..4}]
Production rule: …
ANTLR: ANTLR is an open-source framework for constructing recognizers, compilers and
translators from grammatical descriptions. ANTLR provides excellent support for tree
construction, tree walking, translation, error recovery and error reporting. ANTLR has
a sophisticated grammar development environment called ANTLRWorks. Code can be
generated in many target languages by specifying the target language.
The ANTLRWorks environment looks as follows:
Figure 3: A snapshot of ANTLRWorks
Eclipse: Eclipse is an open-source integrated development environment, which we used
to develop the console application. Eclipse offers extensible plug-ins, which are
available for various languages such as C#, C and PHP. We used Eclipse mainly because
it is user-friendly and offers the programmer flexibility comparable to that of
Visual Studio .NET. (Eclipse is well known. Just mention that you are using it and how
you integrate Eclipse with ANTLR.)
5. Evaluation
The project is evaluated by testing whether the console application gives accurate
output for a variety of test cases.
The test cases cover three areas: accessing a matrix, the working of functions and the
working of expressions.
Matrix access is evaluated with several test cases. There are two ways of accessing a
matrix: with a single index and with multiple indices. With a single index, the access
yields an atomic value. With multiple indices, a matrix is accessed in three different
ways: (This is too short. You may want to describe how you construct the test cases
and what the results are.)
a. Range
b. Ordered Set: meaning?
c. Wildcard (* means all)
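The three forms of multiple-index access can be sketched in Java as follows. The sketch assumes, based on the test cases in Appendix B, that indices are zero-based, that a range such as {0..2} includes both ends and that the selected cells are returned row by row. The class IndexSpec and its methods expand and select are hypothetical names, not the project's implementation.

```java
import java.util.ArrayList;
import java.util.List;

final class IndexSpec {
    /** Expands "3", "{1,2,3}", "{0..2}" or "*" into zero-based indices below size. */
    static List<Integer> expand(String spec, int size) {
        String s = spec.trim();
        List<Integer> indices = new ArrayList<>();
        if (s.equals("*")) {                       // wildcard: every index
            for (int i = 0; i < size; i++) {
                indices.add(i);
            }
            return indices;
        }
        if (s.startsWith("{") && s.endsWith("}")) {
            s = s.substring(1, s.length() - 1);    // strip the braces
        }
        if (s.contains("..")) {                    // range, both ends included
            String[] ends = s.split("\\.\\.");
            int low = Integer.parseInt(ends[0].trim());
            int high = Integer.parseInt(ends[1].trim());
            for (int i = low; i <= high; i++) {
                indices.add(i);
            }
        } else {                                   // single index or ordered set
            for (String part : s.split(",")) {
                indices.add(Integer.parseInt(part.trim()));
            }
        }
        for (int i : indices) {
            if (i < 0 || i >= size) {
                throw new IndexOutOfBoundsException("Index " + i + " is outside 0.." + (size - 1));
            }
        }
        return indices;
    }

    /** Returns the selected cells in row-major order. */
    static List<Double> select(double[][] matrix, String rowSpec, String columnSpec) {
        List<Double> cells = new ArrayList<>();
        for (int r : expand(rowSpec, matrix.length)) {
            for (int c : expand(columnSpec, matrix[r].length)) {
                cells.add(matrix[r][c]);
            }
        }
        return cells;
    }
}
```

Under these assumptions, select(m, "{0..2}", "1") returns column 1 of rows 0 to 2, which corresponds to the test case M1[{0..2},1] in Appendix B.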
The working of functions is evaluated by calling each function with parameters and also
by using the function inside an expression.
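A minimal Java sketch of the kind of functions being tested is given below. It is an illustration only: the class PlateFunctions is a hypothetical name, and every value is treated as a double, whereas the expected outputs in Appendix B suggest that the interpreter distinguishes integer and decimal arguments.

```java
import java.util.Arrays;

final class PlateFunctions {
    static double avg(double... values) {
        requireValues(values);
        return Arrays.stream(values).average().getAsDouble();
    }

    static double max(double... values) {
        requireValues(values);
        return Arrays.stream(values).max().getAsDouble();
    }

    static double min(double... values) {
        requireValues(values);
        return Arrays.stream(values).min().getAsDouble();
    }

    /** Returns a sorted copy in ascending order; the arguments are left untouched. */
    static double[] sort(double... values) {
        double[] copy = values.clone();
        Arrays.sort(copy);
        return copy;
    }

    private static void requireValues(double[] values) {
        if (values == null || values.length == 0) {
            throw new IllegalArgumentException("At least one value is required");
        }
    }
}
```

Each function can be called directly, for example PlateFunctions.avg(1, 2, 3), or its result can be used as an operand in a larger arithmetic expression.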
The working of expressions is evaluated by executing the possible expressions at the
console application.
All the test cases and their expected outputs are listed in Appendix B.
6. Conclusions and Future Work:
A language for array data processing enables a user to perform complex array
operations easily. It simplifies the user's task by providing a user-friendly run-time
environment where the user can give input and see the result.
Future work on the project may include further implementation of tag-based
operations and functions.
7. References:
[1] Source for ANTLR: www.antlr.org
[2] Source for Eclipse: http://www.eclipse.org/downloads/
[3] Team website: http://dcm.uhcl.edu/c423008fasalotia/caps10g6/default.htm
8. Appendices:
Appendix A: Team roles and contribution
A.1 Team role
Vikram Sriram
Major: Computer Science
Phone: 832-314-2534
Role: Developer
Narasimha Bhyravabotla
Major: Computer Information Systems
Phone: 832-477-0265
Role: Developer
Haritha Rani Jadcherla
Major: Computer Science
Phone: 630-913-2844
Role: Developer
Annapurna Saloti
Major: Computer Science
Phone: 305-439-7477
Role: Webmaster
A.2 Team Contribution
Sr No  Tasks                                         Vikram  Narasimha  Haritha  Annapurna
1      Project Selection                             25      25         25       25
2      Team Leadership                               25      25         25       25
3      Project Analysis (Brainstorming)              25      25         25       25
4      Research Work                                 25      25         25       25
5      Website Creation, Maintenance                 20      20         25       35
6      Preparing Instructor and Mentor Meetings      40      20         20       20
7      Documentation: SRS, Abstract, Language        25      25         25       25
       Specification, Presentation, Final Report
8      Testing                                       25      25         25       25
9      Integration                                   30      25         25       20
Appendix B:
In this section, the test cases are described. For each input, the expected output is presented in
the table below. (List the actual content of M1.)
INPUT                   EXPECTED OUTPUT
M1[1,2]                 2.2
M1[0,11]                1.11
M1[{1,2,3},1]           2.1 3.1 4.1
M1[1,{1,2,3}]           2.1 2.2 2.3
M1[{1,2,3},{1}]         2.1 3.1 4.1
M1[{1},{1,2,3}]         2.1 2.2 2.3
M1[{1,2,3},{1,2,3}]     2.1 2.2 2.3 3.1 3.2 3.3 4.1 4.2 4.3
M1[{0..2},1]            1.1 2.1 3.1
M1[{0..2},{1,2}]        1.1 1.2 2.1 2.2 3.1 3.2
M1[{0..2},{0..8}]       1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8
                        2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8
                        3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8
M1[{0},{0..8}]          1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8
M1[{1,2},{0..8}]        2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8
                        3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8
M1[1,{0..8}]            2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8
M1[*,*]                 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 1.1 1.11
                        2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 2.1 2.11
                        3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 3.1 3.11
                        4.0 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 4.1 4.11
                        5.0 5.1 5.2 5.3 5.4 5.5 5.6 5.7 5.8 5.9 5.1 5.11
                        6.0 6.1 6.2 6.3 6.4 6.5 6.6 6.7 6.8 6.9 6.1 6.11
                        7.0 7.1 7.2 7.3 7.4 7.5 7.6 7.7 7.8 7.9 7.1 7.11
                        2.0 8.1 8.2 8.3 8.4 8.5 8.6 8.7 8.8 8.9 8.1 8.11
                        (Are there any reasons for using the values with yellow background?)
M1[*, 1]                1.1 2.1 3.1 4.1 5.1 6.1 7.1 8.1
M1[*,{1,2,3}]           1.1 1.2 1.3
                        2.1 2.2 2.3
                        3.1 3.2 3.3
                        4.1 4.2 4.3
                        5.1 5.2 5.3
                        6.1 6.2 6.3
                        7.1 7.2 7.3
                        8.1 8.2 8.3
M1[*,{0..2}]            1.0 1.1 1.2 2.0 2.1 2.2 3.0 3.1 3.2 4.0 4.1 4.2
                        5.0 5.1 5.2 6.0 6.1 6.2 7.0 7.1 7.2 2.0 8.1 8.2
M1[1,*]                 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 2.1 2.11
M1[{1,2},*]             2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 2.1 2.11
                        3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 3.1 3.11
M1[{0..1},*]            1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 1.1 1.11
                        2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 2.1 2.11
Avg(1,2,3)              2
Avg(1.0,2.0,3.0)        2.0
Max(2,5,8,10,12)        12
Max(8.0,2.0,56.0)       56.0
Min(6,9,2,3,4,1)        1
Sort(22,19,7,56,33,4)   4,7,19,22,33,56
Log8                    3
                        (Do you mean Log(8)? Case sensitivity?)
M1 is the matrix defined in matrix.txt, shown below in table format, where the first row and the
first column give the column and row indices.
     0     1     2     3     4     5     6     7     8     9     10    11
0    1.0   1.1   1.2   1.3   1.4   1.5   1.6   1.7   1.8   1.9   1.10  1.11
1    2.0   2.1   2.2   2.3   2.4   2.5   2.6   2.7   2.8   2.9   2.10  2.11
2    3.0   3.1   3.2   3.3   3.4   3.5   3.6   3.7   3.8   3.9   3.10  3.11
3    4.0   4.1   4.2   4.3   4.4   4.5   4.6   4.7   4.8   4.9   4.10  4.11
4    5.0   5.1   5.2   5.3   5.4   5.5   5.6   5.7   5.8   5.9   5.10  5.11
5    6.0   6.1   6.2   6.3   6.4   6.5   6.6   6.7   6.8   6.9   6.10  6.11
6    7.0   7.1   7.2   7.3   7.4   7.5   7.6   7.7   7.8   7.9   7.10  7.11
(Where is row 7? The values with yellow highlight background do not seem to match the results of M1[*,*].)