ComputerAssignment_DataHandling.pdf

TMMS04
1
Data Management Computer Lesson
2014 HT-1
Introduction
Development of mechatronic systems involves a lot of data handling. This can for instance
involve collecting measurement data, organizing parameter sets or comparing simulation results.
There are many different data types and file formats that can be used for storing large sets of
numbers in computers. Being able to manage data efficiently is therefore an important skill for
the mechatronics engineer.
1.1
Data Types
All variables in a computer have data types. This can for example be integers, booleans,
characters, floating-point numbers, strings or others. The range of numbers that the variable
can represent is limited by the number of bits that are used to store it. Some of the most
important data types for engineering are shown in table 1
Type (C Notation)
bool
int
unsigned int
float
double
a
Contents
boolean
integer
integer
floating-point
floating-point
Size
1 bita
32 bits
32 bits
32 bits
64 bits
Range
true or false
−2147483648 to 2147483647
0 to 4294967295
±1.4e−45 to ±3.4028235e+38
±4.9e−324 to ±1.7976931348623157e+308
Hardware restrictions lead often to an actual size of e.g. 32 bit
Table 1: Common data types used in mechatronic systems. The type in stated in the notation
of the programming language C.
1.2
File Formats
In order to store data permanently on a computer, it must be saved to a file. There are roughly
two kinds of files that can be used: text files and binary files (although text files are actually
binary files that contain text). A text file can be opened by any text editor, and the contents
can be viewed directly. However, storing numbers as text requires far more space on the hard
drive than storing them directly as binary files.
A common text-based file format is Comma-Separated Values (.CSV). With this format
data variables are stored in rows, with columns separated by a comma symbol (","). It is also
possible to use other symbols for the separation, for example spaces, tabs or semicolons. This
is especially useful in countries where commas are used as decimal marks. Two examples are
shown in listing 1 and 2, one file using commas and one using semicolons:
0 , 5.262 , 0.000012
0.001 , 5.287 , 0.0000015
0.002 , 5.295 , 0.0000063
0.003 , 5.3 , 0.0000156
0.004 , 5.312 , 0.0000321
0 ; 5 ,262; 0 ,000012
0 ,001; 5 ,287; 0.0000015
0 ,002; 5 ,295; 0.0000063
0 ,003; 5 ,3; 0.0000156
0 ,004; 5 ,312; 0.0000321
Listing 1: A .CSV file using comma
separators
Listing 2: A .CSV file using semicolon
separators
The second option is to use binary files, which are not humanly readable. On the other hand,
they are much faster to read and require far less space.
A common way to distinguish file formats from each other is to use file extensions, like
".csv" as mentioned above. While this can be practical, it must be pointed out that this is
nothing more than a part of the name of the file. A file can be of any format regardless of its
1
TMMS04
Data Management Computer Lesson
2014 HT-1
file extension. It is also common that different programs use files with the same extension even
though they have different formats, which sometimes can lead to confusion.
2
Tasks
You have received a .ZIP-file with a number of example files in different formats. These contain
numerical data, which could for example be measurement data from some experiments. The
task is very simple: Read the files from MATLAB and try to plot some data from
them.
2.1
A Simple Text File
The file called example_text.txt is actually a .CSV file (notice the false file extension). Try
to load it from MATLAB using the load and csvread commands, and then plot the data.
There are two columns, x and y. You can also load it from a standard text editor like Notepad
to visualize the contents. The file can also be loaded from Microsoft Excel by clicking "Data"
→"From Text".
2.2
An Unpleasant Text File
The file example_text.csv contains a header row and four columns. The first column is a
quoted text string. Take a look at it in a text editor before trying to read it from MATLAB.
Do you find the nasty thing about this file? Now try loading it from MATLAB using the load,
csvread and dlmread commands. Plot some data from it to show you have found the data in
the last column.
Hint: You must specify which rows and columns that shall be loaded.
2.3
A MATLAB Matrix Data File
MATLAB has its own binary file format called .MAT (Matrix Data File). Now try to load
signal_left_3_0.mat using the load command. It contains a structure object, where data is
stored hierarchically in sub-structures. Try navigating the structure using the dot operator (.)
and the fieldnames, getfields and isfield commands. This kind of structures can be very
complex. This particular example comes from the measurement system used for the forklift
project later on in the course. It is thus a good idea to get familiar with it already. Try plotting
some data from it. It is common to facilitate plotting and analyzing such structures by writing
a MATLAB m-script that plots variables directly from the files. Try the plotta3 script file,
which you will also use later in the course.
Think about the following variants of using load.
load myfile
l o a d ( ’ m y f i l e . mat ’ )
data = l o a d ( ’ m y f i l e . mat ’ )
2.4
A Simple Binary Example
Now things will become a little more complicated. The file example_bin.dat is a binary file.
You can try opening it in Notepad but, as you will see, it is impossible to understand the output.
The file does, however, contain 4 columns stored row-wise. Each row begins with an index value
of one int followed by three data variables stored as floats.
2
TMMS04
Data Management Computer Lesson
2014 HT-1
To read such files it is necessary to use some MATLAB commands for file handling. Create
a new MATLAB script called load_example_bin.m. In the script you must first use the fopen
command to open the file. Then you can use the fread command to read one variable at a
time, and the feof command to check if you have reached the end of the file. Last but not
least you must close the file using fclose. Have the script plot some of the data in the file, for
example the last column values.
Hint: You can display the help text for each command by writing "help <command>".
2.5
A Nasty Binary Example
The last file is called example_bin.avi. If you double-click on it, Windows will think that it is
of the Audio Video Interleaved (.AVI) format, and will try to play it as a movie. This will not
work, since it is not a movie file. In fact, binary files used in engineering often use the same file
extensions as other well-known file formats. This file is actually a binary file similar to the one
in the previous task. It also contains four columns of data stored in a row-wise fashion. This
data is meant to represent a file originating from an ARM-based embedded system. Therefore,
it is coded in big-endian byte order. Each row is built up from the following C-struct prototype:
{int,double,float,float}.
Modify the script file from the previous task to read this file. Demonstrate your skills by
plotting the last column function.
Hint: Look for the MACHINEFORMAT argument in the help text for fopen.
3
Conclusions
Data management is a cornerstone in development of mechatronic systems. Understanding file
formats and data types is therefore necessary for the mechatronics engineer. Handling large sets
of data can be very time consuming. There are several commercial tools available that address
this issue. These tools are often part of larger development suites like MATLAB or LabVIEW.
Open source alternative and home-made solutions are also common, as they are cheaper and
not have the license limitations of commercial tools.
3