Visualizations of High-Dimensional Space (CATA 2007)
Abstract
• Spatial data analysis occurs at different levels:
– Pixel
– Pixel group
– Band
• At the pixel level, common attributes are:
– Location: coordinate plane or geolocation
– Reflectance values: 0 to 255, encoded in 8 bits
Abstract 2
• Band operations are defined as functionals
• The process is visualized using 1D and 2D representors
– Jewell diagrams
– Augmented Himalayan chain diagrams
Organization of Presentation
• Massive data sets description
• Image band formats
• Band operations as functionals
• 1D and 2D representors
• Conclusions
RSI Datasets
• Massive – a single Thematic Mapper image, acquired at one point in time, covers a square area approximately 180 km on a side and consists of well over one billion pixels
• Data repositories will soon reach petabyte size
Scalability Issues
• Cardinality
– Row (or database size) scalability
• Dimensionality
– Column (or dimension) scalability
Scalability Issues 2
• Spatial data scalability issues are addressed by using functionals and by visualizing the process with Jewell diagrams and mountain chain diagrams
Image Band Formats
• Existing formats
– BIL (band interleaved by line)
– BIP (band interleaved by pixel)
– BSQ (band sequential)
• New format
– bSQ (bit sequential)
Image Data Organized by Bands
• The first level of data organization is to group by bands
• This figure represents the mechanism used to separate image data into bands
[Figure: scene data separated into BAND 1 (blue), BAND 2 (green), and BAND 3 (red)]
Spatial Data Formats
Example: a 2 × 2 image with two bands (pixels in raster order)
BAND-1: 254 (1111 1110)   127 (0111 1111)
        14 (0000 1110)    193 (1100 0001)
BAND-2: 37 (0010 0101)    240 (1111 0000)
        200 (1100 1000)   19 (0001 0011)
BSQ format (2 files)
Band 1: 254 127 14 193
Band 2: 37 240 200 19
BIL format (1 file)
254 127 37 240
14 193 200 19
BIP format (1 file)
254 37 127 240
14 200 193 19
bSQ format (16 files; Bij holds bit j of band i, bit 1 = most significant)
B11: 1 0 0 1   B12: 1 1 0 1   B13: 1 1 0 0   B14: 1 1 0 0
B15: 1 1 1 0   B16: 1 1 1 0   B17: 1 1 1 0   B18: 0 1 0 1
B21: 0 1 1 0   B22: 0 1 1 0   B23: 1 1 0 0   B24: 0 1 0 1
B25: 0 0 1 0   B26: 1 0 0 0   B27: 0 0 0 1   B28: 1 0 0 1
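The conversions in this example can be reproduced in a few lines. This is a minimal Python sketch, assuming the 2 × 2, two-band image above, pixels in raster order, and 8-bit values; the function names (`to_bsq`, `to_bil`, `to_bip`, `to_bsq_bits`) are hypothetical, and each "file" is modeled as an in-memory list.

```python
# Hypothetical sketch: the slide's 2 x 2, two-band example in each layout.
# Pixels are listed in raster (row-major) order; values are 8-bit.
ROWS, COLS = 2, 2
bands = [
    [254, 127, 14, 193],  # Band 1
    [37, 240, 200, 19],   # Band 2
]


def to_bsq(bands):
    """Band sequential: one sequence (file) per band."""
    return [list(b) for b in bands]


def to_bil(bands, rows, cols):
    """Band interleaved by line: each image line of band 1, then band 2, ..."""
    out = []
    for r in range(rows):
        for b in bands:
            out.extend(b[r * cols:(r + 1) * cols])
    return out


def to_bip(bands):
    """Band interleaved by pixel: all band values of one pixel are adjacent."""
    return [b[i] for i in range(len(bands[0])) for b in bands]


def to_bsq_bits(bands, nbits=8):
    """Bit sequential: one bit plane per (band i, bit j); bit 1 is the MSB."""
    return {
        f"B{i}{j}": [(v >> (nbits - j)) & 1 for v in b]
        for i, b in enumerate(bands, start=1)
        for j in range(1, nbits + 1)
    }


print(to_bil(bands, ROWS, COLS))  # [254, 127, 37, 240, 14, 193, 200, 19]
print(to_bip(bands))              # [254, 37, 127, 240, 14, 200, 193, 19]
print(to_bsq_bits(bands)["B11"])  # [1, 0, 0, 1]
```

Only the nesting order of the loops differs between BSQ, BIL, and BIP; bSQ adds one more level by splitting every value into its bits.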
BIP (Band Interleaved by Pixel)
• Pixel-consecutive scheme
• Data stored in pixel-major order
[Figure: scene data split into BAND 1, BAND 2, and BAND 3, each quantized on an analog-to-digital scale of 0 to 4; digitized and formatted data: 0 1 4, 3 2 0, 2 4 2, 4 0 3, ....]
BIL (Band Interleaved by Line)
• The image scan line constitutes the organizing base
• Data stored in line-major order
[Figure: scene data split into BAND 1, BAND 2, and BAND 3, each quantized on an analog-to-digital scale of 0 to 4; digitized and formatted data: line 1 band 1: 0 3 2 4 ....; line 1 band 2: 1 2 4 0 ....; line 1 band 3: 4 0 1 3 ....; line 2 band 1: 1 0 0 ..]
BSQ (Band Sequential Format)
• Data stored in band-major order
• Widely used format
• Each image band appears consecutively in the data file
[Figure: scene data split into BAND 1, BAND 2, and BAND 3, each quantized on an analog-to-digital scale of 0 to 4; digitized and formatted data: 0 3 2 4 .. 1 0 0 .. 1 2 4 0 .. 2 4 3 .. 4 0 1 3 ..]
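Because each layout fixes a different nesting of line, band, and pixel, the position of any value in the stream can be computed directly. The sketch below is a hypothetical helper, assuming 0-based indices, one byte per value, no file header, and, for BSQ, the band files concatenated in band order.

```python
def offset(fmt, row, col, band, rows, cols, nbands):
    """Position of the value for (row, col, band) in one flat stream.

    Hypothetical helper. Assumes 0-based indices, one byte per value,
    no header, and, for BSQ, the band files concatenated in band order.
    """
    if fmt == "BSQ":  # band-major: all of band 0, then all of band 1, ...
        return band * rows * cols + row * cols + col
    if fmt == "BIL":  # line-major: line 0 of each band, then line 1, ...
        return row * nbands * cols + band * cols + col
    if fmt == "BIP":  # pixel-major: all bands of pixel 0, then pixel 1, ...
        return (row * cols + col) * nbands + band
    raise ValueError(f"unknown format: {fmt}")


# The 2 x 2, two-band example from the Spatial Data Formats slide.
layouts = {
    "BSQ": [254, 127, 14, 193, 37, 240, 200, 19],
    "BIL": [254, 127, 37, 240, 14, 193, 200, 19],
    "BIP": [254, 37, 127, 240, 14, 200, 193, 19],
}
for fmt, data in layouts.items():
    # Band index 1 (Band 2) at row 1, column 0 holds 200 in every layout.
    print(fmt, data[offset(fmt, 1, 0, 1, rows=2, cols=2, nbands=2)])
```

Reading one whole band is a contiguous scan in BSQ, while reading all bands of one pixel is contiguous in BIP; BIL sits between the two.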
bSQ (Bit Sequential Format)
• Split each band into eight separate files, one for each bit position.
• Reasons for using the bSQ format:
– Different bits contribute to the value differently.
– The bSQ format facilitates the representation of a precision hierarchy (from 1-bit to 8-bit precision).
– The bSQ format facilitates the creation of an efficient data structure, the P-tree, along with its algebra and cube.
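The precision hierarchy follows from positional weights: bit j (bit 1 being the most significant) contributes $2^{8-j}$ to an 8-bit value, so reading only the first k bit files gives a k-bit approximation. A minimal sketch, assuming truncation (the dropped low-order bits are set to 0 rather than rounded); the helper names are hypothetical.

```python
def bit_planes(value, nbits=8):
    """Bits of `value`, most significant first (bit 1, ..., bit nbits)."""
    return [(value >> (nbits - 1 - k)) & 1 for k in range(nbits)]


def value_at_precision(planes, k):
    """Rebuild a value from its first k bit planes; the lower bits are set to 0."""
    nbits = len(planes)
    v = 0
    for bit in planes[:k]:
        v = (v << 1) | bit
    return v << (nbits - k)


# 193 = 1100 0001 at 1-bit, 2-bit, 4-bit, and full 8-bit precision.
for k in (1, 2, 4, 8):
    print(k, value_at_precision(bit_planes(193), k))
# 1 128
# 2 192
# 4 192
# 8 193
```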
BSQ and bSQ
• BSQ and bSQ are “tabular” formats.
– BSQ consists of a separate table for each feature band.
– bSQ consists of a separate table for each bit of each band.
• One can view it this way:
– The data set is initially one relation, or table, R(K1, …, Kk, A1, …, An), where K1, …, Kk are structure attributes and A1, …, An are feature attributes.
• The structure attributes of a 2-D image are the X, Y coordinates of the pixels (rows).
• The feature attributes are the bands: B, G, R, NIR, …
• In BSQ, we separate each feature into a separate file and suppress the structure attributes altogether (assuming pixels are always arranged in raster order) (aka: decomposition storage model (DSM), Copeland et al., SIGMOD 85, pp. 268–279).
• In bSQ, we separate each bit of each feature into a separate file (again assuming raster order) (aka: bit transpose file (BTF) model, Wong et al., VLDB 85, pp. 448–457).
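The relational view can be made concrete: with the tuples of R(x, y, A1, …, An) in raster order, x and y are implied by each tuple's position, so BSQ keeps one column per feature and bSQ one column per bit of each feature. A sketch under those assumptions (a 2-D image, row-major raster order with x as the line index as in the 16-pixel table, 8-bit features); the function names are hypothetical, and this illustrates the decomposition idea rather than the DSM or BTF implementations cited above.

```python
def decompose(relation, ncols):
    """BSQ/DSM view: one column per feature; x and y are dropped.

    `relation` holds tuples (x, y, A1, ..., An). Dropping x and y is only safe
    if the tuples are in raster order, so that is checked first.
    """
    for i, (x, y, *_) in enumerate(relation):
        if (x, y) != divmod(i, ncols):
            raise ValueError("relation is not in raster order")
    nfeatures = len(relation[0]) - 2
    return [[t[2 + a] for t in relation] for a in range(nfeatures)]


def bit_transpose(columns, nbits=8):
    """bSQ/BTF view: each feature column becomes nbits bit columns, MSB first."""
    return [[(v >> (nbits - 1 - j)) & 1 for v in col]
            for col in columns for j in range(nbits)]


def recover(columns, ncols):
    """Rebuild R(x, y, A1, ..., An); x and y come back from each position."""
    return [(*divmod(i, ncols), *vals) for i, vals in enumerate(zip(*columns))]


# The 2 x 2, two-band example as a relation R(x, y, A1, A2).
rel = [(0, 0, 254, 37), (0, 1, 127, 240), (1, 0, 14, 200), (1, 1, 193, 19)]
cols = decompose(rel, ncols=2)
print(cols)                           # [[254, 127, 14, 193], [37, 240, 200, 19]]
print(len(bit_transpose(cols)))       # 16
print(recover(cols, ncols=2) == rel)  # True
```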
Band Operations as Functionals 1
• A reduced, 16-pixel, raster-ordered RSI dataset
• Each pixel has
– two structural attributes, x and y
– three feature attributes, R (red), G (green), and B (blue)
– derived attributes RVI, NTV, and RLTV
RSI Band Functionals
x  y  R  G  B  Y  S = R-G+7  RVI  NTV  RLTV
0  0  2  6  1  2  2-6+7      3    30   1
0  1  3  6  1  2  3-6+7      4    38   2
0  2  2  5  1  1  2-5+7      4    6    1
0  3  2  5  1  1  2-5+7      4    6    1
1  0  3  1  2  0  3-1+7      9    270  2
1  1  2  1  2  0  2-1+7      8    262  2
1  2  7  1  0  0  7-1+7      13   590  3
1  3  7  1  0  0  7-1+7      13   590  3
2  0  1  5  0  1  1-5+7      3    30   1
2  1  0  6  0  2  0-6+7      1    110  2
2  2  0  7  0  2  0-7+7      0    166  2
2  3  0  7  0  2  0-7+7      0    166  2
3  0  1  7  0  2  1-7+7      1    110  2
3  1  2  7  0  1  2-7+7      2    86   2
3  2  1  6  0  2  1-6+7      2    54   2
3  3  3  5  0  1  3-5+7      5    14   1
Column sums (R, G, B, Y): 36, 76, 8, 19
Column means (R, G, B, Y): 2.25, 4.75, 0.5, 1.1875
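The RVI column can be checked directly from R and G, since every entry of the S column is R − G + 7. A short sketch, assuming that formula is the intended definition of RVI for this dataset; NTV and RLTV are not recomputed because their definitions are not given here.

```python
# Feature columns of the 16-pixel dataset in raster order (x-major, then y).
R = [2, 3, 2, 2, 3, 2, 7, 7, 1, 0, 0, 0, 1, 2, 1, 3]
G = [6, 6, 5, 5, 1, 1, 1, 1, 5, 6, 7, 7, 7, 7, 6, 5]
B = [1, 1, 1, 1, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Y = [2, 2, 1, 1, 0, 0, 0, 0, 1, 2, 2, 2, 2, 1, 2, 1]


def rvi(r, g):
    """Assumed RVI definition, matching the S column: R - G + 7.

    With R and G in 0..7, the result lies in 0..14, so it is never negative.
    """
    return r - g + 7


RVI = [rvi(r, g) for r, g in zip(R, G)]
print(RVI)  # [3, 4, 4, 4, 9, 8, 13, 13, 3, 1, 0, 0, 1, 2, 2, 5]

# Column sums and means for R, G, B, and Y.
for name, col in (("R", R), ("G", G), ("B", B), ("Y", Y)):
    print(name, sum(col), sum(col) / len(col))
# R 36 2.25
# G 76 4.75
# B 8 0.5
# Y 19 1.1875
```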
Contours
• Definition: let q  Rk be the set of all
points x  Rn such that f(x) = q is the
preimage of q under f and is denoted as
f -1(q).
• Now let [p,q]  Rk be the set of all
points x  Rn , such that f(x)  [p,q] is the
preimage of [p,q] under f, or the contour of
[p,q] under f.
Contours Around a Given Pixel
RVI (rough vegetative
index) contours are
– M  RVIxy-1
– H  RVIxy-1
• Using contours, functional pruning can prune off non-neighboring pixels
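One way to read the contour definition and functional pruning together: the contour of an interval under a functional is the set of pixels whose value falls in it, and pixels outside an interval around the given pixel's value can be dropped before any finer neighbor search. The sketch assumes RVI as the functional and a symmetric interval controlled by a hypothetical `radius` parameter; it does not reproduce the M and H contours on the slide.

```python
# RVI values of the 16-pixel dataset, raster order, 4 pixels per line.
RVI = [3, 4, 4, 4, 9, 8, 13, 13, 3, 1, 0, 0, 1, 2, 2, 5]
NCOLS = 4


def contour(values, p, q):
    """Preimage of [p, q] under the functional: indices i with p <= f(x_i) <= q."""
    return [i for i, v in enumerate(values) if p <= v <= q]


def prune_candidates(values, target, radius):
    """Keep pixels whose value is within `radius` of the target pixel's value.

    Hypothetical pruning rule: everything outside this contour is discarded
    before any finer neighbor computation.
    """
    v = values[target]
    return contour(values, v - radius, v + radius)


print(contour(RVI, 0, 2))                         # [9, 10, 11, 12, 13, 14]
kept = prune_candidates(RVI, target=5, radius=1)  # pixel (x=1, y=1), RVI 8
print([divmod(i, NCOLS) for i in kept])           # [(1, 0), (1, 1)]
```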
1D and 2D Tuple Visualizations
• View the RSI dataset of Figure 4 as a set, X, and let $f: X \subseteq \mathbb{R}^n \to \mathbb{R}^k$ be any function, where $\mathbb{R}$ denotes the reals.
• If k = 1, then we call it a functional
• If k = 2 or 3, then we call it a diagram; its range can be viewed as a plot of points, as in the Jewell and Mountain Chain diagrams. These are related to parallel coordinates.
• If k = n, then it is a vector field.
Diagrams (k = 2 or k = 3)
• Attributes
– represented by A1 to A8
– depicted by straight lines
• Data points
– represented by different colors
– individual values depicted by colored dots
– values scaled along each attribute line
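The layout described above (one straight line per attribute, one dot per value, values scaled along each line) can be computed as in parallel coordinates. A minimal sketch, assuming min-max scaling to [0, 1] per attribute and placing constant attributes at the midpoint; coloring by data point is left to the plotting library, the function name is hypothetical, and this is not the Jewell or AUGH construction itself.

```python
def attribute_line_positions(points):
    """One (attribute index, position) dot per attribute for each data point.

    Each attribute is drawn as a straight line; a value's position along it is
    its min-max scaled value in [0, 1] (an assumed scaling). Attributes that
    are constant across all points are placed at 0.5.
    """
    n = len(points[0])
    lo = [min(p[a] for p in points) for a in range(n)]
    hi = [max(p[a] for p in points) for a in range(n)]

    def scale(a, v):
        return (v - lo[a]) / (hi[a] - lo[a]) if hi[a] > lo[a] else 0.5

    return [[(a, scale(a, p[a])) for a in range(n)] for p in points]


# First three pixels of the RSI dataset as (R, G, B, Y) points.
pts = [(2, 6, 1, 2), (3, 6, 1, 2), (2, 5, 1, 1)]
for dots in attribute_line_positions(pts):
    print(dots)
# [(0, 0.0), (1, 1.0), (2, 0.5), (3, 1.0)]
# [(0, 1.0), (1, 1.0), (2, 0.5), (3, 1.0)]
# [(0, 0.0), (1, 0.0), (2, 0.5), (3, 0.0)]
```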
Parallel Coordinates
Jewell Diagram
Jewell Diagram 2
Jewell Diagram 3
AUGH (Augmented Himalayan Chain) 1
AUGH 2
AUGH 3
Comparisons
• Jewell diagram
• AUGH diagram
Conclusions 1
• Comparison of diagrams
Conclusions 2
• Diagrams and Scalability
– Provide a method for viewing n-dimensional
data
– Provide preliminary and rough interpretation of
clustering and outlier detection
– Might be useful in pruning the dataset, addressing scalability issues, and identifying outliers
Conclusions 3
• These are VERY preliminary results and
further work with full datasets is necessary
before the advantage of their use can be
fully understood