Keywords: Information Visualization, Multi Dimensional Scaling.

Abstract

Multi Dimensional Scaling (MDS) is a structure preserving projection method that allows for the visualization of multidimensional data. In this paper we discuss our practical experience in using MDS as a projection method in three different application scenarios. Various reasons are given why structure preserving projection methods are useful for the analysis of multidimensional data. We discuss two visual forms (glyphs and heightfields) which can be used to represent the output of the projection methods.

1. Introduction

In this paper we discuss our practical experience in using Multi Dimensional Scaling (MDS) for the visualization of multidimensional data. We show how MDS is used to gain insight into multidimensional spaces that are represented in a table.

A large class of data can be characterized by tables. Such a table can be described as a matrix with attribute variables along one dimension and the outcomes of specific cases along the other. Discovery and understanding of the structure in this type of data has many applications in science and business [1]. Here the word structure refers to geometric relationships among subsets of the data variables in the table. Examples of structure include clusters, regular patterns, outliers, distance relations, and proximity of data points.

There are many numerical and statistical techniques that can be used to analyze structural information in multidimensional data tables. These techniques can be used to automatically extract certain structural properties from the data. Examples of such techniques are principal component analysis (PCA), k-means and hierarchical clustering algorithms (see [2, 3]).
The majority of these techniques focus on specific aspects of the structure of the data, such as clusters. A different class of techniques for the analysis of structural information is based on the idea that the multidimensional data points can be projected into a lower dimensional space such that the structural properties of the data are preserved. We call this class of techniques structure preserving projection methods.

In this paper we discuss how multidimensional data can be visualized using structure preserving projection methods. We sketch three alternative methods and point out some differences between them.

The paper is structured as follows: in the following section we give an overview of the visualization process of data analysis using projection methods. In section 3 we sketch three structure preserving projection methods. Section 4 describes the visualization of the output of the described projection methods. Three applications illustrate the methods in section 5.

The process of transforming data tables into a visual form can be considered in the context of the well known visualization pipeline [4]. For projection based methods, a pipeline of four stages can be specified as follows (see Figure 1): data acquisition, projection, mapping, and rendering.

Figure 1. Transforming tabular data into images. (Labels in the figure: data acquisition, projection, mapping, rendering, interaction.)

Data acquisition is the process of acquiring and selecting the data to be analyzed. This stage results in the data table. In the projection stage, nonlinear projection techniques are used to transform data points in a high dimensional data space to a lower dimensional visualization space. The goal of these techniques is to compute a spatial representation which preserves structural properties of the data table. In the mapping stage the output of the projection is translated into a set of graphical primitives. The goal of this stage is to effectively present the data in a visual form.
During rendering the graphical primitives are rendered as an image.

User interaction allows the user to investigate different aspects of the data. In all but the smallest data sets it is impossible to present all information contained in the data automatically in a single image. Therefore the user should be able to interact with the parameters in the visualization pipeline in a meaningful and understandable way.

3. Projection methods

Projection methods for the analysis of structure have the following useful properties:

- The methods do not depend upon any control parameters that would require a priori knowledge about the data. For example, these methods do not depend on control parameters that determine the number of clusters.
- The methods are not limited to specific types of structures. In contrast to many specific structure seeking methods, projection methods can be used for the analysis of a wide range of complex structures.
- The methods use the human visual capacity to recognize and interpret structure. For example, problems concerning anomalies in the data are overcome since humans can easily eliminate troublesome data points (automatic clustering algorithms have difficulty doing this).

We briefly summarize some aspects of three projection based techniques. It is beyond the scope of this paper to discuss each technique in detail.

Multi Dimensional Scaling (MDS) computes a configuration of points in a low-dimensional Euclidean space so that the distances between two points match the original dissimilarities between the corresponding variables in the data table [5]. To apply MDS, first a distance matrix (also called a similarity or adjacency matrix) must be generated from the data table. This is done by defining a metric by which the similarity or dissimilarity between cases in the table can be determined. Depending on the type of data in the table (numeric, boolean or textual), many different metrics exist to calculate this difference [6].
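The distance-matrix step described above can be sketched as follows. This is a minimal illustration, assuming NumPy and a table whose rows are cases and whose columns are attributes; the function names and the toy table are hypothetical, and the two metrics shown (Euclidean for numeric data, fraction of differing attributes for boolean data) are only two of the many possible choices.

```python
import numpy as np

def euclidean_distance_matrix(table):
    """Pairwise Euclidean distances between the rows (cases) of a numeric table."""
    x = np.asarray(table, dtype=float)
    diff = x[:, None, :] - x[None, :, :]      # shape (n_cases, n_cases, n_attributes)
    return np.sqrt((diff ** 2).sum(axis=-1))

def hamming_distance_matrix(table):
    """Fraction of attributes on which two boolean cases differ."""
    b = np.asarray(table, dtype=bool)
    return (b[:, None, :] != b[None, :, :]).mean(axis=-1)

# Hypothetical toy table: 3 cases with 2 numeric attributes each.
toy = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
D = euclidean_distance_matrix(toy)
```

Either function returns a symmetric matrix with a zero diagonal, which is the only input the MDS step needs.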
Formally, if $d_{ij}$ is the distance between points $p_i$ and $p_j$, and $x_i$ is the position of $p_i$ in visualization space, the minimum of

$$ S = \sum_{i} \sum_{j<i} \left( d_{ij} - \lVert x_i - x_j \rVert \right)^2 \qquad (1.1) $$

must be computed. Various numerical methods can be used for the minimization, ranging from iterative Newton-Raphson based methods to genetic algorithms.

Self Organizing Maps (SOM) is a technique that uses a neural network consisting of a two dimensional arrangement of nodes (neurons) [7]. The basic idea is that similar input points produce similar responses in a trained network. During training, neuron responses are adjusted based on a collection of representative input points. After training, the distribution of responses in the SOM is a representation of the structure of the data set.

Generative Topographic Mapping (GTM) is a technique in which a topographic mapping function between the input data and the visualization space is found [8]. The idea is to use a function which maps a density distribution in visualization space, in combination with a Gaussian noise model, into the original data space. An EM (expectation-maximization) algorithm is used to find the combination of 2D distribution function and mapping function which gives the optimal representation of the original data.

There are two major differences between MDS and SOM. First, given the set of input points, MDS results in a set of points in visualization space while SOM results in a response on a two dimensional field. Second, in the case of SOM, a trained neural network results in a mapping function which can be applied to additional data points. In the case of MDS, each additional data point requires a re-computation of the projected configuration. Hence, a trained neural network describes a projection function, while such a projection function does not exist in the MDS case.

There are also two major differences between SOM and GTM.
First, SOM results in a response on a two dimensional field whereas GTM results in a density distribution in data space. Second, in the case of SOM, the trained projection function is implicitly defined by the neuron responses. In contrast, the GTM projection function is explicitly defined as a parametric non-linear function.

The data resulting from the projection based methods can be of two types: a discrete set of positions or a continuous distribution function. For the visual forms, a distinction is made between discrete mapping and continuous mapping. Some possible mappings are shown in figure 2.

Figure 2. Visualization mappings of projection output data. (Labels in the figure: projection output, discrete and continuous; visualization, glyphs, splat map, and heightfield.)

If the output of the projection is a set of discrete points, each point can be represented by a glyph. When the number of points is large, glyphs are less suitable due to cluttering. Projection methods may result in a highly nonuniform distribution of points in visualization space, so cluttering is hard to avoid in this case. To counter this problem the continuous representation can be useful to gain insight into the global structure of the data. For the mapping of a distribution function, the underlying data points are not explicitly represented; instead an aggregation is applied such that the overall properties of the set are reflected in the visualization. Clearly, if the output of the projection is a distribution function, discrete mapping is impossible, as the positions of individual points are lost.

Glyphs can be used to visualize both the point and its attributes. For example, the shape, color, transparency, and orientation of glyphs can be used to encode information associated with the point [9]. Heightfields can be used to visualize continuous functions, such as the distribution function.
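One simple way to aggregate a discrete point set into a heightfield is to sum a Gaussian kernel per point on a regular grid. The sketch below illustrates that aggregation idea only; it assumes NumPy and 2D positions already scaled to the unit square, and the function name, grid size, and kernel width are hypothetical choices, not the implementation of any cited system.

```python
import numpy as np

def gaussian_heightfield(points, grid_size=64, sigma=0.05):
    """Sum one isotropic Gaussian per 2D point on a regular grid over [0, 1]^2."""
    pts = np.asarray(points, dtype=float)
    axis = np.linspace(0.0, 1.0, grid_size)
    gx, gy = np.meshgrid(axis, axis)          # gx varies along columns, gy along rows
    field = np.zeros((grid_size, grid_size))
    for px, py in pts:
        field += np.exp(-((gx - px) ** 2 + (gy - py) ** 2) / (2.0 * sigma ** 2))
    return field

# Hypothetical layout: two nearby points and one isolated point.
pts = [[0.25, 0.25], [0.27, 0.24], [0.8, 0.7]]
field = gaussian_heightfield(pts)
```

Where points lie close together their kernels overlap and the field rises above the height of a single kernel, so dense regions show up as peaks even when the individual glyphs would clutter.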
When the discrete set of points is large, a heightfield can be constructed through GraphSplatting [10]. In this method, a field is constructed by accumulating individual Gaussian basis splats.

Glyphs and heightfields are complementary. Heightfields are useful for visualizing the overall structure of the field. Glyphs are useful for visualizing the details of a small set of points. In addition, depending on the representation, the heightfield and the glyphs can be combined in the same visualization.

5. Applications

In this section the use of the previously described methods is illustrated in the context of specific applications. The methods are implemented in a system [10] which includes an MDS projection based method and support for continuous as well as discrete visualization methods.

5.1 City distances

This application computes and visualizes the locations of 39 cities in the Netherlands with respect to road distances. Instead of the Euclidean distance metric, the length of the road connecting two cities is used as the distance metric. Cities are modeled as points and the distance matrix is filled with the acquired road distances. MDS is used to project the points into a visualization space. Note that in this case the distance metric is not derived from attribute data of the data points; instead the distances are part of the input data.

The left panel of figure 3 shows the result of the MDS projection. Points are labeled with city names. Grey lines connect cities with a road distance of less than 25 kilometers. The right panel shows the result overlaid on a map of the Netherlands. Red discs are drawn at the actual city locations on the map. Green discs are drawn at the computed locations.

Figure 3. City distance visualization. Left: MDS of 39 cities based on the road distances.
Right: the solution overlaid on a map of the Netherlands.

The right panel shows various discrepancies between the actual city locations (red discs) and the computed city locations (green discs). The largest discrepancies occur for the cities in the south west of the country. An explanation can be found in the fact that the “road distance” is larger than the “earth distance” between cities. A large detour is needed to reach cities in the south west of the Netherlands due to the water around these cities.

This application requires user interaction to register the output with the overlaid map. Since MDS uses only the distance matrix as its input, the resulting point configuration is rotation invariant: although the distances between cities are correct, the complete point set may be rotated around a center of rotation. Similarly, the result can be mirrored with respect to the visualization plane. To overcome these problems, the user can pin points in the visualization space. In this application, the user must pin at least three cities on the map in order to avoid mirrored and rotated solutions.

5.2 Image feature spaces

The goal of image classification is to allow images to be retrieved from data repositories subject to a user defined query. Images are classified based on a collection of features, such as color, texture and object shape. The usefulness of the features for the query system is an important question for the developers of image retrieval systems. The goal of this application was the development of a system in which feature developers can experiment with features on a wide variety of image sets.

The input of our system is a set of images, and each image is classified by a feature vector [11]. In this way images can be represented by points in a high dimensional feature space. The distance matrix is defined by the Euclidean distance between the points.
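Given such a distance matrix, the layout is obtained by minimizing the stress of Eq. (1.1), the sum of squared differences between the given distances and the distances in the layout. The sketch below uses plain gradient descent from a random start; this is an assumption made for brevity, not the solver used by the system described in this paper, and the function names, step size, and toy input are hypothetical. It also illustrates the rotation and mirror ambiguity discussed for the city application.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distances between all rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def stress(D, X):
    """Sum over pairs j < i of (D[i, j] - ||X[i] - X[j]||)^2."""
    lower = np.tril_indices(len(X), k=-1)
    return float(((D - pairwise_distances(X))[lower] ** 2).sum())

def mds_layout(D, dim=2, steps=2000, lr=0.01, seed=0):
    """Minimize the stress by plain gradient descent from a random start."""
    D = np.asarray(D, dtype=float)
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((len(D), dim))
    for _ in range(steps):
        diff = X[:, None, :] - X[None, :, :]
        R = np.sqrt((diff ** 2).sum(axis=-1))
        np.fill_diagonal(R, 1.0)               # avoid 0/0; diagonal terms are zeroed below
        coef = -2.0 * (D - R) / R              # derivative of each pair term w.r.t. its distance
        np.fill_diagonal(coef, 0.0)
        X -= lr * (coef[:, :, None] * diff).sum(axis=1)
    return X

# Hypothetical toy input: distances between the corners of a unit square.
corners = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
D = pairwise_distances(corners)
X = mds_layout(D)

# Rotating or mirroring a layout leaves its stress unchanged.
theta = 0.7
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
mirror = np.diag([-1.0, 1.0])
```

Because `stress(D, X @ rotation.T)` and `stress(D, X @ mirror)` equal `stress(D, X)` up to rounding, the minimizer cannot prefer one orientation over another, which is why pinned points are needed to register a layout with a map.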
Two images with high similarity will result in points that are close to each other in the feature space. The MDS layout provides a global overview of the structure of the feature space as well as the similarity relations among images. In addition, a weighting factor is associated with each element of the feature vector. A user can interactively scale each dimension of the feature space by changing the weighting factor, resulting in a new distance matrix. In this way the user can explore the relation between features.

We applied our method to a collection of images taken from the Corel Image Collection [12]. A set of 200 images was selected across different genres; at the same time, care was taken that there is a small fraction of images per genre that would commonly be regarded as “similar”. Examples are images of similar objects, such as sailing boats, or images of objects which differ only in lighting characteristics or camera position. For each image, 6 feature vectors were computed: a four-dimensional Gabor feature vector for texture analysis and 5 distinct color-based feature vectors. Texture-based features are particularly successful when applied to genres of images where color information is of lesser importance, e.g. aerial photography [13]. The color-based feature vectors include a hue histogram, a hue histogram of the center region of the image, and 3 hue transition histograms. For transition histograms, the hue is first dithered to 16 bins; then the histogram of the 256 resulting combinations is recorded. As a pre-processing step, the images were segmented into 32, 128, and 256 tiles, and each tile was replaced by its dominant hue. The dimensionality of the feature space spanned by the 6 feature vectors is 804.

Figure 4 shows a snapshot of the user interface. The left panel shows the discrete view: an arrangement of points in the visualization space. Small dots are used to represent points.
Grey lines represent edges between points with distances below the user provided threshold distance. Some selected points are annotated with a thumbnail image. The right panel shows the continuous field of the same point configuration.

Figure 4. Two views of a layout for a subset of the Corel Image Collection. The left panel shows a discrete representation. Points in visualization space are represented as small dots. The right panel shows a continuous field. Some points are annotated with a thumbnail image.

The layout provides a view in which the images are displayed according to their mutual dissimilarities. Similar images are clustered. A problem with the discrete view is the potential cluttering of dots, which makes it difficult to estimate the density of points in dense regions. The continuous field provides a view of a density field. Colors are used to show which areas have a high density of points. In this way, the user can see at a glance which images are similar.

Users can interact with the system in three ways. First, by dragging and pinning points in the visualization space. In this case, the MDS algorithm will compute a new solution. Second, by varying the mapping parameters of the density field, the frequency of the density field can be controlled. Changing these parameters affects the mapping stage of the pipeline. Finally, the weight factors can be changed, resulting in a scaling of the high dimensional feature space. All three of these interactions will result in a new distance matrix.

The goal of this application is the analysis of the citation index of all IEEE Vis’9X papers. We show that clustering of citations leads to specific topics in visualization.

We have applied our method to the analysis of the IEEE Vis’9X citation index [14]. The input data are the BibTeX entries of all papers in the proceedings of the IEEE Vis’9X conferences and all references to papers in this set from other papers in the set. The data set consists of 599 BibTeX entries and 881 references.
The goal of the visualization was to test the hypothesis that topics in visualization can be identified by analyzing only the density of the references. The motivation for this hypothesis is that papers on one topic often refer to other papers on the same topic. The distance matrix was the reference matrix. This matrix has one row and one column per paper, and each element contains ‘true’ if one paper references the other.

Figure 5. Left: All papers published in IEEE Vis’9X conferences, represented as discs, and references between the papers, represented as lines. Right: Interacting with the citation index. The influence of a group of papers is drawn with yellow (incoming) and blue (outgoing) references. Papers in the region selected by the highlighted contour on the right are shown as discs.

The left panel of figure 5 shows the output of the MDS layout. Papers are represented as small circles. References between papers are represented as lines. As can be seen, aside from the papers which are not referenced and do not reference other papers, there is a single cluster of points. Due to the large number of points in the cluster, it is difficult to gain insight into the structure of the data.

The right panel shows a slightly zoomed in 2D rendition of a continuous representation of the MDS layout. The density of the field clearly shows various clusters of papers. For example, the papers in the large dark region in the middle of the image deal with flow visualization. The (smaller) region below it contains papers describing visualization systems. The region on the right contains volume visualization papers. Discrete discs (red dots representing papers) and lines (yellow lines for incoming references and blue lines for outgoing references) are also drawn as annotation. The region at the top left contains information visualization papers.
The distance to the other peaks in the field illustrates the distinction between information visualization and other data visualization topics. Contour lines can be used to show cluster boundaries. Also, the influence of a paper is shown by drawing the edges representing references to the selected papers.

Figure 5 also illustrates some interaction techniques. Contour lines can be used as a selection criterion. In this way all papers in a cluster can be selected. The user can also pick individual papers and show all information related to that individual paper.

This paper concerns the visualization of multidimensional data using structure preserving projection methods. Some possible visualization techniques which can be used to display the output of the projection methods were discussed. Three applications were given as illustration.

An advantage of projection based methods is that they make use of human pattern recognition abilities for the interpretation of the data. Also, projection based methods do not require a priori knowledge about the multidimensional data. For these reasons, these methods are well suited to be included in explorative visualization toolkits.

References

[1] S.K. Card, J.D. Mackinlay, and B. Shneiderman, editors. Readings in Information Visualization. Morgan Kaufmann Publishers, 1999.
[2] G.H. Ball. A comparison of some cluster-seeking techniques. Technical Report RADC-TR-66-512, Rome Air Development Center, Rome, NY, 1966.
[3] V. Barnett. Interpreting Multivariate Data. John Wiley & Sons, Inc., New York, 1981.
[4] R.B. Haber and D.A. McNabb. Visualization idioms: A conceptual model for scientific visualization systems. In G.M. Nielson, B.D. Shriver, and L.J. Rosenblum, editors, Visualization in Scientific Computing, pages 74–92. IEEE Computer Society Press, 1990.
[5] T.F. Cox and M.A.A. Cox. Multidimensional Scaling. Chapman & Hall, London, 1994.
[6] L. Kaufman and P.J. Rousseeuw. Finding Groups in Data - An Introduction to Cluster Analysis.
Wiley-Science Publication, John Wiley & Sons Inc., 1990.
[7] T. Kohonen. Self-Organizing Maps. Springer-Verlag, Berlin Heidelberg New York, 1995.
[8] C.M. Bishop, M. Svensén, and C.K.I. Williams. GTM: A principled alternative to the self-organizing map. Advances in Neural Information Processing Systems, 9:354–363, 1997.
[9] D. Ebert, R. Rohrer, C. Shaw, P. Panda, J. Kulka, and D. Roberts. Procedural shape generation for multi-dimensional data visualization. In E. Groller, H. Loffelmann, and W. Ribarsky, editors, Data Visualization ’99 (Proceedings EG-IEEE VisSym 1999), pages 3–13. Springer Verlag, 2000.
[10] R. van Liere and W.C. de Leeuw. Graphsplatting: Visualizing graphs as continuous fields. Accepted for publication in IEEE Transactions on Visualization and Computer Graphics, 2002.
[11] R. van Liere, W. de Leeuw, and F. Waas. Interactive visualization of multidimensional feature spaces. In D.S. Ebert and C.D. Shaw, editors, Proceedings on New Paradigms for Information Visualization (NPIVM’00). IEEE Computer Society Press, 2000.
[12] Corel. Corel Stock Photos, 1999. http://www.corel.ca/products/clipartandphotos/photos/index.htm
[13] B.S. Manjunath and W.Y. Ma. Texture features for browsing and retrieval of large image data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(8):837–842, August 1996.
[14] References of the IEEE Visualization proceedings 1990-1999 can be downloaded at http://www.cwi.nl/~robertl/visbib, 2000.