Visualization of Multivariate Data Using 3D and VR Presentations



Vladimir Batagelj
University of Ljubljana, Faculty of mathematics and physics
Department of Mathematics, Jadranska 19, 1000 Ljubljana
Tel: 386 (61) 1766 672; fax: 386 (61) 217 281
e-mail: Vladimir.Batagelj@uni-lj.si
WWW: http://vlado.fmf.uni-lj.si/



Andrej Mrvar
University of Ljubljana, Faculty of Social Sciences
Kardeljeva pl. 5, 1000 Ljubljana
e-mail: Andrej.Mrvar@uni-lj.si
WWW: http://www.uni-lj.si/~fdmrvar/andrej.html


Abstract

VRML (Virtual Reality Modeling Language) and the freely available browsers for it have made three-dimensional presentations very popular on personal computers as well. One of the most important features of VRML is the possibility of traveling within the resulting scene - the egocentric view.
This paper gives an introduction to the visualization of multivariate data and some examples of their VR presentations.

Paper published in: Indo-French Workshop on Symbolic Data Analysis and its Applications, 23-25 September 1997, Paris XI - Dauphine, vol. 1, p. 66-76.

1. Introduction

1.1 Data Visualization

With the growth of computing power of desktop computers, data visualization is gaining popularity among researchers as a tool for data exploration and for the presentation of results (Brown 95, Baker 95). Using a data visualization system (see Figure 1), a researcher usually pursues one or more of the following goals (Wehrend 90): to identify, locate, distinguish, categorize, cluster, rank, compare, associate or correlate some data.

Figure 1. Data visualization system.

Properties of the data set that crucially influence forms of its representation are:

Small data sets can be presented in their totality and in detail in a single view. In an overall view of a large data set details are lost, while a detailed view can encompass only a part of the data set.

The basic feature of VR (Virtual Reality) is its support for an egocentric view: the user is immersed in the presentation as an active part of it and can travel inside the data scene, with the view determined by the user's position in the scene. Standard data visualizations supported by general-purpose programs (Excel, PowerPoint, ...) are mostly exocentric: the viewer is positioned outside the presentation. Often the third dimension is used only to make the presentation fancier, not to give better insight into the data.

In very large data sets a serious problem appears: how to avoid getting "lost within the forest"? There are several solutions that help the user's orientation:

Additional support can be provided by implementing trace/backtrack/replay mechanisms and guided tours.

Closely related to the multiview idea are the concepts of glasses, lenses and zooming (Pad++, inXight 96). By selecting different glasses we obtain different views of the same data. Glasses affect the entire window, whereas lenses affect only the selected region.

Figure 2. Windows File Manager.

An example of the multiview approach is the presentation of files in the Windows File Manager (see Figure 2). It also provides different glasses (Name, All File Details, ..., Sort by ...; in the new version: Large icons, Small icons, List, Detailed list, ...).

1.2 Visualization of Multivariate Data Sets

In visualizing multivariate data we usually deal with small or large, sparse and static data sets. Let E = {X_i} be a set of units. A unit X is usually described by a list of values of selected attributes (properties): (V_1 = x_1, V_2 = x_2, ..., V_m = x_m). It is usually represented by a glyph that integrates, as its components, elements representing the unit's attributes.

From standard data analysis we know several types of 2D-glyphs: points in the plane, pie charts, bar charts, columns, stars, Chernoff faces, Andrews curves, ... (Dillon 84). Most 2D-glyphs can be extended to 3D-glyphs, and some additional ones still need to be invented.

For example, the pie chart and the column representation can be combined into a pie cylinder (see Figure 3). Applied to the representatives (centroids) of groups, these glyphs can also be used to present groups.

Figure 3. Pie Cylinder.

To represent a selected attribute over a group of units, histograms and Tukey's box-and-whisker plots are often used (see Figure 4).

Figure 4. Tukey Glyph.

They can be combined into a glyph (for example, a star) representing a group (see the picture in subsection 2.2 Stars).

The largest ring in the Tukey glyph represents the data set (population) average, the middle ring the group average, and the small ring the group median. The tubes represent the 1st-3rd quartile, 1st-9th decile and min-max intervals.
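The quantities encoded by a Tukey glyph can be sketched in a few lines of Python. This is only an illustration: the function name `tukey_glyph_summary` is hypothetical, the quantile convention is the default of Python's `statistics.quantiles` (the paper does not state which convention was used), and the raw Fat values of Table 1 are used here instead of standardized values.

```python
import statistics

def tukey_glyph_summary(group, population):
    """Return the statistics shown by a Tukey glyph for one attribute."""
    quartiles = statistics.quantiles(group, n=4)   # 3 cut points: Q1, Q2, Q3
    deciles = statistics.quantiles(group, n=10)    # 9 cut points: D1 .. D9
    return {
        "population_mean": statistics.mean(population),  # largest ring
        "group_mean": statistics.mean(group),            # middle ring
        "group_median": statistics.median(group),        # small ring
        "quartile_tube": (quartiles[0], quartiles[2]),   # 1st-3rd quartile
        "decile_tube": (deciles[0], deciles[8]),         # 1st-9th decile
        "range_tube": (min(group), max(group)),          # min-max
    }

# Fat (g) for the 27 units of Table 1, in table order.
fat_all = [28, 17, 39, 32, 10, 3, 7, 5, 20, 25, 28, 29, 30, 14, 9,
           4, 1, 1, 2, 5, 13, 9, 11, 5, 9, 7, 1]
fat_sea_food = fat_all[15:]   # units 16-27 (clusters 1 and 2)

summary = tukey_glyph_summary(fat_sea_food, fat_all)
for name, value in summary.items():
    print(name, value)
```

Each tube is returned as a (low, high) pair, so a drawing routine could map the pairs directly to tube end points along the attribute axis.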

The representation elements support associative, selective, ordering and/or quantifying tasks. In the visualization task there are several levels of detail represented by the hierarchy

(attribute, unit, group, groups, data set)

Most data analysis procedures can be seen as transformations on, or relations between, these levels.

Different scale types are represented by different graphical elements:

scale     representation
nominal   color, shape
ordinal   grade, lightness, texture, arrangement (position)
numeric   size, position, direction, angle

Since every numeric scale is also ordinal, and every ordinal scale is also nominal, the representations suited to higher scales could also be used for lower scales - e.g., direction to represent nationality. As a general rule this should be avoided, because such representations can suggest unfounded associations.

1.3 Three Dimensional Data Presentations

In this paper we discuss 3D presentations of multivariate data. As a prototyping environment we selected VRML (Virtual Reality Modeling Language) because it provides platform-independent presentations and supports VR presentations.

Figure 5. Basic VRML Shapes.

In a presentation of multivariate data several VRML elements can be used:

  1. position in space (x, y, z);
  2. shape (sphere, cube, cone, cylinder, plane, ...; see Figure 5);
  3. color;
  4. size, angle, slope, area, volume;
  5. pattern (texture);
  6. direction (orientation);
  7. text;
  8. lights (different light sources, shadowing, transparency, reflections, ...);
  9. rotation of objects;
  10. different views and ways of moving in the obtained scene; camera properties (orthographic, perspective, stereoscopic; field of view).
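As a minimal sketch of how the first four elements (position, shape, color, size) could be written out, the following Python functions emit one unit as a VRML node. The function names are hypothetical, and VRML 2.0 syntax is assumed here; the authors' own software may use VRML 1.0 or a different node layout.

```python
def vrml_unit(position, shape="Sphere", color=(1.0, 1.0, 1.0), size=0.1):
    """Return a VRML 2.0 Transform node showing one unit as a colored shape."""
    x, y, z = position
    r, g, b = color
    if shape == "Sphere":
        geometry = f"Sphere {{ radius {size:g} }}"
    elif shape == "Box":
        side = 2 * size   # cube whose half-side equals the sphere radius
        geometry = f"Box {{ size {side:g} {side:g} {side:g} }}"
    else:
        raise ValueError(f"unsupported shape: {shape}")
    return (
        "Transform {\n"
        f"  translation {x:g} {y:g} {z:g}\n"
        "  children [\n"
        "    Shape {\n"
        "      appearance Appearance {\n"
        f"        material Material {{ diffuseColor {r:g} {g:g} {b:g} }}\n"
        "      }\n"
        f"      geometry {geometry}\n"
        "    }\n"
        "  ]\n"
        "}\n"
    )

def vrml_scene(nodes):
    """Prefix the nodes with the VRML 2.0 file header."""
    return "#VRML V2.0 utf8\n" + "".join(nodes)

print(vrml_scene([vrml_unit((1, 2, 3), shape="Box", color=(0, 1, 1), size=0.5)]))
```

The remaining elements in the list (texture, text, lights, viewpoints) would need further node types and are left out of this sketch.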

1.4 VRML

During the first Web Conference in May 1994 a group of virtual reality experts was formed to prepare virtual reality extensions to HTML (HyperText Markup Language). Thus the idea of VRML (Virtual Reality Markup Language) was born. Silicon Graphics supported the idea significantly by making Open Inventor (Warnecke 94), its language for describing three-dimensional objects, freely available together with its parser.

At the next conference, in October 1994 in Chicago, the first version of VRML was announced (Bell, Ames 96). The designers decided that HTML and VRML should be "orthogonal" but connected languages - VRML became the Virtual Reality Modeling Language.

The first shareware VRML browser, WebSpace, appeared in May 1995. The company Paper Software made its browser WebFX freely available in August 1995. WebFX was a plug-in for Netscape - the most popular HTML browser at that time. WebFX was later renamed Live3D. Silicon Graphics is developing its own VRML viewer - CosmoPlayer.

At SIGGRAPH (August 1996) the VRML 2.0 specification was published and made available in its final form (Lea 96). VRML 2.0 makes it possible to build user-controlled, multiuser scenes.

VRML is used in many areas: data organization, three-dimensional maps, modeling, mathematics, chemistry, medicine, ... (Vollhardt).

2. Examples

In the following examples data about 27 different types of food are used (see Table 1; Hartigan 75). They are described by 5 numeric variables: Food Energy, Protein, Fat, Calcium, and Iron. The variables were standardized before use.

Table 1. Types of Food (Raw Data).
No  Food                  Cluster  Energy (cal)  Protein (g)  Fat (g)  Calcium (mg)  Iron (mg)
 1  Beef, braised               3           340           20       28             9        2.6
 2  Hamburger                   3           245           21       17             9        2.7
 3  Beef, roast                 3           420           15       39             7        2.0
 4  Beef, steak                 3           375           19       32             9        2.6
 5  Beef, canned                3           180           22       10            17        3.7
 6  Chicken, broiled            6           115           20        3             8        1.4
 7  Chicken, canned             6           170           25        7            12        1.5
 8  Beef heart                  3           160           26        5            14        5.9
 9  Lamb leg, roast             5           265           20       20             9        2.6
10  Lamb shoulder, roast        5           300           18       25             9        2.3
11  Smoked ham                  4           340           20       28             9        2.5
12  Pork, roast                 4           340           19       29             9        2.5
13  Pork, simmered              4           355           19       30             9        2.4
14  Beef tongue                 3           205           18       14             7        2.5
15  Veal cutlet                 3           185           23        9             9        2.7
16  Bluefish, baked             2           135           22        4            25        0.6
17  Clams, raw                  1            70           11        1            82        6.0
18  Clams, canned               1            45            7        1            74        5.4
19  Crabmeat, canned            1            90           14        2            38        0.8
20  Haddock, fried              2           135           16        5            15        0.5
21  Mackerel, broiled           2           200           19       13             5        1.0
22  Mackerel, canned            2           155           16        9           157        1.8
23  Perch, fried                2           195           16       11            14        1.3
24  Salmon, canned              2           120           17        5           159        0.7
25  Sardines, canned            2           180           22        9           367        2.5
26  Tuna, canned                2           170           25        7             7        1.2
27  Shrimp, canned              1           110           23        1            98        2.6
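The standardization of the variables mentioned above can be illustrated on the Iron column of Table 1. This is a sketch under an assumption: the paper does not say whether the population or the sample standard deviation was used, so the population form (`statistics.pstdev`) is chosen here, and the function name `standardize` is hypothetical.

```python
import statistics

def standardize(values):
    """Return z-scores: (value - mean) / standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)   # assumption: population standard deviation
    return [(v - mean) / sd for v in values]

# Iron (mg) for the 27 units of Table 1, in table order.
iron = [2.6, 2.7, 2.0, 2.6, 3.7, 1.4, 1.5, 5.9, 2.6, 2.3, 2.5, 2.5, 2.4, 2.5,
        2.7, 0.6, 6.0, 5.4, 0.8, 0.5, 1.0, 1.8, 1.3, 0.7, 2.5, 1.2, 2.6]

iron_z = standardize(iron)
print([round(z, 2) for z in iron_z[:5]])
```

After this transformation each variable has mean 0 and standard deviation 1, so variables measured in different units (cal, g, mg) contribute on a comparable scale.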

Units (types of food) were manually grouped into six clusters, represented by colors:

  1. clams and crabs / cyan,
  2. fish / blue,
  3. beef / magenta,
  4. pork / red,
  5. lamb / yellow,
  6. chicken / white.
The two main clusters
  1. {1, 2} - sea-food, and
  2. {3, 4, 5, 6} - meat
are represented by shape (cube, sphere).
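The color and shape coding described above can be written as a small lookup. The names below are hypothetical, the RGB triples are the usual values for the named colors, and assigning the cube to sea-food and the sphere to meat is an assumption based on the order in which the clusters and shapes are listed.

```python
# Cluster number -> RGB color (components in the range 0..1).
CLUSTER_COLORS = {
    1: (0.0, 1.0, 1.0),   # clams and crabs / cyan
    2: (0.0, 0.0, 1.0),   # fish / blue
    3: (1.0, 0.0, 1.0),   # beef / magenta
    4: (1.0, 0.0, 0.0),   # pork / red
    5: (1.0, 1.0, 0.0),   # lamb / yellow
    6: (1.0, 1.0, 1.0),   # chicken / white
}

SEA_FOOD = {1, 2}          # main cluster 1; all other clusters are meat

def glyph_style(cluster):
    """Return (rgb color, shape name) for a unit in the given cluster."""
    color = CLUSTER_COLORS[cluster]           # KeyError for unknown clusters
    shape = "cube" if cluster in SEA_FOOD else "sphere"   # assumed assignment
    return color, shape

print(glyph_style(1))
print(glyph_style(4))
```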

Since the full advantage of VRML can be appreciated only with a VRML browser, we strongly recommend that the reader visit the HTML/VRML version of this paper at:

http://vlado.fmf.uni-lj.si/vrml/paris.97/
Software for producing 3D representations of multivariate data in VRML is available at:
http://vlado.fmf.uni-lj.si/pub/vrml/

2.1 Planets

The simplest presentation of multivariate data is a presentation using planets: three selected variables are shown using positions in the space. Additional information can be represented by glyphs that represent units.

In the presentation of food types in Figure 6 the positions in space are determined by the first three principal components. Different views can reveal interesting relations in the data. For example, the positions of the glyphs representing clams and crabs suggest that our decision to put them in the same group was not appropriate. Groups of similar types of food can be easily noticed in both pictures.
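The paper does not say how the principal components were computed. As one possible sketch, the scores on the first three components can be obtained in plain Python by power iteration with deflation on the covariance matrix; all function names here are hypothetical, and in practice a numerical library would normally be used instead.

```python
import math

def covariance(rows):
    """Return (centered rows, covariance matrix) for equal-length rows."""
    n, m = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(m)]
           for i in range(m)]
    return centered, cov

def leading_eigenpair(matrix, iterations=500):
    """Power iteration: dominant eigenvalue and unit eigenvector."""
    m = len(matrix)
    v = [float(j + 1) for j in range(m)]   # arbitrary non-zero start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(m)) for i in range(m)]
    eigenvalue = sum(v[i] * mv[i] for i in range(m))   # Rayleigh quotient
    return eigenvalue, v

def principal_component_scores(rows, k=3):
    """Return, for each row, its coordinates on the first k components."""
    centered, cov = covariance(rows)
    m = len(cov)
    axes = []
    for _ in range(k):
        eigenvalue, v = leading_eigenpair(cov)
        axes.append(v)
        # Deflation: remove the component just found from the matrix.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(m)]
               for i in range(m)]
    return [tuple(sum(c[j] * axis[j] for j in range(m)) for axis in axes)
            for c in centered]

# Small artificial example whose principal axes are the coordinate axes.
rows = [(2, 0, 0), (-2, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 0.5), (0, 0, -0.5)]
scores = principal_component_scores(rows, k=3)
print(scores[0], scores[2], scores[4])
```

Each resulting triple can be used directly as the (x, y, z) position of the glyph representing the unit.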


Figure 6. Planets.

2.2 Stars

The use of stars is an alternative possibility to present multivariate data. Each variable is represented using the length of the corresponding ray of the star. We can also use different colors for different rays.

In Figure 7 positions in the space are again determined by the first three principal components. If we look at the pictures we can see that the shapes of the stars explain their positions in the space (or vice versa) - stars that are closer are more similar than the others.
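The geometry of a star glyph can be sketched as follows. The assumptions are that the rays lie in one plane at equal angular spacing and that each value is rescaled linearly to a ray length using the range of its variable over all units; the function name `star_rays`, the `bounds` parameter and the length limits are hypothetical, and the paper does not specify these details.

```python
import math

def star_rays(values, bounds, min_length=0.2, max_length=1.0):
    """Return the (x, y) end points of the rays of a planar star glyph.

    values -- one value per variable for a single unit
    bounds -- (lowest, highest) value of each variable over all units
    """
    m = len(values)
    rays = []
    for k, (value, (low, high)) in enumerate(zip(values, bounds)):
        # Rescale the value to 0..1 within the range of its variable.
        t = 0.5 if high == low else (value - low) / (high - low)
        length = min_length + t * (max_length - min_length)
        angle = 2.0 * math.pi * k / m    # rays are spread evenly
        rays.append((length * math.cos(angle), length * math.sin(angle)))
    return rays

rays = star_rays([0, 10, 5, 10], [(0, 10)] * 4)
for x, y in rays:
    print(round(x, 3), round(y, 3))
```

Keeping a non-zero minimum length ensures that a ray stays visible even for the smallest value of a variable.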


Figure 7. Stars.

In Figure 8 the two main clusters (sea-food on the left, meat on the right) are represented using Tukey stars. We can easily see the main differences between them: a low level and small variation of Fat and Energy in the fish cluster, and of Calcium in the meat cluster. By introducing appropriate glyphs, Tukey stars can also be used to represent groups of units described by all three types of variables (nominal, ordinal, numeric).


Figure 8. Tukey Stars.

2.3 3D Histograms

We can also represent multivariate data using 3D histograms. In the presentation in Figure 9 the first two principal components determine the positions in the plane (value 0); the standardized variable Fat determines the height of the corresponding column; the six clusters are represented by color, and the two main clusters by the shape of the column (prism, cylinder).
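One column of such a histogram could be emitted as a VRML node as sketched below. Several things are assumed rather than taken from the paper: VRML 2.0 syntax, a `Box` standing in for the prism, and columns for negative standardized values extending below the plane at value 0. The function name `histogram_column` and the `width` parameter are hypothetical.

```python
def histogram_column(x, z, value, color, shape="Cylinder", width=0.2):
    """Return a VRML 2.0 node for one column standing on the plane y = 0."""
    height = max(abs(value), 1e-3)   # VRML heights must be positive
    centre_y = value / 2.0           # column spans from 0 to value
    r, g, b = color
    if shape == "Cylinder":
        geometry = f"Cylinder {{ radius {width / 2:g} height {height:g} }}"
    elif shape == "Box":
        geometry = f"Box {{ size {width:g} {height:g} {width:g} }}"
    else:
        raise ValueError(f"unsupported shape: {shape}")
    return (
        "Transform {\n"
        f"  translation {x:g} {centre_y:g} {z:g}\n"
        "  children [\n"
        "    Shape {\n"
        "      appearance Appearance {\n"
        f"        material Material {{ diffuseColor {r:g} {g:g} {b:g} }}\n"
        "      }\n"
        f"      geometry {geometry}\n"
        "    }\n"
        "  ]\n"
        "}\n"
    )

column = histogram_column(1, 2, -1.5, (1, 0, 0), shape="Box")
print(column)
```

Because VRML cylinders and boxes are centred on their local origin, the node is translated by half the column value so that one end of the column rests on the plane.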


Figure 9. 3D Histogram.

2.4 3D Dendrograms

Hierarchical clustering is often used in data analysis. The process of fusing can be shown using dendrograms. We can combine this method with principal components: the first two principal components determine the position of a unit in the plane, and the units are then joined by a 3D dendrogram according to a hierarchical clustering algorithm.

In this way we can find some similarities between the results of both methods (see Figure 10): units that are closer (according to principal components) are joined earlier than the others.
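The construction can be sketched as follows. The paper does not name the clustering method, so single linkage with Euclidean distance is assumed here; placing each junction above the centroid of its members, at a height equal to the merge distance, is likewise an assumption, and the function name `dendrogram_3d` is hypothetical.

```python
import math

def dendrogram_3d(positions, features):
    """Return the merges of a 3D dendrogram as (left, right, (x, y, height)).

    positions -- (x, y) of each unit in the plane (e.g. two principal components)
    features  -- the vectors used to measure dissimilarity between units
    """
    clusters = {i: [i] for i in range(len(positions))}
    merges = []
    next_id = len(positions)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                # Single linkage: distance between the closest members.
                d = min(math.dist(features[i], features[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        members = clusters.pop(a) + clusters.pop(b)
        cx = sum(positions[i][0] for i in members) / len(members)
        cy = sum(positions[i][1] for i in members) / len(members)
        clusters[next_id] = members
        merges.append((a, b, (cx, cy, d)))   # junction of the two branches
        next_id += 1
    return merges

points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
merges = dendrogram_3d(points, points)
for merge in merges:
    print(merge)
```

Leaves keep the identifiers 0 .. n-1 and each new junction receives the next free identifier, so the list of merges fully describes the tree to be drawn.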


Figure 10. 3D Dendrogram.

2.5 3D Time Series Spiral

In Figure 11 the quarterly, seasonally unadjusted time series of private consumer expenditure in Austria at 1964 prices (in billions of Austrian Schillings) (Thury 82) is represented by a time series spiral.

In January 1978 a special purchase tax rate for luxury goods was to be introduced. Therefore, at the end of 1977 most consumers bought the durable goods, above all cars, that they had intended to purchase in the near future.
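A time series spiral can be sketched by mapping each observation to a point on a helix: one turn per year, so that equal quarters of successive years lie above each other. Using the observed value as the radius and a fixed rise per turn are assumptions, since the paper does not describe the exact mapping; the function name `spiral_points` and its parameters are hypothetical, and the series below is artificial.

```python
import math

def spiral_points(series, periods_per_turn=4, rise_per_turn=1.0):
    """Map a time series to (x, y, z) points on a spiral around the y axis."""
    points = []
    for t, value in enumerate(series):
        angle = 2.0 * math.pi * t / periods_per_turn   # position within the year
        height = rise_per_turn * t / periods_per_turn  # time runs upwards
        points.append((value * math.cos(angle), height, value * math.sin(angle)))
    return points

# Artificial quarterly values, for illustration only.
points = spiral_points([1, 2, 3, 4, 5])
for x, y, z in points:
    print(round(x, 3), round(y, 3), round(z, 3))
```

With this layout a seasonal pattern appears as a consistent deformation of every turn, while an unusual quarter stands out as a bulge at one height of the spiral.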


Figure 11. 3D Time Series Spiral.

3. Conclusion

In this paper we presented some general ideas on data visualization and some examples of the visualization of multivariate data. On this basis different kinds of programs for multivariate data visualization can be developed - from simple converters of multivariate data into VRML descriptions to visual data exploration systems based on a powerful 3D graphics library (OpenGL, Direct3D, ...) and combined with other data analysis methods.

References

  1. Ames A.L., Nadeau D.R., Moreland J.L.: The VRML Sourcebook. Wiley, New York, 1996.
  2. Baker M.P., Wickens C.D.: Human Factors in Virtual Environments for the Visual Analysis of Scientific Data. draft, 1995. http://monet.ncsa.uiuc.edu/~baker/PNL/paper.html
  3. Batagelj V., Mrvar A.: Trirazsezne predstavitve podatkov (3D Data Presentations). Proceedings of DSI'96, Portoroz, April 17-24, 1996, p. 427-432.
  4. Bell G., Parisi A., Pesce M.: The Virtual Reality Modeling Language. Version 1.0 Specification. http://www.sdsc.edu/vrml_repository/Archives/vrml10-3.html
  5. Brown J.R., Earnshaw R., Jern M., Vince J.: Visualization: Using Computer Graphics to Explore Data and Present Information. Wiley, New York, 1995.
  6. Dillon W.R., Goldstein M.: Multivariate Analysis: Methods and Applications. Wiley, New York, 1984, p. 191-202.
  7. Hartigan J.A.: Clustering Algorithms. Wiley, New York, 1975, p.86.
  8. inXight: VizControls Technology. A Xerox New Enterprise Company, 1996. http://www.inxight.com/products/visual/overview.shtml
  9. Lea R., Matsuda K., Miyashita K.: Java for 3D and VRML Worlds. New Riders, Indianapolis, 1996.
  10. Pad++: Portal filtering and 'magic lenses'. http://www.cs.unm.edu/pad++/lenses.html
  11. Thury G.: Modelling Consumer Expenditure by Intervention Analysis. TIME SERIES: Theory and Practice 1; O.D. Anderson (editor). North Holland, 1982, p. 308.
  12. Tukey J.W.: Exploratory Data Analysis. Addison-Wesley, Reading, MA, 1977.
  13. VRML in Chemistry: Vollhardt H., Moeckel G., Henn C., Teschner M., and Brickmann J.: VRML for the Communication with 3D Scenarios of Biomolecules. http://ws05.pc.chemie.th-darmstadt.de/vrmlG/
  14. Warnecke J.: The Inventor Mentor. Addison-Wesley, Reading, MA, 1994.
  15. Wehrend S., Lewis C.: A Problem-Oriented Classification of Visualization techniques. In Proceedings of IEEE Visualization'90, 1990, p. 139-143.
  16. Young F.W., Edds T., Kent D., Kuhfeld W.F.: Visual Exploratory Data Analysis. In: Classification as a Tool of Research, Proceedings of the 9th Annual Meeting of the Classification Society (F.R.G), University of Karlsruhe, F.R.G., 26-28 June, 1985. edited by W. Gaul and M. Schader.