Visualization of Multivariate Data Using 3D and VR Presentations

Vladimir Batagelj
University of Ljubljana, Faculty of mathematics and physics
Department of Mathematics, Jadranska 19, 1000 Ljubljana
Tel: 386 (61) 1766 672; fax: 386 (61) 217 281
e-mail: Vladimir.Batagelj@uni-lj.si
WWW: http://vlado.fmf.uni-lj.si/

Andrej Mrvar
University of Ljubljana, Faculty of Social Sciences
Kardeljeva pl. 5, 1000 Ljubljana
e-mail: Andrej.Mrvar@uni-lj.si
WWW: http://www.uni-lj.si/~fdmrvar/andrej.html

Abstract

VRML (Virtual Reality Modeling Language) and the freely available browsers for it have made three-dimensional presentations popular on personal computers as well. One of the most important features of VRML is the possibility of traveling through the resulting scene, that is, the egocentric view.
This paper gives an introduction to the visualization of multivariate data and some examples of their VR presentations.

Paper published in: Indo-French Workshop on Symbolic Data Analysis and its Applications. 23-25. September 1997, Paris XI - Dauphine, vol 1., p. 66-76.

1. Introduction

1.1 Data Visualization

With the growing computing power of desktop computers, data visualization is gaining popularity among researchers as a tool for data exploration and for the presentation of results (Brown 95, Baker 95). Using a data visualization system (see Figure 1), a researcher usually pursues different goals (Wehrend 90): to identify, locate, distinguish, categorize, cluster, rank, compare, associate or correlate some data.

**Figure 1.** Data visualization system.

Properties of the data set that crucially influence forms of its representation are:

size: small, large, infinite;
density: sparse, dense, cluttered; and
activity: static, dynamic (deterministic, random).

Small data sets can be presented in their entirety and in detail in a single view. In an overall view of a large data set details are lost, while a detailed view can encompass only a part of the data set.

The basic feature of VR (Virtual Reality) is its support of the egocentric view: the user is immersed in the presentation as its active part and can travel inside the data scene, with the view determined by the user's position in the scene. Standard data visualizations supported by general-purpose programs (Excel, PowerPoint, ...) are mostly exocentric: the viewer is positioned outside the presentation. Often the third dimension is used only to make the presentation fancier, not to gain better insight into the data.

In very large data sets a serious problem appears: how to avoid getting "lost within the forest"? There are several solutions that help the user's orientation:

restart option: returns the user to the starting position;
introduction of additional orientation elements: coordinates display, grids, shadows, landmarks (static / user set). These elements can be switched on/off.
multiview: consists of at least two views (windows):
- map view: overall view (usually exocentric) which contains the current position and allows 'long' moves (jumps). For very large data sets it can be combined with zooming or fish eye.
- local view: which displays the selected portion of data set.

Additional support can be achieved by implementing a trace/backtrack/replay mechanism and guided tours.

Closely related to the multiview idea are the concepts of glasses, lenses and zooming (Pad++, inXight 96). By selecting different glasses we obtain different views of the same data. Glasses affect the entire window, while lenses affect only the selected region.

**Figure 2.** Windows File Manager.

An example of the multiview approach is the presentation of files in the Windows File Manager (see Figure 2). It also provides different glasses (Name, All File Details, ..., Sort by ...; in the new version: Large icons, Small icons, List, Detailed list, ...).

1.2 Visualization of Multivariate Data Sets

In visualizing multivariate data we usually deal with small or large, sparse and static data sets. Let E = {X_i} be a set of units. A unit X is usually described by a list of values of selected attributes (properties) (V₁=x₁, V₂=x₂, ..., V_m=x_m). It is usually represented by a glyph which integrates, as its components, elements representing the unit's attributes.

From standard data analysis we know several types of 2D-glyphs: points in the plane, pie charts, bar charts, columns, stars, Chernoff faces, Andrews curves, ... (Dillon 84). Most 2D-glyphs can be extended to 3D-glyphs, and some additional ones still need to be invented.

For example, the pie chart and column representations can be combined into a pie cylinder (see Figure 3). When such glyphs are used to represent the representatives (centroids) of groups, they can also serve for presentations of groups.

**Figure 3.** Pie Cylinder.

To represent a selected attribute over a group of units, histograms and Tukey's box-and-whisker plots are often used (see Figure 4).

**Figure 4.** Tukey Glyph.

They can be combined into a glyph (for example, a star) representing a group (see Figure 8 in subsection 2.2 Stars).

The largest ring in the Tukey glyph represents the data set (population) average, the middle ring the group average, and the small one the group median. The tubes represent the intervals from the 1st to the 3rd quartile, from the 1st to the 9th decile, and from the minimum to the maximum.
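The quantities encoded by one Tukey glyph can be computed as in the following minimal sketch. The function name and the dictionary keys are illustrative, and the choice of the "inclusive" quantile method is an assumption; the paper does not say how quartiles and deciles were estimated.

```python
import statistics


def tukey_glyph_summary(population, group):
    """Summary values of one attribute for a Tukey glyph (hypothetical helper).

    population: attribute values over the whole data set
    group:      attribute values over the units of one group
    """
    g = sorted(group)
    # Quantile estimation method is an assumption, not taken from the paper.
    quartiles = statistics.quantiles(g, n=4, method="inclusive")   # Q1, Q2, Q3
    deciles = statistics.quantiles(g, n=10, method="inclusive")    # D1 .. D9
    return {
        "population_mean": statistics.mean(population),  # largest ring
        "group_mean": statistics.mean(g),                # middle ring
        "group_median": statistics.median(g),            # small ring
        "quartile_tube": (quartiles[0], quartiles[2]),   # 1st to 3rd quartile
        "decile_tube": (deciles[0], deciles[8]),         # 1st to 9th decile
        "range_tube": (g[0], g[-1]),                     # minimum to maximum
    }
```

A star built from such glyphs would call this function once per attribute, passing the attribute's values over all units and over the units of the group.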

The representation elements support associative, selective, ordering and/or quantifying tasks. In the visualization task there are several levels of detail represented by the hierarchy

(attribute, unit, group, groups, data set)

Most data analysis procedures can be seen as transformations on, or relations between, these levels.

Different scale types are represented by different graphical elements:

scale	representation
nominal	color, shape
ordinal	grade, lightness, texture, arrangement (position)
numeric	size, position, direction, angle

Since numeric scales are a special case of ordinal scales, and ordinal scales a special case of nominal ones, representations compatible with higher scales could also be used for lower scales, e.g., direction to represent nationality. As a general rule this should be avoided, because such representations can suggest unsubstantiated associations.
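The table and the rule above can be written down as a small lookup. This is only a sketch: the names `SCALES`, `REPRESENTATIONS`, `recommended` and `discouraged` are illustrative and do not come from the authors' software.

```python
# Scales ordered from the most general (nominal) to the most specific (numeric).
SCALES = ["nominal", "ordinal", "numeric"]

# Graphical elements listed in the table for each scale type.
REPRESENTATIONS = {
    "nominal": ["color", "shape"],
    "ordinal": ["grade", "lightness", "texture", "arrangement (position)"],
    "numeric": ["size", "position", "direction", "angle"],
}


def recommended(scale):
    """Graphical elements the table assigns to the given scale type."""
    return list(REPRESENTATIONS[scale])


def discouraged(scale):
    """Elements of higher scales: usable in principle, but they may suggest
    an order or magnitude that the attribute does not have."""
    level = SCALES.index(scale)
    result = []
    for higher in SCALES[level + 1:]:
        result.extend(REPRESENTATIONS[higher])
    return result
```

For a nominal attribute such as nationality, `discouraged("nominal")` contains "direction", which corresponds to the example given in the text.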

1.3 Three Dimensional Data Presentations

In this paper we discuss 3D presentations of multivariate data. As a prototyping environment we selected VRML (Virtual Reality Modeling Language) because it provides platform-independent presentations and supports VR presentations.

**Figure 5.** Basic VRML Shapes.

In a presentation of multivariate data several VRML elements can be used:

position in space (x, y, z);
shape (sphere, cube, cone, cylinder, plane, ...; see Figure 5);
color;
size, angle, slope, area, volume;
pattern (texture);
direction (orientation);
text;
lights (different light sources, shadowing, transparency, reflections, ...);
rotation of objects;
different views and ways of moving in the obtained scene; camera properties (orthographic, perspective, stereoscopic; field of view).
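Several of the elements listed above (position, shape, color, size) can be combined in a single glyph. The following Python sketch writes such glyphs as VRML text. It assumes VRML 2.0 node syntax (`Transform`, `Shape`, `Material`, `Sphere`, `Box`, `Cone`, `Cylinder`), whereas the authors' own software may well have produced VRML 1.0; the helper names `vrml_glyph` and `vrml_scene` are hypothetical.

```python
def vrml_glyph(position, shape="Sphere", color=(1.0, 1.0, 1.0), size=1.0):
    """Return VRML 2.0 text for one glyph (hypothetical helper)."""
    geometries = {
        "Sphere": f"Sphere {{ radius {size / 2:g} }}",
        "Box": f"Box {{ size {size:g} {size:g} {size:g} }}",
        "Cone": f"Cone {{ bottomRadius {size / 2:g} height {size:g} }}",
        "Cylinder": f"Cylinder {{ radius {size / 2:g} height {size:g} }}",
    }
    x, y, z = position
    r, g, b = color
    return (
        "Transform {\n"
        f"  translation {x:g} {y:g} {z:g}\n"      # position in space
        "  children [\n"
        "    Shape {\n"
        "      appearance Appearance {\n"
        f"        material Material {{ diffuseColor {r:g} {g:g} {b:g} }}\n"
        "      }\n"
        f"      geometry {geometries[shape]}\n"   # shape and size
        "    }\n"
        "  ]\n"
        "}\n"
    )


def vrml_scene(glyphs):
    """Prefix the glyphs with the VRML 2.0 file header."""
    return "#VRML V2.0 utf8\n" + "".join(glyphs)


scene = vrml_scene([
    vrml_glyph((0, 0, 0), "Box", (0.0, 1.0, 1.0)),      # cyan cube
    vrml_glyph((2, 1, 0), "Sphere", (1.0, 0.0, 1.0)),   # magenta sphere
])
print(scene)
```

Saving the printed text to a file with the extension `.wrl` should give a scene that a VRML 2.0 browser can load.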

1.4 VRML

During the first Web Conference in May 1994 some virtual reality experts formed a group to prepare additions to HTML (HyperText Markup Language) in the field of virtual reality. Thus the idea of VRML (Virtual Reality Markup Language) was born. Silicon Graphics supported the idea significantly by making its language for the description of three-dimensional objects, Open Inventor (Warnecke 94), freely available together with its parser.

At the next conference, in October 1994 in Chicago, the first version of VRML was announced (Bell, Ames 96). The designers decided that HTML and VRML should be "orthogonal" but connected languages, and VRML became the Virtual Reality Modeling Language.

The first shareware VRML browser, WebSpace, appeared in May 1995. The company Paper Software released the browser WebFX for free use in August 1995. WebFX was a plug-in for Netscape, the most popular HTML browser at that time. WebFX was later renamed Live3D. Silicon Graphics is developing its own VRML viewer, CosmoPlayer.

At Siggraph (August 1996) the VRML 2.0 specification was published and made available in its final form (Lea 96). VRML 2.0 allows the user to build user-controlled, multiuser scenes.

VRML is used in many areas: data organization, three-dimensional maps, modeling, mathematics, chemistry, medicine, ... (Vollhardt).

2. Examples

In the following examples data on 27 different types of food are used (see Table 1; Hartigan 75). They are described by 5 numeric variables: Food Energy, Protein, Fat, Calcium, and Iron. Variables were standardized before use.

**Table 1.** Types of Food (Raw Data).
No	Food	Cluster	Energy (cal)	Protein (g)	Fat (g)	Calcium (mg)	Iron (mg)
1	Beef, braised	3	340	20	28	9	2.6
2	Hamburger	3	245	21	17	9	2.7
3	Beef, roast	3	420	15	39	7	2.0
4	Beef, steak	3	375	19	32	9	2.6
5	Beef, canned	3	180	22	10	17	3.7
6	Chicken, broiled	6	115	20	3	8	1.4
7	Chicken, canned	6	170	25	7	12	1.5
8	Beef heart	3	160	26	5	14	5.9
9	Lamb leg, roast	5	265	20	20	9	2.6
10	Lamb shoulder, roast	5	300	18	25	9	2.3
11	Smoked ham	4	340	20	28	9	2.5
12	Pork, roast	4	340	19	29	9	2.5
13	Pork, simmered	4	355	19	30	9	2.4
14	Beef tongue	3	205	18	14	7	2.5
15	Veal cutlet	3	185	23	9	9	2.7
16	Bluefish, baked	2	135	22	4	25	0.6
17	Clams, raw	1	70	11	1	82	6.0
18	Clams, canned	1	45	7	1	74	5.4
19	Crabmeat, canned	1	90	14	2	38	0.8
20	Haddock, fried	2	135	16	5	15	0.5
21	Mackerel, broiled	2	200	19	13	5	1.0
22	Mackerel, canned	2	155	16	9	157	1.8
23	Perch, fried	2	195	16	11	14	1.3
24	Salmon, canned	2	120	17	5	159	0.7
25	Sardines, canned	2	180	22	9	367	2.5
26	Tuna, canned	2	170	25	7	7	1.2
27	Shrimp, canned	1	110	23	1	98	2.6
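The standardization mentioned before Table 1 can be sketched as follows. The paper does not state which variant was used; this sketch assumes the usual z-score with the sample standard deviation, and the function name is illustrative.

```python
import statistics


def standardize(values):
    """Return z-scores: subtract the mean, divide by the standard deviation.

    Using the sample standard deviation (n - 1 in the denominator) is an
    assumption; the paper does not specify the variant.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Protein (g) column of Table 1, rows 1 to 27.
protein = [20, 21, 15, 19, 22, 20, 25, 26, 20, 18, 20, 19, 19, 18,
           23, 22, 11, 7, 14, 16, 19, 16, 16, 17, 22, 25, 23]
protein_z = standardize(protein)
```

After this transformation every variable has mean 0 and standard deviation 1, so variables measured in different units (cal, g, mg) become comparable.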

Units (types of food) were manually grouped into six clusters, represented by colors:

1 - clams and crabs / cyan,
2 - fish / blue,
3 - beef / magenta,
4 - pork / red,
5 - lamb / yellow,
6 - chicken / white.

The two main clusters

{1, 2} - sea-food, and
{3, 4, 5, 6} - meat

are represented by shape (cube, sphere).

Since the full advantage of VRML can be appreciated only with a VRML browser, we strongly recommend that the reader visit the HTML/VRML version of this paper at:

http://vlado.fmf.uni-lj.si/vrml/paris.97/

Software for producing 3D representations of multivariate data in VRML is available at:

http://vlado.fmf.uni-lj.si/pub/vrml/

2.1 Planets

The simplest presentation of multivariate data is a presentation using planets: three selected variables are shown as positions in space. Additional information can be represented by the glyphs that represent the units.

In the presentation of food types in Figure 6 the positions in space are determined by the first three principal components. Different views can show interesting relations in the data. For example, the positions of the glyphs representing clams and crabs suggest that our decision to put them in the same group was not appropriate. Groups of similar types of food can be easily noticed in both pictures.
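Positions based on the first three principal components can be obtained as in the sketch below. It uses only the Python standard library and finds the leading eigenvectors of the covariance matrix by power iteration with deflation. This is one simple way to compute principal components and is not necessarily the method the authors used; all function names are illustrative.

```python
import math


def covariance_matrix(rows):
    """Center the data and return (centered rows, covariance matrix)."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(m)]
           for i in range(m)]
    return centered, cov


def leading_eigenpair(matrix, iterations=1000):
    """Power iteration: largest eigenvalue and its unit eigenvector."""
    m = len(matrix)
    v = [1.0 + 0.37 * i for i in range(m)]   # arbitrary asymmetric start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    value = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(m))
                for i in range(m))
    return value, v


def principal_components(rows, k=3):
    """Return (scores, axes): coordinates of each unit on the first k axes."""
    centered, cov = covariance_matrix(rows)
    m = len(cov)
    axes = []
    for _ in range(k):
        value, vector = leading_eigenpair(cov)
        axes.append(vector)
        # Deflation: remove the component just found from the matrix.
        cov = [[cov[i][j] - value * vector[i] * vector[j] for j in range(m)]
               for i in range(m)]
    scores = [[sum(c[j] * axis[j] for j in range(m)) for axis in axes]
              for c in centered]
    return scores, axes
```

With the standardized food data as `rows`, each row of `scores` would give the (x, y, z) position of one planet. The sign of each axis is arbitrary, so a picture obtained this way may be mirrored relative to Figure 6.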

Planets (VRML)

**Figure 6.** Planets.

2.2 Stars

The use of stars is an alternative way to present multivariate data. Each variable is represented by the length of the corresponding ray of the star. We can also use different colors for different rays.

In Figure 7 positions in space are again determined by the first three principal components. Looking at the pictures, we can see that the shapes of the stars explain their positions in space (or vice versa): stars that are close together are more similar in shape than stars that are far apart.
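The geometry of a single star can be sketched as follows. Several choices here are assumptions made for illustration only: the rays lie in one plane at equal angles, and each value is rescaled linearly from the range of its variable to a ray length between `min_len` and `max_len`. The paper does not describe how the rays of its stars are arranged.

```python
import math


def star_rays(values, ranges, min_len=0.2, max_len=1.0):
    """Return the end points (x, y) of the rays of one star.

    values: attribute values of one unit
    ranges: (lowest, highest) value of each attribute over the data set
    """
    m = len(values)
    rays = []
    for k, (value, (lo, hi)) in enumerate(zip(values, ranges)):
        t = 0.0 if hi == lo else (value - lo) / (hi - lo)   # rescale to 0..1
        length = min_len + t * (max_len - min_len)
        angle = 2.0 * math.pi * k / m                       # equally spaced rays
        rays.append((length * math.cos(angle), length * math.sin(angle)))
    return rays


# Raw values of "Beef, braised" and the column ranges read from Table 1
# (Energy, Protein, Fat, Calcium, Iron); raw data are used only for illustration.
ranges = [(45, 420), (7, 26), (1, 39), (5, 367), (0.5, 6.0)]
beef_braised = star_rays([340, 20, 28, 9, 2.6], ranges)
```

Each ray could then be drawn as a thin cylinder or line from the center of the glyph to the computed end point, with a different color per variable.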

Stars (VRML)

**Figure 7.** Stars.

In Figure 8 the two main clusters (sea-food on the left, meat on the right) are represented using Tukey stars. The main differences between them are easy to see: a low level and small variation of Fat and Energy in the sea-food cluster, and of Calcium in the meat cluster. By introducing appropriate glyphs, Tukey stars can also be used to represent groups of units described by all three types of variables (nominal, ordinal, numeric).

Fish (VRML) Meat (VRML)

**Figure 8.** Tukey Stars.

2.3 3D Histograms

We can also represent multivariate data using 3D histograms. In the presentation in Figure 9 the first two principal components determine the positions in the plane (value 0); the standardized variable Fat determines the height of the corresponding column; the six clusters are represented by color, and the two main clusters by the shape of the column (prism, cylinder).
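One column of such a histogram could be generated as in the following sketch, which again assumes VRML 2.0 syntax. Because a standardized variable can be negative, the column is drawn from the plane (height 0) upward or downward; this treatment of negative values, the column width, and the helper name are assumptions, not details taken from the paper.

```python
def histogram_column(x, z, height, color, prism=True, width=0.2):
    """Return VRML 2.0 text for one column standing on the plane y = 0.

    (x, z):  position in the plane, e.g. the first two principal components
    height:  signed column height, e.g. the standardized value of Fat
    prism:   True gives a square prism (Box), False gives a Cylinder
    """
    h = abs(height)
    if prism:
        geometry = f"Box {{ size {width:g} {h:g} {width:g} }}"
    else:
        geometry = f"Cylinder {{ radius {width / 2:g} height {h:g} }}"
    r, g, b = color
    # VRML shapes are centered on their origin, so the column is shifted
    # by half of its signed height to make it start at the plane.
    return (
        "Transform {\n"
        f"  translation {x:g} {height / 2:g} {z:g}\n"
        "  children [\n"
        "    Shape {\n"
        "      appearance Appearance {\n"
        f"        material Material {{ diffuseColor {r:g} {g:g} {b:g} }}\n"
        "      }\n"
        f"      geometry {geometry}\n"
        "    }\n"
        "  ]\n"
        "}\n"
    )
```

Which of the two main clusters gets the prism and which the cylinder is left to the caller through the `prism` argument.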

3D Histogram (VRML)

**Figure 9.** 3D Histogram.

2.4 3D Dendrograms

Hierarchical clustering is often used in data analysis. The fusion process can be shown using dendrograms. We can combine this method with principal components. The first two principal components determine the position of a unit in the plane. Units are then joined by a 3D dendrogram according to a hierarchical clustering algorithm.

In this way we can find some similarities between the results of both methods (see Figure 10): units that are closer (according to principal components) are joined earlier than the others.
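The construction can be sketched as follows. The paper does not name the clustering method, so this sketch assumes single-linkage agglomerative clustering with Euclidean distance; placing each join above the centroid of its members in the plane is also an assumption, and the function name is illustrative.

```python
import math


def dendrogram_3d(positions, features):
    """Build the internal nodes of a 3D dendrogram.

    positions: (x, y) of each unit in the plane (e.g. first two components)
    features:  vectors used for the distances between units
    Returns a list of joins, each with the node id, the ids of the two joined
    clusters, the position in the plane and the height (fusion level).
    """
    clusters = {i: [i] for i in range(len(positions))}
    merges = []
    next_id = len(positions)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for index, a in enumerate(ids):
            for b in ids[index + 1:]:
                # Single linkage: distance between the closest members.
                d = min(math.dist(features[i], features[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        members = clusters.pop(a) + clusters.pop(b)
        x = sum(positions[i][0] for i in members) / len(members)
        y = sum(positions[i][1] for i in members) / len(members)
        clusters[next_id] = members
        merges.append({"node": next_id, "children": (a, b),
                       "position": (x, y), "height": d})
        next_id += 1
    return merges
```

Drawing a vertical line from each child up to the height of its join, and a horizontal connection at that height, would give the branches of the dendrogram above the plane.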

3D Dendrogram (VRML)

**Figure 10.** 3D Dendrogram.

2.5 3D Time Series Spiral

In Figure 11 the quarterly, seasonally unadjusted time series of private consumer expenditure in Austria at 1964 prices (in billions of Austrian Schillings) (Thury 82) is represented by a time series spiral.

In January 1978 a special purchase tax rate for luxury goods was to be introduced. Therefore, at the end of 1977 most consumers bought the durable goods, above all cars, that they had intended to purchase in the near future.
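One possible geometry for a time series spiral is sketched below. The mapping is an assumption, since the paper does not specify it: the quarter determines the angle (one turn per year), the value determines the distance from the axis, and time advances along the vertical axis. The function name and the sample numbers are illustrative and are not the Austrian data.

```python
import math


def spiral_points(series, period=4, pitch=1.0):
    """Return (x, y, z) points of a spiral, one point per observation.

    period: observations per turn (4 for quarterly data)
    pitch:  vertical distance between successive turns
    """
    points = []
    for t, value in enumerate(series):
        angle = 2.0 * math.pi * (t % period) / period   # season
        height = pitch * t / period                     # elapsed time
        points.append((value * math.cos(angle), height,
                       value * math.sin(angle)))
    return points


# Made-up quarterly values over two years, for illustration only.
points = spiral_points([10.0, 11.0, 12.0, 14.0, 10.5, 11.5, 12.5, 15.0])
```

With this layout, observations from the same quarter of different years lie above each other, so an unusual quarter such as the end of 1977 would stand out from its neighbors in the same column.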

3D Time Series Spiral (VRML)

**Figure 11.** 3D Time Series Spiral.

3. Conclusion

In this paper we presented some general ideas on data visualization and some examples of the visualization of multivariate data. On this basis different kinds of programs for multivariate data visualization can be developed, from simple converters of multivariate data into their VRML descriptions to a visual data exploration system based on a powerful 3D graphics library (OpenGL, Direct3D, ...) and combined with other data analysis methods.

References

Ames A.L., Nadeau D.R., Moreland J.L.: The VRML Sourcebook. Wiley, New York, 1996.
Baker M.P., Wickens C.D.: Human Factors in Virtual Environments for the Visual Analysis of Scientific Data. draft, 1995. http://monet.ncsa.uiuc.edu/~baker/PNL/paper.html
Batagelj V., Mrvar A.: Trirazsezne predstavitve podatkov (3D Data Presentations). Proceedings of DSI'96, Portoroz, April 17-24, 1996, p. 427-432.
Bell G., Parisi A., Pesce M.: The Virtual Reality Modeling Language. Version 1.0 Specification. http://www.sdsc.edu/vrml_repository/Archives/vrml10-3.html
Brown J.R., Earnshaw R., Jern M., Vince J.: Visualization: Using Computer Graphics to Explore Data and Present Information. Wiley, New York, 1995.
Dillon W.R., Goldstein M.: Multivariate Analysis: Methods and Applications. Wiley, New York, 1984, p. 191-202.
Hartigan J.A.: Clustering Algorithms. Wiley, New York, 1975, p.86.
inXight: VizControls Technology. A Xerox New Enterprise Company, 1996. http://www.inxight.com/products/visual/overview.shtml
Lea R., Matsuda K., Miyashita K.: Java for 3D and VRML Worlds. New Riders, Indianapolis, 1996.
Pad++: Portal filtering and 'magic lenses'. http://www.cs.unm.edu/pad++/lenses.html
Thury G.: Modelling Consumer Expenditure by Intervention Analysis. TIME SERIES: Theory and Practice 1; O.D. Anderson (editor). North Holland, 1982, p. 308.
Tukey J.W.: Exploratory Data Analysis. Addison-Wesley, Reading, MA, 1977.
Vollhardt H., Moeckel G., Henn C., Teschner M., Brickmann J.: VRML for the Communication with 3D Scenarios of Biomolecules (VRML in Chemistry). http://ws05.pc.chemie.th-darmstadt.de/vrmlG/
Warnecke J.: The Inventor Mentor. Addison-Wesley, Reading, MA, 1994.
Wehrend S., Lewis C.: A Problem-Oriented Classification of Visualization techniques. In Proceedings of IEEE Visualization'90, 1990, p. 139-143.
Young F.W., Edds T., Kent D., Kuhfeld W.F.: Visual Exploratory Data Analysis. In: Classification as a Tool of Research, Proceedings of the 9th Annual Meeting of the Classification Society (F.R.G), University of Karlsruhe, F.R.G., 26-28 June, 1985. edited by W. Gaul and M. Schader.