News of the Institute of Media Informatics

PhD exposition / van Onzenoodt: Guiding Visual Encodings through Data Dimensionality and Quality

Ulm University

Introduction of a dissertation project | Wednesday, 28 September 2022, 10:00 am | O27/331

 

Christian van Onzenoodt, a member of the Visual Computing research group, gives an introduction to his dissertation topic, titled "Guiding Visual Encodings through Data Dimensionality and Quality".

Abstract: We are surrounded by data-collecting devices, be they industrial machines, self-driving cars, or simply cell phones. All of this data contains hidden insights that can be extremely valuable, such as identifying potential optimizations. Visualization allows us to explore the data and find these insights. However, as the data dimensionality increases or the data quality decreases, the design of visualizations becomes more difficult but also more important. While one- or two-dimensional data can be mapped rather straightforwardly, data of higher dimensionality requires other techniques to encode the remaining dimensions beyond the 2D canvas. Poor data quality, for example in the form of missing values, can add a further dimension to the data that might need to be encoded as well. Therefore, the goal of this thesis is to derive guidance for visual encodings based on data dimensionality and quality.

We distinguish three classes of data:

  1. One-dimensional data, which still allows the remaining canvas dimension to be exploited to optimize the encoding.
  2. Multi-dimensional data, where the data itself can no longer be mapped onto a two-dimensional canvas and therefore requires an additional encoding channel.
  3. High-dimensional data, whose visualization requires complex mappings such as projections.

One common visualization technique for one-dimensional data is the strip plot, which uses a dot-based encoding. However, these plots usually suffer from overdraw, especially as the number of data points increases. To address this issue, Blue Noise Plots serve as a novel technique to optimize point placement and minimize overdraw.

Multi-dimensional data is often visualized using techniques like scatterplots, scatterplot matrices, or parallel coordinates. When using scatterplots, categorical dimensions within the data are typically encoded into the points, e.g. by using different shapes. However, scatterplots also suffer from overdraw as the number of data points, or their screen footprint, increases. Therefore, I present guidance for the selection of shapes that appear to suffer the least from overdraw. Another technique for visualizing multi-dimensional data is parallel coordinates, in which connected lines are drawn through vertical axes. However, if a data point is missing a value, the respective line cannot be drawn. I present a series of possible encodings for missing values in parallel coordinates, enabling us to find those imperfections.

High-dimensional data is often projected down to two-dimensional embeddings. While projection techniques can preserve some of the high-dimensional patterns, others are lost. Glyphs like star or flower glyphs allow for visualizing single, high-dimensional data points but are typically not used in large numbers in such embeddings. I present the results of a series of studies investigating the properties of these glyphs when they encode the raw, high-dimensional data in two-dimensional embeddings. This allows deriving guidance on which glyphs to use in these scenarios.

We look forward to a large turnout and a fair discussion.