Visual Analytics of Large Homogeneous Data - Categorical, Set-typed, and Classification Data

Title: Visual Analytics of Large Homogeneous Data - Categorical, Set-typed, and Classification Data
Publication Type: PhD Thesis
Year of Publication: 2014
Authors: Alsallakh, B.
Advisor: Miksch, S., and H. Hauser
Academic Department: Institute of Software Technology & Interactive Systems
University: Vienna University of Technology
Degree: PhD in Computer Science
Thesis Type: cumulative
Number of Pages: 165
Date Published: 09/2014
City: Vienna
Keywords: categorical data, classification data, set-typed data, Visual analytics
Abstract

A multidimensional data set is homogeneous when its dimensions all have the same nature. For instance, these dimensions can represent the probabilities for a sample to belong to different classes, or item memberships of multiple sets. Such data appear frequently across domains and describe how a relatively large number of items relate to a relatively small number of classes or categories. For example, a homogeneous data set might record which genes (rows) appear in which individuals (columns), or how many times books (rows) are sold in different countries (columns). Analyzing these relations reveals patterns in the data, such as genes that are frequently or never observed together, or books that sell mostly in a specific country. Both automated methods and visualization have been applied to analyze homogeneous data. However, state-of-the-art visualization techniques fall short either in scalability with the number of data points or in addressing the specific nature of different classes of homogeneous data and the tasks associated with them. In this dissertation, I propose a novel visual metaphor and an interactive exploration environment for analyzing large homogeneous data. The proposed wheel metaphor allows analyzing and selecting the data points based on their relations with the different dimensions. Moreover, it emphasizes the dimensions and the relations between them as the central part of the visualization, and allows analyzing these relations based on the data points defining them. The proposed interactive exploration environment allows analyzing different aspects of the data at multiple levels of detail. I illustrate how the proposed approach can be applied to analyze three classes of homogeneous data: set-typed data, probabilistic classification data, and categorical data. Each class has its own characteristics that imply specific requirements and tasks.
These different tasks are supported by the proposed approach, thanks to its flexibility and extensibility. I demonstrate the applicability of my approach by means of usage scenarios and case studies with various datasets from multiple domains. In addition, both user studies and interviews with domain experts were conducted to assess the utility of the proposed methods. The major advantages of the proposed visual metaphor are its scalability in the number of data points, thanks to dedicated aggregation methods for homogeneous data, and the rich set of interactions it supports for selecting data based on a variety of criteria. The major disadvantages are the complexity of the visual metaphor, which requires sufficient user training, the limited scalability in the number of dimensions, and the low sensitivity to small differences in the data being analyzed. Nevertheless, the wheel metaphor is suited to gaining an overview of large homogeneous data, with complementary analytical methods, interactions, and coordinated views being used to cope with the limitations. As a result, novel analysis possibilities and insights into the data become available, beyond what state-of-the-art techniques offer.

Notes

Reviewed by Silvia Miksch (Vienna University of Technology), Helwig Hauser (University of Bergen), and John Stasko (Georgia Institute of Technology)

URL: http://www.cvast.tuwien.ac.at/~alsallakh/dissertation.pdf
Short Title: Visual Analytics of Large Homogeneous Data
Funding projects: CVAST