Visual Analytics of Large Homogeneous Data - Categorical, Set-typed, and Classification Data

Author	Bilal Alsallakh
Advisor	Silvia Miksch, Helwig Hauser
Keywords	Visual analytics, set-typed data, classification data, categorical data
Abstract	A multidimensional data set is homogeneous when the dimensions have the same nature. For instance, these dimensions can represent the probabilities for a sample to belong to different classes, or item memberships of multiple sets. Such data appear very often in different domains to describe how a relatively large number of items are related to a relatively small number of classes or categories. For example, a homogeneous data set might record which genes (rows) appear in which individuals (columns), or how many times books (rows) are sold in different countries (columns). Analyzing these relations reveals several patterns in the data, such as genes that are frequently or never observed together, or books that sell mostly in a specific country. Both automated methods and visualization have been applied to analyze homogeneous data. However, state-of-the-art visualization techniques are lacking either in scalability with the number of data points or in addressing the specific nature of different classes of homogeneous data and the tasks associated with them. In this dissertation, I propose a novel visual metaphor and an interactive exploration environment for analyzing large homogeneous data. The proposed wheel metaphor allows analyzing and selecting the data points based on their relations with the different dimensions. Moreover, it emphasizes the dimensions and the relations between them as the central part of the visualization, and allows analyzing these relations based on the data points defining them. The proposed interactive exploration environment allows analyzing different aspects of the data at multiple levels of detail. I illustrate how the proposed approach can be applied to analyze three classes of homogeneous data: set-typed data, probabilistic classification data, and categorical data. Each class has its own characteristics that imply specific requirements and tasks.
These different tasks are supported by the proposed approach, thanks to its flexibility and extensibility. I demonstrate the applicability of my approach by means of usage scenarios and case studies with various datasets from multiple domains. In addition, both user studies and interviews with domain experts were conducted to assess the utility of the proposed methods. The major advantages of the proposed visual metaphor are its scalability in the number of data points, thanks to dedicated aggregation methods for homogeneous data, and the rich set of interactions it supports for selecting the data based on a variety of criteria. The major disadvantages are the complexity of the visual metaphor, which requires sufficient user training, the limited scalability in the number of dimensions, and the low sensitivity to small differences in the data being analyzed. Nevertheless, the wheel metaphor is suited to gaining an overview of large homogeneous data, with complementary analytical methods, interactions, and coordinated views being used to cope with the limitations. As a result, the approach enables novel analyses and insights into the data beyond state-of-the-art techniques.
Year of Publication	2014
Academic Department	Institute of Software Technology & Interactive Systems
Degree	PhD in Computer Science
Number of Pages	165
Date Published	09/2014
Thesis Type	cumulative
University	Vienna University of Technology
City	Vienna
URL	http://www.cvast.tuwien.ac.at/~alsallakh/dissertation.pdf
Short Title	Visual Analytics of Large Homogeneous Data
Refereed Designation	Refereed
Internal Projects	Visual Analytics of Large Homogeneous Data
Funding projects	CVAST - Centre for Visual Analytics Science and Technology