Visual Analytics of Large Homogeneous Data

Homogeneous multivariate data encompass multiple variables that have the same semantics. For example, these variables can represent the probabilities that a sample belongs to different classes, or the memberships of items in multiple sets.

With a large number of items, such homogeneous data tables become rich in information about how the row entities are related to the different column variables, and how the columns are related to each other through their relationships with the rows.

This project aims to develop visualization methods for analyzing homogeneous multivariate data. These methods should allow users to analyze and select the row entities based on their relations with the different columns. Moreover, they place the column variables and the relations between them at the center of the visualization, and allow users to analyze these relations based on the row entities that define them.

Data

We consider homogeneous data {f_ij} defined as follows:

F: E × C → ℝ⁺, (e_i, c_j) ↦ f_ij, with m = |C| ≪ n = |E|

where:

E is a relatively large set (thousands) of row items that typically represent single entities (individuals, samples, etc.).
C is a relatively small set (tens) of column items that typically represent classes, labels, tags, or categories.
F is a bivariate function whose values f_ij define how the row items (entities) are related to the column items (classes).
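The definition above can be mirrored by a small in-memory container. The sketch below is a minimal illustration, assuming a dense row-major list of lists; the class name HomogeneousTable and its members are hypothetical and not part of any tool described here. It only enforces what the definition states: exactly one value per pair (e_i, c_j), and non-negative values f_ij.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HomogeneousTable:
    """Hypothetical container for F: E x C -> R+, stored row-major as values[i][j] = f_ij."""

    rows: List[str]             # E: the n row entities
    columns: List[str]          # C: the m column items
    values: List[List[float]]   # values[i][j] = f_ij

    def __post_init__(self) -> None:
        # F is defined on every pair (e_i, c_j), so the table must be complete.
        if len(self.values) != len(self.rows):
            raise ValueError("need one value row per entity in E")
        for row in self.values:
            if len(row) != len(self.columns):
                raise ValueError("each row needs one value per column in C")
            if any(v < 0 for v in row):
                raise ValueError("f_ij must be non-negative")

    @property
    def n(self) -> int:
        """Number of row entities |E|."""
        return len(self.rows)

    @property
    def m(self) -> int:
        """Number of column items |C|."""
        return len(self.columns)

    def column(self, j: int) -> List[float]:
        """All values f_ij for a fixed column c_j."""
        return [row[j] for row in self.values]
```

A dense layout is a reasonable default here because m is small (tens of values per row); very sparse data such as set memberships could instead be stored as per-row lists of column indices.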

In addition to the relationships with different classes, the entities E can also have a set of l numerical or categorical attributes {A_k}:

A_k: E → S_k, e_i ↦ A_k(e_i) = a_ik, with 1 ≤ k ≤ l
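Since each A_k is a function on E, one simple way to store the attributes is one value list per attribute, aligned with the row order. The sketch below assumes this layout; the helper names make_attributes and attribute_value are hypothetical and only illustrate the mapping e_i ↦ a_ik.

```python
from typing import Dict, List, Sequence, Union

# A numerical or categorical attribute value a_ik.
Value = Union[int, float, str]


def make_attributes(n: int, attrs: Dict[str, Sequence[Value]]) -> Dict[str, List[Value]]:
    """Store each attribute A_k as a list of n values a_ik aligned with the rows e_1..e_n."""
    result: Dict[str, List[Value]] = {}
    for name, values in attrs.items():
        # A_k is a function on E, so it must assign exactly one value to every entity.
        if len(values) != n:
            raise ValueError(f"attribute {name!r} has {len(values)} values, expected {n}")
        result[name] = list(values)
    return result


def attribute_value(attrs: Dict[str, List[Value]], name: str, i: int) -> Value:
    """Look up a_ik = A_k(e_i) for the attribute called `name` and row index i."""
    return attrs[name][i]
```

For example, make_attributes(3, {"year": [1999, 2004, 2010], "director": ["A", "B", "A"]}) mixes a numerical and a categorical attribute over the same three entities.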

Examples of real-world data that can be modeled as matrix data of this class are:

Item-Class Probabilities: Fuzzy classifiers compute the probability f_ij ∈ [0, 1] that an item e_i ∈ E belongs to class c_j ∈ C. The probabilities computed for the same item e_i over all classes in C sum to 1.
As an example, the items E can be a large set of sample images of handwritten digits, and the classes C represent the digits. The value f_ij is the probability, computed by the classifier, that image e_i represents the digit c_j. In addition, each image e_i has a set of attribute values {a_ik} that represent classification features extracted from the image.
Point-Set Memberships: Matrix data of this kind record the memberships of a large set of items E in a small number of overlapping (non-disjoint) subsets C. The binary value f_ij ∈ {0, 1} denotes whether e_i ∈ c_j holds.
As an example, the matrix data can record how a large number of movies E belong to a small number of genres C. A movie can belong to multiple genres and can have attributes such as release date or director.
Large Contingency Tables: A two-way contingency table records the frequency of observations f_ij ∈ ℕ for each combination of categories (e_i, c_j) ∈ E × C of two categorical variables. The frequencies typically represent a statistic of each entity in E computed for each column in C.
As an example, E can be a large set of books and C a set of countries, where f_ij is the number of purchases of book e_i ∈ E in country c_j ∈ C. In addition, the books can have a set of attributes {A_k} such as release date, author(s), and publisher(s).
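The three example classes differ mainly in the constraints they place on the values f_ij. The following sketch checks those constraints on a list-of-lists matrix; the function names and the tolerance parameter tol are illustrative assumptions, not part of the project's tools.

```python
import math
from typing import List

Matrix = List[List[float]]


def is_item_class_probabilities(f: Matrix, tol: float = 1e-9) -> bool:
    """Each f_ij lies in [0, 1] and the values of each row sum to 1 (up to tol)."""
    return all(
        all(0.0 <= v <= 1.0 for v in row) and math.isclose(sum(row), 1.0, abs_tol=tol)
        for row in f
    )


def is_set_membership(f: Matrix) -> bool:
    """Each f_ij is 0 or 1; a row may contain several 1s because the sets overlap."""
    return all(v in (0, 1) for row in f for v in row)


def is_contingency_table(f: Matrix) -> bool:
    """Each f_ij is a non-negative integer count."""
    return all(v >= 0 and float(v).is_integer() for row in f for v in row)
```

Note that a row such as [1, 1, 0] is valid membership data (an item in two sets) but not valid probability data, since its values sum to 2.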

Tasks

The tasks addressed in this project revolve around pattern discovery in large matrix data of the class described above:

T1: Analyze the relations r_ij between the row entities E and the columns C in light of the attribute values a_ik.
T2: Analyze the similarity rc_j1j2 between columns based on their relations with the row entities.

Users

Domain experts (from the same domain as the data and the tasks) with sufficient background in data analysis.

A multi-level overview+detail exploration environment provides access to the matrix data f_ij, the attribute values a_ik, and any raw data aggregated in the matrix.

Several selection mechanisms allow marking interesting parts of the data.
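As a hypothetical illustration of one such selection mechanism, the sketch below marks the rows whose value in a chosen column c_j reaches a threshold, optionally restricted by a predicate on one attribute A_k. The function select_rows and its parameters are assumptions made for this example, not the interface of an existing tool.

```python
from typing import Callable, List, Optional, Sequence


def select_rows(
    f: Sequence[Sequence[float]],
    j: int,
    threshold: float,
    attribute: Optional[Sequence[object]] = None,
    predicate: Optional[Callable[[object], bool]] = None,
) -> List[int]:
    """Return indices i of rows with f_ij >= threshold, optionally filtered by predicate(a_ik)."""
    selected: List[int] = []
    for i, row in enumerate(f):
        if row[j] < threshold:
            continue  # row is not strongly related to column c_j
        if attribute is not None and predicate is not None and not predicate(attribute[i]):
            continue  # row fails the attribute filter
        selected.append(i)
    return selected
```

For example, with f = [[0.9, 0.1], [0.3, 0.7], [0.8, 0.2]] and release years [1995, 2005, 2010], select_rows(f, 0, 0.75) marks rows 0 and 2, and adding the predicate lambda y: y > 2000 narrows the selection to row 2.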

Data Level

The data presented by the visualization methods are the homogeneous data f_ij of the class described above, with a focus on the associations r_ij between the row entities E and the columns C of the data table.

Task Level

When performing task T1, the visualization is augmented with one of the attributes A_k to analyze the row-column associations in light of its values a_ik.
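One simple way to relate the row-column associations to a categorical attribute is to average r_ij per column within each attribute category. The sketch assumes the associations r_ij are already available as a list of lists and that A_k is categorical (a numerical attribute would first need to be binned); the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def mean_association_by_category(
    r: Sequence[Sequence[float]],
    attribute: Sequence[Hashable],
) -> Dict[Hashable, List[float]]:
    """Average the associations r_ij per column c_j within each category of attribute A_k."""
    if len(r) != len(attribute):
        raise ValueError("need one attribute value a_ik per row e_i")
    sums: Dict[Hashable, List[float]] = {}
    counts: Dict[Hashable, int] = defaultdict(int)
    for row, a in zip(r, attribute):
        if a not in sums:
            sums[a] = [0.0] * len(row)
        for j, v in enumerate(row):
            sums[a][j] += v
        counts[a] += 1
    # Divide each per-column sum by the number of rows in that category.
    return {a: [s / counts[a] for s in col_sums] for a, col_sums in sums.items()}
```

A mean is only one possible summary; a per-category distribution (for example, a histogram of r_ij) would preserve more detail at the cost of a larger display.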

When performing task T2, the visualization is augmented with the column similarities rc_j1j2 to find out which columns exhibit similar associations with the rows.
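The text does not fix how rc_j1j2 is computed. As one possible choice, the sketch below uses the cosine similarity between the column vectors of r (it can equally be applied to f directly). On binary set-membership data, the cosine similarity of two columns equals the Ochiai coefficient |c_j1 ∩ c_j2| / sqrt(|c_j1| · |c_j2|). The function name is hypothetical.

```python
import math
from typing import List, Sequence


def column_similarity(r: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return the m x m matrix rc, where rc[j1][j2] is the cosine similarity of columns j1 and j2."""
    m = len(r[0])
    cols = [[row[j] for row in r] for j in range(m)]
    sq = [sum(v * v for v in c) for c in cols]  # squared norm of each column
    rc = [[0.0] * m for _ in range(m)]
    for j1 in range(m):
        for j2 in range(m):
            if sq[j1] == 0 or sq[j2] == 0:
                continue  # similarity is left at 0 for all-zero columns
            dot = sum(a * b for a, b in zip(cols[j1], cols[j2]))
            rc[j1][j2] = dot / math.sqrt(sq[j1] * sq[j2])
    return rc
```

The double loop costs O(m² · n) operations, which stays cheap because m is only in the tens.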

Presentation Level

The visualization methods combine familiar visual representations, such as ring charts, histograms, stacked bar charts, star graphs, and arcs, to provide insight into the data.
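Several of these representations, such as per-column histograms, rest on simple aggregations of the values in each column. The sketch below bins the values r_ij (or f_ij) of each column into equal-width bins; the binning scheme and the function name are assumptions for illustration only.

```python
from typing import List, Sequence


def column_histograms(
    r: Sequence[Sequence[float]], bins: int, lo: float, hi: float
) -> List[List[int]]:
    """For each column c_j, count the rows whose value falls into each of `bins` equal-width bins over [lo, hi]."""
    if bins <= 0 or hi <= lo:
        raise ValueError("need bins > 0 and hi > lo")
    width = (hi - lo) / bins
    m = len(r[0])
    hist = [[0] * bins for _ in range(m)]
    for row in r:
        for j, v in enumerate(row):
            if v < lo or v > hi:
                continue  # values outside [lo, hi] are not counted
            b = min(int((v - lo) / width), bins - 1)  # v == hi goes into the last bin
            hist[j][b] += 1
    return hist
```

For instance, column_histograms([[0.1, 0.9], [0.4, 0.6], [0.95, 0.05]], bins=2, lo=0.0, hi=1.0) yields one two-bin histogram per column, with the first column concentrated in the low bin and the second in the high bin.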

The row-column associations r_ij and the column similarities rc_j1j2 are computed using automated methods.

Depending on what the data represent, and on the tasks to be solved, these methods can employ different statistical or machine learning techniques.
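For contingency tables, one common statistical measure of row-column association is the Pearson residual r_ij = (f_ij - e_ij) / sqrt(e_ij), where e_ij = (row total of e_i) × (column total of c_j) / (grand total) is the count expected if rows and columns were independent. The sketch below computes it as an example of such an automated method; it is not necessarily the measure used in this project's tools, and the function name is hypothetical.

```python
import math
from typing import List, Sequence


def pearson_residuals(f: Sequence[Sequence[float]]) -> List[List[float]]:
    """r_ij = (f_ij - e_ij) / sqrt(e_ij), with e_ij the expected count under independence."""
    row_tot = [sum(row) for row in f]
    col_tot = [sum(col) for col in zip(*f)]
    total = sum(row_tot)
    if total == 0:
        raise ValueError("table has no observations")
    r: List[List[float]] = []
    for i, row in enumerate(f):
        out: List[float] = []
        for j, v in enumerate(row):
            e = row_tot[i] * col_tot[j] / total  # expected count e_ij
            # Rows or columns with zero total get residual 0 instead of dividing by zero.
            out.append((v - e) / math.sqrt(e) if e > 0 else 0.0)
        r.append(out)
    return r
```

Positive residuals mark row-column combinations observed more often than expected under independence, negative residuals less often; a table whose rows are proportional to each other yields residuals of 0 everywhere.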

Visual Analytics

	Hendrik Strobelt, Bilal Alsallakh, Joseph Botros, Brant Peterson, Mark Borowsky, Hanspeter Pfister, Alexander Lex, "Vials: Visualizing Alternative Splicing of Genes", IEEE Transactions on Visualization and Computer Graphics, vol. 22, pp. 399-408, 2016.
	Margit Pohl, Florian Scholz, Simone Kriglstein, Bilal Alsallakh, Silvia Miksch, "Evaluating the Dot-Based Contingency Wheel: Results from a Usability and Utility Study", HCI International, pp. 76-86, 2014.
	Bilal Alsallakh, Luana Micallef, Wolfgang Aigner, Helwig Hauser, Silvia Miksch, Peter Rodgers, "Visualizing Sets and Set-typed Data: State-of-the-Art and Future Challenges", Eurographics Conference on Visualization (EuroVis), State of the Art Reports, pp. 1-21, 2014.
	Bilal Alsallakh, Allan Hanbury, Helwig Hauser, Silvia Miksch, Andreas Rauber, "Visual Methods for Analyzing Probabilistic Classification Data", IEEE Transactions on Visualization and Computer Graphics, vol. 20, pp. 1703-1712, 2014.
	Bilal Alsallakh, Silvia Miksch, Andreas Rauber, "Towards a Visualization of Multi-faceted Search Results", Workshop on Knowledge Maps and Information Retrieval (KMIR), vol. 1311, pp. 4, 2014.
	Bilal Alsallakh, "Visual Analytics of Large Homogeneous Data - Categorical, Set-typed, and Classification Data", PhD thesis in Computer Science, Institute of Software Technology & Interactive Systems, pp. 165, 2014.
	Bilal Alsallakh, Wolfgang Aigner, Silvia Miksch, Helwig Hauser, "Radial Sets: Interactive Visual Analysis of Large Overlapping Sets", IEEE Transactions on Visualization and Computer Graphics (Proceedings of InfoVis), vol. 19, pp. 2496-2505, 2013.
	Bilal Alsallakh, Wolfgang Aigner, Silvia Miksch, Eduard Gröller, "Reinventing the Contingency Wheel: Scalable Visual Analytics of Large Categorical Data", IEEE Transactions on Visualization and Computer Graphics (Proceedings of IEEE VAST 2012), vol. 18, pp. 2849-2858, 2012.
	Bilal Alsallakh, "Visual Analytics of Large Multivariate Matrix Data", IEEE VisWeek Doctoral Colloquium (poster), 2012.
	Simone Kriglstein, Florian Scholz, Margit Pohl, Bilal Alsallakh, Silvia Miksch, "Evaluating the Dot-Based Contingency Wheel: Results from an Interview Study", Technical Report, pp. 10, 2012.
	Bilal Alsallakh, Eduard Gröller, Silvia Miksch, Martin Suntinger, "Contingency Wheel: Visual Analysis of Large Contingency Tables", Proceedings of the International Workshop on Visual Analytics (EuroVA), pp. 53-56, 2011.

Publications