Visual Summary of a Data Set

Problem: 

Suitable data formats and high data quality are essential prerequisites for almost any form of data analysis and exploration. In practice, however, hardly any real-world data set is free of erroneous or missing values.

As a first step towards good data quality, the data is checked for various quality problems. Any problems found then need to be communicated to the user.

Aim: 

The candidate will implement a tool that analyzes a data set (data table) and produces a short interactive visual summary: number of rows and columns, format of each column (string, double, etc.), deviation of values in numerical columns, length of strings, etc. The summary should also communicate the number and kind of previously identified quality problems in the data. It should combine text and charts that are interactively linked to the original data table.
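
To make the profiling part of the task more concrete, the following is a minimal sketch of how the per-column figures could be computed in plain Java. It is not a prescribed design: the class ColumnProfiler, the ColumnSummary fields, and the two-way string/double type inference are assumptions made for illustration, "deviation" is interpreted here as the population standard deviation, and only missing cells are counted as a quality problem. Charts, interaction, and any use of prefuse are left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: profiles a table given as rows of raw string cells. */
final class ColumnProfiler {

    /** Summary of one column; all field names are illustrative. */
    static final class ColumnSummary {
        final String name;
        final String type;   // "double" if every non-missing cell parses as a number, else "string"
        final int missing;   // number of null or blank cells
        final double mean;   // NaN for string columns
        final double stdDev; // population standard deviation; NaN for string columns
        final int minLength; // shortest non-missing cell in characters (0 if all cells are missing)
        final int maxLength; // longest non-missing cell in characters

        ColumnSummary(String name, String type, int missing, double mean,
                      double stdDev, int minLength, int maxLength) {
            this.name = name;
            this.type = type;
            this.missing = missing;
            this.mean = mean;
            this.stdDev = stdDev;
            this.minLength = minLength;
            this.maxLength = maxLength;
        }
    }

    /** Assumption: a cell counts as missing if it is null or contains only whitespace. */
    static boolean isMissing(String cell) {
        return cell == null || cell.trim().isEmpty();
    }

    static ColumnSummary profileColumn(String name, List<String> cells) {
        int missing = 0;
        int minLen = Integer.MAX_VALUE;
        int maxLen = 0;
        boolean numeric = true;
        List<Double> numbers = new ArrayList<>();

        for (String cell : cells) {
            if (isMissing(cell)) {
                missing++;
                continue;
            }
            String value = cell.trim();
            minLen = Math.min(minLen, value.length());
            maxLen = Math.max(maxLen, value.length());
            if (numeric) {
                try {
                    // Note: parseDouble also accepts inputs such as "NaN" or "1e3".
                    numbers.add(Double.parseDouble(value));
                } catch (NumberFormatException e) {
                    numeric = false; // one non-numeric cell makes the column a string column
                }
            }
        }
        if (minLen == Integer.MAX_VALUE) {
            minLen = 0; // column had no non-missing cells
        }

        boolean isDouble = numeric && !numbers.isEmpty();
        double mean = Double.NaN;
        double stdDev = Double.NaN;
        if (isDouble) {
            double sum = 0;
            for (double d : numbers) sum += d;
            mean = sum / numbers.size();
            double squared = 0;
            for (double d : numbers) squared += (d - mean) * (d - mean);
            stdDev = Math.sqrt(squared / numbers.size());
        }
        return new ColumnSummary(name, isDouble ? "double" : "string",
                missing, mean, stdDev, minLen, maxLen);
    }

    /** Profiles every column; rows shorter than the header are treated as having missing cells. */
    static List<ColumnSummary> profile(String[] header, List<String[]> rows) {
        List<ColumnSummary> summaries = new ArrayList<>();
        for (int c = 0; c < header.length; c++) {
            List<String> cells = new ArrayList<>();
            for (String[] row : rows) {
                cells.add(c < row.length ? row[c] : null);
            }
            summaries.add(profileColumn(header[c], cells));
        }
        return summaries;
    }
}
```

The number of columns is the size of the list returned by profile, and the number of rows is the size of the input list. For a column holding the cells "20", "" and "40", this sketch reports type double, one missing value, a mean of 30 and a standard deviation of 10. A real tool would likely distinguish more types (integer, date, boolean) and feed these summaries into the charts that are linked to the data table.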

Topics: 
Visual Data Analysis, Data Quality, Data Profiling
Previous knowledge: 
Java, optional: prefuse
Scope: 
BA, PR, MA
Assigned as: 
Bachelor thesis
Contact: 
Theresia Gschwandtner, by appointment, gschwandtner [at] ifs.tuwien.ac.at
Area: 
Visual Analytics (VA)
Status: 
in progress