Visual Summary of a Data Set

Submitted by resi on Tue, 03/18/2014 - 11:53
Problem

Suitable data formats and high data quality are essential prerequisites for almost any form of data analysis and exploration. In practice, however, hardly any real-world data set is free of erroneous or missing data.

As a first step towards good data quality, the data is checked for various quality problems. Any quality problems found then need to be communicated to the user.
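A minimal sketch of such a check in plain Java, under several assumptions the topic does not specify: the table is held in memory as rows of strings, empty cells and placeholders such as "NA" count as missing, and a column is treated as numeric when most of its non-missing cells parse as doubles, so that the remaining non-numeric cells are flagged. The class, record, and method names are hypothetical; a real tool would likely check for many more kinds of problems.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical quality checker: reports missing and type-mismatched cells per column. */
public class QualityCheck {

    // Assumption: these tokens (after trimming) are treated as missing values.
    static final Set<String> MISSING = Set.of("", "NA", "N/A", "null");

    /** One detected problem: its location in the table and its kind. */
    record Problem(int row, int column, String kind) {}

    static boolean isMissing(String cell) {
        return cell == null || MISSING.contains(cell.trim());
    }

    static boolean isNumeric(String cell) {
        try {
            Double.parseDouble(cell.trim());
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    /** Scans every column and returns the problems found, column by column. */
    static List<Problem> check(List<String[]> rows, int columns) {
        List<Problem> problems = new ArrayList<>();
        for (int c = 0; c < columns; c++) {
            // Majority vote: the column is numeric if more than half of its present cells parse.
            int numeric = 0, present = 0;
            for (String[] row : rows) {
                String cell = c < row.length ? row[c] : null;
                if (!isMissing(cell)) {
                    present++;
                    if (isNumeric(cell)) numeric++;
                }
            }
            boolean numericColumn = present > 0 && numeric * 2 > present;
            for (int r = 0; r < rows.size(); r++) {
                String[] row = rows.get(r);
                String cell = c < row.length ? row[c] : null;
                if (isMissing(cell)) {
                    problems.add(new Problem(r, c, "missing"));
                } else if (numericColumn && !isNumeric(cell)) {
                    problems.add(new Problem(r, c, "not a number"));
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
                new String[] {"Vienna", "1.9"},
                new String[] {"", "2.1"},
                new String[] {"Graz", "abc"},
                new String[] {"Linz", "NA"});
        for (Problem p : check(rows, 2)) {
            System.out.println("row " + p.row() + ", column " + p.column() + ": " + p.kind());
        }
    }
}
```

Returning a list of located problems, rather than printing them directly, keeps the results available for a later summary that counts problems by kind and links each one back to its cell in the table.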

Aim

The candidate will implement a tool that analyzes a data set (data table) and produces a short interactive visual summary: the number of rows and columns, the format of each column (string, double, etc.), the deviation of values in numerical columns, the length of strings, etc. Moreover, the summary should communicate the amount and kind of previously identified data quality problems. The visual summary should be a mixture of text and charts, interactively linked to the original data table.
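The textual part of such a summary can be computed in one pass over each column. The sketch below is one possible approach, not a prescribed design: it assumes the table is a header plus rows of strings, infers the format "double" only when every non-empty cell parses as a number (otherwise "string"), and reads "deviation" as the population standard deviation. The class and record names are hypothetical, prefuse is not used, and the charts and interactive linking are left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical profiler producing a plain-text summary of a table of strings. */
public class TableProfile {

    /** Summary of one column; mean and stdDev are NaN for non-numeric columns. */
    record ColumnProfile(String name, String format, int missing,
                         double mean, double stdDev, int minLength, int maxLength) {}

    static ColumnProfile profile(String name, List<String> cells) {
        int missing = 0, present = 0, numericCount = 0;
        double mean = 0, m2 = 0; // Welford's running mean and sum of squared deviations
        int minLen = Integer.MAX_VALUE, maxLen = 0;
        for (String cell : cells) {
            if (cell == null || cell.isBlank()) {
                missing++;
                continue;
            }
            present++;
            minLen = Math.min(minLen, cell.length());
            maxLen = Math.max(maxLen, cell.length());
            try {
                double v = Double.parseDouble(cell.trim());
                numericCount++;
                double delta = v - mean;
                mean += delta / numericCount;
                m2 += delta * (v - mean);
            } catch (NumberFormatException e) {
                // Non-numeric cell: only its length is recorded.
            }
        }
        // Assumption: a column is "double" only if every non-missing cell parses.
        boolean numeric = present > 0 && numericCount == present;
        double std = numeric ? Math.sqrt(m2 / numericCount) : Double.NaN;
        return new ColumnProfile(name, numeric ? "double" : "string", missing,
                numeric ? mean : Double.NaN, std,
                present > 0 ? minLen : 0, maxLen);
    }

    /** Profiles every column of a table given as a header plus rows. */
    static List<ColumnProfile> profileTable(String[] header, List<String[]> rows) {
        List<ColumnProfile> result = new ArrayList<>();
        for (int c = 0; c < header.length; c++) {
            List<String> cells = new ArrayList<>();
            for (String[] row : rows) {
                cells.add(c < row.length ? row[c] : null); // short rows count as missing
            }
            result.add(profile(header[c], cells));
        }
        return result;
    }

    public static void main(String[] args) {
        String[] header = {"city", "value"};
        List<String[]> rows = List.of(
                new String[] {"Vienna", "2"},
                new String[] {"Graz", "4"},
                new String[] {"", "6"});
        System.out.println(rows.size() + " rows, " + header.length + " columns");
        for (ColumnProfile p : profileTable(header, rows)) {
            System.out.println(p);
        }
    }
}
```

Welford's online update is used instead of accumulating a sum of squares, which avoids cancellation errors in the variance for large values. Reporting the population rather than the sample standard deviation is a choice made here for simplicity.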

Topics
Visual Data Analysis, Data Quality, Data Profiling
Previous knowledge
Java, optional: prefuse
Scope
BA
PR
MA
Assigned as
Bachelor thesis/Bakkalaureatsarbeit
Contact
Theresia Gschwandtner, by appointment, gschwandtner [at] ifs.tuwien.ac.at
Area
Visual Analytics (VA)
Status
in progress