Data Quality

Poor data quality leads to unreliable results in any kind of data processing and has a profound economic impact. Although tools exist to help users cleanse data, their support for the specifics of time-oriented data is rather poor. Yet the time dimension has very specific characteristics that introduce quality problems different from those found in other kinds of data (for example, missing or duplicate timestamps and inconsistent date formats). We therefore tackle this important topic with Visual Analytics methods.

Data quality control can be divided into three tasks:

  1. Data Profiling: identifying and communicating quality problems (e.g., w.r.t. specific Data Quality Metrics)
  2. Data Wrangling: transforming table formats or merging different sources
  3. Data Cleansing: correcting the found quality problems
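As a rough illustration of profiling time-oriented data, the Python sketch below counts a few time-specific quality problems in a univariate series: missing values, duplicate and out-of-order timestamps, and gaps larger than an expected sampling step, plus a simple completeness metric. The record layout, the function name `profile_time_series`, and the fixed-step assumption are hypothetical choices for this example, not part of the approach described on this page.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical record layout: (timestamp, value); None marks a missing value.
Record = tuple[datetime, Optional[float]]


def profile_time_series(records: list[Record], expected_step: timedelta) -> dict:
    """Count a few time-specific quality problems in a univariate time series.

    Assumes a regular sampling step; the checks are illustrative, not a
    complete set of data quality metrics.
    """
    timestamps = [t for t, _ in records]
    missing_values = sum(1 for _, v in records if v is None)
    # Every repeated timestamp beyond its first occurrence counts as a duplicate.
    duplicate_timestamps = len(timestamps) - len(set(timestamps))
    # A timestamp earlier than its predecessor breaks the chronological order.
    out_of_order = sum(1 for a, b in zip(timestamps, timestamps[1:]) if b < a)
    # Gaps: neighbouring distinct timestamps further apart than one step.
    ordered = sorted(set(timestamps))
    gaps = sum(1 for a, b in zip(ordered, ordered[1:]) if b - a > expected_step)
    # Share of non-missing values; an empty series is treated as 0 % complete.
    completeness = 1 - missing_values / len(records) if records else 0.0
    return {
        "missing_values": missing_values,
        "duplicate_timestamps": duplicate_timestamps,
        "out_of_order": out_of_order,
        "gaps": gaps,
        "completeness": completeness,
    }


# Usage: five hourly readings containing several typical problems.
hourly = [
    (datetime(2024, 1, 1, 0), 1.0),
    (datetime(2024, 1, 1, 1), None),  # missing value
    (datetime(2024, 1, 1, 1), 2.0),   # duplicate timestamp
    (datetime(2024, 1, 1, 4), 3.0),   # nothing recorded at 02:00 (gap)
    (datetime(2024, 1, 1, 3), 2.5),   # earlier than its predecessor
]
print(profile_time_series(hourly, timedelta(hours=1)))
```

For this input, the function reports one missing value, one duplicate timestamp, one out-of-order entry, and one gap, with a completeness of 0.8.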

Problem Description: Data - Tasks - Users
Data: 

Since data quality is a problem in any domain, we consider multidimensional, time-oriented data in general.

Tasks: 

Data Profiling: identifying and communicating quality problems within the data

Data Wrangling: transforming data into another format that is suitable for further processing; this may include merging and splitting data entries, changing the formatting of data entries, merging two or more data tables, or augmenting the data with information from other sources

Data Cleansing: handling and correcting the identified quality problems
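The wrangling and cleansing tasks can be sketched in a few lines of Python. The example assumes two hypothetical sources that encode timestamps in different formats (listed in `FORMATS`). Wrangling parses and merges them; cleansing then removes duplicate timestamps, sorts the records, and fills missing values by linear interpolation in time. The function names and the chosen correction strategy are illustrative assumptions; which correction is appropriate depends on the data at hand.

```python
from datetime import datetime
from typing import Optional

# Hypothetical timestamp formats used by two different sources.
FORMATS = ("%Y-%m-%d %H:%M", "%d.%m.%Y %H:%M")


def parse_timestamp(text: str) -> datetime:
    """Wrangling: convert a timestamp string by trying each known format."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized timestamp: {text!r}")


def merge_sources(*sources):
    """Wrangling: merge several (timestamp string, value) sources into one list."""
    return [(parse_timestamp(ts), value) for source in sources for ts, value in source]


def cleanse(records):
    """Cleansing: deduplicate, sort, and interpolate missing values.

    For duplicate timestamps the first non-missing value is kept. A missing
    value is filled by linear interpolation in time when known values exist
    on both sides; otherwise it stays None. These are illustrative choices.
    """
    by_time: dict[datetime, Optional[float]] = {}
    for t, v in records:
        if by_time.get(t) is None:  # unseen timestamp, or only None so far
            by_time[t] = v
    times, values = zip(*sorted(by_time.items())) if by_time else ((), ())
    values = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for i, v in enumerate(values):
        if v is not None:
            continue
        before = [k for k in known if k < i]
        after = [k for k in known if k > i]
        if before and after:
            a, b = before[-1], after[0]
            # Position of the missing timestamp between its known neighbours.
            frac = (times[i] - times[a]) / (times[b] - times[a])
            values[i] = values[a] + frac * (values[b] - values[a])
    return list(zip(times, values))


# Usage: two sources with different timestamp formats and overlapping entries.
source_a = [("2024-01-01 00:00", 1.0), ("2024-01-01 01:00", None)]
source_b = [("01.01.2024 02:00", 3.0), ("01.01.2024 00:00", 1.0)]
clean = cleanse(merge_sources(source_a, source_b))
for t, v in clean:
    print(t.isoformat(), v)
```

Here the duplicated midnight reading is collapsed, the records are put in chronological order, and the missing 01:00 value is interpolated to 2.0.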

Users: 

Data analysts from any domain

Modelling Time
Scale: ordinal, discrete, continuous
Scope: point-based, interval-based
Arrangement: linear, cyclic
Granularity & Calendars: single, multiple
Time Primitives: instant, interval, span
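The time primitives listed above can be modelled as small data types. The sketch below, with hypothetical class and function names, represents an instant as an anchored point, an interval as a pair of instants, and a span as an unanchored duration. It adds two example checks that only apply to interval-based data with a given granularity: an end before the start, and instants that do not align with the granularity. Measuring alignment from midnight is an assumption made for simplicity.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Instant:
    """A single point in time (time primitive 'instant')."""
    at: datetime


@dataclass(frozen=True)
class Interval:
    """A portion of time anchored by a start and an end (time primitive 'interval')."""
    start: Instant
    end: Instant

    @property
    def span(self) -> "Span":
        # The duration of the interval, detached from its position in time.
        return Span(self.end.at - self.start.at)


@dataclass(frozen=True)
class Span:
    """An unanchored duration (time primitive 'span')."""
    length: timedelta


def interval_problems(interval: Interval, granularity: timedelta) -> list[str]:
    """Illustrative quality checks for interval-based data."""
    problems = []
    if interval.end.at < interval.start.at:
        problems.append("end before start")
    for name, instant in (("start", interval.start), ("end", interval.end)):
        # Hypothetical rule: instants should lie on the granularity grid
        # (e.g., whole hours), measured from midnight of the same day.
        midnight = instant.at.replace(hour=0, minute=0, second=0, microsecond=0)
        if (instant.at - midnight) % granularity:
            problems.append(f"{name} not aligned to granularity")
    return problems


# Usage: a reversed interval whose end also falls between two full hours.
backwards = Interval(Instant(datetime(2024, 1, 1, 10, 0)), Instant(datetime(2024, 1, 1, 9, 30)))
print(interval_problems(backwards, timedelta(hours=1)))
ok = Interval(Instant(datetime(2024, 1, 1, 9)), Instant(datetime(2024, 1, 1, 11)))
print(ok.span)
```

Keeping spans separate from intervals mirrors the distinction in the list above: a span such as "two hours" has a length but no position on the time axis, so a check like "end before start" applies only to intervals.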