
Data Quality Management in the context of Big Data


Nowadays, rapidly growing volumes of structured and unstructured data (i.e., Big Data) make data quality a highly relevant topic. For instance, large volumes of unstructured data from different, distributed sources in diverse formats are collected and analyzed (often in real time) to derive business-relevant insights and to support business decisions. Well-founded and reliable results require high quality of the underlying data, as current surveys and studies explicitly highlight (Economist Intelligence Unit 2011, IBM Institute for Business Value 2012, SAS Institute Inc. 2013).

Even though ensuring the quality of data is extremely important in today’s information age, according to a recent survey (Packowski and Gall 2013), 60 % of the respondents see an enormous backlog in the assessment, management and improvement of data quality in their companies. Recent surveys (e.g., IBM Institute for Business Value 2012) emphasize that data quality is of the utmost priority, especially for unstructured and distributed data. The reason is that insufficient data quality results in incorrect findings that lead to wrong decisions, doing more harm than good. Thus, the well-founded and adequate assessment, management and improvement of data quality generates a substantial benefit and, as a result, a strong competitive advantage for companies.

More precisely, in our research, we develop and evaluate quantitative methods and models to assess, manage and improve data quality. Our research objectives can be classified as follows:

  1. Development of approaches for assessing the quality of structured and unstructured data: We develop efficient quantitative approaches for assessing the quality of data values with respect to data quality dimensions such as correctness, consistency, currency, completeness, and unambiguity. Due to the large data volumes companies face, these approaches need to be applicable in an automated way. Moreover, they need to be suitable for different data formats (e.g., structured and unstructured data) as well as for distributed data (e.g., internal and external data).
     
  2. Development of data and text mining methods considering data quality: To analyze structured and unstructured data, we develop and enhance data and text mining methods so that they directly take into account the assessed data quality level (cf. 1.). Taking data quality into account can considerably change the derived results (e.g., document classification or assignment to clusters), which shows that ignoring data quality can lead to wrong decisions. Beyond that, the quality of the results is derived and reported as a function of the quality of the input data in order to enable better decision support.
     
  3. Economic management and evaluation of data quality measures: We develop decision support approaches to enable a cost-benefit evaluation of data quality improvement measures in the context of Big Data. Due to the characteristics of Big Data, both the benefits and the costs have to be determined in an automated way. The main benefit of applying data quality improvement measures is better decision quality due to higher data quality. However, how large this benefit is depends on both the particular application context and the applied business intelligence (BI) method.
     
  4. Ensuring data quality in the context of Big Data considering IT security issues: In the context of Big Data, data from a plethora of application fields and data sources (e.g., internet platforms, social media as well as mobile devices and services) are considered, integrated and analyzed. Thus, in addition to traditional data quality dimensions (e.g., completeness, correctness, consistency and currency), special emphasis is placed on analyzing how the availability, reputation, and integrity of data can be ensured in the context of Big Data and how IT security and data privacy issues can be adequately considered.
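
To make the idea of automated, quantitative assessment (cf. 1.) more concrete, the following minimal Python sketch scores a small data set on two of the dimensions named above, completeness and consistency. It is an illustration under simple assumptions, not one of the methods developed in this research: the ratio-based definitions, the function names `completeness` and `consistency`, the rule `zip_has_five_digits`, and the sample records are all hypothetical.

```python
from typing import Callable, Mapping, Optional, Sequence

# A record maps attribute names to string values; None marks a missing value.
Record = Mapping[str, Optional[str]]


def completeness(records: Sequence[Record], attributes: Sequence[str]) -> float:
    """Share of attribute values that are present (neither None nor blank).

    Assumption: completeness is measured as a simple ratio in [0, 1].
    """
    total = len(records) * len(attributes)
    if total == 0:
        return 1.0  # nothing to assess; treated as complete by convention here
    filled = sum(
        1
        for record in records
        for attribute in attributes
        if (record.get(attribute) or "").strip()
    )
    return filled / total


def consistency(
    records: Sequence[Record], rules: Sequence[Callable[[Record], bool]]
) -> float:
    """Share of (record, rule) checks that pass.

    Assumption: consistency is measured against user-supplied boolean rules.
    """
    total = len(records) * len(rules)
    if total == 0:
        return 1.0
    passed = sum(1 for record in records for rule in rules if rule(record))
    return passed / total


def zip_has_five_digits(record: Record) -> bool:
    """Hypothetical rule: a ZIP code, if present, consists of exactly five digits."""
    zip_code = record.get("zip")
    return zip_code is None or (len(zip_code) == 5 and zip_code.isdigit())


# Hypothetical sample data, invented for illustration only.
customers = [
    {"name": "Ada", "zip": "93053", "city": "Regensburg"},
    {"name": "Bob", "zip": None, "city": "Munich"},
    {"name": "", "zip": "80331", "city": "Munich"},
    {"name": "Eve", "zip": "9305", "city": "Regensburg"},
]

if __name__ == "__main__":
    print(f"completeness: {completeness(customers, ['name', 'zip', 'city']):.2f}")
    print(f"consistency:  {consistency(customers, [zip_has_five_digits]):.2f}")
```

Because both scores are plain ratios computed without manual inspection, they can be recomputed automatically as data volumes grow, and they could in principle be passed on to a downstream analysis as a quality level of the input data (cf. 2.).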

 




Chair for Information Systems II

Prof. Dr. Bernd Heinrich

 

 

Secretary

Tel.: +49 (0)941 943-6101
Fax:  +49 (0)941 943-6120
Universitätsstraße 31
93053 Regensburg

Room: Vielberth Building 3.36
The secretary's office is open in the afternoons.