dataMaid: Your Assistant for Documenting Supervised Data Quality Screening in R

View Researcher's Other Codes

Disclaimer: The provided code links for this paper are external links. Science Nest has no responsibility for the accuracy, legality or content of these links. Also, by downloading this code(s), you agree to comply with the terms of use as set out by the author(s) of the code(s).

Authors Anne Helby Petersen, Claus Thorn Ekstrøm
Journal/Conference Name Journal of Statistical Software
Paper Category
Paper Abstract Data cleaning and validation are important steps in any data analysis, as the validity of the conclusions from the analysis hinges on the quality of the input data. Mistakes in the data can arise for any number of reasons, including erroneous codings, malfunctioning measurement equipment, and inconsistent data generation manuals. Ideally, a human investigator should go through each variable in the dataset and look for potential errors - both in input values and codings - but that process can be very time-consuming, expensive and error-prone in itself. We describe an R package, dataMaid, which implements an extensive and customizable suite of quality assessment aids that can be applied to a dataset in order to identify potential problems in its variables. The results are presented in an auto-generated, nontechnical, stand-alone overview document intended to be perused by an investigator with an understanding of the variables in the data, but not necessarily knowledge of R. Thereby, dataMaid aids the dialogue between data analysts and field experts, while also providing easy documentation of reproducible data quality screening. Moreover, the dataMaid solution changes the data screening process from the usual ad hoc approach to a systematic, well-documented endeavor. dataMaid also provides a suite of more typical R tools for interactive data quality assessment and screening, where the data inspections are executed directly in the R console.
Date of publication 2019
Code Programming Language R
Comment

Copyright Researcher 2021