This paper describes the methodology and current output of the data validation process led by Datlab. The purpose of the process has been to check various properties of DIGIWHIST data on public procurement and provide continuous feedback to project partners. Thus the paper will:
- Describe the validation methodology
- Provide an overview of the work done
- Summarize the current status of data quality
Datlab has drawn on its know-how in software development and public procurement data analytics to provide timely feedback to project partners. This led not only to a shift in data extraction strategy during the project but, more importantly, to the transfer of a large portion of procurement expertise to technical staff. These steps have contributed substantially to the good quality of the procurement data, which is paramount to the
However, judging from the experience with Czech and Slovak data, it takes years of work to fine-tune data extraction even from a single procurement source. In fact, such a job is never finished, because amendments to the legal framework lead to changes in data structure and terminology. In light of this experience, DIGIWHIST’s goals are ambitious, even if we aim only for moderate-quality data within the project. Since the DIGIWHIST team aims to further improve the data during the sustainability period, the data quality results should be read as a snapshot of the current state rather than a final assessment. The methodology has been designed so that validation is possible on an ongoing basis, which means that updated versions of source-specific validation reports can be published upon major data releases.