D2.8 Methods Paper

Jan Hrubý, Tomáš Pošepný, Jakub Krafka, Bence Toth, Jiří Skuhrovec

This document is the final methodological paper of WP2 of the DIGIWHIST project. It describes how the final database (DB henceforth) was developed. It starts with a high-level description of each public procurement source that was processed, continues with the processes that produced a structured database, and then covers the processes involved in linking related data and creating the final database from the linked data. The last chapter describes the performance indicators (transparency, corruption risks and administrative capacity) and the conversion of the DIGIWHIST data template to the Open Contracting Data Standard (OCDS henceforth).

This methodology report describes the following steps in data processing:

● Data download – collection of HTML, XML, CSV and other content from government
● Structuring data – conversion of each publication from its original format to a uniform structured data template
● Formatting data – conversion of structured text to standard data types (numbers, dates, enumeration values), including cleaning of nonsensical values or ballast
● Linking related information – grouping together the records that describe one real-world tender
● Data merging – combining information from all linked data records to create one final image of a public tender covering its whole tendering cycle
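The formatting, linking and merging steps above can be illustrated with a minimal sketch. It is not the DIGIWHIST implementation: the field names (`source_id`, `tender_ref`, `published`, `price`, `procedure`), the enumeration values and the rule "later publications overwrite earlier ones" are all assumptions made for illustration. Linking by a shared reference in particular is a simplification of what real-world matching would require.

```python
from collections import defaultdict
from datetime import date, datetime
from decimal import Decimal, InvalidOperation

# Hypothetical enumeration mapping; the real template's values are not shown here.
PROCEDURE_TYPES = {"open": "OPEN", "open procedure": "OPEN", "restricted": "RESTRICTED"}


def clean_number(text):
    """Parse a price-like string such as '1 250 000,50'; return None for ballast."""
    if text is None:
        return None
    compact = text.replace("\u00a0", "").replace(" ", "").replace(",", ".")
    try:
        return Decimal(compact)
    except InvalidOperation:
        return None


def clean_date(text, formats=("%Y-%m-%d", "%d.%m.%Y", "%d/%m/%Y")):
    """Try a few assumed date formats; return None if none of them matches."""
    if text is None:
        return None
    for fmt in formats:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def format_record(raw):
    """Formatting step: structured text -> standard data types."""
    procedure_key = (raw.get("procedure") or "").strip().lower()
    return {
        "source_id": raw["source_id"],
        "tender_ref": raw.get("tender_ref"),
        "published": clean_date(raw.get("published")),
        "price": clean_number(raw.get("price")),
        "procedure": PROCEDURE_TYPES.get(procedure_key),
    }


def link_records(records):
    """Linking step (simplified): group records that share a tender reference."""
    groups = defaultdict(list)
    for record in records:
        groups[record["tender_ref"]].append(record)
    return dict(groups)


def merge_group(group):
    """Merging step (assumed rule): later non-empty values overwrite earlier ones."""
    ordered = sorted(group, key=lambda r: r["published"] or date.min)
    merged = {"source_ids": [r["source_id"] for r in ordered]}
    for record in ordered:
        for key, value in record.items():
            if key != "source_id" and value is not None:
                merged[key] = value
    return merged


raw_records = [
    {"source_id": "n1", "tender_ref": "T-1", "published": "01.02.2016",
     "price": "n/a", "procedure": "Open procedure"},
    {"source_id": "a1", "tender_ref": "T-1", "published": "2016-05-10",
     "price": "1 250 000,50", "procedure": None},
]
formatted = [format_record(r) for r in raw_records]
final_tender = merge_group(link_records(formatted)["T-1"])
```

In this sketch the unparseable price `"n/a"` is dropped during formatting, so the merged tender takes its price from the later record while keeping the procedure type from the earlier one.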

Within DIGIWHIST, 25 public procurement data sources were processed, covering all 34 jurisdictions listed in the Grant Agreement. This total consists of:

● 21 national web portals or open data sources
● Archives for the UK and CZ
● Project partner’s DB of older Hungarian tenders
