This DIGIWHIST project deliverable aims to obtain unstructured and semi-structured data from a pool of more than 100 potentially relevant datasources for which further data processing is judged to be feasible. Choosing the right data-scraping strategy also requires analysing the architecture and structure of the online portals that report data on public procurement tenders, on private and public entities such as companies, on public sector organisations’ budgets, on asset declarations, and on political office holders.
The main outputs of this document are therefore a technical evaluation of the quality of individual datasources and the collection of raw data from the sources chosen for further processing. This task required prioritising the better-structured and more policy-relevant sources. The selected sources, and the raw data obtained from them, provide a solid basis for achieving the DIGIWHIST project’s research goals.