D2.6 Final Linked Database and Related Algorithms

Jan Hrubý, Tomáš Pošepný, Jakub Krafka, Tomáš Mrázek, Marek Mikeš, Michal Říha, Jiří Skuhrovec

The purpose of deliverable D2.6 is to publish the source code of the whole DIGIWHIST data processing system and the final DIGIWHIST database, which is the result of processing:

● 25 public procurement data sources
○ TED + TED archive
○ Current procurement portal + archive for CZ, UK, HU
○ One source for SK, PL, ES, NL, FR, LV, PT, EE, GE, SI, IE, NO, CH, LT, HR, BG, RO

● 4 public officials data sources
○ http://everypolitician.org/
○ http://www.politicaldatayearbook.com/
○ http://rulers.org/
○ https://www.cia.gov/library/publications/world-leaders-1/index.html

● company database

● 3 budget data sources
○ UK, ES, CZ

The key component of the whole process is the crawling, structuring, formatting and linking of public procurement data, followed by the merging of linked records, covering 35 jurisdictions. The process also integrates the procurement data with the databases listed above: the company database, the public officials database and the budget database. In the final database, this integration is represented by several tender-related indicators, such as the Tax haven indicator, the Political connections indicator and the Publication rate indicator.
The methodology of the process is described in other deliverables of WP2 of the DIGIWHIST project.
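
As a rough illustration of the linking, merging and indicator steps mentioned above, the following minimal Python sketch groups records that appear to describe the same tender, merges each group into one record, and attaches a boolean Tax haven indicator from a company lookup. It is not the DIGIWHIST implementation: the field names (source, title, published, supplier_id), the matching rule, the TAX_HAVENS set and all function names are assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical set of country codes, for illustration only.
TAX_HAVENS = {"PA", "VG"}


def link_key(record):
    """Assumed matching rule: same source and same whitespace/case-normalized title."""
    return (record["source"], " ".join(record["title"].lower().split()))


def merge_linked(records):
    """Group records by link key, then merge each group into one tender.

    Within a group, records are applied in publication order, so later
    non-empty values override earlier ones.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[link_key(rec)].append(rec)

    merged = []
    for group in groups.values():
        tender = {}
        for rec in sorted(group, key=lambda r: r["published"]):
            tender.update({k: v for k, v in rec.items() if v not in (None, "")})
        merged.append(tender)
    return merged


def add_tax_haven_indicator(tender, company_country):
    """Flag the tender if its supplier's country (from the lookup) is in TAX_HAVENS."""
    country = company_country.get(tender.get("supplier_id"))
    tender["tax_haven_indicator"] = country in TAX_HAVENS
    return tender


# Made-up sample data: two notices for one CZ tender, one notice for an SK tender.
records = [
    {"source": "CZ", "title": "Road  repair", "published": "2017-01-10", "supplier_id": None},
    {"source": "CZ", "title": "road repair", "published": "2017-03-02", "supplier_id": "S1"},
    {"source": "SK", "title": "Road repair", "published": "2017-02-01", "supplier_id": "S2"},
]
companies = {"S1": "PA", "S2": "SK"}  # hypothetical supplier id -> country code

tenders = [add_tax_haven_indicator(t, companies) for t in merge_linked(records)]
```

With this sample input, the two CZ notices collapse into a single tender carrying the supplier from the later notice, while the SK notice stays a separate tender. A real pipeline would need a far more robust matching rule than a normalized title.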

Attached is the revised version dated 26.02.2018.

