Data Integration in the Timber Industry
The BFH Linked Data group was commissioned by the Wood and Forestry Division of the Federal Office for the Environment (FOEN) to publish timber industry data as Linked Data.
Factsheet
- Schools involved: Business School
- Institute(s): Institute for Public Sector Transformation
- Funding organisation: Others
- Duration (planned): 01.05.2025 - 31.01.2026
- Head of project: Prof. Dr. Thomas Gees
- Project staff: Prof. Dr. Thomas Gees, Dr. Benedikt Simon Hitz, Melanie Senn, Prof. Dr. Jan Thomas Frecè
- Partner: Bundesamt für Umwelt BAFU, Sektion Holzwirtschaft und Waldwirtschaft
- Keywords: Linked Open Data, Data Integration, Lindas, Visualize, Data Management, Data Publication, Open Government Data
Situation
The Wood and Forestry Division commissioned the BFH project team to integrate twelve division-specific surveys into LINDAS. The objective was twofold: to prepare for the future digital publication of the Forestry and Timber Yearbook with interactive visualizations, and to establish the data in the long term as a reusable, interoperable resource within and beyond the federal administration. Because of limited resources, the lack of stable source formats, and existing dependencies on other administrative units, the surveys were prioritized during the course of the project.
Course of action
In coordination with the client, the project team chose a methodology built largely on automated data pipelines, because data integration was conceived as a recurring process rather than a one-time task. Automated pipelines allow an iterative approach in which adjustments (e.g. translations or migration to shared dimensions) require minimal effort. The pipelines were created with pylindas, a tool developed by the Environmental Data Division. It is based on Python and supports the implementation and maintenance of automated integration processes, provided that users have the necessary basic knowledge. LINDAS includes a GitLab instance that is used to create and execute data pipelines, so the integration process can run independently of the tools installed on individual employees' computers. GitLab also provides structured issue management, which allows open issues to be clearly identified and systematically addressed. In this way, GitLab serves as a bridge between domain specialists and data-management staff.
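To illustrate why a re-runnable pipeline keeps adjustments such as translations cheap, the following is a minimal sketch in plain Python. It does not use pylindas and makes no claim about its API; the function names, the column name `tree_species`, the translation table, and the sample values are all hypothetical and invented for illustration only.

```python
import csv
import io

# Hypothetical translation table (not project data). Adding a language or
# correcting a label means editing only this mapping and re-running.
TRANSLATIONS = {
    "Fichte": {"de": "Fichte", "fr": "Épicéa", "en": "Spruce"},
    "Buche": {"de": "Buche", "fr": "Hêtre", "en": "Beech"},
}


def read_rows(csv_text):
    """Parse CSV text into a list of dictionaries, one per data row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def add_labels(rows, column, languages):
    """Return new rows with one translated label column per language."""
    out = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows stay unchanged
        for lang in languages:
            # Fall back to the source value when no translation is known.
            label = TRANSLATIONS.get(row[column], {}).get(lang, row[column])
            new_row[f"{column}_{lang}"] = label
        out.append(new_row)
    return out


def run_pipeline(csv_text, languages):
    """Run every step from the raw input; re-running yields the same result."""
    rows = read_rows(csv_text)
    return add_labels(rows, "tree_species", languages)


# Invented sample input, used only to exercise the sketch.
SAMPLE = "year,tree_species,volume_m3\n2023,Fichte,120\n2023,Buche,80\n"
result = run_pipeline(SAMPLE, ["de", "fr", "en"])
```

Because every run starts from the raw input, a changed translation only requires editing the mapping and executing the pipeline again; no manual rework of earlier output is needed.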
Result
Several key surveys (including the Forest Management Test Network, parts of the Swiss forestry statistics, the Forestry Economic Accounts, the wood processing survey, and the price indices) have been integrated into LINDAS and are therefore published on int.visualize.admin.ch. Data pipelines based on pylindas and the LINDAS GitLab environment were successfully established and tested. The selected methodology enables regular, resource-efficient updates and iterative improvements in the future (e.g. regarding metadata or dimensions).
Looking ahead
The selected approach requires the client to have the domain-specific and technical competencies needed to operate and maintain the pipelines. For future technical adaptations and updates, internal expertise should increasingly be leveraged. Should an additional role be introduced as part of a role framework, the function of a Data Custodian would be particularly suitable for ensuring the sustainable maintenance of the data and processes. The following steps were required for successful integration:
- preparation of the data in CSV format
- compilation of metadata relating to the dataset as a whole (e.g. author, title, description, identifier)
- compilation of metadata relating to individual data points (e.g. measured variables, concepts of filter dimensions in all required languages)
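The three preparation steps can be checked mechanically before a pipeline run. The sketch below, in plain Python, is one hedged way to do this. It is not the pylindas metadata format: the dictionary layout, the function name `check_inputs`, and the language set `de`, `fr`, `it`, `en` are assumptions made for illustration, and the sample values are invented.

```python
import csv
import io

# Dataset-level fields named in the text above.
REQUIRED_DATASET_FIELDS = ("author", "title", "description", "identifier")
# Assumption: the "required languages" are these four; adjust as needed.
REQUIRED_LANGUAGES = ("de", "fr", "it", "en")


def check_inputs(csv_text, dataset_meta, column_meta):
    """Return a list of problems; an empty list means the inputs are complete."""
    problems = []

    # Step 1: the data must be readable as CSV with a header row.
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    if not columns:
        problems.append("data: no CSV header found")

    # Step 2: metadata about the dataset as a whole.
    for field in REQUIRED_DATASET_FIELDS:
        if not dataset_meta.get(field):
            problems.append(f"dataset metadata: missing '{field}'")

    # Step 3: metadata per column, with a label in every required language.
    for column in columns:
        labels = column_meta.get(column, {})
        for lang in REQUIRED_LANGUAGES:
            if not labels.get(lang):
                problems.append(f"column '{column}': missing label for '{lang}'")

    return problems


# Invented sample inputs with deliberate gaps.
csv_text = "year,volume_m3\n2023,120\n"
dataset_meta = {
    "author": "Example author",
    "title": "Example title",
    "description": "Example description",
    "identifier": "",
}
column_meta = {
    "year": {"de": "Jahr", "fr": "Année", "it": "Anno", "en": "Year"},
    "volume_m3": {"de": "Volumen", "en": "Volume"},
}
problems = check_inputs(csv_text, dataset_meta, column_meta)
```

A check of this kind could run as an early step in a pipeline, so that incomplete metadata is reported as a list of concrete gaps rather than surfacing later during publication.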