Data Integration in the Timber Industry

The BFH Linked Data group was commissioned by the Wood and Forestry Division of the Federal Office for the Environment (FOEN) to publish timber industry data as Linked Data.

Factsheet

Situation

The Wood and Forestry Division commissioned the BFH project team to integrate twelve division-specific surveys into LINDAS. The objective was twofold: to prepare for the future digital publication of the Forestry and Timber Yearbook with interactive visualizations, and to strengthen the data over the long term as a reusable, interoperable resource within and beyond the federal administration. Because of limited resources, the lack of stable source formats, and dependencies on other administrative units, the surveys were prioritized during the project.

Course of action

In coordination with the client, the team chose a methodology built largely on automated data pipelines, because data integration was conceived as a recurring process rather than a one-time task. Automated pipelines allow an iterative approach in which adjustments (e.g. translations or migration to shared dimensions) require minimal effort. The pipelines were created with pylindas, a tool developed by the Environmental Data Division. It is based on Python and supports the implementation and maintenance of automated integration processes, provided that users have the necessary basic knowledge. LINDAS includes a GitLab instance that is used to create and execute the pipelines, so the integration process runs independently of the tools installed locally on individual employees’ computers. GitLab also provides structured issue management, so open issues can be clearly identified and systematically addressed. In this way, it serves as a bridge between staff from the domain-specific and the data-management areas.
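To illustrate why a pipeline keeps adjustments cheap, here is a minimal Python sketch of a re-runnable integration step. It does not use pylindas and makes no claim about its API. The column names, the translation table, and the sample values are all hypothetical and chosen only for illustration. The point is that translations live in a separate table, so correcting a label only requires editing that table and running the pipeline again.

```python
import csv
import io

# Hypothetical translation table for one dimension (illustrative values only).
# In a real pipeline this would live in a versioned file next to the code.
SPECIES_LABELS = {
    "softwood": {"de": "Nadelholz", "fr": "Résineux", "it": "Conifere", "en": "Softwood"},
    "hardwood": {"de": "Laubholz", "fr": "Feuillus", "it": "Latifoglie", "en": "Hardwood"},
}


def run_pipeline(csv_text, labels):
    """Read observations from CSV text and attach multilingual labels.

    Raises ValueError if a dimension value has no entry in the label table,
    so gaps are caught at pipeline time rather than after publication.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = row["species"]
        if key not in labels:
            raise ValueError(f"no translation for dimension value {key!r}")
        rows.append({
            "year": int(row["year"]),
            "species": key,
            "labels": labels[key],
            "volume_m3": float(row["volume_m3"]),
        })
    return rows


# Made-up sample input, standing in for a prepared survey export.
SAMPLE_CSV = "year,species,volume_m3\n2022,softwood,100.5\n2022,hardwood,40.0\n"
result = run_pipeline(SAMPLE_CSV, SPECIES_LABELS)
```

Because the step is a pure function of the CSV and the label table, running it on a shared environment such as the GitLab instance mentioned above gives the same result as running it on any individual machine.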

Result

Several key surveys have been integrated into LINDAS and are therefore published on int.visualize.admin.ch, including the Forest Management Test Network, parts of the Swiss forestry statistics, the Forestry Economic Accounts, the wood processing survey, and the price indices. Data pipelines based on pylindas and the LINDAS GitLab environment were successfully established and tested. The selected methodology enables regular, resource-efficient updates and iterative improvements in the future (e.g. regarding metadata or dimensions).

Looking ahead

The selected approach requires the client to have the domain-specific and technical competencies needed to operate and maintain the pipelines. For future technical adaptations and updates, internal expertise should increasingly be leveraged. Should an additional role be introduced when a role framework is established, the function of a Data Custodian would be particularly suitable for ensuring the sustainable maintenance of the data and processes.

The following steps were required for successful integration:

  • Preparation of the data in CSV format
  • Compilation of metadata relating to the dataset as a whole (e.g. author, title, description, identifier)
  • Compilation of metadata relating to individual data points (e.g. measured variables, concepts of filter dimensions in all required languages)
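The metadata steps above lend themselves to an automated completeness check before a pipeline run. The following Python sketch is one possible way to do this, not the project's actual tooling. The dataset-level field names follow the examples given in the text. The set of required languages, the function name, and the sample values are assumptions.

```python
# Dataset-level fields named in the text as examples.
REQUIRED_DATASET_FIELDS = ("author", "title", "description", "identifier")
# Assumption: the set of required languages is not stated in the source.
REQUIRED_LANGUAGES = ("de", "fr", "it", "en")


def missing_metadata(dataset_meta, dimension_meta):
    """Return a list of human-readable gaps in the supplied metadata."""
    problems = []
    for field in REQUIRED_DATASET_FIELDS:
        if not dataset_meta.get(field):
            problems.append(f"dataset: missing {field}")
    for name, labels in dimension_meta.items():
        for lang in REQUIRED_LANGUAGES:
            if not labels.get(lang):
                problems.append(f"dimension {name}: missing label for {lang}")
    return problems


# Hypothetical sample metadata with deliberate gaps.
dataset_meta = {
    "author": "Example Office",
    "title": "Example survey",
    "description": "",
    "identifier": "example-survey",
}
dimension_meta = {
    "volume": {"de": "Volumen", "fr": "Volume", "it": "Volume", "en": "Volume"},
    "region": {"de": "Region", "fr": "Région"},
}

problems = missing_metadata(dataset_meta, dimension_meta)
for problem in problems:
    print(problem)
```

A list of gaps like this could be turned directly into issues in the issue tracker, which gives domain staff a concrete to-do list without requiring them to read pipeline code.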

This project contributes to the following SDGs

  • 11: Sustainable cities and communities