Pretraining of a Swiss Long Legal BERT Model

We will scrape legal text in German, French and Italian to pretrain a Swiss Long Legal BERT model capable of better performing NLP tasks in the Swiss legal domain.

Factsheet

Situation

We see a clear research gap: BERT models capable of handling long multilingual text are currently underexplored (gap 1). Additionally, to the best of our knowledge, there is no multilingual legal BERT model available yet (gap 2). Tay et al. [2020b] present a benchmark for evaluating BERT-like models capable of handling long input and conclude preliminarily that BigBird [Zaheer et al., 2020] is currently the best-performing variant.

Course of action

We thus propose to pretrain a BERT-like model (likely BigBird) on multilingual long text to fill the first research gap. To fill the second gap, we propose to further pretrain this model [Gururangan et al., 2020] on multilingual legal text, as sketched below.
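
The following is a minimal sketch of the second stage (domain-adaptive further pretraining) using the Hugging Face Transformers library. The starting checkpoint, corpus file name and hyperparameters shown here are illustrative assumptions, not final project choices.

```python
# Sketch: further pretraining of a long-input BigBird model on Swiss legal text
# with masked language modelling. Checkpoint, corpus path and hyperparameters
# are placeholders/assumptions.
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    BigBirdForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "google/bigbird-roberta-base"  # assumed general-domain starting point
MAX_LEN = 4096                              # BigBird's long input window

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = BigBirdForMaskedLM.from_pretrained(MODEL_NAME)

# Hypothetical scraped corpus: one document per line, German/French/Italian mixed.
corpus = load_dataset("text", data_files={"train": "swiss_legal_de_fr_it.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=MAX_LEN)

tokenized = corpus.map(tokenize, batched=True, remove_columns=["text"])

# Standard MLM objective: randomly mask 15% of tokens.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="swiss-long-legal-bert",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=16,
        learning_rate=1e-4,
        max_steps=100_000,
        save_steps=10_000,
    ),
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
```

The same script, pointed at a large general multilingual long-text corpus instead of the legal one, would cover the first pretraining stage.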

This project contributes to the following SDGs

  • 9: Industry, innovation and infrastructure
  • 16: Peace, justice and strong institutions