SwiLTra-Bench: The Swiss legal translation benchmark

July, 2025

Joel Niklaus

Jakob Merane

Luka Nenadic

Sina Ahmadi

Yingqiang Gao

Cyrill A. H. Chevalley

Claude Humbel

Christophe Gösken

Lorenzo Tanzi

Thomas Lüthi

Stefan Palombo

Spencer Poff

Boling Yang

Nan Wu

Matthew Guillod

Robin Mamié

Daniel Brunner

Julio Pereyra

Niko Grupen


Abstract

In Switzerland, legal translation is uniquely important due to the country’s four official languages and requirements for multilingual legal documentation. However, this process traditionally relies on professionals who must be both legal experts and skilled translators, creating bottlenecks and impacting effective access to justice. To address this challenge, we introduce SwiLTra-Bench, a comprehensive multilingual benchmark of over 180K aligned Swiss legal translation pairs comprising laws, headnotes, and press releases across all Swiss languages along with English, designed to evaluate LLM-based translation systems. Our systematic evaluation reveals that frontier models achieve superior translation performance across all document types, while specialized translation systems excel specifically in laws but underperform in headnotes. Through rigorous testing and human expert validation, we demonstrate that while fine-tuning open SLMs significantly improves their translation quality, they still lag behind the best zero-shot prompted frontier models such as Claude-3.5-Sonnet. Additionally, we present SwiLTra-Judge, a specialized LLM evaluation system that aligns best with human expert assessments.

Type

Conference paper

Publication

Proceedings of the Annual Meeting of the Association for Computational Linguistics, 14894–14916

Last updated on July, 2025

Authors

Luka Nenadic

PhD Student
