Lost in Translation: Repurposing Semantic Similarity Benchmarks for Evaluating Lexical-Semantic Consistency in LLM-Based Machine Translation
Ye, Quin and Bloem, Jelke
We propose and demonstrate a repurposing of the lexical similarity benchmark Multi-SimLex and the SimLex-999 family of resources for assessing the cross-lingual lexical-semantic consistency of multilingual large language models. While originally gathered for evaluating word embedding models, the parallel nature of the word pairs enables their use in machine translation settings. Using a manually verified subset of 500 word pairs from the Multi-SimLex dataset, we evaluate models’ ability to assess semantic similarity and perform translation between English and Mandarin through zero-shot prompting. We compare BLOOMZ’s and GPT-4’s similarity ratings against human-annotated benchmarks and examine translation consistency using our proposed metric alongside existing ones, with GPT-4 showing stronger alignment with human judgments. As SimLex-999 and Multi-SimLex together cover at least 25 languages, this approach can be extended to many language pairs, including ones that do not involve English, though it requires some manual checking.
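The comparison of model-elicited similarity ratings against human annotations described above is conventionally scored with Spearman rank correlation. The sketch below is illustrative only, not the authors' code, and the ratings are invented rather than drawn from Multi-SimLex:

```python
# Hedged sketch: Spearman correlation between model and human similarity
# ratings, as is standard for SimLex-style evaluations. All data invented.

def rankdata(values):
    """Assign average ranks to values (ties share the mean rank)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the ranks."""
    rx, ry = rankdata(xs), rankdata(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)

human = [9.2, 1.3, 6.8, 4.0, 8.1]   # human similarity ratings (0-10 scale)
model = [8.7, 2.0, 4.4, 5.9, 7.5]   # LLM-elicited ratings for the same pairs
print(round(spearman(human, model), 3))  # 0.9
```

In practice one would use `scipy.stats.spearmanr`; the hand-rolled version above just makes the rank-based logic explicit.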
Bridging the Low Resource Gap in Historical Cryptology: A Multilingual Diachronic Synthetic Dataset for Reproducible Cryptanalysis
Bruton, Micaella and Beloucif, Meriem and Megyesi, Beáta
Many NLP tasks suffer from limited aligned supervision in the target domain. Historical cipher decryption represents an extreme case: aligned plaintext–ciphertext pairs are scarce, access to decrypted archives is restricted, and prior work often relies on synthetic data that is neither released nor evaluated for realism. This limits reproducibility and obscures whether models trained on synthetic benchmarks transfer to archival conditions. We introduce HistCiph, the first publicly available multilingual collection of historically grounded plaintext–ciphertext datasets for classical ciphers. Spanning ten languages (Czech, Dutch, English, French, Hungarian, Icelandic, Italian, Polish, Spanish, Swedish) and multiple centuries, the collection combines diachronically balanced historical plaintext with independently generated homophonic substitution keys and controlled transcription noise. Synthetic generation is explicitly constrained by documented properties of historical ciphers, including multi-homophone allocation and variable-length codes. We validate the datasets using information-theoretic diagnostics—entropy, redundancy, frequency masking, and unicity distance—showing that ciphertext distributions approach theoretical bounds while preserving cross-linguistic variation. HistCiph provides a reproducible benchmark for neural decryption and alignment, and illustrates a principled framework for empirically grounded synthetic data generation in low-resource NLP.
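The information-theoretic diagnostics named above have compact standard definitions. The sketch below (not the paper's code) computes Shannon entropy, redundancy, and Shannon's unicity distance for a toy simple-substitution setting; the 3.2 bits/letter figure is the commonly cited redundancy estimate for English:

```python
# Hedged sketch of the diagnostics mentioned in the abstract, applied to a
# toy setting; not the HistCiph pipeline itself.
import math
from collections import Counter

def shannon_entropy(text):
    """Entropy H (bits/symbol) of the symbol distribution in `text`."""
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def redundancy(text, alphabet_size):
    """Redundancy D = log2(|A|) - H: distance below the maximum-entropy bound."""
    return math.log2(alphabet_size) - shannon_entropy(text)

def unicity_distance(key_space_bits, redundancy_bits):
    """Shannon's unicity distance U = H(K) / D: the ciphertext length at
    which the key becomes (in theory) uniquely determined."""
    return key_space_bits / redundancy_bits

# Toy example: 26-letter alphabet, simple-substitution key space of 26! keys.
key_bits = math.log2(math.factorial(26))  # ~88.4 bits
d_english = 3.2                           # commonly cited redundancy of English
print(round(unicity_distance(key_bits, d_english), 1))  # ≈ 27.6 letters
```

Homophonic ciphers with multi-homophone allocation, as in the dataset, have far larger key spaces and flatter ciphertext distributions, which is exactly what the frequency-masking diagnostic probes.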
Cultural Grounding in Swedish: Extending an Everyday Knowledge Benchmark for LLMs
Beloucif, Meriem and Sjons, Johan
Benchmarks for evaluating Large Language Models (LLMs) on everyday knowledge across cultures and languages are increasingly used to assess cultural competence and contextual understanding. However, many multilingual extensions rely primarily on translated question–answer pairs, limiting their ability to capture locally grounded variation. In this work, we present a Swedish extension of an existing cross-cultural everyday knowledge benchmark, in which questions are translated into Swedish and answers are collected individually from five participants with diverse social and professional backgrounds. This design enables us to capture situated, naturally produced responses from a specific participant group rather than transferred or translated answer templates. We document the translation protocol, participants, and agreement analysis, and examine variation across participants as a signal of culturally contingent knowledge. We evaluate several state-of-the-art multilingual and instruction-tuned LLMs against the aggregated human responses and analyze model performance. Our results reveal that while models often approximate prototypical answers, they struggle with culturally specific nuances and intra-cultural variation. The Swedish extension provides a resource for studying culturally grounded evaluation and highlights the importance of human-generated local answers when benchmarking LLMs across languages.
Entity Linking for Faroese Using Large Language Models with Web Search
Simonsen, Annika and Debess, Iben Nyholm and Einarsson, Hafsteinn
Entity linking connects text mentions to knowledge bases. For low-resource languages, entity linking has typically not been a research priority, as named entity recognition and knowledge base creation must first be addressed. We present the first study of entity linking for Faroese, a North Germanic language with approximately 70,000 speakers. Unlike traditional systems that rely on separate candidate retrieval and ranking components, we employ an end-to-end approach using GPT-5 with integrated web search. Our method prompts the model to directly identify and link named entities to Wikipedia pages through a three-tier fallback strategy: Faroese Wikipedia, English Wikipedia, and finally any available Wikipedia. We evaluate our approach on 1,010 manually annotated examples from a Faroese NER dataset, analyzing entity mentions across Person, Location, Organization, and Miscellaneous types. Human evaluation shows our system achieves 87.5% precision and 87.3% recall, with particularly strong performance on locations (93–95% precision, 92–95% recall). Persons are more challenging (86–88% precision, 72–83% recall). The majority of links (76.5%) point to Faroese Wikipedia, demonstrating the model’s ability to leverage language-specific knowledge bases. A Wikipedia API search baseline without any LLM achieves F1 = 0.57–0.60 on the same evaluation data, confirming that the LLM’s contextual reasoning provides substantial gains over simple search. We validate our approach across three models (GPT-5, Gemini 3 Flash, GPT-5.4 Mini), achieving F1 scores of 0.74–0.87 and confirming that the method generalizes across providers. This work establishes initial performance benchmarks for Faroese entity linking and demonstrates the viability of LLM-based approaches for low-resource languages.
From Polyester Girlfriends to Blind Mice: Creating the First Pragmatics Understanding Benchmarks for Slovene
Brglez, Mojca and Vintar, Špela
Large language models are demonstrating increasing capabilities, excelling at benchmarks once considered very difficult. As their capabilities grow, there is a need for more challenging evaluations that go beyond surface-level linguistic competence. The latter involves not only syntax and semantics but also pragmatics, i.e., understanding situational meaning shaped by context and linguistic and cultural norms. To contribute to this line of research, we introduce SloPragEval and SloPragMega, the first pragmatics understanding benchmarks for Slovene, comprising 405 multiple-choice questions. We discuss the difficulties of translation, describe the campaign to establish a human baseline, and report pilot evaluations with LLMs. Our results indicate that current models have substantially improved in their understanding of nuanced language but may still fail to infer implied speaker meaning in non-literal utterances, especially those that are culture-specific. We also observe a significant gap between proprietary and open-source models. Finally, we argue that benchmarks targeting nuanced language understanding and knowledge of the target culture must be designed with care, preferably constructed from native data, and validated with human responses.
SdQuAD: A Benchmark Question Answering Dataset for Low-resource Sindhi Language
Ali, Wazir and Rafay, Muhammad and Ali, Nadia and Rehman, Amar
Question answering (QA) datasets are crucial for developing and evaluating monolingual and multilingual language models, yet low-resource languages like Sindhi lack open-source QA resources. We introduce SdQuAD, a novel open-source textual QA dataset for the low-resource Sindhi language, comprising more than 14K QA pairs curated and annotated by native speakers using Label Studio. Sourced from diverse domains, including news, history, science, geography, business, and tourism, SdQuAD supports both extractive and abstractive QA tasks while capturing Sindhi’s linguistic diversity. We assess annotation quality using span-level agreement and evaluate extractive performance with Exact Match (EM), F1 score, and a TF-IDF baseline. Additionally, we fine-tune mBERT, XLM-RoBERTa, and mT5 models on SdQuAD, benchmarking their performance to demonstrate the dataset’s utility.
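The extractive metrics mentioned above (EM and token-level F1) are the standard SQuAD-style measures. A minimal sketch, omitting the usual answer normalization (lowercasing, punctuation and article stripping), with invented answers:

```python
# Hedged illustration of SQuAD-style extractive QA scoring; not the
# authors' evaluation script, and without answer normalization.
from collections import Counter

def exact_match(prediction, gold):
    """1 if the predicted span matches the gold span exactly, else 0."""
    return int(prediction.strip() == gold.strip())

def token_f1(prediction, gold):
    """Harmonic mean of token-level precision and recall."""
    pred_toks, gold_toks = prediction.split(), gold.split()
    common = Counter(pred_toks) & Counter(gold_toks)  # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

print(exact_match("1947", "1947"))                            # 1
print(round(token_f1("the Indus river", "Indus river"), 2))   # 0.8
```

Over a dataset, EM and F1 are averaged across all QA pairs, typically taking the maximum score over multiple gold answers per question.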
LLMs as Assistants for Data Annotation: Addressing Disagreement and Supporting Expert Processes
Andrade, Mark and Hefernan, Bláithín and Walsh, Abigail and Castilho, Sheila
This paper investigates the potential of Large Language Models to assist human annotation pipelines, with a particular focus on supporting the development of expert-informed annotation guidelines for document-level content categorisation. We present three experiments exploring distinct roles for LLMs in annotation: as annotators, as domain experts assisting in disagreement resolution, and as analysts of annotator discussions. Using GPT-4.5 and Claude Sonnet 4, we evaluate LLM-generated annotation guidelines for a document-level classification task in terms of coverage, applicability, and usefulness. Preliminary results are mixed-to-positive, with evidence that LLMs can provide useful support across different stages of the annotation pipeline, particularly when supplied with rich contextual information such as prior human annotations and annotator discussions.
Annotation Quality in Aspect-Based Sentiment Analysis: A Case Study Comparing Experts, Students, Crowdworkers, and Large Language Models
Donhauser, Niklas and Fehle, Jakob and Hellwig, Nils Constantin and Weinberger, Markus and Kruschwitz, Udo and Wolff, Christian
Aspect-Based Sentiment Analysis (ABSA) enables fine-grained opinion analysis by identifying sentiments toward specific aspects or targets within a text. While ABSA has been widely studied for English, research on other languages such as German remains limited, largely due to the lack of high-quality annotated datasets. This paper examines how different annotation sources influence the development of German ABSA. To this end, an existing dataset is re-annotated by experts to establish a ground truth, which serves as a reference for evaluating annotations produced by students, crowdworkers, Large Language Models (LLMs), and experts. Annotation quality is compared using Inter-Annotator Agreement (IAA) and its impact on downstream model performance for different ABSA subtasks. The evaluation focuses on Aspect Category Sentiment Analysis (ACSA) and Target Aspect Sentiment Detection (TASD). We apply State-of-the-Art (SOTA) methods for ABSA, including BERT-, T5-, and LLaMA-based approaches, spanning fine-tuning and in-context learning with instruction prompts, to assess performance differences. The findings provide practical insights into trade-offs between annotation reliability and efficiency, offering guidance for dataset construction in under-resourced Natural Language Processing (NLP) scenarios.
Posters
Cross-Lingual Mathematical Reasoning in LLMs: Evaluating Performance on Icelandic vs. English Problems
Einarsson, Hafsteinn
We investigate whether large language models (LLMs) exhibit performance differences when solving mathematical problems presented in a low-resource language (Icelandic) versus a high-resource language (English). Using 847 multiple-choice problems from the Icelandic Mathematics Competition corpus (STAK), we evaluate two state-of-the-art models (Gemini-3-Flash-Preview and GPT-5.4-mini) in both multiple-choice (MC) and open-ended (OE) formats, with correctness determined by a three-judge quorum (Gemini-3-Flash, GPT-5.4-mini, Claude Sonnet 4.6) achieving 97.6% unanimous agreement. Our results reveal significant cross-lingual performance gaps that vary by model: Gemini-3-Flash shows a consistent English advantage of 2.4–10.0 percentage points across both evaluation modes, while GPT-5.4-mini exhibits no significant language effects. Notably, GPT-5.4-mini demonstrates a substantial MC deficit, achieving only 42% in that format despite reaching 69–71% accuracy on OE problems. Analysis of answer patterns reveals a strong option position bias in GPT-5.4-mini, with systematic over-selection of option B and under-selection of option D. These findings suggest that language does affect LLM mathematical reasoning for some models, but the effect is model-dependent and interacts with evaluation format, with implications for deploying LLMs in educational contexts for speakers of low-resource languages.
Struct2Unstruct: Creating Tender NER Datasets from Structured Procurement Records using Large Language Models
Abbas, Asim and Lee, Mark and Shanavas, Niloofer and Kovatchev, Venelin and Ali, Mubashir
Named Entity Recognition (NER) in the tender and procurement domain is critical for tasks such as contract monitoring, supplier analysis, and compliance tracking. However, unlike general-purpose NER, no open-source datasets exist for Tender NER, largely due to data sensitivity and confidentiality restrictions. This scarcity limits the development of automated entity extraction models. To address this gap, we propose struct2unstruct, a data preparation pipeline that generates and annotates tender-specific datasets using large language models (LLMs). Starting from structured procurement data published in English by the Singapore government (2015–2021), we employ Llama-3 to generate synthetic tender narratives in multiple writing styles, ensuring each contains at least one tender-related entity. Post-processing steps correct inconsistencies in dates, symbols, and entity formats. Entities are then annotated using a BIO tagging scheme through deterministic alignment with structured fields, followed by expert validation to ensure accuracy. This study focuses on data preparation and evaluation, not model training. The resulting dataset provides a scalable resource for future Tender NER research in low-resource environments. By releasing both the dataset and pipeline as open-source resources, we establish a foundation for advancing domain-adapted information extraction and automated tender entity recognition.
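The deterministic BIO alignment step described above can be sketched as a greedy exact match of structured field values against the generated narrative. The field names and sentence below are invented for illustration; this is a simplified sketch, not the struct2unstruct pipeline:

```python
# Hedged sketch of BIO tagging via deterministic alignment of structured
# field values with a generated narrative. All field names invented.

def bio_tag(tokens, fields):
    """fields: {label: entity string}; greedy exact match over token spans."""
    tags = ["O"] * len(tokens)
    for label, value in fields.items():
        span = value.split()
        for i in range(len(tokens) - len(span) + 1):
            window = tokens[i:i + len(span)]
            # tag the first untagged occurrence of the field value
            if window == span and all(t == "O" for t in tags[i:i + len(span)]):
                tags[i] = f"B-{label}"
                for j in range(i + 1, i + len(span)):
                    tags[j] = f"I-{label}"
                break
    return tags

tokens = "The tender was awarded to Acme Pte Ltd for 50000 SGD".split()
fields = {"SUPPLIER": "Acme Pte Ltd", "AMOUNT": "50000 SGD"}
print(list(zip(tokens, bio_tag(tokens, fields))))
```

A real pipeline would also need the post-processing the abstract mentions (normalizing dates, symbols, and entity formats) before alignment, since exact string match fails on surface variants.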
Link Prediction for Event Logs in the Process Industry
Zhukova, Anastasia and Walton, Thomas and Lobmüller, Christian E. and Gipp, Bela
In the era of graph-based retrieval-augmented generation (RAG), link prediction is a significant preprocessing step for improving the quality of fragmented or incomplete domain-specific data for graph retrieval. Knowledge management in the process industry uses RAG-based applications to optimize operations, ensure safety, and facilitate continuous improvement by effectively leveraging operational data and past insights. A key challenge in this domain is the fragmented nature of event logs in shift books, where related records are often kept separate, even though they belong to a single event or process. This fragmentation hinders the recommendation of previously implemented solutions to users, which is crucial for timely problem-solving at live production sites. To address this problem, we develop a record linking model, which we define as a cross-document coreference resolution (CDCR) task. Record linking adapts the task definition of CDCR and combines two state-of-the-art CDCR models with the principles of natural language inference (NLI) and semantic text similarity (STS) to perform link prediction. The evaluation shows that our record linking model outperformed the best versions of our baselines, i.e., NLI and STS, by 28% (11.43 p) and 27.4% (11.21 p), respectively. Our work demonstrates that common NLP tasks can be combined and adapted to a domain-specific setting of the German process industry, improving data quality and connectivity in shift logs.
MultiZebraLogic: A Multilingual Logical Reasoning Benchmark
Bruun, Sofie Helene and Smart, Dan Saattrup
We create high-quality datasets for LLM evaluation of logical reasoning skills across nine different languages, which have been manually checked by fluent speakers. The datasets consist of so-called zebra puzzles, and we analyse different ways of tuning the difficulty of the puzzles to fit modern LLMs. This includes the size of the puzzle (number of objects and number of clues), as well as a novel addition of red herring clues containing only irrelevant information. We show that the presence of red herrings indeed makes the puzzles significantly harder for the models, and we find puzzle sizes 2×3 and 4×5 are sufficiently challenging for GPT-4o mini (a non-reasoning model) and o3-mini (a reasoning model), respectively. We analyse whether LLM performance on these puzzles is sensitive to the language, the cultural sensitivity of the puzzle theme, and the choice of clue types. These analyses are conducted with English and Danish, where we show that there is no significant difference for any of these three aspects, at least for the OpenAI models GPT-4o mini and o3-mini, chosen as representative non-reasoning and reasoning models, respectively. We publish the datasets for each of the nine languages for the identified sizes 2×3 and 4×5. We also publish the code used to generate the puzzles, which can be used to extend the benchmark to more languages.
Progressing beyond Art Masterpieces or Touristic Clichés: how to assess your LLMs for cultural alignment?
Branco, António and Silva, João and Marques, Nuno and Gomes, Luis and Campos, Ricardo and Sequeira, Raquel and Nerea, Sara and Silva, Rodrigo and Marques, Miguel and Duarte, Rodrigo and Putyato, Artur and Folques, Diogo and Valente, Tiago
Although the cultural (mis)alignment of Large Language Models (LLMs) has attracted increasing attention—often framed in terms of cultural bias—until recently there has been limited work on the design and development of datasets for cultural assessment. Here, we review existing approaches to such datasets and identify their main limitations. To address these issues, we propose design guidelines for annotators and report on the construction of a dataset built according to these principles. We further present a series of contrastive experiments conducted with this dataset. The results demonstrate that our design yields test sets with greater discriminative power, effectively distinguishing between models specialized for a given culture and those that are not, ceteris paribus.
Evaluating Large Language Model-based Natural Language Generation for Modular Dialog systems
Emmerling, Vincent and Kowalski, Christoph and Robrecht-Hilbig, Amelie and Kopp, Stefan
While many dialogue systems currently use end-to-end solutions, modular systems offer greater control, sustainability, and more human-like dialogue. This makes them relevant, especially when aiming to study human behavior patterns in interactions or applying them to sensitive domains. In this paper, we develop an automated metric to measure the quality of an LLM-based NLG component in a modular system, based on hallucination tendency and linguistic quality. We apply the metric to various language models and usage techniques and, based on the results, discuss the conditions a model must meet to be a good candidate for an NLG component in a real-time capable dialogue system. Although such automated metrics cannot replace a real interaction study, they help to compare potential approaches for the individual modules. Therefore, they are indispensable when developing and testing modules in isolation. One advantage of the introduced metric is that it is developed and tested on a German dataset, highlighting challenges when working with languages other than English and discrepancies with the abilities of Generative AI assumed in current state-of-the-art literature.
JobResQA: Semi-Automatic Multilingual Benchmark Creation for LLM Machine Reading Comprehension on Résumés and Job Descriptions
Carrino, Casimiro Pio and Estrella, Paula and Zbib, Rabih and Escolano, Carlos and Fonollosa, José A. R.
We present a methodology for building privacy-preserving multilingual QA benchmarks in low-resource and sensitive domains, demonstrated through JobResQA, a multilingual MRC benchmark over synthetic HR documents. The dataset comprises 581 QA pairs across 105 synthetic résumé-job description pairs in five languages (English, Spanish, Italian, German, and Chinese), with questions spanning four types based on document source (intra vs. cross-document) and reasoning complexity (single-hop vs. multi-hop). We propose a synthetic data pipeline with built-in anonymization and controlled attributes (via placeholders) to enable future fairness studies. Our cost-effective, human-in-the-loop translation pipeline based on the TEaR methodology incorporates MQM error annotations and selective post-editing. Baseline evaluations across multiple open-weight LLM families using LLM-as-judge reveal higher performance on English and Spanish but substantial degradation for other languages, highlighting critical cross-lingual MRC gaps. Our pipeline, where LLMs act as synthesizers, translators, and evaluators under human oversight, constitutes a reusable methodology for resource creation and a case study in the evaluation-integrity challenges of LLM-era benchmark construction.
Beyond English and Evasion: A Human-Annotated Multi-Domain Benchmark for High-Stakes LLM Safety Evaluation in Chinese
Zaghouani, Wajdi and Aldous, Kholoud K. and Gao, Yicheng
When Large Language Models (LLMs) are deployed in Chinese-language settings, a troubling pattern emerges: safety systems that work well in English break down. These systems struggle to cross linguistic and cultural boundaries, leaving models exposed to adversarial prompts that exploit Chinese-specific evasion techniques, including Pinyin romanization, character decomposition, internet slang, and hedging tone. To address this gap, we introduce ChiSafe-PAS (Chinese Safety Pilot Annotation Set), a human-annotated benchmark of 1,897 adversarial Chinese prompts spanning four high-stakes domains: self-harm and violence, drug and illicit trade, fraud, and satire. Of these, 1,544 entries carry complete gold-standard annotations: a 3-class response label (REFUSE, SAFE-REDIRECT, RESPOND), a nine-category obfuscation taxonomy, a risk-level rating, and annotator rationale. We describe the dataset design, annotation process, and obfuscation taxonomy in detail. Our primary goal is practical: to give the research community a high-quality, culturally grounded resource for benchmarking LLM safety alignment. In doing so, we engage three broader tensions in the field: the blurring boundary between training and evaluation data, the need for domain coverage grounded in real-world risk, and the limits of scale as a substitute for cultural expertise.
Evaluating the Impact of LLM-Assisted Annotation in a Perspectivized Setting: the Case of FrameNet Annotation
Belcavello, Frederico and Matos, Ely and Lorenzi, Arthur and Bonoto, Lisandra and Ruiz, Lívia and Pereira, Luiz Fernando and Herbst, Victor and Navarro, Yulla and Abreu, Helen de Andrade and Dutra, Lívia and Torrent, Tiago Timponi
The use of LLM-based applications as a means to accelerate and/or substitute human labor in the creation of language resources and datasets is a reality. Nonetheless, despite the potential of such tools for linguistic research, an evaluation of their performance and impact on the creation of annotated datasets, especially under a perspectivized approach to NLP, is still missing. This paper contributes to the reduction of this gap by reporting on an extensive evaluation of the (semi-)automatization of FrameNet-like semantic annotation by the use of an LLM-based semantic role labeler. The methodology employed compares annotation time, coverage, and diversity in three experimental settings: manual, automatic, and semi-automatic annotation. Results show that the hybrid, semi-automatic annotation setting leads to increased frame diversity and similar annotation coverage, when compared to the human-only setting, while the automatic setting performs considerably worse in all metrics, except for annotation time, which remains similar.
A multilingual hallucination benchmark: MultiWikiQHalluA
Thoresen, Freja and Smart, Dan Saattrup
Most hallucination evaluations focus on English, leaving it unclear whether findings transfer to lower-resource languages. We investigate faithfulness hallucinations, defined as model-generated content that is fluent and plausible but diverges from the provided input or is internally inconsistent. Leveraging the multilingual MultiWikiQA dataset, we utilize the LettuceDetect framework to create synthetic hallucination datasets for 306 languages, from which we train token-level hallucination classifiers for 30 European languages. In this work, we present evaluations of model hallucinations on a selection of languages: English, Danish, German, and Icelandic. Using these classifiers, we evaluate the hallucination rates for Qwen3-0.6B, Qwen3-14B, Gemma-3-12B-IT, cogito-v1-preview-qwen-32B, and cogito-v1-preview-llama-70B. Our classifiers reveal notably higher hallucination rates for Qwen3-0.6B (up to 60% of answers containing at least one hallucination, peaking in Icelandic) and generally lower rates for larger models, with cogito-v1-preview-qwen-32B and cogito-v1-preview-llama-70B performing best on most languages. Hallucination rates are consistently higher for lower-resource languages, particularly Icelandic.
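Turning a token-level hallucination classifier's output into the answer-level rates reported above amounts to counting answers with at least one flagged token. A minimal sketch with invented label sequences (not the LettuceDetect framework itself):

```python
# Hedged sketch: aggregating per-token hallucination labels into the
# answer-level hallucination rate mentioned in the abstract. Labels invented.

def hallucination_rate(answers):
    """answers: list of per-token 0/1 label sequences (1 = hallucinated).
    Returns the fraction of answers containing at least one flagged token."""
    flagged = sum(1 for labels in answers if any(labels))
    return flagged / len(answers)

answers = [
    [0, 0, 0, 0],     # fully supported answer
    [0, 1, 1, 0, 0],  # contains a hallucinated span
    [0, 0],           # supported
    [1, 0, 0],        # hallucinated first token
]
print(hallucination_rate(answers))  # 0.5
```

Answer-level aggregation like this is deliberately strict: a single unsupported token marks the whole answer, which is why small models can reach rates as high as the 60% reported for Qwen3-0.6B.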
Exploring the similarities and differences between VLM-driven and traditional OCR for Historical Swedish Data
Johansson, Martin and Waginder, Selma and Dannélls, Dana
Recent Swedish OCR efforts rely primarily on traditional OCR methods, including deep CNN–LSTM hybrid neural networks and transformer-based models. Some approaches have also demonstrated the applicability of VLM-driven OCR to historical material. However, to date, no studies have examined in depth the performance of VLM-based OCR on historical Swedish sources. In this paper, we ask: How do transformers and VLMs differ in character- and word-level recognition performance across typefaces, and what qualitative differences can be observed in their error patterns? We show that fine-tuned versions of the Alibaba Cloud Qwen3-VL-8B-Instruct and Qwen3-VL-2B-Instruct, combined with a simple repetition-trimming step, outperform conventional OCR systems. Remaining errors are primarily attributable to challenges associated with the Blackletter typeface and formatting issues, such as missing or extra line breaks, characters, and spaces. Even when characters are correctly recognized, formatting inconsistencies can substantially increase transcription error rates.