Accepted materials (archival and non-archival papers, abstracts; the order is random)

Talks

Automatic Validation of the Non-Validated Spanish Speech Data of Common Voice 17.0
slides
Hernández Mena, Carlos Daniel and Scalvini, Barbara and Lág, Dávid í

FoQA: A Faroese Question-Answering Dataset
slides
Simonsen, Annika and Nielsen, Dan Saattrup and Einarsson, Hafsteinn

Annotating Attitude in Swedish Political Tweets
slides
Lindahl, Anna

Voices of Luxembourg: Tackling Dialect Diversity in a Low-Resource Setting
slides
Hosseini-Kivanani, Nina and Schommer, Christoph and Gilles, Peter

The Application of Corpus-Based Language Distance Measurement to the Diatopic Variation Study (on the Material of the Old Novgorodian Birchbark Letters)
slides
Afanasev, Ilia and Lyashevskaya, Olga

Multi-label Scandinavian Language Identification (SLIDE)
slides
Fedorova, Mariia and Frydenberg, Jonas Sebulon and Handford, Victoria and Langø, Victoria Ovedie Chruickshank and Willoch, Solveig Helene and Midtgaard, Marthe Løken and Scherrer, Yves and Mæhlum, Petter and Samuel, David

OCR Error Post-Correction with LLMs in Historical Documents: No Free Lunches
slides
Kanerva, Jenna and Ledins, Cassandra and Käpyaho, Siiri and Ginter, Filip

Posters

WikiQA-IS: Assisted Benchmark Generation and Automated Evaluation of Icelandic Cultural Knowledge in LLMs
Arnardóttir, Þórunn and Bjartur Einarsson, Elías and Ingvarsson Juto, Garðar and Páll Helgason, Þorvaldur and Einarsson, Hafsteinn

Universal Dependencies Treebank for Uzbek
poster
Akhundjanova, Arofat and Talamo, Luigi

DUDU: A Treebank for Ottoman Turkish in UD Style
poster
Yılandiloğlu, Enes and Siewert, Janine

A Simple Audio and Text Collection-Annotation Tool Targeted to Brazilian Indigenous Language Native Speakers
Polleti, Gustavo Padilha and Cozman, Fabio and Gerardi, Fabricio

Fine-Tuning Cross-Lingual LLMs for POS Tagging in Code-Switched Contexts
Absar, Shayaan

First Steps in Benchmarking Latvian in Large Language Models
poster
Skadina, Inguna and Bakanovs, Bruno and Darģis, Roberts

On the Usage of Semantics, Syntax, and Morphology for Noun Classification in IsiZulu
Sayed, Imaan and Mahlaza, Zola and van der Leek, Alexander and Mopp, Jonathan and Keet, C. Maria

VerbCraft: Morphologically-Aware Armenian Text Generation Using LLMs in Low-Resource Settings
Avetisyan, Hayastan and Broneske, David

Post-OCR Correction of Historical German Periodicals using LLMs
Danilova, Vera and Aangenendt, Gijs

From Words to Action: A National Initiative to Overcome Data Scarcity for the Slovene LLM
poster
Holdt, Špela Arhar and Antloga, Špela and Munda, Tina and Pori, Eva and Krek, Simon

Assessing the Similarity of Cross-Lingual Seq2Seq Sentence Embeddings Using Low-Resource Spectral Clustering
Moll, Nelson and Rabbani, Tahseen

"I Need More Context and an English Translation": Analysing How LLMs Identify Personal Information in Komi, Polish, and English
Ilinykh, Nikolai and Szawerna, Maria Irena

Federated Meta-Learning for Low-Resource Translation of Kirundi
Sang, Kyle Rui and Rabbani, Tahseen and Zhou, Tianyi

Second language Korean Universal Dependency treebank v1.2: Focus on Data Augmentation and Annotation Scheme Refinement
slides
Sung, Hakyung and Shin, Gyu-Ho

Recommendations for Overcoming Linguistic Barriers in Healthcare: Challenges and Innovations in NLP for Haitian Creole
Mompelat, Ludovic

Interpreting the UAS and the LAS of the parsing of Old English with Universal Dependencies
Martin Arista, Javier and Elvira Ojanguren López, Ana and Domínguez Barragán, Sara
Download Paper

This paper interprets, from a linguistic point of view, the Unlabelled Attachment Score (UAS) and Labelled Attachment Score (LAS) metrics obtained in the Universal Dependencies parsing of Old English. The study assesses the performance of three distinct training methods based on the Natural Language Processing library spaCy: a baseline pipeline, a pretrained model, and a transformer-based model (MobileBERT). Using datasets ranging from 1,000 to 20,000 words, the best-performing model (pretrained model with 20,000 words) achieved 83.2% UAS and 74.2% LAS. The model performs better at identifying structural relations than at labeling specific dependency relations. There is a consistent 9-point gap between UAS and LAS across the different structural levels, including the word, the phrase, the clause, and the complex sentence. While the model shows high accuracy in morphologically marked local relations and morphological feature recognition (often over 90%), its accuracy is lower with long-distance dependencies and complex syntactic structures. Particularly problematic areas include non-projective dependencies, fixed expressions, copulative constructions, and double object constructions. The conclusion is reached that improving parsing accuracy will require larger training datasets and a fine-grained analysis of complex syntactic relations that is compatible with the strong performance reached in morphological feature recognition.

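For readers unfamiliar with the two metrics named in the abstract, the following is a minimal sketch of how UAS and LAS are commonly computed. It is not the authors' evaluation code: the function name `attachment_scores`, the toy arcs, and the choice to score every token (including punctuation) are assumptions made for illustration only.

```python
from typing import List, Tuple

# One arc per token: (index of the head token, dependency label).
# Head index 0 stands for the artificial root, as in CoNLL-U.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) as fractions between 0 and 1.

    UAS counts tokens whose predicted head matches the gold head.
    LAS counts tokens whose predicted head AND label both match.
    Assumption: every token is scored; some setups skip punctuation.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty sentence")

    correct_heads = sum(g[0] == p[0] for g, p in zip(gold, pred))
    correct_arcs = sum(g == p for g, p in zip(gold, pred))
    total = len(gold)
    return correct_heads / total, correct_arcs / total


# Hypothetical four-token sentence, invented for this example.
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (3, "nmod")]
pred = [(2, "nsubj"), (0, "root"), (2, "iobj"), (2, "nmod")]

uas, las = attachment_scores(gold, pred)
print(f"UAS = {uas:.2f}, LAS = {las:.2f}")
```

In this toy case the third token has the right head but the wrong label, so it counts toward UAS only. This is the kind of error that opens a gap between the two scores, since LAS can never exceed UAS.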
Beyond a Means to an End: A Case Study in Building Phonotactic Corpora for Central Australian Languages
slides
Muradoglu, Saliha and Gray, James and Simpson, Jane Helen and Proctor, Michael and Harvey, Mark

Investigating Gender Bias for Turkish in Multilingual LLMs
Özçelik, Irem and Kurfali, Murathan
Download Paper

Towards an acoustically-validated phonetic corpus of spoken Swedish
O’Regan, Jim and Edlund, Jens
Download Paper