RESOURCEFUL-2026

Mark Fishel

Institute of Computer Science, University of Tartu, Estonia

Bio

Mark Fishel is a professor of natural language processing at the Institute of Computer Science, University of Tartu, Estonia. After defending his PhD in 2011, he was a post-doctoral researcher at the University of Zurich until 2015. Since then, he has been the head of the Chair of NLP at the University of Tartu, where he does research, collaboration, teaching and supervision. His main research topics are machine translation and multilingual language models, low-resource methods in text and speech processing, and model uncertainty and interpretability. He is a PI at the Estonian Center of Excellence in AI and a member of the Estonian Language Council.

Talk: Translating and modelling under-resourced languages and dialects, and how (not) to do it

Most languages and dialects of the world have few speakers, limited text and speech resources, and little support in existing language and speech processing tools. Interestingly, these three measures do not necessarily correlate. This talk will highlight the challenges and pitfalls of working with extremely under-resourced languages and dialects, including the diversity in how these are used and spoken -- and how, when modelling them with NLP, ignoring this diversity can actively harm the language itself. I will present our work over the last 5 years on collecting resources and developing language, translation and speech models for languages and dialects of the Finno-Ugric family, whose members range from mid-resourced languages with millions of speakers to extremely under-resourced and critically endangered cases. This ongoing work has been a source of diverse experience, which can be transferred to other language families and similar efforts.

Tiago Torrent

Federal University of Juiz de Fora, Brazil

Bio

Tiago Torrent is a Professor of Linguistics at the Portuguese Language and Linguistics Department of the Federal University of Juiz de Fora, in Brazil. He earned a doctoral degree in Cognitive Linguistics from the Federal University of Rio de Janeiro, with a Visiting Researcher period at the Linguistics Department of the University of California, Berkeley. Since 2010, he has been the head of the FrameNet Brasil Computational Linguistics Lab, conducting research on Multimodal NLP and AI with Semantic Frames and Grammatical Constructions. He was awarded the ABRALIN Language Technology and Innovation Prize by the Brazilian Linguistics Association, and is a Research Productivity Grantee of the Brazilian National Research Council for Scientific and Technological Development.

Talk: Interpretability at Scale: Resources meet AI for Societally Grounded Applications

The recent success of Large Language Models (LLMs) across a wide range of canonical Natural Language Processing (NLP) tasks has contributed to the perception that such models may, in principle, obviate the need for costly, labor-intensive linguistic resources. However, for both fine-grained language analysis and societally grounded applications, reliance on LLMs alone often falls short in terms of transparency, interpretability, and methodological accountability. In this talk, I argue that LLMs and curated linguistic resources should be viewed not as competing paradigms, but as complementary components of a scalable and interpretable AI ecosystem. I explore how their integration can support the expansion of human-curated, fine-grained semantic analysis while preserving analytical rigor. In particular, I present ongoing work on the semi-automation of frame-based semantic annotation, aimed at building AI systems that detect underreporting and provide early warnings of gender-based violence in electronic medical records. I conclude by outlining a set of methodological guidelines for enhancing the scalability of linguistic resources without compromising interpretability, with broader implications for the development of responsible and societally impactful NLP systems.

Maria Gavriilidou

Institute for Language and Speech Processing (ILSP) / Athena Research Center, Athens, Greece

Bio

Maria Gavriilidou is a linguist and senior researcher at the Institute for Language and Speech Processing (ILSP) / Athena Research Center, Athens, Greece. Her research interests focus on corpus linguistics, digital language resources and metadata, infrastructures for language resources and language technologies, and computational and electronic lexicography. She is Deputy Director of the CLARIN:EL Infrastructure for Language Resources and Technologies, and has also served as the Head of the Electronic Lexicography Department of ILSP. She has coordinated several European and national projects in these fields. She has taught Electronic Lexicography in various postgraduate courses at universities in Athens and supervised numerous theses. She has published widely in conferences and journals, and has served as the scientific lead of various printed and electronic dictionaries, including the reference dictionary used in secondary education in Greece.

Talk: Language resources infrastructures in the LLMs era: role, scope, correlation and future perspectives

Language resources infrastructures act as aggregators and distributors of language data and related processing tools and services, either to the research community or to industry. Infrastructures collect, document, curate and distribute certified digital language resources, tools and language processing services. They serve communities mainly specializing in Language Studies, Language Technology, Digital Humanities, and Social and Political Sciences, which have been highly affected by the widespread adoption of AI and specifically LLMs. I will discuss whether and how the appearance of LLMs has affected language resource infrastructures, and whether their role and scope need to change in order to remain true to their mandate and to successfully serve the best interests of their users, while at the same time adapting to scientific developments in the field.