Workshop description

The workshop is a continuation of the workshop series RESOURCEFUL, focusing on RESOURCEs and representations For Under-resourced Languages. The main goal of the workshop is to continue exploring the role of the type and quality of resources that are available to computational linguists as well as challenges and directions for constructing new resources in light of the latest trends in natural language processing, computational linguistics and artificial intelligence.

On the one hand, data-driven machine learning techniques in natural language processing have achieved remarkable performance in many tasks, but to do so, large quantities of quality data (mostly text) are required. One question that has been raised is whether text-only data is enough to capture semantics, or whether other modalities such as images, sounds, situated context or embodiment are required. Interpretability studies of large language models have revealed that even with large datasets the models still do not cover all the contexts of human social activity and are prone to capturing unwanted bias where the data is skewed towards only some contexts. Collecting, managing and understanding linguistic data in the age of machine learning is challenging, and different tools are required to address these questions.

On the other hand, expert-driven annotator-based resources have been constructed over the years based on theoretical work in linguistics, psychology and related fields, and a large amount of work has been done both theoretically and practically. One challenge is understanding to what degree such resources, which have traditionally been aimed at rule-based natural language processing approaches, are relevant today for both machine learning techniques and neuro-symbolic methods. Both types of resources are used by computational linguists. How can they be adapted for one another? To what degree can data-driven approaches be used to facilitate expert-driven annotation? What are the current challenges for expert-based annotation and data-driven methods? How can crowdsourcing and citizen science be used in building resources? How can we evaluate and reduce unwanted bias? What is required from each type of resource to evaluate how machine learning techniques work for different linguistic tasks?

Intended participants are researchers, PhD students and practitioners from diverse backgrounds (linguistics, psychology, computational linguistics, speech, computer science, machine learning, computer vision, etc.). We foresee an interactive workshop with plenty of time for discussion, complemented by invited talks and presentations of ongoing or completed research.

Submission

To be announced!

Important dates

To be announced!

Organising team

Špela Arhar Holdt, University of Ljubljana, arhar.spela@gmail.com
Micaella Bruton, Uppsala University, micaella.bruton@ling.su.se
Nikolai Ilinykh, CLASP, University of Gothenburg, nikolai.ilinykh@gu.se
Crina Madalina Tudor, Uppsala University, crina.tudor@ling.su.se
Iben Nyholm Debess, The University of the Faroe Islands, IbenND@setur.fo
Barbara Scalvini, Leiden University, barbaras@setur.fo

Partner organisers

To be announced!

Contact

E-mail: resourceful-2025 at listserv dot gu dot se