RESOURCEFUL-2025

Co-located with the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025), Tallinn, Estonia.

Important information for participants (to be continuously updated)

Room name: LÄÄNE-EUROOPA, first floor from the lobby entry, https://www.hestiahotels.com/europa/en/conference/ruum/
Poster Session Location: 1st floor (ground floor is E floor) from the lobby area near the registration desk.

Posters can be mounted starting early morning on March 2. Mounting materials are provided. Poster stands are labelled with a workshop name; feel free to place your poster on any stand labelled RESOURCEFUL.
Posters should be removed by the end of the day on March 2.

Registration is open in the lobby of the venue.

Some lunch options:

Lunch is not provided on the workshop days
The conference venue has a restaurant that possibly has a soup buffet; there is also a general menu (but it could get too crowded and slow)
Rotermanni kvartal (approximately 250 meters away) has many good restaurants
Vapiano
Nautica shopping centre (approximately 50 meters away)
Port area: Restaurant Armudu, Kochi Aidad, etc.

We can issue a certificate of attendance and/or a certificate of presentation for RESOURCEFUL 2025 upon request. If you need one, please email resourceful [at] listserv [dot] gu [dot] se. In your email subject, specify either “RESOURCEFUL 2025 – Certificate of Attendance Request” or “RESOURCEFUL 2025 – Certificate of Presentation Request”. Additionally, include the names of the authors, the title(s) of the paper(s), and the presentation mode in your email. Describe any additional requests in your email. For certificates of attendance or presentation for NoDaLiDa 2025, please contact the local organisers of NoDaLiDa 2025.

The proceedings are now available in the University of Tartu Library, and they can be accessed here

Proceedings will also be published in the ACL Anthology closer to the date of the workshop.

Workshop description

The workshop is a continuation of the workshop series RESOURCEFUL, focusing on RESOURCEs and representations For Under-resourced Languages. The main goal of the workshop is to continue exploring the role of the type and quality of resources that are available to computational linguists as well as challenges and directions for constructing new resources in light of the latest trends in natural language processing, computational linguistics and artificial intelligence.

On the one hand, data-driven machine learning techniques in natural language processing have achieved remarkable performance in many tasks, but to do so, large quantities of quality data (mostly text) are required. One question that has been raised is whether text-only data is enough to capture semantics or whether other modalities such as images, sounds, situated context or embodiment are required. Interpretability studies of large language models have revealed that even with large datasets the models still do not cover all the contexts of human social activity and are prone to capturing unwanted bias where data is focused towards only some contexts. Collecting, managing and understanding linguistic data in the age of machine learning is challenging and different tools are required to address these questions.

On the other hand, expert-driven annotator-based resources have been constructed over the years based on theoretical work in linguistics, psychology and related fields, and a large amount of work has been done both theoretically and practically. One challenge is understanding to what degree such resources, which have traditionally been aimed at rule-based natural language processing approaches, are relevant today for both machine learning techniques and neuro-symbolic methods. Both types of resources are used by computational linguists. How can they be adapted for one another? To what degree can data-driven approaches be used to facilitate expert-driven annotation? What are the current challenges for expert-based annotation and data-driven methods? How can crowdsourcing and citizen science be used in building resources? How can we evaluate and reduce unwanted bias? What is required from each type of resource to evaluate how machine learning techniques work for different linguistic tasks?

Intended participants are researchers, PhD students and practitioners from diverse backgrounds (linguistics, psychology, computational linguistics, speech, computer science, machine learning, computer vision, etc.). We foresee an interactive workshop with plenty of time for discussion, complemented with invited talks and presentations of on-going or completed research.

Topics of interest

We would like to open a forum by bringing together students, researchers, and experts to address and discuss the following points:

The types of linguistic knowledge that should be captured by the models across different contexts and tasks.
Practical methods for sampling and extracting knowledge.
Relevance of traditional NLP resources for use in data-driven approaches.
Use of data-driven approaches to enhance expert-driven annotation processes.
Current challenges faced in expert-based annotation.
Crowdsourcing and citizen science initiatives to build and enrich linguistic resources.
Methods to evaluate and mitigate unwanted biases in linguistic models and data.
Creating anonymised and pseudonymised datasets and models.
Evaluating the role of modern LLMs in the creation of new linguistic resources.

Contact

E-mail: resourceful at listserv dot gu dot se