Co-located with the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025), Tallinn, Estonia.
News
First CfP is out!
Workshop description
The workshop is a continuation of the workshop series RESOURCEFUL, focusing on RESOURCEs and representations For Under-resourced Languages. The main goal of the workshop is to continue exploring the role of the type and quality of resources that are available to computational linguists as well as challenges and directions for constructing new resources in light of the latest trends in natural language processing, computational linguistics and artificial intelligence.
On the one hand, data-driven machine learning techniques in natural language processing have achieved remarkable performance in many tasks, but to do so, large quantities of quality data (mostly text) are required. One question that has been raised is whether text-only data is enough to capture semantics, or whether other modalities such as images, sounds, situated context or embodiment are required. Interpretability studies of large language models have revealed that even with large datasets the models still do not cover all the contexts of human social activity and are prone to capturing unwanted bias where data is skewed towards only some contexts. Collecting, managing and understanding linguistic data in the age of machine learning is challenging and different tools are required to address these questions.
On the other hand, expert-driven annotator-based resources have been constructed over the years based on theoretical work in linguistics, psychology and related fields, and a large amount of work has been done both theoretically and practically. One challenge is understanding to what degree such resources, which have traditionally been aimed at rule-based natural language processing approaches, are relevant today for both machine learning techniques and neuro-symbolic methods. Both types of resources are used by computational linguists. How can they be adapted for one another? To what degree can data-driven approaches be used to facilitate expert-driven annotation? What are the current challenges for expert-based annotation and data-driven methods? How can crowdsourcing and citizen science be used in building resources? How can we evaluate and reduce unwanted bias? What is required from each type of resource to evaluate how machine learning techniques work for different linguistic tasks?
Intended participants are researchers, PhD students and practitioners from diverse backgrounds (linguistics, psychology, computational linguistics, speech, computer science, machine learning, computer vision, etc.). We foresee an interactive workshop with plenty of time for discussion, complemented with invited talks and presentations of ongoing or completed research.
Topics of interest
We would like to open a forum by bringing together students, researchers, and experts to address and discuss the following points:
- The types of linguistic knowledge that should be captured by the models across different contexts and tasks.
- Practical methods for sampling and extracting knowledge.
- Relevance of traditional NLP resources for use in data-driven approaches.
- Use of data-driven approaches to enhance expert-driven annotation processes.
- Current challenges faced in expert-based annotation.
- Crowdsourcing and citizen science initiatives to build and enrich linguistic resources.
- Methods to evaluate and mitigate unwanted biases in linguistic models and data.
- Creating anonymised and pseudonymised datasets and models.
- Evaluating the role of modern LLMs in the creation of new linguistic resources.
Contact
E-mail: resourceful at listserv dot gu dot se