RESOURCEFUL-2023

Speakers

Jörg Tiedemann

University of Helsinki

Bio

Jörg Tiedemann is professor of language technology at the Department of Digital Humanities at the University of Helsinki. He received his PhD in computational linguistics from Uppsala University for work on bitext alignment and machine translation, then moved to the University of Groningen for five years of post-doctoral research on question answering and information extraction. His main research interests are massively multilingual data sets and data-driven natural language processing, and he currently runs an ERC-funded project on representation learning and natural language understanding.

Talk: Democratizing Machine Translation with OPUS and OPUS-MT

The demand for translation is ever growing and this trend will not stop. Being able to access the same kind of information is a fundamental prerequisite for equality in society, and translation plays a crucial role in fighting discrimination based on language barriers. Efficient tools and better coverage of the world's linguistic diversity are necessary to cope with the amount of material that needs to be handled. Our mission is to support the development of high-quality tools for automatic and computer-assisted translation by providing open services and resources that are independent of commercial interests and profit-driven companies. Equal information access is a human right, not a privilege for people who can pay for it. In this talk I will discuss the current state of OPUS-MT, our project on open neural machine translation, and the challenges that we try to tackle with multilingual NLP, transfer learning and data augmentation. I will report on ongoing work on knowledge distillation, the creation of compact models for real-time translation, and our work on the modularization of neural MT.

Darja Fišer

Institute of Contemporary History, Ljubljana

Bio

Darja Fišer is Executive Director of CLARIN. She has a background in corpus linguistics and language resource creation. She has been Associate Professor at the Faculty of Arts, University of Ljubljana, since 2019, Senior Research Fellow at the Institute of Contemporary History since 2021, and is leading the new national research programme for Digital Humanities in Slovenia. She is also serving as a member of the Scientific Advisory Board of the Austrian Centre for Digital Humanities at the Austrian Academy of Sciences, the National Interdisciplinary Research E-Infrastructure for Bulgarian Language and Cultural Heritage Resources and Technologies, and the Czech National Corpus research infrastructure of the Institute of the Czech National Corpus at Charles University.

Talk: The role of the CLARIN research infrastructure in the era of data-intensive language studies

Advances in digitization and datafication have been transformative for linguistics and other disciplines that work with language materials. This has increased the need for research infrastructures that support the development, documentation, archiving, dissemination, reuse and citation of language resources and tools, which is a prerequisite for verifiable and reproducible research. In this talk I will present the recent achievements and ongoing work of the CLARIN research infrastructure, which is based on the Open Science paradigm and the FAIR data principles. It provides easy and sustainable access to digital language data and offers advanced tools to discover, explore, annotate, analyse, and combine such datasets, wherever they are located. This is enabled through a networked federation of centres (language data repositories, service centres, and knowledge centres) with single sign-on access for all members of the academic community in all participating countries. Tools, data and metadata from different centres are interoperable, so that data collections can be combined and tools from different sources can be chained to perform operations at different levels of complexity.