PRINCIPLE project

Welcome to the PRINCIPLE Project website

PRINCIPLE stands for Providing Resources in Irish, Norwegian, Croatian and Icelandic for Purposes of Language Engineering and is implemented by a five-member consortium.

Consortium Members: Dublin City University (Project Coordinator); University of Iceland; Faculty of Humanities and Social Sciences, University of Zagreb; National Library of Norway; Iconic Translation Machines Ltd.

PRINCIPLE is a two-year project funded by the Connecting Europe Facility (CEF) (Action 2018-EU-IA-0050, Grant Agreement No. INEA/CEF/ICT/A2018/1761837). Its main aim is to identify, collect and process high-quality Language Resources (LRs) for four under-resourced European languages:

  • Croatian
  • Icelandic
  • Irish
  • Norwegian (Bokmål and Nynorsk)

 The project started in September 2019 and will finish in August 2021. 

PRINCIPLE will produce these high-quality curated LRs in order to improve translation quality in the Digital Service Infrastructures of eJustice and eProcurement via domain-specific Machine Translation (MT) systems (CEF eTranslation engines).

These high-quality LRs will be identified through the following process:  

  • A number of national bodies and local stakeholders across Croatia, Iceland, Ireland and Norway have agreed to provide LRs to the PRINCIPLE consortium and become ‘early adopters’.
  • Iconic Translation Machines will develop neural MT engines from the donated Language Resources in order to verify the quality of the resources. The MT systems will be provided to the project’s ‘early adopters’ for the duration of the project so that quality can be validated in real user scenarios and feedback gathered.
  • The Language Resources will then be provided to the CEF eTranslation engines through the ELRC-SHARE portal.

Outline of PRINCIPLE Project

  1. Language Resources (LRs) in the specific domains of eJustice and eProcurement are collected from data holders and ‘early adopters’ in each of the four countries involved.
  2. Machine Translation (MT) systems are produced from these LRs and evaluated to ensure high-quality output.
  3. The MT systems are provided to ‘early adopters’ free of charge for the duration of the project.
  4. ‘Early adopters’ use the MT systems and provide feedback.
  5. Based on the MT output evaluation and early adopter feedback, high-quality LRs are identified.
  6. High-quality parallel LRs are uploaded to the ELRC-SHARE portal in order to improve the automated translation system eTranslation.

The key activities in PRINCIPLE are listed below. They will be rolled out in specific phases over the duration of the project.

  • Activity 1: Project Implementation
  • Activity 2: Use-case analysis, Data Requirements and Data Preparation
  • Activity 3: Development, evaluation and deployment of MT systems
  • Activity 4: Identification, Collection & Consolidation of Language Resources
  • Activity 5: Exploitation & Sustainability
  • Activity 6: Dissemination

Language resources have been collected from a number of data holders and early adopters across the four countries. This data has been analysed and prepared for MT development based on agreed use cases. Iconic Translation Machines has created bespoke neural MT systems for 10 early adopters:

  • National University of Ireland Galway (Ireland)
  • CIKLOPEA D.O.O. (Croatia) 
  • Ministry of Foreign Affairs (Iceland)
  • Standards Norway 
  • Ministry of Foreign Affairs (Norway)
  • Rannóg an Aistriúcháin (the Translation Section of the Houses of the Oireachtas)
  • Foras na Gaeilge
  • Ministry of Foreign and European Affairs (Croatia)
  • Icelandic Standards
  • Icelandic Meteorology Office

 Data identified during the development of these engines as being of “high quality” will be uploaded to the ELRC-SHARE repository in June 2021.

Events

Dates for workshops:

To be confirmed in Q2 2021

Dates for conferences where PRINCIPLE was/will be represented:

PRINCIPLE was presented at the poster session of the XVII Machine Translation Summit, which was held at Dublin City University (Ireland) on 19-23 August 2019. The accompanying paper was published in the conference proceedings: www.aclweb.org/anthology/W19-6718.pdf

PRINCIPLE was presented at EAMT 2020 in November 2020. A 2-page paper entitled “Progress of the PRINCIPLE Project: Promoting MT for Croatian, Icelandic, Irish and Norwegian” has been published in the proceedings of the 2020 EAMT Conference and is available here (pages 465-466): https://eamt2020.inesc-id.pt/proceedings-eamt2020.pdf

PRINCIPLE was presented at the virtual poster session of META-FORUM 2020 on 3 December 2020.

PRINCIPLE was invited to give a presentation at the 5th ELRC conference on 10 March 2021. The presentation is available here.

Contact

Dublin City University (Ireland) – Andy Way, Project Coordinator (PC), andy.way@adaptcentre.ie

Iconic Translation Machines Ltd. (Ireland) – Dana Davis Sheridan, dana@iconictranslation.com

Faculty of Humanities and Social Sciences, University of Zagreb (Croatia) – Petra Bago, Data Collection Coordinator (DCC), pbago@ffzg.hr

University of Iceland (Iceland) – Gauti Kristmannsson, gautikri@hi.is 

National Library of Norway (Norway) – Jon Arild Olsen, jon.olsen@nb.no