AfriWOZ: Corpus for Exploiting Cross-Lingual Transfer for Dialogue Generation in Low-Resource, African Languages

Adewumi, Tosin (author)
Luleå tekniska universitet, EISLAB, Masakhane
Adeyemi, Mofetoluwa (author)
Masakhane
Anuoluwapo, Aremu (author)
Masakhane
Peters, Bukola (author)
CIS
Buzaaba, Happy (author)
Masakhane
Samuel, Oyerinde (author)
Masakhane
Rufai, Amina Mardiyyah (author)
Masakhane
Ajibade, Benjamin (author)
Masakhane
Gwadabe, Tajudeen (author)
Masakhane
Koulibaly Traore, Mory Moussou (author)
Masakhane
Ajayi, Tunde Oluwaseyi (author)
Masakhane
Muhammad, Shamsuddeen (author)
Baruwa, Ahmed (author)
Masakhane
Owoicho, Paul (author)
Masakhane
Ogunremi, Tolulope (author)
Masakhane
Ngigi, Phylis (author)
Jomo Kenyatta University of Agriculture and Technology
Ahia, Orevaoghene (author)
Masakhane
Nasir, Ruqayya (author)
Masakhane
Liwicki, Foteini (author)
Luleå tekniska universitet, EISLAB
Liwicki, Marcus (author)
Luleå tekniska universitet, EISLAB
Institute of Electrical and Electronics Engineers Inc. 2023
English.
In: IJCNN 2023 - International Joint Conference on Neural Networks, Conference Proceedings. Institute of Electrical and Electronics Engineers Inc. ISBN 9781665488686, 9781665488679
  • Conference paper (peer-reviewed)
Abstract

Dialogue generation is an important NLP task fraught with many challenges. The challenges become more daunting for low-resource African languages. To enable the creation of dialogue agents for African languages, we contribute the first high-quality dialogue datasets for 6 African languages: Swahili, Wolof, Hausa, Nigerian Pidgin English, Kinyarwanda and Yorùbá. There are a total of 9,000 turns, each language having 1,500 turns, which we translate from a portion of the English multi-domain MultiWOZ dataset. Subsequently, we benchmark by investigating and analyzing the effectiveness of modelling through transfer learning by utilizing state-of-the-art (SoTA) deep monolingual models: DialoGPT and BlenderBot. We compare the models with a simple seq2seq baseline using perplexity. Besides this, we conduct human evaluation of single-turn conversations by using majority votes and measure inter-annotator agreement (IAA). We find that the hypothesis that deep monolingual models learn some abstractions that generalize across languages holds. We observe human-like conversations, to different degrees, in 5 out of the 6 languages. The language with the most transferable properties is Nigerian Pidgin English, with a human-likeness score of 78.1%, of which 34.4% are unanimous. We freely provide the datasets and host the model checkpoints/demos on the HuggingFace hub for public access.

Subject headings

NATURAL SCIENCES -- Computer and Information Sciences -- Language Technology (hsv//eng)
NATURAL SCIENCES -- Computer and Information Sciences -- Computer Sciences (hsv//eng)

Keyword

crosslingual
dialogue systems
low-resource
multilingual
NLG
Machine Learning

Publication and Content Type

ref (subject category)
kon (subject category)
