SwePub

Search results for the query "WFRF:(Aremu Anuoluwapo)"

  • Results 1-5 of 5
1.
  • Adelani, David, et al. (authors)
  • A Few Thousand Translations Go A Long Way! Leveraging Pre-trained Models for African News Translation
  • 2022
  • In: NAACL 2022. Stroudsburg: Association for Computational Linguistics. ISBN 9781955917711, pp. 3053-3070
  • Conference paper (peer-reviewed), abstract:
    • Recent advances in the pre-training of language models leverage large-scale datasets to create multilingual models. However, low-resource languages are mostly left out in these datasets. This is primarily because many widely spoken languages are not well represented on the web and therefore excluded from the large-scale crawls used to create datasets. Furthermore, downstream users of these models are restricted to the selection of languages originally chosen for pre-training. This work investigates how to optimally leverage existing pre-trained models to create low-resource translation systems for 16 African languages. We focus on two questions: 1) How can pre-trained models be used for languages not included in the initial pre-training? and 2) How can the resulting translation models effectively transfer to new domains? To answer these questions, we create a new African news corpus covering 16 languages, of which eight languages are not part of any existing evaluation dataset. We demonstrate that the most effective strategy for transferring both to additional languages and to additional domains is to fine-tune large pre-trained models on small quantities of high-quality translation data.
2.
  • Adelani, David Ifeoluwa, et al. (authors)
  • MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition
  • 2022
  • In: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics (ACL), pp. 4488-4508
  • Conference paper (peer-reviewed), abstract:
    • African languages are spoken by over a billion people, but are underrepresented in NLP research and development. The challenges impeding progress include the limited availability of annotated datasets, as well as a lack of understanding of the settings where current methods are effective. In this paper, we make progress towards solutions for these challenges, focusing on the task of named entity recognition (NER). We create the largest human-annotated NER dataset for 20 African languages, and we study the behavior of state-of-the-art cross-lingual transfer methods in an Africa-centric setting, demonstrating that the choice of source language significantly affects performance. We show that choosing the best transfer language improves zero-shot F1 scores by an average of 14 points across 20 languages compared to using English. Our results highlight the need for benchmark datasets and models that cover typologically-diverse African languages.
3.
  • Adelani, David Ifeoluwa, et al. (authors)
  • MasakhaNER: Named Entity Recognition for African Languages
  • 2021
  • In: Transactions of the Association for Computational Linguistics. MIT Press. ISSN 2307-387X, vol. 9, pp. 1116-1131
  • Journal article (peer-reviewed), abstract:
    • We take a step towards addressing the under-representation of the African continent in NLP research by bringing together different stakeholders to create the first large, publicly available, high-quality dataset for named entity recognition (NER) in ten African languages. We detail the characteristics of these languages to help researchers and practitioners better understand the challenges they pose for NER tasks. We analyze our datasets and conduct an extensive empirical evaluation of state-of-the-art methods across both supervised and transfer learning settings. Finally, we release the data, code, and models to inspire future research on African NLP.
4.
  • Adewumi, Tosin, et al. (authors)
  • AfriWOZ: Corpus for Exploiting Cross-Lingual Transfer for Dialogue Generation in Low-Resource, African Languages
  • 2023
  • In: IJCNN 2023 - International Joint Conference on Neural Networks, Conference Proceedings. Institute of Electrical and Electronics Engineers Inc. ISBN 9781665488686, 9781665488679
  • Conference paper (peer-reviewed), abstract:
    • Dialogue generation is an important NLP task fraught with many challenges. The challenges become more daunting for low-resource African languages. To enable the creation of dialogue agents for African languages, we contribute the first high-quality dialogue datasets for 6 African languages: Swahili, Wolof, Hausa, Nigerian Pidgin English, Kinyarwanda & Yorùbá. There are a total of 9,000 turns, each language having 1,500 turns, which we translate from a portion of the English multi-domain MultiWOZ dataset. Subsequently, we benchmark by investigating & analyzing the effectiveness of modelling through transfer learning by utilizing state-of-the-art (SoTA) deep monolingual models: DialoGPT and BlenderBot. We compare the models with a simple seq2seq baseline using perplexity. Besides this, we conduct human evaluation of single-turn conversations by using majority votes and measure inter-annotator agreement (IAA). We find that the hypothesis that deep monolingual models learn some abstractions that generalize across languages holds. We observe human-like conversations, to different degrees, in 5 out of the 6 languages. The language with the most transferable properties is Nigerian Pidgin English, with a human-likeness score of 78.1%, of which 34.4% are unanimous. We freely provide the datasets and host the model checkpoints/demos on the HuggingFace hub for public access.
5.
  • Gehrmann, Sebastian, et al. (authors)
  • The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
  • 2021
  • In: The 1st Workshop on Natural Language Generation, Evaluation, and Metrics. Stroudsburg, PA, USA: Association for Computational Linguistics, pp. 96-120
  • Conference paper (peer-reviewed), abstract:
    • We introduce GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics. Measuring progress in NLG relies on a constantly evolving ecosystem of automated metrics, datasets, and human evaluation standards. Due to this moving target, new models often still evaluate on divergent anglo-centric corpora with well-established, but flawed, metrics. This disconnect makes it challenging to identify the limitations of current models and opportunities for progress. Addressing this limitation, GEM provides an environment in which models can easily be applied to a wide set of tasks and in which evaluation strategies can be tested. Regular updates to the benchmark will help NLG research become more multilingual and evolve the challenge alongside models. This paper serves as the description of the data for the 2021 shared task at the associated GEM Workshop.