SwePub
Search the SwePub database

  Advanced search

Result list for the search "WFRF:(Adewumi Tosin)"

Search: WFRF:(Adewumi Tosin)

  • Results 1-10 of 16
1.
  • Kovács, György, Postdoctoral researcher, 1984-, et al. (author)
  • Pedagogical Principles in the Online Teaching of NLP : A Retrospection
  • 2021
  • In: Teaching NLP. - Stroudsburg, PA, USA: Association for Computational Linguistics (ACL), pp. 1-12
  • Conference paper (peer-reviewed) abstract
    • The ongoing COVID-19 pandemic has brought online education to the forefront of pedagogical discussions. To make this increased interest sustainable in a post-pandemic era, online courses must be built on strong pedagogical foundations. With a long history of pedagogic research, there are many principles, frameworks, and models available to help teachers in doing so. These models cover different teaching perspectives, such as constructive alignment, feedback, and the learning environment. In this paper, we discuss how we designed and implemented our online Natural Language Processing (NLP) course following constructive alignment and adhering to the pedagogical principles of LTU. By examining our course and analyzing student evaluation forms, we show that we have met our goal and successfully delivered the course. Furthermore, we discuss the additional benefits resulting from the current mode of delivery, including the increased reusability of course content and increased potential for collaboration between universities. Lastly, we also discuss where we can and will further improve the current course design.
  •  
2.
  • Adelani, David Ifeoluwa, et al. (author)
  • MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition
  • 2022
  • In: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. - Association for Computational Linguistics (ACL), pp. 4488-4508
  • Conference paper (peer-reviewed) abstract
    • African languages are spoken by over a billion people, but are underrepresented in NLP research and development. The challenges impeding progress include the limited availability of annotated datasets, as well as a lack of understanding of the settings where current methods are effective. In this paper, we make progress towards solutions for these challenges, focusing on the task of named entity recognition (NER). We create the largest human-annotated NER dataset for 20 African languages, and we study the behavior of state-of-the-art cross-lingual transfer methods in an Africa-centric setting, demonstrating that the choice of source language significantly affects performance. We show that choosing the best transfer language improves zero-shot F1 scores by an average of 14 points across 20 languages compared to using English. Our results highlight the need for benchmark datasets and models that cover typologically-diverse African languages.
  •  
3.
  • Adelani, David Ifeoluwa, et al. (author)
  • MasakhaNER: Named Entity Recognition for African Languages
  • 2021
  • In: Transactions of the Association for Computational Linguistics. - MIT Press. - ISSN 2307-387X; 9, pp. 1116-1131
  • Journal article (peer-reviewed) abstract
    • We take a step towards addressing the under-representation of the African continent in NLP research by bringing together different stakeholders to create the first large, publicly available, high-quality dataset for named entity recognition (NER) in ten African languages. We detail the characteristics of these languages to help researchers and practitioners better understand the challenges they pose for NER tasks. We analyze our datasets and conduct an extensive empirical evaluation of state-of-the-art methods across both supervised and transfer learning settings. Finally, we release the data, code, and models to inspire future research on African NLP.
  •  
4.
  • Adewumi, Tosin, et al. (author)
  • AfriWOZ: Corpus for Exploiting Cross-Lingual Transfer for Dialogue Generation in Low-Resource, African Languages
  • 2023
  • In: IJCNN 2023 - International Joint Conference on Neural Networks, Conference Proceedings. - Institute of Electrical and Electronics Engineers Inc. - ISBN 9781665488686, 9781665488679
  • Conference paper (peer-reviewed) abstract
    • Dialogue generation is an important NLP task fraught with many challenges. The challenges become more daunting for low-resource African languages. To enable the creation of dialogue agents for African languages, we contribute the first high-quality dialogue datasets for 6 African languages: Swahili, Wolof, Hausa, Nigerian Pidgin English, Kinyarwanda & Yorùbá. There are a total of 9,000 turns, each language having 1,500 turns, which we translate from a portion of the English multi-domain MultiWOZ dataset. Subsequently, we benchmark by investigating & analyzing the effectiveness of modelling through transfer learning by utilizing state-of-the-art (SoTA) deep monolingual models: DialoGPT and BlenderBot. We compare the models with a simple seq2seq baseline using perplexity. Besides this, we conduct human evaluation of single-turn conversations by using majority votes and measure inter-annotator agreement (IAA). We find that the hypothesis that deep monolingual models learn some abstractions that generalize across languages holds. We observe human-like conversations, to different degrees, in 5 out of the 6 languages. The language with the most transferable properties is Nigerian Pidgin English, with a human-likeness score of 78.1%, of which 34.4% are unanimous. We freely provide the datasets and host the model checkpoints/demos on the HuggingFace hub for public access.
  •  
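The abstract above benchmarks dialogue models against a seq2seq baseline using perplexity. As a minimal sketch (with illustrative values, not the paper's actual losses), perplexity is conventionally the exponential of the mean per-token negative log-likelihood:

```python
import math

def perplexity(neg_log_likelihoods):
    """Perplexity from per-token negative log-likelihoods (natural log).
    Lower perplexity indicates the model finds the text less surprising;
    this is the usual basis for comparing dialogue models."""
    return math.exp(sum(neg_log_likelihoods) / len(neg_log_likelihoods))

# Illustrative token losses only (not values from the paper).
print(perplexity([2.1, 1.8, 2.4]))
```

A model that assigns probability 1 to every token (zero loss) reaches the minimum perplexity of 1.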
5.
  • Adewumi, Tosin, 1978-, et al. (author)
  • Bipol : Multi-axes Evaluation of Bias with Explainability in Benchmark Datasets
  • 2023
  • In: Proceedings of Recent Advances in Natural Language Processing. - Incoma Ltd., pp. 1-10
  • Conference paper (peer-reviewed) abstract
    • We investigate five English NLP benchmark datasets (on the superGLUE leaderboard) and two Swedish datasets for bias, along multiple axes. The datasets are the following: Boolean Question (Boolq), CommitmentBank (CB), Winograd Schema Challenge (WSC), Winogender diagnostic (AXg), Recognising Textual Entailment (RTE), Swedish CB, and SWEDN. Bias can be harmful and it is known to be common in data, which ML models learn from. In order to mitigate bias in data, it is crucial to be able to estimate it objectively. We use bipol, a novel multi-axes bias metric with explainability, to estimate and explain how much bias exists in these datasets. Multilingual, multi-axes bias evaluation is not very common. Hence, we also contribute a new, large Swedish bias-labeled dataset (of 2 million samples), translated from the English version and train the SotA mT5 model on it. In addition, we contribute new multi-axes lexica for bias detection in Swedish. We make the codes, model, and new dataset publicly available.
  •  
6.
  • Adewumi, Tosin, 1978-, et al. (author)
  • ML_LTU at SemEval-2022 Task 4: T5 Towards Identifying Patronizing and Condescending Language
  • 2022
  • In: Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022). - Association for Computational Linguistics, pp. 473-478
  • Conference paper (peer-reviewed) abstract
    • This paper describes the system used by the Machine Learning Group of LTU in subtask 1 of the SemEval-2022 Task 4: Patronizing and Condescending Language (PCL) Detection. Our system consists of finetuning a pretrained text-to-text transfer transformer (T5) and innovatively reducing its out-of-class predictions. The main contributions of this paper are 1) the description of the implementation details of the T5 model we used, 2) analysis of the successes & struggles of the model in this task, and 3) ablation studies beyond the official submission to ascertain the relative importance of data split. Our model achieves an F1 score of 0.5452 on the official test set.
  •  
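The F1 score reported above is the harmonic mean of precision and recall. A minimal sketch with made-up counts (not the paper's actual confusion matrix):

```python
def f1_score(tp, fp, fn):
    """Binary F1 from true-positive, false-positive, and
    false-negative counts: the harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Illustrative counts only: precision 0.8, recall 2/3.
print(round(f1_score(tp=40, fp=10, fn=20), 4))
```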
7.
  • Adewumi, Tosin P., 1978-, et al. (author)
  • The Challenge of Diacritics in Yorùbá Embeddings
  • 2020
  • In: ML4D 2020 Proceedings. - Neural Information Processing Systems Foundation.
  • Conference paper (peer-reviewed) abstract
    • The major contributions of this work include the empirical establishment of a better performance for Yoruba embeddings from an undiacritized (normalized) dataset and the provision of new analogy sets for evaluation. The Yoruba language, being a tonal language, utilizes diacritics (tonal marks) in written form. We show that this affects embedding performance by creating embeddings from exactly the same Wikipedia dataset, but with the second one normalized to be undiacritized. We further compare average intrinsic performance with two other works (using an analogy test set & WordSim) and obtain the best performance in WordSim and the corresponding Spearman correlation.
  •  
8.
  • Adewumi, Tosin P., 1978-, et al. (author)
  • Vector Representations of Idioms in Chatbots
  • 2020
  • In: Proceedings. - Chalmers University of Technology.
  • Conference paper (peer-reviewed) abstract
    • Open-domain chatbots have advanced but still have many gaps. My PhD aims to solve a few of those gaps by creating vector representations of idioms (figures of speech) that will be beneficial to chatbots and natural language processing (NLP), generally. In the process, new, optimal fastText embeddings in Swedish and English have been created and the first Swedish analogy test set, larger than the Google original, for intrinsic evaluation of Swedish embeddings has also been produced. Major milestones have been attained and others are soon to follow. The deliverables of this project will give NLP researchers the opportunity to measure the quality of Swedish embeddings easily and advance state-of-the-art (SotA) in NLP.
  •  
9.
  • Adewumi, Tosin, 1978-, et al. (author)
  • Potential Idiomatic Expression (PIE)-English: Corpus for Classes of Idioms
  • 2022
  • In: Proceedings of the 13th Language Resources and Evaluation Conference. - European Language Resources Association (ELRA), pp. 689-696
  • Conference paper (peer-reviewed) abstract
    • We present a fairly large, Potential Idiomatic Expression (PIE) dataset for Natural Language Processing (NLP) in English. The challenges with NLP systems with regard to tasks such as Machine Translation (MT), word sense disambiguation (WSD) and information retrieval make it imperative to have a labelled idioms dataset with classes such as it is in this work. To the best of the authors’ knowledge, this is the first idioms corpus with classes of idioms beyond the literal and the general idioms classification. In particular, the following classes are labelled in the dataset: metaphor, simile, euphemism, parallelism, personification, oxymoron, paradox, hyperbole, irony and literal. We obtain an overall inter-annotator agreement (IAA) score, between two independent annotators, of 88.89%. Many past efforts have been limited in the corpus size and classes of samples but this dataset contains over 20,100 samples with almost 1,200 cases of idioms (with their meanings) from 10 classes (or senses). The corpus may also be extended by researchers to meet specific needs. The corpus has part of speech (PoS) tagging from the NLTK library. Classification experiments performed on the corpus to obtain a baseline and comparison among three common models, including the state-of-the-art (SoTA) BERT model, give good results. We also make publicly available the corpus and the relevant codes for working with it for NLP tasks.
  •  
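The abstract above reports an IAA of 88.89% between two independent annotators. As a rough sketch (with made-up labels, not the PIE-English annotations), simple percent agreement between two annotators can be computed like this:

```python
def percent_agreement(labels_a, labels_b):
    """Simple inter-annotator agreement: the percentage of items on
    which two independent annotators assign the same class label.
    (Unlike Cohen's kappa, this does not correct for chance agreement.)"""
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return 100 * matches / len(labels_a)

# Illustrative labels only, drawn from the idiom classes named above.
a = ["metaphor", "simile", "literal", "irony", "hyperbole",
     "literal", "euphemism", "paradox", "literal"]
b = ["metaphor", "simile", "literal", "irony", "hyperbole",
     "oxymoron", "euphemism", "paradox", "literal"]
print(round(percent_agreement(a, b), 2))  # 8 of 9 agree -> 88.89
```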
10.
  • Adewumi, Tosin, 1978-, et al. (author)
  • State-of-the-Art in Open-Domain Conversational AI: A Survey
  • 2022
  • In: Information. - MDPI. - ISSN 2078-2489; 13:6
  • Research review (peer-reviewed) abstract
    • We survey SoTA open-domain conversational AI models with the objective of presenting the prevailing challenges that still exist to spur future research. In addition, we provide statistics on the gender of conversational AI in order to guide the ethics discussion surrounding the issue. Open-domain conversational AI models are known to have several challenges, including bland, repetitive responses and performance degradation when prompted with figurative language, among others. First, we provide some background by discussing some topics of interest in conversational AI. We then discuss the method applied to the two investigations carried out that make up this study. The first investigation involves a search for recent SoTA open-domain conversational AI models, while the second involves the search for 100 conversational AI to assess their gender. Results of the survey show that progress has been made with recent SoTA conversational AI, but there are still persistent challenges that need to be solved, and the female gender is more common than the male for conversational AI. One main takeaway is that hybrid models of conversational AI offer more advantages than any single architecture. The key contributions of this survey are (1) the identification of prevailing challenges in SoTA open-domain conversational AI, (2) the rarely held discussion on open-domain conversational AI for low-resource languages, and (3) the discussion about the ethics surrounding the gender of conversational AI.
  •  