SwePub
Search the SwePub database

  Advanced search

Results list for the search "WFRF:(Beloucif Meriem)"

Search: WFRF:(Beloucif Meriem)

  • Results 1-9 of 9
1.
  • Beloucif, Meriem, et al. (author)
  • Elvis vs. M. Jackson : Who has More Albums? Classification and Identification of Elements in Comparative Questions
  • 2022
  • In: LREC 2022. European Language Resources Association. ISBN 9791095546726, pp. 3771-3779
  • Conference paper (peer-reviewed). Abstract:
    • Comparative Question Answering (cQA) is the task of providing concrete and accurate responses to queries such as: "Is Lyft cheaper than a regular taxi?" or "What makes a mortgage different from a regular loan?". In this paper, we propose two new open-domain real-world datasets for identifying and labeling comparative questions. While the first dataset contains instances of English questions labeled as comparative vs. non-comparative, the second dataset provides additional labels including the objects and the aspects of comparison. We conduct several experiments that evaluate the soundness of our datasets. The evaluation of our datasets using various classifiers shows promising results that approach human performance on a binary classification task with a neural model using ALBERT embeddings. When approaching the unsupervised sequence labeling task, some headroom remains.
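To make the binary task above concrete, here is a minimal sketch (not the authors' released code; the ALBERT checkpoint and label mapping are illustrative assumptions) of classifying a question as comparative vs. non-comparative:

```python
# Minimal sketch: comparative vs. non-comparative question classification
# with ALBERT. Checkpoint and label mapping are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("albert-base-v2")
model = AutoModelForSequenceClassification.from_pretrained(
    "albert-base-v2",
    num_labels=2,  # assumed mapping: 0 = non-comparative, 1 = comparative
)

question = "Is Lyft cheaper than a regular taxi?"
inputs = tokenizer(question, return_tensors="pt", truncation=True)
with torch.no_grad():
    pred = model(**inputs).logits.argmax(dim=-1).item()
print("comparative" if pred == 1 else "non-comparative")
```

The classification head here is freshly initialized, so it would need to be fine-tuned on the paper's labeled dataset before its output is meaningful.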
2.
  • Beloucif, Meriem, et al. (author)
  • Probing Pre-trained Language Models for Semantic Attributes and their Values
  • 2021
  • In: Findings of the Association for Computational Linguistics: EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 16-20 November 2021. Stroudsburg, PA, USA: Association for Computational Linguistics. ISBN 9781955917100, pp. 2554-2559
  • Conference paper (peer-reviewed). Abstract:
    • Pretrained Language Models (PTLMs) yield state-of-the-art performance on many Natural Language Processing tasks, including syntax, semantics and commonsense reasoning. In this paper, we focus on identifying to what extent PTLMs capture semantic attributes and their values, e.g. the relation between rich and high net worth. We use PTLMs to predict masked tokens using patterns and lists of items from Wikidata in order to verify how well PTLMs encode semantic attributes along with their values. Such semantic inferences are intuitive to humans as part of our language understanding. Since PTLMs are trained on large amounts of Wikipedia data, we would expect them to generate similar predictions. However, our findings reveal that PTLMs still perform much worse than humans on this task. We present an analysis explaining how our methodology can be used to integrate better context and semantics into PTLMs using knowledge bases.
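The masked-token probing described above can be illustrated with a short cloze-style query; this is a minimal sketch, assuming an illustrative model and pattern rather than the paper's exact setup:

```python
# Minimal sketch of cloze-style probing: ask a masked language model to
# fill in an attribute value and inspect its top guesses. Model choice
# and pattern are illustrative assumptions.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
pattern = "Someone with a high net worth is [MASK]."
for prediction in fill(pattern, top_k=5):
    print(f"{prediction['token_str']:>12}  {prediction['score']:.3f}")
```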
3.
  • Beloucif, Meriem, et al. (author)
  • Using Wikidata for Enhancing Compositionality in Pre-trained Language Models
  • 2023
  • In: Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing. INCOMA. ISBN 9789544520922, pp. 170-178
  • Conference paper (peer-reviewed). Abstract:
    • One of the many advantages of pre-trained language models (PLMs) such as BERT and RoBERTa is their flexibility and contextual nature. These features give PLMs strong capabilities for representing lexical semantics. However, PLMs seem incapable of capturing high-level semantics in terms of compositionality. We show that when augmented with the relevant semantic knowledge, PLMs learn to capture a higher degree of lexical compositionality. We annotate a large dataset from Wikidata highlighting a type of semantic inference that is easy for humans to understand but difficult for PLMs, such as the correlation between age and date of birth. We use this resource for fine-tuning DistilBERT, BERT-large and RoBERTa. Our results show that the performance of PLMs on the test data continuously improves when they are augmented with such a rich resource. Our results are corroborated by a consistent improvement over most GLUE benchmark natural language understanding tasks.
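A minimal fine-tuning sketch for the setup described above, assuming a hypothetical CSV export of the Wikidata-derived resource with "text" and "label" columns (this is not the paper's exact training configuration):

```python
# Minimal sketch: fine-tune DistilBERT on a Wikidata-derived resource.
# The CSV file name and two-class label scheme are assumptions.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

# Hypothetical export of the annotated dataset: columns "text", "label".
dataset = load_dataset("csv", data_files="wikidata_inferences.csv")
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, padding="max_length"),
    batched=True,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="plm-compositionality", num_train_epochs=3),
    train_dataset=dataset["train"],
)
trainer.train()
```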
4.
  • Bondarenko, Alexander, et al. (author)
  • Overview of Touché 2022 : Argument Retrieval
  • 2022
  • In: Experimental IR Meets Multilinguality, Multimodality, and Interaction (CLEF 2022). Cham: Springer Nature. ISBN 9783031136436, 9783031136429, pp. 311-336
  • Conference paper (peer-reviewed). Abstract:
    • This paper is a condensed report on the third year of the Touché lab on argument retrieval held at CLEF 2022. With the goal of fostering and supporting the development of technologies for argument mining and argument analysis, we organized three shared tasks in the third edition of Touché: (a) argument retrieval for controversial topics, where participants retrieve a gist of arguments from a collection of online debates, (b) argument retrieval for comparative questions, where participants retrieve argumentative passages from a generic web crawl, and (c) image retrieval for arguments, where participants retrieve images from a focused web crawl that show support for or opposition to some stance.
5.
  • Bruton, Micaella, et al. (author)
  • BERTie Bott's Every Flavor Labels : A Tasty Introduction to Semantic Role Labeling for Galician
  • 2023
  • In: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics. ISBN 9798891760608, pp. 10892-10902
  • Conference paper (peer-reviewed). Abstract:
    • In this paper, we leverage existing corpora, WordNet, and dependency parsing to build the first Galician dataset for training semantic role labeling systems, in an effort to expand available NLP resources. Additionally, we introduce verb indexing, a new pre-processing method, which helps increase performance when semantically parsing highly complex sentences. We use transfer learning to test both the resource and the verb indexing method. Our results show that the effects of verb indexing were amplified in scenarios where the model was both pre-trained and fine-tuned on datasets utilizing the method, but improvements are also noticeable when it is used only during fine-tuning. The best-performing Galician SRL model achieved an F1 score of 0.74, introducing a baseline for future Galician SRL systems. We also tested our method on Spanish, where we achieved an F1 score of 0.83, outperforming the baseline set by the 2009 CoNLL Shared Task by 0.025 and showing the merits of our verb indexing method for pre-processing.
6.
  • Kniele, Annika, et al. (author)
  • Uppsala University at SemEval-2023 Task 12 : Zero-shot Sentiment Classification for Nigerian Pidgin Tweets
  • 2023
  • In: Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023). Association for Computational Linguistics. ISBN 9781959429999, pp. 1491-1497
  • Conference paper (peer-reviewed). Abstract:
    • While sentiment classification has been considered a practically solved task for high-resource languages such as English, the scarcity of data for many languages still makes it a challenging task. The AfriSenti-SemEval shared task aims to classify sentiment on Twitter data for 14 low-resource African languages. In our participation, we focus on Nigerian Pidgin as the target language. We investigate the effect of English monolingual and multilingual pre-trained models on the sentiment classification task for Nigerian Pidgin. Our setup includes zero-shot models (using English, Igbo and Hausa data) and a model fine-tuned on Nigerian Pidgin. Our results show that English fine-tuned models perform slightly better than models fine-tuned on other Nigerian languages, which could be explained by the lexical and structural closeness between Nigerian Pidgin and English. The best results were reported on the monolingual Nigerian Pidgin data. The model pre-trained on English and fine-tuned on Nigerian Pidgin was submitted to Task A Track 4 of the AfriSenti-SemEval Shared Task 12, and ranked 25th out of 32.
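The zero-shot setting described above can be sketched as follows: a sentiment classifier trained only on English tweets is applied unchanged to Nigerian Pidgin input, relying on the lexical overlap between the two languages. The checkpoint is an illustrative public model, not the one used in the paper:

```python
# Minimal zero-shot sketch: an English-tweet sentiment model applied
# directly to Nigerian Pidgin text, with no Pidgin fine-tuning.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="cardiffnlp/twitter-roberta-base-sentiment-latest",  # illustrative
)
print(classifier("Dis food sweet well well"))  # Nigerian Pidgin input
```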
7.
  • Muhammad, Shamsuddeen, et al. (author)
  • AfriSenti : A Twitter Sentiment Analysis Benchmark for African Languages
  • 2023
  • In: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics. ISBN 9798891760608, pp. 13968-13981
  • Conference paper (peer-reviewed). Abstract:
    • Africa is home to over 2,000 languages from over six language families and has the highest linguistic diversity among all continents. This includes 75 languages with at least one million speakers each. Yet, there is little NLP research conducted on African languages. Crucial in enabling such research is the availability of high-quality annotated datasets. In this paper, we introduce AfriSenti, a sentiment analysis benchmark that contains a total of >110,000 tweets in 14 African languages (Amharic, Algerian Arabic, Hausa, Igbo, Kinyarwanda, Moroccan Arabic, Mozambican Portuguese, Nigerian Pidgin, Oromo, Swahili, Tigrinya, Twi, Xitsonga, and Yoruba) from four language families. The tweets were annotated by native speakers and used in the AfriSenti-SemEval shared task (with over 200 participants, see website: https://afrisenti-semeval.github.io). We describe the data collection methodology, annotation process, and the challenges we dealt with when curating each dataset. We further report baseline experiments conducted on the AfriSenti datasets and discuss their usefulness.
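For orientation, a minimal sketch of inspecting one AfriSenti split with pandas; the file name and column names are assumptions modeled on the shared-task repository layout, not verified against it:

```python
# Minimal sketch: load and inspect an AfriSenti-style TSV split, where
# each row pairs a tweet with a sentiment label from native-speaker
# annotation. File path and column names are hypothetical.
import pandas as pd

df = pd.read_csv("pcm_train.tsv", sep="\t")  # hypothetical Nigerian Pidgin split
print(df["label"].value_counts())            # expected: positive / negative / neutral
print(df.sample(3))
```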
8.
  • Muhammad, Shamsuddeen Hassan, et al. (author)
  • SemEval-2023 Task 12 : Sentiment Analysis for African Languages (AfriSenti-SemEval)
  • 2023
  • In: Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023). Association for Computational Linguistics. ISBN 9781959429999, pp. 2319-2337
  • Conference paper (peer-reviewed). Abstract:
    • We present the first Afrocentric SemEval shared task, Sentiment Analysis for African Languages (AfriSenti-SemEval). The dataset is available at https://github.com/afrisenti-semeval/afrisent-semeval-2023. AfriSenti-SemEval is a sentiment classification challenge in 14 African languages: Amharic, Algerian Arabic, Hausa, Igbo, Kinyarwanda, Moroccan Arabic, Mozambican Portuguese, Nigerian Pidgin, Oromo, Swahili, Tigrinya, Twi, Xitsonga, and Yorùbá (Muhammad et al., 2023), using data labeled with 3 sentiment classes. We present three subtasks: (1) Task A: monolingual classification, which received 44 submissions; (2) Task B: multilingual classification, which received 32 submissions; and (3) Task C: zero-shot classification, which received 34 submissions. The best performance for tasks A and B was achieved by the NLNDE team with 71.31 and 75.06 weighted F1, respectively. UCAS-IIE-NLP achieved the best average score for task C with 58.15 weighted F1. We describe the various approaches adopted by the top 10 systems.
9.
  • Stenlund, Mathias Hans Erik, et al. (author)
  • Improving Translation Quality for Low-Resource Inuktitut with Various Preprocessing Techniques
  • 2023
  • In: Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing. INCOMA Ltd. ISBN 9789544520922, pp. 475-479
  • Conference paper (peer-reviewed). Abstract:
    • Neural machine translation has been shown to outperform all other machine translation paradigms when trained in a high-resource setting. However, it still performs poorly when dealing with low-resource languages, for which parallel training data is scarce. This is especially the case for morphologically complex languages such as Turkish, Tamil and Uyghur. In this paper, we investigate various preprocessing methods for Inuktitut, a low-resource indigenous language from North America, without using a morphological analyzer. On both the original and romanized scripts, we test preprocessing techniques such as Byte-Pair Encoding, random stemming, and data augmentation using Hungarian for the Inuktitut-to-English translation task. We find that there are benefits to retaining the original script, as it helps to achieve higher BLEU scores than the romanized models.
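The Byte-Pair Encoding step mentioned above can be sketched with SentencePiece; file names and vocabulary size are illustrative assumptions, not the paper's configuration:

```python
# Minimal sketch: BPE preprocessing with SentencePiece for the Inuktitut
# side of a parallel corpus, before NMT training. Paths and vocab size
# are hypothetical.
import sentencepiece as spm

# Train a BPE model on the (hypothetical) Inuktitut training file.
spm.SentencePieceTrainer.train(
    input="train.iu",
    model_prefix="iu_bpe",
    vocab_size=8000,
    model_type="bpe",
)

# Segment a sentence into subword pieces; syllabic script is kept as-is.
sp = spm.SentencePieceProcessor(model_file="iu_bpe.model")
print(sp.encode("ᐃᓄᒃᑎᑐᑦ ᐅᖃᐅᓯᖅ", out_type=str))
```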