SwePub
Result list for search "WFRF:(Muñoz Sánchez Ricardo 1992)"

  • Results 1-8 of 8
1.
  • Goldfarb-Tarrant, Seraphina, et al. (author)
  • Intrinsic Bias Metrics Do Not Correlate with Application Bias
  • 2021
  • In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), August 2021, Online. - Stroudsburg, PA, USA : Association for Computational Linguistics. - 9781954085527
  • Conference paper (peer-reviewed). Abstract:
    • Natural Language Processing (NLP) systems learn harmful societal biases that cause them to amplify inequality as they are deployed in more and more situations. To guide efforts at debiasing these systems, the NLP community relies on a variety of metrics that quantify bias in models. Some of these metrics are intrinsic, measuring bias in word embedding spaces, and some are extrinsic, measuring bias in downstream tasks that the word embeddings enable. Do these intrinsic and extrinsic metrics correlate with each other? We compare intrinsic and extrinsic metrics across hundreds of trained models covering different tasks and experimental conditions. Our results show no reliable correlation between these metrics that holds in all scenarios across tasks and languages. We urge researchers working on debiasing to focus on extrinsic measures of bias, and to make using these measures more feasible via creation of new challenge sets and annotated test data. To aid this effort, we release code, a new intrinsic metric, and an annotated test set focused on gender bias in hate speech.
2.
  • Kokkinakis, Dimitrios, 1965, et al. (author)
  • Investigating the Effects of MWE Identification in Structural Topic Modelling
  • 2023
  • In: 19th Workshop on Multiword Expressions, MWE 2023 - Proceedings. - : ACL, pp. 36-44
  • Conference paper (peer-reviewed). Abstract:
    • Multiword expressions (MWEs) are common word combinations which exhibit idiosyncrasies at various linguistic levels. For various downstream natural language processing applications and tasks, the identification and discovery of MWEs has proven to be potentially practical and useful, but still challenging to codify. In this paper we investigate various MWE-relevant resources and tools for Swedish and, within a specific application scenario, apply structural topic modelling to investigate whether there are any interpretative advantages to identifying MWEs.
3.
  • Kokkinakis, Dimitrios, 1965, et al. (author)
  • Scaling-up the Resources for a Freely Available Swedish VADER (svVADER)
  • 2023
  • In: Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa).
  • Conference paper (other academic/artistic). Abstract:
    • With widespread commercial applications in various domains, sentiment analysis has become a success story for Natural Language Processing (NLP). Still, although sentiment analysis has progressed rapidly in recent years, mainly due to the application of modern AI technologies, many approaches apply knowledge-based strategies, such as lexicon-based ones, to the task. This is particularly true for analyzing short social media content, e.g., tweets. Moreover, lexicon-based sentiment analysis approaches are usually preferred over learning-based methods when training data is unavailable or insufficient. Therefore, our main goal is to scale up and apply a lexicon-based approach which can be used as a baseline for Swedish sentiment analysis. All scaled-up resources are made available, while the performance of this enhanced tool is evaluated on two short datasets, achieving adequate results.
4.
  • Muñoz Sánchez, Ricardo, 1992, et al. (author)
  • A First Attempt at Unreliable News Detection in Swedish
  • 2022
  • In: Proceedings of the Second International Workshop on Resources and Techniques for User Information in Abusive Language Analysis, Marseille, 20-25 June, 2022 / Editors: Johanna Monti, Valerio Basile, Maria Pia Di Buono, Raffaele Manna, Antonio Pascucci, Sara Tonelli. - Paris : European Language Resources Association (ELRA). - 9791095546993
  • Conference paper (peer-reviewed). Abstract:
    • Throughout the COVID-19 pandemic, a parallel infodemic has also been going on such that the information has been spreading faster than the virus itself. During this time, every individual needs to access accurate news in order to take corresponding protective measures, regardless of their country of origin or the language they speak, as misinformation can cause significant loss to not only individuals but also society. In this paper we train several machine learning models (ranging from traditional machine learning to deep learning) to try to determine whether news articles come from either a reliable or an unreliable source, using just the body of the article. Moreover, we use a previously introduced corpus of news in Swedish related to the COVID-19 pandemic for the classification task. Given that our dataset is both unbalanced and small, we use subsampling and easy data augmentation (EDA) to try to solve these issues. In the end, we realize that, due to the small size of our dataset, using traditional machine learning along with data augmentation yields results that rival those of transformer models such as BERT.
5.
  • Muñoz Sánchez, Ricardo, 1992, et al. (author)
  • Did the Names I Used within My Essay Affect My Score? Diagnosing Name Biases in Automated Essay Scoring
  • 2024
  • In: Proceedings of the Workshop on Computational Approaches to Language Data Pseudonymization (CALD-pseudo 2024). - : Association for Computational Linguistics.
  • Conference paper (peer-reviewed). Abstract:
    • Automated essay scoring (AES) of second-language learner essays is a high-stakes task as it can affect the job and educational opportunities a student may have access to. Thus, it becomes imperative to make sure that the essays are graded based on the students’ language proficiency as opposed to other reasons, such as personal names used in the text of the essay. Moreover, most of the research data for AES tends to contain personal identifiable information. Because of that, pseudonymization becomes an important tool to make sure that this data can be freely shared. Thus, our systems should not grade students based on which given names were used in the text of the essay, both for fairness and for privacy reasons. In this paper we explore how given names affect the CEFR level classification of essays of second language learners of Swedish. We use essays containing just one personal name and substitute it for names from lists of given names from four different ethnic origins, namely Swedish, Finnish, Anglo-American, and Arabic. We find that changing the names within the essays has no apparent effect on the classification task, regardless of whether a feature-based or a transformer-based model is used.
6.
  •  
7.
  • Szawerna, Maria Irena, et al. (author)
  • Detecting Personal Identifiable Information in Swedish Learner Essays
  • 2024
  • In: Proceedings of the Workshop on Computational Approaches to Language Data Pseudonymization (CALD-pseudo 2024). - : Association for Computational Linguistics.
  • Conference paper (peer-reviewed). Abstract:
    • Linguistic data can, and often does, contain PII (Personal Identifiable Information). From both a legal and an ethical standpoint, the sharing of such data is not permissible. According to the GDPR, pseudonymization, i.e. the replacement of sensitive information with surrogates, is an acceptable strategy for privacy preservation. While research has been conducted on the detection and replacement of sensitive data in Swedish medical data using Large Language Models (LLMs), it is unclear whether these models handle PII in less structured and more thematically varied texts equally well. In this paper, we present and discuss the performance of an LLM-based PII-detection system for Swedish learner essays.
8.
  • Szawerna, Maria Irena, et al. (author)
  • Pseudonymization Categories across Domain Boundaries
  • 2024
  • In: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). - : ELRA and ICCL.
  • Conference paper (peer-reviewed). Abstract:
    • Linguistic data, a component critical not only for research in a variety of fields but also for the development of various Natural Language Processing (NLP) applications, can contain personal information. As a result, its accessibility is limited, both from a legal and an ethical standpoint. One of the solutions is the pseudonymization of the data. Key stages of this process include the identification of sensitive elements and the generation of suitable surrogates in a way that the data is still useful for the intended task. Within this paper, we conduct an analysis of tagsets that have previously been utilized in anonymization and pseudonymization. We also investigate what kinds of Personally Identifiable Information (PII) appear in various domains. These reveal that none of the analyzed tagsets account for all of the PII types present cross-domain at the level of detailedness seemingly required for pseudonymization. We advocate for a universal system of tags for categorizing PIIs leading up to their replacement. Such categorization could facilitate the generation of grammatically, semantically, and sociolinguistically appropriate surrogates for the kinds of information that are considered sensitive in a given domain, resulting in a system that would enable dynamic pseudonymization while keeping the texts readable and useful for future research in various fields.