Computer, enhence : POS-tagging improvements for nonbinary pronoun use in Swedish
- Björklund, Henrik (author)
- Umeå universitet, Institutionen för datavetenskap
- Devinney, Hannah, 1995- (author)
- Umeå universitet, Institutionen för datavetenskap, Umeå centrum för genusstudier (UCGS)
- The Association for Computational Linguistics, 2023
- English.
In: Proceedings of the third workshop on language technology for equality, diversity, inclusion. The Association for Computational Linguistics. ISBN 9789544520847, pp. 54-61
- Related links:
- https://sites.google...
- https://urn.kb.se/re...
Abstract
- Part of Speech (POS) taggers for Swedish routinely fail for the third person gender-neutral pronoun hen, despite the fact that it has been a well-established part of the Swedish language since at least 2014. In addition to simply being a form of gender bias, this failure can have negative effects on other tasks relying on POS information. We demonstrate the usefulness of semi-synthetic augmented datasets in a case study, retraining a POS tagger to correctly recognize hen as a personal pronoun. We evaluate our retrained models both for tag accuracy and on a downstream task (dependency parsing) in a classical NLP pipeline. Our results show that adding such data works to correct for the disparity in performance. The accuracy rate for identifying hen as a pronoun can be brought up to acceptable levels with only minor adjustments to the tagger’s vocabulary files. Performance parity to gendered pronouns can be reached after retraining with only a few hundred examples. This increase in POS tag accuracy also results in improvements for dependency parsing sentences containing hen.
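The abstract does not say how the semi-synthetic data were produced, so the following is only a minimal sketch of one plausible approach: copy already-tagged sentences and swap gendered subject pronouns for hen while keeping the original tag. The data layout (lists of `(token, tag)` pairs), the tag name `PN`, and the function names `swap_pronouns` and `augment` are all assumptions made for illustration, not the authors' method.

```python
# Hypothetical sketch: this record does not describe how the augmented
# data were built. Assumptions: each sentence is a list of (token, tag)
# pairs, and the tag "PN" marks pronouns.

GENDERED_TO_NEUTRAL = {"han": "hen", "hon": "hen"}  # subject forms only


def swap_pronouns(sentence, pronoun_tag="PN"):
    """Return a copy of `sentence` with han/hon replaced by hen."""
    swapped = []
    for token, tag in sentence:
        replacement = GENDERED_TO_NEUTRAL.get(token.lower())
        if tag == pronoun_tag and replacement is not None:
            # Preserve sentence-initial capitalisation.
            if token[0].isupper():
                replacement = replacement.capitalize()
            swapped.append((replacement, tag))
        else:
            swapped.append((token, tag))
    return swapped


def augment(corpus, pronoun_tag="PN"):
    """Yield a semi-synthetic sentence for each sentence that changed."""
    for sentence in corpus:
        new_sentence = swap_pronouns(sentence, pronoun_tag)
        if new_sentence != sentence:
            yield new_sentence


corpus = [
    [("Hon", "PN"), ("läser", "VB"), ("boken", "NN")],
    [("Boken", "NN"), ("är", "VB"), ("bra", "JJ")],
]
augmented = list(augment(corpus))
```

Only sentences that actually contain a swapped pronoun are emitted, so the augmented set can be appended to the original training data without duplicating unchanged sentences. Object and possessive forms are deliberately left out of this sketch.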
Subject headings
- NATURAL SCIENCES -- Computer and Information Sciences -- Language Technology (hsv//eng)
Keywords
- Part-of-Speech
- gendered pronouns
- neopronouns
- computational linguistics
Publication and content type
- ref (subject category)
- kon (subject category)