Can Stanza be Used for Part-of-Speech Tagging Historical Polish?
- Szawerna, Maria Irena (author)
- University of Gothenburg (Göteborgs universitet), Department of Swedish, Multilingualism, Language Technology (Institutionen för svenska, flerspråkighet och språkteknologi), Språkbanken Text
- Association for Computational Linguistics, 2024
- English.
Published in: Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop. Association for Computational Linguistics. ISBN 9798891760905
- Related link: https://gup.ub.gu.se...
Abstract
- The goal of this paper is to evaluate the performance of Stanza, a part-of-speech (POS) tagger developed for modern Polish, on historical text to assess its possible use for automating the annotation of other historical texts. While the issue of the reliability of utilizing POS taggers on historical data has been previously discussed, most of the research focuses on languages whose grammar differs from Polish, meaning that their results need not be fully applicable in this case. The evaluation of Stanza is conducted on two sets of 10286 and 3270 manually annotated tokens from a piece of historical Polish writing (1899), and the errors are analyzed qualitatively and quantitatively. The results show a good performance of the tagger, especially when it comes to Universal Part-of-Speech (UPOS) tags, which is promising for utilizing the tagger for automatic annotation in larger projects, and pinpoint some common features of misclassified tokens.
Subject headings
- NATURAL SCIENCES -- Computer and Information Sciences -- Language Technology (hsv//eng)
- HUMANITIES -- Languages and Literature -- Specific Languages (hsv//eng)
Keywords
- nlp
- natural language processing
- part-of-speech tagging
- historical linguistics
- Polish
Publication and content type
- ref (subject category)
- kon (subject category)