Exploring Swedish & English fastText Embeddings
- Adewumi, Oluwatosin, 1978- (author)
- Luleå tekniska universitet, EISLAB
- Liwicki, Foteini (author)
- Luleå tekniska universitet, EISLAB
- Liwicki, Marcus (author)
- Luleå tekniska universitet, EISLAB
- 2022
- English.
In: Artificial Intelligence and Cognition 2022, pp. 201-208
- Related links:
- https://ceur-ws.org/...
- https://ltu.diva-por... (primary) (Raw object)
- https://urn.kb.se/re...
Abstract
- In this paper, we show that embeddings from relatively smaller corpora sometimes outperform those from larger corpora and we introduce a new Swedish analogy test set and make it publicly available. To achieve good performance in Natural Language Processing (NLP) downstream tasks, several factors play important roles: dataset size, the right hyper-parameters, and well-trained embeddings. We utilize the fastText tool for our experiments. We evaluate both the Swedish and English embeddings that we created using intrinsic evaluation (including analogy & Spearman correlation) and compare them with 2 common, publicly available embeddings. Our English continuous Bag-of-Words (CBoW)-negative sampling embedding shows better performance compared to the publicly available GoogleNews version. We also describe the relationship between NLP and cognitive science. We contribute the embeddings for research or other useful purposes by publicly releasing them.
Subject headings
- NATURAL SCIENCES -- Computer and Information Sciences -- Language Technology (hsv//eng)
Keywords
- Embeddings
- fastText
- Analogy set
- Swedish
- Machine Learning
Publication and content type
- ref (subject category)
- kon (subject category)