SwePub

Result list for search "L773:9789179296148"

  • Results 1-9 of 9
1.
  • Adesam, Yvonne, 1975, et al. (author)
  • Part-of-speech tagging of Swedish texts in the neural era
  • 2021
  • In: Proceedings of the 23rd Nordic Conference on Computational Linguistics, NoDaLiDa, May 31–2 June, 2021, Reykjavik, Iceland (online) / eds Simon Dobnik and Lilja Øvrelid. - Linköping : Linköping University Electronic Press. - 1650-3686 .- 1650-3740. - 9789179296148
  • Conference paper (peer-reviewed)
2.
  • Duong, Quan, et al. (author)
  • An Unsupervised method for OCR Post-Correction and Spelling Normalisation for Finnish
  • 2021
  • In: Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), May 31–2 June, 2021, Reykjavik, Iceland (online). - Linköping : Linköping Electronic Conference Proceedings. - 1650-3686 .- 1650-3740. - 9789179296148
  • Conference paper (peer-reviewed)
  • Abstract:
    • Historical corpora are known to contain errors introduced by OCR (optical character recognition) methods used in the digitization process, often said to be degrading the performance of NLP systems. Correcting these errors manually is a time-consuming process, and a large share of automatic approaches have relied on rules or supervised machine learning. We build on previous work on fully automatic unsupervised extraction of parallel data to train a character-based sequence-to-sequence NMT (neural machine translation) model to conduct OCR error correction designed for English, and adapt it to Finnish by proposing solutions that take the rich morphology of the language into account. Our new method shows increased performance while remaining fully unsupervised, with the added benefit of spelling normalisation. The source code and models are available on GitHub and Zenodo.
3.
  • Hagström, Lovisa, 1995, et al. (author)
  • Knowledge Distillation for Swedish NER models: A Search for Performance and Efficiency
  • 2021
  • In: Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa 2021), pp. 124–134. Reykjavík, Iceland. - Linköping : Linköping University Electronic Press. - 1650-3686 .- 1650-3740. - 9789179296148
  • Conference paper (peer-reviewed)
  • Abstract:
    • The current recipe for better model performance within NLP is to increase model size and training data. While it gives us models with increasingly impressive results, it also makes it more difficult to train and deploy state-of-the-art models for NLP due to increasing computational costs. Model compression is a field of research that aims to alleviate this problem. The field encompasses different methods that aim to preserve the performance of a model while decreasing the size of it. One such method is knowledge distillation. In this article, we investigate the effect of knowledge distillation for named entity recognition models in Swedish. We show that while some sequence tagging models benefit from knowledge distillation, not all models do. This prompts us to ask questions about in which situations and for which models knowledge distillation is beneficial. We also reason about the effect of knowledge distillation on computational costs.
4.
  • Hansson, Saga, et al. (author)
  • The Swedish Winogender Dataset
  • 2021
  • In: Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), May 31 - June 2, 2021, Reykjavik, Iceland (online). - Linköping : Linköping University Electronic Press. - 1650-3686 .- 1650-3740. - 9789179296148
  • Conference paper (peer-reviewed)
  • Abstract:
    • We introduce the SweWinogender test set, a diagnostic dataset to measure gender bias in coreference resolution. It is modelled after the English Winogender benchmark, and is released with reference statistics on the distribution of men and women between occupations and the association between gender and occupation in modern corpus material. The paper discusses the design and creation of the dataset, and presents a small investigation of the supplementary statistics.
5.
  • Hengchen, Simon, 1988, et al. (author)
  • SuperSim: a test set for word similarity and relatedness in Swedish
  • 2021
  • In: Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), May 31-June 2 2021, Reykjavik, Iceland (online). - Linköping : Linköping Electronic Conference Proceedings. - 1650-3686 .- 1650-3740. - 9789179296148
  • Conference paper (peer-reviewed)
  • Abstract:
    • Language models are notoriously difficult to evaluate. We release SuperSim, a large-scale similarity and relatedness test set for Swedish built with expert human judgments. The test set is composed of 1,360 word-pairs independently judged for both relatedness and similarity by five annotators. We evaluate three different models (Word2Vec, fastText, and GloVe) trained on two separate Swedish datasets, namely the Swedish Gigaword corpus and a Swedish Wikipedia dump, to provide a baseline for future comparison. We release the fully annotated test set, code, baseline models, and data.
6.
  • Norlund, Tobias, 1991, et al. (author)
  • Building a Swedish Open-Domain Conversational Language Model
  • 2021
  • In: Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa). - 9789179296148, pp. 357-366
  • Conference paper (peer-reviewed)
  • Abstract:
    • We present on-going work of evaluating the, to our knowledge, first large generative language model trained to converse in Swedish, using data from the online discussion forum Flashback. We conduct a human evaluation pilot study that indicates the model is often able to respond to conversations in both a human-like and informative manner, on a diverse set of topics. While data from online forums can be useful to build conversational systems, we reflect on the negative consequences that incautious application might have, and the need for taking active measures to safeguard against them.
7.
  •  
8.
  • Volodina, Elena, 1973, et al. (author)
  • CoDeRooMor: A new dataset for non-inflectional morphology studies of Swedish
  • 2021
  • In: 23rd Nordic Conference on Computational Linguistics (NoDaLiDa) Proceedings, May 31–2 June, 2021, Reykjavik, Iceland Online / Simon Dobnik, Lilja Øvrelid (Editors). - Linköping : Linköping University Electronic Press. - 1650-3686 .- 1650-3740. - 9789179296148
  • Conference paper (peer-reviewed)
  • Abstract:
    • The paper introduces a new resource, CoDeRooMor, for studying the morphology of modern Swedish word formation. The approximately 16,000 lexical items in the resource have been manually segmented into word-formation morphemes, and labeled for their categories, such as prefixes, suffixes, roots, etc. Word-formation mechanisms, such as derivation and compounding, have been associated with each item on the list. The article describes the selection of items for manual annotation and the principles of annotation, reports on the reliability of the manual annotation, and presents tools, resources and some first statistics. Given the "gold" nature of the resource, it is possible to use it for empirical studies as well as to develop linguistically-aware algorithms for morpheme segmentation and labeling (cf. the statistical subword approach). The resource is freely available through Språkbanken-Text.
9.
  • Zechner, Niklas, 1984 (author)
  • Cross-Topic Author Identification – a Case Study on Swedish Literature
  • 2021
  • In: The Eighth Swedish Language Technology Conference (SLTC-2020), Selected Contributions, 25–27 November 2020, Gothenburg, Sweden, Online. - Linköping : Linköping University Electronic Press. - 1650-3686 .- 1650-3740. - 9789179296148
  • Conference paper (peer-reviewed)
  • Abstract:
    • Using material from the Swedish Literature Bank, we investigate whether common methods of author identification using word frequencies and part of speech frequencies are sensitive to differences in topic. The results show that this is the case, thereby casting doubt on much previous work in author identification. This sets the stage for a broader future study, comparing other methods and generalising the results.