SwePub
Search the SwePub database


Results list for the search "WFRF:(Ekgren Ariel)"


  • Results 1-6 of 6
1.
  • Cuba Gyllensten, Amaru, et al. (authors)
  • R-grams: Unsupervised Learning of Semantic Units in Natural Language
  • 2019
  • In: Proceedings of the 13th International Conference on Computational Semantics - Student Papers.
  • Conference paper (peer-reviewed), abstract:
    • This paper investigates data-driven segmentation using Re-Pair or Byte Pair Encoding techniques. In contrast to previous work, which has primarily been focused on subword units for machine translation, we are interested in the general properties of such segments above the word level. We call these segments r-grams, and discuss their properties and the effect they have on the token frequency distribution. The proposed approach is evaluated by demonstrating its viability in embedding techniques, both in monolingual and multilingual test settings. We also provide a number of qualitative examples of the proposed methodology, demonstrating its viability as a language-invariant segmentation procedure.
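The core segmentation idea named in this abstract — repeatedly merging the most frequent adjacent pair, as in Byte Pair Encoding and Re-Pair — can be sketched at the word level, the level at which r-grams operate. This is a minimal illustrative sketch, not the authors' implementation; the function names are invented for the example.

```python
from collections import Counter

def most_frequent_pair(tokens):
    """Count adjacent token pairs and return the most frequent one."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0] if pairs else None

def bpe_segment(tokens, num_merges):
    """Repeatedly merge the most frequent adjacent pair into a single unit."""
    for _ in range(num_merges):
        pair = most_frequent_pair(tokens)
        if pair is None:
            break
        merged = []
        i = 0
        while i < len(tokens):
            if i < len(tokens) - 1 and (tokens[i], tokens[i + 1]) == pair:
                # Join the pair into one multi-word unit (an "r-gram").
                merged.append(tokens[i] + " " + tokens[i + 1])
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens
```

Each merge shortens the token sequence and reshapes its frequency distribution, which is the effect the abstract discusses; for example, two merges over `"the cat sat on the mat the cat sat".split()` fuse the recurring phrase into the single unit `"the cat sat"`.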
2.
  • Ekgren, Ariel, et al. (authors)
  • GPT-SW3 : An Autoregressive Language Model for the Scandinavian Languages
  • 2024
  • In: 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings. European Language Resources Association (ELRA), pp. 7886-7900
  • Conference paper (peer-reviewed), abstract:
    • This paper details the process of developing the first native large generative language model for the North Germanic languages, GPT-SW3. We cover all parts of the development process, from data collection and processing, training configuration and instruction finetuning, to evaluation, applications, and considerations for release strategies. We discuss pros and cons of developing large language models for smaller languages and in relatively peripheral regions of the globe, and we hope that this paper can serve as a guide and reference for other researchers who undertake the development of large generative models for smaller languages.
3.
  • Gogoulou, Evangelia, et al. (authors)
  • Cross-lingual Transfer of Monolingual Models
  • 2022
  • In: 2022 Language Resources and Evaluation Conference, LREC 2022. European Language Resources Association (ELRA). ISBN 9791095546726, pp. 948-955
  • Conference paper (peer-reviewed), abstract:
    • Recent studies in cross-lingual learning using multilingual models have cast doubt on the previous hypothesis that shared vocabulary and joint pre-training are the keys to cross-lingual generalization. We introduce a method for transferring monolingual models to other languages through continuous pre-training and study the effects of such transfer from four different languages to English. Our experimental results on GLUE show that the transferred models outperform an English model trained from scratch, independently of the source language. After probing the model representations, we find that model knowledge from the source language enhances the learning of syntactic and semantic knowledge in English. Licensed under CC-BY-NC-4.0.
4.
5.
  • Karlgren, Jussi, et al. (authors)
  • Evaluating learning language representations
  • 2015
  • Conference paper (peer-reviewed), abstract:
    • Machine learning offers significant benefits for systems that process and understand natural language: (a) lower maintenance and upkeep costs than when using manually-constructed resources, (b) easier portability to new domains, tasks, or languages, and (c) robust and timely adaptation to situation-specific settings. However, the behaviour of an adaptive system is less predictable than when using an edited, stable resource, which makes quality control a continuous issue. This paper proposes an evaluation benchmark for measuring the quality, coverage, and stability of a natural language system as it learns word meaning. Inspired by existing tests for human vocabulary learning, we outline measures for the quality of semantic word representations, such as when learning word embeddings or other distributed representations. These measures highlight differences between the types of underlying learning processes as systems ingest progressively more data.
6.
  • Karlgren, Jussi, et al. (authors)
  • Semantic Topology
  • 2014
  • In: Proceedings of the 23rd ACM International Conference on Conference on Information & Knowledge Management (CIKM '14). New York: Association for Computing Machinery (ACM). ISBN 9781450325981, pp. 1939-1942
  • Conference paper (peer-reviewed), abstract:
    • A reasonable requirement (among many others) for a lexical or semantic component in an information system is that it should be able to learn incrementally from the linguistic data it is exposed to, that it can distinguish between the topical impact of various terms, and that it knows if it knows stuff or not. We work with a specific representation framework – semantic spaces – which well accommodates the first requirement; in this short paper, we investigate the global qualities of semantic spaces by a topological procedure – mapper – which gives an indication of topical density of the space; we examine the local context of terms of interest in the semantic space using another topologically inspired approach which gives an indication of the neighbourhood of the terms of interest. Our aim is to be able to establish the qualities of the semantic space under consideration without resorting to inspection of the data used to build it.
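The local-context analysis this abstract describes — examining the neighbourhood of terms of interest in a semantic space — can be illustrated with plain cosine similarity over term vectors. The `neighbourhood` function and toy vectors below are assumptions for illustration only; the paper's actual procedures are topologically inspired (e.g. mapper), which this sketch does not reproduce.

```python
import numpy as np

def neighbourhood(space, term, k=3):
    """Return the k terms nearest to `term` by cosine similarity.

    `space` maps each term to its vector; a toy stand-in for a
    semantic space built from distributional data.
    """
    target = space[term]
    sims = {}
    for other, vec in space.items():
        if other == term:
            continue
        sims[other] = np.dot(target, vec) / (
            np.linalg.norm(target) * np.linalg.norm(vec)
        )
    # Highest similarity first.
    return sorted(sims, key=sims.get, reverse=True)[:k]
```

With toy vectors such as `{"cat": [1.0, 0.2], "dog": [0.9, 0.3], "car": [0.1, 1.0]}`, the nearest neighbour of "cat" is "dog" — the kind of local structure the paper's neighbourhood analysis characterizes without inspecting the underlying corpus.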


 