SwePub

Result list for search "WFRF:(Adesam Yvonne 1975)"

  • Result 1-34 of 34
1.
  • Adesam, Yvonne, 1975, et al. (author)
  • Defining the Eukalyptus forest – the Koala treebank of Swedish
  • 2015
  • In: Proceedings of the 20th Nordic Conference of Computational Linguistics, NODALIDA 2015, May 11-13, 2015, Vilnius, Lithuania. Edited by Beáta Megyesi. - 1650-3686 .- 1650-3740. - 9789175190983 ; , s. 1-9
  • Conference paper (peer-reviewed). Abstract:
    • This paper details the design of the lexical and syntactic layers of a new annotated corpus of Swedish contemporary texts. In order to make the corpus adaptable into a variety of representations, the annotation is of a hybrid type with head-marked constituents and function-labeled edges, and with a rich annotation of non-local dependencies. The source material has been taken from public sources, to allow the resulting corpus to be made freely available.
2.
  •  
3.
  • Adesam, Yvonne, 1975, et al. (author)
  • Multiwords, Word Senses and Multiword Senses in the Eukalyptus Treebank of Written Swedish
  • 2015
  • In: Proceedings of the Fourteenth International Workshop on Treebanks and Linguistic Theories (TLT14), 11–12 December 2015 Warsaw, Poland. - 9788363159184 ; , s. 3-12
  • Conference paper (peer-reviewed). Abstract:
    • Multiwords reside at the intersection of the lexicon and syntax and in an annotation project, they will affect both levels. In the Eukalyptus treebank of written Swedish, we treat multiwords formally as syntactic objects, which are assigned a lexical type and sense. With the help of a simple dichotomy, analyzed vs unanalyzed multiwords, and the expressiveness of the syntactic annotation formalism employed, we are able to flexibly handle most multiword types and usages.
4.
  • Adesam, Yvonne, 1975, et al. (author)
  • The Eukalyptus Treebank of Written Swedish
  • 2018
  • In: Seventh Swedish Language Technology Conference (SLTC), Stockholm, 7–9 November 2018.
  • Conference paper (other academic/artistic)
5.
  •  
6.
  • Johansson, Richard, 1975, et al. (author)
  • A Multi-domain Corpus of Swedish Word Sense Annotation
  • 2016
  • In: 10th edition of the Language Resources and Evaluation Conference, 23-28 May 2016, Portorož (Slovenia). - : European Language Resources Association. - 9782951740891
  • Conference paper (peer-reviewed). Abstract:
    • We describe the word sense annotation layer in Eukalyptus, a freely available five-domain corpus of contemporary Swedish with several annotation layers. The annotation uses the SALDO lexicon to define the sense inventory, and allows word sense annotation of compound segments and multiword units. We give an overview of the new annotation tool developed for this project, and finally present an analysis of the inter-annotator agreement between two annotators.
7.
  • Johansson, Richard, 1975, et al. (author)
  • Training a Swedish Constituency Parser on Six Incompatible Treebanks
  • 2020
  • In: Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020). - : European Language Resources Association (ELRA).
  • Conference paper (peer-reviewed). Abstract:
    • We investigate a transition-based parser that uses Eukalyptus, a function-tagged constituent treebank for Swedish which includes discontinuous constituents. In addition, we show that the accuracy of this parser can be improved by using a multitask learning architecture that makes it possible to train the parser on additional treebanks that use other annotation models.
8.
  • Adesam, Yvonne, 1975, et al. (author)
  • A lexical resource for computational historical linguistics
  • 2021
  • In: The Swedish FrameNet++. Harmonization, integration, method development and practical language technology applications. - Amsterdam / Philadelphia : John Benjamins Publishing Company. - 1567-8202. - 978 90 272 5848 9 ; , s. 98-121
  • Book chapter (peer-reviewed). Abstract:
    • In this chapter we present the diachronic dimension of Swedish FrameNet++. We describe the historical lexical resources currently available for Swedish, linked to the Contemporary Swedish lexicon Saldo. We present a case study of how interlinking the dictionaries simultaneously allows us to study lexical change. We also present a method of linking text words to lexicon entries, facilitating interactive exploration of historical texts. Diachronical language resources present both a high-variation challenge from a wider language technology perspective, and an interesting object of linguistic study. While a number of improvements of the parts of the diachronic lexical macroresource are still needed, this resource is invaluable for analysing and accessing historical texts, as well as for both synchronic historical and diachronic lexical studies.
9.
  •  
10.
  • Adesam, Yvonne, 1975, et al. (author)
  • Computer-aided Morphology Expansion for Old Swedish
  • 2014
  • In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14) May 26-31, 2014 Reykjavik, Iceland. - 9782951740884 ; , s. 1102-1105
  • Conference paper (peer-reviewed). Abstract:
    • In this paper we describe and evaluate a tool for paradigm induction and lexicon extraction that has been applied to Old Swedish. The tool is semi-supervised and uses a small seed lexicon and unannotated corpora to derive full inflection tables for input lemmata. In the work presented here, the tool has been modified to deal with the rich spelling variation found in Old Swedish texts. We also present some initial experiments, which are the first steps towards creating a large-scale morphology for Old Swedish.
11.
  •  
12.
  • Adesam, Yvonne, 1975, et al. (author)
  • Exploring the Quality of the Digital Historical Newspaper Archive KubHist
  • 2019
  • In: Proceedings of the 4th Conference of The Association Digital Humanities in the Nordic Countries (DHN), Copenhagen, Denmark, March 5-8, 2019 / edited by Costanza Navarretta, Manex Agirrezabal, Bente Maegaard. - Aachen : CEUR Workshop Proceedings. - 1613-0073.
  • Conference paper (peer-reviewed). Abstract:
    • The KubHist Corpus is a massive corpus of Swedish historical newspapers, digitized by the Royal Swedish library, and available through the Språkbanken corpus infrastructure Korp. This paper contains a first overview of the KubHist corpus, exploring some of the difficulties with the data, such as OCR errors and spelling variation, and discussing possible paths for improving the quality and the searchability.
13.
  • Adesam, Yvonne, 1975, et al. (author)
  • FSvReader – Exploring Old Swedish Cultural Heritage Texts
  • 2018
  • In: CEUR Workshop Proceedings, vol. 2084. Proceedings of the Digital Humanities in the Nordic Countries 3rd Conference Helsinki, Finland, March 7-9, 2018. Edited by Eetu Mäkelä, Mikko Tolonen, Jouni Tuominen. - Helsinki : University of Helsinki, Faculty of Arts. - 1613-0073.
  • Conference paper (peer-reviewed). Abstract:
    • This paper describes FSvReader, a tool for easier access to Old Swedish (13th–16th century) texts. Through automatic fuzzy linking of words in a text to a dictionary describing the language of the time, the reader has direct access to dictionary pop-up definitions, in spite of the large amount of morphological and spelling variation. The linked dictionary entries can also be used for simple searches in the text, highlighting possible further instances of the same entry.
14.
  • Adesam, Yvonne, 1975, et al. (author)
  • Old Swedish Part-of-Speech Tagging between Variation and External Knowledge
  • 2016
  • In: Proceedings of the 10th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, Berlin, Germany, August 11, 2016. - Stroudsburg, PA : Association for Computational Linguistics. - 9781945626098
  • Conference paper (peer-reviewed)
15.
  • Adesam, Yvonne, 1975, et al. (author)
  • Part-of-speech tagging of Swedish texts in the neural era
  • 2021
  • In: Proceedings of the 23rd Nordic Conference on Computational Linguistics, NoDaLiDa, May 31–June 2, 2021, Reykjavik, Iceland (online) / eds Simon Dobnik and Lilja Øvrelid. - Linköping : Linköping University Electronic Press. - 1650-3686 .- 1650-3740. - 9789179296148
  • Conference paper (peer-reviewed)
16.
  •  
17.
  •  
18.
  • Adesam, Yvonne, 1975, et al. (author)
  • Språkteknologi för svenska språket genom tiderna
  • 2016
  • In: Kungliga Skytteanska Samfundets Handlingar. - Umeå : Institutionen för språkstudier, Umeå universitet & Kungl. Skytteanska Samfundet. - 0560-2416. ; 76:Studier i svensk språkhistoria 13, s. 65-87
  • Journal article (peer-reviewed). Abstract:
    • Språkbanken, the Swedish Language Bank, is a language technology research unit at the Department of Swedish, University of Gothenburg. We develop language resources – such as corpora, lexical resources, and analytical tools – for all variants of Swedish, from Old Swedish laws to present-day social media. Historical texts offer exciting theoretical and methodological challenges for language technology because they often defy the assumption inherent in most automatic analysis tools that the texts contain a standardized written language. In this article, we describe our ongoing work on the development of annotated historical corpora, as well as our efforts on linking various resources (both corpora and lexical resources). This research advances the state of the art of language technology as well as enables new research for scholars in other disciplines.
19.
  •  
20.
  • Adesam, Yvonne, 1975, et al. (author)
  • The Koala Part-of-Speech Tagset
  • 2019
  • In: Northern European Journal of Language Technology. - : Linköping University Electronic Press. - 2000-1533. ; 6, s. 5-41
  • Journal article (peer-reviewed). Abstract:
    • We present the Koala part-of-speech tagset for written Swedish. The categorization takes the Swedish Academy Grammar (SAG) as its main starting point, to fit with the current descriptive view on Swedish grammar. We argue that neither SAG, as is, nor any of the existing part-of-speech tagsets meet our requirements for a broadly applicable categorization. Our proposal is outlined and compared to the other descriptions, and motivations for both the tagset as a whole as well as decisions about individual tags are discussed.
21.
  • Adesam, Yvonne, 1975- (author)
  • The Multilingual Forest : Investigating High-quality Parallel Corpus Development
  • 2012
  • Doctoral thesis (other academic/artistic). Abstract:
    • This thesis explores the development of parallel treebanks, collections of language data consisting of texts and their translations, with syntactic annotation and alignment, linking words, phrases, and sentences to show translation equivalence. We describe the semi-manual annotation of the SMULTRON parallel treebank, consisting of 1,000 sentences in English, German and Swedish. This description is the starting point for answering the first of two questions in this thesis. What issues need to be considered to achieve a high-quality, consistent, parallel treebank? The units of annotation and the choice of annotation schemes are crucial for quality, and some automated processing is necessary to increase the size. Automatic quality checks and evaluation are essential, but manual quality control is still needed to achieve high quality. Additionally, we explore improving the automatically created annotation for one language, using information available from the annotation of the other languages. This leads us to the second of the two questions in this thesis. Can we improve automatic annotation by projecting information available in the other languages? Experiments with automatic alignment, which is projected from two language pairs, L1–L2 and L1–L3, onto the third pair, L2–L3, show an improvement in precision, in particular if the projected alignment is intersected with the system alignment. We also construct a test collection for experiments on annotation projection to resolve prepositional phrase attachment ambiguities. While majority vote projection improves the annotation, compared to the basic automatic annotation, using linguistic clues to correct the annotation before majority vote projection is even better, although more laborious. However, some structural errors cannot be corrected by projection at all, as different languages have different wording, and thus different structures.
22.
  • Berdicevskis, Aleksandrs, 1983, et al. (author)
  • Superlim: A Swedish Language Understanding Evaluation Benchmark
  • 2023
  • In: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, December 6-10, 2023, Singapore / Houda Bouamor, Juan Pino, Kalika Bali (Editors). - Stroudsburg, PA : Association for Computational Linguistics. - 9798891760608
  • Conference paper (peer-reviewed)
23.
  • Berdicevskis, Aleksandrs, 1983, et al. (author)
  • To drop or not to drop? Predicting the omission of the infinitival marker in a Swedish future construction
  • 2024
  • In: Corpus Linguistics and Linguistic Theory. - 1613-7027 .- 1613-7035. ; 20:1, s. 219-261
  • Journal article (peer-reviewed). Abstract:
    • We investigate the optional omission of the infinitival marker in a Swedish future tense construction. During the last two decades the frequency of omission has been rapidly increasing, and this process has received considerable attention in the literature. We test whether the knowledge which has been accumulated can yield accurate predictions of language variation and change. We extracted all occurrences of the construction from a very large collection of corpora. The dataset was automatically annotated with language-internal predictors which have previously been shown or hypothesized to affect the variation. We trained several models in order to make two kinds of predictions: whether the marker will be omitted in a specific utterance and how large the proportion of omissions will be for a given time period. For most of the approaches we tried, we were not able to achieve a better-than-baseline performance. The only exception was predicting the proportion of omissions using autoregressive integrated moving average models for one-step-ahead forecast, and in this case time was the only predictor that mattered. Our data suggest that most of the language-internal predictors do have some effect on the variation, but the effect is not strong enough to yield reliable predictions.
24.
  • Berdicevskis, Aleksandrs, 1983, et al. (author)
  • We may actually all die tomorrow... nevertheless: Predicting short-term frequency changes in Swedish neologisms
  • 2022
  • In: Live and learn: Festschrift in honor of Lars Borin / Editors: Elena Volodina, Dana Dannélls, Aleksandrs Berdicevskis, Markus Forsberg, Shafqat Virk. - Göteborg : Institutionen för svenska, flerspråkighet och språkteknologi, Göteborgs universitet. - 1401-5919. - 9789187850837 ; , s. 5-12
  • Book chapter (other academic/artistic). Abstract:
    • Predicting the future is difficult, as Lars Borin likes to point out by saying the phrase which is included in the title of this paper. Nevertheless, we attempt to predict short-term changes in the frequency of new Swedish words based on some measures of their linguistic and social dissemination. We show that it is possible to predict the direction of change with a higher-than-baseline accuracy. Most interestingly, we show that predictions are much less accurate for those words that denote new phenomena than for those that are new signifiers for already existing phenomena.
25.
  •  
26.
  •  
27.
  •  
28.
  • Bouma, Gerlof, 1979, et al. (author)
  • Part-of-speech and Morphology Tagging Old Swedish
  • 2016
  • In: Proceedings of the Sixth Swedish Language Technology Conference (SLTC) Umeå University, 17-18 November, 2016.
  • Conference paper (other academic/artistic)
29.
  • Cap, Fabienne, et al. (author)
  • SWORD : Towards Cutting-Edge Swedish Word Processing
  • 2016
  • In: Proceedings of SLTC 2016.
  • Conference paper (peer-reviewed). Abstract:
    • Despite many years of research on Swedish language technology, there is still no well-documented standard for Swedish word processing covering the whole spectrum from low-level tokenization to morphological analysis and disambiguation. SWORD is a new initiative within the SWE-CLARIN consortium aiming to develop documented standards for Swedish word processing. In this paper, we report on a pilot study of Swedish tokenization, where we compare the output of six different tokenizers on four different text types. For one text type (Wikipedia articles), we also compare to the tokenization produced by six manual annotators.
30.
  • Coussé, Evie, 1980, et al. (author)
  • Hur används de, dem och dom i nutida skriftspråk? En storskalig korpusundersökning av nyheter och sociala medier : How are the spelling variants de, dem och dom used in contemporary written Swedish? A large-scale corpus study of news texts and social media
  • 2023
  • In: Språk & Stil. - 1101-1165 .- 2002-4010. ; NF 33, s. 39-70
  • Journal article (peer-reviewed). Abstract:
    • This study ties in with a longstanding debate on the Swedish spelling variants de, dem and dom for personal pronouns (third person plural) and definite articles (plural). It charts the usage of de, dem and dom in five large corpora with news and social media texts over the past 25 years. The corpora contain more than 1.5 billion tokens, which rules out manual handling of the data. Instead, this study makes use of computational methods (including an AI language model) to automatically identify and classify relevant observations. Analysis of the news corpora shows a relatively stable usage of de, dem and dom over the past 25 years. The forms de and dem are predominantly used according to the norm: de for pronouns in subject position and as a definite article; dem for pronouns in object position. The colloquial form dom is hardly found in news texts. Analysis of the social media corpora shows more variation and change. The colloquial form dom is used in 5–25% of all instances instead of de or dem and has decreased after an initial rise. The forms de and dem are sometimes used in a non-standard way: de occurs in object position in 4–10% of the observations; dem is found in subject position or as a definite article in 1–7% of the cases. Non-standard dem is potentially on the rise with younger writers. The corpus analysis also provides details on the usage of de and dem in relative clauses, and on the users’ ratings of posts containing de, dem and dom on the social media platform Reddit.
31.
  •  
32.
  • Hansson, Saga, et al. (author)
  • The Swedish Winogender Dataset
  • 2021
  • In: Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), May 31 - June 2, 2021, Reykjavik, Iceland (online). - Linköping : Linköping University Electronic Press. - 1650-3686 .- 1650-3740. - 9789179296148
  • Conference paper (peer-reviewed). Abstract:
    • We introduce the SweWinogender test set, a diagnostic dataset to measure gender bias in coreference resolution. It is modelled after the English Winogender benchmark, and is released with reference statistics on the distribution of men and women between occupations and the association between gender and occupation in modern corpus material. The paper discusses the design and creation of the dataset, and presents a small investigation of the supplementary statistics.
33.
  • Ljunglöf, Peter, 1971, et al. (author)
  • Assessing the quality of Språkbanken’s annotations
  • 2019
  • Reports (other academic/artistic). Abstract:
    • Most of the corpora in Språkbanken Text consist of unannotated plain text, such as almost all newspaper texts, social media texts, novels and official documents. We also have some corpora that are manually annotated in different ways, such as Talbanken (annotated for part-of-speech and syntactic structure), and the Stockholm Umeå Corpus (annotated for part-of-speech). Språkbanken’s annotation pipeline Sparv aims to automatise the work of automatically annotating all our corpora, while still keeping the manual annotations intact. When all corpora are annotated, they can be made available, e.g., in the corpus search tools Korp and Strix. Until now there has not been any comprehensive overview of the annotation tools and models that Sparv has been using for the last eight years. Some of them have not been updated since the start, such as the part-of-speech tagger Hunpos and the dependency parser MaltParser. There are also annotation tools that we still have not included, such as a constituency-based parser. Therefore Språkbanken initiated a project with the aim of conducting such an overview. This document is the outcome of that project, and it contains descriptions of the types of manual and automatic annotations that we currently have in Språkbanken, as well as an incomplete overview of the state-of-the-art with regards to annotation tools and models.
34.
  •  