SwePub
Search the SwePub database


Results list for the search "hsv:(NATURVETENSKAP) hsv:(Data och informationsvetenskap) ;pers:(Nivre Joakim 1962)"


  • Results 1-10 of 148
1.
  • Dürlich, Luise, et al. (author)
  • What Causes Unemployment? : Unsupervised Causality Mining from Swedish Governmental Reports
  • 2023
  • In: Proceedings of the Second Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2023). - : Association for Computational Linguistics. - 9781959429739 ; pp. 25-29
  • Conference paper (peer-reviewed), abstract:
    • Extracting statements about causality from text documents is a challenging task in the absence of annotated training data. We create a search system for causal statements about user-specified concepts by combining pattern matching of causal connectives with semantic similarity ranking, using a language model fine-tuned for semantic textual similarity. Preliminary experiments on a small test set from Swedish governmental reports show promising results in comparison to two simple baselines.
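The two-stage method this abstract describes (connective matching, then similarity ranking) can be roughed out in a few lines of Python. The connective list and the Swedish sentence-embedding model named below are assumptions for illustration, not necessarily the paper's choices:

    import re
    from sentence_transformers import SentenceTransformer, util

    # Illustrative subset of Swedish causal connectives; the paper's full
    # pattern inventory is not reproduced here.
    CONNECTIVES = [r"\bpå grund av\b", r"\bberor på\b",
                   r"\bleder till\b", r"\borsakar\b"]
    PATTERN = re.compile("|".join(CONNECTIVES), re.IGNORECASE)

    def find_causal_candidates(sentences):
        # Stage 1: pattern matching keeps sentences with a causal connective.
        return [s for s in sentences if PATTERN.search(s)]

    def rank_candidates(concept, candidates,
                        model_name="KBLab/sentence-bert-swedish-cased"):
        # Stage 2: rank candidates by semantic similarity to the
        # user-specified concept (the model choice is an assumption).
        model = SentenceTransformer(model_name)
        scores = util.cos_sim(model.encode([concept], convert_to_tensor=True),
                              model.encode(candidates, convert_to_tensor=True))[0]
        return sorted(zip(candidates, scores.tolist()), key=lambda p: -p[1])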
2.
  • Hershcovich, Daniel, et al. (author)
  • Kopsala : Transition-Based Graph Parsing via Efficient Training and Effective Encoding
  • 2020
  • In: 16th International Conference on Parsing Technologies and IWPT 2020 Shared Task on Parsing Into Enhanced Universal Dependencies. - Stroudsburg, PA, USA : Association for Computational Linguistics. - 9781952148118 ; pp. 236-244
  • Conference paper (peer-reviewed), abstract:
    • We present Kopsala, the Copenhagen-Uppsala system for the Enhanced Universal Dependencies Shared Task at IWPT 2020. Our system is a pipeline consisting of off-the-shelf models for everything but enhanced graph parsing, and for the latter, a transition-based graph parser adapted from Che et al. (2019). We train a single enhanced parser model per language, using gold sentence splitting and tokenization for training, and rely only on tokenized surface forms and multilingual BERT for encoding. While a bug introduced just before submission resulted in a severe drop in precision, its post-submission fix would bring us to 4th place in the official ranking, according to average ELAS. Our parser demonstrates that a unified pipeline is effective for both Meaning Representation Parsing and Enhanced Universal Dependencies.
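To make concrete what separates graph parsing from ordinary tree parsing here, the toy transition state below lets a token receive more than one head, as enhanced Universal Dependencies graphs require. It is a hypothetical illustration, not the actual transition system of Che et al. (2019) that the paper adapts:

    class GraphParserState:
        # Toy transition state for dependency *graphs*: a dependent is not
        # removed when it gets a head, so it can take part in further arcs
        # (reentrancy), unlike in standard tree-parsing transition systems.
        def __init__(self, n_tokens):
            self.stack = [0]                           # 0 = artificial root
            self.buffer = list(range(1, n_tokens + 1))
            self.arcs = set()                          # (head, dep, label)

        def shift(self):
            self.stack.append(self.buffer.pop(0))

        def left_arc(self, label):
            # Front of buffer governs top of stack; the dependent stays put,
            # free to receive additional heads later.
            self.arcs.add((self.buffer[0], self.stack[-1], label))

        def right_arc(self, label):
            self.arcs.add((self.stack[-1], self.buffer[0], label))

        def pop(self):
            # An explicit POP replaces the tree parser's implicit reduction.
            self.stack.pop()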
3.
  • Nivre, Joakim, 1962-, et al. (author)
  • Nucleus Composition in Transition-based Dependency Parsing
  • 2022
  • In: Computational Linguistics - Association for Computational Linguistics (Print). - : MIT Press Journals. - 0891-2017 .- 1530-9312. ; 48:4, pp. 849-886
  • Journal article (peer-reviewed), abstract:
    • Dependency-based approaches to syntactic analysis assume that syntactic structure can be analyzed in terms of binary asymmetric dependency relations holding between elementary syntactic units. Computational models for dependency parsing almost universally assume that an elementary syntactic unit is a word, while the influential theory of Lucien Tesnière instead posits a more abstract notion of nucleus, which may be realized as one or more words. In this article, we investigate the effect of enriching computational parsing models with a concept of nucleus inspired by Tesnière. We begin by reviewing how the concept of nucleus can be defined in the framework of Universal Dependencies, which has become the de facto standard for training and evaluating supervised dependency parsers, and explaining how composition functions can be used to make neural transition-based dependency parsers aware of the nuclei thus defined. We then perform an extensive experimental study, using data from 20 languages to assess the impact of nucleus composition across languages with different typological characteristics, and utilizing a variety of analytical tools including ablation, linear mixed-effects models, diagnostic classifiers, and dimensionality reduction. The analysis reveals that nucleus composition gives small but consistent improvements in parsing accuracy for most languages, and that the improvement mainly concerns the analysis of main predicates, nominal dependents, clausal dependents, and coordination structures. Significant factors explaining the rate of improvement across languages include entropy in coordination structures and frequency of certain function words, in particular determiners. Analysis using dimensionality reduction and diagnostic classifiers suggests that nucleus composition increases the similarity of vectors representing nuclei of the same syntactic type. 
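The notion of a composition function over nuclei can be sketched in a few lines; the simple linear composition and its dimensionality below are assumptions for illustration, and the article evaluates the idea far more carefully:

    import torch
    import torch.nn as nn

    class NucleusComposition(nn.Module):
        # Fold a function word (e.g. a determiner or auxiliary) into the
        # vector of its host word, so the parser's features represent a
        # Tesnière-style nucleus rather than a single word.
        def __init__(self, dim):
            super().__init__()
            self.compose = nn.Linear(2 * dim, dim)

        def forward(self, host_vec, function_vec):
            combined = torch.cat([host_vec, function_vec], dim=-1)
            return torch.tanh(self.compose(combined))

Applied during parsing, the composed vector would replace the host word's vector whenever a function-word relation (such as det or aux) is attached, so later transitions see the nucleus as one unit.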
4.
  • Carlsson, Fredrik, et al. (author)
  • Branch-GAN : Improving Text Generation with (Not So) Large Language Models
  • 2024
  • Conference paper (peer-reviewed), abstract:
    • The current advancements in open domain text generation have been spearheaded by Transformer-based large language models. Leveraging efficient parallelization and vast training datasets, these models achieve unparalleled text generation capabilities. Even so, current models are known to suffer from deficiencies such as repetitive texts, looping issues, and lack of robustness. While adversarial training through generative adversarial networks (GAN) is a proposed solution, earlier research in this direction has predominantly focused on older architectures, or narrow tasks. As a result, this approach is not yet compatible with modern language models for open-ended text generation, leading to diminished interest within the broader research community. We propose a computationally efficient GAN approach for sequential data that utilizes the parallelization capabilities of Transformer models. Our method revolves around generating multiple branching sequences from each training sample, while also incorporating the typical next-step prediction loss on the original data. In this way, we achieve a dense reward and loss signal for both the generator and the discriminator, resulting in a stable training dynamic. We apply our training method to pre-trained language models, using data from their original training set but less than 0.01% of the available data. A comprehensive human evaluation shows that our method significantly improves the quality of texts generated by the model while avoiding the previously reported sparsity problems of GAN approaches. Even our smaller models outperform larger original baseline models with more than 16 times the number of parameters. Finally, we corroborate previous claims that perplexity on held-out data is not a sufficient metric for measuring the quality of generated texts. 
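The training scheme the abstract describes can be caricatured in one training-step sketch. Everything below is an assumed interface, not the paper's implementation: an HF-style causal LM as generator, a discriminator returning one realness logit per token position, and a policy-gradient stand-in for the paper's actual objective:

    import torch
    import torch.nn.functional as F

    def branch_gan_step(generator, discriminator, real_ids,
                        n_branches=4, branch_len=8):
        # (1) Ordinary next-token prediction loss on the original sample.
        mle = F.cross_entropy(
            generator(real_ids[:, :-1]).logits.transpose(1, 2),
            real_ids[:, 1:])

        # (2) Branch several sampled continuations off a prefix of the sample.
        prefix = real_ids[:, : real_ids.size(1) // 2]
        branches = generator.generate(prefix, do_sample=True,
                                      num_return_sequences=n_branches,
                                      max_new_tokens=branch_len)

        # (3) The discriminator scores every position of real and branched
        # text, which is what makes the reward/loss signal dense.
        d_real, d_fake = discriminator(real_ids), discriminator(branches)
        d_loss = (
            F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
            + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))

        # (4) Per-token realness acts as a dense reward on the sampled branch
        # tokens (REINFORCE-style; the paper's formulation may differ).
        logp = F.log_softmax(generator(branches[:, :-1]).logits, dim=-1)
        token_logp = logp.gather(-1, branches[:, 1:, None]).squeeze(-1)
        reward = torch.sigmoid(d_fake)[:, 1:].detach()
        g_loss = mle - (reward * token_logp).mean()
        return g_loss, d_loss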
5.
  • Dürlich, Luise, et al. (author)
  • Overview of the CLEF-2024 Eloquent Lab : Task 2 on HalluciGen
  • 2024
  • In: CEUR Workshop Proceedings. - : CEUR-WS. ; pp. 691-702
  • Conference paper (peer-reviewed), abstract:
    • In the HalluciGen task we aim to discover whether LLMs have an internal representation of hallucination. Specifically, we investigate whether LLMs can be used to both generate and detect hallucinated content. In the cross-model evaluation setting we take this a step further and explore the viability of using an LLM to evaluate output produced by another LLM. We include generation, detection, and cross-model evaluation steps for two scenarios: paraphrase and machine translation. Overall we find that the performance of the baselines and submitted systems is highly variable; however, initial results are promising, and lessons learned from this year’s task will provide a solid foundation for future iterations of the task. In particular, we highlight that human validation of generated output is ideally necessary to ensure the robustness of the cross-model evaluation results. We aim to address this challenge in future iterations of HalluciGen.
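The cross-model setting described here is essentially a two-LLM protocol, which a minimal harness can make explicit. The prompts below are illustrative stand-ins, not the official task prompts, and the two callables are assumed wrappers around arbitrary LLM APIs:

    from typing import Callable

    def cross_model_trial(gen_llm: Callable[[str], str],
                          det_llm: Callable[[str], str],
                          source: str) -> bool:
        # Generation step: one model produces a faithful and a
        # hallucinated paraphrase of the source.
        faithful = gen_llm(f"Paraphrase faithfully: {source}")
        hallucinated = gen_llm(
            f"Paraphrase, but introduce one unsupported fact: {source}")

        # Detection step: a *different* model must identify the
        # hallucinated output (the cross-model evaluation setting).
        verdict = det_llm(
            f"Source: {source}\nA: {faithful}\nB: {hallucinated}\n"
            "Which of A or B is unsupported by the source? Answer A or B.")
        return verdict.strip().upper().startswith("B")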
6.
  • Karlgren, Jussi, et al. (author)
  • ELOQUENT CLEF Shared Tasks for Evaluation of Generative Language Model Quality
  • 2024
  • In: Lecture Notes in Computer Science. - : Springer Science and Business Media Deutschland GmbH. - 0302-9743 .- 1611-3349. ; 14612 LNCS, pp. 459-465
  • Journal article (peer-reviewed), abstract:
    • ELOQUENT is a set of shared tasks for evaluating the quality and usefulness of generative language models. ELOQUENT aims to bring together some high-level quality criteria, grounded in experiences from deploying models in real-life tasks, and to formulate tests for those criteria, preferably implemented to require minimal human assessment effort and in a multilingual setting. The selected tasks for this first year of ELOQUENT are (1) probing a language model for topical competence; (2) assessing the ability of models to generate and detect hallucinations; (3) assessing the robustness of a model output given variation in the input prompts; and (4) establishing the possibility to distinguish human-generated text from machine-generated text.
7.
  • Lindqvist, Ellinor, et al. (author)
  • Low-Resource Techniques for Analysing the Rhetorical Structure of Swedish Historical Petitions
  • 2023
  • In: Proceedings of the 2nd Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL 2023). - : Association for Computational Linguistics. ; pp. 132-139
  • Conference paper (peer-reviewed), abstract:
    • Natural language processing techniques can be valuable for improving and facilitating historical research. This is also true for the analysis of petitions, a source which has been relatively little used in historical research. However, limited data resources pose challenges for mainstream natural language processing approaches based on machine learning. In this paper, we explore methods for automatically segmenting petitions according to their rhetorical structure. We find that the use of rules, word embeddings, and especially keywords can give promising results for this task.
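Of the three low-resource techniques the abstract mentions, the keyword method is the simplest to sketch. The section inventory and cue words below are hypothetical, chosen only to show the mechanism, not taken from the paper:

    # Hypothetical rhetorical sections and cue words for a petition.
    SECTION_KEYWORDS = {
        "salutation": ["högvälborne", "nådigste"],
        "narration":  ["emedan", "alldenstund"],
        "request":    ["anhåller", "bönfaller"],
        "closing":    ["ödmjukaste", "tjänare"],
    }

    def segment(sentences):
        # Assign each sentence the label of the most recent section whose
        # keyword has fired; sections are assumed to appear in order.
        label, segmented = "salutation", []
        for sentence in sentences:
            lowered = sentence.lower()
            for section, keywords in SECTION_KEYWORDS.items():
                if any(k in lowered for k in keywords):
                    label = section
            segmented.append((label, sentence))
        return segmented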
8.
  • Baldwin, Timothy, et al. (author)
  • Universals of Linguistic Idiosyncrasy in Multilingual Computational Linguistics
  • 2021
  • In: Dagstuhl Reports. - Dagstuhl. - 2192-5283. ; 11:7, pp. 89-138
  • Journal article (peer-reviewed), abstract:
    • Computational linguistics builds models that can usefully process and produce language and that can increase our understanding of linguistic phenomena. From the computational perspective, language data are particularly challenging notably due to their variable degree of idiosyncrasy (unexpected properties shared by few peer objects), and the pervasiveness of non-compositional phenomena such as multiword expressions (whose meaning cannot be straightforwardly deduced from the meanings of their components, e.g. red tape, by and large, to pay a visit and to pull one’s leg) and constructions (conventional associations of forms and meanings). Additionally, if models and methods are to be consistent and valid across languages, they have to face specificities inherent either to particular languages, or to various linguistic traditions. These challenges were addressed by the Dagstuhl Seminar 21351 entitled “Universals of Linguistic Idiosyncrasy in Multilingual Computational Linguistics”, which took place on 30-31 August 2021. Its main goal was to create synergies between three distinct though partly overlapping communities: experts in typology, in cross-lingual morphosyntactic annotation and in multiword expressions. This report documents the program and the outcomes of the seminar. We present the executive summary of the event, reports from the 3 Working Groups and abstracts of individual talks and open problems presented by the participants.
9.
  • Ballesteros, Miguel, et al. (author)
  • MaltOptimizer : Fast and Effective Parser Optimization
  • 2016
  • In: Natural Language Engineering. - 1351-3249 .- 1469-8110. ; 22:2, pp. 187-213
  • Journal article (peer-reviewed), abstract:
    • Statistical parsers often require careful parameter tuning and feature selection. This is a nontrivial task for application developers who are not interested in parsing for its own sake, and it can be time-consuming even for experienced researchers. In this paper we present MaltOptimizer, a tool developed to automatically explore parameters and features for MaltParser, a transition-based dependency parsing system that can be used to train parsers given treebank data. MaltParser provides a wide range of parameters for optimization, including nine different parsing algorithms, an expressive feature specification language that can be used to define arbitrarily rich feature models, and two machine learning libraries, each with their own parameters. MaltOptimizer is an interactive system that performs parser optimization in three stages. First, it performs an analysis of the training set in order to select a suitable starting point for optimization. Second, it selects the best parsing algorithm and tunes the parameters of this algorithm. Finally, it performs feature selection and tunes machine learning parameters. Experiments on a wide range of data sets show that MaltOptimizer quickly produces models that consistently outperform default settings and often approach the accuracy achieved through careful manual optimization.
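The three-stage procedure this abstract outlines maps naturally onto a greedy search loop. The sketch below is an illustrative reconstruction of that logic, not MaltOptimizer's actual code or command-line interface (the real tool is a Java application); `evaluate` is an assumed callback that trains and scores a parser for a given configuration:

    def optimize(evaluate, algorithms, features, learner_params):
        config = {"algorithm": None, "features": [], "learner": {}}

        # Stage 2 analogue: pick the best parsing algorithm with defaults.
        config["algorithm"] = max(
            algorithms, key=lambda a: evaluate({**config, "algorithm": a}))

        # Stage 3 analogue, part 1: greedy forward feature selection.
        for feat in features:
            trial = {**config, "features": config["features"] + [feat]}
            if evaluate(trial) > evaluate(config):
                config = trial

        # Stage 3 analogue, part 2: tune learner parameters one at a time.
        for name, values in learner_params.items():
            config["learner"][name] = max(
                values, key=lambda v: evaluate(
                    {**config, "learner": {**config["learner"], name: v}}))
        return config

Stage 1 of the tool (analysis of the training set to choose a starting point) is omitted here because it rests on treebank-specific heuristics the abstract does not spell out.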
10.
Publication type
conference paper (104)
journal article (28)
book chapter (7)
edited volume (editorship) (3)
book (3)
proceedings (editorship) (1)
other publication (1)
doctoral thesis (1)
Content type
peer-reviewed (135)
other academic/artistic (12)
popular science, debate etc. (1)
Author/editor
Stymne, Sara, 1977- (10)
de Lhoneux, Miryam, ... (9)
Ginter, Filip (8)
Hall, Johan (8)
Kulmizev, Artur (7)
Dürlich, Luise (6)
Hajic, Jan (6)
Goldberg, Yoav (6)
de Marneffe, Marie-C ... (6)
Manning, Christopher ... (6)
Dubremetz, Marie, 19 ... (6)
Nilsson, Jens (5)
Gogoulou, Evangelia (5)
Schuster, Sebastian (5)
Tiedemann, Jörg (4)
Megyesi, Beata (4)
Ballesteros, Miguel (4)
Basirat, Ali, 1982- (4)
Zeman, Daniel (4)
Nilsson, Mattias (3)
Sahlgren, Magnus (3)
Zhang, Yue (3)
Kuhlmann, Marco (3)
Gómez-Rodríguez, Car ... (3)
Hardmeier, Christian (3)
Bohnet, Bernd (3)
Oepen, Stephan (3)
Bunt, Harry (3)
Löwe, Welf (2)
Abdou, Mostafa (2)
Ravishankar, Vinit (2)
Liwicki, Marcus (2)
Sandin, Fredrik, 197 ... (2)
Karlgren, Jussi (2)
Guillou, Liane (2)
Carlsson, Fredrik (2)
Pettersson, Eva (2)
Baldwin, Timothy (2)
Savary, Agata (2)
Bengoetxea, Kepa (2)
Agirre, Eneko (2)
Gojenola, Koldo (2)
Johansson, Richard (2)
Boguslavsky, Igor (2)
Farkas, Richard (2)
Øvrelid, Lilja (2)
Buljan, Maja (2)
Constant, Matthieu (2)
Silveira, Natalia (2)
Higher education institution
Uppsala universitet (142)
RISE (14)
Linnéuniversitetet (4)
Luleå tekniska universitet (2)
Linköpings universitet (2)
Stockholms universitet (1)
Language
English (147)
Swedish (1)
Research subject (UKÄ/SCB)
Natural Sciences (148)
Humanities (8)
Engineering and Technology (1)
Medical and Health Sciences (1)

