SwePub

Result list for search: WFRF:(Nivre Joakim 1962)

  • Results 1-50 of 150
1.
  • Baldwin, Timothy, et al. (author)
  • Universals of Linguistic Idiosyncrasy in Multilingual Computational Linguistics
  • 2021
  • In: Dagstuhl Reports. - Dagstuhl. - 2192-5283. ; 11:7, pp. 89-138
  • Journal article (peer-reviewed), abstract:
    • Computational linguistics builds models that can usefully process and produce language and that can increase our understanding of linguistic phenomena. From the computational perspective, language data are particularly challenging notably due to their variable degree of idiosyncrasy (unexpected properties shared by few peer objects), and the pervasiveness of non-compositional phenomena such as multiword expressions (whose meaning cannot be straightforwardly deduced from the meanings of their components, e.g. red tape, by and large, to pay a visit and to pull one’s leg) and constructions (conventional associations of forms and meanings). Additionally, if models and methods are to be consistent and valid across languages, they have to face specificities inherent either to particular languages, or to various linguistic traditions. These challenges were addressed by the Dagstuhl Seminar 21351 entitled “Universals of Linguistic Idiosyncrasy in Multilingual Computational Linguistics”, which took place on 30-31 August 2021. Its main goal was to create synergies between three distinct though partly overlapping communities: experts in typology, in cross-lingual morphosyntactic annotation and in multiword expressions. This report documents the program and the outcomes of the seminar. We present the executive summary of the event, reports from the 3 Working Groups and abstracts of individual talks and open problems presented by the participants.
2.
  • Ballesteros, Miguel, et al. (author)
  • MaltOptimizer: Fast and Effective Parser Optimization
  • 2016
  • In: Natural Language Engineering. - 1351-3249 .- 1469-8110. ; 22:2, pp. 187-213
  • Journal article (peer-reviewed), abstract:
    • Statistical parsers often require careful parameter tuning and feature selection. This is a nontrivial task for application developers who are not interested in parsing for its own sake, and it can be time-consuming even for experienced researchers. In this paper we present MaltOptimizer, a tool developed to automatically explore parameters and features for MaltParser, a transition-based dependency parsing system that can be used to train parsers given treebank data. MaltParser provides a wide range of parameters for optimization, including nine different parsing algorithms, an expressive feature specification language that can be used to define arbitrarily rich feature models, and two machine learning libraries, each with their own parameters. MaltOptimizer is an interactive system that performs parser optimization in three stages. First, it performs an analysis of the training set in order to select a suitable starting point for optimization. Second, it selects the best parsing algorithm and tunes the parameters of this algorithm. Finally, it performs feature selection and tunes machine learning parameters. Experiments on a wide range of data sets show that MaltOptimizer quickly produces models that consistently outperform default settings and often approach the accuracy achieved through careful manual optimization.
5.
  • Basirat, Ali, 1982-, et al. (author)
  • Real-valued syntactic word vectors
  • 2020
  • In: Journal of Experimental and Theoretical Artificial Intelligence (Print). - 0952-813X .- 1362-3079. ; 32:4, pp. 557-579
  • Journal article (peer-reviewed), abstract:
    • We introduce a word embedding method that generates a set of real-valued word vectors from a distributional semantic space. The semantic space is built with a set of context units (words) which are selected by an entropy-based feature selection approach with respect to the certainty involved in their contextual environments. We show that the most predictive context of a target word is its preceding word. An adaptive transformation function is also introduced that reshapes the data distribution to make it suitable for dimensionality reduction techniques. The final low-dimensional word vectors are formed by the singular vectors of a matrix of transformed data. We show that the resulting word vectors are as good as other sets of word vectors generated with popular word embedding methods.
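The pipeline described in this abstract (preceding-word contexts, entropy-based context selection, a transformation, then dimensionality reduction) can be sketched as follows; the toy corpus, the log transform, and all names are my own illustrative assumptions, not the authors' code:

```python
import numpy as np

# Sketch: co-occurrence counts with the preceding word as context,
# entropy-based selection of the most certain contexts, a simple
# log transform, and SVD to obtain low-dimensional word vectors.
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}

# C[target, context] counts how often `context` immediately precedes `target`.
C = np.zeros((len(vocab), len(vocab)))
for prev, word in zip(corpus, corpus[1:]):
    C[idx[word], idx[prev]] += 1

# Entropy of each context's target distribution (low entropy = more certain).
probs = C / np.maximum(C.sum(axis=0, keepdims=True), 1)
safe = np.where(probs > 0, probs, 1.0)   # log(1) = 0, so zero cells contribute 0
entropy = -(probs * np.log(safe)).sum(axis=0)

# Keep the k lowest-entropy context units.
k = 4
keep = np.argsort(entropy)[:k]

# Transform the selected counts and reduce with SVD; rows are word vectors.
M = np.log1p(C[:, keep])
U, S, _ = np.linalg.svd(M, full_matrices=False)
vectors = U[:, :2] * S[:2]               # 2-dimensional vectors, one per word
print(vectors.shape)                     # (7, 2): 7 vocabulary words
```

In the paper the singular vectors come from a much larger corpus-wide matrix; the point here is only the shape of the computation.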
6.
  • Basirat, Ali, 1982-, et al. (author)
  • Real-valued Syntactic Word Vectors (RSV) for Greedy Neural Dependency Parsing
  • 2017
  • Conference paper (peer-reviewed), abstract:
    • We show that a set of real-valued word vectors formed by right singular vectors of a transformed co-occurrence matrix are meaningful for determining different types of dependency relations between words. Our experimental results on the task of dependency parsing confirm the superiority of the word vectors to the other sets of word vectors generated by popular methods of word embedding. We also study the effect of using these vectors on the accuracy of dependency parsing in different languages versus using more complex parsing architectures.
7.
  • Basirat, Ali, Postdoctoral Researcher, 1982-, et al. (author)
  • Syntactic Nuclei in Dependency Parsing: A Multilingual Exploration
  • 2021
  • In: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics. - Stroudsburg, PA, USA : Association for Computational Linguistics. - 9781954085022 ; , pp. 1376-1387
  • Conference paper (peer-reviewed), abstract:
    • Standard models for syntactic dependency parsing take words to be the elementary units that enter into dependency relations. In this paper, we investigate whether there are any benefits from enriching these models with the more abstract notion of nucleus proposed by Tesnière. We do this by showing how the concept of nucleus can be defined in the framework of Universal Dependencies and how we can use composition functions to make a transition-based dependency parser aware of this concept. Experiments on 12 languages show that nucleus composition gives small but significant improvements in parsing accuracy. Further analysis reveals that the improvement mainly concerns a small number of dependency relations, including relations of coordination, direct objects, nominal modifiers, and main predicates.
8.
  • Bengoetxea, Kepa, et al. (author)
  • On WordNet Semantic Classes and Dependency Parsing
  • 2014
  • In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). ; , pp. 649-655
  • Conference paper (peer-reviewed)
12.
  • Buljan, Maja, et al. (author)
  • A Tale of Four Parsers: Methodological Reflections on Diagnostic Evaluation and In-Depth Error Analysis for Meaning Representation Parsing
  • 2022
  • In: Language Resources and Evaluation. - : Springer Science and Business Media LLC. - 1574-020X .- 1574-0218. ; 56:4, pp. 1075-1102
  • Journal article (peer-reviewed), abstract:
    • We discuss methodological choices in diagnostic evaluation and error analysis in meaning representation parsing (MRP), i.e. mapping from natural language utterances to graph-based encodings of semantic structure. We expand on a pilot quantitative study in contrastive diagnostic evaluation, inspired by earlier work in syntactic dependency parsing, and propose a novel methodology for qualitative error analysis. This two-pronged study is performed using a selection of submissions, data, and evaluation tools featured in the 2019 shared task on MRP. Our aim is to devise methods for identifying strengths and weaknesses in different broad families of parsing techniques, as well as investigating the relations between specific parsing approaches, different meaning representation frameworks, and individual linguistic phenomena—by identifying and comparing common error patterns. Our preliminary empirical results suggest that the proposed methodologies can be meaningfully applied to parsing into graph-structured target representations, as a side-effect uncovering hitherto unknown properties of the different systems that can inform future development and cross-fertilization across approaches.
13.
  • Buljan, Maja, et al. (author)
  • A Tale of Three Parsers: Towards Diagnostic Evaluation for Meaning Representation Parsing
  • 2020
  • In: Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020). - Paris : European Language Resources Association (ELRA). - 9791095546344 ; , pp. 1902-1909
  • Conference paper (peer-reviewed), abstract:
    • We discuss methodological choices in contrastive and diagnostic evaluation in meaning representation parsing, i.e. mapping from natural language utterances to graph-based encodings of semantic structure. Drawing inspiration from earlier work in syntactic dependency parsing, we transfer and refine several quantitative diagnosis techniques for use in the context of the 2019 shared task on Meaning Representation Parsing (MRP). As in parsing proper, moving evaluation from simple rooted trees to general graphs brings along its own range of challenges. Specifically, we seek to begin to shed light on relative strengths and weaknesses in different broad families of parsing techniques. In addition to these theoretical reflections, we conduct a pilot experiment on a selection of top-performing MRP systems and two of the five meaning representation frameworks in the shared task. Empirical results suggest that the proposed methodology can be meaningfully applied to parsing into graph-structured target representations, uncovering hitherto unknown properties of the different systems that can inform future development and cross-fertilization across approaches.
14.
  • Bunt, Harry, et al. (author)
  • Grammars, Parsers and Recognizers
  • 2014
  • In: Journal of Logic and Computation. - : Oxford Journals. ; 24:2, pp. 309-
  • Journal article (peer-reviewed)
15.
  • Calacean, Mihaela, et al. (author)
  • A Data-Driven Dependency Parser for Romanian
  • 2009
  • In: Proceedings of the Seventh International Workshop on Treebanks and Linguistic Theories. - 9789078328773 ; , pp. 65-76
  • Conference paper (peer-reviewed)
16.
  • Carlsson, Fredrik, et al. (author)
  • Fine-Grained Controllable Text Generation Using Non-Residual Prompting
  • 2022
  • In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. - Stroudsburg, PA, USA : Association for Computational Linguistics. - 9781955917216 ; , pp. 6837-6857
  • Conference paper (peer-reviewed), abstract:
    • The introduction of immensely large Causal Language Models (CLMs) has rejuvenated the interest in open-ended text generation. However, controlling the generative process for these Transformer-based models is at large an unsolved problem. Earlier work has explored either plug-and-play decoding strategies, or more powerful but blunt approaches such as prompting. There hence currently exists a trade-off between fine-grained control, and the capability for more expressive high-level instructions. To alleviate this trade-off, we propose an encoder-decoder architecture that enables intermediate text prompts at arbitrary time steps. We propose a resource-efficient method for converting a pre-trained CLM into this architecture, and demonstrate its potential on various experiments, including the novel task of contextualized word inclusion. Our method provides strong results on multiple experimental settings, proving itself to be both expressive and versatile.
18.
  • Constant, Matthieu, et al. (author)
  • A Transition-Based System for Joint Lexical and Syntactic Analysis
  • 2016
  • In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Vol. 1. ; , pp. 161-171
  • Conference paper (peer-reviewed), abstract:
    • We present a transition-based system that jointly predicts the syntactic structure and lexical units of a sentence by building two structures over the input words: a syntactic dependency tree and a forest of lexical units including multiword expressions (MWEs). This combined representation allows us to capture both the syntactic and semantic structure of MWEs, which in turn enables deeper downstream semantic analysis, especially for semi-compositional MWEs. The proposed system extends the arc-standard transition system for dependency parsing with transitions for building complex lexical units. Experiments on two different data sets show that the approach significantly improves MWE identification accuracy (and sometimes syntactic accuracy) compared to existing joint approaches.
19.
  • de Lhoneux, Miryam, 1990-, et al. (author)
  • Arc-Hybrid Non-Projective Dependency Parsing with a Static-Dynamic Oracle
  • 2017
  • In: IWPT 2017 15th International Conference on Parsing Technologies. - Pisa, Italy : Association for Computational Linguistics. - 9781945626739 ; , pp. 99-104
  • Conference paper (peer-reviewed), abstract:
    • We extend the arc-hybrid transition system for dependency parsing with a SWAP transition that enables reordering of the words and construction of non-projective trees. Although this extension potentially breaks the arc-decomposability of the transition system, we show that the existing dynamic oracle can be modified and combined with a static oracle for the SWAP transition. Experiments on five languages with different degrees of non-projectivity show that the new system gives competitive accuracy and is significantly better than a system trained with a purely static oracle.
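The transition system this abstract describes can be made concrete with a minimal sketch (my own simplified implementation, driven by an external oracle; not the authors' parser). LEFT-ARC and RIGHT-ARC are the usual arc-hybrid transitions, and SWAP moves the second stack item back to the buffer to allow non-projective trees:

```python
# Arc-hybrid transitions plus SWAP, in sketch form. `oracle` decides the
# next action from the current (stack, buffer) configuration; arcs are
# (head, dependent) pairs over word indices.
def parse(n_words, oracle):
    stack, buffer, arcs = [], list(range(n_words)), []
    while buffer or len(stack) > 1:
        action = oracle(stack, buffer)
        if action == "SHIFT":            # push front of buffer onto stack
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC":       # stack top becomes dependent of buffer front
            arcs.append((buffer[0], stack.pop()))
        elif action == "RIGHT-ARC":      # stack top becomes dependent of item below it
            dep = stack.pop()
            arcs.append((stack[-1], dep))
        elif action == "SWAP":           # return 2nd stack item to the buffer,
            buffer.insert(0, stack.pop(-2))  # enabling word reordering
    return arcs

# Drive the parser with a scripted action sequence for a 3-word sentence.
script = iter(["SHIFT", "LEFT-ARC", "SHIFT", "SHIFT", "RIGHT-ARC"])
print(parse(3, lambda stack, buffer: next(script)))  # [(1, 0), (1, 2)]
```

The static/dynamic oracle question studied in the paper concerns how `oracle` is trained; the transition mechanics above are independent of that choice.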
20.
  • de Lhoneux, Miryam, 1990-, et al. (author)
  • From raw text to Universal Dependencies: look, no tags!
  • 2017
  • In: Proceedings of the CoNLL 2017 Shared Task. - Vancouver, Canada : Association for Computational Linguistics. - 9781945626708 ; , pp. 207-217
  • Conference paper (peer-reviewed), abstract:
    • We present the Uppsala submission to the CoNLL 2017 shared task on parsing from raw text to universal dependencies. Our system is a simple pipeline consisting of two components. The first performs joint word and sentence segmentation on raw text; the second predicts dependency trees from raw words. The parser bypasses the need for part-of-speech tagging, but uses word embeddings based on universal tag distributions. We achieved a macro-averaged LAS F1 of 65.11 in the official test run and obtained the 2nd best result for sentence segmentation with a score of 89.03. After fixing two bugs, we obtained an unofficial LAS F1 of 70.49.
21.
  • de Lhoneux, Miryam, 1990-, et al. (author)
  • Recursive Subtree Composition in LSTM-Based Dependency Parsing
  • 2019
  • In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics. - Stroudsburg : Association for Computational Linguistics. - 9781950737130 ; , pp. 1566-1576
  • Conference paper (peer-reviewed), abstract:
    • The need for tree structure modelling on top of sequence modelling is an open issue in neural dependency parsing. We investigate the impact of adding a tree layer on top of a sequential model by recursively composing subtree representations (composition) in a transition-based parser that uses features extracted by a BiLSTM. Composition seems superfluous with such a model, suggesting that BiLSTMs capture information about subtrees. We perform model ablations to tease out the conditions under which composition helps. When ablating the backward LSTM, performance drops and composition does not recover much of the gap. When ablating the forward LSTM, performance drops less dramatically and composition recovers a substantial part of the gap, indicating that a forward LSTM and composition capture similar information. We take the backward LSTM to be related to lookahead features and the forward LSTM to the rich history-based features both crucial for transition-based parsers. To capture history-based information, composition is better than a forward LSTM on its own, but it is even better to have a forward LSTM as part of a BiLSTM. We correlate results with language properties, showing that the improved lookahead of a backward LSTM is especially important for head-final languages.
22.
  • de Lhoneux, Miryam, 1990-, et al. (author)
  • What Should/Do/Can LSTMs Learn When Parsing Auxiliary Verb Constructions?
  • 2019
  • In: CoRR. ; abs/1907.07950
  • Journal article (other scholarly/artistic), abstract:
    • This article is a linguistic investigation of a neural parser. We look at transitivity and agreement information of auxiliary verb constructions (AVCs) in comparison to finite main verbs (FMVs). This comparison is motivated by theoretical work in dependency grammar and in particular the work of Tesnière (1959) where AVCs and FMVs are both instances of a nucleus, the basic unit of syntax. An AVC is a dissociated nucleus; it consists of at least two words, and an FMV is its non-dissociated counterpart, consisting of exactly one word. We suggest that the representation of AVCs and FMVs should capture similar information. We use diagnostic classifiers to probe agreement and transitivity information in vectors learned by a transition-based neural parser in four typologically different languages. We find that the parser learns different information about AVCs and FMVs if only sequential models (BiLSTMs) are used in the architecture but similar information when a recursive layer is used. We find explanations for why this is the case by looking closely at how information is learned in the network and looking at what happens with different dependency representations of AVCs.
23.
  • de Lhoneux, Miryam, 1990-, et al. (author)
  • What Should/Do/Can LSTMs Learn When Parsing Auxiliary Verb Constructions?
  • 2020
  • In: Computational Linguistics - Association for Computational Linguistics (Print). - : MIT Press. - 0891-2017 .- 1530-9312. ; 46:4, pp. 763-784
  • Journal article (peer-reviewed), abstract:
    • There is a growing interest in investigating what neural NLP models learn about language. A prominent open question is the question of whether or not it is necessary to model hierarchical structure. We present a linguistic investigation of a neural parser adding insights to this question. We look at transitivity and agreement information of auxiliary verb constructions (AVCs) in comparison to finite main verbs (FMVs). This comparison is motivated by theoretical work in dependency grammar and in particular the work of Tesnière (1959), where AVCs and FMVs are both instances of a nucleus, the basic unit of syntax. An AVC is a dissociated nucleus; it consists of at least two words, and an FMV is its non-dissociated counterpart, consisting of exactly one word. We suggest that the representation of AVCs and FMVs should capture similar information. We use diagnostic classifiers to probe agreement and transitivity information in vectors learned by a transition-based neural parser in four typologically different languages. We find that the parser learns different information about AVCs and FMVs if only sequential models (BiLSTMs) are used in the architecture but similar information when a recursive layer is used. We find explanations for why this is the case by looking closely at how information is learned in the network and looking at what happens with different dependency representations of AVCs. We conclude that there may be benefits to using a recursive layer in dependency parsing and that we have not yet found the best way to integrate it in our parsers.
24.
  • de Marneffe, Marie-Catherine, et al. (author)
  • Dependency Grammar
  • 2019
  • In: Annual Review of Linguistics. - : Annual Reviews. - 2333-9691 .- 2333-9683. ; 5, pp. 197-218
  • Journal article (peer-reviewed), abstract:
    • Dependency grammar is a descriptive and theoretical tradition in linguistics that can be traced back to antiquity. It has long been influential in the European linguistics tradition and has more recently become a mainstream approach to representing syntactic and semantic structure in natural language processing. In this review, we introduce the basic theoretical assumptions of dependency grammar and review some key aspects in which different dependency frameworks agree or disagree. We also discuss advantages and disadvantages of dependency representations and introduce Universal Dependencies, a framework for multilingual dependency-based morphosyntactic annotation that has been applied to more than 60 languages.
25.
  • De Marneffe, Marie-Catherine, et al. (author)
  • Universal Dependencies
  • 2021
  • In: Computational Linguistics. - : MIT Press. - 0891-2017 .- 1530-9312. ; 47, pp. 255-308
  • Journal article (peer-reviewed), abstract:
    • Universal dependencies (UD) is a framework for morphosyntactic annotation of human language, which to date has been used to create treebanks for more than 100 languages. In this article, we outline the linguistic theory of the UD framework, which draws on a long tradition of typologically oriented grammatical theories. Grammatical relations between words are centrally used to explain how predicate–argument structures are encoded morphosyntactically in different languages while morphological features and part-of-speech classes give the properties of words. We argue that this theory is a good basis for crosslinguistically consistent annotation of typologically diverse languages in a way that supports computational natural language understanding as well as broader linguistic studies.
26.
  • de Marneffe, Marie-Catherine, et al. (author)
  • Universal Stanford Dependencies: A Cross-Linguistic Typology
  • 2014
  • In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC). - 9782951740884 ; , pp. 4585-4592
  • Conference paper (peer-reviewed), abstract:
    • Revisiting the now de facto standard Stanford dependency representation, we propose an improved taxonomy to capture grammatical relations across languages, including morphologically rich ones. We suggest a two-layered taxonomy: a set of broadly attested universal grammatical relations, to which language-specific relations can be added. We emphasize the lexicalist stance of the Stanford Dependencies, which leads to a particular, partially new treatment of compounding, prepositions, and morphology. We show how existing dependency schemes for several languages map onto the universal taxonomy proposed here and close with consideration of practical implications of dependency representation choices for NLP applications, in particular parsing.
28.
  • Dobrovoljc, Kaja, et al. (author)
  • The Universal Dependencies Treebank of Spoken Slovenian
  • 2016
  • In: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). - 9782951740891 ; , pp. 1566-1573
  • Conference paper (peer-reviewed), abstract:
    • This paper presents the construction of an open-source dependency treebank of spoken Slovenian, the first syntactically annotated collection of spontaneous speech in Slovenian. The treebank has been manually annotated using the Universal Dependencies annotation scheme, a one-layer syntactic annotation scheme with a high degree of cross-modality, cross-framework and cross-language interoperability. In this original application of the scheme to spoken language transcripts, we address a wide spectrum of syntactic particularities in speech, either by extending the scope of application of existing universal labels or by proposing new speech-specific extensions. The initial analysis of the resulting treebank and its comparison with the written Slovenian UD treebank confirms significant syntactic differences between the two language modalities, with spoken data consisting of shorter and more elliptic sentences, fewer and simpler nominal phrases, and more relations marking disfluencies, interaction, deixis and modality.
29.
  • Dubremetz, Marie, 1988-, et al. (author)
  • Extraction of Nominal Multiword Expressions in French
  • 2014
  • In: Proceedings of the 10th Workshop on Multiword Expressions (MWE). - Gothenburg, Sweden : Association for Computational Linguistics. ; , pp. 72-76
  • Conference paper (peer-reviewed)
31.
  • Dubremetz, Marie, 1988-, et al. (author)
  • Rhetorical Figure Detection: Chiasmus, Epanaphora, Epiphora
  • 2018
  • In: Frontiers in Digital Humanities. - : Frontiers Media SA. - 2297-2668. ; 5:10
  • Journal article (peer-reviewed), abstract:
    • Rhetorical figures are valuable linguistic data for literary analysis. In this article, we target the detection of three rhetorical figures that belong to the family of repetitive figures: chiasmus (I go where I please, and I please where I go.), epanaphora also called anaphora (“Poor old European Commission! Poor old European Council.”) and epiphora (“This house is mine. This car is mine. You are mine.”). Detecting repetition of words is easy for a computer but detecting only the ones provoking a rhetorical effect is difficult because of many accidental and irrelevant repetitions. For all figures, we train a log-linear classifier on a corpus of political debates. The corpus is only very partially annotated, but we nevertheless obtain good results, with more than 50% precision for all figures. We then apply our models to totally different genres and perform a comparative analysis, by comparing corpora of fiction, science and quotes. Thanks to the automatic detection of rhetorical figures, we discover that chiasmus is more likely to appear in the scientific context whereas epanaphora and epiphora are more common in fiction.
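The core detection problem described here (separating rhetorically meaningful repetition from accidental repetition) starts from candidate extraction, which can be sketched as below. This is my own simplification of the candidate step only; the paper's actual system scores candidates with a trained log-linear classifier:

```python
# Illustrative sketch: propose chiasmus candidates as word pairs repeated
# in criss-cross order, A ... B ... B ... A, within one sentence.
def chiasmus_candidates(tokens, stopwords=frozenset()):
    positions = {}
    for i, tok in enumerate(tokens):
        positions.setdefault(tok.lower(), []).append(i)
    # Words that occur at least twice and are not function words.
    repeated = [w for w, ps in positions.items()
                if len(ps) >= 2 and w not in stopwords]
    candidates = []
    for a in repeated:
        for b in repeated:
            if a == b:
                continue
            a1, a2 = positions[a][0], positions[a][-1]
            b1, b2 = positions[b][0], positions[b][-1]
            if a1 < b1 < b2 < a2:          # criss-cross (chiastic) order
                candidates.append((a, b))
    return candidates

sent = "I go where I please , and I please where I go".split()
print(chiasmus_candidates(sent, stopwords={"i", "and", ","}))
```

On the example from the abstract this proposes pairs such as ("go", "please"); ranking which of the proposals are genuinely rhetorical is the hard part the paper addresses.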
35.
  • Dürlich, Luise, et al. (author)
  • On the Concept of Resource-Efficiency in NLP
  • 2023
  • In: Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa). ; , pp. 135-145
  • Conference paper (peer-reviewed), abstract:
    • Resource-efficiency is a growing concern in the NLP community. But what are the resources we care about and why? How do we measure efficiency in a way that is reliable and relevant? And how do we balance efficiency and other important concerns? Based on a review of the emerging literature on the subject, we discuss different ways of conceptualizing efficiency in terms of product and cost, using a simple case study on fine-tuning and knowledge distillation for illustration. We propose a novel metric of amortized efficiency that is better suited for life-cycle analysis than existing metrics.
37.
  • Dürlich, Luise, et al. (author)
  • What Causes Unemployment? Unsupervised Causality Mining from Swedish Governmental Reports
  • 2023
  • In: Proceedings of the Second Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2023). - : Association for Computational Linguistics. - 9781959429739 ; , pp. 25-29
  • Conference paper (peer-reviewed), abstract:
    • Extracting statements about causality from text documents is a challenging task in the absence of annotated training data. We create a search system for causal statements about user-specified concepts by combining pattern matching of causal connectives with semantic similarity ranking, using a language model fine-tuned for semantic textual similarity. Preliminary experiments on a small test set from Swedish governmental reports show promising results in comparison to two simple baselines.
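The first stage of the system described here (pattern matching of causal connectives, before semantic similarity ranking) can be sketched as follows; the connective list and example sentences are illustrative English stand-ins, not the paper's Swedish patterns, and the similarity-ranking stage is omitted:

```python
import re

# Toy causal-statement search: a sentence is a hit if it matches a causal
# connective pattern and mentions the user-specified concept.
CONNECTIVES = r"(?:because of|due to|leads? to|causes?|results? in)"
PATTERN = re.compile(rf"(.+?)\s+{CONNECTIVES}\s+(.+)", re.IGNORECASE)

def causal_statements(sentences, concept):
    """Return sentences that match a causal pattern and mention `concept`."""
    hits = []
    for sent in sentences:
        if PATTERN.search(sent) and concept.lower() in sent.lower():
            hits.append(sent)
    return hits

docs = [
    "Rising interest rates lead to higher unemployment.",
    "Unemployment fell last year.",
    "The reform failed because of weak incentives.",
]
print(causal_statements(docs, "unemployment"))
```

In the paper, hits like these are then ranked by a language model fine-tuned for semantic textual similarity rather than returned verbatim.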
38.
  • Eryigit, Gülsen, et al. (author)
  • Dependency Parsing of Turkish
  • 2008
  • In: Computational Linguistics - Association for Computational Linguistics (Print). - 0891-2017 .- 1530-9312. ; 34:3, pp. 357-389
  • Journal article (peer-reviewed), abstract:
    • The suitability of different parsing methods for different languages is an important topic in syntactic parsing. Especially lesser-studied languages, typologically different from the languages for which methods have originally been developed, pose interesting challenges in this respect. This article presents an investigation of data-driven dependency parsing of Turkish, an agglutinative, free constituent order language that can be seen as the representative of a wider class of languages of similar type. Our investigations show that morphological structure plays an essential role in finding syntactic relations in such a language. In particular, we show that employing sublexical units called inflectional groups, rather than word forms, as the basic parsing units improves parsing accuracy. We test our claim on two different parsing methods, one based on a probabilistic model with beam search and the other based on discriminative classifiers and a deterministic parsing strategy, and show that the usefulness of sublexical units holds regardless of the parsing method. We examine the impact of morphological and lexical information in detail and show that, properly used, this kind of information can improve parsing accuracy substantially. Applying the techniques presented in this article, we achieve the highest reported accuracy for parsing the Turkish Treebank.
44.
  • Gogoulou, Evangelia, et al. (author)
  • A Study of Continual Learning Under Language Shift
  • 2023
  • Other publication (popular science, debate, etc.), abstract:
    • The recent increase in data and model scale for language model pre-training has led to huge training costs. In scenarios where new data become available over time, updating a model instead of fully retraining it would therefore provide significant gains. In this paper, we study the benefits and downsides of updating a language model when new data comes from new languages - the case of continual learning under language shift. Starting from a monolingual English language model, we incrementally add data from Norwegian and Icelandic to investigate how forward and backward transfer effects depend on the pre-training order and characteristics of languages, for different model sizes and learning rate schedulers. Our results show that, while forward transfer is largely positive and independent of language order, backward transfer can be either positive or negative depending on the order and characteristics of new languages. To explain these patterns we explore several language similarity metrics and find that syntactic similarity appears to have the best correlation with our results. 
46.
  • Hall, Johan, et al. (author)
  • Parsing Discontinuous Phrase Structure with Grammatical Functions
  • 2008
  • In: Advances in Natural Language Processing. - Berlin / Heidelberg : Springer. - 9783540852865 ; , pp. 169-180
  • Conference paper (peer-reviewed), abstract:
    • This paper presents a novel technique for parsing discontinuous phrase structure representations, labeled with both phrase labels and grammatical functions. Phrase structure representations are transformed into dependency representations with complex edge labels, which makes it possible to induce a dependency parser model that recovers the phrase structure with both phrase labels and grammatical functions. We perform an evaluation on the German TIGER treebank and the Swedish Talbanken05 treebank and report competitive results for both data sets.
  •  
47.
  • Hershcovich, Daniel, et al. (author)
  • Køpsala : Transition-Based Graph Parsing via Efficient Training and Effective Encoding
  • 2020
  • In: 16th International Conference on Parsing Technologies and IWPT 2020 Shared Task on Parsing Into Enhanced Universal Dependencies. - Stroudsburg, PA, USA: Association for Computational Linguistics. - 9781952148118, pp. 236-244
  • Conference paper (peer-reviewed), abstract:
    • We present Køpsala, the Copenhagen-Uppsala system for the Enhanced Universal Dependencies Shared Task at IWPT 2020. Our system is a pipeline consisting of off-the-shelf models for everything but enhanced graph parsing, and for the latter, a transition-based graph parser adapted from Che et al. (2019). We train a single enhanced parser model per language, using gold sentence splitting and tokenization for training, and rely only on tokenized surface forms and multilingual BERT for encoding. While a bug introduced just before submission resulted in a severe drop in precision, its post-submission fix would bring us to 4th place in the official ranking, according to average ELAS. Our parser demonstrates that a unified pipeline is effective for both Meaning Representation Parsing and Enhanced Universal Dependencies.
  •  
48.
49.
50.
  • Karlgren, Jussi, et al. (author)
  • ELOQUENT CLEF Shared Tasks for Evaluation of Generative Language Model Quality
  • 2024
  • In: Lecture Notes in Computer Science. - Springer Science and Business Media Deutschland GmbH. - 0302-9743, 1611-3349; 14612 LNCS, pp. 459-465
  • Journal article (peer-reviewed), abstract:
    • ELOQUENT is a set of shared tasks for evaluating the quality and usefulness of generative language models. ELOQUENT aims to bring together some high-level quality criteria, grounded in experiences from deploying models in real-life tasks, and to formulate tests for those criteria, preferably implemented to require minimal human assessment effort and in a multilingual setting. The selected tasks for this first year of ELOQUENT are (1) probing a language model for topical competence; (2) assessing the ability of models to generate and detect hallucinations; (3) assessing the robustness of a model output given variation in the input prompts; and (4) establishing the possibility to distinguish human-generated text from machine-generated text.
  •  
Create references, email, set alerts, and link
  • Results 1-50 of 150
Publication type
conference paper (103)
journal article (29)
book chapter (8)
edited collection (3)
book (3)
doctoral thesis (2)
edited proceedings (1)
other publication (1)
Content type
peer-reviewed (135)
other academic/artistic (14)
popular science, debate, etc. (1)
Author/editor
Nivre, Joakim, 1962- (150)
Stymne, Sara, 1977- (10)
de Lhoneux, Miryam, ... (10)
Ginter, Filip (8)
Hall, Johan (8)
Kulmizev, Artur (7)
de Marneffe, Marie-C ... (7)
Hajic, Jan (6)
Goldberg, Yoav (6)
Manning, Christopher ... (6)
Dubremetz, Marie, 19 ... (6)
Dürlich, Luise (5)
Nilsson, Jens (5)
Ballesteros, Miguel (5)
Schuster, Sebastian (5)
Tiedemann, Jörg (4)
Megyesi, Beata (4)
Basirat, Ali, 1982- (4)
Gogoulou, Evangelia (4)
Zeman, Daniel (4)
Nilsson, Mattias (3)
Zhang, Yue (3)
Kuhlmann, Marco (3)
Gómez-Rodríguez, Car ... (3)
Hardmeier, Christian (3)
Bohnet, Bernd (3)
Oepen, Stephan (3)
Bunt, Harry (3)
Löwe, Welf (2)
Abdou, Mostafa (2)
Ravishankar, Vinit (2)
Liwicki, Marcus (2)
Sandin, Fredrik, 197 ... (2)
Sahlgren, Magnus (2)
Karlgren, Jussi (2)
Pettersson, Eva (2)
Baldwin, Timothy (2)
Savary, Agata (2)
Bengoetxea, Kepa (2)
Agirre, Eneko (2)
Gojenola, Koldo (2)
Johansson, Richard (2)
Boguslavsky, Igor (2)
Farkas, Richard (2)
Øvrelid, Lilja (2)
Buljan, Maja (2)
Constant, Matthieu (2)
Silveira, Natalia (2)
Màrquez, Lluís (2)
Eryigit, Gülsen (2)
Institution
Uppsala universitet (146)
RISE (12)
Linnéuniversitetet (4)
Luleå tekniska universitet (2)
Linköpings universitet (2)
Stockholms universitet (1)
Karolinska Institutet (1)
Language
English (149)
Swedish (1)
Research subject (UKÄ/SCB)
Natural sciences (146)
Humanities (12)
Engineering and technology (1)
Medicine and health sciences (1)

Year
