SwePub
Result list for search "L773:1351 3249 OR L773:1469 8110"

  • Results 1-15 of 15
1.
  • Ballesteros, Miguel, et al. (author)
  • MaltOptimizer : Fast and Effective Parser Optimization
  • 2016
  • In: Natural Language Engineering. - 1351-3249 .- 1469-8110. ; 22:2, pp. 187-213
  • Journal article (peer-reviewed)
    • Statistical parsers often require careful parameter tuning and feature selection. This is a nontrivial task for application developers who are not interested in parsing for its own sake, and it can be time-consuming even for experienced researchers. In this paper we present MaltOptimizer, a tool developed to automatically explore parameters and features for MaltParser, a transition-based dependency parsing system that can be used to train parsers given treebank data. MaltParser provides a wide range of parameters for optimization, including nine different parsing algorithms, an expressive feature specification language that can be used to define arbitrarily rich feature models, and two machine learning libraries, each with their own parameters. MaltOptimizer is an interactive system that performs parser optimization in three stages. First, it performs an analysis of the training set in order to select a suitable starting point for optimization. Second, it selects the best parsing algorithm and tunes the parameters of this algorithm. Finally, it performs feature selection and tunes machine learning parameters. Experiments on a wide range of data sets show that MaltOptimizer quickly produces models that consistently outperform default settings and often approach the accuracy achieved through careful manual optimization.
2.
  • Basirat, Ali, et al. (author)
  • A statistical model for grammar mapping
  • 2016
  • In: Natural Language Engineering. - : Cambridge University Press. - 1351-3249 .- 1469-8110. ; 22:2, pp. 215-255
  • Journal article (peer-reviewed)
    • The two main classes of grammars are (a) hand-crafted grammars, which are developed by language experts, and (b) data-driven grammars, which are extracted from annotated corpora. This paper introduces a statistical method for mapping the elementary structures of a data-driven grammar onto the elementary structures of a hand-crafted grammar in order to combine their advantages. The idea is employed in the context of Lexicalized Tree-Adjoining Grammars (LTAG) and tested on two LTAGs of English: the hand-crafted LTAG developed in the XTAG project, and the data-driven LTAG, which is automatically extracted from the Penn Treebank and used by the MICA parser. We propose a statistical model for mapping any elementary tree sequence of the MICA grammar onto a proper elementary tree sequence of the XTAG grammar. The model has been tested on three subsets of the WSJ corpus that have average lengths of 10, 16, and 18 words, respectively. The experimental results show that full-parse trees with average F1-scores of 72.49, 64.80, and 62.30 points could be built from 94.97%, 96.01%, and 90.25% of the XTAG elementary tree sequences assigned to the subsets, respectively. Moreover, by reducing the amount of syntactic lexical ambiguity of sentences, the proposed model significantly improves the efficiency of parsing in the XTAG system.
3.
  • Björklund, Johanna, et al. (author)
  • Syntactic methods for topic-independent authorship attribution
  • 2017
  • In: Natural Language Engineering. - : Cambridge University Press. - 1351-3249 .- 1469-8110. ; 23:5, pp. 789-806
  • Journal article (peer-reviewed)
    • The efficacy of syntactic features for topic-independent authorship attribution is evaluated, taking a feature set of frequencies of words and punctuation marks as baseline. The features are 'deep' in the sense that they are derived by parsing the subject texts, in contrast to 'shallow' syntactic features for which a part-of-speech analysis is enough. The experiments are made on two corpora of online texts and one corpus of novels written around the year 1900. The classification tasks include classical closed-world authorship attribution, identification of separate texts among the works of one author, and cross-topic authorship attribution. In the first tasks, the feature sets were fairly evenly matched, but for the last task, the syntax-based feature set outperformed the baseline feature set. These results suggest that, compared to lexical features, syntactic features are more robust to changes in topic.
4.
  • Boye, Johan, et al. (author)
  • Robust parsing and spoken negotiative dialogue with databases
  • 2008
  • In: Natural Language Engineering. - : Cambridge University Press. - 1351-3249 .- 1469-8110. ; 14:3, pp. 289-312
  • Journal article (peer-reviewed)
    • This paper presents a robust parsing algorithm and semantic formalism for the interpretation of utterances in spoken negotiative dialogue with databases. The algorithm works in two passes: a domain-specific pattern-matching phase and a domain-independent semantic analysis phase. Robustness is achieved by limiting the set of representable utterance types to an empirically motivated subclass which is more expressive than propositional slot–value lists, but much less expressive than first-order logic. Our evaluation shows that in actual practice the vast majority of utterances that occur can be handled, and that the parsing algorithm is highly efficient and accurate.
5.
  • Carlberger, Johan, et al. (author)
  • The development and performance of a grammar checker for Swedish : A language engineering perspective
  • 2004
  • In: Natural Language Engineering. - 1351-3249 .- 1469-8110. ; 1:1
  • Journal article (peer-reviewed)
    • This article describes the construction and performance of Granska – a surface-oriented system for grammar checking of Swedish text. With the use of carefully constructed error detection rules, written in a new structured rule language, the system can detect and suggest corrections for a number of grammatical errors in Swedish texts. In this article, we specifically focus on how erroneously split compounds and disagreement are handled in the rules. The system combines probabilistic and rule-based methods to achieve high efficiency and robustness. The error detection rules are optimized using statistics of part-of-speech bigrams and words in a way that each rule needs to be checked as seldom as possible. We have found that the Granska system with higher efficiency can achieve the same or better results than systems with conventional technology.
6.
  • Dörpinghaus, Jens (author)
  • Automated annotation of parallel bible corpora with cross-lingual semantic concordance
  • 2024
  • In: Natural Language Engineering. - : Cambridge University Press. - 1351-3249 .- 1469-8110. ; , pp. 1-24
  • Journal article (peer-reviewed)
    • Here we present an improved approach for automated annotation of New Testament corpora with cross-lingual semantic concordance based on Strong’s numbers. Based on already annotated texts, they provide references to the original Greek words. Since scientific editions and translations of biblical texts are often not available for scientific purposes and are rarely freely available, there is a lack of up-to-date training data. In addition, since annotation, curation, and quality control of alignments between these texts are expensive, there is a lack of available biblical resources for scholars. We present two improved approaches to the problem, based on dictionaries and already annotated biblical texts. We provide a detailed evaluation of annotated and unannotated translations. We also discuss a proof of concept based on English and German New Testament translations. The results presented in this paper are novel and, to our knowledge, unique. They show promising performance, although further research is needed.
7.
  • Gustafson, Joakim, et al. (author)
  • Speech technology on trial : Experiences from the August system
  • 2000
  • In: Natural Language Engineering. - 1351-3249 .- 1469-8110. ; 6:3-4, pp. 273-286
  • Journal article (peer-reviewed)
    • In this paper, the August spoken dialogue system is described. This experimental Swedish dialogue system, which featured an animated talking agent, was exposed to the general public during a trial period of six months. The construction of the system was partly motivated by the need to collect genuine speech data from people with little or no previous experience of spoken dialogue systems. A corpus of more than 10,000 utterances of spontaneous computer-directed speech was collected and empirical linguistic analyses were carried out. Acoustical, lexical and syntactical aspects of this data were examined. In particular, user behavior and user adaptation during error resolution were emphasized. Repetitive sequences in the database were analyzed in detail. Results suggest that computer-directed speech during error resolution is increased in duration, hyperarticulated and contains inserted pauses. Design decisions which may have influenced how the users behaved when they interacted with August are discussed and implications for the development of future systems are outlined.
8.
  • Högberg, Johanna, 1978- (author)
  • An inference algorithm for regular tree languages
  • 2011
  • In: Natural Language Engineering. - Cambridge : Cambridge University Press. - 1351-3249 .- 1469-8110. ; 17:2, pp. 203-219
  • Journal article (peer-reviewed)
    • We present a randomized inference algorithm for regular tree languages. The algorithm takes as input two disjoint finite nonempty sets of trees P and N, and outputs a nondeterministic finite tree automaton that accepts every tree in P, and rejects every tree in N. The output automaton typically represents a non-trivial generalisation of the examples given in P and N. To obtain compact output automata, we use a heuristic similar to bisimulation minimization. The algorithm has time complexity of O(|N||P|^2). Experiments are conducted on a prototype implementation, and the empirical results appear to second the theoretical results.
9.
  • Jönsson, Arne, 1955- (author)
  • A model for habitable and efficient dialogue management for natural language interaction
  • 1997
  • In: Natural Language Engineering. - 1351-3249 .- 1469-8110. ; 3:2/3, pp. 103-122
  • Journal article (peer-reviewed)
    • Natural language interfaces require dialogue models that allow for robust, habitable and efficient interaction. This paper presents such a model for dialogue management for natural language interfaces. The model is based on empirical studies of human-computer interaction in various simple service applications. It is shown that for applications belonging to this class the dialogue can be handled using fairly simple means. The interaction can be modeled in a dialogue grammar with information on the functional role of an utterance as conveyed in the linguistic structure. Focusing is handled using dialogue objects recorded in a dialogue tree representing the constituents of the dialogue. The dialogue objects in the dialogue tree can be accessed by the various modules for interpretation, generation and background system access. Focused entities are modeled in entities pertaining to objects or sets of objects, and related domain concept information; properties of the domain objects. A simple copying principle, where a new dialogue object's focal parameters are instantiated with information from the preceding dialogue object, accounts for most context-dependent utterances. The action to be carried out by the interface is determined on the basis of how the objects and related properties are specified. This in turn depends on information presented in the user utterance, context information from the dialogue tree and information in the domain model. The use of dialogue objects facilitates customization to the sublanguage utilized in a specific application. The framework has successfully been applied to various background systems and interaction modalities. In the paper, results from the customization of the dialogue manager to three typed interaction applications are presented together with results from applying the model to two applications utilizing spoken interaction.
10.
  • Kalpakchi, Dmytro, et al. (author)
  • Quinductor: A multilingual data-driven method for generating reading-comprehension questions using Universal Dependencies
  • 2024
  • In: Natural Language Engineering. - : Cambridge University Press (CUP). - 1351-3249 .- 1469-8110. ; , pp. 217-255
  • Journal article (peer-reviewed)
    • We propose a multilingual data-driven method for generating reading comprehension questions using dependency trees. Our method provides a strong, deterministic and inexpensive-to-train baseline for less-resourced languages. While a language-specific corpus is still required, its size is nowhere near those required by modern neural question generation (QG) architectures. Our method surpasses QG baselines previously reported in the literature in terms of automatic evaluation metrics and shows a good performance in terms of human evaluation.
11.
  • Karlgren, Jussi, et al. (author)
  • High-dimensional distributed semantic spaces for utterances
  • 2019
  • In: Natural Language Engineering. - : Cambridge University Press. - 1351-3249 .- 1469-8110. ; 25:4, pp. 503-517
  • Journal article (peer-reviewed)
    • High-dimensional distributed semantic spaces have proven useful and effective for aggregating and processing visual, auditory and lexical information for many tasks related to human-generated data. Human language makes use of a large and varying number of features, lexical and constructional items as well as contextual and discourse-specific data of various types, which all interact to represent various aspects of communicative information. Some of these features are mostly local and useful for the organisation of, for example, argument structure of a predication; others are persistent over the course of a discourse and necessary for achieving a reasonable level of understanding of the content. This paper describes a model for high-dimensional representation for utterance and text-level data including features such as constructions or contextual data, based on a mathematically principled and behaviourally plausible approach to representing linguistic information. The implementation of the representation is a straightforward extension of Random Indexing models previously used for lexical linguistic items. The paper shows how the implemented model is able to represent a broad range of linguistic features in a common integral framework of fixed dimensionality, which is computationally habitable, and which is suitable as a bridge between symbolic representations such as dependency analysis and continuous representations used, for example, in classifiers or further machine-learning approaches. This is achieved with operations on vectors that constitute a powerful computational algebra, accompanied with an associative memory for the vectors. The paper provides a technical overview of the framework and a worked-through implemented example of how it can be applied to various types of linguistic features.
12.
  • Kutlu, Ferhat, et al. (author)
  • Toward a shallow discourse parser for Turkish
  • 2023
  • In: Natural Language Engineering. - 1351-3249 .- 1469-8110.
  • Journal article (peer-reviewed)
    • One of the most interesting aspects of natural language is how texts cohere, which involves the pragmatic or semantic relations that hold between clauses (addition, cause-effect, conditional, similarity), referred to as discourse relations. A focus on the identification and classification of discourse relations appears as an imperative challenge to be resolved to support tasks such as text summarization, dialogue systems, and machine translation that need information above the clause level. Despite the recent interest in discourse relations in well-known languages such as English, data and experiments are still needed for typologically different and less-resourced languages. We report the most comprehensive investigation of shallow discourse parsing in Turkish, focusing on two main sub-tasks: identification of discourse relation realization types and the sense classification of explicit and implicit relations. The work is based on the approach of fine-tuning a pre-trained language model (BERT) as an encoder and classifying the encoded data with neural network-based classifiers. We first identify the discourse relation realization type that holds in a given text, if there is any. Then, we move on to the sense classification of the identified explicit and implicit relations. In addition to in-domain experiments on a held-out test set from the Turkish Discourse Bank (TDB 1.2), we also report the out-of-domain performance of our models in order to evaluate their generalization abilities, using the Turkish part of the TED Multilingual Discourse Bank. Finally, we explore the effect of multilingual data aggregation on the classification of relation realization type through a cross-lingual experiment. The results suggest that our models perform relatively well despite the limited size of the TDB 1.2 and that there are language-specific aspects of detecting the types of discourse relation realization. We believe that the findings are important both in providing insights regarding the performance of the modern language models in a typologically different language and in the low-resource scenario, given that the TDB 1.2 is 1/20th of the Penn Discourse TreeBank in terms of the number of total relations.
13.
  • Sahlgren, Magnus, et al. (author)
  • Automatic Bilingual Lexicon Acquisition Using Random Indexing of Parallel Corpora
  • 2005
  • In: Natural Language Engineering. - 1351-3249 .- 1469-8110. ; 11:3, pp. 327-341
  • Journal article (peer-reviewed)
    • This paper presents a very simple and effective approach to using parallel corpora for automatic bilingual lexicon acquisition. The approach, which uses the Random Indexing vector space methodology, is based on finding correlations between terms based on their distributional characteristics. The approach requires a minimum of preprocessing and linguistic knowledge, and is efficient, fast and scalable. In this paper, we explain how our approach differs from traditional cooccurrence-based word alignment algorithms, and we demonstrate how to extract bilingual lexica using the Random Indexing approach applied to aligned parallel data. The acquired lexica are evaluated by comparing them to manually compiled gold standards, and we report overlap of around 60%. We also discuss methodological problems with evaluating lexical resources of this kind.
14.
  • Tiedemann, Jörg (author)
  • Optimisation of Word Alignment Clues
  • 2005
  • In: Natural Language Engineering. - 1351-3249 .- 1469-8110. ; 11, pp. 279-293
  • Journal article (peer-reviewed)
15.