SwePub
Search the SwePub database

Result list for the search "WFRF:(Volk Martin) srt2:(2005-2009)"

Search: WFRF:(Volk Martin) > (2005-2009)

  • Results 1-6 of 6
1.
  • Hall, Johan, 1973- (author)
  • MaltParser -- An Architecture for Inductive Labeled Dependency Parsing
  • 2006
  • Licentiate thesis (other academic/artistic), abstract:
    • This licentiate thesis presents a software architecture for inductive labeled dependency parsing of unrestricted natural language text, which achieves a strict modularization of parsing algorithm, feature model and learning method such that these parameters can be varied independently. The architecture is based on the theoretical framework of inductive dependency parsing by Nivre (2006) and has been realized in MaltParser, a system that supports several parsing algorithms and learning methods, for which complex feature models can be defined in a special description language. Special attention is given in this thesis to learning methods based on support vector machines (SVM). The implementation is validated in three sets of experiments using data from three languages (Chinese, English and Swedish). First, we check whether the implementation realizes the underlying architecture. The experiments show that the MaltParser system outperforms the baseline and satisfies the basic constraints of well-formedness. Furthermore, the experiments show that it is possible to vary parsing algorithm, feature model and learning method independently. Secondly, we focus on the special properties of the SVM interface. It is possible to reduce the learning and parsing time without sacrificing accuracy by dividing the training data into smaller sets, according to the part-of-speech of the next token in the current parser configuration. Thirdly, the last set of experiments presents a broad empirical study that compares SVM to memory-based learning (MBL) with five different feature models, where all combinations have gone through parameter optimization for both learning methods. The study shows that SVM outperforms MBL for more complex and lexicalized feature models with respect to parsing accuracy. There are also indications that SVM, with a splitting strategy, can achieve faster parsing than MBL. The parsing accuracy achieved is the highest reported for the Swedish data set and very close to the state of the art for Chinese and English. (A schematic sketch of the part-of-speech splitting strategy follows this entry.)
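The splitting strategy described in the abstract above can be illustrated with a short sketch: rather than training one large SVM over all parser configurations, the training instances are partitioned by the part-of-speech of the next input token, and one smaller classifier is trained per partition. The feature layout and helper names below are hypothetical, and LinearSVC merely stands in for the SVM learner discussed in the thesis; this is not MaltParser's actual interface.

```python
from collections import defaultdict

from sklearn.svm import LinearSVC  # stand-in for the SVM learner discussed in the thesis


def train_split_classifiers(instances):
    """Train one SVM per part-of-speech of the next input token.

    `instances` is a hypothetical list of (feature_vector, next_token_pos,
    transition) tuples, where feature_vector is a numeric vector describing
    the current parser configuration and transition is the parser action.
    """
    by_pos = defaultdict(list)
    for features, next_pos, transition in instances:
        by_pos[next_pos].append((features, transition))

    classifiers = {}
    for pos, data in by_pos.items():
        X = [features for features, _ in data]
        y = [transition for _, transition in data]
        classifiers[pos] = LinearSVC().fit(X, y)  # each sub-problem is much smaller
    return classifiers


def predict_transition(classifiers, features, next_pos):
    """Route a configuration to the classifier trained for its next-token POS."""
    return classifiers[next_pos].predict([features])[0]
```

Because each sub-model is trained on only a fraction of the data, learning and parsing become faster, which matches the abstract's observation that the split reduces time without sacrificing accuracy.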
2.
  • Hardmeier, Christian, et al. (authors)
  • Using Linguistic Annotations in Statistical Machine Translation of Film Subtitles
  • 2009
  • In: Proceedings of the 17th Nordic Conference of Computational Linguistics NODALIDA 2009. Tartu: Tartu University Library, pp. 57-64
  • Conference paper (peer-reviewed), abstract:
    • Statistical Machine Translation (SMT) has been successfully employed to support translation of film subtitles. We explore the integration of Constraint Grammar corpus annotations into a Swedish–Danish subtitle SMT system in the framework of factored SMT. While the usefulness of the annotations is limited with large amounts of parallel data, we show that linguistic annotations can increase the gains in translation quality when monolingual data in the target language is added to an SMT system based on a small parallel corpus. (A minimal sketch of the factored token representation follows this entry.)
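Factored statistical MT systems such as Moses represent each token as a bundle of factors (surface form plus annotations) joined with `|`. Below is a minimal sketch, assuming the Constraint Grammar output has already been reduced to one lemma and one tag per token; the example sentence and the tag names are invented and do not reproduce the paper's actual factor setup.

```python
def to_factored(tokens, lemmas, cg_tags):
    """Join surface form, lemma and a Constraint Grammar tag into the
    word|lemma|tag factor format used by factored phrase-based SMT systems."""
    return " ".join(
        f"{word}|{lemma}|{tag}"
        for word, lemma, tag in zip(tokens, lemmas, cg_tags)
    )


# Hypothetical Swedish example with invented tags.
tokens = ["Jag", "ser", "filmen"]
lemmas = ["jag", "se", "film"]
cg_tags = ["PRON-NOM", "V-PRES", "N-DEF"]

print(to_factored(tokens, lemmas, cg_tags))
# Jag|jag|PRON-NOM ser|se|V-PRES filmen|film|N-DEF
```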
3.
  • Hjelm, Hans, 1973- (author)
  • Cross-language Ontology Learning: Incorporating and Exploiting Cross-language Data in the Ontology Learning Process
  • 2009
  • Doctoral thesis (other academic/artistic), abstract:
    • An ontology is a knowledge-representation structure, where words, terms or concepts are defined by their mutual hierarchical relations. Ontologies are becoming ever more prevalent in the world of natural language processing, where we currently see a tendency towards using semantics for solving a variety of tasks, particularly tasks related to information access. Ontologies, taxonomies and thesauri (all related notions) are also used in various forms by humans, to standardize business transactions or to find conceptual relations between terms in, e.g., the medical domain. The acquisition of machine-readable, domain-specific semantic knowledge is time-consuming and prone to inconsistencies. The field of ontology learning therefore provides tools for automating the construction of domain ontologies (ontologies describing the entities and relations within a particular field of interest) by analyzing large quantities of domain-specific texts. This thesis studies three main topics within the field of ontology learning. First, we examine which sources of information are useful within an ontology learning system and how the information sources can be combined effectively. Secondly, we do this with a special focus on cross-language text collections, to see if we can learn more from studying several languages at once than we can from a single-language text collection. Finally, we investigate new approaches to formal and automatic evaluation of the quality of a learned ontology. We demonstrate how to combine information sources from different languages and use them to train automatic classifiers to recognize lexico-semantic relations. The cross-language data is shown to have a positive effect on the quality of the learned ontologies. We also give theoretical and experimental results showing that our ontology evaluation method is a good complement to, and in some aspects improves on, the evaluation measures in use today. (A schematic sketch of the relation-classification step follows this entry.)
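The step of combining information sources and training classifiers to recognize lexico-semantic relations can be sketched as an ordinary supervised-learning setup. The feature sources below are hypothetical placeholders for the monolingual and cross-language evidence discussed in the thesis, and LogisticRegression is used only as a convenient example classifier, not the thesis's own choice.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def combine_features(pair, sources):
    """Concatenate the feature vectors that several information sources
    (e.g. distributional similarity, cross-language translation evidence)
    produce for one term pair. Each source is a callable returning a
    numpy vector."""
    return np.concatenate([source(pair) for source in sources])


def train_relation_classifier(pairs, labels, sources):
    """Train a classifier deciding whether a term pair stands in a given
    lexico-semantic relation (label 1) or not (label 0)."""
    X = np.vstack([combine_features(pair, sources) for pair in pairs])
    return LogisticRegression(max_iter=1000).fit(X, np.asarray(labels))
```

Adding feature sources computed from other languages simply widens the combined vector, which is one way cross-language data can enter the learning process.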
4.
  • Nivre, Joakim, et al. (authors)
  • Treebanking in Northern Europe
  • 2005
  • In: Nordisk sprogteknologi 2004. Copenhagen: Museum Tusculanums forlag. ISBN 9788763502481, pp. 292-
  • Book chapter (peer-reviewed)
5.
  • Samuelsson, Yvonne, et al. (authors)
  • Alignment Tools for Parallel Treebanks
  • 2007
  • In: Data Structures for Linguistic Resources and Applications. ISBN 9783823363149
  • Conference paper (peer-reviewed), abstract:
    • This paper reports on our efforts to create a trilingual parallel treebank. The focal points are consistency checking and all aspects of sub-sentential alignment. We discuss the alignment guidelines, the importance of quality checks, and special alignment problems. We then look at alignment algorithms and alignment visualization tools, and we compare our own TreeAligner with other alignment tools. Our constituent structure treebanks contain just over 1,000 sentences and around 18,000 tokens in each language. (A hypothetical sketch of a node-alignment record follows this entry.)
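Sub-sentential alignment links nodes (tokens or phrases) of one constituent tree to nodes of the corresponding tree in another language, typically with a quality label such as exact versus fuzzy. A minimal, hypothetical record layout for such links is sketched below; it is not the TreeAligner file format.

```python
from dataclasses import dataclass


@dataclass
class NodeAlignment:
    """One sub-sentential link between two trees of a parallel treebank."""
    src_sentence: int  # sentence index in the source-language treebank
    src_node: str      # node id within the source tree, e.g. "NP_503"
    tgt_sentence: int  # sentence index in the target-language treebank
    tgt_node: str      # node id within the target tree
    quality: str       # e.g. "exact" or "fuzzy"


# Invented example links for one sentence pair.
alignments = [
    NodeAlignment(12, "NP_503", 12, "NP_517", "exact"),
    NodeAlignment(12, "VP_504", 12, "VP_512", "fuzzy"),
]
```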
6.
  • Samuelsson, Yvonne, et al. (authors)
  • Automatic Phrase Alignment: Using Statistical N-Gram Alignment for Syntactic Phrase Alignment
  • 2007
  • In: Proceedings of the Sixth International Workshop on Treebanks and Linguistic Theories (TLT 2007). Northern European Association for Language Technology (NEALT), pp. 139-150
  • Conference paper (peer-reviewed), abstract:
    • A parallel treebank consists of syntactically annotated sentences in two or more languages, taken from translated documents. These parallel sentences are linked through alignment. This paper explores the use of word n-gram alignment, computed for statistical machine translation, to create syntactic phrase alignment. We achieve a weighted F0.5-score of over 65%. (A worked example of the F0.5 computation follows this entry.)
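The F0.5 score reported above is the weighted harmonic mean of precision P and recall R with beta = 0.5, i.e. F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), which weights precision more heavily than recall. Below is a small sketch of the evaluation step, assuming gold and predicted phrase alignments are available as sets of (source node, target node) pairs; the node ids are invented.

```python
def f_beta(precision, recall, beta=0.5):
    """Weighted F-score; beta < 1 favours precision over recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def evaluate(predicted, gold):
    """Score predicted phrase alignments against gold alignments.

    Both arguments are sets of (source_node, target_node) pairs."""
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall, f_beta(precision, recall, beta=0.5)


# Hypothetical toy example: 2 of 3 predicted links are correct.
gold = {("NP_1", "NP_a"), ("VP_2", "VP_b"), ("PP_3", "PP_c")}
predicted = {("NP_1", "NP_a"), ("VP_2", "VP_b"), ("NP_4", "NP_d")}
print(evaluate(predicted, gold))  # (0.667, 0.667, 0.667): F equals P when P == R
```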