SwePub
Result list for search "AMNE:(NATURAL SCIENCES Computer and Information Sciences) ;pers:(Tiedemann Jörg)"

  • Results 1-10 of 120
1.
  • Ehrentraut, Claudia, et al. (author)
  • Detecting hospital-acquired infections : A document classification approach using support vector machines and gradient tree boosting
  • 2018
  • In: Health Informatics Journal. - SAGE Publications. - 1460-4582 .- 1741-2811. ; 24:1, pp. 24-42
  • Journal article (peer-reviewed). Abstract:
    • Hospital-acquired infections pose a significant risk to patient health, while their surveillance is an additional workload for hospital staff. Our overall aim is to build a surveillance system that reliably detects all patient records that potentially include hospital-acquired infections. This is to reduce the burden of having the hospital staff manually check patient records. This study focuses on the application of text classification using support vector machines and gradient tree boosting to the problem. Support vector machines and gradient tree boosting have never been applied to the problem of detecting hospital-acquired infections in Swedish patient records, and according to our experiments, they lead to encouraging results. The best result is yielded by gradient tree boosting, at 93.7 percent recall, 79.7 percent precision and 85.7 percent F1 score when using stemming. We can show that simple preprocessing techniques and parameter tuning can lead to high recall (which we aim for in screening patient records) with appropriate precision for this task.
2.
  • Tiedemann, Jörg, et al. (author)
  • A Discriminative Approach to Tree Alignment
  • 2009
  • In: Proceedings of the International Workshop on Natural Language Processing Methods and Corpora in Translation, Lexicography and Language Learning (in connection with RANLP’09), pp. 33-39
  • Conference paper (peer-reviewed). Abstract:
    • In this paper we propose a discriminative framework for automatic tree alignment. We use a rich feature set and a log-linear model trained on small amounts of hand-aligned training data. We include contextual features and link dependencies to improve the results even further. We achieve an overall F-score of almost 80% which is significantly better than other scores reported for this task.
3.
  • Tiedemann, Jörg, et al. (author)
  • Building a Large Machine-Aligned Parallel Treebank
  • 2009
  • In: Proceedings of the 8th International Workshop on Treebanks and Linguistic Theories (TLT’08). - Milano/Italy : EDUCatt. - 9788883117121 ; pp. 197-208
  • Conference paper (peer-reviewed). Abstract:
    • This paper reports on-going work on building a large automatically tree-aligned parallel treebank in the context of a syntax-based machine translation (MT) approach. For this we develop a discriminative tree aligner based on a log-linear model with a rich feature set. We incorporate various language-independent and language-specific features taking advantage of existing tools and annotation. Our initial experiments on a small hand-aligned treebank show promising results even with small amounts of training data. The performance of our approach is well above unsupervised techniques reported elsewhere. This enables us to quickly create training material and alignment models for additional language pairs. In recent work, we aligned more than one million sentence pairs and started our experiments with the extraction of transfer knowledge for our example-based machine translation system.
4.
  • Tiedemann, Jörg (author)
  • Improved Text Extraction from PDF Documents for Large-Scale Natural Language Processing
  • 2014
  • In: Computational Linguistics and Intelligent Text Processing, CICLing 2014, Part I. - 9783642549052 - 9783642549069 ; pp. 102-112
  • Conference paper (peer-reviewed). Abstract:
    • The inability of reliable text extraction from arbitrary documents is often an obstacle for large scale NLP based on resources crawled from the Web. One of the largest problems in the conversion of PDF documents is the detection of the boundaries of common textual units such as paragraphs, sentences and words. PDF is a file format optimized for printing and encapsulates a complete description of the layout of a document including text, fonts, graphics and so on. This paper describes a tool for extracting texts from arbitrary PDF files for the support of large-scale data-driven natural language processing. Our approach combines the benefits of several existing solutions for the conversion of PDF documents to plain text and adds a language-independent post-processing procedure that cleans the output for further linguistic processing. In particular, we use the PDF-rendering libraries pdfXtk, Apache Tika and Poppler in various configurations. From the output of these tools we recover proper boundaries using on-the-fly language models and language-independent extraction heuristics. In our research, we looked especially at publications from the European Union, which constitute a valuable multilingual resource, for example, for training statistical machine translation models. We use our tool for the conversion of a large multilingual database crawled from the EU bookshop with the aim of building parallel corpora. Our experiments show that our conversion software is capable of fixing various common issues leading to cleaner data sets in the end.
5.
  • Shao, Yan, 1990- (author)
  • Segmenting and Tagging Text with Neural Networks
  • 2018
  • Doctoral thesis (other academic/artistic). Abstract:
    • Segmentation and tagging of text are important preprocessing steps for higher-level natural language processing tasks. In this thesis, we apply a sequence labelling framework based on neural networks to various segmentation and tagging tasks, including sentence segmentation, word segmentation, morpheme segmentation, joint word segmentation and part-of-speech tagging, and named entity transliteration. We apply a general neural CRF model to different tasks by designing specific tag sets. In addition, we explore effective ways of representing input characters, such as utilising concatenated n-grams and sub-character features, and use ensemble decoding to mitigate the effects of random parameter initialisation. The segmentation and tagging models are evaluated in a truly multilingual setup with more than 70 datasets. The experimental results indicate that the proposed neural CRF model is effective for segmentation and tagging in general as state-of-the-art accuracies are achieved on datasets in different languages, genres, and annotation schemes for various tasks. For word segmentation, we propose several typological factors to statistically characterise the difficulties posed by different languages and writing systems. Based on this analysis, we apply language-specific settings to the segmentation system for higher accuracy. Our system achieves substantially better results on languages that are more difficult to segment when compared to previous work. Moreover, we investigate conventionally adopted evaluation metrics for segmentation tasks. We propose that precision should be excluded and using recall alone is more adequate for sentence segmentation and word segmentation. The segmentation and tagging tools implemented along with this thesis are publicly available as experimental frameworks for future development as well as preprocessing tools for higher-level NLP tasks.
6.
  •  
7.
  • Ahrenberg, Lars and Merkel, Magnus and Ridings, Daniel and Sågvall Hein, Anna and Tiedemann, Jörg (authors)
  • Automatic processing of parallel corpora: A Swedish perspective.
  • 1999
  • Report (other academic/artistic). Abstract:
    • As empirical methods have come to the fore in language technology and translation studies, the processing of parallel texts and parallel corpora have become a major issue. In this article we review the state of the art in alignment and data extraction tec
8.
  • Ahrenberg, Lars, 1948-, et al. (author)
  • Automatic Processing of Parallel Corpora: A Swedish Perspective
  • 1999
  • Report (other academic/artistic). Abstract:
    • As empirical methods have come to the fore in multilingual language technology and translation studies, the processing of parallel texts and parallel corpora have become a major research area in computational linguistics. In this article we review the state of the art in alignment and data extraction techniques for parallel texts, and give an overview of current work in Sweden in this area. In a final section, we summarize the results achieved so far and make some proposals for future research.
9.
  •  
10.
  • Ahrenberg, Lars, 1948-, et al. (author)
  • Evaluation of word alignment systems
  • 2000
  • In: Proceedings of the Second International Conference on Linguistic Resources and Evaluation (LREC-2000). - Paris, France : European Language Resources Association (ELRA). ; pp. 1255-1261
  • Conference paper (peer-reviewed)
Type of publication
conference paper (80)
book chapter (14)
journal article (8)
report (5)
other publication (4)
doctoral thesis (4)
edited collection (editorship) (1)
book (1)
proceedings (editorship) (1)
licentiate thesis (1)
review (1)
Type of content
peer-reviewed (91)
other academic/artistic (29)
Author/editor
Hardmeier, Christian (18)
Plas, Lonneke van de ... (15)
Nivre, Joakim (11)
Mur, Jori (9)
Bouma, Gosse (8)
Noord, Gertjan van (8)
Sågvall Hein, Anna (7)
Stymne, Sara, 1977- (5)
Östling, Robert, 198 ... (5)
Stymne, Sara (5)
Nivre, Joakim, 1962- (4)
Fahmi, Ismail (4)
Forsbom, Eva (3)
Smith, Aaron (3)
Pettersson, Eva (3)
Nabende, Peter (2)
Nivre, Joakim, Profe ... (2)
Agić, Zeljko (2)
Dalianis, Hercules (2)
Ahrenberg, Lars, 194 ... (2)
Merkel, Magnus (2)
Guillou, Liane (2)
Ginter, Filip (2)
Kloosterman, Geert (2)
Merkler, Danijela (1)
Krek, Simon (1)
Dobrovoljc, Kaja (1)
Moze, Sara (1)
Ahrenberg, Lars and ... (1)
Olsson, Leif-Jöran (1)
Ahrenberg, Lars (1)
Merkel, Magnus, 1959 ... (1)
Ridings, Daniel (1)
Almqvist, Ingrid (1)
Östling, Robert (1)
Forsbom, Eva, 1964- (1)
Bollmann, Marcel (1)
Augenstein, Isabelle (1)
Prashant, Mathur (1)
Bertels, Ann (1)
Fairon, Cédrick (1)
Verlinde, Serge (1)
Loáiciga, Sharid (1)
Bjerva, Johannes (1)
Han Veiga, Maria (1)
Oepen, Stephan (1)
Callin, Jimmy (1)
Schleussner, Sebasti ... (1)
Cap, Fabienne (1)
University
Uppsala universitet (111)
Stockholms universitet (6)
Linköpings universitet (3)
Kungliga Tekniska Högskolan (1)
Language
English (119)
French (1)
Research subject (UKÄ/SCB)
Natural sciences (120)
Medical and health sciences (1)
