SwePub

Result list for search "WFRF:(Dannélls Dana 1976) srt2:(2020-2024)"

  • Results 1-10 of 22
1.
  • Berdicevskis, Aleksandrs, 1983, et al. (author)
  • Superlim: A Swedish Language Understanding Evaluation Benchmark
  • 2023
  • In: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, December 6-10, 2023, Singapore / Houda Bouamor, Juan Pino, Kalika Bali (Editors). - Stroudsburg, PA : Association for Computational Linguistics. - 9798891760608
  • Conference paper (peer-reviewed)
2.
  • Borin, Lars, 1957, et al. (author)
  • Introduction: Swedish FrameNet+
  • 2021
  • In: The Swedish FrameNet++. Harmonization, integration, method development and practical language technology applications / editor(s): Dana Dannélls, Lars Borin and Karin Friberg Heppin. - Amsterdam / Philadelphia : John Benjamins Publishing Company. - 1567-8202. - 978 90 272 5848 9, pp. 3-36
  • Book chapter (peer-reviewed). Abstract:
    • The Swedish FrameNet++ was designed to be several things. As a digital artifact, it is an integrated panchronic lexical macroresource, primarily for Swedish, but including several other languages, intended as a basic infrastructural component in Swedish language technology research and for developing natural language processing applications. As an activity, it is a long-term R&D initiative, initially aimed at bringing about this macroresource, and now at maintaining and extending it, at promoting its use in language technology research and application development, as well as ensuring that the results of this research and development in their turn are incorporated in the macroresource. As a product of research, it reflects both computational and linguistic approaches to lexicology, lexical semantics, and lexical typology.
3.
  • Dannélls, Dana, 1976, et al. (author)
  • A Supervised Machine Learning Approach for Post-OCR Error Detection for Historical Text
  • 2021
  • In: Linköping Electronic Press Workshop and Conference Collection. Selected contributions from the Eighth Swedish Language Technology Conference (SLTC-2020), 25-27 November, 2020. - Linköping : Linköping Electronic Press. - 2003-6523.
  • Conference paper (peer-reviewed). Abstract:
    • Training machine learning models with high accuracy requires careful feature engineering, which involves finding the best feature combinations and extracting their values from the data. The task becomes extremely laborious for specific problems such as post Optical Character Recognition (OCR) error detection because of the diversity of errors in the data. In this paper we present a machine learning approach which exploits character n-gram statistics as the only feature for the OCR error detection task. Our method achieves a significant improvement over the baseline reaching state-of-the-art results of 91% and 89% F1 measure on English and Swedish datasets respectively. We report various experiments to select the appropriate machine learning algorithm and to compare our approach to previously reported traditional approaches.
4.
  • Dannélls, Dana, 1976, et al. (author)
  • A Two-OCR Engine Method for Digitized Swedish Newspapers
  • 2021
  • In: Selected Papers from the CLARIN Annual Conference 2020, Linköping Electronic Conference Proceedings 180. - Linköping : Linköping University Electronic Press. - 1650-3686. - 1650-3740. - 9789179296094
  • Conference paper (peer-reviewed). Abstract:
    • In this paper we present a two-OCR engine method that was developed at Kungliga biblioteket (KB), the National Library of Sweden, for improving the correctness of the OCR for mass digitization of Swedish newspapers. To evaluate the method a reference material spanning the years 1818–2018 was prepared and manually transcribed. A quantitative evaluation was then performed against the material. In this first evaluation we experimented with word lists for different time periods. The results show that even though there was no significant overall improvement of the OCR results, some combinations of word lists are successful for certain periods and should therefore be explored further.
5.
  • Dannélls, Dana, 1976, et al. (author)
  • Beyond strings of characters: Resources meet NLP – Again
  • 2022
  • In: Live and learn: Festschrift in honor of Lars Borin / Editors: Elena Volodina, Dana Dannélls, Aleksandrs Berdicevskis, Markus Forsberg, Shafqat Virk. - Göteborg : Institutionen för svenska, flerspråkighet och språkteknologi, Göteborgs universitet. - 1401-5919. - 9789187850837, pp. 29-37
  • Book chapter (peer-reviewed). Abstract:
    • FrameNet (FN) resources have existed for many languages for over a decade, but their adoption in real-world applications has been limited. To celebrate the 65th birthday of Lars Borin, the initiator and leader of Swedish FrameNet, among others, we take a standpoint to motivate why language resources are crucial for moving NLP forward. We present our position on (a) the need for language resources to embrace other dimensions of text and language use, and (b) the need for them to relate to other representations through multimodality.
6.
  • Dannélls, Dana, 1976, et al. (author)
  • Building a Language Technology Infrastructure for Digital Humanities: Challenges, Opportunities and Progress
  • 2020
  • In: Proceedings of the Twin Talks 2 and 3 Workshops at DHN 2020 and DH 2020 Ottawa Canada and Riga Latvia, July 23 and October 20, 2020 / edited by Steven Krauwer, Darja Fišer. - CEUR-WS.org. - 1613-0073.
  • Conference paper (peer-reviewed). Abstract:
    • Språkbanken Text, a research unit at the University of Gothenburg, forms part of the National Language Bank of Sweden and is the main coordinating node of Swe-Clarin, the Swedish national CLARIN node. During the past years, Språkbanken Text has been actively engaged in a number of humanities and social sciences related research projects. This engagement has primarily concerned the development of new resources, methods and tools to accurately process large amounts of digitized material, in addition to interfaces for visualizing the materials, making them easily accessible for further analysis. The activities within Swe-Clarin have been essential for the progress and the success of this work. In this paper we present what was required from Språkbanken Text in order to meet the expectations of researchers from the humanities and social sciences. We discuss some of the challenges this work involves and describe the opportunities this field brings with it and how these opportunities could help to progress the work of Språkbanken Text toward building a language technology infrastructure that supports interdisciplinary research.
7.
  • Dannélls, Dana, 1976, et al. (author)
  • Computational representation of FrameNet for multilingual natural language generation
  • 2021
  • In: The Swedish FrameNet++. Harmonization, integration, method development and practical language technology applications. - Amsterdam / Philadelphia : John Benjamins Publishing Company. - 1567-8202. - 9789027258489, pp. 281-301
  • Book chapter (peer-reviewed). Abstract:
    • Multilingual natural language generation, the process of producing written or spoken utterances in parallel languages from either structured or unstructured representations, requires large amounts of syntactic and semantic information to generate an expression that is tailored to the target audience. This information is offered by FrameNet-like resources, which have been developed for a number of languages. In this chapter, we present a computational FrameNet grammar resource for multilingual natural language generation. We compare English and Swedish framenets to illustrate how these can be unified under a shared computational representation using Grammatical Framework. We demonstrate how the grammar was exploited in two practical multilingual natural language generation applications to facilitate tourist communication and empower museum users with coherent artwork descriptions.
8.
  • Dannélls, Dana, 1976, et al. (author)
  • Evaluation of a Two-OCR Engine Method: First Results on Digitized Swedish Newspapers Spanning over nearly 200 Years
  • 2020
  • In: CLARIN Annual Conference 2020, (Virtual Event), 5-7 October, 2020. Book of Abstracts.
  • Conference paper (other academic/artistic). Abstract:
    • In this paper we present a two-OCR engine method that was developed at Kungliga biblioteket (KB), the National Library of Sweden, for improving the correctness of the OCR for mass digitization of Swedish newspapers. We report the first quantitative evaluation results on a material spanning over nearly 200 years. In this first evaluation phase we experimented with word lists for different time periods. Although there was no significant overall improvement of the OCR results, the evaluation shows that some combinations of word lists are successful for certain periods and should therefore be explored further.
9.
  • Dannélls, Dana, 1976, et al. (author)
  • OCR Error Detection on Historical Text Using Uni-Feature and Multi-Feature Based Machine Learning Models
  • 2020
  • In: Swedish Language Technology Conference (SLTC), 25-27 November 2020, University of Gothenburg.
  • Conference paper (other academic/artistic). Abstract:
    • Detecting errors that are caused by Optical Character Recognition (OCR) systems is a challenging task that has received much attention over the years. Recent work has explored machine learning methods using hand-crafted feature engineering, which, in addition to the difficulty of identifying the best feature combinations, is often very expensive in time and resources. This raises the question: Do we always need many features to achieve better results? This is an open-ended question and its answer might depend on the task at hand. For OCR error detection, we experimented and found that, interestingly, a uni-feature based system outperformed multi-feature based systems on a Swedish dataset, achieving state-of-the-art results, and performed equally well on an English dataset. We also experimented to find which machine learning algorithm is more suitable for the task at hand by comparing the performance of five well-known machine learning algorithms, namely Logistic Regression, Decision Trees, Bernoulli Naive Bayes, Naive Bayes, and Support Vector Machines.
10.
  • Dannélls, Dana, 1976, et al. (author)
  • Supervised OCR Post-Correction of Historical Swedish Texts: What Role Does the OCR System Play?
  • 2020
  • In: Proceedings of the Digital Humanities in the Nordic Countries, 5th Conference, Riga, Latvia, October 21-23, 2020 / edited by Sanita Reinsone, Inguna Skadiņa, Anda Baklāne, Jānis Daugavietis. - CEUR-WS. - 1613-0073.
  • Conference paper (peer-reviewed). Abstract:
    • Current approaches for post-correction of OCR errors offer solutions that are tailored to a specific OCR system. This can be problematic if the post-correction method was trained on a specific OCR system but has to be applied to the result of another system. Whereas OCR post-correction of historical text has received much attention lately, the question of what role the OCR system plays for the post-correction method has not been addressed. In this study we explore a dataset of 400 documents of historical Swedish text which has been OCR processed by three state-of-the-art OCR systems: Abbyy Finereader, Tesseract and Ocropus. We examine the OCR results of each system and present a supervised machine learning post-correction method that tries to approach the challenges exhibited by each system. We study the performance of our method by using three evaluation tools: PrimA, Språkbanken evaluation tool and Frontiers Toolkit. Based on the evaluation analysis we discuss the impact each of the OCR systems has on the results of the post-correction method. We report on quantitative and qualitative results showing varying degrees of OCR post-processing complexity that are important to consider when developing an OCR post-correction method.