SwePub

Result list for search "WFRF:(Heil Raphaela)"

  • Results 1-10 of 10
1.
  •  
2.
  • Heil, Raphaela, et al. (author)
  • A Study of Augmentation Methods for Handwritten Stenography Recognition
  • 2023
  • Conference paper (peer-reviewed). Abstract:
    • One of the factors limiting the performance of handwritten text recognition (HTR) for stenography is the small amount of annotated training data. To alleviate the problem of data scarcity, modern HTR methods often employ data augmentation. However, due to specifics of the stenographic script, such settings may not be directly applicable for stenography recognition. In this work, we study 22 classical augmentation techniques, most of which are commonly used for HTR of other scripts, such as Latin handwriting. Through extensive experiments, we identify a group of augmentations, including for example contained ranges of random rotation, shifts and scaling, that are beneficial to the use case of stenography recognition. Furthermore, a number of augmentation approaches, leading to a decrease in recognition performance, are identified. Our results are supported by statistical hypothesis testing. A link to the source code is provided in the paper.
  •  
3.
  • Heil, Raphaela (author)
  • Document Image Processing for Handwritten Text Recognition : Deep Learning-based Transliteration of Astrid Lindgren’s Stenographic Manuscripts
  • 2023
  • Doctoral thesis (other academic/artistic). Abstract:
    • Document image processing and handwritten text recognition have been applied to a variety of materials, scripts, and languages, both modern and historic. They are crucial building blocks in the on-going digitisation efforts of archives, where they aid in preserving archival materials and foster knowledge sharing. The latter is especially facilitated by making document contents available to interested readers who may have little to no practice in, for example, reading a specific script type, and might therefore face challenges in accessing the material. The first part of this dissertation focuses on reducing editorial artefacts, specifically in the form of struck-through words, in manuscripts. The main goal of this process is to identify struck-through words and remove as much of the strikethrough artefacts as possible in order to regain access to the original word. This step can serve both as preprocessing, to aid human annotators and readers, as well as in computerised pipelines, such as handwritten text recognition. Two deep learning-based approaches, exploring paired and unpaired data settings, are examined and compared. Furthermore, an approach for generating synthetic strikethrough data, for example, for training and testing purposes, and three novel datasets are presented. The second part of this dissertation is centred around applying handwritten text recognition to the stenographic manuscripts of Swedish children's book author Astrid Lindgren (1907–2002). Manually transliterating stenography, also known as shorthand, requires special domain knowledge of the script itself. Therefore, the main focus of this part is to reduce the required manual work, aiming to increase the accessibility of the material. In this regard, a baseline for handwritten text recognition of Swedish stenography is established. Two approaches for improving upon this baseline are examined. Firstly, a variety of data augmentation techniques, commonly used in handwritten text recognition, are studied. Secondly, different target sequence encoding methods, which aim to approximate diplomatic transcriptions, are investigated. The latter, in combination with a pre-training approach, significantly improves the recognition performance. In addition to the two presented studies, the novel LION dataset is published, consisting of excerpts from Astrid Lindgren's stenographic manuscripts.
  •  
4.
  •  
5.
  • Heil, Raphaela, et al. (author)
  • Handwritten Stenography Recognition and the LION Dataset
  • 2024
  • In: International Journal on Document Analysis and Recognition. - 1433-2833 .- 1433-2825.
  • Journal article (peer-reviewed). Abstract:
    • In this paper, we establish the first baseline for handwritten stenography recognition, using the novel LION dataset, and investigate the impact of including selected aspects of stenographic theory into the recognition process. We make the LION dataset publicly available with the aim of encouraging future research in handwritten stenography recognition. A state-of-the-art text recognition model is trained to establish a baseline. Stenographic domain knowledge is integrated by transforming the target sequences into representations which approximate diplomatic transcriptions, wherein each symbol in the script is represented by its own character in the transliteration, as opposed to corresponding combinations of characters from the Swedish alphabet. Four such encoding schemes are evaluated and results are further improved by integrating a pre-training scheme, based on synthetic data. The baseline model achieves an average test character error rate (CER) of 29.81% and a word error rate (WER) of 55.14%. Test error rates are reduced significantly (p < 0.01) by combining stenography-specific target sequence encodings with pre-training and fine-tuning, yielding CERs in the range of 24.5–26% and WERs of 44.8–48.2%. An analysis of selected recognition errors illustrates the challenges that the stenographic writing system poses to text recognition. This work establishes the first baseline for handwritten stenography recognition. Our proposed combination of integrating stenography-specific knowledge, in conjunction with pre-training and fine-tuning on synthetic data, yields considerable improvements. Together with our precursor study on the subject, this is the first work to apply modern handwritten text recognition to stenography. The dataset and our code are publicly available via Zenodo.
  •  
6.
  • Heil, Raphaela, et al. (author)
  • Paired Image to Image Translation for Strikethrough Removal from Handwritten Words
  • 2022
  • In: DOCUMENT ANALYSIS SYSTEMS, DAS 2022. - Cham : Springer Nature. - 9783031065552 - 9783031065545 ; pp. 309-322
  • Conference paper (peer-reviewed). Abstract:
    • Transcribing struck-through, handwritten words, for example for the purpose of genetic criticism, can pose a challenge to both humans and machines, due to the obstructive properties of the superimposed strokes. This paper investigates the use of paired image to image translation approaches to remove strikethrough strokes from handwritten words. Four different neural network architectures are examined, ranging from a few simple convolutional layers to deeper ones, employing Dense blocks. Experimental results, obtained from one synthetic and one genuine paired strikethrough dataset, confirm that the proposed paired models outperform the CycleGAN-based state of the art, while using less than a sixth of the trainable parameters.
  •  
7.
  • Heil, Raphaela, et al. (author)
  • Restoration of Archival Images Using Neural Networks
  • 2022
  • In: Proceedings of the 6th Digital Humanities in the Nordic and Baltic Countries Conference (DHNB 2022) ; pp. 79-93
  • Conference paper (peer-reviewed). Abstract:
    • Substantial parts of the image material of today’s digital archives are of low quality, creating problems for automated processing using machine learning. These quality issues can stem from a multitude of reasons, ranging from damaged originals to the reproduction hardware. Modern machine learning has made automatic “restoration” or “colourization” readily available. Curators and scholars might want to “improve” or “restore” the original’s quality to create engagement with the artefacts. However, a fundamental problem of the “restoration” process is that information must always be added to the original, creating reproductions with a synthesized extended realism. In this paper, we will discuss the nature of the “restoration” or “colourization” process in two parts. Firstly, we will focus on how the restoration algorithms work, discussing the nature of digital imagery and some intrinsic properties of “enhancement”. Secondly, we propose a system, based on modern machine learning, that can automatically “improve” the quality of digital reproductions of handwritten medieval manuscripts to allow for large scale computerized analysis. Furthermore, we provide code for the proposed system. Lastly, we end the paper by discussing when and if “restoration” can, and should, be used.
  •  
8.
  • Heil, Raphaela, et al. (author)
  • Shorthand Secrets: Deciphering Astrid Lindgren's Stenographed Drafts with HTR Methods
  • 2021
  • Conference paper (peer-reviewed). Abstract:
    • Astrid Lindgren, Swedish author of children’s books, is known for having both composed and edited her literary work in the Melin system of shorthand (a Swedish shorthand system based on Gabelsberger). Her original drafts and manuscripts are preserved in 670 stenographed notepads kept at the National Library of Sweden and The Swedish Institute of Children’s Books. For long these notepads have been considered undecipherable and are until recently untouched by research. This paper introduces handwritten text recognition (HTR) and document image analysis (DIA) approaches to address the challenges inherent in Lindgren’s original drafts and manuscripts. It broadly covers aspects such as preprocessing and extraction of words, alignment of transcriptions and the fast transcription of large amounts of words. This is the first work to apply HTR and DIA to Gabelsberger-based shorthand material. In particular, it presents early-stage results which demonstrate that these stenographed manuscripts can indeed be transcribed, both manually by experts and by employing computerised approaches.
  •  
9.
  • Heil, Raphaela, et al. (author)
  • Strikethrough Removal from Handwritten Words Using CycleGANs
  • 2021
  • In: Document Analysis and Recognition – ICDAR 2021. - Cham : Springer ; pp. 572-586
  • Conference paper (peer-reviewed). Abstract:
    • Obtaining the original, clean forms of struck-through handwritten words can be of interest to literary scholars, focusing on tasks such as genetic criticism. In addition to this, replacing struck-through words can also have a positive impact on text recognition tasks. This work presents a novel unsupervised approach for strikethrough removal from handwritten words, employing cycle-consistent generative adversarial networks (CycleGANs). The removal performance is improved upon by extending the network with an attribute-guided approach. Furthermore, two new datasets, a synthetic multi-writer set, based on the IAM database, and a genuine single-writer dataset, are introduced for the training and evaluation of the models. The experimental results demonstrate the efficacy of the proposed method, where the examined attribute-guided models achieve F1 scores above 0.8 on the synthetic test set, improving upon the performance of the regular CycleGAN. Despite being trained exclusively on the synthetic dataset, the examined models even produce convincing cleaned images for genuine struck-through words. 
  •  
10.
  • Heil, Raphaela, et al. (author)
  • Word Spotting in Historical Handwritten Manuscripts using Capsule Networks
  • 2018
  • Conference paper (other academic/artistic). Abstract:
    • Word spotting is popularly used for digitisation and transcription of historical handwritten documents. Recently, deep learning based methods have dominated the current state-of-the-art in learning-based word spotting. However, deep learning architectures such as Convolutional Neural Networks (CNNs) require a large amount of training data, and suffer from translation invariance. Capsule Networks (CapsNet) have been recently introduced as a data-efficient alternative to CNNs. This work explores the applicability of CapsNets for segmentation-based word spotting, and is the first such effort in the Handwritten Text Recognition (HTR) community to the best of authors' knowledge. The effectiveness of CapsNets will be empirically evaluated on well-known historical handwritten datasets using standard evaluation measures. The impact of varying amounts of training data on the recognition performance will be investigated, along with a comparison with the state-of-the-art methods.
  •  