Integrating language resources in two OCR engines to improve processing of historical Swedish text.
- Dannélls, Dana, 1976 (author)
- Gothenburg University, Göteborgs universitet, Institutionen för svenska språket, Department of Swedish
- Olsson, Leif-Jöran, 1971 (author)
- Gothenburg University, Göteborgs universitet, Institutionen för svenska språket, Department of Swedish
- 2018
- English.
In: CLARIN Annual Conference.
- Related link: https://gup.ub.gu.se...
Abstract
- We aim to address the difficulties that many History and Social Sciences researchers face when bringing non-digitized text into language analysis workflows. In this paper we present the language resources and material we used for training two Optical Character Recognition engines to process historical Swedish text written in Fraktur (blackletter). The trained models, resources and dictionaries are freely available and accessible through our web service, hosted at Språkbanken, to give users and developers easy access for extracting historical Swedish text that is available only as images, so that it can be processed further.
Subject headings
- HUMANIORA -- Annan humaniora -- Övrig annan humaniora (hsv//swe)
- HUMANITIES -- Other Humanities -- Other Humanities not elsewhere specified (hsv//eng)
- NATURVETENSKAP -- Data- och informationsvetenskap -- Språkteknologi (hsv//swe)
- NATURAL SCIENCES -- Computer and Information Sciences -- Language Technology (hsv//eng)
Keywords
- OCR
- Historical Swedish text
- Language models.
Publication and content type
- vet (subject category)
- kon (subject category)