1. |
- Blomqvist, Christopher, et al.
(author)
-
Joint Handwritten Text Recognition and Word Classification for Tabular Information Extraction
- 2022
-
In: 2022 26th International Conference on Pattern Recognition (ICPR). - 9781665490627 - 9781665490634, pp. 1564-1570
-
Conference paper (peer-reviewed)
- In this paper, we present a system for extracting tabular information from loosely structured handwritten documents. The system consists of three parts: (i) a U-Net-like CNN-based method for text detection and segmentation, (ii) a new attention-based method for simultaneous text recognition and classification of word parts, and (iii) a method for matching the word parts into a tabular structure for each entry. A key contribution is the observation that the new attention-based recognition and classification module enables improved spatial analysis of the tabular information. The method is evaluated on a unique historical document, the Swedish Wealth Tax of 1571, consisting of 11,453 pages of handwritten tax records. The evaluation shows that the system significantly improves on the state of the art for tabular extraction from loosely structured historical documents.
|
|