Toward Semi-Supervised Graphical Object Detection in Document Images
- Kallempudi, Goutham (author)
- Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany
- Hashmi, Khurram Azeem (author)
- Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany; Mindgarage, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany; German Research Institute for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
- Pagani, Alain (author)
- German Research Institute for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
- Liwicki, Marcus (author)
- Luleå tekniska universitet, EISLAB
- Stricker, Didier (author)
- Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany; German Research Institute for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
- Afzal, Muhammad Zeshan (author)
- Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany; Mindgarage, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany; German Research Institute for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
- 2022-06-08
- English.
In: Future Internet. - : MDPI. - 1999-5903. ; 14:6
- Related link:
https://doi.org/10.3...
https://urn.kb.se/re...
https://doi.org/10.3...
Abstract
- Graphical page object detection classifies and localizes objects such as tables and figures in a document. As deep learning techniques for object detection have become increasingly successful, many supervised methods based on deep neural networks have been introduced to recognize graphical objects in documents. However, these models require a substantial amount of labeled training data. To address this limitation, this paper presents an end-to-end semi-supervised framework for graphical object detection in scanned document images. Our method builds on the recently proposed Soft Teacher mechanism and examines the effect of a small percentage of labeled data on the classification and localization of graphical objects. On both the PubLayNet and the IIIT-AR-13K datasets, the proposed approach outperforms the supervised models by a significant margin at all labeling ratios (1%, 5%, and 10%). Furthermore, the Soft Teacher model trained on 10% of PubLayNet improves the average precision of Table, Figure, and List by +5.4, +1.2, and +3.2 points, respectively, with a total mAP similar to that of the Faster-RCNN baseline. Moreover, our model trained on 10% of the IIIT-AR-13K labeled data beats the previous fully supervised method by +4.5 points.
Subject headings
- NATURAL SCIENCES -- Computer and Information Sciences -- Software Engineering (hsv//eng)
- NATURAL SCIENCES -- Computer and Information Sciences -- Computer Sciences (hsv//eng)
Keywords
- graphical page objects
- object detection
- document image analysis
- semi-supervised
- soft teacher
- Machine Learning
Publication and content type
- ref (subject category)
- art (subject category)