Cascade Network with Deformable Composite Backbone for Formula Detection in Scanned Document Images
- Hashmi, Khurram Azeem (author)
  Department of Computer Science, Technical University, 67663 Kaiserslautern, Germany; Mindgarage, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany; German Research Institute for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
- Pagani, Alain (author)
  German Research Institute for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
- Liwicki, Marcus (author)
  Luleå tekniska universitet, EISLAB
- Stricker, Didier (author)
  Department of Computer Science, Technical University, 67663 Kaiserslautern, Germany; German Research Institute for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
- Afzal, Muhammad Zeshan (author)
  Department of Computer Science, Technical University, 67663 Kaiserslautern, Germany; Mindgarage, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany; German Research Institute for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
- 2021-08-19
- 2021
- English.
Published in: Applied Sciences (MDPI), ISSN 2076-3417, 11:16
Related links:
- https://doi.org/10.3...
- https://ltu.diva-por... (primary)
- https://www.mdpi.com...
- https://urn.kb.se/re...
Abstract
- This paper presents a novel architecture for detecting mathematical formulas in document images, an important step for reliable information extraction in several domains. Cascade Mask R-CNN networks were recently introduced for object detection in computer vision. We propose two modifications to the existing Cascade Mask R-CNN architecture. First, the network uses deformable convolutions instead of conventional convolutions in the backbone to better localize regions of interest. Second, it uses a dual ResNeXt-101 backbone with composite connections at the parallel stages. The resulting network is end-to-end trainable. We evaluate the approach on the ICDAR-2017 POD and Marmot datasets. It achieves state-of-the-art performance on ICDAR-2017 POD at a higher IoU threshold, with an F1-score of 0.917 and a relative error reduction of 7.8%. On the Marmot dataset, it reaches a correct-detection accuracy of 81.3% on embedded formulas, a relative error reduction of 30%.
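The abstract reports an F1-score at an IoU threshold and a relative error reduction without defining either. The sketch below shows one common way these quantities are computed, to make the reported numbers easier to interpret. It is not the authors' evaluation code: the greedy one-to-one matching, the box format, and the function names (`iou`, `f1_at_iou`, `relative_error_reduction`) are assumptions made for illustration, and the boxes in the usage note are invented toy values, not data from the paper.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    overlap_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    overlap_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = overlap_w * overlap_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f1_at_iou(preds: List[Box], truths: List[Box], threshold: float) -> float:
    """F1-score under greedy one-to-one matching (an assumed protocol).

    A prediction is a true positive if its best-overlapping, not yet
    matched ground-truth box has IoU >= threshold.
    """
    unmatched = list(truths)
    tp = 0
    for p in preds:
        best = max(unmatched, key=lambda t: iou(p, t), default=None)
        if best is not None and iou(p, best) >= threshold:
            unmatched.remove(best)
            tp += 1
    fp = len(preds) - tp   # predictions with no matching ground truth
    fn = len(truths) - tp  # ground-truth formulas that were missed
    total = 2 * tp + fp + fn
    return 2 * tp / total if total else 0.0


def relative_error_reduction(old_score: float, new_score: float) -> float:
    """Fraction by which the error (1 - score) shrinks from old to new."""
    old_error = 1.0 - old_score
    new_error = 1.0 - new_score
    return (old_error - new_error) / old_error
```

With toy ground-truth boxes `[(0, 0, 10, 10), (20, 20, 30, 30)]` and predictions `[(0, 0, 10, 9), (21, 21, 31, 31), (50, 50, 60, 60)]`, the second prediction overlaps its target with an IoU of about 0.68, so it counts as correct at a threshold of 0.6 but not at 0.8. This is why scores at a higher IoU threshold are harder to achieve: the same detections yield a lower F1-score when tighter localization is required.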
Subject terms
- NATURAL SCIENCES -- Computer and Information Sciences -- Computer Vision and Robotics (hsv//eng)
Keywords
- formula detection
- Cascade Mask R-CNN
- mathematical expression detection
- document image analysis
- deep neural networks
- computer vision
- Machine Learning
Publication and content type
- ref (subject category)
- art (subject category)