SwePub
Sök i SwePub databas

  Utökad sökning

Träfflista för sökning "WFRF:(Hashmi Khurram Azeem) srt2:(2022)"

Sökning: WFRF:(Hashmi Khurram Azeem) > (2022)

  • Resultat 1-6 av 6
Sortera/gruppera träfflistan
   
NumreringReferensOmslagsbildHitta
1.
  • Hashmi, Khurram Azeem, et al. (författare)
  • Exploiting Concepts of Instance Segmentation to Boost Detection in Challenging Environments
  • 2022
  • Ingår i: Sensors. - : MDPI. - 1424-8220. ; 22:10
  • Tidskriftsartikel (refereegranskat)abstract
    • In recent years, due to the advancements in machine learning, object detection has become a mainstream task in the computer vision domain. The first phase of object detection is to find the regions where objects can exist. With the improvements in deep learning, traditional approaches, such as sliding windows and manual feature selection techniques, have been replaced with deep learning techniques. However, object detection algorithms face a problem when performed in low light, challenging weather, and crowded scenes, similar to any other task. Such an environment is termed a challenging environment. This paper exploits pixel-level information to improve detection under challenging situations. To this end, we exploit the recently proposed hybrid task cascade network. This network works collaboratively with detection and segmentation heads at different cascade levels. We evaluate the proposed methods on three complex datasets of ExDark, CURE-TSD, and RESIDE, and achieve a mAP of 0.71, 0.52, and 0.43, respectively. Our experimental results assert the efficacy of the proposed approach.
  •  
2.
  • Kallempudi, Goutham, et al. (författare)
  • Toward Semi-Supervised Graphical Object Detection in Document Images
  • 2022
  • Ingår i: Future Internet. - : MDPI. - 1999-5903. ; 14:6
  • Tidskriftsartikel (refereegranskat)abstract
    • The graphical page object detection classifies and localizes objects such as Tables and Figures in a document. As deep learning techniques for object detection become increasingly successful, many supervised deep neural network-based methods have been introduced to recognize graphical objects in documents. However, these models necessitate a substantial amount of labeled data for the training process. This paper presents an end-to-end semi-supervised framework for graphical object detection in scanned document images to address this limitation. Our method is based on a recently proposed Soft Teacher mechanism that examines the effects of small percentage-labeled data on the classification and localization of graphical objects. On both the PubLayNet and the IIIT-AR-13K datasets, the proposed approach outperforms the supervised models by a significant margin in all labeling ratios (1%, 5%, and 10%). Furthermore, the 10% PubLayNet Soft Teacher model improves the average precision of Table, Figure, and List by +5.4,+1.2, and +3.2 points, respectively, with a similar total mAP as the Faster-RCNN baseline. Moreover, our model trained on 10% of IIIT-AR-13K labeled data beats the previous fully supervised method +4.5 points.
  •  
3.
  • Muralidhara, Shishir, et al. (författare)
  • Attention-Guided Disentangled Feature Aggregation for Video Object Detection
  • 2022
  • Ingår i: Sensors. - : MDPI. - 1424-8220. ; 22:21
  • Tidskriftsartikel (refereegranskat)abstract
    • Object detection is a computer vision task that involves localisation and classification of objects in an image. Video data implicitly introduces several challenges, such as blur, occlusion and defocus, making video object detection more challenging in comparison to still image object detection, which is performed on individual and independent images. This paper tackles these challenges by proposing an attention-heavy framework for video object detection that aggregates the disentangled features extracted from individual frames. The proposed framework is a two-stage object detector based on the Faster R-CNN architecture. The disentanglement head integrates scale, spatial and task-aware attention and applies it to the features extracted by the backbone network across all the frames. Subsequently, the aggregation head incorporates temporal attention and improves detection in the target frame by aggregating the features of the support frames. These include the features extracted from the disentanglement network along with the temporal features. We evaluate the proposed framework using the ImageNet VID dataset and achieve a mean Average Precision (mAP) of 49.8 and 52.5 using the backbones of ResNet-50 and ResNet-101, respectively. The improvement in performance over the individual baseline methods validates the efficacy of the proposed approach.
  •  
4.
  • Naik, Shivam, et al. (författare)
  • Investigating Attention Mechanism for Page Object Detection in Document Images
  • 2022
  • Ingår i: Applied Sciences. - : MDPI. - 2076-3417. ; 12:15
  • Tidskriftsartikel (refereegranskat)abstract
    • Page object detection in scanned document images is a complex task due to varying document layouts and diverse page objects. In the past, traditional methods such as Optical Character Recognition (OCR)-based techniques have been employed to extract textual information. However, these methods fail to comprehend complex page objects such as tables and figures. This paper addresses the localization problem and classification of graphical objects that visually summarize vital information in documents. Furthermore, this work examines the benefit of incorporating attention mechanisms in different object detection networks to perform page object detection on scanned document images. The model is designed with a Pytorch-based framework called Detectron2. The proposed pipelines can be optimized end-to-end and exhaustively evaluated on publicly available datasets such as DocBank, PublayNet, and IIIT-AR-13K. The achieved results reflect the effectiveness of incorporating the attention mechanism for page object detection in documents.
  •  
5.
  • Shehzadi, Tahira, et al. (författare)
  • Mask-Aware Semi-Supervised Object Detection in Floor Plans
  • 2022
  • Ingår i: Applied Sciences. - : MDPI. - 2076-3417. ; 12:19
  • Tidskriftsartikel (refereegranskat)abstract
    • Research has been growing on object detection using semi-supervised methods in past few years. We examine the intersection of these two areas for floor-plan objects to promote the research objective of detecting more accurate objects with less labeled data. The floor-plan objects include different furniture items with multiple types of the same class, and this high inter-class similarity impacts the performance of prior methods. In this paper, we present Mask R-CNN-based semi-supervised approach that provides pixel-to-pixel alignment to generate individual annotation masks for each class to mine the inter-class similarity. The semi-supervised approach has a student–teacher network that pulls information from the teacher network and feeds it to the student network. The teacher network uses unlabeled data to form pseudo-boxes, and the student network uses both label data with the pseudo boxes and labeled data as the ground truth for training. It learns representations of furniture items by combining labeled and label data. On the Mask R-CNN detector with ResNet-101 backbone network, the proposed approach achieves a mAP of 98.8%, 99.7%, and 99.8% with only 1%, 5% and 10% labeled data, respectively. Our experiment affirms the efficiency of the proposed approach, as it outperforms the previous semi-supervised approaches using only 1% of the labels.
  •  
6.
  • Sinha, Sankalp, et al. (författare)
  • Rethinking Learnable Proposals for Graphical Object Detection in Scanned Document Images
  • 2022
  • Ingår i: Applied Sciences. - : MDPI. - 2076-3417. ; 12:20
  • Tidskriftsartikel (refereegranskat)abstract
    • In the age of deep learning, researchers have looked at domain adaptation under the pre-training and fine-tuning paradigm to leverage the gains in the natural image domain. These backbones and subsequent networks are designed for object detection in the natural image domain. They do not consider some of the critical characteristics of document images. Document images are sparse in contextual information, and the graphical page objects are logically clustered. This paper investigates the effectiveness of deep and robust backbones in the document image domain. Further, it explores the idea of learnable object proposals through Sparse R-CNN. This paper shows that simple domain adaptation of top-performing object detectors to the document image domain does not lead to better results. Furthermore, empirically showing that detectors based on dense object priors like Faster R-CNN, Mask R-CNN, and Cascade Mask R-CNN are perhaps not best suited for graphical page object detection. Detectors that reduce the number of object candidates while making them learnable are a step towards a better approach. We formulate and evaluate the Sparse R-CNN (SR-CNN) model on the IIIT-AR-13k, PubLayNet, and DocBank datasets and hope to inspire a rethinking of object proposals in the domain of graphical page object detection.
  •  
Skapa referenser, mejla, bekava och länka
  • Resultat 1-6 av 6

Kungliga biblioteket hanterar dina personuppgifter i enlighet med EU:s dataskyddsförordning (2018), GDPR. Läs mer om hur det funkar här.
Så här hanterar KB dina uppgifter vid användning av denna tjänst.

 
pil uppåt Stäng

Kopiera och spara länken för att återkomma till aktuell vy