SwePub
Search the SwePub database

Result list for the search "WFRF:(Shah Mubarak)"

Search: WFRF:(Shah Mubarak)

  • Results 1-10 of 16
1.
  • Acsintoae, Andra, et al. (authors)
  • UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection
  • 2022
  • In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022). IEEE Computer Society. ISBN 9781665469463, 9781665469470. pp. 20111-20121
  • Conference paper (peer-reviewed), abstract:
    • Detecting abnormal events in video is commonly framed as a one-class classification task, where training videos contain only normal events, while test videos encompass both normal and abnormal events. In this scenario, anomaly detection is an open-set problem. However, some studies assimilate anomaly detection to action recognition. This is a closed-set scenario that fails to test the capability of systems at detecting new anomaly types. To this end, we propose UBnormal, a new supervised open-set benchmark composed of multiple virtual scenes for video anomaly detection. Unlike existing data sets, we introduce abnormal events annotated at the pixel level at training time, for the first time enabling the use of fully-supervised learning methods for abnormal event detection. To preserve the typical open-set formulation, we make sure to include disjoint sets of anomaly types in our training and test collections of videos. To our knowledge, UBnormal is the first video anomaly detection benchmark to allow a fair head-to-head comparison between one-class open-set models and supervised closed-set models, as shown in our experiments. Moreover, we provide empirical evidence showing that UBnormal can enhance the performance of a state-of-the-art anomaly detection framework on two prominent data sets, Avenue and ShanghaiTech. Our benchmark is freely available at https://github.com/lilygeorgescu/UBnormal.
  •  
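The open-set protocol above hinges on keeping the anomaly types used for training disjoint from those used for testing. The following is a minimal Python sketch of how such a disjoint split can be built and verified; the anomaly-type names are invented placeholders, not UBnormal's actual categories.

```python
import random

def open_set_split(anomaly_types, train_fraction=0.5, seed=0):
    """Split anomaly types into disjoint train/test sets for an open-set protocol."""
    types = sorted(anomaly_types)
    random.Random(seed).shuffle(types)
    cut = int(len(types) * train_fraction)
    train, test = set(types[:cut]), set(types[cut:])
    assert train.isdisjoint(test)  # no anomaly type may leak from train into test
    return train, test

# Placeholder anomaly-type names for illustration only.
train_types, test_types = open_set_split(
    ["running", "falling", "fighting", "jaywalking", "smoke", "car_crash"])
print(train_types, test_types)
```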
2.
  • Barbalau, Antonio, et al. (authors)
  • SSMTL++: Revisiting self-supervised multi-task learning for video anomaly detection
  • 2023
  • In: Computer Vision and Image Understanding. Academic Press Inc. Elsevier Science. ISSN 1077-3142, 1090-235X. Vol. 229
  • Journal article (peer-reviewed), abstract:
    • A self-supervised multi-task learning (SSMTL) framework for video anomaly detection was recently introduced in literature. Due to its highly accurate results, the method attracted the attention of many researchers. In this work, we revisit the self-supervised multi-task learning framework, proposing several updates to the original method. First, we study various detection methods, e.g. based on detecting high-motion regions using optical flow or background subtraction, since we believe the currently used pre-trained YOLOv3 is suboptimal, e.g. objects in motion or objects from unknown classes are never detected. Second, we modernize the 3D convolutional backbone by introducing multi-head self-attention modules, inspired by the recent success of vision transformers. As such, we alternatively introduce both 2D and 3D convolutional vision transformer (CvT) blocks. Third, in our attempt to further improve the model, we study additional self-supervised learning tasks, such as predicting segmentation maps through knowledge distillation, solving jigsaw puzzles, estimating body pose through knowledge distillation, predicting masked regions (inpainting), and adversarial learning with pseudo-anomalies. We conduct experiments to assess the performance impact of the introduced changes. Upon finding more promising configurations of the framework, dubbed SSMTL++v1 and SSMTL++v2, we extend our preliminary experiments to more data sets, demonstrating that our performance gains are consistent across all data sets. In most cases, our results on Avenue, ShanghaiTech and UBnormal raise the state-of-the-art performance bar to a new level.
  •  
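The framework revisited above trains one shared backbone on several self-supervised proxy tasks at once, so the training objective is essentially a weighted sum of per-task losses. A schematic sketch of that combination follows; the task names and the uniform default weights are illustrative placeholders, not the authors' configuration.

```python
import torch

def multi_task_loss(task_losses, weights=None):
    """Weighted sum of self-supervised proxy-task losses computed on a shared backbone.

    task_losses: dict mapping task name -> scalar loss tensor
    weights:     optional dict of per-task weights; defaults to uniform weighting
    """
    weights = weights or {name: 1.0 for name in task_losses}
    return sum(weights[name] * loss for name, loss in task_losses.items())

# Illustrative task names and dummy loss values only.
losses = {"inpainting": torch.tensor(0.7),
          "jigsaw": torch.tensor(0.3),
          "pose_distillation": torch.tensor(0.5)}
print(multi_task_loss(losses))  # tensor(1.5000)
```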
3.
  • Bhunia, Ankan Kumar, et al. (authors)
  • Handwriting Transformers
  • 2021
  • In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV 2021). IEEE. ISBN 9781665428125, 9781665428132. pp. 1066-1074
  • Other publication (other academic/artistic), abstract:
    • We propose a novel transformer-based styled handwritten text image generation approach, HWT, that strives to learn both style-content entanglement as well as global and local writing style patterns. The proposed HWT captures the long and short range relationships within the style examples through a self-attention mechanism, thereby encoding both global and local style patterns. Further, the proposed transformer-based HWT comprises an encoder-decoder attention that enables style-content entanglement by gathering the style representation of each query character. To the best of our knowledge, we are the first to introduce a transformer-based generative network for styled handwritten text generation. Our proposed HWT generates realistic styled handwritten text images and significantly outperforms the state-of-the-art demonstrated through extensive qualitative, quantitative and human-based evaluations. The proposed HWT can handle arbitrary length of text and any desired writing style in a few-shot setting. Further, our HWT generalizes well to the challenging scenario where both words and writing style are unseen during training, generating realistic styled handwritten text images.
  •  
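The encoder-decoder attention described above lets each query character gather a style representation from the encoded style examples. The sketch below shows that cross-attention step with standard PyTorch modules; the dimensions and tensor shapes are assumptions for illustration, not the HWT implementation.

```python
import torch
import torch.nn as nn

d_model, n_heads = 256, 8

# Cross-attention: content (character) queries attend over encoded style features.
cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

style_features = torch.randn(2, 50, d_model)  # encoder output for 50 style-example patches
char_queries = torch.randn(2, 12, d_model)    # embeddings of the 12 characters to render

styled_content, attn_weights = cross_attn(
    query=char_queries, key=style_features, value=style_features)
print(styled_content.shape)  # torch.Size([2, 12, 256]): one style-aware feature per character
```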
4.
  • Bhunia, Ankan Kumar, et al. (authors)
  • Person Image Synthesis via Denoising Diffusion Model
  • 2023
  • In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE Computer Society. ISBN 9798350301298, 9798350301304. pp. 5968-5976
  • Conference paper (peer-reviewed), abstract:
    • The pose-guided person image generation task requires synthesizing photorealistic images of humans in arbitrary poses. The existing approaches use generative adversarial networks that do not necessarily maintain realistic textures or need dense correspondences that struggle to handle complex deformations and severe occlusions. In this work, we show how denoising diffusion models can be applied for high-fidelity person image synthesis with strong sample diversity and enhanced mode coverage of the learnt data distribution. Our proposed Person Image Diffusion Model (PIDM) disintegrates the complex transfer problem into a series of simpler forward-backward denoising steps. This helps in learning plausible source-to-target transformation trajectories that result in faithful textures and undistorted appearance details. We introduce a texture diffusion module based on cross-attention to accurately model the correspondences between appearance and pose information available in source and target images. Further, we propose disentangled classifier-free guidance to ensure close resemblance between the conditional inputs and the synthesized output in terms of both pose and appearance information. Our extensive results on two large-scale benchmarks and a user study demonstrate the photorealism of our proposed approach under challenging scenarios. We also show how our generated images can help in downstream tasks. Code is available at https://github.com/ankanbhunia/PIDM.
  •  
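The disentangled classifier-free guidance mentioned above steers each denoising step separately toward the target pose and the source appearance. One plausible way to combine the corresponding noise predictions with separate guidance scales is sketched below; the exact weighting used in PIDM may differ, and the tensors and scales here are placeholders.

```python
import torch

def disentangled_cfg(eps_uncond, eps_pose, eps_pose_style, w_pose=2.0, w_style=2.0):
    """Combine unconditional / pose-conditioned / fully-conditioned noise predictions.

    Each argument is the denoiser's predicted noise for the same noisy image x_t
    under a different conditioning-dropout pattern; the two scales push the sample
    separately toward the target pose and the source appearance.
    """
    return (eps_uncond
            + w_pose * (eps_pose - eps_uncond)         # pose guidance term
            + w_style * (eps_pose_style - eps_pose))   # appearance guidance term

# Toy tensors standing in for three forward passes of the denoiser.
e_u, e_p, e_ps = (torch.randn(1, 3, 64, 64) for _ in range(3))
print(disentangled_cfg(e_u, e_p, e_ps).shape)  # torch.Size([1, 3, 64, 64])
```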
5.
  • Cao, Jiale, et al. (authors)
  • PSTR: End-to-End One-Step Person Search With Transformers
  • 2022
  • In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE Computer Society. ISBN 9781665469463, 9781665469470. pp. 9448-9457
  • Conference paper (peer-reviewed), abstract:
    • We propose a novel one-step transformer-based person search framework, PSTR, that jointly performs person detection and re-identification (re-id) in a single architecture. PSTR comprises a person search-specialized (PSS) module that contains a detection encoder-decoder for person detection along with a discriminative re-id decoder for person re-id. The discriminative re-id decoder utilizes a multi-level supervision scheme with a shared decoder for discriminative re-id feature learning and also comprises a part attention block to encode relationship between different parts of a person. We further introduce a simple multi-scale scheme to support re-id across person instances at different scales. PSTR jointly achieves the diverse objectives of object-level recognition (detection) and instance-level matching (re-id). To the best of our knowledge, we are the first to propose an end-to-end one-step transformer-based person search framework. Experiments are performed on two popular benchmarks: CUHK-SYSU and PRW. Our extensive ablations reveal the merits of the proposed contributions. Further, the proposed PSTR sets a new state-of-the-art on both benchmarks. On the challenging PRW benchmark, PSTR achieves a mean average precision (mAP) score of 56.5%.
  •  
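PSTR's one-step design couples a detection decoder and a re-id decoder on top of a shared encoder so that a single forward pass yields boxes, person/background scores, and identity embeddings. The skeleton below shows only that high-level wiring; the layer counts, dimensions, and the omission of multi-level supervision, part attention, and the multi-scale scheme are simplifications, not the actual PSTR architecture.

```python
import torch
import torch.nn as nn

class OneStepPersonSearch(nn.Module):
    """Schematic only: shared encoder -> queries decoded into boxes, scores, re-id embeddings."""

    def __init__(self, d_model=256, num_queries=100, reid_dim=128):
        super().__init__()
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=2)
        self.queries = nn.Parameter(torch.randn(num_queries, d_model))
        self.box_head = nn.Linear(d_model, 4)          # person box (cx, cy, w, h)
        self.cls_head = nn.Linear(d_model, 2)          # person vs. background
        self.reid_head = nn.Linear(d_model, reid_dim)  # identity embedding for re-id matching

    def forward(self, image_tokens):                   # image_tokens: (B, N, d_model)
        memory = self.encoder(image_tokens)
        queries = self.queries.unsqueeze(0).expand(image_tokens.size(0), -1, -1)
        hs = self.decoder(queries, memory)
        return self.box_head(hs).sigmoid(), self.cls_head(hs), self.reid_head(hs)

model = OneStepPersonSearch()
boxes, logits, embeddings = model(torch.randn(2, 196, 256))
print(boxes.shape, logits.shape, embeddings.shape)
```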
6.
  • Gupta, Akshita, et al. (authors)
  • OW-DETR: Open-world Detection Transformer
  • 2022
  • In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE Computer Society. ISBN 9781665469463, 9781665469470. pp. 9225-9234
  • Conference paper (peer-reviewed), abstract:
    • Open-world object detection (OWOD) is a challenging computer vision problem, where the task is to detect a known set of object categories while simultaneously identifying unknown objects. Additionally, the model must incrementally learn new classes that become known in the next training episodes. Distinct from standard object detection, the OWOD setting poses significant challenges for generating quality candidate proposals on potentially unknown objects, separating the unknown objects from the background and detecting diverse unknown objects. Here, we introduce a novel end-to-end transformer-based framework, OW-DETR, for open-world object detection. The proposed OW-DETR comprises three dedicated components namely, attention-driven pseudo-labeling, novelty classification and objectness scoring to explicitly address the aforementioned OWOD challenges. Our OW-DETR explicitly encodes multi-scale contextual information, possesses less inductive bias, enables knowledge transfer from known classes to the unknown class and can better discriminate between unknown objects and background. Comprehensive experiments are performed on two benchmarks: MS-COCO and PASCAL VOC. The extensive ablations reveal the merits of our proposed contributions. Further, our model outperforms the recently introduced OWOD approach, ORE, with absolute gains ranging from 1.8% to 3.3% in terms of unknown recall on MS-COCO. In the case of incremental object detection, OW-DETR outperforms the state-of-the-art for all settings on PASCAL VOC. Our code is available at https://github.com/akshitac8/OW-DETR.
  •  
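The attention-driven pseudo-labeling component described above selects, among proposals not matched to any known-class ground truth, the ones whose backbone activations look most object-like and labels them as unknown. The sketch below captures that selection heuristic in simplified form; scoring a box by the mean activation inside it and the top-k value are assumptions for illustration, not OW-DETR's exact procedure.

```python
import torch

def pseudo_label_unknowns(feature_map, boxes, matched, top_k=5):
    """Pick unmatched proposals with the highest mean activation as 'unknown' pseudo-labels.

    feature_map: (H, W) single-channel activation map (e.g. channel-averaged backbone features)
    boxes:       (N, 4) integer proposal boxes (x1, y1, x2, y2) in feature-map coordinates
    matched:     (N,) bool mask, True where a proposal matches known-class ground truth
    Returns the indices of proposals pseudo-labeled as unknown.
    """
    scores = torch.full((boxes.shape[0],), float("-inf"))
    for i, (x1, y1, x2, y2) in enumerate(boxes.tolist()):
        if not matched[i]:
            scores[i] = feature_map[y1:y2, x1:x2].mean()  # objectness proxy: mean activation
    k = min(top_k, int((~matched).sum()))
    return torch.topk(scores, k).indices

fmap = torch.rand(32, 32)
boxes = torch.tensor([[0, 0, 8, 8], [10, 10, 20, 20], [5, 5, 30, 30]])
matched = torch.tensor([True, False, False])
print(pseudo_label_unknowns(fmap, boxes, matched, top_k=2))
```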
7.
  • Hanif, Asif, et al. (authors)
  • Frequency Domain Adversarial Training for Robust Volumetric Medical Segmentation
  • 2023
  • In: Medical Image Computing and Computer Assisted Intervention, MICCAI 2023, Part II. Springer International Publishing AG. ISBN 9783031438943, 9783031438950. pp. 457-467
  • Conference paper (peer-reviewed), abstract:
    • It is imperative to ensure the robustness of deep learning models in critical applications such as healthcare. While recent advances in deep learning have improved the performance of volumetric medical image segmentation models, these models cannot be deployed for real-world applications immediately due to their vulnerability to adversarial attacks. We present a 3D frequency domain adversarial attack for volumetric medical image segmentation models and demonstrate its advantages over conventional input or voxel domain attacks. Using our proposed attack, we introduce a novel frequency domain adversarial training approach for optimizing a robust model against voxel and frequency domain attacks. Moreover, we propose frequency consistency loss to regulate our frequency domain adversarial training that achieves a better tradeoff between the model's performance on clean and adversarial samples. Code is available at https://github.com/asif-hanif/vafa.
  •  
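The attack summarized above perturbs a volumetric input in the frequency domain rather than directly in voxel space. As a rough illustration of that general idea (not the paper's optimized attack), the sketch below takes a 3D FFT of the volume, perturbs the coefficients, and transforms back; the budget eps and the random perturbation direction are placeholders where a real attack would use gradient-based optimization.

```python
import torch

def frequency_domain_perturbation(volume, eps=0.01):
    """Illustrative frequency-space perturbation of a (D, H, W) volume (not the paper's attack)."""
    spectrum = torch.fft.fftn(volume)                    # 3D Fourier transform of the volume
    noise = torch.randn_like(spectrum.real)              # random direction in frequency space
    perturbed = spectrum + eps * spectrum.abs() * noise  # scale noise by coefficient magnitude
    return torch.fft.ifftn(perturbed).real               # back to voxel space

vol = torch.rand(32, 64, 64)
adv = frequency_domain_perturbation(vol)
print((adv - vol).abs().max())  # small, structured voxel-space change
```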
8.
  • Khan, Salman, et al. (authors)
  • Guest Editorial Introduction to the Special Section on Transformer Models in Vision
  • 2023
  • In: IEEE Transactions on Pattern Analysis and Machine Intelligence. IEEE Computer Society. ISSN 0162-8828, 1939-3539. Vol. 45, no. 11, pp. 12721-12725
  • Journal article (other academic/artistic), abstract:
    • Transformer models have achieved outstanding results on a variety of language tasks, such as text classification, machine translation, and question answering. This success in the field of Natural Language Processing (NLP) has sparked interest in the computer vision community to apply these models to vision and multi-modal learning tasks. However, visual data has a unique structure, requiring the need to rethink network designs and training methods. As a result, Transformer models and their variations have been successfully used for image recognition, object detection, segmentation, image super-resolution, video understanding, image generation, text-image synthesis, and visual question answering, among other applications.
  •  
9.
  • Khan, Salman, et al. (authors)
  • Transformers in Vision: A Survey
  • 2022
  • In: ACM Computing Surveys. Association for Computing Machinery. ISSN 0360-0300, 1557-7341. Vol. 54, no. 10
  • Journal article (peer-reviewed), abstract:
    • Astounding results from Transformer models on natural language tasks have intrigued the vision community to study their application to computer vision problems. Among their salient benefits, Transformers enable modeling long dependencies between input sequence elements and support parallel processing of sequence as compared to recurrent networks, e.g., Long short-term memory. Different from convolutional networks, Transformers require minimal inductive biases for their design and are naturally suited as set-functions. Furthermore, the straightforward design of Transformers allows processing multiple modalities (e.g., images, videos, text, and speech) using similar processing blocks and demonstrates excellent scalability to very large capacity networks and huge datasets. These strengths have led to exciting progress on a number of vision tasks using Transformer networks. This survey aims to provide a comprehensive overview of the Transformer models in the computer vision discipline. We start with an introduction to fundamental concepts behind the success of Transformers, i.e., self-attention, large-scale pre-training, and bidirectional feature encoding. We then cover extensive applications of transformers in vision including popular recognition tasks (e.g., image classification, object detection, action recognition, and segmentation), generative modeling, multi-modal tasks (e.g., visual-question answering, visual reasoning, and visual grounding), video processing (e.g., activity recognition, video forecasting), low-level vision (e.g., image super-resolution, image enhancement, and colorization), and three-dimensional analysis (e.g., point cloud classification and segmentation). We compare the respective advantages and limitations of popular techniques both in terms of architectural design and their experimental value. Finally, we provide an analysis on open research directions and possible future works. We hope this effort will ignite further interest in the community to solve current challenges toward the application of transformer models in computer vision.
  •  
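Self-attention is the fundamental operation the survey builds on. For readers skimming this list, here is the textbook single-head scaled dot-product formulation in NumPy; it is a generic reference sketch, not code from the survey.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x:             (seq_len, d_model) input token embeddings
    w_q, w_k, w_v: (d_model, d_k) learned projection matrices
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v             # project inputs to queries/keys/values
    scores = q @ k.T / np.sqrt(k.shape[-1])         # pairwise similarity, scaled by sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the key dimension
    return weights @ v                              # attention-weighted sum of values

# Toy usage: 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) * 0.1 for _ in range(3))
print(self_attention(tokens, w_q, w_k, w_v).shape)  # (4, 8)
```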
10.
  • Kini, Jyoti, et al. (authors)
  • CT-VOS: Cutout prediction and tagging for self-supervised video object segmentation
  • 2024
  • In: Computer Vision and Image Understanding. Academic Press Inc. Elsevier Science. ISSN 1077-3142, 1090-235X. Vol. 238
  • Journal article (peer-reviewed), abstract:
    • We propose a novel self-supervised Video Object Segmentation (VOS) approach that strives to achieve better visual correspondence across frames and object-background discriminability for accurate object segmentation. Distinct from previous self-supervised VOS methods, our approach is based on a learning loss formulation that takes into account both object and background information to ensure object-background discriminability, rather than using only object appearance. The objective function comprises cutout-based reconstruction (cutout region represents part of a frame, whose pixels are replaced with some constant values) and tag prediction loss terms. The cutout-based reconstruction term utilizes a simple cutout scheme to learn the pixel-wise correspondence between the current and previous frames in order to reconstruct the original current frame with added cutout region in it. The introduced cutout patch guides the model to focus on the reappearance of scene parts, thereby implicitly equipping the model to address occlusion-based scenarios. Next, the tag prediction term encourages object-background separability by grouping tags of all pixels in the cutout region that are similar, while separating them from the tags of the rest of the reconstructed pixels. Additionally, we introduce a zoom-in scheme that addresses the problem of small object segmentation by capturing fine structural information at multiple scales. Our proposed approach, termed CT-VOS, achieves state-of-the-art results on two challenging benchmarks: DAVIS-2017 and Youtube-VOS. A detailed ablation showcases the importance of the proposed loss formulation to effectively establish correspondences, object-background discriminability, and the impact of our zoom-in scheme to accurately segment small-sized objects.
  •  
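The cutout-based reconstruction term described above replaces a rectangular region of the current frame with a constant value and asks the network to reconstruct the clean frame from the previous one. A minimal sketch of that cutout operation and loss is below; the patch size, fill value, and the two-argument model interface are assumptions for illustration, and the identity stand-in only makes the sketch runnable.

```python
import torch
import torch.nn.functional as F

def apply_cutout(frame, size=32, fill=0.0):
    """Replace a random size x size patch of a (C, H, W) frame with a constant value."""
    _, h, w = frame.shape
    top = torch.randint(0, h - size, (1,)).item()
    left = torch.randint(0, w - size, (1,)).item()
    corrupted = frame.clone()
    corrupted[:, top:top + size, left:left + size] = fill
    return corrupted

def cutout_reconstruction_loss(model, prev_frame, cur_frame):
    """Reconstruct the clean current frame from (previous frame, cutout current frame)."""
    corrupted = apply_cutout(cur_frame)
    reconstructed = model(prev_frame.unsqueeze(0), corrupted.unsqueeze(0))  # assumed interface
    return F.mse_loss(reconstructed, cur_frame.unsqueeze(0))

# Identity stand-in so the sketch runs; the real model is a correspondence-based network.
identity_model = lambda prev, cur: cur
prev, cur = torch.rand(3, 128, 128), torch.rand(3, 128, 128)
print(cutout_reconstruction_loss(identity_model, prev, cur).item())
```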