SwePub
Search the SwePub database

Search: WFRF:(Sullivan Josephine)

  • Result 1-10 of 59
1.
  • Aghazadeh, Omid, 1982-, et al. (author)
  • Mixture component identification and learning for visual recognition
  • 2012
  • In: Computer Vision – ECCV 2012. Berlin, Heidelberg: Springer. ISBN 9783642337826, pp. 115-128
  • Conference paper (peer-reviewed). Abstract:
    • The non-linear decision boundary between object and background classes - due to large intra-class variations - needs to be modelled by any classifier wishing to achieve good results. While a mixture of linear classifiers is capable of modelling this non-linearity, learning this mixture from weakly annotated data is non-trivial and is the paper's focus. Our approach is to identify the modes in the distribution of our positive examples by clustering, and to utilize this clustering in a latent SVM formulation to learn the mixture model. The clustering relies on a robust measure of visual similarity which suppresses uninformative clutter by using a novel representation based on the exemplar SVM. This subtle clustering of the data leads to learning better mixture models, as is demonstrated via extensive evaluations on Pascal VOC 2007. The final classifier, using a HOG representation of the global image patch, achieves performance comparable to the state-of-the-art while being more efficient at detection time.
  •  
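The approach summarized above combines a clustering of the positive examples with latent-SVM-style training of a mixture of linear classifiers. The snippet below is a minimal sketch of that idea using scikit-learn: positives are clustered into mixture components, and component assignments and per-component linear SVMs are then refined by alternation. The exemplar-SVM-based similarity measure of the paper is not reproduced here; plain k-means on the feature vectors stands in for it, and all names and parameters are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def train_mixture(pos, neg, n_components=3, n_iters=5):
    """Hard-assignment, latent-SVM-style training of a mixture of linear classifiers.

    pos, neg: (n, d) arrays of HOG-like feature vectors for positive / negative windows.
    NOTE: the paper clusters with an exemplar-SVM-based similarity measure; plain
    k-means on raw features is only a stand-in for that step.
    """
    # Initialise the mixture components by clustering the positive examples.
    assign = KMeans(n_clusters=n_components, n_init=10, random_state=0).fit_predict(pos)
    svms = [None] * n_components

    for _ in range(n_iters):
        # Train one linear SVM per component: its positives against all negatives.
        for k in range(n_components):
            n_pos_k = int((assign == k).sum())
            if n_pos_k == 0:          # empty component: keep the previous model
                continue
            X = np.vstack([pos[assign == k], neg])
            y = np.hstack([np.ones(n_pos_k), np.zeros(len(neg))])
            svms[k] = LinearSVC(C=1.0).fit(X, y)
        # Latent step: re-assign each positive to the component that scores it highest.
        scores = np.stack([svm.decision_function(pos) for svm in svms], axis=1)
        assign = scores.argmax(axis=1)
    return svms
```

At detection time a window's score under the mixture would be the maximum of the component responses, mirroring the re-assignment step above.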
2.
  • Aghazadeh, Omid, et al. (author)
  • Multi view registration for novelty/background separation
  • 2012
  • In: Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE Computer Society. ISBN 9781467312264, pp. 757-764
  • Conference paper (peer-reviewed). Abstract:
    • We propose a system for the automatic segmentation of novelties from the background in scenarios where multiple images of the same environment are available, e.g. obtained by wearable visual cameras. Our method finds the pixels in a query image corresponding to the underlying background environment by comparing it to reference images of the same scene. This is achieved despite the fact that all the images may have different viewpoints, significantly different illumination conditions, and contain different objects (cars, people, bicycles, etc.) occluding the background. We estimate the probability of each pixel in the query image belonging to the background by computing its appearance inconsistency with the multiple reference images. We then produce multiple segmentations of the query image using an iterated graph cuts algorithm, initializing from these estimated probabilities, and combine these segmentations into a final segmentation of the background. Detection of the background in turn highlights the novel pixels. We demonstrate the effectiveness of our approach on a challenging outdoor data set.
  •  
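As a rough illustration of the probability-map step described in the record above, the snippet below scores each pixel of a query image by its best appearance consistency over a set of reference images; in the paper this map initializes the iterated graph-cuts segmentation rather than being used directly. It assumes the references have already been registered (warped) into the query view; the feature choice and the graph-cut step itself are omitted, and all names are illustrative.

```python
import numpy as np

def background_probability(query, refs, sigma=0.1):
    """Per-pixel probability that a query pixel shows the background.

    query: (H, W, C) float array in [0, 1].
    refs:  list of (H, W, C) reference images already registered to the query view.
    A pixel is likely background if it is consistent with at least one reference,
    so we take the minimum appearance distance over the references.
    """
    dists = np.stack([np.linalg.norm(query - r, axis=2) for r in refs], axis=0)
    min_dist = dists.min(axis=0)                     # best match over the references
    prob_bg = np.exp(-(min_dist ** 2) / (2 * sigma ** 2))
    return prob_bg                                   # high where the query agrees with a reference

# Novel pixels are those with low background probability; in the paper this map
# initialises an iterated graph-cuts segmentation instead of a hard threshold.
# novelty_mask = background_probability(query, refs) < 0.5
```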
3.
  • Aghazadeh, Omid, 1982-, et al. (author)
  • Novelty Detection from an Ego-Centric perspective
  • 2011
  • In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. ISBN 9781457703942, pp. 3297-3304
  • Conference paper (peer-reviewed). Abstract:
    • This paper demonstrates a system for the automatic extraction of novelty in images captured from a small video camera attached to a subject's chest, replicating his visual perspective, while performing activities which are repeated daily. Novelty is detected when a (sub)sequence cannot be registered to previously stored sequences captured while performing the same daily activity. Sequence registration is performed by measuring appearance and geometric similarity of individual frames and exploiting the invariant temporal order of the activity. Experimental results demonstrate that this is a robust way to detect novelties induced by variations in the wearer's ego-motion such as stopping and talking to a person. This is an essentially new and generic way of automatically extracting information of interest to the camera wearer and can be used as input to a system for life logging or memory support.
  •  
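The registration idea in the record above, matching a query (sub)sequence to stored sequences while respecting the activity's invariant temporal order, can be illustrated with a standard monotonic alignment (dynamic time warping) over frame-to-frame dissimilarities; a subsequence whose best alignment cost stays high is flagged as novel. This is a generic DTW sketch under assumed frame descriptors, not the paper's exact similarity measure, and the threshold is illustrative.

```python
import numpy as np

def alignment_cost(query_feats, ref_feats):
    """Dynamic-time-warping cost between two sequences of frame descriptors.

    query_feats: (n, d), ref_feats: (m, d). The monotonic alignment enforces
    the invariant temporal order of the repeated daily activity.
    """
    n, m = len(query_feats), len(ref_feats)
    # Pairwise frame distances (appearance dissimilarity).
    d = np.linalg.norm(query_feats[:, None, :] - ref_feats[None, :, :], axis=2)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = d[i - 1, j - 1] + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[n, m] / (n + m)   # length-normalised alignment cost

def is_novel(query_feats, stored_sequences, threshold=1.0):
    """A query subsequence is novel if it cannot be registered to any stored sequence."""
    return min(alignment_cost(query_feats, ref) for ref in stored_sequences) > threshold
```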
4.
  • Azizpour, Hossein, 1985-, et al. (author)
  • Factors of Transferability for a Generic ConvNet Representation
  • 2016
  • In: IEEE Transactions on Pattern Analysis and Machine Intelligence. IEEE Computer Society Digital Library. ISSN 0162-8828, 1939-3539. 38:9, pp. 1790-1802
  • Journal article (peer-reviewed). Abstract:
    • Evidence is mounting that Convolutional Networks (ConvNets) are the most effective representation learning method for visual recognition tasks. In the common scenario, a ConvNet is trained on a large labeled dataset (source) and the activations of the trained network's feed-forward units, at a certain layer, are used as a generic representation of an input image for a task with a relatively smaller training set (target). Recent studies have shown this form of representation transfer to be suitable for a wide range of target visual recognition tasks. This paper introduces and investigates several factors affecting the transferability of such representations. These include parameters of the source ConvNet's training, such as its architecture and the distribution of the training data, as well as parameters of feature extraction, such as the layer of the trained ConvNet and dimensionality reduction. Then, by optimizing these factors, we show that significant improvements can be achieved on various (17) visual recognition tasks. We further show that these visual recognition tasks can be categorically ordered based on their similarity to the source task, such that a correlation is observed between the performance of tasks and their similarity to the source task w.r.t. the proposed factors.
  •  
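The transfer pipeline described in this record (train on a large source dataset, reuse the activations at some layer as a generic image representation, then fit a simple classifier for the smaller target task) can be sketched as below with a pretrained torchvision backbone and a linear classifier; which layer to take, whether to reduce dimensionality, and so on are exactly the factors the paper studies. The model choice, layer and preprocessing here are assumptions for illustration, not the paper's setup, and a recent torchvision (>= 0.13) is assumed.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from sklearn.svm import LinearSVC

# Source ConvNet trained on a large labelled dataset (ImageNet weights assumed here).
backbone = models.resnet50(weights="IMAGENET1K_V1")   # torchvision >= 0.13
backbone.fc = torch.nn.Identity()      # drop the classifier; keep penultimate-layer activations
backbone.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def represent(pil_images):
    """Feed-forward activations used as a generic representation of each image."""
    batch = torch.stack([preprocess(im) for im in pil_images])
    return backbone(batch).numpy()     # (n, 2048) feature vectors

# Target task with a relatively small training set: fit a linear classifier
# on the transferred features.
# X_train = represent(train_images); clf = LinearSVC(C=1.0).fit(X_train, train_labels)
# X_test  = represent(test_images);  preds = clf.predict(X_test)
```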
5.
  • Azizpour, Hossein, 1985-, et al. (author)
  • From Generic to Specific Deep Representations for Visual Recognition
  • 2015
  • In: Proceedings of CVPR 2015. IEEE conference proceedings. ISBN 9781467367592
  • Conference paper (peer-reviewed). Abstract:
    • Evidence is mounting that ConvNets are the best representation learning method for recognition. In the common scenario, a ConvNet is trained on a large labeled dataset and the activations of its feed-forward units, at a certain layer of the network, are used as a generic representation of an input image. Recent studies have shown this form of representation to be astoundingly effective for a wide range of recognition tasks. This paper thoroughly investigates the transferability of such representations w.r.t. several factors. These include parameters for training the network, such as its architecture, and parameters of feature extraction. We further show that different visual recognition tasks can be categorically ordered based on their distance from the source task. We then show results indicating a clear correlation between the performance of tasks and their distance from the source task, conditioned on the proposed factors. Furthermore, by optimizing these factors, we achieve state-of-the-art performance on 16 visual recognition tasks.
  •  
6.
  • Baldassarre, Federico, et al. (author)
  • Explanation-Based Weakly-Supervised Learning of Visual Relations with Graph Networks
  • 2020
  • In: Proceedings, Part XXVIII Computer Vision - ECCV 2020 - 16th European Conference, Glasgow, UK, August 23-28, 2020. Cham: Springer Nature. pp. 612-630
  • Conference paper (peer-reviewed). Abstract:
    • Visual relationship detection is fundamental for holistic image understanding. However, the localization and classification of (subject, predicate, object) triplets remain challenging tasks, due to the combinatorial explosion of possible relationships, their long-tailed distribution in natural images, and an expensive annotation process. This paper introduces a novel weakly-supervised method for visual relationship detection that relies on minimal image-level predicate labels. A graph neural network is trained to classify predicates in images from a graph representation of detected objects, implicitly encoding an inductive bias for pairwise relations. We then frame relationship detection as the explanation of such a predicate classifier, i.e. we obtain a complete relation by recovering the subject and object of a predicted predicate. We present results comparable to recent fully- and weakly-supervised methods on three diverse and challenging datasets: HICO-DET for human-object interaction, Visual Relationship Detection for generic object-to-object relations, and UnRel for unusual triplets; demonstrating robustness to non-comprehensive annotations and good few-shot generalization.
  •  
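A much-simplified sketch of the weakly-supervised setup in the record above: detected objects form the nodes of a graph, a small network scores every ordered (subject, object) pair for each predicate, the image-level predicate prediction is the maximum over pairs (so only image-level predicate labels are needed for training), and at test time the pair that best explains a predicted predicate is read off as the detected relation. The paper uses a graph neural network and a dedicated explanation step; the pairwise MLP and max-pooling below are stand-ins, and all names are illustrative.

```python
import torch
import torch.nn as nn

class PairwisePredicateScorer(nn.Module):
    """Scores every ordered pair of detected objects for each predicate class."""
    def __init__(self, obj_dim, n_predicates, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * obj_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_predicates))

    def forward(self, obj_feats):                 # obj_feats: (n_objects, obj_dim)
        n = obj_feats.size(0)
        subj = obj_feats.unsqueeze(1).expand(n, n, -1)    # subject varies along dim 0
        obj = obj_feats.unsqueeze(0).expand(n, n, -1)     # object varies along dim 1
        return self.mlp(torch.cat([subj, obj], dim=-1))   # (n, n, n_predicates)

def image_level_logits(pair_scores):
    """Weakly-supervised prediction: max over all pairs, per predicate."""
    return pair_scores.flatten(0, 1).max(dim=0).values    # (n_predicates,)

def explain(pair_scores, predicate_idx):
    """Recover the (subject, object) pair that best explains a predicted predicate."""
    n = pair_scores.size(0)
    best = pair_scores[..., predicate_idx].flatten().argmax().item()
    return divmod(best, n)                                 # (subject index, object index)
```

Training would apply a multi-label loss (e.g. torch.nn.BCEWithLogitsLoss) between image_level_logits and the image-level predicate labels, which is the only annotation this weakly-supervised setting assumes.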
7.
  • Baldassarre, Federico (author)
  • Structured Representations for Explainable Deep Learning
  • 2023
  • Doctoral thesis (other academic/artistic). Abstract:
    • Deep learning has revolutionized scientific research and is being used to take decisions in increasingly complex scenarios. With growing power comes a growing demand for transparency and interpretability. The field of Explainable AI aims to provide explanations for the predictions of AI systems. The state of the art of AI explainability, however, is far from satisfactory. For example, in Computer Vision, the most prominent post-hoc explanation methods produce pixel-wise heatmaps over the input domain, which are meant to visualize the importance of individual pixels of an image or video. We argue that such dense attribution maps are poorly interpretable to non-expert users because of the domain in which explanations are formed - we may recognize shapes in a heatmap but they are just blobs of pixels. In fact, the input domain is closer to the raw data of digital cameras than to the interpretable structures that humans use to communicate, e.g. objects or concepts. In this thesis, we propose to move beyond dense feature attributions by adopting structured internal representations as a more interpretable explanation domain. Conceptually, our approach splits a Deep Learning model in two: the perception step that takes as input dense representations and the reasoning step that learns to perform the task at hand. At the interface between the two are structured representations that correspond to well-defined objects, entities, and concepts. These representations serve as the interpretable domain for explaining the predictions of the model, allowing us to move towards more meaningful and informative explanations. The proposed approach introduces several challenges, such as how to obtain structured representations, how to use them for downstream tasks, and how to evaluate the resulting explanations. The works included in this thesis address these questions, validating the approach and providing concrete contributions to the field. For the perception step, we investigate how to obtain structured representations from dense representations, whether by manually designing them using domain knowledge or by learning them from data without supervision. For the reasoning step, we investigate how to use structured representations for downstream tasks, from Biology to Computer Vision, and how to evaluate the learned representations. For the explanation step, we investigate how to explain the predictions of models that operate in a structured domain, and how to evaluate the resulting explanations. Overall, we hope that this work inspires further research in Explainable AI and helps bridge the gap between high-performing Deep Learning models and the need for transparency and interpretability in real-world applications.
  •  
8.
  • Björkstrand, David, 1991-, et al. (author)
  • Cross-attention Masked Auto-Encoder for Human 3D Motion Infilling and Denoising
  • 2023
  • Conference paper (peer-reviewed). Abstract:
    • Human 3D pose and motion capture have numerous applications in fields such as augmented and virtual reality, animation, robotics and sports. However, even the best capturing methods suffer from artifacts such as missed joints and noisy or inaccurate joint positions. To address this, we propose the Cross-attention Masked Auto-Encoder (XMAE) for human 3D motion infilling and denoising. XMAE extends the original Masked Auto-Encoder design by introducing cross-attention in the decoder to deal with the train-test gap common in methods utilizing masking and mask tokens. Furthermore, we introduce joint displacement as an additional noise source during training, enabling XMAE to learn to correct incorrect joint positions. Through extensive experiments, we show XMAE's effectiveness compared to state-of-the-art approaches across three public datasets and its ability to denoise real-world data, reducing limb length standard deviation by 28% when applied on our in-the-wild professional soccer dataset.
  •  
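The two training-time corruptions named in the record above, masking a subset of joints and perturbing the remaining ones with displacement noise, can be sketched independently of the model, with reconstruction supervised against the clean motion. The cross-attention decoder of XMAE itself is not reproduced here, and the shapes, mask ratio and noise level are illustrative assumptions.

```python
import torch

def corrupt_motion(motion, mask_ratio=0.5, noise_std=0.02):
    """Build a corrupted input for masked-auto-encoder-style motion training.

    motion: (T, J, 3) tensor of clean 3D joint positions over T frames and J joints.
    Returns the corrupted motion and a boolean mask marking the hidden joints.
    """
    T, J, _ = motion.shape
    mask = torch.rand(T, J) < mask_ratio                    # True = joint is masked out
    noisy = motion + noise_std * torch.randn_like(motion)   # joint-displacement noise
    corrupted = noisy.clone()
    corrupted[mask] = 0.0                                    # masked joints replaced by a placeholder
    return corrupted, mask

def reconstruction_loss(pred, target, mask):
    """Train the auto-encoder to infill masked joints and to denoise the visible ones."""
    sq_err = (pred - target) ** 2
    return sq_err[mask].mean() + sq_err[~mask].mean()
```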
9.
  • Bujwid, Sebastian, et al. (author)
  • Large-Scale Zero-Shot Image Classification from Rich and Diverse Textual Descriptions
  • 2021
  • In: Proceedings of the Third Workshop on Beyond Vision and LANguage. Association for Computational Linguistics. pp. 38-52
  • Conference paper (peer-reviewed). Abstract:
    • We study the impact of using rich and diverse textual descriptions of classes for zero-shot learning (ZSL) on ImageNet. We create a new dataset, ImageNet-Wiki, that matches each ImageNet class to its corresponding Wikipedia article. We show that merely employing these Wikipedia articles as class descriptions yields much higher ZSL performance than prior works. Even a simple model using this type of auxiliary data outperforms state-of-the-art models that rely on standard features of word embedding encodings of class names. These results highlight the usefulness and importance of textual descriptions for ZSL, as well as the relative importance of the type of auxiliary data compared to algorithmic progress. Our experimental results also show that standard zero-shot learning approaches generalize poorly across categories of classes.
  •  
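A "simple model using this type of auxiliary data", in the sense of the record above, can be illustrated as follows: embed each class's Wikipedia article, learn a linear map from text-embedding space to visual feature space on the seen classes, and classify test images of unseen classes by the nearest mapped class embedding. The text and image encoders are left abstract here, and the ridge-regression mapping is an assumption for illustration, not the paper's exact model.

```python
import numpy as np
from sklearn.linear_model import Ridge

def fit_text_to_visual_map(class_text_emb, class_visual_proto, alpha=1.0):
    """Map class-description embeddings into the visual feature space.

    class_text_emb:     (n_seen_classes, d_text)   embeddings of the Wikipedia articles.
    class_visual_proto: (n_seen_classes, d_visual) mean image feature per seen class.
    """
    return Ridge(alpha=alpha).fit(class_text_emb, class_visual_proto)

def zero_shot_classify(image_feats, unseen_text_emb, mapping):
    """Assign each test image to the unseen class whose mapped description is closest."""
    class_protos = mapping.predict(unseen_text_emb)             # (n_unseen, d_visual)
    # Cosine similarity between image features and mapped class prototypes.
    a = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    b = class_protos / np.linalg.norm(class_protos, axis=1, keepdims=True)
    return (a @ b.T).argmax(axis=1)                              # predicted unseen-class index
```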
10.
  • Burenius, Magnus, 1983-, et al. (author)
  • 3D pictorial structures for multiple view articulated pose estimation
  • 2013
  • In: 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE Computer Society. pp. 3618-3625
  • Conference paper (peer-reviewed). Abstract:
    • We consider the problem of automatically estimating the 3D pose of humans from images, taken from multiple calibrated views. We show that it is possible and tractable to extend the pictorial structures framework, popular for 2D pose estimation, to 3D. We discuss how to use this framework to impose view, skeleton, joint angle and intersection constraints in 3D. The 3D pictorial structures are evaluated on multiple view data from a professional football game. The evaluation is focused on computational tractability, but we also demonstrate how a simple 2D part detector can be plugged into the framework.
  •  
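One building block of the multi-view framework in the record above, forming a 3D evidence volume for a body part by projecting candidate 3D positions into each calibrated view and pooling the 2D part-detector responses there, can be sketched as below. The skeleton, joint-angle and intersection constraints, and the discrete optimization over the part tree, are left out; the camera and heatmap conventions are assumptions.

```python
import numpy as np

def part_likelihood_volume(grid_points, cameras, heatmaps):
    """Score candidate 3D positions of one body part using multiple calibrated views.

    grid_points: (N, 3) candidate 3D positions (a discretised volume).
    cameras:     list of 3x4 projection matrices P, one per view.
    heatmaps:    list of (H, W) 2D part-detector response maps, one per view.
    Returns an (N,) array: detector response summed over the views.
    """
    scores = np.zeros(len(grid_points))
    homog = np.hstack([grid_points, np.ones((len(grid_points), 1))])   # (N, 4)
    for P, hm in zip(cameras, heatmaps):
        proj = homog @ P.T                        # (N, 3) homogeneous image points
        u = proj[:, 0] / proj[:, 2]
        v = proj[:, 1] / proj[:, 2]
        H, W = hm.shape
        inside = (u >= 0) & (u < W) & (v >= 0) & (v < H)
        scores[inside] += hm[v[inside].astype(int), u[inside].astype(int)]
    return scores   # would serve as the unary term of the 3D pictorial structures model
```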
Type of publication
conference paper (44)
journal article (6)
doctoral thesis (5)
other publication (2)
licentiate thesis (1)
patent (1)
Type of content
peer-reviewed (50)
other academic/artistic (8)
pop. science, debate, etc. (1)
Author/Editor
Sullivan, Josephine (39)
Carlsson, Stefan (25)
Sullivan, Josephine, ... (14)
Azizpour, Hossein, 1 ... (8)
Maki, Atsuto (8)
Sharif Razavian, Ali ... (5)
Razavian, Ali Sharif (5)
Zhong, Yang (5)
Li, Haibo (4)
Zhao, Yu (4)
Danielsson, Oscar, 1 ... (3)
Burenius, Magnus, 19 ... (3)
Aghazadeh, Omid, 198 ... (2)
Azizpour, Hossein (2)
Aghazadeh, Omid (2)
Baldassarre, Federic ... (2)
Smith, Kevin, 1975- (2)
Smith, Kevin, Associ ... (2)
Björkman, Mårten, 19 ... (2)
Ban, Yifang (2)
Loy, Gareth (2)
Nillius, Peter (2)
Kragic, Danica, 1971 ... (1)
Ek, Carl Henrik (1)
Carlsson, Stefan, Pr ... (1)
Halvorsen, Kjartan (1)
Bianchi, G (1)
Jensfelt, Patric, 19 ... (1)
Eriksson, Martin (1)
Carlssom, Stefan (1)
Argyros, Antonis (1)
Azizpour, Hossein, A ... (1)
Sullivan, Josephine, ... (1)
Pirsiavash, Hamed, A ... (1)
Sundblad, Yngve (1)
Söderberg, Magnus (1)
Björkstrand, David, ... (1)
Bretzner, Lars (1)
Wang, Tiesheng (1)
Kjellström, Hedvig, ... (1)
Bujwid, Sebastian (1)
Hayman, Eric (1)
Ban, Yifang, Profess ... (1)
Sullivan, Josephine, ... (1)
Schiele, Bernt, Prof ... (1)
Lindeberg, Tony, Pro ... (1)
Matsoukas, Christos (1)
Gamba, Matteo (1)
Chmielewski-Anders, ... (1)
Gerard, Sebastian (1)
University
Royal Institute of Technology (59)
Language
English (59)
Research subject (UKÄ/SCB)
Natural sciences (45)
Engineering and Technology (14)
