SwePub
Search the SwePub database


Results list for the search "L773:1939 3539"

Search: L773:1939 3539

  • Results 1-50 of 77
1.
  • Aanaes, H, et al. (author)
  • Robust factorization
  • 2002
  • In: IEEE Transactions on Pattern Analysis and Machine Intelligence. - 1939-3539. ; 24:9, pp. 1215-1225
  • Journal article (peer-reviewed), abstract:
    • Factorization algorithms for recovering structure and motion from an image stream have many advantages, but they usually require a set of well-tracked features. Such a set is generally not available in practical applications. There is thus a need for making factorization algorithms deal effectively with errors in the tracked features. We propose a new and computationally efficient algorithm for applying an arbitrary error function in the factorization scheme. This algorithm enables the use of robust statistical techniques and arbitrary noise models for the individual features. These techniques and models enable the factorization scheme to deal effectively with mismatched features, missing features, and noise on the individual features. The proposed approach further includes a new method for Euclidean reconstruction that significantly improves convergence of the factorization algorithms. The proposed algorithm has been implemented as a modification of the Christy-Horaud factorization scheme, which yields a perspective reconstruction. Based on this implementation, a considerable increase in error tolerance is demonstrated on real and synthetic data. The proposed scheme can, however, be applied to most other factorization algorithms.
  •  
2.
  • Abdelnour, Jerome, et al. (author)
  • NAAQA: A Neural Architecture for Acoustic Question Answering
  • 2022
  • In: IEEE Transactions on Pattern Analysis and Machine Intelligence. - : Institute of Electrical and Electronics Engineers (IEEE). - 0162-8828 .- 1939-3539 .- 2160-9292. ; , pp. 1-12
  • Journal article (peer-reviewed)
  •  
3.
  • Azizpour, Hossein, 1985-, et al. (author)
  • Factors of Transferability for a Generic ConvNet Representation
  • 2016
  • In: IEEE Transactions on Pattern Analysis and Machine Intelligence. - : IEEE Computer Society Digital Library. - 0162-8828 .- 1939-3539. ; 38:9, pp. 1790-1802
  • Journal article (peer-reviewed), abstract:
    • Evidence is mounting that Convolutional Networks (ConvNets) are the most effective representation learning method for visual recognition tasks. In the common scenario, a ConvNet is trained on a large labeled dataset (source) and the feed-forward units activation of the trained network, at a certain layer of the network, is used as a generic representation of an input image for a task with relatively smaller training set (target). Recent studies have shown this form of representation transfer to be suitable for a wide range of target visual recognition tasks. This paper introduces and investigates several factors affecting the transferability of such representations. It includes parameters for training of the source ConvNet such as its architecture, distribution of the training data, etc. and also the parameters of feature extraction such as layer of the trained ConvNet, dimensionality reduction, etc. Then, by optimizing these factors, we show that significant improvements can be achieved on various (17) visual recognition tasks. We further show that these visual recognition tasks can be categorically ordered based on their similarity to the source task such that a correlation between the performance of tasks and their similarity to the source task w.r.t. the proposed factors is observed.
  •  
4.
  • Balgi, Sourabh, 1991-, et al. (author)
  • Contradistinguisher : A Vapnik’s Imperative to Unsupervised Domain Adaptation
  • 2022
  • In: IEEE Transactions on Pattern Analysis and Machine Intelligence. - Piscataway, NJ, United States : Institute of Electrical and Electronics Engineers (IEEE). - 0162-8828 .- 1939-3539. ; 44:9, pp. 4730-4747
  • Journal article (peer-reviewed), abstract:
    • Recent domain adaptation works rely on an indirect way of first aligning the source and target domain distributions and then train a classifier on the labeled source domain to classify the target domain. However, the main drawback of this approach is that obtaining a near-perfect domain alignment in itself might be difficult/impossible (e.g., language domains). To address this, inspired by how humans use supervised-unsupervised learning to perform tasks seamlessly across multiple domains or tasks, we follow Vapnik’s imperative of statistical learning that states any desired problem should be solved in the most direct way rather than solving a more general intermediate task and propose a direct approach to domain adaptation that does not require domain alignment. We propose a model referred to as Contradistinguisher that learns contrastive features and whose objective is to jointly learn to contradistinguish the unlabeled target domain in an unsupervised way and classify in a supervised way on the source domain. We achieve the state-of-the-art on Office-31, Digits and VisDA-2017 datasets in both single-source and multi-source settings. We demonstrate that performing data augmentation results in an improvement in the performance over vanilla approach. We also notice that the contradistinguish-loss enhances performance by increasing the shape bias.
  •  
5.
  • Bigun, Josef, et al. (author)
  • Multidimensional orientation estimation with applications to texture analysis and optical flow
  • 1991
  • In: IEEE Transactions on Pattern Analysis and Machine Intelligence. - : Institute of Electrical and Electronics Engineers (IEEE). - 0162-8828 .- 1939-3539. ; 13:8, pp. 775-790
  • Journal article (peer-reviewed), abstract:
    • The problem of detection of orientation in finite dimensional Euclidean spaces is solved in the least squares sense. In particular, the theory is developed for the case when such orientation computations are necessary at all local neighborhoods of the n-dimensional Euclidean space. Detection of orientation is shown to correspond to fitting an axis or a plane to the Fourier transform of an n-dimensional structure. The solution of this problem is related to the solution of a well-known matrix eigenvalue problem. Moreover, it is shown that the necessary computations can be performed in the spatial domain without actually doing a Fourier transformation. Along with the orientation estimate, a certainty measure, based on the error of the fit, is proposed. Two applications in image analysis are considered: texture segmentation and optical flow. An implementation for 2-D (texture features) as well as 3-D (optical flow) is presented. In the case of 2-D, the method exploits the properties of the complex number field to by-pass the eigenvalue analysis, improving the speed and the numerical stability of the method. The theory is verified by experiments which confirm accurate orientation estimates and reliable certainty measures in the presence of noise. The comparative results indicate that the proposed theory produces algorithms computing robust texture features as well as optical flow. The computations are highly parallelizable and can be used in realtime image analysis since they utilize only elementary functions in a closed form (up to dimension 4) and Cartesian separable convolutions.
  •  
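The 2-D shortcut mentioned in entry 5 above, where the eigenvalue analysis is bypassed by exploiting the complex number field, is commonly realized with the squared complex gradient (double-angle) representation. The following is a minimal NumPy/SciPy sketch of that idea; the derivative and averaging scales and the certainty normalization are illustrative choices, not values from the paper.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def local_orientation(image, sigma_grad=1.0, sigma_avg=3.0):
        """Dense local orientation and certainty for a 2-D image.

        Squaring the complex gradient makes opposite gradient directions
        reinforce each other, so local averaging yields the dominant
        orientation without an explicit eigenvalue decomposition.
        """
        gx = gaussian_filter(image, sigma_grad, order=(0, 1))  # d/dx
        gy = gaussian_filter(image, sigma_grad, order=(1, 0))  # d/dy
        z = (gx + 1j * gy) ** 2                                # double-angle field
        z_avg = gaussian_filter(z.real, sigma_avg) + 1j * gaussian_filter(z.imag, sigma_avg)
        energy = gaussian_filter(np.abs(z), sigma_avg)
        orientation = 0.5 * np.angle(z_avg)                    # axis angle, modulo pi
        certainty = np.abs(z_avg) / (energy + 1e-12)           # 1 = single orientation
        return orientation, certainty

The certainty approaches 1 in neighborhoods dominated by a single orientation and drops toward 0 for isotropic or noisy structure, mirroring the error-of-fit certainty measure described in the abstract.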
6.
  • Bigun, Josef, 1961-, et al. (author)
  • Recognition by symmetry derivatives and the generalized structure tensor
  • 2004
  • In: IEEE Transactions on Pattern Analysis and Machine Intelligence. - Los Alamitos, USA : IEEE Computer Society. - 0162-8828 .- 1939-3539. ; 26:12, pp. 1590-1605
  • Journal article (peer-reviewed), abstract:
    • We suggest a set of complex differential operators that can be used to produce and filter dense orientation (tensor) fields for feature extraction, matching, and pattern recognition. We present results on the invariance properties of these operators, that we call symmetry derivatives. These show that, in contrast to ordinary derivatives, all orders of symmetry derivatives of Gaussians yield a remarkable invariance: they are obtained by replacing the original differential polynomial with the same polynomial, but using ordinary coordinates x and y corresponding to partial derivatives. Moreover, the symmetry derivatives of Gaussians are closed under the convolution operator and they are invariant to the Fourier transform. The equivalent of the structure tensor, representing and extracting orientations of curve patterns, had previously been shown to hold in harmonic coordinates in a nearly identical manner. As a result, positions, orientations, and certainties of intricate patterns, e.g., spirals, crosses, parabolic shapes, can be modeled by use of symmetry derivatives of Gaussians with greater analytical precision as well as computational efficiency. Since Gaussians and their derivatives are utilized extensively in image processing, the revealed properties have practical consequences for local orientation based feature extraction. The usefulness of these results is demonstrated by two applications: tracking cross markers in long image sequences from vehicle crash tests and alignment of noisy fingerprints.
  •  
7.
  • Björkman, Mårten, et al. (author)
  • Real-time epipolar geometry estimation of binocular stereo heads
  • 2002
  • In: IEEE Transactions on Pattern Analysis and Machine Intelligence. - : Institute of Electrical and Electronics Engineers (IEEE). - 0162-8828 .- 1939-3539. ; 24:3, pp. 425-432
  • Journal article (peer-reviewed), abstract:
    • Stereo is an important cue for visually guided robots. While moving around in the world, such a robot can use dynamic fixation to overcome limitations in image resolution and field of view. In this paper, a binocular stereo system capable of dynamic fixation is presented. The external calibration is performed continuously taking temporal consistency into consideration, greatly simplifying the process. The essential matrix, which is estimated in real-time, is used to describe the epipolar geometry. It will be shown how outliers can be identified and excluded from the calculations. An iterative approach based on a differential model of the optical flow, commonly used in structure from motion, is also presented and tested towards the essential matrix. The iterative method will be shown to be superior in terms of both computational speed and robustness, when the vergence angles are less than about 15 degrees. For larger angles, the differential model is insufficient and the essential matrix is preferably used instead.
  •  
8.
  • Cao, Jiale, et al. (author)
  • From Handcrafted to Deep Features for Pedestrian Detection : A Survey
  • 2022
  • In: IEEE Transactions on Pattern Analysis and Machine Intelligence. - New York : IEEE. - 0162-8828 .- 1939-3539. ; 44:9, pp. 4913-4934
  • Journal article (peer-reviewed), abstract:
    • Pedestrian detection is an important but challenging problem in computer vision, especially in human-centric tasks. Over the past decade, significant improvement has been witnessed with the help of handcrafted features and deep features. Here we present a comprehensive survey on recent advances in pedestrian detection. First, we provide a detailed review of single-spectral pedestrian detection that includes handcrafted features based methods and deep features based approaches. For handcrafted features based methods, we present an extensive review of approaches and find that handcrafted features with large freedom degrees in shape and space have better performance. In the case of deep features based approaches, we split them into pure CNN based methods and those employing both handcrafted and CNN based features. We give the statistical analysis and tendency of these methods, where feature enhanced, part-aware, and post-processing methods have attracted main attention. In addition to single-spectral pedestrian detection, we also review multi-spectral pedestrian detection, which provides more robust features for illumination variance. Furthermore, we introduce some related datasets and evaluation metrics, and a deep experimental analysis. We conclude this survey by emphasizing open problems that need to be addressed and highlighting various future directions. Researchers can track an up-to-date list at https://github.com/JialeCao001/PedSurvey.
  •  
9.
  • Cao, Jiale, et al. (author)
  • SipMaskv2: Enhanced Fast Image and Video Instance Segmentation
  • 2023
  • In: IEEE Transactions on Pattern Analysis and Machine Intelligence. - : IEEE. - 0162-8828 .- 1939-3539 .- 2160-9292. ; 45:3, pp. 3798-3812
  • Journal article (peer-reviewed), abstract:
    • We propose a fast single-stage method for both image and video instance segmentation, called SipMask, that preserves the instance spatial information by performing multiple sub-region mask predictions. The main module in our method is a light-weight spatial preservation (SP) module that generates a separate set of spatial coefficients for the sub-regions within a bounding-box, enabling a better delineation of spatially adjacent instances. To better correlate mask prediction with object detection, we further propose a mask alignment weighting loss and a feature alignment scheme. In addition, we identify two issues that impede the performance of single-stage instance segmentation and introduce two modules, including a sample selection scheme and an instance refinement module, to address these two issues. Experiments are performed on both image instance segmentation dataset MS COCO and video instance segmentation dataset YouTube-VIS. On MS COCO test-dev set, our method achieves a state-of-the-art performance. In terms of real-time capabilities, it outperforms YOLACT by a gain of 3.0% (mask AP) under the similar settings, while operating at a comparable speed. On YouTube-VIS validation set, our method also achieves promising results. The source code is available at https://github.com/JialeCao001/SipMask.
  •  
10.
  • Carreira, Joao, et al. (author)
  • Free-Form Region Description with Second-Order Pooling
  • 2015
  • In: IEEE Transactions on Pattern Analysis and Machine Intelligence. - 1939-3539. ; 37:6, pp. 1177-1189
  • Journal article (peer-reviewed), abstract:
    • Semantic segmentation and object detection are nowadays dominated by methods operating on regions obtained as a result of a bottom-up grouping process (segmentation) but use feature extractors developed for recognition on fixed-form (e.g. rectangular) patches, with full images as a special case. This is most likely suboptimal. In this paper we focus on feature extraction and description over free-form regions and study the relationship with their fixed-form counterparts. Our main contributions are novel pooling techniques that capture the second-order statistics of local descriptors inside such free-form regions. We introduce second-order generalizations of average and max-pooling that together with appropriate non-linearities, derived from the mathematical structure of their embedding space, lead to state-of-the-art recognition performance in semantic segmentation experiments without any type of local feature coding. In contrast, we show that codebook-based local feature coding is more important when feature extraction is constrained to operate over regions that include both foreground and large portions of the background, as typical in image classification settings, whereas for high-accuracy localization setups, second-order pooling over free-form regions produces results superior to those of the winning systems in the contemporary semantic segmentation challenges, with models that are much faster in both training and testing.
  •  
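As a rough illustration of the second-order average pooling described in entry 10 above, the sketch below averages outer products of local descriptors inside a region and passes the result through a matrix logarithm, the kind of non-linearity the abstract derives from the geometry of the embedding space. The regularization constant is a placeholder, and the paper's additional normalizations are omitted.

    import numpy as np

    def second_order_avg_pool(descriptors, eps=1e-3):
        """Second-order average pooling of local descriptors.

        descriptors: (n, d) array of local features from one free-form region.
        Returns the upper triangle of log(mean outer product), length d*(d+1)/2.
        """
        n, d = descriptors.shape
        g = descriptors.T @ descriptors / n       # mean of outer products x x^T
        g += eps * np.eye(d)                      # keep it positive definite
        w, v = np.linalg.eigh(g)
        log_g = (v * np.log(w)) @ v.T             # matrix logarithm via eigendecomposition
        iu = np.triu_indices(d)
        return log_g[iu]

    # Example: pool 500 random 64-dimensional descriptors from one region.
    pooled = second_order_avg_pool(np.random.rand(500, 64))   # length 2080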
11.
  • Danelljan, Martin, 1989-, et al. (author)
  • Discriminative Scale Space Tracking
  • 2017
  • In: IEEE Transactions on Pattern Analysis and Machine Intelligence. - : IEEE COMPUTER SOC. - 0162-8828 .- 1939-3539. ; 39:8, pp. 1561-1575
  • Journal article (peer-reviewed), abstract:
    • Accurate scale estimation of a target is a challenging research problem in visual object tracking. Most state-of-the-art methods employ an exhaustive scale search to estimate the target size. The exhaustive search strategy is computationally expensive and struggles when encountered with large scale variations. This paper investigates the problem of accurate and robust scale estimation in a tracking-by-detection framework. We propose a novel scale adaptive tracking approach by learning separate discriminative correlation filters for translation and scale estimation. The explicit scale filter is learned online using the target appearance sampled at a set of different scales. Contrary to standard approaches, our method directly learns the appearance change induced by variations in the target scale. Additionally, we investigate strategies to reduce the computational cost of our approach. Extensive experiments are performed on the OTB and the VOT2014 datasets. Compared to the standard exhaustive scale search, our approach achieves a gain of 2.5 percent in average overlap precision on the OTB dataset. Additionally, our method is computationally efficient, operating at a 50 percent higher frame rate compared to the exhaustive scale search. Our method obtains the top rank in performance by outperforming 19 state-of-the-art trackers on OTB and 37 state-of-the-art trackers on VOT2014.
  •  
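The scale filter summarized in entry 11 above is a one-dimensional discriminative correlation filter learned over a set of scale samples. Below is a heavily simplified Fourier-domain sketch of such a filter; the feature extraction, label width, regularization, and online update scheme are placeholder choices rather than the paper's exact settings.

    import numpy as np

    def train_scale_filter(Z, sigma=1.0, lam=1e-2):
        """Learn a 1-D correlation filter over the scale dimension.

        Z: (d, S) matrix with one d-dimensional feature vector per scale sample,
           the current target size sitting at the central column.
        """
        d, S = Z.shape
        shift = np.arange(S) - S // 2
        g = np.exp(-0.5 * (shift / sigma) ** 2)       # desired Gaussian response
        G = np.fft.fft(g)
        Zf = np.fft.fft(Z, axis=1)
        A = np.conj(G)[None, :] * Zf                  # per-channel numerator
        B = np.sum(Zf * np.conj(Zf), axis=0) + lam    # shared denominator
        return A, B

    def detect_scale(A, B, Z_new):
        """Index of the highest filter response; its offset from the central
        column indicates the relative scale change of the target."""
        Zf = np.fft.fft(Z_new, axis=1)
        response = np.real(np.fft.ifft(np.sum(np.conj(A) * Zf, axis=0) / B))
        return int(np.argmax(response))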
12.
  • Demisse, G. G., et al. (author)
  • Deformation Based Curved Shape Representation
  • 2018
  • In: IEEE Transactions on Pattern Analysis and Machine Intelligence. - : IEEE Computer Society. - 0162-8828 .- 1939-3539. ; 40:6, pp. 1338-1351
  • Journal article (peer-reviewed), abstract:
    • In this paper, we introduce a deformation based representation space for curved shapes in R^n. Given an ordered set of points sampled from a curved shape, the proposed method represents the set as an element of a finite dimensional matrix Lie group. Variation due to scale and location are filtered in a preprocessing stage, while shapes that vary only in rotation are identified by an equivalence relationship. The use of a finite dimensional matrix Lie group leads to a similarity metric with an explicit geodesic solution. Subsequently, we discuss some of the properties of the metric and its relationship with a deformation by least action. Furthermore, invariance to reparametrization or estimation of point correspondence between shapes is formulated as an estimation of sampling function. Thereafter, two possible approaches are presented to solve the point correspondence estimation problem. Finally, we propose an adaptation of k-means clustering for shape analysis in the proposed representation space. Experimental results show that the proposed representation is robust to uninformative cues, e.g., local shape perturbation and displacement. In comparison to state-of-the-art methods, it achieves a high precision on the Swedish and the Flavia leaf datasets and a comparable result on MPEG-7, Kimia99 and Kimia216 datasets.
  •  
13.
  • Dombrowski, Ann Kathrin, et al. (author)
  • Diffeomorphic Counterfactuals with Generative Models
  • 2024
  • In: IEEE Transactions on Pattern Analysis and Machine Intelligence. - 1939-3539 .- 0162-8828. ; 46:5, pp. 3257-3274
  • Journal article (peer-reviewed), abstract:
    • Counterfactuals can explain classification decisions of neural networks in a human interpretable way. We propose a simple but effective method to generate such counterfactuals. More specifically, we perform a suitable diffeomorphic coordinate transformation and then perform gradient ascent in these coordinates to find counterfactuals which are classified with great confidence as a specified target class. We propose two methods to leverage generative models to construct such suitable coordinate systems that are either exactly or approximately diffeomorphic. We analyze the generation process theoretically using Riemannian differential geometry and validate the quality of the generated counterfactuals using various qualitative and quantitative measures.
  •  
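The generation step outlined in entry 13 above, gradient ascent in the coordinates supplied by a generative model, can be sketched as a short optimization loop. This is only a schematic PyTorch-style sketch under the assumption that a differentiable decoder (for instance the generator of a GAN or a normalizing flow) and a classifier are available; decoder, classifier, the decoder's encode method, and the step sizes are hypothetical placeholders, not components released with the paper.

    import torch

    def latent_counterfactual(decoder, classifier, x, target_class,
                              steps=200, lr=0.05):
        """Gradient ascent in latent coordinates toward a target class."""
        # Assumes the decoder exposes an inverse mapping (e.g. a flow); this is
        # an assumption for the sketch, not part of any specific library.
        z = decoder.encode(x).detach().clone().requires_grad_(True)
        opt = torch.optim.Adam([z], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            logits = classifier(decoder(z))
            # Maximize the log-probability of the requested target class.
            loss = -torch.log_softmax(logits, dim=1)[0, target_class]
            loss.backward()
            opt.step()
        return decoder(z).detach()   # counterfactual image classified as target_class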
14.
  • Duff, Timothy, et al. (author)
  • PLMP : Point-Line Minimal Problems in Complete Multi-View Visibility
  • 2024
  • In: IEEE Transactions on Pattern Analysis and Machine Intelligence. - : Institute of Electrical and Electronics Engineers (IEEE). - 0162-8828 .- 1939-3539. ; 46:1, pp. 421-435
  • Journal article (peer-reviewed), abstract:
    • We present a complete classification of all minimal problems for generic arrangements of points and lines completely observed by calibrated perspective cameras. We show that there are only 30 minimal problems in total, no problems exist for more than 6 cameras, for more than 5 points, and for more than 6 lines. We present a sequence of tests for detecting minimality starting with counting degrees of freedom and ending with full symbolic and numeric verification of representative examples. For all minimal problems discovered, we present their algebraic degrees, i.e., the number of solutions, which measure their intrinsic difficulty. It shows how exactly the difficulty of problems grows with the number of views. Importantly, several new minimal problems have small degrees that might be practical in image matching and 3D reconstruction.
  •  
15.
  • Eldesokey, Abdelrahman, et al. (author)
  • Confidence Propagation through CNNs for Guided Sparse Depth Regression
  • 2020
  • In: IEEE Transactions on Pattern Analysis and Machine Intelligence. - : IEEE. - 0162-8828 .- 1939-3539. ; 42:10
  • Journal article (peer-reviewed), abstract:
    • Generally, convolutional neural networks (CNNs) process data on a regular grid, e.g. data generated by ordinary cameras. Designing CNNs for sparse and irregularly spaced input data is still an open research problem with numerous applications in autonomous driving, robotics, and surveillance. In this paper, we propose an algebraically-constrained normalized convolution layer for CNNs with highly sparse input that has a smaller number of network parameters compared to related work. We propose novel strategies for determining the confidence from the convolution operation and propagating it to consecutive layers. We also propose an objective function that simultaneously minimizes the data error while maximizing the output confidence. To integrate structural information, we also investigate fusion strategies to combine depth and RGB information in our normalized convolution network framework. In addition, we introduce the use of output confidence as an auxiliary information to improve the results. The capabilities of our normalized convolution network framework are demonstrated for the problem of scene depth completion. Comprehensive experiments are performed on the KITTI-Depth and the NYU-Depth-v2 datasets. The results clearly demonstrate that the proposed approach achieves superior performance while requiring only about 1-5% of the number of parameters compared to the state-of-the-art methods.
  •  
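A building block behind entry 15 above is normalized convolution, in which sparse data are filtered jointly with a confidence map. The sketch below shows the classic non-learned variant with a fixed Gaussian applicability kernel; the paper's learned, algebraically constrained filters and its trained confidence propagation are not reproduced here.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def normalized_convolution(depth, confidence, sigma=2.0, eps=1e-8):
        """Filter sparse depth with a Gaussian applicability function.

        depth:      (H, W) array; values are arbitrary where confidence == 0.
        confidence: (H, W) array in [0, 1], e.g. 1 where a depth sample exists.
        """
        num = gaussian_filter(depth * confidence, sigma)
        den = gaussian_filter(confidence, sigma)
        filtered = num / (den + eps)
        # Simple propagated confidence: locally accumulated input confidence,
        # scaled so that a fully confident neighborhood maps to 1.
        out_conf = den / (gaussian_filter(np.ones_like(confidence), sigma) + eps)
        return filtered, out_conf

    # Example: densify a synthetic depth ramp observed at 5% of the pixels.
    h, w = 64, 64
    gt = np.tile(np.linspace(1.0, 3.0, w), (h, 1))
    conf = (np.random.rand(h, w) < 0.05).astype(float)
    dense, c = normalized_convolution(gt * conf, conf)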
16.
  • Eriksson, Anders, et al. (author)
  • Rotation Averaging with the Chordal Distance: Global Minimizers and Strong Duality
  • 2021
  • In: IEEE Transactions on Pattern Analysis and Machine Intelligence. - 1939-3539 .- 0162-8828. ; 43:1, pp. 256-268
  • Journal article (peer-reviewed), abstract:
    • In this paper we explore the role of duality principles within the problem of rotation averaging, a fundamental task in a wide range of applications. In its conventional form, rotation averaging is stated as a minimization over multiple rotation constraints. As these constraints are non-convex, this problem is generally considered challenging to solve globally. We show how to circumvent this difficulty through the use of Lagrangian duality. While such an approach is well-known it is normally not guaranteed to provide a tight relaxation. Based on spectral graph theory, we analytically prove that in many cases there is no duality gap unless the noise levels are severe. This allows us to obtain certifiably global solutions to a class of important non-convex problems in polynomial time. We also propose an efficient, scalable algorithm that outperforms general purpose numerical solvers by a large margin and compares favourably to current state-of-the-art. Further, our approach is able to handle the large problem instances commonly occurring in structure from motion settings and it is trivially parallelizable. Experiments are presented for a number of different instances of both synthetic and real-world data.
  •  
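For reference, the chordal distance used in entry 16 above is the Frobenius distance between rotation matrices, and the single-rotation chordal mean has a closed form via an SVD projection onto SO(3). The snippet illustrates only these two ingredients; it does not implement the paper's Lagrangian-duality solver for the full multi-rotation problem.

    import numpy as np

    def chordal_distance(R1, R2):
        """Chordal distance between two rotation matrices."""
        return np.linalg.norm(R1 - R2, ord="fro")

    def chordal_mean(rotations):
        """Rotation minimizing the sum of squared chordal distances to the inputs:
        the projection of the arithmetic mean onto SO(3) via SVD, with a sign
        correction to keep the determinant equal to +1."""
        M = np.mean(rotations, axis=0)
        U, _, Vt = np.linalg.svd(M)
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
        return U @ D @ Vt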
17.
  • Evain, S., et al. (author)
  • A Lightweight Neural Network for Monocular View Generation with Occlusion Handling
  • 2021
  • In: IEEE Transactions on Pattern Analysis and Machine Intelligence. - : IEEE Computer Society. - 0162-8828 .- 1939-3539. ; 43:6, pp. 1832-1844
  • Journal article (peer-reviewed), abstract:
    • In this article, we present a very lightweight neural network architecture, trained on stereo data pairs, which performs view synthesis from one single image. With the growing success of multi-view formats, this problem is indeed increasingly relevant. The network returns a prediction built from disparity estimation, which fills in wrongly predicted regions using an occlusion handling technique. To do so, during training, the network learns to estimate the left-right consistency structural constraint on the pair of stereo input images, to be able to replicate it at test time from one single image. The method is built upon the idea of blending two predictions: a prediction based on disparity estimation and a prediction based on direct minimization in occluded regions. The network is also able to identify these occluded areas at training and at test time by checking the pixelwise left-right consistency of the produced disparity maps. At test time, the approach can thus generate a left-side and a right-side view from one input image, as well as a depth map and a pixelwise confidence measure in the prediction. The work outperforms visually and metric-wise state-of-the-art approaches on the challenging KITTI dataset, all while reducing by a very significant order of magnitude (5 or 10 times) the required number of parameters (6.5 M). © 1979-2012 IEEE.
  •  
18.
  • Felsberg, Michael, et al. (author)
  • Channel smoothing : Efficient robust smoothing of low-level signal features
  • 2006
  • In: IEEE Transactions on Pattern Analysis and Machine Intelligence. - 0162-8828 .- 1939-3539. ; 28:2, pp. 209-222
  • Journal article (peer-reviewed), abstract:
    • In this paper, we present a new and efficient method to implement robust smoothing of low-level signal features: B-spline channel smoothing. This method consists of three steps: encoding of the signal features into channels, averaging of the channels, and decoding of the channels. We show that linear smoothing of channels is equivalent to robust smoothing of the signal features if we make use of quadratic B-splines to generate the channels. The linear decoding from B-spline channels allows the derivation of a robust error norm, which is very similar to Tukey's biweight error norm. We compare channel smoothing with three other robust smoothing techniques: nonlinear diffusion, bilateral filtering, and mean-shift filtering, both theoretically and on a 2D orientation-data smoothing task. Channel smoothing is found to be superior in four respects: It has a lower computational complexity, it is easy to implement, it chooses the global minimum error instead of the nearest local minimum, and it can also be used on nonlinear spaces, such as orientation space. © 2006 IEEE.
  •  
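The encode-average-decode pipeline of entry 18 above can be illustrated for a 1-D signal as follows. The quadratic B-spline encoding matches the abstract; the decoding below is a local center-of-mass over the strongest group of three neighbouring channels, which should be read as a hedged approximation of the paper's linear decoding rather than an exact reimplementation, and the channel placement and averaging window are arbitrary.

    import numpy as np

    def bspline2(x):
        """Quadratic B-spline kernel with support [-1.5, 1.5]."""
        ax = np.abs(x)
        out = np.where(ax <= 0.5, 0.75 - ax ** 2, 0.0)
        return np.where((ax > 0.5) & (ax <= 1.5), 0.5 * (1.5 - ax) ** 2, out)

    def channel_encode(signal, centers):
        """Encode each sample into channel coefficients, shape (n_samples, n_channels)."""
        spacing = centers[1] - centers[0]
        return bspline2((signal[:, None] - centers[None, :]) / spacing)

    def channel_decode(channels, centers):
        """Decode one value per sample from the strongest group of three channels."""
        group = channels[:, :-2] + channels[:, 1:-1] + channels[:, 2:]
        j = np.argmax(group, axis=1)                    # start of the best group
        idx = j[:, None] + np.arange(3)[None, :]
        w = np.take_along_axis(channels, idx, axis=1)
        return np.sum(w * centers[idx], axis=1) / (np.sum(w, axis=1) + 1e-12)

    # Robust smoothing of a noisy step edge: encode, average spatially, decode.
    x = np.concatenate([np.zeros(50), np.ones(50)]) + 0.1 * np.random.randn(100)
    centers = np.linspace(-0.5, 1.5, 9)
    ch = channel_encode(x, centers)
    box = np.ones(7) / 7
    ch_avg = np.apply_along_axis(lambda c: np.convolve(c, box, mode="same"), 0, ch)
    y = channel_decode(ch_avg, centers)

Unlike box filtering of the signal itself, the averaging happens in channel space, so the decoded edge stays sharp instead of being blurred across the step, which is the robust-smoothing behavior the abstract refers to.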
19.
  • Felsberg, Michael, 1974-, et al. (author)
  • Online Learning of Correspondences between Images
  • 2013
  • In: IEEE Transactions on Pattern Analysis and Machine Intelligence. - : IEEE Computer Society. - 0162-8828 .- 1939-3539. ; 35:1, pp. 118-129
  • Journal article (peer-reviewed), abstract:
    • We propose a novel method for iterative learning of point correspondences between image sequences. Points moving on surfaces in 3D space are projected into two images. Given a point in either view, the considered problem is to determine the corresponding location in the other view. The geometry and distortions of the projections are unknown as is the shape of the surface. Given several pairs of point-sets but no access to the 3D scene, correspondence mappings can be found by excessive global optimization or by the fundamental matrix if a perspective projective model is assumed. However, an iterative solution on sequences of point-set pairs with general imaging geometry is preferable. We derive such a method that optimizes the mapping based on Neyman's chi-square divergence between the densities representing the uncertainties of the estimated and the actual locations. The densities are represented as channel vectors computed with a basis function approach. The mapping between these vectors is updated with each new pair of images such that fast convergence and high accuracy are achieved. The resulting algorithm runs in real-time and is superior to state-of-the-art methods in terms of convergence and accuracy in a number of experiments.
  •  
20.
  • Fukui, Kazuhiro, et al. (author)
  • Difference subspace and its generalization for subspace-based methods
  • 2015
  • In: IEEE Transactions on Pattern Analysis and Machine Intelligence. - 0162-8828 .- 1939-3539. ; 37:11, pp. 2164-2177
  • Journal article (peer-reviewed), abstract:
    • Subspace-based methods are known to provide a practical solution for image set-based object recognition. Based on the insight that local shape differences between objects offer a sensitive cue for recognition, this paper addresses the problem of extracting a subspace representing the difference components between class subspaces generated from each set of object images independently of each other. We first introduce the difference subspace (DS), a novel geometric concept between two subspaces as an extension of a difference vector between two vectors, and describe its effectiveness in analyzing shape differences. We then generalize it to the generalized difference subspace (GDS) for multi-class subspaces, and show the benefit of applying this to subspace and mutual subspace methods, in terms of recognition capability. Furthermore, we extend these methods to kernel DS (KDS) and kernel GDS (KGDS) by a nonlinear kernel mapping to deal with cases involving larger changes in viewing direction. In summary, the contributions of this paper are as follows: 1) a DS/KDS between two class subspaces characterizes shape differences between the two respectively corresponding objects, 2) the projection of an input vector onto a DS/KDS realizes selective visualization of shape differences between objects, and 3) the projection of an input vector or subspace onto a GDS/KGDS is extremely effective at extracting differences between multiple subspaces, and therefore improves object recognition performance. We demonstrate validity through shape analysis on synthetic and real images of 3D objects as well as extensive comparison of performance on classification tests with several related methods; we study the performance in face image classification on the Yale face database B+ and the CMU Multi-PIE database, and hand shape classification of multi-view images.
  •  
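One way to read the generalized difference subspace (GDS) of entry 20 above is through the eigendecomposition of the sum of the class projection matrices, keeping the directions inside the sum subspace that have the smallest eigenvalues. The snippet below follows that reading as a simplified sketch; the number of discarded principal directions and the subspace dimensions are application-dependent placeholder choices.

    import numpy as np

    def class_subspace(X, dim):
        """Orthonormal basis (columns) of a class subspace from sample rows X."""
        U, _, _ = np.linalg.svd(X.T, full_matrices=False)
        return U[:, :dim]

    def generalized_difference_subspace(bases, n_discard, tol=1e-10):
        """Eigenvectors of the summed projection matrices, with the n_discard
        most principal (shared) directions removed."""
        d = bases[0].shape[0]
        G = np.zeros((d, d))
        for U in bases:
            G += U @ U.T                      # projection matrix of one class subspace
        w, V = np.linalg.eigh(G)              # ascending eigenvalues
        V = V[:, w > tol]                     # keep directions inside the sum subspace
        return V[:, : V.shape[1] - n_discard]

    # Example with two synthetic classes in 20 dimensions.
    rng = np.random.default_rng(0)
    X1, X2 = rng.normal(size=(100, 20)), rng.normal(size=(100, 20))
    D = generalized_difference_subspace([class_subspace(X1, 5), class_subspace(X2, 5)], 2)
    projected = X1 @ D                        # features emphasizing class differences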
21.
  • Fukui, Kazuhiro, et al. (author)
  • Discriminant feature extraction by generalized difference subspace
  • 2023
  • In: IEEE Transactions on Pattern Analysis and Machine Intelligence. - : Institute of Electrical and Electronics Engineers (IEEE). - 0162-8828 .- 1939-3539. ; 45:2, pp. 1618-1635
  • Journal article (peer-reviewed), abstract:
    • This paper reveals the discriminant ability of the orthogonal projection of data onto a generalized difference subspace (GDS) both theoretically and experimentally. In our previous work, we have demonstrated that GDS projection works as the quasi-orthogonalization of class subspaces. Interestingly, GDS projection also works as a discriminant feature extraction through a similar mechanism to the Fisher discriminant analysis (FDA). A direct proof of the connection between GDS projection and FDA is difficult due to the significant difference in their formulations. To avoid the difficulty, we first introduce geometrical Fisher discriminant analysis (gFDA) based on a simplified Fisher criterion. gFDA can work stably even under few samples, bypassing the small sample size (SSS) problem of FDA. Next, we prove that gFDA is equivalent to GDS projection with a small correction term. This equivalence ensures GDS projection to inherit the discriminant ability from FDA via gFDA. Furthermore, we discuss two useful extensions of these methods, 1) nonlinear extension by kernel trick, 2) the combination of convolutional neural network (CNN) features. The equivalence and the effectiveness of the extensions have been verified through extensive experiments on the extended Yale B+, CMU face database, ALOI, ETH80, MNIST and CIFAR10, focusing on the SSS problem.
  •  
22.
  • Gallego, Guillermo, et al. (author)
  • Event-Based Vision : A Survey
  • 2022
  • In: IEEE Transactions on Pattern Analysis and Machine Intelligence. - : Institute of Electrical and Electronics Engineers (IEEE). - 0162-8828 .- 1939-3539. ; 44:1, pp. 154-180
  • Journal article (peer-reviewed), abstract:
    • Event cameras are bio-inspired sensors that differ from conventional frame cameras: Instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes, and output a stream of events that encode the time, location and sign of the brightness changes. Event cameras offer attractive properties compared to traditional cameras: high temporal resolution (on the order of μs), very high dynamic range (140 dB versus 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz) resulting in reduced motion blur. Hence, event cameras have a large potential for robotics and computer vision in challenging scenarios for traditional cameras, such as low-latency, high speed, and high dynamic range. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras. We present event cameras from their working principle, the actual sensors that are available and the tasks that they have been used for, from low-level vision (feature detection and tracking, optic flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Additionally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world.
  •  
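To make the data format surveyed in entry 22 above concrete, the sketch below defines a minimal event record (timestamp, pixel position, polarity) and accumulates a batch of events into a per-pixel polarity sum over a time window. This accumulation is only one of the simplest event representations and is shown purely as an illustration.

    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class Event:
        t: float        # timestamp in seconds
        x: int          # pixel column
        y: int          # pixel row
        polarity: int   # +1 for a brightness increase, -1 for a decrease

    def accumulate(events, height, width, t_start, t_end):
        """Sum event polarities per pixel over [t_start, t_end) into a 2-D frame."""
        frame = np.zeros((height, width), dtype=np.int32)
        for e in events:
            if t_start <= e.t < t_end:
                frame[e.y, e.x] += e.polarity
        return frame

    # Example: three events accumulated over a 10 ms window on a 4x4 sensor.
    evs = [Event(0.001, 1, 2, +1), Event(0.004, 1, 2, +1), Event(0.009, 3, 0, -1)]
    print(accumulate(evs, 4, 4, 0.0, 0.010))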
23.
  • Gupta, Akshita, et al. (author)
  • Generative Multi-Label Zero-Shot Learning
  • 2023
  • In: IEEE Transactions on Pattern Analysis and Machine Intelligence. - : IEEE COMPUTER SOC. - 0162-8828 .- 1939-3539. ; 45:12, pp. 14611-14624
  • Journal article (peer-reviewed), abstract:
    • Multi-label zero-shot learning strives to classify images into multiple unseen categories for which no data is available during training. The test samples can additionally contain seen categories in the generalized variant. Existing approaches rely on learning either shared or label-specific attention from the seen classes. Nevertheless, computing reliable attention maps for unseen classes during inference in a multi-label setting is still a challenge. In contrast, state-of-the-art single-label generative adversarial network (GAN) based approaches learn to directly synthesize the class-specific visual features from the corresponding class attribute embeddings. However, synthesizing multi-label features from GANs is still unexplored in the context of zero-shot setting. When multiple objects occur jointly in a single image, a critical question is how to effectively fuse multi-class information. In this work, we introduce different fusion approaches at the attribute-level, feature-level and cross-level (across attribute and feature-levels) for synthesizing multi-label features from their corresponding multi-label class embeddings. To the best of our knowledge, our work is the first to tackle the problem of multi-label feature synthesis in the (generalized) zero-shot setting. Our cross-level fusion-based generative approach outperforms the state-of-the-art on three zero-shot benchmarks: NUS-WIDE, Open Images and MS COCO. Furthermore, we show the generalization capabilities of our fusion approach in the zero-shot detection task on MS COCO, achieving favorable performance against existing methods.
  •  
24.
  • Henter, Gustav Eje, et al. (author)
  • Minimum entropy rate simplification of stochastic processes
  • 2016
  • In: IEEE Transactions on Pattern Analysis and Machine Intelligence. - : IEEE. - 0162-8828 .- 1939-3539. ; 38:12, pp. 2487-2500
  • Journal article (peer-reviewed), abstract:
    • We propose minimum entropy rate simplification (MERS), an information-theoretic, parameterization-independent framework for simplifying generative models of stochastic processes. Applications include improving model quality for sampling tasks by concentrating the probability mass on the most characteristic and accurately described behaviors while de-emphasizing the tails, and obtaining clean models from corrupted data (nonparametric denoising). This is the opposite of the smoothing step commonly applied to classification models. Drawing on rate-distortion theory, MERS seeks the minimum entropy-rate process under a constraint on the dissimilarity between the original and simplified processes. We particularly investigate the Kullback-Leibler divergence rate as a dissimilarity measure, where, compatible with our assumption that the starting model is disturbed or inaccurate, the simplification rather than the starting model is used for the reference distribution of the divergence. This leads to analytic solutions for stationary and ergodic Gaussian processes and Markov chains. The same formulas are also valid for maximum-entropy smoothing under the same divergence constraint. In experiments, MERS successfully simplifies and denoises models from audio, text, speech, and meteorology.
  •  
25.
  • Ionescu, Catalin, et al. (author)
  • Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments.
  • 2014
  • In: IEEE Transactions on Pattern Analysis and Machine Intelligence. - 1939-3539. ; 36:7, pp. 1325-1339
  • Journal article (peer-reviewed), abstract:
    • We introduce a new dataset, Human3.6M, of 3.6 Million 3D Human poses, acquired by recording the performance of 11 subjects, under 4 different viewpoints, for training realistic human sensing systems and for evaluating the next generation of human pose estimation models. Besides increasing the size of the current state-of-the-art datasets by several orders of magnitude, we aim to complement such datasets with a diverse set of poses encountered in typical human activities (taking photos, posing, greeting, eating, etc.), with synchronized image, motion capture and depth data, and with accurate 3D body scans of all subjects involved. We also provide mixed reality videos where 3D human models are animated using motion capture data and inserted using correct 3D geometry, in complex real environments, viewed with moving cameras, and under occlusion. Finally, we provide large scale statistical models and detailed evaluation baselines for the dataset illustrating its diversity and the scope for improvement by future work in the research community. The dataset and code for the associated large-scale learning models, features, visualization tools, as well as the evaluation server, are available online at http://vision.imar.ro/human3.6m.
  •  
26.
  • Ismaeil, K. Al, et al. (author)
  • Real-Time Enhancement of Dynamic Depth Videos with Non-Rigid Deformations
  • 2017
  • In: IEEE Transactions on Pattern Analysis and Machine Intelligence. - : IEEE Computer Society. - 0162-8828 .- 1939-3539. ; 39:10, pp. 2045-2059
  • Journal article (peer-reviewed), abstract:
    • We propose a novel approach for enhancing depth videos containing non-rigidly deforming objects. Depth sensors are capable of capturing depth maps in real-time but suffer from high noise levels and low spatial resolutions. While solutions for reconstructing 3D details in static scenes, or scenes with rigid global motions have been recently proposed, handling unconstrained non-rigid deformations in relative complex scenes remains a challenge. Our solution consists in a recursive dynamic multi-frame super-resolution algorithm where the relative local 3D motions between consecutive frames are directly accounted for. We rely on the assumption that these 3D motions can be decoupled into lateral motions and radial displacements. This allows to perform a simple local per-pixel tracking where both depth measurements and deformations are dynamically optimized. The geometric smoothness is subsequently added using a multi-level L1 minimization with a bilateral total variation regularization. The performance of this method is thoroughly evaluated on both real and synthetic data. As compared to alternative approaches, the results show a clear improvement in reconstruction accuracy and in robustness to noise, to relative large non-rigid deformations, and to topological changes. Moreover, the proposed approach, implemented on a CPU, is shown to be computationally efficient and working in real-time. 
  •  
27.
  • Jalil, Taghia, et al. (author)
  • Variational Inference for Watson Mixture Model
  • In: IEEE Transactions on Pattern Analysis and Machine Intelligence. - 0162-8828 .- 1939-3539.
  • Journal article (other academic/artistic), abstract:
    • This paper addresses modelling data using the multivariate Watson distributions. The Watson distribution is one of the simplest distributions for analyzing axially symmetric data. This distribution has gained some attention in recent years due to its modeling capability. However, its Bayesian inference is fairly understudied due to difficulty in handling the normalization factor. Recent development of Monte-Carlo Markov chain (MCMC) sampling methods can be applied for this purpose. However, these methods can be prohibitively slow for practical applications. A deterministic alternative is provided by variational methods that convert inference problems into optimization problems. In this paper, we present a variational inference for Watson mixture model. First, the variational framework is used to side-step the intractability arising from the coupling of latent states and parameters. Second, the variational free energy is further lower bounded in order to avoid intractable moment computation. The proposed approach provides a lower bound on the log marginal likelihood and retains distributional information over all parameters. Moreover, we show that it can regulate its own complexity by pruning unnecessary mixture components while avoiding over-fitting. We discuss potential applications of the modeling with Watson distributions in the problem of blind source separation, and clustering gene expression data sets.
  •  
28.
  • Javed, Sajid, et al. (author)
  • Visual Object Tracking With Discriminative Filters and Siamese Networks: A Survey and Outlook
  • 2023
  • In: IEEE Transactions on Pattern Analysis and Machine Intelligence. - : IEEE COMPUTER SOC. - 0162-8828 .- 1939-3539. ; 45:5, pp. 6552-6574
  • Journal article (peer-reviewed), abstract:
    • Accurate and robust visual object tracking is one of the most challenging and fundamental computer vision problems. It entails estimating the trajectory of the target in an image sequence, given only its initial location, and segmentation, or its rough approximation in the form of a bounding box. Discriminative Correlation Filters (DCFs) and deep Siamese Networks (SNs) have emerged as dominating tracking paradigms, which have led to significant progress. Following the rapid evolution of visual object tracking in the last decade, this survey presents a systematic and thorough review of more than 90 DCFs and Siamese trackers, based on results in nine tracking benchmarks. First, we present the background theory of both the DCF and Siamese tracking core formulations. Then, we distinguish and comprehensively review the shared as well as specific open research challenges in both these tracking paradigms. Furthermore, we thoroughly analyze the performance of DCF and Siamese trackers on nine benchmarks, covering different experimental aspects of visual tracking: datasets, evaluation metrics, performance, and speed comparisons. We finish the survey by presenting recommendations and suggestions for distinguished open challenges based on our analysis.
  •  
29.
  • Jiang, Deyang, et al. (author)
  • Ring and Radius Sampling Based Phasor Field Diffraction Algorithm for Non-Line-of-Sight Reconstruction
  • 2021
  • In: IEEE Transactions on Pattern Analysis and Machine Intelligence. - : Institute of Electrical and Electronics Engineers (IEEE). - 0162-8828 .- 1939-3539. ; , pp. 1-1
  • Journal article (peer-reviewed), abstract:
    • Non-Line-of-Sight (NLOS) imaging reconstructs occluded scenes based on indirect diffuse reflections. The computational complexity and memory consumption of existing NLOS reconstruction algorithms make them challenging to be implemented in real-time. This paper presents a fast and memory-efficient phasor field-diffraction-based NLOS reconstruction algorithm. In the proposed algorithm, the radial property of the Rayleigh Sommerfeld diffraction (RSD) kernels along with the linear property of Fourier transform are utilized to reconstruct the Fourier domain representations of RSD kernels using a set of kernel bases. Moreover, memory consumption is further reduced by sampling the kernel bases in a radius direction and constructing them during the run-time. According to the analysis, the memory efficiency can be improved by as much as 220x. Experimental results show that compared with the original RSD algorithm, the reconstruction time of the proposed algorithm is significantly reduced with little impact on the final imaging quality.
  •  
30.
  • Johnsson, Kerstin, et al. (author)
  • Low Bias Local Intrinsic Dimension Estimation from Expected Simplex Skewness
  • 2015
  • In: IEEE Transactions on Pattern Analysis and Machine Intelligence. - 1939-3539. ; 37:1, pp. 196-202
  • Journal article (peer-reviewed), abstract:
    • In exploratory high-dimensional data analysis, local intrinsic dimension estimation can sometimes be used in order to discriminate between data sets sampled from different low-dimensional structures. Global intrinsic dimension estimators can in many cases be adapted to local estimation, but this leads to problems with high negative bias or high variance. We introduce a method that exploits the curse/blessing of dimensionality and produces local intrinsic dimension estimators that have very low bias, even in cases where the intrinsic dimension is higher than the number of data points, in combination with relatively low variance. We show that our estimators have a very good ability to classify local data sets by their dimension compared to other local intrinsic dimension estimators; furthermore we provide examples showing the usefulness of local intrinsic dimension estimation in general and our method in particular for stratification of real data sets.
  •  
31.
  • Joseph, K. J., et al. (author)
  • Incremental Object Detection via Meta-Learning
  • 2022
  • In: IEEE Transactions on Pattern Analysis and Machine Intelligence. - : IEEE COMPUTER SOC. - 0162-8828 .- 1939-3539. ; 44:12, pp. 9209-9216
  • Journal article (peer-reviewed), abstract:
    • In a real-world setting, object instances from new classes can be continuously encountered by object detectors. When existing object detectors are applied to such scenarios, their performance on old classes deteriorates significantly. A few efforts have been reported to address this limitation, all of which apply variants of knowledge distillation to avoid catastrophic forgetting. We note that although distillation helps to retain previous learning, it obstructs fast adaptability to new tasks, which is a critical requirement for incremental learning. In this pursuit, we propose a meta-learning approach that learns to reshape model gradients, such that information across incremental tasks is optimally shared. This ensures a seamless information transfer via a meta-learned gradient preconditioning that minimizes forgetting and maximizes knowledge transfer. In comparison to existing meta-learning methods, our approach is task-agnostic, allows incremental addition of new-classes and scales to high-capacity models for object detection. We evaluate our approach on a variety of incremental learning settings defined on PASCAL-VOC and MS COCO datasets, where our approach performs favourably well against state-of-the-art methods. Code and trained models: https://github.com/JosephKJ/iOD.
  •  
32.
  • Kahl, Fredrik, et al. (author)
  • Multiple View Geometry Under the L-infinity Norm
  • 2008
  • In: IEEE Transactions on Pattern Analysis and Machine Intelligence. - 1939-3539. ; 30:9, pp. 1603-1617
  • Journal article (peer-reviewed), abstract:
    • This paper presents a new framework for solving geometric structure and motion problems based on the L-infinity-norm. Instead of using the common sum-of-squares cost function, that is, the L-2-norm, the model-fitting errors are measured using the L-infinity-norm. Unlike traditional methods based on L-2, our framework allows for the efficient computation of global estimates. We show that a variety of structure and motion problems, for example, triangulation, camera resectioning, and homography estimation, can be recast as quasi-convex optimization problems within this framework. These problems can be efficiently solved using second-order cone programming (SOCP), which is a standard technique in convex optimization. The methods have been implemented in Matlab and the resulting toolbox has been made publicly available. The algorithms have been validated on real data in different settings on problems with small and large dimensions and with excellent performance.
  •  
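The quasi-convexity exploited in entry 32 above means that testing whether all reprojection errors can be kept below a threshold gamma is a second-order cone feasibility problem, so the optimal L-infinity error can be found by bisection. The sketch below illustrates this for triangulation using CVXPY (the paper's own toolbox is in Matlab); the camera matrices, image points, search interval, and tolerance are assumed or arbitrary.

    import numpy as np
    import cvxpy as cp

    def linf_triangulation(cameras, points2d, gamma_hi=10.0, tol=1e-3):
        """Triangulate a 3-D point minimizing the maximum reprojection error.

        cameras:  list of 3x4 projection matrices P_i.
        points2d: list of (u, v) observations, one per camera.
        """
        X = cp.Variable(3)
        Xh = cp.hstack([X, np.ones(1)])          # homogeneous coordinates

        def feasible(gamma):
            cons = []
            for P, (u, v) in zip(cameras, points2d):
                depth = P[2] @ Xh                # positive depth implied by the cone
                ru = (P[0] - u * P[2]) @ Xh
                rv = (P[1] - v * P[2]) @ Xh
                cons.append(cp.SOC(gamma * depth, cp.hstack([ru, rv])))
            prob = cp.Problem(cp.Minimize(0), cons)
            prob.solve()
            return prob.status in ("optimal", "optimal_inaccurate"), X.value

        lo, hi, best = 0.0, gamma_hi, None       # assumes feasibility at gamma_hi
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            ok, sol = feasible(mid)
            if ok:
                hi, best = mid, sol
            else:
                lo = mid
        return best, hi                          # 3-D point, max reprojection error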
33.
  • Khan, Salman, et al. (author)
  • Guest Editorial Introduction to the Special Section on Transformer Models in Vision
  • 2023
  • In: IEEE Transactions on Pattern Analysis and Machine Intelligence. - : IEEE COMPUTER SOC. - 0162-8828 .- 1939-3539. ; 45:11, pp. 12721-12725
  • Journal article (other academic/artistic), abstract:
    • Transformer models have achieved outstanding results on a variety of language tasks, such as text classification, machine translation, and question answering. This success in the field of Natural Language Processing (NLP) has sparked interest in the computer vision community to apply these models to vision and multi-modal learning tasks. However, visual data has a unique structure, requiring the need to rethink network designs and training methods. As a result, Transformer models and their variations have been successfully used for image recognition, object detection, segmentation, image super-resolution, video understanding, image generation, text-image synthesis, and visual question answering, among other applications.
  •  
34.
  •  
35.
  • Le, Huu, 1988, et al. (author)
  • Deterministic Approximate Methods for Maximum Consensus Robust Fitting
  • 2021
  • In: IEEE Transactions on Pattern Analysis and Machine Intelligence. - 1939-3539 .- 0162-8828. ; 43:3, pp. 842-857
  • Journal article (peer-reviewed), abstract:
    • Maximum consensus estimation plays a critically important role in several robust fitting problems in computer vision. Currently, the most prevalent algorithms for consensus maximization draw from the class of randomized hypothesize-and-verify algorithms, which are cheap but can usually deliver only rough approximate solutions. On the other extreme, there are exact algorithms which are exhaustive search in nature and can be costly for practical-sized inputs. This paper fills the gap between the two extremes by proposing deterministic algorithms to approximately optimize the maximum consensus criterion. Our work begins by reformulating consensus maximization with linear complementarity constraints. Then, we develop two novel algorithms: one based on non-smooth penalty method with a Frank-Wolfe style optimization scheme, the other based on the Alternating Direction Method of Multipliers (ADMM). Both algorithms solve convex subproblems to efficiently perform the optimization. We demonstrate the capability of our algorithms to greatly improve a rough initial estimate, such as those obtained using least squares or a randomized algorithm. Compared to the exact algorithms, our approach is much more practical on realistic input sizes. Further, our approach is naturally applicable to estimation problems with geometric residuals. Matlab code and demo program for our methods can be downloaded from https://goo.gl/FQcxpi.
  •  
36.
  • Leordeanu, Marius, et al. (author)
  • Generalized Boundaries from Multiple Image Interpretations
  • 2014
  • In: IEEE Transactions on Pattern Analysis and Machine Intelligence. - 1939-3539. ; 36:7, pp. 1312-1324
  • Journal article (peer-reviewed), abstract:
    • Boundary detection is a fundamental computer vision problem that is essential for a variety of tasks, such as contour and region segmentation, symmetry detection and object recognition and categorization. We propose a generalized formulation for boundary detection, with closed-form solution, applicable to the localization of different types of boundaries, such as object edges in natural images and occlusion boundaries from video. Our generalized boundary detection method (Gb) simultaneously combines low-level and mid-level image representations in a single eigenvalue problem and solves for the optimal continuous boundary orientation and strength. The closed-form solution to boundary detection enables our algorithm to achieve state-of-the-art results at a significantly lower computational cost than current methods. We also propose two complementary novel components that can seamlessly be combined with Gb: first, we introduce a soft-segmentation procedure that provides region input layers to our boundary detection algorithm for a significant improvement in accuracy, at negligible computational cost; second, we present an efficient method for contour grouping and reasoning, which when applied as a final post-processing stage, further increases the boundary detection performance.
  •  
37.
  • Li, Long, et al. (author)
  • Robust Perception and Precise Segmentation for Scribble-Supervised RGB-D Saliency Detection
  • 2024
  • In: IEEE Transactions on Pattern Analysis and Machine Intelligence. - : IEEE COMPUTER SOC. - 0162-8828 .- 1939-3539. ; 46:1, pp. 479-496
  • Journal article (peer-reviewed), abstract:
    • This paper proposes a scribble-based weakly supervised RGB-D salient object detection (SOD) method to relieve the annotation burden from pixel-wise annotations. In view of the ensuing performance drop, we summarize two natural deficiencies of the scribbles and try to alleviate them, which are the weak richness of the pixel training samples (WRPS) and the poor structural integrity of the salient objects (PSIO). WRPS hinders robust saliency perception learning, which can be alleviated via model design for robust feature learning and pseudo labels generation for training sample enrichment. Specifically, we first design a dynamic searching process module as a meta operation to conduct multi-scale and multi-modal feature fusion for the robust RGB-D SOD model construction. Then, a dual-branch consistency learning mechanism is proposed to generate enough pixel training samples for robust saliency perception learning. PSIO makes direct structural learning infeasible since scribbles can not provide integral structural supervision. Thus, we propose an edge-region structure-refinement loss to recover the structural information and make precise segmentation. We deploy all components and conduct ablation studies on two baselines to validate their effectiveness and generalizability. Experimental results on eight datasets show that our method outperforms other scribble-based SOD models and achieves comparable performance with fully supervised state-of-the-art methods.
  •  
38.
  • Liao, Qianfang, 1983-, et al. (författare)
  • Point Set Registration for 3D Range Scans Using Fuzzy Cluster-based Metric and Efficient Global Optimization
  • 2021
  • Ingår i: IEEE Transactions on Pattern Analysis and Machine Intelligence. - : IEEE. - 0162-8828 .- 1939-3539. ; 43:9, s. 3229-3246
  • Tidskriftsartikel (refereegranskat)abstract
    • This study presents a new point set registration method to align 3D range scans. In our method, fuzzy clusters are utilized to represent a scan, and the registration of two given scans is realized by minimizing a fuzzy weighted sum of the distances between their fuzzy cluster centers. This fuzzy cluster-based metric has a broad basin of convergence and is robust to noise. Moreover, this metric provides analytic gradients, allowing standard gradient-based algorithms to be applied for optimization. Based on this metric, the outlier issues are addressed. In addition, for the first time in rigid point set registration, a registration quality assessment in the absence of ground truth is provided. Furthermore, given specified rotation and translation spaces, we derive the upper and lower bounds of the fuzzy cluster-based metric and develop a branch-and-bound (BnB)-based optimization scheme, which can globally minimize the metric regardless of the initialization. This optimization scheme is performed in an efficient coarse-to-fine fashion: First, fuzzy clustering is applied to describe each of the two given scans by a small number of fuzzy clusters. Then, a global search, which integrates BnB and gradient-based algorithms, is implemented to achieve a coarse alignment for the two scans. During the global search, the registration quality assessment offers a beneficial stop criterion to detect whether a good result is obtained. Afterwards, a relatively large number of points of the two scans are directly taken as the fuzzy cluster centers, and then the coarse solution is refined to be an exact alignment using the gradient-based local convergence. Compared to existing counterparts, this optimization scheme makes a large improvement in terms of robustness and efficiency by virtue of the fuzzy cluster-based metric and the registration quality assessment. In the experiments, the registration results of several 3D range scan pairs demonstrate the accuracy and effectiveness of the proposed method, as well as its superiority to state-of-the-art registration approaches.
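    The registration objective described above can be pictured as a weighted sum of distances between cluster centres under a candidate rigid transform. A minimal NumPy sketch of such a fuzzy-weighted objective (the one-to-one centre correspondence, uniform weights and function names are simplifying assumptions; the paper's bounds, quality assessment and branch-and-bound search are not reproduced):

        import numpy as np

        def rigid(points, R, t):
            """Apply a rigid transform (rotation R, translation t) to (N, 3) points."""
            return points @ R.T + t

        def fuzzy_metric(centers_fixed, centers_moving, weights, R, t):
            """Fuzzy-weighted sum of squared distances between cluster centres."""
            moved = rigid(centers_moving, R, t)
            d2 = np.sum((centers_fixed - moved) ** 2, axis=1)
            return float(np.sum(weights * d2))

        theta = np.deg2rad(10.0)
        R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                      [np.sin(theta),  np.cos(theta), 0.0],
                      [0.0, 0.0, 1.0]])
        u = np.array([0.1, 0.0, 0.0])
        centers = np.random.default_rng(1).normal(size=(8, 3))
        weights = np.full(8, 1.0 / 8)
        moving = rigid(centers, R, u)                                           # rotated and shifted copy
        print(fuzzy_metric(centers, moving, weights, np.eye(3), np.zeros(3)))   # misaligned: large value
        print(fuzzy_metric(centers, moving, weights, R.T, -(u @ R)))            # aligned: close to zero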
  •  
39.
  • Lindeberg, Tony, 1964- (författare)
  • Scale-space for discrete signals
  • 1990
  • Ingår i: IEEE Transactions on Pattern Analysis and Machine Intelligence. - : IEEE Computer Society. - 0162-8828 .- 1939-3539. ; 12:3, s. 234-254
  • Tidskriftsartikel (refereegranskat)abstract
    • This article addresses the formulation of a scale-space theory for discrete signals. In one dimension it is possible to characterize the smoothing transformations completely, and an exhaustive treatment is given, answering the following two main questions: (i) Which linear transformations remove structure, in the sense that the number of local extrema (or zero-crossings) in the output signal does not exceed the number of local extrema (or zero-crossings) in the original signal? (ii) How should one create a multi-resolution family of representations with the property that a signal at a coarser level of scale never contains more structure than a signal at a finer level of scale? It is proposed that there is only one reasonable way to define a scale-space for 1D discrete signals comprising a continuous scale parameter, namely by (discrete) convolution with the family of kernels T(n; t) = e^{-t} I_n(t), where I_n are the modified Bessel functions of integer order. Similar arguments applied in the continuous case uniquely lead to the Gaussian kernel. Some obvious discretizations of the continuous scale-space theory are discussed in view of the results presented. It is shown that the kernel T(n; t) arises naturally in the solution of a discretized version of the diffusion equation. The commonly adopted technique of smoothing with a sampled Gaussian can lead to undesirable effects, since scale-space violations might occur in the corresponding representation. The result exemplifies the fact that properties derived in the continuous case might be violated after discretization. A two-dimensional theory, showing how the scale-space should be constructed for images, is given based on the requirement that local extrema must not be enhanced when the scale parameter is increased continuously. In the separable case the resulting scale-space representation can be calculated by separated convolution with the kernel T(n; t). The presented discrete theory has computational advantages compared to a scale-space implementation based on the sampled Gaussian, for instance concerning the Laplacian of the Gaussian. The main reason is that the discrete nature of the implementation has been taken into account already in the theoretical formulation of the scale-space representation.
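    The kernel quoted above, T(n; t) = e^{-t} I_n(t), can be evaluated directly with SciPy's exponentially scaled modified Bessel function. A minimal sketch of smoothing a 1-D discrete signal with a truncated version of this kernel (the truncation radius and renormalisation are practical choices made here, not part of the theory):

        import numpy as np
        from scipy.special import ive                       # ive(n, t) = exp(-|t|) * I_n(t)

        def discrete_gaussian_kernel(t, radius):
            """T(n; t) = exp(-t) I_n(t) for n = -radius, ..., radius (truncated)."""
            n = np.arange(-radius, radius + 1)
            return ive(n, t)

        def smooth(signal, t, radius=None):
            if radius is None:
                radius = int(np.ceil(4 * np.sqrt(t))) + 1   # heuristic truncation radius
            kernel = discrete_gaussian_kernel(t, radius)
            kernel /= kernel.sum()                          # renormalise the truncated kernel
            return np.convolve(signal, kernel, mode="same")

        x = np.zeros(41)
        x[20] = 1.0                                         # unit impulse
        print(np.round(smooth(x, t=2.0)[16:25], 4))         # approximates T(n; 2) for n = -4..4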
  •  
40.
  • Liu, Jun, et al. (författare)
  • Feature Boosting Network for 3D Pose Estimation
  • 2020
  • Ingår i: IEEE Transactions on Pattern Analysis and Machine Intelligence. - 1939-3539 .- 0162-8828. ; 42:2, s. 494-501
  • Tidskriftsartikel (refereegranskat)abstract
    • In this paper, a feature boosting network is proposed for estimating 3D hand pose and 3D body pose from a single RGB image. In this method, the features learned by the convolutional layers are boosted with a new long short-term dependence-aware (LSTD) module, which enables the intermediate convolutional feature maps to perceive the graphical long short-term dependency among different hand (or body) parts using the designed Graphical ConvLSTM. Learning a set of features that are reliable and discriminatively representative of the pose of a hand (or body) part is difficult due to the ambiguities, texture and illumination variation, and self-occlusion in real applications of 3D pose estimation. To improve the reliability of the features for representing each body part and enhance the LSTD module, we further introduce a context consistency gate (CCG) in this paper, with which the convolutional feature maps are modulated according to their consistency with the context representations. We evaluate the proposed method on challenging benchmark datasets for 3D hand pose estimation and 3D full body pose estimation. Experimental results show the effectiveness of our method, which achieves state-of-the-art performance on both tasks.
  •  
41.
  • Liu, Jun, et al. (författare)
  • NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding
  • 2020
  • Ingår i: IEEE Transactions on Pattern Analysis and Machine Intelligence. - 1939-3539 .- 0162-8828. ; 42:10, s. 2684-2701
  • Tidskriftsartikel (refereegranskat)abstract
    • Research on depth-based human activity analysis achieved outstanding performance and demonstrated the effectiveness of 3D representation for action recognition. The existing depth-based and RGB+D-based action recognition benchmarks have a number of limitations, including the lack of large-scale training samples, realistic number of distinct class categories, diversity in camera views, varied environmental conditions, and variety of human subjects. In this work, we introduce a large-scale dataset for RGB+D human action recognition, which is collected from 106 distinct subjects and contains more than 114 thousand video samples and 8 million frames. This dataset contains 120 different action classes including daily, mutual, and health-related activities. We evaluate the performance of a series of existing 3D activity analysis methods on this dataset, and show the advantage of applying deep learning methods for 3D-based human action recognition. Furthermore, we investigate a novel one-shot 3D activity recognition problem on our dataset, and a simple yet effective Action-Part Semantic Relevance-aware (APSR) framework is proposed for this task, which yields promising results for recognition of the novel action classes. We believe the introduction of this large-scale dataset will enable the community to apply, adapt, and develop various data-hungry learning techniques for depth-based and RGB+D-based human activity understanding.
  •  
42.
  • Liu, Jun, et al. (författare)
  • Skeleton-Based Online Action Prediction Using Scale Selection Network
  • 2020
  • Ingår i: IEEE Transactions on Pattern Analysis and Machine Intelligence. - 1939-3539 .- 0162-8828. ; 42:6, s. 1453-1467
  • Tidskriftsartikel (refereegranskat)abstract
    • Action prediction aims to recognize the class label of an ongoing activity when only a part of it has been observed. In this paper, we focus on online action prediction in streaming 3D skeleton sequences. A dilated convolutional network is introduced to model the motion dynamics in the temporal dimension via a sliding window over the temporal axis. Since there are significant temporal scale variations in the observed part of the ongoing action at different time steps, a novel window scale selection method is proposed to make our network focus on the performed part of the ongoing action and try to suppress possible interference from previous actions at each step. An activation sharing scheme is also proposed to handle the overlapping computations among adjacent time steps, which enables our framework to run more efficiently. Moreover, to enhance the performance of our framework for action prediction with skeletal input data, a hierarchy of dilated tree convolutions is also designed to learn multi-level structured semantic representations over the skeleton joints at each frame. Our proposed approach is evaluated on four challenging datasets. The extensive experiments demonstrate the effectiveness of our method for skeleton-based online action prediction.
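    The sliding-window dilated convolution mentioned above enlarges the temporal receptive field without adding parameters. A minimal 1-D NumPy sketch of a dilated filter over a per-frame feature stream (a toy stand-in, not the paper's network or its scale selection scheme):

        import numpy as np

        def dilated_conv1d(x, w, dilation):
            """Valid 1-D convolution with taps spaced 'dilation' steps apart."""
            k = len(w)
            span = dilation * (k - 1)
            out = np.zeros(len(x) - span)
            for t in range(len(out)):
                taps = x[t : t + span + 1 : dilation]    # k samples, 'dilation' apart
                out[t] = np.dot(taps, w)
            return out

        x = np.arange(16, dtype=float)           # toy per-frame feature stream
        w = np.array([0.25, 0.5, 0.25])          # kernel of size 3
        print(dilated_conv1d(x, w, dilation=1))  # receptive field of 3 frames
        print(dilated_conv1d(x, w, dilation=4))  # receptive field of 9 frames, same parameter count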
  •  
43.
  • Ma, Zhanyu, et al. (författare)
  • Bayesian Estimation of Beta Mixture Models with Variational Inference
  • 2011
  • Ingår i: IEEE Transactions on Pattern Analysis and Machine Intelligence. - 0162-8828 .- 1939-3539. ; 33:11, s. 2160-2173
  • Tidskriftsartikel (refereegranskat)abstract
    • Bayesian estimation of the parameters in beta mixture models (BMM) is analytically intractable. The numerical solutions to simulate the posterior distribution are available, but incur high computational cost. In this paper, we introduce an approximation to the prior/posterior distribution of the parameters in the beta distribution and propose an analytically tractable (closed-form) Bayesian approach to the parameter estimation. The approach is based on the variational inference (VI) framework. Following the principles of the VI framework and utilizing the relative convexity bound, the extended factorized approximation method is applied to approximate the distribution of the parameters in BMM. In a fully Bayesian model where all the parameters of the BMM are considered as variables and assigned proper distributions, our approach can asymptotically find the optimal estimate of the parameters posterior distribution. Also, the model complexity can be determined based on the data. The closed-form solution is proposed so that no iterative numerical calculation is required. Meanwhile, our approach avoids the drawback of overfitting in the conventional expectation maximization algorithm. The good performance of this approach is verified by experiments with both synthetic and real data.
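    The model being estimated above is a finite mixture of beta densities. A minimal SciPy sketch of evaluating such a mixture (the parameter values are made up; the sketch shows the model only, not the variational estimator proposed in the paper):

        import numpy as np
        from scipy.stats import beta

        def bmm_pdf(x, weights, a, b):
            """Density of the mixture sum_k w_k Beta(a_k, b_k) at points x in (0, 1)."""
            x = np.asarray(x)[..., None]
            return np.sum(weights * beta.pdf(x, a, b), axis=-1)

        weights = np.array([0.6, 0.4])           # mixing weights (toy values)
        a = np.array([2.0, 8.0])                 # shape parameters alpha_k
        b = np.array([5.0, 2.0])                 # shape parameters beta_k
        x = np.linspace(0.01, 0.99, 5)
        print(bmm_pdf(x, weights, a, b))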
  •  
44.
  • Ma, Zhanyu, et al. (författare)
  • Variational Bayesian Matrix Factorization for Bounded Support Data
  • 2015
  • Ingår i: IEEE Transactions on Pattern Analysis and Machine Intelligence. - 0162-8828 .- 1939-3539. ; 37:4, s. 876-889
  • Tidskriftsartikel (refereegranskat)abstract
    • A novel Bayesian matrix factorization method for bounded support data is presented. Each entry in the observation matrix is assumed to be beta distributed. As the beta distribution has two parameters, two parameter matrices can be obtained, both of which contain only nonnegative values. In order to provide a low-rank matrix factorization, the nonnegative matrix factorization (NMF) technique is applied. Furthermore, each entry in the factorized matrices, i.e., the basis and excitation matrices, is assigned a gamma prior. Therefore, we name this method beta-gamma NMF (BG-NMF). Due to the integral expression of the gamma function, estimation of the posterior distribution in the BG-NMF model does not admit an analytically tractable solution. With the variational inference framework and the relative convexity property of the log-inverse-beta function, we propose a new lower bound to approximate the objective function. With this new lower bound, we derive an analytically tractable solution to approximately calculate the posterior distributions. Each of the approximated posterior distributions is also gamma distributed, which retains the conjugacy of the Bayesian estimation. In addition, a sparse BG-NMF can be obtained by imposing a sparseness constraint on the gamma prior. Evaluations with synthetic data and real-life data demonstrate the good performance of the proposed method.
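    For context, the low-rank factorization that BG-NMF builds on is ordinary nonnegative matrix factorization. A minimal NumPy sketch of the classical multiplicative-update NMF with a Euclidean loss (a point of reference only; the paper's beta likelihood, gamma priors and variational updates are not shown):

        import numpy as np

        def nmf(V, rank, iters=200, eps=1e-9, seed=0):
            """Multiplicative-update NMF: V ~ W @ H with W, H >= 0 (Euclidean loss)."""
            rng = np.random.default_rng(seed)
            n, m = V.shape
            W = rng.random((n, rank)) + eps
            H = rng.random((rank, m)) + eps
            for _ in range(iters):
                H *= (W.T @ V) / (W.T @ W @ H + eps)
                W *= (V @ H.T) / (W @ H @ H.T + eps)
            return W, H

        V = np.abs(np.random.default_rng(1).normal(size=(20, 15)))
        W, H = nmf(V, rank=4)
        print("reconstruction error:", np.linalg.norm(V - W @ H))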
  •  
45.
  • Madan, Neelu, et al. (författare)
  • Self-Supervised Masked Convolutional Transformer Block for Anomaly Detection
  • 2024
  • Ingår i: IEEE Transactions on Pattern Analysis and Machine Intelligence. - : IEEE COMPUTER SOC. - 0162-8828 .- 1939-3539. ; 46:1, s. 525-542
  • Tidskriftsartikel (refereegranskat)abstract
    • Anomaly detection has recently gained increasing attention in the field of computer vision, likely due to its broad set of applications, ranging from product fault detection on industrial production lines and impending event detection in video surveillance to finding lesions in medical scans. Regardless of the domain, anomaly detection is typically framed as a one-class classification task, where learning is conducted on normal examples only. An entire family of successful anomaly detection methods is based on learning to reconstruct masked normal inputs (e.g., patches, future frames) and using the magnitude of the reconstruction error as an indicator of the abnormality level. Unlike other reconstruction-based methods, we present a novel self-supervised masked convolutional transformer block (SSMCTB) that integrates the reconstruction-based functionality at a core architectural level. The proposed self-supervised block is extremely flexible, enabling information masking at any layer of a neural network and being compatible with a wide range of neural architectures. In this work, we extend our previous self-supervised predictive convolutional attentive block (SSPCAB) with a 3D masked convolutional layer, a transformer for channel-wise attention, as well as a novel self-supervised objective based on the Huber loss. Furthermore, we show that our block is applicable to a wider variety of tasks, adding anomaly detection in medical images and thermal videos to the previously considered tasks based on RGB images and surveillance videos. We exhibit the generality and flexibility of SSMCTB by integrating it into multiple state-of-the-art neural models for anomaly detection, bringing forth empirical results that confirm considerable performance improvements on five benchmarks: MVTec AD, BRATS, Avenue, ShanghaiTech, and Thermal Rare Event.
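    The reconstruction-error idea described above can be pictured with a toy, non-learned stand-in: predict each pixel from its masked neighbourhood and treat the prediction error as an anomaly score. The mean-of-neighbours predictor below is only an illustration of that scoring principle; SSMCTB itself uses learned masked convolutions with channel attention.

        import numpy as np

        def masked_reconstruction_scores(image, radius=1):
            """Score each pixel by how badly its masked neighbourhood predicts it."""
            h, w = image.shape
            scores = np.zeros_like(image)
            for i in range(radius, h - radius):
                for j in range(radius, w - radius):
                    patch = image[i - radius:i + radius + 1, j - radius:j + radius + 1].copy()
                    center = patch[radius, radius]
                    patch[radius, radius] = np.nan      # mask the centre
                    pred = np.nanmean(patch)            # "reconstruct" it from context
                    scores[i, j] = (center - pred) ** 2
            return scores

        img = np.full((16, 16), 0.5)
        img[8, 8] = 5.0                                 # a point anomaly
        scores = masked_reconstruction_scores(img)
        print(np.unravel_index(np.argmax(scores), img.shape))   # -> (8, 8)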
  •  
46.
  • Maki, Atsuto, et al. (författare)
  • In Memoriam : Jan-Olof Eklundh
  • 2022
  • Ingår i: IEEE Transactions on Pattern Analysis and Machine Intelligence. - : IEEE COMPUTER SOC. - 0162-8828 .- 1939-3539. ; 44:9, s. 4488-4489
  • Tidskriftsartikel (refereegranskat)
  •  
47.
  • Mathe, Stefan, et al. (författare)
  • Actions in the Eye: Dynamic Gaze Datasets and Learnt Saliency Models for Visual Recognition
  • 2015
  • Ingår i: IEEE Transactions on Pattern Analysis and Machine Intelligence. - 1939-3539. ; 37:7, s. 1408-1424
  • Tidskriftsartikel (refereegranskat)abstract
    • Systems based on bag-of-words models built from image features collected at maxima of sparse interest point operators have been used successfully for both visual object and action recognition tasks in computer vision. While the sparse, interest-point based approach to recognition is not inconsistent with visual processing in biological systems that operate in 'saccade and fixate' regimes, the methodology and emphasis in the human and the computer vision communities remain sharply distinct. Here, we make three contributions aiming to bridge this gap. First, we complement existing state-of-the-art, large-scale, dynamic, annotated computer vision datasets like Hollywood-2 [1] and UCF Sports [2] with human eye movements collected under the ecological constraints of visual action and scene context recognition tasks. To our knowledge, these are the first large human eye tracking datasets for video to be collected and made publicly available, at vision.imar.ro/eyetracking (497,107 frames, each viewed by 19 subjects), unique in terms of their (a) large scale and computer vision relevance, (b) dynamic, video stimuli, and (c) task control, as well as free viewing. Second, we introduce novel dynamic consistency and alignment measures, which underline the remarkable stability of patterns of visual search among subjects. Third, we leverage the significant amount of collected data in order to pursue studies and build automatic, end-to-end trainable computer vision systems based on human eye movements. Our studies not only shed light on the differences between computer vision spatio-temporal interest point sampling strategies and human fixations, as well as their impact on visual recognition performance, but also demonstrate that human fixations can be accurately predicted and, when used in an end-to-end automatic system leveraging some of the advanced computer vision practice, can lead to state-of-the-art results.
  •  
48.
  • Moreno, Rodrigo, et al. (författare)
  • On improving the efficiency of tensor voting
  • 2011
  • Ingår i: IEEE Transactions on Pattern Analysis and Machine Intelligence. - Washington, DC, USA : IEEE Computer Society. - 0162-8828 .- 1939-3539. ; 33:11, s. 2215-2228
  • Tidskriftsartikel (refereegranskat)abstract
    • This paper proposes two alternative formulations to reduce the high computational complexity of tensor voting, a robust perceptual grouping technique used to extract salient information from noisy data. The first scheme consists of numerical approximations of the votes, which have been derived from an in-depth analysis of the plate and ball voting processes. The second scheme simplifies the formulation while keeping the same perceptual meaning as the original tensor voting: the stick tensor voting and the stick component of the plate tensor voting must reinforce surfaceness, the plate components of both the plate and ball tensor voting must boost curveness, whereas junctionness must be strengthened by the ball component of the ball tensor voting. Two new parameters have been proposed for the second formulation in order to control the potentially conflicting influence of the stick component of the plate vote and the ball component of the ball vote. Results show that the proposed formulations can be used in applications where efficiency is an issue, since they have a complexity of order O(1). Moreover, the second proposed formulation has been shown to be more appropriate than the original tensor voting for estimating saliencies, by appropriately setting the two new parameters.
  •  
49.
  • Naseer, Muzammal, et al. (författare)
  • Stylized Adversarial Defense
  • 2023
  • Ingår i: IEEE Transactions on Pattern Analysis and Machine Intelligence. - : IEEE COMPUTER SOC. - 0162-8828 .- 1939-3539. ; 45:5, s. 6403-6414
  • Tidskriftsartikel (refereegranskat)abstract
    • Deep Convolutional Neural Networks (CNNs) can easily be fooled by subtle, imperceptible changes to their input images. To address this vulnerability, adversarial training creates perturbation patterns and includes them in the training set to robustify the model. In contrast to existing adversarial training methods that only use class-boundary information (e.g., via a cross-entropy loss), we propose to exploit additional information from the feature space to craft stronger adversaries that are, in turn, used to learn a robust model. Specifically, we use the style and content information of the target sample from another class, alongside its class-boundary information, to create adversarial perturbations. We apply our proposed multi-task objective in a deeply supervised manner, extracting multi-scale feature knowledge to create maximally separating adversaries. Subsequently, we propose a max-margin adversarial training approach that minimizes the distance between the source image and its adversary and maximizes the distance between the adversary and the target image. Our adversarial training approach demonstrates strong robustness compared to state-of-the-art defenses, generalizes well to naturally occurring corruptions and data distributional shifts, and retains the model's accuracy on clean examples.
  •  
50.
  • Nasihatkon, Seyed Behrooz, 1983, et al. (författare)
  • Multiresolution Search of the Rigid Motion Space for Intensity-Based Registration
  • 2018
  • Ingår i: IEEE Transactions on Pattern Analysis and Machine Intelligence. - 1939-3539 .- 0162-8828. ; 40:1, s. 179-191
  • Tidskriftsartikel (refereegranskat)abstract
    • We study the relation between the correlation-based target functions of low-resolution and high-resolution intensity-based registration for the class of rigid transformations. Our results show that low-resolution target values can tightly bound the high-resolution target function in natural images. This can help with analyzing and better understanding the process of multiresolution image registration. It also gives a guideline for designing multiresolution algorithms in which the search space for higher-resolution registration is restricted given the fitness values for lower-resolution image pairs. To demonstrate this, we incorporate our multiresolution technique into a Lipschitz global optimization framework. We show that using the multiresolution scheme can result in large gains in the efficiency of such algorithms. The method is evaluated by applying it to the problems of 2D registration, 3D rotation search, and the detection of reflective symmetry in 2D and 3D images.
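    The idea of using low-resolution scores to restrict the high-resolution search can be sketched with a simple rotation search: score all candidate angles on downsampled images and evaluate only the most promising ones at full resolution. Here a plain top-k pruning stands in for the paper's derived bounds, and the correlation measure, downsampling factor and candidate grid are illustrative assumptions:

        import numpy as np
        from scipy.ndimage import rotate, zoom

        def ncc(a, b):
            """Normalised cross-correlation between two equally sized images."""
            a = a - a.mean()
            b = b - b.mean()
            return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

        def rotation_search(fixed, moving, angles, factor=0.25, keep=5):
            lo_f, lo_m = zoom(fixed, factor), zoom(moving, factor)
            lo_scores = [ncc(lo_f, rotate(lo_m, a, reshape=False)) for a in angles]
            survivors = np.argsort(lo_scores)[-keep:]       # best low-resolution candidates
            hi_scores = {angles[i]: ncc(fixed, rotate(moving, angles[i], reshape=False))
                         for i in survivors}
            return max(hi_scores, key=hi_scores.get)

        rng = np.random.default_rng(0)
        img = rng.random((64, 64))
        moved = rotate(img, 12.0, reshape=False)
        angles = list(np.arange(-30.0, 31.0, 2.0))
        print("estimated rotation:", rotation_search(img, moved, angles))   # close to -12 degrees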
  •  