SwePub
Search the SwePub database


Result list for the query "WFRF:(Järemo Lawin Felix)"


  • Result 1-11 of 11
1.
  • Kristan, Matej, et al. (author)
  • The Ninth Visual Object Tracking VOT2021 Challenge Results
  • 2021
  • In: 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW 2021). - : IEEE Computer Soc. - 9781665401913 ; , s. 2711-2738
  • Conference paper (peer-reviewed)
    • The Visual Object Tracking challenge VOT2021 is the ninth annual tracker benchmarking activity organized by the VOT initiative. Results of 71 trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in recent years. The VOT2021 challenge was composed of four sub-challenges focusing on different tracking domains: (i) VOT-ST2021 challenge focused on short-term tracking in RGB, (ii) VOT-RT2021 challenge focused on "real-time" short-term tracking in RGB, (iii) VOT-LT2021 focused on long-term tracking, namely coping with target disappearance and reappearance and (iv) VOT-RGBD2021 challenge focused on long-term tracking in RGB and depth imagery. The VOT-ST2021 dataset was refreshed, while VOT-RGBD2021 introduces a training dataset and a sequestered dataset for winner identification. The datasets, the evaluation kit and the results, along with the source code for most trackers, are publicly available at the challenge website(1).
  •  
2.
  • Bhat, Goutam, et al. (author)
  • Learning What to Learn for Video Object Segmentation
  • 2020
  • In: Computer Vision. - Cham : Springer International Publishing. - 9783030585358 - 9783030585365 ; , s. 777-794
  • Conference paper (peer-reviewed)
    • Video object segmentation (VOS) is a highly challenging problem, since the target object is only defined by a first-frame reference mask during inference. The problem of how to capture and utilize this limited information to accurately segment the target remains a fundamental research question. We address this by introducing an end-to-end trainable VOS architecture that integrates a differentiable few-shot learner. Our learner is designed to predict a powerful parametric model of the target by minimizing a segmentation error in the first frame. We further go beyond the standard few-shot learning paradigm by learning what our target model should learn in order to maximize segmentation accuracy. We perform extensive experiments on standard benchmarks. Our approach sets a new state-of-the-art on the large-scale YouTube-VOS 2018 dataset by achieving an overall score of 81.5, corresponding to a 2.6% relative improvement over the previous best result. The code and models are available at https://github.com/visionml/pytracking.
  •  
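The abstract above describes a differentiable few-shot learner that fits a parametric target model to the first frame. Below is a minimal sketch of that idea, not the authors' released pytracking code: a linear target model fitted by unrolled gradient descent, so gradients can flow through the fitting steps when the surrounding network is trained end-to-end. All names, shapes and hyperparameters are assumptions.

```python
# Minimal sketch of a differentiable few-shot target-model learner.
# Not the authors' implementation; shapes and hyperparameters are assumed.
import torch
import torch.nn.functional as F

def learn_target_model(feat, label, steps=5, lr=0.1):
    """Fit a linear target model (a 1x1 convolution) to the reference
    frame by unrolled gradient descent, keeping the graph so an outer
    network can be trained end-to-end through these steps.

    feat:  (1, C, H, W) backbone features of the first frame
    label: (1, 1, H, W) target label derived from the reference mask
    """
    w = torch.zeros(1, feat.shape[1], 1, 1, requires_grad=True)
    for _ in range(steps):
        score = F.conv2d(feat, w)                       # target scores
        loss = F.mse_loss(torch.sigmoid(score), label)  # first-frame error
        (g,) = torch.autograd.grad(loss, w, create_graph=True)
        w = w - lr * g                                  # unrolled update
    return w

def predict_target(w, feat):
    """Coarse target score map for a new frame; a decoder would refine it."""
    return F.conv2d(feat, w)
```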
3.
  • Järemo Lawin, Felix, et al. (author)
  • Deep Projective 3D Semantic Segmentation
  • 2017
  • In: Computer Analysis of Images and Patterns. - Cham : Springer. - 9783319646886 - 9783319646893 ; , s. 95-107
  • Conference paper (peer-reviewed)
    • Semantic segmentation of 3D point clouds is a challenging problem with numerous real-world applications. While deep learning has revolutionized the field of image semantic segmentation, its impact on point cloud data has been limited so far. Recent attempts, based on 3D deep learning approaches (3D-CNNs), have achieved below-expected results. Such methods require voxelizations of the underlying point cloud data, leading to decreased spatial resolution and increased memory consumption. Additionally, 3D-CNNs greatly suffer from the limited availability of annotated datasets.
  •  
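The method itself, summarized in the thesis abstract in entry 6 as projecting the points into virtual camera views, running image segmentation on the rendered 2D views, and mapping the predictions back to the 3D points, can be sketched roughly as below. The pinhole camera, z-buffering and all parameters are assumptions, not the paper's actual renderer.

```python
# Minimal sketch of projective point cloud segmentation: render points
# into a virtual pinhole camera, segment the 2D image with any image
# CNN, then map per-pixel labels back to the contributing 3D points.
import numpy as np

def project_points(points, f=500.0, size=(480, 640)):
    """Z-buffer projection of an (N, 3) camera-frame point cloud.
    Returns an (H, W) index map: index of the nearest point per pixel,
    or -1 where no point projects."""
    h, w = size
    z = points[:, 2]
    valid = z > 1e-6
    zsafe = np.where(valid, z, 1.0)
    u = np.round(f * points[:, 0] / zsafe + w / 2).astype(int)
    v = np.round(f * points[:, 1] / zsafe + h / 2).astype(int)
    inside = valid & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    index_map = -np.ones((h, w), dtype=int)
    depth = np.full((h, w), np.inf)
    for i in np.flatnonzero(inside):
        if z[i] < depth[v[i], u[i]]:          # keep the nearest point
            depth[v[i], u[i]] = z[i]
            index_map[v[i], u[i]] = i
    return index_map

def backproject_labels(index_map, pixel_labels, n_points):
    """Map 2D segmentation labels back onto the 3D points."""
    point_labels = -np.ones(n_points, dtype=int)
    hit = index_map >= 0
    point_labels[index_map[hit]] = pixel_labels[hit]
    return point_labels
```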
4.
  • Järemo Lawin, Felix, et al. (author)
  • Density Adaptive Point Set Registration
  • 2018
  • In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. - : IEEE. - 9781538664209 ; , s. 3829-3837
  • Conference paper (peer-reviewed)
    • Probabilistic methods for point set registration have demonstrated competitive results in recent years. These techniques estimate a probability distribution model of the point clouds. While such a representation has shown promise, it is highly sensitive to variations in the density of 3D points. This fundamental problem is primarily caused by changes in the sensor location across point sets. We revisit the foundations of the probabilistic registration paradigm. Contrary to previous works, we model the underlying structure of the scene as a latent probability distribution, and thereby induce invariance to point set density changes. Both the probabilistic model of the scene and the registration parameters are inferred by minimizing the Kullback-Leibler divergence in an Expectation Maximization based framework. Our density-adaptive registration successfully handles severe density variations commonly encountered in terrestrial Lidar applications. We perform extensive experiments on several challenging real-world Lidar datasets. The results demonstrate that our approach outperforms state-of-the-art probabilistic methods for multi-view registration, without the need of re-sampling.
  •  
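As a rough illustration of the probabilistic registration paradigm this paper builds on, the sketch below shows one EM round of Gaussian-mixture-based registration: an E-step computing responsibilities, and an M-step updating the rigid pose by weighted Procrustes alignment. This is not the paper's density-adaptive model (its latent, density-invariant scene distribution is omitted), and all shapes and parameters are assumptions.

```python
# One EM round of GMM-based rigid registration (simplified sketch).
import numpy as np

def e_step(x, mu, var):
    """Responsibilities of K isotropic Gaussian scene components
    for N points. x: (N, 3), mu: (K, 3), var: scalar variance."""
    d2 = ((x[:, None, :] - mu[None, :, :]) ** 2).sum(-1)   # (N, K)
    r = np.exp(-0.5 * d2 / var)
    return r / (r.sum(1, keepdims=True) + 1e-12)

def m_step_pose(x, mu, r):
    """Weighted Procrustes (Kabsch): rigid (R, t) aligning points x
    to their responsibility-weighted component means."""
    targets = r @ mu                                       # (N, 3)
    w = r.sum(1)
    cx = (w[:, None] * x).sum(0) / w.sum()
    ct = (w[:, None] * targets).sum(0) / w.sum()
    H = (w[:, None] * (x - cx)).T @ (targets - ct)
    U, _, Vt = np.linalg.svd(H)
    S = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])     # fix reflections
    R = Vt.T @ S @ U.T
    t = ct - R @ cx
    return R, t
```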
5.
  • Järemo Lawin, Felix, et al. (author)
  • Efficient Multi-frequency Phase Unwrapping Using Kernel Density Estimation
  • 2016
  • In: Computer Vision – ECCV 2016, 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part IV. - Cham : Springer. - 9783319464923 - 9783319464930 ; , s. 170-185
  • Conference paper (peer-reviewed)
    • In this paper we introduce an efficient method to unwrap multi-frequency phase estimates for time-of-flight ranging. The algorithm generates multiple depth hypotheses and uses a spatial kernel density estimate (KDE) to rank them. The confidence produced by the KDE is also an effective means to detect outliers. We also introduce a new closed-form expression for phase noise prediction that better fits real data. The method is applied to depth decoding for the Kinect v2 sensor, and compared to the Microsoft Kinect SDK and to the open source driver libfreenect2. The intended Kinect v2 use case is scenes with less than 8 m range, and for such cases we observe consistent improvements, while maintaining real-time performance. When extending the depth range to the maximal value of 18.75 m, we get about 52% more valid measurements than libfreenect2. The effect is that the sensor can now be used in large depth scenes, where it was previously not a good choice.
  •  
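The hypothesis-ranking step the abstract describes can be sketched as follows: each pixel has a few candidate unwrapped depths, and the winner is the candidate best supported by a Gaussian kernel density over the candidates in its spatial neighborhood. The shapes, bandwidth and the unoptimized per-pixel loop below are assumptions for illustration, not the paper's real-time implementation.

```python
# Minimal sketch of spatial-KDE ranking of depth hypotheses.
import numpy as np

def select_depth(hyps, radius=2, bandwidth=0.05):
    """hyps: (H, W, M) candidate unwrapped depths per pixel.
    Returns the (H, W) depth map maximizing a spatial Gaussian KDE.
    The maximal density could also serve as a confidence for
    outlier detection, as in the abstract."""
    h, w, m = hyps.shape
    best = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            nbr = hyps[y0:y1, x0:x1].ravel()          # neighborhood candidates
            d = hyps[y, x][:, None] - nbr[None, :]    # (M, |nbr|) differences
            density = np.exp(-0.5 * (d / bandwidth) ** 2).sum(1)
            best[y, x] = hyps[y, x, density.argmax()]
    return best
```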
6.
  • Järemo Lawin, Felix, 1990- (author)
  • Learning Representations for Segmentation and Registration
  • 2021
  • Doctoral thesis (other academic/artistic)
    • In computer vision, the aim is to model and extract high-level information from visual sensor measurements such as images, videos and 3D points. Since visual data is often high-dimensional, noisy and irregular, achieving robust data modeling is challenging. This thesis presents works that address challenges within a number of different computer vision problems. First, the thesis addresses the problem of phase unwrapping for multi-frequency amplitude modulated time-of-flight (ToF) ranging. ToF is used in depth cameras, which have many applications in 3D reconstruction and gesture recognition. While amplitude modulation in time-of-flight ranging can provide accurate measurements for the depth, it also causes depth ambiguities. This thesis presents a method to resolve the ambiguities by estimating the likelihoods of different hypotheses for the depth values. This is achieved by performing kernel density estimation over the hypotheses in a spatial neighborhood of each pixel in the depth image. The depth hypothesis with the highest estimated likelihood can then be selected as the output depth. This approach yields improvements in the quality of the depth images and extends the effective range in both indoor and outdoor environments. Next, point set registration is investigated, which is the problem of aligning point sets from overlapping depth images or 3D models. Robust registration is fundamental to many vision tasks, such as multi-view 3D reconstruction and object pose estimation for robotics. The thesis presents a method for handling density variations in the measured point sets. This is achieved by modeling a latent distribution representing the underlying structure of the scene. Both the model of the scene and the registration parameters are inferred in an Expectation-Maximization based framework. The thesis then introduces a method for integrating features from deep neural networks into the registration model. It is shown that the deep features improve registration performance in terms of accuracy and robustness. Additionally, improved feature representations are generated by training the deep neural network end-to-end by minimizing registration errors produced by our registration model. Further, an approach for 3D point set segmentation is presented. As scene models are often represented using 3D point measurements, segmentation of these is important for general scene understanding. Learning models for segmentation requires a significant amount of annotated data, which is expensive and time-consuming to acquire. The approach presented in the thesis circumvents this by projecting the points into virtual camera views and rendering 2D images. The method can then exploit accurate convolutional neural networks for image segmentation and map the segmentation predictions back to the 3D points. This also allows for transfer learning using available annotated image data, thereby reducing the need for 3D annotations. Finally, the thesis explores the problem of video object segmentation (VOS), where the task is to track and segment target objects in each frame of a video sequence. Accurate VOS requires a robust model of the target that can adapt to different scenarios and objects. This needs to be achieved using only a single labeled reference frame as training data for each video sequence. To address the challenges in VOS, the thesis introduces a parametric target model, optimized to predict a target label derived from the mask annotation. The target model is integrated into a deep neural network, where its predictions guide a decoder module to produce target segmentation masks. The deep network is trained on labeled video data to output accurate segmentation masks for each frame. Further, it is shown that by training the entire network model in an end-to-end manner, it can learn a representation of the target that provides increased segmentation accuracy.
  •  
7.
  • Järemo Lawin, Felix, et al. (author)
  • Registration Loss Learning for Deep Probabilistic Point Set Registration
  • 2020
  • In: 2020 International Conference on 3D Vision (3DV). - : IEEE. - 9781728181288 - 9781728181295 ; , s. 563-572
  • Conference paper (peer-reviewed)
    • Probabilistic methods for point set registration have interesting theoretical properties, such as linear complexity in the number of used points, and they easily generalize to joint registration of multiple point sets. In this work, we improve their registration performance to match the state of the art. This is done by incorporating learned features, by adding a von Mises-Fisher feature model in each mixture component, and by using learned attention weights. We learn these jointly using a registration loss learning strategy (RLL) that directly uses the registration error as a loss, by back-propagating through the registration iterations. This is possible as the probabilistic registration is fully differentiable, and the result is a learning framework that is truly end-to-end. We perform extensive experiments on the 3DMatch and KITTI datasets. The experiments demonstrate that our approach benefits significantly from the integration of the learned features and our learning strategy, outperforming the state of the art on KITTI. Code is available at https://github.com/felja633/RLLReg.
  •  
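Two ingredients named in the abstract can be sketched compactly: a von Mises-Fisher feature model per mixture component, and a loss placed directly on the registration output so that gradients flow back through the registration iterations into the feature network. This is a minimal sketch under assumed shapes, not the released RLLReg code.

```python
# Minimal sketch of a vMF feature model and a registration loss.
import torch
import torch.nn.functional as F

def vmf_log_likelihood(feats, mu, kappa):
    """Unnormalized log-likelihood of unit feature vectors under
    von Mises-Fisher components: log p(f) = kappa * mu^T f + const.
    feats: (N, D) point features; mu: (K, D) component mean directions.
    Returns an (N, K) matrix for use in the mixture's E-step."""
    return kappa * F.normalize(feats, dim=1) @ F.normalize(mu, dim=1).T

def registration_loss(R_est, t_est, R_gt, t_gt):
    """Loss on the final estimated pose. Because the registration
    iterations are differentiable, backpropagating this loss trains
    the feature network end-to-end, as the abstract describes."""
    return F.mse_loss(R_est, R_gt) + F.mse_loss(t_est, t_gt)
```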
8.
  • Kristan, M., et al. (author)
  • The Eighth Visual Object Tracking VOT2020 Challenge Results
  • 2020
  • In: Computer Vision. - Cham : Springer International Publishing. - 9783030682378 ; , s. 547-601
  • Conference paper (peer-reviewed)
    • The Visual Object Tracking challenge VOT2020 is the eighth annual tracker benchmarking activity organized by the VOT initiative. Results of 58 trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in recent years. The VOT2020 challenge was composed of five sub-challenges focusing on different tracking domains: (i) VOT-ST2020 challenge focused on short-term tracking in RGB, (ii) VOT-RT2020 challenge focused on "real-time" short-term tracking in RGB, (iii) VOT-LT2020 focused on long-term tracking, namely coping with target disappearance and reappearance, (iv) VOT-RGBT2020 challenge focused on short-term tracking in RGB and thermal imagery and (v) VOT-RGBD2020 challenge focused on long-term tracking in RGB and depth imagery. Only the VOT-ST2020 datasets were refreshed. A significant novelty is the introduction of a new VOT short-term tracking evaluation methodology and of segmentation ground truth in the VOT-ST2020 challenge; bounding boxes will no longer be used in the VOT-ST challenges. A new VOT Python toolkit that implements all these novelties was introduced. The performance of the tested trackers typically far exceeds standard baselines. The source code for most of the trackers is publicly available from the VOT page. The dataset, the evaluation kit and the results are publicly available at the challenge website (http://votchallenge.net).
  •  
9.
  • Robinson, Andreas, 1975-, et al. (author)
  • Discriminative Learning and Target Attention for the 2019 DAVIS Challenge on Video Object Segmentation
  • 2019
  • In: CVPR 2019 workshops.
  • Conference paper (peer-reviewed)
    • In this work, we address the problem of semi-supervised video object segmentation, where the task is to segment a target object in every image of the video sequence, given a ground truth only in the first frame. To be successful it is crucial to robustly handle unpredictable target appearance changes and distracting objects in the background. We obtain a robust and efficient representation of the target by integrating a fast and light-weight discriminative target model into a deep segmentation network. Trained during inference, the target model learns to discriminate between the local appearances of target and background image regions. Its predictions are enhanced to accurate segmentation masks in a subsequent refinement stage. To further improve the segmentation performance, we add a new module trained to generate global target attention vectors, given the input mask and image feature maps. The attention vectors add semantic information about the target from a previous frame to the refinement stage, complementing the predictions provided by the target appearance model. Our method is fast and requires no network fine-tuning. We achieve a combined J and F-score of 70.6 on the DAVIS 2019 test-challenge data.
  •  
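The global target attention vector described in the abstract can plausibly be obtained by mask-weighted pooling of a feature map; the sketch below shows that idea. The shapes and the pooling choice are assumptions, not the paper's exact module, which is a trained network.

```python
# Minimal sketch of a global target attention vector via masked pooling.
import torch

def target_attention_vector(feat, mask):
    """feat: (1, C, H, W) feature map from a previous frame;
    mask: (1, 1, H, W) soft target mask in [0, 1].
    Returns a (1, C) vector summarizing target appearance, to be fed
    to the refinement stage alongside the target model's predictions."""
    weighted = (feat * mask).sum(dim=(2, 3))
    return weighted / (mask.sum(dim=(2, 3)) + 1e-6)
```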
10.
  • Robinson, Andreas, 1975-, et al. (author)
  • Learning Fast and Robust Target Models for Video Object Segmentation
  • 2020
  • In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). - : IEEE. - 9781728171685 ; , s. 7404-7413
  • Conference paper (peer-reviewed)
    • Video object segmentation (VOS) is a highly challenging problem since the initial mask, defining the target object, is only given at test-time. The main difficulty is to effectively handle appearance changes and similar background objects, while maintaining accurate segmentation. Most previous approaches fine-tune segmentation networks on the first frame, resulting in impractical frame-rates and risk of overfitting. More recent methods integrate generative target appearance models, but either achieve limited robustness or require large amounts of training data. We propose a novel VOS architecture consisting of two network components. The target appearance model consists of a light-weight module, which is learned during the inference stage using fast optimization techniques to predict a coarse but robust target segmentation. The segmentation model is exclusively trained offline, designed to process the coarse scores into high quality segmentation masks. Our method is fast, easily trainable and remains highly effective in cases of limited training data. We perform extensive experiments on the challenging YouTube-VOS and DAVIS datasets. Our network achieves favorable performance, while operating at higher frame-rates compared to state-of-the-art. Code and trained models are available at https://github.com/andr345/frtm-vos.
  •  
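In contrast to the unrolled, end-to-end trainable learner sketched after entry 2, the target model here is fitted entirely at inference time with a fast optimizer, while the segmentation network stays fixed. A minimal sketch of that inference-time fitting, with assumed shapes, optimizer and hyperparameters rather than the released frtm-vos code:

```python
# Minimal sketch of an inference-time, light-weight target model.
import torch
import torch.nn.functional as F

def fit_target_model(feat, label, steps=20, lr=0.1, reg=1e-2):
    """Fit a tiny convolutional target model to the first frame.
    Only this module is optimized at test time; the offline-trained
    segmentation network that refines its coarse scores is untouched,
    matching the 'no network fine-tuning' claim in the abstract.

    feat: (1, C, H, W) first-frame features; label: (1, 1, H, W) mask.
    """
    w = torch.zeros(1, feat.shape[1], 3, 3, requires_grad=True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        score = F.conv2d(feat, w, padding=1)
        loss = F.mse_loss(torch.sigmoid(score), label) + reg * (w ** 2).sum()
        loss.backward()
        opt.step()
    return w.detach()
```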
11.
  • Tavares, Anderson, et al. (author)
  • Assessing losses for point set registration
  • 2020
  • In: IEEE Robotics and Automation Letters. - : Institute of Electrical and Electronics Engineers Inc. - 2377-3766. ; 5:2, s. 3360-3367
  • Journal article (peer-reviewed)
    • This letter introduces a framework for evaluation of the losses used in point set registration. In order for a loss to be useful with a local optimizer, such as Levenberg-Marquardt or expectation maximization (EM), it must be monotonic with respect to the sought transformation. This motivates us to introduce monotonicity violation probability (MVP) curves, and use these to assess monotonicity empirically for many different local distances, such as point-to-point, point-to-plane, and plane-to-plane. We also introduce a local shape-to-shape distance, based on the Wasserstein distance of the local normal distributions. Evaluation is done on a comprehensive benchmark of terrestrial lidar scans from two publicly available datasets. It demonstrates that matching robustness can be improved significantly by using kernel versions of local distances together with inverse density based sample weighting.
  •  
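The Wasserstein distance between two normal distributions, the quantity behind the local shape-to-shape distance introduced above, has a closed form: W2^2(N(mu1, S1), N(mu2, S2)) = ||mu1 - mu2||^2 + tr(S1 + S2 - 2 (S1^(1/2) S2 S1^(1/2))^(1/2)). A minimal sketch of that formula (the MVP-curve evaluation itself is not shown, and symmetric positive semi-definite covariances are assumed):

```python
# Closed-form squared 2-Wasserstein distance between two Gaussians.
import numpy as np
from scipy.linalg import sqrtm

def wasserstein2_gaussian(mu1, cov1, mu2, cov2):
    """W2^2 between N(mu1, cov1) and N(mu2, cov2); covariances must be
    symmetric PSD, e.g. estimated from a point's local neighborhood."""
    s1 = sqrtm(cov1)
    cross = sqrtm(s1 @ cov2 @ s1)
    return float(
        np.sum((mu1 - mu2) ** 2)
        + np.trace(cov1 + cov2 - 2.0 * np.real(cross))  # drop numeric imag
    )
```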