SwePub

Search results for the query "WFRF:(Kjellström Hedvig 1973)"

  • Results 1-50 of 60
1.
  • Broomé, Sofia, et al. (author)
  • Dynamics are important for the recognition of equine pain in video
  • 2019
  • In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. - : Institute of Electrical and Electronics Engineers (IEEE). - 1063-6919.
  • Conference paper (peer-reviewed), abstract:
    • A prerequisite to successfully alleviate pain in animals is to recognize it, which is a great challenge in non-verbal species. Furthermore, prey animals such as horses tend to hide their pain. In this study, we propose a deep recurrent two-stream architecture for the task of distinguishing pain from non-pain in videos of horses. Different models are evaluated on a unique dataset showing horses under controlled trials with moderate pain induction, which has been presented in earlier work. Sequential models are experimentally compared to single-frame models, showing the importance of the temporal dimension of the data, and are benchmarked against a veterinary expert classification of the data. We additionally perform baseline comparisons with generalized versions of state-of-the-art human pain recognition methods. While equine pain detection in machine learning is a novel field, our results surpass veterinary expert performance and outperform pain detection results reported for other larger non-human species. 
  •  
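To make the method description in the entry above concrete, the following is a minimal sketch (in PyTorch, with illustrative layer sizes and names; it is not the authors' released code) of a recurrent two-stream video classifier: an RGB stream and an optical-flow stream are each encoded frame by frame, aggregated over time by an LSTM, and fused for a binary pain/no-pain decision.

```python
# Minimal sketch of a recurrent two-stream video classifier (illustrative only;
# layer sizes and the fusion strategy are assumptions, not the published model).
import torch
import torch.nn as nn


class StreamEncoder(nn.Module):
    """Per-frame CNN encoder followed by an LSTM over time."""

    def __init__(self, in_channels: int, feat_dim: int = 128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.rnn = nn.LSTM(64, feat_dim, batch_first=True)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, time, channels, height, width)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).view(b, t, -1)
        _, (h_n, _) = self.rnn(feats)
        return h_n[-1]                                  # (batch, feat_dim)


class TwoStreamPainClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.rgb = StreamEncoder(in_channels=3)
        self.flow = StreamEncoder(in_channels=2)        # x/y optical flow
        self.head = nn.Linear(2 * 128, 2)               # pain vs. no pain

    def forward(self, rgb_clip, flow_clip):
        fused = torch.cat([self.rgb(rgb_clip), self.flow(flow_clip)], dim=-1)
        return self.head(fused)


if __name__ == "__main__":
    model = TwoStreamPainClassifier()
    logits = model(torch.randn(2, 16, 3, 112, 112), torch.randn(2, 16, 2, 112, 112))
    print(logits.shape)  # torch.Size([2, 2])
```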
2.
  • Broomé, Sofia, et al. (author)
  • Going Deeper than Tracking : A Survey of Computer-Vision Based Recognition of Animal Pain and Emotions
  • 2023
  • In: International Journal of Computer Vision. - : Springer Nature. - 0920-5691 .- 1573-1405. ; 131:2, pp. 572-590
  • Journal article (peer-reviewed), abstract:
    • Advances in animal motion tracking and pose recognition have been a game changer in the study of animal behavior. Recently, an increasing number of works go ‘deeper’ than tracking, and address automated recognition of animals’ internal states such as emotions and pain with the aim of improving animal welfare, making this a timely moment for a systematization of the field. This paper provides a comprehensive survey of computer vision-based research on recognition of pain and emotional states in animals, addressing both facial and bodily behavior analysis. We summarize the efforts that have been presented so far within this topic—classifying them across different dimensions, highlight challenges and research gaps, and provide best practice recommendations for advancing the field, and some future directions for research. 
  •  
3.
  • Broomé, Sofia, 1990- (author)
  • Learning Spatiotemporal Features in Low-Data and Fine-Grained Action Recognition with an Application to Equine Pain Behavior
  • 2022
  • Doctoral thesis (other academic/artistic), abstract:
    • Recognition of pain in animals is important because pain compromises animal welfare and can be a manifestation of disease. This is a difficult task for veterinarians and caretakers, partly because horses, being prey animals, display subtle pain behavior, and because they cannot verbalize their pain. An automated video-based system has a large potential to improve the consistency and efficiency of pain predictions. Video recording is desirable for ethological studies because it interferes minimally with the animal, in contrast to more invasive measurement techniques, such as accelerometers. Moreover, to be able to say something meaningful about animal behavior, the subject needs to be studied for longer than the exposure of single images. In deep learning, we have not come as far for video as we have for single images, and even more questions remain regarding what types of architectures should be used and what these models are actually learning. Collecting video data with controlled moderate pain labels is both laborious and involves real animals, and the amount of such data should therefore be limited. The low-data scenario, in particular, is under-explored in action recognition, in favor of the ongoing exploration of how well large models can learn large datasets. The first theme of the thesis is automated recognition of equine pain. Here, we propose a method for end-to-end equine pain recognition from video, finding, in particular, that the temporal modeling ability of the artificial neural network is important to improve the classification. We surpass veterinarian experts on a dataset with horses undergoing well-defined moderate experimental pain induction. Next, we investigate domain transfer to another type of pain in horses: less defined, longer-acting and lower-grade orthopedic pain. We find that a smaller, recurrent video model is more robust to domain shift on a target dataset than a large, pre-trained, 3D CNN, having equal performance on a source dataset. We also discuss challenges with learning video features on real-world datasets. Motivated by questions arising within the application area, the second theme of the thesis is empirical properties of deep video models. Here, we study the spatiotemporal features that are learned by deep video models in end-to-end video classification and propose an explainability method as a tool for such investigations. Further, the question of whether different approaches to frame dependency treatment in video models affect their cross-domain generalization ability is explored through empirical study. We also propose new datasets for light-weight temporal modeling and for investigating texture bias within action recognition.
  •  
4.
  • Broomé, Sofia, et al. (author)
  • Recur, Attend or Convolve? : On Whether Temporal Modeling Matters for Cross-Domain Robustness in Action Recognition
  • 2023
  • In: 2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV). - : Institute of Electrical and Electronics Engineers (IEEE). ; , pp. 4188-4198
  • Conference paper (peer-reviewed), abstract:
    • Most action recognition models today are highly parameterized, and evaluated on datasets with appearance-wise distinct classes. It has also been shown that 2D Convolutional Neural Networks (CNNs) tend to be biased toward texture rather than shape in still image recognition tasks [19], in contrast to humans. Taken together, this raises suspicion that large video models partly learn spurious spatial texture correlations rather than to track relevant shapes over time to infer generalizable semantics from their movement. A natural way to avoid parameter explosion when learning visual patterns over time is to make use of recurrence. Biological vision consists of abundant recurrent circuitry, and is superior to computer vision in terms of domain shift generalization. In this article, we empirically study whether the choice of low-level temporal modeling has consequences for texture bias and cross-domain robustness. In order to enable a light-weight and systematic assessment of the ability to capture temporal structure, not revealed from single frames, we provide the Temporal Shape (TS) dataset, as well as modified domains of Diving48 allowing for the investigation of spatial texture bias in video models. The combined results of our experiments indicate that sound physical inductive bias such as recurrence in temporal modeling may be advantageous when robustness to domain shift is important for the task.
  •  
5.
  • Broomé, Sofia, et al. (author)
  • Sharing pain : Using pain domain transfer for video recognition of low grade orthopedic pain in horses
  • 2022
  • In: PLOS ONE. - : Public Library of Science (PLoS). - 1932-6203. ; 17:3, pp. e0263854-
  • Journal article (peer-reviewed), abstract:
    • Orthopedic disorders are common among horses, often leading to euthanasia, which could often have been avoided with earlier detection. These conditions often create varying degrees of subtle long-term pain. It is challenging to train a visual pain recognition method with video data depicting such pain, since the resulting pain behavior is also subtle, sparsely appearing, and varying, making it challenging even for an expert human labeller to provide accurate ground truth for the data. We show that a model trained solely on a dataset of horses with acute experimental pain (where labeling is less ambiguous) can aid recognition of the more subtle displays of orthopedic pain. Moreover, we present a human expert baseline for the problem, as well as an extensive empirical study of various domain transfer methods and of what is detected by the pain recognition method trained on clean experimental pain in the orthopedic dataset. Finally, this is accompanied by a discussion of the challenges posed by real-world animal behavior datasets and how best practices can be established for similar fine-grained action recognition tasks. Our code is available at https://github.com/sofiabroome/painface-recognition.
  •  
6.
  • Bütepage, Judith, et al. (author)
  • A Probabilistic Semi-Supervised Approach to Multi-Task Human Activity Modeling
  • Other publication (other academic/artistic), abstract:
    • Human behavior is a continuous stochastic spatio-temporal process which is governed by semantic actions and affordances as well as latent factors. Therefore, video-based human activity modeling is concerned with a number of tasks such as inferring current and future semantic labels, predicting future continuous observations as well as imagining possible future label and feature sequences. In this paper we present a semi-supervised probabilistic deep latent variable model that can represent both discrete labels and continuous observations as well as latent dynamics over time. This allows the model to solve several tasks at once without explicit fine-tuning. We focus here on the tasks of action classification, detection, prediction and anticipation as well as motion prediction and synthesis based on 3D human activity data recorded with Kinect. We further extend the model to capture hierarchical label structure and to model the dependencies between multiple entities, such as a human and objects. Our experiments demonstrate that our principled approach to human activity modeling can be used to detect current and anticipate future semantic labels and to predict and synthesize future label and feature sequences. When comparing our model to state-of-the-art approaches, which are specifically designed for e.g. action classification, we find that our probabilistic formulation outperforms or is comparable to these task specific models.
  •  
7.
  • Butepage, Judith, et al. (author)
  • Anticipating many futures : Online human motion prediction and generation for human-robot interaction
  • 2018
  • In: 2018 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA). - : IEEE COMPUTER SOC. - 9781538630815 ; , pp. 4563-4570
  • Conference paper (peer-reviewed), abstract:
    • Fluent and safe interactions of humans and robots require both partners to anticipate the others' actions. The bottleneck of most methods is the lack of an accurate model of natural human motion. In this work, we present a conditional variational autoencoder that is trained to predict a window of future human motion given a window of past frames. Using skeletal data obtained from RGB depth images, we show how this unsupervised approach can be used for online motion prediction for up to 1660 ms. Additionally, we demonstrate online target prediction within the first 300-500 ms after motion onset without the use of target specific training data. The advantage of our probabilistic approach is the possibility to draw samples of possible future motion patterns. Finally, we investigate how movements and kinematic cues are represented on the learned low dimensional manifold.
  •  
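A minimal sketch of the conditional-variational-autoencoder idea described in the entry above, assuming PyTorch and illustrative pose/window/latent sizes (not the published architecture): future motion is encoded together with the past into a latent variable, and at test time several plausible futures can be drawn by sampling different latent codes.

```python
# Minimal conditional VAE sketch: encode past+future poses into a latent z,
# decode future poses conditioned on the past (illustrative, not the paper's model).
import torch
import torch.nn as nn
import torch.nn.functional as F

POSE_DIM, PAST, FUTURE, LATENT = 45, 10, 20, 32   # assumed sizes


class MotionCVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.past_enc = nn.GRU(POSE_DIM, 128, batch_first=True)
        self.post_enc = nn.GRU(POSE_DIM, 128, batch_first=True)
        self.to_stats = nn.Linear(256, 2 * LATENT)
        self.decoder = nn.Sequential(
            nn.Linear(LATENT + 128, 256), nn.ReLU(),
            nn.Linear(256, FUTURE * POSE_DIM),
        )

    def condition(self, past):
        _, h = self.past_enc(past)
        return h[-1]                                   # summary of the past window

    def forward(self, past, future):
        c = self.condition(past)
        _, h = self.post_enc(future)
        mu, logvar = self.to_stats(torch.cat([c, h[-1]], -1)).chunk(2, -1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        recon = self.decoder(torch.cat([z, c], -1)).view(-1, FUTURE, POSE_DIM)
        return recon, mu, logvar

    @torch.no_grad()
    def sample(self, past, n=5):
        """Draw several plausible futures for one observed past window."""
        c = self.condition(past).repeat(n, 1)
        z = torch.randn(n, LATENT)
        return self.decoder(torch.cat([z, c], -1)).view(n, FUTURE, POSE_DIM)


def loss_fn(recon, future, mu, logvar):
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return F.mse_loss(recon, future) + 1e-3 * kl


if __name__ == "__main__":
    m = MotionCVAE()
    past, future = torch.randn(4, PAST, POSE_DIM), torch.randn(4, FUTURE, POSE_DIM)
    recon, mu, logvar = m(past, future)
    print(loss_fn(recon, future, mu, logvar).item(), m.sample(past[:1]).shape)
```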
8.
  • Butepage, Judith, et al. (author)
  • Predicting the what and how - A probabilistic semi-supervised approach to multi-task human activity modeling
  • 2019
  • In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. - : IEEE Computer Society. - 9781728125060 ; , pp. 2923-2926
  • Conference paper (peer-reviewed), abstract:
    • Video-based prediction of human activity is usually performed on one of two levels: either a model is trained to anticipate high-level action labels or it is trained to predict future trajectories either in skeletal joint space or in image pixel space. This separation of classification and regression tasks implies that models cannot make use of the mutual information between continuous and semantic observations. However, if a model knew that an observed human wants to drink from a nearby glass, the space of possible trajectories would be highly constrained to reaching movements. Likewise, if a model had predicted a reaching trajectory, the inference of future semantic labels would rank 'lifting' more likely than 'walking'. In this work, we propose a semi-supervised generative latent variable model that addresses both of these levels by modeling continuous observations as well as semantic labels. This fusion of signals allows the model to solve several tasks, such as action detection and anticipation as well as motion prediction and synthesis, simultaneously. We demonstrate this ability on the UTKinect-Action3D dataset, which consists of noisy, partially labeled multi-action sequences. The aim of this work is to encourage research within the field of human activity modeling based on mixed categorical and continuous data.
  •  
9.
  • Butepage, Judith, et al. (author)
  • Social Affordance Tracking over Time - A Sensorimotor Account of False-Belief Tasks
  • 2016
  • In: Proceedings of the 38th Annual Meeting of the Cognitive Science Society, CogSci 2016. - : The Cognitive Science Society. ; , pp. 1014-1019
  • Conference paper (peer-reviewed), abstract:
    • False-belief tasks have mainly been associated with the explanatory notion of the theory of mind and the theory-theory. However, it has often been pointed out that this kind of high-level reasoning is computationally expensive and time-consuming. During the last decades, the idea of embodied intelligence, i.e. complex behavior caused by sensorimotor contingencies, has emerged in the fields of neuroscience, psychology and artificial intelligence. Viewed from this perspective, failing a false-belief test can be the result of an impaired ability to recognize and track others' sensorimotor contingencies and affordances. Thus, social cognition is explained in terms of low-level signals instead of high-level reasoning. In this work, we present a generative model for optimal action selection which can simultaneously be employed to make predictions of others' actions. As we base the decision making on a hidden state representation of sensorimotor signals, this model is in line with the ideas of embodied intelligence. We demonstrate how the tracking of others' hidden states can give rise to correct false-belief inferences, while a lack thereof leads to failure. With this work, we want to emphasize the importance of sensorimotor contingencies in social cognition, which might be a key to artificial, socially intelligent systems.
  •  
10.
  • Christoffersen, Benjamin, et al. (author)
  • Asymptotically Exact and Fast Gaussian Copula Models for Imputation of Mixed Data Types
  • 2021
  • In: Proceedings of Machine Learning Research. - : ML Research Press. ; , pp. 870-885
  • Conference paper (peer-reviewed), abstract:
    • Missing values with mixed data types are a common problem in a large number of machine learning applications, such as the processing of surveys and different medical applications. Recently, Gaussian copula models have been suggested as a means of performing imputation of missing values using a probabilistic framework. While the present Gaussian copula models have been shown to yield state-of-the-art performance, they have two limitations: they are based on an approximation that is fast but may be imprecise, and they do not support unordered multinomial variables. We address the first limitation using direct and arbitrarily precise approximations both for model estimation and imputation by using randomized quasi-Monte Carlo procedures. The method we provide has lower errors for the estimated model parameters and the imputed values, compared to previously proposed methods. We also extend the previous Gaussian copula models to include unordered multinomial variables in addition to the present support of ordinal, binary, and continuous variables.
  •  
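A toy illustration of Gaussian-copula imputation for continuous columns only, using NumPy, SciPy and pandas; it deliberately omits the paper's randomized quasi-Monte Carlo estimation and its support for ordinal, binary and multinomial variables, and all function names are ours.

```python
# Toy Gaussian-copula imputation for continuous columns (illustrative only:
# omits the paper's randomized quasi-Monte Carlo procedures and the extension
# to ordinal/binary/multinomial variables).
import numpy as np
import pandas as pd
from scipy import stats


def to_normal_scores(col):
    """Map one column to standard-normal scores through its empirical CDF."""
    obs = ~np.isnan(col)
    z = np.full_like(col, np.nan)
    z[obs] = stats.norm.ppf(stats.rankdata(col[obs]) / (obs.sum() + 1))
    return z


def gaussian_copula_impute(X):
    """Impute missing entries of a continuous data matrix X (NaN = missing)."""
    Z = np.column_stack([to_normal_scores(X[:, j]) for j in range(X.shape[1])])
    corr = pd.DataFrame(Z).corr(min_periods=2).to_numpy()     # pairwise estimate
    X_imp = X.copy()
    for i in range(X.shape[0]):
        m, o = np.isnan(Z[i]), ~np.isnan(Z[i])
        if not m.any() or not o.any():
            continue
        # Conditional Gaussian mean: E[z_m | z_o] = S_mo S_oo^{-1} z_o
        S_oo, S_mo = corr[np.ix_(o, o)], corr[np.ix_(m, o)]
        z_m = S_mo @ np.linalg.solve(S_oo, Z[i, o])
        # Map imputed normal scores back through each column's empirical quantiles
        for z, j in zip(z_m, np.where(m)[0]):
            observed = X[~np.isnan(X[:, j]), j]
            X_imp[i, j] = np.quantile(observed, stats.norm.cdf(z))
    return X_imp


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.multivariate_normal([0, 0, 0], [[1, .8, .3], [.8, 1, .3], [.3, .3, 1]], 200)
    X[rng.random(X.shape) < 0.2] = np.nan
    print(np.isnan(gaussian_copula_impute(X)).sum())   # fewer (ideally zero) NaNs
```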
11.
  • Colomer, Marc Botet, et al. (author)
  • To Adapt or Not to Adapt? : Real-Time Adaptation for Semantic Segmentation
  • 2023
  • In: 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023). - : Institute of Electrical and Electronics Engineers (IEEE). ; , pp. 16502-16513
  • Conference paper (peer-reviewed), abstract:
    • The goal of Online Domain Adaptation for semantic segmentation is to handle unforeseeable domain changes that occur during deployment, like sudden weather events. However, the high computational costs associated with brute-force adaptation make this paradigm infeasible for real-world applications. In this paper we propose HAMLET, a Hardware-Aware Modular Least Expensive Training framework for real-time domain adaptation. Our approach includes a hardware-aware back-propagation orchestration agent (HAMT) and a dedicated domain-shift detector that enables active control over when and how the model is adapted (LT). Thanks to these advancements, our approach is capable of performing semantic segmentation while simultaneously adapting at more than 29 FPS on a single consumer-grade GPU. Our framework's encouraging accuracy and speed trade-off is demonstrated on the OnDA and SHIFT benchmarks through experimental results.
  •  
12.
  • Dogan, Fethiye Irmak (author)
  • Robots That Understand Natural Language Instructions and Resolve Ambiguities
  • 2023
  • Doctoral thesis (other academic/artistic), abstract:
    • Verbal communication is a key challenge in human-robot interaction. For effective verbal interaction, understanding natural language instructions and clarifying ambiguous user requests are crucial for robots. In real-world environments, the instructions can be ambiguous for many reasons. For instance, when a user asks the robot to find and bring 'the porcelain mug', the mug might be located in the kitchen cabinet or on the dining room table, depending on whether it is clean or full (semantic ambiguities). Additionally, there can be multiple mugs in the same location, and the robot can disambiguate them by asking follow-up questions based on their distinguishing features, such as their color or spatial relations to other objects (visual ambiguities). While resolving ambiguities, previous works have addressed this problem by only disambiguating the objects in the robot's current view and have not considered ones outside the robot's point of view. To fill in this gap and resolve semantic ambiguities caused by objects possibly being located at multiple places, we present a novel approach by reasoning about their semantic properties. On the other hand, while dealing with ambiguous instructions caused by multiple similar objects in the same location, most of the existing systems ask users to repeat their requests with the assumption that the robot is familiar with all of the objects in the environment. To address this limitation and resolve visual ambiguities, we present an interactive system that asks for follow-up clarifications to disambiguate the described objects using the pieces of information that the robot could understand from the request and the objects in the environment that are known to the robot. In summary, in this thesis, we aim to resolve semantic and visual ambiguities to guide a robot's search for described objects specified in user instructions. With semantic disambiguation, we aim to find described objects' locations across an entire household by leveraging object semantics to form clarifying questions when there are ambiguities. After identifying object locations, with visual disambiguation, we aim to identify the specified object among multiple similar objects located in the same space. To achieve this, we suggest a multi-stage approach where the robot first identifies the objects that are fitting to the user's description, and if there are multiple objects, the robot generates clarification questions by describing each potential target object with its spatial relations to other objects. Our results emphasize the significance of semantic and visual disambiguation for successful task completion and human-robot collaboration.
  •  
13.
  • Dovesi, P. L., et al. (author)
  • Real-Time Semantic Stereo Matching
  • 2020
  • In: Proceedings - IEEE International Conference on Robotics and Automation. - : Institute of Electrical and Electronics Engineers Inc. ; , pp. 10780-10787
  • Conference paper (peer-reviewed), abstract:
    • Scene understanding is paramount in robotics, self-navigation, augmented reality, and many other fields. To fully accomplish this task, an autonomous agent has to infer the 3D structure of the sensed scene (to know where it is looking) and its content (to know what it sees). To tackle the two tasks, deep neural networks trained to infer semantic segmentation and depth from stereo images are often the preferred choices. Specifically, Semantic Stereo Matching can be tackled by either standalone models trained for the two tasks independently or joint end-to-end architectures. Nonetheless, as proposed so far, both solutions are inefficient, requiring two forward passes in the former case or suffering from the complexity of a single network in the latter, although jointly tackling both tasks is usually beneficial in terms of accuracy. In this paper, we propose a single compact and lightweight architecture for real-time semantic stereo matching. Our framework relies on coarse-to-fine estimations in a multi-stage fashion, allowing: i) very fast inference even on embedded devices, with marginal drops in accuracy compared to state-of-the-art networks, and ii) trading accuracy for speed, according to the specific application requirements. Experimental results on high-end GPUs as well as on an embedded Jetson TX2 confirm the superiority of semantic stereo matching compared to standalone tasks and highlight the versatility of our framework on any hardware and for any application.
  •  
14.
  • Eriksson, Sara, et al. (author)
  • Dancing With Drones : Crafting Novel Artistic Expressions Through Intercorporeality
  • 2019
  • In: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. - New York, NY, USA : ACM. - 9781450359702 ; , pp. 617:1-617:12
  • Conference paper (peer-reviewed), abstract:
    • Movement-based interactions are gaining traction, requiring a better understanding of how such expressions are shaped by designers. Through an analysis of an artistic process aimed to deliver a commissioned opera where custom-built drones are performing on stage alongside human performers, we observed the importance of achieving an intercorporeal understanding to shape body-based emotional expressivity. Our analysis reveals how the choreographer moves herself to: (1) imitate and feel the affordances and expressivity of the drones' 'otherness' through her own bodily experience; (2) communicate to the engineer of the team how she wants to alter the drones' behaviors to be more expressive; (3) enact and interactively alter her choreography. Through months of intense development and creative work, such an intercorporeal understanding was achieved by carefully crafting the drones' behaviors, but also by the choreographer adjusting her own somatics and expressions. The choreography arose as a result of the expressivity they enabled together.
  •  
15.
  • Hamesse, Charles, et al. (author)
  • Simultaneous measurement imputation and outcome prediction for achilles tendon rupture rehabilitation
  • 2018
  • In: CEUR Workshop Proceedings. - : CEUR-WS. ; , pp. 82-86
  • Conference paper (peer-reviewed), abstract:
    • Achilles Tendon Rupture (ATR) is one of the typical soft tissue injuries. Accurately predicting the rehabilitation outcome of ATR using noisy measurements with missing entries is crucial for treatment decision support. In this work, we design a probabilistic model that simultaneously predicts the missing measurements and the rehabilitation outcome in an end-to-end manner. We evaluate our model and compare it with multiple baselines including multi-stage methods using an ATR clinical cohort. Experimental results demonstrate the superiority of our model for ATR rehabilitation outcome prediction.
  •  
16.
  • Hamesse, Charles, et al. (author)
  • Simultaneous Measurement Imputation and Outcome Prediction for Achilles Tendon Rupture Rehabilitation
  • 2019
  • In: Proceedings of Machine Learning Research 106.
  • Conference paper (peer-reviewed), abstract:
    • Achilles Tendon Rupture (ATR) is one of the typical soft tissue injuries. Rehabilitation after such a musculoskeletal injury remains a prolonged process with a very variable outcome. Accurately predicting rehabilitation outcome is crucial for treatment decision support. However, it is challenging to train an automatic method for predicting the ATR rehabilitation outcome from treatment data, due to a massive amount of missing entries in the data recorded from ATR patients, as well as complex nonlinear relations between measurements and outcomes. In this work, we design an end-to-end probabilistic framework to impute missing data entries and predict rehabilitation outcomes simultaneously. We evaluate our model on a real-life ATR clinical cohort, comparing with various baselines. The proposed method demonstrates its clear superiority over traditional methods which typically perform imputation and prediction in two separate stages.
  •  
17.
  •  
18.
  • Haubro Andersen, Pia, et al. (author)
  • Towards Machine Recognition of Facial Expressions of Pain in Horses
  • 2021
  • In: Animals. - : MDPI. - 2076-2615. ; 11:6
  • Research review (peer-reviewed), abstract:
    • Simple Summary: Facial activity can convey valid information about the experience of pain in a horse. However, scoring of pain in horses based on facial activity is still in its infancy and accurate scoring can only be performed by trained assessors. Pain in humans can now be recognized reliably from video footage of faces, using computer vision and machine learning. We examine the hurdles in applying these technologies to horses and suggest two general approaches to automatic horse pain recognition. The first approach involves automatically detecting objectively defined facial expression aspects that do not involve any human judgment of what the expression "means". Automated classification of pain expressions can then be done according to a rule-based system since the facial expression aspects are defined with this information in mind. The other involves training very flexible machine learning methods with raw videos of horses with known true pain status. The upside of this approach is that the system has access to all the information in the video without engineered intermediate methods that have filtered out most of the variation. However, a large challenge is that large datasets with reliable pain annotation are required. We have obtained promising results from both approaches. Abstract: Automated recognition of human facial expressions of pain and emotions is to a certain degree a solved problem, using approaches based on computer vision and machine learning. However, the application of such methods to horses has proven difficult. Major barriers are the lack of sufficiently large, annotated databases for horses and difficulties in obtaining correct classifications of pain because horses are non-verbal. This review describes our work to overcome these barriers, using two different approaches. One involves the use of a manual, but relatively objective, classification system for facial activity (Facial Action Coding System), where data are analyzed for pain expressions after coding using machine learning principles. We have devised tools that can aid manual labeling by identifying the faces and facial keypoints of horses. This approach provides promising results in the automated recognition of facial action units from images. The second approach, recurrent neural network end-to-end learning, requires less extraction of features and representations from the video but instead depends on large volumes of video data with ground truth. Our preliminary results suggest clearly that dynamics are important for pain recognition and show that combinations of recurrent neural networks can classify experimental pain in a small number of horses better than human raters.
  •  
19.
  •  
20.
  •  
21.
  • Jonell, Patrik, et al. (author)
  • Multimodal Capture of Patient Behaviour for Improved Detection of Early Dementia : Clinical Feasibility and Preliminary Results
  • 2021
  • In: Frontiers in Computer Science. - : Frontiers Media SA. - 2624-9898. ; 3
  • Journal article (peer-reviewed), abstract:
    • Non-invasive automatic screening for Alzheimer's disease has the potential to improve diagnostic accuracy while lowering healthcare costs. Previous research has shown that patterns in speech, language, gaze, and drawing can help detect early signs of cognitive decline. In this paper, we describe a highly multimodal system for unobtrusively capturing data during real clinical interviews conducted as part of cognitive assessments for Alzheimer's disease. The system uses nine different sensor devices (smartphones, a tablet, an eye tracker, a microphone array, and a wristband) to record interaction data during a specialist's first clinical interview with a patient, and is currently in use at Karolinska University Hospital in Stockholm, Sweden. Furthermore, complementary information in the form of brain imaging, psychological tests, speech therapist assessment, and clinical meta-data is also available for each patient. We detail our data-collection and analysis procedure and present preliminary findings that relate measures extracted from the multimodal recordings to clinical assessments and established biomarkers, based on data from 25 patients gathered thus far. Our findings demonstrate feasibility for our proposed methodology and indicate that the collected data can be used to improve clinical assessments of early dementia.
  •  
22.
  • Karipidou, Kelly, et al. (author)
  • Computer Analysis of Sentiment Interpretation in Musical Conducting
  • 2017
  • In: Proceedings - 12th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2017. - : IEEE. - 9781509040230 ; , pp. 400-405
  • Conference paper (peer-reviewed), abstract:
    • This paper presents a unique dataset consisting of 20 recordings of the same musical piece, conducted with 4 different musical intentions in mind. The upper body and baton motion of a professional conductor was recorded, as well as the sound of each instrument in a professional string quartet following the conductor. The dataset is made available for benchmarking of motion recognition algorithms. An HMM-based emotion intent classification method is trained with subsets of the data, and classification of other subsets of the data shows firstly that the motion of the baton communicates energetic intention to a high degree, secondly, that the conductor's torso, head and other arm convey calm intention to a high degree, and thirdly, that positive vs negative sentiments are communicated to a high degree through other channels than the body and baton motion – most probably, through facial expression and muscle tension conveyed through articulated hand and finger motion. The long-term goal of this work is to develop a computer model of the entire conductor-orchestra communication process; the studies presented here indicate that computer modeling of the conductor-orchestra communication is feasible.
  •  
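The classification scheme sketched below is one plausible reading of the HMM-based approach in the entry above, not the authors' implementation: one Gaussian HMM per intent class is fitted with the hmmlearn library, and a new motion sequence is assigned to the class whose model gives it the highest log-likelihood. Feature dimensionality and the number of hidden states are illustrative assumptions.

```python
# Minimal HMM-based sequence classification sketch (assumes hmmlearn is installed;
# feature dimensionality and number of hidden states are illustrative choices).
import numpy as np
from hmmlearn.hmm import GaussianHMM


def fit_class_models(sequences_per_class, n_states=5):
    """Train one Gaussian HMM per intent class on its motion sequences."""
    models = {}
    for label, seqs in sequences_per_class.items():
        X = np.concatenate(seqs)                    # (total_frames, feat_dim)
        lengths = [len(s) for s in seqs]
        models[label] = GaussianHMM(n_components=n_states,
                                    covariance_type="diag",
                                    n_iter=50).fit(X, lengths)
    return models


def classify(models, sequence):
    """Assign the intent whose HMM gives the sequence the highest log-likelihood."""
    return max(models, key=lambda label: models[label].score(sequence))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat_dim = 12                                   # e.g. baton/upper-body features
    train = {
        "energetic": [rng.normal(1.0, 1.0, (100, feat_dim)) for _ in range(5)],
        "calm": [rng.normal(-1.0, 0.5, (100, feat_dim)) for _ in range(5)],
    }
    models = fit_class_models(train)
    print(classify(models, rng.normal(1.0, 1.0, (80, feat_dim))))  # likely "energetic"
```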
23.
  • Karvonen, Andrew, et al. (author)
  • The ‘New Urban Science’: towards the interdisciplinary and transdisciplinary pursuit of sustainable transformations
  • 2021
  • In: Urban Transformations. - : Springer Nature. - 2524-8162. ; 3:1
  • Journal article (peer-reviewed), abstract:
    • Digitalisation is an increasingly important driver of urban development. The ‘New Urban Science’ is one particular approach to urban digitalisation that promises new ways of knowing and managing cities more effectively. Proponents of the New Urban Science emphasise urban data analytics and modelling as a means to develop novel insights on how cities function. However, there are multiple opportunities to broaden and deepen these practices through collaborations between the natural and social sciences as well as with public authorities, private companies, and civil society. In this article, we summarise the history and critiques of urban science and then call for a New Urban Science that embraces interdisciplinary and transdisciplinary approaches to scientific knowledge production and application. We argue that such an expanded version of the New Urban Science can be used to develop urban transformative capacity and achieve ecologically resilient, economically prosperous, and socially robust cities of the twenty-first century.
  •  
24.
  • Kjellström, Hedvig, 1973- (author)
  • Contextual Action Recognition
  • 2011
  • In: Visual Analysis of Humans. - : Springer. - 9780857299963 ; , pp. 355-376
  • Book chapter (other academic/artistic)
  •  
25.
  • Klasson, Marcus, et al. (author)
  • A hierarchical grocery store image dataset with visual and semantic labels
  • 2019
  • In: Proceedings - 2019 IEEE Winter Conference on Applications of Computer Vision, WACV 2019. - : Institute of Electrical and Electronics Engineers (IEEE). - 9781728119755 ; , pp. 491-500
  • Conference paper (peer-reviewed), abstract:
    • Image classification models built into visual support systems and other assistive devices need to provide accurate predictions about their environment. We focus on an application of assistive technology for people with visual impairments, for daily activities such as shopping or cooking. In this paper, we provide a new benchmark dataset for a challenging task in this application – classification of fruits, vegetables, and refrigerated products, e.g. milk packages and juice cartons, in grocery stores. To enable the learning process to utilize multiple sources of structured information, this dataset not only contains a large volume of natural images but also includes the corresponding information of the product from an online shopping website. Such information encompasses the hierarchical structure of the object classes, as well as an iconic image of each type of object. This dataset can be used to train and evaluate image classification models for helping visually impaired people in natural environments. Additionally, we provide benchmark results evaluated on pretrained convolutional neural networks often used for image understanding purposes, and also a multi-view variational autoencoder, which is capable of utilizing the rich product information in the dataset.
  •  
26.
  • Klasson, Marcus (author)
  • Fine-Grained and Continual Visual Recognition for Assisting Visually Impaired People
  • 2022
  • Doctoral thesis (other academic/artistic), abstract:
    • In recent years, computer vision-based assistive technologies have enabled visually impaired people to use automatic visual recognition on their mobile phones. These systems should be capable of recognizing objects on fine-grained levels to provide the user with accurate predictions. Additionally, the user should have the option to update the system continuously to recognize new objects of interest. However, there are several challenges that need to be tackled to enable such features with assistive vision systems in real and highly-varying environments. For instance, fine-grained image recognition usually requires large amounts of labeled data to be robust. Moreover, image classifiers struggle with retaining performance of previously learned abilities when they are adapted to new tasks. This thesis is divided into two parts where we address these challenges. First, we focus on the application of using assistive vision systems for grocery shopping, where items are naturally structured based on fine-grained details. We demonstrate how image classifiers can be trained with a combination of natural images and web-scraped information about the groceries to obtain more accurate classification performance compared to only using natural images for training. Thereafter, we bring forward a new approach for continual learning called replay scheduling, where we select which tasks to replay at different times to improve memory retention. Furthermore, we propose a novel framework for learning replay scheduling policies that can generalize to new continual learning scenarios for mitigating the catastrophic forgetting effect in image classifiers. This thesis provides insights on practical challenges that need to be addressed to enhance the usefulness of computer vision for assisting the visually impaired in real-world scenarios.
  •  
27.
  • Klasson, Marcus, et al. (author)
  • Learn the Time to Learn : Replay Scheduling in Continual Learning
  • Other publication (other academic/artistic), abstract:
    • Replay-based continual learning has been shown to be successful in mitigating catastrophic forgetting despite having limited access to historical data. However, storing historical data is cheap in many real-world applications, yet replaying all seen data would be prohibitively expensive due to processing time constraints. In such settings, we propose learning the time to learn for a continual learning system, in which we learn replay schedules over which tasks to replay at different time steps. To demonstrate the importance of learning the time to learn, we use Monte Carlo tree search in an ideal continual learning scenario to find the proper replay schedule. We perform extensive evaluations to show the benefits of replay scheduling in various memory settings and in combination with different replay methods. Moreover, the results indicate that the found schedules are consistent with human learning insights. Our findings open up new research directions that can bring current continual learning research closer to real-world needs.
  •  
28.
  • Klasson, Marcus, et al. (author)
  • Policy Learning for Replay Scheduling in Continual Learning
  • Other publication (other academic/artistic), abstract:
    • Scheduling which tasks to select for replay at different times has been demonstrated to be important in continual learning. However, a replay scheduling policy that can be applied in any continual learning scenario is currently missing, which makes replay scheduling infeasible in real-world scenarios. To this end, we propose using reinforcement learning to enable learning policies that can be applied in new continual learning scenarios without additional computational cost. In our experiments, we show that the learned policies can propose replay schedules that efficiently mitigate catastrophic forgetting in environments with previously unseen task orders and datasets. The proposed approach opens up new research directions in replay-based continual learning that align well with real-world needs.
  •  
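As a rough illustration of what "replay scheduling" means in the two entries above, the sketch below replaces the papers' Monte Carlo tree search and learned policies with a simple greedy stand-in: at each step, a small set of candidate replay mixes over previously seen tasks is enumerated, and the mix that maximizes a supplied validation score is kept. All names and the toy evaluator are ours.

```python
# Greedy stand-in for replay scheduling (illustrative only: the papers use Monte
# Carlo tree search / a learned RL policy; here each step just evaluates a small
# set of candidate replay mixes and keeps the best one).
from itertools import product
from typing import Callable, Dict, List, Tuple

ReplayMix = Dict[int, float]          # task id -> fraction of the replay budget


def candidate_mixes(seen_tasks: List[int], levels=(0.0, 0.5, 1.0)) -> List[ReplayMix]:
    """Enumerate coarse replay mixes over previously seen tasks (then normalise)."""
    mixes = []
    for weights in product(levels, repeat=len(seen_tasks)):
        total = sum(weights)
        if total > 0:
            mixes.append({t: w / total for t, w in zip(seen_tasks, weights)})
    return mixes


def choose_replay_mix(seen_tasks: List[int],
                      evaluate: Callable[[ReplayMix], float]) -> Tuple[ReplayMix, float]:
    """Pick the mix that maximises validation accuracy after the next task."""
    scored = [(mix, evaluate(mix)) for mix in candidate_mixes(seen_tasks)]
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Dummy evaluator: pretend task 0 is the one most at risk of being forgotten.
    def fake_eval(mix: ReplayMix) -> float:
        return 0.6 + 0.3 * mix.get(0, 0.0) + 0.1 * mix.get(1, 0.0)

    best_mix, best_acc = choose_replay_mix(seen_tasks=[0, 1], evaluate=fake_eval)
    print(best_mix, round(best_acc, 3))   # mix concentrated on task 0
```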
29.
  • Klasson, Marcus, et al. (author)
  • Using Variational Multi-view Learning for Classification of Grocery Items
  • 2020
  • In: Patterns. - : Elsevier. - 2666-3899. ; 1:8
  • Journal article (peer-reviewed), abstract:
    • An essential task for computer vision-based assistive technologies is to help visually impaired people to recognize objects in constrained environments, for instance, recognizing food items in grocery stores. In this paper, we introduce a novel dataset with natural images of groceries—fruits, vegetables, and packaged products—where all images have been taken inside grocery stores to resemble a shopping scenario. Additionally, we download iconic images and text descriptions for each item that can be utilized for better representation learning of groceries. We select a multi-view generative model, which can combine the different item information into lower-dimensional representations. The experiments show that utilizing the additional information yields higher accuracies on classifying grocery items than only using the natural images. We observe that iconic images help to construct representations separated by visual differences of the items, while text descriptions enable the model to distinguish between visually similar items by different ingredients.
  •  
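A compact sketch of a multi-view variational autoencoder in the spirit of the model described in the entry above (PyTorch; all dimensions, the fusion-by-concatenation choice, and the classifier head are illustrative assumptions, not the published model): image features and text-description features are fused into one latent code used both for reconstructing each view and for classifying the grocery item.

```python
# Compact multi-view VAE sketch: natural-image features and text-description
# features are fused into one latent code used for reconstruction and for
# classifying the grocery item (all dimensions are illustrative assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

IMG_DIM, TXT_DIM, LATENT, N_CLASSES = 512, 300, 64, 81


class MultiViewVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(IMG_DIM + TXT_DIM, 256), nn.ReLU(),
                                     nn.Linear(256, 2 * LATENT))
        self.img_dec = nn.Linear(LATENT, IMG_DIM)
        self.txt_dec = nn.Linear(LATENT, TXT_DIM)
        self.classifier = nn.Linear(LATENT, N_CLASSES)

    def forward(self, img_feat, txt_feat):
        mu, logvar = self.encoder(torch.cat([img_feat, txt_feat], -1)).chunk(2, -1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        return self.img_dec(z), self.txt_dec(z), self.classifier(mu), mu, logvar


def loss_fn(model, img_feat, txt_feat, labels):
    img_rec, txt_rec, logits, mu, logvar = model(img_feat, txt_feat)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return (F.mse_loss(img_rec, img_feat) + F.mse_loss(txt_rec, txt_feat)
            + F.cross_entropy(logits, labels) + 1e-3 * kl)


if __name__ == "__main__":
    model = MultiViewVAE()
    loss = loss_fn(model, torch.randn(8, IMG_DIM), torch.randn(8, TXT_DIM),
                   torch.randint(0, N_CLASSES, (8,)))
    print(loss.item())
```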
30.
  • Kucherenko, Taras, 1994-, et al. (author)
  • A neural network approach to missing marker reconstruction in human motion capture
  • 2018
  • Other publication (other academic/artistic), abstract:
    • Optical motion capture systems have become a widely used technology in various fields, such as augmented reality, robotics, movie production, etc. Such systems use a large number of cameras to triangulate the position of optical markers. The marker positions are estimated with high accuracy. However, especially when tracking articulated bodies, a fraction of the markers in each timestep is missing from the reconstruction. In this paper, we propose to use a neural network approach to learn how human motion is temporally and spatially correlated, and reconstruct missing marker positions through this model. We experiment with two different models, one LSTM-based and one time-window-based. Both methods produce state-of-the-art results, while working online, as opposed to most of the alternative methods, which require the complete sequence to be known. The implementation is publicly available at https://github.com/Svito-zar/NN-for-Missing-Marker-Reconstruction.
  •  
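A minimal sketch of the LSTM-based variant described in the entry above (PyTorch; the marker count, mask encoding, and layer sizes are assumptions, not the published configuration): incomplete marker trajectories plus a binary missingness mask go in, and the network regresses complete marker positions for every frame.

```python
# Sketch of an LSTM-based missing-marker reconstruction model: incomplete marker
# positions (missing entries zero-filled) plus a binary mask go in, complete
# positions come out (sizes are illustrative, not the published configuration).
import torch
import torch.nn as nn

N_MARKERS = 41                        # assumed marker count
IN_DIM = 2 * 3 * N_MARKERS            # xyz positions + per-coordinate mask


class MarkerReconstructionLSTM(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        self.rnn = nn.LSTM(IN_DIM, hidden, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden, 3 * N_MARKERS)

    def forward(self, positions, mask):
        # positions, mask: (batch, time, 3 * N_MARKERS); missing entries are zero
        h, _ = self.rnn(torch.cat([positions * mask, mask], dim=-1))
        return self.out(h)            # reconstructed positions for every frame


if __name__ == "__main__":
    model = MarkerReconstructionLSTM()
    pos = torch.randn(4, 100, 3 * N_MARKERS)
    mask = (torch.rand_like(pos) > 0.1).float()       # ~10% of entries missing
    recon = model(pos, mask)
    loss = ((recon - pos) ** 2 * (1 - mask)).mean()   # penalise only missing entries
    print(recon.shape, loss.item())
```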
31.
  • Kucherenko, Taras, 1994-, et al. (author)
  • Analyzing Input and Output Representations for Speech-Driven Gesture Generation
  • 2019
  • In: 19th ACM International Conference on Intelligent Virtual Agents. - New York, NY, USA : ACM Publications. - 9781450366724
  • Conference paper (peer-reviewed), abstract:
    • This paper presents a novel framework for automatic speech-driven gesture generation, applicable to human-agent interaction including both virtual agents and robots. Specifically, we extend recent deep-learning-based, data-driven methods for speech-driven gesture generation by incorporating representation learning. Our model takes speech as input and produces gestures as output, in the form of a sequence of 3D coordinates. Our approach consists of two steps. First, we learn a lower-dimensional representation of human motion using a denoising autoencoder neural network, consisting of a motion encoder MotionE and a motion decoder MotionD. The learned representation preserves the most important aspects of the human pose variation while removing less relevant variation. Second, we train a novel encoder network SpeechE to map from speech to a corresponding motion representation with reduced dimensionality. At test time, the speech encoder and the motion decoder networks are combined: SpeechE predicts motion representations based on a given speech signal and MotionD then decodes these representations to produce motion sequences. We evaluate different representation sizes in order to find the most effective dimensionality for the representation. We also evaluate the effects of using different speech features as input to the model. We find that mel-frequency cepstral coefficients (MFCCs), alone or combined with prosodic features, perform the best. The results of a subsequent user study confirm the benefits of the representation learning.
  •  
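The two-step training scheme in the entry above can be summarised with the following sketch (PyTorch; the per-frame MLPs and all sizes are simplifications of the paper's denoising autoencoder, and the loss pairing in the demo is only indicative): MotionE/MotionD form a motion autoencoder, SpeechE is trained to map speech features into the learned motion representation, and at test time SpeechE and MotionD are chained.

```python
# Two-step sketch of the representation-learning pipeline:
# (1) a motion autoencoder (MotionE/MotionD) learns a compact pose representation,
# (2) a speech encoder (SpeechE) maps speech features to that representation;
# at test time SpeechE + MotionD generate motion from speech alone.
# All dimensions are illustrative assumptions.
import torch
import torch.nn as nn

SPEECH_DIM, POSE_DIM, REPR_DIM = 26, 45, 32   # e.g. MFCCs in, 3D joints out


def mlp(i, o):
    return nn.Sequential(nn.Linear(i, 128), nn.ReLU(), nn.Linear(128, o))


class SpeechToGesture(nn.Module):
    def __init__(self):
        super().__init__()
        self.motion_e = mlp(POSE_DIM, REPR_DIM)     # MotionE
        self.motion_d = mlp(REPR_DIM, POSE_DIM)     # MotionD
        self.speech_e = mlp(SPEECH_DIM, REPR_DIM)   # SpeechE

    def autoencode(self, pose):                     # step 1: train on motion only
        return self.motion_d(self.motion_e(pose))

    def forward(self, speech):                      # test time: speech -> gesture
        return self.motion_d(self.speech_e(speech))


if __name__ == "__main__":
    model = SpeechToGesture()
    pose = torch.randn(8, POSE_DIM)
    speech = torch.randn(8, SPEECH_DIM)
    recon_loss = nn.functional.mse_loss(model.autoencode(pose), pose)       # step 1
    repr_loss = nn.functional.mse_loss(model.speech_e(speech),
                                       model.motion_e(pose).detach())       # step 2
    print(recon_loss.item(), repr_loss.item(), model(speech).shape)
```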
32.
  • Kucherenko, Taras, 1994- (author)
  • Developing and evaluating co-speech gesture-synthesis models for embodied conversational agents
  • 2021
  • Doctoral thesis (other academic/artistic), abstract:
    • A large part of our communication is non-verbal: humans use non-verbal behaviors to express various aspects of their state or intent. Embodied artificial agents, such as virtual avatars or robots, should also use non-verbal behavior for efficient and pleasant interaction. A core part of non-verbal communication is gesticulation: gestures communicate a large share of non-verbal content. For example, around 90% of spoken utterances in descriptive discourse are accompanied by gestures. Since gestures are important, generating co-speech gestures has been an essential task in the Human-Agent Interaction (HAI) and Computer Graphics communities for several decades. Evaluating the gesture-generating methods has been an equally important and equally challenging part of field development. Consequently, this thesis contributes to both the development and evaluation of gesture-generation models. This thesis proposes three deep-learning-based gesture-generation models. The first model is deterministic and uses only audio and generates only beat gestures. The second model is deterministic and uses both audio and text, aiming to generate meaningful gestures. A final model uses both audio and text and is probabilistic, to learn the stochastic character of human gesticulation. The methods have applications to both virtual agents and social robots. Individual research efforts in the field of gesture generation are difficult to compare, as there are no established benchmarks. To address this situation, my colleagues and I launched the first-ever gesture-generation challenge, which we called the GENEA Challenge. We have also investigated if online participants are as attentive as offline participants and found that they are both equally attentive provided that they are well paid. Finally, we developed a system that integrates co-speech gesture-generation models into a real-time interactive embodied conversational agent. This system is intended to facilitate the evaluation of modern gesture generation models in interaction. To further advance the development of capable gesture-generation methods, we need to advance their evaluation, and the research in the thesis supports an interpretation that evaluation is the main bottleneck that limits the field. There are currently no comprehensive co-speech gesture datasets; such datasets should be large, high-quality, and diverse. In addition, no strong objective metrics are yet available. Creating speech-gesture datasets and developing objective metrics are highlighted as essential next steps for further field development.
  •  
33.
  • Kucherenko, Taras, 1994-, et al. (author)
  • Gesticulator : A framework for semantically-aware speech-driven gesture generation
  • 2020
  • In: ICMI '20: Proceedings of the 2020 International Conference on Multimodal Interaction. - New York, NY, USA : Association for Computing Machinery (ACM).
  • Conference paper (peer-reviewed), abstract:
    • During speech, people spontaneously gesticulate, which plays a key role in conveying information. Similarly, realistic co-speech gestures are crucial to enable natural and smooth interactions with social agents. Current end-to-end co-speech gesture generation systems use a single modality for representing speech: either audio or text. These systems are therefore confined to producing either acoustically-linked beat gestures or semantically-linked gesticulation (e.g., raising a hand when saying "high"): they cannot appropriately learn to generate both gesture types. We present a model designed to produce arbitrary beat and semantic gestures together. Our deep-learning based model takes both acoustic and semantic representations of speech as input, and generates gestures as a sequence of joint angle rotations as output. The resulting gestures can be applied to both virtual agents and humanoid robots. Subjective and objective evaluations confirm the success of our approach. The code and video are available at the project page svito-zar.github.io/gesticula
  •  
34.
  • Kucherenko, Taras, 1994-, et al. (author)
  • Moving Fast and Slow : Analysis of Representations and Post-Processing in Speech-Driven Automatic Gesture Generation
  • 2021
  • In: International Journal of Human-Computer Interaction. - : Informa UK Limited. - 1044-7318 .- 1532-7590. ; 37:14, pp. 1300-1316
  • Journal article (peer-reviewed), abstract:
    • This paper presents a novel framework for speech-driven gesture production, applicable to virtual agents to enhance human-computer interaction. Specifically, we extend recent deep-learning-based, data-driven methods for speech-driven gesture generation by incorporating representation learning. Our model takes speech as input and produces gestures as output, in the form of a sequence of 3D coordinates. We provide an analysis of different representations for the input (speech) and the output (motion) of the network by both objective and subjective evaluations. We also analyze the importance of smoothing of the produced motion. Our results indicated that the proposed method improved on our baseline in terms of objective measures. For example, it better captured the motion dynamics and better matched the motion-speed distribution. Moreover, we performed user studies on two different datasets. The studies confirmed that our proposed method is perceived as more natural than the baseline, although the difference in the studies was eliminated by appropriate post-processing: hip-centering and smoothing. We conclude that it is important to take both motion representation and post-processing into account when designing an automatic gesture-production method.
  •  
35.
  • Kucherenko, Taras, 1994-, et al. (author)
  • Multimodal analysis of the predictability of hand-gesture properties
  • 2022
  • In: AAMAS '22: Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems. - : ACM Press. ; , pp. 770-779
  • Conference paper (peer-reviewed), abstract:
    • Embodied conversational agents benefit from being able to accompany their speech with gestures. Although many data-driven approaches to gesture generation have been proposed in recent years, it is still unclear whether such systems can consistently generate gestures that convey meaning. We investigate which gesture properties (phase, category, and semantics) can be predicted from speech text and/or audio using contemporary deep learning. In extensive experiments, we show that gesture properties related to gesture meaning (semantics and category) are predictable from text features (time-aligned FastText embeddings) alone, but not from prosodic audio features, while rhythm-related gesture properties (phase) on the other hand can be predicted from audio features better than from text. These results are encouraging as they indicate that it is possible to equip an embodied agent with content-wise meaningful co-speech gestures using a machine-learning model.
  •  
36.
  • Kucherenko, Taras, 1994-, et al. (author)
  • On the Importance of Representations for Speech-Driven Gesture Generation : Extended Abstract
  • 2019
  • Conference paper (peer-reviewed), abstract:
    • This paper presents a novel framework for automatic speech-driven gesture generation applicable to human-agent interaction, including both virtual agents and robots. Specifically, we extend recent deep-learning-based, data-driven methods for speech-driven gesture generation by incorporating representation learning. Our model takes speech features as input and produces gestures in the form of sequences of 3D joint coordinates representing motion as output. The results of objective and subjective evaluations confirm the benefits of the representation learning.
  •  
37.
  • Kucherenko, Taras, 1994-, et al. (author)
  • Speech2Properties2Gestures : Gesture-Property Prediction as a Tool for Generating Representational Gestures from Speech
  • 2021
  • In: IVA '21. - New York, NY, USA : Association for Computing Machinery (ACM). ; , pp. 145-147
  • Conference paper (peer-reviewed), abstract:
    • We propose a new framework for gesture generation, aiming to allow data-driven approaches to produce more semantically rich gestures. Our approach first predicts whether to gesture, followed by a prediction of the gesture properties. Those properties are then used as conditioning for a modern probabilistic gesture-generation model capable of high-quality output. This empowers the approach to generate gestures that are both diverse and representational. Follow-ups and more information can be found on the project page: https://svito-zar.github.io/speech2properties2gestures
  •  
38.
  • Lawin, Felix Jaremo, et al. (author)
  • Is Markerless More or Less? : Comparing a Smartphone Computer Vision Method for Equine Lameness Assessment to Multi-Camera Motion Capture
  • 2023
  • In: Animals. - : MDPI AG. - 2076-2615. ; 13:3
  • Journal article (peer-reviewed), abstract:
    • Lameness, an alteration of the gait due to pain or dysfunction of the locomotor system, is the most common disease symptom in horses. Yet, it is difficult for veterinarians to correctly assess by visual inspection. Objective tools that can aid clinical decision making and provide early disease detection through sensitive lameness measurements are needed. In this study, we describe how an AI-powered measurement tool on a smartphone can detect lameness in horses without the need to mount equipment on the horse. We compare it to a state-of-the-art multi-camera motion capture system by simultaneous, synchronised recordings from both systems. The mean difference between the systems' output of lameness metrics was below 2.2 mm. Therefore, we conclude that the smartphone measurement tool can detect lameness at relevant levels with ease of use for the veterinarian. Computer vision is a subcategory of artificial intelligence focused on extraction of information from images and video. It provides a compelling new means for objective orthopaedic gait assessment in horses using accessible hardware, such as a smartphone, for markerless motion analysis. This study aimed to explore the lameness assessment capacity of a smartphone single camera (SC) markerless computer vision application by comparing measurements of the vertical motion of the head and pelvis to an optical motion capture multi-camera (MC) system using skin-attached reflective markers. Twenty-five horses were recorded with a smartphone (60 Hz) and a 13-camera MC system (200 Hz) while trotting two times back and forth on a 30 m runway. The smartphone video was processed using artificial neural networks detecting the horse's direction, action and motion of body segments. After filtering, the vertical displacement curves from the head and pelvis were synchronised between systems using cross-correlation. This rendered 655 and 404 matching stride-segmented curves for the head and pelvis respectively. From the stride-segmented vertical displacement signals, differences between the two minima (MinDiff) and the two maxima (MaxDiff) respectively per stride were compared between the systems. Trial mean difference between systems was 2.2 mm (range 0.0-8.7 mm) for head and 2.2 mm (range 0.0-6.5 mm) for pelvis. Within-trial standard deviations ranged between 3.1-28.1 mm for MC and between 3.6-26.2 mm for SC. The ease of use and good agreement with MC indicate that the SC application is a promising tool for detecting clinically relevant levels of asymmetry in horses, enabling frequent and convenient gait monitoring over time.
  •  
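The stride-level asymmetry measures and the cross-correlation synchronisation mentioned in the entry above can be expressed in a few lines of NumPy/SciPy; the sketch below uses our own function names and a synthetic two-peak stride, and is only an illustration of the MinDiff/MaxDiff definitions, not the study's processing pipeline.

```python
# Sketch of the asymmetry metrics compared in the study: for a stride-segmented
# vertical displacement signal (head or pelvis), MinDiff is the difference between
# the two local minima of the stride and MaxDiff the difference between the two
# local maxima. The synchronisation step uses plain cross-correlation.
import numpy as np
from scipy.signal import argrelextrema, correlate


def sync_lag(signal_a, signal_b):
    """Lag (in samples) by which signal_a trails signal_b."""
    a = signal_a - signal_a.mean()
    b = signal_b - signal_b.mean()
    xcorr = correlate(a, b, mode="full")
    return xcorr.argmax() - (len(b) - 1)


def stride_asymmetry(displacement):
    """MinDiff and MaxDiff for one stride of vertical displacement (in mm)."""
    minima = displacement[argrelextrema(displacement, np.less)[0]]
    maxima = displacement[argrelextrema(displacement, np.greater)[0]]
    if len(minima) < 2 or len(maxima) < 2:
        return np.nan, np.nan                       # not a clean two-peak stride
    return abs(minima[0] - minima[1]), abs(maxima[0] - maxima[1])


if __name__ == "__main__":
    t = np.linspace(0, 2 * np.pi, 200)
    stride = 30 * np.sin(2 * t) + 3 * np.sin(t)     # asymmetric two-peak stride
    print(stride_asymmetry(stride))
    print(sync_lag(np.roll(stride, 10), stride))    # ~10: rolled copy lags by 10
```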
39.
  • Maki, Atsuto, et al. (author)
  • In Memoriam : Jan-Olof Eklundh
  • 2022
  • In: IEEE Transactions on Pattern Analysis and Machine Intelligence. - : IEEE COMPUTER SOC. - 0162-8828 .- 1939-3539. ; 44:9, pp. 4488-4489
  • Journal article (peer-reviewed)
  •  
40.
  • Mikheeva, Olga, et al. (author)
  • Aligned Multi-Task Gaussian Process
  • 2022
  • In: Proceedings of the 25th International Conference on Artificial Intelligence and Statistics, AISTATS 2022. - : ML Research Press. ; , pp. 2970-2988
  • Conference paper (peer-reviewed), abstract:
    • Multi-task learning requires accurate identification of the correlations between tasks. In real-world time-series, tasks are rarely perfectly temporally aligned; traditional multitask models do not account for this and subsequent errors in correlation estimation will result in poor predictive performance and uncertainty quantification. We introduce a method that automatically accounts for temporal misalignment in a unified generative model that improves predictive performance. Our method uses Gaussian processes (GPs) to model the correlations both within and between the tasks. Building on the previous work by Kazlauskaite et al. (2019), we include a separate monotonic warp of the input data to model temporal misalignment. In contrast to previous work, we formulate a lower bound that accounts for uncertainty in both the estimates of the warping process and the underlying functions. Also, our new take on a monotonic stochastic process, with efficient path-wise sampling for the warp functions, allows us to perform full Bayesian inference in the model rather than MAP estimates. Missing data experiments, on synthetic and real time-series, demonstrate the advantages of accounting for misalignments (vs standard unaligned method) as well as modelling the uncertainty in the warping process (vs baseline MAP alignment approach).
  •  
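Two ingredients named in the abstract above, a monotonic warp of the input axis and correlations between tasks, can be sketched in isolation. This is an illustrative simplification, not the paper's fully Bayesian model: the warp is parameterised as a normalised cumulative sum of positive increments, and the task coupling uses a simple ICM-style product kernel.

```python
# Illustrative sketch of (i) a monotonic input warp and (ii) an ICM-style
# multi-task kernel. Not the paper's model, which performs full Bayesian
# inference over the warps and the underlying functions.
import numpy as np


def monotonic_warp(raw_increments, t_grid):
    """Monotonically increasing warp g(t), evaluated on t_grid in [0, 1]."""
    increments = np.log1p(np.exp(raw_increments))        # softplus -> positive
    g = np.concatenate([[0.0], np.cumsum(increments)])
    g = g / g[-1]                                        # normalise to [0, 1]
    return np.interp(t_grid, np.linspace(0.0, 1.0, len(g)), g)


def rbf(x1, x2, lengthscale=0.1):
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)


def multitask_kernel(warped_inputs, task_ids, task_cov, lengthscale=0.1):
    """ICM kernel: K[(i,p),(j,q)] = task_cov[p, q] * k(x_i, x_j)."""
    k_input = rbf(warped_inputs, warped_inputs, lengthscale)
    return task_cov[np.ix_(task_ids, task_ids)] * k_input
```

Here `warped_inputs` would be the per-task warped time stamps g(t); in the paper the warps themselves are stochastic and inferred jointly with the GP.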
41.
  • Mikheeva, Olga, et al. (författare)
  • Perceptual facial expression representation
  • 2018
  • Ingår i: Proceedings - 13th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2018. - : Institute of Electrical and Electronics Engineers (IEEE). - 9781538623350 ; , s. 179-186
  • Konferensbidrag (refereegranskat)abstract
    • Dissimilarity measures are often used as a proxy or a handle to reason about data. This can be problematic, as the data representation is often a consequence of the capturing process or of how the data is visualized, rather than a reflection of the semantics that we want to extract. Facial expressions are a subtle and essential part of human communication, but they are challenging to extract from current representations. In this paper we present a method that is capable of learning semantic representations of faces in a data-driven manner. Our approach uses sparse human supervision, which our method grounds in the data. We provide experimental justification of our approach, showing that our representation improves performance on emotion classification.
  •  
42.
  • Mänttäri, Joonatan, et al. (författare)
  • Interpreting Video Features : A Comparison of 3D Convolutional Networks and Convolutional LSTM Networks
  • 2020
  • Konferensbidrag (refereegranskat)abstract
    • A number of techniques for interpretability have been presented for deep learning in computer vision, typically with the goal of understanding what the networks have based their classification on. However, interpretability for deep video architectures is still in its infancy and we do not yet have a clear concept of how to decode spatiotemporal features. In this paper, we present a study comparing how 3D convolutional networks and convolutional LSTM networks learn features across temporally dependent frames. This is the first comparison of two video models that both convolve to learn spatial features but have principally different methods of modeling time. Additionally, we extend the concept of meaningful perturbation introduced by Vedaldi et al. to the temporal dimension, to identify the temporal part of a sequence most meaningful to the network for a classification decision. Our findings indicate that the 3D convolutional model concentrates on shorter events in the input sequence, and places its spatial focus on fewer, contiguous areas.
  •  
43.
  • Mänttäri, Joonatan, et al. (författare)
  • Interpreting Video Features : A Comparison of 3D Convolutional Networks and Convolutional LSTM Networks
  • 2021
  • Ingår i: 15th Asian Conference on Computer Vision, ACCV 2020. - Cham : Springer Science and Business Media Deutschland GmbH. ; , s. 411-426
  • Konferensbidrag (refereegranskat)abstract
    • A number of techniques for interpretability have been presented for deep learning in computer vision, typically with the goal of understanding what the networks have based their classification on. However, interpretability for deep video architectures is still in its infancy and we do not yet have a clear concept of how to decode spatiotemporal features. In this paper, we present a study comparing how 3D convolutional networks and convolutional LSTM networks learn features across temporally dependent frames. This is the first comparison of two video models that both convolve to learn spatial features but have principally different methods of modeling time. Additionally, we extend the concept of meaningful perturbation introduced by [1] to the temporal dimension, to identify the temporal part of a sequence most meaningful to the network for a classification decision. Our findings indicate that the 3D convolutional model concentrates on shorter events in the input sequence, and places its spatial focus on fewer, contiguous areas. 
  •  
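The temporal extension of meaningful perturbation described in the two entries above can be sketched as a small optimisation loop over a per-frame mask. This is illustrative only: `model` is assumed to be any video classifier taking a (1, T, C, H, W) tensor, and the perturbation used here (blending frames towards the clip's mean frame) and the exact objective are assumptions rather than the papers' formulation.

```python
# Hedged sketch of a temporal "meaningful perturbation": optimise a per-frame
# mask so that perturbing the low-mask frames maximally drops the target-class
# score while perturbing as few frames as possible.
import torch


def temporal_mask(model, clip, target_class, steps=300, lam=0.05, lr=0.05):
    """clip: (1, T, C, H, W) float tensor. Returns a per-frame mask in [0, 1]."""
    model.eval()
    num_frames = clip.shape[1]
    baseline = clip.mean(dim=1, keepdim=True).expand_as(clip)   # "frozen" mean frame
    mask_logits = torch.zeros(num_frames, requires_grad=True)
    optimiser = torch.optim.Adam([mask_logits], lr=lr)

    for _ in range(steps):
        m = torch.sigmoid(mask_logits).view(1, num_frames, 1, 1, 1)
        perturbed = m * clip + (1.0 - m) * baseline             # mask near 0 -> frame replaced
        score = torch.softmax(model(perturbed), dim=-1)[0, target_class]
        # drop the class score while keeping the perturbed region small
        loss = score + lam * (1.0 - torch.sigmoid(mask_logits)).sum()
        optimiser.zero_grad()
        loss.backward()
        optimiser.step()

    return torch.sigmoid(mask_logits).detach()                  # low values = most important frames
```

Frames whose mask values are driven towards zero are the ones the classifier relies on most for the chosen class.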
44.
  • Nagy, Rajmund, et al. (författare)
  • A framework for integrating gesture generation models into interactive conversational agents
  • 2021
  • Ingår i: Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS. - : International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS). ; , s. 1767-1769
  • Konferensbidrag (refereegranskat)abstract
    • Embodied conversational agents (ECAs) benefit from non-verbal behavior for natural and efficient interaction with users. Gesticulation - hand and arm movements accompanying speech - is an essential part of non-verbal behavior. Gesture generation models have been developed for several decades: starting with rule-based and ending with mainly data-driven methods. To date, recent end-to-end gesture generation methods have not been evaluated in real-time interaction with users. We present a proof-of-concept framework, which is intended to facilitate evaluation of modern gesture generation models in interaction. We demonstrate an extensible open-source framework that contains three components: 1) a 3D interactive agent; 2) a chatbot backend; 3) a gesticulating system. Each component can be replaced, making the proposed framework applicable for investigating the effect of different gesturing models in real-time interactions with different communication modalities, chatbot backends, or different agent appearances. The code and video are available at the project page https://nagyrajmund.github.io/project/gesturebot.
  •  
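The plug-and-play design described above can be illustrated with a set of hypothetical Python interfaces. The class and method names below are invented for illustration; they are not the actual gesturebot API.

```python
# Hypothetical interfaces illustrating the three replaceable components the
# abstract describes (chatbot backend, gesticulating system, 3D agent).
from abc import ABC, abstractmethod
from typing import Callable, Sequence


class ChatbotBackend(ABC):
    @abstractmethod
    def respond(self, user_utterance: str) -> str:
        """Return the agent's textual reply."""


class GestureGenerator(ABC):
    @abstractmethod
    def generate(self, speech_audio: bytes, text: str) -> Sequence[dict]:
        """Return a sequence of per-frame joint rotations for the reply."""


class EmbodiedAgent(ABC):
    @abstractmethod
    def play(self, speech_audio: bytes, motion: Sequence[dict]) -> None:
        """Render synchronised speech and gesture in the 3D environment."""


def interaction_step(user_utterance: str,
                     tts: Callable[[str], bytes],
                     chatbot: ChatbotBackend,
                     gestures: GestureGenerator,
                     agent: EmbodiedAgent) -> None:
    """One turn of the interaction loop: text reply -> speech -> gesture -> playback."""
    reply = chatbot.respond(user_utterance)
    audio = tts(reply)
    motion = gestures.generate(audio, reply)
    agent.play(audio, motion)
```

Because each component only touches the others through these narrow interfaces, swapping in a different chatbot, gesture model, or agent appearance does not require changing the rest of the loop, which is the property the framework is built around.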
45.
  •  
46.
  • Rashid, Maheen, et al. (författare)
  • Action Graphs : Weakly-supervised Action Localization with Graph Convolution Networks
  • 2020
  • Ingår i: 2020 IEEE Winter Conference on Applications of Computer Vision (WACV). - : IEEE Computer Society. ; , s. 604-613
  • Konferensbidrag (refereegranskat)abstract
    • We present a method for weakly-supervised action localization based on graph convolutions. In order to find and classify video time segments that correspond to relevant action classes, a system must be able to both identify discriminative time segments in each video, and identify the full extent of each action. Achieving this with weak video level labels requires the system to use similarity and dissimilarity between moments across videos in the training data to understand both how an action appears, as well as the subactions that comprise the action's full extent. However, current methods do not make explicit use of similarity between video moments to inform the localization and classification predictions. We present a novel method that uses graph convolutions to explicitly model similarity between video moments. Our method utilizes similarity graphs that encode appearance and motion, and pushes the state of the art on THUMOS'14, ActivityNet 1.2, and Charades for weakly-supervised action localization.
  •  
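The core idea described above, letting video time segments exchange information according to their similarity, can be sketched with a single graph-convolution step. This is a simplified illustration, not the paper's full architecture.

```python
# Minimal sketch: build a similarity graph over video time segments and
# propagate features with one graph convolution before per-segment scoring.
import torch
import torch.nn.functional as F


def similarity_adjacency(features):
    """features: (N, D) per-segment features -> row-normalised affinity (N, N)."""
    normed = F.normalize(features, dim=1)
    affinity = torch.relu(normed @ normed.t())       # cosine similarity, clipped at 0
    return affinity / affinity.sum(dim=1, keepdim=True).clamp(min=1e-8)


class GraphConvHead(torch.nn.Module):
    def __init__(self, dim, num_classes):
        super().__init__()
        self.proj = torch.nn.Linear(dim, dim)
        self.cls = torch.nn.Linear(dim, num_classes)

    def forward(self, features):
        adj = similarity_adjacency(features)
        hidden = torch.relu(adj @ self.proj(features))   # graph convolution step
        return self.cls(hidden)                          # per-segment class scores
```

In a weakly-supervised setting, the per-segment scores would then be pooled to a video-level prediction and trained against the video-level labels.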
47.
  • Rashid, Maheen, et al. (författare)
  • Equine Pain Behavior Classification via Self-Supervised Disentangled Pose Representation
  • 2022
  • Ingår i: 2022 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2022). - : Institute of Electrical and Electronics Engineers (IEEE). ; , s. 152-162
  • Konferensbidrag (refereegranskat)abstract
    • Timely detection of horse pain is important for equine welfare. Horses express pain through their facial and body behavior, but may hide signs of pain from unfamiliar human observers. In addition, collecting visual data with detailed annotation of horse behavior and pain state is both cumbersome and not scalable. Consequently, a pragmatic equine pain classification system would use video of the unobserved horse and weak labels. This paper proposes such a method for equine pain classification, using multi-view surveillance video footage of unobserved horses with induced orthopaedic pain and temporally sparse video-level pain labels. To ensure that pain is learned from horse body language alone, we first train a self-supervised generative model to disentangle horse pose from its appearance and background before using the disentangled horse pose latent representation for pain classification. To make best use of the pain labels, we develop a novel loss that formulates pain classification as a multi-instance learning problem. Our method achieves a pain classification accuracy of 60%, which is better than human expert performance. The learned latent horse pose representation is shown to be viewpoint covariant, and disentangled from horse appearance. Qualitative analysis of pain-classified segments shows correspondence between the pain symptoms identified by our model and equine pain scales used in veterinary practice.
  •  
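A multi-instance learning objective of the kind the abstract describes, in which segment-level pain scores are pooled into a video-level prediction supervised by the weak video-level label, can be sketched as follows. The top-k mean pooling is an assumption made for illustration, not necessarily the loss proposed in the paper.

```python
# Hedged sketch of a multi-instance learning loss for weak video-level labels.
import torch
import torch.nn.functional as F


def mil_loss(segment_logits, video_label, k=3):
    """segment_logits: (num_segments,) raw scores from the pose-based classifier.
    video_label: scalar float tensor (0. = no pain, 1. = pain) for the whole video.
    """
    k = min(k, segment_logits.numel())
    topk = torch.topk(segment_logits, k).values      # most pain-like segments (instances)
    video_logit = topk.mean()                        # bag-level prediction
    return F.binary_cross_entropy_with_logits(video_logit, video_label)
```

Pooling over only the highest-scoring segments reflects the multi-instance assumption that a "pain" video needs to contain some pain-expressing moments, not that every segment shows pain.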
48.
  • Ringqvist, Carl, et al. (författare)
  • Interpolation in Auto Encoders with Bridge Processes
  • 2020
  • Ingår i: Proceedings of the 25th International Conference on Pattern Recognition, ICPR 2020. - : Institute of Electrical and Electronics Engineers (IEEE).
  • Konferensbidrag (refereegranskat)abstract
    • Auto encoding models have been extensively studied in recent years. They provide an efficient framework for sample generation, as well as for analysing feature learning. Furthermore, they are efficient in performing interpolations between data points in semantically meaningful ways. In this paper, we introduce a method for generating sequence samples from auto encoders trained on flattened sequences (e.g. video samples from auto encoders trained to generate a video frame), as well as a canonical, dimension-independent method for generating stochastic interpolations. The distribution of interpolation paths is represented as the distribution of a bridge process constructed from an artificial random data-generating process in the latent space, having the prior distribution as its invariant distribution.
  •  
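A stochastic latent-space interpolation driven by a bridge process can be sketched with a plain Brownian bridge between two latent codes. This is a simpler stand-in for the paper's construction, in which the bridge is built from a latent process whose invariant distribution is the prior.

```python
# Illustrative sketch: a Brownian-bridge interpolation path between two latent
# codes z0 and z1. Each row of the returned path would be decoded to a sample.
import numpy as np


def brownian_bridge_path(z0, z1, num_steps=20, noise_scale=0.5, rng=None):
    """z0, z1: latent codes of shape (D,). Returns a (num_steps, D) path."""
    rng = np.random.default_rng() if rng is None else rng
    ts = np.linspace(0.0, 1.0, num_steps)
    dt = ts[1] - ts[0]
    # simulate a Brownian motion path per latent dimension
    increments = rng.standard_normal((num_steps - 1, z0.shape[0])) * np.sqrt(dt)
    w = np.vstack([np.zeros_like(z0), np.cumsum(increments, axis=0)])
    bridge = w - ts[:, None] * w[-1]                  # pin the noise to 0 at t = 1
    return (1.0 - ts)[:, None] * z0 + ts[:, None] * z1 + noise_scale * bridge
```

The bridge noise vanishes at both endpoints, so the path starts and ends exactly at the encoded data points while the intermediate latents remain stochastic.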
49.
  • Sorkhei, Mohammad Moein, 1995-, et al. (författare)
  • Full-Glow : Fully conditional Glow for more realistic image generation
  • 2021
  • Ingår i: Pattern Recognition. - Cham, Switzerland : Springer Nature. ; , s. 697-711
  • Konferensbidrag (refereegranskat)abstract
    • Autonomous agents, such as driverless cars, require large amounts of labeled visual data for their training. A viable approach for acquiring such data is training a generative model with collected real data, and then augmenting the collected real dataset with synthetic images from the model, generated with control of the scene layout and ground truth labeling. In this paper we propose Full-Glow, a fully conditional Glow-based architecture for generating plausible and realistic images of novel street scenes given a semantic segmentation map indicating the scene layout. Benchmark comparisons show our model to outperform recent works in terms of the semantic segmentation performance of a pretrained PSPNet. This indicates that images from our model are, to a higher degree than from other models, similar to real images of the same kinds of scenes and objects, making them suitable as training data for a visual semantic segmentation or object recognition system.
  •  
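One building block in the spirit of a conditional, Glow-style flow, an affine coupling step whose scale and shift depend on the conditioning input, can be sketched as below. This is a simplification for illustration (a single coupling layer with an assumed convolutional conditioner), not the Full-Glow architecture.

```python
# Illustrative conditional affine coupling step: scale and shift for one half
# of the activations are predicted from the other half together with features
# of the conditioning input (e.g. a processed segmentation map).
import torch
import torch.nn as nn


class ConditionalAffineCoupling(nn.Module):
    def __init__(self, channels, cond_channels, hidden=128):
        super().__init__()  # assumes an even number of channels
        self.net = nn.Sequential(
            nn.Conv2d(channels // 2 + cond_channels, hidden, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden, channels, 3, padding=1),    # -> log_scale and shift
        )

    def forward(self, x, cond):
        xa, xb = x.chunk(2, dim=1)
        log_scale, shift = self.net(torch.cat([xa, cond], dim=1)).chunk(2, dim=1)
        log_scale = torch.tanh(log_scale)                 # keep the scale well behaved
        yb = xb * torch.exp(log_scale) + shift
        log_det = log_scale.flatten(1).sum(dim=1)         # per-sample log-determinant
        return torch.cat([xa, yb], dim=1), log_det
```

Because the transformation of xb is affine given xa and the condition, it is exactly invertible and its log-determinant is cheap to compute, which is what allows likelihood training in flows of this kind.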
50.
  • Stefanov, Kalin, et al. (författare)
  • Modeling of Human Visual Attention in Multiparty Open-World Dialogues
  • 2019
  • Ingår i: ACM Transactions on Human-Robot Interaction. - : Association for Computing Machinery (ACM). - 2573-9522. ; 8:2
  • Tidskriftsartikel (refereegranskat)abstract
    • This study proposes, develops, and evaluates methods for modeling the eye-gaze direction and head orientation of a person in multiparty open-world dialogues, as a function of low-level communicative signals generated by his/her interlocutors. These signals include speech activity, eye-gaze direction, and head orientation, all of which can be estimated in real time during the interaction. By utilizing these signals and novel data representations suitable for the task and context, the developed methods can generate plausible candidate gaze targets in real time. The methods are based on Feedforward Neural Networks and Long Short-Term Memory Networks. The proposed methods are developed using several hours of unrestricted interaction data and their performance is compared with a heuristic baseline method. The study offers an extensive evaluation of the proposed methods that investigates the contribution of different predictors to the accurate generation of candidate gaze targets. The results show that the methods can accurately generate candidate gaze targets when the person being modeled is in a listening state. However, when the person being modeled is in a speaking state, the proposed methods yield significantly lower performance.
  •  
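The kind of sequence model the abstract describes, mapping per-frame interlocutor signals (speech activity, eye-gaze direction, head orientation) to a distribution over candidate gaze targets, can be sketched as a small LSTM classifier. The feature layout and target encoding are assumptions made for illustration, not the study's exact setup.

```python
# Minimal sketch of an LSTM model over per-frame interlocutor features that
# predicts a candidate gaze target per time step.
import torch
import torch.nn as nn


class GazeTargetLSTM(nn.Module):
    def __init__(self, feature_dim, num_targets, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(feature_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_targets)

    def forward(self, features):
        # features: (batch, time, feature_dim) interlocutor signals
        hidden_states, _ = self.lstm(features)
        return self.head(hidden_states)      # (batch, time, num_targets) logits
```

A feed-forward variant would simply replace the LSTM with per-frame linear layers, which is one of the comparisons the study makes between model families.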