SwePub - search: hsv:(NATURVETENSKAP) hsv:(Data...

No.	Reference
1.	Chatterjee, Bapi, 1982 (author) Lock-free Concurrent Search. 2017. Doctoral thesis (other academic/artistic). Abstract: Contemporary computers typically consist of multiple computing cores with high compute power. Such computers make excellent concurrent asynchronous shared-memory systems. On the other hand, though many celebrated books on data structures and algorithms provide a comprehensive study of sequential search data structures, unfortunately, we do not have such a luxury once concurrency enters the setting. The present dissertation aims to address this paucity. We describe novel lock-free algorithms for concurrent data structures that target a variety of search problems. (i) Point search (membership query, predecessor query, nearest neighbour query) for 1-dimensional data: lock-free linked-list; lock-free internal and external binary search trees (BST). (ii) Range search for 1-dimensional data: a range search method for lock-free ordered set data structures - linked-list, skip-list and BST. (iii) Point search for multi-dimensional data: lock-free kD-tree, specifically, a generic method for nearest neighbour search. We prove that the presented algorithms are linearizable, i.e. the concurrent data structure operations intuitively display their sequential behaviour to an observer of the concurrent system. The lock-freedom of the introduced algorithms guarantees overall progress in an asynchronous shared-memory system. We present the amortized analysis of lock-free data structures to show their efficiency. Moreover, we provide sample implementations of the algorithms and test them over extensive micro-benchmarks. Our experiments demonstrate that the implementations are scalable and perform well when compared to related existing alternative implementations on common multi-core computers. Our focus is on propounding generic methodologies for efficient lock-free concurrent search.
In this direction, we present the notion of help-optimality, which captures the optimization of amortized step complexity of the operations. In addition, we explore the language-portable design of lock-free data structures, which aims to simplify an implementation from a programmer's point of view. Finally, our techniques to implement lock-free linearizable range search and nearest neighbour search are independent of the underlying data structures and thus are adaptable to similar data structures.
2.	Norlund, Tobias, 1991, et al. (author) Transferring Knowledge from Vision to Language: How to Achieve it and how to Measure it? 2021. In: Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, pp. 149-162, Punta Cana, Dominican Republic. - : Association for Computational Linguistics. Conference paper (peer-reviewed). Abstract: Large language models are known to suffer from the hallucination problem in that they are prone to output statements that are false or inconsistent, indicating a lack of knowledge. A proposed solution to this is to provide the model with additional data modalities that complement the knowledge obtained through text. We investigate the use of visual data to complement the knowledge of large language models by proposing a method for evaluating visual knowledge transfer to text for uni- or multimodal language models. The method is based on two steps: 1) a novel task querying for knowledge of memory colors, i.e. typical colors of well-known objects, and 2) filtering of model training data to clearly separate knowledge contributions. Additionally, we introduce a model architecture that involves a visual imagination step and evaluate it with our proposed method. We find that our method can successfully be used to measure visual knowledge transfer capabilities in models and that our novel model architecture shows promising results for leveraging multimodal knowledge in a unimodal setting.
3.	Yun, Yixiao, 1987, et al. (author) Maximum-Likelihood Object Tracking from Multi-View Video by Combining Homography and Epipolar Constraints. 2012. In: 6th ACM/IEEE Int'l Conf on Distributed Smart Cameras (ICDSC 12), Oct 30 - Nov 2, 2012, Hong Kong. - 9781450317726 ; 6 pages. Conference paper (peer-reviewed). Abstract: This paper addresses the problem of object tracking in occlusion scenarios, where multiple uncalibrated cameras with overlapping fields of view are used. We propose a novel method where tracking is first done independently for each view and then tracking results are mapped between each pair of views to improve the tracking in individual views, under the assumptions that objects are not occluded in all views and move uprightly on a planar ground, which may induce a homography relation between each pair of views. The tracking results are mapped by jointly exploiting the geometric constraints of homography, epipolar geometry and the vertical vanishing point. The main contributions of this paper include: (a) formulating a reference model of multi-view object appearance using region covariance for each view; (b) defining a likelihood measure based on geodesics on a Riemannian manifold that is consistent with the destination view by mapping both the estimated positions and appearances of the tracked object from other views; (c) locating the object in each individual view based on a maximum likelihood criterion from multi-view estimations of object position. Experiments have been conducted on videos from multiple uncalibrated cameras, where targets experience long-term partial or full occlusions. Comparisons with two existing methods and performance evaluations are also made. Test results have shown the effectiveness of the proposed method in terms of robustness against tracking drifts caused by occlusions.
4.	Liu, Yuanhua, 1971, et al. (author) Considering the importance of user profiles in interface design. 2009. In: User Interfaces. ; , pp. 23-. Book chapter (other academic/artistic). Abstract: User profile is a popular term widely employed during product design processes by industrial companies. Such a profile is normally intended to represent real users of a product. The ultimate purpose of a user profile is to help designers recognize or learn about the real user by presenting them with a description of a real user's attributes, for instance, the user's gender, age, educational level, attitude, technical needs and skill level. The aim of this chapter is to provide information on the current knowledge and research about user profile issues, as well as to emphasize the importance of considering these issues in interface design. In this chapter, we mainly focus on how users' differences in expertise affect their performance or activity in various interaction contexts. Considering the complex interaction situations in practice, novice and expert users' interactions with medical user interfaces of different technical complexity will be analyzed as examples: one focuses on novice and expert users' differences when interacting with simple medical interfaces, and the other focuses on differences when interacting with complex medical interfaces. Four issues will be analyzed and discussed: (1) how novice and expert users differ in terms of performance during the interaction; (2) how novice and expert users differ in terms of cognitive mental models during the interaction; (3) how novice and expert users should be defined in practice; and (4) what the main differences between novice and expert users imply for interface design.
Besides describing the effect of users’ expertise difference during the interface design process, we will also pinpoint some potential problems for the research on interface design, as well as some future challenges that academic researchers and industrial engineers should face in practice.
5.	Rumman, Nadine Abu, et al. (author) Skin deformation methods for interactive character animation. 2017. In: Communications in Computer and Information Science. - Cham : Springer International Publishing. - 1865-0937 .- 1865-0929. ; 693, pp. 153-174. Conference paper (peer-reviewed). Abstract: Character animation is a vital component of contemporary computer games, animated feature films and virtual reality applications. The problem of creating appealing character animation can best be described by the title of the animation bible: “The Illusion of Life”. The focus is not on completing a given motion task, but more importantly on how this motion task is performed by the character. This does not necessarily require realistic behavior, but behavior that is believable. This of course includes the skin deformations when the character is moving. In this paper, we focus on the existing research in the area of skin deformation, ranging from skeleton-based deformation and volume preserving techniques to physically based skinning methods. We also summarize the recent contributions in deformable and soft body simulations for articulated characters, and discuss various geometric and example-based approaches. © Springer International Publishing AG 2017.
6.	Somanath, Sanjay, 1994, et al. (author) Towards Urban Digital Twins: A Workflow for Procedural Visualization Using Geospatial Data. 2024. In: Remote Sensing. - 2072-4292. ; 16:11. Journal article (peer-reviewed). Abstract: A key feature for urban digital twins (DTs) is an automatically generated detailed 3D representation of the built and unbuilt environment from aerial imagery, footprints, LiDAR, or a fusion of these. Such 3D models have applications in architecture, civil engineering, urban planning, construction, real estate, Geographical Information Systems (GIS), and many other areas. While the visualization of large-scale data in conjunction with the generated 3D models is often a recurring and resource-intensive task, an automated workflow is complex, requiring many steps to achieve a high-quality visualization. Methods for building reconstruction have come a long way, from previously manual approaches to semi-automatic or automatic approaches. This paper aims to complement existing methods of 3D building generation. First, we present a literature review covering different options for procedural context generation and visualization methods, focusing on workflows and data pipelines. Next, we present a semi-automated workflow that extends the building reconstruction pipeline to include procedural context generation using Python and Unreal Engine. Finally, we propose a workflow for integrating various types of large-scale urban analysis data for visualization. We conclude with a series of challenges faced in achieving such pipelines and the limitations of the current approach. However, the steps for a complete, end-to-end solution involve further developing robust systems for building detection, rooftop recognition, and geometry generation and importing and visualizing data in the same 3D environment, highlighting a need for further research and development in this field.
7.	Fu, Keren, et al. (author) Deepside: A general deep framework for salient object detection. 2019. In: Neurocomputing. - : Elsevier BV. - 0925-2312 .- 1872-8286. ; 356, pp. 69-82. Journal article (peer-reviewed). Abstract: Deep learning-based salient object detection techniques have shown impressive results compared to conventional saliency detection by handcrafted features. Integrating hierarchical features of Convolutional Neural Networks (CNN) to achieve fine-grained saliency detection is a current trend, and various deep architectures are proposed by researchers, including “skip-layer” architecture, “top-down” architecture, “short-connection” architecture and so on. While these architectures have achieved progressive improvement on detection accuracy, the underlying distinctions and connections between these schemes are still unclear. In this paper, we review and draw underlying connections between these architectures, and show that they could actually be unified into a general framework, which simply has side structures with different depths. Based on the idea of designing deeper side structures for better detection accuracy, we propose a unified framework called Deepside that can be deeply supervised to incorporate hierarchical CNN features. Additionally, to fuse multiple side outputs from the network, we propose a novel fusion technique based on segmentation-based pooling, which serves as a built-in component in the CNN architecture and guarantees more accurate boundary details of detected salient objects. The effectiveness of the proposed Deepside scheme against state-of-the-art models is validated on 8 benchmark datasets.
8.	Isaksson, Martin, et al. (author) Adaptive Expert Models for Federated Learning. 2023. In: Lecture Notes in Computer Science, Volume 13448, Pages 1-16, 2023. - Cham : Springer Science and Business Media Deutschland GmbH. - 9783031289958 ; 13448 LNAI, pp. 1-16. Conference paper (peer-reviewed). Abstract: Federated Learning (FL) is a promising framework for distributed learning when data is private and sensitive. However, the state-of-the-art solutions in this framework are not optimal when data is heterogeneous and non-IID. We propose a practical and robust approach to personalization in FL that adjusts to heterogeneous and non-IID data by balancing exploration and exploitation of several global models. To achieve our aim of personalization, we use a Mixture of Experts (MoE) that learns to group clients that are similar to each other, while using the global models more efficiently. We show that our approach achieves an accuracy up to 29.78% better than the state-of-the-art and up to 4.38% better compared to a local model in a pathological non-IID setting, even though we tune our approach in the IID setting. © 2023, The Author(s)
9.	Lindén, Joakim, et al. (author) Evaluating the Robustness of ML Models to Out-of-Distribution Data Through Similarity Analysis. 2023. In: Commun. Comput. Info. Sci. - : Springer Science and Business Media Deutschland GmbH. - 9783031429408 ; , pp. 348-359. Conference paper (peer-reviewed). Abstract: In Machine Learning systems, several factors impact the performance of a trained model. The most important ones include model architecture, the amount of training time, and the dataset size and diversity. We present a method for analyzing datasets from a use-case scenario perspective, detecting and quantifying out-of-distribution (OOD) data on dataset level. Our main contribution is the novel use of similarity metrics for the evaluation of the robustness of a model by introducing relative Fréchet Inception Distance (FID) and relative Kernel Inception Distance (KID) measures. These measures are relative to a baseline in-distribution dataset and are used to estimate how the model will perform on OOD data (i.e. estimate the model accuracy drop). We find a correlation between our proposed relative FID/relative KID measure and the drop in Average Precision (AP) accuracy on unseen data.
10.	Frid, Emma, et al. (author) Perception of Mechanical Sounds Inherent to Expressive Gestures of a NAO Robot - Implications for Movement Sonification of Humanoids. 2018. In: Proceedings of the 15th Sound and Music Computing Conference. - Limassol, Cyprus. - 9789963697304. Conference paper (peer-reviewed). Abstract: In this paper we present a pilot study carried out within the project SONAO. The SONAO project aims to compensate for limitations in robot communicative channels with an increased clarity of Non-Verbal Communication (NVC) through expressive gestures and non-verbal sounds. More specifically, the purpose of the project is to use movement sonification of expressive robot gestures to improve Human-Robot Interaction (HRI). The pilot study described in this paper focuses on mechanical robot sounds, i.e. sounds that have not been specifically designed for HRI but are inherent to robot movement. Results indicated a low correspondence between perceptual ratings of mechanical robot sounds and emotions communicated through gestures. In general, the mechanical sounds themselves appeared not to carry much emotional information compared to video stimuli of expressive gestures. However, some mechanical sounds did communicate certain emotions, e.g. frustration. In general, the sounds appeared to communicate arousal more effectively than valence. We discuss potential issues and possibilities for the sonification of expressive robot gestures and the role of mechanical sounds in such a context. Emphasis is put on the need to mask or alter sounds inherent to robot movement, using for example blended sonification.
11.	Frid, Emma, 1988-, et al. (author) Perceptual Evaluation of Blended Sonification of Mechanical Robot Sounds Produced by Emotionally Expressive Gestures : Augmenting Consequential Sounds to Improve Non-verbal Robot Communication. 2021. In: International Journal of Social Robotics. - : Springer Nature. - 1875-4791 .- 1875-4805. Journal article (peer-reviewed). Abstract: This paper presents two experiments focusing on perception of mechanical sounds produced by expressive robot movement and blended sonifications thereof. In the first experiment, 31 participants evaluated emotions conveyed by robot sounds through free-form text descriptions. The sounds were inherently produced by the movements of a NAO robot and were not specifically designed for communicative purposes. Results suggested no strong coupling between the emotional expression of gestures and how sounds inherent to these movements were perceived by listeners; joyful gestures did not necessarily result in joyful sounds. A word that reoccurred in text descriptions of all sounds, regardless of the nature of the expressive gesture, was “stress”. In the second experiment, blended sonification was used to enhance and further clarify the emotional expression of the robot sounds evaluated in the first experiment. Analysis of quantitative ratings of 30 participants revealed that the blended sonification successfully contributed to enhancement of the emotional message for sound models designed to convey frustration and joy. Our findings suggest that blended sonification guided by perceptual research on emotion in speech and music can successfully improve communication of emotions through robot sounds in auditory-only conditions.
12.	Latupeirissa, Adrian Benigno, et al. (author) Exploring emotion perception in sonic HRI. 2020. In: 17th Sound and Music Computing Conference. - Torino : Zenodo. ; , pp. 434-441. Conference paper (peer-reviewed). Abstract: Despite the fact that sounds produced by robots can affect the interaction with humans, sound design is often an overlooked aspect in Human-Robot Interaction (HRI). This paper explores how different sets of sounds designed for expressive robot gestures of a humanoid Pepper robot can influence the perception of emotional intentions. In the pilot study presented in this paper, participants were asked to rate different stimuli in terms of perceived affective states. The stimuli were audio, audio-video and video only and contained either Pepper’s original servomotor noises, sawtooth, or more complex designed sounds. The preliminary results show a preference for the use of more complex sounds, thus confirming the necessity of further exploration in sonic HRI.
13.	Dobnik, Simon, 1977 (author) Coordinating spatial perspective in discourse. 2012. In: Proceedings of the Workshop on Vision and Language 2012 (VL'12): The 2nd Annual Meeting of the EPSRC Network on Vision and Language. Conference paper (other academic/artistic). Abstract: We present results of an on-line data collection experiment where we investigate the assignment and coordination of spatial perspective between a pair of dialogue participants situated in a constrained virtual environment.
14.	Suchan, Jakob, et al. (author) Commonsense Visual Sensemaking for Autonomous Driving : On Generalised Neurosymbolic Online Abduction Integrating Vision and Semantics. 2021. In: Artificial Intelligence. - : Elsevier. - 0004-3702 .- 1872-7921. ; 299. Journal article (peer-reviewed). Abstract: We demonstrate the need and potential of systematically integrated vision and semantics solutions for visual sensemaking in the backdrop of autonomous driving. A general neurosymbolic method for online visual sensemaking using answer set programming (ASP) is systematically formalised and fully implemented. The method integrates state of the art in visual computing, and is developed as a modular framework that is generally usable within hybrid architectures for realtime perception and control. We evaluate and demonstrate with community established benchmarks KITTIMOD, MOT-2017, and MOT-2020. As use-case, we focus on the significance of human-centred visual sensemaking (e.g., involving semantic representation and explainability, question-answering, commonsense interpolation) in safety-critical autonomous driving situations. The developed neurosymbolic framework is domain-independent, with the case of autonomous driving designed to serve as an exemplar for online visual sensemaking in diverse cognitive interaction settings in the backdrop of select human-centred AI technology design considerations.
15.	Barreiro, Anabela, et al. (author) Multi3Generation : Multitask, Multilingual, Multimodal Language Generation. 2022. In: Proceedings of the 23rd Annual Conference of the European Association for Machine Translation. - : European Association for Machine Translation. ; , pp. 345-346. Conference paper (peer-reviewed). Abstract: This paper presents the Multitask, Multilingual, Multimodal Language Generation COST Action – Multi3Generation (CA18231), an interdisciplinary network of research groups working on different aspects of language generation. This "meta-paper" will serve as a reference for citation of the Action in future publications. It presents the objectives, challenges and the links for the achieved outcomes.
16.	Blanch, Krister, 1991 (author) Beyond-application datasets and automated fair benchmarking. 2023. Licentiate thesis (other academic/artistic). Abstract: Beyond-application perception datasets are generalised datasets that emphasise the fundamental components of good machine perception data. When analysing the history of perception datasets, notable trends suggest that the design of a dataset typically aligns with an application goal. Instead of focusing on a specific application, beyond-application datasets look at capturing high-quality, high-volume data from a highly kinematic environment, for the purpose of aiding algorithm development and testing in general. Algorithm benchmarking is a cornerstone of autonomous systems development, and allows developers to demonstrate their results in a comparative manner. However, most benchmarking systems allow developers to use their own hardware or select favourable data. There is also little focus on run-time performance and consistency, with benchmarking systems instead showcasing algorithm accuracy. When combining beyond-application dataset generation with methods for fair benchmarking, there is also the dilemma of how to provide the dataset to developers for this benchmarking, as the result of a high-volume, high-quality dataset generation is a significant increase in dataset size when compared to traditional perception datasets. This thesis presents the first results of attempting the creation of such a dataset. The dataset was built using a maritime platform, selected due to the highly dynamic environment presented on water. The design and initial testing of this platform are detailed, as well as methods of sensor validation. The thesis then presents a method of fair benchmarking that utilises remote containerisation in a way that allows developers to present their software to the dataset, instead of having to first locally store a copy.
To test this dataset and automatic online benchmarking, a number of reference algorithms were required for initial results. Three algorithms were built, using the data from three different sensors captured on the maritime platform. Each algorithm calculates vessel odometry, and the automatic benchmarking system was utilised to show the accuracy and run-time performance of these algorithms. It was found that the containerised approach alleviated data management concerns, prevented inflated accuracy results, and demonstrated precisely how computationally intensive each algorithm was.
17.	Lv, Zhihan, Dr. 1984-, et al. (author) Deep Learning for Security in Digital Twins of Cooperative Intelligent Transportation Systems. 2022. In: IEEE transactions on intelligent transportation systems (Print). - : Institute of Electrical and Electronics Engineers (IEEE). - 1524-9050 .- 1558-0016. ; 23:9, pp. 16666-16675. Journal article (peer-reviewed). Abstract: The purpose is to solve the security problems of the Cooperative Intelligent Transportation System (CITS) Digital Twins (DTs) in the Deep Learning (DL) environment. The DL algorithm is improved; the Convolutional Neural Network (CNN) is combined with Support Vector Regression (SVR); the DTs technology is introduced. Eventually, a CITS DTs model is constructed based on CNN-SVR, whose security performance and effect are analyzed through simulation experiments. Compared with other algorithms, the security prediction accuracy of the proposed algorithm reaches 90.43%. Besides, the proposed algorithm outperforms other algorithms regarding Precision, Recall, and F1. The data transmission performances of the proposed algorithm and other algorithms are compared. The proposed algorithm can ensure that emergency messages can be responded to in time, with a delay of less than 1.8s. Meanwhile, it can better adapt to the road environment, maintain high data transmission speed, and provide reasonable path planning for vehicles so that vehicles can reach their destinations faster. The impacts of different factors on the transportation network are analyzed further. Results suggest that under path guidance, as the Market Penetration Rate (MPR), Following Rate (FR), and Congestion Level (CL) increase, the guidance strategy's effects become more apparent. When MPR ranges between 40% and 80% and the congestion is level III, the ATT decreases the fastest, and the improvement effect of the guidance strategy is more apparent.
The proposed DL algorithm model can lower the data transmission delay of the system, increase the prediction accuracy, and reasonably change the paths to suppress the spread of traffic congestion, providing an experimental reference for developing and improving urban transportation.
18.	Nguyen, Björnborg, 1992, et al. (author) Systematic benchmarking for reproducibility of computer vision algorithms for real-time systems: The example of optic flow estimation. 2019. In: IEEE International Conference on Intelligent Robots and Systems. - : IEEE. - 2153-0858 .- 2153-0866. ; , pp. 5264-5269. Conference paper (peer-reviewed). Abstract: Until now there have been few formalized methods for conducting systematic benchmarking aiming at reproducible results when it comes to computer vision algorithms. This is evident from lists of algorithms submitted to prominent datasets; authors of a novel method in many cases primarily state the performance of their algorithms in relation to a shallow description of the hardware system where it was evaluated. There are significant problems linked to this non-systematic approach of reporting performance, especially when comparing different approaches and when it comes to the reproducibility of claimed results. A further open question is how to conduct retrospective performance analysis, such as assessing an algorithm's suitability for embedded real-time systems over time with underlying hardware and software changes in place. This paper proposes and demonstrates a systematic way of addressing such challenges by adopting containerization of software aiming at formalization and reproducibility of benchmarks. Our results suggest that maintainers of broadly accepted datasets in the computer vision community should strive for systematic comparison and reproducibility of submissions to increase the value and adoption of computer vision algorithms in the future.
19.	Gu, Irene Yu-Hua, 1953, et al. (author) Grassmann Manifold Online Learning and Partial Occlusion Handling for Visual Object Tracking under Bayesian Formulation. 2012. In: Proceedings - International Conference on Pattern Recognition. - 1051-4651. - 9784990644109 ; , pp. 1463-1466. Conference paper (peer-reviewed). Abstract: This paper addresses issues of online learning and occlusion handling in video object tracking. Although manifold tracking is promising, large pose changes and long-term partial occlusions of video objects remain challenging. We propose a novel manifold tracking scheme that tackles such problems, with the following main novelties: (a) online estimation of object appearances on Grassmann manifolds; (b) optimal criterion-based occlusion handling during online learning; (c) a nonlinear dynamic model for the appearance basis matrix and its velocity; (d) Bayesian formulations separately for the tracking and the online learning process. Two particle filters are employed: one on the manifold for generating appearance particles and another on the linear space for generating affine box particles. Tracking and online updating are performed in an alternating fashion to mitigate the tracking drift. Experiments on videos have shown robust tracking performance, especially when objects undergo significant pose changes accompanied by long-term partial occlusions. Evaluations and comparisons with two existing methods provide further support for the proposed method.
20.	Ali, Muhaddisa Barat, 1986 (author) Deep Learning Methods for Classification of Gliomas and Their Molecular Subtypes, From Central Learning to Federated Learning. 2023. Doctoral thesis (other academic/artistic). Abstract: Gliomas are the most common type of brain cancer in adults. Under the updated 2016 World Health Organization (WHO) classification of tumors of the central nervous system (CNS), identification of molecular subtypes of gliomas is important. For low grade gliomas (LGGs), prediction of molecular subtypes by observing magnetic resonance imaging (MRI) scans might be difficult without taking a biopsy. With the development of machine learning (ML) methods such as deep learning (DL), molecular-based classification methods have shown promising results from MRI scans that may assist clinicians in prognosis and in deciding on a treatment strategy. However, DL requires large training datasets with tumor class labels and tumor boundary annotations. Manual annotation of tumor boundaries is a time-consuming and expensive process. The thesis is based on the work developed in five papers on gliomas and their molecular subtypes. We propose novel methods that provide improved performance. The proposed methods consist of a multi-stream convolutional autoencoder (CAE)-based classifier, a deep convolutional generative adversarial network (DCGAN) to enlarge the training dataset, a CycleGAN to handle domain shift, a novel federated learning (FL) scheme to allow local client-based training with dataset protection, and the use of bounding boxes on MRIs when tumor boundary annotations are not available. Experimental results showed that DCGAN-generated MRIs enlarged the original training dataset and improved the classification performance on test sets. CycleGAN showed good domain adaptation on multiple source datasets and improved the classification performance.
The proposed FL scheme showed a slightly degraded performance as compared to that of the central learning (CL) approach while protecting dataset privacy. Using tumor bounding boxes proved to be an alternative approach to tumor boundary annotation for tumor classification and segmentation, with a trade-off between a slight decrease in performance and saving time in manual marking by clinicians. The proposed methods may benefit future research in bringing DL tools into clinical practice for assisting tumor diagnosis and helping the decision-making process.
21.	Amundin, Mats, et al. (författare) A proposal to use distributional models to analyse dolphin vocalisation 2017 Ingår i: Proceedings of the 1st International Workshop on Vocal Interactivity in-and-between Humans, Animals and Robots, VIHAR 2017. - 9782956202905 ; , s. 31-32 Konferensbidrag (refereegranskat)abstract This paper gives a brief introduction to the starting points of an experimental project to study dolphin communicative behaviour using distributional semantics, with methods implemented for the large scale study of human language.
22.	Lindgren, Helena, Professor, et al. (författare) The wasp-ed AI curriculum : A holistic curriculum for artificial intelligence 2023 Ingår i: INTED2023 Proceedings. - : IATED. - 9788409490264 ; , s. 6496-6502 Konferensbidrag (refereegranskat)abstract Efforts in lifelong learning and competence development in Artificial Intelligence (AI) have been on the rise for several years. These initiatives have mostly been applied to Science, Technology, Engineering and Mathematics (STEM) disciplines. Even though there has been significant development in Digital Humanities to incorporate AI methods and tools in higher education, the potential for such competences in Arts, Humanities and Social Sciences is far from being realised. Furthermore, there is an increasing awareness that the STEM disciplines need to include competences relating to AI in humanity and society. This is especially important considering the widening and deepening of the impact of AI on society at large and individuals. The aim of the presented work is to provide a broad and inclusive AI Curriculum that covers the breadth of the topic as it is seen today, which is significantly different from only a decade ago. It is important to note that with the curriculum we mean an overview of the subject itself, rather than a particular education program. The curriculum is intended to be used as a foundation for educational activities in AI to for example harmonize terminology, compare different programs, and identify educational gaps to be filled. An important aspect of the curriculum is the ethical, legal, and societal aspects of AI and to not limit the curriculum to the STEM subjects, instead extending to a holistic, human-centred AI perspective. The curriculum is developed as part of the national research program WASP-ED, the Wallenberg AI and transformative technologies education development program.
23.	Lv, Zhihan, Dr. 1984-, et al. (författare) 5G for mobile augmented reality 2022 Ingår i: International Journal of Communication Systems. - : John Wiley & Sons. - 1074-5351 .- 1099-1131. ; 35:5 Tidskriftsartikel (övrigt vetenskapligt/konstnärligt)
24.	Lv, Zhihan, Dr. 1984-, et al. (författare) Editorial : 5G for Augmented Reality 2022 Ingår i: Mobile Networks and Applications. - : Springer. - 1383-469X .- 1572-8153. Tidskriftsartikel (refereegranskat)
25.	Singh, Avinash, 1986-, et al. (författare) Verbal explanations by collaborating robot teams 2021 Ingår i: Paladyn - Journal of Behavioral Robotics. - : De Gruyter Open. - 2080-9778 .- 2081-4836. ; 12:1, s. 47-57 Tidskriftsartikel (refereegranskat)abstract In this article, we present work on collaborating robot teams that use verbal explanations of their actions and intentions in order to be more understandable to the human. For this, we introduce a mechanism that determines what information the robots should verbalize in accordance with Grice’s maxim of quantity, i.e., convey as much information as is required and no more or less. Our setup is a robot team collaborating to achieve a common goal while explaining in natural language what they are currently doing and what they intend to do. The proposed approach is implemented on three Pepper robots moving objects on a table. It is evaluated by human subjects answering a range of questions about the robots’ explanations, which are generated using either our proposed approach or two further approaches implemented for evaluation purposes. Overall, we find that our proposed approach leads to the most understanding of what the robots are doing. In addition, we further propose a method for incorporating policies driving the distribution of tasks among the robots, which may further support understandability.
26.	Ge, Chenjie, 1991, et al. (författare) Co-Saliency-Enhanced Deep Recurrent Convolutional Networks for Human Fall Detection in E-Healthcare 2018 Ingår i: Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS. - 1557-170X. ; , s. 1572-1575 Konferensbidrag (refereegranskat)abstract This paper addresses the issue of fall detection from videos for e-healthcare and assisted-living. Instead of using conventional hand-crafted features from videos, we propose a fall detection scheme based on co-saliency-enhanced recurrent convolutional network (RCN) architecture for fall detection from videos. In the proposed scheme, a deep learning method RCN is realized by a set of Convolutional Neural Networks (CNNs) in segment-levels followed by a Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), to handle the time-dependent video frames. The co-saliency-based method enhances salient human activity regions hence further improves the deep learning performance. The main contributions of the paper include: (a) propose a recurrent convolutional network (RCN) architecture that is dedicated to the tasks of human fall detection in videos; (b) integrate a co-saliency enhancement to the deep learning scheme for further improving the deep learning performance; (c) extensive empirical tests for performance analysis and evaluation under different network settings and data partitioning. Experiments using the proposed scheme were conducted on an open dataset containing multicamera videos from different view angles, results have shown very good performance (test accuracy 98.96%). Comparisons with two existing methods have provided further support to the proposed scheme.
27.	Höglund, Lars, 1946, et al. (författare) Maskininlärningsbaserad indexering av digitaliserade museiartefakter - projektrapport 2012 Rapport (övrigt vetenskapligt/konstnärligt)abstract The project has carried out experiments with machine-based analysis and machine learning for automatic indexing and analysis of images, to support the registration of objects in museum collections. The results show that this is feasible for delimited subsets, in combination with machine learning, as support for, but not as a replacement for, manual analysis. The project also identified a need for a user interface for both text and image search and developed a prototype solution for it, which is documented in this report and in a separate appendix to the report. The material provides a basis for implementations that offer extended search capabilities, more efficient registration, and a user-friendly interface. The work is at the forefront of the research field's results and established methods, and combines statistical, linguistic, and computer science methods. See the link to the report and the link to the appendix further down.
28.	Menghi, Claudio, 1987, et al. (författare) Poster: Property specification patterns for robotic missions 2018 Ingår i: Proceedings - International Conference on Software Engineering. - New York, NY, USA : ACM. - 0270-5257. ; Part F137351, s. 434-435 Konferensbidrag (refereegranskat)abstract Engineering dependable software for mobile robots is becoming increasingly important. A core asset in engineering mobile robots is the mission specification, a formal description of the goals that mobile robots shall achieve. Such mission specifications are used, among others, to synthesize, verify, simulate, or guide the engineering of robot software. Development of precise mission specifications is challenging. Engineers need to translate the mission requirements into specification structures expressed in a logical language, a laborious and error-prone task. To mitigate this problem, we present a catalog of mission specification patterns for mobile robots. Our focus is on robot movement, one of the most prominent and recurrent specification problems for mobile robots. Our catalog maps common mission specification problems to recurrent solutions, which we provide as templates that can be used by engineers. The patterns are the result of analyzing missions extracted from the literature. For each pattern, we describe usage intent, known uses, relationships to other patterns, and, most importantly, a template representing the solution as a logical formula in temporal logic. Our specification patterns constitute reusable building blocks that can be used by engineers to create complex mission specifications while reducing specification mistakes. We believe that our patterns support researchers working on tool support and techniques to synthesize and verify mission specifications, and language designers creating rich domain-specific languages for mobile robots, incorporating our patterns as language concepts.
29.	Man, Yemao, 1987 (författare) Human-Machine Interface Considerations for Design and Testing in Distributed Sociotechnical Systems 2015 Licentiatavhandling (övrigt vetenskapligt/konstnärligt)abstract The increasing concerns for safety and environmental sustainability create demands on the development of future maritime transportation strategies. One way to meet these demands is the concept of autonomous unmanned vessels for intercontinental voyages. As automation is being introduced onboard and watch keeping operations are being migrated to the shore, there is a risk of introducing new human factor issues among the various stakeholder groups and adding to the complexity of the actors’ roles. This licentiate was based on the context of an EU research project MUNIN (Maritime Unmanned Ship through Intelligence in Networks) about remote monitoring and controlling autonomous unmanned ships where the bridge and engine control room were moved from the ship to a land based control station. The Human Machine Interface, as a mediating artefact in the complex system to bridge automation/engine control, is of importance for situation awareness, reliability, efficiency, effectiveness, resilience and safety. The purpose of the thesis is to achieve a comprehensive understanding of the complexity of the Human Machine Interface in a distributed complex system by exploring the experiences of the human agents during the designing and testing phases of a designed-for-purpose Human Machine Interface. The results reveal prominent human factor issues related to situation awareness and automation bias within such a complex distributed sociotechnical system, which sheds light on the design considerations of the Human Machine Interface. Loss of presence can lead to critical perceptual bottlenecks which could negatively impact upon the operators; the organizational factors also greatly shape individual and team performance. 
It indicates that the contextual factors in the distributed sociotechnical system must be accommodated by the interface design through a holistic systemic approach. The Human Machine Interface shall not only support data visualization, but also the process and context in which data are utilized and understood for consensus decision-making.
30.	Bagheri, Elahe, et al. (författare) A Novel Model for Emotion Detection from Facial Muscles Activity 2020 Ingår i: Advances in Intelligent Systems and Computing. - Cham : Springer International Publishing. - 2194-5365 .- 2194-5357. ; 1093, s. 237-249 Konferensbidrag (refereegranskat)abstract Considering human emotion in different applications and systems has received substantial attention over the last three decades. The traditional approach for emotion detection is to first extract different features and then apply a classifier, like SVM, to find the true class. However, recently proposed Deep Learning based models outperform traditional machine learning approaches without requiring a separate feature extraction phase. This paper proposes a novel deep learning based facial emotion detection model, which uses facial muscle activities as raw input to recognize the type of the expressed emotion in real time. To this end, we first use OpenFace to extract the activation values of the facial muscles, which are then presented to a Stacked Auto Encoder (SAE) as the feature set. Afterward, the SAE returns the best combination of muscles for describing a particular emotion; these extracted features are finally passed to a Softmax layer to perform the multi-class classification task. The proposed model has been applied to the CK+, MMI and RAVDESS datasets and achieved average accuracies of 95.63%, 95.58%, and 84.91%, respectively, for emotion type detection in six classes, which outperforms state-of-the-art algorithms.
31.	Balouji, Ebrahim, 1985, et al. (författare) A LSTM-based Deep Learning Method with Application to Voltage Dip Classification 2018 Ingår i: 2018 18TH INTERNATIONAL CONFERENCE ON HARMONICS AND QUALITY OF POWER (ICHQP). - Piscataway, NJ : Institute of Electrical and Electronics Engineers (IEEE). - 2164-0610. - 9781538605172 Konferensbidrag (refereegranskat)abstract In this paper, a deep learning (DL)-based method for automatic feature extraction and classification of voltage dips is proposed. The method consists of a dedicated architecture of Long Short-Term Memory (LSTM), which is a special type of Recurrent Neural Networks (RNNs). A total of 5982 three-phase one-cycle voltage dip RMS sequences, measured from several countries, has been used in our experiments. Our results have shown that the proposed method is able to classify the voltage dips from learned features in LSTM, with 93.40% classification accuracy on the test data set. The developed architecture is shown to be novel for feature learning and classification of voltage dips. Different from the conventional machine learning methods, the proposed method is able to learn dip features without requiring transition-event segmentation, selecting thresholds, and using expert rules or human expert knowledge, when a large amount of measurement data is available. This opens a new possibility of exploiting deep learning technology for power quality data analytics and classification.
32.	Bååth, Rasmus, et al. (författare) A prototype based resonance model of rhythm categorization 2014 Ingår i: i-Perception. - : SAGE Publications. - 2041-6695. ; 5:6, s. 548-558 Tidskriftsartikel (refereegranskat)abstract Categorization of rhythmic patterns is prevalent in musical practice, an example of this being the transcription of (possibly not strictly metrical) music into musical notation. In this article we implement a dynamical systems’ model of rhythm categorization based on the resonance theory of rhythm perception developed by Large (2010). This model is used to simulate the categorical choices of participants in two experiments of Desain and Honing (2003). The model accurately replicates the experimental data. Our results support resonance theory as a viable model of rhythm perception and show that by viewing rhythm perception as a dynamical system it is possible to model central properties of rhythm categorization.
33.	Daoud, Adel, 1981, et al. (författare) Using Satellite Images and Deep Learning to Measure Health and Living Standards in India 2023 Ingår i: Social Indicators Research. - : SPRINGER. - 0303-8300 .- 1573-0921. ; 167:1-3, s. 475-505 Tidskriftsartikel (refereegranskat)abstract Using deep learning with satellite images enhances our understanding of human development at a granular spatial and temporal level. Most studies have focused on Africa and on a narrow set of asset-based indicators. This article leverages georeferenced village-level census data from across 40% of the population of India to train deep models that predict 16 indicators of human well-being from Landsat 7 imagery. Based on the principles of transfer learning, the census-based model is used as a feature extractor to train another model that predicts an even larger set of developmental variables (over 90 variables) included in two rounds of the National Family Health Survey (NFHS). The census-based-feature-extractor model outperforms the current standard in the literature for most of these NFHS variables. Overall, the results show that combining satellite data with Indian Census data unlocks rich information for training deep models that track human development at an unprecedented geographical and temporal resolution.
34.	Dombrowski, Ann Kathrin, et al. (författare) Diffeomorphic Counterfactuals with Generative Models 2024 Ingår i: IEEE Transactions on Pattern Analysis and Machine Intelligence. - 1939-3539 .- 0162-8828. ; 46:5, s. 3257-3274 Tidskriftsartikel (refereegranskat)abstract Counterfactuals can explain classification decisions of neural networks in a human interpretable way. We propose a simple but effective method to generate such counterfactuals. More specifically, we perform a suitable diffeomorphic coordinate transformation and then perform gradient ascent in these coordinates to find counterfactuals which are classified with great confidence as a specified target class. We propose two methods to leverage generative models to construct such suitable coordinate systems that are either exactly or approximately diffeomorphic. We analyze the generation process theoretically using Riemannian differential geometry and validate the quality of the generated counterfactuals using various qualitative and quantitative measures.
35.	Eriksson, Patric, et al. (författare) A role for 'sensor simulation' and 'pre-emptive learning' in computer aided robotics 1995 Ingår i: 26th International Symposium on Industrial Robots, Symposium Proceedings. - : Mechanical Engineering Publ.. - 1860580009 ; , s. 135-140 Konferensbidrag (refereegranskat)abstract Sensor simulation in Computer Aided Robotics (CAR) can enhance the capabilities of such systems to enable off-line generation of programmes for sensor driven robots. However, such sensor simulation is not commonly supported in current computer aided robotic environments. A generic sensor object model for the simulation of sensors in graphical environments is described in this paper. Such a model can be used to simulate a variety of sensors, for example photoelectric, proximity and ultrasonic sensors. Test results presented here show that this generic sensor model can be customised to emulate the characteristics of the real sensors. The preliminary findings from the first off-line trained mobile robot are presented. The results indicate that sensor simulation within CARs can be used to train robots to adapt to changing environments.
36.	Fauzi, Nurul Izzatie Husna, et al. (författare) Feature-Based Object Detection and Tracking: A Systematic Literature Review 2024 Ingår i: International Journal of Image and Graphics. - 0219-4678. ; 24:3 Tidskriftsartikel (refereegranskat)abstract Correct object detection plays a key role in generating an accurate object tracking result. Feature-based methods have the capability of handling the critical process of extracting features of an object. This paper aims to investigate object tracking using feature-based methods in terms of (1) identifying and analyzing the existing methods; (2) reporting and scrutinizing the evaluation performance matrices and their implementation usage in measuring the effectiveness of object tracking and detection; (3) revealing and investigating the challenges that affect the accuracy performance of identified tracking methods; (4) measuring the effectiveness of identified methods in terms of revealing to what extent the challenges can impact the accuracy and precision performance based on the evaluation performance matrices reported; and (5) presenting the potential future directions for improvement. The review process of this research was conducted based on the standard systematic literature review (SLR) guidelines by Kitchenham and Charters. Initially, 157 prospective studies were identified. Through a rigorous study selection strategy, 32 relevant studies were selected to address the listed research questions. Thirty-two methods were identified and analyzed in terms of their aims, introduced improvements, and results achieved, along with presenting a new outlook on the classification of identified methods based on the feature-based method used in the detection and tracking process.
37.	Granlund, Gösta H. (författare) A Nonlinear, Image-content Dependent Measure of Image Quality 1977 Rapport (övrigt vetenskapligt/konstnärligt)abstract In recent years, considerable research effort has been devoted to the development of useful descriptors for image quality. The attempts have been hampered by incomplete understanding of the operation of the human visual system. This has made it difficult to relate physical measures and perceptual traits. A new model for determination of image quality is proposed. Its main feature is that it tries to take image content into consideration. The model builds upon a theory of image linearization, which means that the information in an image can well enough be represented using linear segments or structures within local spatial regions and frequency ranges. This also implies a suggestion that information in an image has to do with one-dimensional correlations. This gives a possibility to separate image content from noise in images, and measure them both. Also, a hypothesis is proposed that the visual system of humans does in fact perform such a linearization.
38.	Haghir Chehreghani, Mostafa, et al. (författare) Efficient context-aware K-nearest neighbor search 2018 Ingår i: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). - Cham : Springer International Publishing. - 1611-3349 .- 0302-9743. ; 10772, s. 466-478 Konferensbidrag (refereegranskat)abstract We develop a context-sensitive and linear-time K-nearest neighbor search method, wherein the test object and its neighborhood (in the training dataset) are required to share a similar structure via establishing bilateral relations. In particular, our approach makes it possible to deal with two types of irregularities: (i) when the (test) objects are outliers, i.e. they do not belong to any of the existing structures in the (training) dataset, and (ii) when the structures (e.g. classes) in the dataset have diverse densities. Instead of aiming to capture the correct underlying structure of the whole data, we extract the correct structure in the neighborhood of the test object, which leads to computational efficiency of our search strategy. We investigate the performance of our method on a variety of real-world datasets and demonstrate its superior performance compared to the alternatives.
39.	Haghir Chehreghani, Morteza, 1982 (författare) Unsupervised representation learning with Minimax distance measures 2020 Ingår i: Machine Learning. - : Springer Science and Business Media LLC. - 0885-6125 .- 1573-0565. ; 109:11, s. 2063-2097 Tidskriftsartikel (refereegranskat)abstract We investigate the use of Minimax distances to extract in a nonparametric way the features that capture the unknown underlying patterns and structures in the data. We develop a general-purpose and computationally efficient framework to employ Minimax distances with many machine learning methods that perform on numerical data. We study both computing the pairwise Minimax distances for all pairs of objects and computing the Minimax distances of all the objects to/from a fixed (test) object. We first efficiently compute the pairwise Minimax distances between the objects, using the equivalence of Minimax distances over a graph and over a minimum spanning tree constructed on that graph. Then, we perform an embedding of the pairwise Minimax distances into a new vector space, such that their squared Euclidean distances in the new space equal the pairwise Minimax distances in the original space. We also study the case of having multiple pairwise Minimax matrices, instead of a single one. Thereby, we propose an embedding via first summing up the centered matrices and then performing an eigenvalue decomposition to obtain the relevant features. Next, we study computing Minimax distances from a fixed (test) object, which can be used for instance in K-nearest neighbor search. Similar to the case of all-pair pairwise Minimax distances, we develop an efficient and general-purpose algorithm that is applicable with any arbitrary base distance measure. Moreover, we investigate in detail the edges selected by the Minimax distances and thereby explore the ability of Minimax distances to detect outlier objects. 
Finally, for each setting, we perform several experiments to demonstrate the effectiveness of our framework.
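Illustration for record 39 (not part of the catalogue record and not the author's implementation): the abstract states that Minimax distances over a graph equal those over a minimum spanning tree of that graph. A minimal sketch of that idea follows, assuming a Euclidean base distance and a Kruskal-style construction; the function name `minimax_distances` and the sample points are hypothetical choices made for this example only.

```python
from itertools import combinations
import math


def minimax_distances(points, base=math.dist):
    """Pairwise Minimax distances on the complete graph over `points`.

    The Minimax distance between two points is the smallest possible
    'largest edge' over all paths connecting them. It can be read off a
    minimum spanning tree, built here in Kruskal order.
    `base` is an assumed base distance (Euclidean by default).
    """
    n = len(points)
    # All edges of the complete graph, sorted by base distance.
    edges = sorted(
        (base(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    comp = list(range(n))                  # component label of each point
    members = {i: [i] for i in range(n)}   # points in each component
    d = [[0.0] * n for _ in range(n)]
    for w, i, j in edges:
        a, b = comp[i], comp[j]
        if a == b:
            continue  # edge closes a cycle, so it is not in the tree
        # This tree edge is the largest edge on the tree path between
        # every point of component a and every point of component b.
        for u in members[a]:
            for v in members[b]:
                d[u][v] = d[v][u] = w
        # Merge the smaller component into the larger one.
        if len(members[a]) < len(members[b]):
            a, b = b, a
        for v in members[b]:
            comp[v] = a
        members[a].extend(members.pop(b))
    return d


# Three points in a chain plus one far-away point.
pts = [(0, 0), (1, 0), (2, 0), (10, 0)]
D = minimax_distances(pts)
print(D[0][2])  # 1.0: the path 0 -> 1 -> 2 has largest edge 1, not the direct 2
print(D[0][3])  # 8.0: every path to the far point must cross the gap of 8
```

The sketch stores the full N x N matrix for clarity; it does not attempt the embedding step or the fixed-test-object variant described in the abstract.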
40.	Hornauer, Sascha, et al. (författare) Driving scene retrieval by example from large-scale data 2019 Ingår i: CVPR Workshops 2019. Konferensbidrag (refereegranskat)abstract Many machine learning approaches train networks with input from large datasets to reach high task performance. Collected datasets, such as Berkeley Deep Drive Video (BDD-V) for autonomous driving, contain a large variety of scenes and hence features. However, depending on the task, subsets, containing certain features more densely, support training better than others. For example, training networks on tasks such as image segmentation, bounding box detection or tracking requires an ample amount of objects in the input data. When training a network to perform optical flow estimation from first-person video, over-proportionally many straight driving scenes in the training data may lower generalization to turns. Even though some scenes of the BDD-V dataset are labeled with scene, weather or time of day information, these may be too coarse to filter the dataset best for a particular training task. Furthermore, even defining an exhaustive list of good label-types is complicated as it requires choosing the most relevant concepts of the natural world for a task. Alternatively, we investigate how to use examples of desired data to retrieve more similar data from a large-scale dataset. Following the paradigm of ”I know it when I see it”, we present a deep learning approach to use driving examples for retrieving similar scenes from the BDD-V dataset. Our method leverages only automatically collected labels. We show how we can reliably vary time of the day or objects in our query examples and retrieve nearest neighbors from the dataset. Using this method, already collected data can be filtered to remove bias from a dataset, removing scenes regarded too redundant to train on.
41.	Hoseini, Fazeleh Sadat, 1989, et al. (författare) Memory-Efficient Minimax Distance Measures 2022 Ingår i: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). - Cham : Springer International Publishing. - 1611-3349 .- 0302-9743. ; 13280 LNAI, s. 419-431 Konferensbidrag (refereegranskat)abstract Minimax distance measure is a transitive-aware measure that allows us to extract elongated manifolds and structures in the data in an unsupervised manner. Existing methods require a quadratic memory with respect to the number of data points to compute the pairwise Minimax distances. In this paper, we investigate two memory-efficient approaches to reduce the memory requirement and achieve linear space complexity. The first approach proposes a novel hierarchical representation of the data that requires only O(N) memory and from which the pairwise Minimax distances can be derived in a memory-efficient manner. The second approach is an efficient sampling method that adapts well to the proposed hierarchical representation of the data. This approach accurately recovers the majority of Minimax distances, especially the most important ones. It still works in O(N) memory, but with a substantially lower computational cost, and yields impressive results on clustering benchmarks, as a downstream task. We evaluate our methods on synthetic and real-world datasets from a variety of domains.
42.	Inbasekaran, Aravind, et al. (författare) Using Transfer Learning to contextually Optimize Optical Character Recognition (OCR) output and perform new Feature Extraction on a digitized cultural and historical dataset 2021 Ingår i: Proceedings - 2021 IEEE International Conference on Big Data, Big Data 2021. ; , s. 2224-2230 Konferensbidrag (refereegranskat)abstract Understanding handwritten and printed text is easier for humans but computers do not have the same level of accuracy. While there are many Optical Character Recognition (OCR) tools like PyTesseract and Abbyy FineReader which extract the text as digital characters from handwritten or printed text images, none of them are without unrecognizable characters or misspelled words. Spelling correction is one of the well-known tasks in Natural Language Processing. Spelling correction of an individual word could be performed through existing tools, however, correcting a word based on the context of the sentence is a challenging task that requires a human-level understanding of the language. In this paper, we introduce a novel experiment of applying Natural Language Processing using a machine learning concept called Transfer Learning on the text extracted by OCR tools, thereby optimizing the output text by reducing misspelled words. This experiment is conducted on the OCR output of a sample of newspaper images published between the late 18th century and the 19th century. These images were obtained from the Maryland State Archives digital archives project named the Legacy of Slavery. This Natural Language Processing approach uses pre-trained language transformer models like BERT and RoBERTa which are used as word-prediction software for spelling correction based on the context of the words in the OCR output. We compare the performance of BERT and RoBERTa on two OCR tool outputs, namely PyTesseract and Abbyy FineReader. 
A comparative evaluation shows that both models work fairly well on correcting misspelled words considering the irregularities in the text data from the OCR output. Additionally, with the Transfer Learning output text, a special process is conducted to create a new feature that did not exist in the original dataset, using Spacy's Entity Recognizer (ER). These newly extracted values are added to the dataset as a new feature. Also, an existing feature's values are compared to Spacy's ER output and the original hand transcribed data.
43.	Kim, Jinhan, et al. (författare) Guiding Deep Learning System Testing Using Surprise Adequacy 2019 Ingår i: Proceedings - International Conference on Software Engineering. - : IEEE. - 0270-5257. ; 2019-May, s. 1039-1049 Konferensbidrag (refereegranskat)abstract Deep Learning (DL) systems are rapidly being adopted in safety and security critical domains, urgently calling for ways to test their correctness and robustness. Testing of DL systems has traditionally relied on manual collection and labelling of data. Recently, a number of coverage criteria based on neuron activation values have been proposed. These criteria essentially count the number of neurons whose activation during the execution of a DL system satisfied certain properties, such as being above predefined thresholds. However, existing coverage criteria are not sufficiently fine grained to capture subtle behaviours exhibited by DL systems. Moreover, evaluations have focused on showing correlation between adversarial examples and proposed criteria rather than evaluating and guiding their use for actual testing of DL systems. We propose a novel test adequacy criterion for testing of DL systems, called Surprise Adequacy for Deep Learning Systems (SADL), which is based on the behaviour of DL systems with respect to their training data. We measure the surprise of an input as the difference in DL system's behaviour between the input and the training data (i.e., what was learnt during training), and subsequently develop this as an adequacy criterion: a good test input should be sufficiently but not overtly surprising compared to training data. Empirical evaluation using a range of DL systems from simple image classifiers to autonomous driving car platforms shows that systematic sampling of inputs based on their surprise can improve classification accuracy of DL systems against adversarial examples by up to 77.5% via retraining.
44.	Koriakina, Nadezhda, 1991-, et al. (author) Deep multiple instance learning versus conventional deep single instance learning for interpretable oral cancer detection 2024 In: PLOS ONE. - : Public Library of Science (PLoS). - 1932-6203. ; 19:4 April Journal article (peer-reviewed) Abstract: The current medical standard for setting an oral cancer (OC) diagnosis is histological examination of a tissue sample taken from the oral cavity. This process is time-consuming and more invasive than an alternative approach of acquiring a brush sample followed by cytological analysis. Using a microscope, skilled cytotechnologists are able to detect changes due to malignancy; however, introducing this approach into clinical routine is associated with challenges such as a lack of resources and experts. To design a trustworthy OC detection system that can assist cytotechnologists, we are interested in deep learning based methods that can reliably detect cancer, given only per-patient labels (thereby minimizing annotation bias), and also provide information regarding which cells are most relevant for the diagnosis (thereby enabling supervision and understanding). In this study, we perform a comparison of two approaches suitable for OC detection and interpretation: (i) conventional single instance learning (SIL) approach and (ii) a modern multiple instance learning (MIL) method. To facilitate systematic evaluation of the considered approaches, we, in addition to a real OC dataset with patient-level ground truth annotations, also introduce a synthetic dataset, PAP-QMNIST. This dataset shares several properties of OC data, such as image size and large and varied number of instances per bag, and may therefore act as a proxy model of a real OC dataset, while, in contrast to OC data, it offers reliable per-instance ground truth, as defined by design. 
PAP-QMNIST has the additional advantage of being visually interpretable for non-experts, which simplifies analysis of the behavior of methods. For both OC and PAP-QMNIST data, we evaluate performance of the methods utilizing three different neural network architectures. Our study indicates, somewhat surprisingly, that on both synthetic and real data, the performance of the SIL approach is better than or equal to the performance of the MIL approach. Visual examination by a cytotechnologist indicates that the methods manage to identify cells which deviate from normality, including malignant cells as well as those suspicious for dysplasia. We share the code as open source.
45.	Larsson, Måns, 1989, et al. (author) A projected gradient descent method for crf inference allowing end-to-end training of arbitrary pairwise potentials 2018 In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). - Cham : Springer International Publishing. - 1611-3349 .- 0302-9743. - 9783319781983 ; 10746, pp. 564-579 Conference paper (peer-reviewed) Abstract: Are we using the right potential functions in the Conditional Random Field models that are popular in the Vision community? Semantic segmentation and other pixel-level labelling tasks have made significant progress recently due to the deep learning paradigm. However, most state-of-the-art structured prediction methods also include a random field model with a hand-crafted Gaussian potential to model spatial priors, label consistencies and feature-based image conditioning. In this paper, we challenge this view by developing a new inference and learning framework which can learn pairwise CRF potentials restricted only by their dependence on the image pixel values and the size of the support. Both standard spatial and high-dimensional bilateral kernels are considered. Our framework is based on the observation that CRF inference can be achieved via projected gradient descent and consequently, can easily be integrated in deep neural networks to allow for end-to-end training. It is empirically demonstrated that such learned potentials can improve segmentation accuracy and that certain label class interactions are indeed better modelled by a non-Gaussian potential. In addition, we compare our inference method to the commonly used mean-field algorithm. Our framework is evaluated on several public benchmarks for semantic segmentation with improved performance compared to previous state-of-the-art CNN+CRF models.
46.	Le, Minh Ha, et al. (author) AnonFACES: Anonymizing Faces Adjusted to Constraints on Efficacy and Security 2020 In: WPES 2020 - Proceedings of the 19th Workshop on Privacy in the Electronic Society. - New York, NY, USA : ACM. ; pp. 87-100 Conference paper (peer-reviewed) Abstract: Image data analysis techniques such as facial recognition can threaten individuals’ privacy. Whereas privacy risks often can be reduced by adding noise to the data, this approach reduces the utility of the images. For this reason, image de-identification techniques typically replace directly identifying features (e.g., faces, car number plates) present in the data with synthesized features, while still preserving other non-identifying features. As of today, existing techniques mostly focus on improving the naturalness of the generated synthesized images, without quantifying their impact on privacy. In this paper, we propose the first methodology and system design to quantify, improve, and tune the privacy-utility trade-off, while simultaneously also improving the naturalness of the generated images. The system design is broken down into three components that address separate but complementing challenges. This includes a two-step cluster analysis component to extract low-dimensional feature vectors representing the images (embedding) and to cluster the images into fixed-sized clusters. While the importance of good clustering mostly has been neglected in previous work, we find that our novel approach of using low-dimensional feature vectors can improve the privacy-utility trade-off by better clustering similar images. The use of these embeddings has been found particularly useful when wanting to ensure high naturalness and utility of the synthetically generated images. 
By combining improved clustering and incorporating StyleGAN, a state-of-the-art Generative Neural Network, into our solution, we produce more realistic synthesized faces than prior works, while also better preserving properties such as age, gender, skin tone, or even emotional expressions. Finally, our iterative tuning method exploits non-linear relations between privacy and utility to identify good privacy-utility trade-offs. We note that an example benefit of these improvements is that our solution allows car manufacturers to train their autonomous vehicles while complying with privacy laws.
47.	Lee, Joonbum, et al. (author) Investigating the correspondence between driver head position and glance location 2018 In: PeerJ Computer Science. - : PeerJ. - 2376-5992. Journal article (peer-reviewed) Abstract: The relationship between a driver's glance orientation and corresponding head rotation is highly complex due to its nonlinear dependence on the individual, task, and driving context. This paper presents expanded analytic detail and findings from an effort that explored the ability of head pose to serve as an estimator for driver gaze by connecting head rotation data with manually coded gaze region data using both a statistical analysis approach and a predictive (i.e., machine learning) approach. For the latter, classification accuracy increased as visual angles between two glance locations increased. In other words, the greater the shift in gaze, the higher the accuracy of classification. This is an intuitive but important concept that we make explicit through our analysis. The highest accuracy achieved was 83% using the method of Hidden Markov Models (HMM) for the binary gaze classification problem of (a) glances to the forward roadway versus (b) glances to the center stack. Results suggest that although there are individual differences in head-glance correspondence while driving, classifier models based on head-rotation data may be robust to these differences and therefore can serve as reasonable estimators for glance location. The results suggest that driver head pose can be used as a surrogate for eye gaze in several key conditions including the identification of high-eccentricity glances. Inexpensive driver head pose tracking may be a key element in detection systems developed to mitigate driver distraction and inattention.
48.	Li, Jason, 1993, et al. (author) Training Convolutional Neural Networks with Synthesized Data for Object Recognition in Industrial Manufacturing 2019 In: IEEE International Conference on Emerging Technologies and Factory Automation, ETFA. - 1946-0759 .- 1946-0740. ; pp. 1544-1547 Conference paper (peer-reviewed) Abstract: Visual tasks such as automated quality control or packaging require machines to be able to detect and identify objects automatically. In recent years object detection systems using deep learning have made significant advancements achieving better scores at a higher performance. However, these methods typically require large amounts of annotated images for training, which are costly and labor intensive to create. Therefore, it is an attractive alternative to generate the training data synthetically using computer-generated imagery (CGI). In this paper, we investigate how to add realistic texture to CAD objects to generate synthetic data for training of an instance segmentation network (Mask R-CNN) for recognition of manufacturing components. The results show that it is possible to create synthetic data with negligible human effort when using simple procedural materials.
49.	Liu, Yang, et al. (author) Movement Status Based Vision Filter for RoboCup Small-Size League 2012 In: Advances in Automation and Robotics, Vol. 2. - Berlin, Heidelberg : Springer. - 9783642256455 - 9783642256462 ; pp. 79-86 Book chapter (other academic/artistic) Abstract: The small-size soccer league is a division of the RoboCup (Robot World Cup) competitions. Each team uses its own designed hardware and software to compete with others under defined rules. The strategy system receives two kinds of data from the dedicated server: referee commands and vision data. However, due to network delay and vision noise, we have to process the data before we can actually use it. Therefore, a certain mechanism is needed in this case. Instead of using some prevalent and complex algorithms, this paper proposes to solve this problem from a simple kinematics and mathematics point of view, which can be implemented effectively by hobbyists and undergraduate students. We divide this problem by the speed status and deal with it in three different situations. Testing results show good performance with this algorithm and great potential in filtering vision data, thus forecasting actual coordinates of tracking objects.
50.	Mitsioni, Ioanna, 1991-, et al. (author) Interpretability in Contact-Rich Manipulation via Kinodynamic Images 2021 In: Proceedings - IEEE International Conference on Robotics and Automation. - : Institute of Electrical and Electronics Engineers (IEEE). - 1050-4729. ; 2021-May, pp. 10175-10181 Conference paper (peer-reviewed) Abstract: Deep Neural Networks (NNs) have been widely utilized in contact-rich manipulation tasks to model the complicated contact dynamics. However, NN-based models are often difficult to decipher which can lead to seemingly inexplicable behaviors and unidentifiable failure cases. In this work, we address the interpretability of NN-based models by introducing the kinodynamic images. We propose a methodology that creates images from kinematic and dynamic data of contact-rich manipulation tasks. By using images as the state representation, we enable the application of interpretability modules that were previously limited to vision-based tasks. We use this representation to train a Convolutional Neural Network (CNN) and we extract interpretations with Grad-CAM to produce visual explanations. Our method is versatile and can be applied to any classification problem in manipulation tasks to visually interpret which parts of the input drive the model's decisions and distinguish its failure modes, regardless of the features used. Our experiments demonstrate that our method enables detailed visual inspections of sequences in a task, and high-level evaluations of a model's behavior.
