SwePub
Search the SwePub database


Results for the search "hsv:(NATURVETENSKAP) hsv:(Data och informationsvetenskap) hsv:(Datavetenskap) srt2:(2020-2024)"


  • Results 1-50 of 6845
1.
  • Norlund, Tobias, 1991, et al. (author)
  • Transferring Knowledge from Vision to Language: How to Achieve it and how to Measure it?
  • 2021
  • In: Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, pp. 149-162, Punta Cana, Dominican Republic. - : Association for Computational Linguistics.
  • Conference paper (peer-reviewed), abstract:
    • Large language models are known to suffer from the hallucination problem in that they are prone to output statements that are false or inconsistent, indicating a lack of knowledge. A proposed solution to this is to provide the model with additional data modalities that complement the knowledge obtained through text. We investigate the use of visual data to complement the knowledge of large language models by proposing a method for evaluating visual knowledge transfer to text for uni- or multimodal language models. The method is based on two steps: 1) a novel task querying for knowledge of memory colors, i.e., typical colors of well-known objects, and 2) filtering of model training data to clearly separate knowledge contributions. Additionally, we introduce a model architecture that involves a visual imagination step and evaluate it with our proposed method. We find that our method can successfully be used to measure visual knowledge transfer capabilities in models and that our novel model architecture shows promising results for leveraging multimodal knowledge in a unimodal setting.
  •  
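The memory-color task described in entry 1 can be illustrated with a simple cloze-style query to a pretrained masked language model. This is only a sketch of the general idea, not the authors' evaluation protocol; the model name, prompt wording, and object list are assumptions.

```python
# Sketch: querying a masked language model for "memory colors"
# (typical colors of well-known objects). Illustrative only; the
# prompt template and model choice are assumptions, not the paper's setup.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

objects_and_expected = {"banana": "yellow", "grass": "green", "snow": "white"}

for obj, expected in objects_and_expected.items():
    prompt = f"The color of {obj} is [MASK]."
    predictions = fill_mask(prompt, top_k=3)
    top_tokens = [p["token_str"].strip() for p in predictions]
    hit = expected in top_tokens
    print(f"{obj}: expected '{expected}', model suggests {top_tokens} -> {'hit' if hit else 'miss'}")
```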
2.
  • Sweidan, Dirar, et al. (author)
  • Predicting Customer Churn in Retailing
  • 2022
  • In: Proceedings 21st IEEE International Conference on Machine Learning and Applications ICMLA 2022. - : IEEE. - 9781665462839 - 9781665462846 ; , s. 635-640
  • Conference paper (peer-reviewed), abstract:
    • Customer churn is one of the most challenging problems for digital retailers. With significantly higher costs for acquiring new customers than retaining existing ones, knowledge about which customers are likely to churn becomes essential. This paper reports a case study where a data-driven approach to churn prediction is used for predicting churners and gaining insights about the problem domain. The real-world data set used contains approximately 200 000 customers, describing each customer using more than 50 features. In the pre-processing, exploration, modeling and analysis, attributes related to recency, frequency, and monetary concepts are identified and utilized. In addition, correlations and feature importance are used to discover and understand churn indicators. One important finding is that the churn rate highly depends on the number of previous purchases. In the segment consisting of customers with only one previous purchase, more than 75% will churn, i.e., not make another purchase in the coming year. For customers with at least four previous purchases, the corresponding churn rate is around 25%. Further analysis shows that churning customers in general, and as expected, make smaller purchases and visit the online store less often. In the experimentation, three modeling techniques are evaluated, and the results show that, in particular, Gradient Boosting models can predict churners with relatively high accuracy while obtaining a good balance between precision and recall. 
  •  
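As a rough illustration of the modeling setup described in entry 2, the sketch below trains a gradient boosting classifier on recency/frequency/monetary-style features. The data file, column names, and train/test split are hypothetical; the paper's actual feature set and pipeline are not reproduced here.

```python
# Minimal, hypothetical sketch of churn prediction with gradient boosting
# on RFM-style features. Feature names and data are illustrative only.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score

# Assume a customer table with engineered features and a churn label.
df = pd.read_csv("customers.csv")  # hypothetical file
features = ["recency_days", "purchase_count", "avg_order_value"]  # assumed columns
X, y = df[features], df["churned"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = GradientBoostingClassifier(random_state=42)
model.fit(X_train, y_train)
pred = model.predict(X_test)

print("precision:", precision_score(y_test, pred))
print("recall:   ", recall_score(y_test, pred))
# Feature importances hint at churn indicators such as the number of previous purchases.
print(dict(zip(features, model.feature_importances_)))
```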
3.
  • Fredriksson, Teodor, 1992, et al. (author)
  • Machine learning models for automatic labeling: A systematic literature review
  • 2020
  • In: ICSOFT 2020 - Proceedings of the 15th International Conference on Software Technologies. - : SCITEPRESS - Science and Technology Publications. ; , s. 552-566
  • Conference paper (peer-reviewed), abstract:
    • Automatic labeling is a type of classification problem. Classification has been studied with the help of statistical methods for a long time. With the explosion of new, better central processing units (CPUs) and graphical processing units (GPUs), the interest in machine learning has grown exponentially, and we can use both statistical learning algorithms and deep neural networks (DNNs) to solve classification tasks. Classification is a supervised machine learning problem, and there exists a large amount of methodology for performing such tasks. However, it is very rare in industrial applications that data is fully labeled, which is why we need good methodology to obtain error-free labels. The purpose of this paper is to examine the current literature on how to perform labeling using ML; we compare these models in terms of popularity and the data types they are used on. We performed a systematic literature review of empirical studies on machine learning for labeling. We identified 43 primary studies relevant to our search. From this we were able to determine the most common machine learning models for labeling. Lack of labeled instances is a major problem for industry, as supervised learning is the most widely used approach. Obtaining labels is costly in terms of labor and financial costs. Based on our findings in this review, we present alternate ways of labeling data for use in supervised learning tasks.
  •  
4.
  • Somanath, Sanjay, 1994, et al. (author)
  • Towards Urban Digital Twins: A Workflow for Procedural Visualization Using Geospatial Data
  • 2024
  • In: Remote Sensing. - 2072-4292. ; 16:11
  • Journal article (peer-reviewed), abstract:
    • A key feature for urban digital twins (DTs) is an automatically generated detailed 3D representation of the built and unbuilt environment from aerial imagery, footprints, LiDAR, or a fusion of these. Such 3D models have applications in architecture, civil engineering, urban planning, construction, real estate, Geographical Information Systems (GIS), and many other areas. While the visualization of large-scale data in conjunction with the generated 3D models is often a recurring and resource-intensive task, an automated workflow is complex, requiring many steps to achieve a high-quality visualization. Building reconstruction methods have come a long way, from earlier manual approaches to semi-automatic or automatic approaches. This paper aims to complement existing methods of 3D building generation. First, we present a literature review covering different options for procedural context generation and visualization methods, focusing on workflows and data pipelines. Next, we present a semi-automated workflow that extends the building reconstruction pipeline to include procedural context generation using Python and Unreal Engine. Finally, we propose a workflow for integrating various types of large-scale urban analysis data for visualization. We conclude with a series of challenges faced in achieving such pipelines and the limitations of the current approach. The steps toward a complete, end-to-end solution involve further developing robust systems for building detection, rooftop recognition, and geometry generation, as well as importing and visualizing data in the same 3D environment, highlighting a need for further research and development in this field.
  •  
5.
  • Brunetta, Carlo, 1992 (author)
  • Cryptographic Tools for Privacy Preservation
  • 2021
  • Doctoral thesis (other academic/artistic), abstract:
    • Data permeates every aspect of our daily life and is the backbone of our digitalized society. Smartphones, smartwatches and many more smart devices measure, collect, modify and share data in what is known as the Internet of Things. Often, these devices do not have enough computation power or storage space, thus outsourcing some aspects of the data management to the Cloud. Outsourcing computation/storage to a third party poses natural questions regarding the security and privacy of the shared sensitive data. Intuitively, Cryptography is a toolset of primitives/protocols whose security properties are formally proven, while Privacy typically captures additional social/legislative requirements that relate more to the concept of “trust” between people, “how” data is used and/or “who” has access to data. This thesis separates the concepts by introducing an abstract model that classifies data leaks into different types of breaches. Each class represents a specific requirement/goal related to cryptography, e.g. confidentiality or integrity, or related to privacy, e.g. liability, sensitive data management and more. The thesis contains cryptographic tools designed to provide privacy guarantees for different application scenarios. In more detail, the thesis: (a) defines new encryption schemes that provide formal privacy guarantees such as theoretical privacy definitions like Differential Privacy (DP), or concrete privacy-oriented applications covered by existing regulations such as the European General Data Protection Regulation (GDPR); (b) proposes new tools and procedures for providing verifiable computation guarantees in concrete scenarios for post-quantum cryptography or generalisation of signature schemes; (c) proposes a methodology for utilising Machine Learning (ML) for analysing the effective security and privacy of a crypto-tool and, dually, proposes a secure primitive that allows computing a specific ML algorithm in a privacy-preserving way; (d) provides an alternative protocol for secure communication between two parties, based on the idea of communicating in a periodically timed fashion.
  •  
6.
  • Dodig-Crnkovic, Gordana, 1955 (author)
  • Cognitive Architectures Based on Natural Info-Computation
  • 2022
  • In: Studies in Applied Philosophy, Epistemology and Rational Ethics. - Cham : Springer. - 2192-6255 .- 2192-6263. ; , s. 3-13
  • Book chapter (peer-reviewed), abstract:
    • At the time when the first models of cognitive architectures were proposed, some forty years ago, the understanding of cognition, embodiment and evolution was substantially different from today’s. So was the state of the art of information physics, information chemistry, bioinformatics, neuroinformatics, computational neuroscience, complexity theory, self-organization, theory of evolution, as well as the basic concepts of information and computation. Novel developments support a constructive interdisciplinary framework for cognitive architectures based on natural morphological computing, where interactions between constituents at different levels of organization of matter-energy and their corresponding time-dependent dynamics lead to complexification of agency and increased cognitive capacities of living organisms that unfold through evolution. The proposed info-computational framework for naturalizing cognition considers current updates (generalizations) of the concepts of information, computation, cognition, and evolution in order to attain an alignment with the current state of the art in the corresponding research fields. Some important open questions are suggested for future research with implications for further development of cognitive and intelligent technologies.
  •  
7.
  • Laaber, C., et al. (author)
  • Applying test case prioritization to software microbenchmarks
  • 2021
  • In: Empirical Software Engineering. - : Springer Science and Business Media LLC. - 1382-3256 .- 1573-7616. ; 26:6
  • Journal article (peer-reviewed), abstract:
    • Regression testing comprises techniques which are applied during software evolution to uncover faults effectively and efficiently. While regression testing is widely studied for functional tests, performance regression testing, e.g., with software microbenchmarks, is hardly investigated. Applying test case prioritization (TCP), a regression testing technique, to software microbenchmarks may help capture large performance regressions sooner in new versions. This may be especially beneficial for microbenchmark suites, because they take considerably longer to execute than unit test suites. However, it is unclear whether traditional unit testing TCP techniques work equally well for software microbenchmarks. In this paper, we empirically study coverage-based TCP techniques, employing total and additional greedy strategies, applied to software microbenchmarks along multiple parameterization dimensions, leading to 54 unique technique instantiations. We find that TCP techniques have a mean APFD-P (average percentage of fault-detection on performance) effectiveness between 0.54 and 0.71 and are able to capture the three largest performance changes after executing 29% to 66% of the whole microbenchmark suite. Our efficiency analysis reveals that the runtime overhead of TCP varies considerably depending on the exact parameterization. The most effective technique has an overhead of 11% of the total microbenchmark suite execution time, making TCP a viable option for performance regression testing. The results demonstrate that the total strategy is superior to the additional strategy. Finally, dynamic-coverage techniques should be favored over static-coverage techniques due to their acceptable analysis overhead; however, in settings where the time for prioritization is limited, static-coverage techniques provide an attractive alternative.
  •  
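Entry 7 compares "total" and "additional" greedy strategies for coverage-based test case prioritization. The sketch below shows the two classic greedy orderings on toy coverage data; it is a textbook illustration under assumed data, not the instrumented pipeline evaluated in the paper.

```python
# Toy illustration of coverage-based prioritization strategies.
# benchmark name -> set of covered code units (assumed example data).
coverage = {
    "bench_a": {1, 2, 3, 4},
    "bench_b": {3, 4},
    "bench_c": {5, 6, 7},
    "bench_d": {1, 7},
}

def total_greedy(cov):
    # Order by total number of covered units, descending.
    return sorted(cov, key=lambda b: len(cov[b]), reverse=True)

def additional_greedy(cov):
    # Repeatedly pick the benchmark covering the most not-yet-covered units.
    remaining, covered, order = dict(cov), set(), []
    while remaining:
        best = max(remaining, key=lambda b: len(remaining[b] - covered))
        order.append(best)
        covered |= remaining.pop(best)
    return order

print("total:     ", total_greedy(coverage))
print("additional:", additional_greedy(coverage))
```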
8.
  • Pir Muhammad, Amna, 1990 (author)
  • Managing Human Factors and Requirements in Agile Development of Automated Vehicles: An Exploration
  • 2022
  • Licentiate thesis (other academic/artistic), abstract:
    • Context: Automated Vehicle (AV) technology has evolved significantly in complexity and impact; it is expected to ultimately change urban transportation. However, research shows that vehicle automation can only live up to this expectation if it is defined with human capabilities and limitations in mind. Therefore, it is necessary to bring human factors knowledge to AV developers. Objective: This thesis aims to empirically study how we can effectively bring the required human factors knowledge into large-scale agile AV development. The research goals are 1) to explore requirements engineering and human factors in agile AV development, 2) to investigate the problems of requirements engineering, human factors, and agile ways of working in AV development, and 3) to demonstrate initial solutions to existing problems in agile AV development. Method: We conducted this research in close collaboration with industry, using different empirical methodologies to collect data—including interviews, workshops, and document analysis. To gain in-depth insights, we did a qualitative exploratory study to investigate the problem and used a design science approach to develop an initial solution in several iterations. Findings and Conclusions: We found that applying human factors knowledge effectively is one of the key problem areas that need to be solved in agile development of artificial intelligence (AI)-intense systems. This motivated us to do an in-depth interview study on how to manage human factors knowledge during AV development. From our data, we derived a working definition of human factors for AV development, discovered the relevant properties of agile and human factors, and defined implications for agile ways of working, managing human factors knowledge, and managing requirements. The design science approach allowed us to identify challenges related to agile requirements engineering in three case companies in iterations. Based on these three case studies, we developed a solution strategy to resolve the RE challenges in agile AV development. Moreover, we derived building blocks and described guidelines for the creation of a requirements strategy, which should describe how requirements are structured, how work is organized, and how RE is integrated into the agile work and feature flow. Future Outlook: In future work, I plan to define a concrete requirements strategy for human factors knowledge in large-scale agile AV development. It could help establish clear communication channels and practices for incorporating explicit human factors knowledge into AI-based large-scale agile AV development.
  •  
9.
  • Samoaa, Hazem Peter, et al. (author)
  • A systematic mapping study of source code representation for deep learning in software engineering
  • 2022
  • In: Iet Software. - : Institution of Engineering and Technology (IET). - 1751-8806 .- 1751-8814. ; 16:4, s. 351-385
  • Journal article (peer-reviewed), abstract:
    • The usage of deep learning (DL) approaches for software engineering has attracted much attention, particularly in source code modelling and analysis. However, in order to use DL, source code needs to be formatted to fit the expected input form of DL models. This problem is known as source code representation. Source code can be represented via different approaches, most importantly, the tree-based, token-based, and graph-based approaches. We use a systematic mapping study to investigate in detail the representation approaches adopted in 103 studies that use DL in the context of software engineering. The studies are collected from 2014 to 2021 from 14 different journals and 27 conferences. We show that each way of representing source code can provide a different, yet orthogonal view of the same source code. Thus, different software engineering tasks might require different (combinations of) code representation approaches, depending on the nature and complexity of the task. Particularly, we show that it is crucial to define whether the DL approach requires lexical, syntactical, or semantic code information. Our analysis shows that a wide range of different representations and combinations of representations (hybrid representations) are used to solve a wide range of common software engineering problems. However, we also observe that current research does not generally attempt to transfer existing representations or models to other studies even though there are other contexts in which these representations and models may also be useful. We believe that there is potential for more reuse and the application of transfer learning when applying DL to software engineering tasks.
  •  
10.
  • Tsaloli, Georgia, 1993 (author)
  • Secure and Privacy-Preserving Cloud-Assisted Computing
  • 2022
  • Doctoral thesis (other academic/artistic), abstract:
    • Smart devices such as smartphones, wearables, and smart appliances collect significant amounts of data and transmit them over the network, forming the Internet of Things (IoT). Many applications in our daily lives (e.g., health, smart grid, traffic monitoring) involve IoT devices that often have low computational capabilities. Subsequently, powerful cloud servers are employed to process the data collected from these devices. Nevertheless, security and privacy concerns arise in cloud-assisted computing settings. Collected data can be sensitive, and it is essential to protect their confidentiality. Additionally, outsourcing computations to untrusted cloud servers creates the need to ensure that servers perform the computations as requested and that any misbehavior can be detected, safeguarding security. Cryptographic primitives and protocols are the foundation to design secure and privacy-preserving solutions that address these challenges. This thesis focuses on providing privacy and security guarantees when outsourcing heavy computations on sensitive data to untrusted cloud servers. More concretely, this work: (a) provides solutions for outsourcing the secure computation of the sum and the product functions in the multi-server, multi-client setting, protecting the sensitive data of the data owners, even against potentially untrusted cloud servers; (b) provides integrity guarantees for the proposed protocols, by enabling anyone to verify the correctness of the computed function values. More precisely, the employed servers or the clients (depending on the proposed solution) provide specific values which are the proofs that the computed results are correct; (c) designs decentralized settings, where multiple cloud servers are employed to perform the requested computations as opposed to relying on a single server that might fail or lose connection; (d) suggests ways to protect individual privacy and provide integrity. More precisely, we propose a verifiable differentially private solution that provides verifiability and avoids any leakage of information regardless of the participation of some individual’s sensitive data in the computation or not.
  •  
11.
  • Isaksson, Martin, et al. (author)
  • Adaptive Expert Models for Federated Learning
  • 2023
  • In: Lecture Notes in Computer Science. - Cham : Springer Science and Business Media Deutschland GmbH. - 9783031289958 ; 13448 LNAI, s. 1-16
  • Conference paper (peer-reviewed), abstract:
    • Federated Learning (FL) is a promising framework for distributed learning when data is private and sensitive. However, the state-of-the-art solutions in this framework are not optimal when data is heterogeneous and non-IID. We propose a practical and robust approach to personalization in FL that adjusts to heterogeneous and non-IID data by balancing exploration and exploitation of several global models. To achieve our aim of personalization, we use a Mixture of Experts (MoE) that learns to group clients that are similar to each other, while using the global models more efficiently. We show that our approach achieves an accuracy up to 29.78% better than the state-of-the-art and up to 4.38% better compared to a local model in a pathological non-IID setting, even though we tune our approach in the IID setting. © 2023, The Author(s)
  •  
12.
  • Picazo-Sanchez, Pablo, 1985, et al. (author)
  • Are chrome extensions compliant with the spirit of least privilege?
  • 2022
  • In: International Journal of Information Security. - : Springer Science and Business Media LLC. - 1615-5262 .- 1615-5270. ; 21:6, s. 1283-1297
  • Journal article (peer-reviewed), abstract:
    • Extensions are small applications installed by users that enrich the experience of browsing the Internet. Browsers expose a set of restricted APIs to extensions. To be used, extensions need to list the permissions associated with these APIs in a mandatory extension file named manifest. In particular, Chrome’s permission ecosystem was designed in the spirit of least privilege. Yet, this paper demonstrates that 39.8% of the analyzed extensions provided by the official Web Store are compliant with the spirit of least privilege. Also, we develop: (1) a browser extension to make regular users aware of the permissions of the extensions they install; (2) a web app where extension developers can check whether their extensions are compliant with the spirit of least privilege; and (3) a set of scripts that can be part of the vendors’ acceptance criteria such that, when developers upload their extensions to the official repositories, the scripts automatically analyze the extensions and generate a report about the permissions and their usage.
  •  
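Entry 12 concerns whether the permissions declared in an extension's manifest are actually needed. A crude way to approximate such a check is to compare declared permissions against the chrome.* APIs referenced in the extension's code, as sketched below. The permission-to-API table is a tiny assumed excerpt, the extension directory is hypothetical, and the heuristic is far simpler than the analysis in the paper.

```python
# Rough sketch: flag manifest permissions with no matching chrome.* API usage.
# The api_for_permission table is a small assumed excerpt, not a complete mapping.
import json, re, pathlib

api_for_permission = {          # assumed, partial mapping
    "tabs": "chrome.tabs",
    "storage": "chrome.storage",
    "cookies": "chrome.cookies",
    "history": "chrome.history",
}

ext_dir = pathlib.Path("extension/")                 # hypothetical unpacked extension
manifest = json.loads((ext_dir / "manifest.json").read_text())
declared = set(manifest.get("permissions", []))

source = " ".join(p.read_text(errors="ignore") for p in ext_dir.rglob("*.js"))

for perm in sorted(declared & set(api_for_permission)):
    if not re.search(re.escape(api_for_permission[perm]), source):
        print(f"declared but apparently unused permission: {perm}")
```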
13.
  • John, Meenu Mary, et al. (author)
  • Towards an AI-driven business development framework: A multi-case study
  • 2023
  • In: Journal of Software: Evolution and Process. - : Wiley. - 2047-7481 .- 2047-7473. ; 35:6
  • Journal article (peer-reviewed), abstract:
    • Artificial intelligence (AI) and the use of machine learning (ML) and deep learning (DL) technologies are becoming increasingly popular in companies. These technologies enable companies to leverage big quantities of data to improve system performance and accelerate business development. However, despite the appeal of ML/DL, there is a lack of systematic and structured methods and processes to help data scientists and other company roles and functions to develop, deploy and evolve models. In this paper, based on multi-case study research in six companies, we explore practices and challenges practitioners experience in developing ML/DL models as part of large software-intensive embedded systems. Based on our empirical findings, we derive a conceptual framework in which we identify three high-level activities that companies perform in parallel with the development, deployment and evolution of models. Within this framework, we outline activities, iterations and triggers that optimize model design as well as roles and company functions. In this way, we provide practitioners with a blueprint for effectively integrating ML/DL model development into the business to achieve better results than other (algorithmic) approaches. In addition, we show how this framework helps companies solve the challenges we have identified and discuss checkpoints for terminating the business case.
  •  
14.
  • Lindén, Joakim, et al. (author)
  • Evaluating the Robustness of ML Models to Out-of-Distribution Data Through Similarity Analysis
  • 2023
  • In: Communications in Computer and Information Science. - : Springer Science and Business Media Deutschland GmbH. - 9783031429408 ; , s. 348-359
  • Conference paper (peer-reviewed), abstract:
    • In Machine Learning systems, several factors impact the performance of a trained model. The most important ones include model architecture, the amount of training time, the dataset size and diversity. We present a method for analyzing datasets from a use-case scenario perspective, detecting and quantifying out-of-distribution (OOD) data on dataset level. Our main contribution is the novel use of similarity metrics for the evaluation of the robustness of a model by introducing relative Fréchet Inception Distance (FID) and relative Kernel Inception Distance (KID) measures. These relative measures are relative to a baseline in-distribution dataset and are used to estimate how the model will perform on OOD data (i.e. estimate the model accuracy drop). We find a correlation between our proposed relative FID/relative KID measure and the drop in Average Precision (AP) accuracy on unseen data.
  •  
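Entry 14 introduces relative FID and relative KID as similarity measures against a baseline in-distribution dataset. The sketch below computes the standard Fréchet Inception Distance between two feature matrices and then forms a ratio against a baseline-vs-baseline split; treat the normalization as one plausible reading of "relative", not the paper's exact definition, and the toy feature matrices as placeholders for real Inception features.

```python
# Sketch of FID between two sets of (Inception-style) features and a "relative" variant.
# The exact normalization used in the paper is not reproduced; this is one reading.
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_a, feats_b):
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    covmean = sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):          # numerical noise can give tiny imaginary parts
        covmean = covmean.real
    return float(np.sum((mu_a - mu_b) ** 2) + np.trace(cov_a + cov_b - 2 * covmean))

rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, size=(500, 64))   # in-distribution features (toy)
baseline2 = rng.normal(0.0, 1.0, size=(500, 64))  # second in-distribution split (toy)
candidate = rng.normal(0.5, 1.2, size=(500, 64))  # shifted, "OOD-like" features (toy)

relative_fid = fid(candidate, baseline) / fid(baseline2, baseline)
print("relative FID (candidate vs. baseline):", round(relative_fid, 2))
```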
15.
  • David, I., et al. (author)
  • Blended modeling in commercial and open-source model-driven software engineering tools: A systematic study
  • 2023
  • In: Software and Systems Modeling. - : Springer Science and Business Media LLC. - 1619-1366 .- 1619-1374. ; 22, s. 415-447
  • Journal article (peer-reviewed), abstract:
    • Blended modeling aims to improve the user experience of modeling activities by prioritizing the seamless interaction with models through multiple notations over the consistency of the models. Inconsistency tolerance, thus, becomes an important aspect in such settings. To understand the potential of current commercial and open-source modeling tools to support blended modeling, we have designed and carried out a systematic study. We identify challenges and opportunities in the tooling aspect of blended modeling. Specifically, we investigate the user-facing and implementation-related characteristics of existing modeling tools that already support multiple types of notations and map their support for other blended aspects, such as inconsistency tolerance and elevated user experience. For the sake of completeness, we have conducted a multivocal study, encompassing an academic literature review and a grey literature review. We have reviewed nearly 5000 academic papers and nearly 1500 entries of grey literature. We have identified 133 candidate tools, and eventually selected 26 of them to represent the current spectrum of modeling tools.
  •  
16.
  • Hujainah, Fadhl Mohammad Omar, 1987, et al. (author)
  • SRPTackle: A semi-automated requirements prioritisation technique for scalable requirements of software system projects
  • 2021
  • In: Information and Software Technology. - : Elsevier BV. - 0950-5849. ; 131
  • Journal article (peer-reviewed), abstract:
    • Context: Requirement prioritisation (RP) is often used to select the most important system requirements as perceived by system stakeholders. RP plays a vital role in ensuring the development of a quality system with defined constraints. However, a closer look at existing RP techniques reveals that these techniques suffer from some key challenges, such as scalability, lack of quantification, insufficient prioritisation of participating stakeholders, overreliance on the participation of professional expertise, lack of automation and excessive time consumption. These key challenges serve as the motivation for the present research. Objective: This study aims to propose a new semi-automated scalable prioritisation technique called ‘SRPTackle’ to address the key challenges. Method: SRPTackle provides a semi-automated process based on a combination of a constructed requirement priority value formulation function using a multi-criteria decision-making method (i.e. the weighted sum model), clustering algorithms (K-means and K-means++) and a binary search tree to minimise the need for expert involvement and increase efficiency. The effectiveness of SRPTackle is assessed by conducting seven experiments using a benchmark dataset from a large actual software project. Results: Experiment results reveal that SRPTackle can obtain 93.0% and 94.65% as minimum and maximum accuracy percentages, respectively. These values are better than those of alternative techniques. The findings also demonstrate the capability of SRPTackle to prioritise large-scale requirements with reduced time consumption and its effectiveness in addressing the key challenges in comparison with other techniques. Conclusion: With the time effectiveness, ability to scale well with numerous requirements, automation, and clear implementation guidelines of SRPTackle, project managers can perform RP for large-scale requirements in a proper manner, without necessitating an extensive amount of effort (e.g. tedious manual processes, need for the involvement of experts and time workload).
  •  
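Entry 16 combines a weighted sum model for requirement priority values with K-means clustering. A minimal sketch of that combination follows; the criteria, weights, and the three-cluster high/medium/low grouping are assumptions made for illustration and do not reproduce SRPTackle's full formulation (e.g., stakeholder weighting or the binary search tree).

```python
# Sketch: weighted-sum priority values followed by K-means grouping of requirements.
# Criteria, weights, and cluster interpretation are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

# Rows: requirements; columns: assumed criteria scores (e.g., value, urgency, risk) in [0, 10].
scores = np.array([
    [9, 8, 7],
    [2, 3, 1],
    [5, 6, 4],
    [8, 9, 9],
    [1, 2, 2],
    [6, 4, 5],
])
weights = np.array([0.5, 0.3, 0.2])          # assumed criterion weights, summing to 1

priority = scores @ weights                  # weighted sum model: one value per requirement

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(priority.reshape(-1, 1))
# Rank clusters by their centroid so labels read as high/medium/low priority.
order = np.argsort(kmeans.cluster_centers_.ravel())[::-1]
names = {cluster: label for cluster, label in zip(order, ["high", "medium", "low"])}

for i, (value, cluster) in enumerate(zip(priority, kmeans.labels_)):
    print(f"R{i + 1}: priority={value:.2f} -> {names[cluster]}")
```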
17.
  • Mahmood, Wardah, 1992, et al. (author)
  • Effects of variability in models: a family of experiments
  • 2022
  • In: Empirical Software Engineering. - : Springer Science and Business Media LLC. - 1382-3256 .- 1573-7616. ; 27:3
  • Journal article (peer-reviewed), abstract:
    • The ever-growing need for customization creates a need to maintain software systems in many different variants. To avoid having to maintain different copies of the same model, developers of modeling languages and tools have recently started to provide implementation techniques for such variant-rich systems, notably variability mechanisms, which support implementing the differences between model variants. Available mechanisms either follow the annotative or the compositional paradigm, each of which has dedicated benefits and drawbacks. Currently, language and tool designers select the used variability mechanism often solely based on intuition. A better empirical understanding of the comprehension of variability mechanisms would help them in improving support for effective modeling. In this article, we present an empirical assessment of annotative and compositional variability mechanisms for three popular types of models. We report and discuss findings from a family of three experiments with 164 participants in total, in which we studied the impact of different variability mechanisms during model comprehension tasks. We experimented with three model types commonly found in modeling languages: class diagrams, state machine diagrams, and activity diagrams. We find that, in two out of three experiments, the annotative technique led to better developer performance. Use of the compositional mechanism correlated with impaired performance. For all three considered tasks, the annotative mechanism was preferred over the compositional one in all experiments. We present actionable recommendations concerning support of flexible, task-specific solutions, and the transfer of established best practices from the code domain to models.
  •  
18.
  • Penzenstadler, Birgit, 1981, et al. (author)
  • Bots in Software Engineering
  • 2022
  • In: IEEE Software. - 1937-4194 .- 0740-7459. ; 39:5, s. 101-104
  • Research review (peer-reviewed)
  •  
19.
  • Strannegård, Claes, 1962, et al. (author)
  • Ecosystem Models Based on Artificial Intelligence
  • 2022
  • In: 34th Workshop of the Swedish Artificial Intelligence Society, SAIS 2022. - : IEEE.
  • Conference paper (peer-reviewed), abstract:
    • Ecosystem models can be used for understanding general phenomena of evolution, ecology, and ethology. They can also be used for analyzing and predicting the ecological consequences of human activities on specific ecosystems, e.g., the effects of agriculture, forestry, construction, hunting, and fishing. We argue that powerful ecosystem models need to include reasonable models of the physical environment and of animal behavior. We also argue that several well-known ecosystem models are unsatisfactory in this regard. Then we present the open-source ecosystem simulator Ecotwin, which is built on top of the game engine Unity. To model a specific ecosystem in Ecotwin, we first generate a 3D Unity model of the physical environment, based on topographic or bathymetric data. Then we insert digital 3D models of the organisms of interest into the environment model. Each organism is equipped with a genome and capable of sexual or asexual reproduction. An organism dies if it runs out of some vital resource or reaches its maximum age. The animal models are equipped with behavioral models that include sensors, actions, reward signals, and mechanisms of learning and decision-making. Finally, we illustrate how Ecotwin works by building and running one terrestrial and one marine ecosystem model.
  •  
20.
  • Berbyuk Lindström, Nataliya, 1978 (author)
  • It is No Blame Game! Challenges and Best Practices in Communicating Metrics in Software Development Organizations
  • 2023
  • In: ICIS 2023 (International Conference on Information Systems), International Research Workshop on IT Project Management. - Hyderabad, India.
  • Conference paper (peer-reviewed), abstract:
    • In the realm of software development, the significance of product and process metrics cannot be overstated. These metrics serve as pivotal tools, guiding the evaluation and enhancement of software quality and efficiency. However, despite their undeniable importance, establishing and executing measurement programs often proves to be an intricate endeavor, which often stems from the intricate interplay of human factors that pervade the software development landscape. In this paper, drawing upon data derived from in-depth interviews and interactive workshops involving developers, stakeholders, and managers from four software development organizations, we identify challenges and best practices in communication around metrics. Based on our findings, we provide communication guidelines, which can steer practitioners and stakeholders toward more adept and proficient practices in the communication of metrics.
  •  
21.
  • Suchan, Jakob, et al. (author)
  • Commonsense Visual Sensemaking for Autonomous Driving : On Generalised Neurosymbolic Online Abduction Integrating Vision and Semantics
  • 2021
  • In: Artificial Intelligence. - : Elsevier. - 0004-3702 .- 1872-7921. ; 299
  • Journal article (peer-reviewed), abstract:
    • We demonstrate the need and potential of systematically integrated vision and semantics solutions for visual sensemaking in the backdrop of autonomous driving. A general neurosymbolic method for online visual sensemaking using answer set programming (ASP) is systematically formalised and fully implemented. The method integrates state of the art in visual computing, and is developed as a modular framework that is generally usable within hybrid architectures for realtime perception and control. We evaluate and demonstrate with the community-established benchmarks KITTIMOD, MOT-2017, and MOT-2020. As use-case, we focus on the significance of human-centred visual sensemaking, e.g., involving semantic representation and explainability, question-answering, and commonsense interpolation, in safety-critical autonomous driving situations. The developed neurosymbolic framework is domain-independent, with the case of autonomous driving designed to serve as an exemplar for online visual sensemaking in diverse cognitive interaction settings in the backdrop of select human-centred AI technology design considerations.
  •  
22.
  • Barreiro, Anabela, et al. (author)
  • Multi3Generation : Multitask, Multilingual, Multimodal Language Generation
  • 2022
  • In: Proceedings of the 23rd Annual Conference of the European Association for Machine Translation. - : European Association for Machine Translation. ; , s. 345-346
  • Conference paper (peer-reviewed), abstract:
    • This paper presents the Multitask, Multilingual, Multimodal Language Generation COST Action – Multi3Generation (CA18231), an interdisciplinary network of research groups working on different aspects of language generation. This "meta-paper" will serve as a reference for citation of the Action in future publications. It presents the objectives, challenges, and the links to the achieved outcomes.
  •  
23.
  • Blanch, Krister, 1991 (author)
  • Beyond-application datasets and automated fair benchmarking
  • 2023
  • Licentiate thesis (other academic/artistic), abstract:
    • Beyond-application perception datasets are generalised datasets that emphasise the fundamental components of good machine perception data. When analysing the history of perception datasets, notable trends suggest that the design of a dataset typically aligns with an application goal. Instead of focusing on a specific application, beyond-application datasets instead look at capturing high-quality, high-volume data from a highly kinematic environment, for the purpose of aiding algorithm development and testing in general. Algorithm benchmarking is a cornerstone of autonomous systems development, and allows developers to demonstrate their results in a comparative manner. However, most benchmarking systems allow developers to use their own hardware or select favourable data. There is also little focus on run-time performance and consistency, with benchmarking systems instead showcasing algorithm accuracy. By combining both beyond-application dataset generation and methods for fair benchmarking, there is also the dilemma of how to provide the dataset to developers for this benchmarking, as the result of high-volume, high-quality dataset generation is a significant increase in dataset size when compared to traditional perception datasets. This thesis presents the first results of attempting the creation of such a dataset. The dataset was built using a maritime platform, selected due to the highly dynamic environment presented on water. The design and initial testing of this platform is detailed, as well as methods of sensor validation. Continuing, the thesis then presents a method of fair benchmarking, by utilising remote containerisation in a way that allows developers to present their software to the dataset, instead of having to first locally store a copy. To test this dataset and automatic online benchmarking, a number of reference algorithms were required for initial results. Three algorithms were built, using the data from three different sensors captured on the maritime platform. Each algorithm calculates vessel odometry, and the automatic benchmarking system was utilised to show the accuracy and run-time performance of these algorithms. It was found that the containerised approach alleviated data management concerns, prevented inflated accuracy results, and demonstrated precisely how computationally intensive each algorithm was.
  •  
24.
  • Palmquist, Adam, 1983, et al. (author)
  • AUTOMATON : A Gamification Machine Learning Project
  • 2023
  • In: Encyclopedia of Data Science and Machine Learning. - : IGI Global. - 9781799892205 - 1799892204 - 9781799892212 ; , s. 3090-3101
  • Book chapter (peer-reviewed), abstract:
    • This article presents a design ethnographic case study on an ongoing machine learning project at a Scandinavian gamification start-up company. From late 2020 until early 2021, the project produced a machine learning proof of concept, later implemented in the gamification start-up's application programming interface to offer smart gamification. The initial results show promise in using prediction models to automate cluster model selection, affording more functional, autonomous, and scalable user segments that are faster to implement. The findings provide opportunities for gamification (e.g., in learning analytics and health informatics). An identified challenge was performance; the neural networks required hyperparameter fine-tuning, which is time-consuming and limits scalability. Further investigations should consider the neural network fine-tuning process, but also attempt to verify the effectiveness of the cluster model selection compared with a control group.
  •  
25.
  • Ali, Muhaddisa Barat, 1986 (author)
  • Deep Learning Methods for Classification of Gliomas and Their Molecular Subtypes, From Central Learning to Federated Learning
  • 2023
  • Doctoral thesis (other academic/artistic), abstract:
    • Gliomas are the most common type of brain cancer in adults. Under the updated 2016 World Health Organization (WHO) classification of tumors of the central nervous system (CNS), identification of molecular subtypes of gliomas is important. For low grade gliomas (LGGs), prediction of molecular subtypes by observing magnetic resonance imaging (MRI) scans might be difficult without taking a biopsy. With the development of machine learning (ML) methods such as deep learning (DL), molecular-based classification methods have shown promising results from MRI scans that may assist clinicians in prognosis and in deciding on a treatment strategy. However, DL requires large amounts of training data with tumor class labels and tumor boundary annotations. Manual annotation of tumor boundaries is a time-consuming and expensive process. The thesis is based on the work developed in five papers on gliomas and their molecular subtypes. We propose novel methods that provide improved performance. The proposed methods consist of a multi-stream convolutional autoencoder (CAE)-based classifier, a deep convolutional generative adversarial network (DCGAN) to enlarge the training dataset, a CycleGAN to handle domain shift, a novel federated learning (FL) scheme to allow local client-based training with dataset protection, and employing bounding boxes on MRIs when tumor boundary annotations are not available. Experimental results showed that DCGAN-generated MRIs enlarged the original training dataset and improved the classification performance on test sets. CycleGAN showed good domain adaptation on multiple source datasets and improved the classification performance. The proposed FL scheme showed a slightly degraded performance compared to that of the central learning (CL) approach while protecting dataset privacy. Using tumor bounding boxes was shown to be an alternative approach to tumor boundary annotation for tumor classification and segmentation, with a trade-off between a slight decrease in performance and saving time in manual marking by clinicians. The proposed methods may benefit future research in bringing DL tools into clinical practice for assisting tumor diagnosis and helping the decision-making process.
  •  
26.
  • Lindgren, Helena, Professor, et al. (author)
  • The wasp-ed AI curriculum : A holistic curriculum for artificial intelligence
  • 2023
  • In: INTED2023 Proceedings. - : IATED. - 9788409490264 ; , s. 6496-6502
  • Conference paper (peer-reviewed), abstract:
    • Efforts in lifelong learning and competence development in Artificial Intelligence (AI) have been on the rise for several years. These initiatives have mostly been applied to Science, Technology, Engineering and Mathematics (STEM) disciplines. Even though there has been significant development in Digital Humanities to incorporate AI methods and tools in higher education, the potential for such competences in Arts, Humanities and Social Sciences is far from being realised. Furthermore, there is an increasing awareness that the STEM disciplines need to include competences relating to AI in humanity and society. This is especially important considering the widening and deepening impact of AI on society at large and on individuals. The aim of the presented work is to provide a broad and inclusive AI curriculum that covers the breadth of the topic as it is seen today, which is significantly different from only a decade ago. It is important to note that by the curriculum we mean an overview of the subject itself, rather than a particular education program. The curriculum is intended to be used as a foundation for educational activities in AI, for example to harmonize terminology, compare different programs, and identify educational gaps to be filled. An important aspect of the curriculum is the ethical, legal, and societal aspects of AI, and not limiting the curriculum to the STEM subjects but extending it to a holistic, human-centred AI perspective. The curriculum is developed as part of the national research program WASP-ED, the Wallenberg AI and transformative technologies education development program.
  •  
27.
  • Lv, Zhihan, Dr. 1984-, et al. (author)
  • 5G for mobile augmented reality
  • 2022
  • In: International Journal of Communication Systems. - : John Wiley & Sons. - 1074-5351 .- 1099-1131. ; 35:5
  • Journal article (other academic/artistic)
  •  
28.
  • Lv, Zhihan, Dr. 1984-, et al. (author)
  • Editorial : 5G for Augmented Reality
  • 2022
  • In: Mobile Networks and Applications. - : Springer. - 1383-469X .- 1572-8153.
  • Journal article (peer-reviewed)
  •  
29.
  • Singh, Avinash, 1986-, et al. (author)
  • Verbal explanations by collaborating robot teams
  • 2021
  • In: Paladyn - Journal of Behavioral Robotics. - : De Gruyter Open. - 2080-9778 .- 2081-4836. ; 12:1, s. 47-57
  • Journal article (peer-reviewed), abstract:
    • In this article, we present work on collaborating robot teams that use verbal explanations of their actions and intentions in order to be more understandable to the human. For this, we introduce a mechanism that determines what information the robots should verbalize in accordance with Grice’s maxim of quantity, i.e., convey as much information as is required and no more or less. Our setup is a robot team collaborating to achieve a common goal while explaining in natural language what they are currently doing and what they intend to do. The proposed approach is implemented on three Pepper robots moving objects on a table. It is evaluated by human subjects answering a range of questions about the robots’ explanations, which are generated using either our proposed approach or two further approaches implemented for evaluation purposes. Overall, we find that our proposed approach leads to the most understanding of what the robots are doing. In addition, we further propose a method for incorporating policies driving the distribution of tasks among the robots, which may further support understandability.
  •  
30.
  • Dodig-Crnkovic, Gordana, 1955 (author)
  • On the Foundations of Computing. Computing as the Fourth Great Domain of Science
  • 2023
  • In: Global Philosophy. - 2948-152X .- 2948-1538. ; 33:16
  • Journal article (peer-reviewed), abstract:
    • This review essay analyzes the book by Giuseppe Primiero, On the foundations of computing. Oxford: Oxford University Press (ISBN 978-0-19-883564-6/hbk; 978-0-19-883565-3/pbk). xix, 296 p. (2020). It gives a critical view from the perspective of physical computing as a foundation of computing and argues that the neglected pillar of material computation (Stepney) should be brought center stage and computing recognized as the fourth great domain of science (Denning).
  •  
31.
  • Ramadan, Q., et al. (author)
  • A semi-automated BPMN-based framework for detecting conflicts between security, data-minimization, and fairness requirements
  • 2020
  • In: Software and Systems Modeling. - : Springer Science and Business Media LLC. - 1619-1366 .- 1619-1374. ; 19, s. 1191-1227
  • Journal article (peer-reviewed), abstract:
    • Requirements are inherently prone to conflicts. Security, data-minimization, and fairness requirements are no exception. Importantly, undetected conflicts between such requirements can lead to severe effects, including privacy infringement and legal sanctions. Detecting conflicts between security, data-minimization, and fairness requirements is a challenging task, as such conflicts are context-specific and their detection requires a thorough understanding of the underlying business processes. For example, a process may require anonymous execution of a task that writes data into a secure data storage, where the identity of the writer is needed for the purpose of accountability. Moreover, conflicts arise not only from trade-offs between requirements elicited from the stakeholders, but also from misinterpretation of elicited requirements while implementing them in business processes, leading to a non-alignment between the data subjects' requirements and their specifications. Both types of conflicts are substantial challenges for conflict detection. To address these challenges, we propose a BPMN-based framework that supports: (i) the design of business processes considering security, data-minimization and fairness requirements, (ii) the encoding of such requirements as reusable, domain-specific patterns, (iii) the checking of alignment between the encoded requirements and annotated BPMN models based on these patterns, and (iv) the detection of conflicts between the specified requirements in the BPMN models based on a catalog of domain-independent anti-patterns. The security requirements were reused from SecBPMN2, a security-oriented BPMN 2.0 extension, while the fairness and data-minimization parts are new. For formulating our patterns and anti-patterns, we extended a graphical query language called SecBPMN2-Q. We report on the feasibility and the usability of our approach based on a case study featuring a healthcare management system, and an experimental user study.
  •  
32.
  • Scoccia, Gian Luca, et al. (author)
  • Hey, my data are mine! Active data to empower the user
  • 2020
  • In: Proceedings - International Conference on Software Engineering. - New York, NY, USA : ACM. - 0270-5257. ; , s. 5-8
  • Conference paper (peer-reviewed), abstract:
    • Privacy is gaining increasing importance in modern systems. As a matter of fact, personal data are out of the control of the original owner and remain in the hands of the software-systems producers. In this new ideas paper, we drastically change the nature of data from passive to active as a way to empower the user and preserve both the original ownership of the data and the privacy policies specified by the data owner. We demonstrate the idea of active data in the mobile domain.
  •  
33.
  • Stotsky, Alexander, 1960 (author)
  • Efficient Iterative Solvers in the Least Squares Method
  • 2020
  • In: Ifac Papersonline. - : Elsevier BV. - 2405-8963. ; 53:2, s. 883-888
  • Journal article (peer-reviewed), abstract:
    • Fast convergent, accurate, computationally efficient, parallelizable, and robust matrix inversion and parameter estimation algorithms are required in many time-critical and accuracy-critical applications such as system identification, signal and image processing, network and big data analysis, machine learning and many others. This paper introduces a new composite power series expansion with optionally chosen rates (which can be calculated simultaneously on parallel units with different computational capacities) for further convergence rate improvement of high-order Newton-Schulz iteration. The new expansion was integrated into the Richardson iteration and resulted in significant convergence rate improvement. The improvement is quantified via explicit transient models for estimation errors and by simulations. In addition, a recursive and computationally efficient version of the combination of Richardson iteration and Newton-Schulz iteration with composite expansion is developed for simultaneous calculations. Moreover, a unified factorization is developed in this paper in the form of a tool-kit for power series expansion, which results in a new family of computationally efficient Newton-Schulz algorithms.
  •  
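Entry 33 builds on the Newton-Schulz iteration for matrix inversion. The sketch below shows the classical second-order Newton-Schulz update X_{k+1} = X_k(2I - A X_k) with a standard scaling of the initial guess; the paper's composite power series expansion and its combination with the Richardson iteration are not reproduced here.

```python
# Classical second-order Newton-Schulz iteration for approximating A^{-1}.
# The composite higher-order expansions from the paper are not reproduced.
import numpy as np

def newton_schulz_inverse(A, iterations=30):
    n = A.shape[0]
    # Standard scaled initialization X0 = A^T / (||A||_1 * ||A||_inf).
    X = A.T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))
    I = np.eye(n)
    for _ in range(iterations):
        X = X @ (2 * I - A @ X)   # quadratically convergent update
    return X

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 5)) + 5 * np.eye(5)   # toy, well-conditioned matrix
X = newton_schulz_inverse(A)
print("max abs error vs. numpy inverse:", np.abs(X - np.linalg.inv(A)).max())
```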
34.
  • Liu, Yuqi, et al. (author)
  • Integration of Multi-scale Spatial Digital Twins in Metaverse Based on Multi-dimensional Hash Geocoding
  • 2024
  • In: IMX '24. - : Association for Computing Machinery (ACM). - 9798400705038 ; , s. 56-63
  • Conference paper (peer-reviewed), abstract:
    • With the popularization of the metaverse, virtual reality mapping technology based on digital twins has generated a large amount of spatial data. These data are multidimensional, multi-scale, mobile, and distributed. In order to fully utilize these data, we propose a non-mutation multidimensional hash geocoding scheme that can organize and store data with geographic features and achieve data mapping at different scales, from macro to micro; the mapping between scales enables joint utilization of data at various scales. On this basis, we propose a block network secure storage mapping model for spatial digital twins, which can securely and reliably organize and map spatial data. This article also looks forward to the digital twins of different dimensions and scales that may emerge in the future metaverse, and proposes an adaptive 3D reconstruction method to adapt to digital twin models of different scales in the metaverse. On the basis of our work, we will further promote the development of the spatial digital twin metaverse.
  •  
35.
  • Alshareef, Hanaa, 1985, et al. (author)
  • Transforming data flow diagrams for privacy compliance
  • 2021
  • In: MODELSWARD 2021 - Proceedings of the 9th International Conference on Model-Driven Engineering and Software Development. - : SCITEPRESS - Science and Technology Publications. ; , s. 207-215
  • Conference paper (peer-reviewed), abstract:
    • Most software design tools, as for instance Data Flow Diagrams (DFDs), are focused on functional aspects and thus cannot model non-functional aspects like privacy. In this paper, we provide an explicit algorithm and a proof-of-concept implementation to transform DFDs into so-called Privacy-Aware Data Flow Diagrams (PA-DFDs). Our tool systematically inserts privacy checks into a DFD, generating a PA-DFD. We apply our approach to two realistic applications from the construction and online retail sectors.
  •  
36.
  • Bender, Benedikt, et al. (author)
  • Patterns in the Press Releases of Trade Unions: How to Use Structural Topic Models in the Field of Industrial Relations
  • 2022
  • In: Industrielle Beziehungen. - : Verlag Barbara Budrich GmbH. - 0943-2779 .- 1862-0035. ; 29:2, s. 91-116
  • Journal article (peer-reviewed), abstract:
    • Quantitative text analysis and the use of large data sets have received only limited attention in the field of Industrial Relations. This is unfortunate, given the variety of opportunities and possibilities these methods can address. We demonstrate the use of one promising technique of quantitative text analysis – the Structural Topic Model (STM) – to test the Insider-Outsider theory. This technique allowed us to find underlying topics in a text corpus of nearly 2,000 German trade union press releases (from 2000 to 2014). We provide a step-by-step overview of how to use STM, since we see this method as useful to the future of research in the field of Industrial Relations. Until now, the methodological publications regarding STM mostly focus on the mathematics of the method and provide only a minimal discussion of their implementation. Instead, we provide a practical application of STM and apply this method to one of the most prominent theories in the field of Industrial Relations. Contrary to the original Insider-Outsider arguments, but in line with the current state of research, we show that unions do in fact use topics within their press releases which are relevant for both Insider and Outsider groups.
  •  
37.
  • de Dios, Eddie, et al. (author)
  • Introduction to Deep Learning in Clinical Neuroscience
  • 2022
  • In: Acta Neurochirurgica, Supplement. - Cham : Springer International Publishing. - 2197-8395 .- 0065-1419. ; 134, s. 79-89
  • Book chapter (other academic/artistic), abstract:
    • The use of deep learning (DL) is rapidly increasing in clinical neuroscience. The term denotes models with multiple sequential layers of learning algorithms, architecturally similar to neural networks of the brain. We provide examples of DL in analyzing MRI data and discuss potential applications and methodological caveats. Important aspects are data pre-processing, volumetric segmentation, and specific task-performing DL methods, such as CNNs and AEs. Additionally, GAN-expansion and domain mapping are useful DL techniques for generating artificial data and combining several smaller datasets. We present results of DL-based segmentation and accuracy in predicting glioma subtypes based on MRI features. Dice scores range from 0.77 to 0.89. In mixed glioma cohorts, IDH mutation can be predicted with a sensitivity of 0.98 and specificity of 0.97. Results in test cohorts have shown improvements of 5–7% in accuracy, following GAN-expansion of data and domain mapping of smaller datasets. The provided DL examples are promising, although not yet in clinical practice. DL has demonstrated usefulness in data augmentation and for overcoming data variability. DL methods should be further studied, developed, and validated for broader clinical use. Ultimately, DL models can serve as effective decision support systems, and are especially well-suited for time-consuming, detail-focused, and data-ample tasks.
  •  
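    The Dice scores of 0.77 to 0.89 quoted in entry 37 refer to the standard Dice similarity coefficient between a predicted and a reference segmentation mask. A minimal NumPy sketch of that metric follows; the smoothing term eps is an assumption added to avoid division by zero on empty masks.

      # Minimal Dice similarity coefficient for binary segmentation masks.
      import numpy as np

      def dice(pred, target, eps=1e-7):
          """Dice = 2 * |A intersect B| / (|A| + |B|) for boolean/0-1 masks."""
          pred = np.asarray(pred, dtype=bool)
          target = np.asarray(target, dtype=bool)
          intersection = np.logical_and(pred, target).sum()
          return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

      # Toy example: two overlapping 2D masks.
      a = np.zeros((4, 4), dtype=bool); a[1:3, 1:3] = True
      b = np.zeros((4, 4), dtype=bool); b[1:3, 2:4] = True
      print(round(dice(a, b), 2))   # 0.5 for this toy overlap
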
38.
  • Dobslaw, Felix, 1983, et al. (author)
  • Boundary Value Exploration for Software Analysis
  • 2020
  • In: Proceedings - 2020 IEEE 13th International Conference on Software Testing, Verification and Validation Workshops, ICSTW 2020. - : IEEE. ; , s. 346-353
  • Conference paper (peer-reviewed)abstract
    • For software to be reliable and resilient, it is widely accepted that tests must be created and maintained alongside the software itself. One safeguard from vulnerabilities and failures in code is to ensure correct behavior on the boundaries between subdomains of the input space. So-called boundary value analysis (BVA) and boundary value testing (BVT) techniques aim to exercise those boundaries and increase test effectiveness. However, the concepts of BVA and BVT themselves are not generally well defined, and it is not clear how to identify relevant sub-domains, and thus the boundaries delineating them, given a specification. This has limited adoption and hindered automation. We clarify BVA and BVT and introduce Boundary Value Exploration (BVE) to describe techniques that support them by helping to detect and identify boundary inputs. Additionally, we propose two concrete BVE techniques based on information-theoretic distance functions: (i) an algorithm for boundary detection and (ii) the usage of software visualization to explore the behavior of the software under test and identify its boundary behavior. As an initial evaluation, we apply these techniques on a much used and well-tested date handling library. Our results reveal questionable behavior at boundaries highlighted by our techniques. In conclusion, we argue that the boundary value exploration that our techniques enable is a step towards automated boundary value analysis and testing, which can foster their wider use and improve test effectiveness and efficiency.
  •  
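    Entry 38 detects boundary candidates by measuring how strongly the program's output changes between neighbouring inputs. The sketch below illustrates that general idea with a simple output distance (one minus difflib's similarity ratio) over a scan of neighbouring dates; the distance function, the toy program under test, and the threshold are stand-in assumptions, not the information-theoretic distances from the paper.

      # Sketch of boundary value exploration: scan neighbouring inputs, measure how
      # different the program's outputs are, and flag large jumps as boundary
      # candidates.
      import difflib
      from datetime import date, timedelta

      def program_under_test(d: date) -> str:
          """Toy stand-in for a date-handling function."""
          return f"{d.isoformat()} week={d.isocalendar()[1]} leap={d.year % 4 == 0}"

      def output_distance(a: str, b: str) -> float:
          return 1.0 - difflib.SequenceMatcher(None, a, b).ratio()

      start = date(2019, 12, 25)
      inputs = [start + timedelta(days=i) for i in range(14)]
      outputs = [program_under_test(d) for d in inputs]

      for prev, cur, o1, o2 in zip(inputs, inputs[1:], outputs, outputs[1:]):
          dist = output_distance(o1, o2)
          if dist > 0.15:   # illustrative threshold
              print(f"boundary candidate between {prev} and {cur} (distance {dist:.2f})")
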
39.
  • Fredriksson, Teodor, 1992, et al. (author)
  • Machine Learning Algorithms for Labeling: Where and How They are Used?
  • 2022
  • In: SysCon 2022 - 16th Annual IEEE International Systems Conference, Proceedings.
  • Conference paper (peer-reviewed)abstract
    • With the increased availability of new and better central processing units (CPUs) as well as graphical processing units (GPUs), the interest in statistical learning and deep learning algorithms for classification tasks has grown exponentially. These classification algorithms often require the presence of fully labeled instances during the training period for maximum classification accuracy. However, in industrial applications, data is commonly not fully labeled, which both reduces the prediction accuracy of the learning algorithms and increases the project cost to label the missing instances. The purpose of this paper is to survey the current state-of-the-art literature on machine learning algorithms that are used for assisted or automatic labeling and to understand where these are used. We performed a systematic mapping study and identified 52 primary studies relevant to our research. This paper provides three main contributions. First, we identify the existing machine learning algorithms for labeling and we present a taxonomy of these algorithms. Second, we identify the datasets that are used to evaluate the algorithms and we provide a mapping of the datasets based on the type of data and the application area. Third, we provide a process to support people in industry to optimally label their dataset. The results presented in this paper can be used by both researchers and practitioners aiming to improve the missing labels with the aid of machine learning algorithms or to select appropriate datasets for comparing new state-of-the-art algorithms in their respective application area.
  •  
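    One family of assisted-labeling algorithms covered by surveys such as entry 39 is self-training: a model fitted on the few labeled instances proposes labels for unlabeled ones, and only confident predictions are kept before retraining. The sketch below shows that generic loop on synthetic data; the data, confidence threshold, and number of rounds are illustrative assumptions rather than a specific algorithm from the survey.

      # Minimal self-training sketch: label unlabeled instances with a model's
      # confident predictions, then retrain. Data and threshold are toy assumptions.
      import numpy as np
      from sklearn.linear_model import LogisticRegression

      rng = np.random.default_rng(0)
      X_labeled = rng.normal(size=(20, 2)) + np.repeat([[0, 0], [3, 3]], 10, axis=0)
      y_labeled = np.repeat([0, 1], 10)
      X_unlabeled = rng.normal(size=(200, 2)) + rng.choice([[0, 0], [3, 3]], size=200)

      model = LogisticRegression().fit(X_labeled, y_labeled)
      for _ in range(5):                                  # a few self-training rounds
          proba = model.predict_proba(X_unlabeled)
          confident = proba.max(axis=1) > 0.95            # keep only confident labels
          if not confident.any():
              break
          X_labeled = np.vstack([X_labeled, X_unlabeled[confident]])
          y_labeled = np.concatenate([y_labeled, proba[confident].argmax(axis=1)])
          X_unlabeled = X_unlabeled[~confident]
          model = LogisticRegression().fit(X_labeled, y_labeled)

      print(f"{len(y_labeled) - 20} instances labeled automatically")
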
40.
  • Furia, Carlo A, 1979, et al. (author)
  • Bayesian Data Analysis in Empirical Software Engineering Research
  • 2021
  • In: IEEE Transactions on Software Engineering. - 0098-5589 .- 1939-3520. ; 47:9, s. 1786-1810
  • Journal article (peer-reviewed)abstract
    • Statistics comes in two main flavors: frequentist and Bayesian. For historical and technical reasons, frequentist statistics have traditionally dominated empirical data analysis, and certainly remain prevalent in empirical software engineering. This situation is unfortunate because frequentist statistics suffer from a number of shortcomings---such as lack of flexibility and results that are unintuitive and hard to interpret---that curtail their effectiveness when dealing with the heterogeneous data that is increasingly available for empirical analysis of software engineering practice. In this paper, we pinpoint these shortcomings, and present Bayesian data analysis techniques that provide tangible benefits---as they can provide clearer results that are simultaneously robust and nuanced. After a short, high-level introduction to the basic tools of Bayesian statistics, we present the reanalysis of two empirical studies on the effectiveness of automatically generated tests and the performance of programming languages, respectively. By contrasting the original frequentist analyses with our new Bayesian analyses, we demonstrate the concrete advantages of the latter. To conclude we advocate a more prominent role for Bayesian statistical techniques in empirical software engineering research and practice.
  •  
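    The Bayesian workflow advocated in entry 40 can be illustrated, in heavily simplified form, with a conjugate Beta-Binomial model: a prior over a pass rate is updated with observed outcomes, and the full posterior (credible interval, tail probabilities) is reported instead of a point estimate and p-value. The counts below are made up for illustration; the paper's actual reanalyses use richer models.

      # Tiny Bayesian updating sketch (Beta-Binomial conjugate model):
      # posterior over a "test passes" probability given observed outcomes.
      from scipy import stats

      prior_a, prior_b = 1, 1          # uniform Beta(1, 1) prior over the pass rate
      passes, failures = 42, 8         # made-up observed outcomes

      post = stats.beta(prior_a + passes, prior_b + failures)   # conjugate posterior

      print(f"posterior mean pass rate: {post.mean():.3f}")
      print(f"95% credible interval: {post.ppf(0.025):.3f} - {post.ppf(0.975):.3f}")
      print(f"P(pass rate > 0.80): {1 - post.cdf(0.80):.3f}")
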
41.
  • Hagström, Lovisa, 1995 (author)
  • A Picture is Worth a Thousand Words: Natural Language Processing in Context
  • 2023
  • Licentiate thesis (other academic/artistic)abstract
    • Modern NLP models learn language from lexical co-occurrences. While this method has allowed for significant breakthroughs, it has also exposed potential limitations of modern NLP methods. For example, NLP models are prone to hallucinate, represent a biased world view and may learn spurious correlations to solve the data instead of the task at hand. This is to some extent the consequence of training the models exclusively on text. In text, concepts are only defined by the words that accompany them and the information in text is incomplete due to reporting bias. In this work, we investigate whether additional context in the form of multimodal information can be used to improve on the representations of modern NLP models. Specifically, we consider BERT-based vision-and-language models that receive additional context from images. We hypothesize that visual training primarily should improve on the visual commonsense knowledge, i.e. obvious knowledge about visual properties, of the models. To probe for this knowledge we develop the evaluation tasks Memory Colors and Visual Property Norms. Generally, we find that the vision-and-language models considered do not outperform unimodal model counterparts. In addition to this, we find that the models switch their answer depending on prompt when evaluated for the same type of knowledge. We conclude that more work is needed on understanding and developing vision-and-language models, and that extra focus should be put on how to successfully fuse image and language processing. We also reconsider the usefulness of measuring commonsense knowledge in models that cannot represent factual knowledge.
  •  
42.
  • Hagström, Lovisa, 1995, et al. (author)
  • What do Models Learn From Training on More Than Text? Measuring Visual Commonsense Knowledge
  • 2022
  • In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, pp. 252–261, Dublin, Ireland. - : Association for Computational Linguistics. - 9781955917230
  • Conference paper (peer-reviewed)abstract
    • There are limitations in learning language from text alone. Therefore, recent focus has been on developing multimodal models. However, few benchmarks exist that can measure what language models learn about language from multimodal training. We hypothesize that training on a visual modality should improve on the visual commonsense knowledge in language models. Therefore, we introduce two evaluation tasks for measuring visual commonsense knowledge in language models (code publicly available at: github.com/lovhag/measure-visual-commonsense-knowledge) and use them to evaluate different multimodal models and unimodal baselines. Primarily, we find that the visual commonsense knowledge is not significantly different between the multimodal models and unimodal baseline models trained on visual text data.
  •  
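    The memory-colors style of probe referred to in entries 41 and 42 (querying a model for the typical colour of a well-known object) can be approximated with a cloze-style fill-mask query against a masked language model. The snippet below is a sketch using the Hugging Face transformers fill-mask pipeline with a generic BERT checkpoint and illustrative prompts; it is not the authors' evaluation code or models.

      # Cloze-style probe for "memory colors": ask a masked LM for the typical
      # color of well-known objects. Generic BERT checkpoint, illustrative prompts.
      from transformers import pipeline

      fill = pipeline("fill-mask", model="bert-base-uncased")

      for obj in ["banana", "snow", "grass"]:
          preds = fill(f"The color of {obj} is [MASK].", top_k=3)
          print(obj, [(p["token_str"], round(p["score"], 3)) for p in preds])
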
43.
  • Hebig, Regina, 1984, et al. (author)
  • How do students experience and judge software comprehension techniques?
  • 2020
  • In: IEEE International Conference on Program Comprehension. - New York, NY, USA : ACM. ; , s. 425-435
  • Conference paper (peer-reviewed)abstract
    • Today, there is a wide range of techniques to support software comprehension. However, we do not yet fully understand what techniques really help novices to comprehend a software system. In this paper, we present a master-level project course on software evolution, which has a large focus on software comprehension. We collected data about students' experience with diverse comprehension techniques during focus group discussions over the course of two years. Our results indicate that systematic code reading can be supported by additional techniques that guide reading efforts. Most techniques are considered valuable for gaining an overview, and some techniques are judged to be helpful only in later stages of software comprehension efforts.
  •  
44.
  • Holtmann, Jörg, 1979, et al. (author)
  • Exploiting Meta-Model Structures in the Generation of Xtext Editors
  • 2023
  • In: Proceedings of the 11th International Conference on Model-Based Software and Systems Engineering. - Lisbon, Portugal : SCITEPRESS - Science and Technology Publications. - 9789897586330 ; 1, s. 218-225
  • Conference paper (peer-reviewed)abstract
    • When generating textual editors for large and highly structured meta-models, it is possible to extend Xtext’s generator capabilities and the default implementations it provides. These extensions provide additional features such as formatters and more precise scoping for cross-references. However, for large metamodels in particular, the realization of such extensions typically is a time-consuming, awkward, and repetitive task. For some of these tasks, we motivate, present, and discuss in this position paper automatic solutions that exploit the structure of the underlying metamodel. Furthermore, we demonstrate how we used them in the development of a textual editor for EATXT, a textual concrete syntax for the automotive architecture description language EAST-ADL. This work in progress contributes to our larger goal of building a language workbench for blended modelling.
  •  
45.
  • Lange, Herbert, 1987, et al. (author)
  • Learning Domain-Specific Grammars from a Small Number of Examples
  • 2021
  • In: Studies in Computational Intelligence. - Cham : Springer International Publishing. - 1860-9503 .- 1860-949X. - 9783030637873 ; 939, s. 105-138
  • Conference paper (peer-reviewed)abstract
    • In this chapter we investigate the problem of grammar learning from a perspective that diverges from previous approaches. These prevailing approaches to learning grammars usually attempt to infer a grammar directly from example corpora without any additional information. This either requires a large training set or suffers from bad accuracy. We instead view learning grammars as a problem of grammar restriction or subgrammar extraction. We start from a large-scale grammar (called a resource grammar) and a small number of example sentences, and find a subgrammar that still covers all the examples. To accomplish this, we formulate the problem as a constraint satisfaction problem, and use a constraint solver to find the optimal grammar. We created experiments with English, Finnish, German, Swedish, and Spanish, which show that 10–20 examples are often sufficient to learn an interesting grammar for a specific application. We also present two extensions to this basic method: we include negative examples and allow rules to be merged. The resulting grammars can more precisely cover specific linguistic phenomena. Our method, together with the extensions, can be used to provide a grammar learning system for specific applications. This system is easy-to-use, human-centric, and can be used by non-syntacticians. Based on this grammar learning method, we can build applications for computer-assisted language learning and interlingual communication, which rely heavily on the knowledge of language and domain experts who often lack the competence to develop required grammars themselves.
  •  
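    Entry 45 casts grammar learning as extracting a minimal subgrammar, i.e. the smallest subset of resource-grammar rules that still covers every example sentence, and solves it with a constraint solver. The sketch below only illustrates the covering flavour of that formulation with a brute-force search over hypothetical rules and parses; it is not the paper's GF-based method or its constraint encoding.

      # Toy illustration of subgrammar extraction: each example sentence has one or
      # more candidate parses (each a set of rules); find the smallest rule set that
      # contains a full parse for every example. Brute force, purely illustrative.
      from itertools import combinations

      all_rules = ["S_NP_VP", "NP_Det_N", "NP_Pron", "VP_V", "VP_V_NP", "NP_Adj_N"]

      # Hypothetical parses: example sentence -> list of alternative rule sets.
      parses = {
          "the dog sleeps":       [{"S_NP_VP", "NP_Det_N", "VP_V"}],
          "she sleeps":           [{"S_NP_VP", "NP_Pron", "VP_V"}],
          "the cat sees the dog": [{"S_NP_VP", "NP_Det_N", "VP_V_NP"}],
      }

      def covers(rule_set):
          return all(any(p <= rule_set for p in alternatives)
                     for alternatives in parses.values())

      def smallest_subgrammar():
          for size in range(1, len(all_rules) + 1):
              for subset in combinations(all_rules, size):
                  if covers(set(subset)):
                      return subset
          return None

      print(smallest_subgrammar())
      # -> a 5-rule subgrammar; the unused NP_Adj_N rule is excluded
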
46.
  • Layegh, Amirhossein, et al. (author)
  • DEF-PIPE : Domain Specific Language Visualization for Big Data Pipelines
  • 2023
  • In: Proceedings - 2023 International Conference on Computational Science and Computational Intelligence, CSCI 2023. - : Institute of Electrical and Electronics Engineers (IEEE). ; , s. 1534-1540
  • Conference paper (peer-reviewed)abstract
    • The complexity of Big Data analysis requires a combination of different software components into a pipeline performing different analysis steps. Supporting such pipelines requires different expertise provided by different actors: domain experts and technical/computing experts. The main objective of this work is to support domain experts with a tool for pipeline description, which does not require deep technical knowledge about the deployment and execution of Big Data pipelines. We present a solution to visualize Big Data pipeline description using the DEF-PIPE tool. The solution shows that the process of pipeline description is simple and intuitive for users who are not experts in computing. At the same time, DEF-PIPE automatically generates a textual description of the designed data pipelines, which contains the necessary information for simulation, adaptation, deployment, and resource management of the pipeline. In this case, a separation of concerns between the design and run-time phases of the Big Data pipeline lifecycle is supported. This solution allows us to bridge the gap between domain and technical experts. Providing libraries of steps and pipelines also allows the reusing of previously developed solutions.
  •  
47.
  • Lidberg, Simon, MSc. 1986-, et al. (author)
  • A Knowledge Extraction Platform for Reproducible Decision-Support from Multi-Objective Optimization Data
  • 2022
  • In: SPS2022. - Amsterdam; Berlin; Washington, DC : IOS Press. - 9781643682686 - 9781643682693 ; 21, s. 725-736
  • Conference paper (peer-reviewed)abstract
    • Simulation and optimization enable companies to make decisions based on data, and allow prescriptive analysis of current and future production scenarios, creating a competitive edge. However, it can be difficult to visualize and extract knowledge from the large amounts of data generated by a many-objective optimization genetic algorithm, especially with conflicting objectives. Existing tools offer capabilities for extracting knowledge in the form of clusters, rules, and connections. Although powerful, most existing software is proprietary and therefore difficult to obtain, modify, and deploy, which also hinders a reproducible workflow. We propose an open-source web-based application using commonly available packages in the R programming language to extract knowledge from data generated from simulation-based optimization. This application is then verified by replicating the experimental methodology of a peer-reviewed paper on knowledge extraction. Finally, further work is discussed, focusing on method improvements and reproducible results.
  •  
48.
  • Lindgren, Erik, 1980, et al. (author)
  • Analysis of industrial X-ray computed tomography data with deep neural networks
  • 2021
  • In: Proceedings of SPIE - The International Society for Optical Engineering. - : SPIE. - 0277-786X .- 1996-756X. ; 11840
  • Conference paper (peer-reviewed)abstract
    • X-ray computed tomography (XCT) is increasingly utilized industrially in material and process development as well as in non-destructive quality control; XCT is important to many emerging manufacturing technologies, for example metal additive manufacturing. These trends lead to an increased need for safe automatic or semi-automatic data interpretation, considered an open research question for many critical high-value industrial products such as within the aerospace industry. By safe, we mean that the interpretation is not allowed to unawarely or unexpectedly fail; specifically the algorithms must react sensibly to inputs dissimilar to the training data, so-called out-of-distribution (OOD) inputs. In this work we explore data interpretation with deep neural networks to address: robust safe data interpretation which includes a confidence estimate with respect to OOD data, an OOD detector; generation of realistic synthetic material flaw indications for the materials science and non-destructive evaluation community. We have focused on industrial XCT related challenges, addressing difficulties with spatially correlated X-ray quantum noise. Results are reported on training auto-encoders (AE) and generative adversarial networks (GAN) on a publicly available XCT dataset of additively manufactured metal. We demonstrate that adding modeled X-ray noise during training reduces artefacts in the generated imperfection indications as well as improves the OOD detector performance. In addition, we show that the OOD detector can detect real and synthetic OOD data and still model the accepted in-distribution data down to the X-ray noise levels.
  •  
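    The OOD-detection idea in entry 48 (train an autoencoder on accepted data, with modeled noise added during training, and flag inputs whose reconstruction error is unusually high) can be sketched with a generic PyTorch denoising autoencoder on random stand-in data. The architecture, noise level, and threshold below are assumptions, not the authors' networks or the XCT dataset.

      # Sketch: denoising autoencoder as an out-of-distribution (OOD) detector.
      # Train on "accepted" data with added noise; score inputs by reconstruction
      # error and flag high-error inputs as OOD. Random stand-in data, not XCT.
      import torch
      from torch import nn

      torch.manual_seed(0)
      in_dist = torch.rand(512, 64)          # stand-in for accepted data patches
      ood = torch.rand(64, 64) * 3.0         # stand-in for out-of-distribution data

      model = nn.Sequential(                 # tiny fully-connected autoencoder
          nn.Linear(64, 16), nn.ReLU(), nn.Linear(16, 64)
      )
      opt = torch.optim.Adam(model.parameters(), lr=1e-3)
      loss_fn = nn.MSELoss()

      for epoch in range(200):
          noisy = in_dist + 0.1 * torch.randn_like(in_dist)  # modeled noise in training
          opt.zero_grad()
          loss = loss_fn(model(noisy), in_dist)
          loss.backward()
          opt.step()

      with torch.no_grad():
          def score(x):                      # per-sample reconstruction error
              return ((model(x) - x) ** 2).mean(dim=1)
          threshold = score(in_dist).quantile(0.99)  # accept 99% of in-distribution data
          print("flagged as OOD:", int((score(ood) > threshold).sum()), "of", len(ood))
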
49.
  • Mohamad, Mazen, 1984, et al. (author)
  • Asset-driven Security Assurance Cases with Built-in Quality Assurance
  • 2021
  • In: 2021 IEEE/ACM 2nd International Workshop on Engineering and Cybersecurity of Critical Systems (EnCyCriS 2021). - 9781665445535 ; , s. 29-36
  • Conference paper (peer-reviewed)abstract
    • Security Assurance Cases (SAC) are structured arguments and evidence bodies used to reason about the security of a certain system. SACs are gaining focus in the automotive domain as the needs for security assurance are growing. In this study, we present an approach for creating SAC. The approach is inspired by the upcoming security standards ISO/SAE-21434 as well as the internal needs of automotive Original Equipment Manufacturers (OEMs). We created the approach by extracting relevant requirements from ISO/SAE-21434 and illustrated it using an example case of the headlamp items provided in the standard. We found that the approach is applicable and helps to satisfy the requirements for security assurance in the standard as well as the internal compliance needs in an automotive OEM.
  •  
50.
  • Munappy, Aiswarya Raj, 1990 (author)
  • Data management and Data Pipelines: An empirical investigation in the embedded systems domain
  • 2021
  • Licentiate thesis (other academic/artistic)abstract
    • Context: Companies are increasingly collecting data from all possible sources to extract insights that help in data-driven decision-making. Increased data volume, variety, and velocity and the impact of poor-quality data on the development of data products are leading companies to look for an improved data management approach that can accelerate the development of high-quality data products. Further, AI is being applied in a growing number of fields, and thus it is evolving as a horizontal technology. Consequently, AI components are increasingly being integrated into embedded systems along with electronics and software. We refer to these systems as AI-enhanced embedded systems. Given the strong dependence of AI on data, this expansion also creates a new space for applying data management techniques. Objective: The overall goal of this thesis is to empirically identify the data management challenges encountered during the development and maintenance of AI-enhanced embedded systems, propose an improved data management approach and empirically validate the proposed approach. Method: To achieve the goal, we conducted this research in close collaboration with Software Center companies using a combination of different empirical research methods: case studies, literature reviews, and action research. Results and conclusions: This research provides five main results. First, it identifies key data management challenges specific to Deep Learning models developed at embedded system companies. Second, it examines practices such as DataOps and data pipelines that help to address data management challenges. We observed that DataOps is the data management practice that best improves data quality and reduces the time to develop data products. The data pipeline is the critical component of DataOps that manages the data life cycle activities. The study also provides the potential faults at each step of the data pipeline and the corresponding mitigation strategies. Finally, the data pipeline model is realized in a small pipeline implementation, and the percentage of saved data dumps is calculated through that implementation. Future work: As future work, we plan to realize the conceptual data pipeline model so that companies can build customized robust data pipelines. We also plan to analyze the impact and value of data pipelines in cross-domain AI systems and data applications. We also plan to develop an AI-based fault detection and mitigation system suitable for data pipelines.
  •  
Type of publication
conference paper (3313)
journal article (2615)
doctoral thesis (280)
book chapter (199)
licentiate thesis (132)
research review (115)
editorial proceedings (72)
reports (39)
other publication (36)
editorial collection (17)
book (15)
artistic work (12)
patent (4)
review (2)
Type of content
peer-reviewed (6038)
other academic/artistic (786)
pop. science, debate, etc. (16)
Author/Editor
Vyatkin, Valeriy (83)
Torra, Vicenç (79)
Liwicki, Marcus (68)
Andersson, Karl, 197 ... (64)
Taheri, Javid (47)
Bosch, Jan, 1967 (45)
Markidis, Stefano (44)
Dignum, Frank (43)
Calvanese, Diego (42)
Gionis, Aristides (41)
Kampik, Timotheus, 1 ... (41)
Lv, Zhihan, Dr. 1984 ... (39)
Olsson, Helena Holms ... (37)
Elmroth, Erik (36)
Kerren, Andreas, Dr. ... (36)
Hossain, Mohammad Sh ... (35)
Lambrix, Patrick, Pr ... (32)
Boeva, Veselka, Prof ... (30)
Kassler, Andreas, 19 ... (30)
Nowaczyk, Sławomir, ... (30)
Carlsson, Niklas, 19 ... (29)
Johansson, Karl H., ... (29)
Papapetrou, Panagiot ... (28)
Girdzijauskas, Sarun ... (27)
Hossain, Mohammad Sh ... (27)
Feldt, Robert, 1972 (27)
Fischer-Hübner, Simo ... (27)
Dignum, Virginia, Pr ... (27)
Osipov, Evgeny (26)
Vlassov, Vladimir, 1 ... (26)
Fabian, Martin, 1960 (26)
Bhuyan, Monowar H. (26)
Boström, Henrik (25)
Magnússon, Sindri, 1 ... (25)
Främling, Kary, 1965 ... (25)
Wang, Yi (24)
Schneider, Gerardo, ... (24)
Weyns, Danny (23)
Kragic, Danica, 1971 ... (23)
Vinuesa, Ricardo (23)
Podobas, Artur (23)
Johansson, Ulf (23)
Sahoo, Kshira Sagar (23)
Monperrus, Martin (23)
Flammini, Francesco, ... (22)
Johansson, Thomas (22)
Kebande, Victor R. (22)
Nieves, Juan Carlos, ... (22)
Heintz, Fredrik, 197 ... (22)
Skoglund, Mikael, 19 ... (22)
University
Royal Institute of Technology (1320)
Chalmers University of Technology (1130)
Umeå University (710)
Uppsala University (606)
Luleå University of Technology (536)
Linköping University (523)
University of Gothenburg (313)
Mälardalen University (258)
Blekinge Institute of Technology (254)
Lund University (249)
Stockholm University (232)
Karlstad University (219)
Malmö University (211)
Linnaeus University (209)
Örebro University (202)
RISE (194)
Jönköping University (190)
Halmstad University (152)
University of Skövde (145)
Mid Sweden University (47)
University of Gävle (37)
Karolinska Institutet (34)
Högskolan Dalarna (29)
University of Borås (25)
Kristianstad University College (22)
Swedish University of Agricultural Sciences (16)
University West (12)
Stockholm School of Economics (5)
VTI - The Swedish National Road and Transport Research Institute (4)
Swedish Museum of Natural History (3)
Södertörn University (2)
Swedish National Defence College (2)
Stockholm University of the Arts (1)
IVL Swedish Environmental Research Institute (1)
The Institute for Language and Folklore (1)
Royal College of Music (1)
Language
English (6826)
Swedish (14)
German (1)
Spanish (1)
Hungarian (1)
Japanese (1)
Greek, Modern (1)
Research subject (UKÄ/SCB)
Natural sciences (6844)
Engineering and Technology (1689)
Social Sciences (251)
Medical and Health Sciences (148)
Humanities (104)
Agricultural Sciences (14)

Year
