SwePub
Sök i SwePub databas

  Utökad sökning

Träfflista för sökning "L773:1382 3256 OR L773:1573 7616 "

Sökning: L773:1382 3256 OR L773:1573 7616

  • Resultat 1-50 av 132
Sortera/gruppera träfflistan
   
NumreringReferensOmslagsbildHitta
1.
  • Afzal, Wasif, et al. (författare)
  • An experiment on the effectiveness and efficiency of exploratory testing
  • 2015
  • Ingår i: Empirical Software Engineering. - : Springer. - 1382-3256 .- 1573-7616. ; 20:3, s. 844-878
  • Tidskriftsartikel (refereegranskat)abstract
    • The exploratory testing (ET) approach is commonly applied in industry, but lacks scientific research. The scientific community needs quantitative results on the performance of ET taken from realistic experimental settings. The objective of this paper is to quantify the effectiveness and efficiency of ET vs. testing with documented test cases (test case based testing, TCT). We performed four controlled experiments where a total of 24 practitioners and 46 students performed manual functional testing using ET and TCT. We measured the number of identified defects in the 90-minute testing sessions, the detection difficulty, severity and types of the detected defects, and the number of false defect reports. The results show that ET found a significantly greater number of defects. ET also found significantly more defects of varying levels of difficulty, types and severity levels. However, the two testing approaches did not differ significantly in terms of the number of false defect reports submitted. We conclude that ET was more efficient than TCT in our experiment. ET was also more effective than TCT when detection difficulty, type of defects and severity levels are considered. The two approaches are comparable when it comes to the number of false defect reports submitted.
  •  
2.
  • Agren, P., et al. (författare)
  • Agile software development one year into the COVID-19 pandemic
  • 2022
  • Ingår i: Empirical Software Engineering. - : Springer Science and Business Media LLC. - 1382-3256 .- 1573-7616. ; 27:6
  • Tidskriftsartikel (refereegranskat)abstract
    • As a result of the COVID-19 pandemic, many agile practitioners had to transition into a remote work environment. Despite remote work not being a new concept for agile software practitioners, the forced or recommended nature of remote work is new. This study investigates how the involuntary shift to remote work and how social restrictions imposed by the COVID-19 pandemic have affected agile software development (ASD), and how agile practitioners have been affected in terms of ways of working. An explanatory sequential mixed methods study was performed. Data were collected one year into the COVID-19 pandemic through a questionnaire with 96 respondents and in-depth semi-structured interviews with seven practitioners from seven different companies. Data were analyzed through Bayesian analysis and thematic analysis. The results show, in general, that the aspects of ASD that have been the most affected is communication and social interactions, while technical work aspects have not experienced the same changes. Moreover, feeling forced to work remotely has a significant impact on different aspects of ASD, e.g., productivity and communication, and industry practitioners' employment of agile development and ways of working have primarily been affected by the lack of social interaction and the shift to digital communication. The results also suggest that there may be a group maturing debt when teams do go back into office, as digital communication and the lack of psychological safety stand in the way for practitioners' ability to have sensitive discussions and progress as a team in a remote setting.
  •  
3.
  • Al Mamun, Md Abdullah, 1982, et al. (författare)
  • Effects of measurements on correlations of software code metrics
  • 2019
  • Ingår i: Empirical Software Engineering. - : Springer Science and Business Media LLC. - 1382-3256 .- 1573-7616. ; 24:4, s. 2764-2818
  • Tidskriftsartikel (refereegranskat)abstract
    • ContextSoftware metrics play a significant role in many areas in the life-cycle of software including forecasting defects and foretelling stories regarding maintenance, cost, etc. through predictive analysis. Many studies have found code metrics correlated to each other at such a high level that such correlated code metrics are considered redundant, which implies it is enough to keep track of a single metric from a list of highly correlated metrics.ObjectiveSoftware is developed incrementally over a period. Traditionally, code metrics are measured cumulatively as cumulative sum or running sum. When a code metric is measured based on the values from individual revisions or commits without consolidating values from past revisions, indicating the natural development of software, this study identifies such a type of measure as organic. Density and average are two other ways of measuring metrics. This empirical study focuses on whether measurement types influence correlations of code metrics.MethodTo investigate the objective, this empirical study has collected 24 code metrics classified into four categories, according to the measurement types of the metrics, from 11,874 software revisions (i.e., commits) of 21 open source projects from eight well-known organizations. Kendall's tau-B is used for computing correlations. To determine whether there is a significant difference between cumulative and organic metrics, Mann-Whitney U test, Wilcoxon signed rank test, and paired-samples sign test are performed.ResultsThe cumulative metrics are found to be highly correlated to each other with an average coefficient of 0.79. For corresponding organic metrics, it is 0.49. When individual correlation coefficients between these two measure types are compared, correlations between organic metrics are found to be significantly lower (with p <0.01) than cumulative metrics. Our results indicate that the cumulative nature of metrics makes them highly correlated, implying cumulative measurement is a major source of collinearity between cumulative metrics. Another interesting observation is that correlations between metrics from different categories are weak.ConclusionsResults of this study reveal that measurement types may have a significant impact on the correlations of code metrics and that transforming metrics into a different type can give us metrics with low collinearity. These findings provide us a simple understanding how feature transformation to a different measurement type can produce new non-collinear input features for predictive models.
  •  
4.
  • Alégroth, Emil, et al. (författare)
  • On the long-term use of visual gui testing in industrial practice : a case study
  • 2017
  • Ingår i: Empirical Software Engineering. - : Springer. - 1382-3256 .- 1573-7616. ; 22:6, s. 2937-2971
  • Tidskriftsartikel (refereegranskat)abstract
    • Visual GUI Testing (VGT) is a tool-driven technique for automated GUI-based testing that uses image recognition to interact with and assert the correctness of the behavior of a system through its GUI as it is shown to the user. The technique’s applicability, e.g. defect-finding ability, and feasibility, e.g. time to positive return on investment, have been shown through empirical studies in industrial practice. However, there is a lack of studies that evaluate the usefulness and challenges associated with VGT when used long-term (years) in industrial practice. This paper evaluates how VGT was adopted, applied and why it was abandoned at the music streaming application development company, Spotify, after several years of use. A qualitative study with two workshops and five well chosen employees is performed at the company, supported by a survey, which is analyzed with a grounded theory approach to answer the study’s three research questions. The interviews provide insights into the challenges, problems and limitations, but also benefits, that Spotify experienced during the adoption and use of VGT. However, due to the technique’s drawbacks, VGT has been abandoned for a new technique/framework, simply called the Test interface. The Test interface is considered more robust and flexible for Spotify’s needs but has several drawbacks, including that it does not test the actual GUI as shown to the user like VGT does. From the study’s results it is concluded that VGT can be used long-term in industrial practice but it requires organizational change as well as engineering best practices to be beneficial. Through synthesis of the study’s results, and results from previous work, a set of guidelines are presented that aim to aid practitioners to adopt and use VGT in industrial practice. However, due to the abandonment of the technique, future research is required to analyze in what types of projects the technique is, and is not, long-term viable. To this end, we also present Spotify’s Test interface solution for automated GUI-based testing and conclude that it has its own benefits and drawbacks.
  •  
5.
  • Alégroth, Emil, 1984-, et al. (författare)
  • Practitioners' best practices to Adopt, Use or Abandon Model-based Testing with Graphical models for Software-intensive Systems
  • 2022
  • Ingår i: Empirical Software Engineering. - : SPRINGER. - 1382-3256 .- 1573-7616. ; 27:5
  • Tidskriftsartikel (refereegranskat)abstract
    • Model-based testing (MBT) has been extensively researched for software-intensive systems but, despite the academic interest, adoption of the technique in industry has been sparse. This phenomenon has been observed by our industrial partners for MBT with graphical models. They perceive one cause to be a lack of evidence-based MBT guidelines that, in addition to technical guidelines, also take non-technical aspects into account. This hypothesis is supported by a lack of such guidelines in the literature. Objective: The objective of this study is to elicit, and synthesize, MBT experts' best practices for MBT with graphical models. The results aim to give guidance to practitioners and aspire to give researchers new insights to inspire future research. Method: An interview survey is conducted using deep, semi-structured, interviews with an international sample of 17 MBT experts, in different roles, from software industry. Interview results are synthesised through semantic equivalence analysis and verified by MBT experts from industrial practice. Results: 13 synthesised conclusions are drawn from which 23 best-practice guidelines are derived for the adoption, use and abandonment of the technique. In addition, observations and expert insights are discussed that help explain the lack of wide-spread adoption of MBT with graphical models in industrial practice. Conclusions: Several technical aspects of MBT are covered by the results as well as conclusions that cover process- and organizational factors. These factors relate to the mindset, knowledge, organization, mandate and resources that enable the technique to be used effectively within an organization. The guidelines presented in this work complement existing knowledge and, as a primary objective, provide guidance for industrial practitioners to better succeed with MBT with graphical models.
  •  
6.
  • Alégroth, Emil, 1984, et al. (författare)
  • Visual GUI testing in practice: challenges, problemsand limitations
  • 2015
  • Ingår i: Empirical Software Engineering. - : Springer Science and Business Media LLC. - 1573-7616 .- 1382-3256. ; 20:3, s. 694-744
  • Tidskriftsartikel (refereegranskat)abstract
    • In today's software development industry, high-level tests such as Graphical User Interface (GUI) based system and acceptance tests are mostly performed with manual practices that are often costly, tedious and error prone. Test automation has been proposed to solve these problems but most automation techniques approach testing from a lower level of system abstraction. Their suitability for high-level tests has therefore been questioned. High-level test automation techniques such as Record and Replay exist, but studies suggest that these techniques suffer from limitations, e.g. sensitivity to GUI layout or code changes, system implementation dependencies, etc. Visual GUI Testing (VGT) is an emerging technique in industrial practice with perceived higher flexibility and robustness to certain GUI changes than previous high-level (GUI) test automation techniques. The core of VGT is image recognition which is applied to analyze and interact with the bitmap layer of a system's front end. By coupling image recognition with test scripts, VGT tools can emulate end user behavior on almost any GUI-based system, regardless of implementation language, operating system or platform. However, VGT is not without its own challenges, problems and limitations (CPLs) but, like for many other automated test techniques, there is a lack of empirically-based knowledge of these CPLs and how they impact industrial applicability. Crucially, there is also a lack of information on the cost of applying this type of test automation in industry. This manuscript reports an empirical, multi-unit case study performed at two Swedish companies that develop safety-critical software. It studies their transition from manual system test cases into tests automated with VGT. In total, four different test suites that together include more than 300 high-level system test cases were automated for two multi-million lines of code systems. The results show that the transitioned test cases could find defects in the tested systems and that all applicable test cases could be automated. However, during these transition projects a number of hurdles had to be addressed; a total of 58 different CPLs were identified and then categorized into 26 types. We present these CPL types and an analysis of the implications for the transition to and use of VGT in industrial software development practice. In addition, four high-level solutions are presented that were identified during the study, which would address about half of the identified CPLs. Furthermore, collected metrics on cost and return on investment of the VGT transition are reported together with information about the VGT suites' defect finding ability. Nine of the identified defects are reported, 5 of which were unknown to testers with extensive experience from using the manual test suites. The main conclusion from this study is that even though there are many challenges related to the transition and usage of VGT, the technique is still valuable, flexible and considered cost-effective by the industrial practitioners. The presented CPLs also provide decision support in the use and advancement of VGT and potentially other automated testing techniques similar to VGT, e.g. Record and Replay.
  •  
7.
  •  
8.
  • Ali, Nour, et al. (författare)
  • Architecture consistency : State of the practice, challenges and requirements
  • 2018
  • Ingår i: Empirical Software Engineering. - : Springer. - 1382-3256 .- 1573-7616. ; 23:1, s. 224-258
  • Tidskriftsartikel (refereegranskat)abstract
    • Architecture Consistency (AC) aims to align implemented systems with their intended architectures. Several AC approaches and tools have been proposed and empirically evaluated, suggesting favourable results. In this paper, we empirically examine the state of practice with respect to Architecture Consistency, through interviews with nineteen experienced software engineers. Our goal is to identify 1) any practises that the companies these architects work for, currently undertake to achieve AC; 2) any barriers to undertaking explicit AC approaches in these companies; 3) software development situations where practitioners perceive AC approaches would be useful, and 4) AC tool needs, as perceived by practitioners. We also assess current commercial AC tool offerings in terms of these perceived needs. The study reveals that many practitioners apply informal AC approaches as there are barriers for adopting more formal and explicit approaches. These barriers are: 1) Difficulty in quantifying architectural inconsistency effects, and thus justifying the allocation of resources to fix them to senior management, 2) The near invisibility of architectural inconsistency to customers, 3) Practitioners’ reluctance towards fixing architectural inconsistencies, and 4) Practitioners perception that huge effort is required to map the system to the architecture when using more formal AC approaches and tools. Practitioners still believe that AC would be useful in supporting several of the software development activities such as auditing, evolution and ensuring quality attributes. After reviewing several commercial tools, we posit that AC tool vendors need to work on their ability to support analysis of systems made up of different technologies, that AC tools need to enhance their capabilities with respect to artefacts such as services and meta-data, and to focus more on non-maintainability architectural concerns.
  •  
9.
  • Ali, Nauman bin, et al. (författare)
  • On the search for industry-relevant regression testing research
  • 2019
  • Ingår i: Empirical Software Engineering. - New York, NY : Springer. - 1382-3256 .- 1573-7616. ; 24:4, s. 2020-2055
  • Tidskriftsartikel (refereegranskat)abstract
    • Regression testing is a means to assure that a change in the software, or its execution environment, does not introduce new defects. It involves the expensive undertaking of rerunning test cases. Several techniques have been proposed to reduce the number of test cases to execute in regression testing, however, there is no research on how to assess industrial relevance and applicability of such techniques. We conducted a systematic literature review with the following two goals: firstly, to enable researchers to design and present regression testing research with a focus on industrial relevance and applicability and secondly, to facilitate the industrial adoption of such research by addressing the attributes of concern from the practitioners' perspective. Using a reference-based search approach, we identified 1068 papers on regression testing. We then reduced the scope to only include papers with explicit discussions about relevance and applicability (i.e. mainly studies involving industrial stakeholders). Uniquely in this literature review, practitioners were consulted at several steps to increase the likelihood of achieving our aim of identifying factors important for relevance and applicability. We have summarised the results of these consultations and an analysis of the literature in three taxonomies, which capture aspects of industrial-relevance regarding the regression testing techniques. Based on these taxonomies, we mapped 38 papers reporting the evaluation of 26 regression testing techniques in industrial settings. © The Author(s) 2019
  •  
10.
  • Almulla, H., et al. (författare)
  • Learning how to search: generating effective test cases through adaptive fitness function selection
  • 2022
  • Ingår i: Empirical Software Engineering. - : Springer Science and Business Media LLC. - 1382-3256 .- 1573-7616. ; 27:2
  • Tidskriftsartikel (refereegranskat)abstract
    • Search-based test generation is guided by feedback from one or more fitness functions-scoring functions that judge solution optimality. Choosing informative fitness functions is crucial to meeting the goals of a tester. Unfortunately, many goals-such as forcing the class-under-test to throw exceptions, increasing test suite diversity, and attaining Strong Mutation Coverage-do not have effective fitness function formulations. We propose that meeting such goals requires treating fitness function identification as a secondary optimization step. An adaptive algorithm that can vary the selection of fitness functions could adjust its selection throughout the generation process to maximize goal attainment, based on the current population of test suites. To test this hypothesis, we have implemented two reinforcement learning algorithms in the EvoSuite unit test generation framework, and used these algorithms to dynamically set the fitness functions used during generation for the three goals identified above. We have evaluated our framework, EvoSuiteFIT, on a set of Java case examples. EvoSuiteFIT techniques attain significant improvements for two of the three goals, and show limited improvements on the third when the number of generations of evolution is fixed. Additionally, for two of the three goals, EvoSuiteFIT detects faults missed by the other techniques. The ability to adjust fitness functions allows strategic choices that efficiently produce more effective test suites, and examining these choices offers insight into how to attain our testing goals. We find that adaptive fitness function selection is a powerful technique to apply when an effective fitness function does not already exist for achieving a testing goal.
  •  
11.
  • Andersson, Carina (författare)
  • A replicated empirical study of a selection method for software reliability growth models
  • 2007
  • Ingår i: Empirical Software Engineering. - : Springer Science and Business Media LLC. - 1573-7616 .- 1382-3256. ; 12:2, s. 161-182
  • Tidskriftsartikel (refereegranskat)abstract
    • Replications are commonly considered to be important contributions to investigate the generality of empirical studies. By replicating an original study it may be shown that the results are either valid or invalid in another context, outside the specific environment in which the original study was launched. The results of the replicated study show how much confidence we could possibly have in the original study. We present a replication of a method for selecting software reliability growth models to decide whether to stop testing and release software. We applied the selection method in an empirical study, conducted in a different development environment than the original study. The results of the replication study show that with the changed values of stability and curve fit, the selection method works well on the empirical system test data available, i.e., the method was applicable in an environment that was different from the original one. The application of the SRGMs to failures during functional testing resulted in predictions with low relative error, thus providing a useful approach in giving good estimates of the total number of failures to expect during functional testing.
  •  
12.
  •  
13.
  • Assar, Saïd, et al. (författare)
  • Using Text Clustering to Predict Defect Resolution Time: A Conceptual Replication and an Evaluation of Prediction Accuracy
  • 2015
  • Ingår i: Empirical Software Engineering. - : Springer Science and Business Media LLC. - 1573-7616 .- 1382-3256.
  • Tidskriftsartikel (refereegranskat)abstract
    • Defect management is a central task in software maintenance. When a defect is reported, appropriate resources must be allocated to analyze and resolve the defect. An important issue in resource allocation is the estimation of Defect Resolution Time (DRT). Prior research has considered different approaches for DRT prediction exploiting information retrieval techniques and similarity in textual defect descriptions. In this article, we investigate the potential of text clustering for DRT prediction. We build on a study published by Raja (2013) which demonstrated that clusters of similar defect reports had statistically significant differences in DRT. Raja’s study also suggested that this difference between clusters could be used for DRT prediction. Our aims are twofold: First, to conceptually replicate Raja’s study and to assess the repeatability of its results in different settings; Second, to investigate the potential of textual clustering of issue reports for DRT prediction with focus on accuracy. Using different data sets and a different text mining tool and clustering technique, we first conduct an independent replication of the original study. Then we design a fully automated prediction method based on clustering with a simulated test scenario to check the accuracy of our method. The results of our independent replication are comparable to those of the original study and we confirm the initial findings regarding significant differences in DRT between clusters of defect reports. However, the simulated test scenario used to assess our prediction method yields poor results in terms of DRT prediction accuracy. Although our replication confirms the main finding from the original study, our attempt to use text clustering as the basis for DRT prediction did not achieve practically useful levels of accuracy.
  •  
14.
  • Ayala, Claudia, et al. (författare)
  • System requirements-OSS components: matching and mismatch resolution practices – an empirical study
  • 2018
  • Ingår i: Empirical Software Engineering. - : Springer Science and Business Media LLC. - 1573-7616 .- 1382-3256. ; 23:6, s. 3073-3128
  • Tidskriftsartikel (refereegranskat)abstract
    • Developing systems by integrating Open Source Software (OSS) is increasingly gaining importance in the software industry. Although the literature claims that this approach highly impacts Requirements Engineering (RE) practices, there is a lack of empirical evidence to demonstrate this statement. To explore and understand problems and challenges of current system requirement–OSS component matching and mismatches resolution practices in software development projects that integrate one or more OSS components into their software products. Semi-structured in-depth interviews with 25 respondents that have performed RE activities in software development projects that integrate OSS components in 25 different software development companies in Spain, Norway, Sweden, and Denmark. The study uncovers 15 observations regarding system requirements-OSS components matching and mismatch resolution practices used in industrial projects that integrate OSS components. The assessed projects focused mainly on pre-release stages of software applications that integrate OSS components in an opportunistic way. The results also provide details of a set of previously unexplored scenarios when solving system requirement–OSS component mismatches; and clarify some challenges and related problems. For instance, although licensing issues and the potential changes in OSS components by their corresponding communities and/or changes in system requirements have been greatly discussed in the RE literature as problems for OSS component integration, they did not appear to be relevant in our assessed projects. Instead, practitioners highlighted the problem of getting suitable OSS component documentation/information.
  •  
15.
  •  
16.
  • Bate, Iain, et al. (författare)
  • WCET analysis of modern processors using multi-criteria optimisation
  • 2011
  • Ingår i: Empirical Software Engineering. - : Springer. - 1382-3256 .- 1573-7616. ; 16:1, s. 5-28
  • Tidskriftsartikel (refereegranskat)abstract
    • The Worst-Case Execution Time (WCET) is an important execution metric for real-time systems, and an accurate estimate for this increases the reliability of subsequent schedulability analysis. Performance enhancing features on modern processors, such as pipelines and caches, however, make it difficult to accurately predict the WCET. One technique for finding the WCET is to use test data generated using search algorithms. Existing work on search-based approaches has been successfully used in both industry and academia based on a single criterion function, the WCET, but only for simple processors. This paper investigates how effective this strategy is for more complex processors and to what extent other criteria help guide the search, e.g. the number of cache misses. Not unexpectedly the work shows no single choice of criteria work best across all problems. Based on the findings recommendations are proposed on which criteria are useful in particular situations.
  •  
17.
  • Bell, R, et al. (författare)
  • The limited impact of individual developer data on software defect prediction
  • 2013
  • Ingår i: Empirical Software Engineering. - : Springer Science and Business Media LLC. - 1382-3256 .- 1573-7616. ; 18:3, s. 478-505
  • Tidskriftsartikel (refereegranskat)abstract
    • Previous research has provided evidence that a combination of static code metrics and software history metrics can be used to predict with surprising success which files in the next release of a large system will havethe largest numbers of defects. In contrast, very little research exists to indicate whether information about individual developers can profitably be used to improve predictions. We investigate whether files in a large system that are modified by an individual developer consistently contain either more or fewer faults than the average of all files in the system. The goal of the investigation is to determine whether information about which particular developer modified a file is able to improve defect predictions. We also extend earlier research evaluating use of counts of the number of developers who modified a file as predictors of the file's future faultiness. We analyze change reports filed for three large systems, each containing 18 releases, with a combined total of nearly 4 million LOC and over 11,000 files. A buggy file ratio is defined for programmers, measuring the proportion of faulty files in Release R out of all files modified by the programmer in Release R-1. We assess the consistency of the buggy file ratio across releases for individual programmers both visually and within the context of a fault prediction model. Buggy file ratios for individual programmers often varied widely across all the releases that they participated in. A prediction model that takes account of the history of faulty files that were changed by individual developers shows improvement over the standard negative binomial model of less than 0.13% according to one measure, and no improvement at all according to another measure. In contrast, augmenting a standard model with counts of cumulative developers changing files in prior releases produced up to a 2% improvement in the percentage of faults detected in the top 20% of predicted faulty files. The cumulative number of developers interacting with a file can be a useful variable for defect prediction. However, the study indicates that adding information to a model about which particular developermodified a file is not likely to improve defect predictions.
  •  
18.
  • Berger, Thorsten, 1981, et al. (författare)
  • The state of adoption and the challenges of systematic variability management in industry
  • 2020
  • Ingår i: Empirical Software Engineering. - : Springer Science and Business Media LLC. - 1382-3256 .- 1573-7616. ; :25, s. 1755-1797
  • Tidskriftsartikel (refereegranskat)abstract
    • Handling large-scale software variability is still a challenge for many organizations. After decades of research on variability management concepts, many industrial organizations have introduced techniques known from research, but still lament that pure textbook approaches are not applicable or efficient. For instance, software product line engineering-an approach to systematically develop portfolios of products-is difficult to adopt given the high upfront investments; and even when adopted, organizations are challenged by evolving their complex product lines. Consequently, the research community now mainly focuses on re-engineering and evolution techniques for product lines; yet, understanding the current state of adoption and the industrial challenges for organizations is necessary to conceive effective techniques. In this multiple-case study, we analyze the current adoption of variability management techniques in twelve medium- to large-scale industrial cases in domains such as automotive, aerospace or railway systems. We identify the current state of variability management, emphasizing the techniques and concepts they adopted. We elicit the needs and challenges expressed for these cases, triangulated with results from a literature review. We believe our results help to understand the current state of adoption and shed light on gaps to address in industrial practice.
  •  
19.
  • Bernardi, Simona, et al. (författare)
  • A systematic approach for performance assessment using process mining : An industrial experience report
  • 2018
  • Ingår i: Empirical Software Engineering. - : Springer. - 1382-3256 .- 1573-7616. ; 23:6, s. 3394-3441
  • Tidskriftsartikel (refereegranskat)abstract
    • Software performance engineering is a mature field that offers methods to assess system performance. Process mining is a promising research field applied to gain insight on system processes. The interplay of these two fields opens promising applications in the industry. In this work, we report our experience applying a methodology, based on process mining techniques, for the performance assessment of a commercial data-intensive software application. The methodology has successfully assessed the scalability of future versions of this system. Moreover, it has identified bottlenecks components and replication needs for fulfilling business rules. The system, an integrated port operations management system, has been developed by Prodevelop, a medium-sized software enterprise with high expertise in geospatial technologies. The performance assessment has been carried out by a team composed by practitioners and researchers. Finally, the paper offers a deep discussion on the lessons learned during the experience, that will be useful for practitioners to adopt the methodology and for researcher to find new routes.
  •  
20.
  • Bjarnason, Elizabeth, et al. (författare)
  • Challenges and practices in aligning requirements with verification and validation : a case study of six companies
  • 2014
  • Ingår i: Empirical Software Engineering. - : Springer. - 1382-3256 .- 1573-7616. ; 19:6, s. 1809-1855
  • Tidskriftsartikel (refereegranskat)abstract
    • Weak alignment of requirements engineering (RE) with verification and validation (VV) may lead to problems in delivering the required products in time with the right quality. For example, weak communication of requirements changes to testers may result in lack of verification of new requirements and incorrect verification of old invalid requirements, leading to software quality problems, wasted effort and delays. However, despite the serious implications of weak alignment research and practice both tend to focus on one or the other of RE or VV rather than on the alignment of the two. We have performed a multi-unit case study to gain insight into issues around aligning RE and VV by interviewing 30 practitioners from 6 software developing companies, involving 10 researchers in a flexible research process for case studies. The results describe current industry challenges and practices in aligning RE with VV, ranging from quality of the individual RE and VV activities, through tracing and tools, to change control and sharing a common understanding at strategy, goal and design level. The study identified that human aspects are central, i.e. cooperation and communication, and that requirements engineering practices are a critical basis for alignment. Further, the size of an organisation and its motivation for applying alignment practices, e.g. external enforcement of traceability, are variation factors that play a key role in achieving alignment. Our results provide a strategic roadmap for practitioners improvement work to address alignment challenges. Furthermore, the study provides a foundation for continued research to improve the alignment of RE with VV.
  •  
21.
  • Bjarnason, Elizabeth, et al. (författare)
  • Improving requirements-test alignment by prescribing practices that mitigate communication gaps
  • 2019
  • Ingår i: Empirical Software Engineering. - : Springer Science and Business Media LLC. - 1573-7616 .- 1382-3256. ; 24:4, s. 2364-2409
  • Tidskriftsartikel (refereegranskat)abstract
    • The communication of requirements within software development is vital for project success. Requirements engineering and testing are two processes that when aligned can enable the discovery of issues and misunderstandings earlier, rather than later, and avoid costly and time-consuming rework and delays. There are a number of practices that support requirements-test alignment. However, each organisation and project is different and there is no one-fits-all set of practices. The software process improvement method called Gap Finder is designed to increase requirements-tes t alignment. The method contains two parts: an assessment part and a prescriptive part. It detects potential communication gaps between people and between artefacts (the assessment part), and identifies practices for mitigating these gaps (the prescriptive part). This paper presents the design and formative evaluation of the prescriptive part; an evaluation of the assessment part was published previously. The Gap Finder method was constructed using a design science research approach and is built on the Theory of Distances for Software Engineering, which in turn is grounded in empirical evidence from five case companies. The formative evaluation was performed through a case study in which Gap Finder was applied to an on-going development project. A qualitative and mixed-method approachwas taken in the evaluation, including ethnographically-informed observations. The results show that Gap Finder can detect relevant communication gaps and seven of the nine prescribed practices were deemed practically relevant for mitigating these gaps. The project team found the method to be useful and supported joint reflection and improvement of their requirementscommunication. Our findings demonstrate that an empirically-based theory can be used to improve software development practices and provide a foundation for further research on factors that affect requirements communication.
  •  
22.
  • Bjarnason, Elizabeth, et al. (författare)
  • Inter-Team Communication in Large-Scale Co-Located Software Engineering: A Case Study
  • 2022
  • Ingår i: Empirical Software Engineering. - : Springer Science and Business Media LLC. - 1573-7616 .- 1382-3256. ; 27:2
  • Tidskriftsartikel (refereegranskat)abstract
    • Large-scale software engineering is a collaborative effort where teams need to communicate to develop software products. Managers face the challenge of how to organise work to facilitate necessary communication between teams and individuals. This includes a range of decisions from distributing work over teams located in multiple buildings and sites, through work processes and tools for coordinating work, to softer issues including ensuring well-functioning teams. In this case study, we focus on inter-team communication by considering geographical, cognitive and psychological distances between teams, and factors and strategies that can affect this communication. Data was collected for ten test teams within a large development organisation, in two main phases: (1) measuring cognitive and psychological distance between teams using interactive posters, and (2) five focus group sessions where the obtained distance measurements were discussed. We present ten factors and five strategies, and how these relate to inter-team communication. We see three types of arenas that facilitate inter-team communication, namely physical, virtual and organisational arenas. Our findings can support managers in assessing and improving communication within large development organisations. In addition, the findings can provide insights into factors that may explain the challenges of scaling development organisations, in particular agile organisations that place a large emphasis on direct communication over written documentation.
  •  
23.
  • Bjarnason, Elizabeth, et al. (författare)
  • Software selection in large-scale software engineering : A model and criteria based on interactive rapid reviews
  • 2023
  • Ingår i: Empirical Software Engineering. - : Springer. - 1382-3256 .- 1573-7616. ; 28:2
  • Tidskriftsartikel (refereegranskat)abstract
    • Context: Software selection in large-scale software development continues to be ad hoc and ill-structured. Previous proposals for software component selection tend to be technology-specific and/or do not consider business or ecosystem concerns. Objective: Our main aim is to develop an industrially relevant technology-agnostic method that can support practitioners in making informed decisions when selecting software components for use in tools or in products based on a holistic perspective of the overall environment. Method: We used method engineering to iteratively develop a software selection method for Ericsson AB based on a combination of published research and practitioner insights. We used interactive rapid reviews to systematically identify and analyse scientific literature and to support close cooperation and co-design with practitioners from Ericsson. The model has been validated through a focus group and by practical use at the case company. Results: The model consists of a high-level selection process and a wide range of criteria for assessing and for evaluating software to include in business products and tools. Conclusions: We have developed an industrially relevant model for component selection through active engagement from a company. Co-designing the model based on previous knowledge demonstrates a viable approach to industry-academia collaboration and provides a practical solution that can support practitioners in making informed decisions based on a holistic analysis of business, organisation and technical factors. © 2023, The Author(s).
  •  
24.
  • Blincoe, Kelly, et al. (författare)
  • High-level software requirements and iteration changes : a predictive model
  • 2019
  • Ingår i: Empirical Software Engineering. - : Springer Science and Business Media LLC. - 1382-3256 .- 1573-7616. ; 24:3, s. 1610-1648
  • Tidskriftsartikel (refereegranskat)abstract
    • Knowing whether a software feature will be completed in its planned iteration can help with release planning decisions. However, existing research has focused on predictions of only low-level software tasks, like bug fixes. In this paper, we describe a mixed-method empirical study on three large IBM projects. We investigated the types of iteration changes that occur. We show that up to 54% of high-level requirements do not make their planned iteration. Requirements are most often pushed out to the next iteration, but high-level requirements are also commonly moved to the next minor or major release or returned to the product or release backlog. We developed and evaluated a model that uses machine learning to predict if a high-level requirement will be completed within its planned iteration. The model includes 29 features that were engineered based on prior work, interviews with IBM developers, and domain knowledge. Predictions were made at four different stages of the requirement lifetime. Our model is able to achieve up to 100% precision. We ranked the importance of our model features and found that some features are highly dependent on project and prediction stage. However, some features (e.g., the time remaining in the iteration and creator of the requirement) emerge as important across all projects and stages. We conclude with a discussion on future research directions.
  •  
25.
  • Borg, Markus, et al. (författare)
  • Recovering from a Decade: A Systematic Mapping of Information Retrieval Approaches to Software Traceability
  • 2014
  • Ingår i: Empirical Software Engineering. - : Springer Science and Business Media LLC. - 1573-7616 .- 1382-3256. ; 19:6, s. 1565-1616
  • Tidskriftsartikel (refereegranskat)abstract
    • Engineers in large-scale software development have to manage large amounts of information, spread across many artifacts. Several researchers have proposed expressing retrieval of trace links among artifacts, i.e. trace recovery, as an Information Retrieval (IR) problem. The objective of this study is to produce a map of work on IR-based trace recovery, with a particular focus on previous evaluations and strength of evidence. We conducted a systematic mapping of IR-based trace recovery. Of the 79 publications classified, a majority applied algebraic IR models. While a set of studies on students indicate that IR-based trace recovery tools support certain work tasks, most previous studies do not go beyond reporting precision and recall of candidate trace links from evaluations using datasets containing less than 500 artifacts. Our review identified a need of industrial case studies. Furthermore, we conclude that the overall quality of reporting should be improved regarding both context and tool details, measures reported, and use of IR terminology. Finally, based on our empirical findings, we present suggestions on how to advance research on IR-based trace recovery.
  •  
26.
  • Businge, J., et al. (författare)
  • Reuse and maintenance practices among divergent forks in three software ecosystems
  • 2022
  • Ingår i: Empirical Software Engineering. - : Springer Science and Business Media LLC. - 1382-3256 .- 1573-7616. ; 27:2
  • Tidskriftsartikel (refereegranskat)abstract
    • With the rise of social coding platforms that rely on distributed version control systems, software reuse is also on the rise. Many software developers leverage this reuse by creating variants through forking, to account for different customer needs, markets, or environments. Forked variants then form a so-called software family; they share a common code base and are maintained in parallel by same or different developers. As such, software families can easily arise within software ecosystems, which are large collections of interdependent software components maintained by communities of collaborating contributors. However, little is known about the existence and characteristics of such families within ecosystems, especially about their maintenance practices. Improving our empirical understanding of such families will help build better tools for maintaining and evolving such families. We empirically explore maintenance practices in such fork-based software families within ecosystems of open-source software. Our focus is on three of the largest software ecosystems existence today: Android, .NET, and JavaScript. We identify and analyze software families that are maintained together and that exist both on the official distribution platform (Google play, nuget, and npm) as well as on GitHub , allowing us to analyze reuse practices in depth. We mine and identify 38 software families, 526 software families, and 8,837 software families from the ecosystems of Android, .NET, and JavaScript, to study their characteristics and code-propagation practices. We provide scripts for analyzing code integration within our families. Interestingly, our results show that there is little code integration across the studied software families from the three ecosystems. Our studied families also show that techniques of direct integration using git outside of GitHub is more commonly used than GitHub pull requests. Overall, we hope to raise awareness about the existence of software families within larger ecosystems of software, calling for further research and better tools support to effectively maintain and evolve them.
  •  
27.
  • Börstler, Jürgen, 1960-, et al. (författare)
  • Developers talking about code quality
  • 2023
  • Ingår i: Empirical Software Engineering. - : Springer. - 1382-3256 .- 1573-7616. ; 28:6
  • Tidskriftsartikel (refereegranskat)abstract
    • There are many aspects of code quality, some of which are difficult to capture or to measure. Despite the importance of software quality, there is a lack of commonly accepted measures or indicators for code quality that can be linked to quality attributes. We investigate software developers’ perceptions of source code quality and the practices they recommend to achieve these qualities. We analyze data from semi-structured interviews with 34 professional software developers, programming teachers and students from Europe and the U.S. For the interviews, participants were asked to bring code examples to exemplify what they consider good and bad code, respectively. Readability and structure were used most commonly as defining properties for quality code. Together with documentation, they were also suggested as the most common target properties for quality improvement. When discussing actual code, developers focused on structure, comprehensibility and readability as quality properties. When analyzing relationships between properties, the most commonly talked about target property was comprehensibility. Documentation, structure and readability were named most frequently as source properties to achieve good comprehensibility. Some of the most important source code properties contributing to code quality as perceived by developers lack clear definitions and are difficult to capture. More research is therefore necessary to measure the structure, comprehensibility and readability of code in ways that matter for developers and to relate these measures of code structure, comprehensibility and readability to common software quality attributes.
  •  
28.
  • Danglot, Benjamin, et al. (författare)
  • An approach and benchmark to detect behavioral changes of commits in continuous integration
  • 2020
  • Ingår i: Empirical Software Engineering. - : Springer Nature. - 1382-3256 .- 1573-7616. ; 25:4, s. 2379-2415
  • Tidskriftsartikel (refereegranskat)abstract
    • When a developer pushes a change to an application’s codebase, a good practice is to have a test case specifying this behavioral change. Thanks to continuous integration (CI), the test is run on subsequent commits to check that they do no introduce a regression for that behavior. In this paper, we propose an approach that detects behavioral changes in commits. As input, it takes a program, its test suite, and a commit. Its output is a set of test methods that capture the behavioral difference between the pre-commit and post-commit versions of the program. We call our approach DCI (Detecting behavioral changes in CI). It works by generating variations of the existing test cases through (i) assertion amplification and (ii) a search-based exploration of the input space. We evaluate our approach on a curated set of 60 commits from 6 open source Java projects. To our knowledge, this is the first ever curated dataset of real-world behavioral changes. Our evaluation shows that DCI is able to generate test methods that detect behavioral changes. Our approach is fully automated and can be integrated into current development processes. The main limitations are that it targets unit tests and works on a relatively small fraction of commits. More specifically, DCI works on commits that have a unit test that already executes the modified code. In practice, from our benchmark projects, we found 15.29% of commits to meet the conditions required by DCI.
  •  
29.
  • Danglot, Benjamin, et al. (författare)
  • Automatic test improvement with DSpot : a study with ten mature open-source projects
  • 2019
  • Ingår i: Empirical Software Engineering. - : Springer. - 1382-3256 .- 1573-7616. ; 24:4, s. 2603-2635
  • Tidskriftsartikel (refereegranskat)abstract
    • In the literature, there is a rather clear segregation between manually written tests by developers and automatically generated ones. In this paper, we explore a third solution: to automatically improve existing test cases written by developers. We present the concept, design and implementation of a system called DSpot, that takes developer-written test cases as input (JUnit tests in Java) and synthesizes improved versions of them as output. Those test improvements are given back to developers as patches or pull requests, that can be directly integrated in the main branch of the test code base. We have evaluated DSpot in a deep, systematic manner over 40 real-world unit test classes from 10 notable and open-source software projects. We have amplified all test methods from those 40 unit test classes. In 26/40 cases, DSpot is able to automatically improve the test under study, by triggering new behaviors and adding new valuable assertions. Next, for ten projects under consideration, we have proposed a test improvement automatically synthesized by DSpot to the lead developers. In total, 13/19 proposed test improvements were accepted by the developers and merged into the main code base. This shows that DSpot is capable of automatically improving unit-tests in real-world, large scale Java software.
  •  
30.
  • Danglot, Benjamin, et al. (författare)
  • Correctness attraction : a study of stability of software behavior under runtime perturbation
  • 2018
  • Ingår i: Empirical Software Engineering. - : Springer. - 1382-3256 .- 1573-7616. ; 23:4, s. 2086-2119
  • Tidskriftsartikel (refereegranskat)abstract
    • Can the execution of software be perturbed without breaking the correctness of the output? In this paper, we devise a protocol to answer this question from a novel perspective. In an experimental study, we observe that many perturbations do not break the correctness in ten subject programs. We call this phenomenon “correctness attraction”. The uniqueness of this protocol is that it considers a systematic exploration of the perturbation space as well as perfect oracles to determine the correctness of the output. To this extent, our findings on the stability of software under execution perturbations have a level of validity that has never been reported before in the scarce related work. A qualitative manual analysis enables us to set up the first taxonomy ever of the reasons behind correctness attraction.
  •  
31.
  • Daniela S., Cruzes, et al. (författare)
  • Case studies synthesis: a thematic, cross-case, and narrative synthesis worked example
  • 2015
  • Ingår i: Empirical Software Engineering. - : Springer Science and Business Media LLC. - 1573-7616 .- 1382-3256. ; 20:6, s. 1634-1665
  • Tidskriftsartikel (refereegranskat)abstract
    • Case studies are largely used for investigating software engineering practices. They are characterized by their flexible nature, multiple forms of data collection, and are mostly informed by qualitative data. Synthesis of case studies is necessary to build a body of knowledge from individual cases. There are many methods for such synthesis, but they are yet not well explored in software engineering. The objective of this research is to demonstrate the similarities and differences of the results and conclusions when applying three different methods of synthesis, and to discuss the challenges of synthesizing evidence from reported case studies in SE. We describe a worked example of three such methods where three independent teams synthesized two studies that investigated critical factors of trust in outsourced projects through thematic synthesis and cross-case analysis, and compared these to each other and also to an already published narrative synthesis. In addition, despite that the primary studies were well presented for synthesis, we identified challenges in the use of case studies synthesis methods related to the goals and research questions of the synthesis, the types and number of case studies, variations in context, limited access to raw data, and quality of the case studies. Thus, we recommend that the analysts should be aware of these challenges and try to account for them during the execution of the synthesis. We also recommend that analysts consider using more than one method of synthesis for sake of reliability of the results and conclusions.
  •  
32.
  • Dittrich, Yvonne, et al. (författare)
  • Cooperative method development : Combining qualitative empirical research with method, techniqueand process improvement
  • 2008
  • Ingår i: Empirical Software Engineering. - : Springer Netherlands. - 1382-3256 .- 1573-7616. ; 13:3, s. 231-260
  • Tidskriftsartikel (refereegranskat)abstract
    • The development of methods tools and process improvements is best to be based on the understanding of the development practice to be supported. Qualitative research has been proposed as a method for understanding the social and cooperative aspects of software development. However, qualitative research is not easily combined with the improvement orientation of an engineering discipline. During the last 6 years, we have applied an approach we call ‘cooperative method development’, which combines qualitative social science fieldwork, with problem-oriented method, technique and process improvement. The action research based approach focusing on shop floor software development practices allows an understanding of how contextual contingencies influence the deployment and applicability of methods, processes and techniques. This article summarizes the experiences and discusses the further development of this approach based on several research projects in cooperation with industrial partners.
  •  
33.
  • Dyba, Tore, et al. (författare)
  • Qualitative research in software engineering
  • 2011
  • Ingår i: Empirical Software Engineering. - : Springer. - 1382-3256 .- 1573-7616. ; 16:4, s. 425-429
  • Tidskriftsartikel (refereegranskat)
  •  
34.
  • Engström, Emelie, et al. (författare)
  • How software engineering research aligns with design science : a review
  • 2020
  • Ingår i: Empirical Software Engineering. - : Springer Science and Business Media LLC. - 1573-7616 .- 1382-3256. ; 25:4
  • Tidskriftsartikel (refereegranskat)abstract
    • Background: Assessing and communicating software engineering research can be challenging. Design science is recognized as an appropriate research paradigm for applied research, but is rarely explicitly used as a way to present planned or achieved research contributions in software engineering. Applying the design science lens to software engineering research may improve the assessment and communication of research contributions. Aim: The aim of this study is 1) to understand whether the design science lens helps summarize and assess software engineering research contributions, and 2) to characterize different types of design science contributions in the software engineering literature. Method: In previous research, we developed a visual abstract template, summarizing the core constructs of the design science paradigm. In this study, we use this template in a review of a set of 38 award winning software engineering publications to extract, analyze and characterize their design science contributions. Results: We identified five clusters of papers, classifying them according to their different types of design science contributions. Conclusions: The design science lens helps emphasize the theoretical contribution of research output—in terms of technological rules—and reflect on the practical relevance, novelty and rigor of the rules proposed by the research.
  •  
35.
  • Ernst, Neil A., et al. (författare)
  • Understanding peer review of software engineering papers
  • 2021
  • Ingår i: Empirical Software Engineering. - : Springer. - 1382-3256 .- 1573-7616. ; 26:5
  • Tidskriftsartikel (refereegranskat)abstract
    • Context: Peer review is a key activity intended to preserve the quality and integrity of scientific publications. However, in practice it is far from perfect. Objective: We aim at understanding how reviewers, including those who have won awards for reviewing, perform their reviews of software engineering papers to identify both what makes a good reviewing approach and what makes a good paper. Method: We first conducted a series of interviews with recognised reviewers in the software engineering field. Then, we used the results of those interviews to develop a questionnaire used in an online survey and sent out to reviewers from well-respected venues covering a number of software engineering disciplines, some of whom had won awards for their reviewing efforts. Results: We analyzed the responses from the interviews and from 175 reviewers who completed the online survey (including both reviewers who had won awards and those who had not). We report on several descriptive results, including: Nearly half of award-winners (45%) are reviewing 20+ conference papers a year, while 28% of non-award winners conduct that many. The majority of reviewers (88%) are taking more than two hours on journal reviews. We also report on qualitative results. Our findings suggest that the most important criteria of a good review is that it should be factual and helpful, which ranked above others such as being detailed or kind. The most important features of papers that result in positive reviews are a clear and supported validation, an interesting problem, and novelty. Conversely, negative reviews tend to result from papers that have a mismatch between the method and the claims and from papers with overly grandiose claims. Further insights include, if not limited to, that reviewers view data availability and its consistency as being important or that authors need to make their contribution of the work very clear in their paper. Conclusions: Based on the insights we gained through our study, we conclude our work by compiling a proto-guideline for reviewing. One hope we associate with our work is to contribute to the ongoing debate and contemporary effort to further improve our peer review models in the future. © 2021, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
  •  
36.
  • Falessi, Davide, et al. (författare)
  • Empirical software engineering experts on the use of students and professionals in experiments
  • 2018
  • Ingår i: Empirical Software Engineering. - : Springer New York LLC. - 1382-3256 .- 1573-7616. ; 23:1, s. 452-489
  • Tidskriftsartikel (refereegranskat)abstract
    • [Context] Controlled experiments are an important empirical method to generate and validate theories. Many software engineering experiments are conducted with students. It is often claimed that the use of students as participants in experiments comes at the cost of low external validity while using professionals does not. [Objective] We believe a deeper understanding is needed on the external validity of software engineering experiments conducted with students or with professionals. We aim to gain insight about the pros and cons of using students and professionals in experiments. [Method] We performed an unconventional, focus group approach and a follow-up survey. First, during a session at ISERN 2014, 65 empirical researchers, including the seven authors, argued and discussed the use of students in experiments with an open mind. Afterwards, we revisited the topic and elicited experts’ opinions to foster discussions. Then we derived 14 statements and asked the ISERN attendees excluding the authors, to provide their level of agreement with the statements. Finally, we analyzed the researchers’ opinions and used the findings to further discuss the statements. [Results] Our survey results showed that, in general, the respondents disagreed with us about the drawbacks of professionals. We, on the contrary, strongly believe that no population (students, professionals, or others) can be deemed better than another in absolute terms. [Conclusion] Using students as participants remains a valid simplification of reality needed in laboratory contexts. It is an effective way to advance software engineering theories and technologies but, like any other aspect of study settings, should be carefully considered during the design, execution, interpretation, and reporting of an experiment. The key is to understand which developer population portion is being represented by the participants in an experiment. Thus, a proposal for describing experimental participants is put forward.
  •  
37.
  •  
38.
  • Fernandez-Saez, A. M., et al. (författare)
  • An industrial case study on the use of UML in software maintenance and its perceived benefits and hurdles
  • 2018
  • Ingår i: Empirical Software Engineering. - : Springer Science and Business Media LLC. - 1382-3256 .- 1573-7616. ; 23:6, s. 3281-3345
  • Tidskriftsartikel (refereegranskat)abstract
    • UML is a commonly-used graphical language for the modelling of software. Works regarding UML's effectiveness have studied projects that develop software systems from scratch. Yet the maintenance of software consumes a large share of the overall time and effort required to develop software systems. This study, therefore, focuses on the use of UML in software maintenance. We wish to elicit the practices of the software modelling used during maintenance in industry and understand what are perceived as hurdles and benefits when using modelling. In order to achieve a high level of realism, we performed a case study in a multinational company's ICT department. The analysis is based on 31 interviews with employees who work on software maintenance projects. The interviewees played different roles and provided complementary views about the use, hurdles and benefits of software modelling and the use of UML. Our study uncovered a broad range of modelling-related practices, which are presented in a theoretical framework that illustrates how these practices are linked to the specific goals and context of software engineering projects. We present a list of recommended practices that contribute to the increased effectiveness of software modelling. The use of software modelling notations (like UML) is considered beneficial for software maintenance, but needs to be tailored to its context. Various practices that contribute to the effective use of modelling are commonly overlooked, suggesting that a more conscious holistic approach with which to integrate modelling practices into the overall software engineering approach is required.
  •  
39.
  • Fernandez-Saez, A. M., et al. (författare)
  • Does the level of detail of UML diagrams affect the maintainability of source code? A family of experiments
  • 2016
  • Ingår i: Empirical Software Engineering. - : Springer Science and Business Media LLC. - 1382-3256 .- 1573-7616. ; 21:1, s. 212-259
  • Tidskriftsartikel (refereegranskat)abstract
    • Although the UML is considered to be the de facto standard notation with which to model software, there is still resistance to model-based development. UML modeling is perceived to be expensive and not necessarily cost-effective. It is therefore important to collect empirical evidence concerning the conditions under which the use of UML makes a practical difference. The focus of this paper is to investigate whether and how the Level of Detail (LoD) of UML diagrams impacts on the performance of maintenance tasks in a model-centric approach. A family of experiments consisting of one controlled experiment and three replications has therefore been carried out with 81 students with different abilities and levels of experience from 3 countries (The Netherlands, Spain, and Italy). The analysis of the results of the experiments indicates that there is no strong statistical evidence as to the influence of different LoDs. The analysis suggests a slight tendency toward better results when using low LoD UML diagrams, especially if used for the modification of the source code, while a high LoD would appear to be helpful in understanding the system. The participants in our study also favored low LoD diagrams because they were perceived as easier to read. Although the participants expressed a preference for low LoD diagrams, no statistically significant conclusions can be drawn from the set of experiments. One important finding attained from this family of experiments was that the participants minimized or avoided the use of UML diagrams, regardless of their LoD. This effect was probably the result of using small software systems from well-known domains as experimental materials.
  •  
40.
  • García Gonzalo, Sergio, 1989, et al. (författare)
  • Software variability in service robotics
  • 2023
  • Ingår i: Empirical Software Engineering. - : Springer Science and Business Media LLC. - 1573-7616 .- 1382-3256. ; 28:2
  • Tidskriftsartikel (refereegranskat)abstract
    • Robots artificially replicate human capabilities thanks to their software, the main embodiment of intelligence. However, engineering robotics software has become increasingly challenging. Developers need expertise from different disciplines as well as they are faced with heterogeneous hardware and uncertain operating environments. To this end, the software needs to be variable—to customize robots for different customers, hardware, and operating environments. However, variability adds substantial complexity and needs to be managed—yet, ad hoc practices prevail in the robotics domain, challenging effective software reuse, maintenance, and evolution. To improve the situation, we need to enhance our empirical understanding of variability in robotics. We present a multiple-case study on software variability in the vibrant and challenging domain of service robotics. We investigated drivers, practices, methods, and challenges of variability from industrial companies building service robots. We analyzed the state-of-the-practice and the state-of-the-art—the former via an experience report and eleven interviews with two service robotics companies; the latter via a systematic literature review. We triangulated from these sources, reporting observations with actionable recommendations for researchers, tool providers, and practitioners. We formulated hypotheses trying to explain our observations, and also compared the state-of-the-art from the literature with the-state-of-the-practice we observed in our cases. We learned that the level of abstraction in robotics software needs to be raised for simplifying variability management and software integration, while keeping a sufficient level of customization to boost efficiency and effectiveness in their robots’ operation. Planning and realizing variability for specific requirements and implementing robust abstractions permit robotic applications to operate robustly in dynamic environments, which are often only partially known and controllable. With this aim, our companies use a number of mechanisms, some of them based on formalisms used to specify robotic behavior, such as finite-state machines and behavior trees. To foster software reuse, the service robotics domain will greatly benefit from having software components—completely decoupled from hardware—with harmonized and standardized interfaces, and organized in an ecosystem shared among various companies.
  •  
41.
  • Garousi, Vahid, et al. (författare)
  • Characterizing industry-academia collaborations in software engineering : evidence from 101 projects
  • 2019
  • Ingår i: Empirical Software Engineering. - : Springer New York LLC. - 1382-3256 .- 1573-7616. ; 24:4, s. 2540-2602
  • Tidskriftsartikel (refereegranskat)abstract
    • Research collaboration between industry and academia supports improvement and innovation in industry and helps ensure the industrial relevance of academic research. However, many researchers and practitioners in the community believe that the level of joint industry-academia collaboration (IAC) projects in Software Engineering (SE) research is relatively low, creating a barrier between research and practice. The goal of the empirical study reported in this paper is to explore and characterize the state of IAC with respect to industrial needs, developed solutions, impacts of the projects and also a set of challenges, patterns and anti-patterns identified by a recent Systematic Literature Review (SLR) study. To address the above goal, we conducted an opinion survey among researchers and practitioners with respect to their experience in IAC. Our dataset includes 101 data points from IAC projects conducted in 21 different countries. Our findings include: (1) the most popular topics of the IAC projects, in the dataset, are: software testing, quality, process, and project managements; (2) over 90% of IAC projects result in at least one publication; (3) almost 50% of IACs are initiated by industry, busting the myth that industry tends to avoid IACs; and (4) 61% of the IAC projects report having a positive impact on their industrial context, while 31% report no noticeable impacts or were “not sure”. To improve this situation, we present evidence-based recommendations to increase the success of IAC projects, such as the importance of testing pilot solutions before using them in industry. This study aims to contribute to the body of evidence in the area of IAC, and benefit researchers and practitioners. Using the data and evidence presented in this paper, they can conduct more successful IAC projects in SE by being aware of the challenges and how to overcome them, by applying best practices (patterns), and by preventing anti-patterns. © 2019, The Author(s).
  •  
42.
  • Garousi, Vahid, et al. (författare)
  • Practical relevance of software engineering research : synthesizing the community’s voice
  • 2020
  • Ingår i: Empirical Software Engineering. - : Springer. - 1382-3256 .- 1573-7616. ; 25:3, s. 1687-1754
  • Tidskriftsartikel (refereegranskat)abstract
    • Software engineering (SE) research should be relevant to industrial practice. There have been regular discussions in the SE community on this issue since the 1980’s, led by pioneers such as Robert Glass. As we recently passed the milestone of “50 years of software engineering”, some recent positive efforts have been made in this direction, e.g., establishing “industrial” tracks in several SE conferences. However, many researchers and practitioners believe that we, as a community, are still struggling with research relevance and utility. The goal of this paper is to synthesize the evidence and experience-based opinions shared on this topic so far in the SE community, and to encourage the community to further reflect and act on the research relevance. For this purpose, we have conducted a Multi-vocal Literature Review (MLR) of 54 systematically-selected sources (papers and non peer-reviewed articles). Instead of relying on and considering the individual opinions on research relevance, mentioned in each of the sources, the MLR aims to synthesize and provide the “holistic” view on the topic. The highlights of our MLR findings are as follows. The top three root causes of low relevance, discussed in the community, are: (1) Researchers having simplistic views (or wrong assumptions) about SE in practice; (2) Lack of connection with industry; and (3) Wrong identification of research problems. The top three suggestions for improving research relevance are: (1) Using appropriate research approaches such as action-research; (2) Choosing relevant (practical) research problems; and (3) Collaborating with industry. By synthesizing all the discussions on this important topic so far, this paper aims to encourage further discussions and actions in the community to increase our collective efforts to improve the research relevance. Furthermore, we raise the need for empirically-grounded and rigorous studies on the relevance problem in SE research, as carried out in other fields such as management science.
  •  
43.
  • Ginelli, Davide, et al. (författare)
  • A comprehensive study of code-removal patches in automated program repair
  • 2022
  • Ingår i: Empirical Software Engineering. - : Springer Nature. - 1382-3256 .- 1573-7616. ; 27:4
  • Tidskriftsartikel (refereegranskat)abstract
    • Automatic Program Repair (APR) techniques can promisingly help reduce the cost of debugging. Many relevant APR techniques follow the generate-and-validate approach, that is, the faulty program is iteratively modified with different change operators and then validated with a test suite until a plausible patch is generated. In particular, Kali is a generate-and-validate technique developed to investigate the possibility of generating plausible patches by only removing code. Former studies show that indeed Kali successfully addressed several faults. This paper addresses the single and particular case of code-removal patches in automated program repair. We investigate the reasons and the scenarios that make their creation possible, and the relationship with patches implemented by developers. Our study reveals that code-removal patches are often insufficient to fix bugs, and proposes a comprehensive taxonomy of code-removal patches that provides evidence of the problems that may affect test suites, opening new opportunities for researchers in the field of automatic program repair.
  •  
44.
  • Grindal, Mats, et al. (författare)
  • An Evaluation of Combination Strategies for Test Case Selection
  • 2006
  • Ingår i: Empirical Software Engineering. - : Springer. - 1382-3256 .- 1573-7616. ; 11:4, s. 583-611
  • Tidskriftsartikel (refereegranskat)abstract
    • This paper presents results from a comparative evaluation of five combination strategies. Combination strategies are test case selection methods that combine “interesting” values of the input parameters of a test subject to form test cases. This research comparatively evaluated five combination strategies; the All Combination strategy (AC), the Each Choice strategy (EC), the Base Choice strategy (BC), Orthogonal Arrays (OA) and the algorithm from the Automatic Efficient Test Generator (AETG). AC satisfies n-wise coverage, EC and BC satisfy 1-wise coverage, and OA and AETG satisfy pair-wise coverage. The All Combinations strategy was used as a “gold standard” strategy; it subsumes the others but is usually too expensive for practical use. The others were used in an experiment that used five programs seeded with 128 faults. The combination strategies were evaluated with respect to the number of test cases, the number of faults found, failure size, and number of decisions covered. The strategy that requires the least number of tests, Each Choice, found the smallest number of faults. Although the Base Choice strategy requires fewer test cases than Orthogonal Arrays and AETG, it found as many faults. Analysis also shows some properties of the combination strategies that appear significant. The two most important results are that the Each Choice strategy is unpredictable in terms of which faults will be revealed, possibly indicating that faults are found by chance, and that the Base Choice and the pair-wise combination strategies to some extent target different types of faults.
  •  
45.
  • Halin, Axel, et al. (författare)
  • Test them all, is it worth it? : Assessing configuration sampling on the JHipster Web development stack
  • 2019
  • Ingår i: Empirical Software Engineering. - : Springer Nature. - 1382-3256 .- 1573-7616. ; 24:2, s. 674-717
  • Tidskriftsartikel (refereegranskat)abstract
    • Many approaches for testing configurable software systems start from the same assumption: it is impossible to test all configurations. This motivated the definition of variability-aware abstractions and sampling techniques to cope with large configuration spaces. Yet, there is no theoretical barrier that prevents the exhaustive testing of all configurations by simply enumerating them if the effort required to do so remains acceptable. Not only this: we believe there is a lot to be learned by systematically and exhaustively testing a configurable system. In this case study, we report on the first ever endeavour to test all possible configurations of the industry-strength, open source configurable software system JHipster, a popular code generator for web applications. We built a testing scaffold for the 26,000+ configurations of JHipster using a cluster of 80 machines during 4 nights for a total of 4,376 hours (182 days) CPU time. We find that 35.70% configurations fail and we identify the feature interactions that cause the errors. We show that sampling strategies (like dissimilarity and 2-wise): (1) are more effective to find faults than the 12 default configurations used in the JHipster continuous integration; (2) can be too costly and exceed the available testing budget. We cross this quantitative analysis with the qualitative assessment of JHipster's lead developers.
  •  
46.
  • Hatamian, Majid, et al. (författare)
  • A privacy and security analysis of early-deployed COVID-19 contact tracing Android apps
  • 2021
  • Ingår i: Empirical Software Engineering. - : Springer Nature. - 1382-3256 .- 1573-7616. ; 26:3
  • Tidskriftsartikel (refereegranskat)abstract
    • As this article is being drafted, the SARS-CoV-2/COVID-19 pandemic is causing harm and disruption across the world. Many countries aimed at supporting their contact tracers with the use of digital contact tracing apps in order to manage and control the spread of the virus. Their idea is the automatic registration of meetings between smartphone owners for the quicker processing of infection chains. To date, there are many contact tracing apps that have already been launched and used in 2020. There has been a lot of speculations about the privacy and security aspects of these apps and their potential violation of data protection principles. Therefore, the developers of these apps are constantly criticized because of undermining users’ privacy, neglecting essential privacy and security requirements, and developing apps under time pressure without considering privacy- and security-by-design. In this study, we analyze the privacy and security performance of 28 contact tracing apps available on Android platform from various perspectives, including their code’s privileges, promises made in their privacy policies, and static and dynamic performances. Our methodology is based on the collection of various types of data concerning these 28 apps, namely permission requests, privacy policy texts, run-time resource accesses, and existing security vulnerabilities. Based on the analysis of these data, we quantify and assess the impact of these apps on users’ privacy. We aimed at providing a quick and systematic inspection of the earliest contact tracing apps that have been deployed on multiple continents. Our findings have revealed that the developers of these apps need to take more cautionary steps to ensure code quality and to address security and privacy vulnerabilities. They should more consciously follow legal requirements with respect to apps’ permission declarations, privacy principles, and privacy policy contents.
  •  
47.
  •  
48.
  • Idowu, Samuel, 1985, et al. (författare)
  • Machine learning experiment management tools: a mixed-methods empirical study
  • 2024
  • Ingår i: Empirical Software Engineering. - 1573-7616 .- 1382-3256. ; 29:4
  • Tidskriftsartikel (refereegranskat)abstract
    • Machine Learning (ML) experiment management tools support ML practitioners and software engineers when building intelligent software systems. By managing large numbers of ML experiments comprising many different ML assets, they not only facilitate engineering ML models and ML-enabled systems, but also managing their evolution—for instance, tracing system behavior to concrete experiments when the model performance drifts. However, while ML experiment management tools have become increasingly popular, little is known about their effectiveness in practice, as well as their actual benefits and challenges. We present a mixed-methods empirical study of experiment management tools and the support they provide to users. First, our survey of 81 ML practitioners sought to determine the benefits and challenges of ML experiment management and of the existing tool landscape. Second, a controlled experiment with 15 student developers investigated the effectiveness of ML experiment management tools. We learned that 70% of our survey respondents perform ML experiments using specialized tools, while out of those who do not use such tools, 52% are unaware of experiment management tools or of their benefits. The controlled experiment showed that experiment management tools offer valuable support to users to systematically track and retrieve ML assets. Using ML experiment management tools reduced error rates and increased completion rates. By presenting a user’s perspective on experiment management tools, and the first controlled experiment in this area, we hope that our results foster the adoption of these tools in practice, as well as they direct tool builders and researchers to improve the tool landscape overall.
  •  
49.
  • Itkonen, Juha, et al. (författare)
  • Are test cases needed? Replicated comparison between exploratory and test-case-based software testing
  • 2014
  • Ingår i: Empirical Software Engineering. - : Springer Science and Business Media LLC. - 1573-7616 .- 1382-3256. ; 19:2, s. 303-342
  • Tidskriftsartikel (refereegranskat)abstract
    • Manual software testing is a widely practiced verification and validation method that is unlikely to fade away despite the advances in test automation. In the domain of manual testing, many practitioners advocate exploratory testing (ET), i.e., creative, experience-based testing without predesigned test cases, and they claim that it is more efficient than testing with detailed test cases. This paper reports a replicated experiment comparing effectiveness, efficiency, and perceived differences between ET and test-case-based testing (TCT) using 51 students as subjects, who performed manual functional testing on the jEdit text editor. Our results confirm the findings of the original study: 1) there is no difference in the defect detection effectiveness between ET and TCT, 2) ET is more efficient by requiring less design effort, and 3) TCT produces more false-positive defect reports than ET. Based on the small differences in the experimental design, we also put forward a hypothesis that the effectiveness of the TCT approach would suffer more than ET from time pressure. We also found that both approaches had distinctive issues: in TCT, the problems were related to correct abstraction levels of test cases, and the problems in ET were related to test design and logging of the test execution and results. Finally, we recognize that TCT has other benefits over ET in managing and controlling testing in large organizations.
  •  
50.
  • Ivarsson, Martin, 1980, et al. (författare)
  • A Method for Evaluating Rigor and Industrial Relevance of Technology Evaluations
  • 2011
  • Ingår i: Empirical Software Engineering. - : Springer verlag. - 1382-3256 .- 1573-7616. ; 16:3, s. 365-395
  • Tidskriftsartikel (refereegranskat)abstract
    • One of the main goals of an applied research field such as software engineering is the transfer and widespread use of research results in industry. To impact industry, researchers developing technologies in academia need to provide tangible evidence of the advantages of using them. This can be done trough step-wise validation, enabling researchers to gradually test and evaluate technologies to finally try them in real settings with real users and applications. The evidence obtained, together with detailed information on how the validation was conducted, offers rich decision support material for industry practitioners seeking to adopt new technologies and researchers looking for an empirical basis on which to build new or refined technologies. This paper presents model for evaluating the rigor and industrial relevance of technology evaluations in software engineering. The model is applied and validated in a comprehensive systematic literature review of evaluations of requirements engineering technologies published in software engineering journals. The aim is to show the applicability of the model and to characterize how evaluations are carried out and reported to evaluate the state-of-research. The review shows that the model can be applied to characterize evaluations in requirements engineering. The findings from applying the model also show that the majority of technology evaluations in requirements engineering lack both industrial relevance and rigor. In addition, the research field does not show any improvements in terms of industrial relevance over time.
  •  
Skapa referenser, mejla, bekava och länka
  • Resultat 1-50 av 132
Typ av publikation
tidskriftsartikel (132)
Typ av innehåll
refereegranskat (127)
övrigt vetenskapligt/konstnärligt (4)
populärvet., debatt m.m. (1)
Författare/redaktör
Runeson, Per (17)
Wohlin, Claes (14)
Monperrus, Martin (12)
Regnell, Björn (9)
Höst, Martin (6)
Berger, Thorsten, 19 ... (6)
visa fler...
Torkar, Richard, 197 ... (6)
Feldt, Robert, 1972 (6)
Bjarnason, Elizabeth (6)
Baudry, Benoit (6)
Šmite, Darja (5)
Engström, Emelie (5)
Thelin, Thomas (5)
Borg, Markus (5)
Chaudron, Michel, 19 ... (4)
Bell, R (4)
weyuker, elaine (4)
Staron, Miroslaw, 19 ... (3)
Unterkalmsteiner, Mi ... (3)
Wnuk, Krzysztof, 198 ... (3)
Feldt, Robert (3)
Gorschek, Tony, 1972 ... (3)
Alégroth, Emil, 1984 ... (3)
Petersen, Kai (3)
Pfahl, Dietmar (3)
Mendes, Emilia (3)
Juristo, Natalia (3)
Turhan, Burak (3)
Martínez, Matías (3)
Ostrand, T (3)
Prikladnicki, Rafael (2)
Mendez, Daniel (2)
Börstler, Jürgen (2)
Hebig, Regina (2)
Itkonen, Juha (2)
Andrews, Anneliese (2)
Moe, Nils Brede (2)
Steghöfer, Jan-Phili ... (2)
Strüber, Daniel, 198 ... (2)
Ali, Nour (2)
Baker, Sean (2)
O'Crowley, Ross (2)
Herold, Sebastian (2)
Buckley, Jim (2)
Mäntylä, Mika (2)
Andersson, Carina (2)
Rönkkö, Kari (2)
Ros, Rasmus (2)
Besker, Terese, 1970 (2)
Börstler, Jürgen, 19 ... (2)
visa färre...
Lärosäte
Blekinge Tekniska Högskola (47)
Lunds universitet (35)
Chalmers tekniska högskola (29)
Göteborgs universitet (26)
Kungliga Tekniska Högskolan (15)
Mälardalens universitet (6)
visa fler...
RISE (5)
Karlstads universitet (5)
Högskolan i Skövde (4)
Linköpings universitet (3)
Malmö universitet (3)
Linnéuniversitetet (3)
Örebro universitet (2)
Högskolan i Halmstad (1)
Stockholms universitet (1)
Jönköping University (1)
visa färre...
Språk
Engelska (132)
Forskningsämne (UKÄ/SCB)
Naturvetenskap (117)
Teknik (16)
Samhällsvetenskap (7)
Medicin och hälsovetenskap (2)
Humaniora (1)

År

Kungliga biblioteket hanterar dina personuppgifter i enlighet med EU:s dataskyddsförordning (2018), GDPR. Läs mer om hur det funkar här.
Så här hanterar KB dina uppgifter vid användning av denna tjänst.

 
pil uppåt Stäng

Kopiera och spara länken för att återkomma till aktuell vy