SwePub
Search the SwePub database



Search: WFRF:(Mäntylä Mika)

  • Result 1-16 of 16
1.
  • Ahmad, Azeem, 1984- (author)
  • Contributions to Improving Feedback and Trust in Automated Testing and Continuous Integration and Delivery
  • 2022
  • Doctoral thesis (other academic/artistic), abstract:
    • An integrated release version (also known as a release candidate in software engineering) is produced by merging, building, and testing code on a regular basis as part of the Continuous Integration and Continuous Delivery (CI/CD) practices. Several benefits, including improved software quality and shorter release cycles, have been claimed for CI/CD. On the other hand, recent research has uncovered a plethora of problems and bad practices related to CI/CD adoption, necessitating some optimization. Some of the problems addressed in this work include the ability to respond to practitioners’ questions and obtain quick and trustworthy feedback in CI/CD. To be more specific, our effort concentrated on: 1) identifying the information needs of software practitioners engaged in CI/CD; 2) adopting test optimization approaches to obtain faster feedback that are realistic for use in CI/CD environments without introducing excessive technical requirements; 3) identifying perceived causes and automated root cause analysis of test flakiness, thereby providing developers with guidance on how to resolve test flakiness; and 4) identifying challenges in addressing information needs and providing faster and more trustworthy feedback. The findings of the research reported in this thesis are based on data from three single-case studies and three multiple-case studies. The research uses quantitative and qualitative data collected via interviews, site visits, and workshops. To perform our analyses, we used data from firms producing embedded software as well as open-source repositories. The major research and practical contributions are the following. Information Needs: The initial contribution to research is a list of information needs in CI/CD. This list contains 27 frequently asked questions on continuous integration and continuous delivery by software practitioners. The identified information needs have been classified as related to testing, code & commit, confidence, bug, and artifacts. We investigated how companies deal with information needs, what tools they use to deal with them, and who is interested in them. We concluded that there is a discrepancy between the identified needs and the techniques employed to meet them. Since some information needs cannot be met by current tools, manual inspections are required, which adds time to the process. Information about code & commit, confidence level, and testing is the most frequently sought and most important information. Evaluation of Diversity Based Techniques/Tool: The contribution is a detailed examination of diversity-based techniques using industry test cases to determine whether there is a difference between diversity functions in selecting integration-level automated tests. Additionally, we examine how diversity-based testing compares to other optimization techniques used in industry in terms of fault detection rates, feature coverage, and execution time. This enables us to observe how coverage changes when we run fewer test cases. We concluded that some of the techniques can eliminate up to 85% of test cases (provided by the case company) while still covering all distinct features/requirements. The techniques are developed and made available as an open-source tool for further research and application. Test Flakiness Detection, Prediction & Automated Root Cause Analysis: We identified 19 factors that professionals perceive affect test flakiness. These perceived factors are divided into four categories: test code, system under test, CI/test infrastructure, and organizational. We concluded that some of the perceived factors of test flakiness in closed-source development are directly related to non-determinism, whereas other perceived factors concern different aspects, e.g., lack of good properties of a test case (i.e., small, simple and robust), deviations from the established processes, etc. To see if the developers’ perceptions were in line with what they had labelled as flaky or not, we examined the test artifacts that were readily available. We verified that two of the identified perceived factors (i.e., test case size and simplicity) are indeed indicative of test flakiness. Furthermore, we proposed a lightweight technique named trace-back coverage to detect flaky tests. Trace-back coverage was combined with other factors such as test smells indicating test flakiness, flakiness frequency, and test case size to investigate the effect on revealing test flakiness. When all factors are taken into consideration, the precision of flaky test detection is increased from 57% (using a single factor) to 86% (using a combination of different factors).
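The diversity-based selection described in the abstract above (eliminating up to 85% of test cases while still covering all distinct features/requirements) is, in spirit, a coverage-preserving subset selection. The sketch below is only a hedged illustration of that idea, not the thesis' actual tool or algorithm; the test-to-feature mapping is invented.

```python
# Hedged sketch: greedy coverage-preserving test selection (illustrative only).
# Each test case maps to the features/requirements it exercises; we repeatedly
# pick the test that covers the most not-yet-covered features until all are covered.

def select_covering_tests(coverage: dict[str, set[str]]) -> list[str]:
    """Return a small subset of test names that still covers every feature."""
    uncovered = set().union(*coverage.values())
    selected = []
    while uncovered:
        best = max(coverage, key=lambda t: len(coverage[t] & uncovered))
        if not coverage[best] & uncovered:
            break  # remaining features are not covered by any test
        selected.append(best)
        uncovered -= coverage[best]
    return selected

# Invented example: 5 tests exercising 4 features.
coverage = {
    "t1": {"login", "payment"},
    "t2": {"login"},
    "t3": {"search", "payment"},
    "t4": {"profile"},
    "t5": {"search"},
}
print(select_covering_tests(coverage))  # ['t1', 't3', 't4'] covers every feature
```

In a real CI/CD setting the selection would be driven by the diversity functions and industrial test data studied in the thesis; the greedy set-cover step here only illustrates why large reductions are possible when many tests exercise overlapping features.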
2.
  • Ali, Nauman Bin, et al. (author)
  • Testing highly complex system of systems : An industrial case study
  • 2012
  • In: [Host publication title missing]. - Lund : ACM/IEEE. - 9781450310567 ; , s. 211-220
  • Conference paper (peer-reviewed), abstract:
    • Systems of systems (SoS) are highly complex and are integrated on multiple levels (unit, component, system, system of systems). Many of the characteristics of SoS (such as operational and managerial independence, integration of systems into a system of systems, SoS comprised of complex systems) make their development and testing challenging. Contribution: This paper provides an understanding of SoS testing in large-scale industry settings with respect to challenges and how to address them. Method: The research method used is case study research. As data collection methods we used interviews, documentation, and fault slippage data. Results: We identified challenges related to SoS with respect to fault slippage, test turn-around time, and test maintainability. We also classified the testing challenges into general testing challenges, challenges amplified by SoS, and challenges that are SoS specific. Interestingly, the interviewees agreed on the challenges, even though we sampled them with diversity in mind, which meant that the number of interviews conducted was sufficient to answer our research questions. We also identified solution proposals to the challenges, categorized under four classes: developer quality assurance, function test, testing at all levels, and requirements engineering and communication. Conclusion: We conclude that, although over half of the challenges we identified can be categorized as general testing challenges, SoS still have their own unique and amplified challenges stemming from SoS characteristics. Furthermore, interviews and fault slippage data pointed to different areas of the software process that should be improved, which indicates that using only one of these methods would have led to an incomplete picture of the challenges in the case company.
3.
  • Borg, Markus, et al. (author)
  • A Replicated Study on Duplicate Detection: Using Apache Lucene to Search Among Android Defects
  • 2014
  • In: [Host publication title missing]. - New York, NY, USA : ACM.
  • Conference paper (peer-reviewed), abstract:
    • Context: Duplicate detection is a fundamental part of issue management. Systems able to predict whether a new defect report will be closed as a duplicate may decrease costs by limiting rework and collecting related pieces of information. Goal: Our work explores using Apache Lucene for large-scale duplicate detection based on textual content. Also, we evaluate the previous claim that results are improved if the title is weighted as more important than the description. Method: We conduct a conceptual replication of a well-cited study conducted at Sony Ericsson, using Lucene for searching in the public Android defect repository. In line with the original study, we explore how varying the weighting of the title and the description affects the accuracy. Results: We show that Lucene obtains the best results when the defect report title is weighted three times higher than the description, a larger difference than has previously been acknowledged. Conclusions: Our work shows the potential of using Lucene as a scalable solution for duplicate detection.
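The weighting result above (title counted three times as important as the description) can be illustrated with a simple similarity model. The sketch below uses scikit-learn TF-IDF cosine similarity as a stand-in for Apache Lucene and is not the study's actual setup; the defect reports and field handling are invented for illustration.

```python
# Hedged sketch: title-vs-description weighting for duplicate detection.
# scikit-learn TF-IDF is used here as a stand-in for Apache Lucene; only the
# 3:1 title weight mirrors the ratio reported in the abstract above.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

TITLE_WEIGHT = 3.0  # title weighted three times higher than the description

reports = [  # invented defect reports: (title, description)
    ("Crash on rotate", "App crashes when the screen is rotated during playback"),
    ("Screen rotation crash", "Rotating the device while playing video crashes the app"),
    ("Battery drain", "Battery drains quickly when GPS is enabled"),
]
titles = [t for t, _ in reports]
descriptions = [d for _, d in reports]

# Separate vectorizers so the two fields can be scored and weighted independently.
title_sim = cosine_similarity(TfidfVectorizer().fit_transform(titles))
desc_sim = cosine_similarity(TfidfVectorizer().fit_transform(descriptions))
combined = (TITLE_WEIGHT * title_sim + desc_sim) / (TITLE_WEIGHT + 1.0)

new_report = 0  # treat report 0 as the incoming report
candidates = sorted(
    ((combined[new_report][j], j) for j in range(len(reports)) if j != new_report),
    reverse=True,
)
print(candidates)  # the highest-scoring candidate (report 1) is the likely duplicate
```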
4.
  • Engström, Emelie, et al. (author)
  • Supporting Regression Test Scoping with Visual Analytics
  • 2014
  • In: [Host publication title missing]. ; , s. 283-292
  • Conference paper (peer-reviewed), abstract:
    • Test managers have to repeatedly select test cases for test activities during the evolution of large software systems. Researchers have widely studied automated test scoping, but have not fully investigated decision support with human interaction. We previously proposed the introduction of visual analytics for this purpose. Aim: In this empirical study we investigate how to design such decision support. Method: We explored the use of visual analytics using heat maps of historical test data for test scoping support by letting test managers evaluate prototype visualizations in three focus groups with a total of nine industrial test experts. Results: All test managers in the study found the visual analytics useful for supporting test planning. However, our results show that different tasks and contexts require different types of visualizations. Conclusion: Important properties for test planning support are: the ability to get an overview of testing from different perspectives, the ability to filter and zoom to compare subsets of the testing with respect to various attributes, and the ability to manipulate the subset under analysis by selecting and deselecting test cases. Our results may be used to support the introduction of visual test analytics in practice.
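A heat map over historical test executions, of the kind evaluated in the focus groups above, is straightforward to prototype with standard plotting libraries. The sketch below is only an illustration with invented pass/fail data, not the study's prototype visualizations.

```python
# Hedged sketch: heat map of historical test outcomes (test case x release).
# The pass/fail history is randomly generated; a real tool would load it from
# the test management system.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
history = (rng.random((20, 10)) > 0.2).astype(int)  # 1 = pass, 0 = fail

fig, ax = plt.subplots(figsize=(6, 4))
im = ax.imshow(history, cmap="RdYlGn", aspect="auto", vmin=0, vmax=1)
ax.set_xlabel("Release")
ax.set_ylabel("Test case")
ax.set_title("Historical test outcomes (green = pass, red = fail)")
fig.colorbar(im, ax=ax, label="Outcome")
plt.tight_layout()
plt.show()
```

Filtering and zooming, which the test managers asked for, would sit on top of such a matrix (for example, selecting rows by subsystem or columns by release range).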
5.
  • Garousi, Vahid, et al. (author)
  • Benefitting from the Grey Literature in Software Engineering Research
  • 2020
  • In: Contemporary Empirical Methods in Software Engineering. - Cham : Springer Nature. - 9783030324889 ; , s. 385-413
  • Book chapter (peer-reviewed), abstract:
    • Researchers generally place the most trust in peer-reviewed, published information, such as journals and conference papers. By contrast, software engineering (SE) practitioners typically do not have the time, access, or expertise to review and benefit from such publications. As a result, practitioners are more likely to turn to other sources of information that they trust, e.g., trade magazines, online blog posts, survey results, or technical reports, collectively referred to as grey literature (GL). Furthermore, practitioners also share their ideas and experiences as GL, which can serve as a valuable data source for research. While GL itself is not a new topic in SE, using, benefitting from, and synthesizing knowledge from the GL in SE is a contemporary topic in empirical SE research, and we are seeing that researchers are increasingly benefitting from the knowledge available within GL. The goal of this chapter is to provide an overview of GL in SE, together with insights on how SE researchers can effectively use and benefit from the knowledge and evidence available in the vast amount of GL.
6.
  • Garousi, Vahid, et al. (author)
  • Characterizing industry-academia collaborations in software engineering : evidence from 101 projects
  • 2019
  • In: Empirical Software Engineering. - : Springer New York LLC. - 1382-3256 .- 1573-7616. ; 24:4, s. 2540-2602
  • Journal article (peer-reviewed), abstract:
    • Research collaboration between industry and academia supports improvement and innovation in industry and helps ensure the industrial relevance of academic research. However, many researchers and practitioners in the community believe that the level of joint industry-academia collaboration (IAC) projects in Software Engineering (SE) research is relatively low, creating a barrier between research and practice. The goal of the empirical study reported in this paper is to explore and characterize the state of IAC with respect to industrial needs, developed solutions, impacts of the projects, and also a set of challenges, patterns and anti-patterns identified by a recent Systematic Literature Review (SLR) study. To address the above goal, we conducted an opinion survey among researchers and practitioners with respect to their experience in IAC. Our dataset includes 101 data points from IAC projects conducted in 21 different countries. Our findings include: (1) the most popular topics of the IAC projects in the dataset are: software testing, quality, process, and project management; (2) over 90% of IAC projects result in at least one publication; (3) almost 50% of IACs are initiated by industry, busting the myth that industry tends to avoid IACs; and (4) 61% of the IAC projects report having a positive impact on their industrial context, while 31% report no noticeable impacts or were “not sure”. To improve this situation, we present evidence-based recommendations to increase the success of IAC projects, such as the importance of testing pilot solutions before using them in industry. This study aims to contribute to the body of evidence in the area of IAC, and benefit researchers and practitioners. Using the data and evidence presented in this paper, they can conduct more successful IAC projects in SE by being aware of the challenges and how to overcome them, by applying best practices (patterns), and by preventing anti-patterns. © 2019, The Author(s).
7.
  • Garousi, Vahid, et al. (author)
  • Guidelines for including grey literature and conducting multivocal literature reviews in software engineering
  • 2019
  • In: Information and Software Technology. - : Elsevier B.V.. - 0950-5849 .- 1873-6025. ; 106, s. 101-121
  • Journal article (peer-reviewed), abstract:
    • Context: A Multivocal Literature Review (MLR) is a form of a Systematic Literature Review (SLR) which includes the grey literature (e.g., blog posts, videos and white papers) in addition to the published (formal) literature (e.g., journal and conference papers). MLRs are useful for both researchers and practitioners since they provide summaries of both the state-of-the-art and -practice in a given area. MLRs are popular in other fields and have recently started to appear in software engineering (SE). As more MLR studies are conducted and reported, it is important to have a set of guidelines to ensure high quality of MLR processes and their results. Objective: There are several guidelines for conducting SLR studies in SE. However, several phases of MLRs differ from those of traditional SLRs, for instance with respect to the search process and source quality assessment. Therefore, SLR guidelines are only partially useful for conducting MLR studies. Our goal in this paper is to present guidelines on how to conduct MLR studies in SE. Method: To develop the MLR guidelines, we benefit from several inputs: (1) existing SLR guidelines in SE, (2) a literature survey of MLR guidelines and experience papers in other fields, and (3) our own experiences in conducting several MLRs in SE. We took the popular SLR guidelines of Kitchenham and Charters as the baseline and extended/adapted them to conduct MLR studies in SE. All derived guidelines are discussed in the context of an already-published MLR in SE as the running example. Results: The resulting guidelines cover all phases of conducting and reporting MLRs in SE, from the planning phase, over conducting the review, to the final reporting of the review. In particular, we believe that incorporating and adopting a vast set of experience-based recommendations from MLR guidelines and experience papers in other fields have enabled us to propose a set of guidelines with solid foundations. Conclusion: Having been developed on the basis of several types of experience and evidence, the provided MLR guidelines will support researchers to effectively and efficiently conduct new MLRs in any area of SE. The authors recommend that researchers utilize these guidelines in their MLR studies and then share their lessons learned and experiences. © 2018
8.
  • Garousi, Vahid, et al. (author)
  • Introduction to the Special Issue on : Grey Literature and Multivocal Literature Reviews (MLRs) in software engineering
  • 2022
  • In: Information and Software Technology. - : Elsevier B.V.. - 0950-5849 .- 1873-6025. ; 141
  • Journal article (peer-reviewed), abstract:
    • In parallel to academic (peer-reviewed) literature (e.g., journal and conference papers), an enormous amount of grey literature (GL) has accumulated since the inception of software engineering (SE). GL is often defined as “literature that is not formally published in sources such as books or journal articles”, e.g., in the form of trade magazines, online blog posts, technical reports, and online videos such as tutorial and presentation videos. GL is typically produced by SE practitioners. We have observed that researchers are increasingly using and benefitting from the knowledge available within GL. Related to the notion of GL is the notion of Multivocal Literature Reviews (MLRs) in SE, i.e., an MLR is a form of a Systematic Literature Review (SLR) which includes knowledge and/or evidence from the GL in addition to the peer-reviewed literature. MLRs are useful for both researchers and practitioners because they provide summaries of both the state-of-the-art and -practice in a given area. MLRs are popular in other fields and have started to appear in the SE community. It is timely, then, for a Special Issue (SI) focusing on GL and MLRs in SE. From the pool of 13 submitted papers, and after following a rigorous peer review process, seven papers were accepted for this SI. In this introduction we provide a brief overview of GL and MLRs in SE, and then a brief summary of the seven papers published in this SI. © 2021
9.
  • Itkonen, Juha, et al. (author)
  • Are test cases needed? Replicated comparison between exploratory and test-case-based software testing
  • 2014
  • In: Empirical Software Engineering. - : Springer Science and Business Media LLC. - 1573-7616 .- 1382-3256. ; 19:2, s. 303-342
  • Journal article (peer-reviewed), abstract:
    • Manual software testing is a widely practiced verification and validation method that is unlikely to fade away despite the advances in test automation. In the domain of manual testing, many practitioners advocate exploratory testing (ET), i.e., creative, experience-based testing without predesigned test cases, and they claim that it is more efficient than testing with detailed test cases. This paper reports a replicated experiment comparing effectiveness, efficiency, and perceived differences between ET and test-case-based testing (TCT) using 51 students as subjects, who performed manual functional testing on the jEdit text editor. Our results confirm the findings of the original study: 1) there is no difference in the defect detection effectiveness between ET and TCT, 2) ET is more efficient by requiring less design effort, and 3) TCT produces more false-positive defect reports than ET. Based on the small differences in the experimental design, we also put forward a hypothesis that the effectiveness of the TCT approach would suffer more than ET from time pressure. We also found that both approaches had distinctive issues: in TCT, the problems were related to correct abstraction levels of test cases, and the problems in ET were related to test design and logging of the test execution and results. Finally, we recognize that TCT has other benefits over ET in managing and controlling testing in large organizations.
10.
  • Kasoju, Abhinaya, et al. (author)
  • Analyzing an automotive testing process with evidence-based software engineering
  • 2013
  • In: Information and Software Technology. - : Elsevier BV. - 0950-5849 .- 1873-6025. ; 55:7, s. 1237-1259
  • Journal article (peer-reviewed), abstract:
    • Context: Evidence-based software engineering (EBSE) provides a process for solving practical problems based on a rigorous research approach. The primary focus so far has been on mapping and aggregating evidence through systematic reviews. Objectives: We extend existing work on evidence-based software engineering by using the EBSE process in an industrial case to help an organization improve its automotive testing process. With this we contribute by (1) providing experiences of using evidence-based processes to analyze a real-world automotive test process and (2) providing evidence of challenges and related solutions for automotive software testing processes. Methods: In this study we perform an in-depth investigation of an automotive test process using an extended EBSE process including case study research (to gain an understanding of practical questions to define a research scope), systematic literature review (to identify solutions through systematic literature), and value stream mapping (to map out an improved automotive test process based on the current situation and improvement suggestions identified). These are followed by reflections on the EBSE process used. Results: In the first step of the EBSE process we identified 10 challenge areas with a total of 26 individual challenges. For 15 out of those 26 challenges our domain-specific systematic literature review identified solutions. Based on the input from the challenges and the solutions, we created a value stream map of the current and future process. Conclusions: Overall, we found that the evidence-based process as presented in this study helps in the technology transfer of research results to industry, but at the same time some challenges lie ahead (e.g., scoping systematic reviews to focus more on concrete industry problems, and understanding strategies for conducting EBSE with respect to effort and quality of the evidence). (C) 2013 Elsevier B.V. All rights reserved.
11.
  • Mäntylä, Mika, et al. (author)
  • How many individuals to use in a QA task with fixed total effort?
  • 2012
  • Conference paper (peer-reviewed), abstract:
    • Increasing the number of persons working on quality assurance (QA) tasks, e.g., reviews and testing, increases the number of defects detected - but it also increases the total effort unless effort is controlled with fixed effort budgets. Our research investigates how QA tasks should be configured regarding two parameters, i.e., time and number of people. We define an optimization problem to answer this question. As a core element of the optimization problem we discuss and describe how defect detection probability should be modeled as a function of time. We apply the formulas used in the definition of the optimization problem to empirical defect data of an experiment previously conducted with university students. The results show that the optimal choice of the number of persons depends on the actual defect detection probabilities of the individual defects over time, but also on the size of the effort budget. Future work will focus on generalizing the optimization problem to a larger set of parameters, including not only task time and number of persons but also experience and knowledge of the personnel involved, and methods and tools applied when performing a QA task.
12.
  • Mäntylä, Mika, et al. (author)
  • How many individuals to use in a QA task with fixed total effort? - Defect detection as a function of time
  • 2012
  • In: [Host publication title missing]. - 1938-6451. ; , s. 311-314
  • Conference paper (peer-reviewed), abstract:
    • Increasing the number of persons working on quality assurance (QA) tasks, e.g., reviews and testing, increases the number of defects detected – but it also increases the total effort unless effort is controlled with fixed effort budgets. Our research investigates how QA tasks should be configured regarding two parameters, i.e., time and number of people. We define an optimization problem to answer this question. As a core element of the optimization problem we discuss and describe how defect detection probability should be modeled as a function of time. We apply the formulas used in the definition of the optimization problem to empirical defect data of an experiment previously conducted with university students. The results show that the optimal choice of the number of persons depends on the actual defect detection probabilities of the individual defects over time, but also on the size of the effort budget. Future work will focus on generalizing the optimization problem to a larger set of parameters, including not only task time and number of persons but also experience and knowledge of the personnel involved, and methods and tools applied when performing a QA task.
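One plausible formalization of the trade-off discussed in the two abstracts above (an assumed model, not necessarily the authors' exact formulation): with a fixed effort budget B split evenly over n persons, each defect i is found by one person working for time t with probability p_i(t), and n is chosen to maximize the expected number of detected defects.

```latex
% Hedged formalization; the exponential detection model is an assumption,
% not taken from the papers above.
\[
  p_i(t) = 1 - e^{-\lambda_i t}
  \qquad \text{(probability that one person detects defect $i$ within time $t$)}
\]
\[
  E[D(n)] = \sum_{i=1}^{m} \Bigl( 1 - \bigl( 1 - p_i(B/n) \bigr)^{n} \Bigr),
  \qquad
  n^{*} = \arg\max_{1 \le n \le N} E[D(n)]
\]
```

With easily found defects (large λ_i) the expected detections saturate quickly, so adding persons mainly helps for the hard-to-find defects; this is consistent with the abstracts' conclusion that the optimal number of persons depends on both the individual detection probabilities over time and the size of the effort budget.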
13.
  • Mäntylä, Mika, et al. (author)
  • More testers - The effect of crowd size and time restriction in software testing
  • 2013
  • In: Information and Software Technology. - : Elsevier BV. - 0950-5849. ; 55:6, s. 986-1003
  • Journal article (peer-reviewed), abstract:
    • Context: The questions of how many individuals and how much time to use for a single testing task are critical in software verification and validation. In software review and usability evaluation contexts, positive effects of using multiple individuals for a task have been found, but software testing has not been studied from this viewpoint. Objective: We study how adding individuals and imposing time pressure affects the effectiveness and efficiency of manual testing tasks. We applied the group productivity theory from social psychology to characterize the type of software testing tasks. Method: We conducted an experiment where 130 students performed manual testing under two conditions, one with a time restriction and pressure, i.e., a 2-h fixed slot, and another where the individuals could use as much time as they needed. Results: We found evidence that manual software testing is an additive task with a ceiling effect, like software reviews and usability inspections. Our results show that a crowd of five time-restricted testers using 10 h in total detected 71% more defects than a single non-time-restricted tester using 9.9 h. Furthermore, we use the F-score measure from the information retrieval domain to analyze the optimal number of testers in terms of both effectiveness and validity of testing results. We suggest that future studies on verification and validation practices use the F-score to provide a more transparent view of the results. Conclusions: The results seem promising for time-pressured crowds by indicating that multiple time-pressured individuals deliver superior defect detection effectiveness in comparison to non-time-pressured individuals. However, caution is needed, as the limitations of this study need to be addressed in future works. Finally, we suggest that the size of the crowd used in software testing tasks should be determined based on the share of duplicate and invalid reports produced by the crowd and by the effectiveness of the duplicate-handling mechanisms. (C) 2012 Elsevier B.V. All rights reserved.
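The F-score analysis mentioned in the abstract above balances how many distinct real defects a crowd finds (recall) against how much of its output consists of duplicate or invalid reports (precision). Below is a hedged sketch of how such a curve over crowd size could be computed; all counts are invented, not the paper's data.

```python
# Hedged sketch: F-score over crowd size for crowd-based defect reporting.
# precision = distinct real defects found / all reports submitted by the crowd
# recall    = distinct real defects found / total real defects in the system
TOTAL_DEFECTS = 40  # assumed number of real defects in the system under test

# crowd size -> (distinct real defects found, total reports incl. duplicates/invalid)
observations = {1: (12, 14), 2: (19, 26), 5: (24, 52), 10: (28, 120)}

def f_score(found: int, reported: int, total_defects: int = TOTAL_DEFECTS) -> float:
    precision = found / reported
    recall = found / total_defects
    return 2 * precision * recall / (precision + recall)

for n, (found, reported) in observations.items():
    print(f"crowd size {n:>2}: F-score = {f_score(found, reported):.2f}")
```

With numbers like these the F-score rises and then falls as the crowd grows, which is the kind of transparent effectiveness-versus-validity view the authors argue for.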
14.
  • Mäntylä, Mika, et al. (author)
  • On Rapid Releases and Software Testing
  • 2013
  • In: [Host publication title missing]. - Eindhoven, Netherlands. - 1063-6773. ; , s. 20-29
  • Conference paper (peer-reviewed), abstract:
    • Large open and closed source organizations like Google, Facebook and Mozilla are migrating their products towards rapid releases. While this allows faster time-to-market and user feedback, it also implies less time for testing and bug fixing. Since initial research results indeed show that rapid releases fix proportionally fewer reported bugs than traditional releases, this paper investigates the changes in software testing effort after moving to rapid releases. We analyze the results of 312,502 execution runs of the 1,547 mostly manual system-level test cases of Mozilla Firefox from 2006 to 2012 (5 major traditional and 9 major rapid releases), and triangulate our findings with a Mozilla QA engineer. In rapid releases, testing has a narrower scope that enables deeper investigation of the features and regressions with the highest risk, while traditional releases run the whole test suite. Furthermore, rapid releases make it more difficult to build a large testing community, forcing Mozilla to increase contractor resources in order to sustain testing for rapid releases.
15.
  • Mäntylä, Mika, et al. (author)
  • Time pressure : a controlled experiment of test case development and requirements review
  • 2014
  • In: 36TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE 2014). - Hyderabad : ACM. - 9781450327565
  • Conference paper (peer-reviewed), abstract:
    • Time pressure is prevalent in the software industry, where high customer demands lead to increasingly tight deadlines. However, the effects of time pressure have received little attention in software engineering research. We performed a controlled experiment on time pressure with 97 observations from 54 subjects. Using a two-by-two crossover design, our subjects performed requirements review and test case development tasks. We found statistically significant evidence that time pressure increases efficiency in test case development (large effect size, Cohen’s d = 1.279) and in requirements review (medium effect size, Cohen’s d = 0.650). However, we found no statistically significant evidence that time pressure would decrease effectiveness or cause adverse effects on motivation, frustration, or perceived performance. We also investigated the role of knowledge but found no evidence of the mediating role of knowledge in time pressure as suggested by prior work, possibly due to our subjects. We conclude that applying moderate time pressure for limited periods could be used to increase efficiency in software engineering tasks that are well structured and straightforward.
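For reference, the Cohen's d values reported above are standardized mean differences; the standard formula with a pooled standard deviation is given below (this is the textbook definition, not a re-derivation of the paper's analysis).

```latex
\[
  d = \frac{\bar{x}_1 - \bar{x}_2}{s_p},
  \qquad
  s_p = \sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}}
\]
```

By the usual conventions (d of roughly 0.2 is small, 0.5 medium, and 0.8 large), d = 0.650 is a medium effect and d = 1.279 a large one, matching how the abstract characterizes them.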
16.
  • Rafi, Dudekula Mohammad, et al. (author)
  • Benefits and limitations of automated software testing : Systematic literature review and practitioner survey
  • 2012
  • In: 2012 7th International Workshop on Automation of Software Test (AST). - Zurich. - 9781467318211 ; , s. 36-42
  • Conference paper (peer-reviewed), abstract:
    • There is a documented gap between academic and practitioner views on software testing. This paper tries to close the gap by investigating both views regarding the benefits and limitations of test automation. The academic views are studied with a systematic literature review, while the practitioners’ views are assessed with a survey in which we received responses from 115 software professionals. The results of the systematic literature review show that the source of evidence regarding benefits and limitations is quite shallow, as only 25 papers provide such evidence. Furthermore, it was found that benefits often originated from stronger sources of evidence (experiments and case studies), while limitations often originated from experience reports. We believe that this is caused by publication bias towards positive results. The survey showed that the benefits of test automation were related to test reusability, repeatability, test coverage, and effort saved in test executions. The limitations were the high initial investment in automation setup, tool selection, and training. Additionally, 45% of the respondents agreed that the tools available on the market offer a poor fit for their needs. Finally, it was found that 80% of the practitioners disagreed with the vision that automated testing would fully replace manual testing.
