SwePub
Search the SwePub database


Result list for search "WFRF:(Leitner Philipp 1982)"

Search: WFRF:(Leitner Philipp 1982)

  • Results 1-50 of 57
1.
  • Samoaa, Hazem, 1991, et al. (author)
  • Batch Mode Deep Active Learning for Regression on Graph Data
  • 2023
  • In: Proceedings - 2023 IEEE International Conference on Big Data, BigData 2023, pp. 5904-5913
  • Conference paper (peer-reviewed), abstract:
    • Acquiring labelled data for machine learning tasks, for example, for software performance prediction, remains a resource-intensive task. This study extends our previous work by introducing a batch-mode deep active learning approach tailored for regression in graph-structured data. Our framework leverages the source code conversion into Flow Augmented-AST graphs (FA-AST), subsequently utilizing both supervised and unsupervised graph embeddings. In contrast to single-instance querying, the batch-mode paradigm adaptively selects clusters of unlabeled data for labelling. We deploy an array of base kernels, kernel transformations, and selection methods, informed by both Bayesian and non-Bayesian strategies, to enhance the sample efficiency of neural network regression. Our experimental evaluation, conducted on multiple real-world software performance datasets, demonstrates the efficacy of the batch mode deep active learning approach in achieving robust performance with a reduced labelling budget. The methodology scales effectively to larger datasets and requires minimal alterations to existing neural network architectures.
  •  
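The batch-mode selection idea above can be illustrated with a small, self-contained sketch: pick a diverse batch of unlabeled points by greedily taking the candidate farthest from everything already selected (k-center greedy, one common batch strategy). The embeddings and the selection criterion here are illustrative assumptions, not the paper's implementation:

```python
import math

def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_batch(unlabeled, labeled, batch_size):
    """Greedily pick `batch_size` diverse indices from `unlabeled`
    (k-center greedy: each pick maximizes the distance to the
    nearest already-selected embedding)."""
    selected = list(labeled)
    remaining = list(range(len(unlabeled)))
    batch = []
    for _ in range(batch_size):
        best = max(remaining,
                   key=lambda i: min(euclidean(unlabeled[i], s)
                                     for s in selected))
        batch.append(best)
        selected.append(unlabeled[best])
        remaining.remove(best)
    return batch

# Two tight clusters plus an outlier; one labeled point near cluster 1.
unlabeled = [(0.0, 0.0), (0.1, 0.1), (5.0, 5.0), (5.1, 5.0), (0.0, 5.0)]
labeled = [(0.05, 0.05)]
print(select_batch(unlabeled, labeled, 2))
```

A real pipeline would run this over learned FA-AST graph embeddings and send the selected batch to an oracle for labelling.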
2.
  • Samoaa, Hazem, 1991, et al. (author)
  • TEP-GNN: Accurate Execution Time Prediction of Functional Tests Using Graph Neural Networks
  • 2022
  • In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). - Cham : Springer International Publishing. - 1611-3349 .- 0302-9743. ; 13709 LNCS, pp. 464-479
  • Conference paper (peer-reviewed), abstract:
    • Predicting the performance of production code prior to actual execution is known to be highly challenging. In this paper, we propose a predictive model, dubbed TEP-GNN, which demonstrates that high-accuracy performance prediction is possible for the special case of predicting unit test execution times. TEP-GNN uses FA-ASTs, or flow-augmented ASTs, as a graph-based code representation approach, and predicts test execution times using a powerful graph neural network (GNN) deep learning model. We evaluate TEP-GNN using four real-life Java open source programs, based on 922 test files mined from the projects’ public repositories. We find that our approach achieves a high Pearson correlation of 0.789, considerably outperforming a baseline deep learning model. Our work demonstrates that FA-ASTs and GNNs are a feasible approach for predicting absolute performance values, and serves as an important intermediary step towards being able to predict the performance of arbitrary code prior to execution.
  •  
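The headline metric above, the Pearson correlation between predicted and measured execution times, is easy to compute from scratch; the prediction values below are made up for illustration, not data from the paper:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical predicted vs. measured test execution times (seconds).
predicted = [0.9, 2.1, 2.9, 4.2]
measured = [1.0, 2.0, 3.0, 4.0]
print(round(pearson(predicted, measured), 3))
```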
3.
  • Scheuner, Joel, 1991, et al. (author)
  • CrossFit: Fine-grained Benchmarking of Serverless Application Performance across Cloud Providers
  • 2022
  • In: Proceedings - 2022 IEEE/ACM 15th International Conference on Utility and Cloud Computing, UCC 2022, pp. 51-60
  • Conference paper (peer-reviewed), abstract:
    • Serverless computing emerged as a promising cloud computing paradigm for deploying cloud-native applications but raises new performance challenges. Existing performance evaluation studies focus on micro-benchmarking to measure an individual aspect of serverless functions, such as CPU speed, but lack an in-depth analysis of differences in application performance across cloud providers. This paper presents CrossFit, an approach for detailed and fair cross-provider performance benchmarking of serverless applications based on a provider-independent tracing model. Our case study demonstrates how detailed distributed tracing enables drill-down analysis to explain performance differences between two leading cloud providers, AWS and Azure. The results for an asynchronous application show that trigger time contributes most delay to the end-to-end latency and explains the main performance difference between cloud providers. Our results further reveal how increasing and bursty workloads affect performance stability, median latency, and tail latency.
  •  
4.
  • Scheuner, Joel, 1991, et al. (author)
  • TriggerBench: A Performance Benchmark for Serverless Function Triggers
  • 2022
  • In: Proceedings - 2022 IEEE International Conference on Cloud Engineering, IC2E 2022, pp. 96-103
  • Conference paper (peer-reviewed), abstract:
    • Serverless computing offers a scalable event-based paradigm for deploying managed cloud-native applications. Function triggers are essential building blocks in serverless, as they initiate any function execution. However, function triggering is insufficiently studied and inherently hard to measure given the distributed, ephemeral, and asynchronous nature of event-based function coordination. To address this gap, we present TriggerBench, a cross-provider benchmark for evaluating serverless function triggers based on distributed tracing. We evaluate the trigger latency (i.e., time to transition between two functions) of eight types of triggers in Microsoft Azure and three in AWS. Our results show that all triggers suffer from long tail latency, storage triggers introduce variable multi-second delays, and HTTP triggers are most suitable for interactive applications. Our insights can guide developers in choosing optimal event or messaging triggers for latency-sensitive applications. Researchers can extend TriggerBench to study the latency, scalability, and reliability of further trigger types and cloud providers.
  •  
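Trigger latency as defined above (the time to transition between two functions) falls out of distributed-trace timestamps; the span fields below are assumed names for illustration, not the TriggerBench schema:

```python
def trigger_latency(spans, source, target):
    """Gap (ms) between the source function finishing and the
    triggered target function starting, taken from trace spans."""
    by_name = {span["name"]: span for span in spans}
    return by_name[target]["start"] - by_name[source]["end"]

# Hypothetical trace: a producer writes to a queue, which triggers
# a consumer function (timestamps in epoch milliseconds).
spans = [
    {"name": "producer", "start": 1000, "end": 1040},
    {"name": "consumer", "start": 1290, "end": 1350},
]
print(trigger_latency(spans, "producer", "consumer"))
```

Repeating such measurements over many invocations yields the median and tail latencies the benchmark reports.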
5.
  • Cito, Jurgen, et al. (author)
  • Interactive Production Performance Feedback in the IDE
  • 2019
  • In: Proceedings - International Conference on Software Engineering. - 0270-5257. ; 2019-May, pp. 971-981
  • Conference paper (peer-reviewed), abstract:
    • Performance problems are hard to track and debug, especially when detected in production and originating from development. Software developers try to reproduce the performance problem locally and debug it in the source code. However, production environments are too different from what profiling and testing can simulate locally in development environments. Software developers need to consult production monitoring tools to reason about and debug the issue. We propose an integrated approach that constructs an In-IDE performance model from monitoring data from production environments. When developers change source code, we perform incremental analysis to update our performance model to reflect the impact of these changes. This allows us to provide performance feedback to developers in near real time to enable them to prevent performance problems from reaching production. We present a tool, PerformanceHat, an Eclipse plugin that we evaluated in a controlled experiment with 20 professional software developers, in which they work on software maintenance tasks using our approach and a representative baseline (Kibana). We found that developers were significantly faster in (1) detecting the performance problem, and (2) finding the root-cause of the problem. We conclude that our approach helps detect, prevent and debug performance problems faster.
  •  
6.
  • Cito, J., et al. (author)
  • Interactive Production Performance Feedback in the IDE
  • 2019
  • In: 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE). - IEEE. - 9781728108698, pp. 971-981
  • Book chapter (other academic/artistic), abstract:
    • Because of differences between development and production environments, many software performance problems are detected only after software enters production. We present PerformanceHat, a new system that uses profiling information from production executions to develop a global performance model suitable for integration into interactive development environments. PerformanceHat's ability to incrementally update this global model as the software is changed in the development environment enables it to deliver near real-time predictions of performance consequences reflecting the impact on the production environment. We build PerformanceHat as an Eclipse plugin and evaluate it in a controlled experiment with 20 professional software developers implementing several software maintenance tasks using our approach and a representative baseline (Kibana). Our results indicate that developers using PerformanceHat were significantly faster in (1) detecting the performance problem, and (2) finding the root-cause of the problem. These results provide encouraging evidence that our approach helps developers detect, prevent, and debug production performance problems during development before the problem manifests in production.
  •  
7.
  • Cito, Jürgen, et al. (author)
  • PerformanceHat - Augmenting Source Code with Runtime Performance Traces in the IDE
  • 2018
  • In: Proceedings - International Conference on Software Engineering. - New York, NY, USA : ACM. - 0270-5257.
  • Conference paper (peer-reviewed), abstract:
    • Performance problems observed in production environments that have their origin in program code are immensely hard to localize and prevent. Data that can help solve such problems is usually found in external dashboards and is thus not integrated into the software development process. We propose an approach that augments source code with runtime traces to tightly integrate runtime performance traces into developer workflows. Our goal is to create operational awareness of performance problems in developers' code and contextualize this information to tasks they are currently working on. We implemented this approach as an Eclipse IDE plugin for Java applications that is available as an open source project on GitHub. A video of PerformanceHat in action is online: https://youtu.be/fTBBiylRhag.
  •  
8.
  • Costa, D., et al. (author)
  • What's Wrong with My Benchmark Results? Studying Bad Practices in JMH Benchmarks
  • 2021
  • In: IEEE Transactions on Software Engineering. - : Institute of Electrical and Electronics Engineers (IEEE). - 0098-5589 .- 1939-3520 .- 2326-3881. ; 47:7, pp. 1452-1467
  • Journal article (peer-reviewed), abstract:
    • Microbenchmarking frameworks, such as Java's Microbenchmark Harness (JMH), allow developers to write fine-grained performance test suites at the method or statement level. However, due to the complexities of the Java Virtual Machine, developers often struggle with writing expressive JMH benchmarks which accurately represent the performance of such methods or statements. In this paper, we empirically study bad practices of JMH benchmarks. We present a tool that leverages static analysis to identify 5 bad JMH practices. Our empirical study of 123 open source Java-based systems shows that each of these 5 bad practices is prevalent in open source software. Further, we conduct several experiments to quantify the impact of each bad practice in multiple case studies, and find that bad practices often significantly impact the benchmark results. To validate our experimental results, we constructed seven patches that fix the identified bad practices for six of the studied open source projects, of which six were merged into the main branch of the project. In this paper, we show that developers struggle with accurate Java microbenchmarking, and provide several recommendations to developers of microbenchmarking frameworks on how to improve future versions of their framework.
  •  
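The checker described above targets Java/JMH, but the static-analysis idea can be shown in miniature: flag benchmark functions that compute a value and then discard it, the analogue of JMH's dead-code-elimination pitfall. This Python sketch is an illustrative analogue, not the authors' tool:

```python
import ast

def find_discarded_results(source):
    """Flag functions whose final statement is a bare expression:
    the computed value is discarded, so an optimizing runtime could
    elide the work being benchmarked."""
    tree = ast.parse(source)
    return [node.name
            for node in ast.walk(tree)
            if isinstance(node, ast.FunctionDef)
            and isinstance(node.body[-1], ast.Expr)]

good = "def bench_ok():\n    return sum(range(1000))\n"
bad = "def bench_bad():\n    sum(range(1000))\n"
print(find_discarded_results(good + bad))
```

In JMH terms, the corresponding fix is to return the computed value or sink it into a Blackhole so the JIT cannot remove the measured computation.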
9.
  • Damasceno Costa, Diego Elias, et al. (author)
  • What's Wrong With My Benchmark Results? Studying Bad Practices in JMH Benchmarks
  • 2021
  • In: IEEE Transactions on Software Engineering. - 0098-5589 .- 1939-3520. ; 47:7, pp. 1452-1467
  • Journal article (peer-reviewed), abstract:
    • Microbenchmarking frameworks, such as Java's Microbenchmark Harness (JMH), allow developers to write fine-grained performance test suites at the method or statement level. However, due to the complexities of the Java Virtual Machine, developers often struggle with writing expressive JMH benchmarks which accurately represent the performance of such methods or statements. In this paper, we empirically study bad practices of JMH benchmarks. We present a tool that leverages static analysis to identify 5 bad JMH practices. Our empirical study of 123 open source Java-based systems shows that each of these 5 bad practices is prevalent in open source software. Further, we conduct several experiments to quantify the impact of each bad practice in multiple case studies, and find that bad practices often significantly impact the benchmark results. To validate our experimental results, we constructed patches that fix the identified bad practices for six of the studied open source projects, of which five were merged into the main branch of the project. In this paper, we show that developers struggle with accurate Java microbenchmarking, and provide several recommendations to developers of microbenchmarking frameworks on how to improve future versions of their framework.
  •  
10.
  • Erlenhov, Linda, 1979, et al. (author)
  • An empirical study of bots in software development: Characteristics and challenges from a practitioner's perspective
  • 2020
  • In: ESEC/FSE 2020 - Proceedings of the 28th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering. - New York, NY, USA : ACM.
  • Conference paper (peer-reviewed), abstract:
    • Software engineering bots - automated tools that handle tedious tasks - are increasingly used by industrial and open source projects to improve developer productivity. Current research in this area is held back by a lack of consensus of what software engineering bots (DevBots) actually are, what characteristics distinguish them from other tools, and what benefits and challenges are associated with DevBot usage. In this paper we report on a mixed-method empirical study of DevBot usage in industrial practice. We report on findings from interviewing 21 and surveying a total of 111 developers. We identify three different personas among DevBot users (focusing on autonomy, chat interfaces, and "smartness"), each with different definitions of what a DevBot is, why developers use them, and what they struggle with. We conclude that future DevBot research should situate their work within our framework, to clearly identify what type of bot the work targets, and what advantages practitioners can expect. Further, we find that there currently is a lack of general-purpose "smart" bots that go beyond simple automation tools or chat interfaces. This is problematic, as we have seen that such bots, if available, can have a transformative effect on the projects that use them.
  •  
12.
  • Erlenhov, Linda, 1979, et al. (author)
  • Current and Future Bots in Software Development
  • 2019
  • In: 2019 IEEE/ACM 1st International Workshop on Bots in Software Engineering (BotSE). - 9781728122625, pp. 7-11
  • Conference paper (peer-reviewed), abstract:
    • Bots that support software development ("DevBots") are seen as a promising approach to deal with the ever-increasing complexity of modern software engineering and development. Existing DevBots are already able to relieve developers from routine tasks such as building project images or keeping dependencies up-to-date. However, advances in machine learning and artificial intelligence hold the promise of future, significantly more advanced, DevBots. In this paper, we introduce the terminology of contemporary and ideal DevBots. Contemporary DevBots represent the current state of practice, which we characterise using a facet-based taxonomy. We exemplify this taxonomy using 11 existing, industrial-strength bots. We further provide a vision and definition of future (ideal) DevBots, which are not only autonomous, but also adaptive, as well as technically and socially competent. These properties may allow ideal DevBots to act more akin to artificial team mates than simple development tools.
  •  
13.
  • Erlenhov, Linda, 1979, et al. (author)
  • Dependency management bots in open-source systems—prevalence and adoption
  • 2022
  • In: PeerJ Computer Science. - : PeerJ. - 2376-5992. ; 8
  • Journal article (peer-reviewed), abstract:
    • Bots have become active contributors in maintaining open-source repositories. However, definitions of bot activity in open-source software vary, from a lenient stance encompassing all non-human contributions to frameworks that only cover contributions from tools with autonomy or human-like traits (i.e., DevBots). Understanding which of those definitions is being used is essential to enable (i) reliable sampling of bots and (ii) fair comparison of their practical impact on, e.g., developers’ productivity. This paper reports on an empirical study composed of both quantitative and qualitative analysis of bot activity. By analysing those two bot definitions in an existing dataset of bot commits, we see that only 10 out of 54 listed tools (mainly dependency management) comply with the characteristics of DevBots. Moreover, five of those DevBots have similar patterns of contributions over 93 projects, such as similar proportions of merged pull requests and days until issues are closed. Our analysis also reveals that most projects (77%) experiment with more than one bot before deciding to adopt or switch between bots. In fact, a thematic analysis of developers’ comments in those projects reveals factors driving the discussions about DevBot adoption or removal, such as the impact of the generated noise and the needed adaptation of development practices within the project.
  •  
14.
  • Grambow, M., et al. (author)
  • Using application benchmark call graphs to quantify and improve the practical relevance of microbenchmark suites
  • 2021
  • In: PeerJ Computer Science. - : PeerJ. - 2376-5992. ; 7
  • Journal article (peer-reviewed), abstract:
    • Performance problems in applications should ideally be detected as soon as they occur, i.e., directly when the causing code modification is added to the code repository. To this end, complex and cost-intensive application benchmarks or lightweight but less relevant microbenchmarks can be added to existing build pipelines to ensure performance goals. In this paper, we show how the practical relevance of microbenchmark suites can be improved and verified based on the application flow during an application benchmark run. We propose an approach to determine the overlap of common function calls between application and microbenchmarks, describe a method which identifies redundant microbenchmarks, and present a recommendation algorithm which reveals relevant functions that are not covered by microbenchmarks yet. A microbenchmark suite optimized in this way can easily test all functions determined to be relevant by application benchmarks after every code change, thus significantly reducing the risk of undetected performance problems. Our evaluation using two time series databases shows that, depending on the specific application scenario, application benchmarks cover different functions of the system under test. Their respective microbenchmark suites cover between 35.62% and 66.29% of the functions called during the application benchmark, offering substantial room for improvement. Through two use cases (removing redundancies in the microbenchmark suite and recommending yet-uncovered functions), we decrease the total number of microbenchmarks and increase the practical relevance of both suites. Removing redundancies can significantly reduce the number of microbenchmarks (and thus the execution time) to roughly 10% and 23% of the original microbenchmark suites, whereas recommendation identifies up to 26 and 14 new, as-yet-uncovered functions to benchmark to improve the relevance. By utilizing the differences and synergies of application benchmarks and microbenchmarks, our approach potentially enables effective software performance assurance with performance tests of multiple granularities.
  •  
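The overlap and redundancy analysis described above reduces to set operations over call graphs; the function names and call sets below are invented for illustration:

```python
# Functions the application benchmark actually exercises.
app_calls = {"parse", "insert", "flush", "compact", "query"}

# Functions each microbenchmark exercises.
micro_calls = {
    "bench_parse": {"parse"},
    "bench_insert": {"parse", "insert"},
    "bench_insert_flush": {"parse", "insert", "flush"},
    "bench_query": {"query"},
}

# Coverage: share of application-relevant functions hit by the suite.
covered = set().union(*micro_calls.values()) & app_calls
coverage = len(covered) / len(app_calls)

# Recommendation: relevant functions no microbenchmark covers yet.
uncovered = app_calls - covered

# Redundancy removal: greedily keep benchmarks that add new coverage.
kept, seen = [], set()
for name, calls in sorted(micro_calls.items(),
                          key=lambda kv: -len(kv[1] & app_calls)):
    gain = (calls & app_calls) - seen
    if gain:
        kept.append(name)
        seen |= gain

print(coverage, uncovered, kept)
```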
15.
  • Grambow, Martin, et al. (author)
  • Using Microbenchmark Suites to Detect Application Performance Changes
  • 2023
  • In: IEEE Transactions on Cloud Computing. - 2168-7161. ; 11:3, pp. 2575-2590
  • Journal article (peer-reviewed), abstract:
    • Software performance changes are costly and often hard to detect pre-release. Similar to software testing frameworks, either application benchmarks or microbenchmarks can be integrated into quality assurance pipelines to detect performance changes before releasing a new application version. Unfortunately, extensive benchmarking studies usually take several hours which is problematic when examining dozens of daily code changes in detail; hence, trade-offs have to be made. Optimized microbenchmark suites, which only include a small subset of the full suite, are a potential solution for this problem, given that they still reliably detect the majority of the application performance changes such as an increased request latency. It is, however, unclear whether microbenchmarks and application benchmarks detect the same performance problems and one can be a proxy for the other. In this paper, we explore whether microbenchmark suites can detect the same application performance changes as an application benchmark. For this, we run extensive benchmark experiments with both the complete and the optimized microbenchmark suites of two time-series database systems, i.e., InfluxDB and VictoriaMetrics, and compare their results to the results of corresponding application benchmarks. We do this for 70 and 110 commits, respectively. Our results show that it is not trivial to detect application performance changes using an optimized microbenchmark suite. The detection (i) is only possible if the optimized microbenchmark suite covers all application-relevant code sections, (ii) is prone to false alarms, and (iii) cannot precisely quantify the impact on application performance. 
For certain software projects, an optimized microbenchmark suite can, thus, provide fast performance feedback to developers (e.g., as part of a local build process), help estimating the impact of code changes on application performance, and support a detailed analysis while a daily application benchmark detects major performance problems. Thus, although a regular application benchmark cannot be substituted for both studied systems, our results motivate further studies to validate and optimize microbenchmark suites.
  •  
16.
  • Grambow, M., et al. (author)
  • Using Microbenchmark Suites to Detect Application Performance Changes
  • 2023
  • In: IEEE Transactions on Cloud Computing. - : Institute of Electrical and Electronics Engineers (IEEE). - 2168-7161 .- 2372-0018. ; 11:3, pp. 2575-2590
  • Journal article (peer-reviewed), abstract:
    • Software performance changes are costly and often hard to detect pre-release. Similar to software testing frameworks, either application benchmarks or microbenchmarks can be integrated into quality assurance pipelines to detect performance changes before releasing a new application version. Unfortunately, extensive benchmarking studies usually take several hours which is problematic when examining dozens of daily code changes in detail; hence, trade-offs have to be made. Optimized microbenchmark suites, which only include a small subset of the full suite, are a potential solution for this problem, given that they still reliably detect the majority of the application performance changes such as an increased request latency. It is, however, unclear whether microbenchmarks and application benchmarks detect the same performance problems and one can be a proxy for the other. In this paper, we explore whether microbenchmark suites can detect the same application performance changes as an application benchmark. For this, we run extensive benchmark experiments with both the complete and the optimized microbenchmark suites of the two time-series database systems InfluxDB and VictoriaMetrics and compare their results to the results of corresponding application benchmarks. We do this for 70 and 110 commits, respectively. Our results show that it is possible to detect application performance changes using an optimized microbenchmark suite if frequent false-positive alarms can be tolerated.
  •  
17.
  • Guo, Yunfang, et al. (author)
  • Studying the impact of CI on pull request delivery time in open source projects - a conceptual replication
  • 2019
  • In: PeerJ Computer Science. - : PeerJ. - 2376-5992. ; 5
  • Journal article (peer-reviewed), abstract:
    • Nowadays, continuous integration (CI) is indispensable in the software development process. A central promise of adopting CI is that new features or bug fixes can be delivered more quickly. A recent repository mining study by Bernardo, da Costa & Kulesza (2018) found that only about half of the investigated open source projects actually deliver pull requests (PR) faster after adopting CI, with small effect sizes. However, there are some concerns regarding the methodology used by Bernardo et al., which may potentially limit the trustworthiness of this finding. Particularly, they do not explicitly control for normal changes in the pull request delivery time during a project's lifetime (independently of CI introduction). Hence, in our work, we conduct a conceptual replication of this study. In a first step, we replicate their study results using the same subjects and methodology. In a second step, we address the same core research question using an adapted methodology. We use a different statistical method (regression discontinuity design, RDD) that is more robust towards the confounding factor of projects potentially getting faster in delivering PRs over time naturally, and we introduce a control group of comparable projects that never applied CI. Finally, we also evaluate the generalizability of the original findings on a set of new open source projects sampled using the same methodology. We find that the results of the study by Bernardo et al. largely hold in our replication. Using RDD, we do not find robust evidence of projects getting faster at delivering PRs without CI, and we similarly do not see a speed-up in our control group that never introduced CI. Further, results obtained from a newly mined set of projects are comparable to the original findings. In conclusion, we consider the replication successful.
  •  
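Regression discontinuity, the statistical method the replication above relies on, can be sketched as two linear fits on either side of the intervention point; the data below is synthetic, not the study's:

```python
def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def discontinuity(xs, ys, cutoff):
    """Estimated jump in the outcome at the cutoff (e.g., CI adoption):
    predicted value just after minus predicted value just before."""
    before = [(x, y) for x, y in zip(xs, ys) if x < cutoff]
    after = [(x, y) for x, y in zip(xs, ys) if x >= cutoff]
    sb, ib = linear_fit(*zip(*before))
    sa, ia = linear_fit(*zip(*after))
    return (sa * cutoff + ia) - (sb * cutoff + ib)

# Synthetic PR delivery times: a steady downward trend with an extra
# drop of 5 units at the intervention (x = 10).
xs = list(range(20))
ys = [50 - 0.5 * x if x < 10 else 45 - 0.5 * x for x in xs]
print(discontinuity(xs, ys, 10))
```

A naive before/after comparison of means would conflate the trend with the intervention; fitting the trend on both sides is what makes RDD robust to projects speeding up naturally over time.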
18.
  • Guo, Yunfang, et al. (author)
  • Studying the impact of CI on pull request delivery time in open source projects - a conceptual replication
  • 2019
  • In: PeerJ Computer Science. - : PeerJ. - 2376-5992. ; 5
  • Journal article (peer-reviewed), abstract:
    • Nowadays, continuous integration (CI) is indispensable in the software development process. A central promise of adopting CI is that new features or bug fixes can be delivered more quickly. A recent repository mining study by Bernardo, da Costa & Kulesza (2018) found that only about half of the investigated open source projects actually deliver pull requests (PR) faster after adopting CI, with small effect sizes. However, there are some concerns regarding the methodology used by Bernardo et al., which may potentially limit the trustworthiness of this finding. Particularly, they do not explicitly control for normal changes in the pull request delivery time during a project’s lifetime (independently of CI introduction). Hence, in our work, we conduct a conceptual replication of this study. In a first step, we replicate their study results using the same subjects and methodology. In a second step, we address the same core research question using an adapted methodology. We use a different statistical method (regression discontinuity design, RDD) that is more robust towards the confounding factor of projects potentially getting faster in delivering PRs over time naturally, and we introduce a control group of comparable projects that never applied CI. Finally, we also evaluate the generalizability of the original findings on a set of new open source projects sampled using the same methodology. We find that the results of the study by Bernardo et al. largely hold in our replication. Using RDD, we do not find robust evidence of projects getting faster at delivering PRs without CI, and we similarly do not see a speed-up in our control group that never introduced CI. Further, results obtained from a newly mined set of projects are comparable to the original findings. In conclusion, we consider the replication successful.
  •  
19.
  • Jangali, Mostafa, et al. (author)
  • Automated Generation and Evaluation of JMH Microbenchmark Suites From Unit Tests
  • 2023
  • In: IEEE Transactions on Software Engineering. - 0098-5589 .- 1939-3520. ; 49:4, pp. 1704-1725
  • Journal article (peer-reviewed), abstract:
    • Performance is a crucial non-functional requirement of many software systems. Despite the widespread use of performance testing, developers still struggle to construct and evaluate the quality of performance tests. To address these two major challenges, we implement a framework, dubbed ju2jmh, to automatically generate performance microbenchmarks from JUnit tests and use mutation testing to study the quality of generated microbenchmarks. Specifically, we compare our ju2jmh generated benchmarks to manually written JMH benchmarks and to automatically generated JMH benchmarks using the AutoJMH framework, as well as directly measuring system performance with JUnit tests. For this purpose, we have conducted a study on three subjects (Rxjava, Eclipse-collections, and Zipkin) with ~454 K source lines of code (SLOC), 2,417 JMH benchmarks (including manually written and generated AutoJMH benchmarks) and 35,084 JUnit tests. Our results show that the ju2jmh generated JMH benchmarks consistently outperform both the use of JUnit test execution time and throughput as a proxy for performance and the JMH benchmarks automatically generated using the AutoJMH framework, while being comparable to JMH benchmarks manually written by developers in terms of test stability and ability to detect performance bugs. Nevertheless, ju2jmh benchmarks are able to cover more of the software applications than manually written JMH benchmarks during the microbenchmark execution. Furthermore, ju2jmh benchmarks are generated automatically, while manually written JMH benchmarks require many hours of hard work and attention; therefore our study can reduce developers’ effort to construct microbenchmarks. In addition, we identify three factors (too low test workload, unstable tests and limited mutant coverage) that affect a benchmark’s ability to detect performance bugs. 
To the best of our knowledge, this is the first study aimed at assisting developers in fully automated microbenchmark creation and assessing microbenchmark quality for performance testing.
  •  
20.
  • Johansson, Albin, et al. (author)
  • The Impact of Compiler Warnings on Code Quality in C++ Projects
  • 2024
  • In: IEEE International Conference on Program Comprehension. - 2643-7147 .- 2643-7171, pp. 270-279
  • Conference paper (peer-reviewed), abstract:
    • Modern compilers often offer a variety of warning flags, which developers can enable to get feedback on code that, while syntactically correct, may be problematic. In the case of C++, one example of such 'correct but problematic' code is code that leads to undefined behavior (UB). The usage of compiler warnings has long been suspected as a way to decrease bugs and increase code quality. However, empirical evidence that supports this hypothesis is rare. In this study, we present evidence from a study of 127 open source C++ projects. We categorize their usage of compiler warnings into five groups based on which warning flags are being used, and analyse the relationship between compiler warnings and five quality metrics (bugs, critical issues, vulnerabilities, code smells, and technical debt) using Bayesian analysis. We conclude that, in general, compiler warnings indeed correlate with, and potentially cause, higher code quality, with the clearest impact being on the number of critical issues in a project. Using stricter warning flags, as expected, correlates with higher code quality in our study objects. However, there are substantial differences between projects, which we attribute to the project's individual development culture. That is, while warnings matter, other factors, such as quality culture, are likely to be even more important to source code quality.
  •  
21.
  • Khojah, Ranim, 1999, et al. (författare)
  • From Human-to-Human to Human-to-Bot Conversations in Software Engineering
  • 2024
  • Ingår i: AIware 2024 - Proceedings of the 1st ACM International Conference on AI-Powered Software, Co-located with: ESEC/FSE 2024. ; , s. 38-44
  • Konferensbidrag (refereegranskat)abstract
    • Software developers use natural language to interact not only with other humans, but increasingly also with chatbots. These interactions have different properties and flow differently depending on what goal the developer wants to achieve and whom they interact with. In this paper, we aim to understand the dynamics of conversations that occur during modern software development after the integration of AI and chatbots, enabling a deeper recognition of the advantages and disadvantages of including chatbot interactions in addition to human conversations in collaborative work. We compile existing attributes of conversations with humans and with NLU-based chatbots and adapt them to the context of software development. Then, we extend the comparison to include LLM-powered chatbots, based on an observational study. We present similarities and differences between human-to-human and human-to-bot conversations, also distinguishing between NLU- and LLM-based chatbots. Furthermore, we discuss how understanding the differences among these conversation styles guides developers in shaping their expectations of a conversation and, consequently, supports communication within a software team. We conclude that the conversation styles we observe with LLM-based chatbots cannot replace conversations with humans, due to attributes related to social aspects, despite their ability to support productivity and decrease developers' mental load.
  •  
22.
  • Laaber, C., et al. (författare)
  • An Evaluation of Open-Source Software Microbenchmark Suites for Continuous Performance Assessment
  • 2018
  • Ingår i: MSR '18 Proceedings of the 15th International Conference on Mining Software Repositories. - New York, NY, USA : ACM Digital Library. - 9781450357166 ; , s. 119-130
  • Bokkapitel (övrigt vetenskapligt/konstnärligt)abstract
    • Continuous integration (CI) emphasizes quick feedback to developers. This is at odds with the current practice of performance testing, which predominantly focuses on long-running tests against entire systems in production-like environments. Alternatively, software microbenchmarking attempts to establish a performance baseline for small code fragments in a short time. This paper investigates the quality of microbenchmark suites with a focus on their suitability to deliver quick performance feedback and CI integration. We study ten open-source libraries written in Java and Go with benchmark suite sizes ranging from 16 to 983 tests, and runtimes between 11 minutes and 8.75 hours. We show that our study subjects include benchmarks with result variability of 50% or higher, indicating that not all benchmarks are useful for reliable discovery of slowdowns. We further artificially inject actual slowdowns into public API methods of the study subjects and test whether the suites are able to discover them. We introduce a performance-test quality metric called the API benchmarking score (ABS). ABS represents a benchmark suite's ability to find slowdowns among a set of defined core API methods. The resulting benchmarking scores (i.e., fraction of discovered slowdowns) vary between 10% and 100% for the study subjects. This paper's methodology and results can be used to (1) assess the quality of existing microbenchmark suites, (2) select a set of tests to be run as part of CI, and (3) suggest or generate benchmarks for currently untested parts of an API.
  •  
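The API benchmarking score (ABS) is described above as the fraction of discovered slowdowns among a set of defined core API methods. A minimal sketch of that ratio follows; the function name and inputs are our own assumptions, not the authors' implementation:

```python
def api_benchmarking_score(core_api_methods, discovered_slowdowns):
    """Fraction of core API methods for which an injected slowdown was
    discovered by the benchmark suite (illustrative sketch)."""
    core = set(core_api_methods)
    if not core:
        return 0.0
    found = set(discovered_slowdowns) & core
    return len(found) / len(core)

# 10 core methods, slowdowns discovered in 4 of them -> ABS = 0.4
print(api_benchmarking_score([f"m{i}" for i in range(10)],
                             ["m0", "m3", "m5", "m7"]))  # → 0.4
```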
23.
  • Laaber, C., et al. (författare)
  • Applying test case prioritization to software microbenchmarks
  • 2021
  • Ingår i: Empirical Software Engineering. - : Springer Science and Business Media LLC. - 1382-3256 .- 1573-7616. ; 26:6
  • Tidskriftsartikel (refereegranskat)abstract
    • Regression testing comprises techniques which are applied during software evolution to uncover faults effectively and efficiently. While regression testing is widely studied for functional tests, performance regression testing, e.g., with software microbenchmarks, is hardly investigated. Applying test case prioritization (TCP), a regression testing technique, to software microbenchmarks may help capture large performance regressions sooner when new versions are released. This may be especially beneficial for microbenchmark suites, because they take considerably longer to execute than unit test suites. However, it is unclear whether traditional unit testing TCP techniques work equally well for software microbenchmarks. In this paper, we empirically study coverage-based TCP techniques, employing total and additional greedy strategies, applied to software microbenchmarks along multiple parameterization dimensions, leading to 54 unique technique instantiations. We find that TCP techniques have a mean APFD-P (average percentage of fault-detection on performance) effectiveness between 0.54 and 0.71, and are able to capture the three largest performance changes after executing 29% to 66% of the whole microbenchmark suite. Our efficiency analysis reveals that the runtime overhead of TCP varies considerably depending on the exact parameterization. The most effective technique has an overhead of 11% of the total microbenchmark suite execution time, making TCP a viable option for performance regression testing. The results demonstrate that the total strategy is superior to the additional strategy. Finally, dynamic-coverage techniques should be favored over static-coverage techniques due to their acceptable analysis overhead; however, in settings where the time for prioritization is limited, static-coverage techniques provide an attractive alternative.
  •  
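The total and additional greedy strategies compared in this paper are standard coverage-based heuristics. The sketch below is a minimal illustration over a hypothetical coverage map (benchmark names and covered units are made up), not one of the paper's 54 instantiations:

```python
def total_greedy(coverage):
    """'Total' strategy: sort benchmarks by how many code units each covers,
    descending (illustrative sketch)."""
    return sorted(coverage, key=lambda b: len(coverage[b]), reverse=True)

def additional_greedy(coverage):
    """'Additional' strategy: repeatedly pick the benchmark that covers the
    most not-yet-covered units (alphabetical tie-breaking for determinism)."""
    remaining = dict(coverage)
    covered, order = set(), []
    while remaining:
        best = max(sorted(remaining),
                   key=lambda b: len(set(remaining[b]) - covered))
        order.append(best)
        covered |= set(remaining.pop(best))
    return order

coverage = {
    "benchA": {"m1", "m2", "m3"},
    "benchB": {"m1", "m2"},
    "benchC": {"m4"},
}
print(total_greedy(coverage))       # → ['benchA', 'benchB', 'benchC']
print(additional_greedy(coverage))  # → ['benchA', 'benchC', 'benchB']
```

Note how the two orderings diverge: benchB covers more units in total, but after benchA runs, it adds nothing new, so the additional strategy demotes it.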
24.
  • Laaber, Christoph, et al. (författare)
  • Dynamically reconfiguring software microbenchmarks: Reducing execution time without sacrificing result quality
  • 2020
  • Ingår i: ESEC/FSE 2020 - Proceedings of the 28th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering. - New York, NY, USA : ACM. ; , s. 989-1001
  • Konferensbidrag (refereegranskat)abstract
    • Executing software microbenchmarks, a form of small-scale performance tests predominantly used for libraries and frameworks, is a costly endeavor. Full benchmark suites take up to multiple hours or days to execute, rendering frequent checks, e.g., as part of continuous integration (CI), infeasible. However, altering benchmark configurations to reduce execution time without considering the impact on result quality can lead to benchmark results that are not representative of the software's true performance. We propose the first technique to dynamically stop software microbenchmark executions when their results are sufficiently stable. Our approach implements three statistical stoppage criteria and is capable of reducing Java Microbenchmark Harness (JMH) suite execution times by 48.4% to 86.0%. At the same time, it retains the same result quality for 78.8% to 87.6% of the benchmarks, compared to executing the suite for the default duration. The proposed approach does not require developers to manually craft custom benchmark configurations; instead, it provides automated mechanisms for dynamic reconfiguration. This makes dynamic reconfiguration highly effective and efficient, potentially paving the way to the inclusion of JMH microbenchmarks in CI.
  •  
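The abstract does not spell out the three stoppage criteria themselves. As one hedged example of the general idea, here is a stability check based on the coefficient of variation of recent iteration measurements; the window size and threshold are our own assumptions, not the paper's:

```python
import statistics

def should_stop(measurements, window=10, cv_threshold=0.02):
    """One possible stability-based stoppage criterion (a sketch, not the
    paper's exact criteria): stop once the coefficient of variation of the
    last `window` iteration measurements falls below a threshold."""
    if len(measurements) < window:
        return False  # not enough data to judge stability yet
    recent = measurements[-window:]
    mean = statistics.mean(recent)
    if mean == 0:
        return False
    cv = statistics.stdev(recent) / mean
    return cv < cv_threshold
```

A perfectly stable series triggers the stop, while a noisy one keeps the benchmark running until it settles.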
25.
  • Laaber, C., et al. (författare)
  • Software microbenchmarking in the cloud. How bad is it really?
  • 2019
  • Ingår i: Empirical Software Engineering. - : Springer Science and Business Media LLC. - 1382-3256 .- 1573-7616. ; 24:4, s. 2469-2508
  • Tidskriftsartikel (refereegranskat)abstract
    • Rigorous performance engineering traditionally assumes measuring on bare-metal environments to control for as many confounding factors as possible. Unfortunately, some researchers and practitioners might not have access, knowledge, or funds to operate dedicated performance-testing hardware, making public clouds an attractive alternative. However, shared public cloud environments are inherently unpredictable in terms of the system performance they provide. In this study, we explore the effects of cloud environments on the variability of performance test results and to what extent slowdowns can still be reliably detected even in a public cloud. We focus on software microbenchmarks as an example of performance tests and execute extensive experiments on three different well-known public cloud services (AWS, GCE, and Azure) using three different cloud instance types per service. We also compare the results to a hosted bare-metal offering from IBM Bluemix. In total, we gathered more than 4.5 million unique microbenchmarking data points from benchmarks written in Java and Go. We find that the variability of results differs substantially between benchmarks and instance types (by a coefficient of variation from 0.03% to >100%). However, executing test and control experiments on the same instances (in randomized order) allows us to detect slowdowns of 10% or less with high confidence, using state-of-the-art statistical tests (i.e., Wilcoxon rank-sum and overlapping bootstrapped confidence intervals). Finally, our results indicate that Wilcoxon rank-sum manages to detect smaller slowdowns in cloud environments.
  •  
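The overlapping-bootstrapped-confidence-interval test mentioned above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' exact procedure; the seed, replication count, and decision rule are assumptions:

```python
import random
import statistics

def bootstrap_ci(sample, n_boot=1000, alpha=0.05):
    """Percentile bootstrap confidence interval for the mean (sketch)."""
    rng = random.Random(0)  # fixed seed for a reproducible illustration
    means = sorted(
        statistics.mean(rng.choices(sample, k=len(sample)))
        for _ in range(n_boot)
    )
    return means[int(n_boot * alpha / 2)], means[int(n_boot * (1 - alpha / 2)) - 1]

def slowdown_detected(control, treatment):
    """Flag a slowdown when the treatment CI lies entirely above the
    control CI, i.e., the two intervals do not overlap."""
    _, c_hi = bootstrap_ci(control)
    t_lo, _ = bootstrap_ci(treatment)
    return t_lo > c_hi
```

With test and control experiments run on the same instances, a 20% slowdown separates the intervals cleanly, while identical samples never trigger a detection.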
26.
  • Leitner, Philipp, 1982, et al. (författare)
  • A mixed-method empirical study of Function-as-a-Service software development in industrial practice
  • 2019
  • Ingår i: Journal of Systems and Software. - : Elsevier BV. - 0164-1212. ; 149, s. 340-359
  • Tidskriftsartikel (refereegranskat)abstract
    • Function-as-a-Service (FaaS) describes cloud computing services that make infrastructure components transparent to application developers, thus falling in the larger group of "serverless" computing models. When using FaaS offerings, such as AWS Lambda, developers provide atomic and short-running code for their functions, and FaaS providers execute and horizontally scale them on-demand. Currently, there is no systematic research on how developers use serverless, what types of applications lend themselves to this model, or what architectural styles and practices FaaS-based applications are based on. We present results from a mixed-method study, combining interviews with practitioners who develop applications and systems that use FaaS, a systematic analysis of grey literature, and a Web-based survey. We find that successfully adopting FaaS requires a different mental model, where systems are primarily constructed by composing pre-existing services, with FaaS often acting as the "glue" that brings these services together. Tooling availability and maturity, especially related to testing and deployment, remains a major difficulty. Further, we find that current FaaS systems lack systematic support for function reuse, and abstractions and programming models for building non-trivial FaaS applications are limited. We conclude with a discussion of implications for FaaS providers, software developers, and researchers.
  •  
27.
  • Markusse, Florian, et al. (författare)
  • Towards Continuous Performance Assessment of Java Applications With PerfBot
  • 2023
  • Ingår i: 2023 IEEE/ACM 5TH INTERNATIONAL WORKSHOP ON BOTS IN SOFTWARE ENGINEERING, BOTSE. - 9798350302127 ; , s. 6-8
  • Konferensbidrag (refereegranskat)abstract
    • While many routine tasks in software development are already well supported through widely known open source bots, we currently lack general tools for automated continuous performance assessment. In this short paper, we summarize the results of an earlier study arguing for the benefits of using bots to continuously benchmark software performance, and present the design of a currently under development bot for Java (PerfBot).
  •  
28.
  • Markusse, Florian, et al. (författare)
  • Using Benchmarking Bots for Continuous Performance Assessment
  • 2022
  • Ingår i: IEEE Software. - 1937-4194 .- 0740-7459. ; 39:5, s. 50-55
  • Tidskriftsartikel (refereegranskat)abstract
    • Benchmarking bots are starting to see use as a productivity tool, helping large open source projects judge whether new code contributions negatively impact performance. We discuss how and why projects use benchmarking bots, and present an in-depth case study of The Nanosoldier bot used by the team behind the Julia programming language.
  •  
29.
  • Michael Ayas, Hamdy, 1994, et al. (författare)
  • An Empirical Analysis of Microservices Systems Using Consumer-Driven Contract Testing
  • 2022
  • Ingår i: Proceedings - 48th Euromicro Conference on Software Engineering and Advanced Applications, SEAA 2022. ; , s. 92-99
  • Konferensbidrag (refereegranskat)abstract
    • Testing has a prominent role in revealing faults in software based on microservices. One of the most important discussion points in microservices-based architectures (MSAs) is the granularity of services, often at different levels of abstraction. Similarly, the granularity of tests in MSAs is reflected in different test types. However, it is challenging to conceptualize how the overall testing architecture comes together when combining testing at different levels of abstraction for microservices. There is no empirical evidence on the overall testing architecture of such microservices implementations. Furthermore, there is a need to empirically understand how the current state of practice resonates with existing best practices on testing. In this study, we mine GitHub to find candidate projects for an in-depth, qualitative assessment of their test artifacts. We analyze 16 repositories that use microservices and include various test artifacts. We focus on four projects that use consumer-driven contract testing. Our results demonstrate how these projects cover different levels of testing. This study (i) drafts a testing architecture, including activities and artifacts, and (ii) demonstrates how these align with best practices and guidelines. Our proposed architecture helps the categorization of system and test artifacts in empirical studies of microservices. Finally, we showcase a view of the boundaries between different levels of testing in systems using microservices.
  •  
30.
  • Michael Ayas, Hamdy, 1994, et al. (författare)
  • An empirical investigation on the competences and roles of practitioners in Microservices-based Architectures
  • 2024
  • Ingår i: Journal of Systems and Software. - 0164-1212. ; 213
  • Tidskriftsartikel (refereegranskat)abstract
    • Microservices-based Architectures (MSAs) are gaining popularity since, among other benefits, they enable rapid and independent delivery of software at scale, facilitating the delivery of business value. Additionally, there are attempts towards understanding practitioners' roles and technical knowledge. MSAs call for affinity in several technologies as well as business domains. This diversity makes it challenging to scope and describe the roles of practitioners. In addition, practitioners often do not receive training, and the contents of MSA training remain largely undefined, even though there are challenges in finding or developing relevant technical expertise. In this research, we determine the different technical roles that are required in MSAs, along with their detailed competences. We use public online forums (e.g., StackOverflow), where developers share technical knowledge. We analyze 13,517 public profiles of software engineers, deriving their technical competences. Our taxonomy of technical competences in MSAs contains 11 competence clusters, organized in 3 collections of competences: Web Technologies, DevOps, and Data Technologies. In addition, we derive the roles of microservice practitioners and the characteristics of their roles. Our findings organize the technical competences of MSA practitioners and determine the training topics, and combinations of topics, that can prepare engineers for MSAs.
  •  
31.
  • Michael Ayas, Hamdy, 1994, et al. (författare)
  • An empirical study of the systemic and technical migration towards microservices
  • 2023
  • Ingår i: Empirical Software Engineering. - 1382-3256 .- 1573-7616. ; 28:4
  • Tidskriftsartikel (refereegranskat)abstract
    • Context: As many organizations modernize their software architecture and transition to the cloud, migrations towards microservices become more popular. Even though such migrations help to achieve organizational agility and effectiveness in software development, they are also highly complex, long-running, and multi-faceted. Objective: In this study we aim to comprehensively map the journey towards microservices and describe in detail what such a migration entails. In particular, we aim to discuss not only the technical migration, but also the long-term journey of change on a systemic level. Method: Our research method is an inductive, qualitative study on two data sources: interviews and an analysis of discussions from StackOverflow. The analysis of both the 19 interviews and the 215 StackOverflow discussions is based on techniques found in grounded theory. Results: Our results depict the migration journey as it materializes within the migrating organization, from structural changes to specific technical changes that take place in the work of engineers. We provide an overview of how microservices migrations take place, as well as a deconstruction of high-level modes of change into specific solution outcomes. Our theory contains 2 modes of change taking place in migration iterations, 14 activities, and 53 solution outcomes of engineers. One of our findings concerns the architectural change, which is iterative and needs both a long- and short-term perspective, including both business and technical understanding. In addition, we found that a large proportion of the technical migration has to do with setting up supporting artifacts and changing the paradigm in which software is developed.
  •  
32.
  • Michael Ayas, Hamdy, 1994, et al. (författare)
  • Facing the giant: A grounded theory study of decision-making in microservices migrations
  • 2021
  • Ingår i: International Symposium on Empirical Software Engineering and Measurement. - New York, NY, USA : ACM. - 1949-3789 .- 1949-3770.
  • Konferensbidrag (refereegranskat)abstract
    • Background: Microservices migrations are challenging and expensive projects with many decisions that need to be made in a multitude of dimensions. Existing research tends to focus on technical issues and decisions (e.g., how to split services). Equally important organizational or business issues and their relations with technical aspects often remain out of scope or on a high level of abstraction. Aims: In this study, we aim to holistically chart the decision-making that happens in all dimensions of a migration project towards microservices (including, but not limited to, the technical dimension). Method: We investigate 16 different migration cases in a grounded theory interview study, with 19 participants who recently migrated towards microservices. This study strongly focuses on the human aspects of a migration, through stakeholders and their decisions. Results: We identify 3 decision-making processes consisting of 22 decision points and their alternative options. The decision points are related to creating stakeholder engagement and assessing feasibility, technical implementation, and organizational restructuring. Conclusions: Our study provides an initial theory of decision-making in migrations to microservices. It also outfits practitioners with a roadmap of which decisions they should be prepared to make, and at which point in the migration.
  •  
33.
  • Michael Ayas, Hamdy, 1994, et al. (författare)
  • The Migration Journey Towards Microservices
  • 2021
  • Ingår i: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). - Cham : Springer International Publishing. - 1611-3349 .- 0302-9743. ; 13126 LNCS, s. 20-35
  • Konferensbidrag (refereegranskat)abstract
    • Organizations initiate migration projects in order to change their software architecture towards microservices and reap the many benefits that microservices have to offer. However, migrations often take place in unstructured, non-systemic, and trial-and-error manners, resulting in a lack of clarity and certainty in such projects. In this study, we investigate 16 software development organizations that migrated towards microservices and chart their detailed migration journey. We do so by conducting an interview survey, using some of the tools from Grounded Theory, across 19 interviews from 16 organizations. Our results showcase the evolutionary and iterative nature of the migration journey at the architectural level and the system-implementation level. We also identify 18 detailed activities that take place at these levels, categorized into the four phases of 1) designing the architecture, 2) altering the system, 3) setting up supporting artifacts, and 4) implementing additional technical artifacts.
  •  
34.
  • Michael Ayas, Hamdy, 1994, et al. (författare)
  • The Perceived Impact and Sequence of Activities When Transitioning to Microservices
  • 2023
  • Ingår i: Proceedings - 17th IEEE International Conference on Service-Oriented System Engineering, SOSE 2023. ; , s. 156-160
  • Konferensbidrag (refereegranskat)abstract
    • Microservices migrations are often present in organizations that want to modernize and integrate their software systems. There is little empirical evidence showing how migration projects take place. Investigating migration activities and the perceived impact of migrations from practitioners is important to understand how migration projects materialize. In this study, we ask 54 practitioners about their views on specific aspects of microservices migrations. Specifically, we derive the sequence of migration activities as well as the perceived impact of microservices on the development process. Database refactoring, back-end refactoring, and setting up DevOps tend to take place before front-end refactoring, setting up communication patterns, and splitting teams. In addition, aligning teams with profitable value propositions, improving the testing process, and having fewer dependencies between teams are among the prominent impact areas of microservices. Our results call for further empirical research in understanding transitions toward MSAs.
  •  
35.
  • Michael Ayas, Hamdy, 1994, et al. (författare)
  • The roles, responsibilities, and skills of engineers in the era of microservices-based architectures
  • 2024
  • Ingår i: Proceedings - 2024 IEEE/ACM 17th International Conference on Cooperative and Human Aspects of Software Engineering, CHASE 2024. ; , s. 13-23
  • Konferensbidrag (refereegranskat)abstract
    • Enterprises often try to tame the complexity of their software using microservices, and practitioners generally perceive the impact of microservices as positive. However, different responsibilities fall into the hands of practitioners, and new skill sets are required to address the challenges and reap the benefits of microservices. The objective of this study is to gather and organize what industry requires from microservices practitioners. To achieve this, we conduct a qualitative analysis of 125 job ads related to microservices, gathered from 7 different countries across 5 continents, posted during 14 consecutive days, and sampled from a total of 1,351 job ads. We contribute detailed taxonomies of the roles, responsibilities, and soft and hard skills that are necessary for microservices practitioners. Specifically, we detail 5 families of responsibilities, 3 of which are human-centered, describe 8 themes of popular soft skills, and describe 11 themes of popular hard skills, along with how they relate to soft skills. Our results indicate the importance of human-centered responsibilities and skills for microservices practitioners, and point to the existence of a job market for microservices software architects with a high demand for human aspects. Hence, our findings can help unravel organizational structures in microservices, improve training programmes, and understand the manifestation of human aspects in microservices.
  •  
36.
  • Nilsfors, Jonathan, et al. (författare)
  • Cachematic – Automatic Invalidation in Application-Level Caching Systems
  • 2019
  • Ingår i: ICPE 2019 - Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering. - New York, NY, USA : ACM. ; , s. 167-178
  • Konferensbidrag (refereegranskat)abstract
    • Caching is a common method for improving the performance of modern web applications. Due to the varying architecture of web applications, and the lack of a standardized approach to cache management, ad-hoc solutions are common. These solutions tend to be hard to maintain as a code base grows, and are a common source of bugs. We present Cachematic, a general-purpose application-level caching system with an automatic cache management strategy. Cachematic provides a simple programming model, allowing developers to explicitly denote a function as cacheable. The result of a cacheable function will transparently be cached without the developer having to worry about cache management. We present algorithms that automatically handle cache management, handling the cache dependency tree and cache invalidation. Our experiments showed that the deployment of Cachematic decreased response time for read requests, compared to a manual cache management strategy, in a representative case study conducted in collaboration with Bison, a US-based business intelligence company. We also found that, compared to the manual strategy, the cache hit rate was increased by a factor of around 1.64. However, we observe a significant increase in response time for write requests. We conclude that automatic cache management as implemented in Cachematic is attractive for read-dominant use cases, but the substantial write overhead in our current proof-of-concept implementation represents a challenge.
  •  
37.
  • Samoaa, Hazem Peter, et al. (författare)
  • A systematic mapping study of source code representation for deep learning in software engineering
  • 2022
  • Ingår i: Iet Software. - : Institution of Engineering and Technology (IET). - 1751-8806 .- 1751-8814. ; 16:4, s. 351-385
  • Tidskriftsartikel (refereegranskat)abstract
    • The usage of deep learning (DL) approaches for software engineering has attracted much attention, particularly in source code modelling and analysis. However, in order to use DL, source code needs to be formatted to fit the expected input form of DL models. This problem is known as source code representation. Source code can be represented via different approaches, most importantly the tree-based, token-based, and graph-based approaches. We use a systematic mapping study to investigate in detail the representation approaches adopted in 103 studies that use DL in the context of software engineering. The studies were collected from 2014 to 2021, from 14 different journals and 27 conferences. We show that each way of representing source code can provide a different, yet orthogonal, view of the same source code. Thus, different software engineering tasks might require different (combinations of) code representation approaches, depending on the nature and complexity of the task. In particular, we show that it is crucial to define whether the DL approach requires lexical, syntactical, or semantic code information. Our analysis shows that a wide range of different representations and combinations of representations (hybrid representations) are used to solve a wide range of common software engineering problems. However, we also observe that current research does not generally attempt to transfer existing representations or models to other studies, even though there are other contexts in which these representations and models may also be useful. We believe that there is potential for more reuse and for the application of transfer learning when applying DL to software engineering tasks.
  •  
38.
  • Samoaa, Peter, et al. (författare)
  • An Exploratory Study of the Impact of Parameterization on JMH Measurement Results in Open-Source Projects
  • 2021
  • Ingår i: ICPE 2021 - Proceedings of the ACM/SPEC International Conference on Performance Engineering. - New York, NY, USA : ACM. ; , s. 213-224
  • Konferensbidrag (refereegranskat)abstract
    • The Java Microbenchmark Harness (JMH) is a widely used tool for testing performance-critical code on a low level. One of the key features of JMH is the support for user-defined parameters, which allows executing the same benchmark with different workloads. However, a benchmark configured with n parameters with m different values each requires JMH to execute the benchmark m^n times (once for each combination of configured parameter values). Consequently, even fairly modest parameterization leads to a combinatorial explosion of benchmarks that have to be executed, hence dramatically increasing execution time. However, so far no research has investigated how this type of parameterization is used in practice, and how important different parameters are to benchmarking results. In this paper, we statistically study how strongly different user parameters impact benchmark measurements for 126 JMH benchmarks from five well-known open source projects. We show that 40% of the studied metric parameters have no correlation with the resulting measurement, i.e., testing with different values for these parameters does not lead to any insights. If there is a correlation, it is often strongly predictable, following a power law, linear, or step function curve. Our results provide a first understanding of the practical usage of user-defined JMH parameters, and how they correlate with the measurements produced by benchmarks. We further show that a machine learning model based on Random Forest ensembles can be used to predict the measured performance for an untested metric parameter value with an accuracy of 93% or higher for all but one benchmark class, demonstrating that, given sufficient training data, JMH performance test results for different parameterizations are highly predictable.
  •  
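The combinatorial explosion described above is easy to make concrete: n parameters with m values each produce m^n benchmark runs. A short sketch enumerating the combinations, with made-up parameter names and values:

```python
from itertools import product

def param_combinations(params):
    """Enumerate every run a parameterized benchmark would perform:
    one run per element of the cross-product of all parameter value lists
    (sketch; the parameter names below are hypothetical, not from JMH)."""
    names = list(params)
    return [dict(zip(names, values)) for values in product(*params.values())]

runs = param_combinations({
    "listSize": [10, 1000, 100000],
    "implementation": ["ArrayList", "LinkedList"],
})
print(len(runs))  # → 6, i.e., 3 values × 2 values
```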
39.
  • Schermann, Gerald, et al. (författare)
  • Continuous Experimentation - Challenges, Implementation Techniques, and Current Research
  • 2018
  • Ingår i: IEEE Software. - 1937-4194 .- 0740-7459. ; 35:2, s. 26-31
  • Tidskriftsartikel (refereegranskat)abstract
    • Continuous experimentation is an up-and-coming technique for requirements engineering and testing, particularly for web-based systems. On the basis of a practitioner survey, this article gives an overview of challenges, implementation techniques, and current research in the field. This article is part of a theme issue on release engineering.
  •  
40.
  • Schermann, G., et al. (författare)
  • Search-Based Scheduling of Experiments in Continuous Deployment
  • 2018
  • Ingår i: 2018 IEEE International Conference on Software Maintenance and Evolution. - : IEEE. - 9781538678701
  • Konferensbidrag (refereegranskat)abstract
    • Continuous experimentation involves practices for testing new functionality on a small fraction of the user base in production environments. Running multiple experiments in parallel requires handling user assignments (i.e., which users are part of which experiments) carefully, as experiments might overlap and influence each other. Furthermore, experiments are prone to change, get canceled, or are adjusted and restarted, and new ones are added regularly. We formulate this as an optimization problem, fostering the parallel execution of experiments and making sure that enough data is collected for every experiment while avoiding overlapping experiments. We propose a genetic algorithm that is capable of (re-)scheduling experiments and compare it with other search-based approaches (random sampling, local search, and simulated annealing). Our evaluation shows that our genetic implementation outperforms the other approaches by up to 19% regarding the fitness of the solutions identified, and by up to a factor of three in execution time in our evaluation scenarios.
  •  
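To make the search-based idea concrete, a minimal genetic algorithm can evolve start slots for a handful of experiments so that experiments sharing a user segment never overlap in time. This is a sketch of the general technique, not the authors' implementation; the experiment names, durations, and segments are invented:

```python
import random

random.seed(7)  # deterministic toy run

# Hypothetical experiments: (name, duration in slots, user segment).
# Two experiments conflict if they share a segment and overlap in time.
EXPERIMENTS = [
    ("canary-api", 3, "eu"), ("ab-checkout", 2, "eu"),
    ("dark-search", 4, "us"), ("ab-landing", 2, "us"),
    ("canary-db", 3, "eu"),
]
HORIZON = 10  # available time slots

def conflicts(schedule):
    """Count pairs of experiments overlapping on the same user segment."""
    count = 0
    for i, (_, di, si) in enumerate(EXPERIMENTS):
        for j in range(i + 1, len(EXPERIMENTS)):
            _, dj, sj = EXPERIMENTS[j]
            a, b = schedule[i], schedule[j]
            if si == sj and a < b + dj and b < a + di:  # intervals overlap
                count += 1
    return count

def fitness(schedule):
    # Prefer conflict-free schedules, then early completion (short makespan).
    makespan = max(s + d for s, (_, d, _) in zip(schedule, EXPERIMENTS))
    return -100 * conflicts(schedule) - makespan

def random_schedule():
    return [random.randrange(HORIZON - d + 1) for (_, d, _) in EXPERIMENTS]

def evolve(generations=200, pop_size=30):
    population = [random_schedule() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]  # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = random.sample(survivors, 2)
            cut = random.randrange(1, len(EXPERIMENTS))
            child = p1[:cut] + p2[cut:]            # one-point crossover
            if random.random() < 0.3:              # mutation: move one experiment
                k = random.randrange(len(EXPERIMENTS))
                child[k] = random.randrange(HORIZON - EXPERIMENTS[k][1] + 1)
            children.append(child)
        population = survivors + children
    return max(population, key=fitness)

best = evolve()
print(conflicts(best), best)
```

Rescheduling after an experiment is canceled, adjusted, or added amounts to re-running `evolve` with the updated `EXPERIMENTS` list, optionally seeding the initial population with the previous schedule.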
42.
  • Schermann, Gerald, et al. (authors)
  • Topology-aware continuous experimentation in microservice-based applications
  • 2020
  • In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). - Cham : Springer International Publishing. - 1611-3349 .- 0302-9743. ; 12571, pp. 19-35
  • Conference paper (peer-reviewed), abstract:
    • Continuous experiments, including practices such as canary releases or A/B testing, test new functionality on a small fraction of the user base in production environments. Monitoring data collected on different versions of a service is essential for decision-making on whether to continue or abort experiments. Existing approaches for decision-making rely on service-level metrics in isolation, ignoring that new functionality might introduce changes affecting other services or the overall application’s health state. Keeping track of these changes in applications comprising dozens or hundreds of services is challenging. We propose a holistic approach implemented as a research prototype to identify, visualize, and rank topological changes from distributed tracing data. We devise three ranking heuristics assessing how the changes impact the experiment’s outcome and the application’s health state. An evaluation on two case study scenarios shows that a hybrid heuristic based on structural analysis and a simple root-cause examination outperforms other heuristics in terms of ranking quality.
  •  
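The core step of identifying and ranking topological changes can be sketched with a toy example (made-up services and call counts; this simple magnitude-based ranking is far cruder than the three heuristics the paper devises):

```python
# Service-call topologies extracted from tracing data: edge -> call count.
baseline = {("gateway", "cart"): 100, ("cart", "db"): 100, ("gateway", "search"): 80}
canary = {("gateway", "cart"): 100, ("cart", "db"): 40,
          ("cart", "cache"): 60, ("gateway", "search"): 80}

def topology_changes(before, after):
    """Rank structural changes by the magnitude of the call-count shift."""
    changes = []
    for edge in set(before) | set(after):
        delta = after.get(edge, 0) - before.get(edge, 0)
        if delta:
            kind = ("added" if edge not in before
                    else "removed" if edge not in after else "shifted")
            changes.append((abs(delta), kind, edge))
    return sorted(changes, reverse=True)

for score, kind, edge in topology_changes(baseline, canary):
    print(score, kind, edge)  # the new cart->cache edge surfaces near the top
```

A structural heuristic like the paper's would additionally weigh where a changed edge sits relative to the experiment's service, rather than ranking by call-count delta alone.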
43.
  • Schermann, Gerald, et al. (authors)
  • We’re Doing It Live: A Multi-Method Empirical Study on Continuous Experimentation
  • 2018
  • In: Information and Software Technology. - : Elsevier BV. - 0950-5849. ; 99, pp. 41-57
  • Journal article (peer-reviewed), abstract:
    • Context: Continuous experimentation guides development activities based on data collected on a subset of online users on a new experimental version of the software. It includes practices such as canary releases, gradual rollouts, dark launches, or A/B testing. Objective: Unfortunately, our knowledge of continuous experimentation is currently primarily based on well-known and outspoken industrial leaders. To assess the actual state of practice in continuous experimentation, we conducted a mixed-method empirical study. Method: In our empirical study consisting of four steps, we interviewed 31 developers or release engineers, and performed a survey that attracted 187 complete responses. We analyzed the resulting data using statistical analysis and open coding. Results: Our results lead to several conclusions: (1) from a software architecture perspective, continuous experimentation is especially enabled by architectures that foster independently deployable services, such as microservices-based architectures; (2) from a developer perspective, experiments require extensive monitoring and analytics to discover runtime problems, consequently leading to developer on call policies and influencing the role and skill sets required by developers; and (3) from a process perspective, many organizations conduct experiments based on intuition rather than clear guidelines and robust statistics. Conclusion: Our findings show that more principled and structured approaches for release decision making are needed, striving for highly automated, systematic, and data- and hypothesis-driven deployment and experimentation.
  •  
44.
  • Scheuner, Joel, 1991, et al. (authors)
  • A Cloud Benchmark Suite Combining Micro and Applications Benchmarks
  • 2018
  • In: ACM/SPEC International Conference on Performance Engineering Companion. - New York, NY, USA : ACM. - 9781450356299 ; pp. 161-166
  • Conference paper (peer-reviewed), abstract:
    • Micro and application performance benchmarks are commonly used to guide cloud service selection. However, they are often considered in isolation, in a hardly reproducible setup, with a flawed execution strategy. This paper presents a new execution methodology that combines micro and application benchmarks into a benchmark suite called RMIT Combined, integrates this suite into an automated cloud benchmarking environment, and implements a repeatable execution strategy. Additionally, a newly crafted Web serving benchmark called WPBench with three different load scenarios is contributed. A case study in the Amazon EC2 cloud demonstrates that choosing a cost-efficient instance type can deliver up to 40% better performance at 40% lower cost for the Web serving benchmark WPBench. Contrary to prior research, our findings reveal that network performance no longer varies significantly. Our results also show that choosing a modern type of virtualization can improve disk utilization by up to 10% for I/O-heavy workloads.
  •  
45.
  • Scheuner, Joel, 1991, et al. (authors)
  • Estimating Cloud Application Performance Based on Micro-Benchmark Profiling
  • 2018
  • In: 2018 IEEE 11th International Conference on Cloud Computing (CLOUD). - 2159-6190. - 9781538672358 ; 2018-July, pp. 90-97
  • Conference paper (peer-reviewed), abstract:
    • The continuing growth of the cloud computing market has led to an unprecedented diversity of cloud services. To support service selection, micro-benchmarks are commonly used to identify the best performing cloud service. However, it remains unclear how relevant these synthetic micro-benchmarks are for gaining insights into the performance of real-world applications. Therefore, this paper develops a cloud benchmarking methodology that uses micro-benchmarks to profile applications and subsequently predicts how an application performs on a wide range of cloud services. A study with a real cloud provider (Amazon EC2) has been conducted to quantitatively evaluate the estimation model with 38 metrics from 23 micro-benchmarks and 2 applications from different domains. The results reveal remarkably low variability in cloud service performance and show that selected micro-benchmarks can estimate the duration of a scientific computing application with a relative error of less than 10% and the response time of a Web serving application with a relative error between 10% and 20%. In conclusion, this paper emphasizes the importance of cloud benchmarking by substantiating the suitability of micro-benchmarks for estimating application performance in comparison to common baselines but also highlights that only selected micro-benchmarks are relevant to estimate the performance of a particular application.
  •  
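The estimation step itself can be illustrated with made-up numbers (the paper fits its model over 38 metrics from 23 micro-benchmarks; the single-metric, inverse-proportional model below is only meant to show how such an estimate and its relative error are computed):

```python
# Hypothetical profiling data: instance type -> (micro-benchmark score,
# measured application duration in seconds).
known = {
    "m5.large": (100.0, 240.0),
    "c5.large": (140.0, 175.0),
    "r5.large": (95.0, 255.0),
}

# Assume duration ~ k / micro_score and fit k over the profiled instances.
k = sum(score * duration for score, duration in known.values()) / len(known)

def estimate(micro_score):
    return k / micro_score

# Estimate the duration on an untested instance type and compare against a
# (hypothetical) later measurement to obtain the relative error.
predicted = estimate(120.0)
measured = 205.0
rel_error = abs(predicted - measured) / measured
print(round(rel_error, 3))  # 0.015, i.e., well under a 10% relative error
```

In the paper's terms, a low relative error on held-out instance types is what qualifies a micro-benchmark as a useful proxy for a particular application.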
46.
  • Scheuner, Joel, 1991, et al. (authors)
  • Function-as-a-Service Performance Evaluation: A Multivocal Literature Review
  • 2020
  • In: Journal of Systems and Software. - : Elsevier BV. - 0164-1212. ; 170
  • Journal article (peer-reviewed), abstract:
    • Function-as-a-Service (FaaS) is one form of the serverless cloud computing paradigm and is defined through FaaS platforms (e.g., AWS Lambda) executing event-triggered code snippets (i.e., functions). Many studies that empirically evaluate the performance of such FaaS platforms have started to appear, but we currently lack a comprehensive understanding of the overall domain. To address this gap, we conducted a multivocal literature review (MLR) covering 112 studies from academic (51) and grey (61) literature. We find that existing work mainly studies the AWS Lambda platform and focuses on micro-benchmarks using simple functions to measure CPU speed and FaaS platform overhead (i.e., container cold starts). Further, we discover a mismatch between academic and industrial sources on tested platform configurations, find that function triggers remain insufficiently studied, and identify HTTP API gateways and cloud storage as the most used external service integrations. Following existing guidelines on experimentation in cloud systems, we discover many flaws threatening the reproducibility of the experiments presented in the surveyed studies. We conclude with a discussion of gaps in the literature and highlight methodological suggestions that may serve to improve future FaaS performance evaluation studies.
  •  
47.
  • Scheuner, Joel, 1991, et al. (authors)
  • Performance Benchmarking of Infrastructure-as-a-Service (IaaS) Clouds with Cloud WorkBench (Tutorial)
  • 2019
  • In: 2019 IEEE 4th International Workshops on Foundations and Applications of Self* Systems (FAS*W). - 9781728124063 ; June 2019, pp. 257-258
  • Conference paper (peer-reviewed), abstract:
    • The continuing growth of the cloud computing market has led to an unprecedented diversity of cloud services with different performance characteristics. To support service selection, researchers and practitioners conduct cloud performance benchmarking by measuring and objectively comparing the performance of different providers and configurations (e.g., instance types in different data center regions). In this tutorial, we demonstrate how to write performance tests for IaaS clouds using the Web-based benchmarking tool Cloud WorkBench (CWB). We will motivate and introduce benchmarking of IaaS clouds in general, demonstrate the execution of a simple benchmark in a public cloud environment, summarize the CWB tool architecture, and interactively develop and deploy a more advanced benchmark together with the participants.
  •  
48.
  • Scheuner, Joel, 1991, et al. (authors)
  • Performance Benchmarking of Infrastructure-as-a-Service (IaaS) Clouds with Cloud WorkBench
  • 2019
  • In: ICPE 2019 - Companion of the 2019 ACM/SPEC International Conference on Performance Engineering. - New York, NY, USA : ACM. ; pp. 53-56
  • Conference paper (peer-reviewed), abstract:
    • The continuing growth of the cloud computing market has led to an unprecedented diversity of cloud services with different performance characteristics. To support service selection, researchers and practitioners conduct cloud performance benchmarking by measuring and objectively comparing the performance of different providers and configurations (e.g., instance types in different data center regions). In this tutorial, we demonstrate how to write performance tests for IaaS clouds using the Web-based benchmarking tool Cloud WorkBench (CWB). We will motivate and introduce benchmarking of IaaS clouds in general, demonstrate the execution of a simple benchmark in a public cloud environment, summarize the CWB tool architecture, and interactively develop and deploy a more advanced benchmark together with the participants.
  •  
49.
  • Scheuner, Joel, 1991, et al. (authors)
  • Transpiling Applications into Optimized Serverless Orchestrations
  • 2019
  • In: Proceedings - 2019 IEEE 4th International Workshops on Foundations and Applications of Self* Systems, FAS*W 2019. - : IEEE. ; June 2019, pp. 72-73
  • Conference paper (peer-reviewed), abstract:
    • The serverless computing paradigm promises increased development productivity by abstracting the underlying hardware infrastructure and software runtime when building distributed cloud applications. However, composing a serverless application consisting of many tiny functions is still a cumbersome and inflexible process due to the lack of a unified source code view and strong coupling to non-standardized function-level interfaces for code and configuration. In our vision, developers can focus on writing readable source code in a logical structure, which then gets transformed into an optimized multi-function serverless orchestration. Our idea involves transpilation (i.e., source-to-source transformation) based on an optimization model (e.g., cost optimization) by dynamically deciding which set of methods will be grouped into individual deployment units. A successful implementation of our vision would enable a broader range of serverless applications and allow for dynamic deployment optimization based on monitoring runtime metrics. Further, we would expect increased developer productivity by using more familiar abstractions and facilitating clean coding practices and code reuse.
  •  
50.
  • Skarlat, O., et al. (authors)
  • Optimized IoT service placement in the fog
  • 2017
  • In: Service Oriented Computing and Applications. - : Springer Science and Business Media LLC. - 1863-2394 .- 1863-2386. ; 11:4, pp. 427-443
  • Journal article (peer-reviewed), abstract:
    • The Internet of Things (IoT) leads to an ever-growing presence of ubiquitous networked computing devices in public, business, and private spaces. These devices do not simply act as sensors, but feature computational, storage, and networking resources. Being located at the edge of the network, these resources can be exploited to execute IoT applications in a distributed manner. This concept is known as fog computing. While the theoretical foundations of fog computing are already established, there is a lack of resource provisioning approaches to enable the exploitation of fog-based computational resources. To resolve this shortcoming, we present a conceptual fog computing framework. Then, we model the service placement problem for IoT applications over fog resources as an optimization problem, which explicitly considers the heterogeneity of applications and resources in terms of Quality of Service attributes. Finally, we propose a genetic algorithm as a problem resolution heuristic and show, through experiments, that the service execution can achieve a reduction of network communication delays when the genetic algorithm is used, and a better utilization of fog resources when the exact optimization method is applied.
  •  
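At toy scale, the exact optimization baseline mentioned in the abstract reduces to exhaustively enumerating service-to-node assignments (the node capacities, delays, and service demands below are invented). The combinatorial growth of this search space is precisely what motivates the genetic algorithm for realistic problem sizes:

```python
from itertools import product

# Toy fog landscape: node -> (CPU capacity, access delay in ms to the sensors).
NODES = {"fog-a": (2, 5), "fog-b": (2, 8), "cloud": (10, 60)}
# IoT services and their CPU demand.
SERVICES = {"filter": 1, "aggregate": 1, "analytics": 2}

def total_delay(placement):
    """Sum of access delays, or None if a node's CPU capacity is exceeded."""
    load = {n: 0 for n in NODES}
    delay = 0
    for svc, node in placement.items():
        load[node] += SERVICES[svc]
        delay += NODES[node][1]
    if any(load[n] > NODES[n][0] for n in NODES):
        return None
    return delay

# Exact resolution: enumerate every assignment of services to nodes and
# keep the feasible one with the lowest total delay.
feasible = []
for assignment in product(NODES, repeat=len(SERVICES)):
    placement = dict(zip(SERVICES, assignment))
    d = total_delay(placement)
    if d is not None:
        feasible.append((d, placement))

best_delay, best_placement = min(feasible, key=lambda t: t[0])
print(best_delay, best_placement)  # 18 ms: small services on fog-a, analytics on fog-b
```

With |NODES|^|SERVICES| candidate placements, the enumeration explodes for dozens of services and nodes, which is where a heuristic such as the paper's genetic algorithm takes over.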