Accepted papers

Technical papers

A Preliminary Study on the Adequacy of Static Analysis Warnings with Respect to Code Smell Prediction

Savanna Lujan savanna.lujan@tuni.fi (Tampere University), Fabiano Pecorelli fpecorelli@unisa.it (SeSa Lab - University of Salerno),
Fabio Palomba fpalomba@unisa.it (SeSa Lab - University of Salerno), Andrea De Lucia adelucia@unisa.it (SeSa Lab - University of Salerno),
Valentina Lenarduzzi valentina.lenarduzzi@lut.fi (LUT University)

Abstract: Code smells are poor implementation choices applied during software evolution that can affect source code maintainability. While several heuristic-based approaches have been proposed in the past, machine learning solutions have recently gained attention since they may address some limitations of state-of-the-art approaches. Unfortunately, machine learning-based code smell detectors still suffer from low accuracy. In this paper, we aim to advance knowledge in the field by investigating the role of static analysis warnings as features of machine learning models for the detection of three code smell types. We first verify the potential contribution of these features. Then, we build code smell prediction models exploiting the most relevant features from the first analysis. The main finding of the study is that the warnings given by the considered tools drastically increase the performance of code smell prediction models compared to what was reported by previous research in the field.
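
By way of illustration, here is a minimal sketch of the general idea of using static analysis warnings as classifier features; it is not the authors' implementation, and the warning categories, data, and labels are invented for the example (scikit-learn assumed):

# Hypothetical sketch: static analysis warning counts as features
# of a code smell classifier. Feature names and data are invented.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Each row is a class; columns count warnings per (hypothetical) category.
warning_features = ["checkstyle_naming", "pmd_design", "findbugs_correctness"]
X = np.array([[3, 1, 0], [0, 4, 2], [7, 0, 1], [1, 1, 0], [5, 3, 2], [0, 0, 0]])
y = np.array([1, 0, 1, 0, 1, 0])  # 1 = smelly (e.g., God Class), 0 = clean

model = RandomForestClassifier(n_estimators=100, random_state=42)
print("F1 per fold:", cross_val_score(model, X, y, cv=3, scoring="f1"))

# Inspect which warning categories the model relies on.
model.fit(X, y)
for name, importance in zip(warning_features, model.feature_importances_):
    print(f"{name}: {importance:.2f}")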


RARE: A Labeled Dataset for Cloud-Native Memory Anomalies

Francesco Lomio francesco.lomio@tuni.fi (Tampere University), Diego Martínez Baselga diego.martinezbaselga@tuni.fi (Tampere University),
Sergio Moreschini sergio.moreschini@tuni.fi (Tampere University), Heikki Huttunen heikki.huttunen@tuni.fi (Tampere University),
Davide Taibi davide.taibi@tuni.fi (Tampere University)

Abstract: Anomaly detection has been attracting interest from both industry and the research community for many years, as the number of published papers and adopted services has grown exponentially over the last decade. One of the reasons behind this is the wide adoption of cloud systems by the majority of players in multiple industries, such as online shopping, advertisement, or remote computing. In this work we propose a Dataset foR cloud-nAtive memoRy anomaliEs: RARE. It includes labelled anomaly time-series data comprising over 900 unique metrics. This dataset was generated using a microservice that injects an artificial byte stream to overload the nodes, provoking memory anomalies that in some cases resulted in a crash. The system was built using a Kafka server deployed on a Kubernetes cluster, and we used Prometheus to access and download the metrics related to the server. The dataset can be used together with machine learning algorithms to detect anomalies in a cloud-based system, and it will be available as a CSV file through an online repository. We also include an example application that uses a Random Forest algorithm to classify the data as anomalous or not. The goal of the RARE dataset is to support the development of more accurate and reliable machine learning methods for anomaly detection in cloud-based systems.
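
As a rough sketch of the kind of Random Forest baseline the abstract mentions: the actual RARE schema is not reproduced here, so the metric column names and data below are synthetic stand-ins; in practice one would load the published CSV with pd.read_csv instead.

# Sketch of an anomaly classification baseline; columns and data are
# invented stand-ins for the real RARE metrics (scikit-learn assumed).
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(300, 4)),
                  columns=["mem_usage", "mem_rss", "gc_time", "heap_free"])
# Synthetic label loosely tied to memory usage, standing in for the
# dataset's anomaly annotations.
df["is_anomaly"] = (df["mem_usage"] + rng.normal(0, 0.5, 300) > 1.2).astype(int)

X, y = df.drop(columns=["is_anomaly"]), df["is_anomaly"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))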


TraceSim: A Method for Calculating Stack Trace Similarity

Roman Vasiliev roman.vasiliev@jetbrains.com (JetBrains), Dmitrij Koznov d.koznov@spbu.ru (Saint-Petersburg State University),
George Chernishev chernishev@gmail.com (Saint-Petersburg University, Russia), Aleksandr Khvorov aleksandr.khvorov@jetbrains.com (JetBrains),
Dmitry Luciv d.luciv@spbu.ru (Saint-Petersburg State University), Nikita Povarov nikita.povarov@jetbrains.com (JetBrains)

Abstract: Many contemporary software products have subsystems for automatic crash reporting. However, it is well known that the same bug can produce slightly different reports. To manage this problem, reports are usually grouped, often manually by developers. Manual triaging, however, becomes infeasible for products with large user bases, which has motivated many different approaches to automating this task. Moreover, given the large volume of reports that must be processed properly, improving triaging quality matters: even a relatively small improvement could play a significant role in the overall accuracy of report bucketing. The majority of existing studies use some kind of stack trace similarity metric, based either on information retrieval techniques or on string matching methods. However, the quality of triaging is still insufficient.
In this paper, we describe TraceSim, a novel approach to this problem that combines TF-IDF, Levenshtein distance, and machine learning to construct a similarity metric. Our metric has been implemented in an industrial-grade report triaging system. The evaluation on a manually labeled dataset shows significantly better results compared to baseline approaches.
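
To make the TF-IDF-plus-Levenshtein combination concrete, here is a simplified illustration, not the actual TraceSim formula: edit operations over stack frames are weighted by an IDF-style rarity score, so edits on rare frames cost more than edits on ubiquitous ones. Frame names and traces are invented.

# Toy sketch: rarity-weighted Levenshtein distance over stack frames.
import math
from collections import Counter

def idf_weights(traces):
    """Inverse document frequency of each frame across all traces."""
    n = len(traces)
    df = Counter(frame for trace in traces for frame in set(trace))
    return {f: math.log(n / df[f]) + 1.0 for f in df}

def weighted_levenshtein(a, b, w):
    """Edit distance over frames; each operation is weighted by frame rarity."""
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + w.get(a[i - 1], 1.0)
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + w.get(b[j - 1], 1.0)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0.0 if a[i - 1] == b[j - 1] else max(
                w.get(a[i - 1], 1.0), w.get(b[j - 1], 1.0))
            d[i][j] = min(d[i - 1][j] + w.get(a[i - 1], 1.0),   # delete
                          d[i][j - 1] + w.get(b[j - 1], 1.0),   # insert
                          d[i - 1][j - 1] + sub)                # substitute
    return d[m][n]

traces = [["main", "parse", "read"], ["main", "parse", "write"],
          ["main", "render", "draw"]]
w = idf_weights(traces)
print(weighted_levenshtein(traces[0], traces[1], w))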


Speeding Up the Data Extraction of Machine Learning Approaches: A Distributed Framework

Martin Steinhauer m.steinhauer@studenti.unisa.it (University of Salerno), Fabio Palomba fpalomba@unisa.it (University of Salerno)

Abstract: In the last decade, mining software repositories (MSR) has become one of the most important sources to feed machine learning models. Open-source projects on platforms like GitHub in particular provide a tremendous amount of data and make it easily accessible. Nevertheless, there is still a lack of standardized pipelines to extract data in an automated and fast way. Even though several frameworks and tools exist that can fulfill specific tasks or parts of the data extraction process, none of them allows building an automated mining pipeline, nor do they support full parallelization. As a consequence, researchers interested in mining software repositories to feed machine learning models are often forced to re-implement commonly used tasks, leading to additional development time and to libraries that may not be integrated optimally.
This preliminary study aims to demonstrate current limitations of existing tools, and of Git itself, that threaten the prospects of standardization and parallelization. We also introduce the multi-dimensionality aspects of a Git repository and how they affect computation time. Then, as a proof of concept, we define an exemplary pipeline for predicting refactoring operations and assess its performance. Finally, we discuss the limitations of the pipeline and further optimizations to be done.
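
A minimal sketch of the parallelization idea discussed here: extracting commit metadata from several repositories concurrently rather than sequentially. The repository paths are placeholders, and this is only an illustration of the concept, not the paper's framework.

# Sketch: parallel commit-hash extraction across local Git repositories.
import subprocess
from concurrent.futures import ProcessPoolExecutor

def commit_hashes(repo_path):
    """Return (path, list of commit hashes) for a local Git repository."""
    try:
        out = subprocess.run(
            ["git", "-C", repo_path, "log", "--all", "--pretty=format:%H"],
            capture_output=True, text=True, check=True)
        return repo_path, out.stdout.splitlines()
    except subprocess.CalledProcessError:
        return repo_path, []  # not a Git repository (or git failed)

repos = ["/tmp/repo-a", "/tmp/repo-b", "/tmp/repo-c"]  # placeholder paths

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        for path, hashes in pool.map(commit_hashes, repos):
            print(f"{path}: {len(hashes)} commits")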


Singling the Odd Ones Out: A Novelty Detection Approach to Find Defects in Infrastructure-as-Code

Stefano Dalla Palma s.dallapalma@uvt.nl (Jheronimus Academy of Data Science), Majid Mohammadi m.mohammadi1@tue.nl (Jheronimus Academy of Data Science), Dario Di Nucci d.dinucci@uvt.nl (Jheronimus Academy of Data Science), Damian A. Tamburri d.a.tamburri@tue.nl (Jheronimus Academy of Data Science)

Abstract: Although Infrastructure-as-Code (IaC) is increasingly adopted, little is known about how to best maintain and evolve it. Previous studies focused on defining machine learning models to predict defect-prone blueprints, with the final goal of helping DevOps engineers schedule testing and maintenance activities. However, the dominant technique for IaC defect prediction is supervised binary classification, which uses both defective and non-defective instances for training; such methods require labeled data points to train the classifier. Furthermore, the high imbalance between defective and non-defective samples makes the training more difficult and leads to unreliable classifiers. In this work, we tackle the defect-prediction problem from a different perspective using novelty detection: we evaluate three techniques, namely OneClassSVM, LocalOutlierFactor, and IsolationForest, and compare their performance with a baseline RandomForest binary classifier. These models are trained using only non-defective samples, while defective data points are treated as novelties, because the number of defective samples is too small compared to non-defective ones. We conduct an empirical study on an extremely imbalanced dataset consisting of 85 real-world Ansible projects containing only small amounts of defective instances. We found that novelty detection techniques can recognize defects with a high level of precision and recall, an AUC-PR up to 0.86, and an MCC up to 0.31. We deem that our results can influence the current trends in defect detection and put forward a new research path toward dealing with this problem.
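
A minimal sketch of this novelty-detection setup with the three scikit-learn estimators the abstract names: fit on non-defective samples only, then flag defective ones as novelties. The data here is synthetic; the paper's features come from real Ansible projects.

# Sketch: novelty detection trained on non-defective samples only.
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.neighbors import LocalOutlierFactor
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_clean = rng.normal(0.0, 1.0, size=(200, 5))   # non-defective (training)
X_defect = rng.normal(4.0, 1.0, size=(10, 5))   # defective (novelties)

models = {
    "OneClassSVM": OneClassSVM(nu=0.05),
    "LocalOutlierFactor": LocalOutlierFactor(novelty=True),
    "IsolationForest": IsolationForest(random_state=0),
}
for name, model in models.items():
    model.fit(X_clean)                 # fitted on non-defective data only
    pred = model.predict(X_defect)     # -1 = novelty, +1 = inlier
    print(f"{name}: flagged {np.sum(pred == -1)}/{len(X_defect)} defects")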


DeepIaC: Deep Learning-based Linguistic Anti-pattern Detection in IaC

Nemania Borovits n.borovits@tilburguniversity.edu (Tilburg University/JADS), Indika Kumara i.p.k.weerasingha.dewage@tue.nl (Eindhoven University of Technology/JADS), Parvathy Krishnan parvathykrishnank@gmail.com (Tilburg University/JADS), Stefano Dalla Palma s.dalla.palma@uvt.nl (Tilburg University/JADS), Dario Di Nucci d.dinucci@uvt.nl (Tilburg University/JADS), Fabio Palomba fpalomba@unisa.it (University of Salerno), Damian Andrew Tamburri d.a.tamburri@tue.nl (Eindhoven University of Technology/JADS), Willem-Jan van den Heuvel W.J.A.M.v.d.Heuvel@jads.nl (Tilburg University/JADS)

Abstract: Linguistic anti-patterns are recurring poor practices concerning inconsistencies among the naming, documentation, and implementation of an entity. They impede the readability, understandability, and maintainability of source code. In this paper, we attempt to detect linguistic anti-patterns in infrastructure as code (IaC) scripts used to provision and manage computing environments. In particular, we consider inconsistencies between the logic/body of IaC code units and their names. To this end, we propose a novel automated approach that employs word embeddings and deep learning techniques. We build and use the abstract syntax tree of IaC code units to create their code embeddings. Our experiments with a dataset systematically extracted from open-source repositories show that our approach yields an accuracy between 0.785 and 0.915 in detecting inconsistencies.
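
As a toy sketch of the name/body consistency idea, not the paper's actual deep learning architecture: embed name tokens and body tokens with word embeddings, then use cosine similarity as a rough inconsistency signal. The token lists are invented and gensim is assumed; the paper instead builds embeddings from abstract syntax trees and trains a neural detector.

# Toy sketch: name-vs-body embedding similarity for IaC units.
import numpy as np
from gensim.models import Word2Vec

# Tokenized IaC units as (name tokens, body tokens); invented examples.
units = [
    (["install", "nginx"], ["package", "nginx", "state", "present"]),
    (["start", "service"], ["service", "nginx", "state", "started"]),
    (["install", "nginx"], ["service", "mysql", "state", "stopped"]),  # inconsistent
]
corpus = [name + body for name, body in units]
w2v = Word2Vec(corpus, vector_size=16, min_count=1, seed=0)

def embed(tokens):
    """Average the word vectors of a token list."""
    return np.mean([w2v.wv[t] for t in tokens], axis=0)

for name, body in units:
    a, b = embed(name), embed(body)
    cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    print(f"{' '.join(name):>16}  similarity={cos:.2f}")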

Presentation abstract

An Effective Sequence Alignment Method for Duplicate Crash Report Detection

Irving Muller Rodrigues irving.rodrigues@gmail.com (Polytechnique Montreal), Daniel Aloise daniel.aloise@polymtl.ca (Polytechnique Montreal),
Eraldo Rezende Fernandes eraldo@facom.ufms.br (Universidade Federal de Mato Grosso do Sul)

Abstract: Software systems can automatically send crash reports to developers for investigation when a program failure occurs. A significant portion of these crash reports are duplicates, i.e., they were caused by the same software issue. In general, developers want to group duplicate crash reports into the same cluster, called a bucket, to better analyze the software failure. However, performing this task manually is time-consuming, laborious, and impractical in many software systems. In this paper, we present a novel method to automatically detect duplicate crash reports based on the stack traces generated when the system crashes. Our technique is an extension of a previous method based on the Needleman-Wunsch algorithm, which computes the similarity between two stack traces by means of edit operations with fixed penalties. We propose a mechanism that incorporates the position and the frequency of functions in the stack trace to compute these penalties. We demonstrate that our technique outperforms state-of-the-art systems and strong baselines in different scenarios.
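
A simplified illustration of the idea described above: Needleman-Wunsch alignment over stack frames where penalties depend on a frame's position in the stack instead of being fixed. The decay weighting and example traces below are invented for illustration and are not the paper's exact formulation.

# Toy sketch: position-weighted Needleman-Wunsch over stack frames.
def align_score(a, b, match=2.0, mismatch=-1.0, gap=-1.0, decay=0.9):
    """Global alignment score with position-weighted penalties."""
    m, n = len(a), len(b)
    # weight(i): frames near the top of the stack (index 0) matter more.
    w = lambda i: decay ** i
    s = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        s[i][0] = s[i - 1][0] + gap * w(i - 1)
    for j in range(1, n + 1):
        s[0][j] = s[0][j - 1] + gap * w(j - 1)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            weight = max(w(i - 1), w(j - 1))
            s[i][j] = max(s[i - 1][j - 1] + pair * weight,   # (mis)match
                          s[i - 1][j] + gap * w(i - 1),      # gap in b
                          s[i][j - 1] + gap * w(j - 1))      # gap in a
    return s[m][n]

t1 = ["crash_handler", "parse_config", "read_file", "main"]
t2 = ["crash_handler", "parse_config", "open_file", "main"]
print(align_score(t1, t2))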