T4: Data Science Approaches in Software Engineering


Date: July 22 (Monday)
Period: 1:30 - 4:30 pm (Half-day)

Tutorial Outline


Abstract


There is an abundance of metrics in the field of software reliability engineering. However, it is challenging to incorporate the right metrics into predictive models that are both mathematically sound and reflective of how software behaves in a large production environment. Most importantly, the metrics and models need to help engineering teams improve their practices and processes, enabling them to produce quality products.

High-performance models are needed to enable software practitioners to identify deficient (and superior) development and test practices. Even when using standard practice metrics, and models derived from these metrics, software development teams can, and do, vary substantially in practice adoption and effectiveness. One challenge for researchers and analysts in these organizations is to develop and implement mathematical models that adequately characterize the health of individual practices (such as code review, unit testing, static analysis, and regression testing). These models can enable process and quality assurance groups to assist engineering teams in surgically repairing broken practices or replacing them with more effective and efficient ones.
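
As an illustration of what a practice-health model might look like, below is a minimal sketch in Python: a logistic regression over hypothetical per-release practice metrics, used to estimate the risk of a poor downstream outcome. The metric names, data, and model choice are illustrative assumptions, not the actual models used at Cisco.

    # Minimal sketch: characterizing the "health" of development/test practices
    # from in-process metrics. All metric names and data are hypothetical.
    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    # Hypothetical per-release practice metrics (one row per release).
    releases = pd.DataFrame({
        "code_review_coverage":    [0.92, 0.61, 0.85, 0.40, 0.77, 0.95, 0.55, 0.88],
        "unit_test_pass_rate":     [0.99, 0.90, 0.97, 0.82, 0.93, 0.99, 0.88, 0.96],
        "static_analysis_density": [0.3,  1.8,  0.6,  2.4,  1.1,  0.2,  2.0,  0.5],  # findings/KLOC
    })
    # Outcome label: 1 = elevated customer-found defects after release.
    elevated_defects = np.array([0, 1, 0, 1, 0, 0, 1, 0])

    # Fit a simple logistic model relating practice metrics to the outcome.
    model = LogisticRegression().fit(releases, elevated_defects)

    # "Health" check for a new release = predicted risk of a poor outcome.
    new_release = pd.DataFrame([{"code_review_coverage": 0.70,
                                 "unit_test_pass_rate": 0.91,
                                 "static_analysis_density": 1.4}])
    risk = model.predict_proba(new_release)[0, 1]
    print(f"Estimated risk of elevated customer-found defects: {risk:.2f}")

In practice, such a model would be trained on many more releases and validated carefully before its outputs were used to judge practice health.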

In this tutorial, we will describe our experience with various types of in-process and downstream metrics. We will then describe how such metrics are used in model building and implementation, and the boundaries within which certain types of models perform well. We will also address how to balance model generalizability and specificity in order to integrate the correct computational strategies and methods into everyday engineering workflows.

An important part of the analysis and modeling effort is the correlative 'linking' of development and test metric values to customer experience outcomes, and then the 'linking' of those outcomes to customer sentiment (i.e., satisfaction). These 'linkages' are essential not only for convincing engineering leadership to use certain computational tools in practice, but also for enabling investigators, at an early stage, to design experiments and pilots that test model applicability for future products and releases. After convincing experiments and pilots have been demonstrated, much work remains: choosing a useful, and manageable, set of metrics; establishing goals and tracking/reporting mechanisms; and planning and implementing the tooling, training, and rollout steps. These practical considerations invariably put a strain on the models, so the models and ancillary analyses must be resilient, 'industrial strength.'
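
To make the 'linking' idea concrete, below is a minimal sketch in Python that correlates a hypothetical in-process metric with a customer experience outcome, and that outcome with a customer satisfaction score, using Spearman rank correlation. The variable names and data are synthetic assumptions for illustration; real analyses span many releases and products and require more careful statistical treatment.

    # Minimal sketch of the metric-to-experience-to-satisfaction "linkage".
    # All values are synthetic and for illustration only.
    import pandas as pd
    from scipy.stats import spearmanr

    data = pd.DataFrame({
        # In-process metric: e.g., defects found per KLOC during system test.
        "system_test_defect_density": [1.2, 0.8, 2.1, 0.5, 1.7, 0.9, 2.4, 0.6],
        # Customer experience outcome: e.g., customer-found defects in the first 6 months.
        "customer_found_defects":     [14, 9, 25, 5, 20, 10, 30, 6],
        # Customer sentiment: e.g., satisfaction survey score (1-5).
        "satisfaction_score":         [3.9, 4.3, 3.1, 4.6, 3.4, 4.2, 2.9, 4.5],
    })

    rho1, p1 = spearmanr(data["system_test_defect_density"], data["customer_found_defects"])
    rho2, p2 = spearmanr(data["customer_found_defects"], data["satisfaction_score"])
    print(f"in-process -> customer experience: rho={rho1:.2f} (p={p1:.3f})")
    print(f"customer experience -> satisfaction: rho={rho2:.2f} (p={p2:.3f})")

If such correlations hold up across releases and products, they provide the evidence needed to convince engineering leadership and to design the pilots described above.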

Understanding a model's practical limitations and strengths is an important aspect of its use – just as the mathematical and statistical limitations and strengths underscore a model's scientific validity. Both factors, mathematical and practical, need to mesh properly in a workable way in a data-driven engineering environment. The tutorial addresses the integration of these factors.

We will describe our experience in building and implementing models used by engineering teams that employ diverse development approaches, including waterfall, hybrid, and agile development. We will show how we link in-process measures/metrics (from development and test) to customer experience, and then to customer satisfaction, which in turn correlates strongly with company revenue and market share. We will discuss the steps involved in choosing the most valuable metrics, setting goals for these metrics, and using them to help improve development and test practices and processes.

Speakers


Pete Rotella
Cisco Systems, Inc., USA


Pete Rotella has over 30 years' experience in the software industry, as a leader of large-scale development projects and as a senior software engineering researcher. He has led major system development projects at IBM Corporation, the U.S. Environmental Protection Agency, the U.S. National Institutes of Health, GlaxoSmithKline plc, Unisys Corporation, and several statistical systems startups. For the past 16 years, he has focused on improving software reliability at Cisco Systems, Inc.


Sunita Chulani
Cisco Systems, Inc., USA


Sunita Chulani is an Advisory Engineer/Senior Manager of Analytical Models and Insights at Cisco Systems. She has deep subject matter expertise in software metrics, measurement, and modeling, and is responsible for developing insights derived from descriptive and prescriptive quality data analytics. She understands the interplay between engineering and management, with strong analytical, communication, and leadership skills. Her team's charter focuses on Analytic Models and Customer/Product Insights. Sunita is a go-to expert with a 9-year tenure at Cisco. She holds several patents and has co-authored a book, several book chapters, encyclopedia articles, and more than five dozen papers and presentations at prestigious conferences. She is also active in the IEEE, is influential in the field of software reliability, and has taught graduate-level courses at Carnegie Mellon University. Prior to Cisco, Sunita was a Research Staff Member at IBM Research in the Center for Software Engineering. She received her Ph.D. and Master's in Computer Science (with an emphasis on Statistics/Data Analysis and Software Economics) from the University of Southern California.