UNDERSTANDING THE DEVELOPER PARTICIPATION IN BUG FIX PROCESS

Predicting the bug fix time in open source software is a challenging job. A software bug has many attributes that define its characteristics. Some of these attributes are filled in at the time of reporting and some at the time of bug fixing. In this paper, 836 bug reports of two products, Thunderbird and Webtools, of the Mozilla open source project have been considered. In the bug reports, we see that there is no linear relationship among the bug attributes bug fix time, developers, cc count and severity. This paper analyzes the interdependence among these attributes through graphical representation. The results are as follows. Case 1. 73% of the bugs reported for Webtools are fixed by 17% of the developers, and 61% of the bugs reported for Thunderbird are fixed by 14% of the developers. Case 2. We tried to find a relationship between the developer and the time taken by that developer to fix a bug. We observed a significant variation in the bug fixing process: bugs may take from 1 day to 4 years to fix. Case 3. There is no linear relationship between cc count, i.e. the manpower involved in the bug fixing process, and bug fix time. Case 4. The maximum number of developers is involved in fixing bugs of the major severity class.


INTRODUCTION
In the open source software development process, software bug repositories provide crucial information to users and developers for the success of open source projects. An open source bug repository is a collection of bug reports that is available to users and developers. Open source software provides its source code for further development and enhancement. The data varies from version to version and includes change log data, web usage data, version archives, discussion forums on bug reports, etc. Failure data is also maintained using different bug reporting and tracking systems. A reported bug contains many attributes such as severity, priority, component, operating system used, summary, description, and the status updates of the bug report as a time series. This data is very useful for research on software reliability, developer expertise, software quality, resource utilization, effort, cost and time estimation, duplicate detection, dependency analysis, bug prediction, impact analysis, co-change analysis, change prediction, and many more. To perform these analyses, we need to access software repositories and analyze them, which is called mining software repositories (MSR) [11].
A software bug has many attributes which are used to measure the quality and performance of the software. In this paper, we have considered the bug reports of two products of the Mozilla open source software project. We have taken four quantified attributes, namely bug fix time, cc count, developer id, and severity.
A software bug report is characterized by the following attributes [7].

Bug Id
The unique numeric id of a bug.

Priority
This field describes the importance and order in which a bug should be fixed compared to other bugs. P1 is considered the highest and P5 is the lowest.

Resolution
The resolution field indicates what happened to this bug, e.g. FIXED.

Status
The Status field indicates the current state of a bug, e.g. NEW, RESOLVED.

# Comments
Bugs have comments added to them by users. This field is the number of comments made to a bug report.

Create Date
When the bug was reported.

Dependencies
If this bug cannot be fixed unless other bugs are fixed (depends on), or this bug stops other bugs being fixed (blocks), their numbers are recorded here.

Summary
A one-sentence summary of the problem.

Date of Close
When the bug was closed.

Keywords
The administrator can define keywords which you can use to tag and categorize bugs, e.g. the Mozilla project has keywords like crash and regression.

Version
The version field defines the version of the software the bug was found in.

Cc Count
A list of people involved directly or indirectly in the bug fix process, who get mail when the bug changes.

Platform and OS
These indicate the computing environment where the bug was found.

The rest of the paper is organized as follows. Section 2 describes the datasets. Results are presented in section 3. Section 4 presents the related work, and finally the paper is concluded in section 5.

DESCRIPTION OF DATASETS
We have taken bug reports of two products of the Mozilla open source software project: Thunderbird (client software) for the period April 2000 to March 2013 and Webtools (server software) for the period October 1998 to August 2013. We considered 221 bug reports of Thunderbird and 615 bug reports of Webtools. We collected bug reports with resolution "fixed" and status "verified", "resolved" or "closed". Some of the bug attributes are quantitative and some are qualitative in nature, so qualitative bug attributes such as bug severity need to be quantified. We assign the values 1 to 7 to the severity levels from blocker to enhancement.
February 27, 2014
In this paper, we have taken four quantified bug attributes: bug severity, bug fix time, developer id, and cc count.
Bug severity. Bug severity is the degree of impact of the bug on the functionality of the software or product. In the Mozilla open source software project, seven different severity levels are defined.
Developer Id. A developer plays a major role in the software development process. If a bug is reported, it must be fixed by some developer to improve the development and performance of the software. Here, we assigned a number (1 to n) to each developer for the analysis.
Cc Count. The manpower involved in monitoring the progress of the bug fixing process.
Bug fix time. The time taken by a bug to get fixed, computed as (last resolved time - opened time).
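The selection rule and the bug fix time definition above can be sketched in a few lines of Python. This is a minimal illustration and not the tooling used in this study: the record layout, the field names (`resolution`, `status`, `opened`, `last_resolved`), the timestamp format, and the three sample reports are all assumptions made up for the example, and expressing fix time in days is our choice.

```python
from datetime import datetime

# Statuses kept in the dataset, as described in the text.
ACCEPTED_STATUSES = {"VERIFIED", "RESOLVED", "CLOSED"}
TIME_FORMAT = "%Y-%m-%d %H:%M"  # assumed timestamp format for this sketch


def is_selected(report):
    """Keep only reports with resolution FIXED and an accepted status."""
    return report["resolution"] == "FIXED" and report["status"] in ACCEPTED_STATUSES


def fix_time_days(report):
    """Bug fix time = last resolved time - opened time, expressed in days."""
    opened = datetime.strptime(report["opened"], TIME_FORMAT)
    resolved = datetime.strptime(report["last_resolved"], TIME_FORMAT)
    return (resolved - opened).total_seconds() / 86400.0


# Invented sample reports (hypothetical field names and values).
reports = [
    {"id": 1, "resolution": "FIXED", "status": "RESOLVED",
     "opened": "2012-01-01 09:00", "last_resolved": "2012-01-02 09:00"},
    {"id": 2, "resolution": "WONTFIX", "status": "RESOLVED",
     "opened": "2012-03-01 10:00", "last_resolved": "2012-03-05 10:00"},
    {"id": 3, "resolution": "FIXED", "status": "VERIFIED",
     "opened": "2012-05-10 08:00", "last_resolved": "2012-05-10 20:00"},
]

selected = [r for r in reports if is_selected(r)]
fix_times = {r["id"]: fix_time_days(r) for r in selected}
print(fix_times)
```

In this toy input, report 2 is dropped because its resolution is not FIXED, and the remaining two reports get fix times of one day and half a day.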

RESULTS AND ANALYSIS
We considered four main cases to analyze the relationships between the attributes of a bug.
In case 1 we show the distribution of bug count over the developers who participated in fixing the bugs, to analyze which developers have the highest participation in bug fixing.
In case 2 we take the first 25 developers and show the distribution of bug fix time for each developer.
In case 3 we show the distribution of cc count for each bug fix time.
In case 4 we show how many developers and how many bugs fall in each severity class; the maximum number of developers is involved in fixing bugs of the major severity class.

Case 1:
In this case we took the 615 bug reports of the Webtools product and saw that 83 developers were involved in fixing these bugs. When we restrict the analysis to the developer ids that fixed more than 10 bugs, the count of bug reports reduces from 615 to 453 and the number of developers involved in fixing these bugs reduces from 83 to 14. This shows that 73% of the bugs are fixed by only 17% of the developers, as shown in figure 1.
We did a similar analysis for the Thunderbird product. We took 221 bug reports and saw that 49 developers were involved in fixing these bugs. When we restrict the analysis to the developer ids that fixed more than 10 bugs, the count of bug reports reduces from 221 to 134 and the number of developers involved in fixing these bugs reduces from 49 to 7. This shows that 61% of the bugs are fixed by only 14% of the developers, as shown in figure 2.
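The counting behind Case 1 can be illustrated with a short Python sketch. It assumes one developer id per fixed bug and uses the "more than 10 bugs" threshold from the text; the function names and the toy input are ours and purely illustrative. The last lines recompute the shares from the counts reported above (453 of 615 bugs and 14 of 83 developers for Webtools, 134 of 221 bugs and 7 of 49 developers for Thunderbird); 453 of 615 works out to roughly 73.7%, which the text reports as 73%.

```python
from collections import Counter


def participation_share(fixer_ids, min_bugs=10):
    """Return (bugs_by_active, total_bugs, active_devs, total_devs).

    fixer_ids holds one developer id per fixed bug. A developer counts as
    active when they fixed strictly more than min_bugs bugs.
    """
    counts = Counter(fixer_ids)
    active = {dev: n for dev, n in counts.items() if n > min_bugs}
    return sum(active.values()), len(fixer_ids), len(active), len(counts)


def percent(part, whole):
    """Share of part in whole, as a percentage."""
    return 100.0 * part / whole


# Toy input (invented): developer 1 fixed 12 bugs, developer 2 fixed 11,
# and developers 3 to 9 fixed one bug each.
toy = [1] * 12 + [2] * 11 + list(range(3, 10))
bugs_active, bugs_total, devs_active, devs_total = participation_share(toy)
print(bugs_active, bugs_total, devs_active, devs_total)

# Shares recomputed from the counts reported in the text.
print(percent(453, 615), percent(14, 83))   # Webtools
print(percent(134, 221), percent(7, 49))    # Thunderbird
```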

Case 2:
We analyzed the distribution of bug fix time for each developer and saw that it is very difficult to predict, at the time of reporting, how much time a bug will take to fix. There is a significant variation in the bug fixing process: bugs may take from 1 day to 4 years to fix.
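One simple way to see this spread numerically is to summarize the fix times of each developer by their minimum, median and maximum. The Python sketch below is only an illustration under our own assumptions: "first 25 developers" is read as developer ids 1 to 25, fix times are in days, and the sample records are invented. The paper itself inspects the distribution graphically.

```python
from collections import defaultdict
from statistics import median


def fix_time_summary(records, first_n=25):
    """Summarize fix times per developer.

    records is an iterable of (developer_id, fix_time_days) pairs.
    Returns {developer_id: (min, median, max)} for ids 1..first_n.
    """
    by_dev = defaultdict(list)
    for dev, days in records:
        if 1 <= dev <= first_n:
            by_dev[dev].append(days)
    return {dev: (min(t), median(t), max(t)) for dev, t in sorted(by_dev.items())}


# Invented records: developer 1 has fix times from 1 day to about 4 years.
records = [(1, 1), (1, 30), (1, 1460), (2, 5), (2, 7), (26, 3)]
summary = fix_time_summary(records)
for dev, (low, mid, high) in summary.items():
    print(dev, low, mid, high)
```

A wide gap between the minimum and maximum for the same developer, as for developer 1 in the toy data, is the kind of variation that makes the developer alone a weak predictor of fix time.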

Case 4:
Here we computed how many developers are involved in fixing bugs of each severity and how many bugs there are in each severity. Figures 4 and 5 show that the maximum number of developers is involved in fixing bugs of severity major (4), and that the maximum number of bugs also lies in this class.
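The two counts used in this case, bugs per severity and distinct developers per severity, can be computed as in the following Python sketch. It assumes each fixed bug is available as a (severity code, developer id) pair with the numeric severity codes used in this paper; the function name and the sample pairs are invented for illustration.

```python
from collections import defaultdict


def per_severity_counts(records):
    """Count bugs and distinct developers per severity code.

    records is an iterable of (severity_code, developer_id), one per fixed bug.
    Returns {severity_code: (bug_count, distinct_developer_count)}.
    """
    bugs = defaultdict(int)
    devs = defaultdict(set)
    for severity, dev in records:
        bugs[severity] += 1
        devs[severity].add(dev)
    return {sev: (bugs[sev], len(devs[sev])) for sev in sorted(bugs)}


# Invented sample: severity 4 has four bugs fixed by three different developers.
records = [(4, 1), (4, 2), (4, 2), (4, 3), (2, 1), (7, 5), (7, 5)]
counts = per_severity_counts(records)

# Severity code with the largest number of distinct developers.
busiest = max(counts, key=lambda sev: counts[sev][1])
print(counts, busiest)
```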

RELATED WORK
Bhattacharya and Neamtiu [9] proposed an approach that reduces the tossing path length of a bug to 1.5-2 tosses for most bugs, which represents a reduction of up to 86.31% compared to the original tossing paths. This reduction in tossing path length improved triaging accuracy and achieved 83.62% prediction accuracy in bug triaging. They validated the approach on 856,259 bug reports of two software projects, Mozilla and Eclipse, covering 21 cumulative years of development. They have shown how intra-fold updates are beneficial for achieving higher prediction accuracy in bug triaging when using classifiers in isolation.
Bhattacharya and Neamtiu [3] used multivariate and univariate regression testing to test the prediction capability of existing models on 512,474 bug reports from five open source projects: Eclipse, Chrome and three products from the Mozilla project (Firefox, Seamonkey and Thunderbird). They have shown that the predictive power of existing models is between 30% and 49%, so there is room for more independent attributes. They demonstrate that the bug fix time in open source projects is not influenced by the bug opener's reputation. They also report that various bug report attributes which have previously been used to build bug fix time prediction models do not always correlate with bug fix time.