STAR: Stack Trace Based Automatic Crash Reproduction
PhD Thesis Defence
Ning Chen
Advisor: Sunghun Kim
November 05, 2013

Outline
1. Motivation & Related Work
2. Approaches of STAR
1) Crash Precondition Computation
2) Input Model Generation
3) Test Input Generation
3. Evaluation Study
4. Challenges & Future Work
5. Contributions

Motivation
Failure reproduction is a difficult and time-consuming task,
but it is necessary for fixing the corresponding bug.

For example: https://issues.apache.org/jira/browse/COLLECTIONS-70


The bug had not been fixed for five months due to difficulties in reproducing it.
After a test case was submitted, it was soon fixed with a comment:

"As always, a good test case makes all the difference."

Problem Statement
The intention of this research is to propose a stack trace based automatic crash reproduction framework which is efficient and applicable to real-world object-oriented programs.
Sub-problem 1:
Propose an efficient crash precondition computation approach which is applicable to non-trivial real-world programs.
Sub-problem 2:
Propose a novel method sequence composition approach which can generate crash-reproducing test cases for object-oriented programs.

Contributions
Study the scalability challenge of automatic crash reproduction, and propose approaches to improve its efficiency.
Study the object creation challenge for reproducing object-oriented crashes, and propose a novel method sequence composition approach to address it.
A novel framework, STAR, which combines the proposed approaches to achieve automatic crash reproduction using only the crash stack trace.
A detailed empirical evaluation to investigate the usefulness of STAR.

Related Work
Record-and-replay approaches:
Jrapture (2000)
BugNet (2005)
ReCrash/ReCrashJ (2008)
LEAP/LEAN (2010)
Post-failure-process approaches:
Microsoft PSE (2004)
IBM SnuggleBug (2009)
XyLem (2009)
ESD (2010)
BugRedux (2012)

Record-and-replay Approaches
Approach:
Monitoring Phase: captures and stores runtime heap & stack objects.
Test Generation Phase: generates tests that load the stored objects into the crashed methods.
Flow: original program execution -> store heap & stack objects -> load as crashed method parameters -> recreated test case.

Record-and-replay Approaches

Framework    | Instrumentation     | Data Collected             | Memory Overhead | Performance Overhead
Jrapture '00 | Required            | All interactions           | N/A             | N/A
BugNet '05   | Required / hardware | All inputs / executed code | N/A             | N/A
ReCrash '08  | Required            | Stack objects              | 7% - 90%        | 31% - 60%
LEAP '10     | Required            | SPE access / thread info   | N/A             | 7% - 600%

Limitations:
Require up-front instrumentation or special hardware deployment.
Collect client-side data, which may raise privacy concerns [Clause et al., 2010].
Non-trivial memory and runtime overheads.

Post-failure-process Approaches
Perform analyses on crashes only after they have occurred.
Advantages:
Usually do not record runtime data.
Incur no or very little performance overhead.

Post-failure-process Approaches
Crash Explanation Approaches:
Microsoft PSE [Manevich et al., 2004]
IBM SnuggleBug [Chandra et al., 2009]
XyLem [Nanda et al., 2009]
They assist crash debugging by providing hints about the target crashes:
Potential crash traces
Potential crash conditions
However, they cannot reproduce the target crashes.

Post-failure-process Approaches
Crash Reproduction Approaches:
Core dump-based approaches: Cdd [Leitner et al., 2009], RECORE [Rößler et al., 2013]
Symbolic execution-based approaches: ESD [Zamfir et al., 2009], BugRedux [Jin et al., 2012]
They aim to reproduce crashes using only post-failure data, such as:
Crash stack traces
The memory core dump at the time of the crash

Crash Reproduction Approaches
Core dump-based approaches, e.g. Cdd [Leitner et al., 2009] and RECORE [Rößler et al., 2013], leverage the memory core dump, and even some developer-written contracts, to guide the crash reproduction process.
Advantage:
Higher chance of reproducing a crash, as more data is provided.
Limitations:
Require not just the stack trace, but the entire memory core dump at the time of the crash.
Less applicable in practice, due to the frequent lack of memory core dumps.

Crash Reproduction Approaches
Symbolic execution-based approaches, e.g. ESD [Zamfir et al., 2009] and BugRedux [Jin et al., 2012], perform symbolic execution-based analysis to identify crash paths and generate crash-reproducing test cases.

Crash Reproduction Approaches
Advantages:
Use only the crash stack trace to achieve crash reproduction.
Incur no runtime overhead at the client side.
Limitations:
Existing approaches rely on forward symbolic execution to compute crash preconditions, which is less efficient.
They cannot be fully optimized, due to the nature of forward symbolic execution.
They cannot reproduce non-trivial crashes from object-oriented programs, due to the object creation challenge.

Crash Reproduction Approaches
STAR: Stack Trace based Automatic crash Reproduction

Approaches           | Limitations                                      | Advantages of STAR
Record-replay        | Data collection                                  | No runtime data collection
Record-replay        | Performance overhead                             | No performance overhead
Core dump-based      | Memory core dump and developer-written contracts | Crash stack trace only
Symbolic exec.-based | Lack of optimizations                            | Optimizations that greatly improve the crash reproduction process
Symbolic exec.-based | Lack of support for object-oriented programs     | Capable of reproducing non-trivial crashes in object-oriented programs

Overview of STAR
Given a crash stack trace and the program, STAR works in three steps: (1) Crash Precondition Computation produces crash preconditions; (2) Input Model Generation produces crash models; (3) Test Input Generation produces test cases.

Crash Precondition Computation

Crash Precondition
The crash precondition is the condition on the inputs at a method entry that can trigger the crash.
It specifies in what kind of memory state the crash can be reproduced.


Existing approaches such as ESD and BugRedux use forward symbolic execution to compute the crash preconditions:
The program is executed in the same direction as normal executions.
Inputs and variables are represented as symbolic values instead of concrete values.
Limitations of forward symbolic execution:
Non-demand-driven: many paths unrelated to the crash need to be executed.
Limited optimization: it is difficult to perform optimizations using the crash information.



STAR performs a backward symbolic execution to compute the crash precondition:
The program is executed from the crash location back to the method entry.
Advantages of backward symbolic execution:
Demand-driven: only paths related to the crash are executed.
Optimizations: optimizations can be performed using the crash information.


Backward Symbolic Execution
Given a program P, a crash location L, and the crash condition C at L, we execute P from L back to a method entry, with C as the initial crash precondition.
The precondition is updated along the execution path according to the executed statements:
E.g. int var3 = var1 + var2;
-> all occurrences of var3 are replaced by var1 + var2
E.g. if (var1 != null)
-> coming from the true branch: var1 != null is added to the precondition
-> coming from the false branch: var1 == null is added to the precondition
The preconditions at method entries are saved as the final crash preconditions.

Backward Symbolic Execution (example)
Code path, read backwards from the crash: buffer[i] = 0 raises AIOBE; the statement is guarded by if (i < buffer.length) (true branch); i is assigned by int i = this.last at the method entry.
At the crash: {buffer != null}, {i < 0 or i >= buffer.length}
After the true branch: {buffer != null}, {i < 0 or i >= buffer.length}, {i < buffer.length}
At the method entry (i replaced by last): {buffer != null}, {last < 0 or last >= buffer.length}, {last < buffer.length}
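The backward walk above can be sketched in code. This is a minimal hypothetical example (the class and method names are ours, not the thesis's) matching the slides' running example; the comments mark the backward update performed at each statement:

```java
// Hypothetical code illustrating backward symbolic execution. The AIOBE
// at the array write is the crash location; walking backwards from it to
// the method entry yields a precondition over the fields the method reads.
class BackwardExample {
    int[] buffer;
    int last;

    void reset() {
        int i = this.last;           // backward step: occurrences of i become last
        if (i < buffer.length) {     // backward step (true branch): add i < buffer.length
            buffer[i] = 0;           // crash condition: i < 0 || i >= buffer.length
        }
    }

    // Combining the conditions at the method entry gives:
    //   buffer != null && last < buffer.length && (last < 0 || last >= buffer.length)
    // which simplifies to: buffer != null && last < 0
    static boolean crashes(BackwardExample e) {
        try {
            e.reset();
            return false;
        } catch (ArrayIndexOutOfBoundsException ex) {
            return true;
        }
    }
}
```

Any state satisfying the derived precondition (e.g. a non-null buffer and a negative last) triggers the crash, while states violating it do not.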

Crash Precondition Computation
Challenge: path explosion.
Running example control flow: a branch on isDebugging() guards calls to print() and debugLog(); buffer = new int[16]; a branch on index >= buffer.length sets i = 0 (true) or i = index (false); buffer[i] = 0 then raises AIOBE. Even this small example contains several paths a symbolic executor must consider.

Optimizations
STAR introduces three approaches to improve the crash precondition computation process:
Static path reduction
Heuristic backtracking
Early detection of inner contradictions


Static Path Reduction
Observation:
Only a subset of the conditional branches and method calls contribute to the target crash.
E.g. methods that perform runtime logging can be safely skipped.
E.g. branches that do not modify crash-related variables can be safely skipped.
Optimization:
During symbolic execution, STAR detects and skips branches and method calls that do not contribute to the target crash.
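The running example can be sketched as code. This is a hypothetical reconstruction (class and helper names are ours): neither branch of the isDebugging() check touches the buffer or the crash-related index, so static path reduction can skip both the call and the branch, and the crash precondition reduces to index < 0:

```java
// Hypothetical code for the slides' running example. The logging call and
// its guarding branch modify no location referenced in the crash
// precondition, so STAR's static path reduction can skip them.
class PathReductionExample {
    static boolean debugging = false;
    int[] buffer = new int[16];

    static boolean isDebugging() { return debugging; }

    static void debugLog(String message) {
        System.err.println(message);   // pure logging: no crash-related state
    }

    void set(int index) {
        if (isDebugging()) {           // skippable: does not modify buffer or i
            debugLog("set " + index);
        }
        int i;
        if (index >= buffer.length) {
            i = 0;                     // out-of-range indexes are clamped to 0
        } else {
            i = index;                 // negative indexes flow through...
        }
        buffer[i] = 0;                 // ...and raise AIOBE here when index < 0
    }

    static boolean crashes(int index) {
        try {
            new PathReductionExample().set(index);
            return false;
        } catch (ArrayIndexOutOfBoundsException ex) {
            return true;
        }
    }
}
```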

In the running example, the method isDebugging() does not contribute to the crash, and neither does the conditional branch around it. STAR detects such methods and branches and skips over them.


Static Path Reduction
A conditional branch or a method call is contributive to the crash if:
It can modify any stack location referenced in the current crash precondition formula, or
It can modify any heap location referenced in the current crash precondition formula.
However, in backward execution, the actual heap locations may not be decidable until they are explicitly defined.


Static Path Reduction
For any reference whose heap location cannot be decided:
Compare whether the modified heap location and the reference have compatible data types.
Compare whether the modified heap location and the reference have the same field name (except for arrays).
If both criteria are satisfied, the heap locations are considered the same.
This is sound because, in Java, the same heap location can only be accessed through the same field name, except for array fields.


Heuristic Backtracking
Observation:
Backtracking to the most recent branching point is likely inefficient, as contradictions are usually introduced much earlier.
Optimization:
STAR efficiently backtracks to the most relevant branches, where the contradiction may still be avoided.

In the running example, suppose the SMT solver reports that an executed path is unsatisfiable. Typical backtracking to the most recent branching point is not efficient; STAR instead quickly backtracks to the most relevant branches.

Heuristic Backtracking
STAR computes the unsatisfiable core of the last unsatisfied path conditions: a subset of the path conditions that is still unsatisfiable by itself.
A branching point is considered relevant to the last unsatisfiability, and is backtracked to, only if:
A condition in the unsatisfiable core was added at this branch, or
A variable's concrete value used in the unsatisfiable core was decided at this branch, or
A variable's actual heap location used in the unsatisfiable core was decided at this branch.

Inner Contradiction Detection
STAR quickly discovers inner contradictions in the current precondition during execution. In the running example, the precondition contains {index < 0 or index >= 16} together with {index < 16}; the disjunct index >= 16 contradicts index < 16, so the corresponding path can be pruned immediately.

Other Details
Loops and recursive calls: options bound the maximum loop unrolling and the maximum recursive call depth.
Call graph construction: the user can specify a pointer analysis algorithm to use; an option bounds the maximum number of call targets.
String operations: strings are treated as arrays of characters. Complex string operations and regular expressions are not supported; they require more specialized constraint solvers such as Z3-str or HAMPI.

Input Model Generation


After computing the crash precondition, we need to compute a model (an object state) that satisfies this precondition.
However, for one precondition there can be many satisfying models.
E.g. for the precondition {ArrayList.size != 0}, there is an infinite number of satisfying models.


Generating Feasible Input Models
Object Creation Challenge [Xiao et al., 2011]:
Not every model that satisfies a precondition is feasible to generate.
For the precondition ArrayList.size != 0, the model ArrayList.size == -1 satisfies it, but such an object can never be created.
Therefore, we want input models whose objects are actually feasible to generate.

Generating Practical Input Models
For different input models, the difficulty of generating the corresponding objects can differ greatly:
Model 1: ArrayList.size == 100 requires calling add() 100 times.
Model 2: ArrayList.size == 1 requires calling add() once.
Therefore, we also want input models whose values are as close to the initial values as possible.
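The two requirements can be illustrated with a short sketch (hypothetical code, not STAR's implementation). For the precondition size != 0: size == -1 satisfies the formula but is unreachable through the public API (infeasible), while size == 1 is both feasible and only one method call away from the initial state (practical):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a feasible and practical model for {size != 0}
// is reached with a single add() from the initial state size == 0.
// No legitimate call sequence can ever make size negative.
class FeasibleModelExample {
    static List<Object> satisfyNonEmpty() {
        List<Object> list = new ArrayList<>();   // initial state: size == 0
        list.add(new Object());                  // drives size from 0 to 1
        return list;
    }
}
```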


Class Information
STAR's input model generation approach can:
Generate feasible models
Generate practical models
It extracts and uses class semantic information to guide the input model generation process:
The initial value of each class member field.
The potential value range of each numerical field, e.g. ArrayList.size >= 0.

Input Model Generation
Example: given the crash precondition ArrayList.size != 0, and the class information (value range: ArrayList.size >= 0; initial value: ArrayList.size starts from 0), the SMT solver produces ArrayList.size == 1: a feasible and practical model.

Test Input Generation


Given a crash model, it is necessary to generate test inputs that satisfy it.
However, generating object test inputs can be challenging [Xiao et al., 2011]:
Non-public fields are not assignable.
Class invariants are easily broken if objects are generated using reflection.
What is needed is a legitimate method sequence that can create and mutate an object to satisfy the target model (the target object state).

Test Input Generation
Existing techniques:
Randomized techniques: Randoop [Pacheco et al., 2007]
Dynamic analysis: Palulu [Artzi et al., 2009], Palus [Zhang et al., 2011]
Codebase mining: MSeqGen [Thummalapenta et al., 2009]
These are not efficient, as their input generation processes are not demand-driven, and they may rely on existing code bases.


STAR proposes a novel demand-driven test input generation approach.

Summary Extraction
Forward symbolic execution is performed to obtain the summary of each method.

Summary Extraction
The summary of a method is the collection of the summaries of its individual paths.
The summary of a method path is a pair (path condition, path effect), where:
path condition: the conditions of the path, represented as a conjunction of constraints over the method inputs (heap locations read by the method).
path effect: the postcondition of the path, represented as a conjunction of constraints over the method outputs (heap locations written by the method).
Essentially, a path summary is the final effect of taking that method path.

Summary Extraction (example)
We perform a forward symbolic execution on the target method. For the add(obj) method of the running example:
Path 1: condition obj != null; effect list[size] = obj, size += 1.
Path 2: condition obj == null; effect: a new Exception is thrown.
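The two path summaries above can be written next to a hypothetical add(obj) implementation (the code and notation are ours, reconstructing the slides' example):

```java
// Hypothetical add(obj) with its two path summaries as comments.
// Path 1: condition {obj != null}; effect {list[size] = obj; size = size + 1}
// Path 2: condition {obj == null}; effect {an exception is thrown}
class SummaryExample {
    Object[] list = new Object[16];
    int size = 0;

    void add(Object obj) {
        if (obj == null) {
            // Path 2: the method exits exceptionally, leaving size unchanged
            throw new IllegalArgumentException("null element");
        }
        // Path 1: the element is stored and size grows by one
        list[size] = obj;
        size += 1;
    }
}
```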

Method Sequence Deduction
STAR introduces a deductive-style approach to construct method sequences that can achieve the target object state.

Method Sequence Deduction
Given a target object state and the path summaries of each method, the approach finds, by recursive deduction, a method sequence that produces an object satisfying the target state; the deduction is then applied recursively to each input parameter.
The deductive engine selects a candidate method path, and the constraint solver checks whether, by taking this path, the target object state can be achieved, and which object states the input parameters must satisfy.

Example
public class Container {
    public Container();
    public void add(Object obj);
    public void remove(Object obj);
    public void clear();
}
Desired object state (input model): Container.size == 10

Example: Summary Extraction
Container(): Path 1: condition TRUE; effect size = 0.
clear(): Path 1: condition TRUE; effect: remove all elements from list, size = 0.
add(obj): Path 1: condition obj != null; effect list[size] = obj, size += 1. Path 2: condition obj == null; effect: throws an exception.
remove(obj): Path 1: condition obj in list; effect: remove obj from list, size -= 1. Path 2: condition obj not in list; no effect.

Example: Sequence Deduction
Target state: Container.size == 10.
Can add() produce the target state? Selecting add(obj): yes, if this.size == 9 && obj != null.
Can clear() produce the target state? No: not satisfiable.

The deduction continues recursively. Can add() produce Container.size == 9? Yes, if this.size == 8 && obj != null; and so on, down to Container.size == 0. Can Container() produce Container.size == 0? Yes, with no parameter requirement, which completes the deduction.

Example: Final Sequence
The deduced steps are combined in reverse order to form the whole sequence:

void sequence() {
    Container container = new Container();
    Object o1 = new Object();
    container.add(o1); // the add() call is repeated 10 times
}
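The whole example can be made executable with a hypothetical implementation of the slide's Container interface (the class body below is ours; the slides only show the public signatures). The deduced sequence, one constructor call followed by ten add() calls, reaches the target state size == 10:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical implementation of the slides' Container, plus the deduced
// sequence assembled by reversing the deduction steps.
class ContainerExample {
    private final List<Object> list = new ArrayList<>();

    public int size() { return list.size(); }       // target state: size() == 10

    public void add(Object obj) {
        if (obj == null) {                          // matches Path 2 of add(obj)
            throw new IllegalArgumentException();
        }
        list.add(obj);                              // Path 1: size grows by one
    }

    public void remove(Object obj) { list.remove(obj); }

    public void clear() { list.clear(); }

    // The final sequence: Container() followed by add(obj) ten times.
    static ContainerExample deducedSequence() {
        ContainerExample container = new ContainerExample();
        for (int i = 0; i < 10; i++) {
            container.add(new Object());
        }
        return container;
    }
}
```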


Other Details
The forward symbolic execution used for method summary extraction follows similar settings to the precondition computation: e.g. loops and recursive calls are expanded only a limited number of times/depth, so the extracted path summaries are a subset of the total method paths.
This incompleteness of the method path summaries does not affect the precision of method sequence composition:
Generated method sequences are still correct.
Some method sequences may simply not be generated, due to missing path summaries.
Optimizations have been applied to reduce the number of methods and method paths to examine.

Evaluation


Research Questions
Research Question 1: For how many crashes can STAR compute the crash-triggering preconditions?
Research Question 2: How many crashes can STAR reproduce based on the crash-triggering preconditions?
Research Question 3: How many of STAR's crash reproductions are useful for revealing the actual cause of the crashes?

Evaluation Setup
Subjects:
Apache Commons Collections (ACC): a data container library that implements additional data structures on top of the JDK. 60 kLOC.
Ant (ANT): a Java build tool that supports a number of built-in and extension tasks, such as compiling, testing, and running Java applications. 100 kLOC.
Log4j (LOG): a logging package for printing log output to different local and remote destinations. 20 kLOC.

Evaluation Setup
Crash Report Collection:
Crashes were collected from the issue tracking system of each subject.
Only confirmed and fixed crashes were collected.
Crashes with no or incorrect stack trace information were discarded.
Three major types of crashes were considered: custom thrown exceptions, NPE, and AIOBE (covering 80% of crashes [Nam et al., 2009]).

Subject | # of Crashes | Versions       | Avg. Fix Time | Report Period
ACC     | 12           | 2.0 - 4.0      | 42 days       | Oct. 03 - Jun. 12
ANT     | 21           | 1.6.1 - 1.8.3  | 25 days       | Apr. 04 - Aug. 12
LOG     | 19           | 1.0.0 - 1.2.16 | 77 days       | Jan. 01 - Oct. 09

In total, 52 crashes were obtained from the three subjects.

Evaluation Setup
Our evaluation study has the largest number of crashes compared to previous studies:

Subject  | Number of Crashes
RECRASH  | 11
ESD      | -
BugRedux | 17
RECORE   | -
STAR     | 52

Research Question 1
For how many crashes can STAR compute the crash preconditions?
For how many can it do so without the optimization approaches?
For how many can it do so with the optimization approaches?
We applied STAR to compute the preconditions for each crash.

Research Question 1
Percentage of crashes whose preconditions were computed by STAR:

Subject | Without Optimizations | With Optimizations
ACC     | 66.7%                 | 75.0%
ANT     | 14.3%                 | 71.4% (+57.1)
LOG     | 36.8%                 | 73.7% (+36.9)
Overall | 34.6%                 | 73.1% (+38.5)

Research Question 1
Average time to compute the crash preconditions (the lower the better):

Subject | Without Optimizations | With Optimizations
ACC     | 18.5 s                | 2.1 s
ANT     | 90.4 s                | 4.9 s
LOG     | 59.3 s                | 3.3 s
Overall | 55.1 s                | 2.4 s

Research Question 1
Percentage of crashes whose preconditions were computed by STAR, broken down by each optimization:

Subject | No Opt. | Static Path Red. | Heuristic Backtr. | Contradiction Det. | All Opt.
ACC     | 66.7%   | 75.0%            | 66.7%             | 66.7%              | 75.0%
ANT     | 14.3%   | 23.8%            | 23.8%             | 14.3%              | 71.4%
LOG     | 36.8%   | 47.4%            | 42.1%             | 36.8%              | 73.7%
Overall | 34.6%   | 44.2%            | 38.5%             | 36.5%              | 73.1%

Research Question 1
STAR successfully computed crash preconditions for 38 (73.1%) out of the 52 crashes.
STAR's optimization approaches significantly improved the overall result, by 20 crashes (38.5%).
Static path reduction is the most effective single optimization, but applying all three optimizations together achieves a much higher improvement.

Research Question 2
How many crashes can STAR reproduce based on the crash preconditions?
Criterion of Reproduction [ReCrash, 2008]: a crash is considered reproduced if the generated test case triggers the same type of exception at the same crash line.
We applied STAR to generate a crash-reproducing test case for each computed crash precondition.

Research Question 2
Overall crash reproductions achieved by STAR for each subject:

Subject | # of Crashes | # of Preconditions | # of Reproduced | Ratio
ACC     | 12           | 9                  | 8               | 66.7% (88.9%)
ANT     | 21           | 15                 | 12              | 57.1% (80.0%)
LOG     | 19           | 14                 | 11              | 57.9% (78.6%)
Total   | 52           | 38                 | 31              | 59.6% (81.6%)

Research Question 2
More statistics on STAR's test case generation process:

Subject | Avg. # of Objects | Avg. Candidate Methods | Min - Max Sequence | Average Sequence
ACC     | 1.5               | 35.5                   | 2 - 19             | 9.4
ANT     | 1.4               | 11.7                   | 2 - 14             | 6.2
LOG     | 1.5               | 21.8                   | 2 - 17             | 8.1
Total   | 1.5               | 21.4                   | 2 - 19             | 7.7

Research Question 3
The Criterion of Reproduction does not require a crash reproduction to match the complete stack trace: a partial match of only the top stack frames is still considered a valid reproduction of the target crash.
The root causes of more than 60% of crashes lie in the top three stack frames [Schroter et al., 2010], so it is not necessary to reproduce the complete stack trace to reveal the root cause of a crash.

Research Question 3
Drawbacks of the Criterion of Reproduction:
The reproduced crash may not be the same crash.
The reproduction may not be useful for revealing the crash-triggering bug, e.g. when the reproduced frames do not reach the buggy frame.

Research Question 3
How many of STAR's crash reproductions are useful for revealing the actual causes of the crashes?
Criterion of useful crash reproduction: a crash reproduction is considered useful if it triggers the same incorrect behavior at the buggy location and eventually causes the crash to re-appear.
We manually examined the original and fixed versions of the programs to identify the actual buggy location of each crash.

Research Question 3
Overall useful crash reproductions achieved by STAR for each subject:

Subject | # of Reproduced | # of Useful | Ratio (Total)
ACC     | 8               | 7           | 87.5% (58.3%)
ANT     | 12              | 7           | 58.3% (33.3%)
LOG     | 11              | 8           | 72.7% (42.1%)
Total   | 31              | 22          | 71.0% (42.3%)

Comparison Study
We compared STAR with two different crash reproduction frameworks:
Randoop: a feedback-directed test input generation framework, capable of generating thousands of test inputs that may reproduce the target crashes. It was given a maximum of 1000 seconds to generate test cases (10 times STAR's budget), and we manually provided the crash-related class list to increase its chances.
BugRedux: a state-of-the-art crash reproduction framework that can compute crash preconditions and generate crash-reproducing test cases.
We applied the two frameworks to the same set of crashes used in our evaluation.

Comparison Study
Number of crashes handled by the three approaches, in terms of preconditions computed, crashes reproduced, and useful reproductions. STAR leads on all three measures (38 preconditions, 31 reproductions, 22 useful reproductions); Randoop computes no preconditions, and the remaining counts for Randoop and BugRedux range from 0 to 18.

Comparison Study
Venn diagram of the crashes reproduced by STAR, Randoop, and BugRedux; the labelled regions are 12 crashes, 5 crashes, and 10 crashes.

Comparison Study
STAR outperformed Randoop because:
Randoop uses a randomized search technique to generate method sequences; it can generate many sequences, but the search is not guided.
Due to the large search space of real-world programs, the probability of generating crash-reproducing sequences is low.
STAR outperformed BugRedux because of:
Several effective optimizations that improve the efficiency of the crash precondition computation process.
A method sequence composition approach that can generate complex input objects satisfying the crash preconditions.

Case Study
https://issues.apache.org/jira/browse/collections-411
An IndexOutOfBoundsException could be raised in method ListOrderedMap.putAll() due to an incorrect index increment:

public void putAll(int index, Map map) {
    for (Map.Entry entry : map.entrySet()) {
        put(index, entry.getKey(), entry.getValue());
        ++index; // buggy increment
    }
}

This bug was soon fixed by the developers, who added checks so that index is incremented only in certain cases.

Case Study
STAR was applied to generate a crash-reproducing test case for this crash. Surprisingly, it generated a test case that crashed both the original and the fixed (latest) version of the program.
We reported this potential issue discovered by STAR to the project developers (https://issues.apache.org/jira/browse/collections-474), attaching STAR's auto-generated test case to the bug report.

Case Study
The developers quickly confirmed that the original patch for bug ACC-411 was incomplete: it missed a corner case that could still crash the program.
Neither the developers nor the original bug reporter had identified this corner case in over a year, yet it took the developers only a few hours to confirm and fix the bug after STAR's test case demonstrated the corner case.
The crash-reproducing test case generated by STAR was added by the developers to the official test suite of the Apache Commons Collections project (http://svn.apache.org/r1496168).

Case Study
STAR is capable of identifying and reproducing crashes that are difficult even for experienced developers.
STAR can also be used to check the completeness of bug fixes: if a fix is incomplete, STAR may generate a crash-reproducing test case that demonstrates the missing corner case.

Challenges & Future Work


Challenges
We manually examined each non-reproduced crash to identify the major challenges of reproduction:
Environment dependency (36.7%): file input; network input.
SMT solver limitations (23.3%): complex string constraints (e.g. regular expressions); non-linear arithmetic.
Concurrency & non-determinism (16.7%): some crashes are only reproducible non-deterministically or under concurrent execution.
Path explosion (6.7%).

Future Work
Improving reproducibility:
Support environment simulation, e.g. file inputs.
Incorporate specialized SMT solvers, such as string solvers like Z3-str.
Automatic fault localization:
Existing fault localization approaches require both passing and failing test cases to locate faulty statements.
STAR's ability to generate failing test cases can help automate the fault localization process.
Crash reproduction for mobile applications:
Android applications are similar to desktop Java programs in many aspects.

Conclusions
We proposed STAR, an automatic crash reproduction framework that uses only the stack trace.
STAR successfully reproduced 31 (59.6%) of 52 real-world crashes from three non-trivial programs.
The reproduced crashes can effectively help developers reveal the underlying crash-triggering bugs, and even identify unknown bugs.
A comparison study demonstrates that STAR significantly outperforms existing crash reproduction approaches.

Thank You!

Appendix


Subject Sizes
Our evaluation study has one of the largest subject sizes compared to previous studies:

Subject  | Subject Sizes (LOC) | Average Subject Size
RECRASH  | 200 - 86,000        | 47,000
ESD      | 100 - 100,000       | N/A
BugRedux | 500 - 241,000       | 27,000
RECORE   | 68 - 62,000         | 35,000
STAR     | 20,000 - 100,000    | 60,000

Research Question 1
Average time to compute the crash preconditions (the lower the better), broken down by each optimization (No Optimization, Static Path Reduction, Heuristic Backtracking, Contradiction Detect, All Optimizations). Each single optimization already reduces the time, and all three together reduce it the most: e.g. ACC drops from 18.5 s with no optimization to 2.1 s with all optimizations, and ANT from 90.4 s to 4.9 s.

Comparison Study
Average time to reproduce crashes (the lower the better), over the commonly reproduced crashes only, comparing BugRedux and STAR on ACC, ANT, LOG, and Overall; the recorded values range from 2.3 s to 29.9 s.

User Survey
Of 31 surveys sent, 6 responses (19%) were received, confirming the correctness and usefulness of the reproductions. For example, for ACC-53:
"The auto-generated test case would reproduce the bug... I think that having such a test case would have been useful."

Comparison Study
Branch coverage achieved by different test case generation approaches (Sample Execution, RecGen, Randoop, Palus, Palulu, and STAR) on ACC, JSAP, and SAT4J, with coverage values ranging from 0% to 74%.
