Ning Chen
Advisor: Sunghun Kim
November 05, 2013
Outline
1. Motivation & Related Work
2. Approaches of STAR
1) Crash Precondition Computation
2) Input Model Generation
3) Test Input Generation
3. Evaluation Study
4. Challenges & Future Work
5. Contributions
Motivation
Failure reproduction is a difficult and time-consuming task.
Problem Statement
This research proposes a stack-trace-based automatic crash reproduction framework.
Contributions
Study the scalability challenge of automatic crash reproduction, and propose STAR, an automatic crash reproduction framework.
Related Work
Record-and-replay approaches:
Jrapture, 2000
BugNet, 2005
ReCrash/ReCrashJ, 2008
LEAP/LEAN, 2010
Post-failure-process approaches:
Microsoft PSE, 2004
IBM SnuggleBug, 2009
XyLem, 2009
ESD, 2010
BugRedux, 2012
Record-and-replay Approaches
Approach:
Monitoring Phase: Captures and stores runtime heap and stack objects.
Test Generation Phase: Generates tests that load the recorded objects to reproduce the crash.
Record-and-replay Approaches
Framework | Instrumentation | Data Collected | Memory Overhead | Performance Overhead
Jrapture (2000) | Required | All interactions | N/A | N/A
BugNet (2005) | Required / Hardware | All inputs / executed code | N/A | N/A
ReCrash (2008) | Required | Stack objects | 7% - 90% | 31% - 60%
LEAP (2010) | Required | SPE access / thread info | N/A | 7% - 600%
Limitations:
Require up-front instrumentation or special hardware deployment.
Collect client-side data, which may raise privacy concerns.
Post-failure-process Approaches
Perform analyses on crashes only after they have occurred.
Advantages
Usually do not record runtime data.
Incur no or very little performance overhead.
Post-failure-process Approaches
Crash Explanation Approaches
Microsoft PSE [Manevich et al., 2004]
IBM SnuggleBug [Chandra et al., 2009]
XyLem [Nanda et al., 2009]
These approaches explain crashes by computing:
Potential crash traces
Potential crash conditions
Post-failure-process Approaches
Crash Reproduction Approaches
Core dump-based Approaches
Cdd [Leitner et al., 2009]
RECORE [Rößler et al., 2013]
Reproduce crashes using post-failure data such as:
Crash stack traces
Memory core dump at the time of the crash
Advantage
Higher chance of reproducing a crash as more data is provided.
Limitations
Requires not just the stack trace, but the entire memory core dump at the time of the crash.
Symbolic-execution-based Approaches
Limitations:
Existing approaches rely on forward symbolic execution to recreate the crashing execution.
Could not reproduce non-trivial crashes from object-oriented programs.
Limitations | Advantages of STAR
Record-and-replay: data collection | No data collection
Record-and-replay: performance overhead | No performance overhead
Core dump-based / symbolic-exec.-based: lack of optimizations | Optimized symbolic execution
Overview of STAR
[Pipeline: (1) crash stack trace + program → Crash Preconditions → (2) Crash Models → (3) test cases]
The crash is reproduced by the generated test cases.
Crash Precondition Computation
A crash precondition characterizes the inputs that trigger the crash, and can be solved for concrete values.
Limitations of forward symbolic execution:
Non-demand-driven: Need to execute many paths not related to the crash.
Limited optimization: Difficult to perform optimizations using the crash information.
Backward Symbolic Execution
STAR performs backward symbolic execution to compute the crash precondition:
The program is executed from the crash location back to the method entry.
Along the way, STAR collects and propagates path condition information.
The conditions collected at the method entry form the crash preconditions.
Example: backward symbolic execution
    int i = this.last;
    if (i < buffer.length)   // T
        buffer[i] = 0;       // AIOBE raised here
Conditions propagated backward from the crash:
    at the crash (AIOBE): TRUE becomes {buffer != null}, {i < 0 or i >= buffer.length}
    through the true branch: plus {i < buffer.length}
    at method entry (after i = this.last): {buffer != null}, {last < 0 or last >= buffer.length}, {last < buffer.length}
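The example above can be made concrete in a small Java sketch. Class and field names are assumptions for illustration, not STAR's subjects; the point is that the computed precondition reduces to last < 0:

```java
// Minimal self-contained version of the slide's example: the guard
// i < buffer.length does not rule out negative indices, so the crash
// precondition simplifies to last < 0.
public class BufferExample {
    private final int[] buffer = new int[16];
    private final int last;

    public BufferExample(int last) {
        this.last = last;
    }

    // Raises ArrayIndexOutOfBoundsException exactly when last < 0.
    public void reset() {
        int i = this.last;
        if (i < buffer.length) {
            buffer[i] = 0; // AIOBE here when i < 0
        }
    }

    // Helper to observe whether a given field value triggers the crash.
    public static boolean crashes(int last) {
        try {
            new BufferExample(last).reset();
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }
}
```

Note that a large value such as 20 does not crash, because the guard already filters it out; only the negative case survives the branch condition.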
[Figure: example control flow — print() and debugLog() calls, branches i = 0 and i = index, then buffer[i] = 0 raising AIOBE]
Optimizations
STAR introduces three different approaches to improve the efficiency of backward symbolic execution.
Skipping Irrelevant Code
Observation:
Branches and method calls that are irrelevant to the target crash can be safely skipped.
Optimization:
STAR detects and skips branches or method calls that do not contribute to the target crash during symbolic execution.
[Figure: on the example control flow, STAR skips the irrelevant print() and debugLog() calls and the i = 0 branch, exploring only i = index → buffer[i] = 0 → AIOBE]
A branch or method call is considered relevant to the crash if:
It can modify any stack location referenced in the current crash precondition formula, or
It can modify any heap location referenced in the current crash precondition formula.
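The relevance rule above can be sketched as a simple set-intersection check. This is an illustrative helper, not STAR's actual API; locations are represented as plain strings for clarity:

```java
// Sketch of the relevance check: a statement is relevant to the crash iff
// it may write a location that the current precondition formula references.
public class RelevanceCheck {
    public static boolean isRelevant(String[] writtenLocations,
                                     String[] preconditionLocations) {
        for (String written : writtenLocations) {
            for (String referenced : preconditionLocations) {
                if (written.equals(referenced)) {
                    return true; // may affect the precondition: keep it
                }
            }
        }
        return false; // touches nothing in the formula: safe to skip
    }
}
```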
Heuristic Backtracking
Observation:
Backtracking execution to the most recent branching point is likely
inefficient, as the contradictions are usually introduced much earlier.
Optimization:
STAR can efficiently backtrack to the most relevant branches where
contradictions may still be avoided.
Heuristic Backtracking
An executed path is not satisfiable according to the SMT solver.
Typical backtracking to the most recent branching point is not efficient.
STAR can quickly backtrack to the most relevant branches.
[Figure: control flow with isDebugging(), print(), debugLog(), branches i = 0 / i = index, buffer[i] = 0, AIOBE]
Heuristic Backtracking
STAR computes the unsatisfiable core of the last unsatisfied path conditions:
A subset of the path conditions which are still unsatisfiable by themselves.
STAR backtracks to a branch if:
A path condition in the unsatisfiable core was introduced at this branch, or
A variable's actual heap location in the unsatisfiable core was modified at this branch.
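The core-guided jump can be sketched as follows. This is an illustration of the idea, not STAR's engine: given which branch introduced each path condition, backtrack to the deepest branch that contributed a condition to the unsatisfiable core, rather than to the most recent branch overall:

```java
// Sketch of unsat-core-guided backtracking. branchOf[k] records which
// branch introduced path condition k; given the indices of the conditions
// in the unsatisfiable core, jump to the deepest branch that contributed
// to the contradiction.
public class Backtracker {
    public static int backtrackTarget(int[] branchOf, int[] unsatCoreIndices) {
        int target = -1; // -1: empty core, no branch can avoid the contradiction
        for (int k : unsatCoreIndices) {
            target = Math.max(target, branchOf[k]);
        }
        return target;
    }
}
```

With conditions introduced at branches 0, 1 and 5 and a core containing only the first two, the engine jumps straight back to branch 1, skipping branches 2 through 5.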
[Figure: backward execution along i = index → buffer[i] = 0 → AIOBE]
Crash Precondition:
(index < 0 or index >= 16), combined with the branch condition index < 16, i.e., index < 0.
Other Details
Loops and recursive calls
Options bound the maximum loop unrolling and the maximum recursive call depth.
String operations
Strings are treated as arrays of characters.
Complex string operations and regular expressions are not supported: they require a specialized string solver.
Input Model Generation
[Pipeline: crash stack trace + program → Crash Preconditions → Crash Models → test cases]
Model 2:
ArrayList.size == 1
Class Information
STAR extracts and uses class semantic information to generate input models that are both feasible and practical.
Class Information
[Figure: extracted class semantics — Value Range: ArrayList.size >= 0; Initial Value: ArrayList.size starts from 0 — are fed with the crash precondition into the SMT Solver, which yields the model ArrayList.size == 1]
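The solver step can be mimicked with a toy enumeration. The predicates and bound below are assumptions for illustration: the class invariant (ArrayList.size >= 0) is conjoined with the crash precondition before a concrete model is searched for:

```java
// Toy stand-in for the solver query: only values satisfying both the
// class semantics and the crash precondition are acceptable models.
public class ModelFilter {
    // Class invariant extracted from ArrayList semantics.
    static boolean invariant(int size) {
        return size >= 0;
    }

    // Illustrative crash precondition from the slide.
    static boolean precondition(int size) {
        return size == 1;
    }

    // Enumerates candidate sizes up to a bound, mimicking a solver query;
    // returns -1 when no feasible model exists within the bound.
    public static int smallestFeasibleSize(int bound) {
        for (int size = 0; size <= bound; size++) {
            if (invariant(size) && precondition(size)) {
                return size;
            }
        }
        return -1;
    }
}
```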
Test Input Generation
[Pipeline: crash stack trace + program → Crash Preconditions → Crash Models → test cases]
Existing method sequence generation approaches:
Dynamic analysis
Palulu [Artzi et al., 2009]
Palus [Zhang et al., 2011]
Codebase mining
MSeqGen [Thummalapenta et al., 2009]
STAR proposes a new test input generation approach.
Summary Extraction
STAR statically extracts a summary for each method.
Summary Extraction
Summary of a method:
The collection of the summaries of its individual paths.
Summary of a method path: a pair (path condition, path effect), where
Path condition: the path conditions represented as a conjunction of constraints over the method inputs (heap locations read by the method).
Path effect: the postcondition of the path represented as a conjunction of constraints over the method outputs (heap locations written by the method).
Summary Extraction — example (an add-like method):
Path 1: Path Condition: obj != null; Path Effect: list[size] = obj; size += 1
Path 2: Path Condition: obj == null; Path Effect: e = new Exception(); throw e
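A minimal data representation of such summaries might look as follows. This is illustrative only, not STAR's internal format; conditions and effects are kept as strings for readability:

```java
// Each path summary pairs a condition over the method inputs with an
// effect over its outputs.
public class PathSummary {
    public final String pathCondition;
    public final String pathEffect;

    public PathSummary(String pathCondition, String pathEffect) {
        this.pathCondition = pathCondition;
        this.pathEffect = pathEffect;
    }

    // Summary of the add-like method from the example: two paths.
    public static PathSummary[] addMethodSummary() {
        return new PathSummary[] {
            new PathSummary("obj != null", "list[size] = obj; size += 1"),
            new PathSummary("obj == null", "throw new Exception()")
        };
    }
}
```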
[Figure: the Deductive Engine selects a candidate method path whose path effect satisfies the desired object state; the Constraint Solver then derives the required input parameters and prior object states]
Example
public class Container {
    public Container();
    public void add(Object);
    public void remove(Object);
    public void clear();
}
Desired object state (input model): Container.size == 10
Method summaries of Container:
clear()
  Path 1: TRUE → remove all in list; size = 0
add(obj)
  Path 1: obj != null → list[size] = obj; size += 1
  Path 2: obj == null → throw an exception
remove(obj)
  Path 1: obj in list → remove obj from list
  Path 2: otherwise → no effect
Deduction
Can add() produce the target state (Container.size == 10)?
The Deductive Engine selects add(obj), then recursively asks which call can produce the required prior state — selecting add(obj) again — until the initial state is established by the constructor Container() (or by clear()).
Generated method sequence:
void sequence() {
    Container container = new Container();
    Object o1 = new Object();
    container.add(o1);
    // ... add() called 10 times in total
}
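The backward deduction that produces this sequence can be sketched as follows. This is an illustration under the summaries above, not STAR's engine: add(obj) has the effect size' = size + 1 (for non-null obj) and Container() establishes size == 0, so the engine undoes add()'s effect from the target state until the constructor applies:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy backward deduction: walk from the target size down to the initial
// state, recording one call per step, then reverse into execution order.
public class Deduction {
    public static List<String> sequenceFor(int targetSize) {
        List<String> calls = new ArrayList<>();
        int size = targetSize;
        while (size > 0) {            // undo add(obj)'s effect size' = size + 1
            calls.add("container.add(obj)");
            size -= 1;
        }
        calls.add("new Container()"); // initial state: size == 0
        Collections.reverse(calls);   // emit the constructor first
        return calls;
    }
}
```

For a target of size == 10 this yields the constructor followed by ten add() calls, matching the generated sequence above.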
Other Details
The forward symbolic execution in method summary extraction is likewise bounded by the maximum loop unrolling and recursive call depth.
Evaluation
Research Questions
Research Question 1
For how many crashes can STAR compute the crash-triggering preconditions?
Research Question 2
How many crashes can STAR reproduce based on the crash-triggering preconditions?
Research Question 3
How many of STAR's crash reproductions are useful for revealing the actual cause of the crashes?
Evaluation Setup
Subjects:
Apache Commons Collections (ACC)
Apache Ant (ANT): Java build tool that supports a number of built-in and extension tasks such as compiling, testing and running Java applications. 100kLOC.
Log4j (LOG): logging package for printing log output to different local and remote destinations. 20kLOC.
Evaluation Setup
Crash Report Collection:
Collected from the issue tracking system of each subject.
Only confirmed and fixed crashes were collected.
Crashes with no or incorrect stack trace information were discarded.
Three major types of crashes: custom thrown exceptions, NPE and AIOBE.

Subject | # of Crashes | Versions | Avg. Fix Time | Report Period
ACC | 12 | 2.0 - 4.0 | 42 days | Oct. 03 - Jun. 12
ANT | 21 | 1.6.1 - 1.8.3 | 25 days | Apr. 04 - Aug. 12
LOG | 19 | 1.0.0 - 1.2.16 | 77 days | Jan. 01 - Oct. 09
Evaluation Setup
Our evaluation study has the largest number of crashes:
Framework | Number of Crashes
RECRASH | 11
ESD | -
BugRedux | 17
RECORE | -
STAR | 52
Research Question 1
For how many crashes can STAR compute the crash preconditions?
With and without the optimization approaches.
Average time spent per crash.
Research Question 1
Percentage of crashes whose preconditions were computed by STAR:
Subject | Without Optimizations | With Optimizations
ACC | 66.7% | 75.0%
ANT | 14.3% | 71.4% (+57.1)
LOG | 36.8% | 73.7% (+36.9)
Overall | 34.6% | 73.1% (+38.5)
Research Question 1
Average time to compute the crash preconditions (seconds; lower is better):
Subject | Without Optimizations | With Optimizations
ACC | 18.5 | 2.1
ANT | 90.4 | 4.9
LOG | 59.3 | 3.3
Overall | 55.1 | 2.4
Research Question 1
Percentage of crashes whose preconditions were computed by STAR, broken down by each optimization (including Heuristic Backtracking):
Subject | No Optimization | Individual Optimizations (each alone) | All Optimizations
ACC | 66.7% | 66.7 / 66.7 / 75.0 | 75.0%
ANT | 14.3% | 14.3 / 23.8 / 23.8 | 71.4%
LOG | 36.8% | 42.1 / 36.8 / 47.4 | 73.7%
Overall | 34.6% | 36.5 / 38.5 / 44.2 | 73.1%
Research Question 1
STAR successfully computed crash preconditions for 38 of the 52 crashes (73.1%).
Research Question 2
How many crashes can STAR reproduce based on the crash preconditions?
Criterion of Reproduction [ReCrash, 2008]:
A crash is considered reproduced if the generated test case can trigger the same type of exception at the same crash line.
We applied STAR to generate crash-reproducing test cases from the computed preconditions.
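The Criterion of Reproduction can be encoded as a small check on the thrown exceptions. This helper is illustrative, not part of STAR; it compares the exception type and the top stack frame's class and line:

```java
// Same exception type, thrown at the same class and crash line.
public class ReproductionCheck {
    public static boolean sameCrash(Throwable original, Throwable reproduced) {
        if (!original.getClass().equals(reproduced.getClass())) {
            return false; // different exception types
        }
        StackTraceElement a = original.getStackTrace()[0];
        StackTraceElement b = reproduced.getStackTrace()[0];
        return a.getClassName().equals(b.getClassName())
            && a.getLineNumber() == b.getLineNumber();
    }
}
```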
Research Question 2
Overall crash reproductions achieved by STAR for each subject:
Subject | # of Crashes | # of Precondition | # of Reproduced | Ratio
ACC | 12 | 9 | 8 | 66.7% (88.9%)
ANT | 21 | 15 | 12 | 57.1% (80.0%)
LOG | 19 | 14 | 11 | 57.9% (78.6%)
Total | 52 | 38 | 31 | 59.6% (81.6%)
Research Question 2
More statistics for the test case generation process by STAR:
Subject | Avg. # of Objects | Avg. Candidate Methods | Min-Max Sequence Length | Avg. Sequence Length
ACC | 1.5 | 35.5 | 2 - 19 | 9.4
ANT | 1.4 | 11.7 | 2 - 14 | 6.2
LOG | 1.5 | 21.8 | 2 - 17 | 8.1
Total | 1.5 | 21.4 | 2 - 19 | 7.7
Research Question 3
The Criterion of Reproduction does not require a crash reproduction to reveal the original bug.
Research Question 3
Drawbacks of the Criterion of Reproduction:
The crash reproduction may not be the same crash.
The crash reproduction may not be useful for revealing the crash-triggering bug.
[Figure: a reproduced stack trace may not reach the buggy frame]
Research Question 3
How many crash reproductions by STAR are useful for revealing the actual cause of the crashes?
Research Question 3
Overall useful crash reproductions achieved by STAR for each subject:
Subject | # of Reproduced | # of Useful | Ratio (Total)
ACC | 8 | 7 | 87.5% (58.3%)
ANT | 12 | 7 | 58.3% (33.3%)
LOG | 11 | 8 | 72.7% (42.1%)
Total | 31 | 22 | 71.0% (42.3%)
Comparison Study
We compared STAR with two different crash reproduction frameworks:
Randoop: a feedback-directed random test input generation framework.
BugRedux: a forward-symbolic-execution-based crash reproduction framework.
Comparison Study
The number of crashes handled by the three approaches:
[Chart: crashes by stage (Precondition / Reproduction / Usefulness) — STAR: 38 / 31 / 22; Randoop and BugRedux achieved far fewer (recovered values: 18, 12, 10, 0).]
Comparison Study
[Venn diagram: overlap of reproduced crashes among STAR, Randoop and BugRedux — regions of 12, 5 and 10 crashes]
Comparison Study
STAR outperformed Randoop because:
Randoop uses a randomized search technique to generate method sequences: it can generate many method sequences, but the search is not guided.
Due to the large search space of real-world programs, an unguided search rarely hits a crash-triggering sequence.
Case Study
https://issues.apache.org/jira/browse/collections-411
An IndexOutOfBoundsException could be raised in one of the collection's methods.
Case Study
STAR was applied to generate a crash-reproducing test case, which we reported to the project developers:
https://issues.apache.org/jira/browse/collections-474
We also attached the auto-generated test case by STAR in our bug report.
Case Study
Developers quickly confirmed:
The original patch for bug ACC-411 was actually incomplete; it did not fully fix the crash.
Case Study
STAR is capable of identifying and reproducing crashes that remain even after a fix was attempted.
Challenges
We manually examined each non-reproduced crash to identify the challenges for precondition computation and reproduction, including:
Path explosion (6.7%)
Future Work
Improving reproducibility:
Support for environment simulation, e.g., file inputs.
Incorporate specialized SMT solvers: string solvers like Z3-str.
Integrate the crash reproductions into the fault localization process.
Conclusions
We proposed STAR, an automatic crash reproduction framework based on crash stack traces.
Thank You!
Appendix
Subject Sizes
Our evaluation study has one of the largest subject sizes:
Framework | Subject Size Range (LOC) | Avg. (LOC)
RECRASH | 200 - 86,000 | 47,000
ESD | 100 - 100,000 | N/A
BugRedux | 500 - 241,000 | 27,000
RECORE | 68 - 62,000 | 35,000
STAR | 20,000 - 100,000 | 60,000
Research Question 1
Average time to compute the crash preconditions, broken down by each optimization (seconds; lower is better):
[Chart: ACC — 18.5 s with no optimization, 11.8 / 15.9 / 13.8 s with individual optimizations, 2.1 s with all; ANT — 90.4 s down to 4.9 s; LOG — 59.3 s down to 3.3 s; Overall — 55.1 s down to 2.4 s. Intermediate values recovered: 86.8, 74.8, 67.5, 48.2, 47.8, 54.3, 50, 39.2, 28.3.]
Comparison Study
Average time to reproduce crashes, over the common reproductions only (seconds; lower is better):
[Chart: BugRedux vs. STAR across ACC, ANT, LOG and Overall; recovered values: 29.9, 10.8, 8.7, 4.6, 4.28, 3.75, 2.4, 2.3.]
User Survey
Survey Sent | Responses | Confirmed Correctness | Confirmed Usefulness
31 | 6 (19%) | - | -
ACC-53
Comparison Study
Branch coverage achieved by different test case generation approaches:
[Chart: branch coverage (%) of Sample Execution, RecGen, Randoop, Palus, Palulu and STAR on ACC, JSAP and SAT4J; recovered values range from 0% to 74%.]