Professional Documents
Culture Documents
Relationship
INCONSISTENT
SUBTRACT CHANGE
C
C C
ADD Code Snippet
CONSISTENT D D D D
SAME SHIFT
CHANGE A A A A A
B B B B
Clone Group Add Consistent Change Inconsistent Change Subtract
Subtract
Figure 1: The relationship among evolution patterns
Evolution Patterns
Vi Vi+1 Vi+2 V i+3 V i+4
1. MODEL OF CLONE GENEALOGY
To study clone evolution structurally and semantically rather Figure 2: An example clone lineage
than quantitatively, we defined a model of clone genealogy.
The genealogy of code clones describes how groups of code
clones change over multiple versions of a program. In a TextSimilarity(NG.text,OG.text) = 1
clone’s genealogy, the origin of a group to which the clone all cn:CodeSnippet | some co:CodeSnippet | cn in NG ⇒
belongs is traced to the previous version. The model as- co in OG && LocationOverlapping(cn,co) = 1
sociates related clone groups that have originated from the all co:CodeSnippet | some cn:CodeSnippet | co in OG ⇒
same ancestor clone group. In addition, the genealogy con- cn in NG && LocationOverlapping(cn,co) = 1
tains information about how each element in a group of
clones has changed with respect to other elements in the
• Add: at least one code snippet in N G is a newly added
same group.
one. For example, programmers added a new code
snippet to N G by copying an old code snippet in OG.
We wrote our model in the Alloy modeling language [?] to
TextSimilarity(NG.text,OG.text) ≥ simth
check whether several evolution patterns can describe all
some cn:CodeSnippet | all co:CodeSnippet | co in OG ⇒
possible changes to a clone group and to clarify the rela-
cn in NG && LocationOverlapping(cn,co) = 0
tionship among evolution patterns. (Our entire model is
available on the web [?].)
• Subtract: at least one code snippet in OG does not
The basic unit in our model is a Code Snippet, which has appear in N G. For example, programmers refactored
two attributes, Text and Location. Text is an internal repre- or removed a code clone.
sentation of code that a clone detector uses to compare code TextSimilarity(NG.text,OG.text) ≥ simth
snippets. For example, when using CCFinder [?], text is a some co:CodeSnippet | all cn:CodeSnippet | cn in NG ⇒
parametrized token sequence, whereas when using CloneDr co in OG && LocationOverlapping(cn,co) = 0
[?], text is an isomorphic AST. A Location is used to trace
code snippets across multiple versions of a program; thus, • Consistent Change: all code snippets in OG have changed
every code snippet in a particular version of a program has a consistently; thus they belong to N G together. For
unique location. To determine how much the text of a code example, programmers applied the same change con-
snippet has changed across versions, we define a TextSimi- sistently to all code clones in OG.
larity function that measures the text similarity between two simth ≤TextSimilarity(NG.text,OG.text)< 1
texts t1 and t2 (0 ≤ TextSimilarity(t1, t2) ≤ 1). To trace a all co:CodeSnippet | some cn:CodeSnippet | co in OG ⇒
code snippet across versions, we define a LocationOverlap- cn in NG && LocationOverlapping(cn,co) > 0
ping function that measures how much two locations l1 and
l2 overlap each other (0 ≤ LocationOverlapping(l1, l2) ≤ 1). • Inconsistent Change: at least one code snippet in OG
A Clone Group is a set of code snippets with identical text. changed inconsistently; thus it does not belong to N G
CG.text is a syntactic sugar for the text of any code snippet anymore. For example, a programmer forgot to change
in a clone group CG. A Cloning Relationship is defined be- one code snippet in OG.
tween two clone groups CG1 and CG2 if and only if TextSim- simth ≤TextSimilarity(NG.text,OG.text)< 1
ilarity(CG1 .text, CG2 .text) ≥ simth , where simth is a con- some co:CodeSnippet | all cn:CodeSnippet | cn in NG ⇒
stant between 0 and 1. An Evolution Pattern is defined be- co in OG && LocationOverlapping(cn,co) = 0
tween an old clone group OG in the k − 1th version and a
new clone group N G in the k th version such that there exists • Shift: at least one code snippet in N G partially over-
a cloning relationship between N G and OG. laps with at least one code snippet in OG.1
TextSimilarity(NG.text,OG.text) = 1
We defined several evolution patterns that describe all pos- some cn:CodeSnippet | some co:CodeSnippet | cn in NG
sible changes to a clone group. The relationship among evo- && co in OG && (1 >LocationOverlapping(cn,co) > 0)
lution patterns is shown in the Venn diagram in Figure 1.
1
This unintuitive pattern was found when we used Alloy to
• Same: all code snippets in N G did not change from check whether the combination of patterns can describe all
OG. possible changes to a clone group.
Inconsistent Change Add & o1=o2 // exactly the same location
Subtract & Add
& Subtract Consistent Change
}
E
F
H fun overlap (o1:Location,o2:Location) {
F F
G G #OrdPrevs(o1) = # OrdPrevs(o2) +1
G
|| #OrdPrevs(o2) = #OrdPrevs(o1)+1
C C // partially overlap
D D D D }
A A A A A
B B B B
fun notoverlap (o1:Location, o2:Location) {
Add Consistent Change Inconsistent Change Subtract ! overlaphigh(o1,o2) &&
& Subtract !overlap(o1,o2) // does not overlap at all
Vi Vi+1 Vi+2 V i+3 V i+4 }
fun SUBTRACT(r:Relationship) {
(similarhigh(r.new.group.text, r.old.group.text) ||
similar(r.new.group.text,r.old.group.text))
some cso:Code | all csn:Code |
csn in r.new.group => cso in r.old.group &&
notoverlap(csn.location,cso.location)
}
//run SUBTRACT for 5
fun CONSISTENT(r:Relationship) {
similar(r.new.group.text,r.old.group.text)
all cso:Code | some csn:Code |
cso in r.old.group => csn in r.new.group && (
overlap(csn.location,cso.location) ||
overlaphigh(csn.location,cso.location))
}
//run CONSISTENT for 5
fun INCONSISTENT(r:Relationship) {
similar(r.new.group.text,r.old.group.text)
some cso:Code | all csn:Code |
csn in r.new.group => cso in r.old.group &&
notoverlap(csn.location,cso.location)
}
//run INCONSISTENT for 5
assert ALL_EXHAUSTIVE {
all r:Relationship |
!notsimilar(r.new.group.text,r.old.group.text) =>
ADD(r)|| SHIFT(r) || SAME(r) ||
SUBTRACT(r) || CONSISTENT(r) || INCONSISTENT(r)
}
//check ALL_EXHAUSTIVE for 5
//proved true