| Term (link to details) | Short description |
|---|---|
| Candidate Pair | A candidate pair is a pair of code fragments for which there is some information about whether one is a clone of the other. The information may indicate that the pair is a clone pair, or that it is not. |
| Code Fragment | A code fragment is any contiguous sequence of text lines in a source code file. |
| Clone | One code fragment is a clone of another fragment if it is conceivable that a rational developer created one fragment by copying (and possibly modifying) the other. |
| Clone Pair | A clone pair is a pair of code fragments for which there is some evidence that one fragment is a clone of the other. That is, it is a candidate pair where the information is in support of the clone relationship existing. |
| Cluster | A cluster is a set of code fragments where, for every code fragment, there is at least one other code fragment such that the two fragments together are a clone pair. Note that this is the "connected component" definition. The "clique" variant would require that every pair of code fragments form a clone pair, but this variant is not used in the Collection. |
| Confidence Level | Confidence level is an ordinal-scale value indicating the degree of confidence regarding some datum. |
| ELOC | "Executable" lines of code: lines that are not blank, are not entirely comments, and contain more than just braces. |
| Master File | This is the authoritative data source for clone information. There is one for each system version. |
| Provenance | In the context of the Collection, provenance refers to identifying the origin of, and (ideally) the processes used to create, the data that provides supporting evidence for the clone data. |
| Clone type | Several classifications have been proposed for code clones; the one referred to most often is the clone 'type'. The categories in this classification are Type-1, Type-2, Type-3, and Type-4 (definitions taken from Roy et al.). There is not unanimous agreement on these categories, especially what's in Type-3 and Type-4 is not considered in the Collection. |
| Type-1 clone | Identical code fragments except for variations in whitespace, layout and comments. |
| Type-2 clone | Syntactically identical fragments except for variations in identifiers, literals, types, whitespace, layout and comments. |
| Type-3 clone | Copied fragments with further modifications such as changed, added or removed statements, in addition to variations in identifiers, literals, types, whitespace, layout and comments. |
| Type-4 clone | Two or more code fragments that perform the same computation but are implemented by different syntactic variants. |
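
The "connected component" definition of a cluster can be illustrated with a short sketch. This is not the Collection's own tooling; it is a minimal union-find example, and the function name `clusters_from_clone_pairs` and the `(file, start_line, end_line)` fragment tuples are hypothetical choices made for illustration only.

```python
from collections import defaultdict


def clusters_from_clone_pairs(clone_pairs):
    """Group fragments into clusters, i.e. connected components of the
    graph whose edges are clone pairs (hypothetical helper)."""
    parent = {}

    def find(x):
        # Register unseen fragments as their own root, then walk to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Merge the components of the two fragments in every clone pair.
    for a, b in clone_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect fragments under their final root.
    groups = defaultdict(set)
    for fragment in parent:
        groups[find(fragment)].add(fragment)
    return list(groups.values())


# Fragments are assumed to be (file, start_line, end_line) tuples.
A = ("A.java", 1, 10)
B = ("B.java", 5, 14)
C = ("C.java", 20, 29)
D = ("D.java", 3, 8)
E = ("E.java", 40, 45)

clusters = clusters_from_clone_pairs([(A, B), (B, C), (D, E)])
```

Here `A` and `C` end up in the same cluster even though `(A, C)` is not itself a clone pair. Under the "clique" variant they would not, which is the difference the Cluster entry describes.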