Qualitas Corpus Glossary

This page is intended to contain all the corpus-specific terms that are used, with links to the full documentation. The size measurements listed here (see Size metrics) apply only to those types in packages specified by sourcepackages.
Term (link to details) Short description
Attributes The metadata provided with the corpus includes values for a set of attributes for each system or sysver.
bin Refers both to the compiled form of any types and to the place in a sysver structure where the compiled form is kept.
contents.csv File that is part of the metadata provided with the corpus, containing information for every type that is included in the corpus for each system.
Distribution An attribute indicating the distribution a sysver can be found in.
Domain An attribute indicating the domain the system belongs to.
Full Name An attribute indicating the full name of the system (often the same as System).
.install The name of the install script for a sysver.
JRE Version An attribute indicating the earliest version of the JRE needed for the sysver.
License An attribute indicating the license the system is released under.
LOC (Lines of Code) The number of lines of code. When applied to a text file, this is what the Unix wc -l command will give, except that if the last line of the file does not end in a linefeed, then the LOC will be one greater than the wc value.
loc(both) An attribute indicating the LOC for the .java files in src whose compiled forms exist in bin, for types belonging to packages that match sourcepackages.
n_bin An attribute indicating the number of .class files in bin for types belonging to packages that match sourcepackages.
n_both An attribute indicating the number of .class files in bin for which there is source code in src, for types belonging to packages that match sourcepackages.
n_files An attribute indicating the number of .java files in src for types belonging to packages that match sourcepackages.
n_top(bin) An attribute indicating the number of .class files for top level types belonging to packages that match sourcepackages.
NCLOC (Non-comment, Non-blank Lines of Code) Lines of code counted in LOC excluding blank lines and lines that are entirely comments. Lines of text that contain both code (even if it is a single character such as "}") and comments are included.
ncloc(both) An attribute indicating NCLOC for classes that are in both src and bin.
.properties File containing Sysver-specific attribute values.
Recent Version Refers to the most recent version of a system in the corpus (which may not be the most recent version of the system that's available).
Release Date An attribute indicating the date the version was released on.
Size metrics Measurements for 6 size metrics are provided for each sysver: 2 "lines of code" metrics (loc(both), ncloc(both)), 3 "number of classes" metrics (n_bin, n_both, n_top(bin)), and 1 "number of files" metric (n_files).
Source Packages An attribute indicating prefixes of packages that contain code written for the system.
summary.csv File containing the attribute values of all sysvers in the corpus.
systemnotes An attribute indicating notes for a system.
src Refers both to the source form of any types and to the place in a sysver structure where the source form is kept.
Status An attribute indicating the development status of the system.
System An attribute indicating the name of the system the system version belongs to.
Sysver An attribute indicating the system and version.
Standard Sysver Structure How the entry for a sysver is organised.
sysvercount An attribute of a system indicating the number of versions of that system in the corpus.
Type Refers to any Java type-like entity (including classes, interfaces, enums, annotations). This is used rather than "class" as the general term to avoid ambiguity, because "class" can also mean specifically those types that are not interfaces, enums, or annotations.
Top Level Type A type that is not a nested type.
versionnotes An attribute indicating notes specific to a sysver.
url An attribute indicating the web address (typically the home page of a system).
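
The LOC and NCLOC entries above describe counting rules that can be illustrated with a short sketch. The Python below is not the corpus's own tooling; the function names loc and ncloc are hypothetical, and the comment handling is a simplification that assumes Java-style // and /* */ comments and does not recognise comment markers inside string or character literals.

```python
def loc(text: str) -> int:
    """LOC as described in the glossary: the `wc -l` count (number of
    linefeeds), plus one if the final line lacks a trailing linefeed."""
    newlines = text.count("\n")
    if text and not text.endswith("\n"):
        return newlines + 1
    return newlines


def ncloc(text: str) -> int:
    """NCLOC sketch: count lines holding at least one non-whitespace
    character outside a comment. Assumption: comment markers inside
    string literals are not treated specially."""
    count = 0
    in_block = False  # True while inside a /* ... */ comment
    for line in text.split("\n"):
        has_code = False
        i = 0
        while i < len(line):
            if in_block:
                end = line.find("*/", i)
                if end == -1:
                    break  # the rest of this line is comment
                in_block = False
                i = end + 2
            elif line.startswith("//", i):
                break  # the rest of this line is comment
            elif line.startswith("/*", i):
                in_block = True
                i += 2
            else:
                if not line[i].isspace():
                    has_code = True  # even a lone "}" counts as code
                i += 1
        if has_code:
            count += 1
    return count
```

For example, loc("a\nb\n") and loc("a\nb") both give 2, whereas wc -l would report 1 for the second input. A line such as "} // end" counts towards NCLOC because it contains code as well as a comment.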