Qualitas Corpus JRE Versions

The metadata for entries in the corpus includes the jreversion attribute. This is the earliest version of the JRE (really the JDK) that is needed to compile that system. It has been determined automatically using the version identification procedure described below, and so may not correspond to what the developers of the system think. For example, they may have been using the JDK 1.4 toolset but only ever use types found in JDK 1.3, in which case the system website may report JDK 1.4 as a requirement to use the system while the metadata reports JDK 1.3. There are other issues, as described below.

API Identification

Determining what the API for the JDK is turns out to be a non-trivial problem. At some level, the "ground truth" is provided by the compiler and the libraries it uses. The libraries include rt.jar and similar jar files that come with a Java installation. However, that means that if those files differ (e.g. because they come from different vendors) then the API can differ too. In this case, we use the Oracle jar files.

Adding to the difficulty is that the library jar files contain implementations of types that are there solely to support the JDK implementation (e.g. types in the com.sun package and subpackages) and are not intended to be used by Java developers. Such types should not really be considered part of the API.

Given that Java developers will make decisions about what types they can use based on what's documented in the "API Specification" JavaDoc pages, those pages perhaps represent the most authoritative source of what the API is. For the purposes of this exercise, the files that make up the JavaDoc pages will be used to identify the API contents.

The API consists of the types and the accessible members of those types. A member is treated as accessible if it is non-private, since any such member could be used by a Java developer.
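As a rough illustration of the "non-private" rule, the sketch below filters the declared members of a type by their access modifier. It uses Java reflection purely as a stand-in (the procedure described on this page uses BCEL on .class files instead), and the class name NonPrivateMembers, its methods, and the Sample type are hypothetical names introduced only for this example.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Member;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

public class NonPrivateMembers {

    // True for public, protected, and package-private members.
    static boolean isNonPrivate(Member m) {
        return !Modifier.isPrivate(m.getModifiers());
    }

    // Names of the non-private fields, constructors, and methods declared by a type.
    // Constructors are reported as "<init>", the name they have in bytecode.
    static List<String> nonPrivateMembers(Class<?> type) {
        List<String> result = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            if (isNonPrivate(f)) result.add(f.getName());
        }
        for (Constructor<?> c : type.getDeclaredConstructors()) {
            if (isNonPrivate(c)) result.add("<init>");
        }
        for (Method m : type.getDeclaredMethods()) {
            if (isNonPrivate(m)) result.add(m.getName());
        }
        return result;
    }

    // Hypothetical sample type, used only for the demonstration.
    static class Sample {
        public int a;
        protected int b;
        int c;              // package-private: still counted as part of the API
        private int d;      // private: excluded
        private void hidden() {}
        public void visible() {}
    }

    public static void main(String[] args) {
        // The order of reflected members is not guaranteed, so no fixed output is shown.
        System.out.println(nonPrivateMembers(Sample.class));
    }
}
```

Note that reflection inspects classes loaded in the running JVM, so it cannot by itself describe the API of a different JRE version; reading the .class files from that version's jar files, as the procedure does, avoids that limitation.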

API Identification Procedure

This describes how the contents of a particular version of the JRE are determined.
  1. Determine the types in the API by listing all the html files in the java, javax, and org directories of the JavaDoc source for the JRE version, and stripping out the non-class files.
  2. For each type identified in the previous step, pull out the relevant .class file from the appropriate jar file.
  3. Use a bytecode library (BCEL in this case) to determine which members of that type are non-private.
The result is a list of types and non-private members for those types that constitutes the contents of the API for the particular version of the JRE. The APIs that have been identified by this procedure are:
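Step 1 of the procedure above might be sketched as follows. This is only a sketch under stated assumptions, not the tooling actually used for the corpus: the class name JavaDocTypeLister and its methods are hypothetical, and it assumes the usual JavaDoc layout in which package-level pages are named package-*.html, usage pages live in class-use directories, supplementary files live in doc-files directories, and a nested type is documented in a file such as Map.Entry.html.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Stream;

public class JavaDocTypeLister {

    // Decide whether a path (relative to the JavaDoc root) looks like a page for a type.
    // Assumption: package-*.html, class-use/ and doc-files/ are the non-class files.
    static boolean isClassPage(Path relative) {
        String name = relative.getFileName().toString();
        if (!name.endsWith(".html") || name.startsWith("package-")) {
            return false;
        }
        for (Path part : relative) {
            String dir = part.toString();
            if (dir.equals("class-use") || dir.equals("doc-files")) {
                return false;
            }
        }
        return true;
    }

    // Turn e.g. java/util/Map.Entry.html into the binary name java.util.Map$Entry,
    // which is the name needed to find the .class file in a jar.
    static String toTypeName(Path docRoot, Path htmlFile) {
        Path rel = docRoot.relativize(htmlFile);
        String fileName = rel.getFileName().toString();
        String simple = fileName
                .substring(0, fileName.length() - ".html".length())
                .replace('.', '$');
        StringBuilder sb = new StringBuilder();
        Path pkg = rel.getParent();
        if (pkg != null) {
            for (Path part : pkg) {
                sb.append(part.toString()).append('.');
            }
        }
        return sb.append(simple).toString();
    }

    // Walk the java, javax, and org directories and collect the type names.
    static Set<String> listTypes(Path docRoot) throws IOException {
        Set<String> types = new TreeSet<>();
        for (String top : new String[] {"java", "javax", "org"}) {
            Path dir = docRoot.resolve(top);
            if (!Files.isDirectory(dir)) {
                continue;
            }
            try (Stream<Path> files = Files.walk(dir)) {
                files.filter(Files::isRegularFile)
                     .filter(p -> isClassPage(docRoot.relativize(p)))
                     .forEach(p -> types.add(toTypeName(docRoot, p)));
            }
        }
        return types;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: JavaDocTypeLister <javadoc-api-root>");
            return;
        }
        listTypes(Paths.get(args[0])).forEach(System.out::println);
    }
}
```

The resulting binary names (with / in place of . and a .class suffix) can then be looked up in the jar files for step 2.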

JRE Version Identification

The principle of identifying a particular JRE version is that if a system depends on one version of the JRE, and not an earlier version, then it will probably use something (a member of a type) that was not available in the earlier version.

JRE Version Identification Procedure

This identifies the earliest version of the JRE that a given system requires in order to compile.
  1. For the system (from bytecode), identify all members of the JRE that it uses.
  2. For each API version from the list above, from earliest to latest:
    1. For each used member identified:
      1. If the used member does not appear in the API version, continue with the next version.
    2. If all used members appear in the API version, record that version as the earliest required and stop.
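The procedure above amounts to finding the first API version, in chronological order, whose member set contains every member the system uses. A minimal sketch follows, assuming each API version is represented as a set of member names; the class EarliestJreVersion, the string format of the member names, and the tiny API sets in main are hypothetical and only for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

public class EarliestJreVersion {

    // apisByVersion must iterate from earliest to latest version
    // (a LinkedHashMap preserves insertion order).
    static Optional<String> earliestVersion(
            LinkedHashMap<String, Set<String>> apisByVersion,
            Set<String> usedMembers) {
        for (Map.Entry<String, Set<String>> api : apisByVersion.entrySet()) {
            // If any used member is missing, move on to the next version.
            if (api.getValue().containsAll(usedMembers)) {
                return Optional.of(api.getKey());
            }
        }
        // No known version provides every used member.
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Toy API contents; real sets would come from the API identification procedure.
        Set<String> v14 = new HashSet<>(Arrays.asList(
                "java.lang.String.length()"));
        Set<String> v15 = new HashSet<>(v14);
        v15.add("java.lang.StringBuilder.append(String)");
        Set<String> v16 = new HashSet<>(v15);
        v16.add("java.lang.String.isEmpty()");

        LinkedHashMap<String, Set<String>> apis = new LinkedHashMap<>();
        apis.put("1.4", v14);
        apis.put("1.5", v15);
        apis.put("1.6", v16);

        Set<String> used = new HashSet<>(Arrays.asList(
                "java.lang.String.length()",
                "java.lang.StringBuilder.append(String)"));

        System.out.println(earliestVersion(apis, used).orElse("none")); // prints 1.5
    }
}
```

Returning an empty Optional when no version matches is a design choice of this sketch; the procedure as described does not say what happens in that case.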

Threats to validity