The goal for this release was to reorganise some of the metadata, specifically the data that does (or should) appear in the .properties files and the summary.csv file. As part of this exercise, some issues with the metadata were uncovered and fixed (details below).
Some new systems were added and the evolution release systems were updated with all their new versions since the last release of the corpus. One of the new systems (freemind) was also added as an evolution system.
In summary, 5 new systems were added, bringing the total to 111, and a total of 76 new sysvers were added (including for the new systems), bringing the total to 661 sysvers.
The "r" distribution has versions of 111 systems (3.27 GiB uninstalled, 9.56 GiB installed). The "e" distribution contains the 14 systems for which there are 10 or more versions. This distribution (12.12 GiB uninstalled, 45.64 GiB installed) is intended for evolution studies. The full distribution (at 15.69 GiB uninstalled, 56.20 GiB installed) can be made available on request.
The attributes metadata has undergone extensive change. Previously this data was found in two places: the individual .properties files and the global summary.csv file. Some attributes were found in both places and some were found in only one, so it was non-trivial to get all the attribute values. With this release, attribute values are still found in the same two places, but every attribute value now appears in both.
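Since every attribute value should now appear in both places, a corpus user could cross-check the two sources for a given sysver. The following is only a sketch under stated assumptions: it assumes the .properties file uses plain key=value lines, that summary.csv is comma-separated with a header row of attribute names, and that rows are identified by a sysver column. The function names are hypothetical and not part of the corpus tooling.

```python
import csv
import io


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments.

    Assumption: the corpus .properties files use this plain format.
    """
    attrs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            attrs[key.strip()] = value.strip()
    return attrs


def find_mismatches(properties_text, summary_text, sysver):
    """Return {attribute: (properties_value, csv_value)} where the sources differ.

    Assumption: summary.csv has a header row and a 'sysver' column.
    """
    props = parse_properties(properties_text)
    for row in csv.DictReader(io.StringIO(summary_text)):
        if row.get("sysver") == sysver:
            names = set(props) | set(row)
            return {
                name: (props.get(name), row.get(name))
                for name in names
                if props.get(name) != row.get(name)
            }
    raise KeyError(f"sysver not found in summary: {sysver}")
```

An empty result means the two sources agree for that sysver. An attribute missing from one source shows up with None on that side.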
Some new attributes have been added, most notably license (albeit incompletely), status, jreversion, and distribution. The versionnotes attribute, which had been used as a catch-all for any notes regarding a given version, including internal management notes, has been re-purposed to contain only version-specific notes of use to a corpus user. The internal notes have been removed.
Some of the attribute names have been modified to provide a more consistent naming scheme. This affects the old names of systemversion (now sysver), LOC(Both) (now loc(both)), NCLOC(Both) (now ncloc(both)), #Both (now n_both), #Bin (now n_bin), #Top(Bin) (now n_top(bin)), and #Files (now n_files). Other attributes are now not provided, being either for internal use only or obsolete (notes, acquisitiondate, acquisitionperson, language, languageversion, origin, opensource, obfuscated, source).
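Scripts written against the previous release may still use the old attribute names. The renames and removals listed above can be captured in a small lookup, sketched here in Python. The function name migrate_attributes and the idea of holding attributes in a dict are illustrative assumptions, not part of the corpus tooling.

```python
# Old-to-new attribute names, as listed in the paragraph above.
RENAMED = {
    "systemversion": "sysver",
    "LOC(Both)": "loc(both)",
    "NCLOC(Both)": "ncloc(both)",
    "#Both": "n_both",
    "#Bin": "n_bin",
    "#Top(Bin)": "n_top(bin)",
    "#Files": "n_files",
}

# Attributes that are no longer provided in this release.
REMOVED = {
    "notes",
    "acquisitiondate",
    "acquisitionperson",
    "language",
    "languageversion",
    "origin",
    "opensource",
    "obfuscated",
    "source",
}


def migrate_attributes(old):
    """Map a dict keyed by old attribute names to the new naming scheme.

    Removed attributes are dropped; names that did not change pass through.
    """
    new = {}
    for name, value in old.items():
        if name in REMOVED:
            continue
        new[RENAMED.get(name, name)] = value
    return new
```

For example, a record with keys systemversion, #Bin and notes would come back with keys sysver and n_bin only.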
During this exercise, many checks were made of the existing metadata, and some errors and missing data were uncovered. Below are the changes to attribute values that are not covered by the comments above: