Qualitas Corpus Content Structure

The Qualitas Corpus contains a set of "systems", by which we mean software systems that have each been developed as a unit and are intended to be deployed as a unit. Some are not intended to be used standalone but instead provide frameworks or other infrastructure for other systems; we still call them "systems". We identify each system with a name clearly related to (and usually identical to) the name used by the developers.

For many systems we have multiple "versions" in the corpus. Each version is distinguished by the system name and a system-unique identifier, usually the one used by the developers. See Naming conventions for details.

The structure of the contents of the Qualitas Corpus is as follows:

Systems
   |
   +--sysname
        |
        +--sysname-version_id
              |
              +--bin
              |
              +--compressed
              |
              +--metadata
              |
              +--src
              |
              +--.properties
              |
              +--.install
Systems
The top-level directory. (See distribution structure for more details.)
sysname
This directory contains everything for the specified system. It typically contains only subdirectories, one for each version of the system in the corpus.
sysname-version_id
This directory contains everything for a specific version of the system.
compressed
This directory contains what was retrieved from the system download site, typically archive files such as zip or tar files. Usually there are two compressed files: one containing the system version ready for deployment, and the other containing the system version's source code and other resource files. Sometimes the deployment version is contained within the source distribution, in which case only the source distribution "compressed" file is provided. Departures from these two cases are described in README files in this directory.
bin
The compiled form of the system version, as provided in the deployment form. Usually the layout of this directory matches that provided in the relevant compressed file. To aid analysis, the -reduced option deletes everything except the files we have identified as directly related to the development of the system (no third-party code or other resource files). See our discussion on how we decide what is in a system for more details. Note: The relevant archive file in compressed will still contain anything we have decided not to include in bin.
metadata
This contains some of the metadata for the system version. (See also .properties below.)
src
This contains the source code. It is usually exactly the relevant compressed file uncompressed with no changes.
.properties
The file containing all metadata relevant to this version of the system.
.install
A script to populate the src and bin folders. This will not be present if the corpus is unpacked with install.pl. (See Installation for more details.)
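
The layout above can be walked programmatically. The following is a minimal sketch in Python, not part of the corpus tooling: the function name list_versions is hypothetical, and it assumes only that every version directory is named with the system name, a hyphen, and then the version identifier, as in the structure shown above.

```python
from pathlib import Path


def list_versions(systems_root):
    """Map each system name to the version ids found beneath it.

    Assumes the layout Systems/sysname/sysname-version_id.
    Entries that do not follow that pattern are ignored.
    """
    versions = {}
    for system_dir in sorted(Path(systems_root).iterdir()):
        if not system_dir.is_dir():
            continue
        prefix = system_dir.name + "-"
        # The version id is whatever follows "sysname-" in the directory name.
        versions[system_dir.name] = [
            child.name[len(prefix):]
            for child in sorted(system_dir.iterdir())
            if child.is_dir() and child.name.startswith(prefix)
        ]
    return versions
```

Calling list_versions on the Systems directory returns a dictionary keyed by system name, with the version ids sorted by directory name rather than by release order.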

Example

An example of how part of the contents of Systems might look:
ant
 +
 |
 +-- ant-1.1
 |     |
 |     +-- .install
 |     |
 |     +-- .properties
 |     |
 |     +-- bin
 |     |    |
 |     |    +-- jakarta-ant
 |     |            |
 |     |            +-- lib
 |     |                 |
 |     |                 +-- ant.jar
 |     |
 |     +-- compressed
 |     |    |
 |     |    +-- jakarta-ant-1.1-bin.zip
 |     |    |
 |     |    +-- jakarta-ant-1.1-src.zip
 |     |
 |     +-- src
 |     |
 |     |
 |     +-- metadata
 |          |
 |          +-- contents.csv
 |
 +-- ant-1.2
 |
 +-- ant-1.3
 |
 +-- ant-1.4
This shows the system ant (that is, the system name is ant) with four versions (version_ids 1.1, 1.2, 1.3, and 1.4); only the details of 1.1 are shown. In that version there are two separate compressed files, one for the compiled form (-bin) and one for the source (-src). The bin compressed file has been unpacked into the bin subdirectory of ant-1.1, but all files except ant.jar have been deleted. (The source file has also been unpacked, but its contents are not shown here.)
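
As a quick sanity check on a version directory such as ant-1.1, one could compare its contents against the entries named in the structure. This is a hedged sketch: the names EXPECTED_ENTRIES and missing_entries are hypothetical, and a missing .install is not necessarily a problem, since that script is absent when the corpus is unpacked with install.pl.

```python
from pathlib import Path

# Entries named in the content structure for each version directory.
EXPECTED_ENTRIES = ("bin", "compressed", "metadata", "src", ".properties", ".install")


def missing_entries(version_dir):
    """Return the expected entries that are absent from version_dir."""
    version_dir = Path(version_dir)
    return [name for name in EXPECTED_ENTRIES if not (version_dir / name).exists()]
```

An empty list means every expected entry is present; the check looks only at names and does not inspect what the directories contain.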