Qualitas Corpus Content Structure
The Qualitas Corpus contains a set of "systems", by which we mean
software systems each of which has been developed as a unit and is
intended to be deployed as a unit. Some systems are not intended to be
used standalone but instead provide frameworks or other infrastructure
for other systems; nevertheless, we still call them "systems".
We identify each system with a name clearly related to (and usually
identical to) the name used by the developers.
For many systems we have multiple "versions" in the
corpus. Each version is distinguished by the system name and
a system-unique identifier, usually the one used by the
developers. See Naming conventions for details.
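Tools that process the corpus often need to recover the version identifier from a directory name. The following is a minimal sketch, assuming version directories are named sysname-version_id and live under Systems/sysname, as in the directory layout of the corpus. The helper names `split_version_dir` and `version_dir` are hypothetical and not part of the corpus distribution.

```python
from pathlib import Path


def split_version_dir(sysname: str, dirname: str) -> str:
    """Return the version_id encoded in a directory named sysname-version_id.

    Assumption: the directory name is the system name, one hyphen, then the
    version identifier. Matching on the known system-name prefix (rather than
    splitting on the first hyphen) keeps hyphenated names and ids intact.
    """
    prefix = sysname + "-"
    if not dirname.startswith(prefix) or len(dirname) == len(prefix):
        raise ValueError(f"{dirname!r} does not follow {sysname}-<version_id>")
    return dirname[len(prefix):]


def version_dir(systems_root: Path, sysname: str, version_id: str) -> Path:
    """Build the path Systems/<sysname>/<sysname>-<version_id> (hypothetical helper)."""
    return systems_root / sysname / f"{sysname}-{version_id}"
```

For example, `split_version_dir("ant", "ant-1.1")` returns `"1.1"`, and `version_dir(Path("Systems"), "ant", "1.1")` gives `Systems/ant/ant-1.1`. Stripping the known prefix is preferred over a naive split because a version id such as 2.0-rc1 may itself contain hyphens.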
The structure of the contents of the Qualitas Corpus is as follows:
Systems
|
+--sysname
   |
   +--sysname-version_id
      |
      +--bin
      |
      +--compressed
      |
      +--metadata
      |
      +--src
      |
      +--.properties
      |
      +--.install
- Systems
  The top-level directory. (See distribution structure for more
  details.)
- sysname
  This directory contains everything for the specified system. It
  typically contains only subdirectories, one for each version of the
  system in the corpus.
- sysname-version_id
  This directory contains everything for a specific version of the system.
- compressed
  This directory contains what was retrieved from the system's download
  site (typically archive files such as zip or tar files). This will
  usually be two compressed files: one containing the system version
  ready for deployment and the other containing the system version's
  source code and other resource files. Sometimes the deployment version
  is contained within the source distribution, in which case only the
  source distribution archive is provided. Departures from these two
  cases are described in README files in this directory.
- bin
  The compiled form of the system version as provided in the deployment
  form. Usually the layout of this directory matches that of the
  relevant compressed file. To aid analysis, the -reduced option deletes
  everything except the files we have identified as directly related to
  the development of the system (no third-party code or other resource
  files). See our discussion on how we decide what is in a system for
  more details.
  Note: The relevant archive file in compressed still contains
  everything we have decided not to include in bin.
- metadata
  This directory contains some of the metadata for the system version.
  (See also .properties below.)
- src
  This directory contains the source code. It is usually the contents of
  the relevant compressed file, unpacked without any changes.
- .properties
  The file containing all metadata relevant to this version of the
  system.
- .install
  A script to populate the src and bin folders. This file is not present
  if the corpus is unpacked with install.pl. (See Installation for more
  details.)
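To check a local copy of the corpus against the layout described above, one might walk Systems and record which of the documented entries each version directory contains. This is a sketch under the assumption that the on-disk names are exactly those listed (bin, compressed, metadata, src, .properties, .install); `scan_systems` and `VersionInfo` are hypothetical names, not part of the corpus tooling.

```python
from dataclasses import dataclass, field
from pathlib import Path

# Entries documented for each sysname-version_id directory.
EXPECTED_ENTRIES = ("bin", "compressed", "metadata", "src", ".properties", ".install")


@dataclass
class VersionInfo:
    sysname: str
    version_id: str
    path: Path
    present: set[str] = field(default_factory=set)

    @property
    def missing(self) -> set[str]:
        """Documented entries that were not found in this version directory."""
        return set(EXPECTED_ENTRIES) - self.present


def scan_systems(systems_root: Path) -> list[VersionInfo]:
    """Walk Systems/<sysname>/<sysname>-<version_id> and record which
    documented entries each version directory contains.

    Subdirectories of a system whose names do not start with
    "<sysname>-" are skipped; plain files at those levels are ignored.
    """
    results = []
    for sys_dir in sorted(p for p in systems_root.iterdir() if p.is_dir()):
        prefix = sys_dir.name + "-"
        for ver_dir in sorted(p for p in sys_dir.iterdir() if p.is_dir()):
            if not ver_dir.name.startswith(prefix) or ver_dir.name == prefix:
                continue
            present = {e for e in EXPECTED_ENTRIES if (ver_dir / e).exists()}
            version_id = ver_dir.name[len(prefix):]
            results.append(VersionInfo(sys_dir.name, version_id, ver_dir, present))
    return results
```

Because .install is absent when the corpus has been unpacked with install.pl, a report listing only .install as missing is expected in that case; a missing src or bin is more likely to indicate that the .install script still needs to be run.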
Example
The following shows what part of the contents of Systems might look
like:
ant
|
+-- ant-1.1
|   |
|   +-- .install
|   |
|   +-- .properties
|   |
|   +-- bin
|   |   |
|   |   +-- jakarta-ant
|   |       |
|   |       +-- lib
|   |           |
|   |           +-- ant.jar
|   |
|   +-- compressed
|   |   |
|   |   +-- jakarta-ant-1.1-bin.zip
|   |   |
|   |   +-- jakarta-ant-1.1-src.zip
|   |
|   +-- src
|   |
|   +-- metadata
|       |
|       +-- contents.csv
|
+-- ant-1.2
|
+-- ant-1.3
|
+-- ant-1.4
This shows the system ant (that is, sysname is ant) with four versions
(version_ids 1.1, 1.2, 1.3 and 1.4), with only the details of 1.1
shown. In that version, there are two separate compressed files, one
for the compiled form (-bin) and one for the source (-src). The bin
compressed file has been unpacked into the bin subdirectory of ant-1.1,
but all files except ant.jar have been deleted. (The source file has
also been unpacked, but its contents are not shown here.)
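In a reduced installation like the ant-1.1 example, the interesting compiled code in bin is a small number of jar files. A minimal sketch for locating them, assuming the compiled code is packaged as .jar files somewhere below bin; the function name `jars_in_bin` is hypothetical.

```python
from pathlib import Path


def jars_in_bin(version_dir: Path) -> list[Path]:
    """Return the .jar files under <version_dir>/bin, relative to bin.

    Assumption: the compiled code of interest is packaged as .jar files,
    as with ant.jar in the example; a system shipped as loose .class files
    would need a different pattern (e.g. "*.class").
    """
    bin_dir = version_dir / "bin"
    if not bin_dir.is_dir():
        # bin may be absent if the .install script has not yet been run.
        return []
    return sorted(p.relative_to(bin_dir) for p in bin_dir.rglob("*.jar") if p.is_file())
```

For the reduced ant-1.1 example, the only entry returned would be the path to ant.jar; the full archive in compressed would still be needed to recover any deleted third-party jars.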