Qualitas Corpus Overview

The Qualitas Corpus is in fact a set of corpora (so perhaps it should have been called the Qualitas Corpora but it's probably too late now!), each containing a different set of "releases". A release consists of a set of versions of systems.

What distinguishes releases from each other is their contents. Some releases are distributed in different ways, each way being referred to as a "distribution". To use the corpus, one decides which distribution is required, which dictates which release one will get, and then goes about acquiring the distribution. Each distribution contains, for each system version, archive files corresponding to what is provided on the system download site, plus various support information and metadata.

Having obtained a distribution, one must then install it. Installation produces some documentation (including these pages) and a set of versions of systems, organised hierarchically by system and then by version, with the archive files partially unpacked. Note that if the installation is not done, none of the systems will be unpacked, so there will be little to analyse. Also, what is placed in the bin directory is usually not everything that appears in the relevant archive file; however, as we provide the original archive file, anything we don't unpack is still available.
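As an illustration of the "system, then version" organisation, the following is a minimal sketch of how one might enumerate what an installation contains. It assumes only that there is some directory whose immediate subdirectories are systems and whose second-level subdirectories are versions; the function name list_system_versions and the systems_root argument are hypothetical, and the actual directory names in an installed distribution should be checked against the installation itself.

```python
from pathlib import Path


def list_system_versions(systems_root):
    """Map each system directory name to a sorted list of its version directories.

    Assumes a two-level layout: systems_root/<system>/<version>/.
    Plain files at either level are ignored.
    """
    root = Path(systems_root)
    result = {}
    for system_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        versions = sorted(v.name for v in system_dir.iterdir() if v.is_dir())
        result[system_dir.name] = versions
    return result
```

Calling list_system_versions on the directory that holds the installed systems would return a dictionary such as {"somesystem": ["1.0", "1.1"]}, which can then drive a loop over every system version to be analysed.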

We provide metadata for each version of each system in the corpus. Possibly the most useful is the sourcepackages data. A typical analysis run would load all the source or bytecode and then use the sourcepackages values to distinguish the types that were written for the system from those that come from external libraries.
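The filtering step in such an analysis run can be sketched as follows. This is only an illustration under stated assumptions: it treats the sourcepackages values as a list of package-name prefixes, and it assumes the analysis has already produced fully qualified type names. The function names and the example packages (org.example.app and so on) are hypothetical and do not come from the corpus.

```python
def is_system_type(qualified_name, source_packages):
    """Return True if the type is in one of the given packages or a subpackage.

    The match is made on a package boundary, so the prefix "org.example.app"
    matches "org.example.app.Main" but not "org.example.apptools.Helper".
    """
    for prefix in source_packages:
        if qualified_name == prefix or qualified_name.startswith(prefix + "."):
            return True
    return False


def partition_types(type_names, source_packages):
    """Split type names into (written for the system, external library)."""
    system_types, external_types = [], []
    for name in type_names:
        if is_system_type(name, source_packages):
            system_types.append(name)
        else:
            external_types.append(name)
    return system_types, external_types


# Hypothetical input: prefixes as they might be read from the metadata,
# and type names as they might be found in the loaded source or bytecode.
example_source_packages = ["org.example.app"]
example_types = [
    "org.example.app.Main",
    "org.example.app.util.Parser",
    "org.example.apptools.Helper",
    "java.util.List",
]
system_types, external_types = partition_types(example_types, example_source_packages)
```

Matching on a package boundary rather than on a raw string prefix avoids counting an unrelated package whose name merely begins with the same characters.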