Development of the Qualitas Corpus
The Qualitas Corpus has only been made available outside the Qualitas
Research Group since January 2008 and so there has been little external
pressure affecting its organisation. We expect that feedback from other
researchers will require rethinking some of the decisions we
have made in the short term, although the basic structure is unlikely to change.
We have a reasonable degree of confidence about the "binary" or compiled
forms of each version of systems in the corpus, in that each version has
only been included if it meets a set of criteria. As discussed in
"defining systems", there are a number of issues regarding identifying
what should be considered "in" a system, which leads to a small level of
uncertainty.
As of the July 2010 release, we now have metadata that fairly completely
describes what is in each of bin and
src. This allows anyone using the corpus to quantify the level of uncertainty.
Our current todo list includes:
Add more metadata. Possibilities include adding more attributes (e.g. several
were added for the April 2012 release) and adding data from analysis (e.g.
measurements from various metrics).
Continue developing and improving the quality control of the corpus.
Add new systems to the corpus (ongoing). We particularly want to
add more large systems (more than 1000 types).
Add new versions of existing systems to the corpus (ongoing).
Develop more software to support analysis of the corpus.
Add systems written in languages other than Java.
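
As a rough illustration of the "large system" threshold in the list above, the following sketch estimates the number of types in a candidate system by counting .class entries in its jar files. It is only an approximation under stated assumptions: the function names are hypothetical, this is not the corpus's own tooling, and it counts every .class entry (including nested and anonymous classes), which may differ from how the corpus defines a type.

```python
import zipfile

# "more than 1000 types", taken from the todo list above.
LARGE_SYSTEM_THRESHOLD = 1000

# Entries that are .class files but do not declare an ordinary type.
NON_TYPE_SUFFIXES = ("module-info.class", "package-info.class")


def count_class_entries(jar):
    """Count .class entries in one jar (a path or a binary file object).

    Assumption: one .class entry is treated as one type.
    """
    with zipfile.ZipFile(jar) as archive:
        return sum(
            1
            for name in archive.namelist()
            if name.endswith(".class") and not name.endswith(NON_TYPE_SUFFIXES)
        )


def is_large_system(jars):
    """Return True if the jars together hold more than the threshold."""
    total = sum(count_class_entries(jar) for jar in jars)
    return total > LARGE_SYSTEM_THRESHOLD
```

A call such as is_large_system(["some-system.jar"]) (the file name is a placeholder) would give a first-pass answer before a candidate system is examined more carefully.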