Development of the Qualitas Corpus

Current Status

The Qualitas Corpus has been available outside the Qualitas Research Group only since January 2008, so there has been little external pressure on its organisation. We expect that, in the short term, feedback from other researchers will require us to rethink some of the decisions we have made, although the basic structure is unlikely to change.

We have a reasonable degree of confidence about the "binary" or compiled forms of each version of the systems in the corpus, in that each version has been included only if it meets a set of criteria. As discussed in "defining systems", there are a number of issues in identifying what should be considered "in" a system, which leads to a small level of uncertainty.

As of the July 2010 release, we have metadata that fairly completely describes what is in each of bin and src. This allows anyone using the corpus to quantify that level of uncertainty.

Future Plans

Our current todo list includes:

Add more metadata. Possibilities include adding more attributes (several were added for the April 2012 release, for example) and adding data from analysis (e.g. measurements from various metrics).
Continue developing and improving the quality control of the corpus (ongoing).
Add new systems to the corpus. We particularly want to add more large systems (more than 1000 types) (ongoing).
Add new versions of existing systems to the corpus (ongoing).
Develop more software to support analysis of the corpus.
Add systems written in languages other than Java.