The Collection uses a structure similar to that of the Corpus: data is associated with each system version, and multiple versions of the same system are kept together. However, for each system version, the actual data consists of a master file (planned to be in RCF format, but not yet implemented) plus the provenance data. The provenance data is clearly separated from the master file, with the intent that some distributions of the Collection will include only the master files.
An example of the structure is shown below. It shows four systems, the first of which (ant) has at least two versions (1.1 and 1.8.2). The directory for each version contains the master file and a directory holding the provenance information, in this case the data files from the mete-cmcd tool.
    .../QCCC/
        Systems
            ant/
                ant-1.1/
                    ant-1.1-clones.rcf
                    provenance/
                        ant-1.1-mete-cmcd.csv.gz
                        ...
                ant-1.8.2/
                    ant-1.8.2-clones.rcf
                    provenance/
                        ant-1.8.2-mete-cmcd.csv.gz
                        ...
            /antlr
                ...
            /aoi
                ...
            ...

This structure is chosen to support future expansion of the Collection.
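
The naming convention in this layout can be made concrete with a short sketch. The Python below is illustrative only and is not part of the Collection's tooling: the function names `version_paths` and `is_provenance`, and the root `"QCCC"` used in the usage lines, are hypothetical. It assumes that every version directory follows the `<system>-<version>` pattern shown in the example, and that a master-only distribution can be obtained by dropping anything under a `provenance` directory.

```python
from pathlib import PurePosixPath


def version_paths(root, system, version, tool="mete-cmcd"):
    """Build the master-file and provenance-file paths for one system version.

    Hypothetical helper: it only mirrors the naming seen in the example layout.
    """
    name = f"{system}-{version}"
    version_dir = PurePosixPath(root) / "Systems" / system / name
    master = version_dir / f"{name}-clones.rcf"
    provenance = version_dir / "provenance" / f"{name}-{tool}.csv.gz"
    return master, provenance


def is_provenance(path):
    """Return True if the path lies inside a 'provenance' directory."""
    return "provenance" in PurePosixPath(path).parts


# Usage: paths for ant 1.8.2, then a master-only selection.
master, provenance = version_paths("QCCC", "ant", "1.8.2")
master_only = [p for p in (master, provenance) if not is_provenance(p)]
print(master)
print(provenance)
```

Because the provenance data sits in its own subdirectory, selecting the files for a master-only distribution reduces to a test on path components rather than on file names.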