Qualitas Corpus Clone Collection Data Provenance

All data in the Collection should include its provenance, that is, where it came from. This is to aid in assessing how trustworthy the data is. The provenance information is considered part of the Collection. Some distributions may not include it, but it should be available to anyone who wants to check it.

Details about what provenance information is available are accessible from this page.

mete-cmcd is a clone detector. Much of the original data in the Collection was produced by this tool. The raw data produced by this tool is available as described here.