Qualitas Corpus Domain Model

In order for any corpus to be useful, it must be representative. If it contains only particular kinds of things, if those things come from a limited source, or if its contents are restricted in some similar way, then it may be biased in ways that affect the validity of any conclusions drawn from its use (specifically, threats to external validity). Ideally, the corpus should contain a representative sample of its population, but in practice this is not feasible. This is acknowledged in fields such as computational linguistics, which makes heavy use of corpora of language use. Hunston observes that "The real question as regards representativeness is how the balance of a corpus should be taken into account when interpreting data from that corpus." [Hun2002] That is the philosophy of the Qualitas Corpus.

To help users understand the balance, or representativeness, of the Qualitas Corpus, there needs to be some way to characterise it. This is the goal of the Qualitas Corpus Domain Model. It provides a set of categories, and every entry in the corpus is classified into one of them. Hopefully, looking at what is in each category gives a sense of what is "in" the corpus.

The domain model described below is just a start. There are some issues with it:

But it is a start. As the corpus develops, hopefully some of these issues will be resolved.
3D/graphics/media
Provides some sort of media support, in particular graphics. Compare with diagram/visualisation.
IDE
Provides a tool that supports code development (particularly the edit/execute cycle).
SDK
Provides the base libraries for programming in a particular language.
database
Provides some sort of database management.
diagram/visualisation
Provides diagrams or visual presentation of some sort of data.
games
Is either a game, or provides support for game development.
middleware
Provides support for typical middleware services, such as transactions, persistence, and concurrency.
parsers/generators/make
Provides support for creating parsers or building systems.
programming language
Provides a new programming language.
testing
Provides support for automated testing.
tool
Everything else.

References

[Hun2002]
Susan Hunston. Corpora in Applied Linguistics. Cambridge University Press, 2002.