Friday, January 13, 2006

Linux repository classification schemes

Originally, I was going to write this as an extended paper with a detailed review of how the ten main Linux distributions organise their software into repositories. However, I don't seem to have the time to do so, and will provide a brief overview here instead.

This is relevant to developing tools, such as whohas, that allow comparisons of repositories. The obvious comparison is software availability: how many software packages are available, how quickly new versions are released, how current the available versions are, and how many versions are released in a given time (the last three being three sides of a triangle). Other comparisons might take into account stability and other criteria.
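As a rough illustration of the simplest of these comparisons, here is a minimal Python sketch. The repository names, package lists and function names are all invented for the example, and this is not how whohas itself is implemented; it only shows the kind of question such a tool answers.

```python
# Hypothetical package listings: repository name -> {package name: version}.
# All names and versions here are made up for illustration only.
REPOS = {
    "distro_a/stable": {"mozilla": "1.7.8", "vim": "6.3", "gcc": "3.3.5"},
    "distro_b/current": {"mozilla": "1.7.12", "vim": "6.4"},
}


def availability(repos):
    """Return the number of packages each repository offers."""
    return {name: len(packages) for name, packages in repos.items()}


def who_has(repos, package):
    """Return {repository: version} for every repository carrying the package."""
    return {
        name: packages[package]
        for name, packages in repos.items()
        if package in packages
    }


if __name__ == "__main__":
    print(availability(REPOS))
    print(who_has(REPOS, "gcc"))
```

Comparing how current the versions are would additionally need a version-ordering rule, which differs between package formats, so it is left out of this sketch.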

In any case, there are three main ways to classify repositories:
  • Maturity
  • Provenance
  • Function
The classic example of repositories organised by maturity is Debian, which at any given time has three branches (stable, testing and unstable) that may be more or less distinct; there is a graph of their relationship over time somewhere on the web. A peculiarity of Debian, indeed a feature, is that one can almost freely mix packages from different repositories: while running a stable kernel, one could have "unstable" versions of Mozilla-based (and Mozilla-dependent) products. The quality of unstable, i.e. still-in-development, software is actually fairly high in Debian. Many distributions occur as distinct releases in the wild (the exceptions being source-based distributions and advanced binary-based ones such as Arch Linux), but Debian is the only one in which mixing repositories is common enough practice to actually work, in the sense that it is documented and taken into account in development, if only marginally.

A classic example of a provenance-based repository classification is Fedora, which is now distributed as Core and Extras. Another common provenance class, used especially by RPM-based distros (for no technical reason as far as I know), is "Contrib", sometimes called Community.

Arch Linux has a hybrid of these two: Current corresponds to Core, and Extra and Community are self-explanatory provenance-based contrasts, but there are also Testing and Unstable repositories, which are code-maturity classifications and mostly contain packages that would otherwise be found in Core. To make things entirely confusing, there is also an Unsupported repository, to which users can contribute buildscripts; it is really a fourth kind of classification, which I might phrase as binary/source/buildscript. Note that distributions will provide either source or buildscripts, but not both separately.

But to return to the original big three, the most prominent example of a functional classification would be Slackware, which classifies packages into base, latex, gnome etc. These are not strictly repositories, however, in the sense of being separately specified in a package manager config file. Again, many hybrids exist: in Arch Linux, we also find an underlying functional classification into "categories", which resemble those in Slackware: x11, system, network, gnome etc.
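To make the three axes concrete, here is a minimal Python sketch that tags the repositories mentioned above with the axis each one is classified on. The `Repo` class, its field names and the `schemes_used` helper are my own illustrative assumptions, not part of any distribution's tooling; Slackware's package sets and Arch's categories are left out because, as noted, they are not repositories in the strict sense.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Repo:
    """One repository, tagged on whichever classification axes apply to it."""
    distro: str
    name: str
    maturity: Optional[str] = None    # e.g. "stable", "testing", "unstable"
    provenance: Optional[str] = None  # e.g. "core", "extra", "community"
    function: Optional[str] = None    # e.g. "x11", "network" (unused below)


# Repositories as described in this post; the tag values are illustrative.
REPOS = [
    Repo("Debian", "stable", maturity="stable"),
    Repo("Debian", "testing", maturity="testing"),
    Repo("Debian", "unstable", maturity="unstable"),
    Repo("Fedora", "Core", provenance="core"),
    Repo("Fedora", "Extras", provenance="extra"),
    Repo("Arch Linux", "Current", provenance="core"),
    Repo("Arch Linux", "Extra", provenance="extra"),
    Repo("Arch Linux", "Community", provenance="community"),
    Repo("Arch Linux", "Testing", maturity="testing"),
    Repo("Arch Linux", "Unstable", maturity="unstable"),
]

AXES = ("maturity", "provenance", "function")


def schemes_used(repos, distro):
    """Return the set of classification axes used by a distribution's repos."""
    used = set()
    for repo in repos:
        if repo.distro != distro:
            continue
        for axis in AXES:
            if getattr(repo, axis) is not None:
                used.add(axis)
    return used


if __name__ == "__main__":
    for distro in ("Debian", "Fedora", "Arch Linux"):
        print(distro, sorted(schemes_used(REPOS, distro)))
```

A tool that compares repositories across distributions could use a mapping of this kind to decide which repositories are fair to compare with each other, for example Debian unstable against Arch's Testing rather than against Fedora Core.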

Being aware of the different classification schemes used, one can get the full benefit of tools such as whohas.
