Starchive implementation

This describes a first implementation of Starchive, an associative and versioned Starkit repository.

The entire repository core consists of two sets of files:

  1. A huge collection of files, stored in the data/ directory, with meaningless names ("signatures") derived from the file contents and size. Each file is stored exactly once (in compressed form).
  2. A collection of starkit "maps", one per starkit version, in the kits/ directory. Each contains a listing of the files and directories in that starkit, as well as the corresponding names in the data/ area. These maps are relatively small, on average about 1% of the size of the original starkits.
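
As a rough illustration of the two sets of files, the following Python sketch models the data/ area as a dictionary keyed by signature, and builds a starkit map from it. The signature scheme (SHA-1 digest plus size), the use of zlib, and the names signature, DataStore and make_map are all assumptions made for this example; the actual Starchive format is not specified here.

```python
import hashlib
import zlib


def signature(content: bytes) -> str:
    """Hypothetical signature: SHA-1 hex digest of the contents plus the size."""
    return f"{hashlib.sha1(content).hexdigest()}-{len(content)}"


class DataStore:
    """In-memory stand-in for the data/ directory: signature -> compressed bytes."""

    def __init__(self):
        self.blobs = {}

    def put(self, content: bytes) -> str:
        sig = signature(content)
        # Identical contents yield the same signature, so each file is stored once.
        if sig not in self.blobs:
            self.blobs[sig] = zlib.compress(content)
        return sig

    def get(self, sig: str) -> bytes:
        return zlib.decompress(self.blobs[sig])


def make_map(files: dict, store: DataStore) -> dict:
    """Build a starkit map: path inside the starkit -> signature in the store."""
    return {path: store.put(content) for path, content in sorted(files.items())}
```

Two paths with identical contents end up pointing at the same stored entry, which is what keeps the maps small relative to the starkits they describe.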

Access to the repository is by starkit name and version ID. If you know both, the Starchive has the information needed to reconstitute the original starkit. If you know only a starkit name, you can enumerate the available versions and their latest modification dates.
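
The access pattern described above might look like the following sketch. It assumes the maps are indexed by a (name, version) pair and that each map record carries a modification date; the names kits, add_map, list_versions and reconstitute are hypothetical and are not taken from the Starchive code.

```python
from datetime import datetime, timezone

# Hypothetical in-memory stand-in for kits/: (name, version) -> map record.
kits = {}


def add_map(name, version, entries, modified):
    """Register one starkit map; entries maps path -> signature in data/."""
    kits[(name, version)] = {"entries": entries, "modified": modified}


def list_versions(name):
    """Return (version, modification date) pairs for one starkit name."""
    return sorted((v, rec["modified"]) for (n, v), rec in kits.items() if n == name)


def reconstitute(name, version, data):
    """Rebuild path -> contents for one version from its map and the data store."""
    rec = kits[(name, version)]
    return {path: data[sig] for path, sig in rec["entries"].items()}
```

Knowing only the name is enough to call list_versions; knowing both name and version is enough to call reconstitute.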

Internally, Starchive keeps all pieces separate, but has all the information needed to reconstruct each version. The advantage of this approach is that multiple versions of a starkit that share many of the same file versions are stored very efficiently. Changing a single file and resubmitting a starkit to the archive adds one new file and one new starkit map to the Starchive repository, regardless of how many files the starkit contains.
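
The storage claim can be checked with a small simulation. This is only a sketch: the submit function, the repo dictionary layout, and the SHA-1-plus-size signature are assumptions for illustration, and compression is left out.

```python
import hashlib


def submit(repo, name, version, files):
    """Add one starkit version; return how many new data files it introduced."""
    added = 0
    entries = {}
    for path, content in files.items():
        # Assumed signature scheme: content digest plus size.
        sig = f"{hashlib.sha1(content).hexdigest()}-{len(content)}"
        if sig not in repo["data"]:
            repo["data"][sig] = content  # compression omitted for brevity
            added += 1
        entries[path] = sig
    repo["kits"][(name, version)] = entries
    return added


repo = {"data": {}, "kits": {}}

# A made-up starkit with 100 distinct files, then a second version with one change.
v1 = {f"lib/file{i}.tcl": f"proc p{i} {{}} {{}}\n".encode() for i in range(100)}
v2 = dict(v1)
v2["lib/file0.tcl"] = b"proc p0 {} {return changed}\n"

first = submit(repo, "demo", "v1", v1)
second = submit(repo, "demo", "v2", v2)
print(first, second, len(repo["kits"]))  # prints: 100 1 2
```

The second submission adds exactly one data entry and one map, even though the starkit contains 100 files.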

Some more comments about this approach:

This work-in-progress Starchive is at https://www.equi4.com/starch/.