Starsync

Starsync is a technology to fetch and update starkits over the net. It sends only changed files, in compressed form.

As of February 2003, an implementation of this mechanism is embedded in the SDX "update" command. It lets you inspect, obtain, and update starkits from SDarchive. Here are some examples to illustrate it all:

Let's examine the "Fractal" Tk demo by Keith Vetter (the "-n" option only lists the differences, nothing is fetched):

	  $ sdx update -n fractal.kit
	   FRACTAL: looking up on http://mini.net/sync.cgi ...
	    3 differences:
	          69  main.tcl
	       35188  lib/app-fractal/fractal.tcl
	          76  lib/app-fractal/pkgIndex.tcl
	  $

That looks fine, so let's get the starkit:

	  $ sdx update fractal.kit
	   FRACTAL: fetching from http://mini.net/sync.cgi ...
	    File created.
	  $

Subsequent updates are done the same way:

	  $ sdx update fractal.kit
	   FRACTAL: updating from http://mini.net/sync.cgi ...
	    No change.
	  $

The implementation of this all, server and client, takes advantage of several aspects of the Metakit database:

changes are transacted, so aborted updates can always be resumed
simple and efficient client-side implementation and protocol
on-the-fly restructuring to make a table-of-contents
only changed files are fetched, in zlib-compressed form
server also stores starkits, no packing/unpacking whatsoever

The first Starsync server implementation is a simple CGI script (all of the server logic is also in SDX, see below). All client-server interactions are stateless and use a normal HTTP "POST" request, with a very compact Metakit-based "table of contents" exchange (typically less than 0.1% of the starkit size).
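To make the idea of a "table of contents" exchange a bit more concrete, here is a minimal sketch in plain shell. It is NOT the Starsync wire format (which is Metakit-based). It assumes a hypothetical text catalog with one "path, size, checksum" line per file, separated by tabs, and a hypothetical helper called changed_entries. It only shows how comparing two such catalogs yields the list of files that need to be fetched.

```shell
#!/bin/sh
# changed_entries CLIENT_TOC SERVER_TOC
# Hypothetical illustration, not Starsync's actual format: each TOC is a
# text file with one "path<TAB>size<TAB>checksum" line per file.
# Prints the server-side lines that are new or differ from the client.
changed_entries() {
    client_sorted=$(mktemp) || return 1
    server_sorted=$(mktemp) || { rm -f "$client_sorted"; return 1; }
    # comm needs both inputs sorted in the same collation order
    LC_ALL=C sort "$1" > "$client_sorted"
    LC_ALL=C sort "$2" > "$server_sorted"
    # -13 suppresses lines unique to the client and lines common to both,
    # leaving only lines that appear solely in the server catalog
    LC_ALL=C comm -13 "$client_sorted" "$server_sorted"
    rm -f "$client_sorted" "$server_sorted"
}
```

Files that exist only on the client are deliberately not reported in this sketch. A real implementation would also have to decide what to do with those.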

Because the protocol is a single request/reply transaction, no session state needs to be kept on the server. The logic for both clients and servers lives in a package called - what else? - "starsync" (it's part of SDX), in less than 300 lines of Tcl.

The new "sdx update" mechanism supports using other servers ("-from" option), i.e. synchronization is not limited to sdarchive. More info on this is being worked out right now, to make it simple to set up more starsync servers (mirrors, proprietary repositories, personal collections, etc). This will - of course - be deployed as a starkit. Everything is open source, all details of code and data are public.

For now, "sdx update" is intended as a "public preview" which hopefully will make it much simpler to take advantage of what is in sdarchive today. It lacks goodies such as a progress indicator, but it is fully operational.

Starsync is a new development - it is currently being refined for use in a commercial product, and is sponsored by Unified Technologies Corporation. I am *extremely* grateful for UTCorp's permission to spin off the general purpose technology behind Starsync, and to make it available to others as Open Source Software. This is a perfect example of how open source and proprietary commercial development can benefit from each other IMO.

Setting up a CGI-based Starsync server

SDX includes a basic implementation of a Starsync server. It serves starkits contained in a single directory on the server. This initial implementation is not particularly efficient (sync catalog comparison for a starkit such as kitten may take several seconds of CPU time), but for simple purposes it should nevertheless be quite workable.

What you need:

a directory with *.kit files to be made available
tclkit and sdx
a web server set up to launch CGI scripts

The information below assumes a Unix system with all necessary files in the directory "/my/starkits/", and a server URL of http://my.server.com/cgi-bin/sync:

create the "sync" CGI script as a shell script containing the following lines:

        #!/bin/sh
        cd /my/starkits/
        exec ./tclkit ./sdx.kit starsync starlog

make sure the shell script has the proper permissions:

        chmod a+rx sync

create an empty file to which log entries will be appended:

        >/my/starkits/starlog
        chmod a+rw /my/starkits/starlog

make sure /my/starkits is readable from your CGI script:

        chmod a+x /my/starkits /my/starkits/tclkit
        chmod a+r /my/starkits/*

note that the /my/starkits/ directory need not be accessible from the web

At this point, you should be able to test access to the CGI script from a web browser, i.e. http://my.server.com/cgi-bin/sync - try it. You should get back an empty page with no errors, and nothing should show up in the web server's error log. You should also see your access logged in the starlog file as "0 0 -" (since it was not a meaningful request).

That's it. You should now be able to fetch starkits, using the SDX "update" command, by specifying the path to your server:

        sdx update -from http://my.server.com/cgi-bin/sync oneofmystarkits.kit

Robustness and security

To publish an update, replace the starkit on the server and do an "sdx update" from the clients. If you add starkits to the server, they become available to clients for downloading. If you remove a starkit, clients will check the server, ignore the missing starkit, and continue as is (starkits are not deleted on the client side, clients simply cease to track updates). All accesses will be logged.

If you are concerned about race conditions while updating a very active server, do not overwrite a starkit in place. Instead, *move* the new version into place (i.e. rename) as the last step. The server logic is such that this will never cause a failed sync, even for sessions which are in progress. With this approach, clients always see either the old *or* the new version of a starkit, never anything in between.
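Here is a minimal sketch of such a move-in-place update, for a Unix server. The helper name publish_kit and the ".tmp.<pid>" suffix are made up for this example. The copy goes to a temporary name in the same directory as the final file, so that the last step is a rename within one filesystem. The temporary name does not end in ".kit", so it should not be visible through the CGI interface while it is being written.

```shell
#!/bin/sh
# publish_kit SRC DESTDIR
# Hypothetical helper: installs SRC into DESTDIR without ever exposing a
# partially written file under its final name.
publish_kit() {
    src=$1
    destdir=$2
    name=$(basename "$src")
    # Temporary file lives in DESTDIR itself, so the final mv is a plain
    # rename and not a copy across filesystems.
    tmp="$destdir/$name.tmp.$$"
    # Copy first; on failure, clean up and leave the old starkit untouched.
    cp "$src" "$tmp" || { rm -f "$tmp"; return 1; }
    # Make the new file readable for the CGI script before it goes live.
    chmod a+r "$tmp" || { rm -f "$tmp"; return 1; }
    # Last step: rename over the old version.
    mv -f "$tmp" "$destdir/$name"
}
```

With the directory layout used above, a call such as "publish_kit ./build/fractal.kit /my/starkits" would replace the served copy in one step (the ./build/ path is just a placeholder).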

The server never alters any starkits, nor does it access any files outside the current directory. For security reasons, only a very strict set of starkit names is recognized on the server:

the files have to have a suffix ".kit"
filenames must be lowercase and pass Tcl's "string is wordchar" test, i.e. alphabetics, numerics, and "_"
anything not matching the above is not visible or accessible through the CGI interface
the CGI interface does not let you browse, you have to know a starkit's name to fetch/update it
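If you want to check beforehand whether a file name will be visible through the server, the rules above can be approximated in shell. This is only a sketch: is_served_name is a hypothetical helper, not part of SDX. It accepts ASCII lowercase letters, digits, and "_" only, which is narrower than Tcl's "string is wordchar" test, and it assumes that a bare ".kit" with nothing in front is not a valid name.

```shell
#!/bin/sh
# is_served_name NAME
# Hypothetical helper: returns 0 if NAME looks like a starkit name the
# server would recognize, 1 otherwise. ASCII-only approximation.
is_served_name() {
    # Rule 1: the name must end in ".kit"
    case $1 in
        *.kit) base=${1%.kit} ;;
        *)     return 1 ;;
    esac
    # Assumption: an empty base name is rejected
    [ -n "$base" ] || return 1
    # Rule 2: only lowercase letters, digits, and "_" before the suffix.
    # The characters are listed explicitly to avoid locale-dependent ranges.
    case $base in
        *[!abcdefghijklmnopqrstuvwxyz0123456789_]*) return 1 ;;
    esac
    return 0
}
```

Note that this rejects anything containing "/" or ".", so path tricks such as "../other.kit" never pass the check.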

There is one exception to the read-only behavior of the server: log entries are appended to the optionally specified logfile ("starlog" in the example above).