Using Metakit as an embedded database is quite robust for a number of reasons:

That leaves the following possibilities for damaging datafiles:

The power-off damage can be prevented with a UPS (i.e. by avoiding uncontrolled shutdowns).

In-memory data corruption

The last case is the one where client-server databases have a clear advantage: if the database code runs in a separate process and all requests are properly checked for validity, then memory corruption in the calling code can never damage the database.

With Metakit, the calling code can write into memory managed by Metakit in a way that causes a subsequent commit to fail... slightly! The effects range from a single changed byte, so that a wrong value gets written, to memory being altered in such a way that the commit completes, but with a damaged data structure on disk.

A solution

The solution is to run the commit code in a separate process. One could call it a "half client/server" approach: clients can still access/read data as before, at maximum speed from their memory-mapped files in their own address space, but they must pass all modification requests and commits on to a backend process.

This offers a range of benefits beyond the memory protection guarantee:

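Full timing tests have not been performed, but basic socket-based I/O within a single machine is very fast: I see up to 6 Kreq/sec synchronous and 50 Kreq/sec asynchronous on a Core 2 Duo Mac. That is about 167 µs per synchronous round trip versus 20 µs per asynchronous request.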

Implementation

It turns out that this is all very easy to implement in pure Tcl: roughly 150 lines of code, evenly split between frontend and backend, is all it takes.

Here's how this is being used in this first experimental version:

The trick is to separate the different mk::* commands into different categories:
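As an illustration only, one plausible grouping is sketched below: read-only commands, which the frontend can run locally against its own read-only datafile; change commands, which are forwarded asynchronously to the backend; and commit/rollback, which need a synchronous round trip. The proc name `mkcategory` and the assignment of each `mk::*` subcommand to a category are assumptions, not the categories used in the actual code.

```tcl
# Hypothetical router: classify a Metakit command (without its "mk::"
# prefix) as "read", "change", "commit" or "other". The grouping below is
# an illustrative assumption, not the categories used by the actual code.
proc mkcategory {cmd args} {
    set sub [lindex $args 0]
    switch -- $cmd {
        get - select - loop - cursor {
            return read       ;# can run locally against the read-only file
        }
        set - row {
            return change     ;# forward asynchronously to the backend
        }
        view {
            # "mk::view size path" only queries, "mk::view size path n"
            # resizes; the same applies to "layout".
            if {$sub in {size layout}} {
                if {[llength $args] > 2} { return change }
                return read
            }
            if {$sub eq "info"} { return read }
            if {$sub eq "delete"} { return change }
            return other
        }
        file {
            if {$sub in {commit rollback}} { return commit }
            if {$sub eq "views"} { return read }
            return other
        }
        default {
            return other
        }
    }
}

# Example: show how a few commands would be routed.
foreach command {
    {get db.people!0 name}
    {row append db.people name Joe}
    {view size db.people}
    {view size db.people 0}
    {file commit db}
} {
    puts [format "%-34s -> %s" mk::[join $command] [mkcategory {*}$command]]
}
```

Anything classified as "other" (such as `mk::file open`) would need a deliberate decision about which side runs it.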

It all depends a bit on whether the frontend will use Metakit or Ratcl calls.

Using Metakit

With Metakit in both processes, the basic idea is to open the same datafile in both: read-only in the frontend, read-write in the backend.

When making changes, you basically apply the same mk::* commands to both. The nice detail is that all change commands sent to the backend can be sent asynchronously, which is substantially faster.

On commit, the following must be done:

While I have not yet explored this scenario, it looks like it should be simple.

Using Ratcl

With Ratcl, a similar approach is taken, but a bit more work is needed since changes cannot be made in the same way. There are two choices here:

For now, I've only tried the first approach: making changes by "sending" them to the backend, and assuming I'll see their effects after the next commit.

This is trivial to do. I've defined a few helper procs for it:

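    proc db {args} {
        global Db Datafile
        # open the datafile on first use, and again after each commitdb
        if {![info exists Db]} {
            set Db [view $Datafile open]
        }
        # evaluate at the caller's level, as if "view" had been called there
        uplevel [list view $Db get 0 {*}$args]
    }

    proc mkdo {cmd args} {
        # asynchronous: pass the change on to the backend, no result
        backer send mk::$cmd {*}$args
    }

    proc commitdb {} {
        global Db
        # forget the current view, so that "db" re-opens the datafile
        unset -nocomplain Db
        # synchronous: wait until the backend has committed
        backer call mk::file commit db
    }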

The db proc is a convenience command for Ratcl: it opens (and re-opens!) the datafile whenever it needs to. For example, a datafile with people's names would be accessed as follows in "plain" Ratcl:

    set Datafile myfile.db
    view $Datafile open | get 0 people | loop { puts "name: $(name)" }

With the "db" utility definition, this becomes:

    db people | loop { puts "name: $(name)" }

The mkdo proc is another convenience. It stands in for an ordinary mk::* command, so for example this:

    mk::row append db.people name Joe

becomes this when using the backend:

    mkdo row append db.people name Joe

Note that mkdo calls cannot return a value, since they are asynchronous. If you need a return value, use a synchronous call instead:

    set row [backer call mk::row append db.people name Joe]

It's worth avoiding this where possible, since a synchronous call has to wait for the backend's reply and so prevents some parallelism.
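To make the cost of a synchronous call concrete, here is a hypothetical, in-process stand-in for `backer` (the real one talks to the backend over a socket, and its internals are not shown in this note). It only models the ordering: `send` queues a request and returns nothing, while `call` cannot produce its result until everything sent before it has been processed. The commands `addrow` and `rowcount` are made-up stand-ins for `mk::row append` and a size query.

```tcl
# Hypothetical in-process stand-in for "backer", for illustration only: it
# models the ordering of requests, not the real socket-based transport.
set BackerQueue {}

proc backer {mode args} {
    global BackerQueue
    switch -- $mode {
        send {
            # asynchronous: remember the request, return nothing
            lappend BackerQueue $args
            return
        }
        call {
            # synchronous: everything sent earlier must be processed first
            foreach request $BackerQueue {
                uplevel #0 $request
            }
            set BackerQueue {}
            # ...and only then can this request's result come back
            return [uplevel #0 $args]
        }
        default {
            error "unknown mode \"$mode\": must be send or call"
        }
    }
}

# Hypothetical commands standing in for mk::row append and a size query.
set Rows {}
proc addrow {name} { global Rows; lappend Rows $name }
proc rowcount {} { global Rows; llength $Rows }

backer send addrow Joe
backer send addrow Ann
puts [backer call rowcount]   ;# prints 2: both sends were applied first
```

In the real setup the backend works through the sends while the frontend keeps running, so waiting only happens at a `call`; fewer calls means more overlap between the two processes.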

And finally, when it's time to commit the changes:

    commitdb

That's it. The side-effect of commitdb is that it undefines the Db global variable storing the database view for Ratcl. So on next use, "db" will re-open the datafile and automatically pick up all the changes (opening a datafile is very fast in Ratcl).
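The lazy re-open behind db and commitdb can be modeled without Ratcl or a backend. In this simplified sketch, `openview` and `commitbackend` are hypothetical stand-ins for `view $Datafile open` and `backer call mk::file commit db`, and `db` takes no arguments. It only shows that repeated uses share one open view, and that each commit costs exactly one re-open on the next use.

```tcl
# Simplified model of the db / commitdb pattern. "openview" and
# "commitbackend" are hypothetical stand-ins for Ratcl's "view ... open"
# and for "backer call mk::file commit db".
set OpenCount 0

proc openview {file} {
    global OpenCount
    incr OpenCount
    return "view-of-$file-#$OpenCount"   ;# pretend view handle
}

proc commitbackend {} {
    # the real code waits here until the backend has committed
    return
}

proc db {} {
    global Db Datafile
    if {![info exists Db]} {
        set Db [openview $Datafile]      ;# (re-)open only when needed
    }
    return $Db
}

proc commitdb {} {
    global Db
    unset -nocomplain Db     ;# forget the view first...
    commitbackend            ;# ...then let the backend commit
}

set Datafile myfile.db
db; db                       ;# two uses, a single open
commitdb
db                           ;# re-opened: would now see the committed data
puts "opens: $OpenCount"
```

Unsetting Db before the commit call also means that, should the call fail, the next use still re-opens the datafile instead of reusing a stale view.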

The Ratcl-front / Metakit-back approach looks very promising so far. It not only separates functionality, it actually makes it unnecessary to have Metakit in the frontend. I'm currently using an 8.5 Tclkit Lite build for the frontend and a "classical" 8.4 Tclkit for the backend, and so far it's going nicely.

The backend

The backend process needs to be managed. The current code makes this quite easy for one very specific scenario, running one backend per frontend: it creates the backend process whenever the frontend starts, and stops it when the frontend exits. All combinations are handled, including a crash on either side, and the frontend contains logic to transparently restart its backend if ever needed.

The backend server socket is bound to the loopback interface by default, so that access from other machines is prevented. Because that still leaves access open to other processes on the same machine, and since the backend has read-write access to datafiles, a simple passkey check is enforced when clients connect.
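A minimal sketch of the loopback binding plus passkey check in core Tcl follows. The one-line-per-request protocol, the `ok` handshake reply, the proc names and the way the key is generated are all assumptions for illustration, not the actual backend code. Client and server share one interpreter only to keep the example self-contained, with `vwait` driving the event loop for both sides.

```tcl
# Sketch of a loopback-only server with a passkey handshake. The protocol
# (one line per request, "ok" after a valid key) and all proc names are
# assumptions for this illustration, not the actual backend code.
set Passkey [expr {int(rand() * 1000000000)}]   ;# demo key, not crypto-grade

proc accept {sock addr port} {
    fconfigure $sock -blocking 0 -buffering line
    fileevent $sock readable [list handshake $sock]
}

proc handshake {sock} {
    if {[gets $sock line] < 0} {
        if {[eof $sock]} { close $sock }
        return                      ;# incomplete line: wait for more
    }
    if {$line ne $::Passkey} {
        close $sock                 ;# wrong key: drop the connection
        return
    }
    puts $sock ok
    fileevent $sock readable [list serve $sock]
}

proc serve {sock} {
    if {[gets $sock line] < 0} {
        if {[eof $sock]} { close $sock }
        return
    }
    # A real backend would dispatch the mk::* request here; this one echoes.
    puts $sock [list echo $line]
}

# Port 0 lets the OS pick a free port; -myaddr keeps it loopback-only.
set srv [socket -server accept -myaddr 127.0.0.1 0]
set port [lindex [fconfigure $srv -sockname] 2]

# Client side, in the same interpreter so the sketch is self-contained.
proc readreply {sock} {
    if {[catch {gets $sock line} n] || ($n < 0 && [eof $sock])} {
        fileevent $sock readable {}
        set ::Reply <closed>
    } elseif {$n >= 0} {
        set ::Reply $line
    }
}

proc ask {sock line} {
    puts $sock $line
    set timer [after 5000 {set ::Reply <timeout>}]
    vwait ::Reply                   ;# runs the event loop for both sides
    after cancel $timer
    return $::Reply
}

proc connect {key} {
    set sock [socket 127.0.0.1 $::port]
    fconfigure $sock -blocking 0 -buffering line
    fileevent $sock readable [list readreply $sock]
    return [list $sock [ask $sock $key]]
}

lassign [connect $Passkey] good status
set echoed [ask $good hello]
lassign [connect wrong-key] bad rejected
puts "correct key: $status, reply: $echoed"
puts "wrong key:   $rejected"
foreach chan [list $good $bad $srv] { close $chan }
```

Because the server only listens on 127.0.0.1, connections from other hosts never reach it, so the passkey only has to keep out other processes on the same machine.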

Still some rough edges to work out, but not bad for a couple of hours' work!

-jcw, 2007-06-17



2007-06-17

Created

2007-06-17 jcw

(Changed: area more)

2007-06-17 jcw

(Changed: desc)

2007-06-23

(Changed: stat desc)


TN03 - Technical Notes
