The State of Metakit

The Metakit package is a very exciting product. Yeah, well, I mean that... :o)

Things that have worked out well include:

Performance has been going up in big jumps over the various releases
On-the-fly restructuring is working out nicely, and is transacted like everything else
The headers are small, and the internals are very nicely tucked away (as a result, very substantial changes were feasible without affecting the API that much)
Things like new datatypes and columns-with-a-gap optimizations were added with full backward compatibility
Datafiles are compatible: the latest release still reads 1.0 files just fine
Portability has turned out to be excellent, from 16- to 64-bit platforms
Support for memory mapped files was added later on, though it now plays a central role
The class hierarchy is very flat, and there are very few virtual members
Modularity is good: apps which do not call all functions will be considerably smaller than those that do
The quality seems to work out ok, very few bugs tend to come up nowadays
The "strategy" class is extremely flexible for non-standard I/O contexts
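
To illustrate the columns-with-a-gap idea mentioned above, here is a minimal sketch of a gap buffer in C++. It is not Metakit's actual implementation: the GapColumn class and all of its member names are hypothetical, and it stores plain ints only. The point is just that keeping the unused space at the position of the last edit makes repeated inserts and deletes near the same row cheap.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration only, not a Metakit class.
// Elements live in buf_[0, gapStart_) and buf_[gapEnd_, buf_.size());
// the slots in between are the free "gap".
class GapColumn {
public:
    explicit GapColumn(std::size_t capacity = 8)
        : buf_(capacity > 0 ? capacity : 1), gapStart_(0), gapEnd_(buf_.size()) {}

    std::size_t size() const { return buf_.size() - (gapEnd_ - gapStart_); }

    int get(std::size_t i) const {
        assert(i < size());
        return i < gapStart_ ? buf_[i] : buf_[i + (gapEnd_ - gapStart_)];
    }

    void insert(std::size_t pos, int value) {
        assert(pos <= size());
        if (gapStart_ == gapEnd_) grow();   // no free slot left
        moveGap(pos);
        buf_[gapStart_++] = value;          // fill the first slot of the gap
    }

    void remove(std::size_t pos) {
        assert(pos < size());
        moveGap(pos);
        ++gapEnd_;                          // the element right after the gap joins it
    }

private:
    // Slide the gap so that it starts at logical position pos.
    void moveGap(std::size_t pos) {
        while (gapStart_ > pos) buf_[--gapEnd_] = buf_[--gapStart_];
        while (gapStart_ < pos) buf_[gapStart_++] = buf_[gapEnd_++];
    }

    // Double the buffer; called only when the gap is empty.
    void grow() {
        const std::size_t oldSize = buf_.size();
        const std::size_t tail = oldSize - gapEnd_;
        buf_.resize(oldSize * 2);
        for (std::size_t i = 0; i < tail; ++i)   // copy the tail backwards to the new end
            buf_[buf_.size() - 1 - i] = buf_[oldSize - 1 - i];
        gapEnd_ = buf_.size() - tail;
    }

    std::vector<int> buf_;
    std::size_t gapStart_;
    std::size_t gapEnd_;
};
```

An insert at the current gap position moves no data at all; only an edit far away from the previous one pays for sliding the gap.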

There are also some things which have not been addressed, or are still weak:

There is no multi-user support, other than many-readers-no-writer
Threading support is very basic: one thread per open storage object
Though performance is amazing in some cases, it doesn't really scale well enough beyond say 100 MB datafiles
On platforms without MMF, even that is too optimistic; a "few dozen MB" on the Mac is probably a more reasonable limit
Likewise, performance degrades with say 100,000 rows, or 10,000 subviews, or even well before that in the case of string fields
Datafile opening performance is not optimal (proportional to file complexity), and string fields have to be scanned
Commits should use a finer granularity so that small changes imply quick commits; they currently do too much too soon (free space management could be delayed)
The file format is good once open, but could be improved to support more refined "staged" opening (this would dramatically speed up file opens)
Inserts/deletes in views with >50 properties are not optimal (there's a fixed parameter set to 50, it should be adaptive)
There is a good opportunity to introduce B-trees transparently
Need better (read: more fundamental) support for compression and encryption
Large strings are still copied more often than needed (i.e. those straddling a 4k boundary)
Memo fields need an API to read/write portions, and could easily be extended to also allow inserts/deletes of data bytes in any position
Data should be aligned on file, so MMF works better
Remove two known limitations of 32-bit file addressing
Expand adaptive integer sizing to 1..64 bits, instead of just 1..32 bits
Better support for shrinking files, also add explicit reorganization calls
Add locking (as implemented experimentally in Mk4tcl)
Add hierarchical/heterogeneous data storage (see experimental e4Tree module)
The Mk4tcl offset-data trick should be incorporated into Metakit
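
As a rough illustration of what adaptive integer sizing up to 64 bits could look like, the sketch below picks the narrowest width that holds every value in a column. It rests on assumptions not stated above: that widths are restricted to the powers of two 1, 2, 4, 8, 16, 32 and 64, and that values are unsigned. The function names bitsNeeded, columnWidth and packedBytes are made up for this example and are not part of the Metakit API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Smallest width from {1, 2, 4, 8, 16, 32, 64} that can hold v.
// The power-of-two set of widths is an assumption of this sketch.
inline int bitsNeeded(std::uint64_t v) {
    int w = 1;
    while (w < 64 && (v >> w) != 0)   // w < 64 here, so the shift is well defined
        w *= 2;
    return w;
}

// Width needed for a whole column: the widest value decides.
// An empty column gets the minimum width of 1 bit.
inline int columnWidth(const std::vector<std::uint64_t>& values) {
    int w = 1;
    for (std::uint64_t v : values) {
        const int need = bitsNeeded(v);
        if (need > w) w = need;
    }
    return w;
}

// Bytes needed to store `rows` values packed at `width` bits each.
inline std::size_t packedBytes(std::size_t rows, int width) {
    assert(width >= 1 && width <= 64);
    return (rows * static_cast<std::size_t>(width) + 7) / 8;
}
```

With these definitions a column of 1000 flags (values 0 or 1) packs into 125 bytes, while a single value of 2^32 or more pushes the whole column to 64 bits per row.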

Last but not least, there are some very fundamental issues which need to be addressed:

The "attached" vs. "unattached" dichotomy must be resolved and removed
Cache coherence with multi-user access must be solved (propagate diffs)
Change propagation for sorting is flawed and needs to be fixed to handle all cases (this requires a fundamental rewrite of the notification mechanism)
Change propagation for the newer view operators must be implemented
A few more operations are needed to offer full relational functionality

A lot of the above - as well as some brand new functionality - will be addressed in the next generation Metakit software. I have quite a few answers to many (but definitely not all!) of the above issues.