Ratcl views are values. When you open an existing Metakit datafile, this will quickly become apparent - especially in interactive mode. The following line generates a result which appears to contain a complete copy of the datafile:
view myfile.db open
Unfortunately the result is not readable text, and it can become quite large. In an interactive tclsh or tkcon session, opening a large file will appear to "hang" (at best), or it may even crash as Tcl struggles to represent the huge value as a string.
What's going on here?
First of all, keep in mind that a view acts like a value. When you sort a large view, you operate on a large input value and you generate a large result. As strings, these values are usually meaningless, and mighty inconvenient.
Internally, something very different is happening. Data is maintained as an efficient (compact and performant) column-wise nested structure. When the data comes from an on-disk dataset, it isn't even in memory unless needed: the file is memory-mapped, so its data gets paged in on demand, using hardware-assisted virtual memory paging and page faults.
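To make "column-wise" a little more concrete, here is a small plain-Tcl sketch. It is only an illustration: the procs "demo::toColumns" and "demo::cell" are hypothetical, and Ratcl's real representation lives in native code backed by memory-mapped storage, none of which is modelled here. It shows the same three rows stored row by row and then column by column.

```tcl
namespace eval demo {}

# Row-wise data: one dict per row (hypothetical sample data).
set rows {
    {name Alice age 31}
    {name Bob age 17}
    {name Carol age 45}
}

# Convert row-wise data into a dict mapping column name -> list of values.
proc demo::toColumns {rows} {
    set cols [dict create]
    foreach row $rows {
        dict for {col val} $row {
            dict lappend cols $col $val
        }
    }
    return $cols
}

# Read one cell: index into a single column's list.
proc demo::cell {cols row col} {
    return [lindex [dict get $cols $col] $row]
}

set cols [demo::toColumns $rows]
set ages [dict get $cols age]        ;# the whole "age" column in one list
set who  [demo::cell $cols 1 name]   ;# Bob
```

Scanning a single column (as sorting or selecting on that column would) only touches that column's list, which is one reason a column-wise layout tends to be compact and fast.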
None of this matters in normal use because it all happens under the hood. In a way, you could say that Ratcl's views can be represented as strings, but they rarely should be because such a conversion undoes the benefits of the internal representation.
Conforming to Tcl's "everything is a string" mantra might at times seem useful. For example, you can take a view and send it across to another machine as a string. This works, but it's inefficient. The best way to handle such cases is to use Ratcl's "save" and "load" operators, which transfer exactly the same data as a compact (binary) string. For simple cases, either way is fine, of course.
One problem is Tcl's interactive mode, be it tclsh or tkcon. Both assume that commands return something which can be meaningfully displayed as a string. Alas, with Ratcl, that just doesn't work well.
The solution
One solution is to make sure that the most common interactive commands don't return a view. The two main ones are:
view ... | to myvar
view ... | as mycmd
These were recently changed to return variable and command names, respectively.
Well... not quite.
When "to myvar" is used, it returns "@myvar", which brings us to another topic.
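The idea behind returning "@myvar" can be sketched in plain Tcl. The "demo::to" proc below is a hypothetical stand-in, not Ratcl's implementation: it stores a value in a global variable and returns only a short reference, so an interactive shell echoes a few characters instead of a huge value.

```tcl
namespace eval demo {}

# Hypothetical sketch: store the (possibly huge) value in a global
# variable and return only "@name", never the value itself.
proc demo::to {value varName} {
    upvar #0 $varName slot
    set slot $value
    return @$varName
}

# Simulate a large value; an interactive shell would only echo "@big".
set ref [demo::to [string repeat x 1000000] big]
```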
References
In recent versions of Vlerq and Ratcl, "@myvar" can be used wherever a view is expected. This is a reference to the variable myvar. When view operators need to do some work (which can be much later than their initial definition), they look up the variable and use its value instead.
So @myvar is similar to Tcl's $myvar, with one crucial difference: dereferencing takes place when the value is needed, not at the call site as with $myvar.
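The difference between $myvar and @myvar can be mimicked without Ratcl. In this hypothetical sketch, a "recipe" is just a list such as {sort @data}, and "demo::resolve" looks up an "@name" argument only when "demo::run" actually does the work. These procs are illustrative assumptions, not Ratcl's API.

```tcl
namespace eval demo {}

# Resolve an input: "@name" means "read global variable name now",
# anything else is treated as the value itself.
proc demo::resolve {input} {
    if {[string index $input 0] eq "@"} {
        upvar #0 [string range $input 1 end] target
        return $target
    }
    return $input
}

# Run a two-element recipe such as {sort @data}.
proc demo::run {recipe} {
    lassign $recipe op input
    switch -- $op {
        sort    { return [lsort [demo::resolve $input]] }
        default { error "unknown operation: $op" }
    }
}

set data {pear apple}
set early [list sort $data]  ;# like $data: the value is copied now
set late  {sort @data}       ;# like @data: only the name is kept
set data {fig banana}        ;# change the variable afterwards

set r1 [demo::run $early]    ;# sorts the old value
set r2 [demo::run $late]     ;# looks up the current value
```

Because "early" captured the value of $data at the call site, it still sorts the old list; "late" only holds the name, so it picks up whatever data contains when it runs.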
For interactive use, references solve the problem described above. By using a reference to the main view instead of the view itself, views derived from this value no longer explode to huge strings. The string representation will now show "@..." instead - a perfectly readable and compact notation.
Here's an example, first without and then with a reference:
view myfile.db open | to v
puts [view $v sort]
puts [view @v sort]
If myfile.db is huge, then the first puts will "explode", whereas the second one simply prints the string:
sort @v
Yet both produce the same results when displaying the resulting view:
puts [view $v sort | dump]
puts [view @v sort | dump]
So the best way to avoid converting views to strings is to 1) open your datafile, 2) put the result in a variable, and 3) use a reference to that variable whenever you derive views from then on.
view myfile.db open | to v
view @v ...
If you prefer to hide that mechanism so the rest of your code remains the same, you could add an extra layer:
set v [view myfile.db open | to v']
view $v ...
Note that $v still gets substituted every time it is used, but this simply yields "@v'". The apostrophe has no special meaning; it's just a remnant of my maths education :) It does have one benefit: the variable holding the full view is less convenient to access, since you'd have to type ${v'} instead of $v.
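The ${v'} remark follows from Tcl's own substitution rules, which this plain-Tcl snippet shows (no Ratcl needed; the string "FULL VIEW" is merely a placeholder for a real view). A variable name after $ stops at the first character that isn't a letter, digit, underscore or namespace separator, so $v' means "the value of v, followed by an apostrophe".

```tcl
set v' "FULL VIEW"   ;# placeholder for the large view stored in v'
set v  "@v'"         ;# the short reference that "to v'" returns

set plain $v         ;# only the reference: @v'
set odd   $v'        ;# substitutes $v, then appends a literal apostrophe
set full  ${v'}      ;# braces are needed to reach the variable v'
```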
Convenience
There is another step to take so that working with Metakit datafiles becomes convenient from the interactive Tcl prompt. This is related to the fact that Metakit datafiles always consist of a single row, with all the "main" views being subviews in that row.
To access view "people" in file "myfile.db", you would normally do:
view myfile.db open | get 0 people | to ...
This can get a bit tedious with multiple views in the datafile. Here's another way, which defines a single "db" command that acts as a convenient shorthand:
view myfile.db open | to ::db
interp alias {} db {} view @::db get 0
With this, you can access the "people" view as follows:
db people | ...
It also lets you quickly access a single value inside any view:
db people 123 name
In Metakit notation, this would be:
mk::get db.people!123 name
As you can see, the arguments passed to "db" are handed on to the "get" operator, so all the get variations are available: "db people #", etc.
Since the "db" command uses "@::db", you also get the benefit of references. The database view is stored in a global variable, so it is available at any time.
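To see how the alias forwards its arguments, here is a self-contained mock. Everything in it is hypothetical: nested lists and dicts stand in for a Metakit datafile, and "demo::get" only understands alternating row numbers and column names, whereas Ratcl's real get operator supports more variations (such as "#").

```tcl
namespace eval demo {}

# Walk a path of alternating row numbers and column names, starting
# from the view referenced by "@name" (looked up when called).
proc demo::get {ref args} {
    upvar #0 [string range $ref 1 end] current
    set value $current
    foreach {row col} $args {
        set value [lindex $value $row]     ;# select a row (a dict)
        if {$col eq ""} break              ;# odd-length path ends on a row
        set value [dict get $value $col]   ;# select a column value
    }
    return $value
}

# Mock datafile: one row whose "people" column is a subview.
set ::db [list [dict create people [list \
    [dict create name Alice age 31] \
    [dict create name Bob age 17]]]]

# Same shape as the alias above, but pointing at the mock.
interp alias {} db {} demo::get @::db 0

set people [db people]          ;# the whole "people" subview
set who    [db people 1 name]   ;# a single value
```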
Note that Metakit files always have the extra single-row structure, whereas files saved with Ratcl no longer have this limitation. If you need to create datafiles from Ratcl which are still compatible with Metakit, some extra work is needed. For example, say you want to save views "one" and "two":
view $one group {} one | to v1
view $two group {} two | to v2
view $v1 pair $v2 | save mynewfile.db
The explanation for this trickery is that the "group" operator is used with no common columns, only a name for the new subview. Since there are no columns to group on, all rows fall into the same group, so group creates a new view with a single row that holds the original view as a subview. The subview is named after the last argument passed to group ("one" and "two" in this example). The pair operator then joins the two views side by side, each consisting of exactly one row, so the result is a single row with both subviews, just like a Metakit datafile.
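The effect of the group/pair trick on the data's shape can be mocked up in plain Tcl. This is a sketch under simple assumptions: a view is a list of rows, each row is a dict, and the hypothetical "demo::wrap" and "demo::pair" procs only mimic what "group {} name" and "pair" do here; they are not Ratcl's operators and write nothing to disk.

```tcl
namespace eval demo {}

# Like "group {} name": with no key columns every row falls into one
# group, so the result is a single row holding the whole input view.
proc demo::wrap {view name} {
    return [list [dict create $name $view]]
}

# Like "pair": join two views side by side, row by row.
# Assumes both views have the same number of rows.
proc demo::pair {left right} {
    set out {}
    foreach l $left r $right {
        lappend out [dict merge $l $r]
    }
    return $out
}

set one {{a 1} {a 2}}   ;# a two-row view with column "a"
set two {{b x}}         ;# a one-row view with column "b"

set v1 [demo::wrap $one one]
set v2 [demo::wrap $two two]
set result [demo::pair $v1 $v2]
```

The result has exactly one row with two subview columns, "one" and "two", which is the single-row layout Metakit expects.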
Recipes
With the above, it should now be clear how views are not just data but also recipes. If the base views are accessed through references, then the string representation of a derived view becomes considerably more useful.
Such views are compact recipes, which can be saved and reused at will. All you need to do is make sure that all the references inside are valid before use, i.e. that all the variables have been set up with the proper views or values.
You may even be tempted to turn the whole system on its head and define the recipes before loading or defining any of the input data views:
set v [view @::x1 where { $(age) > 18 }]
view {name age:I} def {...} | to ::x1
puts [view $v dump]
Warning: this won't work in all cases yet. The problem is that not all view operators are lazy right now, and those which aren't will break the system, as they immediately try to evaluate their not-yet-defined input view.
(Not to mention dataflow, where changes to v propagate to all derived views)
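Why eager operators break this can be sketched in plain Tcl as well. Assume (hypothetically) that a recipe is a list like {where @::x1 18} and that "demo::force" evaluates it: a lazy operator only builds the list, so ::x1 may be defined later, whereas an eager one resolves its input at definition time and fails if the variable doesn't exist yet. The age filter mirrors the where example above; the procs are illustrative, not Ratcl's.

```tcl
namespace eval demo {}

# Lazy: just record the recipe; nothing is looked up yet.
proc demo::lazyWhere {ref minAge} {
    return [list where $ref $minAge]
}

# Evaluate a recipe: resolve "@name" now and keep rows with age > minAge.
proc demo::force {recipe} {
    lassign $recipe op ref minAge
    upvar #0 [string range $ref 1 end] rows
    set out {}
    foreach row $rows {
        if {[dict get $row age] > $minAge} { lappend out $row }
    }
    return $out
}

# Eager: evaluates its input immediately, at definition time.
proc demo::eagerWhere {ref minAge} {
    return [demo::force [list where $ref $minAge]]
}

set v [demo::lazyWhere @::x1 18]   ;# fine: ::x1 does not exist yet
set ::x1 {{name Alice age 31} {name Bob age 17}}
set adults [demo::force $v]        ;# the lookup happens here

# An eager operator on a not-yet-defined input fails right away.
set failed [catch {demo::eagerWhere @::x2 18} msg]
```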
Note: The above examples have not yet been verified in actual use.
Created: 2007-06-14
Changed: 2007-06-14 (stat)
