[Metakit] Hash view questions
Jean-Claude Wippler
jcw at equi4.com
Wed Feb 1 11:17:27 CET 2006
gary.h.merrill at GSK.COM wrote:
> 'The mapview must be empty the first time this hash view is used,
> so that Metakit can fill it based on whatever rows are already
> present in the underlying view.'
>
> This seems to imply that you can't "impose" a hash view on an
> existing view. But I seem to be able to do this. That is, in an
> existing storage I get an existing view (either via .view()
> or .getas()) and then create a hash view on it. This does work,
> doesn't it?
Yes. There are two views involved: the data, and a hash index. The
data can contain anything. The index must *either* be empty *or* up
to date when setting up a hash layer on top using data.hash(index).
Note that either view can be blocked, and either view can be
persistent. It's all orthogonal.
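To make the "either empty or up to date" rule concrete, here is a toy model in plain Python. It is only a sketch of the idea and not how Metakit implements hash views: `ToyHashLayer`, its `find` and `append` methods, and the list and dict stand-ins for the data and index views are all made up for illustration.

```python
# Toy model only: plain Python stand-ins, not Metakit's implementation.
class ToyHashLayer:
    """Keyed lookup over `data` rows using a separate `index` mapping."""

    def __init__(self, data, index, nkeys=1):
        self.data = data      # list of row tuples (stands in for the data view)
        self.index = index    # dict: key tuple -> row number (the index view)
        self.nkeys = nkeys    # the first nkeys columns form the key
        if not self.index:
            # Empty index: fill it from whatever rows already exist.
            for rownum, row in enumerate(self.data):
                self.index[tuple(row[:nkeys])] = rownum
        # Otherwise the index is trusted as-is, so it must be up to date.

    def find(self, *key):
        """Return the row number for key, or -1 if it is not present."""
        return self.index.get(tuple(key), -1)

    def append(self, row):
        """Add a row and record it in the index at the same time."""
        self.index[tuple(row[:self.nkeys])] = len(self.data)
        self.data.append(tuple(row))


rows = [("a", 1), ("b", 2)]               # data that already exists
layer = ToyHashLayer(rows, {})            # empty index gets filled on setup
layer.append(("c", 3))                    # updates go through the layer
layer2 = ToyHashLayer(rows, layer.index)  # an up-to-date index is reused as-is
```

In this model, an index that is neither empty nor current would silently give wrong lookups, which is why one of the two conditions has to hold when the layer is set up.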
> Also, use of the hash view isn't "permanent" -- in the sense that
> the underlying view retains its own integrity (so long as you don't
> try to access it directly while the hash view is "in effect")?
> That is, it *seems* to work nicely to create a view, define a hash
> view on it, populate the view, drop the hash view, and then commit()
> so that only the underlying view goes into the storage. This
> behavior also seems to imply that you should be able to lay a hash
> view on top of an existing view, access and update via the hash
> view, then drop it and commit only the underlying view. Have I
> missed anything here?
Correct. No need to explicitly "drop" anything. If the data view is
part of a storage, it gets committed. If the index view is part of a
storage, it gets committed - if not, then presumably you'll create a
new empty index view before calling hash() the next time around.
> One final question: Is there a way to create a hash view to hash
> on a particular property? I seem to frequently want, say, a two-
> column table where I can create one hash view based on the first
> column and another based on the second.
This is a design limitation that never got resolved properly. You
*might* be able to do the following:
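    a = db.getas("a[X,Y,Z]")
    # reorder so that Y comes first, then hash on that first column
    a1 = a.project(a.Y,a.X,a.Z)
    a2 = a1.hash(metakit.view(),1)
    # restore the original column order, then hash on X
    a3 = a2.project(a.X,a.Y,a.Z)
    a = a3.hash(metakit.view(),1)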
I am not sure the above is correct, nor have I ever checked whether
the above actually does offer hash O(1) search performance on both X
and Y. Might be worth fiddling a bit with this. If it doesn't work,
you could set up a second hash mapping Y to the row number in "a",
but that is obviously a bit more manual work, and perhaps not that
much more convenient than setting up a Python dict to handle
secondary hashes.
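For comparison, the Python dict fallback could look roughly like the sketch below. It uses a plain list of tuples in place of the Metakit view "a", and the sample rows and the helper names `find_by_y` and `append_row` are invented for the example. It also assumes Y values are unique.

```python
# Hypothetical rows standing in for view "a" with columns X, Y, Z.
rows = [
    ("x1", "y1", 10),
    ("x2", "y2", 20),
    ("x3", "y3", 30),
]

# Secondary index: Y value -> row number in `rows`.
by_y = {y: rownum for rownum, (x, y, z) in enumerate(rows)}


def find_by_y(y):
    """Return the row whose Y column equals y, or None if absent."""
    rownum = by_y.get(y)
    return None if rownum is None else rows[rownum]


def append_row(x, y, z):
    """Append a row and keep the secondary index in step with it."""
    by_y[y] = len(rows)
    rows.append((x, y, z))


append_row("x4", "y4", 40)
```

The manual work mentioned above shows up in `append_row`: every change to the rows has to update the dict as well, and deleting or inserting rows in the middle would shift row numbers and call for a rebuild of `by_y`.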
As I said, such multi-index issues (and hashes on other columns than
the first N) haven't been properly extended into today's MK, alas.
-jcw