[Metakit] Hash view questions

Jean-Claude Wippler jcw at equi4.com
Wed Feb 1 11:17:27 CET 2006


gary.h.merrill at GSK.COM wrote:

> 'The mapview must be empty the first time this hash view is used,  
> so that Metakit can fill it based on whatever rows are already  
> present in the underlying view.'
>
> This seems to imply that you can't "impose" a hash view on an  
> existing view.  But I seem to be able to do this.  That is, in an  
> existing storage I get an existing view (either via .view()  
> or .getas()) and then create a hash view on it.  This does work,  
> doesn't it?

Yes.  There are two views involved: the data, and a hash index.  The  
data can contain anything.  The index must *either* be empty *or* up  
to date when setting up a hash layer on top using data.hash(index).

Note that either view can be blocked, and either view can be  
persistent.  It's all orthogonal.

> Also, use of the hash view isn't "permanent" -- in the sense that  
> the underlying view retains its own integrity (so long as you don't  
> try to access it directly while the hash view is "in effect")?   
> That is, it *seems* to work nicely to create a view, define a hash  
> view on it, populate the view, drop the hash view, and then commit()  
> so that only the underlying view goes into the storage.  This  
> behavior also seems to imply that you should be able to lay a hash  
> view on top of an existing view, access and update via the hash  
> view, then drop it and commit only the underlying view.  Have I  
> missed anything here?

Correct.  No need to explicitly "drop" anything.  If the data view is  
part of a storage, it gets committed.  If the index view is part of a  
storage, it gets committed - if not, then presumably you'll create a  
new empty index view before calling hash() the next time around.

> One final question:  Is there a way to create a hash view to hash  
> on a particular property?  I seem to frequently want, say, a  
> two-column table where I can create one hash view based on the first  
> column and another based on the second.

This is a design limitation that never got resolved properly.  You  
*might* be able to do the following:
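	a = db.getas("a[X,Y,Z]")
	a1 = a.project(a.Y,a.X,a.Z)      # reorder so Y is the first column
	a2 = a1.hash(metakit.view(),1)   # hash on the first column (Y)
	a3 = a2.project(a.X,a.Y,a.Z)     # restore the original column order
	a = a3.hash(metakit.view(),1)    # hash on the first column (X)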
I am not sure the above is correct, nor have I ever checked whether  
it actually offers O(1) hashed lookup on both X and Y.  It might be  
worth fiddling a bit with this.  If it doesn't work,  
you could set up a second hash mapping Y to the row number in "a",  
but that is obviously a bit more manual work, and perhaps not that  
much more convenient than setting up a Python dict to handle  
secondary hashes.
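
A minimal sketch of the Python dict fallback mentioned above, kept in plain Python so it runs without Metakit.  The list "rows" is a hypothetical stand-in for the rows of view "a", and build_index() and append_row() are illustrative helper names, not part of the Metakit API.  It assumes the values in each indexed column are unique.

```python
# Hypothetical stand-in for the rows of view "a" (columns X, Y, Z).
rows = [
    {"X": "alpha", "Y": "one", "Z": 10},
    {"X": "beta", "Y": "two", "Z": 20},
    {"X": "gamma", "Y": "three", "Z": 30},
]


def build_index(rows, prop):
    """Map each value of column `prop` to the number of the row holding it."""
    index = {}
    for rownum, row in enumerate(rows):
        key = row[prop]
        if key in index:
            raise ValueError("duplicate key %r in column %s" % (key, prop))
        index[key] = rownum
    return index


def append_row(rows, indexes, row):
    """Append a row and update every secondary index in the same step."""
    # Check all keys first so a duplicate leaves rows and indexes untouched.
    for prop, index in indexes.items():
        if row[prop] in index:
            raise ValueError("duplicate key %r in column %s" % (row[prop], prop))
    rows.append(row)
    rownum = len(rows) - 1
    for prop, index in indexes.items():
        index[row[prop]] = rownum
    return rownum


indexes = {"X": build_index(rows, "X"), "Y": build_index(rows, "Y")}
append_row(rows, indexes, {"X": "delta", "Y": "four", "Z": 40})

print(rows[indexes["Y"]["two"]])    # row found via the Y index
print(rows[indexes["X"]["delta"]])  # row found via the X index
```

Note that the dicts store row numbers, so deleting a row or inserting one anywhere but at the end shifts the numbers of the rows after it, and the indexes then have to be rebuilt.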

As I said, such multi-index issues (and hashes on other columns than  
the first N) haven't been properly extended into today's MK, alas.

-jcw


