Re: Questions about indexes? - Mailing list pgsql-hackers

From Curt Sampson
Subject Re: Questions about indexes?
Date
Msg-id Pine.NEB.4.51.0302180021120.997@angelic-vtfw.cvpn.cynic.net
Whole thread Raw
In response to Questions about indexes?  (Ryan Bradetich <rbradetich@uswest.net>)
Responses Re: Questions about indexes?
List pgsql-hackers
On Mon, 16 Feb 2003, Ryan Bradetich wrote:

> I am not sure why all the data is duplicated in the index ...

Well, you have to have the full key in the index, or how would you know,
when you look at a particular index item, if it actually matches what
you're searching for?

MS SQL server does have an interesting option that would help you a lot
in this case: clustered indexes. A table may have a single clustered
index, and each leaf node of the index stores not just the key but
actually the entire row. Thus, in a case like yours, you'd store the row
only once, not twice.

Without thinking too hard about it (my usual mode of operation on this
list :-)) this could probably be implemented in postgresql. But I don't
think it would be entirely trivial, and your case is unusual enough
that I very much doubt whether it would be worth implementing to fix
that alone. It would also offer the advantage that any lookup using the
clustered index would save fetching the heap page after that as well,
but it's hard to say if the savings would be worth the work.

> Since my only requirement is that the rows be unique, I have developed a
> custom MD5 function in C, and created an index on the MD5 hash of the
> concatanation of all the fields.

Well, that won't guarantee uniqueness, since it's perfectly possible
to have two different rows hash to the same value. (If that weren't
possible, your hash would have to contain as much information as the row
itself, and your space savings wouldn't be nearly so dramatic.)

cjs
-- 
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org   Don't you know, in this new Dark Age, we're
alllight.  --XTC
 


pgsql-hackers by date:

Previous
From: Christoph Haller
Date:
Subject: Re: IpcSemaphoreKill: ...) failed: Invalid argument
Next
From: Tom Lane
Date:
Subject: Re: client_encoding directive is ignored in