Thread: "Hash index" vs. "b-tree index" (PostgreSQL 8.0)

"Hash index" vs. "b-tree index" (PostgreSQL 8.0)

From
Ying Lu
Date:
Greetings,

We are working on speeding up the queries by creating indexes. We have
queries with searching criteria such as "select ... where *col1='...'*".
This is a simple query with only "=" operation. As a result I setup hash
index on column "col1". While, in postgreSQL 8 doc, it is wirttern:

*Note: * Testing has shown PostgreSQL's hash indexes to perform no
better than B-tree indexes, and the index size and build time for hash
indexes is much worse. For these reasons, hash index use is presently
discouraged.

May I know for simple "=" operation query, for "Hash index" vs. "B-tree"
index, which can provide better performance please?

Thanks,
Emi

Re: "Hash index" vs. "b-tree index" (PostgreSQL 8.0)

From
Neil Conway
Date:
Ying Lu wrote:
> May I know for simple "=" operation query, for "Hash index" vs. "B-tree"
> index, which can provide better performance please?

I don't think we've found a case in which the hash index code
outperforms B+-tree indexes, even for "=". The hash index code also has
a number of additional issues: for example, it isn't WAL safe, it has
relatively poor concurrency, and creating a hash index is significantly
slower than creating a b+-tree index.

-Neil

Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL 8.0)

From
Christopher Petrilli
Date:
On 5/9/05, Neil Conway <neilc@samurai.com> wrote:
> I don't think we've found a case in which the hash index code
> outperforms B+-tree indexes, even for "=". The hash index code also has
> a number of additional issues: for example, it isn't WAL safe, it has
> relatively poor concurrency, and creating a hash index is significantly
> slower than creating a b+-tree index.

This being the case, is there ever ANY reason for someone to use it?
If not, then shouldn't we consider deprecating it and eventually
removing it.  This would reduce complexity, I think.

Chris
--
| Christopher Petrilli
| petrilli@gmail.com

Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL

From
Neil Conway
Date:
Christopher Petrilli wrote:
> This being the case, is there ever ANY reason for someone to use it?

Well, someone might fix it up at some point in the future. I don't think
there's anything fundamentally wrong with hash indexes, it is just that
the current implementation is a bit lacking.

> If not, then shouldn't we consider deprecating it and eventually
> removing it.

I would personally consider the code to be deprecated already.

I don't think there is much to be gained b removing it: the code is
pretty isolated from the rest of the tree, and (IMHO) not a significant
maintenance burden.

-Neil

Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL

From
"Jim C. Nasby"
Date:
On Tue, May 10, 2005 at 01:34:57AM +1000, Neil Conway wrote:
> Christopher Petrilli wrote:
> >This being the case, is there ever ANY reason for someone to use it?
>
> Well, someone might fix it up at some point in the future. I don't think
> there's anything fundamentally wrong with hash indexes, it is just that
> the current implementation is a bit lacking.
>
> >If not, then shouldn't we consider deprecating it and eventually
> >removing it.
>
> I would personally consider the code to be deprecated already.
>
> I don't think there is much to be gained b removing it: the code is
> pretty isolated from the rest of the tree, and (IMHO) not a significant
> maintenance burden.

That may be true, but it's also a somewhat 'developer-centric' view. ;)

Having indexes that people shouldn't be using does add confusion for
users, and presents the opportunity for foot-shooting. I don't know what
purpose they once served, but if there's no advantage to them they
should be officially depricated and eventually removed. Even if there is
some kind of advantage (would they possibly speed up hash joins?), if
there's no plans to fix them they should still be removed. If someone
ever really wanted to do something with, the code would still be in CVS.
--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL

From
Neil Conway
Date:
Jim C. Nasby wrote:
> Having indexes that people shouldn't be using does add confusion for
> users, and presents the opportunity for foot-shooting.

Emitting a warning/notice on hash-index creation is something I've
suggested in the past -- that would be fine with me.

> Even if there is some kind of advantage (would they possibly speed up
> hash joins?)

No, hash joins and hash indexes are unrelated.

-Neil

Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL

From
"Jim C. Nasby"
Date:
On Tue, May 10, 2005 at 02:38:41AM +1000, Neil Conway wrote:
> Jim C. Nasby wrote:
> >Having indexes that people shouldn't be using does add confusion for
> >users, and presents the opportunity for foot-shooting.
>
> Emitting a warning/notice on hash-index creation is something I've
> suggested in the past -- that would be fine with me.

Probably not a bad idea.

> >Even if there is some kind of advantage (would they possibly speed up
> >hash joins?)
>
> No, hash joins and hash indexes are unrelated.

I know they are now, but does that have to be the case? Like I said, I
don't know the history, so I don't know why we even have them to begin
with.
--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL

From
Neil Conway
Date:
Jim C. Nasby wrote:
 >> No, hash joins and hash indexes are unrelated.
> I know they are now, but does that have to be the case?

I mean, the algorithms are fundamentally unrelated. They share a bit of
code such as the hash functions themselves, but they are really solving
two different problems (disk based indexing with (hopefully) good
concurrency and WAL logging vs. in-memory joins via hashing with spill
to disk if needed).

> Like I said, I don't know the history, so I don't know why we even
> have them to begin with.

As I said, the idea of using hash indexes for better performance on
equality scans is perfectly valid, it is just the implementation that
needs work.

-Neil

Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL

From
Bruce Momjian
Date:
Jim C. Nasby wrote:
> On Tue, May 10, 2005 at 02:38:41AM +1000, Neil Conway wrote:
> > Jim C. Nasby wrote:
> > >Having indexes that people shouldn't be using does add confusion for
> > >users, and presents the opportunity for foot-shooting.
> >
> > Emitting a warning/notice on hash-index creation is something I've
> > suggested in the past -- that would be fine with me.
>
> Probably not a bad idea.

Agreed.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> Jim C. Nasby wrote:
>>> No, hash joins and hash indexes are unrelated.
>> I know they are now, but does that have to be the case?

> I mean, the algorithms are fundamentally unrelated. They share a bit of
> code such as the hash functions themselves, but they are really solving
> two different problems

In fact, up till fairly recently they didn't even share the hash
functions.  Which was a bug not a feature, but the fact remains ---
there's not a lot of commonality.

>> Like I said, I don't know the history, so I don't know why we even
>> have them to begin with.

I think it's largely because some Berkeley grad student had a need to
implement hash indexes for academic reasons ;-)

> As I said, the idea of using hash indexes for better performance on
> equality scans is perfectly valid, it is just the implementation that
> needs work.

I was thinking about that earlier today.  It seems to me there is a
window within which hash indexes are theoretically superior, but it
might be pretty narrow.  The basic allure of a hash index is that you
look at the search key, do some allegedly-trivial computations, and go
directly to the relevant index page(s); whereas a btree index requires
descending through several upper levels of index pages to reach the
target leaf page.  On the other hand, once you reach the target index
page, a hash index has no better method than linear scan through all
the page's index entries to find the actually wanted key(s); in fact you
have to search all the pages in that index bucket.  A btree index can
use binary search to find the relevant items within the page.

So it strikes me that important parameters include the index entry size
and the number of entries matching any particular key value.

btree will probably win for smaller keys, on two grounds: it will have
fewer tree levels to descend through, because of higher fan-out, and it
will be much more efficient at finding the target entries within the
target page when there are many entries per page.  (As against this,
it will have to work harder at each upper tree page to decide where to
descend to --- but I think that's a second-order consideration.)

hash will tend to win as the number of duplicate keys increases, because
its relative inefficiency at finding the matches within a particular
bucket will become less significant.  (The ideal situation for a hash
index is to have only one actual key value per bucket.  You can't really
afford to store only one index entry per bucket, because of the sheer
I/O volume that would result, so you need multiple entries that will all
be responsive to your search.)  (This also brings up the thought that
it might be interesting to support hash buckets smaller than a page ...
but I don't know how to make that work in an adaptive fashion.)

I suspect that people haven't looked hard for a combination of these
parameters within which hash can win.  Of course the real question for
us is whether the window is wide enough to justify the maintenance
effort for hash.

            regards, tom lane

Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL

From
Neil Conway
Date:
Tom Lane wrote:
> On the other hand, once you reach the target index page, a hash index
> has no better method than linear scan through all the page's index
> entries to find the actually wanted key(s)

I wonder if it would be possible to store the keys in a hash bucket in
sorted order, provided that the necessary ordering is defined for the
index keys -- considering the ubiquity of b+-trees in Postgres, the
chances of an ordering being defined are pretty good. Handling overflow
pages would be tricky: perhaps we could store the entries in a given
page in sorted order, but not try to maintain that order for the hash
bucket as a whole. This would mean we'd need to do a binary search for
each page of the bucket, but that would still be a win.

-Neil

Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> Tom Lane wrote:
>> On the other hand, once you reach the target index page, a hash index
>> has no better method than linear scan through all the page's index
>> entries to find the actually wanted key(s)

> I wonder if it would be possible to store the keys in a hash bucket in
> sorted order, provided that the necessary ordering is defined for the
> index keys -- considering the ubiquity of b+-trees in Postgres, the
> chances of an ordering being defined are pretty good.

I have a gut reaction against that: it makes hash indexes fundamentally
subservient to btrees.  We shouldn't bring in concepts that are outside
the basic opclass abstraction.

However: what about storing the things in hashcode order?  Ordering uint32s
doesn't seem like any big conceptual problem.

I think that efficient implementation of this would require explicitly
storing the hash code for each index entry, which we don't do now, but
it seems justifiable on multiple grounds --- besides this point, the
search could avoid doing the data-type-specific comparison if the hash
code isn't equal.

There is evidence in the code that indexes used to store more info than
what we now think of as a "standard" index tuple.  I am not sure when
that went away or what it'd cost to bring it back, but it seems worth
looking into.

            regards, tom lane

Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL

From
Neil Conway
Date:
Tom Lane wrote:
> I have a gut reaction against that: it makes hash indexes fundamentally
> subservient to btrees.

I wouldn't say "subservient" -- if there is no ordering defined for the
index key, we just do a linear scan.

> However: what about storing the things in hashcode order?  Ordering uint32s
> doesn't seem like any big conceptual problem.

Hmm, my memory of the hash code is a bit fuzzy. Do I understand correctly?

- we only use some of the bits in the hash to map from the hash of a key
to its bucket

- therefore within a bucket, we can still distinguish most of the
non-equal tuples from one another by comparing their full hash values

- if we keep the entries in a bucket (or page, I guess -- per earlier
mail) sorted by full hash value, we can use that to perform a binary search

Sounds like a good idea to me. How likely is it that the hash index will
be sufficiently large that we're using most of the bits in the hash just
to map hash values to buckets, so that the binary search won't be very
effective? (At this point many of the distinct keys in each bucket will
be full-on hash collisions, although sorting by the key values
themselves would still be effective.)

> I think that efficient implementation of this would require explicitly
> storing the hash code for each index entry, which we don't do now, but
> it seems justifiable on multiple grounds --- besides this point, the
> search could avoid doing the data-type-specific comparison if the hash
> code isn't equal.

Another benefit is that it would speed up page splits -- there would be
no need to rehash all the keys in a bucket when doing the split.

-Neil

Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> However: what about storing the things in hashcode order?  Ordering uint32s
> doesn't seem like any big conceptual problem.
>
> I think that efficient implementation of this would require explicitly
> storing the hash code for each index entry, which we don't do now, but
> it seems justifiable on multiple grounds --- besides this point, the
> search could avoid doing the data-type-specific comparison if the hash
> code isn't equal.

It seems that means doubling the size of the hash index. That's a pretty big
i/o to cpu tradeoff.

What if the hash index stored *only* the hash code? That could be useful for
indexing large datatypes that would otherwise create large indexes. A good
hash function should have a pretty low collision rate anyways so the
occasional extra i/o should more than be outweighed by the decrease in i/o
needed to use the index.

--
greg

Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> I think that efficient implementation of this would require explicitly
>> storing the hash code for each index entry,

> It seems that means doubling the size of the hash index. That's a pretty big
> i/o to cpu tradeoff.

Hardly.  The minimum possible size of a hash entry today is 8 bytes
header plus 4 bytes datum, plus there's a 4-byte line pointer to factor
in.  So under the most pessimistic assumptions, storing the hash code
would add 25% to the size.  (On MAXALIGN=8 hardware, it might cost you
nothing at all.)

> What if the hash index stored *only* the hash code? That could be useful for
> indexing large datatypes that would otherwise create large indexes.

Hmm, that could be a thought.

            regards, tom lane

Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL

From
"Jim C. Nasby"
Date:
On Tue, May 10, 2005 at 10:14:11AM +1000, Neil Conway wrote:
> Jim C. Nasby wrote:
> >> No, hash joins and hash indexes are unrelated.
> >I know they are now, but does that have to be the case?
>
> I mean, the algorithms are fundamentally unrelated. They share a bit of
> code such as the hash functions themselves, but they are really solving
> two different problems (disk based indexing with (hopefully) good
> concurrency and WAL logging vs. in-memory joins via hashing with spill
> to disk if needed).

Well, in a hash-join right now you normally end up feeding at least one
side of the join with a seqscan. Wouldn't it speed things up
considerably if you could look up hashes in the hash index instead? That
way you can eliminate going to the heap for any hashes that match. Of
course, if limited tuple visibility info was added to hash indexes
(similar to what I think is currently happening to B-tree's), many of
the heap scans could be eliminated as well. A similar method could also
be used for hash aggregates, assuming they use the same hash.
--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL

From
"Jim C. Nasby"
Date:
On Tue, May 10, 2005 at 12:10:57AM -0400, Tom Lane wrote:
> be responsive to your search.)  (This also brings up the thought that
> it might be interesting to support hash buckets smaller than a page ...
> but I don't know how to make that work in an adaptive fashion.)

IIRC, other databases that support hash indexes also allow you to define
the bucket size, so it might be a good start to allow for that. DBA's
usually have a pretty good idea of what a table will look like in
production, so if there's clear documentation on the effect of bucket
size a good DBA should be able to make a good decision.

What's the challange to making it adaptive, comming up with an algorithm
that gives you the optimal bucket size (which I would think there's
research on...) or allowing the index to accommodate different bucket
sizes existing in the index at once? (Presumably you don't want to
re-write the entire index every time it looks like a different bucket
size would help.)
--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL

From
Tom Lane
Date:
"Jim C. Nasby" <decibel@decibel.org> writes:
> What's the challange to making it adaptive, comming up with an algorithm
> that gives you the optimal bucket size (which I would think there's
> research on...) or allowing the index to accommodate different bucket
> sizes existing in the index at once? (Presumably you don't want to
> re-write the entire index every time it looks like a different bucket
> size would help.)

Exactly.  That's (a) expensive and (b) really hard to fit into the WAL
paradigm --- I think we could only handle it as a REINDEX.  So if it
were adaptive at all I think we'd have to support multiple bucket sizes
existing simultaneously in the index, and I do not see a good way to do
that.

Allowing a bucket size to be specified at CREATE INDEX doesn't seem out
of line though.  We'd have to think up a scheme for index-AM-specific
index parameters ...

            regards, tom lane

Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL

From
"Jim C. Nasby"
Date:
On Tue, May 10, 2005 at 11:49:50AM -0400, Tom Lane wrote:
> "Jim C. Nasby" <decibel@decibel.org> writes:
> > What's the challange to making it adaptive, comming up with an algorithm
> > that gives you the optimal bucket size (which I would think there's
> > research on...) or allowing the index to accommodate different bucket
> > sizes existing in the index at once? (Presumably you don't want to
> > re-write the entire index every time it looks like a different bucket
> > size would help.)
>
> Exactly.  That's (a) expensive and (b) really hard to fit into the WAL
> paradigm --- I think we could only handle it as a REINDEX.  So if it
> were adaptive at all I think we'd have to support multiple bucket sizes
> existing simultaneously in the index, and I do not see a good way to do
> that.

I'm not really familiar enough with hash indexes to know if this would
work, but if the maximum bucket size was known you could use that to
determine a maximum range of buckets to look at. In some cases, that
range would include only one bucket, otherwise it would be a set of
buckets. If you found a set of buckets, I think you could then just go
to the specific one you need.

If we assume that the maximum bucket size is one page it becomes more
realistic to take an existing large bucket and split it into several
smaller ones. This could be done on an update to the index page, or a
background process could handle it.

In any case, should this go on the TODO list?

> Allowing a bucket size to be specified at CREATE INDEX doesn't seem out
> of line though.  We'd have to think up a scheme for index-AM-specific
> index parameters ...
--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> > What if the hash index stored *only* the hash code? That could be useful for
> > indexing large datatypes that would otherwise create large indexes.
>
> Hmm, that could be a thought.

Hm, if you go this route of having hash indexes store tuples ordered by hash
code and storing the hash code in the index, then it seems hash indexes become
just a macro for a btree index of HASH(index columns).

I'm not saying that to criticize this plan. In fact I think that captures most
(though not all) of what a hash index should be.

It would be pretty useful. In fact if it isn't how hash indexes are
implemented then it might be useful to provide a user visible hash(ROW)
function that allows creating such indexes as functional indexes. Though
hiding it would make the SQL simpler.

--
greg

Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL

From
Tom Lane
Date:
"Jim C. Nasby" <decibel@decibel.org> writes:
> Well, in a hash-join right now you normally end up feeding at least one
> side of the join with a seqscan. Wouldn't it speed things up
> considerably if you could look up hashes in the hash index instead?

That's called a "nestloop with inner index scan", not a hash join.

            regards, tom lane

Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
>>> What if the hash index stored *only* the hash code? That could be useful for
>>> indexing large datatypes that would otherwise create large indexes.
>>
>> Hmm, that could be a thought.

> Hm, if you go this route of having hash indexes store tuples ordered by hash
> code and storing the hash code in the index, then it seems hash indexes become
> just a macro for a btree index of HASH(index columns).

No, not at all, because searching such an index will require a tree
descent, thus negating the one true advantage of hash indexes.  I see
the potential value of sorting by hashcode within an individual page,
but that doesn't mean we should do the same across the whole index.

            regards, tom lane

Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL

From
Mischa Sandberg
Date:
Quoting "Jim C. Nasby" <decibel@decibel.org>:

> Well, in a hash-join right now you normally end up feeding at least
> one
> side of the join with a seqscan. Wouldn't it speed things up
> considerably if you could look up hashes in the hash index instead?

You might want to google on "grace hash" and "hybrid hash".

The PG hash join is the simplest possible: build a hash table in memory,
and match an input stream against it.

*Hybrid hash* is where you spill the hash to disk in a well-designed
way. Instead of thinking of it as building a hash table in memory, think
of it as partitioning one input; if some or all of it fits in memory,
all the better. The boundary condition is the same.

The real wizard of hybrid hash has to be Goetz Graefe, who sadly has now
joined the MS Borg. He demonstrated that for entire-table joins, hybrid
hash completely dominates sort-merge. MSSQL now uses what he developed
as an academic, but I don't know what the patent state is.

"Grace hash" is the original implementation of hybrid hash:
  Kitsuregawa, M., Tanaka, H., and Moto-oka, T. (1984).
  Architecture and Performance of Relational Algebra Machine Grace.



Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL

From
Mark Lewis
Date:
If the original paper was published in 1984, then it's been more than 20
years.  Any potential patents would already have expired, no?

-- Mark Lewis

On Tue, 2005-05-10 at 14:35, Mischa Sandberg wrote:
> Quoting "Jim C. Nasby" <decibel@decibel.org>:
>
> > Well, in a hash-join right now you normally end up feeding at least
> > one
> > side of the join with a seqscan. Wouldn't it speed things up
> > considerably if you could look up hashes in the hash index instead?
>
> You might want to google on "grace hash" and "hybrid hash".
>
> The PG hash join is the simplest possible: build a hash table in memory,
> and match an input stream against it.
>
> *Hybrid hash* is where you spill the hash to disk in a well-designed
> way. Instead of thinking of it as building a hash table in memory, think
> of it as partitioning one input; if some or all of it fits in memory,
> all the better. The boundary condition is the same.
>
> The real wizard of hybrid hash has to be Goetz Graefe, who sadly has now
> joined the MS Borg. He demonstrated that for entire-table joins, hybrid
> hash completely dominates sort-merge. MSSQL now uses what he developed
> as an academic, but I don't know what the patent state is.
>
> "Grace hash" is the original implementation of hybrid hash:
>   Kitsuregawa, M., Tanaka, H., and Moto-oka, T. (1984).
>   Architecture and Performance of Relational Algebra Machine Grace.
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
>       subscribe-nomail command to majordomo@postgresql.org so that your
>       message can get through to the mailing list cleanly


Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL

From
Tom Lane
Date:
Mischa Sandberg <mischa.sandberg@telus.net> writes:
> The PG hash join is the simplest possible: build a hash table in memory,
> and match an input stream against it.

> *Hybrid hash* is where you spill the hash to disk in a well-designed
> way. Instead of thinking of it as building a hash table in memory, think
> of it as partitioning one input; if some or all of it fits in memory,
> all the better. The boundary condition is the same.

[ raised eyebrow... ]  Apparently you've not read the code.  It's been
hybrid hashjoin since we got it from Berkeley.  Probably not the best
possible implementation of the concept, but we do understand about spill
to disk.

            regards, tom lane

Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL

From
Mischa Sandberg
Date:
Quoting Tom Lane <tgl@sss.pgh.pa.us>:

> Mischa Sandberg <mischa.sandberg@telus.net> writes:
> > The PG hash join is the simplest possible: build a hash table in
> memory,  and match an input stream against it.
>
> [ raised eyebrow... ]  Apparently you've not read the code.  It's
> been hybrid hashjoin since we got it from Berkeley.  Probably not the
> best possible implementation of the concept, but we do
> understand about spill to disk.

Apologies. I stopped reading around line 750 (PG 8.0.1) in
src/backend/executor/nodeHashjoin.c

if (!node->hj_hashdone)
{
    ....
    /*
     * execute the Hash node, to build the hash table
     */
    hashNode->hashtable = hashtable;
    (void) ExecProcNode((PlanState *) hashNode);
    ...

and missed the comment:
    /*
     * Open temp files for outer batches,
     */

Will quietly go and read twice, talk once.


Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> No, not at all, because searching such an index will require a tree
> descent, thus negating the one true advantage of hash indexes.

The hash index still has to do a tree descent, it just has a larger branching
factor than the btree index.

btree indexes could have a special case hack to optionally use a large
branching factor for the root node, effectively turning them into hash
indexes. That would be useful for cases where you know the values will be very
evenly distributed and won't need to scan ranges, ie, when you're indexing a
hash function.

--
greg

Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL

From
Mischa Sandberg
Date:
Quoting Mark Lewis <mark.lewis@mir3.com>:

> If the original paper was published in 1984, then it's been more than
> 20 years.  Any potential patents would already have expired, no?

Don't know, but the idea is pervasive among different vendors ...
perhaps that's a clue.

And having now read beyond the start of ExecHashJoin(), I can see that
PG does indeed implement Grace hash; and the implementation is nice and
clean.

If there were room for improvement, (and I didn't see it in the source)
it would be the logic to:

- swap inner and outer inputs (batches) when the original inner turned
out to be too large for memory, and the corresponding outer did not. If
you implement that anyway (complicates the loops) then it's no trouble
to just hash the smaller of the two, every time; saves some CPU.

- recursively partition batches where both inner and outer input batch
ends up being too large for memory, too; or where the required number of
batch output buffers alone is too large for working RAM. This is only
for REALLY big inputs.

Note that you don't need a bad hash function to get skewed batch sizes;
you only need a skew distribution of the values being hashed.



Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL

From
Bruce Momjian
Date:
Is there a TODO anywhere in this discussion?  If so, please let me know.

---------------------------------------------------------------------------

Mischa Sandberg wrote:
> Quoting Mark Lewis <mark.lewis@mir3.com>:
>
> > If the original paper was published in 1984, then it's been more than
> > 20 years.  Any potential patents would already have expired, no?
>
> Don't know, but the idea is pervasive among different vendors ...
> perhaps that's a clue.
>
> And having now read beyond the start of ExecHashJoin(), I can see that
> PG does indeed implement Grace hash; and the implementation is nice and
> clean.
>
> If there were room for improvement, (and I didn't see it in the source)
> it would be the logic to:
>
> - swap inner and outer inputs (batches) when the original inner turned
> out to be too large for memory, and the corresponding outer did not. If
> you implement that anyway (complicates the loops) then it's no trouble
> to just hash the smaller of the two, every time; saves some CPU.
>
> - recursively partition batches where both inner and outer input batch
> ends up being too large for memory, too; or where the required number of
> batch output buffers alone is too large for working RAM. This is only
> for REALLY big inputs.
>
> Note that you don't need a bad hash function to get skewed batch sizes;
> you only need a skew distribution of the values being hashed.
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
>       subscribe-nomail command to majordomo@postgresql.org so that your
>       message can get through to the mailing list cleanly
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL

From
Mischa Sandberg
Date:
Quoting Bruce Momjian <pgman@candle.pha.pa.us>:

>
> Is there a TODO anywhere in this discussion?  If so, please let me
> know.
>

Umm... I don't think so. I'm not clear on what TODO means yet. 'Up for
consideration'?  If a "TODO" means committing to do, I would prefer to
follow up on a remote-schema (federated server) project first.
...

> > If there were room for improvement, (and I didn't see it in the
> source)
> > it would be the logic to:
> >
> > - swap inner and outer inputs (batches) when the original inner
> turned
> > out to be too large for memory, and the corresponding outer did
> not. If
> > you implement that anyway (complicates the loops) then it's no
> trouble
> > to just hash the smaller of the two, every time; saves some CPU.
> >
> > - recursively partition batches where both inner and outer input
> batch
> > ends up being too large for memory, too; or where the required
> number of
> > batch output buffers alone is too large for working RAM. This is
> only
> > for REALLY big inputs.
> >
> > Note that you don't need a bad hash function to get skewed batch
> sizes;
> > you only need a skew distribution of the values being hashed.



Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL

From
Bruce Momjian
Date:
Mischa Sandberg wrote:
> Quoting Bruce Momjian <pgman@candle.pha.pa.us>:
>
> >
> > Is there a TODO anywhere in this discussion?  If so, please let me
> > know.
> >
>
> Umm... I don't think so. I'm not clear on what TODO means yet. 'Up for
> consideration'?  If a "TODO" means committing to do, I would prefer to
> follow up on a remote-schema (federated server) project first.

TODO means it is a change that most people think would be a good idea.
It is not a committment from anyone to actually do it.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL

From
Mischa Sandberg
Date:
Quoting Bruce Momjian <pgman@candle.pha.pa.us>:

> Mischa Sandberg wrote:
> > Quoting Bruce Momjian <pgman@candle.pha.pa.us>:
> > > Is there a TODO anywhere in this discussion?  If so, please let
> me
> > > know.
> > >
> >
> > Umm... I don't think so. I'm not clear on what TODO means yet. 'Up
> for
> > consideration'?  If a "TODO" means committing to do, I would prefer
> to
> > follow up on a remote-schema (federated server) project first.
>
> TODO means it is a change that most people think would be a good
> idea.
> It is not a committment from anyone to actually do it.

I think there has not been enough commentary from "most people".



Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL

From
Neil Conway
Date:
Bruce Momjian wrote:
> Is there a TODO anywhere in this discussion?  If so, please let me know.

There are a couple:

- consider changing hash indexes to keep the entries in a hash bucket
sorted, to allow a binary search rather than a linear scan

- consider changing hash indexes to store each key's hash value in
addition to or instead of the key value.

You should probably include a pointer to this discussion as well.

(I'd like to take a look at implementing these if I get a chance.)

-Neil

Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> No, not at all, because searching such an index will require a tree
>> descent, thus negating the one true advantage of hash indexes.

> The hash index still has to do a tree descent, it just has a larger branching
> factor than the btree index.

There is *no* tree descent in a hash index: you go directly to the
bucket you want.

If the bucket spans more than one page, you pay something, but this
strikes me as being equivalent to the case of multiple equal keys
spanning multiple pages in a btree.  It works, but it's not the design
center.

> btree indexes could have a special case hack to optionally use a large
> branching factor for the root node, effectively turning them into hash
> indexes.

No, because you'd still have to fetch and search the root node.

            regards, tom lane

Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> Greg Stark wrote:
>> What if the hash index stored *only* the hash code?

> Attached is a WIP patch that implements this.

Performance?

> I'm posting mainly because I wasn't sure what to do to avoid false
> positives in the case of hash collisions. In the hash AM code it is
> somewhat awkward to fetch the pointed-to heap tuple and recheck the
> scankey.[1] I just did the first thing that came to mind -- I marked all
> the hash AM opclasses as "lossy", so the index qual is rechecked. This
> works, but suggestions for a better way to do things would be welcome.

AFAICS that's the *only* way to do it.

I disagree completely with the idea of forcing this behavior for all
datatypes.  It could only be sensible for fairly wide values; you don't
save enough to justify the lossiness otherwise.

It would be interesting to look into whether it could be driven on a
per-opclass basis.  Then you could have, eg, "text_lossy_hash_ops"
as a non-default opclass the DBA could select if he wanted this
behavior.  (The code could perhaps use the amopreqcheck flag to tell
it which way to behave.)  If that seems unworkable, I'd prefer to see us
set this up as a new index AM type, which would share a lot of code with
the old.

[ BTW, posting patches to pgsql-general seems pretty off-topic. ]

            regards, tom lane

Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL

From
Greg Stark
Date:
Neil Conway <neilc@samurai.com> writes:

> I'm posting mainly because I wasn't sure what to do to avoid false positives in
> the case of hash collisions. In the hash AM code it is somewhat awkward to
> fetch the pointed-to heap tuple and recheck the scankey.[1] I just did the
> first thing that came to mind -- I marked all the hash AM opclasses as "lossy",
> so the index qual is rechecked. This works, but suggestions for a better way to
> do things would be welcome.

I would have thought that would be the only way worth considering.

Consider for example a query involving two or more hash indexes and the new
bitmap indexscan plan. You don't want to fetch the tuples if you can eliminate
them using one of the other indexes.

--
greg

Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL

From
Neil Conway
Date:
Tom Lane wrote:
> Performance?

I'll run some benchmarks tomorrow, as it's rather late in my time zone.
If anyone wants to post some benchmark results, they are welcome to.

> I disagree completely with the idea of forcing this behavior for all
> datatypes.  It could only be sensible for fairly wide values; you don't
> save enough to justify the lossiness otherwise.

I think it would be premature to decide about this before we see some
performance numbers. I'm not fundamentally opposed, though.

> [ BTW, posting patches to pgsql-general seems pretty off-topic. ]

Not any more than discussing implementation details is :) But your point
is well taken, I'll send future patches to -patches.

-Neil

Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL

From
Bruce Momjian
Date:
Added to TODO:

* Consider sorting hash buckets so entries can be found using a binary
  search, rather than a linear scan
* In hash indexes, consider storing the hash value with or instead
  of the key itself


---------------------------------------------------------------------------

Neil Conway wrote:
> Bruce Momjian wrote:
> > Is there a TODO anywhere in this discussion?  If so, please let me know.
>
> There are a couple:
>
> - consider changing hash indexes to keep the entries in a hash bucket
> sorted, to allow a binary search rather than a linear scan
>
> - consider changing hash indexes to store each key's hash value in
> addition to or instead of the key value.
>
> You should probably include a pointer to this discussion as well.
>
> (I'd like to take a look at implementing these if I get a chance.)
>
> -Neil
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073